Commit Graph

259 Commits

Author SHA1 Message Date
Alex Dadgar 46770d57e5 Forwarding 2018-02-15 13:59:01 -08:00
Alex Dadgar cfe9afc567 Store connection time 2018-02-15 13:59:01 -08:00
Alex Dadgar 6dd1c9f49d Refactor 2018-02-15 13:59:00 -08:00
Alex Dadgar ad7bc0c6bd Server can forward ClientStats.Stats 2018-02-15 13:59:00 -08:00
Alex Dadgar 940a2df8a1 Pull inmem codec to helper 2018-02-15 13:59:00 -08:00
Alex Dadgar 13bbf3fbbb Track client connections 2018-02-15 13:59:00 -08:00
Alex Dadgar ba5ecb8c1a Dynamic RPC servers with context 2018-02-15 13:59:00 -08:00
Alex Dadgar 288b3c0e05 Helper to populate RPC server endpoints 2018-02-15 13:59:00 -08:00
Kyle Havlovitz 709b693d39 Clean up some leftover autopilot differences from Consul 2018-02-08 10:27:26 -08:00
Kyle Havlovitz 2ccf565bf6 Refactor redundancy_zone/upgrade_version out of client meta 2018-01-29 20:03:38 -08:00
Kyle Havlovitz a162b9ce14 Move server health loop into autopilot leader actions 2018-01-23 12:57:02 -08:00
Chelsea Komlo d09cc2a69f
Merge pull request #3492 from hashicorp/f-client-tls-reload
Client/Server TLS dynamic reload
2018-01-23 05:51:32 -05:00
Chelsea Holland Komlo 7d3c240871 swap raft layer tls wrapper 2018-01-19 17:00:15 -05:00
Chelsea Holland Komlo a8f655fbb3 allow for similar error messages for closed connections 2018-01-17 12:02:40 -05:00
Chelsea Holland Komlo 35466a331a fixing up raft reload tests
close second goroutine in raft-net
2018-01-17 10:29:15 -05:00
Kyle Havlovitz 7b980c42d8 Add raft remove by id endpoint/command 2018-01-16 13:35:32 -08:00
Chelsea Holland Komlo 5f52e8e103 feedback from code review 2018-01-16 11:55:11 -05:00
Chelsea Holland Komlo 649f86f094 refactor creating a new tls configuration 2018-01-16 08:02:39 -05:00
Chelsea Holland Komlo 214d128eb9 reload raft transport layer
fix up linting
2018-01-08 14:52:28 -05:00
Chelsea Holland Komlo 0708d34135 call reload on agent, client, and server separately 2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo 909bb0af07 refactor rpc listener methods, wait for proper shutdown 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo 6a2432659a code review fixups 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo 9741097406 reloading tls config should be atomic for clients/servers 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo e7bd156ef2 check error on generating tls context 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo 9b0a7a7f7c remove code duplication 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo 4e0dbd23cf prevent races when reloading, fully shut down raft 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo ae7fc4695e fixups from code review
Revert "close raft long-lived connections"

This reverts commit 3ffda28206fcb3d63ad117fd1d27ae6f832b6625.

reload raft connections on changing tls
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo dfb6a3d9a8 close raft long-lived connections 2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo acd3d1b162 fix up downgrading client to plaintext
add locks around changing server configuration
2018-01-08 09:21:06 -05:00
Chelsea Holland Komlo c0ad9a4627 add ability to upgrade/downgrade nomad agents tls configurations via sighup 2018-01-08 09:21:06 -05:00
Preetha Appan fcded9ba61
Add a TODO comment around handling peer address for remove peer correctly for raft protocol 3 2018-01-05 14:22:45 -06:00
Kyle Havlovitz 1c07066064 Add autopilot functionality based on Consul's autopilot 2017-12-18 14:29:41 -08:00
Kyle Havlovitz b775fc7b33
Added support for v2 raft APIs and -raft-protocol option 2017-12-12 10:17:16 -06:00
Chelsea Komlo 2dfda33703 Nomad agent reload TLS configuration on SIGHUP (#3479)
* Allow server TLS configuration to be reloaded via SIGHUP

* dynamic tls reloading for nomad agents

* code cleanup and refactoring

* ensure keyloader is initialized, add comments

* allow downgrading from TLS

* initalize keyloader if necessary

* integration test for tls reload

* fix up test to assert success on reloaded TLS configuration

* failure in loading a new TLS config should remain at current

Reload only the config if agent is already using TLS

* reload agent configuration before specific server/client

lock keyloader before loading/caching a new certificate

* introduce a get-or-set method for keyloader

* fixups from code review

* fix up linting errors

* fixups from code review

* add lock for config updates; improve copy of tls config

* GetCertificate only reloads certificates dynamically for the server

* config updates/copies should be on agent

* improve http integration test

* simplify agent reloading storing a local copy of config

* reuse the same keyloader when reloading

* Test that server and client get reloaded but keep keyloader

* Keyloader exposes GetClientCertificate as well for outgoing connections

* Fix spelling

* correct changelog style
2017-11-14 17:53:23 -08:00
Alex Dadgar 5c34af1ee1 leader acl token 2017-10-23 14:10:14 -07:00
Alex Dadgar c1cc51dbee sync 2017-10-13 14:36:02 -07:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar e5ec915ac3 sync 2017-09-19 10:08:23 -05:00
Alex Dadgar 84d06f6abe Sync namespace changes 2017-09-07 17:04:21 -07:00
Armon Dadgar 3e46094cee Passthrough replication token for token/policy replication 2017-09-04 13:05:53 -07:00
Armon Dadgar dc1904b57a nomad: adding ACL token resolution logic 2017-09-04 13:04:45 -07:00
Armon Dadgar e4f5f305ea nomad: adding Get/List endpoints for ACL policies 2017-09-04 13:03:15 -07:00
Alex Dadgar 62c14c21a5 Merge pull request #3142 from hashicorp/f-deployment-watcher
Deployment watcher takes state store
2017-08-31 10:45:17 -07:00
Jeremy Olexa f94f237597 Update peers.info message for operators 2017-08-31 08:51:04 -05:00
Alex Dadgar 590ff91bf3 Deployment watcher takes state store 2017-08-30 18:51:59 -07:00
Chelsea Holland Komlo 465c4d7082 change endpoint to /v1/search 2017-08-14 17:38:10 +00:00
Chelsea Holland Komlo 5ee58a391b rename to cluster search
comment updates
2017-08-14 17:36:14 +00:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Chelsea Holland Komlo 4dd6b46198 Retrieve job information for resources endpoint
requires further refactoring and logic for more contexts
2017-08-04 14:34:25 +00:00
Alex Dadgar dad9e69822 more comment fixes 2017-07-07 12:03:11 -07:00
Alex Dadgar 7154e4e08f Remove setters 2017-07-07 12:03:11 -07:00
Alex Dadgar 87d187d777 Tests 2017-07-07 12:03:11 -07:00
Alex Dadgar 6f821beec4 fix integration slightly 2017-07-07 12:03:11 -07:00
Alex Dadgar 7af65aa3d7 Add watcher to server 2017-07-07 12:03:11 -07:00
Alex Dadgar d04877d23c initial impl 2017-07-07 12:03:11 -07:00
Michael Schurter 33318501b6 Backoff on Consul lookup failures 2017-04-19 12:42:47 -07:00
Michael Schurter e204a287ed Refactor Consul Syncer into new ServiceClient
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
2017-04-19 12:42:47 -07:00
Alex Dadgar a9c8b09da8 Push to configs 2017-04-14 15:24:55 -07:00
Alex Dadgar 5be806a3df Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar 8bfc4255eb Add server metrics 2017-02-14 16:02:18 -08:00
Alex Dadgar b1cd81e997 Remove todos 2017-02-10 15:41:23 -08:00
Alex Dadgar 2d4d9b79d8 Operator command/endpoint/documentation 2017-02-09 18:04:46 -08:00
Alex Dadgar ae31f4c84e Respond to comments 2017-02-08 14:50:19 -08:00
Alex Dadgar ee368762ae It builds 2017-02-02 16:07:15 -08:00
Alex Dadgar 26db1bd12c Join + Leave peer 2017-02-02 15:49:06 -08:00
Alex Dadgar ac10aed731 Update setupRaft 2017-02-02 15:31:36 -08:00
Alex Dadgar 78cfcd2724 Bump protocol version and update numOtherPeers 2017-02-02 13:52:31 -08:00
Alex Dadgar 15ffdff497 Vault Client on Server handles SIGHUP
This PR allows the Vault client on the server to handle a SIGHUP. This
allows updating the Vault token and any other configuration without
downtime.
2017-02-01 14:24:10 -08:00
Diptanu Choudhury e927de02d2 Moved functions to helper from structs 2017-01-18 15:55:14 -08:00
Diptanu Choudhury 1a8fa8c8d5 Making Nomad TLS configs region aware 2016-11-01 11:55:29 -07:00
Diptanu Choudhury 7c61e115bd Moved tlsutil into helpers 2016-10-25 16:05:37 -07:00
Diptanu Choudhury cf35aeac84 Moving the TLSConfig to structs 2016-10-25 15:57:38 -07:00
Diptanu Choudhury e03927bb5c Changed the way TLS config is parsed 2016-10-24 13:56:19 -07:00
Diptanu Choudhury 2e3118e69c Implemented TLS support for http and rpc 2016-10-23 22:22:00 -07:00
Diptanu Choudhury 0f6e0d10b6 Enable serf encryption (#1791)
* Added the keygen command

* Added support for gossip encryption

* Changed the URL for keyring management

* Fixed the cli

* Added some tests

* Added tests for keyring operations

* Added a test for removal of keys

* Added some docs

* Fixed some docs

* Added general options
2016-10-17 10:48:04 -07:00
Alex Dadgar 48696ba0cc Use tomb to shutdown
Token revocation

Remove from the statestore

Revoke tokens

Don't error when Vault is disabled as this could cause issue if the operator ever goes from enabled to disabled

update server interface to allow enable/disable and config loading

test the new functions

Leader revoke

Use active
2016-08-28 14:06:25 -07:00
Alex Dadgar 713e310670 Renew loop 2016-08-17 16:25:38 -07:00
Alex Dadgar 750a44b2c0 Create a Vault interface for the server 2016-08-17 16:25:38 -07:00
Alex Dadgar 6e2f0a2776 Server has Vault API client 2016-08-17 16:25:38 -07:00
Sean Chittenden 8bdb38d016
Code golf
Pointed out by: @dadgar
2016-06-21 14:26:01 -07:00
Sean Chittenden df4fe2e502
Fix the shuffling of remote datacenters.
Pointed out by: @ryanuber
2016-06-21 13:37:22 -07:00
Sean Chittenden 46e2d54acf
Provide `nomad.Config` with a default `LogOutput` of `os.StdErr` 2016-06-17 06:44:10 -07:00
Sean Chittenden 9a60999100
Pass a logger arg to `NewClient` and `NewServer` 2016-06-16 23:29:23 -07:00
Sean Chittenden 7c24487850
Fix up various error handling 2016-06-16 14:40:09 -07:00
Sean Chittenden 71cd9984ae
Immediately query Consul upon initialization if we have no peers.
Also don't attempt to join the Server with itself.
2016-06-16 14:27:10 -07:00
Sean Chittenden 65319252b9
Rework `server_auto_join` to use a timer instead of the peer count.
It is perfectly viable for an admin to downsize a Nomad Server cluster
down to 1, 2, or `num % 2 == 0` (however ill-advised such activities
may be).  And instead of using `bootstrap_expect`, use a timeout-based
strategy.  If the `bootstrapFn` hasn't observed a leader in 15s it will
fall back to Consul and will poll every ~60s until it sees a leader.
2016-06-16 12:14:03 -07:00
Sean Chittenden b0fecbefc1
Define `BootstrapExepct` as an `int32` so it can be manipulated atomically. 2016-06-16 12:00:15 -07:00
Sean Chittenden 5b0def194a
Namespace the log messages 2016-06-15 12:40:51 -07:00
Sean Chittenden bffc82d668
Do not consider the number of Serf members when considering falling back to Consul. 2016-06-15 12:40:51 -07:00
Sean Chittenden 324af8d7f1
Guard the auto-join functionality behind its `consul.server_auto_join` tunable 2016-06-15 12:40:51 -07:00
Sean Chittenden 5e0ced2ae7
Shuffle all datacenters vs only the nearest N datacenters.
Per discussion, we want to be aggressive about fanning out vs possibly
fixating on only local DCs.  With RPC forwarding in place, a random walk
may be less optimal from a network latency perspective, but it is guaranteed
to eventually result in a converged state because all DCs are candidates
during the bootstrapping process.
2016-06-15 12:40:51 -07:00
Sean Chittenden 2123460cf0
Bump various Consul search limits
Client: Search limit increased from 4 random DCs to 8 random DCs, plus nearest.
Server: Search factor increased from 3 to 5 times the bootstrap_expect.

This should allow for faster convergence in large environments (e.g.
sub-5min for 10K Consul DCs).
2016-06-15 12:40:51 -07:00
Sean Chittenden e8d1264dbc
Short-circuit the bootstrapFn if we have a leader 2016-06-15 12:40:51 -07:00
Sean Chittenden f05514335b
Teach Nomad servers how to fall back to Consul. 2016-06-15 12:40:51 -07:00
Sean Chittenden 3d64daafd9
Fold RaftPeers() into its only call site now 2016-06-10 15:54:39 -04:00
Sean Chittenden bff57a0dce
Reconcile, clean up, and centralize API version numbers (major and minor).
Reduce future confusion by introducing a minor version that is gossiped out
via the `mvn` Serf tag (Minor Version Number, `vsn` is already being used for
to communicate `Major Version Number`).

Background: hashicorp/consul/issues/1346#issuecomment-151663152
2016-06-10 15:50:11 -04:00
Sean Chittenden d76c042a13
Invert error handling logic 2016-06-10 15:50:11 -04:00
Sean Chittenden 89168b0c51
Invert check definition so the error is first 2016-06-10 15:50:11 -04:00
Sean Chittenden 17116fc5a7
Rebalance Nomad client RPCs among different Nomad servers.
Implement client/rpc_proxy.RpcProxy.
2016-06-10 15:50:11 -04:00
Sean Chittenden 49deaae2ae
Seed random once in main 2016-06-10 15:48:36 -04:00
Sean Chittenden dc28ab0cb5
Speling police 2016-05-15 09:41:34 -07:00
Diptanu Choudhury 26d1b60369 Adding raft peers in agent info 2016-04-05 10:30:46 -07:00
Diptanu Choudhury d472dc2988 Adding the raft leader addr to server stats 2016-04-03 16:38:39 -07:00
Alex Dadgar bf74e2f790 display server leaders per region 2016-03-17 16:04:09 -07:00
Armon Dadgar 7fc7cd9453 nomad: batch client updates for 50msec 2016-02-21 18:51:34 -08:00
Alex Dadgar 143972b6d9 Job GC endpoint 2016-02-20 15:50:41 -08:00
Alex Dadgar 25c5e543f4 Use crypto random seed 2016-02-17 11:47:02 -08:00
Alex Dadgar 01cadf7cb0 Seed the servers random number generator 2016-02-16 19:40:02 -08:00
Alex Dadgar c55eb0816c Address comments 2016-01-31 18:46:45 -08:00
Alex Dadgar 74135f02a4 Blocked Eval tracker 2016-01-31 18:04:45 -08:00
Alex Dadgar 80dd30b03d Add force spawn endpoint 2016-01-13 10:19:53 -08:00
Alex Dadgar b3e87b6719 Remove the periodicRunner interface and pass the server as an interface to the periodicDispatcher 2015-12-23 18:26:39 -08:00
Alex Dadgar 670cc50a02 merge 2015-12-23 18:26:39 -08:00
Alex Dadgar a892d61ae7 FSM integration 2015-12-23 18:26:39 -08:00
Ryan Uber d983f266b0 nomad: sort regions before returning 2015-11-24 13:15:01 -08:00
Ryan Uber 39b2c3a07b nomad: use a read-only lock 2015-11-23 22:27:07 -08:00
Ryan Uber ad6b55a37a nomad: support listing regions 2015-11-23 22:27:03 -08:00
Armon Dadgar b68c8404b1 nomad: remove noisy logs 2015-09-21 14:14:19 -07:00
Chris Bednarski da93d4a30f Change error to err to be consistent with other usage 2015-09-11 10:26:33 -07:00
Chris Bednarski 39feffd67f Change debug to info 2015-09-11 10:24:52 -07:00
Chris Bednarski 6ea318cacb Change to use the logger instance defined earlier 2015-09-11 10:23:53 -07:00
Chris Bednarski 3000ef07c1 Added logging to server startup so we can see where it fails 2015-09-11 10:20:36 -07:00
Armon Dadgar 7d69aa78c1 nomad: using Raft StartAsLeader to make tests faster 2015-09-07 10:46:41 -07:00
Armon Dadgar 0702f17989 Rename client endpoint to node endpoint 2015-09-06 20:31:32 -07:00
Armon Dadgar 46cbe8285d nomad: adding Alloc endpoint 2015-09-06 15:34:28 -07:00
Armon Dadgar 4d6238ebb8 client: test updating alloc status 2015-08-29 14:22:24 -07:00
Armon Dadgar 2ea99f211a nomad: updating for new alloc representation 2015-08-25 17:36:52 -07:00
Armon Dadgar d4e2faf216 nomad: track the workers in the pool 2015-08-23 10:53:53 -07:00
Armon Dadgar 3d48ff4c6f nomad: no heartbeat for nodes in terminal status 2015-08-22 17:17:13 -07:00
Armon Dadgar 40def1a187 nomad: expose RuntimeStats 2015-08-20 15:29:30 -07:00
Armon Dadgar c1d8ab0983 nomad: rename client endpoint struct 2015-08-16 17:40:35 -07:00
Armon Dadgar 79a1471b85 nomad: add delivery limit to eval broker 2015-08-16 10:55:55 -07:00
Armon Dadgar aadecdd805 nomad: adding version API endpoint 2015-08-15 12:59:10 -07:00
Armon Dadgar 8c2f8cddd0 nomad: create system scheduler as needed 2015-08-06 17:08:40 -07:00
Armon Dadgar 5f1ebb9274 nomad: adding special 'system' scheduler 2015-08-06 17:04:35 -07:00
Armon Dadgar 7cdcc45c64 nomad: cleanup stats goroutines 2015-08-05 16:45:50 -07:00
Armon Dadgar 3c2a16038b nomad: first pass adding scheduling workers 2015-07-28 15:12:08 -07:00
Armon Dadgar 4be2c1b9db nomad: adding plan endpoint 2015-07-27 15:31:49 -07:00
Armon Dadgar 743c7f32ef nomad: integrate plan queue 2015-07-27 15:11:42 -07:00
Armon Dadgar 487acdd3c7 nomad: threading eval broker through fsm and state store 2015-07-23 22:30:08 -07:00
Armon Dadgar 93b832cf59 nomad: emit metrics from the eval broker 2015-07-23 22:17:37 -07:00
Armon Dadgar fc11808fe9 nomad: add eval broker, configurable nack timeout 2015-07-23 21:44:17 -07:00
Armon Dadgar eff7170c77 nomad: adding RPC endpoints for fetching Eval 2015-07-23 16:00:19 -07:00
Armon Dadgar 501a57f9d8 nomad: Adding the CRUD endpoints for jobs 2015-07-23 14:41:18 -07:00
Armon Dadgar 5fdf3c5d33 nomad: adding Client endpoint 2015-06-07 12:14:41 -07:00
Armon Dadgar 4f78d6a7e9 nomad: adding connection pool 2015-06-07 11:50:29 -07:00
Armon Dadgar ac6ed01a1c nomad: track local peers 2015-06-07 11:32:01 -07:00
Armon Dadgar 25aad83ea4 nomad: testing leader bridging of serf 2015-06-05 23:54:45 +02:00
Armon Dadgar c18c068bf2 nomad: testing remove peer 2015-06-04 13:02:39 +02:00
Armon Dadgar b456588d1c nomad: adding reconcile channel 2015-06-04 12:42:56 +02:00