open-nomad

Author	SHA1	Message	Date
Luiz Aoqui	64b558c14c	core: store and check for Raft version changes (#12362) Downgrading the Raft version protocol is not a supported operation. Checking for a downgrade is hard since this information is not stored in any persistent place. When a server re-joins a cluster with a prior Raft version, the Serf tag is updated so Nomad can't tell that the version changed. Mixed version clusters must be supported to allow for zero-downtime rolling upgrades. During this it's expected that the cluster will have mixed Raft versions. Enforcing strong version consistency would disrupt this flow. The approach taken here is to store the Raft version on disk. When the server starts the `raft_protocol` value is written to the file `data_dir/raft/version`. If that file already exists, its content is checked against the current `raft_protocol` value to detect downgrades and prevent the server from starting. Any other types of errors are ignored to prevent disruptions that are outside the control of operators. The only option in cases of an invalid or corrupt file would be to delete it, making this check useless. So just overwrite its content with the new version and provide guidance on how to check that their cluster is in an expected state.	2022-03-24 14:42:00 -04:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Charlie Voiselle	98a240cd99	Make number of scheduler workers reloadable (#11593) ## Development Environment Changes * Added stringer to build deps ## New HTTP APIs * Added scheduler worker config API * Added scheduler worker info API ## New Internals * (Scheduler)Worker API refactor: Start(), Stop(), Pause(), Resume() * Update shutdown to use context * Add mutex for contended server data - `workerLock` for the `workers` slice - `workerConfigLock` for the `Server.Config.NumSchedulers` and `Server.Config.EnabledSchedulers` values ## Other * Adding docs for scheduler worker api * Add changelog message Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-01-06 11:56:13 -05:00
Mahmood Ali	acbfeb5815	Simplify Bootstrap logic in tests This change updates tests to honor `BootstrapExpect` exclusively when forming test clusters and removes test-only knobs, e.g. `config.DevDisableBootstrap`. Background: Test cluster creation is fragile. Test servers don't follow the BootstrapExpected route like production clusters. Instead they start as single-node clusters and are then rejoined, which risks causing split brain or other test flakiness. The test framework exposes a few knobs (e.g. `config.DevDisableBootstrap` and `config.Bootstrap`) that control whether a server should bootstrap the cluster. These flags are confusing and it's unclear when to use them: their usage in multi-node clusters isn't properly documented. Furthermore, they have some bad side effects as they don't control the Raft library: If `config.DevDisableBootstrap` is true, the test server may not immediately attempt to bootstrap a cluster, but after an election timeout (~50ms), Raft may force a leadership election and win it (with only one vote) and cause a split brain. The knobs are also confusing as Bootstrap is an overloaded term. In BootstrapExpect, we refer to bootstrapping the cluster only after N servers are connected. But in tests and the knobs above, it refers to whether the server is a single-node cluster and shouldn't wait for any other server. Changes: This commit makes two changes: First, it relies on `BootstrapExpected` instead of `Bootstrap` and/or `DevMode` flags. This change is relatively trivial. Second, it introduces a `Bootstrapped` flag to track if the cluster is bootstrapped. This allows us to keep `BootstrapExpected` immutable. Previously, the flag was a config value but it was set to 0 after cluster bootstrap completed.	2020-03-02 13:47:43 -05:00
Mahmood Ali	a9f551542d	Merge pull request #160 from hashicorp/b-mtls-hostname server: validate role and region for RPC w/ mTLS	2020-01-30 12:59:17 -06:00
Seth Hoenig	f0c3dca49c	tests: swap lib/freeport for tweaked helper/freeport Copy the updated version of freeport (sdk/freeport), and tweak it for use in Nomad tests. This means staying below port 10000 to avoid conflicts with the lib/freeport that is still transitively used by the old version of consul that we vendor. Also provide implementations to find ephemeral ports on macOS and Windows environments. Ports acquired through freeport are supposed to be returned to freeport, which this change now also introduces. Many tests are modified to include calls to a cleanup function for Server objects. This should help quite a bit with some flaky tests, but not all of them. Our port problems will not go away completely until we upgrade our vendored version of consul. With Go modules, we'll probably do a 'replace' to swap out other copies of freeport with the one now in 'nomad/helper/freeport'.	2019-12-09 08:37:32 -06:00
Jasmine Dahilig	51e141be7a	backfill region from job hcl in jobUpdate and jobPlan endpoints - updated region in job metadata that gets persisted to nomad datastore - fixed many unrelated unit tests that used an invalid region value (they previously passed because hcl wasn't getting picked up and the job would default to global region)	2019-06-13 08:03:16 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Chelsea Holland Komlo	41e35edf0c	fix test that now requires different config for test assertions	2018-06-07 17:07:06 -04:00
Alex Dadgar	21c5ed850d	Register events	2018-05-22 14:06:33 -07:00
Chelsea Holland Komlo	dd5f627feb	set server configuration checksum on reload	2018-03-27 18:03:52 -04:00
Chelsea Holland Komlo	c2a95f9d7d	add test for upgrading only RPC connections	2018-03-26 10:55:27 -04:00
Chelsea Holland Komlo	66e44cdb73	Allow TLS configurations for HTTP and RPC connections to be reloaded separately	2018-03-21 17:51:08 -04:00
Alex Dadgar	b8607ad6d6	Heartbeat uses client rpc advertise and server defaults server rpc advertise addr	2018-03-16 16:47:08 -07:00
Alex Dadgar	92cb552ff6	Always add core scheduler and detect invalid schedulers	2018-03-14 10:53:27 -07:00
Alex Dadgar	a6dfffa4fa	Add testing interfaces	2018-02-15 13:59:00 -08:00
Alex Dadgar	4243438661	Improve TLS cluster testing	2018-02-15 13:59:00 -08:00
Chelsea Komlo	d09cc2a69f	Merge pull request #3492 from hashicorp/f-client-tls-reload Client/Server TLS dynamic reload	2018-01-23 05:51:32 -05:00
Chelsea Holland Komlo	7d3c240871	swap raft layer tls wrapper	2018-01-19 17:00:15 -05:00
Chelsea Holland Komlo	a8f655fbb3	allow for similar error messages for closed connections	2018-01-17 12:02:40 -05:00
Chelsea Holland Komlo	35466a331a	fixing up raft reload tests close second goroutine in raft-net	2018-01-17 10:29:15 -05:00
Chelsea Holland Komlo	214d128eb9	reload raft transport layer fix up linting	2018-01-08 14:52:28 -05:00
Chelsea Holland Komlo	0708d34135	call reload on agent, client, and server separately	2018-01-08 09:56:31 -05:00
Chelsea Holland Komlo	c0ad9a4627	add ability to upgrade/downgrade nomad agents tls configurations via sighup	2018-01-08 09:21:06 -05:00
Kyle Havlovitz	1c07066064	Add autopilot functionality based on Consul's autopilot	2017-12-18 14:29:41 -08:00
Alex Dadgar	cb0d0ef009	move to consul freeport implementation	2017-10-23 16:51:40 -07:00
Alex Dadgar	dbc014b360	Standardize retrieving a free port into a helper package	2017-10-23 16:48:20 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Armon Dadgar	20a8e590a0	nomad: support ACL bootstrap reset	2017-09-10 16:03:30 -07:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Armon Dadgar	4bda2fa9e9	nomad: ACL endpoints check support enabled and redirect to authority	2017-09-04 13:05:53 -07:00
Alex Dadgar	06eddf243c	parallel nomad tests	2017-07-25 17:39:36 -07:00
Alex Dadgar	7a74080079	Log reason a plan gets rejected per node. This PR adds a log explaining why a plan gets rejected. Should help debugging.	2017-07-13 17:14:02 -07:00
Alex Dadgar	079a4da7d2	Fix flaky test: TestServer_RPC_MixedTLS	2017-05-11 14:55:12 -07:00
Michael Schurter	e204a287ed	Refactor Consul Syncer into new ServiceClient Fixes #2478 #2474 #1995 #2294 The new client only handles agent and task service advertisement. Server discovery is mostly unchanged. The Nomad client agent now handles all Consul operations instead of the executor handling task-related operations. When upgrading from an earlier version of Nomad, existing executors will be told to deregister from Consul so that the Nomad agent can re-register the task's services and checks. Drivers - other than qemu - now support an Exec method for executing arbitrary commands in a task's environment. This is used to implement script checks. Interfaces are used extensively to avoid interacting with Consul in tests that don't assert any Consul-related behavior.	2017-04-19 12:42:47 -07:00
Michael Schurter	a81c387adf	Require TLS for server RPC when enabled Fixes #2525 We used to be checking a RequireTLS field that was never set. Instead we can just check the TLSConfig.EnableRPC field and require TLS if it's enabled. Added a few unfortunately slow integration tests to assert the intended behavior of misconfigured RPC TLS. Also disable a lot of noisy test logging when -v isn't specified.	2017-04-06 09:34:36 -07:00
Alex Dadgar	15ffdff497	Vault Client on Server handles SIGHUP This PR allows the Vault client on the server to handle a SIGHUP. This allows updating the Vault token and any other configuration without downtime.	2017-02-01 14:24:10 -08:00
Alex Dadgar	82960c46d8	Tests	2016-10-11 13:28:18 -07:00
Alex Dadgar	a8efce874f	Token renewal and beginning of tests	2016-08-17 16:25:38 -07:00
Alex Dadgar	713e310670	Renew loop	2016-08-17 16:25:38 -07:00
Alex Dadgar	6e2f0a2776	Server has Vault API client	2016-08-17 16:25:38 -07:00
Sean Chittenden	9a60999100	Pass a logger arg to `NewClient` and `NewServer`	2016-06-16 23:29:23 -07:00
Sean Chittenden	31313b68cf	Don't assign to an atomic w/o using atomic setter func	2016-06-16 14:43:46 -07:00
Sean Chittenden	14f9d2a947	Use the config's log output	2016-06-15 12:40:51 -07:00
Sean Chittenden	f05514335b	Teach Nomad servers how to fall back to Consul.	2016-06-15 12:40:51 -07:00
Sean Chittenden	060300007e	Use a monotonically incrementing number to create unique node names. Also remove the space from the "name" of the node	2016-06-10 15:50:11 -04:00
Ryan Uber	a230a70cc7	nomad: testing region list	2015-11-23 22:27:07 -08:00
Alex Dadgar	2f5a2b795b	Fix a racy test and increase the raft timeouts to mitigate other racy tests	2015-10-16 17:53:43 -07:00
Armon Dadgar	7d69aa78c1	nomad: using Raft StartAsLeader to make tests faster	2015-09-07 10:46:41 -07:00
Armon Dadgar	2ea99f211a	nomad: updating for new alloc representation	2015-08-25 17:36:52 -07:00

56 commits