open-nomad

Commit Graph

Author	SHA1	Message	Date
hashicorp-copywrite[bot]	005636afa0	[COMPLIANCE] Add Copyright and License Headers	2023-04-10 15:36:59 +00:00
Tim Gross	3c78980b78	make version checks specific to region (1.4.x) (#14912) * One-time tokens are not replicated between regions, so we don't want to enforce the version check across all of serf, just members in the same region. * Scheduler: Disconnected clients handling is specific to a single region, so we don't want to enforce the version check across all of serf, just members in the same region. * Variables: enforce version check in Apply RPC * Cleans up a bunch of legacy checks. This changeset is specific to 1.4.x and the changes for previous versions of Nomad will be manually backported in a separate PR.	2022-10-17 16:23:51 -04:00
Michael Schurter	654d458960	core: add deprecated mvn tag to serf (#12327) Revert a small part of #11600 after @lgfa29 discovered it would break compatibility with Nomad <= v1.2! Nomad <= v1.2 expects the `vsn` tag to exist in Serf. It has always been `1`. It has no functional purpose. However it causes a parsing error if it is not set: https://github.com/hashicorp/nomad/blob/v1.2.6/nomad/util.go#L103-L108 This means Nomad servers at version v1.2 or older will not allow servers without this tag to join. The `mvn` minor version tag is also checked, but soft fails. I'm not setting that because I want as much of this cruft gone as possible.	2022-03-24 14:44:21 -04:00
Michael Schurter	7494a0c4fd	core: remove all traces of unused protocol version Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risks to leave, I felt like removing it was best because: 1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number. 2. Nomad's HTTP API has evolved extensively without incrementing `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose. 3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` is an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the only protocol version relevant to Nomad developers and operators. The other protocol versions are either dead code or have never changed (Serf). 4. If we were to need to version the RPC, HTTP API, or Serf protocols, I don't think these configuration parameters and variables are the best choice. If we come to that point we should choose a versioning scheme based on the use case and modern best practices, not this 6+ year old dead code.	2022-02-18 16:12:36 -08:00
Luiz Aoqui	0e09b120e4	fix mTLS certificate check on agent to agent RPCs (#11998) PR #11956 implemented a new mTLS RPC check to validate the role of the certificate used in the request, but further testing revealed two flaws: 1. client-only endpoints did not accept server certificates, so the request would fail when forwarded from one server to another. 2. the certificate was being checked after the request was forwarded, so the check would happen over the server certificate, not the actual source. This commit checks for the desired mTLS level, where the client level accepts either a server or a client certificate. It also validates the certificate before the request is forwarded.	2022-02-04 20:35:20 -05:00
Luiz Aoqui	c4cff5359f	Verify TLS certificate on endpoints that are used between agents only (#11956)	2022-02-02 15:03:18 -05:00
Tim Gross	7d53ed88d6	csi: client RPCs should return wrapped errors for checking (#8605) When the client-side actions of a CSI client RPC succeed but we get disconnected during the RPC or we fail to checkpoint the claim state, we want to be able to retry the client RPC without getting blocked by the client-side state (ex. mount points) already having been cleaned up in previous calls.	2020-08-07 11:01:36 -04:00
Mahmood Ali	cf53ee57cd	remove unused dropButLastChannel	2020-02-13 18:56:53 -05:00
Mahmood Ali	79823ae07d	handle channel close signal Always deliver last value then send close signal.	2020-01-28 09:44:34 -05:00
Mahmood Ali	e436d2701a	Handle Nomad leadership flapping Fixes a deadlock in leadership handling if leadership flapped. Raft propagates leadership transition to Nomad through a NotifyCh channel. Raft blocks when writing to this channel, so the channel must be buffered or aggressively consumed[1]. Otherwise, Raft blocks indefinitely in `raft.runLeader` until the channel is consumed[1] and does not move on to executing follower-related logic (in `raft.runFollower`). While the Raft `runLeader` defer function blocks, Raft cannot process any other Raft operations. For example, the `run{Leader|Follower}` methods consume `raft.applyCh`, and while the runLeader defer is blocked, all Raft log applications or config lookups will block indefinitely. Sadly, `leaderLoop` and `establishLeader` make a few Raft calls! `establishLeader` attempts to auto-create autopilot/scheduler config [3]; and `leaderLoop` attempts to check raft configuration [4]. All of these calls occur without a timeout. Thus, if leadership flapped quickly while `leaderLoop/establishLeadership` is invoked and hit any of these Raft calls, the Raft handler would deadlock forever. Depending on how many times it flapped and where exactly we get stuck, I suspect it's possible to get into the following case: * Agent metrics/stats HTTP and RPC calls hang as they check raft.Configurations * raft.State remains in Leader state, and the server attempts to handle RPC calls (e.g. node/alloc updates) and these hang as well As we create goroutines per RPC call, the number of goroutines grows over time and may trigger out-of-memory errors in addition to missed updates. [1] `d90d6d6bda/config.go (L190-L193)` [2] `d90d6d6bda/raft.go (L425-L436)` [3] `2a89e47746/nomad/leader.go (L198-L202)` [4] `2a89e47746/nomad/leader.go (L877)`	2020-01-22 13:08:34 -05:00
Mahmood Ali	4b2ba62e35	acl: check ACL against object namespace Fix a bug where a malicious user can access or manipulate an alloc in a namespace they don't have access to. The allocation endpoints perform ACL checks against the request namespace, not the allocation namespace, and perform the allocation lookup independently from namespaces. Here, we check that the requester can access the alloc namespace regardless of the declared request namespace. Ideally, we'd enforce that the declared request namespace matches the actual allocation namespace. Unfortunately, we haven't documented alloc endpoints as namespaced functions; we suspect starting to enforce this will be very disruptive and inappropriate for a Nomad point release. As such, we maintain the current behavior that doesn't require passing the proper namespace in the request. A future major release may start enforcing the declared namespace check.	2019-10-08 12:59:22 -04:00
Lang Martin	4610c70777	util simplify partitionAll	2019-07-10 13:56:19 -04:00
Lang Martin	10848841be	util partitionAll for paging	2019-07-10 13:56:19 -04:00
Arshneet Singh	b7b050cdd1	Change min version required for plan optimization	2019-04-24 12:36:07 -07:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	0dd4c109e8	Compat tags	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Alex Dadgar	5009566503	do not bootstrap with non voters	2018-09-19 17:17:39 -07:00
Michael Schurter	e1cbcf0b3c	rpc: give min rpc version variable a better name	2018-04-09 11:09:05 -07:00
Michael Schurter	88a9409f8e	rpc: only attempt NodeRpc for nodes>=0.8 Attempting NodeRpc (or streaming node rpc) for clients that do not support it causes it to hang indefinitely because while the TCP connection exists, the client will never respond.	2018-04-09 11:08:06 -07:00
Alex Dadgar	6c1fa878ea	Forwarding	2018-02-15 13:59:02 -08:00
Alex Dadgar	6dd1c9f49d	Refactor	2018-02-15 13:59:00 -08:00
Kyle Havlovitz	2ccf565bf6	Refactor redundancy_zone/upgrade_version out of client meta	2018-01-29 20:03:38 -08:00
Kyle Havlovitz	7b980c42d8	Add raft remove by id endpoint/command	2018-01-16 13:35:32 -08:00
Kyle Havlovitz	1c07066064	Add autopilot functionality based on Consul's autopilot	2017-12-18 14:29:41 -08:00
Kyle Havlovitz	045f346293	Use region instead of datacenter for version checking	2017-12-12 10:17:16 -06:00
Kyle Havlovitz	f088446d48	Add missing exist checks and doc line	2017-12-12 10:17:16 -06:00
Kyle Havlovitz	b775fc7b33	Added support for v2 raft APIs and -raft-protocol option	2017-12-12 10:17:16 -06:00
Alex Dadgar	c1cc51dbee	sync	2017-10-13 14:36:02 -07:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Sean Chittenden	bff57a0dce	Reconcile, clean up, and centralize API version numbers (major and minor). Reduce future confusion by introducing a minor version that is gossiped out via the `mvn` Serf tag (Minor Version Number; `vsn` is already being used to communicate the `Major Version Number`). Background: hashicorp/consul/issues/1346#issuecomment-151663152	2016-06-10 15:50:11 -04:00
Sean Chittenden	49deaae2ae	Seed random once in main	2016-06-10 15:48:36 -04:00
Sean Chittenden	e36686a17d	Use consul/lib's RandomStagger Removes four redundant copies of the method in the process.	2016-06-10 15:48:36 -04:00
Sean Chittenden	e0e7d94450	Use consul/lib's RateScaledInterval	2016-06-10 15:48:36 -04:00
Alex Dadgar	e1dc47de91	Remove blank line	2016-02-17 11:48:52 -08:00
Alex Dadgar	25c5e543f4	Use crypto random seed	2016-02-17 11:47:02 -08:00
Armon Dadgar	ea0795995d	Use a single implementation of GenerateUUID	2015-09-07 15:23:03 -07:00
Chris Bednarski	96cb220ff4	Update references to "os" to use "kernel.name" This brings test code and mocks up to date with the fingerprinter. This was a slightly larger change than I anticipated, but I think it's good for two reasons: 1. More semantically correct. `os.name` is something like "Windows 10 Pro" or "Ubuntu", while `kernel.name` is "windows" or "linux". `os.version` and `kernel.version` match these semantics. 2. `kernel.name` is much easier to grep for than `os`, which is helpful because oracle can't help us with strings.	2015-08-28 01:30:47 -07:00
Armon Dadgar	e489ee8ebd	nomad: add rate based scaling util methods	2015-08-22 17:12:24 -07:00
Armon Dadgar	40def1a187	nomad: expose RuntimeStats	2015-08-20 15:29:30 -07:00
Armon Dadgar	8913a42674	nomad: move and test max function	2015-08-04 17:13:40 -07:00
Armon Dadgar	890db2d2b7	nomad: adding utility shuffle	2015-07-23 17:30:07 -07:00
Armon Dadgar	e8964a4975	nomad: adding utility methods	2015-06-06 00:14:08 +02:00
Armon Dadgar	490fa1b7db	nomad: testing serf join	2015-06-04 12:33:12 +02:00
Armon Dadgar	4f6ecce727	nomad: working on serf member parsing	2015-06-03 13:35:48 +02:00
Armon Dadgar	d52122f041	nomad: more skeleton	2015-06-03 12:26:50 +02:00
Armon Dadgar	1e7f84f3e6	nomad: adding basic structure for raft	2015-06-01 17:49:10 +02:00

47 Commits