open-nomad

Author	SHA1	Message	Date
Michael Schurter	43fb0e82dc	client: prevent watching stale alloc state (#18612 ) When waiting on a previous alloc we must query against the leader before switching to a stale query with index set. Also check that the response is fresh before using it, as in #18269.	2023-09-29 14:37:10 -07:00
Michael Schurter	547a95795a	client: prevent using stale allocs (#18601 ) Similar to #18269, it is possible that, even if Node.GetClientAllocs retrieves fresh allocs, the subsequent Alloc.GetAllocs call retrieves stale allocs. While `diffAlloc(existing, updated)` properly ignores stale alloc updates, alloc deletions have no such check. So if a client retrieves an alloc created at index 123, and then a subsequent Alloc.GetAllocs call hits a new server which returns results at index 100, the client will stop the alloc created at 123 because it will be missing from the stale response. This change applies the same logic as #18269 and ensures only fresh responses are used. Glossary: * fresh - modified at an index > the query index * stale - modified at an index <= the query index	2023-09-29 14:34:04 -07:00
hc-github-team-nomad-core	a6ecf954b0	backport of commit 7bd5c6e84eef890cebdb404d9cb2e281919d4529 (#18555 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-09-21 17:16:14 -05:00
hc-github-team-nomad-core	a2f56797a0	backport of commit 4895d708b438b42e52fd54a128f9ec4cb6d72277 (#18531 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-09-18 14:29:29 -05:00
hc-github-team-nomad-core	46b4847885	backport of commit c6dbba7cde911bb08f1f8da445a44a0125cd2047 (#18505 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-09-14 14:38:05 -05:00
hc-github-team-nomad-core	6ae643a3bf	backport of commit 12580c345a89312542c18878680dd581da3d44eb (#18479 ) Co-authored-by: Shantanu Gadgil <shantanugadgil@users.noreply.github.com>	2023-09-13 10:16:07 -04:00
hc-github-team-nomad-core	156db8d368	backport of commit 668dc5f7a767e85d62379e3e02405d2afa93f1db (#18448 ) Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-09-11 13:22:30 +01:00
hc-github-team-nomad-core	a7f85c804f	backport of commit 22cbb913db0fa1cbb4e24d197b067d64ea02739a (#18437 ) Co-authored-by: Daniel Bennett <dbennett@hashicorp.com>	2023-09-08 17:10:25 -05:00
hc-github-team-nomad-core	ef780825d4	backport of commit 05c332221471d39053eaecafe4832ddd6e1b3b89 (#18365 ) Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-08-30 09:05:57 -05:00
hc-github-team-nomad-core	4b59840bb1	backport of commit d0a93f12d1ec1e2b276f9958898c9a6fe4f6b077 (#18351 ) Co-authored-by: Matthew Salsamendi <matthewsalsamendi@gmail.com>	2023-08-28 19:44:39 -04:00
hc-github-team-nomad-core	d8ff618c40	backport of commit f25480c9e929c27476c8930f05832e8b96167660 (#18341 ) Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>	2023-08-25 16:36:35 -07:00
James Rasell	3730b66d8c	test: use correct parallel test setup func (#18326 ) (#18330 )	2023-08-25 14:48:06 +01:00
hc-github-team-nomad-core	621bce1da2	backport of commit 14a38bee7bc4386e74157f6a99f3db7382d7e6a5 (#18275 ) Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2023-08-21 16:34:32 -04:00
Tim Gross	0a19fe3b60	fix multiple overflow errors in exponential backoff (#18200 ) We use capped exponential backoff in several places in the code when handling failures. The code we've copy-and-pasted all over has a check to see if the backoff is greater than the limit, but this check happens after the bitshift and we always increment the number of attempts. This causes an overflow with a fairly small number of failures (ex. at one place I tested it occurs after only 24 iterations), resulting in a negative backoff which then never recovers. The backoff becomes a tight loop consuming resources and/or DoS'ing a Nomad RPC handler or an external API such as Vault. Note this doesn't occur in places where we cap the number of iterations so the loop breaks (usually to return an error), so long as the number of iterations is reasonable. Introduce a helper with a check on the cap before the bitshift to avoid overflow in all places this can occur. Fixes: #18199 Co-authored-by: stswidwinski <stan.swidwinski@gmail.com>	2023-08-15 14:39:09 -04:00
Seth Hoenig	a45b689d8e	update go1.21 (#18184 ) * build: update to go1.21 * go: eliminate helpers in favor of min/max * build: run go mod tidy * build: swap depguard for semgrep * command: fixup broken tls error check on go1.21	2023-08-15 14:40:33 +02:00
Charlie Voiselle	bac4d112d1	[dep] bump golang.org/x/exp (#18102 ) There are some refactorings that have to be made in the getter and state where the api changed in `slices` * Bump golang.org/x/exp * Bump golang.org/x/exp in api * Update job_endpoint_test * [feedback] unexport sort function	2023-08-03 15:14:39 -04:00
hc-github-team-nomad-core	9301daa8e8	backport of commit a3a637ee8efe5e1251f60f781369bd9052c4d4a2 (#18132 ) This pull request was automerged via backport-assistant	2023-08-02 08:47:19 -05:00
hc-github-team-nomad-core	b75f552246	fingerprint: fix 'default' alias not added to interface specified by `network_interface` (#18096 ) (#18116 ) Co-authored-by: Kevin Schoonover <github@kschoon.me>	2023-08-01 08:38:03 -04:00
hc-github-team-nomad-core	2ed92e0c6c	Backport of feature: Add new field render_templates on restart block into release/1.6.x (#18094 ) This pull request was automerged via backport-assistant	2023-07-28 13:54:00 -05:00
James Rasell	b8cb1e79a3	chore(lint): use Go stdlib variables for HTTP methods and status codes (#17968 ) (#18074 ) Co-authored-by: Ville Vesilehto <ville@vesilehto.fi>	2023-07-26 16:38:39 +01:00
hc-github-team-nomad-core	02c2f1a50f	Backport of Retain task states for post stop tasks at the time of node GC into release/1.6.x (#18033 ) This pull request was automerged via backport-assistant	2023-07-21 12:55:29 -05:00
hc-github-team-nomad-core	b1bfb59394	Backport of metrics: report task memory_max value into release/1.6.x (#18004 ) This pull request was automerged via backport-assistant	2023-07-19 15:50:34 -05:00
hc-github-team-nomad-core	b7689e87ec	Backport of nsd: retain query params in HTTP health checks into release/1.6.x (#18003 ) This pull request was automerged via backport-assistant	2023-07-19 15:47:02 -05:00
hc-github-team-nomad-core	e5fb6fe687	backport of commit 615e76ef3c23497f768ebd175f0c624d32aeece8 (#17993 ) This pull request was automerged via backport-assistant	2023-07-19 13:31:14 -05:00
Michael Schurter	c82f439a6d	remove empty file (#17853 )	2023-07-10 16:34:10 -07:00
Tim Gross	ad7355e58b	CSI: persist previous mounts on client to restore during restart (#17840 ) When claiming a CSI volume, we need to ensure the CSI node plugin is running before we send any CSI RPCs. This extends even to the controller publish RPC because it requires the storage provider's "external node ID" for the client. This primarily impacts client restarts but also is a problem if the node plugin exits (and fingerprints) while the allocation that needs a CSI volume claim is being placed. Unfortunately there's no mapping of volume to plugin ID available in the jobspec, so we don't have enough information to wait on plugins until we either get the volume from the server or retrieve the plugin ID from data we've persisted on the client. If we always require getting the volume from the server before making the claim, a client restart for disconnected clients will cause all the allocations that need CSI volumes to fail. Even while connected, checking in with the server to verify the volume's plugin before trying to make a claim RPC is inherently racy, so we'll leave that case as-is and it will fail the claim if the node plugin needed to support a newly-placed allocation is flapping such that the node fingerprint is changing. This changeset persists a minimum subset of data about the volume and its plugin in the client state DB, and retrieves that data during the CSI hook's prerun to avoid re-claiming and remounting the volume unnecessarily. This changeset also updates the RPC handler to use the external node ID from the claim whenever it is available. Fixes: #13028	2023-07-10 13:20:15 -04:00
Devashish Taneja	0d9dee3cbe	Include parent job ID as a Docker container label (#17843 ) Fixes: #17751	2023-07-10 11:27:45 -04:00
Seth Hoenig	4452f0623b	env/aws: updates from ec2info (#17835 )	2023-07-07 10:12:05 -05:00
Yorick Gersie	3e66291b0e	cni: ensure to setup CNI addresses in deterministic order (#17766 ) * cni: ensure to setup CNI addresses in deterministic order Currently as commented in the code the go-cni library returns an unordered map of interfaces. In cases where there are multiple CNI interfaces being created this creates a problem with service registration and healthchecking because the first address in the map is being used. The use case we have where this is an issue is that we run CNI with the macvlan plugin to isolate workloads, but they still need to be able to access the host on a static address to be able to perform local resolving and hit host services like the Consul agent API. To make this work there are 2 options, you either add a macvlan interface on the host with an assigned address for each VLAN you have or you create an additional veth bridged interface in the container namespace. We chose the latter option through a custom CNI plugin but the ordering issue leaves us with incorrect service registration. * Updates after feedback * First check for the CNIResult interfaces length, if it's zero we don't need to proceed at all. * Use sorted interfaces list for the address fallback scenario as well. * Remove "found" log message logic, when an address isn't found an error is returned stating the allocation could not be configured as an address was missing from the CNIResult. If we still need a Warn message then we can add it to the condition that returns the error if no address could be found instead of using the "found" bool logic.	2023-07-06 13:25:29 -07:00
Patric Stout	ebb363d43e	metrics: add "total_ticks_count" for CPU metrics (#17579 ) This counter tells you the total amount of ticks for that CPU entry since the start of Nomad.	2023-07-05 10:28:55 -04:00
Tim Gross	f65a925096	adjust prioritized client updates (#17541 ) In #17354 we made client updates prioritized to reduce client-to-server traffic. When the client has no previously-acknowledged update we assume that the update is of typical priority; although we don't know that for sure in practice an allocation will never become healthy quickly enough that the first update we send is the update saying the alloc is healthy. But that doesn't account for allocations that quickly fail in an unrecoverable way because of allocrunner hook failures, and it'd be nice to be able to send those failure states to the server more quickly. This changeset does so and adds some extra comments on reasoning behind priority.	2023-06-26 09:14:24 -04:00
grembo	7936c1e33f	Add `disable_file` parameter to job's `vault` stanza (#13343 ) This complements the `env` parameter, so that the operator can author tasks that don't share their Vault token with the workload when using `image` filesystem isolation. As a result, more powerful tokens can be used in a job definition, allowing it to use template stanzas to issue all kinds of secrets (database secrets, Vault tokens with very specific policies, etc.), without sharing that issuing power with the task itself. This is accomplished by creating a directory called `private` within the task's working directory, which shares many properties of the `secrets` directory (tmpfs where possible, not accessible by `nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the container. If the `disable_file` parameter is set to `false` (its default), the Vault token is also written to the NOMAD_SECRETS_DIR, so the default behavior is backwards compatible. Even if the operator never changes the default, they will still benefit from the improved behavior of Nomad never reading the token back in from that - potentially altered - location.	2023-06-23 15:15:04 -04:00
James Rasell	b9440965db	client: remove unused nsd check allocation result diff func (#17695 )	2023-06-23 15:26:06 +01:00
Tim Gross	11216d09af	client: send node secret with every client-to-server RPC (#16799 ) In Nomad 1.5.3 we fixed a security bug that allowed bypass of ACL checks if the request came thru a client node first. But this fix broke (knowingly) the identification of many client-to-server RPCs. These will be now measured as if they were anonymous. The reason for this is that many client-to-server RPCs do not send the node secret and instead rely on the protection of mTLS. This changeset ensures that the node secret is being sent with every client-to-server RPC request. In a future version of Nomad we can add enforcement on the server side, but this was left out of this changeset to reduce risks to the safe upgrade path. Sending the node secret as an auth token introduces a new problem during initial introduction of a client. Clients send many RPCs concurrently with `Node.Register`, but until the node is registered the node secret is unknown to the server and will be rejected as invalid. This causes permission denied errors. To fix that, this changeset introduces a gate on having successfully made a `Node.Register` RPC before any other RPCs can be sent (except for `Status.Ping`, which we need earlier but which also ignores the error because that handler doesn't do an authorization check). This ensures that we only send requests with a node secret already known to the server. This also makes client startup a little easier to reason about because we know `Node.Register` must succeed first, and it should make for a good place to hook in future plans for secure introduction of nodes. The tradeoff is that an existing client that has running allocs will take slightly longer (a second or two) to transition to ready after a restart, because the transition in `Node.UpdateStatus` is gated at the server by first submitting `Node.UpdateAlloc` with client alloc updates.	2023-06-22 11:06:49 -04:00
Seth Hoenig	5138c5b99e	client: do not disable memory swappiness if kernel does not support it (#17625 ) * client: do not disable memory swappiness if kernel does not support it This PR adds a workaround for very old Linux kernels which do not support the memory swappiness interface file. Normally we write a "0" to the file to explicitly disable swap. In the case the kernel does not support it, give libcontainer a nil value so it does not write anything. Fixes #17448 * client: detect swappiness by writing to the file * fixup changelog Co-authored-by: James Rasell <jrasell@users.noreply.github.com> --------- Co-authored-by: James Rasell <jrasell@users.noreply.github.com>	2023-06-22 09:36:31 -05:00
VishnuJin	67efb19e94	fingerprint: added windows os.build attribute to host fingerprint (#17576 )	2023-06-21 10:53:50 -04:00
Patric Stout	4767d44b94	Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 (#17535 ) * Fix DevicesSets being removed when cpusets are reloaded with cgroup v2 This meant that if any allocation was created or removed, all active DevicesSets were removed from all cgroups of all tasks. This was most noticeable with "exec" and "raw_exec", as it meant they no longer had access to /dev files. * e2e: add test for verifying cgroups do not interfere with access to devices --------- Co-authored-by: Seth Hoenig <shoenig@duck.com>	2023-06-15 09:39:36 -05:00
Tim Gross	dc9fae34ca	node pools: add pool as label on client metrics (#17528 ) This changeset adds the node pool as a label anywhere we're already emitting labels with additional information such as node class or ID about the client.	2023-06-14 15:58:38 -04:00
Luiz Aoqui	ec80d051d8	client: fix panic on alloc stop in non-Linux environments (#17515 ) Provide a no-op implementation of the drivers.DriverNetworkManager interface to be used by systems that don't support network isolation and prevent panics where a network manager is expected.	2023-06-14 10:22:38 -04:00
Seth Hoenig	557a6b4a5e	docker: stop network pause container of lost alloc after node restart (#17455 ) This PR fixes a bug where the docker network pause container would not be stopped and removed in the case where a node is restarted, the alloc is moved to another node, the node comes back up. See the issue below for full repro conditions. Basically in the DestroyNetwork PostRun hook we would depend on the NetworkIsolationSpec field not being nil - which is only the case if the Client stays alive all the way from network creation to network teardown. If the node is rebooted we lose that state and previously would not be able to find the pause container to remove. Now, we manually find the pause container by scanning them and looking for the associated allocID. Fixes #17299	2023-06-09 08:46:29 -05:00
Seth Hoenig	134e70cbab	client: fix client panic during drain caused by shutdown (#17450 ) During shutdown of a client with drain_on_shutdown there is a race between the Client ending the cgroup and the task's cpuset manager cleaning up the cgroup. During the path traversal, skip anything we cannot read, which avoids the nil DirEntry we try to dereference now.	2023-06-07 15:12:44 -05:00
Jerome Eteve	c26f01eefd	client checks kernel module in /sys/module for WSL2 bridge networking (#17306 )	2023-06-06 10:26:50 -04:00
Seth Hoenig	d1d4d22f8e	test: ensure cpuset cgroup is set up before fingerprinting (#17428 ) This PR fixes a racy test where we need to ensure the cpuset cgroup is set up before trying to fingerprint it.	2023-06-05 14:15:00 -05:00
hashicorp-copywrite[bot]	0f4532f138	[COMPLIANCE] Add Copyright and License Headers (#17429 ) Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>	2023-06-05 13:23:59 -04:00
Luiz Aoqui	6039c18ab6	node pools: register a node in a node pool (#17405 )	2023-06-02 17:50:50 -04:00
Tim Gross	06972fae0c	prioritized client updates (#17354 ) The allocrunner sends several updates to the server during the early lifecycle of an allocation and its tasks. Clients batch up allocation updates every 200ms, but experiments like the C2M challenge have shown that even with this batching, servers can be overwhelmed with client updates during high volume deployments. Benchmarking done in #9451 has shown that client updates can easily represent ~70% of all Nomad Raft traffic. Each allocation sends many updates during its lifetime, but only those that change the `ClientStatus` field are critical for progressing a deployment or kicking off a reschedule to recover from failures. Add a priority to the client allocation sync and update the `syncTicker` receiver so that we only send an update if there's a high priority update waiting, or on every 5th tick. This means when there are no high priority updates, the client will send updates at most every 1s instead of 200ms. Benchmarks have shown this can reduce overall Raft traffic by 10%, as well as reduce client-to-server RPC traffic. This changeset also switches from a channel-based collection of updates to a shared buffer, so as to split batching from sending and prevent backpressure onto the allocrunner when the RPC is slow. This doesn't have a major performance benefit in the benchmarks but makes the implementation of the prioritized update simpler. Fixes: #9451	2023-05-31 15:34:16 -04:00
Luiz Aoqui	bb2395031b	client: fix Consul version fingerprint (#17349 ) Consul v1.13.8 was released with a breaking change in the /v1/agent/self endpoint version where a line break was being returned. This caused the Nomad fingerprint to fail because `NewVersion` errors on parse. This commit removes any extra space from the Consul version returned by the API.	2023-05-30 11:07:57 -04:00
Seth Hoenig	acfdf0f479	compliance: add headers with fixed copywrite tool (#17353 ) Closes #17117	2023-05-30 09:20:32 -05:00
Charlie Voiselle	86e04a4c6c	[core] nil check and error handling for client status in heartbeat responses (#17316 ) Add a nil check to constructNodeServerInfoResponse to manage an apparent race between deregister and client heartbeats. Fixes #17310	2023-05-25 16:04:54 -04:00
Lance Haig	568da5918b	cli: tls certs not created with correct SANs (#16959 ) The `nomad tls cert` command did not create certificates with the correct SANs for them to work with non-default domain and region names. This changeset updates the code to support non-default domains and regions in the certificates.	2023-05-22 09:31:56 -04:00

4783 commits
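
Commit 547a95795a (#18601) in the log above explains that alloc deletions were inferred from responses that might be stale, and gives a glossary: a response is fresh if it was modified at an index greater than the query index, and stale otherwise. A rough Go sketch of that guard follows. The `allocsResponse` type, the `isFresh` and `allocsToStop` functions, and the query index of 122 are hypothetical illustrations and do not reflect Nomad's real types or RPC signatures.

```go
package main

import "fmt"

// allocsResponse is a hypothetical stand-in for a server response that
// carries the index it was served at and the alloc IDs it contains.
type allocsResponse struct {
	Index    uint64
	AllocIDs []string
}

// isFresh follows the commit's glossary: fresh means index > query index.
func isFresh(resp allocsResponse, queryIndex uint64) bool {
	return resp.Index > queryIndex
}

// allocsToStop returns the running allocs missing from the response,
// but only when the response is fresh. A stale response is ignored so
// that an alloc absent from old data is not stopped by mistake.
func allocsToStop(running []string, resp allocsResponse, queryIndex uint64) []string {
	if !isFresh(resp, queryIndex) {
		return nil
	}
	present := make(map[string]struct{}, len(resp.AllocIDs))
	for _, id := range resp.AllocIDs {
		present[id] = struct{}{}
	}
	var stop []string
	for _, id := range running {
		if _, ok := present[id]; !ok {
			stop = append(stop, id)
		}
	}
	return stop
}

func main() {
	running := []string{"alloc-a", "alloc-b"}
	stale := allocsResponse{Index: 100, AllocIDs: []string{"alloc-a"}}
	fresh := allocsResponse{Index: 130, AllocIDs: []string{"alloc-a"}}

	fmt.Println(allocsToStop(running, stale, 122)) // prints []
	fmt.Println(allocsToStop(running, fresh, 122)) // prints [alloc-b]
}
```

The point of the sketch is that the freshness check gates the deletion path as well as the update path: the stale response at index 100 is discarded outright, so "alloc-b" keeps running until a fresh response confirms it is gone.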