open-nomad

Author	SHA1	Message	Date
Seth Hoenig	fd9744b9eb	Merge pull request #14301 from hashicorp/b-fix-check-status-test-racey testing: fix flakey check status test	2022-08-25 08:30:46 -05:00
James Rasell	601588df6b	Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main	2022-08-25 12:14:29 +01:00
Luiz Aoqui	e012d9411e	Task lifecycle restart (#14127 ) * allocrunner: handle lifecycle when all tasks die When all tasks die the Coordinator must transition to its terminal state, coordinatorStatePoststop, to unblock poststop tasks. Since this could happen at any time (for example, a prestart task dies), all states must be able to transition to this terminal state. * allocrunner: implement different alloc restarts Add a new alloc restart mode where all tasks are restarted, even if they have already exited. Also unifies the alloc restart logic to use the implementation that restarts tasks concurrently and ignores ErrTaskNotRunning errors since those are expected when restarting the allocation. * allocrunner: allow tasks to run again Prevent the task runner Run() method from exiting to allow a dead task to run again. When the task runner is signaled to restart, the function will jump back to the MAIN loop and run it again. The task runner determines if a task needs to run again based on two new task events that were added to differentiate between a request to restart a specific task, the tasks that are currently running, or all tasks that have already run. * api/cli: add support for all tasks alloc restart Implement the new -all-tasks alloc restart CLI flag and its API counterpar, AllTasks. The client endpoint calls the appropriate restart method from the allocrunner depending on the restart parameters used. * test: fix tasklifecycle Coordinator test * allocrunner: kill taskrunners if all tasks are dead When all non-poststop tasks are dead we need to kill the taskrunners so we don't leak their goroutines, which are blocked in the alloc restart loop. This also ensures the allocrunner exits on its own. * taskrunner: fix tests that waited on WaitCh Now that "dead" tasks may run again, the taskrunner Run() method will not return when the task finishes running, so tests must wait for the task state to be "dead" instead of using the WaitCh, since it won't be closed until the taskrunner is killed. * tests: add tests for all tasks alloc restart * changelog: add entry for #14127 * taskrunner: fix restore logic. The first implementation of the task runner restore process relied on server data (`tr.Alloc().TerminalStatus()`) which may not be available to the client at the time of restore. It also had the incorrect code path. When restoring a dead task the driver handle always needs to be clear cleanly using `clearDriverHandle` otherwise, after exiting the MAIN loop, the task may be killed by `tr.handleKill`. The fix is to store the state of the Run() loop in the task runner local client state: if the task runner ever exits this loop cleanly (not with a shutdown) it will never be able to run again. So if the Run() loops starts with this local state flag set, it must exit early. This local state flag is also being checked on task restart requests. If the task is "dead" and its Run() loop is not active it will never be able to run again. * address code review requests * apply more code review changes * taskrunner: add different Restart modes Using the task event to differentiate between the allocrunner restart methods proved to be confusing for developers to understand how it all worked. So instead of relying on the event type, this commit separated the logic of restarting an taskRunner into two methods: - `Restart` will retain the current behaviour and only will only restart the task if it's currently running. - `ForceRestart` is the new method where a `dead` task is allowed to restart if its `Run()` method is still active. Callers will need to restart the allocRunner taskCoordinator to make sure it will allow the task to run again. * minor fixes	2022-08-24 17:43:07 -04:00
Tim Gross	c732b215f0	vault: detect namespace change in config reload (#14298 ) The `namespace` field was not included in the equality check between old and new Vault configurations, which meant that a Vault config change that only changed the namespace would not be detected as a change and the clients would not be reloaded. Also, the comparison for boolean fields such as `enabled` and `allow_unauthenticated` was on the pointer and not the value of that pointer, which results in spurious reloads in real config reload that is easily missed in typical test scenarios. Includes a minor refactor of the order of fields for `Copy` and `Merge` to match the struct fields in hopes it makes it harder to make this mistake in the future, as well as additional test coverage.	2022-08-24 17:03:29 -04:00
Seth Hoenig	ff59b90d41	testing: fix flakey check status test This PR fixes a flakey test where we did not wait on the check status to actually become failing (go too fast and you just get a pending check). Instead add a helper for waiting on any check in the alloc to become the state we are looking for.	2022-08-24 15:11:41 -05:00
Seth Hoenig	062c817450	cleanup: move fs helpers into escapingfs	2022-08-24 14:45:34 -05:00
Piotr Kazmierczak	7077d1f9aa	template: custom change_mode scripts (#13972 ) This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707	2022-08-24 17:43:01 +02:00
James Rasell	2ccc48c167	cli: use policy flag for role creation and update.	2022-08-24 15:15:02 +01:00
James Rasell	7401677e4e	cli: output none when a token has no expiration.	2022-08-24 15:14:49 +01:00
James Rasell	9782d6d7ff	acl: allow tokens to lookup linked roles. (#14227 ) When listing or reading an ACL role, roles linked to the ACL token used for authentication can be returned to the caller.	2022-08-24 13:51:51 +02:00
Luiz Aoqui	7ee3de3ea5	fix minor issues found durint ENT merge (#14250 )	2022-08-23 17:22:18 -04:00
Luiz Aoqui	7a8cacc9ec	allocrunner: refactor task coordinator (#14009 ) The current implementation for the task coordinator unblocks tasks by performing destructive operations over its internal state (like closing channels and deleting maps from keys). This presents a problem in situations where we would like to revert the state of a task, such as when restarting an allocation with tasks that have already exited. With this new implementation the task coordinator behaves more like a finite state machine where task may be blocked/unblocked multiple times by performing a state transition. This initial part of the work only refactors the task coordinator and is functionally equivalent to the previous implementation. Future work will build upon this to provide bug fixes and enhancements.	2022-08-22 18:38:49 -04:00
Tim Gross	bf57d76ec7	allow ACL policies to be associated with workload identity (#14140 ) The original design for workload identities and ACLs allows for operators to extend the automatic capabilities of a workload by using a specially-named policy. This has shown to be potentially unsafe because of naming collisions, so instead we'll allow operators to explicitly attach a policy to a workload identity. This changeset adds workload identity fields to ACL policy objects and threads that all the way down to the command line. It also a new secondary index to the ACL policy table on namespace and job so that claim resolution can efficiently query for related policies.	2022-08-22 16:41:21 -04:00
Luiz Aoqui	dbffdca92e	template: use pointer values for gid and uid (#14203 ) When a Nomad agent starts and loads jobs that already existed in the cluster, the default template uid and gid was being set to 0, since this is the zero value for int. This caused these jobs to fail in environments where it was not possible to use 0, such as in Windows clients. In order to differentiate between an explicit 0 and a template where these properties were not set we need to use a pointer.	2022-08-22 16:25:49 -04:00
James Rasell	2736cf0dfa	acl: make listing RPC and HTTP API a stub return object. (#14211 ) Making the ACL Role listing return object a stub future-proofs the endpoint. In the event the role object grows, we are not bound by having to return all fields within the list endpoint or change the signature of the endpoint to reduce the list return size.	2022-08-22 17:20:23 +02:00
James Rasell	802d005ef5	acl: add replication to ACL Roles from authoritative region. (#14176 ) ACL Roles along with policies and global token will be replicated from the authoritative region to all federated regions. This involves a new replication loop running on the federated leader. Policies and roles may be replicated at different times, meaning the policies and role references may not be present within the local state upon replication upsert. In order to bypass the RPC and state check, a new RPC request parameter has been added. This is used by the replication process; all other callers will trigger the ACL role policy validation check. There is a new ACL RPC endpoint to allow the reading of a set of ACL Roles which is required by the replication process and matches ACL Policies and Tokens. A bug within the ACL Role listing RPC has also been fixed which returned incorrect data during blocking queries where a deletion had occurred.	2022-08-22 08:54:07 +02:00
Seth Hoenig	88a1353149	cli: display nomad service check status output in CLI commands This PR adds some NSD check status output to the CLI. 1. The 'nomad alloc status' command produces nsd check summary output (if present) 2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)	2022-08-19 09:18:29 -05:00
Michael Schurter	3b57df33e3	client: fix data races in config handling (#14139 ) Before this change, Client had 2 copies of the config object: config and configCopy. There was no guidance around which to use where (other than configCopy's comment to pass it to alloc runners), both are shared among goroutines and mutated in data racy ways. At least at one point I think the idea was to have `config` be mutable and then grab a lock to overwrite `configCopy`'s pointer atomically. This would have allowed alloc runners to read their config copies in data race safe ways, but this isn't how the current implementation worked. This change takes the following approach to safely handling configs in the client: 1. `Client.config` is the only copy of the config and all access must go through the `Client.configLock` mutex 2. Since the mutex only protects the config pointer itself and not fields inside the Config struct: all config mutation must be done on a copy of the config, and then Client's config pointer is overwritten while the mutex is acquired. Alloc runners and other goroutines with the old config pointer will not see config updates. 3. Deep copying is implemented on the Config struct to satisfy the previous approach. The TLS Keyloader is an exception because it has its own internal locking to support mutating in place. An unfortunate complication but one I couldn't find a way to untangle in a timely fashion. 4. To facilitate deep copying I made an internally backward incompatible API change: our `helper/funcs` used to turn containers (slices and maps) with 0 elements into nils. This probably saves a few memory allocations but makes it very easy to cause panics. Since my new config handling approach uses more copying, it became very difficult to ensure all code that used containers on configs could handle nils properly. Since this code has caused panics in the past, I fixed it: nil containers are copied as nil, but 0-element containers properly return a new 0-element container. No more "downgrading to nil!"	2022-08-18 16:32:04 -07:00
Seth Hoenig	c5d36eaa2f	cleanup: fixing warnings and refactoring of command package, part 2 This PR continues the cleanup of the command package, removing linter warnings, refactoring to use helpers, making tests easier to read, etc.	2022-08-18 09:43:39 -05:00
Seth Hoenig	4c1a0d4907	cleanup: first pass at fixing command package warnings This PR is the first of several for cleaning up warnings, and refactoring bits of code in the command package. First pass is over acl_ files and gets some helpers in place.	2022-08-17 15:33:37 -05:00
Piotr Kazmierczak	b63944b5c1	cleanup: replace TypeToPtr helper methods with pointer.Of (#14151 ) Bumping compile time requirement to go 1.18 allows us to simplify our pointer helper methods.	2022-08-17 18:26:34 +02:00
James Rasell	51a7df50bb	cli: add ability to create and view tokens with ACL role links.	2022-08-17 14:49:52 +01:00
Seth Hoenig	7728cf5a9a	Merge pull request #14132 from hashicorp/build-update-go1.19 build: update to go1.19	2022-08-16 11:20:27 -05:00
Seth Hoenig	b3ea68948b	build: run gofmt on all go source files Go 1.19 will forecefully format all your doc strings. To get this out of the way, here is one big commit with all the changes gofmt wants to make.	2022-08-16 11:14:11 -05:00
Seth Hoenig	56b0b456dc	Merge pull request #14102 from hashicorp/cleanup-mesh-gateway-value cleanup: consul mesh gateway type need not be pointer	2022-08-16 10:07:16 -05:00
Charlie Voiselle	dba6b39815	SV CLI: var init (#13820 ) * Nomad dep: add museli/reflow * SV CLI: var init	2022-08-15 13:43:29 -04:00
Tim Gross	4005759d28	move secure variable conflict resolution to state store (#13922 ) Move conflict resolution implementation into the state store with a new Apply RPC. This also makes the RPC for secure variables much more similar to Consul's KV, which will help us support soft deletes in a post-1.4.0 version of Nomad. Reimplement quotas in the state store functions. Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-08-15 11:19:53 -04:00
Seth Hoenig	f9355c29fb	cleanup: consul mesh gateway type need not be pointer This PR changes the use of structs.ConsulMeshGateway to value types instead of via pointers. This will help in a follow up PR where we cleanup a lot of custom comparison code with helper functions instead.	2022-08-13 11:26:58 -05:00
James Rasell	9c97560ded	cli: add new acl role subcommands for CRUD role actions. (#14087 )	2022-08-12 09:52:32 +02:00
Seth Hoenig	ba5c45ab93	cli: respect vault token in plan command This PR fixes a regression where the 'job plan' command would not respect a Vault token if set via --vault-token or $VAULT_TOKEN. Basically the same bug/fix as for the validate command in https://github.com/hashicorp/nomad/issues/13062 Fixes https://github.com/hashicorp/nomad/issues/13939	2022-08-11 08:54:08 -05:00
Seth Hoenig	1901cfaba8	Merge pull request #14069 from brian-athinkingape/cli-fix-memstats-cgroupsv2 cli: for systems with cgroups v2, fix alloation resource utilization showing 0 memory used	2022-08-11 07:27:48 -05:00
James Rasell	9cd0dd2ff7	http: add ACL Role HTTP endpoints for CRUD actions. These new endpoints are exposed under the /v1/acl/roles and /v1/acl/role endpoints.	2022-08-11 08:44:19 +01:00
Luiz Aoqui	815adbada5	Post 1.3.3 release (#14064 ) * Generate files for 1.3.3 release * Prepare for next release * Merge release 1.3.3 files Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>	2022-08-09 17:27:29 -04:00
Brian Chau	6621bb9db5	cli: for systems with cgroups v2, fix alloation resource utilization showing 0 memory used	2022-08-09 14:09:14 -07:00
Derek Strickland	77df9c133b	Add Nomad RetryConfig to agent template config (#13907 ) * add Nomad RetryConfig to agent template config	2022-08-03 16:56:30 -04:00
Piotr Kazmierczak	530280505f	client: enable specifying user/group permissions in the template stanza (#13755 ) * Adds Uid/Gid parameters to template. * Updated diff_test * fixed order * update jobspec and api * removed obsolete code * helper functions for jobspec parse test * updated documentation * adjusted API jobs test. * propagate uid/gid setting to job_endpoint * adjusted job_endpoint tests * making uid/gid into pointers * refactor * updated documentation * updated documentation * Update client/allocrunner/taskrunner/template/template_test.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * Update website/content/api-docs/json-jobs.mdx Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> * propagating documentation change from Luiz * formatting * changelog entry * changed changelog entry Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-08-02 22:15:38 +02:00
James Rasell	bb5b510c9d	cli: do not import structs, use API package only. (#13938 )	2022-08-02 16:33:08 +02:00
Eric Weber	cbce13c1ac	Add stage_publish_base_dir field to csi_plugin stanza of a job (#13919 ) * Allow specification of CSI staging and publishing directory path * Add website documentation for stage_publish_dir * Replace erroneous reference to csi_plugin.mount_config with csi_plugin.mount_dir * Avoid requiring CSI plugins to be redeployed after introducing StagePublishDir	2022-08-02 09:42:44 -04:00
Tim Gross	e5ac6464f6	secure vars: enforce ENT quotas (OSS work) (#13951 ) Move the secure variables quota enforcement calls into the state store to ensure quota checks are atomic with quota updates (in the same transaction). Switch to a machine-size int instead of a uint64 for quota tracking. The ENT-side quota spec is described as int, and negative values have a meaning as "not permitted at all". Using the same type for tracking will make it easier to the math around checks, and uint64 is infeasibly large anyways. Add secure vars to quota HTTP API and CLI outputs and API docs.	2022-08-02 09:32:09 -04:00
James Rasell	663aa92b7a	Merge branch 'main' into f-gh-13120-sso-umbrella	2022-08-02 08:30:03 +01:00
Tim Gross	8404f998f7	fix flaky `TestAgent_ProxyRPC_Dev` test (#13925 ) This test is a fairly trivial test of the agent RPC, but the test setup waits for a short fixed window after the node starts to send the RPC. After looking at detailed logs for recent test failures, it looks like the node registration for the first node doesn't get a chance to happen before we make the RPC call. Use `WaitForResultUntil` to give the test more time to run in slower test environments, while allowing it to finish quickly if possible.	2022-07-28 14:47:15 -04:00
Lars Lehtonen	a80df0480e	testing: fix dropped test errors in command/agent (#13926 )	2022-07-28 11:04:31 -04:00
Seth Hoenig	d8fe1d10ba	cleanup: use constants for on_update values	2022-07-21 13:09:47 -05:00
Seth Hoenig	c61e779b48	Merge pull request #13715 from hashicorp/dev-nsd-checks client: add support for checks in nomad services	2022-07-21 10:22:57 -05:00
Seth Hoenig	67c6336c67	Merge pull request #13870 from hashicorp/exp-fp-optimization client: use test timeouts for network fingerprinters in dev mode	2022-07-21 08:18:02 -05:00
Tim Gross	9c43c28575	search: use secure vars ACL policy for secure vars context (#13788 ) The search RPC used a placeholder policy for searching within the secure variables context. Now that we have ACL policies built for secure variables, we can use them for search. Requires a new loose policy for checking if a token has any secure variables access within a namespace, so that we can filter on specific paths in the iterator.	2022-07-21 08:39:36 -04:00
Seth Hoenig	6f93aca63e	devmode: use minimal network timeouts for network fingerprinters in dev mode	2022-07-20 15:13:14 -05:00
Tim Gross	97a6346da0	keyring: use nanos for `CreateTime` in key metadata (#13849 ) Most of our objects use int64 timestamps derived from `UnixNano()` instead of `time.Time` objects. Switch the keyring metadata to use `UnixNano()` for consistency across the API.	2022-07-20 14:46:57 -04:00
Tim Gross	96aea74b4b	docs: keyring commands (#13690 ) Document the secure variables keyring commands, document the aliased gossip keyring commands, and note that the old gossip keyring commands are deprecated.	2022-07-20 14:14:10 -04:00
Will Jordan	5354409b1a	Return 429 response on HTTP max connection limit (#13621 ) Return 429 response on HTTP max connection limit. Instead of silently closing the connection, return a `429 Too Many Requests` HTTP response with a helpful error message to aid debugging when the connection limit is unintentionally reached. Set a 10-millisecond write timeout and rate limiter for connection-limit 429 response to prevent writing the HTTP response from consuming too many server resources. Add `nomad.agent.http.exceeded metric` counting the number of HTTP connections exceeding concurrency limit.	2022-07-20 14:12:21 -04:00
James Rasell	f6d12a3c00	acl: enable configuration and visualisation of token expiration for users (#13846 ) * api: add ACL token expiry params to HTTP API * cli: allow setting and displaying ACL token expiry	2022-07-20 10:06:23 +02:00
hc-github-team-nomad-core	fa09c13016	Generate files for 1.3.2 release	2022-07-13 19:33:41 -04:00
Michael Schurter	d54d90edfa	http: only log alloc/exec errors when non-nil (#13730 )	2022-07-13 09:44:51 -07:00
Luiz Aoqui	b656981cf0	Track plan rejection history and automatically mark clients as ineligible (#13421 ) Plan rejections occur when the scheduler work and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad does parallel scheduling, it is expected that different workers will have a different state when computing placements. As the final plan reaches the leader plan applier, it may no longer be valid due to a concurrent scheduling taking up intended resources. In these situations the plan applier will notify the worker that the plan was rejected and that they should refresh their state before trying again. In some rare and unexpected circumstances it has been observed that workers will repeatedly submit the same plan, even if they are always rejected. While the root cause is still unknown this mitigation has been put in place. The plan applier will now track the history of plan rejections per client and include in the plan result a list of node IDs that should be set as ineligible if the number of rejections in a given time window crosses a certain threshold. The window size and threshold value can be adjusted in the server configuration. To avoid marking several nodes as ineligible at one, the operation is rate limited to 5 nodes every 30min, with an initial burst of 10 operations.	2022-07-12 18:40:20 -04:00
Seth Hoenig	297d386bdc	client: add support for checks in nomad services This PR adds support for specifying checks in services registered to the built-in nomad service provider. Currently only HTTP and TCP checks are supported, though more types could be added later.	2022-07-12 17:09:50 -05:00
Michael Schurter	3e50f72fad	core: merge reserved_ports into host_networks (#13651 ) Fixes #13505 This fixes #13505 by treating reserved_ports like we treat a lot of jobspec settings: merging settings from more global stanzas (client.reserved.reserved_ports) "down" into more specific stanzas (client.host_networks[].reserved_ports). As discussed in #13505 there are other options, and since it's totally broken right now we have some flexibility: Treat overlapping reserved_ports on addresses as invalid and refuse to start agents. However, I'm not sure there's a cohesive model we want to publish right now since so much 0.9-0.12 compat code still exists! We would have to explain to folks that if their -network-interface and host_network addresses overlapped, they could only specify reserved_ports in one place or the other?! It gets ugly. Use the global client.reserved.reserved_ports value as the default and treat host_network[].reserverd_ports as overrides. My first suggestion in the issue, but @groggemans made me realize the addresses on the agent's interface (as configured by -network-interface) may overlap with host_networks, so you'd need to remove the global reserved_ports from addresses shared with a shared network?! This seemed really confusing and subtle for users to me. So I think "merging down" creates the most expressive yet understandable approach. I've played around with it a bit, and it doesn't seem too surprising. The only frustrating part is how difficult it is to observe the available addresses and ports on a node! However that's a job for another PR.	2022-07-12 14:40:25 -07:00
Charlie Voiselle	6be7a41351	SV: CLI: var list command (#13707 ) * SV CLI: var list * Fix wildcard prefix filtering Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-12 12:49:39 -04:00
James Rasell	0292f48396	server: add ACL token expiration config parameters. (#13667 ) This commit adds configuration parameters to control ACL token expirations. This includes both limits on the min and max TTL expiration values, as well as a GC threshold for expired tokens.	2022-07-12 13:43:25 +02:00
Tim Gross	a5a9eedc81	core job for secure variables re-key (#13440 ) When the `Full` flag is passed for key rotation, we kick off a core job to decrypt and re-encrypt all the secure variables so that they use the new key.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	555ac432cd	SV: CAS: Implement Check and Set for Delete and Upsert (#13429 ) * SV: CAS * Implement Check and Set for Delete and Upsert * Reading the conflict from the state store * Update endpoint for new error text * Updated HTTP api tests * Conflicts to the HTTP api * SV: structs: Update SV time to UnixNanos * update mock to UnixNano; refactor * SV: encrypter: quote KeyID in error * SV: mock: add mock for namespace w/ SV	2022-07-11 13:34:06 -04:00
Tim Gross	eaf430bfd5	secure variable server configuration (#13307 ) Add fields for configuring root key garbage collection and automatic rotation. Fix the keystore path so that we write to a tempdir when in dev mode.	2022-07-11 13:34:06 -04:00
Tim Gross	4d011d4c53	move gossip keyring command to their own subcommands (#13383 ) Move all the gossip keyring and key generation commands under `operator gossip keyring` subcommands to align with the new `operator secure-variables keyring` subcommands. Deprecate the `operator keyring` and `operator keygen` commands.	2022-07-11 13:34:06 -04:00
Charlie Voiselle	1fe080c6de	Implement HTTP search API for Variables (#13257 ) * Add Path only index for SecureVariables * Add GetSecureVariablesByPrefix; refactor tests * Add search for SecureVariables * Add prefix search for secure variables	2022-07-11 13:34:05 -04:00
Charlie Voiselle	06c6a950c4	Secure Variables: Seperate Encrypted and Decrypted structs (#13355 ) This PR splits SecureVariable into SecureVariableDecrypted and SecureVariableEncrypted in order to use the type system to help verify that cleartext secret material is not committed to file. * Make Encrypt function return KeyID * Split SecureVariable Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-11 13:34:05 -04:00
Tim Gross	56deb6f8cc	keyring CLI: refactor to use subcommands (#13351 ) Split the flag options for the `secure-variables keyring` into their own subcommands. The gossip keyring CLI will be similarly refactored and the old version will be deprecated.	2022-07-11 13:34:05 -04:00
Tim Gross	81b0c4fd36	keyring command line (#13169 ) Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>	2022-07-11 13:34:04 -04:00
Charlie Voiselle	619e0cbafd	Don't write a SecureVariable with no Items (#13258 )	2022-07-11 13:34:04 -04:00
Tim Gross	5a85d96322	remove end-user algorithm selection (#13190 ) After internal design review, we decided to remove exposing algorithm choice to the end-user for the initial release. We'll solve nonce rotation by forcing rotations automatically on key GC (in a core job, not included in this changeset). Default to AES-256 GCM for the following criteria: * faster implementation when hardware acceleration is available * FIPS compliant * implementation in pure go * post-quantum resistance Also fixed a bug in the decoding from keystore and switched to a harder-to-misuse encoding method.	2022-07-11 13:34:04 -04:00
Tim Gross	f2ee585830	bootstrap keyring (#13124 ) When a server becomes leader, it will check if there are any keys in the state store, and create one if there is not. The key metadata will be replicated via raft to all followers, who will then get the key material via key replication (not implemented in this changeset).	2022-07-11 13:34:04 -04:00
Charlie Voiselle	3717688f3e	Secure Variables: Variables - State store, FSM, RPC (#13098 ) * Secure Variables: State Store * Secure Variables: FSM * Secure Variables: RPC * Secure Variables: HTTP API Co-authored-by: Tim Gross <tgross@hashicorp.com>	2022-07-11 13:34:04 -04:00
Tim Gross	05eef2b95c	keystore serialization (#13106 ) This changeset implements the keystore serialization/deserialization: * Adds a JSON serialization extension for the `RootKey` struct, along with a metadata stub. When we serialize RootKey to the on-disk keystore, we want to base64 encode the key material but also exclude any frequently-changing fields which are stored in raft. * Implements methods for loading/saving keys to the keystore. * Implements methods for restoring the whole keystore from disk. * Wires it all up with the `Keyring` RPC handlers and fixes up any fallout on tests.	2022-07-11 13:34:04 -04:00
Tim Gross	c6929a6c1e	keyring HTTP API (#13077 )	2022-07-11 13:34:04 -04:00
Charlie Voiselle	2019eab2c8	Provide mock secure variables implementation (#12980 ) * Add SecureVariable mock * Add SecureVariableStub * Add SecureVariable Copy and Stub funcs	2022-07-11 13:34:03 -04:00
Tim Gross	b6dd1191b2	snapshot restore-from-archive streaming and filtering (#13658 ) Stream snapshot to FSM when restoring from archive The `RestoreFromArchive` helper decompresses the snapshot archive to a temporary file before reading it into the FSM. For large snapshots this performs a lot of disk IO. Stream decompress the snapshot as we read it, without first writing to a temporary file. Add bexpr filters to the `RestoreFromArchive` helper. The operator can pass these as `-filter` arguments to `nomad operator snapshot state` (and other commands in the future) to include only desired data when reading the snapshot.	2022-07-11 10:48:00 -04:00
James Rasell	353323d171	agent: test full object when performing test config parse. (#13668 )	2022-07-11 16:21:36 +02:00
James Rasell	9eb63c9e03	cli: ensure node status and drain use correct cmd name. (#13656 )	2022-07-11 09:50:42 +02:00
Luiz Aoqui	03433dd8af	cli: improve output of eval commands (#13581 ) Use the same output format when listing multiple evals in the `eval list` command and when `eval status <prefix>` matches more than one eval. Include the eval namespace in all output formats and always include the job ID in `eval status` since, even `node-update` evals are related to a job. Add Node ID to the evals table output to help differentiate `node-update` evals. Co-authored-by: James Rasell <jrasell@hashicorp.com>	2022-07-07 13:13:34 -04:00
Tim Gross	1fc8995590	query for leader in `operator debug` command (#13472 ) The `operator debug` command doesn't output the leader anywhere in the output, which adds extra burden to offline debugging (away from an ongoing incident where you can simply check manually). Query the `/v1/status/leader` API but degrade gracefully.	2022-07-06 10:57:44 -04:00
James Rasell	0c0b028a59	core: allow deleting of evaluations (#13492 ) * core: add eval delete RPC and core functionality. * agent: add eval delete HTTP endpoint. * api: add eval delete API functionality. * cli: add eval delete command. * docs: add eval delete website documentation.	2022-07-06 16:30:11 +02:00
James Rasell	181b247384	core: allow pausing and un-pausing of leader broker routine (#13045 ) * core: allow pause/un-pause of eval broker on region leader. * agent: add ability to pause eval broker via scheduler config. * cli: add operator scheduler commands to interact with config. * api: add ability to pause eval broker via scheduler config * e2e: add operator scheduler test for eval broker pause. * docs: include new opertor scheduler CLI and pause eval API info.	2022-07-06 16:13:48 +02:00
Seth Hoenig	97726c2fd8	Merge pull request #12862 from hashicorp/f-choose-services api: enable selecting subset of services using rendezvous hashing	2022-06-30 15:17:40 -05:00
Yoan Blanc	3d96145ea5	feat: docker/docker/pkg/term has been deprecated in favor of moby/term See https://github.com/moby/moby/pull/40825 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-06-26 15:35:27 +02:00
Seth Hoenig	9467bc9eb3	api: enable selecting subset of services using rendezvous hashing This PR adds the 'choose' query parameter to the '/v1/service/<service>' endpoint. The value of 'choose' is in the form '<number>\|<key>', number is the number of desired services and key is a value unique but consistent to the requester (e.g. allocID). Folks aren't really expected to use this API directly, but rather through consul-template which will soon be getting a new helper function making use of this query parameter. Example, curl 'localhost:4646/v1/service/redis?choose=2\|abc123' Note: consul-templte v0.29.1 includes the necessary nomadServices functionality.	2022-06-25 10:37:37 -05:00
Seth Hoenig	91e08d5e23	core: remove support for raft protocol version 2 This PR checks server config for raft_protocol, which must now be set to 3 or unset (0). When unset, version 3 is used as the default.	2022-06-23 14:37:50 +00:00
Derek Strickland	9de4d7367c	cli: fix detach handling (#13405 ) Fix detach handling for: - `deployment fail` - `deployment promote` - `deployment resume` - `deployment unblock` - `job promote`	2022-06-21 06:01:23 -04:00
Joseph Martin	4aa96d5bfc	Return evalID if `-detach` flag is passed to job revert (#13364 ) * Return evalID if `-detach` flag is passed to job revert	2022-06-15 14:20:29 -04:00
Grant Griffiths	99896da443	CSI: make plugin health_timeout configurable in csi_plugin stanza (#13340 ) Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2022-06-14 10:04:16 -04:00
Derek Strickland	34dea90d7a	docker: update images to reference hashicorpdev Docker organization (#12903 ) docker: update images to reference hashicorpdev dockerhub organization generate job_init.bindata_assetfs.go Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>	2022-06-08 15:06:00 -04:00
Kevin Schoonover	544c276128	parse ACL token from authorization header (#12534 )	2022-06-06 15:51:02 -04:00
Karan Sharma	9426be01fc	feat: Warn if bootstrap_expect is even number (#12961 )	2022-06-06 15:22:59 +02:00
Lance Haig	4bf27d743d	Allow Operator Generated bootstrap token (#12520 )	2022-06-03 07:37:24 -04:00
Huan Wang	7d15157635	adding support for customized ingress tls (#13184 )	2022-06-02 18:43:58 -04:00
Seth Hoenig	54efec5dfe	docs: add docs and tests for tagged_addresses	2022-05-31 13:02:48 -05:00
Jorge Marey	f966614602	Allow setting tagged addresses on services	2022-05-31 10:06:55 -05:00
James Rasell	84aabe8963	cli: fix minor formatting issue with alloc restart help. (#13135 )	2022-05-27 13:18:47 +02:00
Seth Hoenig	4631045d83	connect: enable setting connect upstream destination namespace	2022-05-26 09:39:36 -05:00
Luiz Aoqui	769ff1dcc3	Merge pull request #13109 from hashicorp/merge-release-1.3.1-branch Merge release 1.3.1 branch	2022-05-25 10:45:09 -04:00
Seth Hoenig	626a345fb2	ci: switch to 22.04 LTS for GHA Core CI tests	2022-05-25 08:19:40 -05:00
hc-github-team-nomad-core	15caf5eab2	Generate files for 1.3.1 release	2022-05-24 16:29:46 -04:00
Michael Schurter	2965dc6a1a	artifact: fix numerous go-getter security issues Fix numerous go-getter security issues: - Add timeouts to http, git, and hg operations to prevent DoS - Add size limit to http to prevent resource exhaustion - Disable following symlinks in both artifacts and `job run` - Stop performing initial HEAD request to avoid file corruption on retries and DoS opportunities. Approach Since Nomad has no ability to differentiate a DoS-via-large-artifact vs a legitimate workload, all of the new limits are configurable at the client agent level. The max size of HTTP downloads is also exposed as a node attribute so that if some workloads have large artifacts they can specify a high limit in their jobspecs. In the future all of this plumbing could be extended to enable/disable specific getters or artifact downloading entirely on a per-node basis.	2022-05-24 16:29:39 -04:00
Will Jordan	d515e5c3b0	Don't buffer json logs on agent startup (#13076 ) There's no reason to buffer json logs on agent startup since logs in this format already aren't reordered.	2022-05-19 15:40:30 -04:00
Seth Hoenig	fc58f4972c	cli: correctly use and validate job with vault token set This PR fixes `job validate` to respect '-vault-token', '$VAULT_TOKEN', '-vault-namespace' if set.	2022-05-19 12:13:34 -05:00
Seth Hoenig	65f7abf2f4	cli: update default redis and use nomad service discovery Closes #12927 Closes #12958 This PR updates the version of redis used in our examples from 3.2 to 7. The old version is very not supported anymore, and we should be setting a good example by using a supported version. The long-form example job is now fixed so that the service stanza uses nomad as the service discovery provider, and so now the job runs without a requirement of having Consul running and configured.	2022-05-17 10:24:19 -05:00
hc-github-team-nomad-core	8c5dbe1a44	Generate files for 1.3.0 release	2022-05-13 17:32:20 -04:00
hc-github-team-nomad-core	b0ec54c885	Generate files for 1.3.0-rc.1 release	2022-05-13 17:31:57 -04:00
James Rasell	636b647a30	agent: fix panic when logging about protocol version config use. (#12962 ) The log line comes before the agent logger has been setup, therefore we need to use the UI logging to avoid panic.	2022-05-13 09:28:43 +02:00
Eng Zer Jun	97d1bc735c	test: use `T.TempDir` to create temporary test directory (#12853 ) * test: use `T.TempDir` to create temporary test directory This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The directory created by `t.TempDir` is automatically removed when the test and all its subtests complete. Prior to this commit, temporary directory created using `ioutil.TempDir` needs to be removed manually by calling `os.RemoveAll`, which is omitted in some tests. The error handling boilerplate e.g. defer func() { if err := os.RemoveAll(dir); err != nil { t.Fatal(err) } } is also tedious, but `t.TempDir` handles this for us nicely. Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix TestLogmon_Start_restart on Windows Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> * test: fix failing TestConsul_Integration t.TempDir fails to perform the cleanup properly because the folder is still in use testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-05-12 11:42:40 -04:00
Dave May	97cf204c00	debug: add version constraint to avoid pprof panic (#12807 )	2022-04-28 13:18:55 -04:00
Luiz Aoqui	a8cc633156	vault: revert support for entity aliases (#12723 ) After a more detailed analysis of this feature, the approach taken in PR #12449 was found to be not ideal due to poor UX (users are responsible for setting the entity alias they would like to use) and issues around jobs potentially masquerading itself as another Vault entity.	2022-04-22 10:46:34 -04:00
Seth Hoenig	ed71d5db2b	services: fix imports	2022-04-22 09:15:51 -05:00
Seth Hoenig	ae6048eafa	services: format ipv6 in nomad service info output Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-22 09:14:29 -05:00
Seth Hoenig	3fcac242c6	services: enable setting arbitrary address value in service registrations This PR introduces the `address` field in the `service` block so that Nomad or Consul services can be registered with a custom `.Address.` to advertise. The address can be an IP address or domain name. If the `address` field is set, the `service.address_mode` must be set in `auto` mode.	2022-04-22 09:14:29 -05:00
Tim Gross	3fc1b67396	CSI: handle nil topologies safely in command line (#12751 )	2022-04-22 09:25:04 -04:00
James Rasell	046831466c	cli: add pagination flags to service info command. (#12730 )	2022-04-22 10:32:40 +02:00
Michael Schurter	5db3a671db	cli: add -json flag to support job commands (#12591 ) * cli: add -json flag to support job commands While the CLI has always supported running JSON jobs, its support has been via HCLv2's JSON parsing. I have no idea what format it expects the job to be in, but it's absolutely not in the same format as the API expects. So I ignored that and added a new -json flag to explicitly support API style JSON jobspecs. The jobspecs can even have the wrapping {"Job": {...}} envelope or not! * docs: fix example for `nomad job validate` We haven't been able to validate inside driver config stanzas ever since the move to task driver plugins. 😭	2022-04-21 13:20:36 -07:00
Tim Gross	f4287c870d	cli: detect directory when applying namespace spec file (#12738 ) The new `namespace apply` feature that allows for passing a namespace specification file detects the difference between an empty namespace and a namespace specification by checking if the file exists. For most cases, the file will have an extension like `.hcl` and so there's little danger that a user will apply a file spec when they intended to apply a file name. But because directory names typically don't include an extension, you're much more likely to collide when trying to `namespace apply` by name only, and then you get a confusing error message of the form: Failed to read file: read $namespace: is a directory Detect the case where the namespace name collides with a directory in the current working directory, and skip trying to load the directory.	2022-04-21 14:53:45 -04:00
James Rasell	716b8e658b	api: Add support for filtering and pagination to the node list endpoint (#12727 )	2022-04-21 17:04:33 +02:00
James Rasell	010acce59f	job_hooks: add implicit constraint when using Consul for services. (#12602 )	2022-04-20 14:09:13 +02:00
Derek Strickland	7c6eb47b78	`consul-template`: revert `function_denylist` logic (#12071 ) * consul-template: replace config rather than append Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>	2022-04-18 13:57:56 -04:00
Shishir	f5121d261e	Add os to NodeListStub struct. (#12497 ) * Add os to NodeListStub struct. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com> * Add test: os as a query param to /v1/nodes. Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-04-15 17:22:45 -07:00
Tim Gross	826d9d47f9	CSI: replace structs->api with serialization extension (#12583 ) The CSI HTTP API has to transform the CSI volume to redact secrets, remove the claims fields, and to consolidate the allocation stubs into a single slice of alloc stubs. This was done manually in #8590 but this is a large amount of code and has proven both very bug prone (see #8659, #8666, #8699, #8735, and #12150) and requires updating lots of code every time we add a field to volumes or plugins. In #10202 we introduce encoding improvements for the `Node` struct that allow a more minimal transformation. Apply this same approach to serializing `structs.CSIVolume` to API responses. Also, the original reasoning behind #8590 for plugins no longer holds because the counts are now denormalized within the state store, so we can simply remove this transformation entirely.	2022-04-15 14:29:34 -04:00
Tim Gross	b14e53e446	CSI: fix volume status prefix matching in CLI (#12584 ) The API for `CSIVolume.List` sorts by created index and not by ID, which breaks the logic for prefix matching in the `volume status` output when the prefix is also an exact match. Ensure that we're handling this case correctly.	2022-04-15 14:16:30 -04:00
Tim Gross	f5d8c636c7	CSI: handle per-alloc volumes in `alloc status -verbose` CLI (#12573 ) The Nomad client's `csi_hook` interpolates the alloc suffix with the volume request's name for CSI volumes with `per_alloc = true`, turning `example` into `example[1]`. We need to do this same behavior in the `alloc status` output so that we show the correct volume.	2022-04-15 09:26:19 -04:00
Lars Lehtonen	81bb1ef030	command/agent: check err before close (#12574 )	2022-04-15 08:54:03 -04:00
Seth Hoenig	a1c4f16cf1	connect: prefix tag with nomad.; merge into envoy_stats_tags; update docs This PR expands on the work done in #12543 to - prefix the tag, so it is now "nomad.alloc_id" to be more consistent with Consul tags - merge into pre-existing envoy_stats_tags fields - update the upgrade guide docs - update changelog	2022-04-14 12:52:52 -05:00
Ian Drennan	70bd32df83	Add alloc_id to sidecar bootstrap	2022-04-14 11:46:06 -05:00
James Rasell	4cdc46ae75	service discovery: add pagination and filtering support to info requests (#12552 ) * services: add pagination and filter support to info RPC. * cli: add filter flag to service info command. * docs: add pagination and filter details to services info API. * paginator: minor updates to comment and func signature.	2022-04-13 07:41:44 +02:00
Yoan Blanc	5e8254beda	feat: remove dependency to consul/lib Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2022-04-09 13:22:44 +02:00
hc-github-team-nomad-core	07c6d10c86	Generate files for release	2022-04-07 20:21:26 +00:00
Tim Gross	09b5e8d388	Fix flaky `operator debug` test (#12501 ) We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes: * Make first pprof collection synchronous to preserve the existing behavior for the common case where the pprof interval matches the duration. * Clamp `operator debug` pprof timing to that of the command. The `pprof-duration` should be no more than `duration` and the `pprof-interval` should be no more than `pprof-duration`. Clamp the values rather than throwing errors, which could change the commands that existing users might already have in debugging scripts * Testing: remove test parallelism The `operator debug` tests that stand up servers can't be run in parallel, because we don't have a way of canceling the API calls for pprof. The agent will still be running the last pprof when we exit, and that breaks the next test that talks to that same agent. (Because you can only run one pprof at a time on any process!) We could split off each subtest into its own server, but this test suite is already very slow. In future work we should fix this "for real" by making the API call cancelable. * Testing: assert against unexpected errors in `operator debug` tests. If we assert there are no unexpected error outputs, it's easier for the developer to debug when something is going wrong with the tests because the error output will be presented as a failing test, rather than just a failing exit code check. Or worse, no failing exit code check! This also forces us to be explicit about which tests will return 0 exit codes but still emit (presumably ignorable) error outputs. Additional minor bug fixes (mostly in tests) and test refactorings: * Fix text alignment on pprof Duration in `operator debug` output * Remove "done" channel from `operator debug` event stream test. The goroutine we're blocking for here already tells us it's done by sending a value, so block on that instead of an extraneous channel * Event stream test timer should start at current time, not zero * Remove noise from `operator debug` test log output. The `t.Logf` calls already are picked out from the rest of the test output by being prefixed with the filename. * Remove explicit pprof args so we use the defaults clamped from duration/interval	2022-04-07 15:00:07 -04:00
Tim Gross	1724765096	api: use `cleanhttp.DefaultPooledTransport` for default API client (#12492 ) We expect every Nomad API client to use a single connection to any given agent, so take advantage of keep-alive by switching the default transport to `DefaultPooledClient`. Provide a facility to close idle connections for testing purposes. Restores the previously reverted #12409 Co-authored-by: Ben Buzbee <bbuzbee@cloudflare.com>	2022-04-06 16:14:53 -04:00
James Rasell	431c153cd9	client: add Nomad template service functionality to runner. (#12458 ) This change modifies the template task runner to utilise the new consul-template which includes Nomad service lookup template funcs. In order to provide security and auth to consul-template, we use a custom HTTP dialer which is passed to consul-template when setting up the runner. This method follows Vault implementation. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-06 19:17:05 +02:00
Jasmine Dahilig	f67b108f9f	docs: update vault-token note in job run command #8040 (#12385 )	2022-04-06 10:01:38 -07:00
Derek Strickland	0ab89b1728	Merge pull request #12476 from hashicorp/f-disconnected-client-allocation-handling disconnected clients: Feature branch merge	2022-04-06 10:11:57 -04:00
James Rasell	21d4ecac40	Merge pull request #12459 from hashicorp/b-fix-service-delete-cli-flake cli: fixup service test delete by using atomic actions.	2022-04-06 15:22:08 +02:00
James Rasell	e1c0329fe9	cli: fixup service test delete by using more atomic actions.	2022-04-06 08:36:23 +01:00
Seth Hoenig	2e2ff3f75e	Merge pull request #12419 from hashicorp/exec-cleanup raw_exec: make raw exec driver work with cgroups v2	2022-04-05 16:42:01 -05:00
Derek Strickland	d86ab290a0	Add unknown to TaskGroupSummary (#12269 )	2022-04-05 17:12:23 -04:00
Derek Strickland	8e9f8be511	`MaxClientDisconnect` Jobspec checklist (#12177 ) * api: Add struct, conversion function, and tests * TaskGroup: Add field, validation, and tests * diff: Add diff handler and test * docs: Update docs	2022-04-05 17:12:23 -04:00
Derek Strickland	3cbd76ea9d	disconnected clients: Add reconnect task event (#12133 ) * Add TaskClientReconnectedEvent constant * Add allocRunner.Reconnect function to manage task state manually * Removes server-side push	2022-04-05 17:12:23 -04:00
Shishir	a6801f73d1	cli: add -quiet to nomad node status command. (#12426 )	2022-04-05 15:53:43 -04:00
Luiz Aoqui	ab7eb5de6e	Support Vault entity aliases (#12449 ) Move some common Vault API data struct decoding out of the Vault client so it can be reused in other situations. Make Vault job validation its own function so it's easier to expand it. Rename the `Job.VaultPolicies` method to just `Job.Vault` since it returns the full Vault block, not just their policies. Set `ChangeMode` on `Vault.Canonicalize`. Add some missing tests. Allows specifying an entity alias that will be used by Nomad when deriving the task Vault token. An entity alias assigns an indentity to a token, allowing better control and management of Vault clients since all tokens with the same indentity alias will now be considered the same client. This helps track Nomad activity in Vault's audit logs and better control over Vault billing. Add support for a new Nomad server configuration to define a default entity alias to be used when deriving Vault tokens. This default value will be used if the task doesn't have an entity alias defined.	2022-04-05 14:18:10 -04:00
Grant Griffiths	18a0a2c9a4	CSI: Add secrets flag support for delete volume (#11245 )	2022-04-05 08:59:11 -04:00
Seth Hoenig	52aaf86f52	raw_exec: make raw exec driver work with cgroups v2 This PR adds support for the raw_exec driver on systems with only cgroups v2. The raw exec driver is able to use cgroups to manage processes. This happens only on Linux, when exec_driver is enabled, and the no_cgroups option is not set. The driver uses the freezer controller to freeze processes of a task, issue a sigkill, then unfreeze. Previously the implementation assumed cgroups v1, and now it also supports cgroups v2. There is a bit of refactoring in this PR, but the fundamental design remains the same. Closes #12351 #12348	2022-04-04 16:11:38 -05:00
Danish Prakash	e7e8ce212e	command/operator_debug: add pprof interval (#11938 )	2022-04-04 15:24:12 -04:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
Seth Hoenig	fa0dc05b7a	tests: wait on client in a couple of tests These tend to fail on GHA, where I believe the client is not starting up fast enough before making requests. So wait on the client agent first. ``` === RUN TestDebug_CapturedFiles operator_debug_test.go:422: serverName: TestDebug_CapturedFiles.global, clientID, 1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4 operator_debug_test.go:492: Error Trace: operator_debug_test.go:492 Error: Should be empty, but was No node(s) with prefix "1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4" found Failed to retrieve clients, 0 nodes found in list: 1afb00e6-13f2-d8d6-d0f9-745a3fd6e8e4 Test: TestDebug_CapturedFiles --- FAIL: TestDebug_CapturedFiles (0.08s) ```	2022-03-30 08:48:23 -05:00
Michael Schurter	cae69ba8ce	Merge pull request #12312 from hashicorp/f-writeToFile template: disallow `writeToFile` by default	2022-03-29 13:41:59 -07:00
Tim Gross	03c1904112	csi: allow `namespace` field to be passed in volume spec (#12400 ) Use the volume spec's `namespace` field to override the value of the `-namespace` and `NOMAD_NAMESPACE` field, just as we do with job spec.	2022-03-29 14:46:39 -04:00
Michael Schurter	7a28fcb8af	template: disallow `writeToFile` by default Resolves #12095 by WONTFIXing it. This approach disables `writeToFile` as it allows arbitrary host filesystem writes and is only a small quality of life improvement over multiple `template` stanzas. This approach has the significant downside of leaving people who have altered their `template.function_denylist` still vulnerable! I added an upgrade note, but we should have implemented the denylist as a `map[string]bool` so that new funcs could be denied without overriding custom configurations. This PR also includes a bug fix that broke enabling all consul-template funcs. We repeatedly failed to differentiate between a nil (unset) denylist and an empty (allow all) one.	2022-03-28 17:05:42 -07:00
Shishir	afcce3eea5	Display OS name in nomad node status command. (#12388 ) Signed-off-by: Shishir Mahajan <smahajan@roblox.com>	2022-03-28 09:28:14 -04:00
Seth Hoenig	e256afdfee	ci: set test log level off in gha	2022-03-25 13:43:33 -05:00
James Rasell	9449e1c3e2	Merge branch 'main' into f-1.3-boogie-nights	2022-03-25 16:40:32 +01:00
James Rasell	1604f46026	Merge pull request #12357 from hashicorp/f-update-cli-namespace-wildcard-support-wording cli: update namespace wildcard help to be non-specific.	2022-03-25 11:24:39 +01:00
Seth Hoenig	987dda3092	Merge pull request #12274 from hashicorp/f-cgroupsv2 client: enable cpuset support for cgroups.v2	2022-03-24 14:22:54 -05:00
Tim Gross	ff1bed38cd	csi: add `-secret` and `-parameter` flag to `volume snapshot create` (#12360 ) Pass-through the `-secret` and `-parameter` flags to allow setting parameters for the snapshot and overriding the secrets we've stored on the CSI volume in the state store.	2022-03-24 10:29:50 -04:00
James Rasell	16b1f19ffe	api: move serviceregistration client to servics to match CLI. The service registration client name was used to provide a distinction between the service block and the service client. This however creates new wording to understand and does not match the CLI, therefore this change fixes that so we have a Services client. Consul specific objects within the service file have been moved to the consul location to create a clearer separation.	2022-03-24 09:08:45 +01:00
James Rasell	96d8512c85	test: move remaining tests to use ci.Parallel.	2022-03-24 08:45:13 +01:00
Seth Hoenig	2e5c6de820	client: enable support for cgroups v2 This PR introduces support for using Nomad on systems with cgroups v2 [1] enabled as the cgroups controller mounted on /sys/fs/cgroups. Newer Linux distros like Ubuntu 21.10 are shipping with cgroups v2 only, causing problems for Nomad users. Nomad mostly "just works" with cgroups v2 due to the indirection via libcontainer, but not so for managing cpuset cgroups. Before, Nomad has been making use of a feature in v1 where a PID could be a member of more than one cgroup. In v2 this is no longer possible, and so the logic around computing cpuset values must be modified. When Nomad detects v2, it manages cpuset values in-process, rather than making use of cgroup heirarchy inheritence via shared/reserved parents. Nomad will only activate the v2 logic when it detects cgroups2 is mounted at /sys/fs/cgroups. This means on systems running in hybrid mode with cgroups2 mounted at /sys/fs/cgroups/unified (as is typical) Nomad will continue to use the v1 logic, and should operate as before. Systems that do not support cgroups v2 are also not affected. When v2 is activated, Nomad will create a parent called nomad.slice (unless otherwise configured in Client conifg), and create cgroups for tasks using naming convention <allocID>-<task>.scope. These follow the naming convention set by systemd and also used by Docker when cgroups v2 is detected. Client nodes now export a new fingerprint attribute, unique.cgroups.version which will be set to 'v1' or 'v2' to indicate the cgroups regime in use by Nomad. The new cpuset management strategy fixes #11705, where docker tasks that spawned processes on startup would "leak". In cgroups v2, the PIDs are started in the cgroup they will always live in, and thus the cause of the leak is eliminated. [1] https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html Closes #11289 Fixes #11705 #11773 #11933	2022-03-23 11:35:27 -05:00
James Rasell	cd42572f0d	Merge pull request #12330 from hashicorp/f-gh-263 cli: add service commands for list, info, and delete.	2022-03-23 15:49:29 +01:00
Tim Gross	1743648901	CSI: fix timestamp from volume snapshot responses (#12352 ) Listing snapshots was incorrectly returning nanoseconds instead of seconds, and formatting of timestamps both list and create snapshot was treating the timestamp as though it were nanoseconds instead of seconds. This resulted in create timestamps always being displayed as zero values. Fix the unit conversion error in the command line and the incorrect extraction in the CSI plugin client code. Beef up the unit tests to make sure this code is actually exercised.	2022-03-23 10:39:28 -04:00
James Rasell	0a9f54c525	cli: update namespace wildcard help to be non-specific. A number of commands support namespace wildcard querying, so it should be up to the sub-command to detail support, rather than keeping this list up to date.	2022-03-23 12:56:48 +01:00
James Rasell	a646333263	Merge branch 'main' into f-1.3-boogie-nights	2022-03-23 09:41:25 +01:00
Tim Gross	2a2ebd0537	CSI: presentation improvements (#12325 ) * Fix plugin capability sorting. The `sort.StringSlice` method in the stdlib doesn't actually sort, but instead constructs a sorting type which you call `Sort()` on. * Sort allocations for plugins by modify index. Present allocations in modify index order so that newest allocations show up at the top of the list. This results in sorted allocs in `nomad plugin status :id`, just like `nomad job status :id`. * Sort allocations for volumes in HTTP response. Present allocations in modify index order so that newest allocations show up at the top of the list. This results in sorted allocs in `nomad volume status :id`, just like `nomad job status :id`. This is implemented in the HTTP response and not in the state store because the state store maintains two separate lists of allocs that are merged before sending over the API. * Fix length of alloc IDs in `nomad volume status` output	2022-03-22 09:48:38 -04:00
James Rasell	c0a2e493c3	cli: add service commands for list, info, and delete.	2022-03-21 11:59:03 +01:00
James Rasell	042bf0fa57	client: hookup service wrapper for use within client hooks.	2022-03-21 10:29:57 +01:00
Seth Hoenig	4d86f5d94d	ci: limit gotestsum to circle ci Part 2 of breaking up https://github.com/hashicorp/nomad/pull/12255 This PR makes it so gotestsum is invoked only in CircleCI. Also the HCLogger(t) is plumbed more correctly in TestServer and TestAgent so that they respect NOMAD_TEST_LOG_LEVEL. The reason for these is we'll want to disable logging in GHA, where spamming the disk with logs really drags performance.	2022-03-18 09:15:01 -05:00
Luiz Aoqui	68e5b58007	cli: display Raft version in `server members` (#12317 ) The previous output of the `nomad server members` command would output a column named `Protocol` that displayed the Serf protocol being currently used by servers. This is not a configurable option, so it holds very little value to operators. It is also easy to confuse it with the Raft Protocol version, which is configurable and highly relevant to operators. This commit replaces the previous `Protocol` column with the new `Raft Version`. It also updates the `-detailed` flag to be called `-verbose` so it matches other commands. The detailed output now also outputs the same information as the standard output with the addition of the previous `Protocol` column and `Tags`.	2022-03-17 14:15:10 -04:00
Luiz Aoqui	15089f055f	api: add related evals to eval details (#12305 ) The `related` query param is used to indicate that the request should return a list of related (next, previous, and blocked) evaluations. Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>	2022-03-17 13:56:14 -04:00
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
James Rasell	7cd28a6fb6	client: refactor common service registration objects from Consul. This commit performs refactoring to pull out common service registration objects into a new `client/serviceregistration` package. This new package will form the base point for all client specific service registration functionality. The Consul specific implementation is not moved as it also includes non-service registration implementations; this reduces the blast radius of the changes as well.	2022-03-15 09:38:30 +01:00
James Rasell	541a0f7d9a	config: add native service discovery admin boolean parameter.	2022-03-14 11:48:13 +01:00
James Rasell	783d7fdc31	jobspec: add service block provider parameter and validation.	2022-03-14 09:21:20 +01:00
Luiz Aoqui	ab8ce87bba	Add pagination, filtering and sort to more API endpoints (#12186 )	2022-03-08 20:54:17 -05:00
Michael Schurter	7bb8de68e5	Merge pull request #12138 from jorgemarey/f-ns-meta Add metadata to namespaces	2022-03-07 10:19:33 -08:00
Tim Gross	b94837a2b8	csi: add pagination args to `volume snapshot list` (#12193 ) The snapshot list API supports pagination as part of the CSI specification, but we didn't have it plumbed through to the command line.	2022-03-07 12:19:28 -05:00
Tim Gross	2dafe46fe3	CSI: allow updates to volumes on re-registration (#12167 ) CSI `CreateVolume` RPC is idempotent given that the topology, capabilities, and parameters are unchanged. CSI volumes have many user-defined fields that are immutable once set, and many fields that are not user-settable. Update the `Register` RPC so that updating a volume via the API merges onto any existing volume without touching Nomad-controlled fields, while validating it with the same strict requirements expected for idempotent `CreateVolume` RPCs. Also, clarify that this state store method is used for everything, not just for the `Register` RPC.	2022-03-07 11:06:59 -05:00
Tim Gross	09a7612150	csi: volume snapshot list plugin option is required (#12197 ) The RPC for listing volume snapshots requires a plugin ID. Update the `volume snapshot list` command to find the specific plugin from the provided prefix.	2022-03-07 09:58:29 -05:00
Michael Schurter	5bf877ecf2	cli: namespace meta should be formatted consistently	2022-03-04 14:13:48 -08:00
Michael Schurter	6841385b73	cli: namespace tests should be run on oss	2022-03-04 14:13:48 -08:00
Michael Schurter	e8a5258ad4	cli: namespace apply should autocomplete hcl files	2022-03-04 14:13:33 -08:00
Tim Gross	b776c1c196	csi: fix prefix queries for plugin list RPC (#12194 ) The `CSIPlugin.List` RPC was intended to accept a prefix to filter the list of plugins being listed. This was being accidentally being done in the state store instead, which contributed to incorrect filtering behavior for plugins in the `volume plugin status` command. Move the prefix matching into the RPC so that it calls the prefix-matching method in the state store if we're looking for a prefix. Update the `plugin status command` to accept a prefix for the plugin ID argument so that it matches the expected behavior of other commands.	2022-03-04 16:44:09 -05:00
Tim Gross	3247e422d1	csi: add missing fields to HTTP API response (#12178 ) The HTTP endpoint for CSI manually serializes the internal struct to the API struct for purposes of redaction (see also #10470). Add fields that were missing from this serialization so they don't show up as always empty in the API response.	2022-03-03 15:15:28 -05:00
James Rasell	8ce6684955	http: add alloc service registration agent HTTP endpoint.	2022-03-03 12:13:32 +01:00
James Rasell	81fe915e6c	http: add job service registration agent HTTP endpoint.	2022-03-03 12:13:13 +01:00
James Rasell	60cc73fe5d	http: add agent service registration HTTP endpoint.	2022-03-03 12:13:00 +01:00
Michael Schurter	0f6923c750	Merge pull request #10808 from hashicorp/f-curl cli: add operator api command	2022-03-02 10:12:16 -08:00
Michael Schurter	0bb9f06637	cli: fix op api method handling	2022-03-01 16:44:15 -08:00
Tim Gross	f65c804544	csi: subcommand for volume snapshot (#12152 )	2022-03-01 13:30:30 -05:00
Tim Gross	f2a4ad0949	CSI: implement support for topology (#12129 )	2022-03-01 10:15:46 -05:00
Tim Gross	c90e674918	CSI: use HTTP headers for passing CSI secrets (#12144 )	2022-03-01 08:47:01 -05:00
Tim Gross	a499401b34	csi: fix redaction of `volume status` mount flags (#12150 ) The `volume status` command and associated API redacts the entire mount options instead of just the `MountFlags` field that can contain sensitive data. Return a redacted value so that the return value makes sense to operators who have set this field.	2022-03-01 08:34:03 -05:00
Tim Gross	99d03cdc6c	CSI: sort capabilities in `plugin status` (#12154 ) Also fix `LIST_SNAPSHOTS` capability name	2022-03-01 07:59:31 -05:00
Tim Gross	02ae95ab22	csi: respect -verbose flag for allocs in volume status (#12153 )	2022-03-01 07:57:29 -05:00
Jorge Marey	a466f01120	Add metadata to namespaces	2022-02-27 09:09:10 +01:00
Michael Schurter	cbf6ba843d	cli: fix op api typos Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>	2022-02-25 16:31:56 -08:00
Michael Schurter	4550c5fb80	cli: only return 1 on errors from op api We don't want people to expect stable error codes for errors, and I don't think these were useful for scripts anyway.	2022-02-25 16:23:31 -08:00
Michael Schurter	a42d832f98	cli: add tests and minor fixes for op api Trimmed spaces around header values. Fixed method getting forced to GET.	2022-02-24 17:06:07 -08:00
Michael Schurter	238a732098	cli: add filter support	2022-02-24 15:52:54 -08:00
Michael Schurter	bb3daac628	rename `nomad curl` to `nomad operator api`	2022-02-24 15:52:54 -08:00
Michael Schurter	141db0c562	cli: add curl command Just a hackweek project at this point.	2022-02-24 15:52:54 -08:00
Tim Gross	31ee2a3c67	CSI: ensure all fields are mapped from structs to api response (#12124 ) In PR #12108 we added missing fields to the plugin response, but we didn't include the manual serialization steps that we need until issue #10470 is resolved.	2022-02-24 14:17:15 -05:00
Tim Gross	13ea2c7fb3	CSI: display plugin capabilities in verbose status (#12116 ) The behaviors of CSI plugins are governed by their capabilities as defined by the CSI specification. When debugging plugin issues, it's useful to know which behaviors are expected so they can be matched against RPC calls made to the plugin allocations. Expose the plugin capabilities as named in the CSI spec in the `nomad plugin status -verbose` output.	2022-02-24 13:51:38 -05:00
Sander Mol	42b338308f	add go-sockaddr templating support to nomad consul address (#12084 )	2022-02-24 09:34:54 -05:00
Florian Apolloner	3bced8f558	namespaces: allow enabling/disabling allowed drivers per namespace	2022-02-24 09:27:32 -05:00
Seth Hoenig	a0350b0608	command: switch from raft-boltdb to raft-boltdb/v2	2022-02-23 14:43:59 -06:00
Seth Hoenig	de95998faa	core: switch to go.etc.io/bbolt This PR swaps the underlying BoltDB implementation from boltdb/bolt to go.etc.io/bbolt. In addition, the Server has a new configuration option for disabling NoFreelistSync on the underlying database. Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81 Consul equivelent PR: https://github.com/hashicorp/consul/pull/11720	2022-02-23 14:26:41 -06:00
Michael Schurter	7494a0c4fd	core: remove all traces of unused protocol version Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risks to leave, I felt like removing it was best because: 1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number. 2. Nomad's HTTP API has evolved extensively without increment `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose. 3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` is an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the only protocol version relevant to Nomad developers and operators. The other protocol versions are either deadcode or have never changed (Serf). 4. If we were to need to version the RPC, HTTP API, or Serf protocols, I don't think these configuration parameters and variables are the best choice. If we come to that point we should choose a versioning scheme based on the use case and modern best practices -- not this 6+ year old dead code.	2022-02-18 16:12:36 -08:00
Luiz Aoqui	de91954582	initial base work for implementing sorting and filter across API endpoints (#12076 )	2022-02-16 14:34:36 -05:00
Luiz Aoqui	110dbeeb9d	Add `go-bexpr` filters to evals and deployment list endpoints (#12034 )	2022-02-16 11:40:30 -05:00
Seth Hoenig	ac3cd73d00	Merge pull request #12054 from hashicorp/b-creation-indexes api: return sorted results in certain list endpoints	2022-02-15 15:08:38 -06:00
Seth Hoenig	40c714a681	api: return sorted results in certain list endpoints These API endpoints now return results in chronological order. They can return results in reverse chronological order by setting the query parameter ascending=true. - Eval.List - Deployment.List	2022-02-15 13:48:28 -06:00
Alex Holyoake	3071c7d91b	config: merge ReservableCores in clientConfig (#12044 )	2022-02-15 08:36:37 -05:00
Tim Gross	2f79a260fe	csi: volume cli prefix matching should accept exact match (#12051 ) The `volume detach`, `volume deregister`, and `volume status` commands accept a prefix argument for the volume ID. Update the behavior on exact matches so that if there is more than one volume that matches the prefix, we should only return an error if one of the volume IDs is not an exact match. Otherwise we won't be able to use these commands at all on those volumes. This also makes the behavior of these commands consistent with `job stop`.	2022-02-11 08:53:03 -05:00
Luiz Aoqui	3bf6036487	Version 1.2.6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJiBIXqAAoJELC0QQl2hbZ2M8cP/A7LENJbFSph25M1aGItra5j BphSX//Sq/v9ZzO44rOGNYQGfTpFT8STJgj2GC50qR/ilF4KX4D0oZlDyu/6D0NG ouN9RUjnFd6IEDQrjqqqhr3F69Z95SWVfi1rfgn/pIgOYkVEXfi6DXaulVVyd2ZT J0G5w5ryl5d8PhuL7TWw4zbhZRQn0hVspZv/1s3/I9aG6Sew8SMweeOxbN9lBr7E H19Amdjh6ugRuPgU7YMpKDVrZQRv9Wt7BUP/uc0u3LiW9z3Ko8ZKnCRKErtL5Kc3 HDZsWe+t3va4Uekzd0HULNcYU4kwjogdRYRzX5kRsOyXelrZkQIqYFiKrk1wVbq/ cYM5DUak6eUQBGhgi3UY0fklBFq4GDGpiwEzn7rvQb0PRSuVyykgbZ12fzyIu8dp tWbR/WOEg9F+jva6HkR2kDIcr5mDmny3Pxi5aUT6lMk1111nCzOjDzhLkQVtfsex FDMByXxM4oWAK3ouq2OIdxDL2c742A2933C4/30KWE7Xy7twsvkGw52irw66VO3V 4PHP880cDvEDaEh15mY/8FlaAE7t/gsCUuYLxGwl33TaXSRBLc9vVNrrp89q53TD ZcvXTBpHUOWa6ZlHF/4f8LW44rowM6bU0Wili7NaWOKx86dnUJMG4sqJifNgcpS/ 7lXogv98CYLbMy4X4if0 =NY1Z -----END PGP SIGNATURE----- gpgsig -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEElFaq1Z5DKdB91i+lKfRZwNnLtXMFAmIFbbkACgkQKfRZwNnL tXOr/g/+N2ZBMK8ohEvtdXLl7WXrVhgJfUSVbdD5Kfshul9CPn3yWRxJzqtEN2Pf 55ozeWLpoziP9y9LviJ7rDidXcTmDFutbFdGJ3L+ZLdLILsNOq1A+lbuwO3fJngZ 5aiPoJLsw4sqj6uHaM6Cls2f145O92nT7GXEHCxuvGHeSf3NkcR+zRY5nPrLTIrA uxYefCOzP6C2I+W7dL4Oj5R5EZd4UDi1WiL8pGzwm24LcagZN2ctctolAeF9OlJX M58UUv9b4GObe617u8MeH0LIlyZiNwn9JqrV33dKVTyrkBIYfYxkzdzMKf1csVYk kQb13KPdPTASBAGTl+sxeXXnw/bg09JXGcvREX5lLyQqY8xGwTv2FpTmybKWLiss Bg6BbejrgtCPBik0EAHWV0+kVzhi9bPfUYwTXLDCzMtrbyCyPoWchruel2sm41U1 ezRDzlSvf6nrXf7sAv6umJICck4Bc5Gol+8W7fxvWqnY9rQ3ds2v7E5lXZMBbOmE JSi+EDWBJjBAXehE6pLxeVsvlHMRWN007Z2UeD4neGIgG7xFJLq6nKeUKoiNIpgk hKBL8iwHyuJfrBB/dcPzI9NV+jL6OZ/oI1RWxSj0MX/B4VXZp8HrqZA5JxzQolUg KIxqe4iX3WIkQv+UU4WiELvs4O7fujB4KWz3iQokhwDxqGUpffk= =5EG2 -----END PGP SIGNATURE----- Merge tag 'v1.2.6' into merge-release-1.2.6-branch Version 1.2.6	2022-02-10 14:55:34 -05:00
Luiz Aoqui	15f9d54dea	api: prevent excessice CPU load on job parse Add new namespace ACL requirement for the /v1/jobs/parse endpoint and return early if HCLv2 parsing fails. The endpoint now requires the new `parse-job` ACL capability or `submit-job`.	2022-02-09 19:51:47 -05:00
Thomas Lefebvre	3b57f3af9d	Add config command and config validate subcommand to nomad CLI (#9198 )	2022-02-08 16:52:35 -05:00
Tim Gross	7ad15b2b42	raft: default to protocol v3 (#11572 ) Many of Nomad's Autopilot features require raft protocol version 3. Set the default raft protocol to 3, and improve the upgrade documentation.	2022-02-03 15:03:12 -05:00
Seth Hoenig	db2347a86c	cleanup: prevent leaks from time.After This PR replaces use of time.After with a safe helper function that creates a time.Timer to use instead. The new function returns both a time.Timer and a Stop function that the caller must handle. Unlike time.NewTimer, the helper function does not panic if the duration set is <= 0.	2022-02-02 14:32:26 -06:00
Derek Strickland	460416e787	Update IsEmpty to check for pre-1.2.4 fields (#11930 )	2022-01-28 14:41:49 -05:00
Derek Strickland	b3c8ab9be7	Update IsEmpty to check for pre-1.2.4 fields (#11930 )	2022-01-26 11:31:37 -05:00
Tim Gross	1dad0e597e	fix integer bounds checks (#11815 ) * driver: fix integer conversion error The shared executor incorrectly parsed the user's group into int32 and then cast to uint32 without bounds checking. This is harmless because an out-of-bounds gid will throw an error later, but it triggers security and code quality scans. Parse directly to uint32 so that we get correct error handling. * helper: fix integer conversion error The autopilot flags helper incorrectly parses a uint64 to a uint which is machine specific size. Although we don't have 32-bit builds, this sets off security and code quality scaans. Parse to the machine sized uint. * driver: restrict bounds of port map The plugin server doesn't constrain the maximum integer for port maps. This could result in a user-visible misconfiguration, but it also triggers security and code quality scans. Restrict the bounds before casting to int32 and return an error. * cpuset: restrict upper bounds of cpuset values Our cpuset configuration expects values in the range of uint16 to match the expectations set by the kernel, but we don't constrain the values before downcasting. An underflow could lead to allocations failing on the client rather than being caught earlier. This also make security and code quality scanners happy. * http: fix integer downcast for per_page parameter The parser for the `per_page` query parameter downcasts to int32 without bounds checking. This could result in underflow and nonsensical paging, but there's no server-side consequences for this. Fixing this will silence some security and code quality scanners though.	2022-01-25 11:16:48 -05:00
Seth Hoenig	0030424384	Merge pull request #11889 from hashicorp/build-update-circle build: upgrade circleci configuration	2022-01-24 10:18:21 -06:00
Seth Hoenig	2f0cfb5740	build: upgrade and speedup circleci configuration This PR upgrades our CI images and fixes some affected tests. - upgrade go-machine-image to premade latest ubuntu LTS (ubuntu-2004:202111-02) - eliminate go-machine-recent-image (no longer necessary) - manage GOPATH in GNUMakefile (see https://discuss.circleci.com/t/gopath-is-set-to-multiple-directories/7174) - fix tcp dial error check (message seems to be OS specific) - spot check values measured instead of specifically 'RSS' (rss no longer reported in cgroups v2) - use safe MkdirTemp for generating tmpfiles NOT applied: (too flakey) - eliminate setting GOMAXPROCS=1 (build tools were also affected by this setting) - upgrade resource type for all imanges to large (2C -> 4C)	2022-01-24 08:28:14 -06:00
Seth Hoenig	f2a71fd0d9	deps: pty has new home github.com/kr/pty was moved to github.com/creack/pty Swap this dependency so we can upgrade to the latest version and no longer need a replace directive.	2022-01-19 12:33:05 -06:00
Seth Hoenig	2a5f7c0386	deps: swap gzip handler for gorilla This has been pinned since the Go modules migration, because the nytimes gzip handler was modified in version v1.1.0 in a way that is no longer compatible. Pretty sure it is this commit: `c551b6c3b4` Instead use handler.CompressHandler from gorilla, which is a web toolkit we already make use of for other things.	2022-01-19 11:52:19 -06:00
Nomad Release bot	de3070d49a	Generate files for 1.2.4 release	2022-01-18 23:43:00 +00:00
Dave May	330d24a873	cli: Add event stream capture to nomad operator debug (#11865 )	2022-01-17 21:35:51 -05:00
Michael Schurter	99c863f909	cli: improve debug error messages (#11507 ) Improves `nomad debug` error messages when contacting agents that do not have /v1/agent/host endpoints (the endpoint was added in v0.12.0) Part of #9568 and manually tested against Nomad v0.8.7. Hopefully isRedirectError can be reused for more cases listed in #9568	2022-01-17 11:15:17 -05:00
Tim Gross	33f7c6cba4	csi: when warning for multiple prefix matches, use full ID (#11853 ) When the `volume deregister` or `volume detach` commands get an ID prefix that matches multiple volumes, show the full length of the volume IDs in the list of volumes shown so so that the user can select the correct one.	2022-01-14 12:25:48 -05:00
Tim Gross	9c4864badd	freebsd: build fix for ARM7 32-bit (#11854 ) The size of `stat_t` fields is architecture dependent, which was reportedly causing a build failure on FreeBSD ARM7 32-bit systems. This changeset matches the behavior we have on Linux.	2022-01-14 12:25:32 -05:00
James Rasell	82b168bf34	Merge pull request #11403 from hashicorp/f-gh-11059 agent/docs: add better clarification when top-level data dir needs setting	2022-01-13 16:41:35 +01:00
Luiz Aoqui	d48e50da9a	Fix log level parsing from lines that include a timestamp (#11838 )	2022-01-13 09:56:35 -05:00
Michael Schurter	e6eff95769	agent: validate reserved_ports are valid Goal is to fix at least one of the causes that can cause a node to be ineligible to receive work: https://github.com/hashicorp/nomad/issues/9506#issuecomment-1002880600	2022-01-12 14:21:47 -08:00
Seth Hoenig	8c97ffd68e	cleanup: stop referencing depreceted HeaderMap field Remove reference to the deprecated ResponseRecorder.HeaderMap field, instead calling .Response.Header() to get the same data. closes #10520	2022-01-12 10:32:54 -06:00
Derek Strickland	0a8e03f0f7	Expose Consul template configuration parameters (#11606 ) This PR exposes the following existing`consul-template` configuration options to Nomad jobspec authors in the `{job.group.task.template}` stanza. - `wait` It also exposes the following`consul-template` configuration to Nomad operators in the `{client.template}` stanza. - `max_stale` - `block_query_wait` - `consul_retry` - `vault_retry` - `wait` Finally, it adds the following new Nomad-specific configuration to the `{client.template}` stanza that allows Operators to set bounds on what `jobspec` authors configure. - `wait_bounds` Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-01-10 10:19:07 -05:00
Charlie Voiselle	98a240cd99	Make number of scheduler workers reloadable (#11593 ) ## Development Environment Changes * Added stringer to build deps ## New HTTP APIs * Added scheduler worker config API * Added scheduler worker info API ## New Internals * (Scheduler)Worker API refactor—Start(), Stop(), Pause(), Resume() * Update shutdown to use context * Add mutex for contended server data - `workerLock` for the `workers` slice - `workerConfigLock` for the `Server.Config.NumSchedulers` and `Server.Config.EnabledSchedulers` values ## Other * Adding docs for scheduler worker api * Add changelog message Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>	2022-01-06 11:56:13 -05:00
Tim Gross	2806dc2bd7	docs/tests for multiple HTTP address config (#11760 )	2022-01-03 10:17:13 -05:00
Kevin Schoonover	5d9a506bc0	agent: support multiple http address in addresses.http (#11582 )	2022-01-03 09:33:53 -05:00
James Rasell	45f4689f9c	chore: fixup inconsistent method receiver names. (#11704 )	2021-12-20 11:44:21 +01:00
Tim Gross	c7cc3cf4dc	cli: stream raft logs to operator raft logs subcommand (#11684 ) The `nomad operator raft logs` command uses a raft helper that reads in the logs from raft and serializes them to JSON. The previous implementation returned the slice of all logs and then serializes the entire object. Update the helper to stream the log entries and then serialize them as newline-delimited JSON.	2021-12-16 13:38:58 -05:00
Tim Gross	f2615992a4	cli: unhide advanced operator raft debugging commands (#11682 ) The `nomad operator raft` and `nomad operator snapshot state` subcommands for inspecting on-disk raft state were hidden and undocumented. Expose and document these so that advanced operators have support for these tools.	2021-12-16 10:32:11 -05:00
Tim Gross	536e3c5282	`nomad eval list` command (#11675 ) Use the new filtering and pagination capabilities of the `Eval.List` RPC to provide filtering and pagination at the command line. Also includes note that `nomad eval status -json` is deprecated and will be replaced with a single evaluation view in a future version of Nomad.	2021-12-15 11:58:38 -05:00
Tim Gross	f8a133a810	cli: ensure `-stale` flag is respected by `nomad operator debug` (#11678 ) When a cluster doesn't have a leader, the `nomad operator debug` command can safely use stale queries to gracefully degrade the consistency of almost all its queries. The query parameter for these API calls was not being set by the command. Some `api` package queries do not include `QueryOptions` because they target a specific agent, but they can potentially be forwarded to other agents. If there is no leader, these forwarded queries will fail. Provide methods to call these APIs with `QueryOptions`.	2021-12-15 10:44:03 -05:00
Tim Gross	a0cf5db797	provide `-no-shutdown-delay` flag for job/alloc stop (#11596 ) Some operators use very long group/task `shutdown_delay` settings to safely drain network connections to their workloads after service deregistration. But during incident response, they may want to cause that drain to be skipped so they can quickly shed load. Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and `nomad job stop` commands that bypasses the delay. This sets a new desired transition state on the affected allocations that the allocation/task runner will identify during pre-kill on the client. Note (as documented here) that using this flag will almost always result in failed inbound network connections for workloads as the tasks will exit before clients receive updated service discovery information and won't be gracefully drained.	2021-12-13 14:54:53 -05:00
Tim Gross	624ecab901	evaluations list pagination and filtering (#11648 ) API queries can request pagination using the `NextToken` and `PerPage` fields of `QueryOptions`, when supported by the underlying API. Add a `NextToken` field to the `structs.QueryMeta` so that we have a common field across RPCs to tell the caller where to resume paging from on their next API call. Include this field on the `api.QueryMeta` as well so that it's available for future versions of List HTTP APIs that wrap the response with `QueryMeta` rather than returning a simple list of structs. In the meantime callers can get the `X-Nomad-NextToken`. Add pagination to the `Eval.List` RPC by checking for pagination token and page size in `QueryOptions`. This will allow resuming from the last ID seen so long as the query parameters and the state store itself are unchanged between requests. Add filtering by job ID or evaluation status over the results we get out of the state store. Parse the query parameters of the `Eval.List` API into the arguments expected for filtering in the RPC call.	2021-12-10 13:43:03 -05:00
Lukas W	0e5958d671	CLI: Return non-zero exit code when deployment fails in `nomad run` (#11550 ) * Exit non-zero from run command if deployment fails * Fix typo in deployment monitor introduced in 0edda11	2021-12-09 09:09:28 -05:00
Vyacheslav Morov	6a244f18ad	cli: Add var args to plan output. (#11631 )	2021-12-07 10:43:52 -05:00
Tim Gross	03e697a69d	scheduler: config option to reject job registration (#11610 ) During incident response, operators may find that automated processes elsewhere in the organization can be generating new workloads on Nomad clusters that are unable to handle the workload. This changeset adds a field to the `SchedulerConfiguration` API that causes all job registration calls to be rejected unless the request has a management ACL token.	2021-12-06 15:20:34 -05:00
Derek Strickland	fb6dbffa59	Override TLS flags individually for meta commands (#11592 ) * Override TLS flags individually for meta commands * Update command/meta.go Co-authored-by: Tim Gross <tgross@hashicorp.com> Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-12-01 12:07:48 -05:00

... 3 4 5 6 7 ...

3539 commits