open-nomad

Commit Graph

Author	SHA1	Message	Date
Luiz Aoqui	61d79e75b0	docs: add docs for the autoscaler `on_error` and `on_check_error` configuration (#12083)	2022-02-24 12:12:29 -05:00
James Rasell	bc6056cbbe	Merge pull request #12122 from hashicorp/b-api-remove-namespace-test-ent-tag api: remove ent build tag on namespace test file.	2022-02-24 17:13:15 +01:00
James Rasell	8f175d44da	api: remove ent build tag on namespace test file.	2022-02-24 16:40:04 +01:00
Tim Gross	22cf24a6bd	CSI: retry claims from client when max claims are reached (#12113) When the alloc runner claims a volume, an allocation for a previous version of the job may still have the volume claimed because it's still shutting down. In this case we'll receive an error from the server. Retry this error until we succeed or until a very long timeout expires, to give operators a chance to recover broken plugins. Make the alloc runner hook tolerant of temporary RPC failures.	2022-02-24 10:39:07 -05:00
Tim Gross	cfe3117af8	CSI: enforce usage at claim time (#12112) * Remove redundant schedulable check in `FreeWriteClaims`. If a volume has been created but not yet claimed, its capabilities will be checked in `WriteSchedulable` at both scheduling time and claim time. We don't need to also check them in the `FreeWriteClaims` method. * Enforce maximum volume claims for writers. When the scheduler checks feasibility for CSI volumes, the check is fairly loose: earlier versions of the same job are not counted as active claims. This allows the scheduler to place new allocations for the new version of a job, under the assumption that we'll replace the existing allocations and their volume claims. But when the alloc runner claims the volume, we need to enforce the active claims even if they're for allocations of an earlier version of the job. Otherwise we'll try to mount a volume that's currently being unmounted, and this will cause replacement allocations to frequently fail. * Enforce single-node reader check for read-only volumes. When the alloc runner makes a claim for a read-only volume, we only check that the volume is potentially schedulable and not that it actually has free read claims.	2022-02-24 09:37:37 -05:00
Sander Mol	42b338308f	add go-sockaddr templating support to nomad consul address (#12084)	2022-02-24 09:34:54 -05:00
Florian Apolloner	3bced8f558	namespaces: allow enabling/disabling allowed drivers per namespace	2022-02-24 09:27:32 -05:00
Seth Hoenig	57b9c64b8f	Merge pull request #12107 from hashicorp/use-bbolt core: swap bolt impl and enable configuring raft freelist sync behavior	2022-02-24 08:25:54 -06:00
Seth Hoenig	8e6d97744b	docs: emphasize snapshot before upgrading	2022-02-24 08:22:41 -06:00
Tim Gross	5b7b9fdafb	csi: tolerate missing plugins on job delete (#12114) If a plugin job fails before successfully fingerprinting the plugins, the plugin will not exist when we try to delete the job. Tolerate missing plugins.	2022-02-24 08:53:15 -05:00
Seth Hoenig	a0350b0608	command: switch from raft-boltdb to raft-boltdb/v2	2022-02-23 14:43:59 -06:00
Seth Hoenig	da9b978806	client: resolve rebase conflict	2022-02-23 14:32:32 -06:00
Seth Hoenig	0420724c14	build: disallow old boltdb during build	2022-02-23 14:28:31 -06:00
Seth Hoenig	ca84ba12ac	agent: switch to go.etcd.io/bbolt for state store This PR modifies the server and client agents to use `go.etcd.io/bbolt` as the implementation for their state stores.	2022-02-23 14:28:31 -06:00
Seth Hoenig	de95998faa	core: switch to go.etcd.io/bbolt This PR swaps the underlying BoltDB implementation from boltdb/bolt to go.etcd.io/bbolt. In addition, the Server has a new configuration option for disabling NoFreelistSync on the underlying database. Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81 Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720	2022-02-23 14:26:41 -06:00
Tim Gross	246db87a74	CSI: allow for concurrent plugin allocations (#12078) The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops. Update the data structure for the dynamic registry so that more recent allocations take over as the instance manager singleton, but we still preserve the previous running allocations so that restores work without racing. Multiple allocations can run on a client for the same plugin, even if only during updates. Provide each plugin task a unique path for the control socket so that the tasks don't interfere with each other.	2022-02-23 15:23:07 -05:00
Tim Gross	e5a52b0b6f	CSI: add missing plugin capabilities to api response (#12108) Detection of the full set of plugin capabilities was added in Nomad 1.1 for the volume creation workflow, but these were not added to the API response for plugins.	2022-02-23 15:22:29 -05:00
Tim Gross	17dc0adee3	csi: fix broken test (#12110)	2022-02-23 13:48:39 -05:00
Charlie Voiselle	01f6e57602	Fixed scheduler config examples (#12049)	2022-02-23 12:58:29 -05:00
Tim Gross	57a546489f	CSI: minor refactoring (#12105) * rename method checking that free write claims are available * use package-level variables for claim errors * semgrep fix for testify	2022-02-23 11:13:51 -05:00
Tim Gross	de134d9783	csi: fix mocked modes in volumewatcher test (#12104) The volumewatcher test incorrectly represents the change in attachment and access modes introduced in Nomad 1.1.0 to support volume creation. This leads to a test that happens to pass but only accidentally. Update the test to correctly represent the volume modes set by the existing claims on the test volumes.	2022-02-23 09:51:20 -05:00
Mike Nomitch	f3d1cf4dbd	Merge pull request #12065 from hashicorp/docs-add-form-link Adding link to interview form	2022-02-22 11:05:20 -08:00
Tim Gross	309ac6c3d8	csi: don't wait to fire initial unmount RPC (#12102) In PR #11892 we updated the `csi_hook` to unmount the volume locally via the CSI node RPCs before releasing the claim from the server. The timer for this hook was initialized with the retry time, forcing us to wait 1s before making the first unmount RPC calls. Use the new helper for timers to ensure we clean up the timer nicely.	2022-02-22 13:43:06 -05:00
Luiz Aoqui	02ee075506	docs: update link to `mount` in Docker task driver (#12101)	2022-02-22 13:39:49 -05:00
Michael Schurter	6ccdc6a022	Merge pull request #11600 from hashicorp/f-remove-unused-version core: remove all traces of unused protocol version	2022-02-22 09:51:42 -08:00
Michael Schurter	5410ec81c5	docs: add changelog for #11600	2022-02-18 16:16:19 -08:00
Michael Schurter	7494a0c4fd	core: remove all traces of unused protocol version Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risks to leave, I felt like removing it was best because: 1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number. 2. Nomad's HTTP API has evolved extensively without incrementing `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose. 3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` is an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the only protocol version relevant to Nomad developers and operators. The other protocol versions are either dead code or have never changed (Serf). 4. If we were to need to version the RPC, HTTP API, or Serf protocols, I don't think these configuration parameters and variables are the best choice. If we come to that point we should choose a versioning scheme based on the use case and modern best practices -- not this 6+ year old dead code.	2022-02-18 16:12:36 -08:00
Adrián López	b1565c7bf4	Update autoscaler AWS ASG target docs: AWS keypair can be empty (#11977)	2022-02-18 17:29:19 -05:00
James Rasell	f2d73442e8	docs: add autoscaler hcloud target plugin link. (#12087)	2022-02-18 17:28:38 -05:00
Michael Schurter	48aaa2c7d9	Merge pull request #11975 from hashicorp/f-connect-debugging connect: write envoy bootstrap debugging info	2022-02-18 13:56:22 -08:00
Seth Hoenig	6406615ebd	Merge pull request #12011 from hashicorp/cc-use-proxyid connect: bootstrap envoy using -proxy-id	2022-02-18 15:42:21 -06:00
Seth Hoenig	6550c90198	connect: bootstrap envoy using -proxy-id This PR modifies the Consul CLI arguments used to bootstrap envoy for Connect sidecars to make use of '-proxy-id' instead of '-sidecar-for'. Nomad registers the sidecar service, so we know what ID it has. The '-sidecar-for' was intended for use when you only know the name of the service for which the sidecar is being created. The improvement here is that using '-proxy-id' does not require an underlying request for listing Consul services. This will make the interaction between Nomad and Consul more efficient. Closes #10452	2022-02-18 14:58:23 -06:00
Michael Schurter	27b8112123	connect: write envoy bootstrap debugging info When Consul Connect just works, it's wonderful. When it doesn't work it can be exceedingly difficult to debug: operators have to check task events, Nomad logs, Consul logs, Consul APIs, and even then critical information is missing. Using Consul to generate a bootstrap config for Envoy is notoriously difficult. Nomad doesn't even log stderr, so operators are left trying to piece together what went wrong. This patch attempts to provide maximal context which unfortunately includes secrets. Secrets are always restricted to the secrets/ directory. This makes debugging a little harder, but allows operators to know exactly what operation Nomad was trying to perform. What's added: - stderr is sent to alloc/logs/envoy_bootstrap.stderr.0 - the CLI is written to secrets/.envoy_bootstrap.cmd - the environment is written to secrets/.envoy_bootstrap.env as JSON Accessing this information is unfortunately awkward: ``` nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.env nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.cmd nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0 ``` The above assumes an alloc id that starts with `b36a` and a Connect sidecar proxy for a service named `count-countdash`. If the alloc is unable to start successfully, the debugging files are only accessible from the host filesystem.	2022-02-18 12:02:36 -08:00
Seth Hoenig	c3d28b996d	Merge pull request #12079 from hashicorp/deps-update-raft deps: upgrade hashicorp/raft to v1.3.5	2022-02-18 10:03:21 -06:00
Seth Hoenig	c8d27257e7	deps: upgrade hashicorp/raft to v1.3.5	2022-02-17 13:49:56 -06:00
Seth Hoenig	a17e3ec83f	Merge pull request #12080 from hashicorp/b-fix-gobin-circle build: BIN value must use single-path GOPATH value	2022-02-17 13:48:36 -06:00
Jai	7a211de861	Merge pull request #12082 from hashicorp/f-ui/refactor-namespace namespace refactoring	2022-02-17 11:04:36 -05:00
Seth Hoenig	7519f3e2f5	build: use single-path GOPATH set in makefile When GOBIN is not set, BIN must be set to the single-path workaround value of GOPATH, because Circle.	2022-02-17 09:26:13 -06:00
Michael Klein	99e8583990	fix: linting issues and remove remaining pauseTest	2022-02-17 16:06:49 +01:00
Michael Klein	8c4bbdb38c	fix: reflect namespace change volume-detail-test	2022-02-17 15:20:11 +01:00
Michael Klein	0f07af68cc	fix: prettier related volume-list - test	2022-02-17 15:19:45 +01:00
Michael Klein	8d4a915941	fix: prettier related test-failure task-group-detail	2022-02-17 15:19:20 +01:00
Michael Klein	a41ed7d10f	fix: task-group-detail tests due to namespace changes URLs have changed - tests need to reflect that.	2022-02-17 14:50:05 +01:00
Michael Klein	649f59190e	fix: breadcrumbs allocations due to recent namespace changes * change the breadcrumbs generation to use `idWithNamespace` * adapt tests to reflect new URLs for jobs with namespaces	2022-02-17 14:38:27 +01:00
Michael Klein	e15883ad79	fix: use `@<namespace>` with remaining `JobDetail.visit`s	2022-02-17 13:22:15 +01:00
Michael Klein	e46df89a7a	fix: pack-detail test We need to change the way we access `JobDetail`-pages based on recent namespace changes.	2022-02-17 12:59:58 +01:00
Michael Klein	9a37550204	fix: anonymous policy test job-details We need to access job-details differently when they have a namespace due to recent namespace changes - we need to make the tests reflect that.	2022-02-17 12:45:01 +01:00
Michael Klein	c9a839b76f	fix: less cleverness™ when checking currentURL job-details There is no need to check the namespace query-param anymore with `urlWithNamespace` but some tests still are using this. We refactor the tests to be less clever and check the URL in a more manual approach by explicitly defining what the URL should look like if a job belongs to a namespace.	2022-02-17 12:42:23 +01:00
Michael Klein	37993d9fb8	fix: client-detail-test no default namespace param Recent changes changed the behavior of not adding the `@default`-namespace - we need to adapt the tests accordingly	2022-02-17 12:41:33 +01:00
Michael Klein	b0a90b425e	fix: allocations page tests regarding job links Default namespaced jobs don't append the `@default`-id anymore due to recent `jobs.job#serialize` changes.	2022-02-17 11:56:29 +01:00

22611 Commits All Branches Search