open-nomad

Commit Graph

Author	SHA1	Message	Date
Luiz Aoqui	4dd8b6b571	cli: include all possible scores in alloc status metric table (#11128)	2021-09-08 17:30:11 -04:00
Luiz Aoqui	305f0b5702	ui: set the job namespace when redirecting after the job is dispatched (#11141)	2021-09-07 12:27:33 -04:00
Isabel Suchanek	ab51050ce8	events: fix wildcard namespace handling (#10935) This fixes a bug in the event stream API where it currently interprets namespace=* as an actual namespace, not a wildcard. When Nomad parses incoming requests, it sets namespace to default if not specified, which means the request namespace will never be an empty string, which is what the event subscription was checking for. This changes the conditional logic to check for a wildcard namespace instead of an empty one. It also updates some event tests to include the default namespace in the subscription to match current behavior. Fixes #10903	2021-09-02 09:36:55 -07:00
Luiz Aoqui	eb0ed980a5	ui: set namespace when looking for and displaying children jobs (#11110)	2021-09-01 14:40:25 -04:00
Mahmood Ali	641afebeed	update golang to 1.16.7 (#11083)	2021-08-25 11:56:46 -04:00
Roopak Venkatakrishnan	dcf5981bcd	Update x/sys to support go 1.17 (#11065) Co-authored-by: James Rasell <jrasell@hashicorp.com>	2021-08-25 17:23:01 +02:00
Luiz Aoqui	104d29e808	Don't timestamp active log file (#11070) * don't timestamp active log file * website: update log_file default value * changelog: add entry for #11070 * website: add upgrade instructions for log_file in v1.1.4 and v1.2.0	2021-08-23 11:27:34 -04:00
Mahmood Ali	84a3522133	Consider all system jobs for a new node (#11054) When a node becomes ready, create an eval for all system jobs across namespaces. The previous code used `job.ID` to deduplicate evals, but that ignores the job namespace. Thus, if there are multiple jobs in different namespaces sharing the same ID/Name, only one will be considered for running on the new node, and Nomad may skip running some system jobs on that node.	2021-08-18 09:50:37 -04:00
Michael Schurter	a7aae6fa0c	Merge pull request #10848 from ggriffiths/listsnapshot_secrets CSI Listsnapshot secrets support	2021-08-10 15:59:33 -07:00
Mahmood Ali	ea003188fa	system: re-evaluate node on feasibility changes (#11007) Fix a bug where system jobs may fail to be placed on a node that initially was not eligible for system job placement. This change causes the scheduler to re-evaluate the node if any attribute used in feasibility checks changes. Fixes https://github.com/hashicorp/nomad/issues/8448	2021-08-10 17:17:44 -04:00
Mahmood Ali	bfc766357e	deployments: canary=0 is implicitly autopromote (#11013) In a multi-task-group job, treat groups with 0 canaries as auto-promote. This change fixes an edge case where Nomad required a manual promotion if the job had any group with canary=0 and the rest of the groups had auto_promote set. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2021-08-10 17:06:40 -04:00
Mahmood Ali	efcc8bf082	Speed up client startup and registration (#11005) Speed up client startup by retrying more until the servers are known. Currently, if client fingerprinting is fast and finishes before the client connects to a server, node registration may be delayed by 15 seconds or so! Ideally, we'd wait until the client discovers the servers and then retry immediately, but that requires significant code changes. Here, we simply retry the node registration request every second. That's basically the equivalent of checking whether the client has discovered servers every second, which should be a cheap operation. When testing this change on my local computer, where both servers and clients are co-located, the time from startup till node registration dropped from 34 seconds to 8 seconds!	2021-08-10 17:06:18 -04:00
Luiz Aoqui	c1d1906628	ui: add missing pipe separator in parameterized and periodic jobs (#11020)	2021-08-10 13:48:20 -04:00
Jai	29a7fe6efa	Merge pull request #10666 from hashicorp/b-ui/search-namespaces ui: Fix fuzzy search namespace-handling	2021-08-10 13:13:20 -04:00
Jai Bhagat	a9b9132f35	edit hierarchy to lead with namespace before job	2021-08-10 10:35:36 -04:00
Luiz Aoqui	d283e90c35	ui: only display "Dispatch Job" button on parameterized jobs (#11019)	2021-08-09 17:49:08 -04:00
Michael Schurter	c39ca0773d	Merge pull request #10951 from hashicorp/b-cn-proxy consul/connect: avoid warn messages on connect proxy errors	2021-08-06 15:25:40 -07:00
James Rasell	a9a04141a3	consul/connect: avoid warn messages on connect proxy errors When creating a TCP proxy bridge for Connect tasks, we are at the mercy of either end for managing the connection state. For long lived gRPC connections the proxy could reasonably expect to stay open until the context was cancelled. For the HTTP connections used by connect native tasks, we experience connection disconnects. The proxy gets recreated as needed on follow up requests, however we also emit a WARN log when the connection is broken. This PR lowers the WARN to a TRACE, because these disconnects are to be expected. Ideally we would proxy at the HTTP layer; however, Consul or the connect native task could be configured to expect mTLS, preventing Nomad from acting as a MITM on the requests. We also can't manage the proxy lifecycle more intelligently, because we have no control over the HTTP client or server and how they wish to manage connection state. What we have now works, it's just noisy. Fixes #10933	2021-08-05 11:27:35 +02:00
James Rasell	c7449b4810	changelog: add entry for #10929	2021-08-05 10:48:36 +02:00
Luiz Aoqui	7341615fac	changelog: add entry for #10934 (#11001)	2021-08-04 11:33:18 -04:00
Mahmood Ali	0bc12fba7c	Only initialize task.VolumeMounts when non-nil (#10990) 1.1.3 had a bug where task.VolumeMounts would be an empty slice instead of nil. Eventually, it gets canonicalized and is set to `nil`, but it seems to confuse dry-run planning. The regression was introduced in https://github.com/hashicorp/nomad/pull/10855/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ecL1028-R1037 . Curiously, it's the only place where the `len(apiTask.VolumeMounts)` check was dropped. I assume it was dropped accidentally. Fixes #10981	2021-08-02 13:08:10 -04:00
Mahmood Ali	22a91f7003	update changelog (#10963)	2021-07-28 16:02:04 -04:00
Grant Griffiths	fecbbaee22	CSI ListSnapshots secrets implementation Signed-off-by: Grant Griffiths <ggriffiths@purestorage.com>	2021-07-28 11:30:29 -07:00
Mahmood Ali	62fe6f12f9	api: revert to defaulting to http/1 (#10958) * api: revert to defaulting to http/1 PR #10778 incidentally changed the api http client to connect with HTTP/2 first. However, the websocket libraries used in `alloc exec` features don't handle http/2 well, and don't downgrade to http/1 gracefully. Given that the switch was incidental and not requested by users, this change reverts the default to http/1. api consumers can still opt in to forcing http/2 by setting custom http clients. Fixes #10922	2021-07-28 11:21:53 -04:00
Michael Schurter	ea996c321d	Merge pull request #10916 from hashicorp/f-audit-log-mode Add audit log file mode config parameter	2021-07-27 12:16:37 -07:00
Michael Schurter	d64d70607a	docs: add changelog for #10916	2021-07-27 11:51:38 -07:00
Mahmood Ali	ac3cf10849	nomad: only activate one-time auth tokens with 1.1.0 (#10952) Fix a panic in handling one-time auth tokens, used to support `nomad ui --authenticate`. If the Nomad leader runs 1.1.x while some servers still run 1.0.x, the pre-1.1.0 servers risk crashing and the cluster may lose quorum. That can happen when the `nomad ui --authenticate` command is issued, or when the leader scans for expired tokens every 10 minutes. Fixes #10943.	2021-07-27 13:17:55 -04:00
Mahmood Ali	d97927ebcf	cli: Use glint to determine if os.Stdout is tty (#10926) Use glint to determine if os.Stdout is a terminal. The glint Terminal renderer expects os.Stdout not only to be a terminal, but also to have a non-zero size (see `b492b545f6/renderer_term.go`, L39-L46). It's unclear how this condition arises, but because Nomad's own check lacks this additional condition, Nomad may render deployment progress through glint when glint cannot support it. By using glint to perform the check, we eliminate the risk of misjudgement.	2021-07-23 11:27:47 -04:00
Jai	0ccf60444d	Merge pull request #10893 from hashicorp/f-ui/namespace-acl-bug edit ember-can to add additional attribute for namespace	2021-07-22 12:57:34 -04:00
Jai Bhagat	5d33884cdc	ui: fixes #10885	2021-07-22 11:44:25 -04:00
Seth Hoenig	54d9bad657	Merge pull request #10904 from hashicorp/b-no-affinity-intern core: remove internalization of affinity strings	2021-07-22 09:09:07 -05:00
Luiz Aoqui	484037aff1	fix `nomad alloc signal` help message (#10917)	2021-07-21 11:02:44 -04:00
Luiz Aoqui	a26874215a	changelog: add entry for #10675 (#10919)	2021-07-21 10:05:48 -04:00
Mahmood Ali	8df9b1fd0f	client: avoid acting on stale data after launch (#10907) When the client launches, use a consistent read to fetch its own allocs, but allow stale reads afterwards as long as reads don't revert to an older state. This change addresses an edge case affecting a restarting client. When a client restarts, it may fetch stale data concerning its allocs: allocs that completed prior to the client shutdown may still have "run/running" desired/client status, causing the client to attempt to re-run them. An alternative approach is to track the indices such that the client sets MinQueryIndex to the maximum index the client ever saw, or to compare received allocs against locally restored client state. Garbage collection complicates this approach (local knowledge is not complete), and the approach still risks starting "dead" allocations (e.g. the allocation may have been placed when the client just restarted and already been rescheduled by the time the client started). The approach here is effective against all kinds of staleness problems with small overhead.	2021-07-20 15:13:28 -04:00
Michael Schurter	efe8ea2c2c	Merge pull request #10849 from benbuzbee/benbuz/fix-destroy Don't treat a failed recover + successful destroy as a successful recover	2021-07-19 10:49:31 -07:00
Michael Schurter	6aee3de420	docs: add changelog entry for #10849	2021-07-16 15:58:58 -07:00
Seth Hoenig	ac5c83cafd	core: remove internalization of affinity strings Basically the same as #10896 but with the Affinity struct. Since we use reflect.DeepEqual for job comparison, there is a risk of false positives for changes due to a job struct with memoized vs non-memoized strings. Closes #10897	2021-07-15 15:15:39 -05:00
Mahmood Ali	996ea1fa46	Merge pull request #10875 from hashicorp/b-namespace-flag-override cli: `-namespace` should override job namespace	2021-07-14 17:28:36 -04:00
Mahmood Ali	26509f2299	Merge pull request #10864 from hashicorp/b-10746-plan-datacenter scheduler: datacenter updates should be destructive	2021-07-14 17:25:13 -04:00
Seth Hoenig	3fce1d3f11	Merge pull request #10898 from hashicorp/f-rm-vendor build: no longer use vendor directory	2021-07-14 13:00:41 -05:00
Seth Hoenig	1b5f902842	docs: update changelog	2021-07-14 11:21:00 -05:00
Seth Hoenig	a4af3fcad0	docs: add changelog entry	2021-07-14 10:46:40 -05:00
James Rasell	66d3b98db5	Merge pull request #10892 from hashicorp/b-gh-10890 deps: update consul-template to v0.25.2.	2021-07-14 09:26:16 +02:00
Luiz Aoqui	dd8213abc1	changelog: add entry for GH-10563 (#10894)	2021-07-13 16:12:41 -04:00
James Rasell	8da0663a06	changelog: add entry for #10892	2021-07-13 10:29:44 +02:00
Seth Hoenig	f80ae067a8	consul/connect: fix bug causing high cpu with multiple connect sidecars in group This PR fixes a bug where the underlying Envoy process of a Connect gateway would consume a full core of CPU if there is more than one sidecar or gateway in a group. The utilization was being caused by Consul injecting an envoy_ready_listener on 127.0.0.1:8443, to which only one of the Envoys could bind. The others would spin in a hot loop trying to bind the listener. As a workaround, we now specify -address during the Envoy bootstrap config step, which is how Consul maps this ready listener. Because the envoy_admin_listener already exists, and we need to continue supporting gateways running in host networking mode (where we want to use the same port value coming from the service.port field), we now bind the admin listener to the 127.0.0.2 loopback interface, and the ready listener takes 127.0.0.1. This shouldn't make a difference in the 99.999% use case where Envoy is being run in its official Docker container. Advanced users can reference ${NOMAD_ENVOY_ADMIN_ADDR_<service>} (as they ought to) if needed, as well as the new variable ${NOMAD_ENVOY_READY_ADDR_<service>} for the envoy_ready_listener.	2021-07-09 14:34:44 -05:00
Tim Gross	5937f54fc3	client: interpolate meta blocks with task environment (#10876) Adds missing interpolation step to the `meta` blocks when building the task environment. Also fixes incorrect parameter order in the test assertion and adds diagnostics to the test.	2021-07-08 16:03:15 -04:00
Seth Hoenig	7c3db812fd	consul/connect: remove sidecar proxy before removing parent service This PR will have Nomad de-register a sidecar proxy service before attempting to de-register the parent service. Otherwise, Consul will emit a warning and an error. Fixes #10845	2021-07-08 13:30:19 -05:00
Tim Gross	a3bc87a2eb	cli: `-namespace` should override job namespace When a jobspec doesn't include a namespace, we provide it with the default namespace, but this ends up overriding the explicit `-namespace` flag. This changeset uses the same logic as region parsing to create an order of precedence: the query string parameter (the `-namespace` flag) overrides the API request body which overrides the jobspec.	2021-07-08 13:17:27 -04:00
Seth Hoenig	868b246128	consul/connect: Avoid assumption of parent service when filtering connect proxies This PR uses regex-based matching for sidecar proxy services and checks when syncing with Consul. Previously we would check if the parent of the sidecar was still being tracked in Nomad. This is a false invariant, one on which we must not depend when we make #10845 work. Fixes #10843	2021-07-08 09:43:41 -05:00


64 Commits