open-nomad

Author	SHA1	Message	Date
Chris Baker	7c6071e6c4	updated alloc_endpoint to mutate a copy of the returned allocation, instead of the instance in the state store	2020-11-15 17:52:50 +00:00
Chris Baker	b244d5e949	documenting test for #9367	2020-11-15 17:47:50 +00:00
Seth Hoenig	bb8a5816a0	jobspec: add support for headers in artifact stanza This PR adds the ability to set HTTP headers when downloading an artifact from an `http` or `https` resource. The implementation in `go-getter` is such that a new `HTTPGetter` must be created for each artifact that sets headers (as opposed to conveniently setting headers per-request). This PR maintains the memoization of the default Getter objects, creating new ones only for artifacts where headers are set. Closes #9306	2020-11-13 12:03:54 -06:00
Seth Hoenig	b19bc6be2b	consul: prevent re-registration churn by correctly comparing sidecar tags Previously, connect sidecars would be re-registered with consul every cycle of Nomad's reconciliation loop around Consul service registrations. This is because part of the comparison used `reflect.DeepEqual` on []string objects, which returns false when one object is `[]string{}` and the other is `[]string(nil)`. Unfortunately, this was always the case, and every Connect sidecar service would be re-registered on every iteration, which happens every 30 seconds.	2020-11-11 18:01:17 -06:00
Nick Ethier	5e1634eda1	structs: canonicalize allocatedtaskresources to populate shared ports (#9309 )	2020-11-11 16:21:47 -05:00
Tim Gross	60874ebe25	csi: Postrun hook should not change mode (#9323 ) The unpublish workflow requires that we know the mode (RW vs RO) if we want to unpublish the node. Update the hook and the Unpublish RPC so that we mark the claim for release in a new state but leave the mode alone. This fixes a bug where RO claims were failing node unpublish. The core job GC doesn't know the mode, but we don't need it for that workflow, so add a mode specifically for GC; the volumewatcher uses this as a sentinel to check whether claims (with their specific RW vs RO modes) need to be claimed.	2020-11-11 13:06:30 -05:00
Mahmood Ali	69849a42a5	Merge pull request #9298 from hashicorp/f-hcl2-localsvars HCL2: Variables and Locals	2020-11-09 16:44:37 -05:00
Luiz Aoqui	ea81ac5d3d	Merge pull request #9296 from hashicorp/b-remove-namespace-from-scale-request Remove Namespace field from JobScaleRequest	2020-11-09 15:13:33 -05:00
Mahmood Ali	1ae3e8a8eb	Start using the new jobspec2 API	2020-11-09 15:01:31 -05:00
Luiz Aoqui	c536286c7a	remove Namespace field from JobScaleRequest	2020-11-09 13:02:05 -05:00
Nick Ethier	04f5c4ee5f	ar/groupservice: remove drivernetwork (#9233 ) * ar/groupservice: remove drivernetwork * consul: allow host address_mode to accept raw port numbers * consul: fix logic for blank address	2020-11-05 15:00:22 -05:00
Kris Hicks	1da9e7fc67	Add event sink API and CLI commands (#9226 ) Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-11-02 09:57:35 -08:00
Chris Baker	719077a26d	added new policy capabilities for recommendations API state store: call-out to generic update of job recommendations from job update method recommendations API work, and http endpoint errors for OSS support for scaling policies in task block of job spec add query filters for ScalingPolicy list endpoint command: nomad scaling policy list: added -job and -type	2020-10-28 14:32:16 +00:00
Drew Bailey	86080e25a9	Send events to EventSinks (#9171 ) * Process to send events to configured sinks This PR adds a SinkManager to a server which is responsible for managing managed sinks. Managed sinks subscribe to the event broker and send events to a sink writer (webhook). When changes to the eventstore are made the sinkmanager and managed sink are responsible for reloading or starting a new managed sink. * periodically check in sink progress to raft Save progress on the last successfully sent index to raft. This allows a managed sink to resume close to where it left off in the event of a lost server or leadership change dereference eventsink so we can accurately use the watchch When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger	2020-10-26 17:27:54 -04:00
Drew Bailey	1ae39a9ed9	event sink crud operation api (#9155 ) * network sink rpc/api plumbing state store methods and restore upsert sink test get sink delete sink event sink list and tests go generate new msg types validate sink on upsert * go generate	2020-10-23 14:23:00 -04:00
Michael Schurter	c2dd9bc996	core: open source namespaces	2020-10-22 15:26:32 -07:00
Mahmood Ali	059e87c862	Merge pull request #9142 from hashicorp/f-hclv2-2.3 Support HCLv2 for Nomad jobs	2020-10-22 12:26:28 -05:00
Drew Bailey	f3dcefe5a9	remove event durability (#9147 ) * remove event durability temporarily removing go-memdb event durability until a new strategy is developed on how best to handle increased durability needs * drop events table schema and state store methods * fix neweventbuffer invocations	2020-10-22 12:21:03 -04:00
Mahmood Ali	f52bda4c30	api: update /render api to parse hclv2	2020-10-21 15:46:57 -04:00
Drew Bailey	6c788fdccd	Events/msgtype cleanup (#9117 ) * use msgtype in upsert node adds message type to signature for upsert node, update tests, remove placeholder method * UpsertAllocs msg type test setup * use upsertallocs with msg type in signature update test usage of delete node delete placeholder msgtype method * add msgtype to upsert evals signature, update test call sites with test setup msg type handle snapshot upsert eval outside of FSM and ignore eval event remove placeholder upsertevalsmsgtype handle job plan rpc and prevent event creation for plan msgtype cleanup upsertnodeevents updatenodedrain msgtype msg type 0 is a node registration event, so set the default to the ignore type * fix named import * fix signature ordering on upsertnode to match	2020-10-19 09:30:15 -04:00
Drew Bailey	fba0d6dc6a	event buffer size and durable count must be non negative	2020-10-15 16:34:33 -04:00
Nick Ethier	4903e5b114	Consul with CNI and host_network addresses (#9095 ) * consul: advertise cni and multi host interface addresses * structs: add service/check address_mode validation * ar/groupservices: fetch networkstatus at hook runtime * ar/groupservice: nil check network status getter before calling * consul: comment network status can be nil	2020-10-15 15:32:21 -04:00
Michael Schurter	ea55c497b7	Merge pull request #9094 from hashicorp/f-1.0 s/0.13/1.0/g	2020-10-15 08:53:33 -07:00
James Rasell	42a6e7140f	Merge pull request #9083 from hashicorp/b-fix-enterprise-config-merge agent: fix enterprise config overlay merging.	2020-10-15 08:40:49 +02:00
Michael Schurter	9c3972937b	s/0.13/1.0/g 1.0 here we come!	2020-10-14 15:17:47 -07:00
Michael Schurter	dd09fa1a4a	Merge pull request #9055 from hashicorp/f-9017-resources api: add field filters to /v1/{allocations,nodes}	2020-10-14 14:49:39 -07:00
Michael Schurter	6890cffd7a	unify boolean parameter parsing	2020-10-14 12:23:25 -07:00
Drew Bailey	c463479848	filter on additional filter keys, remove switch statement duplication properly wire up durable event count move newline responsibility moves newline creation from NDJson to the http handler, json stream only encodes and sends now ignore snapshot restore if broker is disabled enable dev mode to access event stream without acl use mapping instead of switch use pointers for config sizes, remove unused ttl, simplify closed conn logic	2020-10-14 14:14:33 -04:00
Michael Schurter	8ccbd92cb6	api: add field filters to /v1/{allocations,nodes} Fixes #9017 The ?resources=true query parameter includes resources in the object stub listings. Specifically: - For `/v1/nodes?resources=true` both the `NodeResources` and `ReservedResources` field are included. - For `/v1/allocations?resources=true` the `AllocatedResources` field is included. The ?task_states=false query parameter removes TaskStates from /v1/allocations responses. (By default TaskStates are included.)	2020-10-14 10:35:22 -07:00
Drew Bailey	684807bddb	namespace filtering	2020-10-14 12:44:43 -04:00
Drew Bailey	df96b89958	Add EvictCallbackFn to handle removing entries from go-memdb when they are removed from the event buffer. Wire up event buffer size config, use pointers for structs.Events instead of copying.	2020-10-14 12:44:42 -04:00
Drew Bailey	315f77a301	rehydrate event publisher on snapshot restore address pr feedback	2020-10-14 12:44:41 -04:00
Drew Bailey	d793529d61	event durability count and cfg	2020-10-14 12:44:40 -04:00
Drew Bailey	b4c135358d	use Events to wrap index and events, store in events table	2020-10-14 12:44:39 -04:00
Drew Bailey	559517455a	wire up enable_event_publisher	2020-10-14 12:44:38 -04:00
Drew Bailey	4793bb4e01	Events/deployment events (#9004 ) * Node Drain events and Node Events (#8980) Deployment status updates handle deployment status updates (paused, failed, resume) deployment alloc health generate events from apply plan result txn err check, slim down deployment event one ndjson line per index * consolidate down to node event + type * fix UpdateDeploymentAllocHealth test invocations * fix test	2020-10-14 12:44:37 -04:00
Drew Bailey	a4a2975edf	Event Stream API/RPC (#8947 ) This commit adds a /v1/events/stream endpoint to stream events from. The stream framer has been updated to include a SendFull method which does not fragment the data between multiple frames. This essentially treats the stream framer as an envelope to adhere to the stream framer interface in the UI. If the `encode` query parameter is omitted events will be streamed as newline delimited JSON.	2020-10-14 12:44:36 -04:00
James Rasell	e0734bed77	agent: fix enterprise config overlay merging.	2020-10-14 09:35:16 +02:00
Chris Baker	1d35578bed	removed backwards-compatible/untagged metrics deprecated in 0.7	2020-10-13 20:18:39 +00:00
Seth Hoenig	ed13e5723f	consul/connect: dynamically select envoy sidecar at runtime As newer versions of Consul are released, the minimum version of Envoy it supports as a sidecar proxy also gets bumped. Starting with the upcoming Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the default implementation of Connect sidecar proxy. This PR introduces a change such that each Nomad Client will query its local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545) and then launch the Connect sidecar proxy task using the latest supported version of Envoy. If the `SupportedProxies` API component is not available from Consul, Nomad will fall back to the old version of Envoy supported by old versions of Consul. Setting the meta configuration option `meta.connect.sidecar_image` or setting the `connect.sidecar_task` stanza will take precedence as is the current behavior for sidecar proxies. Setting the meta configuration option `meta.connect.gateway_image` will take precedence as is the current behavior for connect gateways. `meta.connect.sidecar_image` and `meta.connect.gateway_image` may make use of the special `${NOMAD_envoy_version}` variable interpolation, which resolves to the newest version of Envoy supported by the Consul agent. Addresses #8585 #7665	2020-10-13 09:14:12 -05:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Seth Hoenig	6cffbecb3a	Merge pull request #9033 from pierreca/verify-remove-checks Do not double-remove checks removed by Consul	2020-10-06 10:16:13 -05:00
James Rasell	ffe6533ad1	Merge pull request #9027 from hashicorp/f-gh-9026 cli: move tests to use NewMockUi func.	2020-10-06 08:28:18 +02:00
Pierre Cauchois	1efe05f516	Do not double-remove checks removed by Consul When deregistering a service, consul also deregisters the associated checks. The current state keeps track of all services and all checks separately and deregisters them in sequence, which leads, whether during syncs or shutdowns, to check deregistrations happening twice and failing the second time (generating errors in logs). This fix includes: - a fix to the sync logic that just pulls the checks after the services have been synced - a fix to the shutdown mechanism that gets an updated list of checks after deregistering the services, so that we get a cleaner check deregistration process.	2020-10-06 00:30:29 +00:00
Chris Baker	7f701fddd0	updated docs and validation to further prohibit null chars in region, datacenter, and job name	2020-10-05 18:01:50 +00:00
James Rasell	2ed78b8a7e	cli: move tests to use NewMockUi func.	2020-10-05 16:07:41 +02:00
Kent 'picat' Gruber	5e1c716835	Merge pull request #8998 from hashicorp/keygen-32-bytes Use 32-byte key for gossip encryption to enable AES-256	2020-10-02 17:17:55 -04:00
Kent 'picat' Gruber	b03f79700c	Fix panic in test due to the agent's logger not being initialized yet So a null logger is used to avoid the problem.	2020-10-02 11:10:27 -04:00
Fredrik Hoem Grelland	953d4de8dd	update consul-template to v0.25.1 (#8988 )	2020-10-01 14:08:49 -04:00
Kent 'picat' Gruber	90e85f9add	Fix other usages of initKeyring func to use logger as third argument	2020-10-01 11:13:06 -04:00
Kent 'picat' Gruber	b98bb99dfe	Log AES-128 and AES-192 key sizes during keyring initialization	2020-10-01 11:12:14 -04:00
Michael Schurter	765473e8b0	jobspec: lower min cpu resources from 10->1 Since CPU resources are usually a soft limit it is desirable to allow setting it as low as possible to allow tasks to run only in "idle" time. Setting it to 0 is still not allowed to avoid potential unintentional side effects with allowing a zero value. While there may not be any side effects this commit attempts to minimize risk by avoiding the issue. This does not change the defaults.	2020-09-30 12:15:13 -07:00
Michael Schurter	1544341f09	Merge pull request #8862 from hashicorp/release-0.12.4 Prepare for 0.13 development cycle	2020-09-10 09:14:44 -07:00
Mahmood Ali	d4f385d6e1	Upgrade to golang 1.15 (#8858 ) Upgrade to golang 1.15 Starting with golang 1.15, setting a Ctty value results in a `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458 . This commit lifts the fix in https://github.com/creack/pty/pull/97 .	2020-09-09 15:59:29 -04:00
Nomad Release bot	3b8a2f22dc	Generate files for 0.12.4-rc1 release	2020-09-03 02:59:23 +00:00
Tim Gross	b77fe023b5	MRD: move 'job stop -global' handling into RPC (#8776 ) The initial implementation of global job stop for MRD looped over all the regions in the CLI for expedience. This changeset includes the OSS parts of moving this into the RPC layer so that API consumers don't have to implement this logic themselves.	2020-08-28 14:28:13 -04:00
Lang Martin	7d483f93c0	csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699 ) * nomad/structs/csi: add explicit job support * nomad/state/state_store: capture job updates directly * api/nodes: CSIInfo needs the AllocID * command/agent/csi_endpoint: AllocID was missing Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-08-27 17:20:00 -04:00
Seth Hoenig	9f1f2a5673	Merge branch 'master' into f-cc-ingress	2020-08-26 15:31:05 -05:00
Seth Hoenig	dfe179abc5	consul/connect: fixup some comments and context timeout	2020-08-26 13:17:16 -05:00
Tim Gross	f9b6c8153c	csi: fix panic in serializing nil allocs in volume API (#8735 ) - fix panic in serializing nil allocs in volume API - prevent potential panic in serializing plugin allocs	2020-08-25 10:13:05 -04:00
Seth Hoenig	26e77623e5	consul/connect: fixup tests to use new consul sdk	2020-08-24 12:02:41 -05:00
Seth Hoenig	c4fa644315	consul/connect: remove envoy dns option from gateway proxy config	2020-08-24 09:11:55 -05:00
Yoan Blanc	327d17e0dc	fixup! vendor: consul/api, consul/sdk v1.6.0 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-08-24 08:59:03 +02:00
Seth Hoenig	5b072029f2	consul/connect: add initial support for ingress gateways This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza. ```hcl service { connect { gateway { proxy { // envoy proxy configuration } ingress { // ingress-gateway configuration entry } } } } ``` A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value. Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver. Aims to address #8294 and tangentially #8647	2020-08-21 16:21:54 -05:00
Nick Ethier	3cd5f46613	Update UI to use new allocated ports fields (#8631 ) * nomad: canonicalize alloc shared resources to populate ports * ui: network ports * ui: remove unused task network references and update tests with new shared ports model * ui: lint * ui: revert auto formatting * ui: remove unused page objects * structs: remove unrelated test from bad conflict resolution * ui: formatting	2020-08-20 11:07:13 -04:00
Tim Gross	22e77bb03c	mrd: remove redundant validation in HTTP endpoint (#8685 ) The `regionForJob` function in the HTTP job endpoint overrides the region for multiregion jobs to `global`, which is used as a sentinel value in the server's job endpoint to avoid re-registration loops. This changeset removes an extraneous check that results in errors in the web UI and makes round-tripping through the HTTP API cumbersome for all consumers.	2020-08-18 16:48:09 -04:00
Lang Martin	6d8165c410	command/agent/csi_endpoint: explicit allocations (#8669 )	2020-08-13 15:48:08 -04:00
Tim Gross	7dca72acbe	csi: fix panic from assignment to nil map in plugin API (#8666 )	2020-08-13 11:36:41 -04:00
Tim Gross	3faa138732	fix panic converting structs to API in CSI endpoint (#8659 )	2020-08-12 15:59:10 -04:00
Nomad Release bot	1ea9d4eb22	Generate files for 0.12.2 release	2020-08-12 00:50:49 +00:00
Lang Martin	07ea822c6a	nomad debug renamed to nomad operator debug (#8602 ) * renamed: command/debug.go -> command/operator_debug.go * website: rename debug -> operator debug * website/pages/api-docs/agent: name in api docs	2020-08-11 15:39:44 -04:00
Lang Martin	c82b2a2454	CSI: volume and plugin allocations in the API (#8590 ) * command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints	2020-08-11 12:24:41 -04:00
Tim Gross	443fdaa86b	csi: nomad volume detach command (#8584 ) The soundness guarantees of the CSI specification leave a little to be desired in our ability to provide a 100% reliable automated solution for managing volumes. This changeset provides a new command to bridge this gap by providing the operator the ability to intervene. The command doesn't take an allocation ID so that the operator doesn't have to keep track of alloc IDs that may have been GC'd. Handle this case in the unpublish RPC by sending the client RPC for all the terminal/nil allocs on the selected node.	2020-08-11 10:18:54 -04:00
Seth Hoenig	fd4804bf26	consul: able to set pass/fail thresholds on consul service checks This change adds the ability to set the fields `success_before_passing` and `failures_before_critical` on Consul service check definitions. This is a feature added to Consul v1.7.0 and later. https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical Nomad doesn't do much besides pass the fields through to Consul. Fixes #6913	2020-08-10 14:08:09 -05:00
Drew Bailey	b296558b8e	oss components for multi-vault namespaces adds in oss components to support enterprise multi-vault namespace feature upgrade specific doc on vault multi-namespaces vault docs update test to reflect new error	2020-07-24 10:14:59 -04:00
James Rasell	95db43eaf0	Merge pull request #8491 from hashicorp/b-gh-8481 api: task groups in system jobs do not support scaling stanzas.	2020-07-24 14:20:26 +02:00
Nomad Release bot	f2f50bf48e	Generate files for 0.12.1 release	2020-07-23 13:17:59 +00:00
Lars Lehtonen	e26ea30b7e	command/agent: fix dropped test error (#8504 )	2020-07-22 15:06:35 -04:00
James Rasell	2da8bd8f58	agent: task groups in system jobs do not support scaling stanzas.	2020-07-22 11:10:59 +02:00
Mahmood Ali	72ac33e4e7	Refactor setupLoggers	2020-07-17 11:05:57 -04:00
Mahmood Ali	ad2d484974	Set AgentShutdown	2020-07-17 11:04:57 -04:00
Chris Baker	f8478b6f82	Merge branch 'master' of github.com:hashicorp/nomad into release-0.12.0	2020-07-08 21:16:31 +00:00
Nick Ethier	119ece09a0	docs: add CNI and host_network docs (#8391 ) Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>	2020-07-08 15:45:04 -04:00
Nomad Release bot	549e766eab	Generate files for 0.12.0-rc1 release	2020-07-07 03:17:05 +00:00
Nick Ethier	e0fb634309	ar: support opting into binding host ports to default network IP (#8321 ) * ar: support opting into binding host ports to default network IP * fix config plumbing * plumb node address into network resource * struct: only handle network resource upgrade path once	2020-07-06 18:51:46 -04:00
Tim Gross	18250f71fd	fix region flag vs job region handling in plan/submit (#8347 )	2020-07-06 15:46:09 -04:00
Chris Baker	9100b6b7c0	changes to make sure that Max is present and valid, to improve error messages * made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected * added checks to HCL parsing that it is present * when Scaling.Max is absent/invalid, don't return extraneous error messages during validation * tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs resolves #8355	2020-07-04 19:05:50 +00:00
Mahmood Ali	329969b97e	tests: make testagent shutdown idempotent Avoid double freeing ports if an agent.Shutdown() is called multiple times.	2020-07-03 09:16:01 -04:00
Lang Martin	6c22cd587d	api: `nomad debug` new /agent/host (#8325 ) * command/agent/host: collect host data, multi platform * nomad/structs/structs: new HostDataRequest/Response * client/agent_endpoint: add RPC endpoint * command/agent/agent_endpoint: add Host * api/agent: add the Host endpoint * nomad/client_agent_endpoint: add Agent Host with forwarding * nomad/client_agent_endpoint: use findClientConn This changes forwardMonitorClient and forwardProfileClient to use findClientConn, which was cribbed from the common parts of those funcs. * command/debug: call agent hosts * command/agent/host: eliminate calling external programs	2020-07-02 09:51:25 -04:00
Tim Gross	23be116da0	csi: add -force flag to volume deregister (#8295 ) The `nomad volume deregister` command currently returns an error if the volume has any claims, but in cases where the claims can't be dropped because of plugin errors, providing a `-force` flag gives the operator an escape hatch. If the volume has no allocations or if they are all terminal, this flag deletes the volume from the state store, immediately and implicitly dropping all claims without further CSI RPCs. Note that this will not also unmount/detach the volume, which we'll make the responsibility of a separate `nomad volume detach` command.	2020-07-01 12:17:51 -04:00
Tim Gross	e52f76ed53	update compiled static assets	2020-06-24 16:37:13 -04:00
Tim Gross	a449009e9f	multiregion validation fixes (#8265 ) Multi-region jobs need to bypass validating counts otherwise we get spurious warnings in Job.Plan.	2020-06-24 12:18:51 -04:00
Seth Hoenig	e79b79034d	connect/native: fixup command/agent/consul/connect test cases	2020-06-24 09:05:56 -05:00
Seth Hoenig	6c5ab7f45e	consul/connect: split connect native flag and task in service	2020-06-23 10:22:22 -05:00
Seth Hoenig	4d71f22a11	consul/connect: add support for running connect native tasks This PR adds the capability of running Connect Native Tasks on Nomad, particularly when TLS and ACLs are enabled on Consul. The `connect` stanza now includes a `native` parameter, which can be set to the name of the task that backs the Connect Native Consul service. There is a new Client configuration parameter for the `consul` stanza called `share_ssl`. Like `allow_unauthenticated` the default value is true, but recommended to be disabled in production environments. When enabled, the Nomad Client's Consul TLS information is shared with Connect Native tasks through the normal Consul environment variables. This does NOT include auth or token information. If Consul ACLs are enabled, Service Identity Tokens are automatically injected into the Connect Native task through the CONSUL_HTTP_TOKEN environment variable. Any of the automatically set environment variables can be overridden by the Connect Native task using the `env` stanza. Fixes #6083	2020-06-22 14:07:44 -05:00
Michael Schurter	562704124d	Merge pull request #8208 from hashicorp/f-multi-network multi-interface network support	2020-06-19 15:46:48 -07:00
Mahmood Ali	963b1251ff	Merge pull request #8082 from hashicorp/f-raft-multipler Implement raft multiplier flag	2020-06-19 10:04:59 -04:00
Nick Ethier	f0559a8162	multi-interface network support	2020-06-19 09:42:10 -04:00
Mahmood Ali	38a01c050e	Merge pull request #8192 from hashicorp/f-status-allnamespaces-2 CLI Allow querying all namespaces for jobs and allocations - Try 2	2020-06-18 20:16:52 -04:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661 ) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Mahmood Ali	e784fe331a	use '*' to indicate all namespaces This reverts the introduction of AllNamespaces parameter that was merged earlier but never got released.	2020-06-17 16:27:43 -04:00
Tim Gross	7b12445f29	multiregion: change AutoRevert to OnFailure	2020-06-17 11:05:45 -04:00
Tim Gross	b09b7a2475	Multiregion job registration Integration points for multiregion jobs to be registered in the enterprise version of Nomad: * hook in `Job.Register` for enterprise to send job to peer regions * remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs	2020-06-17 11:04:58 -04:00
Tim Gross	161bcd9479	use constants from http package	2020-06-17 11:04:02 -04:00
Tim Gross	b93efc16d5	multiregion CLI: nomad deployment unblock	2020-06-17 11:03:44 -04:00
Drew Bailey	9263fcb0d3	Multiregion deploy status and job status CLI	2020-06-17 11:03:34 -04:00
Tim Gross	6851024925	Multiregion structs Initial struct definitions, jobspec parsing, validation, and conversion between Nomad structs and API structs for multi-region deployments.	2020-06-17 11:00:14 -04:00
Chris Baker	1e3563e08c	wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register	2020-06-16 18:45:17 +00:00
Mahmood Ali	69bb42acf8	tests: prefix agent logs to identify agent sources	2020-06-07 16:38:11 -04:00
Mahmood Ali	9eb13ae144	basic snapshot restore	2020-06-07 15:46:23 -04:00
Mahmood Ali	de44d9641b	Merge pull request #8047 from hashicorp/f-snapshot-save API for atomic snapshot backups	2020-06-01 07:55:16 -04:00
Mahmood Ali	a73cd01a00	Merge pull request #8001 from hashicorp/f-jobs-list-across-nses endpoint to expose all jobs across all namespaces	2020-05-31 21:28:03 -04:00
Mahmood Ali	0e8fafd739	implement raft multiplier	2020-05-31 12:24:27 -04:00
Drew Bailey	23d24c7a7f	removes pro tags (#8014 )	2020-05-28 15:40:17 -04:00
Drew Bailey	34871f89be	Oss license support for ent builds (#8054 ) * changes necessary to support oss licensing shims revert nomad fmt changes update test to work with enterprise changes update tests to work with new ent enforcements make check update cas test to use scheduler algorithm back out preemption changes add comments * remove unused method	2020-05-27 13:46:52 -04:00
Mahmood Ali	2108681c1d	Endpoint for snapshotting server state	2020-05-21 20:04:38 -04:00
James Rasell	ae0fb98c6b	api: return custom error if API attempts to decode empty body.	2020-05-19 15:46:31 +02:00
Mahmood Ali	5ab2d52e27	endpoint to expose all jobs across all namespaces Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all namespaces. The returned list is to contain a `Namespace` field indicating the job namespace. If ACL is enabled, the request token needs to be a management token or have `namespace:list-jobs` capability on all existing namespaces.	2020-05-18 13:50:46 -04:00
Nomad Release bot	189a378549	Generate files for 0.11.2 release	2020-05-14 20:49:42 +00:00
Mahmood Ali	9366181be6	always check `default_scheduler_config` config Also, avoid early return on validation to avoid masking some validation bugs in dev setup.	2020-05-14 14:16:12 -04:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939 ) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Mahmood Ali	061a439f2c	Merge pull request #7912 from hashicorp/f-scheduler-algorithm-followup Scheduler Algorithm Defaults handling and docs	2020-05-11 09:30:58 -04:00
Tim Gross	3aa761b151	Periodic GC for volume claims (#7881 ) This changeset implements a periodic garbage collection of CSI volumes with missing allocations. This can happen in a scenario where a node update fails partially and the allocation updates are written to raft but the evaluations to GC the volumes are dropped. This feature will cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1 get any stray claims cleaned up.	2020-05-11 08:20:50 -04:00
Mahmood Ali	2c963885b0	handle upgrade path and defaults Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on upgrades or on API handling when user misses the value. The scheduler already treats `""` value as binpack. This PR merely ensures that the operator API returns the effective value.	2020-05-09 12:34:08 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878 ) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	663fb677cf	Add SchedulerAlgorithm to SchedulerConfig	2020-05-01 13:13:29 -04:00
Drew Bailey	42075ef30e	allow test to check if server is enterprise	2020-04-30 14:46:21 -04:00
Drew Bailey	59b76f90e8	hcl fmt from editor license cli formatting, license endpoints ent only test oss error type assertions	2020-04-30 14:46:18 -04:00
Mahmood Ali	b8fb32f5d2	http: adjust log level for request failure Failed requests due to API client errors are to be marked as DEBUG. The Error log level should be reserved to signal problems with the cluster that are actionable for nomad system operators. Logs due to misbehaving API clients don't represent a system level problem and seem spurious to nomad maintainers at best. These log messages can also be attack vectors for denial of service attacks by filling servers' disk space with spurious log messages.	2020-04-22 16:19:59 -04:00
Mahmood Ali	5b42796f1e	Merge pull request #7704 from hashicorp/b-agent-shutdown-order agent: shutdown agent http server last	2020-04-20 10:37:26 -04:00
Mahmood Ali	4e1366f285	agent: route http logs through hclog Pipe http server log to hclog, so that it uses the same logging format as rest of nomad logs. Also, supports emitting them as json logs, when json formatting is set. The http server logs are emitted as Trace level, as they typically represent HTTP client errors (e.g. failed tls handshakes, invalid headers, etc). Though, Panic logs represent server errors and are relayed as Error level.	2020-04-20 10:33:40 -04:00
Mahmood Ali	b78680eee7	agent: shutdown agent http server last Shutdown http server last, after nomad client/server components terminate. Before this change, if the agent is taking an unexpectedly long time to shutdown, the operator cannot query the http server directly: they cannot access agent specific http endpoints and need to query another agent about the troublesome agent. Unexpectedly long shutdown can happen in normal cases, e.g. a client might hang if one of the allocs it is running has a long shutdown_delay. Here, we switch to ensuring that the http server is shutdown last. I believe this doesn't require extra care in agent shutting down logic while operators may be able to submit write http requests. We already need to cope with operators submitting these http requests to another agent or by servers updating the client allocations.	2020-04-13 10:50:07 -04:00
Mahmood Ali	14d6fec05a	tests: deflake some SetServer related tests Some tests assert on the number of servers, e.g. TestHTTP_AgentSetServers and TestHTTP_AgentListServers_ACL . Though, in dev and test modes, the agent starts with servers having duplicate entries for advertised and normalized RPC values, then settles with one unique value after Raft/Serf re-sets servers with one single unique value. This leads to flakiness, as the test will fail if assertion runs before Serf update takes effect. Here, we update the initial dev handling so it only adds a unique value if the advertised and normalized values are the same. Sample log lines illustrating the problem: ``` === CONT TestHTTP_AgentSetServers TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:9008 Address:127.0.0.1:9008}]" TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad: serf: EventMemberJoin: TestHTTP_AgentSetServers.global 127.0.0.1 TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.035Z [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:9008, 127.0.0.1:9008] old_servers=[] ... TestHTTP_AgentSetServers: agent_endpoint_test.go:759: Error Trace: agent_endpoint_test.go:759 http_test.go:1089 agent_endpoint_test.go:705 Error: "[127.0.0.1:9008 127.0.0.1:9008]" should have 1 item(s), but has 2 Test: TestHTTP_AgentSetServers ```	2020-04-07 09:27:48 -04:00
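The deduplication described in 14d6fec05a can be sketched roughly as follows. This is an illustration under stated assumptions, not the agent's actual code: `initialServers` is a hypothetical helper, and the only behavior taken from the message is that a single entry is added when the advertised and normalized RPC addresses are the same.

```go
package main

import "fmt"

// initialServers is a hypothetical helper: it builds the initial server list
// and adds only one entry when the advertised and normalized RPC addresses
// are identical, so the list never starts out with a duplicate.
func initialServers(advertised, normalized string) []string {
	if advertised == normalized {
		return []string{advertised}
	}
	return []string{advertised, normalized}
}

func main() {
	// Same address twice, as in the sample log lines of the commit message.
	fmt.Println(len(initialServers("127.0.0.1:9008", "127.0.0.1:9008")))
}
```

With the duplicate removed up front, a test asserting on one server no longer depends on whether the Serf update has already taken effect.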
Mahmood Ali	ed4c4d13a4	fixup! backend: support WS authentication handshake in alloc/exec	2020-04-03 14:20:31 -04:00
Mahmood Ali	e63e096136	backend: support WS authentication handshake in alloc/exec The javascript Websocket API doesn't support setting custom headers (e.g. `X-Nomad-Token`). This change adds support for having an authentication handshake message: clients can set `ws_handshake` URL query parameter to true and send a single handshake message with auth token first before any other message. This is a backward compatible change: it does not affect nomad CLI path, as it doesn't set `ws_handshake` parameter.	2020-04-03 11:18:54 -04:00
Mahmood Ali	990cfb6fef	agent config parsing tests for scheduler config	2020-04-03 07:54:32 -04:00
Chris Baker	277d29c6e7	Merge pull request #7572 from hashicorp/f-7422-scaling-events finalizing scaling API work	2020-04-01 13:49:22 -05:00
Seth Hoenig	9aa9721143	connect: fix bug where absent connect.proxy stanza needs default config In some refactoring, a bug was introduced where if the connect.proxy stanza in a submitted job was nil, the default proxy configuration would not be initialized with default values, effectively breaking Connect. connect { sidecar_service {} # should work } In contrast, by setting an empty proxy stanza, the config values would be inserted correctly. connect { sidecar_service { proxy {} # workaround } } This commit restores the original behavior, where having a proxy stanza present is not required. The unit test for this case has also been corrected.	2020-04-01 11:19:32 -06:00
Chris Baker	40d6b3bbd1	adding raft and state_store support to track job scaling events updated ScalingEvent API to record "message string,error bool" instead of confusing "reason,error *string"	2020-04-01 16:15:14 +00:00
Seth Hoenig	14c7cebdea	connect: enable automatic expose paths for individual group service checks Part of #6120 Building on the support for enabling connect proxy paths in #7323, this change adds the ability to configure the 'service.check.expose' flag on group-level service check definitions for services that are connect-enabled. This is a slight deviation from the "magic" that Consul provides. With Consul, the 'expose' flag exists on the connect.proxy stanza, which will then auto-generate expose paths for every HTTP and gRPC service check associated with that connect-enabled service. A first attempt at providing similar magic for Nomad's Consul Connect integration followed that pattern exactly, as seen in #7396. However, on reviewing the PR we realized having the `expose` flag on the proxy stanza inseparably ties together the automatic path generation with every HTTP/gRPC defined on the service. This makes sense in Consul's context, because a service definition is reasonably associated with a single "task". With Nomad's group level service definitions however, there is a reasonable expectation that a service definition is more abstractly representative of multiple services within the task group. In this case, one would want to define checks of that service which concretely make HTTP or gRPC requests to different underlying tasks. Such a model is not possible with the coarse `proxy.expose` flag. Instead, we now have the flag made available within the check definitions themselves. By making the expose feature resolute to each check, it is possible to have some HTTP/gRPC checks which make use of the envoy exposed paths, as well as some HTTP/gRPC checks which make use of some orthogonal port-mapping to do checks on some other task (or even some other bound port of the same task) within the task group. 
Given this example, group "server-group" { network { mode = "bridge" port "forchecks" { to = -1 } } service { name = "myserver" port = 2000 connect { sidecar_service { } } check { name = "mycheck-myserver" type = "http" port = "forchecks" interval = "3s" timeout = "2s" method = "GET" path = "/classic/responder/health" expose = true } } } Nomad will automatically inject (via job endpoint mutator) the extrapolated expose path configuration, i.e. expose { path { path = "/classic/responder/health" protocol = "http" local_path_port = 2000 listener_port = "forchecks" } } Documentation is coming in #7440 (needs updating, doing next) Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6 which will make the examples in the documentation actually runnable. Will add some e2e tests based on the above when it becomes available.	2020-03-31 17:15:50 -06:00
Seth Hoenig	41244c5857	jobspec: parse multi expose.path instead of explicit slice	2020-03-31 17:15:27 -06:00
Seth Hoenig	0266f056b8	connect: enable proxy.passthrough configuration Enable configuration of HTTP and gRPC endpoints which should be exposed by the Connect sidecar proxy. This changeset is the first "non-magical" pass that lays the groundwork for enabling Consul service checks for tasks running in a network namespace because they are Connect-enabled. The changes here provide for full configuration of the connect { sidecar_service { proxy { expose { paths = [{ path = <exposed endpoint> protocol = <http or grpc> local_path_port = <local endpoint port> listener_port = <inbound mesh port> }, ... ] } } } stanza. Everything from `expose` and below is new, and partially implements the precedent set by Consul: https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference Combined with a task-group level network port-mapping in the form: port "exposeExample" { to = -1 } it is now possible to "punch a hole" through the network namespace to a specific HTTP or gRPC path, with the anticipated use case of creating Consul checks on Connect enabled services. A future PR may introduce more automagic behavior, where we can do things like 1) auto-fill the 'expose.path.local_path_port' with the default value of the 'service.port' value for task-group level connect-enabled services. 2) automatically generate a port-mapping 3) enable an 'expose.checks' flag which automatically creates exposed endpoints for every compatible consul service check (http/grpc checks on connect enabled services).	2020-03-31 17:15:27 -06:00
Seth Hoenig	1ce4eb17fa	client: use consistent name for struct receiver parameter This helps reduce the number of squiggly lines in Goland.	2020-03-31 17:15:27 -06:00
Yoan Blanc	225c9c1215	fixup! vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:48:07 -04:00
Yoan Blanc	761d014071	vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:45:21 -04:00
Seth Hoenig	b3664c628c	Merge pull request #7524 from hashicorp/docs-consul-acl-minimums consul: annotate Consul interfaces with ACLs	2020-03-30 13:27:27 -06:00
Mahmood Ali	7df337e4c4	Merge pull request #7534 from hashicorp/b-windows-dev-network windows: support -dev mode	2020-03-30 14:35:28 -04:00
Seth Hoenig	0a812ab689	consul: annotate Consul interfaces with ACLs	2020-03-30 10:17:28 -06:00

1814 commits