open-nomad

Author	SHA1	Message	Date
James Rasell	473fb4bb17	Merge pull request #9272 from hashicorp/f-recommendation-cli cli: add recommendation commands.	2020-11-06 16:29:57 +01:00
James Rasell	fe92d8b3cb	cli: add recommendation commands. Adds new CLI commands for applying, dismissing and detailing Nomad recommendations under a new top level recommendation command.	2020-11-06 11:16:24 +01:00
Nick Ethier	04f5c4ee5f	ar/groupservice: remove drivernetwork (#9233 ) * ar/groupservice: remove drivernetwork * consul: allow host address_mode to accept raw port numbers * consul: fix logic for blank address	2020-11-05 15:00:22 -05:00
Charlie Voiselle	0e373f70c1	Merge pull request #9255 from hashicorp/d-missing-example-comma Add missing comma in help-text example.	2020-11-04 09:51:09 -05:00
James Rasell	b147ec1e57	cli: update scaling policy list help to include job flag.	2020-11-04 13:35:29 +01:00
Charlie Voiselle	443a93be11	Add missing comma in help-text example. @krishicks spotted this while playing with command.	2020-11-02 18:00:53 -05:00
Kris Hicks	48a260fc33	Update monitor func not to take a prefix (#9251 ) The only user of monitor(evalID, true) was command/eval_status, and eval_status had a duplicate of the prefix-handling code inside it, so in all cases the complete evalID was being passed to monitor. Given that, we can remove the prefix code from command/monitor, and remove the boolean arg.	2020-11-02 10:24:49 -08:00
Kris Hicks	1da9e7fc67	Add event sink API and CLI commands (#9226 ) Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-11-02 09:57:35 -08:00
Chris Baker	719077a26d	added new policy capabilities for recommendations API state store: call-out to generic update of job recommendations from job update method recommendations API work, and http endpoint errors for OSS support for scaling policies in task block of job spec add query filters for ScalingPolicy list endpoint command: nomad scaling policy list: added -job and -type	2020-10-28 14:32:16 +00:00
Drew Bailey	86080e25a9	Send events to EventSinks (#9171 ) * Process to send events to configured sinks This PR adds a SinkManager to a server which is responsible for managing managed sinks. Managed sinks subscribe to the event broker and send events to a sink writer (webhook). When changes to the eventstore are made the sinkmanager and managed sink are responsible for reloading or starting a new managed sink. * periodically check in sink progress to raft Save progress on the last successfully sent index to raft. This allows a managed sink to resume close to where it left off in the event of a lost server or leadership change dereference eventsink so we can accurately use the watchch When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger	2020-10-26 17:27:54 -04:00
Drew Bailey	1ae39a9ed9	event sink crud operation api (#9155 ) * network sink rpc/api plumbing state store methods and restore upsert sink test get sink delete sink event sink list and tests go generate new msg types validate sink on upsert * go generate	2020-10-23 14:23:00 -04:00
Michael Schurter	c2dd9bc996	core: open source namespaces	2020-10-22 15:26:32 -07:00
Mahmood Ali	059e87c862	Merge pull request #9142 from hashicorp/f-hclv2-2.3 Support HCLv2 for Nomad jobs	2020-10-22 12:26:28 -05:00
Drew Bailey	f3dcefe5a9	remove event durability (#9147 ) * remove event durability temporarily removing go-memdb event durability until a new strategy is developed on how best to handle increased durability needs * drop events table schema and state store methods * fix neweventbuffer invocations	2020-10-22 12:21:03 -04:00
Mahmood Ali	d3a17b5c82	address review feedback	2020-10-22 11:49:37 -04:00
Mahmood Ali	f52bda4c30	api: update /render api to parse hclv2	2020-10-21 15:46:57 -04:00
Mahmood Ali	84ec0b38e8	cli: use HCLv2 parser Also, fall back to using HCLv1.	2020-10-21 15:46:57 -04:00
Mahmood Ali	1ae924973e	hclv1: tweak HCLv1 tests This ensures that the gateway ReadOnly key is tested. Also, update the hclv1 test-fixtures to be hclv1 compliant.	2020-10-21 14:05:46 -04:00
Drew Bailey	6c788fdccd	Events/msgtype cleanup (#9117 ) * use msgtype in upsert node adds message type to signature for upsert node, update tests, remove placeholder method * UpsertAllocs msg type test setup * use upsertallocs with msg type in signature update test usage of delete node delete placeholder msgtype method * add msgtype to upsert evals signature, update test call sites with test setup msg type handle snapshot upsert eval outside of FSM and ignore eval event remove placeholder upsertevalsmsgtype handle job plan rpc and prevent event creation for plan msgtype cleanup upsertnodeevents updatenodedrain msgtype msg type 0 is a node registration event, so set the default to the ignore type * fix named import * fix signature ordering on upsertnode to match	2020-10-19 09:30:15 -04:00
Drew Bailey	fba0d6dc6a	event buffer size and durable count must be non negative	2020-10-15 16:34:33 -04:00
Nick Ethier	4903e5b114	Consul with CNI and host_network addresses (#9095 ) * consul: advertise cni and multi host interface addresses * structs: add service/check address_mode validation * ar/groupservices: fetch networkstatus at hook runtime * ar/groupservice: nil check network status getter before calling * consul: comment network status can be nil	2020-10-15 15:32:21 -04:00
Michael Schurter	ea55c497b7	Merge pull request #9094 from hashicorp/f-1.0 s/0.13/1.0/g	2020-10-15 08:53:33 -07:00
James Rasell	42a6e7140f	Merge pull request #9083 from hashicorp/b-fix-enterprise-config-merge agent: fix enterprise config overlay merging.	2020-10-15 08:40:49 +02:00
Michael Schurter	9c3972937b	s/0.13/1.0/g 1.0 here we come!	2020-10-14 15:17:47 -07:00
Michael Schurter	dd09fa1a4a	Merge pull request #9055 from hashicorp/f-9017-resources api: add field filters to /v1/{allocations,nodes}	2020-10-14 14:49:39 -07:00
Michael Schurter	6890cffd7a	unify boolean parameter parsing	2020-10-14 12:23:25 -07:00
Dave May	f37e90be18	Metrics gotemplate support, debug bundle features (#9067 ) * add goroutine text profiles to nomad operator debug * add server-id=all to nomad operator debug * fix bug from changing metrics from string to []byte * Add function to return MetricsSummary struct, metrics gotemplate support * fix bug resolving 'server-id=all' when no servers are available * add url to operator_debug tests * removed test section which is used for future operator_debug.go changes * separate metrics from operator, use only structs from go-metrics * ensure parent directories are created as needed * add suggested comments for text debug pprof * move check down to where it is used * add WaitForFiles helper function to wait for multiple files to exist * compact metrics check Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com> * fix github's silly apply suggestion Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-10-14 15:16:10 -04:00
Drew Bailey	c463479848	filter on additional filter keys, remove switch statement duplication properly wire up durable event count move newline responsibility moves newline creation from NDJson to the http handler, json stream only encodes and sends now ignore snapshot restore if broker is disabled enable dev mode to access event stream without acl use mapping instead of switch use pointers for config sizes, remove unused ttl, simplify closed conn logic	2020-10-14 14:14:33 -04:00
Michael Schurter	8ccbd92cb6	api: add field filters to /v1/{allocations,nodes} Fixes #9017 The ?resources=true query parameter includes resources in the object stub listings. Specifically: - For `/v1/nodes?resources=true` both the `NodeResources` and `ReservedResources` field are included. - For `/v1/allocations?resources=true` the `AllocatedResources` field is included. The ?task_states=false query parameter removes TaskStates from /v1/allocations responses. (By default TaskStates are included.)	2020-10-14 10:35:22 -07:00
Drew Bailey	684807bddb	namespace filtering	2020-10-14 12:44:43 -04:00
Drew Bailey	df96b89958	Add EvictCallbackFn to handle removing entries from go-memdb when they are removed from the event buffer. Wire up event buffer size config, use pointers for structs.Events instead of copying.	2020-10-14 12:44:42 -04:00
Drew Bailey	315f77a301	rehydrate event publisher on snapshot restore address pr feedback	2020-10-14 12:44:41 -04:00
Drew Bailey	d793529d61	event durability count and cfg	2020-10-14 12:44:40 -04:00
Drew Bailey	b4c135358d	use Events to wrap index and events, store in events table	2020-10-14 12:44:39 -04:00
Drew Bailey	559517455a	wire up enable_event_publisher	2020-10-14 12:44:38 -04:00
Drew Bailey	4793bb4e01	Events/deployment events (#9004 ) * Node Drain events and Node Events (#8980) Deployment status updates handle deployment status updates (paused, failed, resume) deployment alloc health generate events from apply plan result txn err check, slim down deployment event one ndjson line per index * consolidate down to node event + type * fix UpdateDeploymentAllocHealth test invocations * fix test	2020-10-14 12:44:37 -04:00
Drew Bailey	a4a2975edf	Event Stream API/RPC (#8947 ) This commit adds an /v1/events/stream endpoint to stream events from. The stream framer has been updated to include a SendFull method which does not fragment the data between multiple frames. This essentially treats the stream framer as an envelope to adhere to the stream framer interface in the UI. If the `encode` query parameter is omitted events will be streamed as newline delimited JSON.	2020-10-14 12:44:36 -04:00
James Rasell	e0734bed77	agent: fix enterprise config overlay merging.	2020-10-14 09:35:16 +02:00
Chris Baker	1d35578bed	removed backwards-compatible/untagged metrics deprecated in 0.7	2020-10-13 20:18:39 +00:00
Seth Hoenig	ed13e5723f	consul/connect: dynamically select envoy sidecar at runtime As newer versions of Consul are released, the minimum version of Envoy it supports as a sidecar proxy also gets bumped. Starting with the upcoming Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the default implementation of Connect sidecar proxy. This PR introduces a change such that each Nomad Client will query its local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545) and then launch the Connect sidecar proxy task using the latest supported version of Envoy. If the `SupportedProxies` API component is not available from Consul, Nomad will fallback to the old version of Envoy supported by old versions of Consul. Setting the meta configuration option `meta.connect.sidecar_image` or setting the `connect.sidecar_task` stanza will take precedence as is the current behavior for sidecar proxies. Setting the meta configuration option `meta.connect.gateway_image` will take precedence as is the current behavior for connect gateways. `meta.connect.sidecar_image` and `meta.connect.gateway_image` may make use of the special `${NOMAD_envoy_version}` variable interpolation, which resolves to the newest version of Envoy supported by the Consul agent. Addresses #8585 #7665	2020-10-13 09:14:12 -05:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Tim Gross	ecec432653	csi: allow for volume detach to work with gc'd nodes (#9057 ) When we try to prefix match the `nomad volume detach` node ID argument, the node may have been already GC'd. The volume unpublish workflow gracefully handles this case so that we can free the claim. So make a best effort to find a node ID among the volume's claimed allocations, or otherwise just use the node ID we've been given by the user as-is.	2020-10-09 09:45:03 -04:00
Tim Gross	82749bd6a6	csi: allow volume detach to take a node ID prefix (#9041 ) Fixes a bug where the `nomad volume detach` command would not accept a node ID prefix instead of a full node ID. The volume ID is already prefix matched server-side.	2020-10-07 11:14:57 -04:00
James Rasell	d2fe895216	Merge pull request #9023 from hashicorp/f-gh-8648 cli: add scale and scaling-events commands to job cmd.	2020-10-06 18:03:41 +02:00
Dave May	561637c063	Merge pull request #9034 from hashicorp/dmay-debug-metrics Add metrics command / output to debug bundle	2020-10-06 11:47:09 -04:00
Seth Hoenig	6cffbecb3a	Merge pull request #9033 from pierreca/verify-remove-checks Do not double-remove checks removed by Consul	2020-10-06 10:16:13 -05:00
davemay99	18aa30c00f	metrics return bytes instead of string for more flexibility	2020-10-06 10:49:15 -04:00
davemay99	19a075cf47	update deprecated syntax per GH-9027	2020-10-06 09:47:16 -04:00
davemay99	7160c26f04	sync vendored modules	2020-10-06 09:16:52 -04:00
James Rasell	b7dac9020f	Merge pull request #9025 from hashicorp/f-gh-8649 cli: add policy list and info to new scaling cmd.	2020-10-06 12:40:43 +02:00
James Rasell	552d1b2ed4	cli: ensure scaling policy target doesn't have trailing comma	2020-10-06 12:18:17 +02:00
James Rasell	564adc1678	cli: add scale and scaling-events commands to job cmd. This adds the ability to scale Nomad jobs and view scaling events via the CLI.	2020-10-06 09:58:46 +02:00
James Rasell	ffe6533ad1	Merge pull request #9027 from hashicorp/f-gh-9026 cli: move tests to use NewMockUi func.	2020-10-06 08:28:18 +02:00
davemay99	603cc1776c	Add metrics command / output to debug bundle	2020-10-05 22:30:01 -04:00
Pierre Cauchois	1efe05f516	Do not double-remove checks removed by Consul When deregistering a service, consul also deregisters the associated checks. The current state keeps track of all services and all checks separately and deregisters them in sequence, which leads, whether during syncs or shutdowns, to check deregistrations happening twice and failing the second time (generating errors in logs) This fix includes: - a fix to the sync logic that just pulls the checks after the services have been synced - a fix to the shutdown mechanism that gets an updated list of checks after deregistering the services, so that we get a cleaner check deregistration process.	2020-10-06 00:30:29 +00:00
Chris Baker	7f701fddd0	updated docs and validation to further prohibit null chars in region, datacenter, and job name	2020-10-05 18:01:50 +00:00
James Rasell	2ed78b8a7e	cli: move tests to use NewMockUi func.	2020-10-05 16:07:41 +02:00
James Rasell	b8727997cd	cli: add policy list and info to new scaling cmd. This adds the ability to detail scaling policies via the CLI.	2020-10-05 15:18:30 +02:00
Kent 'picat' Gruber	5e1c716835	Merge pull request #8998 from hashicorp/keygen-32-bytes Use 32-byte key for gossip encryption to enable AES-256	2020-10-02 17:17:55 -04:00
Kent 'picat' Gruber	b03f79700c	Fix panic in test due to the agent's logger not being initialized yet So a null logger is used to avoid the problem.	2020-10-02 11:10:27 -04:00
Fredrik Hoem Grelland	953d4de8dd	update consul-template to v0.25.1 (#8988 )	2020-10-01 14:08:49 -04:00
Kent 'picat' Gruber	90e85f9add	Fix other usages of initKeyring func to use logger as third argument	2020-10-01 11:13:06 -04:00
Kent 'picat' Gruber	b98bb99dfe	Log AES-128 and AES-192 key sizes during keyring initialization	2020-10-01 11:12:14 -04:00
Kent 'picat' Gruber	42bdb03f43	Fix operator keygen test to check for 32 bytes	2020-09-30 17:04:33 -04:00
Kent 'picat' Gruber	6cefe03359	Generate 32-byte gossip key for nomad operator keygen command The key generated from this command is used for gossip encryption, which utilizes AES GCM encryption. Using a key size of 16 bytes enables AES-128 while a key size of 32 bytes enables AES-256. The underlying memberlist library supports the larger key size, and is ultimately preferable from a security standpoint. Consul also uses 32 bytes by default: `1a14b94441`	2020-09-30 17:02:37 -04:00
Michael Schurter	765473e8b0	jobspec: lower min cpu resources from 10->1 Since CPU resources are usually a soft limit it is desirable to allow setting it as low as possible to allow tasks to run only in "idle" time. Setting it to 0 is still not allowed to avoid potential unintentional side effects with allowing a zero value. While there may not be any side effects this commit attempts to minimize risk by avoiding the issue. This does not change the defaults.	2020-09-30 12:15:13 -07:00
Dave May	eaa4f6faf5	Merge pull request #8922 from hashicorp/dmay-raftutil-path Raftutil cleanup, plus helper function to find raft.db	2020-09-29 15:12:32 -04:00
Tim Gross	b12938a9fb	command: fix a typo in the help text for namespaces (#8975 )	2020-09-28 12:23:25 -04:00
davemay99	f2b3536da2	refactor functions to find raft.db	2020-09-24 19:00:53 -04:00
Nick Ethier	e75a3f349b	command: remove mbits from quota hcl (#8740 )	2020-09-24 11:44:59 -04:00
davemay99	bc9fb2a6ee	remove extra debug output	2020-09-17 21:42:53 -04:00
davemay99	5a159f1108	Raftutil cleanup, plus helper function to find raft.db	2020-09-17 21:35:17 -04:00
Mahmood Ali	87b0437e0f	Merge pull request #8911 from hashicorp/f-task_network_warning-smaller Smaller 0.12 mbit deprecation PR	2020-09-17 08:11:13 -05:00
Tim Gross	7a691d0000	filter volumes by type in 'nomad node status' output (#8902 ) Volume requests can be either CSI or host volumes, so when displaying the CSI volume info for `nomad node status -verbose` we need to filter out the host volumes.	2020-09-16 15:00:12 -04:00
Mahmood Ali	d65cda5e70	Update job examples with MBit deprecation	2020-09-16 11:06:19 -04:00
Charlie Voiselle	5ec3945531	Change tabs to spaces in nomad monitor help text	2020-09-14 15:08:30 -04:00
Michael Schurter	1544341f09	Merge pull request #8862 from hashicorp/release-0.12.4 Prepare for 0.13 development cycle	2020-09-10 09:14:44 -07:00
Mahmood Ali	d4f385d6e1	Upgrade to golang 1.15 (#8858 ) Upgrade to golang 1.15 Starting with golang 1.15, setting the Ctty value results in a `Setctty set but Ctty not valid in child` error, as part of https://github.com/golang/go/issues/29458 . This commit lifts the fix in https://github.com/creack/pty/pull/97 .	2020-09-09 15:59:29 -04:00
Nomad Release bot	3b8a2f22dc	Generate files for 0.12.4-rc1 release	2020-09-03 02:59:23 +00:00
Drew Bailey	33dc50dca0	Merge pull request #8793 from hashicorp/debug-cli/run-intervals run commands for duration and interval without needing to specify server or node	2020-08-31 16:07:26 -04:00
Drew Bailey	6d7a6ebb38	run commands for duration and interval without needing to specify servers or nodes	2020-08-31 14:13:03 -04:00
Lang Martin	b4d364f030	command/plugin_status_csi: plugin status :id keeps expected count	2020-08-31 13:56:54 -04:00
Drew Bailey	1f7ea53876	add license info to operator debug command	2020-08-31 13:22:23 -04:00
Mahmood Ali	66df214792	raft debug commands are low-level internal commands	2020-08-31 08:45:59 -04:00
Mahmood Ali	12dbf699fa	Apply suggestions from code review Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2020-08-31 08:45:59 -04:00
Mahmood Ali	d588f91575	add helper commands for debugging state	2020-08-31 08:45:59 -04:00
Jeffrey 'jf' Lim	25071c525a	Fix cmd.Name() for NodeEligibilityCommand	2020-08-29 22:45:42 +08:00
Tim Gross	b77fe023b5	MRD: move 'job stop -global' handling into RPC (#8776 ) The initial implementation of global job stop for MRD looped over all the regions in the CLI for expedience. This changeset includes the OSS parts of moving this into the RPC layer so that API consumers don't have to implement this logic themselves.	2020-08-28 14:28:13 -04:00
Lang Martin	97c7f2acea	command/operator_debug: mkdir before storing agent-host (#8707 ) The api calls were reordered, the new order omits the `agent-host.json` result by fetching it before the directory is created.	2020-08-28 11:58:06 -04:00
Lang Martin	7d483f93c0	csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699 ) * nomad/structs/csi: add explicit job support * nomad/state/state_store: capture job updates directly * api/nodes: CSIInfo needs the AllocID * command/agent/csi_endpoint: AllocID was missing Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-08-27 17:20:00 -04:00
Seth Hoenig	9f1f2a5673	Merge branch 'master' into f-cc-ingress	2020-08-26 15:31:05 -05:00
Seth Hoenig	dfe179abc5	consul/connect: fixup some comments and context timeout	2020-08-26 13:17:16 -05:00
Tim Gross	f9b6c8153c	csi: fix panic in serializing nil allocs in volume API (#8735 ) - fix panic in serializing nil allocs in volume API - prevent potential panic in serializing plugin allocs	2020-08-25 10:13:05 -04:00
Seth Hoenig	26e77623e5	consul/connect: fixup tests to use new consul sdk	2020-08-24 12:02:41 -05:00
Seth Hoenig	c4fa644315	consul/connect: remove envoy dns option from gateway proxy config	2020-08-24 09:11:55 -05:00
Yoan Blanc	327d17e0dc	fixup! vendor: consul/api, consul/sdk v1.6.0 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-08-24 08:59:03 +02:00
Seth Hoenig	5b072029f2	consul/connect: add initial support for ingress gateways This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza. ```hcl service { connect { gateway { proxy { // envoy proxy configuration } ingress { // ingress-gateway configuration entry } } } } ``` A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value. Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver. Aims to address #8294 and tangentially #8647	2020-08-21 16:21:54 -05:00
Nick Ethier	3cd5f46613	Update UI to use new allocated ports fields (#8631 ) * nomad: canonicalize alloc shared resources to populate ports * ui: network ports * ui: remove unused task network references and update tests with new shared ports model * ui: lint * ui: revert auto formatting * ui: remove unused page objects * structs: remove unrelated test from bad conflict resolution * ui: formatting	2020-08-20 11:07:13 -04:00
Tim Gross	22e77bb03c	mrd: remove redundant validation in HTTP endpoint (#8685 ) The `regionForJob` function in the HTTP job endpoint overrides the region for multiregion jobs to `global`, which is used as a sentinel value in the server's job endpoint to avoid re-registration loops. This changeset removes an extraneous check that results in errors in the web UI and makes round-tripping through the HTTP API cumbersome for all consumers.	2020-08-18 16:48:09 -04:00
Tim Gross	38ec70eb8d	multiregion: validation should always return error for OSS (#8687 )	2020-08-18 15:35:38 -04:00
Lang Martin	6d8165c410	command/agent/csi_endpoint: explicit allocations (#8669 )	2020-08-13 15:48:08 -04:00
Tim Gross	7dca72acbe	csi: fix panic from assignment to nil map in plugin API (#8666 )	2020-08-13 11:36:41 -04:00
Tim Gross	3faa138732	fix panic converting structs to API in CSI endpoint (#8659 )	2020-08-12 15:59:10 -04:00
Nomad Release bot	1ea9d4eb22	Generate files for 0.12.2 release	2020-08-12 00:50:49 +00:00
Lang Martin	07ea822c6a	nomad debug renamed to nomad operator debug (#8602 ) * renamed: command/debug.go -> command/operator_debug.go * website: rename debug -> operator debug * website/pages/api-docs/agent: name in api docs	2020-08-11 15:39:44 -04:00
Lang Martin	1d7998f39f	`debug` command archive content changes (#8462 ) * command/debug: print interval data so the operator knows it's waiting * command/debug: use the Consul/Vault env for queries * command/debug: capture the operator endpoints * command/debug: capture API errors in the archive bundle	2020-08-11 13:14:28 -04:00
Lang Martin	c82b2a2454	CSI: volume and plugin allocations in the API (#8590 ) * command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints	2020-08-11 12:24:41 -04:00
Tim Gross	443fdaa86b	csi: nomad volume detach command (#8584 ) The soundness guarantees of the CSI specification leave a little to be desired in our ability to provide a 100% reliable automated solution for managing volumes. This changeset provides a new command to bridge this gap by providing the operator the ability to intervene. The command doesn't take an allocation ID so that the operator doesn't have to keep track of alloc IDs that may have been GC'd. Handle this case in the unpublish RPC by sending the client RPC for all the terminal/nil allocs on the selected node.	2020-08-11 10:18:54 -04:00
Seth Hoenig	fd4804bf26	consul: able to set pass/fail thresholds on consul service checks This change adds the ability to set the fields `success_before_passing` and `failures_before_critical` on Consul service check definitions. This is a feature added to Consul v1.7.0 and later. https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical Nomad doesn't do much besides pass the fields through to Consul. Fixes #6913	2020-08-10 14:08:09 -05:00
Drew Bailey	c06a84e4a2	ignore VAULT_NAMESPACE (#8581 ) VAULT_NAMESPACE in 0.12.1 and previous versions is already ignored. revert change that used it as a default since it will break oss users	2020-07-31 10:33:21 -04:00
Drew Bailey	b296558b8e	oss components for multi-vault namespaces adds in oss components to support enterprise multi-vault namespace feature upgrade specific doc on vault multi-namespaces vault docs update test to reflect new error	2020-07-24 10:14:59 -04:00
Mahmood Ali	b800a4f80e	Merge pull request #8514 from sashaaKr/bugfix/cli_ui change url to client	2020-07-24 09:54:39 -04:00
James Rasell	95db43eaf0	Merge pull request #8491 from hashicorp/b-gh-8481 api: task groups in system jobs do not support scaling stanzas.	2020-07-24 14:20:26 +02:00
Tim Gross	43d2052c99	csi: avoid panic in CLI for failed node attachment cleanup (#8525 ) If the node API returns an attached volume that doesn't belong to an alloc (because it's failed to clean up properly), `nomad node status` will panic when rendering the response. Also, avoid empty volumes output in node status	2020-07-24 08:17:27 -04:00
sasha	f09a65227d	remove test file	2020-07-23 18:44:10 +03:00
sasha	2980010e63	change url to client	2020-07-23 18:41:38 +03:00
Nomad Release bot	f2f50bf48e	Generate files for 0.12.1 release	2020-07-23 13:17:59 +00:00
Mahmood Ali	fc38cd21c4	mrd: only output evalID if found If the multi-region job is a periodic/dispatch job, stopping them returns an empty EvalID. This removes some unexpected empty lines.	2020-07-22 16:43:03 -04:00
Lars Lehtonen	e26ea30b7e	command/agent: fix dropped test error (#8504 )	2020-07-22 15:06:35 -04:00
Drew Bailey	744cf9b2e8	remove duplicate license info (#8496 )	2020-07-22 10:21:56 -04:00
James Rasell	2da8bd8f58	agent: task groups in system jobs do not support scaling stanzas.	2020-07-22 11:10:59 +02:00
Mahmood Ali	c29dec2ebd	format job init hcl (#8483 )	2020-07-21 11:49:02 -04:00
Mahmood Ali	a483dde8b9	minor tweaks from Ent	2020-07-20 09:25:09 -04:00
Mahmood Ali	72ac33e4e7	Refactor setupLoggers	2020-07-17 11:05:57 -04:00
Mahmood Ali	ad2d484974	Set AgentShutdown	2020-07-17 11:04:57 -04:00
Mahmood Ali	ec9a12e54e	Fix pro tags	2020-07-17 11:02:00 -04:00
Tim Gross	bd457343de	MRD: all regions should start pending (#8433 ) Deployments should wait until kicked off by `Job.Register` so that we can assert that all regions have a scheduled deployment before starting any region. This changeset includes the OSS fixes to support the ENT work. `IsMultiregionStarter` has no more callers in OSS, so remove it here.	2020-07-14 10:57:37 -04:00
Chris Baker	f8478b6f82	Merge branch 'master' of github.com:hashicorp/nomad into release-0.12.0	2020-07-08 21:16:31 +00:00
Nick Ethier	119ece09a0	docs: add CNI and host_network docs (#8391 ) Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>	2020-07-08 15:45:04 -04:00
Tim Gross	1098ca6ef1	fix multiregion plan output flags (#8375 ) The call to render the output diff swapped the `diff` and `verbose` bool parameters, resulting in dropping the diff output in multi-region plans but not single-region plans.	2020-07-08 10:10:08 -04:00
Nomad Release bot	549e766eab	Generate files for 0.12.0-rc1 release	2020-07-07 03:17:05 +00:00
Nick Ethier	e0fb634309	ar: support opting into binding host ports to default network IP (#8321 ) * ar: support opting into binding host ports to default network IP * fix config plumbing * plumb node address into network resource * struct: only handle network resource upgrade path once	2020-07-06 18:51:46 -04:00
Tim Gross	18250f71fd	fix region flag vs job region handling in plan/submit (#8347 )	2020-07-06 15:46:09 -04:00
Chris Baker	9100b6b7c0	changes to make sure that Max is present and valid, to improve error messages * made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected * added checks to HCL parsing that it is present * when Scaling.Max is absent/invalid, don't return extraneous error messages during validation * tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs resolves #8355	2020-07-04 19:05:50 +00:00
Mahmood Ali	329969b97e	tests: make testagent shutdown idempotent Avoid double freeing ports if an agent.Shutdown() is called multiple times.	2020-07-03 09:16:01 -04:00
Lang Martin	1e7560d621	command/debug: use the correct env vars for Consul token (#8332 )	2020-07-02 10:04:22 -04:00
Lang Martin	6c22cd587d	api: `nomad debug` new /agent/host (#8325 ) * command/agent/host: collect host data, multi platform * nomad/structs/structs: new HostDataRequest/Response * client/agent_endpoint: add RPC endpoint * command/agent/agent_endpoint: add Host * api/agent: add the Host endpoint * nomad/client_agent_endpoint: add Agent Host with forwarding * nomad/client_agent_endpoint: use findClientConn This changes forwardMonitorClient and forwardProfileClient to use findClientConn, which was cribbed from the common parts of those funcs. * command/debug: call agent hosts * command/agent/host: eliminate calling external programs	2020-07-02 09:51:25 -04:00
Mahmood Ali	1917989a1f	document namespace option in CLI docs	2020-07-01 15:31:41 -04:00
Tim Gross	23be116da0	csi: add -force flag to volume deregister (#8295 ) The `nomad volume deregister` command currently returns an error if the volume has any claims, but in cases where the claims can't be dropped because of plugin errors, providing a `-force` flag gives the operator an escape hatch. If the volume has no allocations or if they are all terminal, this flag deletes the volume from the state store, immediately and implicitly dropping all claims without further CSI RPCs. Note that this will not also unmount/detach the volume, which we'll make the responsibility of a separate `nomad volume detach` command.	2020-07-01 12:17:51 -04:00
Mahmood Ali	ee6fbcbc0f	Merge pull request #8296 from hashicorp/b-tests-cleanup-20200625 Cleanup for command package tests	2020-06-26 09:31:41 -04:00
Mahmood Ali	30492e8119	tests: avoid using os.Setenv for tokens	2020-06-26 08:52:21 -04:00
Mahmood Ali	9583190eb3	tests: use flagAddress instead of process env Using Setenv may cause test interference, where a test may accidentally pick up a value set by another test.	2020-06-26 08:52:21 -04:00
Mahmood Ali	384d8cf3a5	Merge pull request #8271 from hashicorp/f-comment-init-check-stanza Comment out default Consul check; Update URLs	2020-06-26 08:30:30 -04:00
Nick Ethier	89118016fc	command: correctly show host IP in ports output /w multi-host networks (#8289 )	2020-06-25 15:16:01 -04:00
Lang Martin	9b657b5e5e	new command: nomad debug captures a debug archive of cluster state (#8244 ) * command/debug: build a local archive of debug data * command/debug: query consul and vault directly * command/debug: include pprof CPUProfile Trace and goroutine * command/debug: trap signals and close the monitor requests	2020-06-25 12:51:23 -04:00
Mahmood Ali	8631e9dad5	always shutdown test server on test cleanup	2020-06-25 12:44:19 -04:00
Tim Gross	e52f76ed53	update compiled static assets	2020-06-24 16:37:13 -04:00
Charlie Voiselle	e0e3a66b3a	Fix link to scheduler page	2020-06-24 15:44:07 -04:00
Charlie Voiselle	9b20269709	Comment out default Consul check; Update URLs Having an active check in the sample job causes issues with testing deployments in environments that are not integrated with Consul. This negatively impacts some of the getting-started experiences. Commenting out the check allows deployments to proceed successfully but leaves it in the sample job for convenience. Made a drive-by fix to all of the URLs in the jobfile	2020-06-24 15:34:48 -04:00
Tim Gross	67ffcb35e9	multiregion: add support for 'job plan' (#8266 ) Add a scatter-gather for multiregion job plans. Each region's servers interpolate the plan locally in `Job.Plan` but don't distribute the plan as done in `Job.Run`. Note that it's not possible to return a usable modify index from a multiregion plan for use with `-check-index`. Even if we were to force the modify index to be the same at the start of `Job.Run` the index immediately drifts during each region's deployments, depending on events local to each region. So we omit this section of a multiregion plan.	2020-06-24 13:24:55 -04:00
Tim Gross	a449009e9f	multiregion validation fixes (#8265 ) Multi-region jobs need to bypass validating counts otherwise we get spurious warnings in Job.Plan.	2020-06-24 12:18:51 -04:00
Seth Hoenig	3872b493e5	Merge pull request #8011 from hashicorp/f-cnative-host consul/connect: implement initial support for connect native	2020-06-24 10:33:12 -05:00
Seth Hoenig	e79b79034d	connect/native: fixup command/agent/consul/connect test cases	2020-06-24 09:05:56 -05:00
Tim Gross	010d94d419	multiregion: job stop across regions with -global flag (#8258 ) Adds a `-global` flag for stopping multiregion jobs in all regions at once. Warn the user if they attempt to stop a multiregion job in a single region.	2020-06-23 15:56:04 -04:00
James Rasell	bc40665f1d	cli: fix license get command help Synopsis text.	2020-06-23 18:47:39 +02:00
Seth Hoenig	6c5ab7f45e	consul/connect: split connect native flag and task in service	2020-06-23 10:22:22 -05:00
Seth Hoenig	4d71f22a11	consul/connect: add support for running connect native tasks This PR adds the capability of running Connect Native Tasks on Nomad, particularly when TLS and ACLs are enabled on Consul. The `connect` stanza now includes a `native` parameter, which can be set to the name of task that backs the Connect Native Consul service. There is a new Client configuration parameter for the `consul` stanza called `share_ssl`. Like `allow_unauthenticated` the default value is true, but recommended to be disabled in production environments. When enabled, the Nomad Client's Consul TLS information is shared with Connect Native tasks through the normal Consul environment variables. This does NOT include auth or token information. If Consul ACLs are enabled, Service Identity Tokens are automatically and injected into the Connect Native task through the CONSUL_HTTP_TOKEN environment variable. Any of the automatically set environment variables can be overridden by the Connect Native task using the `env` stanza. Fixes #6083	2020-06-22 14:07:44 -05:00
Mahmood Ali	fa4e898c45	accommodate enterprise specific commands `nomad operator snapshot agent` is an Enterprise specific command	2020-06-22 10:27:25 -04:00
Michael Schurter	562704124d	Merge pull request #8208 from hashicorp/f-multi-network multi-interface network support	2020-06-19 15:46:48 -07:00
Mahmood Ali	bf08b7a890	Merge pull request #8214 from hashicorp/docs-snapshot-update Update changelog and snapshot docs	2020-06-19 14:27:12 -04:00
Mahmood Ali	d04ab67045	Apply suggestions from code review Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-06-19 13:36:22 -04:00
Mahmood Ali	ef6507d6ee	cli: use <file> for consistency	2020-06-19 12:19:38 -04:00
Mahmood Ali	ce0eee6a78	complete missed message	2020-06-19 11:02:36 -04:00
Mahmood Ali	963b1251ff	Merge pull request #8082 from hashicorp/f-raft-multipler Implement raft multipler flag	2020-06-19 10:04:59 -04:00
Nick Ethier	f0559a8162	multi-interface network support	2020-06-19 09:42:10 -04:00
Mahmood Ali	38a01c050e	Merge pull request #8192 from hashicorp/f-status-allnamespaces-2 CLI Allow querying all namespaces for jobs and allocations - Try 2	2020-06-18 20:16:52 -04:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661 ) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Mahmood Ali	5c623f33d5	cli: warn on multiple prefix matches when querying all namespaces	2020-06-17 16:32:51 -04:00
Mahmood Ali	8d9ce41202	cli: query all namespaces for alloc subcommands	2020-06-17 16:31:06 -04:00
Mahmood Ali	7a33a75449	cli: jobs allow querying jobs in all namespaces	2020-06-17 16:31:01 -04:00
Mahmood Ali	e784fe331a	use '*' to indicate all namespaces This reverts the introduction of AllNamespaces parameter that was merged earlier but never got released.	2020-06-17 16:27:43 -04:00
Tim Gross	7b12445f29	multiregion: change AutoRevert to OnFailure	2020-06-17 11:05:45 -04:00
Tim Gross	b09b7a2475	Multiregion job registration Integration points for multiregion jobs to be registered in the enterprise version of Nomad: * hook in `Job.Register` for enterprise to send job to peer regions * remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs	2020-06-17 11:04:58 -04:00
Tim Gross	161bcd9479	use constants from http package	2020-06-17 11:04:02 -04:00
Tim Gross	b93efc16d5	multiregion CLI: nomad deployment unblock	2020-06-17 11:03:44 -04:00
Drew Bailey	9263fcb0d3	Multiregion deploy status and job status CLI	2020-06-17 11:03:34 -04:00
Tim Gross	6851024925	Multiregion structs Initial struct definitions, jobspec parsing, validation, and conversion between Nomad structs and API structs for multi-region deployments.	2020-06-17 11:00:14 -04:00
Chris Baker	de8a46b0f8	added -preserve-counts to `job run` CLI, updated website	2020-06-16 18:45:28 +00:00
Chris Baker	377f881fbd	removed api.RegisterJobRequest in favor of api.JobRegisterRequest modified `job inspect` and `job run -output` to use anonymous struct to keep previous behavior	2020-06-16 18:45:17 +00:00
Chris Baker	1e3563e08c	wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register	2020-06-16 18:45:17 +00:00
James Rasell	080d521691	Merge pull request #8162 from hashicorp/b-gh-8161 cli: fix malformed alloc status address list when more than 1 addr	2020-06-16 16:35:53 +02:00
James Rasell	222987602b	cli: fix malformed alloc status address list when more than 1 addr	2020-06-15 14:35:47 +02:00
Mahmood Ali	9bfc3e28d9	Apply suggestions from code review Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2020-06-15 08:32:16 -04:00
Mahmood Ali	dda67192b6	clarify error message Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-06-09 11:26:52 -04:00
Mahmood Ali	63f6307487	tests: client already disabled	2020-06-07 16:38:11 -04:00
Mahmood Ali	69bb42acf8	tests: prefix agent logs to identify agent sources	2020-06-07 16:38:11 -04:00
Mahmood Ali	257b3600ab	implement snapshot restore CLI	2020-06-07 15:47:07 -04:00
Mahmood Ali	9eb13ae144	basic snapshot restore	2020-06-07 15:46:23 -04:00
Seth Hoenig	435c0d9fc8	deps: Switch to Go modules for dependency management This PR switches the Nomad repository from using govendor to Go modules for managing dependencies. Aspects of the Nomad workflow remain pretty much the same. The usual Makefile targets should continue to work as they always did. The API submodule simply defers to the parent Nomad version on the repository, keeping the semantics of API versioning that currently exists.	2020-06-02 14:30:36 -05:00
Mahmood Ali	de44d9641b	Merge pull request #8047 from hashicorp/f-snapshot-save API for atomic snapshot backups	2020-06-01 07:55:16 -04:00
Mahmood Ali	19cc84ec05	Apply suggestions from code review Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-05-31 21:29:17 -04:00
Mahmood Ali	a73cd01a00	Merge pull request #8001 from hashicorp/f-jobs-list-across-nses endpoint to expose all jobs across all namespaces	2020-05-31 21:28:03 -04:00
Mahmood Ali	0e8fafd739	implement raft multiplier	2020-05-31 12:24:27 -04:00
Drew Bailey	23d24c7a7f	removes pro tags (#8014 )	2020-05-28 15:40:17 -04:00
Drew Bailey	34871f89be	Oss license support for ent builds (#8054 ) * changes necessary to support oss licesning shims revert nomad fmt changes update test to work with enterprise changes update tests to work with new ent enforcements make check update cas test to use scheduler algorithm back out preemption changes add comments * remove unused method	2020-05-27 13:46:52 -04:00
Drew Bailey	5948c4f497	Revert "disable license cli commands"	2020-05-26 12:39:39 -04:00
Seth Hoenig	889e7ddd0c	build: use hashicorp hclfmt We have been using fatih/hclfmt which is long abandoned. Instead, switch to HashiCorp's own hclfmt implementation. There are some trivial changes in behavior around whitespace.	2020-05-24 18:31:57 -05:00
Mahmood Ali	08b69d3bc4	implement snapshot inspect CLI	2020-05-21 20:04:38 -04:00
Mahmood Ali	0a27559b8f	Implement snapshot save CLI	2020-05-21 20:04:38 -04:00
Mahmood Ali	2108681c1d	Endpoint for snapshotting server state	2020-05-21 20:04:38 -04:00
James Rasell	ae0fb98c6b	api: return custom error if API attempts to decode empty body.	2020-05-19 15:46:31 +02:00
Mahmood Ali	5ab2d52e27	endpoint to expose all jobs across all namespaces Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all namespaces. The returned list is to contain a `Namespace` field indicating the job namespace. If ACL is enabled, the request token needs to be a management token or have `namespace:list-jobs` capability on all existing namespaces.	2020-05-18 13:50:46 -04:00
Nomad Release bot	189a378549	Generate files for 0.11.2 release	2020-05-14 20:49:42 +00:00
Mahmood Ali	9366181be6	always check `default_scheduler_config` config Also, avoid early return on validation to avoid masking some validation bugs in dev setup.	2020-05-14 14:16:12 -04:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939 ) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Drew Bailey	466e8d5043	disable license cli commands	2020-05-11 13:49:29 -04:00
Mahmood Ali	061a439f2c	Merge pull request #7912 from hashicorp/f-scheduler-algorithm-followup Scheduler Algorithm Defaults handling and docs	2020-05-11 09:30:58 -04:00
Tim Gross	3aa761b151	Periodic GC for volume claims (#7881 ) This changeset implements a periodic garbage collection of CSI volumes with missing allocations. This can happen in a scenario where a node update fails partially and the allocation updates are written to raft but the evaluations to GC the volumes are dropped. This feature will cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1 get any stray claims cleaned up.	2020-05-11 08:20:50 -04:00
Mahmood Ali	2c963885b0	handle upgrade path and defaults Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on upgrades or on API handling when user misses the value. The scheduler already treats `""` value as binpack. This PR merely ensures that the operator API returns the effective value.	2020-05-09 12:34:08 -04:00
Drew Bailey	fde40046a1	update license output	2020-05-07 12:14:15 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878 ) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Drew Bailey	48c451709e	update license command output to reflect api changes	2020-05-05 10:28:58 -04:00
Mahmood Ali	78ae7b885a	Merge pull request #7810 from hashicorp/spread-configuration spread scheduling algorithm	2020-05-01 13:15:19 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	663fb677cf	Add SchedulerAlgorithm to SchedulerConfig	2020-05-01 13:13:29 -04:00
Drew Bailey	581ad558a8	temporarily test for 404 until endpoint is ready	2020-05-01 11:24:37 -04:00
Drew Bailey	41c7d49eb7	properly format license output	2020-04-30 14:46:26 -04:00
Drew Bailey	42075ef30e	allow test to check if server is enterprise	2020-04-30 14:46:21 -04:00
Drew Bailey	acacecc67b	add license reset command to commands help text formatting remove reset no signed option	2020-04-30 14:46:20 -04:00
Drew Bailey	a266284f60	test all commands oss err	2020-04-30 14:46:19 -04:00
Drew Bailey	59b76f90e8	hcl fmt from editor license cli formatting, license endpoints ent only test oss error type assertions	2020-04-30 14:46:18 -04:00
Drew Bailey	74abe6ef48	license cli commands cli changes, formatting	2020-04-30 14:46:17 -04:00
Lang Martin	e32b5b12dd	command: deployment status without a prefix lists deployments (#7821 )	2020-04-28 15:11:32 -04:00
Mahmood Ali	b8fb32f5d2	http: adjust log level for request failure Failed requests due to API client errors are to be marked as DEBUG. The Error log level should be reserved to signal problems with the cluster that are actionable for nomad system operators. Logs due to misbehaving API clients don't represent a system level problem and seem spurious to nomad maintainers at best. These log messages can also be attack vectors for denial of service attacks by filling servers' disk space with spurious log messages.	2020-04-22 16:19:59 -04:00
Mahmood Ali	5b42796f1e	Merge pull request #7704 from hashicorp/b-agent-shutdown-order agent: shutdown agent http server last	2020-04-20 10:37:26 -04:00
Mahmood Ali	4e1366f285	agent: route http logs through hclog Pipe http server log to hclog, so that it uses the same logging format as rest of nomad logs. Also, supports emitting them as json logs, when json formatting is set. The http server logs are emitted as Trace level, as they typically represent HTTP client errors (e.g. failed tls handshakes, invalid headers, etc). Though, Panic logs represent server errors and are relayed as Error level.	2020-04-20 10:33:40 -04:00
Jeffrey 'jf' Lim	eab600d3e1	Fix/improve "job plan" messaging (#7580 )	2020-04-17 15:53:16 -04:00
Mahmood Ali	b78680eee7	agent: shutdown agent http server last Shutdown http server last, after nomad client/server components terminate. Before this change, if the agent is taking an unexpectedly long time to shutdown, the operator cannot query the http server directly: they cannot access agent specific http endpoints and need to query another agent about the troublesome agent. Unexpectedly long shutdown can happen in normal cases, e.g. a client might hung is if one of the allocs it is running has a long shutdown_delay. Here, we switch to ensuring that the http server is shutdown last. I believe this doesn't require extra care in agent shutting down logic while operators may be able to submit write http requests. We already need to cope with operators submiting these http requests to another agent or by servers updating the client allocations.	2020-04-13 10:50:07 -04:00
Mahmood Ali	14d6fec05a	tests: deflake some SetServer related tests Some tests assert on numbers on numbers of servers, e.g. TestHTTP_AgentSetServers and TestHTTP_AgentListServers_ACL . Though, in dev and test modes, the agent starts with servers having duplicate entries for advertised and normalized RPC values, then settles with one unique value after Raft/Serf re-sets servers with one single unique value. This leads to flakiness, as the test will fail if assertion runs before Serf update takes effect. Here, we update the inital dev handling so it only adds a unique value if the advertised and normalized values are the same. Sample log lines illustrating the problem: ``` === CONT TestHTTP_AgentSetServers TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:9008 Address:127.0.0.1:9008}]" TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad: serf: EventMemberJoin: TestHTTP_AgentSetServers.global 127.0.0.1 TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.035Z [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:9008, 127.0.0.1:9008] old_servers=[] ... TestHTTP_AgentSetServers: agent_endpoint_test.go:759: Error Trace: agent_endpoint_test.go:759 http_test.go:1089 agent_endpoint_test.go:705 Error: "[127.0.0.1:9008 127.0.0.1:9008]" should have 1 item(s), but has 2 Test: TestHTTP_AgentSetServers ```	2020-04-07 09:27:48 -04:00
Mahmood Ali	ed4c4d13a4	fixup! backend: support WS authentication handshake in alloc/exec	2020-04-03 14:20:31 -04:00
Mahmood Ali	e63e096136	backend: support WS authentication handshake in alloc/exec The javascript Websocket API doesn't support setting custom headers (e.g. `X-Nomad-Token`). This change adds support for having an authentication handshake message: clients can set `ws_handshake` URL query parameter to true and send a single handshake message with auth token first before any other message. This is a backward compatible change: it does not affect nomad CLI path, as it doesn't set `ws_handshake` parameter.	2020-04-03 11:18:54 -04:00
Mahmood Ali	990cfb6fef	agent config parsing tests for scheduler config	2020-04-03 07:54:32 -04:00
Chris Baker	277d29c6e7	Merge pull request #7572 from hashicorp/f-7422-scaling-events finalizing scaling API work	2020-04-01 13:49:22 -05:00
Seth Hoenig	9aa9721143	connect: fix bug where absent connect.proxy stanza needs default config In some refactoring, a bug was introduced where if the connect.proxy stanza in a submitted job was nil, the default proxy configuration would not be initialized with default values, effectively breaking Connect. connect { sidecar_service {} # should work } In contrast, by setting an empty proxy stanza, the config values would be inserted correctly. connect { sidecar_service { proxy {} # workaround } } This commit restores the original behavior, where having a proxy stanza present is not required. The unit test for this case has also been corrected.	2020-04-01 11:19:32 -06:00
Chris Baker	40d6b3bbd1	adding raft and state_store support to track job scaling events updated ScalingEvent API to record "message string,error bool" instead of confusing "reason,error *string"	2020-04-01 16:15:14 +00:00
Seth Hoenig	14c7cebdea	connect: enable automatic expose paths for individual group service checks Part of #6120 Building on the support for enabling connect proxy paths in #7323, this change adds the ability to configure the 'service.check.expose' flag on group-level service check definitions for services that are connect-enabled. This is a slight deviation from the "magic" that Consul provides. With Consul, the 'expose' flag exists on the connect.proxy stanza, which will then auto-generate expose paths for every HTTP and gRPC service check associated with that connect-enabled service. A first attempt at providing similar magic for Nomad's Consul Connect integration followed that pattern exactly, as seen in #7396. However, on reviewing the PR we realized having the `expose` flag on the proxy stanza inseperably ties together the automatic path generation with every HTTP/gRPC defined on the service. This makes sense in Consul's context, because a service definition is reasonably associated with a single "task". With Nomad's group level service definitions however, there is a reasonable expectation that a service definition is more abstractly representative of multiple services within the task group. In this case, one would want to define checks of that service which concretely make HTTP or gRPC requests to different underlying tasks. Such a model is not possible with the course `proxy.expose` flag. Instead, we now have the flag made available within the check definitions themselves. By making the expose feature resolute to each check, it is possible to have some HTTP/gRPC checks which make use of the envoy exposed paths, as well as some HTTP/gRPC checks which make use of some orthongonal port-mapping to do checks on some other task (or even some other bound port of the same task) within the task group. 
Given this example, group "server-group" { network { mode = "bridge" port "forchecks" { to = -1 } } service { name = "myserver" port = 2000 connect { sidecar_service { } } check { name = "mycheck-myserver" type = "http" port = "forchecks" interval = "3s" timeout = "2s" method = "GET" path = "/classic/responder/health" expose = true } } } Nomad will automatically inject (via job endpoint mutator) the extrapolated expose path configuration, i.e. expose { path { path = "/classic/responder/health" protocol = "http" local_path_port = 2000 listener_port = "forchecks" } } Documentation is coming in #7440 (needs updating, doing next) Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6 which will make the examples in the documentation actually runnable. Will add some e2e tests based on the above when it becomes available.	2020-03-31 17:15:50 -06:00
Seth Hoenig	41244c5857	jobspec: parse multi expose.path instead of explicit slice	2020-03-31 17:15:27 -06:00
Seth Hoenig	0266f056b8	connect: enable proxy.passthrough configuration Enable configuration of HTTP and gRPC endpoints which should be exposed by the Connect sidecar proxy. This changeset is the first "non-magical" pass that lays the groundwork for enabling Consul service checks for tasks running in a network namespace because they are Connect-enabled. The changes here provide for full configuration of the connect { sidecar_service { proxy { expose { paths = [{ path = <exposed endpoint> protocol = <http or grpc> local_path_port = <local endpoint port> listener_port = <inbound mesh port> }, ... ] } } } stanza. Everything from `expose` and below is new, and partially implements the precedent set by Consul: https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference Combined with a task-group level network port-mapping in the form: port "exposeExample" { to = -1 } it is now possible to "punch a hole" through the network namespace to a specific HTTP or gRPC path, with the anticipated use case of creating Consul checks on Connect enabled services. A future PR may introduce more automagic behavior, where we can do things like 1) auto-fill the 'expose.path.local_path_port' with the default value of the 'service.port' value for task-group level connect-enabled services. 2) automatically generate a port-mapping 3) enable an 'expose.checks' flag which automatically creates exposed endpoints for every compatible consul service check (http/grpc checks on connect enabled services).	2020-03-31 17:15:27 -06:00
Seth Hoenig	1ce4eb17fa	client: use consistent name for struct receiver parameter This helps reduce the number of squiggly lines in Goland.	2020-03-31 17:15:27 -06:00
Lang Martin	8d4f39fba1	csi: add node events to report progress mounting and unmounting volumes (#7547 ) * nomad/structs/structs: new NodeEventSubsystemCSI * client/client: pass triggerNodeEvent in the CSIConfig * client/pluginmanager/csimanager/instance: add eventer to instanceManager * client/pluginmanager/csimanager/manager: pass triggerNodeEvent * client/pluginmanager/csimanager/volume: node event on [un]mount * nomad/structs/structs: use storage, not CSI * client/pluginmanager/csimanager/volume: use storage, not CSI * client/pluginmanager/csimanager/volume_test: eventer * client/pluginmanager/csimanager/volume: event on error * client/pluginmanager/csimanager/volume_test: check event on error * command/node_status: remove an extra space in event detail format * client/pluginmanager/csimanager/volume: use snake_case for details * client/pluginmanager/csimanager/volume_test: snake_case details	2020-03-31 17:13:52 -04:00
Yoan Blanc	225c9c1215	fixup! vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:48:07 -04:00
Yoan Blanc	761d014071	vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:45:21 -04:00
Seth Hoenig	b3664c628c	Merge pull request #7524 from hashicorp/docs-consul-acl-minimums consul: annotate Consul interfaces with ACLs	2020-03-30 13:27:27 -06:00
Mahmood Ali	7df337e4c4	Merge pull request #7534 from hashicorp/b-windows-dev-network windows: support -dev mode	2020-03-30 14:35:28 -04:00
Seth Hoenig	0a812ab689	consul: annotate Consul interfaces with ACLs	2020-03-30 10:17:28 -06:00
Drew Bailey	a98dc8c768	update audit examples to an endpoint that is audited	2020-03-30 10:03:11 -04:00
Mahmood Ali	dedf1cd3d7	tests: remove TestHTTP_NodeDrain_Compat Nomad 0.11 servers no longer support having pre-0.8 clients.	2020-03-30 07:06:52 -04:00
Mahmood Ali	8b2b3f99d3	tests: deflake TestHTTP_NodeDrain A node may be recognized as not running any allocs and have its drain flag reset before the test queries it.	2020-03-30 07:06:52 -04:00
Mahmood Ali	b0cc23ae63	tests: deflake TestConsul_PeriodicSync	2020-03-30 07:06:47 -04:00


3008 commits