open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	bd457343de	MRD: all regions should start pending (#8433) Deployments should wait until kicked off by `Job.Register` so that we can assert that all regions have a scheduled deployment before starting any region. This changeset includes the OSS fixes to support the ENT work. `IsMultiregionStarter` has no more callers in OSS, so remove it here.	2020-07-14 10:57:37 -04:00
Chris Baker	f8478b6f82	Merge branch 'master' of github.com:hashicorp/nomad into release-0.12.0	2020-07-08 21:16:31 +00:00
Nick Ethier	119ece09a0	docs: add CNI and host_network docs (#8391) Co-authored-by: Seth Hoenig <shoenig@hashicorp.com>	2020-07-08 15:45:04 -04:00
Tim Gross	1098ca6ef1	fix multiregion plan output flags (#8375) The call to render the output diff swapped the `diff` and `verbose` bool parameters, resulting in dropping the diff output in multi-region plans but not single-region plans.	2020-07-08 10:10:08 -04:00
Nomad Release bot	549e766eab	Generate files for 0.12.0-rc1 release	2020-07-07 03:17:05 +00:00
Nick Ethier	e0fb634309	ar: support opting into binding host ports to default network IP (#8321) * ar: support opting into binding host ports to default network IP * fix config plumbing * plumb node address into network resource * struct: only handle network resource upgrade path once	2020-07-06 18:51:46 -04:00
Tim Gross	18250f71fd	fix region flag vs job region handling in plan/submit (#8347)	2020-07-06 15:46:09 -04:00
Chris Baker	9100b6b7c0	changes to make sure that Max is present and valid, to improve error messages * made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected * added checks to HCL parsing that it is present * when Scaling.Max is absent/invalid, don't return extraneous error messages during validation * tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs resolves #8355	2020-07-04 19:05:50 +00:00
Mahmood Ali	329969b97e	tests: make testagent shutdown idempotent Avoid double freeing ports if an agent.Shutdown() is called multiple times.	2020-07-03 09:16:01 -04:00
Lang Martin	1e7560d621	command/debug: use the correct env vars for Consul token (#8332)	2020-07-02 10:04:22 -04:00
Lang Martin	6c22cd587d	api: `nomad debug` new /agent/host (#8325) * command/agent/host: collect host data, multi platform * nomad/structs/structs: new HostDataRequest/Response * client/agent_endpoint: add RPC endpoint * command/agent/agent_endpoint: add Host * api/agent: add the Host endpoint * nomad/client_agent_endpoint: add Agent Host with forwarding * nomad/client_agent_endpoint: use findClientConn This changes forwardMonitorClient and forwardProfileClient to use findClientConn, which was cribbed from the common parts of those funcs. * command/debug: call agent hosts * command/agent/host: eliminate calling external programs	2020-07-02 09:51:25 -04:00
Mahmood Ali	1917989a1f	document namespace option in CLI docs	2020-07-01 15:31:41 -04:00
Tim Gross	23be116da0	csi: add -force flag to volume deregister (#8295) The `nomad volume deregister` command currently returns an error if the volume has any claims, but in cases where the claims can't be dropped because of plugin errors, providing a `-force` flag gives the operator an escape hatch. If the volume has no allocations or if they are all terminal, this flag deletes the volume from the state store, immediately and implicitly dropping all claims without further CSI RPCs. Note that this will not also unmount/detach the volume, which we'll make the responsibility of a separate `nomad volume detach` command.	2020-07-01 12:17:51 -04:00
Mahmood Ali	ee6fbcbc0f	Merge pull request #8296 from hashicorp/b-tests-cleanup-20200625 Cleanup for command package tests	2020-06-26 09:31:41 -04:00
Mahmood Ali	30492e8119	tests: avoid using os.Setenv for tokens	2020-06-26 08:52:21 -04:00
Mahmood Ali	9583190eb3	tests: use flagAddress instead of process env Using Setenv can cause test interference, where a test may accidentally pick up a value set by another test.	2020-06-26 08:52:21 -04:00
Mahmood Ali	384d8cf3a5	Merge pull request #8271 from hashicorp/f-comment-init-check-stanza Comment out default Consul check; Update URLs	2020-06-26 08:30:30 -04:00
Nick Ethier	89118016fc	command: correctly show host IP in ports output with multi-host networks (#8289)	2020-06-25 15:16:01 -04:00
Lang Martin	9b657b5e5e	new command: nomad debug captures a debug archive of cluster state (#8244) * command/debug: build a local archive of debug data * command/debug: query consul and vault directly * command/debug: include pprof CPUProfile Trace and goroutine * command/debug: trap signals and close the monitor requests	2020-06-25 12:51:23 -04:00
Mahmood Ali	8631e9dad5	always shutdown test server on test cleanup	2020-06-25 12:44:19 -04:00
Tim Gross	e52f76ed53	update compiled static assets	2020-06-24 16:37:13 -04:00
Charlie Voiselle	e0e3a66b3a	Fix link to scheduler page	2020-06-24 15:44:07 -04:00
Charlie Voiselle	9b20269709	Comment out default Consul check; Update URLs Having an active check in the sample job causes issues with testing deployments in environments that are not integrated with Consul. This negatively impacts some of the getting-started experiences. Commenting out the check allows deployments to proceed successfully but leaves it in the sample job for convenience. Made a drive-by fix to all of the URLs in the jobfile	2020-06-24 15:34:48 -04:00
Tim Gross	67ffcb35e9	multiregion: add support for 'job plan' (#8266) Add a scatter-gather for multiregion job plans. Each region's servers interpolate the plan locally in `Job.Plan` but don't distribute the plan as done in `Job.Run`. Note that it's not possible to return a usable modify index from a multiregion plan for use with `-check-index`. Even if we were to force the modify index to be the same at the start of `Job.Run` the index immediately drifts during each region's deployments, depending on events local to each region. So we omit this section of a multiregion plan.	2020-06-24 13:24:55 -04:00
Tim Gross	a449009e9f	multiregion validation fixes (#8265) Multi-region jobs need to bypass validating counts otherwise we get spurious warnings in Job.Plan.	2020-06-24 12:18:51 -04:00
Seth Hoenig	3872b493e5	Merge pull request #8011 from hashicorp/f-cnative-host consul/connect: implement initial support for connect native	2020-06-24 10:33:12 -05:00
Seth Hoenig	e79b79034d	connect/native: fixup command/agent/consul/connect test cases	2020-06-24 09:05:56 -05:00
Tim Gross	010d94d419	multiregion: job stop across regions with -global flag (#8258) Adds a `-global` flag for stopping multiregion jobs in all regions at once. Warn the user if they attempt to stop a multiregion job in a single region.	2020-06-23 15:56:04 -04:00
James Rasell	bc40665f1d	cli: fix license get command help Synopsis text.	2020-06-23 18:47:39 +02:00
Seth Hoenig	6c5ab7f45e	consul/connect: split connect native flag and task in service	2020-06-23 10:22:22 -05:00
Seth Hoenig	4d71f22a11	consul/connect: add support for running connect native tasks This PR adds the capability of running Connect Native Tasks on Nomad, particularly when TLS and ACLs are enabled on Consul. The `connect` stanza now includes a `native` parameter, which can be set to the name of the task that backs the Connect Native Consul service. There is a new Client configuration parameter for the `consul` stanza called `share_ssl`. Like `allow_unauthenticated` the default value is true, but recommended to be disabled in production environments. When enabled, the Nomad Client's Consul TLS information is shared with Connect Native tasks through the normal Consul environment variables. This does NOT include auth or token information. If Consul ACLs are enabled, Service Identity Tokens are automatically injected into the Connect Native task through the CONSUL_HTTP_TOKEN environment variable. Any of the automatically set environment variables can be overridden by the Connect Native task using the `env` stanza. Fixes #6083	2020-06-22 14:07:44 -05:00
Mahmood Ali	fa4e898c45	accommodate enterprise specific commands `nomad operator snapshot agent` is an Enterprise specific command	2020-06-22 10:27:25 -04:00
Michael Schurter	562704124d	Merge pull request #8208 from hashicorp/f-multi-network multi-interface network support	2020-06-19 15:46:48 -07:00
Mahmood Ali	bf08b7a890	Merge pull request #8214 from hashicorp/docs-snapshot-update Update changelog and snapshot docs	2020-06-19 14:27:12 -04:00
Mahmood Ali	d04ab67045	Apply suggestions from code review Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-06-19 13:36:22 -04:00
Mahmood Ali	ef6507d6ee	cli: use <file> for consistency	2020-06-19 12:19:38 -04:00
Mahmood Ali	ce0eee6a78	complete missed message	2020-06-19 11:02:36 -04:00
Mahmood Ali	963b1251ff	Merge pull request #8082 from hashicorp/f-raft-multipler Implement raft multipler flag	2020-06-19 10:04:59 -04:00
Nick Ethier	f0559a8162	multi-interface network support	2020-06-19 09:42:10 -04:00
Mahmood Ali	38a01c050e	Merge pull request #8192 from hashicorp/f-status-allnamespaces-2 CLI Allow querying all namespaces for jobs and allocations - Try 2	2020-06-18 20:16:52 -04:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Mahmood Ali	5c623f33d5	cli: warn on multiple prefix matches when querying all namespaces	2020-06-17 16:32:51 -04:00
Mahmood Ali	8d9ce41202	cli: query all namespaces for alloc subcommands	2020-06-17 16:31:06 -04:00
Mahmood Ali	7a33a75449	cli: jobs allow querying jobs in all namespaces	2020-06-17 16:31:01 -04:00
Mahmood Ali	e784fe331a	use '*' to indicate all namespaces This reverts the introduction of AllNamespaces parameter that was merged earlier but never got released.	2020-06-17 16:27:43 -04:00
Tim Gross	7b12445f29	multiregion: change AutoRevert to OnFailure	2020-06-17 11:05:45 -04:00
Tim Gross	b09b7a2475	Multiregion job registration Integration points for multiregion jobs to be registered in the enterprise version of Nomad: * hook in `Job.Register` for enterprise to send job to peer regions * remove monitoring from `nomad job run` and `nomad job stop` for multiregion jobs	2020-06-17 11:04:58 -04:00
Tim Gross	161bcd9479	use constants from http package	2020-06-17 11:04:02 -04:00
Tim Gross	b93efc16d5	multiregion CLI: nomad deployment unblock	2020-06-17 11:03:44 -04:00
Drew Bailey	9263fcb0d3	Multiregion deploy status and job status CLI	2020-06-17 11:03:34 -04:00
Tim Gross	6851024925	Multiregion structs Initial struct definitions, jobspec parsing, validation, and conversion between Nomad structs and API structs for multi-region deployments.	2020-06-17 11:00:14 -04:00
Chris Baker	de8a46b0f8	added -preserve-counts to `job run` CLI, updated website	2020-06-16 18:45:28 +00:00
Chris Baker	377f881fbd	removed api.RegisterJobRequest in favor of api.JobRegisterRequest modified `job inspect` and `job run -output` to use anonymous struct to keep previous behavior	2020-06-16 18:45:17 +00:00
Chris Baker	1e3563e08c	wip: added PreserveCounts to struct.JobRegisterRequest, development test for Job.Register	2020-06-16 18:45:17 +00:00
James Rasell	080d521691	Merge pull request #8162 from hashicorp/b-gh-8161 cli: fix malformed alloc status address list when more than 1 addr	2020-06-16 16:35:53 +02:00
James Rasell	222987602b	cli: fix malformed alloc status address list when more than 1 addr	2020-06-15 14:35:47 +02:00
Mahmood Ali	9bfc3e28d9	Apply suggestions from code review Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2020-06-15 08:32:16 -04:00
Mahmood Ali	dda67192b6	clarify error message Co-authored-by: Tim Gross <tgross@hashicorp.com>	2020-06-09 11:26:52 -04:00
Mahmood Ali	63f6307487	tests: client already disabled	2020-06-07 16:38:11 -04:00
Mahmood Ali	69bb42acf8	tests: prefix agent logs to identify agent sources	2020-06-07 16:38:11 -04:00
Mahmood Ali	257b3600ab	implement snapshot restore CLI	2020-06-07 15:47:07 -04:00
Mahmood Ali	9eb13ae144	basic snapshot restore	2020-06-07 15:46:23 -04:00
Seth Hoenig	435c0d9fc8	deps: Switch to Go modules for dependency management This PR switches the Nomad repository from using govendor to Go modules for managing dependencies. Aspects of the Nomad workflow remain pretty much the same. The usual Makefile targets should continue to work as they always did. The API submodule simply defers to the parent Nomad version on the repository, keeping the semantics of API versioning that currently exists.	2020-06-02 14:30:36 -05:00
Mahmood Ali	de44d9641b	Merge pull request #8047 from hashicorp/f-snapshot-save API for atomic snapshot backups	2020-06-01 07:55:16 -04:00
Mahmood Ali	19cc84ec05	Apply suggestions from code review Co-authored-by: Drew Bailey <2614075+drewbailey@users.noreply.github.com>	2020-05-31 21:29:17 -04:00
Mahmood Ali	a73cd01a00	Merge pull request #8001 from hashicorp/f-jobs-list-across-nses endpoint to expose all jobs across all namespaces	2020-05-31 21:28:03 -04:00
Mahmood Ali	0e8fafd739	implement raft multiplier	2020-05-31 12:24:27 -04:00
Drew Bailey	23d24c7a7f	removes pro tags (#8014)	2020-05-28 15:40:17 -04:00
Drew Bailey	34871f89be	Oss license support for ent builds (#8054) * changes necessary to support oss licensing shims revert nomad fmt changes update test to work with enterprise changes update tests to work with new ent enforcements make check update cas test to use scheduler algorithm back out preemption changes add comments * remove unused method	2020-05-27 13:46:52 -04:00
Drew Bailey	5948c4f497	Revert "disable license cli commands"	2020-05-26 12:39:39 -04:00
Seth Hoenig	889e7ddd0c	build: use hashicorp hclfmt We have been using fatih/hclfmt which is long abandoned. Instead, switch to HashiCorp's own hclfmt implementation. There are some trivial changes in behavior around whitespace.	2020-05-24 18:31:57 -05:00
Mahmood Ali	08b69d3bc4	implement snapshot inspect CLI	2020-05-21 20:04:38 -04:00
Mahmood Ali	0a27559b8f	Implement snapshot save CLI	2020-05-21 20:04:38 -04:00
Mahmood Ali	2108681c1d	Endpoint for snapshotting server state	2020-05-21 20:04:38 -04:00
James Rasell	ae0fb98c6b	api: return custom error if API attempts to decode empty body.	2020-05-19 15:46:31 +02:00
Mahmood Ali	5ab2d52e27	endpoint to expose all jobs across all namespaces Allow a `/v1/jobs?all_namespaces=true` to list all jobs across all namespaces. The returned list is to contain a `Namespace` field indicating the job namespace. If ACL is enabled, the request token needs to be a management token or have `namespace:list-jobs` capability on all existing namespaces.	2020-05-18 13:50:46 -04:00
Nomad Release bot	189a378549	Generate files for 0.11.2 release	2020-05-14 20:49:42 +00:00
Mahmood Ali	9366181be6	always check `default_scheduler_config` config Also, avoid early return on validation to avoid masking some validation bugs in dev setup.	2020-05-14 14:16:12 -04:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was set up to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Drew Bailey	466e8d5043	disable license cli commands	2020-05-11 13:49:29 -04:00
Mahmood Ali	061a439f2c	Merge pull request #7912 from hashicorp/f-scheduler-algorithm-followup Scheduler Algorithm Defaults handling and docs	2020-05-11 09:30:58 -04:00
Tim Gross	3aa761b151	Periodic GC for volume claims (#7881) This changeset implements a periodic garbage collection of CSI volumes with missing allocations. This can happen in a scenario where a node update fails partially and the allocation updates are written to raft but the evaluations to GC the volumes are dropped. This feature will cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1 get any stray claims cleaned up.	2020-05-11 08:20:50 -04:00
Mahmood Ali	2c963885b0	handle upgrade path and defaults Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on upgrades or on API handling when user misses the value. The scheduler already treats `""` value as binpack. This PR merely ensures that the operator API returns the effective value.	2020-05-09 12:34:08 -04:00
Drew Bailey	fde40046a1	update license output	2020-05-07 12:14:15 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Drew Bailey	48c451709e	update license command output to reflect api changes	2020-05-05 10:28:58 -04:00
Mahmood Ali	78ae7b885a	Merge pull request #7810 from hashicorp/spread-configuration spread scheduling algorithm	2020-05-01 13:15:19 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	663fb677cf	Add SchedulerAlgorithm to SchedulerConfig	2020-05-01 13:13:29 -04:00
Drew Bailey	581ad558a8	temporarily test for 404 until endpoint is ready	2020-05-01 11:24:37 -04:00
Drew Bailey	41c7d49eb7	properly format license output	2020-04-30 14:46:26 -04:00
Drew Bailey	42075ef30e	allow test to check if server is enterprise	2020-04-30 14:46:21 -04:00
Drew Bailey	acacecc67b	add license reset command to commands help text formatting remove reset no signed option	2020-04-30 14:46:20 -04:00
Drew Bailey	a266284f60	test all commands oss err	2020-04-30 14:46:19 -04:00
Drew Bailey	59b76f90e8	hcl fmt from editor license cli formatting, license endpoints ent only test oss error type assertions	2020-04-30 14:46:18 -04:00
Drew Bailey	74abe6ef48	license cli commands cli changes, formatting	2020-04-30 14:46:17 -04:00
Lang Martin	e32b5b12dd	command: deployment status without a prefix lists deployments (#7821)	2020-04-28 15:11:32 -04:00
Mahmood Ali	b8fb32f5d2	http: adjust log level for request failure Failed requests due to API client errors are to be marked as DEBUG. The Error log level should be reserved to signal problems with the cluster that are actionable for nomad system operators. Logs due to misbehaving API clients don't represent a system level problem and seem spurious to nomad maintainers at best. These log messages can also be attack vectors for denial-of-service attacks by filling servers' disk space with spurious log messages.	2020-04-22 16:19:59 -04:00
Mahmood Ali	5b42796f1e	Merge pull request #7704 from hashicorp/b-agent-shutdown-order agent: shutdown agent http server last	2020-04-20 10:37:26 -04:00
Mahmood Ali	4e1366f285	agent: route http logs through hclog Pipe http server log to hclog, so that it uses the same logging format as rest of nomad logs. Also, supports emitting them as json logs, when json formatting is set. The http server logs are emitted as Trace level, as they typically represent HTTP client errors (e.g. failed tls handshakes, invalid headers, etc). Though, Panic logs represent server errors and are relayed as Error level.	2020-04-20 10:33:40 -04:00
Jeffrey 'jf' Lim	eab600d3e1	Fix/improve "job plan" messaging (#7580)	2020-04-17 15:53:16 -04:00
Mahmood Ali	b78680eee7	agent: shutdown agent http server last Shutdown http server last, after nomad client/server components terminate. Before this change, if the agent is taking an unexpectedly long time to shut down, the operator cannot query the http server directly: they cannot access agent specific http endpoints and need to query another agent about the troublesome agent. Unexpectedly long shutdown can happen in normal cases, e.g. a client might hang if one of the allocs it is running has a long shutdown_delay. Here, we switch to ensuring that the http server is shut down last. I believe this doesn't require extra care in agent shutting down logic while operators may be able to submit write http requests. We already need to cope with operators submitting these http requests to another agent or by servers updating the client allocations.	2020-04-13 10:50:07 -04:00
Mahmood Ali	14d6fec05a	tests: deflake some SetServer related tests Some tests assert on the number of servers, e.g. TestHTTP_AgentSetServers and TestHTTP_AgentListServers_ACL. Though, in dev and test modes, the agent starts with servers having duplicate entries for advertised and normalized RPC values, then settles with one unique value after Raft/Serf re-sets servers with one single unique value. This leads to flakiness, as the test will fail if assertion runs before Serf update takes effect. Here, we update the initial dev handling so it only adds a unique value if the advertised and normalized values are the same. Sample log lines illustrating the problem: ``` === CONT TestHTTP_AgentSetServers TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:9008 Address:127.0.0.1:9008}]" TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad: serf: EventMemberJoin: TestHTTP_AgentSetServers.global 127.0.0.1 TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.035Z [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:9008, 127.0.0.1:9008] old_servers=[] ... TestHTTP_AgentSetServers: agent_endpoint_test.go:759: Error Trace: agent_endpoint_test.go:759 http_test.go:1089 agent_endpoint_test.go:705 Error: "[127.0.0.1:9008 127.0.0.1:9008]" should have 1 item(s), but has 2 Test: TestHTTP_AgentSetServers ```	2020-04-07 09:27:48 -04:00
Mahmood Ali	ed4c4d13a4	fixup! backend: support WS authentication handshake in alloc/exec	2020-04-03 14:20:31 -04:00
Mahmood Ali	e63e096136	backend: support WS authentication handshake in alloc/exec The javascript Websocket API doesn't support setting custom headers (e.g. `X-Nomad-Token`). This change adds support for having an authentication handshake message: clients can set `ws_handshake` URL query parameter to true and send a single handshake message with auth token first before any other message. This is a backward compatible change: it does not affect nomad CLI path, as it doesn't set `ws_handshake` parameter.	2020-04-03 11:18:54 -04:00
Mahmood Ali	990cfb6fef	agent config parsing tests for scheduler config	2020-04-03 07:54:32 -04:00
Chris Baker	277d29c6e7	Merge pull request #7572 from hashicorp/f-7422-scaling-events finalizing scaling API work	2020-04-01 13:49:22 -05:00
Seth Hoenig	9aa9721143	connect: fix bug where absent connect.proxy stanza needs default config In some refactoring, a bug was introduced where if the connect.proxy stanza in a submitted job was nil, the default proxy configuration would not be initialized with default values, effectively breaking Connect. connect { sidecar_service {} # should work } In contrast, by setting an empty proxy stanza, the config values would be inserted correctly. connect { sidecar_service { proxy {} # workaround } } This commit restores the original behavior, where having a proxy stanza present is not required. The unit test for this case has also been corrected.	2020-04-01 11:19:32 -06:00
Chris Baker	40d6b3bbd1	adding raft and state_store support to track job scaling events updated ScalingEvent API to record "message string,error bool" instead of confusing "reason,error *string"	2020-04-01 16:15:14 +00:00
Seth Hoenig	14c7cebdea	connect: enable automatic expose paths for individual group service checks Part of #6120 Building on the support for enabling connect proxy paths in #7323, this change adds the ability to configure the 'service.check.expose' flag on group-level service check definitions for services that are connect-enabled. This is a slight deviation from the "magic" that Consul provides. With Consul, the 'expose' flag exists on the connect.proxy stanza, which will then auto-generate expose paths for every HTTP and gRPC service check associated with that connect-enabled service. A first attempt at providing similar magic for Nomad's Consul Connect integration followed that pattern exactly, as seen in #7396. However, on reviewing the PR we realized having the `expose` flag on the proxy stanza inseparably ties together the automatic path generation with every HTTP/gRPC check defined on the service. This makes sense in Consul's context, because a service definition is reasonably associated with a single "task". With Nomad's group level service definitions however, there is a reasonable expectation that a service definition is more abstractly representative of multiple services within the task group. In this case, one would want to define checks of that service which concretely make HTTP or gRPC requests to different underlying tasks. Such a model is not possible with the coarse `proxy.expose` flag. Instead, we now have the flag made available within the check definitions themselves. By making the expose feature resolute to each check, it is possible to have some HTTP/gRPC checks which make use of the envoy exposed paths, as well as some HTTP/gRPC checks which make use of some orthogonal port-mapping to do checks on some other task (or even some other bound port of the same task) within the task group.
Given this example, group "server-group" { network { mode = "bridge" port "forchecks" { to = -1 } } service { name = "myserver" port = 2000 connect { sidecar_service { } } check { name = "mycheck-myserver" type = "http" port = "forchecks" interval = "3s" timeout = "2s" method = "GET" path = "/classic/responder/health" expose = true } } } Nomad will automatically inject (via job endpoint mutator) the extrapolated expose path configuration, i.e. expose { path { path = "/classic/responder/health" protocol = "http" local_path_port = 2000 listener_port = "forchecks" } } Documentation is coming in #7440 (needs updating, doing next) Modifications to the `countdash` examples in https://github.com/hashicorp/demo-consul-101/pull/6 which will make the examples in the documentation actually runnable. Will add some e2e tests based on the above when it becomes available.	2020-03-31 17:15:50 -06:00
Seth Hoenig	41244c5857	jobspec: parse multi expose.path instead of explicit slice	2020-03-31 17:15:27 -06:00
Seth Hoenig	0266f056b8	connect: enable proxy.passthrough configuration Enable configuration of HTTP and gRPC endpoints which should be exposed by the Connect sidecar proxy. This changeset is the first "non-magical" pass that lays the groundwork for enabling Consul service checks for tasks running in a network namespace because they are Connect-enabled. The changes here provide for full configuration of the connect { sidecar_service { proxy { expose { paths = [{ path = <exposed endpoint> protocol = <http or grpc> local_path_port = <local endpoint port> listener_port = <inbound mesh port> }, ... ] } } } stanza. Everything from `expose` and below is new, and partially implements the precedent set by Consul: https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference Combined with a task-group level network port-mapping in the form: port "exposeExample" { to = -1 } it is now possible to "punch a hole" through the network namespace to a specific HTTP or gRPC path, with the anticipated use case of creating Consul checks on Connect enabled services. A future PR may introduce more automagic behavior, where we can do things like 1) auto-fill the 'expose.path.local_path_port' with the default value of the 'service.port' value for task-group level connect-enabled services. 2) automatically generate a port-mapping 3) enable an 'expose.checks' flag which automatically creates exposed endpoints for every compatible consul service check (http/grpc checks on connect enabled services).	2020-03-31 17:15:27 -06:00
Seth Hoenig	1ce4eb17fa	client: use consistent name for struct receiver parameter This helps reduce the number of squiggly lines in Goland.	2020-03-31 17:15:27 -06:00
Lang Martin	8d4f39fba1	csi: add node events to report progress mounting and unmounting volumes (#7547) * nomad/structs/structs: new NodeEventSubsystemCSI * client/client: pass triggerNodeEvent in the CSIConfig * client/pluginmanager/csimanager/instance: add eventer to instanceManager * client/pluginmanager/csimanager/manager: pass triggerNodeEvent * client/pluginmanager/csimanager/volume: node event on [un]mount * nomad/structs/structs: use storage, not CSI * client/pluginmanager/csimanager/volume: use storage, not CSI * client/pluginmanager/csimanager/volume_test: eventer * client/pluginmanager/csimanager/volume: event on error * client/pluginmanager/csimanager/volume_test: check event on error * command/node_status: remove an extra space in event detail format * client/pluginmanager/csimanager/volume: use snake_case for details * client/pluginmanager/csimanager/volume_test: snake_case details	2020-03-31 17:13:52 -04:00
Yoan Blanc	225c9c1215	fixup! vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:48:07 -04:00
Yoan Blanc	761d014071	vendor: explicit use of hashicorp/go-msgpack Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-03-31 09:45:21 -04:00
Seth Hoenig	b3664c628c	Merge pull request #7524 from hashicorp/docs-consul-acl-minimums consul: annotate Consul interfaces with ACLs	2020-03-30 13:27:27 -06:00
Mahmood Ali	7df337e4c4	Merge pull request #7534 from hashicorp/b-windows-dev-network windows: support -dev mode	2020-03-30 14:35:28 -04:00
Seth Hoenig	0a812ab689	consul: annotate Consul interfaces with ACLs	2020-03-30 10:17:28 -06:00
Drew Bailey	a98dc8c768	update audit examples to an endpoint that is audited	2020-03-30 10:03:11 -04:00
Mahmood Ali	dedf1cd3d7	tests: remove TestHTTP_NodeDrain_Compat Nomad 0.11 servers no longer support having pre-0.8 clients.	2020-03-30 07:06:52 -04:00
Mahmood Ali	8b2b3f99d3	tests: deflake TestHTTP_NodeDrain A node may be recognized as not running any allocs and have its drain flag reset before the test queries it.	2020-03-30 07:06:52 -04:00
Mahmood Ali	b0cc23ae63	tests: deflake TestConsul_PeriodicSync	2020-03-30 07:06:47 -04:00
Mahmood Ali	ec6afa5795	windows: support -dev mode Support running `nomad agent -dev` in Windows, by setting proper network interface. Prior to this change, `nomad` uses `lo` interface but Windows uses "Loopback Pseudo-Interface 1" to refer to loopback device interface: https://github.com/golang/go/blob/go1.14.1/src/net/net_windows_test.go#L304-L318 .	2020-03-28 12:01:51 -04:00
Drew Bailey	a66b4be0f3	remove auditing for /ui/	2020-03-27 10:12:42 -04:00
Drew Bailey	de687edb2e	wrap http.Handlers better comments	2020-03-27 09:35:10 -04:00
Lang Martin	50ff9ccd44	csi: plugin deregistration on plugin job GC (#7502) * nomad/structs/csi: delete just one plugin type from a node * nomad/structs/csi: add DeleteAlloc * nomad/state/state_store: add deleteJobFromPlugin * nomad/state/state_store: use DeleteAlloc not DeleteNodeType * move CreateTestCSIPlugin to state to avoid an import cycle * nomad/state/state_store_test: delete a plugin by deleting its jobs * nomad/_test: move CreateTestCSIPlugin to state nomad/state/state_store: update one plugin per transaction * command/plugin_status_test: move CreateTestCSIPlugin * nomad: csi: handle nils CSIPlugin methods, clarity	2020-03-26 17:07:18 -04:00
Drew Bailey	b96a4da6fc	sync changes made to oss files from ent	2020-03-25 10:57:44 -04:00
Drew Bailey	218bfff6dd	add in change missed from ent	2020-03-25 10:53:38 -04:00
Drew Bailey	97cc19276d	add auditor	2020-03-25 10:48:23 -04:00
Drew Bailey	7329a88758	allow all build contexts to use noOpAuditor	2020-03-25 10:38:40 -04:00
Mahmood Ali	1c1186b344	Merge pull request #7487 from hashicorp/b-xss-oss agent: prevent XSS by controlling Content-Type	2020-03-25 09:56:11 -04:00
Michael Schurter	29622013fa	remove double negative from comment Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2020-03-25 09:45:43 -04:00
Michael Schurter	1a27b8a07d	test: assert monitor endpoint sets proper headers	2020-03-25 09:45:43 -04:00
Michael Schurter	d6d44a8214	test: assert fs endpoints are xss safe	2020-03-25 09:45:43 -04:00
Michael Schurter	5ff458e840	agent: prevent XSS by controlling Content-Type	2020-03-25 09:45:43 -04:00
Mahmood Ali	c7cf60c837	tests: test agent to use a noop auditor	2020-03-25 08:45:44 -04:00
Mahmood Ali	ceed57b48f	per-task restart policy	2020-03-24 17:00:41 -04:00
Lang Martin	8bd0405f33	csi: return an empty result list from plugins & volumes without `type`, not an error (#7471 )	2020-03-24 14:28:28 -04:00
Chris Baker	bc13bfb433	bad conversion between api.ScalingPolicy and structs.ScalingPolicy meant that we were throwing away .Min if provided	2020-03-24 14:39:06 +00:00
Chris Baker	f6ec5f9624	made count optional during job scaling actions added ACL protection in Job.Scale in Job.Scale, only perform a Job.Register if the Count was non-nil	2020-03-24 14:39:05 +00:00
Chris Baker	233db5258a	changes to Canonicalize, Validate, and api->struct conversion so that tg.Count, tg.Scaling.Min/Max are well-defined with reasonable defaults. - tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent) - Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and	2020-03-24 13:57:17 +00:00
Chris Baker	00092a6c29	fixed http endpoints for job.register and job.scalestatus	2020-03-24 13:57:16 +00:00
Chris Baker	925b59e1d2	wip: scaling status return, almost done	2020-03-24 13:57:15 +00:00
Chris Baker	42270d862c	wip: some tests still failing updating job scaling endpoints to match RFC, cleaning up the API object as well	2020-03-24 13:57:14 +00:00
Chris Baker	abc7a52f56	finished refactoring state store, schema, etc	2020-03-24 13:57:14 +00:00
Chris Baker	3d54f1feba	wip: added Enabled to ScalingPolicyListStub, removed JobID from body of scaling request	2020-03-24 13:57:12 +00:00
Chris Baker	024d203267	wip: added tests for client methods around group scaling	2020-03-24 13:57:11 +00:00
Chris Baker	1c5c2eb71b	wip: add GET endpoint for job group scaling target	2020-03-24 13:57:10 +00:00
Chris Baker	179ab68258	wip: added job.scale rpc endpoint, needs explicit test (tested via http now)	2020-03-24 13:57:09 +00:00
Chris Baker	8453e667c2	wip: working on job group scaling endpoint	2020-03-24 13:55:20 +00:00
Chris Baker	6665d0bfb0	wip: added policy get endpoint, added UUID to policy	2020-03-24 13:55:20 +00:00
Chris Baker	9c2560ceeb	wip: upsert/delete scaling policies on job upsert/delete	2020-03-24 13:55:18 +00:00
Chris Baker	65d92f1fbf	WIP: adding ScalingPolicy to api/structs and state store	2020-03-24 13:55:18 +00:00
Drew Bailey	10f3b6899b	rename struct field to auditor	2020-03-23 20:09:01 -04:00
Drew Bailey	cf5fcf3748	make auditor interface more explicit	2020-03-23 19:32:58 -04:00
Mahmood Ali	61c42034d5	cli: show lifecycle info in alloc status Display task lifecycle info in `nomad alloc status <alloc_id>` output. I chose to embed it in the Task header and only add it for tasks with lifecycle info. Also, I chose to order the tasks in the following order: 1. prestart non-sidecar tasks 2. prestart sidecar tasks 3. main tasks The tasks are sorted lexicographically within each tier. Sample output: ``` $ nomad alloc status 6ec0eb52 ID = 6ec0eb52-e6c8-665c-169c-113d6081309b Eval ID = fb0caa98 Name = lifecycle.cache[0] [...] Task "init" (prestart) is "dead" Task Resources CPU Memory Disk Addresses 0/500 MHz 0 B/256 MiB 300 MiB [...] Task "some-sidecar" (prestart sidecar) is "running" Task Resources CPU Memory Disk Addresses 0/500 MHz 68 KiB/256 MiB 300 MiB [...] Task "redis" is "running" Task Resources CPU Memory Disk Addresses 10/500 MHz 984 KiB/256 MiB 300 MiB [...] ```	2020-03-23 15:57:24 -04:00
Drew Bailey	d0d32d8f06	fix compilation with correct func	2020-03-23 14:32:11 -04:00
Tim Gross	076fbbf08f	Merge pull request #7012 from hashicorp/f-csi-volumes Container Storage Interface Support	2020-03-23 14:19:46 -04:00
Lang Martin	e100444740	csi: add mount_options to volumes and volume requests (#7398 ) Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host. Individual tasks just mount bind into the host mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context. closes #7007 * nomad/structs/volumes: add MountOptions to volume request * jobspec/test-fixtures/basic.hcl: add mount_options to volume block * jobspec/parse_test: add expected MountOptions * api/tasks: add mount_options * jobspec/parse_group: use hcl decode not mapstructure, mount_options * client/allocrunner/csi_hook: pass MountOptions through client/allocrunner/csi_hook: add a VolumeMountOptions client/allocrunner/csi_hook: drop Options client/allocrunner/csi_hook: use the structs options * client/pluginmanager/csimanager/interface: UsageOptions.MountOptions * client/pluginmanager/csimanager/volume: pass MountOptions in capabilities * plugins/csi/plugin: remove todo 7007 comment * nomad/structs/csi: MountOptions * api/csi: add options to the api for parsing, match structs * plugins/csi/plugin: move VolumeMountOptions to structs * api/csi: use specific type for mount_options * client/allocrunner/csi_hook: merge MountOptions here * rename CSIOptions to CSIMountOptions * client/allocrunner/csi_hook * client/pluginmanager/csimanager/volume * nomad/structs/csi * plugins/csi/fake/client: add PrevVolumeCapability * plugins/csi/plugin * client/pluginmanager/csimanager/volume_test: remove debugging * client/pluginmanager/csimanager/volume: fix odd merging logic * api: rename CSIOptions -> CSIMountOptions * nomad/csi_endpoint: remove a 7007 comment * 
command/alloc_status: show mount options in the volume list * nomad/structs/csi: include MountOptions in the volume stub * api/csi: add MountOptions to stub * command/volume_status_csi: clean up csiVolMountOption, add it * command/alloc_status: csiVolMountOption lives in volume_csi_status * command/node_status: display mount flags * nomad/structs/volumes: npe * plugins/csi/plugin: npe in ToCSIRepresentation * jobspec/parse_test: expand volume parse test cases * command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions * command/volume_status_csi: copy paste error * jobspec/test-fixtures/basic: hclfmt * command/volume_status_csi: clean up csiVolMountOption	2020-03-23 13:59:25 -04:00
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358 ) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Lang Martin	99841222ed	csi: change the API paths to match CLI command layout (#7325 ) * command/agent/csi_endpoint: support type filter in volumes & plugins * command/agent/http: use /v1/volume/csi & /v1/plugin/csi * api/csi: use /v1/volume/csi & /v1/plugin/csi * api/nodes: use /v1/volume/csi & /v1/plugin/csi * api/nodes: not /volumes/csi, just /volumes * command/agent/csi_endpoint: fix ot parameter parsing	2020-03-23 13:58:30 -04:00
Lang Martin	80619137ab	csi: volumes listed in `nomad node status` (#7318 ) * api/allocations: GetTaskGroup finds the taskgroup struct * command/node_status: display CSI volume names * nomad/state/state_store: new CSIVolumesByNodeID * nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator * nomad/csi_endpoint: deal with a slice of volumes * nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator * nomad/structs/csi: CSIVolumeListRequest takes a NodeID * nomad/csi_endpoint: use the return iterator * command/agent/csi_endpoint: parse query params for CSIVolumes.List * api/nodes: new CSIVolumes to list volumes by node * command/node_status: use the new list endpoint to print volumes * nomad/state/state_store: error messages consider the operator * command/node_status: include the Provider	2020-03-23 13:58:30 -04:00
Tim Gross	de4ad6ca38	csi: add Provider field to CSI CLIs and APIs (#7285 ) Derive a provider name and version for plugins (and the volumes that use them) from the CSI identity API `GetPluginInfo`. Expose the vendor name as `Provider` in the API and CLI commands.	2020-03-23 13:58:30 -04:00
Lang Martin	887e1f28c9	csi: CLI for volume status, registration/deregistration and plugin status (#7193 ) * command/csi: csi, csi_plugin, csi_volume * helper/funcs: move ExtraKeys from parse_config to UnusedKeys * command/agent/config_parse: use helper.UnusedKeys * api/csi: annotate CSIVolumes with hcl fields * command/csi_plugin: add Synopsis * command/csi_volume_register: use hcl.Decode style parsing * command/csi_volume_list * command/csi_volume_status: list format, cleanup * command/csi_plugin_list * command/csi_plugin_status * command/csi_volume_deregister * command/csi_volume: add Synopsis * api/contexts/contexts: add csi search contexts to the constants * command/commands: register csi commands * api/csi: fix struct tag for linter * command/csi_plugin_list: unused struct vars * command/csi_plugin_status: unused struct vars * command/csi_volume_list: unused struct vars * api/csi: add allocs to CSIPlugin * command/csi_plugin_status: format the allocs * api/allocations: copy Allocation.Stub in from structs * nomad/client_rpc: add some error context with Errorf * api/csi: collapse read & write alloc maps to a stub list * command/csi_volume_status: cleanup allocation display * command/csi_volume_list: use Schedulable instead of Healthy * command/csi_volume_status: use Schedulable instead of Healthy * command/csi_volume_list: sprintf string * command/csi: delete csi.go, csi_plugin.go * command/plugin: refactor csi components to sub-command plugin status * command/plugin: remove csi * command/plugin_status: remove csi * command/volume: remove csi * command/volume_status: split out csi specific * helper/funcs: add RemoveEqualFold * command/agent/config_parse: use helper.RemoveEqualFold * api/csi: do ,unusedKeys right * command/volume: refactor csi components to `nomad volume` * command/volume_register: split out csi specific * command/commands: use the new top level commands * command/volume_deregister: hardwired type csi for now * command/volume_status: 
csiFormatVolumes rescued from volume_list * command/plugin_status: avoid a panic on no args * command/volume_status: avoid a panic on no args * command/plugin_status: predictVolumeType * command/volume_status: predictVolumeType * nomad/csi_endpoint_test: move CreateTestPlugin to testing * command/plugin_status_test: use CreateTestCSIPlugin * nomad/structs/structs: add CSIPlugins and CSIVolumes search consts * nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix * nomad/search_endpoint: add CSIPlugins and CSIVolumes * command/plugin_status: move the header to the csi specific * command/volume_status: move the header to the csi specific * nomad/state/state_store: CSIPluginByID prefix * command/status: rename the search context to just Plugins/Volumes * command/plugin,volume_status: test return ids now * command/status: rename the search context to just Plugins/Volumes * command/plugin_status: support -json and -t * command/volume_status: support -json and -t * command/plugin_status_csi: comments * command/_status: clean up text api/csi: fix stale comments * command/volume: make deregister sound less fearsome * command/plugin_status: set the id length * command/plugin_status_csi: more compact plugin health * command/volume: better error message, comment	2020-03-23 13:58:30 -04:00
Tim Gross	016281135c	storage: add volumes to 'nomad alloc status' CLI (#7256 ) Adds a stanza for both Host Volumes and CSI Volumes to the CLI output for `nomad alloc status`. Mostly relies on information already in the API structs, but in the case where there are CSI Volumes we need to make extra API calls to get the volume status. To reduce overhead, these extra calls are hidden behind the `-verbose` flag.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	15c6c05ccf	api: Parse CSI Volumes Previously when deserializing volumes we skipped over volumes that were not of type `host`. This commit ensures that we parse both host and csi volumes correctly.	2020-03-23 13:58:30 -04:00
Lang Martin	88316208a0	csi: server-side plugin state tracking and api (#6966 ) * structs: CSIPlugin indexes jobs acting as plugins and node updates * schema: csi_plugins table for CSIPlugin * nomad: csi_endpoint use vol.Denormalize, plugin requests * nomad: csi_volume_endpoint: rename to csi_endpoint * agent: add CSI plugin endpoints * state_store_test: use generated ids to avoid t.Parallel conflicts * contributing: add note about registering new RPC structs * command: agent http register plugin lists * api: CSI plugin queries, ControllerHealthy -> ControllersHealthy * state_store: copy on write for volumes and plugins * structs: copy on write for volumes and plugins * state_store: CSIVolumeByID returns an unhealthy volume, denormalize * nomad: csi_endpoint use CSIVolumeDenormalizePlugins * structs: remove struct errors for missing objects * nomad: csi_endpoint return nil for missing objects, not errors * api: return meta from Register to avoid EOF error * state_store: CSIVolumeDenormalize keep allocs in their own maps * state_store: CSIVolumeDeregister error on missing volume * state_store: CSIVolumeRegister set indexes * nomad: csi_endpoint use CSIVolumeDenormalizePlugins tests	2020-03-23 13:58:29 -04:00
Lang Martin	2f646fa5e9	agent: csi endpoint	2020-03-23 13:58:29 -04:00
Danielle Lancashire	8fb312e48e	node_status: Add CSI Summary info This commit introduces two new fields to the basic output of `nomad node status <node-id>`. 1) "CSI Controllers", which displays the names of registered controller plugins. 2) "CSI Drivers", which displays the names of registered CSI Node plugins. However, it does not implement support for verbose output, such as including health status or other fingerprinted data.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	426c26d7c0	CSI Plugin Registration (#6555 ) This changeset implements the initial registration and fingerprinting of CSI Plugins as part of #5378. At a high level, it introduces the following: * A `csi_plugin` stanza as part of a Nomad task configuration, to allow a task to expose that it is a plugin. * A new task runner hook: `csi_plugin_supervisor`. This hook does two things. When the `csi_plugin` stanza is detected, it will automatically configure the plugin task to receive bidirectional mounts to the CSI intermediary directory. At runtime, it will then perform an initial heartbeat of the plugin and handle submitting it to the new `dynamicplugins.Registry` for further use by the client, and then run a lightweight heartbeat loop that will emit task events when health changes. * The `dynamicplugins.Registry` for handling plugins that run as Nomad tasks, in contrast to the existing catalog that requires `go-plugin` type plugins and to know the plugin configuration in advance. * The `csimanager` which fingerprints CSI plugins, in a similar way to `drivermanager` and `devicemanager`. It currently only fingerprints the NodeID from the plugin, and assumes that all plugins are monolithic. Missing features * We do not use the live updates of the `dynamicplugin` registry in the `csimanager` yet. * We do not deregister the plugins from the client when they shutdown yet, they just become indefinitely marked as unhealthy. This is deliberate until we figure out how we should manage deploying new versions of plugins/transitioning them.	2020-03-23 13:58:28 -04:00
Drew Bailey	b09abef332	Audit config, seams for enterprise audit features allow oss to parse sink duration clean up audit sink parsing ent eventer config reload fix typo SetEnabled to eventer interface client acl test rm dead code fix failing test	2020-03-23 13:47:42 -04:00
Jasmine Dahilig	73a64e4397	change jobspec lifecycle stanza to use sidecar attribute instead of block_until status	2020-03-21 17:52:57 -04:00
Jasmine Dahilig	1485b342e2	remove deadline code for now	2020-03-21 17:52:56 -04:00
Jasmine Dahilig	7b3f3497ed	mock task hook coordinator in consul integration test	2020-03-21 17:52:55 -04:00
Jasmine Dahilig	34f8055f39	remove logging debug line from cli	2020-03-21 17:52:49 -04:00
Mahmood Ali	c0f59ea06e	minor improvement	2020-03-21 17:52:44 -04:00
Jasmine Dahilig	bc78d6b64d	add lifecycle info to alloc status short	2020-03-21 17:52:42 -04:00
Jasmine Dahilig	fc13fa9739	change TaskLifecycle RunLevel to Hook and add Deadline time duration	2020-03-21 17:52:37 -04:00
Mahmood Ali	4ebeac721a	update structs with lifecycle	2020-03-21 17:52:36 -04:00
Mahmood Ali	3b5786ddb3	add lifecycle to api and parser	2020-03-21 17:52:36 -04:00
James Rasell	ef469e1a6e	Merge pull request #7379 from hashicorp/b-fix-agent-cmd--dev-connect-help cli: fix indentation issue with -dev-connect agent help output.	2020-03-19 08:34:45 +01:00
James Rasell	e3d14cc634	cli: fix indentation issue with -dev-connect agent help output.	2020-03-18 12:25:20 +01:00
Derek Strickland	b1490fe2dd	update log output to clarify that nodes were filtered out rather than down	2020-03-17 14:45:11 -04:00
Michael Schurter	b72b3e765c	Merge pull request #7170 from fredrikhgrelland/consul_template_upgrade Update consul-template to v0.24.1 and remove deprecated vault grace	2020-03-10 14:15:47 -07:00
Mahmood Ali	19f25f588f	Merge pull request #7252 from hashicorp/b-test-cluster-forming Simplify Bootstrap logic in tests	2020-03-03 16:56:08 -05:00
Mahmood Ali	acbfeb5815	Simplify Bootstrap logic in tests This change updates tests to honor `BootstrapExpect` exclusively when forming test clusters and removes test only knobs, e.g. `config.DevDisableBootstrap`. Background: Test cluster creation is fragile. Test servers don't follow the BootstrapExpected route like production clusters. Instead they start as single node clusters and then get rejoined, which may risk causing split brain or other test flakiness. The test framework exposes a few knobs (e.g. `config.DevDisableBootstrap` and `config.Bootstrap`) that control whether a server should bootstrap the cluster. These flags are confusing and it's unclear when to use them: their usage in multi-node clusters isn't properly documented. Furthermore, they have some bad side-effects as they don't control the Raft library: If `config.DevDisableBootstrap` is true, the test server may not immediately attempt to bootstrap a cluster, but after an election timeout (~50ms), Raft may force a leadership election and win it (with only one vote) and cause a split brain. The knobs are also confusing as Bootstrap is an overloaded term. In BootstrapExpect, we refer to bootstrapping the cluster only after N servers are connected. But in tests and the knobs above, it refers to whether the server is a single node cluster and shouldn't wait for any other server. Changes: This commit makes two changes: First, it relies on `BootstrapExpected` instead of `Bootstrap` and/or `DevMode` flags. This change is relatively trivial. Second, it introduces a `Bootstrapped` flag to track if the cluster is bootstrapped. This allows us to keep `BootstrapExpected` immutable. Previously, the flag was a config value but it was set to 0 after cluster bootstrap completed.	2020-03-02 13:47:43 -05:00
Mahmood Ali	386f20099b	Honor CNI and bridge related fields Nomad agent may silently ignore cni_path and bridge setting, when it merges configs from multiple files (or against default/dev config). This PR ensures that the values are merged properly.	2020-02-28 14:23:13 -05:00
Mahmood Ali	437d03779c	tests: add tests for parsing cni fields	2020-02-28 14:18:45 -05:00
Fredrik Hoem Grelland	edb3bd0f3f	Update consul-template to v0.24.1 and remove deprecated vault_grace (#7170 )	2020-02-23 16:24:53 +01:00
Seth Hoenig	0f99cdd0d9	Merge pull request #7192 from hashicorp/b-connect-stanza-ignore consul/connect: in-place update sidecar service registrations on changes	2020-02-21 09:24:53 -06:00
Seth Hoenig	54b5173eca	consul/connect: in-place update sidecar service registrations on changes Fix a bug where consul service definitions would not be updated if changes were made to the service in the Nomad job. Currently this only fixes the bug for cases where the fix is a matter of updating consul agent's service registration. There is a related bug where destructive changes are required (see #6877) which will be fixed in another PR. The enable_tag_override configuration setting for the parent service is applied to the sidecar service. Fixes #6459	2020-02-19 13:07:04 -06:00
Mahmood Ali	f4d8e1296f	Merge pull request #7171 from hashicorp/update-autopilot-20200214 Update consul vendor and add MinQuorum flag	2020-02-19 10:45:20 -06:00
Drew Bailey	3c0719274c	include pro in http_oss.go	2020-02-18 10:29:28 -05:00
Mahmood Ali	98ad59b1de	update rest of consul packages	2020-02-16 16:25:04 -06:00
Mahmood Ali	f492ab6d9e	implement MinQuorum	2020-02-16 16:04:59 -06:00
Mahmood Ali	fd51982018	tests: Avoid StartAsLeader raft config flag It's being deprecated	2020-02-13 18:56:53 -05:00
Seth Hoenig	543354aabe	Merge pull request #7106 from hashicorp/f-ctag-override client: enable configuring enable_tag_override for services	2020-02-13 12:34:48 -06:00
Michael Schurter	8c332a3757	Merge pull request #7102 from hashicorp/test-limits Fix some race conditions and flaky tests	2020-02-13 10:19:11 -08:00
Seth Hoenig	7f33b92e0b	command: use consistent CONSUL_HTTP_TOKEN name Consul CLI uses CONSUL_HTTP_TOKEN, so Nomad should use the same. Note that consul-template uses CONSUL_TOKEN, which Nomad also uses, so be careful to preserve any reference to that in the consul-template context.	2020-02-12 10:42:33 -06:00
Seth Hoenig	0e44094d1a	client: enable configuring enable_tag_override for services Consul provides a feature of Service Definitions where the tags associated with a service can be modified through the Catalog API, overriding the value(s) configured in the agent's service configuration. To enable this feature, the flag enable_tag_override must be configured in the service definition. Previously, Nomad did not allow configuring this flag, and thus the default value of false was used. Now, it is configurable. Because Nomad itself acts as a state machine around the service definitions of the tasks it manages, it's worth describing what happens when this feature is enabled and why. Consider the basic case where there is no Nomad, and your service is provided to Consul as a boring JSON file. The ultimate source of truth for the definition of that service is the file, and is stored in the agent. Later, Consul performs "anti-entropy" which synchronizes the Catalog (stored only on the leaders). Then with enable_tag_override=true, the tags field is available for "external" modification through the Catalog API (rather than directly configuring the service definition file, or using the Agent API). The important observation is that if the service definition ever changes (i.e. the file is changed & config reloaded OR the Agent API is used to modify the service), those "external" tag values are thrown away, and the new service definition is once again the source of truth. In the Nomad case, Nomad itself is the source of truth over the Agent in the same way the JSON file was the source of truth in the example above. That means any time Nomad sets a new service definition, any externally configured tags are going to be replaced. When does this happen? Only on major lifecycle events, for example when a task is modified because of an updated job spec from the 'nomad job run <existing>' command. Otherwise, Nomad's periodic re-syncs with Consul will now no longer try to restore the externally modified tag values (as long as enable_tag_override=true). Fixes #2057	2020-02-10 08:00:55 -06:00
Michael Schurter	65d38d9255	test: fix flaky TestHTTP_FreshClientAllocMetrics	2020-02-07 15:50:53 -08:00
Michael Schurter	9d3093fa31	test: fix missing agent shutdowns	2020-02-07 15:50:53 -08:00
Michael Schurter	d96ceee8c5	testagent: fix case where agent would retry forever	2020-02-07 15:50:53 -08:00
Michael Schurter	e903501e65	test: improve error messages when failing	2020-02-07 15:50:53 -08:00
Michael Schurter	63032917fc	test: allow goroutine to exit even if test blocks	2020-02-07 15:50:53 -08:00
Michael Schurter	9905dec6a3	test: workaround limits race	2020-02-07 15:50:53 -08:00
Michael Schurter	19a1932bbb	test: wait longer than timeout The 1s timeout raced with the 1s deadline it was trying to detect.	2020-02-07 15:50:53 -08:00
Michael Schurter	fd81208db7	test: fix flaky health test Test set Agent.client=nil which prevented the client from being shut down. This leaked goroutines and could cause panics due to the leaked client goroutines logging after their parent test had finished. Removed ACLs from the server test because I couldn't get it to work with the test agent, and it tested very little.	2020-02-07 15:50:53 -08:00
Michael Schurter	2896f78f77	client: fix race accessing Node.status * Call Node.Canonicalize once when Node is created. * Lock when accessing fields mutated by node update goroutine	2020-02-07 15:50:47 -08:00
Drew Bailey	d830998572	agent Profile req nil check s.agent.Server() clean up logic and tests	2020-02-03 13:20:05 -05:00
Drew Bailey	c4f45f9bde	Fix panic when monitoring a local client node Fixes a panic when accessing a.agent.Server() when agent is a client instead. This PR removes a redundant ACL check since ACLs are validated at the RPC layer. It also nil checks the agent server and uses Client() when appropriate.	2020-02-03 13:20:04 -05:00
Seth Hoenig	78a7d1e426	comments: cleanup some leftover debug comments and such	2020-01-31 19:04:35 -06:00
Seth Hoenig	076cb4754e	agent: re-enable the server in dev mode	2020-01-31 19:04:19 -06:00
Seth Hoenig	8219c78667	nomad: handle SI token revocations concurrently Be able to revoke SI token accessors concurrently, and also ratelimit the requests being made to Consul for the various ACL API uses.	2020-01-31 19:04:14 -06:00
Seth Hoenig	2c7ac9a80d	nomad: fixup token policy validation	2020-01-31 19:04:08 -06:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to receive RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	f030a22c7c	command, docs: create and document consul token configuration for connect acls (gh-6716) This change provides an initial pass at setting up the configuration necessary to enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert` commands (similar to Vault tokens). These values are not actually used yet in this changeset.	2020-01-31 19:02:53 -06:00
Michael Schurter	c82b14b0c4	core: add limits to unauthorized connections Introduce limits to prevent unauthorized users from exhausting all ephemeral ports on agents: * `{https,rpc}_handshake_timeout` * `{http,rpc}_max_conns_per_client` The handshake timeout closes connections that have not completed the TLS handshake by the deadline (5s by default). For RPC connections this timeout also separately applies to the first byte being read, so RPC connections with TLS enabled have `rpc_handshake_timeout * 2` as their deadline. The connection limit per client prevents a single remote TCP peer from exhausting all ephemeral ports. The default is 100, but can be lowered to a minimum of 26. Since streaming RPC connections create a new TCP connection (until MultiplexV2 is used), 20 connections are reserved for Raft and non-streaming RPCs to prevent connection exhaustion due to streaming RPCs. All limits are configurable and may be disabled by setting them to `0`. This also includes a fix that closes connections that attempt to create TLS RPC connections recursively. While only users with valid mTLS certificates could perform such an operation, it was added as a safeguard to prevent programming errors before they could cause resource exhaustion.	2020-01-30 10:38:25 -08:00
Mahmood Ali	9611324654	Merge pull request #6922 from hashicorp/b-alloc-canoncalize Handle Upgrades and Alloc.TaskResources modification	2020-01-28 15:12:41 -05:00
Mahmood Ali	90cae566e5	Merge pull request #6935 from hashicorp/b-default-preemption-flag scheduler: allow configuring default preemption for system scheduler	2020-01-28 15:11:06 -05:00
Mahmood Ali	af17b4afc7	Support customizing full scheduler config	2020-01-28 14:51:42 -05:00
Nick Ethier	5636203d4e	consul: fix var name from rebase	2020-01-27 14:00:19 -05:00
Nick Ethier	0ae99b3c9c	consul: fix var name from rebase	2020-01-27 12:55:52 -05:00
Nick Ethier	5cbb94e16e	consul: add support for canary meta	2020-01-27 09:53:30 -05:00
Danielle	5fd52171aa	cli: add system command and subcmds to interact with system API. (#6924 ) cli: add system command and subcmds to interact with system API.	2020-01-13 16:16:08 +01:00
Mahmood Ali	1ab682f622	scheduler: allow configuring default preemption for system scheduler Some operators want greater control over when preemption is enabled, especially during an upgrade to limit potential side-effects.	2020-01-13 08:30:49 -05:00
James Rasell	4e48217a4e	cli: add system command and subcmds to interact with system API. The system command includes gc and reconcile-summaries subcommands which covers all currently available system API calls. The help information is largely pulled from the current Nomad website API documentation.	2020-01-13 11:34:46 +01:00
Drew Bailey	f97d2e96c1	refactor api profile methods comment why we ignore errors parsing params	2020-01-09 15:15:12 -05:00
Drew Bailey	b702dede49	adds qc param, address pr feedback	2020-01-09 15:15:11 -05:00
Drew Bailey	085659f6ff	condense table test	2020-01-09 15:15:10 -05:00
Drew Bailey	45210ed901	Rename profile package to pprof Address pr feedback, rename profile package to pprof to more accurately describe its purpose. Adds gc param for heap lookup profiles.	2020-01-09 15:15:10 -05:00
Drew Bailey	1b8af920f3	address pr feedback	2020-01-09 15:15:09 -05:00
Drew Bailey	4ced73875b	leave acl checking to rpc endpoints fix test expectation test wrapNonJSON	2020-01-09 15:15:08 -05:00
Drew Bailey	279512c7f8	provide helpful error, cleanup logic	2020-01-09 15:15:08 -05:00
Drew Bailey	7bbba613a5	prevent doubly wrapping with rpc error	2020-01-09 15:15:07 -05:00
Drew Bailey	fd42020ad6	RPC server EnableDebug option Passes in agent enable_debug config to nomad server and client configs. This allows for rpc endpoints to have more granular control if they should be enabled or not in combination with ACLs. enable debug on client test	2020-01-09 15:15:07 -05:00
Drew Bailey	9a80938fb1	region forwarding; prevent recursive forwards for impossible requests prevent region forwarding loop, backfill tests fix failing test	2020-01-09 15:15:06 -05:00
Drew Bailey	46121fe3fd	move shared structs out of client and into nomad	2020-01-09 15:15:05 -05:00
Drew Bailey	3672414888	test pprof headers and profile methods tidy up, add comments clean up seconds param assignment	2020-01-09 15:15:04 -05:00
Drew Bailey	fc37448683	warn when enabled debug is on when registering m -> a receiver name return codederrors, fix query	2020-01-09 15:15:04 -05:00
Drew Bailey	62eb2d76a6	acl and debug test table rename implementation method	2020-01-09 15:15:03 -05:00
Drew Bailey	50288461c9	Server request forwarding for Agent.Profile Return rpc errors for profile requests, set up remote forwarding to target leader or server id for profile requests. server forwarding, endpoint tests	2020-01-09 15:15:03 -05:00
Drew Bailey	901f362858	test for known pprof endpoints	2020-01-09 15:15:02 -05:00
Drew Bailey	49ad5fbc85	agent pprof endpoints wip, agent endpoint and client endpoint for pprof profiles agent endpoint test	2020-01-09 15:15:02 -05:00
Mahmood Ali	a2e181dd45	CLI: protect against AllocatedResources being nil	2020-01-08 17:22:05 -05:00
Charlie Voiselle	5298fee5d6	Typo fix Synopsis needs to start with uppercase to match other commands	2020-01-08 10:44:00 -05:00
James Rasell	f2d1e45135	cli: include namespace in output when querying job status. (#6912 )	2020-01-08 08:24:03 -05:00
Michael Schurter	571ed261c8	Merge pull request #6898 from hashicorp/hicks/fix-typo Fix typo, Ethier -> Either	2020-01-02 14:52:18 -08:00
Kris Hicks	7fef7508cb	Fix typo, Ethier -> Either	2020-01-02 14:42:27 -08:00
Charlie Voiselle	fd3bf5f971	cli: Allow user to specify dest filename for nomad init (#6520 ) * Allow user to specify dest filename for nomad init * Create changelog entry for GH-6520	2019-12-19 14:59:12 -05:00
Drew Bailey	8e59e91991	Merge pull request #6746 from hashicorp/f-shutdown-delay-tg Group shutdown_delay	2019-12-18 16:01:30 -05:00
Lang Martin	06f441f562	test: quota: relax multierror message matching to Contains	2019-12-17 13:20:14 -05:00
Lang Martin	fb6c27b828	test: build quota_apply_test, remove the tests that require ent	2019-12-17 13:20:14 -05:00
Drew Bailey	d9e41d2880	docs for shutdown delay update docs, address pr comments ensure pointer is not nil use pointer for diff tests, set vs unset	2019-12-16 11:38:35 -05:00
Drew Bailey	24929776a2	shutdown delay for task groups copy struct values ensure groupserviceHook implements RunnerPreKillhook run deregister first test that shutdown times are delayed move magic number into variable	2019-12-16 11:38:16 -05:00
Mahmood Ali	76be9b4afb	cli: sequence cli.Ui operations Fixes a bug where, if command flag parsing fails, the resulting error and help usage messages get interleaved in an unexpected and non-user-friendly way. The reason is that the flag parsing library effectively writes to ui.Error in a goroutine. This is problematic: first, we lose the sequencing between help usage and error message; second, cli.Ui methods are not concurrency safe. Here, we introduce a custom error writer that buffers the result and calls ui.Error() in the write method and in the same goroutine. For context, we need to wrap ui.Error because it's line-oriented, while the flags library expects an io.Writer, which is byte-oriented.	2019-12-16 10:08:17 -05:00
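The adapter idea in 76be9b4afb (a byte-oriented writer in front of a line-oriented sink, emitting synchronously in the caller's goroutine) might look roughly like the sketch below. `lineWriter` and its `emit` field are hypothetical stand-ins for the commit's writer and `ui.Error`; the actual code may differ.

```go
package main

import "bytes"

// lineWriter adapts a line-oriented sink (a stand-in for ui.Error) to the
// byte-oriented io.Writer interface. It is not safe for concurrent use; the
// point is that emit runs in the same goroutine that calls Write.
type lineWriter struct {
	emit func(line string) // hypothetical line-oriented sink
	buf  bytes.Buffer      // holds a trailing partial line between writes
}

// Write buffers p and emits every complete line, without its newline.
func (w *lineWriter) Write(p []byte) (int, error) {
	w.buf.Write(p)
	for {
		i := bytes.IndexByte(w.buf.Bytes(), '\n')
		if i < 0 {
			break // no complete line left; keep the remainder buffered
		}
		line := string(w.buf.Next(i + 1))
		w.emit(line[:len(line)-1])
	}
	return len(p), nil
}

// Flush emits any buffered partial line.
func (w *lineWriter) Flush() {
	if w.buf.Len() > 0 {
		w.emit(w.buf.String())
		w.buf.Reset()
	}
}
```

Because `Write` calls `emit` before returning, messages reach the sink in the order they were written, which is the sequencing property the commit message is after.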
Danielle	246a4e898b	Merge pull request #6828 from hashicorp/b/nomad-monitor-panic command: error when no node is found for `monitor`	2019-12-10 14:29:32 +01:00
Danielle Lancashire	cd764ab0e9	command: error when no node is found for `monitor` Currently `nomad monitor -node-id` will panic when a node-id does not match any nodes, as there is no empty result bounds checking. Here we return an error to the user when no nodes are found.	2019-12-10 13:10:47 +01:00
Seth Hoenig	f0c3dca49c	tests: swap lib/freeport for tweaked helper/freeport Copy the updated version of freeport (sdk/freeport), and tweak it for use in Nomad tests. This means staying below port 10000 to avoid conflicts with the lib/freeport that is still transitively used by the old version of consul that we vendor. Also provide implementations to find ephemeral ports of macOS and Windows environments. Ports acquired through freeport are supposed to be returned to freeport, which this change now also introduces. Many tests are modified to include calls to a cleanup function for Server objects. This should help quite a bit with some flakey tests, but not all of them. Our port problems will not go away completely until we upgrade our vendor version of consul. With Go modules, we'll probably do a 'replace' to swap out other copies of freeport with the one now in 'nomad/helper/freeport'.	2019-12-09 08:37:32 -06:00
Michael Schurter	3008473f9b	Merge branch 'master' into release-0102	2019-12-04 14:13:34 -08:00
Mahmood Ali	7b8cfee162	tests: deflake TestHTTP_FreshClientAllocMetrics The test asserts that alloc counts get reported accurately in metrics by inspecting the metrics endpoint directly. Sadly, the metrics as collected by `armon/go-metrics` seem to be stateful and may contain info from other tests. This means that the test can fail depending on the order of returned metrics. Inspecting the metrics output of one failing run, you can see the duplicate gauge entries but for different node_ids: ``` { "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal", "Value": 10, "Labels": { "datacenter": "dc1", "node_class": "none", "node_id": "67402bf4-00f3-bd8d-9fa8-f4d1924a892a" } }, { "Name": "service-name.default-0a3ba4b6-2109-485e-be74-6864228aed3d.client.allocations.terminal", "Value": 0, "Labels": { "datacenter": "dc1", "node_class": "none", "node_id": "a2945b48-7e66-68e2-c922-49b20dd4e20c" } }, ```	2019-11-22 18:41:21 -05:00
Nomad Release bot	db6420367d	Generate files for 0.10.2-rc1 release	2019-11-22 18:42:49 +00:00
Drew Bailey	b45ce9e997	add server-id to -h output	2019-11-21 16:04:28 -05:00
Drew Bailey	b3765b06ea	add server-id to -h output	2019-11-21 16:01:09 -05:00
Drew Bailey	6d5156bbba	Allows a node uuid prefix to be passed in	2019-11-21 15:15:41 -05:00
Drew Bailey	7ca6dbe61e	Allows a node uuid prefix to be passed in	2019-11-21 14:51:48 -05:00
Lang Martin	069e9a624b	command: quota init writes files with a network limit	2019-11-20 17:59:55 -06:00
Lang Martin	d2fc279af4	command: quota status reports network usage	2019-11-20 17:59:34 -06:00
Lang Martin	f45bebdb66	command: quota init writes files with a network limit	2019-11-20 18:44:06 -05:00
Lang Martin	2e2c662977	command: quota status reports network usage	2019-11-20 18:44:06 -05:00
Michael Schurter	48239d7f2e	Merge pull request #6017 from hashicorp/f-policy-json api: Add parsed rules to policy response	2019-11-20 15:31:03 -08:00
Mahmood Ali	be6c60455e	Merge pull request #6669 from hashicorp/b-cors-allow-credentials Allow UI to query client directly for task logs/state	2019-11-20 15:14:01 -05:00
Buck Doyle	db77a24ed3	Merge branch 'master' into f-policy-json	2019-11-20 11:20:07 -06:00
Michael Schurter	ecf970b5a5	Merge pull request #6370 from pmcatominey/tls-server-name command: add -tls-server-name flag	2019-11-20 08:44:54 -08:00
Preetha	42c1c85285	Merge pull request #6421 from hashicorp/b-acl-bootstrap-codes api: acl bootstrap errors aren't 500	2019-11-20 10:36:08 -06:00
Preetha	be4a51d5b8	Merge pull request #6349 from hashicorp/b-host-stats client: Return empty values when host stats fail	2019-11-20 10:13:02 -06:00
Buck Doyle	7e3188a4ea	CLI: Remove duplicated error output (#6738 )	2019-11-19 16:05:53 -06:00
Mahmood Ali	97974c4b13	Merge pull request #6684 from hashicorp/b-nomad-exec-stdout-tty nomad exec: check stdout for tty as well	2019-11-19 15:55:21 -05:00
Mahmood Ali	6f8bb5e90b	api: acl bootstrap errors aren't 500 Noticed that ACL endpoints return 500 status code for user errors. This is confusing and can lead to false monitoring alerts. Here, I introduce a concept of RPCCoded errors to be returned by RPC that signal a code in addition to error message. Codes for now match HTTP codes to ease reasoning. ``` $ nomad acl bootstrap Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9)) $ nomad acl bootstrap Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9)) ```	2019-11-19 15:51:57 -05:00
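The "RPCCoded errors" concept from 6f8bb5e90b, an error that carries a status code alongside its message, can be sketched as follows. The names `codedError`, `NewCodedError`, and `StatusFor` are hypothetical illustrations, not the identifiers used in Nomad.

```go
package main

import (
	"errors"
	"net/http"
)

// codedError is a hypothetical error type carrying an HTTP-style status code.
type codedError struct {
	code int
	msg  string
}

func (e *codedError) Error() string { return e.msg }

// Code returns the status code attached to the error.
func (e *codedError) Code() int { return e.code }

// NewCodedError builds an error that signals a code in addition to a message.
func NewCodedError(code int, msg string) error {
	return &codedError{code: code, msg: msg}
}

// StatusFor picks the HTTP status for err: 200 for nil, the attached code for
// coded errors (including wrapped ones), and 500 for everything else.
func StatusFor(err error) int {
	if err == nil {
		return http.StatusOK
	}
	var coded interface{ Code() int }
	if errors.As(err, &coded) {
		return coded.Code()
	}
	return http.StatusInternalServerError
}
```

With this shape, a user error such as "ACL bootstrap already done" can be created with code 400 at the RPC layer, and the HTTP layer only needs `StatusFor` to avoid reporting it as a 500.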
Tim Gross	1210261fe2	hclfmt nomad jobspecs (#6724 )	2019-11-19 10:36:41 -05:00
Nick Ethier	bd454a4c6f	client: improve group service stanza interpolation and check_re… (#6586 ) * client: improve group service stanza interpolation and check_restart support Interpolation can now be done on group service stanzas. Note that some task runtime specific information that was previously available when the service was registered poststart of a task is no longer available. The check_restart stanza for checks defined on group services will now properly restart the allocation upon check failures if configured.	2019-11-18 13:04:01 -05:00
Drew Bailey	9b63828658	serverID to target remote leader or server handle the case where we request a server-id which is this current server update docs, error on node and server id params more accurate names for tests use shared no leader err, formatting rm bad comment remove redundant variable	2019-11-14 10:07:35 -05:00
Drew Bailey	b644e1f47d	add server-id to monitor specific server	2019-11-14 09:53:41 -05:00
Drew Bailey	acd97d0731	Merge pull request #6670 from hashicorp/api/fallthrough-test test rootfallthrough handler	2019-11-13 10:51:31 -05:00
Lars Lehtonen	1dbf44bc40	command/agent: Prune Dead Code (#6682 ) * remove unused MockPeriodicJob() from tests * remove unused getIndex() from tests * remove unused checkIndex() from tests * remove unused assertIndex() from tests * remove unused Agent.findLoopbackDevice()	2019-11-13 08:20:01 -05:00
Lars Lehtonen	e85509c466	command: error handling before file close (#6681 )	2019-11-13 08:18:20 -05:00
Drew Bailey	f5310ff63f	fix so assertions are test case driven	2019-11-12 14:28:21 -05:00
Mahmood Ali	591cb75ee4	nomad exec: check stdout for tty as well When inferring whether to use TTY, check both stdin and stdout are terminals. Otherwise, we get failures like the following: ``` $ nomad alloc exec --job example echo hi hi $ echo \| nomad alloc exec --job example echo hi hi $ nomad alloc exec --job example echo hi \| head -n1 failed to exec into task: not a terminal ```	2019-11-12 11:39:06 -05:00
Lars Lehtonen	98d3e47b32	command: fix TestHelpers_LineLimitReader_TimeLimit() goroutine (#6678 )	2019-11-12 08:35:11 -05:00
Charlie Voiselle	835831a3d8	Added service wrapper code (#6220 ) This is the basic code to add the Windows Service Manager hooks to Nomad. Includes vendoring golang.org/x/sys/windows/svc and added Docs: * guide for installing as a windows service. * configuration for logging to file from PR #6429	2019-11-11 15:16:07 -05:00
Drew Bailey	f989f38594	test /ui/ path	2019-11-11 12:12:42 -05:00
Drew Bailey	a0548824f3	test rootfallthrough handler	2019-11-11 12:08:44 -05:00
Mahmood Ali	b2145f2d02	Allow UI to query client directly Nomad web UI currently fails when querying client nodes for allocation state endpoints, due to CORS policy. The issue is that CORS requests that are marked `withCredentials` need the http server to include an `Access-Control-Allow-Credentials` header [1]. But Nomad Task Logs and filesystem requests include authenticating information and are thus marked with `credentials=true`[2][3]. It's worth noting that the browser currently sends credentials and authentication token to servers anyway; it's just that the response is not made available to caller nomad ui javascript. For task logs specifically, nomad ui retries again by querying the web ui address (typically pointing to a nomad server) which will forward the request to the nomad client agent appropriately. [1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Access-Control-Allow-Credentials [2] `101d0373ee/ui/app/components/task-log.js (L50)` [3] `101d0373ee/ui/app/services/token.js (L25-L39)`	2019-11-11 15:13:30 +00:00
Lars Lehtonen	08d5342812	command/agent: TestAgent_ServerConfig() fix dropped errors (#6659 )	2019-11-11 09:46:46 -05:00
Drew Bailey	04439a5a78	better func name, swap conditional	2019-11-11 08:35:56 -05:00
Drew Bailey	c85df2dac7	returns a 404 if not found instead of redirect to ui	2019-11-08 15:34:35 -05:00
Tim Gross	4909adb32c	fix broken test expectation from message change (#6635 )	2019-11-06 16:33:13 -05:00
Drew Bailey	7b2ad28ef6	unlock before returning, no need for label comment, trigger build return length written	2019-11-05 11:44:29 -05:00
Drew Bailey	d3b48a3e45	simplify logch goroutine	2019-11-05 11:44:28 -05:00
Drew Bailey	df57f70a68	wireup plain=true\|false query param	2019-11-05 11:44:28 -05:00
Drew Bailey	f4a7e3dc75	coordinate closing of doneCh, use interface to simplify callers comments	2019-11-05 11:44:26 -05:00
Drew Bailey	fe542680dc	log-json -> json fix typo command/agent/monitor/monitor.go Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com> Update command/agent/monitor/monitor.go Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com> address feedback, lock to prevent send on closed channel fix lock/unlock for dropped messages	2019-11-05 09:51:59 -05:00
Drew Bailey	84c8e79f90	simplify assert message	2019-11-05 09:51:56 -05:00
Drew Bailey	8726b685de	address feedback	2019-11-05 09:51:56 -05:00
Drew Bailey	e4b3e1d7d4	allow more time for streaming message remove unused struct	2019-11-05 09:51:55 -05:00
Drew Bailey	318b6c91bf	monitor command takes no args rm extra new line fix lint errors return after close fix, simplify test	2019-11-05 09:51:55 -05:00
Drew Bailey	0e759c401c	moving endpoints over to frames	2019-11-05 09:51:54 -05:00
Drew Bailey	c7b633b6c1	lock in sub select rm redundant lock wip to use framing wip switch to stream frames	2019-11-05 09:51:54 -05:00
Drew Bailey	fb23c1325d	fix deadlock issue, switch to frames envelope	2019-11-05 09:51:54 -05:00
Drew Bailey	32f62edbb0	return 400 if invalid log_json param is given Addresses feedback around monitor implementation subselect on stopCh to prevent blocking forever. Set up a separate goroutine to check every 3 seconds for dropped messages. rename returned ch to avoid confusion	2019-11-05 09:51:53 -05:00
Drew Bailey	17d876d5ef	rename function, initialize log level better underscores instead of dashes for query params	2019-11-05 09:51:53 -05:00
Drew Bailey	8e3915c7fc	use channel instead of empty string to determine close	2019-11-05 09:51:52 -05:00
Drew Bailey	da6229d704	update go-hclog dep remove duplicate lock	2019-11-05 09:51:52 -05:00
Drew Bailey	db65b1f4a5	agent:read acl policy for monitor	2019-11-05 09:51:52 -05:00
Drew Bailey	f46fd5b3e1	only look up rpchandler for node if we have nodeid fix some comments and nomad monitor -h output	2019-11-05 09:51:51 -05:00
Drew Bailey	3b9c33a5f0	new hclog with standardlogger intercept	2019-11-05 09:51:49 -05:00
Drew Bailey	a45ae1cd58	enable json formatting, use queryoptions	2019-11-05 09:51:49 -05:00
Drew Bailey	786989dbe3	New monitor pkg for shared monitor functionality Adds new package that can be used by client and server RPC endpoints to facilitate monitoring based off of a logger clean up old code small comment about write rm old comment about minsize rename to Monitor Removes connection logic from monitor command Keep connection logic in endpoints, use a channel to send results from monitoring use new multisink logger and interfaces small test for dropped messages update go-hclogger and update sink/intercept logger interfaces	2019-11-05 09:51:49 -05:00
Drew Bailey	e076204820	get local rpc endpoint working	2019-11-05 09:51:48 -05:00
Drew Bailey	976c43157c	remove log_writer prefix output with proper spacing update gzip handler, adjust first byte flow to allow gzip handler bypass wip, first stab at wiring up rpc endpoint	2019-11-05 09:51:48 -05:00
Drew Bailey	0de94466b2	Display error when remote side ended monitor multisink logger remove usage of logwriter	2019-11-05 09:51:48 -05:00
Drew Bailey	f60e44afc7	Adds nomad monitor command Adds nomad monitor command. Like consul monitor, this command allows you to stream logs from a nomad agent in real time with a specified log level add endpoint tests Upgrade go-hclog to latest version The current version of go-hclog pads log prefixes to equal lengths so info becomes [INFO ] and debug becomes [DEBUG]. This breaks hashicorp/logutils/level.go Check function. Upgrading to the latest version removes this padding and fixes log filtering that uses logutils Check	2019-11-05 09:51:47 -05:00
Drew Bailey	b386119d15	Add Agent Monitor to receive streaming logs Queries /v1/agent/monitor and receives streaming logs from client	2019-11-05 09:51:47 -05:00
Drew Bailey	b0184e2032	Adds AgentMonitor Endpoint AgentMonitor is an endpoint to stream logs for a given agent. It allows callers to pass in a supplied log level, which may be different than the agents config allowing for temporary debugging with lower log levels. Pass in logWriter when setting up Agent	2019-11-05 09:51:46 -05:00
Drew Bailey	3a11f1f23a	Merge pull request #6609 from hashicorp/b-alloc-status-consistency Prevent nomad alloc status output inconsistency	2019-11-04 10:12:04 -05:00
Drew Bailey	a7adc54235	Prevent nomad alloc status output inconsistency Prevent random map ordering and sort alphabetically better variable name	2019-11-01 14:01:32 -04:00
Michael Schurter	9fed8d1bed	client: fix panic from 0.8 -> 0.10 upgrade makeAllocTaskServices did not do a nil check on AllocatedResources which causes a panic when upgrading directly from 0.8 to 0.10. While skipping 0.9 is not supported we intend to fix serious crashers caused by such upgrades to prevent cluster outages. I did a quick audit of the client package and everywhere else that accesses AllocatedResources appears to be properly guarded by a nil check.	2019-11-01 07:47:03 -07:00
Mahmood Ali	3f6e50617a	Merge pull request #6047 from hashicorp/b-ignore-server-if-disabled Only warn against BootstrapExpect set in CLI flag	2019-10-29 10:55:44 -04:00
Lang Martin	aa77ea4032	quota: parse network stanza in quotas (#6511 )	2019-10-24 10:41:54 -04:00
Michael Schurter	39437a5c5b	Merge branch 'master' into release-0100	2019-10-22 08:17:57 -07:00
Nomad Release bot	3e6c9dd40e	Generate files for 0.10.0 release	2019-10-22 12:34:56 +00:00
Seth Hoenig	8b03477f46	Merge pull request #6448 from hashicorp/f-set-connect-sidecar-tags connect: enable setting tags on consul connect sidecar service in job…	2019-10-17 15:14:09 -05:00
Seth Hoenig	039fbd3f3b	connect: enable setting tags on consul connect sidecar service in jobspec (#6415 )	2019-10-17 19:25:20 +00:00
Mahmood Ali	61e66cb077	Merge pull request #6427 from hashicorp/b-fs-endpoint-errors agent: report fs log errors as http errors	2019-10-15 20:12:59 -04:00
Mahmood Ali	88f8127820	tests: avoid using unnecessary pipe	2019-10-15 17:22:03 -04:00
Mahmood Ali	e6d5635e1a	Merge pull request #6425 from hashicorp/f-cli-show-full-ids cli: show full id for single node or alloc status	2019-10-15 10:54:25 -04:00
Danielle	fee482ae6c	Merge pull request #6331 from hashicorp/dani/f-volume-mount-propagation volumes: Add support for mount propagation	2019-10-14 14:29:40 +02:00
Danielle Lancashire	4fbcc668d0	volumes: Add support for mount propagation This commit introduces support for configuring mount propagation when mounting volumes with the `volume_mount` stanza on Linux targets. Similar to Kubernetes, we expose 3 options for configuring mount propagation: - private, which is equivalent to `rprivate` on Linux, which does not allow the container to see any new nested mounts after the chroot was created. - host-to-task, which is equivalent to `rslave` on Linux, which allows new mounts that have been created _outside of the container_ to be visible inside the container after the chroot is created. - bidirectional, which is equivalent to `rshared` on Linux, which allows both the container to see new mounts created on the host, but importantly _allows the container to create mounts that are visible in other containers and on the host_ private and host-to-task are safe, but bidirectional mounts can be dangerous, as if the code inside a container creates a mount, and does not clean it up before tearing down the container, it can cause bad things to happen inside the kernel. To add a layer of safety here, we require that the user has ReadWrite permissions on the volume before allowing bidirectional mounts, as a defense in depth / validation case, although creating mounts should also require a privileged execution environment inside the container.	2019-10-14 14:09:58 +02:00
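The three propagation options listed in 4fbcc668d0 amount to a small lookup plus one safety check. The sketch below is illustrative only: `linuxPropagation` and its `readOnly` parameter are assumptions standing in for the commit's ReadWrite-permission check, not the code that shipped.

```go
package main

import (
	"errors"
	"fmt"
)

// propagationModes maps the user-facing option to the Linux propagation flag,
// following the equivalences given in the commit message.
var propagationModes = map[string]string{
	"private":       "rprivate",
	"host-to-task":  "rslave",
	"bidirectional": "rshared",
}

// linuxPropagation resolves a propagation mode. As a simplified stand-in for
// the ReadWrite permission requirement, bidirectional is refused for volumes
// that are only available read-only.
func linuxPropagation(mode string, readOnly bool) (string, error) {
	flag, ok := propagationModes[mode]
	if !ok {
		return "", fmt.Errorf("unknown propagation mode %q", mode)
	}
	if mode == "bidirectional" && readOnly {
		return "", errors.New("bidirectional propagation requires read-write access to the volume")
	}
	return flag, nil
}
```

Rejecting the risky mode at validation time, before any mount is attempted, matches the defense-in-depth intent stated in the message.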
Danielle	2640155ae5	Merge pull request #6429 from hashicorp/f-log-to-file Add support for logging to a file	2019-10-11 13:35:39 +02:00
Nomad Release bot	3007f1662e	Generate files for 0.10.0-rc1 release	2019-10-10 19:08:23 +00:00
Danielle Lancashire	5cedf6d024	logging: Correctly track number of written bytes Currently this assumes that a short write will never happen. While these are improbable in a case where rotation being off a few bytes would matter, this now correctly tracks the number of written bytes.	2019-10-10 14:02:14 +02:00
Danielle Lancashire	b67215d4f8	logging: Sort files when pruning old logs Currently this logging implementation is dependent on the order of files as returned by filepath.Glob, which although internal methods are documented to be lexicographical, does not publicly document this. Here we defensively re-sort.	2019-10-10 13:51:16 +02:00
Mahmood Ali	4b2ba62e35	acl: check ACL against object namespace Fix a bug where a malicious user can access or manipulate an alloc in a namespace they don't have access to. The allocation endpoints perform ACL checks against the request namespace, not the allocation namespace, and perform the allocation lookup independently from namespaces. Here, we check that the requester can access the alloc namespace regardless of the declared request namespace. Ideally, we'd enforce that the declared request namespace matches the actual allocation namespace. Unfortunately, we haven't documented alloc endpoints as namespaced functions; we suspect starting to enforce this will be very disruptive and inappropriate for a nomad point release. As such, we maintain current behavior that doesn't require passing the proper namespace in request. A future major release may start enforcing checking declared namespace.	2019-10-08 12:59:22 -04:00
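The defensive sort in b67215d4f8 can be illustrated with a short helper. This is a sketch under the assumption that rotated log names embed a sortable timestamp or sequence, so lexicographic order matches age; `pruneOldest` is a hypothetical name, not the function in the commit.

```go
package main

import "sort"

// pruneOldest returns the file names that should be deleted so that at most
// maxFiles remain. It sorts a copy of names first instead of trusting the
// order in which the caller (for example filepath.Glob) returned them.
func pruneOldest(names []string, maxFiles int) []string {
	sorted := append([]string(nil), names...) // copy so the input is not reordered
	sort.Strings(sorted)
	if maxFiles < 0 || len(sorted) <= maxFiles {
		return nil
	}
	return sorted[:len(sorted)-maxFiles]
}
```

Sorting explicitly costs little and removes the dependency on an ordering that the standard library does not promise in its public documentation.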
Mahmood Ali	3c0d8c7611	Merge pull request #6441 from hashicorp/b-agent-token Redact replication tokens in /agent/self	2019-10-08 12:55:45 -04:00
Danielle Lancashire	9eaac48f25	agent: Refactor log setup to support log-to-file	2019-10-07 14:42:32 +02:00
Danielle Lancashire	442f4888b3	agent: Introduce File Logger This commit introduces a rotating file logger for Nomad Agent Logs. The logger implementation itself is a lift and shift from Consul, with tests updated to fit with the Nomad pattern of using require, and not having a testutil for creating tempdirs cleanly.	2019-10-07 14:37:31 +02:00
Danielle Lancashire	d3614ea0a8	config: Add required configuration for logging to a file	2019-10-07 14:16:59 +02:00
Mahmood Ali	d09355efe4	cli: show full id for single node or alloc status Show full ID on individual alloc or node status views. Shortening the ID isn't very helpful in these cases, and makes looking up the full id slightly more complicated when user needs to interact with API. List views are unmodified and show short id unless `-verbose` flag is passed. Before ``` $ nomad node status -self \| head -n2 ID = 21fc51f9 Name = mars-2.local $ nomad alloc status 15ae54cd \| head -n3 ID = 15ae54cd-08dd-3681-03cf-4c23ace7e7c3 Eval ID = a6b15f86 Name = example.cache[0] ``` After: ``` $ nomad node status -self \| head -n2 ID = 21fc51f9-fd39-0fa0-fb41-f34c7aa36101 Name = mars-2.local $ nomad alloc status 15ae54cd \| head -n3 ID = 15ae54cd-08dd-3681-03cf-4c23ace7e7c3 Eval ID = a6b15f86-ca8e-e536-b544-4bfb43137ff3 Name = example.cache[0] ```	2019-10-04 16:36:18 -04:00
Mahmood Ali	317e0f9e44	agent: report fs log errors as http errors This fixes two bugs: First, FS Logs API endpoint only propagated error back to user if it was encoded with code, which isn't common. Other errors get suppressed and callers get an empty response with 200 status code. Now, these endpoints return a 500 status code along with the error message. Before ``` $ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout"; echo * Trying 127.0.0.1... * TCP_NODELAY set * Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0) > GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout HTTP/1.1 > Host: 127.0.0.1:4646 > User-Agent: curl/7.54.0 > Accept: */* > < HTTP/1.1 200 OK < Vary: Accept-Encoding < Vary: Origin < Date: Fri, 04 Oct 2019 19:47:21 GMT < Content-Length: 0 < * Connection #0 to host 127.0.0.1 left intact ``` After ``` $ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout"; echo * Trying 127.0.0.1... * TCP_NODELAY set * Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0) > GET /v1/client/fs/logs/qwerqwera?follow=false&offset=0&origin=start&region=global&task=redis&type=stdout HTTP/1.1 > Host: 127.0.0.1:4646 > User-Agent: curl/7.54.0 > Accept: */* > < HTTP/1.1 500 Internal Server Error < Vary: Accept-Encoding < Vary: Origin < Date: Fri, 04 Oct 2019 19:48:12 GMT < Content-Length: 60 < Content-Type: text/plain; charset=utf-8 < * Connection #0 to host 127.0.0.1 left intact alloc lookup failed: index error: UUID must be 36 characters ``` Second, we return 400 status code for request validation errors. Before ``` $ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo * Trying 127.0.0.1... * TCP_NODELAY set * Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0) > GET /v1/client/fs/logs/qwerqwera HTTP/1.1 > Host: 127.0.0.1:4646 > User-Agent: curl/7.54.0 > Accept: */* > < HTTP/1.1 500 Internal Server Error < Vary: Accept-Encoding < Vary: Origin < Date: Fri, 04 Oct 2019 19:47:29 GMT < Content-Length: 22 < Content-Type: text/plain; charset=utf-8 < * Connection #0 to host 127.0.0.1 left intact must provide task name ``` After ``` $ curl -v "http://127.0.0.1:4646/v1/client/fs/logs/qwerqwera"; echo * Trying 127.0.0.1... * TCP_NODELAY set * Connected to 127.0.0.1 (127.0.0.1) port 4646 (#0) > GET /v1/client/fs/logs/qwerqwera HTTP/1.1 > Host: 127.0.0.1:4646 > User-Agent: curl/7.54.0 > Accept: */* > < HTTP/1.1 400 Bad Request < Vary: Accept-Encoding < Vary: Origin < Date: Fri, 04 Oct 2019 19:49:18 GMT < Content-Length: 22 < Content-Type: text/plain; charset=utf-8 < * Connection #0 to host 127.0.0.1 left intact must provide task name ```	2019-10-04 16:33:58 -04:00
Lang Martin	fb41dd86ba	default raft protocol v2	2019-09-24 14:37:55 -04:00
Peter McAtominey	de133d883f	command: add -tls-server-name flag	2019-09-24 09:20:41 -07:00
Tim Gross	cd9c23617f	client/connect: ConsulProxy LocalServicePort/Address (#6358 ) Without a `LocalServicePort`, Connect services will try to use the mapped port even when delivering traffic locally. A user can override this behavior by pinning the port value in the `service` stanza but this prevents us from using the Consul service name to reach the service. This commits configures the Consul proxy with its `LocalServicePort` and `LocalServiceAddress` fields.	2019-09-23 14:30:48 -04:00
Danielle Lancashire	39fe07f66b	api: Redact tokens in /agent/self	2019-09-23 19:07:27 +02:00
Danielle Lancashire	8b44369073	api: Redact ACL Replication Token Currently, hitting the /v1/agent/self API with ACL Replication enabled results in the token being returned in the API. This commit redacts that information, as it should be treated as a shared secret.	2019-09-22 14:35:53 +02:00
Chris Baker	6f38cca15a	fixed incorrect CLI documentation in `job deployments` listed `-all-allocs` instead of `-all`	2019-09-20 12:24:53 -05:00
Danielle Lancashire	e81d113e3f	command: Improve metrics fail logging	2019-09-19 04:17:42 +02:00
Mahmood Ali	b4a7585e5e	Merge pull request #6328 from hashicorp/b-gh-6269 cli: emit job version number proper	2019-09-17 19:06:44 -04:00
Tim Gross	e3e30c15a9	remove resolved TODO from UpdateTTL docstring (#6336 )	2019-09-16 16:26:06 -04:00
Mahmood Ali	df8a168d06	cli: emit job version number proper We must emit the alloc job version number rather than the field's address.	2019-09-13 19:04:32 -04:00
Danielle Lancashire	78b61de45f	config: Hoist volume.config.source into volume Currently, using a Volume in a job uses the following configuration: ``` volume "alias-name" { type = "volume-type" read_only = true config { source = "host_volume_name" } } ``` This commit migrates to the following: ``` volume "alias-name" { type = "volume-type" source = "host_volume_name" read_only = true } ``` The original design stemmed from uncertainty about the future of storage plugins, and was meant to allow maximum flexibility. However, this causes a few issues, namely: - We frequently need to parse this configuration during submission, scheduling, and mounting - It complicates the configuration from an end user's perspective - It complicates the ability to do validation As we understand the problem space of CSI a little more, it has become clear that we won't need the `source` to be in config, as it will be used in the majority of cases: - Host Volumes: Always need a source - Preallocated CSI Volumes: Always needs a source from a volume or claim name - Dynamic Persistent CSI Volumes: Always needs a source to attach the volumes to for managing upgrades and to avoid dangling. - Dynamic Ephemeral CSI Volumes: Less thought out, but `source` will probably point to the plugin name, and a `config` block will allow you to pass meta to the plugin. Or will point to a pre-configured ephemeral config. *If implemented The new design simplifies this by merging the source into the volume stanza to solve the above issues with usability, performance, and error handling.	2019-09-13 04:37:59 +02:00
Mahmood Ali	877260afd8	fix 'nomad namespace apply' help Named arguments need to precede positional arguments.	2019-09-09 10:04:41 -07:00
Nomad Release bot	dc7d728a82	Generate files for 0.10.0-beta1 release	2019-09-06 18:47:09 +00:00
Michael Schurter	31eb8375e5	Merge pull request #6282 from hashicorp/f-connect-dev-path connect: check if consul is on PATH	2019-09-05 12:25:23 -07:00
Michael Schurter	457684e34e	connect: check if consul is on PATH Only in -dev-connect mode for now since it's valid to install Consul after Nomad has started in production.	2019-09-05 12:05:42 -07:00
Jasmine Dahilig	e1c73cdab5	add validation for job_gc_interval (#6277 )	2019-09-05 11:20:46 -07:00
Mahmood Ali	6d73ca0cfb	Merge pull request #6250 from hashicorp/f-raft-protocol-v3 Update default raft protocol to version 3	2019-09-04 09:34:41 -04:00
Tim Gross	0f29dcc935	support script checks for task group services (#6197 ) In Nomad prior to Consul Connect, all Consul checks work the same except for Script checks. Because the Task being checked is running in its own container namespaces, the check is executed by Nomad in the Task's context. If the Script check passes, Nomad uses the TTL check feature of Consul to update the check status. This means in order to run a Script check, we need to know what Task to execute it in. To support Consul Connect, we need Group Services, and these need to be registered in Consul along with their checks. We could push the Service down into the Task, but this doesn't work if someone wants to associate a service with a task's ports, but do script checks in another task in the allocation. Because Nomad is handling the Script check and not Consul anyways, this moves the script check handling into the task runner so that the task runner can own the script check's configuration and lifecycle. This will allow us to pass the group service check configuration down into a task without associating the service itself with the task. When tasks are checked for script checks, we walk back through their task group to see if there are script checks associated with the task. If so, we'll spin off script check tasklets for them. The group-level service and any restart behaviors it needs are entirely encapsulated within the group service hook.	2019-09-03 15:09:04 -04:00
Buck Doyle	21ec6a237c	Merge branch 'master' into f-policy-json # Conflicts: # CHANGELOG.md	2019-09-03 09:56:25 -05:00
Jasmine Dahilig	4edebe389a	add default update stanza and max_parallel=0 disables deployments (#6191 )	2019-09-02 10:30:09 -07:00
Evan Ercolano	fcf66918d0	Remove unused canary param from MakeTaskServiceID	2019-08-31 16:53:23 -04:00
Michael Schurter	4bd53deba9	Merge pull request #6236 from hashicorp/b-ignore-connect-services consul: ignore connect services when syncing	2019-08-30 13:11:09 -07:00
Michael Schurter	67b7bc1e90	consul: ignore connect services when syncing Consul registers Connect services automatically, however Nomad thinks it owns them due to the _nomad prefix. Since the services are managed by Consul, Nomad needs to explicitly ignore them or otherwise they will be removed.	2019-08-30 11:53:41 -07:00
Tim Gross	b79021adfd	cli: split -dev and -dev-connect flags	2019-08-30 09:33:30 -04:00
Buck Doyle	ab96785fc9	Change test to use valid HCL for rules	2019-08-29 16:09:02 -05:00
Mahmood Ali	6eabf53b91	Default raft protocol to version 3	2019-08-28 15:56:59 -04:00
Nick Ethier	9e96971a75	cli: display group ports and address in alloc status command output (#6189 ) * cli: display group ports and address in alloc status command output * add assertions for port.To = -1 case and convert assertions to testify	2019-08-27 23:59:36 -04:00
Jasmine Dahilig	ffceab0879	remove network stanza from job init --short example jobspec (#6179 )	2019-08-27 07:36:32 -07:00
Tim Gross	11030f7aa0	init: add generated assets into bindata	2019-08-26 14:24:15 -04:00
Tim Gross	4d4461d1f5	agent: -dev=connect mode bind to 0.0.0.0 The dev mode flag for connect was binding to the default interface's IP, but this makes for a bad user experience for the CLI which will default to 127.0.0.1. If we bind to 0.0.0.0 instead the CLI will work without further configuration by the user.	2019-08-23 13:51:16 -04:00
Jerome Gravel-Niquet	cbdc1978bf	Consul service meta (#6193 ) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Michael Schurter	95b8048553	Merge pull request #6121 from hashicorp/f-connect-bootstrap connect: task hook for bootstrapping envoy sidecar	2019-08-22 10:58:31 -07:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, bootstrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Danielle Lancashire	2e5f28029f	remove hidden field from host volumes We're not shipping support for "hidden" volumes in 0.10 any more, I'll convert this to an issue+mini RFC for future enhancement.	2019-08-22 08:48:05 +02:00
Danielle	c280e97619	Merge pull request #6184 from hashicorp/dani/fix-api api: Fix definition of HostVolumeInfo	2019-08-22 00:13:28 +02:00
Danielle Lancashire	112b986736	api: Fix definition of HostVolumeInfo	2019-08-21 22:34:41 +02:00
Danielle Lancashire	9df7e0eb72	clientconfig: Fix parsing multiple host volumes	2019-08-21 22:19:58 +02:00
Michael Schurter	050cc32fde	Merge pull request #6157 from hashicorp/f-connect-register Register connect enabled group services with Consul	2019-08-20 14:45:38 -07:00
Michael Schurter	b008fd1724	connect: register group services with Consul Fixes #6042 Add new task group service hook for registering group services like Connect-enabled services. Does not yet support checks.	2019-08-20 12:25:10 -07:00
Tim Gross	c404491f1f	test: require root for linux devmode test	2019-08-20 13:31:49 -04:00
Tim Gross	a0e923f46c	add optional task field to group service checks	2019-08-20 09:35:31 -04:00
Nick Ethier	24f5a4c276	sidecar_task override in connect admission controller (#6140 ) * structs: use separate SidecarTask struct for sidecar_task stanza and add merge * nomad: merge SidecarTask into proxy task during connect Mutate hook	2019-08-20 01:22:46 -04:00
Tim Gross	2ab004d971	command: add `-connect` flag to job init Adds an example job for Consul Connect integration as well as an annotated example job.	2019-08-19 14:43:04 -04:00
Tim Gross	2a592a2e0c	agent: add optional param to -dev flag for connect (#6126 ) Consul Connect must route traffic between network namespaces through a public interface (i.e. not localhost). In order to support testing in dev mode, users needed to manually set the interface which doesn't make for a smooth experience. This commit adds a facility for adding optional parameters to the `nomad agent -dev` flag and uses it to add a `-dev=connect` flag that binds to a public interface on the host.	2019-08-14 15:29:37 -04:00
Tim Gross	13376cff9c	move `nomad init` outputs to go-bindata assets	2019-08-14 14:10:23 -04:00
Preetha	8c6312d973	Merge pull request #6097 from hashicorp/f-kind-validate Add validation for kind field if it is a consul connect proxy	2019-08-13 11:05:30 -05:00
Preetha Appan	72e45dd01e	More code review feedback	2019-08-12 17:41:40 -05:00
Tim Gross	03433f35d4	client/template: configuration for function blacklist and sandboxing When rendering a task template, the `plugin` function is no longer permitted by default and will raise an error. An operator can opt-in to permitting this function with the new `template.function_blacklist` field in the client configuration. When rendering a task template, path parameters for the `file` function will be treated as relative to the task directory by default. Relative paths or symlinks that point outside the task directory will raise an error. An operator can opt-out of this protection with the new `template.disable_file_sandbox` field in the client configuration.	2019-08-12 16:34:48 -04:00
Preetha Appan	35506c516d	Improve validation logic and add table driven tests	2019-08-12 14:39:50 -05:00

3032 Commits