open-nomad

Author	SHA1	Message	Date
Tim Gross	8ce201854a	client: recreate script checks on Update (#6265 ) Splitting the immutable and mutable components of the scriptCheck led to a bug where the environment interpolation wasn't being incorporated into the check's ID, which caused the UpdateTTL to update for a check ID that Consul didn't have (because our Consul client creates the ID from the structs.ServiceCheck each time we update). Task group services don't have access to a task environment at creation, so their checks get registered before the check can be interpolated. Use the original check ID so they can be updated.	2019-09-05 11:43:23 -04:00
Michael Schurter	ee06c36345	Merge pull request #6254 from hashicorp/test-connect-e2e-demo e2e: test demo job for connect	2019-09-04 14:33:26 -07:00
Nick Ethier	e440ba80f1	ar: refactor network bridge config to use go-cni lib (#6255 ) * ar: refactor network bridge config to use go-cni lib * ar: use eth as the iface prefix for bridged network namespaces * vendor: update containerd/go-cni package * ar: update network hook to use TODO contexts when calling configurator * unnecessary conversion	2019-09-04 16:33:25 -04:00
Michael Schurter	93b47f4ddc	client: reword error message	2019-09-04 12:40:09 -07:00
Tim Gross	0f29dcc935	support script checks for task group services (#6197 ) In Nomad prior to Consul Connect, all Consul checks work the same except for Script checks. Because the Task being checked is running in its own container namespaces, the check is executed by Nomad in the Task's context. If the Script check passes, Nomad uses the TTL check feature of Consul to update the check status. This means in order to run a Script check, we need to know what Task to execute it in. To support Consul Connect, we need Group Services, and these need to be registered in Consul along with their checks. We could push the Service down into the Task, but this doesn't work if someone wants to associate a service with a task's ports, but do script checks in another task in the allocation. Because Nomad is handling the Script check and not Consul anyways, this moves the script check handling into the task runner so that the task runner can own the script check's configuration and lifecycle. This will allow us to pass the group service check configuration down into a task without associating the service itself with the task. When tasks are checked for script checks, we walk back through their task group to see if there are script checks associated with the task. If so, we'll spin off script check tasklets for them. The group-level service and any restart behaviors it needs are entirely encapsulated within the group service hook.	2019-09-03 15:09:04 -04:00
Michael Schurter	5957030d18	connect: add unix socket to proxy grpc for envoy (#6232 ) * connect: add unix socket to proxy grpc for envoy Fixes #6124 Implement a L4 proxy from a unix socket inside a network namespace to Consul's gRPC endpoint on the host. This allows Envoy to connect to Consul's xDS configuration API. * connect: pointer receiver on structs with mutexes * connect: warn on all proxy errors	2019-09-03 08:43:38 -07:00
Jasmine Dahilig	4edebe389a	add default update stanza and max_parallel=0 disables deployments (#6191 )	2019-09-02 10:30:09 -07:00
Evan Ercolano	fcf66918d0	Remove unused canary param from MakeTaskServiceID	2019-08-31 16:53:23 -04:00
Nick Ethier	cf014c7fd5	ar: ensure network forwarding is allowed for bridged allocs (#6196 ) * ar: ensure network forwarding is allowed in iptables for bridged allocs * ensure filter rule exists at setup time	2019-08-28 10:51:34 -04:00
Mahmood Ali	cc460d4804	Write to client store while holding lock Protect against a race where destroying and persist state goroutines race. The downside is that the database io operation will run while holding the lock and may run indefinitely. The risk of lock being long held is slow destruction, but slow io has bigger problems.	2019-08-26 13:45:58 -04:00
Mahmood Ali	c132623ffc	Don't persist allocs of destroyed alloc runners This fixes a bug where allocs that have been GCed get re-run again after client is restarted. A heavily-used client may launch thousands of allocs on startup and get killed. The bug is that an alloc runner that gets destroyed due to GC remains in client alloc runner set. Periodically, they get persisted until alloc is gced by server. During that time, the client db will contain the alloc but not its individual tasks status nor completed state. On client restart, client assumes that alloc is pending state and re-runs it. Here, we fix it by ensuring that destroyed alloc runners don't persist any alloc to the state DB. This is a short-term fix, as we should consider revamping client state management. Storing alloc and task information in non-transaction non-atomic concurrently while alloc runner is running and potentially changing state is a recipe for bugs. Fixes https://github.com/hashicorp/nomad/issues/5984 Related to https://github.com/hashicorp/nomad/pull/5890	2019-08-25 11:21:28 -04:00
Lang Martin	4f6493a301	taskrunner getter set Umask for go-getter, setuid test	2019-08-23 15:59:03 -04:00
Jerome Gravel-Niquet	cbdc1978bf	Consul service meta (#6193 ) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Nick Ethier	96d379071d	ar: fix bridge networking port mapping when port.To is unset (#6190 )	2019-08-22 21:53:52 -04:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, boostrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Michael Schurter	b008fd1724	connect: register group services with Consul Fixes #6042 Add new task group service hook for registering group services like Connect-enabled services. Does not yet support checks.	2019-08-20 12:25:10 -07:00
Tim Gross	03433f35d4	client/template: configuration for function blacklist and sandboxing When rendering a task template, the `plugin` function is no longer permitted by default and will raise an error. An operator can opt-in to permitting this function with the new `template.function_blacklist` field in the client configuration. When rendering a task template, path parameters for the `file` function will be treated as relative to the task directory by default. Relative paths or symlinks that point outside the task directory will raise an error. An operator can opt-out of this protection with the new `template.disable_file_sandbox` field in the client configuration.	2019-08-12 16:34:48 -04:00
Danielle Lancashire	861caa9564	HostVolumeConfig: Source -> Path	2019-08-12 15:39:08 +02:00
Danielle Lancashire	e132a30899	structs: Unify Volume and VolumeRequest	2019-08-12 15:39:08 +02:00
Danielle Lancashire	6ef8d5233e	client: Add volume_hook for mounting volumes	2019-08-12 15:39:08 +02:00
Mahmood Ali	b17bac5101	Render consul templates using task env only (#6055 ) When rendering a task consul template, ensure that only task environment variables are used. Currently, `consul-template` always falls back to host process environment variables when key isn't a task env var[1]. Thus, we add an empty entry for each host process env-var not found in task env-vars. [1] `bfa5d0e133/template/funcs.go (L61-L75)`	2019-08-05 16:30:47 -04:00
Mahmood Ali	f66169cd6a	Merge pull request #6065 from hashicorp/b-nil-driver-exec Check if driver handle is nil before execing	2019-08-02 09:48:28 -05:00
Mahmood Ali	a4670db9b7	Check if driver handle is nil before execing Defend against tr.getDriverHandle being nil. Exec handler checks if task is running, but it may be stopped between check and driver handler fetching.	2019-08-02 10:07:41 +08:00
Nick Ethier	321d10a041	client: remove debugging lines	2019-07-31 01:04:09 -04:00
Nick Ethier	af6b191963	client: add autofetch for CNI plugins	2019-07-31 01:04:09 -04:00
Nick Ethier	1e9dd1b193	remove unused file	2019-07-31 01:04:09 -04:00
Nick Ethier	09a4cfd8d7	fix failing tests	2019-07-31 01:04:07 -04:00
Nick Ethier	ef83f0831b	ar: plumb client config for networking into the network hook	2019-07-31 01:04:06 -04:00
Nick Ethier	af66a35924	networking: Add new bridge networking mode implementation	2019-07-31 01:04:06 -04:00
Nick Ethier	63c5504d56	ar: fix lint errors	2019-07-31 01:03:19 -04:00
Nick Ethier	e312201d18	ar: rearrange network hook to support building on windows	2019-07-31 01:03:19 -04:00
Nick Ethier	370533c9c7	ar: fix test that failed due to error renaming	2019-07-31 01:03:19 -04:00
Nick Ethier	2d60ef64d9	plugins/driver: make DriverNetworkManager interface optional	2019-07-31 01:03:19 -04:00
Nick Ethier	f87e7e9c9a	ar: plumb error handling into alloc runner hook initialization	2019-07-31 01:03:18 -04:00
Nick Ethier	ef1795b344	ar: add tests for network hook	2019-07-31 01:03:18 -04:00
Nick Ethier	15989bba8e	ar: cleanup lint errors	2019-07-31 01:03:18 -04:00
Nick Ethier	220cba3e7e	ar: move linux specific code to it's own file and add tests	2019-07-31 01:03:18 -04:00
Nick Ethier	548f78ef15	ar: initial driver based network management	2019-07-31 01:03:17 -04:00
Nick Ethier	66c514a388	Add network lifecycle management Adds a new Prerun and Postrun hooks to manage set up of network namespaces on linux. Work still needs to be done to make the code platform agnostic and support Docker style network initalization.	2019-07-31 01:03:17 -04:00
Jasmine Dahilig	2157f6ddf1	add formatting for hcl parsing error messages (#5972 )	2019-07-19 10:04:39 -07:00
Mahmood Ali	cd6f1d3102	Update consul-template dependency to latest To pick up the fix in https://github.com/hashicorp/consul-template/pull/1231 .	2019-07-18 07:32:03 +07:00
Mahmood Ali	8a82260319	log unrecoverable errors	2019-07-17 11:01:59 +07:00
Mahmood Ali	1a299c7b28	client/taskrunner: fix stats stats retry logic Previously, if a channel is closed, we retry the Stats call. But, if that call fails, we go in a backoff loop without calling Stats ever again. Here, we use a utility function for calling driverHandle.Stats call that retries as one expects. I aimed to preserve the logging formats but made small improvements as I saw fit.	2019-07-11 13:58:07 +08:00
Preetha Appan	ef9a71c68b	code review feedback	2019-07-10 10:41:06 -05:00
Preetha Appan	990e468edc	Populate task event struct with kill timeout This makes for a nicer task event message	2019-07-09 09:37:09 -05:00
Mahmood Ali	f10201c102	run post-run/post-stop task runner hooks Handle when prestart failed while restoring a task, to prevent accidentally leaking consul/logmon processes.	2019-07-02 18:38:32 +08:00
Mahmood Ali	4afd7835e3	Fail alloc if alloc runner prestart hooks fail When an alloc runner prestart hook fails, the task runners aren't invoked and they remain in a pending state. This leads to terrible results, some of which are: * Lockup in GC process as reported in https://github.com/hashicorp/nomad/pull/5861 * Lockup in shutdown process as TR.Shutdown() waits for WaitCh to be closed * Alloc not being restarted/rescheduled to another node (as it's still in pending state) * Unexpected restart of alloc on a client restart, potentially days/weeks after alloc expected start time! Here, we treat all tasks to have failed if alloc runner prestart hook fails. This fixes the lockups, and permits the alloc to be rescheduled on another node. While it's desirable to retry alloc runner in such failures, I opted to treat it out of scope. I'm afraid of some subtles about alloc and task runners and their idempotency that's better handled in a follow up PR. This might be one of the root causes for https://github.com/hashicorp/nomad/issues/5840 .	2019-07-02 18:35:47 +08:00
Mahmood Ali	7614b8f09e	Merge pull request #5890 from hashicorp/b-dont-start-completed-allocs-2 task runner to avoid running task if terminal	2019-07-02 15:31:17 +08:00
Mahmood Ali	7bfad051b9	address review comments	2019-07-02 14:53:50 +08:00
Mahmood Ali	3d89ae0f1e	task runner to avoid running task if terminal This change fixes a bug where nomad would avoid running alloc tasks if the alloc is client terminal but the server copy on the client isn't marked as running. Here, we fix the case by having task runner uses the allocRunner.shouldRun() instead of only checking the server updated alloc. Here, we preserve much of the invariants such that `tr.Run()` is always run, and don't change the overall alloc runner and task runner lifecycles. Fixes https://github.com/hashicorp/nomad/issues/5883	2019-06-27 11:27:34 +08:00

1 2 3 4 5 ...

319 commits