open-nomad

Author	SHA1	Message	Date
Seth Hoenig	03a4af9563	client: PR cleanup - shadow context variable	2020-01-31 19:05:19 -06:00
Seth Hoenig	587a5d4a8d	nomad: make TaskGroup.UsesConnect helper a public helper	2020-01-31 19:05:11 -06:00
Seth Hoenig	057f117592	client: manage TR kill from parent on SI token derivation failure Re-orient the management of the tr.kill to happen in the parent of the spawned goroutine that is doing the actual token derivation. This makes the code a little more straightforward, making it easier to reason about not leaking the worker goroutine.	2020-01-31 19:05:02 -06:00
Seth Hoenig	c8761a3f11	client: set context timeout around SI token derivation The derivation of an SI token needs to be safegaurded by a context timeout, otherwise an unresponsive Consul could cause the siHook to block forever on Prestart.	2020-01-31 19:04:56 -06:00
Seth Hoenig	4ee55fcd6c	nomad,client: apply more comment/style PR tweaks	2020-01-31 19:04:52 -06:00
Seth Hoenig	be7c671919	nomad,client: apply smaller PR suggestions Apply smaller suggestions like doc strings, variable names, etc. Co-Authored-By: Nick Ethier <nethier@hashicorp.com> Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2020-01-31 19:04:40 -06:00
Seth Hoenig	78a7d1e426	comments: cleanup some leftover debug comments and such	2020-01-31 19:04:35 -06:00
Seth Hoenig	5c5da95f34	client: skip task SI token file load failure if testing as root The TestEnvoyBootstrapHook_maybeLoadSIToken test case only works when running as a non-priveleged user, since it deliberately tries to read an un-readable file to simulate a failure loading the SI token file.	2020-01-31 19:04:30 -06:00
Seth Hoenig	ab7ae8bbb4	client: remove unused indirection for referencing consul executable Was thinking about using the testing pattern where you create executable shell scripts as test resources which "mock" the process a bit of code is meant to fork+exec. Turns out that wasn't really necessary in this case.	2020-01-31 19:04:25 -06:00
Seth Hoenig	2c7ac9a80d	nomad: fixup token policy validation	2020-01-31 19:04:08 -06:00
Seth Hoenig	d204f2f4f0	client: enable envoy bootstrap hook to set SI token When creating the envoy bootstrap configuration, we should append the "-token=<token>" argument in the case where the sidsHook placed the token in the secrets directory.	2020-01-31 19:04:01 -06:00
Seth Hoenig	9df33f622f	nomad: proxy requests for Service Identity tokens between Clients and Consul Nomad jobs may be configured with a TaskGroup which contains a Service definition that is Consul Connect enabled. These service definitions end up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In the case where Consul ACLs are enabled, a Service Identity token is required for these tasks to run & connect, etc. This changeset enables the Nomad Server to recieve RPC requests for the derivation of SI tokens on behalf of instances of Consul Connect using Tasks. Those tokens are then relayed back to the requesting Client, which then injects the tokens in the secrets directory of the Task.	2020-01-31 19:03:53 -06:00
Seth Hoenig	93cf770edb	client: enable nomad client to request and set SI tokens for tasks When a job is configured with Consul Connect aware tasks (i.e. sidecar), the Nomad Client should be able to request from Consul (through Nomad Server) Service Identity tokens specific to those tasks.	2020-01-31 19:03:38 -06:00
Mahmood Ali	9611324654	Merge pull request #6922 from hashicorp/b-alloc-canoncalize Handle Upgrades and Alloc.TaskResources modification	2020-01-28 15:12:41 -05:00
Mahmood Ali	b1b714691c	address review comments	2020-01-15 08:57:05 -05:00
Nick Ethier	1f28633954	Merge pull request #6816 from hashicorp/b-multiple-envoy connect: configure envoy to support multiple sidecars in the same alloc	2020-01-09 23:25:39 -05:00
Mahmood Ali	4e5d867644	client: stop using alloc.TaskResources Now that alloc.Canonicalize() is called in all alloc sources in the client (i.e. on state restore and RPC fetching), we no longer need to check alloc.TaskResources. alloc.AllocatedResources is always non-nil through alloc runner. Though, early on, we check for alloc validity, so NewTaskRunner and TaskEnv must still check. `TestClient_AddAllocError` test validates that behavior.	2020-01-09 09:25:07 -05:00
Tim Gross	fa4da93578	interpolate environment for services in script checks (#6916 ) In 0.10.2 (specifically 387b016) we added interpolation to group service blocks and centralized the logic for task environment interpolation. This wasn't also added to script checks, which caused a regression where the IDs for script checks for services w/ interpolated fields (ex. the service name) didn't match the service ID that was registered with Consul. This changeset calls the same taskenv interpolation logic during `script_check` configuration, and adds tests to reduce the risk of future regressions by comparing the IDs of service hook and the check hook.	2020-01-09 08:12:54 -05:00
Nick Ethier	9c3cc63cd1	tr: initialize envoybootstrap prestart hook response.Env field	2020-01-08 13:41:38 -05:00
Nick Ethier	105cbf6df9	tr: expose envoy sidecar admin port as environment variable	2020-01-06 21:53:45 -05:00
Nick Ethier	677e9cdc16	connect: configure envoy such that multiple sidecars can run in the same alloc	2020-01-06 11:26:27 -05:00
Mahmood Ali	4a1cc67f58	Merge pull request #6820 from hashicorp/f-skip-docker-logging-knob driver: allow disabling log collection	2019-12-13 11:41:20 -05:00
Mahmood Ali	46bc3b57e6	address review comments	2019-12-13 11:21:00 -05:00
Chris Dickson	4d8ba272d1	client: expose allocated CPU per task (#6784 )	2019-12-09 15:40:22 -05:00
Mahmood Ali	0b7085ba3a	driver: allow disabling log collection Operators commonly have docker logs aggregated using various tools and don't need nomad to manage their docker logs. Worse, Nomad uses a somewhat heavy docker api call to collect them and it seems to cause problems when a client runs hundreds of log collections. Here we add a knob to disable log aggregation completely for nomad. When log collection is disabled, we avoid running logmon and docker_logger for the docker tasks in this implementation. The downside here is once disabled, `nomad logs ...` commands and API no longer return logs and operators must corrolate alloc-ids with their aggregated log info. This is meant as a stop gap measure. Ideally, we'd follow up with at least two changes: First, we should optimize behavior when we can such that operators don't need to disable docker log collection. Potentially by reverting to using pre-0.9 syslog aggregation in linux environments, though with different trade-offs. Second, when/if logs are disabled, nomad logs endpoints should lookup docker logs api on demand. This ensures that the cost of log collection is paid sparingly.	2019-12-08 14:15:03 -05:00
Preetha	be4a51d5b8	Merge pull request #6349 from hashicorp/b-host-stats client: Return empty values when host stats fail	2019-11-20 10:13:02 -06:00
Lang Martin	aa985ebe21	getter: allow the gcs download scheme (#6692 )	2019-11-19 09:10:56 -05:00
Nick Ethier	bd454a4c6f	client: improve group service stanza interpolation and check_re… (#6586 ) * client: improve group service stanza interpolation and check_restart support Interpolation can now be done on group service stanzas. Note that some task runtime specific information that was previously available when the service was registered poststart of a task is no longer available. The check_restart stanza for checks defined on group services will now properly restart the allocation upon check failures if configured.	2019-11-18 13:04:01 -05:00
Michael Schurter	9fed8d1bed	client: fix panic from 0.8 -> 0.10 upgrade makeAllocTaskServices did not do a nil check on AllocatedResources which causes a panic when upgrading directly from 0.8 to 0.10. While skipping 0.9 is not supported we intend to fix serious crashers caused by such upgrades to prevent cluster outages. I did a quick audit of the client package and everywhere else that accesses AllocatedResources appears to be properly guarded by a nil check.	2019-11-01 07:47:03 -07:00
Michael Schurter	f54f1cb321	Revert "Revert "Use joint context to cancel prestart hooks""	2019-10-08 11:34:09 -07:00
Michael Schurter	81a30ae106	Revert "Use joint context to cancel prestart hooks"	2019-10-08 11:27:08 -07:00
Drew Bailey	69eebcd241	simplify logic to check for vault read event defer shutdown to cleanup after failed run Co-Authored-By: Michael Schurter <mschurter@hashicorp.com> update comment to include ctx note for shutdown	2019-09-30 11:02:14 -07:00
Drew Bailey	7565b8a8d9	Use joint context to cancel prestart hooks fixes https://github.com/hashicorp/nomad/issues/6382 The prestart hook for templates blocks while it resolves vault secrets. If the secret is not found it continues to retry. If a task is shutdown during this time, the prestart hook currently does not receive shutdownCtxCancel, causing it to hang. This PR joins the two contexts so either killCtx or shutdownCtx cancel and stop the task.	2019-09-30 10:48:01 -07:00
Danielle Lancashire	4f2343e1c0	client: Return empty values when host stats fail Currently, there is an issue when running on Windows whereby under some circumstances the Windows stats API's will begin to return errors (such as internal timeouts) when a client is under high load, and potentially other forms of resource contention / system states (and other unknown cases). When an error occurs during this collection, we then short circuit further metrics emission from the client until the next interval. This can be problematic if it happens for a sustained number of intervals, as our metrics aggregator will begin to age out older metrics, and we will eventually stop emitting various types of metrics including `nomad.client.unallocated.*` metrics. However, when metrics collection fails on Linux, gopsutil will in many cases (e.g cpu.Times) silently return 0 values, rather than an error. Here, we switch to returning empty metrics in these failures, and logging the error at the source. This brings the behaviour into line with Linux/Unix platforms, and although making aggregation a little sadder on intermittent failures, will result in more desireable overall behaviour of keeping metrics available for further investigation if things look unusual.	2019-09-19 01:22:07 +02:00
Danielle Lancashire	78b61de45f	config: Hoist volume.config.source into volume Currently, using a Volume in a job uses the following configuration: ``` volume "alias-name" { type = "volume-type" read_only = true config { source = "host_volume_name" } } ``` This commit migrates to the following: ``` volume "alias-name" { type = "volume-type" source = "host_volume_name" read_only = true } ``` The original design was based due to being uncertain about the future of storage plugins, and to allow maxium flexibility. However, this causes a few issues, namely: - We frequently need to parse this configuration during submission, scheduling, and mounting - It complicates the configuration from and end users perspective - It complicates the ability to do validation As we understand the problem space of CSI a little more, it has become clear that we won't need the `source` to be in config, as it will be used in the majority of cases: - Host Volumes: Always need a source - Preallocated CSI Volumes: Always needs a source from a volume or claim name - Dynamic Persistent CSI Volumes: Always needs a source to attach the volumes to for managing upgrades and to avoid dangling. - Dynamic Ephemeral CSI Volumes: Less thought out, but `source` will probably point to the plugin name, and a `config` block will allow you to pass meta to the plugin. Or will point to a pre-configured ephemeral config. *If implemented The new design simplifies this by merging the source into the volume stanza to solve the above issues with usability, performance, and error handling.	2019-09-13 04:37:59 +02:00
Tim Gross	3fa4bca4a0	script checks: Update needs to update Alloc as well (#6291 )	2019-09-06 11:18:00 -04:00
Tim Gross	8ce201854a	client: recreate script checks on Update (#6265 ) Splitting the immutable and mutable components of the scriptCheck led to a bug where the environment interpolation wasn't being incorporated into the check's ID, which caused the UpdateTTL to update for a check ID that Consul didn't have (because our Consul client creates the ID from the structs.ServiceCheck each time we update). Task group services don't have access to a task environment at creation, so their checks get registered before the check can be interpolated. Use the original check ID so they can be updated.	2019-09-05 11:43:23 -04:00
Tim Gross	0f29dcc935	support script checks for task group services (#6197 ) In Nomad prior to Consul Connect, all Consul checks work the same except for Script checks. Because the Task being checked is running in its own container namespaces, the check is executed by Nomad in the Task's context. If the Script check passes, Nomad uses the TTL check feature of Consul to update the check status. This means in order to run a Script check, we need to know what Task to execute it in. To support Consul Connect, we need Group Services, and these need to be registered in Consul along with their checks. We could push the Service down into the Task, but this doesn't work if someone wants to associate a service with a task's ports, but do script checks in another task in the allocation. Because Nomad is handling the Script check and not Consul anyways, this moves the script check handling into the task runner so that the task runner can own the script check's configuration and lifecycle. This will allow us to pass the group service check configuration down into a task without associating the service itself with the task. When tasks are checked for script checks, we walk back through their task group to see if there are script checks associated with the task. If so, we'll spin off script check tasklets for them. The group-level service and any restart behaviors it needs are entirely encapsulated within the group service hook.	2019-09-03 15:09:04 -04:00
Michael Schurter	5957030d18	connect: add unix socket to proxy grpc for envoy (#6232 ) * connect: add unix socket to proxy grpc for envoy Fixes #6124 Implement a L4 proxy from a unix socket inside a network namespace to Consul's gRPC endpoint on the host. This allows Envoy to connect to Consul's xDS configuration API. * connect: pointer receiver on structs with mutexes * connect: warn on all proxy errors	2019-09-03 08:43:38 -07:00
Evan Ercolano	fcf66918d0	Remove unused canary param from MakeTaskServiceID	2019-08-31 16:53:23 -04:00
Lang Martin	4f6493a301	taskrunner getter set Umask for go-getter, setuid test	2019-08-23 15:59:03 -04:00
Jerome Gravel-Niquet	cbdc1978bf	Consul service meta (#6193 ) * adds meta object to service in job spec, sends it to consul * adds tests for service meta * fix tests * adds docs * better hashing for service meta, use helper for copying meta when registering service * tried to be DRY, but looks like it would be more work to use the helper function	2019-08-23 12:49:02 -04:00
Michael Schurter	59e0b67c7f	connect: task hook for bootstrapping envoy sidecar Fixes #6041 Unlike all other Consul operations, boostrapping requires Consul be available. This PR tries Consul 3 times with a backoff to account for the group services being asynchronously registered with Consul.	2019-08-22 08:15:32 -07:00
Tim Gross	03433f35d4	client/template: configuration for function blacklist and sandboxing When rendering a task template, the `plugin` function is no longer permitted by default and will raise an error. An operator can opt-in to permitting this function with the new `template.function_blacklist` field in the client configuration. When rendering a task template, path parameters for the `file` function will be treated as relative to the task directory by default. Relative paths or symlinks that point outside the task directory will raise an error. An operator can opt-out of this protection with the new `template.disable_file_sandbox` field in the client configuration.	2019-08-12 16:34:48 -04:00
Danielle Lancashire	861caa9564	HostVolumeConfig: Source -> Path	2019-08-12 15:39:08 +02:00
Danielle Lancashire	e132a30899	structs: Unify Volume and VolumeRequest	2019-08-12 15:39:08 +02:00
Danielle Lancashire	6ef8d5233e	client: Add volume_hook for mounting volumes	2019-08-12 15:39:08 +02:00
Mahmood Ali	b17bac5101	Render consul templates using task env only (#6055 ) When rendering a task consul template, ensure that only task environment variables are used. Currently, `consul-template` always falls back to host process environment variables when key isn't a task env var[1]. Thus, we add an empty entry for each host process env-var not found in task env-vars. [1] `bfa5d0e133/template/funcs.go (L61-L75)`	2019-08-05 16:30:47 -04:00
Mahmood Ali	f66169cd6a	Merge pull request #6065 from hashicorp/b-nil-driver-exec Check if driver handle is nil before execing	2019-08-02 09:48:28 -05:00
Mahmood Ali	a4670db9b7	Check if driver handle is nil before execing Defend against tr.getDriverHandle being nil. Exec handler checks if task is running, but it may be stopped between check and driver handler fetching.	2019-08-02 10:07:41 +08:00

1 2 3 4 5 ...

272 commits