open-nomad

Author	SHA1	Message	Date
Michael Schurter	10c3bad652	client: never embed alloc_dir in chroot Fixes #2522 Skip embedding client.alloc_dir when building chroot. If a user configures a Nomad client agent so that the chroot_env will embed the client.alloc_dir, Nomad will happily infinitely recurse while building the chroot until something horrible happens. The best case scenario is the filesystem's path length limit is hit. The worst case scenario is disk space is exhausted. A bad agent configuration will look something like this: ```hcl data_dir = "/tmp/nomad-badagent" client { enabled = true chroot_env { # Note that the source matches the data_dir "/tmp/nomad-badagent" = "/ohno" # ... } } ``` Note that `/ohno/client` (the state_dir) will still be created but not `/ohno/alloc` (the alloc_dir). While I cannot think of a good reason why someone would want to embed Nomad's client (and possibly server) directories in chroots, there should be no cause for harm. chroots are only built when Nomad runs as root, and Nomad disables running exec jobs as root by default. Therefore even if client state is copied into chroots, it will be inaccessible to tasks. Skipping the `data_dir` and `{client,server}.state_dir` is possible, but this PR attempts to implement the minimum viable solution to reduce risk of unintended side effects or bugs. When running tests as root in a vm without the fix, the following error occurs: ``` === RUN TestAllocDir_SkipAllocDir alloc_dir_test.go:520: Error Trace: alloc_dir_test.go:520 Error: Received unexpected error: Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long Test: TestAllocDir_SkipAllocDir --- FAIL: TestAllocDir_SkipAllocDir (22.76s) ``` Also removed unused Copy methods on AllocDir and TaskDir structs. Thanks to @eveld for not letting me forget about this!	2021-10-18 09:22:01 -07:00
Seth Hoenig	519429a2de	consul: probe consul namespace feature before using namespace api This PR changes Nomad's wrapper around the Consul NamespaceAPI so that it will detect if the Consul Namespaces feature is enabled before making a request to the Namespaces API. Namespaces are not enabled in Consul OSS, and require a suitable license to be used with Consul ENT. Previously Nomad would check for a 404 status code when makeing a request to the Namespaces API to "detect" if Consul OSS was being used. This does not work for Consul ENT with Namespaces disabled, which returns a 500. Now we avoid requesting the namespace API altogether if Consul is detected to be the OSS sku, or if the Namespaces feature is not licensed. Since Consul can be upgraded from OSS to ENT, or a new license applied, we cache the value for 1 minute, refreshing on demand if expired. Fixes https://github.com/hashicorp/nomad-enterprise/issues/575 Note that the ticket originally describes using attributes from https://github.com/hashicorp/nomad/issues/10688. This turns out not to be possible due to a chicken-egg situation between bootstrapping the agent and setting up the consul client. Also fun: the Consul fingerprinter creates its own Consul client, because there is no [currently] no way to pass the agent's client through the fingerprint factory.	2021-06-07 12:19:25 -05:00
Seth Hoenig	f17ba33f61	consul: plubming for specifying consul namespace in job/group This PR adds the common OSS changes for adding support for Consul Namespaces, which is going to be a Nomad Enterprise feature. There is no new functionality provided by this changeset and hopefully no new bugs.	2021-04-05 10:03:19 -06:00
Seth Hoenig	26e77623e5	consul/connect: fixup tests to use new consul sdk	2020-08-24 12:02:41 -05:00
Jasmine Dahilig	7b3f3497ed	mock task hook coordinator in consul integration test	2020-03-21 17:52:55 -04:00
Mahmood Ali	98ad59b1de	update rest of consul packages	2020-02-16 16:25:04 -06:00
Michael Schurter	fc1bb95ef8	Remove old comment; it's been fixed!	2019-01-14 09:56:53 -08:00
Nick Ethier	82175d1328	client/drivermananger: add driver manager The driver manager is modeled after the device manager and is started by the client. It's responsible for handling driver lifecycle and reattachment state, as well as processing the incomming fingerprint and task events from each driver. The mananger exposes a method for registering event handlers for task events that is used by the task runner to update the server when a task has been updated with an event. Since driver fingerprinting has been implemented by the driver manager, it is no longer needed in the fingerprint mananger and has been removed.	2018-12-18 22:55:18 -05:00
Alex Dadgar	4ee603c382	Device hook and devices affect computed node class This PR introduces a device hook that retrieves the device mount information for an allocation. It also updates the computed node class computation to take into account devices. TODO Fix the task runner unit test. The environment variable is being lost even though it is being properly set in the prestart hook.	2018-11-27 17:25:33 -08:00
Michael Schurter	6bdbfb8129	tests: get consul integration tests building	2018-11-05 12:32:05 -08:00
Michael Schurter	d71a1b4547	tests: more fixes due to api changes	2018-10-29 15:25:22 -07:00
Alex Dadgar	7946a14aa8	Fix lints	2018-10-16 16:56:56 -07:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	7739ef51ce	agent + consul	2018-09-13 10:43:40 -07:00
Alex Dadgar	300b1a7a15	Tests only use testlog package logger	2018-06-13 15:40:56 -07:00
Alex Dadgar	90c2108bfb	Fix gc tests + parallel destroy + small test fixes	2018-06-12 10:23:45 -07:00
Preetha Appan	ce6d4a8d7a	Fix tests and move isClient to constructor	2018-06-01 15:59:53 -05:00
Michael Schurter	d3650fb2cd	test: build with mock_driver by default `make release` and `make prerelease` set a `release` tag to disable enabling the `mock_driver`	2018-04-18 14:45:33 -07:00
Michael Schurter	0971114f0c	Replace Consul TLSSkipVerify handling Instead of checking Consul's version on startup to see if it supports TLSSkipVerify, assume that it does and only log in the job service handler if we discover Consul does not support TLSSkipVerify. The old code would break TLSSkipVerify support if Nomad started before Consul (such as on system boot) as TLSSkipVerify would default to false if Consul wasn't running. Since TLSSkipVerify has been supported since Consul 0.7.2, it's safe to relax our handling.	2018-03-14 17:43:06 -07:00
Michael Schurter	714eb0b266	Services should not require a port Fixes #3673	2017-12-19 15:50:23 -08:00
Michael Schurter	b71edf846f	Hash fields used in task service IDs Fixes #3620 Previously we concatenated tags into task service IDs. This could break deregistration of tag names that contained double //s like some Fabio tags. This change breaks service ID backward compatibility so on upgrade all users services and checks will be removed and re-added with new IDs. This change has the side effect of including all service fields in the ID's hash, so we no longer have to track PortLabel and AddressMode changes independently.	2017-12-08 12:03:43 -08:00
Michael Schurter	0aace3d749	Don't set Interval on TTL health checks	2017-10-16 17:35:47 -07:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Alex Dadgar	43dff0a11d	Fix integration test	2017-08-14 10:52:49 -07:00
Alex Dadgar	d86b3977b9	Fix alloc health with checks using interpolation Fixes an issue in which the allocation health watcher was checking for allocations health based on un-interpolated services and checks. Change the interface for retrieving check information from Consul to retrieving all registered services and checks by allocation. In the future this will allow us to output nicer messages. Fixes https://github.com/hashicorp/nomad/issues/2969	2017-08-07 16:27:08 -07:00
Michael Schurter	edbe62a879	Synchronously deregister agent on shutdown Fixes #2891 Previously the agent services and checks were being asynchrously deregistered on shutdown, so it was a race between the sync goroutine deregistering them and Nomad shutting down. This switches to synchronously deregister agent serivces and checks which doesn't really have a downside since the sync goroutines retry behavior doesn't help on shutdown anyway.	2017-07-24 11:40:37 -07:00
Alex Dadgar	553bc91725	Parallel client tests (#2890 ) * alloc_runner * Random tests * parallel task_runner and no exec compatible check * Parallel client * Fail fast and use random ports * Fix docker port mapping * Make concurrent pull less timing dependant * up parallel * Fixes * don't build chroots in parallel on travis * Reduce parallelism on travis with lxc/rkt * make java test app not run forever * drop parallelism a little * use docker ports that are out of the os's ephemeral port range * Limit even more on travis * rkt deadline	2017-07-22 19:04:36 -07:00
Alex Dadgar	4dd5d943c7	remove root requirement on consul integration check	2017-07-21 19:32:41 -07:00
Alex Dadgar	067ed86a47	Client watches for allocation health using task state and Consul checks This PR adds watching of allocation health at the client. The client can watch for health based on the tasks running on time and also based on the consul checks passing.	2017-07-07 12:10:04 -07:00
Alex Dadgar	ba70cc4f01	Merge branch 'master' into f-bolt-db	2017-05-09 11:11:55 -07:00
Michael Schurter	85210eb92f	Update consul/api to support unix socket addrs Fixes #2594	2017-05-08 11:57:04 -07:00
Alex Dadgar	2d54ee2925	Fix tests	2017-05-03 15:14:19 -07:00
Alex Dadgar	9faa98e13b	Fix tests	2017-05-03 12:38:49 -07:00
Michael Schurter	8926743106	Fix consul test build on Windows	2017-04-19 16:14:11 -07:00
Michael Schurter	e997ae44a5	Skip checks with TLSSkipVerify if it's unsupported Fixes #2218	2017-04-19 12:45:34 -07:00
Michael Schurter	5e908dbf38	Use nifty testtask sleep command for xplat compat	2017-04-19 12:42:47 -07:00
Michael Schurter	40b3987e90	Explain cleanup defer in test	2017-04-19 12:42:47 -07:00
Michael Schurter	e204a287ed	Refactor Consul Syncer into new ServiceClient Fixes #2478 #2474 #1995 #2294 The new client only handles agent and task service advertisement. Server discovery is mostly unchanged. The Nomad client agent now handles all Consul operations instead of the executor handling task related operations. When upgrading from an earlier version of Nomad existing executors will be told to deregister from Consul so that the Nomad agent can re-register the task's services and checks. Drivers - other than qemu - now support an Exec method for executing abritrary commands in a task's environment. This is used to implement script checks. Interfaces are used extensively to avoid interacting with Consul in tests that don't assert any Consul related behavior.	2017-04-19 12:42:47 -07:00

38 commits