open-nomad

Commit Graph

Author	SHA1	Message	Date
Chris Baker	91c4e1eabb	Merge pull request #5541 from hashicorp/b/5540-bad-client-alloc-metrics client/metrics: fixed stale metrics	2019-04-22 15:07:30 -04:00
Mahmood Ali	f515b93b5e	Merge pull request #5577 from hashicorp/dani/b-logmon-unrecoverable logging: Attempt to recover logmon failures	2019-04-22 14:40:24 -04:00
Michael Schurter	61f17a1043	tweak logging level for failed log line Co-Authored-By: notnoop <mahmood@notnoop.com>	2019-04-22 14:40:17 -04:00
Chris Baker	0b1a4dd206	client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy	2019-04-22 18:31:45 +00:00
Michael Schurter	26f3bdbf8f	Merge pull request #5583 from ygersie/fingerprint_nilpointer fix nil pointer in fingerprinting AWS env leading to crash	2019-04-19 08:08:59 -07:00
Mahmood Ali	902eed4bf9	clarify cryptic log line	2019-04-19 09:31:43 -04:00
Mahmood Ali	f74d60439f	client: log detected driver health state Noticed that `detected drivers` log line was misleading - when a driver doesn't fingerprint before timeout, their health status is empty string `""` which we would mark as detected. Now, we log all drivers along with their state to ease driver fingerprint debugging.	2019-04-19 09:15:25 -04:00
Mahmood Ali	6bdc9860b7	client: avoid registering node twice right away I noticed that `watchNodeUpdates()` almost immediately after `registerAndHeartbeat()` calls `retryRegisterNode()`, well after 5 seconds. This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register node for new node events, not for initial registration.	2019-04-19 09:12:50 -04:00
Mahmood Ali	f82ea8824f	client: wait for batched driver updated Here we retain 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for node occur at node registration, and the race might mean that a system job may not get placed on the node because of missing drivers. The timeout isn't strictly necessary, but raising it to 1 minute as it's closer to indefinitely blocked than 1 second. We need to keep the value high enough to capture as much drivers/devices, but low enough that doesn't risk blocking too long due to misbehaving plugin. Fixes https://github.com/hashicorp/nomad/issues/5579	2019-04-19 09:00:24 -04:00
Yorick Gersie	95f81f3eeb	fix nil pointer in fingerprinting AWS env leading to crash HTTP Client returns a nil response if an error has occured. We first need to check for an error before being able to check the HTTP response code.	2019-04-19 11:07:13 +02:00
Danielle Lancashire	c31966fc71	loggging: Attempt to recover logmon failures Currently, when logmon fails to reattach, we will retry reattachment to the same pid until the task restart specification is exhausted. Because we cannot clear hook state during error conditions, it is not possible for us to signal to a future restart that it _shouldn't_ attempt to reattach to the plugin. Here we revert to explicitly detecting reattachment seperately from a launch of a new logmon, so we can recover from scenarios where a logmon plugin has failed. This is a net improvement over the current hard failure situation, as it means in the most common case (the pid has gone away), we can recover. Other reattachment failure modes where the plugin may still be running could potentially cause a duplicate process, or a subsequent failure to launch a new plugin. If there was a duplicate process, it could potentially cause duplicate logging. This is better than a production workload outage. If there was a subsequent failure to launch a new plugin, it would fail in the same (retry until restarts are exhausted) as the current failure mode.	2019-04-18 13:41:56 +02:00
Michael Schurter	a85e7b7cc9	vault: fix data races	2019-04-16 11:22:44 -07:00
Michael Schurter	0aeb3dbd86	vault: fix renewal time Renewal time was being calculated as 10s+Intn(lease-10s), so the renewal time could be very rapid or within 1s of the deadline: [10s, lease) This commit fixes the renewal time by calculating it as: (lease/2) +/- 10s For a lease of 60s this means the renewal will occur in [20s, 40s).	2019-04-16 11:22:44 -07:00
Michael Schurter	f7a7acc345	Merge pull request #5518 from hashicorp/f-simplify-kill client: simplify kill logic	2019-04-15 14:11:58 -07:00
Chris Baker	6848591914	vault namespaces: inject VAULT_NAMESPACE alongside VAULT_TOKEN + documentation	2019-04-12 15:06:34 +00:00
Lang Martin	a2a1e7829d	Revert accidental merge of pr #5482 Revert "fingerprint Constraints and Affinities have Equals, as set" This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609. Revert "client tests assert the independent handling of interface and speed" This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40. Revert "structs missed applying a style change from the review" This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2. Revert "client, structs comments" This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2. Revert "client fingerprint updateNetworks preserves the network configuration" This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c. Revert "client_test cleanup comments from review" This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445. Revert "client Networks Equals is set equality" This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573. Revert "struct cleanup indentation in RequestedDevice Equals" This reverts commit f4746411cab328215def6508955b160a53452da3. Revert "struct Equals checks for identity before value checking" This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226. Revert "fix client-test, avoid hardwired platform dependecy on lo0" This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7. Revert "refactor error in client fingerprint to include the offending data" This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5. Revert "add client updateNodeResources to merge but preserve manual config" This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032. Revert "refactor struts.RequestedDevice to have its own Equals" This reverts commit 689782524090e51183474516715aa2f34908b8e6. Revert "refactor structs.Resource.Networks to have its own Equals" This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1. Revert "refactor structs.Resource.Devices to have its own Equals" This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9. Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources" This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5. Revert "add structs.Resources Equals" This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b. Revert "test that fingerprint resources are updated, net not clobbered" This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.	2019-04-11 10:29:40 -04:00
Lang Martin	5d3596eb7e	client tests assert the independent handling of interface and speed	2019-04-11 09:56:22 -04:00
Lang Martin	7258a13c72	client, structs comments	2019-04-11 09:56:22 -04:00
Lang Martin	22d87e4538	client fingerprint updateNetworks preserves the network configuration	2019-04-11 09:56:22 -04:00
Lang Martin	8fe9699e51	client_test cleanup comments from review	2019-04-11 09:56:22 -04:00
Lang Martin	63c993c8ae	fix client-test, avoid hardwired platform dependecy on lo0	2019-04-11 09:56:22 -04:00
Lang Martin	a9db848974	refactor error in client fingerprint to include the offending data	2019-04-11 09:56:22 -04:00
Lang Martin	f211500cea	add client updateNodeResources to merge but preserve manual config	2019-04-11 09:56:22 -04:00
Lang Martin	a4b59130d2	test that fingerprint resources are updated, net not clobbered	2019-04-11 09:56:21 -04:00
Danielle Lancashire	e135876493	allocs: Add nomad alloc restart This adds a `nomad alloc restart` command and api that allows a job operator with the alloc-lifecycle acl to perform an in-place restart of a Nomad allocation, or a given subtask.	2019-04-11 14:25:49 +02:00
Chris Baker	829a972693	vault client test: minor formatting vendor: using upstream circonus-gometrics	2019-04-10 10:34:10 -05:00
Chris Baker	c0a7aee610	vault e2e: pass vault version into setup instead of having to infer it from test name	2019-04-10 10:34:10 -05:00
Chris Baker	f0c184fc29	taskrunner: removed some unecessary config from a test	2019-04-10 10:34:10 -05:00
Chris Baker	a26d4fe1e5	docs: -vault-namespace, VAULT_NAMESPACE, and config agent: added VAULT_NAMESPACE env-based configuration	2019-04-10 10:34:10 -05:00
Chris Baker	170f5239c8	client: gofmt	2019-04-10 10:34:10 -05:00
Chris Baker	a1d7971b2e	taskrunner: pass configured Vault namespace into TaskTemplateConfig	2019-04-10 10:34:10 -05:00
Chris Baker	0eaeef872f	config/docs: added `namespace` to vault config server/client: process `namespace` config, setting on the instantiated vault client	2019-04-10 10:34:10 -05:00
Michael Schurter	45b4827ad7	Bump to 0.9.1-dev	2019-04-09 09:01:48 -07:00
Nomad Release bot	e307734e4a	Generate files for 0.9.0 release	2019-04-09 01:56:00 +00:00
Michael Schurter	f7d4428855	client: simplify kill logic Remove runLaunched tracking as Run is always called for killable TaskRunners. TaskRunners which fail before Run can be called (during NewTaskRunner or Restore) are not killable as they're never added to the client's alloc map.	2019-04-04 15:18:33 -07:00
Michael Schurter	3af602b633	Remove 0.9.0-rc2 generated files	2019-04-03 07:41:09 -07:00
Nomad Release bot	16b4336ccf	Generate files for 0.9.0-rc2 release	2019-04-03 01:54:29 +00:00
Michael Schurter	923cd91850	Merge pull request #5504 from hashicorp/b-exec-path executor/linux: make chroot binary paths absolute	2019-04-02 14:09:50 -07:00
Michael Schurter	1d569a27dc	Revert "executor/linux: add defensive checks to binary path" This reverts commit cb36f4537e63d53b198c2a87d1e03880895631bd.	2019-04-02 11:17:12 -07:00
Michael Schurter	fc5487dbbc	executor/linux: add defensive checks to binary path	2019-04-02 09:40:53 -07:00
Michael Schurter	7d49bc4c71	executor/linux: make chroot binary paths absolute Avoid libcontainer.Process trying to lookup the binary via $PATH as the executor has already found where the binary is located.	2019-04-01 15:45:31 -07:00
Mahmood Ali	81f4f07ed7	rename fifo methods for clarity	2019-04-01 16:52:58 -04:00
Mahmood Ali	e87afe465b	clarify closeDone blocking and field name	2019-04-01 16:10:34 -04:00
Mahmood Ali	9d647713c0	no requires in a test goroutine	2019-04-01 15:38:39 -04:00
Mahmood Ali	2b1f858e1b	log when fifo fails to open	2019-04-01 13:18:03 -04:00
Mahmood Ali	967452a3f0	fifo: Use plain fifo file in Unix This PR switches to using plain fifo files instead of golang structs managed by containerd/fifo library. The library main benefit is management of opening fifo files. In Linux, a reader `open()` request would block until a writer opens the file (and vice-versa). The library uses goroutines so that it's the first IO operation that blocks. This benefit isn't really useful for us: Given that logmon simply streams output in a separate process, blocking of opening or first read is effectively the same. The library additionally makes further complications for managing state and tracking read/write permission that seems overhead for our use, compared to using a file directly. Looking here, I made the following incidental changes: * document that we do handle if fifo files are already created, as we rely on that behavior for logmon restarts * use type system to lock read vs write: currently, fifo library returns `io.ReadWriteCloser` even if fifo is opened for writing only!	2019-04-01 13:18:03 -04:00
Michael Schurter	a4572919cd	Merge pull request #5456 from hashicorp/test-taskenv tests: port pre-0.9 task env tests	2019-03-25 10:41:38 -07:00
Michael Schurter	8efad12538	tests: port pre-0.9 task env tests I chose to make them more of integration tests since there's a lot more plumbing involved. The internal implementation details of how we craft task envs can now change and these tests will still properly assert the task runtime environment is setup properly.	2019-03-25 09:46:53 -07:00
Michael Schurter	9afbc45cff	Bump to dev post-0.9.0-rc1 release	2019-03-22 08:26:30 -07:00
Nomad Release bot	3ab3dd4105	Generate files for 0.9.0-rc1 release	2019-03-21 19:06:13 +00:00

1 2 3 4 5 ...

3693 Commits