open-nomad

Author	SHA1	Message	Date
Jasmine Dahilig	ed9740db10	Merge pull request #5664 from hashicorp/f-http-hcl-region backfill region from hcl for jobUpdate and jobPlan	2019-06-13 12:25:01 -07:00
Jasmine Dahilig	51e141be7a	backfill region from job hcl in jobUpdate and jobPlan endpoints - updated region in job metadata that gets persisted to nomad datastore - fixed many unrelated unit tests that used an invalid region value (they previously passed because hcl wasn't getting picked up and the job would default to global region)	2019-06-13 08:03:16 -07:00
Mahmood Ali	e31159bf1f	Prepare for 0.9.4 dev cycle	2019-06-12 18:47:50 +00:00
Nomad Release bot	4803215109	Generate files for 0.9.3 release	2019-06-12 16:11:16 +00:00
Danielle	f923b568e0	Merge pull request #5821 from hashicorp/dani/b-5770 trhooks: Add TaskStopHook interface to services	2019-06-12 17:30:49 +02:00
Danielle Lancashire	c326344b57	trt: Fix test	2019-06-12 17:06:11 +02:00
Danielle Lancashire	13d76e35fd	trhooks: Add TaskStopHook interface to services We currently only run cleanup Service Hooks when a task is either Killed or Exited. However, due to the implementation of a task runner, tasks are only Exited if they ever correctly started running, which is not true when you receive an error early in the task start flow, such as not being able to pull secrets from Vault. This updates the service hook to also call Consul deregistration routines during a task Stop lifecycle event, to ensure that any registered checks and services are cleared in such cases. fixes #5770	2019-06-12 16:00:21 +02:00
Mahmood Ali	2acf30fdd3	Fallback to `alloc.TaskResources` for old allocs When a client is running against an old server (e.g. running 0.8), `alloc.AllocatedResources` may be nil, and we need to check the deprecated `alloc.TaskResources` instead. Fixes https://github.com/hashicorp/nomad/issues/5810	2019-06-11 10:32:53 -04:00
Mahmood Ali	7a4900aaa4	client/allocrunner: depend on internal task state Alloc runner already tracks tasks associated with alloc. Here, we become defensive by relying on the alloc runner tracked tasks, rather than depend on server never updating the job unexpectedly.	2019-06-10 18:42:51 -04:00
Mahmood Ali	d30c3d10b0	Merge pull request #5747 from hashicorp/b-test-fixes-20190521-1 More test fixes	2019-06-05 19:09:18 -04:00
Mahmood Ali	935ee86e92	Merge pull request #5737 from fwkz/fix-restart-attempts Fix restart attempts of `restart` stanza in `delay` mode.	2019-06-05 19:05:07 -04:00
Mahmood Ali	97957fbf75	Prepare for 0.9.3 dev cycle	2019-06-05 14:54:00 +00:00
Nomad Release bot	43bfbf3fcc	Generate files for 0.9.2 release	2019-06-05 11:59:27 +00:00
Mahmood Ali	a9f81f2daa	client config flag to disable remote exec This exposes a client flag to disable nomad remote exec support in environments where access to tasks ought to be restricted. I used a `disable_remote_exec` client flag that defaults to allowing remote exec. Opted for a client config that can be used to disable remote exec globally, or for a subset of the cluster if necessary.	2019-06-03 15:31:39 -04:00
Mahmood Ali	a4ead8ff79	remove 0.9.2-rc1 generated code	2019-05-23 11:14:24 -04:00
Nomad Release bot	6d6bc59732	Generate files for 0.9.2-rc1 release	2019-05-22 19:29:30 +00:00
Michael Schurter	a54511b304	Merge pull request #5731 from hashicorp/b-ignore-dc client: drop unused DC field from servers list	2019-05-22 08:42:15 -07:00
Mahmood Ali	84419f08ce	client: synchronize client.invalidAllocs access invalidAllocs may be accessed and manipulated from different goroutines, so must be locked.	2019-05-22 09:37:49 -04:00
Danielle Lancashire	27583ed8c1	client: Pass servers contacted ch to allocrunner This fixes an issue where batch and service workloads would never be restarted due to indefinitely blocking on a nil channel. It also raises the restoration logging message to `Info` to simplify log analysis.	2019-05-22 13:47:35 +02:00
Mahmood Ali	9df1e00f35	tests: fix data race in client/allocrunner/taskrunner/template TestTaskTemplateManager_Rerender_Signal Given that Signal may be called multiple times, blocking on `SignalCh` isn't sufficient to synchronize access to the Signals field.	2019-05-21 13:56:58 -04:00
Mahmood Ali	b06e585713	Merge pull request #5739 from hashicorp/r-rm-logmon-syslog-deadcode logmon: remove syslog server deadcode	2019-05-21 11:46:48 -04:00
Mahmood Ali	eca23bf9c4	Merge pull request #5742 from hashicorp/b-test-fixes-20190520 Grab bag of (primarily race) test fixes	2019-05-21 11:46:36 -04:00
Mahmood Ali	e88bb61488	Merge pull request #5740 from hashicorp/b-nomad-exec-term-race exec: allow drivers to handle stream termination	2019-05-21 11:24:12 -04:00
Mahmood Ali	b475ccbe3e	client: synchronize access to ar.alloc `allocRunner.alloc` is protected by `allocRunner.allocLock`, so let's use `allocRunner.Alloc()` helper function to access it.	2019-05-21 09:55:05 -04:00
Mahmood Ali	2a7b073167	tests: fix fifo lib race Accidentally accessed outer `err` variable inside a goroutine	2019-05-21 09:49:56 -04:00
Mahmood Ali	296bd41c9e	tests: fix data race in client TestDriverManager_Fingerprint_Periodic	2019-05-21 09:49:56 -04:00
Mahmood Ali	d9e59eece0	tests: fix client TestFS_Stream data race Close is invoked in a different goroutine from test	2019-05-21 09:49:56 -04:00
Mahmood Ali	75e0a3f405	exec: allow drivers to handle stream termination Without this change, alloc_endpoint cancels the context passed to the handler when we detect EOF. This races with the driver setting the exit code, and we run into a case where the exec process terminates cleanly yet we attempt to mark it as failed with a context error. Here, we rely on the driver to handle errors returned from Stream, without racing to set an error.	2019-05-21 09:40:25 -04:00
Mahmood Ali	974bcbecc9	logmon: remove syslog server deadcode Remove unused syslog server related code that got replaced by the docker logger in Nomad 0.9	2019-05-21 09:36:43 -04:00
fwkz	8b84bec95a	Fix restart attempts of `restart` stanza. Number of restarts during 2nd interval is off by one.	2019-05-21 13:27:19 +02:00
Michael Schurter	d41abda957	client: drop unused DC field from servers list See #5730 for details.	2019-05-20 14:19:15 -07:00
Michael Schurter	2fe0768f3b	docs: changelog entry for #5669 and fix comment	2019-05-14 10:54:00 -07:00
Michael Schurter	af9096c8ba	client: register before restoring Registration and restoring allocs don't share state or depend on each other in any way (syncing allocs with servers is done outside of registration). Since restoring is synchronous, start the registration goroutine first. For nodes with lots of allocs to restore or close to their heartbeat deadline, this could be the difference between becoming "lost" or not.	2019-05-14 10:53:27 -07:00
Michael Schurter	e07f73bfe0	client: do not restart dead tasks until server is contacted (try 2) Refactoring of 104067bc2b2002a4e45ae7b667a476b89addc162 Switch the MarkLive method for a chan that is closed by the client. Thanks to @notnoop for the idea! The old approach called a method on most existing ARs and TRs on every runAllocs call. The new approach does a once.Do call in runAllocs to accomplish the same thing with less work. Able to remove the gate abstraction that did much more than was needed.	2019-05-14 10:53:27 -07:00
Michael Schurter	d7e5ace1ed	client: do not restart dead tasks until server is contacted Fixes #1795 Running restored allocations and pulling what allocations to run from the server happen concurrently. This means that if a client is rebooted, and has its allocations rescheduled, it may restart the dead allocations before it contacts the server and determines they should be dead. This commit makes tasks that fail to reattach on restore wait until the server is contacted before restarting.	2019-05-14 10:53:27 -07:00
Michael Schurter	3b1f8991a1	client: log when server list changes Stop logging in the happy path when nothing has changed.	2019-05-13 15:42:55 -07:00
Michael Schurter	48db8135da	Merge pull request #5492 from hashicorp/f-allocated-mem client: expose allocated memory per task	2019-05-13 13:31:22 -07:00
Lang Martin	1d03a43ce2	Merge pull request #5642 from hashicorp/b-network-fingerprinting-ipv4 network fingerprinting multiple IPs on the configured network device	2019-05-13 11:46:53 -04:00
Michael Schurter	1c4e585fa7	client: expose allocated memory per task Related to #4280 This PR adds `client.allocs.<job>.<group>.<alloc>.<task>.memory.allocated` as a gauge in bytes to metrics to ease calculating how close a task is to OOMing. ``` 'nomad.client.allocs.memory.allocated.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 268435456.000 'nomad.client.allocs.memory.cache.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 5677056.000 'nomad.client.allocs.memory.kernel_max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000 'nomad.client.allocs.memory.kernel_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000 'nomad.client.allocs.memory.max_usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8908800.000 'nomad.client.allocs.memory.rss.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 876544.000 'nomad.client.allocs.memory.swap.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 0.000 'nomad.client.allocs.memory.usage.example.cache.6d98cbaf-d6bc-2a84-c63f-bfff8905a9d8.redis.rusty': 8208384.000 ```	2019-05-10 11:12:12 -07:00
Lang Martin	f6bc45dd23	client improve a comment in updateNetworks	2019-05-10 11:25:04 -04:00
Mahmood Ali	919827f2df	Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base nomad exec part 1: plumbing and docker driver	2019-05-09 18:09:27 -04:00
Mahmood Ali	ab2cae0625	implement client endpoint of nomad exec Add a client streaming RPC endpoint for processing nomad exec tasks, by invoking the relevant task handler for execution.	2019-05-09 16:49:08 -04:00
Preetha	1d02886bb6	Merge pull request #5654 from hashicorp/b-hearbeat-lockfix Remove unnecessary locking and serverlist syncing in heartbeats	2019-05-08 13:36:39 -05:00
Preetha Appan	3289e7f4a0	fix typo and add one more test scenario	2019-05-08 10:54:22 -05:00
Preetha Appan	db6b291a5a	code review feedback	2019-05-07 16:23:32 -05:00
Chris Baker	93ec1293be	stale allocation data leads to incorrect (and even negative) metrics (#5637) * client: was not using up-to-date client state in determining which allocs count towards allocated resources * Update client/client.go Co-Authored-By: cgbaker <cgbaker@hashicorp.com>	2019-05-07 15:54:36 -04:00
Preetha Appan	b063fc81a4	Remove unnecessary locking and serverlist syncing in heartbeats This removes an unnecessary shared lock between discovery and heartbeating which was causing heartbeats to be missed upon retries when a single server fails. Also made a drive-by fix to call the periodic server shuffler goroutine.	2019-05-06 14:44:55 -05:00
Michael Schurter	8c7b3ff45a	Fix comment Co-Authored-By: preetapan <preetha@hashicorp.com>	2019-05-03 10:01:30 -05:00
Michael Schurter	e19fa33f9c	Remove unnecessary boolean clause Co-Authored-By: preetapan <preetha@hashicorp.com>	2019-05-03 10:00:17 -05:00
Preetha Appan	b99a204582	Update deployment health on failed allocations only if health is unset This fixes a confusing UX where a previously successful deployment's healthy/unhealthy count would get updated if any allocations failed after the deployment was already marked as successful.	2019-05-02 22:59:56 -05:00


3759 commits