open-nomad

Commit Graph

Author	SHA1	Message	Date
Preetha Appan	99eca85206	Scheduler changes to support network at task group level Also includes unit tests for binpacker and preemption. The tests verify that network resources specified at the task group level are properly accounted for	2019-07-31 01:04:08 -04:00
Nick Ethier	7c9520b404	scheduler: fix disk constraints	2019-07-31 01:04:08 -04:00
Nick Ethier	09a4cfd8d7	fix failing tests	2019-07-31 01:04:07 -04:00
Nick Ethier	af66a35924	networking: Add new bridge networking mode implementation	2019-07-31 01:04:06 -04:00
Nick Ethier	15989bba8e	ar: cleanup lint errors	2019-07-31 01:03:18 -04:00
Nick Ethier	66c514a388	Add network lifecycle management Adds a new Prerun and Postrun hooks to manage set up of network namespaces on linux. Work still needs to be done to make the code platform agnostic and support Docker style network initalization.	2019-07-31 01:03:17 -04:00
Lang Martin	8157a7b6f8	system_sched submits failed evals as blocked	2019-07-18 10:32:12 -04:00
Preetha Appan	3484f18984	Fix more tests	2019-06-26 16:30:53 -05:00
Preetha Appan	10e7d6df6d	Remove compat code associated with many previous versions of nomad This removes compat code for namespaces (0.7), Drain(0.8) and other older features from releases older than Nomad 0.7	2019-06-25 19:05:25 -05:00
Mahmood Ali	8d4f914be9	Merge pull request #5790 from hashicorp/b-reschedule-desired-state Mark rescheduled allocs as stopped.	2019-06-13 17:28:59 -04:00
Mahmood Ali	5e6327b6a1	Test behavior no reschedule for service/batch jobs	2019-06-13 16:41:19 -04:00
Mahmood Ali	faf643a375	Don't stop rescheduleLater allocations When an alloc is due to be rescheduleLater, it goes through the reconciler twice: once to be ignored with a follow up evals, and once again when processing the follow up eval where they appear as rescheduleNow. Here, we ignore them in the first run and mark them as stopped in second iteration; rather than stop them twice.	2019-06-13 09:44:41 -04:00
Mahmood Ali	5dc404ecab	Only preempt for network when there is a network When examining preemption for networks, only consider allocs that have networks. Fixes https://github.com/hashicorp/nomad/issues/5793	2019-06-07 18:55:55 -04:00
Mahmood Ali	98575f5788	test: add tests for network devices and preemption	2019-06-07 18:55:02 -04:00
Mahmood Ali	fd8fb8c22b	Stop allocs to be rescheduled Currently, when an alloc fails and is rescheduled, the alloc desired state remains as "run" and the nomad client may not free the resources. Here, we ensure that an alloc is marked as stopped when it's rescheduled. Notice the Desired Status and Description before and after this change: Before: ``` mars-2:nomad notnoop$ nomad alloc status 02aba49e ID = 02aba49e Eval ID = bb9ed1d2 Name = example-reschedule.nodes[0] Node ID = 5853d547 Node Name = mars-2.local Job ID = example-reschedule Job Version = 0 Client Status = failed Client Description = Failed tasks Desired Status = run Desired Description = <none> Created = 10s ago Modified = 5s ago Replacement Alloc ID = d6bf872b Task "payload" is "dead" Task Resources CPU Memory Disk Addresses 0/100 MHz 24 MiB/300 MiB 300 MiB Task Events: Started At = 2019-06-06T21:12:45Z Finished At = 2019-06-06T21:12:50Z Total Restarts = 0 Last Restart = N/A Recent Events: Time Type Description 2019-06-06T17:12:50-04:00 Not Restarting Policy allows no restarts 2019-06-06T17:12:50-04:00 Terminated Exit Code: 1 2019-06-06T17:12:45-04:00 Started Task started by client 2019-06-06T17:12:45-04:00 Task Setup Building Task Directory 2019-06-06T17:12:45-04:00 Received Task received by client ``` After: ``` ID = 5001ccd1 Eval ID = 53507a02 Name = example-reschedule.nodes[0] Node ID = a3b04364 Node Name = mars-2.local Job ID = example-reschedule Job Version = 0 Client Status = failed Client Description = Failed tasks Desired Status = stop Desired Description = alloc was rescheduled because it failed Created = 13s ago Modified = 3s ago Replacement Alloc ID = 7ba7ac20 Task "payload" is "dead" Task Resources CPU Memory Disk Addresses 21/100 MHz 24 MiB/300 MiB 300 MiB Task Events: Started At = 2019-06-06T21:22:50Z Finished At = 2019-06-06T21:22:55Z Total Restarts = 0 Last Restart = N/A Recent Events: Time Type Description 2019-06-06T17:22:55-04:00 Not Restarting Policy allows no restarts 2019-06-06T17:22:55-04:00 Terminated Exit Code: 1 2019-06-06T17:22:50-04:00 Started Task started by client 2019-06-06T17:22:50-04:00 Task Setup Building Task Directory 2019-06-06T17:22:50-04:00 Received Task received by client ```	2019-06-06 17:27:12 -04:00
Mahmood Ali	3eda42d027	tests: Migrated allocs aren't lost Fix `TestServiceSched_NodeDown` for checking that the migrated allocs are actually marked to be stopped. The boolean logic in test made it skip actually checking client status as long as desired status was stop. Here, we mark some jobs for migration while leaving others as running, and we check that lost flag is only set for non-migrated allocs.	2019-06-06 16:05:07 -04:00
Lang Martin	34230577df	describe a pending deployment with auto_promote accurately	2019-05-22 12:32:08 -04:00
Lang Martin	d462639cc9	sched reconcile copy AutoPromote to DeploymentState	2019-05-22 12:32:08 -04:00
Preetha Appan	374eee421f	Fix comment and assert score in test case	2019-05-15 12:35:57 -05:00
Nick Ethier	f0b9f8e37a	fix missing brace	2019-05-15 13:02:04 -04:00
Nick Ethier	0d851b5d11	scheduler: add check to prohibit returning inf during spread boost calculation	2019-05-15 13:00:24 -04:00
Lang Martin	29ea112586	system_sched & test cleanup comments	2019-05-01 12:25:26 -04:00
Lang Martin	c490dacf76	system_sched_test extend the test to check ineligible nodes	2019-05-01 12:25:26 -04:00
Lang Martin	c43bcbd35e	system_sched when a node is filtered, don't mark failure	2019-05-01 12:25:26 -04:00
Lang Martin	aecec5df1b	system_sched_test create partially constrained job	2019-05-01 12:25:26 -04:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	0dd4c109e8	Compat tags	2019-04-23 09:18:01 -07:00
Arshneet Singh	65f5fab131	Add tests for plan normalization	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Preetha Appan	bcb5c8c70d	remove stray new line	2019-04-12 10:32:48 -05:00
Preetha Appan	8ddc076c1d	Refactor scheduler package to enable preemption for batch/service jobs	2019-04-10 20:24:01 -05:00
James Rasell	9470507cf4	Add NodeName to the alloc/job status outputs. Currently when operators need to log onto a machine where an alloc is running they will need to perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name within the information to make operator use easier. Closes #2359 Cloess #1180	2019-04-10 10:34:10 -05:00
Preetha Appan	da1ce9bcea	Fix bug where scoring metadata would be overridden during an inplace upgrade.	2019-03-12 23:36:46 -05:00
Alex Dadgar	41265d4d61	Change types of weights on spread/affinity	2019-01-30 12:20:38 -08:00
Nick Ethier	24cbf42798	scheduler: fix NPE when deployment is nil, but placement is a canary	2019-01-28 20:22:59 -06:00
Alex Dadgar	5198ff05c3	convert driver to device for device constraint/attributes	2019-01-23 10:58:45 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Preetha Appan	3b054d6135	Remove unnecessary usage of alloc.Resource	2019-01-10 16:36:47 -06:00
Mahmood Ali	0dfa93a3c1	appease linter	2019-01-08 10:58:49 -05:00
Alex Dadgar	8a35d7b1dd	Test recovery	2019-01-07 14:49:41 -08:00
Preetha	f406e66ab8	Merge pull request #4881 from hashicorp/f-device-preemption Device preemption	2018-12-11 18:34:19 -06:00
Preetha Appan	977a4a540d	Early continue after meeting needed count Also adds another optimization that filters out un-needed allocations as a final filtering step	2018-12-11 10:12:18 -06:00
Preetha Appan	f60c52c8ba	Score combinations of allocs from multiple devices for preemption	2018-12-07 18:35:47 -06:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Preetha Appan	63681fac0c	use structured logging everywhere consistently	2018-12-03 08:31:41 -06:00
Preetha Appan	766820def3	addresses some code clarity review comments	2018-11-27 11:02:06 -06:00
Mahmood Ali	96ffe044e7	Simplify map count update logic Co-Authored-By: preetapan <preetha@hashicorp.com>	2018-11-27 10:03:11 -06:00

1 2 3 4 5 ...

605 Commits