Commit Graph

1217 Commits

Author SHA1 Message Date
Lang Martin 91e139dcb5 structs NodeDeregisterBatchRequestType must go at the end 2019-07-10 13:56:20 -04:00
Lang Martin 683ab8d1d2 structs add NodeDeregisterBatchRequest 2019-07-10 13:56:19 -04:00
Lang Martin 3fb82e83a5 structs add back NodeDeregisterRequest.NodeID, compatibility 2019-07-10 13:56:19 -04:00
Lang Martin 77cf037bff struct NodeDeregisterRequest has a batch of NodeIDs 2019-07-10 13:56:19 -04:00
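Commit 91e139dcb5 above notes that NodeDeregisterBatchRequestType must go at the end. One likely reason is sketched below, assuming message types are declared with `iota` and persisted as numbers in the Raft log. Appending keeps every existing constant's value stable. Inserting in the middle would renumber later types, so old log entries would decode as the wrong message. All constant names here are hypothetical.

```go
package main

import "fmt"

// messageType is a hypothetical stand-in for a Raft message-type enum whose
// numeric values are written into the log.
type messageType uint8

// Version 1 of the enum, as it might exist in already-written log entries.
const (
	nodeRegisterV1 messageType = iota
	nodeDeregisterV1
	jobRegisterV1
)

// Version 2 appends the new batch type at the end, so earlier values keep
// their numbers and old entries still decode to the same types.
const (
	nodeRegisterV2 messageType = iota
	nodeDeregisterV2
	jobRegisterV2
	nodeDeregisterBatchV2 // new type: appended, never inserted
)

// Counter-example: inserting the new type in the middle shifts every later
// value, so a stored jobRegisterV1 (2) would now decode as the batch type.
const (
	nodeRegisterBad messageType = iota
	nodeDeregisterBad
	nodeDeregisterBatchBad
	jobRegisterBad
)

func main() {
	fmt.Println(jobRegisterV1 == jobRegisterV2)  // true: value preserved
	fmt.Println(jobRegisterV1 == jobRegisterBad) // false: value shifted
	fmt.Println(nodeDeregisterBatchV2, nodeDeregisterBatchBad)
}
```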
Michael Schurter e10fea1d7a nomad: include snapshot index when submitting plans
Plan application should use a state snapshot at or after the Raft index
at which the plan was created; otherwise it risks being rejected based on
stale data.

This commit adds a Plan.SnapshotIndex which is set by workers when
submitting a plan. SnapshotIndex is set to the Raft index of the snapshot
the worker used to generate the plan.

Plan.SnapshotIndex plays a similar role to PlanResult.RefreshIndex.
While RefreshIndex informs workers their StateStore is behind the
leader's, SnapshotIndex is a way to prevent the leader from using a
StateStore behind the worker's.

Plan.SnapshotIndex should be considered the *lower bound* index for
consistently handling plan application.

Plans must also be committed serially, so Plan N+1 should use a state
snapshot containing Plan N. This is guaranteed for plans *after* the
first plan after a leader election.

The Raft barrier on leader election ensures the leader's statestore has
caught up to the log index at which it was elected. This guarantees its
StateStore is at an index > lastPlanIndex.
2019-06-24 12:16:46 -07:00
Mahmood Ali 87173111de
Merge pull request #5746 from hashicorp/b-no-updating-inmem-node
set node.StatusUpdatedAt in raft
2019-06-05 19:05:21 -04:00
Lang Martin d46613ff44 structs check TaskGroup.Update for nil 2019-05-22 12:34:57 -04:00
Lang Martin 10a3fd61b0 comment replace COMPAT 0.7.0 for job.Update with more current info 2019-05-22 12:34:57 -04:00
Lang Martin 67ebcc47dd structs comment todo DeploymentStatus & DeploymentStatusDescription 2019-05-22 12:34:57 -04:00
Lang Martin 21bf9fdf90 structs job warnings for taskgroup with mixed auto_promote settings 2019-05-22 12:34:57 -04:00
Lang Martin d27d6f8ede structs validate requires Canary for AutoPromote 2019-05-22 12:32:08 -04:00
Lang Martin f23f9fd99e describe a pending deployment without auto_promote more explicitly 2019-05-22 12:32:08 -04:00
Lang Martin 34230577df describe a pending deployment with auto_promote accurately 2019-05-22 12:32:08 -04:00
Lang Martin b5fd735960 add update AutoPromote bool 2019-05-22 12:32:08 -04:00
Mahmood Ali 6bdbeed319 set node.StatusUpdatedAt in raft
Fix a case where `node.StatusUpdatedAt` was manipulated directly in
memory.

This ensures that StatusUpdatedAt is set in raft layer, and ensures that
the field is updated when node drain/eligibility is updated too.
2019-05-21 16:13:32 -04:00
Preetha 2dcd4291f8
Merge pull request #5702 from hashicorp/f-filter-by-create-index
Filter deployments by create index
2019-05-15 21:50:41 -05:00
Michael Schurter d7e5ace1ed client: do not restart dead tasks until server is contacted
Fixes #1795

Running restored allocations and pulling what allocations to run from
the server happen concurrently. This means that if a client is rebooted,
and has its allocations rescheduled, it may restart the dead allocations
before it contacts the server and determines they should be dead.

This commit makes tasks that fail to reattach on restore wait until the
server is contacted before restarting.
2019-05-14 10:53:27 -07:00
Preetha Appan 07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Jasmine Dahilig 30d346ca15
Merge pull request #5665 from hashicorp/b-empty-datacenters
add non-empty string validation for datacenters
2019-05-13 10:23:26 -07:00
Mahmood Ali cf1f3625b4 Update ugorji/go to latest
Our testing so far indicates that ugorji/go/codec maintains backward
compatibility with the version we are using now, for purposes of Nomad
serialization.

Using the latest ugorji/go allows us to get back to using the upstream library
and to get the optimization benefits in RPC paths (including code
generation optimizations).

ugorji/go introduced two significant changes:
* time binary format in debb8e2d2e. Setting `h.BasicHandle.TimeNotBuiltin = true` restores the old behavior
* ugorji/go started honoring the `json` tag as well:

v1.1.4 is the latest but has a bug in handling RawString that's fixed in
d09a80c1e0.
2019-05-09 19:35:58 -04:00
Mahmood Ali 9d3f13e9b3 remove Index field from EmitNodeEventsResponse
`Index` is already included as part of `WriteMeta` embedding.

This is a backward compatible change: Clients never read the field; and
Server references to `EmitNodeEventsResponse.Index` would be using the
value in `WriteMeta`, which is consistent with other response structs.
2019-05-08 08:42:26 -04:00
Jasmine Dahilig 016495c368 add non-empty string validation for datacenters 2019-05-03 06:48:02 -07:00
Lang Martin 371014b781
Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config
client fingerprinter doesn't overwrite manual configuration
2019-04-26 12:55:34 -04:00
Danielle Lancashire 3409e0be89 allocs: Add nomad alloc signal command
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.

Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
2019-04-25 12:43:32 +02:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 0dd4c109e8 Compat tags 2019-04-23 09:18:01 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Danielle 198a838b61
Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire 832f607433 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00
Lang Martin 7de6e28ddc structs need to keep assert Equal interface implementation for tests 2019-04-19 15:23:49 -04:00
Lang Martin 977d33970b structs equals use labeled continue for clarity 2019-04-19 15:23:48 -04:00
Lang Martin 7b99488afa struct equals use a working pattern for setwise comparison 2019-04-19 15:23:48 -04:00
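The setwise comparison and labeled-continue commits above (7b99488afa, 977d33970b) can be illustrated with a small sketch. `network` is a hypothetical, trimmed-down type, not Nomad's NetworkResource. The `used` slice, which stops one element of `b` from matching two equal elements of `a`, is our own assumption about how duplicates should be handled.

```go
package main

import "fmt"

// network is a hypothetical, trimmed-down stand-in for a network resource.
type network struct {
	Device string
	CIDR   string
	MBits  int
}

// Equals checks identity first, then nil-ness, then field values.
func (n *network) Equals(o *network) bool {
	if n == o {
		return true
	}
	if n == nil || o == nil {
		return false
	}
	return n.Device == o.Device && n.CIDR == o.CIDR && n.MBits == o.MBits
}

// networksEqual reports whether a and b hold the same elements regardless of
// order. Each element of b may be matched at most once.
func networksEqual(a, b []*network) bool {
	if len(a) != len(b) {
		return false
	}
	used := make([]bool, len(b))
OUTER:
	for _, x := range a {
		for j, y := range b {
			if !used[j] && x.Equals(y) {
				used[j] = true
				continue OUTER // found a match for x; move to the next x
			}
		}
		// No unused match for x in b.
		return false
	}
	return true
}

func main() {
	a := []*network{{Device: "eth0", MBits: 100}, {Device: "eth1", MBits: 10}}
	b := []*network{{Device: "eth1", MBits: 10}, {Device: "eth0", MBits: 100}}
	fmt.Println(networksEqual(a, b)) // true: same elements, different order
}
```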
Lang Martin eba4e29440 client fingerprinter doesn't overwrite manual configuration
Revert "Revert accidental merge of pr #5482"
This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.
2019-04-19 15:23:48 -04:00
Preetha Appan 22109d1e20
Add preemption related fields to AllocationListStub 2019-04-18 10:36:44 -05:00
Lang Martin a2a1e7829d Revert accidental merge of pr #5482
Revert "fingerprint Constraints and Affinities have Equals, as set"
This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609.

Revert "client tests assert the independent handling of interface and speed"
This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40.

Revert "structs missed applying a style change from the review"
This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2.

Revert "client, structs comments"
This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2.

Revert "client fingerprint updateNetworks preserves the network configuration"
This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c.

Revert "client_test cleanup comments from review"
This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445.

Revert "client Networks Equals is set equality"
This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573.

Revert "struct cleanup indentation in RequestedDevice Equals"
This reverts commit f4746411cab328215def6508955b160a53452da3.

Revert "struct Equals checks for identity before value checking"
This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226.

Revert "fix client-test, avoid hardwired platform dependecy on lo0"
This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7.

Revert "refactor error in client fingerprint to include the offending data"
This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5.

Revert "add client updateNodeResources to merge but preserve manual config"
This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032.

Revert "refactor struts.RequestedDevice to have its own Equals"
This reverts commit 689782524090e51183474516715aa2f34908b8e6.

Revert "refactor structs.Resource.Networks to have its own Equals"
This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1.

Revert "refactor structs.Resource.Devices to have its own Equals"
This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9.

Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources"
This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5.

Revert "add structs.Resources Equals"
This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b.

Revert "test that fingerprint resources are updated, net not clobbered"
This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.
2019-04-11 10:29:40 -04:00
Lang Martin 07ff740408 fingerprint Constraints and Affinities have Equals, as set 2019-04-11 09:56:22 -04:00
Lang Martin 8f07698c03 structs missed applying a style change from the review 2019-04-11 09:56:22 -04:00
Lang Martin 7258a13c72 client, structs comments 2019-04-11 09:56:22 -04:00
Lang Martin 1878bf694e client Networks Equals is set equality 2019-04-11 09:56:22 -04:00
Lang Martin e1c91afd19 struct cleanup indentation in RequestedDevice Equals 2019-04-11 09:56:22 -04:00
Lang Martin 0c90efebdc struct Equals checks for identity before value checking 2019-04-11 09:56:22 -04:00
Lang Martin 1a594b53f6 refactor struts.RequestedDevice to have its own Equals 2019-04-11 09:56:21 -04:00
Lang Martin ec1ccdeda0 refactor structs.Resource.Networks to have its own Equals
NodeResource.Networks uses the same function
2019-04-11 09:56:21 -04:00
Lang Martin 06008465c4 refactor structs.Resource.Devices to have its own Equals 2019-04-11 09:56:21 -04:00
Lang Martin 36f3022246 add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources 2019-04-11 09:56:21 -04:00
Lang Martin d4567e9909 add structs.Resources Equals 2019-04-11 09:56:21 -04:00
Danielle Lancashire e135876493 allocs: Add nomad alloc restart
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
2019-04-11 14:25:49 +02:00
Chris Baker 0ba1600545
server/job_endpoint: accept vault token and pass as part of Job.RegisterRequest [#4555] 2019-04-10 10:34:10 -05:00
James Rasell 9470507cf4
Add NodeName to the alloc/job status outputs.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.

This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.

Closes #2359
Closes #1180
2019-04-10 10:34:10 -05:00
Charlie Voiselle 604c49beb8
Merge pull request #5344 from hashicorp/b-nexteval-for-failed-follow-up
Set NextEval when making `failed-follow-up` evals
2019-02-22 14:14:41 -08:00
Charlie Voiselle 006afdca9b Added comments
* caller should create eval ID
* prev/next eval used in failed-follow-up
2019-02-22 10:22:52 -08:00
Michael Schurter 6580ed668e client: don't redownload completed artifacts on retries
Track the download status of each artifact independently so that if only
one of many artifacts fails to download, completed artifacts aren't
downloaded again.
2019-02-20 08:45:12 -08:00
Alex Dadgar 41265d4d61 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Preetha ec92bf673c
Merge pull request #5223 from hashicorp/f-jobs-list-datacenters
Add Datacenters to the JobListStub struct
2019-01-24 08:13:30 -06:00
Preetha Appan 38422642cb
Use DesiredState to determine whether to stop sending task events 2019-01-22 16:43:32 -06:00
Michael Lange ce7bc4f56f Add Datacenters to the JobsListStub struct
So it can be used for filtering the full list of jobs
2019-01-22 11:16:35 -08:00
Mahmood Ali 7bdd43f3e0 api: avoid codegen for syncing
The values will rarely change, especially since any change would be
backward incompatible. As such, it's simpler to sync them manually on
those rare occasions and avoid the overhead of the syncing code.
2019-01-18 18:52:31 -05:00
Mahmood Ali 253532ec00 api: avoid import nomad/structs pkg
nomad/structs is an internal package and imports many libraries (e.g.
raft, codec) that are not relevant to api clients, and may cause
unnecessary dependency pain (e.g. `github.com/ugorji/go/codec`
version is very old now).

Here, we add a code generator that imports the relevant constants from
`nomad/structs`.

I considered using this approach for other structs, but didn't find a
quick viable way to reduce duplication. `nomad/structs` uses values as
struct fields (e.g. `string`), while `api` uses value pointers (e.g.
`*string`) instead. Also, sometimes, `api` structs contain deprecated
fields or additional documentation, so simple copy-paste doesn't work.
For these reasons, I opted to keep the status quo.
2019-01-18 14:51:19 -05:00
Nick Ethier 597b7b751d
tr: add retry w/ backoff to stats_hook failure 2019-01-12 12:18:24 -05:00
Alex Dadgar 79cfe26021 vet 2019-01-07 14:49:41 -08:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Danielle Tomlinson 3647b701a6 taskrunner: Emit task events when a hook fails 2018-12-13 18:20:18 +01:00
Alex Dadgar c918a96490 Warn if IOPS is being used 2018-12-06 16:17:09 -08:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS has been modelled as a resource since Nomad 0.1 but has never
actually been detected, and there is no plan in the short term to add
detection. This is because IOPS is too simplistic a unit to define
the performance requirements of the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Alex Dadgar 4ee603c382 Device hook and devices affect computed node class
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.

TODO Fix the task runner unit test. The environment variable is being
lost even though it is being properly set in the prestart hook.
2018-11-27 17:25:33 -08:00
Nick Ethier 29591a7c2e
task_runner: emit event on task exit with exit result details 2018-11-19 22:59:17 -05:00
Danielle Tomlinson 8bf17fe22d
Merge pull request #4875 from hashicorp/f-constraints
scheduler: Make != constraints more flexible
2018-11-15 11:04:21 -08:00
Danielle Tomlinson 9c72dafc95 scheduler: Add is_set/is_not_set constraints
This adds constraints for asserting that a given attribute or value
exists, or does not exist. This acts as a companion to the = and !=
operators, e.g.:

```hcl
constraint {
        attribute = "${attrs.type}"
        operator  = "!="
        value     = "database"
}

constraint {
        attribute = "${attrs.type}"
        operator  = "is_set"
}
```
2018-11-15 11:00:32 -08:00
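The is_set/is_not_set operators from commit 9c72dafc95 can be sketched with a simplified evaluator. `checkConstraint` is hypothetical and not Nomad's scheduler code. It assumes, in line with the "Make != constraints more flexible" merge, that `!=` passes when the attribute is missing. That is why the example pairs `!=` with `is_set`: together they require the attribute to exist and to differ.

```go
package main

import "fmt"

// checkConstraint is a simplified, hypothetical evaluator. value is the
// node's resolved attribute and found reports whether it was present.
func checkConstraint(operator, value string, found bool, target string) bool {
	switch operator {
	case "is_set":
		return found
	case "is_not_set":
		return !found
	case "=", "==", "is":
		return found && value == target
	case "!=", "not":
		// Assumed semantics: an unset attribute is "not equal" to anything.
		return !found || value != target
	default:
		return false
	}
}

func main() {
	// A node with no "type" attribute.
	attrs := map[string]string{}
	v, ok := attrs["type"]

	neq := checkConstraint("!=", v, ok, "database")
	set := checkConstraint("is_set", v, ok, "")
	// Both constraints from the example must hold, so this node is rejected.
	fmt.Println(neq, set, neq && set) // true false false
}
```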
Mahmood Ali 046f098bac Track Node Device attributes and serve them in API 2018-11-14 14:42:29 -05:00
Alex Dadgar 08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Preetha Appan 5f0a9d2cfd
Show preemption output in plan CLI 2018-11-08 09:48:43 -06:00
Alex Dadgar feb83a2be3 assign devices 2018-11-07 10:32:03 -08:00
Alex Dadgar 2d2248e209 Add devices to allocated resources 2018-11-07 10:32:03 -08:00
Alex Dadgar b1c5d52817 Track jobs by namespace 2018-11-07 10:22:08 -08:00
Preetha Appan 6fdc84cce3
add comment 2018-11-02 18:11:36 -05:00
Preetha Appan a6b714b81c
update preemption tests to use new node resource structs
also includes a fix to remove unnecessary subtraction of network mbits
2018-11-02 17:59:53 -05:00
Preetha b2b52b1ada
Merge pull request #4794 from hashicorp/f-preemption-systemjobs
Preemption for system jobs
2018-11-02 16:28:06 -05:00
Preetha Appan 57fe5050f0
more minor review feedback 2018-11-01 17:05:17 -05:00
Preetha Appan fd60e66f86
Plumb alloc resource cache in a few more places.
also removed now unused method
2018-11-01 16:44:43 -05:00
Mahmood Ali 9da19c6450 address review comments 2018-10-30 13:58:52 -04:00
Mahmood Ali 4937095389 Allow artifacts checksum interpolation
Fixes https://github.com/hashicorp/nomad/issues/4814
2018-10-30 13:24:30 -04:00
Preetha Appan f1c3eb2792
Introduce interface with multiple implementations for resource distance 2018-10-30 11:06:32 -05:00
Preetha Appan 0494a098ce
More style and readability fixes from review 2018-10-30 11:06:32 -05:00
Preetha Appan 8807c25b11
Modify preemption code to use new style of resource structs 2018-10-30 11:06:32 -05:00
Preetha Appan bd34cbb1f7
Support for new scheduler config API, first use case is to disable preemption 2018-10-30 11:06:32 -05:00
Preetha Appan cc295b90de
Implement preemption for system jobs.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently handles network resource asks
separately from the others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Preetha Appan d11064d6ba
structs and API changes to plan and alloc structs needed for preemption 2018-10-30 11:06:32 -05:00
Preetha Appan 9257387a69
Add number of evictions to DesiredUpdates struct to use in CLI/API 2018-10-30 11:06:32 -05:00
Preetha Appan 5ff4b8e36f
Review feedback 2018-10-30 11:06:32 -05:00
Preetha Appan 5b3bfb63eb
structs and API changes to plan and alloc structs needed for preemption 2018-10-30 11:06:32 -05:00
Michael Schurter e060174130 ar: fix leader handling, state restoring, and destroying unrun ARs
* Migrated all of the old leader task tests and got them passing
* Refactor and consolidate task killing code in AR to always kill leader
  tasks first
* Fixed lots of issues with state restoring
* Fixed deadlock in AR.Destroy if AR.Run had never been called
* Added a new in memory statedb for testing
2018-10-19 09:45:45 -07:00
Michael Schurter a4b4d7b266 consul service hook
Deregistration works but is difficult to test due to terminal updates not
being fully implemented in the new client/ar/tr.
2018-10-16 16:53:29 -07:00
Alex Dadgar e401c660e7 Implement lifecycle hooks on the task runner 2018-10-16 16:53:29 -07:00
Alex Dadgar a78cefec18 use int64 2018-10-16 15:34:32 -07:00
Preetha Appan 7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar f5a76d8411 review comments 2018-10-15 15:31:13 -07:00
Alex Dadgar f9b056e1d1 Replace attributes map with new Attribute object 2018-10-13 14:08:58 -07:00
Alex Dadgar 04ba425dd5 validate constraints/affinities 2018-10-13 12:27:49 -07:00
Alex Dadgar 9b5aaac410 Device feasibility checker 2018-10-13 12:27:49 -07:00
Alex Dadgar bfb4caa2e7 node devices 2018-10-13 12:27:49 -07:00
Alex Dadgar 5a07f9f96e parse affinities and constraints on devices 2018-10-11 14:05:19 -07:00
Alex Dadgar 6b08b9d6b6 Define device request structs 2018-10-08 15:38:03 -07:00
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Alex Dadgar 147d2430a1 allocated resources structs 2018-09-29 18:47:28 -07:00
Alex Dadgar 5c8697667e Node reserved resources 2018-09-29 18:44:55 -07:00
Alex Dadgar 3183153315 Node resources on client 2018-09-29 17:23:41 -07:00
Alex Dadgar 6a21f9fe96 Unique TriggerBy for blocked evals
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar d7f5be9148 Better comment on snapshotindex 2018-09-24 13:53:43 -07:00
Alex Dadgar 99498da6ed Denormalize jobs in plan and ignore resources of terminal allocs
Denormalize jobs in AppendAllocs:
AppendAlloc was originally only ever called for inplace upgrades and new
allocations. Both these code paths would remove the job from the
allocation. Now we also use it to add fields such as FollowupEvalID,
and that path did not normalize the job. This is only a performance enhancement.

Ignore terminal allocs:
Failed allocations are annotated with the followup Eval ID when one is
created to replace the failed allocation. However, in the plan applier,
when we check if allocations fit, these terminal allocations were not
filtered. This could result in the plan being rejected if the node would
be overcommitted when the terminal allocations' resources were considered.
2018-09-24 13:53:43 -07:00
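The "ignore terminal allocs" fix in commit 99498da6ed amounts to filtering terminal allocations before summing resources for the fit check. In the sketch below, `alloc`, `terminal`, and `fitsOnNode` are hypothetical. Only memory is checked, and the status strings follow the usual desired/client status values, which is an assumption rather than a copy of the plan applier.

```go
package main

import "fmt"

// alloc is a hypothetical, trimmed-down allocation.
type alloc struct {
	ID            string
	DesiredStatus string
	ClientStatus  string
	MemoryMB      int
}

// terminal treats an allocation as terminal if the scheduler wants it
// stopped/evicted or the client reports it as finished.
func (a alloc) terminal() bool {
	switch a.DesiredStatus {
	case "stop", "evict":
		return true
	}
	switch a.ClientStatus {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// fitsOnNode sums memory for non-terminal allocations only, so a failed
// allocation that is being replaced does not count against the node.
func fitsOnNode(capacityMB int, allocs []alloc) bool {
	used := 0
	for _, a := range allocs {
		if a.terminal() {
			continue
		}
		used += a.MemoryMB
	}
	return used <= capacityMB
}

func main() {
	allocs := []alloc{
		{ID: "failed-1", DesiredStatus: "run", ClientStatus: "failed", MemoryMB: 768},
		{ID: "replacement", DesiredStatus: "run", ClientStatus: "pending", MemoryMB: 768},
	}
	fmt.Println(fitsOnNode(1024, allocs)) // true: the failed alloc is ignored
}
```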
Preetha Appan 996484981c
Fix panic when reschedule policy for allocation can't be looked up
because its task group changed
2018-09-05 17:01:02 -05:00
Alex Dadgar cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar c6576ddac1 Fix make check errors 2018-09-04 16:03:52 -07:00
Preetha Appan 26288b9522
Fix more review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 751c0eb5a5
code review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 4f8e925b54
Move topk and delay heap to separate packages under lib 2018-09-04 16:10:11 -05:00
Preetha Appan 9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer 2018-09-04 16:10:11 -05:00
Preetha Appan 6ed527c636
Use heap to store top K scoring nodes.
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan e72c0fe527
more cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 92d37acc2a
comment and formatting cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 5812f906c8
Allow empty spread targets, and validate target percentages. 2018-09-04 16:10:11 -05:00
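The rule in the commit above (empty spread targets allowed, target percentages validated) is sketched below. `spread`, `spreadTarget`, and `validate` are hypothetical shapes. Rejecting duplicate target values and totals above 100 is our reading of "validate target percentages", not a copy of Nomad's checks.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// spreadTarget and spread are hypothetical, trimmed-down shapes.
type spreadTarget struct {
	Value   string
	Percent uint8
}

type spread struct {
	Attribute string
	Targets   []spreadTarget
}

// validate allows an empty target list (spread evenly) but rejects
// out-of-range percentages, duplicated values, and a total above 100.
func (s spread) validate() error {
	var problems []string
	if s.Attribute == "" {
		problems = append(problems, "missing spread attribute")
	}
	seen := make(map[string]bool)
	sum := 0
	for _, t := range s.Targets {
		if t.Percent > 100 {
			problems = append(problems, fmt.Sprintf("target %q: percent %d exceeds 100", t.Value, t.Percent))
		}
		if seen[t.Value] {
			problems = append(problems, fmt.Sprintf("target %q: duplicate value", t.Value))
		}
		seen[t.Value] = true
		sum += int(t.Percent)
	}
	if sum > 100 {
		problems = append(problems, fmt.Sprintf("sum of target percentages is %d, must not exceed 100", sum))
	}
	if len(problems) > 0 {
		return errors.New(strings.Join(problems, "; "))
	}
	return nil
}

func main() {
	fmt.Println(spread{Attribute: "${node.datacenter}"}.validate())
	fmt.Println(spread{
		Attribute: "${node.datacenter}",
		Targets:   []spreadTarget{{"dc1", 70}, {"dc2", 40}},
	}.validate())
}
```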
Preetha Appan 71bff00326
validate spread from job/task group validate methods 2018-09-04 16:10:11 -05:00
Preetha Appan fbd0004707
Fix warnings 2018-09-04 16:10:11 -05:00
Preetha Appan 5eb82b6260
Validate method, and rename ratio field to percent 2018-09-04 16:10:11 -05:00
Preetha Appan 0037d72fa8
Structs and validation for spread 2018-09-04 16:10:11 -05:00
Preetha Appan c407e3626f
More review comments 2018-09-04 16:10:11 -05:00
Preetha Appan dbbb4a957a
Fail validation if system job has affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 0bc030c6fb
Treat set_contains as a synonym of set_contains_all 2018-09-04 16:10:11 -05:00
Preetha Appan f06c7ab2ad
Fix Copy method for job and task to include affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 9f0caa9c3d
Affinity parsing, api and structs 2018-09-04 16:10:11 -05:00
Nick Ethier 41e010cdc2
nomad: add 'Dispatch' field to Job
New `Dispatch` field is used to denote if the Job is a child dispatched job of
a parameterized job.
2018-06-11 11:59:03 -04:00
Alex Dadgar f2b2e0482b code review fixes 2018-05-31 10:57:08 -07:00
Alex Dadgar 195e19827b Deployment adds JobSpecModifyIndex
Deployment tracks the Job.JobModifyIndex so that PUTs against /v1/jobs
can be more easily correlated with the created deployment.

Fixes https://github.com/hashicorp/nomad/issues/4301
2018-05-30 11:33:56 -07:00
Alex Dadgar 21c5ed850d Register events 2018-05-22 14:06:33 -07:00
Alex Dadgar 17aac1c9de node heartbeat missed event 2018-05-22 14:05:46 -07:00
Alex Dadgar 5f2080bc26 Emit events based on eligibility 2018-05-22 14:04:59 -07:00
Alex Dadgar 86be50fa05
Merge pull request #4284 from hashicorp/f-drain-event
Emit Node Events for draining
2018-05-22 21:04:18 +00:00
Preetha 159888a856
Merge pull request #4274 from hashicorp/f-force-rescheduling
Add CLI and API support for forcing rescheduling of failed allocs
2018-05-21 16:24:22 -07:00
Alex Dadgar 0cb31feb1f Add node event when draining is set/removed/updated 2018-05-10 16:54:43 -07:00
Alex Dadgar a35248d1d8 Plumb event via FSM 2018-05-10 16:30:54 -07:00
Preetha Appan b12df3c64b
Added CLI for evaluating job given ID, and modified client API for evaluate to take a request payload 2018-05-09 15:04:27 -05:00
Chelsea Holland Komlo d51611040f Add driver health information to node list stub 2018-05-09 11:21:54 -04:00
Preetha Appan c1b92c284e
Work in progress - force rescheduling of failed allocs 2018-05-08 17:26:57 -05:00
Michael Schurter e90d051c43
consul: change hashed canary bytes 2018-05-07 14:55:01 -05:00
Alex Dadgar 8626c1b94a
Reschedule when we have canaries properly 2018-05-07 14:55:01 -05:00
Michael Schurter 50e04c976e
consul: support canary tags for services
Also refactor Consul ServiceClient to take a struct instead of a massive
set of arguments. Meant updating a lot of code but it should be far
easier to extend in the future as you will only need to update a single
struct instead of every single call site.

Adds an e2e test for canary tags.
2018-05-07 14:55:01 -05:00
Alex Dadgar f4af30fbb5
Canary tags structs 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Alex Dadgar 224b3092ae
change default to 10m and docs 2018-05-07 14:50:01 -05:00
Alex Dadgar 8a81038cdb
Set Reschedule from deployment watcher 2018-05-07 14:50:01 -05:00
Alex Dadgar fcf4f582d0
small review feedback fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar e5caaf3358
Small test fix 2018-05-07 14:50:01 -05:00
Alex Dadgar 99e00fb774
Pass through timestamp 2018-05-07 14:50:01 -05:00
Alex Dadgar 1336002255
Progress deadline in deployment state 2018-05-07 14:50:01 -05:00
Alex Dadgar ee50789c22
Initial implementation 2018-05-07 14:50:01 -05:00
Michael Schurter f6a4713141 consul: make grpc checks more like http checks 2018-05-04 11:08:11 -07:00
Michael Schurter 382caec1e1 consul: initial grpc implementation
Needs to be more like http.
2018-05-04 11:08:11 -07:00
Preetha Appan 274bed1892
Add RescheduleTracker to allocs list stub struct 2018-05-01 14:53:47 -05:00
Alex Dadgar 15ad3f94af Fix command line 2018-04-26 15:46:22 -07:00
Alex Dadgar d0f237086b UX touchups 2018-04-26 15:24:27 -07:00
Chelsea Holland Komlo fca0169dbc handle potential panic in cron parsing 2018-04-26 16:57:45 -04:00
Alex Dadgar 4f2a7b6949 Fix copying drivers 2018-04-16 15:45:51 -07:00
Preetha Appan 9f84e17bfd
don't print reschedule policy in error message 2018-04-11 17:07:14 -05:00
Preetha Appan a7b7b662ed
Make system jobs fail validation if they contain a reschedule stanza 2018-04-11 14:56:20 -05:00
Preetha 6254d75eee
Merge pull request #4101 from hashicorp/b-rescheduling-edge-fixes
Fixes edge cases around timing/ task finish time being set more than once
2018-04-04 16:18:21 -05:00
Preetha Appan 5e4525bd30
Moves setting finishedAt to the right place and adds two unit tests. 2018-04-04 14:38:15 -05:00
Michael Schurter b1a90462a8
Merge pull request #4094 from hashicorp/b-drain-panic
drain: fix double-close panic on drain future
2018-04-04 10:31:14 -07:00
Alex Dadgar 4c9c6decd3
Merge pull request #4100 from hashicorp/b-vault-no-auth
Improve handling of Vault errors
2018-04-03 17:23:43 -07:00
Alex Dadgar 9617a13a2b Correctly handle the upgrade path of a node being drained when applying Raft logs 2018-04-03 15:32:44 -07:00
Preetha Appan 00537c739b
Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Alex Dadgar 58a3ec3fb2 Improve Vault error handling 2018-04-03 14:29:22 -07:00
Michael Schurter 6840becf46 drain: refactor batch_future into its own file
aka What If structs.go Wasn't So Big?
2018-04-02 16:40:06 -07:00
Alex Dadgar dc03fab29b Canonicalize migrate 2018-03-29 17:42:58 -07:00
Michael Schurter 62e9553333
Merge pull request #4069 from hashicorp/f-hashealth
add HasHealth helper for nil checks
2018-03-29 17:03:20 -07:00
Alex Dadgar 301704091b Handle upgrade where Node doesn't have eligibility
This PR handles upgrading a node that has no scheduling eligibility set.
2018-03-29 16:52:23 -07:00
Preetha 9a732c4acb
Merge pull request #4071 from hashicorp/b-handle-missing-finishedat
handle missing finishedAt
2018-03-29 17:11:34 -05:00
Preetha 81d48fc7cf
Merge pull request #4079 from hashicorp/b-filter-desiredstop
Filter desired status stop allocs correctly
2018-03-29 15:36:22 -05:00
Preetha Appan c8317532ff
Use time from task events if task state does not have FinishedAt set 2018-03-29 14:05:56 -05:00
Alex Dadgar b194f93f2f Disallow Update stanza on Batch 2018-03-29 11:28:56 -07:00
Michael Schurter 91b5bb58d9 add HasHealth helper for nil checks
We performed the DeploymentStatus nil checks a couple different ways, so
hopefully this helper will consolidate them and make it clearer what
the code is doing.
2018-03-29 09:29:19 -07:00
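The HasHealth helper from commit 91b5bb58d9 is useful mainly because it is safe on a nil receiver. A sketch, assuming a hypothetical `deploymentStatus` with a `*bool` Healthy field so that "not yet known" (nil) differs from "unhealthy" (false); `isHealthy` is likewise a hypothetical convenience.

```go
package main

import "fmt"

// deploymentStatus is a hypothetical stand-in: Healthy is a *bool so that
// "not yet known" (nil) is distinct from "unhealthy" (false).
type deploymentStatus struct {
	Healthy *bool
}

// HasHealth reports whether a health result has been set. It is safe to
// call on a nil receiver, which removes the repeated nil checks at call sites.
func (d *deploymentStatus) HasHealth() bool {
	return d != nil && d.Healthy != nil
}

// isHealthy is a hypothetical convenience built on top of HasHealth.
func (d *deploymentStatus) isHealthy() bool {
	return d.HasHealth() && *d.Healthy
}

func main() {
	var missing *deploymentStatus // e.g. an allocation with no deployment status yet
	unknown := &deploymentStatus{}
	healthy := true
	known := &deploymentStatus{Healthy: &healthy}

	fmt.Println(missing.HasHealth(), unknown.HasHealth(), known.HasHealth()) // false false true
	fmt.Println(known.isHealthy())                                           // true
}
```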
Preetha Appan 5090fefe96
Filter out allocs with DesiredState = stop, and unit tests 2018-03-29 09:28:52 -05:00
Preetha Appan 2da661595d
If FinishedAt is not set use alloc's modify time for rescheduling logic 2018-03-29 07:42:58 -05:00
Alex Dadgar de4b3772f1 Create evals for system jobs when drain is unset
This PR creates evals for system jobs when:

* Drain is unset and mark eligible is true
* Eligibility is restored to the node
2018-03-27 15:53:24 -07:00
Chelsea Holland Komlo b522a0fadc fix up to string to use time.Time 2018-03-27 15:43:57 -04:00
Chelsea Holland Komlo 003bc209b9 use time.Time for node events for compatibility 2018-03-27 15:43:57 -04:00
Alex Dadgar 59005d1d26
Merge pull request #4049 from hashicorp/b-tunnel
Only track nodes if the conn is from the node
2018-03-27 12:39:34 -07:00
Alex Dadgar 5dacb057b7 Only track nodes if the conn is from the node
Fixes a bug in which a connection to a Nomad server was treated as a
connection to a node because the server forwarded a node specific RPC.
2018-03-27 09:59:31 -07:00
Preetha Appan 33e170c15d
s/linear/constant/g 2018-03-26 14:45:09 -05:00
Preetha Appan 7db930b3c3
Extra test case and better error message for ambiguous config 2018-03-26 13:30:09 -05:00
Preetha Appan fbd56c35a8
Adds additional validation for ambiguous settings (having both unlimited and attempts set) 2018-03-24 10:29:20 -05:00
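The ambiguity check in commit fbd56c35a8 above can be sketched as a validation that rejects `Unlimited` combined with a positive attempt count. `reschedulePolicy` is a hypothetical type. The accepted delay-function names ("constant", "exponential", "fibonacci") are an assumption, informed by the "s/linear/constant/g" commit.

```go
package main

import "fmt"

// reschedulePolicy is a hypothetical, trimmed-down policy.
type reschedulePolicy struct {
	Attempts      int
	Unlimited     bool
	DelayFunction string
}

// validate rejects unlimited rescheduling combined with a bounded attempt
// count, and (as an extra assumption) an unknown delay function.
func (p reschedulePolicy) validate() error {
	if p.Unlimited && p.Attempts > 0 {
		return fmt.Errorf("ambiguous reschedule policy: unlimited=%v with attempts=%d", p.Unlimited, p.Attempts)
	}
	switch p.DelayFunction {
	case "constant", "exponential", "fibonacci":
		return nil
	default:
		return fmt.Errorf("invalid delay function %q", p.DelayFunction)
	}
}

func main() {
	fmt.Println(reschedulePolicy{Unlimited: true, DelayFunction: "exponential"}.validate())
	fmt.Println(reschedulePolicy{Unlimited: true, Attempts: 3, DelayFunction: "constant"}.validate())
}
```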
Alex Dadgar 39987d5236
Merge branch 'master' into b-acl-name 2018-03-22 14:51:40 -07:00
Michael Schurter a7f627e34c eligbile -> eligible 2018-03-21 16:55:22 -07:00
Michael Schurter a4f346abeb remove spurious TODOs and FIXMEs 2018-03-21 16:55:22 -07:00
Michael Schurter 922842546c JobNs -> NamespacedID
Also drop the New func as it's easy to swap the order of arguments since
they're both strings.
2018-03-21 16:51:45 -07:00
Michael Schurter 8dc7d9fb6a drainer: RegisterJob -> RegisterJobs
Test job watcher
2018-03-21 16:51:45 -07:00
Alex Dadgar 2d91b9dfba Batch drain update 2018-03-21 16:51:44 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` command will mark it
as eligible but the drainer will maintain ineligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar 405dab2253 integration test and basic fixes 2018-03-21 16:51:44 -07:00
Alex Dadgar e63bcb474d Drainer 2018-03-21 16:51:44 -07:00
Alex Dadgar 4754366640 job watcher 2018-03-21 16:51:44 -07:00
Alex Dadgar a37329189a Improve DeadlineTime helper 2018-03-21 16:51:44 -07:00
Alex Dadgar 0fba0101b6 RPC/FSM/State Store for Eligibility 2018-03-21 16:51:44 -07:00
Alex Dadgar 2f5309d82a Remove update time 2018-03-21 16:51:43 -07:00
Alex Dadgar 010228577e Drain cli, api, http 2018-03-21 16:51:43 -07:00
Alex Dadgar e459a666ed Node.Drain takes strategy 2018-03-21 16:49:48 -07:00
Michael Schurter d1ec65d765 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransition
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Chelsea Komlo 6fc9231dac
Merge pull request #3856 from hashicorp/f-client-add-health-checks
Client driver health checks for Docker
2018-03-21 18:05:00 -04:00
Preetha 01898b2c25
Merge pull request #4007 from hashicorp/f-show-rescheduling-cli-job-status
Show a section on upcoming delayed evaluations when applicable
2018-03-21 14:37:38 -05:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 0bde357731 add concept of health checks to fingerprinters and nodes
fix up feedback from code review

add driver info for all drivers to node
2018-03-21 15:15:25 -04:00
Preetha 17f2f52f08
Merge pull request #3979 from hashicorp/b_update_compat_delete
Delete compatibility code for job level update stanza
2018-03-21 09:17:01 -05:00
Preetha Appan 31a3c81c3b
Show a section on upcoming delayed evaluations when applicable 2018-03-19 21:42:37 -05:00
Preetha Appan 33a5a72323
Make suggested interval round to seconds, and more end to end test cases 2018-03-19 14:56:52 -05:00
Alex Dadgar 586ae36d13 Batch Deregister RPC 2018-03-16 10:53:03 -07:00
Preetha Appan 9a5e6edf1f
Rename DelayCeiling to MaxDelay 2018-03-14 16:10:32 -05:00
Preetha Appan 9fed0d2103
Get reschedule policy from the alloc directly 2018-03-14 16:10:32 -05:00
Preetha Appan 4d5e9bcb45
Extra comments, remove unnecessary if condition 2018-03-14 16:10:32 -05:00
Preetha Appan 1ab8f2b57a
Address some code review comments 2018-03-14 16:10:32 -05:00
Preetha Appan 342c3fb961
Added FollowupEvalID field and helper methods to calculate reschedule eligibility based on delay 2018-03-14 16:10:32 -05:00
Preetha Appan 87538fc87d
Fix formatting 2018-03-14 16:10:32 -05:00
Preetha Appan 5f50c3d618
Add new reschedule options to API layer and unit tests 2018-03-14 16:10:32 -05:00
Preetha Appan 10c9662222
New delayed rescheduling options, validation function and unit tests 2018-03-14 16:10:32 -05:00
Preetha Appan a924183604
Remove compat code for upgrade stanza that copied state from job level update stanza 2018-03-14 10:21:46 -05:00
Chelsea Komlo 810eedfa2a
Merge pull request #3945 from hashicorp/f-add-node-events
Add node events
2018-03-14 08:42:55 -04:00
Preetha 360d6e5a92
Merge pull request #3968 from hashicorp/f-nicer-vault-error
Make server-side error messages from Vault clearer
2018-03-13 20:49:39 -05:00
Preetha Appan 7b5955826d
Fix lint warning 2018-03-13 20:49:01 -05:00
Alex Dadgar de6ebb6e6c small cleanup 2018-03-13 18:08:22 -07:00
Alex Dadgar 63e14b7d63 nodeevents -> events 2018-03-13 18:08:22 -07:00
Alex Dadgar d3c3deffad fixes 2018-03-13 18:08:22 -07:00
Chelsea Holland Komlo b41501e442 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo 1488b076d1 code review feedback 2018-03-13 18:08:21 -07:00
Chelsea Holland Komlo a8bcbd81e6 batch submitting node events 2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo d30c269fbe code review feedback 2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo 0f306aa0dd move all structs to structs file 2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo 00d9923454 Ensure node updates don't strip node events
Add node events to CLI
2018-03-13 18:05:40 -07:00
Chelsea Holland Komlo 4ede27a3c8 RPC, FSM, state store for Node.EmitEvent
add node event when registering a node for the first time
2018-03-13 18:05:40 -07:00
Preetha Appan e08ecb7da2
Fix incorrect comment 2018-03-13 18:25:41 -05:00
Preetha Appan 9618f52746
Remove error wrapping and make Vault connection server-side errors clearer. 2018-03-13 17:09:03 -05:00
Michael Schurter 7dd7fbcda2 non-Existent -> nonexistent
Reverting from #3963

https://www.merriam-webster.com/dictionary/existent
2018-03-12 11:59:33 -07:00
Josh Soref 173ce63fe9 spelling: transition 2018-03-11 19:06:05 +00:00
Josh Soref c4c4645f46 spelling: summary 2018-03-11 19:00:07 +00:00
Josh Soref 3140a5dcf9 spelling: response 2018-03-11 18:48:24 +00:00
Josh Soref fdd7b5ee9d spelling: reschedule 2018-03-11 18:50:50 +00:00
Josh Soref c384e14f3d spelling: request 2018-03-11 18:42:43 +00:00
Josh Soref fb5beb664d spelling: monotonically 2018-03-11 18:28:31 +00:00