Alex Dadgar
4ee603c382
Device hook and devices affect computed node class
...
This PR introduces a device hook that retrieves the device mount
information for an allocation. It also updates the computed node class
computation to take into account devices.
TODO: Fix the task runner unit test. The environment variable is being
lost even though it is set properly in the prestart hook.
2018-11-27 17:25:33 -08:00
Nick Ethier
95362eaa02
Merge pull request #4844 from hashicorp/f-docker-plugin
...
Docker driver plugin
2018-11-20 20:43:03 -05:00
Mahmood Ali
2e6133fd33
Treat nil secrets as recoverable to keep renew attempts going
2018-11-20 17:11:55 -05:00
Mahmood Ali
5827438983
Renew past recorded expiry till unrecoverable error
...
Keep attempting to renew Vault token past locally recorded expiry, just
in case the token was renewed out of band, e.g. on another Nomad server,
until Vault returns an unrecoverable error.
2018-11-20 17:10:55 -05:00
Mahmood Ali
5836a341dd
fix typo
2018-11-20 17:10:55 -05:00
Mahmood Ali
93add67e04
round ttl duration for users
2018-11-20 17:10:55 -05:00
Mahmood Ali
4a0544b369
Track renewal expiration properly
2018-11-20 17:10:55 -05:00
Mahmood Ali
79aa934a4b
reconcile interface
2018-11-20 17:10:55 -05:00
Mahmood Ali
6efea6d8fc
Populate agent-info with vault
...
Return Vault TTL info to /agent/self API and `nomad agent-info` command.
2018-11-20 17:10:55 -05:00
Mahmood Ali
6034af5084
Avoid explicit precomputed stats field
...
The stats field seems to be a micro-optimization that doesn't justify
the complexity it introduces. Remove it and compute the stats from
the revoking field directly.
2018-11-20 17:10:54 -05:00
Mahmood Ali
14842200ec
More metrics for Server vault
...
Add a gauge to track the remaining time-to-live, and track the duration of the renewal request API call.
2018-11-20 17:10:54 -05:00
Mahmood Ali
e1994e59bd
address review comments
2018-11-20 17:10:54 -05:00
Mahmood Ali
35179c9655
Wrap Vault API errors to ease debugging
2018-11-20 17:10:54 -05:00
Mahmood Ali
55456fc823
Set a 1s floor for Vault renew operation backoff
2018-11-20 17:10:54 -05:00
Mahmood Ali
7ad8f6c103
Merge pull request #4903 from hashicorp/b-delete-versions-mod-while-iter
...
Fix a panic related to batch GC
2018-11-20 15:16:02 -05:00
Mahmood Ali
6281700c0c
address review comments
2018-11-20 13:21:39 -05:00
Nick Ethier
29591a7c2e
task_runner: emit event on task exit with exit result details
2018-11-19 22:59:17 -05:00
Mahmood Ali
d744e71fa9
add a missing no-error assertion
2018-11-19 21:44:00 -05:00
Mahmood Ali
b93643cd96
Fix a panic related to batch GC
...
`deleteJobVersions` modifies the items it is iterating over: it deletes
job versions while it is still iterating on them.
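
One common way to avoid this class of bug is to finish the iteration first and delete afterwards. The sketch below illustrates that pattern only; it is not the code from this commit, and `store`, `jobVersion`, `matching`, and `remove` are hypothetical stand-ins rather than Nomad or go-memdb types.

```go
package main

import "fmt"

// jobVersion is a hypothetical record: one stored version of a job.
type jobVersion struct {
	JobID   string
	Version int
}

// store is a hypothetical stand-in for a state table of job versions.
type store struct {
	rows []jobVersion
}

// matching walks the table to completion and returns copies of the rows
// for jobID. No row is deleted while this loop is running.
func (s *store) matching(jobID string) []jobVersion {
	var out []jobVersion
	for _, r := range s.rows {
		if r.JobID == jobID {
			out = append(out, r)
		}
	}
	return out
}

// remove deletes a single row. It returns right after mutating the slice,
// so it never continues a loop over data it has just changed.
func (s *store) remove(v jobVersion) {
	for i, r := range s.rows {
		if r == v {
			s.rows = append(s.rows[:i], s.rows[i+1:]...)
			return
		}
	}
}

// deleteJobVersions collects first, then deletes, so the iteration and the
// mutation never overlap. It returns the number of rows removed.
func (s *store) deleteJobVersions(jobID string) int {
	toDelete := s.matching(jobID)
	for _, v := range toDelete {
		s.remove(v)
	}
	return len(toDelete)
}

func main() {
	s := &store{rows: []jobVersion{{"a", 0}, {"b", 0}, {"a", 1}}}
	n := s.deleteJobVersions("a")
	fmt.Println(n, s.rows)
}
```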
2018-11-19 20:59:45 -05:00
Mahmood Ali
bff9c3b3e9
Reproduce a panic related to batch GC
...
Test case that reproduces a panic with the following stacktrace:
```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]
goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
2018-11-19 20:58:32 -05:00
Michael Schurter
56ed4f01be
vault: fix panic by checking for nil secret
...
Vault's RenewSelf(...) API may return (nil, nil). We failed to check if
secret was nil before attempting to use it.
RenewSelf:
e3eee5b4fb/api/auth_token.go (L138-L155)
Calls ParseSecret:
e3eee5b4fb/api/secret.go (L309-L311)
I didn't see any way to test this, so ideas are welcome. We use a real
Vault service, so there's no opportunity to mock the response.
2018-11-19 17:07:59 -08:00
Danielle Tomlinson
8bf17fe22d
Merge pull request #4875 from hashicorp/f-constraints
...
scheduler: Make != constraints more flexible
2018-11-15 11:04:21 -08:00
Danielle Tomlinson
9c72dafc95
scheduler: Add is_set/is_not_set constraints
...
This adds constraints for asserting that a given attribute or value
exists, or does not exist. This acts as a companion to the = and !=
operators, e.g.:
```hcl
constraint {
  attribute = "${attrs.type}"
  operator  = "!="
  value     = "database"
}
constraint {
  attribute = "${attrs.type}"
  operator  = "is_set"
}
```
2018-11-15 11:00:32 -08:00
Mahmood Ali
046f098bac
Track Node Device attributes and serve them in API
2018-11-14 14:42:29 -05:00
Mahmood Ali
a4a9347501
fix comment typos
2018-11-14 08:36:14 -05:00
Mahmood Ali
1e92161f14
Merge pull request #4858 from hashicorp/b-fix-master-20181109
...
Fix some tests in master
2018-11-13 16:08:26 -05:00
Alex Dadgar
08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
...
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Mahmood Ali
865419e756
convert all config durations to strings in tests
2018-11-13 10:21:40 -05:00
Mahmood Ali
4e18846fd9
Adjust streaming duration
...
This test expects 11 repeats of the same message emitted at intervals of
200ms, so we need more than 2 seconds to allow for sleep-time
variations and the like. Raise it to 3s, which should be
enough.
2018-11-13 10:21:40 -05:00
Mahmood Ali
1403ad21b9
Changelog job re-run fix
2018-11-13 07:52:51 -05:00
Mahmood Ali
e2d668f21c
Merge pull request #4861 from hashicorp/b-batch-deregister-transaction
...
Run job deregistering in a single transaction
2018-11-12 20:59:44 -05:00
Alex Dadgar
a90dc978e1
Handle new eval being the duplicate properly
2018-11-12 16:02:23 -08:00
Mahmood Ali
8513b3cccb
Comment public functions and batch write txn
2018-11-12 16:09:39 -05:00
Preetha Appan
7ef126a027
Smaller methods, and added tests for RPC layer
2018-11-10 17:37:33 -06:00
Preetha Appan
75662b50d1
Use response object/querymeta/writemeta in scheduler config API
2018-11-10 10:31:10 -06:00
Mahmood Ali
9c0a15f3ce
Run job deregistering in a single transaction
...
Fixes https://github.com/hashicorp/nomad/issues/4299
Upon investigating this case further, we determined the issue to be a race between applying the `JobBatchDeregisterRequest` FSM operation and processing job-deregister evals.
Processing job-deregister evals should wait until the FSM log message finishes applying, by using the snapshot index. However, with `JobBatchDeregister`, applying any single job deregistration accidentally incremented the snapshot index, which resulted in job-deregister evals being processed early. When a Nomad server received an eval for a job in the batch that was yet to be deleted, it accidentally re-ran the job, depending on the state of its allocations.
This change ensures that we deregister all of the jobs and insert all evals in a single transaction, thus blocking the processing of related evals until deregistering completes.
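
The idea can be illustrated with a toy model in which the index advances once per committed transaction. This is a sketch under stated assumptions and not Nomad's state store: `state`, `txn`, and `batchDeregister` are hypothetical, and the real implementation is built on go-memdb transactions.

```go
package main

import "fmt"

// state is a hypothetical store whose index advances once per commit.
type state struct {
	index uint64
	jobs  map[string]bool
	evals []string
}

// txn buffers writes. Nothing becomes visible, and the index does not
// move, until commit is called.
type txn struct {
	s       *state
	deletes []string
	evals   []string
}

func (s *state) begin() *txn { return &txn{s: s} }

func (t *txn) deleteJob(id string)  { t.deletes = append(t.deletes, id) }
func (t *txn) insertEval(id string) { t.evals = append(t.evals, id) }

// commit applies every buffered write and bumps the index exactly once.
func (t *txn) commit() {
	for _, id := range t.deletes {
		delete(t.s.jobs, id)
	}
	t.s.evals = append(t.s.evals, t.evals...)
	t.s.index++
}

// batchDeregister removes all jobs and inserts their evals in one
// transaction, so a reader waiting on the index sees either none of the
// batch or all of it.
func batchDeregister(s *state, jobIDs []string) {
	t := s.begin()
	for _, id := range jobIDs {
		t.deleteJob(id)
		t.insertEval("eval-" + id)
	}
	t.commit()
}

func main() {
	s := &state{index: 10, jobs: map[string]bool{"a": true, "b": true}}
	batchDeregister(s, []string{"a", "b"})
	fmt.Println(s.index, len(s.jobs), len(s.evals))
}
```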
2018-11-09 22:35:26 -05:00
Preetha
3739713ce1
Merge pull request #4839 from hashicorp/b-gc-alloc-jobversion
...
Remove terminal allocations associated with older job modify index
2018-11-09 12:21:42 -06:00
Preetha Appan
39072977d6
Use create index as trigger condition to gc old terminal allocs
2018-11-09 11:44:21 -06:00
Alex Dadgar
2f06d88f47
Merge pull request #4847 from hashicorp/b-blocked-eval
...
Blocked evaluation fixes
2018-11-08 13:40:01 -08:00
Alex Dadgar
98398a8a44
Merge pull request #4842 from hashicorp/b-deployment-progress-deadline
...
Fix multiple bugs with progress deadline handling
2018-11-08 13:31:54 -08:00
Alex Dadgar
991791a513
typo fix
2018-11-08 13:28:27 -08:00
Alex Dadgar
be54e56570
review fixes
2018-11-08 09:48:36 -08:00
Preetha Appan
5f0a9d2cfd
Show preemption output in plan CLI
2018-11-08 09:48:43 -06:00
Alex Dadgar
dbb05357bc
fix test
2018-11-07 11:59:24 -08:00
Alex Dadgar
36abd3a3d8
review comments
2018-11-07 10:33:22 -08:00
Alex Dadgar
e3cbb2c82e
allocs fit checks if devices get oversubscribed
2018-11-07 10:33:22 -08:00
Alex Dadgar
4f9b3ede87
Split device accounter and allocator
2018-11-07 10:32:03 -08:00
Alex Dadgar
6fa893c801
affinities
2018-11-07 10:32:03 -08:00
Alex Dadgar
feb83a2be3
assign devices
2018-11-07 10:32:03 -08:00
Alex Dadgar
2d2248e209
Add devices to allocated resources
2018-11-07 10:32:03 -08:00