Commit Graph

2829 Commits

Author SHA1 Message Date
Danielle Lancashire 2fb93a6229 evalbroker: test for no enqueue on disabled 2019-05-15 11:02:21 +02:00
Nick Ethier ade97bc91f
fixup #5172 and rebase against master 2019-05-14 14:37:34 -04:00
Nick Ethier cab6a95668
Merge branch 'master' into pr/5172
* master: (912 commits)
  Update redirects.txt
  Added redirect for Spark guide link
  client: log when server list changes
  docs: mention regression in task config validation
  fix update to changelog
  update CHANGELOG with datacenter config validation https://github.com/hashicorp/nomad/pull/5665
  typo: "atleast" -> "at least"
  implement nomad exec for rkt
  docs: fixed typo
  use pty/tty terminology similar to github.com/kr/pty
  vendor github.com/kr/pty
  drivers: implement streaming exec for executor based drivers
  executors: implement streaming exec
  executor: scaffolding for executor grpc handling
  client: expose allocated memory per task
  client improve a comment in updateNetworks
  stalebot: Add 'thinking' as an exempt label (#5684)
  Added Sparrow link
  update links to use new canonical location
  Add redirects for restructing done in GH-5667
  ...
2019-05-14 14:10:33 -04:00
Michael Schurter d7e5ace1ed client: do not restart dead tasks until server is contacted
Fixes #1795

Running restored allocations and pulling what allocations to run from
the server happen concurrently. This means that if a client is rebooted,
and has its allocations rescheduled, it may restart the dead allocations
before it contacts the server and determines they should be dead.

This commit makes tasks that fail to reattach on restore wait until the
server is contacted before restarting.
2019-05-14 10:53:27 -07:00
Danielle Lancashire d9815888ed evalbroker: Simplify nextDelayedEval locking 2019-05-14 14:06:27 +02:00
Danielle Lancashire 38562afbc1 evalbroker: No new enqueues when disabled
Currently when an evalbroker is disabled, it still recieves delayed
enqueues via log application in the fsm. This causes an ever growing
heap of evaluations that will never be drained, and can cause memory
issues in larger clusters, or when left running for an extended period
of time without a leader election.

This commit prevents the enqueuing of evaluations while we are
disabled, and relies on the leader restoreEvals routine to handle
reconciling state during a leadership transition.

Existing dequeues during an Enabled->Disabled broker state transition are
handled by the enqueueLocked function dropping evals.
2019-05-14 13:59:10 +02:00
Danielle Lancashire c91ae21a6c evalbroker: Flush within update lock
Primarily a cleanup commit, however, currently there is a potential race
condition (that I'm not sure we've ever actually hit) during a flapping
SetEnabled/Disabled state where we may never correctly restart the eval
broker, if it was being called from multiple routines.
2019-05-14 13:26:56 +02:00
Preetha Appan 4d3f74e161
Fix test setup to have correct jobcreateindex for deployments 2019-05-13 18:53:47 -05:00
Preetha Appan d448750449
Lookup job only once, and fix tests 2019-05-13 18:33:41 -05:00
Preetha Appan 07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Jasmine Dahilig 30d346ca15
Merge pull request #5665 from hashicorp/b-empty-datacenters
add non-empty string validation for datacenters
2019-05-13 10:23:26 -07:00
Mahmood Ali 919827f2df
Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base
nomad exec part 1: plumbing and docker driver
2019-05-09 18:09:27 -04:00
Mahmood Ali 3c668732af server: server forwarding logic for nomad exec endpoint 2019-05-09 16:49:08 -04:00
Jasmine Dahilig 0ba2bd15b9 add unit tests for datacenter non-empty string validation 2019-05-08 11:51:52 -07:00
Mahmood Ali 9d3f13e9b3 remove Index field from EmitNodeEventsResponse
`Index` is already included as part of `WriteMeta` embedding.

This is a backward compatible change: Clients never read the field; and
Server refernces to `EmitNodeEventsResponse.Index` would be using the
value in `WriteMeta`, which is consistent with other response structs.
2019-05-08 08:42:26 -04:00
Preetha 1538913a2a
Merge pull request #5628 from hashicorp/f-preemption-config
Add config to disable preemption for batch/service jobs
2019-05-06 15:40:35 -05:00
Mahmood Ali f35ad92a8b
Merge pull request #5646 from hashicorp/some-ugorji-fixes
Codegen codec helpers for all nomad structs
2019-05-06 13:23:12 -04:00
Lang Martin 9f3f11df97
Merge pull request #5601 from hashicorp/b-config-parse-direct-hcl
config parse direct hcl
2019-05-06 12:05:19 -04:00
Mahmood Ali 92c133b905 Update peers info with new raft config details 2019-05-03 16:55:53 -04:00
Preetha Appan ad3c263d3f
Rename to match system scheduler config.
Also added docs
2019-05-03 14:06:12 -05:00
Jasmine Dahilig 016495c368 add non-empty string validation for datacenters 2019-05-03 06:48:02 -07:00
Hemanth Basappa 3fef02aa93 Add support in nomad for supporting raft 3 protocol peers.json 2019-05-02 09:11:23 -07:00
Mahmood Ali 21d21baf8b codegen codecs for nomad structs
`ls *[!_test].go` was ignoring any file that ends with `s.go` (or any of
the letter inside `[]`), including `structs.go`!
2019-05-01 12:42:55 -04:00
Lang Martin 598112a1cc tag HCL bookkeeping keys with json:"-" to keep them out of the api 2019-04-30 10:29:14 -04:00
Lang Martin 5ebae65d1a agent/config, config/* mapstructure tags -> hcl tags 2019-04-30 10:29:14 -04:00
Preetha Appan 6615d5c868
Add config to disable preemption for batch/service jobs 2019-04-29 18:48:07 -05:00
Lang Martin 371014b781
Merge pull request #5553 from hashicorp/b-fingerprinter-manual-config
client fingerprinter doesn't overwrite manual configuration
2019-04-26 12:55:34 -04:00
Danielle Lancashire 3409e0be89 allocs: Add nomad alloc signal command
This command will be used to send a signal to either a single task within an
allocation, or all of the tasks if <task-name> is omitted. If the sent signal
terminates the allocation, it will be treated as if the allocation has crashed,
rather than as if it was operator-terminated.

Signal validation is currently handled by the driver itself and nomad
does not attempt to restrict or validate them.
2019-04-25 12:43:32 +02:00
Arshneet Singh b7b050cdd1 Change min version required for plan optimization 2019-04-24 12:36:07 -07:00
Arshneet Singh 9cc39edb67 Return error when preempted/stopped alloc doesn't exist during denormalization 2019-04-24 12:36:07 -07:00
Lang Martin 19ba0f4882 structs_test use testify require.True instead of t.Fatal 2019-04-23 17:00:11 -04:00
Arshneet Singh d4e7a5c005 Add comments to functions, and use require instead of assert 2019-04-23 09:57:21 -07:00
Arshneet Singh 4cf4324b8f Remove allowPlanOptimization from schedulers 2019-04-23 09:18:02 -07:00
Arshneet Singh 0dd4c109e8 Compat tags 2019-04-23 09:18:01 -07:00
Arshneet Singh 65f5fab131 Add tests for plan normalization 2019-04-23 09:18:01 -07:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Danielle 198a838b61
Merge pull request #5512 from hashicorp/dani/f-alloc-stop
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire 832f607433 allocs: Add nomad alloc stop
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.

This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.

The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00
Lang Martin 8aa97cff13 tests over setwise equality of fingerprinted parts 2019-04-19 15:49:24 -04:00
Lang Martin 7de6e28ddc structs need to keep assert Equal interface implementation for tests 2019-04-19 15:23:49 -04:00
Lang Martin 977d33970b structs equals use labeled continue for clarity 2019-04-19 15:23:48 -04:00
Lang Martin 7b99488afa struct equals use a working pattern for setwise comparison 2019-04-19 15:23:48 -04:00
Lang Martin eba4e29440 client fingerprinter doesn't overwrite manual configuration
Revert "Revert accidental merge of pr #5482"
This reverts commit c45652ab8c113487b9d4fbfb107782cbcf8a85b0.
2019-04-19 15:23:48 -04:00
Preetha Appan 22109d1e20
Add preemption related fields to AllocationListStub 2019-04-18 10:36:44 -05:00
Lang Martin a2a1e7829d Revert accidental merge of pr #5482
Revert "fingerprint Constraints and Affinities have Equals, as set"
This reverts commit 596f16fb5f1a4a6766a57b3311af806d22382609.

Revert "client tests assert the independent handling of interface and speed"
This reverts commit 7857ac5993a578474d0570819f99b7b6e027de40.

Revert "structs missed applying a style change from the review"
This reverts commit 658916e3274efa438beadc2535f47109d0c2f0f2.

Revert "client, structs comments"
This reverts commit be2838d6baa9d382a5013fa80ea016856f28ade2.

Revert "client fingerprint updateNetworks preserves the network configuration"
This reverts commit fc309cb430e62d8e66267a724f006ae9abe1c63c.

Revert "client_test cleanup comments from review"
This reverts commit bc0bf4efb9114e699bc662f50c8f12319b6b3445.

Revert "client Networks Equals is set equality"
This reverts commit f8d432345b54b1953a4a4c719b9269f845e3e573.

Revert "struct cleanup indentation in RequestedDevice Equals"
This reverts commit f4746411cab328215def6508955b160a53452da3.

Revert "struct Equals checks for identity before value checking"
This reverts commit 0767a4665ed30ab8d9586a59a74db75d51fd9226.

Revert "fix client-test, avoid hardwired platform dependecy on lo0"
This reverts commit e89dbb2ab182b6368507dbcd33c3342223eb0ae7.

Revert "refactor error in client fingerprint to include the offending data"
This reverts commit a7fed726c6e0264d42a58410d840adde780a30f5.

Revert "add client updateNodeResources to merge but preserve manual config"
This reverts commit 84bd433c7e1d030193e054ec23474380ff3b9032.

Revert "refactor struts.RequestedDevice to have its own Equals"
This reverts commit 689782524090e51183474516715aa2f34908b8e6.

Revert "refactor structs.Resource.Networks to have its own Equals"
This reverts commit 49e2e6c77bb3eaa4577772b36c62205061c92fa1.

Revert "refactor structs.Resource.Devices to have its own Equals"
This reverts commit 4ede9226bb971ae42cc203560ed0029897aec2c9.

Revert "add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources"
This reverts commit 49fbaace5298d5ccf031eb7ebec93906e1d468b5.

Revert "add structs.Resources Equals"
This reverts commit 8528a2a2a6450e4462a1d02741571b5efcb45f0b.

Revert "test that fingerprint resources are updated, net not clobbered"
This reverts commit 8ee02ddd23bafc87b9fce52b60c6026335bb722d.
2019-04-11 10:29:40 -04:00
Lang Martin 07ff740408 fingerprint Constraints and Affinities have Equals, as set 2019-04-11 09:56:22 -04:00
Lang Martin 8f07698c03 structs missed applying a style change from the review 2019-04-11 09:56:22 -04:00
Lang Martin 7258a13c72 client, structs comments 2019-04-11 09:56:22 -04:00
Lang Martin 1878bf694e client Networks Equals is set equality 2019-04-11 09:56:22 -04:00
Lang Martin e1c91afd19 struct cleanup indentation in RequestedDevice Equals 2019-04-11 09:56:22 -04:00
Lang Martin 0c90efebdc struct Equals checks for identity before value checking 2019-04-11 09:56:22 -04:00
Lang Martin 1a594b53f6 refactor struts.RequestedDevice to have its own Equals 2019-04-11 09:56:21 -04:00
Lang Martin ec1ccdeda0 refactor structs.Resource.Networks to have its own Equals
NodeResource.Networks uses the same function
2019-04-11 09:56:21 -04:00
Lang Martin 06008465c4 refactor structs.Resource.Devices to have its own Equals 2019-04-11 09:56:21 -04:00
Lang Martin 36f3022246 add COMPAT(0.10): Remove in 0.10 notes to impl for structs.Resources 2019-04-11 09:56:21 -04:00
Lang Martin d4567e9909 add structs.Resources Equals 2019-04-11 09:56:21 -04:00
Danielle Lancashire e135876493 allocs: Add nomad alloc restart
This adds a `nomad alloc restart` command and api that allows a job operator
with the alloc-lifecycle acl to perform an in-place restart of a Nomad
allocation, or a given subtask.
2019-04-11 14:25:49 +02:00
Chris Baker 34e100cc96
server vault client: use two vault clients, one with namespace, one without for /sys calls 2019-04-10 10:34:10 -05:00
Michael Schurter cc7768c170
Update nomad/structs/config/vault.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Chris Baker a26d4fe1e5
docs: -vault-namespace, VAULT_NAMESPACE, and config
agent: added VAULT_NAMESPACE env-based configuration
2019-04-10 10:34:10 -05:00
Chris Baker d3041cdb17
wip: added config parsing support, CLI flag, still need more testing, VAULT_ var, documentation 2019-04-10 10:34:10 -05:00
Chris Baker 0eaeef872f
config/docs: added `namespace` to vault config
server/client: process `namespace` config, setting on the instantiated vault client
2019-04-10 10:34:10 -05:00
Michael Schurter c0cd96ef75
Update nomad/job_endpoint_test.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Michael Schurter 188c32421a
Update nomad/job_endpoint.go
Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
2019-04-10 10:34:10 -05:00
Chris Baker 0ba1600545
server/job_endpoint: accept vault token and pass as part of Job.RegisterRequest [#4555] 2019-04-10 10:34:10 -05:00
James Rasell 9470507cf4
Add NodeName to the alloc/job status outputs.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.

This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.

Closes #2359
Cloess #1180
2019-04-10 10:34:10 -05:00
Michael Schurter 45b4827ad7 Bump to 0.9.1-dev 2019-04-09 09:01:48 -07:00
Nomad Release bot e307734e4a Generate files for 0.9.0 release 2019-04-09 01:56:00 +00:00
Michael Schurter 3af602b633 Remove 0.9.0-rc2 generated files 2019-04-03 07:41:09 -07:00
Nomad Release bot 16b4336ccf Generate files for 0.9.0-rc2 release 2019-04-03 01:54:29 +00:00
Michael Schurter 9afbc45cff Bump to dev post-0.9.0-rc1 release 2019-03-22 08:26:30 -07:00
Nomad Release bot 3ab3dd4105 Generate files for 0.9.0-rc1 release 2019-03-21 19:06:13 +00:00
HashedDan caad68e799 server: inconsistent receiver notation corrected
Signed-off-by: HashedDan <georgedanielmangum@gmail.com>
2019-03-16 17:53:53 -05:00
Alex Dadgar e779d9444b
Update nomad/eval_endpoint_test.go
Co-Authored-By: schmichael <michael.schurter@gmail.com>
2019-03-05 15:19:15 -08:00
Alex Dadgar 1857f5d7c1
Update nomad/eval_endpoint.go
Co-Authored-By: schmichael <michael.schurter@gmail.com>
2019-03-05 15:19:07 -08:00
Michael Schurter e37bbb21a5 nomad: simplify code and improve parameter name 2019-03-04 13:44:14 -08:00
Michael Schurter 05f51499ba nomad: compare current eval when setting WaitIndex
Consider currently dequeued Evaluation's ModifyIndex when determining
its WaitIndex. Normally the Evaluation itself would already be in the
state store snapshot used to determine the WaitIndex. However, since the FSM
applies Raft messages to the state store concurrently with Dequeueing,
it's possible the currently dequeued Evaluation won't yet exist in the
state store snapshot used by JobsForEval.

This can be solved by always considering the current eval's modify index
and using it if it is greater than all of the evals returned by the
state store.
2019-03-01 15:23:39 -08:00
Michael Schurter 3f386e3951 Remove generated files for 0.9.0-beta3 2019-02-26 10:34:08 -08:00
Michael Schurter d74755900e Generate files for 0.9.0-beta3 release 2019-02-26 09:44:49 -08:00
Charlie Voiselle 604c49beb8
Merge pull request #5344 from hashicorp/b-nexteval-for-failed-follow-up
Set NextEval when making `failed-follow-up` evals
2019-02-22 14:14:41 -08:00
Charlie Voiselle 006afdca9b Added comments
* caller should created eval id
* prev/next eval used in failed-follow-up
2019-02-22 10:22:52 -08:00
Charlie Voiselle c28c195f42 Set NextEval when making `failed-follow-up` evals
This allows users to locate failed-follow-up evals more easily
2019-02-20 16:07:11 -08:00
Michael Schurter 6580ed668e client: don't redownload completed artifacts on retries
Track the download status of each artifact independently so that if only
one of many artifacts fails to download, completed artifacts aren't
downloaded again.
2019-02-20 08:45:12 -08:00
Michael Schurter 2db91425e3 Remove 0.9.0-beta2 generated files 2019-02-01 08:28:44 -08:00
Alex Dadgar 84d0afccae Generate files for 0.9.0-beta2 2019-01-30 13:31:50 -08:00
Alex Dadgar d2e5ede119 remove generated structs 2019-01-30 12:38:34 -08:00
Alex Dadgar 41265d4d61 Change types of weights on spread/affinity 2019-01-30 12:20:38 -08:00
Alex Dadgar bc804dda2e Nomad 0.9.0-beta1 generated code 2019-01-30 10:49:44 -08:00
Preetha Appan c848a1d387
ensure tests run a 0.9 server 2019-01-29 16:19:45 -06:00
Preetha Appan 496eb1de0c
Guard operator endpoints for minimum server version 2019-01-29 15:50:36 -06:00
Preetha Appan 7578522f58
variable name fix 2019-01-29 13:48:45 -06:00
Preetha Appan a6cebbbf9e
Make sure that all servers are 0.9 before applying scheduler config entry 2019-01-29 12:47:42 -06:00
Michael Schurter 3aba7ee826 nomad: fix panic when no node conn found
A missing return would cause a panic when a server could find no route
to a client.
2019-01-28 21:55:35 -08:00
Mahmood Ali f9164dae67
Merge pull request #5228 from hashicorp/f-vault-err-tweaks
server/vault: tweak error messages
2019-01-25 11:17:31 -05:00
Mahmood Ali f4560d8a2a server/vault: tweak error messages
Closes #5139
2019-01-25 10:33:54 -05:00
Preetha ec92bf673c
Merge pull request #5223 from hashicorp/f-jobs-list-datacenters
Add Datacenters to the JobListStub struct
2019-01-24 08:13:30 -06:00
Michael Schurter 13f061a83f
Merge pull request #5196 from hashicorp/f-plugin-utils
Make plugins/shared external and make pluginutls/
2019-01-23 06:59:32 -08:00
Michael Schurter 32daa7b47b goimports until make check is happy 2019-01-23 06:27:14 -08:00
Michael Schurter be0bab7c3f move pluginutils -> helper/pluginutils
I wanted a different color bikeshed, so I get to paint it
2019-01-22 15:50:08 -08:00
Alex Dadgar 4bdccab550 goimports 2019-01-22 15:44:31 -08:00
Alex Dadgar cdcd3c929c loader and singleton 2019-01-22 15:11:57 -08:00
Alex Dadgar 6c2782f037 move catalog + grpcutils 2019-01-22 15:11:57 -08:00
Preetha Appan 38422642cb
Use DesiredState to determine whether to stop sending task events 2019-01-22 16:43:32 -06:00
Michael Lange ce7bc4f56f Add Datacenters to the JobsListStub struct
So it can be used for filtering the full list of jobs
2019-01-22 11:16:35 -08:00
Mahmood Ali e1803b685b tests: deflake TestClientAllocations_GarbageCollect_Remote
Use the same strategy as one in f2f383b07543a09ca989b82738926f7248e1ab28
2019-01-19 09:07:27 -05:00
Mahmood Ali b2203a3a22
Merge pull request #5215 from hashicorp/test-fix-garbagecollect
test: fix flaky garbage collect test
2019-01-18 21:10:01 -05:00
Mahmood Ali 05e32fb525
Merge pull request #5213 from hashicorp/b-api-separate
Slimmer /api package
2019-01-18 20:52:53 -05:00
Michael Schurter 0cd35ba335 test: fix flaky garbage collect test
This seems to fix TestClientAllocations_GarbageCollectAll_Remote being
flaky.

This test confuses me. It joins 2 servers, but then goes out of its way
to make sure the test client only interacts with one. There are not
enough comments for me to figure out the precise assertions this test is
trying to make.

A good old fashioned wait-for-the-client-to-register seems to fix the
flakiness though. The error was that the node could not be found, so
this makes some sense. However, lots of other tests seem to use the same
"wait for node" logic and don't appear to be flaky, so who knows why
waiting fixes this one.

Passes with -race.
2019-01-18 16:01:30 -08:00
Mahmood Ali 7bdd43f3e0 api: avoid codegen for syncing
Given that the values will rarely change, specially considering that any
changes would be backward incompatible change.  As such, it's simpler to
keep syncing manually in the rare occasion and avoid the syncing code
overhead.
2019-01-18 18:52:31 -05:00
Preetha Appan 510d7839e4
code review comments 2019-01-18 17:41:39 -06:00
Mahmood Ali 253532ec00 api: avoid import nomad/structs pkg
nomad/structs is an internal package and imports many libraries (e.g.
raft, codec) that are not relevant to api clients, and may cause
unnecessary dependency pain (e.g. `github.com/ugorji/go/codec`
version is very old now).

Here, we add a code generator that imports the relevant constants from
`nomad/structs`.

I considered using this approach for other structs, but didn't find a
quick viable way to reduce duplication.  `nomad/structs` use values as
struct fields (e.g. `string`), while `api` uses value pointer (e.g.
`*string`) instead.  Also, sometimes, `api` structs contain deprecated
fields or additional documentation, so simple copy-paste doesn't work.
For these reasons, I opt to keep the status quo.
2019-01-18 14:51:19 -05:00
Preetha Appan be9656d195
fix linting 2019-01-17 15:36:33 -06:00
Preetha Appan 0f8a113ead
Refactor to find jobs with child instances more effeciently
also added unit tests
2019-01-17 14:29:48 -06:00
Preetha Appan be36fee48e
Use IsParameterized/isPeriodic methods 2019-01-17 12:15:42 -06:00
Preetha Appan 81a8f18cac
Fix bug in reconcile summaries that affects periodic/parameterized jobs
This fixes incorrect parent job summaries by recomputing them in the
ReconcileJobSummaries method in the state store
2019-01-17 12:01:01 -06:00
Nick Ethier 597b7b751d
tr: add retry /w backoff to stats_hook failure 2019-01-12 12:18:24 -05:00
Mahmood Ali 4414a2ce1c tests: remove tests for unsupported features
With switching to driver plugins, driver validation is quite tricky and
we need to do some design thinking before supporting it against.
2019-01-10 10:21:48 -05:00
Nick Wales 7a7b5da0df Adds optional Consul service tags to nomad server and agent services, gh#4297 2019-01-09 22:02:46 +00:00
Mahmood Ali 1f2473263e fix more cases of logging arity errors 2019-01-09 09:22:47 -05:00
Mahmood Ali 6f077a73dc Fix panic on failure
Error expects an odd number of arguments, and panics otherwise.
2019-01-08 12:19:44 -05:00
Michael Schurter 324e989327
Merge pull request #5034 from hashicorp/test-fix-races
Test fix races
2019-01-08 07:04:09 -08:00
Alex Dadgar 79cfe26021 vet 2019-01-07 14:49:41 -08:00
Alex Dadgar 8a35d7b1dd Test recovery 2019-01-07 14:49:41 -08:00
Nick Ethier a96afb6c91
fix tests that fail as a result of async client startup 2018-12-20 00:53:44 -05:00
Michael Schurter 6c1dbb659d test: fix race and nil panic in nomad/ tests
Race was test only and due to unlocked map access.

Panic was test only and due to checking a field on a struct even when we
knew the struct was nil.

Race output that was fixed:
```
==================
WARNING: DATA RACE
Read at 0x00c000697dd0 by goroutine 768:
  runtime.mapaccess2()
      /usr/local/go/src/runtime/map.go:439 +0x0
  github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds.func8()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:402
+0xe6
  github.com/hashicorp/nomad/testutil.WaitForResultRetries()
      /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:30
+0x5a
  github.com/hashicorp/nomad/testutil.WaitForResult()
      /home/schmichael/go/src/github.com/hashicorp/nomad/testutil/wait.go:22
+0x57
  github.com/hashicorp/nomad/nomad.TestLeader_PeriodicDispatcher_Restore_Adds()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader_test.go:401
+0xb53
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162

Previous write at 0x00c000697dd0 by goroutine 569:
  runtime.mapassign()
      /usr/local/go/src/runtime/map.go:549 +0x0
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).Add()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:224
+0x2eb
  github.com/hashicorp/nomad/nomad.(*Server).restorePeriodicDispatcher()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:394
+0x29a
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:234
+0x593
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Goroutine 768 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:878 +0x650
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1119 +0xa8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1117 +0x4ee
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1034 +0x2ee
  main.main()
      _testmain.go:1150 +0x221

Goroutine 569 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 004fa574cb test: fix race in eval broker update chan
Similar to previous commits the delayed eval update chan was set and
access from different goroutines causing a race. Passing the chan on the
stack resolves the race.

Race output from `go test -race -run 'Server_RPC$'` in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c000339150 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:708
+0x3dc
  github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:174
+0xc4
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:718
+0x1fd
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c000339150 by goroutine 73:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).runDelayedEvalsWatcher()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:771
+0x176

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 73 (running) created at:
  github.com/hashicorp/nomad/nomad.(*EvalBroker).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/eval_broker.go:170
+0x173
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:207
+0x355
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 1c137690c4 test: fix race around block eval chans
Similar to previous commit, stop and change chans were being set and
accessed from different goroutines. Passing the chans on the stack
resolves the race.

Output from `go test -race -run 'Server_RPC$' in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c0002b4e10 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:648
+0x32a
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149
+0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721
+0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0002b4e10 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).watchCapacity()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:483
+0xfe

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 75 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:141
+0xba
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210
+0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
==================
WARNING: DATA RACE
Write at 0x00c0002b4e50 by goroutine 63:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).Flush()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:649
+0x388
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:149
+0x12b
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:721
+0x232
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:122
+0x95d
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0002b4e50 by goroutine 77:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).prune()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:690
+0xae

Goroutine 63 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 77 (finished) created at:
  github.com/hashicorp/nomad/nomad.(*BlockedEvals).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/blocked_evals.go:142
+0xdc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:210
+0x392
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Michael Schurter 80263861aa test: fix race around updateCh handling
PeriodicDispatch.SetEnabled sets updateCh in one goroutine, and
PeriodicDispatch.run accesses updateCh in another.

The race can be prevented by having SetEnabled pass updateCh to run.

Race detector output from `go test -race -run TestServer_RPC` in nomad/

```
==================
WARNING: DATA RACE
Write at 0x00c0001d3f48 by goroutine 75:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:468
+0x256
  github.com/hashicorp/nomad/nomad.(*Server).revokeLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:724
+0x267
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:131
+0x3c
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:163
+0x4dd
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c

Previous read at 0x00c0001d3f48 by goroutine 515:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).run()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:338
+0x177

Goroutine 75 (running) created at:
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:70
+0x269

Goroutine 515 (running) created at:
  github.com/hashicorp/nomad/nomad.(*PeriodicDispatch).SetEnabled()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/periodic.go:176
+0x1bc
  github.com/hashicorp/nomad/nomad.(*Server).establishLeadership()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:231
+0x582
  github.com/hashicorp/nomad/nomad.(*Server).leaderLoop()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:117
+0x82e
  github.com/hashicorp/nomad/nomad.(*Server).monitorLeadership.func1()
      /home/schmichael/go/src/github.com/hashicorp/nomad/nomad/leader.go:72
+0x6c
==================
```
2018-12-19 15:48:02 -08:00
Danielle Tomlinson 3647b701a6 taskrunner: Emit task events when a hook fails 2018-12-13 18:20:18 +01:00
Chris Baker af593c401c
Merge pull request #4974 from hashicorp/b-1173-log-spam
rpc accept loop: added backoff on logging
2018-12-12 16:54:42 -08:00
Chris Baker 121a9eb8cb some changes for more idiomatic code 2018-12-12 23:11:17 +00:00
Alex Dadgar fbe4d67d1b fix iops related tests 2018-12-12 14:32:22 -08:00
Chris Baker 34600f8b75 fixed bug in loop delay 2018-12-12 19:16:41 +00:00
Chris Baker 89c64932c1 gofmt 2018-12-12 19:09:06 +00:00
Chris Baker 22c11d8799 improved code for readability 2018-12-12 18:52:06 +00:00
Preetha f406e66ab8
Merge pull request #4881 from hashicorp/f-device-preemption
Device preemption
2018-12-11 18:34:19 -06:00
Alex Dadgar 1531b6d534
Merge pull request #4970 from hashicorp/f-no-iops
Deprecate IOPS
2018-12-11 12:51:22 -08:00
Chris Baker 59beae35df nomad/rpc listener: modified to throttle logging on "permanent" Accept() errors as well (with a higher delay cap) 2018-12-07 22:14:15 +00:00
Chris Baker 707bac0a7b rpc accept loop: added backoff on logging for failed connections, in case there is a fast fail loop (NMD-1173) 2018-12-07 20:12:55 +00:00
Alex Dadgar c918a96490 Warn if IOPS is being used 2018-12-06 16:17:09 -08:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS have been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Alex Dadgar 14a61ea3ea Don't GC running but desired stop allocations
This PR fixes an edge case where we could GC an allocation that was in a
desired stop state but had not terminated yet. This can be hit if the
client hasn't shutdown the allocation yet or if the allocation is still
shutting down (long kill_timeout).

Fixes https://github.com/hashicorp/nomad/issues/4940
2018-12-05 13:01:12 -08:00
Mahmood Ali adb4d69576
Merge pull request #4956 from hashicorp/b-vault-client-tweaks-followup
server/vault: Lock Vault expiration tracking
2018-12-04 19:46:59 -05:00
Mahmood Ali 50e38104a5 server/nomad: Lock Vault expiration tracking
`currentExpiration` field is accessed in multiple goroutines: Stats and
renewal, so needs locking.

I don't anticipate high contention, so simple mutex suffices.
2018-12-04 09:29:48 -05:00
Preetha Appan 8656d3379f
Add guards around subtracting summary count 2018-12-03 11:16:35 -06:00
Danielle Tomlinson 51a9f7369e
Merge pull request #4936 from hashicorp/f-legacy-refactor
Refactor and repackage client/driver
2018-11-30 13:38:06 +01:00
Danielle Tomlinson d4cbd608ff nomad: Remove on-submission job validation
With the introduction of driver plugins, we're temporarily relying on
_run time validation_ of driver configurations, rather than submission
time.
2018-11-30 10:47:08 +01:00
Nick Ethier 80ae7e34f4
Merge pull request #4906 from hashicorp/f-metric-prefix-master
Port metric prefix filtering to master
2018-11-29 22:27:47 -05:00
Nick Ethier b1484aec33
nomad: fix hclog usage 2018-11-29 22:27:39 -05:00
Mahmood Ali 0a2611e41f vault: protect against empty Vault secret response
Also, fix a case where a successful second attempt of loading token can
cause a panic.
2018-11-29 09:34:17 -05:00