Commit graph

16987 commits

Author SHA1 Message Date
Drew Bailey f51a3d1f37
nomad state store must be modified through raft, rm local state change 2020-02-03 13:57:34 -05:00
Drew Bailey 74779f23e6
keep placed canaries aligned with alloc status 2020-02-03 13:57:33 -05:00
Drew Bailey 38965bf3b3
Merge pull request #7053 from hashicorp/b-client-monitor-acl-panic
Fix panic when monitoring a local client node
2020-02-03 13:45:46 -05:00
Michael Schurter ef498b26ab
docs: fix misspelling 2020-02-03 10:32:22 -08:00
Michael Lange 1d9c7515ad
Merge pull request #6979 from hashicorp/f/codeowners
Add the digital marketing team as the code owners for the website dir
2020-02-03 10:28:12 -08:00
Drew Bailey 29dc0688a2
Merge pull request #6996 from hashicorp/system-sched-ineligible-updates
System sched ignore ineligible updates
2020-02-03 13:22:30 -05:00
Drew Bailey 06f4a0d946
update changelog 2020-02-03 13:20:07 -05:00
Drew Bailey d830998572
agent Profile req nil check s.agent.Server()
clean up logic and tests
2020-02-03 13:20:05 -05:00
Drew Bailey c4f45f9bde
Fix panic when monitoring a local client node
Fixes a panic when accessing a.agent.Server() when agent is a client
instead. This pr removes a redundant ACL check since ACLs are validated
at the RPC layer. It also nil checks the agent server and uses Client()
when appropriate.
2020-02-03 13:20:04 -05:00
Mahmood Ali 06a283ea93
Merge pull request #7045 from hashicorp/b-rpc-fixes
Some fixes to connection pooling
2020-02-03 13:10:15 -05:00
Seth Hoenig e3684f84cb
Merge pull request #7054 from hashicorp/f-remove-leftover-debug-line
e2e: remove leftover e2e debug println
2020-02-03 12:02:19 -06:00
Mahmood Ali a4d58c3178 pool: Clear connection before releasing
This to be consistent with other connection clean up handler as well as consul's https://github.com/hashicorp/consul/blob/v1.6.3/agent/pool/pool.go#L468-L479 .
2020-02-03 12:41:11 -05:00
Seth Hoenig 057179edea e2e: remove leftover debug println statement 2020-02-03 11:15:38 -06:00
Mahmood Ali d43e005291
Merge pull request #7051 from hashicorp/b-copy-jobs-oss
sentinel: copy jobs to prevent mutation
2020-02-03 10:49:44 -05:00
Drew Bailey a255993ecd
update changelog 2020-02-03 09:04:09 -05:00
Drew Bailey 1c046a74d8
comment for filtering reason 2020-02-03 09:02:09 -05:00
Drew Bailey e71f132455
add test for node eligibility 2020-02-03 09:02:09 -05:00
Drew Bailey 6b492630dd
make diffSystemAllocsForNode aware of eligibility
diffSystemAllocs -> diffSystemAllocsForNode, this function is only used
for diffing system allocations, but lacked awareness of eligible
nodes and the node ID that the allocation was going to be placed.

This change now ignores a change if its existing allocation is on an
ineligible node. For a new allocation, it also checks tainted and
ineligible nodes in the same function instead of nil-ing out the diff
after computation in diffSystemAllocs
2020-02-03 09:02:08 -05:00
Drew Bailey e613a258da
ignore computed diffs if node is ineligible
test flakey, add temp sleeps for debugging

fix computed class
2020-02-03 09:02:08 -05:00
Michael Schurter 9bedd0202e sentinel: copy jobs to prevent mutation
It's unclear whether Sentinel code can mutate values passed to the eval,
so ensure it cannot by copying the job.
2020-02-03 08:48:51 -05:00
Michael Lange 71688376bd
Merge pull request #7047 from hashicorp/f-ui/node-drain-icons
UI: Node drain status light icons
2020-01-31 22:45:04 -08:00
Seth Hoenig 792a014b1a
Merge pull request #7027 from hashicorp/dev-connect-acls
connect acls - rebase all the things
2020-01-31 19:37:06 -06:00
Seth Hoenig 22eb91fbdf nomad/docs: increment version number to 0.10.4 2020-01-31 19:06:46 -06:00
Seth Hoenig 8872b44b58 docs: update chanagelog to mention connect with acls 2020-01-31 19:06:42 -06:00
Seth Hoenig 6bfa50acdc nomad: remove unused default schedular variable
This is from a merge conflict resolution that went the wrong direction.

I assumed the block had been added, but really it had been removed. Now,
it is removed once again.
2020-01-31 19:06:37 -06:00
Seth Hoenig d3cd6afd7e nomad: min cluster version for connect ACLs is now v0.10.4 2020-01-31 19:06:19 -06:00
Seth Hoenig db7bcba027 tests: set consul token for nomad client for testing SIDS TR hook 2020-01-31 19:06:15 -06:00
Seth Hoenig 9b20ca5b25 e2e: setup consul ACLs a little more correctly 2020-01-31 19:06:11 -06:00
Seth Hoenig 83c717a624 e2e: remove redundant extra API call for getting allocs 2020-01-31 19:06:07 -06:00
Seth Hoenig b212654b92 e2e: agent token was only being set for server0 2020-01-31 19:06:03 -06:00
Seth Hoenig f7a1e9cee3 e2e: use hclfmt on consul acls policy config files 2020-01-31 19:05:59 -06:00
Seth Hoenig e9e0d2e3fc e2e: uncomment test case that is not broken 2020-01-31 19:05:55 -06:00
Seth Hoenig df633ee45f e2e: do not use eventually when waiting for allocs
This test is causing panics. Unlike the other similar tests, this
one is using require.Eventually which is doing something bad, and
this change replaces it with a for-loop like the other tests.

Failure:

=== RUN   TestE2E/Connect
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestConnectDemo
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestMultiServiceConnect
=== RUN   TestE2E/Connect/*connect.ConnectClientStateE2ETest
panic: Fail in goroutine after TestE2E/Connect/*connect.ConnectE2ETest has completed

goroutine 38 [running]:
testing.(*common).Fail(0xc000656500)
	/opt/google/go/src/testing/testing.go:565 +0x11e
testing.(*common).Fail(0xc000656100)
	/opt/google/go/src/testing/testing.go:559 +0x96
testing.(*common).FailNow(0xc000656100)
	/opt/google/go/src/testing/testing.go:587 +0x2b
testing.(*common).Fatalf(0xc000656100, 0x1512f90, 0x10, 0xc000675f88, 0x1, 0x1)
	/opt/google/go/src/testing/testing.go:672 +0x91
github.com/hashicorp/nomad/e2e/connect.(*ConnectE2ETest).TestMultiServiceConnect.func1(0x0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/e2e/connect/multi_service.go:72 +0x296
github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc0004962a0, 0xc0002338f0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x27
created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1493 +0x272
FAIL	github.com/hashicorp/nomad/e2e	21.427s
2020-01-31 19:05:47 -06:00
Seth Hoenig 5e5fadbcdf e2e: remove forgotten unused field from new struct 2020-01-31 19:05:41 -06:00
Seth Hoenig fc498c2b96 e2e: e2e test for connect with consul acls
Provide script for managing Consul ACLs on a TF provisioned cluster for
e2e testing. Script can be used to 'enable' or 'disable' Consul ACLs,
and automatically takes care of the bootstrapping process if necessary.

The bootstrapping process takes a long time, so we may need to
extend the overall e2e timeout (20 minutes seems fine).

Introduces basic tests for Consul Connect with ACLs.
2020-01-31 19:05:36 -06:00
Seth Hoenig 4152254c3a tests: skip some SIDS hook tests if running tests as root 2020-01-31 19:05:32 -06:00
Seth Hoenig 441e8c7db7 client: additional test cases around failures in SIDS hook 2020-01-31 19:05:27 -06:00
Seth Hoenig c281b05fc0 client: PR cleanup - improved logging around kill task in SIDS hook 2020-01-31 19:05:23 -06:00
Seth Hoenig 03a4af9563 client: PR cleanup - shadow context variable 2020-01-31 19:05:19 -06:00
Seth Hoenig 587a5d4a8d nomad: make TaskGroup.UsesConnect helper a public helper 2020-01-31 19:05:11 -06:00
Seth Hoenig ee89a754f1 nomad: fix leftover missed refactoring in consul policy checking 2020-01-31 19:05:06 -06:00
Seth Hoenig 057f117592 client: manage TR kill from parent on SI token derivation failure
Re-orient the management of the tr.kill to happen in the parent of
the spawned goroutine that is doing the actual token derivation. This
makes the code a little more straightforward, making it easier to
reason about not leaking the worker goroutine.
2020-01-31 19:05:02 -06:00
Seth Hoenig c8761a3f11 client: set context timeout around SI token derivation
The derivation of an SI token needs to be safegaurded by a context
timeout, otherwise an unresponsive Consul could cause the siHook
to block forever on Prestart.
2020-01-31 19:04:56 -06:00
Seth Hoenig 4ee55fcd6c nomad,client: apply more comment/style PR tweaks 2020-01-31 19:04:52 -06:00
Seth Hoenig be7c671919 nomad,client: apply smaller PR suggestions
Apply smaller suggestions like doc strings, variable names, etc.

Co-Authored-By: Nick Ethier <nethier@hashicorp.com>
Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
2020-01-31 19:04:40 -06:00
Seth Hoenig 78a7d1e426 comments: cleanup some leftover debug comments and such 2020-01-31 19:04:35 -06:00
Seth Hoenig 5c5da95f34 client: skip task SI token file load failure if testing as root
The TestEnvoyBootstrapHook_maybeLoadSIToken test case only works when
running as a non-priveleged user, since it deliberately tries to read
an un-readable file to simulate a failure loading the SI token file.
2020-01-31 19:04:30 -06:00
Seth Hoenig ab7ae8bbb4 client: remove unused indirection for referencing consul executable
Was thinking about using the testing pattern where you create executable
shell scripts as test resources which "mock" the process a bit of code
is meant to fork+exec. Turns out that wasn't really necessary in this case.
2020-01-31 19:04:25 -06:00
Seth Hoenig 076cb4754e agent: re-enable the server in dev mode 2020-01-31 19:04:19 -06:00
Seth Hoenig 8219c78667 nomad: handle SI token revocations concurrently
Be able to revoke SI token accessors concurrently, and also
ratelimit the requests being made to Consul for the various
ACL API uses.
2020-01-31 19:04:14 -06:00