open-nomad

Commit Graph

Author	SHA1	Message	Date
Michael Schurter	85d4309128	docs: add 0.11.0-rc1 to download page update banner from "Beta" to "Release Candidate"	2020-04-07 07:48:53 -07:00
Mahmood Ali	14d6fec05a	tests: deflake some SetServer related tests Some tests assert on the number of servers, e.g. TestHTTP_AgentSetServers and TestHTTP_AgentListServers_ACL. In dev and test modes, however, the agent starts with servers having duplicate entries for advertised and normalized RPC values, then settles with one unique value after Raft/Serf re-sets servers with one single unique value. This leads to flakiness, as the test will fail if the assertion runs before the Serf update takes effect. Here, we update the initial dev handling so it only adds a unique value if the advertised and normalized values are the same. Sample log lines illustrating the problem: ``` === CONT TestHTTP_AgentSetServers TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:9008 Address:127.0.0.1:9008}]" TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO] nomad: serf: EventMemberJoin: TestHTTP_AgentSetServers.global 127.0.0.1 TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.035Z [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:9008, 127.0.0.1:9008] old_servers=[] ... TestHTTP_AgentSetServers: agent_endpoint_test.go:759: Error Trace: agent_endpoint_test.go:759 http_test.go:1089 agent_endpoint_test.go:705 Error: "[127.0.0.1:9008 127.0.0.1:9008]" should have 1 item(s), but has 2 Test: TestHTTP_AgentSetServers ```	2020-04-07 09:27:48 -04:00
Charles Z	42e1070aa4	Fix wrong title in sidecar_task stanza docs (#7335)	2020-04-07 08:28:30 -04:00
Spencer Owen	51741f7aad	Improve vault language (#7644)	2020-04-07 07:54:25 -04:00
Tim Gross	caa258b924	docs: add warnings for CSI plugin jobspec (#7642) * Node/monolith plugins require root privileges and this wasn't being made super clear. * Node/monolith plugins should always be run as system jobs.	2020-04-07 07:51:50 -04:00
Lang Martin	c0dbcbef5f	e2e: csi: wait for volume write claims to be released before starting read jobs (#7641)	2020-04-07 07:40:44 -04:00
Michael Schurter	b5104e5b6a	Merge pull request #7643 from hashicorp/b-7537 ar/bridge_networking: ensure cni configuration is loaded	2020-04-06 20:25:40 -07:00
Michael Schurter	3c5205c534	docs: add #7643 to changelog	2020-04-06 20:25:09 -07:00
Nick Ethier	44ad5d96d8	ar/bridge: use cni.IsCNINotInitialized helper	2020-04-06 21:44:01 -04:00
Nick Ethier	58fe326090	ar/bridge: better cni status err handling	2020-04-06 21:21:42 -04:00
Nick Ethier	6a286777c7	ar/bridge: ensure cni configuration is always loaded	2020-04-06 21:02:26 -04:00
Nick Ethier	5166806993	Merge pull request #7600 from hashicorp/b-5767 tr/service_hook: prevent Update from running before Poststart finish	2020-04-06 16:52:42 -04:00
Tim Gross	0a9b32e94e	docs: fix broken internal link on job plan page (#7640)	2020-04-06 16:39:11 -04:00
Michael Lange	35d4b48644	Merge pull request #7577 from hashicorp/f-ui/csi-beta-label UI: Change CSI to Storage and mark it as beta	2020-04-06 13:21:11 -07:00
Nick Ethier	567609e101	tr/service_hook: reset initialized flag during deregister	2020-04-06 16:05:36 -04:00
Buck Doyle	f10906e006	UI: add exec handling for dead jobs/task states (#7637) This closes #7456. It hides the terminal when the job is dead and displays an error when trying to open an exec session for a task that isn’t running. There’s a skipped test for the latter behaviour that I’ll have to come back for.	2020-04-06 14:08:22 -05:00
Buck Doyle	fc7de8b153	UI: add live-updating to exec sidebar (#7499) This closes #7454. It makes use of the existing watchable tools to allow the exec popup sidebar to be live-updating. It also adds alphabetic sorting of task groups and tasks.	2020-04-06 13:52:42 -05:00
Drew Bailey	476ca7e7b6	Merge pull request #7638 from hashicorp/audit-docs-queryparams add note about query params for filtering	2020-04-06 13:08:55 -04:00
Drew Bailey	71b687e7d8	add note about query params for filtering	2020-04-06 13:07:38 -04:00
Drew Bailey	4ab7c03641	Merge pull request #7618 from hashicorp/b-shutdown-delay-updates Fixes bug that prevented group shutdown_delay updates	2020-04-06 13:05:20 -04:00
Michael Lange	bf9887083b	Merge pull request #7630 from hashicorp/f-ui/csi-acceptance-tests UI: CSI Acceptance Tests	2020-04-06 09:37:45 -07:00
Drew Bailey	0d4bb6bf92	guard against nil maps	2020-04-06 12:25:50 -04:00
Drew Bailey	0c26161e56	update changelog	2020-04-06 11:58:27 -04:00
Tim Gross	50f807060a	e2e: csi tests can only run on linux (#7635)	2020-04-06 11:57:59 -04:00
Drew Bailey	3b8afce9e6	test added and removed	2020-04-06 11:53:46 -04:00
Drew Bailey	0d550049e9	ensure shutdown delay can be removed	2020-04-06 11:33:04 -04:00
Drew Bailey	9874e7b21d	Group shutdown delay fixes Group shutdown delay updates were not properly handled in Update hook. This commit also ensures that plan output is displayed.	2020-04-06 11:29:12 -04:00
Tim Gross	73dc2ad443	e2e/csi: add waiting for alloc stop	2020-04-06 10:15:55 -04:00
Tim Gross	027277a0d9	csi: make volume GC in job deregister safely async The `Job.Deregister` call will block on the client CSI controller RPCs while the alloc still exists on the Nomad client node. So we need to make the volume claim reaping async from the `Job.Deregister`. This allows `nomad job stop` to return immediately. In order to make this work, this changeset changes the volume GC so that the GC jobs are on a by-volume basis rather than a by-job basis; we won't have to query the (possibly deleted) job at the time of volume GC. We smuggle the volume ID and whether it's a purge into the GC eval ID the same way we smuggled the job ID previously.	2020-04-06 10:15:55 -04:00
Tim Gross	5a3b45864d	csi: fix unpublish workflow ID mismatches The CSI plugins use the external volume ID for all operations, but the Client CSI RPCs use the Nomad volume ID (human-friendly) for the mount paths. Pass the External ID as an arg in the RPC call so that the unpublish workflows have it without calling back to the server to find the external ID. The controller CSI plugins need the CSI node ID (or in other words, the storage provider's view of node ID like the EC2 instance ID), not the Nomad node ID, to determine how to detach the external volume.	2020-04-06 10:15:55 -04:00
Tim Gross	161f9aedc3	scheduler: prevent a reported NPE for CSI (#7633)	2020-04-06 09:42:27 -04:00
Mahmood Ali	23be53a366	Merge pull request #7612 from hashicorp/b-auth-alloc-exec-ws Authenticate alloc/exec websocket requests	2020-04-06 09:24:51 -04:00
Michael Lange	2955d356e7	Test coverage for the volume detail page	2020-04-04 17:13:40 -07:00
Michael Lange	25f4f5a61d	Sort allocation tables by modify index	2020-04-04 17:11:58 -07:00
Michael Lange	a1d2e585a1	Update breadcrumb to match side menu	2020-04-04 17:11:29 -07:00
Michael Lange	6b798518b9	Fix the allocations page component to support multiple prop keys It was designed to be used this way, but allocationFor has never worked as intended 🤦	2020-04-04 10:56:12 -07:00
Michael Lange	76bead58a3	Add page size select tests to volumes list tests	2020-04-04 09:58:34 -07:00
Michael Lange	e8e41c5757	Acceptance tests for the volumes list page	2020-04-03 19:28:12 -07:00
Michael Lange	0d90d082bc	Page object for volumes list	2020-04-03 19:28:11 -07:00
Michael Lange	59427662d0	Handle namespaces in the mirage handler for volumes	2020-04-03 19:28:10 -07:00
Michael Lange	fb44f76800	Correctly handle the namespace query param and forbidden state	2020-04-03 19:28:09 -07:00
Michael Lange	280fa5d53b	Annotate volume row and make the tr clickable	2020-04-03 19:27:44 -07:00
Michael Lange	62aa943a95	Filter out volumes that don't match the chosen namespace	2020-04-03 19:27:11 -07:00
Michael Lange	62b7a07189	Sort alphabetically, A first	2020-04-03 19:26:26 -07:00
Michael Lange	1729d41509	Merge pull request #7574 from hashicorp/f-ui/configurable-page-sizes UI Configurable Page Sizes	2020-04-03 16:06:17 -07:00
Lang Martin	1750426d04	csi: run volume claim GC on `job stop -purge` (#7615) * nomad/state/state_store: error message copy/paste error * nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse * nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister * nomad/core_sched: make volumeClaimReap available without a CoreSched * nomad/job_endpoint: Deregister return early if the job is missing * nomad/job_endpoint_test: job Deregistration is idempotent * nomad/core_sched: conditionally ignore alloc status in volumeClaimReap * nomad/job_endpoint: volumeClaimReap all allocations, even running * nomad/core_sched_test: extra argument to collectClaimsToGCImpl * nomad/job_endpoint: job deregistration is not idempotent	2020-04-03 17:37:26 -04:00
Mahmood Ali	6c950e971d	Merge pull request #7622 from hashicorp/tests-deflake-TestAutopilot_RollingUpdate tests: deflake TestAutopilot_RollingUpdate	2020-04-03 17:29:55 -04:00
Mahmood Ali	816a93ed4a	tests: deflake TestAutopilot_RollingUpdate I hypothesize that the flakiness in rolling update is due to shutting down s3 server before s4 is properly added as a voter. The chain of the flakiness is as follows: 1. Bootstrap with s1, s2, s3 2. Add s4 3. Wait for servers to register with 3 voting peers * But we already have 3 voters (s1, s2, and s3) * s4 is added as a non-voter in Raft v3 and must wait until autopilot promotes it 4. Test proceeds without s4 being a voter 5. s3 shutdown 6. cluster changes stall due to leader election and too many pending configuration changes (e.g. removing s3 from raft, promoting s4). Here, I have the test wait until s4 is marked as a voter before shutting down s3, so we don't have too many configuration changes at once. In https://circleci.com/gh/hashicorp/nomad/57092, I noticed the following events: ``` TestAutopilot_RollingUpdate: autopilot_test.go:204: adding server s4 TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO] nomad/serf.go:60: nomad: adding server: server="nomad-137.global (Addr: 127.0.0.1:9177) (DC: dc1)" TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO] raft/raft.go:1018: nomad.raft: updating configuration: command=AddNonvoter server-id=c54b5bf4-1159-34f6-032d-56aefeb08425 server-addr=127.0.0.1:9177 servers="[{Suffrage:Voter ID:df01ba65-d1b2-17a9-f792-a4459b3a7c09 Address:127.0.0.1:9171} {Suffrage:Voter ID:c3337778-811e-2675-87f5-006309888387 Address:127.0.0.1:9173} {Suffrage:Voter ID:186d5e15-c473-e2b3-b5a4-3259a84e10ef Address:127.0.0.1:9169} {Suffrage:Nonvoter ID:c54b5bf4-1159-34f6-032d-56aefeb08425 Address:127.0.0.1:9177}]" TestAutopilot_RollingUpdate: autopilot_test.go:218: shutting down server s3 TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.797Z [INFO] raft/replication.go:456: nomad.raft: aborting pipeline replication: peer="{Nonvoter c54b5bf4-1159-34f6-032d-56aefeb08425 127.0.0.1:9177}" TestAutopilot_RollingUpdate: autopilot_test.go:235: waiting for s4 to stabalize and be promoted TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.975Z [ERROR] raft/raft.go:1656: nomad.raft: failed to make requestVote RPC: target="{Voter c3337778-811e-2675-87f5-006309888387 127.0.0.1:9173}" error="dial tcp 127.0.0.1:9173: connect: connection refused" TestAutopilot_RollingUpdate: retry.go:121: autopilot_test.go:241: don't want "c3337778-811e-2675-87f5-006309888387" autopilot_test.go:241: didn't find map[c54b5bf4-1159-34f6-032d-56aefeb08425:true] in []raft.ServerID{"df01ba65-d1b2-17a9-f792-a4459b3a7c09", "186d5e15-c473-e2b3-b5a4-3259a84e10ef"} ``` Note how s3, c3337778, is present in the peers list in the final failure, but s4, c54b5bf4, is added as a Nonvoter and isn't present in the final peers list.	2020-04-03 17:15:41 -04:00
Tim Gross	966286fee5	fix encoding/decoding tags for api.Task (#7620) When `nomad job inspect` encodes the response, if the decoded JSON from the API doesn't exactly match the API struct, the field value will be omitted even if it has a value. We only want the JSON struct tag to `omitempty`.	2020-04-03 16:45:49 -04:00
Tim Gross	d81797ea33	e2e: improve test reliability for CSI (#7616) This changeset: * adds eval status to the error messages emitted when we have placement failure in tests. The implementation here isn't quite perfect but it's a lot better than "condition not met". * enforces the ordering of teardown of the CSI test * doesn't pass the purge flag to one of the two CSI tests, so that we exercise both code paths.	2020-04-03 15:52:58 -04:00
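
Commit 14d6fec05a above describes adding only one server entry when the advertised and normalized RPC addresses are the same. The sketch below is one way that idea could look in Go. The helper name `uniqueServers` and its signature are hypothetical, chosen for illustration, and are not taken from the Nomad source.

```go
package main

import "fmt"

// uniqueServers returns addrs with empty strings and duplicates removed,
// keeping the order in which each address was first seen.
// Hypothetical helper for illustration; not the function Nomad uses.
func uniqueServers(addrs ...string) []string {
	seen := make(map[string]struct{}, len(addrs))
	out := make([]string, 0, len(addrs))
	for _, a := range addrs {
		if a == "" {
			continue // skip unset addresses
		}
		if _, ok := seen[a]; ok {
			continue // already recorded this address
		}
		seen[a] = struct{}{}
		out = append(out, a)
	}
	return out
}

func main() {
	// In dev mode the advertised and normalized values can be identical,
	// as in the log lines quoted in the commit message.
	advertised := "127.0.0.1:9008"
	normalized := "127.0.0.1:9008"
	fmt.Println(uniqueServers(advertised, normalized)) // prints [127.0.0.1:9008]
}
```

With a list built this way, an assertion that expects exactly one server would not depend on whether a later Serf update has already collapsed the duplicates.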
