Commit Graph

18039 Commits

Author SHA1 Message Date
Michael Schurter 85d4309128 docs: add 0.11.0-rc1 to download page
update banner from "Beta" to "Release Candidate"
2020-04-07 07:48:53 -07:00
Mahmood Ali 14d6fec05a tests: deflake some SetServer related tests
Some tests assert on numbers on numbers of servers, e.g.
TestHTTP_AgentSetServers and TestHTTP_AgentListServers_ACL . Though, in dev and
test modes, the agent starts with servers having duplicate entries for
advertised and normalized RPC values, then settles with one unique value after
Raft/Serf re-sets servers with one single unique value.

This leads to flakiness, as the test will fail if assertion runs before Serf
update takes effect.

Here, we update the inital dev handling so it only adds a unique value if the
advertised and normalized values are the same.

Sample log lines illustrating the problem:

```
=== CONT  TestHTTP_AgentSetServers
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:127.0.0.1:9008 Address:127.0.0.1:9008}]"
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.016Z [INFO]  nomad: serf: EventMemberJoin: TestHTTP_AgentSetServers.global 127.0.0.1
    TestHTTP_AgentSetServers: testlog.go:34: 2020-04-06T21:47:51.035Z [DEBUG] client.server_mgr: new server list: new_servers=[127.0.0.1:9008, 127.0.0.1:9008] old_servers=[]
...
    TestHTTP_AgentSetServers: agent_endpoint_test.go:759:
                Error Trace:    agent_endpoint_test.go:759
                                                        http_test.go:1089
                                                        agent_endpoint_test.go:705
                Error:          "[127.0.0.1:9008 127.0.0.1:9008]" should have 1 item(s), but has 2
                Test:           TestHTTP_AgentSetServers
```
2020-04-07 09:27:48 -04:00
Charles Z 42e1070aa4
Fix wrong title in sidecar_task stanza docs (#7335) 2020-04-07 08:28:30 -04:00
Spencer Owen 51741f7aad
Improve vault language (#7644) 2020-04-07 07:54:25 -04:00
Tim Gross caa258b924
docs: add warnings for CSI plugin jobspec (#7642)
* Node/monolith plugins require root privileges and this wasn't being
  made super clear.
* Node/monolith plugins should always be run as system jobs.
2020-04-07 07:51:50 -04:00
Lang Martin c0dbcbef5f
e2e: csi: wait for volume write claims to be released before starting read jobs (#7641) 2020-04-07 07:40:44 -04:00
Michael Schurter b5104e5b6a
Merge pull request #7643 from hashicorp/b-7537
ar/bridge_networking: ensure cni configuration is loaded
2020-04-06 20:25:40 -07:00
Michael Schurter 3c5205c534 docs: add #7643 to changelog 2020-04-06 20:25:09 -07:00
Nick Ethier 44ad5d96d8
ar/bridge: use cni.IsCNINotInitialized helper 2020-04-06 21:44:01 -04:00
Nick Ethier 58fe326090
ar/bridge: better cni status err handling 2020-04-06 21:21:42 -04:00
Nick Ethier 6a286777c7
ar/bridge: ensure cni configuration is always loaded 2020-04-06 21:02:26 -04:00
Nick Ethier 5166806993
Merge pull request #7600 from hashicorp/b-5767
tr/service_hook: prevent Update from running before Poststart finish
2020-04-06 16:52:42 -04:00
Tim Gross 0a9b32e94e
docs: fix broken internal link on job plan page (#7640) 2020-04-06 16:39:11 -04:00
Michael Lange 35d4b48644
Merge pull request #7577 from hashicorp/f-ui/csi-beta-label
UI: Change CSI to Storage and mark it as beta
2020-04-06 13:21:11 -07:00
Nick Ethier 567609e101
tr/service_hook: reset initialized flag during deregister 2020-04-06 16:05:36 -04:00
Buck Doyle f10906e006
UI: add exec handling for dead jobs/task states (#7637)
This closes #7456. It hides the terminal when the job is dead and
displays an error when trying to open an exec session for a task
that isn’t running. There’s a skipped test for the latter behaviour
that I’ll have to come back for.
2020-04-06 14:08:22 -05:00
Buck Doyle fc7de8b153
UI: add live-updating to exec sidebar (#7499)
This closes #7454. It makes use of the existing watchable tools to
allow the exec popup sidebar to be live-updating. It also adds
alphabetic sorting of task groups and tasks.
2020-04-06 13:52:42 -05:00
Drew Bailey 476ca7e7b6
Merge pull request #7638 from hashicorp/audit-docs-queryparams
add note about query params for filtering
2020-04-06 13:08:55 -04:00
Drew Bailey 71b687e7d8
add note about query params for filtering 2020-04-06 13:07:38 -04:00
Drew Bailey 4ab7c03641
Merge pull request #7618 from hashicorp/b-shutdown-delay-updates
Fixes bug that prevented group shutdown_delay updates
2020-04-06 13:05:20 -04:00
Michael Lange bf9887083b
Merge pull request #7630 from hashicorp/f-ui/csi-acceptance-tests
UI: CSI Acceptance Tests
2020-04-06 09:37:45 -07:00
Drew Bailey 0d4bb6bf92
guard against nil maps 2020-04-06 12:25:50 -04:00
Drew Bailey 0c26161e56
update changelog 2020-04-06 11:58:27 -04:00
Tim Gross 50f807060a
e2e: csi tests can only run on linux (#7635) 2020-04-06 11:57:59 -04:00
Drew Bailey 3b8afce9e6
test added and removed 2020-04-06 11:53:46 -04:00
Drew Bailey 0d550049e9
ensure shutdown delay can be removed 2020-04-06 11:33:04 -04:00
Drew Bailey 9874e7b21d
Group shutdown delay fixes
Group shutdown delay updates were not properly handled in Update hook.
This commit also ensures that plan output is displayed.
2020-04-06 11:29:12 -04:00
Tim Gross 73dc2ad443 e2e/csi: add waiting for alloc stop 2020-04-06 10:15:55 -04:00
Tim Gross 027277a0d9 csi: make volume GC in job deregister safely async
The `Job.Deregister` call will block on the client CSI controller RPCs
while the alloc still exists on the Nomad client node. So we need to
make the volume claim reaping async from the `Job.Deregister`. This
allows `nomad job stop` to return immediately. In order to make this
work, this changeset changes the volume GC so that the GC jobs are on a
by-volume basis rather than a by-job basis; we won't have to query
the (possibly deleted) job at the time of volume GC. We smuggle the
volume ID and whether it's a purge into the GC eval ID the same way we
smuggled the job ID previously.
2020-04-06 10:15:55 -04:00
Tim Gross 5a3b45864d csi: fix unpublish workflow ID mismatches
The CSI plugins uses the external volume ID for all operations, but
the Client CSI RPCs uses the Nomad volume ID (human-friendly) for the
mount paths. Pass the External ID as an arg in the RPC call so that
the unpublish workflows have it without calling back to the server to
find the external ID.

The controller CSI plugins need the CSI node ID (or in other words,
the storage provider's view of node ID like the EC2 instance ID), not
the Nomad node ID, to determine how to detach the external volume.
2020-04-06 10:15:55 -04:00
Tim Gross 161f9aedc3
scheduler: prevent a reported NPE for CSI (#7633) 2020-04-06 09:42:27 -04:00
Mahmood Ali 23be53a366
Merge pull request #7612 from hashicorp/b-auth-alloc-exec-ws
Authenticate alloc/exec websocket requests
2020-04-06 09:24:51 -04:00
Michael Lange 2955d356e7 Test coverage for the volume detail page 2020-04-04 17:13:40 -07:00
Michael Lange 25f4f5a61d Sort allocation tables by modify index 2020-04-04 17:11:58 -07:00
Michael Lange a1d2e585a1 Update breadcrumb to match side menu 2020-04-04 17:11:29 -07:00
Michael Lange 6b798518b9 Fix the allocations page compoent to support multiple prop keys
It was designed to be used this way, but allocationFor has never
worked as intended 🤦
2020-04-04 10:56:12 -07:00
Michael Lange 76bead58a3 Add page size select tests to volumes list tests 2020-04-04 09:58:34 -07:00
Michael Lange e8e41c5757 Acceptance tests for the volumes list page 2020-04-03 19:28:12 -07:00
Michael Lange 0d90d082bc Page object for volumes list 2020-04-03 19:28:11 -07:00
Michael Lange 59427662d0 Handle namespaces in the mirage handler for volumes 2020-04-03 19:28:10 -07:00
Michael Lange fb44f76800 Correctly handle the namespace query param and forbidden state 2020-04-03 19:28:09 -07:00
Michael Lange 280fa5d53b Annotate volume row and make the tr clickable 2020-04-03 19:27:44 -07:00
Michael Lange 62aa943a95 Filter out volumes that don't match the chosen namespace 2020-04-03 19:27:11 -07:00
Michael Lange 62b7a07189 Sort alphabetically, A first 2020-04-03 19:26:26 -07:00
Michael Lange 1729d41509
Merge pull request #7574 from hashicorp/f-ui/configurable-page-sizes
UI Configurable Page Sizes
2020-04-03 16:06:17 -07:00
Lang Martin 1750426d04
csi: run volume claim GC on `job stop -purge` (#7615)
* nomad/state/state_store: error message copy/paste error

* nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse

* nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister

* nomad/core_sched: make volumeClaimReap available without a CoreSched

* nomad/job_endpoint: Deregister return early if the job is missing

* nomad/job_endpoint_test: job Deregistion is idempotent

* nomad/core_sched: conditionally ignore alloc status in volumeClaimReap

* nomad/job_endpoint: volumeClaimReap all allocations, even running

* nomad/core_sched_test: extra argument to collectClaimsToGCImpl

* nomad/job_endpoint: job deregistration is not idempotent
2020-04-03 17:37:26 -04:00
Mahmood Ali 6c950e971d
Merge pull request #7622 from hashicorp/tests-deflake-TestAutopilot_RollingUpdate
tests: deflake TestAutopilot_RollingUpdate
2020-04-03 17:29:55 -04:00
Mahmood Ali 816a93ed4a tests: deflake TestAutopilot_RollingUpdate
I hypothesize that the flakiness in rolling update is due to shutting
down s3 server before s4 is properly added as a voter.

The chain of the flakiness is as follows:

1. Bootstrap with s1, s2, s3
2. Add s4
3. Wait for servers to register with 3 voting peers
   * But we already have 3 voters (s1, s2, and s3)
   * s4 is added as a non-voter in Raft v3 and must wait until autopilot promots it
4. Test proceeds without s4 being a voter
5. s3 shutdown
6. cluster changes stall due to leader election and too many pending configuration
changes (e.g. removing s3 from raft, promoting s4).

Here, I have the test wait until s4 is marked as a voter before shutting
down s3, so we don't have too many configuration changes at once.

In https://circleci.com/gh/hashicorp/nomad/57092, I noticed the
following events:

```
TestAutopilot_RollingUpdate: autopilot_test.go:204: adding server s4
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO]  nomad/serf.go:60: nomad: adding server: server="nomad-137.global (Addr: 127.0.0.1:9177) (DC: dc1)"
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO]  raft/raft.go:1018: nomad.raft: updating configuration: command=AddNonvoter server-id=c54b5bf4-1159-34f6-032d-56aefeb08425 server-addr=127.0.0.1:9177 servers="[{Suffrage:Voter ID:df01ba65-d1b2-17a9-f792-a4459b3a7c09 Address:127.0.0.1:9171} {Suffrage:Voter ID:c3337778-811e-2675-87f5-006309888387 Address:127.0.0.1:9173} {Suffrage:Voter ID:186d5e15-c473-e2b3-b5a4-3259a84e10ef Address:127.0.0.1:9169} {Suffrage:Nonvoter ID:c54b5bf4-1159-34f6-032d-56aefeb08425 Address:127.0.0.1:9177}]"

    TestAutopilot_RollingUpdate: autopilot_test.go:218: shutting down server s3
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.797Z [INFO]  raft/replication.go:456: nomad.raft: aborting pipeline replication: peer="{Nonvoter c54b5bf4-1159-34f6-032d-56aefeb08425 127.0.0.1:9177}"
    TestAutopilot_RollingUpdate: autopilot_test.go:235: waiting for s4 to stabalize and be promoted
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.975Z [ERROR] raft/raft.go:1656: nomad.raft: failed to make requestVote RPC: target="{Voter c3337778-811e-2675-87f5-006309888387 127.0.0.1:9173}" error="dial tcp 127.0.0.1:9173: connect: connection refused"
    TestAutopilot_RollingUpdate: retry.go:121: autopilot_test.go:241: don't want "c3337778-811e-2675-87f5-006309888387"
        autopilot_test.go:241: didn't find map[c54b5bf4-1159-34f6-032d-56aefeb08425:true] in []raft.ServerID{"df01ba65-d1b2-17a9-f792-a4459b3a7c09", "186d5e15-c473-e2b3-b5a4-3259a84e10ef"}
```

Note how s3, c3337778, is present in the peers list in the final
failure, but s4, c54b5bf4, is added as a Nonvoter and isn't present in
the final peers list.
2020-04-03 17:15:41 -04:00
Tim Gross 966286fee5
fix encoding/decoding tags for api.Task (#7620)
When `nomad job inspect` encodes the response, if the decoded JSON
from the API doesn't exactly match the API struct, the field value
will be omitted even if it has a value. We only want the JSON struct
tag to `omitempty`.
2020-04-03 16:45:49 -04:00
Tim Gross d81797ea33
e2e: improve test reliability for CSI (#7616)
This changeset:

* adds eval status to the error messages emitted when we have
  placement failure in tests. The implementation here isn't quite
  perfect but it's a lot better than "condition not met".
* enforces the ordering of teardown of the CSI test
* doesn't pass the purge flag to one of the two CSI tests, so that we
  exercise both code paths.
2020-04-03 15:52:58 -04:00