Commit graph

17796 commits

Author SHA1 Message Date
Tim Gross 50f807060a
e2e: csi tests can only run on linux (#7635) 2020-04-06 11:57:59 -04:00
Tim Gross 73dc2ad443 e2e/csi: add waiting for alloc stop 2020-04-06 10:15:55 -04:00
Tim Gross 027277a0d9 csi: make volume GC in job deregister safely async
The `Job.Deregister` call will block on the client CSI controller RPCs
while the alloc still exists on the Nomad client node. So we need to
make the volume claim reaping async from the `Job.Deregister`. This
allows `nomad job stop` to return immediately. In order to make this
work, this changeset changes the volume GC so that the GC jobs are on a
by-volume basis rather than a by-job basis; we won't have to query
the (possibly deleted) job at the time of volume GC. We smuggle the
volume ID and whether it's a purge into the GC eval ID the same way we
smuggled the job ID previously.
2020-04-06 10:15:55 -04:00
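The idea of carrying the volume ID and the purge flag inside a single ID string can be sketched as below. This is a minimal illustration only: the `csi-volume-claim-gc` prefix, the `:` separator, and the `encodeGCID`/`decodeGCID` helpers are assumptions made for the example, not the identifiers used in the changeset, and the sketch assumes volume IDs do not contain `:`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gcPrefix is a hypothetical marker for volume-claim GC evals.
const gcPrefix = "csi-volume-claim-gc"

// encodeGCID packs the volume ID and the purge flag into one ID string.
func encodeGCID(volID string, purge bool) string {
	return gcPrefix + ":" + volID + ":" + strconv.FormatBool(purge)
}

// decodeGCID recovers the volume ID and purge flag from an encoded ID.
// It assumes the volume ID itself contains no ':' characters.
func decodeGCID(id string) (string, bool, error) {
	parts := strings.Split(id, ":")
	if len(parts) != 3 || parts[0] != gcPrefix {
		return "", false, fmt.Errorf("malformed GC ID %q", id)
	}
	purge, err := strconv.ParseBool(parts[2])
	if err != nil {
		return "", false, fmt.Errorf("malformed purge flag in %q: %w", id, err)
	}
	return parts[1], purge, nil
}

func main() {
	id := encodeGCID("ebs-vol0", true)
	volID, purge, err := decodeGCID(id)
	fmt.Println(id, volID, purge, err)
}
```

Because everything the GC needs travels in the ID, the decoder never has to look up the (possibly deleted) job.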
Tim Gross 5a3b45864d csi: fix unpublish workflow ID mismatches
The CSI plugins use the external volume ID for all operations, but
the Client CSI RPCs use the Nomad volume ID (human-friendly) for the
mount paths. Pass the External ID as an arg in the RPC call so that
the unpublish workflows have it without calling back to the server to
find the external ID.

The controller CSI plugins need the CSI node ID (or in other words,
the storage provider's view of node ID like the EC2 instance ID), not
the Nomad node ID, to determine how to detach the external volume.
2020-04-06 10:15:55 -04:00
Tim Gross 161f9aedc3
scheduler: prevent a reported NPE for CSI (#7633) 2020-04-06 09:42:27 -04:00
Mahmood Ali 23be53a366
Merge pull request #7612 from hashicorp/b-auth-alloc-exec-ws
Authenticate alloc/exec websocket requests
2020-04-06 09:24:51 -04:00
Michael Lange 1729d41509
Merge pull request #7574 from hashicorp/f-ui/configurable-page-sizes
UI Configurable Page Sizes
2020-04-03 16:06:17 -07:00
Lang Martin 1750426d04
csi: run volume claim GC on job stop -purge (#7615)
* nomad/state/state_store: error message copy/paste error

* nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse

* nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister

* nomad/core_sched: make volumeClaimReap available without a CoreSched

* nomad/job_endpoint: Deregister return early if the job is missing

* nomad/job_endpoint_test: job Deregistration is idempotent

* nomad/core_sched: conditionally ignore alloc status in volumeClaimReap

* nomad/job_endpoint: volumeClaimReap all allocations, even running

* nomad/core_sched_test: extra argument to collectClaimsToGCImpl

* nomad/job_endpoint: job deregistration is not idempotent
2020-04-03 17:37:26 -04:00
Mahmood Ali 6c950e971d
Merge pull request #7622 from hashicorp/tests-deflake-TestAutopilot_RollingUpdate
tests: deflake TestAutopilot_RollingUpdate
2020-04-03 17:29:55 -04:00
Mahmood Ali 816a93ed4a tests: deflake TestAutopilot_RollingUpdate
I hypothesize that the flakiness in rolling update is due to shutting
down s3 server before s4 is properly added as a voter.

The chain of the flakiness is as follows:

1. Bootstrap with s1, s2, s3
2. Add s4
3. Wait for servers to register with 3 voting peers
   * But we already have 3 voters (s1, s2, and s3)
   * s4 is added as a non-voter in Raft v3 and must wait until autopilot promotes it
4. Test proceeds without s4 being a voter
5. s3 shutdown
6. cluster changes stall due to leader election and too many pending configuration
   changes (e.g. removing s3 from raft, promoting s4).

Here, I have the test wait until s4 is marked as a voter before shutting
down s3, so we don't have too many configuration changes at once.

In https://circleci.com/gh/hashicorp/nomad/57092, I noticed the
following events:

```
TestAutopilot_RollingUpdate: autopilot_test.go:204: adding server s4
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO]  nomad/serf.go:60: nomad: adding server: server="nomad-137.global (Addr: 127.0.0.1:9177) (DC: dc1)"
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.789Z [INFO]  raft/raft.go:1018: nomad.raft: updating configuration: command=AddNonvoter server-id=c54b5bf4-1159-34f6-032d-56aefeb08425 server-addr=127.0.0.1:9177 servers="[{Suffrage:Voter ID:df01ba65-d1b2-17a9-f792-a4459b3a7c09 Address:127.0.0.1:9171} {Suffrage:Voter ID:c3337778-811e-2675-87f5-006309888387 Address:127.0.0.1:9173} {Suffrage:Voter ID:186d5e15-c473-e2b3-b5a4-3259a84e10ef Address:127.0.0.1:9169} {Suffrage:Nonvoter ID:c54b5bf4-1159-34f6-032d-56aefeb08425 Address:127.0.0.1:9177}]"

    TestAutopilot_RollingUpdate: autopilot_test.go:218: shutting down server s3
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.797Z [INFO]  raft/replication.go:456: nomad.raft: aborting pipeline replication: peer="{Nonvoter c54b5bf4-1159-34f6-032d-56aefeb08425 127.0.0.1:9177}"
    TestAutopilot_RollingUpdate: autopilot_test.go:235: waiting for s4 to stabalize and be promoted
    TestAutopilot_RollingUpdate: testlog.go:34: 2020-04-03T20:08:19.975Z [ERROR] raft/raft.go:1656: nomad.raft: failed to make requestVote RPC: target="{Voter c3337778-811e-2675-87f5-006309888387 127.0.0.1:9173}" error="dial tcp 127.0.0.1:9173: connect: connection refused"
    TestAutopilot_RollingUpdate: retry.go:121: autopilot_test.go:241: don't want "c3337778-811e-2675-87f5-006309888387"
        autopilot_test.go:241: didn't find map[c54b5bf4-1159-34f6-032d-56aefeb08425:true] in []raft.ServerID{"df01ba65-d1b2-17a9-f792-a4459b3a7c09", "186d5e15-c473-e2b3-b5a4-3259a84e10ef"}
```

Note how s3, c3337778, is present in the peers list in the final
failure, but s4, c54b5bf4, is added as a Nonvoter and isn't present in
the final peers list.
2020-04-03 17:15:41 -04:00
Tim Gross 966286fee5
fix encoding/decoding tags for api.Task (#7620)
When `nomad job inspect` encodes the response, if the decoded JSON
from the API doesn't exactly match the API struct, the field value
will be omitted even if it has a value. We only want the JSON struct
tag to `omitempty`.
2020-04-03 16:45:49 -04:00
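For readers unfamiliar with the `omitempty` behavior mentioned above, here is a small sketch of how Go's `encoding/json` treats it. The `task` struct is a hypothetical stand-in, not Nomad's `api.Task`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// task is a hypothetical stand-in for an API struct, not Nomad's api.Task.
type task struct {
	Name       string `json:"Name"`
	KillSignal string `json:"KillSignal,omitempty"`
}

// encode marshals a task to a JSON string.
func encode(t task) string {
	out, err := json.Marshal(t)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	// KillSignal is empty here, so omitempty drops it from the output.
	fmt.Println(encode(task{Name: "web"}))
	// KillSignal has a value here, so it is encoded.
	fmt.Println(encode(task{Name: "web", KillSignal: "SIGINT"}))
}
```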
Tim Gross d81797ea33
e2e: improve test reliability for CSI (#7616)
This changeset:

* adds eval status to the error messages emitted when we have
  placement failure in tests. The implementation here isn't quite
  perfect but it's a lot better than "condition not met".
* enforces the ordering of teardown of the CSI test
* doesn't pass the purge flag to one of the two CSI tests, so that we
  exercise both code paths.
2020-04-03 15:52:58 -04:00
Mahmood Ali 340f9a5e91 ui: explicit reference to window.localStorage 2020-04-03 14:31:19 -04:00
Mahmood Ali ed4c4d13a4 fixup! backend: support WS authentication handshake in alloc/exec 2020-04-03 14:20:31 -04:00
Buck Doyle 2940aa14e5 Remove redundant step assertion 2020-04-03 12:54:47 -05:00
Buck Doyle b9a2d20445 Remove redundant pause 2020-04-03 12:53:57 -05:00
Buck Doyle e6ecd2bf4f Remove redundant assertions
These are more things that are already covered elsewhere.
2020-04-03 12:52:39 -05:00
Buck Doyle 4de1255a31 Remove redundant assertions from token exec test
This only needs to check that the token is sent; the rest of
the assertions were covered by the previous test.
2020-04-03 12:35:51 -05:00
Buck Doyle cb6f110b97 Remove intermediate storage variable 2020-04-03 12:27:03 -05:00
Buck Doyle b12f97bb81 Change to setting token directly
Most tests bypass setting the token via the UI, instead choosing
to set it in localStorage directly, because the acceptance tests
for the token UI are sufficient to exercise that part of the UI,
so this speeds up the test a bit.
2020-04-03 12:26:25 -05:00
Buck Doyle 0ec5e95f46 Add space 2020-04-03 12:21:44 -05:00
Buck Doyle fbe40a5d36
UI: add handling for exec command-editing keys (#7601)
This is a minimal implementation that closes #7463. It doesn’t include
true support for moving around within the command to edit using arrow
keys because it gets too complex when managing wrapping at the edge of
the terminal. Instead, arrow keys are ignored. It also ignores ^A and
^E, which are cursor manipulations that pose similar problems to arrow
keys. It does support ^U, which deletes the entire command.

It also allows a command to be pasted, which was previously unsupported.
This is accomplished by migrating from Xterm.js’s onKey handler to
onData, which is recommended here:
https://github.com/xtermjs/xterm.js/issues/2673#issuecomment-574897733

onData is a higher-level handler that issues events with the final
interpreted data instead of the individual key events. That means the
processing in this PR has changed from inspecting DOM key events to
inspecting their ASCII equivalents, which I’ve extracted into a utility
dictionary for use in tests and implementation.

One consequence of ignoring most control characters is that if you paste
a string that includes a control character, that character will be
stripped. It’s somewhat strange for compound sequences like arrow keys;
if you run copy('/bin/b' + '\x1b[D' + 'ash') in a JavaScript console and
paste what’s on the clipboard, you get "/bin/b[Dash". That’s because
the left arrow key, as in that centre portion of the string,
is represented by the escape character and a coded sequence. Stripping
the control character leaves the coded sequence as part of the paste.
That seems like an acceptable compromise vs either ignoring any pasted
string with control characters (confusing UX) or trying to interpret and
strip all such compound control sequences (difficult to be exhaustive).
2020-04-03 12:14:47 -05:00
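The paste behavior described above can be reproduced outside the UI. The sketch below is a Go illustration of the same stripping rule, not the UI's JavaScript implementation; `stripControl` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripControl drops every control character from s and keeps the rest.
func stripControl(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsControl(r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	// The left arrow key is the escape character followed by "[D".
	pasted := "/bin/b" + "\x1b[D" + "ash"
	fmt.Println(stripControl(pasted)) // prints "/bin/b[Dash"
}
```

Only the escape character is removed, so the `[D` that followed it survives in the pasted text.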
Mahmood Ali cec76a4f66 ui: send authentication ws handshake
Have the UI send the authentication websocket handshake message.
2020-04-03 11:49:22 -04:00
Mahmood Ali e63e096136 backend: support WS authentication handshake in alloc/exec
The JavaScript WebSocket API doesn't support setting custom headers
(e.g. `X-Nomad-Token`).  This change adds support for an
authentication handshake message: clients can set the `ws_handshake` URL
query parameter to true and send a single handshake message with the auth
token before any other message.

This is a backward compatible change: it does not affect the Nomad CLI path,
as the CLI doesn't set the `ws_handshake` parameter.
2020-04-03 11:18:54 -04:00
Seth Hoenig 4d3686aa52
Merge pull request #7611 from hashicorp/docs-tls-consul-in-changelog
docs: add connect with TLS consul in changelog
2020-04-03 09:01:50 -06:00
Seth Hoenig 433ccab8ae docs: add connect w/ tls consul in changelog 2020-04-03 08:57:39 -06:00
Seth Hoenig 60c9b73eba
Merge pull request #7602 from hashicorp/b-connect-bootstrap-tls-config
connect: set consul TLS options on envoy bootstrap
2020-04-03 08:50:36 -06:00
Tim Gross 4c51687cbf
e2e: remove gometa from e2eutils (#7610) 2020-04-03 10:22:22 -04:00
Drew Bailey aabb7fdd38
Merge pull request #7494 from hashicorp/update-audit-docs
make placement of filter and sinks stanzas clearer
2020-04-03 09:04:49 -04:00
Mahmood Ali c6cdb3a8a8
Merge pull request #7608 from hashicorp/b-config-lower-case
Tweak declared hcl key casing in structs
2020-04-03 08:59:31 -04:00
Drew Bailey d55a10203b
Update website/pages/docs/configuration/audit.mdx
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-03 08:53:38 -04:00
Drew Bailey ba5908d12f
make placement of filter and sinks stanzas clearer 2020-04-03 08:43:05 -04:00
Mahmood Ali 5587dc58c0 Use lowercase for hcl keys
This is not a change in behavior; hcl key matching is case insensitive,
as demonstrated in `command.agent/TestConfig_Parse`
2020-04-03 07:56:00 -04:00
Mahmood Ali 990cfb6fef agent config parsing tests for scheduler config 2020-04-03 07:54:32 -04:00
Michael Lange edea9faf22 Refactor page-size-select page object into a reusable component 2020-04-02 15:52:44 -07:00
Seth Hoenig f7ef64d264
Merge pull request #7440 from hashicorp/docs-connect-expose
docs: add documentation for proxy.expose configuration
2020-04-02 16:51:24 -06:00
Seth Hoenig 9a79b182b7 docs: add exposed TG service check examples to expose docs 2020-04-02 16:42:38 -06:00
Michael Lange 49019bd967 Make table foot fields consistent at all breakpoints
This effectively overrides Bulma's default field layout tweaks
at different breakpoints. This includes going from flex to block
and different font-sizes.
2020-04-02 13:41:41 -07:00
Michael Lange b1d5a77e76 Remove extraneous order property
The "default" order values as set by Bulma are different for different
breakpoints. Since this wasn't considering breakpoints, it resulted
in the unexpected reordering of pagination elements at different page
widths. Turns out removing this property gives us what we want.
2020-04-02 13:27:29 -07:00
Tim Gross 34348708ac
csi: internals docs (#7522)
Summarizes the internal RFCs for plugins and volume lifecycles.
2020-04-02 16:05:26 -04:00
Tim Gross f6b3d38eb8
CSI: move node unmount to server-driven RPCs (#7596)
If a volume-claiming alloc stops and the CSI Node plugin that serves
that alloc's volumes is missing, there's no way for the allocrunner
hook to send the `NodeUnpublish` and `NodeUnstage` RPCs.

This changeset addresses this issue with a redesign of the client-side
for CSI. Rather than unmounting in the alloc runner hook, the alloc
runner hook will simply exit. When the server gets the
`Node.UpdateAlloc` for the terminal allocation that had a volume claim,
it creates a volume claim GC job. This job will make client RPCs to a
new node plugin RPC endpoint, and only once that succeeds, move on to
making the client RPCs to the controller plugin. If the node plugin is
unavailable, the GC job will fail and be requeued.
2020-04-02 16:04:56 -04:00
Michael Lange 81e7296447 Apply the page size select behavior to the other pages with the page size selector 2020-04-02 12:50:37 -07:00
Michael Lange f08fd23d00 Factor page select tests into their own behavior 2020-04-02 12:50:36 -07:00
Michael Lange 70eb558b65 Acceptance tests for the page size selector on the jobs list view 2020-04-02 12:50:35 -07:00
Michael Lange e0110e1757 Repeat new pagination pattern throughout the app 2020-04-02 12:50:34 -07:00
Michael Lange 8dc54a6164 Reset current page when changing page size 2020-04-02 12:50:33 -07:00
Michael Lange 3d02f61455 Replace crusty lt and gt with chevron icons 2020-04-02 12:50:32 -07:00
Michael Lange 546751a9b4 Style the page size selector 2020-04-02 12:50:31 -07:00
Michael Lange 53954d1bc3 Add page size select to the jobs list page 2020-04-02 12:50:30 -07:00
Michael Lange 06524fe5a7 Page size select component 2020-04-02 12:50:29 -07:00