Commit graph

3656 commits

Author SHA1 Message Date
Chris Baker 436d46bd19
Merge branch 'main' into f-node-drain-api 2021-04-01 15:22:57 -05:00
Tim Gross 0856483115 CSI: fingerprint detailed node capabilities
In order to support new node RPCs, we need to fingerprint plugin capabilities
in more detail. This changeset mirrors recent work to fingerprint controller
capabilities, but is not yet in use by any Nomad RPC.
2021-04-01 16:00:58 -04:00
Tim Gross 466b620fa4
CSI: volume snapshot 2021-04-01 11:16:52 -04:00
Tim Gross 71b9daffb9 CSI: fix misleading HTTP test
The HTTP test to create CSI volumes depends on having a controller plugin to
talk to, but the test was using a node-only plugin, which allows it to
silently ignore the missing controller.
2021-03-31 16:37:09 -04:00
Tim Gross 9fc4cf1419 CSI: fingerprint detailed controller capabilities
In order to support new controller RPCs, we need to fingerprint volume
capabilities in more detail and perform controller RPCs only when the specific
capability is present. This fixes a bug in Ceph support where the plugin can
only suport create/delete but we assume that it also supports attach/detach.
2021-03-31 16:37:09 -04:00
Tim Gross f149abfa41 CSI: volume creation/registration should not validate attachment
The CSI specification requires that we validate a list of `Capability` (access
mode + accessibility) when we create volume, but the existing volume
registration workflow incorrectly validates a single capability. The
specific capability required by a volume claim is checked at the time we make
the claim, so remove the check for `AttachmentMode`/`AcccessMode`.
2021-03-31 16:37:09 -04:00
Tim Gross aec5337862 CSI: HTTP handlers for create/delete/list 2021-03-31 16:37:09 -04:00
Tim Gross d38008176e CSI: create/delete/list volume RPCs
This commit implements the RPC handlers on the client that talk to the CSI
plugins on that client for the Create/Delete/List RPC.
2021-03-31 16:37:09 -04:00
Tim Gross 43622680fa test infrastructure for mock client RPCs (#10193)
This commit includes a new test client that allows overriding the RPC
protocols. Only the RPCs that are passed in are registered, which lets you
implement a mock RPC in the server tests. This commit includes an example of
this for the ClientCSI RPC server.
2021-03-31 16:37:09 -04:00
Mahmood Ali e24dac2f39 fixup! oversubscription: Add MemoryMaxMB to internal structs 2021-03-31 09:26:26 -04:00
Mahmood Ali 4ec2d8f5e4 fixup! oversubscription: Add MemoryMaxMB to internal structs 2021-03-31 08:52:20 -04:00
Mahmood Ali 0c2551270a oversubscription: Add MemoryMaxMB to internal structs
Start tracking a new MemoryMaxMB field that represents the maximum memory a task
may use in the client. This allows tasks to specify a memory reservation (to be
used by scheduler when placing the task) but use excess memory used on the
client if the client has any.

This commit adds the server tracking for the value, and ensures that allocations
AllocatedResource fields include the value.
2021-03-30 16:55:58 -04:00
Nick Ethier daecfa61e6
Merge pull request #10203 from hashicorp/f-cpu-cores
Reserved Cores [1/4]: Structs and scheduler implementation
2021-03-29 14:05:54 -04:00
Chris Baker 16e37b986a reworked Node.Canonicalize() to enforce invariants, fixed a broken test 2021-03-26 18:58:38 +00:00
Chris Baker 99ef00d5da reinserted/expanded fsm node.canonicalize test that was still needed 2021-03-26 17:10:39 +00:00
Chris Baker 770c9cecb5 restored Node.Sanitize() for RPC endpoints
multiple other updates from code review
2021-03-26 17:03:15 +00:00
Mahmood Ali dbc3850358
Merge pull request #10145 from hashicorp/b-periodic-init-status
periodic: always reset periodic children status
2021-03-26 09:19:08 -04:00
Mahmood Ali 5d75705edd dispatched parameterized job should clear status too 2021-03-25 15:14:21 -04:00
Mahmood Ali e643742a38 Add a test for parameterized summary counts 2021-03-25 11:27:09 -04:00
Mahmood Ali b0e048bfa4 periodic: always reset periodic children status
Fixes a bug where Nomad reports negative or incorrect running children
counts for periodic jobs.

The periodic dispatcher derives a child job without reseting the status.
If the periodic job has a `running` status, the derived job will start
as `running` status and transition to `pending`.  Since this is
unexpected transition, the counting in StateStore.setJobSummary gets out of sync and
result in negative/incorrect values.

Note that this only affects periodic jobs after a leader transition.
During the first job registration, the job is added with `pending` or
`""` status. However, after a leader transition, the new leader
repopulates the dispatcher heap with `"running"` status and triggers the
bug.
2021-03-25 11:27:09 -04:00
Chris Baker 07646316e5 some comments on the new json extensions/encoding 2021-03-23 18:18:51 +00:00
Chris Baker a43b3a8736 change to fail-safe in json encoding 2021-03-23 18:13:10 +00:00
Chris Baker cb540ed691 added tests that the API doesn't leak Node.SecretID
added more documentation on JSON encoding to the contributing guide
2021-03-23 18:09:20 +00:00
Drew Bailey 74836b95b2
configuration and oss components for licensing (#10216)
* configuration and oss components for licensing

* vendor sync
2021-03-23 09:08:14 -04:00
Mahmood Ali 73b7b98018
testing: default nomad test nodes to 1.0.0 (#10213) 2021-03-23 08:24:26 -04:00
Chris Baker a186badf35 moved JSON handlers and extension code around a bit for proper order of
initialization
2021-03-22 14:12:42 +00:00
Chris Baker 9f7bc5a575 refactor? 2021-03-22 01:49:21 +00:00
Chris Baker dd291e69f4 removed deprecated fields from Drain structs and API
node drain: use msgtype on txn so that events are emitted
wip: encoding extension to add Node.Drain field back to API responses

new approach for hiding Node.SecretID in the API, using `json` tag
documented this approach in the contributing guide
refactored the JSON handlers with extensions
modified event stream encoding to use the go-msgpack encoders with the extensions
2021-03-21 15:30:11 +00:00
Nick Ethier 648ade63ad scheduler: implement scheduling of reserved cores 2021-03-19 00:29:07 -04:00
Nick Ethier 26b200e8bd api: add new 'cores' field to task resources 2021-03-18 23:13:30 -04:00
Nick Ethier 4b2912d343 structs: add struct fields and funcs for reservable cpu cores 2021-03-18 22:49:06 -04:00
Tim Gross fa25e048b2
CSI: unique volume per allocation
Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.

Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
2021-03-18 15:35:11 -04:00
Tim Gross 9b2b580d1a
CSI: remove prefix matching from CSIVolumeByID and fix CLI prefix matching (#10158)
Callers of `CSIVolumeByID` are generally assuming they should receive a single
volume. This potentially results in feasibility checking being performed
against the wrong volume if a volume's ID is a prefix substring of other
volume (for example: "test" and "testing").

Removing the incorrect prefix matching from `CSIVolumeByID` breaks prefix
matching in the command line client. Add the required elements for prefix
matching to the commands and API.
2021-03-18 14:32:40 -04:00
Charlie Voiselle 0473f35003
Fixup uses of sanity (#10187)
* Fixup uses of `sanity`
* Remove unnecessary comments.

These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.

* Update nomad/fsm_test.go
2021-03-16 18:05:08 -04:00
Mahmood Ali 1d48433356
server: handle invalid jobs in expose handler hook (#10154)
The expose handler hook must handle if the submitted job is invalid. Without this validation, the rpc handler panics on invalid input.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2021-03-10 09:12:46 -05:00
Tim Gross 7010a344d6 one-time token: never return expired tokens 2021-03-10 08:17:56 -05:00
Tim Gross 97b0e26d1f RPC endpoints to support 'nomad ui -login'
RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.
2021-03-10 08:17:56 -05:00
Tim Gross 6b2c4b56d0 state store updates for one-time tokens
The `OneTimeToken` struct is to support the `nomad ui -login` command. This
changeset adds the struct to the Nomad state store.
2021-03-10 08:17:56 -05:00
Andre Ilhicas f45fc6c899
consul/connect: enable setting local_bind_address in upstream 2021-02-26 11:37:31 +00:00
Drew Bailey 86d9e1ff90
Merge pull request #9955 from hashicorp/on-update-services
Service and Check on_update configuration option (readiness checks)
2021-02-24 10:11:05 -05:00
Tim Gross b764f52ab9
deploymentwatcher: reset progress deadline on promotion (#10042)
In a deployment with two groups (ex. A and B), if group A's canary becomes
healthy before group B's, the deadline for the overall deployment will be set
to that of group A. When the deployment is promoted, if group A is done it
will not contribute to the next deadline cutoff. Group B's old deadline will
be used instead, which will be in the past and immediately trigger a
deployment progress failure. Reset the progress deadline when the job is
promotion to avoid this bug, and to better conform with implicit user
expectations around how the progress deadline should interact with promotions.
2021-02-22 16:44:03 -05:00
James Rasell 6553cc3da7
drainer: fix error message when handling drain deadlined nodes. 2021-02-18 11:45:44 +01:00
AndrewChubatiuk cd152643fb fixed connect port label 2021-02-13 02:42:14 +02:00
AndrewChubatiuk 3d0aa2ef56 allocate sidecar task port on host_network interface 2021-02-13 02:42:13 +02:00
Nick Ethier fcc1f4c805
Merge pull request #9946 from hashicorp/b-9477
structs: namespace port validation by host_network
2021-02-11 12:53:28 -05:00
Seth Hoenig 45e0e70a50 consul/connect: enable custom sidecars to use expose checks
This PR enables jobs configured with a custom sidecar_task to make
use of the `service.expose` feature for creating checks on services
in the service mesh. Before we would check that sidecar_task had not
been set (indicating that something other than envoy may be in use,
which would not support envoy's expose feature). However Consul has
not added support for anything other than envoy and probably never
will, so having the restriction in place seems like an unnecessary
hindrance. If Consul ever does support something other than Envoy,
they will likely find a way to provide the expose feature anyway.

Fixes #9854
2021-02-09 10:49:37 -06:00
Drew Bailey 8507d54e3b
e2e test for on_update service checks
check_restart not compatible with on_update=ignore

reword caveat
2021-02-08 08:32:40 -05:00
Drew Bailey 82f971f289
OnUpdate configuration for services and checks
Allow for readiness type checks by configuring nomad to ignore warnings
or errors reported by a service check. This allows the deployment to
progress and while Consul handles introducing the sercive into a
resource pool once the check passes.
2021-02-08 08:32:40 -05:00
Nick Ethier eacc4da499
Merge branch 'master' into b-9477 2021-02-05 11:58:13 -05:00
Tim Gross eb3dd17fb2 volumes: implement plan diff for volume requests
The details of host volume and CSI volume requests do not show up in `nomad
plan` outputs, although the updates are detected by the scheduler and result
in an update as expected.
2021-02-04 16:55:17 -05:00