Commit Graph

4178 Commits

Author SHA1 Message Date
Tim Gross 3d38592fbb
csi: add VolumeContext to NodeStage/Publish RPCs (#8239)
In #7957 we added support for passing a volume context to the controller RPCs.
This is an opaque map that's created by `CreateVolume` or, in Nomad's case,
in the volume registration spec.

However, we missed passing this field to the `NodeStage` and `NodePublish` RPC,
which prevents certain plugins (such as MooseFS) from making node RPCs.
2020-06-22 13:54:32 -04:00
Michael Schurter 562704124d
Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00
Mahmood Ali 3824e0362c
Revert "client: defensive against getting stale alloc updates" 2020-06-19 15:39:44 -04:00
Nick Ethier a87e91e971
test: fix up testing around host networks 2020-06-19 13:53:31 -04:00
Nick Ethier f0ac1f027a
lint: spelling 2020-06-19 11:29:41 -04:00
Nick Ethier 0374ad3e6c
taskenv: populate NOMAD_IP|PORT|ADDR env from allocated ports 2020-06-19 10:51:32 -04:00
Nick Ethier f0559a8162
multi-interface network support 2020-06-19 09:42:10 -04:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Drew Bailey 84afc28ceb
only report tasklogger is running if both stdout and stderr are still running (#8155)
* only report tasklogger is running if both stdout and stderr are still running

* changelog
2020-06-12 09:17:35 -04:00
Lang Martin ac7c39d3d3
Delayed evaluations for `stop_after_client_disconnect` can cause unwanted extra followup evaluations around job garbage collection (#8099)
* client/heartbeatstop: reversed time condition for startup grace

* scheduler/generic_sched: use `delayInstead` to avoid a loop

Without protecting the loop that creates followUpEvals, a delayed eval
is allowed to create an immediate subsequent delayed eval. For both
`stop_after_client_disconnect` and the `reschedule` block, a delayed
eval should always produce some immediate result (running or blocked)
and then only after the outcome of that eval produce a second delayed
eval.

* scheduler/reconcile: lostLater are different than delayedReschedules

Just slightly. `lostLater` allocs should be used to create batched
evaluations, but `handleDelayedReschedules` assumes that the
allocations are in the untainted set. When it creates the in-place
updates to those allocations at the end, it causes the allocation to
be treated as running over in the planner, which causes the initial
`stop_after_client_disconnect` evaluation to be retried by the worker.
2020-06-03 09:48:38 -04:00
Mahmood Ali 5703c0db80 tests: Run a task long enough to be restartable 2020-05-31 10:33:03 -04:00
Drew Bailey 59ca304fce
give enterpriseclient a logger (#8072) 2020-05-28 15:43:16 -04:00
Drew Bailey 34871f89be
Oss license support for ent builds (#8054)
* changes necessary to support oss licesning shims

revert nomad fmt changes

update test to work with enterprise changes

update tests to work with new ent enforcements

make check

update cas test to use scheduler algorithm

back out preemption changes

add comments

* remove unused method
2020-05-27 13:46:52 -04:00
Mahmood Ali 2588b3bc98 cleanup driver eventor goroutines
This fixes few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.

This change makes few modifications:

First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking `.Shutdown()` function that is
specific to internal drivers only and require casting.  Using `Contexts`
provide a consistent idiomatic way to manage lifecycle for both internal
and external drivers.

Also, I discovered few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
2020-05-26 11:04:04 -04:00
Tim Gross ba11aef5d9
csi: skip unit tests on unsupported platforms (#8033)
Some of the unit tests for CSI require platform-specific APIs that aren't
available on macOS. We can safely skip these tests.
2020-05-21 13:56:50 -04:00
Tim Gross aa8927abb4
volumes: return better error messages for unsupported task drivers (#8030)
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
2020-05-21 09:18:02 -04:00
Tim Gross 065fa7af8b
stats_hook: log normal shutdown condition as debug, not error (#8028)
The `stats_hook` writes an Error log every time an allocation becomes
terminal. This is a normal condition, not an error. A real error
condition like a failure to collect the stats is logged later. It just
creates log noise, and this is a particularly bad operator experience
for heavy batch workloads.
2020-05-20 10:28:30 -04:00
Mahmood Ali 751f337f1c Update hcl2 vendoring
The hcl2 library has moved from http://github.com/hashicorp/hcl2 to https://github.com/hashicorp/hcl/tree/hcl2.

This updates Nomad's vendoring to start using hcl2 library.  Also
updates some related libraries (e.g. `github.com/zclconf/go-cty/cty` and
`github.com/apparentlymart/go-textseg`).
2020-05-19 15:00:03 -04:00
Tim Gross 6a463dc13a
csi: use a blocking initial connection with timeout (#7965)
The plugin supervisor lazily connects to plugins, but this means we
only get "Unavailable" back from the gRPC call in cases where the
plugin can never be reached (for example, if the Nomad client has the
wrong permissions for the socket).

This changeset improves the operator experience by switching to a
blocking `DialWithContext`. It eagerly connects so that we can
validate the connection is real and get a "failed to open" error in
case where Nomad can't establish the initial connection.
2020-05-15 08:17:11 -04:00
Tim Gross 2082cf738a
csi: support for VolumeContext and VolumeParameters (#7957)
The MVP for CSI in the 0.11.0 release of Nomad did not include support
for opaque volume parameters or volume context. This changeset adds
support for both.

This also moves args for ControllerValidateCapabilities into a struct.
The CSI plugin `ControllerValidateCapabilities` struct that we turn
into a CSI RPC is accumulating arguments, so moving it into a request
struct will reduce the churn of this internal API, make the plugin
code more readable, and make this method consistent with the other
plugin methods in that package.
2020-05-15 08:16:01 -04:00
Tim Gross 24aa32c503 csi: use a blocking initial connection with timeout
The plugin supervisor lazily connects to plugins, but this means we
only get "Unavailable" back from the gRPC call in cases where the
plugin can never be reached (for example, if the Nomad client has the
wrong permissions for the socket).

This changeset improves the operator experience by switching to a
blocking `DialWithContext`. It eagerly connects so that we can
validate the connection is real and get a "failed to open" error in
case where Nomad can't establish the initial connection.
2020-05-14 15:59:19 -04:00
Tim Gross 4f54a633a2
csi: refactor internal client field name to ExternalID (#7958)
The CSI plugins RPCs require the use of the storage provider's volume
ID, rather than the user-defined volume ID. Although changing the RPCs
to use the field name `ExternalID` risks breaking backwards
compatibility, we can use the `ExternalID` name internally for the
client and only use `VolumeID` at the RPC boundaries.
2020-05-14 11:56:07 -04:00
Lang Martin d3c4700cd3
server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Mahmood Ali 0ece631e60 allochealth: Fix when check health preceeds task health
Fix a bug where if the alloc check becomes healthy before the task health, the
alloc may never be considered healthy.
2020-05-13 07:44:39 -04:00
Mahmood Ali 934c5e8ff0 tests: tests for health check sequencing
Add a failing tests to show that if an alloc checks is marked healthy before the
alloc tasks start up, the alloc may be forever considered unhealthy.
2020-05-13 07:43:00 -04:00
Tim Gross 4374c1a837
csi: support Secrets parameter in CSI RPCs (#7923)
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
2020-05-11 17:12:51 -04:00
Mahmood Ali 938e916d9c When serializing msgpack, only consider codec tag
When serializing structs with msgpack, only consider type tags of
`codec`.

Hashicorp/go-msgpack (based on ugorji/go) defaults to interpretting
`codec` tag if it's available, but falls to using `json` if `codec`
isn't present.

This behavior is surprising in cases where we want to serialize json
differently from msgpack, e.g. serializing `ConsulExposeConfig`.
2020-05-11 14:14:10 -04:00
Mahmood Ali 543f08c1ae Deflake TestTaskTemplateManager_BlockedEvents test
This change deflakes TestTaskTemplateManager_BlockedEvents test, because
it is expecting a number of events without accounting for transitional
state.

The test TestTaskTemplateManager_BlockedEvents attempts to ensure that a
template rendering emits blocked events for missing template ksys.

It works by setting a template that requires keys 0,1,2,3,4 and then
eventually sets keys 0,1,2,3 and ensures that we get a final event indicating
that keys 3 and 4 are still missing.

The test waits to get a blocked event for the final state, but it can
fail if receives a blocked event for a transitional state (e.g. one
reporting 2,3,4,5 are missing).

This fixes the test by ensuring that it waits until the final message
before assertion.

Also, it clarifies the intent of the test with stricter assertions and
additional comments.
2020-05-09 14:09:39 -04:00
Juan Larriba a0df437c62
Run Linux Images (LCOW) and Windows Containers side by side (#7850)
Makes it possible to run Linux Containers On Windows with Nomad alongside Windows Containers. Fingerprint prevents only to run Nomad in Windows 10 with Linux Containers
2020-05-04 13:08:47 -04:00
Lang Martin ad2fb4b297 client/heartbeatstop: don't store client state, use timeout
In order to minimize this change while keeping a simple version of the
behavior, we set `lastOk` to the current time less the intial server
connection timeout. If the client starts and never contacts the
server, it will stop all configured tasks after the initial server
connection grace period, on the assumption that we've been out of
touch longer than any configured `stop_after_client_disconnect`.

The more complex state behavior might be justified later, but we
should learn about failure modes first.
2020-05-01 12:35:49 -04:00
Lang Martin 28bac139cb client/heartbeatstop: destroy allocs when disconnected from servers
- track lastHeartbeat, the client local time of the last successful
  heartbeat round trip
- track allocations with `stop_after_client_disconnect` configured
- trigger allocation destroy (which handles cleanup)
- restore heartbeat/killable allocs tracking when allocs are recovered from disk
- on client restart, stop those allocs after a grace period if the
  servers are still partioned
2020-05-01 12:35:49 -04:00
Tim Gross cc7dbad1c7
csi: restore long timeout for controller plugins (#7840)
During MVP development, we reduced the timeout for controller plugins
to avoid long hangs in GC workers. But now that this work has been
moved to the volume watcher, we can restore the original timeout which
is better suited for the characteristic timescales of some cloud
provider APIs and better matches the behavior of k8s.
2020-04-30 17:12:05 -04:00
Seth Hoenig 880c4e23d3 env_aws: combine 3 log lines into 1 2020-04-29 10:47:36 -06:00
Seth Hoenig 67303b666c
env_aws: downgrade log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:34:26 -06:00
Seth Hoenig 5ddc607701
env_aws: fixup log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:33:53 -06:00
Seth Hoenig f8596a3602 env_aws: use best-effort lookup table for CPU performance in EC2
Fixes #7681

The current behavior of the CPU fingerprinter in AWS is that it
reads the **current** speed from `/proc/cpuinfo` (`CPU MHz` field).

This is because the max CPU frequency is not available by reading
anything on the EC2 instance itself. Normally on Linux one would
look at e.g. `sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`
or perhaps parse the values from the `CPU max MHz` field in
`/proc/cpuinfo`, but those values are not available.

Furthermore, no metadata about the CPU is made available in the
EC2 metadata service.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since `go-psutil` cannot determine the max CPU speed it defaults to
the current CPU speed, which could be basically any number between
0 and the true max. This is particularly bad on large, powerful
reserved instances which often idle at ~800 MHz while Nomad does
its fingerprinting (typically IO bound), which Nomad then uses as
the max, which results in severe loss of available resources.

Since the CPU specification is unavailable programmatically (at least
not without sudo) use a best-effort lookup table. This table was
generated by going through every instance type in AWS documentation
and copy-pasting the numbers.
https://aws.amazon.com/ec2/instance-types/

This approach obviously is not ideal as future instance types will
need to be added as they are introduced to AWS. However, using the
table should only be an improvement over the status quo since right
now Nomad miscalculates available CPU resources on all instance types.
2020-04-28 19:01:33 -06:00
Mahmood Ali 18dba6fdad Harmonize go-msgpack/codec/codecgen
Use v1.1.5 of go-msgpack/codec/codecgen, so go-msgpack codecgen matches
the library version.

We branched off earlier to pick up
f51b518921
, but apparently that's not needed as we could customize the package via
`-c` argument.
2020-04-28 17:12:31 -04:00
Tim Gross 083b35d651
csi: checkpoint volume claim garbage collection (#7782)
Adds a `CSIVolumeClaim` type to be tracked as current and past claims
on a volume. Allows for a client RPC failure during node or controller
detachment without having to keep the allocation around after the
first garbage collection eval.

This changeset lays groundwork for moving the actual detachment RPCs
into a volume watching loop outside the GC eval.
2020-04-23 11:06:23 -04:00
Charlie Voiselle c68c19f3cf
Use ExternalID in NodeStageVolume RPC (#7754) 2020-04-20 17:13:46 -04:00
Anthony Scalisi 9664c6b270
fix spelling errors (#6985) 2020-04-20 09:28:19 -04:00
Drew Bailey 8bfee62b70
Run task shutdown_delay regardless of service registration
task shutdown_delay will currently only run if there are registered
services for the task. This implementation detail isn't explicity stated
anywhere and is defined outside of the service stanza.

This change moves shutdown_delay to be evaluated after prekill hooks are
run, outside of any task runner hooks.

just use time.sleep
2020-04-10 11:06:26 -04:00
Nick Ethier 44ad5d96d8
ar/bridge: use cni.IsCNINotInitialized helper 2020-04-06 21:44:01 -04:00
Nick Ethier 58fe326090
ar/bridge: better cni status err handling 2020-04-06 21:21:42 -04:00
Nick Ethier 6a286777c7
ar/bridge: ensure cni configuration is always loaded 2020-04-06 21:02:26 -04:00
Nick Ethier 5166806993
Merge pull request #7600 from hashicorp/b-5767
tr/service_hook: prevent Update from running before Poststart finish
2020-04-06 16:52:42 -04:00
Nick Ethier 567609e101
tr/service_hook: reset initialized flag during deregister 2020-04-06 16:05:36 -04:00
Drew Bailey 4ab7c03641
Merge pull request #7618 from hashicorp/b-shutdown-delay-updates
Fixes bug that prevented group shutdown_delay updates
2020-04-06 13:05:20 -04:00
Drew Bailey 0d550049e9
ensure shutdown delay can be removed 2020-04-06 11:33:04 -04:00
Drew Bailey 9874e7b21d
Group shutdown delay fixes
Group shutdown delay updates were not properly handled in Update hook.
This commit also ensures that plan output is displayed.
2020-04-06 11:29:12 -04:00