Commit graph

4378 commits

Author SHA1 Message Date
Seth Hoenig cc70ce64ce consul/connect: avoid extra copy of connect stanza while interpolating 2020-12-09 11:44:07 -06:00
Seth Hoenig eb7cdce52b client/fingerprint/cpu: use fallback total compute value if cpu not detected
Previously, Nomad would fail to startup if the CPU fingerprinter could
not detect the cpu total compute (i.e. cores * mhz). This is common on
some EC2 instance types (graviton class), where the env_aws fingerprinter
will override the detected CPU performance with a more accurate value
anyway.

Instead of crashing on startup, have Nomad use a low default for available
cpu performance of 1000 ticks (e.g. 1 core * 1 GHz). This enables Nomad
to get past the useless cpu fingerprinting on those EC2 instances. The
crashing error message is now a log statement suggesting the setting of
cpu_total_compute in client config.

Fixes #7989
2020-12-09 10:35:58 -06:00
Seth Hoenig b51459a879 consul/connect: interpolate connect block
This PR enables job submitters to use interpolation in the connect
block of jobs making use of consul connect. Before, only the name of
the connect service would be interpolated, and only for a few select
identifiers related to the job itself (#6853). Now, all connect fields
can be interpolated using the full spectrum of runtime parameters.

Note that the service name is interpolated at job-submission time,
and cannot make use of values known only at runtime.

Fixes #7221
2020-12-09 09:10:00 -06:00
Kris Hicks 93155ba3da
Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Seth Hoenig 1ca5ea3240 env_aws: run ec2info to update ec2 info
Use `tools/ec2info` to update the generated table of instance types.
`$ go run .`
2020-12-02 09:35:03 -06:00
Seth Hoenig 3b2b083cbf
Merge pull request #9487 from hashicorp/f-connect-sidecar-concurrency
consul/connect: default envoy concurrency to 1
2020-12-01 15:51:41 -06:00
Seth Hoenig bf857684d1 consul/connect: default envoy concurrency to 1
Previously, every Envoy Connect sidecar would spawn as many worker
threads as logical CPU cores. That is Envoy's default behavior when
`--concurrency` is not explicitly set. Nomad now sets the concurrency
flag to 1, which is sensible for the default cpu = 250 Mhz resources
allocated for sidecar proxies. The concurrency value can be configured
in Client configuration by setting `meta.connect.proxy_concurrency`.

Closes #9341
2020-12-01 13:12:45 -06:00
Michael Schurter ea0e1789f4
Merge pull request #9435 from hashicorp/f-allocupdate-timer
client: always wait 200ms before sending updates
2020-12-01 08:45:17 -08:00
Drew Bailey 9adca240f8
Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447)
* upsertaclpolicies

* delete acl policies msgtype

* upsert acl policies msgtype

* delete acl tokens msgtype

* acl bootstrap msgtype

wip unsubscribe on token delete

test that subscriptions are closed after an ACL token has been deleted

Start writing policyupdated test

* update test to use before/after policy

* add SubscribeWithACLCheck to run acl checks on subscribe

* update rpc endpoint to use broker acl check

* Add and use subscriptions.closeSubscriptionFunc

This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.

handle acl policy updates

* rpc endpoint test for terminating acl change

* add comments

Co-authored-by: Kris Hicks <khicks@hashicorp.com>
2020-12-01 11:11:34 -05:00
Benjamin Buzbee e0acbbfcc6
Fix RPC retry logic in nomad client's rpc.go for blocking queries (#9266) 2020-11-30 15:11:10 -05:00
Roman Vynar b957f87cd7 Add compute/zone to Azure fingerprinting 2020-11-26 13:26:51 +02:00
Michael Schurter 5ec065b180 client: always wait 200ms before sending updates
Always wait 200ms before calling the Node.UpdateAlloc RPC to send
allocation updates to servers.

Prior to this change we only reset the update ticker when an error was
encountered. This meant the 200ms ticker was running while the RPC was
being performed. If the RPC was slow due to network latency or server
load and took >=200ms, the ticker would tick during the RPC.

Then on the next loop only the select would randomly choose between the
two viable cases: receive an update or fire the RPC again.

If the RPC case won it would immediately loop again due to there being
no updates to send.

When the update chan receive is selected a single update is added to the
slice. The odds are then 50/50 that the subsequent loop will send the
single update instead of receiving any more updates.

This could cause a couple of problems:

1. Since only a small number of updates are sent, the chan buffer may
   fill, applying backpressure, and slowing down other client
   operations.
2. The small number of updates sent may already be stale and not
   represent the current state of the allocation locally.

A risk here is that it's hard to reason about how this will interact
with the 50ms batches on servers when the servers under load.

A further improvement would be to completely remove the alloc update
chan and instead use a mutex to build a map of alloc updates. I wanted
to test the lowest risk possible change on loaded servers first before
making more drastic changes.
2020-11-25 11:36:51 -08:00
Michael Schurter 15f2b8fe7c client: skip broken test and fix assertion 2020-11-18 10:01:02 -08:00
Michael Schurter ff91bba70e client: fix interpolation in template source
While Nomad v0.12.8 fixed `NOMAD_{ALLOC,TASK,SECRETS}_DIR` use in
`template.destination`, interpolating these variables in
`template.source` caused a path escape error.

**Why not apply the destination fix to source?**

The destination fix forces destination to always be relative to the task
directory. This makes sense for the destination as a destination outside
the task directory would be unreachable by the task. There's no reason
to ever render a template outside the task directory. (Using `..` does
allow destinations to escape the task directory if
`template.disable_file_sandbox = true`. That's just awkward and unsafe
enough I hope no one uses it.)

There is a reason to source a template outside a task
directory. At least if there weren't then I can't think of why we
implemented `template.disable_file_sandbox`. So v0.12.8 left the
behavior of `template.source` the more straightforward "Interpolate and
validate."

However, since outside of `raw_exec` every other driver uses absolute
paths for `NOMAD_*_DIR` interpolation, this means those variables are
unusable unless `disable_file_sandbox` is set.

**The Fix**

The variables are now interpolated as relative paths *only for the
purpose of rendering templates.* This is an unfortunate special case,
but reflects the fact that the templates view of the filesystem is
completely different (unconstrainted) vs the task's view (chrooted).
Arguably the values of these variables *should be context-specific.*
I think it's more reasonable to think of the "hack" as templating
running uncontainerized than that giving templates different paths is a
hack.

**TODO**

- [ ] E2E tests
- [ ] Job validation may still be broken and prevent my fix from
      working?

**raw_exec**

`raw_exec` is actually broken _a different way_ as exercised by tests in
this commit. I think we should probably remove these tests and fix that
in a followup PR/release, but I wanted to leave them in for the initial
review and discussion. Since non-containerized source paths are broken
anyway, perhaps there's another solution to this entire problem I'm
overlooking?
2020-11-17 22:03:04 -08:00
Wim 4e37897dd9 Use correct interface for netStatus
CNI plugins can return multiple interfaces, eg the bridge plugin.
We need the interface with the sandbox.
2020-11-14 22:29:30 +01:00
Seth Hoenig 4cc3c01d5b
Merge pull request #9352 from hashicorp/f-artifact-headers
jobspec: add support for headers in artifact stanza
2020-11-13 14:04:27 -06:00
Seth Hoenig bb8a5816a0 jobspec: add support for headers in artifact stanza
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.

The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.

Closes #9306
2020-11-13 12:03:54 -06:00
Jasmine Dahilig d6110cbed4
lifecycle: add poststop hook (#8194) 2020-11-12 08:01:42 -08:00
Chris Baker 48b1674335
Merge pull request #9311 from jeromegn/allow-empty-devices
Don't ignore nil devices in plugin fingerprint
2020-11-11 13:54:03 -06:00
Tim Gross 60874ebe25
csi: Postrun hook should not change mode (#9323)
The unpublish workflow requires that we know the mode (RW vs RO) if we want to
unpublish the node. Update the hook and the Unpublish RPC so that we mark the
claim for release in a new state but leave the mode alone. This fixes a bug
where RO claims were failing node unpublish.

The core job GC doesn't know the mode, but we don't need it for that workflow,
so add a mode specifically for GC; the volumewatcher uses this as a sentinel
to check whether claims (with their specific RW vs RO modes) need to be claimed.
2020-11-11 13:06:30 -05:00
Jerome Gravel-Niquet d1f1dbd203
Don't ignore nil devices in plugin fingerprint
Even if a plugin sends back an empty `[]*device.DeviceGroup`, it's transformed to `nil` during the RPC. Our custom device plugin is returning empty `FingerprintResponse.Devices` very often. Our temporary fix is to send a dummy `*DeviceGroup` if the slice is empty. This has the effect of never triggering the "first fingerprint" and therefore timing out after 50s.

In turn, this made our node exceed its hearbeat grace period when restarting it, revoking all vault tokens for its allocations, causing a restart of all our allocations because the token couldn't be renewed.

Removing the logic for `f.Devices == nil` does not appear to affect the functionality of the function.
2020-11-10 16:04:22 -05:00
Seth Hoenig 9960f96446 client/fingerprint: detect unloaded dynamic bridge kernel module
In Nomad v0.12.0, the client added additional fingerprinting around the
presense of the bridge kernel module. The fingerprinter only checked in
`/proc/modules` which is a list of loaded modules. In some cases, the
bridge kernel module is builtin rather than dynamically loaded. The fix
for that case is in #8721. However we were still missing the case where
the bridge module is dynamically loaded, but not yet loaded during the
startup of the Nomad agent. In this case the fingerprinter would believe
the bridge module was unavailable when really it gets loaded on demand.

This PR now has the fingerprinter scan the kernel module dependency file,
which will contain an entry for the bridge module even if it is not yet
loaded.

In summary, the client now looks for the bridge kernel module in
 - /proc/modules
 - /lib/modules/<kernel>/modules.builtin
 - /lib/modules/<kernel>/modules.dep

Closes #8423
2020-11-09 13:56:14 -06:00
Nick Ethier 04f5c4ee5f
ar/groupservice: remove drivernetwork (#9233)
* ar/groupservice: remove drivernetwork

* consul: allow host address_mode to accept raw port numbers

* consul: fix logic for blank address
2020-11-05 15:00:22 -05:00
Stefan Richter 484ef8a1e8
Add NOMAD_JOB_ID and NOMAD_JOB_PAERENT_ID env variables (#8967)
Beforehand tasks and field replacements did not have access to the
unique ID of their job or its parent. This adds this information as
new environment variables.
2020-10-23 10:49:58 -04:00
Tim Gross 1fb1c9c5d4
artifact/template: make destination path absolute inside taskdir (#9149)
Prior to Nomad 0.12.5, you could use `${NOMAD_SECRETS_DIR}/mysecret.txt` as
the `artifact.destination` and `template.destination` because we would always
append the destination to the task working directory. In the recent security
patch we treated the `destination` absolute path as valid if it didn't escape
the working directory, but this breaks backwards compatibility and
interpolation of `destination` fields.

This changeset partially reverts the behavior so that we always append the
destination, but we also perform the escape check on that new destination
after interpolation so the security hole is closed.

Also, ConsulTemplate test should exercise interpolation
2020-10-22 15:47:49 -04:00
Tim Gross 6df36e4cdb artifact/template: prevent file sandbox escapes
Ensure that the client honors the client configuration for the
`template.disable_file_sandbox` field when validating the jobspec's
`template.source` parameter, and not just with consul-template's own `file`
function.

Prevent interpolated `template.source`, `template.destination`, and
`artifact.destination` fields from escaping file sandbox.
2020-10-21 14:34:12 -04:00
Alexander Shtuchkin 90fd8bb85f
Implement 'batch mode' for persisting allocations on the client. (#9093)
Fixes #9047, see problem details there.

As a solution, we use BoltDB's 'Batch' mode that combines multiple
parallel writes into small number of transactions. See
https://github.com/boltdb/bolt#batch-read-write-transactions for
more information.
2020-10-20 16:15:37 -04:00
Seth Hoenig 9cdb98f0e4 client: add tests around meta and canarymeta interpolation
Expanding on #9096, add tests for making sure service.Meta and
service.CanaryMeta are interpolated from environment variables.
2020-10-20 12:50:29 -05:00
Jorge Marey 8a0ef606a3 Add interpolation on service canarymeta 2020-10-20 12:45:36 -05:00
Drew Bailey 6c788fdccd
Events/msgtype cleanup (#9117)
* use msgtype in upsert node

adds message type to signature for upsert node, update tests, remove placeholder method

* UpsertAllocs msg type test setup

* use upsertallocs with msg type in signature

update test usage of delete node

delete placeholder msgtype method

* add msgtype to upsert evals signature, update test call sites with test setup msg type

handle snapshot upsert eval outside of FSM and ignore eval event

remove placeholder upsertevalsmsgtype

handle job plan rpc and prevent event creation for plan

msgtype cleanup upsertnodeevents

updatenodedrain msgtype

msg type 0 is a node registration event, so set the default  to the ignore type

* fix named import

* fix signature ordering on upsertnode to match
2020-10-19 09:30:15 -04:00
Nick Ethier 4903e5b114
Consul with CNI and host_network addresses (#9095)
* consul: advertise cni and multi host interface addresses

* structs: add service/check address_mode validation

* ar/groupservices: fetch networkstatus at hook runtime

* ar/groupservice: nil check network status getter before calling

* consul: comment network status can be nil
2020-10-15 15:32:21 -04:00
Michael Schurter 9c3972937b s/0.13/1.0/g
1.0 here we come!
2020-10-14 15:17:47 -07:00
Chris Baker 1d35578bed removed backwards-compatible/untagged metrics deprecated in 0.7 2020-10-13 20:18:39 +00:00
Seth Hoenig ed13e5723f consul/connect: dynamically select envoy sidecar at runtime
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.

This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.

Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.

Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.

`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.

Addresses #8585 #7665
2020-10-13 09:14:12 -05:00
Seth Hoenig 5a3748ca82
Merge pull request #9038 from hashicorp/f-ec2-table
env_aws: get ec2 cpu perf data from AWS API
2020-10-12 18:55:33 -05:00
Nick Ethier d45be0b5a6
client: add NetworkStatus to Allocation (#8657) 2020-10-12 13:43:04 -04:00
Yoan Blanc 891accb89a
use allow/deny instead of the colored alternatives (#9019)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-10-12 08:47:05 -04:00
Tim Gross b5abf4ec9d csi: fix incorrect comment on csi_hook context lifetime 2020-10-09 11:03:51 -04:00
Seth Hoenig 9b555fe6d5 env_aws: fixup test case node attr detection 2020-10-08 12:59:07 -05:00
Seth Hoenig e693d15a5b env_aws: get ec2 cpu perf data from AWS API
Previously, Nomad was using a hand-made lookup table for looking
up EC2 CPU performance characteristics (core count + speed = ticks).

This data was incomplete and incorrect depending on region. The AWS
API has the correct data but requires API keys to use (i.e. should not
be queried directly from Nomad).

This change introduces a lookup table generated by a small command line
tool in Nomad's tools module which uses the Amazon AWS API.

Running the tool requires AWS_* environment variables set.
  $ # in nomad/tools/cpuinfo
  $ go run .

Going forward, Nomad can incorporate regeneration of the lookup table
somewhere in the CI pipeline so that we remain up-to-date on the latest
offerings from EC2.

Fixes #7830
2020-10-08 12:01:09 -05:00
Landan Cheruka 023a2d36b7
fingerprint: changed unique.platform.azure.hostname to unique.platform.azure.name (#9016) 2020-10-02 16:50:12 -04:00
Javier Heredia 103ac0a37f
Add consul segment fingerprint (#7214) 2020-10-02 15:15:59 -04:00
Fredrik Hoem Grelland a015c52846
configure nomad cluster to use a Consul Namespace [Consul Enterprise] (#8849) 2020-10-02 14:46:36 -04:00
Fredrik Hoem Grelland 953d4de8dd
update consul-template to v0.25.1 (#8988) 2020-10-01 14:08:49 -04:00
Landan Cheruka 3df1802119
client: added azure fingerprinting support (#8979) 2020-10-01 09:10:27 -04:00
Lars Lehtonen 03abe3c890
client: fix test umask (#8987) 2020-09-30 08:09:41 -04:00
Mahmood Ali 2e9e8ccc24
Merge pull request #8982 from hashicorp/b-exec-dns-resolv
drivers/exec: fix DNS resolution in systemd hosts
2020-09-29 11:39:43 -05:00
Mahmood Ali 7ddf4b2902 drivers/exec: fix DNS resolution in systemd hosts
Host with systemd-resolved have `/etc/resolv.conf` is a symlink
to `/run/systemd/resolve/stub-resolv.conf`. By bind-mounting
/etc/resolv.conf only, the exec container DNS resolution fail very badly.

This change fixes DNS resolution by binding /run/systemd/resolve as
well.

Note that this assumes that the systemd resolver (default to 127.0.0.53) is
accessible within the container. This is the case here because exec
containers share the same network namespace by default.

Jobs with custom network dns configurations are not affected, and Nomad
will continue to use the job dns settings rather than host one.
2020-09-29 11:33:51 -04:00
Seth Hoenig af9543c997 consul: fix validation of task in group-level script-checks
When defining a script-check in a group-level service, Nomad needs to
know which task is associated with the check so that it can use the
correct task driver to execute the check.

This PR fixes two bugs:
1) validate service.task or service.check.task is configured
2) make service.check.task inherit service.task if it is itself unset

Fixes #8952
2020-09-28 15:02:59 -05:00
Pete Woods 81fa2a01fc
Add node "status", "scheduling eligibility" to all client metrics (#8925)
- We previously added these to the client host metrics, but it's useful to have them on all client metrics.
- e.g. so you can exclude draining nodes from charts showing your fleet size.
2020-09-22 13:53:50 -04:00
Pierre Cauchois e4b739cafd
RPC Timeout/Retries account for blocking requests (#8921)
The current implementation measures RPC request timeout only against
config.RPCHoldTimeout, which is fine for non-blocking requests but will
almost surely be exceeded by long-poll requests that block for minutes
at a time.

This adds an HasTimedOut method on the RPCInfo interface that takes into
account whether the request is blocking, its maximum wait time, and the
RPCHoldTimeout.
2020-09-18 08:58:41 -04:00
Joel May 2adc5bdec7
fingerprinting: add AWS MAC and public-ipv6 (#8887) 2020-09-17 09:03:01 -04:00
Lars Lehtonen 55f0302c46
client/allocrunner/taskrunner: client.Close after err check (#8825) 2020-09-04 08:12:08 -04:00
Tim Gross 8ad90b4253
fix params for Agent.Host client RPC (#8795)
The parameters for the receiving side of the Agent.Host client RPC did not
take the arguments serialized at the server side. This results in a panic.
2020-08-31 17:14:26 -04:00
Jasmine Dahilig 71a694f39c
Merge pull request #8390 from hashicorp/lifecycle-poststart-hook
task lifecycle poststart hook
2020-08-31 13:53:24 -07:00
Jasmine Dahilig fbe0c89ab1 task lifecycle poststart: code review fixes 2020-08-31 13:22:41 -07:00
Seth Hoenig 9f1f2a5673 Merge branch 'master' into f-cc-ingress 2020-08-26 15:31:05 -05:00
Seth Hoenig dfe179abc5 consul/connect: fixup some comments and context timeout 2020-08-26 13:17:16 -05:00
Mahmood Ali 10954bf717 close file when done reading 2020-08-24 20:22:42 -04:00
Mahmood Ali 0be632debf don't lock if ref is nil
Ensure that d.mu is only dereferenced if d is not-nil, to avoid a null
dereference panic.
2020-08-24 20:19:40 -04:00
Seth Hoenig 26e77623e5 consul/connect: fixup tests to use new consul sdk 2020-08-24 12:02:41 -05:00
Seth Hoenig a09d1746bf
Merge branch 'master' into consul-v1.7.7 2020-08-24 10:43:00 -05:00
Yoan Blanc 327d17e0dc
fixup! vendor: consul/api, consul/sdk v1.6.0
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-08-24 08:59:03 +02:00
Mark Lee cd23fd7ca2 refactor lookup code 2020-08-24 12:24:16 +09:00
Mark Lee cd7aabca72 lookup kernel builtin modules too 2020-08-24 11:09:13 +09:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Shengjing Zhu 7a4f48795d Adjust cgroup change in libcontainer 2020-08-20 00:31:07 +08:00
Michael Schurter de08ae8083 test: add allocrunner test for poststart hooks 2020-08-12 09:54:14 -07:00
Nick Ethier e39574be59
docker: support group allocated ports and host_networks (#8623)
* docker: support group allocated ports

* docker: add new ports driver config to specify which group ports are mapped

* docker: update port mapping docs
2020-08-11 18:30:22 -04:00
Lang Martin a27913e699
CSI RPC Token (#8626)
* client/allocrunner/csi_hook: use the Node SecretID
* client/allocrunner/csi_hook: include the namespace for Claim
2020-08-11 13:08:39 -04:00
Michael Schurter e1946b66ce client: remove shortcircuit preventing poststart hooks from running 2020-08-11 09:48:24 -07:00
Michael Schurter 04a135b57d client: don't restart poststart sidecars on success 2020-08-11 09:47:18 -07:00
Tim Gross 7d53ed88d6
csi: client RPCs should return wrapped errors for checking (#8605)
When the client-side actions of a CSI client RPC succeed but we get
disconnected during the RPC or we fail to checkpoint the claim state, we want
to be able to retry the client RPC without getting blocked by the client-side
state (ex. mount points) already having been cleaned up in previous calls.
2020-08-07 11:01:36 -04:00
Tim Gross 2854298089
csi: release claims via csi_hook postrun unpublish RPC (#8580)
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.

By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
2020-08-06 14:51:46 -04:00
Jasmine Dahilig e8ed6851e2 lifecycle: add allocrunner and task hook coordinator unit tests 2020-07-29 12:39:42 -07:00
Seth Hoenig a392b19b6a consul/connect: fixup some spelling, comments, consts 2020-07-29 09:26:01 -05:00
Seth Hoenig 04bb6c416f consul/connect: organize lock & fields in http/grpc socket hooks 2020-07-29 09:26:01 -05:00
Seth Hoenig dbee956c05 consul/connect: optimze grpc socket hook check for bridge network first 2020-07-29 09:26:01 -05:00
Seth Hoenig 2511f48351 consul/connect: add support for bridge networks with connect native tasks
Before, Connect Native Tasks needed one of these to work:

- To be run in host networking mode
- To have the Consul agent configured to listen to a unix socket
- To have the Consul agent configured to listen to a public interface

None of these are a great experience, though running in host networking is
still the best solution for non-Linux hosts. This PR establishes a connection
proxy between the Consul HTTP listener and a unix socket inside the alloc fs,
bypassing the network namespace for any Connect Native task. Similar to and
re-uses a bunch of code from the gRPC listener version for envoy sidecar proxies.

Proxy is established only if the alloc is configured for bridge networking and
there is at least one Connect Native task in the Task Group.

Fixes #8290
2020-07-29 09:26:01 -05:00
Drew Bailey bd421b6197
Merge pull request #8453 from hashicorp/oss-multi-vault-ns
oss compoments for multi-vault namespaces
2020-07-27 08:45:22 -04:00
Mahmood Ali 2d0b80a0ed
Merge pull request #6517 from hashicorp/b-fingerprint-shutdown-race
client: don't retry fingerprinting on shutdown
2020-07-24 11:56:32 -04:00
Drew Bailey b296558b8e
oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
Michael Schurter 1400e0480d
Merge pull request #8521 from hashicorp/docs-hearbeat
docs: s/hearbeat/heartbeat and fix link
2020-07-23 14:07:24 -07:00
Tim Gross 56c6dacd38
csi: NodePublish should not create target_path, only its parent dir (#8505)
The NodePublish workflow currently creates the target path and its parent
directory. However, the CSI specification says that the CO shall ensure the
parent directory of the target path exists, and that the SP shall place the
block device or mounted directory at the target path. Much of our testing has
been with CSI plugins that are more forgiving, but our behavior breaks
spec-compliant CSI plugins.

This changeset ensures we only create the parent directory.
2020-07-23 15:52:22 -04:00
Michael Schurter 8340ad4da8 docs: s/hearbeat/heartbeat and fix link
Also fixed the same typo in a test. Fixing the typo fixes the link, but
the link was still broken when running the website locally due to the
trailing slash. It would have worked in prod thanks to redirects, but
using the canonical URL seems ideal.
2020-07-23 11:33:34 -07:00
Mahmood Ali 91ba9ccefe honor config.NetworkInterface in NodeNetworks 2020-07-21 15:43:45 -04:00
Mahmood Ali c2d3c3e431
nvidia: support disabling the nvidia plugin (#8353) 2020-07-21 10:11:16 -04:00
Jasmine Dahilig 44c21bd3c7 fix panic, but poststart is still stalled 2020-07-10 09:03:10 -07:00
Mahmood Ali e9bf3a42f5
Merge pull request #8333 from hashicorp/b-test-tweak-20200701
tests: avoid os.Exit in a test
2020-07-10 11:18:28 -04:00
Jasmine Dahilig 9e27231953 add poststart hook to task hook coordinator & structs 2020-07-08 11:01:35 -07:00
Nick Ethier e0fb634309
ar: support opting into binding host ports to default network IP (#8321)
* ar: support opting into binding host ports to default network IP

* fix config plumbing

* plumb node address into network resource

* struct: only handle network resource upgrade path once
2020-07-06 18:51:46 -04:00
Lang Martin 6c22cd587d
api: nomad debug new /agent/host (#8325)
* command/agent/host: collect host data, multi platform

* nomad/structs/structs: new HostDataRequest/Response

* client/agent_endpoint: add RPC endpoint

* command/agent/agent_endpoint: add Host

* api/agent: add the Host endpoint

* nomad/client_agent_endpoint: add Agent Host with forwarding

* nomad/client_agent_endpoint: use findClientConn

This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.

* command/debug: call agent hosts

* command/agent/host: eliminate calling external programs
2020-07-02 09:51:25 -04:00
Mahmood Ali 026d8c6eed tests: avoid os.Exit in a test 2020-07-01 15:25:13 -04:00
Mahmood Ali 7f460d2706 allocrunner: terminate sidecars in the end
This fixes a bug where a batch allocation fails to complete if it has
sidecars.

If the only remaining running tasks in an allocations are sidecars - we
must kill them and mark the allocation as complete.
2020-06-29 15:12:15 -04:00
Seth Hoenig 011c6b027f connect/native: doc and comment tweaks from PR 2020-06-24 10:13:22 -05:00
Seth Hoenig 03a5706919 connect/native: check for pre-existing consul token 2020-06-24 09:16:28 -05:00
Seth Hoenig 6154181a64 connect/native: update connect native hook tests 2020-06-23 12:07:35 -05:00
Seth Hoenig c5d3f58bee connect/native: give tls files an extension 2020-06-23 12:06:28 -05:00
Seth Hoenig 4d71f22a11 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00
Tim Gross 3d38592fbb
csi: add VolumeContext to NodeStage/Publish RPCs (#8239)
In #7957 we added support for passing a volume context to the controller RPCs.
This is an opaque map that's created by `CreateVolume` or, in Nomad's case,
in the volume registration spec.

However, we missed passing this field to the `NodeStage` and `NodePublish` RPC,
which prevents certain plugins (such as MooseFS) from making node RPCs.
2020-06-22 13:54:32 -04:00
Michael Schurter 562704124d
Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00
Mahmood Ali 3824e0362c
Revert "client: defensive against getting stale alloc updates" 2020-06-19 15:39:44 -04:00
Nick Ethier a87e91e971
test: fix up testing around host networks 2020-06-19 13:53:31 -04:00
Nick Ethier f0ac1f027a
lint: spelling 2020-06-19 11:29:41 -04:00
Nick Ethier 0374ad3e6c
taskenv: populate NOMAD_IP|PORT|ADDR env from allocated ports 2020-06-19 10:51:32 -04:00
Nick Ethier f0559a8162
multi-interface network support 2020-06-19 09:42:10 -04:00
Nick Ethier 4a44deaa5c CNI Implementation (#7518) 2020-06-18 11:05:29 -07:00
Nick Ethier 0bc0403cc3 Task DNS Options (#7661)
Co-Authored-By: Tim Gross <tgross@hashicorp.com>
Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>
2020-06-18 11:01:31 -07:00
Drew Bailey 84afc28ceb
only report tasklogger is running if both stdout and stderr are still running (#8155)
* only report tasklogger is running if both stdout and stderr are still running

* changelog
2020-06-12 09:17:35 -04:00
Lang Martin ac7c39d3d3
Delayed evaluations for stop_after_client_disconnect can cause unwanted extra followup evaluations around job garbage collection (#8099)
* client/heartbeatstop: reversed time condition for startup grace

* scheduler/generic_sched: use `delayInstead` to avoid a loop

Without protecting the loop that creates followUpEvals, a delayed eval
is allowed to create an immediate subsequent delayed eval. For both
`stop_after_client_disconnect` and the `reschedule` block, a delayed
eval should always produce some immediate result (running or blocked)
and then only after the outcome of that eval produce a second delayed
eval.

* scheduler/reconcile: lostLater are different than delayedReschedules

Just slightly. `lostLater` allocs should be used to create batched
evaluations, but `handleDelayedReschedules` assumes that the
allocations are in the untainted set. When it creates the in-place
updates to those allocations at the end, it causes the allocation to
be treated as running over in the planner, which causes the initial
`stop_after_client_disconnect` evaluation to be retried by the worker.
2020-06-03 09:48:38 -04:00
Mahmood Ali 5703c0db80 tests: Run a task long enough to be restartable 2020-05-31 10:33:03 -04:00
Drew Bailey 59ca304fce
give enterpriseclient a logger (#8072) 2020-05-28 15:43:16 -04:00
Drew Bailey 34871f89be
Oss license support for ent builds (#8054)
* changes necessary to support oss licesning shims

revert nomad fmt changes

update test to work with enterprise changes

update tests to work with new ent enforcements

make check

update cas test to use scheduler algorithm

back out preemption changes

add comments

* remove unused method
2020-05-27 13:46:52 -04:00
Mahmood Ali 2588b3bc98 cleanup driver eventor goroutines
This fixes few cases where driver eventor goroutines are leaked during
normal operations, but especially so in tests.

This change makes few modifications:

First, it switches drivers to use `Context`s to manage shutdown events.
Previously, it relied on callers invoking `.Shutdown()` function that is
specific to internal drivers only and require casting.  Using `Contexts`
provide a consistent idiomatic way to manage lifecycle for both internal
and external drivers.

Also, I discovered few places where we don't clean up a temporary driver
instance in the plugin catalog code, where we dispense a driver to
inspect and validate the schema config without properly cleaning it up.
2020-05-26 11:04:04 -04:00
Tim Gross ba11aef5d9
csi: skip unit tests on unsupported platforms (#8033)
Some of the unit tests for CSI require platform-specific APIs that aren't
available on macOS. We can safely skip these tests.
2020-05-21 13:56:50 -04:00
Tim Gross aa8927abb4
volumes: return better error messages for unsupported task drivers (#8030)
When an allocation runs for a task driver that can't support volume mounts,
the mounting will fail in a way that can be hard to understand. With host
volumes this usually means failing silently, whereas with CSI the operator
gets inscrutable internals exposed in the `nomad alloc status`.

This changeset adds a MountConfig field to the task driver Capabilities
response. We validate this when the `csi_hook` or `volume_hook` fires and
return a user-friendly error.

Note that we don't currently have a way to get driver capabilities up to the
server, except through attributes. Validating this when the user initially
submits the jobspec would be even better than what we're doing here (and could
be useful for all our other capabilities), but that's out of scope for this
changeset.

Also note that the MountConfig enum starts with "supports all" in order to
support community plugins in a backwards compatible way, rather than cutting
them off from volume mounting unexpectedly.
2020-05-21 09:18:02 -04:00
Tim Gross 065fa7af8b
stats_hook: log normal shutdown condition as debug, not error (#8028)
The `stats_hook` writes an Error log every time an allocation becomes
terminal. This is a normal condition, not an error. A real error
condition like a failure to collect the stats is logged later. It just
creates log noise, and this is a particularly bad operator experience
for heavy batch workloads.
2020-05-20 10:28:30 -04:00
Mahmood Ali 751f337f1c Update hcl2 vendoring
The hcl2 library has moved from http://github.com/hashicorp/hcl2 to https://github.com/hashicorp/hcl/tree/hcl2.

This updates Nomad's vendoring to start using hcl2 library.  Also
updates some related libraries (e.g. `github.com/zclconf/go-cty/cty` and
`github.com/apparentlymart/go-textseg`).
2020-05-19 15:00:03 -04:00
Tim Gross 6a463dc13a
csi: use a blocking initial connection with timeout (#7965)
The plugin supervisor lazily connects to plugins, but this means we
only get "Unavailable" back from the gRPC call in cases where the
plugin can never be reached (for example, if the Nomad client has the
wrong permissions for the socket).

This changeset improves the operator experience by switching to a
blocking `DialWithContext`. It eagerly connects so that we can
validate the connection is real and get a "failed to open" error in
case where Nomad can't establish the initial connection.
2020-05-15 08:17:11 -04:00
Tim Gross 2082cf738a
csi: support for VolumeContext and VolumeParameters (#7957)
The MVP for CSI in the 0.11.0 release of Nomad did not include support
for opaque volume parameters or volume context. This changeset adds
support for both.

This also moves args for ControllerValidateCapabilities into a struct.
The CSI plugin `ControllerValidateCapabilities` struct that we turn
into a CSI RPC is accumulating arguments, so moving it into a request
struct will reduce the churn of this internal API, make the plugin
code more readable, and make this method consistent with the other
plugin methods in that package.
2020-05-15 08:16:01 -04:00
Tim Gross 24aa32c503 csi: use a blocking initial connection with timeout
The plugin supervisor lazily connects to plugins, but this means we
only get "Unavailable" back from the gRPC call in cases where the
plugin can never be reached (for example, if the Nomad client has the
wrong permissions for the socket).

This changeset improves the operator experience by switching to a
blocking `DialWithContext`. It eagerly connects so that we can
validate the connection is real and get a "failed to open" error in
case where Nomad can't establish the initial connection.
2020-05-14 15:59:19 -04:00
Tim Gross 4f54a633a2
csi: refactor internal client field name to ExternalID (#7958)
The CSI plugins RPCs require the use of the storage provider's volume
ID, rather than the user-defined volume ID. Although changing the RPCs
to use the field name `ExternalID` risks breaking backwards
compatibility, we can use the `ExternalID` name internally for the
client and only use `VolumeID` at the RPC boundaries.
2020-05-14 11:56:07 -04:00
Lang Martin d3c4700cd3
server: stop after client disconnect (#7939)
* jobspec, api: add stop_after_client_disconnect

* nomad/state/state_store: error message typo

* structs: alloc methods to support stop_after_client_disconnect

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

* scheduler/reconcile_util: delayByStopAfterClientDisconnect

* scheduler/reconcile: use delayByStopAfterClientDisconnect

* scheduler/util: updateNonTerminalAllocsToLost comments

This was setup to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by now

* scheduler/testing: AssertEvalStatus just use require

* scheduler/generic_sched: don't create a blocked eval if delayed

* scheduler/generic_sched_test: several scheduling cases
2020-05-13 16:39:04 -04:00
Mahmood Ali 0ece631e60 allochealth: Fix when check health preceeds task health
Fix a bug where if the alloc check becomes healthy before the task health, the
alloc may never be considered healthy.
2020-05-13 07:44:39 -04:00
Mahmood Ali 934c5e8ff0 tests: tests for health check sequencing
Add a failing tests to show that if an alloc checks is marked healthy before the
alloc tasks start up, the alloc may be forever considered unhealthy.
2020-05-13 07:43:00 -04:00
Tim Gross 4374c1a837
csi: support Secrets parameter in CSI RPCs (#7923)
CSI plugins can require credentials for some publishing and
unpublishing workflow RPCs. Secrets are configured at the time of
volume registration, stored in the volume struct, and then passed
around as an opaque map by Nomad to the plugins.
2020-05-11 17:12:51 -04:00
Mahmood Ali 938e916d9c When serializing msgpack, only consider codec tag
When serializing structs with msgpack, only consider type tags of
`codec`.

Hashicorp/go-msgpack (based on ugorji/go) defaults to interpretting
`codec` tag if it's available, but falls to using `json` if `codec`
isn't present.

This behavior is surprising in cases where we want to serialize json
differently from msgpack, e.g. serializing `ConsulExposeConfig`.
2020-05-11 14:14:10 -04:00
Mahmood Ali 543f08c1ae Deflake TestTaskTemplateManager_BlockedEvents test
This change deflakes TestTaskTemplateManager_BlockedEvents test, because
it is expecting a number of events without accounting for transitional
state.

The test TestTaskTemplateManager_BlockedEvents attempts to ensure that a
template rendering emits blocked events for missing template ksys.

It works by setting a template that requires keys 0,1,2,3,4 and then
eventually sets keys 0,1,2,3 and ensures that we get a final event indicating
that keys 3 and 4 are still missing.

The test waits to get a blocked event for the final state, but it can
fail if receives a blocked event for a transitional state (e.g. one
reporting 2,3,4,5 are missing).

This fixes the test by ensuring that it waits until the final message
before assertion.

Also, it clarifies the intent of the test with stricter assertions and
additional comments.
2020-05-09 14:09:39 -04:00
Juan Larriba a0df437c62
Run Linux Images (LCOW) and Windows Containers side by side (#7850)
Makes it possible to run Linux Containers On Windows with Nomad alongside Windows Containers. Fingerprint prevents only to run Nomad in Windows 10 with Linux Containers
2020-05-04 13:08:47 -04:00
Lang Martin ad2fb4b297 client/heartbeatstop: don't store client state, use timeout
In order to minimize this change while keeping a simple version of the
behavior, we set `lastOk` to the current time less the intial server
connection timeout. If the client starts and never contacts the
server, it will stop all configured tasks after the initial server
connection grace period, on the assumption that we've been out of
touch longer than any configured `stop_after_client_disconnect`.

The more complex state behavior might be justified later, but we
should learn about failure modes first.
2020-05-01 12:35:49 -04:00
Lang Martin 28bac139cb client/heartbeatstop: destroy allocs when disconnected from servers
- track lastHeartbeat, the client local time of the last successful
  heartbeat round trip
- track allocations with `stop_after_client_disconnect` configured
- trigger allocation destroy (which handles cleanup)
- restore heartbeat/killable allocs tracking when allocs are recovered from disk
- on client restart, stop those allocs after a grace period if the
  servers are still partioned
2020-05-01 12:35:49 -04:00
Tim Gross cc7dbad1c7
csi: restore long timeout for controller plugins (#7840)
During MVP development, we reduced the timeout for controller plugins
to avoid long hangs in GC workers. But now that this work has been
moved to the volume watcher, we can restore the original timeout which
is better suited for the characteristic timescales of some cloud
provider APIs and better matches the behavior of k8s.
2020-04-30 17:12:05 -04:00
Seth Hoenig 880c4e23d3 env_aws: combine 3 log lines into 1 2020-04-29 10:47:36 -06:00
Seth Hoenig 67303b666c
env_aws: downgrade log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:34:26 -06:00
Seth Hoenig 5ddc607701
env_aws: fixup log line
Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>
2020-04-29 10:33:53 -06:00
Seth Hoenig f8596a3602 env_aws: use best-effort lookup table for CPU performance in EC2
Fixes #7681

The current behavior of the CPU fingerprinter in AWS is that it
reads the **current** speed from `/proc/cpuinfo` (`CPU MHz` field).

This is because the max CPU frequency is not available by reading
anything on the EC2 instance itself. Normally on Linux one would
look at e.g. `sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`
or perhaps parse the values from the `CPU max MHz` field in
`/proc/cpuinfo`, but those values are not available.

Furthermore, no metadata about the CPU is made available in the
EC2 metadata service.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html

Since `go-psutil` cannot determine the max CPU speed it defaults to
the current CPU speed, which could be basically any number between
0 and the true max. This is particularly bad on large, powerful
reserved instances which often idle at ~800 MHz while Nomad does
its fingerprinting (typically IO bound), which Nomad then uses as
the max, which results in severe loss of available resources.

Since the CPU specification is unavailable programmatically (at least
not without sudo) use a best-effort lookup table. This table was
generated by going through every instance type in AWS documentation
and copy-pasting the numbers.
https://aws.amazon.com/ec2/instance-types/

This approach obviously is not ideal as future instance types will
need to be added as they are introduced to AWS. However, using the
table should only be an improvement over the status quo since right
now Nomad miscalculates available CPU resources on all instance types.
2020-04-28 19:01:33 -06:00
Mahmood Ali 18dba6fdad Harmonize go-msgpack/codec/codecgen
Use v1.1.5 of go-msgpack/codec/codecgen, so go-msgpack codecgen matches
the library version.

We branched off earlier to pick up
f51b518921
, but apparently that's not needed as we could customize the package via
`-c` argument.
2020-04-28 17:12:31 -04:00
Tim Gross 083b35d651
csi: checkpoint volume claim garbage collection (#7782)
Adds a `CSIVolumeClaim` type to be tracked as current and past claims
on a volume. Allows for a client RPC failure during node or controller
detachment without having to keep the allocation around after the
first garbage collection eval.

This changeset lays groundwork for moving the actual detachment RPCs
into a volume watching loop outside the GC eval.
2020-04-23 11:06:23 -04:00
Charlie Voiselle c68c19f3cf
Use ExternalID in NodeStageVolume RPC (#7754) 2020-04-20 17:13:46 -04:00
Anthony Scalisi 9664c6b270
fix spelling errors (#6985) 2020-04-20 09:28:19 -04:00
Drew Bailey 8bfee62b70
Run task shutdown_delay regardless of service registration
task shutdown_delay will currently only run if there are registered
services for the task. This implementation detail isn't explicity stated
anywhere and is defined outside of the service stanza.

This change moves shutdown_delay to be evaluated after prekill hooks are
run, outside of any task runner hooks.

just use time.sleep
2020-04-10 11:06:26 -04:00
Nick Ethier 44ad5d96d8
ar/bridge: use cni.IsCNINotInitialized helper 2020-04-06 21:44:01 -04:00
Nick Ethier 58fe326090
ar/bridge: better cni status err handling 2020-04-06 21:21:42 -04:00
Nick Ethier 6a286777c7
ar/bridge: ensure cni configuration is always loaded 2020-04-06 21:02:26 -04:00
Nick Ethier 5166806993
Merge pull request #7600 from hashicorp/b-5767
tr/service_hook: prevent Update from running before Poststart finish
2020-04-06 16:52:42 -04:00
Nick Ethier 567609e101
tr/service_hook: reset initialized flag during deregister 2020-04-06 16:05:36 -04:00
Drew Bailey 4ab7c03641
Merge pull request #7618 from hashicorp/b-shutdown-delay-updates
Fixes bug that prevented group shutdown_delay updates
2020-04-06 13:05:20 -04:00
Drew Bailey 0d550049e9
ensure shutdown delay can be removed 2020-04-06 11:33:04 -04:00
Drew Bailey 9874e7b21d
Group shutdown delay fixes
Group shutdown delay updates were not properly handled in Update hook.
This commit also ensures that plan output is displayed.
2020-04-06 11:29:12 -04:00
Tim Gross 027277a0d9 csi: make volume GC in job deregister safely async
The `Job.Deregister` call will block on the client CSI controller RPCs
while the alloc still exists on the Nomad client node. So we need to
make the volume claim reaping async from the `Job.Deregister`. This
allows `nomad job stop` to return immediately. In order to make this
work, this changeset changes the volume GC so that the GC jobs are on a
by-volume basis rather than a by-job basis; we won't have to query
the (possibly deleted) job at the time of volume GC. We smuggle the
volume ID and whether it's a purge into the GC eval ID the same way we
smuggled the job ID previously.
2020-04-06 10:15:55 -04:00