* upsertaclpolicies
* delete acl policies msgtype
* upsert acl policies msgtype
* delete acl tokens msgtype
* acl bootstrap msgtype
wip unsubscribe on token delete
test that subscriptions are closed after an ACL token has been deleted
Start writing policyupdated test
* update test to use before/after policy
* add SubscribeWithACLCheck to run acl checks on subscribe
* update rpc endpoint to use broker acl check
* Add and use subscriptions.closeSubscriptionFunc
This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.
handle acl policy updates
* rpc endpoint test for terminating acl change
* add comments
Co-authored-by: Kris Hicks <khicks@hashicorp.com>
Always wait 200ms before calling the Node.UpdateAlloc RPC to send
allocation updates to servers.
Prior to this change we only reset the update ticker when an error was
encountered. This meant the 200ms ticker was running while the RPC was
being performed. If the RPC was slow due to network latency or server
load and took >=200ms, the ticker would tick during the RPC.
Then on the next loop only the select would randomly choose between the
two viable cases: receive an update or fire the RPC again.
If the RPC case won it would immediately loop again due to there being
no updates to send.
When the update chan receive is selected a single update is added to the
slice. The odds are then 50/50 that the subsequent loop will send the
single update instead of receiving any more updates.
This could cause a couple of problems:
1. Since only a small number of updates are sent, the chan buffer may
fill, applying backpressure, and slowing down other client
operations.
2. The small number of updates sent may already be stale and not
represent the current state of the allocation locally.
A risk here is that it's hard to reason about how this will interact
with the 50ms batches on servers when the servers under load.
A further improvement would be to completely remove the alloc update
chan and instead use a mutex to build a map of alloc updates. I wanted
to test the lowest risk possible change on loaded servers first before
making more drastic changes.
While Nomad v0.12.8 fixed `NOMAD_{ALLOC,TASK,SECRETS}_DIR` use in
`template.destination`, interpolating these variables in
`template.source` caused a path escape error.
**Why not apply the destination fix to source?**
The destination fix forces destination to always be relative to the task
directory. This makes sense for the destination as a destination outside
the task directory would be unreachable by the task. There's no reason
to ever render a template outside the task directory. (Using `..` does
allow destinations to escape the task directory if
`template.disable_file_sandbox = true`. That's just awkward and unsafe
enough I hope no one uses it.)
There is a reason to source a template outside a task
directory. At least if there weren't then I can't think of why we
implemented `template.disable_file_sandbox`. So v0.12.8 left the
behavior of `template.source` the more straightforward "Interpolate and
validate."
However, since outside of `raw_exec` every other driver uses absolute
paths for `NOMAD_*_DIR` interpolation, this means those variables are
unusable unless `disable_file_sandbox` is set.
**The Fix**
The variables are now interpolated as relative paths *only for the
purpose of rendering templates.* This is an unfortunate special case,
but reflects the fact that the templates view of the filesystem is
completely different (unconstrainted) vs the task's view (chrooted).
Arguably the values of these variables *should be context-specific.*
I think it's more reasonable to think of the "hack" as templating
running uncontainerized than that giving templates different paths is a
hack.
**TODO**
- [ ] E2E tests
- [ ] Job validation may still be broken and prevent my fix from
working?
**raw_exec**
`raw_exec` is actually broken _a different way_ as exercised by tests in
this commit. I think we should probably remove these tests and fix that
in a followup PR/release, but I wanted to leave them in for the initial
review and discussion. Since non-containerized source paths are broken
anyway, perhaps there's another solution to this entire problem I'm
overlooking?
This PR adds the ability to set HTTP headers when downloading
an artifact from an `http` or `https` resource.
The implementation in `go-getter` is such that a new `HTTPGetter`
must be created for each artifact that sets headers (as opposed
to conveniently setting headers per-request). This PR maintains
the memoization of the default Getter objects, creating new ones
only for artifacts where headers are set.
Closes#9306
The unpublish workflow requires that we know the mode (RW vs RO) if we want to
unpublish the node. Update the hook and the Unpublish RPC so that we mark the
claim for release in a new state but leave the mode alone. This fixes a bug
where RO claims were failing node unpublish.
The core job GC doesn't know the mode, but we don't need it for that workflow,
so add a mode specifically for GC; the volumewatcher uses this as a sentinel
to check whether claims (with their specific RW vs RO modes) need to be claimed.
Even if a plugin sends back an empty `[]*device.DeviceGroup`, it's transformed to `nil` during the RPC. Our custom device plugin is returning empty `FingerprintResponse.Devices` very often. Our temporary fix is to send a dummy `*DeviceGroup` if the slice is empty. This has the effect of never triggering the "first fingerprint" and therefore timing out after 50s.
In turn, this made our node exceed its hearbeat grace period when restarting it, revoking all vault tokens for its allocations, causing a restart of all our allocations because the token couldn't be renewed.
Removing the logic for `f.Devices == nil` does not appear to affect the functionality of the function.
In Nomad v0.12.0, the client added additional fingerprinting around the
presense of the bridge kernel module. The fingerprinter only checked in
`/proc/modules` which is a list of loaded modules. In some cases, the
bridge kernel module is builtin rather than dynamically loaded. The fix
for that case is in #8721. However we were still missing the case where
the bridge module is dynamically loaded, but not yet loaded during the
startup of the Nomad agent. In this case the fingerprinter would believe
the bridge module was unavailable when really it gets loaded on demand.
This PR now has the fingerprinter scan the kernel module dependency file,
which will contain an entry for the bridge module even if it is not yet
loaded.
In summary, the client now looks for the bridge kernel module in
- /proc/modules
- /lib/modules/<kernel>/modules.builtin
- /lib/modules/<kernel>/modules.dep
Closes#8423
Beforehand tasks and field replacements did not have access to the
unique ID of their job or its parent. This adds this information as
new environment variables.
Prior to Nomad 0.12.5, you could use `${NOMAD_SECRETS_DIR}/mysecret.txt` as
the `artifact.destination` and `template.destination` because we would always
append the destination to the task working directory. In the recent security
patch we treated the `destination` absolute path as valid if it didn't escape
the working directory, but this breaks backwards compatibility and
interpolation of `destination` fields.
This changeset partially reverts the behavior so that we always append the
destination, but we also perform the escape check on that new destination
after interpolation so the security hole is closed.
Also, ConsulTemplate test should exercise interpolation
Ensure that the client honors the client configuration for the
`template.disable_file_sandbox` field when validating the jobspec's
`template.source` parameter, and not just with consul-template's own `file`
function.
Prevent interpolated `template.source`, `template.destination`, and
`artifact.destination` fields from escaping file sandbox.
* use msgtype in upsert node
adds message type to signature for upsert node, update tests, remove placeholder method
* UpsertAllocs msg type test setup
* use upsertallocs with msg type in signature
update test usage of delete node
delete placeholder msgtype method
* add msgtype to upsert evals signature, update test call sites with test setup msg type
handle snapshot upsert eval outside of FSM and ignore eval event
remove placeholder upsertevalsmsgtype
handle job plan rpc and prevent event creation for plan
msgtype cleanup upsertnodeevents
updatenodedrain msgtype
msg type 0 is a node registration event, so set the default to the ignore type
* fix named import
* fix signature ordering on upsertnode to match
* consul: advertise cni and multi host interface addresses
* structs: add service/check address_mode validation
* ar/groupservices: fetch networkstatus at hook runtime
* ar/groupservice: nil check network status getter before calling
* consul: comment network status can be nil
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.
This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.
Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.
Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.
`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.
Addresses #8585#7665
Previously, Nomad was using a hand-made lookup table for looking
up EC2 CPU performance characteristics (core count + speed = ticks).
This data was incomplete and incorrect depending on region. The AWS
API has the correct data but requires API keys to use (i.e. should not
be queried directly from Nomad).
This change introduces a lookup table generated by a small command line
tool in Nomad's tools module which uses the Amazon AWS API.
Running the tool requires AWS_* environment variables set.
$ # in nomad/tools/cpuinfo
$ go run .
Going forward, Nomad can incorporate regeneration of the lookup table
somewhere in the CI pipeline so that we remain up-to-date on the latest
offerings from EC2.
Fixes#7830
Host with systemd-resolved have `/etc/resolv.conf` is a symlink
to `/run/systemd/resolve/stub-resolv.conf`. By bind-mounting
/etc/resolv.conf only, the exec container DNS resolution fail very badly.
This change fixes DNS resolution by binding /run/systemd/resolve as
well.
Note that this assumes that the systemd resolver (default to 127.0.0.53) is
accessible within the container. This is the case here because exec
containers share the same network namespace by default.
Jobs with custom network dns configurations are not affected, and Nomad
will continue to use the job dns settings rather than host one.
When defining a script-check in a group-level service, Nomad needs to
know which task is associated with the check so that it can use the
correct task driver to execute the check.
This PR fixes two bugs:
1) validate service.task or service.check.task is configured
2) make service.check.task inherit service.task if it is itself unset
Fixes#8952
- We previously added these to the client host metrics, but it's useful to have them on all client metrics.
- e.g. so you can exclude draining nodes from charts showing your fleet size.
The current implementation measures RPC request timeout only against
config.RPCHoldTimeout, which is fine for non-blocking requests but will
almost surely be exceeded by long-poll requests that block for minutes
at a time.
This adds an HasTimedOut method on the RPCInfo interface that takes into
account whether the request is blocking, its maximum wait time, and the
RPCHoldTimeout.
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.
```hcl
service {
connect {
gateway {
proxy {
// envoy proxy configuration
}
ingress {
// ingress-gateway configuration entry
}
}
}
}
```
A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.
Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.
Aims to address #8294 and tangentially #8647
* docker: support group allocated ports
* docker: add new ports driver config to specify which group ports are mapped
* docker: update port mapping docs
When the client-side actions of a CSI client RPC succeed but we get
disconnected during the RPC or we fail to checkpoint the claim state, we want
to be able to retry the client RPC without getting blocked by the client-side
state (ex. mount points) already having been cleaned up in previous calls.
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.
By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
Before, Connect Native Tasks needed one of these to work:
- To be run in host networking mode
- To have the Consul agent configured to listen to a unix socket
- To have the Consul agent configured to listen to a public interface
None of these are a great experience, though running in host networking is
still the best solution for non-Linux hosts. This PR establishes a connection
proxy between the Consul HTTP listener and a unix socket inside the alloc fs,
bypassing the network namespace for any Connect Native task. Similar to and
re-uses a bunch of code from the gRPC listener version for envoy sidecar proxies.
Proxy is established only if the alloc is configured for bridge networking and
there is at least one Connect Native task in the Task Group.
Fixes#8290
adds in oss components to support enterprise multi-vault namespace feature
upgrade specific doc on vault multi-namespaces
vault docs
update test to reflect new error
The NodePublish workflow currently creates the target path and its parent
directory. However, the CSI specification says that the CO shall ensure the
parent directory of the target path exists, and that the SP shall place the
block device or mounted directory at the target path. Much of our testing has
been with CSI plugins that are more forgiving, but our behavior breaks
spec-compliant CSI plugins.
This changeset ensures we only create the parent directory.
Also fixed the same typo in a test. Fixing the typo fixes the link, but
the link was still broken when running the website locally due to the
trailing slash. It would have worked in prod thanks to redirects, but
using the canonical URL seems ideal.
* ar: support opting into binding host ports to default network IP
* fix config plumbing
* plumb node address into network resource
* struct: only handle network resource upgrade path once
* command/agent/host: collect host data, multi platform
* nomad/structs/structs: new HostDataRequest/Response
* client/agent_endpoint: add RPC endpoint
* command/agent/agent_endpoint: add Host
* api/agent: add the Host endpoint
* nomad/client_agent_endpoint: add Agent Host with forwarding
* nomad/client_agent_endpoint: use findClientConn
This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.
* command/debug: call agent hosts
* command/agent/host: eliminate calling external programs
This fixes a bug where a batch allocation fails to complete if it has
sidecars.
If the only remaining running tasks in an allocations are sidecars - we
must kill them and mark the allocation as complete.
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.
The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.
There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.
If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.
Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.
Fixes#6083
In #7957 we added support for passing a volume context to the controller RPCs.
This is an opaque map that's created by `CreateVolume` or, in Nomad's case,
in the volume registration spec.
However, we missed passing this field to the `NodeStage` and `NodePublish` RPC,
which prevents certain plugins (such as MooseFS) from making node RPCs.