Commit graph

3612 commits

Author SHA1 Message Date
Drew Bailey 684807bddb
namespace filtering 2020-10-14 12:44:43 -04:00
Drew Bailey fdc576af09
handle txn returning error 2020-10-14 12:44:42 -04:00
Drew Bailey df96b89958
Add EvictCallbackFn to handle removing entries from go-memdb when they
are removed from the event buffer.

Wire up event buffer size config, use pointers for structs.Events
instead of copying.
2020-10-14 12:44:42 -04:00
Drew Bailey 315f77a301
rehydrate event publisher on snapshot restore
address pr feedback
2020-10-14 12:44:41 -04:00
Drew Bailey d793529d61
event durability count and cfg 2020-10-14 12:44:40 -04:00
Drew Bailey b4c135358d
use Events to wrap index and events, store in events table 2020-10-14 12:44:39 -04:00
Drew Bailey 9d48818eb8
writetxn can return error, add alloc and job generic events. Add events
table for durability
2020-10-14 12:44:39 -04:00
Drew Bailey 400455d302
Events/eval alloc events (#9012)
* generic eval update event

first pass at alloc client update events

* api/event client
2020-10-14 12:44:37 -04:00
Drew Bailey 4793bb4e01
Events/deployment events (#9004)
* Node Drain events and Node Events (#8980)

Deployment status updates

handle deployment status updates (paused, failed, resume)

deployment alloc health

generate events from apply plan result

txn err check, slim down deployment event

one ndjson line per index

* consolidate down to node event + type

* fix UpdateDeploymentAllocHealth test invocations

* fix test
2020-10-14 12:44:37 -04:00
Drew Bailey a4a2975edf
Event Stream API/RPC (#8947)
This Commit adds an /v1/events/stream endpoint to stream events from.

The stream framer has been updated to include a SendFull method which
does not fragment the data between multiple frames. This essentially
treats the stream framer as a envelope to adhere to the stream framer
interface in the UI.

If the `encode` query parameter is omitted events will be streamed as
newline delimted JSON.
2020-10-14 12:44:36 -04:00
Drew Bailey 207068ca28
Events/event source node (#8918)
* Node Register/Deregister event sourcing

example upsert node with context

fill in writetxnwithctx

ctx passing to handle event type creation, wip test

node deregistration event

drop Node from registration event

* node batch deregistration
2020-10-14 12:44:35 -04:00
Drew Bailey 4753904b90
Events/cfg enable publisher (#8916)
* only enable publisher based on config

* add default prune tick

* back out state abandon changes on fsm close
2020-10-14 12:44:35 -04:00
Drew Bailey f820744746
abandon current state on server shutdown 2020-10-14 12:44:34 -04:00
Drew Bailey fddac3af00
Event Buffer Implemenation
adds an event buffer to hold events from raft changes.

update events to use event buffer

fix append call

provide way to prune buffer items after TTL

event publisher tests

basic publish test

wire up max item ttl

rename package to stream, cleanup exploratory work

subscription filtering

subscription plumbing

allow subscribers to consume events, handle closing subscriptions

back out old exploratory ctx work

fix lint

remove unused ctx bits

add a few comments

fix test

stop publisher on abandon
2020-10-14 12:44:34 -04:00
Chris Baker 1d35578bed removed backwards-compatible/untagged metrics deprecated in 0.7 2020-10-13 20:18:39 +00:00
Seth Hoenig ed13e5723f consul/connect: dynamically select envoy sidecar at runtime
As newer versions of Consul are released, the minimum version of Envoy
it supports as a sidecar proxy also gets bumped. Starting with the upcoming
Consul v1.9.X series, Envoy v1.11.X will no longer be supported. Current
versions of Nomad hardcode a version of Envoy v1.11.2 to be used as the
default implementation of Connect sidecar proxy.

This PR introduces a change such that each Nomad Client will query its
local Consul for a list of Envoy proxies that it supports (https://github.com/hashicorp/consul/pull/8545)
and then launch the Connect sidecar proxy task using the latest supported version
of Envoy. If the `SupportedProxies` API component is not available from
Consul, Nomad will fallback to the old version of Envoy supported by old
versions of Consul.

Setting the meta configuration option `meta.connect.sidecar_image` or
setting the `connect.sidecar_task` stanza will take precedence as is
the current behavior for sidecar proxies.

Setting the meta configuration option `meta.connect.gateway_image`
will take precedence as is the current behavior for connect gateways.

`meta.connect.sidecar_image` and `meta.connect.gateway_image` may make
use of the special `${NOMAD_envoy_version}` variable interpolation, which
resolves to the newest version of Envoy supported by the Consul agent.

Addresses #8585 #7665
2020-10-13 09:14:12 -05:00
Tim Gross 4335d847a4
Allow job Version to start at non-zero value (#9071)
Stop coercing version of new job to 0 in the state_store, so that we can add
regions to a multi-region deployment. Send new version, rather than existing
version, to MRD to accomodate version-choosing logic changes in ENT.

Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>
2020-10-12 13:59:48 -04:00
Nick Ethier d45be0b5a6
client: add NetworkStatus to Allocation (#8657) 2020-10-12 13:43:04 -04:00
Yoan Blanc 891accb89a
use allow/deny instead of the colored alternatives (#9019)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-10-12 08:47:05 -04:00
Tim Gross 9b4917ae5f csi: volumewatcher only needs one pass to collect past claims
If a volume GC and a `nomad volume detach` command land concurrently, we can
end up with multiple claims without an allocation, which results in extra
no-op work when finding claims to collect as past claims.
2020-10-09 11:03:51 -04:00
Tim Gross ec1e75d9f4 csi: remove stray TODO comment
This item was completed in #8626
2020-10-09 11:03:51 -04:00
Tim Gross e8c13a2307
csi: validate mount options during volume registration (#9044)
Volumes using attachment mode `file-system` use the CSI filesystem API when
they're mounted, and can be passed mount options. But `block-device` mode
volumes don't have this option. When RPCs are made to plugins, we are silently
dropping the mount options we don't expect to see, but this results in a poor
operator experience when the mount options aren't honored. This changeset
makes passing mount options to a `block-device` volume a validation error.
2020-10-08 09:23:21 -04:00
Tim Gross 3ceb5b36b1
csi: allow more than 1 writer claim for multi-writer mode (#9040)
Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode
were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether
the volume had writer claims available for scheduling.

Extends CSI claim endpoint test to exercise multi-reader and make sure `WriteFreeClaims`
is exercised for multi-writer in feasibility test.
2020-10-07 10:43:23 -04:00
Seth Hoenig 0c5ae5769f
Merge pull request #9029 from hashicorp/b-tgs-updates
consul/connect: trigger update as necessary on connect changes
2020-10-05 16:48:04 -05:00
Seth Hoenig f44a4f68ee consul/connect: trigger update as necessary on connect changes
This PR fixes a long standing bug where submitting jobs with changes
to connect services would not trigger updates as expected. Previously,
service blocks were not considered as sources of destructive updates
since they could be synced with consul non-destructively. With Connect,
task group services that have changes to their connect block or to
the service port should be destructive, since the network plumbing of
the alloc is going to need updating.

Fixes #8596 #7991

Non-destructive half in #7192
2020-10-05 14:53:00 -05:00
Chris Baker 7f701fddd0 updated docs and validation to further prohibit null chars in region, datacenter, and job name 2020-10-05 18:01:50 +00:00
Chris Baker 23ea7cd27c updated job validate to refute job/group/task IDs containing null characters
updated CHANGELOG and upgrade guide
2020-10-05 18:01:49 +00:00
Chris Baker c8fd9428d4 documenting tests around null characters in job id, task group name, and task name 2020-10-05 18:01:49 +00:00
Fredrik Hoem Grelland a015c52846
configure nomad cluster to use a Consul Namespace [Consul Enterprise] (#8849) 2020-10-02 14:46:36 -04:00
Michael Schurter 765473e8b0 jobspec: lower min cpu resources from 10->1
Since CPU resources are usually a soft limit it is desirable to allow
setting it as low as possible to allow tasks to run only in "idle" time.

Setting it to 0 is still not allowed to avoid potential unintentional
side effects with allowing a zero value. While there may not be any side
effects this commit attempts to minimize risk by avoiding the issue.

This does *not* change the defaults.
2020-09-30 12:15:13 -07:00
Luiz Aoqui 88d4eecfd0
add scaling policy type 2020-09-29 17:57:46 -04:00
Seth Hoenig af9543c997 consul: fix validation of task in group-level script-checks
When defining a script-check in a group-level service, Nomad needs to
know which task is associated with the check so that it can use the
correct task driver to execute the check.

This PR fixes two bugs:
1) validate service.task or service.check.task is configured
2) make service.check.task inherit service.task if it is itself unset

Fixes #8952
2020-09-28 15:02:59 -05:00
Michael Schurter 9dd59ceaa7 core: improve job deregister error logging
Noticed this error in some production logs, and they were far from
helpful. Changes:

1. Include job ID in logs
2. Wrap errors and log once instead of double log lines
3. Test fsm error handling behavior
2020-09-21 08:59:03 -07:00
Pierre Cauchois e4b739cafd
RPC Timeout/Retries account for blocking requests (#8921)
The current implementation measures RPC request timeout only against
config.RPCHoldTimeout, which is fine for non-blocking requests but will
almost surely be exceeded by long-poll requests that block for minutes
at a time.

This adds an HasTimedOut method on the RPCInfo interface that takes into
account whether the request is blocking, its maximum wait time, and the
RPCHoldTimeout.
2020-09-18 08:58:41 -04:00
Seth Hoenig 57fc593363 consul/connect: validate group network on expose port injection
In #7800, Nomad would automatically generate a port label for service
checks making use of the expose feature, if the port was not already
set. This change assumed the group network would be correctly defined
(as is checked in a validation hook later). If the group network was
not definied, a panic would occur on job submisssion. This change
re-uses the group network validation helper to make sure the network
is correctly definied before adding ports to it.

Fixes #8875
2020-09-14 10:25:03 -05:00
Chris Baker d0cc0a768b
Update nomad/job_endpoint.go 2020-09-10 17:18:23 -05:00
Chris Baker eff726609d move variable out of oss-only build into shared file, fixes ent compile error introduced by #8834 2020-09-10 22:08:25 +00:00
Yoan Blanc 48d07c4d12
fix: panic in test introduced by #8453 (#8834)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-09-09 09:38:15 -04:00
Chris Baker bfa366ea72
Update nomad/deployment_endpoint.go 2020-09-08 16:39:51 -05:00
Chris Baker 0e509bc11e check ACLs against deployment namespace on Deployment.GetDeployment, filtering the deployment if the ACL isn't appropriate 2020-09-08 19:57:28 +00:00
Drew Bailey 28aa0387e9
remove node events for state track changing pr
remove Txn and update calls with ReadTxn()

constructor for changetrackerdb
2020-09-04 10:23:35 -04:00
Drew Bailey d5f6d3b3c5
fix a few missed txn changes 2020-09-01 10:27:21 -04:00
Drew Bailey 9253146bf4
fix bad merge from scalingpoliciesbynamespace 2020-09-01 10:27:20 -04:00
Drew Bailey 45762d8df8
noop changetracker for snapshots 2020-09-01 10:27:20 -04:00
Drew Bailey 0af749c92e
Transaction change tracking
This commit wraps memdb.DB with a changeTrackerDB, which is a thin
wrapper around memdb.DB which enables go-memdb's TrackChanges on all write
transactions. When the transaction is comitted the changes are sent to
an eventPublisher which will be used to create and emit change events.

debugging TestFSM_ReconcileSummaries

wip

revert back rebase

revert back rebase

fix snapshot to actually use a snapshot
2020-09-01 10:27:20 -04:00
Jasmine Dahilig 71a694f39c
Merge pull request #8390 from hashicorp/lifecycle-poststart-hook
task lifecycle poststart hook
2020-08-31 13:53:24 -07:00
Jasmine Dahilig fbe0c89ab1 task lifecycle poststart: code review fixes 2020-08-31 13:22:41 -07:00
Mahmood Ali 117aec0036 Fix accidental broken clones
Fix CSIMountOptions.Copy() and VolumeRequest.Copy() where they
accidentally returned a reference to self rather than a deep copy.

`&(*ref)` in Golang apparently equivalent to plain `&ref`.
2020-08-28 15:29:22 -04:00
Tim Gross b77fe023b5
MRD: move 'job stop -global' handling into RPC (#8776)
The initial implementation of global job stop for MRD looped over all the
regions in the CLI for expedience. This changeset includes the OSS parts of
moving this into the RPC layer so that API consumers don't have to implement
this logic themselves.
2020-08-28 14:28:13 -04:00
Tim Gross 35b1b3bed7
structs: filter NomadTokenID from job diff (#8773)
Multiregion deployments use the `NomadTokenID` to allow the deploymentwatcher
to send RPCs between regions with the original submitter's ACL token. This ID
should be filtered from diffs so that it doesn't cause a difference for
purposes of job plans.
2020-08-28 13:40:51 -04:00
Lang Martin 7d483f93c0
csi: plugins track jobs in addition to allocations, and use job information to set expected counts (#8699)
* nomad/structs/csi: add explicit job support
* nomad/state/state_store: capture job updates directly
* api/nodes: CSIInfo needs the AllocID
* command/agent/csi_endpoint: AllocID was missing
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-08-27 17:20:00 -04:00
Seth Hoenig c4fd1c97aa
Merge pull request #8761 from hashicorp/b-consul-op-token-check
consul/connect: make use of task kind to determine service name in consul token checks
2020-08-27 14:08:33 -05:00
Tim Gross 606df14e78
MRD: deregister regions that are dropped on update (#8763)
This changeset is the OSS hooks for what will be implemented in ENT.
2020-08-27 14:54:45 -04:00
Seth Hoenig 84176c9a41 consul/connect: make use of task kind to determine service name in consul token checks
When consul.allow_unauthenticated is set to false, the job_endpoint hook validates
that a `-consul-token` is provided and validates the token against the privileges
inherent to a Consul Service Identity policy for all the Connect enabled services
defined in the job.

Before, the check was assuming the service was of type sidecar-proxy. This fixes the
check to use the type of the task so we can distinguish between the different connect
types.
2020-08-27 12:14:40 -05:00
Chris Baker 8b9145fabd state_store/fix the prefix bugs for scaling policies documented in 1a9318 2020-08-27 04:25:37 +00:00
Chris Baker 655cbb4d3c documenting tests for prefix bugs around job scaling policies 2020-08-27 03:22:13 +00:00
Seth Hoenig 9f1f2a5673 Merge branch 'master' into f-cc-ingress 2020-08-26 15:31:05 -05:00
Seth Hoenig 5d670c6d01 consul/connect: use context cancel more safely 2020-08-26 14:23:31 -05:00
Seth Hoenig dfe179abc5 consul/connect: fixup some comments and context timeout 2020-08-26 13:17:16 -05:00
Mahmood Ali 45f549e29e
Merge pull request #8691 from hashicorp/b-reschedule-job-versions
Respect alloc job version for lost/failed allocs
2020-08-25 18:02:45 -04:00
Mahmood Ali def768728e Have Plan.AppendAlloc accept the job 2020-08-25 17:22:09 -04:00
Mahmood Ali 18632955f2 clarify PathEscapesAllocDir specification
Clarify how to handle prefix value and path traversal within the alloc
dir but outside the prefix directory.
2020-08-24 20:44:26 -04:00
Mahmood Ali 9794760933 validate parameterized job request meta
Fixes a bug where `keys` metadata wasn't populated, as we iterated over
the empty newly-created `keys` map rather than the request Meta field.
2020-08-24 20:39:01 -04:00
Seth Hoenig 26e77623e5 consul/connect: fixup tests to use new consul sdk 2020-08-24 12:02:41 -05:00
Seth Hoenig c4fa644315 consul/connect: remove envoy dns option from gateway proxy config 2020-08-24 09:11:55 -05:00
Yoan Blanc 327d17e0dc
fixup! vendor: consul/api, consul/sdk v1.6.0
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-08-24 08:59:03 +02:00
Seth Hoenig 5b072029f2 consul/connect: add initial support for ingress gateways
This PR adds initial support for running Consul Connect Ingress Gateways (CIGs) in Nomad. These gateways are declared as part of a task group level service definition within the connect stanza.

```hcl
service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      ingress {
        // ingress-gateway configuration entry
      }
    }
  }
}
```

A gateway can be run in `bridge` or `host` networking mode, with the caveat that host networking necessitates manually specifying the Envoy admin listener (which cannot be disabled) via the service port value.

Currently Envoy is the only supported gateway implementation in Consul, and Nomad only supports running Envoy as a gateway using the docker driver.

Aims to address #8294 and tangentially #8647
2020-08-21 16:21:54 -05:00
Nick Ethier 3cd5f46613
Update UI to use new allocated ports fields (#8631)
* nomad: canonicalize alloc shared resources to populate ports

* ui: network ports

* ui: remove unused task network references and update tests with new shared ports model

* ui: lint

* ui: revert auto formatting

* ui: remove unused page objects

* structs: remove unrelated test from bad conflict resolution

* ui: formatting
2020-08-20 11:07:13 -04:00
Mahmood Ali 8a342926b7 Respect alloc job version for lost/failed allocs
This change fixes a bug where lost/failed allocations are replaced by
allocations with the latest versions, even if the version hasn't been
promoted yet.

Now, when generating a plan for lost/failed allocations, the scheduler
first checks if the current deployment is in Canary stage, and if so, it
ensures that any lost/failed allocations is replaced one with the latest
promoted version instead.
2020-08-19 09:52:48 -04:00
Tim Gross 1aa242c15a
failed core jobs should not have follow-ups (#8682)
If a core job fails more than the delivery limit, the leader will create a new
eval with the TriggeredBy field set to `failed-follow-up`.

Evaluations for core jobs have the leader's ACL, which is not valid on another
leader after an election. The `failed-follow-up` evals do not have ACLs, so
core job evals that fail more than the delivery limit or core job evals that
span leader elections will never succeed and will be re-enqueued forever. So
we should not retry with a `failed-follow-up`.
2020-08-18 16:48:43 -04:00
Tim Gross 38ec70eb8d
multiregion: validation should always return error for OSS (#8687) 2020-08-18 15:35:38 -04:00
Lang Martin e8a5565c1a
nomad/state/state_store: handle type conversion failure explicitly (#8660) 2020-08-12 17:53:12 -04:00
Michael Schurter de08ae8083 test: add allocrunner test for poststart hooks 2020-08-12 09:54:14 -07:00
Mahmood Ali c462f8d0d5
Merge pull request #8524 from hashicorp/b-vault-health-checks
Skip checking Vault health
2020-08-11 16:01:07 -04:00
Lang Martin a27913e699
CSI RPC Token (#8626)
* client/allocrunner/csi_hook: use the Node SecretID
* client/allocrunner/csi_hook: include the namespace for Claim
2020-08-11 13:08:39 -04:00
Lang Martin c82b2a2454
CSI: volume and plugin allocations in the API (#8590)
* command/agent/csi_endpoint: explicitly convert to API structs, and convert allocs for single object get endpoints
2020-08-11 12:24:41 -04:00
Tim Gross def7084be7
msgpack-rpc errors cannot be wrapped (#8633)
Our RPC calls mangle the errors we get, which prevents us from using wrapped
errors and `errors.Is`.

Also fixes log message fields.
2020-08-11 10:25:43 -04:00
Tim Gross 443fdaa86b
csi: nomad volume detach command (#8584)
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.

The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
2020-08-11 10:18:54 -04:00
Tim Gross fb27082e5c
RPC errors must be wrapped in order to wrap internal errors (#8632)
The CSI client RPC uses error wrapping to detect the type of error bubbling up
from plugins, but if the errors we get aren't wrapped at each layer, we can't
unwrap the inner error.

Also eliminates some unused args.
2020-08-11 09:13:52 -04:00
Lang Martin f245ba91c4
nomad/state/state_store: two cases of incorrect CSIPlugin in-place (#8630) 2020-08-10 18:15:29 -04:00
Mahmood Ali dce1dc44eb distinguish between transient and persistent errors 2020-08-10 16:46:06 -04:00
Seth Hoenig 6ab3d21d2c consul: validate script type when ussing check thresholds 2020-08-10 14:08:09 -05:00
Seth Hoenig fd4804bf26 consul: able to set pass/fail thresholds on consul service checks
This change adds the ability to set the fields `success_before_passing` and
`failures_before_critical` on Consul service check definitions. This is a
feature added to Consul v1.7.0 and later.
  https://www.consul.io/docs/agent/checks#success-failures-before-passing-critical

Nomad doesn't do much besides pass the fields through to Consul.

Fixes #6913
2020-08-10 14:08:09 -05:00
Tim Gross e5496c7994
csi: missing plugins during node delete are not an error (#8619)
When deregistering a client, CSI plugins running on that client may not get a
chance to fingerprint before being stopped. Account for the case where a
plugin allocation is the last instance of the plugin and has been deleted from
the state store to avoid errors during node deregistration.
2020-08-10 11:02:01 -04:00
Mahmood Ali 628985c51e
Merge pull request #8613 from alrs/state-test-errs
nomad/state: fix dropped scaling_policy test errors
2020-08-10 08:14:19 -04:00
Lars Lehtonen f8a42f587f
nomad/state: fix dropped scaling_policy test errors 2020-08-07 23:05:33 -07:00
Tim Gross 69f4f171e5
CSI: fix missing ACL tokens for leader-driven RPCs (#8607)
The volumewatcher and GC job in the leader can't make CSI RPCs when ACLs are
enabled without the leader ACL token being passed thru.
2020-08-07 15:37:27 -04:00
Tim Gross 7d53ed88d6
csi: client RPCs should return wrapped errors for checking (#8605)
When the client-side actions of a CSI client RPC succeed but we get
disconnected during the RPC or we fail to checkpoint the claim state, we want
to be able to retry the client RPC without getting blocked by the client-side
state (ex. mount points) already having been cleaned up in previous calls.
2020-08-07 11:01:36 -04:00
Tim Gross 81b604fa13
csi: controller unpublish should check current alloc count (#8604)
Using the count of node claims from earlier in the `CSIVolume.Unpublish RPC
doesn't correctly account for cases where the RPC was interrupted but
checkpointed. Instead, we'll check the current allocation count and status to
determine whether we need to send a controller unpublish.
2020-08-07 10:43:45 -04:00
Tim Gross 2854298089
csi: release claims via csi_hook postrun unpublish RPC (#8580)
Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This
may forward client RPCs to the node plugins or to the controller plugins,
depending on whether other allocations on this node have claims on this
volume.

By making clients responsible for running the `CSIVolume.Unpublish` RPC (and
making the RPC available to a `nomad volume detach` command), the
volumewatcher becomes only used by the core GC job and we no longer need
async volume GC from job deregister and node update.
2020-08-06 14:51:46 -04:00
Michael Schurter 057e1c021f
Merge pull request #8597 from hashicorp/b-vault-revoke-log-line
vault: log once per interval if batching revocation
2020-08-06 11:32:47 -07:00
Tim Gross 314458ebdb
csi: update volumewatcher to use unpublish RPC (#8579)
This changeset updates `nomad/volumewatcher` to take advantage of the
`CSIVolume.Unpublish` RPC. This lets us eliminate a bunch of code and
associated tests. The raft batching code can be safely dropped, as the
characteristic times of the CSI RPCs are on the order of seconds or even
minutes, so batching up raft RPCs added complexity without any real world
performance wins.

Includes refactor w/ test cleanup and dead code elimination in volumewatcher
2020-08-06 14:31:18 -04:00
Tim Gross eaa14ab64c
csi: add unpublish RPC (#8572)
This changeset is plumbing for a `nomad volume detach` command that will be
reused by the volumewatcher claim GC as well.
2020-08-06 13:51:29 -04:00
Tim Gross 4bbf18703f
csi: retry controller client RPCs on next controller (#8561)
The documentation encourages operators to run multiple controller plugin
instances for HA, but the client RPCs don't take advantage of this by retrying
when the RPC fails in cases when the plugin is unavailable (because the node
has drained or the alloc has failed but we haven't received an updated
fingerprint yet).

This changeset tries all known controllers on ready nodes before giving up,
and adds tests that exercise the client RPC routing and retries.
2020-08-06 13:24:24 -04:00
Michael Schurter 2385fee0d2 vault: log once per interval if batching revocation
This log line should be rare since:

1. Most tokens should be logged synchronously, not via this async
   batched method. Async revocation only takes place when Vault
   connectivity is lost and after leader election so no revocations are
   missed.
2. There should rarely be >1 batch (1,000) tokens to revoke since the
   above conditions should be brief and infrequent.
3. Interval is 5 minutes, so this log line will be emitted at *most*
   once every 5 minutes.

What makes this log line rare is also what makes it interesting: due to
a bug prior to Nomad 0.11.2 some tokens may never get revoked. Therefore
Nomad tries to re-revoke them on every leader election. This caused a
massive buildup of old tokens that would never be properly revoked and
purged. Nomad 0.11.3 mostly fixed this but still had a bug in purging
revoked tokens via Raft (fixed in #8553).

The nomad.vault.distributed_tokens_revoked metric is only ticked upon
successful revocation and purging, making any bugs or slowness in the
process difficult to detect.

Logging before a potentially slow revocation+purge operation is
performed will give users much better indications of what activity is
going on should the process fail to make it to the metric.
2020-08-05 15:39:21 -07:00
Mahmood Ali 490b9ce3a0
Handle Scaling Policies in Job Plan endpoint (#8567)
Fixes https://github.com/hashicorp/nomad/issues/8544

This PR fixes a bug where using `nomad job plan ...` always report no change if the submitted job contain scaling.

The issue has three contributing factors:
1. The plan endpoint doesn't populate the required scaling policy ID; unlike the job register endpoint
2. The plan endpoint suppresses errors on job insertion - the job insertion fails here, because the scaling policy is missing the required ID
3. The scheduler reports no update necessary when the relevant job isn't in store (because the insertion failed)

This PR fixes the first two factors.  Changing the scheduler to be more strict might make sense, but may violate some idempotency invariant or make the scheduler more brittle.
2020-07-30 12:27:36 -04:00
Seth Hoenig 2511f48351 consul/connect: add support for bridge networks with connect native tasks
Before, Connect Native Tasks needed one of these to work:

- To be run in host networking mode
- To have the Consul agent configured to listen to a unix socket
- To have the Consul agent configured to listen to a public interface

None of these are a great experience, though running in host networking is
still the best solution for non-Linux hosts. This PR establishes a connection
proxy between the Consul HTTP listener and a unix socket inside the alloc fs,
bypassing the network namespace for any Connect Native task. Similar to and
re-uses a bunch of code from the gRPC listener version for envoy sidecar proxies.

Proxy is established only if the alloc is configured for bridge networking and
there is at least one Connect Native task in the Task Group.

Fixes #8290
2020-07-29 09:26:01 -05:00
Michael Schurter 80f521cce5 vault: expired tokens count toward batch limit
As of 0.11.3 Vault token revocation and purging was done in batches.
However the batch size was only limited by the number of *non-expired*
tokens being revoked.

Due to bugs prior to 0.11.3, *expired* tokens were not properly purged.
Long-lived clusters could have thousands to *millions* of very old
expired tokens that never got purged from the state store.

Since these expired tokens did not count against the batch limit, very
large batches could be created and overwhelm servers.

This commit ensures expired tokens count toward the batch limit with
this one line change:

```
- if len(revoking) >= toRevoke {
+ if len(revoking)+len(ttlExpired) >= toRevoke {
```

However, this code was difficult to test due to being in a periodically
executing loop. Most of the changes are to make this one line change
testable and test it.
2020-07-28 15:42:47 -07:00
Drew Bailey bd421b6197
Merge pull request #8453 from hashicorp/oss-multi-vault-ns
oss compoments for multi-vault namespaces
2020-07-27 08:45:22 -04:00
Drew Bailey b296558b8e
oss compoments for multi-vault namespaces
adds in oss components to support enterprise multi-vault namespace feature

upgrade specific doc on vault multi-namespaces

vault docs

update test to reflect new error
2020-07-24 10:14:59 -04:00
James Rasell da91e1d0fc
api: add namespace to scaling status GET response object. 2020-07-24 11:19:25 +02:00
Mahmood Ali 5d86f84c5a test tweaks 2020-07-23 13:25:25 -04:00
Mahmood Ali 5f6162ba46 run revoke daemon if connection is successful 2020-07-23 13:08:16 -04:00
Mahmood Ali 48ebedb738 vault: simply make the API call
Avoid checking if API is accessible, just make the API call and handle
when it fails.
2020-07-23 11:33:08 -04:00
Tim Gross d3341a2019 refactor: make it clear where we're accessing dstate
The field name `Deployment.TaskGroups` contains a map of `DeploymentState`,
which makes it a little harder to follow state updates when combined with
inconsistent naming conventions, particularly when we also have the state
store or actual `TaskGroup`s in scope. This changeset changes all uses to
`dstate` so as not to be confused with actual TaskGroups.
2020-07-20 11:25:53 -04:00
Lang Martin a3bfd8c209
structs: Job.Validate only allows stop_after_client_disconnected on batch and service jobs (#8444)
* nomad/structs/structs: add to Job.Validate

* Update nomad/structs/structs.go

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>

* nomad/structs/structs: match error strings to the config file

* nomad/structs/structs_test: clarify the test a bit

* nomad/structs/structs_test: typo in the test error comparison

Co-authored-by: Mahmood Ali <mahmood@hashicorp.com>
2020-07-20 10:27:25 -04:00
Mahmood Ali 78568b8e63 Remove unused state.TestInitState 2020-07-20 09:55:55 -04:00
Mahmood Ali a483dde8b9 minor tweaks from Ent 2020-07-20 09:25:09 -04:00
Mahmood Ali 5adbd9f666 enterprise specific state store objects 2020-07-20 09:22:26 -04:00
Mahmood Ali ad2d484974 Set AgentShutdown 2020-07-17 11:04:57 -04:00
Mahmood Ali 647c5e4c03
Merge pull request #8435 from hashicorp/b-atomic-job-register
Atomic eval insertion with job (de-)registration
2020-07-15 13:48:07 -04:00
Mahmood Ali aa500f7ba3 comment compat concern in fsm.go 2020-07-15 11:23:49 -04:00
Mahmood Ali f4a921f2be no need to handle duplicate evals anymore 2020-07-15 11:14:49 -04:00
Mahmood Ali a314744210 only set args.Eval after all servers upgrade
We set the Eval field on job (de-)registration only after all servers
get upgraded, to avoid dealing with duplicate evals.
2020-07-15 11:10:57 -04:00
Mahmood Ali 910776caf0 time.Now().UTC().UnixNano() -> time.Now().UnixNano() 2020-07-15 08:49:17 -04:00
Kurt Neufeld 62851f6ccb
fixed typo in output (#1) 2020-07-14 10:33:17 -06:00
Mahmood Ali fbfe4ab1bd Atomic eval insertion with job (de-)registration
This fixes a bug where jobs may get "stuck" unprocessed that
dispropotionately affect periodic jobs around leadership transitions.
When registering a job, the job registration and the eval to process it
get applied to raft as two separate transactions; if the job
registration succeeds but eval application fails, the job may remain
unprocessed. Operators may detect such failure, when submitting a job
update and get a 500 error code, and they could retry; periodic jobs
failures are more likely to go unnoticed, and no further periodic
invocations will be processed until an operator force evaluation.

This fixes the issue by ensuring that the job registration and eval
application get persisted and processed atomically in the same raft log
entry.

Also, applies the same change to ensure atomicity in job deregistration.

Backward Compatibility

We must maintain compatibility in two scenarios: mixed clusters where a
leader can handle atomic updates but followers cannot, and a recent
cluster processes old log entries from legacy or mixed cluster mode.

To handle this constraints: ensure that the leader continue to emit the
Evaluation log entry until all servers have upgraded; also, when
processing raft logs, the servers honor evaluations found in both spots,
the Eval in job (de-)registration and the eval update entries.

When an updated server sees mix-mode behavior where an eval is inserted
into the raft log twice, it ignores the second instance.

I made one compromise in consistency in the mixed-mode scenario: servers
may disagree on the eval.CreateIndex value: the leader and updated
servers will report the job registration index while old servers will
report the index of the eval update log entry. This discripency doesn't
seem to be material - it's the eval.JobModifyIndex that matters.
2020-07-14 11:59:29 -04:00
Tim Gross bd457343de
MRD: all regions should start pending (#8433)
Deployments should wait until kicked off by `Job.Register` so that we can
assert that all regions have a scheduled deployment before starting any
region. This changeset includes the OSS fixes to support the ENT work.

`IsMultiregionStarter` has no more callers in OSS, so remove it here.
2020-07-14 10:57:37 -04:00
Tim Gross 0ce3c1e942
multiregion: allow empty region DCs (#8426)
It's supposed to be possible for a region not to have `datacenters` set so
that it can use the job's `datacenters` field. This requires that operators
use the same DC name across multiple regions, but that's the default client
configuration.
2020-07-13 13:34:19 -04:00
Nick Ethier d171189afc
nomad: recanonicalize network after connect hook (#8407)
* nomad: recanonicalize network after connect hook
2020-07-10 10:59:51 -04:00
Seth Hoenig 6fc63ede76
Merge pull request #7733 from jorgemarey/b-vault-policies
Fix get all vault token policies
2020-07-09 10:05:59 -05:00
Seth Hoenig f023df7b68
Merge pull request #8392 from hashicorp/f-infer-cn-taskname
consul/connect: infer task name for native service if possible
2020-07-08 14:17:25 -05:00
Seth Hoenig 5be1679b86
Merge pull request #8338 from jorgemarey/b-fix-sidecar-task
Change connectDriverConfig to be a func
2020-07-08 14:00:27 -05:00
Seth Hoenig 1a75da0ce0 consul/connect: infer task name in service if possible
Before, the service definition for a Connect Native service would always
require setting the `service.task` parameter. Now, that parameter is
automatically inferred when there is only one task in the task group.

Fixes #8274
2020-07-08 13:31:44 -05:00
Jasmine Dahilig 9e27231953 add poststart hook to task hook coordinator & structs 2020-07-08 11:01:35 -07:00
Tim Gross ec96ddf648
fix swapped old/new multiregion plan diffs (#8378)
The multiregion plan diffs swap the old and new versions for each region when
they're edited (rather than added/removed). The `multiregionRegionDiff`
function call incorrectly reversed its arguments for existing regions.
2020-07-08 10:10:50 -04:00
Jorge Marey a3740cba9b Change connectDriverConfig to be a func 2020-07-07 08:59:59 +02:00
Nick Ethier e0fb634309
ar: support opting into binding host ports to default network IP (#8321)
* ar: support opting into binding host ports to default network IP

* fix config plumbing

* plumb node address into network resource

* struct: only handle network resource upgrade path once
2020-07-06 18:51:46 -04:00
Chris Baker 5b96c3d50e
Merge pull request #8360 from hashicorp/b-8355-better-scaling-validation
better error handling around Scaling->Max
2020-07-06 11:32:02 -05:00
Chris Baker 5aa46e9a8f modified state store to allow version skipping, to support multiregion version syncing also, passing existing version into multiregionRegister to support this 2020-07-06 14:16:55 +00:00
Lars Lehtonen f32e80175d
nomad: fix dropped test error (#8356) 2020-07-06 08:46:54 -04:00
Chris Baker a77e012220 better testing of scaling parsing, fixed some broken tests by api
changes
2020-07-04 19:32:37 +00:00
Chris Baker 9100b6b7c0 changes to make sure that Max is present and valid, to improve error messages
* made api.Scaling.Max a pointer, so we can detect (and complain) when it is neglected
* added checks to HCL parsing that it is present
* when Scaling.Max is absent/invalid, don't return extraneous error messages during validation
* tweak to multiregion handling to ensure that the count is valid on the interpolated regional jobs

resolves #8355
2020-07-04 19:05:50 +00:00
Lang Martin 6c22cd587d
api: nomad debug new /agent/host (#8325)
* command/agent/host: collect host data, multi platform

* nomad/structs/structs: new HostDataRequest/Response

* client/agent_endpoint: add RPC endpoint

* command/agent/agent_endpoint: add Host

* api/agent: add the Host endpoint

* nomad/client_agent_endpoint: add Agent Host with forwarding

* nomad/client_agent_endpoint: use findClientConn

This changes forwardMonitorClient and forwardProfileClient to use
findClientConn, which was cribbed from the common parts of those
funcs.

* command/debug: call agent hosts

* command/agent/host: eliminate calling external programs
2020-07-02 09:51:25 -04:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Mahmood Ali 7f460d2706 allocrunner: terminate sidecars in the end
This fixes a bug where a batch allocation fails to complete if it has
sidecars.

If the only remaining running tasks in an allocations are sidecars - we
must kill them and mark the allocation as complete.
2020-06-29 15:12:15 -04:00
Drew Bailey 01e2cc5054
allow ClusterMetadata to accept a watchset (#8299)
* allow ClusterMetadata to accept a watchset

* use nil instead of empty watchset
2020-06-26 13:23:32 -04:00
Mahmood Ali 49a177ce28
Merge pull request #8017 from hashicorp/f-change-sched-updated
Set Updated to true for all non-CAS requests on v1/operator/scheduler/configuration
2020-06-26 08:39:37 -04:00
Mahmood Ali 6605ebd314
Merge pull request #8223 from hashicorp/f-multi-network-validate-ports
core: validate port numbers are < 65535
2020-06-26 08:31:01 -04:00
Nick Ethier 89118016fc
command: correctly show host IP in ports output /w multi-host networks (#8289) 2020-06-25 15:16:01 -04:00
Tim Gross 67ffcb35e9
multiregion: add support for 'job plan' (#8266)
Add a scatter-gather for multiregion job plans. Each region's servers
interpolate the plan locally in `Job.Plan` but don't distribute the plan as
done in `Job.Run`.

Note that it's not possible to return a usable modify index from a multiregion
plan for use with `-check-index`. Even if we were to force the modify index to
be the same at the start of `Job.Run` the index immediately drifts during each
region's deployments, depending on events local to each region. So we omit
this section of a multiregion plan.
2020-06-24 13:24:55 -04:00
Tim Gross a449009e9f
multiregion validation fixes (#8265)
Multi-region jobs need to bypass validating counts otherwise we get spurious
warnings in Job.Plan.
2020-06-24 12:18:51 -04:00
Seth Hoenig 3872b493e5
Merge pull request #8011 from hashicorp/f-cnative-host
consul/connect: implement initial support for connect native
2020-06-24 10:33:12 -05:00
Seth Hoenig 011c6b027f connect/native: doc and comment tweaks from PR 2020-06-24 10:13:22 -05:00
Michael Schurter 7869ebc587 docs: add comments to structs.Port struct 2020-06-23 11:38:01 -07:00
Michael Schurter 13ed710a04 core: validate port numbers are <= 65535
The scheduler returns a very strange error if it detects a port number
out of range. If these would somehow make it to the client they would
overflow when converted to an int32 and could cause conflicts.
2020-06-23 11:31:49 -07:00
Seth Hoenig 6c5ab7f45e consul/connect: split connect native flag and task in service 2020-06-23 10:22:22 -05:00
Seth Hoenig 4d71f22a11 consul/connect: add support for running connect native tasks
This PR adds the capability of running Connect Native Tasks on Nomad,
particularly when TLS and ACLs are enabled on Consul.

The `connect` stanza now includes a `native` parameter, which can be
set to the name of task that backs the Connect Native Consul service.

There is a new Client configuration parameter for the `consul` stanza
called `share_ssl`. Like `allow_unauthenticated` the default value is
true, but recommended to be disabled in production environments. When
enabled, the Nomad Client's Consul TLS information is shared with
Connect Native tasks through the normal Consul environment variables.
This does NOT include auth or token information.

If Consul ACLs are enabled, Service Identity Tokens are automatically
and injected into the Connect Native task through the CONSUL_HTTP_TOKEN
environment variable.

Any of the automatically set environment variables can be overridden by
the Connect Native task using the `env` stanza.

Fixes #6083
2020-06-22 14:07:44 -05:00
Mahmood Ali 862834a792 testS: add all namespaces test for allocations 2020-06-22 10:26:08 -04:00
Michael Schurter 562704124d
Merge pull request #8208 from hashicorp/f-multi-network
multi-interface network support
2020-06-19 15:46:48 -07:00