Commit graph

2066 commits

Author SHA1 Message Date
James Rasell 8295d0e516
acl: add validation to binding rule selector on upsert. (#16210)
* acl: add validation to binding rule selector on upsert.

* docs: add more information on binding rule selector escaping.
2023-02-17 15:38:55 +01:00
Alessio Perugini 4e9ec24b22
Allow configurable range of Job priorities (#16084) 2023-02-17 09:23:13 -05:00
Seth Hoenig 165791dd89
artifact: protect against unbounded artifact decompression (1.5.0) (#16151)
* artifact: protect against unbounded artifact decompression

Starting with 1.5.0, set defaut values for artifact decompression limits.

artifact.decompression_size_limit (default "100GB") - the maximum amount of
data that will be decompressed before triggering an error and cancelling
the operation

artifact.decompression_file_count_limit (default 4096) - the maximum number
of files that will be decompressed before triggering an error and
cancelling the operation.

* artifact: assert limits cannot be nil in validation
2023-02-14 09:28:39 -06:00
Charlie Voiselle d93ba0cf32
Add warnings to var put for non-alphanumeric keys. (#15933)
* Warn when Items key isn't directly accessible

Go template requires that map keys are alphanumeric for direct access
using the dotted reference syntax. This warns users when they create
keys that run afoul of this requirement.

- cli: use regex to detect invalid indentifiers in var keys
- test: fix slash in escape test case
- api: share warning formatting function between API and CLI
- ui: warn if var key has characters other than _, letter, or number

---------
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2023-02-13 16:14:59 -05:00
Seth Hoenig df3e8f82da
deps: update go-set, go-landlock (#16146)
Made a breaking change in go-set (String() signature), need to update
both these dependencies together and also fix a thing in structs.go
2023-02-13 08:26:30 -06:00
Tim Gross c2bd829fe2
tests: don't mutate global structs in core scheduler tests (#16120)
Some of the core scheduler tests need the maximum batch size for writes to be
smaller than the usual `structs.MaxUUIDsPerWriteRequest`. But they do so by
unsafely modifying the global struct, which creates test flakes in other tests.

Modify the functions under test to take a batch size parameter. Production code
will pass the global while the tests can inject smaller values. Turn the
`structs.MaxUUIDsPerWriteRequest` into a constant, and add a semgrep rule for
avoiding this kind of thing in the future.
2023-02-10 09:26:00 -05:00
Seth Hoenig 0e7bf87ee1
deps: upgrade to hashicorp/golang-lru/v2 (#16085) 2023-02-08 15:20:33 -06:00
Michael Schurter 35d65c7c7e
Dynamic Node Metadata (#15844)
Fixes #14617
Dynamic Node Metadata allows Nomad users, and their jobs, to update Node metadata through an API. Currently Node metadata is only reloaded when a Client agent is restarted.

Includes new UI for editing metadata as well.

---------

Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>
2023-02-07 14:42:25 -08:00
Michael Schurter 0a496c845e
Task API via Unix Domain Socket (#15864)
This change introduces the Task API: a portable way for tasks to access Nomad's HTTP API. This particular implementation uses a Unix Domain Socket and, unlike the agent's HTTP API, always requires authentication even if ACLs are disabled.

This PR contains the core feature and tests but followup work is required for the following TODO items:

- Docs - might do in a followup since dynamic node metadata / task api / workload id all need to interlink
- Unit tests for auth middleware
- Caching for auth middleware
- Rate limiting on negative lookups for auth middleware

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-02-06 11:31:22 -08:00
Phil Renaud d3c351d2d2
Label for the Web UI (#16006)
* Demoable state

* Demo mirage color

* Label as a block with foreground and background colours

* Test mock updates

* Go test updated

* Documentation update for label support
2023-02-02 16:29:04 -05:00
Tim Gross 19a2c065f4
System and sysbatch jobs always have zero index (#16030)
Service jobs should have unique allocation Names, derived from the
Job.ID. System jobs do not have unique allocation Names because the index is
intended to indicated the instance out of a desired count size. Because system
jobs do not have an explicit count but the results are based on the targeted
nodes, the index is less informative and this was intentionally omitted from the
original design.

Update docs to make it clear that NOMAD_ALLOC_INDEX is always zero for 
system/sysbatch jobs

Validate that `volume.per_alloc` is incompatible with system/sysbatch jobs.
System and sysbatch jobs always have a `NOMAD_ALLOC_INDEX` of 0. So
interpolation via `per_alloc` will not work as soon as there's more than one
allocation placed. Validate against this on job submission.
2023-02-02 16:18:01 -05:00
Daniel Bennett 335f0a5371
docs: how to troubleshoot consul connect envoy (#15908)
* largely a doc-ification of this commit message:
  d47678074bf8ae9ff2da3c91d0729bf03aee8446
  this doesn't spell out all the possible failure modes,
  but should be a good starting point for folks.

* connect: add doc link to envoy bootstrap error

* add Unwrap() to RecoverableError
  mainly for easier testing
2023-02-02 14:20:26 -06:00
Charlie Voiselle cc6f4719f1
Add option to expose workload token to task (#15755)
Add `identity` jobspec block to expose workload identity tokens to tasks.

---------

Co-authored-by: Anders <mail@anars.dk>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
2023-02-02 10:59:14 -08:00
Phil Renaud 3db9f11c37
[feat] Nomad Job Templates (#15746)
* Extend variables under the nomad path prefix to allow for job-templates (#15570)

* Extend variables under the nomad path prefix to allow for job-templates

* Add job-templates to error message hinting

* RadioCard component for Job Templates (#15582)

* chore: add

* test: component API

* ui: component template

* refact: remove  bc naming collission

* styles: remove SASS var causing conflicts

* Disallow specific variable at nomad/job-templates (#15681)

* Disallows variables at exactly nomad/job-templates

* idiomatic refactor

* Expanding nomad job init to accept a template flag (#15571)

* Adding a string flag for templates on job init

* data-down actions-up version of a custom template editor within variable

* Dont force grid on job template editor

* list-templates flag started

* Correctly slice from end of path name

* Pre-review cleanup

* Variable form acceptance test for job template editing

* Some review cleanup

* List Job templates test

* Example from template test

* Using must.assertions instead of require etc

* ui: add choose template button (#15596)

* ui: add new routes

* chore: update file directory

* ui: add choose template button

* test: button and page navigation

* refact: update var name

* ui: use `Button` component from `HDS` (#15607)

* ui: integrate  buttons

* refact: remove  helper

* ui: remove icons on non-tertiary buttons

* refact: update normalize method for key/value pairs (#15612)

* `revert`: `onCancel` for `JobDefinition`

The `onCancel` method isn't included in the component API for `JobEditor` and the primary cancel behavior exists outside of the component. With the exception of the `JobDefinition` page where we include this button in the top right of the component instead of next to the `Plan` button.

* style: increase button size

* style: keep lime green

* ui: select template (#15613)

* ui: deprecate unused component

* ui: deprecate tests

* ui: jobs.run.templates.index

* ui: update logic to handle templates

* refact: revert key/value changes

* style: padding for cards + buttons

* temp: fixtures for mirage testing

* Revert "refact: revert key/value changes"

This reverts commit 124e95d12140be38fc921f7e15243034092c4063.

* ui: guard template for unsaved job

* ui: handle reading template variable

* Revert "refact: update normalize method for key/value pairs (#15612)"

This reverts commit 6f5ffc9b610702aee7c47fbff742cc81f819ab74.

* revert: remove test fixtures

* revert: prettier problems

* refact: test doesnt need filter expression

* styling: button sizes and responsive cards

* refact: remove route guarding

* ui: update variable adapter

* refact: remove model editing behavior

* refact: model should query variables to populate editor

* ui: clear qp on exit

* refact: cleanup deprecated API

* refact: query all namespaces

* refact: deprecate action

* ui: rely on  collection

* refact: patch deprecate transition API

* refact: patch test to expect namespace qp

* styling: padding, conditionals

* ui: flashMessage on 404

* test: update for o(n+1) query

* ui: create new job template (#15744)

* refact: remove unused code

* refact: add type safety

* test: select template flow

* test: add data-test attrs

* chore: remove dead code

* test: create new job flow

* ui: add create button

* ui: create job template

* refact: no need for wildcard

* refact:  record instead of delete

* styling: spacing

* ui: add error handling and form validation to job create template (#15767)

* ui: handle server side errors

* ui: show error to prevent duplicate

* refact: conditional namespace

* ui: save as template flow (#15787)

* bug:  patches failing tests associated with `pretender` (#15812)

* refact: update assertion

* refact: test set-up

* ui: job templates manager view (#15815)

* ui: manager list view

* test: edit flow

* refact: deprecate column-helper

* ui: template edit and delete flow (#15823)

* ui: manager list view

* refact: update title

* refact: update permissions

* ui: template edit page

* bug: typo

* refact: update toast messages

* bug:  clear selections on exit (#15827)

* bug:  clear controllers on exit

* test: mirage config changes (#15828)

* refact: deprecate column-helper

* style: update z-index for HDS

* Revert "style: update z-index for HDS"

This reverts commit d3d87ceab6d083f7164941587448607838944fc1.

* refact: update delete button

* refact: edit redirect

* refact: patch reactivity issues

* styling: fixed width

* refact: override defaults

* styling: edit text causing overflow

* styling:  add inline text

Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>

* bug: edit `text` to `template`

Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>

Co-authored-by: Phil Renaud <phil.renaud@hashicorp.com>

* test:  delete flow job templates (#15896)

* refact: edit names

* bug:  set correct ref to store

* chore: trim whitespace:

* test: delete flow

* bug: reactively update view (#15904)

* Initialized default jobs (#15856)

* Initialized default jobs

* More jobs scaffolded

* Better commenting on a couple example job specs

* Adapter doing the work

* fall back to epic config

* Label format helper and custom serialization logic

* Test updates to account for a never-empty state

* Test suite uses settled and maintain RecordArray in adapter return

* Updates to hello-world and variables example jobspecs

* Parameterized job gets optional payload output

* Formatting changes for param and service discovery job templates

* Multi-group service discovery job

* Basic test for default templates (#15965)

* Basic test for default templates

* Percy snapshot for manage page

* Some late-breaking design changes

* Some copy edits to the header paragraphs for job templates (#15967)

* Added some init options for job templates (#15994)

* Async method for populating default job templates from the variable adapter

---------

Co-authored-by: Jai <41024828+ChaiWithJai@users.noreply.github.com>
2023-02-02 10:37:40 -05:00
jmwilkinson 37834dffda
Allow wildcard datacenters to be specified in job file (#11170)
Also allows for default value of `datacenters = ["*"]`
2023-02-02 09:57:45 -05:00
James Rasell 9e8325d63c
acl: fix a bug in token creation when parsing expiration TTLs. (#15999)
The ACL token decoding was not correctly handling time duration
syntax such as "1h" which forced people to use the nanosecond
representation via the HTTP API.

The change adds an unmarshal function which allows this syntax to
be used, along with other styles correctly.
2023-02-01 17:43:41 +01:00
Mike Nomitch 80848b202e
Increases max variable size to 64KiB from 16KiB (#15983) 2023-01-31 13:32:36 -05:00
Jorge Marey d1c9aad762
Rename fields on proxyConfig (#15541)
* Change api Fields for expose and paths

* Add changelog entry

* changelog: add deprecation notes about connect fields

* api: minor style tweaks

---------

Co-authored-by: Seth Hoenig <shoenig@duck.com>
2023-01-30 09:31:16 -06:00
Piotr Kazmierczak 14b53df3b6
renamed stanza to block for consistency with other projects (#15941) 2023-01-30 15:48:43 +01:00
Yorick Gersie 2a5c423ae0
Allow per_alloc to be used with host volumes (#15780)
Disallowing per_alloc for host volumes in some cases makes life of a nomad user much harder.
When we rely on the NOMAD_ALLOC_INDEX for any configuration that needs to be re-used across
restarts we need to make sure allocation placement is consistent. With CSI volumes we can
use the `per_alloc` feature but for some reason this is explicitly disabled for host volumes.

Ensure host volumes understand the concept of per_alloc
2023-01-26 09:14:47 -05:00
James Rasell 5d33891910
sso: allow binding rules to create management ACL tokens. (#15860)
* sso: allow binding rules to create management ACL tokens.

* docs: update binding rule docs to detail management type addition.
2023-01-26 09:57:44 +01:00
Luiz Aoqui 3479e2231f
core: enforce strict steps for clients reconnect (#15808)
When a Nomad client that is running an allocation with
`max_client_disconnect` set misses a heartbeat the Nomad server will
update its status to `disconnected`.

Upon reconnecting, the client will make three main RPC calls:

- `Node.UpdateStatus` is used to set the client status to `ready`.
- `Node.UpdateAlloc` is used to update the client-side information about
  allocations, such as their `ClientStatus`, task states etc.
- `Node.Register` is used to upsert the entire node information,
  including its status.

These calls are made concurrently and are also running in parallel with
the scheduler. Depending on the order they run the scheduler may end up
with incomplete data when reconciling allocations.

For example, a client disconnects and its replacement allocation cannot
be placed anywhere else, so there's a pending eval waiting for
resources.

When this client comes back the order of events may be:

1. Client calls `Node.UpdateStatus` and is now `ready`.
2. Scheduler reconciles allocations and places the replacement alloc to
   the client. The client is now assigned two allocations: the original
   alloc that is still `unknown` and the replacement that is `pending`.
3. Client calls `Node.UpdateAlloc` and updates the original alloc to
   `running`.
4. Scheduler notices too many allocs and stops the replacement.

This creates unnecessary placements or, in a different order of events,
may leave the job without any allocations running until the whole state
is updated and reconciled.

To avoid problems like this clients must update _all_ of its relevant
information before they can be considered `ready` and available for
scheduling.

To achieve this goal the RPC endpoints mentioned above have been
modified to enforce strict steps for nodes reconnecting:

- `Node.Register` does not set the client status anymore.
- `Node.UpdateStatus` sets the reconnecting client to the `initializing`
  status until it successfully calls `Node.UpdateAlloc`.

These changes are done server-side to avoid the need of additional
coordination between clients and servers. Clients are kept oblivious of
these changes and will keep making these calls as they normally would.

The verification of whether allocations have been updates is done by
storing and comparing the Raft index of the last time the client missed
a heartbeat and the last time it updated its allocations.
2023-01-25 15:53:59 -05:00
Tim Gross 055434cca9
add metric for count of RPC requests (#15515)
Implement a metric for RPC requests with labels on the identity, so that
administrators can monitor the source of requests within the cluster. This
changeset demonstrates the change with the new `ACL.WhoAmI` RPC, and we'll wire
up the remaining RPCs once we've threaded the new pre-forwarding authentication
through the all.

Note that metrics are measured after we forward but before we return any
authentication error. This ensures that we only emit metrics on the server that
actually serves the request. We'll perform rate limiting at the same place.

Includes telemetry configuration to omit identity labels.
2023-01-24 11:54:20 -05:00
Seth Hoenig d2d8ebbeba
consul: correctly interpret missing consul checks as unhealthy (#15822)
* consul: correctly understand missing consul checks as unhealthy

This PR fixes a bug where Nomad assumed any registered Checks would exist
in the service registration coming back from Consul. In some cases, the
Consul may be slow in processing the check registration, and the response
object would not contain checks. Nomad would then scan the empty response
looking for Checks with failing health status, finding none, and then
marking a task/alloc as healthy.

In reality, we must always use Nomad's view of what checks should exist as
the source of truth, and compare that with the response Consul gives us,
making sure they match, before scanning the Consul response for failing
check statuses.

Fixes #15536

* consul: minor CR refactor using maps not sets

* consul: observe transition from healthy to unhealthy checks

* consul: spell healthy correctly
2023-01-19 14:01:12 -06:00
James Rasell fad9b40e53
Merge branch 'main' into sso/gh-13120-oidc-login 2023-01-18 10:05:31 +00:00
James Rasell d29d3412d8
rpc: add OIDC login related endpoints.
This adds new OIDC endpoints on the RPC endpoint. These two RPCs
handle generating the OIDC provider URL and then completing the
login by exchanging the provider token with an internal Nomad
token.

The RPC endpoints both do double forwarding. The initial forward
is to ensure we are talking to the regional leader; the second
then takes into account whether the auth method generates local or
global tokens. If it creates global tokens, we must then forward
onto the federated regional leader.
2023-01-13 13:14:29 +00:00
Seth Hoenig fe7795ce16
consul/connect: support for proxy upstreams opaque config (#15761)
This PR adds support for configuring `proxy.upstreams[].config` for
Consul Connect upstreams. This is an opaque config value to Nomad -
the data is passed directly to Consul and is unknown to Nomad.
2023-01-12 08:20:54 -06:00
Daniel Bennett 7d1059b5ae
connect: ingress gateway validation for http hosts and wildcards (#15749)
* connect: fix non-"tcp" ingress gateway validation

changes apply to http, http2, and grpc:
* if "hosts" is excluded, consul will use its default domain
  e.g. <service-name>.ingress.dc1.consul
* can't set hosts with "*" service name
* test http2 and grpc too
2023-01-11 11:52:32 -06:00
Seth Hoenig 719eee8112
consul: add client configuration for grpc_ca_file (#15701)
* [no ci] first pass at plumbing grpc_ca_file

* consul: add support for grpc_ca_file for tls grpc connections in consul 1.14+

This PR adds client config to Nomad for specifying consul.grpc_ca_file

These changes combined with https://github.com/hashicorp/consul/pull/15913 should
finally enable Nomad users to upgrade to Consul 1.14+ and use tls grpc connections.

* consul: add cl entgry for grpc_ca_file

* docs: mention grpc_tls changes due to Consul 1.14
2023-01-11 09:34:28 -06:00
Piotr Kazmierczak be36a1924f
acl: binding rules evaluation (#15697)
Binder provides an interface for binding claims and ACL roles/policies of Nomad.
2023-01-10 16:08:08 +01:00
Tim Gross 32f6ce1c54
Authenticate method improvements (#15734)
This changeset covers a sidebar discussion that @schmichael and I had around the
design for pre-forwarding auth. This includes some changes extracted out of
#15513 to make it easier to review both and leave a clean history.

* Remove fast path for NodeID. Previously-connected clients will have a NodeID
  set on the context, and because this is a large portion of the RPCs sent we
  fast-pathed it at the top of the `Authenticate` method. But the context is
  shared for all yamux streams over the same yamux session (and TCP
  connection). This lets an authenticated HTTP request to a client use the
  NodeID for authentication, which is a privilege escalation. Remove the fast
  path and annotate it so that we don't break it again.

* Add context to decisions around AuthenticatedIdentity. The `Authenticate`
  method taken on its own looks like it wants to return an `acl.ACL` that folds
  over all the various identity types (creating an ephemeral ACL on the fly if
  neccessary). But keeping these fields idependent allows RPC handlers to
  differentiate between internal and external origins so we most likely want to
  avoid this. Leave some docstrings as a warning as to why this is built the way
  it is.

* Mutate the request rather than returning. When reviewing #15513 we decided
  that forcing the request handler to call `SetIdentity` was repetitive and
  error prone. Instead, the `Authenticate` method mutates the request by setting
  its `AuthenticatedIdentity`.
2023-01-10 09:46:38 -05:00
Piotr Kazmierczak 20a01a0bba
acl: modify update endpoints behavior (#15580)
API and RPC endpoints for ACLAuthMethods and ACLBindingRules should allow users
to send incomplete objects in order to, e.g., update single fields. This PR
provides "merging" functionality for these endpoints.
2022-12-20 11:22:19 +01:00
James Rasell b8aa53d09f
core: add ACL binding rule to replication system. (#15555)
ACL binding rule create and deletes are always forwarded to the
authoritative region. In order to make these available in
federated regions, the leaders in these regions need to replicate
from the authoritative.
2022-12-16 09:08:00 +01:00
James Rasell 13f207ea78
events: add ACL binding rules to core events stream topics. (#15544) 2022-12-14 14:49:49 +01:00
James Rasell 3c941c6bc3
acl: add binding rule object state schema and functionality. (#15511)
This change adds a new table that will store ACL binding rule
objects. The two indexes allow fast lookups by their ID, or by
which auth method they are linked to. Snapshot persist and
restore functionality ensures this table can be saved and
restored from snapshots.

In order to write and delete the object to state, new Raft messages
have been added.

All RPC request and response structs, along with object functions
such as diff and canonicalize have been included within this work
as it is nicely separated from the other areas of work.
2022-12-14 08:48:18 +01:00
Seth Hoenig be3f89b5f9
artifact: enable inheriting environment variables from client (#15514)
* artifact: enable inheriting environment variables from client

This PR adds client configuration for specifying environment variables that
should be inherited by the artifact sandbox process from the Nomad Client agent.

Most users should not need to set these values but the configuration is provided
to ensure backwards compatability. Configuration of go-getter should ideally be
done through the artifact block in a jobspec task.

e.g.

```hcl
client {
  artifact {
    set_environment_variables = "TMPDIR,GIT_SSH_OPTS"
  }
}
```

Closes #15498

* website: update set_environment_variables text to mention PATH
2022-12-09 15:46:07 -06:00
Seth Hoenig 825c5cc65e
artifact: add client toggle to disable filesystem isolation (#15503)
This PR adds the client config option for turning off filesystem isolation,
applicable on Linux systems where filesystem isolation is possible and
enabled by default.

```hcl
client{
  artifact {
    disable_filesystem_isolation = <bool:false>
  }
}
```

Closes #15496
2022-12-08 12:29:23 -06:00
Piotr Kazmierczak 1cb45630f0
acl: canonicalize ACL Auth Method object (#15492) 2022-12-08 14:05:46 +01:00
Tim Gross e0fddee386
Pre forwarding authentication (#15417)
Upcoming work to instrument the rate of RPC requests by consumer (and eventually
rate limit) require that we authenticate a RPC request before forwarding. Add a
new top-level `Authenticate` method to the server and have it return an
`AuthenticatedIdentity` struct. RPC handlers will use the relevant fields of
this identity for performing authorization.

This changeset includes:
* The main implementation of `Authenticate`
* Provide a new RPC `ACL.WhoAmI` for debugging authentication. This endpoint
  returns the same `AuthenticatedIdentity` that will be used by RPC handlers. At
  some point we might want to give this an equivalent HTTP endpoint but I didn't
  want to add that to our public API until some of the other Workload Identity
  work is solidified, especially if we don't need it yet.
* A full coverage test of the `Authenticate` method. This sets up two server
  nodes with mTLS and ACLs, some tokens, and some allocations with workload
  identities.
* Wire up an example of using `Authenticate` in the `Namespace.Upsert` RPC and
  see how authorization happens after forwarding.
* A new semgrep rule for `Authenticate`, which we'll need to update once we're
  ready to wire up more RPC endpoints with authorization steps.
2022-12-06 14:44:03 -05:00
Piotr Kazmierczak 777173e8da
acl: added type to ACL Auth Method stub (#15480) 2022-12-06 14:47:05 +01:00
Piotr Kazmierczak 0eccd3286c
acl: sso auth methods RPC/API/CLI should return created or updated objects (#15410)
Currently CRUD code that operates on SSO auth methods does not return created or updated object upon creation/update. This is bad UX and inconsistent behavior compared to other ACL objects like roles, policies or tokens.

This PR fixes it.

Relates to #13120
2022-11-29 07:36:36 +01:00
James Rasell 726d419da1
acl: replicate auth-methods from federated cluster leaders. (#15366) 2022-11-28 09:20:24 +01:00
Luiz Aoqui 8f91be26ab
scheduler: create placements for non-register MRD (#15325)
* scheduler: create placements for non-register MRD

For multiregion jobs, the scheduler does not create placements on
registration because the deployment must wait for the other regions.
Once of these regions will then trigger the deployment to run.

Currently, this is done in the scheduler by considering any eval for a
multiregion job as "paused" since it's expected that another region will
eventually unpause it.

This becomes a problem where evals not triggered by a job registration
happen, such as on a node update. These types of regional changes do not
have other regions waiting to progress the deployment, and so they were
never resulting in placements.

The fix is to create a deployment at job registration time. This
additional piece of state allows the scheduler to differentiate between
a multiregion change, where there are other regions engaged in the
deployment so no placements are required, from a regional change, where
the scheduler does need to create placements.

This deployment starts in the new "initializing" status to signal to the
scheduler that it needs to compute the initial deployment state. The
multiregion deployment will wait until this deployment state is
persisted and its starts is set to "pending". Without this state
transition it's possible to hit a race condition where the plan applier
and the deployment watcher may step of each other and overwrite their
changes.

* changelog: add entry for #15325
2022-11-25 12:45:34 -05:00
Piotr Kazmierczak bb66b5e770
acl: sso auth method RPC endpoints (#15221)
This PR implements RPC endpoints for SSO auth methods.

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-21 10:15:39 +01:00
Piotr Kazmierczak d02241cad5
acl: sso auth method event stream (#15280)
This PR implements SSO auth method support in the event stream.

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-21 10:06:05 +01:00
Tim Gross 37134a4a37
eval delete: move batching of deletes into RPC handler and state (#15117)
During unusual outage recovery scenarios on large clusters, a backlog of
millions of evaluations can appear. In these cases, the `eval delete` command can
put excessive load on the cluster by listing large sets of evals to extract the
IDs and then sending larges batches of IDs. Although the command's batch size
was carefully tuned, we still need to be JSON deserialize, re-serialize to
MessagePack, send the log entries through raft, and get the FSM applied.

To improve performance of this recovery case, move the batching process into the
RPC handler and the state store. The design here is a little weird, so let's
look a the failed options first:

* A naive solution here would be to just send the filter as the raft request and
  let the FSM apply delete the whole set in a single operation. Benchmarking with
  1M evals on a 3 node cluster demonstrated this can block the FSM apply for
  several minutes, which puts the cluster at risk if there's a leadership
  failover (the barrier write can't be made while this apply is in-flight).

* A less naive but still bad solution would be to have the RPC handler filter
  and paginate, and then hand a list of IDs to the existing raft log
  entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
  took roughly an hour to complete.

Instead, we're filtering and paginating in the RPC handler to find a page token,
and then passing both the filter and page token in the raft log. The FSM apply
recreates the paginator using the filter and page token to get roughly the same
page of evaluations, which it then deletes. The pagination process is fairly
cheap (only abut 5% of the total FSM apply time), so counter-intuitively this
rework ends up being much faster. A benchmark of 1M evaluations showed this
blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
completes in less than 4 minutes.

Note that, as with the existing design, this delete is not consistent: a new
evaluation inserted "behind" the cursor of the pagination will fail to be
deleted.
2022-11-14 14:08:13 -05:00
Piotr Kazmierczak 4851f9e68a
acl: sso auth method schema and store functions (#15191)
This PR implements ACLAuthMethod type, acl_auth_methods table schema and crud state store methods. It also updates nomadSnapshot.Persist and nomadSnapshot.Restore methods in order for them to work with the new table, and adds two new Raft messages: ACLAuthMethodsUpsertRequestType and ACLAuthMethodsDeleteRequestType

This PR is part of the SSO work captured under ☂️ ticket #13120.
2022-11-10 19:42:41 +01:00
Tim Gross 9e1c0b46d8
API for Eval.Count (#15147)
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.

The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
2022-11-07 08:53:19 -05:00
Luiz Aoqui e4c8b59919
Update alloc after reconnect and enforece client heartbeat order (#15068)
* scheduler: allow updates after alloc reconnects

When an allocation reconnects to a cluster the scheduler needs to run
special logic to handle the reconnection, check if a replacement was
create and stop one of them.

If the allocation kept running while the node was disconnected, it will
be reconnected with `ClientStatus: running` and the node will have
`Status: ready`. This combination is the same as the normal steady state
of allocation, where everything is running as expected.

In order to differentiate between the two states (an allocation that is
reconnecting and one that is just running) the scheduler needs an extra
piece of state.

The current implementation uses the presence of a
`TaskClientReconnected` task event to detect when the allocation has
reconnected and thus must go through the reconnection process. But this
event remains even after the allocation is reconnected, causing all
future evals to consider the allocation as still reconnecting.

This commit changes the reconnect logic to use an `AllocState` to
register when the allocation was reconnected. This provides the
following benefits:

  - Only a limited number of task states are kept, and they are used for
    many other events. It's possible that, upon reconnecting, several
    actions are triggered that could cause the `TaskClientReconnected`
    event to be dropped.
  - Task events are set by clients and so their timestamps are subject
    to time skew from servers. This prevents using time to determine if
    an allocation reconnected after a disconnect event.
  - Disconnect events are already stored as `AllocState` and so storing
    reconnects there as well makes it the only source of information
    required.

With the new logic, the reconnection logic is only triggered if the
last `AllocState` is a disconnect event, meaning that the allocation has
not been reconnected yet. After the reconnection is handled, the new
`ClientStatus` is store in `AllocState` allowing future evals to skip
the reconnection logic.

* scheduler: prevent spurious placement on reconnect

When a client reconnects it makes two independent RPC calls:

  - `Node.UpdateStatus` to heartbeat and set its status as `ready`.
  - `Node.UpdateAlloc` to update the status of its allocations.

These two calls can happen in any order, and in case the allocations are
updated before a heartbeat it causes the state to be the same as a node
being disconnected: the node status will still be `disconnected` while
the allocation `ClientStatus` is set to `running`.

The current implementation did not handle this order of events properly,
and the scheduler would create an unnecessary placement since it
considered the allocation was being disconnected. This extra allocation
would then be quickly stopped by the heartbeat eval.

This commit adds a new code path to handle this order of events. If the
node is `disconnected` and the allocation `ClientStatus` is `running`
the scheduler will check if the allocation is actually reconnecting
using its `AllocState` events.

* rpc: only allow alloc updates from `ready` nodes

Clients interact with servers using three main RPC methods:

  - `Node.GetAllocs` reads allocation data from the server and writes it
    to the client.
  - `Node.UpdateAlloc` reads allocation from from the client and writes
    them to the server.
  - `Node.UpdateStatus` writes the client status to the server and is
    used as the heartbeat mechanism.

These three methods are called periodically by the clients and are done
so independently from each other, meaning that there can't be any
assumptions in their ordering.

This can generate scenarios that are hard to reason about and to code
for. For example, when a client misses too many heartbeats it will be
considered `down` or `disconnected` and the allocations it was running
are set to `lost` or `unknown`.

When connectivity is restored the to rest of the cluster, the natural
mental model is to think that the client will heartbeat first and then
update its allocations status into the servers.

But since there's no inherit order in these calls the reverse is just as
possible: the client updates the alloc status and then heartbeats. This
results in a state where allocs are, for example, `running` while the
client is still `disconnected`.

This commit adds a new verification to the `Node.UpdateAlloc` method to
reject updates from nodes that are not `ready`, forcing clients to
heartbeat first. Since this check is done server-side there is no need
to coordinate operations client-side: they can continue sending these
requests independently and alloc update will succeed after the heartbeat
is done.

* chagelog: add entry for #15068

* code review

* client: skip terminal allocations on reconnect

When the client reconnects with the server it synchronizes the state of
its allocations by sending data using the `Node.UpdateAlloc` RPC and
fetching data using the `Node.GetClientAllocs` RPC.

If the data fetch happens before the data write, `unknown` allocations
will still be in this state and would trigger the
`allocRunner.Reconnect` flow.

But when the server `DesiredStatus` for the allocation is `stop` the
client should not reconnect the allocation.

* apply more code review changes

* scheduler: persist changes to reconnected allocs

Reconnected allocs have a new AllocState entry that must be persisted by
the plan applier.

* rpc: read node ID from allocs in UpdateAlloc

The AllocUpdateRequest struct is used in three disjoint use cases:

1. Stripped allocs from clients Node.UpdateAlloc RPC using the Allocs,
   and WriteRequest fields
2. Raft log message using the Allocs, Evals, and WriteRequest fields
3. Plan updates using the AllocsStopped, AllocsUpdated, and Job fields

Adding a new field that would only be used in one these cases (1) made
things more confusing and error prone. While in theory an
AllocUpdateRequest could send allocations from different nodes, in
practice this never actually happens since only clients call this method
with their own allocations.

* scheduler: remove logic to handle exceptional case

This condition could only be hit if, somehow, the allocation status was
set to "running" while the client was "unknown". This was addressed by
enforcing an order in "Node.UpdateStatus" and "Node.UpdateAlloc" RPC
calls, so this scenario is not expected to happen.

Adding unnecessary code to the scheduler makes it harder to read and
reason about it.

* more code review

* remove another unused test
2022-11-04 16:25:11 -04:00
Charlie Voiselle 79c4478f5b
template: error on missing key (#15141)
* Support error_on_missing_value for templates
* Update docs for template stanza
2022-11-04 13:23:01 -04:00