Commit Graph

1124 Commits

Author SHA1 Message Date
James Rasell b3a6cfecc4
api: add OIDC HTTP API endpoints and SDK. 2023-01-13 13:15:58 +00:00
Seth Hoenig c40dea83a4
test: wait for node in api tests that register job (#15677) 2023-01-03 16:11:35 -06:00
Seth Hoenig 9dc82864a9
tests: fix assertion for slice length (#15672)
This assertions got borked during the refactoring; should be at least one
element, not exactly one element.
2023-01-03 15:40:38 -06:00
Seth Hoenig cd75858f4a
api: purge testify and pretty dependencies (#15627)
* api: swap testify for test (acl)

* api: swap testify for test (agent)

 Please enter the commit message for your changes. Lines starting

* api: swap testify for test (allocations)

* api: swap testify for test (api)

* api: swap testify for test (compose)

* api: swap testify for test (constraint)

* api: swap testify for test (consul)

* api: swap testify for test (csi)

* api: swap testify for test (evaluations)

* api: swap testify for test (event stream)

* api: swap testify for test (fs)

* api: swap testify for test (ioutil)

* api: swap testify for test (jobs)

* api: swap testify for test (keyring)

* api: swap testify for test (operator_ent)

* api: swap testify for test (operator_metrics)

* api: swap testify for test (operator)

* api: swap testify for test (quota)

* api: swap testify for test (resources)

* api: swap testify for test (fix operator_metrics)

* api: swap testify for test (scaling)

* api: swap testify for test (search)

* api: swap testify for test (sentinel)

* api: swap testify for test (services)

* api: swap testify for test (status)

* api: swap testify for test (system)

* api: swap testify for test (tasks)

* api: swap testify for test (utils)

* api: swap testify for test (variables)

* api: remove dependencies on testify and pretty
2023-01-01 12:57:26 -06:00
Seth Hoenig 429293a4be
api: cleanup use of deprecated waiter functions (#15608) 2022-12-22 08:21:00 -06:00
Seth Hoenig 26512b4e38
deps: update shoenig/test to 0.5.2 and fixup breaking changes (#15574) 2022-12-20 07:52:10 -06:00
Seth Hoenig 336d730b9c
api: make api tests fast and more concurrency safe (#15543)
This PR tries to make API tests run fast, as an experiment to later apply
to all packages. Key changes include

- Swapping freeport for test/portal for port allocations
- Swappng some uses of WaitForResult with test/wait
- Turning on parallelism in api/testutil/slow.go
- Switching to custom public runner (32 vcpu)

There's also chunk of cleanup brought in for the ride
2022-12-16 12:25:28 -06:00
James Rasell 95c9ffa505
ACL: add ACL binding rule RPC and HTTP API handlers. (#15529)
This change add the RPC ACL binding rule handlers. These handlers
are responsible for the creation, updating, reading, and deletion
of binding rules.

The write handlers are feature gated so that they can only be used
when all federated servers are running the required version.

The HTTP API handlers and API SDK have also been added where
required. This allows the endpoints to be called from the API by users
and clients.
2022-12-15 09:18:55 +01:00
Piotr Kazmierczak 777173e8da
acl: added type to ACL Auth Method stub (#15480) 2022-12-06 14:47:05 +01:00
Seth Hoenig 3ed37b0b1d
fingerprint: add fingerprinting for CNI plugins presense and version (#15452)
This PR adds a fingerprinter to set the attribute
"plugins.cni.version.<name>" => "<version>"

for each CNI plugin in <client>.cni_path (/opt/cni/bin by default).
2022-12-05 14:22:47 -06:00
dependabot[bot] 866f7da8ad
build(deps): bump github.com/shoenig/test from 0.4.4 to 0.4.5 in /api (#15405)
* build(deps): bump github.com/shoenig/test from 0.4.4 to 0.4.5 in /api

Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.4.4 to 0.4.5.
- [Release notes](https://github.com/shoenig/test/releases)
- [Commits](https://github.com/shoenig/test/compare/v0.4.4...v0.4.5)

---
updated-dependencies:
- dependency-name: github.com/shoenig/test
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* deps: update github.com/shoenig/test v0.4.4 -> v0.4.5

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
2022-11-29 13:25:53 -05:00
Piotr Kazmierczak 0eccd3286c
acl: sso auth methods RPC/API/CLI should return created or updated objects (#15410)
Currently CRUD code that operates on SSO auth methods does not return created or updated object upon creation/update. This is bad UX and inconsistent behavior compared to other ACL objects like roles, policies or tokens.

This PR fixes it.

Relates to #13120
2022-11-29 07:36:36 +01:00
James Rasell 726d419da1
acl: replicate auth-methods from federated cluster leaders. (#15366) 2022-11-28 09:20:24 +01:00
James Rasell 32dfa431f3
sso: add ACL auth-method HTTP API CRUD endpoints (#15338)
* core: remove custom auth-method TTLS and use ACL token TTLS.

* agent: add ACL auth-method HTTP endpoints for CRUD actions.

* api: add ACL auth-method client.
2022-11-23 09:38:02 +01:00
Tim Gross 37134a4a37
eval delete: move batching of deletes into RPC handler and state (#15117)
During unusual outage recovery scenarios on large clusters, a backlog of
millions of evaluations can appear. In these cases, the `eval delete` command can
put excessive load on the cluster by listing large sets of evals to extract the
IDs and then sending larges batches of IDs. Although the command's batch size
was carefully tuned, we still need to be JSON deserialize, re-serialize to
MessagePack, send the log entries through raft, and get the FSM applied.

To improve performance of this recovery case, move the batching process into the
RPC handler and the state store. The design here is a little weird, so let's
look a the failed options first:

* A naive solution here would be to just send the filter as the raft request and
  let the FSM apply delete the whole set in a single operation. Benchmarking with
  1M evals on a 3 node cluster demonstrated this can block the FSM apply for
  several minutes, which puts the cluster at risk if there's a leadership
  failover (the barrier write can't be made while this apply is in-flight).

* A less naive but still bad solution would be to have the RPC handler filter
  and paginate, and then hand a list of IDs to the existing raft log
  entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
  took roughly an hour to complete.

Instead, we're filtering and paginating in the RPC handler to find a page token,
and then passing both the filter and page token in the raft log. The FSM apply
recreates the paginator using the filter and page token to get roughly the same
page of evaluations, which it then deletes. The pagination process is fairly
cheap (only abut 5% of the total FSM apply time), so counter-intuitively this
rework ends up being much faster. A benchmark of 1M evaluations showed this
blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
completes in less than 4 minutes.

Note that, as with the existing design, this delete is not consistent: a new
evaluation inserted "behind" the cursor of the pagination will fail to be
deleted.
2022-11-14 14:08:13 -05:00
Derek Strickland 80b6f27efd
api: remove `mapstructure` tags from`Port` struct (#12916)
This PR solves a defect in the deserialization of api.Port structs when returning structs from theEventStream.

Previously, the api.Port struct's fields were decorated with both mapstructure and hcl tags to support the network.port stanza's use of the keyword static when posting a static port value. This works fine when posting a job and when retrieving any struct that has an embedded api.Port instance as long as the value is deserialized using JSON decoding. The EventStream, however, uses mapstructure to decode event payloads in the api package. mapstructure expects an underlying field named static which does not exist. The result was that the Port.Value field would always be set to 0.

Upon further inspection, a few things became apparent.

The struct already has hcl tags that support the indirection during job submission.
Serialization/deserialization with both the json and hcl packages produce the desired result.
The use of of the mapstructure tags provided no value as the Port struct contains only fields with primitive types.
This PR:

Removes the mapstructure tags from the api.Port structs
Updates the job parsing logic to use hcl instead of mapstructure when decoding Port instances.
Closes #11044

Co-authored-by: DerekStrickland <dstrickland@hashicorp.com>
Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
2022-11-08 11:26:28 +01:00
Tim Gross 9e1c0b46d8
API for `Eval.Count` (#15147)
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is
designed to support interactive use in the `nomad eval delete` command to get a
count of evals expected to be deleted before doing so.

The state store operations to do this sort of thing are somewhat expensive, but
it's cheaper than serializing a big list of evals to JSON. Note that although it
seems like this could be done as an extra parameter and response field on
`Eval.List`, having it as its own endpoint avoids having to change the response
body shape and lets us avoid handling the legacy filter params supported by
`Eval.List`.
2022-11-07 08:53:19 -05:00
dependabot[bot] d848c28066
build(deps): bump github.com/shoenig/test from 0.4.3 to 0.4.4 in /api (#15163)
* build(deps): bump github.com/shoenig/test from 0.4.3 to 0.4.4 in /api

Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.4.3 to 0.4.4.
- [Release notes](https://github.com/shoenig/test/releases)
- [Commits](https://github.com/shoenig/test/compare/v0.4.3...v0.4.4)

---
updated-dependencies:
- dependency-name: github.com/shoenig/test
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* deps: also update root go mod

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-11-06 08:06:01 -06:00
Charlie Voiselle 79c4478f5b
template: error on missing key (#15141)
* Support error_on_missing_value for templates
* Update docs for template stanza
2022-11-04 13:23:01 -04:00
Phil Renaud ffb4c63af7
[ui] Adds meta to job list stub and displays a pack logo on the jobs index (#14833)
* Adds meta to job list stub and displays a pack logo on the jobs index

* Changelog

* Modifying struct for optional meta param

* Explicitly ask for meta anytime I look up a job from index or job page

* Test case for the endpoint

* adding meta field to API struct and ommitting from response if empty

* passthru method added to api/jobs.list

* Meta param listed in docs for jobs list

* Update api/jobs.go

Co-authored-by: Tim Gross <tgross@hashicorp.com>

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-11-02 16:58:24 -04:00
dependabot[bot] 0ae4c4a241
build(deps): bump github.com/stretchr/testify in /api (#15082)
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.8.0...v1.8.1)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-31 08:45:04 -05:00
dependabot[bot] 81ac5d93f1
build(deps): bump github.com/kr/pretty from 0.3.0 to 0.3.1 in /api (#14859)
* build(deps): bump github.com/kr/pretty from 0.3.0 to 0.3.1 in /api

Bumps [github.com/kr/pretty](https://github.com/kr/pretty) from 0.3.0 to 0.3.1.
- [Release notes](https://github.com/kr/pretty/releases)
- [Commits](https://github.com/kr/pretty/compare/v0.3.0...v0.3.1)

---
updated-dependencies:
- dependency-name: github.com/kr/pretty
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* deps: update in root as well

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-10-27 11:58:00 -05:00
Seth Hoenig 756b71b7d2
deps: bump shoenig for str func bugfixes (#14974)
And fix the one place we use them.
2022-10-20 08:11:43 -05:00
James Rasell 2db8c67a6d
api: add convenience string func to Topic type. (#14843) 2022-10-19 14:12:23 +02:00
Seth Hoenig c571db34e7
deps: bump shoenig/test to 0.4.1 (#14931)
bugfix for SliceContainsAll and adding SliceContainsSubset
2022-10-18 09:46:25 -05:00
Giovani Avelar a625de2062
Allow specification of a custom job name/prefix for parameterized jobs (#14631) 2022-10-06 16:21:40 -04:00
dependabot[bot] 86d5c4fa67
build(deps): bump github.com/docker/go-units from 0.4.0 to 0.5.0 in /api (#14430)
* build(deps): bump github.com/docker/go-units from 0.4.0 to 0.5.0 in /api

Bumps [github.com/docker/go-units](https://github.com/docker/go-units) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/docker/go-units/releases)
- [Commits](https://github.com/docker/go-units/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: github.com/docker/go-units
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* deps: also update go-units in nomad

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-09-26 09:30:17 -05:00
dependabot[bot] 1e79d13097
build(deps): bump github.com/shoenig/test from 0.3.1 to 0.4.0 in /api (#14681)
Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.3.1 to 0.4.0.
- [Release notes](https://github.com/shoenig/test/releases)
- [Commits](https://github.com/shoenig/test/compare/v0.3.1...v0.4.0)

---
updated-dependencies:
- dependency-name: github.com/shoenig/test
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-26 08:48:37 -05:00
Charlie Voiselle 15bb444da8
[vars:api] Return fake QueryMeta on 403s with Peek() (#14661)
Consul-template expects that the Response Metadata is non-zero, but on a 403 we do not pass back QueryMeta headers.
2022-09-22 13:15:05 -04:00
James Rasell 41284382dd
api: update keyring comment to reflect correct feature name. (#14558) 2022-09-13 10:05:03 -04:00
James Rasell 1f877bac1c
acl: fix encoding expiration time in ACL token list API. (#14542) 2022-09-12 15:50:35 +02:00
Charlie Voiselle b55112714f
Vars: CLI commands for `var get`, `var put`, `var purge` (#14400)
* Includes updates to `var init`
2022-09-09 17:55:20 -04:00
Tim Gross 9259a373cd
remove root keyring install API (#14514)
* keyring rotate API should require put/post method
* remove keyring install API
2022-09-09 08:50:35 -04:00
Tim Gross 3fc7482ecd
CSI: failed allocation should not block its own controller unpublish (#14484)
A Nomad user reported problems with CSI volumes associated with failed
allocations, where the Nomad server did not send a controller unpublish RPC.

The controller unpublish is skipped if other non-terminal allocations on the
same node claim the volume. The check has a bug where the allocation belonging
to the claim being freed was included in the check incorrectly. During a normal
allocation stop for job stop or a new version of the job, the allocation is
terminal. But allocations that fail are not yet marked terminal at the point in
time when the client sends the unpublish RPC to the server.

For CSI plugins that support controller attach/detach, this means that the
controller will not be able to detach the volume from the allocation's host and
the replacement claim will fail until a GC is run. This changeset fixes the
conditional so that the claim's own allocation is not included, and makes the
logic easier to read. Include a test case covering this path.

Also includes two minor extra bugfixes:

* Entities we get from the state store should always be copied before
altering. Ensure that we copy the volume in the top-level unpublish workflow
before handing off to the steps.

* The list stub object for volumes in `nomad/structs` did not match the stub
object in `api`. The `api` package also did not include the current
readers/writers fields that are expected by the UI. True up the two objects and
add the previously undocumented fields to the docs.
2022-09-08 13:30:05 -04:00
Tim Gross cc9b480996
testing: setting env var incompatible with parallel tests (#14405)
Neither the `os.Setenv` nor `t.Setenv` helper are safe to use in parallel tests
because environment variables are process-global. The stdlib panics if you try
to do this. Remove the `ci.Parallel()` call from all tests where we're setting
environment variables.
2022-08-30 14:49:03 -04:00
James Rasell 755b4745ed
Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main 2022-08-30 08:59:13 +01:00
Tim Gross 1dc053b917 rename SecureVariables to Variables throughout 2022-08-26 16:06:24 -04:00
Tim Gross dcfd31296b file rename 2022-08-26 16:06:24 -04:00
Charlie Voiselle ad737d008b
SV API: return upserted variable to caller (#14325)
* Return created variable to caller in HTTP and Go APIs
* Update tests for returned values
2022-08-25 17:38:15 -04:00
James Rasell 601588df6b
Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main 2022-08-25 12:14:29 +01:00
James Rasell 7a0798663d
acl: fix a bug where roles could be duplicated by name.
An ACL roles name must be unique, however, a bug meant multiple
roles of the same same could be created. This fixes that problem
with checks in the RPC handler and state store.
2022-08-25 09:20:43 +01:00
Luiz Aoqui e012d9411e
Task lifecycle restart (#14127)
* allocrunner: handle lifecycle when all tasks die

When all tasks die the Coordinator must transition to its terminal
state, coordinatorStatePoststop, to unblock poststop tasks. Since this
could happen at any time (for example, a prestart task dies), all states
must be able to transition to this terminal state.

* allocrunner: implement different alloc restarts

Add a new alloc restart mode where all tasks are restarted, even if they
have already exited. Also unifies the alloc restart logic to use the
implementation that restarts tasks concurrently and ignores
ErrTaskNotRunning errors since those are expected when restarting the
allocation.

* allocrunner: allow tasks to run again

Prevent the task runner Run() method from exiting to allow a dead task
to run again. When the task runner is signaled to restart, the function
will jump back to the MAIN loop and run it again.

The task runner determines if a task needs to run again based on two new
task events that were added to differentiate between a request to
restart a specific task, the tasks that are currently running, or all
tasks that have already run.

* api/cli: add support for all tasks alloc restart

Implement the new -all-tasks alloc restart CLI flag and its API
counterpar, AllTasks. The client endpoint calls the appropriate restart
method from the allocrunner depending on the restart parameters used.

* test: fix tasklifecycle Coordinator test

* allocrunner: kill taskrunners if all tasks are dead

When all non-poststop tasks are dead we need to kill the taskrunners so
we don't leak their goroutines, which are blocked in the alloc restart
loop. This also ensures the allocrunner exits on its own.

* taskrunner: fix tests that waited on WaitCh

Now that "dead" tasks may run again, the taskrunner Run() method will
not return when the task finishes running, so tests must wait for the
task state to be "dead" instead of using the WaitCh, since it won't be
closed until the taskrunner is killed.

* tests: add tests for all tasks alloc restart

* changelog: add entry for #14127

* taskrunner: fix restore logic.

The first implementation of the task runner restore process relied on
server data (`tr.Alloc().TerminalStatus()`) which may not be available
to the client at the time of restore.

It also had the incorrect code path. When restoring a dead task the
driver handle always needs to be clear cleanly using `clearDriverHandle`
otherwise, after exiting the MAIN loop, the task may be killed by
`tr.handleKill`.

The fix is to store the state of the Run() loop in the task runner local
client state: if the task runner ever exits this loop cleanly (not with
a shutdown) it will never be able to run again. So if the Run() loops
starts with this local state flag set, it must exit early.

This local state flag is also being checked on task restart requests. If
the task is "dead" and its Run() loop is not active it will never be
able to run again.

* address code review requests

* apply more code review changes

* taskrunner: add different Restart modes

Using the task event to differentiate between the allocrunner restart
methods proved to be confusing for developers to understand how it all
worked.

So instead of relying on the event type, this commit separated the logic
of restarting an taskRunner into two methods:
- `Restart` will retain the current behaviour and only will only restart
  the task if it's currently running.
- `ForceRestart` is the new method where a `dead` task is allowed to
  restart if its `Run()` method is still active. Callers will need to
  restart the allocRunner taskCoordinator to make sure it will allow the
  task to run again.

* minor fixes
2022-08-24 17:43:07 -04:00
Piotr Kazmierczak 7077d1f9aa
template: custom change_mode scripts (#13972)
This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707
2022-08-24 17:43:01 +02:00
Seth Hoenig 60e22d2b13
Merge pull request #14221 from hashicorp/build-require-go1.19
build: go.mod should require go1.19
2022-08-23 07:53:13 -05:00
Tim Gross bf57d76ec7
allow ACL policies to be associated with workload identity (#14140)
The original design for workload identities and ACLs allows for operators to
extend the automatic capabilities of a workload by using a specially-named
policy. This has shown to be potentially unsafe because of naming collisions, so
instead we'll allow operators to explicitly attach a policy to a workload
identity.

This changeset adds workload identity fields to ACL policy objects and threads
that all the way down to the command line. It also a new secondary index to the
ACL policy table on namespace and job so that claim resolution can efficiently
query for related policies.
2022-08-22 16:41:21 -04:00
Luiz Aoqui dbffdca92e
template: use pointer values for gid and uid (#14203)
When a Nomad agent starts and loads jobs that already existed in the
cluster, the default template uid and gid was being set to 0, since this
is the zero value for int. This caused these jobs to fail in
environments where it was not possible to use 0, such as in Windows
clients.

In order to differentiate between an explicit 0 and a template where
these properties were not set we need to use a pointer.
2022-08-22 16:25:49 -04:00
James Rasell 2736cf0dfa
acl: make listing RPC and HTTP API a stub return object. (#14211)
Making the ACL Role listing return object a stub future-proofs the
endpoint. In the event the role object grows, we are not bound by
having to return all fields within the list endpoint or change the
signature of the endpoint to reduce the list return size.
2022-08-22 17:20:23 +02:00
Seth Hoenig 9bce3a2e36 build: go.mod should require go1.19
Since we started using atomic.Pointer, we should specify the go1.19
requirement in our go.mod files.
2022-08-21 20:41:49 -05:00
Seth Hoenig 88a1353149 cli: display nomad service check status output in CLI commands
This PR adds some NSD check status output to the CLI.

1. The 'nomad alloc status' command produces nsd check summary output (if present)
2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)
2022-08-19 09:18:29 -05:00
dependabot[bot] 05d943ed51
build(deps): bump github.com/shoenig/test from 0.3.0 to 0.3.1 in /api (#14194)
Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.3.0 to 0.3.1.
- [Release notes](https://github.com/shoenig/test/releases)
- [Commits](https://github.com/shoenig/test/compare/v0.3.0...v0.3.1)

---
updated-dependencies:
- dependency-name: github.com/shoenig/test
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-19 11:32:59 +02:00