Commit Graph

23635 Commits

Author SHA1 Message Date
Luiz Aoqui ad84b22a72
Post 1.3.4 release (#14329)
* Generate files for 1.3.4 release

* Prepare for next release

* Update CHANGELOG.md

Co-authored-by: hc-github-team-nomad-core <github-team-nomad-core@hashicorp.com>
2022-08-26 10:09:13 -04:00
dependabot[bot] 451194397f
build(deps): bump github.com/hashicorp/go-memdb from 1.3.2 to 1.3.3 (#14206)
Bumps [github.com/hashicorp/go-memdb](https://github.com/hashicorp/go-memdb) from 1.3.2 to 1.3.3.
- [Release notes](https://github.com/hashicorp/go-memdb/releases)
- [Changelog](https://github.com/hashicorp/go-memdb/blob/main/changes.go)
- [Commits](https://github.com/hashicorp/go-memdb/compare/v1.3.2...v1.3.3)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-memdb
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-26 10:07:41 -04:00
Seth Hoenig 6b2655ad86 cleanup: create pointer.Compare helper function
This PR creates a pointer.Compare helper for comparing equality of
two pointers. Strictly only works with primitive types we know are
safe to derefence and compare using '=='.
2022-08-26 08:55:59 -05:00
dependabot[bot] 42792c4813
build(deps): bump github.com/hashicorp/go-hclog from 1.2.0 to 1.2.2 (#14208)
Bumps [github.com/hashicorp/go-hclog](https://github.com/hashicorp/go-hclog) from 1.2.0 to 1.2.2.
- [Release notes](https://github.com/hashicorp/go-hclog/releases)
- [Commits](https://github.com/hashicorp/go-hclog/compare/v1.2.0...v1.2.2)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-hclog
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-26 09:31:54 -04:00
dependabot[bot] 1eb34c1099
build(deps): bump github.com/aws/aws-sdk-go from 1.42.27 to 1.44.84 (#14326)
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.42.27 to 1.44.84.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.42.27...v1.44.84)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-26 09:13:37 -04:00
Charlie Voiselle ad737d008b
SV API: return upserted variable to caller (#14325)
* Return created variable to caller in HTTP and Go APIs
* Update tests for returned values
2022-08-25 17:38:15 -04:00
dependabot[bot] 6d3389653b
build(deps): bump github.com/shirou/gopsutil/v3 from 3.21.12 to 3.22.7 (#14209)
* build(deps): bump github.com/shirou/gopsutil/v3 from 3.21.12 to 3.22.7

Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.21.12 to 3.22.7.
- [Release notes](https://github.com/shirou/gopsutil/releases)
- [Commits](https://github.com/shirou/gopsutil/compare/v3.21.12...v3.22.7)

---
updated-dependencies:
- dependency-name: github.com/shirou/gopsutil/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* changelog entry

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
2022-08-25 14:15:41 -04:00
Seth Hoenig 38ad855ae7
Merge pull request #14230 from hashicorp/b-fix-cpuset-init
client: refactor cpuset manager initialization
2022-08-25 11:19:39 -05:00
Seth Hoenig 51384dd63f client: refactor cpuset manager initialization
This PR refactors the code path in Client startup for setting up the cpuset
cgroup manager (non-linux systems not affected).

Before, there was a logic bug where we would try to read the cpuset.cpus.effective
cgroup interface file before ensuring nomad's parent cgroup existed. Therefor that
file would not exist, and the list of useable cpus would be empty. Tasks started
thereafter would not have a value set for their cpuset.cpus.

The refactoring fixes some less than ideal coding style. Instead we now bootstrap
each cpuset manager type (v1/v2) within its own constructor. If something goes
awry during bootstrap (e.g. cgroups not enabled), the constructor returns the
noop implementation and logs a warning.

Fixes #14229
2022-08-25 11:18:43 -05:00
Seth Hoenig fd9744b9eb
Merge pull request #14301 from hashicorp/b-fix-check-status-test-racey
testing: fix flakey check status test
2022-08-25 08:30:46 -05:00
James Rasell 601588df6b
Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main 2022-08-25 12:14:29 +01:00
James Rasell 2293dcf35f
Merge pull request #14291 from hashicorp/f-gh-13120-sso-various-small-fixes
acl: three small fixes for CLI and state consistency
2022-08-25 12:05:32 +02:00
James Rasell 7a0798663d
acl: fix a bug where roles could be duplicated by name.
An ACL roles name must be unique, however, a bug meant multiple
roles of the same same could be created. This fixes that problem
with checks in the RPC handler and state store.
2022-08-25 09:20:43 +01:00
Luiz Aoqui 31ab7964bd
ui: task lifecycle restart all tasks (#14223)
Now that tasks that have finished running can be restarted, the UI needs
to use the actual task state to determine which CSS class to use when
rendering the task lifecycle chart element.
2022-08-24 18:43:44 -04:00
Luiz Aoqui e012d9411e
Task lifecycle restart (#14127)
* allocrunner: handle lifecycle when all tasks die

When all tasks die the Coordinator must transition to its terminal
state, coordinatorStatePoststop, to unblock poststop tasks. Since this
could happen at any time (for example, a prestart task dies), all states
must be able to transition to this terminal state.

* allocrunner: implement different alloc restarts

Add a new alloc restart mode where all tasks are restarted, even if they
have already exited. Also unifies the alloc restart logic to use the
implementation that restarts tasks concurrently and ignores
ErrTaskNotRunning errors since those are expected when restarting the
allocation.

* allocrunner: allow tasks to run again

Prevent the task runner Run() method from exiting to allow a dead task
to run again. When the task runner is signaled to restart, the function
will jump back to the MAIN loop and run it again.

The task runner determines if a task needs to run again based on two new
task events that were added to differentiate between a request to
restart a specific task, the tasks that are currently running, or all
tasks that have already run.

* api/cli: add support for all tasks alloc restart

Implement the new -all-tasks alloc restart CLI flag and its API
counterpar, AllTasks. The client endpoint calls the appropriate restart
method from the allocrunner depending on the restart parameters used.

* test: fix tasklifecycle Coordinator test

* allocrunner: kill taskrunners if all tasks are dead

When all non-poststop tasks are dead we need to kill the taskrunners so
we don't leak their goroutines, which are blocked in the alloc restart
loop. This also ensures the allocrunner exits on its own.

* taskrunner: fix tests that waited on WaitCh

Now that "dead" tasks may run again, the taskrunner Run() method will
not return when the task finishes running, so tests must wait for the
task state to be "dead" instead of using the WaitCh, since it won't be
closed until the taskrunner is killed.

* tests: add tests for all tasks alloc restart

* changelog: add entry for #14127

* taskrunner: fix restore logic.

The first implementation of the task runner restore process relied on
server data (`tr.Alloc().TerminalStatus()`) which may not be available
to the client at the time of restore.

It also had the incorrect code path. When restoring a dead task the
driver handle always needs to be clear cleanly using `clearDriverHandle`
otherwise, after exiting the MAIN loop, the task may be killed by
`tr.handleKill`.

The fix is to store the state of the Run() loop in the task runner local
client state: if the task runner ever exits this loop cleanly (not with
a shutdown) it will never be able to run again. So if the Run() loops
starts with this local state flag set, it must exit early.

This local state flag is also being checked on task restart requests. If
the task is "dead" and its Run() loop is not active it will never be
able to run again.

* address code review requests

* apply more code review changes

* taskrunner: add different Restart modes

Using the task event to differentiate between the allocrunner restart
methods proved to be confusing for developers to understand how it all
worked.

So instead of relying on the event type, this commit separated the logic
of restarting an taskRunner into two methods:
- `Restart` will retain the current behaviour and only will only restart
  the task if it's currently running.
- `ForceRestart` is the new method where a `dead` task is allowed to
  restart if its `Run()` method is still active. Callers will need to
  restart the allocRunner taskCoordinator to make sure it will allow the
  task to run again.

* minor fixes
2022-08-24 17:43:07 -04:00
Tim Gross c732b215f0
vault: detect namespace change in config reload (#14298)
The `namespace` field was not included in the equality check between old and new
Vault configurations, which meant that a Vault config change that only changed
the namespace would not be detected as a change and the clients would not be
reloaded.

Also, the comparison for boolean fields such as `enabled` and
`allow_unauthenticated` was on the pointer and not the value of that pointer,
which results in spurious reloads in real config reload that is easily missed in
typical test scenarios.

Includes a minor refactor of the order of fields for `Copy` and `Merge` to match
the struct fields in hopes it makes it harder to make this mistake in the
future, as well as additional test coverage.
2022-08-24 17:03:29 -04:00
Seth Hoenig 6398548ebd
Merge pull request #14283 from hashicorp/f-java-corretto-test-case
drivers/java: add parsing test case for corretto 17
2022-08-24 15:28:20 -05:00
Seth Hoenig 5e18c7b5b2
Merge pull request #14297 from hashicorp/b-logmon-fork-mystery-bin
client/logmon: acquire executable in init block
2022-08-24 15:25:09 -05:00
Seth Hoenig ff59b90d41 testing: fix flakey check status test
This PR fixes a flakey test where we did not wait on the check
status to actually become failing (go too fast and you just get
a pending check).

Instead add a helper for waiting on any check in the alloc to become
the state we are looking for.
2022-08-24 15:11:41 -05:00
Seth Hoenig 062c817450 cleanup: move fs helpers into escapingfs 2022-08-24 14:45:34 -05:00
Seth Hoenig 423ea1a5c4 client/logmon: acquire executable in init block
This PR causes the logmon task runner to acquire the binary of the
Nomad executable in an 'init' block, so as to almost certainly get
the name while the nomad file still exists.

This is an attempt at fixing the case where a deleted Nomad file
(e.g. during upgrade) may be getting renamed with a mysterious
suffix first.

If this doesn't work, as a last resort we can literally just trim
the mystery string.

Fixes: #14079
2022-08-24 13:17:20 -05:00
Piotr Kazmierczak 7077d1f9aa
template: custom change_mode scripts (#13972)
This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707
2022-08-24 17:43:01 +02:00
Luiz Aoqui 848f2dcc22
changelog: update #14212 to breaking-change (#14292) 2022-08-24 11:36:53 -04:00
Seth Hoenig bff6c88683 cleanup: remove more copies of min/max from helper 2022-08-24 09:56:15 -05:00
Luiz Aoqui ea1802ffa0
deps: sync versions of go-discover in go.mod (#14269)
In #13491 the version of `go-discover` was updated in `go.mod` but the
comment above it mentions that it also needs to be updated in the
`replace` directive.
2022-08-24 10:32:13 -04:00
Piotr Kazmierczak 077b6e7098
docs: Update upgrade guide to reflect enterprise changes introduced in nomad-enterprise (#14212)
This PR documents a change made in the enterprise version of nomad that addresses the following issue:

When a user tries to filter audit logs, they do so with a stanza that looks like the following:

audit {
  enabled = true

  filter "remove deletes" {
    type = "HTTPEvent"
    endpoints  = ["*"]
    stages = ["OperationComplete"]
    operations = ["DELETE"]
  }
}

When specifying both an "endpoint" and a "stage", the events with both matching a "endpoint" AND a matching "stage" will be filtered.

When specifying both an "endpoint" and an "operation" the events with both matching a "endpoint" AND a matching "operation" will be filtered.

When specifying both a "stage" and an "operation" the events with a matching a "stage" OR a matching "operation" will be filtered.

The "OR" logic with stages and operations is unexpected and doesn't allow customers to get specific on which events they want to filter. For instance the following use-case is impossible to achieve: "I want to filter out all OperationReceived events that have the DELETE verb".
2022-08-24 16:31:49 +02:00
Seth Hoenig 0d97a94814 drivers/java: add parsing test case for corretto 17 2022-08-24 09:16:38 -05:00
James Rasell 2ccc48c167
cli: use policy flag for role creation and update. 2022-08-24 15:15:02 +01:00
James Rasell 7401677e4e
cli: output none when a token has no expiration. 2022-08-24 15:14:49 +01:00
Seth Hoenig cfe9db0f66
build: set osusergo build tag by default (#14248)
This PR activates the osuergo build tag in GNUMakefile. This forces the os/user
package to be compiled without CGO. Doing so seems to resolve a race condition
in getpwnam_r that causes alloc creation to hang or panic on `user.Lookup("nobody")`.
2022-08-24 08:11:56 -05:00
James Rasell 9782d6d7ff
acl: allow tokens to lookup linked roles. (#14227)
When listing or reading an ACL role, roles linked to the ACL token
used for authentication can be returned to the caller.
2022-08-24 13:51:51 +02:00
Luiz Aoqui 7ee3de3ea5
fix minor issues found durint ENT merge (#14250) 2022-08-23 17:22:18 -04:00
Michael Schurter 0114bcfe5b
core: move LicenseConfig to shared file (#14247)
This moves LicenseConfig and its Copy method to a shared file so that it can be shared with enterprise code.
2022-08-23 13:44:10 -07:00
Luiz Aoqui af5c01a070
ui: use task state to determine if task is active (#14224)
The current implementation uses the task's finishedAt field to determine
if a task is active of not, but this check is not accurate. A task in
the "pending" state will not have finishedAt value but it's also not
active.

This discrepancy results in some components, like the inline stats chart
of the task row component, to be displayed even whey they shouldn't.
2022-08-23 15:50:40 -04:00
Luiz Aoqui d3be0abf61
ci: fix gofmt on tasklifecycle (#14232) 2022-08-23 15:47:15 -04:00
Tim Gross afb9fe6a4e
docs: fix an anchor link in secure vars docs (#14231) 2022-08-23 10:46:24 -04:00
Seth Hoenig b5427a9f3b
Merge pull request #14215 from hashicorp/docs-update-checks-for-nsd
docs: update check documentation with NSD specifics
2022-08-23 09:23:53 -05:00
Seth Hoenig fb82f11e70
docs: fix checks doc typo
Co-authored-by: Piotr Kazmierczak <phk@mm.st>
2022-08-23 09:23:36 -05:00
Seth Hoenig 60e22d2b13
Merge pull request #14221 from hashicorp/build-require-go1.19
build: go.mod should require go1.19
2022-08-23 07:53:13 -05:00
Luiz Aoqui 7a8cacc9ec
allocrunner: refactor task coordinator (#14009)
The current implementation for the task coordinator unblocks tasks by
performing destructive operations over its internal state (like closing
channels and deleting maps from keys).

This presents a problem in situations where we would like to revert the
state of a task, such as when restarting an allocation with tasks that
have already exited.

With this new implementation the task coordinator behaves more like a
finite state machine where task may be blocked/unblocked multiple times
by performing a state transition.

This initial part of the work only refactors the task coordinator and
is functionally equivalent to the previous implementation. Future work
will build upon this to provide bug fixes and enhancements.
2022-08-22 18:38:49 -04:00
Phil Renaud 662721ce75
Remove a test pause and a lint error from #14199 (#14222) 2022-08-22 16:51:49 -04:00
Tim Gross bf57d76ec7
allow ACL policies to be associated with workload identity (#14140)
The original design for workload identities and ACLs allows for operators to
extend the automatic capabilities of a workload by using a specially-named
policy. This has shown to be potentially unsafe because of naming collisions, so
instead we'll allow operators to explicitly attach a policy to a workload
identity.

This changeset adds workload identity fields to ACL policy objects and threads
that all the way down to the command line. It also a new secondary index to the
ACL policy table on namespace and job so that claim resolution can efficiently
query for related policies.
2022-08-22 16:41:21 -04:00
Charlie Voiselle 29e63a6cb2
Make var get a blocking query as expected (#14205) 2022-08-22 16:37:21 -04:00
Charlie Voiselle b79447d013
Add .0 to make goenv happy (#14218) 2022-08-22 16:36:02 -04:00
Seth Hoenig 3fd0ec6569
Merge pull request #14219 from hashicorp/e2e-nsd-checks
e2e: add e2e tests for nomad service disco checks
2022-08-22 15:31:35 -05:00
Seth Hoenig 38727b6ab9 e2e: add e2e tests for nomad service disco checks
This PR adds 2 e2e tests for ensuring nomad service discovery checks
get created and produce status results as expected.
2022-08-22 15:31:13 -05:00
Luiz Aoqui dbffdca92e
template: use pointer values for gid and uid (#14203)
When a Nomad agent starts and loads jobs that already existed in the
cluster, the default template uid and gid was being set to 0, since this
is the zero value for int. This caused these jobs to fail in
environments where it was not possible to use 0, such as in Windows
clients.

In order to differentiate between an explicit 0 and a template where
these properties were not set we need to use a pointer.
2022-08-22 16:25:49 -04:00
Michael Schurter d36e0c02c9
client: stats need latest allocdir (#14204)
In #14139 this code was changed to use the original copy of the config,
but Config.AllocDir is updated in the `Client.init()` method for dev
agents.

This uses the latest version of the alloc dir (which cannot change
further at runtime without a client restart which would reinitialize
the stats collector as well).
2022-08-22 09:28:53 -07:00
Seth Hoenig ea6d010790 docs: update check documentation with NSD specifics
This PR updates the checks documentation to mention support for checks
when using the Nomad service provider. There are limitations of NSD
compared to Consul, and those configuration options are now noted as
being Consul-only.
2022-08-22 10:50:26 -05:00
Phil Renaud fcf2c40c60
[ui] Allocation route services table: show task-level services (#14199)
Adds service fragments to allocations and union taskGroup and task services
2022-08-22 11:45:12 -04:00