Our changelog has become large enough that GitHub's rendering is very slow,
resulting in error pages ("angry unicorns"). Split out the older unsupported
versions of Nomad into their own file so that we only need to render the most
recent versions, while keeping the older versions relatively searchable by
having them in a single file.
This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using
`image` filesystem isolation. As a result, more powerful tokens can be used
in a job definition, allowing it to use template stanzas to issue all kinds of
secrets (database secrets, Vault tokens with very specific policies, etc.),
without sharing that issuing power with the task itself.
This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.
If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
* e2e: create a v3/ set of packages for creating Nomad e2e tests
This PR creates an experimental set of packages under `e2e/v3/` for crafting
Nomad e2e tests. Unlike previous generations, this is an attempt at providing
a way to create tests in a declarative (ish) pattern, with a focus on being
easy to use, easy to clean up, and easy to debug.
@shoenig is just trying this out to see how it goes.
Lots of features need to be implemented.
Many more docs need to be written.
Breaking changes are to be expected.
There are known and unknown bugs.
No warranty.
Quick run of `example` with verbose logging.
```shell
➜ NOMAD_E2E_VERBOSE=1 go test -v
=== RUN TestExample
=== RUN TestExample/testSleep
util3.go:25: register (service) job: "sleep-809"
util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: pending
util3.go:25: checking eval: 9f0ae04d-7259-9333-3763-44d0592d03a1, status: complete
util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: running
util3.go:25: checking deployment: a85ad2f8-269c-6620-d390-8eac7a9c397d, status: successful
util3.go:25: deployment a85ad2f8-269c-6620-d390-8eac7a9c397d was a success
util3.go:25: deregister job "sleep-809"
util3.go:25: system gc
=== RUN TestExample/testNamespace
util3.go:25: apply namespace "example-291"
util3.go:25: register (service) job: "sleep-967"
util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: pending
util3.go:25: checking eval: a2a2303a-adf1-2621-042e-a9654292e569, status: complete
util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: running
util3.go:25: checking deployment: 3395e9a8-3ffc-8990-d5b8-cc0ce311f302, status: successful
util3.go:25: deployment 3395e9a8-3ffc-8990-d5b8-cc0ce311f302 was a success
util3.go:25: deregister job "sleep-967"
util3.go:25: system gc
util3.go:25: cleanup namespace "example-291"
=== RUN TestExample/testEnv
util3.go:25: register (batch) job: "env-582"
util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: pending
util3.go:25: checking eval: 600f3bce-ea17-6d13-9d20-9d9eb2a784f7, status: complete
util3.go:25: deregister job "env-582"
util3.go:25: system gc
--- PASS: TestExample (10.08s)
--- PASS: TestExample/testSleep (5.02s)
--- PASS: TestExample/testNamespace (4.02s)
--- PASS: TestExample/testEnv (1.03s)
PASS
ok github.com/hashicorp/nomad/e2e/example 10.079s
```
* cluster3: use filter for kernel.name instead of filtering manually
Unfortunately, due to the split build nature of the Ember app and
Storybook, it isn't possible to import Mirage in the Storybook context to
control scenarios via a knob :(
* Versions added to deploying status panel
* Wrap the running and healthy title in a span
* Versions in the deployment UI next to titles
* Version count and label styles updated
Although #17669 fixed the permissions of the release pipeline to push new
commits, there was still an error when invoking the `build` workflow.
The format of the reference was changed in #17103 such that we're sending the
git ref (a SHA) and not the "--ref" argument required by the GH actions workflow
API, which in this case is apparently specially defined as "The branch or tag
name which contains the version of the workflow file you'd like to run" and not
what git calls a "ref".
This changeset:
* Removes the third-party action entirely so that we're using GitHub's own
tooling. This removes one more thing from the supply chain to pin and ensures a
1:1 mapping of args to what's documented by GitHub.
* Removes the `--ref` argument entirely, which causes it to default to the
current branch that the release workflow is running on (which is always what
we want).
Since the matrix exercises different test cases, it's better to allow
all partitions to completely run, even if one of them fails, so it's
easier to catch multiple test failures.
In Nomad 1.5.3 we fixed a security bug that allowed bypass of ACL checks if the
request came through a client node first. But this fix knowingly broke the
identification of many client-to-server RPCs. These will now be measured as if
they were anonymous. The reason for this is that many client-to-server RPCs do
not send the node secret and instead rely on the protection of mTLS.
This changeset ensures that the node secret is being sent with every
client-to-server RPC request. In a future version of Nomad we can add
enforcement on the server side, but this was left out of this changeset to
reduce risks to the safe upgrade path.
Sending the node secret as an auth token introduces a new problem during initial
introduction of a client. Clients send many RPCs concurrently with
`Node.Register`, but until the node is registered the node secret is unknown to
the server and will be rejected as invalid. This causes permission denied
errors.
To fix that, this changeset introduces a gate on having successfully made a
`Node.Register` RPC before any other RPCs can be sent (except for `Status.Ping`,
which we need earlier but which also ignores the error because that handler
doesn't do an authorization check). This ensures that we only send requests with
a node secret already known to the server. This also makes client startup a
little easier to reason about because we know `Node.Register` must succeed
first, and it should make for a good place to hook in future plans for secure
introduction of nodes. The tradeoff is that an existing client that has running
allocs will take slightly longer (a second or two) to transition to ready after
a restart, because the transition in `Node.UpdateStatus` is gated at the server
by first submitting `Node.UpdateAlloc` with client alloc updates.
In #17103 we set read-only permissions on all the workflows. Unfortunately we
missed that the `release` workflow makes git commits and pushes them to the
repository, so it needs to have write permissions.
* client: do not disable memory swappiness if kernel does not support it
This PR adds a workaround for very old Linux kernels which do not support
the memory swappiness interface file. Normally we write a "0" to the file
to explicitly disable swap. In the case the kernel does not support it,
give libcontainer a nil value so it does not write anything.
Fixes #17448
* client: detect swappiness by writing to the file
* fixup changelog
---------
Co-authored-by: James Rasell <jrasell@users.noreply.github.com>
* Degraded vs Healthy etc. status
* Standardize the look of a deploying status panel
* badge styles
* remove job.status from title component in favour of in-panel status
* Remove a redundant check
* re-attrd fail-deployment button considered
This is intended to be used like `yarn exam:parallel -- more --options`
This way a split and partition can be provided by CI without CI also
needing to deal with Percy details.
If the dynamic port range for a node is set so that the min is equal to the max,
there's only one port available and this passes config validation. But the
scheduler panics when it tries to pick a random port. Only add the randomness
when there's more than one to pick from.
Adds a test for the behavior but also adjusts the commentary on a couple of the
existing tests that made it seem like this case was already covered if you
didn't look too closely.
Fixes: #17585
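A minimal sketch of the guard, assuming a hypothetical helper `pickDynamicPort` rather than the scheduler's real function: `rand.Intn` panics when its argument is not positive, which is what happens when the min and max of the range are equal, so the random offset is only added when the range holds more than one port.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickDynamicPort (hypothetical) picks a starting port in the dynamic range.
// rand.Intn(n) panics for n <= 0, so the random offset is skipped when
// minPort == maxPort and the single available port is returned as-is.
// As a simplification, the offset here never reaches maxPort itself.
func pickDynamicPort(minPort, maxPort int) int {
	port := minPort
	if maxPort > minPort {
		port += rand.Intn(maxPort - minPort)
	}
	return port
}

func main() {
	// A one-port range passes config validation and must not panic.
	fmt.Println(pickDynamicPort(20000, 20000)) // 20000

	// A wider range yields some port in [20000, 32000).
	p := pickDynamicPort(20000, 32000)
	fmt.Println(p >= 20000 && p < 32000) // true
}
```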
The Windows Docker install script stopped working.
After trying various things to fix the script,
I opted instead for a base image that comes with
Docker already installed.
Error output during build was:
Installing Docker.
WARNING: Cannot find path 'C:\Users\Administrator\AppData\Local\Temp\DockerMsftProvider\DockerDefault_DockerSearchIndex.json' because it does not exist.
WARNING: Cannot bind argument to parameter 'downloadURL' because it is an empty string.
WARNING: The property 'AbsoluteUri' cannot be found on this object. Verify that the property exists.
WARNING: The property 'RequestMessage' cannot be found on this object. Verify that the property exists.
Failed to install Docker.
Install-Package : No match was found for the specified search criteria and package name 'docker'.