This complements the `env` parameter, so that the operator can author
tasks that don't share their Vault token with the workload when using
`image` filesystem isolation. As a result, more powerful tokens can be used
in a job definition, allowing it to use template stanzas to issue all kinds of
secrets (database secrets, Vault tokens with very specific policies, etc.),
without sharing that issuing power with the task itself.
This is accomplished by creating a directory called `private` within
the task's working directory, which shares many properties of
the `secrets` directory (tmpfs where possible, not accessible by
`nomad alloc fs` or Nomad's web UI), but isn't mounted into/bound to the
container.
If the `disable_file` parameter is set to `false` (its default), the Vault token
is also written to the NOMAD_SECRETS_DIR, so the default behavior is
backwards compatible. Even if the operator never changes the default,
they will still benefit from the improved behavior of Nomad never reading
the token back in from that - potentially altered - location.
* Update ioutil deprecated library references to os and io respectively
* Deal with the errors produced.
Add error handling to filEntry info
Add error handling to info
Fixes#2522
Skip embedding client.alloc_dir when building chroot. If a user
configures a Nomad client agent so that the chroot_env will embed the
client.alloc_dir, Nomad will happily infinitely recurse while building
the chroot until something horrible happens. The best case scenario is
the filesystem's path length limit is hit. The worst case scenario is
disk space is exhausted.
A bad agent configuration will look something like this:
```hcl
data_dir = "/tmp/nomad-badagent"
client {
enabled = true
chroot_env {
# Note that the source matches the data_dir
"/tmp/nomad-badagent" = "/ohno"
# ...
}
}
```
Note that `/ohno/client` (the state_dir) will still be created but not
`/ohno/alloc` (the alloc_dir).
While I cannot think of a good reason why someone would want to embed
Nomad's client (and possibly server) directories in chroots, there
should be no cause for harm. chroots are only built when Nomad runs as
root, and Nomad disables running exec jobs as root by default. Therefore
even if client state is copied into chroots, it will be inaccessible to
tasks.
Skipping the `data_dir` and `{client,server}.state_dir` is possible, but
this PR attempts to implement the minimum viable solution to reduce risk
of unintended side effects or bugs.
When running tests as root in a vm without the fix, the following error
occurs:
```
=== RUN TestAllocDir_SkipAllocDir
alloc_dir_test.go:520:
Error Trace: alloc_dir_test.go:520
Error: Received unexpected error:
Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
Test: TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```
Also removed unused Copy methods on AllocDir and TaskDir structs.
Thanks to @eveld for not letting me forget about this!
Before, Connect Native Tasks needed one of these to work:
- To be run in host networking mode
- To have the Consul agent configured to listen to a unix socket
- To have the Consul agent configured to listen to a public interface
None of these are a great experience, though running in host networking is
still the best solution for non-Linux hosts. This PR establishes a connection
proxy between the Consul HTTP listener and a unix socket inside the alloc fs,
bypassing the network namespace for any Connect Native task. Similar to and
re-uses a bunch of code from the gRPC listener version for envoy sidecar proxies.
Proxy is established only if the alloc is configured for bridge networking and
there is at least one Connect Native task in the Task Group.
Fixes#8290
* connect: add unix socket to proxy grpc for envoy
Fixes#6124
Implement a L4 proxy from a unix socket inside a network namespace to
Consul's gRPC endpoint on the host. This allows Envoy to connect to
Consul's xDS configuration API.
* connect: pointer receiver on structs with mutexes
* connect: warn on all proxy errors
Fixes#6041
Unlike all other Consul operations, boostrapping requires Consul be
available. This PR tries Consul 3 times with a backoff to account for
the group services being asynchronously registered with Consul.
* Stopping an alloc is implemented via Updates but update hooks are
*not* run.
* Destroying an alloc is a best effort cleanup.
* AllocRunner destroy hooks implemented.
* Disk migration and blocking on a previous allocation exiting moved to
its own package to avoid cycles. Now only depends on alloc broadcaster
instead of also using a waitch.
* AllocBroadcaster now only drops stale allocations and always keeps the
latest version.
* Made AllocDir safe for concurrent use
Lots of internal contexts that are currently unused. Unsure if they
should be used or removed.
If an alloc dir is being GC'd (removed) during snapshotting the walk
func will be passed an error. Previously we didn't check for an error so
a panic would occur when we'd try to use a nil `fileInfo`.
interface has 3 implementations:
1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
Fixes#2478#2474#1995#2294
The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.
The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.
Drivers - other than qemu - now support an Exec method for executing
abritrary commands in a task's environment. This is used to implement
script checks.
Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
Fixes an issue where the Ruby runtime expects the sticky bit to be set
on the temp directory. The sticky bit is commonly set on the temp
directory since it is usually shared by many users. This change brings
ours in line with that assumption.
A change in the behavior of `os.Rename` in Go 1.8 brought to light a
difference in the logic between `{Alloc,Task}Runner` and this test:
AllocRunner builds the alloc dir, moves dirs if necessary, and then lets
TaskRunner call TaskDir.Build().
This test called `TaskDir.Build` *before* `AllocDir.Move`, so in Go 1.8
it failed to `os.Rename over` the empty {data,local} dirs.
I updated the test to behave like the real code, but I defensively added
`os.Remove` calls as a subtle change in call order shouldn't break this
code. `os.Remove` won't remove a non-empty directory, so it's still
safe.
Many thanks to @iverberk for the original PR (#1609), but we ended up
not wanting to ship this implementation with 0.5.
We'll come back to it after 0.5 and hopefully find a way to leverage
filesystem accounting and quotas, so we can skip the expensive polling.
* Added the alloc dir move
* Moving allocdirs when starting allocations
* Added the migrate flag to ephemeral disk
* Stopping migration if the allocation doesn't need migration any more
* Added the GetAllocDir method
* refactored code
* Added a test for alloc runner
* Incorporated review comments