We recently added documentation disambiguating the terminology of the allocation/task working directories. This changeset adds an internals document that describes in more detail exactly what goes into the allocation working directory, how this interacts with the filesystem isolation provided by task drivers, and how this interacts with features like `artifact` and `template`. Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>

---
layout: docs
page_title: Filesystem
sidebar_title: Filesystem
description: |-
  Nomad creates an allocation working directory for every allocation. Learn what
  goes into the working directory and how it interacts with Nomad task drivers.
---

# Filesystem

Nomad creates a working directory for each allocation on a client. This
directory can be found in the Nomad [`data_dir`] at
`./alloc/«alloc_id»`. The allocation working directory is where Nomad
creates task directories and directories shared between tasks, writes logs for
tasks, downloads artifacts, and renders templates.

An allocation with two tasks (named `task1` and `task2`) will have an
allocation directory like the one below.

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   ├── task1.stdout.0
│   │   ├── task2.stderr.0
│   │   └── task2.stdout.0
│   └── tmp
├── task1
│   ├── local
│   ├── secrets
│   └── tmp
└── task2
    ├── local
    ├── secrets
    └── tmp
```

- **alloc/**: This directory is shared across all tasks in an allocation and
  can be used to store data that needs to be used by multiple tasks, such as a
  log shipper. This is the directory that's provided to the task as the
  `NOMAD_ALLOC_DIR`. Note that this `alloc/` directory is not the same as the
  "allocation working directory", which is the top-level directory. All tasks
  in a task group can read and write to the `alloc/` directory. Within the
  `alloc/` directory are three standard directories:

  - **alloc/data/**: This directory is the location used by the
    [`ephemeral_disk`] stanza for shared data.

  - **alloc/logs/**: This directory is the location of the log files for every
    task within an allocation. The `nomad alloc logs` command streams these
    files to your terminal.

  - **alloc/tmp/**: A temporary directory used as scratch space by task drivers.

- **«taskname»**: Each task has a **task working directory** with the same name as
  the task. Tasks in a task group can't read each other's task working
  directory. Depending on the task driver's [filesystem isolation mode], a
  task may not be able to access the task working directory. Within the
  task working directory are three standard directories:

  - **«taskname»/local/**: This directory is the location provided to the task as the
    `NOMAD_TASK_DIR`. Note this is not the same as the "task working
    directory". This directory is private to the task.

  - **«taskname»/secrets/**: This directory is the location provided to the task as
    `NOMAD_SECRETS_DIR`. The contents of files in this directory cannot be read
    by the `nomad alloc fs` command. It can be used to store secret data that
    should not be visible outside the task.

  - **«taskname»/tmp/**: A temporary directory used as scratch space by task drivers.

The allocation working directory is the directory you see when using the
`nomad alloc fs` command. If you were to run `nomad alloc fs` against the
allocation that made the working directory shown above, you'd see the
following:

```shell-session
$ nomad alloc fs c0b2245f
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  task1/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  task2/

$ nomad alloc fs c0b2245f alloc/
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  data/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:39Z  logs/
drwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/

$ nomad alloc fs c0b2245f task1/
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:00:33Z  local/
drwxrwxrwx   60 B     2020-10-27T18:00:32Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T18:00:32Z  tmp/
```

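As a rough illustration of the shared `alloc/` directory described above, the sketch below has one task append to a file under `alloc/data/` while a second task in the same group reads it. The job, group, and task names and the `heartbeat.txt` file are hypothetical, and the `ephemeral_disk` values are only examples; treat this as a starting point under those assumptions rather than a recommended layout.

```hcl
job "shared-dir-example" {
  datacenters = ["dc1"]

  group "app" {
    # Backs the alloc/data/ directory. With sticky and migrate set, Nomad
    # makes a best-effort attempt to carry the data over to a replacement
    # allocation.
    ephemeral_disk {
      size    = 300 # MB
      sticky  = true
      migrate = true
    }

    # Hypothetical task that writes into the shared directory.
    task "writer" {
      driver = "exec"

      config {
        command = "/bin/sh"
        # $NOMAD_ALLOC_DIR is expanded by the shell inside the task.
        args = ["-c", "while true; do date >> $NOMAD_ALLOC_DIR/data/heartbeat.txt; sleep 5; done"]
      }
    }

    # Hypothetical task that reads what the other task wrote.
    task "reader" {
      driver = "exec"

      config {
        command = "/bin/sh"
        args    = ["-c", "touch $NOMAD_ALLOC_DIR/data/heartbeat.txt; tail -f $NOMAD_ALLOC_DIR/data/heartbeat.txt"]
      }
    }
  }
}
```

Both tasks see the same file because `NOMAD_ALLOC_DIR` refers to the one `alloc/` directory for the allocation, whereas anything written under `NOMAD_TASK_DIR` would stay private to the task that wrote it.
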
## Task Drivers and Filesystem Isolation Modes

Depending on the task driver, the task's working directory may also be the
root directory for the running task. This is determined by the task driver's
[filesystem isolation capability].

### `image` isolation

Task drivers like `docker` or `qemu` use `image` isolation, where the task
driver isolates task filesystems as machine images. These filesystems are
owned by the task driver's external process and not by Nomad itself. These
filesystems will not typically be found anywhere in the allocation working
directory. For example, Docker containers will have their overlay filesystem
unpacked to `/var/run/docker/containerd/«container_id»` by default.

Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR` to tasks with `image` isolation, typically by
bind-mounting them to the task driver's filesystem.

You can see an example of `image` isolation by running the following minimal
job:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
minimal filesystem tree:

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task1.stderr.0
│   │   └── task1.stdout.0
│   └── tmp
└── task1
    ├── local
    ├── secrets
    └── tmp
```

The `nomad alloc fs` command shows the same bare directory tree:

```shell-session
$ nomad alloc fs b0686b27
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  task1/

$ nomad alloc fs b0686b27 task1
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T18:51:54Z  local/
drwxrwxrwx   60 B     2020-10-27T18:51:54Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T18:51:54Z  tmp/

$ nomad alloc fs b0686b27 task1/local
Mode  Size  Modified Time  Name
```

If you inspect the Docker container that's created, you'll see three
directories bind-mounted into the container:

```shell-session
$ docker inspect 32e | jq '.[0].HostConfig.Binds'
[
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/alloc:/alloc",
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/local:/local",
  "/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/secrets:/secrets"
]
```

Inside the container, the root filesystem contains these three mounts along
with the rest of the container filesystem:

```shell-session
$ docker exec -it 32e /bin/sh
# ls /
alloc  boot  dev  home  lib64  media  opt   root  sbin     srv  tmp  var
bin    data  etc  lib   local  mnt    proc  run   secrets  sys  usr
```

Note that because only these three directories are bind-mounted into the
container filesystem, nothing written elsewhere in the allocation working
directory will be accessible inside the container. This means templates,
artifacts, and dispatch payloads for tasks with `image` isolation must be
written into the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, or `NOMAD_SECRETS_DIR`.

To work around this limitation, you can use the task driver's mounting
capabilities to mount one of the three directories to another location in the
task. For example, with the Docker driver you can use the driver's `mounts`
block to bind a secret written by a `template` block to the
`NOMAD_SECRETS_DIR` into a configuration directory elsewhere in the task:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task1" {
    driver = "docker"

    config {
      image = "redis:6.0"
      mounts = [{
        type     = "bind"
        source   = "secrets"
        target   = "/etc/redis.d"
        readonly = true
      }]
    }

    # The template block belongs to the task, not to the driver config.
    template {
      destination = "${NOMAD_SECRETS_DIR}/redis.conf"
      data        = <<EOT
{{ with secret "secrets/data/redispass" }}
requirepass {{ .Data.data.passwd }}{{ end }}
EOT
    }
  }
}
```

### `chroot` isolation

Task drivers like `exec` or `java` (on Linux) use `chroot` isolation, where
the task driver isolates task filesystems with `chroot` or `pivot_root`. These
isolated filesystems will be built inside the task working directory.

You can see an example of `chroot` isolation by running the following minimal
job on Linux:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task2" {
    driver = "exec"

    config {
      command = "/bin/sh"
      args    = ["-c", "sleep 600"]
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
filesystem tree that has been populated with the task driver's [chroot
contents], in addition to the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR`:

```shell-session
.
├── alloc
│   ├── container
│   ├── data
│   ├── logs
│   └── tmp
└── task2
    ├── alloc
    ├── bin
    ├── dev
    ├── etc
    ├── executor.out
    ├── lib
    ├── lib32
    ├── lib64
    ├── local
    ├── proc
    ├── run
    ├── sbin
    ├── secrets
    ├── sys
    ├── tmp
    └── usr
```

Likewise, the root directory of the task is now available in the
`nomad alloc fs` command output:

```shell-session
$ nomad alloc fs eebd13a7
Mode        Size     Modified Time         Name
drwxrwxrwx  4.0 KiB  2020-10-27T19:05:24Z  alloc/
drwxrwxrwx  4.0 KiB  2020-10-27T19:05:24Z  task2/

$ nomad alloc fs eebd13a7 task2
Mode         Size     Modified Time         Name
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:24Z  alloc/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  bin/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  dev/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  etc/
-rw-r--r--   297 B    2020-10-27T19:05:24Z  executor.out
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib32/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  lib64/
drwxrwxrwx   4.0 KiB  2020-10-27T19:05:22Z  local/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  proc/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  run/
drwxr-xr-x   12 KiB   2020-10-27T19:05:22Z  sbin/
drwxrwxrwx   60 B     2020-10-27T19:05:22Z  secrets/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:24Z  sys/
dtrwxrwxrwx  4.0 KiB  2020-10-27T19:05:22Z  tmp/
drwxr-xr-x   4.0 KiB  2020-10-27T19:05:22Z  usr/
```

Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
`NOMAD_SECRETS_DIR` to tasks with `chroot` isolation. But unlike with `image`
isolation, Nomad does not need to bind-mount the `NOMAD_TASK_DIR` directory
because it can be directly created inside the chroot. The `NOMAD_ALLOC_DIR`
and `NOMAD_SECRETS_DIR` still appear as mounts inside the task:

```shell-session
$ nomad alloc exec eebd13a7 /bin/sh
$ mount
...
/dev/mapper/root on /alloc type ext4 (rw,relatime,errors=remount-ro,data=ordered)
tmpfs on /secrets type tmpfs (rw,noexec,relatime,size=1024k)
...
```

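The directories that populate the chroot come from the Nomad client, not from the job. As a hedged sketch, a client agent configuration can override the default set with a `chroot_env` block, where each key is a path on the host and each value is the path inside the chroot. The specific paths below are illustrative assumptions only; a real task usually needs more than this (for example, the dynamic linker cache), so check the [chroot contents] documentation before trimming the defaults.

```hcl
# Nomad client agent configuration (not a job specification).
client {
  enabled = true

  # Assumption: a reduced chroot chosen for illustration. Specifying
  # chroot_env replaces the driver's default mapping.
  chroot_env {
    "/bin"        = "/bin"
    "/etc/passwd" = "/etc/passwd"
    "/lib"        = "/lib"
    "/lib64"      = "/lib64"
    "/usr"        = "/usr"
  }
}
```
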
### `none` isolation

The `raw_exec` task driver (or the `java` task driver on Windows) uses the
`none` filesystem isolation mode. This means the task driver does not isolate
the filesystem for the task, and the task can read and write anywhere the
user that's running Nomad can.

You can see an example of `none` isolation by running the following minimal
`raw_exec` job on Linux or Unix:

```hcl
job "example" {
  datacenters = ["dc1"]

  task "task3" {
    driver = "raw_exec"

    config {
      command = "/bin/sh"
      args    = ["-c", "sleep 600"]
    }
  }
}
```

If you look at the allocation working directory from the host, you'll see a
minimal filesystem tree:

```shell-session
.
├── alloc
│   ├── data
│   ├── logs
│   │   ├── task3.stderr.0
│   │   └── task3.stdout.0
│   └── tmp
└── task3
    ├── executor.out
    ├── local
    ├── secrets
    └── tmp
```

The `nomad alloc fs` command shows the same bare directory tree:

```shell-session
$ nomad alloc fs 87ec7d12 task3
Mode         Size     Modified Time         Name
-rw-r--r--   140 B    2020-10-27T19:15:33Z  executor.out
drwxrwxrwx   4.0 KiB  2020-10-27T19:15:33Z  local/
drwxrwxrwx   60 B     2020-10-27T19:15:33Z  secrets/
dtrwxrwxrwx  4.0 KiB  2020-10-27T19:15:33Z  tmp/
```

But if you use `nomad alloc exec` to view the filesystem from inside the
task, you'll see that the task has access to the entire root filesystem. The
`NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and `NOMAD_SECRETS_DIR` point to the file
path on the host, not a path anchored in the task working directory. And the
task is running as `root`, because the Nomad client agent is running as
`root`. This is why the `raw_exec` driver is disabled by default.

```shell-session
$ nomad alloc exec 87ec7d12 /bin/sh
# ls /
bin   dev  home        lib    lib64   lost+found  mnt  proc  run   snap  sys  usr  vmlinuz
boot  etc  initrd.img  lib32  libx32  media       opt  root  sbin  srv   tmp  var

# echo $NOMAD_SECRETS_DIR
/var/nomad/alloc/87ec7d12-5e35-8fba-96cc-09e5376be15a/task3/secrets

# whoami
root
```

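Because `raw_exec` is disabled by default, the job above will only be placed on clients that have opted in. A minimal sketch of that opt-in, assuming you accept the lack of isolation described above, is a `plugin` block in the client agent configuration:

```hcl
# Nomad client agent configuration (not a job specification).
# Enabling raw_exec lets tasks run with the same privileges as the
# Nomad client agent, so enable it only where that is acceptable.
plugin "raw_exec" {
  config {
    enabled = true
  }
}
```
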
## Templates, Artifacts, and Dispatch Payloads

The other contents of the allocation working directory depend on what features
the job specification uses. These features populate the allocation working
directory in a specific order:

* The allocation working directory is created.
* The ephemeral disk data is [migrated] from any previous allocation.
* [CSI volumes] are staged.
* Then, for each task:
  * Task working directories are created.
  * [Dispatch payloads] are written.
  * [Artifacts] are downloaded.
  * [Templates] are rendered.
  * The task is started by the task driver, which includes all bind mounts and
    [volume mounts].

Dispatch payloads, artifacts, and templates are written to the task working
directory before a task can start because the resulting files may be the
binary or image run by the task. For example, an `artifact` can be used to
download a Docker image or .jar file, or a `template` can be used to render a
shell script that's run by `exec`.

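To make the ordering concrete, here is a small sketch of the last case: a `template` renders a shell script, and the `exec` driver then runs it. This only works because templates are rendered before the task starts. The job, group, and task names and the script contents are hypothetical.

```hcl
job "script-example" {
  datacenters = ["dc1"]

  group "app" {
    task "runner" {
      driver = "exec"

      # Rendered into the task working directory before the driver
      # starts the task, so the script exists by the time it is run.
      template {
        destination = "${NOMAD_TASK_DIR}/run.sh"
        perms       = "755"
        data        = <<EOT
#!/bin/sh
echo "script rendered before task start"
exec sleep 600
EOT
      }

      config {
        command = "${NOMAD_TASK_DIR}/run.sh"
      }
    }
  }
}
```
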
The `artifact` and `template` blocks write their data to a destination
relative to the task working directory, not the `NOMAD_TASK_DIR`. For task
drivers with `image` filesystem isolation, this means the `destination` field
path should be prefixed with either `NOMAD_TASK_DIR` or
`NOMAD_SECRETS_DIR`. Otherwise, the file will not be visible from inside the
resulting container. (The `dispatch_payload` block always writes its data to
the `NOMAD_TASK_DIR`.)

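The following sketch shows both destination styles for a task with `image` isolation. The artifact URL, file names, and the placeholder secret value are assumptions made up for illustration; only the `destination` prefixes are the point.

```hcl
job "destinations-example" {
  datacenters = ["dc1"]

  group "cache" {
    task "redis" {
      driver = "docker"

      config {
        image = "redis:6.0"
      }

      # Written under the task's local/ directory on the host, which is
      # one of the three directories mounted into the container.
      artifact {
        source      = "https://example.com/redis-extras.tar.gz" # hypothetical URL
        destination = "${NOMAD_TASK_DIR}/extras"
      }

      # Written under the task's secrets/ directory, also mounted into
      # the container.
      template {
        destination = "${NOMAD_SECRETS_DIR}/app.env"
        data        = <<EOT
API_TOKEN=replace-me
EOT
      }
    }
  }
}
```

A destination such as `etc/app.env` would instead land directly in the task working directory on the host, outside the three mounted directories, so a Docker task would not see it.
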
For [CSI volumes], the client will stage the volume before setting up the task
working directory. Staging typically involves mounting the volume into the CSI
plugin's task directory, sending commands to the plugin to format the volume
as required, and making a volume claim to the Nomad server.

The behavior of the `volume_mount` block is controlled by the task driver. The
client builds a mount configuration describing the host volume or CSI volume
and passes it to the task driver to execute. Because the task driver mounts
the volume, it is not possible to have `artifact`, `template`, or
`dispatch_payload` blocks write to a volume.

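As a sketch of how a `volume_mount` fits into a job, the example below requests a host volume at the group level and mounts it into a task. It assumes a host volume named `redis-data` has already been configured on the client; the job, group, task, and volume names are hypothetical.

```hcl
job "volume-example" {
  datacenters = ["dc1"]

  group "db" {
    # Assumption: the client agent defines a host volume named "redis-data".
    volume "data" {
      type      = "host"
      source    = "redis-data"
      read_only = false
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:6.0"
      }

      # The task driver performs this mount when it starts the task, which
      # is after artifacts and templates have already been written.
      volume_mount {
        volume      = "data"
        destination = "/data"
      }
    }
  }
}
```
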
[Artifacts]: /docs/job-specification/artifact
[CSI volumes]: /docs/internals/plugins/csi
[Dispatch payloads]: /docs/job-specification/dispatch_payload
[Templates]: /docs/job-specification/template
[`data_dir`]: /docs/configuration#data_dir
[`ephemeral_disk`]: /docs/job-specification/ephemeral_disk
[artifact]: /docs/job-specification/artifact
[chroot contents]: /docs/drivers/exec#chroot
[filesystem isolation capability]: /docs/internals/plugins/task-drivers#capabilities-capabilities-error
[filesystem isolation mode]: #task-drivers-and-filesystem-isolation-modes
[migrated]: /docs/job-specification/ephemeral_disk#migrate
[template]: /docs/job-specification/template
[volume mounts]: /docs/job-specification/volume_mount