d88b224248
* Update filesystem.mdx Update summary of alloc directory to include information on access differences between task drivers and filesystem isolation modes. Co-authored-by: Tim Gross <tim@0x74696d.com>
470 lines
16 KiB
Plaintext
470 lines
16 KiB
Plaintext
---
|
|
layout: docs
|
|
page_title: Filesystem
|
|
description: |-
|
|
Nomad creates an allocation working directory for every allocation. Learn what
|
|
goes into the working directory and how it interacts with Nomad task drivers.
|
|
---
|
|
|
|
# Filesystem
|
|
|
|
Nomad creates a working directory for each allocation on a client. This
|
|
directory can be found in the Nomad [`data_dir`] at
|
|
`./allocs/«alloc_id»`. The allocation working directory is where Nomad
|
|
creates task directories and directories shared between tasks, write logs for
|
|
tasks, and downloads artifacts or templates.
|
|
|
|
An allocation with two tasks (named `task1` and `task2`) will have an
|
|
allocation directory like the one below.
|
|
|
|
```shell-session
|
|
.
|
|
├── alloc
|
|
│ ├── data
|
|
│ ├── logs
|
|
│ │ ├── task1.stderr.0
|
|
│ │ ├── task1.stdout.0
|
|
│ │ ├── task2.stderr.0
|
|
│ │ └── task2.stdout.0
|
|
│ └── tmp
|
|
├── task1
|
|
│ ├── local
|
|
│ ├── secrets
|
|
│ └── tmp
|
|
└── task2
|
|
├── local
|
|
├── secrets
|
|
└── tmp
|
|
```
|
|
|
|
- **alloc/**: This directory is shared across all tasks in an allocation and
|
|
can be used to store data that needs to be used by multiple tasks, such as a
|
|
log shipper. This is the directory that's provided to the task as the
|
|
`NOMAD_ALLOC_DIR`. Note that this `alloc/` directory is not the same as the
|
|
"allocation working directory", which is the top-level directory. All tasks
|
|
in a task group can read and write to the `alloc/` directory. But the full host
|
|
path may differ depending on the task driver's [filesystem isolation mode], so
|
|
tasks should always used the `NOMAD_ALLOC_DIR` environment variable
|
|
to find this path rather than relying on the specific implementation of the
|
|
[`none`](#none-isolation), [`chroot`](#chroot-isolation), or [`image`](#image-isolation)
|
|
modes. Within the `alloc/` directory are three standard directories:
|
|
|
|
- **alloc/data/**: This directory is the location used by the
|
|
[`ephemeral_disk`] stanza for shared data.
|
|
|
|
- **alloc/logs/**: This directory is the location of the log files for every
|
|
task within an allocation. The `nomad alloc logs` command streams these
|
|
files to your terminal.
|
|
|
|
- **alloc/tmp/**: A temporary directory used as scratch space by task drivers.
|
|
|
|
- **«taskname»**: Each task has a **task working directory** with the same name as
|
|
the task. Tasks in a task group can't read each other's task working
|
|
directory. Depending on the task driver's [filesystem isolation mode], a
|
|
task may not be able to access the task working directory. Within the
|
|
`task/` directory are three standard directories:
|
|
|
|
- **«taskname»/local/**: This directory is the location provided to the task as the
|
|
`NOMAD_TASK_DIR`. Note this is not the same as the "task working
|
|
directory". This directory is private to the task.
|
|
|
|
- **«taskname»/secrets/**: This directory is the location provided to the task as
|
|
`NOMAD_SECRETS_DIR`. The contents of files in this directory cannot be read
|
|
by the `nomad alloc fs` command. It can be used to store secret data that
|
|
should not be visible outside the task.
|
|
|
|
- **«taskname»/tmp/**: A temporary directory used as scratch space by task drivers.
|
|
|
|
The allocation working directory is the directory you see when using the
|
|
`nomad alloc fs` command. If you were to run `nomad alloc fs` against the
|
|
allocation that made the working directory shown above, you'd see the
|
|
following:
|
|
|
|
```shell-session
|
|
$ nomad alloc fs c0b2245f
|
|
Mode Size Modified Time Name
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:39Z alloc/
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:32Z task1/
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:39Z task2/
|
|
|
|
$ nomad alloc fs c0b2245f alloc/
|
|
Mode Size Modified Time Name
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:32Z data/
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:39Z logs/
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:32Z tmp/
|
|
|
|
$ nomad alloc fs c0b2245f task1/
|
|
Mode Size Modified Time Name
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:00:33Z local/
|
|
drwxrwxrwx 60 B 2020-10-27T18:00:32Z secrets/
|
|
dtrwxrwxrwx 4.0 KiB 2020-10-27T18:00:32Z tmp/
|
|
```
|
|
|
|
## Task Drivers and Filesystem Isolation Modes
|
|
|
|
Depending on the task driver, the task's working directory may also be the
|
|
root directory for the running task. This is determined by the task driver's
|
|
[filesystem isolation capability].
|
|
|
|
### `image` isolation
|
|
|
|
Task drivers like `docker` or `qemu` use `image` isolation, where the task
|
|
driver isolates task filesystems as machine images. These filesystems are
|
|
owned by the task driver's external process and not by Nomad itself. These
|
|
filesystems will not typically be found anywhere in the allocation working
|
|
directory. For example, Docker containers will have their overlay filesystem
|
|
unpacked to `/var/run/docker/containerd/«container_id»` by default.
|
|
|
|
Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
|
|
`NOMAD_SECRETS_DIR` to tasks with `image` isolation, typically by
|
|
bind-mounting them to the task driver's filesystem.
|
|
|
|
You can see an example of `image` isolation by running the following minimal
|
|
job:
|
|
|
|
```hcl
|
|
job "example" {
|
|
datacenters = ["dc1"]
|
|
|
|
task "task1" {
|
|
driver = "docker"
|
|
|
|
config {
|
|
image = "redis:6.0"
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
If you look at the allocation working directory from the host, you'll see a
|
|
minimal filesystem tree:
|
|
|
|
```shell-session
|
|
.
|
|
├── alloc
|
|
│ ├── data
|
|
│ ├── logs
|
|
│ │ ├── task1.stderr.0
|
|
│ │ └── task1.stdout.0
|
|
│ └── tmp
|
|
└── task1
|
|
├── local
|
|
├── secrets
|
|
└── tmp
|
|
```
|
|
|
|
The `nomad alloc fs` command shows the same bare directory tree:
|
|
|
|
```shell-session
|
|
$ nomad alloc fs b0686b27
|
|
Mode Size Modified Time Name
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:51:54Z alloc/
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:51:54Z task1/
|
|
|
|
$ nomad alloc fs b0686b27 task1
|
|
Mode Size Modified Time Name
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T18:51:54Z local/
|
|
drwxrwxrwx 60 B 2020-10-27T18:51:54Z secrets/
|
|
dtrwxrwxrwx 4.0 KiB 2020-10-27T18:51:54Z tmp/
|
|
|
|
$ nomad alloc fs b0686b27 task1/local
|
|
Mode Size Modified Time Name
|
|
```
|
|
|
|
If you inspect the Docker container that's created, you'll see three
|
|
directories bind-mounted into the container:
|
|
|
|
```shell-session
|
|
$ docker inspect 32e | jq '.[0].HostConfig.Binds'
|
|
[
|
|
"/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/alloc:/alloc",
|
|
"/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/local:/local",
|
|
"/var/nomad/alloc/b0686b27-8af3-8252-028f-af485c81a8b3/task1/secrets:/secrets"
|
|
]
|
|
```
|
|
|
|
The root filesystem inside the container can see these three mounts, along
|
|
with the rest of the container filesystem:
|
|
|
|
```shell-session
|
|
$ docker exec -it 32e /bin/sh
|
|
# ls /
|
|
alloc boot dev home lib64 media opt root sbin srv tmp var
|
|
bin data etc lib local mnt proc run secrets sys usr
|
|
```
|
|
|
|
Note that because the three directories are bind-mounted into the container
|
|
filesystem, nothing written outside those three directories elsewhere in the
|
|
allocation working directory will be accessible inside the container. This
|
|
means templates, artifacts, and dispatch payloads for tasks with `image`
|
|
isolation must be written into the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, or
|
|
`NOMAD_SECRETS_DIR`.
|
|
|
|
To work around this limitation, you can use the task driver's mounting
|
|
capabilities to mount one of the three directories to another location in the
|
|
task. For example, with the Docker driver you can use the driver's `mounts`
|
|
block to bind a secret written by a `template` block to the
|
|
`NOMAD_SECRETS_DIR` into a configuration directory elsewhere in the task:
|
|
|
|
```hcl
|
|
job "example" {
|
|
datacenters = ["dc1"]
|
|
|
|
task "task1" {
|
|
driver = "docker"
|
|
|
|
config {
|
|
image = "redis:6.0"
|
|
mounts = [{
|
|
type = "bind"
|
|
source = "secrets"
|
|
target = "/etc/redis.d"
|
|
readonly = true
|
|
}]
|
|
|
|
template {
|
|
destination = "${NOMAD_SECRETS_DIR}/redis.conf"
|
|
data = <<EOT
|
|
{{ with secret "secrets/data/redispass" }}
|
|
requirepass {{- .Data.data.passwd -}}{{end}}
|
|
EOT
|
|
|
|
}
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
Note that relative mount source path are relative to the task working
|
|
directory, so to bind the `NOMAD_ALLOC_DIR` as a mount source, you will need
|
|
to use a relative path that traverses up into the allocation working directory
|
|
(ex. `source = "../alloc"`).
|
|
|
|
### `chroot` isolation
|
|
|
|
Task drivers like `exec` or `java` (on Linux) use `chroot` isolation, where
|
|
the task driver isolates task filesystems with `chroot` or `pivot_root`. These
|
|
isolated filesystems will be built inside the task working directory.
|
|
|
|
You can see an example of `chroot` isolation by running the following minimal
|
|
job on Linux:
|
|
|
|
```hcl
|
|
job "example" {
|
|
datacenters = ["dc1"]
|
|
|
|
task "task2" {
|
|
driver = "exec"
|
|
|
|
config {
|
|
command = "/bin/sh"
|
|
args = ["-c", "sleep 600"]
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
If you look at the allocation working directory from the host, you'll see a
|
|
filesystem tree that has been populated with the task driver's [chroot
|
|
contents], in addition to the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
|
|
`NOMAD_SECRETS_DIR`:
|
|
|
|
```shell-session
|
|
.
|
|
├── alloc
|
|
│ ├── container
|
|
│ ├── data
|
|
│ ├── logs
|
|
│ └── tmp
|
|
└── task2
|
|
├── alloc
|
|
├── bin
|
|
├── dev
|
|
├── etc
|
|
├── executor.out
|
|
├── lib
|
|
├── lib32
|
|
├── lib64
|
|
├── local
|
|
├── proc
|
|
├── run
|
|
├── sbin
|
|
├── secrets
|
|
├── sys
|
|
├── tmp
|
|
└── usr
|
|
```
|
|
|
|
Likewise, the root directory of the task is now available in the `nomad alloc fs` command output:
|
|
|
|
```shell-session
|
|
$ nomad alloc fs eebd13a7
|
|
Mode Size Modified Time Name
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T19:05:24Z alloc/
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T19:05:24Z task2/
|
|
|
|
$ nomad alloc fs eebd13a7 task2
|
|
Mode Size Modified Time Name
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T19:05:24Z alloc/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z bin/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:24Z dev/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z etc/
|
|
-rw-r--r-- 297 B 2020-10-27T19:05:24Z executor.out
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z lib/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z lib32/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z lib64/
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T19:05:22Z local/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:24Z proc/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z run/
|
|
drwxr-xr-x 12 KiB 2020-10-27T19:05:22Z sbin/
|
|
drwxrwxrwx 60 B 2020-10-27T19:05:22Z secrets/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:24Z sys/
|
|
dtrwxrwxrwx 4.0 KiB 2020-10-27T19:05:22Z tmp/
|
|
drwxr-xr-x 4.0 KiB 2020-10-27T19:05:22Z usr/
|
|
```
|
|
|
|
Nomad will provide the `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and
|
|
`NOMAD_SECRETS_DIR` to tasks with `chroot` isolation. But unlike with `image`
|
|
isolation, Nomad does not need to bind-mount the `NOMAD_TASK_DIR` directory
|
|
because it can be directly created inside the chroot.
|
|
|
|
```shell-session
|
|
$ nomad alloc exec eebd13a7 /bin/sh
|
|
$ mount
|
|
...
|
|
/dev/mapper/root on /alloc type ext4 (rw,relatime,errors=remount-ro,data=ordered)
|
|
tmpfs on /secrets type tmpfs (rw,noexec,relatime,size=1024k)
|
|
...
|
|
```
|
|
|
|
### `none` isolation
|
|
|
|
The `raw_exec` task driver (or the `java` task driver on Windows) uses the
|
|
`none` filesystem isolation mode. This means the task driver does not isolate
|
|
the filesystem for the task, and the task can read and write anywhere the
|
|
user that's running Nomad can.
|
|
|
|
You can see an example of `none` isolation by running the following minimal
|
|
`raw_exec` job on Linux or Unix.
|
|
|
|
```hcl
|
|
job "example" {
|
|
datacenters = ["dc1"]
|
|
|
|
task "task3" {
|
|
driver = "raw_exec"
|
|
|
|
config {
|
|
command = "/bin/sh"
|
|
args = ["-c", "sleep 600"]
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
If you look at the allocation working directory from the host, you'll see a
|
|
minimal filesystem tree:
|
|
|
|
```shell-session
|
|
.
|
|
├── alloc
|
|
│ ├── data
|
|
│ ├── logs
|
|
│ │ ├── task3.stderr.0
|
|
│ │ └── task3.stdout.0
|
|
│ └── tmp
|
|
└── task3
|
|
├── executor.out
|
|
├── local
|
|
├── secrets
|
|
└── tmp
|
|
```
|
|
|
|
The `nomad alloc fs` command shows the same bare directory tree:
|
|
|
|
```shell-session
|
|
$ nomad alloc fs 87ec7d12 task3
|
|
Mode Size Modified Time Name
|
|
-rw-r--r-- 140 B 2020-10-27T19:15:33Z executor.out
|
|
drwxrwxrwx 4.0 KiB 2020-10-27T19:15:33Z local/
|
|
drwxrwxrwx 60 B 2020-10-27T19:15:33Z secrets/
|
|
dtrwxrwxrwx 4.0 KiB 2020-10-27T19:15:33Z tmp/
|
|
```
|
|
|
|
But if you use `nomad alloc exec` to view the filesystem from inside the
|
|
container, you'll see that the task has access to the entire root
|
|
filesystem. The `NOMAD_ALLOC_DIR`, `NOMAD_TASK_DIR`, and `NOMAD_SECRETS_DIR`
|
|
point to the filepath on the host, not a path anchored in the task working
|
|
directory. And the task is running as `root`, because the Nomad client agent
|
|
is running as `root`. This is why the `raw_exec` driver is disabled by
|
|
default.
|
|
|
|
```shell-session
|
|
$ nomad alloc exec 87ec7d12 /bin/sh
|
|
# ls /
|
|
bin dev home lib lib64 lost+found mnt proc run snap sys usr vmlinuz
|
|
boot etc initrd.img lib32 libx32 media opt root sbin srv tmp var
|
|
|
|
# echo $NOMAD_SECRETS_DIR
|
|
/var/nomad/alloc/87ec7d12-5e35-8fba-96cc-09e5376be15a/task3/secrets
|
|
|
|
# whoami
|
|
root
|
|
```
|
|
|
|
## Templates, Artifacts, and Dispatch Payloads
|
|
|
|
The other contents of the allocation working directory depend on what features
|
|
the job specification uses. The allocation working directory is populated by
|
|
other features in a specific order:
|
|
|
|
- The allocation working directory is created.
|
|
- The ephemeral disk data is [migrated] from any previous allocation.
|
|
- [CSI volumes] are staged.
|
|
- Then, for each task:
|
|
- Task working directories are created.
|
|
- [Dispatch payloads] are written.
|
|
- [Artifacts] are downloaded.
|
|
- [Templates] are rendered.
|
|
- The task is started by the task driver, which includes all bind mounts and
|
|
[volume mounts].
|
|
|
|
Dispatch payloads, artifacts, and templates are written to the task working
|
|
directory before a task can start because the resulting files may be binary or
|
|
image run by the task. For example, an `artifact` can be used to download a
|
|
Docker image or .jar file, or a `template` can be used to render a shell
|
|
script that's run by `exec`.
|
|
|
|
The `artifact` and `template` blocks write their data to a destination
|
|
relative to the task working directory, not the `NOMAD_TASK_DIR`. For task
|
|
drivers with `image` filesystem isolation, this means the `destination` field
|
|
path should be prefixed with either `NOMAD_TASK_DIR` or
|
|
`NOMAD_SECRETS_DIR`. Otherwise, the file will not be visible from inside the
|
|
resulting container. (The `dispatch_payload` block always writes its data to
|
|
the `NOMAD_TASK_DIR`.)
|
|
|
|
For [CSI volumes], the client will stage the volume before setting up the task
|
|
working directory. Staging typically involves mounting the volume into the CSI
|
|
plugin's task directory, sending commands to the plugin to format the volume
|
|
as required, and making a volume claim to the Nomad server.
|
|
|
|
The behavior of the `volume_mount` block is controlled by the task driver. The
|
|
client builds a mount configuration describing the host volume or CSI volume
|
|
and passes it to the task driver to execute. Because the task driver mounts
|
|
the volume, it is not possible to have `artifact`, `template`, or
|
|
`dispatch_payload` blocks write to a volume.
|
|
|
|
[artifacts]: /docs/job-specification/artifact
|
|
[csi volumes]: /docs/internals/plugins/csi
|
|
[dispatch payloads]: /docs/job-specification/dispatch_payload
|
|
[templates]: /docs/job-specification/template
|
|
[`data_dir`]: /docs/configuration#data_dir
|
|
[`ephemeral_disk`]: /docs/job-specification/ephemeral_disk
|
|
[artifact]: /docs/job-specification/artifact
|
|
[chroot contents]: /docs/drivers/exec#chroot
|
|
[filesystem isolation capability]: /docs/internals/plugins/task-drivers#capabilities-capabilities-error
|
|
[filesystem isolation mode]: #task-drivers-and-filesystem-isolation-modes
|
|
[migrated]: /docs/job-specification/ephemeral_disk#migrate
|
|
[template]: /docs/job-specification/template
|
|
[volume mounts]: /docs/job-specification/volume_mount
|