open-nomad/website/content/docs/job-specification/artifact.mdx
Michael Schurter 2965dc6a1a
artifact: fix numerous go-getter security issues
Fix numerous go-getter security issues:

- Add timeouts to http, git, and hg operations to prevent DoS
- Add size limit to http to prevent resource exhaustion
- Disable following symlinks in both artifacts and `job run`
- Stop performing initial HEAD request to avoid file corruption on
  retries and DoS opportunities.

**Approach**

Since Nomad has no ability to differentiate a DoS-via-large-artifact vs
a legitimate workload, all of the new limits are configurable at the
client agent level.

The max size of HTTP downloads is also exposed as a node attribute so
that if some workloads have large artifacts they can specify a high
limit in their jobspecs.

In the future all of this plumbing could be extended to enable/disable
specific getters or artifact downloading entirely on a per-node basis.
2022-05-24 16:29:39 -04:00

256 lines
8 KiB
Plaintext

---
layout: docs
page_title: artifact Stanza - Job Specification
description: |-
The "artifact" stanza instructs Nomad to fetch and unpack a remote resource,
such as a file, tarball, or binary, and permits downloading artifacts from a
variety of locations using a URL as the input source.
---
# `artifact` Stanza
<Placement groups={['job', 'group', 'task', 'artifact']} />
The `artifact` stanza instructs Nomad to fetch and unpack a remote resource,
such as a file, tarball, or binary. Nomad downloads artifacts using the popular
[`go-getter`][go-getter] library, which permits downloading artifacts from a
variety of locations using a URL as the input source.
```hcl
job "docs" {
group "example" {
task "server" {
artifact {
source = "https://example.com/file.tar.gz"
destination = "local/some-directory"
options {
checksum = "md5:df6a4178aec9fbdc1d6d7e3634d1bc33"
}
}
}
}
}
```
Nomad supports downloading `http`, `https`, `git`, `hg` and `S3` artifacts. If
these artifacts are archived (`zip`, `tgz`, `bz2`, `xz`), they are
automatically unarchived before the starting the task.
## `artifact` Parameters
- `destination` `(string: "local/")` - Specifies the directory path to
download the artifact, relative to the root of the [task's working
directory]. If omitted, the default value is to place the artifact in
`local/`. The destination is treated as a directory unless `mode` is set to
`file`. Source files will be downloaded into that directory path. For more
details on how the `destination` interacts with task drivers, see the
[Filesystem internals] documentation.
- `mode` `(string: "any")` - One of `any`, `file`, or `dir`. If set to `file`
the `destination` must be a file, not a directory. By default the
`destination` will be `local/<filename>`.
- `options` `(map<string|string>: nil)` - Specifies configuration parameters to
fetch the artifact. The key-value pairs map directly to parameters appended to
the supplied `source` URL. Please see the [`go-getter`
documentation][go-getter] for a complete list of options and examples.
- `headers` `(map<string|string>: nil)` - Specifies HTTP headers to set when
fetching the artifact using `http` or `https` protocol. Please see the
[`go-getter` headers documentation][go-getter-headers] for more information.
- `source` `(string: <required>)` - Specifies the URL of the artifact to download.
See [`go-getter`][go-getter] for details.
## Operation Limits
The client [`artifact`][client_artifact] configuration can set limits to
specific artifact operations to prevent excessive data download or operation
time.
If a task's `artifact` retrieval exceeds one of those limits, the task will be
interrupted and fail to start. Refer to the task events for more information.
## `artifact` Examples
The following examples only show the `artifact` stanzas. Remember that the
`artifact` stanza is only valid in the placements listed above.
### Download File
This example downloads the artifact from the provided URL and places it in
`local/file.txt`. The `local/` path is relative to the [task's working
directory].
```hcl
artifact {
source = "https://example.com/file.txt"
}
```
To set HTTP headers in the request for the source the optional `headers` field
can be configured.
```hcl
artifact {
source = "https://example.com/file.txt"
headers {
User-Agent = "nomad-[${NOMAD_JOB_ID}]-[${NOMAD_GROUP_NAME}]-[${NOMAD_TASK_NAME}]"
X-Nomad-Alloc = "${NOMAD_ALLOC_ID}"
}
}
```
To use HTTP basic authentication, preprend the username and password to the
hostname in the URL. All special characters, including the username and
password, must be URL encoded. For example, for a username `exampleUser` and
the password `pass/word!`:
```hcl
artifact {
source = "https://exampleUser:pass%2Fword%21@example.com/file.txt"
}
```
### Download using git
This example downloads the artifact from the provided GitHub URL and places it at
`local/repo`, as specified by the optional `destination` parameter.
```hcl
artifact {
source = "git::https://github.com/hashicorp/nomad-guides"
destination = "local/repo"
}
```
To download from a private repo, sshkey needs to be set. The key must be
base64-encoded string. On Linux, you can run `base64 -w0 <file>` to encode the
file. Or use [HCL2](https://www.nomadproject.io/docs/job-specification/hcl2)
expressions to read and encode the key from a file on your machine:
```hcl
artifact {
# The git:: prefix forces go-getter's protocol detection to use the git ssh
# protocol. It can also automatically detect the protocol from the domain of
# some git hosting providers (such as GitHub) without the prefix.
source = "git::git@bitbucket.org:example/nomad-examples"
destination = "local/repo"
options {
# Make sure that the Nomad user's known hosts file is populated:
# ssh-keyscan github.com | sudo tee -a /root/.ssh/known_hosts
# https://github.com/hashicorp/go-getter/issues/55
sshkey = "${base64encode(file(pathexpand("~/.ssh/id_rsa")))}"
}
}
```
To clone specific refs or at a specific depth, use the `ref` and `depth`
options:
```hcl
artifact {
source = "git::https://github.com/hashicorp/nomad-guides"
destination = "local/repo"
options {
ref = "main"
depth = 1
}
}
```
### Download and Unarchive
This example downloads and unarchives the result in `local/file`. Because the
source URL is an archive extension, Nomad will automatically decompress it:
```hcl
artifact {
source = "https://example.com/file.tar.gz"
}
```
To disable automatic unarchiving, set the `archive` option to false:
```hcl
artifact {
source = "https://example.com/file.tar.gz"
options {
archive = false
}
}
```
### Download and Verify Checksums
This example downloads an artifact and verifies the resulting artifact's
checksum before proceeding. If the checksum is invalid, an error will be
returned.
```hcl
artifact {
source = "https://example.com/file.zip"
options {
checksum = "md5:df6a4178aec9fbdc1d6d7e3634d1bc33"
}
}
```
### Download from an S3-compatible Bucket
These examples download artifacts from Amazon S3. There are several different
types of [S3 bucket addressing][s3-bucket-addr] and [S3 region-specific
endpoints][s3-region-endpoints]. As of Nomad 0.6 non-Amazon S3-compatible
endpoints like [Minio] are supported, but you must explicitly set the "s3::"
prefix.
This example uses path-based notation on a publicly-accessible bucket:
```hcl
artifact {
source = "https://my-bucket-example.s3-us-west-2.amazonaws.com/my_app.tar.gz"
}
```
If a bucket requires authentication, you can avoid the use of credentials by
using [EC2 IAM instance profiles][iam-instance-profiles]. If this is not possible,
credentials may be supplied via the `options` parameter:
```hcl
artifact {
options {
aws_access_key_id = "<id>"
aws_access_key_secret = "<secret>"
aws_access_token = "<token>"
}
}
```
To force the S3-specific syntax, use the `s3::` prefix:
```hcl
artifact {
source = "s3::https://my-bucket-example.s3-eu-west-1.amazonaws.com/my_app.tar.gz"
}
```
Alternatively you can use virtual hosted style:
```hcl
artifact {
source = "https://my-bucket-example.s3-eu-west-1.amazonaws.com/my_app.tar.gz"
}
```
[client_artifact]: /docs/configuration/client#artifact-parameters
[go-getter]: https://github.com/hashicorp/go-getter 'HashiCorp go-getter Library'
[go-getter-headers]: https://github.com/hashicorp/go-getter#headers 'HashiCorp go-getter Headers'
[minio]: https://www.minio.io/
[s3-bucket-addr]: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html#access-bucket-intro 'Amazon S3 Bucket Addressing'
[s3-region-endpoints]: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region 'Amazon S3 Region Endpoints'
[iam-instance-profiles]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html 'EC2 IAM instance profiles'
[task's working directory]: /docs/runtime/environment#task-directories 'Task Directories'
[filesystem internals]: /docs/internals/filesystem#templates-artifacts-and-dispatch-payloads