Revert "Merge pull request #7322 from hashicorp/docs-fix-guide-links"
This reverts commit 4311f5e95657a2eb7b231daf326af252e6c75ae7, reversing changes made to 5d5469e6facfc4ab59235d5532664bb95a597728.
parent 2e95097106
commit 77e6ad8867
@@ -6,28 +6,6 @@
// the landing page for the category

export default [
  { category: 'quickstart' },
  {
    category: 'install',
    content: [
      {
        category: 'production',
        content: [
          'requirements',
          'nomad-agent',
          'reference-architecture',
          'deployment-guide'
        ]
      },
      'windows-service'
    ]
  },
  { category: 'upgrade', content: ['upgrade-specific'] },
  {
    category: 'integrations',
    content: ['consul-integration', 'consul-connect', 'vault-integration']
  },
  '-----------',
  {
    category: 'internals',
    content: [
@@ -8,7 +8,7 @@ description: The /acl/policy endpoints are used to configure and manage ACL policies

# ACL Policies HTTP API

The `/acl/policies` and `/acl/policy/` endpoints are used to manage ACL policies.
For more details about ACLs, please see the [ACL Guide](https://learn.hashicorp.com/nomad/acls/fundamentals).
For more details about ACLs, please see the [ACL Guide](/guides/security/acl).

## List Policies
@@ -8,13 +8,13 @@ description: The /acl/token/ endpoints are used to configure and manage ACL tokens

# ACL Tokens HTTP API

The `/acl/bootstrap`, `/acl/tokens`, and `/acl/token/` endpoints are used to manage ACL tokens.
For more details about ACLs, please see the [ACL Guide](https://learn.hashicorp.com/nomad/acls/fundamentals).
For more details about ACLs, please see the [ACL Guide](/guides/security/acl).

## Bootstrap Token

This endpoint is used to bootstrap the ACL system and provide the initial management token.
This request is always forwarded to the authoritative region. It can only be invoked once
until a [bootstrap reset](https://learn.hashicorp.com/nomad/acls/bootstrap#re-bootstrap-acl-system) is performed.
until a [bootstrap reset](/guides/security/acl#reseting-acl-bootstrap) is performed.

| Method | Path | Produces |
| ------ | ---------------- | ------------------ |
@@ -76,7 +76,7 @@ administration.

Several endpoints in Nomad use or require ACL tokens to operate. The tokens are used to authenticate the request and determine if the request is allowed based on the associated authorizations. Tokens are specified per-request by using the `X-Nomad-Token` request header set to the `SecretID` of an ACL Token.

For more details about ACLs, please see the [ACL Guide](https://learn.hashicorp.com/nomad/acls/fundamentals).
For more details about ACLs, please see the [ACL Guide](/guides/security/acl).

## Authentication
@@ -396,7 +396,7 @@ The `Task` object supports the following keys:

Consul for service discovery. A `Service` object represents a routable and
discoverable service on the network. Nomad automatically registers it when a task
is started and de-registers it when the task transitions to the dead state.
[Click here](/docs/integrations/consul-integration#service-discovery) to learn more about
[Click here](/guides/integrations/consul-integration#service-discovery) to learn more about
services. Below are the fields in the `Service` object:

- `Name`: An explicit name for the Service. Nomad will replace `${JOB}`,
@@ -825,7 +825,7 @@ $ curl \

This endpoint toggles the drain mode of the node. When draining is enabled, no
further allocations will be assigned to this node, and existing allocations will
be migrated to new nodes. See the [Workload Migration
Guide](https://learn.hashicorp.com/nomad/operating-nomad/node-draining) for suggested usage.
Guide](/guides/operations/node-draining) for suggested usage.

| Method | Path | Produces |
| ------ | ------------------------- | ------------------ |
@@ -15,7 +15,7 @@ as interacting with the Raft subsystem.

~> Use this interface with extreme caution, as improper use could lead to a
Nomad outage and even loss of data.

See the [Outage Recovery](https://learn.hashicorp.com/nomad/operating-nomad/outage) guide for some examples of how
See the [Outage Recovery](/guides/operations/outage) guide for some examples of how
these capabilities are used. For a CLI to perform these operations manually,
please see the documentation for the
[`nomad operator`](/docs/commands/operator) command.
@@ -12,7 +12,7 @@ description: >-

The `/sentinel/policies` and `/sentinel/policy/` endpoints are used to manage Sentinel policies.
For more details about Sentinel policies, please see the [Sentinel Policy Guide](https://learn.hashicorp.com/nomad/governance-and-policy/sentinel).

Sentinel endpoints are only available when ACLs are enabled. For more details about ACLs, please see the [ACL Guide](https://learn.hashicorp.com/nomad/acls/fundamentals).
Sentinel endpoints are only available when ACLs are enabled. For more details about ACLs, please see the [ACL Guide](/guides/security/acl).

~> **Enterprise Only!** This API endpoint and functionality only exists in
Nomad Enterprise. This is not present in the open source version of Nomad.
@@ -197,7 +197,7 @@ via CLI arguments. The `agent` command accepts the following arguments:

[data_dir]: /docs/configuration#data_dir
[datacenter]: #datacenter
[enabled]: /docs/configuration/acl#enabled
[encryption overview]: https://learn.hashicorp.com/nomad/transport-security/concepts
[encryption overview]: /guides/security/encryption
[key_file]: /docs/configuration/consul#key_file
[log_json]: /docs/configuration#log_json
[log_level]: /docs/configuration#log_level

@@ -206,7 +206,7 @@ via CLI arguments. The `agent` command accepts the following arguments:

[network_interface]: #network_interface
[network_speed]: #network_speed
[node_class]: #node_class
[nomad agent]: /docs/install/production/nomad-agent
[nomad agent]: /guides/install/production/nomad-agent
[plugin_dir]: /docs/configuration#plugin_dir
[region]: #region
[rejoin_after_leave]: #rejoin_after_leave
@@ -134,4 +134,4 @@ $ nomad node drain -self -monitor

[eligibility]: /docs/commands/node/eligibility
[migrate]: /docs/job-specification/migrate
[node status]: /docs/commands/node/status
[workload migration guide]: https://learn.hashicorp.com/nomad/operating-nomad/node-draining
[workload migration guide]: /guides/operations/node-draining
@@ -63,4 +63,4 @@ UpgradeMigrationTag = ""

version info when performing upgrade migrations. If left blank, the Nomad
version will be used.

[autopilot guide]: https://learn.hashicorp.com/nomad/operating-nomad/autopilot
[autopilot guide]: /guides/operations/autopilot
@@ -60,4 +60,4 @@ The return code will indicate success or failure.

[`redundancy_zone`]: /docs/configuration/server#redundancy_zone
[`upgrade_version`]: /docs/configuration/server#upgrade_version
[autopilot guide]: https://learn.hashicorp.com/nomad/operating-nomad/autopilot
[autopilot guide]: /guides/operations/autopilot
@@ -46,6 +46,6 @@ The following subcommands are available:

[keyring]: /docs/commands/operator/keyring 'Manages gossip layer encryption keys'
[list]: /docs/commands/operator/raft-list-peers 'Raft List Peers command'
[operator]: /api/operator 'Operator API documentation'
[outage recovery guide]: https://learn.hashicorp.com/nomad/operating-nomad/outage
[outage recovery guide]: /guides/operations/outage
[remove]: /docs/commands/operator/raft-remove-peer 'Raft Remove Peer command'
[set-config]: /docs/commands/operator/autopilot-set-config 'Autopilot Set Config command'
@@ -59,4 +59,4 @@ nomad-server03.global 10.10.11.7:4647 10.10.11.7:4647 follower true

configuration. Future versions of Nomad may add support for non-voting servers.

[operator]: /api/operator
[outage recovery]: https://learn.hashicorp.com/nomad/operating-nomad/outage
[outage recovery]: /guides/operations/outage
@@ -42,4 +42,4 @@ nomad operator raft remove-peer [options]

[`nomad server force-leave`]: /docs/commands/server/force-leave 'Nomad server force-leave command'
[`nomad server members`]: /docs/commands/server/members 'Nomad server members command'
[operator]: /api/operator 'Nomad Operator API'
[outage recovery]: https://learn.hashicorp.com/nomad/operating-nomad/outage
[outage recovery]: /guides/operations/outage
@@ -12,7 +12,7 @@ description: >-

<Placement groups={['autopilot']} />

The `autopilot` stanza configures the Nomad agent to configure Autopilot behavior.
For more information about Autopilot, see the [Autopilot Guide].
For more information about Autopilot, see the [Autopilot Guide](/guides/operations/autopilot).

```hcl
autopilot {
@@ -56,5 +56,3 @@ autopilot {

- `enable_custom_upgrades` `(bool: false)` - (Enterprise-only) Specifies whether to
  enable using custom upgrade versions when performing migrations, in conjunction with
  the [upgrade_version](/docs/configuration/server#upgrade_version) parameter.

[Autopilot Guide]: https://learn.hashicorp.com/nomad/operating-nomad/autopilot
@@ -144,12 +144,9 @@ consul {

### Custom Address and Port

This example shows pointing the Nomad agent at a different Consul address.

~> **Note**: You should **never** point directly at a Consul server; always
point to a local client.

In this example, the Consul server is bound and listening on the
This example shows pointing the Nomad agent at a different Consul address. Note
that you should **never** point directly at a Consul server; always point to a
local client. In this example, the Consul server is bound and listening on the
node's private IP address instead of localhost, so we use that:

```hcl

@@ -174,4 +171,4 @@ consul {

```

[consul]: https://www.consul.io/ 'Consul by HashiCorp'
[bootstrap]: https://learn.hashicorp.com/nomad/operating-nomad/clustering 'Automatic Bootstrapping'
[bootstrap]: /guides/operations/cluster/automatic 'Automatic Bootstrapping'
@@ -30,11 +30,10 @@ server {

## `server` Parameters

- `authoritative_region` `(string: "")` - Specifies the authoritative region,
  which provides a single source of truth for global configurations such as ACL
  Policies and global ACL tokens. Non-authoritative regions will replicate from
  the authoritative to act as a mirror. By default, the local region is assumed
  to be authoritative.
- `authoritative_region` `(string: "")` - Specifies the authoritative region, which
  provides a single source of truth for global configurations such as ACL Policies and
  global ACL tokens. Non-authoritative regions will replicate from the authoritative
  to act as a mirror. By default, the local region is assumed to be authoritative.

- `bootstrap_expect` `(int: required)` - Specifies the number of server nodes to
  wait for before bootstrapping. It is most common to use the odd-numbered
@@ -73,11 +72,11 @@ server {

- `job_gc_interval` `(string: "5m")` - Specifies the interval between the job
  garbage collections. Only jobs that have been terminal for at least
  `job_gc_threshold` will be collected. Lowering the interval will perform more
  frequent but smaller collections. Raising the interval will perform
  collections less frequently but collect more jobs at a time. Reducing this
  interval is useful if there is a large throughput of tasks, leading to a large
  set of dead jobs. This is specified using a label suffix like "30s" or "3m".
  `job_gc_interval` was introduced in Nomad 0.10.0.
  frequent but smaller collections. Raising the interval will perform collections
  less frequently but collect more jobs at a time. Reducing this interval is
  useful if there is a large throughput of tasks, leading to a large set of
  dead jobs. This is specified using a label suffix like "30s" or "3m". `job_gc_interval`
  was introduced in Nomad 0.10.0.

- `job_gc_threshold` `(string: "4h")` - Specifies the minimum time a job must be
  in the terminal state before it is eligible for garbage collection. This is
@@ -91,12 +90,12 @@ server {

  deployment must be in the terminal state before it is eligible for garbage
  collection. This is specified using a label suffix like "30s" or "1h".

- `default_scheduler_config` <code>([scheduler_configuration][Update scheduler
  config]: nil)</code> - Specifies the initial default scheduler config when
  bootstrapping cluster. The parameter is ignored once the cluster is
  bootstrapped or value is updated through the [API endpoint][[Update scheduler
  config]]. See [the example section](#configuring-scheduler-config) for more
  details `default_scheduler_config` was introduced in Nomad 0.11.4.
- `default_scheduler_config` <code>([scheduler_configuration][update-scheduler-config]:
  nil)</code> - Specifies the initial default scheduler config when
  bootstrapping cluster. The parameter is ignored once the cluster is bootstrapped or
  value is updated through the [API endpoint][update-scheduler-config]. See [the
  example section](#configuring-scheduler-config) for more details
  `default_scheduler_config` was introduced in Nomad 0.11.4.

- `heartbeat_grace` `(string: "10s")` - Specifies the additional time given as a
  grace period beyond the heartbeat TTL of nodes to account for network and
@@ -136,7 +135,7 @@ server {

- `redundancy_zone` `(string: "")` - (Enterprise-only) Specifies the redundancy
  zone that this server will be a part of for Autopilot management. For more
  information, see the [Autopilot Guide].
  information, see the [Autopilot Guide](/guides/operations/autopilot).

- `rejoin_after_leave` `(bool: false)` - Specifies if Nomad will ignore a
  previous leave and attempt to rejoin the cluster when starting. By default,
@@ -144,15 +143,14 @@ server {

  cluster again when starting. This flag allows the previous state to be used to
  rejoin the cluster.

- `server_join` <code>([server_join][server-join]: nil)</code> - Specifies how
  the Nomad server will connect to other Nomad servers. The `retry_join` fields
  may directly specify the server address or use go-discover syntax for
  auto-discovery. See the [server_join documentation][server-join] for more
  detail.
- `server_join` <code>([server_join][server-join]: nil)</code> - Specifies
  how the Nomad server will connect to other Nomad servers. The `retry_join`
  fields may directly specify the server address or use go-discover syntax for
  auto-discovery. See the [server_join documentation][server-join] for more detail.

- `upgrade_version` `(string: "")` - A custom version of the format X.Y.Z to use
  in place of the Nomad version when custom upgrades are enabled in Autopilot.
  For more information, see the [Autopilot Guide].
  For more information, see the [Autopilot Guide](/guides/operations/autopilot).

### Deprecated Parameters
@@ -163,9 +161,9 @@ server {

  succeeds. After one succeeds, no further addresses will be contacted. This is
  useful for cases where we know the address will become available eventually.
  Use `retry_join` with an array as a replacement for `start_join`, **do not use
  both options**. See the [server_join][server-join] section for more
  information on the format of the string. This field is deprecated in favor of
  the [server_join stanza][server-join].
  both options**. See the [server_join][server-join]
  section for more information on the format of the string. This field is
  deprecated in favor of the [server_join stanza][server-join].

- `retry_interval` `(string: "30s")` - Specifies the time to wait between retry
  join attempts. This field is deprecated in favor of the [server_join
@@ -217,8 +215,8 @@ server {

### Automatic Bootstrapping

The Nomad servers can automatically bootstrap if Consul is configured. For a
more detailed explanation, please see the [automatic Nomad bootstrapping]
documentation.
more detailed explanation, please see the
[automatic Nomad bootstrapping documentation](/guides/operations/cluster/automatic).
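As a sketch of the Consul-based bootstrapping described above (the address is illustrative and a local Consul agent is assumed), the agent's `consul` stanza might look like:

```hcl
consul {
  # Local Consul agent; Nomad servers and clients advertise themselves here
  # and discover each other automatically.
  address          = "127.0.0.1:8500"
  server_auto_join = true
  client_auto_join = true
}
```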
### Restricting Schedulers
@@ -249,16 +247,14 @@ server {

}
```

The structure matches the [Update Scheduler Config] endpoint, but adopted to hcl
syntax (namely using snake case rather than camel case).
The structure matches the [Update Scheduler Config][update-scheduler-config] endpoint,
but adopted to hcl syntax (namely using snake case rather than camel case).
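As a rough illustration of that snake_case mapping, a `default_scheduler_config` block might look like the sketch below; only the preemption settings are shown, and the exact keys are an assumption to confirm against the Update Scheduler Config API documentation:

```hcl
server {
  default_scheduler_config {
    # Mirrors the PreemptionConfig object of the API payload in snake_case.
    preemption_config {
      system_scheduler_enabled  = true
      batch_scheduler_enabled   = false
      service_scheduler_enabled = false
    }
  }
}
```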
Nomad servers check their `default_scheduler_config` only during cluster
bootstrap. During upgrades, if a previously bootstrapped cluster already set
scheduler configuration via the [Update Scheduler Config] endpoint, that is
always preferred.
scheduler configuration via the [Update Scheduler Config][update-scheduler-config]
endpoint, that is always preferred.

[encryption]: https://learn.hashicorp.com/nomad/transport-security/concepts 'Nomad Encryption Overview'
[encryption]: /guides/security/encryption 'Nomad Encryption Overview'
[server-join]: /docs/configuration/server_join 'Server Join'
[Update Scheduler Config]: /api/operator#update-scheduler-configuration 'Scheduler Config'
[Autopilot Guide]: https://learn.hashicorp.com/nomad/operating-nomad/autopilot
[automatic Nomad bootstrapping documentation]: https://learn.hashicorp.com/nomad/operating-nomad/clustering
[update-scheduler-config]: /api/operator#update-scheduler-configuration 'Scheduler Config'
@@ -26,7 +26,7 @@ start the Nomad agent.

This section of the documentation only covers the configuration options for
the `tls` stanza. To understand how to set up the certificates themselves, please see
the ["Enable TLS Encryption for Nomad" Guide].
the [Encryption Overview Guide](/guides/security/encryption).
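For orientation, a hedged sketch of a `tls` stanza once certificates exist (the file paths are placeholders):

```hcl
tls {
  http = true
  rpc  = true

  # Placeholder paths; point these at the CA and agent certificates you generated.
  ca_file   = "/etc/nomad.d/tls/nomad-ca.pem"
  cert_file = "/etc/nomad.d/tls/server.pem"
  key_file  = "/etc/nomad.d/tls/server-key.pem"

  verify_server_hostname = true
}
```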
## `tls` Parameters

@@ -101,4 +101,3 @@ tls {

```

[raft]: https://github.com/hashicorp/serf 'Serf by HashiCorp'
["Enable TLS Encryption for Nomad" Guide]: https://learn.hashicorp.com/nomad/transport-security/enable-tls
@@ -29,7 +29,7 @@ Below is a list of community-supported task drivers you can use with Nomad:

- [Firecracker][firecracker-task-driver]
- [Systemd-Nspawn][nspawn-driver]

[lxc]: https://learn.hashicorp.com/nomad/using-plugins/lxc
[lxc]: /docs/drivers/external/lxc
[plugin_guide]: /docs/internals/plugins
[singularity]: /docs/drivers/external/singularity
[jail-task-driver]: /docs/drivers/external/jail-task-driver

website/pages/docs/drivers/external/lxc.mdx (vendored, 2 changes)
@@ -181,7 +181,7 @@ isolation is not supported as of now.

[lxc-create]: https://linuxcontainers.org/lxc/manpages/man1/lxc-create.1.html
[lxc-driver]: https://releases.hashicorp.com/nomad-driver-lxc
[lxc-guide]: https://learn.hashicorp.com/nomad/using-plugins/lxc
[lxc-guide]: /guides/operating-a-job/external/lxc.html
[lxc_man]: https://linuxcontainers.org/lxc/manpages/man5/lxc.container.conf.5.html#lbAM
[plugin]: /docs/configuration/plugin
[plugin_dir]: /docs/configuration#plugin_dir
@@ -39,4 +39,4 @@ more information on how to protect Nomad cluster operations.

[plugin]: /docs/configuration/plugin.html
[docker_plugin]: /docs/drivers/docker#client-requirements
[plugin_guide]: /docs/internals/plugins
[acl_guide]: https://learn.hashicorp.com/nomad/acls/policies
[acl_guide]: /guides/security/acl
@@ -24,19 +24,19 @@ Nomad Enterprise Platform enables operators to easily upgrade Nomad as well as e

Automated Upgrades allows operators to deploy a complete cluster of new servers and then simply wait for the upgrade to complete. As the new servers join the cluster, server logic checks the version of each Nomad server node. If the version is higher than the version on the current set of voters, it will avoid promoting the new servers to voters until the number of new servers matches the number of existing servers at the previous version. Once the numbers match, Nomad will begin to promote new servers and demote old ones.

See the [Autopilot - Upgrade Migrations](https://learn.hashicorp.com/nomad/operating-nomad/autopilot#upgrade-migrations) documentation for a thorough overview.
See the [Autopilot - Upgrade Migrations](/guides/operations/autopilot#upgrade-migrations) documentation for a thorough overview.

### Enhanced Read Scalability

This feature enables an operator to introduce non-voting server nodes to a Nomad cluster. Non-voting servers will receive the replication stream but will not take part in quorum (required by the leader before log entries can be committed). Adding explicit non-voters will scale reads and scheduling without impacting write latency.

See the [Autopilot - Read Scalability](https://learn.hashicorp.com/nomad/operating-nomad/autopilot#server-read-and-scheduling-scaling) documentation for a thorough overview.
See the [Autopilot - Read Scalability](/guides/operations/autopilot#server-read-and-scheduling-scaling) documentation for a thorough overview.

### Redundancy Zones

Redundancy Zones enables an operator to deploy a non-voting server as a hot standby server on a per availability zone basis. For example, in an environment with three availability zones an operator can run one voter and one non-voter in each availability zone, for a total of six servers. If an availability zone is completely lost, only one voter will be lost, so the cluster remains available. If a voter is lost in an availability zone, Nomad will promote the non-voter to a voter automatically, putting the hot standby server into service quickly.

See the [Autopilot - Redundancy Zones](https://learn.hashicorp.com/nomad/operating-nomad/autopilot#redundancy-zones) documentation for a thorough overview.
See the [Autopilot - Redundancy Zones](/guides/operations/autopilot#redundancy-zones) documentation for a thorough overview.
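To make the redundancy-zone idea concrete, a minimal sketch of the relevant (Enterprise-only) server configuration, with a placeholder zone name:

```hcl
server {
  enabled = true

  # Autopilot keeps one voter per redundancy zone; additional servers in the
  # same zone act as non-voting hot standbys.
  redundancy_zone = "us-east-1a"
}
```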
## Governance & Policy
@@ -10,7 +10,7 @@ description: The "connect" stanza allows specifying options for Consul Connect integration

<Placement groups={['job', 'group', 'service', 'connect']} />

The `connect` stanza allows configuring various options for
[Consul Connect](/docs/integrations/consul-connect). It is
[Consul Connect](/guides/integrations/consul-connect). It is
valid only within the context of a service definition at the task group
level.
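A minimal group-level service with a Connect sidecar, as a sketch (the service name and port are illustrative):

```hcl
group "api" {
  network {
    mode = "bridge"
  }

  service {
    name = "count-api"
    port = "9001"

    # Registers an Envoy sidecar proxy for this service.
    connect {
      sidecar_service {}
    }
  }
}
```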
@@ -244,7 +244,7 @@ $ VAULT_TOKEN="..." nomad job run example.nomad

[namespace]: https://learn.hashicorp.com/nomad/governance-and-policy/namespaces
[parameterized]: /docs/job-specification/parameterized 'Nomad parameterized Job Specification'
[periodic]: /docs/job-specification/periodic 'Nomad periodic Job Specification'
[region]: https://learn.hashicorp.com/nomad/operating-nomad/federation
[region]: /guides/operations/federation
[reschedule]: /docs/job-specification/reschedule 'Nomad reschedule Job Specification'
[scheduler]: /docs/schedulers 'Nomad Scheduler Types'
[spread]: /docs/job-specification/spread 'Nomad spread Job Specification'
@@ -44,7 +44,7 @@ stanza for allocations on that node. The `migrate` stanza is for job authors to

define how their services should be migrated, while the node drain deadline is
for system operators to put hard limits on how long a drain may take.

See the [Workload Migration Guide](https://learn.hashicorp.com/nomad/operating-nomad/node-draining) for details
See the [Workload Migration Guide](/guides/operations/node-draining) for details
on node draining.
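A small `migrate` stanza showing the commonly used parameters (the values mirror the documented defaults):

```hcl
migrate {
  # Number of allocations migrated at the same time during a drain.
  max_parallel = 1

  # Wait for Consul health checks before continuing to the next batch.
  health_check     = "checks"
  min_healthy_time = "10s"
  healthy_deadline = "5m"
}
```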
## `migrate` Parameters
@@ -25,14 +25,13 @@ and services. Because you don't know in advance what host your job will be

provisioned on, Nomad will provide your tasks with network configuration when
they start up.

Nomad 0.10 enables support for the `network` stanza at the task group level.
When the `network` stanza is defined at the group level with `bridge` as the
networking mode, all tasks in the task group share the same network namespace.
This is a prerequisite for [Consul Connect](/docs/integrations/consul-connect).
Tasks running within a network namespace are not visible to applications outside
the namespace on the same host. This allows [Connect][] enabled applications to
bind only to localhost within the shared network stack, and use the proxy for
ingress and egress traffic.
Nomad 0.10 enables support for the `network` stanza at the task group level. When
the `network` stanza is defined at the group level with `bridge` as the networking mode,
all tasks in the task group share the same network namespace. This is a prerequisite for
[Consul Connect](/guides/integrations/consul-connect). Tasks running within a
network namespace are not visible to applications outside the namespace on the same host.
This allows [Connect][] enabled applications to bind only to localhost within the shared network stack,
and use the proxy for ingress and egress traffic.
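A sketch of a group-level `network` stanza in bridge mode (the port label and container port are illustrative):

```hcl
group "web" {
  network {
    mode = "bridge"

    # Map a dynamically assigned host port to port 8080 inside the shared
    # network namespace.
    port "http" {
      to = 8080
    }
  }
}
```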
Note that this document only applies to services that want to _listen_ on a
port. Batch jobs or services that only make outbound connections do not need to
@@ -15,7 +15,7 @@ description: |-

The `proxy` stanza allows configuring various options for the sidecar proxy
managed by Nomad for [Consul
Connect](/docs/integrations/consul-connect). It is valid only
Connect](/guides/integrations/consul-connect). It is valid only
within the context of a `sidecar_service` stanza.

```hcl
@@ -271,9 +271,9 @@ Nomad manages registering, updating, and deregistering services with Consul. It

is important to understand when each of these steps happens and how they can be
customized.

**Registration**: Nomad will register `group` services and checks _before_
**Registration**: Nomad will register `group` services and checks *before*
starting any tasks. Services and checks for a specific `task` are registered
_after_ the task has started.
*after* the task has started.

**Updating**: If a service or check definition is updated, Nomad will update
the service in Consul as well. Consul is updated without restarting a task.
@@ -678,7 +678,7 @@ advertise and check directly since Nomad isn't managing any port assignments.

[check_restart_stanza]: /docs/job-specification/check_restart 'check_restart stanza'
[consul_grpc]: https://www.consul.io/api/agent/check.html#grpc
[service-discovery]: /docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
[service-discovery]: /guides/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
[interpolation]: /docs/runtime/interpolation 'Nomad Runtime Interpolation'
[network]: /docs/job-specification/network 'Nomad network Job Specification'
[qemu]: /docs/drivers/qemu 'Nomad qemu Driver'
@@ -13,7 +13,7 @@ description: |-

The `sidecar_service` stanza allows configuring various options for the sidecar
proxy managed by Nomad for [Consul
Connect](/docs/integrations/consul-connect) integration. It is
Connect](/guides/integrations/consul-connect) integration. It is
valid only within the context of a connect stanza.

```hcl
@@ -13,7 +13,7 @@ description: |-

The `sidecar_task` stanza allows configuring various options for the proxy
sidecar managed by Nomad for [Consul
Connect](/docs/integrations/consul-connect) integration such as
Connect](/guides/integrations/consul-connect) integration such as
resource requirements, kill timeouts and more as defined below. It is valid
only within the context of a [`connect`][connect] stanza.
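For example, a hedged sketch that shrinks the resources of the Nomad-managed sidecar proxy task:

```hcl
connect {
  sidecar_service {}

  # Override the default resources of the sidecar proxy task.
  sidecar_task {
    resources {
      cpu    = 100
      memory = 64
    }
  }
}
```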
@@ -204,7 +204,7 @@ task "server" {

[java]: /docs/drivers/java 'Nomad Java Driver'
[docker]: /docs/drivers/docker 'Nomad Docker Driver'
[rkt]: /docs/drivers/rkt 'Nomad rkt Driver'
[service_discovery]: /docs/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
[service_discovery]: /guides/integrations/consul-integration#service-discovery 'Nomad Service Discovery'
[template]: /docs/job-specification/template 'Nomad template Job Specification'
[user_drivers]: /docs/configuration/client#_quot_user_checked_drivers_quot_
[user_blacklist]: /docs/configuration/client#_quot_user_blacklist_quot_
@@ -264,7 +264,7 @@ group "two" {

}
```

[canary]: https://learn.hashicorp.com/nomad/update-strategies/blue-green-and-canary-deployments 'Nomad Canary Deployments'
[canary]: /guides/operating-a-job/update-strategies/blue-green-and-canary-deployments 'Nomad Canary Deployments'
[checks]: /docs/job-specification/service#check-parameters 'Nomad check Job Specification'
[rolling]: https://learn.hashicorp.com/nomad/update-strategies/rolling-upgrades 'Nomad Rolling Upgrades'
[strategies]: https://learn.hashicorp.com/nomad/update-strategies/ 'Nomad Update Strategies'
[rolling]: /guides/operating-a-job/update-strategies/rolling-upgrades 'Nomad Rolling Upgrades'
[strategies]: /guides/operating-a-job/update-strategies 'Nomad Update Strategies'
@@ -23,7 +23,7 @@ description: |-

The `upstreams` stanza allows configuring various options for managing upstream
services that a [Consul
Connect](/docs/integrations/consul-connect) proxy routes to. It
Connect](/guides/integrations/consul-connect) proxy routes to. It
is valid only within the context of a `proxy` stanza.
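As a rough sketch, an upstream that binds the `database` service to a local port inside the task group's network namespace (the name and port are illustrative):

```hcl
proxy {
  upstreams {
    destination_name = "database"
    local_bind_port  = 8080
  }
}
```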
For Consul-specific details see the [Consul Connect
@@ -172,8 +172,8 @@ documentation for all possible fields and more complete documentation.

  Nomad. This was remedied in 0.6.5 and does not affect earlier versions
  of Vault.

- `token_explicit_max_ttl` - Specifies the max TTL of a token. **Must be set to
  `0`** to allow periodic tokens.
- `token_explicit_max_ttl` - Specifies the max TTL of a token. **Must be set to `0`** to
  allow periodic tokens.

- `name` - Specifies the name of the policy. We recommend using the name
  `nomad-cluster`. If a different name is chosen, replace the token role in the
website/pages/guides/analytical-workloads/index.mdx (new file, 15 lines)

@@ -0,0 +1,15 @@

---
layout: guides
page_title: Analytical Workloads on Nomad
sidebar_title: Analytical Workloads
description: List of data services.
---

# Analytical Workloads on Nomad

Nomad is well-suited for analytical workloads, given its [performance](https://www.hashicorp.com/c1m/) and first-class support for
[batch scheduling](/docs/schedulers).

This section provides some best practices and guidance for running analytical workloads on Nomad.

Please navigate the appropriate sub-sections for more information.
@@ -0,0 +1,152 @@

---
layout: guides
page_title: Apache Spark Integration - Configuration Properties
sidebar_title: Configuration Properties
description: Comprehensive list of Spark configuration properties.
---

# Spark Configuration Properties

Spark [configuration properties](https://spark.apache.org/docs/latest/configuration#available-properties)
are generally applicable to the Nomad integration. The properties listed below
are specific to running Spark on Nomad. Configuration properties can be set by
adding `--conf [property]=[value]` to the `spark-submit` command.

- `spark.nomad.authToken` `(string: nil)` - Specifies the secret key of the auth
  token to use when accessing the API. This falls back to the NOMAD_TOKEN environment
  variable. Note that if this configuration setting is set and the cluster deploy
  mode is used, this setting will be propagated to the driver application in the
  job spec. If it is not set and an auth token is taken from the NOMAD_TOKEN
  environment variable, the token will not be propagated to the driver which will
  require the driver to pick up its token from an environment variable.

- `spark.nomad.cluster.expectImmediateScheduling` `(bool: false)` - Specifies
  that `spark-submit` should fail if Nomad is not able to schedule the job
  immediately.

- `spark.nomad.cluster.monitorUntil` `(string: "submitted")` - Specifies the
  length of time that `spark-submit` should monitor a Spark application in cluster
  mode. When set to `submitted`, `spark-submit` will return as soon as the
  application has been submitted to the Nomad cluster. When set to `scheduled`,
  `spark-submit` will return as soon as the Nomad job has been scheduled. When
  set to `complete`, `spark-submit` will tail the output from the driver process
  and return when the job has completed.

- `spark.nomad.datacenters` `(string: dynamic)` - Specifies a comma-separated
  list of Nomad datacenters to use. This property defaults to the datacenter of
  the first Nomad server contacted.

- `spark.nomad.docker.email` `(string: nil)` - Specifies the email address to
  use when downloading the Docker image specified by
  [spark.nomad.dockerImage](#spark.nomad.dockerImage). See the
  [Docker driver authentication](/docs/drivers/docker#authentication)
  docs for more information.

- `spark.nomad.docker.password` `(string: nil)` - Specifies the password to use
  when downloading the Docker image specified by
  [spark.nomad.dockerImage](#spark.nomad.dockerImage). See the
  [Docker driver authentication](/docs/drivers/docker#authentication)
  docs for more information.

- `spark.nomad.docker.serverAddress` `(string: nil)` - Specifies the server
  address (domain/IP without the protocol) to use when downloading the Docker
  image specified by [spark.nomad.dockerImage](#spark.nomad.dockerImage). Docker
  Hub is used by default. See the
  [Docker driver authentication](/docs/drivers/docker#authentication)
  docs for more information.

- `spark.nomad.docker.username` `(string: nil)` - Specifies the username to use
  when downloading the Docker image specified by
  [spark.nomad.dockerImage](#spark-nomad-dockerImage). See the
  [Docker driver authentication](/docs/drivers/docker#authentication)
  docs for more information.

- `spark.nomad.dockerImage` `(string: nil)` - Specifies the `URL` for the
  [Docker image](/docs/drivers/docker#image) to
  use to run Spark with Nomad's `docker` driver. When not specified, Nomad's
  `exec` driver will be used instead.

- `spark.nomad.driver.cpu` `(string: "1000")` - Specifies the CPU in MHz that
  should be reserved for driver tasks.

- `spark.nomad.driver.logMaxFileSize` `(string: "1m")` - Specifies the maximum
  size by time that Nomad should use for driver task log files.

- `spark.nomad.driver.logMaxFiles` `(string: "5")` - Specifies the number of log
  files that Nomad should keep for driver tasks.

- `spark.nomad.driver.networkMBits` `(string: "1")` - Specifies the network
  bandwidth that Nomad should allocate to driver tasks.

- `spark.nomad.driver.retryAttempts` `(string: "5")` - Specifies the number of
  times that Nomad should retry driver task groups upon failure.

- `spark.nomad.driver.retryDelay` `(string: "15s")` - Specifies the length of
  time that Nomad should wait before retrying driver task groups upon failure.

- `spark.nomad.driver.retryInterval` `(string: "1d")` - Specifies Nomad's retry
  interval for driver task groups.

- `spark.nomad.executor.cpu` `(string: "1000")` - Specifies the CPU in MHz that
  should be reserved for executor tasks.

- `spark.nomad.executor.logMaxFileSize` `(string: "1m")` - Specifies the maximum
  size by time that Nomad should use for executor task log files.

- `spark.nomad.executor.logMaxFiles` `(string: "5")` - Specifies the number of
  log files that Nomad should keep for executor tasks.

- `spark.nomad.executor.networkMBits` `(string: "1")` - Specifies the network
  bandwidth that Nomad should allocate to executor tasks.

- `spark.nomad.executor.retryAttempts` `(string: "5")` - Specifies the number of
  times that Nomad should retry executor task groups upon failure.

- `spark.nomad.executor.retryDelay` `(string: "15s")` - Specifies the length of
  time that Nomad should wait before retrying executor task groups upon failure.

- `spark.nomad.executor.retryInterval` `(string: "1d")` - Specifies Nomad's retry
  interval for executor task groups.

- `spark.nomad.job.template` `(string: nil)` - Specifies the path to a JSON file
  containing a Nomad job to use as a template. This can also be set with
  `spark-submit's --nomad-template` parameter.

- `spark.nomad.namespace` `(string: nil)` - Specifies the namespace to use. This
  falls back first to the NOMAD_NAMESPACE environment variable and then to Nomad's
  default namespace.

- `spark.nomad.priority` `(string: nil)` - Specifies the priority for the
  Nomad job.

- `spark.nomad.region` `(string: dynamic)` - Specifies the Nomad region to use.
  This property defaults to the region of the first Nomad server contacted.

- `spark.nomad.shuffle.cpu` `(string: "1000")` - Specifies the CPU in MHz that
  should be reserved for shuffle service tasks.

- `spark.nomad.shuffle.logMaxFileSize` `(string: "1m")` - Specifies the maximum
  size by time that Nomad should use for shuffle service task log files.

- `spark.nomad.shuffle.logMaxFiles` `(string: "5")` - Specifies the number of
  log files that Nomad should keep for shuffle service tasks.

- `spark.nomad.shuffle.memory` `(string: "256m")` - Specifies the memory that
  Nomad should allocate for the shuffle service tasks.

- `spark.nomad.shuffle.networkMBits` `(string: "1")` - Specifies the network
  bandwidth that Nomad should allocate to shuffle service tasks.

- `spark.nomad.sparkDistribution` `(string: nil)` - Specifies the location of
  the Spark distribution archive file to use.

- `spark.nomad.tls.caCert` `(string: nil)` - Specifies the path to a `.pem` file
  containing the certificate authority that should be used to validate the Nomad
  server's TLS certificate.

- `spark.nomad.tls.cert` `(string: nil)` - Specifies the path to a `.pem` file
  containing the TLS certificate to present to the Nomad server.

- `spark.nomad.tls.trustStorePassword` `(string: nil)` - Specifies the path to a
  `.pem` file containing the private key corresponding to the certificate in
  [spark.nomad.tls.cert](#spark-nomad-tls-cert).
website/pages/guides/analytical-workloads/spark/customizing.mdx (new file, 126 lines)

@@ -0,0 +1,126 @@

---
layout: guides
page_title: Apache Spark Integration - Customizing Applications
sidebar_title: Customizing Applications
description: |-
  Learn how to customize the Nomad job that is created to run a Spark
  application.
---

# Customizing Applications

There are two ways to customize the Nomad job that Spark creates to run an
application:

- Use the default job template and set configuration properties
- Use a custom job template

## Using the Default Job Template

The Spark integration will use a generic job template by default. The template
includes groups and tasks for the driver, executors and (optionally) the
[shuffle service](/guides/analytical-workloads/spark/dynamic). The job itself and the tasks that
are created have the `spark.nomad.role` meta value defined accordingly:

```hcl
job "structure" {
  meta {
    "spark.nomad.role" = "application"
  }

  # A driver group is only added in cluster mode
  group "driver" {
    task "driver" {
      meta {
        "spark.nomad.role" = "driver"
      }
    }
  }

  group "executors" {
    count = 2

    task "executor" {
      meta {
        "spark.nomad.role" = "executor"
      }
    }

    # Shuffle service tasks are only added when enabled (as it must be when
    # using dynamic allocation)
    task "shuffle-service" {
      meta {
        "spark.nomad.role" = "shuffle"
      }
    }
  }
}
```

The default template can be customized indirectly by explicitly [setting
configuration properties](/guides/analytical-workloads/spark/configuration).

## Using a Custom Job Template

An alternative to using the default template is to set the
`spark.nomad.job.template` configuration property to the path of a file
containing a custom job template. There are two important considerations:

- The template must use the JSON format. You can convert an HCL jobspec to
  JSON by running `nomad job run -output <job.nomad>`.

- `spark.nomad.job.template` should be set to a path on the submitting
  machine, not to a URL (even in cluster mode). The template does not need to
  be accessible to the driver or executors.

Using a job template you can override Spark’s default resource utilization, add
additional metadata or constraints, set environment variables, add sidecar
tasks and utilize the Consul and Vault integration. The template does
not need to be a complete Nomad job specification, since Spark will add
everything necessary to run the application. For example, your template
might set `job` metadata, but not contain any task groups, making it an
incomplete Nomad job specification but still a valid template to use with Spark.

To customize the driver task group, include a task group in your template that
has a task that contains a `spark.nomad.role` meta value set to `driver`.

To customize the executor task group, include a task group in your template that
has a task that contains a `spark.nomad.role` meta value set to `executor` or
`shuffle`.

The following template adds a `meta` value at the job level and an environment
variable to the executor task group:

```hcl
job "template" {

  meta {
    "foo" = "bar"
  }

  group "executor-group-name" {

    task "executor-task-name" {
      meta {
        "spark.nomad.role" = "executor"
      }

      env {
        BAZ = "something"
      }
    }
  }
}
```

## Order of Precedence

The order of precedence for customized settings is as follows:

1. Explicitly set configuration properties.
2. Settings in the job template (if provided).
3. Default values of the configuration properties.

## Next Steps

Learn how to [allocate resources](/guides/analytical-workloads/spark/resource) for your Spark
applications.
website/pages/guides/analytical-workloads/spark/dynamic.mdx (new file, 28 lines)

@@ -0,0 +1,28 @@

---
layout: guides
page_title: Apache Spark Integration - Dynamic Executors
sidebar_title: Dynamic Executors
description: |-
  Learn how to dynamically scale Spark executors based on the queue of pending
  tasks.
---

# Dynamically Allocate Spark Executors

By default, the Spark application will use a fixed number of executors. Setting
`spark.dynamicAllocation` to `true` enables Spark to add and remove executors
during execution depending on the number of Spark tasks scheduled to run. As
described in [Dynamic Resource Allocation](http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation), dynamic allocation requires that `spark.shuffle.service.enabled` be set to `true`.

On Nomad, this adds an additional shuffle service task to the executor
task group. This results in a one-to-one mapping of executors to shuffle
services.

When the executor exits, the shuffle service continues running so that it can
serve any results produced by the executor. Due to the nature of resource
allocation in Nomad, the resources allocated to the executor tasks are not
freed until the shuffle service (and the application) has finished.

## Next Steps

Learn how to [integrate Spark with HDFS](/guides/analytical-workloads/spark/hdfs).
website/pages/guides/analytical-workloads/spark/hdfs.mdx (new file, 138 lines)

@@ -0,0 +1,138 @@

---
layout: guides
page_title: Apache Spark Integration - Using HDFS
sidebar_title: Running HDFS
description: Learn how to deploy HDFS on Nomad and integrate it with Spark.
---

# Using HDFS

[HDFS](https://en.wikipedia.org/wiki/Apache_Hadoop#Hadoop_distributed_file_system)
is a distributed, replicated and scalable file system written for the Hadoop
framework. Spark was designed to read from and write to HDFS, since it is
common for Spark applications to perform data-intensive processing over large
datasets. HDFS can be deployed as its own Nomad job.

## Running HDFS on Nomad

A sample HDFS job file can be found [here](https://github.com/hashicorp/nomad/blob/master/terraform/examples/spark/hdfs.nomad).
It has two task groups, one for the HDFS NameNode and one for the
DataNodes. Both task groups use a [Docker image](https://github.com/hashicorp/nomad/tree/master/terraform/examples/spark/docker/hdfs) that includes Hadoop:

```hcl
group "NameNode" {

  constraint {
    operator = "distinct_hosts"
    value    = "true"
  }

  task "NameNode" {

    driver = "docker"

    config {
      image   = "rcgenova/hadoop-2.7.3"
      command = "bash"
      args    = [ "-c", "hdfs namenode -format && exec hdfs namenode
        -D fs.defaultFS=hdfs://${NOMAD_ADDR_ipc}/ -D dfs.permissions.enabled=false" ]
      network_mode = "host"
      port_map {
        ipc = 8020
        ui  = 50070
      }
    }

    resources {
      cpu    = 1000
      memory = 1024
      network {
        port "ipc" {
          static = "8020"
        }
        port "ui" {
          static = "50070"
        }
      }
    }

    service {
      name = "hdfs"
      port = "ipc"
    }
  }
}
```

The NameNode task registers itself in Consul as `hdfs`. This enables the
DataNodes to generically reference the NameNode:

```hcl
group "DataNode" {

  count = 3

  constraint {
    operator = "distinct_hosts"
    value    = "true"
  }

  task "DataNode" {

    driver = "docker"

    config {
      network_mode = "host"
      image        = "rcgenova/hadoop-2.7.3"
      args         = [ "hdfs", "datanode"
        , "-D", "fs.defaultFS=hdfs://hdfs.service.consul/"
        , "-D", "dfs.permissions.enabled=false"
      ]
      port_map {
        data = 50010
        ipc  = 50020
        ui   = 50075
      }
    }

    resources {
      cpu    = 1000
      memory = 1024
      network {
        port "data" {
          static = "50010"
        }
        port "ipc" {
          static = "50020"
        }
        port "ui" {
          static = "50075"
        }
      }
    }

  }
}
```

Another viable option for the DataNode task group is to use a dedicated
[system](/docs/schedulers#system) job.
This will deploy a DataNode to every client node in the system, which may or may
not be desirable depending on your use case.

The HDFS job can be deployed using the `nomad job run` command:

```shell
$ nomad job run hdfs.nomad
```

## Production Deployment Considerations

A production deployment will typically have redundant NameNodes in an
active/passive configuration (which requires ZooKeeper). See [HDFS High
Availability](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html).

## Next Steps

Learn how to [monitor the output](/guides/analytical-workloads/spark/monitoring) of your
Spark applications.
website/pages/guides/analytical-workloads/spark/index.mdx (new file, 20 lines)

@@ -0,0 +1,20 @@

---
layout: guides
page_title: Running Apache Spark on Nomad
sidebar_title: Apache Spark
description: Learn how to run Apache Spark on a Nomad cluster.
---

# Running Apache Spark on Nomad

Apache Spark is a popular data processing engine/framework that has been
architected to use third-party schedulers. The Nomad ecosystem includes a
[fork of Apache Spark](https://github.com/hashicorp/nomad-spark) that natively
integrates Nomad as a cluster manager and scheduler for Spark. When running on
Nomad, the Spark executors that run Spark tasks for your application, and
optionally the application driver itself, run as Nomad tasks in a Nomad job.

## Next Steps

The links in the sidebar contain detailed information about specific aspects of
the integration, beginning with [Getting Started](/guides/analytical-workloads/spark/pre).
website/pages/guides/analytical-workloads/spark/monitoring.mdx (new file, 166 lines)

@@ -0,0 +1,166 @@

---
layout: guides
page_title: Apache Spark Integration - Monitoring Output
sidebar_title: Monitoring Output
description: Learn how to monitor Spark application output.
---

# Monitoring Spark Application Output

By default, `spark-submit` in `cluster` mode will submit your application
to the Nomad cluster and return immediately. You can use the
[spark.nomad.cluster.monitorUntil](/guides/analytical-workloads/spark/configuration#spark-nomad-cluster-monitoruntil) configuration property to have
`spark-submit` monitor the job continuously. Note that, with this flag set,
killing `spark-submit` will _not_ stop the spark application, since it will be
running independently in the Nomad cluster.

## Spark UI

In cluster mode, if `spark.ui.enabled` is set to `true` (as by default), the
Spark web UI will be dynamically allocated a port. The Web UI will be exposed by
Nomad as a service, and the UI’s `URL` will appear in the Spark driver’s log. By
default, the Spark web UI will terminate when the application finishes. This can
be problematic when debugging an application. You can delay termination by
setting `spark.ui.stopDelay` (e.g. `5m` for 5 minutes). Note that this will
cause the driver process to continue to run. You can force termination
immediately on the “Jobs” page of the web UI.

## Spark History Server

It is possible to reconstruct the web UI of a completed application using
Spark’s [history server](https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact).
The history server requires the event log to have been written to an accessible
location like [HDFS](/guides/analytical-workloads/spark/hdfs) or Amazon S3.

Sample history server job file:

```hcl
job "spark-history-server" {
  datacenters = ["dc1"]
  type = "service"

  group "server" {
    count = 1

    task "history-server" {
      driver = "docker"

      config {
        image   = "barnardb/spark"
        command = "/spark/spark-2.1.0-bin-nomad/bin/spark-class"
        args    = [ "org.apache.spark.deploy.history.HistoryServer" ]
        port_map {
          ui = 18080
        }
        network_mode = "host"
      }

      env {
        "SPARK_HISTORY_OPTS" = "-Dspark.history.fs.logDirectory=hdfs://hdfs.service.consul/spark-events/"
        "SPARK_PUBLIC_DNS"   = "spark-history.service.consul"
      }

      resources {
        cpu    = 1000
        memory = 1024
        network {
          mbits = 250
          port "ui" {
            static = 18080
          }
        }
      }

      service {
        name = "spark-history"
        tags = ["spark", "ui"]
        port = "ui"
      }
    }

  }
}
```

The job file above can also be found [here](https://github.com/hashicorp/nomad/blob/master/terraform/examples/spark/spark-history-server-hdfs.nomad).

To run the history server, first [deploy HDFS](/guides/analytical-workloads/spark/hdfs) and then
create a directory in HDFS to store events:

```shell
$ hdfs dfs -fs hdfs://hdfs.service.consul:8020 -mkdir /spark-events
```

You can then deploy the history server with:

```shell
$ nomad job run spark-history-server-hdfs.nomad
```

You can get the private IP for the history server with a Consul DNS lookup:

```shell
$ dig spark-history.service.consul
```

Find the public IP that corresponds to the private IP returned by the `dig`
command above. You can access the history server at http://PUBLIC_IP:18080.

Use the `spark.eventLog.enabled` and `spark.eventLog.dir` configuration
properties in `spark-submit` to log events for a given application:

```shell
$ spark-submit \
    --class org.apache.spark.examples.JavaSparkPi \
    --master nomad \
    --deploy-mode cluster \
    --conf spark.executor.instances=4 \
    --conf spark.nomad.cluster.monitorUntil=complete \
    --conf spark.eventLog.enabled=true \
    --conf spark.eventLog.dir=hdfs://hdfs.service.consul/spark-events \
    --conf spark.nomad.sparkDistribution=https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
    https://nomad-spark.s3.amazonaws.com/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```

## Logs

Nomad clients collect the `stderr` and `stdout` of running tasks. The CLI or the
HTTP API can be used to inspect logs, as documented in
[Accessing Logs](/guides/operating-a-job/accessing-logs).
In cluster mode, the `stderr` and `stdout` of the `driver` application can be
accessed in the same way. The [Log Shipper Pattern](/guides/operating-a-job/accessing-logs#log-shipper-pattern) uses sidecar tasks to forward logs to a central location. This
can be done using a job template as follows:

```hcl
job "template" {
  group "driver" {

    task "driver" {
      meta {
        "spark.nomad.role" = "driver"
      }
    }

    task "log-forwarding-sidecar" {
      # sidecar task definition here
    }
  }

  group "executor" {

    task "executor" {
      meta {
        "spark.nomad.role" = "executor"
      }
    }

    task "log-forwarding-sidecar" {
      # sidecar task definition here
    }
  }
}
```

## Next Steps

Review the Nomad/Spark [configuration properties](/guides/analytical-workloads/spark/configuration).
120
website/pages/guides/analytical-workloads/spark/pre.mdx
Normal file
|
@ -0,0 +1,120 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Apache Spark Integration - Getting Started
|
||||
sidebar_title: Getting Started
|
||||
description: Get started with the Nomad/Spark integration.
|
||||
---
|
||||
|
||||
# Getting Started
|
||||
|
||||
To get started, you can use Nomad's example Terraform configuration to
|
||||
automatically provision an environment in AWS, or you can manually provision a
|
||||
cluster.
|
||||
|
||||
## Provision a Cluster in AWS
|
||||
|
||||
Nomad's [Terraform configuration](https://github.com/hashicorp/nomad/tree/master/terraform)
|
||||
can be used to quickly provision a Spark-enabled Nomad environment in
|
||||
AWS. The embedded [Spark example](https://github.com/hashicorp/nomad/tree/master/terraform/examples/spark)
|
||||
provides for a quickstart experience that can be used in conjunction with
|
||||
this guide. When you have a cluster up and running, you can proceed to
|
||||
[Submitting applications](/guides/analytical-workloads/spark/submit).
|
||||
|
||||
## Manually Provision a Cluster
|
||||
|
||||
To manually provision a cluster, see the Nomad
|
||||
[Getting Started](/intro/getting-started/install) guide. There are two
|
||||
basic prerequisites to using the Spark integration once you have a cluster up
|
||||
and running:
|
||||
|
||||
- Access to a [Spark distribution](https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz)
|
||||
built with Nomad support. This is required for the machine that will submit
|
||||
applications as well as the Nomad tasks that will run the Spark executors.
|
||||
|
||||
- A Java runtime environment (JRE) for the submitting machine and the executors.
|
||||
|
||||
The subsections below explain further.
|
||||
|
||||
### Configure the Submitting Machine
|
||||
|
||||
To run Spark applications on Nomad, the submitting machine must have access to
|
||||
the cluster and have the Nomad-enabled Spark distribution installed. The code
|
||||
snippets below walk through installing Java and Spark on Ubuntu:
|
||||
|
||||
Install Java:
|
||||
|
||||
```shell
|
||||
$ sudo add-apt-repository -y ppa:openjdk-r/ppa
|
||||
$ sudo apt-get update
|
||||
$ sudo apt-get install -y openjdk-8-jdk
|
||||
$ JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
|
||||
```
|
||||
|
||||
Install Spark:
|
||||
|
||||
```shell
|
||||
$ wget -O - https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
|
||||
| sudo tar xz -C /usr/local
|
||||
$ export PATH=$PATH:/usr/local/spark-2.1.0-bin-nomad/bin
|
||||
```
|
||||
|
||||
Export NOMAD_ADDR to point Spark to your Nomad cluster:
|
||||
|
||||
```shell
|
||||
$ export NOMAD_ADDR=http://NOMAD_SERVER_IP:4646
|
||||
```
|
||||
|
||||
### Executor Access to the Spark Distribution
|
||||
|
||||
When running on Nomad, Spark creates Nomad tasks to run executors for use by the
|
||||
application's driver program. The executor tasks need access to a JRE, a Spark
|
||||
distribution built with Nomad support, and (in cluster mode) the Spark
|
||||
application itself. By default, Nomad will only place Spark executors on client
|
||||
nodes that have the Java runtime installed (version 7 or higher).
|
||||
|
||||
In this example, the Spark distribution and the Spark application JAR file are
|
||||
being pulled from Amazon S3:
|
||||
|
||||
```shell
|
||||
$ spark-submit \
|
||||
--class org.apache.spark.examples.JavaSparkPi \
|
||||
--master nomad \
|
||||
--deploy-mode cluster \
|
||||
--conf spark.executor.instances=4 \
|
||||
--conf spark.nomad.sparkDistribution=https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
|
||||
https://nomad-spark.s3.amazonaws.com/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
|
||||
```
|
||||
|
||||
### Using a Docker Image
|
||||
|
||||
An alternative to installing the JRE on every client node is to set the
|
||||
[spark.nomad.dockerImage](/guides/analytical-workloads/spark/configuration#spark-nomad-dockerimage)
|
||||
configuration property to the URL of a Docker image that has the Java runtime
|
||||
installed. If set, Nomad will use the `docker` driver to run Spark executors in
|
||||
a container created from the image. The
|
||||
[spark.nomad.dockerAuth](/guides/analytical-workloads/spark/configuration#spark-nomad-dockerauth)
|
||||
configuration property can be set to a JSON object to provide Docker repository
|
||||
authentication configuration.
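As a sketch of what that might look like, the credentials could be passed inline as JSON. The registry, image name, and credentials below are placeholders, and the exact keys accepted by `spark.nomad.dockerAuth` are described in the configuration reference:

```shell
$ spark-submit \
  --class org.apache.spark.examples.JavaSparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.nomad.dockerImage=registry.example.com/spark:latest \
  --conf 'spark.nomad.dockerAuth={"username": "deploy-user", "password": "example-password"}' \
  --conf spark.executor.instances=4 \
  --conf spark.nomad.sparkDistribution=https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
  https://nomad-spark.s3.amazonaws.com/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```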
|
||||
|
||||
When using a Docker image, both the Spark distribution and the application
|
||||
itself can be included (in which case local URLs can be used for `spark-submit`).
|
||||
|
||||
Here, we include [spark.nomad.dockerImage](/guides/analytical-workloads/spark/configuration#spark-nomad-dockerimage)
|
||||
and use local paths for
|
||||
[spark.nomad.sparkDistribution](/guides/analytical-workloads/spark/configuration#spark-nomad-sparkdistribution)
|
||||
and the application JAR file:
|
||||
|
||||
```shell
|
||||
$ spark-submit \
|
||||
--class org.apache.spark.examples.JavaSparkPi \
|
||||
--master nomad \
|
||||
--deploy-mode cluster \
|
||||
--conf spark.nomad.dockerImage=rcgenova/spark \
|
||||
--conf spark.executor.instances=4 \
|
||||
--conf spark.nomad.sparkDistribution=/spark-2.1.0-bin-nomad.tgz \
|
||||
/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
Learn how to [submit applications](/guides/analytical-workloads/spark/submit).
|
82
website/pages/guides/analytical-workloads/spark/resource.mdx
Normal file
|
@ -0,0 +1,82 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Apache Spark Integration - Resource Allocation
|
||||
sidebar_title: Resource Allocation
|
||||
description: Learn how to configure resource allocation for your Spark applications.
|
||||
---
|
||||
|
||||
# Resource Allocation
|
||||
|
||||
Resource allocation can be configured using a job template or through
|
||||
configuration properties. Here is a sample template in HCL syntax (this would
|
||||
need to be converted to JSON):
|
||||
|
||||
```hcl
|
||||
job "template" {
|
||||
group "group-name" {
|
||||
|
||||
task "executor" {
|
||||
meta {
|
||||
"spark.nomad.role" = "executor"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 2000
|
||||
memory = 2048
|
||||
network {
|
||||
mbits = 100
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Resource-related configuration properties are covered below.
|
||||
|
||||
## Memory
|
||||
|
||||
The standard Spark memory properties will be propagated to Nomad to control
|
||||
task resource allocation: `spark.driver.memory` (set by `--driver-memory`) and
|
||||
`spark.executor.memory` (set by `--executor-memory`). You can additionally specify
|
||||
[spark.nomad.shuffle.memory](/guides/analytical-workloads/spark/configuration#spark-nomad-shuffle-memory)
|
||||
to control how much memory Nomad allocates to shuffle service tasks.
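A sketch of how these properties are passed on submission (the sizes are arbitrary, and `spark.nomad.shuffle.memory` only matters if you run shuffle service tasks for dynamic allocation):

```shell
$ spark-submit \
  --class org.apache.spark.examples.JavaSparkPi \
  --master nomad \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 4g \
  --conf spark.nomad.shuffle.memory=1g \
  --conf spark.nomad.sparkDistribution=https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
  https://nomad-spark.s3.amazonaws.com/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```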
|
||||
|
||||
## CPU
|
||||
|
||||
Spark sizes its thread pools and allocates tasks based on the number of CPU
|
||||
cores available. Nomad manages CPU allocation in terms of processing speed
|
||||
rather than number of cores. When running Spark on Nomad, you can control how
|
||||
much CPU share Nomad will allocate to tasks using the
|
||||
[spark.nomad.driver.cpu](/guides/analytical-workloads/spark/configuration#spark-nomad-driver-cpu)
|
||||
(set by `--driver-cpu`),
|
||||
[spark.nomad.executor.cpu](/guides/analytical-workloads/spark/configuration#spark-nomad-executor-cpu)
|
||||
(set by `--executor-cpu`) and
|
||||
[spark.nomad.shuffle.cpu](/guides/analytical-workloads/spark/configuration#spark-nomad-shuffle-cpu)
|
||||
properties. When running on Nomad, executors will be configured to use one core
|
||||
by default, meaning they will only pull a single 1-core task at a time. You can
|
||||
set the `spark.executor.cores` property (set by `--executor-cores`) to allow
|
||||
more tasks to be executed concurrently on a single executor.
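For example (a sketch; the CPU values are arbitrary Nomad CPU shares expressed in MHz):

```shell
$ spark-submit \
  --class org.apache.spark.examples.JavaSparkPi \
  --master nomad \
  --deploy-mode cluster \
  --driver-cpu 1000 \
  --executor-cpu 2000 \
  --executor-cores 2 \
  --conf spark.nomad.sparkDistribution=https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
  https://nomad-spark.s3.amazonaws.com/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```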
|
||||
|
||||
## Network
|
||||
|
||||
Nomad does not restrict the network bandwidth of running tasks, but it does
|
||||
allocate a non-zero number of Mbit/s to each task and uses this when bin packing
|
||||
task groups onto Nomad clients. Spark defaults to requesting the minimum of 1
|
||||
Mbit/s per task, but you can change this with the
|
||||
[spark.nomad.driver.networkMBits](/guides/analytical-workloads/spark/configuration#spark-nomad-driver-networkmbits),
|
||||
[spark.nomad.executor.networkMBits](/guides/analytical-workloads/spark/configuration#spark-nomad-executor-networkmbits), and
|
||||
[spark.nomad.shuffle.networkMBits](/guides/analytical-workloads/spark/configuration#spark-nomad-shuffle-networkmbits)
|
||||
properties.
|
||||
|
||||
## Log rotation
|
||||
|
||||
Nomad performs log rotation on the `stdout` and `stderr` of its tasks. You can
|
||||
configure the number and size of log files it will keep for driver and
|
||||
executor task groups using
|
||||
[spark.nomad.driver.logMaxFiles](/guides/analytical-workloads/spark/configuration#spark-nomad-driver-logmaxfiles)
|
||||
and [spark.nomad.executor.logMaxFiles](/guides/analytical-workloads/spark/configuration#spark-nomad-executor-logmaxfiles).
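The network and log-rotation properties are set the same way as the other configuration properties. The following sketch combines them in a single submission; the values are arbitrary illustrations:

```shell
$ spark-submit \
  --class org.apache.spark.examples.JavaSparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.nomad.driver.networkMBits=20 \
  --conf spark.nomad.executor.networkMBits=20 \
  --conf spark.nomad.driver.logMaxFiles=5 \
  --conf spark.nomad.executor.logMaxFiles=5 \
  --conf spark.nomad.sparkDistribution=https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
  https://nomad-spark.s3.amazonaws.com/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```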
|
||||
|
||||
## Next Steps
|
||||
|
||||
Learn how to [dynamically allocate Spark executors](/guides/analytical-workloads/spark/dynamic).
|
81
website/pages/guides/analytical-workloads/spark/submit.mdx
Normal file
|
@ -0,0 +1,81 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Apache Spark Integration - Submitting Applications
|
||||
sidebar_title: Submitting Applications
|
||||
description: Learn how to submit Spark jobs that run on a Nomad cluster.
|
||||
---
|
||||
|
||||
# Submitting Applications
|
||||
|
||||
The [`spark-submit`](https://spark.apache.org/docs/latest/submitting-applications.html)
|
||||
script located in Spark’s `bin` directory is used to launch applications on a
|
||||
cluster. Spark applications can be submitted to Nomad in either `client` mode
|
||||
or `cluster` mode.
|
||||
|
||||
## Client Mode
|
||||
|
||||
In `client` mode, the application driver runs on a machine that is not
|
||||
necessarily in the Nomad cluster. The driver’s `SparkContext` creates a Nomad
|
||||
job to run Spark executors. The executors connect to the driver and run Spark
|
||||
tasks on behalf of the application. When the driver’s SparkContext is stopped,
|
||||
the executors are shut down. Note that the machine running the driver or
|
||||
`spark-submit` needs to be reachable from the Nomad clients so that the
|
||||
executors can connect to it.
|
||||
|
||||
In `client` mode, application resources need to start out present on the
|
||||
submitting machine, so JAR files (both the primary JAR and those added with the
|
||||
`--jars` option) cannot be specified using `http:` or `https:` URLs. You can
|
||||
either use files on the submitting machine (either as raw paths or `file:` URLs),
|
||||
or use `local:` URLs to indicate that the files are independently available on
|
||||
both the submitting machine and all of the Nomad clients where the executors
|
||||
might run.
|
||||
|
||||
In this mode, the `spark-submit` invocation doesn’t return until the application
|
||||
has finished running, and killing the `spark-submit` process kills the
|
||||
application.
|
||||
|
||||
In this example, the `spark-submit` command is used to run the `SparkPi` sample
|
||||
application:
|
||||
|
||||
```shell
|
||||
$ spark-submit --class org.apache.spark.examples.SparkPi \
|
||||
--master nomad \
|
||||
--conf spark.nomad.sparkDistribution=https://nomad-spark.s3.amazonaws.com/spark-2.1.0-bin-nomad.tgz \
|
||||
lib/spark-examples*.jar \
|
||||
10
|
||||
```
|
||||
|
||||
## Cluster Mode
|
||||
|
||||
In cluster mode, the `spark-submit` process creates a Nomad job to run the Spark
|
||||
application driver itself. The driver’s `SparkContext` then adds Spark executors
|
||||
to the Nomad job. The executors connect to the driver and run Spark tasks on
|
||||
behalf of the application. When the driver’s `SparkContext` is stopped, the
|
||||
executors are shut down.
|
||||
|
||||
In cluster mode, application resources need to be hosted somewhere accessible
|
||||
to the Nomad cluster, so JARs (both the primary JAR and those added with the
|
||||
`--jars` option) can’t be specified using raw paths or `file:` URLs. You can either
|
||||
use `http:` or `https:` URLs, or use `local:` URLs to indicate that the files are
|
||||
independently available on all of the Nomad clients where the driver and executors
|
||||
might run.
|
||||
|
||||
Note that in cluster mode, the Nomad master URL needs to be routable from both
|
||||
the submitting machine and the Nomad client node that runs the driver. If the
|
||||
Nomad cluster is integrated with Consul, you may want to use a DNS name for the
|
||||
Nomad service served by Consul.
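For example, if the Nomad servers are registered in Consul under the service name `nomad` (an assumption; use whatever name your cluster registers), the submitting machine could resolve the master address through Consul DNS:

```shell
$ export NOMAD_ADDR=http://nomad.service.consul:4646
```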
|
||||
|
||||
For example, to submit an application in cluster mode:
|
||||
|
||||
```shell
|
||||
$ spark-submit --class org.apache.spark.examples.SparkPi \
|
||||
--master nomad \
|
||||
--deploy-mode cluster \
|
||||
--conf spark.nomad.sparkDistribution=http://example.com/spark.tgz \
|
||||
http://example.com/spark-examples.jar \
|
||||
10
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
Learn how to [customize applications](/guides/analytical-workloads/spark/customizing).
|
17
website/pages/guides/analytical-workloads/spark/template.mdx
Normal file
|
@ -0,0 +1,17 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Apache Spark Integration - Title
|
||||
description: Learn how to .
|
||||
---
|
||||
|
||||
# Title
|
||||
|
||||
## Section 1
|
||||
|
||||
## Section 2
|
||||
|
||||
## Section 3
|
||||
|
||||
## Next Steps
|
||||
|
||||
[Next step](/guides/analytical-workloads/spark/name)
|
11
website/pages/guides/getting-started.mdx
Normal file
|
@ -0,0 +1,11 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Getting Started
|
||||
description: This section takes you to the Getting Started section.
|
||||
---
|
||||
|
||||
# Nomad Getting Started
|
||||
|
||||
Welcome to the Nomad guides section! If you are just getting started with
|
||||
Nomad, please start with the [Nomad introduction](/intro/getting-started/install) instead and then continue on to the guides. The guides provide examples of
|
||||
common Nomad workflows and actions for developers, operators, and security teams.
|
18
website/pages/guides/governance-and-policy/index.mdx
Normal file
|
@ -0,0 +1,18 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Governance & Policy on Nomad
|
||||
sidebar_title: Governance & Policy
|
||||
description: List of data services.
|
||||
---
|
||||
|
||||
# Governance & Policy
|
||||
|
||||
These guides have been migrated to [HashiCorp's Learn website].
|
||||
|
||||
You can follow these links to find the specific guides on Learn:
|
||||
|
||||
- [Namespaces](https://learn.hashicorp.com/nomad/governance-and-policy/namespaces)
|
||||
- [Quotas](https://learn.hashicorp.com/nomad/governance-and-policy/quotas)
|
||||
- [Sentinel](https://learn.hashicorp.com/nomad/governance-and-policy/sentinel)
|
||||
|
||||
[hashicorp's learn website]: https://learn.hashicorp.com/nomad?track=governance-and-policy#governance-and-policy
|
14
website/pages/guides/index.mdx
Normal file
|
@ -0,0 +1,14 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Guides
|
||||
description: |-
|
||||
Welcome to the Nomad guides! The section provides various guides for common
|
||||
Nomad workflows and actions.
|
||||
---
|
||||
|
||||
# Nomad Guides
|
||||
|
||||
Welcome to the Nomad guides! If you are just getting started with Nomad, please
|
||||
start with the [Nomad introduction](/intro) instead and then continue
|
||||
on to the guides. The guides provide examples for common Nomad workflows and
|
||||
actions for both users and operators of Nomad.
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Installing Nomad
|
||||
sidebar_title: Installing Nomad
|
||||
description: Learn how to install Nomad.
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Nomad Deployment Guide
|
||||
sidebar_title: Reference Install Guide
|
||||
description: |-
|
||||
|
@ -11,17 +11,17 @@ ea_version: 0.9
|
|||
|
||||
# Nomad Reference Install Guide
|
||||
|
||||
This deployment guide covers the steps required to install and configure a single HashiCorp Nomad cluster as defined in the [Nomad Reference Architecture].
|
||||
This deployment guide covers the steps required to install and configure a single HashiCorp Nomad cluster as defined in the [Nomad Reference Architecture](/guides/install/production/reference-architecture).
|
||||
|
||||
These instructions are for installing and configuring Nomad on Linux hosts running the systemd system and service manager.
|
||||
|
||||
## Reference Material
|
||||
|
||||
This deployment guide is designed to work in combination with the [Nomad Reference Architecture](/docs/install/production/reference-architecture) and [Consul Deployment Guide](https://www.consul.io/docs/guides/deployment-guide.html). Although it is not a strict requirement to follow the Nomad Reference Architecture, please ensure you are familiar with the overall architecture design. For example, installing Nomad server agents on multiple physical or virtual (with correct anti-affinity) hosts for high-availability.
|
||||
This deployment guide is designed to work in combination with the [Nomad Reference Architecture](/guides/install/production/reference-architecture) and [Consul Deployment Guide](https://www.consul.io/docs/guides/deployment-guide.html). Although it is not a strict requirement to follow the Nomad Reference Architecture, please ensure you are familiar with the overall architecture design. For example, installing Nomad server agents on multiple physical or virtual (with correct anti-affinity) hosts for high-availability.
|
||||
|
||||
## Overview
|
||||
|
||||
To provide a highly-available single cluster architecture, we recommend Nomad server agents be deployed to more than one host, as shown in the [Nomad Reference Architecture].
|
||||
To provide a highly-available single cluster architecture, we recommend Nomad server agents be deployed to more than one host, as shown in the [Nomad Reference Architecture](/guides/install/production/reference-architecture).
|
||||
|
||||
![Reference diagram](/img/nomad_reference_diagram.png)
|
||||
|
||||
|
@ -202,13 +202,13 @@ client {
|
|||
|
||||
### ACL configuration
|
||||
|
||||
The [Access Control] guide provides instructions on configuring and enabling ACLs.
|
||||
The [Access Control](/guides/security/acl) guide provides instructions on configuring and enabling ACLs.
|
||||
|
||||
### TLS configuration
|
||||
|
||||
Securing Nomad's cluster communication with mutual TLS (mTLS) is recommended for production deployments and can even ease operations by preventing mistakes and misconfigurations. Nomad clients and servers should not be publicly accessible without mTLS enabled.
|
||||
|
||||
The [Securing Nomad with TLS] guide provides instructions on configuring and enabling TLS.
|
||||
The [Securing Nomad with TLS](/guides/security/securing-nomad) guide provides instructions on configuring and enabling TLS.
|
||||
|
||||
## Start Nomad
|
||||
|
||||
|
@ -222,14 +222,8 @@ sudo systemctl status nomad
|
|||
|
||||
## Next Steps
|
||||
|
||||
- Read the [Outage Recovery] guide to learn the steps required to recover from a
|
||||
Nomad cluster outage.
|
||||
|
||||
- Read the [Autopilot] guide to learn about features in Nomad 0.8 to allow for
|
||||
automatic operator-friendly management of Nomad servers.
|
||||
|
||||
[nomad reference architecture]: /docs/install/production/reference-architecture
|
||||
[autopilot]: https://learn.hashicorp.com/nomad/operating-nomad/autopilot
|
||||
[outage recovery]: https://learn.hashicorp.com/nomad/operating-nomad/outage
|
||||
[access control]: https://learn.hashicorp.com/nomad?track=acls#acls
|
||||
[securing nomad with tls]: https://learn.hashicorp.com/nomad/transport-security/enable-tls
|
||||
- Read [Outage Recovery](/guides/operations/outage) to learn
|
||||
the steps required to recover from a Nomad cluster outage.
|
||||
- Read [Autopilot](/guides/operations/autopilot) to learn about
|
||||
features in Nomad 0.8 to allow for automatic operator-friendly
|
||||
management of Nomad servers.
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Installing Nomad for Production
|
||||
sidebar_title: Production
|
||||
description: Learn how to install Nomad for Production.
|
||||
|
@ -15,26 +15,26 @@ There are multiple steps to cover for a successful Nomad deployment:
|
|||
|
||||
This page lists the two primary methods to installing Nomad and how to verify a successful installation.
|
||||
|
||||
Please refer to [Installing Nomad](/docs/install) sub-section.
|
||||
Please refer to [Installing Nomad](/guides/install) sub-section.
|
||||
|
||||
## Hardware Requirements
|
||||
|
||||
This page details the recommended machine resources (instances), port requirements, and network topology for Nomad.
|
||||
|
||||
Please refer to [Hardware Requirements](/docs/install/production/requirements) sub-section.
|
||||
Please refer to [Hardware Requirements](/guides/install/production/requirements) sub-section.
|
||||
|
||||
## Setting Nodes with Nomad Agent
|
||||
These pages explain the Nomad agent process and how to set the server and client nodes in the cluster.
|
||||
|
||||
Please refer to [Set Server & Client Nodes](/docs/install/production/nomad-agent) and [Nomad Agent documentation](/docs/commands/agent) pages.
|
||||
Please refer to [Set Server & Client Nodes](/guides/install/production/nomad-agent) and [Nomad Agent documentation](/docs/commands/agent) pages.
|
||||
|
||||
## Reference Architecture
|
||||
|
||||
This document provides recommended practices and a reference architecture for HashiCorp Nomad production deployments. This reference architecture conveys a general architecture that should be adapted to accommodate the specific needs of each implementation.
|
||||
|
||||
Please refer to [Reference Architecture](/docs/install/production/reference-architecture) sub-section.
|
||||
Please refer to [Reference Architecture](/guides/install/production/reference-architecture) sub-section.
|
||||
|
||||
## Install Guide Based on Reference Architecture
|
||||
This guide provides an end-to-end walkthrough of the steps required to install a single production-ready Nomad cluster as defined in the Reference Architecture section.
|
||||
|
||||
Please refer to [Reference Install Guide](/docs/install/production/deployment-guide) sub-section.
|
||||
Please refer to [Reference Install Guide](/guides/install/production/deployment-guide) sub-section.
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Nomad Agent
|
||||
sidebar_title: Set Server & Client Nodes
|
||||
description: |-
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Nomad Reference Architecture
|
||||
sidebar_title: Reference Architecture
|
||||
description: |-
|
||||
|
@ -22,7 +22,7 @@ The following topics are addressed:
|
|||
- [High Availability](#high-availability)
|
||||
- [Failure Scenarios](#failure-scenarios)
|
||||
|
||||
This document describes deploying a Nomad cluster in combination with, or with access to, a [Consul cluster](/docs/integrations/consul-integration). We recommend the use of Consul with Nomad to provide automatic clustering, service discovery, health checking and dynamic configuration.
|
||||
This document describes deploying a Nomad cluster in combination with, or with access to, a [Consul cluster](/guides/integrations/consul-integration). We recommend the use of Consul with Nomad to provide automatic clustering, service discovery, health checking and dynamic configuration.
|
||||
|
||||
## <a name="ra"></a>Reference Architecture
|
||||
|
||||
|
@ -32,11 +32,11 @@ In a Nomad multi-region architecture, communication happens via [WAN gossip](/do
|
|||
|
||||
In cloud environments, a single cluster may be deployed across multiple availability zones. For example, in AWS each Nomad server can be deployed to an associated EC2 instance, and those EC2 instances distributed across multiple AZs. Similarly, Nomad server clusters can be deployed to multiple cloud regions to allow for region level HA scenarios.
|
||||
|
||||
For more information on Nomad server cluster design, see the [cluster requirements documentation](/docs/install/production/requirements).
|
||||
For more information on Nomad server cluster design, see the [cluster requirements documentation](/guides/install/production/requirements).
|
||||
|
||||
The design shared in this document is the recommended architecture for production environments, as it provides flexibility and resilience. Nomad utilizes an existing Consul server cluster; however, the deployment design of the Consul server cluster is outside the scope of this document.
|
||||
|
||||
Nomad to Consul connectivity is over HTTP and should be secured with TLS as well as a Consul token to provide encryption of all traffic. This is done using Nomad's [Automatic Clustering with Consul](https://learn.hashicorp.com/nomad/operating-nomad/clustering#use-consul-to-automatically-cluster-nodes).
|
||||
Nomad to Consul connectivity is over HTTP and should be secured with TLS as well as a Consul token to provide encryption of all traffic. This is done using Nomad's [Automatic Clustering with Consul](/guides/operations/cluster/automatic).
|
||||
|
||||
### <a name="one-region"></a>Deployment Topology within a Single Region
|
||||
|
||||
|
@ -56,7 +56,7 @@ By deploying Nomad server clusters in multiple regions, the user is able to inte
|
|||
|
||||
Nomad server clusters in different datacenters can be federated using WAN links. The server clusters can be joined to communicate over the WAN on port `4648`. This same port is used for single datacenter deployments over LAN as well.
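For example, federation can be established by joining a server in one region to a server in another region over the Serf WAN port (a sketch; the address below is a placeholder for a server in the remote region):

```shell
$ nomad server join 203.0.113.10:4648
```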
|
||||
|
||||
Additional documentation is available to learn more about [Nomad server federation](https://learn.hashicorp.com/nomad/operating-nomad/federation).
|
||||
Additional documentation is available to learn more about [Nomad server federation](/guides/operations/federation).
|
||||
|
||||
## <a name="net"></a>Network Connectivity Details
|
||||
|
||||
|
@ -66,7 +66,7 @@ Nomad servers are expected to be able to communicate in high bandwidth, low late
|
|||
|
||||
Nomad client clusters require the ability to receive traffic as noted above in the Network Connectivity Details; however, clients can be separated into any type of infrastructure (multi-cloud, on-prem, virtual, bare metal, etc.) as long as they are reachable and can receive job requests from the Nomad servers.
|
||||
|
||||
Additional documentation is available to learn more about [Nomad networking](/docs/install/production/requirements#network-topology).
|
||||
Additional documentation is available to learn more about [Nomad networking](/guides/install/production/requirements#network-topology).
|
||||
|
||||
## <a name="system-reqs"></a>Deployment System Requirements
|
||||
|
||||
|
@ -107,7 +107,7 @@ Typical distribution in a cloud environment is to spread Nomad server nodes into
|
|||
|
||||
![Nomad fault tolerance](/img/nomad_fault_tolerance.png)
|
||||
|
||||
Additional documentation is available to learn more about [cluster sizing and failure tolerances](/docs/internals/consensus#deployment-table) as well as [outage recovery](https://learn.hashicorp.com/nomad/operating-nomad/outage).
|
||||
Additional documentation is available to learn more about [cluster sizing and failure tolerances](/docs/internals/consensus#deployment-table) as well as [outage recovery](/guides/operations/outage).
|
||||
|
||||
### Availability Zone Failure
|
||||
|
||||
|
@ -123,12 +123,12 @@ If the AZ containing a Nomad follower server fails, there is no immediate impact
|
|||
|
||||
### Region Failure
|
||||
|
||||
In the event of a region-level failure (which would contain an entire Nomad server cluster), clients will still be able to submit jobs to another region that is properly federated. However, there will likely be data loss as Nomad server clusters do not replicate their data to other region clusters. See [Multi-region Federation](https://learn.hashicorp.com/nomad/operating-nomad/federation) for more setup information.
|
||||
In the event of a region-level failure (which would contain an entire Nomad server cluster), clients will still be able to submit jobs to another region that is properly federated. However, there will likely be data loss as Nomad server clusters do not replicate their data to other region clusters. See [Multi-region Federation](/guides/operations/federation) for more setup information.
|
||||
|
||||
## Next Steps
|
||||
|
||||
- Read [Deployment Guide](/docs/install/production/deployment-guide) to learn
|
||||
- Read [Deployment Guide](/guides/install/production/deployment-guide) to learn
|
||||
the steps required to install and configure a single HashiCorp Nomad cluster.
|
||||
|
||||
[acl]: https://learn.hashicorp.com/nomad?track=acls#acls
|
||||
[acl]: /guides/security/acl
|
||||
[sentinel]: https://learn.hashicorp.com/nomad/governance-and-policy/sentinel
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Hardware Requirements
|
||||
sidebar_title: Hardware Requirements
|
||||
description: |-
|
||||
|
@ -17,10 +17,9 @@ significant network bandwidth. The core count and network recommendations are to
|
|||
ensure high throughput as Nomad heavily relies on network communication and as
|
||||
the Servers are managing all the nodes in the region and performing scheduling.
|
||||
The memory and disk requirements are due to the fact that Nomad stores all state
|
||||
in memory and will store two snapshots of this data onto disk, which causes high
|
||||
IO in busy clusters with lots of writes. Thus disk should be at least 2 times
|
||||
the memory available to the server when deploying a high load cluster. When
|
||||
running on AWS prefer NVME or Provisioned IOPS SSD storage for data dir.
|
||||
in memory and will store two snapshots of this data onto disk, which causes high IO in busy clusters with lots of writes. Thus disk should
|
||||
be at least 2 times the memory available to the server when deploying a high
|
||||
load cluster. When running on AWS prefer NVME or Provisioned IOPS SSD storage for data dir.
|
||||
|
||||
These recommendations are guidelines and operators should always monitor the
|
||||
resource usage of Nomad to determine if the machines are under or over-sized.
|
||||
|
@ -30,8 +29,8 @@ used by Nomad. This should be used to target a specific resource utilization per
|
|||
node and to reserve resources for applications running outside of Nomad's
|
||||
supervision such as Consul and the operating system itself.
|
||||
|
||||
Please see the [reservation configuration](/docs/configuration/client#reserved)
|
||||
for more detail.
|
||||
Please see the [reservation configuration](/docs/configuration/client#reserved) for
|
||||
more detail.
|
||||
|
||||
## Network Topology
|
||||
|
||||
|
@ -71,9 +70,9 @@ port.
|
|||
- RPC (Default 4647). This is used for internal RPC communication between client
|
||||
agents and servers, and for inter-server traffic. TCP only.
|
||||
|
||||
- Serf WAN (Default 4648). This is used by servers to gossip both over the LAN
|
||||
and WAN to other servers. It isn't required that Nomad clients can reach this
|
||||
address. TCP and UDP.
|
||||
- Serf WAN (Default 4648). This is used by servers to gossip both over the LAN and
|
||||
WAN to other servers. It isn't required that Nomad clients can reach this address.
|
||||
TCP and UDP.
|
||||
|
||||
When tasks ask for dynamic ports, they are allocated out of the port range
|
||||
between 20,000 and 32,000. This is well under the ephemeral port range suggested
|
||||
|
@ -91,15 +90,9 @@ $ echo "49152 65535" > /proc/sys/net/ipv4/ip_local_port_range
|
|||
|
||||
## Bridge Networking and `iptables`
|
||||
|
||||
Nomad's task group networks and Consul Connect integration use bridge networking
|
||||
and iptables to send traffic between containers. The Linux kernel bridge module
|
||||
has three "tunables" that control whether traffic crossing the bridge are
|
||||
processed by iptables. Some operating systems (RedHat, CentOS, and Fedora in
|
||||
particular) configure these tunables to optimize for VM workloads where iptables
|
||||
rules might not be correctly configured for guest traffic.
|
||||
Nomad's task group networks and Consul Connect integration use bridge networking and iptables to send traffic between containers. The Linux kernel bridge module has three "tunables" that control whether traffic crossing the bridge is processed by iptables. Some operating systems (RedHat, CentOS, and Fedora in particular) configure these tunables to optimize for VM workloads where iptables rules might not be correctly configured for guest traffic.
|
||||
|
||||
These tunables can be set to allow iptables processing for the bridge network as
|
||||
follows:
|
||||
These tunables can be set to allow iptables processing for the bridge network as follows:
|
||||
|
||||
```shell
|
||||
$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-arptables
|
||||
|
@ -107,9 +100,7 @@ $ echo 1 > /proc/sys/net/bridge/bridge-nf-call-ip6tables
|
|||
$ echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
|
||||
```
|
||||
|
||||
To preserve these settings on startup of a client node, add a file including the
|
||||
following to `/etc/sysctl.d/` or remove the file your Linux distribution puts in
|
||||
that directory.
|
||||
To preserve these settings on startup of a client node, add a file including the following to `/etc/sysctl.d/` or remove the file your Linux distribution puts in that directory.
|
||||
|
||||
```text
|
||||
net.bridge.bridge-nf-call-arptables = 1
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Installing Nomad for QuickStart
|
||||
sidebar_title: Quickstart
|
||||
description: Learn how to install Nomad locally or in a sandbox.
|
||||
|
@ -12,7 +12,8 @@ environment.
|
|||
|
||||
These installations are designed to get you started with Nomad easily and should
|
||||
be used only for experimentation purposes. If you are looking to install Nomad
|
||||
in production, please refer to our [Production Installation] guide here.
|
||||
in production, please refer to our [Production
|
||||
Installation](/guides/install/production) guide here.
|
||||
|
||||
## Local
|
||||
|
||||
|
@ -38,6 +39,5 @@ Experiment with Nomad in your browser via KataCoda's interactive learning platfo
|
|||
- [Introduction to Nomad](https://www.katacoda.com/hashicorp/scenarios/nomad-introduction)
|
||||
- [Nomad Playground](https://katacoda.com/hashicorp/scenarios/playground)
|
||||
|
||||
[installing-binary]: /docs/install#precompiled-binaries
|
||||
[installing-binary]: /guides/install#precompiled-binaries
|
||||
[vagrant-environment]: /intro/getting-started/install#vagrant-setup-optional-
|
||||
[production installation]: /docs/install/production
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Nomad as a Windows Service
|
||||
sidebar_title: Windows Service
|
||||
description: Discusses how to register and run Nomad as a native Windows service.
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Consul Connect
|
||||
sidebar_title: Consul Connect
|
||||
description: >-
|
||||
|
@ -13,8 +13,8 @@ description: >-
|
|||
later.
|
||||
|
||||
~> **Note:** Nomad's Connect integration requires that your underlying operating
|
||||
system to support linux network namespaces. Nomad Connect will not run on
|
||||
Windows or macOS.
|
||||
system to support linux network namespaces. Nomad Connect will not run on
|
||||
Windows or macOS.
|
||||
|
||||
[Consul Connect](https://www.consul.io/docs/connect) provides
|
||||
service-to-service connection authorization and encryption using mutual
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Consul Integration
|
||||
sidebar_title: Consul
|
||||
description: Learn how to integrate Nomad with Consul and add service discovery to jobs
|
||||
|
@ -29,7 +29,7 @@ configuration.
|
|||
Nomad servers and clients will be automatically informed of each other's
|
||||
existence when a running Consul cluster already exists and the Consul agent is
|
||||
installed and configured on each host. Please see the [Automatic Clustering with
|
||||
Consul](https://learn.hashicorp.com/nomad/operating-nomad/clustering#use-consul-to-automatically-cluster-nodes) guide for more information.
|
||||
Consul](/guides/operations/cluster/automatic) guide for more information.
|
||||
|
||||
## Service Discovery
|
||||
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Nomad HashiStack Integrations
|
||||
sidebar_title: Integrations
|
||||
description: This section features Nomad's integrations with Consul and Vault.
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
layout: docs
|
||||
layout: guides
|
||||
page_title: Vault Integration and Retrieving Dynamic Secrets
|
||||
sidebar_title: Vault
|
||||
description: |-
|
||||
|
@ -687,7 +687,7 @@ below </h2>
|
|||
[creation-statements]: https://www.vaultproject.io/api/secret/databases#creation_statements
|
||||
[destination]: /docs/job-specification/template#destination
|
||||
[fabio]: https://github.com/fabiolb/fabio
|
||||
[fabio-lb]: https://learn.hashicorp.com/nomad/load-balancing/fabio
|
||||
[fabio-lb]: https://learn.hashicorp.com/guides/load-balancing/fabio
|
||||
[inline]: /docs/job-specification/template#inline-template
|
||||
[login]: https://www.vaultproject.io/docs/commands/login
|
||||
[nomad-alloc-fs]: /docs/commands/alloc/fs
|
||||
|
@ -699,7 +699,7 @@ below </h2>
|
|||
[role]: https://www.vaultproject.io/docs/auth/token
|
||||
[seal]: https://www.vaultproject.io/docs/concepts/seal
|
||||
[secrets-task-directory]: /docs/runtime/environment#secrets-
|
||||
[step-5]: /docs/integrations/vault-integration#step-5-create-a-token-role
|
||||
[step-5]: /guides/integrations/vault-integration#step-5-create-a-token-role
|
||||
[template]: /docs/job-specification/template
|
||||
[token]: https://www.vaultproject.io/docs/concepts/tokens
|
||||
[vault]: https://www.vaultproject.io/
|
21
website/pages/guides/load-balancing/index.mdx
Normal file
|
@ -0,0 +1,21 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Load Balancing
|
||||
sidebar_title: Load Balancing
|
||||
description: |-
|
||||
There are multiple approaches to load balancing within a Nomad cluster. This
|
||||
discusses the most popular strategies.
|
||||
---
|
||||
|
||||
# Load Balancing
|
||||
|
||||
These guides have been migrated to [HashiCorp's Learn website].
|
||||
|
||||
You can follow these links to find the specific guides on Learn:
|
||||
|
||||
- [Fabio](https://learn.hashicorp.com/nomad/load-balancing/fabio)
|
||||
- [NGINX](https://learn.hashicorp.com/nomad/load-balancing/nginx)
|
||||
- [HAProxy](https://learn.hashicorp.com/nomad/load-balancing/haproxy)
|
||||
- [Traefik](https://learn.hashicorp.com/nomad/load-balancing/traefik)
|
||||
|
||||
[hashicorp's learn website]: https://learn.hashicorp.com/nomad?track=load-balancing#load-balancing
|
126
website/pages/guides/operating-a-job/accessing-logs.mdx
Normal file
|
@ -0,0 +1,126 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Accessing Logs - Operating a Job
|
||||
sidebar_title: Accessing Logs
|
||||
description: |-
|
||||
Nomad provides a top-level mechanism for viewing application logs and data
|
||||
files via the command line interface. This section discusses the nomad alloc
|
||||
logs command and API interface.
|
||||
---
|
||||
|
||||
# Accessing Logs
|
||||
|
||||
Viewing application logs is critical for debugging issues, examining performance
|
||||
problems, or even just verifying the application started correctly. To make this
|
||||
as simple as possible, Nomad provides:
|
||||
|
||||
- Job specification for [log rotation](/docs/job-specification/logs)
|
||||
- CLI command for [log viewing](/docs/commands/alloc/logs)
|
||||
- API for programmatic [log access](/api/client#stream-logs)
|
||||
|
||||
This section will utilize the job named "docs" from the [previous
|
||||
sections](/guides/operating-a-job/submitting-jobs), but these operations
|
||||
and command largely apply to all jobs in Nomad.
|
||||
|
||||
As a reminder, here is the output of the run command from the previous example:
|
||||
|
||||
```shell
|
||||
$ nomad job run docs.nomad
|
||||
==> Monitoring evaluation "42d788a3"
|
||||
Evaluation triggered by job "docs"
|
||||
Allocation "04d9627d" created: node "a1f934c9", group "example"
|
||||
Allocation "e7b8d4f5" created: node "012ea79b", group "example"
|
||||
Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "42d788a3" finished with status "complete"
|
||||
```
|
||||
|
||||
The provided allocation ID (which is also available via the `nomad status`
|
||||
command) is required to access the application's logs. To access the logs of our
|
||||
application, we issue the following command:
|
||||
|
||||
```shell
|
||||
$ nomad alloc logs 04d9627d
|
||||
```
|
||||
|
||||
The output will look something like this:
|
||||
|
||||
```text
|
||||
<timestamp> 10.1.1.196:5678 10.1.1.196:33407 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 21.809µs
|
||||
<timestamp> 10.1.1.196:5678 10.1.1.196:33408 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 20.241µs
|
||||
<timestamp> 10.1.1.196:5678 10.1.1.196:33409 "GET / HTTP/1.1" 200 12 "curl/7.35.0" 13.629µs
|
||||
```
|
||||
|
||||
By default, this will return the logs of the task. If more than one task is
|
||||
defined in the job file, the name of the task is a required argument:
|
||||
|
||||
```shell
|
||||
$ nomad alloc logs 04d9627d server
|
||||
```
|
||||
|
||||
The logs command supports both displaying the logs as well as following logs,
|
||||
blocking for more output, similar to `tail -f`. To follow the logs, use the
|
||||
appropriately named `-f` flag:
|
||||
|
||||
```shell
|
||||
$ nomad alloc logs -f 04d9627d
|
||||
```
|
||||
|
||||
This will stream logs to our console.
|
||||
|
||||
If you wish to see only the tail of a log, use the `-tail` and `-n` flags:
|
||||
|
||||
```shell
|
||||
$ nomad alloc logs -tail -n 25 04d9627d
|
||||
```
|
||||
|
||||
This will show the last 25 lines. If you omit the `-n` flag, `-tail` will
|
||||
default to 10 lines.
|
||||
|
||||
By default, only the logs on stdout are displayed. To show the log output from
|
||||
stderr, use the `-stderr` flag:
|
||||
|
||||
```shell
|
||||
$ nomad alloc logs -stderr 04d9627d
|
||||
```
|
||||
|
||||
## Log Shipper Pattern
|
||||
|
||||
While the logs command works well for quickly accessing application logs, it
|
||||
generally does not scale to large systems or systems that produce a lot of log
|
||||
output, especially for the long-term storage of logs. Nomad's retention of log
|
||||
files is best effort, so chatty applications should use a better log retention
|
||||
strategy.
|
||||
|
||||
Since applications log to the `alloc/` directory, all tasks within the same task
|
||||
group have access to each other's logs. Thus it is possible to have a task group
|
||||
as follows:
|
||||
|
||||
```hcl
|
||||
group "my-group" {
|
||||
task "server" {
|
||||
# ...
|
||||
|
||||
# Setting the server task as the leader of the task group allows us to
|
||||
# signal the log shipper task to gracefully shutdown when the server exits.
|
||||
leader = true
|
||||
}
|
||||
|
||||
task "log-shipper" {
|
||||
# ...
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
In the above example, the `server` task is the application that should be run
|
||||
and will be producing the logs. The `log-shipper` reads those logs from the
|
||||
`alloc/logs/` directory and sends them to a longer-term storage solution such as
|
||||
Amazon S3 or an internal log aggregation system.
|
||||
|
||||
When using the log shipper pattern, especially for batch jobs, the main task
|
||||
should be marked as the [leader task](/docs/job-specification/task#leader).
|
||||
By marking the main task as a leader, when the task completes all other tasks
|
||||
within the group will be gracefully shut down. This allows the log shipper to
|
||||
finish sending any logs and then exit itself. The log shipper should set a
|
||||
high enough [`kill_timeout`](/docs/job-specification/task#kill_timeout)
|
||||
such that it can ship any remaining logs before exiting.
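A sketch of how the leader and kill timeout settings fit into the task group above (the `2m` value is an arbitrary illustration):

```hcl
group "my-group" {
  task "server" {
    # ...

    # The main application task leads the group; when it completes, the
    # remaining tasks are signaled to shut down.
    leader = true
  }

  task "log-shipper" {
    # ...

    # Allow up to two minutes after the shutdown signal for the shipper
    # to flush any remaining logs before it is killed.
    kill_timeout = "2m"
  }
}
```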
|
|
@ -0,0 +1,216 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Affinity
|
||||
sidebar_title: Placement Preferences with Affinities
|
||||
description: The following guide walks the user through using the affinity stanza in Nomad.
|
||||
---
|
||||
|
||||
# Expressing Job Placement Preferences with Affinities
|
||||
|
||||
The [affinity][affinity-stanza] stanza allows operators to express placement preferences for their jobs on particular types of nodes. Note that there is a key difference between the [constraint][constraint] stanza and the affinity stanza. The constraint stanza strictly filters where jobs are run based on [attributes][attributes] and [client metadata][client-metadata]. If no nodes are found to match, the placement does not succeed. The affinity stanza acts like a "soft constraint." Nomad will attempt to match the desired affinity, but placement will succeed even if no nodes match the desired criteria. This is done in conjunction with scoring based on the Nomad scheduler's bin packing algorithm which you can read more about [here][scheduling].
|
||||
|
||||
## Reference Material
|
||||
|
||||
- The [affinity][affinity-stanza] stanza documentation
|
||||
- [Scheduling][scheduling] with Nomad
|
||||
|
||||
## Estimated Time to Complete
|
||||
|
||||
20 minutes
|
||||
|
||||
## Challenge
|
||||
|
||||
Your application can run in datacenters `dc1` and `dc2`, but you have a strong preference to run it in `dc2`. Configure your job to tell the scheduler your preference while still allowing it to place your workload in `dc1` if the desired resources aren't available.
|
||||
|
||||
## Solution
|
||||
|
||||
Specify an affinity with the proper [weight][weight] so that the Nomad scheduler can find the best nodes on which to place your job. The affinity weight will be included when scoring nodes for placement along with other factors like the bin packing algorithm.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
To perform the tasks described in this guide, you need to have a Nomad
|
||||
environment with Consul installed. You can use this
|
||||
[repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud)
|
||||
to easily provision a sandbox environment. This guide will assume a cluster with
|
||||
one server node and three client nodes.
|
||||
|
||||
-> **Please Note:** This guide is for demo purposes and is only using a single server
|
||||
node. In a production cluster, 3 or 5 server nodes are recommended.
|
||||
|
||||
## Steps
|
||||
|
||||
### Step 1: Place One of the Client Nodes in a Different Datacenter
|
||||
|
||||
We are going to express our job placement preference based on the datacenter our
|
||||
nodes are located in. Choose one of your client nodes and edit `/etc/nomad.d/nomad.hcl` to change its location to `dc2`. A snippet of an example configuration file with the required change is shown below.
|
||||
|
||||
```shell
|
||||
data_dir = "/opt/nomad/data"
|
||||
bind_addr = "0.0.0.0"
|
||||
datacenter = "dc2"
|
||||
|
||||
# Enable the client
|
||||
client {
|
||||
enabled = true
|
||||
...
|
||||
```
|
||||
|
||||
After making the change on your chosen client node, restart the Nomad service:
|
||||
|
||||
```shell
|
||||
$ sudo systemctl restart nomad
|
||||
```
|
||||
|
||||
If everything worked correctly, you should be able to run the `nomad` [node status][node-status] command and see that one of your nodes is now in datacenter `dc2`.
|
||||
|
||||
```shell
|
||||
$ nomad node status
|
||||
ID DC Name Class Drain Eligibility Status
|
||||
3592943e dc1 ip-172-31-27-159 <none> false eligible ready
|
||||
3dea0188 dc1 ip-172-31-16-175 <none> false eligible ready
|
||||
6b6e9518 dc2 ip-172-31-27-25 <none> false eligible ready
|
||||
```
|
||||
|
||||
### Step 2: Create a Job with the `affinity` Stanza
|
||||
|
||||
Create a file with the name `redis.nomad` and place the following content in it:
|
||||
|
||||
```hcl
|
||||
job "redis" {
|
||||
datacenters = ["dc1", "dc2"]
|
||||
type = "service"
|
||||
|
||||
affinity {
|
||||
attribute = "${node.datacenter}"
|
||||
value = "dc2"
|
||||
weight = 100
|
||||
}
|
||||
|
||||
group "cache1" {
|
||||
count = 4
|
||||
|
||||
task "redis" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "redis:latest"
|
||||
port_map {
|
||||
db = 6379
|
||||
}
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
port "db" {}
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "redis-cache"
|
||||
port = "db"
|
||||
check {
|
||||
name = "alive"
|
||||
type = "tcp"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Note that we used the `affinity` stanza and specified `dc2` as the
|
||||
value for the [attribute][attributes] `${node.datacenter}`. We used the value `100` for the [weight][weight] which will cause the Nomad scheduler to rank nodes in datacenter `dc2` with a higher score. Keep in mind that weights can range from -100 to 100, inclusive. Negative weights serve as anti-affinities which cause Nomad to avoid placing allocations on nodes that match the criteria.
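For example, flipping the sign would steer placements away from `dc2` instead (a sketch showing only the `affinity` stanza; `-50` is arbitrary):

```hcl
affinity {
  attribute = "${node.datacenter}"
  value     = "dc2"

  # A negative weight acts as an anti-affinity: nodes in dc2 score lower,
  # so the scheduler prefers other datacenters when capacity allows.
  weight    = -50
}
```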
|
||||
|
||||
### Step 3: Register the Job `redis.nomad`
|
||||
|
||||
Run the Nomad job with the following command:
|
||||
|
||||
```shell
|
||||
$ nomad run redis.nomad
|
||||
==> Monitoring evaluation "11388ef2"
|
||||
Evaluation triggered by job "redis"
|
||||
Allocation "0dfcf0ba" created: node "6b6e9518", group "cache1"
|
||||
Allocation "89a9aae9" created: node "3592943e", group "cache1"
|
||||
Allocation "9a00f742" created: node "6b6e9518", group "cache1"
|
||||
Allocation "fc0f21bc" created: node "3dea0188", group "cache1"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "11388ef2" finished with status "complete"
|
||||
```
|
||||
|
||||
Note that two of the allocations in this example have been placed on node `6b6e9518`. This is the node we configured to be in datacenter `dc2`. The Nomad scheduler selected this node because of the affinity we specified. Not all of the allocations have been placed on this node, however, because the Nomad scheduler considers other factors in the scoring, such as bin packing. This helps avoid placing too many instances of the same job on a node and prevents reduced capacity during a node-level failure. We will take a detailed look at the scoring in the next few steps.
|
||||
|
||||
### Step 4: Check the Status of the `redis` Job
|
||||
|
||||
At this point, we are going to check the status of our job and verify where our
|
||||
allocations have been placed. Run the following command:
|
||||
|
||||
```shell
|
||||
$ nomad status redis
|
||||
```
|
||||
|
||||
You should see 4 instances of your job running in the `Summary` section of the
|
||||
output as shown below:
|
||||
|
||||
```shell
|
||||
...
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
cache1 0 0 4 0 0 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created Modified
|
||||
0dfcf0ba 6b6e9518 cache1 0 run running 1h44m ago 1h44m ago
|
||||
89a9aae9 3592943e cache1 0 run running 1h44m ago 1h44m ago
|
||||
9a00f742 6b6e9518 cache1 0 run running 1h44m ago 1h44m ago
|
||||
fc0f21bc 3dea0188 cache1 0 run running 1h44m ago 1h44m ago
|
||||
```
|
||||
|
||||
You can cross-check this output with the results of the `nomad node status` command to verify that the majority of your workload has been placed on the node in `dc2` (in our case, that node is `6b6e9518`).
|
||||
|
||||
### Step 5: Obtain Detailed Scoring Information on Job Placement
|
||||
|
||||
The Nomad scheduler will not always place all of your workload on nodes you have specified in the `affinity` stanza even if the resources are available. This is because affinity scoring is combined with other metrics as well before making a scheduling decision. In this step, we will take a look at some of those other factors.
|
||||
|
||||
Using the output from the previous step, find an allocation that has been placed
|
||||
on a node in `dc2` and use the nomad [alloc status][alloc status] command with
|
||||
the [verbose][verbose] option to obtain detailed scoring information on it. In
|
||||
this example, we will use the allocation ID `0dfcf0ba` (your allocation IDs will
|
||||
be different).
|
||||
|
||||
```shell
|
||||
$ nomad alloc status -verbose 0dfcf0ba
|
||||
```
|
||||
|
||||
The resulting output will show the `Placement Metrics` section at the bottom.
|
||||
|
||||
```shell
|
||||
...
|
||||
Placement Metrics
|
||||
Node binpack job-anti-affinity node-reschedule-penalty node-affinity final score
|
||||
6b6e9518-d2a4-82c8-af3b-6805c8cdc29c 0.33 0 0 1 0.665
|
||||
3dea0188-ae06-ad98-64dd-a761ab2b1bf3 0.33 0 0 0 0.33
|
||||
3592943e-67e4-461f-d888-d5842372a4d4 0.33 0 0 0 0.33
|
||||
```
|
||||
|
||||
Note that the results from the `binpack`, `job-anti-affinity`,
|
||||
`node-reschedule-penalty`, and `node-affinity` columns are combined to produce the
|
||||
numbers listed in the `final score` column for each node. The Nomad scheduler
|
||||
uses the final score for each node in deciding where to make placements.
|
||||
|
||||
## Next Steps
|
||||
|
||||
Experiment with the weight provided in the `affinity` stanza (the value can be
|
||||
from -100 through 100) and observe how the final score given to each node
|
||||
changes (use the `nomad alloc status` command as shown in the previous step).
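
For example, a negative weight expresses anti-affinity. A sketch such as the following (illustrative values) tells the scheduler to avoid `dc2` rather than prefer it:

```hcl
affinity {
  attribute = "${node.datacenter}"
  value     = "dc2"
  weight    = -50
}
```

After re-registering the job, the `node-affinity` column in the placement metrics should turn negative for the `dc2` node, lowering its final score.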
|
||||
|
||||
[affinity-stanza]: /docs/job-specification/affinity
|
||||
[alloc status]: /docs/commands/alloc/status
|
||||
[attributes]: /docs/runtime/interpolation#node-variables-
|
||||
[constraint]: /docs/job-specification/constraint
|
||||
[client-metadata]: /docs/configuration/client#meta
|
||||
[node-status]: /docs/commands/node/status
|
||||
[scheduling]: /docs/internals/scheduling/scheduling
|
||||
[verbose]: /docs/commands/alloc/status#verbose
|
||||
[weight]: /docs/job-specification/affinity#weight
|
|
@ -0,0 +1,23 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Advanced Scheduling Features
|
||||
description: Introduce advanced scheduling features including affinity and spread.
|
||||
---
|
||||
|
||||
# Advanced Scheduling with Nomad
|
||||
|
||||
The Nomad [scheduler][scheduling] uses a bin packing algorithm to optimize the resource utilization and density of applications in your Nomad cluster. Nomad 0.9 adds new features to allow operators more fine-grained control over allocation placement. This enables use cases similar to the following:
|
||||
|
||||
- Expressing preference for a certain class of nodes for a specific application via the [affinity stanza][affinity-stanza].
|
||||
- Spreading allocations across a datacenter, rack or any other node attribute or metadata with the [spread stanza][spread-stanza].
|
||||
|
||||
Please refer to the guides below for using affinity and spread in Nomad 0.9.
|
||||
|
||||
- [Affinity][affinity-guide]
|
||||
- [Spread][spread-guide]
|
||||
|
||||
[affinity-guide]: /guides/operating-a-job/advanced-scheduling/affinity
|
||||
[affinity-stanza]: /docs/job-specification/affinity
|
||||
[spread-guide]: /guides/operating-a-job/advanced-scheduling/spread
|
||||
[spread-stanza]: /docs/job-specification/spread
|
||||
[scheduling]: /docs/internals/scheduling/scheduling
|
|
@ -0,0 +1,447 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Preemption (Service and Batch Jobs)
|
||||
sidebar_title: Preemption (Service and Batch Jobs)
|
||||
description: |-
|
||||
The following guide walks the user through enabling and using preemption on
|
||||
service and batch jobs in Nomad Enterprise (0.9.3 and above).
|
||||
---
|
||||
|
||||
# Preemption for Service and Batch Jobs
|
||||
|
||||
~> **Enterprise Only!** This functionality only exists in Nomad Enterprise. This
|
||||
is not present in the open source version of Nomad.
|
||||
|
||||
Prior to Nomad 0.9, job [priority][priority] in Nomad was used to process
|
||||
scheduling requests in priority order. Preemption, implemented in Nomad 0.9,
|
||||
allows Nomad to evict running allocations to place allocations of a higher
|
||||
priority. Allocations of a job that are blocked temporarily go into "pending"
|
||||
status until the cluster has additional capacity to run them. This is useful
|
||||
when operators need to run relatively higher priority tasks sooner even under
|
||||
resource contention across the cluster.
|
||||
|
||||
While Nomad 0.9 introduced preemption for [system][system-job] jobs, Nomad 0.9.3
|
||||
[Enterprise][enterprise] additionally allows preemption for
|
||||
[service][service-job] and [batch][batch-job] jobs. This functionality can
|
||||
easily be enabled by sending a [payload][payload-preemption-config] with the
|
||||
appropriate options specified to the [scheduler
|
||||
configuration][update-scheduler] API endpoint.
|
||||
|
||||
## Reference Material
|
||||
|
||||
- [Preemption][preemption]
|
||||
- [Nomad Enterprise Preemption][enterprise-preemption]
|
||||
|
||||
## Estimated Time to Complete
|
||||
|
||||
20 minutes
|
||||
|
||||
## Prerequisites
|
||||
|
||||
To perform the tasks described in this guide, you need to have a Nomad
|
||||
environment with Consul installed. You can use this
|
||||
[repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud)
|
||||
to easily provision a sandbox environment. This guide will assume a cluster with
|
||||
one server node and three client nodes. To simulate resource contention, the
|
||||
nodes in this environment will each have 1 GB RAM (For AWS, you can choose the
|
||||
[t2.micro][t2-micro] instance type). Remember that service and batch job
|
||||
preemption require Nomad 0.9.3 [Enterprise][enterprise].
|
||||
|
||||
-> **Please Note:** This guide is for demo purposes and is only using a single
|
||||
server node. In a production cluster, 3 or 5 server nodes are recommended.
|
||||
|
||||
## Steps
|
||||
|
||||
### Step 1: Create a Job with Low Priority
|
||||
|
||||
Start by creating a job with a relatively low priority in your Nomad cluster.
|
||||
One of the allocations from this job will be preempted in a subsequent
|
||||
deployment when there is resource contention in the cluster. Copy the
|
||||
following job into a file and name it `webserver.nomad`.
|
||||
|
||||
```hcl
|
||||
job "webserver" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
priority = 40
|
||||
|
||||
group "webserver" {
|
||||
count = 3
|
||||
|
||||
task "apache" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "httpd:latest"
|
||||
|
||||
port_map {
|
||||
http = 80
|
||||
}
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "http"{}
|
||||
}
|
||||
|
||||
memory = 600
|
||||
}
|
||||
|
||||
service {
|
||||
name = "apache-webserver"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
name = "alive"
|
||||
type = "http"
|
||||
path = "/"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Note that the [count][count] is 3 and that each allocation is specifying 600 MB
|
||||
of [memory][memory]. Remember that each node only has 1 GB of RAM.
|
||||
|
||||
### Step 2: Run the Low Priority Job
|
||||
|
||||
Register `webserver.nomad`:
|
||||
|
||||
```shell
|
||||
$ nomad run webserver.nomad
|
||||
==> Monitoring evaluation "1596bfc8"
|
||||
Evaluation triggered by job "webserver"
|
||||
Allocation "725d3b49" created: node "16653ac1", group "webserver"
|
||||
Allocation "e2f9cb3d" created: node "f765c6e8", group "webserver"
|
||||
Allocation "e9d8df1b" created: node "b0700ec0", group "webserver"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "1596bfc8" finished with status "complete"
|
||||
```
|
||||
|
||||
You should be able to check the status of the `webserver` job at this point and see that an allocation has been placed on each client node in the cluster:
|
||||
|
||||
```shell
|
||||
$ nomad status webserver
|
||||
ID = webserver
|
||||
Name = webserver
|
||||
Submit Date = 2019-06-19T04:20:32Z
|
||||
Type = service
|
||||
Priority = 40
|
||||
...
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created Modified
|
||||
725d3b49 16653ac1 webserver 0 run running 1m18s ago 59s ago
|
||||
e2f9cb3d f765c6e8 webserver 0 run running 1m18s ago 1m2s ago
|
||||
e9d8df1b b0700ec0 webserver 0 run running 1m18s ago 59s ago
|
||||
```
|
||||
|
||||
### Step 3: Create a Job with High Priority
|
||||
|
||||
Create another job with a [priority][priority] greater than the job you just deployed. Copy the following into a file named `redis.nomad`:
|
||||
|
||||
```hcl
|
||||
job "redis" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
priority = 80
|
||||
|
||||
group "cache1" {
|
||||
count = 1
|
||||
|
||||
task "redis" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "redis:latest"
|
||||
|
||||
port_map {
|
||||
db = 6379
|
||||
}
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
port "db" {}
|
||||
}
|
||||
|
||||
memory = 700
|
||||
}
|
||||
|
||||
service {
|
||||
name = "redis-cache"
|
||||
port = "db"
|
||||
|
||||
check {
|
||||
name = "alive"
|
||||
type = "tcp"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Note that this job has a priority of 80 (greater than the priority of the job
|
||||
from [Step 1][step-1]) and requires 700 MB of memory. This allocation will
|
||||
create resource contention in the cluster since each node only has 1 GB of
|
||||
memory with a 600 MB allocation already placed on it.
|
||||
|
||||
### Step 4: Try to Run `redis.nomad`
|
||||
|
||||
Remember that preemption for service and batch jobs is [disabled by
|
||||
default][preemption-config]. This means that the `redis` job will be queued due
|
||||
to resource contention in the cluster. You can verify the resource contention before actually registering your job by running the [`plan`][plan] command:
|
||||
|
||||
```shell
|
||||
$ nomad plan redis.nomad
|
||||
+ Job: "redis"
|
||||
+ Task Group: "cache1" (1 create)
|
||||
+ Task: "redis" (forces create)
|
||||
|
||||
Scheduler dry-run:
|
||||
- WARNING: Failed to place all allocations.
|
||||
Task Group "cache1" (failed to place 1 allocation):
|
||||
* Resources exhausted on 3 nodes
|
||||
* Dimension "memory" exhausted on 3 nodes
|
||||
```
|
||||
|
||||
Run the job to see that the allocation will be queued:
|
||||
|
||||
```shell
|
||||
$ nomad run redis.nomad
|
||||
==> Monitoring evaluation "1e54e283"
|
||||
Evaluation triggered by job "redis"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "1e54e283" finished with status "complete" but failed to place all allocations:
|
||||
Task Group "cache1" (failed to place 1 allocation):
|
||||
* Resources exhausted on 3 nodes
|
||||
* Dimension "memory" exhausted on 3 nodes
|
||||
Evaluation "1512251a" waiting for additional capacity to place remainder
|
||||
```
|
||||
|
||||
You can also verify that the allocation has been queued by checking the status of the job:
|
||||
|
||||
```shell
|
||||
$ nomad status redis
|
||||
ID = redis
|
||||
Name = redis
|
||||
Submit Date = 2019-06-19T03:33:17Z
|
||||
Type = service
|
||||
Priority = 80
|
||||
...
|
||||
Placement Failure
|
||||
Task Group "cache1":
|
||||
* Resources exhausted on 3 nodes
|
||||
* Dimension "memory" exhausted on 3 nodes
|
||||
|
||||
Allocations
|
||||
No allocations placed
|
||||
```
|
||||
|
||||
You may remove this job now. In the next steps, we will enable service job preemption and re-deploy:
|
||||
|
||||
```shell
|
||||
$ nomad stop -purge redis
|
||||
==> Monitoring evaluation "153db6c0"
|
||||
Evaluation triggered by job "redis"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "153db6c0" finished with status "complete"
|
||||
```
|
||||
|
||||
### Step 5: Enable Service Job Preemption
|
||||
|
||||
Verify the [scheduler configuration][scheduler-configuration] with the following
|
||||
command:
|
||||
|
||||
```shell
|
||||
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
|
||||
{
|
||||
"SchedulerConfig": {
|
||||
"PreemptionConfig": {
|
||||
"SystemSchedulerEnabled": true,
|
||||
"BatchSchedulerEnabled": false,
|
||||
"ServiceSchedulerEnabled": false
|
||||
},
|
||||
"CreateIndex": 5,
|
||||
"ModifyIndex": 506
|
||||
},
|
||||
"Index": 506,
|
||||
"LastContact": 0,
|
||||
"KnownLeader": true
|
||||
}
|
||||
```
|
||||
|
||||
Note that [BatchSchedulerEnabled][batch-enabled] and
|
||||
[ServiceSchedulerEnabled][service-enabled] are both set to `false` by default.
|
||||
Since we are preempting service jobs in this guide, we need to set
|
||||
`ServiceSchedulerEnabled` to `true`. We will do this by directly interacting
|
||||
with the [API][update-scheduler].
|
||||
|
||||
Create the following JSON payload and place it in a file named `scheduler.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"PreemptionConfig": {
|
||||
"SystemSchedulerEnabled": true,
|
||||
"BatchSchedulerEnabled": false,
|
||||
"ServiceSchedulerEnabled": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Note that [ServiceSchedulerEnabled][service-enabled] has been set to `true`.
|
||||
|
||||
Run the following command to update the scheduler configuration:
|
||||
|
||||
```shell
|
||||
$ curl -XPOST localhost:4646/v1/operator/scheduler/configuration -d @scheduler.json
|
||||
```
|
||||
|
||||
You should now be able to check the scheduler configuration again and see that
|
||||
preemption has been enabled for service jobs (output below is abbreviated):
|
||||
|
||||
```shell
|
||||
$ curl -s localhost:4646/v1/operator/scheduler/configuration | jq
|
||||
{
|
||||
"SchedulerConfig": {
|
||||
"PreemptionConfig": {
|
||||
"SystemSchedulerEnabled": true,
|
||||
"BatchSchedulerEnabled": false,
|
||||
"ServiceSchedulerEnabled": true
|
||||
},
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
### Step 6: Try Running `redis.nomad` Again
|
||||
|
||||
Now that you have enabled preemption on service jobs, deploying your `redis` job
|
||||
should evict one of the lower priority `webserver` allocations and place it into
|
||||
a queue. You can run `nomad plan` to see a preview of what will happen:
|
||||
|
||||
```shell
|
||||
$ nomad plan redis.nomad
|
||||
+ Job: "redis"
|
||||
+ Task Group: "cache1" (1 create)
|
||||
+ Task: "redis" (forces create)
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
|
||||
Preemptions:
|
||||
|
||||
Alloc ID Job ID Task Group
|
||||
725d3b49-d5cf-6ba2-be3d-cb441c10a8b3 webserver webserver
|
||||
...
|
||||
```
|
||||
|
||||
Note that Nomad is indicating one of the `webserver` allocations will be
|
||||
evicted.
|
||||
|
||||
Now run the `redis` job:
|
||||
|
||||
```shell
|
||||
$ nomad run redis.nomad
|
||||
==> Monitoring evaluation "7ada9d9f"
|
||||
Evaluation triggered by job "redis"
|
||||
Allocation "8bfcdda3" created: node "16653ac1", group "cache1"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "7ada9d9f" finished with status "complete"
|
||||
```
|
||||
|
||||
You can check the status of the `webserver` job and verify one of the allocations has been evicted:
|
||||
|
||||
```shell
|
||||
$ nomad status webserver
|
||||
ID = webserver
|
||||
Name = webserver
|
||||
Submit Date = 2019-06-19T04:20:32Z
|
||||
Type = service
|
||||
Priority = 40
|
||||
...
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
webserver 1 0 2 0 1 0
|
||||
|
||||
Placement Failure
|
||||
Task Group "webserver":
|
||||
* Resources exhausted on 3 nodes
|
||||
* Dimension "memory" exhausted on 3 nodes
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created Modified
|
||||
725d3b49 16653ac1 webserver 0 evict complete 4m10s ago 33s ago
|
||||
e2f9cb3d f765c6e8 webserver 0 run running 4m10s ago 3m54s ago
|
||||
e9d8df1b b0700ec0 webserver 0 run running 4m10s ago 3m51s ago
|
||||
```
|
||||
|
||||
### Step 7: Stop the Redis Job
|
||||
|
||||
Stop the `redis` job and verify that the evicted/queued `webserver` allocation
|
||||
starts running again:
|
||||
|
||||
```shell
|
||||
$ nomad stop redis
|
||||
==> Monitoring evaluation "670922e9"
|
||||
Evaluation triggered by job "redis"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "670922e9" finished with status "complete"
|
||||
```
|
||||
|
||||
You should now be able to see from the `webserver` status that the previously preempted allocation has been replaced and three allocations are running again:
|
||||
|
||||
```shell
|
||||
$ nomad status webserver
|
||||
ID = webserver
|
||||
Name = webserver
|
||||
Submit Date = 2019-06-19T04:20:32Z
|
||||
Type = service
|
||||
Priority = 40
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
webserver 0 0 3 0 1 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created Modified
|
||||
f623eb81 16653ac1 webserver 0 run running 13s ago 7s ago
|
||||
725d3b49 16653ac1 webserver 0 evict complete 6m44s ago 3m7s ago
|
||||
e2f9cb3d f765c6e8 webserver 0 run running 6m44s ago 6m28s ago
|
||||
e9d8df1b b0700ec0 webserver 0 run running 6m44s ago 6m25s ago
|
||||
```
|
||||
|
||||
## Next Steps
|
||||
|
||||
The process you learned in this guide can also be applied to
|
||||
[batch][batch-enabled] jobs. Read more about preemption in Nomad
|
||||
Enterprise [here][enterprise-preemption].
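
For batch jobs, the only change is flipping the corresponding flag in the same scheduler configuration payload used earlier, for example:

```json
{
  "PreemptionConfig": {
    "SystemSchedulerEnabled": true,
    "BatchSchedulerEnabled": true,
    "ServiceSchedulerEnabled": true
  }
}
```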
|
||||
|
||||
[batch-enabled]: /api/operator#batchschedulerenabled-1
|
||||
[batch-job]: /docs/schedulers#batch
|
||||
[count]: /docs/job-specification/group#count
|
||||
[enterprise]: /docs/enterprise
|
||||
[enterprise-preemption]: /docs/enterprise#preemption
|
||||
[memory]: /docs/job-specification/resources#memory
|
||||
[payload-preemption-config]: /api/operator#sample-payload-1
|
||||
[plan]: /docs/commands/job/plan
|
||||
[preemption]: /docs/internals/scheduling/preemption
|
||||
[preemption-config]: /api/operator#preemptionconfig-1
|
||||
[priority]: /docs/job-specification/job#priority
|
||||
[service-enabled]: /api/operator#serviceschedulerenabled-1
|
||||
[service-job]: /docs/schedulers#service
|
||||
[step-1]: #step-1-create-a-job-with-low-priority
|
||||
[system-job]: /docs/schedulers#system
|
||||
[t2-micro]: https://aws.amazon.com/ec2/instance-types/
|
||||
[update-scheduler]: /api/operator#update-scheduler-configuration
|
||||
[scheduler-configuration]: /api/operator#read-scheduler-configuration
|
|
@ -0,0 +1,231 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Spread
|
||||
sidebar_title: Fault Tolerance with Spread
|
||||
description: The following guide walks the user through using the spread stanza in Nomad.
|
||||
---
|
||||
|
||||
# Increasing Failure Tolerance with Spread
|
||||
|
||||
The Nomad scheduler uses a bin packing algorithm when making job placements on nodes to optimize resource utilization and density of applications. Although bin packing ensures optimal resource utilization, it can lead to some nodes carrying a majority of allocations for a given job. This can cause cascading failures where the failure of a single node or a single data center can lead to application unavailability.
|
||||
|
||||
The [spread stanza][spread-stanza] solves this problem by allowing operators to distribute their workloads in a customized way based on [attributes][attributes] and/or [client metadata][client-metadata]. By using spread criteria in their job specification, Nomad job operators can ensure that failures across a domain such as datacenter or rack don't affect application availability.
|
||||
|
||||
## Reference Material
|
||||
|
||||
- The [spread][spread-stanza] stanza documentation
|
||||
- [Scheduling][scheduling] with Nomad
|
||||
|
||||
## Estimated Time to Complete
|
||||
|
||||
20 minutes
|
||||
|
||||
## Challenge
|
||||
|
||||
Consider a Nomad application that needs to be deployed to multiple datacenters within a region. Datacenter `dc1` has four nodes while `dc2` has one node. This application has 10 instances, and 7 of them must be deployed to `dc1` since it receives more user traffic and we need to make sure the application doesn't suffer downtime from having too few running instances to process requests. The remaining 3 allocations can be deployed to `dc2`.
|
||||
|
||||
## Solution
|
||||
|
||||
Use the `spread` stanza in the Nomad [job specification][job-specification] to ensure that 70% of the workload is placed in datacenter `dc1` and 30% is placed in `dc2`. The Nomad operator can use the [percent][percent] option with a [target][target] to customize the spread.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
To perform the tasks described in this guide, you need to have a Nomad
|
||||
environment with Consul installed. You can use this [repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud) to easily provision a sandbox environment. This guide will assume a cluster with one server node and five client nodes.
|
||||
|
||||
-> **Please Note:** This guide is for demo purposes and is only using a single
|
||||
server
|
||||
node. In a production cluster, 3 or 5 server nodes are recommended.
|
||||
|
||||
## Steps
|
||||
|
||||
### Step 1: Place One of the Client Nodes in a Different Datacenter
|
||||
|
||||
We are going to customize the spread for our job placement between the datacenters our nodes are located in. Choose one of your client nodes and edit `/etc/nomad.d/nomad.hcl` to change its location to `dc2`. A snippet of an example configuration file with the required change is shown below.
|
||||
|
||||
```shell
|
||||
data_dir = "/opt/nomad/data"
|
||||
bind_addr = "0.0.0.0"
|
||||
datacenter = "dc2"
|
||||
|
||||
# Enable the client
|
||||
client {
|
||||
enabled = true
|
||||
...
|
||||
```
|
||||
|
||||
After making the change on your chosen client node, restart the Nomad service:
|
||||
|
||||
```shell
|
||||
$ sudo systemctl restart nomad
|
||||
```
|
||||
|
||||
If everything worked correctly, you should be able to run the `nomad` [node status][node-status] command and see that one of your nodes is now in datacenter `dc2`.
|
||||
|
||||
```shell
|
||||
$ nomad node status
|
||||
ID DC Name Class Drain Eligibility Status
|
||||
5d16d949 dc2 ip-172-31-62-240 <none> false eligible ready
|
||||
7b381152 dc1 ip-172-31-59-115 <none> false eligible ready
|
||||
10cc48cc dc1 ip-172-31-58-46 <none> false eligible ready
|
||||
93f1e628 dc1 ip-172-31-58-113 <none> false eligible ready
|
||||
12894b80 dc1 ip-172-31-62-90 <none> false eligible ready
|
||||
```
|
||||
|
||||
### Step 2: Create a Job with the `spread` Stanza
|
||||
|
||||
Create a file with the name `redis.nomad` and place the following content in it:
|
||||
|
||||
```hcl
|
||||
job "redis" {
|
||||
datacenters = ["dc1", "dc2"]
|
||||
type = "service"
|
||||
|
||||
spread {
|
||||
attribute = "${node.datacenter}"
|
||||
weight = 100
|
||||
target "dc1" {
|
||||
percent = 70
|
||||
}
|
||||
target "dc2" {
|
||||
percent = 30
|
||||
}
|
||||
}
|
||||
|
||||
group "cache1" {
|
||||
count = 10
|
||||
|
||||
task "redis" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "redis:latest"
|
||||
port_map {
|
||||
db = 6379
|
||||
}
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
port "db" {}
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "redis-cache"
|
||||
port = "db"
|
||||
check {
|
||||
name = "alive"
|
||||
type = "tcp"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Note that we used the `spread` stanza and specified the [datacenter][attributes]
|
||||
attribute while targeting `dc1` and `dc2` with the `percent` option on each target. This tells the Nomad scheduler to attempt to place 70% of the workload on `dc1` and 30% of the workload on `dc2`.
|
||||
|
||||
### Step 3: Register the Job `redis.nomad`
|
||||
|
||||
Run the Nomad job with the following command:
|
||||
|
||||
```shell
|
||||
$ nomad run redis.nomad
|
||||
==> Monitoring evaluation "c3dc5ebd"
|
||||
Evaluation triggered by job "redis"
|
||||
Allocation "7a374183" created: node "5d16d949", group "cache1"
|
||||
Allocation "f4361df1" created: node "7b381152", group "cache1"
|
||||
Allocation "f7af42dc" created: node "5d16d949", group "cache1"
|
||||
Allocation "0638edf2" created: node "10cc48cc", group "cache1"
|
||||
Allocation "49bc6038" created: node "12894b80", group "cache1"
|
||||
Allocation "c7e5679a" created: node "5d16d949", group "cache1"
|
||||
Allocation "cf91bf65" created: node "7b381152", group "cache1"
|
||||
Allocation "d16b606c" created: node "12894b80", group "cache1"
|
||||
Allocation "27866df0" created: node "93f1e628", group "cache1"
|
||||
Allocation "8531a6fc" created: node "7b381152", group "cache1"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
```
|
||||
|
||||
Note that three of the ten allocations have been placed on node `5d16d949`. This is the node we configured to be in datacenter `dc2`. The Nomad scheduler has distributed 30% of the workload to `dc2` as we specified in the `spread` stanza.
|
||||
|
||||
Keep in mind that the Nomad scheduler still factors in other components into the overall scoring of nodes when making placements, so you should not expect the spread stanza to strictly implement your distribution preferences like a [constraint][constraint-stanza]. We will take a detailed look at the scoring in the next few steps.
|
||||
|
||||
### Step 4: Check the Status of the `redis` Job
|
||||
|
||||
At this point, we are going to check the status of our job and verify where our allocations have been placed. Run the following command:
|
||||
|
||||
```shell
|
||||
$ nomad status redis
|
||||
```
|
||||
|
||||
You should see 10 instances of your job running in the `Summary` section of the output, as shown below:
|
||||
|
||||
```shell
|
||||
...
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
cache1 0 0 10 0 0 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created Modified
|
||||
0638edf2 10cc48cc cache1 0 run running 2m20s ago 2m ago
|
||||
27866df0 93f1e628 cache1 0 run running 2m20s ago 1m57s ago
|
||||
49bc6038 12894b80 cache1 0 run running 2m20s ago 1m58s ago
|
||||
7a374183 5d16d949 cache1 0 run running 2m20s ago 2m1s ago
|
||||
8531a6fc 7b381152 cache1 0 run running 2m20s ago 2m2s ago
|
||||
c7e5679a 5d16d949 cache1 0 run running 2m20s ago 1m55s ago
|
||||
cf91bf65 7b381152 cache1 0 run running 2m20s ago 1m57s ago
|
||||
d16b606c 12894b80 cache1 0 run running 2m20s ago 2m1s ago
|
||||
f4361df1 7b381152 cache1 0 run running 2m20s ago 2m3s ago
|
||||
f7af42dc 5d16d949 cache1 0 run running 2m20s ago 1m54s ago
|
||||
```
|
||||
|
||||
You can cross-check this output with the results of the `nomad node status` command to verify that 30% of your workload has been placed on the node in `dc2` (in our case, that node is `5d16d949`).
|
||||
|
||||
### Step 5: Obtain Detailed Scoring Information on Job Placement
|
||||
|
||||
The Nomad scheduler will not always spread your
|
||||
workload in the way you have specified in the `spread` stanza even if the
|
||||
resources are available. This is because spread scoring is factored in with
|
||||
other metrics before a scheduling decision is made. In this step, we will take a look at some of those other factors.
|
||||
|
||||
Using the output from the previous step, take any allocation that has been placed on a node and use the nomad [alloc status][alloc status] command with the [verbose][verbose] option to obtain detailed scoring information on it. In this example, we will use the allocation ID `0638edf2` (your allocation IDs will be different).
|
||||
|
||||
```shell
|
||||
$ nomad alloc status -verbose 0638edf2
|
||||
```
|
||||
|
||||
The resulting output will show the `Placement Metrics` section at the bottom.
|
||||
|
||||
```shell
|
||||
...
|
||||
Placement Metrics
|
||||
Node node-affinity allocation-spread binpack job-anti-affinity node-reschedule-penalty final score
|
||||
10cc48cc-2913-af54-74d5-d7559f373ff2 0 0.429 0.33 0 0 0.379
|
||||
93f1e628-e509-b1ab-05b7-0944056f781d 0 0.429 0.515 -0.2 0 0.248
|
||||
12894b80-4943-4d5c-5716-c626c6b99be3 0 0.429 0.515 -0.2 0 0.248
|
||||
7b381152-3802-258b-4155-6d7dfb344dd4 0 0.429 0.515 -0.2 0 0.248
|
||||
5d16d949-85aa-3fd3-b5f4-51094cbeb77a 0 0.333 0.515 -0.2 0 0.216
|
||||
```
|
||||
|
||||
Note that the results from the `allocation-spread`, `binpack`, `job-anti-affinity`, `node-reschedule-penalty`, and `node-affinity` columns are combined to produce the numbers listed in the `final score` column for each node. The Nomad scheduler uses the final score for each node in deciding where to make placements.
|
||||
|
||||
## Next Steps
|
||||
|
||||
Change the values of the `percent` options on your targets in the `spread` stanza and observe how the placement behavior along with the final score given to each node changes (use the `nomad alloc status` command as shown in the previous step).
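
For example, an even split across the two datacenters would look like the following sketch:

```hcl
spread {
  attribute = "${node.datacenter}"
  weight    = 100

  target "dc1" {
    percent = 50
  }

  target "dc2" {
    percent = 50
  }
}
```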
|
||||
|
||||
[alloc status]: /docs/commands/alloc/status
|
||||
[attributes]: /docs/runtime/interpolation#node-variables-
|
||||
[client-metadata]: /docs/configuration/client#meta
|
||||
[constraint-stanza]: /docs/job-specification/constraint
|
||||
[job-specification]: /docs/job-specification
|
||||
[node-status]: /docs/commands/node/status
|
||||
[percent]: /docs/job-specification/spread#percent
|
||||
[spread-stanza]: /docs/job-specification/spread
|
||||
[scheduling]: /docs/internals/scheduling/scheduling
|
||||
[target]: /docs/job-specification/spread#target
|
||||
[verbose]: /docs/commands/alloc/status#verbose
|
website/pages/guides/operating-a-job/configuring-tasks.mdx
|
@ -0,0 +1,210 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Configuring Tasks - Operating a Job
|
||||
sidebar_title: Configuring Tasks
|
||||
description: |-
|
||||
Most applications require some kind of configuration. Whether the
|
||||
configuration is provided via the command line, environment variables, or a
|
||||
configuration file, Nomad has built-in functionality for configuration. This
|
||||
section details three common patterns for configuring tasks.
|
||||
---
|
||||
|
||||
# Configuring Tasks
|
||||
|
||||
Most applications require some kind of local configuration. While command line
|
||||
arguments are the simplest method, many applications require more complex
|
||||
configurations provided via environment variables or configuration files. This
|
||||
section explores how to configure Nomad jobs to support many common
|
||||
configuration use cases.
|
||||
|
||||
## Command-line Arguments
|
||||
|
||||
Many tasks accept configuration via command-line arguments. For example,
|
||||
consider the [http-echo](https://github.com/hashicorp/http-echo) server which
|
||||
is a small Go binary that renders the provided text as a webpage. The binary
|
||||
accepts two parameters:
|
||||
|
||||
- `-listen` - the `address:port` to listen on
|
||||
- `-text` - the text to render as the HTML page
|
||||
|
||||
Outside of Nomad, the server is started like this:
|
||||
|
||||
```shell
|
||||
$ http-echo -listen=":5678" -text="hello world"
|
||||
```
|
||||
|
||||
The Nomad equivalent job file might look something like this:
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
group "example" {
|
||||
task "server" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/bin/http-echo"
|
||||
args = [
|
||||
"-listen", ":5678",
|
||||
"-text", "hello world",
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "http" {
|
||||
static = "5678"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
~> **This assumes** the <tt>http-echo</tt> binary is already installed and
|
||||
available in the system path. Nomad can also optionally fetch the binary
|
||||
using the <tt>artifact</tt> resource.
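
For instance, a minimal `artifact` stanza that fetches the binary might look like the following sketch (the URL and version are illustrative):

```hcl
artifact {
  source      = "https://releases.hashicorp.com/http-echo/0.2.3/http-echo_0.2.3_linux_amd64.zip"
  destination = "local/"
}
```

The task's `command` would then point at the downloaded path (for example `local/http-echo`) instead of `/bin/http-echo`.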
|
||||
|
||||
Nomad has many [drivers](/docs/drivers), and most support passing
|
||||
arguments to their tasks via the `args` parameter. This parameter also supports
|
||||
[Nomad interpolation](/docs/runtime/interpolation). For example, if you
|
||||
wanted Nomad to dynamically allocate a high port to bind the service on instead
|
||||
of relying on a static port for the previous job:
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
group "example" {
|
||||
task "server" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/bin/http-echo"
|
||||
args = [
|
||||
"-listen", ":${NOMAD_PORT_http}",
|
||||
"-text", "hello world",
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "http" {}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Some applications can be configured via environment variables. [The
|
||||
Twelve-Factor App](https://12factor.net/config) document suggests configuring
|
||||
applications through environment variables. Nomad supports custom environment
|
||||
variables in two ways:
|
||||
|
||||
- Interpolation in an `env` stanza
|
||||
- Templated in a `template` stanza
|
||||
|
||||
### `env` stanza
|
||||
|
||||
Each task may have an `env` stanza which specifies environment variables:
|
||||
|
||||
```hcl
|
||||
task "server" {
|
||||
env {
|
||||
my_key = "my-value"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The `env` stanza also supports
|
||||
[interpolation](/docs/runtime/interpolation):
|
||||
|
||||
```hcl
|
||||
task "server" {
|
||||
env {
|
||||
LISTEN_PORT = "${NOMAD_PORT_http}"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
See the [`env`](/docs/job-specification/env) docs for details.
|
||||
|
||||
### Environment Templates
|
||||
|
||||
Nomad's [`template`][template] stanza can be used
|
||||
to generate environment variables. Environment variables may be templated with
|
||||
[Node attributes and metadata][nodevars], the contents of files on disk, Consul
|
||||
keys, or secrets from Vault:
|
||||
|
||||
```hcl
|
||||
template {
|
||||
data = <<EOH
|
||||
LOG_LEVEL="{{key "service/geo-api/log-verbosity"}}"
|
||||
API_KEY="{{with secret "secret/geo-api-key"}}{{.Data.key}}{{end}}"
|
||||
CERT={{ file "path/to/cert.pem" | toJSON }}
|
||||
NODE_ID="{{ env "node.unique.id" }}"
|
||||
EOH
|
||||
|
||||
destination = "secrets/config.env"
|
||||
env = true
|
||||
}
|
||||
```
|
||||
|
||||
The template will be written to disk and then read as environment variables
|
||||
before your task is launched.
|
||||
|
||||
## Configuration Files
|
||||
|
||||
Sometimes applications accept their configurations using files to support
|
||||
complex data structures. Nomad supports downloading
|
||||
[artifacts][artifact] and
|
||||
[templating][template] them prior to launching
|
||||
tasks.
|
||||
This allows shipping of configuration files and other assets that the task
|
||||
needs to run properly.
|
||||
|
||||
Here is an example job which pulls down a configuration file as an artifact and
|
||||
templates it:
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
group "example" {
|
||||
task "server" {
|
||||
driver = "exec"
|
||||
|
||||
artifact {
|
||||
source = "http://example.com/config.hcl.tmpl"
|
||||
destination = "local/config.hcl.tmpl"
|
||||
}
|
||||
|
||||
template {
|
||||
source = "local/config.hcl.tmpl"
|
||||
destination = "local/config.hcl"
|
||||
}
|
||||
|
||||
config {
|
||||
command = "my-app"
|
||||
args = [
|
||||
"-config", "local/config.hcl",
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
For more information on the artifact resource, please see the [artifact
|
||||
documentation](/docs/job-specification/artifact).
|
||||
|
||||
[artifact]: /docs/job-specification/artifact 'Nomad artifact Job Specification'
|
||||
[nodevars]: /docs/runtime/interpolation#interpreted_node_vars 'Nomad Node Variables'
|
||||
[template]: /docs/job-specification/template 'Nomad template Job Specification'
|
website/pages/guides/operating-a-job/external/index.mdx
|
@ -0,0 +1,17 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: External Plugins
|
||||
description: External plugins guides list
|
||||
---
|
||||
|
||||
## External Plugins
|
||||
|
||||
Starting with Nomad 0.9, task and device drivers are now pluggable. This gives
|
||||
users the flexibility to introduce their own drivers by providing a binary
|
||||
rather than having to fork Nomad to make code changes to it. See the navigation
|
||||
menu on the left or the list below for guides on how to use some of the external
|
||||
drivers available for Nomad.
|
||||
|
||||
- [LXC][lxc]
|
||||
|
||||
[lxc]: /guides/operating-a-job/external/lxc
|
website/pages/guides/operating-a-job/external/lxc.mdx
|
@ -0,0 +1,165 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: LXC
|
||||
sidebar_title: Running LXC Applications
|
||||
description: Guide for using LXC external task driver plugin.
|
||||
---
|
||||
|
||||
## LXC
|
||||
|
||||
The `lxc` driver provides an interface for using LXC for running application
|
||||
containers. This guide walks through the steps involved in configuring a Nomad client agent to be able to run lxc jobs. You can download the external LXC driver [here][lxc_driver_download].
|
||||
|
||||
~> Note: This guide is compatible with Nomad 0.9 and above. If you are using an older version of Nomad, see the [LXC][lxc-docs] driver documentation.
|
||||
|
||||
## Reference Material
|
||||
|
||||
- Official [LXC][linux-containers] documentation
|
||||
- Nomad [LXC][lxc-docs] external driver documentation
|
||||
- Nomad LXC external driver [repo][lxc-driver-repo]
|
||||
|
||||
## Installation Instructions
|
||||
|
||||
### Step 1: Install the `lxc` and `lxc-templates` Packages
|
||||
|
||||
Before deploying an LXC workload, you will need to install the `lxc` and `lxc-templates` packages which will provide the runtime and templates to start your container. Run the following command:
|
||||
|
||||
```shell
|
||||
$ sudo apt install -y lxc lxc-templates
|
||||
```
|
||||
|
||||
### Step 2: Download and Install the LXC Driver
|
||||
|
||||
External drivers must be placed in the [plugin_dir][plugin_dir] directory which
|
||||
defaults to [`data_dir`][data_dir]`/plugins`. Make a directory called `plugins` in [data_dir][data_dir] (which is `/opt/nomad/data` in the example below) and download/place the [LXC driver][lxc_driver_download] in it. The following sequence of commands illustrate this process:
|
||||
|
||||
```shell
|
||||
$ sudo mkdir -p /opt/nomad/data/plugins
|
||||
$ curl -O https://releases.hashicorp.com/nomad-driver-lxc/0.1.0-rc2/nomad-driver-lxc_0.1.0-rc2_linux_amd64.zip
|
||||
$ unzip nomad-driver-lxc_0.1.0-rc2_linux_amd64.zip
|
||||
Archive: nomad-driver-lxc_0.1.0-rc2_linux_amd64.zip
|
||||
inflating: nomad-driver-lxc
|
||||
$ sudo mv nomad-driver-lxc /opt/nomad/data/plugins
|
||||
```
|
||||
|
||||
You can now delete the original zip file:
|
||||
|
||||
```shell
|
||||
$ rm ./nomad-driver-lxc*.zip
|
||||
```
|
||||
|
||||
### Step 3: Verify the LXC Driver Status
|
||||
|
||||
After completing the previous steps, you do not need to explicitly enable the
|
||||
LXC driver in the client configuration, as it is enabled by default.
|
||||
|
||||
Restart the Nomad client service:
|
||||
|
||||
```shell
|
||||
$ sudo systemctl restart nomad
|
||||
```
|
||||
|
||||
After a few seconds, run the `nomad node status` command to verify the client
|
||||
node is ready:
|
||||
|
||||
```shell
|
||||
$ nomad node status
|
||||
ID DC Name Class Drain Eligibility Status
|
||||
81c22a0c dc1 ip-172-31-5-174 <none> false eligible ready
|
||||
```
|
||||
|
||||
You can now run the `nomad node status` command against the specific node ID to
|
||||
see which drivers are initialized on the client. In our case, the client node ID
|
||||
is `81c22a0c` (your client node ID will be different). You should see `lxc`
|
||||
appear in the `Driver Status` section as shown below:
|
||||
|
||||
```shell
|
||||
$ nomad node status 81c22a0c
|
||||
ID = 81c22a0c
|
||||
Name = ip-172-31-5-174
|
||||
Class = <none>
|
||||
DC = dc1
|
||||
Drain = false
|
||||
Eligibility = eligible
|
||||
Status = ready
|
||||
Uptime = 2h13m30s
|
||||
Driver Status = docker,exec,java,lxc,mock_driver,raw_exec,rkt
|
||||
...
|
||||
```
|
||||
|
||||
### Step 4: Register the Nomad Job
|
||||
|
||||
You can run this [LXC example job][lxc-job] to register a Nomad job that deploys an LXC workload.
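
The linked file contains the full example; a minimal sketch of such a job (assuming the stock busybox template installed in Step 1, with illustrative resource values) looks roughly like this:

```hcl
job "example-lxc" {
  datacenters = ["dc1"]
  type        = "service"

  group "example" {
    task "example" {
      driver = "lxc"

      config {
        log_level = "trace"
        verbosity = "verbose"
        template  = "/usr/share/lxc/templates/lxc-busybox"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```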
|
||||
|
||||
```shell
|
||||
$ nomad run lxc.nomad
|
||||
==> Monitoring evaluation "d8be10f4"
|
||||
Evaluation triggered by job "example-lxc"
|
||||
Allocation "4248c82e" created: node "81c22a0c", group "example"
|
||||
Allocation "4248c82e" status changed: "pending" -> "running" (Tasks are running)
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "d8be10f4" finished with status "complete"
|
||||
```
|
||||
|
||||
### Step 5: Check the Status of the Job
|
||||
|
||||
Run the following command to check the status of the jobs in your
|
||||
cluster:
|
||||
|
||||
```shell
|
||||
$ nomad status
|
||||
ID Type Priority Status Submit Date
|
||||
example-lxc service 50 running 2019-01-28T22:05:36Z
|
||||
```
|
||||
|
||||
As shown above, our job is successfully running. You can see detailed
|
||||
information about our specific job with the following command:
|
||||
|
||||
```shell
|
||||
$ nomad status example-lxc
|
||||
ID = example-lxc
|
||||
Name = example-lxc
|
||||
Submit Date = 2019-01-28T22:05:36Z
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
example 0 0 1 0 0 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created Modified
|
||||
4248c82e 81c22a0c example 0 run running 6m58s ago 6m47s ago
|
||||
```
|
||||
|
||||
### More Configuration Options
|
||||
|
||||
The LXC driver is enabled by default in the client configuration. In order to
|
||||
provide additional options to the LXC plugin, add [plugin
|
||||
options][lxc_plugin_options] `volumes_enabled` and `lxc_path` for the `lxc`
|
||||
driver in the client's configuration file, as in the following example:
|
||||
|
||||
```hcl
|
||||
plugin "nomad-driver-lxc" {
|
||||
config {
|
||||
enabled = true
|
||||
volumes_enabled = true
|
||||
lxc_path = "/var/lib/lxc"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
[data_dir]: /docs/configuration#data_dir
|
||||
[linux-containers]: https://linuxcontainers.org/lxc/introduction/
|
||||
[linux-containers-home]: https://linuxcontainers.org
|
||||
[lxc_driver_download]: https://releases.hashicorp.com/nomad-driver-lxc
|
||||
[lxc-driver-repo]: https://github.com/hashicorp/nomad-driver-lxc
|
||||
[lxc-docs]: /docs/drivers/external/lxc
|
||||
[lxc-job]: https://github.com/hashicorp/nomad-education-content/blob/master/lxc.nomad
|
||||
[lxc_plugin_options]: /docs/drivers/external/lxc#plugin-options
|
||||
[plugin_dir]: /docs/configuration#plugin_dir
|
||||
[plugin_syntax]: /docs/configuration/plugin
|
|
@ -0,0 +1,82 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Check Restart Stanza - Operating a Job
|
||||
sidebar_title: Check Restarts
|
||||
description: >-
|
||||
Nomad can restart tasks if they have a failing health check based on
|
||||
|
||||
configuration specified in the `check_restart` stanza. Restarts are done
|
||||
locally on the node
|
||||
|
||||
running the task based on their `restart` policy.
|
||||
---
|
||||
|
||||
# Check Restart Stanza
|
||||
|
||||
The [`check_restart` stanza][check restart] instructs Nomad when to restart tasks with unhealthy service checks.
|
||||
When a health check in Consul has been unhealthy for the limit specified in a `check_restart` stanza,
|
||||
the task is restarted according to the task group's restart policy.
|
||||
|
||||
The `limit` field is used to specify the number of times a failing healthcheck is seen before local restarts are attempted.
|
||||
Operators can also specify a `grace` duration to wait after a task restarts before checking its health.
|
||||
|
||||
We recommend configuring the check restart on services if it's likely that a restart would resolve the failure. This
|
||||
is applicable in cases like temporary memory issues on the service.
|
||||
|
||||
## Example
|
||||
|
||||
The following `check_restart` stanza waits for two consecutive health check failures with a
|
||||
grace period and considers both `critical` and `warning` statuses as failures:
|
||||
|
||||
```hcl
|
||||
check_restart {
|
||||
limit = 2
|
||||
grace = "10s"
|
||||
ignore_warnings = false
|
||||
}
|
||||
```
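
In a job file, this stanza is typically nested inside a service's `check` block; a minimal sketch (names are illustrative):

```hcl
service {
  name = "demo-service"
  port = "http"

  check {
    type     = "http"
    path     = "/"
    interval = "10s"
    timeout  = "2s"

    check_restart {
      limit           = 2
      grace           = "10s"
      ignore_warnings = false
    }
  }
}
```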
|
||||
|
||||
The following CLI example output shows health check failures triggering restarts until the task's
|
||||
restart limit is reached.
|
||||
|
||||
```shell
|
||||
$ nomad alloc status e1b43128-2a0a-6aa3-c375-c7e8a7c48690
|
||||
ID = e1b43128
|
||||
Eval ID = 249cbfe9
|
||||
Name = demo.demo[0]
|
||||
Node ID = 221e998e
|
||||
Job ID = demo
|
||||
Job Version = 0
|
||||
Client Status = failed
|
||||
Client Description = <none>
|
||||
Desired Status = run
|
||||
Desired Description = <none>
|
||||
Created = 2m59s ago
|
||||
Modified = 39s ago
|
||||
|
||||
Task "test" is "dead"
|
||||
Task Resources
|
||||
CPU Memory Disk Addresses
|
||||
100 MHz 300 MiB 300 MiB p1: 127.0.0.1:28422
|
||||
|
||||
Task Events:
|
||||
Started At = 2018-04-12T22:50:32Z
|
||||
Finished At = 2018-04-12T22:50:54Z
|
||||
Total Restarts = 3
|
||||
Last Restart = 2018-04-12T17:50:15-05:00
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
2018-04-12T17:50:54-05:00 Not Restarting Exceeded allowed attempts 3 in interval 30m0s and mode is "fail"
|
||||
2018-04-12T17:50:54-05:00 Killed Task successfully killed
|
||||
2018-04-12T17:50:54-05:00 Killing Sent interrupt. Waiting 5s before force killing
|
||||
2018-04-12T17:50:54-05:00 Restart Signaled healthcheck: check "service: \"demo-service-test\" check" unhealthy
|
||||
2018-04-12T17:50:32-05:00 Started Task started by client
|
||||
2018-04-12T17:50:15-05:00 Restarting Task restarting in 16.887291122s
|
||||
2018-04-12T17:50:15-05:00 Killed Task successfully killed
|
||||
2018-04-12T17:50:15-05:00 Killing Sent interrupt. Waiting 5s before force killing
|
||||
2018-04-12T17:50:15-05:00 Restart Signaled healthcheck: check "service: \"demo-service-test\" check" unhealthy
|
||||
2018-04-12T17:49:53-05:00 Started Task started by client
|
||||
```
|
||||
|
||||
[check restart]: /docs/job-specification/check_restart 'Nomad check restart Stanza'
|
|
@ -0,0 +1,26 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Handling Failures - Operating a Job
|
||||
sidebar_title: Failure Recovery Strategies
|
||||
description: >-
|
||||
This section describes features in Nomad that automate recovering from failed
|
||||
tasks.
|
||||
---
|
||||
|
||||
# Failure Recovery Strategies
|
||||
|
||||
Most applications deployed in Nomad are either long running services or one time batch jobs.
|
||||
They can fail for various reasons like:
|
||||
|
||||
- A temporary error in the service that resolves when it's restarted.
|
||||
- An upstream dependency might not be available, leading to a health check failure.
|
||||
- Disk, memory, or CPU contention on the node that the application is running on.
|
||||
- The application uses Docker and the Docker daemon on that node is unresponsive.
|
||||
|
||||
Nomad provides configurable options to enable recovering failed tasks to avoid downtime. Nomad will
|
||||
try to restart a failed task on the node it is running on, and also try to reschedule it on another node.
|
||||
Please see one of the guides below or use the navigation on the left for details on each option:
|
||||
|
||||
1. [Local Restarts](/guides/operating-a-job/failure-handling-strategies/restart)
|
||||
1. [Check Restarts](/guides/operating-a-job/failure-handling-strategies/check-restart)
|
||||
1. [Rescheduling](/guides/operating-a-job/failure-handling-strategies/reschedule)
|
|
@ -0,0 +1,96 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Reschedule Stanza - Operating a Job
|
||||
sidebar_title: Rescheduling
|
||||
description: >-
|
||||
Nomad can reschedule failing tasks after any local restart attempts have been
|
||||
|
||||
exhausted. This is useful to recover from failures stemming from problems in
|
||||
the node
|
||||
|
||||
running the task.
|
||||
---
|
||||
|
||||
# Reschedule Stanza
|
||||
|
||||
Tasks can sometimes fail due to network, CPU or memory issues on the node running the task. In such situations,
|
||||
Nomad can reschedule the task on another node. The [`reschedule` stanza][reschedule] can be used to configure how
|
||||
Nomad should try placing failed tasks on another node in the cluster. Reschedule attempts have a delay between
|
||||
each attempt, and the delay can be configured to increase between each rescheduling attempt according to a configurable
|
||||
`delay_function`. See the [`reschedule` stanza][reschedule] for more information.
|
||||
|
||||
Service jobs are configured by default to have unlimited reschedule attempts. We recommend using the reschedule
|
||||
stanza to ensure that failed tasks are automatically reattempted on another node without needing operator intervention.
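
As an illustration, a group-level `reschedule` stanza that caps the number of attempts might look like the following sketch (values are illustrative; see the [`reschedule` stanza][reschedule] documentation for the actual defaults):

```hcl
group "demo" {
  reschedule {
    attempts       = 5
    interval       = "1h"
    delay          = "30s"
    delay_function = "exponential"
    max_delay      = "10m"
    unlimited      = false
  }
}
```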
|
||||
|
||||
## Example
|
||||
|
||||
The following CLI example shows job and allocation statuses for a task being rescheduled by Nomad.
|
||||
The CLI shows the number of previous attempts if there is a limit on the number of reschedule attempts.
|
||||
The CLI also shows when the next reschedule will be attempted.
|
||||
|
||||
```shell
|
||||
$ nomad job status demo
|
||||
ID = demo
|
||||
Name = demo
|
||||
Submit Date = 2018-04-12T15:48:37-05:00
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = pending
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
demo 0 0 0 2 0 0
|
||||
|
||||
Future Rescheduling Attempts
|
||||
Task Group Eval ID Eval Time
|
||||
demo ee3de93f 5s from now
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created Modified
|
||||
39d7823d f2c2eaa6 demo 0 run failed 5s ago 5s ago
|
||||
fafb011b f2c2eaa6 demo 0 run failed 11s ago 10s ago
|
||||
|
||||
```
|
||||
|
||||
```shell
|
||||
$ nomad alloc status 3d0b
|
||||
ID = 3d0bbdb1
|
||||
Eval ID = 79b846a9
|
||||
Name = demo.demo[0]
|
||||
Node ID = 8a184f31
|
||||
Job ID = demo
|
||||
Job Version = 0
|
||||
Client Status = failed
|
||||
Client Description = <none>
|
||||
Desired Status = run
|
||||
Desired Description = <none>
|
||||
Created = 15s ago
|
||||
Modified = 15s ago
|
||||
Reschedule Attempts = 3/5
|
||||
Reschedule Eligibility = 25s from now
|
||||
|
||||
Task "demo" is "dead"
|
||||
Task Resources
|
||||
CPU Memory Disk Addresses
|
||||
100 MHz 300 MiB 300 MiB p1: 127.0.0.1:27646
|
||||
|
||||
Task Events:
|
||||
Started At = 2018-04-12T20:44:25Z
|
||||
Finished At = 2018-04-12T20:44:25Z
|
||||
Total Restarts = 0
|
||||
Last Restart = N/A
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
2018-04-12T15:44:25-05:00 Not Restarting Policy allows no restarts
|
||||
2018-04-12T15:44:25-05:00 Terminated Exit Code: 127
|
||||
2018-04-12T15:44:25-05:00 Started Task started by client
|
||||
2018-04-12T15:44:25-05:00 Task Setup Building Task Directory
|
||||
2018-04-12T15:44:25-05:00 Received Task received by client
|
||||
|
||||
```
|
||||
|
||||
[reschedule]: /docs/job-specification/reschedule 'Nomad reschedule Stanza'
|
|
@ -0,0 +1,98 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Restart Stanza - Operating a Job
|
||||
sidebar_title: Local Restarts
|
||||
description: >-
|
||||
Nomad can restart a task on the node it is running on to recover from
|
||||
|
||||
failures. Task restarts can be configured to be limited by number of attempts
|
||||
within
|
||||
|
||||
a specific interval.
|
||||
---
|
||||
|
||||
# Restart Stanza
|
||||
|
||||
To enable restarting a failed task on the node it is running on, the task group can be annotated
|
||||
with configurable options using the [`restart` stanza][restart]. Nomad will restart the failed task
|
||||
up to `attempts` times within a provided `interval`. Operators can also choose whether to
|
||||
keep attempting restarts on the same node, or to fail the task so that it can be rescheduled
|
||||
on another node, via the `mode` parameter.
|
||||
|
||||
We recommend setting mode to `fail` in the restart stanza to allow rescheduling the task on another node.
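
A restart policy consistent with the example output below would look roughly like this (a sketch; values are illustrative):

```hcl
group "demo" {
  restart {
    attempts = 3
    interval = "5m"
    delay    = "10s"
    mode     = "fail"
  }
}
```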
|
||||
|
||||
## Example
|
||||
|
||||
The following CLI example shows job status and allocation status for a failed task that is being restarted by Nomad.
|
||||
Allocations are in the `pending` state while restarts are attempted. The `Recent Events` section in the CLI
|
||||
shows ongoing restart attempts.
|
||||
|
||||
```shell
|
||||
$ nomad job status demo
|
||||
ID = demo
|
||||
Name = demo
|
||||
Submit Date = 2018-04-12T14:37:18-05:00
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
demo 0 3 0 0 0 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created Modified
|
||||
ce5bf1d1 8a184f31 demo 0 run pending 27s ago 5s ago
|
||||
d5dee7c8 8a184f31 demo 0 run pending 27s ago 5s ago
|
||||
ed815997 8a184f31 demo 0 run pending 27s ago 5s ago
|
||||
```
|
||||
|
||||
In the following example, the allocation `ce5bf1d1` is restarted by Nomad approximately
|
||||
every ten seconds, with a small random jitter. It eventually reaches its limit of three attempts and
|
||||
transitions into a `failed` state, after which it becomes eligible for [rescheduling][rescheduling].
|
||||
|
||||
```shell
|
||||
$ nomad alloc status ce5bf1d1
|
||||
ID = ce5bf1d1
|
||||
Eval ID = 64e45d11
|
||||
Name = demo.demo[1]
|
||||
Node ID = a0ccdd8b
|
||||
Job ID = demo
|
||||
Job Version = 0
|
||||
Client Status = failed
|
||||
Client Description = <none>
|
||||
Desired Status = run
|
||||
Desired Description = <none>
|
||||
Created = 56s ago
|
||||
Modified = 22s ago
|
||||
|
||||
Task "demo" is "dead"
|
||||
Task Resources
|
||||
CPU Memory Disk Addresses
|
||||
100 MHz 300 MiB 300 MiB
|
||||
|
||||
Task Events:
|
||||
Started At = 2018-04-12T22:29:08Z
|
||||
Finished At = 2018-04-12T22:29:08Z
|
||||
Total Restarts = 3
|
||||
Last Restart = 2018-04-12T17:28:57-05:00
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
2018-04-12T17:29:08-05:00 Not Restarting Exceeded allowed attempts 3 in interval 5m0s and mode is "fail"
|
||||
2018-04-12T17:29:08-05:00 Terminated Exit Code: 127
|
||||
2018-04-12T17:29:08-05:00 Started Task started by client
|
||||
2018-04-12T17:28:57-05:00 Restarting Task restarting in 10.364602876s
|
||||
2018-04-12T17:28:57-05:00 Terminated Exit Code: 127
|
||||
2018-04-12T17:28:57-05:00 Started Task started by client
|
||||
2018-04-12T17:28:47-05:00 Restarting Task restarting in 10.666963769s
|
||||
2018-04-12T17:28:47-05:00 Terminated Exit Code: 127
|
||||
2018-04-12T17:28:47-05:00 Started Task started by client
|
||||
2018-04-12T17:28:35-05:00 Restarting Task restarting in 11.777324721s
|
||||
```
|
||||
|
||||
[restart]: /docs/job-specification/restart 'Nomad restart Stanza'
|
||||
[rescheduling]: /docs/job-specification/reschedule 'Nomad reschedule Stanza'
|
website/pages/guides/operating-a-job/index.mdx
|
@ -0,0 +1,37 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Job Lifecycle
|
||||
sidebar_title: Deploying & Managing Applications
|
||||
description: Learn how to deploy and manage a Nomad Job.
|
||||
---
|
||||
|
||||
# Deploying & Managing Applications
|
||||
|
||||
Developers deploy and manage their applications in Nomad via jobs.
|
||||
|
||||
This section provides some best practices and guidance for operating jobs under
|
||||
Nomad. Please navigate to the appropriate sub-sections for more information.
|
||||
|
||||
## Deploying
|
||||
|
||||
The general flow for operating a job in Nomad is:
|
||||
|
||||
1. Author the job file according to the [job specification](/docs/job-specification)
|
||||
1. Plan and review the changes with a Nomad server
|
||||
1. Submit the job file to a Nomad server
|
||||
1. (Optional) Review job status and logs
|
||||
|
||||
## Updating
|
||||
|
||||
When updating a job, there are a number of built-in update strategies which may
|
||||
be defined in the job file. The general flow for updating an existing job in
|
||||
Nomad is:
|
||||
|
||||
1. Modify the existing job file with the desired changes
|
||||
1. Plan and review the changes with a Nomad server
|
||||
1. Submit the job file to a Nomad server
|
||||
1. (Optional) Review job status and logs
|
||||
|
||||
Because the job file defines the update strategy (blue-green, rolling updates,
|
||||
etc.), the workflow remains the same regardless of whether this is an initial
|
||||
deployment or a long-running job.
|
website/pages/guides/operating-a-job/inspecting-state.mdx
|
@ -0,0 +1,205 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Inspecting State - Operating a Job
|
||||
sidebar_title: Inspecting State
|
||||
description: |-
|
||||
Nomad exposes a number of tools and techniques for inspecting a running job.
|
||||
This is helpful in ensuring the job started successfully. Additionally, it
|
||||
can inform us of any errors that occurred while starting the job.
|
||||
---
|
||||
|
||||
# Inspecting State
|
||||
|
||||
A successful job submission is not an indication of a successfully-running job.
|
||||
This is the nature of a highly-optimistic scheduler. A successful job submission
|
||||
means the server was able to issue the proper scheduling commands. It does not
|
||||
indicate the job is actually running. To verify the job is running, we need to
|
||||
inspect its state.
|
||||
|
||||
This section will utilize the job named "docs" from the [previous
|
||||
sections](/guides/operating-a-job/submitting-jobs), but these operations
|
||||
and commands largely apply to all jobs in Nomad.
|
||||
|
||||
## Job Status
|
||||
|
||||
After a job is submitted, you can query the status of that job using the job
|
||||
status command:
|
||||
|
||||
```shell
|
||||
$ nomad job status
|
||||
ID Type Priority Status
|
||||
docs service 50 running
|
||||
```
|
||||
|
||||
At a high level, we can see that our job is currently running, but what does
|
||||
"running" actually mean. By supplying the name of a job to the job status
|
||||
command, we can ask Nomad for more detailed job information:
|
||||
|
||||
```shell
|
||||
$ nomad job status docs
|
||||
ID = docs
|
||||
Name = docs
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
example 0 0 3 0 0 0
|
||||
|
||||
Allocations
|
||||
ID Eval ID Node ID Task Group Desired Status Created At
|
||||
04d9627d 42d788a3 a1f934c9 example run running <timestamp>
|
||||
e7b8d4f5 42d788a3 012ea79b example run running <timestamp>
|
||||
5cbf23a1 42d788a3 1e1aa1e0 example run running <timestamp>
|
||||
```
|
||||
|
||||
Here we can see that there are three instances of this task running, each with
|
||||
its own allocation. For more information on the `status` command, please see the
|
||||
[CLI documentation for <tt>status</tt>](/docs/commands/status).
|
||||
|
||||
## Evaluation Status
|
||||
|
||||
You can think of an evaluation as a submission to the scheduler. An example
|
||||
below shows status output for a job where some allocations were placed
|
||||
successfully, but did not have enough resources to place all of the desired
|
||||
allocations.
|
||||
|
||||
If we issue the status command with the `-evals` flag, we could see there is an
|
||||
outstanding evaluation for this hypothetical job:
|
||||
|
||||
```shell
|
||||
$ nomad job status -evals docs
|
||||
ID = docs
|
||||
Name = docs
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
|
||||
Evaluations
|
||||
ID Priority Triggered By Status Placement Failures
|
||||
5744eb15 50 job-register blocked N/A - In Progress
|
||||
8e38e6cf 50 job-register complete true
|
||||
|
||||
Placement Failure
|
||||
Task Group "example":
|
||||
* Resources exhausted on 1 nodes
|
||||
* Dimension "cpu" exhausted on 1 nodes
|
||||
|
||||
Allocations
|
||||
ID Eval ID Node ID Task Group Desired Status Created At
|
||||
12681940 8e38e6cf 4beef22f example run running <timestamp>
|
||||
395c5882 8e38e6cf 4beef22f example run running <timestamp>
|
||||
4d7c6f84 8e38e6cf 4beef22f example run running <timestamp>
|
||||
843b07b8 8e38e6cf 4beef22f example run running <timestamp>
|
||||
a8bc6d3e 8e38e6cf 4beef22f example run running <timestamp>
|
||||
b0beb907 8e38e6cf 4beef22f example run running <timestamp>
|
||||
da21c1fd 8e38e6cf 4beef22f example run running <timestamp>
|
||||
```
|
||||
|
||||
In the above example we see that the job has a "blocked" evaluation that is in
|
||||
progress. When Nomad cannot place all the desired allocations, it creates a
|
||||
blocked evaluation that waits for more resources to become available.
|
||||
|
||||
The `eval status` command enables us to examine any evaluation in more detail.
|
||||
For the most part this should never be necessary but can be useful to see why
|
||||
all of a job's allocations were not placed. For example, if we run it on the job
|
||||
named docs, which had a placement failure according to the above output, we
|
||||
might see:
|
||||
|
||||
```shell
|
||||
$ nomad eval status 8e38e6cf
|
||||
ID = 8e38e6cf
|
||||
Status = complete
|
||||
Status Description = complete
|
||||
Type = service
|
||||
TriggeredBy = job-register
|
||||
Job ID = docs
|
||||
Priority = 50
|
||||
Placement Failures = true
|
||||
|
||||
Failed Placements
|
||||
Task Group "example" (failed to place 3 allocations):
|
||||
* Resources exhausted on 1 nodes
|
||||
* Dimension "cpu" exhausted on 1 nodes
|
||||
|
||||
Evaluation "5744eb15" waiting for additional capacity to place remainder
|
||||
```
|
||||
|
||||
For more information on the `eval status` command, please see the [CLI documentation for <tt>eval status</tt>](/docs/commands/eval-status).
|
||||
|
||||
## Allocation Status
|
||||
|
||||
You can think of an allocation as an instruction to schedule. Just like an
|
||||
application or service, an allocation has logs and state. The `alloc status`
|
||||
command gives us the most recent events that occurred for a task, its resource
|
||||
usage, port allocations and more:
|
||||
|
||||
```shell
|
||||
$ nomad alloc status 04d9627d
|
||||
ID = 04d9627d
|
||||
Eval ID = 42d788a3
|
||||
Name = docs.example[2]
|
||||
Node ID = a1f934c9
|
||||
Job ID = docs
|
||||
Client Status = running
|
||||
|
||||
Task "server" is "running"
|
||||
Task Resources
|
||||
CPU Memory Disk Addresses
|
||||
0/100 MHz 728 KiB/10 MiB 300 MiB http: 10.1.1.196:5678
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
10/09/16 00:36:06 UTC Started Task started by client
|
||||
10/09/16 00:36:05 UTC Received Task received by client
|
||||
```
|
||||
|
||||
The `alloc status` command is a good starting point for debugging an
|
||||
application that did not start. Hypothetically assume a user meant to start a
|
||||
Docker container named "redis:2.8", but accidentally put a comma instead of a
|
||||
period, typing "redis:2,8".
|
||||
|
||||
When the job is executed, it produces a failed allocation. The `alloc status`
|
||||
command will give us the reason why:
|
||||
|
||||
```shell
|
||||
$ nomad alloc status 04d9627d
|
||||
# ...
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
06/28/16 15:50:22 UTC Not Restarting Error was unrecoverable
|
||||
06/28/16 15:50:22 UTC Driver Failure failed to create image: Failed to pull `redis:2,8`: API error (500): invalid tag format
|
||||
06/28/16 15:50:22 UTC Received Task received by client
|
||||
```
|
||||
|
||||
Unfortunately, not all failures are as easily debuggable. If the `alloc status`
|
||||
command shows many restarts, there is likely an application-level issue during
|
||||
start up. For example:
|
||||
|
||||
```shell
|
||||
$ nomad alloc status 04d9627d
|
||||
# ...
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
06/28/16 15:56:16 UTC Restarting Task restarting in 5.178426031s
|
||||
06/28/16 15:56:16 UTC Terminated Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
|
||||
06/28/16 15:56:16 UTC Started Task started by client
|
||||
06/28/16 15:56:00 UTC Restarting Task restarting in 5.00123931s
|
||||
06/28/16 15:56:00 UTC Terminated Exit Code: 1, Exit Message: "Docker container exited with non-zero exit code: 1"
|
||||
06/28/16 15:55:59 UTC Started Task started by client
|
||||
06/28/16 15:55:48 UTC Received Task received by client
|
||||
```
|
||||
|
||||
To debug these failures, we will need to utilize the "logs" command, which is
|
||||
discussed in the [accessing logs](/guides/operating-a-job/accessing-logs)
|
||||
section of this documentation.
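
As a quick preview, and assuming the allocation and task names from the example above, logs can be pulled directly with `nomad alloc logs`:

```shell
# Fetch stdout for task "server" in the example allocation; -stderr switches to the error stream.
$ nomad alloc logs 04d9627d server
$ nomad alloc logs -stderr 04d9627d server
```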
|
||||
|
||||
For more information on the `alloc status` command, please see the [CLI
|
||||
documentation for <tt>alloc status</tt>](/docs/commands/alloc/status).
|
|
@ -0,0 +1,90 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Resource Utilization - Operating a Job
|
||||
sidebar_title: Resource Utilization
|
||||
description: |-
|
||||
Nomad supports reporting detailed job statistics and resource utilization
|
||||
metrics for most task drivers. This section describes the ways to inspect a
|
||||
job's resource consumption and utilization.
|
||||
---
|
||||
|
||||
# Resource Utilization
|
||||
|
||||
Understanding the resource utilization of an application is important, and Nomad
|
||||
supports reporting detailed statistics in many of its drivers. The main
|
||||
interface for seeing resource utilization is the `alloc status` command with the
|
||||
`-stats` flag.
|
||||
|
||||
This section will utilize the job named "docs" from the [previous
|
||||
sections](/guides/operating-a-job/submitting-jobs), but these operations
|
||||
and commands largely apply to all jobs in Nomad.
|
||||
|
||||
As a reminder, here is the output of the run command from the previous example:
|
||||
|
||||
```shell
|
||||
$ nomad job run docs.nomad
|
||||
==> Monitoring evaluation "42d788a3"
|
||||
Evaluation triggered by job "docs"
|
||||
Allocation "04d9627d" created: node "a1f934c9", group "example"
|
||||
Allocation "e7b8d4f5" created: node "012ea79b", group "example"
|
||||
Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "42d788a3" finished with status "complete"
|
||||
```
|
||||
|
||||
To see the detailed usage statistics, we can issue the command:
|
||||
|
||||
```shell
|
||||
$ nomad alloc status -stats 04d9627d
|
||||
ID = 04d9627d
|
||||
Eval ID = 42d788a3
|
||||
Name = docs.example[2]
|
||||
Node ID = a1f934c9
|
||||
Job ID = docs
|
||||
Client Status = running
|
||||
|
||||
Task "server" is "running"
|
||||
Task Resources
|
||||
CPU Memory Disk Addresses
|
||||
75/100 MHz 784 KiB/10 MiB 300 MiB http: 10.1.1.196:5678
|
||||
|
||||
Memory Stats
|
||||
Cache Max Usage RSS Swap
|
||||
56 KiB 1.3 MiB 784 KiB 0 B
|
||||
|
||||
CPU Stats
|
||||
Percent Throttled Periods Throttled Time
|
||||
0.00% 0 0
|
||||
|
||||
Recent Events:
|
||||
Time Type Description
|
||||
<timestamp> Started Task started by client
|
||||
<timestamp> Received Task received by client
|
||||
```
|
||||
|
||||
Here we can see that we are near the limit of our configured CPU but we have
|
||||
plenty of memory headroom. We can use this information to alter our job's
|
||||
resources to better reflect its actual needs:
|
||||
|
||||
```hcl
|
||||
resources {
|
||||
cpu = 200
|
||||
memory = 10
|
||||
}
|
||||
```
|
||||
|
||||
Adjusting resources is very important for a variety of reasons:
|
||||
|
||||
- Ensuring your application does not get OOM killed if it hits its memory limit.
|
||||
- Ensuring the application performs well by ensuring it has some CPU allowance.
|
||||
- Optimizing cluster density by reserving what you need and not over-allocating.
|
||||
|
||||
While single point-in-time resource usage measurements are useful, it is often
|
||||
more useful to graph resource usage over time to better understand and estimate
|
||||
resource usage. Nomad supports outputting resource data to statsite and statsd,
|
||||
which is the recommended way of monitoring resources. For more information about
|
||||
outputting telemetry see the [Telemetry Guide](/docs/telemetry).
|
||||
|
||||
For more advanced use cases, the resource usage data is also accessible via the
|
||||
client's HTTP API. See the documentation of the Client's [allocation HTTP
|
||||
API](/api/client).
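
As a rough sketch, the same statistics can also be retrieved as JSON from the allocation stats endpoint; `127.0.0.1:4646` is the default HTTP address and `<alloc-id>` is a placeholder for a full allocation ID.

```shell
# Query resource usage for a single allocation via the HTTP API.
$ curl -s http://127.0.0.1:4646/v1/client/allocation/<alloc-id>/stats
```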
|
173
website/pages/guides/operating-a-job/submitting-jobs.mdx
Normal file
|
@ -0,0 +1,173 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Submitting Jobs - Operating a Job
|
||||
sidebar_title: Submitting Jobs
|
||||
description: |-
|
||||
The job file is the unit of work in Nomad. Upon authoring, the job file is
|
||||
submitted to the server for evaluation and scheduling. This section discusses
|
||||
some techniques for submitting jobs.
|
||||
---
|
||||
|
||||
# Submitting Jobs
|
||||
|
||||
In Nomad, the description of the job and all its requirements are maintained in
|
||||
a single file called the "job file". This job file resides locally on disk and
|
||||
it is highly recommended that you check job files into source control.
|
||||
|
||||
The general flow for submitting a job in Nomad is:
|
||||
|
||||
1. Author a job file according to the job specification
|
||||
1. Plan and review changes with a Nomad server
|
||||
1. Submit the job file to a Nomad server
|
||||
1. (Optional) Review job status and logs
|
||||
|
||||
Here is a very basic example to get you started.
|
||||
|
||||
## Author a Job File
|
||||
|
||||
Authoring a job file is very easy. For more detailed information, please see the
|
||||
[job specification](/docs/job-specification). Here is a sample job
|
||||
file which runs a small Docker container web server to get us started.
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
group "example" {
|
||||
task "server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "hashicorp/http-echo"
|
||||
args = [
|
||||
"-listen", ":5678",
|
||||
"-text", "hello world",
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "http" {
|
||||
static = "5678"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This job file exists on your local workstation in plain text. When you are
|
||||
satisfied with this job file, you will plan and review the scheduler decision.
|
||||
It is generally a best practice to commit job files to source control,
|
||||
especially if you are working in a team.
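
Before planning, the file can optionally be checked for syntax and structural errors with `nomad job validate`; this is a sketch, and the exact output may vary by Nomad version.

```shell
# Validate the job file; no scheduling takes place.
$ nomad job validate docs.nomad
Job validation successful
```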
|
||||
|
||||
## Planning the Job
|
||||
|
||||
Once the job file is authored, we need to plan out the changes. The `nomad job plan`
|
||||
command invokes a dry-run of the scheduler and informs us of which scheduling
|
||||
decisions would take place.
|
||||
|
||||
```shell
|
||||
$ nomad job plan docs.nomad
|
||||
+ Job: "docs"
|
||||
+ Task Group: "example" (1 create)
|
||||
+ Task: "server" (forces create)
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
|
||||
Job Modify Index: 0
|
||||
To submit the job with version verification run:
|
||||
|
||||
nomad job run -check-index 0 docs.nomad
|
||||
|
||||
When running the job with the check-index flag, the job will only be run if the
|
||||
server side version matches the job modify index returned. If the index has
|
||||
changed, another user has modified the job and the plan's results are
|
||||
potentially invalid.
|
||||
```
|
||||
|
||||
Note that no action was taken. This job is not running. This is a complete
|
||||
dry-run and no allocations have taken place.
|
||||
|
||||
## Submitting the Job
|
||||
|
||||
Assuming the output of the plan looks acceptable, we can ask Nomad to execute
|
||||
this job. This is done via the `nomad job run` command. We can optionally supply
|
||||
the modify index provided to us by the plan command to ensure no changes to this
|
||||
job have taken place between our plan and now.
|
||||
|
||||
```shell
|
||||
$ nomad job run docs.nomad
|
||||
==> Monitoring evaluation "0d159869"
|
||||
Evaluation triggered by job "docs"
|
||||
Allocation "5cbf23a1" created: node "1e1aa1e0", group "example"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "0d159869" finished with status "complete"
|
||||
```
|
||||
|
||||
Now that the job is scheduled, it may or may not be running. We need to inspect
|
||||
the allocation status and logs to make sure the job started correctly. The next
|
||||
section on [inspecting state](/guides/operating-a-job/inspecting-state)
|
||||
details ways to examine this job's state.
|
||||
|
||||
## Updating the Job
|
||||
|
||||
When making updates to the job, it is best to always run the plan command and
|
||||
then the run command. For example:
|
||||
|
||||
```diff
|
||||
@@ -2,6 +2,8 @@
|
||||
job "docs" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
group "example" {
|
||||
+ count = "3"
|
||||
+
|
||||
task "server" {
|
||||
driver = "docker"
|
||||
```
|
||||
|
||||
After we save these changes to disk, run the plan command:
|
||||
|
||||
```shell
|
||||
$ nomad job plan docs.nomad
|
||||
+/- Job: "docs"
|
||||
+/- Task Group: "example" (2 create, 1 in-place update)
|
||||
+/- Count: "1" => "3" (forces create)
|
||||
Task: "server"
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
|
||||
Job Modify Index: 131
|
||||
To submit the job with version verification run:
|
||||
|
||||
nomad job run -check-index 131 docs.nomad
|
||||
|
||||
When running the job with the check-index flag, the job will only be run if the
|
||||
server side version matches the job modify index returned. If the index has
|
||||
changed, another user has modified the job and the plan's results are
|
||||
potentially invalid.
|
||||
```
|
||||
|
||||
Then, assuming the output looks okay, run the run command. Note that we are
|
||||
including the "check-index" parameter. This will ensure that no remote changes
|
||||
have taken place to the job between our plan and run phases.
|
||||
|
||||
```shell
|
||||
$ nomad job run -check-index 131 docs.nomad
|
||||
==> Monitoring evaluation "42d788a3"
|
||||
Evaluation triggered by job "docs"
|
||||
Allocation "04d9627d" created: node "a1f934c9", group "example"
|
||||
Allocation "e7b8d4f5" created: node "012ea79b", group "example"
|
||||
Allocation "5cbf23a1" modified: node "1e1aa1e0", group "example"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "42d788a3" finished with status "complete"
|
||||
```
|
||||
|
||||
For more details on advanced job updating strategies such as canary builds and
|
||||
blue/green deployments, please see the documentation on [job update
|
||||
strategies](/guides/operating-a-job/update-strategies).
|
|
@ -0,0 +1,457 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Blue/Green & Canary Deployments - Operating a Job
|
||||
sidebar_title: Blue/Green & Canary
|
||||
description: |-
|
||||
Nomad has built-in support for doing blue/green and canary deployments to more
|
||||
safely update existing applications and services.
|
||||
---
|
||||
|
||||
# Blue/Green & Canary Deployments
|
||||
|
||||
Sometimes [rolling
|
||||
upgrades](/guides/operating-a-job/update-strategies/rolling-upgrades) do not
|
||||
offer the required flexibility for updating an application in production. Often
|
||||
organizations prefer to put a "canary" build into production or utilize a
|
||||
technique known as a "blue/green" deployment to ensure a safe application
|
||||
rollout to production while minimizing downtime.
|
||||
|
||||
## Blue/Green Deployments
|
||||
|
||||
Blue/Green deployments have several other names including Red/Black or A/B, but
|
||||
the concept is generally the same. In a blue/green deployment, there are two
|
||||
application versions. Only one application version is active at a time, except
|
||||
during the transition phase from one version to the next. The term "active"
|
||||
tends to mean "receiving traffic" or "in service".
|
||||
|
||||
Imagine a hypothetical API server which has five instances deployed to
|
||||
production at version 1.3, and we want to safely upgrade to version 1.4. We want
|
||||
to create five new instances at version 1.4 and, if they are
|
||||
operating correctly, promote them and take down the five instances
|
||||
running 1.3. In the event of failure, we can quickly roll back to 1.3.
|
||||
|
||||
To start, we examine our job which is running in production:
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
# ...
|
||||
|
||||
group "api" {
|
||||
count = 5
|
||||
|
||||
update {
|
||||
max_parallel = 1
|
||||
canary = 5
|
||||
min_healthy_time = "30s"
|
||||
healthy_deadline = "10m"
|
||||
auto_revert = true
|
||||
auto_promote = false
|
||||
}
|
||||
|
||||
task "api-server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "api-server:1.3"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
We see that it has an `update` stanza that has the `canary` equal to the desired
|
||||
count. This is what allows us to easily model blue/green deployments. When we
|
||||
change the job to run the "api-server:1.4" image, Nomad will create 5 new
|
||||
allocations without touching the original "api-server:1.3" allocations. Below we
|
||||
can see how this works by changing the image to run the new version:
|
||||
|
||||
```diff
|
||||
@@ -2,6 +2,8 @@
|
||||
job "docs" {
|
||||
group "api" {
|
||||
task "api-server" {
|
||||
config {
|
||||
- image = "api-server:1.3"
|
||||
+ image = "api-server:1.4"
|
||||
```
|
||||
|
||||
Next we plan and run these changes:
|
||||
|
||||
```shell
|
||||
$ nomad job plan docs.nomad
|
||||
+/- Job: "docs"
|
||||
+/- Task Group: "api" (5 canary, 5 ignore)
|
||||
+/- Task: "api-server" (forces create/destroy update)
|
||||
+/- Config {
|
||||
+/- image: "api-server:1.3" => "api-server:1.4"
|
||||
}
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
|
||||
Job Modify Index: 7
|
||||
To submit the job with version verification run:
|
||||
|
||||
nomad job run -check-index 7 example.nomad
|
||||
|
||||
When running the job with the check-index flag, the job will only be run if the
|
||||
server side version matches the job modify index returned. If the index has
|
||||
changed, another user has modified the job and the plan's results are
|
||||
potentially invalid.
|
||||
|
||||
$ nomad job run docs.nomad
|
||||
# ...
|
||||
```
|
||||
|
||||
We can see from the plan output that Nomad is going to create 5 canaries that
|
||||
are running the "api-server:1.4" image and ignore all the allocations running
|
||||
the older image. Now if we examine the status of the job we can see that both
|
||||
the blue ("api-server:1.3") and green ("api-server:1.4") set are running.
|
||||
|
||||
```shell
|
||||
$ nomad status docs
|
||||
ID = docs
|
||||
Name = docs
|
||||
Submit Date = 07/26/17 19:57:47 UTC
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
api 0 0 10 0 0 0
|
||||
|
||||
Latest Deployment
|
||||
ID = 32a080c1
|
||||
Status = running
|
||||
Description = Deployment is running but requires manual promotion
|
||||
|
||||
Deployed
|
||||
Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy
|
||||
api true false 5 5 5 5 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created At
|
||||
6d8eec42 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
7051480e 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
36c6610f 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
410ba474 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
3ac3fe05 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
4bd51979 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
2998387b 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
35b813ee 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
b53b4289 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
```
|
||||
|
||||
Now that we have the new set in production, we can route traffic to it and
|
||||
validate the new job version is working properly. Based on whether the new
|
||||
version is functioning properly or improperly we will either want to promote or
|
||||
fail the deployment.
|
||||
|
||||
### Promoting the Deployment
|
||||
|
||||
After deploying the new image alongside the old version, we have determined it
|
||||
is functioning properly and we want to transition fully to the new version.
|
||||
Doing so is as simple as promoting the deployment:
|
||||
|
||||
```shell
|
||||
$ nomad deployment promote 32a080c1
|
||||
==> Monitoring evaluation "61ac2be5"
|
||||
Evaluation triggered by job "docs"
|
||||
Evaluation within deployment: "32a080c1"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "61ac2be5" finished with status "complete"
|
||||
```
|
||||
|
||||
If we look at the job's status we see that after promotion, Nomad stopped the
|
||||
older allocations and is only running the new ones. This completes our
|
||||
blue/green deployment.
|
||||
|
||||
```shell
|
||||
$ nomad status docs
|
||||
ID = docs
|
||||
Name = docs
|
||||
Submit Date = 07/26/17 19:57:47 UTC
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
api 0 0 5 0 5 0
|
||||
|
||||
Latest Deployment
|
||||
ID = 32a080c1
|
||||
Status = successful
|
||||
Description = Deployment completed successfully
|
||||
|
||||
Deployed
|
||||
Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy
|
||||
api true true 5 5 5 5 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created At
|
||||
6d8eec42 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
7051480e 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
36c6610f 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
410ba474 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
3ac3fe05 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC
|
||||
4bd51979 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC
|
||||
2998387b 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC
|
||||
35b813ee 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC
|
||||
b53b4289 087852e2 api 0 stop complete 07/26/17 19:53:56 UTC
|
||||
```
|
||||
|
||||
### Failing the Deployment
|
||||
|
||||
After deploying the new image alongside the old version, we have determined it
|
||||
is not functioning properly and we want to roll back to the old version. Doing
|
||||
so is as simple as failing the deployment:
|
||||
|
||||
```shell
|
||||
$ nomad deployment fail 32a080c1
|
||||
Deployment "32a080c1-de5a-a4e7-0218-521d8344c328" failed. Auto-reverted to job version 0.
|
||||
|
||||
==> Monitoring evaluation "6840f512"
|
||||
Evaluation triggered by job "example"
|
||||
Evaluation within deployment: "32a080c1"
|
||||
Allocation "0ccb732f" modified: node "36e7a123", group "cache"
|
||||
Allocation "64d4f282" modified: node "36e7a123", group "cache"
|
||||
Allocation "664e33c7" modified: node "36e7a123", group "cache"
|
||||
Allocation "a4cb6a4b" modified: node "36e7a123", group "cache"
|
||||
Allocation "fdd73bdd" modified: node "36e7a123", group "cache"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "6840f512" finished with status "complete"
|
||||
```
|
||||
|
||||
If we now look at the job's status we can see that after failing the deployment,
|
||||
Nomad stopped the new allocations, is only running the old ones, and reverted
|
||||
the working copy of the job back to the original specification running
|
||||
"api-server:1.3".
|
||||
|
||||
```shell
|
||||
$ nomad status docs
|
||||
ID = docs
|
||||
Name = docs
|
||||
Submit Date = 07/26/17 19:57:47 UTC
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
api 0 0 5 0 5 0
|
||||
|
||||
Latest Deployment
|
||||
ID = 6f3f84b3
|
||||
Status = successful
|
||||
Description = Deployment completed successfully
|
||||
|
||||
Deployed
|
||||
Task Group Auto Revert Desired Placed Healthy Unhealthy
|
||||
cache true 5 5 5 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created At
|
||||
27dc2a42 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC
|
||||
5b7d34bb 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC
|
||||
983b487d 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC
|
||||
d1cbf45a 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC
|
||||
d6b46def 36e7a123 api 1 stop complete 07/26/17 20:07:31 UTC
|
||||
0ccb732f 36e7a123 api 2 run running 07/26/17 20:06:29 UTC
|
||||
64d4f282 36e7a123 api 2 run running 07/26/17 20:06:29 UTC
|
||||
664e33c7 36e7a123 api 2 run running 07/26/17 20:06:29 UTC
|
||||
a4cb6a4b 36e7a123 api 2 run running 07/26/17 20:06:29 UTC
|
||||
fdd73bdd 36e7a123 api 2 run running 07/26/17 20:06:29 UTC
|
||||
|
||||
$ nomad job deployments docs
|
||||
ID Job ID Job Version Status Description
|
||||
6f3f84b3 example 2 successful Deployment completed successfully
|
||||
32a080c1 example 1 failed Deployment marked as failed - rolling back to job version 0
|
||||
c4c16494 example 0 successful Deployment completed successfully
|
||||
```
|
||||
|
||||
## Canary Deployments
|
||||
|
||||
Canary updates are a useful way to test a new version of a job before beginning
|
||||
a rolling upgrade. The `update` stanza supports setting the number of canaries
|
||||
the job operator would like Nomad to create when the job changes via the
|
||||
`canary` parameter. When the job specification is updated, Nomad creates the
|
||||
canaries without stopping any allocations from the previous job.
|
||||
|
||||
This pattern allows operators to achieve higher confidence in the new job
|
||||
version because they can route traffic, examine logs, etc., to determine that the new
|
||||
application is performing properly.
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
# ...
|
||||
|
||||
group "api" {
|
||||
count = 5
|
||||
|
||||
update {
|
||||
max_parallel = 1
|
||||
canary = 1
|
||||
min_healthy_time = "30s"
|
||||
healthy_deadline = "10m"
|
||||
auto_revert = true
|
||||
auto_promote = false
|
||||
}
|
||||
|
||||
task "api-server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "api-server:1.3"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
In the example above, the `update` stanza tells Nomad to create a single canary
|
||||
when the job specification is changed. Below we can see how this works by
|
||||
changing the image to run the new version:
|
||||
|
||||
```diff
|
||||
@@ -2,6 +2,8 @@
|
||||
job "docs" {
|
||||
group "api" {
|
||||
task "api-server" {
|
||||
config {
|
||||
- image = "api-server:1.3"
|
||||
+ image = "api-server:1.4"
|
||||
```
|
||||
|
||||
Next we plan and run these changes:
|
||||
|
||||
```shell
|
||||
$ nomad job plan docs.nomad
|
||||
+/- Job: "docs"
|
||||
+/- Task Group: "api" (1 canary, 5 ignore)
|
||||
+/- Task: "api-server" (forces create/destroy update)
|
||||
+/- Config {
|
||||
+/- image: "api-server:1.3" => "api-server:1.4"
|
||||
}
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
|
||||
Job Modify Index: 7
|
||||
To submit the job with version verification run:
|
||||
|
||||
nomad job run -check-index 7 example.nomad
|
||||
|
||||
When running the job with the check-index flag, the job will only be run if the
|
||||
server side version matches the job modify index returned. If the index has
|
||||
changed, another user has modified the job and the plan's results are
|
||||
potentially invalid.
|
||||
|
||||
$ nomad job run docs.nomad
|
||||
# ...
|
||||
```
|
||||
|
||||
We can see from the plan output that Nomad is going to create 1 canary that
|
||||
will run the "api-server:1.4" image and ignore all the allocations running
|
||||
the older image. If we inspect the status we see that the canary is running
|
||||
alongside the older version of the job:
|
||||
|
||||
```shell
|
||||
$ nomad status docs
|
||||
ID = docs
|
||||
Name = docs
|
||||
Submit Date = 07/26/17 19:57:47 UTC
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
api 0 0 6 0 0 0
|
||||
|
||||
Latest Deployment
|
||||
ID = 32a080c1
|
||||
Status = running
|
||||
Description = Deployment is running but requires manual promotion
|
||||
|
||||
Deployed
|
||||
Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy
|
||||
api true false 5 1 1 1 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created At
|
||||
85662a7a 087852e2 api 1 run running 07/26/17 19:57:47 UTC
|
||||
3ac3fe05 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
4bd51979 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
2998387b 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
35b813ee 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
b53b4289 087852e2 api 0 run running 07/26/17 19:53:56 UTC
|
||||
```
|
||||
|
||||
Now if we promote the canary, this will trigger a rolling update to replace the
|
||||
remaining allocations running the older image. The rolling update will happen at
|
||||
a rate of `max_parallel`, so in this case one allocation at a time:
|
||||
|
||||
```shell
|
||||
$ nomad deployment promote 37033151
|
||||
==> Monitoring evaluation "37033151"
|
||||
Evaluation triggered by job "docs"
|
||||
Evaluation within deployment: "ed28f6c2"
|
||||
Allocation "f5057465" created: node "f6646949", group "cache"
|
||||
Allocation "f5057465" status changed: "pending" -> "running"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "37033151" finished with status "complete"
|
||||
|
||||
$ nomad status docs
|
||||
ID = docs
|
||||
Name = docs
|
||||
Submit Date = 07/26/17 20:28:59 UTC
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
api 0 0 5 0 2 0
|
||||
|
||||
Latest Deployment
|
||||
ID = ed28f6c2
|
||||
Status = running
|
||||
Description = Deployment is running
|
||||
|
||||
Deployed
|
||||
Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy
|
||||
api true true 5 1 2 1 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created At
|
||||
f5057465 f6646949 api 1 run running 07/26/17 20:29:23 UTC
|
||||
b1c88d20 f6646949 api 1 run running 07/26/17 20:28:59 UTC
|
||||
1140bacf f6646949 api 0 run running 07/26/17 20:28:37 UTC
|
||||
1958a34a f6646949 api 0 run running 07/26/17 20:28:37 UTC
|
||||
4bda385a f6646949 api 0 run running 07/26/17 20:28:37 UTC
|
||||
62d96f06 f6646949 api 0 stop complete 07/26/17 20:28:37 UTC
|
||||
f58abbb2 f6646949 api 0 stop complete 07/26/17 20:28:37 UTC
|
||||
```
|
||||
|
||||
Alternatively, if the canary was not performing properly, we could abandon the
|
||||
change using the `nomad deployment fail` command, similar to the blue/green
|
||||
example.
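
For completeness, a sketch of that roll-back, using the deployment ID `ed28f6c2` from the status output above:

```shell
# Fail the canary deployment; with auto_revert = true, Nomad rolls back to the last stable version.
$ nomad deployment fail ed28f6c2
```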
|
|
@ -0,0 +1,42 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Handling Signals - Operating a Job
|
||||
sidebar_title: Handling Signals
|
||||
description: |-
|
||||
Well-behaved applications expose a way to perform cleanup prior to exiting.
|
||||
Nomad can optionally send a configurable signal to applications before
|
||||
killing them, allowing them to drain connections or gracefully terminate.
|
||||
---
|
||||
|
||||
# Handling Signals
|
||||
|
||||
On operating systems that support signals, Nomad will send the application a
|
||||
configurable signal before killing it. This gives the application time to
|
||||
gracefully drain connections and conduct other cleanup before shutting down.
|
||||
Certain applications take longer to drain than others, and thus Nomad allows
|
||||
specifying the amount of time to wait for the application to exit before
|
||||
force-killing it.
|
||||
|
||||
Before Nomad terminates an application, it will send the `SIGINT` signal to the
|
||||
process. Processes running under Nomad should respond to this signal to
|
||||
gracefully drain connections. After a configurable timeout, the application
|
||||
will be force-terminated.
|
||||
|
||||
For more details on the `kill_timeout` option, please see the
|
||||
[job specification documentation](/docs/job-specification/task#kill_timeout).
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
group "example" {
|
||||
task "server" {
|
||||
# ...
|
||||
kill_timeout = "45s"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The behavior is slightly different for Docker-based tasks. Nomad will run the
|
||||
`docker stop` command with the specified `kill_timeout`. The signal that `docker stop` sends to your container entrypoint is configurable using the
|
||||
[`STOPSIGNAL` configuration directive](https://docs.docker.com/engine/reference/builder/#stopsignal); however, please
|
||||
note that the default is `SIGTERM`.
|
|
@ -0,0 +1,25 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Update Strategies - Operating a Job
|
||||
sidebar_title: Update Strategies
|
||||
description: |-
|
||||
This section describes common patterns for updating already-running jobs
|
||||
including rolling upgrades, blue/green deployments, and canary builds. Nomad
|
||||
provides built-in support for this functionality.
|
||||
---
|
||||
|
||||
# Update Strategies
|
||||
|
||||
Most applications are long-lived and require updates over time. Whether you are
|
||||
deploying a new version of your web application or upgrading to a new version of
|
||||
Redis, Nomad has built-in support for rolling, blue/green, and canary updates.
|
||||
When a job specifies a rolling update, Nomad uses task state and health check
|
||||
information in order to detect allocation health and minimize or eliminate
|
||||
downtime. This section and subsections will explore how to do so safely with
|
||||
Nomad.
|
||||
|
||||
Please see one of the guides below or use the navigation on the left:
|
||||
|
||||
1. [Rolling Upgrades](/guides/operating-a-job/update-strategies/rolling-upgrades)
|
||||
1. [Blue/Green & Canary Deployments](/guides/operating-a-job/update-strategies/blue-green-and-canary-deployments)
|
||||
1. [Handling Signals](/guides/operating-a-job/update-strategies/handling-signals)
|
|
@ -0,0 +1,326 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Rolling Upgrades - Operating a Job
|
||||
sidebar_title: Rolling Upgrades
|
||||
description: |-
|
||||
In order to update a service while reducing downtime, Nomad provides a
|
||||
built-in mechanism for rolling upgrades. Rolling upgrades incrementally
|
||||
transition jobs between versions, using health check information to
|
||||
reduce downtime.
|
||||
---
|
||||
|
||||
# Rolling Upgrades
|
||||
|
||||
Nomad supports rolling updates as a first class feature. To enable rolling
|
||||
updates a job or task group is annotated with a high-level description of the
|
||||
update strategy using the [`update` stanza][update]. Under the hood, Nomad
|
||||
handles limiting parallelism, interfacing with Consul to determine service
|
||||
health and even automatically reverting to an older, healthy job when a
|
||||
deployment fails.
|
||||
|
||||
## Enabling Rolling Updates
|
||||
|
||||
Rolling updates are enabled by adding the [`update` stanza][update] to the job
|
||||
specification. The `update` stanza may be placed at the job level or in an
|
||||
individual task group. When placed at the job level, the update strategy is
|
||||
inherited by all task groups in the job. When placed at both the job and group
|
||||
level, the `update` stanzas are merged, with group stanzas taking precedence
|
||||
over job level stanzas. See the [`update` stanza
|
||||
documentation](/docs/job-specification/update#upgrade-stanza-inheritance)
|
||||
for an example.
|
||||
|
||||
```hcl
|
||||
job "geo-api-server" {
|
||||
# ...
|
||||
|
||||
group "api-server" {
|
||||
count = 6
|
||||
|
||||
# Add an update stanza to enable rolling updates of the service
|
||||
update {
|
||||
max_parallel = 2
|
||||
min_healthy_time = "30s"
|
||||
healthy_deadline = "10m"
|
||||
}
|
||||
|
||||
task "server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
image = "geo-api-server:0.1"
|
||||
}
|
||||
|
||||
# ...
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
In this example, by adding the simple `update` stanza to the "api-server" task
|
||||
group, we inform Nomad that updates to the group should be handled with a
|
||||
rolling update strategy.
|
||||
|
||||
Thus when a change is made to the job file that requires new allocations to be
|
||||
made, Nomad will deploy 2 allocations at a time and require that the allocations
|
||||
be running in a healthy state for 30 seconds before deploying more allocations of the
|
||||
new group.
|
||||
|
||||
By default Nomad determines allocation health by ensuring that all tasks in the
|
||||
group are running and that any [service
|
||||
checks](/docs/job-specification/service#check-parameters) the tasks register
|
||||
are passing.
|
||||
|
||||
## Planning Changes
|
||||
|
||||
Suppose we make a change to a file to upgrade the version of a Docker container
|
||||
that is configured with the same rolling update strategy from above.
|
||||
|
||||
```diff
|
||||
@@ -2,6 +2,8 @@
|
||||
job "geo-api-server" {
|
||||
group "api-server" {
|
||||
task "server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
- image = "geo-api-server:0.1"
|
||||
+ image = "geo-api-server:0.2"
|
||||
```
|
||||
|
||||
The [`nomad job plan` command](/docs/commands/job/plan) allows
|
||||
us to visualize the series of steps the scheduler would perform. We can analyze
|
||||
this output to confirm it is correct:
|
||||
|
||||
```shell
|
||||
$ nomad job plan geo-api-server.nomad
|
||||
+/- Job: "geo-api-server"
|
||||
+/- Task Group: "api-server" (2 create/destroy update, 4 ignore)
|
||||
+/- Task: "server" (forces create/destroy update)
|
||||
+/- Config {
|
||||
+/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
|
||||
}
|
||||
|
||||
Scheduler dry-run:
|
||||
- All tasks successfully allocated.
|
||||
|
||||
Job Modify Index: 7
|
||||
To submit the job with version verification run:
|
||||
|
||||
nomad job run -check-index 7 my-web.nomad
|
||||
|
||||
When running the job with the check-index flag, the job will only be run if the
|
||||
server side version matches the job modify index returned. If the index has
|
||||
changed, another user has modified the job and the plan's results are
|
||||
potentially invalid.
|
||||
```
|
||||
|
||||
Here we can see that Nomad will begin a rolling update by creating and
|
||||
destroying 2 allocations first and, for the time being, ignoring 4 of the old
|
||||
allocations, matching our configured `max_parallel`.
|
||||
|
||||
## Inspecting a Deployment
|
||||
|
||||
After running the plan, we can submit the updated job by simply running `nomad job run`. Once run, Nomad will begin the rolling upgrade of our service by placing
|
||||
2 allocations of the new job at a time and taking 2 of the old allocations down.
|
||||
|
||||
We can inspect the current state of a rolling deployment using `nomad status`:
|
||||
|
||||
```shell
|
||||
$ nomad status geo-api-server
|
||||
ID = geo-api-server
|
||||
Name = geo-api-server
|
||||
Submit Date = 07/26/17 18:08:56 UTC
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
api-server 0 0 6 0 4 0
|
||||
|
||||
Latest Deployment
|
||||
ID = c5b34665
|
||||
Status = running
|
||||
Description = Deployment is running
|
||||
|
||||
Deployed
|
||||
Task Group Desired Placed Healthy Unhealthy
|
||||
api-server 6 4 2 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created At
|
||||
14d288e8 f7b1ee08 api-server 1 run running 07/26/17 18:09:17 UTC
|
||||
a134f73c f7b1ee08 api-server 1 run running 07/26/17 18:09:17 UTC
|
||||
a2574bb6 f7b1ee08 api-server 1 run running 07/26/17 18:08:56 UTC
|
||||
496e7aa2 f7b1ee08 api-server 1 run running 07/26/17 18:08:56 UTC
|
||||
9fc96fcc f7b1ee08 api-server 0 run running 07/26/17 18:04:30 UTC
|
||||
2521c47a f7b1ee08 api-server 0 run running 07/26/17 18:04:30 UTC
|
||||
6b794fcb f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
9bc11bd7 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
691eea24 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
af115865 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
```
|
||||
|
||||
Here we can see that Nomad has created a deployment to conduct the rolling
|
||||
upgrade from job version 0 to 1, placing 4 instances of the new job and
|
||||
stopping 4 of the old instances. If we look at the deployed allocations, we
|
||||
can also see that Nomad has placed 4 instances of job version 1 but only
|
||||
considers 2 of them healthy. This is because the 2 newest placed allocations
|
||||
haven't been healthy for the required 30 seconds yet.
|
||||
|
||||
If we wait for the deployment to complete and re-issue the command, we get the
|
||||
following:
|
||||
|
||||
```shell
|
||||
$ nomad status geo-api-server
|
||||
ID = geo-api-server
|
||||
Name = geo-api-server
|
||||
Submit Date = 07/26/17 18:08:56 UTC
|
||||
Type = service
|
||||
Priority = 50
|
||||
Datacenters = dc1
|
||||
Status = running
|
||||
Periodic = false
|
||||
Parameterized = false
|
||||
|
||||
Summary
|
||||
Task Group Queued Starting Running Failed Complete Lost
|
||||
cache 0 0 6 0 6 0
|
||||
|
||||
Latest Deployment
|
||||
ID = c5b34665
|
||||
Status = successful
|
||||
Description = Deployment completed successfully
|
||||
|
||||
Deployed
|
||||
Task Group Desired Placed Healthy Unhealthy
|
||||
cache 6 6 6 0
|
||||
|
||||
Allocations
|
||||
ID Node ID Task Group Version Desired Status Created At
|
||||
d42a1656 f7b1ee08 api-server 1 run running 07/26/17 18:10:10 UTC
|
||||
401daaf9 f7b1ee08 api-server 1 run running 07/26/17 18:10:00 UTC
|
||||
14d288e8 f7b1ee08 api-server 1 run running 07/26/17 18:09:17 UTC
|
||||
a134f73c f7b1ee08 api-server 1 run running 07/26/17 18:09:17 UTC
|
||||
a2574bb6 f7b1ee08 api-server 1 run running 07/26/17 18:08:56 UTC
|
||||
496e7aa2 f7b1ee08 api-server 1 run running 07/26/17 18:08:56 UTC
|
||||
9fc96fcc f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
2521c47a f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
6b794fcb f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
9bc11bd7 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
691eea24 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
af115865 f7b1ee08 api-server 0 stop complete 07/26/17 18:04:30 UTC
|
||||
```
|
||||
|
||||
Nomad has successfully transitioned the group to running the updated job and
|
||||
did so with no downtime to our service by ensuring only two allocations were
|
||||
changed at a time and the newly placed allocations ran successfully. Had any of
|
||||
the newly placed allocations failed their health check, Nomad would have aborted
|
||||
the deployment and stopped placing new allocations. If configured, Nomad can
|
||||
automatically revert to the old job definition when the deployment fails.
|
||||
|
||||
## Auto Reverting on Failed Deployments
|
||||
|
||||
If we do a deployment in which the new allocations are unhealthy, Nomad
|
||||
will fail the deployment and stop placing new instances of the job. It
|
||||
optionally supports automatically reverting to the last stable job version
|
||||
on deployment failure. Nomad keeps a history of submitted jobs and whether the
|
||||
job version was stable. A job is considered stable if all its allocations are
|
||||
healthy.
|
||||
|
||||
To enable this we simply add the `auto_revert` parameter to the `update` stanza:
|
||||
|
||||
```hcl
|
||||
update {
|
||||
max_parallel = 2
|
||||
min_healthy_time = "30s"
|
||||
healthy_deadline = "10m"
|
||||
|
||||
# Enable automatically reverting to the last stable job on a failed
|
||||
# deployment.
|
||||
auto_revert = true
|
||||
}
|
||||
```
|
||||
|
||||
Now imagine we want to update our image to "geo-api-server:0.3" but we instead
|
||||
update it as shown below and run the job:
|
||||
|
||||
```diff
|
||||
@@ -2,6 +2,8 @@
|
||||
job "geo-api-server" {
|
||||
group "api-server" {
|
||||
task "server" {
|
||||
driver = "docker"
|
||||
|
||||
config {
|
||||
- image = "geo-api-server:0.2"
|
||||
+ image = "geo-api-server:0.33"
|
||||
```
|
||||
|
||||
If we run `nomad job deployments` we can see that the deployment fails and Nomad
|
||||
auto-reverts to the last stable job:
|
||||
|
||||
```shell
|
||||
$ nomad job deployments geo-api-server
|
||||
ID Job ID Job Version Status Description
|
||||
0c6f87a5 geo-api-server 3 successful Deployment completed successfully
|
||||
b1712b7f geo-api-server 2 failed Failed due to unhealthy allocations - rolling back to job version 1
|
||||
3eee83ce geo-api-server 1 successful Deployment completed successfully
|
||||
72813fcf geo-api-server 0 successful Deployment completed successfully
|
||||
```
|
||||
|
||||
Nomad job versions increment monotonically, so even though Nomad reverted to the
|
||||
job specification at version 1, it creates a new job version. We can see the
|
||||
differences between a job's versions and how Nomad auto-reverted the job using
|
||||
the `job history` command:
|
||||
|
||||
```shell
|
||||
$ nomad job history -p geo-api-server
|
||||
Version = 3
|
||||
Stable = true
|
||||
Submit Date = 07/26/17 18:44:18 UTC
|
||||
Diff =
|
||||
+/- Job: "geo-api-server"
|
||||
+/- Task Group: "api-server"
|
||||
+/- Task: "server"
|
||||
+/- Config {
|
||||
+/- image: "geo-api-server:0.33" => "geo-api-server:0.2"
|
||||
}
|
||||
|
||||
Version = 2
|
||||
Stable = false
|
||||
Submit Date = 07/26/17 18:45:21 UTC
|
||||
Diff =
|
||||
+/- Job: "geo-api-server"
|
||||
+/- Task Group: "api-server"
|
||||
+/- Task: "server"
|
||||
+/- Config {
|
||||
+/- image: "geo-api-server:0.2" => "geo-api-server:0.33"
|
||||
}
|
||||
|
||||
Version = 1
|
||||
Stable = true
|
||||
Submit Date = 07/26/17 18:44:18 UTC
|
||||
Diff =
|
||||
+/- Job: "geo-api-server"
|
||||
+/- Task Group: "api-server"
|
||||
+/- Task: "server"
|
||||
+/- Config {
|
||||
+/- image: "geo-api-server:0.1" => "geo-api-server:0.2"
|
||||
}
|
||||
|
||||
Version = 0
|
||||
Stable = true
|
||||
Submit Date = 07/26/17 18:43:43 UTC
|
||||
```
|
||||
|
||||
We can see that Nomad considered the job running "geo-api-server:0.1" and
|
||||
"geo-api-server:0.2" as stable but job Version 2 that submitted the incorrect
|
||||
image, is marked as unstable. This is because the placed allocations failed to
|
||||
start. Nomad detected that the deployment failed and, as such, created job version 3
|
||||
that reverted to the last healthy job.
|
||||
|
||||
[update]: /docs/job-specification/update 'Nomad update Stanza'
|
223
website/pages/guides/operations/autopilot.mdx
Normal file
|
@ -0,0 +1,223 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Autopilot
|
||||
sidebar_title: Autopilot
|
||||
description: This guide covers how to configure and use Autopilot features.
|
||||
---
|
||||
|
||||
# Autopilot
|
||||
|
||||
Autopilot is a set of new features added in Nomad 0.8 to allow for automatic
|
||||
operator-friendly management of Nomad servers. It includes cleanup of dead
|
||||
servers, monitoring the state of the Raft cluster, and stable server introduction.
|
||||
|
||||
To enable Autopilot features (with the exception of dead server cleanup),
|
||||
the `raft_protocol` setting in the [server stanza](/docs/configuration/server)
|
||||
must be set to 3 on all servers. This setting defaults to 2; a cluster configured with protocol 2 can be upgraded
|
||||
to protocol 3 with a rolling update, provided time is allowed for membership to stabilize following each server update.
|
||||
During an upgrade from raft protocol 2 to 3, use the `nomad operator raft list-peers`
|
||||
command between server updates to verify that each server identifier is replaced with a UUID.
|
||||
For more information, see the [Version Upgrade section](/guides/upgrade/upgrade-specific#raft-protocol-version-compatibility)
|
||||
on Raft Protocol versions.
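
As a sketch, running `nomad operator raft list-peers` between server upgrades shows whether each server identifier has switched to a UUID; the exact columns and values below are illustrative, reusing the server names and addresses from the health output later in this guide.

```shell
# After upgrading to raft protocol 3, each server's ID is a UUID rather than its address.
$ nomad operator raft list-peers
Node   ID                                    Address         State     Voter  RaftProtocol
node1  e349749b-3303-3ddf-959c-b5885a0e1f6e  127.0.0.1:4647  leader    true   3
node2  e35bde83-4e9c-434f-a6ef-453f44ee21ea  127.0.0.1:4747  follower  true   3
```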
|
||||
|
||||
## Configuration
|
||||
|
||||
The configuration of Autopilot is loaded by the leader from the agent's
|
||||
[Autopilot settings](/docs/configuration/autopilot) when initially
|
||||
bootstrapping the cluster:
|
||||
|
||||
```
|
||||
autopilot {
|
||||
cleanup_dead_servers = true
|
||||
last_contact_threshold = 200ms
|
||||
max_trailing_logs = 250
|
||||
server_stabilization_time = "10s"
|
||||
enable_redundancy_zones = false
|
||||
disable_upgrade_migration = false
|
||||
enable_custom_upgrades = false
|
||||
}
|
||||
```
|
||||
|
||||
After bootstrapping, the configuration can be viewed or modified either via the
|
||||
[`operator autopilot`](/docs/commands/operator) subcommand or the
|
||||
[`/v1/operator/autopilot/configuration`](/api/operator#read-autopilot-configuration)
|
||||
HTTP endpoint:
|
||||
|
||||
```shell
|
||||
$ nomad operator autopilot get-config
|
||||
CleanupDeadServers = true
|
||||
LastContactThreshold = 200ms
|
||||
MaxTrailingLogs = 250
|
||||
ServerStabilizationTime = 10s
|
||||
EnableRedundancyZones = false
|
||||
DisableUpgradeMigration = false
|
||||
EnableCustomUpgrades = false
|
||||
|
||||
$ nomad operator autopilot set-config -cleanup-dead-servers=false
|
||||
Configuration updated!
|
||||
|
||||
$ nomad operator autopilot get-config
|
||||
CleanupDeadServers = false
|
||||
LastContactThreshold = 200ms
|
||||
MaxTrailingLogs = 250
|
||||
ServerStabilizationTime = 10s
|
||||
EnableRedundancyZones = false
|
||||
DisableUpgradeMigration = false
|
||||
EnableCustomUpgrades = false
|
||||
```
|
||||
|
||||
## Dead Server Cleanup
|
||||
|
||||
Dead servers will periodically be cleaned up and removed from the Raft peer
|
||||
set, to prevent them from interfering with the quorum size and leader elections.
|
||||
This cleanup will also happen whenever a new server is successfully added to the
|
||||
cluster.
|
||||
|
||||
Prior to Autopilot, it would take 72 hours for dead servers to be automatically reaped,
|
||||
or operators had to script a `nomad force-leave`. If another server failure occurred,
|
||||
it could jeopardize the quorum, even if the failed Nomad server had been automatically
|
||||
replaced. Autopilot helps prevent these kinds of outages by quickly removing failed
|
||||
servers as soon as a replacement Nomad server comes online. When servers are removed
|
||||
by the cleanup process they will enter the "left" state.
|
||||
|
||||
This option can be disabled by running `nomad operator autopilot set-config`
|
||||
with the `-cleanup-dead-servers=false` option.
|
||||
|
||||
## Server Health Checking
|
||||
|
||||
An internal health check runs on the leader to track the stability of servers.
|
||||
A server is considered healthy if all of the following conditions are true:
|
||||
|
||||
- Its status according to Serf is 'Alive'
|
||||
- The time since its last contact with the current leader is below
|
||||
`LastContactThreshold`
|
||||
- Its latest Raft term matches the leader's term
|
||||
- The number of Raft log entries it trails the leader by does not exceed
|
||||
`MaxTrailingLogs`
|
||||
|
||||
The status of these health checks can be viewed through the
|
||||
[`/v1/operator/autopilot/health`](/api/operator#read-health) HTTP endpoint, with
|
||||
a top level `Healthy` field indicating the overall status of the cluster:
|
||||
|
||||
```shell
|
||||
$ curl localhost:4646/v1/operator/autopilot/health
|
||||
{
|
||||
"Healthy": true,
|
||||
"FailureTolerance": 0,
|
||||
"Servers": [
|
||||
{
|
||||
"ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
|
||||
"Name": "node1",
|
||||
"Address": "127.0.0.1:4647",
|
||||
"SerfStatus": "alive",
|
||||
"Version": "0.8.0",
|
||||
"Leader": true,
|
||||
"LastContact": "0s",
|
||||
"LastTerm": 2,
|
||||
"LastIndex": 10,
|
||||
"Healthy": true,
|
||||
"Voter": true,
|
||||
"StableSince": "2017-03-28T18:28:52Z"
|
||||
},
|
||||
{
|
||||
"ID": "e35bde83-4e9c-434f-a6ef-453f44ee21ea",
|
||||
"Name": "node2",
|
||||
"Address": "127.0.0.1:4747",
|
||||
"SerfStatus": "alive",
|
||||
"Version": "0.8.0",
|
||||
"Leader": false,
|
||||
"LastContact": "35.371007ms",
|
||||
"LastTerm": 2,
|
||||
"LastIndex": 10,
|
||||
"Healthy": true,
|
||||
"Voter": false,
|
||||
"StableSince": "2017-03-28T18:29:10Z"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Stable Server Introduction
|
||||
|
||||
When a new server is added to the cluster, there is a waiting period where it
|
||||
must be healthy and stable for a certain amount of time before being promoted
|
||||
to a full, voting member. This can be configured via the `ServerStabilizationTime`
|
||||
setting.
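
A minimal sketch of tuning this window, assuming the `-server-stabilization-time` flag of `nomad operator autopilot set-config` (it follows the same pattern as the flags shown above; verify it against your Nomad version):

```shell
# Require 30 seconds of continuous health before promoting a new server to voter.
$ nomad operator autopilot set-config -server-stabilization-time=30s
Configuration updated!
```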
|
||||
|
||||
---
|
||||
|
||||
~> The following Autopilot features are available only in
|
||||
[Nomad Enterprise](https://www.hashicorp.com/products/nomad/) version 0.8.0 and later.
|
||||
|
||||
## Server Read and Scheduling Scaling
|
||||
|
||||
With the [`non_voting_server`](/docs/configuration/server#non_voting_server) option, a
|
||||
server can be explicitly marked as a non-voter and will never be promoted to a voting
|
||||
member. This can be useful when more read scaling is needed; being a non-voter means
|
||||
that the server will still have data replicated to it, but it will not be part of the
|
||||
quorum that the leader must wait for before committing log entries. Non-voting servers can also
|
||||
act as scheduling workers to increase scheduling throughput in large clusters.
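
As a hedged illustration (this exact file is not part of the guide), the only configuration needed to mark a server as a permanent non-voter is the option referenced above:

```hcl
# server.hcl -- illustrative sketch
server {
  enabled           = true

  # Replicate state and run scheduling workers, but never join the Raft quorum.
  non_voting_server = true
}
```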
|
||||
|
||||
## Redundancy Zones
|
||||
|
||||
Prior to Autopilot, it was difficult to deploy servers in a way that took advantage of
|
||||
isolated failure domains such as AWS Availability Zones; users would be forced to either
|
||||
have an overly-large quorum (2-3 nodes per AZ) or give up redundancy within an AZ by
|
||||
deploying just one server in each.
|
||||
|
||||
If the `EnableRedundancyZones` setting is set, Nomad will use its value to look for a
|
||||
zone in each server's specified [`redundancy_zone`](/docs/configuration/server#redundancy_zone)
|
||||
field.
|
||||
|
||||
Here's an example showing how to configure this:
|
||||
|
||||
```hcl
|
||||
# config.hcl
|
||||
server {
|
||||
redundancy_zone = "west-1"
|
||||
}
|
||||
```
|
||||
|
||||
```shell
|
||||
$ nomad operator autopilot set-config -enable-redundancy-zones=true
|
||||
Configuration updated!
|
||||
```
|
||||
|
||||
Nomad will then use these values to partition the servers by redundancy zone, and will
|
||||
aim to keep one voting server per zone. Extra servers in each zone will stay as non-voters
|
||||
on standby to be promoted if the active voter leaves or dies.
|
||||
|
||||
## Upgrade Migrations
|
||||
|
||||
Autopilot in Nomad Enterprise supports upgrade migrations by default. To disable this
|
||||
functionality, set `DisableUpgradeMigration` to true.
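
For example, assuming the `-disable-upgrade-migration` flag of `nomad operator autopilot set-config` (named after the setting above; confirm against your version), the feature could be switched off with:

```shell
$ nomad operator autopilot set-config -disable-upgrade-migration=true
Configuration updated!
```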
|
||||
|
||||
When a new server is added and Autopilot detects that its Nomad version is newer than
|
||||
that of the existing servers, Autopilot will avoid promoting the new server until enough
|
||||
newer-versioned servers have been added to the cluster. When the count of new servers
|
||||
equals or exceeds that of the old servers, Autopilot will begin promoting the new servers
|
||||
to voters and demoting the old servers. After this is finished, the old servers can be
|
||||
safely removed from the cluster.
|
||||
|
||||
To check the Nomad version of the servers, either the [autopilot health](/api/operator#read-health)
|
||||
endpoint or the `nomad server members` command can be used:
|
||||
|
||||
```shell
|
||||
$ nomad server members
|
||||
Name Address Port Status Leader Protocol Build Datacenter Region
|
||||
node1 127.0.0.1 4648 alive true 3 0.7.1 dc1 global
|
||||
node2 127.0.0.1 4748 alive false 3 0.7.1 dc1 global
|
||||
node3 127.0.0.1 4848 alive false 3 0.7.1 dc1 global
|
||||
node4 127.0.0.1 4948 alive false 3 0.8.0 dc1 global
|
||||
```
|
||||
|
||||
### Migrations Without a Nomad Version Change
|
||||
|
||||
The `EnableCustomUpgrades` field can be used to override the version information used during
|
||||
a migration, so that the migration logic can be used for updating the cluster when
|
||||
changing configuration.
|
||||
|
||||
If the `EnableCustomUpgrades` setting is set to `true`, Nomad will use its value to look for a
|
||||
version in each server's specified [`upgrade_version`](/docs/configuration/server#upgrade_version)
|
||||
tag. The upgrade logic will follow semantic versioning and the `upgrade_version`
|
||||
must be in the form of either `X`, `X.Y`, or `X.Y.Z`.
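
A hedged sketch combining the two settings referenced above (the version string is purely illustrative):

```hcl
# config.hcl
server {
  upgrade_version = "1.1.0"
}
```

```shell
$ nomad operator autopilot set-config -enable-custom-upgrades=true
Configuration updated!
```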
|
website/pages/guides/operations/cluster/automatic.mdx (new file, 119 lines)
|
@ -0,0 +1,119 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Automatic Clustering with Consul
|
||||
sidebar_title: Automatic Clustering with Consul
|
||||
description: |-
|
||||
Learn how to automatically bootstrap a Nomad cluster using Consul. By having
|
||||
a Consul agent installed on each host, Nomad can automatically discover other
|
||||
clients and servers to bootstrap the cluster without operator involvement.
|
||||
---
|
||||
|
||||
# Automatic Clustering with Consul
|
||||
|
||||
To automatically bootstrap a Nomad cluster, we must leverage another HashiCorp
|
||||
open source tool, [Consul](https://www.consul.io/). Bootstrapping Nomad is
|
||||
easiest against an existing Consul cluster. The Nomad servers and clients
|
||||
will become informed of each other's existence when the Consul agent is
|
||||
installed and configured on each host. As an added benefit, integrating Consul
|
||||
with Nomad provides service and health check registration for applications which
|
||||
later run under Nomad.
|
||||
|
||||
Consul models infrastructures as datacenters and multiple Consul datacenters can
|
||||
be connected over the WAN so that clients can discover nodes in other
|
||||
datacenters. Since Nomad regions can encapsulate many datacenters, we recommend
|
||||
running a Consul cluster in every Nomad datacenter and connecting them over the
|
||||
WAN. Please refer to the Consul guide for both
|
||||
[bootstrapping](https://www.consul.io/docs/guides/bootstrapping.html) a single
|
||||
datacenter and [connecting multiple Consul clusters over the
|
||||
WAN](https://www.consul.io/docs/guides/datacenters.html).
|
||||
|
||||
If a Consul agent is installed on the host prior to Nomad starting, the Nomad
|
||||
agent will register with Consul and discover other nodes.
|
||||
|
||||
For servers, we must inform the cluster how many servers we expect to have. This
|
||||
is required to form the initial quorum, since Nomad is unaware of how many peers
|
||||
to expect. For example, to form a region with three Nomad servers, you would use
|
||||
the following Nomad configuration file:
|
||||
|
||||
```hcl
|
||||
# /etc/nomad.d/server.hcl
|
||||
|
||||
data_dir = "/etc/nomad.d"
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
bootstrap_expect = 3
|
||||
}
|
||||
```
|
||||
|
||||
This configuration would be saved to disk and then run:
|
||||
|
||||
```shell
|
||||
$ nomad agent -config=/etc/nomad.d/server.hcl
|
||||
```
|
||||
|
||||
A similar configuration is available for Nomad clients:
|
||||
|
||||
```hcl
|
||||
# /etc/nomad.d/client.hcl
|
||||
|
||||
datacenter = "dc1"
|
||||
data_dir = "/etc/nomad.d"
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
}
|
||||
```
|
||||
|
||||
The agent is started in a similar manner:
|
||||
|
||||
```shell
|
||||
$ nomad agent -config=/etc/nomad.d/client.hcl
|
||||
```
|
||||
|
||||
As you can see, the above configurations include no IP or DNS addresses between
|
||||
the clients and servers. This is because Nomad detected the existence of Consul
|
||||
and utilized service discovery to form the cluster.
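
One optional way to confirm this (not part of the original walkthrough) is to ask the local Consul agent which services are registered; with the default service names described in the Internals section below, output along these lines would be expected:

```shell
$ consul catalog services
consul
nomad
nomad-client
```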
|
||||
|
||||
## Internals
|
||||
|
||||
~> This section discusses the internals of the Consul and Nomad integration at a
|
||||
very high level. Reading is only recommended for those curious about the
|
||||
implementation.
|
||||
|
||||
As discussed in the previous section, Nomad merges multiple configuration files
|
||||
together, so the `-config` may be specified more than once:
|
||||
|
||||
```shell
|
||||
$ nomad agent -config=base.hcl -config=server.hcl
|
||||
```
|
||||
|
||||
In addition to merging configuration on the command line, Nomad also maintains
|
||||
its own internal configurations (called "default configs") which include sane
|
||||
base defaults. One of those default configurations includes a "consul" block,
|
||||
which specifies sane defaults for connecting to and integrating with Consul. In
|
||||
essence, this configuration file resembles the following:
|
||||
|
||||
```hcl
|
||||
# You do not need to add this to your configuration file. This is an example
|
||||
# that is part of Nomad's internal default configuration for Consul integration.
|
||||
consul {
|
||||
# The address to the Consul agent.
|
||||
address = "127.0.0.1:8500"
|
||||
|
||||
# The service name to register the server and client with Consul.
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
|
||||
# Enables automatically registering the services.
|
||||
auto_advertise = true
|
||||
|
||||
# Enabling the server and client to bootstrap using Consul.
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
```
|
||||
|
||||
Please refer to the [Consul
|
||||
documentation](/docs/configuration/consul) for the complete set of
|
||||
configuration options.
|
website/pages/guides/operations/cluster/cloud_auto_join.mdx (new file, 33 lines)
|
@ -0,0 +1,33 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Cloud Auto-join
|
||||
sidebar_title: Cloud Auto-join
|
||||
description: |-
|
||||
Nomad supports automatic cluster joining using cloud metadata from various
|
||||
cloud providers
|
||||
---
|
||||
|
||||
# Cloud Auto-joining
|
||||
|
||||
As of Nomad 0.8.4,
|
||||
[`retry_join`](/docs/configuration/server_join#retry_join) accepts a
|
||||
unified interface using the
|
||||
[go-discover](https://github.com/hashicorp/go-discover) library for doing
|
||||
automatic cluster joining using cloud metadata. To use retry-join with a
|
||||
supported cloud provider, specify the configuration on the command line or
|
||||
configuration file as a `key=value key=value ...` string. Values are taken
|
||||
literally and must not be URL encoded. If the values contain spaces, backslashes
|
||||
or double quotes, then they need to be double quoted and the usual escaping rules
|
||||
apply.
|
||||
|
||||
```json
|
||||
{
|
||||
"retry_join": ["provider=my-cloud config=val config2=\"some other val\" ..."]
|
||||
}
|
||||
```
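
As a concrete but hypothetical AWS example using go-discover's `provider=aws` syntax (the tag key and value are placeholders you would define on your own instances):

```json
{
  "retry_join": ["provider=aws tag_key=nomad-server tag_value=true"]
}
```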
|
||||
|
||||
The cloud provider-specific configurations are documented [here](/docs/configuration/server_join#cloud-auto-join).
|
||||
This can be combined with static IP or DNS addresses or even multiple configurations
|
||||
for different providers. In order to use discovery behind a proxy, you will need to set
|
||||
`HTTP_PROXY`, `HTTPS_PROXY` and `NO_PROXY` environment variables per
|
||||
[Golang `net/http` library](https://golang.org/pkg/net/http/#ProxyFromEnvironment).
|
website/pages/guides/operations/cluster/index.mdx (new file, 24 lines)
|
@ -0,0 +1,24 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Clustering
|
||||
sidebar_title: Clustering
|
||||
description: Learn how to cluster Nomad.
|
||||
---
|
||||
|
||||
# Clustering
|
||||
|
||||
Nomad models infrastructure into regions and datacenters. Servers reside at the
|
||||
regional layer and manage all state and scheduling decisions for that region.
|
||||
Regions contain multiple datacenters, and clients are registered to a single
|
||||
datacenter (and thus a region that contains that datacenter). For more details on
|
||||
the architecture of Nomad and how it models infrastructure see the [architecture
|
||||
page](/docs/internals/architecture).
|
||||
|
||||
There are multiple strategies available for creating a multi-node Nomad cluster:
|
||||
|
||||
1. [Manual Clustering](/guides/operations/cluster/manual)
|
||||
1. [Automatic Clustering with Consul](/guides/operations/cluster/automatic)
|
||||
1. [Cloud Auto-join](/guides/operations/cluster/cloud_auto_join)
|
||||
|
||||
Please refer to the specific documentation links above or in the sidebar for
|
||||
more detailed information about each strategy.
|
website/pages/guides/operations/cluster/manual.mdx (new file, 70 lines)
|
@ -0,0 +1,70 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Manually Clustering
|
||||
sidebar_title: Manual Clustering
|
||||
description: |-
|
||||
Learn how to manually bootstrap a Nomad cluster using the server join
|
||||
command. This section also discusses Nomad federation across multiple
|
||||
datacenters and regions.
|
||||
---
|
||||
|
||||
# Manual Clustering
|
||||
|
||||
Manually bootstrapping a Nomad cluster does not rely on additional tooling, but
|
||||
does require operator participation in the cluster formation process. When
|
||||
bootstrapping, Nomad servers and clients must be started and informed with the
|
||||
address of at least one Nomad server.
|
||||
|
||||
As you can tell, this creates a chicken-and-egg problem where one server must
|
||||
first be fully bootstrapped and configured before the remaining servers and
|
||||
clients can join the cluster. This requirement can add additional provisioning
|
||||
time as well as ordered dependencies during provisioning.
|
||||
|
||||
First, we bootstrap a single Nomad server and capture its IP address. After we
|
||||
have that node's IP address, we place this address in the configuration.
|
||||
|
||||
For Nomad servers, this configuration may look something like this:
|
||||
|
||||
```hcl
|
||||
server {
|
||||
enabled = true
|
||||
bootstrap_expect = 3
|
||||
|
||||
# This is the IP address of the first server we provisioned
|
||||
server_join {
|
||||
retry_join = ["<known-address>:4648"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Alternatively, the address can be supplied after the servers have all been
|
||||
started by running the [`server join` command](/docs/commands/server/join)
|
||||
on the servers individually to cluster the servers. All servers can join just
|
||||
one other server, and then rely on the gossip protocol to discover the rest.
|
||||
|
||||
```shell
|
||||
$ nomad server join <known-address>
|
||||
```
|
||||
|
||||
For Nomad clients, the configuration may look something like:
|
||||
|
||||
```hcl
|
||||
client {
|
||||
enabled = true
|
||||
servers = ["<known-address>:4647"]
|
||||
}
|
||||
```
|
||||
|
||||
The client node's server list can be updated at run time using the [`node config` command](/docs/commands/node/config).
|
||||
|
||||
```shell
|
||||
$ nomad node config -update-servers <IP>:4647
|
||||
```
|
||||
|
||||
The port corresponds to the RPC port. If no port is specified with the IP
|
||||
address, the default RPC port of `4647` is assumed.
|
||||
|
||||
As servers are added or removed from the cluster, this information is pushed to
|
||||
the client. This means only one server must be specified because, after initial
|
||||
contact, the full set of servers in the client's region are shared with the
|
||||
client.
|
website/pages/guides/operations/federation.mdx (new file, 36 lines)
|
@ -0,0 +1,36 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Multi-region Federation
|
||||
sidebar_title: Federation
|
||||
description: |-
|
||||
Learn how to join Nomad servers across multiple regions so users can submit
|
||||
jobs to any server in any region using global federation.
|
||||
---
|
||||
|
||||
# Multi-region Federation
|
||||
|
||||
Because Nomad operates at a regional level, federation is part of Nomad core.
|
||||
Federation enables users to submit jobs or interact with the HTTP API targeting
|
||||
any region, from any server, even if that server resides in a different region.
|
||||
|
||||
Federating multiple Nomad clusters requires network connectivity between the
|
||||
clusters. Servers in each cluster must be able to communicate over [RPC and
|
||||
Serf][ports]. Federated clusters are expected to communicate over WANs, so they
|
||||
do not need the same low latency as servers within a region.
|
||||
|
||||
Once Nomad servers are able to connect, federating is as simple as joining the
|
||||
servers. From any server in one region, issue a join command to a server in a
|
||||
remote region:
|
||||
|
||||
```shell
|
||||
$ nomad server join 1.2.3.4:4648
|
||||
```
|
||||
|
||||
Note that only one join command is required per region. Servers across regions
|
||||
discover other servers in the cluster via the gossip protocol and hence it's
|
||||
enough to join just one known server.
|
||||
|
||||
If bootstrapped via Consul and the Consul clusters in the Nomad regions are
|
||||
federated, then federation occurs automatically.
|
||||
|
||||
[ports]: /guides/install/production/requirements#ports-used
|
website/pages/guides/operations/index.mdx (new file, 13 lines)
|
@ -0,0 +1,13 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Nomad Operations
|
||||
sidebar_title: Operating Nomad
|
||||
description: Learn how to operate Nomad.
|
||||
---
|
||||
|
||||
# Nomad Operations
|
||||
|
||||
The Nomad Operations guides section provides best practices and guidance for
|
||||
operating Nomad in a real-world production setting.
|
||||
|
||||
Please navigate the appropriate sub-sections for more information.
|
|
@ -0,0 +1,22 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Monitoring and Alerting
|
||||
sidebar_title: Monitoring and Alerting
|
||||
description: |-
|
||||
It is possible to enable telemetry on Nomad servers and clients. Nomad
|
||||
can integrate with various metrics dashboards such as Prometheus, Grafana,
|
||||
Graphite, DataDog, and Circonus.
|
||||
---
|
||||
|
||||
# Monitoring and Alerting
|
||||
|
||||
Nomad provides the opportunity to integrate with metrics dashboard tools such
|
||||
as [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/),
|
||||
[Graphite](https://graphiteapp.org/), [DataDog](https://www.datadoghq.com/),
|
||||
and [Circonus](https://www.circonus.com).
|
||||
|
||||
- [Prometheus](/guides/operations/monitoring-and-alerting/prometheus-metrics)
|
||||
|
||||
Please refer to the specific documentation links above or in the sidebar for more detailed information about using specific tools to collect metrics on Nomad.
|
||||
See Nomad's [Metrics API](/api/metrics) for more information on how
|
||||
data can be exposed for other metrics tools as well.
|
|
@ -0,0 +1,571 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Using Prometheus to Monitor Nomad Metrics
|
||||
sidebar_title: Prometheus
|
||||
description: |-
|
||||
It is possible to collect metrics on Nomad with Prometheus after enabling
|
||||
telemetry on Nomad servers and clients.
|
||||
---
|
||||
|
||||
# Using Prometheus to Monitor Nomad Metrics
|
||||
|
||||
This guide explains how to configure [Prometheus][prometheus] to integrate with
|
||||
a Nomad cluster and Prometheus [Alertmanager][alertmanager]. While this guide introduces the basics of enabling [telemetry][telemetry] and alerting, a Nomad operator can go much further by customizing dashboards and integrating different
|
||||
[receivers][receivers] for alerts.
|
||||
|
||||
## Reference Material
|
||||
|
||||
- [Configuring Prometheus][configuring prometheus]
|
||||
- [Telemetry Stanza in Nomad Agent Configuration][telemetry stanza]
|
||||
- [Alerting Overview][alerting]
|
||||
|
||||
## Estimated Time to Complete
|
||||
|
||||
25 minutes
|
||||
|
||||
## Challenge
|
||||
|
||||
Think of a scenario where a Nomad operator needs to deploy Prometheus to
|
||||
collect metrics from a Nomad cluster. The operator must enable telemetry on
|
||||
the Nomad servers and clients as well as configure Prometheus to use Consul
|
||||
for service discovery. The operator must also configure Prometheus Alertmanager
|
||||
so notifications can be sent out to a specified [receiver][receivers].
|
||||
|
||||
## Solution
|
||||
|
||||
Deploy Prometheus with a configuration that accounts for a highly dynamic
|
||||
environment. Integrate service discovery into the configuration file to avoid
|
||||
using hard-coded IP addresses. Place the Prometheus deployment behind
|
||||
[fabio][fabio] (this will allow easy access to the Prometheus web interface
|
||||
by allowing the Nomad operator to hit any of the client nodes at the `/` path).
|
||||
|
||||
## Prerequisites
|
||||
|
||||
To perform the tasks described in this guide, you need to have a Nomad
|
||||
environment with Consul installed. You can use this
|
||||
[repo](https://github.com/hashicorp/nomad/tree/master/terraform#provision-a-nomad-cluster-in-the-cloud)
|
||||
to easily provision a sandbox environment. This guide will assume a cluster with
|
||||
one server node and three client nodes.
|
||||
|
||||
-> **Please Note:** This guide is for demo purposes and is only using a single
|
||||
server node. In a production cluster, 3 or 5 server nodes are recommended. The
|
||||
alerting rules defined in this guide are for instructional purposes. Please
|
||||
refer to [Alerting Rules][alertingrules] for more information.
|
||||
|
||||
## Steps
|
||||
|
||||
### Step 1: Enable Telemetry on Nomad Servers and Clients
|
||||
|
||||
Add the stanza below in your Nomad client and server configuration
|
||||
files. If you have used the provided repo in this guide to set up a Nomad
|
||||
cluster, the configuration file will be `/etc/nomad.d/nomad.hcl`.
|
||||
After making this change, restart the Nomad service on each server and
|
||||
client node.
|
||||
|
||||
```hcl
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = true
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
```
|
||||
|
||||
### Step 2: Create a Job for Fabio
|
||||
|
||||
Create a job for Fabio and name it `fabio.nomad`
|
||||
|
||||
```hcl
|
||||
job "fabio" {
|
||||
datacenters = ["dc1"]
|
||||
type = "system"
|
||||
|
||||
group "fabio" {
|
||||
task "fabio" {
|
||||
driver = "docker"
|
||||
config {
|
||||
image = "fabiolb/fabio"
|
||||
network_mode = "host"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 64
|
||||
network {
|
||||
mbits = 20
|
||||
port "lb" {
|
||||
static = 9999
|
||||
}
|
||||
port "ui" {
|
||||
static = 9998
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
To learn more about fabio and the options used in this job file, see
|
||||
[Load Balancing with Fabio][fabio-lb]. For the purpose of this guide, it is
|
||||
important to note that the `type` option is set to [system][system] so that
|
||||
fabio will be deployed on all client nodes. We have also set `network_mode` to
|
||||
`host` so that fabio will be able to use Consul for service discovery.
|
||||
|
||||
### Step 3: Run the Fabio Job
|
||||
|
||||
We can now register our fabio job:
|
||||
|
||||
```shell
|
||||
$ nomad job run fabio.nomad
|
||||
==> Monitoring evaluation "7b96701e"
|
||||
Evaluation triggered by job "fabio"
|
||||
Allocation "d0e34682" created: node "28d7f859", group "fabio"
|
||||
Allocation "238ec0f7" created: node "510898b6", group "fabio"
|
||||
Allocation "9a2e8359" created: node "f3739267", group "fabio"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "7b96701e" finished with status "complete"
|
||||
```
|
||||
|
||||
At this point, you should be able to visit any one of your client nodes at port
|
||||
`9998` and see the web interface for fabio. The routing table will be empty
|
||||
since we have not yet deployed anything that fabio can route to.
|
||||
Accordingly, if you visit any of the client nodes at port `9999` at this
|
||||
point, you will get a `404` HTTP response. That will change soon.
|
||||
|
||||
### Step 4: Create a Job for Prometheus
|
||||
|
||||
Create a job for Prometheus and name it `prometheus.nomad`
|
||||
|
||||
```hcl
|
||||
job "prometheus" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "monitoring" {
|
||||
count = 1
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
ephemeral_disk {
|
||||
size = 300
|
||||
}
|
||||
|
||||
task "prometheus" {
|
||||
template {
|
||||
change_mode = "noop"
|
||||
destination = "local/prometheus.yml"
|
||||
data = <<EOH
|
||||
---
|
||||
global:
|
||||
scrape_interval: 5s
|
||||
evaluation_interval: 5s
|
||||
|
||||
scrape_configs:
|
||||
|
||||
- job_name: 'nomad_metrics'
|
||||
|
||||
consul_sd_configs:
|
||||
- server: '{{ env "NOMAD_IP_prometheus_ui" }}:8500'
|
||||
services: ['nomad-client', 'nomad']
|
||||
|
||||
relabel_configs:
|
||||
- source_labels: ['__meta_consul_tags']
|
||||
regex: '(.*)http(.*)'
|
||||
action: keep
|
||||
|
||||
scrape_interval: 5s
|
||||
metrics_path: /v1/metrics
|
||||
params:
|
||||
format: ['prometheus']
|
||||
EOH
|
||||
}
|
||||
driver = "docker"
|
||||
config {
|
||||
image = "prom/prometheus:latest"
|
||||
volumes = [
|
||||
"local/prometheus.yml:/etc/prometheus/prometheus.yml"
|
||||
]
|
||||
port_map {
|
||||
prometheus_ui = 9090
|
||||
}
|
||||
}
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "prometheus_ui" {}
|
||||
}
|
||||
}
|
||||
service {
|
||||
name = "prometheus"
|
||||
tags = ["urlprefix-/"]
|
||||
port = "prometheus_ui"
|
||||
check {
|
||||
name = "prometheus_ui port alive"
|
||||
type = "http"
|
||||
path = "/-/healthy"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Notice we are using the [template][template] stanza to create a Prometheus
|
||||
configuration using [environment][env] variables. In this case, we are using the
|
||||
environment variable `NOMAD_IP_prometheus_ui` in the
|
||||
[consul_sd_configs][consul_sd_config]
|
||||
section to ensure Prometheus can use Consul to detect and scrape targets.
|
||||
This works in our example because Consul is installed alongside Nomad.
|
||||
Additionally, we benefit from this configuration by avoiding the need to
|
||||
hard-code IP addresses. If you did not use the repo provided in this guide to
|
||||
create a Nomad cluster, be sure to point your Prometheus configuration
|
||||
to a Consul server you have set up.
|
||||
|
||||
The [volumes][volumes] option allows us to take the configuration file we
|
||||
dynamically created and place it in our Prometheus container.
|
||||
|
||||
### Step 5: Run the Prometheus Job
|
||||
|
||||
We can now register our job for Prometheus:
|
||||
|
||||
```shell
|
||||
$ nomad job run prometheus.nomad
|
||||
==> Monitoring evaluation "4e6b7127"
|
||||
Evaluation triggered by job "prometheus"
|
||||
Evaluation within deployment: "d3a651a7"
|
||||
Allocation "9725af3d" created: node "28d7f859", group "monitoring"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "4e6b7127" finished with status "complete"
|
||||
```
|
||||
|
||||
Prometheus is now deployed. You can visit any of your client nodes at port
|
||||
`9999` to visit the web interface. There is only one instance of Prometheus
|
||||
running in the Nomad cluster, but you are automatically routed to it
|
||||
regardless of which node you visit because fabio is deployed and running on the
|
||||
cluster as well.
|
||||
|
||||
At the top menu bar, click on `Status` and then `Targets`. You should see all
|
||||
of your Nomad nodes (servers and clients) show up as targets. Please note that
|
||||
the IP addresses will be different in your cluster.
|
||||
|
||||
[![Prometheus Targets][prometheus-targets]][prometheus-targets]
|
||||
|
||||
Let's use Prometheus to query how many jobs are running in our Nomad cluster.
|
||||
On the main page, type `nomad_nomad_job_summary_running` into the query
|
||||
section. You can also select the query from the drop-down list.
|
||||
|
||||
[![Running Jobs][running-jobs]][running-jobs]
|
||||
|
||||
You can see that the value of our fabio job is `3` since it is using the
|
||||
[system][system] scheduler type. This makes sense because we are running
|
||||
three Nomad clients in our demo cluster. The value of our Prometheus job, on
|
||||
the other hand, is `1` since we have only deployed one instance of it.
|
||||
To see the description of other metrics, visit the [telemetry][telemetry]
|
||||
section.
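
The same query can also be issued against Prometheus's HTTP query API, which is convenient for scripting. This is a hedged sketch that assumes you are reaching Prometheus through fabio on port `9999` as described above:

```shell
$ curl -s 'http://<client-node-ip>:9999/api/v1/query?query=nomad_nomad_job_summary_running'
```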
|
||||
|
||||
### Step 6: Deploy Alertmanager
|
||||
|
||||
Now that we have enabled Prometheus to collect metrics from our cluster and see
|
||||
the state of our jobs, let's deploy [Alertmanager][alertmanager]. Keep in mind
|
||||
that Prometheus sends alerts to Alertmanager. It is then Alertmanager's job to
|
||||
send out the notifications on those alerts to any designated [receiver][receivers].
|
||||
|
||||
Create a job for Alertmanager and name it `alertmanager.nomad`
|
||||
|
||||
```hcl
|
||||
job "alertmanager" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "alerting" {
|
||||
count = 1
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
ephemeral_disk {
|
||||
size = 300
|
||||
}
|
||||
|
||||
task "alertmanager" {
|
||||
driver = "docker"
|
||||
config {
|
||||
image = "prom/alertmanager:latest"
|
||||
port_map {
|
||||
alertmanager_ui = 9093
|
||||
}
|
||||
}
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "alertmanager_ui" {}
|
||||
}
|
||||
}
|
||||
service {
|
||||
name = "alertmanager"
|
||||
tags = ["urlprefix-/alertmanager strip=/alertmanager"]
|
||||
port = "alertmanager_ui"
|
||||
check {
|
||||
name = "alertmanager_ui port alive"
|
||||
type = "http"
|
||||
path = "/-/healthy"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Step 7: Configure Prometheus to Integrate with Alertmanager
|
||||
|
||||
Now that we have deployed Alertmanager, let's slightly modify our Prometheus job
|
||||
configuration so that it recognizes Alertmanager and can send alerts to it. Note that there are
|
||||
some rules in the configuration that refer to a web server we will deploy soon.
|
||||
|
||||
Below is the same Prometheus configuration we detailed above, but we have added
|
||||
some sections that hook Prometheus into the Alertmanager and set up some Alerting
|
||||
rules.
|
||||
|
||||
```hcl
|
||||
job "prometheus" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "monitoring" {
|
||||
count = 1
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
ephemeral_disk {
|
||||
size = 300
|
||||
}
|
||||
|
||||
task "prometheus" {
|
||||
template {
|
||||
change_mode = "noop"
|
||||
destination = "local/webserver_alert.yml"
|
||||
data = <<EOH
|
||||
---
|
||||
groups:
|
||||
- name: prometheus_alerts
|
||||
rules:
|
||||
- alert: Webserver down
|
||||
expr: absent(up{job="webserver"})
|
||||
for: 10s
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
description: "Our webserver is down."
|
||||
EOH
|
||||
}
|
||||
|
||||
template {
|
||||
change_mode = "noop"
|
||||
destination = "local/prometheus.yml"
|
||||
data = <<EOH
|
||||
---
|
||||
global:
|
||||
scrape_interval: 5s
|
||||
evaluation_interval: 5s
|
||||
|
||||
alerting:
|
||||
alertmanagers:
|
||||
- consul_sd_configs:
|
||||
- server: '{{ env "NOMAD_IP_prometheus_ui" }}:8500'
|
||||
services: ['alertmanager']
|
||||
|
||||
rule_files:
|
||||
- "webserver_alert.yml"
|
||||
|
||||
scrape_configs:
|
||||
|
||||
- job_name: 'alertmanager'
|
||||
|
||||
consul_sd_configs:
|
||||
- server: '{{ env "NOMAD_IP_prometheus_ui" }}:8500'
|
||||
services: ['alertmanager']
|
||||
|
||||
- job_name: 'nomad_metrics'
|
||||
|
||||
consul_sd_configs:
|
||||
- server: '{{ env "NOMAD_IP_prometheus_ui" }}:8500'
|
||||
services: ['nomad-client', 'nomad']
|
||||
|
||||
relabel_configs:
|
||||
- source_labels: ['__meta_consul_tags']
|
||||
regex: '(.*)http(.*)'
|
||||
action: keep
|
||||
|
||||
scrape_interval: 5s
|
||||
metrics_path: /v1/metrics
|
||||
params:
|
||||
format: ['prometheus']
|
||||
|
||||
- job_name: 'webserver'
|
||||
|
||||
consul_sd_configs:
|
||||
- server: '{{ env "NOMAD_IP_prometheus_ui" }}:8500'
|
||||
services: ['webserver']
|
||||
|
||||
metrics_path: /metrics
|
||||
EOH
|
||||
}
|
||||
driver = "docker"
|
||||
config {
|
||||
image = "prom/prometheus:latest"
|
||||
volumes = [
|
||||
"local/webserver_alert.yml:/etc/prometheus/webserver_alert.yml",
|
||||
"local/prometheus.yml:/etc/prometheus/prometheus.yml"
|
||||
]
|
||||
port_map {
|
||||
prometheus_ui = 9090
|
||||
}
|
||||
}
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "prometheus_ui" {}
|
||||
}
|
||||
}
|
||||
service {
|
||||
name = "prometheus"
|
||||
tags = ["urlprefix-/"]
|
||||
port = "prometheus_ui"
|
||||
check {
|
||||
name = "prometheus_ui port alive"
|
||||
type = "http"
|
||||
path = "/-/healthy"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Notice we have added a few important sections to this job file:
|
||||
|
||||
- We added another template stanza that defines an [alerting rule][alertingrules]
|
||||
for our web server. Namely, Prometheus will send out an alert if it detects
|
||||
the `webserver` service has disappeared.
|
||||
|
||||
- We added an `alerting` block to our Prometheus configuration as well as a
|
||||
`rule_files` block to make Prometheus aware of Alertmanager as well as the
|
||||
rule we have defined.
|
||||
|
||||
- We are now also scraping Alertmanager along with our
|
||||
web server.
|
||||
|
||||
### Step 8: Deploy Web Server
|
||||
|
||||
Create a job for our web server and name it `webserver.nomad`
|
||||
|
||||
```hcl
|
||||
job "webserver" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
group "webserver" {
|
||||
task "server" {
|
||||
driver = "docker"
|
||||
config {
|
||||
image = "hashicorp/demo-prometheus-instrumentation:latest"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 256
|
||||
network {
|
||||
mbits = 10
|
||||
port "http"{}
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "webserver"
|
||||
port = "http"
|
||||
|
||||
tags = [
|
||||
"testweb",
|
||||
"urlprefix-/webserver strip=/webserver",
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/"
|
||||
interval = "2s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
At this point, re-run your Prometheus job. After a few seconds, you will see the
|
||||
web server and Alertmanager appear in your list of targets.
|
||||
|
||||
[![New Targets][new-targets]][new-targets]
|
||||
|
||||
You should also be able to go to the `Alerts` section of the Prometheus web interface
|
||||
and see the alert that we have configured. No alerts are active because our web server
|
||||
is up and running.
|
||||
|
||||
[![Alerts][alerts]][alerts]
|
||||
|
||||
### Step 9: Stop the Web Server
|
||||
|
||||
Run `nomad stop webserver` to stop our webserver. After a few seconds, you will see
|
||||
that we have an active alert in the `Alerts` section of the web interface.
|
||||
|
||||
[![Active Alerts][active-alerts]][active-alerts]
|
||||
|
||||
We can now go to the Alertmanager web interface to see that Alertmanager has received
|
||||
this alert as well. Since Alertmanager has been configured behind fabio, go to the IP address of any of your client nodes at port `9999` and use `/alertmanager` as the route. An example is shown below:
|
||||
|
||||
-> < client node IP >:9999/alertmanager
|
||||
|
||||
You should see that Alertmanager has received the alert.
|
||||
|
||||
[![Alertmanager Web UI][alertmanager-webui]][alertmanager-webui]
|
||||
|
||||
## Next Steps
|
||||
|
||||
Read more about Prometheus [Alertmanager][alertmanager] and how to configure it
|
||||
to send out notifications to a [receiver][receivers] of your choice.
|
||||
|
||||
[active-alerts]: /img/active-alert.png
|
||||
[alerts]: /img/alerts.png
|
||||
[alerting]: https://prometheus.io/docs/alerting/overview/
|
||||
[alertingrules]: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/
|
||||
[alertmanager]: https://prometheus.io/docs/alerting/alertmanager/
|
||||
[alertmanager-webui]: /img/alertmanager-webui.png
|
||||
[configuring prometheus]: https://prometheus.io/docs/introduction/first_steps/#configuring-prometheus
|
||||
[consul_sd_config]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cconsul_sd_config%3E
|
||||
[env]: /docs/runtime/environment
|
||||
[fabio]: https://fabiolb.net/
|
||||
[fabio-lb]: https://learn.hashicorp.com/guides/load-balancing/fabio
|
||||
[new-targets]: /img/new-targets.png
|
||||
[prometheus-targets]: /img/prometheus-targets.png
|
||||
[running-jobs]: /img/running-jobs.png
|
||||
[telemetry]: /docs/configuration/telemetry
|
||||
[telemetry stanza]: /docs/configuration/telemetry
|
||||
[template]: /docs/job-specification/template
|
||||
[volumes]: /docs/drivers/docker#volumes
|
||||
[prometheus]: https://prometheus.io/docs/introduction/overview/
|
||||
[receivers]: https://prometheus.io/docs/alerting/configuration/#%3Creceiver%3E
|
||||
[system]: /docs/schedulers#system
|
website/pages/guides/operations/monitoring/index.mdx (new file, 7 lines)
|
@ -0,0 +1,7 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Monitoring
|
||||
description: Nomad monitoring
|
||||
---
|
||||
|
||||
Use the navigation on the left to see guides on Nomad monitoring.
|
website/pages/guides/operations/monitoring/nomad-metrics.mdx (new file, 61 lines)
|
@ -0,0 +1,61 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Setting up Nomad with Grafana and Prometheus Metrics
|
||||
description: |-
|
||||
It is possible to collect metrics on Nomad and create dashboards with Grafana
|
||||
and Prometheus. Nomad has default configurations for these, but it is
|
||||
possible to build and customize these.
|
||||
---
|
||||
|
||||
# Setting up Nomad with Grafana and Prometheus Metrics
|
||||
|
||||
Often aggregating and displaying metrics in dashboards can lead to more useful
|
||||
insights about a cluster. It is easy to get lost in a sea of logs!
|
||||
|
||||
This guide explains how to set up configuration for Prometheus and Grafana to
|
||||
integrate with a Nomad cluster. While this introduces the basics to get a
|
||||
dashboard up and running, Nomad exposes a wide variety of metrics, which can be
|
||||
explored via both Grafana and Prometheus.
|
||||
|
||||
## What metrics tools can be integrated with Nomad?
|
||||
|
||||
Nomad provides the opportunity to integrate with metrics dashboard tools such
|
||||
as [Prometheus](https://prometheus.io/), [Grafana](https://grafana.com/),
|
||||
[Graphite](https://graphiteapp.org/), [DataDog](https://www.datadoghq.com/),
|
||||
and [Circonus](https://www.circonus.com).
|
||||
|
||||
See Nomad's [Metrics API](/api/metrics) for more information on how
|
||||
data can be exposed for other metrics tools as well.
|
||||
|
||||
## Setting up metrics
|
||||
|
||||
Configurations for Grafana and Prometheus can be found in the
|
||||
[integrations](https://github.com/hashicorp/nomad/tree/master/integrations) subfolder.
|
||||
|
||||
For Prometheus, first follow Prometheus's [Getting Started
|
||||
Guide](https://prometheus.io/docs/introduction/getting_started/) in order to
|
||||
set up a Prometheus server. Next, use the [Nomad Prometheus
|
||||
Configuration](https://github.com/hashicorp/nomad/tree/master/integrations/prometheus/prometheus.yml)
|
||||
in order to configure Prometheus to talk to a Consul agent to fetch information
|
||||
about the Nomad cluster. See the
|
||||
[README](https://github.com/hashicorp/nomad/tree/master/integrations/prometheus/README.md)
|
||||
for more information.
|
||||
|
||||
For Grafana, follow Grafana's [Getting
|
||||
Started](http://docs.grafana.org/guides/getting_started/) guide to set up a
|
||||
running Grafana instance. Then, import the sample [Nomad
|
||||
Dashboard](https://github.com/hashicorp/nomad/blob/master/integrations/grafana_dashboards/sample_grafana_dashboard.json)
|
||||
for an example Grafana dashboard. This dashboard requires a Prometheus data
|
||||
source to be configured, see the
|
||||
[README.md](https://github.com/hashicorp/nomad/tree/master/integrations/grafana/README.md)
|
||||
for more information.
|
||||
|
||||
## Tagged Metrics
|
||||
|
||||
As of version 0.7, Nomad will start emitting metrics in a tagged format. Each
|
||||
metric can support more than one tag, meaning that it is possible to do a
|
||||
match over metrics for datapoints such as a particular datacenter, and return
|
||||
all metrics with this tag.
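
In a Prometheus-style query, for instance, those tags surface as labels that can be matched on. The metric and label names below are illustrative only and should be checked against what your cluster actually emits:

```text
nomad_client_allocations_running{datacenter="dc1"}
```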
|
||||
|
||||
See how [Grafana](http://docs.grafana.org/v3.1/reference/templating/) enables
|
||||
tagged metrics easily.
|
website/pages/guides/operations/node-draining.mdx (new file, 342 lines)
|
@ -0,0 +1,342 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Workload Migration
|
||||
sidebar_title: Workload Migration
|
||||
description: |-
|
||||
Workload migration is a normal part of cluster operations for a variety of
|
||||
reasons: server maintenance, operating system upgrades, etc. Nomad offers a
|
||||
number of parameters for controlling how running jobs are migrated off of
|
||||
draining nodes.
|
||||
---
|
||||
|
||||
# Workload Migration
|
||||
|
||||
Migrating workloads and decommissioning nodes are a normal part of cluster
|
||||
operations for a variety of reasons: server maintenance, operating system
|
||||
upgrades, etc. Nomad offers a number of parameters for controlling how running
|
||||
jobs are migrated off of draining nodes.
|
||||
|
||||
## Configuring How Jobs are Migrated
|
||||
|
||||
In Nomad 0.8 a [`migrate`][migrate] stanza was added to jobs to allow control
|
||||
over how allocations for a job are migrated off of a draining node. Below is an
|
||||
example job that runs a web service and has a Consul health check:
|
||||
|
||||
```hcl
|
||||
job "webapp" {
|
||||
datacenters = ["dc1"]
|
||||
|
||||
migrate {
|
||||
max_parallel = 2
|
||||
health_check = "checks"
|
||||
min_healthy_time = "15s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
|
||||
group "webapp" {
|
||||
count = 9
|
||||
|
||||
task "webapp" {
|
||||
driver = "docker"
|
||||
config {
|
||||
image = "hashicorp/http-echo:0.2.3"
|
||||
args = ["-text", "ok"]
|
||||
port_map {
|
||||
http = 5678
|
||||
}
|
||||
}
|
||||
|
||||
resources {
|
||||
network {
|
||||
mbits = 10
|
||||
port "http" {}
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "webapp"
|
||||
port = "http"
|
||||
check {
|
||||
name = "http-ok"
|
||||
type = "http"
|
||||
path = "/"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The above `migrate` stanza ensures only 2 allocations are stopped at a time to
|
||||
migrate during node drains. Even if multiple nodes running allocations for this
|
||||
job were draining at the same time, only 2 allocations would be migrated at a
|
||||
time.
|
||||
|
||||
When the job is run it may be placed on multiple nodes. In the following
|
||||
example the 9 `webapp` allocations are spread across 2 nodes:
|
||||
|
||||
```shell
|
||||
$ nomad run webapp.nomad
|
||||
==> Monitoring evaluation "5129bc74"
|
||||
Evaluation triggered by job "webapp"
|
||||
Allocation "5b4d6db5" created: node "46f1c6c4", group "webapp"
|
||||
Allocation "670a715f" created: node "f7476465", group "webapp"
|
||||
Allocation "78b6b393" created: node "46f1c6c4", group "webapp"
|
||||
Allocation "85743ff5" created: node "f7476465", group "webapp"
|
||||
Allocation "edf71a5d" created: node "f7476465", group "webapp"
|
||||
Allocation "56f770c0" created: node "46f1c6c4", group "webapp"
|
||||
Allocation "9a51a484" created: node "46f1c6c4", group "webapp"
|
||||
Allocation "f6f6e64c" created: node "f7476465", group "webapp"
|
||||
Allocation "fefe81d0" created: node "f7476465", group "webapp"
|
||||
Evaluation status changed: "pending" -> "complete"
|
||||
==> Evaluation "5129bc74" finished with status "complete"
|
||||
```
|
||||
|
||||
If one of those nodes needed to be decommissioned, perhaps because of a hardware
|
||||
issue, then an operator would issue a node drain to migrate the allocations off:
|
||||
|
||||
```shell
|
||||
$ nomad node drain -enable -yes 46f1
|
||||
2018-04-11T23:41:56Z: Ctrl-C to stop monitoring: will not cancel the node drain
|
||||
2018-04-11T23:41:56Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set
|
||||
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
|
||||
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
|
||||
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" draining
|
||||
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" draining
|
||||
2018-04-11T23:42:03Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" status running -> complete
|
||||
2018-04-11T23:42:03Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" status running -> complete
|
||||
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
|
||||
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" draining
|
||||
2018-04-11T23:42:27Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" status running -> complete
|
||||
2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" marked for migration
|
||||
2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" draining
|
||||
2018-04-11T23:42:29Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" has marked all allocations for migration
|
||||
2018-04-11T23:42:34Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" status running -> complete
|
||||
2018-04-11T23:42:34Z: All allocations on node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" have stopped.
|
||||
```
|
||||
|
||||
There are a couple of important events to notice in the output. First, only 2
|
||||
allocations are migrated initially:
|
||||
|
||||
```text
|
||||
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
|
||||
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
|
||||
```
|
||||
|
||||
This is because `max_parallel = 2` in the job specification. The next
|
||||
allocation on the draining node waits to be migrated:
|
||||
|
||||
```text
|
||||
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
|
||||
```
|
||||
|
||||
Note that this occurs 25 seconds after the initial migrations. The 25 second
|
||||
delay is because a replacement allocation took 10 seconds to become healthy and
|
||||
then the `min_healthy_time = "15s"` meant node draining waited an additional 15
|
||||
seconds. If the replacement allocation had failed within that time the node
|
||||
drain would not have continued until a replacement could be successfully made.
|
||||
|
||||
### Scheduling Eligibility
|
||||
|
||||
Now that the example drain has finished we can inspect the state of the drained
|
||||
node:
|
||||
|
||||
```shell
|
||||
$ nomad node status
|
||||
ID DC Name Class Drain Eligibility Status
|
||||
f7476465 dc1 nomad-1 <none> false eligible ready
|
||||
96b52ad8 dc1 nomad-2 <none> false eligible ready
|
||||
46f1c6c4 dc1 nomad-3 <none> false ineligible ready
|
||||
```
|
||||
|
||||
While node `46f1c6c4` has `Drain = false`, notice that its `Eligibility = ineligible`. Node scheduling eligibility is a new field in Nomad 0.8. When a
|
||||
node is ineligible for scheduling the scheduler will not consider it for new
|
||||
placements.
|
||||
|
||||
While draining, a node will always be ineligible for scheduling. Once draining
|
||||
completes it will remain ineligible to prevent refilling a newly drained node.
|
||||
|
||||
However, by default canceling a drain with the `-disable` option will reset a
|
||||
node to be eligible for scheduling. To cancel a drain while preserving the node's
|
||||
ineligible status use the `-keep-ineligible` option.
|
||||
|
||||
Scheduling eligibility can be toggled independently of node drains by using the
|
||||
[`nomad node eligibility`][eligibility] command:
|
||||
|
||||
```shell
|
||||
$ nomad node eligibility -disable 46f1
|
||||
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling
|
||||
```
|
||||
|
||||
### Node Drain Deadline
|
||||
|
||||
Sometimes a drain is unable to proceed and complete normally. This could be
|
||||
caused by not enough capacity existing in the cluster to replace the drained
|
||||
allocations or by replacement allocations failing to start successfully in a
|
||||
timely fashion.
|
||||
|
||||
Operators may specify a deadline when enabling a node drain to prevent drains
|
||||
from not finishing. Once the deadline is reached, all remaining allocations on
|
||||
the node are stopped regardless of `migrate` stanza parameters.
|
||||
|
||||
The default deadline is 1 hour and may be changed with the
|
||||
[`-deadline`][deadline] command line option. The [`-force`][force] option is an
|
||||
instant deadline: all allocations are immediately stopped. The
|
||||
[`-no-deadline`][no-deadline] option disables the deadline so a drain may
|
||||
continue indefinitely.
|
||||
|
||||
Like all other drain parameters, a drain's deadline can be updated by making
|
||||
subsequent `nomad node drain ...` calls with updated values.
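
A hedged example of setting a deadline and then tightening it on the same node (node IDs abbreviated as elsewhere in this guide):

```shell
# Start a drain that may run for at most 30 minutes.
$ nomad node drain -enable -deadline 30m -yes 46f1

# Tighten the deadline on the drain already in progress.
$ nomad node drain -enable -deadline 5m -yes 46f1
```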
|
||||
|
||||
## Node Drains and Non-Service Jobs
|
||||
|
||||
So far we have only seen how draining works with service jobs. Both batch and
|
||||
system jobs have different behaviors during node drains.
|
||||
|
||||
### Draining Batch Jobs
|
||||
|
||||
Node drains only migrate batch jobs once the drain's deadline has been reached.
|
||||
For node drains without a deadline the drain will not complete until all batch
|
||||
jobs on the node have completed (or failed).
|
||||
|
||||
The goal of this behavior is to avoid losing progress a batch job has made by
|
||||
forcing it to exit early.
|
||||
|
||||
### Keeping System Jobs Running
|
||||
|
||||
Node drains only stop system jobs once all other allocations have exited. This
|
||||
way if a node is running a log shipping daemon or metrics collector as a system
|
||||
job, it will continue to run as long as there are other allocations running.
|
||||
|
||||
The [`-ignore-system`][ignore-system] option leaves system jobs running even
|
||||
after all other allocations have exited. This is useful when system jobs are
|
||||
used to monitor Nomad or the node itself.
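
For example, a sketch of a drain that leaves system jobs such as a log shipper running:

```shell
$ nomad node drain -enable -ignore-system -yes 46f1
```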
|
||||
|
||||
## Draining Multiple Nodes
|
||||
|
||||
A common operation is to decommission an entire class of nodes at once. Prior
|
||||
to Nomad 0.7 this was a problematic operation as the first node to begin
|
||||
draining might migrate all of its allocations to the next node about to be
|
||||
drained. In pathological cases this could repeat on each node to be drained and
|
||||
cause allocations to be rescheduled repeatedly.
|
||||
|
||||
As of Nomad 0.8 an operator can avoid this churn by marking nodes ineligible
|
||||
for scheduling before draining them using the [`nomad node eligibility`][eligibility] command:
|
||||
|
||||
```shell
|
||||
$ nomad node eligibility -disable 46f1
|
||||
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling
|
||||
|
||||
$ nomad node eligibility -disable 96b5
|
||||
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling
|
||||
|
||||
$ nomad node status
|
||||
ID DC Name Class Drain Eligibility Status
|
||||
f7476465 dc1 nomad-1 <none> false eligible ready
|
||||
46f1c6c4 dc1 nomad-2 <none> false ineligible ready
|
||||
96b52ad8 dc1 nomad-3 <none> false ineligible ready
|
||||
```
|
||||
|
||||
Now that both `nomad-2` and `nomad-3` are ineligible for scheduling, they can
|
||||
be drained without risking placing allocations on an _about-to-be-drained_
|
||||
node.
|
||||
|
||||
Toggling scheduling eligibility can be done totally independently of draining.
|
||||
For example when an operator wants to inspect the allocations currently running
|
||||
on a node without risking new allocations being scheduled and changing the
|
||||
node's state:
|
||||
|
||||
```shell
|
||||
$ nomad node eligibility -self -disable
|
||||
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling
|
||||
|
||||
$ # ...inspect node state...
|
||||
|
||||
$ nomad node eligibility -self -enable
|
||||
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: eligible for scheduling
|
||||
```
|
||||
|
||||
### Example: Migrating Datacenters
|
||||
|
||||
A more complete example of draining multiple nodes would be when migrating from
|
||||
an old datacenter (`dc1`) to a new datacenter (`dc2`):
|
||||
|
||||
```shell
|
||||
$ nomad node status -allocs
|
||||
ID DC Name Class Drain Eligibility Status Running Allocs
|
||||
f7476465 dc1 nomad-1 <none> false eligible ready 4
|
||||
46f1c6c4 dc1 nomad-2 <none> false eligible ready 1
|
||||
96b52ad8 dc1 nomad-3 <none> false eligible ready 4
|
||||
168bdd03 dc2 nomad-4 <none> false eligible ready 0
|
||||
9ccb3306 dc2 nomad-5 <none> false eligible ready 0
|
||||
7a7f9a37 dc2 nomad-6 <none> false eligible ready 0
|
||||
```
|
||||
|
||||
Before migrating ensure that all jobs in `dc1` have `datacenters = ["dc1", "dc2"]`. Then before draining, mark all nodes in `dc1` as ineligible for
|
||||
scheduling. Shell scripting can help automate manipulating multiple nodes at
|
||||
once:
|
||||
|
||||
```shell
|
||||
$ nomad node status | awk '{ print $2 " " $1 }' | grep ^dc1 | awk '{ system("nomad node eligibility -disable "$2) }'
|
||||
Node "f7476465-4d6e-c0de-26d0-e383c49be941" scheduling eligibility set: ineligible for scheduling
|
||||
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling
|
||||
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling
|
||||
|
||||
$ nomad node status
|
||||
ID DC Name Class Drain Eligibility Status
|
||||
f7476465 dc1 nomad-1 <none> false ineligible ready
|
||||
46f1c6c4 dc1 nomad-2 <none> false ineligible ready
|
||||
96b52ad8 dc1 nomad-3 <none> false ineligible ready
|
||||
168bdd03 dc2 nomad-4 <none> false eligible ready
|
||||
9ccb3306 dc2 nomad-5 <none> false eligible ready
|
||||
7a7f9a37 dc2 nomad-6 <none> false eligible ready
|
||||
```
|
||||
|
||||
Then drain each node in `dc1`. For this example we will only monitor the final
|
||||
node that is draining. Watching `nomad node status -allocs` is also a good way
|
||||
to monitor the status of drains.
|
||||
|
||||
```shell
|
||||
$ nomad node drain -enable -yes -detach f7476465
|
||||
Node "f7476465-4d6e-c0de-26d0-e383c49be941" drain strategy set
|
||||
|
||||
$ nomad node drain -enable -yes -detach 46f1c6c4
|
||||
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set
|
||||
|
||||
$ nomad node drain -enable -yes 9ccb3306
|
||||
2018-04-12T22:08:00Z: Ctrl-C to stop monitoring: will not cancel the node drain
|
||||
2018-04-12T22:08:00Z: Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" drain strategy set
|
||||
2018-04-12T22:08:15Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" marked for migration
|
||||
2018-04-12T22:08:16Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" draining
|
||||
2018-04-12T22:08:17Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" marked for migration
|
||||
2018-04-12T22:08:17Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" draining
|
||||
2018-04-12T22:08:21Z: Alloc "392ee2ec-d517-c170-e7b1-d93b2d44642c" status running -> complete
|
||||
2018-04-12T22:08:22Z: Alloc "6a833b3b-c062-1f5e-8dc2-8b6af18a5b94" status running -> complete
|
||||
2018-04-12T22:09:08Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" marked for migration
|
||||
2018-04-12T22:09:09Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" draining
|
||||
2018-04-12T22:09:14Z: Alloc "d572d7a3-024b-fcb7-128b-1932a49c8d79" status running -> complete
|
||||
2018-04-12T22:09:33Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" marked for migration
|
||||
2018-04-12T22:09:33Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" draining
|
||||
2018-04-12T22:09:33Z: Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" has marked all allocations for migration
|
||||
2018-04-12T22:09:39Z: Alloc "f3f24277-4435-56a3-7ee1-1b1eff5e3aa1" status running -> complete
|
||||
2018-04-12T22:09:39Z: All allocations on node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" have stopped.
|
||||
```
|
||||
|
||||
Note that there was a 15 second delay between node `96b52ad8` starting to drain
|
||||
and having its first allocation migrated. The delay was due to 2 other
|
||||
allocations for the same job already being migrated from the other nodes. Once
|
||||
at least 8 out of the 9 allocations are running for the job, another allocation
|
||||
could begin draining.
|
||||
|
||||
The final node drain command did not exit until 6 seconds after the node had marked all allocations for migration, because the command line tool blocks until all allocations on
|
||||
the node have stopped. This allows operators to script shutting down a node
|
||||
once a drain command exits, knowing that all services have already exited.
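
For example, a minimal decommissioning sketch (the `NODE_ID` variable and the systemd-based shutdown are illustrative assumptions, not part of this guide's cluster):

```shell
# Drain the node and block until all allocations on it have stopped.
$ nomad node drain -enable -yes "$NODE_ID"

# Once the drain command returns, all services have exited and it is
# safe to stop the agent and power the machine off.
$ sudo systemctl stop nomad
$ sudo shutdown -h now
```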
|
||||
|
||||
[deadline]: /docs/commands/node/drain#deadline
|
||||
[eligibility]: /docs/commands/node/eligibility
|
||||
[force]: /docs/commands/node/drain#force
|
||||
[ignore-system]: /docs/commands/node/drain#ignore-system
|
||||
[migrate]: /docs/job-specification/migrate
|
||||
[no-deadline]: /docs/commands/node/drain#no-deadline
|
218
website/pages/guides/operations/outage.mdx
Normal file
|
@ -0,0 +1,218 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Outage Recovery
|
||||
sidebar_title: Outage Recovery
|
||||
description: |-
|
||||
Don't panic! This is a critical first step. Depending on your deployment
|
||||
configuration, it may take only a single server failure for cluster
|
||||
unavailability. Recovery requires an operator to intervene, but recovery is
|
||||
straightforward.
|
||||
---
|
||||
|
||||
# Outage Recovery
|
||||
|
||||
Don't panic! This is a critical first step.
|
||||
|
||||
Depending on your
|
||||
[deployment configuration](/docs/internals/consensus#deployment_table), it
|
||||
may take only a single server failure for cluster unavailability. Recovery
|
||||
requires an operator to intervene, but the process is straightforward.
|
||||
|
||||
~> This guide is for recovery from a Nomad outage due to a majority of server
|
||||
nodes in a datacenter being lost. If you are looking to add or remove servers,
|
||||
see the [bootstrapping guide](/guides/operations/cluster/bootstrapping).
|
||||
|
||||
## Failure of a Single Server Cluster
|
||||
|
||||
If you had only a single server and it has failed, simply restart it. A
|
||||
single server configuration requires the
|
||||
[`-bootstrap-expect=1`](/docs/configuration/server#bootstrap_expect)
|
||||
flag. If the server cannot be recovered, you need to bring up a new
|
||||
server. See the [bootstrapping guide](/guides/operations/cluster/bootstrapping)
|
||||
for more detail.
|
||||
|
||||
In the case of an unrecoverable server failure in a single server cluster, data
|
||||
loss is inevitable since data was not replicated to any other servers. This is
|
||||
why a single server deploy is **never** recommended.
|
||||
|
||||
## Failure of a Server in a Multi-Server Cluster
|
||||
|
||||
If you think the failed server is recoverable, the easiest option is to bring
|
||||
it back online and have it rejoin the cluster with the same IP address, returning
|
||||
the cluster to a fully healthy state. Similarly, even if you need to rebuild a
|
||||
new Nomad server to replace the failed node, you may wish to do that immediately.
|
||||
Keep in mind that the rebuilt server needs to have the same IP address as the failed
|
||||
server. Again, once this server is online and has rejoined, the cluster will return
|
||||
to a fully healthy state.
|
||||
|
||||
Both of these strategies involve a potentially lengthy time to reboot or rebuild
|
||||
a failed server. If this is impractical or if building a new server with the same
|
||||
IP isn't an option, you need to remove the failed server. Usually, you can issue
|
||||
a [`nomad server force-leave`](/docs/commands/server/force-leave) command
|
||||
to remove the failed server if it's still a member of the cluster.
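
For example, if a failed server named `nomad-server03.global` (a hypothetical name) is still listed as a member, it can be removed with:

```shell
# Force-remove a failed server that is still shown as a cluster member.
$ nomad server force-leave nomad-server03.global
```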
|
||||
|
||||
If [`nomad server force-leave`](/docs/commands/server/force-leave) isn't
|
||||
able to remove the server, you have two methods available to remove it,
|
||||
depending on your version of Nomad:
|
||||
|
||||
- In Nomad 0.5.5 and later, you can use the [`nomad operator raft remove-peer`](/docs/commands/operator/raft-remove-peer) command to remove
|
||||
the stale peer server on the fly with no downtime.
|
||||
|
||||
- In versions of Nomad prior to 0.5.5, you can manually remove the stale peer
|
||||
server using the `raft/peers.json` recovery file on all remaining servers. See
|
||||
the [section below](#manual-recovery-using-peers-json) for details on this
|
||||
procedure. This process requires Nomad downtime to complete.
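
For the Nomad 0.5.5 and later option above, removing the stale peer might look like the following sketch (the peer address is a hypothetical value):

```shell
# Remove the stale Raft peer by its RPC address with no downtime.
$ nomad operator raft remove-peer -peer-address "10.10.11.8:4647"
```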
|
||||
|
||||
In Nomad 0.5.5 and later, you can use the [`nomad operator raft list-peers`](/docs/commands/operator/raft-list-peers) command to inspect
|
||||
the Raft configuration:
|
||||
|
||||
```shell
|
||||
$ nomad operator raft list-peers
|
||||
Node ID Address State Voter
|
||||
nomad-server01.global 10.10.11.5:4647 10.10.11.5:4647 follower true
|
||||
nomad-server02.global 10.10.11.6:4647 10.10.11.6:4647 leader true
|
||||
nomad-server03.global 10.10.11.7:4647 10.10.11.7:4647 follower true
|
||||
```
|
||||
|
||||
## Failure of Multiple Servers in a Multi-Server Cluster
|
||||
|
||||
In the event that multiple servers are lost, causing a loss of quorum and a
|
||||
complete outage, partial recovery is possible using data on the remaining
|
||||
servers in the cluster. There may be data loss in this situation because multiple
|
||||
servers were lost, so information about what's committed could be incomplete.
|
||||
The recovery process implicitly commits all outstanding Raft log entries, so
|
||||
it's also possible to commit data that was uncommitted before the failure.
|
||||
|
||||
See the [section below](#manual-recovery-using-peers-json) for details of the
|
||||
recovery procedure. Simply include the remaining servers in the
|
||||
`raft/peers.json` recovery file. The cluster should be able to elect a leader
|
||||
once the remaining servers are all restarted with an identical `raft/peers.json`
|
||||
configuration.
|
||||
|
||||
Any new servers you introduce later can be fresh with totally clean data directories
|
||||
and joined using Nomad's `server join` command.
|
||||
|
||||
In extreme cases, it should be possible to recover with just a single remaining
|
||||
server by starting that single server with itself as the only peer in the
|
||||
`raft/peers.json` recovery file.
|
||||
|
||||
Prior to Nomad 0.5.5 it wasn't always possible to recover from certain
|
||||
types of outages with `raft/peers.json` because this was ingested before any Raft
|
||||
log entries were played back. In Nomad 0.5.5 and later, the `raft/peers.json`
|
||||
recovery file is final, and a snapshot is taken after it is ingested, so you are
|
||||
guaranteed to start with your recovered configuration. This does implicitly commit
|
||||
all Raft log entries, so should only be used to recover from an outage, but it
|
||||
should allow recovery from any situation where there's some cluster data available.
|
||||
|
||||
## Manual Recovery Using peers.json
|
||||
|
||||
To begin, stop all remaining servers. You can attempt a graceful leave,
|
||||
but it will not work in most cases. Do not worry if the leave exits with an
|
||||
error. The cluster is in an unhealthy state, so this is expected.
|
||||
|
||||
In Nomad 0.5.5 and later, the `peers.json` file is no longer present
|
||||
by default and is only used when performing recovery. This file will be deleted
|
||||
after Nomad starts and ingests it. Nomad 0.5.5 also uses a new, automatically-
|
||||
created `raft/peers.info` file to avoid ingesting the `raft/peers.json` file on the
|
||||
first start after upgrading. Be sure to leave `raft/peers.info` in place for proper
|
||||
operation.
|
||||
|
||||
Using `raft/peers.json` for recovery can cause uncommitted Raft log entries to be
|
||||
implicitly committed, so this should only be used after an outage where no
|
||||
other option is available to recover a lost server. Make sure you don't have
|
||||
any automated processes that will put the peers file in place on a
|
||||
periodic basis.
|
||||
|
||||
The next step is to go to the
|
||||
[`-data-dir`](/docs/configuration#data_dir) of each Nomad
|
||||
server. Inside that directory, there will be a `raft/` sub-directory. We need to
|
||||
create a `raft/peers.json` file. It should look something like:
|
||||
|
||||
```json
|
||||
["10.0.1.8:4647", "10.0.1.6:4647", "10.0.1.7:4647"]
|
||||
```
|
||||
|
||||
Simply create entries for all remaining servers. You must confirm
|
||||
that servers you do not include here have indeed failed and will not later
|
||||
rejoin the cluster. Ensure that this file is the same across all remaining
|
||||
server nodes.
|
||||
|
||||
At this point, you can restart all the remaining servers. In Nomad 0.5.5 and
|
||||
later you will see them ingest the recovery file:
|
||||
|
||||
```text
|
||||
...
|
||||
2016/08/16 14:39:20 [INFO] nomad: found peers.json file, recovering Raft configuration...
|
||||
2016/08/16 14:39:20 [INFO] nomad.fsm: snapshot created in 12.484µs
|
||||
2016/08/16 14:39:20 [INFO] snapshot: Creating new snapshot at /tmp/peers/raft/snapshots/2-5-1471383560779.tmp
|
||||
2016/08/16 14:39:20 [INFO] nomad: deleted peers.json file after successful recovery
|
||||
2016/08/16 14:39:20 [INFO] raft: Restored from snapshot 2-5-1471383560779
|
||||
2016/08/16 14:39:20 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:10.212.15.121:4647 Address:10.212.15.121:4647}]
|
||||
...
|
||||
```
|
||||
|
||||
If any servers managed to perform a graceful leave, you may need to have them
|
||||
rejoin the cluster using the [`server join`](/docs/commands/server/join) command:
|
||||
|
||||
```shell
|
||||
$ nomad server join <Node Address>
|
||||
Successfully joined cluster by contacting 1 nodes.
|
||||
```
|
||||
|
||||
It should be noted that any existing member can be used to rejoin the cluster
|
||||
as the gossip protocol will take care of discovering the server nodes.
|
||||
|
||||
At this point, the cluster should be in an operable state again. One of the
|
||||
nodes should claim leadership and emit a log like:
|
||||
|
||||
```text
|
||||
[INFO] nomad: cluster leadership acquired
|
||||
```
|
||||
|
||||
In Nomad 0.5.5 and later, you can use the [`nomad operator raft list-peers`](/docs/commands/operator/raft-list-peers) command to inspect
|
||||
the Raft configuration:
|
||||
|
||||
```shell
|
||||
$ nomad operator raft list-peers
|
||||
Node ID Address State Voter
|
||||
nomad-server01.global 10.10.11.5:4647 10.10.11.5:4647 follower true
|
||||
nomad-server02.global 10.10.11.6:4647 10.10.11.6:4647 leader true
|
||||
nomad-server03.global 10.10.11.7:4647 10.10.11.7:4647 follower true
|
||||
```
|
||||
|
||||
## Peers.json Format Changes in Raft Protocol 3
|
||||
|
||||
For Raft protocol version 3 and later, peers.json should be formatted as a JSON
|
||||
array containing the node ID, address:port, and suffrage information of each
|
||||
Nomad server in the cluster, like this:
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"id": "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
|
||||
"address": "10.1.0.1:4647",
|
||||
"non_voter": false
|
||||
},
|
||||
{
|
||||
"id": "8b6dda82-3103-11e7-93ae-92361f002671",
|
||||
"address": "10.1.0.2:4647",
|
||||
"non_voter": false
|
||||
},
|
||||
{
|
||||
"id": "97e17742-3103-11e7-93ae-92361f002671",
|
||||
"address": "10.1.0.3:4647",
|
||||
"non_voter": false
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
- `id` `(string: <required>)` - Specifies the `node ID`
|
||||
of the server. This can be found in the logs when the server starts up,
|
||||
and it can also be found inside the `node-id` file in the server's data directory.
|
||||
|
||||
- `address` `(string: <required>)` - Specifies the IP and port of the server in `ip:port` format. The port is the
|
||||
server's RPC port used for cluster communications, typically 4647.
|
||||
|
||||
- `non_voter` `(bool: <false>)` - This controls whether the server is a non-voter, which is used
|
||||
in some advanced [Autopilot](/guides/operations/autopilot) configurations. If omitted, it will
|
||||
default to false, which is typical for most clusters.
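
As a sketch of finding the node ID, it can usually be read from the server's data directory (the exact `data_dir` path below is an assumption):

```shell
# Read this server's node ID for use in peers.json.
$ cat /opt/nomad/data/server/node-id
adf4238a-882b-9ddc-4a9d-5b6758e4159e
```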
|
559
website/pages/guides/security/acl.mdx
Normal file
|
@ -0,0 +1,559 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Access Control
|
||||
sidebar_title: Access Control
|
||||
description: >-
|
||||
Nomad provides an optional Access Control List (ACL) system which can be used
|
||||
to control
|
||||
|
||||
access to data and APIs. The ACL is Capability-based, relying on tokens which
|
||||
are
|
||||
|
||||
associated with policies to determine which fine grained rules can be applied.
|
||||
---
|
||||
|
||||
# Access Control
|
||||
|
||||
Nomad provides an optional Access Control List (ACL) system which can be used to control access to data and APIs. The ACL is [Capability-based](https://en.wikipedia.org/wiki/Capability-based_security), relying on tokens which are associated with policies to determine which fine grained rules can be applied. Nomad's capability based ACL system is very similar to the design of [AWS IAM](https://aws.amazon.com/iam/).
|
||||
|
||||
# ACL System Overview
|
||||
|
||||
The ACL system is designed to be easy to use and fast to enforce while providing administrative insight. At the highest level, there are three major components to the ACL system:
|
||||
|
||||
![ACL Overview](/img/acl.jpg)
|
||||
|
||||
- **ACL Policies**. No permissions are granted by default, making Nomad a default-deny or whitelist system. Policies allow a set of capabilities or actions to be granted or whitelisted. For example, a "readonly" policy might only grant the ability to list and inspect running jobs, but not to submit new ones.
|
||||
|
||||
- **ACL Tokens**. Requests to Nomad are authenticated by using a bearer token. Each ACL token has a public Accessor ID which is used to name a token, and a Secret ID which is used to make requests to Nomad. The Secret ID is provided using a request header (`X-Nomad-Token`) and is used to authenticate the caller. Tokens are either `management` or `client` types. The `management` tokens are effectively "root" in the system, and can perform any operation. The `client` tokens are associated with one or more ACL policies which grant specific capabilities.
|
||||
|
||||
- **Capabilities**. Capabilities are the set of actions that can be performed. This includes listing jobs, submitting jobs, querying nodes, etc. A `management` token is granted all capabilities, while `client` tokens are granted specific capabilities via ACL Policies. The full set of capabilities is discussed below in the rule specifications.
|
||||
|
||||
### ACL Policies
|
||||
|
||||
An ACL policy is a named set of rules. Each policy must have a unique name, an optional description, and a rule set.
|
||||
A client ACL token can be associated with multiple policies, and a request is allowed if _any_ of the associated policies grant the capability.
|
||||
Management tokens cannot be associated with policies because they are granted all capabilities.
|
||||
|
||||
The special `anonymous` policy can be defined to grant capabilities to requests which are made anonymously. An anonymous request is a request made to Nomad without the `X-Nomad-Token` header specified. This can be used to allow anonymous users to list jobs and view their status, while requiring authenticated requests to submit new jobs or modify existing jobs. By default, there is no `anonymous` policy set meaning all anonymous requests are denied.
|
||||
|
||||
### ACL Tokens
|
||||
|
||||
ACL tokens are used to authenticate requests and determine if the caller is authorized to perform an action. Each ACL token has a public Accessor ID which is used to identify the token, a Secret ID which is used to make requests to Nomad, and an optional human readable name. All `client` type tokens are associated with one or more policies, and can perform an action if any associated policy allows it. Tokens can be associated with policies which do not exist, which are the equivalent of granting no capabilities. The `management` type tokens cannot be associated with policies, but can perform any action.
|
||||
|
||||
When ACL tokens are created, they can be optionally marked as `Global`. This causes them to be created in the authoritative region and replicated to all other regions. Otherwise, tokens are created locally in the region the request was made and not replicated. Local tokens cannot be used for cross-region requests since they are not replicated between regions.
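
As a sketch, creating a global `client` token attached to a policy might look like this (the token name and the `readonly` policy are hypothetical and must already exist):

```shell
# Create a client token bound to the "readonly" policy and replicate it
# to all regions by marking it Global.
$ nomad acl token create -name="example-ci-token" -type="client" -policy="readonly" -global
```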
|
||||
|
||||
### Capabilities and Scope
|
||||
|
||||
The following table summarizes the ACL Rules that are available for constructing policy rules:
|
||||
|
||||
| Policy | Scope |
|
||||
| --------------------------------- | -------------------------------------------- |
|
||||
| [namespace](#namespace-rules) | Job related operations by namespace |
|
||||
| [agent](#agent-rules) | Utility operations in the Agent API |
|
||||
| [node](#node-rules) | Node-level catalog operations |
|
||||
| [operator](#operator-rules) | Cluster-level operations in the Operator API |
|
||||
| [quota](#quota-rules) | Quota specification related operations |
|
||||
| [host_volume](#host_volume-rules) | host_volume related operations |
|
||||
|
||||
Constructing rules from these policies is covered in detail in the Rule Specification section below.
|
||||
|
||||
### Multi-Region Configuration
|
||||
|
||||
Nomad supports multi-datacenter and multi-region configurations. A single region is able to service multiple datacenters, and all servers in a region replicate their state between each other. In a multi-region configuration, there is a set of servers per region. Each region operates independently and is loosely coupled to allow jobs to be scheduled in any region and requests to flow transparently to the correct region.
|
||||
|
||||
When ACLs are enabled, Nomad depends on an "authoritative region" to act as a single source of truth for ACL policies and global ACL tokens. The authoritative region is configured in the [`server` stanza](/docs/configuration/server) of agents, and all regions must share a single authoritative source. Any ACL policies or global ACL tokens are created in the authoritative region first. All other regions replicate ACL policies and global ACL tokens to act as local mirrors. This allows policies to be administered centrally, and for enforcement to be local to each region for low latency.
|
||||
|
||||
Global ACL tokens are used to allow cross region requests. Standard ACL tokens are created in a single target region and not replicated. This means if a request takes place between regions, global tokens must be used so that both regions will have the token registered.
|
||||
|
||||
# Configuring ACLs
|
||||
|
||||
ACLs are not enabled by default, and must be enabled. Clients and Servers need to set `enabled` in the [`acl` stanza](/docs/configuration/acl). This enables the [ACL Policy](/api/acl-policies) and [ACL Token](/api/acl-tokens) APIs, as well as endpoint enforcement.
|
||||
|
||||
For multi-region configurations, all servers must be configured to use a single [authoritative region](/docs/configuration/server#authoritative_region). The authoritative region is responsible for managing ACL policies and global tokens. Servers in other regions will replicate policies and global tokens to act as a mirror, and must have their [`replication_token`](/docs/configuration/acl#replication_token) configured.
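
As a rough sketch, a server outside the authoritative region might carry a configuration fragment like the following (the region name, file path, and token value are placeholders, not values from this guide):

```shell
# Sketch: ACL replication settings for a non-authoritative region.
$ cat > /etc/nomad.d/acl.hcl <<EOF
server {
  authoritative_region = "us-east"
}

acl {
  enabled           = true
  replication_token = "REPLICATION_TOKEN_SECRET_ID"
}
EOF
```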
|
||||
|
||||
# Bootstrapping ACLs
|
||||
|
||||
Bootstrapping ACLs on a new cluster requires a few steps, outlined below:
|
||||
|
||||
### Enable ACLs on Nomad Servers
|
||||
|
||||
The APIs needed to manage policies and tokens are not enabled until ACLs are enabled. To begin, we need to enable the ACLs on the servers. If a multi-region setup is used, the authoritative region should be enabled first. For each server:
|
||||
|
||||
1. Set `enabled = true` in the [`acl` stanza](/docs/configuration/acl#enabled).
|
||||
1. Set `authoritative_region` in the [`server` stanza](/docs/configuration/server#authoritative_region).
|
||||
1. For servers outside the authoritative region, set `replication_token` in the [`acl` stanza](/docs/configuration/acl#replication_token). Replication tokens should be `management` type tokens which are either created in the authoritative region, or created as Global tokens.
|
||||
1. Restart the Nomad server to pick up the new configuration.
|
||||
|
||||
Please take care to restart the servers one at a time, and ensure each server has joined and is operating correctly before restarting another.
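
Putting the steps above together, the ACL-related configuration on a server in the authoritative region could look roughly like this (the region name and file paths are assumptions for illustration):

```shell
# Sketch: enable ACLs on a server in the authoritative region,
# then restart it before moving on to the next server.
$ cat > /etc/nomad.d/acl.hcl <<EOF
server {
  authoritative_region = "us-east"
}

acl {
  enabled = true
}
EOF

$ sudo systemctl restart nomad
```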
|
||||
|
||||
### Generate the initial token
|
||||
|
||||
Once the ACL system is enabled, we need to generate our initial token. This first token is used to bootstrap the system and care should be taken not to lose it. Once the ACL system is enabled, we use the [Bootstrap CLI](/docs/commands/acl/bootstrap):
|
||||
|
||||
```shell
|
||||
$ nomad acl bootstrap
|
||||
Accessor ID = 5b7fd453-d3f7-6814-81dc-fcfe6daedea5
|
||||
Secret ID = 9184ec35-65d4-9258-61e3-0c066d0a45c5
|
||||
Name = Bootstrap Token
|
||||
Type = management
|
||||
Global = true
|
||||
Policies = n/a
|
||||
Create Time = 2017-09-11 17:38:10.999089612 +0000 UTC
|
||||
Create Index = 7
|
||||
Modify Index = 7
|
||||
```
|
||||
|
||||
Once the initial bootstrap is performed, it cannot be performed again unless [reset](#resetting-acl-bootstrap). Make sure to save this AccessorID and SecretID.
|
||||
The bootstrap token is a `management` type token, meaning it can perform any operation. It should be used to setup the ACL policies and create additional ACL tokens. The bootstrap token can be deleted and is like any other token, so care should be taken to not revoke all management tokens.
|
||||
|
||||
### Enable ACLs on Nomad Clients
|
||||
|
||||
To enforce client endpoints, we need to enable ACLs on clients as well. This is simpler than servers: we just need to set `enabled = true` in the [`acl` stanza](/docs/configuration/acl). Once configured, we need to restart the client for the change to take effect.
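
A minimal sketch of the client-side change (the file path is an assumption):

```shell
# Sketch: enable ACL enforcement on a client, then restart the agent.
$ cat > /etc/nomad.d/acl.hcl <<EOF
acl {
  enabled = true
}
EOF

$ sudo systemctl restart nomad
```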
|
||||
|
||||
### Set an Anonymous Policy (Optional)
|
||||
|
||||
The ACL system uses a whitelist or default-deny model. This means by default no permissions are granted. For clients making requests without ACL tokens, we may want to grant some basic level of access. This is done by setting rules on the special "anonymous" policy. This policy is applied to any requests made without a token.
|
||||
|
||||
To permit anonymous users to read, we can set up the following policy:
|
||||
|
||||
```shell
|
||||
# Store our token secret ID
|
||||
$ export NOMAD_TOKEN="BOOTSTRAP_SECRET_ID"
|
||||
|
||||
# Write out the payload
|
||||
$ cat > payload.json <<EOF
|
||||
{
|
||||
"Name": "anonymous",
|
||||
"Description": "Allow read-only access for anonymous requests",
|
||||
"Rules": "
|
||||
namespace \"default\" {
|
||||
policy = \"read\"
|
||||
}
|
||||
agent {
|
||||
policy = \"read\"
|
||||
}
|
||||
node {
|
||||
policy = \"read\"
|
||||
}
|
||||
"
|
||||
}
|
||||
EOF
|
||||
|
||||
# Install the policy
|
||||
$ curl --request POST \
|
||||
--data @payload.json \
|
||||
-H "X-Nomad-Token: $NOMAD_TOKEN" \
|
||||
https://localhost:4646/v1/acl/policy/anonymous
|
||||
|
||||
# Verify anonymous request works
|
||||
$ curl https://localhost:4646/v1/jobs
|
||||
```
|
||||
|
||||
# Rule Specification
|
||||
|
||||
A core part of the ACL system is the rule language which is used to describe the policy that must be enforced. We make use of the [HashiCorp Configuration Language (HCL)](https://github.com/hashicorp/hcl/) to specify rules. This language is human readable and interoperable with JSON making it easy to machine-generate. Policies can contain any number of rules.
|
||||
|
||||
Policies typically have several dispositions:
|
||||
|
||||
- `read`: allow the resource to be read but not modified
|
||||
- `write`: allow the resource to be read and modified
|
||||
- `deny`: do not allow the resource to be read or modified. Deny takes precedence when multiple policies are associated with a token.
|
||||
|
||||
Specification in the HCL format looks like:
|
||||
|
||||
```text
|
||||
# Allow read only access to the default namespace
|
||||
namespace "default" {
|
||||
policy = "read"
|
||||
}
|
||||
|
||||
# Allow writing to the `foo` namespace
|
||||
namespace "foo" {
|
||||
policy = "write"
|
||||
}
|
||||
|
||||
agent {
|
||||
policy = "read"
|
||||
}
|
||||
|
||||
node {
|
||||
policy = "read"
|
||||
}
|
||||
|
||||
quota {
|
||||
policy = "read"
|
||||
}
|
||||
```
|
||||
|
||||
This is equivalent to the following JSON input:
|
||||
|
||||
```json
|
||||
{
|
||||
"namespace": {
|
||||
"default": {
|
||||
"policy": "read"
|
||||
},
|
||||
"foo": {
|
||||
"policy": "write"
|
||||
}
|
||||
},
|
||||
"agent": {
|
||||
"policy": "read"
|
||||
},
|
||||
"node": {
|
||||
"policy": "read"
|
||||
},
|
||||
"quota": {
|
||||
"policy": "read"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The [ACL Policy](/api/acl-policies) API allows either HCL or JSON to be used to define the content of the rules section.
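
For example, rules saved to a local file can be registered as a named policy with the CLI; the policy and file names below are hypothetical:

```shell
# Register the rules in readonly.hcl as a policy named "readonly".
$ nomad acl policy apply -description "Read-only access" readonly readonly.hcl
```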
|
||||
|
||||
### Namespace Rules
|
||||
|
||||
The `namespace` policy controls access to a namespace, including the [Jobs API](/api/jobs), [Deployments API](/api/deployments), [Allocations API](/api/allocations), and [Evaluations API](/api/evaluations).
|
||||
|
||||
```
|
||||
namespace "default" {
|
||||
policy = "write"
|
||||
}
|
||||
|
||||
namespace "sensitive" {
|
||||
policy = "read"
|
||||
}
|
||||
```
|
||||
|
||||
Namespace rules are keyed by the namespace name they apply to. When no namespace is specified, the "default" namespace is the one used. For example, the above policy grants write access to the default namespace, and read access to the sensitive namespace. In addition to the coarse grained `policy` specification, the `namespace` stanza allows setting a more fine grained list of `capabilities`. This includes:
|
||||
|
||||
- `deny` - When multiple policies are associated with a token, deny will take precedence and prevent any capabilities.
|
||||
- `list-jobs` - Allows listing the jobs and seeing coarse grain status.
|
||||
- `read-job` - Allows inspecting a job and seeing fine grain status.
|
||||
- `submit-job` - Allows jobs to be submitted or modified.
|
||||
- `dispatch-job` - Allows jobs to be dispatched.
|
||||
- `read-logs` - Allows the logs associated with a job to be viewed.
|
||||
- `read-fs` - Allows the filesystem of associated allocations to be viewed.
|
||||
- `alloc-exec` - Allows an operator to connect and run commands in running allocations.
|
||||
- `alloc-node-exec` - Allows an operator to connect and run commands in allocations running without filesystem isolation, for example, raw_exec jobs.
|
||||
- `alloc-lifecycle` - Allows an operator to stop individual allocations manually.
|
||||
- `sentinel-override` - Allows soft mandatory policies to be overridden.
|
||||
|
||||
The coarse grained policy dispositions are shorthand for the fine grained capabilities:
|
||||
|
||||
- `deny` policy - ["deny"]
|
||||
- `read` policy - ["list-jobs", "read-job"]
|
||||
- `write` policy - ["list-jobs", "read-job", "submit-job", "dispatch-job", "read-logs", "read-fs", "alloc-exec", "alloc-lifecycle"]
|
||||
|
||||
When both the policy short hand and a capabilities list are provided, the capabilities are merged:
|
||||
|
||||
```
|
||||
# Allow reading jobs and submitting jobs, without allowing access
|
||||
# to view log output or inspect the filesystem
|
||||
namespace "default" {
|
||||
policy = "read"
|
||||
capabilities = ["submit-job"]
|
||||
}
|
||||
```
|
||||
|
||||
Namespace definitions may also include globs, allowing a single policy definition to apply to a set of namespaces. For example, the below policy allows read access to most production namespaces, but allows write access to the "production-api" namespace, and rejects any access to the "production-web" namespace.
|
||||
|
||||
```
|
||||
namespace "production-*" {
|
||||
policy = "read"
|
||||
}
|
||||
|
||||
namespace "production-api" {
|
||||
policy = "write"
|
||||
}
|
||||
|
||||
namespace "production-web" {
|
||||
policy = "deny"
|
||||
}
|
||||
```
|
||||
|
||||
Namespaces are matched to their policies first by performing a lookup on any _exact match_, before falling back to performing a glob based lookup. When looking up namespaces by glob, the matching policy with the greatest number of matched characters will be chosen. For example:
|
||||
|
||||
```
|
||||
namespace "*-web" {
|
||||
policy = "deny"
|
||||
}
|
||||
|
||||
namespace "*" {
|
||||
policy = "write"
|
||||
}
|
||||
```
|
||||
|
||||
Will evaluate to deny for `production-web`, because it is 9 characters different from the `"*-web"` rule, but 13 characters different from the `"*"` rule.
|
||||
|
||||
### Node Rules
|
||||
|
||||
The `node` policy controls access to the [Node API](/api/nodes) such as listing nodes or triggering a node drain. Node rules are specified for all nodes using the `node` key:
|
||||
|
||||
```
|
||||
node {
|
||||
policy = "read"
|
||||
}
|
||||
```
|
||||
|
||||
There's only one node policy allowed per rule set, and its value is set to one of the policy dispositions.
|
||||
|
||||
### Agent Rules
|
||||
|
||||
The `agent` policy controls access to the utility operations in the [Agent API](/api/agent), such as join and leave. Agent rules are specified for all agents using the `agent` key:
|
||||
|
||||
```
|
||||
agent {
|
||||
policy = "write"
|
||||
}
|
||||
```
|
||||
|
||||
There's only one agent policy allowed per rule set, and its value is set to one of the policy dispositions.
|
||||
|
||||
### Operator Rules
|
||||
|
||||
The `operator` policy controls access to the [Operator API](/api/operator). Operator rules look like:
|
||||
|
||||
```
|
||||
operator {
|
||||
policy = "read"
|
||||
}
|
||||
```
|
||||
|
||||
There's only one operator policy allowed per rule set, and its value is set to one of the policy dispositions. In the example above, the token could be used to query the operator endpoints for diagnostic purposes but not make any changes.
|
||||
|
||||
### Quota Rules
|
||||
|
||||
The `quota` policy controls access to the quota specification operations in the [Quota API](/api/quotas), such as quota creation and deletion. Quota rules are specified for all quotas using the `quota` key:
|
||||
|
||||
```
|
||||
quota {
|
||||
policy = "write"
|
||||
}
|
||||
```
|
||||
|
||||
There's only one quota policy allowed per rule set, and its value is set to one of the policy dispositions.
|
||||
|
||||
# Advanced Topics
|
||||
|
||||
### Outages and Multi-Region Replication
|
||||
|
||||
The ACL system takes some steps to ensure operation during outages. Client nodes maintain a limited cache of ACL tokens and ACL policies that have recently or frequently been used, associated with a time-to-live (TTL).
|
||||
|
||||
When the region servers are unavailable, the clients will automatically ignore the cache TTL, and extend the cache until the outage has recovered. For any policies or tokens that are not cached, they will be treated as missing and denied access until the outage has been resolved.
|
||||
|
||||
Nomad servers have all the policies and tokens locally and can continue serving requests even if quorum is lost. The tokens and policies may become stale during this period as data is not actively replicating, but will be automatically fixed when the outage has been resolved.
|
||||
|
||||
In a multi-region setup, there is a single authoritative region which is the source of truth for ACL policies and global ACL tokens. All other regions asynchronously replicate from the authoritative region. When replication is interrupted, the existing data is used for request processing and may become stale. When the authoritative region is reachable, replication will resume and repair any inconsistency.
|
||||
|
||||
### host_volume Rules
|
||||
|
||||
The `host_volume` policy controls access to mounting and accessing host volumes.
|
||||
|
||||
```
|
||||
host_volume "*" {
|
||||
policy = "write"
|
||||
}
|
||||
|
||||
host_volume "prod-*" {
|
||||
policy = "deny"
|
||||
}
|
||||
|
||||
host_volume "prod-ca-certificates" {
|
||||
policy = "read"
|
||||
}
|
||||
```
|
||||
|
||||
Host volume rules are keyed to the volume names that they apply to. As with namespaces, you may use wildcards to reuse the same configuration across a set of volumes. In addition to the coarse grained policy specification, the `host_volume` stanza allows setting a more fine grained list of capabilities. This includes:
|
||||
|
||||
- `deny` - Do not allow a user to mount a volume in any way.
|
||||
- `mount-readonly` - Only allow the user to mount the volume as `readonly`.
|
||||
- `mount-readwrite` - Allow the user to mount the volume as `readonly` or `readwrite` if the `host_volume` configuration allows it.
|
||||
|
||||
The coarse grained policy permissions are shorthand for the fine grained capabilities:
|
||||
|
||||
- `deny` policy - ["deny"]
|
||||
- `read` policy - ["mount-readonly"]
|
||||
- `write` policy - ["mount-readonly", "mount-readwrite"]
|
||||
|
||||
When both the policy short hand and a capabilities list are provided, the capabilities are merged.
|
||||
|
||||
**Note:** Host Volume policies are applied when attempting to _use_ a volume, however, if a user has access to the Node API, they will be able to see that a volume exists in the `nomad node status` output regardless of this configuration.
|
||||
|
||||
### Resetting ACL Bootstrap
|
||||
|
||||
If all management tokens are lost, it is possible to reset the ACL bootstrap so that it can be performed again. First, we need to determine the reset index; this can be done by attempting to bootstrap again:
|
||||
|
||||
```shell
|
||||
$ nomad acl bootstrap
|
||||
|
||||
Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 7))
|
||||
```
|
||||
|
||||
Here we can see the `reset index`. To reset the ACL system, we create the `acl-bootstrap-reset` file in the data directory of the **leader** node:
|
||||
|
||||
```shell
|
||||
$ echo 7 >> /nomad-data-dir/server/acl-bootstrap-reset
|
||||
```
|
||||
|
||||
With the reset file in place, we can bootstrap again as normal:
|
||||
|
||||
```shell
|
||||
$ nomad acl bootstrap
|
||||
Accessor ID = 52d3353d-d7b9-d945-0591-1af608732b76
|
||||
Secret ID = 4b0a41ca-6d32-1853-e64b-de0d347e4525
|
||||
Name = Bootstrap Token
|
||||
Type = management
|
||||
Global = true
|
||||
Policies = n/a
|
||||
Create Time = 2017-09-11 18:38:11.929089612 +0000 UTC
|
||||
Create Index = 11
|
||||
Modify Index = 11
|
||||
```
|
||||
|
||||
If we attempt to bootstrap again, we will get a mismatch on the reset index:
|
||||
|
||||
```shell
|
||||
$ nomad acl bootstrap
|
||||
|
||||
Error bootstrapping: Unexpected response code: 500 (Invalid bootstrap reset index (specified 7, reset index: 11))
|
||||
```
|
||||
|
||||
This is because the reset file is in place, but with the incorrect index. The reset file can be deleted, but Nomad will not reset the bootstrap until the index is corrected.
|
||||
|
||||
~> **Note**: Resetting ACL Bootstrap does not automatically invalidate previous ACL tokens. All previous bootstrap and management tokens remain valid, and existing tools that utilize them remain functional. If a token is unused, or if a management token is suspected of being compromised, we should invalidate it, update any existing systems with new tokens, and audit all existing tokens.
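
As a sketch, an old management token can be invalidated by its accessor ID; the ID below is the bootstrap token's accessor from the earlier example:

```shell
# Revoke the previous bootstrap token by its accessor ID.
$ nomad acl token delete 5b7fd453-d3f7-6814-81dc-fcfe6daedea5
```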
|
||||
|
||||
## Vault Integration
|
||||
|
||||
HashiCorp Vault has a secret backend for generating short-lived Nomad tokens. As Vault has a number of authentication backends, it could provide a workflow where a user or orchestration system authenticates using a pre-existing identity service (LDAP, Okta, Amazon IAM, etc.) in order to obtain a short-lived Nomad token.
|
||||
|
||||
~> HashiCorp Vault is a standalone product with its own set of deployment and configuration best practices. Please review [Vault's documentation](https://www.vaultproject.io/docs) before deploying it in production.
|
||||
|
||||
For evaluation purposes, a Vault server in "dev" mode can be used.
|
||||
|
||||
```shell
|
||||
$ vault server -dev
|
||||
==> Vault server configuration:
|
||||
|
||||
Cgo: disabled
|
||||
Cluster Address: https://127.0.0.1:8201
|
||||
Listener 1: tcp (addr: "127.0.0.1:8200", cluster address: "127.0.0.1:8201", tls: "disabled")
|
||||
Log Level: info
|
||||
Mlock: supported: false, enabled: false
|
||||
Redirect Address: http://127.0.0.1:8200
|
||||
Storage: inmem
|
||||
Version: Vault v0.8.3
|
||||
Version Sha: a393b20cb6d96c73e52eb5af776c892b8107a45d
|
||||
|
||||
==> WARNING: Dev mode is enabled!
|
||||
|
||||
In this mode, Vault is completely in-memory and unsealed.
|
||||
Vault is configured to only have a single unseal key. The root
|
||||
token has already been authenticated with the CLI, so you can
|
||||
immediately begin using the Vault CLI.
|
||||
|
||||
The only step you need to take is to set the following
|
||||
environment variables:
|
||||
|
||||
export VAULT_ADDR='http://127.0.0.1:8200'
|
||||
|
||||
The unseal key and root token are reproduced below in case you
|
||||
want to seal/unseal the Vault or play with authentication.
|
||||
|
||||
Unseal Key: YzFfPgnLl9R1f6bLU7tGqi/PIDhDaAV/tlNDMV5Rrq0=
|
||||
Root Token: f84b587e-5882-bba1-a3f0-d1a3d90ca105
|
||||
```
|
||||
|
||||
### Pre-requisites
|
||||
|
||||
- Nomad ACL system bootstrapped.
|
||||
- A management token (You can use the bootstrap token; however, for production systems we recommend having an integration-specific token)
|
||||
- A set of policies created in Nomad
|
||||
- An unsealed Vault server (Vault running in `dev` mode is unsealed automatically upon startup)
|
||||
- Vault must be version 0.9.3 or later to have the Nomad plugin
|
||||
|
||||
### Configuration
|
||||
|
||||
Mount the [`nomad`][nomad_backend] secret backend in Vault:
|
||||
|
||||
```shell
|
||||
$ vault mount nomad
|
||||
Successfully mounted 'nomad' at 'nomad'!
|
||||
```
|
||||
|
||||
Configure access with Nomad's address and management token:
|
||||
|
||||
```shell
|
||||
$ vault write nomad/config/access \
|
||||
address=http://127.0.0.1:4646 \
|
||||
token=adf4238a-882b-9ddc-4a9d-5b6758e4159e
|
||||
Success! Data written to: nomad/config/access
|
||||
```
|
||||
|
||||
Vault secret backends have the concept of roles, which are configuration units that group one or more Vault policies to a potential identity attribute (e.g. LDAP group membership). The name of the role is specified on the path, while the mapping to policies is done by naming them in a comma separated list, for example:
|
||||
|
||||
```shell
|
||||
$ vault write nomad/role/role-name policies=policyone,policytwo
|
||||
Success! Data written to: nomad/role/role-name
|
||||
```
|
||||
|
||||
Similarly, to create management or global tokens:
|
||||
|
||||
```shell
|
||||
$ vault write nomad/role/role-name type=management global=true
|
||||
Success! Data written to: nomad/role/role-name
|
||||
```
|
||||
|
||||
Create a Vault policy to allow different identities to get tokens associated with a particular role:
|
||||
|
||||
```shell
|
||||
$ echo 'path "nomad/creds/role-name" {
|
||||
capabilities = ["read"]
|
||||
}' | vault policy write nomad-user-policy -
|
||||
Policy 'nomad-user-policy' written.
|
||||
```
|
||||
|
||||
If you have an existing authentication backend (like LDAP), follow the relevant instructions on the [Authentication backends page](https://www.vaultproject.io/docs/auth) to create a role. Otherwise, for testing purposes, a Vault token associated with the policy can be generated:
|
||||
|
||||
```shell
|
||||
$ vault token create -policy=nomad-user-policy
|
||||
Key Value
|
||||
--- -----
|
||||
token deedfa83-99b5-34a1-278d-e8fb76809a5b
|
||||
token_accessor fd185371-7d80-8011-4f45-1bb3af2c2733
|
||||
token_duration 768h0m0s
|
||||
token_renewable true
|
||||
token_policies [default nomad-user-policy]
|
||||
```
|
||||
|
||||
Finally, obtain a Nomad Token using the existing Vault Token:
|
||||
|
||||
```shell
|
||||
$ vault read nomad/creds/role-name
|
||||
Key Value
|
||||
--- -----
|
||||
lease_id nomad/creds/role-name/6fb22e25-0cd1-b4c9-494e-aba330c317b9
|
||||
lease_duration 768h0m0s
|
||||
lease_renewable true
|
||||
accessor_id 10b8fb49-7024-2126-8683-ab355b581db2
|
||||
secret_id 8898d19c-e5b3-35e4-649e-4153d63fbea9
|
||||
```
|
||||
|
||||
Verify that the token is created correctly in Nomad, looking it up by its accessor:
|
||||
|
||||
```shell
|
||||
$ nomad acl token info 10b8fb49-7024-2126-8683-ab355b581db2
|
||||
Accessor ID = 10b8fb49-7024-2126-8683-ab355b581db2
|
||||
Secret ID = 8898d19c-e5b3-35e4-649e-4153d63fbea9
|
||||
Name = Vault test root 1507307164169530060
|
||||
Type = management
|
||||
Global = true
|
||||
Policies = n/a
|
||||
Create Time = 2017-10-06 16:26:04.170633207 +0000 UTC
|
||||
Create Index = 228
|
||||
Modify Index = 228
|
||||
```
|
||||
|
||||
Any user or process with access to Vault can now obtain short lived Nomad Tokens in order to carry out operations, thus centralizing the access to Nomad tokens.
|
||||
|
||||
[nomad_backend]: https://www.vaultproject.io/docs/secrets/nomad
|
91
website/pages/guides/security/encryption.mdx
Normal file
|
@ -0,0 +1,91 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Encryption Overview
|
||||
sidebar_title: Encryption Overview
|
||||
description: 'Learn how to configure Nomad to encrypt HTTP, RPC, and Serf traffic.'
|
||||
---
|
||||
|
||||
# Encryption Overview
|
||||
|
||||
The Nomad agent supports encrypting all of its network traffic. There are
|
||||
two separate encryption systems, one for gossip traffic, and one for HTTP and
|
||||
RPC.
|
||||
|
||||
## Gossip
|
||||
|
||||
Enabling gossip encryption only requires that you set an encryption key when
|
||||
starting the Nomad server. The key can be set via the
|
||||
[`encrypt`](/docs/configuration/server#encrypt) parameter in the server
configuration file; the value of this setting is the encryption key itself.
|
||||
The same encryption key should be used on every server in a region.
|
||||
|
||||
The key must be 16 bytes, base64 encoded. As a convenience, Nomad provides the
|
||||
[`nomad operator keygen`](/docs/commands/operator/keygen) command to
|
||||
generate a cryptographically suitable key:
|
||||
|
||||
```shell
|
||||
$ nomad operator keygen
|
||||
cg8StVXbQJ0gPvMd9o7yrg==
|
||||
```
|
||||
|
||||
With that key, you can enable gossip encryption on the agent.
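
For example, the key generated above can be placed in the server stanza of the agent configuration; the file path here is an assumption:

```shell
# Sketch: add the gossip encryption key to the server configuration
# and start the agent with it.
$ cat > /etc/nomad.d/encrypt.hcl <<EOF
server {
  enabled = true
  encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
}
EOF

$ nomad agent -config /etc/nomad.d
```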
|
||||
|
||||
## HTTP, RPC, and Raft Encryption with TLS
|
||||
|
||||
Nomad supports using TLS to verify the authenticity of servers and clients. To
|
||||
enable this, Nomad requires that all clients and servers have key pairs that are
|
||||
generated and signed by a private Certificate Authority (CA).
|
||||
|
||||
TLS can be used to verify the authenticity of the servers and clients. The
|
||||
configuration option [`verify_server_hostname`][tls] causes Nomad to verify that
|
||||
a certificate is provided that is signed by the Certificate Authority from the
|
||||
[`ca_file`][tls] for TLS connections.
|
||||
|
||||
If `verify_server_hostname` is set, then outgoing connections perform
|
||||
hostname verification. Unlike traditional HTTPS browser validation, all servers
|
||||
must have a certificate valid for `server.<region>.nomad` or the client will
|
||||
reject the handshake. It is also recommended that the certificate include
`localhost` as a subject alternate name so that the CLI can validate the server name.
|
||||
|
||||
TLS is used to secure the RPC calls between agents, but gossip between nodes is
|
||||
done over UDP and is secured using a symmetric key. See above for enabling
|
||||
gossip encryption.
|
||||
|
||||
### Configuring the command line tool
|
||||
|
||||
If you have HTTPS enabled for your Nomad agent, you must export environment
|
||||
variables for the command line tool to also use HTTPS:
|
||||
|
||||
```sh
|
||||
# NOMAD_ADDR defaults to http://, so set it to https
|
||||
# Alternatively you can use the -address flag
|
||||
export NOMAD_ADDR=https://127.0.0.1:4646
|
||||
|
||||
# Set the location of your CA certificate
|
||||
# Alternatively you can use the -ca-cert flag
|
||||
export NOMAD_CACERT=/path/to/ca.pem
|
||||
```
|
||||
|
||||
Run any command except `agent` with `-h` to see all environment variables and
|
||||
flags. For example: `nomad status -h`
|
||||
|
||||
By default HTTPS does not validate client certificates, so you do not need to
|
||||
give the command line tool access to any private keys.
|
||||
|
||||
### Network Isolation with TLS
|
||||
|
||||
If you want to isolate Nomad agents on a network with TLS you need to enable
|
||||
both [`verify_https_client`][tls] and [`verify_server_hostname`][tls]. This
|
||||
will cause agents to require client certificates for all incoming HTTPS
|
||||
connections as well as verify proper names on all other certificates.
|
||||
|
||||
Consul will not attempt to health check agents with `verify_https_client` set
|
||||
as it is unable to use client certificates.
|
||||
|
||||
# Configuring Nomad with TLS
|
||||
|
||||
Read the [Securing Nomad with TLS Guide][guide] for details on how to configure
|
||||
encryption for Nomad.
|
||||
|
||||
[guide]: /guides/security/securing-nomad 'Securing Nomad with TLS'
|
||||
[tls]: /docs/configuration/tls 'Nomad TLS Configuration'
|
14
website/pages/guides/security/index.mdx
Normal file
|
@ -0,0 +1,14 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Security
|
||||
sidebar_title: Securing Nomad
|
||||
description: Learn how to use Nomad safely and securely in a multi-team setting.
|
||||
---
|
||||
|
||||
# Security
|
||||
|
||||
The Nomad Security section provides best practices and
|
||||
guidance for securing Nomad in an enterprise environment.
|
||||
|
||||
Please
|
||||
navigate to the appropriate sub-sections for more information.
|
510
website/pages/guides/security/securing-nomad.mdx
Normal file
|
@ -0,0 +1,510 @@
|
|||
---
|
||||
layout: guides
|
||||
page_title: Securing Nomad with TLS
|
||||
sidebar_title: Securing Nomad with TLS
|
||||
description: |-
|
||||
Securing Nomad's cluster communication with TLS is important for both
|
||||
security and easing operations. Nomad can use mutual TLS (mTLS) for
|
||||
authenticating all HTTP and RPC communication.
|
||||
---
|
||||
|
||||
# Securing Nomad with TLS
|
||||
|
||||
Securing Nomad's cluster communication is not only important for security but
|
||||
can even ease operations by preventing mistakes and misconfigurations. Nomad
|
||||
optionally uses mutual [TLS][tls] (mTLS) for all HTTP and RPC communication.
|
||||
Nomad's use of mTLS provides the following properties:
|
||||
|
||||
- Prevent unauthorized Nomad access
|
||||
- Prevent observing or tampering with Nomad communication
|
||||
- Prevent client/server role or region misconfigurations
|
||||
- Prevent other services from masquerading as Nomad agents
|
||||
|
||||
Preventing region misconfigurations is a property of Nomad's mTLS not commonly
|
||||
found in the TLS implementations on the public Internet. While most uses of
|
||||
TLS verify the identity of the server you are connecting to based on a domain
|
||||
name such as `example.com`, Nomad verifies the node you are connecting to is in
|
||||
the expected region and configured for the expected role (e.g.
|
||||
`client.us-west.nomad`). This also prevents other services who may have access
|
||||
to certificates signed by the same private CA from masquerading as Nomad
|
||||
agents. If certificates were identified based on hostname/IP then any other
|
||||
service on a host could masquerade as a Nomad agent.
|
||||
|
||||
Correctly configuring TLS can be a complex process, especially given the wide
|
||||
range of deployment methodologies. If you use the sample
|
||||
[Vagrantfile][vagrantfile] from the [Getting Started Guide][guide-install] - or
|
||||
have [cfssl][cfssl] and Nomad installed - this guide will provide you with a
|
||||
production ready TLS configuration.
|
||||
|
||||
~> Note that while Nomad's TLS configuration will be production ready, key
|
||||
management and rotation is a complex subject not covered by this guide.
|
||||
[Vault][vault] is the suggested solution for key generation and management.
|
||||
|
||||
## Creating Certificates
|
||||
|
||||
The first step to configuring TLS for Nomad is generating certificates. In
|
||||
order to prevent unauthorized cluster access, Nomad requires all certificates
|
||||
be signed by the same Certificate Authority (CA). This should be a _private_ CA
|
||||
and not a public one like [Let's Encrypt][letsencrypt] as any certificate
|
||||
signed by this CA will be allowed to communicate with the cluster.
|
||||
|
||||
~> Nomad certificates may be signed by intermediate CAs as long as the root CA
|
||||
is the same. Append all intermediate CAs to the `cert_file`.
|
||||
|
||||
### Certificate Authority
|
||||
|
||||
There are a variety of tools for managing your own CA, [like the PKI secret
|
||||
backend in Vault][vault-pki], but for the sake of simplicity this guide will
|
||||
use [cfssl][cfssl]. You can generate a private CA certificate and key with
|
||||
[cfssl][cfssl]:
|
||||
|
||||
```shell
|
||||
$ # Generate the CA's private key and certificate
|
||||
$ cfssl print-defaults csr | cfssl gencert -initca - | cfssljson -bare nomad-ca
|
||||
```
|
||||
|
||||
The CA key (`nomad-ca-key.pem`) will be used to sign certificates for Nomad
|
||||
nodes and must be kept private. The CA certificate (`nomad-ca.pem`) contains
|
||||
the public key necessary to validate Nomad certificates and therefore must be
|
||||
distributed to every node that requires access.
|
||||
|
||||
### Node Certificates
|
||||
|
||||
Once you have a CA certificate and key you can generate and sign the
|
||||
certificates Nomad will use directly. TLS certificates commonly use the
|
||||
fully-qualified domain name of the system being identified as the certificate's
|
||||
Common Name (CN). However, hosts (and therefore hostnames and IPs) are often
|
||||
ephemeral in Nomad clusters. Not only would signing a new certificate per
|
||||
Nomad node be difficult, but using a hostname provides no security or
|
||||
functional benefits to Nomad. To fulfill the desired security properties
|
||||
(above) Nomad certificates are signed with their region and role such as:
|
||||
|
||||
- `client.global.nomad` for a client node in the `global` region
|
||||
- `server.us-west.nomad` for a server node in the `us-west` region
|
||||
|
||||
To create certificates for the client and server in the cluster from the
|
||||
[Getting Started guide][guide-cluster] with [cfssl][cfssl] create ([or
|
||||
download][cfssl.json]) the following configuration file as `cfssl.json` to
|
||||
increase the default certificate expiration time:
|
||||
|
||||
```json
|
||||
{
|
||||
"signing": {
|
||||
"default": {
|
||||
"expiry": "87600h",
|
||||
"usages": ["signing", "key encipherment", "server auth", "client auth"]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
```shell
|
||||
$ # Generate a certificate for the Nomad server
|
||||
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
|
||||
-hostname="server.global.nomad,localhost,127.0.0.1" - | cfssljson -bare server
|
||||
|
||||
# Generate a certificate for the Nomad client
|
||||
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -config=cfssl.json \
|
||||
-hostname="client.global.nomad,localhost,127.0.0.1" - | cfssljson -bare client
|
||||
|
||||
# Generate a certificate for the CLI
|
||||
$ echo '{}' | cfssl gencert -ca=nomad-ca.pem -ca-key=nomad-ca-key.pem -profile=client \
|
||||
- | cfssljson -bare cli
|
||||
```
|
||||
|
||||
Using `localhost` and `127.0.0.1` as subject alternate names (SANs) allows
|
||||
tools like `curl` to be able to communicate with Nomad's HTTP API when run on
|
||||
the same host. Other SANs may be added including a DNS resolvable hostname to
|
||||
allow remote HTTP requests from third party tools.
|
||||
|
||||
You should now have the following files:
|
||||
|
||||
- `cfssl.json` - cfssl configuration.
|
||||
- `nomad-ca.csr` - CA signing request.
|
||||
- `nomad-ca-key.pem` - CA private key. Keep safe!
|
||||
- `nomad-ca.pem` - CA public certificate.
|
||||
- `cli.csr` - Nomad CLI certificate signing request.
|
||||
- `cli-key.pem` - Nomad CLI private key.
|
||||
- `cli.pem` - Nomad CLI certificate.
|
||||
- `client.csr` - Nomad client node certificate signing request for the `global` region.
|
||||
- `client-key.pem` - Nomad client node private key for the `global` region.
|
||||
- `client.pem` - Nomad client node public certificate for the `global` region.
|
||||
- `server.csr` - Nomad server node certificate signing request for the `global` region.
|
||||
- `server-key.pem` - Nomad server node private key for the `global` region.
|
||||
- `server.pem` - Nomad server node public certificate for the `global` region.
|
||||
|
||||
Each Nomad node should have the appropriate key (`-key.pem`) and certificate
|
||||
(`.pem`) file for its region and role. In addition each node needs the CA's
|
||||
public certificate (`nomad-ca.pem`).
|
||||
|
||||
## Configuring Nomad
|
||||
|
||||
Next Nomad must be configured to use the newly-created key and certificates for
|
||||
mTLS. Starting with the [server configuration from the Getting Started
|
||||
guide][guide-server] add the following TLS configuration options:
|
||||
|
||||
```hcl
|
||||
# Increase log verbosity
|
||||
log_level = "DEBUG"
|
||||
|
||||
# Setup data dir
|
||||
data_dir = "/tmp/server1"
|
||||
|
||||
# Enable the server
|
||||
server {
|
||||
enabled = true
|
||||
|
||||
# Self-elect, should be 3 or 5 for production
|
||||
bootstrap_expect = 1
|
||||
}
|
||||
|
||||
# Require TLS
|
||||
tls {
|
||||
http = true
|
||||
rpc = true
|
||||
|
||||
ca_file = "nomad-ca.pem"
|
||||
cert_file = "server.pem"
|
||||
key_file = "server-key.pem"
|
||||
|
||||
verify_server_hostname = true
|
||||
verify_https_client = true
|
||||
}
|
||||
```
|
||||
|
||||
The new [`tls`][tls_block] section is worth breaking down in more detail:
|
||||
|
||||
```hcl
|
||||
tls {
|
||||
http = true
|
||||
rpc = true
|
||||
# ...
|
||||
}
|
||||
```
|
||||
|
||||
This enables TLS for the HTTP and RPC protocols. Unlike web servers, Nomad
|
||||
doesn't use separate ports for TLS and non-TLS traffic: your cluster should
|
||||
either use TLS or not.
|
||||
|
||||
```hcl
|
||||
tls {
|
||||
# ...
|
||||
|
||||
ca_file = "nomad-ca.pem"
|
||||
cert_file = "server.pem"
|
||||
key_file = "server-key.pem"
|
||||
|
||||
# ...
|
||||
}
|
||||
```
|
||||
|
||||
The file lines should point to wherever you placed the certificate files on
|
||||
the node. This guide assumes they are in Nomad's current directory.
|
||||
|
||||
```hcl
|
||||
tls {
|
||||
# ...
|
||||
|
||||
verify_server_hostname = true
|
||||
verify_https_client = true
|
||||
}
|
||||
```
|
||||
|
||||
These two settings are important for ensuring all of Nomad's mTLS security
|
||||
properties are met. If [`verify_server_hostname`][verify_server_hostname] is
|
||||
set to `false` the node's certificate will be checked to ensure it is signed by
|
||||
the same CA, but its role and region will not be verified. This means any
|
||||
service with a certificate signed by the same CA as Nomad can act as a client or
|
||||
server of any region.
|
||||
|
||||
[`verify_https_client`][verify_https_client] requires HTTP API clients to
|
||||
present a certificate signed by the same CA as Nomad's certificate. It may be
|
||||
disabled to allow HTTP API clients (e.g. Nomad CLI, Consul, or curl) to
|
||||
communicate with the HTTPS API without presenting a client-side certificate. If
|
||||
`verify_https_client` is enabled only HTTP API clients presenting a certificate
|
||||
signed by the same CA as Nomad's certificate are allowed to access Nomad.
|
||||
|
||||
~> Enabling `verify_https_client` effectively protects Nomad from unauthorized
|
||||
network access at the cost of losing Consul HTTPS health checks for agents.
|
||||
|
||||
### Client Configuration

The Nomad client configuration is similar to the server configuration. The
biggest difference is the certificate and key used.

```hcl
# Increase log verbosity
log_level = "DEBUG"

# Setup data dir
data_dir = "/tmp/client1"

# Enable the client
client {
  enabled = true

  # For demo assume we are talking to server1. For production,
  # this should be like "nomad.service.consul:4647" and a system
  # like Consul used for service discovery.
  servers = ["127.0.0.1:4647"]
}

# Modify our port to avoid a collision with server1
ports {
  http = 5656
}

# Require TLS
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "client.pem"
  key_file  = "client-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```

### Running with TLS

Now that we have certificates generated and configuration for a client and
server, we can test our TLS-enabled cluster!

In separate terminals start a server and client agent:

```shell
$ # In one terminal...
$ nomad agent -config server1.hcl

$ # ...and in another
$ nomad agent -config client1.hcl
```

If you run `nomad node status` now, you'll get an error like:

```text
Error querying node status: Get http://127.0.0.1:4646/v1/nodes: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
```

This is because the Nomad CLI defaults to communicating via HTTP instead of
HTTPS. We can configure the local Nomad client to connect using TLS and specify
our custom keys and certificates using the command line:

```shell
$ nomad node status -ca-cert=nomad-ca.pem -client-cert=cli.pem -client-key=cli-key.pem -address=https://127.0.0.1:4646
```

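You can also exercise the HTTPS API directly with `curl`, which is a handy way
to confirm that `verify_https_client` behaves as expected. This is an optional
check, assuming `curl` is available and run from the directory containing the
certificates:

```shell
$ curl --cacert nomad-ca.pem --cert cli.pem --key cli-key.pem \
    https://127.0.0.1:4646/v1/agent/members
```

Without `--cert` and `--key`, the same request should be rejected while
`verify_https_client = true`.
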
Typing these flags each time can be cumbersome, so the Nomad CLI also
searches environment variables for default values. Set the following
environment variables in your shell:

```shell
$ export NOMAD_ADDR=https://localhost:4646
$ export NOMAD_CACERT=nomad-ca.pem
$ export NOMAD_CLIENT_CERT=cli.pem
$ export NOMAD_CLIENT_KEY=cli-key.pem
```

- `NOMAD_ADDR` is the URL of the Nomad agent and sets the default for `-address`.
- `NOMAD_CACERT` is the location of your CA certificate and sets the default
  for `-ca-cert`.
- `NOMAD_CLIENT_CERT` is the location of your CLI certificate and sets the
  default for `-client-cert`.
- `NOMAD_CLIENT_KEY` is the location of your CLI key and sets the default for
  `-client-key`.

After these environment variables are correctly configured, the CLI will
respond as expected:

```shell
$ nomad node status
ID        DC   Name   Class   Drain  Eligibility  Status
237cd4c5  dc1  nomad  <none>  false  eligible     ready

$ nomad job init
Example job file written to example.nomad
$ nomad job run example.nomad
==> Monitoring evaluation "e9970e1d"
    Evaluation triggered by job "example"
    Allocation "a1f6c3e7" created: node "237cd4c5", group "cache"
    Evaluation within deployment: "080460ce"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "e9970e1d" finished with status "complete"
```

## Server Gossip

At this point all of Nomad's RPC and HTTP communication is secured with mTLS.
However, Nomad servers also communicate with a gossip protocol, Serf, that does
not use TLS:

- HTTP - Used to communicate between CLI and Nomad agents. Secured by mTLS.
- RPC - Used to communicate between Nomad agents. Secured by mTLS.
- Serf - Used to communicate between Nomad servers. Secured by a shared key.

The Nomad servers' gossip protocol uses a shared key instead of TLS for encryption.
This encryption key must be added to every server's configuration using the
[`encrypt`](/docs/configuration/server#encrypt) parameter or with
the [`-encrypt` command line option](/docs/commands/agent).

The Nomad CLI includes an `operator keygen` command for generating a new secure gossip
encryption key:

```shell
$ nomad operator keygen
cg8StVXbQJ0gPvMd9o7yrg==
```

Alternatively, you can use any method that base64 encodes 16 random bytes:

```shell
$ openssl rand -base64 16
raZjciP8vikXng2S5X0m9w==
$ dd if=/dev/urandom bs=16 count=1 status=none | base64
LsuYyj93KVfT3pAJPMMCgA==
```

Put the same generated key into every server's configuration file or command
line arguments:

```hcl
server {
  enabled = true

  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 1

  # Encrypt gossip communication
  encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
}
```

## Switching an existing cluster to TLS

Since Nomad does _not_ use different ports for TLS and non-TLS communication,
the use of TLS must be consistent across the cluster. Switching an existing
cluster to use TLS everywhere is operationally similar to upgrading between
versions of Nomad, but requires additional steps to prevent needlessly
rescheduling allocations.

1. Add the appropriate key and certificates to all nodes. Ensure the private
   key file is only readable by the Nomad user.
1. Add the environment variables to all nodes where the CLI is used.
1. Add the appropriate [`tls`][tls_block] block to the configuration file on
   all nodes.
1. Generate a gossip key and add it to the Nomad server configuration.

~> Once a quorum of servers are TLS-enabled, clients will no longer be able to
communicate with the servers until their client configuration is updated and
reloaded.

At this point a rolling restart of the cluster will enable TLS everywhere.
However, once servers are restarted clients will be unable to heartbeat. This
means any client unable to restart with TLS enabled before their heartbeat TTL
expires will have their allocations marked as `lost` and rescheduled.

While the default heartbeat settings may be sufficient for concurrently
restarting a small number of nodes without any allocations being marked as
`lost`, most operators should raise the [`heartbeat_grace`][heartbeat_grace]
configuration setting before restarting their servers:

1. Set `heartbeat_grace = "1h"` or an appropriate duration on servers (see the
   sketch after this list).
1. Restart servers, one at a time.
1. Restart clients, one or more at a time.
1. Set [`heartbeat_grace`][heartbeat_grace] back to its previous value (or
   remove to accept the default).
1. Restart servers, one at a time.

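A minimal sketch of what step 1 might look like in a server's configuration
file, assuming the server block shown earlier in this guide:

```hcl
server {
  enabled          = true
  bootstrap_expect = 1

  # Temporarily raised for the TLS migration; revert afterwards.
  heartbeat_grace = "1h"
}
```
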
~> In a future release Nomad will allow upgrading a cluster to use TLS by
allowing servers to accept TLS and non-TLS connections from clients during
the migration.

Jobs running in the cluster will _not_ be affected and will continue running
throughout the switch as long as all clients can restart within their heartbeat
TTL.

## Changing Nomad certificates on the fly

As of 0.7.1, Nomad supports dynamic certificate reloading via SIGHUP.

Given a prior TLS configuration as follows:

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "server.pem"
  key_file  = "server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```

Nomad's `cert_file` and `key_file` can be reloaded via SIGHUP simply by
updating the TLS stanza to:

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "nomad-ca.pem"
  cert_file = "new_server.pem"
  key_file  = "new_server_key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```

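One way to actually deliver the signal, assuming the agent is running directly
on the host as the `nomad` process:

```shell
$ # Reload the agent's certificates without restarting it
$ sudo pkill -HUP nomad
```
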
## Migrating a cluster to TLS

### Reloading TLS configuration via SIGHUP

Nomad supports dynamically reloading both client and server TLS configuration.
To reload an agent's TLS configuration, first update the TLS block in the
agent's configuration file and then send the Nomad agent a SIGHUP signal.
Note that this will only reload a subset of the configuration file,
including the TLS configuration.

The agent reloads all its network connections when there are changes to its TLS
configuration during a config reload via SIGHUP. Any new connections
established will use the updated configuration, and any outstanding old
connections will be closed. This process works when upgrading to TLS,
downgrading from it, as well as rolling certificates. We recommend upgrading
to TLS.

### RPC Upgrade Mode for Nomad Servers

When migrating to TLS, the [`rpc_upgrade_mode`][rpc_upgrade_mode] option
(defaults to `false`) in the TLS configuration for a Nomad server can be set
to true. When set to true, servers will accept both TLS and non-TLS
connections. By accepting non-TLS connections, operators can upgrade clients
to TLS without the clients being marked as lost because the server is
rejecting the client connection due to the connection not being over TLS.
However, it is important to note that `rpc_upgrade_mode` should be used as a
temporary solution in the process of migration, and this option should be
re-set to false (meaning that the server will strictly accept only TLS
connections) once the entire cluster has been migrated.

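For reference, a sketch of what this looks like in a server's `tls` block
during the migration window, assuming the configuration shown earlier:

```hcl
tls {
  http = true
  rpc  = true

  # Accept both TLS and non-TLS RPC connections while clients are migrated.
  # Remove (or set back to false) once every node speaks TLS.
  rpc_upgrade_mode = true

  ca_file   = "nomad-ca.pem"
  cert_file = "server.pem"
  key_file  = "server-key.pem"

  verify_server_hostname = true
  verify_https_client    = true
}
```
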
[cfssl]: https://cfssl.org/
[cfssl.json]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/cfssl.json
[guide-install]: /intro/getting-started/install
[guide-cluster]: /intro/getting-started/cluster
[guide-server]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/server.hcl
[heartbeat_grace]: /docs/configuration/server#heartbeat_grace
[letsencrypt]: https://letsencrypt.org/
[rpc_upgrade_mode]: /docs/configuration/tls#rpc_upgrade_mode
[tls]: https://en.wikipedia.org/wiki/Transport_Layer_Security
[tls_block]: /docs/configuration/tls
[vagrantfile]: https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/Vagrantfile
[vault]: https://www.vaultproject.io/
[vault-pki]: https://www.vaultproject.io/docs/secrets/pki
[verify_https_client]: /docs/configuration/tls#verify_https_client
[verify_server_hostname]: /docs/configuration/tls#verify_server_hostname

642 website/pages/guides/security/vault-pki-integration.mdx Normal file

@@ -0,0 +1,642 @@

---
layout: guides
page_title: Vault PKI Secrets Engine Integration
sidebar_title: Vault PKI Secrets Engine
description: |-
  Securing Nomad's cluster communication with TLS is important for both
  security and easing operations. Nomad can use mutual TLS (mTLS) to
  authenticate all HTTP and RPC communication. This guide will leverage
  Vault's PKI secrets engine to accomplish this task.
---

# Vault PKI Secrets Engine Integration

You can use [Consul Template][consul-template] in your Nomad cluster to
integrate with Vault's [PKI Secrets Engine][pki-engine] to generate and renew
dynamic X.509 certificates. By using this method, you enable each node to have a
unique certificate with a relatively short TTL. This feature, along with
automatic certificate rotation, allows you to safely and securely scale your
cluster while using mutual TLS (mTLS).

## Reference Material

- [Vault PKI Secrets Engine][pki-engine]
- [Consul Template][consul-template-github]
- [Build Your Own Certificate Authority (CA)][vault-ca-learn]

## Estimated Time to Complete

25 minutes

## Challenge

Secure your existing Nomad cluster with mTLS. Configure a root and intermediate
CA in Vault and ensure (with the help of Consul Template) that you are
periodically renewing your X.509 certificates on all nodes to maintain a healthy
cluster state.

## Solution

Enable TLS in your Nomad cluster configuration. Additionally, configure Consul
Template on all nodes along with the appropriate templates to communicate with
Vault and ensure all nodes are dynamically generating/renewing their X.509
certificates.

## Prerequisites

To perform the tasks described in this guide, you need to have a Nomad
environment with Consul and Vault installed. You can use this [repo][repo] to
easily provision a sandbox environment. This guide will assume a cluster with
one server node and three client nodes.

~> **Please Note:** This guide is for demo purposes and is only using a single
Nomad server with a Vault server configured alongside it. In a production
cluster, 3 or 5 Nomad server nodes are recommended along with a separate Vault
cluster. Please see [Vault Reference Architecture][vault-ra] to learn how to
securely deploy a Vault cluster.

## Steps

### Step 1: Initialize Vault Server

Run the following command to initialize the Vault server and receive an
[unseal][seal] key and initial root [token][token] (if you are running the
environment provided in this guide, the Vault server is co-located with the
Nomad server). Be sure to note the unseal key and initial root token as you will
need these two pieces of information.

```shell
$ vault operator init -key-shares=1 -key-threshold=1
```

The `vault operator init` command above creates a single Vault unseal key for
convenience. For a production environment, it is recommended that you create at
least five unseal key shares and securely distribute them to independent
operators. The `vault operator init` command defaults to five key shares and a
key threshold of three. If you provisioned more than one server, the others will
become standby nodes but should still be unsealed.

### Step 2: Unseal Vault

Run the following command and then provide your unseal key to Vault.

```shell
$ vault operator unseal
```

The output of unsealing Vault will look similar to the following:

```shell
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           1
Threshold              1
Version                1.0.3
Cluster Name           vault-cluster-d1b6513f
Cluster ID             87d6d13f-4b92-60ce-1f70-41a66412b0f1
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>
```

### Step 3: Log in to Vault

Use the [login][login] command to authenticate yourself against Vault using the
initial root token you received earlier. You will need to authenticate to run
the necessary commands to write policies, create roles, and configure your root
and intermediate CAs.

```shell
$ vault login <your initial root token>
```

If your login is successful, you will see output similar to what is shown below:

```shell
Success! You are now authenticated. The token information displayed below
is already stored in the token helper. You do NOT need to run "vault login"
again. Future Vault requests will automatically use this token.
...
```

### Step 4: Generate the Root CA

Enable the [PKI secrets engine][pki-engine] at the `pki` path:

```shell
$ vault secrets enable pki
```

Tune the PKI secrets engine to issue certificates with a maximum time-to-live
(TTL) of 87600 hours:

```shell
$ vault secrets tune -max-lease-ttl=87600h pki
```

- Please note: we are using a common and recommended pattern which is to have
  one mount act as the root CA and to use this CA only to sign intermediate CA
  CSRs from other PKI secrets engines (which we will create in the next few
  steps). For tighter security, you can store your CA outside of Vault and use
  the PKI engine only as an intermediate CA.

Generate the root certificate and save the certificate as `CA_cert.crt`:

```shell
$ vault write -field=certificate pki/root/generate/internal \
    common_name="global.nomad" ttl=87600h > CA_cert.crt
```

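If you want to double-check what was generated, `openssl` can print the root
certificate's subject and expiry. This is an optional step and assumes
`openssl` is installed:

```shell
$ openssl x509 -in CA_cert.crt -noout -subject -enddate
```
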
### Step 5: Generate the Intermediate CA and CSR

Enable the PKI secrets engine at the `pki_int` path:

```shell
$ vault secrets enable -path=pki_int pki
```

Tune the PKI secrets engine at the `pki_int` path to issue certificates with a
maximum time-to-live (TTL) of 43800 hours:

```shell
$ vault secrets tune -max-lease-ttl=43800h pki_int
```

Generate a CSR from your intermediate CA and save it as `pki_intermediate.csr`:

```shell
$ vault write -format=json pki_int/intermediate/generate/internal \
    common_name="global.nomad Intermediate Authority" \
    ttl="43800h" | jq -r '.data.csr' > pki_intermediate.csr
```

### Step 6: Sign the CSR and Configure Intermediate CA Certificate

Sign the intermediate CA CSR with the root certificate and save the generated
certificate as `intermediate.cert.pem`:

```shell
$ vault write -format=json pki/root/sign-intermediate \
    csr=@pki_intermediate.csr format=pem_bundle \
    ttl="43800h" | jq -r '.data.certificate' > intermediate.cert.pem
```

Once the CSR is signed and the root CA returns a certificate, it can be imported
back into Vault:

```shell
$ vault write pki_int/intermediate/set-signed certificate=@intermediate.cert.pem
```

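As an optional sanity check (assuming `openssl` is available), you can verify
that the intermediate certificate chains back to the root you generated in
Step 4:

```shell
$ openssl verify -CAfile CA_cert.crt intermediate.cert.pem
```
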
### Step 7: Create a Role

A role is a logical name that maps to a policy used to generate credentials. In
our example, it will allow you to use [configuration
parameters][config-parameters] that specify certificate common names, designate
alternate names, and enable subdomains along with a few other key settings.

Create a role named `nomad-cluster` that specifies the allowed domains, enables
you to create certificates for subdomains, and generates certificates with a TTL
of 86400 seconds (24 hours).

```shell
$ vault write pki_int/roles/nomad-cluster allowed_domains=global.nomad \
    allow_subdomains=true max_ttl=86400s require_cn=false generate_lease=true
```

You should see the following output if the command you issued was successful:

```
Success! Data written to: pki_int/roles/nomad-cluster
```

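At this point you can optionally confirm the role works by asking Vault to
issue a short-lived certificate directly; the output includes a certificate,
private key, and issuing CA:

```shell
$ vault write pki_int/issue/nomad-cluster \
    common_name=server.global.nomad ttl=24h
```

Consul Template will perform this same kind of request on each node in the
steps below.
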
### Step 8: Create a Policy to Access the Role Endpoint

Recall from [Step 1](#step-1-initialize-vault-server) that we generated a root
token that we used to log in to Vault. Although we could use that token in our
next steps to generate our TLS certs, the recommended security approach is to
create a new token based on a specific policy with limited privileges.

Create a policy file named `tls-policy.hcl` with the following contents:

```
path "pki_int/issue/nomad-cluster" {
  capabilities = ["update"]
}
```

Note that we are specifying the `update` [capability][capability] on the
path `pki_int/issue/nomad-cluster`. All other privileges will be denied. You can
read more about Vault policies [here][policies].

Write the policy we just created into Vault:

```shell
$ vault policy write tls-policy tls-policy.hcl
Success! Uploaded policy: tls-policy
```

### Step 9: Generate a Token based on `tls-policy`

Create a token based on `tls-policy` with the following command:

```shell
$ vault token create -policy="tls-policy" -period=24h -orphan
```

If the command is successful, you will see output similar to the following:

```shell
Key                  Value
---                  -----
token                s.m069Vpul3c4lfGnJ6unpxgxD
token_accessor       HiZALO25hDQzSgyaglkzty3M
token_duration       24h
token_renewable      true
token_policies       ["default" "tls-policy"]
identity_policies    []
policies             ["default" "tls-policy"]
```

Make a note of this token as you will need it in the upcoming steps.

### Step 10: Create and Populate the Templates Directory

We need to create templates that Consul Template can use to render the actual
certificates and keys on the nodes in our cluster. In this guide, we will place
these templates in `/opt/nomad/templates`.

Create a directory called `templates` in `/opt/nomad`:

```shell
$ sudo mkdir /opt/nomad/templates
```

Below are the templates that the Consul Template configuration will use. We will
provide different templates to the nodes depending on whether they are server
nodes or client nodes. All of the nodes will get the CLI templates (since we
want to use the CLI on any of the nodes).

**For Nomad Servers**:

_agent.crt.tpl_:

```
{{ with secret "pki_int/issue/nomad-cluster" "common_name=server.global.nomad" "ttl=24h" "alt_names=localhost" "ip_sans=127.0.0.1"}}
{{ .Data.certificate }}
{{ end }}
```

_agent.key.tpl_:

```
{{ with secret "pki_int/issue/nomad-cluster" "common_name=server.global.nomad" "ttl=24h" "alt_names=localhost" "ip_sans=127.0.0.1"}}
{{ .Data.private_key }}
{{ end }}
```

_ca.crt.tpl_:

```
{{ with secret "pki_int/issue/nomad-cluster" "common_name=server.global.nomad" "ttl=24h"}}
{{ .Data.issuing_ca }}
{{ end }}
```

**For Nomad Clients**:

Replace the word `server` in the `common_name` option in each template with the
word `client`.

_agent.crt.tpl_:

```
{{ with secret "pki_int/issue/nomad-cluster" "common_name=client.global.nomad" "ttl=24h" "alt_names=localhost" "ip_sans=127.0.0.1"}}
{{ .Data.certificate }}
{{ end }}
```

_agent.key.tpl_:

```
{{ with secret "pki_int/issue/nomad-cluster" "common_name=client.global.nomad" "ttl=24h" "alt_names=localhost" "ip_sans=127.0.0.1"}}
{{ .Data.private_key }}
{{ end }}
```

_ca.crt.tpl_:

```
{{ with secret "pki_int/issue/nomad-cluster" "common_name=client.global.nomad" "ttl=24h"}}
{{ .Data.issuing_ca }}
{{ end }}
```

**For Nomad CLI (provide this on all nodes)**:

_cli.crt.tpl_:

```
{{ with secret "pki_int/issue/nomad-cluster" "ttl=24h" }}
{{ .Data.certificate }}
{{ end }}
```

_cli.key.tpl_:

```
{{ with secret "pki_int/issue/nomad-cluster" "ttl=24h" }}
{{ .Data.private_key }}
{{ end }}
```

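Before wiring these templates into the Consul Template service, you can
optionally test-render one of them with a dry run. This assumes the
`VAULT_ADDR` and `VAULT_TOKEN` environment variables are exported in your
shell (using the token from Step 9):

```shell
$ consul-template \
    -template "/opt/nomad/templates/cli.crt.tpl:/tmp/cli.crt" \
    -once -dry
```

With `-dry`, the rendered certificate is printed to stdout instead of being
written to `/tmp/cli.crt`.
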
### Step 11: Configure Consul Template on All Nodes

If you are using the AWS environment provided in this guide, you already have
[Consul Template][consul-template-github] installed on all nodes. If you are
using your own environment, please make sure Consul Template is installed. You
can download it [here][ct-download].

Provide the token you created in [Step
9](#step-9-generate-a-token-based-on-tls-policy) to the Consul Template
configuration file located at `/etc/consul-template.d/consul-template.hcl`. You
will also need to specify the [template stanza][ct-template-stanza] so you can
render each of the following on your nodes at the specified location from the
templates you created in the previous step:

- Node certificate
- Node private key
- CA public certificate

We will also specify the template stanza to create certs and keys from the
templates we previously created for the Nomad CLI (which defaults to HTTP but
will need to use HTTPS once TLS is enabled in our cluster).

Your `consul-template.hcl` configuration file should look similar to the
following (you will need to provide this to each node in the cluster):

```
# This denotes the start of the configuration section for Vault. All values
# contained in this section pertain to Vault.
vault {
  # This is the address of the Vault leader. The protocol (http(s)) portion
  # of the address is required.
  address = "http://active.vault.service.consul:8200"

  # This value can also be specified via the environment variable VAULT_TOKEN.
  token = "s.m069Vpul3c4lfGnJ6unpxgxD"

  # This should also be less than or around 1/3 of your TTL for a predictable
  # behaviour. See https://github.com/hashicorp/vault/issues/3414
  grace = "1s"

  # This tells Consul Template that the provided token is actually a wrapped
  # token that should be unwrapped using Vault's cubbyhole response wrapping
  # before being used. Please see Vault's cubbyhole response wrapping
  # documentation for more information.
  unwrap_token = false

  # This option tells Consul Template to automatically renew the Vault token
  # given. If you are unfamiliar with Vault's architecture, Vault requires
  # tokens be renewed at some regular interval or they will be revoked. Consul
  # Template will automatically renew the token at half the lease duration of
  # the token. The default value is true, but this option can be disabled if
  # you want to renew the Vault token using an out-of-band process.
  renew_token = true
}

# This block defines the configuration for connecting to a syslog server for
# logging.
syslog {
  enabled = true

  # This is the name of the syslog facility to log to.
  facility = "LOCAL5"
}

# This block defines the configuration for a template. Unlike other blocks,
# this block may be specified multiple times to configure multiple templates.
template {
  # This is the source file on disk to use as the input template. This is often
  # called the "Consul Template template".
  source = "/opt/nomad/templates/agent.crt.tpl"

  # This is the destination path on disk where the source template will render.
  # If the parent directories do not exist, Consul Template will attempt to
  # create them, unless create_dest_dirs is false.
  destination = "/opt/nomad/agent-certs/agent.crt"

  # This is the permission to render the file. If this option is left
  # unspecified, Consul Template will attempt to match the permissions of the
  # file that already exists at the destination path. If no file exists at that
  # path, the permissions are 0644.
  perms = 0700

  # This is the optional command to run when the template is rendered. The
  # command will only run if the resulting template changes.
  command = "systemctl reload nomad"
}

template {
  source      = "/opt/nomad/templates/agent.key.tpl"
  destination = "/opt/nomad/agent-certs/agent.key"
  perms       = 0700
  command     = "systemctl reload nomad"
}

template {
  source      = "/opt/nomad/templates/ca.crt.tpl"
  destination = "/opt/nomad/agent-certs/ca.crt"
  command     = "systemctl reload nomad"
}

# The following template stanzas are for the CLI certs

template {
  source      = "/opt/nomad/templates/cli.crt.tpl"
  destination = "/opt/nomad/cli-certs/cli.crt"
}

template {
  source      = "/opt/nomad/templates/cli.key.tpl"
  destination = "/opt/nomad/cli-certs/cli.key"
}
```

!> Note: we have hard-coded the token we created into the Consul Template
configuration file. Although we can avoid this by assigning it to the
environment variable `VAULT_TOKEN`, this method can still pose a security
concern. The recommended approach is to securely introduce this token to Consul
Template. To learn how to accomplish this, see [Secure
Introduction][secure-introduction].

- Please also note we have applied file permissions `0700` to the `agent.crt`
  and `agent.key` since only the root user should be able to read those files.
  Any other user using the Nomad CLI will be able to read the CLI certs and key
  that we have created for them along with the intermediate CA cert.

### Step 12: Start the Consul Template Service

Start the Consul Template service on each node:

```shell
$ sudo systemctl start consul-template
```

You can quickly confirm the appropriate certs and private keys were generated in
the `destination` directory you specified in your Consul Template configuration
by listing them out:

```shell
$ ls /opt/nomad/agent-certs/ /opt/nomad/cli-certs/
/opt/nomad/agent-certs/:
agent.crt  agent.key  ca.crt

/opt/nomad/cli-certs/:
cli.crt  cli.key
```

### Step 13: Configure Nomad to Use TLS

Add the following [tls stanza][nomad-tls-stanza] to the configuration of all
Nomad agents (servers and clients) in the cluster (configuration file located at
`/etc/nomad.d/nomad.hcl` in this example):

```hcl
tls {
  http = true
  rpc  = true

  ca_file   = "/opt/nomad/agent-certs/ca.crt"
  cert_file = "/opt/nomad/agent-certs/agent.crt"
  key_file  = "/opt/nomad/agent-certs/agent.key"

  verify_server_hostname = true
  verify_https_client    = true
}
```

Additionally, ensure the [`rpc_upgrade_mode`][rpc-upgrade-mode] option is set to
`true` on your server nodes (this is to ensure the Nomad servers will accept
both TLS and non-TLS connections during the upgrade):

```hcl
rpc_upgrade_mode = true
```

Reload Nomad's configuration on all nodes:

```shell
$ systemctl reload nomad
```

Once Nomad has been reloaded on all nodes, go back to your server nodes and
change the `rpc_upgrade_mode` option to false (or remove the line since the
option defaults to false) so that your Nomad servers will only accept TLS
connections:

```hcl
rpc_upgrade_mode = false
```

You will need to reload Nomad on your servers after changing this setting. You
can read more about RPC Upgrade Mode [here][rpc-upgrade].

If you run `nomad status`, you will now receive the following error:

```text
Error querying jobs: Get http://172.31.52.215:4646/v1/jobs: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"
```

This is because the Nomad CLI defaults to communicating via HTTP instead of
HTTPS. We can configure the local Nomad client to connect using TLS and specify
our custom key and certificates by setting the following environment variables:

```shell
export NOMAD_ADDR=https://localhost:4646
export NOMAD_CACERT="/opt/nomad/agent-certs/ca.crt"
export NOMAD_CLIENT_CERT="/opt/nomad/cli-certs/cli.crt"
export NOMAD_CLIENT_KEY="/opt/nomad/cli-certs/cli.key"
```

After these environment variables are correctly configured, the CLI will respond
as expected:

```shell
$ nomad status
No running jobs
```

## Encrypt Server Gossip

At this point all of Nomad's RPC and HTTP communication is secured with mTLS.
However, Nomad servers also communicate with a gossip protocol, Serf, that does
not use TLS:

- HTTP - Used to communicate between CLI and Nomad agents. Secured by mTLS.
- RPC - Used to communicate between Nomad agents. Secured by mTLS.
- Serf - Used to communicate between Nomad servers. Secured by a shared key.

The Nomad servers' gossip protocol uses a shared key instead of TLS for encryption.
This encryption key must be added to every server's configuration using the
[`encrypt`](/docs/configuration/server#encrypt) parameter or with the
[`-encrypt` command line option](/docs/commands/agent).

The Nomad CLI includes an `operator keygen` command for generating a new secure
gossip encryption key:

```shell
$ nomad operator keygen
cg8StVXbQJ0gPvMd9o7yrg==
```

Alternatively, you can use any method that base64 encodes 16 random bytes:

```shell
$ openssl rand -base64 16
raZjciP8vikXng2S5X0m9w==
$ dd if=/dev/urandom bs=16 count=1 status=none | base64
LsuYyj93KVfT3pAJPMMCgA==
```

Put the same generated key into every server's configuration file or command
line arguments:

```hcl
server {
  enabled = true

  # Self-elect, should be 3 or 5 for production
  bootstrap_expect = 1

  # Encrypt gossip communication
  encrypt = "cg8StVXbQJ0gPvMd9o7yrg=="
}
```

Unlike with TLS, reloading Nomad will not be enough to initiate encryption of
gossip traffic. At this point, you may restart each Nomad server with
`systemctl restart nomad`.

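For example, on the systemd-based hosts assumed elsewhere in this guide,
restart the servers one at a time:

```shell
$ sudo systemctl restart nomad
```
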
[capability]: https://www.vaultproject.io/docs/concepts/policies#capabilities
[config-parameters]: https://www.vaultproject.io/api/secret/pki#parameters-8
[consul-template]: https://www.consul.io/docs/guides/consul-template.html
[consul-template-github]: https://github.com/hashicorp/consul-template
[ct-download]: https://releases.hashicorp.com/consul-template/
[ct-template-stanza]: https://github.com/hashicorp/consul-template#configuration-file-format
[login]: https://www.vaultproject.io/docs/commands/login
[nomad-tls-stanza]: /docs/configuration/tls
[policies]: https://www.vaultproject.io/docs/concepts/policies#policies
[pki-engine]: https://www.vaultproject.io/docs/secrets/pki
[repo]: https://github.com/hashicorp/nomad/tree/master/terraform
[rpc-upgrade-mode]: /docs/configuration/tls#rpc_upgrade_mode
[rpc-upgrade]: /guides/security/securing-nomad#rpc-upgrade-mode-for-nomad-servers
[seal]: https://www.vaultproject.io/docs/concepts/seal
[secure-introduction]: https://learn.hashicorp.com/vault/identity-access-management/iam-secure-intro
[token]: https://www.vaultproject.io/docs/concepts/tokens
[vault-ca-learn]: https://learn.hashicorp.com/vault/secrets-management/sm-pki-engine
[vault-ra]: https://learn.hashicorp.com/vault/operations/ops-reference-architecture