7d37cd3f53
* systemd should be downcased * containerd should be downcased * spellchecking, adjust list item spacing * QEMU should be upcased * spelling, it's->its * Fewer exclamation points; drive-by list spacing * Update website/pages/docs/internals/security.mdx * Namespace is not ent only now. Co-authored-by: Tim Gross <tgross@hashicorp.com>
327 lines
17 KiB
Plaintext
327 lines
17 KiB
Plaintext
---
|
|
layout: docs
|
|
page_title: Security Model
|
|
sidebar_title: Security Model
|
|
description: >-
|
|
Nomad relies on both a lightweight gossip mechanism and an RPC system to
|
|
provide various features. Both of the systems have different security
|
|
mechanisms that stem from their designs. However, the security mechanisms of
|
|
Nomad have a common goal: to provide confidentiality, integrity, and
|
|
authentication.
|
|
---
|
|
|
|
## Overview
|
|
|
|
Nomad is a flexible workload orchestrator to deploy and manage any containerized
|
|
or legacy application using a single, unified workflow. It can run diverse
|
|
workloads including Docker, non-containerized, microservice, and batch
|
|
applications.
|
|
|
|
Nomad utilizes a lightweight gossip and RPC system, [similar to
|
|
Consul](https://www.consul.io/docs/internals/security), which provides
|
|
various essential features. Both of these systems provide security mechanisms
|
|
which should be utilized to help provide [confidentiality, integrity and
|
|
authentication](https://en.wikipedia.org/wiki/Information_security).
|
|
|
|
Using defense in depth is crucial for cluster security, and deployment
|
|
requirements may differ drastically depending on your use case. Further security
|
|
features for multi-tenant deployments are offered exclusively in the enterprise
|
|
version. This documentation may need to be adapted to your deployment situation,
|
|
but the general mechanisms for a secure Nomad deployment revolve around:
|
|
|
|
- **[mTLS](https://learn.hashicorp.com/tutorials/nomad/security-enable-tls)** -
|
|
Mutual authentication of both the TLS server and client x509 certificates
|
|
prevents internal abuse by preventing unauthenticated access to network
|
|
components within the cluster.
|
|
|
|
- **[ACLs](https://learn.hashicorp.com/collections/nomad/access-control)** - Enables
|
|
authorization for authenticated connections by granting capabilities to ACL
|
|
tokens.
|
|
|
|
- **[Namespaces](https://learn.hashicorp.com/tutorials/nomad/namespaces)**
|
|
- Access to read and write to a namespace can be
|
|
controlled to allow for granular access to job information managed within a
|
|
multi-tenant cluster.
|
|
|
|
- **[Sentinel Policies](https://learn.hashicorp.com/tutorials/nomad/sentinel)**
|
|
(**Enterprise Only**) - Sentinel policies allow for granular control over
|
|
components such as task drivers within a cluster.
|
|
|
|
### Personas
|
|
|
|
When thinking about Nomad, it helps to consider the following types of base
|
|
personas when managing the security requirements for the cluster deployment. The
|
|
granularity may change depending on your team's use case where rigorous roles
|
|
can be accurately defined and managed using the [Nomad backend secret engine for
|
|
Vault](https://www.vaultproject.io/docs/secrets/nomad). This is
|
|
described further with getting started steps using a development server
|
|
[here](https://learn.hashicorp.com/collections/nomad/access-control).
|
|
|
|
It's important to note that there's no traditional concept of a user
|
|
within Nomad itself.
|
|
|
|
- **System Administrator** - This is someone who has access to the underlying
|
|
infrastructure to a Nomad cluster. Often she has access to SSH or RDP
|
|
directly into a server within a cluster through a bastion host. Ultimately
|
|
they have read, write and execute permissions for the actual Nomad binary.
|
|
This binary is the same for server and client nodes using different
|
|
configuration files. These users potentially have something like sudo,
|
|
administrative, or some other super-user access to the underlying compute
|
|
resource. Users like these are essentially totally trusted by Nomad as they
|
|
have administrative rights to the system and can start or stop the agent.
|
|
|
|
- **Nomad Administrator** - This is someone (probably the same **System
|
|
Administrator**) who has access to define the Nomad agent configurations
|
|
for servers and clients, and/or have a Nomad management ACL token. They also
|
|
have total rights to all of the parts in the Nomad system including the
|
|
ability to start and stop all jobs within a cluster.
|
|
|
|
- **Nomad Operator** - This is someone who likely has selective access with
|
|
restricted capabilities to manage jobs applicable to their namespace within
|
|
a cluster.
|
|
|
|
- **User** - This is someone who is a user of an application being run on the
|
|
system. In some cases applications may be public facing and exposed to the
|
|
internet such as a web server. This is someone who shouldn't have any
|
|
network access to the Nomad server API.
|
|
|
|
### Secure Configuration
|
|
|
|
Nomad's security model is applicable only if all parts of the system are running
|
|
with a secure configuration; **Nomad is not secure-by-default.** Without the following
|
|
mechanisms enabled in Nomad's configuration, it may be possible to abuse access
|
|
to a cluster. Like all security considerations, one must appropriately determine
|
|
what concerns they have for their environment and adapt to these security
|
|
recommendations accordingly.
|
|
|
|
#### Requirements
|
|
|
|
- **[mTLS enabled](https://learn.hashicorp.com/tutorials/nomad/security-enable-tls)**
|
|
|
|
- Mutual TLS (mTLS) enables [mutual
|
|
authentication](https://en.wikipedia.org/wiki/Mutual_authentication) with
|
|
security properties to prevent the following problems:
|
|
|
|
* Unauthorized access because both server and clients must provide valid TLS
|
|
[X.509](https://en.wikipedia.org/wiki/X.509) certificates signed by the same
|
|
valid [CA](https://en.wikipedia.org/wiki/Certificate_authority) in order to
|
|
communicate within the cluster.
|
|
|
|
* Observing or tampering communication between nodes is thwarted due to the
|
|
traffic being encrypted using the well known network security protocol
|
|
[TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) version 1.2,
|
|
with a [configurable minimal
|
|
version](/docs/configuration/tls#tls_min_version).
|
|
Both server and client agents must be configured to validate each other's
|
|
certificates to ensure mTLS is actually enabled. This requires appropriate
|
|
certificates to be distributed to servers, clients, machines, or operators
|
|
for things like CLI usage. It is recommended to use
|
|
[Vault](https://learn.hashicorp.com/tutorials/nomad/vault-pki-nomad)
|
|
to securely manage the certificate creation and rotation for nodes.
|
|
|
|
* Agent role misconfiguration is prevented using the X.509
|
|
[SAN](https://en.wikipedia.org/wiki/Subject_Alternative_Name) extension.
|
|
This is essentially a domain name that is used to identify and verify a
|
|
node's region and role name are configured as expected (e.g.
|
|
`client.us-east.nomad`).
|
|
|
|
* Using the previously mentioned role name prevents maliciously masquerading
|
|
as a server or client node, and allows other services to be signed easily by
|
|
the same CA. This also avoids any potential pitfalls with certificates using
|
|
the IP or Hostname of nodes within a cluster.
|
|
|
|
- **[ACLs enabled](https://learn.hashicorp.com/collections/nomad/access-control)** - The
|
|
access control list (ACL) system provides a capability-based control
|
|
mechanism for Nomad administrators allowing for custom roles (typically
|
|
within Vault) to be tied to an individual human or machine operator
|
|
identity. This allows for access to capabilities within the cluster to be
|
|
restricted to specific users.
|
|
|
|
- **[Sentinel Policies](https://learn.hashicorp.com/tutorials/nomad/sentinel)**
|
|
(**Enterprise Only**) - [Sentinel](https://www.hashicorp.com/sentinel/) is
|
|
a feature which enables
|
|
[policy-as-code](https://docs.hashicorp.com/sentinel/concepts/policy-as-code/)
|
|
to enforce further restrictions on operators. This is used to augment the
|
|
built-in ACL system for fine-grained control over jobs.
|
|
|
|
- **[Namespaces](https://learn.hashicorp.com/tutorials/nomad/namespaces)**
|
|
(**Enterprise Only**) - This feature allows for a cluster to be shared by
|
|
multiple teams within a company. Using this logical separation is important
|
|
for multi-tenant clusters to prevent users without access to that namespace
|
|
from conflicting with each other. This requires ACLs to be enabled in order
|
|
to be enforced.
|
|
|
|
- **[Resource Quotas](https://learn.hashicorp.com/tutorials/nomad/quotas)**
|
|
(**Enterprise Only**) - Can limit a namespace's access to the underlying
|
|
compute resources in the cluster by setting upper-limits for operators.
|
|
Access to these resource quotas can be managed via ACLs to ensure read-only
|
|
access for operators so they can't just change their quotas.
|
|
|
|
#### Recommendations
|
|
|
|
The following are security recommendations that can help significantly improve
|
|
the security of your cluster depending on your use case. We recommend always
|
|
practicing defense in depth when architecting the security mechanisms for your
|
|
environment.
|
|
|
|
- **Rotate credentials** - Using short-lived credentials or rotating them
|
|
frequently is highly recommended to reduce damage of accidentally leaked
|
|
credentials.
|
|
|
|
- Use [Vault](/docs/integrations/vault-integration) to create and manage
|
|
dynamic, rotated credentials prevent secrets from being easily exposed
|
|
within the [job specification](/docs/job-specification) itself
|
|
which may be leaked into version control or otherwise be accidentally stored
|
|
on disk on an operator's local machine.
|
|
|
|
- Rotate credentials used by the Nomad agent; e.g. [integrate with Vault's
|
|
PKI secret engine](https://learn.hashicorp.com/tutorials/nomad/vault-pki-nomad) to
|
|
automatically generate and renew dynamic, unique X.509 certificates for each
|
|
Nomad node with a short [TTL](https://en.wikipedia.org/wiki/Time_to_live).
|
|
|
|
- **[Running without Root](https://groups.google.com/forum/#!topic/nomad-tool/pSyMwC_FSFA)** -
|
|
Nomad servers can be run as unprivileged users that only require access to
|
|
the data directory.
|
|
|
|
- **Containers with Sandbox Runtimes** - In some situations, such as running
|
|
untrusted code as a service, it may be worth considering using different
|
|
container runtimes such as [gVisor](https://gvisor.dev/) or [Kata
|
|
Containers](https://katacontainers.io/). These types of runtimes provide
|
|
sandboxing features which help prevent raw access to the underlying shared
|
|
kernel for other containers and the Nomad client agent itself. Docker driver
|
|
allows [customizing runtimes](/docs/drivers/docker#runtime).
|
|
|
|
- **[Disable Unused Drivers](/docs/configuration/client#driver-denylist)** -
|
|
Each driver provides different degrees of isolation, and bugs may allow
|
|
unintended privilege escalation. If a task driver is not needed, you can
|
|
disable it to reduce risk.
|
|
|
|
- **Linux Security Modules** - Use of security modules that can be directly
|
|
integrated into operating systems such as AppArmor, SElinux, and Seccomp on
|
|
both the Nomad hosts and applied to containers for an extra layer of
|
|
security. Seccomp profiles are able to be passed directly to containers
|
|
using the
|
|
**[`security_opt`](/docs/drivers/docker#security_opt)**
|
|
parameter available in the default [Docker
|
|
driver](/docs/drivers/docker).
|
|
|
|
- **[Service Mesh](https://www.hashicorp.com/resources/service-mesh-microservices-networking)** -
|
|
Integrating service mesh technologies such as
|
|
**[Consul](https://www.consul.io/)** can be extremely useful for limiting
|
|
and efficiently load balancing network connectivity within a cluster.
|
|
|
|
- **[TLS Settings](/docs/configuration/tls)** -
|
|
TLS settings, such as the available [cipher suites](/docs/configuration/tls#tls_cipher_suites), should be tuned to fit the needs of your environment.
|
|
|
|
- **[HTTP Headers](/docs/configuration#http_api_response_headers)** -
|
|
Additional security [headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers), such as [`X-XSS-Protection`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-XSS-Protection), can be [configured](/docs/configuration#http_api_response_headers) for HTTP API responses.
|
|
|
|
### Threat Model
|
|
|
|
The following are parts of the Nomad threat model:
|
|
|
|
- **Nomad agent-to-agent communication** - Transport encryption for
|
|
agent-to-agent communication is required to prevent eavesdropping. TCP and
|
|
UDP based protocols within Nomad provide different mechanisms for enabling
|
|
encryption including symmetric (shared gossip encryption keys) and
|
|
asymmetric keys (TLS).
|
|
|
|
- **Tampering of data in transit** - Any tampering should be detectable via mTLS
|
|
and cause Nomad to avoid processing the request.
|
|
|
|
- **Access to data without authentication or authorization** - Requests to the
|
|
server should be authenticated and authorized using mTLS and ACLs
|
|
respectively.
|
|
|
|
- **State modification or corruption due to malicious messages** - Improperly
|
|
formatted messages are discarded while properly formatted messages require
|
|
authentication and authorization.
|
|
|
|
- **Non-server members accessing raw data** - All servers that join the cluster
|
|
require proper authentication and authorization in order to begin
|
|
participating in Raft. All data in Raft should be encrypted with TLS.
|
|
|
|
- **Denial of Service against a node** - DoS attacks against a single node
|
|
should not compromise the security posture of Nomad.
|
|
|
|
The following are not part of the threat model for server agents:
|
|
|
|
- **Access (read or write) to the Nomad data directory** - Information about the
|
|
jobs managed by Nomad is persisted to a server's data directory.
|
|
|
|
- **Access (read or write) to the Nomad configuration directory** - Access to
|
|
Nomad's configuration file(s) directory can enable and disable features for
|
|
a cluster.
|
|
|
|
- **Memory access to a running Nomad server agent** - Direct access to the
|
|
memory of the Nomad server agent process (usually requiring a shell on the
|
|
system through various means) results in almost all aspects of the agent
|
|
being compromised including access to certificates and other secrets.
|
|
|
|
The following are not part of the threat model for client agents:
|
|
|
|
- **Access (read or write) to the Nomad data directory** - Information about the
|
|
allocations scheduled to a Nomad client is persisted to its data directory.
|
|
This would include any secrets in any of the allocation's file systems.
|
|
|
|
- **Access (read or write) to the Nomad configuration directory** - Access to a
|
|
client's configuration file can enable and disable features for a client
|
|
including insecure drivers such as
|
|
[`raw_exec`](/docs/drivers/raw_exec).
|
|
|
|
- **Memory access to a running Nomad client agent** - Direct access to the
|
|
memory of the Nomad client agent process allows an attack to extract secrets
|
|
from clients such as Vault tokens.
|
|
|
|
- **Lax Client Driver Sandbox** - Drivers may allow some privileged operations,
|
|
e.g. filesystem access to configuration directories, or raw accesses to host
|
|
devices. Such privileges can be used to facilitate compromise other workloads,
|
|
or cause denial-of-service attacks.
|
|
|
|
#### Internal Threats
|
|
|
|
- **Job Operator** - Someone with a valid mTLS certificate and ACL token may still be a
|
|
threat to your cluster in certain situations, especially in multi-team
|
|
cluster deployments. They may accidentally or intentionally use a malicious
|
|
job to harm a cluster which can help be protected against using
|
|
Quotas, Namespace, and Sentinel policies.
|
|
|
|
- **Workload** - Workloads may have host network access within a cluster which
|
|
can lead to SSRF due to application security issues outside of the scope of
|
|
Nomad which may lead to internal access within the cluster. Using mTLS, ACLs
|
|
and Sentinel policies together can add layers of protection against
|
|
malicious workloads.
|
|
|
|
- **RPC / API Access** - RPC and HTTP API endpoints without mTLS can expose
|
|
clusters to abuse within the cluster from malicious workloads.
|
|
|
|
- **Client driver** - Drivers implement various workload types for a cluster,
|
|
and the backend configuration of these drivers should be considered to
|
|
implement defense in depth. For example, a custom Docker driver that limits
|
|
the ability to mount the host file system may be subverted by network access
|
|
to an exposed Docker daemon API through other means such as the [`raw_exec`](/docs/drivers/raw_exec)
|
|
driver.
|
|
|
|
#### External Threats
|
|
|
|
There are two main components to consider to for external threats in a Nomad cluster:
|
|
|
|
- **Server agent** - Internal cluster leader elections and replication is
|
|
managed via Raft between server agents encrypted in transit. However,
|
|
information about the server is stored unencrypted at rest in the agent's
|
|
data directory. This may contain sensitive information such as ACL tokens
|
|
and TLS certificates.
|
|
|
|
- **Client agent** - Client-to-server communication within a cluster is
|
|
encrypted and authenticated using mTLS. Information about the allocations on
|
|
a client node is unencrypted in the agent's data and configuration
|
|
directory.
|
|
|
|
### Network Ports
|
|
|
|
| **Port / Protocol** | Agents | Description |
|
|
| -------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
|
| **4646** / TCP | All | [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) to provide [UI](https://learn.hashicorp.com/tutorials/nomad/web-ui-access) and [API](/api-docs) access to agents. |
|
|
| **4647** / TCP | Servers | [RPC](https://en.wikipedia.org/wiki/Remote_procedure_call) protocol used by agents. |
|
|
| **4648** / TCP + UDP | Servers | [gossip](/docs/internals/gossip) protocol to manage server membership using [Serf](https://www.serf.io/). |
|