323 lines
16 KiB
Plaintext
323 lines
16 KiB
Plaintext
|
---
|
|||
|
layout: docs
|
|||
|
page_title: Security Model
|
|||
|
sidebar_title: Security Model
|
|||
|
description: >-
|
|||
|
Nomad relies on both a lightweight gossip mechanism and an RPC system to
|
|||
|
provide various features. Both of the systems have different security
|
|||
|
mechanisms that stem from their designs. However, the security mechanisms of
|
|||
|
Nomad have a common goal: to provide confidentiality, integrity, and
|
|||
|
authentication.
|
|||
|
---
|
|||
|
|
|||
|
## Overview
|
|||
|
|
|||
|
Nomad is a flexible workload orchestrator to deploy and manage any containerized
|
|||
|
or legacy application using a single, unified workflow. It can run diverse
|
|||
|
workloads including Docker, non-containerized, microservice, and batch
|
|||
|
applications.
|
|||
|
|
|||
|
Nomad utilizes a lightweight gossip and RPC system, [similar to
|
|||
|
Consul](https://www.consul.io/docs/internals/security.html), which provides
|
|||
|
various essential features. Both of these systems provide security mechanisms
|
|||
|
which should be utilized to help provide [confidentiality, integrity and
|
|||
|
authentication](https://en.wikipedia.org/wiki/Information_security).
|
|||
|
|
|||
|
Using defense in depth is crucial for cluster security, and deployment
|
|||
|
requirements may differ drastically depending on your use case. Further security
|
|||
|
features for multi-tenant deployments are offered exclusively in the enterprise
|
|||
|
version. This documentation may need to be adapted to your deployment situation,
|
|||
|
but the general mechanisms for a secure Nomad deployment revolve around:
|
|||
|
|
|||
|
* **[mTLS](/guides/security/securing-nomad.html)** -
|
|||
|
Mutual authorization of both the TLS server and client x509 certificates
|
|||
|
prevents internal abuse by preventing unauthorized access to network
|
|||
|
components within the cluster.
|
|||
|
|
|||
|
* **[ACLs](/guides/security/acl.html)** - Allow for
|
|||
|
roles to be applied to authorized connections by granting capabilities for a
|
|||
|
token.
|
|||
|
|
|||
|
* **[Namespaces](/docs/enterprise/index.html#namespaces)**
|
|||
|
(**Enterprise Only**) - Access to read and write to a Namepsace can be
|
|||
|
controlled to allow for granular access to job information managed within a
|
|||
|
multi-tenant cluster.
|
|||
|
|
|||
|
* **[Sentinel Policies](/docs/enterprise/index.html#sentinel-policies)**
|
|||
|
(**Enterprise Only**) - Sentinel policies allow for granular control over
|
|||
|
components such as task drivers within a cluster.
|
|||
|
|
|||
|
### Personas
|
|||
|
|
|||
|
When thinking about Nomad, it helps to consider the following types of base
|
|||
|
personas when managing the security requirements for the cluster deployment. The
|
|||
|
granularity may change depending on your team’s use case where rigorous roles
|
|||
|
can be accurately defined and managed using the [Nomad backend secret engine for
|
|||
|
Vault](https://www.vaultproject.io/docs/secrets/nomad/index.html). This is
|
|||
|
described further with getting started steps using a development server
|
|||
|
[here](/guides/security/acl.html#vault-integration).
|
|||
|
|
|||
|
It’s super important to note that there's no traditional concept of a user
|
|||
|
within Nomad itself.
|
|||
|
|
|||
|
* **System Administrator** - This is someone who has access to the underlying
|
|||
|
infrastructure to a Nomad cluster. Often she has access to SSH or RDP
|
|||
|
directly into a server within a cluster through a bastion host. Ultimately
|
|||
|
they have read, write and execute permissions for the actual Nomad binary.
|
|||
|
This binary is the same for server and client nodes using different
|
|||
|
configuration files. These users potentially have something like sudo,
|
|||
|
administrative, or some other super-user access to the underlying compute
|
|||
|
resource. Users like these are essentially totally trusted by Nomad as they
|
|||
|
have administrative rights to the system and can start or stop the agent.
|
|||
|
|
|||
|
* **Nomad Administrator** - This is someone ( probably the same **System
|
|||
|
Administrator** ) who has access to define the Nomad agent configurations
|
|||
|
for servers and clients. They also have total rights to all of the parts in
|
|||
|
the Nomad system including the ability to start and stop all jobs within a
|
|||
|
cluster.
|
|||
|
|
|||
|
* **Nomad Operator** - This is someone who likely has selective access with
|
|||
|
restricted capabilities to manage jobs applicable to their namespace within
|
|||
|
a cluster.
|
|||
|
|
|||
|
* **User** - This is someone who is a user of an application being run on the
|
|||
|
system. In some cases applications may be public facing and exposed to the
|
|||
|
internet such as a web server. This is someone who shouldn’t have any
|
|||
|
network access to the Nomad server API.
|
|||
|
|
|||
|
### Secure Configuration
|
|||
|
|
|||
|
Nomad’s security model is applicable only if all parts of the system are running
|
|||
|
with a secure configuration; it is not secure-by-default. Without the following
|
|||
|
mechanisms enabled in Nomad’s configuration, it may be possible to abuse access
|
|||
|
to a cluster. Like all security considerations, one must appropriately determine
|
|||
|
what concerns they have for their environment and adapt to these security
|
|||
|
recommendations accordingly.
|
|||
|
|
|||
|
#### Requirements
|
|||
|
|
|||
|
* **[mTLS enabled](/guides/security/securing-nomad.html)**
|
|||
|
- Mutual TLS ( mTLS ) enables [mutual
|
|||
|
authentication](https://en.wikipedia.org/wiki/Mutual_authentication) with
|
|||
|
security properties to prevent the following problems:
|
|||
|
|
|||
|
* Unauthorized access because both server and clients must provide valid TLS
|
|||
|
[X.509](https://en.wikipedia.org/wiki/X.509) certificates signed by the same
|
|||
|
valid [CA](https://en.wikipedia.org/wiki/Certificate_authority) in order to
|
|||
|
communicate within the cluster.
|
|||
|
|
|||
|
* Observing or tampering communication between nodes is thwarted due to the
|
|||
|
traffic being encrypted using the well known network security protocol
|
|||
|
[TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) version 1.2,
|
|||
|
with a [configurable minimal
|
|||
|
version](/docs/configuration/tls.html#tls_min_version).
|
|||
|
Both server and client agents must be configured to validate each other's
|
|||
|
certificates to ensure mTLS is actually enabled. This requires appropriate
|
|||
|
certificates to be distributed to servers, clients, machines, or operators
|
|||
|
for things like CLI usage. It is recommended to use
|
|||
|
[Vault](/guides/security/vault-pki-integration.html)
|
|||
|
to securely manage the certificate creation and rotation for nodes.
|
|||
|
|
|||
|
* Agent role misconfiguration is prevented using the X.509
|
|||
|
[SAN](https://en.wikipedia.org/wiki/Subject_Alternative_Name) extension.
|
|||
|
This is essentially a domain name that is used to identify and verify a
|
|||
|
node’s region and role name are configured as expected ( e.g.
|
|||
|
`client.us-east.nomad` ).
|
|||
|
|
|||
|
* Using the previously mentioned role name prevents maliciously masquerading
|
|||
|
as a server or client node, and allows other services to be signed easily by
|
|||
|
the same CA. This also avoids any potential pitfalls with certificates using
|
|||
|
the IP or Hostname of nodes within a cluster.
|
|||
|
|
|||
|
* **[ACLs enabled](/guides/security/acl.html)** - The
|
|||
|
access control list (ACL) system provides a capability-based control
|
|||
|
mechanism for Nomad administrators allowing for custom roles ( typically
|
|||
|
within Vault ) to be tied to an individual human or machine operator
|
|||
|
identity. This allows for access to capabilities within the cluster to be
|
|||
|
restricted to specific users.
|
|||
|
|
|||
|
* **[Sentinel Policies](/guides/governance-and-policy/sentinel/sentinel-policy.html)**
|
|||
|
(**Enterprise Only**) - [Sentinel](https://www.hashicorp.com/sentinel/) is
|
|||
|
a feature which enables
|
|||
|
[policy-as-code](https://docs.hashicorp.com/sentinel/concepts/policy-as-code/)
|
|||
|
to enforce further restrictions on operators. This is used to augment the
|
|||
|
built-in ACL system for fine-grained control over jobs.
|
|||
|
|
|||
|
* **[Namespaces](/guides/governance-and-policy/namespaces.html)**
|
|||
|
(**Enterprise Only**) - This feature allows for a cluster to be shared by
|
|||
|
multiple teams within a company. Using this logical separation is important
|
|||
|
for multi-tenant clusters to prevent users without access to that namespace
|
|||
|
from conflicting with each other. This requires ACLs to be enabled in order
|
|||
|
to be enforced.
|
|||
|
|
|||
|
* **[Resource Quotas](/guides/governance-and-policy/quotas.html)**
|
|||
|
(**Enterprise Only**) - Can limit a namespace’s access to the underlying
|
|||
|
compute resources in the cluster by setting upper-limits for operators.
|
|||
|
Access to these resource quotas can be managed via ACLs to ensure read-only
|
|||
|
access for operators so they can’t just change their quotas.
|
|||
|
|
|||
|
#### Recommendations
|
|||
|
|
|||
|
The following are security recommendations that can help significantly improve
|
|||
|
the security of your cluster depending on your use case. We recommend always
|
|||
|
practicing defense in depth when architecting the security mechanisms for your
|
|||
|
environment.
|
|||
|
|
|||
|
* **[Rotate Credentials](/docs/job-specification/vault.html)** -
|
|||
|
Using something like [Vault](/docs/vault-integration/index.html) to
|
|||
|
create and manage dynamic, rotated credentials is highly recommended to
|
|||
|
prevent secrets from being easily exposed within the [job
|
|||
|
specification](/docs/job-specification/index.html)
|
|||
|
itself which may be leaked into version control or otherwise be accidently
|
|||
|
stored on disk on an operator’s local machine. It is also possible to
|
|||
|
[integrate with Vault’s PKI secret engine](/guides/security/vault-pki-integration.html)
|
|||
|
to automatically generate and renew dynamic, unique X.509 certificates for
|
|||
|
each Nomad node with a short
|
|||
|
[TTL](https://en.wikipedia.org/wiki/Time_to_live).
|
|||
|
|
|||
|
* **[Running without Root](https://groups.google.com/forum/#!topic/nomad-tool/pSyMwC_FSFA)** -
|
|||
|
Certain features of Nomad can be used without needing to run the Nomad agent
|
|||
|
server or client as the `root` user. Instead you can granularly assign the
|
|||
|
appropriate capabilities in various ways for your Nomad agents. For example:
|
|||
|
Nomad servers only require access to the data directory; it is possible to
|
|||
|
use Nomad to orchestrate Docker containers by adding a non-root `nomad` user
|
|||
|
to the `docker` group to access the [default unix
|
|||
|
socket](https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-socket-option).
|
|||
|
|
|||
|
* **Containers with Sandbox Runtimes** - In some situations, such as running
|
|||
|
untrusted code as a service, it may be worth considering using different
|
|||
|
container runtimes such as [gVisor](https://gvisor.dev/) or [Kata
|
|||
|
Containers](https://katacontainers.io/). These types of runtimes provide
|
|||
|
sandboxing features which help prevent raw access to the underlying shared
|
|||
|
kernel for other containers and the Nomad client agent itself.
|
|||
|
|
|||
|
* **[Disable Unused Drivers](/docs/configuration/client#driver-blacklist)** -
|
|||
|
Each driver provides different degrees of isolation, and bugs may allow
|
|||
|
unintended privilege escalation. If a task driver is not needed, you can
|
|||
|
disable it to reduce risk.
|
|||
|
|
|||
|
* **Linux Security Modules** - Use of security modules that can be directly
|
|||
|
integrated into operating systems such as AppArmor, SElinux, and Seccomp on
|
|||
|
both the Nomad hosts and applied to containers for an extra layer of
|
|||
|
security. Seccomp profiles are able to be passed directly to containers
|
|||
|
using the
|
|||
|
**[security_opt](/docs/drivers/docker.html#security_opt)**
|
|||
|
parameter available in the default [Docker
|
|||
|
driver](/docs/drivers/docker.html).
|
|||
|
|
|||
|
* **[Service Mesh](https://www.hashicorp.com/resources/service-mesh-microservices-networking)** -
|
|||
|
Integrating service mesh technologies such as
|
|||
|
**[Consul](https://www.consul.io/)** can be extremely useful for limiting
|
|||
|
and efficiently load balancing network connectivity within a cluster.
|
|||
|
|
|||
|
### Threat Model
|
|||
|
|
|||
|
The following are parts of the Nomad threat model:
|
|||
|
|
|||
|
* **Nomad agent-to-agent communication** - Transport encryption for
|
|||
|
agent-to-agent communication is required to prevent eavesdropping. TCP and
|
|||
|
UDP based protocols within Nomad provide different mechanisms for enabling
|
|||
|
encryption including symmetric (shared gossip encryption keys) and
|
|||
|
asymmetric keys (TLS).
|
|||
|
|
|||
|
* **Tampering of data in transit** - Any tampering should be detectable via mTLS
|
|||
|
and cause Nomad to avoid processing the request.
|
|||
|
|
|||
|
* **Access to data without authentication or authorization** - Requests to the
|
|||
|
server should be authenticated and authorized using mTLS and ACLs
|
|||
|
respectively.
|
|||
|
|
|||
|
* **State modification or corruption due to malicious messages** - Improperly
|
|||
|
formatted messages are discarded while properly formatted messages require
|
|||
|
authentication and authorization.
|
|||
|
|
|||
|
* **Non-server members accessing raw data** - All servers that join the cluster
|
|||
|
require proper authentication and authorization in order to begin
|
|||
|
participating in Raft. All data in Raft should be encrypted with TLS.
|
|||
|
|
|||
|
* **Denial of Service against a node** - DoS attacks against a single node
|
|||
|
should not compromise the security posture of Nomad.
|
|||
|
|
|||
|
The following are not part of the threat model for server agents:
|
|||
|
|
|||
|
* **Access (read or write) to the Nomad data directory** - Information about the
|
|||
|
jobs managed by Nomad is persisted to a server’s data directory.
|
|||
|
|
|||
|
* **Access (read or write) to the Nomad configuration directory** - Access to
|
|||
|
Nomad’s configuration file(s) directory can enable and disable features for
|
|||
|
a cluster.
|
|||
|
|
|||
|
* **Memory access to a running Nomad server agent** - Direct access to the
|
|||
|
memory of the Nomad server agent process ( usually requiring a shell on the
|
|||
|
system through various means ) results in almost all aspects of the agent
|
|||
|
being compromised including access to certificates and other secrets.
|
|||
|
|
|||
|
The following are not part of the threat model for client agents:
|
|||
|
|
|||
|
* **Access (read or write) to the Nomad data directory** - Information about the
|
|||
|
allocations scheduled to a Nomad client is persisted to its data directory.
|
|||
|
This would include any secrets in any of the allocation’s file systems.
|
|||
|
|
|||
|
* **Access (read or write) to the Nomad configuration directory** - Access to a
|
|||
|
client’s configuration file can enable and disable features for a client
|
|||
|
including insecure drivers such as
|
|||
|
[raw_exec](/docs/drivers/raw_exec.html).
|
|||
|
|
|||
|
* **Memory access to a running Nomad client agent** - Direct access to the
|
|||
|
memory of the Nomad client agent process allows an attack to extract secrets
|
|||
|
from clients such as Vault tokens.
|
|||
|
|
|||
|
* **Lax Client Driver Sandbox** - Drivers may allow some privileged operations,
|
|||
|
e.g. filesystem access to configuration directories, or raw accesses to host
|
|||
|
devices. Such privileges can be used to facilitate compromise other workloads,
|
|||
|
or cause denial-of-service attacks.
|
|||
|
|
|||
|
#### Internal Threats
|
|||
|
|
|||
|
* **Operator** - Someone with a valid mTLS cert and ACL token may still be a
|
|||
|
threat to your cluster in certain situations, especially in multi-team
|
|||
|
cluster deployments. They may accidentally or intentionally use a malicious
|
|||
|
jobspec to harm a cluster which can help be protected against using
|
|||
|
Namespaces and Sentinel policies.
|
|||
|
|
|||
|
* **Workload** - Workloads may have host network access within a cluster which
|
|||
|
can lead to SSRF due to application security issues outside of the scope of
|
|||
|
Nomad which may lead to internal access within the cluster. Using mTLS, ACLs
|
|||
|
and Sentinel policies together can add layers of protection against
|
|||
|
malicious workloads.
|
|||
|
|
|||
|
* **RPC / API Access** - RPC and HTTP API endpoints without mTLS can expose
|
|||
|
clusters to abuse within the cluster from malicious workloads.
|
|||
|
|
|||
|
* **Client driver** - Drivers implement various workload types for a cluster,
|
|||
|
and the backend configuration of these drivers should be considered to
|
|||
|
implement defense in depth. For example, a custom Docker driver that limits
|
|||
|
the ability to mount the host file system may be subverted by network access
|
|||
|
to an exposed Docker daemon API through other means such as the raw_exec
|
|||
|
driver.
|
|||
|
|
|||
|
|
|||
|
#### External Threats
|
|||
|
|
|||
|
There are two main components to consider to for external threats in a Nomad cluster:
|
|||
|
|
|||
|
* **Server agent** - Internal cluster leader elections and replication is
|
|||
|
managed via Raft between server agents encrypted in transit. However,
|
|||
|
information about the server is stored unencrypted at rest in the agent’s
|
|||
|
data directory. This information may contain information such as ACL tokens
|
|||
|
and TLS certificates.
|
|||
|
|
|||
|
* **Client agent** - Client-to-server communication within a cluster is
|
|||
|
encrypted and authenticated using mTLS. Information about the allocations on
|
|||
|
a client node is unencrypted in the agent’s data and configuration
|
|||
|
directory.
|
|||
|
|
|||
|
### Network Ports
|
|||
|
|
|||
|
|
|||
|
| **Port / Protocol** | Agents | Description |
|
|||
|
|----------------------|---------|-------------|
|
|||
|
| **4646** / TCP | All | [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) to provide [UI](/guides/web-ui/access.html) and [API](/api/index.html) access to agents. |
|
|||
|
| **4647** / TCP | Servers | [RPC](https://en.wikipedia.org/wiki/Remote_procedure_call) protocol used by agents. |
|
|||
|
| **4648** / TCP + UDP | Servers | [gossip](/docs/internals/gossip.html) protocol to manage server membership using [Serf](https://www.serf.io/). |
|