From 0e4346986daa9b6f40ac9f212c312d508090463f Mon Sep 17 00:00:00 2001 From: Mahmood Ali Date: Mon, 6 Jul 2020 14:54:49 -0400 Subject: [PATCH] Add security model document --- website/data/docs-navigation.js | 3 +- website/pages/docs/internals/security.mdx | 322 ++++++++++++++++++++++ 2 files changed, 324 insertions(+), 1 deletion(-) create mode 100644 website/pages/docs/internals/security.mdx diff --git a/website/data/docs-navigation.js b/website/data/docs-navigation.js index 88a0fdd6b..d3a5d9b8a 100644 --- a/website/data/docs-navigation.js +++ b/website/data/docs-navigation.js @@ -38,7 +38,8 @@ export default [ content: ['scheduling', 'preemption'] }, 'consensus', - 'gossip' + 'gossip', + 'security' ] }, { diff --git a/website/pages/docs/internals/security.mdx b/website/pages/docs/internals/security.mdx new file mode 100644 index 000000000..a24b145c0 --- /dev/null +++ b/website/pages/docs/internals/security.mdx @@ -0,0 +1,322 @@ +--- +layout: docs +page_title: Security Model +sidebar_title: Security Model +description: >- + Nomad relies on both a lightweight gossip mechanism and an RPC system to + provide various features. Both of the systems have different security + mechanisms that stem from their designs. However, the security mechanisms of + Nomad have a common goal: to provide confidentiality, integrity, and + authentication. +--- + +## Overview + +Nomad is a flexible workload orchestrator to deploy and manage any containerized +or legacy application using a single, unified workflow. It can run diverse +workloads including Docker, non-containerized, microservice, and batch +applications. + +Nomad utilizes a lightweight gossip and RPC system, [similar to +Consul](https://www.consul.io/docs/internals/security.html), which provides +various essential features. Both of these systems provide security mechanisms +which should be utilized to help provide [confidentiality, integrity and +authentication](https://en.wikipedia.org/wiki/Information_security). 
+ +Using defense in depth is crucial for cluster security, and deployment +requirements may differ drastically depending on your use case. Further security +features for multi-tenant deployments are offered exclusively in the enterprise +version. This documentation may need to be adapted to your deployment situation, +but the general mechanisms for a secure Nomad deployment revolve around: + +* **[mTLS](/guides/security/securing-nomad.html)** - + Mutual authentication of both the TLS server and client X.509 certificates + prevents internal abuse by blocking unauthorized access to network + components within the cluster. + +* **[ACLs](/guides/security/acl.html)** - Allow for + roles to be applied to authorized connections by granting capabilities for a + token. + +* **[Namespaces](/docs/enterprise/index.html#namespaces)** + (**Enterprise Only**) - Access to read and write to a namespace can be + controlled to allow for granular access to job information managed within a + multi-tenant cluster. + +* **[Sentinel Policies](/docs/enterprise/index.html#sentinel-policies)** + (**Enterprise Only**) - Sentinel policies allow for granular control over + components such as task drivers within a cluster. + +### Personas + +When thinking about Nomad, it helps to consider the following types of base +personas when managing the security requirements for the cluster deployment. The +granularity may change depending on your team’s use case where rigorous roles +can be accurately defined and managed using the [Nomad backend secret engine for +Vault](https://www.vaultproject.io/docs/secrets/nomad/index.html). This is +described further with getting started steps using a development server +[here](/guides/security/acl.html#vault-integration). + +It is important to note that there is no traditional concept of a user +within Nomad itself. + +* **System Administrator** - This is someone who has access to the underlying + infrastructure of a Nomad cluster. 
Often they can SSH or RDP + directly into a server within a cluster through a bastion host. Ultimately + they have read, write and execute permissions for the actual Nomad binary. + The same binary is used for server and client nodes, with different + configuration files. These users potentially have something like sudo, + administrative, or some other super-user access to the underlying compute + resource. Users like these are essentially fully trusted by Nomad as they + have administrative rights to the system and can start or stop the agent. + +* **Nomad Administrator** - This is someone (probably the same **System + Administrator**) who has access to define the Nomad agent configurations + for servers and clients. They also have total rights to all of the parts in + the Nomad system including the ability to start and stop all jobs within a + cluster. + +* **Nomad Operator** - This is someone who likely has selective access with + restricted capabilities to manage jobs applicable to their namespace within + a cluster. + +* **User** - This is someone who is a user of an application being run on the + system. In some cases applications, such as a web server, may be public + facing and exposed to the internet. This is someone who shouldn’t have any + network access to the Nomad server API. + +### Secure Configuration + +Nomad’s security model is applicable only if all parts of the system are running +with a secure configuration; it is not secure-by-default. Without the following +mechanisms enabled in Nomad’s configuration, it may be possible to abuse access +to a cluster. As with all security considerations, you must determine +what concerns you have for your environment and adapt these security +recommendations accordingly. 
+ +#### Requirements + +* **[mTLS enabled](/guides/security/securing-nomad.html)** + - Mutual TLS (mTLS) enables [mutual + authentication](https://en.wikipedia.org/wiki/Mutual_authentication) with + security properties to prevent the following problems: + + * Unauthorized access is prevented because both servers and clients must provide valid TLS + [X.509](https://en.wikipedia.org/wiki/X.509) certificates signed by the same + valid [CA](https://en.wikipedia.org/wiki/Certificate_authority) in order to + communicate within the cluster. + + * Observing or tampering with communication between nodes is thwarted because + traffic is encrypted using the well known network security protocol + [TLS](https://en.wikipedia.org/wiki/Transport_Layer_Security) version 1.2, + with a [configurable minimal + version](/docs/configuration/tls.html#tls_min_version). + Both server and client agents must be configured to validate each other's + certificates to ensure mTLS is actually enabled. This requires appropriate + certificates to be distributed to servers, clients, machines, or operators + for things like CLI usage. It is recommended to use + [Vault](/guides/security/vault-pki-integration.html) + to securely manage the certificate creation and rotation for nodes. + + * Agent role misconfiguration is prevented using the X.509 + [SAN](https://en.wikipedia.org/wiki/Subject_Alternative_Name) extension. + This is essentially a domain name that is used to identify and verify that a + node’s region and role name are configured as expected (e.g. + `client.us-east.nomad`). + + * Using the previously mentioned role name prevents maliciously masquerading + as a server or client node, and allows other services to be signed easily by + the same CA. This also avoids any potential pitfalls with certificates using + the IP or hostname of nodes within a cluster. 
+ +* **[ACLs enabled](/guides/security/acl.html)** - The + access control list (ACL) system provides a capability-based control + mechanism for Nomad administrators allowing for custom roles (typically + within Vault) to be tied to an individual human or machine operator + identity. This allows for access to capabilities within the cluster to be + restricted to specific users. + +* **[Sentinel Policies](/guides/governance-and-policy/sentinel/sentinel-policy.html)** + (**Enterprise Only**) - [Sentinel](https://www.hashicorp.com/sentinel/) is + a feature which enables + [policy-as-code](https://docs.hashicorp.com/sentinel/concepts/policy-as-code/) + to enforce further restrictions on operators. This is used to augment the + built-in ACL system for fine-grained control over jobs. + +* **[Namespaces](/guides/governance-and-policy/namespaces.html)** + (**Enterprise Only**) - This feature allows for a cluster to be shared by + multiple teams within a company. Using this logical separation is important + for multi-tenant clusters to prevent users without access to that namespace + from conflicting with each other. This requires ACLs to be enabled in order + to be enforced. + +* **[Resource Quotas](/guides/governance-and-policy/quotas.html)** + (**Enterprise Only**) - Can limit a namespace’s access to the underlying + compute resources in the cluster by setting upper limits for operators. + Access to these resource quotas can be managed via ACLs to ensure read-only + access for operators so they can’t change their own quotas. + +#### Recommendations + +The following are security recommendations that can help significantly improve +the security of your cluster depending on your use case. We recommend always +practicing defense in depth when architecting the security mechanisms for your +environment. 
+ +* **[Rotate Credentials](/docs/job-specification/vault.html)** - + Using something like [Vault](/docs/vault-integration/index.html) to + create and manage dynamic, rotated credentials is highly recommended to + prevent secrets from being easily exposed within the [job + specification](/docs/job-specification/index.html) + itself which may be leaked into version control or otherwise be accidentally + stored on disk on an operator’s local machine. It is also possible to + [integrate with Vault’s PKI secret engine](/guides/security/vault-pki-integration.html) + to automatically generate and renew dynamic, unique X.509 certificates for + each Nomad node with a short + [TTL](https://en.wikipedia.org/wiki/Time_to_live). + +* **[Running without Root](https://groups.google.com/forum/#!topic/nomad-tool/pSyMwC_FSFA)** - + Certain features of Nomad can be used without needing to run the Nomad agent + server or client as the `root` user. Instead you can granularly assign the + appropriate capabilities in various ways for your Nomad agents. For example, + Nomad servers only require access to the data directory, and it is possible to + use Nomad to orchestrate Docker containers by adding a non-root `nomad` user + to the `docker` group to access the [default Unix + socket](https://docs.docker.com/engine/reference/commandline/dockerd/#daemon-socket-option). + +* **Containers with Sandbox Runtimes** - In some situations, such as running + untrusted code as a service, it may be worth considering using different + container runtimes such as [gVisor](https://gvisor.dev/) or [Kata + Containers](https://katacontainers.io/). These types of runtimes provide + sandboxing features which help prevent raw access to the underlying shared + kernel for other containers and the Nomad client agent itself. + +* **[Disable Unused Drivers](/docs/configuration/client#driver-blacklist)** - + Each driver provides different degrees of isolation, and bugs may allow + unintended privilege escalation. 
If a task driver is not needed, you can + disable it to reduce risk. + +* **Linux Security Modules** - Use security modules that can be directly + integrated into operating systems, such as AppArmor, SELinux, and Seccomp, on + both the Nomad hosts and containers for an extra layer of + security. Seccomp profiles can be passed directly to containers + using the + **[security_opt](/docs/drivers/docker.html#security_opt)** + parameter available in the default [Docker + driver](/docs/drivers/docker.html). + +* **[Service Mesh](https://www.hashicorp.com/resources/service-mesh-microservices-networking)** - + Integrating service mesh technologies such as + **[Consul](https://www.consul.io/)** can be extremely useful for limiting + and efficiently load balancing network connectivity within a cluster. + +### Threat Model + +The following are parts of the Nomad threat model: + +* **Nomad agent-to-agent communication** - Transport encryption for + agent-to-agent communication is required to prevent eavesdropping. TCP and + UDP based protocols within Nomad provide different mechanisms for enabling + encryption including symmetric (shared gossip encryption keys) and + asymmetric keys (TLS). + +* **Tampering of data in transit** - Any tampering should be detectable via mTLS + and cause Nomad to avoid processing the request. + +* **Access to data without authentication or authorization** - Requests to the + server should be authenticated and authorized using mTLS and ACLs + respectively. + +* **State modification or corruption due to malicious messages** - Improperly + formatted messages are discarded while properly formatted messages require + authentication and authorization. + +* **Non-server members accessing raw data** - All servers that join the cluster + require proper authentication and authorization in order to begin + participating in Raft. All data in Raft should be encrypted with TLS. 
+ +* **Denial of Service against a node** - DoS attacks against a single node + should not compromise the security posture of Nomad. + +The following are not part of the threat model for server agents: + +* **Access (read or write) to the Nomad data directory** - Information about the + jobs managed by Nomad is persisted to a server’s data directory. + +* **Access (read or write) to the Nomad configuration directory** - Access to + Nomad’s configuration file(s) directory can enable and disable features for + a cluster. + +* **Memory access to a running Nomad server agent** - Direct access to the + memory of the Nomad server agent process (usually requiring a shell on the + system through various means) results in almost all aspects of the agent + being compromised including access to certificates and other secrets. + +The following are not part of the threat model for client agents: + +* **Access (read or write) to the Nomad data directory** - Information about the + allocations scheduled to a Nomad client is persisted to its data directory. + This would include any secrets in any of the allocation’s file systems. + +* **Access (read or write) to the Nomad configuration directory** - Access to a + client’s configuration file can enable and disable features for a client + including insecure drivers such as + [raw_exec](/docs/drivers/raw_exec.html). + +* **Memory access to a running Nomad client agent** - Direct access to the + memory of the Nomad client agent process allows an attacker to extract secrets + from clients such as Vault tokens. + +* **Lax Client Driver Sandbox** - Drivers may allow some privileged operations, + e.g. filesystem access to configuration directories, or raw access to host + devices. Such privileges can be used to compromise other workloads, + or cause denial-of-service attacks. 
+ +#### Internal Threats + +* **Operator** - Someone with a valid mTLS cert and ACL token may still be a + threat to your cluster in certain situations, especially in multi-team + cluster deployments. They may accidentally or intentionally use a malicious + jobspec to harm a cluster, which Namespaces and Sentinel policies can help + protect against. + +* **Workload** - Workloads may have host network access within a cluster. + Application security issues outside of the scope of Nomad, such as SSRF, + may then lead to internal access within the cluster. Using mTLS, ACLs + and Sentinel policies together can add layers of protection against + malicious workloads. + +* **RPC / API Access** - RPC and HTTP API endpoints without mTLS can expose + clusters to abuse within the cluster from malicious workloads. + +* **Client driver** - Drivers implement various workload types for a cluster, + and the backend configuration of these drivers should be considered to + implement defense in depth. For example, a custom Docker driver that limits + the ability to mount the host file system may be subverted by network access + to an exposed Docker daemon API through other means such as the raw_exec + driver. + + +#### External Threats + +There are two main components to consider for external threats in a Nomad cluster: + +* **Server agent** - Internal cluster leader elections and replication are + managed via Raft between server agents encrypted in transit. However, + information about the server is stored unencrypted at rest in the agent’s + data directory. This may include data such as ACL tokens + and TLS certificates. + +* **Client agent** - Client-to-server communication within a cluster is + encrypted and authenticated using mTLS. Information about the allocations on + a client node is unencrypted in the agent’s data and configuration + directory. 
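The mTLS requirement above relies on a role name in the certificate SAN (for example `client.us-east.nomad`) to stop a node from masquerading as a different role. The following is a minimal Python sketch of that naming check, not Nomad's actual verification code. The helper names `expected_agent_name` and `san_matches` are hypothetical, and the `<role>.<region>.nomad` pattern is inferred from the single example given in this document.

```python
from typing import Iterable

# Roles named in this document; treating these as the only valid roles
# is an assumption made for the sketch.
KNOWN_ROLES = ("server", "client")


def expected_agent_name(role: str, region: str) -> str:
    """Build the role name an agent certificate is expected to carry."""
    if role not in KNOWN_ROLES:
        raise ValueError(f"unknown agent role: {role!r}")
    return f"{role}.{region}.nomad"


def san_matches(cert_dns_names: Iterable[str], role: str, region: str) -> bool:
    """Return True if any DNS SAN entry equals the expected role name.

    DNS names are compared case-insensitively.
    """
    expected = expected_agent_name(role, region).lower()
    return any(name.lower() == expected for name in cert_dns_names)
```

With this check, a certificate issued for `client.us-east.nomad` would be rejected when presented by a node claiming the `server` role in the same region, which is the masquerading case the requirement describes.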
+ +### Network Ports + + +| **Port / Protocol** | Agents | Description | +|----------------------|---------|-------------| +| **4646** / TCP | All | [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) to provide [UI](/guides/web-ui/access.html) and [API](/api/index.html) access to agents. | +| **4647** / TCP | Servers | [RPC](https://en.wikipedia.org/wiki/Remote_procedure_call) protocol used by agents. | +| **4648** / TCP + UDP | Servers | [gossip](/docs/internals/gossip.html) protocol to manage server membership using [Serf](https://www.serf.io/). |
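When auditing which of the ports above are reachable from a given network segment (for example, from where application users sit), a simple TCP probe can help. This is a minimal sketch using only the Python standard library. The function names are hypothetical, and the port numbers are the defaults from the table, which a deployment may have changed.

```python
import socket

# Default TCP ports from the table above; a deployment may override them.
NOMAD_TCP_PORTS = {
    4646: "HTTP (UI and API)",
    4647: "RPC",
    4648: "gossip (Serf)",
}


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def open_nomad_ports(host: str, timeout: float = 1.0) -> list:
    """List the default Nomad TCP ports that accept connections on host."""
    return [
        port
        for port in sorted(NOMAD_TCP_PORTS)
        if tcp_port_open(host, port, timeout)
    ]
```

This only tests TCP reachability. It says nothing about whether mTLS or ACLs are enforced on an open port, and it does not cover the UDP side of the gossip protocol on 4648.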