open-nomad/website/content/docs/configuration/server.mdx

---
layout: docs
page_title: server Block - Agent Configuration
description: |-
  The "server" block configures the Nomad agent to operate in server mode to
  participate in scheduling decisions, register with service discovery, handle
  join failures, and more.
---

# `server` Block

<Placement groups={['server']} />

The `server` block configures the Nomad agent to operate in server mode to
participate in scheduling decisions, register with service discovery, handle
join failures, and more.

```hcl
server {
  enabled          = true
  bootstrap_expect = 3
  server_join {
    retry_join = [ "1.1.1.1", "2.2.2.2" ]
    retry_max = 3
    retry_interval = "15s"
  }
}
```

## `server` Parameters

- `authoritative_region` `(string: "")` - Specifies the authoritative region, which
  provides a single source of truth for global configurations such as ACL Policies and
  global ACL tokens. Non-authoritative regions will replicate from the authoritative
  to act as a mirror. By default, the local region is assumed to be authoritative.

- `bootstrap_expect` `(int: required)` - Specifies the number of server nodes to
  wait for before bootstrapping. It is most common to use the odd-numbered
  integers `3` or `5` for this value, depending on the cluster size. A value of
  `1` does not provide any fault tolerance and is not recommended for production
  use cases.

- `data_dir` `(string: "[data_dir]/server")` - Specifies the directory to use
  for server-specific data, including the replicated log. By default, this is
  the top-level [data_dir](/nomad/docs/configuration#data_dir) suffixed with "server",
  like `"/opt/nomad/server"`. The top-level option must be set, even when
  setting this value. This must be an absolute path.

- `enabled` `(bool: false)` - Specifies if this agent should run in server mode.
  All other server options depend on this value being set.

- `enabled_schedulers` `(array<string>: [all])` - Specifies which sub-schedulers
  this server will handle. This can be used to restrict the evaluations that
  worker threads will dequeue for processing.

- `enable_event_broker` `(bool: true)` - Specifies if this server will generate
  events for its event stream.

- `encrypt` `(string: "")` - Specifies the secret key to use for encryption of
  Nomad server's gossip network traffic. This key must be 32 bytes that are
  [RFC4648] "URL and filename safe" base64-encoded. You can generate an
  appropriately-formatted key with the [`nomad operator keygen`] command. The
  provided key is automatically persisted to the data directory and loaded
  automatically whenever the agent is restarted. This means that to encrypt
  Nomad server's gossip protocol, this option only needs to be provided once
  on each agent's initial startup sequence. If it is provided after Nomad has
  been initialized with an encryption key, then the provided key is ignored
  and a warning will be displayed. See the [encryption
  documentation][encryption] for more details on this option and its impact on
  the cluster.

- `event_buffer_size` `(int: 100)` - Specifies the number of events generated
  by the server to be held in memory. Increasing this value enables new
  subscribers to have a larger look back window when initially subscribing.
  Decreasing will lower the amount of memory used for the event buffer.

- `node_gc_threshold` `(string: "24h")` - Specifies how long a node must be in a
  terminal state before it is garbage collected and purged from the system. This
  is specified using a label suffix like "30s" or "1h".

- `job_gc_interval` `(string: "5m")` - Specifies the interval between the job
  garbage collections. Only jobs who have been terminal for at least
  `job_gc_threshold` will be collected. Lowering the interval will perform more
  frequent but smaller collections. Raising the interval will perform collections
  less frequently but collect more jobs at a time. Reducing this interval is
  useful if there is a large throughput of tasks, leading to a large set of
  dead jobs. This is specified using a label suffix like "30s" or "3m". `job_gc_interval`
  was introduced in Nomad 0.10.0.

- `job_gc_threshold` `(string: "4h")` - Specifies the minimum time a job must be
  in the terminal state before it is eligible for garbage collection. This is
  specified using a label suffix like "30s" or "1h".

- `eval_gc_threshold` `(string: "1h")` - Specifies the minimum time an
  evaluation must be in the terminal state before it is eligible for garbage
  collection. This is specified using a label suffix like "30s" or "1h". Note
  that batch job evaluations are controlled via `batch_eval_gc_threshold`.

- `batch_eval_gc_threshold` `(string: "24h")` - Specifies the minimum time an
  evaluation stemming from a batch job must be in the terminal state before it is
  eligible for garbage collection. This is specified using a label suffix like
  "30s" or "1h". Note that the threshold is a necessary but insufficient condition
  for collection, and the most recent evaluation won't be garbage collected even if
  it breaches the threshold.

- `deployment_gc_threshold` `(string: "1h")` - Specifies the minimum time a
  deployment must be in the terminal state before it is eligible for garbage
  collection. This is specified using a label suffix like "30s" or "1h".

- `csi_volume_claim_gc_threshold` `(string: "1h")` - Specifies the minimum age of
  a CSI volume before it is eligible to have its claims garbage collected.
  This is specified using a label suffix like "30s" or "1h".

- `csi_plugin_gc_threshold` `(string: "1h")` - Specifies the minimum age of a
  CSI plugin before it is eligible for garbage collection if not in use.
  This is specified using a label suffix like "30s" or "1h".

- `acl_token_gc_threshold` `(string: "1h")` - Specifies the minimum age of an
  expired ACL token before it is eligible for garbage collection. This is
  specified using a label suffix like "30s" or "1h".

- `default_scheduler_config` <code>([scheduler_configuration][update-scheduler-config]:
  nil)</code> - Specifies the initial default scheduler config when
  bootstrapping cluster. The parameter is ignored once the cluster is bootstrapped or
  value is updated through the [API endpoint][update-scheduler-config]. See [the
  example section](#configuring-scheduler-config) for more details
  `default_scheduler_config` was introduced in Nomad 0.10.4.

- `heartbeat_grace` `(string: "10s")` - Specifies the additional time given
  beyond the heartbeat TTL of Clients to account for network and processing
  delays and clock skew. This is specified using a label suffix like "30s" or
  "1h". See [Client Heartbeats](#client-heartbeats) below for details.

- `min_heartbeat_ttl` `(string: "10s")` - Specifies the minimum time between
  Client heartbeats. This is used as a floor to prevent excessive updates. This
  is specified using a label suffix like "30s" or "1h". See [Client
  Heartbeats](#client-heartbeats) below for details.

- `failover_heartbeat_ttl` `(string: "5m")` - The time by which all Clients
  must heartbeat after a Server leader election. This is specified using a label
  suffix like "30s" or "1h". See [Client Heartbeats](#client-heartbeats) below
  for details.

- `max_heartbeats_per_second` `(float: 50.0)` - Specifies the maximum target
  rate of heartbeats being processed per second. This allows the TTL to be
  increased to meet the target rate. See [Client Heartbeats](#client-heartbeats)
  below for details.

- `non_voting_server` `(bool: false)` - (Enterprise-only) Specifies whether
  this server will act as a non-voting member of the cluster to help provide
  read scalability.

- `num_schedulers` `(int: [num-cores])` - Specifies the number of parallel
  scheduler threads to run. This can be as many as one per core, or `0` to
  disallow this server from making any scheduling decisions. This defaults to
  the number of CPU cores.

- `license_path` `(string: "")` - Specifies the path to load a Nomad Enterprise
  license from. This must be an absolute path (`/opt/nomad/license.hclic`). The
  license can also be set by setting `NOMAD_LICENSE_PATH` or by setting
  `NOMAD_LICENSE` as the entire license value. `license_path` has the highest
  precedence, followed by `NOMAD_LICENSE` and then `NOMAD_LICENSE_PATH`.

- `plan_rejection_tracker` <code>([PlanRejectionTracker](#plan_rejection_tracker-parameters))</code> -
  Configuration for the plan rejection tracker that the Nomad leader uses to
  track the history of plan rejections.

- `raft_boltdb` - This is a nested object that allows configuring options for
  Raft's BoltDB based log store.
    - `no_freelist_sync` - Setting this to `true` will disable syncing the BoltDB
    freelist to disk within the `raft.db` file. Not syncing the freelist to disk
    will reduce disk IO required for write operations at the expense of longer
    server startup times.

- `raft_protocol` `(int: 3)` - Specifies the Raft protocol version to use when
  communicating with other Nomad servers. This affects available Autopilot
  features and is typically not required as the agent internally knows the
  latest version, but may be useful in some upgrade scenarios. Must be `3` in
  Nomad v1.4 or later.

- `raft_multiplier` `(int: 1)` - An integer multiplier used by Nomad servers to
  scale key Raft timing parameters. Omitting this value or setting it to 0 uses
  default timing described below. Lower values are used to tighten timing and
  increase sensitivity while higher values relax timings and reduce sensitivity.
  Tuning this affects the time it takes Nomad to detect leader failures and to
  perform leader elections, at the expense of requiring more network and CPU
  resources for better performance. The maximum allowed value is 10.

  By default, Nomad will use the highest-performance timing, currently equivalent
  to setting this to a value of 1. Increasing the timings makes leader election
  less likely during periods of networking issues or resource starvation. Since
  leader elections pause Nomad's normal work, it may be beneficial for slow or
  unreliable networks to wait longer before electing a new leader. The trade-off
  when raising this value is that during network partitions or other events
  (server crash) where a leader is lost, Nomad will not elect a new leader for
  a longer period of time than the default. The [`nomad.nomad.leader.barrier` and
  `nomad.raft.leader.lastContact` metrics](/nomad/docs/operations/metrics-reference) are a good
  indicator of how often leader elections occur and Raft latency.

- `raft_snapshot_threshold` `(int: "8192")` - Specifies the minimum number of
  Raft logs to be written to disk before the node is allowed to take a snapshot.
  This reduces the frequency and impact of creating snapshots. During node
  startup, Raft restores the latest snapshot and then applies the individual
  logs to catch the node up to the last known state. This can be tuned during
  operation by a hot configuration reload.

- `raft_snapshot_interval` `(string: "120s")` - Specifies the minimum time between
  checks if Raft should perform a snapshot. The Raft library randomly staggers
  between this value and twice this value to avoid the entire cluster performing
  a snapshot at once. Nodes are eligible to snapshot once they have exceeded the
  `raft_snapshot_threshold`. This value can be tuned during operation by a hot
  configuration reload.

- `raft_trailing_logs` `(int: "10240")` - Specifies how many logs are retained
  after a snapshot. These logs are used so that Raft can quickly replay logs on
  a follower instead of being forced to send an entire snapshot. This value can
  be tuned during operation by a hot configuration reload.

- `redundancy_zone` `(string: "")` - (Enterprise-only) Specifies the redundancy
  zone that this server will be a part of for Autopilot management. For more
  information, see the [Autopilot Guide](/nomad/tutorials/manage-clusters/autopilot).

- `rejoin_after_leave` `(bool: false)` - Specifies if Nomad will ignore a
  previous leave and attempt to rejoin the cluster when starting. By default,
  Nomad treats leave as a permanent intent and does not attempt to join the
  cluster again when starting. This flag allows the previous state to be used to
  rejoin the cluster.

- `root_key_gc_interval` `(string: "10m")` - Specifies the interval between
  [encryption key][] metadata garbage collections.

- `root_key_gc_threshold` `(string: "1h")` - Specifies the minimum time that an
  [encryption key][] must exist before it can be eligible for garbage
  collection.

- `root_key_rotation_threshold` `(string: "720h")` - Specifies the minimum time
  that an [encryption key][] must exist before it is automatically rotated on
  the next garbage collection interval.

- `server_join` <code>([server_join][server-join]: nil)</code> - Specifies
  how the Nomad server will connect to other Nomad servers. The `retry_join`
  fields may directly specify the server address or use go-discover syntax for
  auto-discovery. See the [server_join documentation][server-join] for more detail.

- `upgrade_version` `(string: "")` - A custom version of the format X.Y.Z to use
  in place of the Nomad version when custom upgrades are enabled in Autopilot.
  For more information, see the [Autopilot Guide](/nomad/tutorials/manage-clusters/autopilot).

- `search` <code>([search][search]: nil)</code> - Specifies configuration parameters
  for the Nomad search API.

### Deprecated Parameters

- `retry_join` `(array<string>: [])` - Specifies a list of server addresses to
  retry joining if the first attempt fails. This is similar to
  [`start_join`](#start_join), but only invokes if the initial join attempt
  fails. The list of addresses will be tried in the order specified, until one
  succeeds. After one succeeds, no further addresses will be contacted. This is
  useful for cases where we know the address will become available eventually.
  Use `retry_join` with an array as a replacement for `start_join`, **do not use
  both options**. See the [server_join][server-join]
  section for more information on the format of the string. This field is
  deprecated in favor of the [server_join block][server-join].

- `retry_interval` `(string: "30s")` - Specifies the time to wait between retry
  join attempts. This field is deprecated in favor of the [server_join
  block][server-join].

- `retry_max` `(int: 0)` - Specifies the maximum number of join attempts to be
  made before exiting with a return code of 1. By default, this is set to 0
  which is interpreted as infinite retries. This field is deprecated in favor of
  the [server_join block][server-join].

- `start_join` `(array<string>: [])` - Specifies a list of server addresses to
  join on startup. If Nomad is unable to join with any of the specified
  addresses, agent startup will fail. See the [server address
  format](/nomad/docs/configuration/server_join#server-address-format)
  section for more information on the format of the string. This field is
  deprecated in favor of the [server_join block][server-join].

### `plan_rejection_tracker` Parameters

The leader plan rejection tracker can be adjusted to prevent evaluations from
getting stuck due to always being scheduled to a client that may have an
unexpected issue. Refer to [Monitoring Nomad][monitoring_nomad_progress] for
more details.

- `enabled` `(bool: false)` - Specifies if plan rejections should be tracked.

- `node_threshold` `(int: 100)` - The number of plan rejections for a node
  within the `node_window` to trigger a client to be set as ineligible.

- `node_window` `(string: "5m")` - The time window for when plan rejections for
  a node should be considered.

If you observe too many false positives (clients being marked as ineligible
even if they don't present any problem) you may want to increase
`node_threshold`.

Or if you are noticing jobs not being scheduled due to plan rejections for the
same `node_id` and the client is not being set as ineligible you can try
increasing the `node_window` so more historical rejections are taken into
account.

## `server` Examples

### Common Setup

This example shows a common Nomad agent `server` configuration block. The two
IP addresses could also be DNS, and should point to the other Nomad servers in
the cluster

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join     = [ "1.1.1.1", "2.2.2.2" ]
    retry_max      = 3
    retry_interval = "15s"
  }
}
```

### Configuring Data Directory

This example shows configuring a custom data directory for the server data.

```hcl
server {
  data_dir = "/opt/nomad/server"
}
```

### Automatic Bootstrapping

The Nomad servers can automatically bootstrap if Consul is configured. For a
more detailed explanation, please see the
[automatic Nomad bootstrapping documentation](/nomad/tutorials/manage-clusters/clustering).

### Restricting Schedulers

This example shows restricting the schedulers that are enabled as well as the
maximum number of cores to utilize when participating in scheduling decisions:

```hcl
server {
  enabled            = true
  enabled_schedulers = ["batch", "service"]
  num_schedulers     = 7
}
```

### Bootstrapping with a Custom Scheduler Config ((#configuring-scheduler-config))

While [bootstrapping a cluster], you can use the `default_scheduler_config` block
to prime the cluster with a [`SchedulerConfig`][update-scheduler-config]. The
scheduler configuration determines which scheduling algorithm is configured—
spread scheduling or binpacking—and which job types are eligible for preemption.

~> **Warning:** Once the cluster is bootstrapped, you must configure this using
the [update scheduler configuration][update-scheduler-config] API. This
option is only consulted during bootstrap.

The structure matches the [Update Scheduler Config][update-scheduler-config] API
endpoint, which you should consult for canonical documentation. However, the
attributes names must be adapted to HCL syntax by using snake case
representations rather than camel case.

This example shows configuring spread scheduling and enabling preemption for all
job-type schedulers.

```hcl
server {
  default_scheduler_config {
    scheduler_algorithm             = "spread"
    memory_oversubscription_enabled = true
    reject_job_registration         = false
    pause_eval_broker               = false # New in Nomad 1.3.2

    preemption_config {
      batch_scheduler_enabled    = true
      system_scheduler_enabled   = true
      service_scheduler_enabled  = true
      sysbatch_scheduler_enabled = true # New in Nomad 1.2
    }
  }
}
```

## Client Heartbeats ((#client-heartbeats))

~> This is an advanced topic. It is most beneficial to clusters over 1,000
   nodes or with unreliable networks or nodes (eg some edge deployments).

Nomad Clients periodically heartbeat to Nomad Servers to confirm they are
operating as expected. Nomad Clients which do not heartbeat in the specified
amount of time are considered `down` and their allocations are marked as `lost`
or `disconnected` (if [`max_client_disconnect`][max_client_disconnect] is set)
and rescheduled.

The various heartbeat related parameters allow you to tune the following
tradeoffs:

- The longer the heartbeat period, the longer a `down` Client's workload will
  take to be rescheduled.
- The shorter the heartbeat period, the more likely transient network issues,
  leader elections, and other temporary issues could cause a perfectly
  functional Client and its workloads to be marked as `down` and the work
  rescheduled.

While Nomad Clients can connect to any Server, all heartbeats are forwarded to
the leader for processing. Since this heartbeat processing consumes resources,
Nomad adjusts the rate at which Clients heartbeat based on cluster size. The
goal is to try to keep the resource cost of processing heartbeats constant
regardless of cluster size.

The base formula for determining how often a Client must heartbeat is:

```
<number of Clients> / <max_heartbeats_per_second>
```

Other factors modify this base TTL:

- A random factor up to `2x` is added to the base TTL to prevent the
  [thundering herd][herd] problem where a large number of clients attempt to
  heartbeat at exactly the same time.
- [`min_heartbeat_ttl`](#min_heartbeat_ttl) is used as the lower bound to
  prevent small clusters from sending excessive heartbeats.
- [`heartbeat_grace`](#heartbeat_grace) is the amount of _extra_ time the
  leader will wait for a heartbeat beyond the base heartbeat.
- After a leader election all Clients are given up to `failover_heartbeat_ttl`
  to successfully heartbeat. This gives Clients time to discover a functioning
  Server in case they were directly connected to a leader that crashed.

For example, given the default values for heartbeat parameters, different sized
clusters will use the following TTLs for the heartbeats. Note that the `Server TTL`
simply adds the `heartbeat_grace` parameter to the TTL Clients are given.

| Clients | Client TTL  | Server TTL  | Safe after elections |
| ------- | ----------- | ----------- | -------------------- |
| 10      | 10s - 20s   | 20s - 30s   | yes                  |
| 100     | 10s - 20s   | 20s - 30s   | yes                  |
| 1000    | 20s - 40s   | 30s - 50s   | yes                  |
| 5000    | 100s - 200s | 110s - 210s | yes                  |
| 10000   | 200s - 400s | 210s - 410s | NO (see below)       |

Regardless of size, all clients will have a Server TTL of
`failover_heartbeat_ttl` after a leader election. It should always be larger
than the maximum Client TTL for your cluster size in order to prevent marking
live Clients as `down`.

For clusters over 5000 Clients you should increase `failover_heartbeat_ttl`
using the following formula:

```
(2 * (<number of Clients> / <max_heartbeats_per_second>)) + (10 * <min_heartbeat_ttl>)

# For example with 6000 Clients:
(2 * (6000 / 50)) + (10 * 10) = 340s (5m40s)
```

This ensures Clients have some additional time to failover even if they were
told to heartbeat after the maximum interval.

The actual value used should take into consideration how much tolerance your
system has for a delay in noticing crashed Clients. For example a
`failover_heartbeat_ttl` of 30 minutes may give even the slowest clients in the
largest clusters ample time to heartbeat after an election. However if the
election was due to a datacenter-wide failure affecting Clients, it will be 30
minutes before Nomad recognizes that they are `down` and reschedules their
work.

[encryption]: /nomad/tutorials/transport-security/security-gossip-encryption 'Nomad Encryption Overview'
[server-join]: /nomad/docs/configuration/server_join 'Server Join'
[update-scheduler-config]: /nomad/api-docs/operator/scheduler#update-scheduler-configuration 'Scheduler Config'
[bootstrapping a cluster]: /nomad/docs/faq#bootstrapping
[rfc4648]: https://tools.ietf.org/html/rfc4648#section-5
[monitoring_nomad_progress]: /nomad/docs/operations/monitoring-nomad#progress
[`nomad operator keygen`]: /nomad/docs/commands/operator/keygen
[search]: /nomad/docs/configuration/search
[encryption key]: /nomad/docs/operations/key-management
[max_client_disconnect]: /nomad/docs/job-specification/group#max-client-disconnect
[herd]: https://en.wikipedia.org/wiki/Thundering_herd_problem