---
page_title: Run Consul-Terraform-Sync with High Availability
description: >-
  Improve network automation resiliency by enabling high availability for Consul-Terraform-Sync. HA enables persistent task and event data so that CTS functions as expected during a failover event.
---
# Run Consul-Terraform-Sync with High Availability
This topic describes how to run Consul-Terraform-Sync (CTS) configured for high availability. High availability is an enterprise capability that ensures that all changes to Consul that occur during a failover transition are processed and that CTS continues to operate as expected.
A network always has exactly one instance of the CTS cluster that is the designated leader. The leader is responsible for monitoring and running tasks. If the leader fails, CTS triggers the following process when it is configured for high availability:
1. The CTS cluster promotes a new leader from the pool of followers in the network.
1. The new leader begins running all existing tasks in `once-mode` in order to process changes that occurred during the failover transition period. In this mode, CTS runs all existing tasks one time.
1. The new leader logs any errors that occur during `once-mode` operation and continues to monitor Consul for changes.
In a standard configuration, CTS exits if errors occur when the CTS instance runs tasks in `once-mode`. In a high availability configuration, CTS logs the errors and continues to operate without interruption.
The following diagram shows operating state when high availability is enabled:
The following diagram shows the CTS cluster state after the leader stops. CTS Instance B becomes the leader responsible for monitoring and running tasks.
- The time it takes to elect a new leader is determined by the `high_availability.cluster.storage.session_ttl` configuration. The minimum failover time is equal to the `session_ttl` value. The maximum failover time is double the `session_ttl` value. For example, with `session_ttl` set to `30s`, failover takes between 30 and 60 seconds.
- If failover occurs during task execution, a new leader is elected. The new leader will attempt to run all tasks once before continuing to monitor for changes.
- If using the [Terraform Cloud (TFC) driver](/docs/nia/network-drivers/terraform-cloud), the task finishes and CTS starts a new leader that attempts to queue a run for each task in TFC in once-mode.
- If using the [Terraform driver](/docs/nia/network-drivers/terraform), the task may complete depending on the cause of the failover. The new leader starts and attempts to run each task in [once-mode](/docs/nia/cli/start#modes). Depending on the module and provider, the task may require manual intervention to fix any inconsistencies between the infrastructure and Terraform state.
- If failover occurs when no task is executing, CTS elects a new leader that attempts to run all tasks in once-mode.
We recommend specifying the [TFC driver](/docs/nia/network-drivers/terraform-cloud) in your CTS configuration if you want to run in high availability mode.
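As a rough sketch of this recommendation, a `driver "terraform-cloud"` block in the CTS configuration might look like the following. The hostname, organization name, and token placeholder are illustrative assumptions, so check the TFC driver reference for the fields that your version supports.

```hcl
# Illustrative only: every value below is a placeholder, not a default.
driver "terraform-cloud" {
  hostname     = "https://app.terraform.io" # assumed TFC address
  organization = "example-org"              # hypothetical organization name
  token        = "<TFC_TEAM_TOKEN>"         # placeholder; supply your own token
}
```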
Add the `high_availability` block in your CTS configuration and configure the required settings to enable high availability. Refer to the [Configuration reference](#) for details about the configuration fields for the `high_availability` block.
The following example configures high availability functionality for a cluster named `cts-cluster`:
<CodeBlockConfig filename="cts-config.hcl">
```hcl
high_availability {
  cluster {
    name = "cts-cluster"
    storage "consul" {
      parent_path = "cts"
      namespace   = "ns"
      session_ttl = "30s"
    }
  }
  instance {
    address = "cts-01.example.com"
  }
}
```
</CodeBlockConfig>
### ACL permissions
The `session` and `keys` resources in your Consul environment must have `write` permissions. Refer to the [ACL documentation](/docs/security/acl) for details on how to define ACL policies.
If the `high_availability.cluster.storage.namespace` field is configured, then your ACL policy must also enable `write` permissions for the `namespace` resource.
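The permissions described above could be expressed in a Consul ACL policy along these lines. This is a minimal sketch, not a verified policy. It assumes that CTS keeps its high availability data under the `parent_path` from the example configuration (`cts`) and in the namespace `ns`, and it grants broader session access than you may want in production.

```hcl
# Sketch of an ACL policy for the token that CTS uses.
# Assumption: HA data is stored under the "cts" parent_path in namespace "ns".
namespace "ns" {
  # Needed only when high_availability.cluster.storage.namespace is set.
  policy = "write"

  # Write access to keys beneath the assumed parent_path.
  key_prefix "cts" {
    policy = "write"
  }

  # Write access to sessions.
  session_prefix "" {
    policy = "write"
  }
}
```

If you do not configure a namespace, the `key_prefix` and `session_prefix` rules would sit at the top level of the policy instead.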
## Start a new CTS cluster
We recommend deploying a cluster of three CTS instances so that the cluster has one leader and two followers.
1. Create an HCL configuration file that contains the settings you need, including the `high_availability` block. Refer to [Configuration Options for Consul-Terraform-Sync](/docs/nia/configuration) for all configuration options.
1. Issue the startup command and pass the configuration file. Refer to the [`start` command reference](/docs/nia/cli/start#modes) for additional information about CTS startup modes.
1. You can call the `/status` API endpoint to verify the status of tasks CTS is configured to monitor. Refer to the [`/status` API reference documentation](/docs/nia/api/status) for information about usage and responses.
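When you start additional instances, the `high_availability` block for a second instance might look like the following sketch. It assumes that instances join the same cluster by sharing the `cluster` settings from the earlier example and differ only in their own address. The host name `cts-02.example.com` is hypothetical.

```hcl
high_availability {
  # Assumed to match the first instance so that both join the same cluster.
  cluster {
    name = "cts-cluster"
    storage "consul" {
      parent_path = "cts"
      namespace   = "ns"
      session_ttl = "30s"
    }
  }

  # Hypothetical address that identifies this instance.
  instance {
    address = "cts-02.example.com"
  }
}
```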
You can implement a rolling update to update a non-task configuration for a CTS instance, such as the Consul connection settings. If you need to update a task in the instance configuration, refer to [Modify tasks](#modify-tasks).
1. Identify the leader CTS instance by either making a call to the [`/status` API endpoint](/docs/nia/api/status) or by checking the logs for the following entry:
## Modify tasks

When high availability is enabled, CTS persists task and event data. Refer to [State storage and persistence](/docs/nia/architecture#state-storage-and-persistence) for additional information.
You can use the following methods for modifying tasks when high availability is enabled. We recommend choosing a single method to make all task configuration changes because inconsistencies between the state and the configuration can occur when mixing methods.
Use the CTS API to identify the CTS leader instance and delete and replace a task.
1. Identify the leader CTS instance by either making a call to the [`/status` API endpoint](/docs/nia/api/status) or by checking the logs for the following entry:
1. Send a `DELETE` call to the [`/tasks/<task-name>` endpoint](/docs/nia/api/tasks#delete-task) to delete the task. In the following example, the leader instance is at `localhost:8558`:
You can also use the [`task-create` command](/docs/nia/cli/task#task-create) to complete this step.
### Discard data with the `-reset-storage` flag
You can restart the CTS cluster using the [`-reset-storage` flag](/docs/nia/cli/options) to discard persisted data if you need to update a task.
1. Stop a follower instance.
1. Update the instance’s task configuration.
1. Restart the instance and include the `-reset-storage` flag.
1. Stop all other instances so that the updated instance becomes the leader.
1. Start all other instances again.
1. Restart the instance you restarted in step 3 without the `-reset-storage` flag so that it starts up with the current state. If you continue to run an instance with the `-reset-storage` flag enabled, then CTS will reset the state data whenever the instance becomes the leader.
## Troubleshooting
Use the following troubleshooting procedure if a previous leader had been running a task successfully but the new leader logs an error after a failover:
1. Check the logs printed to the console for errors. Refer to the [`syslog` configuration](/docs/nia/configuration#syslog) for information on how to locate the logs. In the following example output, CTS reported a `401 Bad credentials` error:
   ```text
   | Error: GET https://api.github.com/user: 401 Bad credentials []
   |
   |   with module.config-task.provider["registry.terraform.io/integrations/github"],
   |   on .terraform/modules/config-task/main.tf line 11, in provider "github":
   |   11: provider "github" {
   |
   ```
1. Check for differences between the previous leader and new leader, such as differences in configurations, environment variables, and local resources.
1. Start a new instance with the fix that resolves the issue.
1. Tear down the leader instance that has the issue and any other instances that may have the same issue.