open-nomad/website/pages/docs/autoscaling/plugins/apm.mdx

196 lines
5.1 KiB
Plaintext

---
layout: docs
page_title: APM
sidebar_title: APM
description: APM plugins provide metric data points describing the resources current state.
---
# APM Plugins
APMs are used to store metrics about an applications performance and current
state. The APM (Application Performance Management) plugin is responsible for
querying the APM and returning a value which will be used to determine if
scaling should occur.
## Prometheus APM Plugin
Use [Prometheus][prometheus_io] metrics to scale your Nomad job task groups or
cluster. The query performed on Prometheus should return a single value. You can
use the [scalar][prometheus_scaler_function] function in your query to achieve
this.
### Agent Configuration Options
```hcl
apm "prometheus" {
driver = "prometheus"
config = {
address = "http://prometheus.my.endpoint.io:9090"
}
}
```
- `address` `(string: "http://127.0.0.1:9090")` - The address of the Prometheus
endpoint used to perform queries.
### Policy Configuration Options
```hcl
check {
source = "prometheus"
query = "scalar(avg((haproxy_server_current_sessions{backend=\"http_back\"}) and (haproxy_server_up{backend=\"http_back\"} == 1)))"
...
}
```
## Datadog APM Plugin
The [Datadog][datadog_homepage] APM allows using [time series][datadog_timeseries]
data to make scaling decisions.
### Agent Configuration Options
```hcl
apm "datadog" {
driver = "datadog"
config = {
dd_api_key = "<api key>"
dd_app_key = "<app key>"
}
}
```
- `dd_api_key` `(string: "")` - The Datadog API key to use for authentication.
- `dd_app_key` `(string: "")` - The Datadog APP key to use for authentication.
The Datadog plugin can also read its configuration options via environment
variables. The accepted keys are `DD_API_KEY` and `DD_APP_KEY`. The agent
configuration parameters take precedence over the environment variables.
### Policy Configuration Options
```hcl
check {
source = "datadog"
query = "FROM=2m;TO=0m;QUERY=avg:proxy.backend.response.time{proxy-service:web-app}"
...
}
```
The query consists of three sections, each separated using a `;` delimiter. More
information on the arguments can be found on the [Datadog site][datadog_timeseries].
- `FROM` - A time offset which indicates the start of the queried time period.
- `TO` - A time offset which indicates the end of the queried time period.
- `QUERY` - The query string to execute.
## Nomad APM Plugin
The Nomad APM plugin allows querying the Nomad API for metric data. This provides
an immediate starting point without addition applications but comes at the price
of efficiency. When using this APM, it is advised to monitor Nomad carefully
ensuring it is not put under excessive load pressure.
### Agent Configuration Options
```hcl
apm "nomad-apm" {
driver = "nomad-apm"
}
```
When using a Nomad cluster with ACLs enabled, following ACL policy will provide the appropriate
permissions for obtaining task group metrics:
```hcl
namespace "default" {
policy = "read"
capabilities = ["read-job"]
}
```
In order to obtain cluster level metrics, the following ACL policy will be required:
```hcl
node {
policy = "read"
}
namespace "default" {
policy = "read"
capabilities = ["read-job"]
}
```
### Policy Configuration Options - Task Groups
The Nomad APM allows querying Nomad to understand the current resource usage of
a task group.
```hcl
check {
source = "nomad-apm"
query = "avg_cpu"
...
}
```
Querying Nomad task group metrics is be done using the `operation_metric` syntax,
where valid operations are:
- `avg` - returns the average of the metric value across allocations in the task
group.
- `min` - returns the lowest metric value among the allocations in the task group.
- `max` - returns the highest metric value among the allocations in the task
group.
- `sum` - returns the sum of all the metric values for the allocations in the
task group.
The metric value can be:
- `cpu` - CPU usage as reported by the `nomad.client.allocs.cpu.total_percent`
metric.
- `memory` - Memory usage as reported by the `nomad.client.allocs.memory.usage`
metric.
### Policy Configuration Options - Client Nodes
The Nomad APM allows querying Nomad to understand the current allocated resource
as a percentage of the total available.
```hcl
check {
source = "nomad-apm"
query = "percentage-allocated_cpu"
...
}
```
Querying Nomad client node metrics is be done using the `operation_metric` syntax,
where valid operations are:
- `percentage-allocated` - returns the allocated percentage of the desired
resource.
The metric value can be:
- `cpu` - allocated CPU as reported by calculating total allocatable against the
total allocated by the scheduler.
- `memory` - allocated memory as reported by calculating total allocatable against
the total allocated by the scheduler.
[prometheus_io]: https://prometheus.io/
[prometheus_scaler_function]: https://prometheus.io/docs/prometheus/latest/querying/functions/#scalar
[nomad_telemetry_stanza]: /docs/configuration/telemetry#inlinecode-publish_allocation_metrics
[datadog_homepage]: https://www.datadoghq.com/
[datadog_timeseries]: https://docs.datadoghq.com/api/v1/metrics/#query-timeseries-points