---
layout: "guides"
page_title: "Apache Spark Integration - Configuration Properties"
sidebar_current: "guides-spark-configuration"
description: |-
  Comprehensive list of Spark configuration properties.
---
# Spark Configuration Properties
Spark [configuration properties](https://spark.apache.org/docs/latest/configuration.html#available-properties)
are generally applicable to the Nomad integration. The properties listed below
are specific to running Spark on Nomad. Configuration properties can be set by
adding `--conf [property]=[value]` to the `spark-submit` command.
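
As a minimal sketch of that `--conf` syntax, the snippet below assembles two properties from this page into flags and prints the resulting command instead of running it. The jar name `my-app.jar` and the datacenter names `dc1,dc2` are hypothetical placeholders, and any other `spark-submit` options your cluster requires are left out.

```shell
# Hypothetical placeholder for the application to submit.
APP_JAR="my-app.jar"

# Each property is passed as its own --conf [property]=[value] pair.
CONF_FLAGS="--conf spark.nomad.cluster.monitorUntil=complete"
# dc1,dc2 are assumed datacenter names, used only for illustration.
CONF_FLAGS="${CONF_FLAGS} --conf spark.nomad.datacenters=dc1,dc2"

# Print the command rather than executing it, so the sketch runs
# on a machine without Spark installed.
SUBMIT_CMD="spark-submit ${CONF_FLAGS} ${APP_JAR}"
echo "${SUBMIT_CMD}"
```

To actually submit the job, run the printed command with the remaining options for your environment filled in.
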
- `spark.nomad.authToken` `(string: nil)` - Specifies the secret key of the auth
token to use when accessing the API. This falls back to the `NOMAD_TOKEN`
environment variable. If this property is set and the cluster deploy mode is
used, the token is propagated to the driver application in the job spec. If it
is not set and the auth token is taken from the `NOMAD_TOKEN` environment
variable, the token is not propagated, so the driver must pick up its own token
from an environment variable.
- `spark.nomad.cluster.expectImmediateScheduling` `(bool: false)` - Specifies
that `spark-submit` should fail if Nomad is not able to schedule the job
immediately.
- `spark.nomad.cluster.monitorUntil` `(string: "submitted")` - Specifies the
length of time that `spark-submit` should monitor a Spark application in cluster
mode. When set to `submitted`, `spark-submit` will return as soon as the
application has been submitted to the Nomad cluster. When set to `scheduled`,
`spark-submit` will return as soon as the Nomad job has been scheduled. When
set to `complete`, `spark-submit` will tail the output from the driver process
and return when the job has completed.
- `spark.nomad.datacenters` `(string: dynamic)` - Specifies a comma-separated
list of Nomad datacenters to use. This property defaults to the datacenter of
the first Nomad server contacted.
- `spark.nomad.docker.email` `(string: nil)` - Specifies the email address to
use when downloading the Docker image specified by
[spark.nomad.dockerImage](#spark-nomad-dockerImage). See the
[Docker driver authentication](https://www.nomadproject.io/docs/drivers/docker.html#authentication)
docs for more information.
- `spark.nomad.docker.password` `(string: nil)` - Specifies the password to use
when downloading the Docker image specified by
[spark.nomad.dockerImage](#spark-nomad-dockerImage). See the
[Docker driver authentication](https://www.nomadproject.io/docs/drivers/docker.html#authentication)
docs for more information.
- `spark.nomad.docker.serverAddress` `(string: nil)` - Specifies the server
address (domain/IP without the protocol) to use when downloading the Docker
image specified by [spark.nomad.dockerImage](#spark-nomad-dockerImage). Docker
Hub is used by default. See the
[Docker driver authentication](https://www.nomadproject.io/docs/drivers/docker.html#authentication)
docs for more information.
- `spark.nomad.docker.username` `(string: nil)` - Specifies the username to use
when downloading the Docker image specified by
[spark.nomad.dockerImage](#spark-nomad-dockerImage). See the
[Docker driver authentication](https://www.nomadproject.io/docs/drivers/docker.html#authentication)
docs for more information.
- `spark.nomad.dockerImage` `(string: nil)` - Specifies the `URL` for the
[Docker image](https://www.nomadproject.io/docs/drivers/docker.html#image) to
use to run Spark with Nomad's `docker` driver. When not specified, Nomad's
`exec` driver will be used instead.
- `spark.nomad.driver.cpu` `(string: "1000")` - Specifies the CPU in MHz that
should be reserved for driver tasks.
- `spark.nomad.driver.logMaxFileSize` `(string: "1m")` - Specifies the maximum
size that Nomad should use for driver task log files.
- `spark.nomad.driver.logMaxFiles` `(string: "5")` - Specifies the number of log
files that Nomad should keep for driver tasks.
- `spark.nomad.driver.networkMBits` `(string: "1")` - Specifies the network
bandwidth in MBits that Nomad should allocate to driver tasks.
- `spark.nomad.driver.retryAttempts` `(string: "5")` - Specifies the number of
times that Nomad should retry driver task groups upon failure.
- `spark.nomad.driver.retryDelay` `(string: "15s")` - Specifies the length of
time that Nomad should wait before retrying driver task groups upon failure.
- `spark.nomad.driver.retryInterval` `(string: "1d")` - Specifies Nomad's retry
interval for driver task groups.
- `spark.nomad.executor.cpu` `(string: "1000")` - Specifies the CPU in MHz that
should be reserved for executor tasks.
- `spark.nomad.executor.logMaxFileSize` `(string: "1m")` - Specifies the maximum
size that Nomad should use for executor task log files.
- `spark.nomad.executor.logMaxFiles` `(string: "5")` - Specifies the number of
log files that Nomad should keep for executor tasks.
- `spark.nomad.executor.networkMBits` `(string: "1")` - Specifies the network
bandwidth in MBits that Nomad should allocate to executor tasks.
- `spark.nomad.executor.retryAttempts` `(string: "5")` - Specifies the number of
times that Nomad should retry executor task groups upon failure.
- `spark.nomad.executor.retryDelay` `(string: "15s")` - Specifies the length of
time that Nomad should wait before retrying executor task groups upon failure.
- `spark.nomad.executor.retryInterval` `(string: "1d")` - Specifies Nomad's retry
interval for executor task groups.
- `spark.nomad.job.template` `(string: nil)` - Specifies the path to a JSON file
containing a Nomad job to use as a template. This can also be set with the
`--nomad-template` parameter of `spark-submit`.
- `spark.nomad.namespace` `(string: nil)` - Specifies the namespace to use. This
falls back first to the `NOMAD_NAMESPACE` environment variable and then to
Nomad's default namespace.
- `spark.nomad.priority` `(string: nil)` - Specifies the priority for the
Nomad job.
- `spark.nomad.region` `(string: dynamic)` - Specifies the Nomad region to use.
This property defaults to the region of the first Nomad server contacted.
- `spark.nomad.shuffle.cpu` `(string: "1000")` - Specifies the CPU in MHz that
should be reserved for shuffle service tasks.
- `spark.nomad.shuffle.logMaxFileSize` `(string: "1m")` - Specifies the maximum
size that Nomad should use for shuffle service task log files.
- `spark.nomad.shuffle.logMaxFiles` `(string: "5")` - Specifies the number of
log files that Nomad should keep for shuffle service tasks.
- `spark.nomad.shuffle.memory` `(string: "256m")` - Specifies the memory that
Nomad should allocate for the shuffle service tasks.
- `spark.nomad.shuffle.networkMBits` `(string: "1")` - Specifies the network
bandwidth in MBits that Nomad should allocate to shuffle service tasks.
- `spark.nomad.sparkDistribution` `(string: nil)` - Specifies the location of
the Spark distribution archive file to use.
- `spark.nomad.tls.caCert` `(string: nil)` - Specifies the path to a `.pem` file
containing the certificate authority that should be used to validate the Nomad
server's TLS certificate.
- `spark.nomad.tls.cert` `(string: nil)` - Specifies the path to a `.pem` file
containing the TLS certificate to present to the Nomad server.
- `spark.nomad.tls.trustStorePassword` `(string: nil)` - Specifies the path to a
`.pem` file containing the private key corresponding to the certificate in
[spark.nomad.tls.cert](#spark-nomad-tls-cert).
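
Several of the properties above are often combined in a single submission. The sketch below shows one way to do that, assuming a small helper function, `conf_flag`, which is hypothetical and not part of Spark or Nomad. The image name `example/spark:latest`, the jar name `my-app.jar`, and the chosen values are illustrative only. The command is printed rather than executed so the snippet runs without Spark installed.

```shell
# conf_flag is a hypothetical helper that formats one property
# as a "--conf property=value" pair.
conf_flag() {
  printf '%s %s=%s' '--conf' "$1" "$2"
}

# Hypothetical values; substitute your own image and application.
DOCKER_IMAGE="example/spark:latest"
APP_JAR="my-app.jar"

# Run Spark with Nomad's docker driver instead of the exec driver.
FLAGS="$(conf_flag spark.nomad.dockerImage "${DOCKER_IMAGE}")"
# Reserve 2000 MHz of CPU for each executor task.
FLAGS="${FLAGS} $(conf_flag spark.nomad.executor.cpu 2000)"
# Keep at most 3 log files per executor task.
FLAGS="${FLAGS} $(conf_flag spark.nomad.executor.logMaxFiles 3)"
# Retry driver task groups up to 2 times upon failure.
FLAGS="${FLAGS} $(conf_flag spark.nomad.driver.retryAttempts 2)"

SUBMIT_CMD="spark-submit ${FLAGS} ${APP_JAR}"
echo "${SUBMIT_CMD}"
```

Building the flags one property at a time keeps each setting on its own line, which makes it easier to see which defaults a submission overrides.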