---
layout: "guides"
page_title: "Apache Spark Integration - Customizing Applications"
sidebar_current: "guides-analytical-workloads-spark-customizing"
description: |-
  Learn how to customize the Nomad job that is created to run a Spark
  application.
---

# Customizing Applications

There are two ways to customize the Nomad job that Spark creates to run an
application:

- Use the default job template and set configuration properties
- Use a custom job template

## Using the Default Job Template

The Spark integration will use a generic job template by default. The template
includes groups and tasks for the driver, executors and (optionally) the
[shuffle service](/guides/spark/dynamic.html). The job itself and the tasks that
are created have the `spark.nomad.role` meta value defined accordingly:

```hcl
job "structure" {

  meta {
    "spark.nomad.role" = "application"
  }

  # A driver group is only added in cluster mode
  group "driver" {
    task "driver" {
      meta {
        "spark.nomad.role" = "driver"
      }
    }
  }

  group "executors" {
    count = 2

    task "executor" {
      meta {
        "spark.nomad.role" = "executor"
      }
    }

    # A shuffle service task is only added when the shuffle service is
    # enabled (as it must be when using dynamic allocation)
    task "shuffle-service" {
      meta {
        "spark.nomad.role" = "shuffle"
      }
    }
  }
}
```

The default template can be customized indirectly by explicitly [setting
configuration properties](/guides/spark/configuration.html).
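
For example, here is a minimal sketch of tuning the generic template at
submission time. The `nomad` master URL assumes the Nomad-enabled Spark
distribution; the class name, JAR path, and property values are placeholders,
and `spark.executor.instances` and `spark.executor.memory` are standard Spark
properties:

```shell
# Sketch: adjust the generic template via configuration properties
# instead of supplying a custom template. Paths and values are
# illustrative only.
spark-submit \
  --master nomad \
  --class com.example.Main \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=2g \
  ./path/to/app.jar
```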

## Using a Custom Job Template

An alternative to using the default template is to set the
`spark.nomad.job.template` configuration property to the path of a file
containing a custom job template. There are two important considerations:

* The template must use the JSON format. You can convert an HCL jobspec to
  JSON by running `nomad job run -output <job.nomad>`, as shown in the example
  after this list.

* `spark.nomad.job.template` should be set to a path on the submitting
  machine, not to a URL (even in cluster mode). The template does not need to
  be accessible to the driver or executors.
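
For instance, assuming a jobspec file named `template.nomad` (the filenames
here are illustrative), the conversion could be done as follows:

```shell
# Convert an HCL jobspec to the JSON format required for job templates.
# "template.nomad" and "template.json" are illustrative filenames.
nomad job run -output template.nomad > template.json
```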

Using a job template, you can override Spark's default resource utilization,
add additional metadata or constraints, set environment variables, add sidecar
tasks, and utilize the Consul and Vault integrations. The template does
not need to be a complete Nomad job specification, since Spark will add
everything necessary to run the application. For example, your template
might set `job` metadata, but not contain any task groups, making it an
incomplete Nomad job specification but still a valid template to use with Spark.

To customize the driver task group, include a task group in your template
containing a task whose `spark.nomad.role` meta value is set to `driver`.

To customize the executor task group, include a task group in your template
containing a task whose `spark.nomad.role` meta value is set to `executor` or
`shuffle`.

The following template adds a `meta` value at the job level and an environment
variable to the executor task group:

```hcl
job "template" {

  meta {
    "foo" = "bar"
  }

  group "executor-group-name" {

    task "executor-task-name" {

      meta {
        "spark.nomad.role" = "executor"
      }

      env {
        BAZ = "something"
      }
    }
  }
}
```
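
As a sketch of how the template above could be supplied when submitting an
application, once it has been converted to JSON (the master URL, template path,
class name, and JAR path are placeholders):

```shell
# Sketch: point spark.nomad.job.template at the JSON template on the
# submitting machine. Paths and names are illustrative.
spark-submit \
  --master nomad \
  --class com.example.Main \
  --conf spark.nomad.job.template=/path/to/template.json \
  ./path/to/app.jar
```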

## Order of Precedence

The order of precedence for customized settings is as follows (see the example
after this list):

1. Explicitly set configuration properties.
2. Settings in the job template (if provided).
3. Default values of the configuration properties.
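
As a hypothetical illustration with placeholder paths and values: if a template
sets a `count` on the executor task group but `spark.executor.instances` (the
standard Spark property for the number of executors) is also set explicitly,
the explicitly set property wins; if neither is set, the property's default
value applies.

```shell
# Sketch: an explicitly set property takes precedence over the template,
# which in turn takes precedence over the property's default value.
spark-submit \
  --master nomad \
  --conf spark.nomad.job.template=/path/to/template.json \
  --conf spark.executor.instances=4 \
  ./path/to/app.jar
```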

## Next Steps

Learn how to [allocate resources](/guides/spark/resource.html) for your Spark
applications.