---
layout: "guides"
page_title: "Apache Spark Integration - Customizing Applications"
sidebar_current: "guides-analytical-workloads-spark-customizing"
description: |-
  Learn how to customize the Nomad job that is created to run a Spark
  application.
---
# Customizing Applications

There are two ways to customize the Nomad job that Spark creates to run an
application:

- Use the default job template and set configuration properties
- Use a custom job template

## Using the Default Job Template

The Spark integration will use a generic job template by default. The template
includes groups and tasks for the driver, executors and (optionally) the
[shuffle service](/guides/spark/dynamic.html). The job itself and the tasks that
are created have the `spark.nomad.role` meta value defined accordingly:

```hcl
job "structure" {

  meta {
    "spark.nomad.role" = "application"
  }

  # A driver group is only added in cluster mode
  group "driver" {

    task "driver" {
      meta {
        "spark.nomad.role" = "driver"
      }
    }
  }

  group "executors" {
    count = 2

    task "executor" {
      meta {
        "spark.nomad.role" = "executor"
      }
    }

    # Shuffle service tasks are only added when enabled (as it must be when
    # using dynamic allocation)
    task "shuffle-service" {
      meta {
        "spark.nomad.role" = "shuffle"
      }
    }
  }
}
```
The default template can be customized indirectly by explicitly [setting
configuration properties](/guides/spark/configuration.html).
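For example, here is a minimal sketch of a `spark-submit` invocation that shapes the
generated job purely through configuration properties. The application JAR, the Spark
distribution URL, and the property values shown are placeholders; see the
[configuration properties](/guides/spark/configuration.html) page for the full list of
supported settings.

```shell
# Illustrative sketch only: the URLs and property values are placeholders.
# Each --conf flag sets a configuration property for the submission.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.executor.instances=4 \
  --conf spark.nomad.sparkDistribution=https://example.com/spark.tgz \
  https://example.com/spark-examples.jar 100
```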
## Using a Custom Job Template

An alternative to using the default template is to set the
`spark.nomad.job.template` configuration property to the path of a file
containing a custom job template. There are two important considerations:

* The template must use the JSON format. You can convert an HCL jobspec to
  JSON by running `nomad job run -output <job.nomad>`, as shown in the sketch
  after this list.

* `spark.nomad.job.template` should be set to a path on the submitting
  machine, not to a URL (even in cluster mode). The template does not need to
  be accessible to the driver or executors.
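For example, a minimal sketch of that workflow, assuming a template written in HCL at
`custom-template.nomad` (the file names, paths, and application URL below are
placeholders):

```shell
# Convert the HCL jobspec to JSON; nomad job run -output prints the JSON
# representation to stdout (file names here are placeholders).
nomad job run -output custom-template.nomad > custom-template.json

# Reference the JSON template by its path on the submitting machine.
spark-submit \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.nomad.job.template=/full/path/to/custom-template.json \
  https://example.com/spark-examples.jar 100
```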
Using a job template, you can override Spark's default resource utilization, add
additional metadata or constraints, set environment variables, add sidecar
tasks, and use the Consul and Vault integrations. The template does
not need to be a complete Nomad job specification, since Spark will add
everything necessary to run the application. For example, your template
might set `job` metadata, but not contain any task groups, making it an
incomplete Nomad job specification but still a valid template to use with Spark.
To customize the driver task group, include a task group in your template that
has a task that contains a `spark.nomad.role` meta value set to `driver`.

To customize the executor task group, include a task group in your template that
has a task that contains a `spark.nomad.role` meta value set to `executor` or
`shuffle`.

The following template adds a `meta` value at the job level and an environment
variable to the executor task group:
```hcl
job "template" {

  meta {
    "foo" = "bar"
  }

  group "executor-group-name" {

    task "executor-task-name" {
      meta {
        "spark.nomad.role" = "executor"
      }

      env {
        BAZ = "something"
      }
    }
  }
}
```
## Order of Precedence

The order of precedence for customized settings is as follows:

1. Explicitly set configuration properties.
2. Settings in the job template (if provided).
3. Default values of the configuration properties.

## Next Steps

Learn how to [allocate resources](/guides/spark/resource.html) for your Spark
applications.