Merge pull request #2738 from hashicorp/f-terraform-config

Add Packer, Terraform configs to spin up an integrated Nomad, Consul, Vault cluster in AWS
Rob Genova 2017-07-10 15:55:04 -07:00 committed by GitHub
commit 5f55f5a5ab
29 changed files with 1270 additions and 0 deletions

terraform/README.md
@@ -0,0 +1,134 @@
# Provision a Nomad cluster on AWS with Packer & Terraform
Use this to easily provision a Nomad sandbox environment on AWS with
[Packer](https://packer.io) and [Terraform](https://terraform.io).
[Consul](https://www.consul.io/intro/index.html) and
[Vault](https://www.vaultproject.io/intro/index.html) are also installed
(colocated for convenience). The intention is to allow easy exploration of
Nomad and its integrations with the HashiCorp stack. This is *not* meant to be
a production ready environment. A demonstration of [Nomad's Apache Spark
integration](examples/spark/README.md) is included.
## Setup
Clone this repo and (optionally) use [Vagrant](https://www.vagrantup.com/intro/index.html)
to bootstrap a local staging environment:
```bash
$ git clone git@github.com:hashicorp/nomad.git
$ cd nomad/terraform/aws
$ vagrant up && vagrant ssh
```
The Vagrant staging environment pre-installs Packer, Terraform, and Docker.
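Once inside the box, a quick sanity check (assuming the provisioning script ran cleanly) confirms the tools are on the `PATH`:
```bash
$ packer version
$ terraform version
$ docker --version
```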
### Pre-requisites
You will need the following:
- AWS account
- [API access keys](http://aws.amazon.com/developers/access-keys/)
- [SSH key pair](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html)
Set environment variables for your AWS credentials:
```bash
$ export AWS_ACCESS_KEY_ID=[ACCESS_KEY_ID]
$ export AWS_SECRET_ACCESS_KEY=[SECRET_ACCESS_KEY]
```
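If you also have the AWS CLI installed (it is not required for this guide), you can verify that the credentials are picked up before continuing:
```bash
$ aws sts get-caller-identity
```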
## Provision a cluster
`cd` to an environment subdirectory:
```bash
$ cd env/us-east
```
Update `terraform.tfvars` with your SSH key name:
```bash
region = "us-east-1"
ami = "ami-76787e60"
instance_type = "t2.medium"
key_name = "KEY_NAME"
server_count = "3"
client_count = "4"
```
Note that a pre-provisioned, publicly available AMI is used by default
(for the `us-east-1` region). To provision your own customized AMI with
[Packer](https://www.packer.io/intro/index.html), follow the instructions
[here](aws/packer/README.md). You will need to replace the AMI ID in
`terraform.tfvars` with your own. You can also modify the `region`,
`instance_type`, `server_count`, and `client_count`. At least one client and
one server are required.
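If you prefer not to edit `terraform.tfvars`, the same values can be passed on the command line with `-var`; for example, using a hypothetical key name:
```bash
$ terraform plan -var 'key_name=my-aws-key' -var 'client_count=2'
```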
Provision the cluster:
```bash
$ terraform get
$ terraform plan
$ terraform apply
```
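When `terraform apply` completes, the server and client IPs are printed as Terraform outputs. You can re-display them at any time, and tear the environment down when you are finished:
```bash
$ terraform output
$ terraform output primary_server_public_ips
$ terraform destroy    # removes all resources created above
```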
## Access the cluster
SSH to one of the servers using its public IP:
```bash
$ ssh -i /path/to/key ubuntu@PUBLIC_IP
```
Note that the AWS security group is configured by default to allow inbound SSH
(port 22), along with several web UI ports, from any address (0.0.0.0/0). This
is *not* recommended for production deployments.
Run a few basic commands to verify that Consul and Nomad are up and running
properly:
```bash
$ consul members
$ nomad server-members
$ nomad node-status
```
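If everything looks healthy, you can optionally exercise the scheduler with a throwaway job; `nomad init` writes a sample `example.nomad` job file into the current directory:
```bash
$ nomad init
$ nomad run example.nomad
$ nomad status example
$ nomad stop example    # clean up the test job
```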
Optionally, initialize and unseal Vault:
```bash
$ vault init -key-shares=1 -key-threshold=1
$ vault unseal
$ export VAULT_TOKEN=[INITIAL_ROOT_TOKEN]
```
The `vault init` command above creates a single
[Vault unseal key](https://www.vaultproject.io/docs/concepts/seal.html) for
convenience. For a production environment, it is recommended that you create at
least five unseal key shares and securely distribute them to independent
operators. The `vault init` command defaults to five key shares and a key
threshold of three. If you provisioned more than one server, the others will
become standby nodes (but should still be unsealed). You can query the active
and standby nodes independently:
```bash
$ dig active.vault.service.consul
$ dig active.vault.service.consul SRV
$ dig standby.vault.service.consul
```
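You can also check the seal state and HA mode of the local Vault instance directly:
```bash
$ vault status
```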
## Getting started with Nomad & the HashiCorp stack
See:
* [Getting Started with Nomad](https://www.nomadproject.io/intro/getting-started/jobs.html)
* [Consul integration](https://www.nomadproject.io/docs/service-discovery/index.html)
* [Vault integration](https://www.nomadproject.io/docs/vault-integration/index.html)
* [consul-template integration](https://www.nomadproject.io/docs/job-specification/template.html)
## Apache Spark integration
Nomad is well-suited for analytical workloads, given its performance
characteristics and first-class support for batch scheduling. Apache Spark is a
popular data processing engine/framework that has been architected to use
third-party schedulers. The Nomad ecosystem includes a [fork that natively
integrates Nomad with Spark](https://github.com/hashicorp/nomad-spark). A
detailed walkthrough of the integration is included [here](examples/spark/README.md).

terraform/aws/Vagrantfile
@@ -0,0 +1,55 @@
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure(2) do |config|
config.vm.box = "ubuntu/trusty64"
config.vm.provision "shell", inline: <<-SHELL
cd /tmp
PACKERVERSION=1.0.0
PACKERDOWNLOAD=https://releases.hashicorp.com/packer/${PACKERVERSION}/packer_${PACKERVERSION}_linux_amd64.zip
TERRAFORMVERSION=0.9.8
TERRAFORMDOWNLOAD=https://releases.hashicorp.com/terraform/${TERRAFORMVERSION}/terraform_${TERRAFORMVERSION}_linux_amd64.zip
echo "Dependencies..."
sudo apt-get install -y unzip tree
# Disable the firewall
sudo ufw disable
## Packer
echo Fetching Packer...
curl -L $PACKERDOWNLOAD > packer.zip
echo Installing Packer...
unzip packer.zip -d /usr/local/bin
chmod 0755 /usr/local/bin/packer
chown root:root /usr/local/bin/packer
## Terraform
echo Fetching Terraform...
curl -L $TERRAFORMDOWNLOAD > terraform.zip
echo Installing Terraform...
unzip terraform.zip -d /usr/local/bin
chmod 0755 /usr/local/bin/terraform
chown root:root /usr/local/bin/terraform
## Docker
echo deb https://apt.dockerproject.org/repo ubuntu-`lsb_release -c | awk '{print $2}'` main | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
sudo apt-get update
sudo apt-get install -y docker-engine
SHELL
config.vm.synced_folder "../aws/", "/home/vagrant/aws", owner: "vagrant", group: "vagrant"
config.vm.synced_folder "../shared/", "/home/vagrant/shared", owner: "vagrant", group: "vagrant"
config.vm.synced_folder "../examples/", "/home/vagrant/examples", owner: "vagrant", group: "vagrant"
config.vm.provider "virtualbox" do |vb|
vb.memory = "2048"
vb.cpus = 2
end
end

terraform/aws/env/us-east/main.tf
@@ -0,0 +1,60 @@
variable "region" {
description = "The AWS region to deploy to."
default = "us-east-1"
}
variable "ami" {}
variable "instance_type" {
description = "The AWS instance type to use for both clients and servers."
default = "t2.medium"
}
variable "key_name" {}
variable "server_count" {
description = "The number of servers to provision."
default = "3"
}
variable "client_count" {
description = "The number of clients to provision."
default = "4"
}
variable "cluster_tag_value" {
description = "Used by Consul to automatically form a cluster."
default = "auto-join"
}
provider "aws" {
region = "${var.region}"
}
module "hashistack" {
source = "../../modules/hashistack"
region = "${var.region}"
ami = "${var.ami}"
instance_type = "${var.instance_type}"
key_name = "${var.key_name}"
server_count = "${var.server_count}"
client_count = "${var.client_count}"
cluster_tag_value = "${var.cluster_tag_value}"
}
output "primary_server_private_ips" {
value = "${module.hashistack.primary_server_private_ips}"
}
output "primary_server_public_ips" {
value = "${module.hashistack.primary_server_public_ips}"
}
output "client_private_ips" {
value = "${module.hashistack.client_private_ips}"
}
output "client_public_ips" {
value = "${module.hashistack.client_public_ips}"
}

@@ -0,0 +1,7 @@
region = "us-east-1"
ami = "ami-76787e60"
instance_type = "t2.medium"
key_name = "KEY_NAME"
server_count = "1"
client_count = "4"
cluster_tag_value = "auto-join"

@@ -0,0 +1,6 @@
#!/bin/bash
set -e
exec > >(sudo tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
sudo bash /ops/shared/scripts/client.sh "${region}" "${cluster_tag_value}"

@@ -0,0 +1,6 @@
#!/bin/bash
set -e
exec > >(sudo tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
sudo bash /ops/shared/scripts/server.sh "${server_count}" "${region}" "${cluster_tag_value}"

@@ -0,0 +1,173 @@
variable "region" {}
variable "ami" {}
variable "instance_type" {}
variable "key_name" {}
variable "server_count" {}
variable "client_count" {}
variable "cluster_tag_value" {}
data "aws_vpc" "default" {
default = true
}
resource "aws_security_group" "primary" {
name = "hashistack"
vpc_id = "${data.aws_vpc.default.id}"
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# HDFS NameNode UI
ingress {
from_port = 50070
to_port = 50070
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# HDFS DataNode UI
ingress {
from_port = 50075
to_port = 50075
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
# Spark history server UI
ingress {
from_port = 18080
to_port = 18080
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 0
to_port = 0
protocol = "-1"
self = true
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
data "template_file" "user_data_server_primary" {
template = "${file("${path.root}/user-data-server.sh")}"
vars {
server_count = "${var.server_count}"
region = "${var.region}"
cluster_tag_value = "${var.cluster_tag_value}"
}
}
data "template_file" "user_data_client" {
template = "${file("${path.root}/user-data-client.sh")}"
vars {
region = "${var.region}"
cluster_tag_value = "${var.cluster_tag_value}"
}
}
resource "aws_instance" "primary" {
ami = "${var.ami}"
instance_type = "${var.instance_type}"
key_name = "${var.key_name}"
vpc_security_group_ids = ["${aws_security_group.primary.id}"]
count = "${var.server_count}"
#Instance tags
tags {
Name = "hashistack-server-${count.index}"
ConsulAutoJoin = "${var.cluster_tag_value}"
}
user_data = "${data.template_file.user_data_server_primary.rendered}"
iam_instance_profile = "${aws_iam_instance_profile.instance_profile.name}"
}
resource "aws_instance" "client" {
ami = "${var.ami}"
instance_type = "${var.instance_type}"
key_name = "${var.key_name}"
vpc_security_group_ids = ["${aws_security_group.primary.id}"]
count = "${var.client_count}"
depends_on = ["aws_instance.primary"]
#Instance tags
tags {
Name = "hashistack-client-${count.index}"
ConsulAutoJoin = "${var.cluster_tag_value}"
}
user_data = "${data.template_file.user_data_client.rendered}"
iam_instance_profile = "${aws_iam_instance_profile.instance_profile.name}"
}
resource "aws_iam_instance_profile" "instance_profile" {
name_prefix = "hashistack"
role = "${aws_iam_role.instance_role.name}"
}
resource "aws_iam_role" "instance_role" {
name_prefix = "hashistack"
assume_role_policy = "${data.aws_iam_policy_document.instance_role.json}"
}
data "aws_iam_policy_document" "instance_role" {
statement {
effect = "Allow"
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["ec2.amazonaws.com"]
}
}
}
resource "aws_iam_role_policy" "auto_discover_cluster" {
name = "auto-discover-cluster"
role = "${aws_iam_role.instance_role.id}"
policy = "${data.aws_iam_policy_document.auto_discover_cluster.json}"
}
data "aws_iam_policy_document" "auto_discover_cluster" {
statement {
effect = "Allow"
actions = [
"ec2:DescribeInstances",
"ec2:DescribeTags",
"autoscaling:DescribeAutoScalingGroups",
]
resources = ["*"]
}
}
output "primary_server_private_ips" {
value = ["${aws_instance.primary.*.private_ip}"]
}
output "primary_server_public_ips" {
value = ["${aws_instance.primary.*.public_ip}"]
}
output "client_private_ips" {
value = ["${aws_instance.client.*.private_ip}"]
}
output "client_public_ips" {
value = ["${aws_instance.client.*.public_ip}"]
}

@@ -0,0 +1,31 @@
# Build an Amazon machine image with Packer
[Packer](https://www.packer.io/intro/index.html) is HashiCorp's open source tool
for creating identical machine images for multiple platforms from a single
source configuration. The Terraform templates included in this repo reference a
publicly available Amazon machine image (AMI) by default. The Packer build
configuration used to create the public AMI is included [here](./packer.json).
If you wish to customize it and build your own private AMI, follow the
instructions below.
## Pre-requisites
See the pre-requisites listed [here](../../README.md). If you did not use the
included `Vagrantfile` to bootstrap a staging environment, you will need to
[install Packer](https://www.packer.io/intro/getting-started/install.html).
Set environment variables for your AWS credentials if you haven't already:
```bash
$ export AWS_ACCESS_KEY_ID=[ACCESS_KEY_ID]
$ export AWS_SECRET_ACCESS_KEY=[SECRET_ACCESS_KEY]
```
After you make your modifications to `packer.json`, execute the following
command to build the AMI:
```bash
$ packer build packer.json
```
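To catch template syntax errors without launching a full build, you can run `packer validate` first:
```bash
$ packer validate packer.json
```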
Don't forget to copy the AMI ID to your [terraform.tfvars file](../env/us-east/terraform.tfvars).

@@ -0,0 +1,33 @@
{
"builders": [{
"type": "amazon-ebs",
"region": "us-east-1",
"source_ami": "ami-80861296",
"instance_type": "t2.medium",
"ssh_username": "ubuntu",
"ami_name": "nomad-packer {{timestamp}}",
"ami_groups": ["all"]
}],
"provisioners": [
{
"type": "shell",
"inline": [
"sudo mkdir /ops",
"sudo chmod 777 /ops"
]
},
{
"type": "file",
"source": "../../shared",
"destination": "/ops"
},
{
"type": "file",
"source": "../../examples",
"destination": "/ops"
},
{
"type": "shell",
"script": "../../shared/scripts/setup.sh"
}]
}

@@ -0,0 +1,7 @@
# Examples
The examples included here are designed to introduce specific features and
provide a basic learning experience. The examples subdirectory is automatically
provisioned into the home directory of the VMs in your cloud environment.
- [Spark Integration](spark/README.md)

@@ -0,0 +1,193 @@
# Nomad / Spark integration
The Nomad ecosystem includes a fork of Apache Spark that natively supports using
a Nomad cluster to run Spark applications. When running on Nomad, the Spark
executors that run Spark tasks for your application, and optionally the
application driver itself, run as Nomad tasks in a Nomad job. See the
[usage guide](./RunningSparkOnNomad.pdf) for more details.
Clusters provisioned with Nomad's Terraform templates are automatically
configured to run the Spark integration. The sample job files found here are
also provisioned onto every client and server.
## Setup
To give the Spark integration a test drive, provision a cluster and SSH to any
one of the clients or servers (the public IPs are displayed when the Terraform
provisioning process completes):
```bash
$ ssh -i /path/to/key ubuntu@PUBLIC_IP
```
The Spark history server and several of the sample Spark jobs below require
HDFS. Using the included job file, deploy an HDFS cluster on Nomad:
```bash
$ cd $HOME/examples/spark
$ nomad run hdfs.nomad
$ nomad status hdfs
```
When the allocations are all in the `running` state (as shown by `nomad status
hdfs`), query Consul to verify that the HDFS service has been registered:
```bash
$ dig hdfs.service.consul
```
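An `SRV` query additionally returns the registered IPC port for the service:
```bash
$ dig hdfs.service.consul SRV
```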
Next, create directories and files in HDFS for use by the history server and the
sample Spark jobs:
```bash
$ hdfs dfs -mkdir /foo
$ hdfs dfs -put /var/log/apt/history.log /foo
$ hdfs dfs -mkdir /spark-events
$ hdfs dfs -ls /
```
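Optionally, read the uploaded file back to confirm it landed where expected:
```bash
$ hdfs dfs -cat /foo/history.log | head
```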
Finally, deploy the Spark history server:
```bash
$ nomad run spark-history-server-hdfs.nomad
```
You can get the private IP for the history server with a Consul DNS lookup:
```bash
$ dig spark-history.service.consul
```
Cross-reference the private IP with the `terraform apply` output to get the
corresponding public IP. You can access the history server at
`http://PUBLIC_IP:18080`.
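From your workstation, a quick reachability check against the UI (substituting the public IP you identified) looks like this:
```bash
$ curl -I http://PUBLIC_IP:18080
```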
## Sample Spark jobs
The sample `spark-submit` commands listed below demonstrate several of the
official Spark examples. Sessions with `spark-sql`, `spark-shell`, and `pyspark`
are also included. The commands can be executed from any client or server.
You can monitor the status of a Spark job in a second terminal session with:
```bash
$ nomad status
$ nomad status JOB_ID
$ nomad alloc-status DRIVER_ALLOC_ID
$ nomad logs DRIVER_ALLOC_ID
```
To view the output of the job, run `nomad logs` for the driver's Allocation ID.
### SparkPi (Java)
```bash
spark-submit \
--class org.apache.spark.examples.JavaSparkPi \
--master nomad \
--deploy-mode cluster \
--conf spark.executor.instances=4 \
--conf spark.nomad.cluster.monitorUntil=complete \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=hdfs://hdfs.service.consul/spark-events \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
https://s3.amazonaws.com/nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```
### Word count (Java)
```bash
spark-submit \
--class org.apache.spark.examples.JavaWordCount \
--master nomad \
--deploy-mode cluster \
--conf spark.executor.instances=4 \
--conf spark.nomad.cluster.monitorUntil=complete \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=hdfs://hdfs.service.consul/spark-events \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
https://s3.amazonaws.com/nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar \
hdfs://hdfs.service.consul/foo/history.log
```
### DFSReadWriteTest (Scala)
```bash
spark-submit \
--class org.apache.spark.examples.DFSReadWriteTest \
--master nomad \
--deploy-mode cluster \
--conf spark.executor.instances=4 \
--conf spark.nomad.cluster.monitorUntil=complete \
--conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=hdfs://hdfs.service.consul/spark-events \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
https://s3.amazonaws.com/nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar \
/home/ubuntu/.bashrc hdfs://hdfs.service.consul/foo
```
### spark-shell
Start the shell:
```bash
spark-shell \
--master nomad \
--conf spark.executor.instances=4 \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz
```
Run a few commands:
```bash
$ spark.version
$ val data = 1 to 10000
$ val distData = sc.parallelize(data)
$ distData.filter(_ < 10).collect()
```
### spark-sql
Start the shell:
```bash
spark-sql \
--master nomad \
--conf spark.executor.instances=4 \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
jars/spark-sql_2.11-2.1.0-SNAPSHOT.jar
```
Run a few commands:
```bash
$ CREATE TEMPORARY VIEW usersTable
USING org.apache.spark.sql.parquet
OPTIONS (
path "/usr/local/bin/spark/examples/src/main/resources/users.parquet"
);
$ SELECT * FROM usersTable;
```
### pyspark
Start the shell:
```bash
pyspark \
--master nomad \
--conf spark.executor.instances=4 \
--conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz
```
Run a few commands:
```bash
$ df = spark.read.json("/usr/local/bin/spark/examples/src/main/resources/people.json")
$ df.show()
$ df.printSchema()
$ df.createOrReplaceTempView("people")
$ sqlDF = spark.sql("SELECT * FROM people")
$ sqlDF.show()
```
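When you are finished experimenting, stop the long-running jobs started above to return the cluster to an idle state:
```bash
$ nomad stop spark-history-server
$ nomad stop hdfs
```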

terraform/examples/spark/RunningSparkOnNomad.pdf (Stored with Git LFS)
Binary file not shown.

@@ -0,0 +1,9 @@
FROM openjdk:7
ENV HADOOP_VERSION 2.7.3
RUN wget -O - http://apache.mirror.iphh.net/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz | tar xz -C /usr/local/
ENV HADOOP_PREFIX /usr/local/hadoop-$HADOOP_VERSION
ENV PATH $PATH:$HADOOP_PREFIX/bin
COPY core-site.xml $HADOOP_PREFIX/etc/hadoop/

@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdfs.service.consul/</value>
</property>
</configuration>

@@ -0,0 +1,7 @@
FROM openjdk:7-jre
RUN curl https://spark-nomad.s3.amazonaws.com/spark-2.1.1-bin-nomad.tgz | tar -xzC /tmp
RUN mv /tmp/spark* /opt/spark
ENV SPARK_HOME /opt/spark
ENV PATH $PATH:$SPARK_HOME/bin

@@ -0,0 +1,91 @@
job "hdfs" {
datacenters = [ "dc1" ]
group "NameNode" {
constraint {
operator = "distinct_hosts"
value = "true"
}
task "NameNode" {
driver = "docker"
config {
image = "rcgenova/hadoop-2.7.3"
command = "bash"
args = [ "-c", "hdfs namenode -format && exec hdfs namenode -D fs.defaultFS=hdfs://${NOMAD_ADDR_ipc}/ -D dfs.permissions.enabled=false" ]
network_mode = "host"
port_map {
ipc = 8020
ui = 50070
}
}
resources {
memory = 500
network {
port "ipc" {
static = "8020"
}
port "ui" {
static = "50070"
}
}
}
service {
name = "hdfs"
port = "ipc"
}
}
}
group "DataNode" {
count = 3
constraint {
operator = "distinct_hosts"
value = "true"
}
task "DataNode" {
driver = "docker"
config {
network_mode = "host"
image = "rcgenova/hadoop-2.7.3"
args = [ "hdfs", "datanode"
, "-D", "fs.defaultFS=hdfs://hdfs.service.consul/"
, "-D", "dfs.permissions.enabled=false"
]
port_map {
data = 50010
ipc = 50020
ui = 50075
}
}
resources {
memory = 500
network {
port "data" {
static = "50010"
}
port "ipc" {
static = "50020"
}
port "ui" {
static = "50075"
}
}
}
}
}
}

@@ -0,0 +1,45 @@
job "spark-history-server" {
datacenters = ["dc1"]
type = "service"
group "server" {
count = 1
task "history-server" {
driver = "docker"
config {
image = "barnardb/spark"
command = "/spark/spark-2.1.0-bin-nomad/bin/spark-class"
args = [ "org.apache.spark.deploy.history.HistoryServer" ]
port_map {
ui = 18080
}
network_mode = "host"
}
env {
"SPARK_HISTORY_OPTS" = "-Dspark.history.fs.logDirectory=hdfs://hdfs.service.consul/spark-events/"
"SPARK_PUBLIC_DNS" = "spark-history.service.consul"
}
resources {
cpu = 500
memory = 500
network {
mbits = 250
port "ui" {
static = 18080
}
}
}
service {
name = "spark-history"
tags = ["spark", "ui"]
port = "ui"
}
}
}
}

@@ -0,0 +1,17 @@
{
"log_level": "INFO",
"server": true,
"data_dir": "/opt/consul/data",
"bind_addr": "0.0.0.0",
"client_addr": "0.0.0.0",
"advertise_addr": "IP_ADDRESS",
"bootstrap_expect": SERVER_COUNT,
"service": {
"name": "consul"
},
"retry_join_ec2": {
"tag_key": "ConsulAutoJoin",
"tag_value": "CLUSTER_TAG_VALUE",
"region": "REGION"
}
}

@@ -0,0 +1,12 @@
{
"log_level": "INFO",
"data_dir": "/opt/consul/data",
"bind_addr": "0.0.0.0",
"client_addr": "0.0.0.0",
"advertise_addr": "IP_ADDRESS",
"retry_join_ec2": {
"tag_key": "ConsulAutoJoin",
"tag_value": "CLUSTER_TAG_VALUE",
"region": "REGION"
}
}

@@ -0,0 +1,24 @@
description "Consul"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
console log
script
if [ -f "/etc/service/consul" ]; then
. /etc/service/consul
fi
# Allow Consul to use privileged ports
export CONSUL_ALLOW_PRIVILEGED_PORTS=true
exec /usr/local/bin/consul agent \
-config-dir="/etc/consul.d" \
-dns-port="53" \
-recursor="172.31.0.2" \
\$${CONSUL_FLAGS} \
>>/var/log/consul.log 2>&1
end script

@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdfs.service.consul/</value>
</property>
</configuration>

@@ -0,0 +1,23 @@
data_dir = "/opt/nomad/data"
bind_addr = "IP_ADDRESS"
# Enable the server
server {
enabled = true
bootstrap_expect = SERVER_COUNT
}
name = "nomad@IP_ADDRESS"
consul {
address = "IP_ADDRESS:8500"
}
vault {
enabled = false
address = "vault.service.consul"
task_token_ttl = "1h"
create_from_role = "nomad-cluster"
token = ""
}

@@ -0,0 +1,17 @@
data_dir = "/opt/nomad/data"
bind_addr = "IP_ADDRESS"
name = "nomad@IP_ADDRESS"
# Enable the client
client {
enabled = true
}
consul {
address = "127.0.0.1:8500"
}
vault {
enabled = true
address = "vault.service.consul"
}

@@ -0,0 +1,19 @@
description "Nomad"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
console log
script
if [ -f "/etc/service/nomad" ]; then
. /etc/service/nomad
fi
exec /usr/local/bin/nomad agent \
-config="/etc/nomad.d/nomad.hcl" \
\$${NOMAD_FLAGS} \
>>/var/log/nomad.log 2>&1
end script

@@ -0,0 +1,12 @@
backend "consul" {
path = "vault/"
address = "IP_ADDRESS:8500"
cluster_addr = "https://IP_ADDRESS:8201"
redirect_addr = "http://IP_ADDRESS:8200"
}
listener "tcp" {
address = "IP_ADDRESS:8200"
cluster_address = "IP_ADDRESS:8201"
tls_disable = 1
}

@@ -0,0 +1,22 @@
description "Vault"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
console log
script
if [ -f "/etc/service/vault" ]; then
. /etc/service/vault
fi
# Make sure to use all our CPUs, because Vault can block a scheduler thread
export GOMAXPROCS=`nproc`
exec /usr/local/bin/vault server \
-config="/etc/vault.d/vault.hcl" \
\$${VAULT_FLAGS} \
>>/var/log/vault.log 2>&1
end script

@@ -0,0 +1,64 @@
#!/bin/bash
set -e
CONFIGDIR=/ops/shared/config
CONSULCONFIGDIR=/etc/consul.d
NOMADCONFIGDIR=/etc/nomad.d
HADOOP_VERSION=hadoop-2.7.3
HADOOPCONFIGDIR=/usr/local/$HADOOP_VERSION/etc/hadoop
HOME_DIR=ubuntu
# Wait for network
sleep 15
IP_ADDRESS=$(curl http://instance-data/latest/meta-data/local-ipv4)
DOCKER_BRIDGE_IP_ADDRESS=(`ifconfig docker0 2>/dev/null|awk '/inet addr:/ {print $2}'|sed 's/addr://'`)
REGION=$1
CLUSTER_TAG_VALUE=$2
# Consul
sed -i "s/IP_ADDRESS/$IP_ADDRESS/g" $CONFIGDIR/consul_client.json
sed -i "s/REGION/$REGION/g" $CONFIGDIR/consul_client.json
sed -i "s/CLUSTER_TAG_VALUE/$CLUSTER_TAG_VALUE/g" $CONFIGDIR/consul_client.json
sudo cp $CONFIGDIR/consul_client.json $CONSULCONFIGDIR/consul.json
sudo cp $CONFIGDIR/consul_upstart.conf /etc/init/consul.conf
sudo service consul start
sleep 10
# Nomad
sed -i "s/IP_ADDRESS/$IP_ADDRESS/g" $CONFIGDIR/nomad_client.hcl
sudo cp $CONFIGDIR/nomad_client.hcl $NOMADCONFIGDIR/nomad.hcl
sudo cp $CONFIGDIR/nomad_upstart.conf /etc/init/nomad.conf
sudo service nomad start
sleep 10
export NOMAD_ADDR=http://$IP_ADDRESS:4646
# Add hostname to /etc/hosts
echo "127.0.0.1 $(hostname)" | sudo tee --append /etc/hosts
# Add Docker bridge network IP to /etc/resolv.conf (at the top)
echo "nameserver $DOCKER_BRIDGE_IP_ADDRESS" | sudo tee /etc/resolv.conf.new
cat /etc/resolv.conf | sudo tee --append /etc/resolv.conf.new
sudo mv /etc/resolv.conf.new /etc/resolv.conf
# Hadoop config file to enable HDFS CLI
sudo cp $CONFIGDIR/core-site.xml $HADOOPCONFIGDIR
# Move examples directory to $HOME
sudo mv /ops/examples /home/$HOME_DIR
sudo chown -R $HOME_DIR:$HOME_DIR /home/$HOME_DIR/examples
sudo chmod -R 775 /home/$HOME_DIR/examples
# Set env vars for tool CLIs
echo "export VAULT_ADDR=http://$IP_ADDRESS:8200" | sudo tee --append /home/$HOME_DIR/.bashrc
echo "export NOMAD_ADDR=http://$IP_ADDRESS:4646" | sudo tee --append /home/$HOME_DIR/.bashrc
echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre" | sudo tee --append /home/$HOME_DIR/.bashrc
# Update PATH
echo "export PATH=$PATH:/usr/local/bin/spark/bin:/usr/local/$HADOOP_VERSION/bin" | sudo tee --append /home/$HOME_DIR/.bashrc

@@ -0,0 +1,78 @@
#!/bin/bash
set -e
CONFIGDIR=/ops/shared/config
CONSULCONFIGDIR=/etc/consul.d
VAULTCONFIGDIR=/etc/vault.d
NOMADCONFIGDIR=/etc/nomad.d
HADOOP_VERSION=hadoop-2.7.3
HADOOPCONFIGDIR=/usr/local/$HADOOP_VERSION/etc/hadoop
HOME_DIR=ubuntu
sleep 15
IP_ADDRESS=$(curl http://instance-data/latest/meta-data/local-ipv4)
DOCKER_BRIDGE_IP_ADDRESS=(`ifconfig docker0 2>/dev/null|awk '/inet addr:/ {print $2}'|sed 's/addr://'`)
SERVER_COUNT=$1
REGION=$2
CLUSTER_TAG_VALUE=$3
# Consul
sed -i "s/IP_ADDRESS/$IP_ADDRESS/g" $CONFIGDIR/consul.json
sed -i "s/SERVER_COUNT/$SERVER_COUNT/g" $CONFIGDIR/consul.json
sed -i "s/REGION/$REGION/g" $CONFIGDIR/consul.json
sed -i "s/CLUSTER_TAG_VALUE/$CLUSTER_TAG_VALUE/g" $CONFIGDIR/consul.json
sudo cp $CONFIGDIR/consul.json $CONSULCONFIGDIR
sudo cp $CONFIGDIR/consul_upstart.conf /etc/init/consul.conf
sudo service consul start
sleep 20
export CONSUL_HTTP_ADDR=$IP_ADDRESS:8500
export CONSUL_RPC_ADDR=$IP_ADDRESS:8400
# Vault
sed -i "s/IP_ADDRESS/$IP_ADDRESS/g" $CONFIGDIR/vault.hcl
sudo cp $CONFIGDIR/vault.hcl $VAULTCONFIGDIR
sudo cp $CONFIGDIR/vault_upstart.conf /etc/init/vault.conf
sudo service vault start
# Nomad
sed -i "s/IP_ADDRESS/$IP_ADDRESS/g" $CONFIGDIR/nomad.hcl
sed -i "s/SERVER_COUNT/$SERVER_COUNT/g" $CONFIGDIR/nomad.hcl
sudo cp $CONFIGDIR/nomad.hcl $NOMADCONFIGDIR
sudo cp $CONFIGDIR/nomad_upstart.conf /etc/init/nomad.conf
sudo service nomad start
sleep 10
export NOMAD_ADDR=http://$IP_ADDRESS:4646
# Add hostname to /etc/hosts
echo "127.0.0.1 $(hostname)" | sudo tee --append /etc/hosts
# Add Docker bridge network IP to /etc/resolv.conf (at the top)
echo "nameserver $DOCKER_BRIDGE_IP_ADDRESS" | sudo tee /etc/resolv.conf.new
cat /etc/resolv.conf | sudo tee --append /etc/resolv.conf.new
sudo mv /etc/resolv.conf.new /etc/resolv.conf
# Hadoop
sudo cp $CONFIGDIR/core-site.xml $HADOOPCONFIGDIR
# Move examples directory to $HOME
sudo mv /ops/examples /home/$HOME_DIR
sudo chown -R $HOME_DIR:$HOME_DIR /home/$HOME_DIR/examples
sudo chmod -R 775 /home/$HOME_DIR/examples
# Set env vars for tool CLIs
echo "export CONSUL_RPC_ADDR=$IP_ADDRESS:8400" | sudo tee --append /home/$HOME_DIR/.bashrc
echo "export CONSUL_HTTP_ADDR=$IP_ADDRESS:8500" | sudo tee --append /home/$HOME_DIR/.bashrc
echo "export VAULT_ADDR=http://$IP_ADDRESS:8200" | sudo tee --append /home/$HOME_DIR/.bashrc
echo "export NOMAD_ADDR=http://$IP_ADDRESS:4646" | sudo tee --append /home/$HOME_DIR/.bashrc
echo "export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre" | sudo tee --append /home/$HOME_DIR/.bashrc
# Update PATH
echo "export PATH=$PATH:/usr/local/bin/spark/bin:/usr/local/$HADOOP_VERSION/bin" | sudo tee --append /home/$HOME_DIR/.bashrc

@@ -0,0 +1,106 @@
#!/bin/bash
set -e
cd /ops
CONFIGDIR=/ops/shared/config
CONSULVERSION=0.8.4
CONSULDOWNLOAD=https://releases.hashicorp.com/consul/${CONSULVERSION}/consul_${CONSULVERSION}_linux_amd64.zip
CONSULCONFIGDIR=/etc/consul.d
CONSULDIR=/opt/consul
VAULTVERSION=0.7.3
VAULTDOWNLOAD=https://releases.hashicorp.com/vault/${VAULTVERSION}/vault_${VAULTVERSION}_linux_amd64.zip
VAULTCONFIGDIR=/etc/vault.d
VAULTDIR=/opt/vault
NOMADVERSION=0.5.6
NOMADDOWNLOAD=https://releases.hashicorp.com/nomad/${NOMADVERSION}/nomad_${NOMADVERSION}_linux_amd64.zip
NOMADCONFIGDIR=/etc/nomad.d
NOMADDIR=/opt/nomad
HADOOP_VERSION=2.7.3
# Dependencies
sudo apt-get install -y software-properties-common
sudo apt-get update
sudo apt-get install -y unzip tree redis-tools jq
sudo apt-get install -y upstart-sysv
sudo update-initramfs -u
# Numpy (for Spark)
sudo apt-get install -y python-setuptools
sudo easy_install pip
sudo pip install numpy
# Disable the firewall
sudo ufw disable
# Consul
curl -L $CONSULDOWNLOAD > consul.zip
## Install
sudo unzip consul.zip -d /usr/local/bin
sudo chmod 0755 /usr/local/bin/consul
sudo chown root:root /usr/local/bin/consul
## Configure
sudo mkdir -p $CONSULCONFIGDIR
sudo chmod 755 $CONSULCONFIGDIR
sudo mkdir -p $CONSULDIR
sudo chmod 755 $CONSULDIR
# Vault
curl -L $VAULTDOWNLOAD > vault.zip
## Install
sudo unzip vault.zip -d /usr/local/bin
sudo chmod 0755 /usr/local/bin/vault
sudo chown root:root /usr/local/bin/vault
## Configure
sudo mkdir -p $VAULTCONFIGDIR
sudo chmod 755 $VAULTCONFIGDIR
sudo mkdir -p $VAULTDIR
sudo chmod 755 $VAULTDIR
# Nomad
curl -L $NOMADDOWNLOAD > nomad.zip
## Install
sudo unzip nomad.zip -d /usr/local/bin
sudo chmod 0755 /usr/local/bin/nomad
sudo chown root:root /usr/local/bin/nomad
## Configure
sudo mkdir -p $NOMADCONFIGDIR
sudo chmod 755 $NOMADCONFIGDIR
sudo mkdir -p $NOMADDIR
sudo chmod 755 $NOMADDIR
# Docker
echo deb https://apt.dockerproject.org/repo ubuntu-`lsb_release -c | awk '{print $2}'` main | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
sudo apt-get update
sudo apt-get install -y docker-engine
# Java
sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
# Spark
sudo wget -P /ops/examples/spark https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz
sudo tar -xvf /ops/examples/spark/spark-2.1.0-bin-nomad.tgz --directory /ops/examples/spark
sudo mv /ops/examples/spark/spark-2.1.0-bin-nomad /usr/local/bin/spark
sudo chown -R root:root /usr/local/bin/spark
# Hadoop (to enable the HDFS CLI)
wget -O - http://apache.mirror.iphh.net/hadoop/common/hadoop-$HADOOP_VERSION/hadoop-$HADOOP_VERSION.tar.gz | sudo tar xz -C /usr/local/