
# Spark integration

`cd` to `examples/spark/spark` on one of the servers. The `spark/spark` subdirectory will be created when the cluster is provisioned; it is the root of a Spark distribution, and all of the commands below are run from it (hence the `./bin/...` paths).

You can use the `spark-submit` commands below to run several of the official Spark examples against Nomad. You can monitor Nomad's status simultaneously with:

```bash
$ nomad status
$ nomad status [JOB_ID]
$ nomad alloc-status [ALLOC_ID]
```
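Depending on your Nomad version, `nomad logs [ALLOC_ID]` can also stream a task's output directly, which is handy for watching the Spark driver while an example runs.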

## SparkPi

### Java

```bash
$ ./bin/spark-submit \
    --class org.apache.spark.examples.JavaSparkPi \
    --master nomad \
    --conf spark.executor.instances=8 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-SNAPSHOT-bin-nomad-spark.tgz \
    examples/jars/spark-examples*.jar 100
```

### Python

```bash
$ ./bin/spark-submit \
    --master nomad \
    --conf spark.executor.instances=8 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-SNAPSHOT-bin-nomad-spark.tgz \
    examples/src/main/python/pi.py 100
```

### Scala

```bash
$ ./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master nomad \
    --conf spark.executor.instances=8 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-SNAPSHOT-bin-nomad-spark.tgz \
    examples/jars/spark-examples*.jar 100
```
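All three variants compute the same thing: a Monte Carlo estimate of π, where the trailing `100` sets the number of partitions (and therefore Spark tasks) that Nomad schedules across the cluster. For reference, a minimal sketch of that logic, runnable from the `pyspark` shell described below (`sc` is the shell's active SparkContext; this is illustrative, not the bundled example itself):

```python
# Minimal sketch of the Monte Carlo estimate the SparkPi examples perform.
# Assumes a pyspark shell, where `sc` is the active SparkContext.
from random import random
from operator import add

partitions = 100              # matches the trailing `100` argument above
n = 100000 * partitions      # total samples to draw

def inside(_):
    # Sample a point in the 2x2 square; count it if it lands in the unit circle.
    x, y = random() * 2 - 1, random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0

count = sc.parallelize(range(1, n + 1), partitions).map(inside).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
```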

## Machine Learning

### Python

```bash
$ ./bin/spark-submit \
    --master nomad \
    --conf spark.executor.instances=8 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-SNAPSHOT-bin-nomad-spark.tgz \
    examples/src/main/python/ml/logistic_regression_with_elastic_net.py
```
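This example fits an elastic-net-regularized logistic regression on sample LIBSVM data that ships with Spark. A rough sketch of what it does, assuming a live `spark` session and the standard distribution layout (`data/mllib/`):

```python
# Sketch of the elastic-net logistic regression the example performs.
from pyspark.ml.classification import LogisticRegression

# Sample data shipped with the Spark distribution.
training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

# elasticNetParam blends L2 and L1 regularization (0 = ridge, 1 = lasso).
lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
model = lr.fit(training)

print("Coefficients: " + str(model.coefficients))
print("Intercept: " + str(model.intercept))
```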

### Scala

```bash
$ ./bin/spark-submit \
    --class org.apache.spark.examples.SparkLR \
    --master nomad \
    --conf spark.executor.instances=8 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-SNAPSHOT-bin-nomad-spark.tgz \
    examples/jars/spark-examples*.jar
```

## pyspark

```bash
$ ./bin/pyspark \
    --master nomad \
    --conf spark.executor.instances=8 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-SNAPSHOT-bin-nomad-spark.tgz
```

```python
df = spark.read.json("examples/src/main/resources/people.json")
df.show()
df.printSchema()
df.createOrReplaceTempView("people")
sqlDF = spark.sql("SELECT * FROM people")
sqlDF.show()
```
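The registered temp view can then be queried with ordinary SQL. For example (`people.json` is a sample file shipped with Spark whose records have `name` and `age` fields):

```python
# Filter the temp view registered above with a SQL predicate.
namesDF = spark.sql("SELECT name FROM people WHERE age IS NOT NULL")
namesDF.show()
```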

## spark-shell

```bash
$ ./bin/spark-shell \
    --master nomad \
    --conf spark.executor.instances=8 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-SNAPSHOT-bin-nomad-spark.tgz
```

```scala
:type spark
spark.version

val data = 1 to 10000
val distData = sc.parallelize(data)
distData.filter(_ < 10).collect()
```

## spark-sql

```bash
$ ./bin/spark-sql \
    --master nomad \
    --conf spark.executor.instances=8 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/rcgenova-nomad-spark/spark-2.1.0-SNAPSHOT-bin-nomad-spark.tgz \
    jars/spark-sql_2.11-2.1.0-SNAPSHOT.jar
```
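At the resulting `spark-sql>` prompt you can issue ordinary SQL statements (for example, `SHOW TABLES;`); as with the shells above, the queries execute on Spark executors that Nomad schedules across the cluster.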