Takenori Sato edited this page Jan 14, 2016 · 3 revisions

This will walk you through INSTALL_SPARK, which helps you set up Spark standalone on a HyperStore cluster. Standalone simply means it runs without Hadoop YARN or Mesos.

This sets up the following storage configuration:

  Cluster Sharing | Data Locality | Hadoop FileSystem | Notes
  ----------------|---------------|-------------------|--------------------
  YES             | node          | hsfs              | managed chunk size

Before proceeding further, please make sure that you have your Cloudian HyperStore cluster up and running.

Installation Procedures

  1. go to the Spark download page, select 1.5.2 and "Pre-built for Hadoop 2.6 or later", then download spark-1.5.2-bin-hadoop2.6.tgz
  2. upload spark-1.5.2-bin-hadoop2.6.tgz to your Cloudian HyperStore nodes
  3. unpack the uploaded spark-1.5.2-bin-hadoop2.6.tgz into your Spark installation directory (e.g. /opt)
  4. upload hap/build/*.jar (the following jars) to a shared location (e.g. /usr/local/lib/) on each of your nodes
    1. hadoop-aws-2.7.1.jar
    2. hap-5.2.1.jar
    3. aws-java-sdk-1.7.4.jar
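    The upload in step 4 can be sketched as a small loop. This is a minimal sketch, assuming the jars sit in a local hap/build/ directory and using hypothetical node names (cloudian-node1 through cloudian-node3); adjust both for your cluster. The leading `echo` prints each command instead of running it; remove it to actually copy.

    ```shell
    # Hypothetical node names and local jar directory; adjust for your cluster.
    NODES="cloudian-node1 cloudian-node2 cloudian-node3"
    JARS="hadoop-aws-2.7.1.jar hap-5.2.1.jar aws-java-sdk-1.7.4.jar"
    for node in $NODES; do
      for jar in $JARS; do
        # 'echo' makes this a dry run; drop it to perform the copy
        echo scp "hap/build/$jar" "root@$node:/usr/local/lib/"
      done
    done
    ```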
  5. pick one node (e.g. cloudian-node1) as the master, go to the Spark installation directory, and copy spark-env.sh from its template for modification
    [root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cp conf/spark-env.sh.template conf/spark-env.sh
    
  6. add the following classpath entries to SPARK_CLASSPATH in spark-env.sh to make the S3 FileSystem and the HyperStore FileSystem available
    e.g.  
    SPARK_CLASSPATH=/usr/local/lib/*:/opt/cloudian/conf:/opt/cloudian/lib/apache-cassandra-2.0.11.jar:/opt/cloudian/lib/apache-cassandra-clientutil-2.0.11.jar:/opt/cloudian/lib/apache-cassandra-thrift-2.0.11.jar:/opt/cloudian/lib/cassandra-driver-core-2.1.4.jar:/opt/cloudian/lib/cloudian-s3-5.2.jar:/opt/cloudian/lib/commons-pool-1.5.5.jar:/opt/cloudian/lib/jetty-util-9.2.3.v20140905.jar:/opt/cloudian/lib/hector-core-1.1-4.jar:/opt/cloudian/lib/guava-17.0.jar:/opt/cloudian/lib/jedis-2.0.1-jmx.jar:/opt/cloudian/lib/snappy-java-1.1.0.1.jar:/opt/cloudian/lib/httpclient-4.3.6.jar:/opt/cloudian/lib/httpcore-4.4.1.jar  
    
    
  7. copy conf/spark-defaults.conf from its template for modification
    [root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cp conf/spark-defaults.conf.template conf/spark-defaults.conf  
    
  8. add the hsfs (s3a) related properties
    [root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cat conf/spark-defaults.conf | grep hadoop  
    spark.hadoop.fs.s3a.access.key	ACCESS_KEY  
    spark.hadoop.fs.s3a.secret.key	SECRET_KEY  
    spark.hadoop.fs.s3a.connection.ssl.enabled	true|false  
    spark.hadoop.fs.s3a.endpoint		S3.DOMAIN.COM:S3_PORT  
    spark.hadoop.fs.hsfs.impl		com.cloudian.hadoop.HyperStoreFileSystem  
    
    
  9. copy the modified spark-env.sh and spark-defaults.conf to the other nodes
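    Step 9 can also be scripted. This is a minimal sketch, assuming the installation path from the examples above and hypothetical worker node names; adjust both for your cluster. As before, the leading `echo` makes it a dry run.

    ```shell
    # Hypothetical worker names; SPARK_HOME matches the install path used above.
    SPARK_HOME=/opt/spark-1.5.2-bin-hadoop2.6
    WORKERS="cloudian-node2 cloudian-node3 cloudian-node6"
    for node in $WORKERS; do
      for f in spark-env.sh spark-defaults.conf; do
        # drop 'echo' to actually copy the modified config files
        echo scp "$SPARK_HOME/conf/$f" "root@$node:$SPARK_HOME/conf/"
      done
    done
    ```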
  10. start the Spark master on one node (cloudian-node1), then start a slave service on every node, including the master node
    e.g. master on cloudian-node1
    [root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# sbin/start-master.sh
    
    e.g. slave (2 cores, 2 GB) on cloudian-node6
    [root@cloudian-node6 spark-1.5.2-bin-hadoop2.6]# sbin/start-slave.sh -c 2 -m 2g spark://cloudian-node1:7077
    
  11. check the status of the Spark cluster on the Spark master UI
    e.g. http://cloudian-node1:8080/  
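    Besides opening the UI in a browser, a quick reachability check can be done from the command line. A minimal sketch, assuming the default UI port 8080 and the hypothetical master hostname cloudian-node1 used in the steps above:

    ```shell
    # Check that the Spark master web UI answers on the default port.
    MASTER_HOST=cloudian-node1
    if curl -sf --max-time 5 "http://$MASTER_HOST:8080/" >/dev/null 2>&1; then
      echo "master UI: reachable"
    else
      echo "master UI: not reachable"
    fi
    ```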
    
