Installation
Takenori Sato edited this page Jan 14, 2016
This will walk you through INSTALL_SPARK, which helps you set up Spark standalone on a HyperStore cluster. Standalone simply means it runs without Hadoop YARN or Mesos.
This will set up the following storage system.
| Cluster Sharing | Data Locality | Hadoop FileSystem | Notes |
|---|---|---|---|
| YES | node | hsfs | managed chunk size |
Before proceeding further, please make sure that you have your Cloudian HyperStore cluster up and running.
- go to the Spark download page, select 1.5.2 and "Pre-built for Hadoop 2.6 or later", then download spark-1.5.2-bin-hadoop2.6.tgz.
- upload spark-1.5.2-bin-hadoop2.6.tgz to your Cloudian HyperStore nodes
- unpack the uploaded spark-1.5.2-bin-hadoop2.6.tgz into your Spark installation directory (e.g. /opt)
- upload hap/build/*.jar (see the list below) to a shared location (e.g. /usr/local/lib/) on each of your nodes
- hadoop-aws-2.7.1.jar
- hap-5.2.1.jar
- aws-java-sdk-1.7.4.jar
- pick one node (e.g. cloudian-node1) as the master, go to the Spark installation directory, and make a copy of spark-env.sh to modify
```
[root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cp conf/spark-env.sh.template conf/spark-env.sh
```

- add the following classpaths to SPARK_CLASSPATH in spark-env.sh to make the S3 FileSystem and the HyperStore FileSystem available
e.g.

```
SPARK_CLASSPATH=/usr/local/lib/*:/opt/cloudian/conf:/opt/cloudian/lib/apache-cassandra-2.0.11.jar:/opt/cloudian/lib/apache-cassandra-clientutil-2.0.11.jar:/opt/cloudian/lib/apache-cassandra-thrift-2.0.11.jar:/opt/cloudian/lib/cassandra-driver-core-2.1.4.jar:/opt/cloudian/lib/cloudian-s3-5.2.jar:/opt/cloudian/lib/commons-pool-1.5.5.jar:/opt/cloudian/lib/jetty-util-9.2.3.v20140905.jar:/opt/cloudian/lib/hector-core-1.1-4.jar:/opt/cloudian/lib/guava-17.0.jar:/opt/cloudian/lib/jedis-2.0.1-jmx.jar:/opt/cloudian/lib/snappy-java-1.1.0.1.jar:/opt/cloudian/lib/httpclient-4.3.6.jar:/opt/cloudian/lib/httpcore-4.4.1.jar
```

- copy conf/spark-defaults.conf to modify
```
[root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cp conf/spark-defaults.conf.template conf/spark-defaults.conf
```

- add hsfs (s3a) related properties
```
[root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cat conf/spark-defaults.conf | grep hadoop
spark.hadoop.fs.s3a.access.key             ACCESS_KEY
spark.hadoop.fs.s3a.secret.key             SECRET_KEY
spark.hadoop.fs.s3a.connection.ssl.enabled true|false
spark.hadoop.fs.s3a.endpoint               S3.DOMAIN.COM:S3_PORT
spark.hadoop.fs.hsfs.impl                  com.cloudian.hadoop.HyperStoreFileSystem
```

- copy the modified spark-env.sh and spark-defaults.conf to the other nodes
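For concreteness, a filled-in version of the s3a properties above might look like the following. Every value here is a hypothetical example (the keys, endpoint, and port are made up) and must be replaced with your cluster's actual settings:

```
# Hypothetical example values only -- substitute your own.
spark.hadoop.fs.s3a.access.key             AKEXAMPLEKEY12345678
spark.hadoop.fs.s3a.secret.key             exampleSecretKey0123456789abcdef
spark.hadoop.fs.s3a.connection.ssl.enabled false
spark.hadoop.fs.s3a.endpoint               s3.example.com:80
spark.hadoop.fs.hsfs.impl                  com.cloudian.hadoop.HyperStoreFileSystem
```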
- start the Spark master on one node (cloudian-node1), and slave services on every node including the master node
e.g. master on cloudian-node1

```
[root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# sbin/start-master.sh
```

e.g. slave (2 cores, 2 GB) on cloudian-node6

```
[root@cloudian-node6 spark-1.5.2-bin-hadoop2.6]# sbin/start-slave.sh -c 2 -m 2g spark://cloudian-node1:7077
```

- check the status of the Spark cluster on the Spark master UI
e.g. http://cloudian-node1:8080/