Takenori Sato edited this page Jan 14, 2016 · 3 revisions

This will walk you through INSTALL_SPARK, which helps you set up Spark standalone on a HyperStore cluster. Standalone simply means it runs without Hadoop YARN or Mesos.

This sets up the following storage configuration:

  Cluster Sharing | Data Locality | Hadoop FileSystem | Notes
  ----------------|---------------|-------------------|--------------------
  YES             | node          | hsfs              | managed chunk size

Before proceeding further, please make sure that you have your Cloudian HyperStore cluster up and running.

Installation Procedures

  1. go to the Spark download page, select 1.5.2 and "Pre-built for Hadoop 2.6 or later", then download spark-1.5.2-bin-hadoop2.6.tgz
  2. upload spark-1.5.2-bin-hadoop2.6.tgz to your Cloudian HyperStore nodes
  3. unpack the uploaded spark-1.5.2-bin-hadoop2.6.tgz into your Spark installation directory (e.g. /opt)
  4. upload hap/build/*.jar (the following jars) to a shared location (e.g. /usr/local/lib/) on each of your nodes
    1. hadoop-aws-2.7.1.jar
    2. hap-5.2.1.jar
    3. aws-java-sdk-1.7.4.jar
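    The upload in step 4 can be sketched as a small loop. This is a minimal sketch, assuming the jars sit in a local hap/build/ directory and using hypothetical node names (cloudian-node1 through cloudian-node3); adjust both for your cluster. The leading `echo` prints each command instead of running it; remove it to actually copy.

    ```shell
    # Hypothetical node names and local jar directory; adjust for your cluster.
    NODES="cloudian-node1 cloudian-node2 cloudian-node3"
    JARS="hadoop-aws-2.7.1.jar hap-5.2.1.jar aws-java-sdk-1.7.4.jar"
    for node in $NODES; do
      for jar in $JARS; do
        # 'echo' makes this a dry run; drop it to perform the copy
        echo scp "hap/build/$jar" "root@$node:/usr/local/lib/"
      done
    done
    ```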
  5. pick one node (e.g. cloudian-node1) as the master, go to the Spark installation directory, and copy spark-env.sh from its template for modification
    [root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cp conf/spark-env.sh.template conf/spark-env.sh
    
  6. add the following classpath entries to SPARK_CLASSPATH in spark-env.sh to make the S3 FileSystem and the HyperStore FileSystem available
    e.g.  
    SPARK_CLASSPATH=/usr/local/lib/*:/opt/cloudian/conf:/opt/cloudian/lib/apache-cassandra-2.0.11.jar:/opt/cloudian/lib/apache-cassandra-clientutil-2.0.11.jar:/opt/cloudian/lib/apache-cassandra-thrift-2.0.11.jar:/opt/cloudian/lib/cassandra-driver-core-2.1.4.jar:/opt/cloudian/lib/cloudian-s3-5.2.jar:/opt/cloudian/lib/commons-pool-1.5.5.jar:/opt/cloudian/lib/jetty-util-9.2.3.v20140905.jar:/opt/cloudian/lib/hector-core-1.1-4.jar:/opt/cloudian/lib/guava-17.0.jar:/opt/cloudian/lib/jedis-2.0.1-jmx.jar:/opt/cloudian/lib/snappy-java-1.1.0.1.jar:/opt/cloudian/lib/httpclient-4.3.6.jar:/opt/cloudian/lib/httpcore-4.4.1.jar  
    
    
  7. copy conf/spark-defaults.conf from its template for modification
    [root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cp conf/spark-defaults.conf.template conf/spark-defaults.conf  
    
  8. add the hsfs (s3a) related properties
    [root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# cat conf/spark-defaults.conf | grep hadoop  
    spark.hadoop.fs.s3a.access.key	ACCESS_KEY  
    spark.hadoop.fs.s3a.secret.key	SECRET_KEY  
    spark.hadoop.fs.s3a.connection.ssl.enabled	true|false  
    spark.hadoop.fs.s3a.endpoint		S3.DOMAIN.COM:S3_PORT  
    spark.hadoop.fs.hsfs.impl		com.cloudian.hadoop.HyperStoreFileSystem  
    
    
  9. copy the modified spark-env.sh and spark-defaults.conf to the other nodes
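    Step 9 can also be scripted. This is a minimal sketch, assuming the installation path from the examples above and hypothetical worker node names; adjust both for your cluster. As before, the leading `echo` makes it a dry run.

    ```shell
    # Hypothetical worker names; SPARK_HOME matches the install path used above.
    SPARK_HOME=/opt/spark-1.5.2-bin-hadoop2.6
    WORKERS="cloudian-node2 cloudian-node3 cloudian-node6"
    for node in $WORKERS; do
      for f in spark-env.sh spark-defaults.conf; do
        # drop 'echo' to actually copy the modified config files
        echo scp "$SPARK_HOME/conf/$f" "root@$node:$SPARK_HOME/conf/"
      done
    done
    ```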
  10. start the Spark master on one node (cloudian-node1), then start a slave service on every node, including the master node
    e.g. master on cloudian-node1
    [root@cloudian-node1 spark-1.5.2-bin-hadoop2.6]# sbin/start-master.sh
    
    e.g. slave (2 cores, 2 GB) on cloudian-node6
    [root@cloudian-node6 spark-1.5.2-bin-hadoop2.6]# sbin/start-slave.sh -c 2 -m 2g spark://cloudian-node1:7077
    
  11. check the status of the Spark cluster on the Spark master UI
    e.g. http://cloudian-node1:8080/  
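    Besides opening the UI in a browser, a quick reachability check can be done from the command line. A minimal sketch, assuming the default UI port 8080 and the hypothetical master hostname cloudian-node1 used in the steps above:

    ```shell
    # Check that the Spark master web UI answers on the default port.
    MASTER_HOST=cloudian-node1
    if curl -sf --max-time 5 "http://$MASTER_HOST:8080/" >/dev/null 2>&1; then
      echo "master UI: reachable"
    else
      echo "master UI: not reachable"
    fi
    ```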
    
