Open
Description
I was working through sparkmllib_pipeline.py in Chapter 3 and initially had trouble getting it to run. A few issues need to be addressed.
It turns out PySpark requires Java, but Chapter 3 never mentions this (it is only mentioned in Chapter 2, which is easy to forget by this point).
It also did not work with newer Java versions (such as Java 21 or 24), so I had to install Java 11 as follows:
sudo apt update
sudo apt install openjdk-11-jdk -y
Then set JAVA_HOME and put that JDK first on PATH:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
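Since the Java version was the main stumbling block, a small pre-flight check can fail fast before Spark even tries to start. This is a minimal sketch, assuming the supported versions are Java 8, 11 and 17 (the list given in the Spark 3.5 documentation; check the docs for whichever Spark version you have installed). The names java_major_version, check_java and SUPPORTED_JAVA are hypothetical, not part of the book's code:

```python
import re
import shutil
import subprocess
from typing import Iterable, Optional

# Assumption: Spark 3.5 lists Java 8/11/17 as supported; adjust for your Spark release.
SUPPORTED_JAVA = (8, 11, 17)


def java_major_version(version_text: str) -> Optional[int]:
    """Extract the major version from `java -version` output.

    Handles the legacy "1.8.0_392" scheme (-> 8) and the modern "11.0.22" scheme (-> 11).
    Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy 1.x numbering, e.g. 1.8 -> 8
    return major


def check_java(supported: Iterable[int] = SUPPORTED_JAVA) -> int:
    """Run `java -version` from PATH and raise if its major version is not supported."""
    allowed = set(supported)
    java = shutil.which("java")
    if java is None:
        raise RuntimeError("java not found on PATH; install a JDK and set JAVA_HOME")
    # `java -version` writes its banner to stderr, not stdout.
    proc = subprocess.run([java, "-version"], capture_output=True, text=True)
    major = java_major_version(proc.stderr)
    if major not in allowed:
        raise RuntimeError(f"Java {major} found; expected one of {sorted(allowed)}")
    return major
```

Calling check_java() at the top of the pipeline script would turn a confusing JVM startup failure into a clear message naming the Java version that was actually picked up.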
Also, I could not get Spark to work using:
sc = SparkContext("local", "pipelines")
spark = SparkSession.builder.getOrCreate()
After some searching, I found that this does the job:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("pipelines") \
    .master("local[*]") \
    .getOrCreate()
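The working builder snippet above can be wrapped in a small helper so every script creates its session the same way. This is a sketch only; create_session is a hypothetical name, and the PySpark import is deferred into the function so the helper can be imported on a machine where PySpark is not installed yet:

```python
def create_session(app_name: str = "pipelines", master: str = "local[*]"):
    """Create a SparkSession, or return the one already active in this process.

    The pyspark import sits inside the function so this module can be loaded
    (and inspected) even where PySpark is not installed.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master(master)  # local[*]: run locally with as many worker threads as logical cores
        .getOrCreate()
    )
```

Usage: call spark = create_session() at the top of the script. If later code needs a SparkContext (the sc from the original snippet), it can take it from spark.sparkContext instead of constructing one separately, and spark.stop() shuts everything down at the end.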
After these changes, everything worked well!
Lastly, I ran spark-submit sparkmllib_pipeline.py instead of python sparkmllib_pipeline.py, because apparently the plain Python interpreter could not launch the Spark JVM with the correct settings.
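To avoid re-exporting JAVA_HOME in every new shell, a launcher along these lines could set it and then call spark-submit. This is a minimal sketch, assuming Java 11 lives at the Ubuntu path from the export commands above and that spark-submit is already on PATH; build_submit_command, build_env and submit are hypothetical helper names:

```python
import os
import subprocess
from typing import Dict, List, Optional

# Assumption: the Java 11 location used in the export commands above (Ubuntu, amd64).
DEFAULT_JAVA_HOME = "/usr/lib/jvm/java-11-openjdk-amd64"


def build_submit_command(script: str, *options: str) -> List[str]:
    """Build a spark-submit argument list; spark-submit expects options before the script."""
    return ["spark-submit", *options, script]


def build_env(java_home: str = DEFAULT_JAVA_HOME,
              base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment, point JAVA_HOME at java_home and put its bin/ first on PATH."""
    env = dict(os.environ if base is None else base)
    env["JAVA_HOME"] = java_home
    java_bin = os.path.join(java_home, "bin")
    env["PATH"] = java_bin + os.pathsep + env["PATH"] if env.get("PATH") else java_bin
    return env


def submit(script: str, *options: str) -> int:
    """Run spark-submit with the Java 11 environment and return its exit code."""
    completed = subprocess.run(build_submit_command(script, *options), env=build_env())
    return completed.returncode
```

For example, submit("sparkmllib_pipeline.py") mirrors the command above; extra spark-submit options can be passed before the script name, e.g. submit("sparkmllib_pipeline.py", "--master", "local[*]").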