Friday, June 26, 2020

How to Install Apache Spark on Linux Ubuntu

First of all, you need to have Java already installed; if you don't, I recommend this article on installing Java. Then you will need to install Scala:
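Before going further, it's worth confirming that a Java runtime is actually on the PATH (Spark 3.0.0 runs on Java 8 or 11). A minimal check, nothing here is Spark-specific:

```shell
# Check for a Java runtime before proceeding; Spark 3.0.0
# supports Java 8 and 11.
if command -v java >/dev/null 2>&1; then
  java -version
else
  echo "Java not found: install a JDK first"
fi
```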

wget https://downloads.lightbend.com/scala/2.12.11/scala-2.12.11.tgz
tar xvf scala-2.12.11.tgz
sudo mv scala-2.12.11 /usr/local/scala
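If you want to double-check before extracting, you can list the archive contents first; the top-level directory name should match the one used in the mv step above:

```shell
# List the first few entries of the downloaded archive to confirm
# the top-level directory is scala-2.12.11.
tar tzf scala-2.12.11.tgz | head -3
```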
Once Scala is installed, download the Apache Spark binary archive and install it:

wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
tar xvf spark-3.0.0-bin-hadoop3.2.tgz
sudo mv spark-3.0.0-bin-hadoop3.2 /usr/local/spark
sudo cp /usr/local/spark/conf/log4j.properties.template /usr/local/spark/conf/log4j.properties
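Copying log4j.properties from the template lets you tune Spark's logging. One common tweak, assuming the file layout created above and GNU sed, is lowering the root logger from INFO to WARN so the shell isn't flooded with startup messages:

```shell
# Optional: quiet Spark's console output by lowering the root logger
# level (INFO -> WARN) in the copied log4j.properties. GNU sed in-place
# edit; the path assumes the install location used above.
sudo sed -i 's/^log4j.rootCategory=INFO/log4j.rootCategory=WARN/' \
  /usr/local/spark/conf/log4j.properties
```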
Afterward, to avoid changing into Spark's bin directory every time you want to interact with it, add Spark's bin directory to the PATH environment variable. To do so, open .bashrc (vim ~/.bashrc) and add the following lines at the bottom of the file:

export SCALA_HOME=/usr/local/scala
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
That's it; you only have to apply the changes by running source ~/.bashrc
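Since the exports prepend Spark's bin directory, the shell searches it before the rest of PATH. A quick sketch of how that lookup resolves after sourcing .bashrc:

```shell
# The shell searches PATH left to right, so the prepend puts Spark's
# bin directory first in line.
SPARK_HOME=/usr/local/spark
PATH="$SPARK_HOME/bin:$PATH"
echo "${PATH%%:*}"   # first PATH entry: /usr/local/spark/bin
# With Spark on the PATH, you can then start a REPL from any directory:
# spark-shell
```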
