What to do to install and maintain the IS pipeline
# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install Java
brew install java
# Install Scala
brew install scala
# Install Spark
# Check that the installed version is the same as the one on AWS EMR
# https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark.html
brew install apache-spark
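
The version check mentioned above can be scripted. This is a minimal sketch, not part of the original setup: the helper names `extract_version` and `check_spark_version` are hypothetical, and it assumes `spark-submit --version` prints a banner containing `version X.Y.Z` before any other version line. You supply the expected version yourself from the EMR release guide.

```shell
# Pull the first "version X.Y.Z" number out of the text on stdin.
extract_version() {
  sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Compare the local Spark version with the one expected on EMR.
# $1 is the Spark version listed in the EMR release guide (you supply it).
check_spark_version() {
  expected="$1"
  # spark-submit writes its banner to stderr, so merge it into stdout.
  installed="$(spark-submit --version 2>&1 | extract_version)"
  if [ "${installed}" = "${expected}" ]; then
    echo "OK: Spark ${installed}"
  else
    echo "MISMATCH: local Spark is '${installed}', EMR uses '${expected}'"
    return 1
  fi
}
```

Call it as `check_spark_version <version from the EMR release guide>`. A non-zero return status means the local install and EMR differ.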
# Install Python 3
# Yes, Python 3, and not 2.7 - Live with the times!
brew install python3
# Install Snappy
brew install snappy
# Check Pip
# Pip ships with Homebrew's Python 3; there is no separate pip3 formula
pip3 --version
# Install AWS Command Line Interface
pip3 install awscli
# Install python packages
pip3 install beautifulsoup4
pip3 install numpy
pip3 install pandas
pip3 install matplotlib
pip3 install boto3
pip3 install pyspark
pip3 install parquet
pip3 install pyarrow
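
After the installs above, it can help to confirm that every package actually imports. This is a sketch under assumptions: `missing_modules` is a hypothetical helper, and the import names used in the usage line (for example `bs4` and `parquet`) are assumed to match the packages installed above.

```shell
# Print the name of every module that python3 cannot import.
missing_modules() {
  for module in "$@"; do
    python3 -c "import ${module}" >/dev/null 2>&1 || echo "${module}"
  done
}
```

Run `missing_modules bs4 numpy pandas matplotlib boto3 pyspark parquet pyarrow`. It prints nothing when everything imports, and otherwise lists the modules to reinstall.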
# Other tools to install
- Eclipse
- Scala for Eclipse
- PyDev for Eclipse
- Tableau