Docker settings (Dockerfile etc.) for data science
The default port opened for Jupyter/PySpark access is 8123.
- Bash script (starterpack.sh) to run after initializing a cloud instance (AWS EC2, GCP Compute Engine, etc.)
- Installs Docker and sets up basic useful aliases
- To Run:
- configure chmod:
sudo chmod 774 starterpack.sh
- run script:
./starterpack.sh
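Mode 774 gives the owner and group read, write and execute permission, and everyone else read-only access. The sketch below is one way to apply the mode and confirm it before running the script. The helper names `file_mode` and `prepare_script` are hypothetical and are not part of starterpack.sh. It assumes you own the file; otherwise prefix `chmod` with `sudo` as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of starterpack.sh.

# Print the octal permission bits of a file.
# Tries GNU stat first, then falls back to the BSD/macOS form.
file_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Apply mode 774 (rwxrwxr--) to a script and print the resulting mode.
prepare_script() {
  chmod 774 "$1" && file_mode "$1"
}
```

Calling `prepare_script starterpack.sh` should print `774` if the change succeeded.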
- etture/ybigta_img_python:1.2: basic Python image with Anaconda
- etture/ybigta_img_hadoop:1.2: image with Hadoop
- etture/ybigta_img_spark:1.2.2: image with Spark
etture/ybigta_img_python:1.2
- Size: 4.19 GB
- Pull:
sudo docker pull etture/ybigta_img_python:1.2
- Run:
sudo docker run -d -it -p 8123:8123 --name=ybigta-python etture/ybigta_img_python:1.2 /bin/bash
- Exec:
sudo docker exec -it ybigta-python /bin/bash
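The run command above publishes port 8123 but starts only a Bash shell, so Jupyter may need to be launched by hand from inside the container. The sketch below builds one possible launch command. It assumes Jupyter Notebook is available through the Anaconda install and is not already running, and that the container shell runs as root (hence `--allow-root`). The helper name `jupyter_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Jupyter launch command for a given port.
# Default is 8123, the port published by the docker run command.
# --ip=0.0.0.0 listens on all interfaces so the published port is reachable from the host.
# --allow-root is needed only when the shell runs as root (an assumption here).
jupyter_cmd() {
  local port="${1:-8123}"
  echo "jupyter notebook --ip=0.0.0.0 --port=${port} --no-browser --allow-root"
}

# Print the command so it can be inspected before use.
jupyter_cmd
```

Inside the container, the printed command can be pasted in directly or run with `eval "$(jupyter_cmd)"`.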
etture/ybigta_img_crawling:1.1
- Size: 5.14 GB
- Pull:
sudo docker pull etture/ybigta_img_crawling:1.1
- Run:
sudo docker run -d -it -p 8123:8123 --name=ybigta-crawling etture/ybigta_img_crawling:1.1 /bin/bash
- Exec:
sudo docker exec -it ybigta-crawling /bin/bash
etture/ybigta_img_hadoop:1.2
- Size: 6.68 GB
- Pull:
sudo docker pull etture/ybigta_img_hadoop:1.2
- Run:
sudo docker run -d -it -p 8123:8123 --name=ybigta-hadoop etture/ybigta_img_hadoop:1.2 /bin/bash
- Exec:
sudo docker exec -it ybigta-hadoop /bin/bash
etture/ybigta_img_spark:1.2.4
- Size: 7.67 GB
- Pull:
sudo docker pull etture/ybigta_img_spark:1.2.4
- Run:
sudo docker run -d -it -p 8123:8123 -p 4040:4040 --name=ybigta-spark etture/ybigta_img_spark:1.2.4 /bin/bash
- Exec:
sudo docker exec -it ybigta-spark /bin/bash
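All four images follow the same run pattern: publish port 8123, name the container, and start Bash. The Spark image additionally publishes 4040, the default port of the Spark application web UI. The sketch below captures that pattern in one function that only prints the command. The function name `ybigta_run_cmd` is hypothetical and does not ship with this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker run command for one image.
# Arguments: container name, image reference, then any extra ports to publish.
ybigta_run_cmd() {
  local name="$1" image="$2"
  shift 2
  # Port 8123 is always published for Jupyter/PySpark access.
  local ports="-p 8123:8123"
  local p
  for p in "$@"; do
    ports="${ports} -p ${p}:${p}"
  done
  echo "sudo docker run -d -it ${ports} --name=${name} ${image} /bin/bash"
}

# Print the commands for the Python and Spark images.
ybigta_run_cmd ybigta-python etture/ybigta_img_python:1.2
ybigta_run_cmd ybigta-spark etture/ybigta_img_spark:1.2.4 4040
```

Printing rather than executing keeps the helper safe to try on a machine without Docker; once the output looks right, run it with `eval "$(ybigta_run_cmd ...)"`.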