This guide is not directly related to typical Big Data tools like Hadoop. In this guide we will configure Docker and Jupyter on a cloud server: the steps and commands to create a data science environment for data analysis, starting from a blank server with SSH access. Readers following this guide should have enough background that they do not need to read guides like our basic server setup. This kind of environment is great for testing and development, and it keeps the clutter of installing many packages off localhost. For this work, we need a cloud server instance with at least 2 GB of RAM; Xen or KVM virtualisation should be enough. VPSDime has cheap OpenVZ servers, but you should email their support about Docker, because their dirt-cheap package does not support it. Other options are Linode (the 2 GB plan), OVH and DigitalOcean (the most costly among these hosts).
Create Data Science Environment on Cloud Server With Docker For Data Analysis
For this example, we are using Ubuntu 16.04 LTS as the server OS; however, any server OS like RHEL or CentOS would work fine. We have previously published a series of articles on using Docker, including a guide on a Docker web management UI. Follow those guides to create a new user other than root with full sudo access and SSH access, disable SSH login for root, and install Docker. YOUR-USERNAME in this guide is an example of a user other than root with full sudo access.
We are installing Docker Community Edition (Docker CE). Running these few commands will do the job:

```shell
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt update
sudo apt upgrade
apt-cache policy docker-ce
sudo apt install -y docker-ce
```
Check the status:

```shell
sudo systemctl status docker
```
Make sure to run:

```shell
sudo usermod -aG docker YOUR-USERNAME
```

…after installing Docker, then exit the SSH session, SSH back in again, become YOUR-USERNAME and run:
```shell
mkdir -p /home/YOUR-USERNAME/notebooks
ls -al /home/YOUR-USERNAME/
```
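If `docker` commands still demand sudo, the group change from `usermod` has probably not taken effect yet; group membership only applies to sessions started after the change, hence the re-login above. A quick check (assuming the default group name `docker`):

```shell
# Confirm the docker group is active in the current session. id -nG
# lists the current user's group names, one candidate per line after tr.
if id -nG | tr ' ' '\n' | grep -qx docker; then
    echo "docker group active"
else
    echo "not yet active - log out and back in"
fi
```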
Next we need to download a Docker image that bundles Python 3 with Jupyter Notebook and data science libraries like NumPy, pandas, SciPy, scikit-learn, NLTK and so on. This image, for example, has lots of tools, even for visualisation and machine learning:

```
https://hub.docker.com/r/alessandroadamo/ubuntu-ds-python3/
```
Once the image is downloaded (with `docker pull alessandroadamo/ubuntu-ds-python3`), you can start the Docker container with:

```shell
docker run -d -p 8888:8888 -v /home/YOUR-USERNAME/notebooks:/home/ds/notebooks alessandroadamo/ubuntu-ds-python3
```

Replace `alessandroadamo/ubuntu-ds-python3` with the name of the image you actually download and use. The next step is putting a web server with a reverse proxy, like Nginx, in front. That way the server can take requests from the public internet and pass them to our Jupyter server above. You have to install Nginx, use a practical domain/subdomain name and install Certbot/Let's Encrypt. There are zillions of guides on our website on installing and configuring Nginx and Certbot/Let's Encrypt; kindly perform a search if you need them. For Ubuntu, we install `nginx-extras`:
```shell
sudo apt install nginx-extras
```
We have previously published a list of free domain names and a guide on Certbot/Let's Encrypt for Nginx. The configuration for port 80 will be like this:
```nginx
server {
    listen 80;
    server_name YOUR-DOMAIN;

    client_max_body_size 10M;

    location / {
        proxy_pass http://127.0.0.1:8888;
    }
}
```
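One point worth knowing: Jupyter talks to its kernels over WebSockets, which a bare `proxy_pass` does not forward. A sketch of the extra headers the `location` block usually needs (the directives are standard Nginx; treating them as required for Jupyter is our assumption, so test with your image):

```nginx
location / {
    proxy_pass http://127.0.0.1:8888;

    # Forward WebSocket upgrade handshakes to the notebook server
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Pass the original host and client address through to Jupyter
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}
```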
If you need to password protect the frontend, you can install apache2-utils and generate a password for YOUR-USERNAME:

```shell
sudo apt install apache2-utils
sudo htpasswd -c /etc/nginx/.htpasswd YOUR-USERNAME
```
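If you would rather not install apache2-utils just for one password hash, `openssl` (already present on most servers) can produce a compatible APR1 entry. A sketch; replace YOUR-USERNAME and the inline password, which is here purely for illustration:

```shell
# Append an htpasswd-compatible APR1 entry for YOUR-USERNAME to the
# Nginx password file, without installing apache2-utils.
entry="YOUR-USERNAME:$(openssl passwd -apr1 'change-this-password')"
echo "$entry" | sudo tee -a /etc/nginx/.htpasswd
```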
You need to change the Nginx config:
```nginx
...
location / {
    ...
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    ...
}
```
Check the configuration by running:

```shell
sudo nginx -t
```
and restart Nginx:

```shell
sudo systemctl restart nginx
# or: service nginx restart
```
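One last practical note: most Jupyter images start the notebook server with token authentication and print the access URL into the container log. If the image you chose behaves the same way (an assumption, so check its Docker Hub page), you can recover the token with `docker logs`; CONTAINER-ID is what `docker run` printed, or take it from `docker ps`:

```shell
# Print the most recent access token from the notebook server's log.
# The startup URL usually looks like http://...:8888/?token=<hex>.
docker logs CONTAINER-ID 2>&1 | grep -o 'token=[0-9a-f]*' | tail -n 1
```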