Apache SystemML provides a machine learning system for big data on top of Apache Spark. Previously we described how to install Apache Mahout for building a machine learning platform; Apache SystemML is another option. In this guide we will show you how to install the Apache SystemML machine learning system on Ubuntu 16.04. SystemML needs only minimal guidance to get started. SystemML uses a high-level declarative machine learning language that comes in two flavours: one with R-like syntax (DML) and another with Python-like syntax (PyDML). Algorithm scripts written in DML or PyDML can run on Hadoop, on Spark, or in standalone mode without modification. After installation, you can read this official guide:
http://systemml.apache.org/docs/0.14.0/beginners-guide-to-dml-and-pydml.html
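To give a flavour of the two syntaxes, here is a small illustrative script in each. These are our own sketches in the style of that guide, not examples taken from it:

```
# DML (R-like syntax): sum of the first 10 integers
s = 0
for (i in 1:10) {
  s = s + i
}
print("sum: " + s)
```

```
# PyDML (Python-like syntax): the same computation
s = 0
for i in range(1, 11):
    s = s + i
print("sum: " + s)
```

The same script can then be handed to SystemML on Hadoop, on Spark, or in standalone mode without changes.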
We can use the Spark MLContext API to run SystemML from Scala or Python using spark-shell, pyspark, or spark-submit. Installing Apache SystemML itself is easy.
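As a sketch of what the spark-shell route looks like, SystemML's jar can be put on the shell's classpath at launch. The jar file name below is an assumption based on the 0.14.0 release naming; check the actual file name inside the archive you downloaded:

```
# Launch spark-shell with the SystemML jar (jar name is assumed; verify in your download)
spark-shell --jars "$SYSTEMML_HOME/systemml-0.14.0-incubating.jar"
```

From there the MLContext programming guide linked at the end of this article walks through running DML scripts interactively.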
How To Install Apache SystemML Machine Learning System on Ubuntu
Obviously, the first step is to use an existing installation of Apache Spark or follow our guide to install Apache Spark on Ubuntu Server.
---
The official GitHub repo of Apache SystemML:
https://github.com/apache/systemml
suggests using Linuxbrew. However, we find no reason to use Linuxbrew, unlike on Mac OS X (read: UNIX). In the latter case, the reason to use Homebrew is the practical absence of a package management system on that UNIX system, whereas GNU/Linux distributions already ship with one. GNU/Linux also does not ship with the wheel group configured by default, so on a production server Linuxbrew can pose at least a minor security risk, unlike on Mac OS X.
You simply need Python 2 or Python 3. Python 3 is probably the practical choice if you plan to use Anaconda in some way, since Anaconda targets Python 3.
You need three packages – jupyter, matplotlib and numpy. We can install them with pip:
pip3 install jupyter matplotlib numpy
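As a quick sanity check (a sketch assuming python3 is on your PATH), you can verify that the three packages are importable without starting Jupyter:

```shell
# Report whether each of the three packages can be found by python3
python3 - <<'EOF'
import importlib.util
for pkg in ("jupyter", "matplotlib", "numpy"):
    print(pkg, "ok" if importlib.util.find_spec(pkg) else "MISSING")
EOF
```

If any line says MISSING, re-run the pip install above before continuing.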
Then download the latest Apache SystemML:
http://systemml.apache.org/download.html
Uncompress it somewhere you want, for example into a path/to/ directory. Then add this to .bashrc or .bash_profile:
export SYSTEMML_HOME=path/to/systemml-0.14.0-incubating
Then source the .bashrc or .bash_profile:

source ~/.bashrc
# or
source ~/.bash_profile
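The export-then-source pattern can be sketched end to end with a throwaway file; the /tmp path and install prefix below are placeholders for illustration, not your real installation:

```shell
# Write the export line to a throwaway rc file (placeholder paths for illustration)
echo 'export SYSTEMML_HOME=/opt/systemml-0.14.0-incubating' > /tmp/systemml_env.sh
# Source it into the current shell, exactly as with ~/.bashrc
. /tmp/systemml_env.sh
# The variable is now set for this shell and any child process such as spark-shell
echo "$SYSTEMML_HOME"
```

After sourcing your real ~/.bashrc, `echo $SYSTEMML_HOME` should likewise print the directory you configured.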
The official guides describe different installation methods in different places! There is also another way to install, via pip:
pip install systemml
Possibly you will need to configure Jupyter Notebook :
# Start Jupyter Notebook Server
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --conf "spark.driver.memory=12g" --conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128 --conf spark.default.parallelism=100
This guide will be helpful:

http://systemml.apache.org/docs/0.14.0/spark-mlcontext-programming-guide#spark-shell-example