In our earlier guide, we described how to install Gradle on Ubuntu 18.04 LTS. Here are the steps to run Apache SAMOA with Apache S4, and in this guide we will use that Gradle installation. S4 stands for Simple Scalable Streaming System. Apache S4 is a distributed, scalable, pluggable platform that lets programmers develop applications for processing continuous, unbounded streams of data. Apache SAMOA is a distributed streaming machine learning (ML) framework. Here are the official sites of Apache SAMOA and Apache S4:
https://samoa.incubator.apache.org/index.html
https://github.com/apache/incubator-s4
Apache S4 is a retired Apache Incubator project. S4 applications can be deployed on YARN for easy deployment and automatic failover, and the S4 integration is tested with Hadoop YARN. YARN can run various kinds of applications in addition to MapReduce applications. S4 also needs ZooKeeper, so a ZooKeeper server must be available.
How to Run Apache SAMOA with Apache S4
Get the 2013 release of Apache S4 from:
https://github.com/apache/incubator-s4/releases
Download it with wget and extract it:
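```shell
wget https://github.com/apache/incubator-s4/archive/0.6.0-Final.tar.gz
ls -al
tar -xzvf 0.6.0-Final.tar.gz
ls -al
# remember the name of the extracted top-level directory
S4_SRC_DIR=$(tar -tzf 0.6.0-Final.tar.gz | head -n 1 | cut -d/ -f1)
# delete the tar archive
rm 0.6.0-Final.tar.gz
# cd to that directory
cd "$S4_SRC_DIR"
```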
From the root directory of the S4 project:
```shell
./gradlew install
./gradlew s4-tools:installApp
```
These commands build the packages, install the artifacts into the local Maven repository, and build the tools that let you work with the platform through the s4 command.
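Then set the S4 environment variables, adjusting the path and version numbers to match your own S4 directory:

```shell
# change version numbers, path
# set the Apache S4 environment variable
export S4_HOME=/foo/bar/apache-s4-0.6.0-incubating-src
# add S4_HOME to the system PATH
export PATH=$PATH:$S4_HOME
```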
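Before moving on, it can help to confirm that these variables actually took effect. The following is a minimal POSIX shell sketch, assuming (as the PATH line implies) that the `s4` launcher script sits at the root of `$S4_HOME`. The function name `check_s4_env` is our own invention, not part of S4.

```shell
#!/bin/sh
# Hypothetical helper (not part of S4): verifies S4_HOME and PATH.
# Assumption: the `s4` launcher script sits at the root of $S4_HOME.
check_s4_env() {
  if [ -z "${S4_HOME:-}" ]; then
    echo "S4_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$S4_HOME" ]; then
    echo "S4_HOME is not a directory: $S4_HOME" >&2
    return 1
  fi
  if [ ! -x "$S4_HOME/s4" ]; then
    echo "no executable s4 script found in $S4_HOME" >&2
    return 1
  fi
  # Wrap PATH in colons so S4_HOME only matches as a complete entry.
  case ":$PATH:" in
    *":$S4_HOME:"*) ;;
    *)
      echo "S4_HOME is not on PATH" >&2
      return 1
      ;;
  esac
  echo "S4 environment OK: $S4_HOME"
}
```

Paste or source the function into your shell and run `check_s4_env`; it prints an error and returns a non-zero status if the directory, the `s4` script, or the PATH entry is missing.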
Next, we need to compile Apache SAMOA for Apache S4. SAMOA needs a set of S4 dependencies to execute on Apache S4, which is why the S4 artifacts were installed into the local Maven repository in the previous step. Clone the repository and build SAMOA with the s4 profile:
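```shell
# if git.apache.org is unreachable, use the mirror at
# https://github.com/apache/incubator-samoa.git instead
git clone http://git.apache.org/incubator-samoa.git
cd incubator-samoa
mvn -Ps4 package
```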
The SAMOA jars are written to:

```
target/SAMOA-<variant>-<version>-SNAPSHOT.jar
```
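Because the version part of the jar name changes between builds, a small helper can locate the jar instead of typing its name by hand. This is a sketch under our own assumptions: `find_samoa_jar` is a hypothetical name, and it simply globs for the `SAMOA-<variant>-*-SNAPSHOT.jar` pattern above, refusing to guess when zero or several jars match.

```shell
#!/bin/sh
# Hypothetical helper (not part of SAMOA): print the path of the single
# jar matching <dir>/SAMOA-<variant>-<version>-SNAPSHOT.jar.
# Usage: find_samoa_jar VARIANT [TARGET_DIR]
find_samoa_jar() {
  variant=$1
  dir=${2:-target}
  found=""
  count=0
  for jar in "$dir"/SAMOA-"$variant"-*-SNAPSHOT.jar; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$jar" ] || continue
    found=$jar
    count=$((count + 1))
  done
  if [ "$count" -ne 1 ]; then
    echo "expected one SAMOA-$variant jar in $dir, found $count" >&2
    return 1
  fi
  printf '%s\n' "$found"
}
```

From the SAMOA root directory, `JAR=$(find_samoa_jar S4)` stores the path of the S4 jar, which can then be passed to `bin/samoa`.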
Open the bin/samoa-s4.properties file and make changes like these:
```
# Zookeeper Server
zookeeper.server=localhost
zookeeper.port=2181

# Apache S4 also distributes the application via HTTP,
# therefore the server and port which contain
# the S4 application must be provided.
# Simple HTTP Server providing the packaged S4 jar
http.server.ip=localhost
http.server.port=8000

# Apache S4 uses the concept of logical clusters to
# define a group of machines, which are identified by
# an ID and start serving on a specific port.
# Name of the S4 cluster
cluster.name=cluster
cluster.port=12000

# SAMOA can be deployed on a single machine using only
# one resource or in a cluster environment.
# The following property can be defined to deploy as a
# local application or on a cluster.
# Deployment strategy
samoa.deploy.mode=local
```
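Scripts that start ZooKeeper or the HTTP server by hand may want to reuse the values from this file instead of repeating them. The sketch below reads one entry with `awk`. It is a deliberate simplification that only understands plain `key=value` lines and `#` comments like the ones above (not the `:` separators or continuation lines of the full Java properties format), and `get_prop` is a name we made up.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a properties FILE.
# Simplified parser: handles only `key=value` lines and `#` comments.
# Usage: get_prop KEY FILE
get_prop() {
  awk -v key="$1" '
    /^[ \t]*#/ { next }                  # skip comment lines
    {
      eq = index($0, "=")
      if (eq == 0) next                  # skip lines without "="
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim surrounding blanks
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) { val = v; found = 1 }  # last occurrence wins
    }
    END {
      if (!found) exit 1
      print val
    }
  ' "$2"
}
```

With the file above, `get_prop zookeeper.port bin/samoa-s4.properties` prints `2181`; a missing key produces no output and a non-zero exit status.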
The execution syntax is:

```
bin/samoa <platform> <jar-location> <task & options>
```
This is an example command:

```shell
bin/samoa S4 target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"
```
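If you launch SAMOA on S4 often, a thin wrapper can validate the arguments before handing them to `bin/samoa`. This is only a sketch: `run_samoa_s4` is a hypothetical name, it assumes you run it from the SAMOA root directory (where `bin/samoa` lives), and it expects the task and its options as one quoted string, just like the example command.

```shell
#!/bin/sh
# Hypothetical wrapper around `bin/samoa <platform> <jar> <task & options>`
# with the platform fixed to S4. Run it from the SAMOA root directory.
# Usage: run_samoa_s4 JAR "TASK [OPTIONS]"
run_samoa_s4() {
  if [ "$#" -ne 2 ]; then
    echo "usage: run_samoa_s4 JAR \"TASK [OPTIONS]\"" >&2
    return 2
  fi
  if [ ! -x bin/samoa ]; then
    echo "bin/samoa not found; run this from the SAMOA root" >&2
    return 1
  fi
  if [ ! -f "$1" ]; then
    echo "jar not found: $1" >&2
    return 1
  fi
  # Keep the task string as a single argument, as in the example command.
  bin/samoa S4 "$1" "$2"
}
```

For instance, `run_samoa_s4 target/SAMOA-S4-0.0.1-SNAPSHOT.jar "ClusteringEvaluation"` runs the same command as the example, but fails early with a clear message if the jar path is wrong.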