Apache Hama is a distributed computing framework for massive scientific computations. It consists of three major components: BSPMaster, GroomServers and ZooKeeper. It is a framework for Big Data analytics that uses the Bulk Synchronous Parallel (BSP) computing model, and it provides a BSP programming model as well as vertex-centric and neuron-centric programming models. Hama can be installed in local or pseudo-distributed mode, or on top of HDFS for multiple nodes. We can set up Apache Hama to work with Apache Mesos, Apache Hadoop and Apache Spark. To run Hama on Mesos, Mesos must already be installed on the cluster; instructions to set up Mesos can be found at the project website. Here is how to install Apache Hama as an HDFS installation.
How to Install Apache Hama
We assume that you have already set up HDFS (possibly CDH on your cluster) and Apache Mesos. Distributions such as CDH, HDP and Mesosphere do not need any additional installation. Note that some CDH versions of HDFS may produce odd errors, and Hama versions older than 0.7.0 may need different instructions. The official requirements to run Hama are hadoop-1.0.x or higher (non-secure version), Sun Java JDK 1.7.x or higher, and SSH access to manage the BSP daemons.
Download the latest Hama:

    http://www.apache.org/dyn/closer.cgi/hama
First remove any previous version of Hama from HDFS:

    hadoop fs -rm /hama.tar.gz
Now build Hama for your particular HDFS and Mesos versions. The command will look like this:

    mvn clean install -Phadoop2 -Dhadoop.version=2.3.0-cdh5.1.2 -Dmesos.version=0.20.0 -DskipTests
Put the resulting archive on HDFS. The name of the tar file must be the same as the one in your configuration file:

    hadoop fs -put dist/target/hama-0.7.1-SNAPSHOT.tar.gz /hama.tar.gz
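Because the HDFS path used here has to agree with the hama.mesos.executor.uri property in the Hama configuration, it can be handy to extract the path part of that URI and compare it with the upload target. The following is only a sketch: the function name hdfs_uri_path is hypothetical, and it assumes a URI of the form scheme://host[:port]/path, like the one in the example configuration in this article.

```shell
# Hypothetical helper (not part of Hama or Hadoop): print the path part
# of a URI of the form scheme://host[:port]/path.
hdfs_uri_path() {
  uri=$1
  rest=${uri#*://}               # drop the scheme, e.g. "hdfs://"
  case "$rest" in
    */*) echo "/${rest#*/}" ;;   # keep everything after host[:port]
    *)   echo "/" ;;             # no path given, fall back to the root
  esac
}

# Example with the URI from the configuration in this article:
hdfs_uri_path "hdfs://euca-10-2-112-10.eucalyptus.internal:9000/hama.tar.gz"
# prints: /hama.tar.gz
```

The printed path should be the same as the second argument of the hadoop fs -put command.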
Move to the distribution directory (adjust the version to match your build):

    cd dist/target/hama-0.7.1-SNAPSHOT/hama-0.7.1-SNAPSHOT/
Set the LD_LIBRARY_PATH or MESOS_NATIVE_LIBRARY environment variable so that it points to the libraries of your Mesos installation. The default is /usr/lib/mesos:

    export LD_LIBRARY_PATH=/root/mesos-installation/lib/
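Before starting the daemon, it can help to confirm that the value you exported is an absolute path that actually exists. The helper below is a minimal sketch: the name check_mesos_lib_path is hypothetical and it only tests the path itself, not whether the Mesos library inside it can be loaded.

```shell
# Hypothetical helper (not part of Hama or Mesos): succeed only if the
# argument is an absolute path that exists on this machine.
check_mesos_lib_path() {
  path=$1
  case "$path" in
    /*) ;;                                              # absolute, continue
    *)  echo "not an absolute path: $path" >&2; return 1 ;;
  esac
  [ -e "$path" ] || { echo "path does not exist: $path" >&2; return 1; }
}

# Example usage (commented out because it depends on your environment):
# check_mesos_lib_path "$LD_LIBRARY_PATH" && echo "library path looks fine"
```

A relative value such as usr/lib/mesos is rejected by the first test, which matches the kind of mistake that leads to the absolute-path error in the bspmaster log.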
If they are not set correctly, bspmaster will not start and the log will show an "Expecting an absolute path of the library" error. To configure Hama with Mesos, you can follow the instructions on the Hama wiki. This is an example configuration:
    <configuration>
      <property>
        <name>bsp.master.address</name>
        <value>euca-10-2-112-10.eucalyptus.internal:40000</value>
        <description>The address of the bsp master server. Either the
        literal string "local" or a host[:port] (where host is a name or
        IP address) for distributed mode.
        </description>
      </property>
      <property>
        <name>bsp.master.port</name>
        <value>40000</value>
        <description>The port master should bind to.</description>
      </property>
      <property>
        <name>bsp.master.TaskWorkerManager.class</name>
        <value>org.apache.hama.bsp.MesosScheduler</value>
        <description>Instructs the scheduler to use Mesos to execute tasks of each job
        </description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://euca-10-2-112-10.eucalyptus.internal:9000</value>
        <description>
          The name of the default file system. Either the literal string
          "local" or a host:port for HDFS.
        </description>
      </property>
      <property>
        <name>hama.mesos.executor.uri</name>
        <value>hdfs://euca-10-2-112-10.eucalyptus.internal:9000/hama.tar.gz</value>
        <description>This is the URI of the Hama distribution
        </description>
      </property>
      <!-- Hama requires one cpu and memory defined by bsp.child.java.opts for each slot.
           This means that a cluster with bsp.tasks.maximum.total set to 2 and
           bsp.child.java.opts set to -Xmx1024m will need at least 2 cpus and
           2048m of memory. -->
      <property>
        <name>bsp.tasks.maximum.total</name>
        <value>2</value>
        <description>This is an override for the total maximum tasks that may be run.
        The default behavior is to determine a value based on the available groom servers.
        However, if using Mesos, the groom servers are not yet allocated.
        So, a value indicating the maximum number of slots available in the cluster is needed.
        </description>
      </property>
      <property>
        <name>hama.mesos.master</name>
        <value>zk://euca-10-2-112-10.eucalyptus.internal:2181/mesos</value>
        <description>This is the address of the Mesos master instance.
        If you're using Zookeeper for master election, use the Zookeeper address here
        (i.e., zk://zk.apache.org:2181/hadoop/mesos).
        </description>
      </property>
      <property>
        <name>bsp.child.java.opts</name>
        <value>-Xmx1024m</value>
        <description>Java opts for the groom server child processes.
        </description>
      </property>
      <property>
        <name>bsp.system.dir</name>
        <value>${hadoop.tmp.dir}/bsp/system</value>
        <description>The shared directory where BSP stores control files.
        </description>
      </property>
      <property>
        <name>bsp.local.dir</name>
        <value>/mnt/bsp/local</value>
        <description>local directory for temporal store.</description>
      </property>
      <property>
        <name>hama.tmp.dir</name>
        <value>/mnt/hama/tmp/hama-${user.name}</value>
        <description>Temporary directory on the local filesystem.</description>
      </property>
      <property>
        <name>bsp.disk.queue.dir</name>
        <value>${hama.tmp.dir}/messages/</value>
        <description>Temporary directory on the local message buffer on disk.</description>
      </property>
      <property>
        <name>hama.zookeeper.quorum</name>
        <value>euca-10-2-112-10.eucalyptus.internal</value>
        <description>Comma separated list of servers in the ZooKeeper Quorum.
        For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
        By default this is set to localhost for local and pseudo-distributed modes
        of operation. For a fully-distributed setup, this should be set to a full
        list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
        this is the list of servers which we will start/stop zookeeper on.
        </description>
      </property>
      <property>
        <name>hama.zookeeper.property.clientPort</name>
        <value>2181</value>
        <description>The port to which the zookeeper clients connect
        </description>
      </property>
    </configuration>
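The comment inside the configuration says that every slot needs one CPU plus the memory given in bsp.child.java.opts. The arithmetic can be sketched in a few lines of shell. The function name hama_required_resources is hypothetical, and the sketch assumes the heap size is written as -Xmx<N>m or -Xmx<N>g.

```shell
# Hypothetical helper (not part of Hama): estimate the CPUs and memory the
# cluster needs for a given slot count and child JVM options.
# Usage: hama_required_resources <bsp.tasks.maximum.total> <bsp.child.java.opts>
hama_required_resources() {
  slots=$1
  opts=$2
  # Pull the number and unit out of the -Xmx option, e.g. "1024 m".
  heap=$(printf '%s\n' "$opts" | sed -n 's/.*-Xmx\([0-9][0-9]*\)\([mMgG]\).*/\1 \2/p')
  [ -n "$heap" ] || { echo "no -Xmx<N>m or -Xmx<N>g option found" >&2; return 1; }
  size=${heap% *}
  unit=${heap#* }
  case "$unit" in
    g|G) size=$((size * 1024)) ;;   # convert gigabytes to megabytes
  esac
  # One CPU per slot, one heap per slot.
  echo "cpus=$slots mem_mb=$((slots * size))"
}

# The values from the example configuration above:
hama_required_resources 2 -Xmx1024m
# prints: cpus=2 mem_mb=2048
```

This only reproduces the rule of thumb from the configuration comment; real JVM processes use some memory beyond the heap, so treat the result as a lower bound.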
Run bspmaster:

    ./bin/hama-daemon.sh start bspmaster
Check the log files for errors:

    less hama-root-bspmaster-$HOSTNAME.log
    less hama-root-bspmaster-$HOSTNAME.out
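Instead of paging through the whole file, you may prefer to print only the suspicious lines. The helper below is a rough sketch: the name scan_hama_log is hypothetical, and it assumes that problems show up as lines containing ERROR, FATAL or Exception, which may not cover every failure.

```shell
# Hypothetical helper (not part of Hama): print numbered log lines that look
# like errors. Returns 0 if such lines exist, 1 if none, 2 if unreadable.
scan_hama_log() {
  log_file=$1
  [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 2; }
  # -n prefixes each match with its line number, -E enables alternation.
  grep -nE 'ERROR|FATAL|Exception' "$log_file"
}

# Example usage (commented out because the file exists only after a start):
# scan_hama_log "hama-root-bspmaster-$HOSTNAME.log"
```

An empty result with exit status 1 means no matching lines were found, which is the outcome you want after a clean start.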
If everything is fine, you can run some of the examples. For instance, generate a random graph:

    ./bin/hama jar hama-examples-0.7.1-SNAPSHOT.jar gen fastgen 100 10 randomgraph 2
This produces two output files, one for each partition of the graph. Now run PageRank on them:

    ./bin/hama jar hama-examples-0.7.1-SNAPSHOT.jar pagerank randomgraph pagerankresult 4
List the files in HDFS:

    hadoop fs -ls /user/root/randomgraph
You should now be able to try the other bundled examples.