Apache HBase is a column-oriented distributed datastore. Previously, we have shown how to install Apache Accumulo, a distributed key/value store built on Hadoop. In another guide we have shown how to install Apache Cassandra, a column-oriented distributed datastore inspired by BigTable. We have also shown how to install Hadoop on a single server instance. We can install HBase without installing Hadoop; in standalone mode it uses the local file system.
The main reason to use Apache HBase instead of plain Apache Hadoop is random reads and writes. With Hadoop alone, we read through the whole dataset whenever we run a MapReduce job. Hadoop consists of a distributed file system (HDFS) and MapReduce (a framework for distributed computing). HBase is a key-value data store built on top of Hadoop, that is, on top of HDFS.
Hadoop comprises HDFS and MapReduce. HDFS, the Hadoop distributed file system, provides reliable storage with high fault tolerance by replicating data across a set of nodes. It consists of two components: the NameNode, where the metadata about the file system is stored, and the DataNodes, where the actual distributed data is stored.
MapReduce consists of two types of Java daemons, the JobTracker and the TaskTrackers. The JobTracker governs the jobs to be executed, while the TaskTrackers run on top of the DataNodes across which the data is distributed, so the program logic provided by the user executes close to the data within the corresponding DataNode.
---
HDFS is the storage component and MapReduce is the execution component. As far as HBase is concerned, we cannot run a remote, clustered HBase without HDFS; without it, HBase falls back to its own local file system and cannot form a cluster. HBase comprises the HMaster (which holds the metadata) and the RegionServers. RegionServers are another set of daemons running on top of the DataNodes in the HDFS cluster, storing and serving the database data there. We store this data in HDFS so that we exploit its core functionality: data replication and fault tolerance. The difference between the MapReduce daemons and the HBase RegionServer daemons, both of which run on top of HDFS, is that the MapReduce daemons only perform MapReduce (aggregation) style jobs, whereas the RegionServer daemons perform database functionality such as reads and writes.
HBase does not support SQL scripting. HBase is not a direct replacement of classic SQL database, but Apache Phoenix project provides a SQL layer for HBase as well as JDBC driver that can be integrated with various analytics and business intelligence applications. The Apache Trafodion project provides a SQL query engine with ODBC and JDBC drivers and distributed ACID transaction protection across multiple statements, tables and rows that uses HBase as a storage engine.
Here is a step-by-step guide on how to install Apache HBase on an Ubuntu single cloud server instance.
Steps on How To Install Apache HBase
Requirements :
- DNS pointing properly
- SSH with root user
- Java 8 from Oracle (HBase 2.x requires Java 8)
- Setup of Loopback IP
- NTP
- Tweaking ulimit and nproc
Point DNS with an FQDN. You cannot use localhost on a public server. Both forward and reverse DNS resolution must work. HBase expects the loopback IP address to be 127.0.0.1. We will tweak the settings later.
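A quick way to confirm that name resolution is sane is getent; the commands below only read the resolver, so they are safe to run on any box:

```shell
# Forward lookup: the loopback name should resolve
getent hosts localhost

# Reverse lookup of the loopback address
getent hosts 127.0.0.1

# The fully qualified hostname of this machine
hostname -f
```

If `hostname -f` prints only a short name or an error, fix /etc/hosts and your DNS records before continuing.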
Create an SSH passwordless login like we did for the installation of Hadoop.
In the file /etc/security/limits.conf
add :
hadoop - nofile 32768
hadoop hard nproc 32000
hadoop soft nproc 32000
In the file /etc/pam.d/common-session
add as the last line in the file:
session required pam_limits.so
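After logging out and back in as the hadoop user, you can check that the new limits are active for your session; the values printed should match the limits.conf entries above:

```shell
# Maximum number of open file descriptors for this session
ulimit -n

# Show all limits, including max user processes (nproc)
ulimit -a
```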
/etc/hosts
should look something like this:
...
127.0.0.1 localhost
127.0.0.1 your.domain.com
...
Install commonly needed packages:
sudo apt-get update
sudo apt-get install -y git wget ntp maven tar make gcc ant
Log out and log back in again for the changes to take effect. Commonly we install Oracle Java this way:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
We can install in different way as well :
#JDK installation
#download the JDK tarball (the build number in the URL changes per release)
wget http://download.oracle.com/otn/java/jdk/8u172-b11/a58eab1ec242421181065cdc37240b08/jdk-8u172-linux-x64.tar.gz
sudo cp jdk-8u172-linux-x64.tar.gz /usr/lib
cd /usr/lib
#extract the JDK file
sudo tar -xvf jdk-8u172-linux-x64.tar.gz
#remove the compressed file
sudo rm jdk-8u172-linux-x64.tar.gz
To set up the PATH and JAVA_HOME variables manually, add the following lines to the ~/.profile file (adjust JAVA_HOME to match where your JDK actually lives; the OpenJDK path below is one example):

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
Then reload it:
source ~/.profile
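If you are unsure what JAVA_HOME should be, you can derive it from the java binary on your PATH with `readlink -f "$(command -v java)"` and then strip the trailing `/bin/java`. The sketch below demonstrates the stripping step on an example path standing in for readlink's output:

```shell
# Stand-in for the output of: readlink -f "$(command -v java)"
java_path="/usr/lib/jvm/java-8-openjdk-amd64/bin/java"

# Strip the trailing /bin/java to get a JDK root suitable for JAVA_HOME
echo "$java_path" | sed 's:/bin/java$::'
```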
Now we need to install HBase:
#Install HBase
#create the hbase directory
sudo mkdir -p /usr/lib/hbase
#download the latest HBase release
## http://www.apache.org/dyn/closer.cgi/hbase/
## https://github.com/apache/hbase
wget http://www-eu.apache.org/dist/hbase/2.1.0/hbase-2.1.0-bin.tar.gz
sudo cp hbase-* /usr/lib/hbase
cd /usr/lib/hbase
#extract the hbase files
sudo tar -xzvf hbase-2.1.0-bin.tar.gz
#remove the hbase compressed file
#rm hbase-2.1.0-bin.tar.gz
#make sure that the hbase folder is at a path like
# /usr/lib/hbase/hbase-2.1.0
cd /usr/lib/hbase/hbase-2.1.0/conf
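It is good practice to verify the downloaded tarball against the checksum published on the Apache download page. The snippet below demonstrates the sha512sum workflow on a throwaway file, since the real checksum file for your release (e.g. hbase-2.1.0-bin.tar.gz.sha512) must be fetched from the mirror:

```shell
# Demonstration of the checksum workflow on a local sample file.
# For a real release, download the matching .sha512 file from Apache instead.
echo "sample payload" > sample.tar.gz

# Record the checksum, then verify the file against it
sha512sum sample.tar.gz > sample.tar.gz.sha512
sha512sum -c sample.tar.gz.sha512

# Clean up the demonstration files
rm sample.tar.gz sample.tar.gz.sha512
```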
You’ll get a file named hbase-site.xml there; move it aside as a backup:
sudo mv hbase-site.xml hbase-site.xml.backup
Create an empty file named hbase-site.xml:
sudo touch hbase-site.xml
sudo nano hbase-site.xml
It will look like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- use an absolute path; shell variables like $HOME are not expanded here -->
    <value>file:///home/youruser/HBASE/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/youruser/HBASE/zookeeper</value>
  </property>
</configuration>
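The rootdir above lives on the local file system, which is fine for standalone mode. As a hedged sketch, a fully distributed setup would instead point hbase.rootdir at HDFS and enable distributed mode; the host name and port below are placeholders for your NameNode address:

```xml
<property>
  <name>hbase.rootdir</name>
  <!-- namenode.example.com:8020 is a placeholder for your NameNode -->
  <value>hdfs://namenode.example.com:8020/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
```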
Final steps :
cd ~
mkdir HBASE
mkdir HBASE/hbase
mkdir HBASE/zookeeper
echo "export HBASE_HOME=/usr/lib/hbase/hbase-2.1.0" >> ~/.profile
echo "export PATH=\$PATH:\$HBASE_HOME/bin" >> ~/.profile
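The two echo lines append exports to ~/.profile; note the escaped `\$` so that expansion happens when the profile is sourced, not when it is written. The sketch below shows the same mechanism against a temporary file instead of your real profile:

```shell
# Use a temporary file in place of ~/.profile for demonstration
profile=$(mktemp)

echo "export HBASE_HOME=/usr/lib/hbase/hbase-2.1.0" >> "$profile"
echo "export PATH=\$PATH:\$HBASE_HOME/bin" >> "$profile"

# Source it and confirm the variables expanded as intended
. "$profile"
echo "$HBASE_HOME"
echo "$PATH" | grep -o '/usr/lib/hbase/hbase-2.1.0/bin'

rm "$profile"
```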
To build from source instead:

mvn package -DskipTests
# run the tests
mvn test -fn
For Ubuntu 18.04 LTS, you may need this patch (note that it targets the HBase 1.2 branch):
wget https://issues.apache.org/jira/secure/attachment/12899868/HBASE-19188.branch-1.2.002.patch
patch -p1 -i HBASE-19188.branch-1.2.002.patch
Finally, we can start HBase with the start-hbase.sh script from the bin directory:

/usr/lib/hbase/hbase-2.1.0/bin/start-hbase.sh
We can also start the HBase shell:
hbase shell
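Once the shell is up, a minimal smoke test looks like the session below. These are HBase shell commands, not Bash, and the table name and column family are just examples:

```
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
get 'test', 'row1'
scan 'test'
disable 'test'
drop 'test'
```

If `get` returns the value you put, the standalone instance is working.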