Commonly, people are versed in either Apache Hadoop or OpenStack Swift. Integration of Apache Hadoop with OpenStack Swift is not exactly a new topic, but good experience with both together is rare. You can follow this guide, especially for handling the OpenStack part, without searching here and there. You can also use this website’s search function to find our older guides apart from the linked articles.
Integration of Apache Hadoop With OpenStack Swift: Foreword
You must know what you are doing. Hadoop is most widely used with HDFS, and most Hadoop tools are not built to work out of the box with object storage, so it is not unusual to run into odd behaviour. OpenStack installations can also differ from vendor to vendor; we have a good number of separate guides on Rackspace and HP Cloud. Access can be API key based or username-password based.
Integration of Apache Hadoop With OpenStack Swift
We assume that you already have Apache Hadoop up and running; if not, please follow our guide on installation and setup of Apache Hadoop on a single server instance.
---
Coming to OpenStack Swift, it has a command-line client which we described in an earlier series of guides, such as Installation and Setup of OpenStack Python Packages, Uploading to a Swift Container (HP Cloud), Emptying a Swift Container (HP Cloud), mounting OpenStack Swift on an Ubuntu server (Rackspace) and so on.
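If the Swift command-line client is new to you, here is a minimal sketch of everyday usage; the credential values and the container name test_container are placeholders for illustration, your vendor’s panel provides the real ones:

# Placeholder credentials; replace with the values from your cloud vendor
export OS_AUTH_URL="https://identity.vendor.openstack.replace.this.url/v2.0/"
export OS_TENANT_NAME="your-tenant"
export OS_USERNAME="your-username"
export OS_PASSWORD="your-password-or-api-key"

# List containers, upload a file, then list the container's contents
swift list
swift upload test_container example.txt
swift list test_container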
After getting used to handling ordinary files with OpenStack Swift, you can follow Apache Hadoop’s official guide:
https://hadoop.apache.org/docs/current2/hadoop-openstack/index.html
In the same way, OpenStack Swift has an official guide:
https://docs.openstack.org/developer/sahara/userdoc/hadoop-swift.html
Regardless of cloud vendor, you’ll need the following to configure Apache products, including Hadoop and Spark, for access (a quick way to verify them is sketched after this list):
- username
- region for your container
- authorization URL
- API key OR password
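Before wiring these into Hadoop, it can save time to confirm that the authorization URL and credentials actually work. Below is a minimal sketch using curl against a Keystone v2.0 endpoint; the URL is the same placeholder used later in core-site.xml, and password-based authentication is assumed (API-key based vendors use a slightly different request body):

# Request a token from the Keystone v2.0 endpoint; a JSON response containing a token means the credentials are good
curl -s -X POST https://identity.vendor.openstack.replace.this.url/v2.0/tokens \
  -H "Content-Type: application/json" \
  -d '{"auth": {"passwordCredentials": {"username": "OS_USER", "password": "OS_APIKEY"}}}'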
Please remember that we are not talking about the tenant name here. The guides linked above show easy integration of these basic credentials with bash or ZSH. To configure Hadoop for Swift, at the location /usr/share/hadoop/etc/hadoop you’ll find hadoop-env.sh. You need to add a line in this format:
export HADOOP_CLASSPATH=/usr/share/hadoop/share/hadoop/tools/lib/hadoop-openstack-VERSION.jar:/usr/share/hadoop/share/hadoop/tools/lib/httpclient-x.y.z.jar:/usr/share/hadoop/share/hadoop/tools/lib/httpcore-x.y.z.jar:$HADOOP_CLASSPATH
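VERSION and x.y.z must match the jar files actually shipped under /usr/share/hadoop/share/hadoop/tools/lib/. As a sketch, assuming Hadoop 2.7.3 with httpclient 4.5.2 and httpcore 4.4.4 (these version numbers are only an assumption, check your own directory first), the line would look like this:

# Check which versions your Hadoop build ships first
ls /usr/share/hadoop/share/hadoop/tools/lib/ | grep -E 'openstack|httpclient|httpcore'

# Assumed example versions; adjust to whatever the listing above shows
export HADOOP_CLASSPATH=/usr/share/hadoop/share/hadoop/tools/lib/hadoop-openstack-2.7.3.jar:/usr/share/hadoop/share/hadoop/tools/lib/httpclient-4.5.2.jar:/usr/share/hadoop/share/hadoop/tools/lib/httpcore-4.4.4.jar:$HADOOP_CLASSPATH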
Another file to edit is /usr/share/hadoop/etc/hadoop/core-site.xml; swift_test is our example service name:
<property>
  <name>fs.swift.service.swift_test.auth.url</name>
  <value>https://identity.vendor.openstack.replace.this.url/v2.0/tokens</value>
  <description>VendorName US (multiregion)</description>
</property>
<property>
  <name>fs.swift.service.swift_test.username</name>
  <value>OS_USER</value>
</property>
<property>
  <name>fs.swift.service.swift_test.region</name>
  <value>OS_REGION</value>
</property>
<property>
  <name>fs.swift.service.swift_test.apikey</name>
  <value>OS_APIKEY</value>
</property>
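Once core-site.xml is saved, a quick smoke test is to list a container through the swift:// scheme; here mycontainer is an assumed, already existing container and swift_test is the service name configured above:

# List the contents of an existing container through the Swift filesystem client
/usr/share/hadoop/bin/hadoop fs -ls swift://mycontainer.swift_test/

# Copy an object down to verify reads (data.txt is an assumed object name)
/usr/share/hadoop/bin/hadoop fs -copyToLocal swift://mycontainer.swift_test/data.txt /tmp/data.txt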
In those official guides you’ll find another property to set in case you are running Hadoop outside of the OpenStack Swift provider’s datacenter. We have already talked about installation of Apache Spark with Hadoop. For Spark, you need to add these lines to /usr/share/spark/conf/spark-env.sh:
export SPARK_DIST_CLASSPATH=$(/usr/share/hadoop/bin/hadoop classpath)
export HADOOP_CONF_DIR=/usr/share/hadoop/etc/hadoop
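After restarting Spark so that spark-env.sh is picked up, a rough way to check the wiring end to end is to read an object from Swift inside spark-shell; the container mycontainer and the object data.txt are again only assumptions:

# Pipe a one-liner into spark-shell; it should print the number of lines in the object
echo 'println(sc.textFile("swift://mycontainer.swift_test/data.txt").count())' | /usr/share/spark/bin/spark-shell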