By now, we know the differences between batch processing and stream processing, and between Hadoop and Spark. Here are the steps to install Apache Crunch on Hadoop. Crunch is a Java library for creating data pipelines that are composed of many user-defined functions. Crunch pipelines can run on Hadoop MapReduce and on Apache Spark. The library is meant for tasks that are difficult to implement with plain MapReduce. Its APIs are especially useful for processing data that does not fit naturally into the relational model, such as serialized object formats. There is also a Scala API, Scrunch, for Scala users.
Apache Hive and Apache Pig were built to make MapReduce accessible to people with limited experience in Java programming. Crunch, by contrast, was designed for developers who understand Java.
Steps to Install Apache Crunch
Naturally, Hadoop needs to be installed, possibly along with Apache Spark and Apache Avro. With Hadoop 2.x, use Crunch 0.14.0 or later.
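Before going further, it may help to confirm these prerequisites from the shell. The following is only a minimal sketch and is not part of the Crunch distribution: the helper names `have_tool` and `version_at_least` are made up for illustration, the list of tools checked (`java`, `mvn`, `hadoop`) is an assumption about a typical setup, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helpers for checking prerequisites; not shipped with Crunch.

# Succeeds if the named command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), which assumes GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report which of the assumed required tools are missing.
for tool in java mvn hadoop; do
    have_tool "$tool" || echo "missing: $tool"
done

# Hadoop 2.x needs Crunch 0.14.0 or later; 0.15.0 is the release used in this post.
if version_at_least "0.15.0" "0.14.0"; then
    echo "Crunch 0.15.0 is new enough for Hadoop 2.x"
fi
```

The version comparison sorts the two version strings and checks that the required minimum comes first, which avoids comparing dotted versions as plain text.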
---
You can get the latest version of Apache Crunch from here:
https://crunch.apache.org/download.html
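You need Apache Maven installed to build Crunch from source.
Then you can run:
    # Download the source archive. closer.cgi is a mirror selector; if wget saves
    # an HTML page instead, take the direct link from the download page.
    wget http://www.apache.org/dyn/closer.cgi/crunch/crunch-0.15.0/apache-crunch-0.15.0-src.tar.gz
    # Extract the archive, then remove it
    tar -xzvf apache-crunch-0.15.0-src.tar.gz
    rm apache-crunch-0.15.0-src.tar.gz
    # Confirm the name of the extracted directory
    ls -al | grep crunch
    # Directory name assumed from the archive name; adjust it to match the ls output
    cd apache-crunch-0.15.0-src
    # Build Crunch and install it into the local Maven repository
    mvn clean install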
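A quick way to confirm that `wget` fetched a real archive, rather than an HTML page saved under the archive's name, is to test the file before extracting it. This is a sketch under assumptions: `is_valid_tarball` is a hypothetical helper name, and it relies only on the standard `gzip -t` and `tar -tzf` options.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if $1 is a readable gzip-compressed tar archive.
is_valid_tarball() {
    gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

# File name taken from the download command in this post.
archive="apache-crunch-0.15.0-src.tar.gz"
if is_valid_tarball "$archive"; then
    echo "$archive looks like a valid archive"
else
    echo "$archive is missing or is not a gzip-compressed tar file" >&2
fi
```

Run it after the download and before the `rm` step, since the check needs the archive to still be on disk.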
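Also, the Crunch project provides Maven artefacts on Maven Central, so a project can declare a dependency of the following form, with crunch.version defined as a property in its POM: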
    <dependency>
      <groupId>org.apache.crunch</groupId>
      <artifactId>crunch-core</artifactId>
      <version>${crunch.version}</version>
    </dependency>
The Getting Started guide for Apache Crunch is available on the Apache Crunch website.