Talk of combining various big data tools can be confusing to many new users. Here is a list of Apache projects for big data, with basic practical details, meant to help developers who are new to the field. Apache Hadoop and Apache Spark are probably the best known. At present, a total of 37 Apache projects are directly related to big data.
List of Apache Projects For Big Data
These are the 37 Apache projects related to big data:
- Apache Airavata
- Apache Ambari
- Apache Apex
- Apache Avro
- Apache Beam
- Apache Bigtop
- Apache BookKeeper
- Apache Calcite
- Apache CouchDB
- Apache Crunch
- Apache DataFu
- Apache DirectMemory
- Apache Drill
- Apache Edgent
- Apache Falcon
- Apache Flink
- Apache Flume
- Apache Giraph
- Apache Hama
- Apache Helix
- Apache Ignite
- Apache Kafka
- Apache Knox
- Apache MetaModel
- Apache Oozie
- Apache ORC
- Apache Parquet
- Apache Phoenix
- Apache REEF
- Apache Samza
- Apache Spark
- Apache Sqoop
- Apache Storm
- Apache Tajo
- Apache Tez
- Apache VXQuery
- Apache Zeppelin
Where is Apache Hadoop? Officially, Hadoop is categorized under database. The best known and most used are Apache Hadoop, Apache Spark, Apache Tez, Apache CouchDB, Apache Bigtop, and Apache REEF. The next tier would be Apache Kafka, Apache Flume, Apache Drill, Apache Samza, Apache Storm, and the newer Apache Edgent.
---
Big data is a generic term for software that uses newer strategies to handle large datasets. The most essential component of a big data system is the processing framework, which is the part that computes over the data in the system.
On that basis, we can divide the frameworks into batch-only frameworks (Apache Hadoop), stream-only frameworks (Apache Storm, Apache Samza), and hybrid frameworks (Apache Spark, Apache Flink).
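The difference between batch and stream processing can be sketched in plain Python. This is only an illustration of the two processing styles, not code for any of the frameworks named here; the function names `batch_word_count` and `stream_word_count` and the sample input are made up for the example.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(records: Iterable[str]) -> Counter:
    """Batch style: read the whole bounded dataset, then return one result."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def stream_word_count(records: Iterable[str]) -> Iterator[Counter]:
    """Stream style: update running state per record and emit a result each time."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
        yield Counter(counts)  # snapshot of the running totals so far


# Hypothetical sample input, used only for illustration.
lines = ["big data", "big frameworks"]

batch_result = batch_word_count(lines)
stream_results = list(stream_word_count(iter(lines)))

print(batch_result)
for snapshot in stream_results:
    print(snapshot)
```

The batch function produces a single answer once all the input has been read, while the stream function produces an updated answer after every record. A hybrid framework, in this loose sense, is one that supports both styles.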
The reason there is so much software is the lack of a universal, common way to process all existing types of data. If you are new to big data and looking for somewhere to start, Apache Hadoop and Apache Spark are the right choices. The reason is simply that they are commonly suggested as a way to understand the limitations.