Hadoop Ecosystem (mind map)

1. Column store (HBase)

2. Pig

2.1. Scripting for Hadoop
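A Pig Latin script expresses a dataflow (LOAD, FILTER, GROUP, FOREACH ... GENERATE) that compiles down to MapReduce jobs. A minimal Python stand-in for such a dataflow, with made-up record fields, just to show the shape of the pipeline:

```python
# A toy dataflow in the spirit of a Pig Latin script
# (LOAD -> FILTER -> GROUP -> FOREACH ... GENERATE).
# All names here are illustrative, not Pig APIs.
from collections import defaultdict

def run_pipeline(records):
    """Count page views per user, dropping records with an empty url."""
    # FILTER: keep records with a non-empty url
    filtered = [r for r in records if r["url"]]
    # GROUP BY user
    groups = defaultdict(list)
    for r in filtered:
        groups[r["user"]].append(r)
    # FOREACH group GENERATE user, COUNT(records)
    return {user: len(rs) for user, rs in groups.items()}

views = [
    {"user": "ann", "url": "/a"},
    {"user": "ann", "url": "/b"},
    {"user": "bob", "url": ""},      # dropped by the filter
    {"user": "bob", "url": "/c"},
]
counts = run_pipeline(views)
```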

3. HBase

3.1. Transactional lookups
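HBase addresses each cell by row key, column family, and qualifier, which is what makes fast transactional lookups by row key possible. A toy sketch of that addressing scheme (not the real HBase client API):

```python
# Minimal sketch of an HBase-style column store: cells are addressed by
# (row key, column family, qualifier). Class and method names are
# invented for illustration.
class ToyColumnStore:
    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row_key -> {family -> {qualifier -> value}}

    def put(self, row, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(family, {})[qualifier] = value

    def get(self, row, family, qualifier):
        # Point lookup by row key: no table scan needed.
        return self.rows.get(row, {}).get(family, {}).get(qualifier)

table = ToyColumnStore(families=["info"])
table.put("user#42", "info", "name", "Ada")
```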

4. Flume

4.1. Log collector

4.2. Integrates into Hadoop
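Flume moves log events through a source -> channel -> sink pipeline, with the sink typically landing data in Hadoop. A sketch of that model using an in-memory queue; the component names are illustrative only:

```python
# Sketch of Flume's source -> channel -> sink model.
from collections import deque

class Channel:
    """Buffers events between the source and the sink."""
    def __init__(self):
        self.buffer = deque()
    def put(self, event):
        self.buffer.append(event)
    def take(self):
        return self.buffer.popleft() if self.buffer else None

def tail_source(lines, channel):
    """Source: push each log line into the channel as an event."""
    for line in lines:
        channel.put({"body": line})

def hdfs_sink(channel, store):
    """Sink: drain the channel into a destination list (stand-in for HDFS)."""
    while (event := channel.take()) is not None:
        store.append(event["body"])

channel = Channel()
hdfs_store = []
tail_source(["GET /a 200", "GET /b 404"], channel)
hdfs_sink(channel, hdfs_store)
```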

5. Avro

5.1. Data parsing

5.2. Binary data serialization

5.3. RPC

5.4. Language-neutral

5.5. Optional code generation

5.6. Schema evolution

5.7. Untagged data

5.8. Dynamic typing
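The last three points go together: because Avro data is untagged, the reader resolves it against the writer's schema, and a newer reader schema can add fields with defaults (schema evolution). A toy illustration of that resolution idea; this mimics the concept only and is not the Avro wire format:

```python
# Sketch of Avro-style schema resolution: values are written untagged
# in writer-schema field order; a reader schema fills fields it adds
# from their declared defaults.
def encode(record, writer_schema):
    # Untagged: just the values, in schema order.
    return [record[f["name"]] for f in writer_schema["fields"]]

def decode(values, writer_schema, reader_schema):
    by_name = {f["name"]: v
               for f, v in zip(writer_schema["fields"], values)}
    out = {}
    for f in reader_schema["fields"]:
        if f["name"] in by_name:
            out[f["name"]] = by_name[f["name"]]
        else:
            out[f["name"]] = f["default"]  # schema evolution: new field
    return out

v1 = {"fields": [{"name": "id"}, {"name": "name"}]}
v2 = {"fields": [{"name": "id"}, {"name": "name"},
                 {"name": "email", "default": None}]}

payload = encode({"id": 7, "name": "Ada"}, v1)   # written with v1
record = decode(payload, v1, v2)                  # read with v2
```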

6. Mahout

6.1. Machine learning

6.2. Algorithms implemented on MapReduce
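To see how a learning algorithm maps onto MapReduce, consider one k-means iteration: the map side assigns each point to its nearest centroid, the reduce side averages the points per centroid. A plain-Python stand-in (1-D points), not Mahout itself:

```python
# One k-means iteration expressed in map/reduce terms.
def kmeans_step(points, centroids):
    assign = {}                       # map: centroid index -> its points
    for p in points:
        i = min(range(len(centroids)),
                key=lambda i: abs(p - centroids[i]))
        assign.setdefault(i, []).append(p)
    # reduce: new centroid = mean of its assigned points
    return [sum(ps) / len(ps) for i, ps in sorted(assign.items())]

new_centroids = kmeans_step([1.0, 2.0, 10.0, 11.0], [0.0, 12.0])
```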

7. Ambari

7.1. Cluster deployment and admin

7.2. Driven by Hortonworks

8. ZooKeeper

8.1. Coordinator of shared state between apps

8.2. Naming, configuration, and synchronization services
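ZooKeeper keeps shared state in a hierarchy of znodes; clients read them for naming/configuration and set one-shot watches to be notified of changes. A toy in-process sketch of that model (not the real ZooKeeper API):

```python
# Toy znode store with one-shot watches, in the spirit of ZooKeeper's
# naming, configuration, and synchronization services.
class ToyZooKeeper:
    def __init__(self):
        self.nodes = {}    # path -> data
        self.watches = {}  # path -> list of one-shot callbacks

    def create(self, path, data):
        self.nodes[path] = data

    def set(self, path, data):
        self.nodes[path] = data
        for cb in self.watches.pop(path, []):  # fire one-shot watches
            cb(path, data)

    def get(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.nodes.get(path)

zk = ToyZooKeeper()
zk.create("/config/db_url", "db1:5432")
seen = []
zk.get("/config/db_url", watch=lambda path, data: seen.append(data))
zk.set("/config/db_url", "db2:5432")   # watcher is notified
```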

9. YARN

9.1. Cluster resource management

9.2. Introduced in Hadoop 2

9.3. Resource manager

9.4. Job scheduler
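The resource-manager role boils down to placing container requests on nodes that have spare capacity, and queuing requests that do not fit. A greatly simplified sketch of that scheduling decision (memory only, first-fit; all names invented):

```python
# Toy YARN-style placement: assign container requests to nodes with
# enough free memory; unplaceable requests wait (None).
def schedule(nodes, requests):
    """nodes: {name: free_mb}; requests: [(app, mb)] -> {app: node|None}."""
    placements = {}
    free = dict(nodes)
    for app, mb in requests:
        for node, avail in free.items():
            if avail >= mb:
                free[node] = avail - mb    # reserve capacity
                placements[app] = node
                break
        else:
            placements[app] = None         # no capacity: request waits
    return placements

placed = schedule({"n1": 4096, "n2": 2048},
                  [("app1", 3072), ("app2", 2048), ("app3", 2048)])
```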

10. BigTop

10.1. Packages the Hadoop ecosystem

10.2. Tests Hadoop ecosystem packages

11. Queries data stored in HDFS and HBase (Impala)

12. Non-relational (HBase)

13. Maps the query onto nodes (MapReduce map phase)

14. Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability.

15. Reduces aggregated results into answers (MapReduce reduce phase)

16. Links jobs (Oozie)

16.1. Workflow processing

17. A Bundle packages multiple coordinator and workflow jobs and manages the lifecycle of those jobs (Oozie)

17.1. Connects non-Hadoop stores such as an RDBMS (Sqoop)

17.2. Moves data between an RDBMS and Hadoop (Sqoop)

18. Workflow jobs are Directed Acyclic Graphs (DAGs) specifying a sequence of actions to execute; a workflow triggered by a coordinator waits until the coordinator's time and data-availability conditions are met (Oozie)
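Executing a workflow DAG means running each action only after all of its dependencies have finished, i.e. a topological order. A sketch using Python's standard-library `graphlib`; the action names are made up:

```python
# Sketch of an Oozie-style workflow: a DAG of actions executed in
# dependency order via topological sort.
from graphlib import TopologicalSorter

workflow = {                 # action -> actions it depends on
    "import":    set(),
    "transform": {"import"},
    "load":      {"transform"},
    "report":    {"transform"},
}

order = list(TopologicalSorter(workflow).static_order())
```

`TopologicalSorter` also rejects cyclic graphs, matching the "acyclic" requirement: a workflow with a dependency cycle cannot be scheduled.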

19. Hive

19.1. SQL-like querying

19.2. A combiner can be used to optimize reducer performance (a MapReduce feature)

19.3. Structured data warehousing

19.4. Partition columns instead of indexes
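"Partition columns instead of indexes" means rows are physically grouped by the partition column (e.g. a date), so a query filtering on it reads only the matching partitions rather than consulting an index. A toy illustration with a counter showing how many rows are actually scanned (all names invented):

```python
# Sketch of Hive-style partition pruning.
from collections import defaultdict

class ToyPartitionedTable:
    def __init__(self, partition_col):
        self.partition_col = partition_col
        self.partitions = defaultdict(list)  # partition value -> rows
        self.scanned = 0                     # rows read, to show pruning

    def insert(self, row):
        self.partitions[row[self.partition_col]].append(row)

    def query(self, partition_value):
        # Only the matching partition is read; others are pruned.
        rows = self.partitions.get(partition_value, [])
        self.scanned += len(rows)
        return rows

t = ToyPartitionedTable("dt")
t.insert({"dt": "2024-01-01", "user": "ann"})
t.insert({"dt": "2024-01-01", "user": "bob"})
t.insert({"dt": "2024-01-02", "user": "cal"})
hits = t.query("2024-01-01")
```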

20. Oozie

21. Sqoop

21.1. Auto-generates Java InputFormat code for data access
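Conceptually, a Sqoop import reads rows from an RDBMS and lands them as delimited text files in HDFS. A sketch of that idea using Python's built-in SQLite module as the "RDBMS"; this is a stand-in, not the Sqoop tool itself:

```python
# Sqoop-style import sketch: RDBMS rows -> tab-delimited lines
# (the way `sqoop import` lands text files in HDFS).
import sqlite3

def import_table(conn, table):
    cur = conn.execute(f"SELECT * FROM {table}")
    return ["\t".join(str(v) for v in row) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ann"), (2, "bob")])
lines = import_table(conn, "users")
```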

22. MapReduce

22.1. Distributed compute
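The map and reduce phases mentioned elsewhere in this map (the map phase fans the query out onto nodes; the reduce phase folds aggregated results into answers) can be sketched with the classic word count: map emits (word, 1) pairs, the shuffle groups values by key, and reduce sums each group. A single-process Python illustration:

```python
# Word count as a MapReduce sketch: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield word, 1          # emit (key, value)

def shuffle(pairs):
    groups = defaultdict(list)     # group values by key
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

result = reduce_phase(shuffle(map_phase(["to be or not", "to be"])))
```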

23. Related Apache Ecosystems

24. HDFS

24.1. Distributed storage
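HDFS provides distributed storage by splitting each file into fixed-size blocks and replicating every block across several datanodes. A sketch of that layout with toy sizes (the real defaults are on the order of 128 MB blocks and 3 replicas); the placement policy here is plain round-robin, a simplification of HDFS's rack-aware policy:

```python
# Sketch of HDFS storage: file -> blocks -> replicated placement.
def split_into_blocks(data, block_size):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, datanodes, replication=3):
    """Round-robin placement -> {block index: [datanodes holding it]}."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [datanodes[(b + r) % len(datanodes)]
                        for r in range(replication)]
    return placement

blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
```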

25. Spark

26. Impala

26.1. SQL query engine

26.2. Real-time (low-latency) queries

27. Cascading

27.1. Higher-level abstraction over MapReduce

27.2. Creates Flows that assemble MapReduce jobs
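The "Flow" idea is that you chain higher-level operations and the framework plans them into underlying map/reduce-style steps. A toy illustration of that chaining; every class and method name here is invented:

```python
# Sketch of a Cascading-style Flow: chained operations are recorded,
# then executed as a sequence of map- and reduce-like steps.
class Flow:
    def __init__(self):
        self.steps = []

    def each(self, fn):            # per-record transform (map side)
        self.steps.append(("each", fn))
        return self

    def group_count(self, key_fn): # grouping + counting (reduce side)
        self.steps.append(("group_count", key_fn))
        return self

    def run(self, records):
        data = records
        for kind, fn in self.steps:
            if kind == "each":
                data = [fn(r) for r in data]
            else:  # group_count
                totals = {}
                for r in data:
                    k = fn(r)
                    totals[k] = totals.get(k, 0) + 1
                data = totals
        return data

flow = Flow().each(str.lower).group_count(lambda w: w)
counts = flow.run(["Spark", "spark", "Hive"])
```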