Hadoop Ecosystem (Mind Map)

1. Query data stored in HDFS and HBase (Hive/Impala)

2. Column store (HBase)

3. Non-relational (HBase)

4. Maps query onto nodes (MapReduce map phase)

5. Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability (Oozie)

6. Reduces aggregated results into answers (MapReduce reduce phase)

7. Links jobs (Oozie)

7.1. Workflow processing

8. A Bundle provides a way to package multiple coordinator and workflow jobs and to manage the lifecycle of those jobs (Oozie)

8.1. Connects non-Hadoop stores (RDBMS) (Sqoop)

8.2. Moves data between an RDBMS and Hadoop (Sqoop)

9. Workflow jobs are Directed Acyclic Graphs (DAGs) specifying a sequence of actions to execute; each action has to wait for its predecessors to complete (Oozie)
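The DAG idea behind workflow jobs can be sketched in plain Python. This is a toy scheduler (names like `run_workflow` are hypothetical, not the Oozie API): each action runs only after every action it depends on has completed.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Toy workflow: each action maps to the set of actions it depends on,
# mirroring how a workflow DAG wires actions together.
workflow = {
    "ingest":    set(),
    "clean":     {"ingest"},
    "aggregate": {"clean"},
    "export":    {"aggregate"},
}

def run_workflow(dag):
    """Execute actions in dependency order; each waits for its predecessors."""
    log = []
    for action in TopologicalSorter(dag).static_order():
        log.append(action)  # a real engine would launch a MapReduce/Pig/Hive action here
    return log
```

For the chain above, `run_workflow(workflow)` executes ingest, then clean, then aggregate, then export; a cycle in the graph would raise an error, which is why workflow jobs must be acyclic.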

10. Hive

10.1. SQL-like querying (HiveQL)

10.2. Structured data warehousing

10.3. Partition columns instead of indexes

10.4. Queries compile to MapReduce jobs, where a combiner can be used to optimize reducer performance
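The "partition columns instead of indexes" point can be illustrated with a toy layout (hypothetical data, not Hive itself): rows are physically grouped by the partition column, like HDFS directories named `dt=2024-01-01/`, so a query with a partition predicate reads only the matching group instead of scanning the whole table.

```python
# Toy partitioned table (illustrative only): keys play the role of
# partition-column values, i.e. one HDFS directory per partition.
table = {
    "2024-01-01": [("alice", 3), ("bob", 5)],
    "2024-01-02": [("carol", 7)],
}

def query(table, dt):
    """Partition pruning: scan only the partition named by the predicate."""
    return table.get(dt, [])  # one "directory" read, not a full-table scan
```

This is why partitioning works best on columns that appear in filter predicates, such as dates.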

11. Pig

11.1. Scripting for Hadoop

12. HBase

12.1. Transactional lookups

13. Flume

13.1. Log collector

13.2. Integrates into Hadoop

14. Oozie

15. Avro

15.1. Data parsing

15.2. Binary data serialization

15.3. RPC

15.4. language-neutral

15.5. optional codegen

15.6. schema evolution

15.7. untagged data

15.8. dynamic typing
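The "untagged data" property can be sketched with a toy binary codec (illustrative only, not the real Avro wire format): the encoded bytes carry no field names or type tags, so a reader needs the writer's schema to decode them, which is the trade-off Avro makes in exchange for compactness.

```python
import struct

# Hypothetical writer schema: field name -> struct format code.
WRITER_SCHEMA = [("id", "i"), ("score", "d")]

def encode(record, schema=WRITER_SCHEMA):
    """Pack fields in schema order with no tags or names in the output."""
    fmt = "<" + "".join(code for _, code in schema)
    return struct.pack(fmt, *(record[name] for name, _ in schema))

def decode(data, schema=WRITER_SCHEMA):
    """Decoding is impossible without the schema: the bytes are untagged."""
    fmt = "<" + "".join(code for _, code in schema)
    values = struct.unpack(fmt, data)
    return dict(zip((name for name, _ in schema), values))
```

Schema evolution then amounts to resolving a reader's schema against the writer's schema that produced the bytes, rather than relying on tags embedded in the data.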

16. Mahout

16.1. Machine learning

16.2. Applied to MR

17. Sqoop

17.1. Auto-generates Java InputFormat code for data access

18. MapReduce

18.1. Distributed compute
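The map and reduce phases noted above can be shown with a toy in-process word count (plain Python, not Hadoop's API): map emits key/value pairs, a shuffle step sorts and groups them by key, and reduce aggregates each group.

```python
from itertools import groupby

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: sort pairs and group values by key, as the framework does."""
    pairs = sorted(pairs)
    return {k: [v for _, v in grp] for k, grp in groupby(pairs, key=lambda kv: kv[0])}

def reduce_phase(grouped):
    """Reduce: aggregate each key's values into a final answer."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big compute", "big cluster"]
pairs = [kv for line in lines for kv in map_phase(line)]
result = reduce_phase(shuffle(pairs))
```

A combiner would run this same summation on each mapper's local output before the shuffle, shrinking the data sent over the network to the reducers.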

19. Ambari

19.1. Cluster deployment and admin

19.2. Driven by Hortonworks

20. ZooKeeper

20.1. Coordinator of shared state between apps

20.2. Naming, configuration, and synchronization services
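The naming/configuration/synchronization idea can be sketched with an in-process toy (this is NOT the ZooKeeper client API; class and method names are hypothetical): applications watch a named path and are notified when its value changes, similar to znode watches.

```python
# Toy shared-configuration store with one-shot watch callbacks, an
# in-process sketch of the coordination concept only.
class ToyCoordinator:
    def __init__(self):
        self._data = {}
        self._watches = {}

    def set(self, path, value):
        """Store a value and fire any watches registered on the path."""
        self._data[path] = value
        for cb in self._watches.pop(path, []):  # ZooKeeper watches are one-shot too
            cb(path, value)

    def get(self, path):
        return self._data.get(path)

    def watch(self, path, callback):
        """Register interest in the next change to a named path."""
        self._watches.setdefault(path, []).append(callback)

events = []
zk = ToyCoordinator()
zk.watch("/config/db_host", lambda p, v: events.append((p, v)))
zk.set("/config/db_host", "db1.example.com")
```

The real service adds what this toy omits: replication, ordering guarantees, and ephemeral nodes for failure detection.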

21. YARN

21.1. cluster management

21.2. Hadoop 2

21.3. resource manager

21.4. job scheduler

22. Bigtop

22.1. Packages the Hadoop ecosystem

22.2. Tests Hadoop ecosystem packages

23. Related Apache Ecosystems

24. HDFS

24.1. Distributed storage

25. Spark

25.1. In-memory distributed compute

26. Impala

26.1. SQL query engine

26.2. Real-time (low-latency) queries

27. Cascading

27.1. Higher-level abstraction over MR

27.2. Creates a Flow that assembles Map/Reduce jobs