Mind Map: Hadoop Stack

1. 2. Clustering architectures

1.1. 2.1. What

1.2. 2.2. Why

1.2.1. Save cost

1.2.2. Large volumes of data can be processed

1.2.3. Speed, faster performance

1.3. 2.3. How

1.3.1. Data locality

1.3.2. Scalable
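
Data locality means moving the computation to the node that already holds the data, rather than moving the data over the network. A minimal sketch of that idea, assuming a hypothetical `pick_node` helper and made-up node names (this is not Hadoop's actual scheduler):

```python
def pick_node(block_id, block_locations, free_nodes):
    """Choose a node to process a block, preferring one that stores it.

    block_locations: dict mapping block id -> list of nodes holding a copy
    free_nodes: set of nodes currently able to accept work
    Returns (node, is_local); node is None when no node is free.
    """
    # First choice: a free node that already has the block on local disk.
    for node in block_locations.get(block_id, []):
        if node in free_nodes:
            return node, True
    # Fallback: any free node, which must fetch the block over the network.
    if free_nodes:
        return sorted(free_nodes)[0], False
    return None, False


if __name__ == "__main__":
    locations = {"b1": ["node2", "node3"]}  # hypothetical block placement
    print(pick_node("b1", locations, {"node1", "node3"}))  # ('node3', True)
    print(pick_node("b1", locations, {"node1"}))           # ('node1', False)
```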

1.4. 2.4. Issues

1.4.1. Node failure

2. 3. Big Data

2.1. 3.1. Some statistics

2.2. 3.2. How big is big data

2.3. 3.3. Is this big data

2.4. 3.4. Challenges

2.4.1. Velocity

2.4.2. Volume

2.4.3. Variety

3. 4. HDFS

3.1. 4.1. What is HDFS

3.1.1. Storage engine for Hadoop

3.1.2. Designed for storing very large files

3.1.3. Runs on commonly available hardware

3.2. 4.2. The File system

3.2.1. File Disk

3.2.2. Distributing data

3.2.3. Default block size: 64 MB (128 MB from Hadoop 2 onward); a typical local file system uses 4-8 KB blocks

3.2.4. Replication
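
The effect of block size and replication on storage can be worked out with simple arithmetic. The sketch below uses the 64 MB block size from this outline and assumes a replication factor of 3 (HDFS's usual default, not stated in the outline); the function names are illustrative only.

```python
import math

BLOCK_SIZE_MB = 64        # default block size quoted in this outline
REPLICATION_FACTOR = 3    # assumption: the usual HDFS default


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file is split into (the last one may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb, replication=REPLICATION_FACTOR):
    """Total cluster storage used once every block is replicated."""
    return file_size_mb * replication


if __name__ == "__main__":
    size_mb = 1000  # hypothetical 1000 MB file
    print(block_count(size_mb))     # 16
    print(raw_storage_mb(size_mb))  # 3000
```

A 1000 MB file therefore occupies 16 blocks (15 full and 1 partial), and with three copies of each block the cluster stores 3000 MB in total.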

3.3. 4.3. Accessing the file system

3.3.1. Via command line

3.3.2. Over HTTP

3.3.3. From Java

3.3.4. LIBHDFS (C language)

3.4. 4.4. Nodes

3.4.1. Name node (NameNode): holds the file system metadata

3.4.2. Worker node (DataNode): stores the data blocks

4. 5. Map Reduce

4.1. 5.1. What?

4.1.1. Refers to two separate, distinct tasks (map and reduce) that a Hadoop job performs

4.2. 5.2. Overview

4.3. 5.3. What should I do?

4.3.1. Write map function

4.3.2. Write reduce function

4.3.3. Hadoop takes care of issuing the commands
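
The division of labour above can be illustrated with a word count. This is a plain-Python sketch, not the Hadoop API: `map_fn` and `reduce_fn` are the two functions a developer would write, and `run_job` is a hypothetical stand-in for the work the framework does (running the map, grouping values by key, running the reduce).

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce: combine all the counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Stand-in for the framework: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data node"]))
    # {'big': 2, 'data': 2, 'cluster': 1, 'node': 1}
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the grouping step moves data between nodes; here everything runs in one process to keep the example small.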

5. 6. In Action

5.1. 6.1. Roles

5.1.1. 6.1.1. Administrators

5.1.1.1. Installation of the system

5.1.1.1.1. Single node cluster in UNIX

5.1.1.1.2. Using Amazon Elastic MapReduce

5.1.1.1.3. Setup a multi-node cluster

5.1.1.1.4. In Windows

5.1.1.2. Monitor and manage the system

5.1.1.3. Tune the system and add nodes to make it scalable

5.1.2. 6.1.2. Developers

5.1.2.1. Design applications

5.1.2.2. Import and export data

6. 7. Hadoop Overview

6.1. 7.1. What is Hadoop?

6.1.1. Set of frameworks

6.1.2. Used to develop apps that run on a big data platform

6.2. 7.2. How is it done?

6.2.1. Data is broken into smaller chunks

6.2.2. Data sets are distributed across different nodes

6.2.3. Processing logic is also distributed across different nodes

6.2.4. Each node processes its logic and returns its output

6.2.5. Data is aggregated and results are calculated
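
The five steps above can be sketched on a single machine. In this sketch worker threads stand in for cluster nodes (an assumption made so the example runs anywhere), and the "processing logic" is just a partial sum; the function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Step 1: break the data into smaller chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    """Step 4: the logic each 'node' runs on its own chunk (a partial sum)."""
    return sum(chunk)


def distributed_sum(data, chunk_size=4, workers=3):
    chunks = split_into_chunks(data, chunk_size)
    # Steps 2-3: hand each chunk, together with the logic, to a worker.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Step 5: aggregate the partial outputs into the final result.
    return sum(partials)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 11))))  # 55
```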

6.3. 7.3. Core

6.3.1. HDFS

6.3.2. Map Reduce

6.4. 7.4. Why Hadoop

6.4.1. Technology independent

6.4.2. Open source

6.4.3. Based on small commodity hardware

6.5. 7.5. When to use

6.5.1. Where data cannot be processed by a single machine

6.6. 7.6. When not to use

6.6.1. Where processing cannot be divided into multiple chunks

6.6.2. Where processing has to be done sequentially

6.6.3. Online transactions

6.7. 7.7. Hadoop history

6.7.1. Timeline

6.8. 7.8. Where is Hadoop used?

6.8.1. 7.8.1. Users

6.8.1.1. IBM

6.8.1.2. Facebook

6.8.1.3. Applicable across any domain

6.8.1.4. EBay

6.8.2. 7.8.2. Applications

6.8.2.1. Detecting changes in video patterns

7. 8. Architecture

7.1. 8.1. How it works

7.2. 8.2. Advantages to developers (details they do not have to handle themselves)

7.2.1. Where the file is located

7.2.2. How to manage failures

7.2.3. How to break the work into multiple jobs

7.2.4. How to program for scaling

8. 9. Misc

8.1. 9.1. References

8.2. 9.2. Hadoop Technical stack

8.2.1. Diagram

8.3. 9.3. Limitations

8.3.1. Designed for sequential read, not random seek

8.3.2. Designed for write once and read many times

8.3.3. Hardware failures happen, but in the big picture they are not noticeable

9. 10. New Idea

10. 1. Hadoop Core

10.1. 1.1. Hive

10.1.1. Data warehouse infrastructure on top of Hadoop

10.1.2. Provides a query and analysis API

10.2. 1.2. HBase

10.2.1. Non-relational database that runs on top of HDFS

10.2.2. Written in Java

10.3. 1.3. Pig

10.3.1. Makes MapReduce jobs easier to write using Pig Latin, a high-level scripting language loosely similar to SQL

10.4. 1.4. Oozie

10.4.1. Workflow scheduler system that manages Hadoop jobs

10.5. 1.5. Sqoop

10.5.1. Tool used to transfer bulk data between relational databases and Hadoop

10.6. 1.6. Flume

10.7. 1.7. Zookeeper