Hadoop

1. Hadoop stack

1.1. Hive

1.1.1. Data warehouse infrastructure built on top of Hadoop

1.1.2. Provides an SQL-like query and analysis interface, HiveQL (see the sketch below)
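
A minimal sketch of querying Hive from Java over JDBC, assuming a HiveServer2 instance on localhost:10000 and a hypothetical "logs" table:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC driver; host, port, and table name are assumptions
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default");
                 Statement stmt = conn.createStatement();
                 // HiveQL: SQL-like syntax compiled into MapReduce jobs
                 ResultSet rs = stmt.executeQuery(
                     "SELECT level, COUNT(*) FROM logs GROUP BY level")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }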

1.2. HBase

1.2.1. Non-relational, column-oriented database that runs on top of HDFS

1.2.2. Written in Java, with a Java client API (see the sketch below)
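
A minimal sketch of writing and reading one cell through the HBase Java client; the "users" table and "info" column family are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("users"))) {
                // Write one cell: row key, column family, qualifier, value
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                              Bytes.toBytes("Alice"));
                table.put(put);
                // Read the cell back
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }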

1.3. Pig

1.3.1. Makes writing MapReduce jobs easier through a high-level scripting language, Pig Latin (see the sketch below)
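
A minimal sketch of driving Pig from Java with PigServer in local mode; the input and output paths are assumptions:

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class PigWordCount {
        public static void main(String[] args) throws Exception {
            // Local mode for testing; use ExecType.MAPREDUCE against a cluster
            PigServer pig = new PigServer(ExecType.LOCAL);
            // Each Pig Latin statement is compiled into a MapReduce plan
            pig.registerQuery("lines = LOAD 'input.txt' AS (line:chararray);");
            pig.registerQuery(
                "words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;");
            pig.registerQuery("grouped = GROUP words BY word;");
            pig.registerQuery("counts = FOREACH grouped GENERATE group, COUNT(words);");
            pig.store("counts", "output"); // writes the results to 'output'
        }
    }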

1.4. Oozie

1.4.1. Workflow scheduler system for managing Hadoop jobs (client sketch below)
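
A hedged sketch of submitting a workflow through the Oozie Java client; the server URL and application path are assumptions:

    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;

    public class OozieSubmit {
        public static void main(String[] args) throws Exception {
            // Point the client at the Oozie server (URL is an assumption)
            OozieClient oozie = new OozieClient("http://localhost:11000/oozie");
            Properties conf = oozie.createConfiguration();
            // The workflow definition (workflow.xml) must already be in HDFS
            conf.setProperty(OozieClient.APP_PATH,
                             "hdfs://namenode:9000/user/demo/my-wf-app");
            // Submit and start the workflow job, then poll its status
            String jobId = oozie.run(conf);
            System.out.println("Workflow status: "
                + oozie.getJobInfo(jobId).getStatus());
        }
    }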

1.5. Sqoop

1.5.1. Tool for transferring bulk data between relational databases and Hadoop

1.6. Flume

1.6.1. Service for collecting and moving large volumes of streaming log data into HDFS

1.7. ZooKeeper

1.7.1. Centralized coordination service for configuration, naming, and synchronization across the cluster

2. Clustering architectures

2.1. What

2.2. Why

2.2.1. Saves cost: built from commodity hardware rather than specialized servers

2.2.2. Larger volumes of data can be processed

2.2.3. Speed: faster performance through parallel processing

2.3. How

2.3.1. Data locality: computation moves to where the data is stored

2.3.2. Scalable: capacity grows by adding nodes

2.4. Issues

2.4.1. Node failure

3. Big Data

3.1. Some statistics

3.2. How big is big data?

3.3. Is this big data?

3.4. Challenges

3.4.1. Velocity

3.4.2. Volume

3.4.3. Variety

4. HDFS

4.1. What is HDFS

4.1.1. Storage engine for Hadoop

4.1.2. Designed for storing very large files

4.1.3. Runs on commonly available hardware

4.2. The File system

4.2.1. FileDisk

4.2.2. Distributing data

4.2.3. Default block size: 64 MB (vs. the 4-8 KB blocks typical of ordinary file systems)

4.2.4. Replication (configuration sketch below)
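
A hedged sketch of these settings expressed programmatically; dfs.blocksize and dfs.replication are the standard HDFS property names, and the values shown match the defaults described above:

    import org.apache.hadoop.conf.Configuration;

    public class HdfsSettings {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Block size in bytes: 64 MB, the default noted above
            conf.setLong("dfs.blocksize", 64L * 1024 * 1024);
            // Number of data nodes each block is replicated to
            conf.setInt("dfs.replication", 3);
            System.out.println(conf.get("dfs.blocksize")
                + " bytes, replication x" + conf.get("dfs.replication"));
        }
    }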

4.3. Accessing the file system

4.3.1. Via command line

4.3.2. Over HTTP

4.3.3. From Java (see the sketch below)

4.3.4. libhdfs (C library)
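
A minimal sketch of the "From Java" route using the FileSystem API; the NameNode URI and paths are assumptions:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            // Connect to the name node (URI is an assumption)
            FileSystem fs = FileSystem.get(
                URI.create("hdfs://namenode:9000"), new Configuration());
            // Copy a local file into HDFS, then stream it back line by line
            fs.copyFromLocalFile(new Path("data.txt"),
                                 new Path("/user/demo/data.txt"));
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    fs.open(new Path("/user/demo/data.txt"))))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }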

4.4. Nodes

4.4.1. Name node: holds the file system metadata (namespace and block locations)

4.4.2. Data (worker) nodes: store the actual blocks

5. Map Reduce

5.1. What?

5.1.1. Refers to the two distinct tasks that a Hadoop job performs: map and reduce

5.2. Overview

5.3. What should I do

5.3.1. Write map function

5.3.2. Write reduce function

5.3.3. Hadoop takes care of the rest: splitting input, scheduling tasks, and handling failures (word-count sketch below)
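
A hedged sketch of the two functions for the classic word-count job, using the org.apache.hadoop.mapreduce API:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map: emit (word, 1) for every word in the input line
    class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce: sum the counts emitted for each word
    class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }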

6. In Action

6.1. Roles

6.1.1. Administrators

6.1.1.1. Installation of the system

6.1.1.1.1. Single-node cluster on Unix

6.1.1.1.2. Using Amazon Elastic MapReduce

6.1.1.1.3. Set up a multi-node cluster

6.1.1.1.4. On Windows

6.1.1.2. Monitor and manage the system

6.1.1.3. Tune the cluster and add nodes to keep it scalable

6.1.2. Developers

6.1.2.1. Design applications

6.1.2.2. Import and export data

7. Hadoop Overview

7.1. What is Hadoop

7.1.1. A set of open-source frameworks

7.1.2. Used to develop applications that process big data on distributed clusters

7.2. How is it done

7.2.1. Data is broken into smaller chunks

7.2.2. Data sets are distributed across different nodes

7.2.3. Processing logic is also distributed across different nodes

7.2.4. Each node processes its chunk and returns its output

7.2.5. Outputs are aggregated and the final results calculated (driver sketch below)
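
A hedged sketch of the driver that submits such a job; it reuses the word-count mapper and reducer sketched in section 5 and assumes the input and output paths arrive as command-line arguments:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            // Hadoop splits the input into chunks, schedules map tasks close to
            // the data, shuffles the intermediate pairs, and aggregates them in
            // reduce tasks; the driver only describes the job
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }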

7.3. Core

7.3.1. HDFS

7.3.2. Map Reduce

7.4. Why Hadoop

7.4.1. Technology independent

7.4.2. Open source

7.4.3. Runs on small, commodity hardware

7.5. When to use

7.5.1. When the data cannot be processed by a single machine

7.6. When not to use

7.6.1. When processing cannot be divided into independent chunks

7.6.2. When processing must be done sequentially

7.6.3. Online transaction processing

7.7. Hadoop history

7.7.1. Timeline

7.8. Where is Hadoop used

7.8.1. Users

7.8.1.1. IBM

7.8.1.2. Facebook

7.8.1.3. eBay

7.8.1.4. Applicable across any domain

7.8.2. Applications

7.8.2.1. Detecting change patterns in video

8. Architecture

8.1. How it works

8.2. Advantages for developers: Hadoop handles these concerns

8.2.1. Where the files are located

8.2.2. How to manage failures

8.2.3. How to break work into multiple jobs

8.2.4. How to program for scaling

9. Misc

9.1. References

9.2. Hadoop Technical stack

9.2.1. Diagram

9.3. Limitations

9.3.1. Designed for sequential read, not random seek

9.3.2. Designed for write-once, read-many access

9.3.3. Designed assuming hardware failures are common; replication keeps them from being noticeable overall
