
Hadoop by Mind Map: Hadoop

Hadoop

Deiveehan Nallazhagappan

Hadoop stack

Hive

data warehouse infrastructure on top of Hadoop

provides a SQL-like query and analysis API (HiveQL)

HBase

non-relational database that runs on top of HDFS

written in Java

Pig

makes writing MapReduce code easier using a SQL-like language (Pig Latin)

Oozie

workflow scheduler system to manage Hadoop jobs

Sqoop

tool used to transfer bulk data between relational databases and Hadoop

Flume

service for collecting and moving large volumes of streaming log data into HDFS

Zookeeper

centralized coordination service for distributed applications

Clustering architectures

What

Why

Save cost

Large volumes of data can be processed

Speed: faster performance

How

Data locality

Scalable

Issues

Node failure

Big Data

Some statistics

How big is big data

Is this big data

Challenges

Velocity

Volume

Variety

HDFS

What is HDFS

Storage engine for Hadoop

Designed for storing very large files

Runs on commonly available (commodity) hardware

The File system

FileDisk

Distributing data

Default block size: 64 MB (a normal file system block is 4-8 KB)

Replication
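The splitting and replication above can be sketched in plain Python (this is an illustration, not Hadoop code; the block size matches the HDFS default, while the node names and round-robin placement are simplifying assumptions):

```python
# Sketch: split a file into fixed-size blocks and replicate each block
# across several nodes, the way HDFS distributes data.

BLOCK_SIZE = 64 * 1024 * 1024  # HDFS default block size: 64 MB
REPLICATION = 3                # HDFS default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (block_index, length) pairs covering the whole file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((len(blocks), length))
        offset += length
    return blocks

def place_replicas(blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: each block is stored on `replication` nodes."""
    placement = {}
    for index, _ in blocks:
        placement[index] = [nodes[(index + r) % len(nodes)]
                            for r in range(replication)]
    return placement

blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
print(len(blocks))                             # 4 blocks: 64 + 64 + 64 + 8 MB
print(place_replicas(blocks, ["node1", "node2", "node3", "node4"])[0])
```

A real name node uses rack-aware placement rather than round-robin, but the bookkeeping (block list plus a block-to-nodes map) is the same idea.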

Accessing the file system

Via command line

Over HTTP

From Java

LIBHDFS (C language)
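The "over HTTP" access path is the WebHDFS REST API, where each file operation maps to a URL of the form `/webhdfs/v1/<path>?op=<OP>`. A small sketch that only builds such URLs (host, port, and paths are illustrative; no cluster is contacted):

```python
# Sketch: construct WebHDFS REST URLs for HDFS file operations over HTTP.

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL: http://host:port/webhdfs/v1<path>?op=OP&..."""
    query = "&".join([f"op={op}"] + [f"{k}={v}" for k, v in params.items()])
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# Read a file (HTTP GET):
print(webhdfs_url("namenode.example.com", 50070, "/user/data/log.txt", "OPEN"))
# List a directory (HTTP GET):
print(webhdfs_url("namenode.example.com", 50070, "/user/data", "LISTSTATUS"))
```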

Nodes

Name node

Worker node

Map Reduce

What?

refers to two separate and distinct tasks that a Hadoop job performs: a map task and a reduce task

Overview

What should I do

Write map function

Write reduce function

Hadoop takes care of issuing the commands
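The division of labor above can be shown with the classic word-count example, sketched in plain Python (no Hadoop APIs): the developer writes only the map and reduce functions; the `run_job` helper here is a stand-in for everything Hadoop does between them.

```python
# Sketch of the MapReduce model: map emits (key, value) pairs,
# reduce aggregates all values for one key.

from collections import defaultdict

def map_fn(line):
    # The developer's map function: emit (word, 1) for every word.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # The developer's reduce function: combine all values for one key.
    return (word, sum(counts))

def run_job(lines):
    # Stand-in for the framework: the shuffle groups values by key,
    # then reduce runs once per key. Hadoop also handles distribution
    # across nodes and failure recovery, omitted here.
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_fn(line):
            grouped[word].append(count)
    return dict(reduce_fn(w, c) for w, c in grouped.items())

print(run_job(["big data on hadoop", "hadoop stores big files"]))
# {'big': 2, 'data': 1, 'on': 1, 'hadoop': 2, 'stores': 1, 'files': 1}
```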

In Action

Roles

Administrators: install the system (single-node cluster on Unix, using Elastic MapReduce, or a multi-node cluster; on Windows, install Linux first), monitor and manage the system, tune and add nodes to make it scalable

Developers: design applications, import and export data

Hadoop Overview

What is hadoop

set of frameworks

used to develop applications that can run on a big-data platform

How is it done

Data is broken into smaller chunks

Data sets are distributed across different nodes

Processing logic is also distributed across different nodes

Each node processes its logic and returns its output

Data is aggregated and results are calculated
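The five steps above can be illustrated with a toy job in plain Python (not Hadoop code; the "nodes" here are simulated in one process, whereas a real cluster runs them in parallel on different machines):

```python
# Sketch of the Hadoop processing pattern: chunk the data, let each
# node process its chunk with the same logic, then aggregate.

def process_chunk(chunk):
    # The per-node work: each node handles only its own chunk.
    return sum(chunk)

def run_distributed(data, num_nodes=4):
    # Step 1-2: break the data into smaller chunks, one per node.
    chunks = [data[i::num_nodes] for i in range(num_nodes)]
    # Step 3-4: each node runs the same logic on its chunk; on a real
    # cluster these run in parallel, with the logic shipped to the data.
    partials = [process_chunk(chunk) for chunk in chunks]
    # Step 5: aggregate the per-node outputs into the final result.
    return sum(partials)

print(run_distributed(list(range(1, 101))))  # 5050
```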

Core

HDFS

Map Reduce

Why Hadoop

Technology independent

open source

based on inexpensive commodity hardware

When to use

where data cannot be processed by a single machine

When not to use

where processing cannot be divided into multiple chunks

where processing has to be done sequentially

online transactions

Hadoop history

Timeline

Where is hadoop used

Users: IBM, Facebook, eBay (applicable across any domain)

Applications: detecting patterns of change in video

Architecture

How it works

Advantages for developers: Hadoop handles the following concerns

Where the file is located

How to manage failures

How to break processing into multiple jobs

How to program for scaling

Misc

References

EMR training playlist (11 videos): http://www.youtube.com/playlist?list=PLB5E99B925DBE79FF

Creating a MapReduce application: http://java.dzone.com/articles/running-elastic-mapreduce-job

Demystifying Hadoop: http://www.youtube.com/watch?v=XtLXPLb6EXs

Hadoop home page: http://hadoop.apache.org/

Hadoop Technical stack

Diagram

Limitations

Designed for sequential reads, not random seeks

Designed for write-once, read-many access patterns

Hardware failures happen, but with replication they are not noticeable in the big picture
