Hadoop talk, Nathan Milford, Outbrain

@NathanMilford

Test nodes

less good for random access

If you lose the NameNode, your whole cluster is useless

Hadoop HDFS

expects failure

Nodes

intended for commodity hardware

by default, replication of 3

optimized for large files

All NameNode data stored in RAM
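
The HDFS notes above (replication of 3, optimized for large files, NameNode data in RAM) can be turned into a rough sizing sketch. Only the replication factor comes from the talk. The 128 MiB block size and the figure of about 150 bytes of NameNode heap per file or block object are assumptions (a commonly quoted rule of thumb), and the function names are made up for illustration.

```python
# Rough HDFS sizing sketch. The constants are assumptions, not measurements.
import math

REPLICATION = 3                 # default replication factor (from the talk)
BLOCK_SIZE = 128 * 1024 ** 2    # assumed block size; configurable per cluster
BYTES_PER_OBJECT = 150          # assumed NameNode heap per file or block object


def raw_storage_bytes(logical_bytes, replication=REPLICATION):
    """Disk space consumed across the cluster for a given amount of data."""
    return logical_bytes * replication


def namenode_heap_bytes(num_files, file_size, block_size=BLOCK_SIZE):
    """Approximate NameNode heap: one object per file plus one per block."""
    blocks_per_file = max(1, math.ceil(file_size / block_size))
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_tib = 1024 ** 4
    print("raw disk for 1 TiB:", raw_storage_bytes(one_tib))
    # Same 1 TiB of data, stored as many small files vs. few large files.
    print("heap, 1 MiB files:", namenode_heap_bytes(1024 ** 2, 1024 ** 2))
    print("heap, 1 GiB files:", namenode_heap_bytes(1024, 1024 ** 3))
```

Under these assumptions, 1 TiB stored as 1 MiB files costs the NameNode roughly 300 MiB of heap, while the same data as 1 GiB files costs a little over 1 MiB. That is one way to read "optimized for large files".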

Workflow

About

Ops Engineer at Outbrain

Usage

15 Hadoop nodes

Used for Reports

Why

Scales really well

Scale-up alternatives eventually hit a wall

When using sharding, you need a reliable framework to work with it

When writing scripts yourself (e.g., ETL), they usually start "eating their own tail"

Why not commercial alternatives?

2 distros of Hadoop

ASF (Apache) vs. CDH

ASF has the bleeding edge features

most use CDH

abstraction between ops & dev

devs treat the data as an abstract dataset, not caring where it is

ops just do the plumbing & make it work

design principles

support

"scatter & gather"

share nothing

map/reduce is very simple, but can be applied to almost anything

know before you build

according to the use-case, you'll need to set up the cluster differently
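
To illustrate the "scatter & gather" and "share nothing" principles, here is a minimal in-memory sketch of the map/reduce pattern as a word count. It is plain Python, not the Hadoop API. The function names are hypothetical, and a real cluster would run the map and reduce steps on separate nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Scatter: each line is mapped independently, sharing no state.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Gather: group by key, then reduce each group on its own.
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop scales", "Hadoop expects failure"]))
```

Because neither `map_phase` nor `reduce_phase` touches shared state, each call could run on a different machine. Swapping in different map and reduce functions is what lets the same pattern apply to many problems.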

Ecosystem

Hadoop MapReduce

Hive

Pig

Other awesome stuff used in Outbrain

Sqoop

Oozie

Flume

More stuff from Cloudera

Outbrain's Cluster

15 nodes

Resources

Eric Sammer's Productizing Hadoop talk

...

Tips

Replace the default Java-based compression