Hadoop World


1. Lambda

1.1. Twitter Summingbird

1.2. Lambdoop
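
The Lambda architecture pairs a batch layer, which periodically recomputes views from the complete, immutable master dataset, with a speed layer that covers only events arriving after the last batch run. Queries merge both views. The Python sketch below illustrates that idea for a simple event count, assuming all data fits in memory. The names `batch_view`, `SpeedLayer`, and `query` are hypothetical and do not reflect the APIs of Twitter Summingbird or Lambdoop.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_dataset: Iterable[str]) -> Counter:
    """Batch layer: recompute the view from scratch over the full master dataset."""
    return Counter(master_dataset)


class SpeedLayer:
    """Speed layer: incrementally counts events not yet covered by the batch view.

    Hypothetical class for illustration only.
    """

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.realtime_view[key] += 1

    def reset(self) -> None:
        # Called once a fresh batch view has absorbed these events.
        self.realtime_view.clear()


def query(key: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving layer: merge the precomputed batch view with the real-time view."""
    return batch[key] + speed.realtime_view[key]


if __name__ == "__main__":
    batch = batch_view(["click", "view", "click"])
    speed = SpeedLayer()
    for event in ["click", "view"]:
        speed.on_event(event)
    print(query("click", batch, speed))  # 3
    print(query("view", batch, speed))   # 2
```

The design choice worth noting is that the batch view is always recomputed rather than updated, so errors in the speed layer are discarded at the next batch run.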

2. Architectures

3. Distributed Framework Manager

3.1. Large-Scale Data Transfer

3.1.1. DistCp

3.2. Scheduling Type

3.2.1. Monolithic

3.2.2. Two-Level

3.2.3. Shared State

3.3. Software

3.3.1. Apache Mesos

3.3.1.1. Scheduling

3.3.1.2. Monitoring

3.3.2. Apache Ambari

3.3.2.1. Monitoring

3.3.2.2. Manage Cluster

3.3.2.3. Automated Deployment

3.3.3. Ganglia

3.3.3.1. Monitoring

3.3.4. Ooyala Spark Job-Server

3.3.5. Google Kubernetes
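
Of the scheduling types listed in this section, two-level scheduling is the model Apache Mesos follows. A central allocator offers free resources to framework schedulers, and each framework decides which of its own tasks to launch on an offer. The sketch below is a deliberately simplified, hypothetical model of that idea: offers carry CPUs only, frameworks are consulted in a fixed order, and whatever one framework leaves unused is offered to the next. It is not the Mesos API and ignores the allocation policy Mesos actually applies; `Framework` and `offer_round` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Framework:
    """Second level (hypothetical): a framework scheduler places its own tasks."""
    name: str
    cpus_per_task: float
    pending_tasks: int
    placements: List[str] = field(default_factory=list)

    def accept(self, agent: str, free_cpus: float) -> float:
        # Launch as many pending tasks as fit in the offer; return CPUs used.
        fits = min(self.pending_tasks, int(free_cpus // self.cpus_per_task))
        self.pending_tasks -= fits
        self.placements.extend([agent] * fits)
        return fits * self.cpus_per_task


def offer_round(agents: Dict[str, float], frameworks: List[Framework]) -> Dict[str, float]:
    """First level (simplified): offer each agent's free CPUs to frameworks in turn.

    Returns the CPUs left unused on each agent after the round.
    """
    remaining = dict(agents)
    for agent in remaining:
        for fw in frameworks:
            remaining[agent] -= fw.accept(agent, remaining[agent])
    return remaining


if __name__ == "__main__":
    spark = Framework("spark", cpus_per_task=1.5, pending_tasks=3)
    batch = Framework("batch", cpus_per_task=1.0, pending_tasks=2)
    print(offer_round({"a1": 4.0, "a2": 2.0}, [spark, batch]))
    print(spark.placements, batch.placements)
```

The point of the split is that the allocator never needs to understand task requirements; each framework applies its own placement logic to the resources it is offered.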

4. Configuration Management

4.1. Software

4.1.1. Apache ZooKeeper

5. Core

5.1. Distributed Filesystem

5.1.1. HDFS

5.2. Scheduling Big Data Jobs

5.2.1. YARN

5.2.1.1. MapReduce

5.2.2. Job Schedule Manager

5.2.2.1. Apache REEF

6. Search

6.1. Solr

7. Relational Databases

7.1. General

7.1.1. Cloudera Impala

7.2. Warehouse (OLAP)

7.2.1. Apache Hive

7.2.1.1. Apache Tajo

7.2.1.2. Spark SQL

7.3. Query Language

7.3.1. Hive-QL

7.3.2. SQL

7.4. Read-Only/Low Latency

7.4.1. Splout SQL

7.5. Transactions

7.5.1. Row-Based ACID

7.5.2. ACID

7.5.3. Eventually Consistent

7.5.4. Eventually Durable

7.6. Interfaces

7.6.1. Software

7.6.1.1. Apache Thrift

7.6.2. JDBC

7.6.3. ODBC

7.7. Transactional

7.7.1. Splice Machine

7.7.2. Stinger.next/Apache Hive

8. Event Processing

8.1. Spark Streaming

9. Use Cases

9.1. Large-Scale Logging and Failure Analysis

9.1.1. Apache Chukwa

9.2. Predictive Maintenance

9.3. Personalized Advertisement

9.4. Master Data Management

9.5. Preference Learning

9.6. Gamification

9.7. Business Warehouse

10. NoSQL

10.1. Data Storage Type

10.1.1. Key/Value

10.1.1.1. Apache HBase

10.1.1.1.1. SQL

10.1.1.2. Apache Accumulo

10.1.2. Graph

10.1.2.1. Neo4J

10.1.2.2. Apache Giraph

10.1.2.2.1. Bagel

10.1.2.3. GraphX

10.1.3. Columnar

10.1.3.1. Parquet (Storage Format)

10.1.3.2. Apache Drill

10.1.3.3. Apache Cassandra

10.1.4. GIS

10.1.4.1. GIS Tools for Hadoop

10.1.4.2. Spatial Hadoop

10.2. Query Language

10.2.1. Apache Pig

11. Processing Paradigm

11.1. Batch-Processing

11.1.1. MapReduce

11.1.2. Tez

11.1.3. Spark

11.2. Stream Processing (Real-Time)

11.2.1. Software

11.2.1.1. Apache Spark

11.2.1.2. Apache Flink

11.3. Integration of Both

11.3.1. Software

11.3.1.1. Twitter Summingbird

11.4. In-Memory

11.4.1. Apache Tez

11.4.2. Tachyon (now Alluxio)

11.4.3. Apache Ignite

11.4.4. Apache Flink

11.5. Libraries

11.5.1. Apache Crunch
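
The MapReduce model listed under batch processing in this section splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that aggregates each group. A minimal single-process Python sketch of the classic word count follows. The function names are hypothetical, and a real Hadoop job distributes these phases across a cluster and reads its input from HDFS rather than from an in-memory list.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all intermediate values by key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values that share a key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Spark and Hadoop process data"]))
```

Because mappers see one record at a time and reducers see one key at a time, each phase can run in parallel on different machines; only the shuffle requires moving data between them.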

12. Statistical Analytics/Machine Learning

12.1. Software

12.1.1. RHadoop

12.1.2. RHipe

12.1.3. Apache Mahout

12.1.4. SparkR

12.1.5. Apache Hama

12.1.6. MLlib

12.1.7. Weka (distributedWekaHadoop)

12.1.8. DDF.io - Distributed Data Frame

12.1.9. Kepler

12.2. Languages

12.2.1. R

12.2.2. Java

12.2.3. Python

12.3. GUI

12.3.1. Browser

12.3.1.1. RStudio Web

12.3.1.2. Cloudera Hue

12.3.1.3. Apache Zeppelin

12.4. In-Database Analytics

12.4.1. Hivemall

13. Alternatives

13.1. Event-Processing

13.1.1. Apache Storm

14. Managing Environments

14.1. Software

14.1.1. Puppet

14.1.2. Chef

14.1.3. Google Kubernetes

14.2. Software Container

14.2.1. Software

14.2.1.1. Docker

14.3. Deploy

14.3.1. Software

14.3.1.1. Apache Slider

15. Cloud Manager

15.1. Software

15.1.1. Apache Deltacloud

15.1.2. Ubuntu Juju

15.1.3. Apache Whirr

15.1.4. Cloudera Cloud Manager

15.1.5. OpenStack

15.1.5.1. Sahara (formerly Savanna)

16. Data Import/Export

16.1. Software

16.1.1. Apache Flume

16.1.2. Apache Sqoop

17. Reporting

17.1. Software

17.1.1. R

17.1.1.1. Markdown

17.1.1.2. knitr

18. Workflows

18.1. Software

18.1.1. Apache Oozie

18.1.1.1. Apache Falcon

18.1.2. Apache Flink (Stratosphere)

18.1.3. Spotify Luigi

18.2. Run-time / Query Optimization

18.3. Data transformation

19. Packaging/Distribution

19.1. Cloud

19.1.1. Amazon Elastic MapReduce (EMR)

19.1.2. Microsoft Azure HDInsight

19.1.3. Google Compute Hadoop

19.1.4. Altiscale

19.2. On-Premise

19.2.1. MapR

19.2.2. Apache Bigtop

19.2.3. Hortonworks

19.2.4. Microsoft HDInsight

19.2.5. Cloudera Enterprise

19.2.6. Buildoop

19.2.6.1. Lambda Architecture

20. Distributed File Systems

20.1. Windows Azure Blob Storage

20.2. CassandraFS

20.3. CephFS

20.4. CleverSafe Object Store

20.5. Google Cloud Storage Connector

20.6. GlusterFS

20.7. GridGain

20.8. Lustre

20.9. MapR FileSystem

20.10. OrangeFS

20.11. Quantcast File System

20.12. Symantec Veritas Cluster File System

20.13. Amazon S3

21. Security

21.1. Cluster

21.1.1. Software

21.1.1.1. Apache Knox

21.2. Data

21.2.1. Authorization

21.2.1.1. Software

21.2.1.1.1. Apache Sentry

22. Messaging

22.1. Software

22.1.1. Apache Kafka

22.1.1.1. Apache Samza

22.1.2. Akka

23. System Tools

23.1. JVM Garbage Collection

23.1.1. GCViewer

23.2. HDFS Live Statistics

23.2.1. Twitter HDFS-DU

23.3. Disk Image Analytics

23.3.1. HDFS FSImage

23.4. User Monitor

23.4.1. LinkedIn White Elephant

23.5. MapReduce Monitor

23.5.1. Twitter hRaven

24. Legacy Software Integration

24.1. Apache Slider

25. Data Cleaning

25.1. OpenRefine

25.2. Netflix Zeno

26. NewSQL

26.1. BayesDB

26.2. H-Store

27. Metadata

27.1. Kite

27.2. Hive Metastore

28. OLAP

28.1. Kylin