Big Data Fundamentals

1. Big data

1.1. Volume

1.1.1. The amount of data

1.2. Velocity

1.2.1. The frequency at which data arrives

1.3. Variety

1.3.1. The contents and formats of the data

2. Components

2.1. Apache Hadoop

2.1.1. Scalable storage and batch processing system

2.1.2. Complements existing systems by scaling out

2.1.3. HDFS: the Hadoop Distributed File System

2.1.4. YARN: resource scheduling and job execution

2.1.5. MapReduce: a YARN-based framework for processing large data sets on the cluster

2.1.6. How does this work?

2.1.6.1. Architecture: a Name Node (metadata) and multiple Data Nodes (storage)

2.1.6.2. Process: the client writes the file in 64 MB blocks (the Name Node assigns the Data Nodes); the receiving Data Node replicates each block to 2 other nodes
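
The write path above can be sketched in a few lines of Python: a file is split into fixed-size blocks, and each block is placed on 3 Data Nodes (1 primary plus 2 replicas). The function and node names here are illustrative, not HDFS APIs.

```python
# Illustrative sketch of the HDFS write path: split a file into 64 MB blocks,
# then place each block on 3 distinct Data Nodes (hypothetical names dn1..dn4).

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the classic HDFS block size

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to store a file of the given size."""
    return (file_size + block_size - 1) // block_size  # ceiling division

def place_replicas(block_id: int, data_nodes: list, replicas: int = 3) -> list:
    """Pick `replicas` distinct Data Nodes for a block (simple round-robin;
    real HDFS placement also considers rack topology)."""
    return [data_nodes[(block_id + i) % len(data_nodes)] for i in range(replicas)]

nodes = ["dn1", "dn2", "dn3", "dn4"]
file_size = 200 * 1024 * 1024  # a 200 MB file needs 4 blocks
for block in range(split_into_blocks(file_size)):
    print(f"block {block}: {place_replicas(block, nodes)}")
```

Round-robin is only a stand-in for the real placement policy, but it shows the key property: every block exists on 3 separate nodes, so losing one node never loses data.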

2.1.7. Big data job types

2.1.7.1. MapReduce: parallel processing of large data sets. Map (work is split across separate nodes), Reduce (aggregates the mapper outputs), Partition (decides which reducer receives each key-value pair from a mapper), Shuffle (transfers the data from the mappers to the reducers), Sort (data is sorted by key as it arrives at the reducers)

2.1.7.2. Hive: data warehousing infrastructure with the Hive Query Language (HQL); allows querying unstructured data as if it were structured, supporting ad hoc querying and analysis

2.1.7.3. Pig: a programming environment for data tasks; Pig Latin is a procedural counterpart to declarative HQL, compiled down to MapReduce jobs
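
The MapReduce phases above can be demonstrated with a classic word count in plain Python (no Hadoop involved; the function names are illustrative):

```python
# A minimal word-count sketch of the map -> partition/shuffle -> sort -> reduce flow.
from collections import defaultdict

def map_phase(line: str):
    """Map: emit a (key, value) pair for every word in the input split."""
    for word in line.split():
        yield (word.lower(), 1)

def partition(key: str, num_reducers: int) -> int:
    """Partition: decide which reducer receives this key's pairs."""
    return hash(key) % num_reducers

def shuffle(mapped_pairs, num_reducers: int):
    """Shuffle: route each pair to its reducer's bucket, grouped by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets

def reduce_phase(bucket):
    """Reduce: aggregate the values for each key, processed in sorted key order."""
    return {key: sum(values) for key, values in sorted(bucket.items())}

lines = ["big data is big", "data is data"]
mapped = [pair for line in lines for pair in map_phase(line)]
result = {}
for bucket in shuffle(mapped, num_reducers=2):
    result.update(reduce_phase(bucket))
print(result)  # word counts, e.g. big: 2, data: 3, is: 2
```

On a real cluster each mapper and reducer runs on a different node; here the same phases run sequentially to show how the pieces fit together.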

3. Use cases

4. Database Architecture

4.1. ACID

4.1.1. Atomicity: all or nothing for transactions

4.1.2. Consistency: only valid data is saved

4.1.3. Isolation: concurrent transactions don't interfere (e.g., two people purchasing the last ticket)

4.1.4. Durability: committed data survives failures (e.g., an Availability Zone failure)
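
The last-ticket scenario can be sketched with Python's built-in sqlite3. The table and column names are made up; the point is that the update either fully commits or fully rolls back (atomicity), and a sold-out state is never over-decremented (consistency):

```python
# Sketch of ACID behavior with sqlite3: selling the last ticket is all-or-nothing.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER PRIMARY KEY, remaining INTEGER)")
conn.execute("INSERT INTO tickets VALUES (1, 1)")  # exactly one ticket left
conn.commit()

def buy_ticket(conn):
    """Decrement the ticket count, but only if one is still available."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            cur = conn.execute(
                "UPDATE tickets SET remaining = remaining - 1 "
                "WHERE id = 1 AND remaining > 0")
            if cur.rowcount == 0:
                raise ValueError("sold out")  # triggers rollback: all or nothing
        return True
    except ValueError:
        return False

print(buy_ticket(conn))  # True: the last ticket is sold
print(buy_ticket(conn))  # False: the second buyer is rejected; count stays at 0
```

The `WHERE remaining > 0` guard plus the transaction is what keeps two concurrent buyers from both "winning" the same last ticket.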


5. OLTP vs. OLAP

5.1. OLTP

5.1.1. Many users, constant transactions

5.1.2. Critical to the business

5.1.3. Data volumes measured in gigabytes (GB)

5.2. OLAP

5.2.1. Analytics

5.2.2. Periodic large updates and complex queries

5.2.3. Data volumes measured in terabytes to petabytes (TB/PB)

6. Data Warehouse

6.1. Uses ETL (Extract, Transform, Load) to ingest the data

6.2. Data Mart

6.2.1. A smaller data warehouse (DWH)

6.2.2. Covers a subset of the data, so it is faster to query and easier to manage
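
A toy ETL pipeline matching the description above, with sqlite3 standing in for the warehouse. The source records, table, and `transform` logic are made up for illustration:

```python
# Toy ETL: extract raw records, transform them into a clean shape, load them
# into the warehouse table, then answer an analytic query.
import sqlite3

raw_rows = [  # Extract: pull records from a source system (hardcoded here)
    {"sale_id": "1", "amount": "19.99", "region": "EU "},
    {"sale_id": "2", "amount": "5.00", "region": " us"},
]

def transform(row):
    """Transform: cast types and normalize the region code."""
    return (int(row["sale_id"]), float(row["amount"]), row["region"].strip().upper())

warehouse = sqlite3.connect(":memory:")  # Load target (stand-in for the DWH)
warehouse.execute("CREATE TABLE sales (sale_id INTEGER, amount REAL, region TEXT)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", map(transform, raw_rows))
warehouse.commit()

totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
print(totals)  # per-region totals, e.g. EU: 19.99, US: 5.0
```

A data mart would be the same idea applied to one slice, e.g. loading only the `EU` rows into a separate, smaller table for the European sales team.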