AWS Data Analytics

1. Topics

1.1. Collection

1.1.1. Kinesis

1.1.2. Firehose

1.1.3. Kinesis Data Analytics

1.1.3.1. Source: Kinesis Data Streams and Firehose

1.1.3.2. Destination: Kinesis Data Streams, Lambda, Firehose

1.1.4. KDA vs Lambda

1.1.4.1. Windowing: use KDA
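
The windowing note above can be illustrated with a minimal plain-Python sketch of a tumbling (fixed, non-overlapping) window count. This is not the KDA API; the function name `tumbling_window_counts` and the `(epoch_seconds, key)` event shape are assumptions made only for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (epoch_seconds, key) tuples (hypothetical shape).
    Returns a dict mapping (window_start, key) to the event count.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align each timestamp to the start of the window it falls in.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sample stream: six events across three 10-second windows.
events = [(0, "a"), (5, "a"), (9, "b"), (10, "a"), (19, "b"), (20, "a")]
result = tumbling_window_counts(events, 10)
```

A streaming service keeps this kind of per-window state for you; a stateless function invoked per batch would have to persist the counts somewhere itself, which is presumably the point of the KDA vs Lambda comparison.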

1.1.5. Managed Streaming Kafka

1.1.5.1. Can't integrate with Kinesis

1.1.6. Load Data to Redshift

1.1.6.1. Multiple files are better than one large file

1.1.6.2. COPY command is better than row-by-row INSERT statements
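
As a rough sketch of the two loading notes above: split the data into several files under one S3 key prefix, then issue a single COPY against that prefix so the files can be loaded in parallel. The helper names, table name, bucket, and IAM role ARN below are all hypothetical placeholders, and the CSV plus GZIP options are just one possible choice.

```python
def split_into_parts(rows, num_parts):
    """Distribute rows round-robin into num_parts roughly equal chunks."""
    parts = [[] for _ in range(num_parts)]
    for i, row in enumerate(rows):
        parts[i % num_parts].append(row)
    return parts


def build_copy_statement(table, s3_prefix, iam_role_arn):
    """Build a Redshift COPY statement that loads every file under a prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "GZIP;"
    )


# Hypothetical values, for illustration only.
parts = split_into_parts(list(range(10)), 4)
copy_sql = build_copy_statement(
    "sales",
    "s3://example-bucket/sales/part-",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

Choosing a number of parts that is a multiple of the number of slices in the cluster is the usual guidance, so that each slice gets an equal share of the work.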

1.1.7. IoT

1.2. Storage and Data Management

1.2.1. S3

1.2.1.1. Best Format for S3

1.2.1.2. Partitioning

1.2.1.3. Flatten JSON before use in Athena and Glue Catalog

1.2.1.4. Store Application Logs and analyze at low cost with Athena

1.2.1.4.1. Use Glue Crawler to keep the schema up to date if there is no fixed schema
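
The partitioning and JSON-flattening notes in this S3 branch can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names, the `_` separator, the `year=/month=/day=` layout, and the bucket name are hypothetical choices, and nested arrays are left untouched.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict with joined key names."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key path along.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars (and lists, which this sketch does not expand) are kept.
            flat[full_key] = value
    return flat


def partition_prefix(base, year, month, day):
    """Build a Hive-style partitioned S3 prefix (key=value path segments)."""
    return f"{base}/year={year:04d}/month={month:02d}/day={day:02d}/"


# Hypothetical log record and bucket, for illustration only.
record = {"user": {"id": 7, "geo": {"country": "DK"}}, "event": "click"}
flat_record = flatten(record)
prefix = partition_prefix("s3://example-bucket/logs", 2021, 7, 12)
```

With key=value path segments, a query that filters on `year`, `month`, or `day` only needs to scan the matching prefixes, which is where the cost saving in Athena comes from.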

1.2.2. Glue Data Catalog

1.2.2.1. Glue Data Catalog over Hive Catalog

1.2.2.2. How to keep it up to date?

1.2.3. Elasticsearch + Kibana

1.2.3.1. With Visualization: Kibana + Elasticsearch

1.2.3.2. Store Application Logs: With Visualization

1.2.4. Redshift

1.2.4.1. Fixed Schema data + Complete Analytics like aggregation + Interactive

1.2.4.2. How to structure tables?

1.2.4.2.1. Distribution Keys

1.2.4.2.2. Sort Keys

1.2.4.2.3. Use KEY distribution for fact tables and ALL distribution for dimension tables that don't update regularly.
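
A minimal sketch of how the distribution and sort key notes above might translate into table DDL. The helper `create_table_ddl` and the table and column names are hypothetical; only the `DISTSTYLE`, `DISTKEY`, and `SORTKEY` clauses are Redshift syntax.

```python
def create_table_ddl(name, columns, diststyle, distkey=None, sortkey=None):
    """Build a Redshift CREATE TABLE statement with distribution/sort settings."""
    if diststyle == "KEY" and not distkey:
        raise ValueError("DISTSTYLE KEY needs a distkey column")
    cols = ",\n  ".join(f"{col} {typ}" for col, typ in columns)
    ddl = f"CREATE TABLE {name} (\n  {cols}\n)\nDISTSTYLE {diststyle}"
    if distkey:
        ddl += f"\nDISTKEY ({distkey})"
    if sortkey:
        ddl += f"\nSORTKEY ({sortkey})"
    return ddl + ";"


# Hypothetical fact table: rows are distributed by the join column.
fact_ddl = create_table_ddl(
    "sales_fact",
    [("sale_id", "BIGINT"), ("customer_id", "INT"), ("sold_at", "TIMESTAMP")],
    "KEY",
    distkey="customer_id",
    sortkey="sold_at",
)

# Hypothetical small dimension table: a full copy is kept on every node.
dim_ddl = create_table_ddl(
    "region_dim",
    [("region_id", "INT"), ("name", "VARCHAR(64)")],
    "ALL",
)
```

ALL distribution avoids data movement when joining to the dimension table, at the price of extra storage and slower writes, which is why it suits tables that rarely change.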

1.2.4.3. Redshift Spectrum

1.2.4.3.1. For Data in S3

1.2.4.3.2. Newer Data in Redshift

1.2.4.3.3. Older data in S3
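
One way to combine newer data in Redshift with older data in S3 is a late-binding view that unions a local table with a Spectrum external table. The sketch below only builds the SQL text; the helper name and the view, schema, table, and column names are hypothetical.

```python
def build_union_view(view, local_table, external_table, columns):
    """Build a late-binding view over a local table and an external table."""
    col_list = ", ".join(columns)
    return (
        f"CREATE VIEW {view} AS\n"
        f"SELECT {col_list} FROM {local_table}\n"
        "UNION ALL\n"
        f"SELECT {col_list} FROM {external_table}\n"
        # Views that reference external tables must be late-binding.
        "WITH NO SCHEMA BINDING;"
    )


# Hypothetical names: recent rows live in Redshift, archived rows in S3.
view_sql = build_union_view(
    "public.sales_all",
    "public.sales_recent",
    "spectrum.sales_archive",
    ["sale_id", "customer_id", "sold_at"],
)
```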

1.2.4.4. When should I use Amazon Athena vs. Redshift Spectrum?

1.2.4.5. What is AQUA (Advanced Query Accelerator) for Amazon Redshift?

1.2.4.6. When would I use Amazon Redshift vs. Amazon RDS?

1.2.4.7. When would I use Amazon Redshift or Redshift Spectrum vs. Amazon EMR?

1.2.4.8. How do I load data into my Amazon Redshift data warehouse?

1.2.4.9. How do I load data from my existing Amazon RDS, Amazon EMR, Amazon DynamoDB, and Amazon EC2 data sources to Amazon Redshift?

1.2.5. EMR

1.2.5.1. Why choose over S3 or Redshift

1.2.5.1.1. Migration: large-scale on-prem Hadoop cluster

1.2.5.1.2. Existing Dependency on HDFS

1.2.5.1.3. EMR vs Redshift

1.2.5.2. EMRFS

1.2.5.2.1. Uses DynamoDB for consistent view

1.3. Processing

1.3.1. Glue

1.3.1.1. Use to merge files

1.3.1.2. JSON to Parquet

1.3.1.3. PySpark vs Python Shell

1.3.1.4. Questions on Hands-on

1.3.1.4.1. Tuning

1.3.1.4.2. DynamicFrames

1.3.1.5. Use for batch not for streaming

1.3.2. Redshift

1.3.2.1. Performance

1.3.2.1.1. WLM (Manual vs Auto)

1.3.2.1.2. Short Query Acceleration

1.3.2.1.3. Concurrency Scaling

1.3.2.1.4. Elastic Resize

1.3.2.2. Classic vs Elastic Resize

1.3.2.2.1. Classic: to change node type

1.3.2.2.2. Elastic: to maintain uptime

1.4. Analysis and Visualization

1.4.1. Quicksight

1.4.2. Tableau

1.4.3. Presto

1.4.3.1. Run a short query against multiple data sources

1.5. Security

1.5.1. S3

1.5.1.1. Security Options

1.5.1.2. Multi region with Athena and Glue Catalog

1.5.1.3. S3 security with Athena, Redshift Spectrum, Glue Catalog and Quicksight

1.5.2. EMR

1.5.2.1. Security at Rest

1.5.2.2. In flight

1.5.2.3. Kerberos

1.5.2.4. AD integration

1.5.2.5. EBS Root volume encryption

1.5.3. Redshift

1.5.3.1. Audit all actions

1.5.3.2. Enhanced VPC Routing

2. Resources

3. Preparation

3.1. Exam date: 12-Jul

3.2. Type of exam: AWS Data Analytics

3.3. Number of questions: 85

3.4. Grading & emphasis:

3.5. Links

3.5.1. AWS Certified Data Analytics vs Big Data - What's the difference?

4. 3 Weeks

4.1. Week 1

4.1.1. EMR Glue Redshift KDA MSK

4.2. Week 2

4.2.1. Review Week 1

4.2.2. New Material

4.3. Week 3

4.3.1. Review Week 2

4.3.2. New Material