AWS Services for Certified Data Engineer - Associate Exam

A mindmap containing some information on the services covered in the AWS Certified Data Engineer - Associate Exam.

1. Security and Identity

1.1. Cognito

1.1.1. user management and authentication service that can be integrated into your web or mobile applications

1.1.1.1. also enables you to authenticate users through an external identity provider

1.1.2. Amazon Cognito identity pools support authentication with Amazon.com accounts (Login with Amazon), offering a secure and scalable solution for user authentication

1.2. Detective

1.2.1. automatically collects log data from your AWS resources

1.2.2. uses machine learning, statistical analysis, and graph theory to build a linked set of data that enables you to easily conduct faster and more efficient security investigations

1.3. GuardDuty

1.3.1. intelligent threat detection service

1.3.2. analyzes billions of events across your AWS accounts from AWS CloudTrail, Amazon VPC Flow Logs and DNS logs

1.3.3. regional service

1.3.4. Threat detection categories

1.3.4.1. Reconnaissance

1.3.4.2. Instance compromise

1.3.4.3. Account compromise

1.3.5. provides three severity levels (Low, Medium, and High) to allow you to prioritize response to potential threats

1.4. Inspector

1.4.1. automated security assessment service that helps you test the network accessibility of your EC2 instances

1.4.2. test the security state of your applications running on the instances

1.4.3. assesses applications for vulnerabilities, which is important for security

1.5. Macie

1.5.1. security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS

1.5.2. allows you to achieve the following

1.5.2.1. Identify and protect various data types, including PII, PHI, regulatory documents, API keys, and secret keys

1.5.2.1.1. Pre-built classifiers in Macie are used to identify specific types of sensitive data like PII

1.5.2.2. Verify compliance with automated logs that allow for instant auditing

1.5.2.3. Identify changes to policies and access control lists

1.5.2.4. Receive notifications when data and account credentials leave protected zones

1.5.2.5. Detect when large quantities of business-critical documents are shared internally and externally

1.5.2.6. monitors for unusual patterns of access to sensitive data stored in S3

1.6. Artifact

1.6.1. self-service central repository of AWS’ security and compliance reports and select online agreements

1.6.2. AWS Artifact Reports include the following

1.6.2.1. ISO

1.6.2.2. Service Organization Control (SOC) reports

1.6.2.3. Payment Card Industry (PCI) reports

1.6.2.4. Certifications that validate the implementation and operating effectiveness of AWS security controls

1.7. Audit Manager

1.7.1. helps you audit your AWS usage

1.7.1.1. simplify risk management and compliance with regulations and industry standards

1.7.2. Automates evidence collection for policies, procedures, and activities, as well as the creation of audit reports

1.8. Certificate Manager

1.8.1. easily provision, manage, and deploy public and private SSL/TLS certificates

1.8.2. integrated with the following services

1.8.2.1. Elastic Load Balancing

1.8.2.2. Amazon CloudFront

1.8.2.3. AWS Elastic Beanstalk

1.8.2.4. Amazon API Gateway

1.8.2.5. AWS CloudFormation

1.8.3. manages the renewal process for the certificates managed in ACM

1.8.3.1. you can import your own certificates into ACM; however, you must renew them yourself

1.9. CloudHSM

1.9.1. cloud-based hardware security module (HSM) service that enables you to provision and manage your own single-tenant HSMs for generating and using encryption keys

1.9.2. can perform the following cryptographic tasks

1.9.2.1. Generate, store, import, export, and manage cryptographic keys

1.9.2.2. Use symmetric and asymmetric algorithms to encrypt and decrypt your data

1.9.2.3. Compute message digests and hash-based message authentication codes using cryptographic hash functions

1.9.2.4. Cryptographically sign data and verify signatures

1.9.2.5. Generate cryptographically secure random data

1.10. Directory Service

1.10.1. enables your directory-aware workloads and AWS resources to use managed Active Directory

1.10.2. Also known as AWS Managed Microsoft AD

1.10.3. best choice if you need actual Active Directory features to support AWS applications or Windows workloads

1.11. Firewall Manager

1.11.1. Simplifies your AWS WAF administration and maintenance tasks across multiple accounts and resources

1.11.2. set up your firewall rules just once

1.12. IAM

1.12.1. Control who is authenticated (signed in) and authorized (has permissions) to use resources

1.12.2. root user is a single sign-in identity that has complete access to the account

1.12.3. IAM Access Analyzer

1.12.3.1. helps you analyze IAM policies to ensure that permissions granted are not overly permissive and align with the principle of least privilege

1.12.3.2. continuously monitors policies and detects configurations that allow resources to be accessed by external entities

1.12.4. IAM roles for service accounts (IRSA) feature

1.12.4.1. enables Kubernetes pods to securely authenticate and authorize with AWS services by associating IAM roles with Kubernetes service accounts

1.12.4.2. allows applications running on EKS to access AWS resources without hardcoding AWS credentials

1.12.5. IAM role policies

1.12.5.1. trust policy

1.12.5.1.1. defines which principals can assume the role, and under which conditions

1.12.5.1.2. specific type of resource-based policy for IAM roles

1.12.5.2. identity-based policy

1.12.5.2.1. defines the actions the role can perform and on which resources

1.12.5.3. permissions boundary

1.12.5.3.1. advanced feature for using a managed policy to set the maximum permissions for a role

1.12.5.4. time-based

1.12.5.4.1. condition keys such as aws:CurrentTime can limit permissions to a specific date and time window, granting temporary access
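A minimal sketch of a time-bounded identity policy, built as a Python dict. It assumes the window is expressed with the global condition key `aws:CurrentTime` and the `DateGreaterThan`/`DateLessThan` operators; the helper `time_window_policy`, the bucket `example-bucket`, and the dates are hypothetical.

```python
import json


def time_window_policy(bucket: str, start_iso: str, end_iso: str) -> dict:
    """Identity-based policy allowing S3 reads only between start and end (UTC)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Both operators must be satisfied, which bounds access to the window.
                "Condition": {
                    "DateGreaterThan": {"aws:CurrentTime": start_iso},
                    "DateLessThan": {"aws:CurrentTime": end_iso},
                },
            }
        ],
    }


policy = time_window_policy("example-bucket", "2024-01-01T09:00:00Z", "2024-01-01T17:00:00Z")
print(json.dumps(policy, indent=2))
```

Because both operators sit in the same `Condition` block, IAM requires both to be true before the statement applies.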

1.13. KMS

1.13.1. managed service that simplifies the creation, management, and control of encryption keys used for protecting data across AWS services and applications

1.13.2. enables you to easily encrypt your data

1.13.3. can automatically rotate customer managed keys created within KMS once per year, without the need to re-encrypt data

1.13.4. highly available key storage, management, and auditing solution

1.13.4.1. with CloudTrail, you can audit who used which keys, on which resources, and when

1.13.5. KMS keys are Region-specific; a single-Region key cannot be used in another Region (multi-Region keys are the exception).

1.13.6. KMS allows attaching resource-based policies (also known as key policies) to control which IAM users and roles have permissions to use or manage the key.

1.14. Network Firewall

1.14.1. fine-grained network traffic control

1.14.2. helps deploy network protections for Amazon VPCs

1.14.3. create policies based on AWS Network Firewall rules and then apply centrally across your VPCs and accounts

1.14.4. Automatically scales firewall capacity

1.15. Organizations

1.15.1. policy-based management for multiple AWS accounts

1.15.2. With Organizations, you can create groups of accounts and then apply policies to those groups

1.15.3. Service Control Policies (SCPs)

1.15.3.1. attached to the organization root, Organizational Units (OUs), or individual accounts to define the maximum permissions for entities within those accounts

1.15.3.2. Even if individual IAM roles or policies grant more permissions, the SCP acts as a guardrail to restrict actions

1.15.3.2.1. enforce the principle of least privilege across multiple accounts.
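A simplified model of how an SCP acts as a guardrail, assuming only action names matter: it ignores resources, conditions, permissions boundaries, and the fact that SCPs do not restrict the management account. The helper names are hypothetical; the point is that an action must be allowed by both the SCP and the identity policy, and an explicit deny always wins.

```python
from fnmatch import fnmatchcase


def matches(action: str, patterns) -> bool:
    """True if the action matches any IAM-style pattern ('*' and '?' wildcards).

    IAM action names are case-insensitive, so compare in lowercase.
    """
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def is_allowed(action: str, identity_allows, scp_allows, scp_denies=()) -> bool:
    """Simplified effective-permission check for a single action."""
    if matches(action, scp_denies):
        return False  # an explicit deny in an SCP overrides any allow
    # The SCP only sets the ceiling; it grants nothing by itself.
    return matches(action, scp_allows) and matches(action, identity_allows)


identity_policy = {"s3:*", "ec2:*"}  # the role itself is broadly permitted
scp_allow_list = {"s3:*"}            # but the OU only allows S3
scp_deny_list = {"s3:DeleteBucket"}  # and explicitly blocks bucket deletion

print(is_allowed("s3:GetObject", identity_policy, scp_allow_list, scp_deny_list))      # True
print(is_allowed("ec2:RunInstances", identity_policy, scp_allow_list, scp_deny_list))  # False
print(is_allowed("s3:DeleteBucket", identity_policy, scp_allow_list, scp_deny_list))   # False
```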

1.16. Resource Access Manager (RAM)

1.16.1. enables you to easily and securely share AWS resources with any AWS account

1.16.1.1. eliminates the need to create duplicate resources in multiple accounts

1.16.1.2. create resources centrally in a multi-account environment

1.16.2. Only the management account (formerly called the master account) can enable sharing with AWS Organizations

1.16.2.1. The organization must be enabled for all features

1.17. Secrets Manager

1.17.1. service that enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle

1.17.2. can rotate secrets on a schedule or on demand by using the Secrets Manager console, AWS SDK, or AWS CLI

1.17.3. encrypts secrets at rest using encryption keys that you own and store in AWS KMS

1.18. Security Hub

1.18.1. comprehensive view of your security state

1.18.1.1. single place that aggregates, organizes, and prioritizes your security alerts

1.18.2. checks compliance with security industry standards and best practices

1.19. Shield

1.19.1. managed Distributed Denial of Service (DDoS) protection service that safeguards applications running on AWS

1.19.1.1. All AWS customers benefit from the automatic protections of Shield Standard.

1.19.1.2. When used with CloudFront and Route 53, you get comprehensive availability protection against all known infrastructure attacks

1.19.1.3. view all the events detected and mitigated by AWS Shield

1.20. WAF

1.20.1. Protect your web applications from common exploits

1.20.2. configure rules that allow, block, or monitor (count) web requests based on conditions that you define

1.20.2.1. IP addresses

1.20.2.2. HTTP Headers

1.20.2.3. HTTP Body

1.20.2.4. URI strings

1.20.2.5. SQL injection

1.20.2.6. cross-site scripting

1.20.3. AWS WAF can be deployed on Amazon CloudFront, the Application Load Balancer (ALB), and Amazon API Gateway. It cannot be configured directly on an Amazon EC2 instance.

1.21. VPC

1.21.1. service that allows you to provision a logically isolated network within the AWS Cloud

1.21.2. network access control list (network ACL) does not have the capability to block traffic based on geographic match conditions.

1.21.3. endpoint types

1.21.3.1. gateway

1.21.3.1.1. specified as a route target in your route table for traffic destined to a supported AWS service (gateway endpoints support only Amazon S3 and DynamoDB).

1.21.3.2. interface

1.21.3.2.1. is an elastic network interface with a private IP address from the IP address range of your subnet that serves as an entry point for traffic destined to a supported service.

1.21.4. VPC peering

1.21.4.1. networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses.

1.21.4.1.1. Instances in either VPC can communicate with each other as if they are within the same network.

1.21.4.1.2. facilitates data transfer between VPCs.

1.21.4.1.3. allow other VPCs to access resources you have in one of your VPCs.

1.21.5. VPC Flow Logs

1.21.5.1. captures information about IP traffic within the VPC and publishes it to Amazon S3, CloudWatch Logs, or Kinesis Data Firehose

1.22. Security Group

1.22.1. a security group cannot use an Internet Gateway ID as the custom source for an inbound rule

1.22.2. controls the traffic that is allowed to reach and leave the resources that it is associated with

1.23. Access Control

1.23.1. Role-Based Access Control (RBAC) allows roles to inherit permissions from other roles.

1.23.2. Attribute-based access control (ABAC) determines access to resources based on attributes; in AWS, these attributes are tags
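A minimal ABAC sketch, assuming a tag key named `team` (hypothetical): the policy compares the resource's tag with the caller's principal tag through the `${aws:PrincipalTag/...}` policy variable, so one policy covers every team. The local `can_access` function only mimics that comparison for illustration; it is not how IAM evaluates policies.

```python
import json


def abac_policy(tag_key: str = "team") -> dict:
    """Allow start/stop only on instances whose tag matches the caller's tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances"],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        # ${aws:PrincipalTag/<key>} is resolved by IAM at request time.
                        f"aws:ResourceTag/{tag_key}": f"${{aws:PrincipalTag/{tag_key}}}"
                    }
                },
            }
        ],
    }


def can_access(principal_tags: dict, resource_tags: dict, tag_key: str = "team") -> bool:
    """Local illustration of the comparison; a missing principal tag denies access."""
    value = principal_tags.get(tag_key)
    return value is not None and resource_tags.get(tag_key) == value


print(json.dumps(abac_policy(), indent=2))
print(can_access({"team": "data"}, {"team": "data"}))  # True
print(can_access({"team": "data"}, {"team": "web"}))   # False
```

Adding a new team then only requires tagging principals and resources, not writing a new policy.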

2. Management Tools

2.1. Managed Grafana

2.1.1. Create dashboards and visualizations from multiple sources

2.1.2. AWS handles all deployment, setup, scaling, and maintenance of the logical Grafana servers

2.1.3. integrated with the following data sources

2.1.3.1. Amazon CloudWatch

2.1.3.2. Amazon OpenSearch Service

2.1.3.3. AWS X-Ray

2.1.3.4. Amazon Timestream

2.1.3.5. Amazon Managed Service for Prometheus

2.2. Managed Service for Prometheus

2.2.1. managed monitoring service for container environments

2.2.1.1. can monitor and alert on the performance of containerized workloads using the open-source Prometheus Query Language (PromQL)

2.2.1.2. don't have to scale or manage the underlying infrastructure

2.2.2. Integrated with Amazon EKS, Amazon ECS, and AWS Distro for OpenTelemetry

2.2.3. Supports multi-AZ replication within an AWS Region

2.2.4. Collect and query metrics from container clusters on AWS and on-premises

2.3. CloudWatch

2.3.1. Monitoring tool for your AWS resources and applications

2.3.1.1. Display metrics and create alarms that watch the metrics and send notifications or automatically make changes to the resources you are monitoring when a threshold is breached

2.3.2. CloudWatch does not aggregate data across regions

2.3.3. CloudWatch Logs Insights

2.3.3.1. provides an interactive interface to query and analyze log data stored in CloudWatch Logs.

2.4. Auto Scaling

2.4.1. Configure automatic scaling for the AWS resources

2.4.1.1. dynamic scaling

2.4.1.2. predictive scaling

2.4.2. Optimize for availability, for cost, or a balance of both

2.4.3. It is a Regional service

2.5. Compute Optimizer

2.5.1. service that recommends optimal AWS resources to reduce costs and improve performance of your workloads

2.5.1.1. uses machine learning to analyze historical utilization metrics

2.5.1.2. can view findings and recommendations across AWS Regions and accounts

2.5.2. Generates recommendations for the following resources

2.5.2.1. EC2

2.5.2.2. EC2 Auto Scaling groups

2.5.2.3. EBS Volumes

2.5.2.4. Lambda functions

2.6. CDK

2.6.1. open-source software development framework for defining cloud infrastructure in code

2.6.1.1. uses familiar programming languages such as TypeScript, JavaScript, Python, C#/.NET, Java, and Go, and deploys everything as CloudFormation stacks

2.6.2. high-level object-oriented abstraction on top of AWS CloudFormation that allows you to use pre-built cloud components called constructs

2.7. CloudFormation

2.7.1. allows users to manage and provision AWS resources in a consistent, automated, and predictable manner using templates

2.7.2. convenient provisioning mechanism

2.8. CloudShell

2.8.1. interact with your AWS resources without installing any software on your local computer

2.8.2. you can choose one of the following shells

2.8.2.1. Bash

2.8.2.2. PowerShell

2.8.2.3. Z Shell

2.8.3. The compute environment is built on Amazon Linux 2

2.9. CloudTrail

2.9.1. auditing AWS account activity

2.9.2. Actions taken by a user, role, or an AWS service in the AWS Management Console, AWS Command Line Interface, and AWS SDKs and APIs are recorded as events

2.9.2.1. events appear in Event history, where you can view, search, and download the past 90 days of management events in your AWS account

2.9.2.2. with CloudTrail Lake, you can aggregate, immutably store, and query your activity logs

2.9.3. focuses on auditing API activity

2.9.4. helps you identify unusual activity in your account

2.10. Config

2.10.1. provides you with an AWS resource inventory, configuration history, and configuration change notifications to enable security and governance

2.10.1.1. Use Amazon SNS to send you notifications every time a supported AWS resource is created, updated, or otherwise modified as a result of user API activity

2.10.1.2. Use Amazon CloudWatch Events to detect and react to changes in the status of AWS Config events

2.10.1.3. Use AWS CloudTrail to capture API calls to Config

2.10.2. tracks changes but does not handle applying configurations

2.10.2.1. to apply configurations, use AWS Systems Manager

2.11. Control Tower

2.11.1. configuring and managing a multi-account AWS environment

2.12. Health

2.12.1. Provides ongoing visibility into the state of your AWS resources, services, and accounts

2.12.1.1. alerts and notifications triggered by changes in the health of AWS resources

2.12.1.2. can also create an Amazon EventBridge rule to detect and respond to AWS Health events

2.12.2. AWS Health Dashboard is available to all customers

2.12.3. centrally aggregate your AWS Health events from all accounts in your AWS Organization

2.13. License Manager

2.13.1. centrally managing software licenses across AWS and on-premises environments

2.13.1.1. control and visibility into license usage

2.13.1.2. limit licensing overages and reduce the risk of noncompliance and misreporting

2.14. Management Console

2.14.1. Everything you need to access and manage the AWS Cloud — in one web interface

2.14.2. Resource Groups

2.14.2.1. Resource groups make it easier to manage and automate tasks on large numbers of resources at one time

2.14.2.2. Two types of queries on which you can build a group

2.14.2.2.1. Tag-based

2.14.2.2.2. AWS CloudFormation stack-based

2.14.2.3. collection of AWS resources that are all in the same AWS region, and that match criteria provided in a query

2.14.3. Tag Editor

2.14.3.1. Tags are words or phrases that act as metadata for identifying and organizing your AWS resources

2.14.3.1.1. most resources can have up to 50 tags

2.14.3.1.2. can sort and filter the results of your tag search to find the tags and resources that you need to work with

2.15. OpsWorks

2.15.1. configuration management service that helps you configure and operate applications in a cloud enterprise

2.15.1.1. Puppet

2.15.1.2. Chef

2.15.2. you can automate how nodes are configured, deployed, and managed, whether they are Amazon EC2 instances or on-premises devices

2.16. Outposts

2.16.1. brings AWS infrastructure, services, APIs, and tools to the customer’s premises

2.16.2. concepts

2.16.2.1. Outpost site: the physical location where AWS installs your Outpost

2.16.2.2. Outpost configurations: include EC2, EBS, and networking capabilities

2.16.2.3. Outpost equipment: you need the right hardware, such as AWS-managed racks, servers, switches, and cabling

2.16.2.4. service link: allows communication between your Outpost and its associated AWS Region

2.17. Proton

2.17.1. deploying container and serverless applications

2.17.2. Uses templates to define and maintain standard application stacks

2.18. Service Catalog

2.18.1. Allows you to create, manage, and distribute catalogs of approved products to end-users

2.18.1.1. who can then access the products they need in a personalized portal.

2.18.2. Administrators can control which users have access to each product

2.18.3. This is a regional service

2.18.4. features

2.18.4.1. Standardization of assets

2.18.4.2. Self-service discovery and launch

2.18.4.3. Fine-grain access control

2.18.4.4. Extensibility and version control

2.19. Systems Manager

2.19.1. centralize operational data from multiple AWS services

2.19.2. automate tasks across your AWS resources

2.19.2.1. Lets you schedule windows of time to run administrative and maintenance tasks across your instances

2.20. Trusted Advisor

2.20.1. analyzes your AWS environment and provides best practice recommendations in five categories

2.20.1.1. Cost Optimization

2.20.1.2. Performance

2.20.1.3. Security

2.20.1.4. Fault Tolerance

2.20.1.5. Service Limits

2.20.2. Access to the full set of Trusted Advisor checks is available with the Business, Enterprise On-Ramp, and Enterprise Support plans

2.21. Well-Architected Tool

2.21.1. reviews your workloads and compares them to the latest AWS architectural best practices

2.21.2. provides recommendations for making your workloads more reliable, secure, efficient, and cost-effective

3. Analytics Services

3.1. Athena

3.1.1. interactive query service that makes it easy to analyze data directly in S3 and other data sources

3.1.1.1. use SQL

3.1.1.2. integrates with Amazon QuickSight for easy data visualization

3.1.1.3. integrates out-of-the-box with AWS Glue

3.1.1.4. for sources other than Amazon S3, you can use Athena Federated Query to query the data in place

3.1.1.4.1. or build pipelines that extract data from multiple data sources and store them in Amazon S3

3.1.1.4.2. A data connector is implemented in a Lambda function that uses Athena Query Federation SDK

3.1.2. serverless

3.1.3. built-in query editor

3.1.4. statements

3.1.4.1. select

3.1.4.1.1. CSV is the only output format supported by the Athena SELECT command

3.1.4.2. unload

3.1.4.2.1. Supported formats for UNLOAD include Apache Parquet, ORC, Apache Avro, and JSON.
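A minimal sketch of building an Athena UNLOAD statement that writes query results to S3 in one of the formats listed above, rather than the CSV that a plain SELECT produces. The helper `build_unload`, the table `sales`, and the bucket `example-bucket` are hypothetical; the statement string would be submitted like any other Athena query.

```python
SUPPORTED_UNLOAD_FORMATS = {"PARQUET", "ORC", "AVRO", "JSON"}  # formats listed above


def build_unload(select_sql: str, s3_prefix: str, fmt: str = "PARQUET") -> str:
    """Wrap a SELECT in an UNLOAD statement that writes results to S3."""
    fmt = fmt.upper()
    if fmt not in SUPPORTED_UNLOAD_FORMATS:
        raise ValueError(f"unsupported UNLOAD format: {fmt}")
    if not s3_prefix.startswith("s3://"):
        raise ValueError("UNLOAD target must be an s3:// location")
    return f"UNLOAD ({select_sql}) TO '{s3_prefix}' WITH (format = '{fmt}')"


print(build_unload("SELECT * FROM sales WHERE year = 2024", "s3://example-bucket/exports/sales/"))
```

Validating the format up front makes an unsupported choice (for example CSV via UNLOAD in this sketch) fail before any query is submitted.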

3.1.5. features

3.1.5.1. workgroups

3.1.5.1.1. allows for the separation of workloads by teams or applications

3.1.5.1.2. teams can isolate query execution and access to query history.

3.1.5.1.3. IAM policies can then be attached to each workgroup to manage permissions effectively

3.1.5.2. ACID transactions

3.1.5.2.1. uses Apache Iceberg

3.1.5.2.2. add TBLPROPERTIES ('table_type' = 'ICEBERG') to the CREATE TABLE statement

3.1.5.2.3. concurrent users can safely make row-level modifications

3.1.5.2.4. compatible with EMR, Spark, and any engine that supports the Iceberg table format

3.1.5.2.5. removes need for custom record locking

3.1.5.2.6. time travel queries (FOR TIMESTAMP AS OF or FOR VERSION AS OF) can recover recently deleted data with a SELECT statement

3.1.5.2.7. Lake Formation governed tables also provide ACID features with Athena

3.1.5.2.8. benefits from periodic compaction with OPTIMIZE ... REWRITE DATA USING BIN_PACK to preserve performance
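A sketch of the Athena SQL involved in the Iceberg workflow (create, time travel, compaction), generated from Python so the strings can be passed to whatever client submits queries, for example the `StartQueryExecution` API. The table `analytics.events`, its columns, its partitioning, and the S3 location are hypothetical.

```python
def iceberg_statements(table: str, location: str) -> dict:
    """Return example Athena SQL for an Iceberg table (hypothetical schema)."""
    return {
        # table_type = ICEBERG is what enables ACID operations on the table.
        "create": (
            f"CREATE TABLE {table} (id bigint, event_time timestamp, payload string) "
            "PARTITIONED BY (day(event_time)) "
            f"LOCATION '{location}' "
            "TBLPROPERTIES ('table_type' = 'ICEBERG')"
        ),
        # Time travel: read the table as it was one day ago.
        "time_travel": (
            f"SELECT * FROM {table} "
            "FOR TIMESTAMP AS OF (current_timestamp - interval '1' day)"
        ),
        # Compaction: rewrite many small data files into fewer, larger ones.
        "compact": f"OPTIMIZE {table} REWRITE DATA USING BIN_PACK",
    }


for name, sql in iceberg_statements("analytics.events", "s3://example-bucket/iceberg/events/").items():
    print(f"{name}: {sql}")
```

Running the time-travel SELECT before a delete was applied is how the "recover recently deleted data" note above plays out in practice.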

3.1.5.3. Athena JDBC driver

3.1.5.3.1. Java Database Connectivity

3.1.5.3.2. allow applications, like those written in Spark, to connect to a database and perform SQL queries over the network.

3.1.5.3.3. does not provide the targeted access and management capabilities that Athena workgroups offer for CTAS-generated tables.

3.1.5.4. Athena API

3.1.5.4.1. suited for custom application integrations and automated workflows

3.1.5.5. cost controls

3.1.5.5.1. per-query limit

3.1.5.5.2. per-workgroup limit

3.1.5.6. supports a wide variety of data formats such as

3.1.5.6.1. CSV

3.1.5.6.2. JSON

3.1.5.6.3. ORC

3.1.5.6.4. Avro

3.1.5.6.5. Parquet

3.1.6. automatically executes queries in parallel

3.1.7. does NOT support Microsoft Excel files

3.1.8. ZIP file format is not supported by Amazon Athena

3.1.9. Athena cannot be used to analyze data in real-time

3.1.10. Athena doesn't support table location paths that include a double slash (//)

3.2. CloudSearch

3.2.1. makes it easy to set up, manage, and scale a search solution for your website or application

3.2.2. use it to index and search both structured data and plain text

3.2.3. CloudSearch can scale to accommodate the amount of data uploaded to the domain and the volume and complexity of search requests

3.2.4. can integrate CloudSearch with API Gateway

3.3. OpenSearch

3.3.1. open source engine that you can use for full-text search and analytics use cases

3.3.2. lets you search, analyze, and visualize your data in real-time

3.3.2.1. OpenSearch Dashboards (formerly Kibana) support auto-refresh of data.

3.3.3. manages the capacity, scaling, patching, and administration of your OpenSearch (and legacy Elasticsearch) clusters for you

3.3.4. storage tiers

3.3.4.1. Hot

3.3.4.1.1. for frequently accessed data like recent logs

3.3.4.2. UltraWarm

3.3.4.2.1. optimized for storing older logs that are accessed less frequently

3.3.4.2.2. there is no separate "warm" tier; UltraWarm is the only warm option

3.3.4.3. Cold

3.3.4.3.1. for rarely accessed, archival data

3.3.5. shard types

3.3.5.1. primary

3.3.5.1.1. main shards

3.3.5.2. replica

3.3.5.2.1. Replica shards are copies of primary shards that act as backups

3.3.5.2.2. provide data redundancy

3.3.5.2.3. improve read performance by distributing read requests across multiple nodes

3.3.5.2.4. enhance fault tolerance.
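A small arithmetic sketch of shard counts, assuming one index with a fixed number of primary shards: every replica is a full copy of each primary, and a copy of a shard is never placed on the same node as another copy of that shard, so fully allocating `r` replicas needs at least `r + 1` data nodes. The function names are hypothetical.

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Primary shards plus one full copy of each primary per replica."""
    if primaries < 1 or replicas < 0:
        raise ValueError("need >= 1 primary shard and >= 0 replicas")
    return primaries * (1 + replicas)


def min_data_nodes(replicas: int) -> int:
    """A copy of a shard never shares a node with another copy of the same shard."""
    return replicas + 1


print(total_shards(5, 1))  # 10
print(min_data_nodes(1))   # 2
```

With 5 primaries and 1 replica, the cluster holds 10 shards, and at least 2 data nodes are needed before every replica can be assigned.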

3.3.6. OpenSearch has no import feature to load data directly from S3

3.4. EMR

3.4.1. managed big data platform on AWS that simplifies the execution of big data frameworks like Apache Hadoop and Apache Spark

3.4.2. simplifies running big data frameworks, such as

3.4.2.1. Apache Hadoop

3.4.2.2. Apache Spark

3.4.2.3. natively supports Apache Hive

3.4.3. process and analyze vast amounts of data

3.4.3.1. can leverage multiple data stores

3.4.3.2. use EMR to transform and move large amounts of data into and out of other AWS data stores and databases

3.4.4. features

3.4.4.1. storage

3.4.4.1.1. Hadoop Distributed File System (HDFS)

3.4.4.1.2. EMRFS

3.4.4.1.3. local file system

3.4.4.1.4. EBS

3.4.4.2. scaling

3.4.4.2.1. Automatic

3.4.4.2.2. Managed

3.4.4.2.3. EMR enables you to quickly and easily provision as much capacity as you need

3.4.4.3. nodes

3.4.4.3.1. master

3.4.4.3.2. core

3.4.4.3.3. task

3.4.5. other setup options

3.4.5.1. serverless

3.4.5.1.1. choose an EMR release and a runtime (Spark or Hive)

3.4.5.1.2. Spark adds a 10% memory overhead to the memory requested for drivers and executors, so take it into account when defining initial capacity

3.4.5.1.3. submit queries/scripts via job run requests

3.4.5.1.4. manages the underlying capacity, but you can specify an initial capacity to begin with

3.4.5.1.5. runs in one Region across multiple AZs

3.4.5.1.6. EMR Serverless computes the resources your job needs and schedules workers accordingly

3.4.5.1.7. TLS encryption in transit

3.4.5.1.8. how
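A back-of-the-envelope sketch of that 10% overhead when sizing initial capacity, assuming a flat 10% on every driver and executor as stated in the note (it ignores any minimum overhead Spark may apply and per-worker vCPU sizing). Memory is in MB; the job sizes and function names are hypothetical.

```python
def with_overhead_mb(requested_mb: int, overhead_pct: int = 10) -> int:
    """Requested memory plus the Spark overhead, rounded up to a whole MB."""
    # -(-a // b) is integer ceiling division, which avoids float rounding surprises.
    return requested_mb + -(-requested_mb * overhead_pct // 100)


def job_memory_mb(driver_mb: int, executor_mb: int, executors: int) -> int:
    """Total memory one driver plus `executors` executors will actually use."""
    return with_overhead_mb(driver_mb) + executors * with_overhead_mb(executor_mb)


# Hypothetical job: a 4 GB driver and ten 16 GB executors.
print(with_overhead_mb(16 * 1024))             # 18023
print(job_memory_mb(4 * 1024, 16 * 1024, 10))  # 184736
```

Sizing initial capacity from the requested figure alone (4,096 + 10 × 16,384 = 167,936 MB) would leave roughly 16 GB short.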

3.4.5.2. EMR on EKS

3.4.5.2.1. no EMR cluster to manage; unlike EMR Serverless, jobs run on your EKS cluster

3.4.5.2.2. allows submitting Spark jobs on EKS without provisioning EMR clusters

3.4.5.2.3. fully managed

3.4.5.2.4. shares resources between Spark and other apps on Kubernetes

3.4.6. an EMR cluster on EC2 runs in a single Availability Zone

3.4.7. billed per second, with a one-minute minimum

3.5. Kinesis

3.5.1. Makes it easy to collect, process, and analyze real-time, streaming data

3.5.2. types

3.5.2.1. Kinesis Video Streams

3.5.2.1.1. fully managed AWS service

3.5.2.1.2. stream live video from devices to the AWS Cloud

3.5.2.1.3. real-time video processing or batch-oriented video analytics

3.5.2.2. Kinesis Data Streams

3.5.2.2.1. massively scalable, highly durable data ingestion and processing service optimized for streaming data

3.5.2.2.2. the maximum size of a data blob (the data payload before Base64-encoding) within one record is 1 megabyte (MB)

3.5.2.2.3. within one consumer application, parallelism is capped by the shard count: each shard is processed by only one worker at a time, and the shard count is in practice much lower than the number of producers.

3.5.2.2.4. features

3.5.2.3. Kinesis Data Firehose

3.5.2.3.1. easiest way to load streaming data into data stores and analytics tools

3.5.2.3.2. fully managed service that automatically scales to match the throughput of your data

3.5.2.3.3. can also batch, compress, and encrypt the data before loading it

3.5.2.3.4. capture, transform, and load streaming data

3.5.2.3.5. maximum size of a record sent to Kinesis Data Firehose, before base64-encoding, is 1,000 KiB

3.5.2.3.6. cannot directly write into a DynamoDB table

3.5.2.3.7. near real-time, NOT real-time
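A minimal client-side size check, assuming records are validated before they are sent to Kinesis Data Streams or Kinesis Data Firehose. The Streams "1 MB" limit is read here as 1 MiB (an assumption); the Firehose limit is the 1,000 KiB noted above. The point is that both limits apply to the raw payload before base64 encoding, so checking the encoded length would wrongly reject this 900 KiB record.

```python
import base64

# Assumption: the Kinesis Data Streams "1 MB" blob limit is read as 1 MiB.
KDS_MAX_BLOB_BYTES = 1024 * 1024
# Kinesis Data Firehose record limit: 1,000 KiB before base64 encoding.
FIREHOSE_MAX_RECORD_BYTES = 1000 * 1024


def fits(payload: bytes, limit: int) -> bool:
    """Check the raw payload size; the limits apply before base64 encoding."""
    return len(payload) <= limit


payload = b"x" * (900 * 1024)  # 921,600 raw bytes
encoded = base64.b64encode(payload)

print(fits(payload, KDS_MAX_BLOB_BYTES))         # True
print(fits(payload, FIREHOSE_MAX_RECORD_BYTES))  # True
print(len(encoded))                              # 1228800 (base64 adds about 33%)
```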

3.5.2.4. Kinesis Data Analytics

3.5.2.4.1. Analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time

3.5.2.4.2. can quickly build SQL queries and Java applications using built-in templates and operators for common processing functions to organize, transform, aggregate, and analyze data at any scale.

3.5.2.4.3. Kinesis Data Analytics is serverless and takes care of everything required to continuously run your application

3.5.2.5. Kinesis Client Library

3.5.2.5.1. A DynamoDB table is required as a checkpointing table for the KCL

3.5.2.5.2. Each KCL application must use its own DynamoDB table

3.5.2.5.3. standalone Java software library

3.5.2.5.4. simplifies the process of consuming and processing data from Amazon Kinesis Data Streams

3.5.2.5.5. features

3.6. MSK

3.6.1. fully managed Apache Kafka service to ingest and process streaming data in real time

3.7. QuickSight

3.7.1. cloud-powered business analytics service

3.7.2. QuickSight has options to refresh data on a Daily, Weekly, or Monthly basis. For the enterprise edition only, you have an option to choose Hourly

3.7.3. makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from data, anytime, on any device

3.7.4. Provides ML Insights

3.7.5. Allows you to directly connect to and import data from a wide variety of cloud and on-premises data sources

3.7.6. QuickSight Q

3.7.6.1. allows users to ask questions in natural language and get visual answers

3.7.7. SPICE

3.7.7.1. Super-fast, Parallel, In-memory Calculation Engine

3.7.7.2. allows caching data in memory to enable fast query performance and quick rendering of visualizations.

3.7.8. Amazon QuickSight supports identity federation in both Standard and Enterprise editions

3.7.8.1. Enterprise edition additionally offers encryption at rest and Microsoft Active Directory integration

3.8. Data Exchange

3.8.1. enables users to search, subscribe to, and use third-party data in the cloud

3.8.1.1. Provides a central catalog where data providers may publish their data products, and data subscribers can search and subscribe to them

3.8.2. can also find and use publicly available data sets that are part of the Open Data on AWS program with or without an AWS account

3.9. Data Pipeline

3.9.1. web service for scheduling regular data movement and data processing activities

3.9.1.1. processing and moving data between AWS compute and storage services

3.9.2. integrates with on-premise and cloud-based storage systems

3.9.3. managed ETL service

3.9.4. Native integration with

3.9.4.1. S3

3.9.4.2. DynamoDB

3.9.4.3. RDS

3.9.4.4. EMR

3.9.4.5. EC2

3.9.4.6. Redshift

3.9.5. provides built-in activities for common actions such as copying data between Amazon S3 and Amazon RDS

3.9.6. does not offer the same level of flexibility, extensibility, and detailed monitoring as Apache Airflow (through MWAA) or AWS Glue

3.10. Glue

3.10.1. ETL your data for analytics

3.10.2. fully managed service

3.10.3. Discover and search across different AWS data sets without moving your data

3.10.4. uses crawlers to extract metadata into the Data Catalog; Glue ETL jobs can also write schema changes to the Data Catalog

3.10.4.1. minimum precision for a schedule is 5 minutes

3.10.5. for small, short-running transformations, AWS Lambda can be more cost-effective than AWS Glue

3.10.6. Glue triggers cannot directly invoke a Lambda function

3.10.7. AWS Glue job bookmarks

3.10.7.1. simplest approach to ensure that data is not re-processed in subsequent job runs

3.10.7.2. automatically tracks the processed data in S3 and resumes processing from where it left off
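A toy model of what a job bookmark does, not the Glue API: in a real Glue job, bookmarks are enabled with the `--job-bookmark-option job-bookmark-enable` job parameter, and progress is saved by calling `job.commit()` in the script. Here the hypothetical `Bookmark` class just remembers processed S3 keys (Glue's own S3 bookmarks rely on object metadata such as modification times), so a rerun only picks up new objects.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """Hypothetical stand-in for job bookmark state: the S3 keys already processed."""
    processed: set = field(default_factory=set)


def run_job(available_keys, bookmark: Bookmark) -> list:
    """Process only keys not seen in earlier runs, then record progress."""
    new_keys = sorted(k for k in available_keys if k not in bookmark.processed)
    # ... transform and write the new objects here ...
    # Plays the role of job.commit(): persist progress after the run succeeds.
    bookmark.processed.update(new_keys)
    return new_keys


bm = Bookmark()
print(run_job({"raw/a.json", "raw/b.json"}, bm))                # ['raw/a.json', 'raw/b.json']
print(run_job({"raw/a.json", "raw/b.json", "raw/c.json"}, bm))  # ['raw/c.json']
```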

3.10.8. job monitoring section of the AWS Glue console uses the results of previous job runs to determine the appropriate DPU capacity.

3.11. Glue DataBrew

3.11.1. designed to streamline your data analysis process

3.11.2. allows you to interact with your data directly, eliminating the need for complex coding

3.11.3. with 250+ pre-built transformations, you can easily clean, normalize, and format your data, preparing it for insightful analysis

3.11.3.1. transformations are saved as AWS Glue DataBrew recipes, built through a no-code interface

3.11.4. supports

3.11.4.1. CSV

3.11.4.2. Microsoft Excel

3.11.4.3. JSON

3.11.4.4. ORC

3.11.4.5. Parquet

3.12. Glue Data Quality

3.12.1. provides a way to monitor and measure the quality of your data

3.12.2. part of the AWS Glue service

3.12.3. Serverless

3.13. Lake Formation

3.13.1. service for managing and building data lakes

3.13.2. stores and catalogs data from databases and object storage before transferring it to a new S3 data lake

3.13.3. uses ML algorithms to clean and classify data

3.13.4. secure access to sensitive data with granular controls at the column, row, and cell levels

3.13.4.1. Tag-Based Access Control (TBAC) provides scalable and fine-grained access control

3.13.5. Lake Formation Blueprints

3.13.5.1. templates for common data ingestion tasks

3.13.5.2. also automate the data cataloging process using the AWS Glue Data Catalog.

3.13.6. integrates with RAM to facilitate secure cross-account sharing of data lake resources.

4. Application Services

4.1. AppFlow

4.1.1. integration service that automates data flows by securely integrating third-party applications and AWS services without writing any code.

4.1.1.1. Data is encrypted at rest and in transit

4.1.2. Aggregate data from multiple sources to train analytics tools more effectively and save money

4.1.3. Publish events related to the status of a flow using Amazon EventBridge

4.2. EventBridge

4.2.1. serverless event bus

4.2.2. allows applications to communicate with each other using data from different sources in real time

4.2.3. Amazon EventBridge is the only event-based service that integrates directly with third-party SaaS partners

4.3. SES

4.3.1. Simple Email Service

4.3.2. used by applications that need to send communications via email

4.3.3. regional service

4.3.4. cost-effective and scalable

4.4. SNS

4.4.1. Simple Notification Service

4.4.2. easy to set up, operate, and send notifications from the cloud

4.4.3. SNS follows the “publish-subscribe” (pub-sub) messaging paradigm

4.4.4. Amazon EventBridge can directly route events to SNS as a target

4.4.4.1. allowing it to send notifications to team members who are subscribed to the SNS topic.

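4.4.5. Amazon SNS does not persist messages: a message that cannot be delivered is lost unless a dead-letter queue is configured for the subscription

4.4.6. NO THIRD PARTY (no direct integration with third-party SaaS partners, unlike EventBridge)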

4.5. SQS

4.5.1. Simple Queue Service

4.5.2. hosted queue that lets you integrate and decouple distributed software systems and components

4.5.3. supports both standard and FIFO queues

4.5.4. pull based (polling) not push based

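4.5.5. FIFO queues support up to 300 API calls per second per action (up to 3,000 messages per second with batching; more with high-throughput mode)

4.5.6. NO THIRD PARTY (no direct integration with third-party SaaS partners, unlike EventBridge)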

4.6. SWF

4.6.1. Simple Workflow Service

4.6.1.1. fully-managed state tracker and task coordinator

4.6.2. create desired workflows with their associated tasks and any conditional logic you wish to apply and store them with SWF

4.6.3. also provides the AWS Flow Framework to help developers use asynchronous programming in the development of their applications

4.7. Step Functions

4.7.1. web service that provides serverless orchestration for modern applications

4.7.2. enables you to coordinate the components of distributed applications and microservices using visual workflows

4.7.3. Nesting your Step Functions workflows allows you to build larger, more complex workflows out of smaller, simpler workflows

4.7.4. coordinates your existing Lambda functions and microservices and lets you modify them into new compositions.

4.7.5. STATES

4.7.5.1. map

4.7.5.1.1. run a set of workflow steps for each item in a dataset

4.7.5.1.2. iterations run in parallel, which makes it possible to process a dataset quickly

4.7.5.1.3. can use a variety of input types

4.7.5.2. parallel

4.7.5.2.1. add separate branches of execution in your state machine

4.7.5.2.2. A Parallel state causes AWS Step Functions to execute each branch concurrently and wait until all branches finish before moving to the next state

4.7.5.2.3. more suited for scenarios where different tasks or workflows need to run in parallel

4.7.5.3. choice

4.7.5.3.1. adds conditional logic to a state machine

4.7.5.3.2. contains an array of Choice Rules that determine which state the state machine transitions to next

4.7.5.4. task

4.7.5.4.1. represents a single unit of work in a workflow

4.7.5.5. wait

4.7.5.5.1. introduces a delay or a wait time in the workflow.

4.7.5.6. succeed

4.7.5.6.1. marks a workflow execution as successful
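The states above are usually combined in one state machine. The following is a minimal sketch, assuming a workflow that checks whether its input has items, waits, processes each item with a Map state, then fans out to two parallel tasks. The Lambda ARNs, state names and the `count`/`items` input fields are hypothetical; the definition is written as a Python dict in Amazon States Language and serialized to JSON.

```python
import json

# Hypothetical Lambda ARNs; replace with functions from your own account.
PROCESS_FN = "arn:aws:lambda:us-east-1:123456789012:function:process-item"
NOTIFY_FN = "arn:aws:lambda:us-east-1:123456789012:function:notify-team"
AUDIT_FN = "arn:aws:lambda:us-east-1:123456789012:function:write-audit-log"


def task(resource):
    """A Task state: a single unit of work, here one Lambda invocation."""
    return {"Type": "Task", "Resource": resource, "End": True}


definition = {
    "Comment": "Sketch using Choice, Wait, Map, Parallel, Task and Succeed states",
    "StartAt": "HasItems",
    "States": {
        # Choice: only continue when the input reports at least one item
        "HasItems": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.count", "NumericGreaterThan": 0, "Next": "Pause"}
            ],
            "Default": "Done",
        },
        # Wait: fixed 10-second delay before processing
        "Pause": {"Type": "Wait", "Seconds": 10, "Next": "ProcessEach"},
        # Map: run the same Task for every element of $.items, at most 5 at a time
        "ProcessEach": {
            "Type": "Map",
            "ItemsPath": "$.items",
            "MaxConcurrency": 5,
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "ProcessItem",
                "States": {"ProcessItem": task(PROCESS_FN)},
            },
            "Next": "FanOut",
        },
        # Parallel: independent branches run concurrently; the state waits for all
        "FanOut": {
            "Type": "Parallel",
            "Branches": [
                {"StartAt": "Notify", "States": {"Notify": task(NOTIFY_FN)}},
                {"StartAt": "Audit", "States": {"Audit": task(AUDIT_FN)}},
            ],
            "Next": "Done",
        },
        # Succeed: terminal state that marks the execution as successful
        "Done": {"Type": "Succeed"},
    },
}

# Step Functions expects the definition as a JSON string
definition_json = json.dumps(definition, indent=2)
```

To deploy it, the JSON string can be passed as the `definition` argument of the Boto3 `stepfunctions` client's `create_state_machine`, together with a `name` and an IAM `roleArn`.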

4.8. Managed Workflows for Apache Airflow

4.8.1. helps you manage and automate your data workflows using Apache Airflow

4.8.2. Workflows are designed as Directed Acyclic Graphs (DAGs) using Python

4.8.3. Easy setup and operation

4.8.4. Automatically adjusts to match workload demands

4.8.5. Reduces operational costs due to its managed nature

4.8.6. use cases

4.8.6.1. Complex Data Workflows: Handles complex data processing tasks

4.8.6.2. ETL Jobs

4.8.6.3. Machine Learning
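Airflow workflows are DAGs, and the acyclic structure is what lets a scheduler work out which tasks can run together. The sketch below does not use Airflow itself; it is a stdlib illustration (`graphlib`, Python 3.9+) of how a hypothetical ETL DAG resolves into "waves" of tasks whose upstream dependencies have all finished.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks: each key maps to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_redshift": {"transform"},
    "refresh_dashboard": {"load_redshift"},
}

sorter = TopologicalSorter(dag)
sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
    waves.append(ready)                 # tasks in one wave could run in parallel
    sorter.done(*ready)

for step, wave in enumerate(waves):
    print(step, wave)
```

In an actual Airflow DAG file the same dependencies would be declared between operators with `>>`, for example `[extract_orders, extract_customers] >> transform`.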

5. Machine Learning and AI

5.1. SageMaker

5.1.1. fully managed service that allows data scientists and developers to easily build, train, and deploy machine learning models at scale

5.1.2. Provides built-in algorithms that you can immediately use for model training

5.1.2.1. some scenarios (e.g., anomaly detection) still require custom code to build, test, and deploy a model suited to the use case

5.1.3. supports custom algorithms through Docker containers

5.1.4. One-click model deployment

5.1.5. features

5.1.5.1. Data Wrangler

5.1.5.1.1. streamlines data preparation and feature engineering for machine learning

5.1.5.1.2. provides a visual interface

5.1.5.1.3. feature in Amazon SageMaker Studio Classic

5.1.5.1.4. integrates data from various sources, allows you to explore, clean, transform, and visualize data, and automates these steps in your machine-learning workflow

5.1.5.2. Experiments

5.1.5.2.1. tool for organizing, tracking, and comparing iterations of machine learning models

5.1.5.3. Model Monitor

5.1.5.3.1. monitor the quality of machine learning models once deployed in production

5.1.5.4. Debugger

5.1.5.4.1. offers insights into the training process by identifying issues like overfitting or underfitting

5.1.5.4.2. provides detailed debugging information

5.1.5.5. Lineage tracking

5.1.5.5.1. tracks the lineage of machine learning artifacts throughout the lifecycle of a machine learning model

5.1.5.5.2. maintains a comprehensive audit trail of datasets, feature transformations, model parameters, and training metrics

5.1.5.6. Autoscaling

5.1.5.6.1. automatically adjusts the number of instances behind a deployed model endpoint in response to changes in the inference workload

5.1.5.7. Feature Store

5.1.5.7.1. centralized repository for managing machine learning features

5.1.5.7.2. simplifies the process of data exploration, model training, and batch predictions by providing a unified view of your features

5.1.5.7.3. Enhances ML model development and deployment efficiency

5.1.5.7.4. options

5.1.5.8. Autopilot

5.1.5.8.1. automates the model creation process

5.1.6. required settings when configuring a training job

5.1.6.1. Set the IAM role that SageMaker can assume to access resources on behalf of the user.

5.1.6.2. Define the hyperparameters to optimize the training algorithm.

5.1.6.3. Specify the S3 location of the training data as a training channel.
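The three required settings map onto fields of the SageMaker `CreateTrainingJob` request. A minimal sketch, assuming a custom container image: the job name, role ARN, image URI, bucket paths, channel name `train`, instance type, volume size and runtime limit are all hypothetical placeholders. The builder runs without AWS access; only `submit` calls Boto3.

```python
def build_training_job_request(job_name, role_arn, image_uri,
                               train_s3_uri, output_s3_uri, hyperparameters):
    """Return keyword arguments for boto3's sagemaker create_training_job."""
    return {
        "TrainingJobName": job_name,
        # IAM role SageMaker assumes to read training data and write artifacts
        "RoleArn": role_arn,
        "AlgorithmSpecification": {"TrainingImage": image_uri, "TrainingInputMode": "File"},
        # SageMaker expects hyperparameter values as strings
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
        # S3 location of the training data, exposed to the container as a channel
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1, "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def submit(request):
    import boto3  # imported lazily so the builder works without AWS credentials
    return boto3.client("sagemaker").create_training_job(**request)


# Hypothetical values for illustration only
request = build_training_job_request(
    job_name="demo-job",
    role_arn="arn:aws:iam::123456789012:role/SageMakerRole",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
    train_s3_uri="s3://my-bucket/train/",
    output_s3_uri="s3://my-bucket/output/",
    hyperparameters={"epochs": 10, "learning_rate": 0.1},
)
```

Keeping request construction separate from the API call makes it easy to inspect or unit-test the configuration before anything is launched.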

6. File Formats

6.1. ORC

6.1.1. supports complex types such as structs, maps, and lists.

6.1.2. columnar storage

6.2. PARQUET

6.2.1. performance-oriented, column-based data format.

6.3. CSV

6.3.1. minimal, row-based data format.

6.4. XML

6.4.1. highly configurable, rigidly defined data structures that aren't row or column based.

6.4.2. highly standardized.

6.5. AVRO

6.5.1. performance-oriented, row-based data format.

6.5.2. compact binary format

6.5.3. ability to manage schema evolution, meaning data can be read and written even if the schema changes over time

6.6. GrokLog

6.6.1. similar to regular expression capture groups.

6.6.1.1. recognize patterns of character sequences in a plaintext file and give them a type and purpose

6.6.2. primary purpose is to read logs

6.7. ION

6.7.1. data structures that aren't row or column based

6.7.2. represents data structures in interchangeable binary and plaintext representations

6.8. JSON

6.8.1. consistent shape but flexible contents

6.8.2. data structures that aren't row or column based
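The row-based vs column-based distinction above is easiest to see with the same records in both layouts. This is a simplified in-memory illustration with hypothetical sample data; real formats such as Parquet, ORC and Avro add encoding, compression and metadata on top of this basic idea.

```python
# Hypothetical sample records
rows = [
    {"order_id": 1, "region": "eu", "amount": 20.0},
    {"order_id": 2, "region": "us", "amount": 35.5},
    {"order_id": 3, "region": "eu", "amount": 12.5},
]

# Row-oriented layout (CSV, Avro): each record's fields are stored together.
row_layout = [tuple(r.values()) for r in rows]

# Column-oriented layout (Parquet, ORC): each column's values are stored together.
column_layout = {col: [r[col] for r in rows] for col in rows[0]}

# Summing one column: the columnar layout reads only the "amount" values,
# while the row layout has to walk every full record to reach them.
total_from_columns = sum(column_layout["amount"])
total_from_rows = sum(record[2] for record in row_layout)
```

This is why analytic queries that touch a few columns of wide tables tend to favour columnar formats, while row formats suit writing or reading whole records.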

7. Database Services

7.1. Aurora

7.1.1. a fully managed relational database engine compatible with MySQL and PostgreSQL

7.1.2. underlying storage grows automatically as needed, up to 128 terabytes

7.1.2.1. minimum storage is 10GB

7.1.3. quick, efficient cloning operations

7.1.4. fault-tolerant and self-healing

7.1.4.1. automatically maintains 6 copies of your data across 3 Availability Zones

7.1.5. support for Row Level Security

7.1.6. DB instance types in a cluster

7.1.6.1. primary DB instance - write and read

7.1.6.2. aurora replica - read only

7.1.7. does not integrate directly with Cognito

7.2. DocumentDB

7.2.1. a fully managed document database service

7.2.2. data is stored in JSON-like documents

7.2.3. compatible with MongoDB

7.2.3.1. millions of requests per second with millisecond latency; AWS cites twice the throughput of currently available managed MongoDB services

7.2.4. Flexible schema and indexing

7.2.5. commonly used for content management, user profiles, and real-time big data.

7.2.5.1. minimum storage is 10GB

7.3. DynamoDB

7.3.1. a fully managed NoSQL database service

7.3.1.1. known for its ability to handle large volumes of key-value and document (semi-structured) data

7.3.2. Offers encryption at rest

7.3.3. features

7.3.3.1. point-in-time recovery (PITR): when enabled, you can restore a table to any point in time during the last 35 days

7.3.3.2. DynamoDB Accelerator

7.3.3.2.1. cluster endpoints listen on port 8111 (port 9111 when encryption in transit is enabled)

7.3.3.2.2. fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement

7.3.3.2.3. can throttle requests when the read or write request throughput exceeds the allocated capacity.

7.3.3.3. DynamoDB Stream

7.3.3.3.1. ordered flow of information about changes to items in the Amazon DynamoDB table

7.3.3.3.2. options

7.3.3.4. Auto Scaling

7.3.3.4.1. is effective for gradually adjusting capacity in response to changing access patterns

7.3.3.4.2. DynamoDB itself scales horizontally by distributing data across multiple partitions

7.3.3.5. TTL

7.3.3.5.1. expired items are typically deleted within a few days of expiration (older documentation says within 48 hours)

7.3.3.5.2. Items queued for TTL deletion can still be updated

7.3.3.5.3. TTL-triggered deletions can be captured in DynamoDB Streams

7.3.4. create database tables that can store and retrieve any amount of data, and serve any level of request traffic

7.3.5. data is stored in partitions, backed by solid state disks (SSDs) and automatically replicated across multiple AZs in an AWS region

7.3.6. the maximum size of an item in an Amazon DynamoDB table is 400 KB

7.3.7. Each table in DynamoDB can have up to 20 global secondary indexes (default quota) and 5 local secondary indexes
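TTL works on a Number attribute holding a Unix epoch timestamp in seconds, and expired items can still be read until the background deletion happens. A minimal sketch, assuming a hypothetical TTL attribute named `expires_at` and the low-level attribute-value format used by the Boto3 client:

```python
import time


def item_with_ttl(pk, payload, ttl_seconds, now=None):
    """Build a DynamoDB item (low-level format) that expires ttl_seconds from now.

    "expires_at" is a hypothetical attribute name; it must match the attribute
    configured when TTL is enabled on the table.
    """
    now = int(time.time()) if now is None else now
    return {
        "pk": {"S": pk},
        "payload": {"S": payload},
        "expires_at": {"N": str(now + ttl_seconds)},  # numbers are sent as strings
    }


def is_expired(item, now=None):
    # Expired items may still be returned until DynamoDB deletes them,
    # so reads should filter on the TTL attribute themselves.
    now = int(time.time()) if now is None else now
    return int(item["expires_at"]["N"]) <= now
```

With a Boto3 DynamoDB client, such an item could be written with `put_item(TableName=..., Item=item)`, and TTL enabled with `update_time_to_live(TableName=..., TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"})`.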

7.4. QLDB

7.4.1. fully managed, serverless, and highly available ledger database

7.4.2. ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority

7.4.3. tracks all application data changes and maintains a complete and verifiable history of changes over time

7.4.4. serverless

7.4.5. transactions are ACID (atomicity, consistency, isolation, and durability) compliant

7.4.6. PartiQL as its query language
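"Cryptographically verifiable" means any change to past entries can be detected by recomputing hashes. The sketch below is a simplified hash chain, not QLDB's actual algorithm (QLDB uses SHA-256 hashing with Merkle-tree digests and proofs); the journal structure and functions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, data):
    # Hash the previous entry's hash together with this entry's canonical JSON
    payload = prev_hash + json.dumps(data, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(journal, data):
    prev = journal[-1]["hash"] if journal else GENESIS
    journal.append({"data": data, "hash": entry_hash(prev, data)})


def verify(journal):
    # Recompute every hash; an edited entry breaks the chain from that point on
    prev = GENESIS
    for entry in journal:
        if entry["hash"] != entry_hash(prev, entry["data"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, rewriting an old entry would require recomputing every later hash, which a verifier holding a trusted digest would notice.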

7.5. RDS

7.5.1. fully managed relational database service

7.5.2. Industry-standard relational database

7.5.3. does not support a high request rate (millions of requests per second) with low predictable latency

7.5.4. Row Level Security is available on engines that support it natively (e.g., RDS for PostgreSQL)

7.5.5. provides built-in encryption at rest using AWS KMS

7.5.6. typically scales vertically by increasing the resources (CPU, memory) of a single server.

7.5.7. database engines

7.5.7.1. MySQL

7.5.7.2. MariaDB

7.5.7.3. PostgreSQL

7.5.7.4. Microsoft SQL Server

7.5.7.5. Oracle

7.6. Redshift

7.6.1. petabyte-scale data warehouse

7.6.2. cons

7.6.2.1. does not support a high request rate (millions of requests per second) with low predictable latency

7.6.2.2. clusters are Single-AZ by default (Multi-AZ deployments are available only for RA3 clusters)

7.6.2.3. Amazon Redshift does not immediately reclaim space freed by deletes and updates

7.6.2.3.1. Such space is created whenever you delete or update rows in a table

7.6.2.3.2. run the VACUUM command to reclaim it (Redshift also runs VACUUM DELETE automatically in the background)

7.6.3. nodes

7.6.3.1. RA3

7.6.3.1.1. concurrency scaling for write operations is supported only on RA3 nodes

7.6.3.1.2. supports Multi-AZ deployments, which run the cluster across two Availability Zones.

7.6.3.1.3. allow Amazon Redshift to scale compute and storage independently.

7.6.3.1.4. automatically offloads infrequently accessed (cold) data to S3 while keeping frequently accessed (hot) data on high-performance SSDs

7.6.3.2. DC2

7.6.3.2.1. store data locally on SSDs

7.6.4. Redshift Spectrum extends data warehouse queries to your data lake.

7.6.4.1. run analytic queries against petabytes of data stored locally in Redshift, and directly against exabytes of data stored in S3

7.6.5. allows you to enforce Row Level Security directly within the database

7.6.6. columnar storage

7.6.7. The SUPER data type gives you the ability to store nested JSON data natively within a Redshift table

7.6.7.1. can use PartiQL to efficiently query the nested data without changing its structure

7.6.8. OLAP database

7.6.8.1. massively parallel processing

7.6.8.2. automatically parallelizes the data ingestion.

7.6.9. automatically and continuously backs up your data to S3

7.6.10. features

7.6.10.1. streaming ingestion

7.6.10.1.1. from Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka

7.6.10.1.2. provides low-latency, high-speed ingestion

7.6.10.1.3. into an Amazon Redshift provisioned or Amazon Redshift Serverless materialized view

7.6.10.2. system views

7.6.10.2.1. STL_ALERT_EVENT_LOG

7.6.10.2.2. STL_WLM_QUERY

7.6.10.2.3. STL_QUERYTEXT

7.6.10.2.4. SVL_QUERY_SUMMARY

7.6.10.2.5. STL_QUERY

7.6.10.2.6. STL_BLOCKLIST

7.6.10.2.7. STL_TBL_PERM

7.6.10.2.8. SVV_EXTERNAL_SCHEMAS

7.6.10.3. concurrency scaling

7.6.10.3.1. automatically adds additional cluster capacity when needed to handle bursts in query load.

7.6.10.3.2. can be enabled directly in the Redshift cluster's WLM settings.

7.6.10.4. distribution styles

7.6.10.4.1. AUTO

7.6.10.4.2. EVEN

7.6.10.4.3. KEY

7.6.10.4.4. ALL
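The EVEN, KEY and ALL distribution styles differ in where each row lands. This is a simplified sketch with a hypothetical 4-slice cluster; the MD5-based hash is a stand-in for Redshift's internal hash, ALL is simplified to a copy on every slice (Redshift keeps one full copy per node), and AUTO is not modeled because Redshift itself chooses among the other styles based on table size.

```python
import hashlib

NUM_SLICES = 4  # hypothetical cluster size, for illustration only


def slice_for(value):
    """Stand-in for Redshift's internal hash: equal values map to the same slice."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SLICES


def distribute(rows, style, dist_key=None):
    slices = [[] for _ in range(NUM_SLICES)]
    for i, row in enumerate(rows):
        if style == "EVEN":
            slices[i % NUM_SLICES].append(row)            # round-robin, ignores values
        elif style == "KEY":
            slices[slice_for(row[dist_key])].append(row)  # same key value -> same slice
        elif style == "ALL":
            for s in slices:                              # full copy of the table everywhere
                s.append(row)
        else:
            raise ValueError(f"unsupported distribution style: {style}")
    return slices
```

KEY distribution co-locates rows that share a join key, so joins on that key need no data movement; ALL trades storage for the same benefit on small dimension tables; EVEN balances storage when no join key dominates.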

7.6.10.5. Data API

7.6.10.5.1. supports asynchronous query processing

7.6.10.5.2. allowing the web application to submit a query and fetch the results later without waiting for the query to complete.
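The asynchronous pattern is: submit a statement, keep its Id, poll its status, then fetch results. A minimal sketch, assuming a Redshift Serverless workgroup (provisioned clusters pass `ClusterIdentifier` with `SecretArn` or `DbUser` instead of `WorkgroupName`); `client` is expected to be a Boto3 `redshift-data` client, and the polling interval and timeout are arbitrary choices.

```python
import time


def run_query(client, sql, workgroup, database, poll_seconds=1.0, timeout=300):
    """Submit SQL through the Redshift Data API and return the records when done."""
    # execute_statement returns immediately with a statement Id
    statement_id = client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql
    )["Id"]

    deadline = time.monotonic() + timeout
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"statement {statement_id} ended with status {status}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {statement_id} still {status}")
        time.sleep(poll_seconds)  # statement is still SUBMITTED, PICKED or STARTED

    # Large result sets are paginated via NextToken; only the first page is read here
    return client.get_statement_result(Id=statement_id)["Records"]
```

A web application would typically return the statement Id to the caller right after `execute_statement` and poll later, rather than blocking in one request as this helper does.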

7.6.10.6. Workload Management

7.6.10.6.1. Automatic Workload Management dynamically allocates resources based on query complexity

7.6.10.6.2. Manual Workload Management

7.6.10.6.3. Short Query Acceleration

7.6.11. Redshift Serverless

7.6.11.1. integrates with Amazon S3, enabling efficient querying of large datasets stored in a data lake using Redshift Spectrum

7.6.11.2. run and scale analytics without managing the underlying data warehouse infrastructure

7.6.11.2.1. Supports integration with BI tools like Tableau and Amazon QuickSight

7.6.11.3. dynamically adjusts compute capacity to handle fluctuating query loads

7.6.11.3.1. Ideal for workloads with unpredictable usage patterns

7.6.11.4. effective for Extract, Transform, and Load operations

7.7. ElastiCache

7.7.1. distributed in-memory cache environment in the AWS Cloud

7.7.2. works with both the Redis and Memcached engines

7.7.3. not a fully managed NoSQL database

7.7.4. ElastiCache cannot be used as a cache to serve static content from Amazon S3

7.8. MemoryDB for Redis

7.8.1. in-memory database service for microservices-based applications

7.8.2. achieve microsecond read and single-digit millisecond write latency and high throughput

7.8.3. Multi-AZ transactional log to store data across multiple AZs in order to enable fast failover, database recovery, and node restarts

7.8.4. configure encryption at rest using AWS KMS.

7.9. Neptune

7.9.1. graph database service used for building applications that work with highly connected datasets

7.9.2. use cases

7.9.2.1. Social Networking

7.9.2.2. Recommendation Engines

7.9.2.3. Knowledge Graphs

7.9.2.4. Identity Graphs

7.9.3. storage is self-healing

7.9.4. supports graph query languages like Apache TinkerPop Gremlin and W3C’s SPARQL

7.9.5. supports read scaling through Read Replicas

7.10. Timestream

7.10.1. fully managed, purpose-built time-series database

7.10.2. handles workloads ranging from low-latency queries to large-scale data ingestion

7.10.3. ingest more than tens of gigabytes of time-series data per minute

7.10.4. run SQL queries on terabytes of time-series data in seconds

8. Storage Services

8.1. Backup

8.1.1. fully managed service that centralizes and automates data protection across AWS services and hybrid workloads


8.1.3. a backup plan can be assigned to a resource type or to a single resource

8.1.4. To delete a backup plan, you must first delete all of its resource assignments

8.2. S3

8.2.1. object storage service offering industry-leading scalability, data availability, security, and performance

8.2.2. stores data as objects within buckets

8.2.2.1. object consists of a file and optionally any metadata that describes that file

8.2.2.2. key is a unique identifier for an object

8.2.3. Storage capacity is virtually unlimited

8.2.4. strongly consistent for all GET, PUT and LIST operations

8.2.5. a bucket's Region can't be changed after creation

8.2.6. features

8.2.6.1. S3 Transfer Acceleration (S3TA)

8.2.6.1.1. enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket

8.2.6.1.2. Multipart upload allows you to upload a single object as a set of parts.

8.2.6.2. high availability

8.2.6.2.1. objects are automatically stored across multiple devices spanning a minimum of three Availability Zones (except in the One Zone storage classes)

8.2.6.3. S3 Analytics Storage Class Analysis

8.2.6.3.1. helps decide when to move data from S3 Standard to S3 Standard-IA

8.2.6.4. Encryption

8.2.6.4.1. encryption context

8.2.6.4.2. Metadata, which can be included with the object, is not encrypted while being stored on Amazon S3

8.2.6.4.3. SSE-S3 does not provide an audit trail of encryption key usage (SSE-KMS key usage is logged in CloudTrail).

8.2.6.5. lifecycle policies

8.2.6.5.1. When you have multiple rules in an S3 Lifecycle configuration, an object can become eligible for multiple S3 Lifecycle actions. In such cases, Amazon S3 follows these general rules:

8.2.6.6. Object Lock

8.2.6.6.1. prevent Amazon S3 objects from being deleted or overwritten for a fixed amount of time or indefinitely

8.2.6.7. Object Lambda

8.2.6.7.1. allows you to add custom code to process data retrieved from S3 before returning it to an application.

8.2.6.8. S3 Versioning

8.2.6.8.1. allows you to keep multiple versions of an object within the same bucket.

8.2.6.8.2. helps in data recovery and compliance

8.2.7. event notification destinations supported

8.2.7.1. SNS

8.2.7.2. SQS

8.2.7.3. Lambda

8.2.7.4. NO FIFO: SQS FIFO queues are not supported as destinations
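S3 event notifications are configured per destination type (SQS queue, SNS topic, Lambda function). A minimal sketch that builds the `NotificationConfiguration` structure for one destination and rejects SQS FIFO queues (whose names end in `.fifo`); the helper name, ARNs and prefix are hypothetical.

```python
def build_notification_config(destination_arn, events=("s3:ObjectCreated:*",), prefix=None):
    """Build a NotificationConfiguration dict for a single S3 event destination."""
    service = destination_arn.split(":")[2]  # arn:partition:service:region:account:resource
    entry = {"Events": list(events)}
    if prefix:
        # Only notify for keys under this prefix
        entry["Filter"] = {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}}

    if service == "sqs":
        if destination_arn.endswith(".fifo"):
            raise ValueError("S3 event notifications cannot target an SQS FIFO queue")
        entry["QueueArn"] = destination_arn
        return {"QueueConfigurations": [entry]}
    if service == "sns":
        entry["TopicArn"] = destination_arn
        return {"TopicConfigurations": [entry]}
    if service == "lambda":
        entry["LambdaFunctionArn"] = destination_arn
        return {"LambdaFunctionConfigurations": [entry]}
    raise ValueError(f"unsupported destination service: {service}")
```

The result can be passed as `NotificationConfiguration` to Boto3's `put_bucket_notification_configuration`, which replaces the bucket's entire existing configuration; the destination also needs a resource policy that allows S3 to publish to it.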

8.2.8. types

8.2.8.1. frequent access

8.2.8.1.1. standard general purpose

8.2.8.1.2. express one zone

8.2.8.2. infrequent access

8.2.8.2.1. standard_ia

8.2.8.2.2. onezone_ia

8.2.8.2.3. minimum billable object size 128 KB, minimum storage duration 30 days

8.2.8.3. intelligent tiering

8.2.8.3.1. optimize storage costs automatically

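8.2.8.3.2. after 30 consecutive days without access: Infrequent Access tier

8.2.8.3.3. after 90 consecutive days without access: Archive Instant Access tier (the optional Archive Access tier can also be activated from 90 days)

8.2.8.3.4. after 180 consecutive days without access: Deep Archive Access tier (optional, must be activated)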

8.2.8.4. glacier

8.2.8.4.1. not available for real-time access

8.2.8.4.2. Long-term archival solution optimized for infrequently used data, or “cold data.”

8.2.8.4.3. S3 Glacier vault

8.2.8.4.4. types
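The Intelligent-Tiering thresholds listed above can be expressed as a small lookup. A minimal sketch, assuming the automatic tiers move at 30 and 90 days and that the two opt-in archive tiers are enabled by passing the day thresholds configured on the bucket (hypothetical parameters); it ignores the fact that objects smaller than 128 KB are not monitored and always stay in the Frequent Access tier.

```python
def intelligent_tiering_tier(days_since_access, archive_access_days=None, deep_archive_days=None):
    """Return the S3 Intelligent-Tiering access tier for an object.

    archive_access_days / deep_archive_days: thresholds for the opt-in
    Archive Access and Deep Archive Access tiers; None means not activated.
    """
    if deep_archive_days is not None and days_since_access >= deep_archive_days:
        return "Deep Archive Access"
    if archive_access_days is not None and days_since_access >= archive_access_days:
        return "Archive Access"
    if days_since_access >= 90:
        return "Archive Instant Access"  # automatic
    if days_since_access >= 30:
        return "Infrequent Access"       # automatic
    return "Frequent Access"
```

Accessing an object in any of the automatic tiers moves it back to Frequent Access, so the "days since access" counter effectively restarts.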

8.3. Storage Gateway

8.3.1. Integrates on-premises enterprise applications and workflows with Amazon’s block and object cloud storage services

8.3.2. enables hybrid storage

8.3.2.1. stores files as native S3 objects, archives virtual tapes in Amazon Glacier

8.3.2.2. stores EBS Snapshots generated by the Volume Gateway with Amazon EBS

8.3.3. types

8.3.3.1. File Gateway

8.3.3.1.1. store and retrieve files

8.3.3.2. Volume Gateway: cloud-backed storage volumes

8.3.3.2.1. Cached volumes store your data in S3 and retain a copy of frequently accessed data subsets locally

8.3.3.2.2. Stored volumes keep your entire dataset on-premises for low-latency access and asynchronously back it up to AWS as EBS snapshots

8.3.3.3. Tape Gateway

8.3.3.3.1. archive backup data in Amazon Glacier

8.4. Transfer Family

8.4.1. secure transfer service for moving files into and out of AWS storage

8.4.1.1. Supports up to 3 Availability Zones and is backed by an auto scaling, redundant fleet for your connection and transfer requests

8.4.2. do not need to run or maintain any server infrastructure of your own

8.4.3. can provision a Transfer Family server with multiple protocols (SFTP, FTPS, FTP)

8.5. EFS

8.5.1. fully-managed file storage service

8.5.1.1. EFS file data can be managed simply with AWS DataSync

8.5.1.2. five EFS Lifecycle Management policies (7, 14, 30, 60, or 90 days) to automatically move files into the EFS Infrequent Access (EFS IA) storage class and save up to 85% in cost

8.5.2. You can mount EFS filesystems onto EC2 instances running Linux or MacOS Big Sur

8.5.2.1. Windows is not supported

8.5.3. Multiple Amazon EC2 instances can access an EFS

8.5.4. across multiple Availability Zones in an AWS Region

8.5.5. File system deletion is a destructive action that you can’t undo.

8.6. HDFS

8.6.1. Hadoop Distributed File System (HDFS)

8.6.2. HDFS is ephemeral storage that is reclaimed when you terminate a cluster

8.6.3. distributed, scalable file system for Hadoop

8.6.4. HDFS distributes the data it stores across instances in the cluster, storing multiple copies of data on different instances to ensure that no data is lost if an individual instance fails

8.6.5. HDFS replication multiplies the storage you provision, so you pay for every replica

8.7. FSx

8.7.1. fully managed third-party file system solution

8.7.2. SSD storage to provide fast performance with low latency

8.8. Snowball Edge

8.8.1. device with on-board storage and compute power for select AWS capabilities

8.8.2. undertake local processing and edge-computing workloads in addition

8.8.3. transferring data between your local environment and the AWS Cloud

8.8.4. on-board S3-compatible storage and compute to support running Lambda functions and EC2 instances

8.8.5. types

8.8.5.1. storage optimised

8.8.5.2. compute optimised

8.8.6. jobs

8.8.6.1. import to S3

8.8.6.2. Export from S3

8.8.6.3. Local Compute and Storage Only

8.9. Snowmobile

8.9.1. An exabyte-scale data transfer service used to move extremely large amounts of data to AWS

8.9.1.1. up to 100PB per Snowmobile

8.9.2. multiple layers of security

8.9.2.1. GPS tracking, alarm monitoring

8.9.2.2. 24/7 surveillance

8.9.2.3. optional escort security vehicle while in transit

8.9.2.4. data is encrypted with 256-bit encryption keys you manage through KMS

8.9.3. pricing is based on the amount of data stored on the truck per month

8.10. EBS

8.10.1. block-level storage volumes for use with EC2 instances

8.10.1.1. Well-suited for use as the primary storage for file systems, databases

8.10.2. Performance metrics, such as bandwidth, throughput, latency, and average queue length, provided by Amazon CloudWatch

8.10.3. volumes are raw, unformatted block devices

8.10.4. allows you to change the volume type of existing volumes with no downtime using the AWS Management Console or the AWS Command Line Interface (CLI).

8.10.5. Termination protection is turned off by default

8.10.5.1. up to 5,000 EBS volumes by default

8.10.5.2. up to 10,000 snapshots by default

8.10.6. encryption at rest, on snapshots, and for data moving between the volume and the instance

8.10.6.1. you get encryption, and you get encryption. everyone gets encryption

8.10.7. types

8.10.7.1. General Purpose SSD (gp2, gp3)

8.10.7.1.1. gp3 consistent baseline rate of 3,000 IOPS

8.10.7.1.2. gp2 Base performance of 3 IOPS/GiB, with the ability to burst to 3,000 IOPS for extended periods of time

8.10.7.2. Provisioned IOPS SSD (io1, io2)

8.10.7.2.1. io1 intensive workloads

8.10.7.2.2. io2 higher durability same price

8.10.7.3. Throughput Optimized HDD (st1)

8.10.7.3.1. Low-cost magnetic storage that focuses on throughput rather than IOPS

8.10.7.4. Cold HDD (sc1)

8.10.7.4.1. Lowest-cost HDD storage for less frequently accessed, throughput-oriented workloads
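The gp2 rule of 3 IOPS per GiB is easiest to apply as arithmetic. A minimal sketch, assuming the documented gp2 floor of 100 IOPS and cap of 16,000 IOPS, and that only volumes whose baseline is below 3,000 IOPS (i.e. smaller than 1,000 GiB) benefit from bursting:

```python
def gp2_baseline_iops(size_gib):
    """Baseline IOPS of a gp2 volume: 3 IOPS per GiB, at least 100, at most 16,000."""
    return max(100, min(16_000, 3 * size_gib))


def gp2_can_burst(size_gib):
    # Volumes whose baseline is below 3,000 IOPS can burst to 3,000 using I/O credits
    return gp2_baseline_iops(size_gib) < 3_000
```

For example, a 100 GiB gp2 volume has a 300 IOPS baseline and relies on burst credits for anything more, whereas gp3 provides 3,000 IOPS regardless of size.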

9. Compute Services

9.1. Lambda

9.1.1. serverless compute service

9.1.2. memory from 128 MB to 10,240 MB; ephemeral storage (/tmp) from 512 MB to 10 GB

9.1.3. 15 MINUTES MAX

9.1.4. storage options

9.1.4.1. Ephemeral

9.1.4.1.1. max 10240MB size

9.1.4.1.2. Dynamic

9.1.4.1.3. included with lambda

9.1.4.1.4. only accessible by the specific lambda function

9.1.4.2. Lambda Layers

9.1.4.2.1. 5 per function (up to 250MB in total)

9.1.4.2.2. durable

9.1.4.2.3. static, immutable

9.1.4.2.4. included with lambda

9.1.4.2.5. can be shared across multiple Lambda functions

9.1.4.3. S3

9.1.4.3.1. elastic, durable

9.1.4.3.2. NOT INCLUDED WITH LAMBDA

9.1.4.4. EFS

9.1.4.4.1. Must leverage EFS Access Points

9.1.4.4.2. elastic, durable

9.1.4.4.3. Be Careful of EFS connection limits and connection burst limits

9.1.4.4.4. NOT INCLUDED WITH LAMBDA

9.1.5. SOC, HIPAA, PCI, ISO compliant

9.1.6. A Lambda layer is a .zip file archive that contains supplementary code or data

9.1.6.1. Layers usually contain library dependencies, a custom runtime, or configuration files

9.1.7. executes your code only when needed and scales automatically

9.1.7.1. You allocate the amount of memory for functions

9.1.7.2. It allocates proportional CPU power, network bandwidth, and disk I/O

9.1.8. memory allocation can be increased to improve performance

9.1.9. Lambda@Edge

9.1.9.1. allows you to run Lambda functions at AWS Edge locations in response to CloudFront events

9.1.9.2. used for processing content closer to users
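Lambda runs a handler only when an event arrives, for example an S3 event notification. A minimal sketch of a Python handler for S3 notifications; the output format is a hypothetical choice, while the `Records[].s3.bucket.name` / `s3.object.key` fields follow the S3 event structure, whose object keys arrive URL-encoded.

```python
import json
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the S3 URIs of the objects referenced in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in the event (spaces become '+'), so decode them
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append(f"s3://{bucket}/{key}")
    print(json.dumps({"processed": objects}))  # stdout is sent to CloudWatch Logs
    return {"processed": objects}
```

Because the handler holds no state between invocations, Lambda can run many copies in parallel as upload volume grows.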

9.2. Elastic Beanstalk

9.2.1. cloud-based PaaS (Platform as a Service) that simplifies web application deployment and management on the AWS cloud

9.2.2. quickly deploy and manage applications in the AWS Cloud

9.2.3. automatically handles

9.2.3.1. capacity provisioning

9.2.3.2. load balancing

9.2.3.3. scaling

9.2.3.4. application health monitoring

9.2.4. supports

9.2.4.1. Go

9.2.4.2. Java

9.2.4.3. .NET

9.2.4.4. Node.js

9.2.4.5. PHP

9.2.4.6. Python

9.2.4.7. Ruby

9.3. Batch

9.3.1. fully managed batch processing

9.3.2. batch computing workloads

9.3.2.1. jobs are submitted to job queues

9.3.2.2. manages compute environments

9.3.2.3. jobs run as Docker container images

9.3.3. regional service that runs across AZs

9.3.3.1. Batch chooses where to run the jobs, launching additional AWS capacity if needed

9.3.4. no need to manage clusters but you must pay for the instances batch provisions for you

9.3.5. schedule batch jobs using Amazon EventBridge (formerly CloudWatch Events)

9.3.5.1. or orchestrate using Step Functions

9.4. Lightsail

9.4.1. Build applications and websites fast with low-cost, pre-configured cloud resources

9.4.2. cloud-based virtual private server

9.4.2.1. disks up to 16 TB each, up to 15 disks per instance

9.4.3. Linux or Windows OS

9.4.4. websites and web applications

9.4.4.1. Load Balancers

9.4.5. Database maintenance

9.4.6. "AWS for people who cried when they saw EC2. Comes with training wheels and a helmet" - ChatGPT

9.5. ParallelCluster

9.5.1. open source cluster management tool

9.5.2. cluster management tool for deploying and managing High Performance Computing clusters on AWS

9.5.2.1. master instance for build and control

9.5.2.2. cluster of compute instances

9.5.2.3. shared file system

9.5.2.4. batch scheduler

9.5.3. does not support building Windows clusters or mixed instance types for a cluster

9.5.3.1. you can pick one instance type for the master node and another instance type for the compute node

9.6. ECS

9.6.1. EFS is a fully managed, scalable file storage that can be attached to multiple ECS containers, allowing them to share the same file system.

9.6.2. manage Docker containers on a cluster

9.6.3. regional

9.6.4. consistent deployment and build experience

9.6.5. run, manage, and scale batch and Extract-Transform-Load (ETL) workloads

9.7. SAM

9.7.1. open-source framework for building serverless applications

9.7.1.1. an extension of CloudFormation for serverless applications

9.7.2. serverless

9.7.3. You write a YAML template that SAM transforms into a full CloudFormation template

9.7.4. download SAM CLI to locally debug Lambda functions written in Node.js, Java, Python, and Go.

9.7.5. SAM Accelerate

9.7.5.1. sam sync

9.7.5.1.1. synchronizes code and infrastructure

9.7.5.2. sam sync --code

9.7.5.2.1. synchronizes only code

9.7.5.2.2. very fast

9.7.5.3. sam sync --code --resource AWS::Serverless::Function

9.7.5.3.1. synchronizes only the lambda functions and their dependencies

9.7.5.4. sam sync --code --resource-id {resource-id}

9.7.5.4.1. synchronizes only a specific resource specified by its ID

9.7.5.5. sam sync --watch

9.7.5.5.1. monitor for file changes and auto synchronize when changes detected

9.7.5.5.2. if changes include configuration, it uses sam sync

9.7.5.5.3. if changes are only code it uses sam sync --code

9.8. Fargate

9.8.1. serverless compute engine for containers

9.8.2. works with both ECS and EKS

9.8.2.1. EFS

9.8.3. no manual infrastructure management required.

9.9. ECR

9.9.1. AWS Docker registry service

9.9.2. regional

9.10. EC2

9.10.1. Elastic Compute Cloud

9.10.2. types

9.10.2.1. general purpose

9.10.2.1.1. M-

9.10.2.1.2. T-

9.10.2.2. compute optimized

9.10.2.2.1. C-

9.10.2.3. memory optimized

9.10.2.3.1. R-

9.10.2.3.2. X-

9.10.2.3.3. U-

9.10.2.4. accelerated computing

9.10.2.4.1. P-

9.10.2.4.2. G-

9.10.2.4.3. Trn-

9.10.2.4.4. DL-

9.10.2.5. storage optimized

9.10.2.5.1. I-

9.10.2.5.2. D-

9.10.2.6. HPC optimized

9.10.2.6.1. Hpc-

9.10.3. allows users to rent virtual servers (instances) in the AWS cloud to run their applications

9.10.3.1. without the need to purchase or manage physical hardware

9.10.4. when an EBS-backed instance is terminated, its root volume is deleted by default (DeleteOnTermination)

9.11. Graviton

9.11.1. custom-designed, 64-bit ARM-based processors developed by Amazon Web Services

9.11.2. provide a balance of performance, efficiency, and price for various cloud workloads.

9.11.3. offers best price performance

9.11.4. works with

9.11.4.1. MSK

9.11.4.2. RDS

9.11.4.3. MemoryDB

9.11.4.4. ElastiCache

9.11.4.5. OpenSearch

9.11.4.6. EMR

9.11.4.7. Lambda

9.11.4.8. Fargate

9.12. EKS

9.12.1. Start, run, and scale Kubernetes without thinking about cluster management

9.12.1.1. managed service

9.12.1.2. allows you to run Kubernetes on AWS without installing, operating, or maintaining your own Kubernetes control plane or nodes

9.12.2. Integration with various AWS services to provide scalability and security for your applications:

9.12.2.1. Amazon ECR for container images

9.12.2.2. Elastic Load Balancing for load distribution

9.12.2.3. IAM for authentication

9.12.2.4. Amazon VPC for isolation

9.12.3. autoscaling

9.12.3.1. Cluster Autoscaler – uses AWS Auto Scaling groups.

9.12.3.2. Karpenter – works directly with the Amazon EC2 Fleet.

9.13. SDK

9.13.1. Develop and deploy applications with Boto3

9.13.2. makes it easy to call AWS services using idiomatic Python APIs

9.13.3. providing a set of libraries that are consistent and familiar for Python developers

9.13.4. key packages

9.13.4.1. Botocore

9.13.4.1.1. the library providing the low-level functionality shared between the Python SDK and the AWS CLI

9.13.4.2. Boto3

9.13.4.2.1. the package implementing the Python SDK itself

10. Migration Services

10.1. DMS

10.1.1. Database Migration Service

10.1.2. offers

10.1.2.1. one-time data migration into RDS and EC2-based databases

10.1.2.2. Continuous replication can be done from your data center to the databases in AWS

10.1.2.3. can schedule task to do it periodically

10.1.2.4. also continuously replicate your data with high availability (enable multi-AZ) and consolidate databases into a petabyte-scale data warehouse by streaming data to Amazon Redshift and Amazon S3

10.1.2.5. supports homogeneous and heterogeneous migrations

10.1.3. Replication between on-premises to on-premises databases is not supported

10.1.4. features

10.1.4.1. Data Validation

10.1.4.1.1. compare the source database to the target to verify the migration

10.1.4.2. Metrics

10.1.4.2.1. Host

10.1.4.2.2. Replication Task

10.1.4.2.3. Table

10.1.4.3. premigration assessment

10.1.4.3.1. evaluates specified components of a database migration task to help identify any problems that might prevent a migration task from running as expected.

10.1.4.4. does not migrate empty tables

10.2. DataSync

10.2.1. online data transfer service

10.2.2. Direct Connect

10.2.3. to and from AWS storage services over the internet or AWS Direct Connect

10.2.4. automates and accelerates moving data between on-premises and AWS Storage services

10.2.5. encrypts data at rest and in transit

10.2.5.1. uses SSL/TLS to ensure the data's confidentiality and integrity.

10.2.6. DataSync can copy data between

10.2.6.1. NFS or SMB file servers

10.2.6.2. S3

10.2.6.3. EFS

10.2.6.4. FSx

10.3. Seven Common Migration Strategies

10.3.1. Rehost

10.3.1.1. lift and shift

10.3.2. Replatform

10.3.2.1. lift, tinker and shift

10.3.3. Repurchase

10.3.3.1. drop and shop

10.3.4. Refactor/Re-architect

10.3.4.1. often the most expensive solution

10.3.5. Relocate

10.3.6. Retire

10.3.7. Retain

10.4. Server Migration Service

10.4.1. agentless service for migrating thousands of on-premises workloads to AWS

10.4.1.1. automates incremental replication of live server volumes to the AWS Cloud

10.4.1.2. allows customers to schedule replications

10.4.1.3. tracks the replication progress of a group of servers via the Management Console

10.4.2. Each server volume replicated is saved as a new Amazon Machine Image (AMI), which can be launched as an EC2 instance in the AWS cloud

10.4.3. offers multi-server migration support as well

10.5. Snowball Edge

10.5.1. on-board storage and compute power for select AWS capabilities

10.5.2. can run local processing and edge-computing workloads

10.5.3. also transfers data between your local environment and the AWS Cloud

10.5.4. on-board S3-compatible storage and compute to support running Lambda functions and EC2 instances

10.5.5. handle up to 80 TB per device

10.5.6. types

10.5.6.1. storage optimised

10.5.6.2. compute optimised

10.5.7. jobs

10.5.7.1. Import into S3

10.5.7.2. Export from S3

10.5.7.3. Local Compute and Storage Only
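A rough back-of-the-envelope sketch for deciding between shipping Snowball Edge devices and sending data over the network, using the 80 TB per device figure from these notes. The 80% link utilization default is an assumption, and real planning would also account for shipping and loading time.

```python
import math

SNOWBALL_EDGE_TB = 80  # per-device figure quoted in these notes

def snowball_devices_needed(data_tb, capacity_tb=SNOWBALL_EDGE_TB):
    """Fewest devices whose combined capacity covers data_tb."""
    return math.ceil(data_tb / capacity_tb)

def network_transfer_days(data_tb, link_gbps, utilization=0.8):
    """Days to send data_tb over a link at the given fraction of its line rate."""
    bits = data_tb * 1e12 * 8  # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86400

# 500 TB: 7 devices, versus roughly 58 days over a 1 Gbps link at 80% utilization
print(snowball_devices_needed(500), round(network_transfer_days(500, 1), 1))
```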

10.6. Snowmobile

10.6.1. An exabyte-scale data transfer service used to move extremely large amounts of data to AWS

10.6.1.1. up to 100 PB per Snowmobile

10.6.2. multiple layers of security

10.6.2.1. GPS tracking, alarm monitoring

10.6.2.2. 24/7 surveillance

10.6.2.3. optional escort security vehicle while in transit

10.6.2.4. data is encrypted with 256-bit encryption keys you manage through KMS

10.6.3. pricing is based on the amount of data stored on the truck per month

10.6.4. durable shipping container that can withstand harsh environmental conditions.

10.6.4.1. yes, a practice question asked me about transferring data in harsh climates

10.6.5. does not offer a storage clustering option

10.7. Application Discovery Service

10.7.1. modes

10.7.1.1. agent-based discovery mode (ADA)

10.7.1.1.1. system configuration

10.7.1.1.2. system performance

10.7.1.1.3. running processes

10.7.1.1.4. details of the network connections between systems

10.7.1.2. agentless discovery mode (ADC)

10.7.1.2.1. no need to install any software on the on-premises servers

10.7.1.2.2. virtual machine inventory, configuration, and performance history (CPU, memory, and disk usage)

10.7.2. resulting data can be viewed within AWS Migration Hub

10.8. Application Migration Service (MGN)

10.8.1. does not require agents to be installed on databases.

10.8.1.1. uses a lightweight, non-intrusive agent on source servers to enable real-time data replication.

10.8.2. designed for lift-and-shift migrations

10.8.2.1. can also be used for test migrations

10.8.3. is not limited to database migrations

10.8.4. automatically replicates live server volumes to AWS, which can be used to launch cloud instances mirroring your on-premises environment

10.8.4.1. keeps the cutover window short, minimizing downtime

10.8.5. continuously replicates data to a staging area before the final cutover to AWS

10.9. Schema Conversion Tool

10.9.1. designed to convert database schemas and code from one database engine to another

10.10. Transfer Family

10.10.1. designed for secure file transfers to AWS using familiar protocols

10.10.1.1. provides a seamless migration experience without the need to modify the existing SFTP-based workflows or integrations

10.10.2. file transfers into and out of S3 or EFS

10.10.3. can use a custom hostname managed through Route 53

10.10.4. supports authentication through identity providers such as Microsoft Active Directory

10.10.5. features

10.10.5.1. managed infrastructure

10.10.5.2. scalable

10.10.5.3. reliable

10.10.5.4. highly available (Multi-AZ)

10.10.6. protocol types

10.10.6.1. SFTP

10.10.6.2. FTPS

10.10.6.3. FTP
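A minimal sketch of a pre-flight check for a Transfer Family server configuration. It encodes the protocol constraints as I understand them from the CreateServer API documentation: FTP and FTPS require a VPC endpoint and an identity provider other than SERVICE_MANAGED (for example, Active Directory), and FTPS also requires a certificate. The helper name is hypothetical, and the rules should be checked against the current AWS docs.

```python
FTP_FAMILY = {"FTP", "FTPS"}
# Identity provider types other than SERVICE_MANAGED
NON_SERVICE_MANAGED_IDPS = {"AWS_DIRECTORY_SERVICE", "AWS_LAMBDA", "API_GATEWAY"}

def check_transfer_server(protocols, endpoint_type, identity_provider_type,
                          certificate_arn=None):
    """Return a list of configuration problems (empty list means it looks valid)."""
    problems = []
    if FTP_FAMILY & set(protocols):
        if endpoint_type != "VPC":
            problems.append("FTP/FTPS require a VPC endpoint")
        if identity_provider_type not in NON_SERVICE_MANAGED_IDPS:
            problems.append("FTP/FTPS cannot use SERVICE_MANAGED identities")
    if "FTPS" in protocols and not certificate_arn:
        problems.append("FTPS requires a certificate")
    return problems

# SFTP-only servers can be public with service-managed users
print(check_transfer_server(["SFTP"], "PUBLIC", "SERVICE_MANAGED"))
```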

11. Networking & Content Delivery

11.1. API Gateway

11.1.1. API Gateway acts as a front door to manage all the API calls, ensuring scalability, security, and efficiency.

11.1.2. Enables developers to create, publish, maintain, monitor, and secure APIs at any scale

11.1.3. HIPAA eligible service

11.1.4. creating, deploying, and managing a RESTful API to expose backend HTTP endpoints, Lambda functions, or other AWS services

11.1.5. Together with Lambda, API Gateway forms the app-facing part of the AWS serverless infrastructure
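To illustrate the API Gateway plus Lambda pairing, here is a minimal Lambda handler for a REST API using Lambda proxy integration: the request arrives as an event dict and the function must return a status code, headers, and a string body. The query parameter `name` and the greeting are hypothetical examples.

```python
import json

def lambda_handler(event, context):
    # With proxy integration, query parameters arrive in
    # event["queryStringParameters"], which is None when there are none.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),  # body must be a string
    }

# Local usage with a hand-built event (no AWS needed)
print(lambda_handler({"queryStringParameters": {"name": "dea"}}, None)["body"])
```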

11.2. CloudFront

11.2.1. Content Delivery Network (CDN) service.

11.2.2. speeds up distribution of your static and dynamic web content to your users

11.2.3. NACL cannot be associated with an Amazon CloudFront distribution

11.2.3.1. CloudFront works from edge locations and doesn't belong to a VPC

11.3. Route 53

11.3.1. highly available and scalable Domain Name System (DNS) web service

11.3.2. used for

11.3.2.1. domain registration

11.3.2.2. DNS routing

11.3.2.3. health checking

11.4. VPC

11.4.1. the networking layer of Amazon EC2

11.4.1.1. Create a virtual network in the cloud dedicated to your AWS account where you can launch AWS resources

11.4.2. VPC spans all the Availability Zones in the region

11.4.2.1. After creating a VPC, you can add one or more subnets in each Availability Zone

11.4.3. VPC endpoints

11.4.3.1. enable private connections between the VPC and supported AWS services without requiring internet access

11.4.3.2. gateway endpoints work with

11.4.3.2.1. S3

11.4.3.2.2. DynamoDB
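A minimal sketch covering both VPC notes above: carving one subnet per Availability Zone out of a VPC CIDR with Python's `ipaddress` module, and building the keyword arguments for a gateway endpoint via boto3's `ec2.create_vpc_endpoint`. The AZ names, VPC ID, route table IDs, and helper names are hypothetical; the /24 subnet size is an arbitrary choice.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four and the last address

def plan_subnets(vpc_cidr, azs, new_prefix=24):
    """Carve one subnet per Availability Zone out of the VPC's CIDR block."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {az: str(block) for az, block in zip(azs, blocks)}
    if len(plan) < len(azs):
        raise ValueError("VPC CIDR is too small for one subnet per AZ")
    return plan

def usable_hosts(subnet_cidr):
    return ipaddress.ip_network(subnet_cidr).num_addresses - AWS_RESERVED_PER_SUBNET

GATEWAY_ENDPOINT_SERVICES = {"s3", "dynamodb"}

def gateway_endpoint_args(region, service, vpc_id, route_table_ids):
    """Keyword arguments for boto3's ec2.create_vpc_endpoint (Gateway type)."""
    if service not in GATEWAY_ENDPOINT_SERVICES:
        raise ValueError(f"{service} is not offered as a gateway endpoint")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "RouteTableIds": list(route_table_ids),
    }

print(plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"]))
print(usable_hosts("10.0.0.0/24"))
# import boto3
# boto3.client("ec2").create_vpc_endpoint(
#     **gateway_endpoint_args("us-east-1", "s3", "vpc-0example", ["rtb-0example"]))
```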

11.5. Direct Connect

11.5.1. links your internal network to a Direct Connect location over a standard Ethernet fiber-optic cable

11.5.2. data is delivered through a private network connection between AWS and your datacenter or corporate network

11.5.3. 1 Gbps, 10 Gbps, and 100 Gbps connections are available

11.5.4. useful for

11.5.4.1. transferring large data sets

11.5.4.2. developing and using applications that use real-time data feeds

11.5.4.3. building hybrid environments that satisfy regulatory requirements requiring the use of private connectivity
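For the "transferring large data sets" use case, a rough sketch of picking the smallest Direct Connect speed from the 1/10/100 Gbps options that can move a given daily volume within a transfer window. The 80% utilization default and the helper names are assumptions for illustration, not AWS sizing guidance.

```python
DX_PORT_SPEEDS_GBPS = (1, 10, 100)  # connection speeds listed in these notes

def required_gbps(daily_tb, window_hours=24.0, utilization=0.8):
    """Line rate needed to move daily_tb within the window at the given utilization."""
    bits = daily_tb * 1e12 * 8
    return bits / (window_hours * 3600) / 1e9 / utilization

def smallest_dx_port(daily_tb, window_hours=24.0, utilization=0.8):
    need = required_gbps(daily_tb, window_hours, utilization)
    for speed in DX_PORT_SPEEDS_GBPS:
        if speed >= need:
            return speed
    return None  # more than a single connection could carry

# 50 TB in an 8-hour window needs about 17.4 Gbps, so the 100 Gbps option
print(smallest_dx_port(50, window_hours=8))
```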

11.6. ELB

11.6.1. Distributes incoming application or network traffic across multiple targets

11.6.1.1. EC2 instances

11.6.1.2. ECS containers

11.6.1.3. Lambda functions

11.6.1.4. IP addresses

11.6.1.5. targets can be spread across one or more Availability Zones

11.6.2. you must specify one public subnet from at least two Availability Zones

11.6.2.1. You can specify only one public subnet per Availability Zone

11.6.3. Monitors the health of its registered targets and routes traffic only to healthy targets

11.6.4. Cross Zone Load Balancing

11.6.5. types

11.6.5.1. application load balancer

11.6.5.2. network load balancer

11.6.5.3. gateway load balancer

11.6.5.4. classic load balancer - EC2

11.6.5.5. HTTP Headers
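The subnet rules noted above for load balancers (subnets in at least two Availability Zones, only one subnet per Availability Zone) can be expressed as a small pre-flight check. This is a hypothetical helper for illustration, not part of any AWS SDK; subnet IDs and AZ names are placeholders.

```python
from collections import Counter

def check_lb_subnets(subnets):
    """subnets: list of (subnet_id, availability_zone) pairs; returns rule violations."""
    per_az = Counter(az for _, az in subnets)
    errors = []
    if len(per_az) < 2:
        errors.append("need subnets in at least two Availability Zones")
    repeated = sorted(az for az, count in per_az.items() if count > 1)
    if repeated:
        errors.append("more than one subnet in: " + ", ".join(repeated))
    return errors

print(check_lb_subnets([("subnet-a", "us-east-1a"), ("subnet-b", "us-east-1b")]))
```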

11.7. Global Accelerator

11.7.1. improves the availability and performance of your applications to your local and global users

11.7.2. provides static IP addresses that act as a fixed entry point to your application

11.7.3. continually monitors the health of your application endpoints

11.7.3.1. detects an unhealthy endpoint and redirects traffic to healthy endpoints in less than 1 minute

11.7.4. benefits

11.7.4.1. instant regional failover

11.7.4.2. high availability

11.7.4.3. no variability around clients that cache IP addresses

11.7.4.4. improved performance

11.7.4.5. easy manageability

11.7.4.6. fine-grained control

11.8. Transit Gateway

11.8.1. networking service

11.8.1.1. hub and spoke model to enable customers to connect their on-premises data centers and their Amazon Virtual Private Clouds (VPCs) to a single gateway

11.8.2. customers only have to create and manage a single connection from the central gateway into each on-premises data center

11.8.3. once a new VPC is attached to the Transit Gateway, it is automatically available to every other network that is also connected to the Transit Gateway

11.8.4. features

11.8.4.1. inter-region peering

11.8.4.1.1. effective way to replicate data for geographic redundancy or to share resources between AWS Regions

11.8.4.2. multicast

11.8.4.2.1. Enables customers to have fine-grained control over who can consume and produce multicast traffic

11.8.4.3. automated provisioning

11.8.4.3.1. Customers can automatically identify the Site-to-Site VPN connections and the on-premises resources with which they are associated using AWS Transit Gateway

11.9. PrivateLink

11.9.1. provides private connectivity between VPCs, AWS services, and on-premises applications, securely on the AWS network.

12. Developer Tools

12.1. CodePipeline

12.1.1. continuous integration and continuous delivery service for automating release pipelines

12.1.2. suited for code deployments and delivery processes

12.2. CodeBuild

12.2.1. continuous integration service

12.2.2. compiles source code, runs tests, and produces software packages that are ready to deploy.

12.2.3. You can automate your release process using AWS CodePipeline to test your code and run your builds with CodeBuild.

12.2.4. three levels of compute capacity

12.2.4.1. Build.general1.small – 3GB memory, 2 vCPU

12.2.4.2. Build.general1.medium – 7GB memory, 4 vCPU

12.2.4.3. Build.general1.large – 15GB memory, 8 vCPU
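A small sketch that picks the smallest of the three compute types listed above that satisfies a build's memory and vCPU needs. The names follow the `computeType` values CodeBuild uses in a project environment; the helper name is hypothetical, and newer or larger compute types are deliberately not modeled.

```python
# (computeType, memory in GB, vCPUs) as listed in these notes
COMPUTE_TYPES = [
    ("BUILD_GENERAL1_SMALL", 3, 2),
    ("BUILD_GENERAL1_MEDIUM", 7, 4),
    ("BUILD_GENERAL1_LARGE", 15, 8),
]

def pick_compute_type(min_memory_gb, min_vcpus):
    """Smallest listed compute type meeting both requirements, or None."""
    for name, memory_gb, vcpus in COMPUTE_TYPES:
        if memory_gb >= min_memory_gb and vcpus >= min_vcpus:
            return name
    return None

print(pick_compute_type(4, 2))  # memory rules out small, so medium
```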

12.3. CodeCommit

12.3.1. fully-managed source control service

12.3.2. hosts secure Git-based repositories, similar to GitHub.

12.4. CodeDeploy

12.4.1. fully managed deployment service

12.4.2. automates software deployments to a variety of compute services, such as

12.4.2.1. EC2

12.4.2.2. Fargate

12.4.2.3. Lambda

12.4.2.4. your on-premises servers

12.5. CodeStar

12.5.1. cloud‑based software development service

12.5.2. provides the tools you need to quickly develop, build, and deploy applications on AWS.

12.5.3. commonly used along with CodeCommit, CodeBuild, CodeDeploy, and CodePipeline for a robust CI/CD toolchain.

12.6. X-Ray

12.6.1. analyzes and debugs production, distributed applications, such as those built using a microservices architecture.

12.6.2. you can identify performance bottlenecks, edge case errors, and other hard-to-detect issues.

12.7. AppConfig

12.7.1. quickly and securely adjust application behavior in production environments without needing to deploy code

12.7.2. features

12.7.2.1. feature flags

12.7.2.1.1. Allows gradual release of new capabilities to users

12.7.2.1.2. measures the impact of those changes before full deployment.

12.7.2.2. dynamic configurations

12.7.2.2.1. Enables updates to block lists, allow lists, throttling limits, logging verbosity, and other operational tuning to quickly respond to issues in production environments.

12.7.2.3. validators

12.7.2.3.1. Ensures that your configuration data is syntactically and semantically correct before deploying the changes

12.7.2.4. deployment strategies

12.7.2.4.1. Enables slow release of changes to production environments over minutes or hours.

12.7.2.5. monitoring and automatic rollback

12.7.2.5.1. Integrates with Amazon CloudWatch to monitor changes to your applications.

12.7.2.5.2. If a configuration change causes an alarm in CloudWatch, AWS AppConfig automatically rolls back the change to maintain the application’s health.
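AppConfig deployment strategies are defined by a growth type (linear or exponential), a growth factor, and a deployment duration. Below is a simplified model of a linear strategy, assuming the duration is split evenly across the steps; the helper is hypothetical and is not AppConfig's actual scheduler. With 20% growth over 30 minutes it yields five steps six minutes apart, which appears to match the shape of the predefined AppConfig.Linear20PercentEvery6Minutes strategy.

```python
import math

def linear_rollout(growth_percent, duration_minutes):
    """Simplified linear rollout: (minutes between steps, cumulative % per step)."""
    steps = math.ceil(100 / growth_percent)
    interval = duration_minutes / steps
    # Each step adds growth_percent of targets, capped at 100%
    percentages = [min(100, growth_percent * i) for i in range(1, steps + 1)]
    return interval, percentages

print(linear_rollout(20, 30))
```

If a CloudWatch alarm fires partway through such a schedule, AppConfig rolls the change back rather than continuing to the next step.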