AWS Certified DevOps Engineer - Professional


1. CI/CD/Automation

1.1. CloudFormation

1.1.1. Stack
- Stack Updating
- Stack Creation: upload template to S3 -> syntax check -> stack name -> parameter verification -> processing the template and stack creation -> resource ordering -> stack completion or rollback
- Stack Nesting (and why to use it)

1.1.2. Template Anatomy
- Sections: Parameters, Mappings, Conditions, Resources, Outputs
- Intrinsic functions: Base64, FindInMap, GetAtt, GetAZs, Join, Select, Ref; condition functions
- DependsOn
- DeletionPolicy: specifies what happens to each resource when the stack is deleted - Delete, Retain, or Snapshot; the default is Delete
- Wait Conditions & Handlers: two related components. A wait condition handler is a CloudFormation resource that has no properties but generates a signed URL, which can be used to communicate SUCCESS or FAILURE. A wait condition generally has four components.
- Custom resources: Type = Custom::*, with a ServiceToken (Ref to a Lambda function or an SNS topic); request and response
- CreationPolicy: can currently ONLY be used with EC2 instances and Auto Scaling groups; has two main components (Count and Timeout). When the number of success signals reaches the Count in the creation policy, the resource is marked as complete.
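As a sketch of the CreationPolicy flow described above, a minimal (illustrative) template fragment might look like this - the resource name and AMI ID are placeholders:

```yaml
# Sketch: CloudFormation will not mark this instance CREATE_COMPLETE
# until it receives 1 success signal within 15 minutes. The signal is
# sent from the instance itself via cfn-signal in UserData.
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT15M
    Properties:
      ImageId: ami-12345678        # placeholder AMI ID
      InstanceType: t2.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          # ... bootstrap steps go here ...
          # report success/failure of the last command back to the stack
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} \
            --resource WebServer --region ${AWS::Region}
```

If the signal count is not reached before the timeout, the resource is marked failed and the stack rolls back.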

1.1.3. Stack Policy
- Governs what can be changed and by whom
- Can't be deleted after creation; it can only be changed
- Can prevent resource updates
- Evaluated first, before resources are changed
- By default (no stack policy present), any updates are allowed
- Once a stack policy is applied, it cannot be removed
- Once a policy is applied, by default ALL resources are protected: Update:* is denied
- To remove the default protection, update the policy with an explicit "Allow" for one, more, or all resources
- Update:* / update impact
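A minimal stack policy illustrating the allow-then-deny pattern above - the logical resource ID `ProductionDatabase` is a hypothetical example:

```json
{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "Update:*",
      "Principal": "*",
      "Resource": "*"
    },
    {
      "Effect": "Deny",
      "Action": ["Update:Replace", "Update:Delete"],
      "Principal": "*",
      "Resource": "LogicalResourceId/ProductionDatabase"
    }
  ]
}
```

The explicit Allow re-enables updates in general, while the Deny keeps destructive actions blocked for the protected resource.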

1.2. OpsWorks

1.2.1. Two Parts
- Chef agent: configuration of machines
- OpsWorks automation engine: creates, updates, and deletes various AWS infrastructure components; handles load balancing, auto scaling, auto healing, and lifecycle events

1.2.2. Chef uses cookbooks and recipes; OpsWorks and Chef are declarative, desired-state engines

1.2.3. Instances require INTERNET access (default VPC, public subnets, or private subnets with NAT)

1.2.4. Linux and Windows machines can NOT be mixed

1.2.5. Changes (OS, etc.) apply to NEW instances only

1.2.6. An RDS instance can be associated with only ONE stack

1.2.7. Stack Clone Operation Does NOT clone RDS instances

1.2.8. Specific resources: EIPs, RDS instances, volumes

1.2.9. Layers: OpsWorks (traditional) layer; ECS layer (integrates with an ECS cluster); RDS layer

1.2.10. Events
- Setup: when an instance has been created
- Configure: when an instance enters or leaves the online state; when an EIP is associated or disassociated; when an ELB is attached to or detached from the layer - runs on ALL instances in the stack
- Deploy: when you run a deploy command on the instance
- Undeploy: when you delete an app from the layer or run an undeploy command
- Shutdown: when an instance is shut down, before it is terminated

1.2.11. Instances
- Stack and layer influence on instances: the stack sets the default OS for instances; layers contain recipes plus general, network, and disk configuration
- Types: 24/7, time-based, and load-based instances
- Parameters that can be overridden per instance: subnet ID, SSH key, OS, root device type, volume type
- Instance auto healing: each OpsWorks instance runs an OpsWorks agent that performs heartbeat-style health checks

1.2.12. Applications
- Creating an app: name, document root, data source, app source, environment variables, domain name, SSL settings
- Deployment: executes the deployment of the app on instances via a command; the application ID is passed to the command; app parameters are passed to the Chef environment; the Deploy recipe accesses the app source and pulls it; the last 5 versions of the app are maintained
- Deployment commands: create an application deployment; stack-level commands can be executed against the whole stack

1.2.13. Berkshelf support was provided in OpsWorks with Chef 11.10

1.2.14. Data bags are global JSON objects accessible from within the Chef framework. There are multiple data bags, including STACK, LAYER, APP, and INSTANCE. Data is accessed through the Chef `data_bag_item` and `search` methods.

1.2.15. Lifecycle events can also be run manually

1.3. Elastic Beanstalk

1.3.1. deploy, monitor and scale an application

1.3.2. Shifts focus away from infrastructure and toward application components and performance - not configuration and specifications

1.3.3. It attempts to remove or simplify infrastructure management, allowing applications to be deployed into an infrastructure environment easily

1.3.4. Components
- Applications: either your entire application is one EB application, or each logical component of your application is its own EB application; each application provides its own URL
- Environments: each application can have several environments; an environment is either a single-instance or a scalable type; app URLs can be swapped between environments
- Versions: an app can have many versions; a zipped version of the app can be deployed to environments

1.3.5. Configuration
- Web tier: scaling, instances, notifications, software configuration, updates and deployments, health
- Network tier
- Data tier: one DB can be used by only one EB environment; cloning an EB environment doesn't clone the DB; a DB can be moved to another EB environment

1.3.6. Supported environments: Node.js, PHP, Python, Ruby, Go, Java, Tomcat, Microsoft IIS, Docker (including preconfigured Glassfish, Go, and Python), and custom platforms

1.3.7. .ebextensions
- A folder named .ebextensions in the application source
- Allows granular configuration of the EB environment and customisation of the resources it owns (EC2, ELB, and others)
- YAML-format files ending in .config
- Files contain key sections: option_settings, resources, and others
- Files are processed in alphabetical order
- Leader instance: an instance picked during deployment (afterwards it becomes a regular instance), used to make sure certain commands are executed only once
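A small illustrative `.config` file tying these pieces together - the filename and the scaling values are arbitrary examples:

```yaml
# .ebextensions/01-options.config
# (files in .ebextensions are processed in alphabetical order)
option_settings:
  aws:autoscaling:asg:
    MinSize: 2
    MaxSize: 4

container_commands:
  01_run_once:
    # leader_only ensures this runs on a single (leader) instance
    # per deployment, e.g. for database migrations
    command: "echo 'ran once per deployment'"
    leader_only: true
```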

1.3.8. EB Docker Application Source Bundle Files

2. DevOps Core Concept

2.1. CI & CD

2.1.1. Continuous Integration is the process of automating regular code commits, followed by build and test processes designed to highlight integration problems

2.1.2. Continuous Deployment takes the form of a workflow-based process that accepts a tested software build payload from a CI server

2.2. Deployment

2.2.1. Single Target Deployment

2.2.2. All at once Deployment

2.2.3. Minimum In Service Style Deployment

2.2.4. Rolling Deployment

2.2.5. Blue/Green Deployment

2.3. Testing

2.3.1. A/B Testing sends a percentage of traffic to the Blue environment and a percentage of traffic to the Green environment
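As a sketch of the split logic (in practice this is typically done with Route 53 weighted record sets or a load balancer, not application code - the helper below is purely illustrative):

```python
import hashlib

def assign_variant(user_id: str, green_percent: int = 10) -> str:
    """Deterministically route `green_percent`% of users to the Green
    environment; the rest stay on Blue. Hashing the user ID keeps each
    user's assignment stable across requests."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "green" if bucket < green_percent else "blue"
```

With `green_percent=10`, roughly 10% of users land on Green, and a given user always gets the same answer.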

2.4. Bootstrapping

2.5. Immutable Architecture

2.5.1. means "Don't diagnose or fix; throw away and re-create"

3. Monitoring/Metric/Logging

3.1. CloudWatch

3.1.1. Metric-gathering service
- Originally, 2 weeks was the longest period for storing metrics
- CloudWatch launched extended retention of metrics on November 1, 2016, increasing storage of all metrics from the previous 14 days to 15 months
- CloudWatch metrics now support three retention schedules: 1-minute datapoints are available for 15 days, 5-minute datapoints for 63 days, and 1-hour datapoints for 455 days
- Metrics fall under different namespaces
- You can aggregate data across EC2 instances
- Custom metrics can use a custom namespace

3.1.2. Monitoring/alerting service
- The alarm period must be equal to or greater than the metric frequency
- An alarm can't invoke actions just by being in a state; only a change of state invokes actions
- The alarm action must be in the same region as the alarm
- AWS resources don't send metric data under certain conditions; for example, an EBS volume that is not attached to an EC2 instance won't send data to CloudWatch
- You can create an alarm before the metric exists
- Alarm states: OK, ALARM, INSUFFICIENT_DATA
- CLI: mon-put-metric-alarm, mon-[enable/disable]-alarm, mon-describe-alarms

3.1.3. Graphing Service

3.2. CloudTrail

3.2.1. The service records all AWS API requests made in your account and delivers logs to you

3.2.2. Logs contain: the identity of who made the request, the time of the call, the source IP of the call, the request parameters, and the response elements returned by AWS

3.2.3. Purposes: enable security analysis, track changes in your account, provide compliance auditing

3.2.4. Two types of trail: all regions to one S3 bucket, or one region to one S3 bucket

3.2.5. Storage: S3 with server-side encryption; store logs as long as you want, or apply archiving and lifecycle policies; logs are delivered within 15 minutes; new log files are published every 5 minutes

3.2.6. Notes: AWS CloudTrail records API requests made by a user, or on behalf of the user by an AWS service; there is an 'invokedBy' field in the logs

3.3. CloudWatch Logs

3.3.1. Agent for EC2: Ubuntu, Amazon Linux, Windows

3.3.2. Main Purposes Monitor logs from EC2 instances in real time Monitor CloudTrail logged events Archive log data

3.3.3. Terminology
- Log event: a timestamp and a message
- Log stream: a sequence of log events that share a single event source
- Log group: a group of log streams that share retention, monitoring, and access-control settings
- Metric filters: how the service extracts metrics from events
- Retention settings

3.3.4. A filter is applied ONLY to data received after the filter was created; testing a filter returns the first 50 results

3.3.5. Subscription Filter Amazon Kinesis Stream AWS Lambda Amazon Kinesis Firehose

3.4. CloudWatch Events

3.4.1. A similar service to CloudTrail, but much faster (near real time)

3.4.2. Targets: Lambda, Kinesis streams, SNS topics, and built-in targets; a rule can have multiple targets

4. Security/Governance/Validation

4.1. Delegation and Federation

4.1.1. Federation allows users from another identity provider (IdP) to access your AWS systems. Two types: Corporate/Enterprise identity federation, and Web or social identity federation

4.1.2. A role is an object that contains two policy documents: a TRUST policy and an ACCESS policy

4.1.3. Sessions: a session is a set of temporary credentials - a secret key and access key with an expiration. You obtain them using the Security Token Service (STS): sts:AssumeRole, sts:AssumeRoleWithSAML, sts:AssumeRoleWithWebIdentity. It may or may not involve Cognito, depending on whether it is web federation.

4.1.4. Corporate/Enterprise identity federation
- Types of request: Console - AssumeRole; API - GetFederationToken; Console - AssumeRoleWithSAML
- The federation proxy has sts:AssumeRole permissions
- The response contains: AccessKey, SecretKey, session token, expiration date

4.1.5. Web identity federation
- Cognito can orchestrate unauthenticated identities
- Can merge an unauthenticated identity into an authenticated one if both are provided
- Can merge multiple identities (e.g. Facebook, Twitter); when identities are merged, any synced data is merged
- Types:

5. High Availability & Elasticity

5.1. EC2 AutoScaling

5.1.1. Lifecycle
- The lifecycle starts when you or the Auto Scaling group launches an instance
- The lifecycle ends when you or the Auto Scaling group terminates an instance
- On scale-in or a failed health check, the instance is terminated
- Enter Standby: removes the instance from service so you can troubleshoot it; the instance is still managed by the Auto Scaling group
- Detach: the instance is removed from the ASG but keeps running

5.1.2. Lifecycle Hooks
- EC2_INSTANCE_LAUNCHING: for example, install a new app during launch
- EC2_INSTANCE_TERMINATING: for example, copy logs before termination
- Work process: 1) the ASG responds to a scale-out event by launching a new instance; 2) the ASG puts the instance into the Pending:Wait state; 3) the ASG sends a message to the notification target defined for the hook; 4) it waits until you tell it to continue or the timeout ends; 5) you can perform any action, e.g. install software; 6) by default the instance waits for an hour, then changes to Pending:Proceed and moves to the InService state
- If you need more time, you can restart the timeout period or change the timeout
- Notes: you can set the heartbeat timeout when you create the hook using the `heartbeat-timeout` parameter; you can call `complete-lifecycle-action` to finish, or `record-lifecycle-action-heartbeat` to add more time to the timeout; 48 hours is the maximum you can keep a server in the wait state; when a Spot instance terminates you must still complete the lifecycle actions; the cooldown starts when the instance enters the InService state
- Result (response for the instance): Abandon or Continue

5.1.3. Launch Configuration
- Notes: when you create an ASG you MUST specify an LC; one LC can be associated with many ASGs; one ASG can have only one LC; you can NOT update an LC, only create a new one; if you change the LC of an ASG, new instances use the new LC - existing instances are NOT recreated
- Spot instances: you CAN use lifecycle hooks with Spot instances; LCs for on-demand and Spot instances are different; set the bid price in your LC; to change the bid price you must create a new LC; if an instance is terminated, the ASG will try to launch a new one; if your bid is below the Spot price, the ASG will wait
- LC from an EC2 instance: you can create an LC from a running EC2 instance; some properties of the EC2 instance are not supported; there are differences between creating an LC from scratch and from an EC2 instance
- Self healing

5.2. RDS

5.2.1. Engines: MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, Oracle, Aurora

5.2.2. Managed administration
- RDS handles: provisioning infrastructure, installing software, automatic backups, automatic patching, synchronous data replication, automatic failover
- You handle: settings, schema, performance tuning

5.2.3. Scaling
- Vertical (scale up/down): compute and storage
- Read scale-out: up to 5 read replicas; a read replica can be promoted to master; you can create a read replica of a read replica
- Sharding: split your tables across multiple DBs; split tables that aren't joined by queries

5.3. Aurora

5.3.1. To encrypt an unencrypted DB, you need to create a new one (it cannot be encrypted in place)

5.3.2. compatible with MySQL 5.6

5.3.3. Storage
- Fault tolerant and auto healing; disk failures are repaired in the background
- Detects database crashes and restarts; no crash recovery or cache rebuilding needed
- Automatic failover to one of up to 15 read replicas
- Storage auto-scales from 10 GB up to 64 TB without any disruption

5.3.4. Backups
- Automatic, continuous, and incremental
- Point-in-time recovery to within a second of the retention period
- Retention up to 35 days
- Stored in S3; no impact on database performance

5.3.5. Snapshots: stored in S3; kept until you delete them; incremental

5.3.6. Data failures: 6 copies of your data across 3 Availability Zones; restore in a healthy AZ; restore from point-in-time recovery or snapshots

5.3.7. Fault Tolerance
- Data is divided into 10 GB segments across many disks
- Transparently handles data loss
- Can lose up to 2 copies of your data without affecting writes
- Can lose up to 3 copies of your data without affecting reads
- All storage is auto healing

5.3.8. Replicas
- Aurora replicas: share the same underlying volume as the primary instance; updates made by the primary are immediately visible on all replicas; up to 15; low performance impact on the primary; failover to a replica has no data loss
- MySQL replicas: replay by transaction; up to 5; high performance impact on the primary; failover can lose minutes of data

5.3.9. Security
- All instances must be created in a VPC
- SSL for data in transit
- KMS for data at rest: encrypted storage, backups, snapshots, and replicas
- You can NOT encrypt an unencrypted DB in place

5.4. DynamoDB

5.4.1. Primer
- NoSQL, fully managed
- Predictable, fully manageable performance with seamless scalability; no visible servers; no practical storage limitations
- Fully resilient and highly available; performance scales linearly
- Fully integrated with IAM
- Structure: tables, items (rows), attributes; hash key/partition key and sort key/range key
- Data types: String, Number, Binary, Boolean, Null; Document (JSON - List or Map); Set (array)
- Write Capacity Unit (WCU): number of 1 KB blocks per second; a write goes to at least 2 locations before it completes
- Read Capacity Unit (RCU): number of 4 KB blocks per second; a read comes from only 1 location; eventually consistent by default, but you can request strongly consistent reads
- If your items are smaller than 4 KB, each read capacity unit yields one strongly consistent read per second or two eventually consistent reads per second
- Unlike SQL, the schema is not defined at the database level
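The WCU/RCU arithmetic above can be sketched as two small helpers (function names are my own, not an AWS API):

```python
import math

def write_capacity_units(item_size_kb: float, writes_per_sec: int) -> int:
    """1 WCU = one 1 KB write per second; item size rounds up to a whole KB."""
    return math.ceil(item_size_kb) * writes_per_sec

def read_capacity_units(item_size_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = False) -> int:
    """1 RCU = one strongly consistent 4 KB read per second,
    or two eventually consistent reads per second."""
    units = math.ceil(item_size_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)
```

For example, 10 strongly consistent reads per second of 4 KB items needs 10 RCU; the same workload with eventual consistency needs only 5.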

5.4.2. Partitions
- Partitions are the underlying storage and processing nodes of DynamoDB
- 1 partition supports up to 10 GB of data, 3000 RCU, and 1000 WCU
- When a table exceeds 10 GB, 3000 RCU, or 1000 WCU, it gets a new partition
- Data is automatically spread over time; distribution between partitions is based on the hash/partition key
- Limitations: partition counts automatically increase but never decrease; allocated WCU and RCU are split between partitions; each partition key is limited
- Calculating partitions: MAX(WCU/1000 + RCU/3000, storage/10 GB), rounded up to a whole number
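The partition rule of thumb above, as a one-function sketch (the helper name is my own):

```python
import math

def partition_count(rcu: int, wcu: int, storage_gb: float) -> int:
    """Approximate DynamoDB partition count per the notes:
    MAX(RCU/3000 + WCU/1000, storage / 10 GB), rounded up."""
    by_throughput = rcu / 3000 + wcu / 1000
    by_storage = storage_gb / 10
    return math.ceil(max(by_throughput, by_storage))
```

Note the consequence spelled out above: provisioned RCU/WCU are split across this many partitions, so over-partitioning dilutes per-partition throughput.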

5.4.3. GSI/LSI
- Global Secondary Indexes: allow an alternative partition key; as with LSIs, you choose which attributes are projected; a GSI doesn't share WCU and RCU with the table - it has its own, and changes are written asynchronously; supports ONLY eventual consistency
- Local Secondary Indexes: any data written to the table is copied asynchronously to each LSI; share WCU and RCU with the table; sparse indexes; there are operational concerns about LSIs; support both consistency models

5.4.4. Streams & Replication
- A stream is an ordered record of updates to a DynamoDB table
- When a stream is enabled, AWS guarantees that each change to the table appears in the stream ONLY once
- All changes to the table appear in the stream in near real time
- The stream retains records from now back to 24 hours ago; the stream view type controls what is captured (e.g. partition keys only)
- Example:
  aws dynamodb create-table \
      --table-name BarkTable \
      --attribute-definitions AttributeName=Username,AttributeType=S AttributeName=Timestamp,AttributeType=S \
      --key-schema AttributeName=Username,KeyType=HASH AttributeName=Timestamp,KeyType=RANGE \
      --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
      --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
- Replication: streams + Kinesis

5.4.5. Deep dive: write/read performance - add a random prefix to the partition key to distribute load across partitions

5.5. SQS

5.5.1. Messages up to 256 KB

5.5.2. The SQS Extended Client Library allows messages > 256 KB by storing payloads in S3

5.5.3. Ensures delivery of message at least once

5.5.4. Not FIFO (ordering is not guaranteed)

5.5.5. Messages can be retained for up to 14 days

5.5.6. Long polling reduces the cost of frequent polling: a receive call waits up to 20 seconds if the queue is empty

5.5.7. $0.50 per million requests, plus data transfer charges

5.5.8. Use cases
- Prioritisation using two SQS queues
- Parallel processing reading from one queue; different components perform different roles
- Fan-out: SNS -> many SQS queues
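The two-queue prioritisation pattern above can be sketched locally with in-memory queues standing in for the two SQS queues (illustrative only - real workers would call SQS receive APIs):

```python
from queue import Queue, Empty

# Two queues standing in for two SQS queues: the worker always drains
# the high-priority queue before taking low-priority work.
high: Queue = Queue()
low: Queue = Queue()

def next_message():
    """Return the next message, preferring the high-priority queue."""
    for q in (high, low):
        try:
            return q.get_nowait()
        except Empty:
            continue
    return None  # both queues empty
```

In AWS the same effect is achieved by polling the high-priority queue first and falling back to the low-priority queue only when it is empty.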

5.6. Kinesis

5.6.1. Streams
- Collect and process large streams of data in real time
- Create a data-processing application: read data from the stream, process records
- Scenarios: fast log processing, real-time metrics and reporting, real-time data analytics, complex stream processing
- Benefits: real-time aggregation of data; loading aggregated data into data warehouses / MapReduce clusters; durability and elasticity; parallel application readers

5.6.2. Analytics: the easiest way to run SQL queries against streaming data

5.6.3. Firehose
- The easiest way to load streaming data into S3 or Redshift, enabling near-real-time analytics
- Batch size or interval controls how often data is uploaded to S3 or Redshift
- Ingest via the Firehose API or the Linux agent
- Compression; scalable; secure (encryption); monitoring through CloudWatch
- Pay as you go: pay only for data, nothing for infrastructure

5.7. Kinesis vs SQS

5.7.1. Kinesis: data can be processed at the same time, or within 24 hours, by different consumers; data can be replayed within the 24-hour retention window

5.7.2. SQS: you can write to multiple queues and then read from multiple sources, but you cannot reuse the information later; up to 14 days of retention

6. Operations

6.1. Instance Type Deepdive

6.1.1. Virtualisation types
- Para-virtualisation (PV) and hardware virtualisation (HVM)
- An AMI supports one or the other; older AMIs usually support PV
- HVM gives a much wider choice of instance types and sizes
- Some families support only HVM (e.g. T2); some old instance types don't support HVM (e.g. T1)

6.1.2. Instance types: T - general purpose, burstable; M - general purpose; C - compute (CPU) optimised; R - RAM optimised; G - GPU/graphics; I - high I/O; D - dense storage (disk performance)

6.1.3. Instance Size

6.1.4. Instance Generation

6.1.5. Instance features
- EBS optimisation: dedicated bandwidth, consistency, higher throughput (500-4000 Mbps)
- Enhanced networking: AWS-supported SR-IOV; less jitter, higher throughput; the VM must have the driver
- Instance store volumes
- GPU availability

6.2. EBS Deep Dives

6.2.1. Types
- Magnetic: ~100 IOPS, 2-40 ms latency, ~10 MB/s throughput, no burst
- General Purpose SSD (gp2): 3 IOPS per GB, burst up to 3000 IOPS, up to 160 MB/s throughput; larger volumes can scale to 10,000 IOPS; IOPS limits assume a 256 KB block size; the IOPS burst pool starts at 5,400,000 credits
- Provisioned IOPS (io1): max 20,000 IOPS, up to 320 MB/s throughput, up to 50 IOPS per GB of storage, 99.9% performance consistency
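The gp2 baseline rule above (3 IOPS/GB, capped at the 10,000 IOPS ceiling these notes describe - current AWS limits are higher) can be sketched as:

```python
def gp2_baseline_iops(size_gb: int) -> int:
    """gp2 baseline IOPS per the notes: 3 IOPS per GB,
    with a floor of 100 IOPS and a cap of 10,000 IOPS."""
    return min(max(3 * size_gb, 100), 10_000)
```

So a 100 GB volume has a 300 IOPS baseline and relies on burst credits to reach 3000 IOPS, while a ~3334 GB volume sustains 10,000 IOPS without bursting.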

6.2.2. Terms
- Capacity: amount of data, in GB
- Throughput: data throughput in MB/s for read/write operations
- Block size: size of each read or write, in KB
- IOPS: number of read and write operations per second
- Latency: delay between a read/write request and its completion, in ms

6.2.3. Performance elements: instance I/O (you cannot sustain MAX IOPS and MAX throughput at the same time), network, EBS volumes

6.2.4. IOPS vs Throughput
- e.g., you expect 3000 IOPS and 160 MB/s
- 32 KB * 3000 IOPS = 96 MB/s
- 256 KB * 750 IOPS = 192 MB/s
- 256 KB * 3000 IOPS = 768 MB/s, which is beyond the 160 MB/s cap
- MAX throughput = IOPS * block size; you need to optimise the block size
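The arithmetic above as a one-liner (using decimal MB, i.e. 1000 KB/MB, to match the figures in the notes):

```python
def throughput_mbs(iops: int, block_size_kb: int) -> float:
    """MAX throughput (MB/s) = IOPS * block size (KB) / 1000."""
    return iops * block_size_kb / 1000
```

Reproducing the examples: 3000 IOPS at 32 KB gives 96 MB/s (fine), while 3000 IOPS at 256 KB would need 768 MB/s, far beyond the 160 MB/s volume cap, so either IOPS or block size must give way.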

6.2.5. RAID
- When you need more: a large EBS-optimised instance can deliver 32,000 IOPS @ 16 KB (500 MB/s); max 48,000 IOPS @ 16 KB on 10 Gbps
- RAID 0 / LVM striping
- You need to freeze the filesystem during snapshotting

6.2.6. Burst pool: MAX burst is 3000 IOPS; the burst pool starts at 5,400,000 credits; you don't need to provision a large volume just to get better burst performance
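A sketch of how long that burst pool lasts, combining the 5,400,000-credit starting balance with the 3 IOPS/GB baseline refill described in these notes (helper name is my own):

```python
def gp2_burst_seconds(size_gb: int, demand_iops: int = 3000) -> float:
    """Seconds a gp2 volume can sustain `demand_iops` from a full burst
    bucket of 5,400,000 credits, refilled at the baseline rate
    (3 IOPS/GB, floor 100). Returns infinity if demand <= baseline."""
    baseline = max(3 * size_gb, 100)
    if demand_iops <= baseline:
        return float("inf")
    # credits drain at (demand - baseline) per second
    return 5_400_000 / (demand_iops - baseline)
```

For example, a 100 GB volume (300 IOPS baseline) can sustain the full 3000 IOPS burst for 2000 seconds (about 33 minutes) before falling back to baseline.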

6.2.7. Tips
- Pre-warming is no longer required for new EBS volumes
- Volumes created from snapshots are lazily restored from S3 and require pre-warming (reading every block) to reach full performance
- The first snapshot is a full copy; subsequent snapshots are incremental - frequent snapshots are cheaper and faster