AWS SysOps Certificate

AWS SysOps Certificate by Mind Map: AWS SysOps Certificate

1. Monitoring, Metrics, Analysis

1.1. CloudWatch

1.1.1. Types

1.1.1.1. EC2

1.1.1.1.1. Host Level Metrics Consist of

1.1.1.2. EBS

1.1.1.2.1. Metrics

1.1.1.2.2. Volume Status Check

1.1.1.3. ELB

1.1.1.3.1. every 60 seconds

1.1.1.3.2. Important metrics

1.1.1.4. ElastiCache

1.1.1.4.1. Memcached

1.1.1.4.2. Redis

1.1.1.5. DynamoDB

1.1.1.6. RDS

1.1.1.6.1. by metrics

1.1.1.6.2. by events

1.1.1.7. Custom Metrics

1.1.1.7.1. min interval is 1 min

1.1.1.8. Log Files of application

1.1.2. Storing Metrics

1.1.2.1. By default - 2 weeks

1.1.2.2. Metrics can be kept longer than 2 weeks by exporting them with the GetMetricStatistics API or with third-party tools

1.1.2.3. You can retrieve data from a terminated EC2 or RDS instance for up to 2 weeks after termination

1.1.3. Centralized Monitoring

1.1.3.1. If you need to monitor the entire infrastructure, you need to use centralized monitoring (Zenoss, Splunk, Nagios, etc.)

1.2. Storages

1.2.1. EBS

1.2.1.1. I/O Credits

1.2.1.1.1. Each volume starts with a balance of 5,400,000 I/O credits

1.2.1.1.2. While you stay below your provisioned I/O level, the volume earns I/O credits

1.2.1.1.3. Burst is up to 3,000 IOPS for volumes bigger than 1 GB
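
The credit arithmetic above can be sketched in a few lines. This is only an illustration: the credit balance and burst ceiling come from the notes, while the baseline rate of 3 IOPS per GiB and the 100 IOPS floor are assumptions based on how gp2 volumes are usually described, and the function names are made up for this example.

```python
from typing import Optional

CREDIT_BALANCE = 5_400_000   # initial I/O credits per volume (from the notes)
BURST_IOPS = 3_000           # burst ceiling (from the notes)
BASELINE_IOPS_PER_GIB = 3    # assumption: gp2 baseline rate
MIN_BASELINE_IOPS = 100      # assumption: gp2 minimum baseline


def baseline_iops(size_gib: int) -> int:
    """Baseline IOPS the volume earns credits at (assumed gp2 rule)."""
    return max(MIN_BASELINE_IOPS, size_gib * BASELINE_IOPS_PER_GIB)


def burst_seconds(size_gib: int) -> Optional[float]:
    """Seconds a full credit bucket lasts at the burst ceiling.

    Credits drain at (burst - baseline) per second because the volume
    keeps earning credits at its baseline rate while it bursts.
    Returns None when the baseline already meets the burst ceiling.
    """
    base = baseline_iops(size_gib)
    if base >= BURST_IOPS:
        return None
    return CREDIT_BALANCE / (BURST_IOPS - base)


if __name__ == "__main__":
    for size in (100, 500, 1000):
        print(size, "GiB:", baseline_iops(size), "IOPS baseline,", burst_seconds(size), "s burst")
```

Under these assumptions a small volume can only sustain the burst rate for a limited time, which is why sustained load above the baseline eventually slows down.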

1.2.1.2. Pre-Warming EBS Volumes

1.2.1.2.1. New EBS volumes receive their maximum performance the moment that they are available and do not require initialization (formerly known as pre-warming).

1.2.1.2.2. Storage blocks on volumes that were restored from snapshots must be initialized (pulled down from Amazon S3 and written to the volume) before you can access the block.

1.3. Cost Optimisation

1.3.1. Heavy, Medium utilization for reserved instances

2. High Availability

2.1. Elasticity

2.1.1. Scale OUT

2.1.2. Short term period of time (Hours, Days)

2.1.3. EC2: increase number of instances in ASG

2.1.4. DynamoDB: increase IOPS for additional spikes in traffic

2.1.5. RDS: not very elastic

2.2. Scalability

2.2.1. Long term period (weeks, months)

2.2.2. Scale UP

2.2.3. EC2: increase instance type

2.2.4. DynamoDB Unlimited amount of storage

2.2.5. RDS: Increase instance size

2.3. RDS

2.3.1. Read Replica

2.3.1.1. up to 5 Read Replicas for MySQL and Postgres

2.3.1.2. Read Replicas cross Region For MySQL

2.3.1.3. Replication for Read Replicas is Asynchronous

2.3.1.4. Read Replicas can NOT be Multi-AZ

2.3.1.5. Read Replica of a Read Replica (ONLY MySQL) increases latency

2.3.1.6. DB Snapshot or Automated Backup can NOT be taken for Read Replica

2.3.1.7. Key metric is REPLICA LAG

2.3.1.8. Requires Automated Backups OR a created snapshot

2.3.2. Multi-AZ

2.3.2.1. Auto Failover

2.3.2.2. NOT a scaling solution

2.4. Trouble Shooting

2.4.1. Instances not launching into Autoscaling Groups

2.4.1.1. Associated Key Pair doesn't exist

2.4.1.2. Security Group Doesn't exist

2.4.1.3. Autoscaling config is not working correctly

2.4.1.4. ASG not found

2.4.1.5. Instance type is not supported in the AZ

2.4.1.6. AZ is no longer supported

2.4.1.7. Invalid EBS device mapping

2.4.1.8. Autoscaling Service is not enabled in your account

2.4.1.9. Attempting to attach EBS device to an instance store AMI

3. Deploying & Provisioning

3.1. Services give root access to OS

3.1.1. EC2

3.1.2. Opsworks

3.1.3. MapReduce

3.1.4. Beanstalk

3.2. ELB

3.2.1. supports different AZ in same VPC, same region

3.2.2. Types

3.2.2.1. External

3.2.2.2. Internal

3.2.2.2.1. Only addressable in your VPC

3.2.3. Sticky Session

3.2.3.1. Disabled by default

3.2.3.2. Types

3.2.3.2.1. Duration Based Session Stickiness

3.2.3.2.2. Application-controlled Session Stickiness

3.2.4. CloudWatch ELB

3.2.4.1. Latency

3.2.4.2. SurgeQueueLength

3.2.4.3. SpillOverCount

3.2.5. Prewarming ELB

3.2.5.1. You need to contact AWS Support to preconfigure (pre-warm) the ELB

4. Data Management

4.1. Disaster Recovery

4.1.1. Services

4.1.1.1. Regions

4.1.1.2. Storage

4.1.1.2.1. S3: 99.999999999% durability and Cross Region Replication

4.1.1.2.2. Glacier

4.1.1.2.3. EBS

4.1.1.2.4. AWS Storage Gateway

4.1.1.3. Compute

4.1.1.3.1. EC2

4.1.1.3.2. EC2 VM Import Connector

4.1.1.4. Networking

4.1.1.4.1. Route53

4.1.1.4.2. ELB

4.1.1.4.3. VPC

4.1.1.4.4. Direct Connect

4.1.1.5. Databases

4.1.1.5.1. RDS

4.1.1.5.2. DynamoDB

4.1.1.5.3. RedShift

4.1.1.6. Orchestration

4.1.1.6.1. CloudFormation

4.1.1.6.2. Elastic Beanstalk

4.1.1.6.3. Opsworks

4.1.1.7. Lambda

4.1.2. Recovery Time Objective (RTO)

4.1.3. Recovery Point Objective (RPO)

4.1.3.1. The amount of data your organisation is prepared to lose in the event of a disaster

4.1.4. DR Scenarios

4.1.4.1. Backup & Restore (24 hours)

4.1.4.2. Pilot Light (2 hours)

4.1.4.2.1. A minimal version of your environment is always running in the cloud

4.1.4.2.2. Predefined AMI for EC2 instances

4.1.4.2.3. Small RDS instances with replication

4.1.4.2.4. Network

4.1.4.3. Warm Standby (0.5 hours)

4.1.4.4. Multi Site (0 hours)
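
As a rough illustration of how the RTO figures above drive the choice of DR scenario, the sketch below picks the first scenario whose recovery time fits a target. The hour values are the approximate ones from these notes, not AWS guarantees, and the ordering from cheapest to most expensive is an assumption; `choose_scenario` is a hypothetical helper.

```python
# Scenarios listed from slowest recovery to fastest; the notes imply
# (and this sketch assumes) that cost rises in the same order.
DR_SCENARIOS = [
    ("Backup & Restore", 24.0),
    ("Pilot Light", 2.0),
    ("Warm Standby", 0.5),
    ("Multi Site", 0.0),
]


def choose_scenario(target_rto_hours: float) -> str:
    """Return the first (assumed cheapest) scenario that meets the RTO target."""
    for name, rto_hours in DR_SCENARIOS:
        if rto_hours <= target_rto_hours:
            return name
    # Only reachable for a negative target, which no scenario can meet.
    raise ValueError("RTO target must not be negative")


if __name__ == "__main__":
    for target in (48, 4, 1, 0):
        print(f"RTO {target}h ->", choose_scenario(target))
```

The RPO is decided separately: it depends on how often data is backed up or replicated, not on how fast the environment can be restarted.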

4.2. Automated Backups

4.2.1. Services

4.2.1.1. Have automated backups out of the box

4.2.1.1.1. RDS

4.2.1.1.2. ElastiCache (only for Redis)

4.2.1.1.3. Redshift

4.2.1.2. Do NOT have automated backups out of the box

4.2.1.2.1. EC2

4.3. Storage Type

4.3.1. Instance Store

4.3.1.1. Root Volume Size is up to 10 GB

4.3.1.2. boot time is less than 5 min

4.3.1.3. Terminating

4.3.1.3.1. Root volume will be deleted automatically

4.3.1.3.2. Other instance store volumes will be deleted automatically

4.3.1.3.3. Other EBS volumes will persist

4.3.1.4. can NOT be stopped

4.3.1.5. You will lose the data on an Instance Store Volume under the following circumstances

4.3.1.5.1. Failed underlying drive

4.3.1.5.2. Stopping EBS backed instance

4.3.1.5.3. Terminating instance

4.3.2. EBS

4.3.2.1. Root Volume Size is up to 1 or 2 TB depending on the OS

4.3.2.2. boot time is less than 1 min

4.3.2.3. You can update instance type, kernel, RAM disk, user data ...

4.3.2.4. Terminating

4.3.2.4.1. Root volume will be removed by default (can be changed while creating instance)

4.3.2.4.2. Other volumes will be preserved

5. Security

5.1. Security Token Service (STS)

5.1.1. Federation (AD)

5.1.1.1. uses Security Assertion Markup Language (SAML)

5.1.1.2. Grants temporary access based on the user's AD credentials. The user doesn't need to be an IAM user

5.1.1.3. Single sign-on allows users to log in to the AWS console without being assigned IAM credentials

5.1.2. Federation with Mobile Apps

5.1.2.1. Use Facebook/Amazon/Google or other OpenID providers to log in

5.1.3. Cross Account Access

5.1.3.1. Lets users from one account access resources in another

5.1.4. Terms

5.1.4.1. Federation

5.1.4.1.1. Combining/joining a list of users in one domain (such as IAM) with a list of users in another domain (such as AD, Facebook...)

5.1.4.2. Identity Broker

5.1.4.2.1. A service that allows you to take an identity from point A and join it (federate it) to point B. You need to configure it

5.1.4.3. Identity Store

5.1.4.3.1. Services like AD, Facebook....

5.1.4.4. Identities

5.1.4.4.1. Users of services like Facebook etc.

5.1.5. Scenario1

5.1.5.1. Employee enters username and password

5.1.5.2. Identity Broker captures the username and password

5.1.5.3. Identity Broker uses the LDAP directory to validate the credentials

5.1.5.4. Identity Broker calls GetFederationToken. The call includes an IAM policy and a duration

5.1.5.5. STS returns an access key, secret key, token and duration
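
To make the GetFederationToken step more concrete, here is a minimal sketch of how an identity broker might assemble the request once LDAP validation has succeeded. It only builds the parameters and makes no AWS call. The `Name`, `Policy` and `DurationSeconds` keys follow the STS API parameter names; the bucket name, the read-only S3 policy, the one-hour default and the 900 to 129,600 second bounds are assumptions chosen for illustration, and `build_federation_request` is a hypothetical helper.

```python
import json

MIN_DURATION = 900        # assumed lower bound: 15 minutes
MAX_DURATION = 129_600    # assumed upper bound: 36 hours


def build_federation_request(username: str, bucket: str,
                             duration_seconds: int = 3600) -> dict:
    """Build the parameters an identity broker would send to STS.

    The policy here (read-only access to one hypothetical bucket) stands
    in for whatever permissions the broker decides the user should get.
    """
    if not MIN_DURATION <= duration_seconds <= MAX_DURATION:
        raise ValueError("duration_seconds outside the allowed range")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }
    return {
        "Name": username,
        "Policy": json.dumps(policy),
        "DurationSeconds": duration_seconds,
    }


if __name__ == "__main__":
    print(build_federation_request("alice", "example-bucket"))
```

With an SDK such as boto3 installed, a broker would presumably pass these parameters to the STS client's `get_federation_token` call and hand the returned temporary credentials back to the application.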

5.1.6. Scenario2

5.1.6.1. Develop an Identity Broker to communicate with LDAP and AWS STS

5.1.6.2. Identity Broker always authenticates with LDAP first, then gets the IAM Role associated with the user

5.1.6.3. Application then authenticates with STS and assumes that IAM role

5.1.6.4. Application uses that IAM role to interact with S3

5.1.7. Tips

5.1.7.1. You need to develop Identity Broker to communicate with LDAP and AWS STS

5.1.7.2. Identity Broker always authenticates with LDAP first and THEN with AWS STS

5.1.7.3. Application gets temporary access to AWS resources

6. Network

6.1. Network throughput depends on instance type

6.2. Different instance types have different EBS throughput to the disk

6.3. Route53

6.3.1. Failover

6.3.1.1. need to configure health check

6.3.2. Weighted

6.3.3. Latency based routing

6.3.4. Geolocation
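
For the weighted policy listed above, each record receives a share of traffic equal to its weight divided by the sum of all weights in the group. The sketch below only shows that arithmetic; the record names are made up, and `traffic_shares` is a hypothetical helper, not part of any AWS SDK.

```python
def traffic_shares(weights: dict) -> dict:
    """Map each record name to its expected fraction of traffic."""
    total = sum(weights.values())
    if total <= 0:
        # The all-zero case is left out of this sketch on purpose.
        raise ValueError("at least one weight must be positive")
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical blue/green rollout: 3 parts old stack, 1 part new stack.
    print(traffic_shares({"blue": 3, "green": 1}))
```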

6.4. Direct Connect

7. Questions

7.1. Retention period for SQS

7.1.1. SQS automatically deletes messages that have been in a queue for more than maximum message retention period. The default message retention period is 4 days. However, you can set the message retention period to a value from 60 seconds to 1209600 seconds (14 days) with SetQueueAttributes.
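
The retention limits above can be captured in a small validation helper. This is a sketch that makes no AWS call: it only builds the attribute map that would be passed to SetQueueAttributes, using the `MessageRetentionPeriod` attribute name. The function name is hypothetical.

```python
DEFAULT_RETENTION = 4 * 24 * 3600    # 345600 seconds (4 days)
MIN_RETENTION = 60                   # 60 seconds
MAX_RETENTION = 14 * 24 * 3600       # 1209600 seconds (14 days)


def retention_attribute(seconds: int = DEFAULT_RETENTION) -> dict:
    """Return a queue attribute map with a validated retention period."""
    if not MIN_RETENTION <= seconds <= MAX_RETENTION:
        raise ValueError(
            f"retention must be between {MIN_RETENTION} and {MAX_RETENTION} seconds"
        )
    # Queue attributes are sent as strings.
    return {"MessageRetentionPeriod": str(seconds)}


if __name__ == "__main__":
    print(retention_attribute())             # default: 4 days
    print(retention_attribute(7 * 86400))    # one week
```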

7.2. custom CloudWatch metric for Disk full percentage of an Elastic Block Store Volume
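
CloudWatch cannot see inside the guest file system, so disk-full percentage has to be measured on the instance and published as a custom metric. A minimal sketch of the measuring half is shown below. The metric name `DiskUsedPercent`, the `VolumeId` dimension value and the helper names are assumptions for illustration; the dictionary merely mirrors the shape of a PutMetricData entry and nothing is actually sent.

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the file system at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def build_metric_datum(path: str = "/", volume_id: str = "vol-EXAMPLE") -> dict:
    """Build one metric entry shaped like a PutMetricData item."""
    return {
        "MetricName": "DiskUsedPercent",                            # hypothetical name
        "Dimensions": [{"Name": "VolumeId", "Value": volume_id}],   # placeholder volume ID
        "Unit": "Percent",
        "Value": round(disk_used_percent(path), 2),
    }


if __name__ == "__main__":
    print(build_metric_datum("."))
```

A script like this would typically run from cron, no more often than the 1 minute minimum interval for custom metrics noted earlier.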

7.3. You have been tasked with identifying an appropriate storage solution for a 300 GB MongoDB database that requires random I/O reads of greater than 110,000 4kB IOPS. Which EC2 option will meet this requirement?

7.4. You are using ElastiCache to cache your web application. The caching seems to be running slower and slower and you want to diagnose the cause of this issue. If you are using Memcached as your caching engine, what parameter should be adjusted if you find that the overhead pool is less than 50MB?

7.4.1. Memcached_Connections_Overhead

7.5. Your EBS Volume status check is showing impaired. What does this mean?

7.5.1. The volume is stalled or not available.

7.6. You have a web application which queries ElastiCache to cache your database queries. You are using Memcached with ElastiCache and you use CloudWatch metrics to monitor your Memcached performance. You notice that two metrics, Evictions (the number of non-expired items the cache evicted to allow space for new writes) and GetMisses (the number of get requests the cache has received where the key requested was not found), are getting very high. What should you do to scale your environment further?

7.6.1. Increase the number of nodes in your memcached cluster or increase the size of each node in your cluster.

7.7. You are leading a design team to implement an urgently needed collection and analysis project. You will be collecting data for an array of 50,000 anonymous data collectors which will be summarized each day and then rarely used again. The data will be pulled from collectors approximately once an hour. The Dev responsible for the DynamoDB design is concerned about how to design the Partition and Local keys to ensure efficient use of the DynamoDB tables. What advice would you provide. (Select 2)

7.7.1. Create a new table each day, and reconfigure the old table for infrequent use after the summation is complete

7.7.2. Insert a calculated hash in front of the Date/Time value in the partition key to force DynamoDB to hop from partition to partition.

7.7.3. There are two issues here: how to handle stale data to avoid paying for high provisioned throughput on infrequently used data, and how to design a partition key that distributes I/O from sequential data evenly across partitions to avoid performance bottlenecks. Further information: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.Partitions.html https://aws.amazon.com/blogs/aws/optimizing-provisioned-throughput-in-amazon-dynamodb/
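
Both pieces of advice can be sketched without touching DynamoDB. The example below derives a deterministic shard prefix from the collector ID so that sequential timestamps spread across partitions, and builds a per-day table name. The shard count of 10, the `#` separator, the table prefix and both function names are assumptions for illustration only.

```python
import hashlib

SHARDS = 10  # hypothetical number of key prefixes to spread writes over


def partition_key(collector_id: str, timestamp: str, shards: int = SHARDS) -> str:
    """Prefix the timestamp with a hash-derived shard number.

    The prefix is computed from the collector ID, so a reader can
    recompute it later and query the right key.
    """
    digest = hashlib.sha256(collector_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{shard}#{timestamp}"


def daily_table_name(day: str, prefix: str = "collector_readings") -> str:
    """Name of the table holding one day's data, e.g. for '2016-01-01'."""
    return f"{prefix}_{day.replace('-', '_')}"


if __name__ == "__main__":
    print(daily_table_name("2016-01-01"))
    for cid in ("collector-1", "collector-2", "collector-3"):
        print(cid, "->", partition_key(cid, "2016-01-01T10:00"))
```

Once a day's data has been summarized, the provisioned throughput on that day's table can be lowered while the current table keeps its higher setting.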