1. Understand
1.1. Automation means
1.1.1. A push button
1.1.2. to deploy software
1.1.2.1. from
1.1.2.1.1. artifacts registry (CI delivery point)
1.1.2.2. to
1.1.2.2.1. 1. testing environments
1.1.2.2.2. 2. production
1.1.2.3. Using
1.1.2.3.1. Environment configurations
1.1.2.3.2. Automation code
1.1.2.3.3. Under version control too
1.2. So that
1.2.1. Get fast feedback on quality issues after a change
1.2.2. Reduce the risk of updating production, and update production frequently
2. Measure
2.1. Count of manual steps in the deployment process
2.2. % automated activities / all deployment activities
2.3. Time per deployment process activity, order by desc
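The three measures above can be computed from a simple inventory of deployment activities. The following is a minimal sketch, assuming each activity is recorded with a name, an automated flag, and a duration in minutes; the `Activity` type and the sample data are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One activity of the deployment process (hypothetical record shape)."""
    name: str
    automated: bool
    minutes: float


def manual_step_count(activities):
    """Measure 2.1: count of manual steps in the deployment process."""
    return sum(1 for a in activities if not a.automated)


def automated_percentage(activities):
    """Measure 2.2: automated activities as a percentage of all activities."""
    if not activities:
        return 0.0
    automated = sum(1 for a in activities if a.automated)
    return 100.0 * automated / len(activities)


def slowest_first(activities):
    """Measure 2.3: activities ordered by time spent, descending."""
    return sorted(activities, key=lambda a: a.minutes, reverse=True)


# Hypothetical sample inventory, for illustration only.
ACTIVITIES = [
    Activity("fetch artifact", True, 2.0),
    Activity("update config", False, 15.0),
    Activity("deploy", True, 5.0),
    Activity("smoke test", False, 30.0),
]

if __name__ == "__main__":
    print("manual steps:", manual_step_count(ACTIVITIES))
    print("% automated:", automated_percentage(ACTIVITIES))
    for activity in slowest_first(ACTIVITIES):
        print(f"{activity.minutes:>6.1f} min  {activity.name}")
```

Sorting by duration puts the best automation candidates at the top of the list.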
3. Pitfalls
3.1. Missing APIs to automate
3.1.1. Which component / service still requires a manual action? (aka no API)
3.1.2. Example
3.1.2.1. Firestore mode: native vs datastore
3.1.2.2. Past: stackdriver hosting project (solved with metric scopes)
3.2. Keeping existing complexity
3.2.1. Do NOT lift and shift a manual complex process
3.2.2. Do re-architect for deployability
3.2.2.1. Deployment script must be simple
3.2.2.2. Complexity must stay in
3.2.2.2.1. App code
3.2.2.2.2. Infrastructure
3.2.2.3. Look for
3.2.2.3.1. 1. Idempotent steps
3.2.2.3.2. 2. Services that fail gracefully when a dependency is absent
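The two properties to look for can be illustrated in a few lines. This is a sketch under stated assumptions, not a reference implementation: `ensure_setting` and `call_with_fallback` are hypothetical helper names, and a plain dictionary stands in for the state of the target environment.

```python
def ensure_setting(state: dict, key: str, value: str) -> bool:
    """Idempotent step: bring `key` to `value`, changing nothing if already there.

    Returns True when a change was made, False when the step was a no-op,
    so running the step twice leaves the same end state as running it once.
    """
    if state.get(key) == value:
        return False
    state[key] = value
    return True


def call_with_fallback(dependency, fallback):
    """Graceful failure: return `fallback` when the dependency is unreachable."""
    try:
        return dependency()
    except (ConnectionError, TimeoutError):
        return fallback


if __name__ == "__main__":
    environment = {}
    print(ensure_setting(environment, "replicas", "3"))  # first run changes state
    print(ensure_setting(environment, "replicas", "3"))  # second run is a no-op

    def recommendations_service():
        # Stand-in for a dependency that has not been deployed yet.
        raise ConnectionError("recommendations service is absent")

    print(call_with_fallback(recommendations_service, fallback=[]))
```

Because each step checks the current state before acting, a failed deployment can be relaunched from the start, which helps keep the deployment script simple.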
3.3. Tightly coupled services
3.3.1. Need to deploy multiple services
3.3.1.1. Together
3.3.1.2. In a specific order
3.3.1.3. aka orchestrated releases
3.3.2. How to decouple?
3.3.2.1. Design for backward compatibility
3.3.2.2. Use API versioning
3.3.2.2.1. E.g. Cloud Resource Manager API
3.3.2.3. Parallel change pattern
3.3.2.4. Fault tolerance in high-volume distributed systems
3.3.2.4.1. Per-dependency thread pools
3.3.2.4.2. If remote call
3.3.2.4.3. If local call
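Per-dependency thread pools can be sketched with the standard library. In this minimal illustration each dependency gets its own small pool, so a slow dependency can exhaust only its own threads and not those serving other calls. The `Bulkhead` class, its parameters, and the `slow_dependency` helper are hypothetical names introduced here, and the timeout and fallback behavior is one possible design, not something the outline prescribes.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class Bulkhead:
    """A dedicated, bounded thread pool for calls to one dependency."""

    def __init__(self, name: str, max_workers: int, timeout_s: float):
        self.name = name
        self.timeout_s = timeout_s
        self._pool = ThreadPoolExecutor(
            max_workers=max_workers, thread_name_prefix=name
        )

    def call(self, fn, fallback):
        """Run `fn` in this dependency's pool; return `fallback` on timeout or error."""
        future = self._pool.submit(fn)
        try:
            return future.result(timeout=self.timeout_s)
        except FutureTimeout:
            # Cancels only if the call has not started; a running call
            # finishes in the background inside its own pool.
            future.cancel()
            return fallback
        except Exception:
            return fallback

    def shutdown(self):
        self._pool.shutdown(wait=False)


def slow_dependency():
    """Stand-in for a remote call that answers too late."""
    time.sleep(0.3)
    return "late answer"


if __name__ == "__main__":
    inventory = Bulkhead("inventory", max_workers=2, timeout_s=0.05)
    print(inventory.call(lambda: "in stock", fallback="unknown"))
    print(inventory.call(slow_dependency, fallback="unknown"))
    inventory.shutdown()
```

A caller that always gets either an answer or a fallback within a bounded time can be deployed without waiting for its dependencies, which is what removes the need for orchestrated releases.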
3.4. Low inter-team collaboration
3.4.1. e.g.
3.4.1.1. siloed teams
3.4.1.1.1. Dev
3.4.1.1.2. IT
3.4.1.2. Different methods, tools, and configs for
3.4.1.2.1. IT - production environment
3.4.1.2.2. Dev - lower environments
4. Implement
4.1. Principles
4.1.1. "Only one"
4.1.1.1. One deployment process to rule all environments
4.1.1.1.1. The same process everywhere, production included
4.1.1.2. One set of packages for all environments
4.1.1.2.1. Production included
4.1.1.2.2. Env specific configuration is out of the packages
4.1.1.3. One common set of tools to deploy
4.1.1.3.1. Automation code is documentation
4.1.1.3.2. Accessible to dev and ops
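The "only one" principle can be pictured as a single deployment function that takes the same package for every environment and reads environment-specific settings from outside the package. The sketch below is hypothetical: `deploy_plan`, the artifact reference, and the configuration keys are illustrative names, and in practice the configurations would live in version-controlled files and not in a literal dictionary.

```python
# Hypothetical environment configurations, kept outside the package.
ENV_CONFIGS = {
    "test": {"replicas": 1, "log_level": "debug"},
    "production": {"replicas": 6, "log_level": "info"},
}


def deploy_plan(artifact: str, env: str, configs: dict) -> dict:
    """Build the deployment plan for one environment.

    The artifact is passed through unchanged: only the settings differ
    between environments, production included.
    """
    if env not in configs:
        raise KeyError(f"no configuration for environment {env!r}")
    return {
        "artifact": artifact,
        "environment": env,
        "settings": dict(configs[env]),
    }


if __name__ == "__main__":
    package = "registry.example/app:1.4.2"  # hypothetical artifact reference
    for env in ("test", "production"):
        print(deploy_plan(package, env, ENV_CONFIGS))
```

Since the same code path and the same package reach production, every deployment to a lower environment also rehearses the production deployment.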
4.1.2. No deployment ticket - no wait
4.1.2.1. Protected by required credentials
4.1.2.2. Fully automated - push button
4.1.2.3. Anyone who needs to can launch it
4.1.2.3.1. No specific skills
4.1.3. Repeatable
4.1.3.1. Possible to recreate the state of any environment
4.1.3.2. Using information under version control
4.1.3.3. If a disaster occurs, the restoration is deterministic
4.1.4. Rollback safe
4.1.4.1. e.g. Traffic split
4.1.4.1.1. k8s, GKE, Cloud Run
4.1.4.2. Tested
4.1.4.3. May be automated
4.1.4.3.1. e.g. GKE
4.1.4.3.2. e.g. Cloud Run - Release Manager
4.1.5. Progressive
4.1.5.1. Rollouts must proceed in waves - NOT all at once
4.1.5.2. bake / soak time
4.1.5.2.1. Time to wait before proceeding to the next stage
4.1.5.2.2. Allow time to monitor and test
4.1.5.3. Mitigate blast radius by applying changes
4.1.5.3.1. by environments
4.1.5.3.2. to a small portion of
4.1.5.3.3. Combination of these types
4.1.5.4. Deployment saturation goal, e.g. 1 day, 1 week
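A progressive rollout with bake time can be sketched as a loop over waves. This is an illustration under assumptions, not a description of any specific tool: `progressive_rollout`, the `apply` and `healthy` callbacks, and the wave percentages are hypothetical, and rolling back to 0% on a failed health check is one possible policy among several.

```python
import time
from typing import Callable, Sequence


def progressive_rollout(
    waves: Sequence[int],
    apply: Callable[[int], None],
    healthy: Callable[[], bool],
    bake_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Roll out in waves; return True if every wave passed its health check.

    waves        -- increasing traffic percentages, e.g. [1, 10, 50, 100]
    apply        -- sends the given percentage of traffic to the new version
    healthy      -- monitoring check evaluated after each bake period
    bake_seconds -- time to wait before proceeding to the next wave
    """
    for percent in waves:
        apply(percent)
        sleep(bake_seconds)  # bake / soak: leave time to monitor and test
        if not healthy():
            apply(0)  # roll back: all traffic returns to the previous version
            return False
    return True


if __name__ == "__main__":
    history = []
    completed = progressive_rollout(
        waves=[1, 10, 50, 100],
        apply=history.append,
        healthy=lambda: True,
        bake_seconds=0.1,
    )
    print(completed, history)
```

The deployment saturation goal then bounds the total: the number of waves multiplied by the bake time has to fit within the target, for example one day or one week.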
4.2. Typical steps
4.2.1. Consume artifacts from CI output repos
4.2.2. Binary authorization
4.2.3. Deploy
4.2.4. Hydrate / Flip
4.2.5. Test
4.3. How many CD pipelines?
4.3.1. Option 1
4.3.1.1. Two pipelines
4.3.1.1.1. One for App deployment
4.3.1.1.2. One for Infra provisioning
4.3.1.2. Key points
4.3.1.2.1. Follow different rhythms
4.3.1.2.2. Segregation of skills and duties
4.3.2. Option 2
4.3.2.1. One common pipeline
4.3.2.1.1. App and Infra deployed in the same pipeline
4.3.2.2. Key points
4.3.2.2.1. Create a fully isolated virtual deployment environment
4.3.2.2.2. Leverage the cloud pay-as-you-go benefit
4.3.2.2.3. The App Infra is ephemeral
4.3.2.2.4. Production is not ephemeral