1. Incident Management
1.1. Postmortem & RCA
1.2. SLO Management
1.2.1. Coverage for Tiers
1.2.2. Alerting
1.2.2.1. Multi-Window, Multi-Burn-Rate Alerting Strategy
1.2.2.1.1. Adaptive Paging
1.3. On Call Teams
1.3.1. 24x7 Responsibilities
1.3.2. Wheel of Fortune
1.3.3. Process and Playbooks
1.3.4. Incident Commanders
2. SRE
2.1. Partially > AppKit
2.2. Reliability Reporting
2.3. SRE Mindset - Community champions & support
2.4. SRE Guild
2.5. WORM
2.6. Resilience
2.6.1. Chaos Engineering
2.6.2. Load Testing
2.6.3. Capacity Planning
3. Observability
3.1. Toolkit
3.1.1. Telemetry
3.1.1.1. Logging Support
3.1.1.1.1. Support for job frameworks (long-running / Kubernetes Jobs)
3.1.1.1.2. Structured Logging
3.1.1.1.3. On-the-fly log level changes
3.1.1.2. Tracing Support
3.1.1.2.1. HTTP requests (client & server side)
3.1.1.2.2. Propagation allowing interaction with OpenTracing instrumentation
3.1.1.2.3. Extensible Instrumentation
3.1.1.2.4. Parallel / Async processing support
3.1.1.2.5. Support for job frameworks (short-running / Kubernetes Jobs)
3.1.1.2.6. Database calls
3.1.1.2.7. Support for AWS services
3.1.1.2.8. Easy way to create internal spans
3.1.1.2.9. Full support for OTel Tracing Data Model
3.1.1.2.10. On-the-fly tracing granularity changes (similar to log levels)
3.1.1.3. Metrics Support
3.1.1.3.1. Runtime metrics
3.1.1.3.2. Pools
3.1.1.3.3. Client-side metrics (DB and client)
3.1.1.3.4. Support for all metric instruments
3.1.1.3.5. System Metrics
3.1.2. Out-of-the-Box Dashboards
3.2. Policies
3.2.1. Semantic Conventions and Standards