Specialization of: Business Agility

Site Reliability Engineering (SRE)

SRE blends software engineering with operations to achieve reliability at scale.

WHAT YOU’LL LEARN
- Translate customer expectations into service level objectives (SLOs) and error budgets.
- Run on-call, triage incidents, and conduct blameless postmortems.
- Instrument services to emit logs, metrics, and traces, and use application performance monitoring (APM).
- Use chaos experiments and capacity planning to prevent failures.
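An error budget is the unreliability an SLO permits: a 99.9% target leaves 0.1% of the window (or of requests) as budget to spend on failures, risky releases, and experiments. The sketch below shows that arithmetic. It assumes a request-based SLO (good requests divided by total requests) over a 30-day window. The function names are hypothetical and do not belong to any particular monitoring tool.

```python
def _check_target(slo_target: float) -> None:
    # A target of 1.0 (100%) leaves no budget at all, so reject it along with nonsense values.
    if not 0 < slo_target < 1:
        raise ValueError("SLO target must be strictly between 0 and 1")


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of full unavailability the SLO tolerates over the window."""
    _check_target(slo_target)
    return (1 - slo_target) * window_days * 24 * 60


def remaining_error_budget(slo_target: float, good: int, total: int) -> float:
    """Fraction of the error budget left: 1.0 = untouched, 0.0 = exhausted, < 0 = overspent."""
    _check_target(slo_target)
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_bad = (1 - slo_target) * total  # failures the SLO permits
    actual_bad = total - good               # failures observed
    return 1 - actual_bad / allowed_bad


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(0.999):.1f} min")  # 43.2 min per 30 days
    print(f"{remaining_error_budget(0.999, good=999_500, total=1_000_000):.0%}")  # 50%
```

In practice, a team might slow feature rollouts when the remaining budget nears zero and spend surplus budget on chaos experiments or faster releases. The exact policy is an organizational choice, not something the formula dictates.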

CORE TOPICS (Clusters)
- SLOs & Error Budgets
- Incident Response & On-Call
- Observability & APM
- Chaos Engineering
- Capacity & Scaling

OUTCOMES
- Measurable reliability aligned to user experience.
- Faster MTTR, fewer severe incidents.
- Proactive resilience through testing and capacity models.
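MTTR is usually computed as the average time from the start of an incident to recovery. Definitions vary: some teams measure "time to restore" from user impact, while others start the clock at detection. The minimal sketch below assumes each incident is recorded as a (started, recovered) timestamp pair. The function name and the incident data are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (recovered - started) across incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    durations = [recovered - started for started, recovered in incidents]
    # sum() needs a timedelta start value; timedelta / int yields a timedelta.
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    # Hypothetical incidents, for illustration only.
    incidents = [
        (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30)),   # 30 min
        (datetime(2024, 5, 9, 22, 15), datetime(2024, 5, 9, 23, 45)),  # 90 min
    ]
    print(mean_time_to_recovery(incidents))  # 1:00:00
```

Because a mean is sensitive to a single long outage, tracking the median or a high percentile alongside MTTR can give a steadier picture of incident response.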

Prerequisites: Basic production experience and monitoring fundamentals.
