Infrastructure Engineering

Site Reliability Engineering

SRE blends software engineering with operations to achieve reliability at scale.
Individually selected
Flexible Schedule
Invest 20 minutes a day
This track covers SLOs and error budgets, incident response, observability and APM, chaos engineering, and capacity planning. It teaches how to define SLOs and error budgets, instrument and observe systems, run effective on-call rotations, and use chaos experiments and capacity planning to prevent outages, so that you can build resilient services while maintaining delivery speed.
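
To make the error-budget idea concrete, here is a minimal sketch in Python of the arithmetic usually involved. It is not taken from the track materials. The function names, the 30-day window, and the 99.9% target are illustrative assumptions, and real SLO tooling may define windows and budgets differently.

```python
# Illustrative sketch only: names, window length, and targets are assumptions.

def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent for a request-based SLO.

    A negative result means the budget is overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves about 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999, 30), 1))
    # 250 failures out of 1,000,000 requests spends a quarter of the budget.
    print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

Under these assumptions, a 99.9% availability target over 30 days allows about 43.2 minutes of downtime, and a service that has failed 250 of 1,000,000 requests has about 75% of its budget left. Teams commonly use a figure like this to decide whether to keep shipping changes or to pause and focus on reliability work.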

Target Audience

SREs, platform/ops engineers, backend engineers, tech leads, engineering managers, incident commanders.

Domains in this track

Chaos Engineering & Resilience Testing

Observability, Monitoring & Alerting

Capacity, Performance & Load

Resilience Architecture & Failure Modes

SLOs & Error Budgets

On-Call, Incident Management & Postmortems

Production Readiness & Toil Reduction

Reliability Economics
