Runbooks
Step-by-step procedures for common operational tasks and incident response.
Contents
- Emergency Access - Regaining access in emergencies
- Database Failover - PostgreSQL failover procedures
- Cluster Recovery - Kubernetes cluster recovery
- Secret Rotation - Rotating secrets and credentials
- Incident Response - Handling incidents
Runbook Format
Each runbook follows this structure:
- Overview - What the runbook covers
- Prerequisites - What you need before starting
- Procedure - Step-by-step instructions
- Verification - How to verify success
- Rollback - How to undo the changes if needed
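
The structure above lends itself to an automated check. The following is a minimal sketch, not part of the existing tooling: the function name `missing_sections` is hypothetical, and it assumes each section appears as a heading line whose text, after stripping any leading `#` characters, exactly matches the section name.

```python
# Section names taken from the runbook structure listed above.
REQUIRED_SECTIONS = [
    "Overview",
    "Prerequisites",
    "Procedure",
    "Verification",
    "Rollback",
]


def missing_sections(runbook_text: str) -> list[str]:
    """Return the required section names that have no matching heading line.

    Assumption: a heading is any line whose text equals a section name once
    leading '#' characters and surrounding whitespace are removed.
    """
    headings = {line.lstrip("#").strip() for line in runbook_text.splitlines()}
    return [name for name in REQUIRED_SECTIONS if name not in headings]


if __name__ == "__main__":
    draft = "# Overview\nSome text\n## Procedure\n1. Do the thing\n"
    print(missing_sections(draft))
```

A check like this could run in review or CI so that a runbook without a Rollback or Verification section is caught before it is needed during an incident.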
Severity Levels
| Level | Description | Response Time |
|---|---|---|
| P1 | Critical - Complete outage | Immediate |
| P2 | Major - Partial outage | < 30 min |
| P3 | Minor - Degraded service | < 4 hours |
| P4 | Low - Cosmetic issues | Next business day |
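
For alerting or ticket tooling, the table can be encoded as data. This is a sketch under stated assumptions: the names `RESPONSE_TARGETS` and `response_deadline` are hypothetical, "Immediate" is modeled as a zero-length window, and "Next business day" is left as `None` because it depends on a business calendar that this page does not define.

```python
from datetime import datetime, timedelta
from typing import Optional

# Response windows from the severity table.
# Assumption: "Immediate" is treated as a zero-length window.
# P4 ("Next business day") has no fixed duration, so it maps to None.
RESPONSE_TARGETS: dict[str, Optional[timedelta]] = {
    "P1": timedelta(0),
    "P2": timedelta(minutes=30),
    "P3": timedelta(hours=4),
    "P4": None,
}


def response_deadline(level: str, opened_at: datetime) -> Optional[datetime]:
    """Return the time by which a response is due, or None for P4.

    Raises KeyError for a level that is not in the table.
    """
    target = RESPONSE_TARGETS[level]
    if target is None:
        return None  # needs a business calendar to resolve
    return opened_at + target


if __name__ == "__main__":
    opened = datetime(2024, 1, 1, 12, 0)
    for level in RESPONSE_TARGETS:
        print(level, response_deadline(level, opened))
```

Keeping P4 as `None` rather than guessing a duration makes the missing calendar rule explicit to whoever integrates this.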