Incident management

High-severity (SEV) incident management



  • Detection: an alert fires for the SEV
  • Diagnosis: find the source of the problem
  • Mitigation: introduce a fix
  • Prevention: understand the root cause
  • Closure: run a gameday to replicate the SEV and confirm the fix is reliable
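
The five phases above run in a fixed order, so they can be modelled as an ordered enumeration. The following is a minimal sketch, not a prescribed implementation; the names `SevPhase` and `next_phase` are hypothetical and chosen only for illustration.

```python
from enum import IntEnum
from typing import Optional


class SevPhase(IntEnum):
    """Lifecycle phases of a SEV, in the order listed above."""
    DETECTION = 1   # an alert fires for the SEV
    DIAGNOSIS = 2   # find the source of the problem
    MITIGATION = 3  # introduce a fix
    PREVENTION = 4  # understand the root cause
    CLOSURE = 5     # gameday replicates the SEV and confirms the fix


def next_phase(phase: SevPhase) -> Optional[SevPhase]:
    """Return the phase that follows `phase`, or None once the SEV is closed."""
    if phase is SevPhase.CLOSURE:
        return None
    return SevPhase(phase + 1)
```

Using an `IntEnum` keeps the ordering explicit, so a check such as `phase >= SevPhase.MITIGATION` reads naturally.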

TTD: Time to Detection (Detection to Diagnosis)

TTR: Time to Recovery (Detection to Mitigation)

TTP: Time to Prevention

TTI: Total Time of Impact (Detection to Mitigation)

TBF: Time Between Failures (Detection to next failure Detection)
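
As a rough illustration, the metrics above can be computed from the timestamps recorded for each SEV. This is a sketch under stated assumptions: `SevTimeline`, its field names, `sev_metrics`, and `time_between_failures` are all hypothetical, and because the notes give no interval for TTP, the code assumes it runs from detection until the prevention work is complete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SevTimeline:
    """Timestamps recorded for one SEV (field names are illustrative)."""
    detected_at: datetime           # the alert fired
    diagnosis_started_at: datetime  # diagnosis began
    mitigated_at: datetime          # the fix was introduced
    prevention_done_at: datetime    # root cause understood and addressed


def sev_metrics(sev: SevTimeline) -> dict[str, timedelta]:
    """Compute the per-SEV metrics using the intervals given in the notes."""
    return {
        # Detection to Diagnosis
        "TTD": sev.diagnosis_started_at - sev.detected_at,
        # Detection to Mitigation
        "TTR": sev.mitigated_at - sev.detected_at,
        # ASSUMPTION: Detection to prevention complete (not stated in the notes)
        "TTP": sev.prevention_done_at - sev.detected_at,
        # Detection to Mitigation (same interval as TTR in these notes)
        "TTI": sev.mitigated_at - sev.detected_at,
    }


def time_between_failures(current: SevTimeline, following: SevTimeline) -> timedelta:
    """TBF: detection of one SEV to detection of the next."""
    return following.detected_at - current.detected_at
```

Returning `timedelta` values rather than raw seconds leaves the choice of unit (minutes, hours, days) to whatever reports the numbers.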