error-budgets.md 1229 bytes | 2023-03-08 16:15

Error Budgets

The error budget provides a clear, objective metric that determines how unreliable the service is allowed to be within a single quarter. This metric removes the politics from negotiations between SREs and the product developers when deciding how much risk to allow. source

  • Objective: the desired level of success, expressed as a percentage
  • SLI: the service level indicator, a measurement used to distinguish failed events from successful ones
  • Timeframe: the window over which the SLI is evaluated, which enforces a recency bias

For example:

Objective: 99.95%
SLI: 95th percentile latency of api requests over 5 mins is < 100 ms
Timeframe: previous 28 days

Read together: over the previous 28 days, the 95th percentile latency of API
requests, measured over 5 minutes, is less than 100 ms 99.95% of the time.
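One way to read the example above is as a check over 5-minute windows. The sketch below is only an illustration under that assumption: the function names (`p95`, `compliance`, `meets_objective`), the nearest-rank percentile method, and the idea of counting "good" windows are all hypothetical choices, not something the original text specifies.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    ordered = sorted(latencies_ms)
    # Ceiling of 0.95 * n using integer arithmetic (1-based rank).
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def compliance(windows, threshold_ms=100):
    """Fraction of windows whose p95 latency is under the threshold.

    `windows` is a list of lists: the request latencies observed in each
    5-minute window of the timeframe (an assumed data shape).
    """
    good = sum(1 for w in windows if p95(w) < threshold_ms)
    return good / len(windows)


def meets_objective(windows, objective=0.9995, threshold_ms=100):
    """True when the share of good windows reaches the objective."""
    return compliance(windows, threshold_ms) >= objective
```

With 28 days of 5-minute windows there are 8064 windows, so under this reading only a handful of bad windows fit inside a 99.95% objective.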

The error budget is calculated using a formula:

Error Budget = 1 - Objective.

e.g. for a 99.95% objective over 28 days, multiply by the minutes in the timeframe:

20.16 minutes = ((1 - 0.9995) * (28 * 24 * 60))
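
The same arithmetic as a small Python sketch. The function name `error_budget_minutes` and its parameters are hypothetical, chosen here for illustration; it simply multiplies `1 - Objective` by the number of minutes in the timeframe.

```python
def error_budget_minutes(objective, timeframe_days):
    """Minutes of allowed unreliability for an objective over a timeframe.

    `objective` is a fraction (0.9995 for 99.95%), `timeframe_days` is the
    length of the timeframe in days.
    """
    total_minutes = timeframe_days * 24 * 60
    return (1 - objective) * total_minutes


# 99.95% over the previous 28 days
print(round(error_budget_minutes(0.9995, 28), 2))  # 20.16
```

Tightening the objective shrinks the budget quickly: the same function gives roughly 4 minutes for 99.99% over 28 days.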