~/src/www.mokhan.ca/xlgmokha [main]
cat slert.md
slert.md 1751 bytes | 2022-03-14 11:20
symlink: /dev/eng/slert.md

SLERT

SLERT is a monitoring framework built around four key performance indicators for understanding system health and performance: saturation, latency, errors, and throughput.

Saturation

The amount of requested work that a resource cannot yet service, which usually waits in a queue. Saturation indicates how close the system is to its maximum capacity.

Examples:

  • CPU utilization approaching 100%
  • Memory usage near limits
  • Disk I/O queue length
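
As a rough illustration of saturation as queued work, the sketch below tracks the backlog that builds up when arrivals exceed what a resource can serve per tick. The function name `backlog_over_time`, the tick model, and the sample numbers are assumptions made for this example, not part of any particular monitoring tool.

```python
def backlog_over_time(arrivals, capacity_per_tick):
    """Return the queued (unserviced) work remaining after each tick.

    arrivals: units of work requested in each tick (hypothetical data).
    capacity_per_tick: units of work the resource can complete per tick.
    """
    if capacity_per_tick < 0:
        raise ValueError("capacity_per_tick must be non-negative")

    backlog = 0
    history = []
    for arrived in arrivals:
        # Work waiting this tick is the carried-over backlog plus new arrivals.
        pending = backlog + arrived
        served = min(pending, capacity_per_tick)
        # Whatever could not be served stays queued: this is the saturation signal.
        backlog = pending - served
        history.append(backlog)
    return history


history = backlog_over_time([3, 5, 8, 2, 0], capacity_per_tick=4)
print(history)
```

A backlog that stays at zero means the resource keeps up with demand; a backlog that keeps growing means it is saturated.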

Latency

The time it takes to complete a unit of work. It can be expressed as an average or as a percentile (p50, p95, p99).

Examples:

  • HTTP request response time
  • Database query duration
  • API call latency
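
To show why percentiles are often reported alongside the average, here is a minimal sketch that summarizes a list of request durations. It uses the nearest-rank percentile method, which is only one of several common definitions; the names `percentile` and `latency_summary` and the sample durations are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, for p in (0, 100]."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value covering at least p% of the samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Average plus p50, p95, and p99 for a list of durations in milliseconds."""
    return {
        "avg": sum(durations_ms) / len(durations_ms),
        "p50": percentile(durations_ms, 50),
        "p95": percentile(durations_ms, 95),
        "p99": percentile(durations_ms, 99),
    }


# Nine fast requests and one slow outlier (made-up numbers).
summary = latency_summary([10, 12, 11, 13, 12, 11, 10, 14, 12, 900])
print(summary)
```

In this sample the single slow request pulls the average up to 100.5 ms while the median stays at 12 ms, and the outlier only becomes visible as a distinct value in p95 and p99.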

Error

Internal errors in the work that the resource produces. Track both the rate of errors and the types of errors that occur.

Examples:

  • HTTP 5xx error rates
  • Failed database connections
  • Application exceptions
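
Both the rate and the types of errors can be derived from the same stream of responses. The sketch below assumes the input is a list of HTTP status codes collected over some window and treats only 5xx codes as errors; `error_rate` and `errors_by_type` are hypothetical helper names.

```python
from collections import Counter


def is_server_error(status_code):
    """True for HTTP 5xx status codes."""
    return 500 <= status_code <= 599


def error_rate(status_codes):
    """Fraction of responses that are 5xx; 0.0 when the window is empty."""
    if not status_codes:
        return 0.0
    failures = sum(1 for code in status_codes if is_server_error(code))
    return failures / len(status_codes)


def errors_by_type(status_codes):
    """Count of each distinct 5xx status code seen in the window."""
    return Counter(code for code in status_codes if is_server_error(code))


# Made-up sample window of eight responses.
codes = [200, 200, 500, 404, 503, 200, 500, 200]
print(error_rate(codes))
print(errors_by_type(codes))
```

Note that the 404 in the sample is not counted, because a client error is not an internal error of the resource under this assumption.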

Throughput

The amount of work the system is doing per unit of time. This measures the system’s current workload.

Examples:

  • Requests per second
  • Transactions processed per minute
  • Messages consumed per hour
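
Throughput is a count divided by the length of a time window. A minimal sketch, assuming each completed unit of work is recorded as a timestamp in seconds; the function name `throughput` and the sample timestamps are illustrative only.

```python
def throughput(timestamps, window_start, window_seconds):
    """Events per second within [window_start, window_start + window_seconds)."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    window_end = window_start + window_seconds
    # Count only events that fall inside the half-open window.
    count = sum(1 for t in timestamps if window_start <= t < window_end)
    return count / window_seconds


# Made-up completion times, in seconds since the start of the measurement.
events = [0.1, 0.5, 1.2, 1.9, 2.0, 3.5, 9.9, 10.0]
print(throughput(events, window_start=0, window_seconds=10))
print(throughput(events, window_start=0, window_seconds=2))
```

The same events give different figures for different windows, so a throughput number is only meaningful together with the window it was measured over.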

Implementation

Monitors should be created based on:

  • Metrics - Quantitative measurements over time
  • Integrations - System and service integrations
  • Tracing - Request flow through distributed systems
  • Logs - Detailed event records for debugging
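
As one possible shape for a metric-based monitor, the sketch below compares the latest reading of each signal against a threshold. The `Monitor` class, the `evaluate` function, and every threshold value here are hypothetical; real monitoring systems define monitors in their own configuration formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monitor:
    name: str
    signal: str                     # "saturation", "latency", "error", or "throughput"
    threshold: float
    alert_when_above: bool = True   # set to False to alert on a drop, e.g. throughput

    def breached(self, value):
        """True when the reading is on the alerting side of the threshold."""
        if self.alert_when_above:
            return value > self.threshold
        return value < self.threshold


def evaluate(monitors, readings):
    """Names of monitors whose signal has a reading that breaches its threshold."""
    return [
        m.name
        for m in monitors
        if m.signal in readings and m.breached(readings[m.signal])
    ]


# Hypothetical monitors and readings, for illustration only.
monitors = [
    Monitor("queue backlog too deep", "saturation", threshold=100),
    Monitor("p95 latency too high", "latency", threshold=500),
    Monitor("5xx rate too high", "error", threshold=0.05),
    Monitor("traffic dropped", "throughput", threshold=10, alert_when_above=False),
]
readings = {"saturation": 12, "latency": 900, "error": 0.01, "throughput": 3}
print(evaluate(monitors, readings))
```

Throughput is the one signal here where a low value is the alarming direction, which is why the direction is a per-monitor setting rather than a global rule.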