Monitors
Configure HTTP health checks to detect outages and latency issues automatically.
A monitor is an HTTP health check that BlameTrail runs on a recurring schedule. Each monitor belongs to a service and watches a single endpoint, recording the response status code, response time, and whether the check passed or failed.
Creating a monitor
- Navigate to Monitors and click Add Monitor.
- Configure the following fields:
| Field | Required | Description |
|---|---|---|
| Service | Yes | The service this monitor belongs to. |
| URL | Yes | The endpoint to check (e.g., https://api.example.com/health). |
| HTTP Method | Yes | The request method — GET, POST, HEAD, PUT, PATCH, or DELETE. |
| Expected Status | Yes | The HTTP status code that counts as a passing check (e.g., 200). |
| Interval | Yes | How often to run the check, in seconds. |
| Latency Threshold | Yes | Maximum acceptable response time in milliseconds. Responses slower than this are recorded as latency failures. |
| Headers | No | Custom HTTP headers to include with each request (e.g., authentication tokens). |
- Click Save. The monitor starts checking immediately.
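The fields in the table could be modeled as a small record with basic validation. This is a minimal sketch, not BlameTrail's actual schema: the `MonitorConfig` class, its field names, and the validation rules are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Request methods listed in the field table.
ALLOWED_METHODS = {"GET", "POST", "HEAD", "PUT", "PATCH", "DELETE"}


@dataclass
class MonitorConfig:
    """Hypothetical in-memory representation of a monitor's settings."""

    service: str
    url: str
    http_method: str
    expected_status: int
    interval_seconds: int
    latency_threshold_ms: int
    headers: dict = field(default_factory=dict)  # optional custom headers

    def validate(self) -> list:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if not self.service:
            problems.append("service is required")
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must be an absolute http(s) URL")
        if self.http_method.upper() not in ALLOWED_METHODS:
            problems.append(f"unsupported HTTP method: {self.http_method}")
        if not 100 <= self.expected_status <= 599:
            problems.append("expected_status must be a valid HTTP status code")
        if self.interval_seconds <= 0:
            problems.append("interval_seconds must be positive")
        if self.latency_threshold_ms <= 0:
            problems.append("latency_threshold_ms must be positive")
        return problems


# Example: check the health endpoint every 60 seconds, allowing up to 500 ms.
config = MonitorConfig(
    service="api",
    url="https://api.example.com/health",
    http_method="GET",
    expected_status=200,
    interval_seconds=60,
    latency_threshold_ms=500,
)
print(config.validate())
```

Returning a list of problems rather than raising on the first one lets a form show every invalid field at once.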
How checks work
A background worker sends an HTTP request to the configured URL at the specified interval. For each check, BlameTrail records:
- Status code — The HTTP response code returned by the endpoint.
- Latency — The response time in milliseconds.
- Result — Pass or fail, based on whether the status code matches the expected value and the latency is within the threshold.
A check fails if:
- The endpoint returns a status code that does not match the expected value (availability failure).
- The response time exceeds the latency threshold (latency failure).
- The endpoint is unreachable or the connection times out (availability failure).
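The failure rules above can be sketched as a single classification function. This is an illustrative sketch, not BlameTrail's implementation: the names `CheckResult` and `classify_check` are hypothetical, and the choice to report an availability failure when a response has both the wrong status code and excessive latency is an assumption, since the precedence is not specified here.

```python
from enum import Enum
from typing import Optional


class CheckResult(Enum):
    PASS = "pass"
    AVAILABILITY_FAILURE = "availability_failure"
    LATENCY_FAILURE = "latency_failure"


def classify_check(
    status_code: Optional[int],
    latency_ms: float,
    expected_status: int,
    latency_threshold_ms: float,
) -> CheckResult:
    """Classify one check.

    status_code is None when the endpoint was unreachable or the
    connection timed out; latency_ms is ignored in that case.
    """
    if status_code is None:
        return CheckResult.AVAILABILITY_FAILURE
    if status_code != expected_status:
        # Assumption: a wrong status code takes precedence over slow latency.
        return CheckResult.AVAILABILITY_FAILURE
    if latency_ms > latency_threshold_ms:
        return CheckResult.LATENCY_FAILURE
    return CheckResult.PASS


print(classify_check(200, 120.0, expected_status=200, latency_threshold_ms=500))
print(classify_check(200, 800.0, expected_status=200, latency_threshold_ms=500))
print(classify_check(None, 0.0, expected_status=200, latency_threshold_ms=500))
```

A response that takes exactly the threshold value passes in this sketch, because the rule is that the response time must exceed the threshold to fail.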
After 3 consecutive failures of the same type, BlameTrail automatically creates an incident. See Incidents for details on incident creation and resolution.
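One way to track "3 consecutive failures of the same type" is a small streak counter. The sketch below is hypothetical: the `FailureStreak` class is not part of BlameTrail, and it assumes that a passing check resets the streak, that a failure of a different type restarts the count at 1, and that the incident trigger fires only once, when the count first reaches 3.

```python
from typing import Optional

FAILURE_THRESHOLD = 3  # consecutive same-type failures before an incident


class FailureStreak:
    """Tracks the current run of consecutive failures of one type."""

    def __init__(self) -> None:
        self.failure_type: Optional[str] = None
        self.count = 0

    def record(self, failure_type: Optional[str]) -> bool:
        """Record one check result; failure_type is None for a pass.

        Returns True only on the check that brings the streak up to
        the threshold, which is when an incident would be created.
        """
        if failure_type is None:
            # Assumption: any passing check clears the streak.
            self.failure_type, self.count = None, 0
            return False
        if failure_type == self.failure_type:
            self.count += 1
        else:
            # Assumption: a different failure type starts a new streak.
            self.failure_type, self.count = failure_type, 1
        return self.count == FAILURE_THRESHOLD


streak = FailureStreak()
results = ["latency", "latency", "availability", "availability", "availability"]
print([streak.record(r) for r in results])
# prints [False, False, False, False, True]
```

In this run the two latency failures do not count toward the availability streak, so the trigger fires only on the third consecutive availability failure.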
Pausing a monitor
You can pause a monitor by setting it to inactive. This stops all scheduled checks without deleting the monitor or its history. To resume, set the monitor back to active.
Pausing is useful during planned maintenance windows or when an endpoint is intentionally offline.
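To illustrate how an inactive monitor drops out of scheduling while its record stays intact, here is a minimal sketch of a scheduler's selection step. The `Monitor` class, the `active` and `last_checked_at` fields, and the `due_monitors` function are assumptions for illustration and do not describe BlameTrail's internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Monitor:
    name: str
    interval_seconds: int
    active: bool = True
    last_checked_at: Optional[float] = None  # Unix timestamp of the last check


def due_monitors(monitors: List[Monitor], now: float) -> List[Monitor]:
    """Return active monitors whose interval has elapsed since the last check."""
    return [
        m
        for m in monitors
        if m.active
        and (m.last_checked_at is None or now - m.last_checked_at >= m.interval_seconds)
    ]


monitors = [
    Monitor("api-health", interval_seconds=60),
    Monitor("billing-health", interval_seconds=60, active=False),  # paused
    Monitor("search-health", interval_seconds=60, last_checked_at=100.0),
]
print([m.name for m in due_monitors(monitors, now=130.0)])
# prints ['api-health']
```

The paused monitor is skipped on every pass but is never removed from the list, so setting `active` back to `True` is enough to resume checks.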
Monitor detail page
The monitor detail page shows:
- Current status — Whether the monitor is passing or failing.
- Check history — A timeline of recent checks with status codes, latency values, and pass/fail results.
- Active incidents — Any open incidents linked to this monitor.
Use the check history to identify patterns such as intermittent failures, gradual latency increases, or specific time windows where problems occur.
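If the check history is available as a list of records, two simple summaries can surface these patterns. This is a rough sketch under stated assumptions: the `Check` record shape and both helper functions are hypothetical, and how history is exported from BlameTrail is not covered here.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean
from typing import List, NamedTuple


class Check(NamedTuple):
    timestamp: datetime
    latency_ms: float
    passed: bool


def failures_by_hour(checks: List[Check]) -> Counter:
    """Count failed checks per hour of day to expose recurring time windows."""
    return Counter(c.timestamp.hour for c in checks if not c.passed)


def latency_drift_ms(checks: List[Check]) -> float:
    """Mean latency of the newer half of the history minus the older half.

    A clearly positive value hints at a gradual latency increase.
    """
    ordered = sorted(checks, key=lambda c: c.timestamp)
    half = len(ordered) // 2
    if half == 0:
        return 0.0  # not enough data to compare two halves
    older, newer = ordered[:half], ordered[half:]
    return mean(c.latency_ms for c in newer) - mean(c.latency_ms for c in older)


# Example: four checks one minute apart, with the last two slower and failing.
start = datetime(2024, 1, 1, 9, 0)
samples = [(100, True), (100, True), (200, False), (200, False)]
history = [
    Check(start + timedelta(minutes=i), latency, passed)
    for i, (latency, passed) in enumerate(samples)
]
print(latency_drift_ms(history))
print(failures_by_hour(history))
```

Comparing two halves is deliberately crude. A rolling average or percentile would be more robust to single outliers.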