Core Concepts

Understand the key building blocks of BlameTrail — services, monitors, deploys, incidents, suspect scoring, commit analysis, fix proposals, notifications, on-call paging, postmortems, and status pages.

BlameTrail is built around a handful of core concepts. Understanding how they connect will help you get the most out of the platform.

Services

A service is anything your team deploys and operates — a REST API, a marketing website, a background worker, a database. Every other object in BlameTrail (monitors, deploys, incidents) is attached to a service.

When you create a service, you choose an environment (production, staging, or development) and a type (web, api, worker, database, or internal). Optionally, you can link a GitHub repository to enable automatic commit enrichment.

See Services for full details.

Monitors

A monitor is an HTTP health check that BlameTrail runs on a recurring schedule. Each monitor targets a specific URL and records the response status code, latency, and whether the check passed or failed.

You configure:

URL — The endpoint to check.
HTTP method — GET, POST, HEAD, etc.
Expected status code — The response code that counts as healthy (e.g., 200).
Interval — How many seconds between checks.
Latency threshold — The maximum acceptable response time in milliseconds.
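
As a rough illustration of how these five settings fit together, the sketch below models a monitor and evaluates a single check result against it. The class names, field names, and default values are hypothetical and are not part of the BlameTrail API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorConfig:
    """Hypothetical container for the five monitor settings (defaults are made up)."""
    url: str
    method: str = "GET"
    expected_status: int = 200
    interval_seconds: int = 60
    latency_threshold_ms: int = 500


@dataclass(frozen=True)
class CheckResult:
    """What a single check records: status code and latency."""
    status_code: int
    latency_ms: float


def evaluate(config: MonitorConfig, result: CheckResult) -> dict:
    """Compare one check result with the monitor's expectations."""
    return {
        "status_ok": result.status_code == config.expected_status,
        "latency_ok": result.latency_ms <= config.latency_threshold_ms,
    }


monitor = MonitorConfig(url="https://example.com/health", latency_threshold_ms=300)
print(evaluate(monitor, CheckResult(status_code=200, latency_ms=450.0)))
# {'status_ok': True, 'latency_ok': False}
```

Here the endpoint returned the expected status code but took longer than the latency threshold, so the check counts toward a latency problem rather than an availability problem.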

See Monitors for full details.

Deploys

A deploy is a record of code that was pushed to an environment. Deploys are sent to BlameTrail via a webhook from your CI/CD pipeline. Each deploy tracks the commit SHA, branch, commit message, who deployed, and the target environment.

When a service is linked to a GitHub repository, BlameTrail automatically enriches deploys with additional metadata — commit messages, pull request titles, and changed files. This context is surfaced during suspect scoring and commit analysis.
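
The fields a deploy tracks can be pictured as a small JSON document that a CI/CD job assembles before calling the webhook. The sketch below only builds and validates such a payload; the key names and the sample values are assumptions for illustration, and the real webhook schema is described in Deploy Tracking.

```python
import json

# Environments a service can be created in, as listed under Services.
ENVIRONMENTS = {"production", "staging", "development"}


def build_deploy_payload(sha, branch, message, deployed_by, environment):
    """Assemble a deploy record. Key names are hypothetical, not the official schema."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return {
        "commit_sha": sha,
        "branch": branch,
        "commit_message": message,
        "deployed_by": deployed_by,
        "environment": environment,
    }


payload = build_deploy_payload(
    sha="3f2a9c1",  # made-up SHA for the example
    branch="main",
    message="Tighten request timeout",
    deployed_by="ci-bot",
    environment="production",
)
print(json.dumps(payload, indent=2))
```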

See Deploy Tracking for full details.

Incidents

An incident represents an active problem with a service. BlameTrail creates incidents automatically based on monitor check results:

Availability incident — Created after 3 consecutive failed checks.
Latency incident — Created after 3 consecutive responses that exceed the latency threshold.

Incidents auto-resolve when the monitor records 3 consecutive passing checks. You can also create incidents manually and update their status at any time.
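
The open and auto-resolve rules amount to counting consecutive results. The following is a minimal sketch of that logic, assuming one tracker per monitor and per incident type; the class and its method names are hypothetical and do not reflect BlameTrail's internals.

```python
from typing import Optional

THRESHOLD = 3  # consecutive checks needed to open or resolve an incident


class IncidentTracker:
    """Tracks streaks of passing and failing checks for one monitor."""

    def __init__(self) -> None:
        self.open = False
        self.fail_streak = 0
        self.pass_streak = 0

    def record(self, passed: bool) -> Optional[str]:
        """Record one check; return 'opened', 'resolved', or None."""
        if passed:
            self.pass_streak += 1
            self.fail_streak = 0
        else:
            self.fail_streak += 1
            self.pass_streak = 0

        if not self.open and self.fail_streak >= THRESHOLD:
            self.open = True
            return "opened"
        if self.open and self.pass_streak >= THRESHOLD:
            self.open = False
            return "resolved"
        return None


tracker = IncidentTracker()
checks = [True, False, False, False, True, True, True]
print([tracker.record(c) for c in checks])
# [None, None, None, 'opened', None, None, 'resolved']
```

For a latency incident, "passed" would mean the response stayed within the latency threshold rather than that the status code matched.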

See Incidents for full details.

Suspect Deploys

When an incident opens, BlameTrail looks at all deploys to the affected service in the preceding 60 minutes and ranks them by how close in time they are to the incident. The most recent deploy before the failure started is the top suspect.

For each suspect deploy, BlameTrail shows:

Commit SHA and commit message
Pull request title (if linked to a repository)
Changed files

This gives your team an immediate starting point for investigation instead of manually searching through deploy logs.
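
The 60-minute lookback and proximity ranking can be sketched in a few lines. This is an illustration under stated assumptions: deploys are plain dictionaries with a hypothetical `deployed_at` key, and deploys made after the incident started are excluded.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(minutes=60)


def rank_suspects(deploys, incident_started_at):
    """Return deploys from the lookback window, closest to the incident first."""
    window_start = incident_started_at - LOOKBACK
    candidates = [
        d for d in deploys
        if window_start <= d["deployed_at"] <= incident_started_at
    ]
    # A smaller gap between deploy and incident means a stronger suspect.
    return sorted(candidates, key=lambda d: incident_started_at - d["deployed_at"])


incident = datetime(2024, 1, 1, 12, 0)
deploys = [
    {"sha": "c3", "deployed_at": datetime(2024, 1, 1, 10, 30)},  # outside the window
    {"sha": "b2", "deployed_at": datetime(2024, 1, 1, 11, 10)},
    {"sha": "a1", "deployed_at": datetime(2024, 1, 1, 11, 50)},
]
print([d["sha"] for d in rank_suspects(deploys, incident)])
# ['a1', 'b2']
```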

Commit Analysis

Commit analysis is an AI-powered inspection of code changes. BlameTrail fetches the diff from GitHub and runs it through a structured analysis pipeline:

File classification — Categorizes each changed file by type and purpose.
Risk scoring — Assigns a risk level to each file based on the nature of the changes.
Diagnosis — Generates a human-readable explanation of what the changes do and what could go wrong.

Analysis can run against a single commit or a range of commits, which is useful for identifying which change in a series of deploys introduced a problem.

Fix Proposals

When an incident has a suspect commit, BlameTrail can generate a fix — either an automated revert or an AI-powered code change — and open a pull request on your repository.

Revert — Creates a PR that reverts the suspect commit. Deterministic and low risk.
AI fix — Analyzes error context (stack traces, alert data) alongside the suspect commit's diff and generates a targeted code fix. Includes a confidence score and risk assessment. Requires manual approval before a PR is created.

Fix proposals are available on Starter and Pro plans. A GitHub token must be configured for the organization.

See Incidents for full details on using fix proposals.

Notifications

BlameTrail sends Slack notifications when incidents change state. Each notification includes:

Service name
Monitor that triggered the incident
Incident type (availability or latency)
Duration (on resolve)
Most likely suspect deploy

Notifications fire on both incident open and incident resolve, so your team knows when a problem starts and when it clears.

See Slack Integration for setup instructions.

On-Call Paging

When Slack isn't enough (for example, during a production outage at 3 a.m.), BlameTrail pages the on-call rotation directly over three channels:

Browser push (Web Push / VAPID) — works in Chrome, Firefox, Edge, and Safari 16.4+.
SMS (Twilio) — any country; US/Canada/UK are bundled, other countries are billed as metered overage with a 30% surcharge.
Voice call with DTMF acknowledge (Twilio TwiML) — the call announces the incident and prompts the responder to press 1 to acknowledge or 2 to escalate.

Paging is driven by escalation policies attached to services. Each step in a policy has independent SMS / voice / push toggles and a delay, so you can start with a soft push, escalate to SMS, and finally wake the on-call with a voice call.
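
To make the soft-push, then SMS, then voice pattern concrete, the sketch below models a three-step policy and computes when each step would fire. The data structure, the delay values, and the assumption that each delay is measured from the previous step are all illustrative; see On-call paging for how policies are actually configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    """One policy step: a delay plus independent channel toggles (hypothetical model)."""
    delay_minutes: int
    push: bool = False
    sms: bool = False
    voice: bool = False

    def channels(self):
        return [name for name in ("push", "sms", "voice") if getattr(self, name)]


def schedule(policy):
    """Return (minutes since the page started, channels) for each step.

    Assumes each step's delay is counted from the previous step.
    """
    elapsed = 0
    plan = []
    for step in policy:
        elapsed += step.delay_minutes
        plan.append((elapsed, step.channels()))
    return plan


policy = [
    EscalationStep(delay_minutes=0, push=True),    # soft push first
    EscalationStep(delay_minutes=5, sms=True),     # then SMS
    EscalationStep(delay_minutes=10, voice=True),  # finally a voice call
]
print(schedule(policy))
# [(0, ['push']), (5, ['sms']), (15, ['voice'])]
```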

Per-tenant guardrails keep runaway paging loops from inflating your bill: a hard cap at 5× the monthly included quota, a per-hour burst limit, and an automatic per-monitor storm mute when a single monitor pages more than 10 times in 15 minutes. Pages beyond the monthly quota are billed as metered overage rather than blocked, up to the hard cap.
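
Two of these guardrails are easy to picture in code. The sketch below shows a sliding-window storm mute and a quota classifier; it assumes that suppressed pages do not count toward the window and that pages past the hard cap are blocked. Names such as `StormGuard` and `billing_status` are hypothetical.

```python
from collections import deque

STORM_LIMIT = 10               # pages allowed per monitor within the window
STORM_WINDOW_SECONDS = 15 * 60
HARD_CAP_MULTIPLIER = 5


class StormGuard:
    """Mutes a monitor once it exceeds STORM_LIMIT pages in the window."""

    def __init__(self) -> None:
        self._pages = {}  # monitor id -> timestamps of pages that were sent

    def allow(self, monitor_id: str, now: float) -> bool:
        window = self._pages.setdefault(monitor_id, deque())
        # Forget pages that have aged out of the 15-minute window.
        while window and now - window[0] >= STORM_WINDOW_SECONDS:
            window.popleft()
        if len(window) >= STORM_LIMIT:
            return False  # this would be page 11 or later: muted
        window.append(now)
        return True


def billing_status(page_number: int, included_quota: int) -> str:
    """Classify the nth page of the month against the quota and hard cap."""
    if page_number <= included_quota:
        return "included"
    if page_number <= HARD_CAP_MULTIPLIER * included_quota:
        return "overage"
    return "blocked"


guard = StormGuard()
results = [guard.allow("checkout-api", now=float(t)) for t in range(11)]
print(results.count(True), results[-1])   # 10 False
print(billing_status(201, included_quota=200))   # overage
print(billing_status(1001, included_quota=200))  # blocked
```

With the Starter quota of 200 pages per month, this reading puts the hard cap at 1,000 pages.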

Paging is available on Starter (200 pages/month, 2 verified numbers per user) and Pro (2,000 pages/month, 5 numbers per user).

See On-call paging for full setup and escalation design.

Postmortems

A postmortem is a structured retrospective of an incident: summary, customer impact, timeline, root cause, contributing factors, and action items. BlameTrail auto-drafts a postmortem as soon as an incident resolves, using the same context as the incident summary (suspect commits, observability metrics and logs, and timeline updates posted by the team).

The postmortem moves through a lifecycle:

pending --> generating --> draft --> edited --> published

Anyone on the team can edit a draft before publishing; owners and admins can publish. Regenerating a postmortem within the same calendar month does not consume additional quota.
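
The lifecycle and the publish permission can be read as a small state machine. The sketch below is one possible model, assuming an unedited draft can be published directly and that repeated edits keep the postmortem in the edited state; the role name `member` and the function name are hypothetical.

```python
# Allowed next states for each lifecycle state (assumed, based on the arrows above).
NEXT_STATES = {
    "pending": {"generating"},
    "generating": {"draft"},
    "draft": {"edited", "published"},
    "edited": {"edited", "published"},
    "published": set(),
}

# Only owners and admins can publish; any team member can edit.
PUBLISHER_ROLES = {"owner", "admin"}


def can_transition(current: str, target: str, role: str) -> bool:
    """Check whether a user with the given role may move the postmortem."""
    if target not in NEXT_STATES.get(current, set()):
        return False
    if target == "published":
        return role in PUBLISHER_ROLES
    return True


print(can_transition("draft", "edited", role="member"))      # True
print(can_transition("edited", "published", role="member"))  # False
print(can_transition("edited", "published", role="admin"))   # True
```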

Postmortems are available on Starter (5/month) and Pro (50/month).

See Postmortems for the full lifecycle and tips for better drafts.

Status Pages

A status page is a customer-facing view of your services and incidents, hosted at status.blametrail.com/<your-slug>. Each page has a visibility setting:

Public — anyone with the link can view. CDN-cacheable, revalidates every 30 seconds.
Private (members only) — requires a signed-in BlameTrail session for a member of your organization. Unauthenticated viewers are redirected to log in.

Incidents start private by default. To surface one on the public page, click Publish to status page on the incident, optionally give it a customer-facing public title (to hide internal codenames), and post timestamped updates through the incident lifecycle — investigating, identified, monitoring, or resolved.

Starter includes a public status page; Pro adds custom brand color, headline, and support link for white-label branding.

See Status Page for setup and publishing.
