
Incidents

Understand how BlameTrail creates, manages, and resolves incidents — from automatic detection to AI summaries.

An incident represents an active problem with a monitored service. BlameTrail creates incidents automatically when monitors detect sustained failures, and resolves them automatically when the service recovers.

Automatic incident creation

BlameTrail creates an incident after a monitor records consecutive failures of the same type:

  • Availability incident — 3 consecutive checks return an unexpected status code or fail to connect.
  • Latency incident — 3 consecutive checks exceed the configured latency threshold.

The threshold of 3 consecutive failures prevents transient blips from generating noise.
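The rule above can be sketched as a small streak counter. This is a minimal illustration, not BlameTrail's implementation. The names `CheckResult`, `classify`, and `FailureStreak` are hypothetical. The sketch also assumes two things: a single expected status code per monitor, and that a failure of a different type starts a new streak.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

FAILURE_THRESHOLD = 3  # consecutive failures of the same type (from the text above)

class CheckResult(Enum):
    PASS = "pass"
    AVAILABILITY_FAILURE = "availability"  # unexpected status code or connection failure
    LATENCY_FAILURE = "latency"            # correct status, but slower than the threshold

def classify(status_code: Optional[int], latency_ms: float,
             expected_status: int, latency_threshold_ms: float) -> CheckResult:
    """Classify one check; status_code is None when the connection failed."""
    if status_code is None or status_code != expected_status:
        return CheckResult.AVAILABILITY_FAILURE
    if latency_ms > latency_threshold_ms:
        return CheckResult.LATENCY_FAILURE
    return CheckResult.PASS

@dataclass
class FailureStreak:
    kind: Optional[CheckResult] = None
    count: int = 0

    def record(self, result: CheckResult) -> Optional[CheckResult]:
        """Record a check; return the incident type to open, or None."""
        if result is CheckResult.PASS:
            self.kind, self.count = None, 0  # a passing check clears the streak
            return None
        if result is self.kind:
            self.count += 1
        else:
            # Assumption: a different failure type starts a new streak.
            self.kind, self.count = result, 1
        # Fire exactly once, on the third failure, so later failures don't duplicate it.
        return result if self.count == FAILURE_THRESHOLD else None
```

Comparing with `==` rather than `>=` means a fourth or fifth consecutive failure does not open a second incident for the same outage.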

Incident types

Type           Trigger
Availability   Monitor checks fail due to a wrong status code or an unreachable endpoint.
Latency        Monitor checks succeed, but the response time exceeds the threshold.
Manual         Created by a team member from the Incidents page.
Grafana        Created by a Grafana alert webhook.
Sentry         Created by a Sentry integration event.

Severity levels

Every incident has a severity level:

Severity   Use case
Critical   Complete outage or data loss.
Error      Major functionality broken.
Warning    Degraded performance or partial failure.
Info       Notable event that does not require immediate action.

Automatically created incidents default to a severity based on the failure type. You can change the severity at any time.

Status lifecycle

Incidents move through three statuses:

open --> acknowledged --> resolved
  • Open — The incident is active and has not been addressed.
  • Acknowledged — A team member has seen the incident and is investigating.
  • Resolved — The problem is fixed.
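The lifecycle can be pictured as a small state machine. This sketch uses hypothetical names (`Status`, `ALLOWED`, `transition`). It assumes an open incident may be resolved directly, for example by auto-resolve before anyone acknowledges it. It does not model reopening a resolved incident, because this page does not describe that.

```python
from enum import Enum

class Status(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"

# Assumption: open -> resolved is allowed; resolved is terminal in this sketch.
ALLOWED = {
    Status.OPEN: {Status.ACKNOWLEDGED, Status.RESOLVED},
    Status.ACKNOWLEDGED: {Status.RESOLVED},
    Status.RESOLVED: set(),
}

def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not part of the lifecycle."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```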

Auto-resolve

BlameTrail automatically resolves an incident when the monitor that triggered it records 3 consecutive passing checks. This mirrors the 3-failure threshold for creation — the service must demonstrate sustained recovery before the incident closes.
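The recovery rule is the mirror image of the creation rule. This sketch tracks the passing streak for the monitor that opened the incident. A single failure restarts the count, because the text requires consecutive passes. `AutoResolver` is a hypothetical name, and this is not BlameTrail's code.

```python
RECOVERY_THRESHOLD = 3  # consecutive passing checks (from the text above)

class AutoResolver:
    """Tracks passing checks for the monitor that opened an incident."""

    def __init__(self) -> None:
        self.passes = 0
        self.resolved = False

    def record(self, passed: bool) -> bool:
        """Record one check; return True only on the check that resolves the incident."""
        if self.resolved:
            return False
        if not passed:
            self.passes = 0  # any failure restarts the recovery count
            return False
        self.passes += 1
        if self.passes >= RECOVERY_THRESHOLD:
            self.resolved = True
            return True
        return False
```

For example, the sequence pass, pass, fail, pass, pass, pass resolves the incident only on the sixth check.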

Manual status updates

You can update an incident's status from the incident detail page. This is useful for acknowledging an incident your team is investigating or manually resolving an incident that was fixed through a non-monitored path.

Paging responders

If the affected service has an escalation policy configured, BlameTrail pages the on-call rotation when an incident opens. Each escalation step can send SMS messages, voice calls, and browser push notifications. Responders can acknowledge an incident directly from a voice page by pressing 1 (DTMF). See On-call paging.
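The following sketch shows how an escalation policy with per-step channels and a press-1 acknowledgment could fit together. The data model (`EscalationStep`, `Page`, `run_policy`, `handle_dtmf`) is entirely hypothetical. Step timeouts are omitted, and keypresses are simulated rather than received from a real voice provider. See On-call paging for the actual configuration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EscalationStep:
    # Hypothetical model: each step lists the channels that fire for it.
    channels: List[str]  # e.g. ["sms", "voice", "push"]

@dataclass
class Page:
    acknowledged: bool = False
    sent: List[str] = field(default_factory=list)

def handle_dtmf(page: Page, digit: str) -> None:
    """Pressing 1 during a voice page acknowledges the incident."""
    if digit == "1":
        page.acknowledged = True

def run_policy(steps: List[EscalationStep], page: Page, keypresses: List[str]) -> Page:
    """Walk the steps in order and stop escalating once a responder acknowledges.

    keypresses[i] simulates the digit pressed (or "") after step i's voice call.
    """
    for i, step in enumerate(steps):
        if page.acknowledged:
            break
        for channel in step.channels:
            page.sent.append(f"step{i + 1}:{channel}")
        if "voice" in step.channels and i < len(keypresses):
            handle_dtmf(page, keypresses[i])
    return page
```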

Creating incidents manually

Not every incident comes from a monitor. You can create incidents manually from the Incidents page:

  1. Click Create Incident.
  2. Fill in the details:
    • Title — A short description of the problem.
    • Severity — Critical, error, warning, or info.
    • Environment (optional) — The affected environment.
    • Context (optional) — Additional details about the incident.
  3. Click Create.
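The form fields above map naturally onto a small record with validation. This is a sketch only. `ManualIncident` and its field names are hypothetical and do not describe a BlameTrail API. It assumes the title is required, severity must be one of the four levels, and new manual incidents start as open.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITIES = ("critical", "error", "warning", "info")

@dataclass
class ManualIncident:
    title: str
    severity: str
    environment: Optional[str] = None  # optional: the affected environment
    context: Optional[str] = None      # optional: additional details
    status: str = "open"               # assumption: same starting status as automatic incidents

    def __post_init__(self) -> None:
        self.title = self.title.strip()
        if not self.title:
            raise ValueError("title is required")
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")
```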

Manual incidents follow the same status lifecycle as automatic ones.

AI summaries

BlameTrail generates AI-powered summaries for incidents automatically. These summaries provide a concise explanation of what happened, when it started, and what changed.

When new context becomes available — such as enriched deploy metadata or suspect scoring results — you can refresh the summary to incorporate the latest information.

Publishing to a status page

Incidents are private by default. To surface an incident on your public status page, click Publish to status page on the incident detail page. You can optionally set a customer-facing public title, which is useful when the internal incident title contains codenames or sensitive details. As the situation evolves, you can post timestamped updates with a status of investigating, identified, monitoring, or resolved. Published incidents, their updates, and their resolution appear on the status page within 30 seconds. See Status Page.
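The following sketch separates the internal incident from what the public status page shows. The names `Incident`, `publish`, `post_update`, and `public_view` are hypothetical. The fallback to the internal title when no public title is set is an assumption about behavior that this page does not state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

UPDATE_STATES = ("investigating", "identified", "monitoring", "resolved")

@dataclass
class Incident:
    title: str                              # internal title, may contain codenames
    published: bool = False
    public_title: Optional[str] = None
    updates: List[Tuple[datetime, str, str]] = field(default_factory=list)

    def publish(self, public_title: Optional[str] = None) -> None:
        self.published = True
        self.public_title = public_title

    def post_update(self, state: str, message: str) -> None:
        if state not in UPDATE_STATES:
            raise ValueError(f"unknown update state: {state}")
        self.updates.append((datetime.now(timezone.utc), state, message))

    def public_view(self) -> Optional[dict]:
        """What a status page would render; None while the incident is private."""
        if not self.published:
            return None
        return {
            # Assumption: fall back to the internal title when no public title is set.
            "title": self.public_title or self.title,
            "updates": [(ts.isoformat(), state, msg) for ts, state, msg in self.updates],
        }
```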

Postmortems

When an incident resolves — either automatically or via a manual status update — BlameTrail queues an AI-drafted postmortem. The draft pulls from the incident timeline, suspect commits, observability metrics and logs, and any updates your team posted. Anyone with edit permissions can revise the draft; owners and admins can publish.

Postmortems are available on Starter (5/month) and Pro (50/month). See Postmortems for the full lifecycle, permissions, and tips for better drafts.

Suspect deploys

The incident detail page shows suspect deploys: recent deploys ranked by how likely they caused the incident. Each suspect includes the commit SHA, commit message, pull request title, and changed files.

See Core Concepts for more on how suspect scoring works.

Fix proposals

When an incident has a suspect commit, you can generate a fix proposal directly from the incident detail page. BlameTrail supports two types of fixes:

Revert PR

Click Generate Revert PR to create a pull request that reverts the suspect commit. The revert is deterministic — BlameTrail creates a new branch, constructs the revert commit using GitHub's git data API, and opens a PR against the default branch. The PR is created immediately without requiring approval.

AI fix

Click Generate AI Fix to have BlameTrail analyze the error context (stack traces, breadcrumbs, alert data) alongside the suspect commit's code changes and generate a targeted code fix. AI fixes go through a review step before a PR is created:

  1. BlameTrail generates the fix and shows a preview with confidence score, risk assessment, and a diff of the proposed changes.
  2. Review the proposal — expand the diff preview to inspect the changes file by file.
  3. Click Approve & Create PR to open a pull request, or Reject to discard the proposal.

Confidence and risk

Each AI fix includes:

  • Confidence score — A percentage indicating how likely the fix is correct. Scores above 80% are shown in green, 50-80% in yellow, and below 50% in red.
  • Risk assessment — Low, medium, or high, based on the number of files changed, error context quality, and the AI's self-assessment.
  • Context quality — Full (stack trace available), partial (some context), or minimal (error message only). A yellow banner appears when context is limited.
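The badge colors and context-quality labels above can be expressed as two small mapping functions. This is an illustrative sketch with hypothetical names. It assumes a score of exactly 80% is yellow and exactly 50% is yellow, since the text says "above 80%" for green. It also assumes "some context" means things like breadcrumbs or alert data, and that both partial and minimal context trigger the banner.

```python
def confidence_color(confidence: float) -> str:
    """Map a 0-100 confidence score to a badge color."""
    if confidence > 80:
        return "green"
    if confidence >= 50:
        return "yellow"
    return "red"

def context_quality(has_stack_trace: bool, has_other_context: bool) -> str:
    """Full with a stack trace, partial with some other context, else minimal."""
    if has_stack_trace:
        return "full"
    if has_other_context:  # assumption: e.g. breadcrumbs or alert data
        return "partial"
    return "minimal"

def show_limited_context_banner(quality: str) -> bool:
    # Assumption: both partial and minimal count as "limited".
    return quality != "full"
```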

Proposals with confidence below 30% are blocked from automatic PR creation. Proposals with confidence between 30% and 50% require an explicit confirmation checkbox before you can approve them.
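The approval gate reduces to a single check on the confidence score. This is a hypothetical `can_approve` helper, not BlameTrail code. It assumes the 30% to 50% band includes 30 and excludes 50.

```python
def can_approve(confidence: float, confirmed: bool = False) -> bool:
    """Gate the Approve & Create PR action on the proposal's confidence score."""
    if confidence < 30:
        return False       # blocked outright
    if confidence < 50:
        return confirmed   # 30-50%: requires the explicit confirmation checkbox
    return True
```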

PR tracking

After a PR is created, the proposal card shows the PR number, link, and state (open, closed, or merged). BlameTrail automatically tracks the PR state via GitHub webhooks and background polling.
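GitHub's `pull_request` webhook payload carries a `pull_request` object with `number`, `state` (`open` or `closed`), and a `merged` boolean. A merged PR arrives as `closed` with `merged` set to true. The same fields appear when polling the pull request through the REST API. The following sketch derives the three states shown on the proposal card. The function names and the `tracked` dictionary are hypothetical, and whether BlameTrail uses exactly this logic is an assumption.

```python
def pr_state_from_payload(payload: dict) -> str:
    """Derive open/closed/merged from a GitHub pull_request webhook payload."""
    pr = payload["pull_request"]
    if pr["state"] == "closed":
        # GitHub reports merged PRs as closed with merged=true.
        return "merged" if pr.get("merged") else "closed"
    return "open"

def apply_webhook(tracked: dict, payload: dict) -> None:
    """Update the stored state for PRs opened from fix proposals; ignore others."""
    number = payload["pull_request"]["number"]
    if number in tracked:
        tracked[number] = pr_state_from_payload(payload)
```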

Plan limits

Fix proposals are available on Starter (20/month) and Pro (200/month) plans. The Free plan does not include fix proposals. Higher-tier plans also receive larger AI context windows for more accurate fix suggestions. A GitHub token must be configured in your organization settings.
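The monthly quotas on this page can be checked with a simple lookup table. This sketch uses the limits stated here for fix proposals and postmortems. It treats Free as 0 for both, since neither feature is listed for Free. It also requires a GitHub token only for fix proposals. `MONTHLY_LIMITS` and `can_use` are hypothetical names.

```python
# Monthly limits from this page; Free is 0 because neither feature is offered on it.
MONTHLY_LIMITS = {
    "fix_proposals": {"free": 0, "starter": 20, "pro": 200},
    "postmortems": {"free": 0, "starter": 5, "pro": 50},
}

def can_use(feature: str, plan: str, used_this_month: int,
            github_token_configured: bool = True) -> bool:
    """Return True if another use of the feature fits within the plan's monthly limit."""
    if feature == "fix_proposals" and not github_token_configured:
        return False  # fix proposals need a GitHub token in organization settings
    return used_this_month < MONTHLY_LIMITS[feature][plan]
```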
