Incidents
Understand how BlameTrail creates, manages, and resolves incidents — from automatic detection to AI summaries.
An incident represents an active problem with a monitored service. BlameTrail creates incidents automatically when monitors detect sustained failures, and resolves them automatically when the service recovers.
Automatic incident creation
BlameTrail creates an incident after a monitor records consecutive failures of the same type:
- Availability incident — 3 consecutive checks return an unexpected status code or fail to connect.
- Latency incident — 3 consecutive checks exceed the configured latency threshold.
The threshold of 3 consecutive failures prevents transient blips from generating noise.
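The consecutive-failure rule can be sketched as a small streak counter. This is an illustration only, not BlameTrail's implementation: the `FailureStreak` class and the `"availability"` / `"latency"` labels are hypothetical names, and restarting the streak when the failure type changes is an assumption drawn from the phrase "of the same type".

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_THRESHOLD = 3  # consecutive failures required, per the text above


@dataclass
class FailureStreak:
    """Tracks the current run of consecutive failures of one type."""

    kind: Optional[str] = None  # "availability", "latency", or None
    count: int = 0

    def record(self, result: Optional[str]) -> Optional[str]:
        """Record one check result; None means the check passed.

        Returns the incident type to open at the moment the streak
        reaches the threshold, otherwise None.
        """
        if result is None:
            # A passing check clears the streak.
            self.kind, self.count = None, 0
            return None
        if result == self.kind:
            self.count += 1
        else:
            # Assumption: a different failure type starts a new streak.
            self.kind, self.count = result, 1
        return result if self.count == FAILURE_THRESHOLD else None


streak = FailureStreak()
checks = ["latency", None, "latency", "latency", "latency"]
opened = [streak.record(c) for c in checks]
```

In this run the single passing check resets the count, so only the final check opens a latency incident.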
Incident types
| Type | Trigger |
|---|---|
| Availability | Monitor checks fail due to wrong status code or unreachable endpoint. |
| Latency | Monitor checks succeed but response time exceeds the threshold. |
| Manual | Created by a team member from the Incidents page. |
| Grafana | Created by a Grafana alert webhook. |
| Sentry | Created by a Sentry integration event. |
Severity levels
Every incident has a severity level:
| Severity | Use case |
|---|---|
| Critical | Complete outage or data loss. |
| Error | Major functionality broken. |
| Warning | Degraded performance or partial failure. |
| Info | Notable event that does not require immediate action. |
Automatically created incidents default to a severity based on the failure type. You can change the severity at any time.
Status lifecycle
Incidents move through three statuses:
open --> acknowledged --> resolved
- Open — The incident is active and has not been addressed.
- Acknowledged — A team member has seen the incident and is investigating.
- Resolved — The problem is fixed.
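One way to picture the lifecycle is as a small table of allowed transitions. The sketch below is hypothetical: the `Status` enum and `transition` helper are illustrative names, and it assumes that an open incident may move straight to resolved (as happens with auto-resolve) and that resolved is terminal, neither of which this page states explicitly.

```python
from enum import Enum


class Status(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Allowed forward moves. Going directly from open to resolved is an
# assumption, as is treating resolved as a terminal state.
ALLOWED = {
    Status.OPEN: {Status.ACKNOWLEDGED, Status.RESOLVED},
    Status.ACKNOWLEDGED: {Status.RESOLVED},
    Status.RESOLVED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```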
Auto-resolve
BlameTrail automatically resolves an incident when the monitor that triggered it records 3 consecutive passing checks. This mirrors the 3-failure threshold for creation — the service must demonstrate sustained recovery before the incident closes.
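The recovery rule can be expressed as a check over the most recent results. This is a minimal sketch with a hypothetical helper name, `should_auto_resolve`; it assumes each check is reduced to a pass/fail boolean.

```python
from typing import Sequence

RECOVERY_THRESHOLD = 3  # consecutive passing checks required, per the text above


def should_auto_resolve(results: Sequence[bool]) -> bool:
    """True when the latest RECOVERY_THRESHOLD checks all passed.

    `results` holds the monitor's check outcomes since the incident
    opened, oldest first; True means the check passed.
    """
    recent = results[-RECOVERY_THRESHOLD:]
    return len(recent) == RECOVERY_THRESHOLD and all(recent)
```

A single failure inside the window keeps the incident open, so a service that flaps between passing and failing does not resolve prematurely.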
Manual status updates
You can update an incident's status from the incident detail page. This is useful for acknowledging an incident your team is investigating or manually resolving an incident that was fixed through a non-monitored path.
Paging responders
If the affected service has an escalation policy configured, BlameTrail pages the on-call rotation when an incident opens. Each escalation step can send SMS, voice (with DTMF acknowledgment), and browser push notifications. Responders can acknowledge an incident directly from a voice page by pressing 1. See On-call paging.
Creating incidents manually
Not every incident comes from a monitor. You can create incidents manually from the Incidents page:
- Click Create Incident.
- Fill in the details:
  - Title — A short description of the problem.
  - Severity — Critical, error, warning, or info.
  - Environment (optional) — The affected environment.
  - Context (optional) — Additional details about the incident.
- Click Create.
Manual incidents follow the same status lifecycle as automatic ones.
AI summaries
BlameTrail generates AI-powered summaries for incidents automatically. These summaries provide a concise explanation of what happened, when it started, and what changed.
When new context becomes available — such as enriched deploy metadata or suspect scoring results — you can refresh the summary to incorporate the latest information.
Publishing to a status page
Incidents are private by default. To surface an incident on your public status page, click Publish to status page on the incident detail page and optionally set a customer-facing public title (useful when the internal incident title contains codenames or sensitive details). You can then post timestamped updates — investigating, identified, monitoring, resolved — as the situation evolves. Public incidents, updates, and resolution are reflected on the status page within 30 seconds. See Status Page.
Postmortems
When an incident resolves — either automatically or via a manual status update — BlameTrail queues an AI-drafted postmortem. The draft pulls from the incident timeline, suspect commits, observability metrics and logs, and any updates your team posted. Anyone with edit permissions can revise the draft; owners and admins can publish.
Postmortems are available on the Starter (5/month) and Pro (50/month) plans. See Postmortems for the full lifecycle, permissions, and tips for better drafts.
Suspect deploys
The incident detail page shows suspect deploys: recent deploys ranked by how likely they caused the incident. Each suspect includes the commit SHA, commit message, pull request title, and changed files.
See Core Concepts for more on how suspect scoring works.
Fix proposals
When an incident has a suspect commit, you can generate a fix proposal directly from the incident detail page. BlameTrail supports two types of fixes:
Revert PR
Click Generate Revert PR to create a pull request that reverts the suspect commit. The revert is deterministic — BlameTrail creates a new branch, constructs the revert commit using GitHub's git data API, and opens a PR against the default branch. The PR is created immediately without requiring approval.
AI fix
Click Generate AI Fix to have BlameTrail analyze the error context (stack traces, breadcrumbs, alert data) alongside the suspect commit's code changes and generate a targeted code fix. AI fixes go through a review step before a PR is created:
- BlameTrail generates the fix and shows a preview with confidence score, risk assessment, and a diff of the proposed changes.
- Review the proposal — expand the diff preview to inspect the changes file by file.
- Click Approve & Create PR to open a pull request, or Reject to discard the proposal.
Confidence and risk
Each AI fix includes:
- Confidence score — A percentage indicating how likely the fix is to be correct. Scores above 80% are shown in green, 50% to 80% in yellow, and below 50% in red.
- Risk assessment — Low, medium, or high, based on the number of files changed, error context quality, and the AI's self-assessment.
- Context quality — Full (stack trace available), partial (some context), or minimal (error message only). A yellow banner appears when context is limited.
Proposals with confidence below 30% are blocked from automatic PR creation. Proposals with confidence between 30% and 50% require an explicit confirmation checkbox before approving.
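The color bands and approval gates above can be read as two threshold functions. The sketch below uses hypothetical function names and return labels; how the exact boundary values (30, 50, 80) are treated is an assumption, since the text does not say which side they fall on.

```python
def confidence_color(score: float) -> str:
    """Map a confidence percentage to its display color."""
    if score > 80:
        return "green"
    if score >= 50:  # assumption: exactly 50 and exactly 80 count as yellow
        return "yellow"
    return "red"


def approval_gate(score: float) -> str:
    """Decide what is needed before a PR can be created from a proposal."""
    if score < 30:
        return "blocked"
    if score < 50:  # assumption: exactly 50 needs no extra confirmation
        return "needs_confirmation"
    return "allowed"
```

For example, a proposal at 42% would be shown in red and would need the confirmation checkbox, while one at 25% could not be turned into a PR.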
PR tracking
After a PR is created, the proposal card shows the PR number, link, and state (open, closed, or merged). BlameTrail automatically tracks the PR state via GitHub webhooks and background polling.
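GitHub itself reports a pull request's `state` only as `open` or `closed`, with a separate `merged` flag (and a `merged_at` timestamp), so a tracker has to combine them to get the three states shown on the card. The helper below is a sketch of that mapping with a hypothetical name, `pr_state`; how BlameTrail does this internally is not documented here.

```python
from typing import Any, Mapping


def pr_state(pull_request: Mapping[str, Any]) -> str:
    """Collapse GitHub's `state` and merge fields into open/closed/merged.

    `pull_request` is the pull request object from a GitHub webhook
    payload or REST API response.
    """
    if pull_request.get("merged") or pull_request.get("merged_at"):
        return "merged"
    return "closed" if pull_request.get("state") == "closed" else "open"
```

The same function works for both webhook deliveries and polled API responses, which makes it straightforward to reconcile the two sources.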
Plan limits
Fix proposals are available on the Starter (20/month) and Pro (200/month) plans. The Free plan does not include fix proposals. Higher-tier plans also receive larger AI context windows for more accurate fix suggestions. Fix proposals also require a GitHub token configured in your organization settings.
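As a rough illustration of how these limits combine, the sketch below checks the monthly quota and the GitHub token requirement together. The plan keys, the `FIX_PROPOSAL_LIMITS` table, and the `can_generate_fix` helper are hypothetical names chosen for this example; only the numbers come from the text above.

```python
# Monthly fix-proposal quotas from the text above; the keys are assumed names.
FIX_PROPOSAL_LIMITS = {"free": 0, "starter": 20, "pro": 200}


def can_generate_fix(plan: str, used_this_month: int, has_github_token: bool) -> bool:
    """True if another fix proposal is allowed for this organization."""
    limit = FIX_PROPOSAL_LIMITS.get(plan.lower(), 0)  # unknown plans get no quota
    return has_github_token and used_this_month < limit
```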