Welcome to BlameTrail

Learn how to monitor your services, track deploys, and pinpoint the code that broke production.

BlameTrail is an incident monitoring platform that detects outages, correlates them with recent deploys, and uses AI to explain what went wrong. This documentation covers everything you need to get started and make the most of the platform.

Where to start

New to BlameTrail? Begin with the Quickstart to set up your first service and monitor in under five minutes.
Core Concepts — Understand services, monitors, incidents, deploys, and suspects before diving deeper.
Connect your tools — Set up GitHub, Sentry, Slack, Grafana, or custom webhooks.

Feature overview

Feature	What it does
Uptime monitoring	HTTP health checks with configurable intervals and thresholds
Incident detection	Automatic incident creation after 3 consecutive failed health checks
Deploy tracking	Record every deploy via webhook, enrich with GitHub metadata
Suspect scoring	Rank recent deploys by how likely they caused an incident
Commit analysis	AI-powered inspection of code changes — file classification, risk scoring, diagnosis
Fix proposals	Generate revert PRs or AI-powered code fixes for incidents with suspect commits
On-call paging	Page the rotation over SMS, voice (with DTMF acknowledge), and browser push, with per-tenant guardrails
AI postmortems	Auto-draft postmortems on resolve — timeline, customer impact, root cause, action items
Public status pages	Customer-facing status at `status.yourdomain.com` with timestamped incident updates
Integrations	GitHub, Sentry, Slack, Grafana, and custom webhooks
Team management	Organizations, roles, and member invitations
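
To make the incident detection rule in the table concrete, here is a minimal sketch of a consecutive-failure threshold, assuming a monitor keeps a simple per-service counter. The names `MonitorState`, `record_check`, and `FAILURE_THRESHOLD` are hypothetical and used only for illustration; this is not BlameTrail's implementation or API, and incident resolution is deliberately left out because this page does not describe it.

```python
from dataclasses import dataclass

# Taken from the feature table: an incident opens after 3 consecutive failures.
FAILURE_THRESHOLD = 3


@dataclass
class MonitorState:
    """Hypothetical per-monitor state; not a BlameTrail data structure."""
    consecutive_failures: int = 0
    incident_open: bool = False


def record_check(state: MonitorState, check_passed: bool) -> bool:
    """Apply one health-check result and report whether it opens a new incident."""
    if check_passed:
        # A passing check breaks the streak. Resolving an open incident is
        # out of scope for this sketch, so incident_open is left unchanged.
        state.consecutive_failures = 0
        return False

    state.consecutive_failures += 1
    if state.consecutive_failures >= FAILURE_THRESHOLD and not state.incident_open:
        state.incident_open = True
        return True
    return False


if __name__ == "__main__":
    state = MonitorState()
    # Two failures, a recovery, then a sustained outage.
    results = [True, False, False, True, False, False, False, False]
    opened = [record_check(state, passed) for passed in results]
    print(opened)
```

In this sketch a single passing check resets the streak, so two failures followed by a recovery never open an incident; only an unbroken run of three does, and further failures do not open a second incident while one is already open.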
