Welcome to BlameTrail
Learn how to monitor your services, track deploys, and pinpoint the code that broke production.
BlameTrail is an incident monitoring platform that detects outages, correlates them with recent deploys, and uses AI to explain what went wrong. This documentation covers everything you need to get started and make the most of the platform.
Where to start
- New to BlameTrail? Begin with the Quickstart to set up your first service and its first monitor in under five minutes.
- Core Concepts — Understand services, monitors, incidents, deploys, and suspects before diving deeper.
- Connect your tools — Set up GitHub, Sentry, Slack, or custom webhooks.
Feature overview
| Feature | What it does |
|---|---|
| Uptime monitoring | HTTP health checks with configurable intervals and thresholds |
| Incident detection | Automatic incident creation after 3 consecutive failed health checks |
| Deploy tracking | Record every deploy via webhook and enrich it with GitHub metadata |
| Suspect scoring | Rank recent deploys by how likely they caused an incident |
| Commit analysis | AI-powered inspection of code changes — file classification, risk scoring, diagnosis |
| Fix proposals | Generate revert PRs or AI-powered code fixes for incidents with suspect commits |
| On-call paging | Page the rotation over SMS, voice (with DTMF acknowledge), and browser push, with per-tenant guardrails |
| AI postmortems | Auto-draft postmortems on resolve — timeline, customer impact, root cause, action items |
| Public status pages | Customer-facing status at status.yourdomain.com with timestamped incident updates |
| Integrations | GitHub, Sentry, Slack, Grafana, and custom webhooks |
| Team management | Organizations, roles, and member invitations |
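
The incident detection rule in the table (an incident after 3 consecutive failures) can be illustrated with a short sketch. This is not BlameTrail's implementation: `MonitorState`, `record_check`, and `FAILURE_THRESHOLD` are hypothetical names, and the assumption that a passing check resets the failure count (without closing an open incident) is ours.

```python
from dataclasses import dataclass

# Taken from the feature table: 3 consecutive failures open an incident.
FAILURE_THRESHOLD = 3


@dataclass
class MonitorState:
    """Per-monitor bookkeeping (hypothetical type, for illustration only)."""
    consecutive_failures: int = 0
    incident_open: bool = False


def record_check(state: MonitorState, check_passed: bool) -> bool:
    """Record one health check result.

    Returns True only on the check that should open a new incident.
    """
    if check_passed:
        # Assumption: any passing check resets the streak. Resolving an
        # open incident is treated as a separate step and not modeled here.
        state.consecutive_failures = 0
        return False

    state.consecutive_failures += 1
    if state.consecutive_failures >= FAILURE_THRESHOLD and not state.incident_open:
        state.incident_open = True
        return True
    return False


# Usage: one isolated failure, a recovery, then four failures in a row.
state = MonitorState()
results = [False, True, False, False, False, False]
opened = [record_check(state, passed) for passed in results]
print(opened)  # [False, False, False, False, True, False]
```

The isolated failure at the start does not open an incident because the passing check that follows resets the streak. Only the third failure in an unbroken run does, and further failures do not open a second one.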