Suspect Scoring

How BlameTrail ranks recent deploys by how likely they are to have caused an incident, using temporal proximity, and annotates them with enriched commit context.

When an incident opens, BlameTrail automatically identifies which recent deploys are most likely responsible. This process, called suspect scoring, ranks deploys by their temporal proximity to the first failure and presents them with enriched context so your team can start investigating immediately.

How scoring works

Suspect scoring follows a straightforward process:

Incident triggers: A monitor records 3 consecutive failures (availability) or 3 consecutive slow responses (latency), creating an incident.
Window lookup: BlameTrail queries all deploys to the affected service in the 60 minutes before the first failure.
Proximity ranking: Deploys are ranked by how close in time they were to the first failure. The most recent deploy before the failure is scored highest.
Context attachment: Each suspect deploy is annotated with its commit message, branch, deployer, and any enrichment data (PR details, changed files).
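The window lookup and proximity ranking steps can be sketched in a few lines of Python. This is a minimal illustration rather than BlameTrail's implementation: the Deploy record, its field names, the rank_suspects function, and the inclusive window boundaries are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=60)  # lookback window described in the steps above


@dataclass
class Deploy:
    # Hypothetical record shape; field names are illustrative only.
    service: str
    sha: str
    deployed_at: datetime


def rank_suspects(deploys, service, first_failure):
    """Return deploys to `service` from the 60 minutes before
    `first_failure`, ordered from most to least recent."""
    window_start = first_failure - WINDOW
    in_window = [
        d for d in deploys
        # Assumption: both ends of the window are inclusive.
        if d.service == service and window_start <= d.deployed_at <= first_failure
    ]
    # A smaller gap to the first failure means a higher rank.
    return sorted(in_window, key=lambda d: first_failure - d.deployed_at)
```

Deploys to other services and deploys older than the window are filtered out before sorting, so the first element of the result is the top suspect.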

Understanding confidence

Each suspect deploy receives a confidence score based on temporal proximity:

High confidence: Deployed within minutes of the first failure (under 15 minutes, where the medium band begins). This is the most common pattern: a deploy goes out and, shortly after, monitors start failing.
Medium confidence: Deployed 15-30 minutes before the failure. Still a likely candidate, especially for issues that take time to manifest (memory leaks, queue backlogs, gradual cache invalidation).
Lower confidence: Deployed 30-60 minutes before the failure. Less likely to be the direct cause, but worth reviewing if higher-ranked suspects are ruled out.

The top suspect is highlighted prominently on the incident detail page and included in Slack notifications.
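The confidence bands can be expressed as a small lookup. The sketch below is an assumption-laden illustration: the function name is hypothetical, the 15-minute upper limit for high confidence is inferred from where the medium band starts, and the handling of the exact 15- and 30-minute boundaries is a guess, since the bands above do not say which side they fall on.

```python
def confidence_band(minutes_before_failure: float) -> str:
    """Map the gap between a deploy and the first failure to a band.

    Assumed boundaries: [0, 15) high, [15, 30) medium, [30, 60] lower.
    """
    if minutes_before_failure < 0 or minutes_before_failure > 60:
        # Outside the 60-minute window, a deploy is not a suspect at all.
        raise ValueError("deploy is outside the 60-minute suspect window")
    if minutes_before_failure < 15:
        return "high"
    if minutes_before_failure < 30:
        return "medium"
    return "lower"
```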

What you see for each suspect

For every suspect deploy, BlameTrail displays:

Field	Source
Commit SHA	Deploy webhook payload
Commit message	Deploy webhook payload or GitHub enrichment
Branch	Deploy webhook payload
Deployed by	Deploy webhook payload
Time since deploy	Calculated from deploy timestamp to incident start
PR title and number	GitHub enrichment (if available)
PR author	GitHub enrichment (if available)
Changed files	GitHub enrichment, ranked by relevance (if available)

If commit enrichment is configured, the suspect list carries more detail. In addition to the commit SHA and message, you see the PR title, number, and author, and which files were changed.
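One way to picture a suspect record is as a set of required webhook fields plus optional enrichment fields. The sketch below is hypothetical: the Suspect class, its field names, and the describe helper are invented for illustration and do not reflect BlameTrail's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Suspect:
    # Fields taken from the deploy webhook payload.
    commit_sha: str
    branch: str
    deployed_by: str
    minutes_since_deploy: float
    commit_message: Optional[str] = None
    # GitHub enrichment fields; left empty when enrichment is unavailable.
    pr_number: Optional[int] = None
    pr_title: Optional[str] = None
    pr_author: Optional[str] = None
    changed_files: List[str] = field(default_factory=list)


def describe(s: Suspect) -> str:
    """One-line label: PR context when enriched, otherwise SHA and message."""
    if s.pr_number is not None and s.pr_title:
        return f"PR #{s.pr_number}: {s.pr_title} ({len(s.changed_files)} files changed)"
    return f"{s.commit_sha[:7]}: {s.commit_message or '(no commit message)'}"
```

Keeping the enrichment fields optional means the same record works whether or not a GitHub token is configured.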

AI summary integration

When an incident has an AI-generated summary, BlameTrail includes suspect deploy context in the prompt. The AI can reference:

What code was changed in the top suspects
Which files were modified and their relevance
The relationship between the changed code and the type of failure observed

This produces summaries that go beyond "the service is down" to "the service started failing 3 minutes after a deploy that modified the database connection pool configuration."
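As a rough illustration of how suspect context might be rendered into prompt text, consider the sketch below. BlameTrail's real prompt format is not documented here, so the function name, the dictionary keys, and the output layout are all assumptions.

```python
def build_suspect_context(suspects, max_suspects=3):
    """Render the top suspects as plain text for inclusion in a prompt.

    Each suspect is a dict with hypothetical keys: "sha",
    "minutes_before_failure", "message", and optional "changed_files".
    """
    lines = ["Suspect deploys (most likely first):"]
    for rank, s in enumerate(suspects[:max_suspects], start=1):
        lines.append(
            f"{rank}. {s['sha']} deployed {s['minutes_before_failure']} min "
            f"before first failure: {s['message']}"
        )
        # Changed files are only present when enrichment succeeded.
        for path in s.get("changed_files", []):
            lines.append(f"   changed: {path}")
    return "\n".join(lines)
```

Limiting the output to the top few suspects keeps the prompt short while still giving the model the timing and file-level detail it needs to relate a change to the failure.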

Scoring without enrichment

Suspect scoring works even without GitHub enrichment. If no GitHub token is configured or the service is not linked to a repository, BlameTrail still:

Identifies deploys in the 60-minute window
Ranks them by temporal proximity
Shows commit SHA, branch, deployer, and commit message from the webhook payload

Enrichment adds depth, but the core scoring mechanism depends only on deploy timestamps.

Example timeline

14:00  Deploy v2.3.0 (branch: main, by: alice)
14:12  Deploy v2.3.1 (branch: hotfix/cache, by: bob)
14:15  Monitor starts failing
14:17  Incident created (3 consecutive failures)

In this scenario, BlameTrail would rank:

v2.3.1 (highest) — Deployed 3 minutes before first failure
v2.3.0 (lower) — Deployed 15 minutes before first failure

Both deploys appear on the incident page. If enrichment is available, you would see that v2.3.1 touched cache configuration files, immediately suggesting where to look.
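The arithmetic behind this ranking can be checked with a short script. The calendar date is arbitrary and the variable names are illustrative; only the clock times come from the timeline above.

```python
from datetime import datetime

# Times from the example timeline; the date itself is arbitrary.
first_failure = datetime(2024, 1, 1, 14, 15)
deploys = {
    "v2.3.0": datetime(2024, 1, 1, 14, 0),
    "v2.3.1": datetime(2024, 1, 1, 14, 12),
}

# Minutes between each deploy and the first failure.
gaps = {
    name: int((first_failure - at).total_seconds() // 60)
    for name, at in deploys.items()
}

# The smallest gap ranks first.
ranking = sorted(gaps, key=gaps.get)

print(gaps)     # {'v2.3.0': 15, 'v2.3.1': 3}
print(ranking)  # ['v2.3.1', 'v2.3.0']
```

Note that the gaps are measured to the first failure at 14:15, not to the incident creation time at 14:17.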

Limitations

60-minute window: Deploys made more than 60 minutes before the first failure are not considered. If you suspect an older deploy caused the issue, check the deploy history manually.
Single service scope: Scoring only considers deploys to the service that owns the failing monitor. Cross-service incidents require manual correlation.
Temporal proximity only: The scoring algorithm does not analyze code content. It ranks by time, then relies on enrichment and AI summaries to provide code-level context.