Observability Integration
Connect your metrics, logging, and tracing providers to BlameTrail for unified observability during incidents.
The observability integration connects your existing metrics, logging, and distributed tracing infrastructure to BlameTrail. When an incident occurs, BlameTrail automatically pulls relevant metrics, logs, and correlated traces from your providers, giving your team immediate context without switching tools.
Supported providers
| Provider | Metrics | Logs | Traces |
|---|---|---|---|
| Prometheus | Yes | No | No |
| Grafana Loki | No | Yes | No |
| Datadog | Yes | Yes | Yes |
| AWS CloudWatch | Yes | Yes | No |
| Tempo | No | No | Yes |
| Jaeger | No | No | Yes |
| Honeycomb | No | No | Yes |
| New Relic | No | No | Yes |
| Elastic APM | No | No | Yes |
| AWS X-Ray | No | No | Yes |
| Lightstep | No | No | Yes |
What it does
- Incident context — When an incident is created, BlameTrail queries your connected providers for metrics and logs around the incident time window (15 minutes before to 15 minutes after).
- Service-level overview — View health status, key metrics, and recent logs for any service with connected observability providers.
- Dedicated metrics explorer — Query and chart metrics across providers with preset templates or custom PromQL/Datadog queries.
- Log search — Search and filter logs across Loki and CloudWatch connections, with level-based filtering and cursor pagination.
- Trace correlation — When an incident has a suspect deploy, BlameTrail queries connected trace providers for traces around the deploy window and scores them by error density, timing proximity, latency deviation, and service overlap.
- Latency regression detection — Compares pre-deploy and post-deploy latency percentiles (p50, p95, p99) per operation to surface regressions introduced by the suspect deploy.
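The latency regression comparison described above can be illustrated with a minimal sketch. It is not BlameTrail's implementation: the function names, the nearest-rank percentile method, and the 1.2x regression threshold are all assumptions chosen for illustration.

```python
import math

PERCENTILES = (50, 95, 99)


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_regressions(
    pre: dict[str, list[float]],
    post: dict[str, list[float]],
    threshold: float = 1.2,  # assumed ratio; the real cutoff is not documented here
) -> dict[str, dict[str, tuple[float, float]]]:
    """Compare pre-deploy and post-deploy latencies per operation.

    Returns, for each operation seen in both windows, the percentiles whose
    post-deploy value is at least `threshold` times the pre-deploy value,
    as (before, after) pairs.
    """
    regressions = {}
    for operation in pre.keys() & post.keys():
        flagged = {}
        for pct in PERCENTILES:
            before = percentile(pre[operation], pct)
            after = percentile(post[operation], pct)
            if before > 0 and after / before >= threshold:
                flagged[f"p{pct}"] = (before, after)
        if flagged:
            regressions[operation] = flagged
    return regressions
```

Called with latency samples (for example, in milliseconds) keyed by operation name, the function returns only the operations and percentiles that got slower after the deploy. Operations that appear in only one of the two windows are skipped.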
How it works
BlameTrail Service → Observability Mapping → Provider Connection
↓
Prometheus / Loki / Datadog / CloudWatch / Tempo / Jaeger / Honeycomb / New Relic / Elastic / X-Ray / Lightstep
↓
Metrics, Logs & Traces
↓
Incident Context / Service Overview / Trace Correlation
- You connect one or more observability providers via the Integrations page.
- You map each connection to the BlameTrail services it monitors.
- When an incident fires or you visit a service page, BlameTrail queries the mapped providers in parallel and presents a unified view.
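As a rough illustration of the parallel fan-out in the last step, the sketch below queries several providers concurrently with a cap on in-flight requests. Everything in it is hypothetical: `query_mapped_providers`, the `fetch` callback, and `fake_fetch` are stand-ins, and the default cap of 3 simply mirrors the Starter plan's concurrent-query limit. How BlameTrail handles a failing provider is not documented here; this sketch assumes a failure is recorded per provider rather than failing the whole view.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical callback: takes a provider name, returns that provider's data.
FetchFn = Callable[[str], Awaitable[dict]]


async def query_mapped_providers(
    providers: list[str],
    fetch: FetchFn,
    max_concurrent: int = 3,  # assumed default, mirroring the Starter limit
) -> dict[str, dict]:
    """Query every mapped provider concurrently, capped at max_concurrent."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(provider: str) -> tuple[str, dict]:
        async with semaphore:
            try:
                return provider, await fetch(provider)
            except Exception as exc:
                # Assumption: one slow or failing provider should not
                # prevent the others from appearing in the unified view.
                return provider, {"error": str(exc)}

    pairs = await asyncio.gather(*(guarded(p) for p in providers))
    return dict(pairs)


async def fake_fetch(provider: str) -> dict:
    """Stand-in for a real provider client, used only for demonstration."""
    await asyncio.sleep(0.01)
    if provider == "loki":
        raise TimeoutError("provider timed out")
    return {"provider": provider, "ok": True}
```

Running `asyncio.run(query_mapped_providers(["prometheus", "loki", "tempo"], fake_fetch))` returns one entry per provider, in the order given, with the simulated Loki timeout captured as an error entry instead of raising.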
Requirements
- A BlameTrail Starter or Pro plan
- Network access from BlameTrail to the endpoints of each provider you connect (for example, Prometheus, Loki, the Datadog API, or AWS CloudWatch)
Plan limits
| Feature | Starter | Pro |
|---|---|---|
| Connections | 3 | 20 |
| Max time range | 7 days | 30 days |
| Max log lines per query | 500 | 1,000 |
| Concurrent provider queries | 3 | 10 |
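To make the table concrete, here is a small sketch of how a client might keep a log query within these limits before sending it. The limit values come from the table above; the `PlanLimits` type, the `clamp_log_query` helper, and the choice to clamp an oversized request (rather than reject it) are assumptions for illustration, not documented BlameTrail behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PlanLimits:
    connections: int
    max_time_range: timedelta
    max_log_lines: int
    concurrent_queries: int


# Values taken from the plan limits table.
PLAN_LIMITS = {
    "starter": PlanLimits(3, timedelta(days=7), 500, 3),
    "pro": PlanLimits(20, timedelta(days=30), 1000, 10),
}


def clamp_log_query(
    plan: str, start: datetime, end: datetime, limit: int
) -> tuple[datetime, datetime, int]:
    """Shrink a log query so it fits the plan's time range and line limits.

    Assumption: an oversized request is narrowed to the most recent allowed
    window instead of being rejected.
    """
    limits = PLAN_LIMITS[plan]
    if end - start > limits.max_time_range:
        start = end - limits.max_time_range
    return start, end, min(limit, limits.max_log_lines)
```

For example, a 30-day query for 5,000 lines on the Starter plan would be narrowed to the most recent 7 days and 500 lines, while the same time range on Pro would pass through with the line count capped at 1,000.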
Next steps
- Connecting Providers — Set up your first provider connection
- Service Mappings — Link providers to your services
- Querying Data — Use the metrics explorer and log search
- AI Summaries — Automatic AI-generated incident analysis
- Trace Correlation — Understand how BlameTrail correlates traces with incidents