OpenTelemetry Integration
AIA uses OpenTelemetry to receive traces and logs from your application, automatically detecting errors and triggering incident analysis.
#Overview
The Agent service runs an OpenTelemetry receiver on port 4318 (HTTP). Your application sends telemetry data to this endpoint, and AIA automatically detects:
- HTTP 5xx errors
- Uncaught exceptions
- Latency spikes (>2000ms)
- Process crashes (via log patterns)
#Quick Setup
Environment Variables
Set these in your application's environment:
OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
OTEL_SERVICE_NAME="my-app"
Important: The endpoint is port 4318 (Agent service), not 3001 (Router).
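The base endpoint above has no path, while traces are posted to `/v1/traces` under it. A minimal sketch of that mapping follows, assuming a hypothetical helper named `resolveTracesUrl` (it is not part of AIA or OpenTelemetry) and using the Agent port from this guide as the fallback:

```typescript
// Hypothetical helper for illustration; not an AIA or OpenTelemetry API.
// Fallback is the Agent endpoint documented in this guide.
const DEFAULT_ENDPOINT = 'http://localhost:4318';

function resolveTracesUrl(base: string | undefined): string {
  // Use the default when the variable is unset or blank
  const endpoint = base && base.trim() !== '' ? base.trim() : DEFAULT_ENDPOINT;
  // Strip trailing slashes so the result never contains "//v1/traces"
  return `${endpoint.replace(/\/+$/, '')}/v1/traces`;
}
```

In an application this would typically be called as `resolveTracesUrl(process.env.OTEL_EXPORTER_OTLP_ENDPOINT)`, which avoids building a URL from an undefined variable.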
#Node.js / Bun / Express
Step 1: Install Dependencies
bun add @opentelemetry/api @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http
Step 2: Create Instrumentation File
Create instrumentation.ts:
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // AIA Agent endpoint
    url: 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: process.env.OTEL_SERVICE_NAME || 'my-app',
});

sdk.start();

// Graceful shutdown: flush pending spans before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('Tracing terminated'))
    .catch((error) => console.error('Error terminating tracing', error))
    .finally(() => process.exit(0));
});
Step 3: Run Your Application
bun run -r ./instrumentation.ts server.ts
#Next.js
Step 1: Install Dependencies
bun add @opentelemetry/api @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http
Step 2: Create Instrumentation
Create instrumentation.ts in your project root (or src/):
export async function register() {
  // Start the SDK only in the Node.js runtime, not in the Edge runtime
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const { NodeSDK } = await import('@opentelemetry/sdk-node');
    const { OTLPTraceExporter } = await import('@opentelemetry/exporter-trace-otlp-http');
    const { getNodeAutoInstrumentations } = await import('@opentelemetry/auto-instrumentations-node');

    // Fall back to the local Agent endpoint if the variable is unset
    const endpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318';

    const sdk = new NodeSDK({
      traceExporter: new OTLPTraceExporter({
        url: `${endpoint}/v1/traces`,
      }),
      instrumentations: [getNodeAutoInstrumentations()],
    });
    sdk.start();
  }
}
Step 3: Configure Environment
Add to .env.local:
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=my-nextjs-app
Step 4: Enable Instrumentation
In next.config.js (on Next.js 15 and later, instrumentation.ts is loaded by default and this flag can be omitted):
module.exports = {
  experimental: {
    instrumentationHook: true,
  },
}
#Detectors
AIA automatically detects the following error types:
1. HTTP 5xx Errors
Triggered when:
- Span status code is 2 (ERROR)
- HTTP status code is >= 500
Example trace:
{
  "name": "GET /api/users",
  "status": { "code": 2 },
  "attributes": {
    "http.status_code": 500
  }
}
2. Uncaught Exceptions
Triggered when:
- Span contains an exception event
- Event has exception.type and exception.message
Example trace:
{
  "events": [{
    "name": "exception",
    "attributes": {
      "exception.type": "TypeError",
      "exception.message": "Cannot read property 'x' of undefined",
      "exception.stacktrace": "..."
    }
  }]
}
3. Latency Spikes
Triggered when:
- Span duration > 2000ms
- Span status is ERROR
4. Process Crashes
Triggered when:
- Log contains crash patterns
- Keywords: "SIGTERM", "SIGKILL", "fatal error"
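Taken together, the four detectors above can be sketched as a pair of small functions. This is an illustration only, not the Agent's actual implementation: the `SimpleSpan` shape and function names are hypothetical, the conditions listed under each detector are assumed to all be required, and keyword matching is assumed to be case-sensitive. None of these assumptions are confirmed by this guide.

```typescript
// Simplified shapes for illustration; not the Agent's real types.
interface SpanEvent { name: string; attributes: Record<string, unknown>; }
interface SimpleSpan {
  statusCode: number;                  // 2 = ERROR
  durationMs: number;
  attributes: Record<string, unknown>;
  events: SpanEvent[];
}

type Detection = 'http_5xx' | 'uncaught_exception' | 'latency_spike';

const STATUS_ERROR = 2;
const LATENCY_THRESHOLD_MS = 2000;
const CRASH_KEYWORDS = ['SIGTERM', 'SIGKILL', 'fatal error'];

function detectSpan(span: SimpleSpan): Detection[] {
  const found: Detection[] = [];
  const isError = span.statusCode === STATUS_ERROR;
  const httpStatus = span.attributes['http.status_code'];

  // 1. HTTP 5xx: error status and status code >= 500 (assumed: both required)
  if (isError && typeof httpStatus === 'number' && httpStatus >= 500) {
    found.push('http_5xx');
  }

  // 2. Uncaught exception: an "exception" event carrying type and message
  const hasException = span.events.some(
    (e) =>
      e.name === 'exception' &&
      'exception.type' in e.attributes &&
      'exception.message' in e.attributes,
  );
  if (hasException) found.push('uncaught_exception');

  // 3. Latency spike: duration over threshold and error status (assumed: both required)
  if (isError && span.durationMs > LATENCY_THRESHOLD_MS) {
    found.push('latency_spike');
  }
  return found;
}

// 4. Process crash: a log line containing any crash keyword
function isCrashLog(line: string): boolean {
  return CRASH_KEYWORDS.some((keyword) => line.includes(keyword));
}
```

A single span can match more than one detector, which is why the sketch returns a list rather than a single result.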
#Deduplication
AIA deduplicates incidents using trace ID:
- Same trace ID within 30 seconds = same incident
- Prevents duplicate PRs for multi-faceted errors
- Example: HTTP 500 + exception in same trace = 1 incident
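The deduplication rule above can be sketched as follows. This is a minimal illustration, not the Agent's code: the class name is hypothetical, and the 30-second window is assumed to start at the first signal seen for a trace ID, which this guide does not specify.

```typescript
// Window length from this guide: 30 seconds
const DEDUP_WINDOW_MS = 30 * 1000;

// Hypothetical class for illustration; not an AIA API.
class IncidentDeduplicator {
  // trace ID -> timestamp (ms) of the signal that opened the incident
  private firstSeen = new Map<string, number>();

  /** Returns true if this signal should open a new incident. */
  shouldCreateIncident(traceId: string, nowMs: number): boolean {
    const first = this.firstSeen.get(traceId);
    // Same trace ID inside the window: fold into the existing incident
    if (first !== undefined && nowMs - first < DEDUP_WINDOW_MS) {
      return false;
    }
    // New trace ID, or the window has elapsed: open a new incident
    this.firstSeen.set(traceId, nowMs);
    return true;
  }
}
```

With this sketch, an HTTP 500 and an exception event arriving ten seconds apart on the same trace produce one incident, while the same two signals on different trace IDs produce two.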
#Testing
Using Sample App
The included sample app demonstrates OTEL integration:
# Trigger test error
curl -X POST http://localhost:3008/trigger \
  -H "Content-Type: application/json" \
  -d '{"action":"cause_error"}'
Manual Testing
Send a test trace:
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{
    "resourceSpans": [{
      "scopeSpans": [{
        "spans": [{
          "name": "test-error",
          "status": {"code": 2},
          "attributes": {
            "http.status_code": 500
          }
        }]
      }]
    }]
  }'
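The payload above is a simplified form. In the standard OTLP/JSON encoding, spans also carry hex-encoded `traceId` and `spanId` fields, and attributes are lists of key/value objects. Whether the Agent's receiver requires the stricter form is not stated in this guide, so the following is a sketch under that assumption; `buildTestTrace`, `randomHex`, and `sendTestTrace` are hypothetical names.

```typescript
// Generates a lowercase hex string of the given byte length
function randomHex(bytes: number): string {
  let out = '';
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, '0');
  }
  return out;
}

// Builds one error span in the standard OTLP/JSON shape
function buildTestTrace(serviceName: string) {
  const nowNs = `${Date.now()}000000`; // milliseconds -> nanoseconds, as a string
  return {
    resourceSpans: [{
      resource: {
        attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
      },
      scopeSpans: [{
        spans: [{
          traceId: randomHex(16), // 32 hex characters
          spanId: randomHex(8),   // 16 hex characters
          name: 'test-error',
          kind: 2,                // SPAN_KIND_SERVER
          startTimeUnixNano: nowNs,
          endTimeUnixNano: nowNs,
          status: { code: 2 },    // STATUS_CODE_ERROR
          attributes: [{ key: 'http.status_code', value: { intValue: 500 } }],
        }],
      }],
    }],
  };
}

// Posts the payload to the Agent endpoint from this guide; returns the HTTP status
async function sendTestTrace(): Promise<number> {
  const res = await fetch('http://localhost:4318/v1/traces', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildTestTrace('my-app')),
  });
  return res.status;
}
```

Because each call generates a fresh trace ID, repeated test runs are not folded together by deduplication.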
#Data Privacy
- AIA only analyzes error traces
- Non-error spans are ignored
- Sanitize PII before exporting (recommended)
- Use OTEL processors to filter sensitive data
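As one way to act on the last two points, sensitive attribute values can be masked before spans leave the process. The sketch below is a standalone function with a hypothetical deny-list of attribute keys; it is not an OpenTelemetry or AIA API, and wiring it into a custom span processor or exporter wrapper is left to the application.

```typescript
type Attributes = Record<string, unknown>;

// Hypothetical deny-list; adjust to the attributes your application emits.
const SENSITIVE_KEYS = ['user.email', 'http.request.header.authorization', 'db.statement'];

// Returns a copy of the attributes with sensitive values masked
function redactAttributes(
  attrs: Attributes,
  sensitive: string[] = SENSITIVE_KEYS,
): Attributes {
  const clean: Attributes = {};
  for (const [key, value] of Object.entries(attrs)) {
    clean[key] = sensitive.includes(key) ? '[REDACTED]' : value;
  }
  return clean;
}
```

Masking the value while keeping the key preserves the shape of the span, so fields such as `http.status_code` still reach the detectors unchanged.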
#Troubleshooting
"No traces received"
- Check endpoint: http://localhost:4318 (not 3001)
- Verify Agent service is running
- Check network connectivity
"Traces sent but no incidents"
- Ensure traces have error status (code: 2)
- Check HTTP status code is >= 500
- Verify exception events are formatted correctly
"Duplicate incidents"
- Check if traces have different trace IDs
- Deduplication window is 30 seconds
- Same trace ID = same incident
#Next Steps
- Architecture - How detection works
- Getting Started - Full setup guide
- Troubleshooting - Common issues