OpenTelemetry Integration

AIA uses OpenTelemetry to receive traces and logs from your application, automatically detecting errors and triggering incident analysis.

#Overview

The Agent service runs an OpenTelemetry receiver that accepts OTLP over HTTP on port 4318. Your application sends telemetry data to this endpoint, and AIA automatically detects:

  • HTTP 5xx errors
  • Uncaught exceptions
  • Latency spikes (>2000ms)
  • Process crashes (via log patterns)

#Quick Setup

Environment Variables

Set these in your application's environment:

Terminal
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_SERVICE_NAME="my-app"

Important: The endpoint is port 4318 (Agent service), not 3001 (Router).
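When the exporter URL is set in code rather than through the environment, the trace path /v1/traces has to be appended to the base endpoint. A minimal sketch of that step follows; the helper name tracesUrl and the localhost fallback are illustrative assumptions, not part of AIA.

```typescript
// Hypothetical helper: derive the OTLP/HTTP traces URL from a base endpoint.
// The fallback mirrors the Agent port documented above (4318).
const DEFAULT_ENDPOINT = 'http://localhost:4318';

function tracesUrl(base?: string): string {
  // Fall back when the variable is unset or blank.
  const endpoint = base && base.trim() !== '' ? base.trim() : DEFAULT_ENDPOINT;
  // Strip trailing slashes so the path is not doubled ("//v1/traces").
  return endpoint.replace(/\/+$/, '') + '/v1/traces';
}
```

In an instrumentation file this could be called as tracesUrl(process.env.OTEL_EXPORTER_OTLP_ENDPOINT) and passed to the exporter's url option.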

#Node.js / Bun / Express

Step 1: Install Dependencies

Terminal
bun add @opentelemetry/api @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http

Step 2: Create Instrumentation File

Create instrumentation.ts:

Terminal
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // AIA Agent endpoint
    url: 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
  serviceName: process.env.OTEL_SERVICE_NAME || 'my-app',
});

sdk.start();

// Graceful shutdown
process.on('SIGTERM', () => {
  sdk.shutdown()
    .then(() => console.log('Tracing terminated'))
    .catch((error) => console.log('Error terminating tracing', error))
    .finally(() => process.exit(0));
});

Step 3: Run Your Application

Terminal
bun run -r ./instrumentation.ts server.ts

#Next.js

Step 1: Install Dependencies

Terminal
bun add @opentelemetry/api @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http

Step 2: Create Instrumentation

Create instrumentation.ts in your project root (or src/):

Terminal
export async function register() {
  // Only start the SDK in the Node.js runtime (not the Edge runtime)
  if (process.env.NEXT_RUNTIME === 'nodejs') {
    const { NodeSDK } = await import('@opentelemetry/sdk-node');
    const { OTLPTraceExporter } = await import('@opentelemetry/exporter-trace-otlp-http');
    const { getNodeAutoInstrumentations } = await import('@opentelemetry/auto-instrumentations-node');

    // Fall back to the local Agent so an unset variable does not yield "undefined/v1/traces"
    const endpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4318';

    const sdk = new NodeSDK({
      traceExporter: new OTLPTraceExporter({
        url: endpoint + '/v1/traces',
      }),
      instrumentations: [getNodeAutoInstrumentations()],
    });

    sdk.start();
  }
}

Step 3: Configure Environment

Add to .env.local:

Terminal
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_SERVICE_NAME=my-nextjs-app

Step 4: Enable Instrumentation

In next.config.js (needed on Next.js 14 and earlier; Next.js 15 and later load instrumentation.ts without this flag):

Terminal
module.exports = {
  experimental: {
    instrumentationHook: true,
  },
}

#Detectors

AIA automatically detects the following error types:

1. HTTP 5xx Errors

Triggered when:

  • Span status code is 2 (ERROR)
  • HTTP status code is >= 500

Example trace:

Terminal
{
  "name": "GET /api/users",
  "status": { "code": 2 },
  "attributes": {
    "http.status_code": 500
  }
}
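The rule can be read as a predicate over a span. The sketch below is illustrative only: the SimpleSpan shape mirrors the simplified example trace rather than AIA's internal types, and requiring both conditions at once is an assumption.

```typescript
// Simplified span shape matching the example trace (an assumption: the
// real receiver may use the full OTLP attribute encoding instead).
interface SimpleSpan {
  name: string;
  status?: { code?: number };
  attributes?: Record<string, unknown>;
}

const STATUS_CODE_ERROR = 2; // OpenTelemetry status code for ERROR

// True when the span has ERROR status and an HTTP status code >= 500.
// Treating the two bullets as a conjunction is an assumption of this sketch.
function isHttp5xx(span: SimpleSpan): boolean {
  // Number(undefined) is NaN, and NaN >= 500 is false, so a missing
  // attribute never matches.
  const httpStatus = Number(span.attributes?.['http.status_code']);
  return span.status?.code === STATUS_CODE_ERROR && httpStatus >= 500;
}
```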

2. Uncaught Exceptions

Triggered when:

  • Span contains an exception event
  • Event has exception.type and exception.message

Example trace:

Terminal
{
  "events": [{
    "name": "exception",
    "attributes": {
      "exception.type": "TypeError",
      "exception.message": "Cannot read property 'x' of undefined",
      "exception.stacktrace": "..."
    }
  }]
}
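A detector for this rule needs to find the first event named exception that carries both required attributes. The following is a rough sketch under the simplified shape used in the example trace; the type and function names are hypothetical.

```typescript
// Simplified shapes mirroring the example trace (illustrative only).
interface SpanEvent {
  name: string;
  attributes?: Record<string, unknown>;
}

interface SpanWithEvents {
  events?: SpanEvent[];
}

interface ExceptionInfo {
  type: string;
  message: string;
  stacktrace?: string;
}

// Returns the first exception event that has both exception.type and
// exception.message as strings, or undefined when none qualifies.
function findException(span: SpanWithEvents): ExceptionInfo | undefined {
  for (const event of span.events ?? []) {
    if (event.name !== 'exception') continue;
    const type = event.attributes?.['exception.type'];
    const message = event.attributes?.['exception.message'];
    if (typeof type === 'string' && typeof message === 'string') {
      const info: ExceptionInfo = { type, message };
      const stack = event.attributes?.['exception.stacktrace'];
      if (typeof stack === 'string') info.stacktrace = stack;
      return info;
    }
  }
  return undefined;
}
```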

3. Latency Spikes

Triggered when:

  • Span duration > 2000ms
  • Span status is ERROR
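Span duration is not a field of its own in OTLP; it is the difference between endTimeUnixNano and startTimeUnixNano, which OTLP/JSON carries as decimal strings. A small sketch of the check follows. The list above does not say whether both conditions must hold, so this sketch assumes they do; the function names are hypothetical.

```typescript
const LATENCY_THRESHOLD_MS = 2000;

interface TimedSpan {
  startTimeUnixNano: string; // nanoseconds since the Unix epoch, as a decimal string
  endTimeUnixNano: string;
  status?: { code?: number };
}

// Convert a nanosecond string to whole milliseconds by dropping the last
// six digits. This avoids the precision loss of parsing ~1e18 as a double,
// at the cost of sub-millisecond accuracy.
function nanosToMillis(nanos: string): number {
  return nanos.length > 6 ? Number(nanos.slice(0, -6)) : 0;
}

function durationMs(span: TimedSpan): number {
  return nanosToMillis(span.endTimeUnixNano) - nanosToMillis(span.startTimeUnixNano);
}

// Assumption: a spike requires both a long duration and ERROR status (code 2).
function isLatencySpike(span: TimedSpan): boolean {
  return durationMs(span) > LATENCY_THRESHOLD_MS && span.status?.code === 2;
}
```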

4. Process Crashes

Triggered when:

  • Log contains crash patterns
  • Keywords: "SIGTERM", "SIGKILL", "fatal error"
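Keyword matching of this kind can be sketched with a short list of regular expressions. The case sensitivity chosen here (exact case for signal names, case-insensitive for "fatal error") is an assumption, as is the function name.

```typescript
// Patterns taken from the keyword list above. Case handling is an
// assumption of this sketch, not documented AIA behavior.
const CRASH_PATTERNS: RegExp[] = [/SIGTERM/, /SIGKILL/, /fatal error/i];

// Returns the matched text for the first pattern found in the log body,
// or undefined when the log line does not look like a crash.
function matchCrashPattern(logBody: string): string | undefined {
  for (const pattern of CRASH_PATTERNS) {
    const match = pattern.exec(logBody);
    if (match) return match[0];
  }
  return undefined;
}
```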

#Deduplication

AIA deduplicates incidents using trace ID:

  • Same trace ID within 30 seconds = same incident
  • Prevents duplicate PRs for multi-faceted errors
  • Example: HTTP 500 + exception in same trace = 1 incident
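One way to picture the window is a map from trace ID to the time an incident was first opened. This is a sketch, not AIA's implementation: the class name is hypothetical, and it assumes the 30 seconds are measured from the first sighting rather than sliding with each repeat.

```typescript
const DEDUP_WINDOW_MS = 30 * 1000;

class IncidentDeduplicator {
  // trace ID -> timestamp (ms) at which the current incident was opened
  private firstSeen = new Map<string, number>();

  // Returns true when this trace ID should open a new incident, and false
  // when it falls inside the window of an incident that is already open.
  shouldCreateIncident(traceId: string, nowMs: number): boolean {
    const openedAt = this.firstSeen.get(traceId);
    if (openedAt !== undefined && nowMs - openedAt < DEDUP_WINDOW_MS) {
      return false;
    }
    this.firstSeen.set(traceId, nowMs);
    return true;
  }
}
```

With this shape, an HTTP 500 detection and an exception detection that share a trace ID and arrive a few seconds apart produce a single incident, because the second call returns false.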

#Testing

Using Sample App

The included sample app demonstrates OTEL integration:

Terminal
# Trigger test error
curl -X POST http://localhost:3008/trigger \
  -H "Content-Type: application/json" \
  -d '{"action":"cause_error"}'

Manual Testing

Send a test trace:

Terminal
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{
    "resourceSpans": [{
      "scopeSpans": [{
        "spans": [{
          "name": "test-error",
          "status": {"code": 2},
          "attributes": { "http.status_code": 500 }
        }]
      }]
    }]
  }'
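The payload above uses a simplified attribute map. The OTLP/JSON encoding defined by the OpenTelemetry specification represents attributes as an array of key/value objects and expects hex-encoded trace and span IDs plus start and end timestamps. Whether the AIA receiver requires that stricter form is not stated here, so treat the following as an optional sketch for building a spec-shaped test payload; buildTestTrace and randomHex are hypothetical helper names.

```typescript
// Generate a random lowercase hex string of the given length.
function randomHex(length: number): string {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Build an OTLP/JSON trace payload containing one failing server span.
// nowMs must be an integer number of milliseconds since the Unix epoch.
function buildTestTrace(serviceName: string, nowMs: number) {
  return {
    resourceSpans: [{
      resource: {
        attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
      },
      scopeSpans: [{
        scope: { name: 'manual-test' },
        spans: [{
          traceId: randomHex(32), // 16 bytes, hex-encoded
          spanId: randomHex(16),  // 8 bytes, hex-encoded
          name: 'test-error',
          kind: 2, // SPAN_KIND_SERVER
          // Nanosecond timestamps are carried as decimal strings.
          startTimeUnixNano: `${nowMs}000000`,
          endTimeUnixNano: `${nowMs + 50}000000`,
          status: { code: 2 }, // ERROR
          attributes: [{ key: 'http.status_code', value: { intValue: '500' } }],
        }],
      }],
    }],
  };
}
```

The result can be serialized with JSON.stringify and sent as the body of the same POST request to http://localhost:4318/v1/traces.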

#Data Privacy

  • AIA only analyzes error traces
  • Non-error spans are ignored
  • Sanitize PII before exporting (recommended)
  • Use OTEL processors to filter sensitive data
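As an illustration of the sanitizing step, the sketch below redacts attribute values whose keys look sensitive before they leave the process. It is a plain function rather than an OpenTelemetry processor, and the key pattern is an assumption to adapt to your own data.

```typescript
// Keys matching this pattern are treated as sensitive. The list is an
// illustrative assumption, not an exhaustive PII definition.
const SENSITIVE_KEY = /(password|secret|token|authorization|cookie|email)/i;

// Returns a copy of the attributes with sensitive values replaced.
// The input object is not modified.
function redactAttributes(attrs: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(attrs)) {
    out[key] = SENSITIVE_KEY.test(key) ? '[REDACTED]' : attrs[key];
  }
  return out;
}
```

A function like this could be applied wherever your code sets span attributes, or inside a custom exporter wrapper, so that raw values never reach the Agent.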

#Troubleshooting

"No traces received"

  • Check endpoint: http://localhost:4318 (not 3001)
  • Verify Agent service is running
  • Check network connectivity
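The first check can be automated with a small validation of the configured endpoint string. This is a hypothetical helper, not an AIA tool; it only inspects the URL text (no network call) and does not handle IPv6 literals.

```typescript
// Inspect an endpoint string and return a list of likely problems.
// An empty list means nothing suspicious was found.
function checkEndpoint(endpoint: string): string[] {
  const problems: string[] = [];
  // scheme://host[:port][/path]
  const match = /^(https?):\/\/([^\/:]+)(?::(\d+))?(\/.*)?$/.exec(endpoint.trim());
  if (!match) {
    problems.push('endpoint is not an http(s) URL');
    return problems;
  }
  const port = match[3];
  if (port === undefined) {
    problems.push('no port given; the Agent listens on 4318');
  } else if (port === '3001') {
    problems.push('port 3001 is the Router; the Agent listens on 4318');
  }
  return problems;
}
```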

"Traces sent but no incidents"

  • Ensure traces have error status (code: 2)
  • Check HTTP status code is >= 500
  • Verify exception events are formatted correctly

"Duplicate incidents"

  • Check if traces have different trace IDs
  • Deduplication window is 30 seconds
  • Same trace ID = same incident

#Next Steps