Data Flow

Understanding how data flows through AIA helps you debug issues, optimize performance, and extend the system.

#Complete Flow Overview

    ┌─────────────────────────────────────────────────────────────────┐
    │                        Your Application                         │
    │                   (with OTEL instrumentation)                   │
    └────────────────────────────┬────────────────────────────────────┘
                                 │
                                 │ HTTP POST /v1/traces
                                 │ (OpenTelemetry Protocol)
                                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                    Agent Service (Port 4318)                    │
    │  - Receives OTEL traces and logs                                │
    │  - Runs error detectors (HTTP 5xx, exceptions, latency)         │
    │  - Deduplicates by trace ID (30s window)                        │
    └────────────────────────────┬────────────────────────────────────┘
                                 │
                                 │ Detected incident
                                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                   Router Service (Port 3001)                    │
    │  - Enriches incident with code snapshots                        │
    │  - Extracts file paths from stack trace                         │
    │  - Reads relevant source code                                   │
    └────────────────────────────┬────────────────────────────────────┘
                                 │
                                 │ Store incident
                                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                    State Service (Port 3003)                    │
    │  - Stores incident in PostgreSQL                                │
    │  - Generates unique incident ID                                 │
    │  - Returns incident metadata                                    │
    └────────────────────────────┬────────────────────────────────────┘
                                 │
                                 │ Trigger analysis
                                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                   Autopsy Service (Port 3002)                   │
    │  - Calls You.com API with incident context                      │
    │  - Receives AI analysis:                                        │
    │    • Root cause explanation                                     │
    │    • Code patch (git diff)                                      │
    │    • AI fix prompt                                              │
    │    • Manual remediation steps                                   │
    │  - Validates and cleans patch                                   │
    └────────────────────────────┬────────────────────────────────────┘
                                 │
                                 │ Store autopsy result
                                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                    State Service (Port 3003)                    │
    │  - Stores autopsy result in PostgreSQL                          │
    │  - Links to incident via incident_id                            │
    └────────────────────────────┬────────────────────────────────────┘
                                 │
                                 │ Create PR
                                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                     Git Service (Port 3004)                     │
    │  - Clones GitHub repository                                     │
    │  - Creates branch: aia/incident-{id}                            │
    │  - Attempts to apply patch                                      │
    │  - Commits (or saves failed patch)                              │
    │  - Pushes to GitHub                                             │
    │  - Creates Pull Request                                         │
    └────────────────────────────┬────────────────────────────────────┘
                                 │
                                 │ PR created
                                 ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │                      Dashboard (Port 3000)                      │
    │  - Displays incident list                                       │
    │  - Shows autopsy results                                        │
    │  - Provides AI fix prompt (copy button)                         │
    │  - Links to GitHub PR                                           │
    │  - Generates PDF reports                                        │
    └─────────────────────────────────────────────────────────────────┘

#Detailed Flow Steps

Step 1: Error Occurs

Your application encounters an error:

    // Example: Null pointer error
    function getUser(id: string) {
      const user = db.query("SELECT * FROM users WHERE id = ?", [id]);
      return { name: user.name }; // ❌ user might be null
    }

Step 2: OTEL Trace Sent

OpenTelemetry SDK captures the error and sends a trace:

    {
      "resourceSpans": [{
        "scopeSpans": [{
          "spans": [{
            "traceId": "abc123...",
            "spanId": "def456...",
            "name": "GET /api/users/123",
            "status": { "code": 2 },
            "attributes": {
              "http.status_code": 500,
              "http.method": "GET",
              "http.url": "/api/users/123"
            },
            "events": [{
              "name": "exception",
              "attributes": {
                "exception.type": "TypeError",
                "exception.message": "Cannot read property 'name' of undefined",
                "exception.stacktrace": "at getUser (src/api/users.ts:13:25)..."
              }
            }]
          }]
        }]
      }]
    }

Step 3: Agent Detection

Agent service receives the trace and runs detectors:

    // HTTP Error Detector
    if (span.status?.code === 2 && statusCode >= 500) {
      return {
        triggered: true,
        reason: `HTTP ${statusCode} Error: ${span.name}`,
        type: "http_error"
      };
    }

    // Exception Detector
    if (event.name === "exception") {
      return {
        triggered: true,
        reason: `Uncaught Exception: ${type}: ${msg}`,
        type: "exception",
        stacktrace: stacktrace
      };
    }
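The overview lists latency among the detectors, but only the HTTP and exception detectors are shown. The following is a minimal sketch of what a latency detector could look like, not the Agent's actual code. It assumes the span carries the OTLP/JSON fields `startTimeUnixNano` and `endTimeUnixNano` (encoded as decimal strings); the threshold `LATENCY_THRESHOLD_MS`, the function name `detectLatency`, and the `"latency"` type label are hypothetical.

```typescript
// Hypothetical span shape: only the fields this sketch needs.
interface SpanTiming {
  name: string;
  startTimeUnixNano: string; // OTLP/JSON encodes 64-bit integers as strings
  endTimeUnixNano: string;
}

interface DetectionResult {
  triggered: boolean;
  reason?: string;
  type?: string;
}

// Assumed threshold; the real value is not stated in this document.
const LATENCY_THRESHOLD_MS = 2000;

function detectLatency(
  span: SpanTiming,
  thresholdMs: number = LATENCY_THRESHOLD_MS,
): DetectionResult {
  // Number() loses sub-microsecond precision on nanosecond timestamps,
  // which is harmless when comparing against millisecond thresholds.
  const durationNs = Number(span.endTimeUnixNano) - Number(span.startTimeUnixNano);
  const durationMs = Math.round(durationNs / 1e6);

  if (durationMs > thresholdMs) {
    return {
      triggered: true,
      reason: `Slow span (${durationMs}ms): ${span.name}`,
      type: "latency",
    };
  }
  return { triggered: false };
}
```

Passing the threshold as a parameter keeps the detector easy to tune per route or per service.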

Deduplication:

    // Use trace ID to prevent duplicates
    const dedupeKey = span.traceId;
    if (incidentCache.has(dedupeKey)) {
      console.log("Skipping duplicate incident");
      return;
    }
    incidentCache.set(dedupeKey, Date.now());
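The snippet above records a timestamp but never expires entries, while the overview describes a 30s deduplication window. One way to enforce that window is sketched below. The class name `IncidentDeduper` and the injectable `now` argument are assumptions made for illustration and testability, not part of the Agent's documented API.

```typescript
const DEDUPE_WINDOW_MS = 30 * 1000; // 30s window, per the flow overview

class IncidentDeduper {
  // traceId -> timestamp (ms) at which the trace was first seen
  private seen = new Map<string, number>();
  private windowMs: number;

  constructor(windowMs: number = DEDUPE_WINDOW_MS) {
    this.windowMs = windowMs;
  }

  /** Returns true if this trace ID was already seen inside the window. */
  isDuplicate(traceId: string, now: number = Date.now()): boolean {
    // Evict expired entries so the cache cannot grow without bound.
    this.seen.forEach((firstSeen, key) => {
      if (now - firstSeen >= this.windowMs) this.seen.delete(key);
    });

    if (this.seen.has(traceId)) return true;
    this.seen.set(traceId, now);
    return false;
  }
}
```

A caller would skip the incident whenever `isDuplicate(span.traceId)` returns true. Sweeping on every call is fine for small caches; a busier Agent might sweep on a timer instead.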

Step 4: Router Enrichment

Router extracts file information from stack trace:

    // Parse stack trace
    const stackLines = stacktrace.split('\n');
    const fileMatch = stackLines[0].match(/at .* \((.*):(\d+):(\d+)\)/);

    if (fileMatch) {
      const [_, filePath, line, col] = fileMatch;

      // Read source code
      const fileContent = fs.readFileSync(filePath, 'utf-8');

      // Create snapshot
      const snapshot = {
        path: filePath,
        content: fileContent,
        line_number: parseInt(line),
        column: parseInt(col)
      };
    }
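The snippet above inspects only the first stack line and expects the `at fn (path:line:col)` form. As a hedged sketch of a more tolerant parser, the function below walks every line, also accepts frames without a function name, and skips dependency and Node-internal frames. The name `parseStackFrames` and the filtering rules are assumptions, not the Router's documented behavior.

```typescript
interface StackFrame {
  path: string;
  line: number;
  column: number;
}

function parseStackFrames(stacktrace: string): StackFrame[] {
  const frames: StackFrame[] = [];
  // Matches both "at fn (path:line:col)" and "at path:line:col".
  const framePattern = /^\s*at (?:.*? \()?(.+?):(\d+):(\d+)\)?\s*$/;

  for (const rawLine of stacktrace.split("\n")) {
    const match = framePattern.exec(rawLine);
    if (!match) continue; // e.g. the leading "TypeError: ..." message line

    const path = match[1];
    // Assumed filter: third-party and runtime frames have no useful snapshot.
    if (path.includes("node_modules") || path.startsWith("node:")) continue;

    frames.push({
      path,
      line: parseInt(match[2], 10),
      column: parseInt(match[3], 10),
    });
  }
  return frames;
}
```

Each returned frame can then be turned into a snapshot exactly as in the original snippet.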

Step 5: State Persistence

State service stores incident in PostgreSQL:

    INSERT INTO incidents (
      id,
      trace_id,
      error_type,
      error_message,
      stack_trace,
      file_snapshots,
      created_at
    ) VALUES (
      'inc_abc123',
      'trace_abc123',
      'exception',
      'Cannot read property name of undefined',
      '...',
      '[{"path": "src/api/users.ts", ...}]',
      NOW()
    );

Step 6: AI Analysis

Autopsy service calls You.com API:

Request:

    {
      "input": "Analyze this error and provide a fix:\n\nError: Cannot read property 'name' of undefined\nLocation: src/api/users.ts:13\n\nCode:\n...\n\nProvide:\n1. Root cause\n2. Git diff patch\n3. AI fix prompt\n4. Manual steps",
      "agent": "express"
    }

Response:

    {
      "root_cause": "The user object is undefined when the database query returns null...",
      "patch": {
        "diff": "--- a/src/api/users.ts\n+++ b/src/api/users.ts\n..."
      },
      "ai_fix_prompt": "Fix the null pointer error in getUser function...",
      "manual_steps": [
        "Open src/api/users.ts",
        "Add null check after db.query",
        "..."
      ]
    }
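The overview states that the Autopsy service validates and cleans the patch but does not say how. The sketch below shows one plausible approach under stated assumptions: the model may wrap the diff in Markdown code fences or prepend commentary, and a usable unified diff needs a `--- ` header, a `+++ ` header, and at least one `@@` hunk. The function name `cleanPatch` and these rules are illustrative, not the service's actual checks.

```typescript
const FENCE = "`".repeat(3); // Markdown code-fence marker

/** Returns a cleaned unified diff, or null if the text does not look like one. */
function cleanPatch(raw: string): string | null {
  const lines = raw.replace(/\r\n/g, "\n").split("\n");

  // Diff lines always start with ' ', '+', '-', or '@', so a fence at
  // column 0 can only be wrapper markup and is safe to drop.
  const body = lines.filter((line) => !line.startsWith(FENCE));

  // Discard any commentary before the first file header.
  const start = body.findIndex((line) => line.startsWith("--- "));
  if (start === -1) return null;
  const diff = body.slice(start);

  const hasNewFileHeader = diff.length > 1 && diff[1].startsWith("+++ ");
  const hasHunk = diff.some((line) => line.startsWith("@@ "));
  if (!hasNewFileHeader || !hasHunk) return null;

  // Normalize to exactly one trailing newline.
  return diff.join("\n").replace(/\n*$/, "\n");
}
```

Returning `null` rather than throwing lets the caller fall back to the AI fix prompt and manual steps when no valid patch is available.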

Step 7: Autopsy Storage

The autopsy result is stored in the database:

    INSERT INTO autopsy_results (
      incident_id,
      root_cause,
      patch_diff,
      ai_fix_prompt,
      manual_steps,
      confidence_score,
      created_at
    ) VALUES (
      'inc_abc123',
      'The user object is undefined...',
      '--- a/src/api/users.ts...',
      'Fix the null pointer error...',
      '["Open src/api/users.ts", ...]',
      0.85,
      NOW()
    );

Step 8: Git Operations

Git service creates PR:

    # Clone repository and enter it
    git clone https://${GITHUB_TOKEN}@github.com/owner/repo.git
    cd repo

    # Create branch
    git checkout -b aia/incident-inc_abc123

    # Apply patch
    git apply --ignore-space-change patch.diff

    # If patch fails, save it
    if [ $? -ne 0 ]; then
      cp patch.diff patch_failed_$(date +%s).diff
      git add patch_failed_*.diff
    fi

    # Stage the applied changes and commit
    git add -A
    git commit -m "fix: resolve incident inc_abc123"

    # Push
    git push origin aia/incident-inc_abc123

    # Create PR via GitHub API
    curl -X POST https://api.github.com/repos/owner/repo/pulls \
      -H "Authorization: token ${GITHUB_TOKEN}" \
      -d '{
        "title": "fix: resolve incident inc_abc123",
        "head": "aia/incident-inc_abc123",
        "base": "main",
        "body": "..."
      }'

Step 9: Dashboard Display

Dashboard fetches and displays data:

    // Fetch incidents (fetch resolves to a Response, so parse the JSON body)
    const incidents = await fetch('http://localhost:3003/incidents')
      .then((res) => res.json());

    // Fetch autopsy for each incident
    const autopsy = await fetch(`http://localhost:3003/autopsy/${incident.id}`)
      .then((res) => res.json());

    // Display in UI
    <div class="incident">
      <h3>{incident.error_message}</h3>
      <p>{autopsy.root_cause}</p>
      <button onclick="copyToClipboard(autopsy.ai_fix_prompt)">
        Copy AI Fix Prompt
      </button>
      <a href="{incident.pr_url}">View PR on GitHub</a>
    </div>

#Data Storage

PostgreSQL Schema

incidents table:

    CREATE TABLE incidents (
      id VARCHAR(255) PRIMARY KEY,
      trace_id VARCHAR(255),
      error_type VARCHAR(50),
      error_message TEXT,
      stack_trace TEXT,
      file_snapshots JSONB,
      pr_url VARCHAR(500),
      created_at TIMESTAMP DEFAULT NOW()
    );

    CREATE INDEX idx_incidents_trace_id ON incidents(trace_id);
    CREATE INDEX idx_incidents_created_at ON incidents(created_at);

autopsy_results table:

    CREATE TABLE autopsy_results (
      id SERIAL PRIMARY KEY,
      incident_id VARCHAR(255) REFERENCES incidents(id),
      root_cause TEXT,
      patch_diff TEXT,
      ai_fix_prompt TEXT,
      manual_steps JSONB,
      confidence_score FLOAT,
      created_at TIMESTAMP DEFAULT NOW()
    );

    CREATE INDEX idx_autopsy_incident_id ON autopsy_results(incident_id);

Cloudflare R2 (Optional)

Used for backup storage:

    /autopsy_results/
    ├── inc_abc123.json
    ├── inc_def456.json
    └── ...
    /patches/
    ├── inc_abc123.diff
    ├── inc_def456.diff
    └── ...

Local Filesystem

    apps/git/git_workspace/
    └── {repo-name}/
        ├── .git/
        ├── src/
        └── ...
    logs/
    ├── agent.log
    ├── router.log
    ├── autopsy.log
    └── ...

#Performance Considerations

Latency Breakdown

| Step | Service | Typical Latency |
| :--- | :--- | :--- |
| OTEL trace sent | Application → Agent | 10-50ms |
| Detection | Agent | 5-10ms |
| Enrichment | Router | 50-100ms |
| State storage | State | 10-20ms |
| AI analysis | Autopsy → You.com | 2-5s |
| Autopsy storage | State | 10-20ms |
| Git operations | Git | 5-15s |
| PR creation | Git → GitHub | 1-2s |
| Total | | 8-22s |
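The Total row is the sum of the per-step ranges, assuming the steps run strictly one after another. The short sketch below reproduces that arithmetic from the table's figures; the type and function names are illustrative.

```typescript
interface StepLatency {
  step: string;
  minMs: number;
  maxMs: number;
}

// Values copied from the latency table, converted to milliseconds.
const steps: StepLatency[] = [
  { step: "OTEL trace sent", minMs: 10, maxMs: 50 },
  { step: "Detection", minMs: 5, maxMs: 10 },
  { step: "Enrichment", minMs: 50, maxMs: 100 },
  { step: "State storage", minMs: 10, maxMs: 20 },
  { step: "AI analysis", minMs: 2000, maxMs: 5000 },
  { step: "Autopsy storage", minMs: 10, maxMs: 20 },
  { step: "Git operations", minMs: 5000, maxMs: 15000 },
  { step: "PR creation", minMs: 1000, maxMs: 2000 },
];

function totalLatency(rows: StepLatency[]): { minMs: number; maxMs: number } {
  return rows.reduce(
    (total, row) => ({
      minMs: total.minMs + row.minMs,
      maxMs: total.maxMs + row.maxMs,
    }),
    { minMs: 0, maxMs: 0 },
  );
}

const total = totalLatency(steps);
console.log(`${total.minMs}ms to ${total.maxMs}ms`); // 8085ms to 22200ms
```

The sums come to roughly 8.1s and 22.2s, which the table rounds to 8-22s. Git operations and AI analysis together account for nearly all of it.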

Bottlenecks

  1. AI Analysis (2-5s)

    • Second-slowest step, after Git operations
    • Latency depends on the You.com API
    • Use the express agent for faster results
  2. Git Operations (5-15s)

    • Repository cloning
    • Patch application
    • Push to GitHub

Optimization Tips

  1. Cache git clones

    • Reuse existing clones
    • Only fetch updates
  2. Parallel processing

    • Multiple incidents can be processed simultaneously
    • Each service is stateless
  3. Database indexing

    • Index on trace_id for deduplication
    • Index on created_at for dashboard queries
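As an illustration of the first tip, the sketch below decides which git commands to run depending on whether a clone already exists in the workspace. It only builds the argument lists and leaves execution to the caller. The function name `syncRepoCommands`, the `cloneExists` flag, and the choice to hard-reset the base branch are assumptions, not the Git service's documented behavior.

```typescript
/**
 * Returns the git invocations (as argv arrays) needed to bring a workspace
 * up to date. `cloneExists` is passed in so the sketch needs no filesystem access.
 */
function syncRepoCommands(
  repoUrl: string,
  workspaceDir: string,
  cloneExists: boolean,
  baseBranch: string = "main",
): string[][] {
  if (!cloneExists) {
    // First incident for this repository: a full clone is unavoidable.
    return [["git", "clone", repoUrl, workspaceDir]];
  }
  // Later incidents: fetch only new objects, then reset the base branch
  // so leftovers from a previous incident branch are discarded.
  return [
    ["git", "-C", workspaceDir, "fetch", "origin", baseBranch],
    ["git", "-C", workspaceDir, "checkout", baseBranch],
    ["git", "-C", workspaceDir, "reset", "--hard", `origin/${baseBranch}`],
  ];
}
```

Note that `reset --hard` leaves untracked files in place, so a real implementation may also need to clean the working tree.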

#Error Handling

Retry Logic

Agent → Router:

  • Retries: 3
  • Backoff: Exponential (1s, 2s, 4s)

Autopsy → You.com:

  • Retries: 2
  • Timeout: 30s

Git → GitHub:

  • Retries: 3
  • Handles rate limits
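The Agent → Router policy above (3 retries with 1s, 2s, 4s backoff) can be sketched as follows. This reads "Retries: 3" as one initial attempt plus up to three retries, which is an assumption. The helper names `backoffDelays` and `withRetry` and the injectable `sleep` parameter are illustrative.

```typescript
/** Delay before each retry: baseMs, 2*baseMs, 4*baseMs, ... */
function backoffDelays(retries: number, baseMs: number = 1000): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(baseMs * Math.pow(2, attempt));
  }
  return delays;
}

async function withRetry<T>(
  operation: () => Promise<T>,
  retries: number = 3,
  baseMs: number = 1000,
  // Injectable so tests do not have to wait for real timers.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const delays = backoffDelays(retries, baseMs);
  let lastError: unknown;

  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Wait only if another attempt follows.
      if (attempt < retries) await sleep(delays[attempt]);
    }
  }
  throw lastError;
}
```

The same helper covers the other two policies by changing `retries`. A per-call timeout, such as the 30s limit for You.com, would be added inside `operation` itself.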

Failure Modes

| Failure | Handling |
| :--- | :--- |
| OTEL trace invalid | Log and skip |
| No file snapshot | Use stack trace only |
| AI API timeout | Retry, then fail gracefully |
| Patch application fails | Save as patch_failed_*.diff |
| PR creation fails | Log error, save to database |
| Database unavailable | Queue in memory, retry |
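The last row ("Queue in memory, retry") could be implemented along the lines of the sketch below. The class name `PendingWrites`, the size cap, and the drop-oldest policy are assumptions for illustration; the document does not specify how the real queue behaves.

```typescript
class PendingWrites<T> {
  private queue: T[] = [];
  private maxSize: number;

  constructor(maxSize: number = 1000) {
    this.maxSize = maxSize;
  }

  get size(): number {
    return this.queue.length;
  }

  enqueue(item: T): void {
    // Assumed policy: drop the oldest entry when full so memory stays bounded.
    if (this.queue.length >= this.maxSize) this.queue.shift();
    this.queue.push(item);
  }

  /**
   * Writes queued items in order and stops at the first failure.
   * Returns how many items were written; the rest stay queued.
   */
  async flush(write: (item: T) => Promise<void>): Promise<number> {
    let written = 0;
    while (this.queue.length > 0) {
      try {
        await write(this.queue[0]);
      } catch {
        break; // store still unavailable; retry on the next flush
      }
      this.queue.shift();
      written++;
    }
    return written;
  }
}
```

An in-memory queue is lost if the process restarts, so this protects only against short database outages.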

#Monitoring

Key Metrics

  • Incidents detected per hour
  • Patch success rate
  • Average time to PR
  • AI API latency
  • Database query time
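Two of these metrics, patch success rate and average time to PR, can be derived from per-incident records. The sketch below assumes a hypothetical `IncidentRecord` shape with detection and PR timestamps; these field names do not come from the schema shown earlier.

```typescript
// Hypothetical record shape, for illustration only.
interface IncidentRecord {
  detectedAtMs: number;
  prCreatedAtMs: number | null; // null when no PR was opened
  patchApplied: boolean;
}

interface PipelineSummary {
  patchSuccessRate: number | null; // fraction between 0 and 1
  avgTimeToPrMs: number | null;
}

function summarize(records: IncidentRecord[]): PipelineSummary {
  if (records.length === 0) {
    return { patchSuccessRate: null, avgTimeToPrMs: null };
  }

  const applied = records.filter((r) => r.patchApplied).length;

  // Only incidents that reached a PR contribute to the average.
  const durations: number[] = [];
  for (const r of records) {
    if (r.prCreatedAtMs !== null) durations.push(r.prCreatedAtMs - r.detectedAtMs);
  }
  const avg =
    durations.length === 0
      ? null
      : durations.reduce((sum, d) => sum + d, 0) / durations.length;

  return { patchSuccessRate: applied / records.length, avgTimeToPrMs: avg };
}
```

Returning `null` for empty inputs avoids reporting a misleading 0% success rate before any incident has been processed.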

Logging

Each service logs to console:

    [Agent] Incident detected: HTTP 500 Error
    [Router] Enriched incident inc_abc123
    [State] Stored incident inc_abc123
    [Autopsy] Analysis complete for inc_abc123
    [Git] PR created: https://github.com/owner/repo/pull/123
    [Dashboard] Displaying 5 incidents

#Next Steps