Data Flow
Understanding how data flows through AIA helps you debug issues, optimize performance, and extend the system.
# Complete Flow Overview
┌─────────────────────────────────────────────────────────────────┐
│ Your Application │
│ (with OTEL instrumentation) │
└────────────────────────────┬────────────────────────────────────┘
│
│ HTTP POST /v1/traces
│ (OpenTelemetry Protocol)
▼
┌─────────────────────────────────────────────────────────────────┐
│ Agent Service (Port 4318) │
│ - Receives OTEL traces and logs │
│ - Runs error detectors (HTTP 5xx, exceptions, latency) │
│ - Deduplicates by trace ID (30s window) │
└────────────────────────────┬────────────────────────────────────┘
│
│ Detected incident
▼
┌─────────────────────────────────────────────────────────────────┐
│ Router Service (Port 3001) │
│ - Enriches incident with code snapshots │
│ - Extracts file paths from stack trace │
│ - Reads relevant source code │
└────────────────────────────┬────────────────────────────────────┘
│
│ Store incident
▼
┌─────────────────────────────────────────────────────────────────┐
│ State Service (Port 3003) │
│ - Stores incident in PostgreSQL │
│ - Generates unique incident ID │
│ - Returns incident metadata │
└────────────────────────────┬────────────────────────────────────┘
│
│ Trigger analysis
▼
┌─────────────────────────────────────────────────────────────────┐
│ Autopsy Service (Port 3002) │
│ - Calls You.com API with incident context │
│ - Receives AI analysis: │
│ • Root cause explanation │
│ • Code patch (git diff) │
│ • AI fix prompt │
│ • Manual remediation steps │
│ - Validates and cleans patch │
└────────────────────────────┬────────────────────────────────────┘
│
│ Store autopsy result
▼
┌─────────────────────────────────────────────────────────────────┐
│ State Service (Port 3003) │
│ - Stores autopsy result in PostgreSQL │
│ - Links to incident via incident_id │
└────────────────────────────┬────────────────────────────────────┘
│
│ Create PR
▼
┌─────────────────────────────────────────────────────────────────┐
│ Git Service (Port 3004) │
│ - Clones GitHub repository │
│ - Creates branch: aia/incident-{id} │
│ - Attempts to apply patch │
│ - Commits (or saves failed patch) │
│ - Pushes to GitHub │
│ - Creates Pull Request │
└────────────────────────────┬────────────────────────────────────┘
│
│ PR created
▼
┌─────────────────────────────────────────────────────────────────┐
│ Dashboard (Port 3000) │
│ - Displays incident list │
│ - Shows autopsy results │
│ - Provides AI fix prompt (copy button) │
│ - Links to GitHub PR │
│ - Generates PDF reports │
└─────────────────────────────────────────────────────────────────┘
# Detailed Flow Steps
Step 1: Error Occurs
Your application encounters an error:
// Example: Null pointer error
function getUser(id: string) {
  const user = db.query("SELECT * FROM users WHERE id = ?", [id]);
  return { name: user.name }; // ❌ user might be null
}
Step 2: OTEL Trace Sent
OpenTelemetry SDK captures the error and sends a trace:
{
  "resourceSpans": [{
    "scopeSpans": [{
      "spans": [{
        "traceId": "abc123...",
        "spanId": "def456...",
        "name": "GET /api/users/123",
        "status": { "code": 2 },
        "attributes": {
          "http.status_code": 500,
          "http.method": "GET",
          "http.url": "/api/users/123"
        },
        "events": [{
          "name": "exception",
          "attributes": {
            "exception.type": "TypeError",
            "exception.message": "Cannot read property 'name' of undefined",
            "exception.stacktrace": "at getUser (src/api/users.ts:13:25)..."
          }
        }]
      }]
    }]
  }]
}
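If you want to exercise the pipeline without instrumenting an app, a payload of this shape can be built and posted to the Agent's OTLP endpoint by hand. The sketch below follows the simplified JSON shape shown above (a real OTLP exporter encodes attributes slightly differently); the helper name and input type are illustrative, not part of AIA.

```typescript
// Illustrative helper: builds a minimal trace payload in the shape shown above.
interface ErrorSpanInput {
  traceId: string;
  spanId: string;
  name: string;
  httpStatus: number;
  exceptionType: string;
  exceptionMessage: string;
  stacktrace: string;
}

function buildOtlpPayload(input: ErrorSpanInput) {
  return {
    resourceSpans: [{
      scopeSpans: [{
        spans: [{
          traceId: input.traceId,
          spanId: input.spanId,
          name: input.name,
          status: { code: 2 }, // 2 = error status
          attributes: { "http.status_code": input.httpStatus },
          events: [{
            name: "exception",
            attributes: {
              "exception.type": input.exceptionType,
              "exception.message": input.exceptionMessage,
              "exception.stacktrace": input.stacktrace,
            },
          }],
        }],
      }],
    }],
  };
}

// Posting it to the Agent service (port 4318, as in the diagram):
// await fetch("http://localhost:4318/v1/traces", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildOtlpPayload(...)),
// });
```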
Step 3: Agent Detection
Agent service receives the trace and runs detectors:
// HTTP Error Detector
if (span.status?.code === 2 && statusCode >= 500) {
  return {
    triggered: true,
    reason: `HTTP ${statusCode} Error: ${span.name}`,
    type: "http_error"
  };
}

// Exception Detector
if (event.name === "exception") {
  return {
    triggered: true,
    reason: `Uncaught Exception: ${type}: ${msg}`,
    type: "exception",
    stacktrace: stacktrace
  };
}
Deduplication:
// Use the trace ID to suppress duplicates within the 30s window
const DEDUPE_WINDOW_MS = 30_000;
const dedupeKey = span.traceId;
const lastSeen = incidentCache.get(dedupeKey);
if (lastSeen && Date.now() - lastSeen < DEDUPE_WINDOW_MS) {
  console.log("Skipping duplicate incident");
  return;
}
incidentCache.set(dedupeKey, Date.now());
Step 4: Router Enrichment
Router extracts file information from stack trace:
// Parse stack trace
const stackLines = stacktrace.split('\n');
const fileMatch = stackLines[0].match(/at .* \((.*):(\d+):(\d+)\)/);

if (fileMatch) {
  const [, filePath, line, col] = fileMatch;

  // Read source code
  const fileContent = fs.readFileSync(filePath, 'utf-8');

  // Create snapshot
  const snapshot = {
    path: filePath,
    content: fileContent,
    line_number: parseInt(line),
    column: parseInt(col)
  };
}
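The frame-parsing step above can be isolated into a small, testable function. This is a self-contained sketch (the function name and return type are ours, not AIA's): given a V8-style stack line such as `at getUser (src/api/users.ts:13:25)`, it returns the file path and position, or `null` when the line doesn't match.

```typescript
interface FrameInfo {
  path: string;
  line: number;
  column: number;
}

// Parse one V8-style stack frame; returns null for lines that don't match.
function parseStackFrame(stackLine: string): FrameInfo | null {
  const m = stackLine.trim().match(/at .* \((.*):(\d+):(\d+)\)/);
  if (!m) return null;
  return {
    path: m[1],
    line: parseInt(m[2], 10),
    column: parseInt(m[3], 10),
  };
}

// parseStackFrame("at getUser (src/api/users.ts:13:25)")
// → { path: "src/api/users.ts", line: 13, column: 25 }
```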
Step 5: State Persistence
State service stores incident in PostgreSQL:
INSERT INTO incidents (
  id,
  trace_id,
  error_type,
  error_message,
  stack_trace,
  file_snapshots,
  created_at
) VALUES (
  'inc_abc123',
  'trace_abc123',
  'exception',
  'Cannot read property name of undefined',
  '...',
  '[{"path": "src/api/users.ts", ...}]',
  NOW()
);
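In application code the same insert would be issued as a parameterized query rather than interpolating error text into SQL. A sketch in the node-postgres style (`$n` placeholders); the `Incident` shape and helper name are illustrative, while the columns mirror the statement above.

```typescript
interface Incident {
  id: string;
  traceId: string;
  errorType: string;
  errorMessage: string;
  stackTrace: string;
  fileSnapshots: unknown[];
}

// Build a parameterized insert; pass the result to client.query(...) with pg.
function buildIncidentInsert(inc: Incident) {
  return {
    text: `INSERT INTO incidents
             (id, trace_id, error_type, error_message, stack_trace, file_snapshots, created_at)
           VALUES ($1, $2, $3, $4, $5, $6, NOW())`,
    values: [
      inc.id,
      inc.traceId,
      inc.errorType,
      inc.errorMessage,
      inc.stackTrace,
      JSON.stringify(inc.fileSnapshots), // stored as JSONB
    ],
  };
}
```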
Step 6: AI Analysis
Autopsy service calls You.com API:
Request:
{
  "input": "Analyze this error and provide a fix:\n\nError: Cannot read property 'name' of undefined\nLocation: src/api/users.ts:13\n\nCode:\n...\n\nProvide:\n1. Root cause\n2. Git diff patch\n3. AI fix prompt\n4. Manual steps",
  "agent": "express"
}
Response:
{
  "root_cause": "The user object is undefined when the database query returns null...",
  "patch": {
    "diff": "--- a/src/api/users.ts\n+++ b/src/api/users.ts\n..."
  },
  "ai_fix_prompt": "Fix the null pointer error in getUser function...",
  "manual_steps": [
    "Open src/api/users.ts",
    "Add null check after db.query",
    "..."
  ]
}
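The flow overview notes that the Autopsy service "validates and cleans" the patch before storing it. A minimal sketch of what such a check could look like (field names follow the response above; the function name and acceptance rules are illustrative assumptions): reject results that lack a root cause or whose diff doesn't look like a unified diff.

```typescript
interface AutopsyResult {
  root_cause: string;
  patch?: { diff?: string };
  ai_fix_prompt: string;
  manual_steps: string[];
}

// Accept an AI result only if the required fields are present and the
// patch at least resembles a unified diff (has old/new file headers).
function isUsableAutopsy(r: AutopsyResult): boolean {
  if (!r.root_cause || !r.ai_fix_prompt) return false;
  const diff = r.patch?.diff ?? "";
  return diff.includes("--- a/") && diff.includes("+++ b/");
}
```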
Step 7: Autopsy Storage
Autopsy result stored in database:
INSERT INTO autopsy_results (
  incident_id,
  root_cause,
  patch_diff,
  ai_fix_prompt,
  manual_steps,
  confidence_score,
  created_at
) VALUES (
  'inc_abc123',
  'The user object is undefined...',
  '--- a/src/api/users.ts...',
  'Fix the null pointer error...',
  '["Open src/api/users.ts", ...]',
  0.85,
  NOW()
);
Step 8: Git Operations
Git service creates PR:
# Clone repository
git clone https://${GITHUB_TOKEN}@github.com/owner/repo.git

# Create branch
git checkout -b aia/incident-inc_abc123

# Apply patch; if it fails, save it for the PR instead
if ! git apply --ignore-space-change patch.diff; then
  cp patch.diff patch_failed_$(date +%s).diff
  git add patch_failed_*.diff
fi

# Commit
git commit -m "fix: resolve incident inc_abc123"

# Push
git push origin aia/incident-inc_abc123

# Create PR via GitHub API
curl -X POST https://api.github.com/repos/owner/repo/pulls \
  -H "Authorization: token ${GITHUB_TOKEN}" \
  -d '{
    "title": "fix: resolve incident inc_abc123",
    "head": "aia/incident-inc_abc123",
    "base": "main",
    "body": "..."
  }'
Step 9: Dashboard Display
Dashboard fetches and displays data:
// Fetch incidents
const incidents = await (await fetch('http://localhost:3003/incidents')).json();

// Fetch autopsy for each incident
const autopsy = await (await fetch(`http://localhost:3003/autopsy/${incident.id}`)).json();

// Display in UI (JSX)
<div className="incident">
  <h3>{incident.error_message}</h3>
  <p>{autopsy.root_cause}</p>
  <button onClick={() => copyToClipboard(autopsy.ai_fix_prompt)}>
    Copy AI Fix Prompt
  </button>
  <a href={incident.pr_url}>View PR on GitHub</a>
</div>
# Data Storage
PostgreSQL Schema
incidents table:
CREATE TABLE incidents (
  id VARCHAR(255) PRIMARY KEY,
  trace_id VARCHAR(255),
  error_type VARCHAR(50),
  error_message TEXT,
  stack_trace TEXT,
  file_snapshots JSONB,
  pr_url VARCHAR(500),
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_incidents_trace_id ON incidents(trace_id);
CREATE INDEX idx_incidents_created_at ON incidents(created_at);
autopsy_results table:
CREATE TABLE autopsy_results (
  id SERIAL PRIMARY KEY,
  incident_id VARCHAR(255) REFERENCES incidents(id),
  root_cause TEXT,
  patch_diff TEXT,
  ai_fix_prompt TEXT,
  manual_steps JSONB,
  confidence_score FLOAT,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_autopsy_incident_id ON autopsy_results(incident_id);
Cloudflare R2 (Optional)
Used for backup storage:
/autopsy_results/
├── inc_abc123.json
├── inc_def456.json
└── ...
/patches/
├── inc_abc123.diff
├── inc_def456.diff
└── ...
Local Filesystem
apps/git/git_workspace/
└── {repo-name}/
├── .git/
├── src/
└── ...
logs/
├── agent.log
├── router.log
├── autopsy.log
└── ...
# Performance Considerations
Latency Breakdown
| Step | Service | Typical Latency |
| :--- | :--- | :--- |
| OTEL trace sent | Application → Agent | 10-50ms |
| Detection | Agent | 5-10ms |
| Enrichment | Router | 50-100ms |
| State storage | State | 10-20ms |
| AI analysis | Autopsy → You.com | 2-5s |
| Autopsy storage | State | 10-20ms |
| Git operations | Git | 5-15s |
| PR creation | Git → GitHub | 1-2s |
| **Total** | | 8-22s |
Bottlenecks

- AI analysis (2-5s)
  - Slowest step
  - Depends on the You.com API
  - Use the `express` agent for faster results
- Git operations (5-15s)
  - Repository cloning
  - Patch application
  - Push to GitHub

Optimization Tips

- Cache git clones
  - Reuse existing clones
  - Only fetch updates
- Parallel processing
  - Multiple incidents can be processed simultaneously
  - Each service is stateless
- Database indexing
  - Index on `trace_id` for deduplication
  - Index on `created_at` for dashboard queries
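"Cache git clones" can be as simple as cloning only when no working copy exists yet, and otherwise fetching updates into the cached clone. A sketch of that decision with the command runner and existence check injected so it stays testable; the helper name, signature, and the assumption of a `main` default branch are ours, not AIA's.

```typescript
// Executes one shell command; throws on non-zero exit in the real service.
type Exec = (cmd: string) => void;

// Clone on first use; afterwards reuse the cached clone and only fetch.
function syncRepo(
  exec: Exec,
  exists: (path: string) => boolean,
  url: string,
  dir: string,
): "cloned" | "fetched" {
  if (!exists(dir)) {
    exec(`git clone ${url} ${dir}`);
    return "cloned";
  }
  // Cached clone: download only new objects, then match the remote branch.
  exec(`git -C ${dir} fetch origin`);
  exec(`git -C ${dir} reset --hard origin/main`);
  return "fetched";
}
```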
# Error Handling
Retry Logic
Agent → Router:
- Retries: 3
- Backoff: Exponential (1s, 2s, 4s)
Autopsy → You.com:
- Retries: 2
- Timeout: 30s
Git → GitHub:
- Retries: 3
- Handles rate limits
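The retry schedules above (e.g. 3 retries with 1s, 2s, 4s exponential backoff for Agent → Router) can be expressed as one generic helper. A sketch with the sleep function injected so the schedule itself is testable; the helper name and defaults are illustrative.

```typescript
// Retry fn up to `retries` extra times with exponential backoff
// (baseMs, 2*baseMs, 4*baseMs, ...); rethrows the last error on exhaustion.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt < retries) await sleep(baseMs * 2 ** attempt); // 1s, 2s, 4s
    }
  }
  throw lastErr;
}

// Usage: await withRetry(() => postIncidentToRouter(incident));
```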
Failure Modes
| Failure | Handling |
| :--- | :--- |
| OTEL trace invalid | Log and skip |
| No file snapshot | Use stack trace only |
| AI API timeout | Retry, then fail gracefully |
| Patch application fails | Save as patch_failed_*.diff |
| PR creation fails | Log error, save to database |
| Database unavailable | Queue in memory, retry |
# Monitoring
Key Metrics
- Incidents detected per hour
- Patch success rate
- Average time to PR
- AI API latency
- Database query time
Logging
Each service logs to console:
[Agent] Incident detected: HTTP 500 Error
[Router] Enriched incident inc_abc123
[State] Stored incident inc_abc123
[Autopsy] Analysis complete for inc_abc123
[Git] PR created: https://github.com/owner/repo/pull/123
[Dashboard] Displaying 5 incidents
# Next Steps
- Architecture - Service details
- OpenTelemetry - Instrumentation
- AI Engine - How analysis works
- GitHub Integration - PR creation