Traceback AI — Automated Root-Cause Analysis for Distributed Systems
2025 – Present
Correlates logs, metrics, traces, and deployment events to rank likely incident causes with an evidence trail.
Architected an automated root-cause analysis (RCA) system for distributed microservices: built an ingestion and normalization pipeline, modeled service interactions as a temporal dependency graph, and applied causal scoring over time-ordered events. Used an LLM only to generate human-readable explanations, keeping the inference itself deterministic and debuggable.
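To illustrate the scoring idea, here is a minimal sketch, not the project's actual code. It assumes a static "depends on" graph (the real system is described as temporal), and every name in it (`Event`, `KIND_WEIGHT`, `hops_from`, `rank_causes`), the weights, and the 10-minute window are hypothetical. Each candidate event is scored by event type, recency relative to the incident, and graph distance from the failing service, so the ranking is fully deterministic.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    service: str
    kind: str   # e.g. "deploy", "config_change", "error_spike"
    ts: float   # seconds since epoch


# Hypothetical prior weights per event type (assumed, not from the project).
KIND_WEIGHT = {"deploy": 1.0, "config_change": 0.8, "error_spike": 0.5}


def hops_from(graph, start):
    """Breadth-first search over 'depends on' edges; returns {service: hops}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in dist:
                dist[dep] = dist[node] + 1
                queue.append(dep)
    return dist


def rank_causes(graph, events, incident_service, incident_ts, window=600.0):
    """Rank events that precede the incident on services it depends on."""
    dist = hops_from(graph, incident_service)
    ranked = []
    for ev in events:
        age = incident_ts - ev.ts
        # Skip unrelated services, future events, and events outside the window.
        if ev.service not in dist or not (0 <= age <= window):
            continue
        recency = 1.0 - age / window             # 1.0 at the incident, 0.0 at the window edge
        proximity = 1.0 / (1 + dist[ev.service])  # closer dependencies score higher
        score = KIND_WEIGHT.get(ev.kind, 0.1) * recency * proximity
        ranked.append((round(score, 4), ev))
    # Deterministic order: score descending, then timestamp, then service name.
    ranked.sort(key=lambda pair: (-pair[0], pair[1].ts, pair[1].service))
    return ranked


if __name__ == "__main__":
    graph = {"checkout": ["payments", "cart"], "payments": ["db"]}
    events = [
        Event("db", "deploy", 940.0),
        Event("payments", "error_spike", 970.0),
        Event("search", "deploy", 990.0),  # not a dependency of checkout
    ]
    for score, ev in rank_causes(graph, events, "checkout", 1000.0):
        print(score, ev.service, ev.kind)
```

Because the score is a pure function of the events and the graph, the same inputs always yield the same ranking, and each factor can be inspected on its own. That is the property that lets an LLM be limited to wording the explanation.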