I work on motion planning, fault-tolerant control, and multi-agent coordination for autonomous UAV swarms. Two years shipping production backend systems across enterprise environments. C and C++ instructor as a GTA at UT Arlington.
I am an MS Computer Science student at UT Arlington with two years of professional backend engineering behind me. Not side projects, but production Java and Apex services handling live enterprise workloads across a Salesforce ecosystem.
In parallel, I am doing graduate research in UAV swarm coordination and fault-tolerant multi-agent control, building simulation frameworks and training MARL policies that hold up under GPS drift, sensor corruption, and communication degradation. The kind of work where the system has to keep flying even when things break.
C++ was my first programming language, and I have taught C and C++ as a GTA at UT Arlington, supporting over 100 graduate students across Machine Learning, Data Science, and systems-level courses.
I do my best work on hard technical problems with real constraints: bandwidth-limited links, multi-fault injection, production SLAs. If the system cannot afford to just restart and retry, that is exactly the environment I want to be in.
The problem: when a standard controller like PID encounters a fault, it keeps pushing harder regardless of whether the error is from wind, a bad sensor, or a broken link. The supervisory layer I built watches the whole swarm, separates real failure conditions from normal tracking error, and changes how drones respond before the situation destabilizes.
The supervisory architecture runs at 5 Hz above PID and operates in four priority modes. Mode 3 handles connectivity rescue by compressing formation scale and increasing trajectory smoothing when inter-agent connectivity drops below a threshold. Mode 2 freezes the PID integrator when actuator saturation is detected, so that windup cannot amplify the fault response. Mode 1 applies bounded reference shifts to re-anchor the swarm center of mass when persistent drift is detected and sensors are trusted. Mode 0 leaves the supervisor inactive and lets PID run normally. The supervisor uses persistence checks before triggering any mode change, avoiding false positives from momentary signal noise.
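The mode logic described above can be sketched roughly as follows. This is an illustration only, not the project's code: the `SwarmStatus` fields, the threshold values, the three-tick persistence window, and the assumption that a higher mode number takes priority are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

# Mode numbers follow the text; every threshold below is a hypothetical value.
CONNECTIVITY_MIN = 0.6   # assumed connectivity threshold on a 0..1 metric
DRIFT_MAX = 1.5          # assumed drift threshold, in metres
PERSISTENCE_TICKS = 3    # assumed: a condition must hold for 3 ticks (0.6 s at 5 Hz)


@dataclass
class SwarmStatus:
    connectivity: float    # assumed inter-agent connectivity metric in [0, 1]
    saturated: bool        # True if any actuator is at its limit
    drift: float           # offset of the swarm centre of mass from its reference
    sensors_trusted: bool


@dataclass
class Supervisor:
    mode: int = 0
    _counts: dict = field(default_factory=lambda: {1: 0, 2: 0, 3: 0})

    def step(self, s: SwarmStatus) -> int:
        """Run one supervisor tick and return the selected mode."""
        raw = {
            3: s.connectivity < CONNECTIVITY_MIN,
            2: s.saturated,
            1: s.drift > DRIFT_MAX and s.sensors_trusted,
        }
        # Persistence check: count consecutive ticks each condition has held.
        for m, active in raw.items():
            self._counts[m] = self._counts[m] + 1 if active else 0
        # Assumed priority order 3 > 2 > 1; fall back to Mode 0 (PID only).
        self.mode = next(
            (m for m in (3, 2, 1) if self._counts[m] >= PERSISTENCE_TICKS), 0
        )
        return self.mode
```

In this sketch a single noisy tick never changes the mode, because each counter resets as soon as its condition clears. Returning to Mode 0 is immediate here, which is a simplification; a real supervisor might also require persistence before releasing a mode.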
Validated across 30 randomized seeds with controlled fault injection windows. Results are repeatable, not cherry-picked. A ROS2 rclpy bridge publishes live swarm pose, odometry, actuator commands, and fault state streams. The entire stack runs in a containerized Docker and ROS2 Jazzy environment for reproducible multi-machine experiments.
A controller study comparing open loop, PID, and an agentic supervisor across 20 randomized seeds, with injected wind, sensor corruption, and communication degradation faults.
The agentic supervisor runs at 5 Hz on top of PID, distinguishes between sensor-corrupted and trusted agents, freezes reference shifts during severe communication loss, and uses persistence checks before triggering any mode change. The study isolates how each controller behaves when tracking error comes from a real fault rather than normal lag.
CTDE-MAPPO trained swarm policy for collaborative task allocation and adaptive relay routing across 6 UAVs under dynamic task arrivals and constrained communication and energy budgets.
The learned joint policy is evaluated against three non-learning baselines: static allocation, fixed relay tree, and energy-aware greedy. Measured on task completion rate, throughput, and Age of Information. Ablations isolate the contribution of task allocation and relay routing independently.
This paper presents a cross-layer supervisory architecture for fault-tolerant coordination in low-altitude UAV swarm networks. The supervisory layer operates above a classical PID control loop and makes fault classification decisions across the dynamics, sensing, and communication layers simultaneously, rather than treating each failure mode independently. The architecture is validated in a physics-based simulation environment across 30 randomized seeds with controlled fault injection across wind disturbance, sensor corruption, communication degradation, and agent dropout.
MS thesis at UT Arlington exploring how multi-UAV systems can remain controllable and coordinated when the environment becomes unreliable. Instead of replacing classical control with a learned policy, the work layers a diagnosis-driven supervisory mechanism on top of a stable PID control loop, letting the swarm respond differently to dynamics faults, sensing faults, and communication degradation. The contribution is a systems perspective on how bounded decision loops, coordination, and supervision can coexist in a deployable autonomy architecture.
When something breaks across 10+ microservices, figuring out what actually caused it is slow and mostly guesswork. Traceback ingests logs, metrics, and deployment events, models inter-service dependencies as a graph, and surfaces ranked root cause hypotheses with evidence-backed scoring.
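To make the dependency-graph idea concrete, here is a minimal sketch of a hop-limited breadth-first traversal from a failing service to the services it depends on. It is not Traceback's implementation: the function name `upstream_candidates`, the adjacency-dict representation, and the default `max_hops=3` are assumptions for the example, and the real ranking combines more signals than hop distance.

```python
from collections import deque


def upstream_candidates(depends_on, failing_service, max_hops=3):
    """Return {service: hop_distance} for dependencies within max_hops.

    depends_on maps each service to the services it calls (hypothetical format).
    """
    distances = {}
    seen = {failing_service}
    queue = deque([(failing_service, 0)])
    while queue:
        service, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for dep in depends_on.get(service, ()):
            if dep not in seen:
                seen.add(dep)
                distances[dep] = hops + 1
                queue.append((dep, hops + 1))
    return distances


# Example with made-up service names.
graph = {
    "checkout": ["payments", "cart"],
    "payments": ["db"],
    "db": ["storage"],
    "storage": ["disk"],
}
candidates = upstream_candidates(graph, "checkout")
```

Breadth-first order gives each candidate its shortest hop distance from the failing service, which is one plausible input to a ranking score. The `seen` set also keeps the traversal safe on graphs with cyclic dependencies.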
The normalization pipeline standardizes 5 heterogeneous telemetry formats including Prometheus, structured JSON, and raw log streams, cutting normalization latency by 60%. Z-score anomaly detection adapts to each service's baseline behavior rather than applying a fixed cutoff, reducing false positive signals by 30%. Graph traversal traces failure propagation across 3+ dependency hops. The multi-factor ranking engine surfaces the correct root cause in top-3 results 87% of the time.
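As an illustration of a per-service baseline, the sketch below scores each new metric value against a rolling window of that service's own history instead of a global cutoff. The class name `BaselineDetector`, the window size, the threshold of 3.0, and the minimum sample count are hypothetical choices for this example, not values taken from Traceback.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class BaselineDetector:
    """Rolling z-score per service (all parameters are assumed defaults)."""

    def __init__(self, window=60, threshold=3.0, min_samples=10):
        self.threshold = threshold
        self.min_samples = min_samples
        # One bounded history per service, so each has its own baseline.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, service, value):
        """Return (z_score, is_anomaly) for value against the service's history."""
        past = self.history[service]
        z = 0.0
        if len(past) >= self.min_samples:
            sigma = pstdev(past)
            if sigma > 0:
                z = (value - mean(past)) / sigma
        past.append(value)
        return z, abs(z) > self.threshold


detector = BaselineDetector()
# A steady service (latency near 101) and a naturally noisy one (100 to 300).
for i in range(20):
    detector.observe("api", 100 if i % 2 == 0 else 102)
    detector.observe("batch", 100 if i % 2 == 0 else 300)

api_z, api_flag = detector.observe("api", 200)        # far outside api's baseline
batch_z, batch_flag = detector.observe("batch", 200)  # ordinary for batch
```

The same value of 200 is flagged for the steady service and ignored for the noisy one, which is the behaviour a fixed cutoff cannot give. One simplification to note: anomalous values are still appended to the history here, so a long incident would gradually shift the baseline.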
Reads a candidate's resume and generates personalized, context-aware answers to job application questions, grounded in actual experience rather than generic phrasing. Given a generic prompt with no grounding, a model either fabricates specifics or stays vague. JobPrep retrieves actual content from the candidate's documents before generating, so answers reference real projects, real metrics, and real experience.
Built to run fully offline via Ollama with no external API dependency or data sent to third-party services. FastAPI backend with LlamaIndex vector search achieving sub-2-second response times. Incremental embedding logic improves re-indexing efficiency by 45%. React frontend deployed on GCP Cloud Run serving real active user sessions.
Working on autonomous systems, distributed infrastructure, or hard problems in defense tech? I want to hear about it.