MS Computer Science    UT Arlington    Dec 2026

Nitin Singh Rathore

I work on motion planning, fault tolerant control, and multi agent coordination for autonomous UAV swarms. Two years shipping production backend systems across enterprise environments. C and C++ instructor as GTA at UT Arlington.

01

About

Degree: MS Computer Science, UT Arlington, Dec 2026
Focus: Distributed Systems, Machine Learning, UAV Control, Multi Agent RL
Mentored: 100+ graduate students

I am an MS Computer Science student at UT Arlington with two years of professional backend engineering behind me. Not side projects, but production Java and Apex services handling live enterprise workloads across a Salesforce ecosystem.

In parallel, I am doing graduate research in UAV swarm coordination and fault tolerant multi agent control, building simulation frameworks and training MARL policies that hold up under GPS drift, sensor corruption, and communication degradation. The kind of work where the system has to keep flying even when things break.

C++ was my first programming language, and I have taught C and C++ to students as a GTA at UT Arlington, supporting over 100 graduate students across Machine Learning, Data Science, and systems level courses.

I do my best work on hard technical problems with real constraints: bandwidth limited links, multi fault injection, production SLAs. If the system cannot afford to just restart and retry, that is exactly the environment I want to be in.

02

Research

Thesis — Defended

UAV Autonomy Research Suite

Fault tolerant supervisory control for autonomous UAV swarms

The problem: when a standard controller like PID encounters a fault, it keeps pushing harder regardless of whether the error is from wind, a bad sensor, or a broken link. The supervisory layer I built watches the whole swarm, separates real failure conditions from normal tracking error, and changes how drones respond before the situation destabilizes.

The supervisory architecture runs at 5Hz above PID and operates in four priority modes. Mode 3 handles connectivity rescue by compressing formation scale and increasing trajectory smoothing when inter-agent connectivity drops below threshold. Mode 2 freezes the PID integrator to prevent windup from amplifying fault response when actuator saturation is detected. Mode 1 applies bounded reference shifts to re-anchor the swarm center of mass when persistent drift is detected and sensors are trusted. Mode 0 leaves the supervisor inactive and lets PID run normally. The supervisor uses persistence checks before triggering any mode change, avoiding false positives from momentary signal noise.
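The mode logic described above can be sketched roughly as follows. This is an illustrative sketch, not the thesis code: the `SwarmStatus` fields, the connectivity threshold, and the three-tick persistence window are hypothetical, and it assumes a higher mode number means higher priority.

```python
from dataclasses import dataclass
from enum import IntEnum


class Mode(IntEnum):
    INACTIVE = 0             # standard PID, supervisor idle
    DRIFT_CORRECTION = 1     # bounded reference shift
    ANTI_WINDUP = 2          # freeze the PID integrator
    CONNECTIVITY_RESCUE = 3  # compress formation, smooth trajectory


@dataclass
class SwarmStatus:
    connectivity: float        # assumed metric in [0, 1]
    actuator_saturated: bool
    drift_detected: bool
    sensors_trusted: bool


@dataclass
class Supervisor:
    connectivity_threshold: float = 0.5  # hypothetical value
    persistence_ticks: int = 3           # hypothetical: 3 ticks at 5 Hz
    mode: Mode = Mode.INACTIVE
    _candidate: Mode = Mode.INACTIVE
    _count: int = 0

    def _requested(self, s: SwarmStatus) -> Mode:
        # The highest-priority condition wins.
        if s.connectivity < self.connectivity_threshold:
            return Mode.CONNECTIVITY_RESCUE
        if s.actuator_saturated:
            return Mode.ANTI_WINDUP
        if s.drift_detected and s.sensors_trusted:
            return Mode.DRIFT_CORRECTION
        return Mode.INACTIVE

    def step(self, s: SwarmStatus) -> Mode:
        """Call once per supervisor tick; returns the active mode."""
        requested = self._requested(s)
        if requested == self.mode:
            # Nothing to change: drop any pending candidate.
            self._candidate, self._count = self.mode, 0
        elif requested == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = requested, 1
        # Switch only after the same request persists long enough.
        if self._candidate != self.mode and self._count >= self.persistence_ticks:
            self.mode, self._count = self._candidate, 0
        return self.mode
```

With these defaults, a single saturated tick followed by a normal one leaves the supervisor in `Mode.INACTIVE`; the mode only changes after three consecutive ticks request the same change.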

Validated across 30 randomized seeds with controlled fault injection windows. Results are repeatable, not cherry picked. A ROS2 rclpy bridge publishes live swarm pose, odometry, actuator commands, and fault state streams. The entire stack runs in a Docker and ROS2 Jazzy containerized environment for reproducible multi-machine experiments.
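To illustrate what seeded evaluation with a controlled fault window looks like, here is a toy sketch. It is not the PyBullet stack: the 1-D dynamics, gains, gust range, and the `run_episode` and `evaluate` names are all hypothetical. The point is only that each seed owns its random generator and the fault is active in a fixed step window.

```python
import random
import statistics


def run_episode(seed: int, fault_window: tuple[int, int], steps: int = 200) -> float:
    """Toy 1-D tracking run; returns the peak absolute tracking error."""
    rng = random.Random(seed)      # per-seed RNG keeps every run repeatable
    gust = rng.uniform(0.5, 1.5)   # hypothetical disturbance magnitude
    x, peak = 0.0, 0.0
    for t in range(steps):
        # The disturbance is active only inside the controlled window.
        fault = gust if fault_window[0] <= t < fault_window[1] else 0.0
        noise = rng.gauss(0.0, 0.01)
        # Proportional pull toward the setpoint at 0, plus disturbance.
        x += -0.2 * x + 0.1 * fault + noise
        peak = max(peak, abs(x))
    return peak


def evaluate(seeds: range, fault_window: tuple[int, int]) -> dict[str, float]:
    """Aggregate the per-seed peak error into summary statistics."""
    peaks = [run_episode(s, fault_window) for s in seeds]
    return {
        "mean_peak": statistics.mean(peaks),
        "std_peak": statistics.stdev(peaks),
    }


if __name__ == "__main__":
    print(evaluate(range(30), fault_window=(80, 120)))
```

Because the seed fully determines the run, rerunning any single seed reproduces its result exactly, which is what makes per-seed telemetry comparable across controllers.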

[Figure: 6 UAV formation, fault active]
Fault injection
  • Wind disturbance with persistent external force
  • Sensor corruption on position and velocity
  • Communication degradation and packet loss
  • Full agent dropout simulation

Supervisor modes
  • Mode 3: connectivity rescue, formation compression
  • Mode 2: anti-windup, integrator freeze
  • Mode 1: drift correction, bounded reference shift
  • Mode 0: inactive, standard PID

Evaluation
  • 30 randomized seeds per scenario
  • Per-seed CSV telemetry and recovery metrics
  • Controlled fault windows, not random noise
  • Batch plot generation for paper figures

Infrastructure
  • ROS2 rclpy bridge for live swarm telemetry
  • PyBullet with Crazyflie drone dynamics
  • Docker and ROS2 Jazzy for reproducibility
  • CTDE-MAPPO policies trained with RLlib

Cross-Layer Supervisory Control for Low-Altitude UAV Swarm Networks
Under review    IEEE Network Magazine, 2026
Repository kept private during peer review. Now open source.

UAV Trajectory Tracking

Controller comparison study of open loop, PID, and an agentic supervisor, run across 20 randomized seeds with injected wind, sensor corruption, and communication degradation faults.

The agentic supervisor runs at 5 Hz on top of PID, distinguishes between sensor-corrupted and trusted agents, freezes reference shifts during severe communication loss, and uses persistence checks before triggering any mode change. The study isolates how each controller behaves when tracking error comes from a real fault rather than normal lag.
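A minimal sketch of a bounded reference shift that uses only trusted agents and freezes under severe communication loss. The function name, the 2-D positions, and the `max_shift` and `loss_freeze` values are assumptions for illustration, not the repository's implementation.

```python
import math


def reference_shift(positions, trusted, target_center, packet_loss,
                    max_shift=0.2, loss_freeze=0.6):
    """Return a (dx, dy) shift for the formation reference.

    positions: list of (x, y) per agent; trusted: list of bool per agent.
    max_shift and loss_freeze are hypothetical tuning values.
    """
    if packet_loss >= loss_freeze:
        return (0.0, 0.0)    # severe communication loss: freeze shifts
    pts = [p for p, ok in zip(positions, trusted) if ok]
    if not pts:
        return (0.0, 0.0)    # no trusted measurements to act on
    # Center of mass computed from trusted agents only.
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    dx, dy = target_center[0] - cx, target_center[1] - cy
    norm = math.hypot(dx, dy)
    if norm > max_shift:     # bound the correction magnitude
        dx, dy = dx * max_shift / norm, dy * max_shift / norm
    return (dx, dy)
```

Excluding untrusted agents keeps a single corrupted sensor from dragging the estimated center, and the magnitude bound keeps any one supervisor tick from commanding a large jump.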

GitHub

Multi-Agent Task Allocation

CTDE-MAPPO trained swarm policy for collaborative task allocation and adaptive relay routing across 6 UAVs under dynamic task arrivals and constrained communication and energy budgets.

The learned joint policy is evaluated against three non-learning baselines: static allocation, fixed relay tree, and energy-aware greedy. Performance is measured on task completion rate, throughput, and Age of Information. Ablations isolate the contributions of task allocation and relay routing independently.
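For readers unfamiliar with Age of Information: at time t it is t minus the generation time of the freshest update received so far. The sketch below computes its time average for one receiver. The `average_aoi` name and the assumption that the receiver starts with a fresh update generated at t = 0 are illustrative choices, not taken from the project.

```python
def average_aoi(updates, horizon):
    """Time-average Age of Information over [0, horizon].

    updates: iterable of (generated_at, delivered_at) pairs.
    Assumes the receiver starts with a fresh update generated at t = 0.
    """
    freshest = 0.0   # generation time of the freshest delivered update
    t_prev = 0.0     # time of the last processed delivery
    area = 0.0       # integral of the age curve so far
    for gen, dlv in sorted(updates, key=lambda u: u[1]):
        if dlv > horizon:
            break
        # Age grows linearly between deliveries (trapezoid area).
        a0, a1 = t_prev - freshest, dlv - freshest
        area += 0.5 * (a0 + a1) * (dlv - t_prev)
        t_prev = dlv
        freshest = max(freshest, gen)   # stale arrivals do not reset age
    # Final segment from the last delivery to the horizon.
    a0, a1 = t_prev - freshest, horizon - freshest
    area += 0.5 * (a0 + a1) * (horizon - t_prev)
    return area / horizon
```

For example, with no updates over a horizon of 10 the average age is 5.0; a single update generated at t = 4 and delivered at t = 5 lowers it to 3.0.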

GitHub

03

Publications

Under Review
IEEE Network Magazine
2026

Cross-Layer Supervisory Control for Low-Altitude UAV Swarm Networks

This paper presents a cross-layer supervisory architecture for fault tolerant coordination in low-altitude UAV swarm networks. The supervisory layer operates above a classical PID control loop and makes fault classification decisions across the dynamics, sensing, and communication layers simultaneously, rather than treating each failure mode independently. The architecture is validated in a physics-based simulation environment across 30 randomized seeds with controlled fault injection across wind disturbance, sensor corruption, communication degradation, and agent dropout.

Nitin Singh Rathore    UT Arlington, 2026

Defended
MS Thesis
2026

Cross-Layer Supervisory Control for Low-Altitude UAV Swarm Networks

MS thesis at UT Arlington exploring how multi-UAV systems can remain controllable and coordinated when the environment becomes unreliable. Instead of replacing classical control with a learned policy, the work layers a diagnosis driven supervisory mechanism on top of a stable PID control loop, letting the swarm respond differently to dynamics faults, sensing faults, and communication degradation. The contribution is a systems perspective on how bounded decision loops, coordination, and supervision can coexist in a deployable autonomy architecture.

Nitin Singh Rathore    UT Arlington    Advisor: Dr. Md Salik Parwez

04

Projects

01
Systems

Traceback AI

Root cause analysis for distributed microservice failures    Nexus Hackathon

When something breaks across 10+ microservices, figuring out what actually caused it is slow and mostly guesswork. Traceback ingests logs, metrics, and deployment events, models inter-service dependencies as a graph, and surfaces ranked root cause hypotheses with evidence backed scoring.
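One simple way to turn a dependency graph into ranked hypotheses is sketched below. The scoring rule (count how many anomalous callers a candidate can explain) is a toy stand-in, not Traceback's multi-factor ranking, and the service names in the usage note are made up.

```python
from collections import deque


def rank_root_causes(depends_on, anomalous):
    """Rank anomalous services by how many anomalous callers they explain.

    depends_on: dict mapping service -> list of services it calls.
    anomalous: set of services currently flagged as anomalous.
    """
    # Reverse edges: dependency -> services that call it.
    callers = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(svc)

    scores = {}
    for candidate in anomalous:
        seen, queue = {candidate}, deque([candidate])
        while queue:
            node = queue.popleft()
            for caller in callers.get(node, []):
                # Follow the failure upstream only through anomalous services.
                if caller in anomalous and caller not in seen:
                    seen.add(caller)
                    queue.append(caller)
        scores[candidate] = len(seen) - 1   # exclude the candidate itself
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Given a chain `gateway -> orders -> payments -> db` where all four services are anomalous, `db` ranks first because its failure can explain the other three.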

The normalization pipeline standardizes 5 heterogeneous telemetry formats including Prometheus, structured JSON, and raw log streams, cutting normalization latency by 60%. Z-score anomaly detection adapts to each service's baseline behavior rather than applying a fixed cutoff, reducing false positive signals by 30%. Graph traversal traces failure propagation across 3+ dependency hops. The multi-factor ranking engine surfaces the correct root cause in top-3 results 87% of the time.
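A minimal sketch of per-service z-score detection, where each service is compared against its own rolling baseline instead of a global cutoff. The class name, window size, threshold, and warm-up count are hypothetical defaults, not the values used in Traceback.

```python
import statistics
from collections import defaultdict, deque


class ZScoreDetector:
    def __init__(self, window=60, threshold=3.0, min_samples=10):
        # window, threshold, and min_samples are hypothetical defaults.
        self.threshold = threshold
        self.min_samples = min_samples
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, service, value):
        """Return True if value is anomalous for this service's baseline."""
        hist = self.history[service]
        anomalous = False
        if len(hist) >= self.min_samples:
            mean = statistics.fmean(hist)
            std = statistics.pstdev(hist)
            if std > 0:
                anomalous = abs(value - mean) / std > self.threshold
        if not anomalous:
            hist.append(value)   # keep outliers out of the baseline
        return anomalous
```

A latency of 150 ms can be an outlier for a service that normally sits near 100 ms and completely ordinary for one that sits near 1000 ms, which is why the history is keyed by service.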

02
Product

JobPrep AI

Conversational RAG assistant    Deployed on GCP    12 to 15 active users

Reads a candidate's resume and generates personalized, context-aware answers to job application questions. A generic prompt has no grounding, so the model either fabricates specifics or stays vague. JobPrep retrieves content from the candidate's own documents before generating, so answers reference real projects, real metrics, and real experience.

Built to run fully offline via Ollama, with no external API dependency and no data sent to third-party services. FastAPI backend with LlamaIndex vector search achieving sub-2-second response times. Incremental embedding logic improves re-indexing efficiency by 45%. React frontend deployed on GCP Cloud Run serving real active user sessions.
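One common way to make re-indexing incremental is to hash each chunk and re-embed only the chunks whose hash changed. The sketch below shows that idea with the standard library only. It is an assumption about the general technique, not JobPrep's actual logic and not a LlamaIndex API; `chunks_to_embed` and the `index` dict are hypothetical names.

```python
import hashlib


def chunks_to_embed(chunks, index):
    """Return chunk ids whose content changed since the last indexing run.

    chunks: dict of chunk_id -> text; index: dict of chunk_id -> stored hash.
    The index is updated in place, and ids missing from chunks are dropped.
    """
    changed = []
    for chunk_id, text in chunks.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if index.get(chunk_id) != digest:
            changed.append(chunk_id)   # new or modified: needs re-embedding
            index[chunk_id] = digest
    for stale in set(index) - set(chunks):
        del index[stale]               # chunk removed from the source docs
    return changed
```

On a second run over unchanged documents the function returns an empty list, so no embedding calls are made at all.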

05

Skills

Languages
C++, C, Python, Java, TypeScript, JavaScript, Golang
Autonomy & Simulation
ROS2, PyBullet, CTDE-MAPPO, RLlib, PettingZoo, Gymnasium, Fault injection, Multi seed eval
Backend & Systems
FastAPI, Microservices, Event driven arch., REST & SOAP, PostgreSQL, MySQL, Docker
AI & ML
LlamaIndex, RAG pipelines, Ollama, Multi agent RL, Anomaly detection, Vector search
Cloud & DevOps
GCP, AWS EC2 / ECS / S3, CI/CD, CodePipeline, CloudWatch, IAM
Frontend
React, Next.js, TypeScript, Real time dashboards, Data visualization

06

Experience

Graduate Teaching Assistant
Aug 2025 to Present
UT Arlington
  • TA for 4 graduate courses: Machine Learning, Data Science, Foundations of Computing, and Introduction to Programming, supporting 100+ graduate students through coursework, debugging sessions, and applied projects.
  • Diagnose low-level C and C++ bugs for students: segmentation faults, memory allocation errors, and runtime undefined behavior across systems-level assignments.
  • Built a Python automated grading tool from scratch that validates submission structure and evaluates student code, eliminating manual review overhead across 100+ weekly submissions.
  • Conducting concurrent research in UAV swarm coordination, fault tolerant control, and multi agent reinforcement learning.
Junior Software Developer
Sept 2023 to Oct 2024
WERBOOZ India
  • Engineered and maintained 6 Java and Apex backend services across 3 enterprise clients in a Salesforce ecosystem, sustaining 99.8% API uptime and reducing manual processing overhead by 40%.
  • Optimized 15+ SQL and SOQL queries, cutting average query latency by 35% across production workloads.
  • Authored 500+ test cases across JUnit, Postman, and Tosca, reducing post-release defects by 30% and cutting QA cycle time by 2 days per sprint.
  • Resolved 12 critical production incidents within 2-hour SLA windows via log analysis and root cause debugging. Zero SLA breaches.
Software Developer Intern
Feb 2023 to Sept 2023
WERBOOZ India
  • Refactored 4 Java and SQL data access modules, reducing query latency from 320ms to 275ms and improving overall module efficiency by 15%.
  • Wrote JUnit test suites across 3 release cycles, catching and eliminating 20+ pre-production bugs before reaching QA.
  • Shipped 4 backend features via Git PR workflows across 2 major quarterly releases with full Agile participation from spec to deployment.

Let's build
something
that matters.

Working on autonomous systems, distributed infrastructure, or hard problems in defense tech? I want to hear about it.