MS Computer Science UT Arlington Dec 2026

Nitin Singh Rathore

I work on motion planning, fault tolerant control, and multi agent coordination for autonomous UAV swarms. Two years shipping production backend systems across enterprise environments. C and C++ instructor as GTA at UT Arlington.

GitHub LinkedIn Research Projects

About

Degree

MS Computer Science
UT Arlington, Dec 2026

Focus

Distributed Systems
Machine Learning
UAV Control
Multi Agent RL

Mentored

100+ graduate students

I am an MS Computer Science student at UT Arlington with two years of professional backend engineering behind me. Not side projects, but production Java and Apex services handling live enterprise workloads across a Salesforce ecosystem.

In parallel, I am doing graduate research in UAV swarm coordination and fault tolerant multi agent control, building simulation frameworks and training MARL policies that hold up under GPS drift, sensor corruption, and communication degradation. The kind of work where the system has to keep flying even when things break.

C++ was my first programming language, and I have taught C and C++ to students as a GTA at UT Arlington, supporting over 100 graduate students across Machine Learning, Data Science, and systems level courses.

I do my best work on hard technical problems with real constraints: bandwidth limited links, multi fault injection, production SLAs. If the system cannot afford to just restart and retry, that is exactly the environment I want to be in.

Research

Thesis — Defended

UAV Autonomy Research Suite

Fault tolerant supervisory control for autonomous UAV swarms

The problem: when a standard controller like PID encounters a fault, it keeps pushing harder regardless of whether the error is from wind, a bad sensor, or a broken link. The supervisory layer I built watches the whole swarm, separates real failure conditions from normal tracking error, and changes how drones respond before the situation destabilizes.

The supervisory architecture runs at 5 Hz above PID and operates in four priority modes. Mode 3 handles connectivity rescue: when inter-agent connectivity drops below threshold, it compresses the formation scale and increases trajectory smoothing. Mode 2 freezes the PID integrator when actuator saturation is detected, so windup cannot amplify the fault response. Mode 1 applies bounded reference shifts to re-anchor the swarm center of mass when persistent drift is detected and sensors are trusted. Mode 0 leaves the supervisor inactive and lets PID run normally. The supervisor applies persistence checks before triggering any mode change, which avoids false positives from momentary signal noise.
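The mode logic described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the threshold values, the `SwarmStatus` fields, the three-tick persistence window, and the assumption that a higher mode number wins when several conditions hold are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical values; the source does not state numeric thresholds.
CONNECTIVITY_MIN = 0.6   # assumed minimum acceptable connectivity measure
DRIFT_MAX = 0.5          # assumed drift limit for the swarm center of mass
PERSIST_TICKS = 3        # assumed: a condition must hold for 3 supervisor ticks


@dataclass
class SwarmStatus:
    connectivity: float    # some normalised inter-agent connectivity measure
    saturated: bool        # True if actuator saturation is detected
    drift: float           # offset of the swarm center of mass from its reference
    sensors_trusted: bool  # True if sensor readings are considered reliable


@dataclass
class Supervisor:
    # Consecutive ticks for which each mode's trigger condition has held.
    counts: dict = field(default_factory=lambda: {3: 0, 2: 0, 1: 0})

    def step(self, s: SwarmStatus) -> int:
        """Return the active mode (0 to 3) for this supervisor tick."""
        raw = {
            3: s.connectivity < CONNECTIVITY_MIN,
            2: s.saturated,
            1: s.drift > DRIFT_MAX and s.sensors_trusted,
        }
        # Persistence check: a counter resets as soon as its condition clears.
        for mode, active in raw.items():
            self.counts[mode] = self.counts[mode] + 1 if active else 0
        # Assumed priority order: Mode 3, then Mode 2, then Mode 1.
        for mode in (3, 2, 1):
            if self.counts[mode] >= PERSIST_TICKS:
                return mode
        return 0  # Mode 0: supervisor inactive, standard PID
```

Calling `step` once per 5 Hz tick means a single noisy reading cannot switch modes on its own, since the condition has to persist for several consecutive ticks first.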

Validated across 30 randomized seeds with controlled fault injection windows. Results are repeatable, not cherry picked. A ROS2 rclpy bridge publishes live swarm pose, odometry, actuator commands, and fault state streams. The entire stack runs in a Docker and ROS2 Jazzy containerized environment for reproducible multi-machine experiments.

Fault injection

Wind disturbance with persistent external force

Sensor corruption on position and velocity

Communication degradation and packet loss

Full agent dropout simulation

Supervisor modes

Mode 3: connectivity rescue, formation compression

Mode 2: anti-windup, integrator freeze

Mode 1: drift correction, bounded reference shift

Mode 0: inactive, standard PID
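As an illustration of the Mode 2 idea, here is a minimal PID sketch with an integrator freeze. The class, the gains, and the `freeze_integrator` flag are assumptions made for this example, not the controller used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float, freeze_integrator: bool = False) -> float:
        """One PID step. When frozen, the integral term keeps its last value."""
        if not freeze_integrator:
            self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

With the flag set, a sustained error during saturation no longer grows the integral term, so the controller does not carry accumulated windup into the recovery phase.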

Evaluation

30 randomized seeds per scenario

Per-seed CSV telemetry and recovery metrics

Controlled fault windows, not random noise

Batch plot generation for paper figures
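A seeded evaluation loop of this shape might look like the sketch below. The rollout is a toy stand-in for the simulator, and the fault window, step count, and `peak_error` metric are hypothetical names chosen for illustration.

```python
import csv
import io
import random

FAULT_WINDOW = (40, 60)  # assumed: fault active for steps 40 to 59
N_STEPS = 100            # assumed episode length


def run_episode(seed: int) -> dict:
    """Toy stand-in for one simulation rollout; returns summary metrics."""
    rng = random.Random(seed)  # all randomness flows from the seed
    peak_error = 0.0
    for t in range(N_STEPS):
        fault_active = FAULT_WINDOW[0] <= t < FAULT_WINDOW[1]
        noise = rng.gauss(0.0, 0.05)
        # The fault adds a fixed offset only inside the controlled window.
        error = abs(noise) + (0.5 if fault_active else 0.0)
        peak_error = max(peak_error, error)
    return {"seed": seed, "peak_error": round(peak_error, 4)}


def evaluate(seeds) -> str:
    """Run every seed and return the per-seed metrics as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["seed", "peak_error"])
    writer.writeheader()
    for seed in seeds:
        writer.writerow(run_episode(seed))
    return buf.getvalue()
```

Because each episode draws only from its own seeded generator and the fault window is fixed, rerunning `evaluate(range(30))` reproduces the same rows, which is what makes per-seed comparisons between controllers meaningful.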

Infrastructure

ROS2 rclpy bridge for live swarm telemetry

PyBullet with Crazyflie drone dynamics

Docker and ROS2 Jazzy for reproducibility

CTDE-MAPPO policies trained with RLlib

Cross-Layer Supervisory Control for Low-Altitude UAV Swarm Networks
Under review at IEEE Network Magazine, 2026
The repository was kept private during peer review and is now open source.

View on GitHub Read case study

Case study 01

UAV Trajectory Tracking

A controller study comparing open loop, PID, and an agentic supervisor across 20 randomized seeds, with injected wind, sensor corruption, and communication degradation faults.

The agentic supervisor runs at 5 Hz on top of PID, distinguishes between sensor-corrupted and trusted agents, freezes reference shifts during severe communication loss, and uses persistence checks before triggering any mode change. The study isolates how each controller behaves when tracking error comes from a real fault rather than normal lag.

GitHub

Case study 02

Multi-Agent Task Allocation

CTDE-MAPPO trained swarm policy for collaborative task allocation and adaptive relay routing across 6 UAVs under dynamic task arrivals and constrained communication and energy budgets.

The learned joint policy is evaluated against three non-learning baselines: static allocation, fixed relay tree, and energy-aware greedy. Measured on task completion rate, throughput, and Age of Information. Ablations isolate the contribution of task allocation and relay routing independently.
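For readers unfamiliar with Age of Information, the sketch below computes a discrete-time average using the standard definition: the age at step t is t minus the generation time of the freshest update received so far. The function name and input format are assumptions for this example and may differ from how the project measures it.

```python
def average_aoi(deliveries, horizon):
    """Average Age of Information over integer steps 0..horizon-1.

    deliveries: list of (receive_time, generation_time) pairs.
    Steps before the first delivery are skipped because age is undefined there.
    """
    deliveries = sorted(deliveries)
    freshest = None  # generation time of the newest update received so far
    idx = 0
    ages = []
    for t in range(horizon):
        # Absorb every update that has arrived by step t.
        while idx < len(deliveries) and deliveries[idx][0] <= t:
            gen = deliveries[idx][1]
            freshest = gen if freshest is None else max(freshest, gen)
            idx += 1
        if freshest is not None:
            ages.append(t - freshest)
    return sum(ages) / len(ages) if ages else float("inf")


# An update generated at t=0 arrives at t=0; one generated at t=3 arrives at t=4.
print(average_aoi([(0, 0), (4, 3)], horizon=6))  # ages 0,1,2,3,1,2 -> 1.5
```

The metric rewards delivering fresh data, not just delivering data, so a relay route with high throughput but long delays can still score poorly.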

GitHub

Publications

Under Review

IEEE Network Magazine

2026

Cross-Layer Supervisory Control for Low-Altitude UAV Swarm Networks

This paper presents a cross-layer supervisory architecture for fault tolerant coordination in low-altitude UAV swarm networks. The supervisory layer operates above a classical PID control loop and makes fault classification decisions across the dynamics, sensing, and communication layers simultaneously, rather than treating each failure mode independently. The architecture is validated in a physics-based simulation environment across 30 randomized seeds with controlled fault injection across wind disturbance, sensor corruption, communication degradation, and agent dropout.

Nitin Singh Rathore
UT Arlington, 2026

Defended

MS Thesis

2026

Cross-Layer Supervisory Control for Low-Altitude UAV Swarm Networks

MS thesis at UT Arlington exploring how multi-UAV systems can remain controllable and coordinated when the environment becomes unreliable. Instead of replacing classical control with a learned policy, the work layers a diagnosis driven supervisory mechanism on top of a stable PID control loop, letting the swarm respond differently to dynamics faults, sensing faults, and communication degradation. The contribution is a systems perspective on how bounded decision loops, coordination, and supervision can coexist in a deployable autonomy architecture.

Nitin Singh Rathore
UT Arlington
Advisor: Dr. Md Salik Parwez

Projects

Systems

Traceback AI

Root cause analysis for distributed microservice failures
Nexus Hackathon

When something breaks across 10+ microservices, figuring out what actually caused it is slow and mostly guesswork. Traceback ingests logs, metrics, and deployment events, models inter-service dependencies as a graph, and surfaces ranked root cause hypotheses with evidence backed scoring.
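One way to picture the dependency-graph step is a hop-limited breadth-first search from the alerting service through the services it depends on. The sketch below is an assumption about the general approach, not Traceback's code: the service names, the `deps` adjacency format, and the "deepest anomalous dependency first" ordering are hypothetical.

```python
from collections import deque


def upstream_candidates(deps, start, anomalous, max_hops=3):
    """BFS from the alerting service through its dependencies.

    deps: dict mapping a service to the services it calls.
    Returns (service, hops) for anomalous dependencies, deepest first.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if node != start and node in anomalous:
            found.append((node, hops))
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, hops + 1))
    # Assumed heuristic: the deepest anomalous dependency is the likeliest origin.
    return sorted(found, key=lambda item: -item[1])


deps = {
    "checkout": ["payments", "cart"],
    "payments": ["ledger"],
    "ledger": ["db"],
    "cart": [],
}
print(upstream_candidates(deps, "checkout", anomalous={"payments", "db"}))
# [('db', 3), ('payments', 1)]
```

A real ranking engine would combine hop depth with other evidence such as anomaly strength and recent deployments; depth alone is used here only to keep the example short.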

The normalization pipeline standardizes 5 heterogeneous telemetry formats including Prometheus, structured JSON, and raw log streams, cutting normalization latency by 60%. Z-score anomaly detection adapts to each service's baseline behavior rather than applying a fixed cutoff, reducing false positive signals by 30%. Graph traversal traces failure propagation across 3+ dependency hops. The multi-factor ranking engine surfaces the correct root cause in top-3 results 87% of the time.
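The per-service baseline idea can be sketched with a rolling window per service. The window size, threshold, and minimum sample count below are hypothetical defaults, and the class is an illustration rather than the project's detector.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class ZScoreDetector:
    """Flags values that deviate strongly from a service's own recent history."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, service, value):
        """Return True if value is anomalous for this service, then record it."""
        past = self.history[service]
        is_anomaly = False
        if len(past) >= self.min_samples:
            mu, sigma = mean(past), pstdev(past)
            if sigma > 0:
                is_anomaly = abs(value - mu) / sigma > self.threshold
        past.append(value)
        return is_anomaly
```

Because each service keeps its own window, a latency of 1100 ms can be normal for a noisy batch service while 110 ms is flagged for an API that usually sits near 100 ms. This sketch also records anomalous values in the baseline; excluding them is a reasonable alternative.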

FastAPI
Python
Graph traversal
Z-Score anomaly
OpenTelemetry

GitHub

Product

JobPrep AI

Conversational RAG assistant
Deployed on GCP
12 to 15 active users

Reads a candidate's resume and generates personalized, context-aware answers to job application questions, grounded in actual experience rather than generic phrasing. A generic prompt has no grounding, so the model either fabricates specifics or stays vague. JobPrep retrieves actual content from the candidate's documents before generating, so answers reference real projects, real metrics, and real experience.

Built to run fully offline via Ollama, with no external API dependency and no data sent to third party services. The FastAPI backend uses LlamaIndex vector search and achieves sub-2-second response times. Incremental embedding logic improves re-indexing efficiency by 45%. The React frontend is deployed on GCP Cloud Run and serves real active user sessions.
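The source does not describe how the incremental embedding works. One common approach, shown here purely as an assumption, is to hash each chunk's text and re-embed only the chunks whose hash changed. The function and variable names are hypothetical and no LlamaIndex API is used.

```python
import hashlib


def chunks_to_embed(chunks, index_hashes):
    """Return ids of chunks that are new or changed since the last indexing run.

    chunks: dict mapping chunk_id to its current text.
    index_hashes: dict mapping chunk_id to the SHA-256 hex digest stored last time.
    index_hashes is updated in place to reflect the current documents.
    """
    changed = []
    for chunk_id, text in chunks.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if index_hashes.get(chunk_id) != digest:
            changed.append(chunk_id)
            index_hashes[chunk_id] = digest
    # Forget chunks that no longer exist in the documents.
    for stale in set(index_hashes) - set(chunks):
        del index_hashes[stale]
    return changed
```

Only the returned ids would be passed to the embedding model, so editing one section of a resume does not trigger a full re-index.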

FastAPI
LlamaIndex
Ollama
React
GCP Cloud Run

GitHub

Skills

Languages

C++
C
Python
Java
TypeScript
JavaScript
Golang

Autonomy & Simulation

ROS2
PyBullet
CTDE-MAPPO
RLlib
PettingZoo
Gymnasium
Fault injection
Multi seed eval

Backend & Systems

FastAPI
Microservices
Event driven arch.
REST & SOAP
PostgreSQL
MySQL
Docker

AI & ML

LlamaIndex
RAG pipelines
Ollama
Multi agent RL
Anomaly detection
Vector search

Cloud & DevOps

GCP
AWS EC2 / ECS / S3
CI/CD
CodePipeline
CloudWatch
IAM

Frontend

React
Next.js
TypeScript
Real time dashboards
Data visualization

Experience

Graduate Teaching Assistant

Aug 2025 to Present

UT Arlington

TA for 4 graduate courses (Machine Learning, Data Science, Foundations of Computing, and Introduction to Programming), supporting 100+ graduate students through coursework, debugging sessions, and applied projects.
Diagnose low-level C and C++ bugs for students: segmentation faults, memory allocation errors, and runtime undefined behavior across systems-level assignments.
Built a Python automated grading tool from scratch that validates submission structure and evaluates student code, eliminating manual review overhead across 100+ weekly submissions.
Conducting concurrent research in UAV swarm coordination, fault tolerant control, and multi agent reinforcement learning.

Junior Software Developer

Sept 2023 to Oct 2024

WERBOOZ India

Engineered and maintained 6 Java and Apex backend services across 3 enterprise clients in a Salesforce ecosystem, sustaining 99.8% API uptime and reducing manual processing overhead by 40%.
Optimized 15+ SQL and SOQL queries, cutting average query latency by 35% across production workloads.
Authored 500+ test cases across JUnit, Postman, and Tosca, reducing post-release defects by 30% and cutting QA cycle time by 2 days per sprint.
Resolved 12 critical production incidents within 2-hour SLA windows via log analysis and root cause debugging. Zero SLA breaches.

Software Developer Intern

Feb 2023 to Sept 2023

WERBOOZ India

Refactored 4 Java and SQL data access modules, reducing query latency from 320 ms to 275 ms and improving overall module efficiency by 15%.
Wrote JUnit test suites across 3 release cycles, catching and eliminating 20+ pre-production bugs before they reached QA.
Shipped 4 backend features via Git PR workflows across 2 major quarterly releases with full Agile participation from spec to deployment.

Let's build something that matters.

Working on autonomous systems, distributed infrastructure, or hard problems in defense tech? I want to hear about it.

nxr3560@mavs.uta.edu

GitHub LinkedIn
