Inference Systems Engineer · Distributed Serving · GPU Cluster Scheduling
7+ years engineering resilient distributed systems at Microsoft and Amazon. Currently designing ML-driven proxies and scheduling architectures to optimize LLM inference efficiency.
- vLLM Contributor — PR #41952 (Under Review): Fixed preemption ordering in `PriorityRequestQueue` to minimize KV cache recompute overhead.
- Clairvoyant — A Go-based reverse proxy eliminating head-of-line (HOL) blocking in vLLM/SGLang via ML-driven Shortest-Job-First (SJF) scheduling. arXiv preprint in preparation.
- ACO Scheduler — Ant Colony Optimization GPU cluster scheduler featuring heterogeneous GPU/CPU/ARM64 affinity routing. Achieved P99 latency under 10 ms and a 28% utilization gain, validated against Alibaba and Google Borg traces.
- ServiceScope — An LLM-powered AST dependency mapper processing 190 files/sec with a 0% inference failure rate via fully local execution (zero external API calls); validated on Django (2,886 files).
- Google Prompt Wars 2026 (Hack2Skill · Hyderabad) — Rank 16 / 83 (Score: 90.64%)
- AI Code Analysis: 100% Efficiency · 96.5% Problem Alignment · 92.5% Accessibility
Python · Go · C++ · Java · Bash
LLM inference internals · GPU cluster scheduling · Distributed systems · AI-powered developer tooling


