MS Computer Engineering at NYU, BS Data Science at Duke. I build practical AI systems with a focus on LLM infrastructure, inference performance, observability, and customer-facing AI workflows.
I like work that turns messy model-serving behavior into something measurable: benchmarks, traces, evals, small correctness tests, better docs, and tools that make AI systems easier to operate.
- LLM serving infrastructure: SGLang, vLLM-style serving, llm-d, routing, KV-cache behavior, and OpenAI-compatible APIs.
- Performance tooling: inference benchmarks, kernel correctness checks, latency reports, and lightweight observability.
- Forward-deployed AI systems: support/onboarding agents, RAG workflows, citations, escalation paths, and eval loops.
- Applied AI products: local-first assistants and domain workflows where reliability matters more than demo polish.
| Project | What it shows | Stack |
|---|---|---|
| langgraph-support-agent | LangGraph support/onboarding agent for AI infrastructure repos, with local retrieval, evals, escalation, and a forward-deployed case study. | Python, LangGraph |
| sglang-observability-router | OpenAI-compatible routing proxy and benchmark harness for SGLang-style serving experiments. | Python, LLM serving |
| flashinfer-kernel-bench | Correctness-first microbenchmarks for attention and sampling code paths. | Python, NumPy, Torch |
| rust-candle-gateway | Rust inference gateway with bounded request handling, health checks, metrics, and a Candle-ready engine boundary. | Rust |
| fitsnap-coach | Local-first AI fitness coach with form checks, recovery scoring, trend charts, and an agent task workspace. | JavaScript |
- SGLang PR #29205: parser fix and unit test for flat `max_dynamic_patch` image metadata in Jinja template content.
- llm-d PR #1942: contributor guide for coding agents working inside an AI infrastructure repository.
- Start with a small reproducible path before optimizing.
- Prefer tests, eval cases, and traceable outputs over broad claims.
- Keep setup steps explicit because developer experience is part of the product.
- Write docs and runbooks when they make a system easier for the next person to operate.
- Computer engineering, data science, medical imaging, visualization, and cloud data pipelines.
- Comfortable with Python, JavaScript/TypeScript, Rust, C++/CUDA, SQL, Docker, PyTorch, FastAPI, Three.js, and ML/data tooling.
- Bilingual: English and Chinese.
