  • @New York University
  • United States
Gloria72/README.md

Hi, I am Yu Cheng

MS Computer Engineering at NYU, BS Data Science at Duke. I build practical AI systems with a focus on LLM infrastructure, inference performance, observability, and customer-facing AI workflows.

I like work that turns messy model-serving behavior into something measurable: benchmarks, traces, evals, small correctness tests, better docs, and tools that make AI systems easier to operate.

Current Focus

  • LLM serving infrastructure: SGLang, vLLM-style serving, llm-d, routing, KV-cache behavior, and OpenAI-compatible APIs.
  • Performance tooling: inference benchmarks, kernel correctness checks, latency reports, and lightweight observability.
  • Forward-deployed AI systems: support/onboarding agents, RAG workflows, citations, escalation paths, and eval loops.
  • Applied AI products: local-first assistants and domain workflows where reliability matters more than demo polish.

Selected Work

| Project | What it shows | Stack |
| --- | --- | --- |
| langgraph-support-agent | LangGraph support/onboarding agent for AI infrastructure repos, with local retrieval, evals, escalation, and a forward-deployed case study. | Python, LangGraph |
| sglang-observability-router | OpenAI-compatible routing proxy and benchmark harness for SGLang-style serving experiments. | Python, LLM serving |
| flashinfer-kernel-bench | Correctness-first microbenchmarks for attention and sampling code paths. | Python, NumPy, Torch |
| rust-candle-gateway | Rust inference gateway with bounded request handling, health checks, metrics, and a Candle-ready engine boundary. | Rust |
| fitsnap-coach | Local-first AI fitness coach with form checks, recovery scoring, trend charts, and an agent task workspace. | JavaScript |

Open Source

  • SGLang PR #29205: parser fix and unit test for flat max_dynamic_patch image metadata in Jinja template content.
  • llm-d PR #1942: contributor guide for coding agents working inside an AI infrastructure repository.

How I Work

  • Start with a small reproducible path before optimizing.
  • Prefer tests, eval cases, and traceable outputs over broad claims.
  • Keep setup steps explicit because developer experience is part of the product.
  • Write docs and runbooks when they make a system easier for the next person to operate.

Background

  • Computer engineering, data science, medical imaging, visualization, and cloud data pipelines.
  • Comfortable with Python, JavaScript/TypeScript, Rust, C++/CUDA, SQL, Docker, PyTorch, FastAPI, Three.js, and ML/data tooling.
  • Bilingual: English and Chinese.

Contact


Pinned

  1. RemoShen/ConVis Public

    JavaScript 3

  2. Embedded-Project Public

    C++

  3. bay-area-rental-skill Public

    Bilingual AI assistant skill for Bay Area apartment screening, review audits, and unit-level rental decisions.

    Python

  4. rust-candle-gateway Public

    Dependency-light Rust inference gateway with a Candle-ready engine boundary.

    Rust

  5. sglang-observability-router Public

    OpenAI-compatible routing proxy and benchmark harness for SGLang-style LLM serving.

    Python

  6. langgraph-support-agent Public

    Forward-deployable LangGraph support and onboarding agent for AI infrastructure repos.

    Python