CLI tool that takes a GitHub issue number, SSHs into a remote GPU machine, and launches an agent (Claude/Codex/Cursor/pi) to autonomously investigate and fix the bug. The agent produces a report and a diff that you can review and turn into a PR.
No install step needed; just use `uv run`:
```bash
cd pt_job_queue
uv run ptq --help
```

For development (tests, web dashboard):

```bash
uv run --extra dev pytest
uv run ptq web
```

Assumes you have `uv` installed; otherwise:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
```bash
git clone git@github.com:drisspg/pt_job_queue.git
# This will take a second
uv run ptq setup --local --build
uv run ptq run -m "tell me a story" --agent codex
```

```bash
# Remote GPU machine (auto-detects CUDA version)
uv run ptq setup my-gpu-box --build
# Remote with explicit CUDA version
uv run ptq setup my-gpu-box --cuda cu130 --build
# Local (for testing/development)
uv run ptq setup --local --cpu --build
```

This creates a workspace with:
- A `uv`-managed venv with PyTorch nightly
- A `pytorch` source clone at the matching nightly commit
- Helper scripts for applying fixes to site-packages
When `--build` is used, setup performs a full checkout nuke before the editable install (`git clean -dfx` plus submodule sync/update) to avoid stale CMake/Ninja graphs after upstream file moves.
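Under the layout described above, the reset amounts to three `git` invocations. The sketch below is illustrative only; the function names are invented and PTQ's real implementation may differ:

```python
# Illustrative sketch of the pre-build checkout reset described above.
# Function names are invented; PTQ's actual implementation may differ.
import subprocess


def nuke_commands(repo: str) -> list[list[str]]:
    return [
        # Drop every untracked file, including stale CMake/Ninja state
        ["git", "-C", repo, "clean", "-dfx"],
        # Re-sync submodule URLs, then bring submodules to the pinned commits
        ["git", "-C", repo, "submodule", "sync", "--recursive"],
        ["git", "-C", repo, "submodule", "update", "--init", "--recursive"],
    ]


def nuke_checkout(repo: str) -> None:
    for cmd in nuke_commands(repo):
        subprocess.run(cmd, check=True)
```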
Speed up C++ rebuilds: Install system NCCL to skip building it from source (~5 min savings per rebuild):
```bash
sudo apt install -y libnccl-dev
```

Then add to `~/.ptq/config.toml`:

```toml
[build.env]
USE_SYSTEM_NCCL = "1"
```

```bash
# On a remote machine
uv run ptq worktree flex-attn --machine my-gpu-box
# Locally (default when no --machine)
uv run ptq worktree my-fix --local
# With verbose build output
uv run ptq worktree stride-fix --machine my-gpu-box -v
```

Creates a PyTorch git worktree with a ready-to-use venv, without launching an agent. Useful when you want to work in the worktree yourself or defer the agent launch. Run `ptq setup ...` first; `worktree` assumes the workspace already exists. The command prints the shell command to enter the worktree.
Later, launch an agent in the same worktree by name:
```bash
uv run ptq run flex-attn -m "optimize the CPU codegen"
```

The worktree shows up in `ptq list` and can be cleaned with `ptq clean` like any other job.
```bash
# On a remote machine
uv run ptq run --issue 174923 --machine my-gpu-box
# Locally
uv run ptq run --issue 174923 --local
# Run in background (don't stream output)
uv run ptq run --issue 174923 --machine my-gpu-box --no-follow
# Ad-hoc task (no issue, just a message)
uv run ptq run --machine my-gpu-box -m "Optimize the flex attention CPU codegen"
# Issue + extra context
uv run ptq run --issue 174923 --machine my-gpu-box -m "Focus on the stride logic"
# Use a preset template from the prompt library
uv run ptq run --issue 174923 --machine my-gpu-box -p diagnose_and_plan
# Preset + extra instructions (appends your -m text)
uv run ptq run --issue 174923 --machine my-gpu-box -p fix_and_verify -m "focus only on scaled_mm path"
# Use a different agent
uv run ptq run --issue 174923 --machine my-gpu-box --agent cursor --model gpt-5.3-codex-xhigh-fast
# Use first-class thinking control when the backend supports it
uv run ptq run --agent pi --model openai-codex/gpt-5.4 --thinking high -m "triage the repro"
```

The agent will:
- Reproduce the bug using a repro script extracted from the issue
- Read PyTorch source to find the root cause
- Apply a minimal Python-only fix
- Test the fix by copying edits to site-packages and re-running the repro
- Write `report.md` and `fix.diff`
Re-running the same issue reuses the existing worktree and preserves prior edits. Each run gets its own log (`claude-1.log`, `claude-2.log`, ...). Different issues run concurrently via separate git worktrees. Fresh workspaces still need an explicit `ptq setup ...` first.
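The per-run log numbering can be sketched as follows. This is a hypothetical helper, not PTQ's actual code; it just illustrates the `<agent>-N.log` convention described above:

```python
# Hypothetical sketch of per-run log naming: scan the job directory for
# existing <agent>-N.log files and pick the next index.
import re
from pathlib import Path


def next_log_name(job_dir: Path, agent: str = "claude") -> str:
    pattern = re.compile(rf"{re.escape(agent)}-(\d+)\.log$")
    runs = [
        int(m.group(1))
        for p in job_dir.glob(f"{agent}-*.log")
        if (m := pattern.match(p.name))
    ]
    return f"{agent}-{max(runs, default=0) + 1}.log"
```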
```bash
uv run ptq web
# or on a custom port
uv run ptq web --port 9000
```

The web UI lets you:
- Launch jobs (issue-based or ad-hoc) with agent/model/thinking/machine selection
- Fill the message box from a built-in prompt library for `Repro Only`, `Diagnose And Plan`, and `Fix And Verify`
- Monitor live logs via streaming
- View reports, diffs, and worklogs
- Follow up on stopped jobs with steering messages
- Take Over — copies an SSH command that drops you into the job's worktree with the venv activated
- Create PRs directly from the UI
Add a screenshot at `docs/assets/web-ui.png` and this README will render it automatically.
The prompt library is backed by `~/.ptq/config.toml`.
Per-agent model defaults live there too. For backends with first-class reasoning controls, you can set thinking separately from the model:
```toml
[models.pi]
default = "openai-codex/gpt-5.4"
thinking = "high"

[models.claude]
default = "opus"
thinking = "high"

[models.codex]
default = "gpt-5.4"
thinking = "high"
```

Cursor currently encodes the reasoning level in the model name itself, so PTQ continues to treat Cursor thinking as model-driven.
- Built-ins are always available and can be overridden under `[prompt_library.builtin.<name>]`
- User presets can be added under `[prompt_library.custom.<name>]`

List everything available from the CLI with:
```bash
uv run ptq presets
```

```bash
# Peek at the agent's worklog
uv run ptq peek 174923
# Peek with recent log activity
uv run ptq peek 174923 --log 30
# List all jobs with running/stopped status
uv run ptq list
```

The agent maintains a `worklog.md` with entries after each significant step, so you can check progress without streaming the full output.
```bash
# By issue number (uses most recent job)
uv run ptq results 174923
# By full job ID
uv run ptq results 20260214-174923
```

Fetches `report.md`, `fix.diff`, `worklog.md`, and the run log from the remote.
```bash
uv run ptq apply 174923 --pytorch-path ~/meta/pytorch
```

Creates a branch `ptq/{issue_number}`, applies the diff, and prints next steps for creating a PR.
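The apply flow boils down to creating the branch and running `git apply`. A hedged sketch: the helper names below are invented, and only the `ptq/<issue>` branch format comes from the docs above:

```python
# Sketch of the apply flow: create a ptq/<issue> branch in the target
# checkout, then apply the fetched diff. Helper names are invented.
import subprocess


def branch_name(issue: int) -> str:
    return f"ptq/{issue}"


def apply_fix(pytorch_path: str, issue: int, diff_path: str) -> None:
    # New branch off the current HEAD of the local PyTorch checkout
    subprocess.run(["git", "-C", pytorch_path, "checkout", "-b", branch_name(issue)], check=True)
    # Apply the fix.diff fetched by `ptq results`
    subprocess.run(["git", "-C", pytorch_path, "apply", diff_path], check=True)
```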
```bash
# Check status of a specific job
uv run ptq status 174923
# Kill a specific agent
uv run ptq kill 174923
# Kill all agents on a machine (tracked + zombie processes)
uv run ptq prune my-gpu-box
# Kill all local agents
uv run ptq prune --local
```

```bash
# Remove all jobs on a machine
uv run ptq clean my-gpu-box
# Keep the 3 most recent
uv run ptq clean my-gpu-box --keep 3
# Clean local workspace
uv run ptq clean --local
```

Removes job directories and prunes git worktrees.
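Because job IDs are timestamp-prefixed (`YYYYMMDD-HHMMSS`), they sort chronologically as strings, so `--keep N` can be sketched as a sort and slice. This is illustrative only, not PTQ's implementation:

```python
# Illustrative sketch of `clean --keep N`: timestamp-prefixed job IDs sort
# chronologically, so the N most recent are the last N after sorting.
def jobs_to_remove(job_ids: list[str], keep: int = 0) -> list[str]:
    ordered = sorted(job_ids)              # oldest first
    return ordered[:-keep] if keep else ordered
```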
| Flag | Command | Default | Description |
|---|---|---|---|
| `--cuda` | `setup` | auto-detect | CUDA tag (cu124, cu126, cu128, cu130) |
| `--cpu` | `setup` | | Use CPU-only PyTorch (macOS/testing) |
| `--machine` | `run`, `worktree` | | Remote machine hostname |
| `--local` | `setup`, `run`, `worktree`, `clean`, `prune` | | Use local workspace instead of SSH |
| `--follow`/`--no-follow` | `run` | follow | Stream agent output to terminal |
| `--agent` | `run` | claude | Agent (claude, codex, cursor, pi) |
| `--model` | `run` | opus | Model name (agent-specific) |
| `--thinking` | `run` | agent default | Reasoning/thinking level when supported by the agent |
| `--max-turns` | `run` | 100 | Max agent turns |
| `-m`/`--message` | `run` | | Ad-hoc task or extra context for an issue |
| `-p`/`--preset` | `run` | | Prompt preset key/title from prompt library |
| `--workspace` | `setup`, `run`, `worktree`, `prune` | `~/ptq_workspace` | Custom workspace path |
| `--keep` | `clean` | 0 | Number of recent jobs to keep |
| `--log` | `peek` | 0 | Number of log lines to show |
- Add a `[repos.<name>]` section to `~/.ptq/config.toml`:

```toml
[repos.torchtitan]
github_repo = "pytorch/torchtitan"
clone_url = "https://github.com/pytorch/torchtitan.git"
dir_name = "torchtitan"
smoke_test_import = "torchtitan"
repro_import_hint = "import torchtitan"
```

- Create prompt templates in `prompts/`:
  - `prompts/investigate_<name>.md` — issue investigation prompt
  - `prompts/adhoc_<name>.md` — freeform task prompt
The prompt templates are where the real work is — they teach the agent about the repo's build system, directory layout, debugging tools, and testing conventions. See the existing `investigate.md` and `investigate_torchtitan.md` for examples.
Optional profile fields (all default to false/null):
| Field | Description |
|---|---|
| `uses_custom_worktree_tool` | Use `tools/create_worktree.py` instead of `git worktree add` |
| `needs_cpp_build` | Run C++ rebuild after worktree creation |
| `lint_cmd` | Lint command to run before PRs |
```
pt_job_queue/
├── pyproject.toml
├── ptq/
│   ├── cli.py                    # Thin Typer CLI adapter
│   ├── ssh.py                    # SSH/SCP + local subprocess backends
│   ├── issue.py                  # GitHub issue fetching via gh
│   ├── agent.py                  # Prompt construction + text utilities
│   ├── agents.py                 # Agent protocol + claude/codex/cursor/pi
│   ├── config.py                 # Config loading (~/.ptq/config.toml)
│   ├── workspace.py              # Remote workspace setup
│   ├── domain/
│   │   ├── models.py             # JobRecord, RunRequest, JobStatus, errors
│   │   └── policies.py           # Job ID generation
│   ├── infrastructure/
│   │   ├── job_repository.py     # JSON persistence (~/.ptq/jobs.json)
│   │   └── backends.py           # Backend factory functions
│   ├── application/
│   │   ├── run_service.py        # Launch/rerun orchestration
│   │   ├── worktree_service.py   # Worktree + venv provisioning
│   │   ├── job_service.py        # Status/kill/clean/list
│   │   ├── artifact_service.py   # Results fetching + diff apply
│   │   └── pr_service.py         # PR creation workflow
│   └── web/
│       ├── app.py                # FastAPI app factory
│       ├── deps.py               # Template + status helpers
│       ├── routes.py             # Thin web route adapter
│       ├── static/style.css      # Dark-theme styles
│       └── templates/            # Jinja2 templates (Pico CSS + htmx)
├── prompts/
│   ├── investigate.md            # PyTorch issue investigation prompt
│   ├── adhoc.md                  # PyTorch freeform task prompt
│   ├── investigate_torchtitan.md # TorchTitan issue investigation prompt
│   └── adhoc_torchtitan.md       # TorchTitan freeform task prompt
└── scripts/
    └── rebuild.sh
```
```
~/ptq_workspace/
├── .venv/                          # uv-managed, PyTorch nightly
├── pytorch/                        # Source clone at nightly commit
├── scripts/apply_to_site_pkgs.sh   # Copies edits to site-packages
└── jobs/
    └── 20260214-174923/            # Per-issue job directory
        ├── pytorch/                # git worktree (isolated)
        ├── system_prompt.md
        ├── repro.py
        ├── claude-1.log            # Per-run logs
        ├── claude-2.log
        ├── worklog.md              # Agent progress log
        ├── report.md
        └── fix.diff
```
