ptq — PyTorch Job Queue

CLI tool that takes a GitHub issue number, SSHs into a remote GPU machine, and launches an agent (Claude/Codex/Cursor/pi) to autonomously investigate and fix the bug. The agent produces a report and a diff that you can review and turn into a PR.

Install

No install step needed — just use uv run:

cd pt_job_queue
uv run ptq --help

For development (tests, web dashboard):

uv run --extra dev pytest
uv run ptq web

UBER SPEED MODE GETING STARTED

Assumes you have uv installed otherwise: curl -LsSf https://astral.sh/uv/install.sh | sh

git clone git@github.com:drisspg/pt_job_queue.git

# This will take a second
uv run ptq setup --local --build
uv run ptq run -m "tell me a story" --agent codex

Usage

1. Set up a machine

# Remote GPU machine (auto-detects CUDA version)
uv run ptq setup my-gpu-box --build

# Remote with explicit CUDA version
uv run ptq setup my-gpu-box --cuda cu130 --build

# Local (for testing/development)
uv run ptq setup --local --cpu --build

This creates a workspace with:

A uv-managed venv with PyTorch nightly
A pytorch source clone at the matching nightly commit
Helper scripts for applying fixes to site-packages

When --build is used, setup performs a full checkout nuke before editable install (git clean -dfx + submodule sync/update) to avoid stale CMake/Ninja graphs after upstream file moves.

Speed up C++ rebuilds: Install system NCCL to skip building it from source (~5 min savings per rebuild):

sudo apt install -y libnccl-dev

Then add to ~/.ptq/config.toml:

[build.env]
USE_SYSTEM_NCCL = "1"

2. Create a named worktree

# On a remote machine
uv run ptq worktree flex-attn --machine my-gpu-box

# Locally (default when no --machine)
uv run ptq worktree my-fix --local

# With verbose build output
uv run ptq worktree stride-fix --machine my-gpu-box -v

Creates a PyTorch git worktree with a ready-to-use venv, without launching an agent. Useful when you want to work in the worktree yourself or defer agent launch. Run ptq setup ... first — worktree assumes the workspace already exists. The command prints the shell command to enter the worktree.

Later, launch an agent in the same worktree by name:

uv run ptq run flex-attn -m "optimize the CPU codegen"

The worktree shows up in ptq list and can be cleaned with ptq clean like any other job.

3. Launch an investigation

# On a remote machine
uv run ptq run --issue 174923 --machine my-gpu-box

# Locally
uv run ptq run --issue 174923 --local

# Run in background (don't stream output)
uv run ptq run --issue 174923 --machine my-gpu-box --no-follow

# Ad-hoc task (no issue, just a message)
uv run ptq run --machine my-gpu-box -m "Optimize the flex attention CPU codegen"

# Issue + extra context
uv run ptq run --issue 174923 --machine my-gpu-box -m "Focus on the stride logic"

# Use a preset template from the prompt library
ptq run --issue 174923 --machine my-gpu-box -p diagnose_and_plan

# Preset + extra instructions (appends your -m text)
ptq run --issue 174923 --machine my-gpu-box -p fix_and_verify -m "focus only on scaled_mm path"

# Use a different agent
uv run ptq run --issue 174923 --machine my-gpu-box --agent cursor --model gpt-5.3-codex-xhigh-fast

# Use first-class thinking control when the backend supports it
uv run ptq run --agent pi --model openai-codex/gpt-5.4 --thinking high -m "triage the repro"

The agent will:

Reproduce the bug using a repro script extracted from the issue
Read pytorch source to find the root cause
Apply a minimal Python-only fix
Test the fix by copying edits to site-packages and re-running the repro
Write report.md and fix.diff

Re-running the same issue reuses the existing worktree and preserves prior edits. Each run gets its own log (claude-1.log, claude-2.log, ...). Different issues run concurrently via separate git worktrees. Fresh workspaces still need an explicit ptq setup ... first.

4. Web dashboard

uv run ptq web
# or on a custom port
uv run ptq web --port 9000

The web UI lets you:

Launch jobs (issue-based or ad-hoc) with agent/model/thinking/machine selection
Fill the message box from a built-in prompt library for Repro Only, Diagnose And Plan, and Fix And Verify
Monitor live logs via streaming
View reports, diffs, and worklogs
Follow up on stopped jobs with steering messages
Take Over — copies an SSH command that drops you into the job's worktree with the venv activated
Create PRs directly from the UI

Web UI preview

Add a screenshot at docs/assets/web-ui.png and this README will render it automatically.

The prompt library is backed by ~/.ptq/config.toml.

Per-agent model defaults live there too. For backends with first-class reasoning controls, you can set thinking separately from the model:

[models.pi]
default = "openai-codex/gpt-5.4"
thinking = "high"

[models.claude]
default = "opus"
thinking = "high"

[models.codex]
default = "gpt-5.4"
thinking = "high"

Cursor currently encodes reasoning level in the model name itself, so PTQ continues to treat Cursor thinking as model-driven.

Built-ins are always available and can be overridden under [prompt_library.builtin.<name>]
User presets can be added under [prompt_library.custom.<name>]

List everything available from CLI with:

ptq presets

5. Monitor progress (CLI)

# Peek at the agent's worklog
uv run ptq peek 174923

# Peek with recent log activity
uv run ptq peek 174923 --log 30

# List all jobs with running/stopped status
uv run ptq list

The agent maintains a worklog.md with entries after each significant step, so you can check progress without streaming the full output.

6. View results

# By issue number (uses most recent job)
uv run ptq results 174923

# By full job ID
uv run ptq results 20260214-174923

Fetches report.md, fix.diff, worklog.md, and the run log from the remote.

7. Apply the fix

uv run ptq apply 174923 --pytorch-path ~/meta/pytorch

Creates a branch ptq/{issue_number}, applies the diff, and prints next steps for creating a PR.

8. Manage agents

# Check status of a specific job
uv run ptq status 174923

# Kill a specific agent
uv run ptq kill 174923

# Kill all agents on a machine (tracked + zombie processes)
uv run ptq prune my-gpu-box

# Kill all local agents
uv run ptq prune --local

9. Clean up

# Remove all jobs on a machine
uv run ptq clean my-gpu-box

# Keep the 3 most recent
uv run ptq clean my-gpu-box --keep 3

# Clean local workspace
uv run ptq clean --local

Removes job directories and prunes git worktrees.

Options

Flag	Command	Default	Description
`--cuda`	setup	auto-detect	CUDA tag (`cu124`, `cu126`, `cu128`, `cu130`)
`--cpu`	setup		Use CPU-only PyTorch (macOS/testing)
`--machine`	run, worktree		Remote machine hostname
`--local`	setup, run, worktree, clean, prune		Use local workspace instead of SSH
`--follow/--no-follow`	run	follow	Stream agent output to terminal
`--agent`	run	claude	Agent (`claude`, `codex`, `cursor`, `pi`)
`--model`	run	opus	Model name (agent-specific)
`--thinking`	run	agent default	Reasoning/thinking level when supported by the agent
`--max-turns`	run	100	Max agent turns
`-m/--message`	run		Ad-hoc task or extra context for an issue
`-p/--preset`	run		Prompt preset key/title from prompt library
`--workspace`	setup, run, worktree, prune	`~/ptq_workspace`	Custom workspace path
`--keep`	clean	0	Number of recent jobs to keep
`--log`	peek	0	Number of log lines to show

Adding a new repo

Add a [repos.<name>] section to ~/.ptq/config.toml:

[repos.torchtitan]
github_repo = "pytorch/torchtitan"
clone_url = "https://github.com/pytorch/torchtitan.git"
dir_name = "torchtitan"
smoke_test_import = "torchtitan"
repro_import_hint = "import torchtitan"

Create prompt templates in prompts/:
- prompts/investigate_<name>.md — issue investigation prompt
- prompts/adhoc_<name>.md — freeform task prompt

The prompt templates are where the real work is — they teach the agent about the repo's build system, directory layout, debugging tools, and testing conventions. See the existing investigate.md and investigate_torchtitan.md for examples.

Optional profile fields (all default to false/null):

Field	Description
`uses_custom_worktree_tool`	Use `tools/create_worktree.py` instead of `git worktree add`
`needs_cpp_build`	Run C++ rebuild after worktree creation
`lint_cmd`	Lint command to run before PRs

Project layout

pt_job_queue/
├── pyproject.toml
├── ptq/
│   ├── cli.py                          # Thin Typer CLI adapter
│   ├── ssh.py                          # SSH/SCP + local subprocess backends
│   ├── issue.py                        # GitHub issue fetching via gh
│   ├── agent.py                        # Prompt construction + text utilities
│   ├── agents.py                       # Agent protocol + claude/codex/cursor/pi
│   ├── config.py                       # Config loading (~/.ptq/config.toml)
│   ├── workspace.py                    # Remote workspace setup
│   ├── domain/
│   │   ├── models.py                   # JobRecord, RunRequest, JobStatus, errors
│   │   └── policies.py                 # Job ID generation
│   ├── infrastructure/
│   │   ├── job_repository.py           # JSON persistence (~/.ptq/jobs.json)
│   │   └── backends.py                 # Backend factory functions
│   ├── application/
│   │   ├── run_service.py              # Launch/rerun orchestration
│   │   ├── worktree_service.py         # Worktree + venv provisioning
│   │   ├── job_service.py              # Status/kill/clean/list
│   │   ├── artifact_service.py         # Results fetching + diff apply
│   │   └── pr_service.py              # PR creation workflow
│   └── web/
│       ├── app.py                      # FastAPI app factory
│       ├── deps.py                     # Template + status helpers
│       ├── routes.py                   # Thin web route adapter
│       ├── static/style.css            # Dark-theme styles
│       └── templates/                  # Jinja2 templates (Pico CSS + htmx)
├── prompts/
│   ├── investigate.md                  # PyTorch issue investigation prompt
│   ├── adhoc.md                        # PyTorch freeform task prompt
│   ├── investigate_torchtitan.md       # TorchTitan issue investigation prompt
│   └── adhoc_torchtitan.md             # TorchTitan freeform task prompt
└── scripts/
    └── rebuild.sh

Workspace layout (on remote/local)

~/ptq_workspace/
├── .venv/                          # uv-managed, PyTorch nightly
├── pytorch/                        # Source clone at nightly commit
├── scripts/apply_to_site_pkgs.sh   # Copies edits to site-packages
└── jobs/
    └── 20260214-174923/            # Per-issue job directory
        ├── pytorch/                # git worktree (isolated)
        ├── system_prompt.md
        ├── repro.py
        ├── claude-1.log            # Per-run logs
        ├── claude-2.log
        ├── worklog.md              # Agent progress log
        ├── report.md
        └── fix.diff

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
docs/assets		docs/assets
prompts		prompts
ptq		ptq
scripts		scripts
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENTS.md		AGENTS.md
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ptq — PyTorch Job Queue

Install

UBER SPEED MODE GETING STARTED

Usage

1. Set up a machine

2. Create a named worktree

3. Launch an investigation

4. Web dashboard

Web UI preview

5. Monitor progress (CLI)

6. View results

7. Apply the fix

8. Manage agents

9. Clean up

Options

Adding a new repo

Project layout

Workspace layout (on remote/local)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ptq — PyTorch Job Queue

Install

UBER SPEED MODE GETING STARTED

Usage

1. Set up a machine

2. Create a named worktree

3. Launch an investigation

4. Web dashboard

Web UI preview

5. Monitor progress (CLI)

6. View results

7. Apply the fix

8. Manage agents

9. Clean up

Options

Adding a new repo

Project layout

Workspace layout (on remote/local)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages