From 4845fed6996ea124cabdad20c574241e520b6674 Mon Sep 17 00:00:00 2001
From: "Shen, Haihao"
Date: Fri, 10 Apr 2026 07:53:13 +0000
Subject: [PATCH 1/3] Add agents.md

---
 AGENTS.md    | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 AGENTS_CN.md | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 192 insertions(+)
 create mode 100644 AGENTS.md
 create mode 100644 AGENTS_CN.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 000000000..250eff8b4
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,97 @@
+# AGENTS.md
+
+Agent-facing notes for working on AutoRound.
+
+## Project overview
+- AutoRound is a Python toolkit for low-bit quantization of LLMs and VLMs.
+- Primary entry points are CLI commands like `auto-round`, `auto-round-best`, and `auto-round-light`.
+
+## Repository layout
+- `auto_round/`: core library and quantization logic
+- `docs/`: user guides and technical notes
+- `test/`: unit tests
+- `.azure-pipelines/`: CI scripts and test orchestration
+
+## Architecture overview
+- CLI entry points: `auto_round/__main__.py`
+- Core quantization API: `auto_round/autoround.py`
+- Compressor implementations: `auto_round/compressors/` (LLM/MLLM/Diffusion)
+- Schemes and presets: `auto_round/schemes.py`
+- Export formats/pipeline: `auto_round/formats.py`, `auto_round/export/`
+- AutoScheme generator: `auto_round/auto_scheme/`
+- Data type definitions: `auto_round/data_type/`
+- Model-specific patches: `auto_round/modeling/`
+- Shared utilities: `auto_round/utils/`
+- Evaluation entry points: `auto_round/eval/`
+
+## Setup
+- Python 3.10+ is expected.
+- Install for development: `pip install -e .`
+- Install runtime deps only: `pip install -r requirements.txt`
+
+## Build and test commands
+- Build from source: `pip install .`
+- Run unit tests: `pytest test`
+
+## Common commands
+- CLI help: `auto-round -h`
+- List supported formats: `auto_round list format`
+
+## Minimal examples
+- CLI quantization:
+
+```bash
+auto-round --model Qwen/Qwen3-0.6B --scheme "W4A16" --format "auto_round" --output_dir ./tmp_autoround
+```
+
+- API quantization:
+
+```python
+from auto_round import AutoRound
+
+ar = AutoRound("Qwen/Qwen3-0.6B", scheme="W4A16")
+ar.quantize_and_save(output_dir="./qmodel", format="auto_round")
+```
+
+- GGUF export (single format):
+
+```bash
+auto-round --model Qwen/Qwen3-0.6B --scheme "W4A16" --format "gguf:q4_k_m" --output_dir ./tmp_autoround_gguf
+```
+
+- CompressedTensors export (LLM-Compressor format):
+
+```bash
+auto-round --model Qwen/Qwen3-0.6B --scheme "NVFP4" --format "llm_compressor" --output_dir ./tmp_autoround_ct
+```
+
+## Tests
+- Quick local run: `pytest test`
+- CI runs split CPU tests and optional LLM-compressor tests via
+  `.azure-pipelines/scripts/ut/run_ut.sh` (uses `uv`, `numactl`, and
+  installs extra deps). Expect longer runtimes and more system
+  requirements.
+- Prefer targeted tests for the area you changed.
+
+## Code style and linting
+- Python formatting is aligned with Black/Ruff, line length 120.
+- Prefer double quotes for new strings (Ruff format default).
+- Keep imports sorted (isort profile: black).
+
+## Docs and translations
+- If you change a markdown doc that has a `_CN` counterpart, update the
+  Chinese file to keep content and structure aligned (for example,
+  `README.md` and `README_CN.md`).
+
+## Contributions
+- DCO sign-off is required for commits (see `CONTRIBUTING.md`).
+
+## Data and large files
+- Avoid committing model weights, large binaries, or datasets.
+- If you need sample data, prefer small fixtures or use public URLs.
+
+## Common agent pitfalls
+- Do not mix GGUF export with other formats; choose a single GGUF format.
+- When editing a markdown doc with a `_CN` counterpart, update both files.
+- Avoid running full CI locally unless needed; use targeted tests first.
+- Do not add large artifacts (models, binaries, datasets) to the repo.
diff --git a/AGENTS_CN.md b/AGENTS_CN.md
new file mode 100644
index 000000000..bc996fa19
--- /dev/null
+++ b/AGENTS_CN.md
@@ -0,0 +1,95 @@
+# AGENTS.md
+
+AutoRound 智能体使用说明。
+
+## 项目概览
+- AutoRound 是面向 LLM/VLM 的低比特量化 Python 工具包。
+- 主要入口是 CLI 命令,例如 `auto-round`、`auto-round-best`、`auto-round-light`。
+
+## 仓库结构
+- `auto_round/`: 核心库与量化逻辑
+- `docs/`: 使用指南与技术文档
+- `test/`: 单元测试
+- `.azure-pipelines/`: CI 脚本与测试编排
+
+## 架构概览
+- CLI 入口:`auto_round/__main__.py`
+- 核心量化 API:`auto_round/autoround.py`
+- 压缩器实现:`auto_round/compressors/`(LLM/MLLM/Diffusion)
+- 方案与预设:`auto_round/schemes.py`
+- 导出格式与流程:`auto_round/formats.py`、`auto_round/export/`
+- AutoScheme 生成器:`auto_round/auto_scheme/`
+- 数据类型定义:`auto_round/data_type/`
+- 模型特定补丁:`auto_round/modeling/`
+- 通用工具:`auto_round/utils/`
+- 评测入口:`auto_round/eval/`
+
+## 环境设置
+- 需要 Python 3.10+。
+- 开发安装:`pip install -e .`
+- 仅安装运行时依赖:`pip install -r requirements.txt`
+
+## 构建与测试命令
+- 从源码构建:`pip install .`
+- 运行单元测试:`pytest test`
+
+## 常用命令
+- CLI 帮助:`auto-round -h`
+- 列出支持的导出格式:`auto_round list format`
+
+## 最小示例
+- CLI 量化:
+
+```bash
+auto-round --model Qwen/Qwen3-0.6B --scheme "W4A16" --format "auto_round" --output_dir ./tmp_autoround
+```
+
+- API 量化:
+
+```python
+from auto_round import AutoRound
+
+ar = AutoRound("Qwen/Qwen3-0.6B", scheme="W4A16")
+ar.quantize_and_save(output_dir="./qmodel", format="auto_round")
+```
+
+- GGUF 导出(单一格式):
+
+```bash
+auto-round --model Qwen/Qwen3-0.6B --scheme "W4A16" --format "gguf:q4_k_m" --output_dir ./tmp_autoround_gguf
+```
+
+- CompressedTensors 导出(LLM-Compressor 格式):
+
+```bash
+auto-round --model Qwen/Qwen3-0.6B --scheme "NVFP4" --format "llm_compressor" --output_dir ./tmp_autoround_ct
+```
+
+## 测试
+- 本地快速运行:`pytest test`
+- CI 通过 `.azure-pipelines/scripts/ut/run_ut.sh` 拆分运行 CPU 测试,并在部分场景下
+  运行 LLM-compressor 测试(依赖 `uv`、`numactl`,会安装额外依赖)。运行时间更长且
+  对系统资源要求更高。
+- 优先运行与你改动相关的定向测试。
+
+## 代码风格与检查
+- Python 格式化遵循 Black/Ruff,行宽 120。
+- 新增字符串优先使用双引号(Ruff 默认)。
+- 保持导入排序一致(isort profile: black)。
+
+## 文档与翻译
+- 如果修改了带 `_CN` 对应版本的 Markdown,请同步更新中文文件以保持内容和结构一致
+  (例如 `README.md` 与 `README_CN.md`)。
+
+## 贡献
+- 提交需要 DCO 签名(参见 `CONTRIBUTING.md`)。
+
+## 数据与大文件
+- 避免提交模型权重、大型二进制或数据集。
+- 需要样例数据时,优先使用小型 fixture 或公开 URL。
+
+## 智能体常见错误防范
+- 不要把 GGUF 导出与其他格式混用;GGUF 只选一种格式。
+- 修改带 `_CN` 对应版本的 Markdown 时,请同步更新两份文件。
+- 非必要不要在本地跑完整 CI;优先定向测试。
+- 不要提交大型产物(模型、二进制、数据集)。

From 4de7be8bf0193871db41d4e66b56eebdbdab6c97 Mon Sep 17 00:00:00 2001
From: Haihao Shen
Date: Fri, 10 Apr 2026 16:17:19 +0800
Subject: [PATCH 2/3] Update AGENTS.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 AGENTS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AGENTS.md b/AGENTS.md
index 250eff8b4..d179e9155 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -35,7 +35,7 @@ Agent-facing notes for working on AutoRound.
 
 ## Common commands
 - CLI help: `auto-round -h`
-- List supported formats: `auto_round list format`
+- List supported formats: `auto-round list format`
 
 ## Minimal examples
 - CLI quantization:

From fe67630c8d068d251cc4e80abc9a90ad306ebc2b Mon Sep 17 00:00:00 2001
From: Haihao Shen
Date: Fri, 10 Apr 2026 16:17:29 +0800
Subject: [PATCH 3/3] Update AGENTS_CN.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 AGENTS_CN.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/AGENTS_CN.md b/AGENTS_CN.md
index bc996fa19..c999b169a 100644
--- a/AGENTS_CN.md
+++ b/AGENTS_CN.md
@@ -35,7 +35,7 @@ AutoRound 智能体使用说明。
 
 ## 常用命令
 - CLI 帮助:`auto-round -h`
-- 列出支持的导出格式:`auto_round list format`
+- 列出支持的导出格式:`auto-round list format`
 
 ## 最小示例
 - CLI 量化:
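
Both files require that a Markdown doc with a `_CN` counterpart be updated together with it. That rule can be checked mechanically. The following is a minimal Python sketch, assuming the `NAME.md` / `NAME_CN.md` naming convention used by `README.md` and `README_CN.md`. The helpers `counterpart` and `unsynced_docs` are hypothetical illustrations and are not part of AutoRound.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Iterable, Optional


def counterpart(path: str) -> Optional[str]:
    """Return the paired English/Chinese doc for a Markdown path, or None.

    Hypothetical helper; assumes the convention NAME.md <-> NAME_CN.md.
    """
    p = PurePosixPath(path)
    if p.suffix != ".md":
        return None  # only Markdown docs are assumed to have translations
    if p.stem.endswith("_CN"):
        # Chinese file -> English original (strip the "_CN" suffix)
        return str(p.with_name(p.stem[: -len("_CN")] + ".md"))
    # English file -> Chinese translation
    return str(p.with_name(p.stem + "_CN.md"))


def unsynced_docs(changed: Iterable[str], existing: set[str]) -> list[str]:
    """Return changed Markdown files whose existing counterpart was not changed too.

    changed:  paths touched by the change set (e.g. from `git diff --name-only`)
    existing: all tracked paths (e.g. from `git ls-files`)
    """
    changed_set = set(changed)
    missing = []
    for path in sorted(changed_set):
        other = counterpart(path)
        # Flag only when the counterpart exists in the repo but was left untouched.
        if other is not None and other in existing and other not in changed_set:
            missing.append(path)
    return missing


if __name__ == "__main__":
    tracked = {"README.md", "README_CN.md", "AGENTS.md", "AGENTS_CN.md"}
    print(unsynced_docs(["AGENTS.md", "AGENTS_CN.md"], tracked))  # []
    print(unsynced_docs(["README.md"], tracked))  # ['README.md']
```

In practice, `changed` could be populated from `git diff --name-only` against the target branch, and `existing` from `git ls-files`. Any path returned is a doc whose paired translation was not updated.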