Plugin Reduction Analysis: Moving Intelligence from Builder to Fromager
Executive Summary
Analysis of 62 package plugins and 171 YAML settings files in the AIPCC wheels builder shows that 22 plugins (35%) could be fully eliminated and ~12 more (19%) significantly simplified by adding targeted features to Fromager. About 20 plugins (32%) contain genuinely complex, package-specific logic that must remain as plugins.
The highest-impact Fromager enhancements, ranked by plugin elimination potential:
| Proposed Fromager Feature | Plugins Eliminated | Plugins Simplified |
|---|---|---|
| YAML-driven resolver configuration (GitLab/GitHub tag provider) | 12 fully | 10 partially |
| Midstream tag matcher in YAML settings | 10 fully (subset of resolver) | n/a |
| Auto-detect when ensure_pkg_info is needed | 3 fully | 13 partially |
| YAML-driven version override (build_version setting) | 3 fully | 2 partially |
| YAML-driven dependency patching (remove_install_requires) | 2 fully | 4 partially |
| YAML-driven CMake FETCHCONTENT_SOURCE_DIR downloads | 1 fully | 5 partially |
| Missing file creation via YAML | 1 fully | n/a |
1. Current Plugin Landscape
1.1 Hook Usage Distribution
| Hook | Plugin Count | % of 62 Plugins |
|---|---|---|
| prepare_source | 33 | 53% |
| get_resolver_provider | 27 | 44% |
| build_wheel | 21 | 34% |
| download_source | 14 | 23% |
| update_extra_environ | 11 | 18% |
| resolve_source | 3 | 5% |
| build_sdist | 4 | 6% |
| get_build_system_dependencies | 2 | 3% |
| get_build_backend_dependencies | 2 | 3% |
| get_build_sdist_dependencies | 2 | 3% |
| get_install_dependencies_of_sdist | 2 | 3% |
1.2 Plugin Complexity Tiers
Tier 1 — Trivial (could be YAML only): 17 plugins
- Resolver-only: deep_ep, kubeflow, litellm, llama_stack, mlflow_skinny, mlflow_tracing, mlserver, mlserver_lightgbm, mlserver_sklearn, mlserver_xgboost, pplx_kernels, tritonclient
- PKG-INFO only: ibm_fms, userpath, cuda_pathfinder
- Version override only: torchaudio, torchvision
Tier 2 — Low complexity (1–2 simple operations): 15 plugins
- Source patching: amdsmi, certifi, docling, hf_xet, kfp, kornia_rs, tqdm_multiprocess, tpu_inference
- Simple git source: nvidia_riva_client, tacozip, deep_gemm
- Version+env: trl, torchao, faiss_cpu
Tier 3 — Medium complexity (multi-hook with custom logic): 16 plugins
- Git source + dependency downloads: cmake, onnx, nvidia_cudnn_frontend, nixl, nixl_cu12, nixl_cu13, opencv_python_headless, symengine
- Custom Rust vendoring: meson, outlines_core
- Version-to-commit mapping: submodlib_py, tpu_info, vllm_hpu_extension
- Custom build steps: flashinfer_python, llvmlite, pydantic_core
Tier 4 — High complexity (irreducible): 14 plugins
- Full custom build pipelines: aotriton, pyarrow, tensorflow, tilelang, triton
- Multi-variant hardware builds: torch, vllm, bitsandbytes
- Custom resolvers: tensorflow_rocm, amdsmi (dynamic constraints)
- Special builds: flit_core, mlflow, pypdfium2, flashinfer_python
- Complex dependency management: vllm_neuron, nixl_cu12/cu13
2. Proposed Fromager Enhancements
2.1 YAML-Driven Git Tag Resolver (HIGH IMPACT)
Problem: 27 plugins implement get_resolver_provider, and 12 of these exist solely for this hook. Most just wire up GitLabTagProvider or GitHubTagProvider with a project path and an optional tag matcher.
Current plugin pattern (repeated 11+ times):
def get_resolver_provider(ctx, req, sdist_server_url, include_sdists, include_wheels, **kwargs):
    if include_sdists:
        return resolver.GitLabTagProvider(
            project_path="/redhat/rhel-ai/core/mirrors/github/org/repo",
            constraints=ctx.constraints,
        )
    return resolver.default_resolver_provider(ctx, req, sdist_server_url, ...)
Proposed YAML setting:
# overrides/settings/package.yaml
resolver_dist:
  provider: gitlab_tags # or "github_tags" or "pypi" (default)
  project_path: "/redhat/rhel-ai/core/mirrors/github/org/repo" # for gitlab_tags
  # OR
  organization: "org" # for github_tags
  repo: "repo" # for github_tags
  tag_matcher: "midstream" # built-in matchers: "standard", "midstream", or regex string
  tag_filter_asset: "vendored-deps" # optional: filter tags by asset name prefix
  include_sdists: false
  include_wheels: true
Built-in tag matchers Fromager should support:
- "standard": default v1.2.3 / 1.2.3 pattern (already exists)
- "midstream": ADR-114 pattern v{version}+rhai{N}[.{accelerator}] (used by 10 plugins)
- Custom regex string: e.g., "v(.+)-stable" (litellm); pplx_kernels also uses a custom regex (see Appendix A.1)
- Callable: for truly complex cases (keep the plugin hook as an escape hatch)
Plugins eliminated: deep_ep, kubeflow, litellm, llama_stack, mlflow_skinny, mlflow_tracing, mlserver, mlserver_lightgbm, mlserver_sklearn, mlserver_xgboost, pplx_kernels, tritonclient (12 plugins)
Plugins simplified: aiter, deep_gemm, flashinfer_jit_cache, nixl, nixl_cu12, nixl_cu13, nvidia_cudnn_frontend, torchao, tilelang, vllm_neuron (10 plugins — resolver portion eliminated)
Implementation notes:
- The create_midstream_matcher() function from package_plugins/utils.py would need to be ported to Fromager
- The midstream matcher supports an optional accelerator parameter and a filter_by_accelerator flag
- The tag_filter_asset option handles the mlflow pattern of filtering GitHub releases by attached assets
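The matcher options above could be compiled into plain tag-to-version functions. The sketch below uses a hypothetical make_tag_matcher(); it is not the API of Fromager or of package_plugins/utils.py. It assumes the midstream grammar is exactly v{version}+rhai{N}[.{accelerator}] as described above, and it models filter_by_accelerator as an optional accelerator argument. It returns version strings, whereas a real port would return packaging Version objects.

```python
import re
from typing import Callable, Optional

# Hypothetical built-in patterns for the proposed tag_matcher names.
STANDARD = re.compile(r"v?(\d+(?:\.\d+)*)")
# Assumed ADR-114 shape: v{version}+rhai{N}[.{accelerator}]
MIDSTREAM = re.compile(
    r"v(?P<version>\d+(?:\.\d+)*)\+rhai(?P<build>\d+)(?:\.(?P<accel>[a-z0-9]+))?"
)

TagMatcher = Callable[[str], Optional[str]]


def make_tag_matcher(spec: str, accelerator: Optional[str] = None) -> TagMatcher:
    """Map a git tag to a version string, or None when the tag should be skipped.

    spec is "standard", "midstream", or any other string, which is treated as a
    regex whose first capture group (or whole match, if it has none) is the version.
    """
    if spec == "standard":
        def match_standard(tag: str) -> Optional[str]:
            m = STANDARD.fullmatch(tag)
            return m.group(1) if m else None
        return match_standard

    if spec == "midstream":
        def match_midstream(tag: str) -> Optional[str]:
            m = MIDSTREAM.fullmatch(tag)
            if m is None:
                return None
            accel = m.group("accel")
            # When an accelerator is requested, skip tags built for other accelerators.
            if accelerator is not None and accel != accelerator:
                return None
            local = f"rhai{m.group('build')}" + (f".{accel}" if accel else "")
            return f"{m.group('version')}+{local}"
        return match_midstream

    pattern = re.compile(spec)

    def match_regex(tag: str) -> Optional[str]:
        m = pattern.fullmatch(tag)
        if m is None:
            return None
        return m.group(1) if pattern.groups else m.group(0)
    return match_regex
```

For example, make_tag_matcher("v(.+)-stable") maps v1.70.0-stable to 1.70.0 and skips a plain v1.70.0. A regex without a capture group returns the whole tag.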
2.2 Auto-Detect and Generate PKG-INFO (MEDIUM IMPACT)
Problem: 16 plugins call ensure_pkg_info() in their prepare_source hook. For 3 plugins (ibm_fms, userpath, cuda_pathfinder), this is their only purpose.
Current pattern:
def prepare_source(ctx, req, source_filename, version):
    source_root_dir, is_new = sources.default_prepare_source(ctx, req, source_filename, version)
    if is_new:
        sources.ensure_pkg_info(source_root_dir, req, version)
    return source_root_dir, is_new
Proposed Fromager change:
Make default_prepare_source automatically call ensure_pkg_info when PKG-INFO is missing. This is already done in default_build_sdist, but by the time build_sdist runs, some tools (setuptools-scm) may have already failed during dependency resolution.
Specifically, in sources.default_prepare_source(), after prepare_new_source(), check whether PKG-INFO exists and, if not, call ensure_pkg_info(). This is safe, and it is the right place for the call, because:
- ensure_pkg_info only creates PKG-INFO if it doesn't exist
- Having PKG-INFO present never breaks builds
- It's needed for setuptools-scm fallback version detection
- It must happen before dependency hooks run (not just before sdist)
Alternatively (conservative approach): Add a YAML setting:
# overrides/settings/package.yaml
source_options:
  ensure_pkg_info: true # default: false, for backward compatibility
Plugins eliminated: ibm_fms, userpath, cuda_pathfinder (3 plugins)
Plugins simplified: 13 other plugins that call ensure_pkg_info in their prepare_source can remove that line
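The check proposed above is small enough to sketch directly. Here write_minimal_pkg_info() is a hypothetical stand-in for Fromager's ensure_pkg_info(); the real function's signature and output are not reproduced. The stand-in emits only the core Metadata-Version, Name, and Version fields. The optional build_dir argument reflects the cuda_pathfinder note in Appendix A.2 and is an assumption about how the change would handle monorepo layouts.

```python
from pathlib import Path
from typing import Optional


def write_minimal_pkg_info(target_dir: Path, name: str, version: str) -> Path:
    """Hypothetical stand-in for ensure_pkg_info(): write core metadata fields only."""
    pkg_info = target_dir / "PKG-INFO"
    pkg_info.write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n",
        encoding="utf-8",
    )
    return pkg_info


def ensure_pkg_info_if_missing(
    source_root: Path, name: str, version: str, build_dir: Optional[str] = None
) -> bool:
    """Create PKG-INFO only when it is absent; return True if a file was written.

    build_dir mirrors the YAML build_dir setting (assumed), so PKG-INFO lands in
    the subdirectory the build backend runs from.
    """
    target_dir = source_root / build_dir if build_dir else source_root
    if (target_dir / "PKG-INFO").exists():
        return False  # never overwrite metadata shipped in the sdist
    write_minimal_pkg_info(target_dir, name, version)
    return True
```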
2.3 YAML-Driven Dependency Patching (MEDIUM IMPACT)
Problem: 9 plugins patch pyproject.toml, setup.cfg, or requirements.in to remove or modify runtime dependencies. Fromager already supports project_override.remove_build_requires for build dependencies, but not for install/runtime dependencies.
Current plugin patterns:
Pattern A: Remove install dependencies (docling, kfp, tpu_inference, vllm):
def prepare_source(ctx, req, source_filename, version):
    source_root_dir, is_new = sources.default_prepare_source(...)
    if is_new:
        utils.replace_lines(source_root_dir / "pyproject.toml", {
            utils.poetry_pkg("easyocr"): "",  # remove dependency line
            utils.poetry_pkg("rapidocr"): "",  # remove dependency line
        })
    return source_root_dir, is_new
Pattern B: Remove a build dependency that is not in [build-system] (amdsmi):
# Remove 'clang' from [project] dependencies section
utils.replace_lines(source_root_dir / "pyproject.toml", {
    r'"clang",': "# clang removed",
})
Proposed YAML setting:
# overrides/settings/package.yaml
project_override:
  # Existing (already works):
  remove_build_requires:
    - cmake
  update_build_requires:
    - "setuptools>=69.0"
  # NEW: Remove install/runtime dependencies
  remove_install_requires:
    - easyocr
    - rapidocr-onnxruntime
    - clang
  # NEW: Update/add install dependencies
  update_install_requires:
    - "torchvision>=0.20,<0.21" # replace version constraint
  # NEW: Remove optional/extra dependencies
  remove_extras_requires:
    dev:
      - pytest-benchmark
    ocr:
      - easyocr
Implementation: Fromager's pyproject.apply_project_override() (already called in prepare_new_source()) would be extended to handle these new keys. It already has the infrastructure for modifying pyproject.toml and setup.cfg.
Plugins eliminated: docling, kfp (2 plugins fully — their only purpose is dependency removal)
Plugins simplified: amdsmi, tpu_inference, vllm, vllm_neuron (4 plugins — dependency patching portion removed)
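A remove_install_requires step should match on normalized project names rather than raw strings, so that "easyocr>=1.4" and "EasyOCR" are treated alike. The sketch below filters an already-parsed [project].dependencies list. It uses a simplified regex for the PEP 508 name and applies PEP 503 normalization. A real change to apply_project_override() would more likely use packaging.requirements.Requirement and packaging.utils.canonicalize_name, and would still need a TOML writer to save the result. The helper names are hypothetical.

```python
import re
from typing import Iterable, List

# PEP 503 normalization: runs of "-", "_" and "." collapse to "-", then lowercase.
_SEPARATORS = re.compile(r"[-_.]+")
# Leading project name of a PEP 508 requirement string (simplified).
_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonicalize(name: str) -> str:
    return _SEPARATORS.sub("-", name).lower()


def requirement_name(requirement: str) -> str:
    m = _NAME.match(requirement)
    if m is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return canonicalize(m.group(1))


def remove_install_requires(dependencies: Iterable[str], remove: Iterable[str]) -> List[str]:
    """Drop every dependency whose normalized project name appears in `remove`."""
    to_remove = {canonicalize(name) for name in remove}
    return [dep for dep in dependencies if requirement_name(dep) not in to_remove]
```

Matching on names rather than full strings means a version bump upstream (easyocr>=1.4 becoming easyocr>=1.7) does not silently defeat the override, unlike a line-based replacement.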
2.4 YAML-Driven Version Override (MEDIUM IMPACT)
Problem: 5 plugins exist solely or primarily to set BUILD_VERSION environment variable for Torch/Meta ecosystem packages. This is a well-understood, deterministic pattern.
Current plugin pattern:
# torchaudio.py, torchvision.py (identical)
def build_wheel(ctx, build_env, extra_environ, req, sdist_root_dir, build_dir, version):
    extra_environ["BUILD_VERSION"] = str(version)
    return wheels.default_build_wheel(ctx, build_env, extra_environ, req, sdist_root_dir, build_dir, version)
# trl.py
def build_wheel(...):
    extra_environ["PYTORCH_BUILD_VERSION"] = version.base_version
    extra_environ["PYTORCH_BUILD_NUMBER"] = str(version.post) if version.post else "0"
    return wheels.default_build_wheel(...)
Proposed YAML setting:
# overrides/settings/torchaudio.yaml
env:
  BUILD_VERSION: "${version}" # NEW: support ${version} template variable
# OR more explicit:
build_version:
  BUILD_VERSION: "${version}"
  # For torch/trl:
  PYTORCH_BUILD_VERSION: "${version.base_version}"
  PYTORCH_BUILD_NUMBER: "${version.post:-0}"
Implementation: Extend Fromager's get_extra_environ() to support ${version}, ${version.base_version}, and ${version.post} template variables in the env: section, plus a ${name:-default} fallback for values such as ${version.post:-0}. Currently only ${VAR} (environment variable references) are supported.
Plugins eliminated: torchaudio, torchvision, trl (3 plugins)
Plugins simplified: torchao (version override portion removed), torch (BUILD_VERSION portion)
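The template extension can be sketched as a single substitution pass. The sketch makes three assumptions. First, it adopts the ${name:-default} fallback syntax from the proposed ${version.post:-0} value. Second, to stay dependency-free, it parses only simple N.N.N[.postN] versions; an actual Fromager change would derive these values from packaging.version.Version. Third, leaving unknown references untouched is a design choice, not documented Fromager behavior. The function names expand_env() and version_vars() are hypothetical.

```python
import re
from typing import Dict, Mapping, Optional

# ${name} or ${name:-default}; names may contain dots, e.g. ${version.base_version}
_TEMPLATE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)(?::-([^}]*))?\}")


def version_vars(version: str) -> Dict[str, Optional[str]]:
    """Derive template variables from a simple release[.postN] version string."""
    m = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.post(\d+))?", version)
    if m is None:
        raise ValueError(f"unsupported version for this sketch: {version!r}")
    return {
        "version": version,
        "version.base_version": m.group(1),
        "version.post": m.group(2),  # None when there is no post-release
    }


def expand_env(
    value: str,
    variables: Mapping[str, Optional[str]],
    environ: Mapping[str, str],
) -> str:
    """Expand version variables first, then fall back to existing ${VAR} lookups."""
    def substitute(m: re.Match) -> str:
        name, default = m.group(1), m.group(2)
        resolved = variables.get(name)
        if resolved is not None:
            return resolved
        if name in environ:
            return environ[name]
        if default is not None:
            return default
        return m.group(0)  # leave unknown references untouched
    return _TEMPLATE.sub(substitute, value)
```

With these rules, the trl setting PYTORCH_BUILD_NUMBER: "${version.post:-0}" yields "3" for 2.5.1.post3 and "0" for 2.5.1. This matches the plugin's `str(version.post) if version.post else "0"`.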
2.5 YAML-Driven CMake Dependency Downloads (MEDIUM IMPACT)
Problem: 5 plugins download external source archives during prepare_source for CMake FETCHCONTENT_SOURCE_DIR_* offline builds. The pattern is identical: download a tarball, extract it, set an environment variable pointing to it.
Current plugin pattern (repeated in cmake, onnx, nvidia_cudnn_frontend, symengine, nixl_cu12/cu13):
def prepare_source(ctx, req, source_filename, version):
    source_root_dir, is_new = sources.default_prepare_source(...)
    if is_new:
        # Download dependency
        tarball = utils.download_url(
            source_root_dir / ".deps",
            f"https://github.com/org/dep/archive/v{dep_version}.tar.gz",
            f"dep-{dep_version}.tar.gz",
        )
        # Extract
        shutil.unpack_archive(tarball, source_root_dir / ".deps")
    return source_root_dir, is_new
def update_extra_environ(ctx, req, version, sdist_root_dir, extra_environ, build_env):
    extra_environ["FETCHCONTENT_SOURCE_DIR_DEP"] = str(
        sdist_root_dir / ".deps" / f"dep-{dep_version}"
    )
Proposed YAML setting:
# overrides/settings/onnx.yaml
build_dependencies:
  - name: protobuf
    url: "https://github.com/protocolbuffers/protobuf/releases/download/v${dep_version}/protobuf-${dep_version}.tar.gz"
    version_from: "CMakeLists.txt" # extract version from this file
    version_regex: 'set\(Protobuf_VERSION "([^"]+)"\)'
    extract_to: ".deps/"
    env_var: "FETCHCONTENT_SOURCE_DIR_PROTOBUF"
  - name: abseil
    url: "https://github.com/abseil/abseil-cpp/releases/download/${dep_version}/abseil-cpp-${dep_version}.tar.gz"
    version_from: ".deps/protobuf-${protobuf.version}/cmake/abseil-cpp.cmake"
    version_regex: 'set\(ABSL_TAG "([^"]+)"\)'
    extract_to: ".deps/"
    env_var: "FETCHCONTENT_SOURCE_DIR_ABSL"
Alternative (simpler) approach, a plain list of downloads:
# overrides/settings/cmake.yaml
source_downloads:
  - url: "https://github.com/Kitware/CMake/releases/download/v3.31.6/cmake-3.31.6.tar.gz"
    destination: "build/cmake-3.31.6-source"
    env_var: "CMAKE_SOURCE_DIR" # optional: set env var pointing to extracted dir
  # For packages needing version extracted from source files
  - url: "https://github.com/org/dep/archive/v${extract:CMakeLists.txt:set\\(DEP_VERSION \"([^\"]+)\"\\)}.tar.gz"
    destination: ".deps/"
Plugins eliminated: cmake (1 plugin; downloading CMake source is its only purpose)
Plugins simplified: onnx, nvidia_cudnn_frontend, symengine, nixl_cu12, nixl_cu13 (5 plugins — download portion removed)
Implementation complexity: Medium. The simple version (explicit URLs + destinations) is straightforward. The version-extraction-from-files feature adds complexity but handles a real pattern (onnx extracting protobuf version from CMakeLists.txt).
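The sketch below covers the simpler source_downloads variant plus a version_from/version_regex lookup. It assumes each archive is a tarball, ideally with a single top-level directory, and treats sha256 as optional (as listed in Appendix C). The function names and the returned environment mapping are hypothetical. urllib.request.urlopen is used only to keep the example dependency-free; it also accepts file:// URLs, which is handy for local testing.

```python
import hashlib
import re
import tarfile
import urllib.request
from pathlib import Path
from typing import Dict, List, Optional


def extract_version(source_root: Path, version_from: str, version_regex: str) -> str:
    """Read a dependency version out of a source file (e.g. CMakeLists.txt)."""
    text = (source_root / version_from).read_text(encoding="utf-8")
    m = re.search(version_regex, text)
    if m is None:
        raise ValueError(f"{version_regex!r} not found in {version_from}")
    return m.group(1)


def apply_source_downloads(source_root: Path, entries: List[dict]) -> Dict[str, str]:
    """Download, verify, and unpack each entry; return env vars to add to the build."""
    extra_environ: Dict[str, str] = {}
    for entry in entries:
        dest = source_root / entry["destination"]
        dest.mkdir(parents=True, exist_ok=True)
        archive = dest / entry["url"].rsplit("/", 1)[-1]
        with urllib.request.urlopen(entry["url"]) as resp:
            archive.write_bytes(resp.read())

        expected: Optional[str] = entry.get("sha256")
        if expected is not None:
            actual = hashlib.sha256(archive.read_bytes()).hexdigest()
            if actual != expected:
                raise ValueError(f"sha256 mismatch for {entry['url']}: {actual}")

        with tarfile.open(archive) as tf:
            top_dirs = {name.split("/", 1)[0] for name in tf.getnames()}
            # Use the safe "data" extraction filter where this Python provides it.
            if hasattr(tarfile, "data_filter"):
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
        archive.unlink()  # keep only the extracted tree

        if entry.get("env_var"):
            # Point at the single top-level directory when there is exactly one.
            target = (dest / top_dirs.pop()) if len(top_dirs) == 1 else dest
            extra_environ[entry["env_var"]] = str(target)
    return extra_environ
```

The sketch fails loudly on a checksum mismatch rather than continuing with an unverified archive. It points the variable at the archive's top-level directory because a FETCHCONTENT_SOURCE_DIR_* variable is meant to name the dependency's source root.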
2.6 Auto-Create Missing Files (LOW IMPACT)
Problem: 1 plugin (tqdm_multiprocess) exists solely to create an empty requirements-dev.txt file that setup.py expects but the sdist doesn't include.
Proposed YAML setting:
# overrides/settings/tqdm_multiprocess.yaml
source_fixes:
  create_files:
    - path: "requirements-dev.txt"
      content: "" # empty file
Plugin eliminated: tqdm_multiprocess (1 plugin)
Implementation complexity: Very low. Add a step in prepare_new_source() that creates specified files if they don't exist.
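Creating missing files needs only two guards: never overwrite a file the sdist already ships, and never write outside the source tree. Below is a hypothetical create_missing_files() helper written under those assumptions. Treating content as optional (defaulting to empty) is also an assumption.

```python
from pathlib import Path
from typing import List


def create_missing_files(source_root: Path, create_files: List[dict]) -> List[Path]:
    """Create each listed file only if it does not already exist; return the new paths."""
    root = source_root.resolve()
    created = []
    for spec in create_files:
        target = (root / spec["path"]).resolve()
        # Reject entries such as "../outside.txt" that would escape the source tree.
        if root not in target.parents:
            raise ValueError(f"path escapes source root: {spec['path']}")
        if target.exists():
            continue  # never clobber files shipped in the sdist
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(spec.get("content", ""), encoding="utf-8")
        created.append(target)
    return created
```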
2.7 YAML-Driven Source Line Patching (LOW-MEDIUM IMPACT)
Problem: 10 plugins use utils.replace_lines() to make regex-based modifications to source files. While Fromager already has a patch system (overrides/patches/), these are often single-line changes that don't warrant full patch files (which are version-specific and fragile).
Current pattern:
utils.replace_lines(source_root_dir / "pyproject.toml", {
    r'"easyocr>=1.4"': "",
    r'rapidocr-onnxruntime': "",
})
Proposed YAML setting:
# overrides/settings/package.yaml
source_patches:
  - file: "pyproject.toml"
    replacements:
      - pattern: '"easyocr>=1.4"'
        replacement: ""
      - pattern: "rapidocr-onnxruntime"
        replacement: ""
  - file: "Cargo.toml"
    replacements:
      - pattern: 'features = \["require-simd"\]'
        replacement: 'features = []'
    arch_only: "s390x" # optional: only apply on specific architectures
Note: This overlaps significantly with proposal 2.3 (remove_install_requires) for the dependency-removal use case. This more general facility would handle non-dependency patching cases like:
- kornia_rs: Disabling SIMD features in Cargo.toml for s390x
- hf_xet: Removing python-source from [tool.maturin]
- deep_gemm: Patching __version__ in __init__.py
Plugins simplified: kornia_rs, hf_xet, deep_gemm, tpu_inference, amdsmi (5 plugins)
Implementation complexity: Medium. Need to handle regex safely, support architecture conditions, and integrate into prepare_new_source().
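Below is a sketch of applying source_patches entries. It rests on three assumptions. First, arch_only is compared against platform.machine(), which reports s390x on IBM Z Linux. Second, each pattern is applied with re.subn across the whole file. Third, a pattern that matches nothing is an error, so a stale patch is caught when upstream changes. These behaviors are illustrative design choices, not an existing Fromager API, and apply_source_patches() is a hypothetical name.

```python
import platform
import re
from pathlib import Path
from typing import List, Optional


def apply_source_patches(
    source_root: Path, patches: List[dict], machine: Optional[str] = None
) -> List[str]:
    """Apply regex replacements file by file; return the files that were rewritten."""
    machine = machine or platform.machine()
    changed = []
    for patch in patches:
        arch = patch.get("arch_only")
        if arch is not None and arch != machine:
            continue  # entry targets a different architecture
        path = source_root / patch["file"]
        text = path.read_text(encoding="utf-8")
        for rep in patch["replacements"]:
            # Note: backslashes in `replacement` are interpreted by re (e.g. \1).
            text, count = re.subn(rep["pattern"], rep["replacement"], text, flags=re.MULTILINE)
            if count == 0:
                raise ValueError(f"{patch['file']}: pattern {rep['pattern']!r} did not match")
        path.write_text(text, encoding="utf-8")
        changed.append(patch["file"])
    return changed
```

Failing on a zero-match pattern trades convenience for safety. A silently skipped patch would otherwise surface much later as a confusing build failure on s390x.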
2.8 Version-to-Commit Mapping in YAML (LOW IMPACT)
Problem: 3 plugins (submodlib_py, tpu_info, vllm_hpu_extension) use resolve_source with VersionMap to map specific versions to git commit SHAs because the upstream repos have no version tags.
Current pattern:
VERSIONS = VersionMap({
    "abc123def": ("1.0.0", "https://gitlab.com/.../archive/abc123def/repo-abc123def.tar.gz"),
})
def resolve_source(ctx, req, sdist_server_url, **kwargs):
    return resolve_specifier(ctx, req, VERSIONS.get_version_info)
Proposed YAML setting:
# overrides/settings/submodlib_py.yaml
version_map:
  "1.0.0":
    commit: "abc123def456"
    url: "https://gitlab.com/.../archive/${commit}/repo-${commit}.tar.gz"
  "1.1.0":
    commit: "def789ghi012"
    url: "https://gitlab.com/.../archive/${commit}/repo-${commit}.tar.gz"
Plugins simplified: submodlib_py, tpu_info, vllm_hpu_extension (3 plugins; the version-to-commit mapping moves to YAML)
Implementation complexity: Low-medium. Needs integration with the resolver system.
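Resolving against a version_map could look like the sketch below. It supports only an exact version or "newest", orders plain numeric versions as integer tuples, and fills the ${commit} placeholder with string.Template. Real integration would plug into Fromager's resolver and evaluate full specifiers. resolve_from_version_map() is a hypothetical name.

```python
from string import Template
from typing import Dict, Optional, Tuple


def _release_key(version: str) -> Tuple[int, ...]:
    # Simplified ordering for plain N.N.N versions; a real resolver would use packaging.
    return tuple(int(part) for part in version.split("."))


def resolve_from_version_map(
    version_map: Dict[str, dict], wanted: Optional[str] = None
) -> Tuple[str, str, str]:
    """Return (version, commit, url) for an exact version, or the newest when wanted is None."""
    if wanted is None:
        if not version_map:
            raise LookupError("version_map is empty")
        wanted = max(version_map, key=_release_key)
    if wanted not in version_map:
        raise LookupError(f"version {wanted} is not in version_map")
    entry = version_map[wanted]
    url = Template(entry["url"]).substitute(commit=entry["commit"])
    return wanted, entry["commit"], url
```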
3. Fromager Internal Improvements (No YAML Changes)
3.1 Automatic Rust Vendor Ordering
Problem: 2 plugins (meson, outlines_core) override prepare_source solely to control the ordering of Rust vendoring relative to patch application. The default order is: unpack → patch → vendor Rust. These plugins need: unpack → vendor Rust → patch.
Current workaround:
def prepare_source(ctx, req, source_filename, version):
    source_root_dir = sources.unpack_source(...)
    vendor_rust.vendor_rust(...) # vendor BEFORE patching
    sources.patch_source(...) # then patch
    return source_root_dir, True
Proposed Fromager change:
# overrides/settings/meson.yaml
source_options:
  vendor_rust_before_patch: true # default: false (current behavior)
Or better: Fromager could auto-detect when patches modify vendored Rust files and adjust ordering automatically.
Plugins eliminated: meson, outlines_core (if combined with other changes)
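The vendor_rust_before_patch flag only changes the order of three existing steps. Below is a hypothetical sketch in which the steps are passed in as callables. They stand in for unpacking, patch_source() and vendor_rust(), whose real signatures are elided above.

```python
from typing import Callable, List

Step = Callable[[], None]


def source_prep_steps(
    unpack: Step, patch: Step, vendor_rust: Step, vendor_rust_before_patch: bool = False
) -> List[Step]:
    """Return the prepare-source steps in the order selected by the YAML flag."""
    if vendor_rust_before_patch:
        # meson / outlines_core: patches touch vendored crates, so vendor first
        return [unpack, vendor_rust, patch]
    # current default: unpack -> patch -> vendor Rust
    return [unpack, patch, vendor_rust]


def run_source_prep(steps: List[Step]) -> None:
    for step in steps:
        step()
```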
3.2 Monorepo Build Dir Auto-Detection
Problem: 12 packages use build_dir in YAML settings, and 2 plugins (triton, pyarrow) implement multiple hooks partly to redirect build operations to a subdirectory. Fromager already supports build_dir in YAML, but some plugins still override hooks because build_dir wasn't propagated to all default hook implementations.
Current state: Fromager's PackageBuildInfo.build_dir() method is available, and most default implementations accept a build_dir parameter. The issue is that some plugins override hooks just to pass the correct build_dir.
Proposed Fromager change: Ensure ALL default hook implementations automatically use pbi.build_dir() when available, so plugins never need to override hooks solely for build directory redirection.
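One way to make every default hook honor build_dir is to route all of them through a single helper that computes the working directory. The effective_build_dir() function below is hypothetical; the proposal would reuse Fromager's existing PackageBuildInfo.build_dir(). The path-containment check is an added safety assumption.

```python
from pathlib import Path
from typing import Optional


def effective_build_dir(sdist_root_dir: Path, build_dir_setting: Optional[str]) -> Path:
    """Directory every default hook should run in: YAML build_dir if set, else the sdist root."""
    root = sdist_root_dir.resolve()
    if not build_dir_setting:
        return root
    candidate = (root / build_dir_setting).resolve()
    # Refuse settings such as "../elsewhere" that point outside the unpacked sdist.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"build_dir {build_dir_setting!r} escapes {root}")
    return candidate
```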
3.3 Git Clone Improvements
Problem: 14 plugins implement download_source hooks. Many of these could be replaced by Fromager's built-in git clone support (via git+ URLs and git_options in YAML), but currently the built-in support doesn't handle:
- Non-standard tag formats (e.g., r{version} for nvidia_riva_client, {build_number} for opencv)
- GitLab authentication token injection
- Pre-clone source modifications
Proposed Fromager changes:
# overrides/settings/nvidia_riva_client.yaml
download_source:
  url: "https://gitlab.com/.../archive/r${version}/repo-r${version}.tar.gz"
  # Tag format template already supported, but ensure 'r' prefix works
git_options:
  submodules: true
  tag_format: "r${version}" # NEW: custom tag format for git clone
4. Impact Analysis
4.1 Plugins That Can Be Fully Eliminated
If all proposals in Section 2 were implemented in Fromager:
| Plugin | Eliminated By | Current Purpose |
|---|---|---|
| deep_ep | 2.1 (YAML resolver) | GitLab tag resolver only |
| kubeflow | 2.1 (YAML resolver) | Midstream tag resolver only |
| litellm | 2.1 (YAML resolver) | Custom regex tag resolver only |
| llama_stack | 2.1 (YAML resolver) | Midstream tag resolver only |
| mlflow_skinny | 2.1 (YAML resolver) | Midstream tag resolver only |
| mlflow_tracing | 2.1 (YAML resolver) | Midstream tag resolver only |
| mlserver | 2.1 (YAML resolver) | Midstream tag resolver only |
| mlserver_lightgbm | 2.1 (YAML resolver) | Midstream tag resolver only |
| mlserver_sklearn | 2.1 (YAML resolver) | Midstream tag resolver only |
| mlserver_xgboost | 2.1 (YAML resolver) | Midstream tag resolver only |
| pplx_kernels | 2.1 (YAML resolver) | Custom regex tag resolver only |
| tritonclient | 2.1 (YAML resolver) | Midstream tag resolver only |
| ibm_fms | 2.2 (auto PKG-INFO) | PKG-INFO generation only |
| userpath | 2.2 (auto PKG-INFO) | PKG-INFO generation only |
| cuda_pathfinder | 2.2 (auto PKG-INFO) | PKG-INFO generation only |
| torchaudio | 2.4 (version override) | BUILD_VERSION only |
| torchvision | 2.4 (version override) | BUILD_VERSION only |
| trl | 2.4 (version override) | PYTORCH_BUILD_VERSION only |
| cmake | 2.5 (CMake downloads) | Download CMake source only |
| tqdm_multiprocess | 2.6 (create files) | Create empty file only |
| docling | 2.3 (dep patching) | Remove dependencies only |
| kfp | 2.3 (dep patching) | Remove dependencies only |
Total: 22 plugins fully eliminated (35% of all plugins)
4.2 Plugins That Can Be Significantly Simplified
| Plugin | Hooks Remaining | Hooks Eliminated |
|---|---|---|
| aiter | update_extra_environ (GPU_ARCHS logic) | get_resolver_provider, download_source, prepare_source |
| torchao | update_extra_environ (SCM version) | get_resolver_provider, download_source, prepare_source |
| deep_gemm | prepare_source (version patching) | get_resolver_provider, download_source |
| flashinfer_jit_cache | — (may be fully eliminated) | get_resolver_provider, download_source, prepare_source |
| nvidia_riva_client | — (may be fully eliminated) | download_source, prepare_source |
| kornia_rs | — (may be fully eliminated with arch-conditional patching) | prepare_source |
| hf_xet | — (may be fully eliminated with TOML patching) | prepare_source |
| onnx | update_extra_environ (env vars) | prepare_source (download portion) |
| nvidia_cudnn_frontend | build_wheel (env var setup) | get_resolver_provider, prepare_source (download portion) |
| amdsmi | get_resolver_provider (dynamic constraints) | prepare_source (dependency removal) |
| tpu_inference | update_extra_environ | prepare_source (patching portion) |
| vllm_neuron | download_source (pre-tarball patching) | get_resolver_provider |
Total: ~12 plugins significantly simplified (19% of all plugins)
4.3 Plugins That Must Remain (Irreducible Complexity)
These plugins contain genuinely complex, package-specific logic that cannot be reasonably generalized:
| Plugin | Lines | Reason |
|---|---|---|
| vllm | ~815 | 6-variant build with 7+ external repo clones, extensive per-variant dependency patching |
| torch | ~278 | ROCm-specific AMD build steps, triton version fetching, complex dependency management |
| aotriton | ~394 | Full cmake/ninja build pipeline, custom wheel structure creation |
| tensorflow | ~394 | Bazel build system, custom .bazelrc generation, CUDA/ROCm configuration |
| tilelang | ~324 | cmake/ninja build, custom pyproject.toml generation, CUDA/ROCm configuration |
| pyarrow | ~200 | 4-stage cmake/make build, CUDA variant support |
| triton | ~180 | Version-dependent build dir, LLVM version detection |
| bitsandbytes | ~150 | Multi-backend compilation (cpu/cuda/hip) |
| flashinfer_python | ~120 | Version-dependent build behavior, AOT compilation |
| symengine | ~150 | C++ library compilation + cmake |
| certifi | ~50 | Security patching (system cert store) — policy, not build logic |
| mlflow | ~100 | Yarn/Node.js web UI build |
| meson | ~60 | Custom Rust vendoring order (could be simplified with 3.1) |
| outlines_core | ~80 | Custom Rust vendoring order + aws-lc-sys patching |
| tensorflow_rocm | ~100 | Custom resolver scraping AMD repo |
| flit_core | ~30 | Bootstrap self-build |
| pypdfium2 | ~80 | System library integration |
| opencv_python_headless | ~120 | Non-standard tag format, ADE dependency pre-download |
| nixl_cu12 | ~215 | Meson subproject dependency downloads, .wrap file patching |
| nixl_cu13 | ~215 | Duplicate of nixl_cu12 (could share code but not eliminate) |
5. Implementation Roadmap
Phase 1: Quick Wins (Low effort, high plugin reduction)
Estimated effort: 1–2 weeks of Fromager development
- Auto ensure_pkg_info in default_prepare_source (Proposal 2.2)
  - Eliminates: 3 plugins
  - Simplifies: 13 plugins
  - Effort: ~2 hours (add one check + function call)
  - Risk: Very low (additive, never breaks builds)
- ${version} template in env settings (Proposal 2.4)
  - Eliminates: 3 plugins
  - Simplifies: 2 plugins
  - Effort: ~4 hours (extend template engine)
  - Risk: Low
- YAML-driven dependency removal (Proposal 2.3, remove_install_requires)
  - Eliminates: 2 plugins
  - Simplifies: 4 plugins
  - Effort: ~1 day (extend apply_project_override())
  - Risk: Low (builds on existing infrastructure)
Phase 2: Medium Effort (Significant plugin reduction)
Estimated effort: 2–4 weeks of Fromager development
- YAML-driven resolver configuration (Proposal 2.1)
  - Eliminates: 12 plugins
  - Simplifies: 10 plugins
  - Effort: ~1 week
  - Components:
    - Add provider, project_path, organization, repo to YAML schema
    - Add tag_matcher with built-in "standard", "midstream", regex options
    - Port create_midstream_matcher() to Fromager
  - Risk: Medium (resolver is core infrastructure)
- YAML-driven source downloads (Proposal 2.5)
  - Eliminates: 1 plugin
  - Simplifies: 5 plugins
  - Effort: ~3 days
  - Risk: Low-medium
- Create missing files from YAML (Proposal 2.6)
  - Eliminates: 1 plugin
  - Effort: ~2 hours
  - Risk: Very low
Phase 3: Refinements (Lower priority)
- Source line patching in YAML (Proposal 2.7)
  - Simplifies: 5 plugins
  - Effort: ~3 days
  - Risk: Medium (regex handling needs care)
- Rust vendor ordering control (Proposal 3.1)
  - Simplifies: 2 plugins
  - Effort: ~4 hours
  - Risk: Low
- Version-to-commit mapping (Proposal 2.8)
  - Simplifies: 3 plugins
  - Effort: ~2 days
  - Risk: Low
6. Quantitative Summary
Before vs. After (All Phases)
| Metric | Current | After Phase 1 | After Phase 2 | After All Phases |
|---|---|---|---|---|
| Total plugins | 62 | 54 (-8) | 40 (-22) | 40 (-22) |
| Resolver-only plugins | 12 | 12 | 0 (-12) | 0 |
| Trivial plugins (< 20 lines of logic) | 17 | 9 (-8) | 0 (-17) | 0 |
| Entry points in pyproject.toml | 80 | 72 (-8) | 58 (-22) | 58 (-22) |
| Lines of plugin code (estimated) | ~6,500 | ~6,000 | ~4,800 | ~4,500 |
Plugin Reduction by Proposal
Proposal 2.1 (YAML resolver) ████████████████████████ 12 eliminated, 10 simplified
Proposal 2.2 (auto PKG-INFO) ██████████████████████ 3 eliminated, 13 simplified
Proposal 2.3 (dep patching) ████████████ 2 eliminated, 4 simplified
Proposal 2.4 (version override) ██████████ 3 eliminated, 2 simplified
Proposal 2.5 (CMake downloads) ████████████ 1 eliminated, 5 simplified
Proposal 2.6 (create files) ██ 1 eliminated
Proposal 2.7 (source patching) ██████████ 0 eliminated, 5 simplified
Proposal 2.8 (version-commit map) ██████ 0 eliminated, 3 simplified
Proposals 3.x (internal) ████ 0 eliminated, 2 simplified
7. Recommendations for Fromager Maintainers
Do First (Highest ROI)
- Auto ensure_pkg_info: trivial change; eliminates 3 plugins and simplifies 13 more (16 plugins affected)
- YAML resolver configuration: eliminates 12 boilerplate plugins, the largest single win
- ${version} in env templates: small change; eliminates 3 plugins
Consider Carefully
- remove_install_requires: useful, but its scope must be defined first (pyproject.toml only? setup.cfg? requirements files such as the requirements.in that kfp patches?)
- Source downloads: a generic feature, but version extraction from source files adds complexity
Defer or Keep as Plugin Responsibility
- Multi-stage native builds (cmake/ninja/bazel) — Too varied to generalize
- Hardware-specific build logic — Inherently package-specific
- External repo cloning (vllm cloning cutlass, flash-attn, etc.) — Too complex and package-specific
Architectural Principle
The dividing line should be: If the customization can be expressed as data (URLs, patterns, names, flags), it belongs in YAML settings. If it requires procedural logic (conditional builds, multi-stage pipelines, dynamic computation), it belongs in plugins.
Appendix A: Plugin Inventory by Category
A.1 Resolver-Only Plugins (12) — All eliminable via Proposal 2.1
| Plugin | Provider | Project/Org | Matcher |
|---|---|---|---|
| deep_ep | GitLabTagProvider | mirrors/github/deepseek-ai/DeepEP | standard |
| kubeflow | GitLabTagProvider | mirrors/github/kubeflow/sdk | midstream |
| litellm | GitLabTagProvider | mirrors/github/BerriAI/litellm | regex: v(.+)-stable |
| llama_stack | GitLabTagProvider | mirrors/github/meta-llama/llama-stack | midstream |
| mlflow_skinny | GitHubTagProvider | mlflow/mlflow | midstream |
| mlflow_tracing | GitHubTagProvider | mlflow/mlflow | midstream |
| mlserver | GitLabTagProvider | mirrors/github/SeldonIO/MLServer | midstream |
| mlserver_lightgbm | GitLabTagProvider | mirrors/github/SeldonIO/MLServer | midstream |
| mlserver_sklearn | GitLabTagProvider | mirrors/github/SeldonIO/MLServer | midstream |
| mlserver_xgboost | GitLabTagProvider | mirrors/github/SeldonIO/MLServer | midstream |
| pplx_kernels | GitLabTagProvider | mirrors/github/ppl-ai/pplx-kernels | regex: +downstream |
| tritonclient | GitLabTagProvider | mirrors/github/triton-inference-server/client | midstream |
A.2 PKG-INFO-Only Plugins (3) — All eliminable via Proposal 2.2
| Plugin | Additional Logic |
|---|---|
| ibm_fms | None |
| userpath | None |
| cuda_pathfinder | Uses pbi.build_dir() for monorepo — needs ensure_pkg_info to respect build_dir |
A.3 Version-Override-Only Plugins (3) — All eliminable via Proposal 2.4
| Plugin | Env Var | Value |
|---|---|---|
| torchaudio | BUILD_VERSION | str(version) |
| torchvision | BUILD_VERSION | str(version) |
| trl | PYTORCH_BUILD_VERSION + PYTORCH_BUILD_NUMBER | version.base_version; str(version.post) if set, else "0" |
A.4 Dependency-Removal-Only Plugins (2) — All eliminable via Proposal 2.3
| Plugin | File Modified | Dependencies Removed |
|---|---|---|
| docling | pyproject.toml | easyocr, rapidocr-onnxruntime |
| kfp | requirements.in | all runtime deps |
A.5 Complex Plugins Requiring Custom Logic (20) — Must remain as plugins
aotriton, bitsandbytes, certifi, flashinfer_python, flit_core, meson, mlflow, nixl_cu12, nixl_cu13, opencv_python_headless, outlines_core, pyarrow, pypdfium2, symengine, tensorflow, tensorflow_rocm, tilelang, torch, triton, vllm
Appendix B: Fromager YAML Settings Schema (Current)
For reference, these are the settings Fromager currently supports via YAML:
build_dir: str # Monorepo subdirectory
annotations: dict # Arbitrary key-value metadata
changelog: dict[str, list[str]] # Version changelog
config_settings: dict # PEP 517 config settings
env: dict[str, str] # Environment variables (with ${VAR} templates)
download_source:
  url: str # Download URL (with ${version}, ${canonicalized_name})
  destination_filename: str # Downloaded filename template
resolver_dist:
  sdist_server_url: str # Custom simple index URL
  include_sdists: bool # Resolve from sdists
  include_wheels: bool # Resolve from wheels
  ignore_platform: bool # Cross-platform resolution
  use_pypi_org_metadata: bool|null # PyPI metadata preference
build_options:
  cpu_cores_per_job: int # CPU scaling factor
  memory_per_job_gb: float # Memory scaling factor
  exclusive_build: bool # Build alone
  build_ext_parallel: bool # Distutils parallelization
git_options:
  submodules: bool # Clone all submodules
  submodule_paths: list[str] # Clone specific submodules
project_override:
  update_build_requires: list[str] # Add/update build requirements
  remove_build_requires: list[str] # Remove build requirements
  requires_external: list[str] # System dependency metadata
variants:
  <variant_name>:
    annotations: dict
    env: dict[str, str]
    pre_built: bool # Use pre-built wheel
    wheel_server_url: str # Custom wheel server
Appendix C: Proposed Additions to Fromager YAML Schema
# --- Proposal 2.1: YAML-driven resolver ---
resolver_dist:
  provider: str # NEW: "pypi" (default), "gitlab_tags", "github_tags"
  project_path: str # NEW: GitLab project path (for gitlab_tags)
  organization: str # NEW: GitHub org (for github_tags)
  repo: str # NEW: GitHub repo (for github_tags)
  tag_matcher: str # NEW: "standard", "midstream", or regex pattern
  tag_filter_asset: str # NEW: Filter by release asset prefix
# --- Proposal 2.3: Dependency patching ---
project_override:
  remove_install_requires: list[str] # NEW: Remove runtime dependencies
  update_install_requires: list[str] # NEW: Add/update runtime dependencies
# --- Proposal 2.4: Version templates in env ---
env:
  BUILD_VERSION: "${version}" # NEW: Support ${version} template
  # Also: ${version.base_version}, ${version.post}, ${version.major}, etc.
# --- Proposal 2.5: Source dependency downloads ---
source_downloads: # NEW section
  - url: str # Download URL
    destination: str # Extract destination (relative to source root)
    env_var: str # Optional: set env var pointing to extracted dir
    sha256: str # Optional: checksum verification
# --- Proposal 2.6: Missing file creation ---
source_fixes: # NEW section
  create_files:
    - path: str # File path relative to source/build dir
      content: str # File content
# --- Proposal 2.7: Source line patching ---
source_patches: # NEW section
  - file: str # File path relative to source root
    replacements:
      - pattern: str # Regex pattern
        replacement: str # Replacement string
    arch_only: str # Optional: only apply on specific arch
# --- Proposal 2.8: Version-to-commit mapping ---
version_map: # NEW section
  "<version>":
    commit: str # Git commit SHA
    url: str # Download URL (with ${commit} template)
# --- Proposals 2.2 and 3.1: Source preparation options ---
source_options: # NEW section
  ensure_pkg_info: bool # Create PKG-INFO if missing (default: auto)
  vendor_rust_before_patch: bool # Vendor Rust crates before patching
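The provider-dependent keys proposed for resolver_dist imply cross-field validation: project_path is required for gitlab_tags, and organization and repo are required for github_tags. Below is a minimal stdlib sketch. It assumes the defaults listed above and deliberately ignores the existing resolver_dist keys; it is not Fromager's actual settings model, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional

PROVIDERS = {"pypi", "gitlab_tags", "github_tags"}


@dataclass
class ResolverDistProposal:
    """Hypothetical model of the keys proposed for resolver_dist in Proposal 2.1."""

    provider: str = "pypi"
    project_path: Optional[str] = None
    organization: Optional[str] = None
    repo: Optional[str] = None
    tag_matcher: str = "standard"
    tag_filter_asset: Optional[str] = None

    def __post_init__(self) -> None:
        if self.provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider!r}")
        if self.provider == "gitlab_tags" and not self.project_path:
            raise ValueError("gitlab_tags requires project_path")
        if self.provider == "github_tags" and not (self.organization and self.repo):
            raise ValueError("github_tags requires organization and repo")
        if self.provider == "pypi" and (self.project_path or self.organization or self.repo):
            raise ValueError("project_path/organization/repo need a tag provider")


def parse_resolver_dist(raw: dict) -> ResolverDistProposal:
    """Pick out the proposed keys; existing keys such as include_sdists are ignored here."""
    names = {f.name for f in fields(ResolverDistProposal)}
    return ResolverDistProposal(**{k: v for k, v in raw.items() if k in names})
```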