
py_binary venvs: analysis-time assembly + shared external_venv (v2.0.0) #944

Open
gregmagolan wants to merge 2 commits into main from optimized_py_binary

@gregmagolan (Member) commented Apr 23, 2026

What

Overhauls how py_binary / py_test acquire virtualenvs and consolidates the API surface. Ships as v2.0.0.

Legacy py_binary staged its site-packages tree at launcher startup: every invocation ran a Rust venv_bin tool that walked the runfiles, created symlinks under a temp venv dir, and then exec'd Python against it. On small graphs that's a few hundred ms of overhead; on large monorepos with thousands of wheels we measured tens of seconds per bazel run / bazel test, paid on every invocation and never amortised by caching. Legacy py_venv_binary avoided that cost by building a real on-disk venv as a Bazel declare_directory tree artifact, but paid for it with tree-artifact remote-execution / remote-cache penalties and a different launcher shape that didn't integrate with py_image_layer or the rest of the py_binary surface.

This PR unifies both onto a single model: the venv is a tree of individual Bazel outputs (ctx.actions.symlink + ctx.actions.write outputs, not a tree artifact) produced at analysis time. The launcher is a no-op by comparison, because Bazel has already placed the venv files in runfiles, so startup is milliseconds, not seconds, regardless of graph size. On top of that, py_binary(external_venv = ":shared_venv") lets many binaries share a single py_venv target, and py_binary(expose_venv = True) auto-emits a first-class sibling :<name>.venv py_venv per target so the shared-venv pattern is available per-callsite without a separate declaration.

Why

Two user-visible wins:

  1. Faster startup. The launcher no longer runs a Rust binary to materialise the venv before exec'ing Python. Improvement scales with wheel count — biggest impact on large monorepos, where launcher cost was dominant.

  2. Shared virtualenvs. The new external_venv attribute lets many py_binary / py_test targets share a single py_venv. Point your IDE at one venv and every entrypoint in the repo resolves to the same interpreter + dep closure. An analysis-time coverage check rejects binaries whose dep closure isn't a subset of the venv's, with a clear error naming the missing wheels.
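The coverage check in item 2 amounts to a set difference between the binary's wheels and the venv's wheels. Below is a minimal Python sketch of that idea, not the Starlark carried in this PR; the function name, the argument shapes (plain iterables of wheel names), and the error wording are all hypothetical.

```python
def check_venv_coverage(binary_wheels, venv_wheels):
    """Raise if the binary needs wheels that the shared venv does not provide.

    Hypothetical helper: both arguments are iterables of wheel names.
    """
    # Anything the binary needs that the venv lacks is a coverage gap.
    missing = sorted(set(binary_wheels) - set(venv_wheels))
    if missing:
        raise ValueError(
            "external_venv does not cover this binary's deps; missing wheels: "
            + ", ".join(missing)
        )


# A binary whose closure is a subset of the venv's passes silently.
check_venv_coverage(["requests"], ["requests", "urllib3"])
```

Sorting the missing names keeps the error message stable between runs, which matters when the message is what users act on.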

A few smaller ergonomic improvements fell out:

  • env = {…} and env_inherit = […] declared on a py_venv flow through to py_binary(external_venv = …) consumers; binary-level env wins on key conflicts.
  • isolated = False on py_binary opts out of Python's -I flag for code that needs PYTHONPATH or script-dir-on-sys.path semantics (matches the legacy py_venv_binary shape).
  • py_image_layer produces valid container tars in all configurations we test (container test coverage turned out to have been latently broken before this work).
  • Stable load paths: @aspect_rules_py//py:defs.bzl, //py:extensions.bzl, //uv:defs.bzl, //uv:extensions.bzl. The /unstable/ paths graduated.
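The env precedence rule from the first bullet (venv-level env flows through, binary-level env wins on key conflicts) behaves like an ordered dictionary merge. A small Python sketch, with a hypothetical function name and made-up keys; how env_inherit is combined is not stated above, so it is left out here.

```python
def merge_env(venv_env, binary_env):
    """Combine env from a py_venv with env from a consuming py_binary.

    Hypothetical helper: later updates win, so binary-level values
    override venv-level values for the same key.
    """
    merged = dict(venv_env)    # start from what the shared venv declares
    merged.update(binary_env)  # binary-level entries take precedence
    return merged


venv_env = {"LOG_LEVEL": "info", "APP_MODE": "shared"}
binary_env = {"LOG_LEVEL": "debug"}
effective = merge_env(venv_env, binary_env)
```

Here `effective["LOG_LEVEL"]` is the binary's value, while `APP_MODE` is inherited unchanged from the venv.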

Shape of the change

py_binary / py_test gain expose_venv. When expose_venv = True is passed, the macro splits into a first-class sibling :<name>.venv py_venv target carrying all venv-shaping attrs (deps, imports, package_collisions, include_*_site_packages, interpreter_options) plus the binary/test rule with external_venv = ":<name>.venv". The .venv sibling is consumable (other targets can external_venv = "//:<name>.venv") and runnable (bazel run :<name>.venv drops into the hermetic interpreter). expose_venv defaults to False so default py_binary callers emit exactly one target — no graph bloat.

Auto-emitted <name>.venv sibling removed. In v1.x the py_binary macro unconditionally emitted a sibling :<name>.venv link target. That doubled the node count in bazel query :* for every caller and forced the internal venv to awkwardly avoid the <name>.venv name. v2.0.0 drops the auto-emit; users who want the IDE-materialise-to-workspace behaviour declare py_venv_link(name = ..., venv = ":<name>.venv") explicitly (and must also pass expose_venv = True on the binary to get a sibling venv to point at).

py_venv_link is now an explicit opt-in rule. Its signature changed: takes venv = <label> pointing at an existing py_venv (typically the one from expose_venv = True), produces a runnable target whose bazel run materialises a workspace-local symlink to the venv tree.

py_venv_binary / py_venv_test were removed. Calling them now fail()s at analysis with a clear migration message. Replacement is py_binary / py_test with expose_venv = True, isolated = False. The legacy deprecation-warning path (print-and-delegate) is gone; the failure is the one-and-only migration signal.

/unstable/ load paths were removed. Everything previously at //py/unstable:{defs,extension}.bzl and //uv/unstable:{defs,extension}.bzl moved to the stable //py:defs.bzl, //py:extensions.bzl, //uv:defs.bzl, //uv:extensions.bzl paths. Loading the old paths fail()s with a message naming the replacement.

Dropped Rust tooling. The venv_bin and venv_shim crates and their toolchain types (VENV_TOOLCHAIN, VENV_EXEC_TOOLCHAIN, SHIM_TOOLCHAIN) are gone — their jobs (runtime venv assembly, interpreter-indirection shim) are now done in Starlark. Only unpack remains, collapsed into a single crate under py/tools/unpack/. A future shell-out to the uv CLI could remove that too once the uv toolchain lands.

Two-hop symlinks for containers. The venv's per-top-level site-packages/<name> symlinks were previously calibrated for the runfiles layout only, which broke py_image_layer's bazel-bin tar walk. They now route through a <venv>/_wheels/<i>/ directory alias whose relative targets resolve identically from bazel-bin, runfiles, and inside an OCI container.
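To see why relative targets make the layout portable, the Python sketch below builds a toy tree, moves it to a different parent directory, and checks that the same import path still resolves. The directory names (`wheels/foo_whl`, `python3.11`) and the assumption that `_wheels/0` is itself a relative symlink are illustrative only; the real layout produced by the rules may differ.

```python
import os
import shutil
import tempfile


def build_tree(root):
    """Create a toy venv whose site-packages entry routes through _wheels/0."""
    wheel_pkg = os.path.join(root, "wheels", "foo_whl", "foo")
    os.makedirs(wheel_pkg)
    with open(os.path.join(wheel_pkg, "__init__.py"), "w") as f:
        f.write("VALUE = 42\n")
    venv = os.path.join(root, "venv")
    site = os.path.join(venv, "lib", "python3.11", "site-packages")
    os.makedirs(site)
    os.makedirs(os.path.join(venv, "_wheels"))
    # Alias hop: <venv>/_wheels/0 -> the wheel's tree (relative target).
    os.symlink("../../wheels/foo_whl", os.path.join(venv, "_wheels", "0"))
    # site-packages hop: <site-packages>/foo -> <venv>/_wheels/0/foo.
    os.symlink("../../../_wheels/0/foo", os.path.join(site, "foo"))
    return os.path.join(site, "foo", "__init__.py")


def demo():
    """Return (resolves before move, resolves after move)."""
    base = tempfile.mkdtemp()
    try:
        first = os.path.join(base, "bazel-bin")
        init = build_tree(first)
        ok_before = os.path.isfile(init)
        # Relocate the whole tree, as if it were staged somewhere else.
        second = os.path.join(base, "elsewhere", "runfiles")
        os.makedirs(os.path.dirname(second))
        shutil.move(first, second)
        moved_init = os.path.join(second, os.path.relpath(init, first))
        return ok_before, os.path.isfile(moved_init)
    finally:
        shutil.rmtree(base)
```

Because neither link mentions an absolute path or the name of the enclosing directory, the move does not invalidate them.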

New test coverage. Four regression tests were added for pytest-integration machinery that was previously validated only through downstream usage: pytest-xdist parallel-worker execution, py_pytest_main(chdir = ...) template substitution, COVERAGE_MANIFEST / LCOV post-processing in pytest_main.py, and multiple py_binary + py_test targets sharing a single external_venv py_venv.

tar.bzl upstream fixes

Building py_image_layer on real-world inputs surfaced two bugs in bazel-contrib/tar.bzl's preserve_symlinks.awk:

  • #106 — make_relative_link off-by-one. Emits N ../s for a path of N segments; should be N-1. Produces dangling symlinks whose traversal escapes the archive.
  • #107 — symlink detection under any transitioned config. Classifier was scoped to the mtree_mutate action's own bin_dir and missed targets under transitioned configs (-ST-<hash> suffix) and under external/<repo>/. Also flips readlink order so authored declare_symlink + target_path strings land in the archive verbatim.
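The off-by-one in #106 is easier to see with the arithmetic written out: the link's own basename is not a directory, so a link at a path of N segments needs at most N-1 `../` steps to reach the archive root. The Python sketch below illustrates that rule; it is not the awk from tar.bzl, and the function name and example paths are hypothetical.

```python
def make_relative_link(link_path, target_path):
    """Return a relative symlink target from link_path to target_path.

    Both arguments are '/'-separated paths relative to the same archive root.
    """
    # Drop the link's own name: only its parent directories need climbing.
    link_dir = link_path.split("/")[:-1]
    target = target_path.split("/")
    # Skip the leading directories the two paths share.
    common = 0
    while (common < len(link_dir) and common < len(target)
           and link_dir[common] == target[common]):
        common += 1
    ups = [".."] * (len(link_dir) - common)
    return "/".join(ups + target[common:]) or "."


# A 3-segment link path with no shared prefix climbs 2 levels, not 3.
no_prefix = make_relative_link("a/b/c", "x/y")
shared_prefix = make_relative_link(
    "venv/lib/site-packages/foo", "venv/_wheels/0/foo"
)
```

Emitting one `..` per segment of the full link path would add an extra climb and point above the archive root, which matches the dangling-link symptom described in the bullet.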

Both fixes are carried locally in py/private/modify_mtree.awk via a small custom mtree_preserve_symlinks rule. Once the upstream PRs merge and we bump tar.bzl, the local fork retires and py_image_layer reverts to mtree_mutate(preserve_symlinks = True).

Migration guide

For a walkthrough with before/after examples and a troubleshooting section: docs/migrating_v1_v2.md.

Breaking changes (v2.0.0)

All breaking changes surface with a clear error at analysis time; bazel test //... after the bump names every callsite that needs updating.

Removed macros — py_venv_binary / py_venv_test

Calling either macro now fail()s. Replacement is plain py_binary / py_test with expose_venv = True, isolated = False. Plain py_binary(...) with no extra attrs is the common case (analysis-time venv assembly is the default); the two extra attrs exist for callers who need the old split-target shape plus a PYTHONPATH-honoring launcher.

# v1.x
py_venv_binary(
    name = "my_tool",
    srcs = ["my_tool.py"],
    main = "my_tool.py",
    deps = [...],
)

# v2.0.0 — common case
py_binary(
    name = "my_tool",
    srcs = ["my_tool.py"],
    main = "my_tool.py",
    deps = [...],
)

# v2.0.0 — preserve the old split + launcher semantics
py_binary(
    name = "my_tool",
    srcs = ["my_tool.py"],
    main = "my_tool.py",
    deps = [...],
    expose_venv = True,
    isolated = False,
)
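For context on what isolated = False changes: Python's -I flag (isolated mode) makes the interpreter ignore PYTHON* environment variables such as PYTHONPATH and keeps the script directory off sys.path. The Python sketch below demonstrates the interpreter flag itself with a throwaway module; it does not exercise the rules_py launcher, and the helper and module names are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile


def importable_via_pythonpath(isolated):
    """Return True if a module reachable only via PYTHONPATH can be imported."""
    with tempfile.TemporaryDirectory() as tmp:
        # A module that exists nowhere except in this temporary directory.
        with open(os.path.join(tmp, "probe_mod.py"), "w") as f:
            f.write("VALUE = 1\n")
        env = dict(os.environ, PYTHONPATH=tmp)
        flags = ["-I"] if isolated else []
        result = subprocess.run(
            [sys.executable, *flags, "-c", "import probe_mod"],
            env=env,
            capture_output=True,
        )
        return result.returncode == 0
```

Code that depends on PYTHONPATH being honored behaves like the non-isolated call, which is the case isolated = False is meant to serve.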

Removed load paths — /unstable/

//py/unstable:{defs,extension}.bzl and //uv/unstable:{defs,extension}.bzl are gone. Each now fail()s at load time with a message pointing at the stable replacement.

# v1.x
load("@aspect_rules_py//py/unstable:defs.bzl", "py_venv")
load("@aspect_rules_py//uv/unstable:defs.bzl", "gazelle_python_manifest")
interpreters = use_extension("@aspect_rules_py//py/unstable:extension.bzl", "python_interpreters")
uv = use_extension("@aspect_rules_py//uv/unstable:extension.bzl", "uv")

# v2.0.0
load("@aspect_rules_py//py:defs.bzl", "py_venv")
load("@aspect_rules_py//uv:defs.bzl", "gazelle_python_manifest")
interpreters = use_extension("@aspect_rules_py//py:extensions.bzl", "python_interpreters")
uv = use_extension("@aspect_rules_py//uv:extensions.bzl", "uv")

Removed auto-emitted sibling — :<name>.venv no longer shows up for free

In v1.x, every py_binary(name = "foo") call auto-generated a :foo.venv link target. v2.0.0 drops the auto-emit. Callers who ran bazel run :foo.venv for IDE integration must either:

  • Opt into expose_venv = True on the binary to get a consumable + runnable :foo.venv py_venv, then declare py_venv_link explicitly if they also need the workspace-materialise behaviour, or
  • Stop relying on the sibling entirely (point the IDE at a py_venv declared separately, or at bazel-bin directly).
# v1.x — :my_app.venv auto-generated
py_binary(name = "my_app", ...)
# $ bazel run :my_app.venv          # worked

# v2.0.0 — opt in explicitly
py_binary(name = "my_app", expose_venv = True, ...)
py_venv_link(name = "my_app_ide", venv = ":my_app.venv")
# $ bazel run :my_app.venv          # drops into interpreter
# $ bazel run :my_app_ide           # materialises workspace symlink

Changed signature — py_venv_link

Was: takes deps / srcs / imports and builds its own internal venv.
Now: takes venv = <label> pointing at an existing py_venv target. Pair with py_binary(expose_venv = True, ...) or a standalone py_venv declaration.

# v1.x
py_venv_link(
    name = "my_venv",
    srcs = ["main.py"],
    deps = ["@pypi//fastapi"],
)

# v2.0.0
py_venv(
    name = "my_venv_target",
    deps = ["@pypi//fastapi"],
)
py_venv_link(
    name = "my_venv",
    venv = ":my_venv_target",
)

Removed Rust toolchain types

VENV_TOOLCHAIN, VENV_EXEC_TOOLCHAIN, and SHIM_TOOLCHAIN are gone — their work (runtime venv staging, interpreter-indirection shim) is now done in Starlark at analysis time. Users who registered overrides for these toolchain types must delete those registrations; leaving them in place produces toolchain-resolution errors.

Changed venv internal layout — two-hop site-packages symlinks

The per-top-level <venv>/lib/python<ver>/site-packages/<top> symlinks now route through an intermediate <venv>/_wheels/<i>/ directory alias instead of pointing directly at the wheel's materialised tree. Python-level consumers (site.py, importlib, every normal user) see no difference. Filesystem-walking tools that iterate the venv tree and expect one-hop symlinks — custom tar rules, PEX-style packagers, docker build scripts that walk site-packages/ — may need updating. The relative-link depth is the same in bazel-bin, runfiles, and inside an OCI image, so the new shape is strictly more portable than the old one.
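A filesystem-walking tool that reads a site-packages link once and assumes it has reached the wheel will now land inside `_wheels/<i>/` instead. The Python sketch below contrasts a single `os.readlink` with full resolution via `os.path.realpath` on a toy tree; the directory names and the assumption that `_wheels/0` is a relative symlink are illustrative, not taken from the rules' actual output.

```python
import os
import tempfile


def first_hop_and_final(link_path):
    """Return (path after one readlink, fully resolved path) for a symlink."""
    raw_target = os.readlink(link_path)
    first_hop = os.path.normpath(
        os.path.join(os.path.dirname(link_path), raw_target)
    )
    return first_hop, os.path.realpath(link_path)


def demo():
    """Build a toy two-hop venv and report both paths relative to its root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)
        venv = os.path.join(root, "venv")
        site = os.path.join(venv, "lib", "python3.11", "site-packages")
        os.makedirs(os.path.join(root, "wheels", "foo_whl", "foo"))
        os.makedirs(site)
        os.makedirs(os.path.join(venv, "_wheels"))
        os.symlink("../../wheels/foo_whl", os.path.join(venv, "_wheels", "0"))
        os.symlink("../../../_wheels/0/foo", os.path.join(site, "foo"))
        first_hop, final = first_hop_and_final(os.path.join(site, "foo"))
        return os.path.relpath(first_hop, root), os.path.relpath(final, root)
```

Tools that already resolve links fully (for example with `os.path.realpath`) should need no change; tools that stop after one hop would need to keep following.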

Unchanged (explicitly, for the skim-reader)

  • The string venv = "..." attribute (uv pip-extension wheel selection) is unchanged. The new external_venv = ":label" is a separately typed attribute that coexists with it.
  • py_binary callers who don't opt into external_venv, expose_venv, or isolated = False see no behavioural changes beyond the startup-performance win.
  • The default on-disk venv basename stays .<name>.venv/ — same as v1.x. IDEs auto-detect the path unchanged.

@gregmagolan force-pushed the optimized_py_binary branch from 5de72ac to cc215c4 on April 23, 2026 23:31
aspect-workflows Bot commented Apr 23, 2026

Bazel 8 (Test)

All tests were cache hits

137 tests (100.0%) were fully cached saving 1m 1s.


Bazel 9 (Test)

All tests were cache hits

136 tests (100.0%) were fully cached saving 1m 7s.


Bazel 8 (Test)

e2e

All tests were cache hits

81 tests (100.0%) were fully cached saving 1m 21s.


Bazel 9 (Test)

e2e

All tests were cache hits

81 tests (100.0%) were fully cached saving 1m 19s.


Bazel 8 (Test)

examples/uv_pip_compile

All tests were cache hits

1 test (100.0%) was fully cached saving 444ms.


Buildifier

@gregmagolan force-pushed the optimized_py_binary branch 4 times, most recently from 51aff4f to 6a61c17 on April 24, 2026 01:27
Comment thread uv/private/whl_install/rule.bzl
Comment thread py/private/providers.bzl Outdated
Comment thread py/private/py_library.bzl
Comment thread py/private/py_library.bzl
@gregmagolan force-pushed the optimized_py_binary branch 2 times, most recently from 5cdfc60 to e6249ef on April 24, 2026 02:07
@gregmagolan changed the title from "WIP: optimizing py_binary start times" to "Optimize py_binary start-up times: build venv with symlink actions instead of during py_binary bootstrap phase" on Apr 24, 2026
@gregmagolan changed the title from "Optimize py_binary start-up times: build venv with symlink actions instead of during py_binary bootstrap phase" to "feat: optimize py_binary start-up times: build venv with symlink actions instead of during py_binary bootstrap phase" on Apr 24, 2026
@gregmagolan force-pushed the optimized_py_binary branch from e6249ef to 959ece0 on April 24, 2026 02:35
Comment thread py/private/py_binary.bzl
Comment thread py/private/py_binary.bzl Outdated
Comment thread py/private/py_binary.bzl
@gregmagolan force-pushed the optimized_py_binary branch 11 times, most recently from 93b5cd2 to d39e5fb on April 24, 2026 07:37
@gregmagolan
Member Author

Put up bazel-contrib/tar.bzl#106

@gregmagolan
Member Author

Put up bazel-contrib/tar.bzl#107

@gregmagolan force-pushed the optimized_py_binary branch 2 times, most recently from ab86fe2 to efceb7c on April 24, 2026 15:06
@gregmagolan changed the title from "feat: optimize py_binary start-up times: build venv with symlink actions instead of during py_binary bootstrap phase" to "py_binary venvs: analysis-time assembly + shared external_venv support" on Apr 24, 2026
@gregmagolan force-pushed the optimized_py_binary branch from efceb7c to 50a2859 on April 24, 2026 15:23
@gregmagolan changed the title from "py_binary venvs: analysis-time assembly + shared external_venv support" to "py_binary venvs: analysis-time assembly + shared external_venv (v2.0.0)" on Apr 24, 2026
@gregmagolan force-pushed the optimized_py_binary branch from 5a6bc37 to 6adb52b on April 24, 2026 16:09
@gregmagolan force-pushed the optimized_py_binary branch 2 times, most recently from b943ae3 to 7bbca29 on April 24, 2026 16:47

 def _unpack_toolchain_path_impl(ctx):
-    unpack_bin = ctx.toolchains[UNPACK_TOOLCHAIN].bin.bin
+    unpack = ctx.toolchains[UNPACK_TOOLCHAIN].bin.bin
Member

revert?

Member Author

intentional name change

Comment thread e2e/cases/root-dir-paths-538/BUILD.bazel Outdated
Comment thread e2e/cases/root-dir-paths-538/BUILD.bazel
@gregmagolan force-pushed the optimized_py_binary branch from 022055c to 5993a4d on April 24, 2026 18:01
if ctx.attr.main:
    if not ctx.attr.main.label.name.endswith(".py"):
        fail("main must end in '.py'")
# Check the RESOLVED file name, not the label name — the label may
Member

Is this (+ a test) something that can be merged now? Ask claude to cherry-pick this change + a test...


@gregmagolan force-pushed the optimized_py_binary branch 2 times, most recently from 3658c37 to 7a803f5 on April 24, 2026 20:02
@gregmagolan force-pushed the optimized_py_binary branch from 7a803f5 to fd20423 on April 24, 2026 20:27
# Launcher for py_venv targets — `bazel run :name` activates the venv
# and exec's its bin/python interactively.
#
# (py_venv_binary / py_venv_test don't use this template — they expand
Member

This sentence seems irrelevant here.

fi

{{BASH_RLOCATION_FN}}

Member

revert so you can still read the code that gets generated 🙏

Comment thread py/private/py_binary.bzl
]

def _check_venv_coverage(ctx, imports_depset, wheels_depset, vinfo):
    """Analysis-time check: everything the binary needs must live in the venv.
Member

Did you think about making this a validation action? Or making it optional?

IDK if it's worth changing it from validation-in-starlark to validation-action, but it would allow avoiding the .to_list() calls I think?

Comment thread py/private/py_binary.bzl

default_info_files = [executable_launcher, main]
if not external_venv:
    default_info_files = default_info_files + extra_runfiles
Member

extra_runfiles got added to DefaultInfo in addition to runfiles? Normally I think we'd want to (a) keep them separate (b) not duplicate them?

Member

#949 ?

Comment thread py/defs.bzl
# `load("@aspect_rules_py//py:defs.bzl", "py_venv_binary")` resolves
# and the call-site failure surfaces the friendly message (instead of
# Bazel's generic "symbol not exported" error).
py_venv_binary = _py_venv_binary
Member

I think the deprecated ones should remain where they were. There's no reason to tell people to migrate from py/unstable/defs.bzl to py/defs.bzl just to be told it's unsupported.
