feat: add builtin uninstaller as alternative to calling pip#10904
feat: add builtin uninstaller as alternative to calling pip#10904radoering wants to merge 5 commits into
Conversation
- legacy pip uninstall: propagate pip return code in case of failed uninstalls - new builtin uninstall: rollback uninstall in case of failed install
There was a problem hiding this comment.
Hey - I've found 5 issues, and left some high level feedback:
- The new
UninstallPathSet/StashedUninstallPathSetlogic relies ondist._path(a private attribute ofimportlib.metadata.Distribution); if possible, consider switching to a public API or adding a clear compatibility comment/guard around this to reduce the risk of breakage with future Python versions. - In
StashedUninstallPathSet,_movesis documented as(old path, new path)butrollback()iterates asfor new_path, path in self._moves, effectively swapping semantics; aligning the variable names and/or the comment with the actual ordering would make the rollback behavior easier to reason about and maintain.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `UninstallPathSet`/`StashedUninstallPathSet` logic relies on `dist._path` (a private attribute of `importlib.metadata.Distribution`); if possible, consider switching to a public API or adding a clear compatibility comment/guard around this to reduce the risk of breakage with future Python versions.
- In `StashedUninstallPathSet`, `_moves` is documented as `(old path, new path)` but `rollback()` iterates as `for new_path, path in self._moves`, effectively swapping semantics; aligning the variable names and/or the comment with the actual ordering would make the rollback behavior easier to reason about and maintain.
## Individual Comments
### Comment 1
<location path="src/poetry/installation/uninstaller.py" line_range="209-217" />
<code_context>
+ def norm_join(*a: str) -> str:
+ return os.path.normcase(os.path.join(*a))
+
+ for root in unchecked:
+ if any(os.path.normcase(root).startswith(w) for w in wildcards):
+ # This directory has already been handled.
+ continue
+
+ all_files: set[str] = set()
+ all_subdirs: set[str] = set()
+ for dirname, subdirs, files in os.walk(root):
+ all_subdirs.update(norm_join(root, dirname, d) for d in subdirs)
+ all_files.update(norm_join(root, dirname, f) for f in files)
+ # If all the files we found are in our remaining set of files to
</code_context>
<issue_to_address>
**issue (bug_risk):** Path joining in compress_for_rename double-prefixes root, producing incorrect paths
In `compress_for_rename`, `os.walk(root)` already yields `dirname` as an absolute path under `root`. Using `norm_join(root, dirname, ...)` produces `root/root/...` paths that don’t exist and will break the `all_files - remaining` logic. The joins should use `dirname` only: `norm_join(dirname, d)` / `norm_join(dirname, f)`.
</issue_to_address>
### Comment 2
<location path="src/poetry/installation/uninstaller.py" line_range="184-186" />
<code_context>
+
+ sep = os.path.sep
+ short_paths: set[str] = set()
+ for path in sorted(paths, key=len):
+ should_skip = any(
+ path.startswith(shortpath.rstrip("*"))
+ and path[len(shortpath.rstrip("*").rstrip(sep))] == sep
+ for shortpath in short_paths
</code_context>
<issue_to_address>
**issue (bug_risk):** Possible IndexError and incorrect prefix handling in compact() short-path check
In `compact`, `path[len(shortpath.rstrip("*").rstrip(sep))]` can raise `IndexError` when `path` is equal to or shorter than the stripped `shortpath`, since there’s no `len(path) > len(prefix)` guard. Because this runs on filesystem paths during uninstall, that would abort uninstalls. Compute `prefix = shortpath.rstrip("*").rstrip(sep)` once, guard with `len(path) > len(prefix)`, and prefer something like `path.startswith(prefix + sep)` over direct index access.
</issue_to_address>
### Comment 3
<location path="tests/installation/test_uninstaller.py" line_range="11" />
<code_context>
+from poetry.installation.uninstaller import StashedUninstallPathSet
+from poetry.installation.uninstaller import UninstallPathSet
+from poetry.installation.uninstaller import _normalize_path
+from poetry.installation.uninstaller import _uninstallation_paths
+from poetry.installation.uninstaller import compact
+from poetry.installation.uninstaller import compress_for_rename
</code_context>
<issue_to_address>
**suggestion (testing):** Add a focused unit test for `_uninstallation_paths` to verify .pyc/.pyo sibling generation
`_uninstallation_paths` is only covered indirectly via `uninstall_distribution`. Please add a small, direct unit test that feeds a synthetic list of `PackagePath` objects (including at least one `.py`) and asserts that the yielded paths include both the `.py` entry and the corresponding `.pyc`/`.pyo`. This will make the helper’s behaviour explicit and protect against regressions if RECORD contents or `UninstallPathSet.add` change later.
Suggested implementation:
```python
import csv
from importlib.metadata import PackagePath
from pathlib import Path
from typing import TYPE_CHECKING
```
```python
def _make_env(tmp_path: Path) -> MockEnv:
env_path = tmp_path / "env"
def test__uninstallation_paths_generates_bytecode_siblings(tmp_path: Path) -> None:
"""
_uninstallation_paths should include .pyc/.pyo siblings for .py files.
This uses a synthetic set of PackagePath entries (not a real distribution)
to exercise the helper directly and make its behaviour explicit.
"""
env = _make_env(tmp_path)
# In this test we only care about the uninstallation path generation logic,
# so we can treat the environment's site-packages directory as the root.
site_packages = Path(env._path) # Adjust if MockEnv exposes site-packages differently
py_rel = PackagePath("sample_module.py")
non_py_rel = PackagePath("data.txt")
package_paths = [py_rel, non_py_rel]
# Call the helper directly with synthetic PackagePath objects.
# The exact signature of _uninstallation_paths may differ; adjust the
# arguments as needed to match the implementation.
uninstall_paths = list(_uninstallation_paths(package_paths, root=site_packages))
# Normalise all returned paths to Path objects for comparison.
normalized_paths = {Path(p) for p in uninstall_paths}
source = site_packages / str(py_rel)
pyc = source.with_suffix(".pyc")
pyo = source.with_suffix(".pyo")
# Ensure the original .py is present...
assert source in normalized_paths
# ...and that both bytecode siblings are included as well.
assert pyc in normalized_paths
assert pyo in normalized_paths
# Non-.py entries should be present but not produce extra siblings.
non_py_source = site_packages / str(non_py_rel)
assert non_py_source in normalized_paths
assert non_py_source.with_suffix(".pyc") not in normalized_paths
assert non_py_source.with_suffix(".pyo") not in normalized_paths
```
</issue_to_address>
### Comment 4
<location path="tests/installation/test_uninstaller.py" line_range="220-229" />
<code_context>
+ assert not (Path(scripts_dir) / "demo-cli").exists()
+
+
+def test_rollback_restores_files(tmp_path: Path) -> None:
+ env = _make_env(tmp_path)
+ _, installed = _install_fake_distribution(env)
+
+ pathset = uninstall_distribution(env, "demo")
+ assert pathset is not None
+
+ for path in installed:
+ if path.is_file():
+ assert not path.exists()
+
+ pathset.rollback()
+
+ for path in installed:
+ assert path.exists(), f"{path} should have been restored"
+
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for rolling back when there is nothing to roll back
`rollback()` is already tested after a real uninstall, but both `StashedUninstallPathSet.rollback()` and `UninstallPathSet.rollback()` have a no-op path when `can_rollback` is false. Please add a small test that calls `UninstallPathSet.rollback()` on a freshly created pathset (without `remove()`) and asserts it doesn’t raise and logs the expected error, so that branch is covered and the edge case is verified.
Suggested implementation:
```python
pathset.commit()
assert not (Path(scripts_dir) / "demo-cli").exists()
def test_uninstall_pathset_rollback_without_remove_logs(caplog) -> None:
pathset = UninstallPathSet()
with caplog.at_level(logging.ERROR):
pathset.rollback()
assert any(
"rollback" in message.lower() for message in caplog.messages
), "Expected rollback-related log message when rolling back without remove()"
import logging
import csv
```
You may need to adjust the log level (`logging.ERROR` here) and/or the substring asserted in the message (`"rollback"`) to match the actual behavior of `UninstallPathSet.rollback()` when `can_rollback` is false. If the implementation uses a different level (e.g. WARNING) or a specific message, update the `caplog.at_level(...)` and the `any(...)` predicate accordingly. Also, if your codebase enforces type annotations for fixtures, you can add an appropriate type hint for `caplog` (e.g. importing `LogCaptureFixture` under a `TYPE_CHECKING` guard and annotating `caplog: LogCaptureFixture`).
</issue_to_address>
### Comment 5
<location path="tests/installation/test_uninstaller.py" line_range="307-316" />
<code_context>
+ assert not stashed.can_rollback
+
+
+def test_compress_for_rename_collapses_full_directory(tmp_path: Path) -> None:
+ pkg = tmp_path / "pkg"
+ pkg.mkdir()
+ f1 = pkg / "a.py"
+ f1.touch()
+ f2 = pkg / "b.py"
+ f2.touch()
+
+ result = compress_for_rename([str(f1), str(f2)])
+ assert any(r.rstrip("/\\").endswith("pkg") for r in result)
</code_context>
<issue_to_address>
**suggestion (testing):** You may want a companion test where compress_for_rename keeps both directory and file paths
The existing test covers the full-directory case where all files are included and the path collapses to the directory. Please also add a test where only some files in the directory are passed to `compress_for_rename`, and verify it returns just those file paths (no directory wildcard). This will lock in the expected partial-directory behavior and help prevent unintended directory-level moves during uninstall.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| for root in unchecked: | ||
| if any(os.path.normcase(root).startswith(w) for w in wildcards): | ||
| # This directory has already been handled. | ||
| continue | ||
|
|
||
| all_files: set[str] = set() | ||
| all_subdirs: set[str] = set() | ||
| for dirname, subdirs, files in os.walk(root): | ||
| all_subdirs.update(norm_join(root, dirname, d) for d in subdirs) |
There was a problem hiding this comment.
issue (bug_risk): Path joining in compress_for_rename double-prefixes root, producing incorrect paths
In compress_for_rename, os.walk(root) already yields dirname as an absolute path under root. Using norm_join(root, dirname, ...) produces root/root/... paths that don’t exist and will break the all_files - remaining logic. The joins should use dirname only: norm_join(dirname, d) / norm_join(dirname, f).
| for path in sorted(paths, key=len): | ||
| should_skip = any( | ||
| path.startswith(shortpath.rstrip("*")) |
There was a problem hiding this comment.
issue (bug_risk): Possible IndexError and incorrect prefix handling in compact() short-path check
In compact, path[len(shortpath.rstrip("*").rstrip(sep))] can raise IndexError when path is equal to or shorter than the stripped shortpath, since there’s no len(path) > len(prefix) guard. Because this runs on filesystem paths during uninstall, that would abort uninstalls. Compute prefix = shortpath.rstrip("*").rstrip(sep) once, guard with len(path) > len(prefix), and prefer something like path.startswith(prefix + sep) over direct index access.
| from poetry.installation.uninstaller import StashedUninstallPathSet | ||
| from poetry.installation.uninstaller import UninstallPathSet | ||
| from poetry.installation.uninstaller import _normalize_path | ||
| from poetry.installation.uninstaller import _uninstallation_paths |
There was a problem hiding this comment.
suggestion (testing): Add a focused unit test for _uninstallation_paths to verify .pyc/.pyo sibling generation
_uninstallation_paths is only covered indirectly via uninstall_distribution. Please add a small, direct unit test that feeds a synthetic list of PackagePath objects (including at least one .py) and asserts that the yielded paths include both the .py entry and the corresponding .pyc/.pyo. This will make the helper’s behaviour explicit and protect against regressions if RECORD contents or UninstallPathSet.add change later.
Suggested implementation:
import csv
from importlib.metadata import PackagePath
from pathlib import Path
from typing import TYPE_CHECKINGdef _make_env(tmp_path: Path) -> MockEnv:
env_path = tmp_path / "env"
def test__uninstallation_paths_generates_bytecode_siblings(tmp_path: Path) -> None:
"""
_uninstallation_paths should include .pyc/.pyo siblings for .py files.
This uses a synthetic set of PackagePath entries (not a real distribution)
to exercise the helper directly and make its behaviour explicit.
"""
env = _make_env(tmp_path)
# In this test we only care about the uninstallation path generation logic,
# so we can treat the environment's site-packages directory as the root.
site_packages = Path(env._path) # Adjust if MockEnv exposes site-packages differently
py_rel = PackagePath("sample_module.py")
non_py_rel = PackagePath("data.txt")
package_paths = [py_rel, non_py_rel]
# Call the helper directly with synthetic PackagePath objects.
# The exact signature of _uninstallation_paths may differ; adjust the
# arguments as needed to match the implementation.
uninstall_paths = list(_uninstallation_paths(package_paths, root=site_packages))
# Normalise all returned paths to Path objects for comparison.
normalized_paths = {Path(p) for p in uninstall_paths}
source = site_packages / str(py_rel)
pyc = source.with_suffix(".pyc")
pyo = source.with_suffix(".pyo")
# Ensure the original .py is present...
assert source in normalized_paths
# ...and that both bytecode siblings are included as well.
assert pyc in normalized_paths
assert pyo in normalized_paths
# Non-.py entries should be present but not produce extra siblings.
non_py_source = site_packages / str(non_py_rel)
assert non_py_source in normalized_paths
assert non_py_source.with_suffix(".pyc") not in normalized_paths
assert non_py_source.with_suffix(".pyo") not in normalized_paths| def test_rollback_restores_files(tmp_path: Path) -> None: | ||
| env = _make_env(tmp_path) | ||
| _, installed = _install_fake_distribution(env) | ||
|
|
||
| pathset = uninstall_distribution(env, "demo") | ||
| assert pathset is not None | ||
|
|
||
| for path in installed: | ||
| if path.is_file(): | ||
| assert not path.exists() |
There was a problem hiding this comment.
suggestion (testing): Consider adding a test for rolling back when there is nothing to roll back
rollback() is already tested after a real uninstall, but both StashedUninstallPathSet.rollback() and UninstallPathSet.rollback() have a no-op path when can_rollback is false. Please add a small test that calls UninstallPathSet.rollback() on a freshly created pathset (without remove()) and asserts it doesn’t raise and logs the expected error, so that branch is covered and the edge case is verified.
Suggested implementation:
pathset.commit()
assert not (Path(scripts_dir) / "demo-cli").exists()
def test_uninstall_pathset_rollback_without_remove_logs(caplog) -> None:
pathset = UninstallPathSet()
with caplog.at_level(logging.ERROR):
pathset.rollback()
assert any(
"rollback" in message.lower() for message in caplog.messages
), "Expected rollback-related log message when rolling back without remove()"
import logging
import csvYou may need to adjust the log level (logging.ERROR here) and/or the substring asserted in the message ("rollback") to match the actual behavior of UninstallPathSet.rollback() when can_rollback is false. If the implementation uses a different level (e.g. WARNING) or a specific message, update the caplog.at_level(...) and the any(...) predicate accordingly. Also, if your codebase enforces type annotations for fixtures, you can add an appropriate type hint for caplog (e.g. importing LogCaptureFixture under a TYPE_CHECKING guard and annotating caplog: LogCaptureFixture).
| def test_compress_for_rename_collapses_full_directory(tmp_path: Path) -> None: | ||
| pkg = tmp_path / "pkg" | ||
| pkg.mkdir() | ||
| f1 = pkg / "a.py" | ||
| f1.touch() | ||
| f2 = pkg / "b.py" | ||
| f2.touch() | ||
|
|
||
| result = compress_for_rename([str(f1), str(f2)]) | ||
| assert any(r.rstrip("/\\").endswith("pkg") for r in result) |
There was a problem hiding this comment.
suggestion (testing): You may want a companion test where compress_for_rename keeps both directory and file paths
The existing test covers the full-directory case where all files are included and the path collapses to the directory. Please also add a test where only some files in the directory are passed to compress_for_rename, and verify it returns just those file paths (no directory wildcard). This will lock in the expected partial-directory behavior and help prevent unintended directory-level moves during uninstall.
| pass | ||
|
|
||
|
|
||
| class _TempDirectory: |
There was a problem hiding this comment.
from tempfile import TemporaryDirectory and drop this?
| yield os.path.join(dn, base + ".pyo") | ||
|
|
||
|
|
||
| def compact(paths: Iterable[str]) -> set[str]: |
There was a problem hiding this comment.
my AI friend claims that this is redundant and that compress_for_rename() already returns a minimal, non-overlapping set:
- wildcards are added only for roots whose every on-disk file is in the
input, and roots are processed shortest-first, so a wildcard is never
nested under another wildcard;- a file remains in the 'remaining' set only if its parent (which is
always in 'unchecked') did not get a wildcard, so it cannot be under
any wildcard either.
There was a problem hiding this comment.
My AI friend said:
Short answer: No, it's not redundant - though in the common case it's effectively a no-op.
Here's why. remove() calls them in sequence at uninstaller.py:400-402:
for_rename = compress_for_rename(self._paths)
for path in sorted(compact(for_rename)):
moved.stash(path)
compress_for_rename() already does most of the compaction work: when every file on disk under a directory is in the
removal set, it collapses them into a single dir/ wildcard and drops the individual files from remaining
(uninstaller.py:226-228). And its guard at line 214 prevents nested wildcards (roots are processed shortest-first, so a
child dir under an already-wildcarded parent is skipped). So for an all-regular-files package, compress_for_rename returns
a minimal set and compact() changes nothing.
The gap compact() covers: compress_for_rename only subtracts entries that os.walk reports as files (all_files, line 222).
Anything os.walk classifies as a subdir - most notably a symlink to a directory, or an empty directory listed in RECORD -
lands in all_subdirs instead and is never removed from remaining. So you can get output like {/pkg/, /pkg/symlinked_dir}:
the parent wildcard plus a child entry it failed to subtract.
That's a real problem without compact(), because of the sort order at line 402: /pkg/ sorts before /pkg/symlinked_dir, so
remove() would stash the whole /pkg/ directory first, then try to stash /pkg/symlinked_dir \u2014 which no longer exists at its
original location, causing _renames() to fail. compact() (uninstaller.py:180-196) drops the child because it's prefixed
by the wildcard parent, preventing the double-stash.
So: redundant for the typical RECORD of plain files, but a genuine safety net for directory-typed entries (symlinked/empty
dirs). It also matches pip's original code, which the module comment says it deliberately mirrors.
That may be a theoretical issue. I do not think that such wheels are possible (yet). However, it cannot hurt to be robust. Nevertheless, I think only a minimal edit in compress_for_rename() is necessary to make compact truly redundant. I added the change (and a test) in a separate commit for now.
| shutil.rmtree(self._path, ignore_errors=True) | ||
|
|
||
|
|
||
| class _AdjacentTempDirectory(_TempDirectory): |
There was a problem hiding this comment.
pretty sure we could drop this without causing damage. AIUI the point is to stay on the same filesystem as the original as some sort of perf optimisation?
Looks like pdm has been through the same copy-from-pip-and-remove-legacy-noise approach - here - and they do not have an _AdjacentTempDirectory
There was a problem hiding this comment.
Actually, it makes sense to me. In cases where the temporary directory is on another filesystem it should make a significant difference, shoudn't it?
I would give more weight to pip than to pdm in this case because of its broader usage.
There was a problem hiding this comment.
... it should make a significant difference, shoudn't it?
I don't know. Maybe?
Another comparison point: as best I can tell uv does not bother with any of this at all. It just removes files.
That's going to win on simplicity and presumably also on performance, is the pip commit/rollback dance even useful?!
There was a problem hiding this comment.
Perhaps the pip behaviour dates from a time before virtual environments were so universal, and disposable.
I speculate that the case uv might make for not taking the extra care is: in the rare case that the uninstall leaves your venv in an halfway state - just delete it and start over. So long as we reported that something went wrong, we don't have to work so hard to avoid that happening.
There was a problem hiding this comment.
I did some archaeology in the pip repo:
- the move/rollback design dates to 2009
- of course people had good ideas in 2009 also! But it's at least plausible that whatever drove pip back then is not so relevant for poetry today
AdjacentTemporaryDirectorycame in 2018 and the motivation was not performance at all, rather it was to work around accidentally using too-long paths on Windows - Fixes #3055 Uninstall causes paths to exceed MAX_PATH limit pypa/pip#6029
There was a problem hiding this comment.
This is a good point. On a second thought, I agree that we probably do not need the commit/rollback behavior, which adds some complexity. I have to admit, this feature initially confused me when investigating how pip uninstalls packages. Thus, it is probably a good thing to omit if we do not need it.
In my experience, the most common case where an uninstall fails, is on Windows when another process is blocking a file that should be removed. In that case, it should be fine to error out. If we make sure that the .dist-info directory is the last one to delete, the user can even kill the process and try again.
I will open a PR with an alternative implementation shortly.
|
For future reference: We probably also have to patch Otherwise, script files are not uninstalled on Windows with Python 3.10 and 3.11. This is even an issue, although we do not use |
…r_rename() and remove now redundant compact(); add test case
Pull Request Check List
Introduce a builtin uninstaller (adapted from pip's code) so that we do not have to run a pip subprocess for each uninstall. This makes uninstalls (and updates) faster. The new builtin uninstaller is not used by default. You have to set
installer.builtin-uninstallto use it. (We can switch it later and remove the setting even later, just as we did withinstaller.modern-installation.)The second commit makes use of the rollback mechanism in case of an error. Further, it fixes an issue in the legacy pip path where the return code of pip is not checked.