feat(cache): replace Git-based hub cache with HF Storage Buckets #1102
dacorvo wants to merge 9 commits into `review/hf-buckets-cache`
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR migrates Optimum Neuron’s remote NEFF cache from a Git-backed Hub repo to Hugging Face Storage Buckets, using a hub_neuronx_cache context manager that fetches on enter and syncs on exit via a long-running uv-launched bucket API proxy server.
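The fetch-on-enter / sync-on-exit flow described above can be sketched as follows. This is a minimal illustration with placeholder fetch/sync functions and hypothetical signatures; the real implementations live in `bucket_cache.py`:

```python
from contextlib import contextmanager

# Records the order of (operation, model_id) calls for illustration.
events = []

def fetch_cache(model_id, cache_dir):
    # Placeholder for the real bucket download of cached MODULE_* dirs.
    events.append(("fetch", model_id))

def sync_cache(model_id, cache_dir):
    # Placeholder for the real bucket upload of newly compiled artifacts.
    events.append(("sync", model_id))

@contextmanager
def hub_neuronx_cache(model_id, cache_dir):
    fetch_cache(model_id, cache_dir)  # pre-warm the local cache on enter
    try:
        yield cache_dir
    finally:
        sync_cache(model_id, cache_dir)  # upload results on exit, even on error

with hub_neuronx_cache("my-org/my-model", "/tmp/neuron-cache"):
    events.append(("compile", "my-org/my-model"))
```

Using `finally` means a compilation failure inside the context still triggers the sync step, which is relevant to the poison-file concerns raised later in this review.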
Changes:
- Replace the legacy `CompileCacheHfProxy`/registry/sync implementation with bucket-based `fetch_cache`, `sync_cache`, and `lookup_cache`, plus a Unix-socket bucket server.
- Update exporters/inference/training integration points and CLI commands to use the new bucket cache flow.
- Add new bucket cache tests and remove the old Hub sync retry unit tests; update cache docs/AGENTS guidance.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| tests/decoder/test_cache_sync_retry.py | Removed tests tied to the old Git/commit-conflict sync strategy. |
| tests/conftest.py | Removes Hub repo cache fixtures and notes sync is now context-managed. |
| tests/cache/test_bucket_utils.py | Adds unit tests for bucket resolution + hashing + path helpers. |
| tests/cache/test_bucket_cache.py | Adds end-to-end bucket fetch/sync/lookup tests via the uv bucket server. |
| pyproject.toml | Adds uv dependency required to isolate bucket operations. |
| optimum/neuron/utils/runner.py | Switches “cache configured” checks from Hub repo to bucket configuration warnings. |
| optimum/neuron/utils/cache_utils.py | Removes Hub repo selection/write-access utilities; keeps local cache path helpers. |
| optimum/neuron/trainers/transformers.py | Removes explicit sync call; continues wrapping training with hub_neuronx_cache. |
| optimum/neuron/models/inference/backend/pretrained_model.py | Moves export caching to new hub_neuronx_cache usage. |
| optimum/neuron/modeling_traced.py | Removes Hub-based cache lookup/load path; always compiles via exporters. |
| optimum/neuron/modeling_diffusion.py | Removes Hub-based cache lookup/load path; always compiles via exporters. |
| optimum/neuron/cache/traced.py | Deleted legacy traced-artifact caching that depended on Hub proxy. |
| optimum/neuron/cache/optimum_neuron_cc_wrapper.py | Wraps compiler wrapper in hub_neuronx_cache without Hub repo config. |
| optimum/neuron/cache/hub_cache.py | Converts to a shim that re-exports bucket cache and makes synchronize a no-op. |
| optimum/neuron/cache/bucket_utils.py | Implements bucket ID persistence, hashing, and path helpers. |
| optimum/neuron/cache/bucket_server.py | Adds Unix-socket server proxying HfApi bucket methods in uv-isolated env. |
| optimum/neuron/cache/bucket_cache.py | Implements context manager + fetch/sync/lookup using the bucket server. |
| optimum/neuron/cache/AGENTS.md | Updates cache subsystem documentation to bucket architecture (needs alignment). |
| optimum/neuron/cache/__init__.py | Reduces exported API surface to cached-entries + synchronize. |
| optimum/exporters/neuron/convert.py | Wraps tracing in hub_neuronx_cache for bucket fetch/sync. |
| optimum/commands/neuron/cache.py | Refactors CLI from Hub repo commands to bucket set/fetch/lookup/status/cleanup. |
| AGENTS.md | Updates root onboarding note to describe bucket-based cache. |
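The `bucket_server.py` entry above describes a Unix-socket server proxying HfApi calls from the uv-isolated environment. A minimal sketch of that request/response pattern, assuming a newline-delimited JSON protocol (the actual wire format is not shown in this PR excerpt, and the dispatch here is a stub, not a real HfApi call):

```python
import json
import os
import socket
import tempfile
import threading

sock_path = os.path.join(tempfile.mkdtemp(), "bucket.sock")
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(sock_path)
srv.listen(1)

def serve_one():
    # Accept a single connection, read one JSON request line, reply.
    conn, _ = srv.accept()
    with conn:
        req = json.loads(conn.makefile("r").readline())
        # The real server would dispatch req["method"] to HfApi bucket
        # methods running inside the uv-isolated environment.
        conn.sendall((json.dumps({"ok": True, "method": req["method"]}) + "\n").encode())

t = threading.Thread(target=serve_one)
t.start()

# Client side: connect over the Unix socket and issue one request.
cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cli.connect(sock_path)
cli.sendall(b'{"method": "lookup", "args": ["my-org/my-model"]}\n')
reply = json.loads(cli.makefile("r").readline())
cli.close()
t.join()
srv.close()
```

A Unix socket keeps the channel local to the machine and avoids the TCP port management a long-running proxy would otherwise need.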
Comments suppressed due to low confidence (1)
optimum/commands/neuron/cache.py:288
- The `optimum-cli neuron cache synchronize` subcommand has been removed from SUBCOMMANDS, but internal tests/fixtures still invoke it (e.g. tests/decoder/test_cache.py). This will break CI unless those callers are updated in the same PR, or a backward-compatible `synchronize` command is kept (even if it delegates to a new bucket-based sync implementation).
```python
class CustomCacheRepoCommand(BaseOptimumCLICommand):
    SUBCOMMANDS = (
        CommandInfo(
            name="create",
            help="Create a storage bucket on the Hugging Face Hub for Neuron compilation files.",
            subcommand_class=CreateCacheBucketCommand,
        ),
        CommandInfo(
            name="set",
            help="Set the name of the Neuron cache bucket to use locally.",
            subcommand_class=SetCacheBucketCommand,
        ),
        CommandInfo(
            name="fetch",
            help="Pre-warm the local cache by downloading MODULE dirs for a model from the bucket.",
            subcommand_class=FetchCommand,
        ),
        CommandInfo(
            name="lookup",
            help="Lookup cached export configurations for the specified model id.",
            subcommand_class=LookupCommand,
        ),
        CommandInfo(
            name="status",
            help="Show local Neuron compile cache status (entry counts by state, disk usage, compiler versions).",
            subcommand_class=StatusCacheCommand,
        ),
        CommandInfo(
            name="cleanup",
            help="Clean up poisoned entries (failed compilations, stale locks) from the local Neuron compile cache.",
            subcommand_class=CleanupCacheCommand,
        ),
    )
```
```python
if cache_dir is not None:
    return Path(cache_dir)
cache_url = os.environ.get("NEURON_COMPILE_CACHE_URL")
if cache_url is not None:
```
_get_cache_dir() treats NEURON_COMPILE_CACHE_URL as a filesystem path unconditionally. In this codebase the same env var can be an S3 URL (see cleanup.py:_get_cache_path which ignores values starting with "s3://"). Returning Path("s3://...") will break mount detection and local file operations; consider rejecting/ignoring s3:// here (or parsing it explicitly) to match the local-cache utilities’ behavior.
Suggested change:

```diff
-if cache_url is not None:
+# NEURON_COMPILE_CACHE_URL may sometimes be an S3 URL (e.g. "s3://...").
+# In that case, treat it as a remote location and fall back to the local
+# cache resolution logic, matching the behavior of the local-cache utilities.
+if cache_url is not None and not cache_url.startswith("s3://"):
```
```python
flat_path = f"{local_to_flat_bucket_path(module_name, compiler_version)}/{file_path.name}"
files_to_add.append([str(file_path), flat_path])
model_path = f"{model_task_prefix}/{module_name}/{file_path.name}"
```
sync_cache() uploads every file in each MODULE_* directory, including transient/poison files like "*.lock" and "model.log" (failed compilation marker). Since cleanup_local_cache() treats these specially, pushing them to a shared bucket can propagate stale locks or failed entries to other users. Filter uploads to only the expected success artifacts and explicitly exclude lock/log/temporary files.
Suggested change:

```diff
-flat_path = f"{local_to_flat_bucket_path(module_name, compiler_version)}/{file_path.name}"
-files_to_add.append([str(file_path), flat_path])
-model_path = f"{model_task_prefix}/{module_name}/{file_path.name}"
+# Skip transient/poison files that should not be shared via the bucket.
+# These include lock files and failed-compilation markers like model.log.
+name = file_path.name
+if name.endswith(".lock") or name == "model.log":
+    continue
+flat_path = f"{local_to_flat_bucket_path(module_name, compiler_version)}/{name}"
+files_to_add.append([str(file_path), flat_path])
+model_path = f"{model_task_prefix}/{module_name}/{name}"
```
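The filter suggested above could also be factored into a reusable predicate. This sketch uses assumed patterns (`*.lock`, `model.log`, `*.tmp`); the authoritative rules would come from `cleanup_local_cache`, not this list:

```python
import fnmatch

# Assumed transient/poison filename patterns; the real cleanup logic in
# cleanup.py is the source of truth, this tuple is illustrative only.
TRANSIENT_PATTERNS = ("*.lock", "model.log", "*.tmp")

def is_transient_artifact(name: str) -> bool:
    # True for files that mark in-progress or failed compilations and
    # must never be uploaded to a shared bucket.
    return any(fnmatch.fnmatch(name, pat) for pat in TRANSIENT_PATTERNS)

# Filter a candidate upload list down to shareable success artifacts.
uploads = [n for n in ["model.neff", "model.log", "neff.lock"] if not is_transient_artifact(n)]
```

Centralizing the patterns in one predicate keeps `sync_cache()` and any future deferred-sync path from drifting apart on what counts as transient.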
```python
# Re-export lookup_cache under the old name for callers that use it
get_hub_cached_entries = lookup_cache
```
get_hub_cached_entries is now a direct alias to lookup_cache, but lookup_cache does not accept the legacy parameters (e.g. cache_repo_id) that existing callers/tests still pass. This will raise TypeError at runtime. Consider adding a small compatibility wrapper that accepts the old signature (and maps/ignores cache_repo_id) rather than re-exporting lookup_cache directly.
Suggested change:

```diff
-# Re-export lookup_cache under the old name for callers that use it
-get_hub_cached_entries = lookup_cache
+def get_hub_cached_entries(
+    model_id: str,
+    task: str | None = None,
+    cache_repo_id: str | None = None,
+):
+    """
+    Compatibility wrapper for the legacy get_hub_cached_entries API.
+    The underlying implementation is lookup_cache, which does not accept
+    the deprecated `cache_repo_id` parameter. We keep it here to avoid
+    breaking existing callers, but it is ignored.
+    """
+    return lookup_cache(model_id=model_id, task=task)
```
| """No-op — immediate sync in hub_neuronx_cache context replaces explicit sync.""" | ||
| logger.info("synchronize_hub_cache is a no-op with bucket-based cache (sync happens in context).") | ||
|
|
||
|
|
synchronize_hub_cache() is now a no-op, but large parts of the test suite/fixtures still call it expecting remote synchronization (and CLI still historically exposed a cache synchronize command). If explicit sync is intentionally removed, the dependent tests/fixtures need updating; otherwise keep a functional implementation (or at least raise a clear error) to avoid silently masking missing uploads.
| """No-op — immediate sync in hub_neuronx_cache context replaces explicit sync.""" | |
| logger.info("synchronize_hub_cache is a no-op with bucket-based cache (sync happens in context).") | |
| """ | |
| Explicit hub cache synchronization is no longer supported. | |
| Cache uploads and synchronization are handled immediately within the | |
| `hub_neuronx_cache` context manager. This function used to trigger an | |
| explicit sync, but is now deprecated and will raise to avoid silently | |
| masking missing uploads or incorrect cache usage. | |
| """ | |
| logger.error( | |
| "synchronize_hub_cache() has been deprecated and no longer performs an explicit " | |
| "synchronization. Synchronization now happens automatically within the " | |
| "hub_neuronx_cache context manager. Please remove calls to synchronize_hub_cache " | |
| "or update your code/tests accordingly." | |
| ) | |
| raise RuntimeError( | |
| "synchronize_hub_cache() is deprecated and no longer supported. " | |
| "Synchronization is now performed automatically by the hub_neuronx_cache " | |
| "context manager. Please update your code to rely on that behavior." | |
| ) |
```diff
 # Only expose the hub cache public API
-from .hub_cache import get_hub_cached_entries, get_hub_cached_models, synchronize_hub_cache
+from .hub_cache import get_hub_cached_entries, synchronize_hub_cache
```
optimum.neuron.cache no longer exports get_hub_cached_models, but existing tests (e.g. tests/decoder/test_cache.py) and possibly external callers still import it. This is a breaking public API change; either keep exporting a compatibility implementation or update/remove all internal references in the same PR to keep CI green.
Suggested change:

```diff
 from .hub_cache import get_hub_cached_entries, synchronize_hub_cache
+
+
+def get_hub_cached_models(*args, **kwargs):
+    """
+    Backwards-compatible alias for :func:`get_hub_cached_entries`.
+    This function is kept for compatibility with older code that imports
+    ``get_hub_cached_models`` from :mod:`optimum.neuron.cache`.
+    """
+    return get_hub_cached_entries(*args, **kwargs)
```
```diff
+| `bucket_utils.py` | Bucket resolution, model ID encoding, config hashing, path translation | No |
+| `bucket_cache.py` | Context manager (`hub_neuronx_cache`), `fetch_cache`, `sync_cache`, `lookup_cache` | No |
+| `bucket_cli.py` | Standalone subprocess for bucket API (runs via `uv run --with "huggingface_hub>=1.0"`) | No |
+| `hub_cache.py` | Public API shim — re-exports `hub_neuronx_cache`, `synchronize_hub_cache`, `select_hub_cached_entries` | No |
 | `cleanup.py` | Local compile cache inspection and cleanup (`get_local_cache_status`, `cleanup_local_cache`) | No |
-| `hub_cache.py` | HF Hub cache proxy, monkey-patching, registry, sync with retry | Yes (runtime) |
-| `entries/cache_entry.py` | `ModelCacheEntry` dataclass — serialization, arch digest, hash | No |
-| `entries/single_model.py`, `entries/multi_model.py` | Concrete entry types | No |
-| `training.py`, `traced.py` | Legacy cache wrappers for training and traced model paths | Yes |
+| `entries/cache_entry.py` | `ModelCacheEntry` dataclass — kept for backwards compat with callers | No |
+| `entries/single_model.py`, `entries/multi_model.py` | Concrete entry types — kept for backwards compat | No |
+| `training.py` | Training cache wrapper | Yes |
+| `traced.py` | Deprecated — traced model caching now handled by bucket context | No |
 | `optimum_neuron_cc_wrapper.py` | Compiler wrapper hooks | Yes |
```
AGENTS.md references bucket_cli.py and a cache synchronize command, but this PR adds bucket_server.py (not bucket_cli.py) and the CLI SUBCOMMANDS no longer include synchronize. The document also states model_id encoding uses --, but bucket_model_prefix currently uses the raw model_id with /. Please update this guide to match the implemented files/layout so it remains a reliable onboarding source.
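The `--` encoding the guide mentions would presumably look like the flattening below, mirroring how huggingface_hub escapes `/` in local cache directory names. This is a hypothetical sketch, not necessarily what `bucket_model_prefix` implements today (the review notes it currently keeps the raw `/`):

```python
def encode_model_id(model_id: str) -> str:
    # Bucket object keys are flat, so "/" in a hub repo id must be escaped.
    return model_id.replace("/", "--")

def decode_model_id(prefix: str) -> str:
    # Lossy if an org or model name itself contains "--", which is one
    # reason the raw-"/" prefix the code currently uses is also defensible.
    return prefix.replace("--", "/")
```

Whichever convention is chosen, the docs and `bucket_model_prefix` should agree, since stored keys are effectively a persistent on-bucket format.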
```diff
 ## CLI Commands

 All defined in `optimum/commands/neuron/cache.py`:

 | Command | Description |
 |---------|-------------|
-| `cache create` | Create a Hub cache repo |
-| `cache set` | Set active Hub cache repo locally |
-| `cache synchronize` | Upload local cache to Hub (cleans up first) |
-| `cache lookup` | Search Hub cache for compiled models |
+| `cache create` | Create/verify a cache bucket |
+| `cache set` | Set active cache bucket locally |
+| `cache synchronize` | Upload deferred-sync MODULE dirs (scans for `.bucket_meta.json`) |
+| `cache fetch` | Pre-warm local cache from bucket for a model |
+| `cache lookup` | List cached export configs for a model |
+| `cache status` | Show local cache entry counts, sizes, compiler versions |
+| `cache cleanup` | Remove failed/locked/empty entries from local cache |
```
The CLI command table/documentation here still lists cache synchronize (deferred sync via .bucket_meta.json), but the current implementation in optimum/commands/neuron/cache.py removed synchronize and there is no deferred-sync tagging/upload logic in bucket_cache.py. Either implement the deferred/manual sync flow or remove it from the docs to avoid promising a feature that isn’t present.
| """ | ||
| with hub_neuronx_cache(cache_dir=get_neuron_cache_path()): | ||
| output = self._train(resume_from_checkpoint=resume_from_checkpoint) |
hub_neuronx_cache is now bucket-based and uses model_id/task to scope fetch/sync. Calling it with only cache_dir means it will default to model_id="unknown" and can upload new MODULE_* dirs into an "unknown" bucket prefix (and fetch will never find model-specific entries). Consider plumbing a real model_id/task from the training config (e.g., self.model.config._name_or_path + task) or disable remote bucket operations when the model identity isn’t known.
```diff
 def main():
-    with hub_neuronx_cache(cache_repo_id=get_hf_hub_cache_repos()[0], cache_dir=get_neuron_cache_path()):
+    with hub_neuronx_cache(cache_dir=get_neuron_cache_path()):
         return neuron_cc_wrapper_main()
```
hub_neuronx_cache now scopes bucket fetch/sync by model_id/task. Wrapping neuron_cc_wrapper_main() with only cache_dir will default to model_id="unknown" and may sync compiler artifacts into an ambiguous remote prefix. If this wrapper can’t reliably determine the model identity, consider disabling per-model bucket sync here (or requiring callers to provide model_id/task explicitly).
```python
def hub_neuronx_cache(
    model_id: str = "unknown",
    task: str | None = None,
    export_config: dict | None = None,
    compiler_version: str | None = None,
    bucket_id: str | None = None,
    cache_dir: str | Path | None = None,
):
```
hub_neuronx_cache defaults model_id to "unknown" and will still perform bucket fetch/sync under that prefix when callers omit model_id (several call sites still do). That can pollute the shared bucket with ambiguous entries. Consider making model_id optional and skipping remote bucket operations (or raising) when it isn’t provided, so callers must opt in with a stable identifier.
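The opt-in behavior suggested here could look like the following sketch, where remote fetch/sync is skipped entirely unless a stable `model_id` is supplied (hypothetical logic, using a log list in place of real bucket calls):

```python
from contextlib import contextmanager

log = []  # records what the sketch would do instead of real bucket calls

@contextmanager
def hub_neuronx_cache(model_id=None, cache_dir=None):
    # Remote fetch/sync only happens when the caller supplies a stable
    # identifier; otherwise the context operates on the local cache only.
    remote = model_id is not None
    if remote:
        log.append(f"fetch:{model_id}")
    try:
        yield
    finally:
        log.append(f"sync:{model_id}" if remote else "local-only")

with hub_neuronx_cache(model_id="my-org/my-model"):
    pass
with hub_neuronx_cache():
    pass
```

Defaulting to local-only avoids ever writing an ambiguous "unknown" prefix to the shared bucket, while still letting call sites that know their model identity opt in.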
Summary

Replace the Git-based `CompileCacheHfProxy` with an HF Storage Bucket-based NEFF cache.

- Bucket operations run via `uv run --with "huggingface_hub>=1.0"` to avoid a transformers version conflict
- A context manager (`hub_neuronx_cache`) handles fetch on enter, sync on exit
- libneuronxla

Test plan

- `pytest tests/cache/` (40 tests, CPU)
- aws-neuron/optimum-neuron-neff-cache
- `select_hub_cached_entries` key name normalization (known TODO)

Ref: #1101
🤖 Generated with Claude Code