fix: defer heavy imports in inference CLI for vLLM compat #53

Open
slacki-ai wants to merge 2 commits into longtermrisk:v0.9 from slacki-ai:fix/inference_cli_deferred_imports

Conversation

@slacki-ai

Summary

  • Move torch, vllm, transformers, huggingface_hub, and openweights.client imports from module top-level into the __main__ guard
  • This allows monkey-patches to be applied before vLLM is imported. vLLM captures tqdm and tokenizer behaviour at import time, so patching after import has no effect
  • Adds tqdm noise reduction (rate-limited updates) and a compat patch for transformers.PreTrainedTokenizerBase.all_special_tokens_extended on newer transformers versions
  • Changes the main() signature from main(config_json: str) to main(cfg, conversations). Config parsing and data loading now happen in __main__ before the vLLM import
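
The ordering argument in the summary can be illustrated with a small stand-in, without installing vLLM or tqdm. This is only a sketch of the general Python mechanism: `helper_lib`, `progress`, and `import_heavy` are hypothetical names made up for the illustration, and it assumes the heavy library binds the helper with a `from ... import ...` statement at its top level, which is how the PR describes vLLM's behaviour but is not verified here.

```python
import sys
import types

# Hypothetical stand-in for the library we want to patch (plays the role of tqdm).
helper = types.ModuleType("helper_lib")
helper.progress = lambda: "noisy"
sys.modules["helper_lib"] = helper

# Hypothetical stand-in for a heavy library that binds the helper at import time.
HEAVY_SOURCE = (
    "from helper_lib import progress\n"
    "def run():\n"
    "    return progress()\n"
)


def import_heavy(name):
    """Simulate an import by executing the module's top-level code once."""
    module = types.ModuleType(name)
    exec(HEAVY_SOURCE, module.__dict__)
    return module


# Case 1: import first, patch afterwards. The module already holds a
# reference to the original function, so the patch is not seen.
heavy_early = import_heavy("heavy_early")
helper.progress = lambda: "quiet"
late_patch_result = heavy_early.run()

# Case 2: the patch is in place before the import, so the module binds
# the patched function.
heavy_late = import_heavy("heavy_late")
patched_first_result = heavy_late.run()

print(late_patch_result, patched_first_result)
```

In this stand-in, the module imported before the patch keeps calling the original function, while the one imported after the patch picks up the replacement. Deferring the heavy imports into the `__main__` guard is one way to guarantee the second ordering.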

Changes

  • openweights/jobs/inference/cli.py — restructured import order and main() signature

Test plan

  • AST-based unit tests verify import structure (11 tests): heavy imports not at top level, stdlib imports preserved, main() signature correct, __main__ guard contains deferred imports
  • Integration: run an inference job end-to-end to verify the new import order works with vLLM
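
For readers unfamiliar with AST-based import checks, the following is a minimal sketch of how such a test could look. It is not the PR's actual test code: the function name `top_level_heavy_imports`, the `HEAVY` tuple, and the sample sources are assumptions made for illustration. It inspects only the direct children of the module node, so imports nested inside a function or the `__main__` guard are not reported.

```python
import ast

# Assumed list of heavy modules, taken from the PR summary.
HEAVY = ("torch", "vllm", "transformers", "huggingface_hub", "openweights.client")


def top_level_heavy_imports(source, heavy=HEAVY):
    """Return the heavy modules imported directly at module top level."""
    found = set()
    for node in ast.parse(source).body:  # direct children of the module only
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            for prefix in heavy:
                # Match the module itself or any of its submodules.
                if name == prefix or name.startswith(prefix + "."):
                    found.add(prefix)
    return found


# Hypothetical source with heavy imports deferred into the __main__ guard.
DEFERRED = '''
import json
import sys

def main(cfg, conversations):
    pass

if __name__ == "__main__":
    import torch
    from vllm import LLM
'''

# Hypothetical source with heavy imports at module top level.
EAGER = "import torch\nimport openweights.client\n"

print(top_level_heavy_imports(DEFERRED))
print(sorted(top_level_heavy_imports(EAGER)))
```

A check like this only verifies the file's structure without importing it. Whether the new import order actually works with vLLM still needs the end-to-end run listed above.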

🤖 Generated with Claude Code

slacki-ai and others added 2 commits March 26, 2026 10:12
Move torch, vLLM, transformers, and huggingface_hub imports from
module top-level to the __main__ guard. This allows monkey-patches
(tqdm rate limiting, tokenizer compat) to be applied BEFORE vLLM is
imported, since vLLM captures tqdm and tokenizer behaviour at import
time.

Also changes main() signature to accept pre-parsed (cfg, conversations)
instead of a raw JSON string, and adds tqdm noise reduction and
transformers all_special_tokens_extended compat patch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Keep 5 deferred-import checks (one per heavy library). Remove 6 tests:
stdlib presence check, function signature checks, __main__ guard
existence, and AST-dump string matching for imports inside the guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@nielsrolf
Collaborator

Why do we need this? I strongly prefer imports at module level. Modifying tqdm behavior is, imo, probably not worth such a change. Why does the tokenizer need any monkey-patching?
