Make CommandEncoder thread local #3348

Merged
zcbenz merged 1 commit into ml-explore:main from
zcbenz:encoder-refactor-4
Apr 1, 2026

@zcbenz zcbenz commented Apr 1, 2026

Refs #3078, #3216.

  • Synchronize in ~CommandEncoder instead of ~Scheduler, so cleanup happens when each thread exits rather than when the process exits. This avoids thread-safety and destruction-order issues.
  • Store CommandEncoder in thread-local storage and throw an error when accessing a stream that was not created on the current thread.
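
The behavior described in the two points above can be illustrated with a small pure-Python analogy. This is only a sketch of the pattern, not MLX's C++ implementation: `FakeEncoder`, `new_stream`, `get_encoder`, and `cleanup_current_thread` are hypothetical names, and the explicit cleanup call stands in for what a C++ destructor would do when a thread exits.

```python
import threading


class FakeEncoder:
    """Hypothetical stand-in for a per-stream command encoder."""

    def __init__(self, stream_index):
        self.stream_index = stream_index
        self.synchronized = False

    def synchronize(self):
        # A real encoder would wait for pending GPU work here.
        self.synchronized = True


# Each thread sees its own attributes on this object.
_tls = threading.local()


def _encoders():
    # Lazily create the per-thread {stream_index: encoder} table.
    if not hasattr(_tls, "encoders"):
        _tls.encoders = {}
    return _tls.encoders


def new_stream(stream_index):
    # Register an encoder for this stream in the calling thread only.
    _encoders()[stream_index] = FakeEncoder(stream_index)
    return stream_index


def get_encoder(stream_index):
    # Streams registered by another thread are invisible here, so this raises.
    try:
        return _encoders()[stream_index]
    except KeyError:
        raise RuntimeError(
            f"There is no Stream(gpu, {stream_index}) in current thread."
        ) from None


def cleanup_current_thread():
    # Analogy for per-thread destruction: synchronize, then drop the encoders.
    for encoder in _encoders().values():
        encoder.synchronize()
    _encoders().clear()
```

In this sketch, a stream registered on the main thread resolves normally there, while a lookup from a worker thread raises the `RuntimeError`, because the worker's table is empty.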

@zcbenz zcbenz force-pushed the encoder-refactor-4 branch 4 times, most recently from d27cbb7 to a58624e Compare April 1, 2026 03:40

@angeloskath angeloskath left a comment

Awesome! And really nicely broken up into several PRs over the past few days. Kudos!

@zcbenz zcbenz force-pushed the encoder-refactor-4 branch 2 times, most recently from 247c227 to 0aefd88 Compare April 1, 2026 07:53
@zcbenz zcbenz force-pushed the encoder-refactor-4 branch from 0aefd88 to 8a40a06 Compare April 1, 2026 08:01
@zcbenz zcbenz merged commit 5e2c442 into ml-explore:main Apr 1, 2026
16 checks passed
@zcbenz zcbenz deleted the encoder-refactor-4 branch April 1, 2026 09:42
Thump604 added a commit to Thump604/mlx-lm that referenced this pull request Apr 1, 2026
MLX ml-explore/mlx#3348 (Make CommandEncoder thread local) makes each
thread maintain its own Metal command encoder. The module-level
generation_stream created at import time belongs to the main thread's
command encoder. When generation runs in a thread pool (e.g., via
vllm-mlx BatchedEngine), mx.eval and mx.synchronize on that stream
fail with:

    RuntimeError: There is no Stream(gpu, 0) in current thread.

Fix: replace the module-level mx.Stream with a thread-local factory
function. Each thread gets its own stream on first use. The public
name `generation_stream` becomes a function returning the current
thread's stream, keeping the call-site diff minimal.

Backward compatible: on MLX versions without thread-local command
encoders, per-thread streams work identically to a shared stream.
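
The thread-local factory described in this commit message might look roughly like the sketch below. It is not the actual mlx-lm change: `make_thread_local` is a hypothetical helper, and `object` is used as a placeholder factory so the example runs without MLX installed.

```python
import threading


def make_thread_local(factory):
    """Return a zero-argument getter that caches one factory() result per thread."""
    local = threading.local()

    def get():
        try:
            return local.value
        except AttributeError:
            # First call on this thread: create and cache the value.
            local.value = factory()
            return local.value

    return get


# Placeholder factory (assumption): each call to object() yields a distinct value.
# With MLX one would presumably pass something like
# lambda: mx.new_stream(mx.default_device()) instead.
generation_stream = make_thread_local(object)
```

Call sites then change from `generation_stream` to `generation_stream()`, and every thread lazily receives its own stream on first use.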