Skip to content

feat: add num_return_sequences support for rec beam search.#1289

Open
DragonFive wants to merge 13 commits intojd-opensource:mainfrom
DragonFive:feat/rec-num-return-sequences
Open

feat: add num_return_sequences support for rec beam search.#1289
DragonFive wants to merge 13 commits intojd-opensource:mainfrom
DragonFive:feat/rec-num-return-sequences

Conversation

@DragonFive
Copy link
Copy Markdown
Collaborator

Summary

  • add num_return_sequences plumbing for REC beam search across proto, pybind, c_api, and cc_api
  • align REC multi-round fast-path beam scores with full-logsoftmax semantics used by xllm_rec
  • expand OneRec beams before the final prune when num_return_sequences > beam_width
  • keep REC top-k outputs sorted and add targeted tests for beam expansion and fast-path scoring

Testing

  • git diff --check
  • python3 -m py_compile xllm/pybind/llm.py xllm/pybind/params.py
  • added targeted unit coverage in batch_test.cpp and rec_sampler_test.cpp

Notes

  • branch: feat/rec-num-return-sequences
  • target: team/main

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the num_return_sequences parameter across the C++, Python, and Protobuf APIs, allowing beam search to return more sequences than the beam width. The implementation adds logic to Batch and SequencesGroup to expand results using top-k logprobs and updates the RecSampler for accurate logprob calculation. Feedback identifies a critical runtime bug where the beam_base_logprobs tensor must be flattened to 1-D to prevent a shape mismatch crash during batch processing. Additionally, an include path in batch.cpp requires correction to follow the project's root-relative path style guide.

Comment thread xllm/core/framework/batch/batch.cpp Outdated
Comment thread xllm/core/runtime/rec_worker_impl.cpp Outdated
Comment thread xllm/core/framework/batch/batch.cpp Outdated
Comment thread xllm/core/framework/batch/batch_test.cpp Outdated
Comment thread xllm/core/framework/batch/batch.cpp
Comment thread xllm/core/framework/batch/batch.cpp Outdated
Comment thread xllm/core/distributed_runtime/rec_engine.cpp Outdated
Comment thread xllm/core/framework/request/sequences_group.cpp Outdated
Comment thread xllm/core/framework/sampling/rec_sampler_test.cpp Outdated
Comment thread xllm/pybind/bind.cpp
Comment thread xllm/core/runtime/rec_worker_impl.cpp
@DragonFive
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the num_return_sequences parameter across the C++, C, Python, and Protobuf APIs, allowing users to specify the number of sequences returned from beam search. The feedback identifies violations of the repository style guide regarding member functions in structs and suggests logic adjustments to correctly support sequence pruning when num_return_sequences is less than the beam width.

Comment thread xllm/core/framework/request/request_params.h Outdated
Comment thread xllm/core/framework/sampling/sampling_params.h Outdated
Comment thread xllm/core/framework/request/request_params.cpp
Comment thread xllm/core/framework/request/sequences_group.cpp Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants