[RL] Fix the incorrect routing of EOS tokens, which leads to changes in accuracy#7960

Open
gongshaotian wants to merge 11 commits into PaddlePaddle:release/2.6 from gongshaotian:r3_eos_2.6

Conversation

@gongshaotian
Collaborator

@gongshaotian gongshaotian commented May 29, 2026

Motivation

An incorrect calculation of the context length in the tokenizer led to one extra route for the output tokens compared to what was actually captured.
In Overlap mode, the estimated token count for the current inference step might be a bit higher than the actual count. This could cause some contamination in the routing cache when updating the CPU cache.
Together, these two issues caused the k3_kl value to become unstable once R3 started supporting Overlap.
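The Overlap problem described above can be illustrated with a minimal sketch. It assumes a per-step routing buffer and a CPU-side cache modelled as plain Python lists; `flush_routing`, `gpu_buffer`, and `cpu_cache` are hypothetical names for illustration, not FastDeploy APIs. The point is only that rows beyond the actual token count are stale and should not be copied.

```python
def flush_routing(cpu_cache, gpu_buffer, start, actual_token_num):
    """Copy only the first `actual_token_num` routing rows into the CPU cache.

    Rows past `actual_token_num` come from an over-estimated token count
    and would contaminate the cache if they were copied as well.
    """
    for i in range(actual_token_num):
        cpu_cache[start + i] = gpu_buffer[i]
    return start + actual_token_num


# Routing rows recorded for one step. The scheduler estimated 3 tokens,
# but only 2 were actually produced, so the last row is stale.
gpu_buffer = [[1, 4], [2, 7], [9, 9]]
cpu_cache = [None] * 4

next_start = flush_routing(cpu_cache, gpu_buffer, start=0, actual_token_num=2)
print(cpu_cache)   # [[1, 4], [2, 7], None, None]
print(next_start)  # 2
```

Flushing with the estimated count of 3 instead would write `[9, 9]` into slot 2, which is the kind of contamination the motivation refers to.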

Modifications

  1. Integrate multiple operators into get_positions_and_slot_mapping().
  2. Routing is no longer returned for EOS tokens.
  3. In Overlap Schedule mode, only the routing for the actual token count is flushed.
  4. Add a debug mode.
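As a rough illustration of what a fused positions and slot-mapping computation produces, here is a pure-Python reference sketch. It assumes the usual paged KV-cache layout, where a token at position `pos` lands in slot `block_id * block_size + pos % block_size`; the function name, argument names, and layout are assumptions for illustration and are not taken from the PR's kernel.

```python
def positions_and_slot_mapping(seq_lens_before, new_token_nums, block_tables, block_size):
    """Reference computation of per-token positions and KV-cache slots.

    seq_lens_before[i]: tokens already cached for request i
    new_token_nums[i]:  tokens processed for request i in this step
    block_tables[i]:    physical block ids assigned to request i
    """
    positions, slot_mapping = [], []
    for req, (start, n) in enumerate(zip(seq_lens_before, new_token_nums)):
        for pos in range(start, start + n):
            block_id = block_tables[req][pos // block_size]
            positions.append(pos)
            slot_mapping.append(block_id * block_size + pos % block_size)
    return positions, slot_mapping


# Two requests with block_size 4: request 0 has 3 cached tokens and adds 2,
# request 1 is a fresh prefill of 3 tokens.
pos, slots = positions_and_slot_mapping([3, 0], [2, 3], [[5, 2], [7]], 4)
print(pos)    # [3, 4, 0, 1, 2]
print(slots)  # [23, 8, 28, 29, 30]
```

A reference like this is mainly useful as an oracle in an operator unit test, where the kernel output is compared against it.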

Usage or Command

Enable debug mode:

    --routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}' 
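When the flag is assembled by a launcher script, building the JSON value programmatically avoids shell-quoting mistakes. The sketch below uses only the Python standard library; the two keys are the ones shown in the command above, and nothing else about the CLI is assumed.

```python
import json
import shlex

# The same two options as in the command above.
config = {"enable_routing_replay": True, "debug_mode": True}

# json.dumps emits valid JSON (lowercase true); shlex.quote makes it shell-safe.
flag = "--routing-replay-config " + shlex.quote(json.dumps(config))
print(flag)
# --routing-replay-config '{"enable_routing_replay": true, "debug_mode": true}'
```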

Accuracy Tests

Add tests/operators/test_get_position_ids_and_slot_mapping.py

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot Bot commented May 29, 2026

Thanks for your contribution!

@PaddlePaddle-bot

PaddlePaddle-bot commented May 29, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-30 19:49:26

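This CI report is generated from the following code (updated every 30 minutes):


1 Task overview

3 Required tasks failed; they must be resolved before the PR can be merged.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
| --- | --- | --- | --- | --- | --- | --- |
| 36 (0) | 36 | 31 | 5 | 0 | 0 | 0 |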

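2 Task status summary

2.1 Required tasks: 7/10 passed

Required tasks block merging; failures must be addressed first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
| --- | --- | --- | --- | --- | --- | --- |
| ❌ | Approval | 18s | Approval required | Obtain manual approval | Job | - |
| ❌ | run_tests_with_coverage | 1h14m | PR issue: the 0530 baseline directory does not exist and the file system is read-only | Fix EOS routing and pre-create the new baseline directory in CI | Job | - |
| ❌ | run_4_cards_tests | 12m17s | PR issue: replay did not generate the stream directory after the EOS routing change | Check the stream directory generation logic in token_processor.py | Job | - |
| ✅ | The other 7 required tasks passed | - | - | - | - | - |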

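2.2 Optional tasks: 24/26 passed

Optional tasks do not block merging; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
| --- | --- | --- | --- | --- |
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 1m46s | Job | - |
| ❌ | CI_HPU | 1h4m | Job | - |
| ✅ | The other 24 optional tasks passed | - | - | - |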

3 Failure details (required only)

Approval: manual approval required (confidence: high)

This job requires manual approval; CI continues only after approval is granted.

Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test failure (confidence: high)

  • Status: ❌ Failed
  • Error type: test failure
  • Confidence: high
  • Root-cause summary: the PR updates the baseline path to 0530, but /ModelData is read-only in the CI environment, so the new baseline directory cannot be created
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

| Test | Error | Root cause |
| --- | --- | --- |
| e2e/test_EB_Lite_serving_R3.py::test_r3_accuracy | OSError: [Errno 30] Read-only file system | Routing replay did not generate the stream directory; the new baseline directory does not exist and the file system is read-only |

Root-cause details:
In tests/e2e/utils/rollout_routing_replay_test_utils.py, the PR updates the baseline path from R3_BaseLine_uint8_0424 to R3_BaseLine_uint8_0530 (L159). test_r3_accuracy calls generated_base_line_routing_index, which tries to move ./R3_tmp/routing_replay_output_eb45/r3_chat_completion_stream to the new baseline directory, but the source directory does not exist (routing replay produced no stream output after the EOS routing change). The copytree fallback then tries to create /ModelData/R3_BaseLine_uint8_0530 and fails with OSError: [Errno 30] Read-only file system.

Key logs:

FAILED tests/e2e/test_EB_Lite_serving_R3.py::test_r3_accuracy - OSError: [Errno 30] Read-only file system: '/ModelData/R3_BaseLine_uint8_0530'
Unit tests failed (exit code 8)
Failed test cases:
tests/e2e/test_EB_Lite_serving_R3.py

Suggested fixes:

  1. Check the EOS token routing logic in fastdeploy/output/token_processor.py and confirm that routing replay still generates the r3_chat_completion_stream directory after the change
  2. Pre-create and mount the /ModelData/R3_BaseLine_uint8_0530 directory on the CI machines (or notify the CI administrators) so that the baseline write path is writable

Fix summary: fix the EOS routing in token_processor.py and pre-create the new baseline directory in CI

Related changes: tests/e2e/utils/rollout_routing_replay_test_utils.py L159 (baseline path 0424→0530), fastdeploy/output/token_processor.py (EOS routing logic change)

Link: View logs
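The two failure modes in this analysis (a missing source directory and a read-only destination) can be told apart early with explicit checks before moving anything. The following is a minimal sketch using only the standard library; `publish_baseline` is a hypothetical helper for illustration and is not the project's generated_base_line_routing_index.

```python
import os
import shutil
import tempfile
from pathlib import Path


def publish_baseline(src: Path, dst: Path) -> None:
    """Move a generated routing directory to the baseline location, failing early."""
    if not src.is_dir():
        # Corresponds to the missing r3_chat_completion_stream directory.
        raise FileNotFoundError(f"routing replay output not found: {src}")
    if not os.access(dst.parent, os.W_OK):
        # Corresponds to a missing or read-only baseline root.
        raise PermissionError(f"baseline parent is missing or not writable: {dst.parent}")
    shutil.move(str(src), str(dst))


# Demo in a temporary directory so nothing outside it is touched.
with tempfile.TemporaryDirectory() as root:
    src = Path(root) / "r3_chat_completion_stream"
    dst = Path(root) / "baseline" / "R3_BaseLine_demo"
    dst.parent.mkdir()
    src.mkdir()
    (src / "routing.bin").write_bytes(b"\x00")
    publish_baseline(src, dst)
    moved = (dst / "routing.bin").exists() and not src.exists()
    print(moved)  # True
```

With checks like these, a test would report the missing stream directory directly instead of surfacing it later as a read-only file system error from the fallback copy.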

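Run Four Cards Tests / run_4_cards_tests: test failure (confidence: medium)

  • Status: ❌ Failed
  • Error type: test failure
  • Confidence: medium
  • Root-cause summary: after the PR changed EOS routing, routing replay did not generate the r3_chat_completion_stream directory
  • Analyzer: ci_analyze_unittest_fastdeploy

Failed cases:

| Test | Error | Root cause |
| --- | --- | --- |
| test_GLM_45_AIR_mtp_tp4.py::test_r3_accuracy | FileNotFoundError | Routing replay did not generate the r3_chat_completion_stream directory |
| test_GLM_45_AIR_tp4.py::test_r3_accuracy | FileNotFoundError | Routing replay did not generate the r3_chat_completion_stream directory |

Root-cause details:
The PR updates the baseline path from R3_BaseLine_uint8_0424 to R3_BaseLine_uint8_0530 and changes the EOS token routing logic (fastdeploy/output/token_processor.py and other files). When test_r3_accuracy calls generated_base_line_routing_index, it tries to move ./R3_tmp/routing_replay_output_glm45air_mtp_tp4/r3_chat_completion_stream to the new baseline directory, but that source directory does not exist, which indicates that routing replay failed to generate the expected streaming output directory after the PR's changes.

Suggested fixes:

  1. Check that the stream directory generation logic in fastdeploy/output/token_processor.py matches the path the tests expect
  2. Confirm that the routing replay output structure still follows the r3_chat_completion_stream naming convention after the EOS token routing change

Fix summary: check that the stream directory generation logic in token_processor.py matches the path the tests expect

Related changes: tests/e2e/utils/rollout_routing_replay_test_utils.py L159 (baseline path 0424→0530), fastdeploy/output/token_processor.py (EOS routing logic change)

Link: View logs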

@codecov-commenter

codecov-commenter commented May 29, 2026

Codecov Report

❌ Patch coverage is 25.64103% with 87 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@ac24fcc). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...model_executor/layers/moe/routing_indices_cache.py | 18.75% | 47 Missing and 5 partials ⚠️ |
| fastdeploy/cache_manager/routing_cache_manager.py | 4.54% | 21 Missing ⚠️ |
| fastdeploy/worker/gpu_model_runner.py | 61.90% | 7 Missing and 1 partial ⚠️ |
| fastdeploy/config.py | 33.33% | 1 Missing and 1 partial ⚠️ |
| fastdeploy/model_executor/pre_and_post_process.py | 50.00% | 2 Missing ⚠️ |
| fastdeploy/output/token_processor.py | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7960   +/-   ##
==============================================
  Coverage               ?   71.63%           
==============================================
  Files                  ?      386           
  Lines                  ?    55447           
  Branches               ?     8688           
==============================================
  Hits                   ?    39718           
  Misses                 ?    12927           
  Partials               ?     2802           

| Flag | Coverage Δ |
| --- | --- |
| GPU | 71.63% <25.64%> (?) |

@PaddlePaddle-bot PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-30 18:02:16

📋 Review summary

PR overview: fixes the accuracy instability in R3 Routing Replay caused by incorrect EOS token routing and inaccurate token counts in Overlap mode
Scope of changes: custom_ops (new fused kernel), cache_manager, worker, model_executor/layers/moe, output/token_processor
Impact tags: [RL] [OP] [KVCache] [FDConfig]

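Issues

| Level | File | Summary |
| --- | --- | --- |
| 🟡 Suggestion | fastdeploy/output/token_processor.py:620 | The EOS check lacks a defensive guard; it raises an exception when eos_token_ids is None or output_token_ids is empty |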

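Status of previous findings

| Finding | Issue | Status |
| --- | --- | --- |
| F1 | block_tables dtype mismatch | ✅ Fixed |
| F2 | logger.info floods the log on the hot path | ✅ Fixed (now guarded by debug_mode) |
| F3 | Leftover triple-commented code | ⚠️ Still present |
| F4 | token_num_overlap initialized to 0 | ⚠️ Still present (analysis shows the execution flow assigns it before the first call, so this is not an actual bug) |
| F5 | DSA Attention Backend functional regression | ⚠️ Still present (position_ids changed from int32 to int64; compatibility on the DSA consumer side needs to be confirmed) |
| F6 | seq_len - 1 is off by one in the recovery_stop scenario | ✅ Fixed (the EOS check is now precisely guarded) |
| F7 | AttributeError when routing_replay_manager is None | ✅ Fixed (is not None check added) |
| F8 | Kernel <<<1, bsz>>> fails silently when bsz > 1024 | ⚠️ Still present |
| F9 | Performance issue from the kernel's serial loop | ⚠️ Still present |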
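Finding F8 follows from the CUDA limit of 1024 threads per block: a launch of the form <<<1, bsz>>> cannot be valid once bsz exceeds it. The usual remedy is to cap the block size and scale the grid instead. The arithmetic is sketched below in Python; `launch_config` and the default of 256 threads are illustrative assumptions and are not taken from the PR.

```python
MAX_THREADS_PER_BLOCK = 1024  # CUDA per-block thread limit on current GPUs


def launch_config(bsz, threads_per_block=256):
    """Return (grid_size, block_size) covering `bsz` items with one thread each."""
    if bsz <= 0:
        raise ValueError("bsz must be positive")
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be in 1..1024")
    grid = (bsz + threads_per_block - 1) // threads_per_block  # ceiling division
    return grid, threads_per_block


print(launch_config(2048))        # (8, 256)
print(launch_config(1025, 1024))  # (2, 1024)
```

Inside such a kernel, each thread would derive its index as blockIdx.x * blockDim.x + threadIdx.x and return early when the index is not below bsz, since the last block is usually only partly used.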

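📝 PR convention check

The PR targets release/2.6 (not develop), so under the repository's conventions the title should use the Cherry-Pick format. In addition, none of the Checklist items are checked.

Suggested title (can be copied directly):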

  • [Cherry-Pick][RL] Fix the incorrect routing of EOS tokens, which leads to changes in accuracy
Suggested PR description (can be copied directly)
## Motivation

An incorrect calculation of the context length in the tokenizer led to one extra route for the output tokens compared to what was actually captured.
In Overlap mode, the estimated token count for the current inference step might be a bit higher than the actual count. This could cause some contamination in the routing cache when updating the CPU cache.
These two issues together cause the k3_kl value to become unstable after R3 starts supporting Overlap.

## Modifications

1. Integrating multiple operators into `get_positions_and_slot_mapping()` — a new fused CUDA kernel replacing separate `get_position_ids` + Python-side slot_mapping computation
2. Route that no longer returns EOS tokens (`seq_len - 1` in `_finalize_routing`)
3. In the Overlap Schedule mode, only the flush routing for the actual token count is applicable (`token_num_overlap`)
4. Added Debug mode for R3 routing validation (`--routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'`)

## Usage or Command

Add debug mode:

--routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'


## Accuracy Tests

Add tests/operators/test_get_position_ids_and_slot_mapping.py

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

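Overall assessment

The core fix logic is correct: excluding EOS token routing, correcting the actual token count in Overlap mode, and merging the computation into a fused kernel were all verified as reasonable. F1/F2/F6/F7 are fixed; F8 (kernel bsz upper limit) and F5 (confirming DSA int64 compatibility) are suggested for a follow-up iteration.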

if hasattr(task, "output_token_ids")
else task.prompt_token_ids_len
)
if task.output_token_ids[-1] in task.eos_token_ids:
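🟡 Suggestion: the EOS check lacks a defensive guard

The current expression task.output_token_ids[-1] in task.eos_token_ids raises an exception in the following edge cases:

  • task.eos_token_ids is None → TypeError
  • task.output_token_ids is an empty list → IndexError

The outer try/except prevents a crash, but the routing data for that request is silently lost.

Suggested fix: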

if (
    hasattr(task, "output_token_ids")
    and task.output_token_ids
    and task.eos_token_ids
    and task.output_token_ids[-1] in task.eos_token_ids
):
    seq_len = seq_len - 1  # Ignore eos token
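
To check the edge cases listed above in isolation, the guard can be wrapped in a small function and exercised with stand-in task objects. This is a sketch under assumptions: `routing_seq_len` is a hypothetical helper, the tasks are `SimpleNamespace` stand-ins rather than real request objects, and both id fields are assumed to be plain Python lists (array types would need an explicit length check).

```python
from types import SimpleNamespace


def routing_seq_len(task, seq_len):
    """Return seq_len minus one when the last output token is an EOS token."""
    output_ids = getattr(task, "output_token_ids", None)
    eos_ids = getattr(task, "eos_token_ids", None)
    # Truthiness checks cover both None and empty lists.
    if output_ids and eos_ids and output_ids[-1] in eos_ids:
        return seq_len - 1  # Ignore eos token
    return seq_len


ended = SimpleNamespace(output_token_ids=[11, 12, 2], eos_token_ids=[2])
no_eos_ids = SimpleNamespace(output_token_ids=[11, 12, 2], eos_token_ids=None)
no_output = SimpleNamespace(output_token_ids=[], eos_token_ids=[2])

print(routing_seq_len(ended, 5))       # 4
print(routing_seq_len(no_eos_ids, 5))  # 5
print(routing_seq_len(no_output, 5))   # 5
```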
