
[Feature] Add SGLang backend support to GRPO#3437

Open
vmoens wants to merge 28 commits into gh/vmoens/217/base from gh/vmoens/217/head

Conversation


@vmoens vmoens commented Feb 2, 2026

Stack from ghstack (oldest at bottom):


  • Add inference_model.backend config option ("vllm" or "sglang")
  • Refactor get_inference_model() to support both backends
  • Refactor make_weight_sync_scheme() to support both backends
  • Add _get_sglang_inference_model() for SGLang backend
  • Add _make_sglang_weight_sync_scheme() for SGLang weight sync

Users can now run GRPO with either vLLM or SGLang:

  inference_model:
    backend: "sglang"  # or "vllm" (default)
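A minimal sketch of the backend dispatch the bullet points describe. This is illustrative only: the stub builders and the plain-dict config stand in for the real TorchRL helpers, which construct actual engine wrappers.

```python
# Illustrative sketch, not the actual TorchRL implementation.
# _get_vllm_inference_model / _get_sglang_inference_model are named after
# the helpers in the PR description, but their bodies here are stubs so the
# dispatch logic can run standalone.
def _get_vllm_inference_model(cfg):
    return "vllm-model"


def _get_sglang_inference_model(cfg):
    return "sglang-model"


def get_inference_model(cfg):
    # inference_model.backend defaults to "vllm", per the PR description.
    backend = cfg.get("inference_model", {}).get("backend", "vllm")
    builders = {
        "vllm": _get_vllm_inference_model,
        "sglang": _get_sglang_inference_model,
    }
    try:
        return builders[backend](cfg)
    except KeyError:
        raise ValueError(f"Unknown inference backend: {backend!r}") from None
```

The same pattern presumably applies to `make_weight_sync_scheme()`, routing to `_make_sglang_weight_sync_scheme()` when the SGLang backend is selected.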

Co-authored-by: Cursor <cursoragent@cursor.com>

[ghstack-poisoned]

pytorch-bot bot commented Feb 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3437

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 58f3bfd with merge base 7a0b1f9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Feb 2, 2026
- Add inference_model.backend config option ("vllm" or "sglang")
- Refactor get_inference_model() to support both backends
- Refactor make_weight_sync_scheme() to support both backends
- Add _get_sglang_inference_model() for SGLang backend
- Add _make_sglang_weight_sync_scheme() for SGLang weight sync

Users can now run GRPO with either vLLM or SGLang:
  inference_model:
    backend: "sglang"  # or "vllm" (default)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: d4a7d42
Pull-Request: #3437
meta-cla bot added the CLA Signed label Feb 2, 2026
vmoens added the llm/ label (LLM-related PR, triggers LLM CI tests) Feb 2, 2026

github-actions bot commented Feb 2, 2026

⚠️ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 153. Improved: 17. Worsened: 8.
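For reference, the Change column looks like the relative difference between mean throughput (Ops) on this PR and on the repo HEAD baseline. A quick sketch of that reading (my interpretation of the bot output, not its actual code):

```python
def pct_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of this PR vs. the merge-base run, in percent."""
    return (ops_pr - ops_head) / ops_head * 100


# First row of the table: 12.6710 KOps/s on the PR vs 12.4432 KOps/s on HEAD.
print(f"{pct_change(12.6710, 12.4432):+.2f}%")  # → +1.83%
```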

Expand to view detailed results
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 79.7279μs | 78.9204μs | 12.6710 KOps/s | 12.4432 KOps/s | +1.83% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1374ms | 0.1365ms | 7.3257 KOps/s | 6.9896 KOps/s | +4.81% |
| test_tensor_to_bytestream_speed[untyped_storage] | 99.3632ms | 99.1173ms | 10.0891 Ops/s | 9.9511 Ops/s | +1.39% |
| test_tensor_to_bytestream_speed[numpy] | 2.5267μs | 2.5178μs | 397.1696 KOps/s | 401.3456 KOps/s | -1.04% |
| test_tensor_to_bytestream_speed[safetensors] | 37.4448μs | 35.7343μs | 27.9843 KOps/s | 27.8806 KOps/s | +0.37% |
| test_simple | 0.6465s | 0.5525s | 1.8100 Ops/s | 1.8143 Ops/s | -0.24% |
| test_transformed | 1.2142s | 1.1210s | 0.8921 Ops/s | 0.8881 Ops/s | +0.44% |
| test_serial | 1.6147s | 1.6119s | 0.6204 Ops/s | 0.6180 Ops/s | +0.38% |
| test_parallel | 1.1719s | 1.0882s | 0.9189 Ops/s | 0.8918 Ops/s | +3.04% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1368ms | 42.8734μs | 23.3245 KOps/s | 22.9722 KOps/s | +1.53% |
| test_step_mdp_speed[True-True-True-True-False] | 47.2210μs | 24.7991μs | 40.3240 KOps/s | 40.7477 KOps/s | -1.04% |
| test_step_mdp_speed[True-True-True-False-True] | 53.9010μs | 24.4272μs | 40.9381 KOps/s | 40.6437 KOps/s | +0.72% |
| test_step_mdp_speed[True-True-True-False-False] | 43.1900μs | 13.3599μs | 74.8509 KOps/s | 73.2902 KOps/s | +2.13% |
| test_step_mdp_speed[True-True-False-True-True] | 70.9810μs | 45.9791μs | 21.7490 KOps/s | 21.5313 KOps/s | +1.01% |
| test_step_mdp_speed[True-True-False-True-False] | 54.4210μs | 26.9548μs | 37.0991 KOps/s | 36.4718 KOps/s | +1.72% |
| test_step_mdp_speed[True-True-False-False-True] | 61.0910μs | 27.0364μs | 36.9871 KOps/s | 36.7787 KOps/s | +0.57% |
| test_step_mdp_speed[True-True-False-False-False] | 44.9210μs | 16.0130μs | 62.4494 KOps/s | 61.9833 KOps/s | +0.75% |
| test_step_mdp_speed[True-False-True-True-True] | 84.2210μs | 49.1327μs | 20.3530 KOps/s | 20.0122 KOps/s | +1.70% |
| test_step_mdp_speed[True-False-True-True-False] | 54.3900μs | 29.5088μs | 33.8882 KOps/s | 33.2746 KOps/s | +1.84% |
| test_step_mdp_speed[True-False-True-False-True] | 60.7210μs | 26.7829μs | 37.3372 KOps/s | 37.1462 KOps/s | +0.51% |
| test_step_mdp_speed[True-False-True-False-False] | 39.2710μs | 16.1264μs | 62.0101 KOps/s | 61.6146 KOps/s | +0.64% |
| test_step_mdp_speed[True-False-False-True-True] | 88.7410μs | 51.5487μs | 19.3991 KOps/s | 19.3300 KOps/s | +0.36% |
| test_step_mdp_speed[True-False-False-True-False] | 68.2810μs | 32.3322μs | 30.9289 KOps/s | 31.0619 KOps/s | -0.43% |
| test_step_mdp_speed[True-False-False-False-True] | 65.5610μs | 29.2107μs | 34.2341 KOps/s | 33.7179 KOps/s | +1.53% |
| test_step_mdp_speed[True-False-False-False-False] | 49.3800μs | 18.6834μs | 53.5235 KOps/s | 53.6520 KOps/s | -0.24% |
| test_step_mdp_speed[False-True-True-True-True] | 82.0420μs | 48.8105μs | 20.4874 KOps/s | 20.5139 KOps/s | -0.13% |
| test_step_mdp_speed[False-True-True-True-False] | 57.1710μs | 30.0349μs | 33.2946 KOps/s | 33.2924 KOps/s | +0.01% |
| test_step_mdp_speed[False-True-True-False-True] | 59.0210μs | 30.6643μs | 32.6112 KOps/s | 32.2843 KOps/s | +1.01% |
| test_step_mdp_speed[False-True-True-False-False] | 46.3810μs | 18.1216μs | 55.1828 KOps/s | 55.4546 KOps/s | -0.49% |
| test_step_mdp_speed[False-True-False-True-True] | 2.8492ms | 51.7865μs | 19.3101 KOps/s | 19.4123 KOps/s | -0.53% |
| test_step_mdp_speed[False-True-False-True-False] | 60.5010μs | 32.8667μs | 30.4260 KOps/s | 30.5900 KOps/s | -0.54% |
| test_step_mdp_speed[False-True-False-False-True] | 61.6410μs | 32.8341μs | 30.4562 KOps/s | 29.7973 KOps/s | +2.21% |
| test_step_mdp_speed[False-True-False-False-False] | 53.2510μs | 20.5876μs | 48.5729 KOps/s | 48.7214 KOps/s | -0.30% |
| test_step_mdp_speed[False-False-True-True-True] | 82.7020μs | 54.3186μs | 18.4099 KOps/s | 18.5001 KOps/s | -0.49% |
| test_step_mdp_speed[False-False-True-True-False] | 59.1610μs | 34.7519μs | 28.7754 KOps/s | 28.1799 KOps/s | +2.11% |
| test_step_mdp_speed[False-False-True-False-True] | 64.7010μs | 33.2622μs | 30.0642 KOps/s | 29.4514 KOps/s | +2.08% |
| test_step_mdp_speed[False-False-True-False-False] | 44.5410μs | 20.5007μs | 48.7788 KOps/s | 48.3979 KOps/s | +0.79% |
| test_step_mdp_speed[False-False-False-True-True] | 83.6610μs | 56.0269μs | 17.8486 KOps/s | 17.6412 KOps/s | +1.18% |
| test_step_mdp_speed[False-False-False-True-False] | 67.9410μs | 37.2100μs | 26.8745 KOps/s | 26.3548 KOps/s | +1.97% |
| test_step_mdp_speed[False-False-False-False-True] | 65.7410μs | 35.3742μs | 28.2692 KOps/s | 28.0871 KOps/s | +0.65% |
| test_step_mdp_speed[False-False-False-False-False] | 50.8810μs | 22.8865μs | 43.6938 KOps/s | 43.0949 KOps/s | +1.39% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.8370s | 0.7478s | 1.3373 Ops/s | 1.3109 Ops/s | +2.02% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7068s | 0.6175s | 1.6193 Ops/s | 1.5916 Ops/s | +1.74% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7035s | 1.6319s | 0.6128 Ops/s | 0.6022 Ops/s | +1.75% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.4923s | 1.4153s | 0.7066 Ops/s | 0.7028 Ops/s | +0.54% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9455s | 1.8703s | 0.5347 Ops/s | 0.5304 Ops/s | +0.80% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7316s | 1.6538s | 0.6047 Ops/s | 0.5970 Ops/s | +1.29% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6215s | 4.5188s | 0.2213 Ops/s | 0.2171 Ops/s | +1.93% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.5503s | 4.3597s | 0.2294 Ops/s | 0.2292 Ops/s | +0.09% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 1.9836s | 1.9057s | 0.5247 Ops/s | 0.5191 Ops/s | +1.09% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.6933s | 1.6164s | 0.6187 Ops/s | 0.6089 Ops/s | +1.61% |
| test_values[generalized_advantage_estimate-True-True] | 9.8502ms | 9.5480ms | 104.7339 Ops/s | 104.0231 Ops/s | +0.68% |
| test_values[vec_generalized_advantage_estimate-True-True] | 21.4078ms | 17.5304ms | 57.0439 Ops/s | 91.3283 Ops/s | **-37.54%** |
| test_values[td0_return_estimate-False-False] | 0.2334ms | 0.1237ms | 8.0813 KOps/s | 7.9157 KOps/s | +2.09% |
| test_values[td1_return_estimate-False-False] | 25.8653ms | 25.4993ms | 39.2168 Ops/s | 39.1256 Ops/s | +0.23% |
| test_values[vec_td1_return_estimate-False-False] | 21.0469ms | 17.5566ms | 56.9588 Ops/s | 89.6804 Ops/s | **-36.49%** |
| test_values[td_lambda_return_estimate-True-False] | 41.6441ms | 37.6824ms | 26.5376 Ops/s | 26.5613 Ops/s | -0.09% |
| test_values[vec_td_lambda_return_estimate-True-False] | 18.4335ms | 17.5099ms | 57.1107 Ops/s | 90.8051 Ops/s | **-37.11%** |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.5783ms | 8.4045ms | 118.9836 Ops/s | 118.7691 Ops/s | +0.18% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 6.4098ms | 1.4442ms | 692.4358 Ops/s | 689.4044 Ops/s | +0.44% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4987ms | 0.3976ms | 2.5152 KOps/s | 2.5503 KOps/s | -1.38% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 34.4149ms | 33.7281ms | 29.6488 Ops/s | 34.3946 Ops/s | **-13.80%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 2.1372ms | 1.6881ms | 592.3718 Ops/s | 587.3560 Ops/s | +0.85% |
| test_dqn_speed[False-None] | 1.7749ms | 1.3545ms | 738.2915 Ops/s | 737.1060 Ops/s | +0.16% |
| test_dqn_speed[False-backward] | 1.9015ms | 1.8504ms | 540.4185 Ops/s | 544.3134 Ops/s | -0.72% |
| test_dqn_speed[True-None] | 0.6645ms | 0.5323ms | 1.8787 KOps/s | 1.9058 KOps/s | -1.42% |
| test_dqn_speed[True-backward] | 0.9931ms | 0.9535ms | 1.0488 KOps/s | 925.0190 Ops/s | **+13.38%** |
| test_dqn_speed[reduce-overhead-None] | 0.6321ms | 0.5101ms | 1.9604 KOps/s | 1.8884 KOps/s | +3.81% |
| test_ddpg_speed[False-None] | 3.0931ms | 2.7545ms | 363.0358 Ops/s | 360.9001 Ops/s | +0.59% |
| test_ddpg_speed[False-backward] | 4.0747ms | 3.9342ms | 254.1821 Ops/s | 254.0566 Ops/s | +0.05% |
| test_ddpg_speed[True-None] | 1.4081ms | 1.3420ms | 745.1509 Ops/s | 735.2747 Ops/s | +1.34% |
| test_ddpg_speed[True-backward] | 2.3851ms | 2.2854ms | 437.5546 Ops/s | 387.8506 Ops/s | **+12.82%** |
| test_ddpg_speed[reduce-overhead-None] | 1.5288ms | 1.3470ms | 742.4139 Ops/s | 730.1286 Ops/s | +1.68% |
| test_sac_speed[False-None] | 8.1077ms | 7.6428ms | 130.8420 Ops/s | 128.8739 Ops/s | +1.53% |
| test_sac_speed[False-backward] | 11.1555ms | 10.7267ms | 93.2253 Ops/s | 91.8994 Ops/s | +1.44% |
| test_sac_speed[True-None] | 2.4264ms | 2.0671ms | 483.7794 Ops/s | 480.0232 Ops/s | +0.78% |
| test_sac_speed[True-backward] | 4.0429ms | 3.8972ms | 256.5935 Ops/s | 229.0440 Ops/s | **+12.03%** |
| test_sac_speed[reduce-overhead-None] | 2.1917ms | 2.0604ms | 485.3498 Ops/s | 475.5584 Ops/s | +2.06% |
| test_redq_speed[False-None] | 14.6589ms | 10.2299ms | 97.7527 Ops/s | 99.5794 Ops/s | -1.83% |
| test_redq_speed[False-backward] | 17.9311ms | 17.1798ms | 58.2077 Ops/s | 58.4740 Ops/s | -0.46% |
| test_redq_speed[True-None] | 4.5471ms | 4.3131ms | 231.8514 Ops/s | 220.9552 Ops/s | +4.93% |
| test_redq_speed[True-backward] | 9.7482ms | 9.2992ms | 107.5362 Ops/s | 103.4088 Ops/s | +3.99% |
| test_redq_speed[reduce-overhead-None] | 4.6539ms | 4.3609ms | 229.3090 Ops/s | 232.9872 Ops/s | -1.58% |
| test_redq_deprec_speed[False-None] | 11.2520ms | 10.7666ms | 92.8797 Ops/s | 92.9325 Ops/s | -0.06% |
| test_redq_deprec_speed[False-backward] | 16.0982ms | 15.4942ms | 64.5403 Ops/s | 64.5423 Ops/s | -0.00% |
| test_redq_deprec_speed[True-None] | 4.9697ms | 3.5926ms | 278.3499 Ops/s | 286.5845 Ops/s | -2.87% |
| test_redq_deprec_speed[True-backward] | 7.5646ms | 7.1658ms | 139.5513 Ops/s | 130.6530 Ops/s | **+6.81%** |
| test_redq_deprec_speed[reduce-overhead-None] | 3.6587ms | 3.4959ms | 286.0468 Ops/s | 280.5406 Ops/s | +1.96% |
| test_td3_speed[False-None] | 8.0159ms | 7.7705ms | 128.6923 Ops/s | 127.1730 Ops/s | +1.19% |
| test_td3_speed[False-backward] | 11.3013ms | 10.6422ms | 93.9656 Ops/s | 94.2590 Ops/s | -0.31% |
| test_td3_speed[True-None] | 1.8956ms | 1.7932ms | 557.6628 Ops/s | 544.8542 Ops/s | +2.35% |
| test_td3_speed[True-backward] | 3.7485ms | 3.5146ms | 284.5302 Ops/s | 282.2548 Ops/s | +0.81% |
| test_td3_speed[reduce-overhead-None] | 1.7898ms | 1.7332ms | 576.9557 Ops/s | 572.7447 Ops/s | +0.74% |
| test_cql_speed[False-None] | 27.7961ms | 25.4136ms | 39.3491 Ops/s | 38.6423 Ops/s | +1.83% |
| test_cql_speed[False-backward] | 35.0699ms | 34.3366ms | 29.1234 Ops/s | 28.7742 Ops/s | +1.21% |
| test_cql_speed[True-None] | 12.3066ms | 12.0257ms | 83.1552 Ops/s | 81.8364 Ops/s | +1.61% |
| test_cql_speed[True-backward] | 18.3605ms | 17.7605ms | 56.3048 Ops/s | 55.0787 Ops/s | +2.23% |
| test_cql_speed[reduce-overhead-None] | 12.4298ms | 12.0738ms | 82.8241 Ops/s | 80.9881 Ops/s | +2.27% |
| test_a2c_speed[False-None] | 5.5110ms | 5.2505ms | 190.4574 Ops/s | 187.4382 Ops/s | +1.61% |
| test_a2c_speed[False-backward] | 11.8828ms | 11.5146ms | 86.8461 Ops/s | 86.0987 Ops/s | +0.87% |
| test_a2c_speed[True-None] | 3.7840ms | 3.6461ms | 274.2653 Ops/s | 265.6002 Ops/s | +3.26% |
| test_a2c_speed[True-backward] | 8.6610ms | 8.3994ms | 119.0560 Ops/s | 119.1768 Ops/s | -0.10% |
| test_a2c_speed[reduce-overhead-None] | 3.9213ms | 3.6250ms | 275.8629 Ops/s | 272.1102 Ops/s | +1.38% |
| test_ppo_speed[False-None] | 6.1742ms | 5.7432ms | 174.1188 Ops/s | 172.3916 Ops/s | +1.00% |
| test_ppo_speed[False-backward] | 12.5446ms | 12.1308ms | 82.4346 Ops/s | 81.6224 Ops/s | +1.00% |
| test_ppo_speed[True-None] | 3.7046ms | 3.5538ms | 281.3891 Ops/s | 270.6564 Ops/s | +3.97% |
| test_ppo_speed[True-backward] | 8.4809ms | 8.1974ms | 121.9896 Ops/s | 120.7615 Ops/s | +1.02% |
| test_ppo_speed[reduce-overhead-None] | 3.6685ms | 3.5080ms | 285.0648 Ops/s | 281.4205 Ops/s | +1.29% |
| test_reinforce_speed[False-None] | 4.7041ms | 4.3917ms | 227.7023 Ops/s | 221.8298 Ops/s | +2.65% |
| test_reinforce_speed[False-backward] | 7.3789ms | 7.1251ms | 140.3480 Ops/s | 138.6210 Ops/s | +1.25% |
| test_reinforce_speed[True-None] | 2.9248ms | 2.7538ms | 363.1353 Ops/s | 346.7965 Ops/s | +4.71% |
| test_reinforce_speed[True-backward] | 7.8938ms | 7.5562ms | 132.3416 Ops/s | 117.1584 Ops/s | **+12.96%** |
| test_reinforce_speed[reduce-overhead-None] | 3.0462ms | 2.7699ms | 361.0254 Ops/s | 358.1144 Ops/s | +0.81% |
| test_iql_speed[False-None] | 19.7178ms | 19.0446ms | 52.5083 Ops/s | 50.5324 Ops/s | +3.91% |
| test_iql_speed[False-backward] | 30.4984ms | 29.1289ms | 34.3302 Ops/s | 33.8530 Ops/s | +1.41% |
| test_iql_speed[True-None] | 8.6892ms | 8.2880ms | 120.6558 Ops/s | 119.5313 Ops/s | +0.94% |
| test_iql_speed[True-backward] | 16.5068ms | 16.0900ms | 62.1504 Ops/s | 59.4895 Ops/s | +4.47% |
| test_iql_speed[reduce-overhead-None] | 8.5973ms | 8.3582ms | 119.6436 Ops/s | 126.5484 Ops/s | **-5.46%** |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 6.0999ms | 5.9426ms | 168.2758 Ops/s | 167.7115 Ops/s | +0.34% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 3.3170ms | 0.3194ms | 3.1307 KOps/s | 2.8762 KOps/s | **+8.85%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5918ms | 0.3372ms | 2.9654 KOps/s | 2.7478 KOps/s | **+7.92%** |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 6.0114ms | 5.6973ms | 175.5218 Ops/s | 175.3247 Ops/s | +0.11% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.9802ms | 0.3369ms | 2.9680 KOps/s | 3.0051 KOps/s | -1.23% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6158ms | 0.3205ms | 3.1202 KOps/s | 3.1433 KOps/s | -0.73% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7581ms | 1.4243ms | 702.1182 Ops/s | 718.9385 Ops/s | -2.34% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6861ms | 1.3330ms | 750.2109 Ops/s | 768.5497 Ops/s | -2.39% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.9526ms | 5.7928ms | 172.6295 Ops/s | 170.0144 Ops/s | +1.54% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1708ms | 0.4173ms | 2.3962 KOps/s | 2.0570 KOps/s | **+16.49%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.6120ms | 0.3994ms | 2.5037 KOps/s | 2.2977 KOps/s | **+8.97%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.7767ms | 5.7005ms | 175.4218 Ops/s | 173.1068 Ops/s | +1.34% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.5706ms | 0.2774ms | 3.6047 KOps/s | 2.7405 KOps/s | **+31.53%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.4614ms | 0.2604ms | 3.8400 KOps/s | 2.8522 KOps/s | **+34.64%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.8363ms | 5.6328ms | 177.5324 Ops/s | 177.5115 Ops/s | +0.01% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.5213ms | 0.2711ms | 3.6884 KOps/s | 3.6372 KOps/s | +1.41% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.5707ms | 0.2550ms | 3.9210 KOps/s | 3.2737 KOps/s | **+19.77%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.9158ms | 5.8191ms | 171.8470 Ops/s | 171.9245 Ops/s | -0.05% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 2.4712ms | 0.4834ms | 2.0687 KOps/s | 2.0954 KOps/s | -1.28% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.5830ms | 0.3975ms | 2.5157 KOps/s | 2.0740 KOps/s | **+21.30%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.5723s | 16.3097ms | 61.3131 Ops/s | 55.2560 Ops/s | **+10.96%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 8.9956ms | 1.9075ms | 524.2526 Ops/s | 557.6035 Ops/s | **-5.98%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 10.4277ms | 1.2658ms | 790.0105 Ops/s | 898.5534 Ops/s | **-12.08%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 7.5058ms | 5.0415ms | 198.3545 Ops/s | 197.4158 Ops/s | +0.48% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 3.9008ms | 1.6776ms | 596.0773 Ops/s | 569.9939 Ops/s | +4.58% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.9895ms | 0.8643ms | 1.1570 KOps/s | 779.2133 Ops/s | **+48.49%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 8.2236ms | 5.1933ms | 192.5566 Ops/s | 60.4411 Ops/s | **+218.59%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 3.9358ms | 1.7782ms | 562.3735 Ops/s | 531.9410 Ops/s | **+5.72%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 3.8437ms | 1.0888ms | 918.4467 Ops/s | 969.9553 Ops/s | **-5.31%** |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] | 36.6465ms | 34.6241ms | 28.8816 Ops/s | 28.2871 Ops/s | +2.10% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] | 19.3685ms | 17.3547ms | 57.6213 Ops/s | 56.9639 Ops/s | +1.15% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] | 38.7418ms | 35.6586ms | 28.0438 Ops/s | 27.0746 Ops/s | +3.58% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] | 19.2608ms | 17.8055ms | 56.1624 Ops/s | 56.1268 Ops/s | +0.06% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] | 40.9265ms | 37.8613ms | 26.4122 Ops/s | 26.1740 Ops/s | +0.91% |
| test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] | 20.5641ms | 19.1334ms | 52.2646 Ops/s | 51.8698 Ops/s | +0.76% |


github-actions bot commented Feb 2, 2026

⚠️ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 148. Improved: 12. Worsened: 13.

Expand to view detailed results
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_tensor_to_bytestream_speed[pickle] | 81.1952μs | 80.2005μs | 12.4687 KOps/s | 12.6401 KOps/s | -1.36% |
| test_tensor_to_bytestream_speed[torch.save] | 0.1358ms | 0.1355ms | 7.3801 KOps/s | 7.3580 KOps/s | +0.30% |
| test_tensor_to_bytestream_speed[untyped_storage] | 0.1032s | 0.1028s | 9.7312 Ops/s | 9.6883 Ops/s | +0.44% |
| test_tensor_to_bytestream_speed[numpy] | 2.4845μs | 2.4777μs | 403.6064 KOps/s | 400.0835 KOps/s | +0.88% |
| test_tensor_to_bytestream_speed[safetensors] | 36.2977μs | 36.1419μs | 27.6687 KOps/s | 27.3967 KOps/s | +0.99% |
| test_simple | 0.8919s | 0.8023s | 1.2464 Ops/s | 1.2432 Ops/s | +0.26% |
| test_transformed | 1.5131s | 1.4204s | 0.7040 Ops/s | 0.6995 Ops/s | +0.65% |
| test_serial | 2.2512s | 2.2502s | 0.4444 Ops/s | 0.4359 Ops/s | +1.95% |
| test_parallel | 2.0882s | 1.9533s | 0.5120 Ops/s | 0.5238 Ops/s | -2.26% |
| test_step_mdp_speed[True-True-True-True-True] | 0.2642ms | 45.0704μs | 22.1875 KOps/s | 22.6350 KOps/s | -1.98% |
| test_step_mdp_speed[True-True-True-True-False] | 70.4710μs | 24.9672μs | 40.0526 KOps/s | 40.4373 KOps/s | -0.95% |
| test_step_mdp_speed[True-True-True-False-True] | 76.0910μs | 24.5331μs | 40.7612 KOps/s | 40.0323 KOps/s | +1.82% |
| test_step_mdp_speed[True-True-True-False-False] | 45.6510μs | 13.6825μs | 73.0859 KOps/s | 73.0016 KOps/s | +0.12% |
| test_step_mdp_speed[True-True-False-True-True] | 82.8120μs | 47.5163μs | 21.0454 KOps/s | 21.2754 KOps/s | -1.08% |
| test_step_mdp_speed[True-True-False-True-False] | 71.0710μs | 27.5182μs | 36.3396 KOps/s | 36.1089 KOps/s | +0.64% |
| test_step_mdp_speed[True-True-False-False-True] | 56.2610μs | 27.4866μs | 36.3814 KOps/s | 35.9515 KOps/s | +1.20% |
| test_step_mdp_speed[True-True-False-False-False] | 48.8800μs | 16.5430μs | 60.4486 KOps/s | 60.3102 KOps/s | +0.23% |
| test_step_mdp_speed[True-False-True-True-True] | 96.1320μs | 50.7562μs | 19.7020 KOps/s | 20.0251 KOps/s | -1.61% |
| test_step_mdp_speed[True-False-True-True-False] | 61.3310μs | 30.8799μs | 32.3836 KOps/s | 33.0312 KOps/s | -1.96% |
| test_step_mdp_speed[True-False-True-False-True] | 82.2810μs | 27.5170μs | 36.3411 KOps/s | 35.9762 KOps/s | +1.01% |
| test_step_mdp_speed[True-False-True-False-False] | 68.7810μs | 16.4358μs | 60.8429 KOps/s | 60.7640 KOps/s | +0.13% |
| test_step_mdp_speed[True-False-False-True-True] | 97.6710μs | 53.2562μs | 18.7772 KOps/s | 18.9587 KOps/s | -0.96% |
| test_step_mdp_speed[True-False-False-True-False] | 76.0210μs | 32.8031μs | 30.4849 KOps/s | 30.2361 KOps/s | +0.82% |
| test_step_mdp_speed[True-False-False-False-True] | 62.9010μs | 30.2380μs | 33.0709 KOps/s | 33.3887 KOps/s | -0.95% |
| test_step_mdp_speed[True-False-False-False-False] | 58.3300μs | 18.9386μs | 52.8023 KOps/s | 52.0126 KOps/s | +1.52% |
| test_step_mdp_speed[False-True-True-True-True] | 93.3710μs | 51.2815μs | 19.5002 KOps/s | 20.0639 KOps/s | -2.81% |
| test_step_mdp_speed[False-True-True-True-False] | 64.6410μs | 30.4457μs | 32.8454 KOps/s | 32.6002 KOps/s | +0.75% |
| test_step_mdp_speed[False-True-True-False-True] | 84.5620μs | 31.5627μs | 31.6829 KOps/s | 31.5638 KOps/s | +0.38% |
| test_step_mdp_speed[False-True-True-False-False] | 51.1800μs | 18.0827μs | 55.3013 KOps/s | 54.2888 KOps/s | +1.87% |
| test_step_mdp_speed[False-True-False-True-True] | 2.7268ms | 52.9011μs | 18.9032 KOps/s | 18.9145 KOps/s | -0.06% |
| test_step_mdp_speed[False-True-False-True-False] | 63.6810μs | 32.9991μs | 30.3039 KOps/s | 29.8325 KOps/s | +1.58% |
| test_step_mdp_speed[False-True-False-False-True] | 67.1110μs | 34.4598μs | 29.0193 KOps/s | 29.7402 KOps/s | -2.42% |
| test_step_mdp_speed[False-True-False-False-False] | 67.0910μs | 20.8060μs | 48.0631 KOps/s | 47.8014 KOps/s | +0.55% |
| test_step_mdp_speed[False-False-True-True-True] | 0.1124ms | 56.1447μs | 17.8111 KOps/s | 17.8472 KOps/s | -0.20% |
| test_step_mdp_speed[False-False-True-True-False] | 92.3610μs | 35.8843μs | 27.8673 KOps/s | 27.6309 KOps/s | +0.86% |
| test_step_mdp_speed[False-False-True-False-True] | 67.2910μs | 33.9388μs | 29.4648 KOps/s | 29.5280 KOps/s | -0.21% |
| test_step_mdp_speed[False-False-True-False-False] | 52.0910μs | 21.0568μs | 47.4905 KOps/s | 47.7812 KOps/s | -0.61% |
| test_step_mdp_speed[False-False-False-True-True] | 0.1114ms | 57.4667μs | 17.4014 KOps/s | 17.2438 KOps/s | +0.91% |
| test_step_mdp_speed[False-False-False-True-False] | 71.4410μs | 38.6155μs | 25.8963 KOps/s | 25.9883 KOps/s | -0.35% |
| test_step_mdp_speed[False-False-False-False-True] | 78.5010μs | 36.2437μs | 27.5910 KOps/s | 27.7472 KOps/s | -0.56% |
| test_step_mdp_speed[False-False-False-False-False] | 54.5410μs | 23.2937μs | 42.9300 KOps/s | 42.8839 KOps/s | +0.11% |
| test_non_tensor_env_rollout_speed[1000-single-True] | 0.7267s | 0.7252s | 1.3789 Ops/s | 1.3279 Ops/s | +3.84% |
| test_non_tensor_env_rollout_speed[1000-single-False] | 0.7096s | 0.6176s | 1.6191 Ops/s | 1.6168 Ops/s | +0.14% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] | 1.7121s | 1.6385s | 0.6103 Ops/s | 0.6092 Ops/s | +0.17% |
| test_non_tensor_env_rollout_speed[1000-serial-no-buffers-False] | 1.5182s | 1.4474s | 0.6909 Ops/s | 0.7025 Ops/s | -1.65% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-True] | 1.9528s | 1.8777s | 0.5326 Ops/s | 0.5292 Ops/s | +0.63% |
| test_non_tensor_env_rollout_speed[1000-serial-buffers-False] | 1.7315s | 1.6567s | 0.6036 Ops/s | 0.5951 Ops/s | +1.43% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-True] | 4.6440s | 4.5523s | 0.2197 Ops/s | 0.2207 Ops/s | -0.45% |
| test_non_tensor_env_rollout_speed[1000-parallel-no-buffers-False] | 4.4729s | 4.4100s | 0.2268 Ops/s | 0.2228 Ops/s | +1.78% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] | 2.0023s | 1.9392s | 0.5157 Ops/s | 0.5137 Ops/s | +0.39% |
| test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] | 1.7167s | 1.6398s | 0.6098 Ops/s | 0.6023 Ops/s | +1.24% |
| test_values[generalized_advantage_estimate-True-True] | 20.1281ms | 19.7361ms | 50.6685 Ops/s | 49.0989 Ops/s | +3.20% |
| test_values[vec_generalized_advantage_estimate-True-True] | 0.1346s | 3.6051ms | 277.3852 Ops/s | 262.8840 Ops/s | **+5.52%** |
| test_values[td0_return_estimate-False-False] | 0.1049ms | 82.0046μs | 12.1944 KOps/s | 12.1507 KOps/s | +0.36% |
| test_values[td1_return_estimate-False-False] | 47.1625ms | 46.7645ms | 21.3837 Ops/s | 19.8893 Ops/s | **+7.51%** |
| test_values[vec_td1_return_estimate-False-False] | 1.2882ms | 1.0792ms | 926.6324 Ops/s | 899.9695 Ops/s | +2.96% |
| test_values[td_lambda_return_estimate-True-False] | 76.9499ms | 76.6949ms | 13.0387 Ops/s | 12.0831 Ops/s | **+7.91%** |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.3251ms | 1.0762ms | 929.2141 Ops/s | 923.9489 Ops/s | +0.57% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 20.4464ms | 20.1688ms | 49.5814 Ops/s | 45.8490 Ops/s | **+8.14%** |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 1.0493ms | 0.7470ms | 1.3387 KOps/s | 1.3478 KOps/s | -0.67% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7164ms | 0.6701ms | 1.4922 KOps/s | 1.4398 KOps/s | +3.64% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5423ms | 1.4805ms | 675.4702 Ops/s | 665.9507 Ops/s | +1.43% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7292ms | 0.6864ms | 1.4570 KOps/s | 1.3844 KOps/s | **+5.24%** |
| test_dqn_speed[False-None] | 1.6091ms | 1.5180ms | 658.7645 Ops/s | 651.1626 Ops/s | +1.17% |
| test_dqn_speed[False-backward] | 2.3732ms | 2.1617ms | 462.5916 Ops/s | 464.6926 Ops/s | -0.45% |
| test_dqn_speed[True-None] | 1.3063ms | 0.5433ms | 1.8406 KOps/s | 1.8591 KOps/s | -1.00% |
| test_dqn_speed[True-backward] | 1.1366ms | 1.0665ms | 937.6164 Ops/s | 946.7913 Ops/s | -0.97% |
| test_dqn_speed[reduce-overhead-None] | 0.6366ms | 0.5616ms | 1.7807 KOps/s | 1.7491 KOps/s | +1.81% |
| test_ddpg_speed[False-None] | 3.2515ms | 2.8880ms | 346.2575 Ops/s | 350.8086 Ops/s | -1.30% |
| test_ddpg_speed[False-backward] | 4.5867ms | 4.1769ms | 239.4138 Ops/s | 240.9976 Ops/s | -0.66% |
| test_ddpg_speed[True-None] | 1.4013ms | 1.2935ms | 773.0963 Ops/s | 785.2755 Ops/s | -1.55% |
| test_ddpg_speed[True-backward] | 2.4336ms | 2.3201ms | 431.0144 Ops/s | 433.7582 Ops/s | -0.63% |
| test_ddpg_speed[reduce-overhead-None] | 1.3779ms | 1.2989ms | 769.8961 Ops/s | 762.7022 Ops/s | +0.94% |
| test_sac_speed[False-None] | 8.8643ms | 8.2986ms | 120.5021 Ops/s | 120.0974 Ops/s | +0.34% |
| test_sac_speed[False-backward] | 11.6604ms | 11.2444ms | 88.9328 Ops/s | 88.8489 Ops/s | +0.09% |
| test_sac_speed[True-None] | 2.4305ms | 1.7515ms | 570.9456 Ops/s | 566.9629 Ops/s | +0.70% |
| test_sac_speed[True-backward] | 3.7985ms | 3.4100ms | 293.2528 Ops/s | 297.3175 Ops/s | -1.37% |
| test_sac_speed[reduce-overhead-None] | 18.5971ms | 10.5041ms | 95.2012 Ops/s | 96.0955 Ops/s | -0.93% |
| test_redq_deprec_speed[False-None] | 9.9626ms | 9.2719ms | 107.8533 Ops/s | 108.8695 Ops/s | -0.93% |
| test_redq_deprec_speed[False-backward] | 12.9939ms | 12.4199ms | 80.5158 Ops/s | 81.4374 Ops/s | -1.13% |
| test_redq_deprec_speed[True-None] | 2.6583ms | 2.4677ms | 405.2395 Ops/s | 408.8220 Ops/s | -0.88% |
| test_redq_deprec_speed[True-backward] | 4.3609ms | 4.0314ms | 248.0512 Ops/s | 247.9315 Ops/s | +0.05% |
| test_redq_deprec_speed[reduce-overhead-None] | 15.2734ms | 9.4127ms | 106.2397 Ops/s | 91.0037 Ops/s | **+16.74%** |
| test_td3_speed[False-None] | 8.4144ms | 8.1155ms | 123.2216 Ops/s | 122.4892 Ops/s | +0.60% |
| test_td3_speed[False-backward] | 10.9761ms | 10.5506ms | 94.7814 Ops/s | 93.7792 Ops/s | +1.07% |
| test_td3_speed[True-None] | 1.6893ms | 1.5880ms | 629.7283 Ops/s | 637.0996 Ops/s | -1.16% |
| test_td3_speed[True-backward] | 3.1127ms | 3.0009ms | 333.2356 Ops/s | 312.6792 Ops/s | **+6.57%** |
| test_td3_speed[reduce-overhead-None] | 64.4766ms | 22.8810ms | 43.7045 Ops/s | 43.9116 Ops/s | -0.47% |
| test_cql_speed[False-None] | 17.3865ms | 17.0298ms | 58.7207 Ops/s | 58.3399 Ops/s | +0.65% |
| test_cql_speed[False-backward] | 23.0446ms | 22.3387ms | 44.7654 Ops/s | 43.8571 Ops/s | +2.07% |
| test_cql_speed[True-None] | 3.6543ms | 3.1432ms | 318.1469 Ops/s | 310.3157 Ops/s | +2.52% |
| test_cql_speed[True-backward] | 5.6423ms | 5.1837ms | 192.9139 Ops/s | 185.5129 Ops/s | +3.99% |
| test_cql_speed[reduce-overhead-None] | 18.6264ms | 11.6129ms | 86.1109 Ops/s | 87.9384 Ops/s | -2.08% |
| test_a2c_speed[False-None] | 4.3401ms | 3.2011ms | 312.3935 Ops/s | 311.8634 Ops/s | +0.17% |
test_a2c_speed[False-backward] 6.5585ms 6.1047ms 163.8095 Ops/s 157.4093 Ops/s $\color{#35bf28}+4.07\%$
test_a2c_speed[True-None] 1.4777ms 1.3077ms 764.7167 Ops/s 760.6908 Ops/s $\color{#35bf28}+0.53\%$
test_a2c_speed[True-backward] 3.0030ms 2.8988ms 344.9710 Ops/s 329.2582 Ops/s $\color{#35bf28}+4.77\%$
test_a2c_speed[reduce-overhead-None] 1.0104ms 0.9455ms 1.0576 KOps/s 1.0425 KOps/s $\color{#35bf28}+1.45\%$
test_ppo_speed[False-None] 3.8611ms 3.7703ms 265.2298 Ops/s 253.3178 Ops/s $\color{#35bf28}+4.70\%$
test_ppo_speed[False-backward] 8.9965ms 7.0583ms 141.6767 Ops/s 139.7886 Ops/s $\color{#35bf28}+1.35\%$
test_ppo_speed[True-None] 1.5346ms 1.3824ms 723.3778 Ops/s 729.8054 Ops/s $\color{#d91a1a}-0.88\%$
test_ppo_speed[True-backward] 3.2932ms 3.1861ms 313.8644 Ops/s 312.1835 Ops/s $\color{#35bf28}+0.54\%$
test_ppo_speed[reduce-overhead-None] 1.0949ms 1.0133ms 986.8782 Ops/s 973.5070 Ops/s $\color{#35bf28}+1.37\%$
test_reinforce_speed[False-None] 2.3406ms 2.2640ms 441.6965 Ops/s 440.3910 Ops/s $\color{#35bf28}+0.30\%$
test_reinforce_speed[False-backward] 3.4535ms 3.3750ms 296.2925 Ops/s 290.4463 Ops/s $\color{#35bf28}+2.01\%$
test_reinforce_speed[True-None] 1.3003ms 1.2140ms 823.6981 Ops/s 795.3064 Ops/s $\color{#35bf28}+3.57\%$
test_reinforce_speed[True-backward] 3.0848ms 2.9796ms 335.6124 Ops/s 335.6344 Ops/s $-0.01\%$
test_reinforce_speed[reduce-overhead-None] 16.3051ms 9.0220ms 110.8407 Ops/s 111.8410 Ops/s $\color{#d91a1a}-0.89\%$
test_iql_speed[False-None] 10.2584ms 9.3554ms 106.8903 Ops/s 106.5948 Ops/s $\color{#35bf28}+0.28\%$
test_iql_speed[False-backward] 13.7967ms 13.3564ms 74.8703 Ops/s 74.5519 Ops/s $\color{#35bf28}+0.43\%$
test_iql_speed[True-None] 2.2170ms 2.1066ms 474.7057 Ops/s 470.8957 Ops/s $\color{#35bf28}+0.81\%$
test_iql_speed[True-backward] 5.0140ms 4.6085ms 216.9885 Ops/s 213.4915 Ops/s $\color{#35bf28}+1.64\%$
test_iql_speed[reduce-overhead-None] 17.4750ms 10.0351ms 99.6500 Ops/s 79.3648 Ops/s $\textbf{\color{#35bf28}+25.56\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9514ms 5.7292ms 174.5446 Ops/s 171.6759 Ops/s $\color{#35bf28}+1.67\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.0086ms 0.3505ms 2.8527 KOps/s 3.0677 KOps/s $\textbf{\color{#d91a1a}-7.01\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5702ms 0.3301ms 3.0292 KOps/s 3.3749 KOps/s $\textbf{\color{#d91a1a}-10.24\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.7648ms 5.5279ms 180.9000 Ops/s 177.7024 Ops/s $\color{#35bf28}+1.80\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.9880ms 0.3534ms 2.8299 KOps/s 3.3163 KOps/s $\textbf{\color{#d91a1a}-14.67\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6015ms 0.3390ms 2.9499 KOps/s 3.2713 KOps/s $\textbf{\color{#d91a1a}-9.83\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6827ms 1.3675ms 731.2721 Ops/s 726.0248 Ops/s $\color{#35bf28}+0.72\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6121ms 1.2721ms 786.0770 Ops/s 775.0676 Ops/s $\color{#35bf28}+1.42\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8221ms 5.7355ms 174.3523 Ops/s 170.5487 Ops/s $\color{#35bf28}+2.23\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.5198ms 0.5038ms 1.9850 KOps/s 2.2149 KOps/s $\textbf{\color{#d91a1a}-10.38\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6808ms 0.5051ms 1.9797 KOps/s 2.2778 KOps/s $\textbf{\color{#d91a1a}-13.09\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.9480ms 5.5731ms 179.4322 Ops/s 174.3179 Ops/s $\color{#35bf28}+2.93\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.3996ms 0.3467ms 2.8841 KOps/s 3.2201 KOps/s $\textbf{\color{#d91a1a}-10.44\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6540ms 0.3294ms 3.0361 KOps/s 3.0590 KOps/s $\color{#d91a1a}-0.75\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.7347ms 5.5159ms 181.2934 Ops/s 176.3974 Ops/s $\color{#35bf28}+2.78\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1704ms 0.3312ms 3.0197 KOps/s 3.0124 KOps/s $\color{#35bf28}+0.24\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5199ms 0.3188ms 3.1366 KOps/s 3.8084 KOps/s $\textbf{\color{#d91a1a}-17.64\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8299ms 5.7477ms 173.9823 Ops/s 172.1649 Ops/s $\color{#35bf28}+1.06\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3644ms 0.4293ms 2.3294 KOps/s 2.0257 KOps/s $\textbf{\color{#35bf28}+15.00\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6283ms 0.4049ms 2.4695 KOps/s 2.1090 KOps/s $\textbf{\color{#35bf28}+17.10\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.4882ms 4.8894ms 204.5229 Ops/s 197.2680 Ops/s $\color{#35bf28}+3.68\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.5374ms 2.1431ms 466.6097 Ops/s 514.8798 Ops/s $\textbf{\color{#d91a1a}-9.38\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 10.4682ms 1.2998ms 769.3662 Ops/s 1.1049 KOps/s $\textbf{\color{#d91a1a}-30.37\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.5898s 16.7185ms 59.8139 Ops/s 196.2574 Ops/s $\textbf{\color{#d91a1a}-69.52\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 4.1113ms 1.7572ms 569.0771 Ops/s 513.6457 Ops/s $\textbf{\color{#35bf28}+10.79\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 10.0164ms 1.3204ms 757.3702 Ops/s 799.5331 Ops/s $\textbf{\color{#d91a1a}-5.27\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 7.9302ms 5.1509ms 194.1392 Ops/s 50.2263 Ops/s $\textbf{\color{#35bf28}+286.53\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 4.1215ms 1.9467ms 513.6868 Ops/s 508.8817 Ops/s $\color{#35bf28}+0.94\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 4.1498ms 1.0799ms 926.0086 Ops/s 951.4027 Ops/s $\color{#d91a1a}-2.67\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 36.7523ms 34.8278ms 28.7127 Ops/s 28.2432 Ops/s $\color{#35bf28}+1.66\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.3390ms 17.8592ms 55.9936 Ops/s 55.5439 Ops/s $\color{#35bf28}+0.81\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 39.8316ms 36.6825ms 27.2610 Ops/s 27.0360 Ops/s $\color{#35bf28}+0.83\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.8918ms 18.0604ms 55.3697 Ops/s 54.0312 Ops/s $\color{#35bf28}+2.48\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 40.7305ms 38.4017ms 26.0405 Ops/s 26.1956 Ops/s $\color{#d91a1a}-0.59\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.7594ms 20.2252ms 49.4433 Ops/s 52.1248 Ops/s $\textbf{\color{#d91a1a}-5.14\%}$
vmoens added a commit that referenced this pull request Feb 2, 2026
- Add inference_model.backend config option ("vllm" or "sglang")
- Refactor get_inference_model() to support both backends
- Refactor make_weight_sync_scheme() to support both backends
- Add _get_sglang_inference_model() for SGLang backend
- Add _make_sglang_weight_sync_scheme() for SGLang weight sync

Users can now run GRPO with either vLLM or SGLang:
  inference_model:
    backend: "sglang"  # or "vllm" (default)

Co-authored-by: Cursor <cursoragent@cursor.com>
ghstack-source-id: ce5c928
Pull-Request: #3437
[ghstack-poisoned]
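The commit message above describes a dispatch on `inference_model.backend`. A minimal sketch of what such a dispatch might look like is below; this is illustrative only, not the PR's actual code — the real helpers construct vLLM/SGLang engines, and the dict return values here are placeholder assumptions.

```python
# Hypothetical sketch of backend dispatch for get_inference_model().
# Helper names follow the PR description; their bodies are stand-ins.

def _get_vllm_inference_model(cfg: dict) -> dict:
    # Placeholder: the real helper would build and return a vLLM engine wrapper.
    return {"backend": "vllm", "model": cfg.get("model_name")}

def _get_sglang_inference_model(cfg: dict) -> dict:
    # Placeholder: the real helper would build and return an SGLang engine wrapper.
    return {"backend": "sglang", "model": cfg.get("model_name")}

def get_inference_model(cfg: dict) -> dict:
    """Dispatch on cfg['backend']: 'vllm' (default) or 'sglang'."""
    backend = cfg.get("backend", "vllm")
    if backend == "vllm":
        return _get_vllm_inference_model(cfg)
    if backend == "sglang":
        return _get_sglang_inference_model(cfg)
    raise ValueError(f"Unknown inference backend: {backend!r}")
```

Usage mirrors the YAML shown in the commit message: `get_inference_model({"backend": "sglang", "model_name": "..."})` routes to the SGLang path, an empty config falls back to vLLM, and an unrecognized backend fails fast with a `ValueError`. `make_weight_sync_scheme()` would presumably follow the same pattern with `_make_sglang_weight_sync_scheme()`.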
vmoens added further commits referencing this pull request on Feb 2–3, 2026, each with the same commit message as above (differing only in `ghstack-source-id`).
Labels: CLA Signed (managed by the Facebook bot; authors must sign the CLA before a PR can be reviewed), Feature (new feature), llm/ (LLM-related PR, triggers LLM CI tests), sota-implementations/