Skip to content

Add Qwen3.5 ROCK agentic SWE example#396

Merged
PanAndy merged 3 commits intoalibaba:mainfrom
shamanez:feat/qwen35-agentic-example
Mar 24, 2026
Merged

Add Qwen3.5 ROCK agentic SWE example#396
PanAndy merged 3 commits intoalibaba:mainfrom
shamanez:feat/qwen35-agentic-example

Conversation

@shamanez
Copy link
Contributor

Summary

Add a Qwen3.5 native ROCK agentic example and fix the text agentic chat-template tokenization path for newer Transformers.

Closes #395.

What Changed

  • added examples/agentic_demo/agent_val_rock_swe_qwen35_2b.yaml
  • added examples/agentic_demo/run_agentic_pipeline_rock_swe_qwen35_2b.sh
  • set return_dict=False in AgentNativeStepEnvManager and proxy_utils.generate_by_proxy so apply_chat_template(..., tokenize=True) keeps returning token ids for Qwen3.5

Validation

  • python -m py_compile roll/pipeline/agentic/env_manager/agent_native_env_manager.py roll/pipeline/agentic/llm_proxy/proxy_utils.py
  • parsed examples/agentic_demo/agent_val_rock_swe_qwen35_2b.yaml with yaml.safe_load

@CLAassistant
Copy link

CLAassistant commented Mar 23, 2026

CLA assistant check
All committers have signed the CLA.

Commented out the 'fa2' line in model_args section.
@shamanez shamanez force-pushed the feat/qwen35-agentic-example branch from 181ce03 to 87740e9 Compare March 24, 2026 03:23
@PanAndy
Copy link
Collaborator

PanAndy commented Mar 24, 2026

Could you fix this bug as well? It should only require a config change.
#397

With group_size=1, mean normalization over traj_group_id produces
all-zero advantages (R - R = 0). Use method: identity to skip
normalization and pass raw rewards through directly.

Closes alibaba#397
@shamanez shamanez force-pushed the feat/qwen35-agentic-example branch from 32c57b7 to e39fe4b Compare March 24, 2026 08:35
@shamanez
Copy link
Contributor Author

@PanAndy done.

@PanAndy PanAndy merged commit 52e0978 into alibaba:main Mar 24, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide Qwen3.5 agentic example with DeepSpeed support

3 participants