
ValueError: operands could not be broadcast together with shapes (243,) (158,) #63

@zwxandy

Description


When training reaches step 78, the error below is raised. It is most likely caused by a mismatch in the action masks. I only adapted this framework to my own task, so I am wondering whether this is a known bug in the framework or a problem on my side.

Traceback (most recent call last):
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 67, in main
    run_agent(config)
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 79, in run_agent
    ray.get(runner.run.remote(config))
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/worker.py", line 2858, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/home/dki/.conda/envs/agent_rl/lib/python3.10/site-packages/ray/_private/worker.py", line 958, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=434664, ip=10.0.0.6, actor_id=73a93c038ea79f93bdc1c7a801000000, repr=<main_agent.TaskRunner object at 0x792c96999960>)
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/main_agent.py", line 202, in run
    trainer.fit()
  File "/storage/v-wenzeng/Agent-R1/agent_r1/src/agent_ray_trainer.py", line 1033, in fit
    gen_batch_output = generation_manager.run_llm_loop(
  File "/storage/v-wenzeng/Agent-R1/agent_r1/llm_agent/generation.py", line 367, in run_llm_loop
    rollings = self._update_rolling_state(
  File "/storage/v-wenzeng/Agent-R1/agent_r1/llm_agent/generation.py", line 197, in _update_rolling_state
    new_action_masks.append(action_mask + action_masks[i])
ValueError: operands could not be broadcast together with shapes (243,) (158,)

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
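
For context, the failing line adds two 1-D NumPy masks elementwise, which requires them to have identical lengths. Below is a minimal sketch reproducing the failure, plus one hypothetical workaround: right-padding the shorter mask with zeros so the shapes match. Whether zero-padding is semantically correct depends on how generation.py aligns the masks with the rolled token ids, so this is an assumption, not the project's fix.

import numpy as np

# Minimal reproduction of the error from generation.py line 197:
action_mask = np.ones(243, dtype=np.int64)    # mask for the current turn
previous_mask = np.ones(158, dtype=np.int64)  # action_masks[i] accumulated from earlier turns

try:
    combined = action_mask + previous_mask    # ValueError: shapes (243,) (158,)
except ValueError as e:
    print(e)

# Hypothetical workaround (NOT the project's official fix): right-pad the
# shorter mask with zeros so both arrays cover the same number of tokens.
def pad_to_match(a: np.ndarray, b: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    n = max(len(a), len(b))
    return np.pad(a, (0, n - len(a))), np.pad(b, (0, n - len(b)))

a, b = pad_to_match(action_mask, previous_mask)
combined = a + b  # shapes now agree: (243,) + (243,)

If the two masks are supposed to always have equal length at this point, the real fix is probably upstream of this line, in how the rolling state is truncated or extended between turns.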

Here is the training script used:

export BASE_MODEL='Qwen/Qwen2.5-3B-Instruct'
export PROJECT_NAME='multiturn'
export EXPERIMENT_NAME=ppo-qwen2.5-3b-instruct

python3 -m agent_r1.src.main_agent \
    data.train_files=['data/python_multiturn_new/qwen/finqa/train.parquet'] \
    data.val_files=['data/python_multiturn_new/qwen/finqa/test.parquet'] \
    data.train_batch_size=2 \
    data.max_prompt_length=8192 \
    data.max_response_length=8192 \
    data.max_response_length_single_turn=1024 \
    actor_rollout_ref.model.path=$BASE_MODEL \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=2 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.stop_token_ids=[151658] \
    actor_rollout_ref.rollout.stop=[] \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    critic.optim.lr=1e-5 \
    critic.model.use_remove_padding=True \
    critic.model.path=$BASE_MODEL \
    critic.model.enable_gradient_checkpointing=True \
    critic.ppo_micro_batch_size_per_gpu=1 \
    critic.model.fsdp_config.param_offload=True \
    critic.model.fsdp_config.optimizer_offload=True \
    algorithm.adv_estimator=gae \
    algorithm.kl_ctrl.kl_coef=0.001 \
    algorithm.use_kl_in_reward=True \
    trainer.critic_warmup=3 \
    trainer.logger=['console','wandb'] \
    trainer.project_name=$PROJECT_NAME \
    trainer.experiment_name=$EXPERIMENT_NAME \
    trainer.n_gpus_per_node=1 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=10 \
    trainer.val_before_train=True \
    trainer.log_val_generations=0 \
    tool.max_turns=3 \
    tool.tools=['table_keyword_search'] \
    tool.use_batch_tool_calls=False \
    tool.val_kwargs.use_batch_tool_calls=False \
    tool.max_tool_response_length=512 $@
