
Abnormal output when adapting to a custom task #61

@xjtupy

Description


I want to use the author's code to train on my own task. The task is roughly as follows:

### Task objective
You are an intelligent decision-making model. Based on the user's list of search queries and the list of clicked POIs, find the set of queries that are missing from the database.

### Task execution
1. Find the set of irrelevant queries: analyze the relevance of each search query against every POI in the clicked-POI list. If a search query is irrelevant to every POI in that list, it is an irrelevant query. Wrap your analysis in <no_relate_think></no_relate_think>. Save the final set of irrelevant queries as JSON, wrapped in <no_relate_query></no_relate_query>, e.g. <no_relate_query>[{{"query":"xxx", "lng":"xxx", "lat":"xxx", "country_code":"xxx"}}, ...]</no_relate_query>
2. Generate a retrieval API call: if the set of irrelevant queries is non-empty, you may call the retrieval API via <search>[{{"query":"xxx", "lng":"xxx", "lat":"xxx", "country_code":"xxx"}}, ...]</search>. It returns the list of POIs retrieved for each irrelevant query, wrapped in <poi_recall_informations></poi_recall_informations>.
3. Find the set of queries missing from the database: analyze the relevance of each irrelevant query against every POI in its retrieved POI list. If an irrelevant query is irrelevant to every retrieved POI, that query is missing from the database. Wrap your analysis in <miss_query_think></miss_query_think>. Save the final set of missing queries as JSON, wrapped in <answer></answer>, e.g. <answer>[{{"query":"xxx", "lng":"xxx", "lat":"xxx", "country_code":"xxx"}}, ...]</answer>

### Output format
Your output must follow one of the two formats below.
1. When the set of irrelevant queries is non-empty:
<no_relate_think>your reasoning about irrelevant queries</no_relate_think>
<no_relate_query>the set of irrelevant queries</no_relate_query>
<search>the set of queries passed to the search tool</search>
<poi_recall_informations>the recalled POI list</poi_recall_informations>
<miss_query_think>your reasoning about missing queries</miss_query_think>
<answer>final answer</answer>
2. When the set of irrelevant queries is empty:
<no_relate_think>your reasoning about irrelevant queries</no_relate_think>
<no_relate_query>[]</no_relate_query>
<miss_query_think>your reasoning about missing queries</miss_query_think>
<answer>final answer</answer>

User search query list: {search_querys}
User clicked POI list: {click_pois}\n
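Since the expected completion is strictly tag-structured, one way to debug format problems is to validate each completion against the two allowed formats. Below is a minimal sketch of such a validator; the regexes and the `validate_output` helper are my own illustration, not part of the repo:

```python
import json
import re

# Hypothetical validator for the two allowed output formats described above.
# Format 1: irrelevant queries exist, so <search>/<poi_recall_informations> appear.
FORMAT_WITH_SEARCH = re.compile(
    r"^<no_relate_think>.*?</no_relate_think>\s*"
    r"<no_relate_query>.*?</no_relate_query>\s*"
    r"<search>.*?</search>\s*"
    r"<poi_recall_informations>.*?</poi_recall_informations>\s*"
    r"<miss_query_think>.*?</miss_query_think>\s*"
    r"<answer>(?P<answer>.*?)</answer>$",
    re.DOTALL,
)

# Format 2: no irrelevant queries, so <no_relate_query> must hold an empty list.
FORMAT_NO_SEARCH = re.compile(
    r"^<no_relate_think>.*?</no_relate_think>\s*"
    r"<no_relate_query>\s*\[\s*\]\s*</no_relate_query>\s*"
    r"<miss_query_think>.*?</miss_query_think>\s*"
    r"<answer>(?P<answer>.*?)</answer>$",
    re.DOTALL,
)

def validate_output(text):
    """Return the parsed <answer> JSON if the output matches either format, else None."""
    for pattern in (FORMAT_NO_SEARCH, FORMAT_WITH_SEARCH):
        m = pattern.search(text.strip())
        if m:
            try:
                return json.loads(m.group("answer"))
            except json.JSONDecodeError:
                return None
    return None
```

Running truncated or garbled completions through a checker like this makes it easy to see which tag is missing when the rollout stops early.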

The following problems occur:

1. In the validation phase before training, the output content and format are normal, but after the tool call the model does not continue generating. (screenshot)

2. During training, in addition to the error above, the model also emits abnormal characters and the output format is garbled. (screenshots)

My training script:

#model_type=$1

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export GPU_NUMS=8
export WANDB_API_KEY=''
export RAY_TMPDIR='/nfs/dataset-ofs-search-v1/ray_tmpdir'
export DATA_DIR='/nfs/dataset-ofs-search-v1/ddmpeng/miss_mining_agent_v1/miss_mining_agent/data'

WAND_PROJECT='miss_mining_agent_v2'

BASE_MODEL=""
EXPERIMENT_NAME=""

#if [ "$model_type" = "llama3.1_8b_instruct" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/miss_mining_agent_v2/miss_mining_agent/models/Meta-Llama-3.1-8B-Instruct'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-llama3.1-8b'
#elif [ "$model_type" = "llama3.1_8b_instruct_sft" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/product_auto/model/merge/merge_miss_query_agent_lora_sft_ds3_llama3.1_8b'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-llama3.1-8b_sft'
#elif [ "$model_type" = "llama3.2_1b_instruct" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/miss_mining_agent_v2/miss_mining_agent/models/Llama-3.2-1B-Instruct'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-llama3.2-1b'
#elif [ "$model_type" = "llama3.2_3b_instruct" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/miss_mining_agent_v2/miss_mining_agent/models/Llama-3.2-3B-Instruct'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-llama3.2-3b'
#elif [ "$model_type" = "qwen2.5_7b_instruct_sft" ]; then
#  BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/product_auto/model/merge/merge_miss_query_agent_lora_sft_ds3_qwen2.5_7b_2'
#  EXPERIMENT_NAME='miss_mining_agent_v2-grpo-qwen2.5_7b'
#else
#  BASE_MODEL=""
#  EXPERIMENT_NAME=""
#fi

#export BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/LLaMA-Factory/pretrain_model/Qwen2.5-7B-Instruct'
export BASE_MODEL='/nfs/dataset-ofs-search-v1/ddmpeng/product_auto/model/merge/merge_miss_query_agent_lora_sft_ds3_qwen2.5_7b'
export EXPERIMENT_NAME='miss_mining_agent-grpo-qwen2.5_7b_lora'

HYDRA_FULL_ERROR=1 python3 -m agent_r1.src.main_agent \
    algorithm.adv_estimator=grpo \
    data.train_files=["$DATA_DIR/train.parquet"] \
    data.val_files=["$DATA_DIR/test.parquet"] \
    data.train_batch_size=64 \
    data.val_batch_size=64 \
    data.max_prompt_length=4096 \
    data.max_response_length=4096 \
    data.max_response_length_single_turn=1024 \
    data.use_default_tool_template=False \
    actor_rollout_ref.model.path=$BASE_MODEL \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.actor.optim.lr_warmup_steps_ratio=0.285 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n_repeat=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.logger=['console','wandb'] \
    trainer.project_name=$WAND_PROJECT \
    trainer.experiment_name=$EXPERIMENT_NAME \
    trainer.n_gpus_per_node=$GPU_NUMS \
    trainer.nnodes=1 \
    trainer.save_freq=20 \
    trainer.test_freq=10 \
    trainer.total_epochs=10 \
    trainer.val_before_train=True \
    trainer.log_val_generations=0 \
    tool.max_turns=2 \
    tool.tools=['dd_search'] \
    tool.env=dd_search \
    tool.max_tool_response_length=2048 \
    2>&1 | tee $EXPERIMENT_NAME.log
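For context on problem 1, `tool.max_turns=2` implies the rollout should resume generation after the tool response is injected into the transcript. A minimal sketch of that expected flow follows; `generate` and `call_search_tool` are hypothetical stand-ins (assumed to return text up to and including the stop tag), not agent_r1 APIs:

```python
# Sketch of the multi-turn rollout this task expects (tool.max_turns=2).
def rollout(prompt, generate, call_search_tool, max_turns=2):
    """Generate until </search>, inject the tool response, continue until <answer>."""
    transcript = prompt
    for _ in range(max_turns):
        chunk = generate(transcript, stop=["</search>", "</answer>"])
        transcript += chunk
        if chunk.rstrip().endswith("</answer>"):
            return transcript  # final answer produced, rollout done
        if "</search>" in chunk:
            # Extract the query list, call the tool, and wrap the result in the
            # expected tags; the model must then be called again on the extended
            # transcript -- the step that reportedly does not happen here.
            start = chunk.rindex("<search>") + len("<search>")
            queries = chunk[start:chunk.rindex("</search>")]
            result = call_search_tool(queries)
            transcript += (
                f"\n<poi_recall_informations>{result}</poi_recall_informations>\n"
            )
    return transcript
```

If generation stops after `<poi_recall_informations>` is appended, the second `generate` call is likely never reached, or the model's stop condition is being triggered by the injected tool response.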

I read the source code and couldn't find anything wrong; I hope the author can help explain.
