Qwen/Qwen3-4B-Instruct-2507 #8

@K-zhy

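Very interesting work. I also wanted to ask: have the authors tried Qwen/Qwen3-4B-Instruct-2507 on BIRD dev?
On my side, this model reaches EX Accuracy (greedy search): 0.6173402868318123. Is that reasonable?
I'm not sure whether my evaluation method is the problem:
Evaluation data: cycloneboy/bird_train (validation)
Scripts:
OmniSQL/train_and_evaluate/infer.py
OmniSQL/train_and_evaluate/evaluate_bird.py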
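
For context on the number above: EX (execution) accuracy is commonly taken to be the fraction of questions where the predicted SQL returns the same rows as the gold SQL. The sketch below is only an illustration under that assumption, using an in-memory SQLite database. It is not the logic of `evaluate_bird.py`, and the helper names `run_query` and `execution_accuracy`, the toy table, and the set-based comparison are all hypothetical choices made for the example.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows as a set, or None if execution fails."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result sets match.

    Assumption: results are compared as unordered sets of rows, and a
    prediction that fails to execute counts as wrong.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Toy database and predictions, purely illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE t (id INTEGER, name TEXT);"
    "INSERT INTO t VALUES (1, 'a'), (2, 'b');"
)
pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT name FROM t WHERE id = 1", "SELECT name FROM t WHERE id < 2"),
    # Executes, but returns extra rows: counted as wrong.
    ("SELECT name FROM t", "SELECT name FROM t WHERE id = 1"),
    # Fails to execute (unknown column): counted as wrong.
    ("SELECT nope FROM t", "SELECT name FROM t"),
]
score = execution_accuracy(conn, pairs)
```

Under a definition like this, the score depends on which split and which database files the queries run against, so checking that the evaluation data matches the official BIRD dev set is a natural first step when comparing numbers.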
