[GPT-OSS] Accuracy and performance config differences - Reasoning effort #322

The reference implementation in the regular inference repo uses "low" reasoning effort for performance runs and "high" reasoning effort for accuracy runs. The max tokens also differ: 10k for performance and 32k for accuracy (reported in #303).
The current implementations of the gpt-oss dataset presets and the message specs (both sglang and vllm) do not account for these differences.
If we want parity or comparability with the legacy inference submissions, we need to find a way to resolve this.
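
One possible direction, as a rough sketch only: key the generation settings on the run mode and merge them into each request. Everything named here (`RunMode`, `GenerationSettings`, `with_mode_settings`, and the `reasoning_effort` / `max_tokens` request keys) is hypothetical and not taken from the existing presets or message specs. Reading "10k" and "32k" as 10,000 and 32,000 is also an assumption; the exact limits should be checked against the reference.

```python
from dataclasses import dataclass
from enum import Enum


class RunMode(Enum):
    """Hypothetical run modes; the real codebase may name these differently."""
    PERFORMANCE = "performance"
    ACCURACY = "accuracy"


@dataclass(frozen=True)
class GenerationSettings:
    reasoning_effort: str
    max_tokens: int


# Effort levels follow the issue text. The token limits are an assumption:
# "10k" and "32k" are read here as 10_000 and 32_000.
SETTINGS_BY_MODE = {
    RunMode.PERFORMANCE: GenerationSettings(reasoning_effort="low", max_tokens=10_000),
    RunMode.ACCURACY: GenerationSettings(reasoning_effort="high", max_tokens=32_000),
}


def with_mode_settings(request: dict, mode: RunMode) -> dict:
    """Return a copy of `request` with the mode-specific settings merged in.

    The key names are placeholders; each backend's message spec would map
    them to whatever fields it actually expects.
    """
    settings = SETTINGS_BY_MODE[mode]
    return {
        **request,
        "reasoning_effort": settings.reasoning_effort,
        "max_tokens": settings.max_tokens,
    }
```

With something like this, the sglang and vllm message specs could share one table of per-mode settings instead of each hard-coding a single configuration.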

@nvzhihanj @arekay-nv 
