Hello,
I believe there is an issue with the "OpenAIChatCompletionRequest" class: the "do_sample" boolean is defined, but setting it has no effect, so the model repeats the same message on regeneration. The "seed" parameter is ignored when "do_sample" is "false".
Tested with Gemma 2 using the LLM worker. If I hardcode the value to true in the code, it picks up the seed, and the message is different.
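To illustrate the behaviour I mean, here is a minimal toy sketch in plain Python. It is not the project's code: `pick_next_token` and the probability table are hypothetical names made up for this example. It only shows why a seed cannot change the output when sampling is disabled.

```python
import random


def pick_next_token(probs, do_sample, seed=None):
    """Toy decoding step. `probs` maps token -> probability (hypothetical data)."""
    if not do_sample:
        # Greedy decoding: always take the most likely token.
        # The seed is never consulted, so regeneration repeats itself.
        return max(probs, key=probs.get)
    # Sampling: the seed drives the random generator, so it affects the result.
    rng = random.Random(seed)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


probs = {"hello": 0.40, "hi": 0.35, "hey": 0.25}

# With do_sample=False every seed gives the same token.
greedy = {pick_next_token(probs, do_sample=False, seed=s) for s in range(10)}

# With do_sample=True the token depends on the seed.
sampled = {pick_next_token(probs, do_sample=True, seed=s) for s in range(10)}
```

In this sketch `greedy` only ever contains the single most likely token, while `sampled` varies with the seed, which matches what I see once the value is hardcoded to true.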
Also, I don't see a way to set "stream_chunk_tokens": it is not included in "OpenAIChatCompletionRequest", so the value is ignored. Setting it directly in the code does work.