fix(arena): pass full conversation history to moderate_input in vision anony arena #3864
Open
Chessing234 wants to merge 1 commit into lm-sys:main
Conversation
The third argument to `moderate_input` should be `all_conv_text` (the full conversation history from both models), not just the current user message. Without this, multi-turn conversations bypass content moderation, because only the current short input is checked, not prior model responses.
Bug: In `gradio_block_arena_vision_anony.py`, `moderate_input` is called with `text` as both the user message and the `all_conv_text` argument. The function signature is `moderate_input(state, text, all_conv_text, model_list, images, ip)`, where `all_conv_text` is the full conversation history that the moderation filter checks. Passing only the current user message means prior model responses are never moderated, allowing policy-violating content to slip through in multi-turn conversations.

Root cause: The other arena files (e.g. `gradio_block_arena_vision_named.py`) already compute `all_conv_text` from both models' prompts and pass it correctly. This file was left with the original call that passes `text` twice.

Fix: Compute `all_conv_text` from both models' conversation histories (matching the pattern in `gradio_block_arena_vision_named.py`) and pass it as the third argument.
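A minimal, self-contained sketch of the call pattern this PR describes. The `FakeConv`/`FakeState` classes and the stub `moderate_input` below are stand-ins for FastChat's real state and moderation machinery (only the argument-passing pattern is taken from the PR description); the `[-1000:]` truncation of each history is an assumption modeled on the named-arena file, not confirmed by this PR.

```python
# Stand-ins for FastChat's conversation state; hypothetical, for illustration only.
class FakeConv:
    def __init__(self, prompt):
        self._prompt = prompt

    def get_prompt(self):
        # Full conversation history for one model, as a single string.
        return self._prompt


class FakeState:
    def __init__(self, prompt):
        self.conv = FakeConv(prompt)


def moderate_input(state, text, all_conv_text, model_list, images, ip):
    # Stub moderation filter: flags the request if a banned phrase appears
    # anywhere in the text it is given. The real filter only ever sees
    # `all_conv_text`, which is the crux of the bug.
    return "FORBIDDEN" in all_conv_text


def build_all_conv_text(states, text):
    # Concatenate both models' histories plus the new user turn, mirroring
    # the pattern described for gradio_block_arena_vision_named.py.
    left = states[0].conv.get_prompt()
    right = states[1].conv.get_prompt()
    return left[-1000:] + right[-1000:] + "\nuser: " + text


states = [
    FakeState("user: hi\nassistant: FORBIDDEN reply"),
    FakeState("user: hi\nassistant: fine reply"),
]
text = "a harmless follow-up"

# Buggy call: `text` passed as all_conv_text, so prior turns are never checked.
flagged_buggy = moderate_input(None, text, text, [], [], "127.0.0.1")

# Fixed call: the full two-model history is checked, so the violation is caught.
flagged_fixed = moderate_input(
    None, text, build_all_conv_text(states, text), [], [], "127.0.0.1"
)

print(flagged_buggy, flagged_fixed)  # False True
```

With the buggy call the policy-violating prior response slips through; with the fixed call the same input is flagged, which is exactly the behavioral difference the PR targets.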