
fix(arena): pass full conversation history to moderate_input in vision anony arena #3864

Open
Chessing234 wants to merge 1 commit into lm-sys:main from Chessing234:fix/vision-arena-moderate-full-conversation

@Chessing234

Bug: In gradio_block_arena_vision_anony.py, moderate_input is called with text as both the user message and the all_conv_text argument:

text, image_flagged, csam_flag = moderate_input(
    state0, text, text, model_list, images, ip
)

The function signature is moderate_input(state, text, all_conv_text, model_list, images, ip), where all_conv_text is the conversation history that the moderation filter checks. Because only the current user message is passed, earlier turns, including prior model responses, are never moderated, so policy-violating content can slip through in multi-turn conversations.
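To illustrate the effect, here is a minimal, self-contained sketch. `is_flagged` and `BLOCKLIST` are hypothetical stand-ins for the moderation filter, not FastChat code. The sketch only shows why the text handed to the filter matters.

```python
# Hypothetical stand-in for a moderation filter; NOT FastChat's implementation.
BLOCKLIST = {"forbidden"}


def is_flagged(all_conv_text: str) -> bool:
    """Return True if any blocklisted word appears in the checked text."""
    lowered = all_conv_text.lower()
    return any(word in lowered for word in BLOCKLIST)


# Earlier turns contain the problematic content; the new message is harmless.
history = "user: tell me something\nassistant: here is some FORBIDDEN content\n"
current_message = "please continue"

# Buggy call shape: only the current message reaches the filter.
flagged_current_only = is_flagged(current_message)

# Intended call shape: history plus the current message reaches the filter.
flagged_with_history = is_flagged(history + "\nuser: " + current_message)

print(flagged_current_only, flagged_with_history)
```

With the buggy call shape the filter never sees the earlier assistant turn, so the conversation is not flagged.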

Root cause: The named vision arena (gradio_block_arena_vision_named.py) already computes all_conv_text from both models' prompts and passes it correctly. This file still passes text as both the second and third arguments.

Fix: Compute all_conv_text from both models' conversation histories (matching the pattern in gradio_block_arena_vision_named.py) and pass it as the third argument:

all_conv_text_left = states[0].conv.get_prompt()
all_conv_text_right = states[1].conv.get_prompt()
all_conv_text = (
    all_conv_text_left[-1000:] + all_conv_text_right[-1000:] + "\nuser: " + text
)
text, image_flagged, csam_flag = moderate_input(
    state0, text, all_conv_text, model_list, images, ip
)
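The string construction in the fix can be sketched in isolation as follows. `build_all_conv_text` and its `tail_chars` parameter are hypothetical names introduced here for illustration. The PR itself inlines the expression with a fixed 1000-character slice.

```python
def build_all_conv_text(
    left_prompt: str, right_prompt: str, text: str, tail_chars: int = 1000
) -> str:
    """Join the tail of each model's prompt with the new user message.

    Hypothetical helper mirroring the inline expression in the fix.
    tail_chars is assumed to be positive; a slice of [-0:] would keep
    the whole string rather than nothing.
    """
    return (
        left_prompt[-tail_chars:]
        + right_prompt[-tail_chars:]
        + "\nuser: "
        + text
    )


left = "L" * 1500   # longer than the tail window, so it gets truncated
right = "R" * 200   # shorter than the tail window, so it is kept whole
combined = build_all_conv_text(left, right, "hello")

print(len(combined))
```

Note that the slice keeps only the last 1000 characters of each prompt, so in this construction anything earlier in a long conversation is not part of the string handed to moderation.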

…n anony arena

The third argument to moderate_input should be all_conv_text (the full
conversation history from both models), not just the current user message.
Without this, multi-turn conversations bypass content moderation because
only the current short input is checked, not prior model responses.