Skip to content

fix(ai-aws-content-moderation): moderate decoded LLM content instead of raw body#13528

Open
shreemaan-abhishek wants to merge 1 commit into
apache:masterfrom
shreemaan-abhishek:fix/ai-aws-content-moderation-decode-llm-content
Open

fix(ai-aws-content-moderation): moderate decoded LLM content instead of raw body#13528
shreemaan-abhishek wants to merge 1 commit into
apache:masterfrom
shreemaan-abhishek:fix/ai-aws-content-moderation-decode-llm-content

Conversation

@shreemaan-abhishek

Copy link
Copy Markdown
Contributor

Description

The ai-aws-content-moderation plugin sent the raw HTTP request body to AWS Comprehend instead of parsing the LLM request and extracting the actual prompt content. As a result, moderation scored the undecoded JSON envelope, while the upstream LLM acts on the decoded content, so the two see different text:

  • A body containing "content":"toxic" was scored by Comprehend as the literal escaped string, whereas the upstream model decodes it to toxic.
  • The full {"model":...,"messages":[...]} envelope was scored as-is, adding noise to the toxicity result.

This is inconsistent with the sibling ai-aliyun-content-moderation plugin, which is protocol-aware.

What changed

Make the plugin protocol-aware, in the same style as ai-aliyun-content-moderation:

  • Require/validate the application/json content type.
  • Parse the JSON body, detect the client protocol via ai-protocols, and extract the LLM-visible content (messages[].content, Responses input/instructions, Anthropic message content, etc.).
  • Send only the normalized, decoded content to Comprehend.

Because the plugin runs in the rewrite phase (before ai-proxy), it detects the protocol directly rather than relying on ctx.ai_client_protocol. Requests that are not recognized AI requests (non-JSON content type, unparseable bodies, or JSON that carries no LLM content) are handled by the existing fail_mode (skip by default), so non-AI traffic on Consumer-bound plugins keeps its current pass-through behavior.

Behavior change

Making the plugin JSON/LLM-only is a behavior change for anyone running it on non-JSON routes. With the default fail_mode: skip, such requests now pass through unchecked instead of being forwarded verbatim to Comprehend; set fail_mode: error to reject them.

Which issue(s) this PR fixes:

Fixes #

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

…of raw body

The plugin sent the raw HTTP request body to AWS Comprehend, so it scored
the undecoded JSON envelope (e.g. the "messages" wrapper and escape
sequences like a literal backslash-u sequence) instead of the actual
prompt the upstream LLM acts on. This makes the moderation see different
text than the model and adds noise to the toxicity result.

Make the plugin protocol-aware, like ai-aliyun-content-moderation: require
application/json, parse the body, detect the client protocol via
ai-protocols, and send only the normalized, decoded LLM-visible content
to Comprehend. Non-AI requests (non-JSON, unparseable, or bodies that
carry no LLM content) are governed by the existing fail_mode.
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant