Fix ACPAgent dropping image content from user messages#3027
Fix ACPAgent dropping image content from user messages#3027sjathin wants to merge 6 commits intoOpenHands:mainfrom
Conversation
_build_acp_prompt now returns a list of ACP content blocks (text + image) instead of a single joined string. ImageContent URLs (data URIs and plain URLs) are converted to ACP ImageContentBlock via the new _image_url_to_acp_block helper. step() passes the blocks directly to conn.prompt(). Closes OpenHands#2982
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟢 Good taste - Clean implementation that eliminates the special case of dropped image content.
[EVAL CONSIDERATION]
This PR fixes a bug where image content was silently dropped, but it could affect agent behavior in multimodal scenarios. While the fix is correct, I recommend running lightweight evals before merging to ensure no unexpected performance impacts in benchmarks that use image inputs.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This is a correctness fix for existing ACP functionality. The change is well-isolated to ACPAgent, has good test coverage, and includes proper error handling for malformed data URIs. The risk is low because it's fixing missing functionality rather than changing existing behavior.
VERDICT:
✅ Worth merging after eval check: Implementation is solid, tests are comprehensive, and this fixes a real bug where images were being silently dropped.
KEY INSIGHT:
The change from str to list[ContentBlock] eliminates a special case by preserving all content types, which is a cleaner architecture than text-only extraction.
|
@OpenHands do we have existing example tests for ACP agent? If so, can we modify that example test to add an image as input so we can thoroughly test the ACP system to make sure that it can actually handle image inputs? |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Adds a second conversation turn that sends a Message with TextContent + ImageContent (using the OpenHands logo URL) so the example test exercises the new multimodal image support end-to-end through the ACP pipeline. Co-authored-by: openhands <openhands@all-hands.dev>
|
Yes, there is an existing example test at The example now covers:
This will be validated when the example test suite runs ( This comment was generated by an AI agent (OpenHands). |
SummaryThe reviewer asked whether there are existing example tests for the ACP agent and requested adding an image input to that example to end-to-end test the new image support. What was doneI modified the existing ACP agent example ( Checklist
ConcisenessThe changes are minimal and focused — only one file was modified (
No extraneous changes were made. |
Why
ACPAgent._build_acp_prompt()only extractsTextContentfrom user messages and silently dropsImageContentblocks. The ACPprompt()method supportsImageContentBlockbut it was never wired up.Summary
_build_acp_prompt()to returnlist[TextContentBlock | ImageContentBlock]instead ofstr, preserving image content as ACPImageContentBlock_image_url_to_acp_block()helper to convert data URIs and plain URLs to ACP image blocksstep()to pass content blocks directly toconn.prompt()instead of wrapping in a singletext_block()Issue Number
Closes #2982
How to Test
uv run python -m pytest tests/sdk/agent/test_acp_agent.py -v -k "not test_static_secret"168 tests pass. New tests cover:
test_build_acp_prompt_includes_image_content— mixed text + image messageTestImageUrlToAcpBlock— data URI parsing, plain URL passthrough, invalid URI handlingManually verified with a real PNG image that base64 data round-trips correctly through the conversion.
Video/Screenshots
N/A — backend logic change.
Type
Notes
The
_image_url_to_acp_blockhelper handles two URL formats:data:image/png;base64,...): parsed intodata+mime_typefieldshttps://...): passed via theurifield with a generic MIME type for the ACP server to resolve