
feat: add image support #1737

Open
mike-inkeep wants to merge 4 commits into main from feat/image-support

Conversation

@mike-inkeep

  • Add image handling support (without persistence to conversation history)
  • Tweak image URL schema and tighten up tests

@vercel

vercel bot commented Feb 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agents-api | Ready | Preview, Comment | Feb 5, 2026 9:05pm |
| agents-manage-ui | Ready | Preview, Comment | Feb 5, 2026 9:05pm |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agents-docs | Skipped | | Feb 5, 2026 9:05pm |

@changeset-bot

changeset-bot bot commented Feb 5, 2026

⚠️ No Changeset found

Latest commit: cf9b73b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@claude
Contributor

claude bot commented Feb 5, 2026

Claude finished @mike-inkeep's task in 4m 57s


PR Review Summary

5 Key Findings | Risk: Medium

🟠🔶⚠️ Major (3) 🟠🔶⚠️

🟠 1) message-parts.ts No unit tests for new utility functions

files: agents-api/src/domains/run/utils/message-parts.ts

Issue: The new message-parts.ts file introduces 8 exports (imageUrlSchema, isTextContentItem, isImageContentItem, buildTextPart, buildFilePart, extractTextFromParts, getMessagePartsFromOpenAIContent, getMessagePartsFromVercelContent) with zero test coverage. These exports handle input validation and content transformation critical to the image feature.

Why: Without tests, validation edge cases (malformed URLs, invalid base64, mixed valid/invalid arrays) could slip through. The existing suite in model-context-utils.test.ts suggests that 60-80 assertions per utility suite are expected.

Fix: Create agents-api/src/__tests__/run/utils/message-parts.test.ts with test suites for:

  • imageUrlSchema validation (valid/invalid URLs, base64 formats, unsupported types like GIF)
  • Type guards (isTextContentItem, isImageContentItem) with edge cases
  • Content converters with mixed content arrays and error scenarios
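As a starting point for the first suite, a table-driven check along these lines could work. This is only a sketch: `isSupportedImageUrl` is a hypothetical stand-in for `imageUrlSchema.safeParse(...).success`, its regex is modeled on the pattern quoted in this PR rather than copied from the implementation, and the HTTP(S)-only rule is an assumption.

```typescript
// Hypothetical stand-in for imageUrlSchema; the real schema lives in message-parts.ts.
const DATA_URI_PATTERN = /^data:image\/(png|jpeg|jpg|webp);base64,[A-Za-z0-9+/]+={0,2}$/;

function isSupportedImageUrl(value: unknown): boolean {
  if (typeof value !== 'string') return false;
  if (value.startsWith('data:')) return DATA_URI_PATTERN.test(value);
  try {
    // Assumption: only http(s) URLs are accepted for remote images.
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch {
    return false;
  }
}

// Each row is [input, expected result]; add rows as new edge cases come up.
const cases: Array<[input: unknown, expected: boolean]> = [
  ['https://example.com/cat.png', true],
  ['data:image/png;base64,iVBORw0KGgo=', true],
  ['data:image/gif;base64,R0lGODlh', false], // GIF is rejected for now
  ['data:image/png;base64,', false], // empty payload
  ['not a url', false],
  [undefined, false],
];

const failures = cases.filter(([input, expected]) => isSupportedImageUrl(input) !== expected);
console.log(failures.length === 0 ? 'all cases pass' : failures);
```

In a real test file each row would become its own assertion so that a failure names the offending input.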

🟠 2) message-parts.ts:25-33 Type guard doesn't use imported type and has weak null check

files: agents-api/src/domains/run/utils/message-parts.ts:25-33

Issue: The isImageContentItem type guard declares its predicate with an inline object type instead of the imported ImageContentItem type. It also relies on optional chaining (item.image_url?.url), so a null or undefined image_url is never checked for explicitly.

Why: If the ImageContentItem type definition changes, this guard will not follow it. With optional chaining, a missing image_url yields safeParse(undefined), which fails as intended, but callers assume item.image_url exists once the guard passes, and nothing in the guard states that check directly.

Fix: Import and use ImageContentItem type, add explicit null check:

import type { ContentItem, ImageContentItem } from '../types/chat';

export const isImageContentItem = (item: ContentItem): item is ImageContentItem => {
  return (
    item.type === 'image_url' &&
    'image_url' in item &&
    item.image_url != null &&
    imageUrlSchema.safeParse(item.image_url.url).success
  );
};

🟠 3) message-parts.ts:77 buildFilePart doesn't validate non-data-URI strings are valid URLs

files: agents-api/src/domains/run/utils/message-parts.ts:58-80

Issue: When parseDataUri() returns null (not a data URI), the function assumes the input is a valid HTTP URL and constructs a FilePart with { uri, mimeType: 'image/*' }. However, no validation ensures uri is actually a valid URL.

Why: If malformed strings pass through (e.g., from lenient upstream validation), downstream Vercel AI SDK calls may fail with confusing errors when attempting to process the invalid URI.

Fix: Add URL validation before constructing the FilePart:

if (parsed) {
  return { kind: 'file', file: { bytes: parsed.base64Data, mimeType: parsed.mimeType }, ... };
}

// Validate that uri is actually a URL before using it
try {
  new URL(uri);
} catch {
  throw new Error(`Invalid image URI: expected valid data URI or HTTP URL`);
}

return { kind: 'file', file: { uri, mimeType: 'image/*' }, ... };
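For reference, a self-contained version of the suggested flow might look like the sketch below. The `FilePart` shape and the `parseDataUri` body are hypothetical simplifications, not the project's actual definitions. The protocol check is an extra assumption on top of the suggestion above: `new URL()` also accepts schemes such as `mailto:` and `data:`, so parsing alone does not prove the string is an HTTP URL.

```typescript
// Hypothetical shapes; the real FilePart type comes from the project's A2A types.
type FilePart =
  | { kind: 'file'; file: { bytes: string; mimeType: string } }
  | { kind: 'file'; file: { uri: string; mimeType: string } };

// Splits "data:<mime>;base64,<payload>" into its pieces, or returns null.
function parseDataUri(uri: string): { mimeType: string; base64Data: string } | null {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(uri);
  return match ? { mimeType: match[1], base64Data: match[2] } : null;
}

function buildFilePart(uri: string): FilePart {
  const parsed = parseDataUri(uri);
  if (parsed) {
    return { kind: 'file', file: { bytes: parsed.base64Data, mimeType: parsed.mimeType } };
  }

  let url: URL;
  try {
    url = new URL(uri);
  } catch {
    throw new Error('Invalid image URI: expected valid data URI or HTTP URL');
  }
  // new URL() parses non-HTTP schemes too, so check the protocol explicitly.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error('Invalid image URI: expected valid data URI or HTTP URL');
  }
  return { kind: 'file', file: { uri, mimeType: 'image/*' } };
}
```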

Point-fix Edits (2)


💡 APPROVE WITH SUGGESTIONS

Summary: This PR introduces solid image support infrastructure with proper URL validation, content transformation utilities, and AI SDK integration. The implementation follows existing patterns well. Main concerns are the missing unit tests for the new utility module (a codebase requirement per AGENTS.md) and a few type safety improvements. Once tests are added for message-parts.ts and the type guard is tightened, this is ready to ship.


Other Findings (5)

Potentially valid

(minor/info or lower confidence)

| Location | Issue | Reason Excluded |
| --- | --- | --- |
| message-parts.ts:5-17 | No size limit on base64 data URIs (unlike schemas.ts which has 1MB limit) | Different use case - request-level validation vs storage; AI providers handle large payloads |
| imageUrlSchema | No internal IP blocklist for SSRF protection | URLs are passed to AI providers who fetch them; SSRF is provider's responsibility |
| Agent.ts:3290 | new URL(file.uri) could throw if URI is malformed | Covered by upstream validation in route handlers |
| chat.ts:343-348 | Images not persisted to conversation history | Intentional per PR description |
| chatDataStream.ts:68 | Vercel schema uses text field for image URL (unconventional naming) | Matches Vercel AI SDK v5 convention |

Discarded as invalid or not applicable

| Location | Issue | Reason Excluded |
| --- | --- | --- |
| Agent.ts:3288-3291 | file.bytes access not narrowed after union check | TypeScript narrowing via 'uri' in file is correct; the discriminated union ensures mutual exclusivity |
| Route handlers | Missing rate limiting for image uploads | Pre-existing concern not introduced by this PR |
| generateTaskHandler.ts | No image count limits | Reasonable for MVP; can be added if abuse occurs |
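To illustrate why the Agent.ts:3288-3291 finding was discarded, here is a small sketch of `in`-based narrowing. The type names and the `describeFile` helper are hypothetical and only mirror the bytes/uri split described in that finding.

```typescript
// Hypothetical file shapes mirroring the bytes/uri split discussed above.
type FileWithBytes = { bytes: string; mimeType: string };
type FileWithUri = { uri: string; mimeType: string };
type FileContent = FileWithBytes | FileWithUri;

function describeFile(file: FileContent): string {
  if ('uri' in file) {
    // Narrowed to FileWithUri: file.bytes is not accessible in this branch.
    return `url:${new URL(file.uri).hostname}`;
  }
  // Narrowed to FileWithBytes: file.uri is not accessible in this branch.
  return `inline:${file.mimeType}:${file.bytes.length} base64 chars`;
}

console.log(describeFile({ uri: 'https://example.com/a.png', mimeType: 'image/*' }));
console.log(describeFile({ bytes: 'AAAA', mimeType: 'image/png' }));
```

Because each branch sees only one member of the union, no additional check on `file.bytes` is needed after the `'uri' in file` test.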

Comment on lines 25 to 33
export const isImageContentItem = (
  item: ContentItem
): item is { type: 'image_url'; image_url: { url: string; detail?: 'auto' | 'low' | 'high' } } => {
  return (
    item.type === 'image_url' &&
    'image_url' in item &&
    imageUrlSchema.safeParse(item.image_url?.url).success
  );
};
Contributor

MAJOR Type guard uses inline type instead of imported ImageContentItem

The type predicate re-declares the type inline instead of using the imported ImageContentItem type from ../types/chat. This creates maintenance risk if the type changes.

Suggested change
- export const isImageContentItem = (
-   item: ContentItem
- ): item is { type: 'image_url'; image_url: { url: string; detail?: 'auto' | 'low' | 'high' } } => {
-   return (
-     item.type === 'image_url' &&
-     'image_url' in item &&
-     imageUrlSchema.safeParse(item.image_url?.url).success
-   );
- };
+ export const isImageContentItem = (
+   item: ContentItem
+ ): item is ImageContentItem => {
+   return (
+     item.type === 'image_url' &&
+     'image_url' in item &&
+     item.image_url != null &&
+     imageUrlSchema.safeParse(item.image_url.url).success
+   );
+ };

.string()
.regex(
/^data:image\/(png|jpeg|jpg|webp);base64,/,
'Image must be PNG, JPEG, or WebP format (GIF not supported by all providers)'
Collaborator

Can you check this? I believe we can support the GIF file format. It's just that only a single frame will work; it cannot understand animated GIFs. And that is OK.

Author

Claude and ChatGPT both support GIF, but Gemini only supports these: https://ai.google.dev/gemini-api/docs/image-understanding#supported-formats

Gemini supports the following image format MIME types:

PNG - image/png
JPEG - image/jpeg
WEBP - image/webp
HEIC - image/heic
HEIF - image/heif

Alternatively, we can accept GIF and let the downstream provider error bubble up if we hit this case

Collaborator

I think we should skip GIF for now until we figure out how to properly expose agent capabilities to the front end for client-side validation.
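One possible shape for exposing those capabilities is a per-provider support map whose intersection the client validates against. This is a sketch under stated assumptions: the `Provider` names, the `SUPPORTED_IMAGE_TYPES` table, and `commonImageTypes` are all hypothetical. The Gemini row follows the list quoted in this thread; the other rows are assumptions that would need checking against each provider's documentation.

```typescript
type Provider = 'anthropic' | 'openai' | 'google';

// Assumed support matrix for illustration only. The google row follows the
// Gemini list quoted above; the other rows are unverified assumptions.
const SUPPORTED_IMAGE_TYPES: Record<Provider, readonly string[]> = {
  anthropic: ['image/png', 'image/jpeg', 'image/webp', 'image/gif'],
  openai: ['image/png', 'image/jpeg', 'image/webp', 'image/gif'],
  google: ['image/png', 'image/jpeg', 'image/webp', 'image/heic', 'image/heif'],
};

// MIME types accepted by every provider an agent might route to.
function commonImageTypes(providers: readonly Provider[]): string[] {
  if (providers.length === 0) return [];
  const [first, ...rest] = providers;
  return SUPPORTED_IMAGE_TYPES[first].filter((mime) =>
    rest.every((p) => SUPPORTED_IMAGE_TYPES[p].includes(mime)),
  );
}

console.log(commonImageTypes(['anthropic', 'openai', 'google']));
```

With the assumed table, the intersection across all three providers is PNG, JPEG, and WebP, which matches the set the current regex accepts.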

@vercel vercel bot temporarily deployed to Preview – agents-docs February 5, 2026 21:03 Inactive
@claude
Contributor

claude bot commented Feb 5, 2026

Claude finished @mike-inkeep's task in 5m 34s


PR Review Summary

1 Key Finding | Risk: Low

This is a follow-up review after the PR was synchronized with commit cf9b73b, which addressed previous review feedback.

✅ Previous Feedback Addressed

The latest commit successfully addressed the following items from the previous review:

  • ✅ Type guard now uses imported ImageContentItem type (was inline type)
  • ✅ Added explicit null check (item.image_url != null) before accessing .url
  • ✅ Added URL validation in buildFilePart with try/catch block
  • ✅ Fixed base64 regex pattern (changed * to + to require at least one character)
  • ✅ Added warning log when Vercel message parts are dropped due to invalid schema
  • ✅ Made internal helper functions private (changed from export const to const)
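The effect of the `*` to `+` change can be seen with a pair of illustrative patterns. These are assumptions modeled on the regex quoted in the review thread; the actual pattern in message-parts.ts may differ.

```typescript
// Illustrative patterns only; the actual regex in message-parts.ts may differ.
const withStar = /^data:image\/(png|jpeg|jpg|webp);base64,[A-Za-z0-9+/]*={0,2}$/;
const withPlus = /^data:image\/(png|jpeg|jpg|webp);base64,[A-Za-z0-9+/]+={0,2}$/;

const emptyPayload = 'data:image/png;base64,';
const realPayload = 'data:image/png;base64,iVBORw0KGgo=';

console.log(withStar.test(emptyPayload)); // true: an empty payload slips through
console.log(withPlus.test(emptyPayload)); // false: at least one character is required
console.log(withPlus.test(realPayload)); // true
```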

🟠🔶⚠️ Major (1) 🟠🔶⚠️

🟠 1) message-parts.ts Missing unit tests for new utility module

files: agents-api/src/domains/run/utils/message-parts.ts

Issue: The new message-parts.ts file introduces 4 exports (imageUrlSchema, extractTextFromParts, getMessagePartsFromOpenAIContent, getMessagePartsFromVercelContent) plus several internal helpers, with no dedicated test file. While Agent.test.ts covers the integration path, the utility functions themselves lack direct unit tests.

Why: Per AGENTS.md, all new features require comprehensive unit tests. The codebase has a consistent pattern of utility tests (see model-resolver.test.ts, stream-helpers.test.ts, tool-streaming.test.ts in agents-api/src/__tests__/run/utils/). These validation and transformation functions are critical to correct image handling and should have edge case coverage.

Fix: Create agents-api/src/__tests__/run/utils/message-parts.test.ts with test cases for:

  • imageUrlSchema validation: valid HTTP URLs, valid data URIs (PNG, JPEG, WebP), invalid formats (GIF, malformed base64, non-image MIME types)
  • extractTextFromParts: empty arrays, mixed part types, text-only arrays
  • getMessagePartsFromOpenAIContent: string input, array input, mixed text/image content, invalid image URLs filtered
  • getMessagePartsFromVercelContent: backwards-compat string content, parts array, invalid parts logged and dropped
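The extractTextFromParts cases listed above could be exercised roughly as follows. The `Part` shape and the function body here are hypothetical stand-ins so the sketch runs on its own; in particular, joining text parts with no separator is an assumption about the real behavior.

```typescript
// Hypothetical part shapes and implementation, used only to show the test cases.
type Part =
  | { kind: 'text'; text: string }
  | { kind: 'file'; file: { uri: string; mimeType: string } };

// Assumption: text parts are concatenated in order and non-text parts are ignored.
function extractTextFromParts(parts: readonly Part[]): string {
  return parts
    .filter((p): p is Extract<Part, { kind: 'text' }> => p.kind === 'text')
    .map((p) => p.text)
    .join('');
}

const image: Part = {
  kind: 'file',
  file: { uri: 'https://example.com/a.png', mimeType: 'image/*' },
};

// Empty array, mixed part types, and text-only array.
console.log(JSON.stringify(extractTextFromParts([])));
console.log(extractTextFromParts([{ kind: 'text', text: 'Describe ' }, image, { kind: 'text', text: 'this' }]));
console.log(extractTextFromParts([{ kind: 'text', text: 'hello' }]));
```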

🕐 Pending from Previous Review 🕐

📋 GIF Support Decision

Per the discussion thread on line 14, the team decided to skip GIF support for now until there's a proper way to expose agent capabilities to the frontend for client-side validation. This is a reasonable MVP approach.


💡 APPROVE WITH SUGGESTIONS

Summary: This PR introduces well-architected image support with proper URL validation, format conversion between OpenAI/Vercel formats and the internal A2A protocol, and solid integration with the AI SDK. The previous review feedback was thoroughly addressed in commit cf9b73b. The remaining gap is the missing dedicated unit tests for the message-parts.ts utility module — recommend adding these before or shortly after merge to maintain the codebase's testing standards.


Other Findings (3)

Potentially valid

(minor/info or lower confidence)

| Location | Issue | Reason Excluded |
| --- | --- | --- |
| chatDataStream.ts:58 | content: z.any() is permissive | Intentional for backwards compatibility; actual parts are validated separately |
| Agent.ts:3296 | Fallback mimeType: 'image/*' for URL-based images | Acceptable; providers handle MIME type detection from URL |

Discarded as invalid or not applicable

| Location | Issue | Reason Excluded |
| --- | --- | --- |
| message-parts.ts:8-20 | No size limit on base64 data URIs | Different use case than storage validation; AI providers handle payload limits |
| Previous: buildFilePart URL validation | Was flagged as missing | ✅ Now addressed in latest commit |
| Previous: Type guard using inline type | Was flagged as maintenance risk | ✅ Now uses imported ImageContentItem |

};

export type ImageContentItem = {
type: 'image_url';
Collaborator

Are we redefining types and schemas?

https://github.com/inkeep/agents/pull/1737/changes#diff-d4594cf84e549a2289ce15381dea24e93b0a15bfa4211ffd785023abd461b99cR67-R73

I think it's a mistake made throughout our codebase, but we try to reduce schemas and types where we can.

You can infer types from zod schemas. https://zod.dev/basics?id=inferring-types


export type UserInput =
| string
| {
Collaborator

I think at this point we can follow the A2A protocol for parts.

I don't want to introduce too many intermediate types in the message handling.

I think we support: Vercel Streaming, A2A, OpenAI Completions, but our primary pipeline (our widget to our agent framework) does Vercel Streaming (Client Widget to Agents Server) to A2A (Agent Server internal messaging)

Collaborator

I suggest defining types in the packages where we define A2A types, or just reusing the existing ones by scoping down/omitting properties that we are not supporting in A2A at this time.
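A rough sketch of what "scoping down" could look like with TypeScript utility types follows. `A2AFilePart` here is a hypothetical, simplified stand-in rather than the actual A2A definition, and `ImageFilePart` is an illustrative name.

```typescript
// Hypothetical, simplified stand-in for an existing A2A file part type.
type A2AFilePart = {
  kind: 'file';
  file: { name?: string; mimeType?: string; bytes?: string; uri?: string };
  metadata?: Record<string, unknown>;
};

// Scope the existing type down instead of declaring a parallel one:
// drop `metadata`, and narrow `file` to the fields image input needs.
type ImageFilePart = Omit<A2AFilePart, 'metadata' | 'file'> & {
  file: Required<Pick<A2AFilePart['file'], 'mimeType'>> &
    Pick<A2AFilePart['file'], 'bytes' | 'uri'>;
};

const part: ImageFilePart = {
  kind: 'file',
  file: { mimeType: 'image/png', uri: 'https://example.com/a.png' },
};

// An ImageFilePart is still assignable wherever the wider A2A type is expected.
const asA2A: A2AFilePart = part;
console.log(asA2A.file.mimeType);
```

Because the narrowed type is derived from the wider one, a change to the field types in the A2A definition flows through without a second declaration to update.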

type: string;
text?: string;
image?: string | URL;
experimental_providerMetadata?: { openai?: { imageDetail?: 'auto' | 'low' | 'high' } };
Collaborator

Even just the enum component can make a useful extraction as a schema so we don't have to redefine the enum set.
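As an illustration of extracting the enum once, the sketch below derives both the type and a runtime check from a single list. It uses plain TypeScript to stay dependency-free; with Zod, the equivalent would typically be a shared `z.enum([...])` schema plus `z.infer`. The names `IMAGE_DETAIL_VALUES`, `ImageDetail`, and `isImageDetail` are hypothetical.

```typescript
// Single source of truth for the allowed values.
const IMAGE_DETAIL_VALUES = ['auto', 'low', 'high'] as const;

// Derived union: 'auto' | 'low' | 'high'.
type ImageDetail = (typeof IMAGE_DETAIL_VALUES)[number];

// Runtime guard built from the same list, so type and validation cannot drift.
function isImageDetail(value: unknown): value is ImageDetail {
  return typeof value === 'string' && (IMAGE_DETAIL_VALUES as readonly string[]).includes(value);
}

// Both content shapes can now reference ImageDetail instead of repeating the literals.
type ImageContentItem = {
  type: 'image_url';
  image_url: { url: string; detail?: ImageDetail };
};

const item: ImageContentItem = {
  type: 'image_url',
  image_url: { url: 'https://example.com/a.png', detail: 'low' },
};
console.log(isImageDetail(item.image_url.detail));
```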

nick-inkeep added a commit that referenced this pull request Feb 6, 2026
Pattern extracted from PR #1737 human reviewer feedback (amikofalvy):
- Types should derive from Zod schemas using z.infer<typeof schema>
- Use Pick/Omit/Partial instead of manually redefining type subsets
- Extract shared enum/union schemas instead of inline string literals

Changes:
- pr-review-types.md: New anti-pattern + analysis step 6 with detection patterns
- pr-review-consistency.md: Extended "Reuse" section to cover types

This demonstrates the closed-pr-review-auto-improver output — these are
the exact changes the agent proposed when run against PR #1737.

Co-Authored-By: Claude <noreply@anthropic.com>
nick-inkeep added a commit that referenced this pull request Feb 6, 2026
Patterns extracted from human reviewer feedback:
- Type Definition Discipline: use z.infer<> instead of manual type definitions
- Type Composition Safety: discriminated unions for mutually exclusive states

Co-Authored-By: Claude <noreply@anthropic.com>

3 participants