Skip to content

[codex] add typed generate_text provider adapters#145

Draft
Hynek Kydlíček (hynky1999) wants to merge 4 commits into
mainfrom
codex/add-generate-text-provider-adapters
Draft

[codex] add typed generate_text provider adapters#145
Hynek Kydlíček (hynky1999) wants to merge 4 commits into
mainfrom
codex/add-generate-text-provider-adapters

Conversation

@hynky1999
Copy link
Copy Markdown
Collaborator

Purpose

Add a Vercel-style typed generate_text inference surface while preserving the existing raw generate escape hatch.

Changes

  • Add TypedDict message/content-part types for text, image, and file inputs.
  • Add provider-specific message conversion for OpenAI-compatible chat, OpenAI Responses, Google/Gemini, and Anthropic.
  • Add native endpoint providers for Google, OpenAI Responses, and Anthropic.
  • Normalize in-memory media into each provider wire format, including Gemini inlineData for video bytes.
  • Document the new API and provider-specific providerOptions usage.

Validation

  • uv run pytest tests/test_inference.py
  • uv run ty check
  • uv run ruff check --force-exclude src/refiner/inference tests/test_inference.py docs/inference.md
  • uv run pytest (644 passed)

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new generate_text API for typed, multimodal inference, adding native support for Google Gemini, Anthropic Messages, and OpenAI Responses APIs. The changes include a message conversion layer to translate canonical Refiner messages into provider-specific formats, along with new client implementations and updated documentation. Feedback identifies a potential issue where sending providerOptions as a top-level key to OpenAI-compatible endpoints could cause request failures. Additionally, a suggestion was made to reduce code duplication in the Google client by using a shared HTTP helper function.

Comment on lines +118 to +124
if providerOptions is not None and not isinstance(
provider,
GoogleEndpointProvider
| AnthropicEndpointProvider
| OpenAIResponsesProvider,
):
payload["providerOptions"] = providerOptions
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

For OpenAIEndpointProvider and VLLMProvider, including providerOptions as a top-level key in the request payload is likely to cause 400 Bad Request errors from most OpenAI-compatible endpoints, as they typically do not recognize this field. Since the relevant options (like reasoningEffort) are already extracted and normalized into the payload in previous steps (lines 93-94, 116-117), this assignment should be removed for these providers.

Comment on lines +162 to +213
async def generate_text(self, payload: Mapping[str, Any]) -> InferenceResponse:
response_json = await self._post_json(
f"{_google_model_path(self.model)}:generateContent",
payload,
operation="google generation",
)
if not isinstance(response_json, Mapping):
raise RuntimeError("google generation response must be a JSON object")
return _parse_google_inference_response(response_json)

async def _post_json(
self,
endpoint_path: str,
payload: Mapping[str, Any],
*,
operation: str,
) -> Any:
client = self._ensure_client()
for attempt in range(_OPENAI_ENDPOINT_MAX_RETRIES):
try:
response = await client.post(endpoint_path, json=dict(payload))
break
except (
ConnectionError,
OSError,
asyncio.TimeoutError,
httpx.NetworkError,
httpx.TimeoutException,
) as err:
if attempt + 1 >= _OPENAI_ENDPOINT_MAX_RETRIES:
message = (
f"{operation} request failed after "
f"{_OPENAI_ENDPOINT_MAX_RETRIES} attempts: "
f"{type(err).__name__}: {err}"
)
raise RuntimeError(message) from err
await asyncio.sleep(_retry_delay_seconds(attempt))
else:
raise RuntimeError(f"{operation} request failed without a response")
try:
response.raise_for_status()
except httpx.HTTPStatusError as err:
detail = ""
try:
detail = str(err.response.json())
except ValueError:
detail = err.response.text.strip()
message = f"{operation} request failed with HTTP {err.response.status_code}"
if detail:
message = f"{message}: {detail}"
raise RuntimeError(message) from err
return response.json()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _post_json method in _GoogleEndpointClient is identical to the _post_json_with_retries helper function defined later in this file. To improve maintainability and reduce code duplication, _GoogleEndpointClient should use the helper function.

Suggested change
async def generate_text(self, payload: Mapping[str, Any]) -> InferenceResponse:
response_json = await self._post_json(
f"{_google_model_path(self.model)}:generateContent",
payload,
operation="google generation",
)
if not isinstance(response_json, Mapping):
raise RuntimeError("google generation response must be a JSON object")
return _parse_google_inference_response(response_json)
async def _post_json(
self,
endpoint_path: str,
payload: Mapping[str, Any],
*,
operation: str,
) -> Any:
client = self._ensure_client()
for attempt in range(_OPENAI_ENDPOINT_MAX_RETRIES):
try:
response = await client.post(endpoint_path, json=dict(payload))
break
except (
ConnectionError,
OSError,
asyncio.TimeoutError,
httpx.NetworkError,
httpx.TimeoutException,
) as err:
if attempt + 1 >= _OPENAI_ENDPOINT_MAX_RETRIES:
message = (
f"{operation} request failed after "
f"{_OPENAI_ENDPOINT_MAX_RETRIES} attempts: "
f"{type(err).__name__}: {err}"
)
raise RuntimeError(message) from err
await asyncio.sleep(_retry_delay_seconds(attempt))
else:
raise RuntimeError(f"{operation} request failed without a response")
try:
response.raise_for_status()
except httpx.HTTPStatusError as err:
detail = ""
try:
detail = str(err.response.json())
except ValueError:
detail = err.response.text.strip()
message = f"{operation} request failed with HTTP {err.response.status_code}"
if detail:
message = f"{message}: {detail}"
raise RuntimeError(message) from err
return response.json()
async def generate_text(self, payload: Mapping[str, Any]) -> InferenceResponse:
response_json = await _post_json_with_retries(
self._ensure_client(),
f"{_google_model_path(self.model)}:generateContent",
payload,
operation="google generation",
)
if not isinstance(response_json, Mapping):
raise RuntimeError("google generation response must be a JSON object")
return _parse_google_inference_response(response_json)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant