Phase 0 + 0b spikes: OpenAI Agents SDK vs PydanticAI#97
Open
pofallon wants to merge 2 commits into
Adds the throwaway spike for the BURR.md migration decision gate and a
one-page report against the spec's decision-gate matrix.
Five validations against `sample-maverick-project-37n.3`:
V1  OAuth           PASS     LiteLLM device flow works; OpenCode auth.json
                             cannot bootstrap (stale ghu_ token)
V2  codex+tools     PASS     SubmitImplementationPayload first try;
                             implementer shipped renderer.py + tests
V3  claude+tools    FAIL     Claude on Copilot Chat Completions returns
                             text / markdown-fenced JSON; SDK has no
                             envelope unwrap layer
V4  prompt cache    PASS     97% of seeded prefix served from cache on run 2
V5  cost telemetry  PARTIAL  tokens reachable; cost_usd / cache_write
                             not exposed via openai-agents Usage
Outcome: row 4 of the decision matrix —
pass | pass | partial | * | * → Proceed to Phase 1
Phase 1 scope additions captured in the report:
- MaverickCascadingModel injects Copilot IDE-auth headers
(openai-agents SDK's User-Agent override breaks Claude path)
- MaverickCascadingModel ships an envelope-unwrap layer (mirrors
OpenCode `_unwrap_envelope`) to strip markdown fences
- Agent.__init__ wraps output_type with strict_json_schema=False
- DEFAULT_TIERS routes claude-favoured roles to openai/* first
- cost_usd computed against a Maverick-owned price table
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
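The envelope-unwrap item in the Phase 1 scope list can be pictured with a minimal sketch. This is not the OpenCode `_unwrap_envelope` code and not the planned MaverickCascadingModel layer; `unwrap_envelope` is a hypothetical helper that only shows the idea of stripping a markdown fence before parsing the JSON payload.

```python
import json
import re

# Hypothetical helper for illustration only; the real unwrap layer may differ.
# The fence marker is built indirectly so this sketch can sit inside a fenced block.
_FENCE = "`" * 3
_FENCE_RE = re.compile(
    _FENCE + r"(?:json)?[ \t]*\n(.*?)\n?" + _FENCE,
    re.DOTALL | re.IGNORECASE,
)


def unwrap_envelope(text: str) -> dict:
    """Return the JSON object in `text`, tolerating a markdown fence around it."""
    match = _FENCE_RE.search(text)
    # Use the fenced body when one is present, otherwise try the raw text.
    candidate = match.group(1) if match else text
    payload = json.loads(candidate.strip())  # raises ValueError on plain prose
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    return payload
```

A reply that is plain prose still fails here with `ValueError`, so the sketch only covers the markdown-fenced JSON case from V3, not the case where the model returns text and no payload at all.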
Two follow-up spikes to the original Phase 0:
1. scripts/spike-v3-redo.py — tests the function-tool + StopAtTools
output pattern against the OpenAI Agents SDK. Confirms it works
cleanly for codex (29s / 5 raw_responses / 22 RunItems — tighter
than V2's output_type= pattern at 75.8s / 14 / 37) but claude on
Copilot Chat Completions doesn't reliably invoke any tool at all.
Asked "What is 17 + 25?" with an explicit add tool, claude returns
"Sure! Let me calculate that for you!" and ends the turn — exactly
the MCP-era failure mode that motivated this whole migration.
2. scripts/spike-pydantic-ai.py — Phase 0b proper. Validates PydanticAI
on the same surface that broke Phase 0:
V1  codex auth             PASS  (1.9s)
V2  codex + tools + typed  PASS  (27.7s, 2.7x faster than 0a)
V3  claude via Copilot     FAIL  (but for a different reason)
V4  prompt cache           PASS  (79% hit)
V5  cost telemetry         richer than 0a (cache_write exposed)
Key finding: PydanticAI implements typed output as a hidden
``final_result`` function-tool call — the architecturally-clean pattern
by default. No strict_json_schema gotcha. But Claude on Copilot is
unreachable through PydanticAI's strict response validation (Copilot's
ChatCompletion response is missing required fields).
Net: neither substrate makes Claude-on-Copilot work. The root cause is
upstream of both libraries — Copilot's Chat Completions endpoint
doesn't behave like OpenAI's for the Claude family.
Phase 0b report lays out three paths:
A. Continue with OpenAI Agents SDK + LiteLLM, eat the workarounds
B. Switch to PydanticAI, validate against Anthropic-direct first
(needs API key)
C. Defer the migration; benefits are smaller than the spec assumed
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
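Both spikes report token counts but no `cost_usd`, and the Phase 0 scope notes propose computing it from a Maverick-owned price table. A minimal sketch of that arithmetic follows. The `Price` type, the `PRICE_TABLE` entry, and the rates are placeholders invented for illustration, and the code assumes cached tokens are reported as a subset of input tokens.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    """USD per million tokens (hypothetical structure)."""
    input_per_mtok: Decimal
    cached_input_per_mtok: Decimal
    output_per_mtok: Decimal


# Placeholder rates, NOT real prices for any provider or model.
PRICE_TABLE = {
    "example/model": Price(Decimal("1.00"), Decimal("0.10"), Decimal("4.00")),
}


def cost_usd(model: str, input_tokens: int, cached_tokens: int,
             output_tokens: int) -> Decimal:
    """Price one run, billing cached input tokens at the cached rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICE_TABLE[model]
    uncached = input_tokens - cached_tokens
    total = (uncached * price.input_per_mtok
             + cached_tokens * price.cached_input_per_mtok
             + output_tokens * price.output_per_mtok)
    return total / Decimal(1_000_000)


def cache_hit_ratio(input_tokens: int, cached_tokens: int) -> float:
    """Fraction of the prompt served from cache (0.0 when the prompt is empty)."""
    return cached_tokens / input_tokens if input_tokens else 0.0
```

Cache-write tokens are left out because only one of the two substrates exposes them. A real table would need a separate rate for them wherever the provider bills cache writes differently.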
Summary
Three commits across two spikes investigating the BURR.md migration
decision gate. The original Phase 0 (PR title) closed with "proceed
with caveats"; a sharp question from the reviewer surfaced a deeper
concern (MCP-era tool-use pain) and triggered Phase 0b against
PydanticAI as an alternative substrate.
Findings (both spikes together)
- `strict_json_schema=False`, Copilot IDE headers
- `StopAtTools`
- model doesn't call tools at all
- `ChatCompletion` schema (missing `index`)
- `cached_tokens` exposed
- `cache_read_tokens` and `cache_write_tokens` exposed
- `cost_usd` / `cache_write` no
- `cost_usd` still no

Crucial finding: PydanticAI implements typed output as a hidden `final_result` function-tool call by default, which is the architecturally clean pattern. No `strict_json_schema` gotcha; the pattern I argued for in code review is what PydanticAI ships.
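To make the "typed output as a function-tool call" idea concrete, here is a library-free sketch. It is not PydanticAI's implementation. The `ImplementationPayload` fields and the simplified tool-call dictionaries (`name` plus a JSON `arguments` string) are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, fields
from typing import List, Optional

FINAL_TOOL_NAME = "final_result"  # name taken from the finding above


@dataclass
class ImplementationPayload:
    """Hypothetical typed output; the real payload schema is not shown here."""
    summary: str
    files_changed: List[str]


def extract_final_result(tool_calls: List[dict]) -> Optional[ImplementationPayload]:
    """Return the typed output if the model called the final-result tool.

    `tool_calls` is a simplified stand-in for SDK objects: each entry has a
    "name" and a JSON-encoded "arguments" string.
    """
    expected = {f.name for f in fields(ImplementationPayload)}
    for call in tool_calls:
        if call["name"] != FINAL_TOOL_NAME:
            continue  # ordinary tool call, not the typed output
        args = json.loads(call["arguments"])
        if set(args) != expected:
            raise ValueError(f"final_result arguments must be exactly {sorted(expected)}")
        return ImplementationPayload(**args)
    # No final-result call: the model answered in text or ended the turn.
    return None
```

With this shape the typed result arrives as structured tool arguments, so there is no free-text JSON to unwrap. A model that never calls tools still produces `None`, which matches the Claude-on-Copilot behaviour described for the `StopAtTools` follow-up.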
Equally crucial: Claude on Copilot fails on BOTH substrates, for
different reasons. The root cause is upstream of either library —
Copilot's Chat Completions endpoint doesn't behave like OpenAI's for
the Claude family.
Files
- `scripts/spike-openai-agents-sdk-litellm.py` + results: Phase 0.
- `scripts/spike-v3-redo.py` + results: function-tool / `StopAtTools` follow-up. Validates that the pattern works for codex (cleaner than `output_type=`) but doesn't fix Claude on Copilot.
- `scripts/spike-pydantic-ai.py` + results: Phase 0b.
- `docs/migration-phase-0-report.md`: original Phase 0 report.
- `docs/migration-phase-0b-report.md`: Phase 0b report with three candidate paths (A: stay on OpenAI Agents SDK; B: switch to PydanticAI, requires Anthropic key for full validation; C: defer the migration).
What we still don't know
load-bearing question. Not testable in this dev env without an
Anthropic API key.
in-process Python SDK, or only via OpenCode's adapter.
Decision pending
Awaiting a call on Path A / B / C. Phase 1 is on hold.
🤖 Generated with Claude Code