feat(ccproxy): v2.0.0 — inspector architecture, lightllm, DAG pipeline, compliance by starbaser · Pull Request #16 · starbaser/ccproxy

starbaser · 2026-04-16T00:48:33Z

AI Summary

Complete rewrite of ccproxy from a LiteLLM proxy subprocess model to an in-process mitmproxy-based transparent LLM API interceptor. This is the v2.0.0 release (tagged v2.0.0-rc1).

Inspector architecture: mitmweb runs in-process via WebMaster API with dual listeners — reverse proxy + WireGuard namespace jail. No subprocess, no gateway server.
lightllm: Surgical nerve connector into LiteLLM's BaseConfig transformation pipeline, bypassing cost tracking and callback machinery entirely.
DAG-based hook pipeline: @hook(reads=..., writes=...) decorator-declared data dependencies, topologically sorted via Kahn's algorithm. Per-request overrides via x-ccproxy-hooks header.
SSE streaming: SseTransformer stateful stream callable — parses, transforms per-chunk via LiteLLM's provider iterators, re-serializes as OpenAI-format SSE.
Compliance profile learning: Provider-agnostic system that observes legitimate request shapes from WireGuard traffic and stamps compliance profiles onto proxied requests.
Gemini/Vertex AI support: Full routing, OAuth handling, context caching via cachedContents API, path rewriting for cloudcode-pa.googleapis.com.
Flows CLI: ccproxy flows list/dump/diff/compare/clear with multi-page HAR 1.2 output, jq filtering, and sliding-window diff across flow sets.
MCP notification endpoint: POST /mcp/notify for terminal event ingestion, buffered and injected as synthetic tool_use/tool_result pairs.
XDG config directory: Default config moved to ~/.config/ccproxy/ (breaking change).
init replaces install: CLI rename (breaking change).
Rich pipeline visualization: render_pipeline() builds a full DAG display with parallel groups via rich.columns.Columns.

Breaking Changes

Config directory: ~/.ccproxy/ → ~/.config/ccproxy/
CLI: ccproxy install → ccproxy init
--debug flag replaced by --log-level / -v
forward_port / reverse_port replaced by unified port config
mitm config section renamed to inspect
Prisma/database infrastructure removed entirely
LiteLLM proxy subprocess removed
to_mermaid / to_ascii removed from HookDAG

Test plan

just test passes with ≥90% coverage
just lint / just typecheck clean
Smoke test: ccproxy run --inspect -- claude --model haiku -p "what's 2+2"
Verify ccproxy init creates config at ~/.config/ccproxy/
Verify flows CLI: ccproxy flows list, ccproxy flows dump
Verify Gemini routing through inspector

Phase 3 of inspector stack enhancement — LiteLLM runs in its own WireGuard namespace, eliminating the HTTPS_PROXY env var hack: - process.py: launch mitmweb with two --mode wireguard: listeners (CLI port A, gateway port B), return 4-tuple with both ports - namespace.py: add create_gateway_namespace() with slirp4netns port forwarding (--port-map) for external HTTP client LAN access - addon.py: split ProxyDirection into WIREGUARD_CLI and WIREGUARD_GW, detect by comparing WG listen port against configured gateway port, set flow.metadata["ccproxy.direction"] for route handlers - cli.py: start LiteLLM inside gateway namespace via run_in_namespace(), fetch WG configs for both namespaces, remove HTTPS_PROXY/HTTP_PROXY - outbound.py: detect outbound via metadata instead of RegularMode - script.py: pass WG port config through to InspectorAddon

66 new tests covering: - routing.py: route dispatch, passthrough, host matching, error handling, path parameter extraction, blacklist domains (20 tests) - pcap.py: frame construction, sequence tracking, file/pipe output, addr normalization, payload building (17 tests) - wg_keylog.py: JSON parsing, format validation, error cases (5 tests) - routes/inbound.py: OAuth sentinel detection, token substitution, custom auth headers, direction tagging (9 tests) - routes/outbound.py: beta header merge, dedup, auth failure logging, direction filtering (10 tests) - addon.py: WIREGUARD_CLI vs WIREGUARD_GW detection, metadata tagging, forward guard, ProxyDirection enum stability (5 tests) Fix: route patterns use {path} instead of invalid {path:.*} (parse library doesn't support regex format specs)

Update architecture documentation with dual-WireGuard topology, xepor routing framework, PCAP synthesizer, WireGuard keylog export, OAuth dual-layer architecture, ProxyDirection enum values, provider-agnostic model, inspector addon chain ordering, and expanded test documentation.

Bridges inbound and outbound HTTPFlow objects via a thread-safe TTL store, enabling auth decisions from inbound routes to be readable in outbound routes. Removes vendored xepor routing code in favor of a thin InspectorRouter subclass for mitmproxy 12.x compatibility.

Introduces test_flow_store.py and test_telemetry.py with 188 new test cases covering FlowRecord dataclass defaults, flow creation/retrieval, TTL expiration, and InspectorTracer span lifecycle. Also adds edge case tests for _get_direction null handling in WireGuard mode.

Moves InspectorRouter class to a dedicated router.py module to better reflect its purpose as a xepor routing adapter. Updates all imports across the codebase and removes unused pyright overrides.

Consolidates OAuth and custom auth header logic under a unified naming scheme. Removes unused capture_bodies and excluded_hosts config fields, and standardizes Field default factories to use lambda. BREAKING CHANGE: renamed get_oauth_user_agent() to get_auth_provider_ua() and get_oauth_auth_header() to get_auth_header(); removed InspectorConfig.capture_bodies and excluded_hosts fields

Removes the subprocess-based script.py addon model and replaces it with direct in-process mitmproxy embedding. InspectorAddon now receives litellm_port as a constructor parameter instead of reading from environment, and namespace operations use async variants for event loop compatibility.

Clarifies that this field represents the HTTP header name used for authentication, not a lookup key. Updates all usages across flow store, routes, and tests. BREAKING CHANGE: AuthMeta.key_field renamed to auth_header; update instantiations to use auth_header parameter

Consolidates session ID parsing logic from multiple modules into a single reusable function in utils.py. Removes duplicate implementations from extract_session_id hook and inspector addon, improving maintainability.

- Defer web/streaming options via update_defer() since mitmproxy 12.x registers them through addons inside WebMaster.__init__, not on Options - Replace nonexistent --port-map flag with add_hostfwd API socket call (slirp4netns never had --port-map; this was a latent bug) - Bind LiteLLM to 0.0.0.0 in inspect mode so slirp4netns hostfwd traffic arriving at tap0 IP (10.0.2.100) reaches it without iptables - Pass litellm_port (not main_port) to gateway namespace — mitmproxy reverse proxy needs to reach LiteLLM, not the other way around

…aul logging Remove the vendored mitmpcap PCAP synthesizer (fake TCP/IP frame reconstruction) and replace with mitmproxy's native MITMPROXY_SSLKEYLOGFILE for real TLS key logging. Combined with the existing WireGuard keylog, packet captures can now be fully decrypted in Wireshark without synthetic frames. Overhaul logging to use unified tagged namespaces across all components: - Rewrite setup_logging() with stderr + truncate-on-restart file handler - Initialize config singleton early in main() for correct debug level - Route LiteLLM subprocess output through ccproxy.subprocess.litellm logger - Route slirp4netns output through ccproxy.subprocess.slirp4netns logger - Add nsenter command logging via ccproxy.subprocess.nsenter logger - Disable mitmproxy TermLog to prevent root logger hijack - Remove competing debug handler from CCProxyHandler.__init__ - Fix view_logs() missing -n flag for process-compose, add file fallback - Fix show_status() to report actual log file path - Gate web_open_browser on config, pass MitmproxyOptions through directly Deleted: inspector/pcap.py, tests/test_pcap.py, inspector/script.py references

Gemini CLI targets cloudcode-pa.googleapis.com (Google's proprietary Cloud Code API), which LiteLLM doesn't understand natively. Route this traffic through LiteLLM's /gemini/ pass-through endpoint with outbound host/path restoration so the correct upstream is reached. - Change forward_domains from list[str] to dict[str, str | None] where the value is the LiteLLM endpoint prefix (e.g. /gemini/) or None for direct forwarding - Add OriginalRequest dataclass to FlowRecord for storing the pre-rewrite host/port/scheme/path - Propagate flow ID through LiteLLM pass-through via x-pass- prefix (LiteLLM strips custom headers by default but always forwards x-pass-* headers with the prefix stripped) - Outbound handler looks up FlowRecord via flow ID header and restores original host/path before the request hits the provider - Split pyright (editor, standard mode) and mypy (CI, explicit strict flags) to eliminate cast+redundant-cast friction per Stainless SDK pattern: disable warn_unused_ignores and warn_redundant_casts - Add litellm stub modules for litellm_core_utils and proxy internals - Remove dead else-branch in hook registration loop (hooks list is typed list[str | dict], so the else was unreachable) - Annotate double-check lock pattern in ModelRouter with type: ignore[unreachable] since mypy can't model concurrent mutation

Introduces ccproxy.lightllm — a thin orchestration layer that imports LiteLLM's BaseConfig transformation pipeline directly and exposes it at the mitmproxy inspector layer. Zero vendored code; pure import glue. - dispatch.py: sequences validate_environment → get_complete_url → transform_request → sign_request for standard providers, with a dedicated Gemini path using _get_gemini_url + _transform_request_body - registry.py: wraps ProviderConfigManager (~90 providers for free) - noop_logging.py: duck-type stub for logging_obj parameter - inspector/routes/transform.py: mitmproxy route handler that matches InspectorConfig.transforms rules and rewrites flows to dest provider - TransformRoute config model on InspectorConfig.transforms - Transform router added to addon chain (after inbound, before outbound) - docs/light_llm_transform.md: full architecture reference

…l redaction - Strip ?key= from Gemini URL when using OAuth tokens (ya29.*), use Authorization: Bearer header only - Add match_model to TransformRoute for reverse proxy flows where all traffic arrives at the same host - Make match_host optional (None matches any host) - Parse request body before matching so match_model can inspect it - Collect hosts from pretty_host, Host header, and X-Forwarded-Host - Redact query params from transform log output (prevents credential leak)

The transform route now supports mode=passthrough which restores the original destination from FlowRecord.original_request, bypassing LiteLLM entirely. This fixes Gemini CLI routing — _maybe_forward rewrites cloudcode-pa.googleapis.com traffic to LiteLLM's /gemini/ pass-through, which incorrectly routes to generativelanguage.googleapis.com. The passthrough mode intercepts at the inbound layer and sends traffic directly to cloudcode-pa.googleapis.com with the CLI's own OAuth token. Verified: `ccproxy run --inspect -- gemini -p "..."` returns correct responses through the passthrough route.

…request path The lightllm nerve connector now handles all provider transformations directly at the mitmproxy layer. Traffic flows client → mitmweb → [inbound → transform → outbound] → provider with no LiteLLM subprocess or second WireGuard tunnel. - Remove _maybe_forward(), gateway direction detection, litellm_port - Collapse three mitmproxy listeners to two (reverse + WG-CLI) - Delete create_gateway_namespace() and run_in_namespace_async() - Remove forward_domains from InspectorConfig - Rewrite outbound routes for post-transform fixups (beta headers, Claude Code identity injection, auth failure observation) - Add fallback policy: WG flows passthrough, reverse proxy gets 501

…spector Context is now flow-native — wraps HTTPFlow as first-class member with body fields parsed once and flushed via commit(). Header mutations are live. Removes from_litellm_data/to_litellm_data. PipelineExecutor.execute() takes HTTPFlow directly. Two-DAG addon chain: inbound pipeline (OAuth, session extraction) → transform (lightllm) → outbound pipeline (beta headers, identity injection). Hooks adapted for flow-native Context: - forward_oauth: sentinel substitution + cached token via set_header() - add_beta_headers: single-write merge, anthropic-version guard - inject_claude_code_identity: string + list system types - extract_session_id: reads ctx.metadata, drops Langfuse plumbing - verbose_mode: strips redact-thinking-* via get/set_header() Config hooks field now supports inbound/outbound dict structure.

Remove handler.py, router.py, metadata_store.py, classifier.py, rules.py, patches/, and LiteLLM-only hooks (rule_evaluator, model_router, forward_apikey, capture_headers). Delete inbound.py and outbound.py route handlers (replaced by DAG pipeline). ccproxy start no longer has --inspect flag — inspect mode is the default. The non-inspect LiteLLM subprocess path is removed along with generate_handler_file(). ccproxy run --inspect remains for WG namespace jail. Update Nix defaults and YAML template to two-stage hook dict format. Strip RuleConfig, patches, default_model_passthrough from config. -9,470 lines deleted across 42 files.

Old CLAUDE.md documented the deleted LiteLLM handler/classifier/router pipeline. Rewritten from scratch to reflect the current architecture: mitmweb in-process, lightllm nerve connector, DAG-driven hook pipeline, single WireGuard tunnel. Marketplace plugin sync section preserved.

…oxy.yaml LiteLLM proxy was removed but config.yaml (its config file) persisted as dead weight. Delete it and promote host/port to first-class CCProxyConfig fields with CCPROXY_ env prefix override via pydantic-settings.

…rmation Universal SSE streaming: responseheaders hook on InspectorAddon detects text/event-stream responses and enables flow.response.stream before the body arrives — fixes client hanging for all providers. Cross-provider response transformation: SseTransformer wraps LiteLLM's per-provider ModelResponseIterator.chunk_parser() to rewrite SSE chunks on the fly. Non-streaming responses use transform_to_openai() via a MitmResponseShim that duck-types httpx.Response. TransformMeta on FlowRecord propagates provider/model/request_data from request phase to response phase.

extract_session_id wrote session_id into the body's metadata dict, which upstream APIs reject (Anthropic: "Extra inputs are not permitted", Google: "Unknown name metadata"). Store on flow.metadata instead. Context.metadata getter uses setdefault which creates an empty metadata key even for read-only guard checks. Strip empty metadata dicts in commit() so they don't leak into the request body.

extract_session_id declared writes=["session_id"] but now writes to flow.metadata — update to writes=[]. inject_mcp_notifications read session_id from ctx.metadata (body) which was always empty after the previous fix — read from flow.metadata instead.

Hardcoded 40-char width caused right border misalignment when parallel group labels overflowed. Width now computed from longest content line.

…urce The LiteLLM proxy server was removed several commits ago but many files still described the old architecture. This commit systematically removes every stale reference: rewrites README, configuration, and inspect docs from scratch; deletes the superseded skills/using-litellm-ccproxy skill; drops 8 unused dependencies from pyproject.toml; removes 9 dead type stubs; fixes source docstrings/comments/types across 6 source files; and cleans infrastructure files (process-compose, docker-compose, nix module, .gitignore).

…roxy_pplx_thread → session_id Generalizes the Perplexity-specific thread metadata field to the provider-agnostic `metadata.session_id` so any provider can use it for session/thread continuation. The extracted 10 MCP tools (pplx_usage, thread list/get/import/rename/share/delete/ bulk-delete/export) now live in the standalone ccpplx project at ~/dev/projects/ccpplx, which imports ccproxy as a runtime library dependency. ccproxy's MCP server drops from 22 tools to 12, keeping only flow inspection, shape capture, conversation grouping, and model catalog tools.

Exposes Perplexity thread history as OpenAI-shaped messages via new inspector route. Followup requests now send only the last user turn instead of flattened history when last_backend_uuid is present.

Adds pplx_steps module with renderers covering MCP_TOOL_INPUT/OUTPUT, web search, browser agent, image gen, calendar/email, code execution, and a generic catch-all that DEBUG-logs unmapped step_types instead of dropping silently. Dispatcher uses the lowercase content-field naming convention reverse-engineered from the SPA bundle (MCP_TOOL_INPUT → mcp_tool_input_content) so it covers the full 68-value step_type enum. _extract_deltas now walks plan_block.steps[] (the structured channel), gates text-field-JSON step processing on "no plan_block in this event" to avoid double-emit, pairs MCP_TOOL_OUTPUT to its INPUT by goal_id to recover tool_name (structured channel omits it on outputs), handles bare markdown_block (no diff_block wrapper), dedups step uuids across cumulative events, and DEBUG-logs unknown intended_usage block types once per stream. Surfaces as delta.reasoning_content (Claude-style thinking) plus non-spec response fields: pplx_mcp_steps, pplx_steps, pplx_goals, pplx_pending_followups, pplx_thread_title. response.model now reflects the upstream display_model (e.g. "claude46sonnet") instead of the requested alias. Removes the dead user-defined-tool prompt-injection experiment (pplx_tools.py + pplx_tool_inject hook + related tests/example): defeated by every frontier model tested in 2026 — the real tool-calling path on Perplexity is the server-side MCP connectors flow this commit now properly surfaces.

Prevents TypeError in curl-cffi when timeout=None is passed to client.request, which crashes on None + None arithmetic in set_curl_options.

Perplexity changed thread response shape from step-based `structured_answer[]` to block-based `blocks[]` with `intended_usage` keys. New parser reads `structured_answer_block_usages` hint and extracts answer from `markdown_block`, citations from `web_result_block`.

Adds inbound parsers (Anthropic Messages, OpenAI Chat) that produce ParsedRequest with pydantic-ai ModelMessage IR, outbound renderers that use pydantic-ai's per-provider Model._map_message via a CaptureSentinel pattern (Anthropic, OpenAI, Google, in-tree Perplexity), and a sync response pipeline (vendor-side intakes driving ModelResponsePartsManager directly, listener-side renderers emitting Anthropic Messages SSE and OpenAI Chat Completion SSE). Context gains _listener_format pinning and ensure_parsed() lazy bridge. Inspector rewire to consume these modules follows in Phase 8.

…fields SsePipeline is the sync callable that bridges upstream wire bytes → listener wire bytes via the IR pipeline; buffered.py handles the non-streaming counterpart. TransformMeta gains optional listener_format and request_parameters fields so the response-side pipeline can pick the right renderer and construct ModelResponsePartsManager. The actual inspector swap (transform_to_provider → render_outbound, SseTransformer → SsePipeline) is deferred to a follow-up; this commit lands the modules and integration tests that lock in their contracts.

_handle_transform routes through render_outbound_sync (private event loop wrapping the async renderer) for non-Gemini providers; Gemini keeps the existing lightllm path until cachedContents is folded into outbound_google.py. responseheaders installs SsePipeline (intake + render via select_*) when transform.listener_format and request_parameters are available, falls back to passthrough otherwise. TransformMeta populated by the transform router from the Context's inbound parse.

…e.py Context.messages/system/tools now read from self.parse_sync() (the pydantic-ai IR via the inbound parser) instead of wire.py's lossy parse_messages/parse_system/parse_tools. Setters update the IR cache; commit() re-renders to listener-format wire bytes via the outbound renderer to refresh self._body for hooks that operate on raw body. Deletes pipeline/wire.py, pipeline/types.py (CachedSystemPromptPart / CachedToolDefinition replaced by pydantic-ai's settings-level cache control), tests/test_wire.py. Removes phase8.md (now obsolete). Lightllm/dispatch.py + registry.py + noop_logging.py + test_lightllm_dispatch.py stay alive for the Gemini cachedContents carve-out — pending follow-up.

parse_sync and render_outbound_sync previously created a private event loop and called run_until_complete unconditionally. When invoked from a sync hook running inside mitmproxy's async runtime (e.g. inject_mcp_notifications reading ctx.messages), asyncio raised "Cannot run the event loop while another loop is running" because nested run_until_complete in the same thread isn't allowed. Add a worker-thread fallback: if a running loop is detected on the current thread, dispatch the awaitable to a ThreadPoolExecutor that owns its own private loop. The CaptureSentinel pattern keeps this bounded.

Replace the CaptureSentinel + AnthropicModel/OpenAIChatModel instantiation hack with pydantic-graph FSM dumps and per-listener parsers with FSM loads. The new lightllm/graph/ package owns dispatch_load / dispatch_dump_sync; Context.ensure_parsed and inspector/routes/transform.py call through it. Anthropic and OpenAI dumps build their wire bodies directly from typed SDK TypedDicts (anthropic.types.beta.*, openai.types.chat.*) via per-IR-part nodes routed by structural pattern matching, with an ApplyCacheNode middleware that attaches cache_control to the last-emitted block. Google and Perplexity dumps move into the graph package under their original mechanisms (Google still wraps pydantic-ai's GoogleModel; Perplexity remains a clean IR-to-helper bridge). KEEPS Context._run_coro_sync and the worker-thread bridge. pydantic_graph's Graph.run_sync is deprecated and uses loop.run_until_complete (graph.py:189), which crashes inside mitmproxy's running asyncio loop -- the bug commit 016d7d1 already fixed. The FSM nodes are async def run(...); they are driven via await graph.run(...) inside the bridge. 1689 tests pass, matching baseline 9e8aa30. Lossiness regressions for tool_name two-pass, image media_type, non-standard cache TTLs, and unknown content blocks are preserved verbatim. Test files renamed to tests/test_lightllm_graph_*.py with the implementation parametrize collapsed to fsm-only.

AGENTS.md becomes the tracked canonical (Codex native). CLAUDE.md is a small file containing @AGENTS.md (Claude Code import). Both files tracked; consistent across all user repos.

Migrates anthropic_dump, openai_dump, and openai_load from pydantic_graph's BaseNode class-based FSM to pydantic_graph.beta's GraphBuilder step-based FSM. Replaces class-per-operation with function-per-operation for cleaner dispatch.

Migrates from pydantic_graph's BaseNode class hierarchy to pydantic_graph.beta's GraphBuilder pattern with typed dispatch envelopes, eliminating boilerplate run() methods while preserving the same FSM logic.

mypy 1.19 does not recognize pydantic_graph.beta's infer_variance TypeVars as generic at runtime, causing cascading type errors in FSM wire-translation modules that pyright handles correctly.

…e litellm Completes the bi-modal → symmetric-FSM migration planned in nextplan.md (phases J–S). New graph/*_intake.py + graph/*_render.py modules plus graph/sse_pipeline.py (persistent asyncio loop per stream) and graph/buffered.py replace the hand-rolled lightllm/response/ subpackage and the LiteLLM-mediated dispatch.py + context_cache.py + noop_logging.py. litellm is removed from src/ and pyproject.toml; the request and response sides now share one FSM idiom, one dispatcher pattern, and one IR boundary in both directions.

Replaces the four FSM modules (anthropic_load, anthropic_dump, openai_load, openai_dump) with procedural AnthropicAdapter and OpenAIChatAdapter classes that extend pydantic-ai's UIAdapter. Removes dispatch_load and simplifies the request-side translation to synchronous code using MessagesBuilder and SDK TypedDicts directly.

- adapters/google.py: direct generateContent wire construction; kills CaptureSentinel exception-capture hack in graph/google_dump.py (deleted) - adapters/perplexity.py: thin wrapper around pplx.py:_build_pplx_payload; graph/perplexity_dump.py deleted (now 1-line indirection) - graph/__init__.py:dispatch_dump_sync routes all providers (Anthropic, OpenAI, Google, Perplexity) through adapters/; async dispatch_dump kept only as test-compat shim - lightllm/graph_ext.py: monkey-patches GraphBuilder.add_subgraph and wraps Graph.render so future SSE FSM refactors can compose subgraphs. Applied at lightllm import time via idempotent apply_patches() - pipeline/results.py: Temporal-style HookResult discriminated union (_HookSuccess | _HookSkipped | _HookError | _HookDeferred) with wrap/unwrap helpers; executor.py captures every invocation, flow records carry structured failure metadata - adapters/{anthropic,openai_chat,_envelope}.py: thread raw_extras through load_messages so refusal text, INVALID_JSON wrapping, image_detail, file blocks, unknown blocks, and non-standard cache TTLs all survive round-trip - _envelope.py:_render_anthropic re-attaches anthropic_cache_instructions to system blocks at dump time - hooks/pplx_thread_inject.py: fix pre-existing mypy arg-type + no-any-return on the thread-fetch helper

Bumps pydantic-ai-slim / pydantic-graph to >=1.99.0 (resolved 1.101.0) to escape the deprecated pydantic_graph.beta namespace and pick up the typed-promotion ModelResponsePartsManager API. All six lightllm/graph/ intake/render modules now import from canonical pydantic_graph paths. Adapters: Google and Perplexity are full UIAdapter subclasses for parity with Anthropic/OpenAI; load_messages raises NotImplementedError since both are outbound-only. Each adapter gains a render(req) classmethod that takes an LLMRenderInput Protocol and returns wire bytes; dispatch_dump_sync now routes through these. Context owns typed IR state directly via five lazy-parsed slots (_cached_messages, _cached_system, _cached_request_parameters, _cached_settings, _cached_raw_extras); parse_sync returns None and populates in-place. The previous ParsedRequest bridge is gone from the production hot path. ParsedRequest survives in parsed.py as a frozen LLMRenderInput stub used by tests and the inspector flow-enrichment shim parse_request(); ParsedResponse was unused and removed. graph_ext.py and its add_subgraph monkey-patch are deleted along with the 5 covering tests — subgraph composition is the wrong granularity for request-side dump methods (9-73 line ranges, no dispatch ladders) and the canonical pydantic_graph.GraphBuilder has no add_subgraph either. If response-side intake decomposition (Phase F Stages 2-5) materializes later, it lands on canonical primitives. Other 1.99 deprecation rebasing: BuiltinToolCallPart → NativeToolCallPart in anthropic_intake/render; ModelResponsePartsManager(model_request_parameters=...) threaded through all four intake constructors; pydantic-ai-slim acquires the [anthropic] optional group (no longer bundled). Ruff cleanup picks up ListenerFormat → StrEnum and the SIM108/SIM102/RUF002 leftovers in lightllm/. docs/lightllm.md rewritten to reflect the post-refactor architecture, HookResult discriminated union, LLMRenderInput Protocol, and adapter walkthrough. 1659 tests pass (baseline 1664 minus the 5 graph_ext tests); mypy + ruff clean tree-wide; inspector smoke (claude --model haiku) succeeds end-to-end.

- The FSM pattern section used invented dump-side symbol names (AnthropicDumpState, parse_text, _DumpDone, apply_cache, _dump_graph, render_anthropic_dump) that don't exist in the codebase. Replaced with the real anthropic_intake.py shape (_AnthropicIntakeState, frame_next_event, handle_content_block_*, _FeedDone, _IgnoredEvent, _intake_graph, AnthropicResponseIntakeFSM.feed). Reframed to make clear the FSM idiom is response-side only; request side is procedural adapter classmethods. - GoogleAdapter description claimed it wraps pydantic-ai's GoogleModel. It doesn't — it does direct generateContent wire construction (camelCase keys, base64 inline data, generationConfig hoist). - Roundtrip test snippet showed AnthropicAdapter.load_messages returning a (messages, settings, raw_extras) tuple. Actual signature returns list[ModelMessage]; settings and raw_extras come from envelope helpers and are passed through via raw_extras kwarg. - Visualization example imported _dump_graph from anthropic_dump (deleted module). Replaced with _intake_graph from anthropic_intake and listed the other graph names. - Lossiness invariants section dropped the obsolete "pre-FSM wire.py predecessor" reference; rewrote to describe the current adapter contract instead. - File map deduplicated the SSE pipeline row.

The b3089a7 refactor moved Context's cached IR state from a single ``_parsed: ParsedRequest | None`` slot into five lazy-parsed fields (``_cached_messages``, ``_cached_request_parameters``, ``_cached_settings``, ``_cached_raw_extras``, ``_cached_system``). ``Context.commit()`` re-renders the IR back to ``_body`` whenever ANY of these are populated. When an earlier outbound hook (``commitbee_compat``, which always reads ``ctx.system``) triggers ``parse_sync()``, all five slots get populated from the pre-shape body. The shape hook then replaces ``ctx._body`` with the captured Claude CLI envelope via ``apply_shape`` — but the cached IR is now stale. ``commit()`` re-renders the IR back to bytes, clobbering the shape's envelope: forwarded body ships only ``{model, messages, max_tokens}`` with no ``system``, no ``metadata``, no billing header. For Claude-CLI clients this still worked accidentally because their own request body carries the right shape. For plain Anthropic-SDK clients sending sentinel keys, Anthropic's anti-abuse path returns 429 ``rate_limit_error`` with empty ``message: "Error"`` when it sees Claude-CLI headers attached to a bare SDK body. Fix: ``apply_shape`` calls ``ctx.invalidate_parsed()`` after writing ``_body``, dropping the stale cache so ``commit()`` sees no cached state and leaves ``_body`` (the shape) alone. Verified with ``docs/sdk/anthropic_sdk.py`` against the dev daemon — both simple and streaming requests now return 200. Tests still pass (1659).

Closes the deferred Phase F (per-step decomposition) and Phase H (typed part promotion) items from next.md, plus fixes two pre-existing bugs the work surfaced. Phase F — subgraph composition via temporary GraphBuilder.add_subgraph patch (lightllm/graph/_subgraph_patch.py) tracking upstream TODO at pydantic_graph/graph_builder.py:1469. Perplexity's 142-line _dispatch_one_event is gone — replaced by a per-event inner graph (absorb_event → text_mirror → pop_next_block → {plan_arm → bare_markdown_arm → diff_block_arm | flush}) that preserves the cross-block has_plan_block invariant and the single end-of-event flush via per-event scratch fields on _PerplexityIntakeState. Google's handle_generate_chunk is gone — replaced by a per-chunk inner graph that classifies parts via a typed-marker decision across five arms. Shared StateT flows through unchanged so the inner graphs mutate the same state instance the outer FSM owns. Phase H — thread tool_kind through the listener parse boundary so ModelResponsePartsManager auto-promotes ToolCallPart to its typed subclass (e.g. ToolSearchCallPart for web_search_20250305). New adapters/_tool_kinds.py maps wire `type` discriminators to ToolPartKind; _parse_tools in both envelopes reads it. Regression test at tests/test_lightllm_graph_intake_anthropic.py asserts the promotion. pplx_stamp_headers — restores the Perplexity Pro browser-shape header bundle (Cookie: __Secure-next-auth.session-token=…, Chrome UA, Origin, Referer, x-perplexity-*, x-app-api*, sec-fetch-*) that the litellm removal in 96db672 silently dropped along with PerplexityProConfig.validate_environment. Without this, every /rest/sse/perplexity_ask call returned 403. Also swaps perplexity_pro auth.file to ~/.opnix/secrets/perplexity-pro-api-key to match the production opnix convention. commitbee_compat — guard against non-dict bodies (Anthropic /api/v2/logs posts a list-shaped event batch) so the hook short-circuits cleanly instead of crashing on ctx._body.get(). Regression test at tests/issues/regression/test_commitbee_list_body.py. Docs — align AGENTS.md project overview, lightllm subsection, hook table, provider description, prompt-caching note, and stubs list to the post-litellm-removal reality. docs/lightllm.md gains a Subgraph composition section + Typed-part promotion section, refreshed module layout, FSM-file table, mermaid section, and file map. docs/mcp.md, docs/inspect.md, docs/configuration.md, docs/sdk/README.md get their stale litellm references replaced. Verified end-to-end: 1668 pytest passing (+9 new), mypy/ruff clean, deprecation-warnings-as-errors gate clean, mermaid sanity clean, and the live smoke matrix passes rows 1 (Claude CLI), 2 (SDK shape replay / former 429 reproducer), 11 (Gemini CLI), 12 (Perplexity Pro).

Sonnet LSP audit confirmed three orphan symbols left over from the litellm-removal refactor: - ``PerplexityProConfig`` class in ``lightllm/pplx.py`` (zero external references — ``PerplexityAdapter.render`` goes directly to ``_build_pplx_payload``). - ``lightllm/registry.py`` module entirely (``_LOCAL_CONFIGS`` and ``get_config`` referenced only by themselves and dead tests). - Their exports from ``lightllm/__init__.py``. Deleted plus ``tests/test_lightllm_registry.py`` and the three matching test functions in ``tests/test_lightllm_pplx.py`` (registry resolver + two ``transform_request`` tests). 1663 pytest still passing (was 1668; 5 deleted dead tests). Also added ``web_search_20260209`` to ``_tool_kinds.ANTHROPIC_TYPED_TOOLS`` (per the Anthropic SDK's currently shipped dated variants) and documented the scope constraint inline: pydantic-ai's ``ToolPartKind`` is ``Literal['tool-search']`` today, so only ``web_search_*`` variants map until upstream registers more kinds (the bash / code_execution / computer / text_editor / web_fetch families have no ``ToolPartKind`` equivalents yet). OpenAI Chat Completions ``tools[].type`` is ``Literal['function']`` only (verified against ``openai/types/chat/``), so ``OPENAI_TYPED_TOOLS`` stays empty until ccproxy adds a Responses API listener. Doc cleanup: ``ParsedRequest`` is now correctly described as **test-only**. The previous docstring + ``docs/lightllm.md`` claim that the inspector used ``parse_request`` for "flow enrichment" was stale — the inspector goes through ``Context.from_flow`` → ``Context.parse_sync`` → ``parse_request_into_fields`` (in-place population), like all production code.

…nder Three independent ergonomic improvements landed together; zero behavior change. - Naming pass. ListenerFormat -> InboundFormat (StrEnum) so the type name matches the canonical inbound/outbound axis used everywhere else. Provider.provider -> Provider.type so the field matches the AuthSource.type discriminator pattern. TransformMeta.provider -> .provider_type, TransformMeta.listener_format -> .inbound_format. Dispatch kwarg renames: upstream_provider/provider -> provider_type, listener_format -> inbound_format. Metadata key ccproxy.listener_format -> ccproxy.inbound_format. _select_listener_format -> _select_inbound_format. Nix-side YAML: providers.X.provider -> providers.X.type in nix/defaults.nix + bundled template. - Context.extras. ~60 LOC typed accessor (.get/.set/.delete/.has) over ctx._body via glom, exposed as layer 3 of the three-layer access model alongside the header and typed-IR layers. Existing glom(ctx._body, ...) callers stay valid; migration is opportunistic. - HookDAG.render(). Emits stateDiagram-v2 mermaid markup walking the topo-sorted execution order with [*] brackets for sources/sinks. ccproxy status --mermaid prints inbound + outbound DAGs as paste-ready output. AGENTS.md + docs/lightllm.md updated to reflect the renames, the new Context.extras layer, and the --mermaid CLI flag. phase4.md added as the next-session plan for OpenAI Responses (Codex parity). Verified: 1671 tests pass, mypy clean across 103 source files, grep for ListenerFormat / listener_format / upstream_provider / _listener_format returns zero matches in src/ tests/ docs/ AGENTS.md nix/.

Apply Tier 1+2+3 cuts from the removal-candidates plan: - Delete pure duplicates: Marketplace Plugin Sync, Defaults Flow diagram, MCP tool enumeration, transport constants, FlowRecord field listing, historical commit references. - Compress subsystem deep-dives with canonical homes elsewhere: lightllm (docs/lightllm.md), Perplexity Pro narrative (docs/pplx.md), oauth/sources prose, Anthropic billing two-phase signing (regenerate.py docstring), inspector + pipeline per-file enumerations, dev-vs-prod section. - Selective trim: hook table Purpose column to single-sentence form, Configuration narrative dedupe, Smoke Test prose, SSL/Logging Implementation Notes entries. Preserve all load-bearing content: both IMPERATIVE blocks (shape replay; Perplexity docs gate), Triage Principle, three-layer access model, hook table rows, sentinel-key concept, routing precedence, Key Constants, Body metadata footgun, SSE streaming + namespace localhost routing notes.

Enables bidirectional transform for OpenAI's Responses API (used by Codex CLI). Handles 27-item discriminated union in input[], preserving reasoning blocks and server-side tool calls via raw_extras for lossless round-trip.

starbaser added 30 commits April 6, 2026 19:19

refactor(ccproxy): rename routing.py to router.py for clarity

fdeeaac

Moves InspectorRouter class to a dedicated router.py module to better reflect its purpose as a xepor routing adapter. Updates all imports across the codebase and removes unused pyright overrides.

refactor(ccproxy): extract parse_session_id to shared utility

d6af1f4

Consolidates session ID parsing logic from multiple modules into a single reusable function in utils.py. Removes duplicate implementations from extract_session_id hook and inspector addon, improving maintainability.

test(pipeline): add PipelineExecutor test coverage

91f7439

test(lightllm): add match_model and null match_host coverage

d8189ec

docs: update CLAUDE.md with response flow, SSE streaming, metadata notes

84fc302

fix(dag): auto-size box width in ASCII DAG visualization

343a763

Hardcoded 40-char width caused right border misalignment when parallel group labels overflowed. Width now computed from longest content line.

cleaning

b6f334e

starbaser added 30 commits May 14, 2026 16:18

flake.nix

7698c24

refactor(ccproxy): disable websockets in run_inspector uvicorn config

ef6b3fc

feat(ccproxy): add GET /pplx/messages endpoint for session resume

bb19287

Exposes Perplexity thread history as OpenAI-shaped messages via new inspector route. Followup requests now send only the last user turn instead of flattened history when last_backend_uuid is present.

refactor(ccproxy): use explicit timeout in _attempt_request

f0a36cf

Prevents TypeError in curl-cffi when timeout=None is passed to client.request, which crashes on None + None arithmetic in set_curl_options.

cleaned up old plan files

9e8aa30

refactor: rename CLAUDE.md → AGENTS.md; CLAUDE.md imports via @

435249a

AGENTS.md becomes the tracked canonical (Codex native). CLAUDE.md is a small file containing @AGENTS.md (Claude Code import). Both files tracked; consistent across all user repos.

Rename Sse* to SSE*

ad6d1a7

refactor(ccproxy): replace user-turn nodes with GraphBuilder functions

d6007ea

Migrates from pydantic_graph's BaseNode class hierarchy to pydantic_graph.beta's GraphBuilder pattern with typed dispatch envelopes, eliminating boilerplate run() methods while preserving the same FSM logic.

chore: disable mypy errors for pydantic_graph TypeVar inference

4dd9765

mypy 1.19 does not recognize pydantic_graph.beta's infer_variance TypeVars as generic at runtime, causing cascading type errors in FSM wire-translation modules that pyright handles correctly.

feat(ccproxy): add OpenAIResponsesAdapter for /v1/responses

d962a21

Enables bidirectional transform for OpenAI's Responses API (used by Codex CLI). Handles 27-item discriminated union in input[], preserving reasoning blocks and server-side tool calls via raw_extras for lossless round-trip.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ccproxy): v2.0.0 — inspector architecture, lightllm, DAG pipeline, compliance#16

feat(ccproxy): v2.0.0 — inspector architecture, lightllm, DAG pipeline, compliance#16
starbaser wants to merge 407 commits into
mainfrom
dev

starbaser commented Apr 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

starbaser commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Summary

Breaking Changes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

starbaser commented Apr 16, 2026 •

edited

Loading