feat: multi-agent handoff, local recording, preemptive generation, semantic EOU + review-wave fixes (both SDKs) #169
Merged
Several STT/TTS/realtime provider option classes reference Union[...] in
type annotations but never import it. `from __future__ import annotations`
masked the omission at import time, but typing.get_type_hints() and other
runtime annotation introspection (Pydantic, docs tooling, inspect with
eval_str=True) raised `NameError: name 'Union' is not defined`.
Affected: assemblyai_stt, cartesia_stt, soniox_stt, whisper_stt, rime_tts,
lmnt_tts, gemini_live, ultravox_realtime. Python-only fix (TS unaffected).
https://claude.ai/code/session_01Nrb3ZoVFc6K4v1asd2jN8P
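The failure mode can be reproduced in isolation. A minimal sketch, assuming a hypothetical `options` function rather than any real provider class; the quoted annotation stands in for what `from __future__ import annotations` does to every annotation in a module:

```python
import typing


def options(language: "Union[str, None]" = None) -> None:
    """Hypothetical stand-in for a provider option class."""


def hints_or_error(fn):
    """Return the resolved hints, or the NameError raised while resolving them."""
    try:
        return typing.get_type_hints(fn)
    except NameError as exc:
        return exc


# Defining options() succeeds because the annotation is only a string;
# resolving it is what fails while Union is not in the module globals.
before = hints_or_error(options)

from typing import Union  # noqa: E402  (the import the modules were missing)

after = hints_or_error(options)
```

Here `before` holds the NameError and `after` holds the resolved hints, which is the same difference a Pydantic model or a docs build would see.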
The TypeScript Anthropic/Google/Groq/Cerebras providers returned silently
on a non-2xx LLM response instead of throwing. Two regressions followed:
- FallbackLLMProvider treated a generator that completed with zero
chunks as success, so it never failed over to the next provider.
- The stream handler only speaks `agent.llmErrorMessage` when the LLM
loop throws, so a silent return produced dead air on the call.
Python (anthropic/google via vendor SDKs, groq/cerebras via the openai
SDK) already raises on HTTP errors, and the TS OpenAI provider already
throws PatterConnectionError — these four were the outliers. Make them
throw PatterConnectionError too, and cap the logged/thrown error body to
200 chars (provider 401 bodies have been observed to embed the rejected
API-key prefix).
Updates the two Cerebras tests that asserted the old silent-drain
behaviour to expect the throw while still verifying the recovery-hint log.
https://claude.ai/code/session_01Nrb3ZoVFc6K4v1asd2jN8P
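Why a silent return defeats failover can be shown with a small Python sketch. Everything here is hypothetical: `ProviderError` stands in for PatterConnectionError, and the two generator functions stand in for real providers. Only the 200-character cap comes from the commit message.

```python
import asyncio

MAX_ERROR_BODY = 200  # cap taken from the commit message


class ProviderError(Exception):
    """Hypothetical stand-in for PatterConnectionError."""


def http_error(status: int, body: str) -> ProviderError:
    # Cap the provider's response body before it reaches logs or callers.
    return ProviderError(f"HTTP {status}: {body[:MAX_ERROR_BODY]}")


async def failing_provider(prompt: str):
    raise http_error(401, "invalid key sk-" + "x" * 500)
    yield  # unreachable; its presence makes this an async generator


async def healthy_provider(prompt: str):
    for word in ("hello", "world"):
        yield word


async def stream_with_fallback(providers, prompt: str):
    last_error = None
    for provider in providers:
        try:
            async for chunk in provider(prompt):
                yield chunk
            return  # this provider finished without raising
        except ProviderError as exc:
            last_error = exc  # only a raised error can trigger failover
    raise ProviderError("all providers failed") from last_error


async def collect(agen):
    return [chunk async for chunk in agen]


chunks = asyncio.run(
    collect(stream_with_fallback([failing_provider, healthy_provider], "hi"))
)
```

If `failing_provider` returned quietly instead of raising, the loop would hit `return` after zero chunks and the healthy provider would never run. The sketch does not handle a provider that fails after it has already yielded chunks.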
…ort gaps - client.py: PipelineHooks/ConsultConfig/CallResult/RealtimeTurnDetection (models) and VADProvider/AudioFilter/BackgroundAudioPlayer (providers.base) were referenced in Patter.agent()'s signature but never imported — IDEs and typing.get_type_hints() raised NameError on the SDK's main entry point. Tool/SpeechEventCallback move from TYPE_CHECKING to runtime imports (no cycle), so get_type_hints(Patter.agent) now fully resolves. - models.py: BargeInStrategy added to the TYPE_CHECKING block (same bug). - google_llm.py: missing Union import (companion to the earlier 8-module fix), drop dead api_key local, unshadow call_id loop variable. - __init__.py: 53 provider option enums were re-exported but missing from __all__ (import * / doc tooling missed them); stt/tts package __all__ gain openai_transcribe, elevenlabs_ws, inworld. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…e stream timeout
Python:
- llm_loop: when cancel_event fires mid-stream every provider returns
cleanly, leaving truncated tool-call JSON accumulated — the loop then
executed those tools with {} arguments after the caller interrupted
(transfer/SMS/booking firing with empty payloads). Bail out before tool
dispatch on cancel, and answer malformed-JSON tool calls with an error
envelope instead of executing with guessed arguments.
- stream_handler/test_mode: history was snapshotted AFTER pushing the
current user turn while LLMLoop._build_messages appends user_text
itself — every request carried the user utterance twice.
- cerebras: 404 model_not_found was swallowed (empty stream looks like
success → no fallback failover, no spoken llm_error_message, dead air).
Now logs the recovery hint and re-raises, mirroring TS; test updated.
- anthropic/google: prepend a synthetic user turn when history starts
with the first_message greeting (Messages API requires user-first;
Gemini same shape), map Gemini functionResponse.name back to the real
function name via the paired functionCall (spec requires the names to
match), subtract cached tokens from Gemini input usage.
- chat_context: to_anthropic folded role:"tool" entries into user turns
(Anthropic 400s on tool role); truncate drops leading orphan tool
results (bare tool_call_id 400s on OpenAI).
- fallback_provider: forward caller/callee to delegates and only pass the
context kwargs each delegate's stream() declares — a minimal custom
provider no longer TypeErrors on every attempt (availability flapping).
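The llm_loop guard in the first item above can be sketched as follows. This is an illustration under assumptions, not the SDK's code: `dispatch_tool_calls`, the call dict shape and the result envelope are all hypothetical, and a `threading.Event` stands in for cancel_event.

```python
import json
import threading


def dispatch_tool_calls(tool_calls, handlers, cancel_event):
    """Run tool calls unless the turn was cancelled; never guess arguments."""
    if cancel_event.is_set():
        return []  # accumulated JSON may be truncated: execute nothing
    results = []
    for call in tool_calls:
        try:
            args = json.loads(call["arguments"])
            if not isinstance(args, dict):
                raise ValueError("arguments must be a JSON object")
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            results.append({"id": call["id"], "error": f"invalid arguments: {exc}"})
            continue
        results.append({"id": call["id"], "result": handlers[call["name"]](**args)})
    return results


sent = []
handlers = {"send_sms": lambda to, body: sent.append((to, body)) or "queued"}
truncated = [{"id": "c1", "name": "send_sms", "arguments": '{"to": "+1555'}]

cancelled = threading.Event()
cancelled.set()
after_cancel = dispatch_tool_calls(truncated, handlers, cancelled)
after_bad_json = dispatch_tool_calls(truncated, handlers, threading.Event())
```

In both cases `sent` stays empty: the cancelled turn dispatches nothing, and the truncated payload is answered with an error entry rather than a call with `{}`.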
TypeScript (mirrors where applicable):
- replace the fixed 30 s whole-stream LLM ceiling with an idle watchdog
(createStreamIdleWatchdog, re-armed per chunk) in OpenAI/Anthropic/
Google/Groq/Cerebras/OpenAI-compatible providers; idle aborts now throw
PatterConnectionError instead of surfacing as a fake barge-in AbortError
(parity: Python has no whole-stream ceiling).
- anthropic: handle in-band SSE error events (overloaded_error) by
throwing instead of ending the stream as success; user-first guard.
- google: user-first guard, functionResponse.name mapping, cached-token
subtraction. groq/cerebras shared parser: subtract cached tokens from
prompt_tokens (was double-billing cache reads).
- chat-context: same to_anthropic/truncate fixes as Python.
- tests updated to the new contracts + new watchdog unit tests.
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
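The idle-watchdog idea (a timer re-armed per chunk instead of a ceiling on the whole stream) looks roughly like this in Python. A sketch only: `with_idle_timeout` and `ticking` are hypothetical names, and the built-in `ConnectionError` stands in for PatterConnectionError.

```python
import asyncio


async def with_idle_timeout(stream, idle_s: float):
    """Yield from `stream`, failing only if one gap between chunks exceeds idle_s."""
    iterator = stream.__aiter__()
    while True:
        try:
            # The timer restarts for every chunk, so total duration is unbounded.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_s)
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError:
            raise ConnectionError(f"stream idle for more than {idle_s}s") from None
        yield chunk


async def ticking(count: int, gap_s: float):
    for i in range(count):
        await asyncio.sleep(gap_s)
        yield i


async def demo():
    # 10 chunks x 0.03 s = 0.3 s in total, longer than the 0.2 s idle limit.
    steady = [c async for c in with_idle_timeout(ticking(10, 0.03), idle_s=0.2)]
    try:
        stalled = [c async for c in with_idle_timeout(ticking(1, 1.0), idle_s=0.2)]
    except ConnectionError as exc:
        stalled = exc
    return steady, stalled


steady, stalled = asyncio.run(demo())
```

The steady stream completes even though it outlasts the limit in total, while the stalled one surfaces as a connection error rather than as a cancellation that could be mistaken for a barge-in.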
- store.py: record_call_end crashed with TypeError when the standalone
ingest passed metrics as a plain dict (asdict on non-dataclass) — and
the exception fired AFTER the active row was popped, so every completed
call vanished from the standalone dashboard at hangup. Accept dicts.
- store.py: update_call_status now copies the live transcript/turns into
the terminal entry (TS already did) — the Twilio statusCallback vs WS
stop race no longer blanks the transcript pane.
- both stores: add Plivo 'timeout'/'cancel' to the terminal status set —
rows for unanswered/cancelled Plivo dials leaked in the active set
forever (phantom live call).
- both servers: Telnyx call.hangup with a no-media cause (busy/no-answer/
rejected) now terminal-izes the pre-registered dashboard row — same
permanent active-set leak.
- store.py SSE: a force-dropped slow subscriber now receives a close
sentinel so its generator ends and EventSource reconnects — previously
the dashboard froze forever while showing 'streaming · sse'.
- cli ingest (both SDKs): a finished-call payload is no longer replayed
as a fresh call_start (spurious SSE event + started_at = ingest time);
stores derive started_at from the metrics duration when absent.
- cli.ts: raise express.json body cap to 5 MB (long-call ingests 413'd
and silently vanished).
- api_routes.py: /api/v1/calls/{id} falls back to the active set (TS
parity). routes.py: clamp negative ?limit; interpret date-only export
filters as UTC like JS Date (same query returned different ranges per
SDK).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
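The record_call_end failure combines two problems: `asdict` rejects plain dicts, and the exception fired after the active row was already gone. A minimal sketch of the safer ordering, with a hypothetical `CallStore` and `CallMetrics` that only mimic the shape described above:

```python
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class CallMetrics:  # hypothetical metrics shape
    duration_s: float
    turns: int


def metrics_to_dict(metrics):
    """Accept a dataclass instance, a plain dict, or None."""
    if metrics is None:
        return {}
    if is_dataclass(metrics) and not isinstance(metrics, type):
        return asdict(metrics)
    if isinstance(metrics, dict):
        return dict(metrics)
    raise TypeError(f"unsupported metrics type: {type(metrics).__name__}")


class CallStore:
    def __init__(self):
        self.active = {}
        self.completed = {}

    def record_call_end(self, call_id, metrics):
        # Normalise first: if this raises, the active row is still in place.
        payload = metrics_to_dict(metrics)
        row = self.active.pop(call_id, {})
        self.completed[call_id] = {**row, "metrics": payload}


store = CallStore()
store.active = {"a": {"to": "x"}, "b": {"to": "y"}, "c": {"to": "z"}}
store.record_call_end("a", CallMetrics(duration_s=12.5, turns=4))
store.record_call_end("b", {"duration_s": 3.0, "turns": 1})
```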
… wait, WS hardening
Cross-carrier correctness fixes confirmed by the deep review (several found
independently by two reviewers):
- Telnyx outbound calls never got a media stream in EITHER SDK: a perf
refactor folded streaming_start into actions/answer on call.initiated,
but Answer is only valid on incoming legs and call.answered had become a
no-op — callees answered to dead air. Outgoing legs now skip the answer
and attach the stream via actions/streaming_start on call.answered.
- Telnyx webhook signature validation only accepted DER/SPKI public keys,
but the Telnyx portal issues TELNYX_PUBLIC_KEY as base64 of the RAW
32-byte Ed25519 key — every webhook 403'd (fail-closed) the moment the
documented security feature was enabled. Both forms now verify; tests
cover the raw form.
- Plivo call(wait=True) could never resolve: completions/AMD/prewarm were
keyed by the dial-time request_uuid while every webhook carries the live
CallUUID. The answer webhook now re-keys all per-call bookkeeping
(alias_call_id / aliasCallId + client prewarm re-key); the TS Plivo
branch also actually routes through maybeAwaitCompletion (wait was
silently ignored).
- TS carrier WS: no 'error' listener (an ECONNRESET became an
uncaughtException killing every live call), unguarded async 'close'
listeners (throwing onCallEnd → unhandled rejection → crash), and ws@8
invoking async listeners unawaited (interleaved handleAudio → VAD state
races, out-of-order STT). All three carrier streams now serialize events
onto a per-connection FIFO with contained errors + error listeners.
- Per-IP WS cap counted the tunnel's loopback peer: hard ceiling of 10
concurrent calls behind cloudflared/ngrok and a trivial shared-bucket
DoS. Loopback peers now key on CF-Connecting-IP / X-Forwarded-For.
- Voicemail drops (Telnyx/Plivo, both SDKs) were awaited inline in webhook
handlers including a playback sleep of up to 30 s — carriers timed out
and retried, double-speaking the message. Now tracked fire-and-forget
tasks; the Telnyx drop also moves from the early
call.machine.detection.ended to call.machine.greeting.ended (the beep),
so the message is no longer clipped mid-greeting; playback estimate
constants aligned (were 2x apart between SDKs).
- machine_end_other now triggers voicemail-drop/prewarm-evict like the
other machine_end_* outcomes (both SDKs).
- Telnyx configure_number PATCHed connection_id to /phone_numbers/{id}/voice
which silently ignores it (auto-config 'succeeded' but inbound never
routed) — association now goes to PATCH /phone_numbers/{id} (all 3 impls).
- Python: completion futures resolve in finally around user on_call_end
(a throwing callback stranded wait=True until the 30-min backstop; TS
same); serve() no longer crashes on Windows (add_signal_handler);
call(from_number=...) was always ignored (config value won the or);
webhook_url now normalised to a bare hostname (schemed values built
wss://https://... URLs); outbound Telnyx/Plivo dials leaked a pooled
httpx client per call; bridges resolve direction from the store instead
of hardcoding 'inbound'; handler.cleanup() guarded in all three bridge
finallys; WebSocketDisconnect no longer logged/recorded as a call error;
Plivo bridge masks phone numbers in logs and sends the same
transcript/conversation_history payload shape as Twilio/Telnyx.
- TS: recording: true now actually starts Plivo recording (worked in
Python only).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
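For the Telnyx key-format item, one way to accept both forms is to normalise the raw 32-byte key into DER/SPKI before handing it to a verifier. The 12-byte prefix below is the fixed SubjectPublicKeyInfo header for Ed25519; the function name is hypothetical and this is not necessarily how either SDK does it.

```python
import base64

# Fixed 12-byte DER header of a SubjectPublicKeyInfo wrapping an Ed25519 key:
# SEQUENCE(42) { SEQUENCE { OID 1.3.101.112 } BIT STRING(33) { 0x00, key } }
ED25519_SPKI_PREFIX = bytes.fromhex("302a300506032b6570032100")


def ed25519_key_to_spki_der(public_key_b64: str) -> bytes:
    """Normalise a base64 public key (raw 32-byte or DER/SPKI) to DER/SPKI."""
    key = base64.b64decode(public_key_b64)
    if len(key) == 32:
        return ED25519_SPKI_PREFIX + key  # raw key: wrap it
    if len(key) == 44 and key.startswith(ED25519_SPKI_PREFIX):
        return key  # already DER/SPKI
    raise ValueError(f"unexpected Ed25519 public key length: {len(key)} bytes")


raw_key = bytes(range(32))  # placeholder bytes, not a real key
der = ed25519_key_to_spki_der(base64.b64encode(raw_key).decode())
```

Rejecting any other length keeps the check fail-closed for malformed keys.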
…e handler The bridge keeps no history deque of its own (Twilio/Plivo do) — the parity addition referenced an undefined name, which the on_call_end try/except silently swallowed, skipping the callback entirely. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
Python stream handler: - transcript/history deques: 'x or deque()' silently replaced the carrier-shared (empty → falsy) deques with private ones, so EVERY on_call_end payload carried an empty transcript and history. Use 'is not None'. - STT-connect-failure hangup called _hangup_fn(call_id) but the carrier hangup closures take no args — the TypeError was swallowed and the call stayed up, deaf. - apply_call_overrides round-tripped the Agent through dataclasses.asdict, dict-ifying nested configs and deep-copying live provider objects — any per-call override from on_call_start crashed the call later. Use dataclasses.replace. - _await_dispatch_settle: dispatch-turn failures were logged at DEBUG (callers heard silence, operators saw nothing) and CancelledError was swallowed even when the awaiting task itself was being cancelled, defeating teardown. cleanup() now also cancels the STT loop BEFORE the dispatch task so a racing transcript can't respawn an orphan turn, and guards each adapter close individually. - user on_transcript/on_metrics callbacks are now exception-contained (_safe_on_* helpers): one raise inside the realtime forward loop permanently killed event forwarding (zombie call). - mcp_servers were silently ignored in pipeline mode (only the realtime handler called _init_mcp_tools); pipeline start() now discovers MCP tools and cleanup() closes the sessions — matching the documented mode-agnostic contract and TS. - realtime function_call: unknown/handler-less tools and malformed argument JSON now get an error-envelope function_result instead of silence (a dangling call item stalled the model: dead air). Mirrors TS. - pipeline transfer_call validates E.164 BEFORE invoking the carrier transfer (which silently no-ops on bad targets) and returns the same rejection envelope as the realtime path. 
- realtime guardrails: evaluate on accumulated text (per-delta checks never matched terms split across deltas), clear the carrier playout buffer on block, and speak the replacement via the no-fake-turn reassurance path — send_text injected it as a phantom role:user turn the model then replied to. - barge-in: echo guard now runs BEFORE the tail-grace rescue (the grace window is exactly when the agent's final-sentence echo arrives — the rescue disarmed the downstream echo check and the agent answered its own words); duplicate/hallucination finals are filtered BEFORE cancelling (Deepgram's is_final twin of a just-committed speech_final cancelled the agent's brand-new turn); a strategy-confirmed barge-in now actually flushes the inbound ring, and the pending window forwards audio to STT — with strategies configured but forward-stt off, no transcript could ever arrive, so strategy barge-in was structurally impossible. - firstMessage: history append no longer gated on metrics being enabled (model could re-greet), echo-guard reference now covers the greeting and non-streaming replies, prewarm pacing derives bytes/ms from the active output format (mulaw 8k prewarm bytes were paced 4x too fast, re-opening the barge-in flush window). - STT send_audio failures degrade to dropped frames (rate-limited warn) instead of tearing the whole call down via the carrier read loop. - remote_message: the 30 s asyncio.timeout spanned the generator's whole consumption INCLUDING TTS playback time of each yielded chunk — long spoken replies were cancelled mid-sentence with no log. Now a per-receive idle timeout. - services: IVR loop detector compares the newest chunk to its immediate predecessor (max-over-window false-fired on alternating A/B prompts); scheduler cache stores (loop, scheduler) so a reallocated id() can't hand back a scheduler bound to a dead loop; markdown filter no longer eats all prose after a bare '<'. 
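The 'x or deque()' bug from the first item of this list is a plain Python truthiness trap: an empty deque is falsy, so the shared object is replaced exactly when it is still empty, which is always the case at call start. A minimal sketch with hypothetical function names:

```python
from collections import deque


def attach_buggy(shared=None):
    # An empty deque is falsy, so a fresh private deque is returned instead.
    return shared or deque()


def attach_fixed(shared=None):
    # Only substitute a new deque when the caller passed nothing at all.
    return shared if shared is not None else deque()


carrier_transcript = deque()  # created empty by the carrier bridge

buggy = attach_buggy(carrier_transcript)
fixed = attach_fixed(carrier_transcript)

buggy.append("user: hello")  # lands in a private deque the bridge never sees
fixed.append("user: hello")  # lands in the shared deque
```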
TypeScript stream handler (mirrors + TS-specific): - dispatchTask gets its rejection handler AT creation (dispatchTurn is try/finally only; the next turn's catch attached far too late for Node's unhandled-rejection check → process crash). fireCallEnd guards the user onCallEnd; processTranscript guards the user onTranscript. - same echo-guard-before-tail-grace reorder; same firstMessage / runRegularLlm / WS-remote echo references; runRegularLlm returns its final text instead of the caller re-reading history[-1] (raced a concurrently committed user turn). - WS-remote turns now honour barge-in at the outer loop (previously kept consuming the remote stream and started a fresh TTS synthesis per chunk after the interrupt) and only bill TTS/turn-complete on a clean finish. remote-message drains buffered frames after done/close (the old !done condition dropped every buffered chunk after the first). - LLM tool loop: bail out before/between tool executions when the abort signal fired (parity with the Python cancel_event fix — no more side effects from truncated tool JSON after a barge-in). - speechEvents threaded into StreamHandlerDeps (the public onUserSpeechStarted/.../onAudioOut API never fired on real served calls — only unit tests passed it). - scheduler.scheduleOnce chains timeouts past Node's 2^31-1 ms clamp (a >24.8-day job fired immediately); IVR note*State respect stop(); test-mode REPL survives provider errors. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…ndary correctness
STT (TypeScript): every provider emit loop now contains BOTH sync throws
and async rejections from transcript callbacks — the registered callback
is async, so the bare cb(t) (or sync-only try/catch in Deepgram/
Speechmatics) left its rejection unhandled and killed the Node process on
any user-callback error.
STT (Python):
- whisper/telnyx/speechmatics adapters fully reset per-call state on
connect(): close() left a closed httpx client, an unsent WAV-header
flag and stale None/_STOP sentinels behind, so a sequential second call
on the same adapter instance was deterministically broken (no
transcripts / instant loop exit / rejected audio).
- whisper transcriptions are now chained sequentially (both SDKs had the
ordering bug; Python fixed here): parallel HTTP requests with OpenAI's
latency variance routinely delivered chunk N+1's final before chunk N,
scrambling word order in history.
- AssemblyAI reconnect: _running stayed true through the reconnect
handshake — the consumer polls it every 100 ms and the TLS+WS handshake
takes longer, so a successfully reconnected (billed) session delivered
zero transcripts.
- Soniox (both SDKs): add finalize() ({type:'finalize'}) so the VAD
speech_end fast-path actually works (every turn previously waited out
the full endpointing delay), and stop re-emitting identical interims on
token-less keepalive frames.
- OpenAITranscribeSTT (both SDKs): reject verbose_json up front — the
gpt-4o transcribe models 400 on it, so every chunk failed (logged only)
while audio kept being buffered and billed.
- deepgram: Transcript.words back to a tuple (frozen-dataclass contract);
providers.deepgram() helper smart_format default aligned with the class
(the two entry points behaved differently); providers.soniox(language=)
now maps to language_hints instead of being silently discarded.
Audio:
- StatefulResampler.flush() (py) fed the partial-frame carry to ratecv,
which ALWAYS raises on a non-whole frame — every odd-length stream
crashed the flush path. Drop the sub-frame remainder like TS.
- TS 16k→8k FIR decimator rewritten with a real lookahead carry: the old
single-pending-sample design processed the carried sample twice (lost
the true s-2) and edge-replicated the +2 tap at every chunk end —
audible crackle at chunk boundaries on the main Twilio outbound path.
Chunked output is now bit-identical to one-shot output (regression
tests added).
- AEC far-end taps (3 py + 2 ts sites) gated on the carrier-native fast
path: with the TTS adapter auto-flipped to ulaw_8000 they pushed mulaw
wire bytes into an int16-PCM-16k echo canceller — garbage reference,
and odd-length chunks crashed np.frombuffer mid-turn (misreported as an
LLM error).
- Silero VAD (py): queue transitions beyond the first per process_frame
instead of dropping them (a chunk spanning speech_end→speech_start lost
the start event); reset() clears the queue.
- background_audio builtin_clip_path returned a path whose as_file
context had already exited — on zip-based installs the extracted temp
file was deleted before use. Keep the context open for the process
lifetime (same pattern as silero_onnx).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
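Chaining the whisper transcriptions sequentially can be sketched with asyncio tasks that each wait for their predecessor. `OrderedTranscriber` and `fake_transcribe` are hypothetical; the fake uses a sleep to imitate latency variance.

```python
import asyncio


class OrderedTranscriber:
    """Run one transcription at a time, in submission order."""

    def __init__(self, transcribe, emit):
        self._transcribe = transcribe
        self._emit = emit
        self._tail = None  # most recently submitted task

    def submit(self, chunk):
        previous = self._tail

        async def run():
            if previous is not None:
                # Wait for the predecessor without re-raising its failure.
                await asyncio.wait({previous})
            self._emit(await self._transcribe(chunk))

        self._tail = asyncio.create_task(run())
        return self._tail


async def fake_transcribe(chunk):
    delay_s, text = chunk
    await asyncio.sleep(delay_s)  # later chunks would finish first if parallel
    return text


async def demo():
    emitted = []
    transcriber = OrderedTranscriber(fake_transcribe, emitted.append)
    chunks = [(0.05, "first"), (0.01, "second"), (0.0, "third")]
    await asyncio.gather(*[transcriber.submit(c) for c in chunks])
    return emitted


emitted = asyncio.run(demo())
```

The trade-off is latency: a slow request now delays every later chunk, in exchange for finals that arrive in spoken order.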
…, Gemini sanitization
- tool executor (py): handler tools (which is what every MCP tool is) ran
UNBOUNDED when no per-tool timeout was declared, so a hung tool froze the
realtime event loop indefinitely. Apply the documented 10 s default
(mirrors TS); webhook timeouts are now terminal like handler timeouts
(retrying multiplied a dead webhook's wait to many minutes per turn);
non-JSON-serializable returns no longer burn the whole retry+backoff loop
(default=str).
- consult (both SDKs): the consult tool now declares its own timeout
budget. TS DefaultToolExecutor raced the handler against the 10 s default
and killed any consult longer than that, while the handler's own budget
was 30 s; the Python tool needed the declaration for the new executor
default.
- circuit breaker (py): HALF_OPEN admitted unlimited concurrent probes
(comment said one), so a burst of parallel tool calls hammered a
recovering backend. Gate with probe_in_flight like TS.
- @tool schema generation (py): PEP 604 unions (str | None) have origin
types.UnionType, not typing.Union, so the idiomatic 3.10+ spelling mapped
to {type: object} and was wrongly marked required. Literal[...] now emits
enum, list[X] emits items (Gemini rejects array schemas without items).
- define_tool (py) returned a plain dict that Patter.agent(tools=[...])
rejects with TypeError since 0.5.0; it now returns the public Tool.
- Gemini schema sanitization (google_llm.py, gemini_live.py, google-llm.ts,
gemini-live.ts): recursively strip JSON-Schema keys the proto Schema
rejects ($schema, additionalProperties, oneOf, …). Strict-mode tools
REQUIRE additionalProperties:false and nearly every zod-derived MCP server
emits $schema, so one such tool 400'd every Gemini turn/session.
- MCP (ts): transport throws from callTool now return the structured error
envelope instead of reaching the executor's retry loop, which re-fired
non-idempotent MCP tools up to 3x on transient errors (parity with Python).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
- openai-realtime-2.ts: the connect() monkeypatch registered every message listener as an anonymous wrapper, so removeListener (identity match) never removed the setup listener — it stayed attached for the whole call and its error branch ran ws.close() on the FIRST benign mid-call error frame (commit-empty, truncate-too-short, parallel tool-result conversation_already_has_active_response), tearing down the live engine socket. The patched on() now keeps a handler→wrapped map and off() translates through it; the setup listener also hard-ignores messages once settled. - ElevenLabs ConvAI (TS): the adapter hardcoded language 'it' (every conversation forced to Italian or failing initiation) and always sent a voice_id override (ElevenLabs rejects overrides not enabled in the agent's security settings — broke default-configured agents). Overrides are now sent only when explicitly configured. buildAIAdapter switches to the options form with ulaw_8000 in/out — the positional form sent no output_format, so ConvAI streamed PCM16@16k onto the mulaw carrier wire (loud static on every TS ConvAI call). - ElevenLabs ConvAI (py): the Telnyx bridge built the handler without for_twilio=True even though Telnyx negotiates PCMU 8 kHz — caller mulaw bytes were fed to ConvAI as PCM16@16k (garbled in both directions; Twilio/Plivo branches were already correct). - OpenAI Realtime (both SDKs, v1+GA): response.output_item.added also fires for function_call items — recording those as the truncate target made barge-in during a tool turn truncate a non-message item, which the server rejects with an error event. Only message items are tracked now. - OpenAI Realtime GA (both SDKs): the GA session schema removed 'temperature' — forwarding it made the session.update fail and the call drop at pickup. Warn-and-skip. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…y opt-out hygiene - cost: openai_realtime_2 calls fell through to the pipeline branch of _compute_cost in BOTH SDKs (exact-string match on 'openai_realtime') and reported $0 AI cost for the most expensive engine — while still emitting cached savings. TS also never captured realtimeModelName / the realtime provider tag for GA calls. - agent_response_ms (py): llm_ttft_ms is initialised to 0.0 so the 'is not None' gate was always true — when no first-token signal fired the flagship SLO metric silently EXCLUDED the whole LLM segment. Now gated on the actual signal (TS already leaves it undefined). - EOU delay (py): the final-transcript path stamped record_vad_stop unconditionally, overwriting the real VAD speech_end stamp microseconds before stt_final — end_of_utterance_delay was always ≈0 and the fake endpoint signal defeated record_stt_complete's own don't-fake logic. The fallback stamp is now first-wins. - InterruptionMetrics units: Python emitted SECONDS, TS milliseconds — cross-SDK consumers were 1000x apart for the same event. Python now emits ms (every other latency field is *_ms); TS gains Python's early-return so stray overlap-ends no longer inflate interruption counts. Docstring + regression test updated. - call-log (ts): logTurn/logEvent/logCallEnd re-derived the day directory from Date.now(), so calls crossing midnight UTC split across two day dirs — the original metadata stayed 'in_progress' forever and the dashboard hydrate resurrected phantom live calls. Per-call startedAt map added (mirrors Python). - telemetry opt-out (both SDKs): the environment-dims helper ran unconditionally at construction and its previousVersion probe WROTE ~/.getpatter/version — violating the documented 'opting out never touches the filesystem' invariant. Now gated on enabled. Numeric dimensions (latency_ms/cost_usd/…) now require numbers — the one gap that let free text reach the wire. 
- pricing (both SDKs): gpt-4.1 $3/$12 → $2/$8 and gpt-4.1-mini $0.80/$3.20 → $0.40/$1.60 (published OpenAI rates; siblings were correct); gpt-4o-realtime-preview audio still carried the Oct-2024 launch price ($100/$200) — cut to $40/$80 in Dec-2024. - evals (py): one transient judge failure (429/timeout/missing key) aborted the whole suite and discarded every completed case — now recorded as a failed case; the verdict is computed locally from the score instead of trusting the judge's self-reported 'passed' (a hallucinated passed:true with score 0.2 recorded a pass). - observability exports (ts): shutdownTracing/withSpan/recordPatterAttrs/ patterCallScope/attachSpanExporter were not exported from the package root — users could not flush the BatchSpanProcessor Patter creates (NodeTracerProvider does not flush on exit), silently dropping trailing spans. - minor parity: Python _opt_avg now filters zeros like TS optAvg; TS recordTtsFirstByte emits only inside the first-byte latch (re-emitted stale TTFB events); stale 'masked by default' phone-redaction docstrings corrected in both SDKs. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
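The evals change (record a judge failure as a failed case, and derive the verdict from the score) can be sketched like this. The 0.7 threshold, the reply shape and all names are assumptions made for illustration.

```python
PASS_THRESHOLD = 0.7  # hypothetical threshold, not taken from the SDK


def grade_case(judge, case):
    """Grade one case; a judge failure becomes a failed case, not an abort."""
    try:
        reply = judge(case)
    except Exception as exc:  # e.g. 429, timeout, missing key
        return {"case": case, "passed": False, "score": None, "error": str(exc)}
    score = float(reply["score"])
    # Ignore the judge's self-reported verdict; compute it from the score.
    return {"case": case, "passed": score >= PASS_THRESHOLD, "score": score, "error": None}


def run_suite(judge, cases):
    return [grade_case(judge, case) for case in cases]


def fake_judge(case):
    if case == "rate_limited":
        raise TimeoutError("judge timed out")
    if case == "hallucinated_pass":
        return {"score": 0.2, "passed": True}
    return {"score": 0.9, "passed": True}


results = run_suite(fake_judge, ["good", "rate_limited", "hallucinated_pass"])
```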
…facade parity - ElevenLabs 'native format' for Telnyx was declared pcm_16000 across the TTS layer (both SDKs: native-format maps, for_telnyx factories, and the stream handlers' native checks) while the SDK's own streaming_start pins the Telnyx wire to PCMU/μ-law @ 8 kHz — the native fast path therefore shipped raw PCM16 bytes onto a μ-law wire: pure static on every default ElevenLabs-on-Telnyx call. Every surface now agrees on ulaw_8000 (the TS handler check also gates on known carriers); tests updated to the corrected contract. - CartesiaTTS.for_twilio/forTwilio (and the pipeline facades, both SDKs) requested sample_rate=8000, but the audio sender has no consuming hook for a declared TTS rate — it unconditionally runs its fixed 16k→8k decimator, so the 8 kHz audio was decimated AGAIN and played at ~2x speed (chipmunk) on every call using the documented factory. The factories now emit 16 kHz (the pipeline rate). - sentence chunker (both SDKs): a standard-path emission now ends the aggressive 'first flush' window — only the aggressive flush cleared the flag, so a comma in sentence 2+ could still trigger a clause-level flush mid-turn (choppy prosody, contradicting the documented contract). - tts/openai.py facade default aligned to gpt-4o-mini-tts (the underlying provider default and the TS facade) — the same nominal config produced different voice/latency/price per SDK. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
… OpenAI key in TS agent() The parity runner still referenced the pre-monorepo layout (sdk/, sdk-ts/, package 'patter') — every one of the 10 scenarios failed, so the suite had silently stopped guarding parity. Restored: - paths/imports updated to libraries/python + libraries/typescript/dist and the getpatter package (tool_executor moved to tools/). - call_init / voice_mode_enum rewritten for the modern carrier-object API (cloud mode and DEFAULT_*_URL report 'removed' on both sides). - the TS shim silences the telemetry banner and the runner parses the last stdout line, so SDK construction output can't corrupt the JSON protocol. - sentence_chunker delegates to its dedicated standalone runner (xfail semantics) instead of counting as a failure. The revived suite immediately caught a real divergence: Python fails fast in agent() when OpenAI Realtime mode has no API key, TS deferred to call time (dead call instead of a clear construction error). TS now validates eagerly too; test helpers gain stub keys. Suite result: 10/10 scenarios matched (sentence_chunker: 53 pass, 8 xfail). https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…ones — rebuilt ui.html
- App.tsx: the bucket strategy captured Date.now() once and was memoized on
[range] only, so the time window froze at mount — a dashboard left open
past the window edge silently dropped every call that ended after it
(within at most 1 hour on the default 24h view). The window now re-anchors
on a 30 s tick.
- mappers: Twilio 'ringing'/'queued' statuses mapped through the default
branch to 'ended' — every outbound call showed an "ended" pill for the
whole 10-30 s ring phase. They now map to the (already styled, previously
unused) 'queued' pill and count as ongoing.
- mergeCallPreserving resurrected soft-deleted calls forever: deletions are
absent from the server snapshot by design, and the prev-carry-over loop
re-appended them on every refresh (cross-tab deletes never propagated).
The calls_deleted SSE payload and local deletes now feed a tombstone set
consulted by the merge.
- turnCount transcript fallback halves the line count (one line per user
AND assistant message double-counted turns past the percentile gate);
'All' range sparkline now actually derives its window from data extents
(the {fromMs: 0} sentinel was truthy, so every call landed in the
rightmost 1970→now bucket).
dist rebuilt and synced into both SDKs' dashboard/ui.html (identical md5).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
The firstMessage streamed INLINE: in Python inside PipelineStreamHandler .start(), which the carrier bridge awaits from its single WS read loop — so for the whole greeting no media frames were processed (VAD/barge-in structurally impossible on the first message), stop frames went unnoticed, and prewarmed mark-gated pacing starved because mark acks could never be read (0.5 s timeout per chunk → ~13x slower than realtime, guaranteed jitter underrun). The TS handler had the same shape inside handleCallStart, made symmetric by the recent per-connection FIFO serialization. Both handlers now await beginSpeaking(is_first_message=true) BEFORE returning (the self-hearing guard engages from the very first inbound frame) and stream the greeting in a tracked background task (_play_first_message / playFirstMessage). Teardown cancels (py) / settles (ts) the task before adapters close; failures log instead of killing call setup. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
STT adapters are stateful per-connection objects, but the documented usage hands ONE instance to ONE agent served for MANY calls: concurrent calls shared a socket/queue/callback set, so call B's connect() overwrote call A's WebSocket, each call could receive the OTHER caller's transcripts, and the first hangup closed the surviving call's socket. Python: STTProvider.__init_subclass__ now captures every subclass's ORIGINAL constructor arguments (outermost call wins through inheritance chains, zero per-provider code — user subclasses included) and a generic clone() replays them; _create_stt_from_config clones provider instances per call, degrading to the legacy shared instance with a loud warning when clone() fails. Verified across all 7 streaming providers. TypeScript: each provider records its construction args and exposes clone(); createSTT() clones per call with the same loud-warning fallback for adapters without clone(). The identity test updated to the new contract (same type/config, fresh connection state, distinct per call). https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
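The constructor-capture approach can be sketched with `__init_subclass__` wrapping each subclass's `__init__`. This is an illustration of the technique under assumptions, not the SDK's implementation; the `_ctor_args` attribute and the two fake providers are hypothetical.

```python
import functools


class STTProvider:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        original = cls.__dict__.get("__init__")
        if original is None:
            return  # no __init__ of its own; an inherited wrapper still applies

        @functools.wraps(original)
        def recording_init(self, *args, **kw):
            # The outermost constructor runs first, so the first record wins
            # even when it calls super().__init__ with different arguments.
            if not hasattr(self, "_ctor_args"):
                self._ctor_args = (args, dict(kw))
            original(self, *args, **kw)

        cls.__init__ = recording_init

    def clone(self):
        """Build a fresh instance by replaying the original constructor call."""
        args, kw = getattr(self, "_ctor_args", ((), {}))
        return type(self)(*args, **kw)


class FakeSTT(STTProvider):
    def __init__(self, api_key, language="en"):
        self.api_key = api_key
        self.language = language
        self.socket = object()  # stands in for per-connection state


class ItalianSTT(FakeSTT):
    def __init__(self, api_key):
        super().__init__(api_key, language="it")


shared = ItalianSTT("key")
per_call = shared.clone()
```

The clone has the same type and configuration but its own `socket`, so two concurrent calls no longer share connection state.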
…art overrides
- persist (TS): both SDKs' docs state persistence is ON by default since
0.6.2 (the dashboard hydrate path needs on-disk records across restarts),
but resolvePersistRoot had regressed to opt-in. With persist omitted and
no PATTER_LOG_DIR, TS silently wrote nothing while Python persisted.
Aligned to the documented default.
- call(first_message=...) (py) was documented but never referenced in the
body. It is now applied as a per-call frozen-dataclass copy of the agent so
prewarm synthesis, the bridge and the handler all see the override. TS
gains the same option (LocalCallOptions.firstMessage) for parity.
- onCallStart per-call overrides (TS): Python has applied a dict returned
from on_call_start as per-call agent config since 0.5.x; TS typed the
callback void and ignored the result. The handler now applies returned
overrides (snake_case keys mirroring apply_call_overrides; the Python-only
stt_config/tts_config keys warn-and-skip), the server's logging wrapper
forwards the return value instead of swallowing it, and the public callback
types accept the override shape.
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
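A per-call copy of a frozen dataclass is what `dataclasses.replace` provides. The sketch below uses a hypothetical, much smaller `Agent` and a hypothetical `with_overrides` helper to show that the copy is shallow: nested configs and live provider objects are shared, not dict-ified or deep-copied.

```python
from dataclasses import dataclass, field, replace


class FakeTTS:
    """Stands in for a live provider object that must not be copied."""


@dataclass(frozen=True)
class VoiceConfig:
    voice: str = "alloy"


@dataclass(frozen=True)
class Agent:  # hypothetical, far smaller than the real agent config
    system_prompt: str
    first_message: str = ""
    voice: VoiceConfig = field(default_factory=VoiceConfig)
    tts: object = None


def with_overrides(agent: Agent, overrides: dict) -> Agent:
    # replace() builds a new instance and copies untouched fields by reference;
    # an unknown key raises TypeError instead of being silently ignored.
    return replace(agent, **overrides)


base = Agent(system_prompt="You are helpful.", first_message="Hello!", tts=FakeTTS())
per_call = with_overrides(base, {"first_message": "Hi Ada, thanks for calling."})
```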
…l watchdog in Python
- NLMS AEC (both SDKs): the far-end ring only advances on push, so while the agent was silent processNearEnd convolved the SAME frozen TTS tail into every 20 ms user frame forever: a repeating buzz at echo-estimate amplitude superimposed on user speech, exactly when there is no echo to cancel. The canceller now passes through when the reference is stale (>250 ms since the last far-end push). The adaptation floor also rises from 1e-6 (-120 dBFS; a TTS fade-out still 'counted' as far energy, letting weights blow up against user speech with a near-zero norm) to 1e-3 (≈ -60 dBFS), freezing adaptation on an effectively-silent reference.
- Python stream handlers gain the 1-hour auto-hangup watchdog TS has had all along (armed in each start(), cancelled in cleanup()): a call whose carrier stop never arrives could previously run and bill forever.
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
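A rough sketch of the stale-reference gate and the floor arithmetic, under stated assumptions: `FarEndGate`, `should_adapt` and the `cancel` callback are hypothetical names, and treating the floor as an amplitude is an inference from the dBFS figures quoted above (20·log10(1e-3) = -60), not something the commit states.

```python
import math

STALE_AFTER_S = 0.250  # from the commit: >250 ms since the last far-end push
ADAPT_FLOOR = 1e-3     # from the commit: about -60 dBFS


def dbfs(amplitude: float) -> float:
    """Amplitude relative to full scale, in decibels."""
    return 20.0 * math.log10(amplitude)


def should_adapt(far_level: float) -> bool:
    """Freeze weight adaptation on an effectively silent reference."""
    return far_level >= ADAPT_FLOOR


class FarEndGate:
    """Hypothetical wrapper deciding when echo cancellation may run."""

    def __init__(self):
        self._last_push_s = None

    def on_far_end_push(self, now_s: float) -> None:
        self._last_push_s = now_s

    def reference_is_stale(self, now_s: float) -> bool:
        return self._last_push_s is None or now_s - self._last_push_s > STALE_AFTER_S

    def process_near_end(self, frame: bytes, now_s: float, cancel) -> bytes:
        if self.reference_is_stale(now_s):
            return frame  # nothing to cancel: pass the user's audio through
        return cancel(frame)


gate = FarEndGate()
gate.on_far_end_push(10.0)
fresh = gate.process_near_end(b"near", 10.1, cancel=lambda f: b"cancelled")
stale = gate.process_near_end(b"near", 10.5, cancel=lambda f: b"cancelled")
```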
…ox event semantics
- ElevenLabs ConvAI (both SDKs): handle the previously-ignored
client_tool_call event — configured client tools stalled until the
provider-side timeout and reported failure. The adapters surface it as
the shared function_call event and gain a client_tool_result sender;
the handlers route through the tool executor and ALWAYS answer
(unknown tools and execution errors included — silence stalls the
ElevenLabs agent). transfer_call/end_call declared as ElevenLabs client
tools now reach the carrier helpers.
- Gemini Live (both SDKs): enable input/output audio transcription in the
session config and parse serverContent.inputTranscription/
outputTranscription — native-audio sessions previously produced NO user
transcript ever and no assistant transcript in AUDIO modality
(logs/history/metrics empty for every Gemini Live call). goAway is now
logged loudly (the only warning before the server drops the ~10-15 min
session).
- Ultravox (both SDKs): 'listening' is entered after EVERY normal agent
turn — mapping it to speech_started cleared the carrier playout buffer
at each turn end, clipping the audio tail; turn end is now the
speaking→listening transition ('idle' never fires mid-call, so
response_done effectively never fired before). Agent transcripts emit
delta frames only — full-text frames forwarded as appends duplicated
the transcript ('Hel'+'lo'+'Hello').
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
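The "ALWAYS answer" rule for client tools can be illustrated as a dispatcher that sends exactly one result per call, whatever happens. This is a sketch, not the adapter code: the function name, the handler registry and the message field names (`tool_call_id`, `tool_name`, `parameters`, `is_error`) are assumptions for illustration.

```python
import asyncio


async def answer_client_tool_call(event, handlers, send_result):
    """Run one client tool call and always send exactly one result back."""
    call_id = event.get("tool_call_id")
    name = event.get("tool_name")
    handler = handlers.get(name)
    if handler is None:
        result, is_error = f"unknown tool: {name}", True
    else:
        try:
            result = await handler(**event.get("parameters", {}))
            is_error = False
        except Exception as exc:  # execution errors are answered too
            result, is_error = f"tool failed: {exc}", True
    await send_result({
        "type": "client_tool_result",
        "tool_call_id": call_id,
        "result": str(result),
        "is_error": is_error,
    })


async def _demo():
    sent = []

    async def send(message):
        sent.append(message)

    async def lookup(order_id):
        return f"order {order_id} shipped"

    async def broken():
        raise RuntimeError("boom")

    handlers = {"lookup": lookup, "broken": broken}
    events = [
        {"tool_call_id": "1", "tool_name": "lookup", "parameters": {"order_id": 42}},
        {"tool_call_id": "2", "tool_name": "broken"},
        {"tool_call_id": "3", "tool_name": "missing"},
    ]
    for event in events:
        await answer_client_tool_call(event, handlers, send)
    return sent


sent = asyncio.run(_demo())
```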
…operational log
- Thread the SpeechEvents dispatcher through all three Python carrier bridges into every handler; ConvAI and Pipeline handler ctors now accept/forward speech_events like the Realtime handler already did.
- Emit the documented user/agent speech events from pipeline mode in both SDKs (VAD start/stop, EOS on turn commit, agent begin/end with interrupted=true on barge-in); previously realtime-only.
- Write events.jsonl (documented since 0.6 but never written): tool_call/tool_result records from role=tool transcript lines, barge_in from interrupted turns, error from CallMetrics.error_code (also persisted as metadata.json "error"). Wired in both servers' logging-callback wrappers, with unit tests reading the files back.
- Update stale test contracts surfaced by the full-suite run: remote WS mock gained recv() (per-receive idle-timeout loop), transfer_call now rejects non-E.164 before invoking the carrier helper, guardrail replacement speaks via send_reassurance instead of send_text.
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
… call loop
The existing eval runner scored an arbitrary reply(text) -> str callable
and never exercised the SDK. This adds getpatter.evals.session.EvalSession,
which constructs an actual PipelineStreamHandler and injects user turns
through the same path a live call uses — the real STT receive loop
(_handle_barge_in -> _commit_transcript -> _dispatch_turn), the real
LLMLoop with the real ToolExecutor, pipeline hooks, guardrail replacement,
dedup/hallucination filtering, history handling, and metrics — with only
the paid/external boundary faked:
* FakeAudioSender (records send_audio/send_clear/send_mark, auto-acks marks)
* FakeSTT (queue-backed; finals flow through the handler's real _stt_loop)
* FakeTTS (records spoken sentences, yields 10 ms of silence)
* ScriptedLLMProvider (deterministic chunk scripts for CI) or any real
LLMProvider for live evals
API: `async with EvalSession(agent=..., llm_provider=...) as s:` then
`result = await s.user_says("...")` -> frozen TurnResult(agent_text =
what the caller heard post-guardrails/hooks, tool_calls, history_snapshot,
interrupted, metrics_turn). getpatter.evals.assertions.expect(result) adds
chainable tool_called(name, args_subset=) / no_tool_called() /
agent_text_contains(...) and an async judge(llm_judge, intent=...) that
reuses the existing LLMJudge.
EvalCase gains optional agent= / llm_provider= fields; EvalRunner routes
those through EvalSession while the legacy reply()-factory path (and the
`patter eval` CLI, which keeps that contract) is unchanged — both flavours
mix in one suite. stream_handler.py is untouched: the session drives
existing public/internal methods only.
Tests (no network, scripted provider) prove: (a) a tool-call case asserts
via tool_called with the REAL ToolExecutor running a local handler, (b)
multi-turn history accumulates with prior turns exactly once (pinning the
existing cross-SDK trailing duplicate of the current user message), (c)
guardrail replacement is observable in agent_text, (d) cleanup leaves no
pending tasks (asyncio.all_tasks comparison), plus commit-filter drops
surfacing loudly, hooks, metrics capture, and runner integration.
Python-only by design (the TypeScript CLI already prints an evals stub).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…/ OutputChain) Proposes splitting PipelineStreamHandler's three interleaved state machines into composable stages, inventories today's states/transitions and maps every PipelineHooks surface and recently-fixed bug to its owning stage, and defines a 4-slice migration plan that keeps the public API and all existing tests green. Slice 1 (InputProcessingChain + audio_filter wiring) lands separately. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…ProcessingChain (slice 1)
audio_filter / audioFilter (Krisp / DeepFilterNet) was accepted by the public API, documented as "integrated before VAD and STT", implemented and unit-tested, but never invoked by any pipeline. Slice 1 of the pipeline-stages decomposition (docs/architecture/pipeline-stages.md) extracts the inbound half of on_audio_received / handleAudio into an InputProcessingChain that owns decode (mulaw->PCM16) -> stateful 8k->16k resample -> AEC near-end -> audio_filter (NEW) -> VAD feed and returns the processed frame + VAD event. The handlers keep the downstream logic (VAD-event handling, self-hearing gate, ring buffer, beforeSendToStt hook, STT feed) so the diff stays reviewable; with no AEC/filter/VAD configured the byte path is identical to before.
- Filter wrapper is fail-open: raise / non-bytes return -> passthrough of the pre-filter PCM, WARN once then DEBUG, keeps attempting.
- AEC/filter/VAD resolved via late-bound getters (start() and test fixtures install _aec/_auto_vad after construction).
- TS chain also owns the per-call VAD error kill switch (former vadDisabled) including the 25 ms ONNX inference timeout race.
- (Python) KrispVivaFilter now re-frames input internally to its configured frame_duration_ms (remainder buffered, dropped on sample-rate change) instead of raising on the pipeline's 20 ms frames vs its 10 ms default.
- Tests: chain-level order assertion (AEC -> filter -> VAD via recording fakes), warn-once passthrough, mulaw + PCM parity vs stateful reference, handler-level proof that agent(audio_filter=...) transforms the bytes reaching a fake STT; Krisp re-framing unit tests.
Both full unit suites pass unchanged (py 1435 passed/19 skipped; ts 1866 passed/8 skipped, tsc clean).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
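The fail-open filter wrapper can be sketched as follows. `FailOpenFilter`, its `levels` list (kept only so the behaviour is observable) and the `flaky` filter are hypothetical; the behaviour shown (raise or non-bytes return passes the pre-filter PCM through, WARN once then DEBUG, keep attempting) is taken from the bullet above.

```python
import logging

logger = logging.getLogger("audio_filter_sketch")


class FailOpenFilter:
    """Hypothetical wrapper: a misbehaving filter never breaks the audio path."""

    def __init__(self, inner):
        self._inner = inner
        self._warned = False
        self.levels = []  # log levels used so far, kept for inspection only

    def process(self, pcm: bytes) -> bytes:
        try:
            out = self._inner(pcm)
        except Exception as exc:
            return self._passthrough(pcm, f"raised {exc!r}")
        if not isinstance(out, (bytes, bytearray)):
            return self._passthrough(pcm, f"returned {type(out).__name__}")
        return bytes(out)

    def _passthrough(self, pcm: bytes, why: str) -> bytes:
        # WARN on the first failure, DEBUG afterwards; the filter is retried
        # on every later frame rather than being disabled.
        level = logging.DEBUG if self._warned else logging.WARNING
        self._warned = True
        self.levels.append(level)
        logger.log(level, "audio filter %s; passing pre-filter PCM through", why)
        return pcm


responses = iter([RuntimeError("model not loaded"), None, b"clean"])


def flaky(pcm):
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


wrapped = FailOpenFilter(flaky)
outputs = [wrapped.process(b"raw") for _ in range(3)]
```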
Both logEvent calls in the transcript wrapper are fire-and-forget, so the on-disk append order of tool_call vs tool_result is not guaranteed (records carry their own ts). Assert content per type instead of order. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
Entries for the fix batches landed on this branch (greeting background task, per-call STT clone, dashboard SPA window/status/tombstones, persist default parity, call(first_message), TS onCallStart overrides, ConvAI client tools, Gemini Live transcriptions, Ultravox event semantics, Python 1h watchdog, AEC far-end staleness, pipeline speech events, events.jsonl, plus the provider/transport wave) per the AGENTS.md same-PR changelog rule. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
Opt-in agent.barge_in_mode="pause_resume" (default "cancel" keeps today's behaviour byte-identical). LiveKit-style state machine on VAD speech_start while the agent speaks:
- PAUSE: gate the sentence/audio send loops on _output_paused and send_clear the carrier so queued audio stops within a frame. The LLM stream and TTS provider stream stay alive: sentences buffer as text (capped at 32) and synthesized audio queues into per-sentence retention entries (capped at ~15 s of playout; overflow while paused degrades to a full cancel, overflow while speaking releases retention for the turn). Mic audio flows to STT while paused (output is silent, so the line is echo-quiet) and the inbound ring is flushed so the confirm window can actually hear the user.
- KILL: a committed final transcript (non-echo, non-hallucination, non-duplicate; the existing _handle_barge_in/_commit_transcript filter family) within barge_in_confirm_ms (default 1500 ms) runs the existing _do_cancel_for_barge_in path and discards the paused buffers. The overlap window anchored at pause time is preserved so InterruptionMetrics.detection_delay measures VAD-T1 -> confirm-T2.
- RESUME: window expires with no confirming transcript -> re-send the cleared-but-unheard tail from retained audio at SENTENCE granularity (first sentence not fully played, derived from the #164 _playback_buffered_until cursor + heard-prefix segments; the partially-played sentence replays from its start) without re-billing TTS, then release the buffered sentences through the normal synth path. Recorded as a false interruption via record_overlap_end(was_interruption=False) (the backchannel counter, never an interruption), plus a false_interruption event.
The playback bookkeeping is frozen at the heard offset on pause so a kill still rewrites history to the heard prefix; on resume the replay re-stamps segments so later barge-ins stay accurate.
Turn bodies wait out an in-flight pause decision before ending (bounded by the confirm window) so buffered sentences are never orphaned. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
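The PAUSE / KILL / RESUME decisions reduce to a small timed state machine. The sketch below models only the transitions and the confirm window; buffering, retention caps and tail replay are omitted, and the class, method and event names are hypothetical.

```python
from enum import Enum


class OutputState(Enum):
    SPEAKING = "speaking"
    PAUSED = "paused"
    CANCELLED = "cancelled"


class PauseResumeSketch:
    """Hypothetical model of the pause_resume barge-in decisions."""

    def __init__(self, confirm_ms: int = 1500):
        self.confirm_ms = confirm_ms
        self.state = OutputState.SPEAKING
        self._paused_at_ms = None
        self.events = []

    def on_vad_speech_start(self, now_ms: int) -> None:
        if self.state is OutputState.SPEAKING:
            self.state = OutputState.PAUSED  # PAUSE: stop output, keep streams alive
            self._paused_at_ms = now_ms
            self.events.append("pause")

    def on_committed_final(self, now_ms: int) -> None:
        """Call only for finals that passed the echo/hallucination/duplicate filters."""
        if self.state is OutputState.PAUSED and now_ms - self._paused_at_ms <= self.confirm_ms:
            self.state = OutputState.CANCELLED  # KILL: real interruption
            self.events.append("kill")

    def on_tick(self, now_ms: int) -> None:
        if self.state is OutputState.PAUSED and now_ms - self._paused_at_ms > self.confirm_ms:
            self.state = OutputState.SPEAKING  # RESUME: it was a backchannel
            self.events.append("false_interruption")


backchannel = PauseResumeSketch()
backchannel.on_vad_speech_start(0)
backchannel.on_tick(1000)   # still inside the confirm window
backchannel.on_tick(1600)   # window expired without a confirming transcript

interruption = PauseResumeSketch()
interruption.on_vad_speech_start(0)
interruption.on_committed_final(800)
interruption.on_tick(1600)  # no effect: the turn was already cancelled
```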
…ipt) TypeScript port of 3877814 — exact parity with the Python semantics, defaults, and events (camelCase ↔ snake_case naming): Opt-in agent.bargeInMode: 'pause_resume' (default 'cancel' keeps today's behaviour byte-identical). LiveKit-style state machine on VAD speech_start while the agent speaks: - PAUSE: gate the sentence/audio send loops on outputPaused and sendClear the carrier so queued audio stops within a frame. The LLM stream and TTS provider stream stay alive: sentences buffer as text (capped at 32) and synthesized audio queues into per-sentence retention entries (capped at ~15 s of playout; overflow while paused degrades to a full cancel, overflow while speaking releases retention for the turn). Mic audio flows to STT while paused and the inbound ring is flushed so the confirm window can actually hear the user. - KILL: a committed final transcript (non-echo, non-hallucination, non-duplicate — the existing handleBargeIn/commitTranscript filter family) within bargeInConfirmMs (default 1500 ms) runs the existing runBargeInCancel path and discards the paused buffers. The overlap window anchored at pause time is preserved so detection_delay measures VAD-T1 -> confirm-T2. - RESUME: window expires with no confirming transcript -> re-send the cleared-but-unheard tail from retained audio at SENTENCE granularity (first sentence not fully played, derived from the #164 playbackBufferedUntil cursor + heard-prefix segments; the partially-played sentence replays from its start) without re-billing TTS, then release the buffered sentences through the normal synth path. Recorded as a false interruption via recordOverlapEnd(false) — the backchannel counter, never an interruption — plus a 'false_interruption' event ({ resumedSentences }). The playback bookkeeping is frozen at the heard offset on pause so a kill still rewrites history to the heard prefix; on resume the replay re-stamps segments so later barge-ins stay accurate. 
Turn bodies wait out an in-flight pause decision before ending — completes the predecessor's in-progress port by bounding awaitPauseDecision (confirm window + 5 s fail-open margin, mirroring Python's _await_pause_decision) so a teardown race can never strand the dispatch loop. Tests mirror tests/unit/test_barge_in_pause_resume.py: pause gates without cancelling, paused buffering + overflow degradation, resume tail replay + false-interruption metrics/event, kill filters (final-only / hallucination / duplicate / frozen-prefix history rewrite), legacy cancel mode untouched, config-off defaults, streaming-loop integration (resume, kill, stream-ends-paused), and teardown mid-pause. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…t interim transcripts
Opt-in Agent.preemptive_generation (default False): on an interim transcript that ends with sentence-final punctuation or is unchanged for preemptive_min_stable_ms (default 300), pipeline mode starts a speculative dispatch (built-in LLM loop + sentence-chunked TTS) and HOLDS all audio in memory (bounded ~15 s; overflow aborts). When the final transcript commits:
- normalized match → RELEASE: buffered audio flushes to the carrier and the speculative task becomes the live turn; history/metrics record exactly one turn with the final transcript text as the user message, and TTFT/latency anchors are stamped from the REAL commit point (user-perceived timing).
- mismatch → discard via the cancel-event machinery (history untouched) and dispatch normally on the final.
At most one speculation in flight (a newer qualifying interim replaces it); VAD speech_start during speculation aborts silently. The consume loop races the next LLM token against the release signal so a commit mid-token-silence flushes immediately. New CallMetrics counters preemptive_hits / preemptive_misses (accumulator record_preemptive_hit/_miss).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
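A sketch of the two decisions involved: whether an interim qualifies for speculation, and whether the final transcript matches it. The commit does not define "normalized match"; the normalization below (lowercase, strip punctuation, collapse whitespace) is an assumption, as are all function names.

```python
import re

SENTENCE_FINAL = (".", "?", "!")


def qualifies(interim: str, unchanged_ms: int, min_stable_ms: int = 300) -> bool:
    """Punctuated interims qualify at once; others after the stability window."""
    text = interim.strip()
    if not text:
        return False
    return text.endswith(SENTENCE_FINAL) or unchanged_ms >= min_stable_ms


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def on_final(speculated_from: str, final_text: str) -> str:
    """Release the held audio on a match, otherwise discard and re-dispatch."""
    if normalize(speculated_from) == normalize(final_text):
        return "release"
    return "discard"


hit = on_final("What's the weather?", "what's the weather")
miss = on_final("Book a table.", "Book a table for two.")
```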
… the registration race _start_speculation awaits the old speculation's unwind (bounded 5 s) before registering its own _SpeculativeTurn. The interim-stability watcher and the STT receive loop both call it, so a second path could register a NEWER speculation during that await — and the resuming caller then overwrote it. The overwritten turn's task was orphaned parked on its release_event forever: never aborted, never released, never counted as a miss, holding up to ~15 s of buffered audio and an open LLM stream until call teardown. It also broke the documented at-most-one-speculation invariant (two tasks generating concurrently). Guard the registration: after the abort settles, yield to any speculation registered concurrently — it always corresponds to the later-arriving interim. Regression test interleaves a concurrent registration into the replacement window. Mirrored in the TS port's startSpeculation. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
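The race and its guard can be reproduced in a few lines of asyncio. This sketch uses a monotonically increasing ticket to detect a later-arriving caller, which is a variation on the "yield to the concurrently registered speculation" guard described above rather than the SDK's code; `Speculator` and its attributes are hypothetical.

```python
import asyncio


class Speculator:
    """Hypothetical model of the at-most-one-speculation invariant."""

    def __init__(self):
        self.current = None  # the single in-flight speculation, if any
        self.aborted = []    # texts of speculations that were unwound
        self._ticket = 0     # increases with every start() call

    async def _abort(self, turn):
        await asyncio.sleep(0)  # stand-in for awaiting the old task's unwind
        self.aborted.append(turn["text"])

    async def start(self, text):
        self._ticket += 1
        ticket = self._ticket
        old, self.current = self.current, None
        if old is not None:
            await self._abort(old)  # other callers may run during this await
        if ticket != self._ticket:
            # A later-arriving interim called start() while we awaited:
            # yield to it instead of overwriting its registration.
            return None
        turn = {"text": text}
        self.current = turn
        return turn


async def _scenario():
    spec = Speculator()
    await spec.start("a")
    # "b" awaits the unwind of "a"; "c" arrives inside that window.
    results = await asyncio.gather(spec.start("b"), spec.start("c"))
    return spec, results


spec, results = asyncio.run(_scenario())
```

Without the ticket check, the resumed "b" caller would overwrite "c" and leave it orphaned, which is the bug the commit describes.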
…ident interim transcripts TS port of 238214e at parity: opt-in agent.preemptiveGeneration (default false) + preemptiveMinStableMs (default 300). On an interim that ends with sentence-final punctuation, or is unchanged for the stability window, pipeline mode starts a speculative dispatch — built-in LLM loop + sentence-chunked TTS — and HOLDS all audio in memory (~15 s playout cap; overflow aborts). When the final transcript commits: - normalized match → RELEASE: buffered audio flushes to the carrier and the speculative task becomes the live turn (tracked via dispatchTask); history/metrics record exactly one turn with the final transcript text as the user message, TTFT/latency anchors stamped from the real commit point (user-perceived timing). - mismatch → discard via the AbortController machinery (history and carrier untouched) and dispatch normally on the final. At most one speculation in flight: a newer qualifying interim replaces it (noteInterimTranscript is awaited on the transcript drain loop so replacements serialize — parity with Python's awaited _note_interim_transcript — and startSpeculation yields to a concurrently registered newer speculation, mirroring the Python fix). VAD speech_start during speculation aborts silently; handleStop / handleWsClose tear down without a miss. The token consume loop races the next LLM token against the release decision so a commit mid-token-silence flushes buffered audio immediately. New CallMetrics counters preemptive_hits / preemptive_misses (recordPreemptiveHit/Miss), mirroring Python. One TS-specific addition over the straight port: the released speculative task clears dispatchTask in its finally exactly like dispatchTurn's finally does — without this, canSpeculate() (which requires dispatchTask === null; the TS null-on-done convention, vs Python's dispatch.done()) stayed false for the rest of the call after the first hit. Covered by the sequential-two-hits regression test. 
Tests mirror Python's test_preemptive_generation.py: immediate start on punctuated interims, stability-window start, release (single LLM call, single history turn, hit counted, no audio before commit), mid-stream release flush, mismatch discard + normal re-dispatch, VAD abort, replacement, same-interim dedupe, buffer overflow, teardown without a miss, speculation gates (speaking / dispatch in flight), default-off. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
… (opt-in, both SDKs) Integrate the open pipecat-ai smart-turn v3 ONNX end-of-utterance model as an optional semantic turn detector for pipeline mode in the Python and TypeScript SDKs. Design: - Provider: SmartTurnDetector (getpatter/providers/smart_turn.py, src/providers/smart-turn.ts) implements the new TurnDetectorProvider interface (threshold / predict(pcm16-16k window) / close). The Whisper log-mel preprocessing (reflect-padded 400-pt STFT, 80 Slaney mel filters, last-8s left-padded window, zero-mean/unit-variance normalize) is ported natively in each SDK; cross-SDK numeric parity is locked by a reference-value test generated from the Python implementation. - Wiring: Agent.turn_detector / agent.turnDetector, off by default — the speech_end path is unchanged when unset. On a VAD speech_end the handler scores the rolling 8 s caller-audio window: probability >= threshold finalizes STT immediately (end-of-turn fires early); below threshold the finalize is HELD and re-scored every ~200 ms of further silence, capped by Agent.max_semantic_hold_ms / maxSemanticHoldMs (default 1200 ms, then plain vad_silence). A frame-driven poll plus a generation-guarded wall-clock backstop guarantee the cap even if inbound audio stalls; a VAD speech_start or an STT-side transcript commit cancels the hold. - Speech events: pipeline mode now fires on_user_speech_eos (only when a detector is configured — zero behavior change otherwise) with trigger EouTrigger.SEMANTIC_TURN_DETECTOR when the model decided the commit vs EouTrigger.VAD_SILENCE otherwise. - Graceful degradation: onnxruntime/numpy stay optional (the getpatter[turn-detector] extra; onnxruntime-node optionalDependency), imported lazily. 
SmartTurnDetector.maybe_load() / maybeLoad() warns once and returns None/undefined when the runtime or the model file (PATTER_SMART_TURN_MODEL or model_path) is unprovisioned, so the agent runs plain VAD-silence endpointing instead of crashing; load() keeps fail-fast errors with install/download instructions. At call time the handler fails open AND fails once: the first predict error logs a single warning and disables the detector for the rest of the call (the existing vadDisabled pattern). - Model weights are NOT bundled (~30 MB); downloaded by the user from https://huggingface.co/pipecat-ai/smart-turn-v3. Also fixes a scratch-buffer aliasing bug in the TS mixed-radix FFT base case (every n=50 sub-transform corrupted its even half and then overwrote it with the odd half), caught by the Python-generated reference-value parity test. Tests: pytest 2382 passed / exit 0 (ONNX session is the only mocked boundary, tagged @pytest.mark.mocked); vitest 1896 passed / exit 0 (*.mocked.test.ts twins); tsc --noEmit clean. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
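The hold-and-rescore rule can be sketched as a plain loop. This version is synchronous and ignores cancellation (VAD speech_start, STT-side commit) and the wall-clock backstop; `resolve_hold` and the `predict` callback are hypothetical, while the 200 ms re-score interval, the 1200 ms cap and the trigger names come from the commit message. The threshold is left as a required argument because the commit does not state a default.

```python
def resolve_hold(predict, threshold, rescore_ms=200, max_hold_ms=1200):
    """Return (trigger, held_ms) for one VAD speech_end.

    predict() scores the rolling caller-audio window and returns the
    end-of-turn probability.
    """
    held_ms = 0
    while held_ms < max_hold_ms:
        if predict() >= threshold:
            return "semantic_turn_detector", held_ms
        held_ms += rescore_ms  # hold the finalize through more silence
    return "vad_silence", max_hold_ms  # cap reached: plain VAD endpointing


scores = iter([0.2, 0.3, 0.9])
early = resolve_hold(lambda: next(scores), threshold=0.5)

calls = []


def never_confident():
    calls.append(1)
    return 0.1


capped = resolve_hold(never_confident, threshold=0.5)
```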
…t in pipeline mode
Integration reconciliation between the unconditional pipeline speech events (this branch) and the semantic turn detector's EOS stamping (the smart-turn feature, written against a base where pipeline EOS never fired):
- Pipeline EOS fires exactly once per committed turn, AT transcript commit (the analogue of Realtime's input_audio_buffer.committed), before the hook veto and handler-availability checks, covering the on_message path and orphaned turns that the old emission point next to record_turn_committed missed.
- The semantic detector's stamped trigger is consumed at that single point (semantic_turn_detector | vad_silence | manual_commit); the duplicate emission the feature carried is removed in both SDKs.
- TS emitUserSpeechEos gains the vad_silence/manual_commit resolution Python already had (it hardcoded vad_silence) and an explicit-trigger arg for the Realtime path.
- Released speculative turns (preemptive generation) bypass the dispatch path entirely: the release commit now performs the same semantic cleanup + EOS emission so combining the two opt-ins neither leaks a stale stamped trigger nor skips the event.
- Detector tests updated to the merged contract (EOS always fires; only the trigger differs).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…ide stereo WAV)
serve(local_recording=True) records every call from the media stream the SDK already proxies: no carrier recording API, no recording fees, audio never leaves the process. Works on Twilio, Telnyx and Plivo in every engine mode (pipeline / OpenAI Realtime / ElevenLabs ConvAI); independent of the carrier-side `recording` flag and off by default.
- New getpatter/audio/call_recorder.py: LocalCallRecorder writes an interleaved stereo WAV (left=caller, right=agent, the QA-standard layout), 16-bit PCM @ 16 kHz; mulaw 8k / pcm16 8k / pcm16 24k inputs are decoded per channel with stateful resamplers. Caller-clocked alignment: inbound PSTN frames are the wall clock, agent TTS bursts drain at that rate from a bounded FIFO (60 s cap, overflow force-flushed), the idle channel is zero-padded.
- Hot-path safe: 64 KiB buffered writes (no per-frame disk I/O), bounded memory, any I/O error disables the recorder without touching the call.
- Placeholder RIFF header is patched on close(); every handler cleanup path (including abnormal carrier WS drops) finalizes, so truncated calls still yield parseable WAVs.
- Wiring: EmbeddedServer.create_local_recorder resolves the target path (explicit dir string > call-log dir next to metadata.json/transcript.jsonl > ./recordings fallback); the three telephony bridges attach the recorder before handler start and surface `recording_path` in the on_call_end payload; CallLogger.log_call_end persists it in metadata.json. Because the WAV lives in the per-call log directory, PATTER_LOG_RETENTION_DAYS sweeps recordings too.
- Tests: WAV header/channel/length round-trips via stdlib wave, both-direction capture, caller-clock alignment + silence padding, encoding decodes, bounded backlog, buffered-write batching, abnormal-teardown finalization, idempotent close, path resolution + sanitization, bridge-level recording_path surfacing, retention sweep covering recordings, and config-off ⇒ zero filesystem writes.
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
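The placeholder-header technique can be shown with the standard library: write a 44-byte RIFF header claiming zero data bytes, append interleaved frames, then seek back and rewrite the header with the real sizes on close. `StereoWavSketch` and `wav_header` are hypothetical names; resampling, buffering and the agent FIFO are left out, and samples are plain integer lists for brevity.

```python
import io
import struct
import wave


def wav_header(data_bytes, sample_rate=16000, channels=2, bits=16):
    """Canonical 44-byte PCM WAV header for the given payload size."""
    block_align = channels * bits // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_bytes, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate,
        sample_rate * block_align, block_align, bits,
        b"data", data_bytes,
    )


class StereoWavSketch:
    """Hypothetical recorder: left = caller, right = agent, 16-bit PCM."""

    def __init__(self, fh):
        self._fh = fh
        self._data_bytes = 0
        self._closed = False
        fh.write(wav_header(0))  # placeholder, patched in close()

    def write_frames(self, caller, agent):
        n = max(len(caller), len(agent))
        caller = list(caller) + [0] * (n - len(caller))  # zero-pad idle channel
        agent = list(agent) + [0] * (n - len(agent))
        interleaved = [s for pair in zip(caller, agent) for s in pair]
        payload = struct.pack(f"<{len(interleaved)}h", *interleaved)
        self._fh.write(payload)
        self._data_bytes += len(payload)

    def close(self):
        if self._closed:
            return  # idempotent
        self._closed = True
        self._fh.seek(0)
        self._fh.write(wav_header(self._data_bytes))  # patch real sizes
        self._fh.seek(0, io.SEEK_END)


buf = io.BytesIO()
rec = StereoWavSketch(buf)
rec.write_frames(caller=[1, 2, 3], agent=[10, 20])
rec.close()
rec.close()

with wave.open(io.BytesIO(buf.getvalue()), "rb") as wav:
    params = wav.getparams()
    samples = struct.unpack("<6h", wav.readframes(wav.getnframes()))
```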
…arity with Python)
serve({ localRecording: true }) records every call SDK-side as a stereo
WAV (left=caller, right=agent), 16-bit PCM @ 16 kHz — same defaults,
layout, payload keys and metadata shape as the Python SDK.
- New src/audio/call-recorder.ts: LocalCallRecorder with per-channel
decode (mulaw_8k / pcm16_8k / pcm16_24k / pcm16_16k → PCM16 16 kHz via
stateful resamplers), caller-clocked alignment with a bounded 60 s
agent FIFO, 64 KiB batched writeSync (no per-frame disk I/O), and a
placeholder RIFF header patched on close(). Exported from the package
index (mirrors Python's importable getpatter.audio.call_recorder).
- StreamHandler taps: caller audio at the top of handleAudio (above
every engine-mode guard, wire codec from bridge.inputWireFormat);
agent audio in encodePipelineAudio — the single chokepoint for all
pipeline sends, decoding the carrier-native μ-law fast path instead of
skipping — and in onAdapterAudio for Realtime/ConvAI (μ-law wire,
PCM16 16 kHz for non-negotiated ConvAI).
- fireCallEnd finalizes the WAV on both teardown funnels (handleStop and
the abnormal handleWsClose) and surfaces `recording_path` in the
onCallEnd payload; EmbeddedServer.makeLocalRecorder resolves the
target path (explicit dir > call-log dir > ./recordings fallback) and
CallLogger.logCallEnd persists recording_path in metadata.json, with
callDir made public so the WAV lands next to transcript.jsonl and is
covered by the PATTER_LOG_RETENTION_DAYS sweep.
- Tests: real WAV byte round-trips (header fields, stereo mapping,
sample rate, lengths), both-direction capture through the live handler
taps, alignment + silence padding, encodings, bounded backlog,
buffered-write batching, abnormal-teardown finalization, idempotent
close, makeLocalRecorder path resolution + sanitization, retention
sweep covering recordings, and config-off ⇒ zero writes / no
recording_path key.
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
Two related in-call capabilities, both opt-in with zero behavior change
when unused:
WARM TRANSFER — transfer_call / CallControl.transfer gain carrier-neutral
mode ("cold" default, byte-identical blind redirect | "warm") + summary.
Warm mode on Twilio parks the caller in a per-call conference on hold
music, dials the human agent with the summary spoken first (<Say>), then
bridges the two as the AI leg ends. New signature-validated, fail-closed
webhooks: /webhooks/twilio/conference (lifecycle observability) and
/webhooks/twilio/warm-status (releases a caller stuck on hold when the
human never answers). Telnyx/Plivo return a clear {error} envelope and
keep the AI on the line — never a silent blind-redirect fallback.
Invalid modes are rejected with an error envelope on every path.
MULTI-AGENT HANDOFF — agent(handoffs={name: Agent}) injects a built-in
handoff_to(name, reason?) tool (names enum-constrained). Calling it (or
PipelineStreamHandler._perform_handoff programmatically) swaps the live
call to the target agent's system prompt, tools, variables, guardrails,
text transforms, consult tool, and onward handoffs — history preserved,
a [handoff] system line recorded and never replayed as a fabricated user
turn. Pipeline mode: LLMLoop.update_agent swaps prompt + tool list for
the next turn. Realtime mode: new OpenAIRealtimeAdapter.update_session
sends a partial session.update (GA adapter adds the mandatory
"type": "realtime" discriminator) BEFORE the function result so the next
response already runs as the target. Unknown targets / malformed args
return error envelopes — never silence. Audio infra established at call
start (STT/TTS/engine connection, hence voice on engines that cannot
switch mid-session) is retained; chained handoffs follow the target map.
Tests: tests/test_handoff.py + tests/unit/test_warm_transfer_unit.py —
authentic, mocking only the carrier REST boundary (@pytest.mark.mocked).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
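An enum-constrained handoff_to tool and its error envelopes might look like the sketch below. The schema layout follows the common JSON-Schema function-tool shape and is an assumption, as are `build_handoff_tool`, `resolve_handoff` and the envelope keys other than `error`; the agents are placeholder objects.

```python
def build_handoff_tool(handoffs):
    """Tool definition whose `name` argument only accepts registered targets."""
    return {
        "name": "handoff_to",
        "description": "Hand the live call over to another agent.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "enum": sorted(handoffs)},
                "reason": {"type": "string"},
            },
            "required": ["name"],
        },
    }


def resolve_handoff(handoffs, args):
    """Validate the call; answer with an error envelope rather than silence."""
    name = args.get("name") if isinstance(args, dict) else None
    if not isinstance(name, str):
        return {"error": "malformed arguments: 'name' must be a string"}
    if name not in handoffs:
        return {"error": f"unknown handoff target: {name}"}
    return {"target": name}


handoffs = {"support": object(), "billing": object()}  # placeholder agents
tool = build_handoff_tool(handoffs)
ok = resolve_handoff(handoffs, {"name": "billing", "reason": "invoice question"})
unknown = resolve_handoff(handoffs, {"name": "sales"})
malformed = resolve_handoff(handoffs, "billing")
```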
…ith Python
Mirrors the Python SDK feature-for-feature (snake_case <-> camelCase):
WARM TRANSFER — TRANSFER_CALL_TOOL gains mode ("cold" | "warm") + summary;
TelephonyBridge.transferCall / CallControl.transfer accept an optional
TransferCallOptions and resolve a TransferCallResult envelope for warm
mode. TwilioBridge implements the conference REST sequence (park caller
on hold -> dial human with <Say> summary -> bridge; recovery release when
the target dial fails), with the new signature-validated, fail-closed
/webhooks/twilio/conference + /webhooks/twilio/warm-status routes.
TelnyxBridge/PlivoBridge return the documented { error } envelope for
warm mode — never a silent blind-redirect fallback. Cold mode stays
byte-identical. New public types TransferCallOptions / TransferCallResult.
MULTI-AGENT HANDOFF — agent({ handoffs }) validates and stores the target
registry; buildAIAdapter advertises the built-in handoff_to tool (enum-
constrained names); the stream handler dispatches it on both paths:
performHandoff (pipeline — LLMLoop.updateAgent swaps prompt + tools for
the next turn) and handleHandoffFunctionCall (realtime — partial
session.update BEFORE the function result). OpenAIRealtimeAdapter gains
updateSession; the GA adapter overrides buildSessionUpdatePatch to add
the mandatory "type": "realtime" discriminator the GA endpoint requires
(without it the handoff session.update would be rejected). History is
preserved; [handoff] system entries are recorded and skipped on replay;
unknown targets / malformed args return error envelopes — never silence.
Tests: tests/handoff.test.ts + tests/warm-transfer.mocked.test.ts —
authentic, mocking only the carrier boundary (fetch / injected WS).
https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
Document the transfer_call mode/summary parameters (Twilio-implemented warm mode, clear-envelope fallback on Telnyx/Plivo) and the new handoff_to built-in tool (handoffs registry, history preservation, Realtime session.update vs Pipeline next-turn semantics) in both SDK tool pages, and add the two Unreleased changelog entries. https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
…eculation path Cross-feature reconciliation after integrating preemptive generation and multi-agent handoff: specSpeakSentence read deps.agent.guardrails (written before the handoff feature introduced the mutable currentAgent view), so a mid-call handoff's guardrails did not apply to speculative sentences. The pause-resume drain and speculation send paths also gained the carrier-native AEC far-end gate and the local-recording agent tap during integration (committed with their cherry-picks). https://claude.ai/code/session_01BJxPoE8v57ck3EDfeZCpo4
Integrates the claude/affectionate-davinci-kzje88 work branch (42 commits, 2026-06-10) into main, with latest main merged in (clean, no conflicts).
Features
- … handoff_to built-in tool, Python + TypeScript parity, with docs.
- … local_recording/localRecording), both SDKs.
Review-wave fixes
Pipeline orchestrator correctness, barge-in/tool-dispatch safety, STT adapter clone-per-call, TTS wire formats (Telnyx/Cartesia), engine fixes (ConvAI, Gemini Live, Ultravox, GA listener leak), telephony/WS hardening (Telnyx outbound media, Ed25519, Plivo), dashboard-app fixes (ingest crash, SSE freeze, live rows), observability (GA cost attribution, speech events end-to-end, events.jsonl, telemetry opt-out hygiene), audio (AEC staleness, resampler boundaries, 1h watchdog), type-annotation/export gaps, audio_filter wiring via extracted InputProcessingChain.
Verification