fix(telemetry): three delivery bugs + Realtime model in feature_used #167
Merged
Delivery fixes (both SDKs), each found by E2E-testing the published 0.6.6 artifacts against a real local collector:

1. GC loss: a `TelemetryClient` constructed fire-and-forget (the CLI's `cli_command` pattern) was garbage-collected with its buffered events before the exit flush ran, because the registry is a WeakSet by design. Clients now hold a strong module-level reference (`_PENDING_FLUSH` / `pendingFlush`) from the first buffered event until the buffer drains.
2. In-flight close loss: `aclose()` / `close()` flushed an already-empty buffer while the POST started by `record()` was still in flight, killing it mid-air on prompt shutdown. Close now awaits the in-flight flush first.
3. In-flight stranding: events recorded while a flush POST was in flight stayed buffered with no flush scheduled (`record()` saw the live task and skipped). A completed flush now chains another when the buffer is non-empty, so constructor events can no longer shadow agent-time events.

Added: Realtime agents now report their model variant in `feature_used` via the existing sanitized `llm_model` dimension (`openai-gpt-realtime-2`, ...; custom names still collapse to `openai-other`). The dedupe key includes the model, so two agents on different Realtime models both record. No schema change: `llm_model` was already an allowlisted string dimension.

Tests: regression tests for all three delivery bugs + realtime-model capture, in both suites (Python 51 passed, TS 39 passed).
FrancescoRosciano added a commit that referenced this pull request Jun 11, 2026
Bump getpatter to 0.6.7 across all four version files and roll the CHANGELOG "## Unreleased" block into "## 0.6.7 (2026-06-10)". Release contents (this PR): the telemetry delivery fix wave (#167) — three root-cause fixes so fire-and-forget events actually arrive (WeakSet GC loss of unreferenced clients, close() killing the in-flight POST, in-flight flush stranding subsequent events) — plus the Realtime model variant in feature_used (llm_model: openai-gpt-realtime-2 / -mini / ...).
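The Realtime model capture mentioned in this release note could work roughly as sketched below. This is an illustrative assumption, not the SDK's implementation: the allowlist contents, `sanitize_llm_model`, `FeatureUsedRecorder`, the feature name `realtime_agent`, and the raw model strings are all hypothetical. Only the `openai-gpt-realtime-2` and `openai-other` values and the idea of a dedupe key that includes the model come from the PR text.

```python
# Hypothetical allowlist: only this one value is taken from the PR text.
_KNOWN_MODELS = {"openai-gpt-realtime-2"}


def sanitize_llm_model(provider, model):
    """Collapse anything not on the allowlist to '<provider>-other'."""
    candidate = f"{provider}-{model}"
    return candidate if candidate in _KNOWN_MODELS else f"{provider}-other"


class FeatureUsedRecorder:
    """Hypothetical recorder that dedupes feature_used per (feature, model)."""

    def __init__(self):
        self._seen = set()
        self.events = []

    def record(self, feature, provider, model):
        llm_model = sanitize_llm_model(provider, model)
        key = (feature, llm_model)  # the model is part of the dedupe key
        if key in self._seen:
            return False
        self._seen.add(key)
        self.events.append(
            {"event": "feature_used", "feature": feature, "llm_model": llm_model}
        )
        return True


def demo():
    recorder = FeatureUsedRecorder()
    results = [
        recorder.record("realtime_agent", "openai", "gpt-realtime-2"),
        recorder.record("realtime_agent", "openai", "gpt-realtime-2"),  # duplicate
        recorder.record("realtime_agent", "openai", "my-private-name"),  # custom name
    ]
    return results, [e["llm_model"] for e in recorder.events]
```

Because the sanitized model is part of the key, a second agent on a different model still records, while a repeat of the same pair is dropped. Raw custom names never reach the event payload.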
Summary
E2E testing of the published 0.6.6 packages against a real local collector found that several telemetry events are recorded but never delivered. Three root-cause fixes (both SDKs, per `sdk-parity`) plus one small capture gap closed:

Fixed

- `TelemetryClient(...).record(...)` without a held reference (the CLI's `cli_command` pattern) was garbage-collected with its buffered events before the atexit/beforeExit flush. The registry is a WeakSet by design; clients now hold a strong module-level ref (`_PENDING_FLUSH` / `pendingFlush`) from the first buffered event until the buffer drains.
- `aclose()` / `close()` flushed an already-empty buffer while the POST started by `record()` was still in flight; prompt shutdown killed it mid-air. Close now awaits the in-flight flush first.
- Events recorded while a flush POST was in flight stayed buffered with no flush scheduled (the `first_run` POST shadowed `sdk_initialized` + all agent-time events). A completed flush now chains another when the buffer is non-empty.

Added

- Realtime model variant in `feature_used`: `llm_model: "openai-gpt-realtime-2"` etc. via the existing sanitized `llm_model` dimension (no schema bump; custom names still collapse to `openai-other`). Dedupe key includes the model. Docs table updated.

Test plan

- `scripts/pr-validate.sh`: Python tests (full suite) + security tests + TS lint/tests/build, all green

Notes

- `getpatter hermes` / `getpatter openclaw` CLI commands never emitted `cli_command` (only dashboard/eval/other do). Left as-is, worth a follow-up.
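The two in-flight fixes listed under Fixed can be illustrated together with a small asyncio sketch. It is a simplified assumption of the behaviour, not the SDK's code: `ChainingClient`, `slow_send`, and `demo` are hypothetical, and the chaining is modelled as a loop inside one task rather than a new task per flush.

```python
import asyncio


class ChainingClient:
    """Hypothetical client showing chained flushes and a close that waits."""

    def __init__(self, send):
        self._send = send   # hypothetical async callable taking a list of events
        self._buffer = []
        self._task = None   # the in-flight flush task, if any

    def record(self, event):
        self._buffer.append(event)
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._flush())
        # Otherwise a flush is live; its loop below picks this event up.

    async def _flush(self):
        # Keep going until the buffer is empty, so events recorded while
        # a POST is in flight are not left stranded.
        while self._buffer:
            batch, self._buffer = self._buffer, []
            await self._send(batch)

    async def aclose(self):
        # Await the in-flight flush first instead of flushing an empty buffer.
        if self._task is not None:
            await self._task
        if self._buffer:
            await self._flush()


async def demo():
    delivered = []

    async def slow_send(batch):
        await asyncio.sleep(0.01)  # simulated POST latency
        delivered.append(list(batch))

    client = ChainingClient(slow_send)
    client.record("first_run")        # starts a flush task
    await asyncio.sleep(0)            # let the task begin its POST
    client.record("sdk_initialized")  # recorded while the POST is in flight
    await client.aclose()             # prompt shutdown
    return delivered
```

Running `asyncio.run(demo())` delivers both events in two batches. Dropping the `while` loop would strand `sdk_initialized`, and skipping the `await self._task` in `aclose()` would let shutdown race the first POST.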