
fix(telegram+config): per-inbound typing keying, tighter context cache, higher max_tokens#997

Open
minhdang03 wants to merge 1 commit into nextlevelbuilder:dev from minhdang03:fix/telegram-typing-config-tuning

Conversation

@minhdang03

Summary

Three follow-ups to #996, all grouped around the "two inbounds for the same chat in quick succession" failure mode and the truncation-related UX bugs.

  • Typing controller: key by inbound message id, not localKey. With scheduler `maxConcurrent=1` on DMs, msg2 queues behind an in-flight msg1. The old handler replaced `typingCtrls[localKey]` whenever msg2 arrived, so when `Send()` for msg1's reply ran it yanked a typing controller that actually belonged to msg2. Result: msg2 would run silently with no typing indicator, which users read as "bot died". Each inbound now owns its own controller under a composite key `localKey#inboundID`; `Send()` looks up via the new `inbound_message_id` outbound metadata and stops only its own (with a graceful fallback to localKey for non-Telegram outbound paths).
  • USER.md cache TTL 5m → 1m. The five-minute window made same-conversation writes invisible for long enough that predefined agents would "forget" their own USER.md updates on the next turn. Direct writes invalidate their own key already, but indirect writes (admin UI edits, background workers) could linger up to 5 min. One minute is still long enough to absorb most of the I/O load for an active session.
  • DefaultMaxTokens 8192 → 16384. Modern agents doing step-by-step reasoning plus tool calls plus a final answer can hit the 8192 cap, producing `finish_reason=length` mid-bold and cut messages on Telegram. Providers bill actual completion tokens, not the cap, so raising the ceiling has no per-call cost impact for normal replies — it just removes a recurring truncation footgun.
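
The composite-key scheme from the first bullet can be sketched as follows. The helper name `typingKeyFromIDs` and the `localKey#inboundID` shape come from the PR itself; the exact signature and zero-value sentinel here are assumptions for illustration:

```go
package main

import (
	"fmt"
	"strconv"
)

// typingKeyFromIDs builds the composite key that gives each inbound message
// its own typing controller, so Send() for msg1's reply can never stop a
// controller that belongs to msg2. Signature is illustrative.
func typingKeyFromIDs(localKey string, inboundID int64) string {
	if inboundID == 0 {
		// Zero-id fallback: outbound paths that carry no
		// inbound_message_id metadata keep using the plain localKey.
		return localKey
	}
	return localKey + "#" + strconv.FormatInt(inboundID, 10)
}

func main() {
	// Two messages in the same chat no longer collide on one key.
	fmt.Println(typingKeyFromIDs("tg:12345", 101)) // tg:12345#101
	fmt.Println(typingKeyFromIDs("tg:12345", 102)) // tg:12345#102
	fmt.Println(typingKeyFromIDs("tg:12345", 0))   // tg:12345
}
```

Because the key is deterministic, the handler that registered the controller and the `Send()` path that stops it can derive it independently from the same (localKey, inbound id) pair.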

Files changed

| Path | What |
| --- | --- |
| `internal/channels/telegram/handlers.go` | Key typing by inbound message id; stop removing previous controllers on same localKey. |
| `internal/channels/telegram/handlers_utils.go` | New `typingKeyFromIDs` helper (localKey + msg id composite). |
| `internal/channels/telegram/send.go` | `Send()` looks up typing via `inbound_message_id` metadata with localKey fallback. |
| `cmd/gateway_consumer_normal.go` | Propagate `inbound_message_id` into outbound metadata. |
| `internal/tools/context_file_interceptor.go` | Reduce `defaultContextCacheTTL` from 5m to 1m; add explanatory comment. |
| `internal/config/defaults.go` | Bump `DefaultMaxTokens` from 8192 to 16384 with rationale. |
| `internal/channels/telegram/handlers_utils_test.go` | Three new cases for `typingKeyFromIDs` — distinct keys per msg id, zero-id fallback, determinism. |

Test plan

  • `go build ./...` (PG variant)
  • `go build -tags sqliteonly ./...` (desktop variant)
  • `go vet ./...`
  • `go test ./internal/channels/telegram/... ./internal/tools/... ./internal/config/... ./internal/pipeline/... ./internal/i18n/...` — all pass
  • Manual: send two Telegram messages to the same chat within ~10s, confirm each gets its own typing indicator that stops only when its own reply arrives
  • Manual: edit USER.md via admin UI, send a message within 90s, confirm agent sees fresh content
  • Manual: send a long prompt that previously produced a cut reply, confirm full message fits within the new `max_tokens` ceiling
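
The truncation failure mode the `DefaultMaxTokens` bump addresses can be sketched as a check on the provider's finish reason. The struct and helper names below are illustrative, not the project's types; only the `finish_reason=length` signal and the 16384 value come from the PR:

```go
package main

import "fmt"

// completion mirrors the two response fields relevant here; illustrative only.
type completion struct {
	Text         string
	FinishReason string
}

const defaultMaxTokens = 16384 // raised from 8192 in internal/config/defaults.go

// truncated reports whether the provider stopped because the max_tokens
// ceiling was hit rather than at a natural stop. A reply cut this way can
// end mid-Markdown (e.g. an unclosed "**"), which is the Telegram symptom
// the higher ceiling is meant to avoid.
func truncated(c completion) bool {
	return c.FinishReason == "length"
}

func main() {
	ok := completion{Text: "Done.", FinishReason: "stop"}
	cut := completion{Text: "...and the **final answ", FinishReason: "length"}
	fmt.Println(truncated(ok), truncated(cut)) // false true
	fmt.Println("request ceiling:", defaultMaxTokens)
}
```

Since providers bill actual completion tokens rather than the ceiling, a higher cap only changes behavior for replies that would otherwise have been cut.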

Relation to #996

These are strictly additive. #996 is already enough to stop the visible "**70" leak and the silent-error dead-end. This PR closes the remaining two footguns (typing yank + truncation pressure) and reduces the staleness window on USER.md.

🤖 Generated with Claude Code

…he + raise max_tokens

Three follow-ups to the earlier telegram-error / markdown-balance fix, all
concentrating around "user sends two messages in rapid succession".

**Typing controller: key by inbound message id, not localKey.**
With scheduler maxConcurrent=1 for DM, msg2 queues behind an in-flight msg1.
The old handler replaced typingCtrls[localKey] when msg2 arrived, and then
Send() for msg1's reply yanked that shared controller — leaving msg2 to run
silently with no typing indicator. Now each inbound owns its own typing
controller under a composite key (localKey + msg id) and Send() stops only
its own via the new `inbound_message_id` outbound metadata field (with a
graceful fall-through to localKey for non-Telegram outbounds).

**USER.md context-file cache TTL 5m → 1m.**
The 5-minute cache made same-conversation writes invisible for a long
window — a predefined agent could write USER.md, then re-read its stale
cached copy on the next turn. Direct writes invalidate their own key, but
indirect writes (admin UI, background workers) could linger up to 5 min.
One minute is short enough to be safe for bursty conversational writes
without re-reading the file every turn.

**DefaultMaxTokens 8192 → 16384.**
Modern agents doing step-by-step reasoning plus tool calls plus a final
answer can hit the 8192 cap, producing finish_reason=length mid-bold and
cut messages on Telegram. Providers bill actual completion tokens, not the
cap, so raising it has no cost impact on normal replies but removes a
recurring truncation footgun for long answers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
