fix: keep failed tasks resumable + clearer timeout error + quieter Feishu logs #29
Merged
…hu logs

Four independent backend fixes surfaced while debugging a Feishu reply that said "Task #14 not found or has no saved session".

- taskboard.py: persist the agent session_id (claude session / codex thread_id) even when a run FAILS. It was only saved on success, so a failed task could never be resumed (replying in a Feishu/Slack/Telegram thread hit "no saved session"). Codex emits thread_id in the opening thread.started event, so even a started-then-failed run is recoverable.
- taskboard.py: on timeout, use "Task timed out after Ns" as the task.error summary instead of letting _extract_error_summary surface an unrelated stderr line (e.g. codex's "Reading additional input from stdin..."). Also hoist the timed_out init before the try block so the failure branch can read it when Popen itself raises (CLI not found) before the timer is armed.
- channels/feishu_channel.py: register no-op processors for the message_read and recalled receipts we subscribe to but don't act on, silencing the SDK's "processor not found, type: im.message.message_read_v1" ERROR per receipt.
- channels/feishu_channel.py: set the "Lark" logger propagate=False so lines aren't emitted twice (its own handler + root basicConfig handler), and lower log_level DEBUG -> INFO to further cut noise.

Tests: +5 via red-green TDD; full suite 824 passed, coverage 92.79%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
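The timeout bullet above can be illustrated with a minimal, standalone sketch. It is not the code in taskboard.py: `run_with_timeout` and its return shape are hypothetical names chosen for illustration, and the last-stderr-line fallback only stands in for whatever `_extract_error_summary` really does. What it shows is the ordering: the timeout flag exists before the `try`, so the failure branch can read it even when `Popen` raises, and a timeout takes precedence over any stderr text.

```python
import subprocess
import threading


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (success, error_summary). Hypothetical helper."""
    # Created BEFORE the try block: if Popen raises (CLI not found), the
    # failure branch below can still read the flag without an
    # UnboundLocalError.
    timed_out = threading.Event()
    timer = None
    stderr = ""
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )

        def _kill():
            timed_out.set()  # record the timeout first, then kill
            proc.kill()

        timer = threading.Timer(timeout_s, _kill)
        timer.start()
        _, stderr = proc.communicate()
        success = proc.returncode == 0 and not timed_out.is_set()
    except FileNotFoundError as exc:
        success, stderr = False, str(exc)
    finally:
        if timer is not None:
            timer.cancel()

    if success:
        return True, None
    if timed_out.is_set():
        # The timeout message wins over whatever the process wrote to stderr.
        return False, f"Task timed out after {timeout_s}s"
    # Assumed stand-in for the real stderr summariser.
    lines = stderr.strip().splitlines()
    return False, lines[-1] if lines else "unknown error"
```

Calling it with a command that does not exist returns a failure whose summary comes from the `FileNotFoundError`, while a command that outlives `timeout_s` returns the "Task timed out after Ns" summary regardless of its stderr.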
Summary
Four independent backend fixes, all found while tracing a Feishu bot reply that said "❌ Task #14 not found or has no saved session."
- Persist session_id on failed runs (taskboard.py)
- task.error = "Task timed out after Ns" (taskboard.py)
- No-op processors for message_read / recalled receipts (feishu_channel.py): removes ERROR … processor not found, type: im.message.message_read_v1 on every read receipt
- Lark logger propagate=False + log_level DEBUG→INFO (feishu_channel.py)

Root causes
- session_id was written only inside the if success: branch; the failure branch dropped it. Codex emits thread_id in the opening thread.started event, so even a started-then-failed run has a recoverable id.
- The timeout path set output="Task timed out after Ns", but the failure branch overwrote the summary via _extract_error_summary(raw_stderr, …), which picked a noisy codex stderr line. Also hoisted the timed_out init before the try so the FileNotFoundError path no longer raises UnboundLocalError.
- Processors were registered for receive / bot_added / reaction only; the app is also subscribed to message_read, so the SDK logged "processor not found" per receipt.
- The "Lark" logger has its own stdout handler and propagates to the root logger (configured by logging.basicConfig in taskboard.py) → two emissions per line.

Test plan
+5 tests via red→green TDD:
- test_execute_task_codex_failure_still_persists_session_id
- test_execute_task_claude_failure_still_persists_session_id
- test_execute_task_timeout_error_summary_states_timeout_not_stderr
- test_start_registers_readonly_event_noops
- test_start_disables_lark_logger_propagation

make check: 824 passed, coverage 92.79% (gate 90%), ruff lint + format clean.

🤖 Generated with Claude Code
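For reviewers who want to see the double-emission root cause in isolation, here is a small standard-library sketch. It does not touch the Feishu SDK: `ListHandler` and `count_emissions` are hypothetical helpers, and the two handlers only stand in for the SDK's own stdout handler and the root handler installed by logging.basicConfig.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages in a list instead of printing them."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        self.sink.append(record.getMessage())


def count_emissions(propagate):
    """Log one line on the "Lark" logger and count how often it is emitted."""
    sink = []
    root = logging.getLogger()
    lark = logging.getLogger("Lark")
    root_handler = ListHandler(sink)  # stand-in for the basicConfig handler
    lark_handler = ListHandler(sink)  # stand-in for the SDK's own handler
    saved_level, saved_propagate = lark.level, lark.propagate

    root.addHandler(root_handler)
    lark.addHandler(lark_handler)
    lark.setLevel(logging.INFO)
    lark.propagate = propagate
    try:
        lark.info("connected")
    finally:
        # Restore global logging state so the demo has no side effects.
        root.removeHandler(root_handler)
        lark.removeHandler(lark_handler)
        lark.setLevel(saved_level)
        lark.propagate = saved_propagate
    return len(sink)
```

With `propagate=True` the record reaches both the logger's own handler and the root handler; with `propagate=False` only the logger's own handler sees it, which is the behaviour test_start_disables_lark_logger_propagation is guarding.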