Response compression: compress API response before returning to agent #7

Merged
alderpath merged 2 commits into master from response-compression on Jun 14, 2026

@alderpath

Contributor

Compresses the assistant message content in API responses before they are returned to the agent. The reasoning blocks in each response are compressed by the code-block-aware compressor, saving 2-10% of wire size per turn. The savings compound across turns because the compressed responses are reused in the conversation history.

New compress_response_body() function:
- Deserializes the API JSON response
- Runs compress_assistant_text() on each choice's message content
- Code-block-aware: prose sections compressed, code blocks verbatim
- Accepts any positive savings (no minimum threshold)
- Returns the modified body and reports the characters saved in the x-reliaty-response-saved header
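
The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from this PR: the function names match the description, but the response shape (`choices[].message.content`), the fence-matching regex, and the whitespace-collapsing `compress_prose` stand-in are all assumptions made for the example.

```python
import json
import re

# Matches fenced code blocks, which must pass through verbatim.
# The capture group makes re.split keep the fenced blocks in its output.
FENCE_RE = re.compile(r"(```.*?```)", re.DOTALL)


def compress_prose(text: str) -> str:
    """Stand-in prose compressor (assumption): collapse space/tab runs and blank-line runs."""
    text = re.sub(r"[ \t]+", " ", text)
    return re.sub(r"\n{3,}", "\n\n", text)


def compress_assistant_text(text: str) -> str:
    """Compress prose segments and leave fenced code blocks untouched."""
    parts = FENCE_RE.split(text)
    # With one capture group, odd indices hold the fenced blocks.
    return "".join(
        part if i % 2 == 1 else compress_prose(part)
        for i, part in enumerate(parts)
    )


def compress_response_body(body: str) -> tuple[str, int]:
    """Return (possibly modified body, characters saved)."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return body, 0  # not JSON: pass through unchanged
    if not isinstance(data, dict):
        return body, 0

    saved = 0
    for choice in data.get("choices", []):
        message = choice.get("message") if isinstance(choice, dict) else None
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        if not isinstance(content, str):
            continue
        compressed = compress_assistant_text(content)
        gain = len(content) - len(compressed)
        if gain > 0:  # any positive saving is accepted, no minimum threshold
            message["content"] = compressed
            saved += gain

    if saved == 0:
        return body, 0  # nothing changed: return the original bytes
    return json.dumps(data), saved
```

A caller would then return the new body and set `x-reliaty-response-saved` to `str(saved)` on the outgoing response; how the header is attached depends on the HTTP framework in use.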

Also fixed the compress_prose_inline threshold: it now accepts any saving of 10 or more characters
(it previously required 85% of the original threshold).
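
As a rough illustration of the new acceptance rule only, the check might look like the sketch below. The helper name `accept_inline_compression` and the constant are hypothetical and not taken from the PR; the previous 85% rule is not reproduced because the description does not spell out how it was computed.

```python
MIN_INLINE_SAVINGS = 10  # characters; the value stated in the PR description


def accept_inline_compression(original: str, compressed: str) -> bool:
    """Accept the compressed text when it saves at least MIN_INLINE_SAVINGS characters."""
    return len(original) - len(compressed) >= MIN_INLINE_SAVINGS
```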

Verified: fires on real LLM responses (~2% savings on reasoning output).
@alderpath alderpath merged commit 52c12f7 into master Jun 14, 2026
2 of 4 checks passed
@alderpath alderpath deleted the response-compression branch June 14, 2026 11:05
