
feat: add pith-skill context compression server optimized for Windows #4404

Open
VjAlbert wants to merge 7 commits into modelcontextprotocol:main from VjAlbert:feat/add-pith-skill-compression


@VjAlbert


Description

This PR introduces Pith-Skill, an MCP server written in Python and optimized for Windows environments. It adds a token-aware context compression engine that reduces token bloat during long, high-context LLM analysis sessions.

Key Features & Windows Optimizations

  • Robust Encoding Fix: Calls sys.stdout.reconfigure at startup to force UTF-8 standard output, which prevents the UnicodeEncodeError: 'charmap' codec can't encode character '→' error that occurs on Windows CP1252 consoles when LLM-generated text contains Unicode characters.
  • Token-Aware Context Compression: Provides a structured mechanism (compress.py) to prune, filter, and condense massive context logs before they hit the token limit.
  • Built-in Validation: Includes an evaluation suite (tests/run_evals.py and tests/evals.json) that checks compression behavior and the preservation of code, JSON, and URLs.
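
The encoding fix described above can be illustrated with a minimal sketch. The helper names `force_utf8` and `can_write` are hypothetical and are not taken from the PR; only `sys.stdout.reconfigure` is named in the description. The sketch assumes Python 3.7 or later, where text streams expose `reconfigure()`.

```python
import io
import sys


def force_utf8(stream):
    """Reconfigure a text stream to UTF-8 if it supports reconfigure()."""
    # Hypothetical helper: streams replaced by a test harness may lack the method.
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        reconfigure(encoding="utf-8")
    return stream


def can_write(stream, text):
    """Return True if the stream can encode and write the given text."""
    try:
        stream.write(text)
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Apply the fix once at startup, before any output is produced.
    force_utf8(sys.stdout)
    print("a → b")
```

Wrapping an in-memory buffer in `io.TextIOWrapper(..., encoding="cp1252")` reproduces the failure without a Windows console: writing the arrow fails before `force_utf8` is applied and succeeds afterwards.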

How It Works

  1. Extract — code blocks, JSON, URLs, file paths quarantined before processing (never touched)
  2. Score — each sentence scored by Zipf density (word length ≥ 7 chars as rarity proxy)
  3. Filter — top N% sentences by density selected (default: 60%)
  4. Benford Gate — if compression increases MAD vs Benford's Law by >2×, relax ratio and retry (max 3 attempts)
  5. Reassemble — original sentence order restored, preserved blocks reinserted
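
As a rough illustration of steps 1, 2, 3 and 5, the sketch below quarantines fenced code blocks and URLs, scores sentences by the share of words with at least 7 characters, keeps the densest 60%, and restores the original order. It is not the code in compress.py: the regular expressions, the placeholder scheme, the sentence splitter, and the rule that sentences holding a quarantined block are always kept are all assumptions made for this example. JSON and file-path handling and the Benford gate are left out, and whitespace between sentences is normalized to single spaces.

```python
import math
import re

FENCE = "`" * 3
# Assumed patterns for protected content: fenced code blocks and URLs only.
PROTECTED = re.compile(
    re.escape(FENCE) + r".*?" + re.escape(FENCE) + r"|https?://\S+", re.DOTALL
)
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def density(sentence):
    """Share of words with at least 7 characters (the rarity proxy)."""
    words = re.findall(r"[A-Za-z]+", sentence)
    if not words:
        return 0.0
    return sum(len(w) >= 7 for w in words) / len(words)


def compress(text, ratio=0.6, min_sentences=5):
    blocks = []

    def quarantine(match):
        # Swap protected content for a numbered placeholder.
        blocks.append(match.group(0))
        return f"\x00{len(blocks) - 1}\x00"

    masked = PROTECTED.sub(quarantine, text)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", masked.strip()) if s]
    if len(sentences) < min_sentences:
        return text  # short payloads pass through unchanged

    keep = max(1, math.ceil(len(sentences) * ratio))
    ranked = sorted(
        range(len(sentences)), key=lambda i: density(sentences[i]), reverse=True
    )
    chosen = set(ranked[:keep])
    # Assumption: a sentence that carries a quarantined block is never dropped.
    chosen.update(i for i, s in enumerate(sentences) if PLACEHOLDER.search(s))

    body = " ".join(sentences[i] for i in sorted(chosen))
    return PLACEHOLDER.sub(lambda m: blocks[int(m.group(1))], body)
```

Because the sort is stable, sentences with equal density are kept in favor of the earlier one, and `sorted(chosen)` puts the survivors back in their original order before the protected blocks are reinserted.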

MCP Tools Exposed

| Tool | Output | Use Case |
| --- | --- | --- |
| compress | Header + compressed text | Standard agent handoffs |
| compress_with_metadata | Full JSON with token counts + Benford MAD | Programmatic pipelines |
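
The Benford MAD value is presumably the mean absolute deviation between an observed leading-digit distribution and the one Benford's Law predicts, where digit d has probability log10(1 + 1/d). The PR does not say which numbers the digits are drawn from, so the sketch below assumes word-frequency counts purely for illustration. The names `benford_mad`, `word_counts`, and `gated_compress`, and the relaxation step of 0.15, are hypothetical. The retry loop follows the gate described under How It Works: relax the ratio when the MAD more than doubles, with at most 3 attempts.

```python
import math
from collections import Counter

# Expected leading-digit probabilities under Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_mad(values):
    """Mean absolute deviation of leading-digit frequencies from Benford's Law."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    if not digits:
        return 0.0
    counts = Counter(digits)
    return sum(abs(counts[d] / len(digits) - BENFORD[d]) for d in range(1, 10)) / 9


def word_counts(text):
    """Assumed numeric series: how often each distinct word occurs."""
    return list(Counter(text.lower().split()).values())


def gated_compress(text, compress, ratio=0.6, step=0.15, max_attempts=3):
    """Retry with a relaxed ratio while the MAD exceeds twice the baseline."""
    baseline = benford_mad(word_counts(text))
    result, used = text, ratio
    for _ in range(max_attempts):
        used = ratio
        result = compress(text, used)
        if benford_mad(word_counts(result)) <= 2 * baseline:
            break
        ratio = min(1.0, ratio + step)  # keep more sentences on the next try
    return result, used
```

Here `compress` is any callable that takes the text and a ratio. The function returns the last result together with the ratio that produced it, which is the kind of detail a metadata-returning tool could report.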

Testing Status

Tested locally in Windows PowerShell using Anthropic's recommended prompt structures. All 7 validation evals pass with UTF-8 output enforced:

  • TC01: core compression (verbose web search result)
  • TC02: code preservation (code blocks never touched)
  • TC03: passthrough (short payloads < 5 sentences)
  • TC04: JSON preservation (structured data intact)
  • TC05: aggressive compression (ratio 0.4)
  • TC06: URL preservation (all URLs survive compression)
  • TC07: Benford metadata validation (JSON output with MAD values)

Installation

uvx mcp-server-pith

Claude Desktop (Windows):

{
  "mcpServers": {
    "pith": {
      "command": "cmd",
      "args": ["/c", "uvx", "mcp-server-pith"]
    }
  }
}

VjAlbert and others added 7 commits June 22, 2026 14:49
Introduces mcp-server-pith, a zero-dependency Python MCP server that
compresses inter-agent payloads using Zipf word-density scoring validated
by Benford's Law structural integrity check.

Exposes two tools: `compress` (text output with header) and
`compress_with_metadata` (JSON with token counts and Benford MAD values).
Applies sys.stdout.reconfigure(encoding="utf-8") at startup to prevent
UnicodeEncodeError on Windows CP1252 terminals.

All 7 eval cases pass (core compression, code preservation, passthrough,
JSON preservation, aggressive compression, URL preservation, Benford metadata).

Replace legacy 'Zipf density scoring' label with accurate 'Shannon
local information scoring' in server.py tool description.
Rewrite README 'How it Works' section to reflect v2 pipeline:
SIZE_GATE, Shannon LUT, filler pre-pass, polarity checksum,
Benford gate, XML receptor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace LOG_CACHE static lookup with @functools.lru_cache(maxsize=8192) on
_log2; update README docs to reflect lru_cache approach; 22/22 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>