From 33f454cc2ea5a658d32b15713dff472fdbca019b Mon Sep 17 00:00:00 2001
From: Adam Fisk
Date: Fri, 26 Jun 2026 15:53:21 -0600
Subject: [PATCH 1/2] curate: weekly site-rules curation agent (the "invent new methods" pass)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the judgment layer of the self-improvement loop on top of the
deterministic probe harness. bench/curate-inputs.sh gathers the weekly
inputs (stats with per-cause failure breakdown, latest probe traces,
currently published rules) into one JSON; agent-skill/wick-curate/SKILL.md
reasons about what the fixed matrix couldn't crack — sites failing EVERY
cell (e.g. apkpure), high user-offline noise, seed/measured conflicts —
and proposes + tests genuinely new methods (different residential country,
wait_for_selector, URL rewrites, CEF+residential), then re-probes and
republishes. Guardrail: never publish a rule that isn't backed by a
passing probe.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 agent-skill/wick-curate/SKILL.md | 128 +++++++++++++++++++++++++++++++
 bench/curate-inputs.sh           |  74 ++++++++++++++++++
 2 files changed, 202 insertions(+)
 create mode 100644 agent-skill/wick-curate/SKILL.md
 create mode 100755 bench/curate-inputs.sh

diff --git a/agent-skill/wick-curate/SKILL.md b/agent-skill/wick-curate/SKILL.md
new file mode 100644
index 0000000..6ee48f5
--- /dev/null
+++ b/agent-skill/wick-curate/SKILL.md
+---
+name: wick-curate
+description: Weekly self-improvement curation pass for Wick's site-rules. Reads the public stats (with per-cause failure breakdown), the residential probe traces, and the currently-published rules, then reasons about anomalies the deterministic harness can't resolve on its own — sites that fail EVERY tested strategy, high user-offline noise, seed/measured conflicts — and proposes + tests genuinely NEW access methods (different residential country, wait-for-selector, URL rewrites, CEF+residential). Use weekly, or after a stats regression on a high-volume host. Operator-facing: needs prod Vault creds (via /residential-proxy) and a Worker publish key.
+---
+
+# Wick site-rules curation (the weekly "invent new methods" pass)
+
+The deterministic harness (`bench/probe.sh`) measures a fixed matrix —
+`cronet | cronet+residential | cef` — and corrects the rules from that. This
+skill is the **judgment layer on top**: it looks at what the matrix *couldn't*
+crack and what the stats reveal, then reasons about methods the matrix doesn't
+try. It is the "H reviews anomalies and invents new methods" box in the
+self-improvement loop.
+
+Run it **weekly** (rules change slowly), or when a high-volume host regresses.
+
+## Step 1 — gather + triage
+
+```bash
+bash bench/curate-inputs.sh > /tmp/curate.json   # read-only, no creds
+jq '{failing: (.failing|length), hard: (.hard|length)}' /tmp/curate.json
+```
+
+`curate-inputs.sh` returns three buckets. Read them and sort hosts into:
+
+- **`hard`** — the probe's latest sweep tried `cronet`, `cronet+residential`,
+  and `cef` and *every cell failed*. These are the real targets for this pass:
+  the matrix has nothing left to offer, so a new method is needed.
+- **`failing`** — site-side failing hosts (from stats) with a `causes`
+  breakdown. Use the cause to reason about *why*:
+  - mostly `reset` / `refused` → active blocking (anti-bot edge or RST). CEF +
+    a clean residential IP is the usual answer; if already tried, see "new
+    methods" below.
+  - mostly `403` in `status_dist` → datacenter-IP block or bot fingerprint →
+    `needs_residential` and/or CEF.
+  - **mostly `offline`** → user-side noise, NOT a hard site. Do not propose a
+    rule; note it and move on. (The probe already excludes these, but the stats
+    list still surfaces them — don't be fooled.)
+  - `dns` with the site clearly up → likely the *reporting users'* resolver, not
+    the site; low priority.
+- **seed/measured conflicts** — where `published_rules` shows a `source:"seed"`
+  entry that the latest probe contradicts. The publish merge already lets
+  measured win, but flag low-confidence ones for a re-probe.
+
+## Step 2 — reason about NEW methods (the part the matrix can't do)
+
+For each `hard` host, form a hypothesis the harness hasn't tested. In rough
+priority order:
+
+1. **Different residential country.** The probe defaults to `us`. A site may
+   geo-block US residential but allow its home country. Probe its likely region
+   (a `.fr`/`.de`/`.jp` TLD, a known regional service). Confirm availability
+   first: `bash /scripts/residential-probe.sh `.
+2. **`wait_for_selector`.** A `cef` cell that returns a 200 with tiny
+   `content_bytes` is an SPA that hadn't hydrated when the DOM was dumped.
+   Open it interactively to find the content selector, then encode it in the
+   rule (`wait_for_selector`), so CEF waits for real content.
+   ```bash
+   wick fetch --render cef --wait-for-selector 'article' --json
+   ```
+3. **URL rewrite.** Some sites have a lighter endpoint that isn't bot-walled
+   (the built-in `www.reddit.com → old.reddit.com` rewrite is the canonical
+   example). If you find one, it belongs in `fetch.rs`'s rewrite list, not a
+   rule — open a PR.
+4. **CEF + residential.** A DataDome-class site (e.g. `apkpure` — failed every
+   cell) typically needs *both* a real browser and a clean IP. The SOCKS
+   harness can't test this combination (`--proxy` routes Cronet, not CEF — CEF
+   residential is a WireGuard preload). Verify it from a tunneled Linux box, or
+   record the hypothesis and flag for that environment.
+5. **Nothing plausible** → open a GitHub issue documenting the host, the cells
+   tried, and the causes, so a human can investigate. Don't invent a rule that
+   isn't backed by a passing probe.
+
+## Step 3 — test the hypothesis
+
+Re-probe just the candidates with the tuned parameters, e.g. a different
+country:
+
+```bash
+source /scripts/residential-proxy-env.sh
+# Probe a specific region for geo-blocked hosts (probe.sh sweeps the failing
+# set; for a one-off host use wick fetch directly through the proxy):
+PX=$(bash bench/proxy-providers.sh --provider=oxylabs --country=fr)
+wick fetch --json --no-robots --render cef --proxy "$PX" https:///
+```
+
+A cell counts as a win only at `status 200` with `content_bytes` above the
+block threshold (~1000) — a 200 with near-zero content is a challenge shell,
+not success (same rule the harness uses).
+
+## Step 4 — act
+
+- **A new method worked** → add/adjust the measured rule and republish:
+  ```bash
+  WICK_PUBLISH_KEY= bash bench/publish-rules.sh   # --dry-run first
+  ```
+  (If the win needs a param the rule schema can't express yet — e.g. a per-host
+  residential country — open a PR extending the schema; don't fake it.)
+- **A seed is wrong** (probe contradicts it and the probe is trustworthy) → the
+  merge already corrects it on publish; just confirm it's published.
+- **Nothing worked** → file the issue from Step 2.5 and leave the host's
+  existing rule/seed untouched.
+
+## Step 5 — log
+
+Append a short curation note (date, hosts reviewed, methods tried, outcomes) to
+`~/.wick/probe/curation-log.md` so the next pass sees what's already been tried
+and doesn't re-litigate dead ends.
+
+## Scheduling
+
+```
+# Sundays 05:00 — after the Sunday 04:00 probe sweep has refreshed the traces
+0 5 * * 0 cd /abs/path/wick && bash bench/curate-inputs.sh > /tmp/curate.json && \
+  claude -p "/wick-curate review /tmp/curate.json and act per the skill"
+```
+
+## Guardrails
+
+- **Never publish a rule not backed by a passing probe.** A hypothesis is not a
+  rule. The whole point of the loop is that rules are *measured*.
+- **Respect the offline signal.** A high `offline` fraction means the failures
+  are users' networks, not the site — excluding these is the difference between
+  improving and chasing ghosts.
+- **Residential is for reachability testing of our own routing**, per
+  `/residential-proxy` — not general scraping.
+- Methodology caveats (operator-vantage baseline, SOCKS-vs-CEF) live in
+  `bench/PROBE.md`; read them before trusting a single probe.
diff --git a/bench/curate-inputs.sh b/bench/curate-inputs.sh
new file mode 100755
index 0000000..0ec80f5
--- /dev/null
+++ b/bench/curate-inputs.sh
+#!/usr/bin/env bash
+# Gather the weekly curation inputs into one JSON document for the
+# /wick-curate agent to reason over. Read-only and creds-free: public stats +
+# GET /v1/site-rules + the local probe traces. The agent (see
+# agent-skill/wick-curate/SKILL.md) consumes this to decide what to re-probe
+# and what new methods to try.
+#
+# Output (stdout): { failing: [...], hard: [...], published_rules: {...},
+#                    generated_at: }
+#   failing  — site-side failing hosts (low success, failures NOT mostly
+#              user-offline), with a per-cause breakdown so the agent can see
+#              WHY each fails (reset/refused/timeout/403…).
+#   hard     — hosts from the latest probe sweep where EVERY tested cell failed
+#              (cronet / cronet+residential / cef) — the ones that need a method
+#              the harness hasn't tried yet.
+#   published_rules — what clients are currently being served.
+
+set -u
+
+STATS_URL="${WICK_STATS_URL:-https://releases.getwick.dev/v1/stats/summary}"
+RULES_URL="${WICK_RULES_URL:-https://releases.getwick.dev/v1/site-rules}"
+PROBE_DIR="${WICK_PROBE_OUT_DIR:-$HOME/.wick/probe}"
+MIN_FETCHES="${WICK_CURATE_MIN_FETCHES:-4}"
+MAX_SR="${WICK_CURATE_MAX_SR:-0.5}"
+
+command -v jq >/dev/null || { echo "ERROR: jq required" >&2; exit 1; }
+command -v curl >/dev/null || { echo "ERROR: curl required" >&2; exit 1; }
+
+fetch() { curl -s --max-time 30 --retry 2 "$1"; }
+
+stats="$(fetch "$STATS_URL")"
+printf '%s' "$stats" | jq -e '.rows' >/dev/null 2>&1 || { echo "ERROR: bad stats from $STATS_URL" >&2; exit 1; }
+
+rules="$(fetch "$RULES_URL")"
+printf '%s' "$rules" | jq -e '.' >/dev/null 2>&1 || rules='{"rules":{}}'
+
+latest_trace="$(ls -1t "$PROBE_DIR"/probe-*.jsonl 2>/dev/null | head -1)"
+traces='[]'
+[ -n "$latest_trace" ] && traces="$(jq -s '.' "$latest_trace" 2>/dev/null || echo '[]')"
+
+jq -n \
+  --argjson stats "$stats" \
+  --argjson rules "$rules" \
+  --argjson traces "$traces" \
+  --argjson minf "$MIN_FETCHES" \
+  --argjson maxsr "$MAX_SR" '
+  {
+    generated_at: ($stats.generated_at // null),
+    failing: (
+      $stats.rows
+      | group_by(.host)
+      | map({
+          host: .[0].host,
+          fetches: (map(.fetches) | add),
+          successes: (map(.successes) | add),
+          offline: (map((.error_kind_dist // {}).offline // 0) | add),
+          causes: (
+            reduce .[] as $r ({};
+              reduce (($r.error_kind_dist // {}) | to_entries[]) as $e (.;
+                .[$e.key] = ((.[$e.key] // 0) + $e.value)))
+          ),
+        }
+        | . + { sr: (if .fetches > 0 then (.successes / .fetches) else 1 end),
+                failures: (.fetches - .successes) })
+      | map(select(.fetches >= $minf and .sr < $maxsr
+                   and (.failures <= 0 or (.offline / .failures) < 0.5)))
+      | sort_by(.sr, (-.fetches))
+    ),
+    hard: (
+      $traces
+      | map(select((.cells | to_entries | any(.value | startswith("ok"))) | not))
+    ),
+    published_rules: ($rules.rules // {}),
+  }'

From 14edda006e522334bd4ea3a48d9d207159abdfab Mon Sep 17 00:00:00 2001
From: Adam Fisk
Date: Sat, 27 Jun 2026 06:57:40 -0600
Subject: [PATCH 2/2] curate: mirror the explicit sort_by(.host) before group_by

Same belt-and-suspenders as probe.sh for the host aggregation.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 bench/curate-inputs.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/bench/curate-inputs.sh b/bench/curate-inputs.sh
index 0ec80f5..0b69a69 100755
--- a/bench/curate-inputs.sh
+++ b/bench/curate-inputs.sh
@@ -48,6 +48,8 @@ jq -n \
     generated_at: ($stats.generated_at // null),
     failing: (
       $stats.rows
+      # explicit sort before group_by for clarity (jq group_by sorts internally)
+      | sort_by(.host)
       | group_by(.host)
       | map({
           host: .[0].host,