From 33f454cc2ea5a658d32b15713dff472fdbca019b Mon Sep 17 00:00:00 2001
From: Adam Fisk <afisk@getlantern.org>
Date: Fri, 26 Jun 2026 15:53:21 -0600
Subject: [PATCH 1/2] curate: weekly site-rules curation agent (the "invent new
 methods" pass)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the judgment layer of the self-improvement loop on top of the
deterministic probe harness. bench/curate-inputs.sh gathers the weekly inputs
(stats with per-cause failure breakdown, latest probe traces, currently
published rules) into one JSON; agent-skill/wick-curate/SKILL.md reasons about
what the fixed matrix couldn't crack — sites failing EVERY cell (e.g. apkpure),
high user-offline noise, seed/measured conflicts — and proposes + tests
genuinely new methods (different residential country, wait_for_selector, URL
rewrites, CEF+residential), then re-probes and republishes.

Guardrail: never publish a rule that isn't backed by a passing probe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
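Reviewer note (not part of the commit): the `failing` filter in
bench/curate-inputs.sh is easier to review against a tiny input. The sketch
below is a simplified stand-in, not the script itself. The sample rows and the
`failing_hosts` helper are hypothetical, it assumes one row per host (so the
group_by aggregation is skipped), and it assumes non-zero `fetches`. The
thresholds mirror the script defaults (WICK_CURATE_MIN_FETCHES=4,
WICK_CURATE_MAX_SR=0.5).

```shell
#!/usr/bin/env bash
# Hypothetical sample rows in the shape the script reads from .rows.
rows='[
  {"host":"blocked.example","fetches":10,"successes":1,"error_kind_dist":{"reset":8,"offline":1}},
  {"host":"flaky-users.example","fetches":10,"successes":2,"error_kind_dist":{"offline":7,"dns":1}},
  {"host":"healthy.example","fetches":10,"successes":9,"error_kind_dist":{"timeout":1}}
]'

# Keep hosts with enough traffic, a low success rate, and failures that are
# NOT mostly attributed to offline users.
failing_hosts() {
  jq -c --argjson minf 4 --argjson maxsr 0.5 '
    map(. + { sr: (.successes / .fetches),
              failures: (.fetches - .successes),
              offline: ((.error_kind_dist // {}).offline // 0) })
    | map(select(.fetches >= $minf and .sr < $maxsr
                 and (.failures <= 0 or (.offline / .failures) < 0.5)))
    | map(.host)' <<<"$1"
}

failing_hosts "$rows"
```

Only blocked.example should survive: flaky-users.example has a low success
rate, but 7 of its 8 failures are offline, so it is treated as user-side noise.
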
 agent-skill/wick-curate/SKILL.md | 128 +++++++++++++++++++++++++++++++
 bench/curate-inputs.sh           |  74 ++++++++++++++++++
 2 files changed, 202 insertions(+)
 create mode 100644 agent-skill/wick-curate/SKILL.md
 create mode 100755 bench/curate-inputs.sh
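
Reviewer note (not part of the commit): the Step 3 win criterion in SKILL.md
can be sketched as a small shell predicate. `is_win` is a hypothetical helper,
and the 1000-byte default is an assumption taken from the "~1000" in the skill
text; the harness's actual check may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only for HTTP 200 with a body larger than the
# block threshold (third argument, defaulting to an assumed 1000 bytes).
is_win() {
  local status="$1" content_bytes="$2" threshold="${3:-1000}"
  [ "$status" -eq 200 ] && [ "$content_bytes" -gt "$threshold" ]
}

is_win 200 48213 && echo "win"
is_win 200 312   || echo "challenge shell, not a win"
is_win 403 48213 || echo "blocked"
```

Both conditions are required because a 200 on its own does not show that real
content came back.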

diff --git a/agent-skill/wick-curate/SKILL.md b/agent-skill/wick-curate/SKILL.md
new file mode 100644
index 0000000..6ee48f5
--- /dev/null
+++ b/agent-skill/wick-curate/SKILL.md
@@ -0,0 +1,128 @@
+---
+name: wick-curate
+description: Weekly self-improvement curation pass for Wick's site-rules. Reads the public stats (with per-cause failure breakdown), the residential probe traces, and the currently-published rules, then reasons about anomalies the deterministic harness can't resolve on its own — sites that fail EVERY tested strategy, high user-offline noise, seed/measured conflicts — and proposes + tests genuinely NEW access methods (different residential country, wait-for-selector, URL rewrites, CEF+residential). Use weekly, or after a stats regression on a high-volume host. Operator-facing: needs prod Vault creds (via /residential-proxy) and a Worker publish key.
+---
+
+# Wick site-rules curation (the weekly "invent new methods" pass)
+
+The deterministic harness (`bench/probe.sh`) measures a fixed matrix —
+`cronet | cronet+residential | cef` — and corrects the rules from that. This
+skill is the **judgment layer on top**: it looks at what the matrix *couldn't*
+crack and what the stats reveal, then reasons about methods the matrix doesn't
+try. It is the "H reviews anomalies and invents new methods" box in the
+self-improvement loop.
+
+Run it **weekly** (rules change slowly), or when a high-volume host regresses.
+
+## Step 1 — gather + triage
+
+```bash
+bash bench/curate-inputs.sh > /tmp/curate.json     # read-only, no creds
+jq '{failing: (.failing|length), hard: (.hard|length)}' /tmp/curate.json
+```
+
+`curate-inputs.sh` emits `failing`, `hard`, and `published_rules`. Sort hosts into:
+
+- **`hard`** — the probe's latest sweep tried `cronet`, `cronet+residential`,
+  and `cef` and *every cell failed*. These are the real targets for this pass:
+  the matrix has nothing left to offer, so a new method is needed.
+- **`failing`** — site-side failing hosts (from stats) with a `causes`
+  breakdown. Use the cause to reason about *why*:
+  - mostly `reset` / `refused` → active blocking (anti-bot edge or RST). CEF +
+    a clean residential IP is the usual answer; if already tried, see "new
+    methods" below.
+  - mostly `403` in `status_dist` → datacenter-IP block or bot fingerprint →
+    `needs_residential` and/or CEF.
+  - **mostly `offline`** → user-side noise, NOT a hard site. Do not propose a
+    rule; note it and move on. (The probe already excludes these, but the stats
+    list still surfaces them — don't be fooled.)
+  - `dns` with the site clearly up → likely the *reporting users'* resolver, not
+    the site; low priority.
+- **seed/measured conflicts** — where `published_rules` shows a `source:"seed"`
+  entry that the latest probe contradicts. The publish merge already lets
+  measured win, but flag low-confidence ones for a re-probe.
+
+## Step 2 — reason about NEW methods (the part the matrix can't do)
+
+For each `hard` host, form a hypothesis the harness hasn't tested. In rough
+priority order:
+
+1. **Different residential country.** The probe defaults to `us`. A site may
+   geo-block US residential but allow its home country. Probe its likely region
+   (a `.fr`/`.de`/`.jp` TLD, a known regional service). Confirm availability
+   first: `bash <skills>/scripts/residential-probe.sh <CC>`.
+2. **`wait_for_selector`.** A `cef` cell that returns a 200 with tiny
+   `content_bytes` is often an SPA that hadn't hydrated when the DOM was dumped.
+   Open it interactively to find the content selector, then encode it in the
+   rule (`wait_for_selector`) so CEF waits for real content.
+   ```bash
+   wick fetch --render cef --wait-for-selector 'article' --json <url>
+   ```
+3. **URL rewrite.** Some sites have a lighter endpoint that isn't bot-walled
+   (the built-in `www.reddit.com → old.reddit.com` rewrite is the canonical
+   example). If you find one, it belongs in `fetch.rs`'s rewrite list, not a
+   rule — open a PR.
+4. **CEF + residential.** A DataDome-class site (e.g. `apkpure` — failed every
+   cell) typically needs *both* a real browser and a clean IP. The SOCKS
+   harness can't test this combination (`--proxy` routes Cronet, not CEF — CEF
+   residential is a WireGuard preload). Verify it from a tunneled Linux box, or
+   record the hypothesis and flag for that environment.
+5. **Nothing plausible** → open a GitHub issue documenting the host, the cells
+   tried, and the causes, so a human can investigate. Don't invent a rule that
+   isn't backed by a passing probe.
+
+## Step 3 — test the hypothesis
+
+Re-probe just the candidates with the tuned parameters, e.g. a different
+country:
+
+```bash
+source <skills>/scripts/residential-proxy-env.sh
+# One-off regional probe via wick fetch (probe.sh sweeps the failing set instead).
+# Caveat: --proxy routes Cronet, not CEF (Step 2, item 4), so verify CEF separately.
+PX=$(bash bench/proxy-providers.sh --provider=oxylabs --country=fr)
+wick fetch --json --no-robots --render cef --proxy "$PX" https://<host>/
+```
+
+A cell counts as a win only at `status 200` with `content_bytes` above the
+block threshold (~1000 bytes). A 200 with near-zero content is a challenge
+shell, not success (the same rule the harness uses).
+
+## Step 4 — act
+
+- **A new method worked** → add/adjust the measured rule and republish:
+  ```bash
+  WICK_PUBLISH_KEY=<key> bash bench/publish-rules.sh        # --dry-run first
+  ```
+  (If the win needs a param the rule schema can't express yet — e.g. a per-host
+  residential country — open a PR extending the schema; don't fake it.)
+- **A seed is wrong** (probe contradicts it and the probe is trustworthy) → the
+  merge already corrects it on publish; just confirm it's published.
+- **Nothing worked** → file the issue from Step 2, item 5 and leave the host's
+  existing rule/seed untouched.
+
+## Step 5 — log
+
+Append a short curation note (date, hosts reviewed, methods tried, outcomes) to
+`~/.wick/probe/curation-log.md` so the next pass sees what's already been tried
+and doesn't re-litigate dead ends.
+
+## Scheduling
+
+```
+# Sundays 05:00 — after the Sunday 04:00 probe sweep has refreshed the traces
+# (crontab entries must be a single line; cron has no backslash continuation)
+0 5 * * 0  cd /abs/path/wick && bash bench/curate-inputs.sh > /tmp/curate.json && claude -p "/wick-curate review /tmp/curate.json and act per the skill"
+```
+
+## Guardrails
+
+- **Never publish a rule not backed by a passing probe.** A hypothesis is not a
+  rule. The whole point of the loop is that rules are *measured*.
+- **Respect the offline signal.** A high `offline` fraction means the failures
+  are users' networks, not the site — excluding these is the difference between
+  improving and chasing ghosts.
+- **Residential is for reachability testing of our own routing**, per
+  `/residential-proxy` — not general scraping.
+- Methodology caveats (operator-vantage baseline, SOCKS-vs-CEF) live in
+  `bench/PROBE.md`; read them before trusting a single probe.
diff --git a/bench/curate-inputs.sh b/bench/curate-inputs.sh
new file mode 100755
index 0000000..0ec80f5
--- /dev/null
+++ b/bench/curate-inputs.sh
@@ -0,0 +1,74 @@
+#!/usr/bin/env bash
+# Gather the weekly curation inputs into one JSON document for the
+# /wick-curate agent to reason over. Read-only and creds-free: public stats +
+# GET /v1/site-rules + the local probe traces. The agent (see
+# agent-skill/wick-curate/SKILL.md) consumes this to decide what to re-probe
+# and what new methods to try.
+#
+# Output (stdout): { failing: [...], hard: [...], published_rules: {...},
+#                    generated_at: <stats.generated_at> }
+#   failing  — site-side failing hosts (low success, failures NOT mostly
+#              user-offline), with a per-cause breakdown so the agent can see
+#              WHY each fails (reset/refused/timeout/403…).
+#   hard     — hosts from the latest probe sweep where EVERY tested cell failed
+#              (cronet / cronet+residential / cef) — the ones that need a method
+#              the harness hasn't tried yet.
+#   published_rules — what clients are currently being served.
+
+set -u
+
+STATS_URL="${WICK_STATS_URL:-https://releases.getwick.dev/v1/stats/summary}"
+RULES_URL="${WICK_RULES_URL:-https://releases.getwick.dev/v1/site-rules}"
+PROBE_DIR="${WICK_PROBE_OUT_DIR:-$HOME/.wick/probe}"
+MIN_FETCHES="${WICK_CURATE_MIN_FETCHES:-4}"
+MAX_SR="${WICK_CURATE_MAX_SR:-0.5}"
+
+command -v jq >/dev/null   || { echo "ERROR: jq required" >&2; exit 1; }
+command -v curl >/dev/null || { echo "ERROR: curl required" >&2; exit 1; }
+
+fetch() { curl -s --max-time 30 --retry 2 "$1"; }
+
+stats="$(fetch "$STATS_URL")"
+printf '%s' "$stats" | jq -e '.rows' >/dev/null 2>&1 || { echo "ERROR: bad stats from $STATS_URL" >&2; exit 1; }
+
+rules="$(fetch "$RULES_URL")"
+printf '%s' "$rules" | jq -e '.' >/dev/null 2>&1 || rules='{"rules":{}}'
+
+latest_trace="$(ls -1t "$PROBE_DIR"/probe-*.jsonl 2>/dev/null | head -1)"
+traces='[]'
+[ -n "$latest_trace" ] && traces="$(jq -s '.' "$latest_trace" 2>/dev/null || echo '[]')"
+
+jq -n \
+  --argjson stats "$stats" \
+  --argjson rules "$rules" \
+  --argjson traces "$traces" \
+  --argjson minf "$MIN_FETCHES" \
+  --argjson maxsr "$MAX_SR" '
+  {
+    generated_at: ($stats.generated_at // null),
+    failing: (
+      $stats.rows
+      | group_by(.host)
+      | map({
+          host: .[0].host,
+          fetches:   (map(.fetches)   | add),
+          successes: (map(.successes) | add),
+          offline:   (map((.error_kind_dist // {}).offline // 0) | add),
+          causes: (
+            reduce .[] as $r ({};
+              reduce (($r.error_kind_dist // {}) | to_entries[]) as $e (.;
+                .[$e.key] = ((.[$e.key] // 0) + $e.value)))
+          ),
+        }
+        | . + { sr: (if .fetches > 0 then (.successes / .fetches) else 1 end),
+                failures: (.fetches - .successes) })
+      | map(select(.fetches >= $minf and .sr < $maxsr
+                   and (.failures <= 0 or (.offline / .failures) < 0.5)))
+      | sort_by(.sr, (-.fetches))
+    ),
+    hard: (
+      $traces
+      | map(select((.cells | to_entries | any(.value | startswith("ok"))) | not))
+    ),
+    published_rules: ($rules.rules // {}),
+  }'

From 14edda006e522334bd4ea3a48d9d207159abdfab Mon Sep 17 00:00:00 2001
From: Adam Fisk <afisk@getlantern.org>
Date: Sat, 27 Jun 2026 06:57:40 -0600
Subject: [PATCH 2/2] curate: mirror the explicit sort_by(.host) before
 group_by

Same belt-and-suspenders as probe.sh for the host aggregation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 bench/curate-inputs.sh | 2 ++
 1 file changed, 2 insertions(+)
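
Reviewer note (not part of the commit): a quick way to confirm that the added
sort_by is behavior-neutral. jq's group_by already orders groups by key, so
both pipelines should give the same result. The sample rows are hypothetical
and deliberately out of order.

```shell
#!/usr/bin/env bash
# Hypothetical unsorted rows; b.example appears before a.example on purpose.
rows='[{"host":"b.example","fetches":2},{"host":"a.example","fetches":1},{"host":"b.example","fetches":3}]'

grouped=$(jq -c 'group_by(.host)' <<<"$rows")
sorted_then_grouped=$(jq -c 'sort_by(.host) | group_by(.host)' <<<"$rows")

echo "$grouped"
[ "$grouped" = "$sorted_then_grouped" ] && echo "identical grouping"
```

If the two strings ever differ, the explicit sort is doing real work and the
inline comment in the hunk should be updated to say so.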

diff --git a/bench/curate-inputs.sh b/bench/curate-inputs.sh
index 0ec80f5..0b69a69 100755
--- a/bench/curate-inputs.sh
+++ b/bench/curate-inputs.sh
@@ -48,6 +48,8 @@ jq -n \
     generated_at: ($stats.generated_at // null),
     failing: (
       $stats.rows
+      # explicit sort before group_by for clarity (jq group_by sorts internally)
+      | sort_by(.host)
       | group_by(.host)
       | map({
           host: .[0].host,