
Self-improving site routing: curated rules + failure classification + residential probe loop #9

Open
myleshorton wants to merge 8 commits into main from self-improving-site-rules

@myleshorton

Member

Self-improving site routing

Wick now closes a feedback loop: it learns which sites it fails on from the public stats, empirically tests access methods through residential proxies, and maintains a shared curated "known behaviors" list so every client routes correctly on its first visit — instead of each machine independently re-paying the Cronet‑fail‑then‑escalate cost.

sequenceDiagram
    autonumber
    participant C as Wick client<br/>fetch.rs
    participant W as Worker<br/>index.js
    participant H as Probe harness<br/>bench/probe.sh
    participant R as Residential proxy<br/>oxylabs

    C->>W: POST /v1/events host, strategy, ok, error_kind
    Note over W: aggregate per host:<br/>success_rate + error_kind_dist
    H->>W: GET /v1/stats/summary
    Note over H: select site-side failing hosts<br/>drop error_kind=offline ⚠️
    H->>R: fetch via cronet / cronet+res / cef
    R-->>H: per-strategy status + bytes
    Note over H: derive render + needs_residential
    H->>W: POST /v1/site-rules merged seed + measured
    C->>W: GET /v1/site-rules daily refresh
    W-->>C: curated rules
    Note over C: site_rules consulted ABOVE site_cache<br/>right method on first visit ✅

What's in this PR

1. Curated site-rules (the evolving list): rust/src/site_rules.rs + rust/data/site-rules.json

  • Consulted above the per-machine site_cache: a curated render: cef / needs_residential rule routes a first-time visit correctly with no local learning.
  • Bundled seed (include_str!) overlaid by ~/.wick/site-rules.json, refreshed daily from the Worker — so the list evolves without a reinstall.
  • fetch.rs now threads the rule's residential flag + selector into every CEF path (the old hardcoded use_residential: false is gone).
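The precedence described above (curated rules first, per-machine cache second, each matched by walking up parent domains) could be sketched roughly as follows. This is an illustration only: `Route`, `Render`, `lookup_walk`, and `resolve_route` are hypothetical names, the Cronet-first default is an assumption, and the real `site_rules.rs` / `site_cache.rs` types may differ.

```rust
use std::collections::HashMap;

/// How a host should be fetched. Variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Render {
    Cronet,
    Cef,
}

/// A routing decision: render path plus whether residential egress is needed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Route {
    pub render: Render,
    pub needs_residential: bool,
}

/// Walk from the full host up through its parent domains, returning the
/// first match. "a.b.example.com" tries "a.b.example.com", "b.example.com",
/// then "example.com", and stops before a bare TLD (a label with no dot).
fn lookup_walk<'a>(map: &'a HashMap<String, Route>, host: &str) -> Option<&'a Route> {
    let mut current = host;
    loop {
        if let Some(route) = map.get(current) {
            return Some(route);
        }
        // Drop the leftmost label; give up once the remainder has no dot.
        match current.split_once('.') {
            Some((_, parent)) if parent.contains('.') => current = parent,
            _ => return None,
        }
    }
}

/// Curated rules are consulted ABOVE the local cache. If neither has an
/// opinion, fall back to plain Cronet (assumed default).
pub fn resolve_route(
    site_rules: &HashMap<String, Route>,
    site_cache: &HashMap<String, Route>,
    host: &str,
) -> Route {
    lookup_walk(site_rules, host)
        .or_else(|| lookup_walk(site_cache, host))
        .copied()
        .unwrap_or(Route { render: Render::Cronet, needs_residential: false })
}
```

In this sketch a curated rule on a parent domain beats a more specific local cache entry, which is one reading of "consulted above"; the actual tie-breaking may be finer-grained.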

2. Transport-failure classification — the data-quality foundation the loop depends on

  • The old cronet-transport-error bucket couldn't tell a user disconnect from a site actively blocking us — so a user whose wifi died 25× looked identical to a hard site and would poison the rules.
  • cronet::on_failed now captures the real net-error (was discarded); classify_transport_error gates every non-definitive cause on a connectivity probe → offline / dns / timeout / reset / refused / unreachable / quic / connect / other.
  • Worker aggregates error_kind_dist alongside status_dist.
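As a rough sketch of the gating idea: map the raw net-error to a candidate cause, then only trust that cause if a connectivity probe succeeds. The function names echo the PR, but the signatures, the code-to-cause mapping, and the choice of which causes count as "definitive" are assumptions made for illustration. The numeric values are Chromium net-error codes.

```rust
/// Cause buckets named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Offline,
    Dns,
    Timeout,
    Reset,
    Refused,
    Unreachable,
    Quic,
    Connect,
    Other,
}

/// Map a Chromium net-error code to a candidate cause.
/// The mapping is illustrative, not necessarily the one the PR uses.
pub fn candidate_cause(net_error: i32) -> ErrorKind {
    match net_error {
        -106 => ErrorKind::Offline,      // ERR_INTERNET_DISCONNECTED
        -105 => ErrorKind::Dns,          // ERR_NAME_NOT_RESOLVED
        -7 | -118 => ErrorKind::Timeout, // ERR_TIMED_OUT, ERR_CONNECTION_TIMED_OUT
        -101 => ErrorKind::Reset,        // ERR_CONNECTION_RESET
        -102 => ErrorKind::Refused,      // ERR_CONNECTION_REFUSED
        -109 => ErrorKind::Unreachable,  // ERR_ADDRESS_UNREACHABLE
        -356 => ErrorKind::Quic,         // ERR_QUIC_PROTOCOL_ERROR
        -104 => ErrorKind::Connect,      // ERR_CONNECTION_FAILED
        _ => ErrorKind::Other,
    }
}

/// Gate every non-definitive cause on a connectivity probe: if the probe
/// fails, the user is treated as offline regardless of the candidate cause.
/// Assumption: only an explicit "disconnected" code is definitive.
pub fn classify_transport_error(
    net_error: i32,
    connectivity_ok: impl FnOnce() -> bool,
) -> ErrorKind {
    let candidate = candidate_cause(net_error);
    if candidate == ErrorKind::Offline {
        return candidate; // definitive, no probe needed
    }
    if connectivity_ok() {
        candidate
    } else {
        ErrorKind::Offline
    }
}
```

Passing the probe as a closure keeps the sketch testable without network access; a connection reset with a dead probe is reported as `Offline`, so it never counts against the site.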

3. The probe harness: bench/probe.sh, bench/publish-rules.sh, bench/PROBE.md

  • Reads /v1/stats/summary, selects site-side failing hosts (drops offline-dominated so user disconnects aren't chased), probes a cronet | cronet+residential | cef matrix via wick fetch --json, derives render + needs_residential.
  • publish-rules.sh merges measured verdicts with the seed (measured wins, correcting over-aggressive seeds) → POST /v1/site-rules.
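The harness itself is shell, but the step that turns the three-cell matrix into a verdict can be sketched in Rust for clarity. Everything here is an assumption about how the derivation might work: the `Cell` and `Verdict` types, the `min_bytes` threshold, the `"cronet"` render value, and the order in which cells are preferred are all hypothetical.

```rust
/// One probe result: HTTP status plus extracted-content size.
#[derive(Debug, Clone, Copy)]
pub struct Cell {
    pub status: u16,
    pub content_bytes: u64,
}

/// The derived routing verdict for a host.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Verdict {
    pub render: &'static str,
    pub needs_residential: bool,
}

/// A cell "works" if it returned 2xx and extracted at least `min_bytes`
/// of content (a challenge page or JS shell extracts to almost nothing).
fn cell_ok(cell: Option<Cell>, min_bytes: u64) -> bool {
    matches!(cell, Some(c) if (200..300).contains(&c.status) && c.content_bytes >= min_bytes)
}

/// Prefer the cheapest method that works: plain Cronet, then Cronet through
/// a residential proxy, then CEF. `None` means no tested cell worked, so the
/// caller should leave any existing seed rule alone.
pub fn derive_verdict(
    cronet: Option<Cell>,
    cronet_residential: Option<Cell>,
    cef: Option<Cell>,
    min_bytes: u64,
) -> Option<Verdict> {
    if cell_ok(cronet, min_bytes) {
        Some(Verdict { render: "cronet", needs_residential: false })
    } else if cell_ok(cronet_residential, min_bytes) {
        Some(Verdict { render: "cronet", needs_residential: true })
    } else if cell_ok(cef, min_bytes) {
        Some(Verdict { render: "cef", needs_residential: false })
    } else {
        None
    }
}
```

Returning `None` when every cell fails mirrors the behaviour reported under Verified, where a host that failed every testable cell kept its seed rule.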

4. Propagation — Worker GET /v1/site-rules (public, cached) + POST /v1/site-rules/:key (auth); client site_rules::refresh_if_stale() (once/process, background, atomic, opt-out-aware).
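A minimal sketch of the "daily, once per process, opt-out-aware" decision behind a function like `refresh_if_stale()`. The helper names, the 24-hour constant, and the use of the overlay file's modification time as the freshness signal are assumptions; the background fetch and atomic write are omitted.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, SystemTime};

/// Assumed refresh interval: one day.
const MAX_AGE: Duration = Duration::from_secs(24 * 60 * 60);

/// Set once a refresh has been started in this process.
static REFRESH_ATTEMPTED: AtomicBool = AtomicBool::new(false);

/// The overlay is stale if it is missing or older than `max_age`.
pub fn is_stale(last_modified: Option<SystemTime>, now: SystemTime, max_age: Duration) -> bool {
    match last_modified {
        None => true,
        Some(modified) => match now.duration_since(modified) {
            Ok(age) => age >= max_age,
            // Modification time is in the future (clock skew): treat as fresh.
            Err(_) => false,
        },
    }
}

/// Returns true for at most one caller per process, and only when the user
/// has not opted out and the overlay is stale.
pub fn should_refresh(opted_out: bool, last_modified: Option<SystemTime>, now: SystemTime) -> bool {
    if opted_out || !is_stale(last_modified, now, MAX_AGE) {
        return false;
    }
    // `swap` returns the previous value, so only the first caller sees `false`.
    !REFRESH_ATTEMPTED.swap(true, Ordering::SeqCst)
}
```

A caller would spawn the actual download on a background task only when `should_refresh` returns true, so a slow or failing Worker never blocks a fetch.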

Verified

  • Built and tested in both the default and --features cronet configs (the real shipping path); 28 unit tests pass, including rule-precedence and cause-taxonomy tests. Worker passes node --check.
  • Ran live against oxylabs US residential (3/3 clean exits). The loop auto-corrected 5 over-aggressive hand-seeds (reuters/cfr/tradingview/apkmirror/apkcombo all work on plain Cronet from a clean vantage — the telemetry "100% failures" were vantage/noise) and preserved the seed for apkpure (DataDome — failed every testable cell).
  • A prior 3-agent adversarial review of part 1 surfaced and fixed 6 issues (CEF daemon residential-mode reuse, universal offline gate, proxy-aware probe, worker key-cardinality, dup host-walk, dead code).

Not in this PR / follow-ups

  • Worker is not yet deployed (npx wrangler deploy activates error_kind_dist + the site-rules endpoints) and no rules are published yet.
  • Methodology caveat (see PROBE.md): the cronet baseline cell uses the operator's own IP — run the harness from a datacenter VM to detect needs_residential faithfully. And --proxy routes Cronet, not CEF (CEF residential is a WireGuard preload), so cef+residential isn't tested here.
  • PR4: a weekly Claude curation agent to review anomalies and propose genuinely new methods.

🤖 Generated with Claude Code

myleshorton and others added 2 commits June 26, 2026 15:24
…ation

Add a curated, evolving per-site routing list (`site_rules`) consulted ABOVE
the per-machine `site_cache`, so a client routes correctly on its first visit
instead of re-paying the Cronet-fail-then-escalate cost. Seeded from anti-bot
vendor knowledge + live failure telemetry (source/confidence tagged), overlaid
by `~/.wick/site-rules.json` and refreshed daily from the Worker.

Also classify transport failures so a user disconnect is never mistaken for
"this site is hard" (which would poison the rules): capture the real Cronet
net-error in `on_failed` (previously discarded) and gate every non-definitive
cause on a connectivity probe — offline / dns / timeout / reset / refused /
unreachable / quic / connect / other. The Worker aggregates `error_kind_dist`
alongside `status_dist`, and serves the rules via GET/POST /v1/site-rules.

- site_rules.rs: include_str! seed + on-disk overlay + once-per-process daily refresh
- fetch.rs: rule-aware should_use_cef_first; thread residential flag + selector into CEF; classify_transport_error + connectivity probe (proxy-aware)
- analytics.rs: report_transport_error carrying error_kind
- cronet: bind Cronet_Error_error_code_get; surface the cause in on_failed
- cef.rs: respawn the CEF daemon on a residential-mode mismatch
- site_cache.rs: extract shared parent_domain host walk
- main.rs: register site_rules; add `wick fetch --json`
- worker: error_kind_dist + GET /v1/site-rules (public) + POST /v1/site-rules/:key (auth)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the loop from the public stats page back into routing. probe.sh reads
/v1/stats/summary, selects genuinely site-side failing hosts (dropping
error_kind=offline so user disconnects aren't chased), and probes a
cronet | cronet+residential | cef matrix per host via `wick fetch --json`,
deriving render + needs_residential. publish-rules.sh merges the measured
verdicts with the bundled seed (measured wins, so a measurement CORRECTS an
over-aggressive seed) and POSTs to /v1/site-rules.
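
The "measured wins" merge is done in shell by publish-rules.sh; the same idea expressed as a Rust sketch, with a hypothetical `Rule` shape and placeholder hosts, might look like this:

```rust
use std::collections::BTreeMap;

/// Hypothetical rule shape; the real document may carry more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Rule {
    pub render: String,
    pub needs_residential: bool,
    pub source: String,
}

/// Start from the seed, then let every measured entry overwrite the seed
/// entry for the same host. Hosts with no measurement keep their seed rule.
pub fn merge_rules(
    seed: &BTreeMap<String, Rule>,
    measured: &BTreeMap<String, Rule>,
) -> BTreeMap<String, Rule> {
    let mut merged = seed.clone();
    for (host, rule) in measured {
        merged.insert(host.clone(), rule.clone());
    }
    merged
}
```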

Fixes proxy-providers.sh: oxylabs is HTTP CONNECT on :7777 (:443-only), not
SOCKS5 — the old socks5:// URL failed.

See bench/PROBE.md for the pipeline, scheduling, and methodology caveats
(notably: run from a datacenter VM to detect needs_residential faithfully;
--proxy routes Cronet, not CEF).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR implements a closed-loop “self-improving routing” system: clients emit richer failure telemetry, the Worker aggregates it (including transport-failure causes), a probe harness measures best strategies (Cronet/Cronet+proxy/CEF), and curated per-site rules are published for clients to consume on first visit.

Changes:

  • Add curated site-rules: bundled seed + on-disk overlay refreshed daily from the Worker, consulted above the per-machine site_cache.
  • Add transport-failure cause classification (with a connectivity probe gate) and propagate error_kind_dist through Worker aggregation.
  • Add probe + publish scripts to measure and push merged rules to the Worker, plus CLI --json output for deterministic probing.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| worker/src/index.js | Adds error_kind_dist allowlisting/aggregation and new public/private site-rules endpoints. |
| rust/src/site_rules.rs | Introduces curated site-rules (seed + overlay), lookup semantics, and background refresh logic. |
| rust/src/site_cache.rs | Refactors parent-domain walk into shared helper for consistent cache/rules scoping. |
| rust/src/main.rs | Adds wick fetch --json output mode for probe harness consumption. |
| rust/src/fetch.rs | Consults curated rules in routing decisions; adds transport error classification with connectivity probe. |
| rust/src/cronet/mod.rs | Captures Cronet net-error codes into error messages for downstream classification. |
| rust/src/cronet/ffi.rs | Exposes Cronet error-code accessor needed for failure classification. |
| rust/src/cef.rs | Ensures CEF daemon is respawned when residential mode changes (singleton mode correctness). |
| rust/src/analytics.rs | Adds report_transport_error including error_kind for Worker aggregation. |
| rust/data/site-rules.json | Adds initial bundled seed rules document. |
| bench/publish-rules.sh | Publishes merged seed+measured rules to Worker (measured wins). |
| bench/proxy-providers.sh | Updates Oxylabs proxy scheme to HTTP CONNECT for compatibility. |
| bench/probe.sh | Adds probing harness: candidate selection from stats + strategy matrix + measured rules output. |
| bench/PROBE.md | Documents harness usage, scheduling, and methodology caveats. |


Comment thread worker/src/index.js Outdated
Comment thread rust/src/fetch.rs Outdated
Comment thread rust/src/site_rules.rs Outdated
Comment thread rust/src/cronet/mod.rs Outdated
Comment thread bench/probe.sh
Comment thread bench/probe.sh Outdated
- worker: only count error_kind on transport failures (ok !== true), so a
  client can't skew the offline fraction by attaching it to OK events
- fetch/main: bridge the --proxy CLI arg into WICK_PROXY so connectivity_ok
  probes through the configured proxy (a proxied-only host was misclassified
  "offline")
- site_rules: Windows-safe overlay replace (rename won't overwrite on Windows)
- cronet: fix stale doc reference (classify_transport_error / candidate_cause)
- bench/probe.sh: scheme-agnostic proxy wording (oxylabs is HTTP CONNECT, not SOCKS5)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 26, 2026

Deploying wickproject with Cloudflare Pages

Latest commit: 2e061b0
Status: ✅  Deploy successful!
Preview URL: https://cf42b53e.wickproject.pages.dev
Branch Preview URL: https://self-improving-site-rules.wickproject.pages.dev


Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comment thread rust/src/main.rs Outdated
Comment thread worker/src/index.js Outdated
Comment thread worker/src/index.js Outdated
- fetch --json: rename `bytes` → `content_bytes` and document it as the
  extracted-content size (a challenge/JS shell extracts to near nothing, so a
  small value flags a block) — not bytes-on-wire. Updated bench/probe.sh.
- worker: gate error_kind on statusBucket==="0" as well — a cause means "no
  HTTP response at all", so an HTTP error (e.g. 403) must not carry one.
- worker: reject arrays in doc.rules validation (typeof [] === "object" would
  otherwise let an array through as a rules map).
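
The Worker is JavaScript, but the two gates on `error_kind` (the event is not OK, and the status bucket is "0", meaning no HTTP response at all) reduce to a small predicate. A Rust rendition, with a hypothetical function name and an allowlist assumed to match the cause taxonomy listed in the PR description:

```rust
/// Causes accepted into error_kind_dist (assumed to mirror the taxonomy).
const ALLOWED_KINDS: [&str; 9] = [
    "offline", "dns", "timeout", "reset", "refused",
    "unreachable", "quic", "connect", "other",
];

/// Returns the cause to count, or None if this event must not carry one.
/// A cause only makes sense on a failed event with no HTTP response, so an
/// OK event or an HTTP error (any bucket other than "0") is ignored.
pub fn countable_error_kind<'a>(
    ok: bool,
    status_bucket: &str,
    error_kind: Option<&'a str>,
) -> Option<&'a str> {
    if ok || status_bucket != "0" {
        return None;
    }
    // Drop anything outside the allowlist to keep key cardinality bounded.
    error_kind.filter(|kind| ALLOWED_KINDS.iter().any(|allowed| *allowed == *kind))
}
```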

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comment thread rust/src/site_rules.rs
Comment thread bench/probe.sh
…ble all rules

A published overlay entry missing `render` (manual edit, residential-only rule,
partial doc) made serde_json fail the whole-file parse, silently dropping EVERY
overlay rule back to the seed. `render` is now #[serde(default)] — an empty
value is "no opinion" (same as no rule) per should_use_cef_first. Adds a test.

Addresses Copilot round-3 feedback on #9.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comment thread rust/src/cef.rs
DaemonProcess.use_residential stored the *requested* mode, but the LD_PRELOAD
tunnel only applies when WireGuard is up AND bindwg is present. Compute the
effective mode once (want_residential) and use it for the reuse check, the
preload decision, and the stored flag. Fixes two issues: (a) a daemon spawned
non-residential because the tunnel was down would never switch to residential
when the tunnel later came up, and (b) non-residential requests triggered
needless respawns against a daemon that was already effectively non-residential.
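
A minimal sketch of the "compute the effective mode once" fix. `DaemonProcess.use_residential` and `want_residential` are named in the commit message; the struct layout, the function signatures, and `ensure_daemon` are assumptions made for illustration.

```rust
/// Stand-in for the daemon handle; only the stored mode matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DaemonProcess {
    /// The EFFECTIVE mode the daemon was spawned with, not the requested one.
    pub use_residential: bool,
}

/// Residential only takes effect when it was requested AND the WireGuard
/// tunnel is up AND the bindwg preload library is present.
pub fn want_residential(requested: bool, wireguard_up: bool, bindwg_present: bool) -> bool {
    requested && wireguard_up && bindwg_present
}

/// Reuse the running daemon when its effective mode matches; otherwise
/// respawn. Returns the daemon to use and whether a respawn happened.
pub fn ensure_daemon(
    current: Option<DaemonProcess>,
    requested: bool,
    wireguard_up: bool,
    bindwg_present: bool,
) -> (DaemonProcess, bool) {
    let effective = want_residential(requested, wireguard_up, bindwg_present);
    match current {
        Some(daemon) if daemon.use_residential == effective => (daemon, false),
        _ => (DaemonProcess { use_residential: effective }, true),
    }
}
```

Because the same `effective` value drives the reuse check and the stored flag, a daemon spawned while the tunnel was down is respawned once the tunnel comes up, and a non-residential request never respawns a daemon that is already effectively non-residential.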

Addresses Copilot round-4 feedback on #9.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Comment thread rust/src/main.rs Outdated
Comment thread bench/probe.sh Outdated
myleshorton and others added 2 commits June 26, 2026 16:04
- fetch/main: record the resolved proxy in a OnceLock that connectivity_ok
  reads, instead of std::env::set_var. set_var is unsound under the
  multi-thread Tokio runtime (it can race env reads on worker threads), and
  the prior "no threads spawned yet" rationale was wrong (#[tokio::main]
  spawns workers before the body runs).
- bench/probe.sh: resolve `timeout` vs `gtimeout` (macOS ships neither by
  default) and run without a per-request timeout + warn if absent, rather than
  failing the whole sweep on a default Mac.
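
The `OnceLock` approach from the first bullet can be sketched as below. `RESOLVED_PROXY`, `record_proxy`, and `resolved_proxy` are hypothetical names; the point is only that a write-once static can be read safely from any thread, unlike a process-wide environment mutation.

```rust
use std::sync::OnceLock;

/// Written once at startup from the CLI argument, then read-only.
static RESOLVED_PROXY: OnceLock<Option<String>> = OnceLock::new();

/// Record the resolved proxy. Returns false if a value was already
/// recorded, in which case the first value is kept.
pub fn record_proxy(proxy: Option<String>) -> bool {
    RESOLVED_PROXY.set(proxy).is_ok()
}

/// Read the recorded proxy, if any. Safe to call from any thread, so a
/// connectivity probe can use it without touching the environment.
pub fn resolved_proxy() -> Option<&'static str> {
    RESOLVED_PROXY.get().and_then(|proxy| proxy.as_deref())
}
```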

Addresses Copilot round-5 feedback on #9.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Belt-and-suspenders / clarity for the host aggregation. jq's group_by already
sorts by the key internally (so the candidate totals were correct — verified
live), but the explicit sort makes intent obvious and closes the review thread.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@myleshorton myleshorton force-pushed the self-improving-site-rules branch from 0198910 to 2e061b0 Compare June 27, 2026 12:57

2 participants