Self-improving site routing: curated rules + failure classification + residential probe loop #9
Open
myleshorton wants to merge 8 commits into
…ation

Add a curated, evolving per-site routing list (`site_rules`) consulted ABOVE the per-machine `site_cache`, so a client routes correctly on its first visit instead of re-paying the Cronet-fail-then-escalate cost. Seeded from anti-bot vendor knowledge + live failure telemetry (source/confidence tagged), overlaid by `~/.wick/site-rules.json` and refreshed daily from the Worker.

Also classify transport failures so a user disconnect is never mistaken for "this site is hard" (which would poison the rules): capture the real Cronet net-error in `on_failed` (previously discarded) and gate every non-definitive cause on a connectivity probe: offline / dns / timeout / reset / refused / unreachable / quic / connect / other. The Worker aggregates `error_kind_dist` alongside `status_dist`, and serves the rules via GET/POST /v1/site-rules.

- site_rules.rs: include_str! seed + on-disk overlay + once-per-process daily refresh
- fetch.rs: rule-aware should_use_cef_first; thread residential flag + selector into CEF; classify_transport_error + connectivity probe (proxy-aware)
- analytics.rs: report_transport_error carrying error_kind
- cronet: bind Cronet_Error_error_code_get; surface the cause in on_failed
- cef.rs: respawn the CEF daemon on a residential-mode mismatch
- site_cache.rs: extract shared parent_domain host walk
- main.rs: register site_rules; add `wick fetch --json`
- worker: error_kind_dist + GET /v1/site-rules (public) + POST /v1/site-rules/:key (auth)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
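The lookup order described in this commit (curated rules first, per-machine cache second, both scoped by a parent-domain walk) can be sketched roughly as below. This is a minimal illustration and not the code in `site_rules.rs` or `fetch.rs`: the `SiteRule` struct, the map-based stores, and the `bool` cache value are assumptions, and the walk is naive about public suffixes such as `co.uk`.

```rust
use std::collections::HashMap;

/// Hypothetical rule shape. The field names follow the PR text; the struct is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct SiteRule {
    /// "cef", "cronet", or "" meaning "no opinion".
    pub render: String,
    pub needs_residential: bool,
}

/// Walk from the full host up through its parent domains, stopping before the bare TLD:
/// "a.b.example.com" -> "b.example.com" -> "example.com".
/// A dotless host (e.g. "localhost") yields no candidates in this sketch.
pub fn parent_domains(host: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = host;
    while rest.contains('.') {
        out.push(rest);
        match rest.split_once('.') {
            Some((_, tail)) => rest = tail,
            None => break,
        }
    }
    out
}

/// Curated rules are consulted first; the per-machine cache is only a fallback.
/// `cache` is a hypothetical host -> "CEF was needed last time" map.
pub fn should_use_cef_first(
    host: &str,
    rules: &HashMap<String, SiteRule>,
    cache: &HashMap<String, bool>,
) -> bool {
    for candidate in parent_domains(host) {
        if let Some(rule) = rules.get(candidate) {
            // An empty `render` is "no opinion": keep looking.
            if !rule.render.is_empty() {
                return rule.render == "cef";
            }
        }
    }
    for candidate in parent_domains(host) {
        if let Some(&cef) = cache.get(candidate) {
            return cef;
        }
    }
    false
}
```

With a rule for `example.com` saying `render: "cef"`, a first visit to `www.example.com` goes to CEF even if the local cache has no entry or a contradictory one.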
Closes the loop from the public stats page back into routing.

probe.sh reads /v1/stats/summary, selects genuinely site-side failing hosts (dropping error_kind=offline so user disconnects aren't chased), and probes a cronet | cronet+residential | cef matrix per host via `wick fetch --json`, deriving render + needs_residential. publish-rules.sh merges the measured verdicts with the bundled seed (measured wins, so a measurement CORRECTS an over-aggressive seed) and POSTs to /v1/site-rules.

Fixes proxy-providers.sh: oxylabs is HTTP CONNECT on :7777 (:443-only), not SOCKS5; the old socks5:// URL failed.

See bench/PROBE.md for the pipeline, scheduling, and methodology caveats (notably: run from a datacenter VM to detect needs_residential faithfully; --proxy routes Cronet, not CEF).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
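The "measured wins" merge that publish-rules.sh performs can be restated as a small sketch. The real script is shell and its exact mechanics are not shown on this page, so everything here (the `Verdict` struct, the `source` tag values, the map layout) is an assumption made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-host verdict; field names mirror the PR text.
#[derive(Debug, Clone, PartialEq)]
pub struct Verdict {
    pub render: String,
    pub needs_residential: bool,
    /// Assumed provenance tag, e.g. "seed" or "measured".
    pub source: String,
}

/// Start from the bundled seed, then let every measured verdict overwrite
/// the seed entry for the same host. Hosts only in the seed are kept.
pub fn merge_rules(
    seed: &BTreeMap<String, Verdict>,
    measured: &BTreeMap<String, Verdict>,
) -> BTreeMap<String, Verdict> {
    let mut merged = seed.clone();
    for (host, verdict) in measured {
        merged.insert(host.clone(), verdict.clone());
    }
    merged
}
```

The point of this ordering is that a seed entry claiming a host needs CEF is corrected as soon as a probe shows plain Cronet works, rather than the seed permanently overriding evidence.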
Pull request overview
This PR implements a closed-loop “self-improving routing” system: clients emit richer failure telemetry, the Worker aggregates it (including transport-failure causes), a probe harness measures best strategies (Cronet/Cronet+proxy/CEF), and curated per-site rules are published for clients to consume on first visit.
Changes:
- Add curated site-rules: bundled seed + on-disk overlay refreshed daily from the Worker, consulted above the per-machine `site_cache`.
- Add transport-failure cause classification (with a connectivity probe gate) and propagate `error_kind_dist` through Worker aggregation.
- Add probe + publish scripts to measure and push merged rules to the Worker, plus CLI `--json` output for deterministic probing.
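A rough sketch of the connectivity-gated classification described in the changes above follows. The function names `classify_transport_error` and `candidate_cause` appear in the PR, but the signatures, the substring matching, and the choice of which causes count as "definitive" are assumptions; the error names are standard Chromium net-error identifiers.

```rust
/// The cause taxonomy listed in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Offline, Dns, Timeout, Reset, Refused, Unreachable, Quic, Connect, Other,
}

/// Map a net-error message to a first-guess cause.
/// Matching on substrings of the message is an assumption of this sketch.
pub fn candidate_cause(message: &str) -> ErrorKind {
    if message.contains("INTERNET_DISCONNECTED") {
        ErrorKind::Offline
    } else if message.contains("NAME_NOT_RESOLVED") {
        ErrorKind::Dns
    } else if message.contains("TIMED_OUT") {
        ErrorKind::Timeout
    } else if message.contains("CONNECTION_RESET") {
        ErrorKind::Reset
    } else if message.contains("CONNECTION_REFUSED") {
        ErrorKind::Refused
    } else if message.contains("ADDRESS_UNREACHABLE") {
        ErrorKind::Unreachable
    } else if message.contains("QUIC") {
        ErrorKind::Quic
    } else if message.contains("CONNECTION_FAILED") {
        ErrorKind::Connect
    } else {
        ErrorKind::Other
    }
}

/// Gate every non-definitive cause on a connectivity probe: if the machine
/// itself cannot reach the network, report `Offline` instead of blaming the site.
/// Treating only `Offline` as definitive is an assumption here.
pub fn classify_transport_error(
    message: &str,
    connectivity_ok: impl FnOnce() -> bool,
) -> ErrorKind {
    let candidate = candidate_cause(message);
    if candidate == ErrorKind::Offline {
        return ErrorKind::Offline; // definitive: no probe needed
    }
    if connectivity_ok() { candidate } else { ErrorKind::Offline }
}
```

Passing the probe as a closure keeps the sketch testable without network access; a real probe would also need to respect the configured proxy, as a later commit on this PR notes.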
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| worker/src/index.js | Adds error_kind_dist allowlisting/aggregation and new public/private site-rules endpoints. |
| rust/src/site_rules.rs | Introduces curated site-rules (seed + overlay), lookup semantics, and background refresh logic. |
| rust/src/site_cache.rs | Refactors parent-domain walk into shared helper for consistent cache/rules scoping. |
| rust/src/main.rs | Adds wick fetch --json output mode for probe harness consumption. |
| rust/src/fetch.rs | Consults curated rules in routing decisions; adds transport error classification with connectivity probe. |
| rust/src/cronet/mod.rs | Captures Cronet net-error codes into error messages for downstream classification. |
| rust/src/cronet/ffi.rs | Exposes Cronet error-code accessor needed for failure classification. |
| rust/src/cef.rs | Ensures CEF daemon is respawned when residential mode changes (singleton mode correctness). |
| rust/src/analytics.rs | Adds report_transport_error including error_kind for Worker aggregation. |
| rust/data/site-rules.json | Adds initial bundled seed rules document. |
| bench/publish-rules.sh | Publishes merged seed+measured rules to Worker (measured wins). |
| bench/proxy-providers.sh | Updates Oxylabs proxy scheme to HTTP CONNECT for compatibility. |
| bench/probe.sh | Adds probing harness: candidate selection from stats + strategy matrix + measured rules output. |
| bench/PROBE.md | Documents harness usage, scheduling, and methodology caveats. |
- worker: only count error_kind on transport failures (ok !== true), so a client can't skew the offline fraction by attaching it to OK events
- fetch/main: bridge the --proxy CLI arg into WICK_PROXY so connectivity_ok probes through the configured proxy (a proxied-only host was misclassified "offline")
- site_rules: Windows-safe overlay replace (rename won't overwrite on Windows)
- cronet: fix stale doc reference (classify_transport_error / candidate_cause)
- bench/probe.sh: scheme-agnostic proxy wording (oxylabs is HTTP CONNECT, not SOCKS5)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deploying wickproject with Cloudflare Pages
| Latest commit: | 2e061b0 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://cf42b53e.wickproject.pages.dev |
| Branch Preview URL: | https://self-improving-site-rules.wickproject.pages.dev |
- fetch --json: rename `bytes` → `content_bytes` and document it as the extracted-content size (a challenge/JS shell extracts to near nothing, so a small value flags a block), not bytes-on-wire. Updated bench/probe.sh.
- worker: gate error_kind on statusBucket==="0" as well: a cause means "no HTTP response at all", so an HTTP error (e.g. 403) must not carry one.
- worker: reject arrays in doc.rules validation (typeof [] === "object" would otherwise let an array through as a rules map).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
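The two worker-side gates on `error_kind` (count it only when `ok` is not true, and only when the status bucket is "0") can be restated compactly. The Worker itself is JavaScript; this is a Rust restatement for illustration only, and the `Event` struct, the `"403"` bucket string used in the tests, and the exact allowlist contents are assumptions drawn from the taxonomy in this PR.

```rust
/// Hypothetical view of one incoming telemetry event.
pub struct Event<'a> {
    pub ok: bool,
    /// "0" means no HTTP response was received at all.
    pub status_bucket: &'a str,
    pub error_kind: Option<&'a str>,
}

/// Assumed allowlist: the cause taxonomy named in the PR.
const ALLOWED_KINDS: [&str; 9] = [
    "offline", "dns", "timeout", "reset", "refused",
    "unreachable", "quic", "connect", "other",
];

/// Return the cause to aggregate into `error_kind_dist`, if any.
/// A cause is counted only for a failed event with no HTTP response,
/// and only when the value is on the allowlist.
pub fn counted_error_kind<'a>(event: &Event<'a>) -> Option<&'a str> {
    if event.ok || event.status_bucket != "0" {
        return None;
    }
    event
        .error_kind
        .filter(|kind| ALLOWED_KINDS.iter().any(|allowed| *allowed == *kind))
}
```

Dropping the cause rather than rejecting the whole event keeps the status aggregation intact while preventing a client from inflating the offline fraction.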
…ble all rules

A published overlay entry missing `render` (manual edit, residential-only rule, partial doc) made serde_json fail the whole-file parse, silently dropping EVERY overlay rule back to the seed. `render` is now #[serde(default)]: an empty value is "no opinion" (same as no rule) per should_use_cef_first. Adds a test.

Addresses Copilot round-3 feedback on #9.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DaemonProcess.use_residential stored the *requested* mode, but the LD_PRELOAD tunnel only applies when WireGuard is up AND bindwg is present. Compute the effective mode once (want_residential) and use it for the reuse check, the preload decision, and the stored flag. Fixes two issues: (a) a daemon spawned non-residential because the tunnel was down would never switch to residential when the tunnel later came up, and (b) non-residential requests triggered needless respawns against a daemon that was already effectively non-residential. Addresses Copilot round-4 feedback on #9. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
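The effective-mode fix can be sketched as two small functions. `DaemonProcess`, `use_residential`, and `want_residential` are names from the commit message; the signatures and the `needs_respawn` helper are hypothetical, written only to show why comparing against the effective mode resolves both cases (a) and (b).

```rust
/// Minimal stand-in for the daemon handle; only the stored flag matters here.
pub struct DaemonProcess {
    /// The *effective* mode the daemon was spawned with, not the requested one.
    pub use_residential: bool,
}

/// Residential mode only takes effect when it was requested AND the
/// WireGuard tunnel is up AND the bindwg preload is present.
pub fn want_residential(requested: bool, wireguard_up: bool, bindwg_present: bool) -> bool {
    requested && wireguard_up && bindwg_present
}

/// Hypothetical reuse check: respawn when no daemon is running or when the
/// running daemon's effective mode differs from the one needed now.
pub fn needs_respawn(running: Option<&DaemonProcess>, effective: bool) -> bool {
    match running {
        None => true,
        Some(daemon) => daemon.use_residential != effective,
    }
}
```

Because the same computed value drives the reuse check, the preload decision, and the stored flag, a daemon spawned while the tunnel was down is replaced once the tunnel comes up, and plain requests no longer respawn a daemon that is already effectively non-residential.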
- fetch/main: record the resolved proxy in a OnceLock that connectivity_ok reads, instead of std::env::set_var. set_var is unsound under the multi-thread Tokio runtime (it can race env reads on worker threads), and the prior "no threads spawned yet" rationale was wrong (#[tokio::main] spawns workers before the body runs).
- bench/probe.sh: resolve `timeout` vs `gtimeout` (macOS ships neither by default) and run without a per-request timeout + warn if absent, rather than failing the whole sweep on a default Mac.

Addresses Copilot round-5 feedback on #9.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
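The `OnceLock` approach mentioned in the first item might look roughly like this. `std::sync::OnceLock` is the real standard-library type (stable since Rust 1.70); the static's name and the two helper functions are hypothetical and not taken from fetch.rs.

```rust
use std::sync::OnceLock;

/// Process-wide record of the proxy resolved from the CLI, written once at startup.
/// `None` inside the lock means "resolved, and no proxy is configured".
static RESOLVED_PROXY: OnceLock<Option<String>> = OnceLock::new();

/// Record the proxy. Returns false if a value was already recorded,
/// in which case the first value is kept.
pub fn record_proxy(cli_proxy: Option<String>) -> bool {
    RESOLVED_PROXY.set(cli_proxy).is_ok()
}

/// Read the recorded proxy from any thread without touching the environment.
pub fn resolved_proxy() -> Option<&'static str> {
    RESOLVED_PROXY.get().and_then(|proxy| proxy.as_deref())
}
```

Unlike `std::env::set_var`, the write here is synchronized and happens at most once, so a connectivity probe running on a Tokio worker thread can read it safely at any time.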
Belt-and-suspenders / clarity for the host aggregation. jq's group_by already sorts by the key internally (so the candidate totals were correct — verified live), but the explicit sort makes intent obvious and closes the review thread. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0198910 to 2e061b0
Self-improving site routing
Wick now closes a feedback loop: it learns which sites it fails on from the public stats, empirically tests access methods through residential proxies, and maintains a shared curated "known behaviors" list so every client routes correctly on its first visit — instead of each machine independently re-paying the Cronet‑fail‑then‑escalate cost.
sequenceDiagram
    autonumber
    participant C as Wick client<br/>fetch.rs
    participant W as Worker<br/>index.js
    participant H as Probe harness<br/>bench/probe.sh
    participant R as Residential proxy<br/>oxylabs
    C->>W: POST /v1/events host, strategy, ok, error_kind
    Note over W: aggregate per host:<br/>success_rate + error_kind_dist
    H->>W: GET /v1/stats/summary
    Note over H: select site-side failing hosts<br/>drop error_kind=offline ⚠️
    H->>R: fetch via cronet / cronet+res / cef
    R-->>H: per-strategy status + bytes
    Note over H: derive render + needs_residential
    H->>W: POST /v1/site-rules merged seed + measured
    C->>W: GET /v1/site-rules daily refresh
    W-->>C: curated rules
    Note over C: site_rules consulted ABOVE site_cache<br/>right method on first visit ✅

What's in this PR
1. Curated site-rules (the evolving list): `rust/src/site_rules.rs` + `rust/data/site-rules.json`
- Consulted above `site_cache`: a curated `render: cef` / `needs_residential` rule routes a first-time visit correctly with no local learning.
- Bundled seed (`include_str!`) overlaid by `~/.wick/site-rules.json`, refreshed daily from the Worker, so the list evolves without a reinstall.
- `fetch.rs` now threads the rule's residential flag + selector into every CEF path (the old hardcoded `use_residential: false` is gone).

2. Transport-failure classification: the data-quality foundation the loop depends on
- The `cronet-transport-error` bucket couldn't tell a user disconnect from a site actively blocking us, so a user whose wifi died 25× looked identical to a hard site and would poison the rules.
- `cronet::on_failed` now captures the real net-error (was discarded); `classify_transport_error` gates every non-definitive cause on a connectivity probe → `offline / dns / timeout / reset / refused / unreachable / quic / connect / other`.
- The Worker aggregates `error_kind_dist` alongside `status_dist`.

3. The probe harness: `bench/probe.sh`, `bench/publish-rules.sh`, `bench/PROBE.md`
- `probe.sh` reads `/v1/stats/summary`, selects site-side failing hosts (drops `offline`-dominated so user disconnects aren't chased), probes a `cronet | cronet+residential | cef` matrix via `wick fetch --json`, derives `render` + `needs_residential`.
- `publish-rules.sh` merges measured verdicts with the seed (measured wins, correcting over-aggressive seeds) → `POST /v1/site-rules`.

4. Propagation: Worker `GET /v1/site-rules` (public, cached) + `POST /v1/site-rules/:key` (auth); client `site_rules::refresh_if_stale()` (once/process, background, atomic, opt-out-aware).

Verified
- […] `--features cronet` configs (the real shipping path); 28 unit tests pass, including rule-precedence and cause-taxonomy tests. Worker passes `node --check`.
- […] `apkpure` (DataDome: failed every testable cell).

Not in this PR / follow-ups
- […] `npx wrangler deploy` activates `error_kind_dist` + the site-rules endpoints) and no rules are published yet.
- […] (`PROBE.md`): the `cronet` baseline cell uses the operator's own IP, so run the harness from a datacenter VM to detect `needs_residential` faithfully. And `--proxy` routes Cronet, not CEF (CEF residential is a WireGuard preload), so `cef+residential` isn't tested here.

🤖 Generated with Claude Code
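As an illustration of the "once/process, daily, opt-out-aware" gating attributed to `site_rules::refresh_if_stale()`, here is a minimal sketch of the decision alone. It does not fetch anything or write the overlay; the 24-hour threshold, the function names, and the way the overlay's modification time and the opt-out flag are passed in are all assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, SystemTime};

/// Assumed staleness threshold for "refreshed daily".
const MAX_AGE: Duration = Duration::from_secs(24 * 60 * 60);

/// Set the first time a refresh is started in this process.
static REFRESH_ATTEMPTED: AtomicBool = AtomicBool::new(false);

/// The overlay is stale if it is missing or at least MAX_AGE old.
/// A modification time in the future is treated as fresh.
pub fn is_stale(last_modified: Option<SystemTime>, now: SystemTime) -> bool {
    match last_modified {
        None => true,
        Some(t) => now
            .duration_since(t)
            .map(|age| age >= MAX_AGE)
            .unwrap_or(false),
    }
}

/// Decide whether to kick off a background refresh: never when opted out,
/// never when the overlay is fresh, and at most once per process.
pub fn should_refresh(
    last_modified: Option<SystemTime>,
    now: SystemTime,
    opted_out: bool,
) -> bool {
    if opted_out || !is_stale(last_modified, now) {
        return false;
    }
    // `swap` returns the previous value, so only the first caller sees `false`.
    !REFRESH_ATTEMPTED.swap(true, Ordering::SeqCst)
}
```

A caller that gets `true` would then download the rules in the background and replace the overlay file atomically, which this sketch leaves out.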