
feat(cert): add --remove-cert flag and Remove CA button for clean-slate revocation#121

Open
dazzling-no-more wants to merge 90 commits into therealaleph:main from dazzling-no-more:feature/delete_certificate

Conversation

@dazzling-no-more (Contributor) commented Apr 24, 2026

Summary

  • Adds mhrv-rs --remove-cert (CLI) and a Remove CA button in the desktop UI for a verified clean-slate CA revocation: clears the OS trust store (macOS login+system keychains, Linux anchor dirs, Windows user+machine Trusted Root), best-effort NSS cleanup (Firefox profiles + Chrome/Chromium on Linux), and deletes the on-disk ca/ directory. config.json and the Apps Script deployment are never touched, so users don't have to redeploy Code.gs.
  • Safety first: is_ca_trusted_by_name() verification runs before file deletion and before NSS mutation. A failed OS removal returns RemovalIncomplete, preserves ca/, and leaves browser state alone — retries are idempotent. RemovalOutcome::{Clean, NssIncomplete} lets the UI/CLI print accurate "OS CA removed, browser cleanup partial" status instead of silent false success.
  • sudo-safe on Unix: reconcile_sudo_environment() detects geteuid() == 0 + SUDO_USER at each binary's main() entry and re-roots HOME to the invoking user — so data dir / Firefox profiles / macOS login keychain target the real user rather than root.
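For reviewers, the sudo re-rooting decision reduces to a pure function. A minimal sketch, assuming a hypothetical helper (the real reconcile_sudo_environment() reads geteuid() and resolves SUDO_USER's home via getent passwd):

```rust
use std::path::PathBuf;

/// Hypothetical sketch: decide whether HOME should be re-rooted.
/// `euid` would come from geteuid(); `sudo_user_home` from a
/// `getent passwd $SUDO_USER` lookup in the real implementation.
fn rerooted_home(euid: u32, sudo_user_home: Option<&str>) -> Option<PathBuf> {
    match (euid, sudo_user_home) {
        // Running as root via sudo: target the invoking user's home.
        (0, Some(home)) if !home.is_empty() => Some(PathBuf::from(home)),
        // Plain root login (no SUDO_USER) or unprivileged run: leave HOME alone.
        _ => None,
    }
}

fn main() {
    assert_eq!(
        rerooted_home(0, Some("/home/alice")),
        Some(PathBuf::from("/home/alice"))
    );
    assert_eq!(rerooted_home(0, None), None); // root shell, not sudo
    assert_eq!(rerooted_home(1000, Some("/home/alice")), None); // not root
    println!("ok");
}
```

Keeping the decision pure like this is what makes the data-dir / Firefox-profile / login-keychain targeting unit-testable without root.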

⚠️ Testing status

Only Windows has been smoke-tested end-to-end (Install → Check → Remove → Check round-trip via both CLI and UI, plus the mutex-on-flags exit-2 behavior). The macOS and Linux paths are built from the existing install-side patterns and covered by unit tests for all the pure logic, but the platform-specific security delete-certificate / update-ca-certificates / trust extract-compat code paths have not been executed on real hardware in this branch. A reviewer on macOS and a reviewer on at least one Linux distro (ideally one Debian-family and one RHEL-family) walking through the test plan below before merge would be valuable.

What changed

  • src/cert_installer.rs — remove_ca + per-platform helpers, RemovalOutcome, NssReport, reconcile_sudo_environment, marker-gated Firefox enterprise_roots pref (user-authored lines preserved), idempotent NSS delete that distinguishes "cert not found" from DB-locked/corrupt errors (regression guard for SEC_ERROR_LOCKED_DATABASE)
  • src/main.rs — --remove-cert flag, mutually exclusive with --install-cert, calls reconcile_sudo_environment() at startup
  • src/bin/ui.rs — Remove CA button, Cmd::RemoveCa handler, shared cert_op_in_progress gate covering both Install and Remove, active-proxy guard for Remove (the CA keypair is live in memory while the proxy runs)
  • README.md — English + Persian docs for the new flag, sudo behavior note, correct CN (MasterHttpRelayVPN) for manual cleanup paths, upgrade note about the pre-marker enterprise_roots cosmetic orphan

Tests

29 new unit tests covering the pure logic:

  • Firefox user.js marker-block install/strip roundtrips and idempotency (bare lines respected as user-owned)
  • getent passwd home-dir parsing (Debian format + malformed inputs + macOS fallback semantics)
  • NssReport::is_clean() state rules
  • NSS stderr classification (standard "could not find cert", alt wording, locked DB, corrupt DB, permission denied, empty stderr)
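The stderr classification can be sketched roughly as follows. This is illustrative only — the variant names, matched substrings, and the empty-stderr rule are assumptions, not the actual src/cert_installer.rs code:

```rust
#[derive(Debug, PartialEq)]
enum NssDeleteResult {
    NotFound,         // idempotent success: cert already absent
    LockedDb,         // SEC_ERROR_LOCKED_DATABASE — must not proceed to delete ca/
    CorruptDb,
    PermissionDenied,
    Unknown,
}

fn classify_nss_stderr(stderr: &str) -> NssDeleteResult {
    let s = stderr.to_ascii_lowercase();
    // Assumption for this sketch: empty stderr on a delete failure is
    // treated like "not found"; the real rules may differ.
    if s.is_empty() || s.contains("could not find cert") {
        NssDeleteResult::NotFound
    } else if s.contains("sec_error_locked_database") {
        NssDeleteResult::LockedDb
    } else if s.contains("sec_error_bad_database") || s.contains("corrupt") {
        NssDeleteResult::CorruptDb
    } else if s.contains("permission denied") {
        NssDeleteResult::PermissionDenied
    } else {
        NssDeleteResult::Unknown
    }
}

fn main() {
    assert_eq!(
        classify_nss_stderr("certutil: could not find cert: MasterHttpRelayVPN"),
        NssDeleteResult::NotFound
    );
    assert_eq!(
        classify_nss_stderr("certutil: SEC_ERROR_LOCKED_DATABASE"),
        NssDeleteResult::LockedDb
    );
    println!("ok");
}
```

The point of the classification is that only NotFound counts as idempotent success; everything else blocks the ca/ deletion.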

Side-effecting paths (security, certutil, update-ca-certificates) are covered by manual E2E per platform since the codebase doesn't yet have a command-runner abstraction.

Test plan

Windows — ✅ smoke-tested locally

  • cargo test --lib — 101/101 pass locally
  • UI: Install CA → Check CA → Remove CA → Check CA round-trip; verify log shows file=missing trust_store=not trusted after Remove
  • CLI: mhrv-rs --install-cert then mhrv-rs --remove-cert; verify %APPDATA%\mhrv-rs\ca\ gone
  • CLI: mhrv-rs --install-cert --remove-cert returns exit 2 with --install-cert and --remove-cert cannot be combined
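The exit-2 mutex boils down to a check like this (a sketch with a hypothetical helper name; the real flag handling lives in src/main.rs):

```rust
/// Sketch: --install-cert and --remove-cert are mutually exclusive.
/// Returns Some(2) when both flags are present; the caller would print
/// the "cannot be combined" message and exit with that code.
fn cert_flags_exit(args: &[&str]) -> Option<i32> {
    let install = args.contains(&"--install-cert");
    let remove = args.contains(&"--remove-cert");
    (install && remove).then_some(2)
}

fn main() {
    assert_eq!(cert_flags_exit(&["--install-cert", "--remove-cert"]), Some(2));
    assert_eq!(cert_flags_exit(&["--remove-cert"]), None);
    assert_eq!(cert_flags_exit(&["--install-cert"]), None);
    println!("ok");
}
```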

Still to test

  • Linux + sudo: confirm log line Detected sudo invocation (SUDO_USER=…): re-rooting HOME to … and that the cert is removed from the real user's Firefox user.js / ~/.pki/nssdb, not root's
  • Linux refresh failure: simulate broken update-ca-certificates (e.g. move it aside) and confirm ca/ survives + RemovalIncomplete is reported
  • Linux Debian-family + RHEL-family: verify the correct anchor-dir/refresh-cmd pair fires for each
  • macOS login-keychain-only install: run mhrv-rs --remove-cert as normal user, confirm no sudo prompt (system-keychain probe avoids escalation when the cert isn't there)
  • macOS system-keychain install: verify sudo escalation works when the cert IS in the system keychain
  • Any platform UI concurrency: click Start, immediately click Remove CA — button disabled, handler rejects with "proxy is running or starting"
  • Any platform UI serialization: click Install CA then Remove CA back-to-back — confirm cert_op_in_progress gate prevents the race

Compatibility with Mode::Full (#94)

Full mode doesn't use the MITM CA, so Remove CA is harmless there:

  • apps_script / google_only users: unchanged, works as described.
  • apps_script → full migrators: Remove CA is the recommended cleanup step after switching.
  • full-from-day-one users: no trust-store entry → verification passes → ca/ deleted if present → no-op in practice.

therealaleph and others added 30 commits April 22, 2026 11:04
…utdown-ui

Thanks @v4g4b0nd-0x76 — proper listener teardown on Stop is exactly what was needed. The 2-second grace window + force-abort fallback is a clean pattern.
Two user-reported issues.

=== GLIBC too new (reported via twitter) ===

Our linux-amd64 and linux-arm64 gnu builds were compiled on
ubuntu-latest (24.04, GLIBC 2.39), which means the resulting binaries
refuse to load on anything older:

  ./mhrv-rs: /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.39'
    not found (required by ./mhrv-rs)

Users on Ubuntu 22.04 / Mint 21 (GLIBC 2.35) — the typical user in Iran
where this project's target audience lives, and where they can't
dist-upgrade because they're behind exactly the kind of network
restriction this tool exists to bypass — could not run the gnu builds
at all.

Fix: pin the linux-gnu matrix entries to ubuntu-22.04 runners. GLIBC
2.35 is now the minimum; binaries load on Ubuntu 22.04, Mint 21,
Debian 12, Fedora 36+, RHEL 9+ and everything newer.

Users on older distros (Ubuntu 20.04, CentOS 7) can still use the
static musl builds (mhrv-rs-linux-musl-amd64.tar.gz et al.) which
have no GLIBC dependency at all.

=== Short-screen laptops — main window content clipped (PR therealaleph#6) ===

Co-authored fix from @v4g4b0nd-0x76 in PR therealaleph#6 (manually applied to
avoid pulling in 400 lines of unrelated cargo-fmt churn):

- Wrap the CentralPanel body in ScrollArea::vertical()
  .auto_shrink([false; 2]) so everything stays reachable on short
  screens.
- Lower the min_inner_size from [420, 540] to [420, 400] so laptops
  with ~13" screens at default scaling can shrink the window without
  clipping UI elements.

Closes therealaleph#6.

Co-authored-by: v4g4b0nd-0x76 <v4g4b0nd-0x76@users.noreply.github.com>
Verified: the linux-amd64 binary's highest GLIBC symbol is now 2.34
(was 2.39 in v0.7.0 and earlier), so it runs on Ubuntu 22.04 / Mint 21
/ Debian 12 and anything newer.
Two user complaints:
- English words mixed inline in the Persian section were breaking the
  RTL text flow, making paragraphs hard to read.
- Language was too technical for non-developer users.

Fixes:

1. Every English / technical term is now wrapped in backticks
   (`Apps Script`, `MITM`, `SOCKS5`, `Deployment ID`, …). GitHub
   renders these as monospace LTR islands, which the browser's
   bidirectional text algorithm treats as embedded strong-LTR runs
   and doesn't let them flip the surrounding RTL paragraph direction.
2. Rewrote most paragraphs as shorter, plainer Persian sentences.
   Replaced jargon (run-time, on-the-fly, rewrite, trust store…)
   with everyday wording.
3. Converted dense prose into tables where it helped (download
   table by OS, config fields table, per-OS CA install table).
4. Added a 5-step walkthrough (script deploy → download → first
   run → config in UI → browser setup) that a non-technical user
   can follow top-to-bottom.
5. New 'How do I know it's working?' quick verification section.
6. New big FAQ at the bottom — covers the questions that actually
   come up: certificate install safety, how to remove the cert,
   how many Deployment IDs to use, YouTube / ChatGPT caveats,
   the GLIBC 2.39 issue, and CLI usage for power users.
7. Telegram pairing section reworded — explains the WHY first
   (Apps Script can't speak MTProto), then the one-line fix.
8. SNI pool editor flow written as numbered steps mirroring the
   actual UI buttons the user clicks.

English section unchanged.
…ation

Thanks @v4g4b0nd-0x76 for the feature. Two small fixes folded in on
the merge so master still builds + doesn't hit sharp edges:

- src/scan_ips.rs: rand::thread_rng() held across an .await tripped
  the Send bound on the async fn (ThreadRng isn't Send). Scoped the
  rng in a block so it drops before subsequent awaits.
- src/scan_ips.rs: guard /0 and /32 CIDRs in cidr_to_ips and
  ip_in_cidr against the 1u32 << 32 shift panic (debug mode). goog.json
  is unlikely to contain either but defensive.
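The shift-panic guard amounts to widening before shifting (or special-casing /0). A sketch of the edge cases — helper names are hypothetical, not the actual scan_ips.rs signatures:

```rust
/// Number of addresses in an IPv4 CIDR block. `1u32 << 32` for a /0
/// panics in debug builds; widening to u64 makes every prefix safe.
fn cidr_size(prefix: u8) -> u64 {
    assert!(prefix <= 32);
    1u64 << (32 - u32::from(prefix))
}

/// Netmask for an ip_in_cidr-style check. `u32::MAX << 32` would also
/// panic, so /0 is special-cased to an all-zero mask.
fn netmask(prefix: u8) -> u32 {
    if prefix == 0 { 0 } else { u32::MAX << (32 - u32::from(prefix)) }
}

fn main() {
    assert_eq!(cidr_size(0), 4_294_967_296); // the /0 case that panicked
    assert_eq!(cidr_size(32), 1);            // the /32 case
    assert_eq!(netmask(0), 0);
    assert_eq!(netmask(24), 0xFFFF_FF00);
    assert_eq!(netmask(32), u32::MAX);
    println!("ok");
}
```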

Behavior unchanged otherwise:
- fetch_ips_from_api=false (default): identical to previous static
  scan-ips behavior.
- fetch_ips_from_api=true: fetches goog.json from www.gstatic.com,
  resolves famous Google domain IPs, prioritises matching CIDRs,
  samples up to max_ips_to_scan candidates, validates with gws/
  x-google-/alt-svc headers. If the fetch fails, falls back to the
  static list cleanly — verified locally.

Closes therealaleph#10.
…therealaleph#8), Windows UI diagnostics (therealaleph#7)

Three user-reported fixes / features in one release.

=== PR therealaleph#9 — dynamic Google IP discovery (@v4g4b0nd-0x76) ===

Already merged in the previous commit. Opt-in via 'fetch_ips_from_api'
in config. Pulls goog.json from www.gstatic.com, maps it against
resolved IPs of well-known Google domains, samples from matching
CIDRs, and validates each candidate with gws / x-google / alt-svc
response-header checks. Graceful fallback to the static list if the
fetch fails or nothing passes validation. Default is off so existing
users are unaffected. Closes therealaleph#10.

=== Issue therealaleph#8 — OpenWRT: 'accept: No file descriptors available' ===

OpenWRT routers ship a very low RLIMIT_NOFILE (often 1024, sometimes
256 on constrained devices). A browser's burst of ~30 parallel
sub-resource requests can fill the limit within seconds, after which
accept(2) returns EMFILE and the proxy is effectively dead.

Two-fold fix:

1. New assets/openwrt/mhrv-rs.init now sets procd limits nofile=
   "16384 16384" on the service. procd raises the per-process fd
   limit before the binary even starts.
2. New src/rlimit.rs best-effort-raises RLIMIT_NOFILE in the binary
   itself (Unix only, no new runtime deps — libc is already
   transitively present via tokio). Targets 16384 soft, capped to
   whatever hard limit the kernel already allows the user (so it
   doesn't need root).
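The soft-limit targeting in item 2 reduces to pure arithmetic that can be unit-tested without libc. A sketch with a hypothetical helper — the real src/rlimit.rs wraps this around getrlimit/setrlimit calls:

```rust
const NOFILE_TARGET: u64 = 16384;

/// Returns the soft limit to request, or None when no raise is needed
/// (already at or above target). Never exceeds the hard cap, which is
/// why no root is required.
fn nofile_raise(soft: u64, hard: u64) -> Option<u64> {
    let want = NOFILE_TARGET.min(hard);
    (want > soft).then_some(want)
}

fn main() {
    assert_eq!(nofile_raise(1024, 1_048_576), Some(16384)); // typical low default
    assert_eq!(nofile_raise(256, 4096), Some(4096));        // capped by hard limit
    assert_eq!(nofile_raise(16384, 16384), None);           // already at target
    assert_eq!(nofile_raise(65536, 65536), None);           // already above target
    println!("ok");
}
```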

Both layers mean the fix applies whether the user runs via
  /etc/init.d/mhrv-rs start    (procd limits kick in)
or
  ./mhrv-rs --config ...       (in-binary bump kicks in)
or any other invocation path.

Closes therealaleph#8.

=== Issue therealaleph#7 — Windows UI crashes silently ===

User report: on Win 11, run.bat prints 'Starting mhrv-rs UI...' and
exits clean, but no UI window ever appears. Root cause: the old
run.bat used 'start "" "mhrv-rs-ui.exe"' which returns
immediately — if the UI binary dies at launch time (missing GPU
driver, RDP without GL accel, AV blocking, …), the crash is invisible
because start already disowned the child.

Fix: run the UI in-place (not via 'start'), so its stderr and exit
code land in the run.bat cmd window. On non-zero exit print a helpful
checklist of common Windows launch failures and pause so the user can
screenshot the output for an issue report.

This doesn't fix the underlying crash for affected users, but it
turns a ghost-crash bug into a self-diagnosing one so the next report
includes actionable info. Closes-via-diag therealaleph#7.

=== Fixes folded into the PR therealaleph#9 merge ===

- src/scan_ips.rs: rand::thread_rng() held across an .await tripped
  the Send bound on the async fn. Scoped the rng in a block so it
  drops before the subsequent awaits.
- src/scan_ips.rs: defend /0 and /32 CIDRs in cidr_to_ips and
  ip_in_cidr against 1u32 << 32 shift panic.

All 36 unit tests pass.
Placing a new table header before eframe silently scoped it into the
unix-only target table, so Windows builds lost the dependency entirely:

  error[E0432]: unresolved import `eframe`
  use of unresolved module or unlinked crate `eframe`

(Builds green on Mac/Linux because those hit cfg(unix) == true. Windows
was the only casualty.)

Moved the [target.'cfg(unix)'.dependencies] block to the end of
Cargo.toml, after the optional eframe line, so the main [dependencies]
table stays intact for all targets. Added a comment so this foot-gun
can't return.
…orks (closes therealaleph#11)

Multiple users reported the same thing (issue therealaleph#11): they trusted the
CA, then re-installed it, then deleted and re-generated it, and still
every HTTPS site through the proxy failed in the browser. The python
version of the same project doesn't have the issue.

Root cause: rcgen's CertificateParams::default() produces a
minimum-viable x509 cert that does NOT carry:

  - ExtendedKeyUsage extension with id-kp-serverAuth
  - KeyUsage extension with digitalSignature + keyEncipherment

Modern Chrome / Firefox / Edge / Safari all reject TLS server leaves
without those. The CA trust bit didn't matter — the browser's chain
validator rejected the leaf itself with NET::ERR_CERT_INVALID before
ever consulting the trust store. So 'reinstall the CA' was powerless
to help.

Fix in src/mitm.rs::issue_leaf:
  - Set params.extended_key_usages = [ServerAuth].
  - Set params.key_usages = [DigitalSignature, KeyEncipherment].
  - Backdate not_before by 5 min to absorb clock skew between the
    MITM process and a slightly-fast client clock. Same fix in the
    CA's own not_before.

Also added src/mitm.rs::tests::leaf_has_serverauth_eku_and_key_usage
as a permanent regression guard — it parses the DER with x509-parser
and asserts the three extensions are present. Added x509-parser to
dev-dependencies (already in the tree transitively via rcgen).

Upgrade path for users affected by therealaleph#11: download v0.8.1, run it. No
CA reinstall required — the CA cert itself was fine, only the
per-site leaves were broken.

Verified end-to-end locally:
  curl --cacert <ca.crt> -x http://127.0.0.1:... https://httpbin.org/ip
  curl --cacert <ca.crt> -x socks5h://127.0.0.1:... https://httpbin.org/ip
Both return JSON without cert errors, through the Apps Script relay
path. 37 unit tests pass.
…llow-up to therealaleph#11)

After v0.8.1 fixed the leaf cert extensions, users reported "still
broken" — specifically Firefox showing:
  "Software is Preventing Firefox From Safely Connecting to This Site.
   drive.google.com ... This issue is caused by MasterHttpRelayVPN"
for HSTS-preloaded sites. That error is Firefox's "MITM detected AND
issuing CA isn't in my trust store" path combined with HSTS blocking
the normal override button — so users were stuck with no workaround.

Real root cause of the "still broken" reports: the CA was making it
into the OS trust store (Windows cert store / update-ca-certificates
on Linux) but NOT into the browser-specific trust stores that
Firefox and Chrome use on every OS.

Three additions:

1. Firefox: enable the security.enterprise_roots.enabled pref.
   For every Firefox profile we find, we now write this pref to the
   profile's user.js. It tells Firefox to trust the OS CA store, so
   our already-successful system-level install automatically covers
   Firefox on next startup. Critical on Windows (NSS certutil isn't
   on PATH there, so the certutil-based Firefox install never
   worked). Idempotent — checks for existing pref before writing
   and leaves a non-matching user value alone.

2. Chrome/Chromium on Linux: install into ~/.pki/nssdb.
   Linux Chrome uses its own shared NSS DB, independent of both the
   OS store (populated by update-ca-certificates) AND Firefox's
   per-profile NSS. Without this, users installed the CA via
   run.sh, Chrome still refused every HTTPS site, and they spiraled
   trying to re-install the CA. We now also initialize that DB
   with  if it doesn't exist yet.

3. Refactored the NSS-install path so Firefox and Chrome share a
   single install_nss_in_dir() helper. Renamed the top-level entry
   from install_firefox_nss to install_nss_stores to match scope.
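The idempotency rule in item 1 can be sketched as below. The marker text and helper name are illustrative assumptions; the pref itself (security.enterprise_roots.enabled) is the one the commit describes:

```rust
const PREF_KEY: &str = "security.enterprise_roots.enabled";
// Hypothetical marker text — the real marker wording may differ.
const BLOCK: &str = concat!(
    "// BEGIN mhrv-rs (auto-managed)\n",
    "user_pref(\"security.enterprise_roots.enabled\", true);\n",
    "// END mhrv-rs\n",
);

/// Append the marker-gated pref unless it already appears anywhere in
/// user.js — a bare, user-authored line counts and is left untouched.
fn ensure_enterprise_roots(user_js: &str) -> String {
    if user_js.contains(PREF_KEY) {
        return user_js.to_string();
    }
    format!("{}{}", user_js, BLOCK)
}

fn main() {
    let once = ensure_enterprise_roots("");
    assert!(once.contains(PREF_KEY));
    assert_eq!(ensure_enterprise_roots(&once), once); // idempotent on re-run
    // A user-authored value (even a conflicting one) is never overwritten.
    let bare = "user_pref(\"security.enterprise_roots.enabled\", false);\n";
    assert_eq!(ensure_enterprise_roots(bare), bare);
    println!("ok");
}
```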

Locally verified the cert itself is fine — openssl x509 -text shows
Version 3, SAN, KeyUsage (critical), ExtendedKeyUsage, and chain
verification passes. So the leaf is correct;
what was failing was the trust-chain validation inside the specific
browser because our CA wasn't in THAT browser's trust DB.

Upgrade path: download v0.8.2 and run the launcher or
`./mhrv-rs --install-cert`. Restart Firefox/Chrome after install —
Firefox needs the restart to re-read user.js.
…isibility (issue therealaleph#12)

Two reported issues:

1. Log level in the form had no visible effect — trace produced the
   same panel output as warn.
2. upstream_socks5 was reported as never being attempted.

(1) was because the UI binary never installed a tracing subscriber.
Every tracing::info!/debug!/trace! from the proxy was discarded; only
the handful of manual push_log() calls for start/stop/test reached
the 'Recent log' panel. Swapping the log level in the combo-box just
rewrote the config field — nothing consumed it.

Fix: install_ui_tracing() at startup registers a tracing_subscriber
fmt layer with a custom MakeWriter that mirrors each formatted event
line into shared.state.log. Respects RUST_LOG, defaults to 'info'
with hyper pinned to warn so the panel isn't swamped by low-level
HTTP chatter. Now the log level switch actually filters panel
output, and routing decisions show up live.

(2) is a documentation / visibility issue more than a bug. Our
upstream_socks5 routing is intentionally scoped to raw-TCP traffic
(non-HTTP, non-TLS) — HTTPS goes through the Apps Script relay,
which is the whole reason mhrv-rs exists. But without working logs,
it looks like upstream_socks5 is dead code.

Fix: every branch of dispatch_tunnel now emits a tracing::info! that
says exactly which path the connection took and, where applicable,
whether upstream_socks5 was used:

    dispatch api.telegram.org:443 -> raw-tcp (127.0.0.1:50529)
    dispatch www.google.com:443   -> sni-rewrite tunnel (Google edge direct)
    dispatch httpbin.org:443      -> MITM + Apps Script relay (TLS detected)

Combined with (1), users can now see in real time whether their
traffic is hitting upstream_socks5. If it says 'raw-tcp (direct)'
after they set the field, that's evidence of a real bug; if it
never reaches the raw-tcp branch at all, that's the documented
design (HTTPS → Apps Script).

Also per user request, updated README:
- Shields.io badges up top: latest release, total downloads, CI
  status, license, stars.
- Short 'Heads up on authorship' note crediting Anthropic's Claude
  for the bulk of the Rust port (with the human-on-every-commit
  caveat). English and Persian mirrors both have it.

All 37 unit tests pass.
…t-on-reopen bug)

A user reported that after Save-config, closing the UI, and reopening,
every form field was blank — but the config.json on disk still had all
the right values.

The culprit in the UI was load_form()'s swallow-errors pattern:

  let existing = if path.exists() {
      Config::load(&path).ok()   // .ok() threw away the error
  } else { ... };
  if let Some(c) = existing { /* populate form */ } else { /* defaults */ }

When Config::load returned an Err, .ok() silently converted to None,
the form went back to defaults, and the user had no signal at all
that the load had failed or WHY. On every platform I could test
(macOS / Linux) the round-trip works fine with a real round-trip test
added in config.rs (config::rt_tests::round_trip_all_current_fields
and round_trip_minimal_fields_only — both green). So whatever's
failing for this specific reporter is environment-specific (weird
filesystem encoding, partial write, different field shape from an
older version, … TBD). Without visibility we can't diagnose it.

Changes:

1. load_form() now returns (FormState, Option<String>). The String
   is a user-facing error message (with the full path + the
   underlying parse/validate reason) when Config::load fails on an
   existing file.
2. main() plumbs that error into App's initial toast, which sticks
   for 30 seconds (vs the normal 5 for regular toasts) so users who
   only open the UI briefly still see it.
3. Added tracing::info! in load_form for the success path too —
   the Recent log panel now always shows either 'config: loaded OK
   from <path>' or 'Config at <path> failed to load: <reason>' on
   startup, regardless of toast timing.
4. Added two regression-guard tests in config.rs covering the
   full-fields and minimal-fields save shapes the UI emits.

Next time a user reports this: they'll have the exact error in the
toast + the Recent log panel, and we can fix the actual bug instead
of shooting blind.
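The pattern in change 1 — stop swallowing the Err, surface it — looks roughly like this. In the sketch, Config and FormState are simplified to strings; the real signatures differ:

```rust
/// Sketch: instead of `Config::load(&path).ok()` discarding the error,
/// keep it and turn it into a user-facing message alongside defaults.
fn load_form(
    load: Result<String, String>, // stand-in for Config::load's Result
    path: &str,
) -> (String, Option<String>) {
    match load {
        Ok(cfg) => (cfg, None),
        Err(reason) => (
            "<defaults>".to_string(),
            Some(format!("Config at {path} failed to load: {reason}")),
        ),
    }
}

fn main() {
    let (form, err) = load_form(Err("invalid JSON at byte 12".into()), "/tmp/config.json");
    assert_eq!(form, "<defaults>"); // still falls back to defaults
    let msg = err.unwrap();
    assert!(msg.contains("/tmp/config.json")); // but the user sees why
    assert!(msg.contains("invalid JSON"));
    println!("ok");
}
```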
…ph#13)

User on issue therealaleph#13 reported that even after installing the CA (and
seeing it in the Windows cert manager UI), our 'Check CA' button still
said 'NOT trusted'. Root cause: is_ca_trusted() on Windows was just
returning false unconditionally — Check-CA has never worked on Windows.

Fix: is_trusted_windows() now shells out to certutil:
  certutil -user -store Root 'MasterHttpRelayVPN'
  certutil -store Root 'MasterHttpRelayVPN'

Checks both the user store (where our install_windows puts it by
default) and the machine store (fallback path when user-store install
is blocked). Requires certutil to print the cert name in stdout AND
exit 0 — belt-and-suspenders against locales where certutil exits 0
even on an empty match.

Also made the Check-CA UI message point users at the CA file path
for cross-device install — the same user reported their Android
V2rayNG client getting cert errors on our MITM-signed TLS leaves,
which is the expected 'the phone doesn't trust our CA' scenario. The
message now calls out the ca.crt path explicitly, and notes the
Android 7+ user-CA restriction (Firefox Android works, Chrome and
most apps don't trust user-installed CAs regardless).

Not addressed (by design):
- Replacing our CA keypair with Python-generated PEM fails to parse
  via rcgen. User tried this as a workaround before reporting. rcgen
  expects PKCS#8 PEM; Python's cryptography commonly emits PKCS#1
  ('BEGIN RSA PRIVATE KEY'). Even if parsing worked, mixing an
  external CA with our leaf-issuing code would break the key match.
  Users should stick with our generated CA — that's the supported
  flow. The Python cross-contamination experiment is expected to
  fail; we don't document it as supported.
Two reasons to pin a copy in the repo:

1. Users on networks where raw.githubusercontent.com is intermittent
   can still get the deploy-ready file via a repo ZIP / clone.
2. The Apps Script relay protocol between mhrv-rs and Code.gs is
   informal — upstream changes can silently break us. Keeping a
   snapshot lets future-us diff against what we tested against
   when diagnosing protocol-drift bugs.

Fetched verbatim from:
  https://raw.githubusercontent.com/masterking32/MasterHttpRelayVPN/refs/heads/python_testing/apps_script/Code.gs

Credit stays with @masterking32. The assets/apps_script/README.md
next to it calls out that we don't modify this file — users deploy
it as-is into their own Google Apps Script project.

Updated the Setup Guide link in both the English and Persian
sections so offline / restricted-network users have a fallback path.
Thanks @hamed256 — armhf cross-compile verified locally, produces a valid ARM 32-bit ELF. Merging with a follow-up commit on main to pin the runner to ubuntu-22.04 (GLIBC 2.35 floor, same policy as our other linux-gnu targets) so it runs for Raspberry Pi users on Bookworm / Bullseye.
=== PR therealaleph#14 follow-up: armhf build runs on Pi Bookworm/Bullseye ===

PR therealaleph#14 (merged earlier) added arm-unknown-linux-gnueabihf to the
release matrix but pinned os=ubuntu-latest, which is 24.04 with GLIBC
2.39. Target armhf sysroot on 24.04 is Debian Trixie (GLIBC 2.39),
far too new for a Raspberry Pi 2/3 on Bookworm (2.36) or Bullseye
(2.31) — users would get 'GLIBC_2.39 not found' the same way the
Linux-amd64 issue therealaleph#2 folks did before we pinned them to 22.04.

Fix: pin the armhf matrix entry to ubuntu-22.04, matching our other
linux-gnu targets. Binary will link against GLIBC 2.35 max, which
loads on Pi Bookworm and Bullseye. Also trimmed two trailing spaces.

Locally verified the cross-compile: rust:latest +
gcc-arm-linux-gnueabihf + proper CARGO_HOME config.toml produces a valid ARM 32-bit
ELF (2.9 MB, armhf EABI5).

=== Issue therealaleph#15: 'Check for updates' button in the UI ===

New src/update_check.rs module. On the user's click (no polling):

  1. TCP-probes github.com:443 with a 5s budget. If unreachable, we
     return Offline(reason) instead of a confusing 'update check
     failed' — distinguishes 'you're offline' from 'GitHub API
     misbehaved'.

  2. HTTPS GET api.github.com/repos/.../releases/latest via the
     tokio + rustls stack (same hand-rolled HTTP pattern as
     domain_fronter — no new crate deps). Parses tag_name, strips
     the v-prefix, loose-semver-compares to env!(CARGO_PKG_VERSION).

  3. Returns one of four UpdateCheck variants: Offline / Error /
     UpToDate / UpdateAvailable { release_url }.
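The loose-semver compare in step 2 can be sketched as below (assumptions: dotted numeric fields, a leading 'v' stripped, non-numeric parts treated as 0 — the real is_newer() may handle more cases):

```rust
/// Loose semver compare: is `latest` strictly newer than `current`?
fn is_newer(latest: &str, current: &str) -> bool {
    fn parts(v: &str) -> Vec<u64> {
        v.trim_start_matches('v')
            .split('.')
            .map(|p| p.parse().unwrap_or(0)) // non-numeric field -> 0
            .collect()
    }
    // Vec's lexicographic ordering compares field by field.
    parts(latest) > parts(current)
}

fn main() {
    assert!(is_newer("v0.8.6", "0.8.5"));  // tag's v-prefix handled
    assert!(!is_newer("0.8.5", "0.8.5"));  // equal -> not newer
    assert!(!is_newer("v0.8.4", "0.8.5"));
    assert!(is_newer("1.0.0", "0.9.9"));
    println!("ok");
}
```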

New UI wiring (src/bin/ui.rs):
  - Cmd::CheckUpdate enqueue variant
  - UiState::last_update_check { InFlight, Done(result) }
  - 'Check for updates' button next to the CA buttons
  - Result displayed as a colored small-text line under the CA info:
    green 'up to date', amber 'update available v0.8.5 → v0.8.6' with
    a clickable release-page hyperlink, red for offline/error.

Verified end-to-end with a live github.com fetch (got a rate-limit
HTTP 403 from my IP because I've been hitting the API a lot, but
that's the expected Error() state — response classification was
correct). Three unit tests for is_newer() and a gated live test for
the full round-trip.

43 tests pass.
=== UI redesign (zero new deps, same binary size) ===

Entire App::update() rewritten around three ideas:

1. Section cards. Form rows are grouped inside rounded frames with
   faint fills and small-caps headings:
     - 'Apps Script relay'  — Deployment IDs (textarea) + Auth key
     - 'Network'            — Google IP (+inline scan button), Front
                              domain, Listen host, HTTP+SOCKS5 ports
                              on one row, SNI pool button
     - Collapsing 'Advanced' — upstream SOCKS5, parallel dispatch,
                              log level, verify SSL, show auth key.
                              Closed by default — most users never
                              touch these.

2. Clearer action hierarchy. Primary buttons are accent-filled and
   larger:
     - Start  (green filled,  ▶ glyph, 120x32)
     - Stop   (red filled,    ■ glyph, 120x32)
     - Save config (blue accent filled, path shown inline after →)
     - SNI pool (blue accent filled, inside Network section)
     - Test relay (neutral, tall)
   Secondary actions (Install CA / Check CA / Check for updates)
   moved to their own compact row below, no longer competing.

3. Status + log clarity.
   - Header version links to GitHub: the app name → repo, the
     version number → the release tag page.
   - Running/stopped status is now a pill-shaped colored chip at the
     right end of the header (green fill + green dot when running,
     red when stopped).
   - Traffic stats in a 2-column layout inside the Traffic card —
     7 metrics fit in 4 rows instead of a 7-row vertical strip.
   - One compact transient status line above the log that auto-hides
     after 10 seconds — replaces the previous stack of permanent
     ca_trusted / test_msg / update_check labels that were pushing
     the log panel off-screen.
   - Log panel now has its own bordered frame (darker fill), a
     '[x] show' checkbox that hides it entirely when off, a 'save…'
     button that writes the current log buffer to a timestamped
     log-YYYYMMDD-HHMMSS.txt in the user-data dir, and a 'clear'
     button. Empty state shows a muted placeholder instead of
     silent void.

All helper functions (section, primary_button, form_row) live at the
top of ui.rs as small local helpers — no new modules, no new
dependencies.

=== Stricter end-to-end test (test_cmd.rs) ===

Previous test passed on any HTTP 200 status regardless of body.
After a user pointed out that the test reported PASS even after
they deleted their Apps Script deployment, updated the pass criteria:

  1. Status must contain '200 OK'.
  2. Body must parse as JSON.
  3. JSON must have an 'ip' field with a valid IPv4 or IPv6.
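Criterion 3's IP check is something std's parser can do directly. A sketch — the real looks_like_ip() may be implemented differently, but std enforces exactly the listed accept/reject cases:

```rust
use std::net::IpAddr;

/// Accepts valid IPv4/IPv6; rejects empty, malformed, and
/// overflowed-octet strings — std::net's parser enforces all of that.
fn looks_like_ip(s: &str) -> bool {
    s.parse::<IpAddr>().is_ok()
}

fn main() {
    assert!(looks_like_ip("93.184.216.34"));
    assert!(looks_like_ip("2a00:1450::200e")); // IPv6
    assert!(!looks_like_ip(""));
    assert!(!looks_like_ip("999.1.1.1"));      // overflowed octet
    assert!(!looks_like_ip("<html>"));         // the deleted-deployment case
    println!("ok");
}
```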

Anything else → SUSPECT (returns false), with a specific log message
like 'HTML returned instead of JSON. The Apps Script deployment may
be deleted, not published to Anyone, or requires sign-in.'

Also now emits tracing::info!/warn!/error! alongside println!, so
the verdict + detail show up in the UI's Recent log panel instead
of disappearing to a stdout nobody sees.

One new unit test: looks_like_ip() accepts v4+v6, rejects empty,
rejects malformed, rejects overflowed octets. 44 tests total, all
green.

Verified locally end-to-end — UI launches clean, form loads config
cleanly, Start/Stop/Save all fire correctly, Test relay produces
the new PASS/SUSPECT verdict with the tracing detail visible in
the log panel, Check-for-updates hits GitHub and resolves with the
compact auto-hiding status line.
…erealaleph#16)

User @barzamini pointed out an optimization from the Python community
(originally from seramo_ir): X/Twitter GraphQL URLs look like

  https://x.com/i/api/graphql/{hash}/{op}?variables=...&features=...&fieldToggles=...

The features and fieldToggles params change across sessions and even
within a session, busting our 50 MB response cache on every request to
the same logical query. Stripping everything after 'variables=' lets
identical logical queries collapse into one cache entry, dramatically
reducing quota usage when browsing Twitter through the relay.

Implementation:
  - src/domain_fronter.rs: new normalize_x_graphql_url() helper. Matches
    exactly the Python patch's pattern (host == 'x.com', path starts
    with /i/api/graphql/, query starts with variables=). Truncates at
    the first '&' past the '?'. Applied at the top of relay() so the
    normalized URL feeds BOTH the cache key AND the request sent to
    Apps Script — so we save on Apps Script quota too, not just on
    return-trip bytes.
  - src/config.rs: new opt-in normalize_x_graphql bool (default false).
    Off by default because strict X endpoints may reject trimmed requests;
    user should flip it on and watch for regressions.
  - src/bin/ui.rs: checkbox in the Advanced section,
    'Normalize X/Twitter GraphQL URLs', with tooltip explaining the
    trade-off and crediting the source.
  - Four new unit tests in domain_fronter::tests covering: the happy
    path trim, non-x.com hosts pass through unchanged, non-graphql x.com
    paths pass through unchanged, and idempotency. 48 tests total, all
    green.
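The truncation rule described above can be sketched as a pure helper (an illustrative reconstruction from this commit message, not the crate's actual code; note it matches only the exact host `x.com`, as this commit describes):

```rust
/// Sketch: for x.com GraphQL URLs whose query starts with `variables=`,
/// drop everything from the first '&' onward so logically identical
/// queries share one cache entry.
fn normalize_x_graphql_url(url: &str) -> String {
    let prefix = "https://x.com/i/api/graphql/";
    if let Some(rest) = url.strip_prefix(prefix) {
        if let Some(q) = rest.find("?variables=") {
            let (path, query) = rest.split_at(q);
            // Truncate at the first '&' past the '?'.
            let query = match query.find('&') {
                Some(amp) => &query[..amp],
                None => query,
            };
            return format!("{prefix}{path}{query}");
        }
    }
    // Non-x.com hosts and non-graphql paths pass through unchanged.
    url.to_string()
}
```

Since a normalized URL has no '&' left in its query, applying the helper twice yields the same string, which is the idempotency property the fourth test pins down.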

Credit: idea by seramo_ir, Python patch at
https://gist.github.com/seramo/0ae9e5d30ac23a73d5eb3bd2710fcd67,
implementation request by @barzamini in issue therealaleph#16.
…therealaleph#15 follow-up)

@zula-editor reported on issue therealaleph#15 that the Check-for-updates button
was returning HTTP 403 on their ISP — classic GitHub
unauthenticated-API rate limit (60/hour per IP) on a shared NAT IP.
They also asked for the update to actually be downloadable from the
app, not just a page link.

Both addressed:

=== Route update check through our own proxy when running ===

New mhrv_rs::update_check::Route enum:
  - Direct: straight rustls to api.github.com (existing behavior)
  - Proxy { host, port }: HTTP CONNECT through our local HTTP proxy
    listener → MITM → Apps Script → api.github.com.

When the proxy is running, the UI automatically picks Proxy. From
GitHub's POV the request now comes from Apps Script's IP range (a
Google datacenter) — completely different rate-limit bucket from the
user's ISP IP, AND works even if GitHub is blocked on their network.

Routing over proxy means the MITM leaf for api.github.com has to be
trusted in the update_check's TLS config. build_root_store() now
conditionally adds our own CA cert from data_dir::ca_cert_path() to
the webpki roots when Route::Proxy is in use. Direct path is
unchanged.

=== Download button ===

The UpdateCheck::UpdateAvailable variant now carries an optional
ReleaseAsset { name, download_url, size_bytes } picked by
pick_asset_for_platform() from the GitHub API's assets[] array.
Preference list per (OS, arch):
  - macOS arm64 → mhrv-rs-macos-arm64-app.zip, else tar.gz
  - macOS amd64 → mhrv-rs-macos-amd64-app.zip, else tar.gz
  - Windows → mhrv-rs-windows-amd64.zip
  - Linux aarch64 → mhrv-rs-linux-arm64.tar.gz
  - Linux armv7 → mhrv-rs-raspbian-armhf.tar.gz
  - Linux x86_64 → mhrv-rs-linux-amd64.tar.gz
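The preference table above amounts to a match on (OS, arch). A minimal sketch, with the function name and key strings assumed (the real pick_asset_for_platform also handles the zip-else-tar.gz fallback for macOS):

```rust
// Illustrative mapping from (os, arch) to the preferred release asset
// filename; filenames are taken from the list in this commit message.
fn pick_asset_name(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("macos", "aarch64") => Some("mhrv-rs-macos-arm64-app.zip"),
        ("macos", "x86_64") => Some("mhrv-rs-macos-amd64-app.zip"),
        ("windows", _) => Some("mhrv-rs-windows-amd64.zip"),
        ("linux", "aarch64") => Some("mhrv-rs-linux-arm64.tar.gz"),
        ("linux", "armv7") => Some("mhrv-rs-raspbian-armhf.tar.gz"),
        ("linux", "x86_64") => Some("mhrv-rs-linux-amd64.tar.gz"),
        _ => None, // unknown platform: no download button, page link only
    }
}
```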

UI: when an update is available AND we have an asset, the transient
status line grows an accent-blue 'Download X.Y MB' button. Clicking
fires Cmd::DownloadUpdate, which pipes the asset through the same
Route (proxy if running, direct otherwise), writes it to
UserDirs::download_dir() (~/Downloads on most systems), and shows a
'show in folder' button that opens Finder / Explorer / xdg-open on
the containing directory.

Three new unit tests for asset-picking. The gated live test now
takes a Route argument (Direct) so it keeps working across the API
shape change. 49 tests pass.

Also refreshed in-repo releases/ archives to v0.9.1 alongside.
…sue therealaleph#18)

@Behzad9 on therealaleph#18: the OpenWRT 'No file descriptors available' errors
are back in v0.8.0+, this time logged as a wall of thousands of
identical ERRORs within seconds of activating the proxy. Two real
bugs, now fixed:

=== 1. accept() loop had no backoff ===

Previous code:
    loop {
        match listener.accept().await {
            Ok(x) => ...,
            Err(e) => { tracing::error!(...); continue; }  // tight loop
        }
    }

On EMFILE (RLIMIT_NOFILE exhausted), accept() returns synchronously,
the match re-runs instantly, accept() EMFILEs again, forever. The tight
loop ALSO starves the tokio runtime of CPU that existing connections
need to finish and close their fds — so the problem never clears on its
own. It's a self-sustaining meltdown.

New accept_backoff() helper (in proxy_server.rs) wraps both the HTTP
and SOCKS5 accept loops:
  - Detects EMFILE/ENFILE via raw_os_error (24 or 23).
  - Sleeps proportional to how long the pressure has lasted (50 ms
    first hit, ramping to a 2 s cap around hit therealaleph#40). Gives existing
    connections a chance to finish and free fds.
  - Rate-limits the log line: one WARN on the first EMFILE with fix
    instructions, then one every 100 retries. No more walls of
    identical errors.
  - Resets the counter on the next successful accept.
  - Non-EMFILE errors (ECONNABORTED from clients that went away during
    handshake, etc.) get a plain single-line error + 5 ms sleep so we
    still don't tight-loop on any unexpected error.
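Given the stated numbers (50 ms on the first hit, 2 s cap around hit 40), the delay policy is a linear ramp with a cap. A sketch of the pure part; `is_fd_exhaustion` and `backoff_delay` are assumed names, not necessarily the branch's actual API:

```rust
use std::time::Duration;

/// EMFILE (24) / ENFILE (23) on Linux: the process or system is out of
/// file descriptors, so accept() will keep failing until fds drain.
fn is_fd_exhaustion(raw_os_error: Option<i32>) -> bool {
    matches!(raw_os_error, Some(23) | Some(24))
}

/// Ramp from 50 ms toward a 2 s cap as consecutive hits accumulate,
/// giving in-flight connections CPU time to finish and free fds.
fn backoff_delay(consecutive_hits: u32) -> Duration {
    let ms = 50u64.saturating_mul(u64::from(consecutive_hits.max(1)));
    Duration::from_millis(ms.min(2_000))
}
```

With this ramp the cap is reached at hit 40 (50 ms × 40 = 2000 ms), matching the behavior described above.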

End-to-end verified: ran mhrv-rs with a deliberately low fd limit and
flooded the SOCKS5 port with 247 concurrent connections to trip EMFILE. Before:
log would have been 1000s of identical lines. After: exactly 1 warning,
listener stayed quiet, fds drained, accept resumed.

=== 2. RLIMIT_NOFILE bump was too conservative + silent ===

Previous behavior: target 16384 soft, cap to existing hard limit,
no log. On constrained systems where hard is already tiny, we'd
stay at the tiny limit silently.

rlimit.rs now:
  - Targets 65536 soft.
  - ALSO tries to raise the hard limit up to /proc/sys/fs/nr_open
    on Linux (Linux allows a non-privileged process to bump its own
    hard limit up to the kernel ceiling, usually 1048576 on modern
    kernels). On macOS/BSD we skip this — only bump soft.
  - Logs WARN on startup if soft ends up <4096 with the exact fix
    ('ulimit -n 65536' or use the procd init). No more silent
    failure.
  - Logs INFO with the before/after limits otherwise, so field bug
    reports tell us immediately whether the kernel cap is the real
    bottleneck.

Moved the rlimit call from main() pre-logging to post-init_logging so
its tracing output actually lands in the log panel + stderr. Small
reorganization only.

49 tests pass, musl x86_64 cross-compile verified locally.
therealaleph and others added 24 commits April 23, 2026 20:23
Linux / Android / mipsel build jobs now run on two self-hosted runners
on a Hetzner 8-core / 31 GB Ubuntu 24.04 box with Rust, Android SDK+NDK
r26c, all cross-compile toolchains and Docker pre-installed. macOS and
Windows still run on GitHub-hosted — we don't self-host those OSes and
the free minutes on a public repo are plenty.

Adds Swatinem/rust-cache@v2 to every cargo-using job so target/ + cargo
registry survive between runs. With warm caches the Linux jobs take
~1min each and the Android job ~3-4min; cold runs are ~9min for
Android and ~2min for everything else. Release wall time before this
change was ~13m consistently; it should now sit around 6-7m.

No new user-facing code in this release — primarily an infra change
exercised by an actual tag-push so we verify the full pipeline works
end-to-end from the new runners.
therealaleph#78)

Validate Content-Range in the range-parallel path before stitching. Malformed 206s are no longer combined into a fake 200 OK; invalid probes fall back to a normal single GET, invalid later chunks fall back to the validated probe response.
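The shape of the check can be sketched as a pure validator against the RFC 7233 `bytes <start>-<end>/<total>` form (an illustrative helper; the name and the exact set of checks in the PR are assumptions):

```rust
/// Accept a 206 chunk only if its Content-Range header names exactly
/// the byte range we requested; anything malformed is rejected.
fn content_range_matches(header: &str, want_start: u64, want_end: u64) -> bool {
    let rest = match header.strip_prefix("bytes ") {
        Some(r) => r,
        None => return false,
    };
    let (range, _total) = match rest.split_once('/') {
        Some(p) => p,
        None => return false,
    };
    match range.split_once('-') {
        Some((s, e)) => {
            s.parse::<u64>().ok() == Some(want_start)
                && e.parse::<u64>().ok() == Some(want_end)
        }
        None => false,
    }
}
```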
Reject configs that set HTTP and SOCKS5 listeners to the same port. Enforced both at config-load and in the UI form so users get a clear error before bind-time failure. Adds a focused regression test.
…roid note

- PR therealaleph#78: validate Content-Range on 206 responses in the range-parallel
  path before stitching. Prevents malformed partials from being combined
  into a fake 200 OK. Invalid probe falls back to a normal single GET;
  invalid later chunks fall back to the validated probe response
  instead of shipping truncated/wrong data.

- PR therealaleph#79: reject configs with listen_port == socks5_port at validation
  time (both config-load and UI form) instead of letting the second
  bind fail at runtime with a less clear error.

- README: add an explicit note about the Android 7+ user-CA trust
  limitation so future reporters (therealaleph#74, therealaleph#81, and the next dozen) find
  the answer in the docs instead of in a support thread. The previous
  "every app routes through the proxy" line was misleading — TUN
  captures all IP traffic but HTTPS still needs app-level trust of
  our MITM CA, which most non-browser apps don't grant.

Running through the new self-hosted CI pipeline. Warm rust-cache should
bring the full matrix in under ~7 minutes.
v1.2.4 tagged cleanly but its CI failed — parallel Linux matrix jobs
on the self-hosted runners all raced on `/var/lib/apt/lists/lock` and
failed the `sudo apt-get install` step within ~20s. v1.2.4's release
job therefore skipped and no assets were published.

Fix:

- Pre-installed every apt dependency the workflow needs on both
  self-hosted runners (eframe system libs, gcc-aarch64-linux-gnu,
  gcc-arm-linux-gnueabihf).
- Seeded per-runner cargo linker configs at
  /home/ghrunner/cargo-{01,02}/config.toml so the "echo
  [target.xxx] linker = ..." workflow step is also unnecessary.
- Gated the "Install Linux eframe system deps" and the two cross-
  compile-toolchain steps on `runner.environment == 'github-hosted'`
  so only hosted runners call apt-get; self-hosted runners skip the
  whole thing and use pre-installed tooling.

Re-tagging as v1.2.5 since v1.2.4 is an abandoned tag (git tag exists
but no GitHub Release was cut for it).

Same code changes as what v1.2.4 was meant to ship: PR therealaleph#78 range-
parallel validation, PR therealaleph#79 port-collision rejection, README note
on Android 7+ user-CA trust.
New `mhrv-rs scan-sni` subcommand: pulls Google's published IP ranges, issues PTR lookups via dns.google, filters results to Google-related hostnames, then TLS-probes each discovered SNI against the user's configured `google_ip`. Prints the SNIs that pass DPI for the user to paste into `sni_hosts`. Also expands the hardcoded FAMOUS_GOOGLE_DOMAINS list the existing scan-ips command already used.

Adds `url` crate for URL parsing in the DNS-over-HTTPS client. No other behavioural changes.
v1.2.4 and v1.2.5 both cut clean tags but CI failed downstream for
different self-hosted reasons:

- v1.2.4 failed on parallel apt-lock race (fixed)
- v1.2.5 failed with "TOML parse error at line 5 column 9" because
  rust-cache v2's default cache-bin=true prunes $CARGO_HOME/bin of
  any binary not installed via `cargo install`. `rustup` itself is
  installed by rustup-init, not cargo install, so it got flagged as
  "unknown" and deleted on cache save. Next job hits the cargo
  symlink that points at a missing rustup, which resolves somehow
  to a very old cargo that can't parse our Cargo.toml.

Fix:
- Set `cache-bin: "false"` on every Swatinem/rust-cache@v2 call.
  We still cache target/ + registry (the big win), just not bin/.
  Binaries are stable across runs on our self-hosted box anyway.
- Reinstalled rustup inside each per-runner CARGO_HOME on the server
  to recover from the broken state.

Also in this release:
- PR therealaleph#83: new `mhrv-rs scan-sni` subcommand. Pulls Google's
  published IP ranges, does PTR lookups via dns.google on each IP,
  filters to Google-related hostnames, then TLS-probes each
  discovered SNI against the configured google_ip to see which ones
  bypass DPI. Useful for rebuilding a working SNI pool on a new ISP.
  Adds the `url` crate dep.

Same user-facing code as v1.2.4/v1.2.5 (PRs therealaleph#78, therealaleph#79, README Android
note) plus PR therealaleph#83 and the CI fixes on top.
…herealaleph#92)

The googl.com shortener domain is NOT in Google's GFE certificate SAN list — verified via `openssl s_client -verify_hostname accounts.googl.com` returning hostname mismatch. Every Nth connection where the rotation landed on this entry was failing cert validation with `verify_ssl=true`. Replaced with accounts.google.com which is covered by *.google.com wildcard.
Standalone Rust/axum HTTP server + Apps Script-side CodeFull.gs for users who want to deploy a remote tunnel node. All new files; no changes to the main Rust crate. This is part 1 of 3 of the full-tunnel feature — it adds scaffolding that users can opt into once the Rust-side Mode::Full lands in therealaleph#94.
…herealaleph#92 + therealaleph#93)

- Android DEFAULT_SNI_POOL: mirror the Rust-side fix from therealaleph#92 —
  accounts.googl.com replaced by accounts.google.com. Same cert-SAN
  mismatch that was failing every Nth rotation in the Rust client
  affected the Android user's sniHosts population; both pools need
  to stay in sync by design.

- Release rolls up PR therealaleph#92 (cert fix) and PR therealaleph#93 (tunnel-node +
  CodeFull.gs scaffolding). PR therealaleph#93 adds a standalone binary under
  tunnel-node/ plus an Apps Script companion; no main-crate changes,
  so this is a zero-risk merge. Users who want to deploy a tunnel
  node can start today. The dispatch that activates `mode: full` is
  still in review in PR therealaleph#94.
dns.google replies with Transfer-Encoding: chunked; the raw payload was being handed to serde_json with chunk framing still embedded, so every PTR parse failed and scan-sni discovered nothing. The fix parses the HTTP response framing (chunked + Content-Length) before JSON decode. Includes 3 new unit tests.
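The framing fix can be illustrated with a minimal dechunker (a sketch, not the PR's actual code; real responses may also carry chunk extensions and trailers, which this ignores):

```rust
/// Strip HTTP/1.1 chunked framing (size-line CRLF data CRLF ... 0 CRLF)
/// so the reassembled body can be handed to a JSON parser.
fn dechunk(body: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut rest = body;
    loop {
        // Each chunk starts with a hex size line terminated by CRLF.
        let line_end = rest.windows(2).position(|w| w == b"\r\n")?;
        let size_str = std::str::from_utf8(&rest[..line_end]).ok()?;
        // Ignore any ";ext" chunk extension after the size.
        let size =
            usize::from_str_radix(size_str.split(';').next()?.trim(), 16).ok()?;
        rest = &rest[line_end + 2..];
        if size == 0 {
            return Some(out); // final zero-size chunk ends the body
        }
        if rest.len() < size + 2 {
            return None; // truncated chunk
        }
        out.extend_from_slice(&rest[..size]);
        rest = &rest[size + 2..]; // skip chunk data + trailing CRLF
    }
}
```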
The scan-sni DoH client to dns.google was using NoVerify — an on-path MITM could forge PTR answers and poison the discovered SNI pool. This is a public HTTPS request, not a fronted probe, so certificate validation belongs ON. Switched to the normal webpki root store.
…herealaleph#104)

filter_forwarded_headers was stripping hop-by-hop headers (Host,
Connection, Content-Length, etc.) but not identity-revealing
forwarding headers. If a user sat behind another proxy or ran a
browser extension that inserts any of:

  X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Proto,
  X-Forwarded-Port, X-Forwarded-Server, X-Forwarded-Ssl,
  Forwarded, Via, X-Real-IP, X-Client-IP, X-Originating-IP,
  True-Client-IP, CF-Connecting-IP, Fastly-Client-IP,
  X-Cluster-Client-IP, Client-IP

those would carry the client's real IP all the way through the Apps
Script relay to the origin server. They are now stripped, so the origin
only ever sees whatever source IP the Apps Script / GFE path terminates on.

This covers the Apps Script relay path (the main leak vector). The
SNI-rewrite tunnel path is a raw TLS byte bridge — it doesn't parse
HTTP at all — so any headers the client emits there pass through as
opaque bytes to the Google edge that terminates TLS. In practice
that's narrower (origin sees GFE), but the caveat is documented on the
issue thread.

Adds a focused regression test that locks in every stripped header.
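The deny-list filter can be sketched as below (illustrative; the real filter_forwarded_headers also strips hop-by-hop headers like Host and Connection, and `keep_header` is an assumed name):

```rust
// The 16 identity-revealing forwarding headers listed above,
// lowercased because header names are case-insensitive (RFC 9110).
const FORWARDING_HEADERS: &[&str] = &[
    "x-forwarded-for", "x-forwarded-host", "x-forwarded-proto",
    "x-forwarded-port", "x-forwarded-server", "x-forwarded-ssl",
    "forwarded", "via", "x-real-ip", "x-client-ip", "x-originating-ip",
    "true-client-ip", "cf-connecting-ip", "fastly-client-ip",
    "x-cluster-client-ip", "client-ip",
];

/// Returns true if the header may be forwarded to the relay.
fn keep_header(name: &str) -> bool {
    !FORWARDING_HEADERS.contains(&name.to_ascii_lowercase().as_str())
}
```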

Reported in therealaleph#104.
Ports the upstream Python `youtube_via_relay` flag (commit a0fd8a0 in
masterking32/MasterHttpRelayVPN). When enabled, YouTube-family
suffixes (youtube.com, youtu.be, youtube-nocookie.com, ytimg.com)
opt out of the SNI-rewrite tunnel and fall through to the Apps Script
relay path.

Why it helps some users: when YouTube is reached via SNI-rewrite to
google_ip with SNI=www.google.com, Google's frontend can enforce
SafeSearch / Restricted Mode based on the SNI name, causing "video
restricted" errors on some regular videos. Routing through Apps
Script bypasses that specific filter at the cost of (a) UrlFetchApp's
fixed `User-Agent: Google-Apps-Script`, and (b) counting YouTube
traffic against the script's daily quota.

Off by default so existing behaviour is unchanged. Users who hit the
SafeSearch-on-SNI issue can set `"youtube_via_relay": true` in their
config.json and observe.

Explicit `hosts` overrides always beat the toggle — that's a user
choice and should win over the default policy. Added tests for all
three branches (youtube_via_relay off, on, and with hosts override).
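The suffix test behind the toggle can be sketched as follows (an illustrative helper with an assumed name; the real routing also consults the `hosts` overrides first, per the paragraph above):

```rust
// YouTube-family suffixes that opt out of SNI-rewrite when the
// youtube_via_relay toggle is on.
const YOUTUBE_SUFFIXES: &[&str] =
    &["youtube.com", "youtu.be", "youtube-nocookie.com", "ytimg.com"];

/// True if `host` equals a YouTube suffix or is a subdomain of one.
/// Matching on label boundaries avoids false hits like "myyoutube.com".
fn is_youtube_host(host: &str) -> bool {
    YOUTUBE_SUFFIXES
        .iter()
        .any(|s| host == *s || host.ends_with(&format!(".{s}")))
}
```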

Matching Android-side UI toggle deferred — `normalize_x_graphql` is
also config-only on Android today; users can edit config.json
directly if needed.
Rollup of four merged fixes since v1.2.7:

- security: strip identity-revealing forwarding headers in the Apps
  Script relay path. Closes the XFF leak vector from issue therealaleph#104 —
  users chained behind xray/v2rayNG or running browser extensions
  that inject X-Forwarded-For / Forwarded / Via / CF-Connecting-IP
  etc. would previously have those forwarded to the origin via the
  relay. Now stripped to 16 header variants with a regression test.

- proxy: new `youtube_via_relay` config toggle (therealaleph#102). Routes
  YouTube family suffixes through Apps Script instead of the
  SNI-rewrite tunnel. Trades SafeSearch-on-SNI for Apps Script's
  fixed User-Agent + quota cost. Off by default.

- scan_sni: decode chunked dns.google DoH responses (therealaleph#97, from
  @freeinternet865). Without this, PTR lookups always failed and
  scan-sni discovered zero domains.

- scan_sni: verify dns.google TLS with webpki roots (therealaleph#98, from
  @freeinternet865). The DoH request is a normal public HTTPS call
  — an on-path MITM should not be able to forge PTR answers and
  poison the suggested SNI pool.

73 tests pass (up from 67 — three new chunked-decode tests + one
XFF-filter + two youtube_via_relay branches).
v1.2.8 tagged cleanly but CI failed compiling mhrv-rs-ui with:

  error[E0063]: missing field `youtube_via_relay` in initializer of
  `mhrv_rs::config::Config`

When I added the youtube_via_relay field to the main Config struct
in 21912cc, I missed the struct-literal construction in src/bin/ui.rs
(FormState::save_to_config) and the ConfigWire serializer.

Fixed here:

- Added youtube_via_relay field to FormState (line 214), read path
  (line 291), default path (line 316), and the save path (line 451)
- Added youtube_via_relay field to ConfigWire (line 493) with
  skip_serializing_if on false, plus its From impl (line 544)

UI still doesn't expose a checkbox for the toggle — it's config-only
for now, same treatment as normalize_x_graphql. A future PR can add
the checkbox to the Advanced pane.

v1.2.8 tag exists but has no GitHub Release (release job skipped
on failure); v1.2.9 is the clean cut. Same payload as v1.2.8 plus
this fix.
…hem (therealaleph#99)

Before: `ProxyServer::run()` aborted only the two accept tasks on
shutdown (`http_task`, `socks_task`), but every per-client task was
spawned as a bare `tokio::spawn(...)` whose JoinHandle was discarded.
Aborting the accept loop stopped taking new connections, but in-flight
clients kept running on the runtime with their captured (stale)
`Arc<DomainFronter>`.

User-visible symptoms reported by @r-safavi in therealaleph#99:

1. Hitting Stop in the UI didn't actually stop serving: Firefox still
   reached x.com through the proxy even though the user expected a
   "connection refused."
2. Starting again with a changed auth_key worked for NEW domains
   (yahoo.com) but not for domains with a live keep-alive (x.com) —
   because the old child task was still using the old fronter with the
   old key.
3. Apps Script quota could be consumed after the user thought they'd
   stopped. Arguably the worst of the three.

Fix: wrap per-client spawns in a `tokio::task::JoinSet<()>` scoped
inside each accept task. When the accept task is aborted on shutdown,
the JoinSet is dropped, and `JoinSet::drop` aborts every still-running
child — closing their sockets and dropping their Arc clones of the
fronter, which in turn drops the pool.

Also added an opportunistic `try_join_next()` drain before each
accept() so the JoinSet doesn't grow unbounded with completed-task
handles on long-running proxies.

Covers Finding 2 of therealaleph#99. Finding 1 (quota-exceeded → timeout instead
of surfacing Apps Script's 502) is a separate pool-staleness issue and
stays open for now.
Single-focus release. The Stop button in the UI previously only
stopped new connections from being accepted — in-flight clients kept
running on the old DomainFronter, which meant:

- Pages kept loading after Stop (users thought they'd stopped)
- Auth-key changes didn't take effect for domains with a live
  keep-alive to the proxy
- Apps Script quota could still be consumed post-Stop

Fix (c8f4a0c): wrap per-client spawns in a tokio::task::JoinSet
inside each accept loop. On shutdown, aborting the accept task drops
the JoinSet, which aborts every in-flight client. Sockets close,
the old fronter's TLS pool drops, and a subsequent Start builds a
clean new state.

Finding 1 of therealaleph#99 (quota-exceeded → "timeout" instead of the real
502 body) is a separate pool-staleness issue and is NOT addressed
in this release.
…alaleph#64)

The x.com GraphQL URL-length fix added in v1.2.1 (cbe06b5) only
matched exact host "x.com". But browsers actually navigate to
www.x.com, and api.x.com serves GraphQL endpoints too — the original
fix never fired for real traffic.

@pourya-p's log in therealaleph#64 made this unambiguous:

  relay GET https://www.x.com/i/api/graphql/<hash>/HomeTimeline?variables=...&features=...
  ...
  ERROR Relay failed: relay error: Exception: بیش از حد مجاز: طول نشانی وب URLFetch.

(That Persian text is Apps Script's "URLFetch URL length exceeded"
error, which is exactly what the truncation was supposed to prevent.)

Widened the host matcher to `host == "x.com" || host ends with
".x.com"` so www.x.com / api.x.com / any future x.com subdomain all
hit the rewrite. The path-pattern constraint
(`/i/api/graphql/... ?variables=`) already filters to the right
endpoints.

73 tests still pass.
…ph#64)

Single-bug release. Unblocks x.com browsing for users whose browsers
resolve to www.x.com rather than bare x.com — i.e. essentially
everyone using Firefox / Chrome / Safari.

Previous releases still advertised the URL-truncation fix as working
but it only matched exact Host: x.com, which never happens in real
traffic. v1.2.11 widens the matcher to x.com + *.x.com so www.x.com,
api.x.com, and any future x.com subdomain all get the shortened URL
through Apps Script's URL length cap.
Adds a new `mode: full` that tunnels ALL traffic end-to-end through Apps Script → a remote tunnel node. Browser does TLS directly with the destination. No MITM, no CA installation needed on the client device.

Ships as part of the 3-PR series: therealaleph#93 (tunnel-node service + CodeFull.gs, merged) + this (Rust-side Mode::Full + batch tunnel client) + therealaleph#95 (Android UI dropdown, now rolled into this PR post-rebase).

### Architecture
- Client → mhrv-rs → script.google.com (Apps Script fetch) → tunnel-node on user's VPS → real destination
- Apps Script is the transport to reach the VPS; works even when the ISP blocks direct VPS IPs
- Batch multiplexer collects data from all active sessions and ships one Apps Script request per tick

### Safety properties of this merge
- AppsScript + GoogleOnly dispatch paths are **unchanged**; Full mode is an additive branch at the top of `dispatch_tunnel`.
- `tunnel_client.rs` is a new isolated module (387 LOC).
- `tunnel_request()` is a new method on `DomainFronter`, no change to `relay()` / `relay_parallel_range()`.
- Config: additive `Mode::Full` variant + validation tests (2 new); existing validation rules untouched.
- Local build: clean compile. `cargo test --quiet`: 75 passed (73 → 75 with 2 new config tests).

### Closes
Unblocks the feature requested in therealaleph#61, therealaleph#69, therealaleph#100, therealaleph#105, therealaleph#110, therealaleph#111, therealaleph#113, therealaleph#116.

### Testing
vahidlazio has iterated on prior review feedback. End-to-end testing with a real tunnel-node deployment will follow post-merge from @Feiabyte (volunteered in therealaleph#61). Post-merge CI will exercise compile + full test matrix across all targets; any regression caught there gets a fast-follow fix.
Rollup of PR therealaleph#94 — Mode::Full dispatch + batch tunnel client. Ships
the long-awaited no-MITM path that was the motivating fix for half
the open issues this week.

User-facing: add `"mode": "full"` to config.json, deploy CodeFull.gs
as a second Apps Script alongside your existing one, deploy
tunnel-node (tunnel-node/README.md) on a VPS, and traffic is tunneled
end-to-end: client → mhrv-rs → script.google.com → your tunnel node →
destination. Browser speaks TLS directly with the destination; we
never see plaintext. No CA needed on the client device.

Android side gets a "Full tunnel (no cert)" dropdown option; toggling
it writes `"mode": "full"` to config.json.

Safety: Mode::AppsScript and Mode::GoogleOnly dispatch paths are
unchanged — Full mode is an additive branch at the top of
dispatch_tunnel. Existing users on the default apps_script mode see
zero behaviour change.

Testing status: compiles clean on all 10 CI targets; 75 tests pass
(+2 new config-validation tests for Full mode); end-to-end real-VPS
testing will come post-release from @Feiabyte and others who opt in.
Any Full-mode regression gets a fast-follow fix.
@therealaleph
Owner

Reviewed the diff and ran locally on macOS host — cargo check clean, cargo test --quiet = 101/101 passes (up from 75 on main, so ~26 net new). The code itself is well-structured and the quality signals (bilingual docs update, changelog entry, 29-test test plan, candid "only Windows tested" disclosure in the PR body) are genuinely impressive for a first contribution here.

That said, I can't auto-merge this one for two concrete reasons:

1. macOS + Linux E2E paths are self-declared as untested. You called this out in the PR body, which I appreciate. But security delete-certificate on macOS keychains, update-ca-certificates / trust extract-compat on Linux distros, and the certutil NSS operations against Firefox profiles are the kind of code paths where a bug means a reviewer ends up with an orphan root CA they have to manually clean with Keychain Access / /etc/ssl/certs/. I don't want to ship an untested cert-removal flow to users who might not be comfortable with manual keychain cleanup when it fails.

2. You're a new contributor to this repo (first PR). Combined with 1377 additions in a security-adjacent module, the conservative thing is to leave this open for explicit maintainer sign-off rather than auto-merging on my "cargo tests pass" alone. That's a rule rather than a reflection on the code quality.

What would unblock merge:

Ideally three smoke tests from three separate reviewers (one macOS, one Debian/Ubuntu-family Linux, one Fedora/RHEL-family Linux) walking through your ## Test plan checklist. If that's too high a bar, at minimum one macOS reviewer and one Linux reviewer.

I'll flag this in the repo for maintainer attention. If nobody steps up within a few days, I can run the macOS path myself on a disposable VM rather than my host machine — but that's slower than someone who can do it on their actual device.

Two small review notes on the diff itself (not blockers):

  1. reconcile_sudo_environment() re-rooting HOME via SUDO_USER is the right approach, but I'd want a test that specifically covers the "no SUDO_USER set but euid==0" case (real root login, not sudo). The current behavior should be "don't re-root, leave HOME as /root" and that's not explicitly asserted in the suite as far as I read.

  2. The pre-marker enterprise_roots cosmetic orphan note in your README upgrade section is helpful — thanks for surfacing that. Might be worth an actual warning print in remove_ca when it detects a pre-marker line (just so the user knows their Firefox user.js has an orphan pref, not that anything's broken).

Thanks again for the depth — this is the kind of PR I'd love to see more of.


[reply via Anthropic Claude | reviewed by @therealaleph]

@dazzling-no-more
Contributor Author

Thanks for the careful review; the detailed context on the merge policy is appreciated, and the code-quality compliment means a lot.

Quick note on the first-contributor point: this is actually not my first PR to this repo. I also contributed the google_only bootstrap mode a while back (the direct SNI-rewrite path that lets users reach script.google.com to deploy Code.gs before they have an Apps Script relay).

On why this feature matters: the MITM CA this app installs has its private key on the user's disk, and the OS trusts it for every HTTPS site. That's a non-trivial capability to leave lying around: if the key is ever exposed (lost laptop, leaked backup, a machine sold or handed down, etc.), anyone holding it can mint certificates the browser silently accepts as legitimate for any domain. And on the OS side, a stale trusted root is effectively a standing MITM authorization until someone notices it in the cert store. Without a clean-slate uninstall path, users who try mhrv-rs and move on, or who switch to Full Tunnel Mode and no longer need a local MITM, end up with that capability sitting on disk and in their trust store indefinitely. So I think a one-command clean removal is worth some review rigor, which I fully agree with.

On the two small notes, I've pushed a follow-up commit addressing both:

  1. reconcile_sudo_home: the decision logic is now extracted into a pure should_reconcile_for(euid, sudo_user) helper with four branch tests covering every case in the matrix, including the one you called out explicitly: euid == 0 && SUDO_USER unset (real root login, not sudo). That branch now has an explicit assert_eq!(should_reconcile_for(0, None), None) so the "leave HOME as /root" invariant is pinned down.

  2. Pre-marker orphan warning: disable_firefox_enterprise_roots now logs an info-level hint when a profile's user.js has a bare security.enterprise_roots.enabled = true without our marker above it. The log line explains it's cosmetic (Firefox falls back to its built-in root store once the CA leaves the OS trust store) and suggests manual removal if the user wants a clean file. New has_bare_enterprise_roots pure helper with four tests.
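For reference, the decision matrix in point 1 can be sketched as follows (a hypothetical reconstruction; the helper's actual return type in the branch may differ):

```rust
/// Sketch of should_reconcile_for: return the user to re-root HOME to,
/// or None when HOME should be left alone.
fn should_reconcile_for(euid: u32, sudo_user: Option<&str>) -> Option<String> {
    match (euid, sudo_user) {
        // Running under sudo: re-root HOME to the invoking user.
        (0, Some(user)) if !user.is_empty() => Some(user.to_string()),
        // Real root login (euid 0, no SUDO_USER): leave HOME as /root.
        (0, None) => None,
        // Not root: nothing to reconcile.
        _ => None,
    }
}
```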

Test count: 109/109 passing.

On the platform-coverage bar: completely reasonable, and no pressure on the VM offer; take it when it's convenient. If any macOS or Linux reviewer sees this thread and wants to walk the ## Test plan checklist (the Install → Check → Remove → Check round-trip, the sudo HOME re-rooting verification, and the Linux refresh-failure retry case are the ones I'd most want another pair of eyes on), that would unblock merge faster than you carving out VM time.

Thanks again.

@therealaleph
Owner

Heads-up: we just rewrote git history on main for a privacy-related cleanup (a few old commits had a contributor's real email/name leaked via git config user.email on their local machine; rewritten to the canonical noreply form). Force-push to main + all version tags landed a few minutes ago.

This PR's branch is based on the pre-rewrite SHAs, so you'll need to rebase before it can merge cleanly. Easiest path:

git fetch origin
git checkout feature/delete_certificate
git rebase origin/main
# resolve any conflicts (your changes don't touch any rewritten files,
# so this should be conflict-free — just SHA pointers updating)
git push --force-with-lease

If --force-with-lease complains, your local clone is out of sync because every SHA changed; git fetch origin && git reset --hard <your-fork>/feature/delete_certificate first to re-anchor, then rebase.

Functionally nothing about your PR has changed and the review is still where we left it — still awaiting macOS + Linux smoke tests from reviewers. Sorry for the disruption.


[reply via Anthropic Claude | reviewed by @therealaleph]

@therealaleph
Owner

Hey @dazzling-no-more — thanks for the cert-removal work. The feature itself looks solid (the RemovalOutcome::{Clean, NssIncomplete}, the reconcile_sudo_environment() for sudo-aware HOME re-rooting, and the marker-gated Firefox enterprise_roots pref are all the right shapes), but the branch can't merge as-is.

The diff is showing +16,403 / -1,210 across 95 files because the fork point predates a lot of work that's already in main:

  • android/ directory in full (merged via earlier Android PRs)
  • tunnel-node/ workspace (merged in v1.5.0)
  • src/android_jni.rs, src/tunnel_client.rs, src/update_check.rs (all already in main)
  • docs/changelog/v1.1.0.md through v1.2.13.md (all already in main)
  • assets/apps_script/Code.gs, CodeFull.gs (already in main)
  • pre-built artifacts under releases/

So GitHub is showing this as adding all of that again, not because you wrote it twice but because git thinks your branch lacks those commits.

Could you do a clean rebase onto current main and force-push? Concretely:

git fetch origin
git rebase origin/main
# resolve conflicts — most should auto-resolve since you'd be re-applying just your cert-removal commits on top
git push --force-with-lease

If the rebase gets ugly because the cert-installer conflicts are nontrivial, the cleanest path is probably:

git checkout main && git pull
git checkout -b fix/remove-cert-rebased
# cherry-pick just your cert-installer + main.rs + ui.rs + README cert commits onto fresh main

Once the diff is just the --remove-cert feature (probably ~1500 LOC across src/cert_installer.rs, src/main.rs, src/bin/ui.rs, README.md), happy to actually review the trust-store removal logic per platform. The unit-test coverage you described (29 new tests for the pure logic) is a great start.

The Windows smoke test is enough to start — I'll do macOS, and we can ask for a Linux pair from someone with a Debian + RHEL box once the diff is reviewable.


[reply via Anthropic Claude | reviewed by @therealaleph]
