feat(cert): add --remove-cert flag and Remove CA button for clean-slate revocation (#121)
dazzling-no-more wants to merge 90 commits into therealaleph:main
Conversation
…utdown-ui Thanks @v4g4b0nd-0x76 — proper listener teardown on Stop is exactly what was needed. The 2-second grace window + force-abort fallback is a clean pattern.
Two user-reported issues.
=== GLIBC too new (reported via Twitter) ===
Our linux-amd64 and linux-arm64 gnu builds were compiled on
ubuntu-latest (24.04, GLIBC 2.39), which means the resulting binaries
refuse to load on anything older:
./mhrv-rs: /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.39'
not found (required by ./mhrv-rs)
Users on Ubuntu 22.04 / Mint 21 (GLIBC 2.35) — the typical user in Iran
where this project's target audience lives, and where they can't
dist-upgrade because they're behind exactly the kind of network
restriction this tool exists to bypass — could not run the gnu builds
at all.
Fix: pin the linux-gnu matrix entries to ubuntu-22.04 runners. GLIBC
2.35 is now the minimum; binaries load on Ubuntu 22.04, Mint 21,
Debian 12, Fedora 36+, RHEL 9+ and everything newer.
Users on older distros (Ubuntu 20.04, CentOS 7) can still use the
static musl builds (mhrv-rs-linux-musl-amd64.tar.gz et al.) which
have no GLIBC dependency at all.
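To see in advance which distros a build will load on, you can inspect the versioned symbols the binary actually requires. A small sketch with standard tools (the binary name is an example):

```shell
# Print the highest GLIBC version a binary requires; it will load on any
# distro whose glibc is at least that version.
max_glibc() {
    grep -o 'GLIBC_[0-9.]*' | sort -V | tail -n1
}

# Example usage (assumes binutils is installed):
#   objdump -T ./mhrv-rs | max_glibc
```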
=== Short-screen laptops — main window content clipped (PR therealaleph#6) ===
Co-authored fix from @v4g4b0nd-0x76 in PR therealaleph#6 (manually applied to
avoid pulling in 400 lines of unrelated cargo-fmt churn):
- Wrap the CentralPanel body in ScrollArea::vertical()
.auto_shrink([false; 2]) so everything stays reachable on short
screens.
- Lower the min_inner_size from [420, 540] to [420, 400] so laptops
with ~13" screens at default scaling can shrink the window without
clipping UI elements.
Closes therealaleph#6.
Co-authored-by: v4g4b0nd-0x76 <v4g4b0nd-0x76@users.noreply.github.com>
Verified: the linux-amd64 binary's highest GLIBC symbol is now 2.34 (was 2.39 in v0.7.0 and earlier), so it runs on Ubuntu 22.04 / Mint 21 / Debian 12 and anything newer.
Two user complaints:
- English words mixed inline in the Persian section were breaking the RTL text flow, making paragraphs hard to read.
- Language was too technical for non-developer users.
Fixes:
1. Every English / technical term is now wrapped in backticks (`Apps Script`, `MITM`, `SOCKS5`, `Deployment ID`, …). GitHub renders these as monospace LTR islands, which the browser's bidirectional text algorithm treats as embedded strong-LTR runs and doesn't let them flip the surrounding RTL paragraph direction.
2. Rewrote most paragraphs as shorter, plainer Persian sentences. Replaced jargon (run-time, on-the-fly, rewrite, trust store, …) with everyday wording.
3. Converted dense prose into tables where it helped (download table by OS, config fields table, per-OS CA install table).
4. Added a 5-step walkthrough (script deploy → download → first run → config in UI → browser setup) that a non-technical user can follow top-to-bottom.
5. New 'How do I know it's working?' quick verification section.
6. New big FAQ at the bottom — covers the questions that actually come up: certificate install safety, how to remove the cert, how many Deployment IDs to use, YouTube / ChatGPT caveats, the GLIBC 2.39 issue, and CLI usage for power users.
7. Telegram pairing section reworded — explains the WHY first (Apps Script can't speak MTProto), then the one-line fix.
8. SNI pool editor flow written as numbered steps mirroring the actual UI buttons the user clicks.
English section unchanged.
…ation Thanks @v4g4b0nd-0x76 for the feature. Two small fixes folded in on the merge so master still builds and doesn't hit sharp edges:
- src/scan_ips.rs: rand::thread_rng() held across an .await tripped the Send bound on the async fn (ThreadRng isn't Send). Scoped the rng in a block so it drops before subsequent awaits.
- src/scan_ips.rs: guard /0 and /32 CIDRs in cidr_to_ips and ip_in_cidr against the 1u32 << 32 shift panic (debug mode). goog.json is unlikely to contain either, but defensive.
Behavior unchanged otherwise:
- fetch_ips_from_api=false (default): identical to the previous static scan-ips behavior.
- fetch_ips_from_api=true: fetches goog.json from www.gstatic.com, resolves famous Google domain IPs, prioritises matching CIDRs, samples up to max_ips_to_scan candidates, validates with gws / x-google- / alt-svc headers. If the fetch fails, falls back to the static list cleanly — verified locally.
Closes therealaleph#10.
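The Send-bound fix is the standard pattern for any !Send value (like ThreadRng) held across an .await. A minimal std-only sketch, with Rc standing in for ThreadRng and all names illustrative:

```rust
use std::rc::Rc;

// Rc is !Send, like rand::ThreadRng. If it lived across the .await, the
// returned future would be !Send and fail a tokio::spawn Send bound.
async fn pick_candidate(candidates: &[u32]) -> Option<u32> {
    let choice = {
        let rng = Rc::new(7usize); // scoped stand-in for thread_rng()
        candidates.get(*rng % candidates.len().max(1)).copied()
    }; // rng dropped here, before any await point
    async {}.await; // later awaits no longer capture the !Send value
    choice
}

// Compile-time proof that the future is Send: this only type-checks
// because the Rc's scope ends before the await.
fn require_send<T: Send>(v: T) -> T {
    v
}
```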
…therealaleph#8), Windows UI diagnostics (therealaleph#7) Three user-reported fixes / features in one release.
=== PR therealaleph#9 — dynamic Google IP discovery (@v4g4b0nd-0x76) ===
Already merged in the previous commit. Opt-in via 'fetch_ips_from_api' in config. Pulls goog.json from www.gstatic.com, maps it against resolved IPs of well-known Google domains, samples from matching CIDRs, and validates each candidate with gws / x-google / alt-svc response-header checks. Graceful fallback to the static list if the fetch fails or nothing passes validation. Default is off so existing users are unaffected. Closes therealaleph#10.
=== Issue therealaleph#8 — OpenWRT: 'accept: No file descriptors available' ===
OpenWRT routers ship a very low RLIMIT_NOFILE (often 1024, sometimes 256 on constrained devices). A browser's burst of ~30 parallel sub-resource requests can fill the limit within seconds, after which accept(2) returns EMFILE and the proxy is effectively dead.
Two-fold fix:
1. New assets/openwrt/mhrv-rs.init now sets procd limits nofile="16384 16384" on the service. procd raises the per-process fd limit before the binary even starts.
2. New src/rlimit.rs best-effort-raises RLIMIT_NOFILE in the binary itself (Unix only, no new runtime deps — libc is already transitively present via tokio). Targets 16384 soft, capped to whatever hard limit the kernel already allows the user (so it doesn't need root).
Both layers mean the fix applies whether the user runs via /etc/init.d/mhrv-rs start (procd limits kick in), ./mhrv-rs --config ... (in-binary bump kicks in), or any other invocation path. Closes therealaleph#8.
=== Issue therealaleph#7 — Windows UI crashes silently ===
User report: on Win 11, run.bat prints 'Starting mhrv-rs UI...' and exits cleanly, but no UI window ever appears.
Root cause: the old run.bat used 'start "" "mhrv-rs-ui.exe"', which returns immediately — if the UI binary dies at launch time (missing GPU driver, RDP without GL accel, AV blocking, …), the crash is invisible because start already disowned the child.
Fix: run the UI in-place (not via 'start'), so its stderr and exit code land in the run.bat cmd window. On non-zero exit, print a helpful checklist of common Windows launch failures and pause so the user can screenshot the output for an issue report. This doesn't fix the underlying crash for affected users, but it turns a ghost-crash bug into a self-diagnosing one so the next report includes actionable info. Closes-via-diag therealaleph#7.
=== Fixes folded into the PR therealaleph#9 merge ===
- src/scan_ips.rs: rand::thread_rng() held across an .await tripped the Send bound on the async fn. Scoped the rng in a block so it drops before the subsequent awaits.
- src/scan_ips.rs: defend /0 and /32 CIDRs in cidr_to_ips and ip_in_cidr against the 1u32 << 32 shift panic.
All 36 unit tests pass.
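The CIDR shift guard can be sketched std-only. Function names mirror the intent of cidr_to_ips / ip_in_cidr; the exact code in src/scan_ips.rs may differ:

```rust
/// Host count of an IPv4 CIDR. Computing `1u32 << (32 - prefix)` panics in
/// debug builds for a /0 (shift by 32); widening to u64 avoids it.
fn cidr_host_count(prefix_len: u32) -> u64 {
    debug_assert!(prefix_len <= 32);
    1u64 << (32 - prefix_len.min(32))
}

/// Network mask for a prefix. The /0 case would shift u32::MAX by 32 —
/// the same panic — so it gets an explicit branch.
fn cidr_mask(prefix_len: u32) -> u32 {
    match prefix_len {
        0 => 0,
        n if n >= 32 => u32::MAX,
        n => u32::MAX << (32 - n),
    }
}

/// Membership test using the guarded mask (/0 matches every address).
fn ip_in_cidr(ip: u32, net: u32, prefix_len: u32) -> bool {
    let m = cidr_mask(prefix_len);
    ip & m == net & m
}
```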
Placing a new [target.'cfg(unix)'.dependencies] table header above the eframe line silently scoped eframe into the unix-only target table, so Windows builds lost the dependency entirely:
error[E0432]: unresolved import `eframe`
use of unresolved module or unlinked crate `eframe`
(Builds were green on Mac/Linux because those hit cfg(unix) == true. Windows was the only casualty.)
Moved the [target.'cfg(unix)'.dependencies] block to the end of Cargo.toml, after the optional eframe line, so the main [dependencies] table stays intact for all targets. Added a comment so this foot-gun can't return.
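The TOML rule at fault: a key belongs to the most recently opened table header above it. A minimal illustration (version numbers are placeholders, not the project's actual ones):

```toml
[dependencies]
tokio = "1"            # fine: belongs to [dependencies]

[target.'cfg(unix)'.dependencies]
libc = "0.2"
# Any line added below this header — even one intended for [dependencies] —
# is unix-only. This is how eframe silently vanished on Windows.
eframe = { version = "0.27", optional = true }
```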
…orks (closes therealaleph#11) Multiple users reported the same thing (issue therealaleph#11): they trusted the CA, then re-installed it, then deleted and re-generated it, and still every HTTPS site through the proxy failed in the browser. The Python version of the same project doesn't have the issue.
Root cause: rcgen's CertificateParams::default() produces a minimum-viable X.509 cert that does NOT carry:
- an ExtendedKeyUsage extension with id-kp-serverAuth
- a KeyUsage extension with digitalSignature + keyEncipherment
Modern Chrome / Firefox / Edge / Safari all reject TLS server leaves without those. The CA trust bit didn't matter — the browser's chain validator rejected the leaf itself with NET::ERR_CERT_INVALID before ever consulting the trust store. So 'reinstall the CA' was powerless to help.
Fix in src/mitm.rs::issue_leaf:
- Set params.extended_key_usages = [ServerAuth].
- Set params.key_usages = [DigitalSignature, KeyEncipherment].
- Backdate not_before by 5 min to absorb clock skew between the MITM process and a slightly-fast client clock. Same fix in the CA's own not_before.
Also added src/mitm.rs::tests::leaf_has_serverauth_eku_and_key_usage as a permanent regression guard — it parses the DER with x509-parser and asserts the three extensions are present. Added x509-parser to dev-dependencies (already in the tree transitively via rcgen).
Upgrade path for users affected by therealaleph#11: download v0.8.1 and run it. No CA reinstall required — the CA cert itself was fine; only the per-site leaves were broken.
Verified end-to-end locally:
curl --cacert <ca.crt> -x http://127.0.0.1:... https://httpbin.org/ip
curl --cacert <ca.crt> -x socks5h://127.0.0.1:... https://httpbin.org/ip
Both return JSON without cert errors, through the Apps Script relay path. 37 unit tests pass.
…llow-up to therealaleph#11) After v0.8.1 fixed the leaf cert extensions, users reported "still broken" — specifically Firefox showing:
"Software is Preventing Firefox From Safely Connecting to This Site. drive.google.com ... This issue is caused by MasterHttpRelayVPN"
for HSTS-preloaded sites. That error is Firefox's "MITM detected AND issuing CA isn't in my trust store" path combined with HSTS blocking the normal override button — so users were stuck with no workaround.
Real root cause of the "still broken" reports: the CA was making it into the OS trust store (Windows cert store / update-ca-certificates on Linux) but NOT into the browser-specific trust stores that Firefox and Chrome use on every OS.
Three additions:
1. Firefox: for every Firefox profile we find, we now write this pref to the profile's user.js. It tells Firefox to trust the OS CA store, so our already-successful system-level install automatically covers Firefox on next startup. Critical on Windows (NSS certutil isn't on PATH there, so the certutil-based Firefox install never worked). Idempotent — checks for an existing pref before writing and leaves a non-matching user value alone.
2. Chrome/Chromium on Linux: install into ~/.pki/nssdb. Linux Chrome uses its own shared NSS DB, independent of both the OS store (populated by update-ca-certificates) AND Firefox's per-profile NSS. Without this, users installed the CA via run.sh, Chrome still refused every HTTPS site, and they spiraled trying to re-install the CA. We now also initialize that DB if it doesn't exist yet.
3. Refactored the NSS-install path so Firefox and Chrome share a single install_nss_in_dir() helper. Renamed the top-level entry from install_firefox_nss to install_nss_stores to match scope.
Locally verified the cert itself is fine — openssl x509 -text shows Version 3, SAN, KeyUsage (critical), ExtendedKeyUsage, and passes. So the leaf is correct; what was failing was the trust-chain validation inside the specific browser, because our CA wasn't in THAT browser's trust DB.
Upgrade path: download v0.8.2 and run the launcher or `./mhrv-rs --install-cert`. Restart Firefox/Chrome after install — Firefox needs the restart to re-read user.js.
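The commit doesn't quote the pref itself. The documented Firefox switch for trusting the OS root store is `security.enterprise_roots.enabled`, which is presumably what gets written here — labeled as an assumption, not confirmed by the commit:

```javascript
// user.js — ask Firefox to also trust CAs from the OS certificate store.
// Assumed to be the pref this commit writes; the pref name itself is the
// documented Firefox setting for OS-store trust.
user_pref("security.enterprise_roots.enabled", true);
```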
…isibility (issue therealaleph#12) Two reported issues:
1. Log level in the form had no visible effect — trace produced the same panel output as warn.
2. upstream_socks5 was reported as never being attempted.
(1) was because the UI binary never installed a tracing subscriber. Every tracing::info!/debug!/trace! from the proxy was discarded; only the handful of manual push_log() calls for start/stop/test reached the 'Recent log' panel. Swapping the log level in the combo-box just rewrote the config field — nothing consumed it.
Fix: install_ui_tracing() at startup registers a tracing_subscriber fmt layer with a custom MakeWriter that mirrors each formatted event line into shared.state.log. Respects RUST_LOG, defaults to 'info' with hyper pinned to warn so the panel isn't swamped by low-level HTTP chatter. Now the log level switch actually filters panel output, and routing decisions show up live.
(2) is a documentation / visibility issue more than a bug. Our upstream_socks5 routing is intentionally scoped to raw-TCP traffic (non-HTTP, non-TLS) — HTTPS goes through the Apps Script relay, which is the whole reason mhrv-rs exists. But without working logs, it looks like upstream_socks5 is dead code.
Fix: every branch of dispatch_tunnel now emits a tracing::info! that says exactly which path the connection took and, where applicable, whether upstream_socks5 was used:
dispatch api.telegram.org:443 -> raw-tcp (127.0.0.1:50529)
dispatch www.google.com:443 -> sni-rewrite tunnel (Google edge direct)
dispatch httpbin.org:443 -> MITM + Apps Script relay (TLS detected)
Combined with (1), users can now see in real time whether their traffic is hitting upstream_socks5. If it says 'raw-tcp (direct)' after they set the field, that's evidence of a real bug; if it never reaches the raw-tcp branch at all, that's the documented design (HTTPS → Apps Script).
Also per user request, updated README:
- Shields.io badges up top: latest release, total downloads, CI status, license, stars.
- Short 'Heads up on authorship' note crediting Anthropic's Claude for the bulk of the Rust port (with the human-on-every-commit caveat). English and Persian mirrors both have it.
All 37 unit tests pass.
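The writer side of the MakeWriter mirroring described above can be sketched std-only: a writer that buffers bytes from the fmt layer and pushes each completed line into the shared log the panel renders. PanelWriter and its field names are illustrative stand-ins, not the project's actual types:

```rust
use std::io::Write;
use std::sync::{Arc, Mutex};

/// Illustrative stand-in for the writer handed to tracing_subscriber's fmt
/// layer via MakeWriter: each full line the formatter emits is mirrored
/// into a shared Vec<String> that the UI's 'Recent log' panel displays.
struct PanelWriter {
    log: Arc<Mutex<Vec<String>>>,
    buf: Vec<u8>,
}

impl PanelWriter {
    fn new(log: Arc<Mutex<Vec<String>>>) -> Self {
        PanelWriter { log, buf: Vec::new() }
    }
}

impl Write for PanelWriter {
    fn write(&mut self, data: &[u8]) -> std::io::Result<usize> {
        self.buf.extend_from_slice(data);
        // Mirror every completed line; keep any partial line buffered.
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..pos]).into_owned();
            self.log.lock().unwrap().push(text);
        }
        Ok(data.len())
    }
    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}
```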
…t-on-reopen bug)
A user reported that after Save-config, closing the UI, and reopening,
every form field was blank — but the config.json on disk still had all
the right values.
The culprit in the UI was load_form()'s swallow-errors pattern:
let existing = if path.exists() {
Config::load(&path).ok() // .ok() threw away the error
} else { ... };
if let Some(c) = existing { /* populate form */ } else { /* defaults */ }
When Config::load returned an Err, .ok() silently converted to None,
the form went back to defaults, and the user had no signal at all
that the load had failed or WHY. On every platform I could test
(macOS / Linux) the round-trip works fine — verified with a real
round-trip test added in config.rs
(config::rt_tests::round_trip_all_current_fields and
round_trip_minimal_fields_only — both green). So whatever's
failing for this specific reporter is environment-specific (weird
filesystem encoding, partial write, different field shape from an
older version, … TBD). Without visibility we can't diagnose it.
Changes:
1. load_form() now returns (FormState, Option<String>). The String
is a user-facing error message (with the full path + the
underlying parse/validate reason) when Config::load fails on an
existing file.
2. main() plumbs that error into App's initial toast, which sticks
for 30 seconds (vs the normal 5 for regular toasts) so users who
only open the UI briefly still see it.
3. Added tracing::info! in load_form for the success path too —
the Recent log panel now always shows either 'config: loaded OK
from <path>' or 'Config at <path> failed to load: <reason>' on
startup, regardless of toast timing.
4. Added two regression-guard tests in config.rs covering the
full-fields and minimal-fields save shapes the UI emits.
Next time a user reports this: they'll have the exact error in the
toast + the Recent log panel, and we can fix the actual bug instead
of shooting blind.
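The shape of the fix can be sketched with stand-in types (FormState and Config here are minimal stand-ins, not the real structs):

```rust
use std::path::Path;

#[derive(Default, PartialEq, Debug)]
struct FormState {
    deployment_ids: String,
}

struct Config {
    deployment_ids: String,
}

impl Config {
    /// Stand-in for the real Config::load: succeeds only on a JSON-ish
    /// payload so the sketch can exercise both paths.
    fn load(raw: &str) -> Result<Config, String> {
        if raw.trim_start().starts_with('{') {
            Ok(Config { deployment_ids: raw.to_string() })
        } else {
            Err("expected a JSON object".to_string())
        }
    }
}

/// New contract: on a load failure the caller gets defaults PLUS a
/// user-facing message carrying the path and the underlying reason,
/// instead of `.ok()` silently discarding the error.
fn load_form(path: &Path, raw: Option<&str>) -> (FormState, Option<String>) {
    match raw {
        None => (FormState::default(), None), // no file yet: defaults, no toast
        Some(contents) => match Config::load(contents) {
            Ok(c) => (FormState { deployment_ids: c.deployment_ids }, None),
            Err(e) => (
                FormState::default(),
                Some(format!("Config at {} failed to load: {}", path.display(), e)),
            ),
        },
    }
}
```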
…ph#13) User on issue therealaleph#13 reported that even after installing the CA (and seeing it in the Windows cert manager UI), our 'Check CA' button still said 'NOT trusted'.
Root cause: is_ca_trusted() on Windows was just returning false unconditionally — Check-CA has never worked on Windows.
Fix: is_trusted_windows() now shells out to certutil:
certutil -user -store Root 'MasterHttpRelayVPN'
certutil -store Root 'MasterHttpRelayVPN'
It checks both the user store (where our install_windows puts it by default) and the machine store (the fallback path when a user-store install is blocked). Requires certutil to print the cert name on stdout AND exit 0 — belt-and-suspenders against locales where certutil exits 0 even on an empty match.
Also made the Check-CA UI message point users at the CA file path for cross-device install — the same user reported their Android V2rayNG client getting cert errors on our MITM-signed TLS leaves, which is the expected 'the phone doesn't trust our CA' scenario. The message now calls out the ca.crt path explicitly, and notes the Android 7+ user-CA restriction (Firefox Android works; Chrome and most apps don't trust user-installed CAs regardless).
Not addressed (by design):
- Replacing our CA keypair with Python-generated PEM fails to parse via rcgen. The user tried this as a workaround before reporting. rcgen expects PKCS#8 PEM; Python's cryptography commonly emits PKCS#1 ('BEGIN RSA PRIVATE KEY'). Even if parsing worked, mixing an external CA with our leaf-issuing code would break the key match. Users should stick with our generated CA — that's the supported flow. The Python cross-contamination experiment is expected to fail; we don't document it as supported.
Two reasons to pin a copy in the repo:
1. Users on networks where raw.githubusercontent.com is intermittent can still get the deploy-ready file via a repo ZIP / clone.
2. The Apps Script relay protocol between mhrv-rs and Code.gs is informal — upstream changes can silently break us. Keeping a snapshot lets future-us diff against what we tested when diagnosing protocol-drift bugs.
Fetched verbatim from:
https://raw.githubusercontent.com/masterking32/MasterHttpRelayVPN/refs/heads/python_testing/apps_script/Code.gs
Credit stays with @masterking32. The assets/apps_script/README.md next to it calls out that we don't modify this file — users deploy it as-is into their own Google Apps Script project.
Updated the Setup Guide link in both the English and Persian sections so offline / restricted-network users have a fallback path.
Thanks @hamed256 — armhf cross-compile verified locally, produces a valid ARM 32-bit ELF. Merging with a follow-up commit on main to pin the runner to ubuntu-22.04 (GLIBC 2.35 floor, same policy as our other linux-gnu targets) so it runs for Raspberry Pi users on Bookworm / Bullseye.
=== PR therealaleph#14 follow-up: armhf build runs on Pi Bookworm/Bullseye ===
PR therealaleph#14 (merged earlier) added arm-unknown-linux-gnueabihf to the release matrix but pinned os=ubuntu-latest, which is 24.04 with GLIBC 2.39. The armhf target sysroot on 24.04 is Debian Trixie (GLIBC 2.39), far too new for a Raspberry Pi 2/3 on Bookworm (2.36) or Bullseye (2.31) — users would get 'GLIBC_2.39 not found' the same way the linux-amd64 issue therealaleph#2 folks did before we pinned them to 22.04.
Fix: pin the armhf matrix entry to ubuntu-22.04, matching our other linux-gnu targets. The binary will link against GLIBC 2.35 max, which loads on Pi Bookworm and Bullseye. Also trimmed two trailing spaces.
Locally verified the cross-compile: rust:latest + gcc-arm-linux-gnueabihf + a proper CARGO_HOME config.toml produces a valid ARM 32-bit ELF (2.9 MB, armhf EABI5).
=== Issue therealaleph#15: 'Check for updates' button in the UI ===
New src/update_check.rs module. On the user's click (no polling):
1. TCP-probes github.com:443 with a 5s budget. If unreachable, we return Offline(reason) instead of a confusing 'update check failed' — distinguishes 'you're offline' from 'GitHub API misbehaved'.
2. HTTPS GET api.github.com/repos/.../releases/latest via the tokio + rustls stack (same hand-rolled HTTP pattern as domain_fronter — no new crate deps). Parses tag_name, strips the v-prefix, loose-semver-compares to env!("CARGO_PKG_VERSION").
3. Returns one of four UpdateCheck variants: Offline / Error / UpToDate / UpdateAvailable { release_url }.
New UI wiring (src/bin/ui.rs):
- Cmd::CheckUpdate enqueue variant
- UiState::last_update_check { InFlight, Done(result) }
- 'Check for updates' button next to the CA buttons
- Result displayed as a colored small-text line under the CA info: green 'up to date', amber 'update available v0.8.5 → v0.8.6' with a clickable release-page hyperlink, red for offline/error.
Verified end-to-end with a live github.com fetch (got a rate-limited HTTP 403 from my IP because I've been hitting the API a lot, but that's the expected Error() state — response classification was correct). Three unit tests for is_newer() and a gated live test for the full round-trip. 43 tests pass.
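A loose-semver comparison of the kind described could look like this (my reconstruction, not the project's exact is_newer):

```rust
/// True if `remote` is strictly newer than `local`. Tolerates a leading
/// 'v' and non-numeric suffixes; missing fields compare as 0.
fn is_newer(remote: &str, local: &str) -> bool {
    fn parts(v: &str) -> Vec<u64> {
        v.trim_start_matches('v')
            .split('.')
            .map(|p| {
                p.chars()
                    .take_while(|c| c.is_ascii_digit())
                    .collect::<String>()
                    .parse()
                    .unwrap_or(0) // non-numeric field counts as 0
            })
            .collect()
    }
    let (r, l) = (parts(remote), parts(local));
    for i in 0..r.len().max(l.len()) {
        let a = r.get(i).copied().unwrap_or(0);
        let b = l.get(i).copied().unwrap_or(0);
        if a != b {
            return a > b;
        }
    }
    false // equal versions are not "newer"
}
```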
=== UI redesign (zero new deps, same binary size) ===
Entire App::update() rewritten around three ideas:
1. Section cards. Form rows are grouped inside rounded frames with
faint fills and small-caps headings:
- 'Apps Script relay' — Deployment IDs (textarea) + Auth key
- 'Network' — Google IP (+inline scan button), Front
domain, Listen host, HTTP+SOCKS5 ports
on one row, SNI pool button
- Collapsing 'Advanced' — upstream SOCKS5, parallel dispatch,
log level, verify SSL, show auth key.
Closed by default — most users never
touch these.
2. Clearer action hierarchy. Primary buttons are accent-filled and
larger:
- Start (green filled, ▶ glyph, 120x32)
- Stop (red filled, ■ glyph, 120x32)
- Save config (blue accent filled, path shown inline after →)
- SNI pool (blue accent filled, inside Network section)
- Test relay (neutral, tall)
Secondary actions (Install CA / Check CA / Check for updates)
moved to their own compact row below, no longer competing.
3. Status + log clarity.
- Header version links to GitHub: → repo, →
the release tag page.
- Running/stopped status is now a pill-shaped colored chip at the
right end of the header (green fill + green dot when running,
red when stopped).
- Traffic stats in a 2-column layout inside the Traffic card —
7 metrics fit in 4 rows instead of a 7-row vertical strip.
- One compact transient status line above the log that auto-hides
after 10 seconds — replaces the previous stack of permanent
ca_trusted / test_msg / update_check labels that were pushing
the log panel off-screen.
- Log panel now has its own bordered frame (darker fill), a
'[x] show' checkbox that hides it entirely when off, a 'save…'
button that writes the current log buffer to a timestamped
log-YYYYMMDD-HHMMSS.txt in the user-data dir, and a 'clear'
button. Empty state shows a muted placeholder instead of
silent void.
All helper functions (section, primary_button, form_row) live at the
top of ui.rs as small local helpers — no new modules, no new
dependencies.
=== Stricter end-to-end test (test_cmd.rs) ===
Previous test passed on any HTTP 200 status regardless of body.
After a user pointed out that the test reported PASS even after
they deleted their Apps Script deployment, updated the pass criteria:
1. Status must contain '200 OK'.
2. Body must parse as JSON.
3. JSON must have an 'ip' field with a valid IPv4 or IPv6.
Anything else → SUSPECT (returns false), with a specific log message
like 'HTML returned instead of JSON. The Apps Script deployment may
be deleted, not published to Anyone, or requires sign-in.'
Also now emits tracing::info!/warn!/error! alongside println!, so
the verdict + detail show up in the UI's Recent log panel instead
of disappearing to a stdout nobody sees.
One new unit test: looks_like_ip() accepts v4+v6, rejects empty,
rejects malformed, rejects overflowed octets. 44 tests total, all
green.
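The validator itself can lean on the standard library; a sketch consistent with the test cases listed (the real looks_like_ip may differ):

```rust
use std::net::IpAddr;

/// Accepts any valid IPv4 or IPv6 literal; rejects empty/malformed strings
/// and overflowed octets (std's parser already enforces 0-255 per octet).
fn looks_like_ip(s: &str) -> bool {
    s.parse::<IpAddr>().is_ok()
}
```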
Verified locally end-to-end — UI launches clean, form loads config
cleanly, Start/Stop/Save all fire correctly, Test relay produces
the new PASS/SUSPECT verdict with the tracing detail visible in
the log panel, Check-for-updates hits GitHub and resolves with the
compact auto-hiding status line.
…erealaleph#16) User @barzamini pointed out an optimization from the Python community (originally from seramo_ir): X/Twitter GraphQL URLs look like
https://x.com/i/api/graphql/{hash}/{op}?variables=...&features=...&fieldToggles=...
The features and fieldToggles params change across sessions and even within a session, busting our 50 MB response cache on every request to the same logical query. Stripping everything after 'variables=' lets identical logical queries collapse into one cache entry, dramatically reducing quota usage when browsing Twitter through the relay.
Implementation:
- src/domain_fronter.rs: new normalize_x_graphql_url() helper. Matches exactly the Python patch's pattern (host == 'x.com', path starts with /i/api/graphql/, query starts with variables=). Truncates at the first '&' past the '?'. Applied at the top of relay() so the normalized URL feeds BOTH the cache key AND the request sent to Apps Script — so we save on Apps Script quota too, not just on return-trip bytes.
- src/config.rs: new opt-in normalize_x_graphql bool (default false). Off by default because strict X endpoints may reject trimmed requests; users should flip it on and watch for regressions.
- src/bin/ui.rs: checkbox in the Advanced section, 'Normalize X/Twitter GraphQL URLs', with a tooltip explaining the trade-off and crediting the source.
- Four new unit tests in domain_fronter::tests covering: the happy-path trim, non-x.com hosts pass through unchanged, non-graphql x.com paths pass through unchanged, and idempotency.
48 tests total, all green. Credit: idea by seramo_ir, Python patch at https://gist.github.com/seramo/0ae9e5d30ac23a73d5eb3bd2710fcd67, implementation request by @barzamini in issue therealaleph#16.
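The normalization rule reduces to a few string checks; a sketch matching the described pattern (the in-tree helper may parse more carefully):

```rust
/// Strip everything after the first '&' for x.com GraphQL URLs whose query
/// starts with "variables=", so per-session `features=`/`fieldToggles=`
/// params stop busting the cache. Anything else passes through unchanged.
fn normalize_x_graphql_url(url: &str) -> &str {
    const PREFIX: &str = "https://x.com/i/api/graphql/";
    if let Some(q) = url.find('?') {
        if url.starts_with(PREFIX) && url[q + 1..].starts_with("variables=") {
            if let Some(amp) = url[q..].find('&') {
                return &url[..q + amp]; // truncate at first '&' past the '?'
            }
        }
    }
    url
}
```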
…therealaleph#15 follow-up) @zula-editor reported on issue therealaleph#15 that the Check-for-updates button was returning HTTP 403 on their ISP — the classic GitHub unauthenticated-API rate limit (60/hour per IP) on a shared NAT IP. They also asked for the update to actually be downloadable from the app, not just a page link. Both addressed:
=== Route update check through our own proxy when running ===
New mhrv_rs::update_check::Route enum:
- Direct: straight rustls to api.github.com (existing behavior)
- Proxy { host, port }: HTTP CONNECT through our local HTTP proxy listener → MITM → Apps Script → api.github.com.
When the proxy is running, the UI automatically picks Proxy. From GitHub's POV the request now comes from Apps Script's IP range (a Google datacenter) — a completely different rate-limit bucket from the user's ISP IP, AND it works even if GitHub is blocked on their network.
Routing over the proxy means the MITM leaf for api.github.com has to be trusted in update_check's TLS config. build_root_store() now conditionally adds our own CA cert from data_dir::ca_cert_path() to the webpki roots when Route::Proxy is in use. The Direct path is unchanged.
=== Download button ===
The UpdateCheck::UpdateAvailable variant now carries an optional ReleaseAsset { name, download_url, size_bytes } picked by pick_asset_for_platform() from the GitHub API's assets[] array. Preference list per (OS, arch):
- macOS arm64 → mhrv-rs-macos-arm64-app.zip, else tar.gz
- macOS amd64 → mhrv-rs-macos-amd64-app.zip, else tar.gz
- Windows → mhrv-rs-windows-amd64.zip
- Linux aarch64 → mhrv-rs-linux-arm64.tar.gz
- Linux armv7 → mhrv-rs-raspbian-armhf.tar.gz
- Linux x86_64 → mhrv-rs-linux-amd64.tar.gz
UI: when an update is available AND we have an asset, the transient status line grows an accent-blue 'Download X.Y MB' button. Clicking fires Cmd::DownloadUpdate, which pipes the asset through the same Route (proxy if running, direct otherwise), writes it to UserDirs::download_dir() (~/Downloads on most systems), and shows a 'show in folder' button that opens Finder / Explorer / xdg-open on the containing directory.
Three new unit tests for asset-picking. The gated live test now takes a Route argument (Direct) so it keeps working across the API shape change. 49 tests pass. Also refreshed the in-repo releases/ archives to v0.9.1 alongside.
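The preference list above maps naturally onto a match. A sketch (asset names taken from the list; the real pick_asset_for_platform also handles the macOS tar.gz fallbacks and parses the API's assets[] array):

```rust
/// First-choice release-asset name for a platform, per the preference list.
fn preferred_asset(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("macos", "aarch64") => Some("mhrv-rs-macos-arm64-app.zip"),
        ("macos", "x86_64") => Some("mhrv-rs-macos-amd64-app.zip"),
        ("windows", _) => Some("mhrv-rs-windows-amd64.zip"),
        ("linux", "aarch64") => Some("mhrv-rs-linux-arm64.tar.gz"),
        ("linux", "arm") => Some("mhrv-rs-raspbian-armhf.tar.gz"),
        ("linux", "x86_64") => Some("mhrv-rs-linux-amd64.tar.gz"),
        _ => None, // unknown platform: no download button, page link only
    }
}
```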
…sue therealaleph#18) @Behzad9 on therealaleph#18: the OpenWRT 'No file descriptors available' errors are back in v0.8.0+, this time logged as a wall of thousands of identical ERRORs within seconds of activating the proxy. Two real bugs, now fixed:
=== 1. accept() loop had no backoff ===
Previous code:
loop {
    match listener.accept().await {
        Ok(x) => ...,
        Err(e) => { tracing::error!(...); continue; } // tight loop
    }
}
On EMFILE (RLIMIT_NOFILE exhausted), accept() returns synchronously, the match re-runs instantly, accept() EMFILEs again, forever. The tight loop ALSO starves the tokio runtime of the CPU that existing connections need to finish and close their fds — so the problem never clears on its own. It's a self-sustaining meltdown.
A new accept_backoff() helper (in proxy_server.rs) wraps both the HTTP and SOCKS5 accept loops:
- Detects EMFILE/ENFILE via raw_os_error (24 or 23).
- Sleeps proportionally to how long the pressure has lasted (50 ms on the first hit, ramping to a 2 s cap around hit #40). Gives existing connections a chance to finish and free fds.
- Rate-limits the log line: one WARN on the first EMFILE with fix instructions, then one every 100 retries. No more walls of identical errors.
- Resets the counter on the next successful accept.
- Non-EMFILE errors (ECONNABORTED from clients that went away during handshake, etc.) get a plain single-line error + a 5 ms sleep so we still don't tight-loop on any unexpected error.
End-to-end verified: ran mhrv-rs with a constrained fd limit and flooded the SOCKS5 port with 247 concurrent connections to trip EMFILE. Before: the log would have been 1000s of identical lines. After: exactly 1 warning, the listener stayed quiet, fds drained, accept resumed.
=== 2. RLIMIT_NOFILE bump was too conservative + silent ===
Previous behavior: target 16384 soft, cap to the existing hard limit, no log. On constrained systems where the hard limit is already tiny, we'd stay at the tiny limit silently.
rlimit.rs now:
- Targets 65536 soft.
- ALSO tries to raise the hard limit up to /proc/sys/fs/nr_open on Linux (Linux allows a non-privileged process to bump its own hard limit up to the kernel ceiling, usually 1048576 on modern kernels). On macOS/BSD we skip this — only bump soft.
- Logs WARN on startup if soft ends up <4096, with the exact fix ('ulimit -n 65536' or use the procd init). No more silent failure.
- Logs INFO with the before/after limits otherwise, so field bug reports tell us immediately whether the kernel cap is the real bottleneck.
Moved the rlimit call from main() pre-logging to post-init_logging so its tracing output actually lands in the log panel + stderr. Small reorganization only. 49 tests pass; musl x86_64 cross-compile verified locally.
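The ramp described (50 ms on the first hit, proportional growth, 2 s cap around the 40th hit) reduces to a small pure function, shown here with the errno classification alongside. The constants mirror the text; the in-tree accept_backoff() may differ:

```rust
use std::time::Duration;

/// Backoff before retrying accept() under fd pressure: 50 ms per
/// consecutive EMFILE/ENFILE, capped at 2 s (reached around the 40th hit).
fn emfile_backoff(consecutive_hits: u32) -> Duration {
    let ms = 50u64.saturating_mul(u64::from(consecutive_hits.max(1)));
    Duration::from_millis(ms.min(2_000))
}

/// EMFILE (per-process fd limit, errno 24 on Linux) and ENFILE
/// (system-wide file table full, errno 23) both mean "back off".
fn is_fd_exhaustion(err: &std::io::Error) -> bool {
    matches!(err.raw_os_error(), Some(24) | Some(23))
}
```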
Linux / Android / mipsel build jobs now run on two self-hosted runners on a Hetzner 8-core / 31 GB Ubuntu 24.04 box with Rust, Android SDK+NDK r26c, all cross-compile toolchains and Docker pre-installed. macOS and Windows still run on GitHub-hosted — we don't self-host those OSes and the free minutes on a public repo are plenty. Adds Swatinem/rust-cache@v2 to every cargo-using job so target/ + cargo registry survive between runs. With warm caches the Linux jobs take ~1min each and the Android job ~3-4min; cold runs are ~9min for Android and ~2min for everything else. Release wall time before this change was ~13m consistently; it should now sit around 6-7m. No new user-facing code in this release — primarily an infra change exercised by an actual tag-push so we verify the full pipeline works end-to-end from the new runners.
therealaleph#78) Validate Content-Range in the range-parallel path before stitching. Malformed 206s are no longer combined into a fake 200 OK; invalid probes fall back to a normal single GET, invalid later chunks fall back to the validated probe response.
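Validating a 206's Content-Range before stitching amounts to parsing "bytes start-end/total" and checking it against the range that was requested. A sketch (the in-tree check may be stricter):

```rust
/// True only if `header` is a well-formed "bytes start-end/total" (or
/// "bytes start-end/*") Content-Range matching the range we asked for.
fn content_range_matches(header: &str, want_start: u64, want_end: u64) -> bool {
    let rest = match header.strip_prefix("bytes ") {
        Some(r) => r,
        None => return false, // not a byte-range header at all
    };
    let (range, total) = match rest.split_once('/') {
        Some(x) => x,
        None => return false,
    };
    if total != "*" && total.parse::<u64>().is_err() {
        return false; // total must be '*' or a number
    }
    match range.split_once('-') {
        Some((s, e)) => {
            s.parse::<u64>().ok() == Some(want_start)
                && e.parse::<u64>().ok() == Some(want_end)
        }
        None => false,
    }
}
```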
Reject configs that set HTTP and SOCKS5 listeners to the same port. Enforced both at config-load and in the UI form so users get a clear error before bind-time failure. Adds a focused regression test.
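A minimal sketch of the collision check described above, shared by config-load and the UI form (function name and error wording are illustrative, not the actual config module API):

```rust
/// Reject a config whose HTTP and SOCKS5 listeners would bind the same port.
/// Running this at validation time gives the user a clear error instead of a
/// confusing second-bind failure at startup.
fn validate_ports(listen_port: u16, socks5_port: u16) -> Result<(), String> {
    if listen_port == socks5_port {
        return Err(format!(
            "HTTP and SOCKS5 listeners cannot share port {listen_port}; pick two distinct ports"
        ));
    }
    Ok(())
}

fn main() {
    assert!(validate_ports(8080, 1080).is_ok());
    assert!(validate_ports(8080, 8080).is_err()); // caught at config load, not bind time
    println!("ok");
}
```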
…roid note

- PR therealaleph#78: validate Content-Range on 206 responses in the range-parallel path before stitching. Prevents malformed partials from being combined into a fake 200 OK. An invalid probe falls back to a normal single GET; invalid later chunks fall back to the validated probe response instead of shipping truncated/wrong data.
- PR therealaleph#79: reject configs with listen_port == socks5_port at validation time (both config-load and UI form) instead of letting the second bind fail at runtime with a less clear error.
- README: add an explicit note about the Android 7+ user-CA trust limitation so future reporters (therealaleph#74, therealaleph#81, and the next dozen) find the answer in the docs instead of in a support thread. The previous "every app routes through the proxy" line was misleading — TUN captures all IP traffic, but HTTPS still needs app-level trust of our MITM CA, which most non-browser apps don't grant.

Running through the new self-hosted CI pipeline. Warm rust-cache should bring the full matrix in under ~7 minutes.
v1.2.4 tagged cleanly but its CI failed — parallel Linux matrix jobs
on the self-hosted runners all raced on `/var/lib/apt/lists/lock` and
failed the `sudo apt-get install` step within ~20s. v1.2.4's release
job therefore skipped and no assets were published.
Fix:
- Pre-installed every apt dependency the workflow needs on both
self-hosted runners (eframe system libs, gcc-aarch64-linux-gnu,
gcc-arm-linux-gnueabihf).
- Seeded per-runner cargo linker configs at
/home/ghrunner/cargo-{01,02}/config.toml so the "echo
[target.xxx] linker = ..." workflow step is also unnecessary.
- Gated the "Install Linux eframe system deps" and the two cross-
compile-toolchain steps on `runner.environment == 'github-hosted'`
so only hosted runners call apt-get; self-hosted runners skip the
whole thing and use pre-installed tooling.
Re-tagging as v1.2.5 since v1.2.4 is an abandoned tag (git tag exists
but no GitHub Release was cut for it).
Same code changes as what v1.2.4 was meant to ship: PR therealaleph#78 range-
parallel validation, PR therealaleph#79 port-collision rejection, README note
on Android 7+ user-CA trust.
New `mhrv-rs scan-sni` subcommand: pulls Google's published IP ranges, issues PTR lookups via dns.google, filters results to Google-related hostnames, then TLS-probes each discovered SNI against the user's configured `google_ip`. Prints the SNIs that pass DPI for the user to paste into `sni_hosts`. Also expands the hardcoded FAMOUS_GOOGLE_DOMAINS list the existing scan-ips command already used. Adds `url` crate for URL parsing in the DNS-over-HTTPS client. No other behavioural changes.
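A minimal sketch of the "filter PTR results to Google-related hostnames" step described above. The suffix list is illustrative (1e100.net is Google's real reverse-DNS domain; the actual list is the crate's FAMOUS_GOOGLE_DOMAINS):

```rust
/// PTR answers come back as absolute names (trailing dot); normalize, then
/// keep only names that are a listed suffix or a subdomain of one.
fn looks_google_related(ptr_name: &str) -> bool {
    const SUFFIXES: &[&str] = &[
        "google.com",
        "googleusercontent.com",
        "1e100.net",
        "gstatic.com",
    ];
    let host = ptr_name.trim_end_matches('.');
    SUFFIXES
        .iter()
        .any(|s| host == *s || host.ends_with(&format!(".{s}")))
}

fn main() {
    assert!(looks_google_related("sin11s03-in-f14.1e100.net."));
    assert!(looks_google_related("edge.gstatic.com"));
    assert!(!looks_google_related("example.com"));
    assert!(!looks_google_related("not1e100.net")); // suffix must sit after a dot
    println!("ok");
}
```

Only the names that survive this filter get TLS-probed against `google_ip`, keeping the probe phase small.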
v1.2.4 and v1.2.5 both cut clean tags but CI failed downstream for different self-hosted reasons:
- v1.2.4 failed on the parallel apt-lock race (fixed)
- v1.2.5 failed with "TOML parse error at line 5 column 9" because rust-cache v2's default cache-bin=true prunes $CARGO_HOME/bin of any binary not installed via `cargo install`. `rustup` itself is installed by rustup-init, not cargo install, so it got flagged as "unknown" and deleted on cache save. The next job hits the cargo symlink that points at a missing rustup, which somehow resolves to a very old cargo that can't parse our Cargo.toml.

Fix:
- Set `cache-bin: "false"` on every Swatinem/rust-cache@v2 call. We still cache target/ + registry (the big win), just not bin/. Binaries are stable across runs on our self-hosted box anyway.
- Reinstalled rustup inside each per-runner CARGO_HOME on the server to recover from the broken state.

Also in this release:
- PR therealaleph#83: new `mhrv-rs scan-sni` subcommand. Pulls Google's published IP ranges, does PTR lookups via dns.google on each IP, filters to Google-related hostnames, then TLS-probes each discovered SNI against the configured google_ip to see which ones bypass DPI. Useful for rebuilding a working SNI pool on a new ISP. Adds the `url` crate dep.

Same user-facing code as v1.2.4/v1.2.5 (PRs therealaleph#78, therealaleph#79, README Android note) plus PR therealaleph#83 and the CI fixes on top.
…herealaleph#92) The googl.com shortener domain is NOT in Google's GFE certificate SAN list — verified via `openssl s_client -verify_hostname accounts.googl.com` returning hostname mismatch. Every Nth connection where the rotation landed on this entry was failing cert validation with `verify_ssl=true`. Replaced with accounts.google.com which is covered by *.google.com wildcard.
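The failure mode above comes down to single-label wildcard matching: `*.google.com` in the SAN covers `accounts.google.com` but can never cover `accounts.googl.com`. A sketch of RFC 6125-style matching (not the verifier the crate actually uses — rustls/webpki does this internally):

```rust
/// A `*.` wildcard matches exactly one non-empty left-most label;
/// everything after the first dot must equal the pattern's base.
fn wildcard_matches(pattern: &str, host: &str) -> bool {
    match pattern.strip_prefix("*.") {
        Some(base) => host
            .split_once('.')
            .map_or(false, |(label, rest)| !label.is_empty() && rest == base),
        None => pattern == host,
    }
}

fn main() {
    assert!(wildcard_matches("*.google.com", "accounts.google.com"));
    assert!(!wildcard_matches("*.google.com", "accounts.googl.com")); // the bug above
    assert!(!wildcard_matches("*.google.com", "a.b.google.com")); // only one label
    assert!(wildcard_matches("accounts.google.com", "accounts.google.com"));
    println!("ok");
}
```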
Standalone Rust/axum HTTP server + Apps Script-side CodeFull.gs for users who want to deploy a remote tunnel node. All new files; no changes to the main Rust crate. This is part 1 of 3 of the full-tunnel feature — it adds scaffolding that users can opt into once the Rust-side Mode::Full lands in therealaleph#94.
…herealaleph#92 + therealaleph#93)

- Android DEFAULT_SNI_POOL: mirror the Rust-side fix from therealaleph#92 — accounts.googl.com replaced by accounts.google.com. The same cert-SAN mismatch that was failing every Nth rotation in the Rust client affected the Android user's sniHosts population; both pools need to stay in sync by design.
- Release rolls up PR therealaleph#92 (cert fix) and PR therealaleph#93 (tunnel-node + CodeFull.gs scaffolding). PR therealaleph#93 adds a standalone binary under tunnel-node/ plus an Apps Script companion; no main-crate changes, so this is a zero-risk merge. Users who want to deploy a tunnel node can start today. The dispatch that activates `mode: full` is still in review in PR therealaleph#94.
dns.google replies with Transfer-Encoding: chunked; the raw payload was being handed to serde_json with chunk framing still embedded, so every PTR parse failed and scan-sni discovered nothing. Parses the HTTP response (chunked + Content-Length) before JSON decode. Includes 3 new unit tests.
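A minimal sketch of de-chunking an HTTP/1.1 body before JSON decode, the shape of the fix above (illustrative, not the actual scan_sni code, which also handles Content-Length bodies):

```rust
/// Decode a Transfer-Encoding: chunked body: repeated `<hex-size>\r\n<data>\r\n`
/// frames, terminated by a zero-length chunk. Returns None on malformed input.
fn decode_chunked(body: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut rest = body;
    loop {
        // chunk-size line: hex length, optionally followed by ";extensions"
        let eol = rest.windows(2).position(|w| w == b"\r\n")?;
        let size_str = std::str::from_utf8(&rest[..eol]).ok()?;
        let size = usize::from_str_radix(size_str.split(';').next()?.trim(), 16).ok()?;
        rest = &rest[eol + 2..];
        if size == 0 {
            return Some(out); // terminal 0-length chunk
        }
        if rest.len() < size + 2 {
            return None; // truncated chunk
        }
        out.extend_from_slice(&rest[..size]);
        rest = &rest[size + 2..]; // skip the chunk's trailing CRLF
    }
}

fn main() {
    let raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
    assert_eq!(decode_chunked(raw).unwrap(), b"Wikipedia");
    assert!(decode_chunked(b"4\r\nWi").is_none()); // truncated input rejected
    println!("ok");
}
```

Feeding the raw framed bytes to serde_json fails on the very first hex size line, which is why every PTR parse failed before this fix.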
The scan-sni DoH client to dns.google was using NoVerify — an on-path MITM could forge PTR answers and poison the discovered SNI pool. This is a public HTTPS request, not a fronted probe, so certificate validation belongs ON. Switched to the normal webpki root store.
…herealaleph#104) filter_forwarded_headers was stripping hop-by-hop headers (Host, Connection, Content-Length, etc.) but not identity-revealing forwarding headers. If a user sat behind another proxy or ran a browser extension that inserts any of X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Proto, X-Forwarded-Port, X-Forwarded-Server, X-Forwarded-Ssl, Forwarded, Via, X-Real-IP, X-Client-IP, X-Originating-IP, True-Client-IP, CF-Connecting-IP, Fastly-Client-IP, X-Cluster-Client-IP, or Client-IP, those would carry the client's real IP all the way through the Apps Script relay to the origin server. We now strip them, so the origin only ever sees whatever source IP the Apps Script / GFE path terminates on.

This covers the Apps Script relay path (the main leak vector). The SNI-rewrite tunnel path is a raw TLS byte bridge — it doesn't parse HTTP at all — so any headers the client emits there pass through as opaque bytes to the Google edge that terminates TLS. In practice that's narrower (the origin sees GFE), but the caveat is documented on the issue thread.

Adds a focused regression test that locks in every stripped header. Reported in therealaleph#104.
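A minimal sketch of the filter described above — the 16-entry list is taken from the commit text; the function name and `Vec<(String, String)>` representation are illustrative, not the crate's actual header types:

```rust
/// Identity-revealing forwarding headers to drop before relaying
/// through Apps Script (compared case-insensitively).
const IDENTITY_HEADERS: &[&str] = &[
    "x-forwarded-for", "x-forwarded-host", "x-forwarded-proto",
    "x-forwarded-port", "x-forwarded-server", "x-forwarded-ssl",
    "forwarded", "via", "x-real-ip", "x-client-ip", "x-originating-ip",
    "true-client-ip", "cf-connecting-ip", "fastly-client-ip",
    "x-cluster-client-ip", "client-ip",
];

fn strip_identity_headers(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        .filter(|(name, _)| !IDENTITY_HEADERS.contains(&name.to_ascii_lowercase().as_str()))
        .collect()
}

fn main() {
    let hdrs = vec![
        ("X-Forwarded-For".to_string(), "203.0.113.7".to_string()),
        ("Accept".to_string(), "*/*".to_string()),
        ("CF-Connecting-IP".to_string(), "203.0.113.7".to_string()),
    ];
    let kept = strip_identity_headers(hdrs);
    assert_eq!(kept.len(), 1); // only the benign header survives
    assert_eq!(kept[0].0, "Accept");
    println!("ok");
}
```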
Ports the upstream Python `youtube_via_relay` flag (commit a0fd8a0 in masterking32/MasterHttpRelayVPN). When enabled, YouTube-family suffixes (youtube.com, youtu.be, youtube-nocookie.com, ytimg.com) opt out of the SNI-rewrite tunnel and fall through to the Apps Script relay path. Why it helps some users: when YouTube is reached via SNI-rewrite to google_ip with SNI=www.google.com, Google's frontend can enforce SafeSearch / Restricted Mode based on the SNI name, causing "video restricted" errors on some regular videos. Routing through Apps Script bypasses that specific filter at the cost of (a) UrlFetchApp's fixed `User-Agent: Google-Apps-Script`, and (b) counting YouTube traffic against the script's daily quota. Off by default so existing behaviour is unchanged. Users who hit the SafeSearch-on-SNI issue can set `"youtube_via_relay": true` in their config.json and observe. Explicit `hosts` overrides always beat the toggle — that's a user choice and should win over the default policy. Added tests for all three branches (youtube_via_relay off, on, and with hosts override). Matching Android-side UI toggle deferred — `normalize_x_graphql` is also config-only on Android today; users can edit config.json directly if needed.
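A minimal sketch of the routing policy described above, including the "explicit `hosts` overrides always win" precedence (the `Route` enum and function names are hypothetical, not the crate's actual API):

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Route {
    SniTunnel,
    AppsScriptRelay,
}

const YOUTUBE_SUFFIXES: &[&str] =
    &["youtube.com", "youtu.be", "youtube-nocookie.com", "ytimg.com"];

fn is_youtube_family(host: &str) -> bool {
    YOUTUBE_SUFFIXES
        .iter()
        .any(|s| host == *s || host.ends_with(&format!(".{s}")))
}

/// Precedence: explicit hosts override > youtube_via_relay toggle > default tunnel.
fn pick_route(host: &str, youtube_via_relay: bool, hosts_override: Option<Route>) -> Route {
    if let Some(r) = hosts_override {
        return r; // a user choice always beats the default policy
    }
    if youtube_via_relay && is_youtube_family(host) {
        return Route::AppsScriptRelay;
    }
    Route::SniTunnel
}

fn main() {
    assert_eq!(pick_route("www.youtube.com", false, None), Route::SniTunnel);
    assert_eq!(pick_route("www.youtube.com", true, None), Route::AppsScriptRelay);
    assert_eq!(pick_route("i.ytimg.com", true, None), Route::AppsScriptRelay);
    // hosts override beats the toggle
    assert_eq!(pick_route("www.youtube.com", true, Some(Route::SniTunnel)), Route::SniTunnel);
    println!("ok");
}
```

The three asserts mirror the three tested branches: toggle off, toggle on, and toggle on with a `hosts` override.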
Rollup of four merged fixes since v1.2.7:

- security: strip identity-revealing forwarding headers in the Apps Script relay path. Closes the XFF leak vector from issue therealaleph#104 — users chained behind xray/v2rayNG or running browser extensions that inject X-Forwarded-For / Forwarded / Via / CF-Connecting-IP etc. would previously have those forwarded to the origin via the relay. Now strips 16 header variants, with a regression test.
- proxy: new `youtube_via_relay` config toggle (therealaleph#102). Routes YouTube-family suffixes through Apps Script instead of the SNI-rewrite tunnel. Trades SafeSearch-on-SNI for Apps Script's fixed User-Agent + quota cost. Off by default.
- scan_sni: decode chunked dns.google DoH responses (therealaleph#97, from @freeinternet865). Without this, PTR lookups always failed and scan-sni discovered zero domains.
- scan_sni: verify dns.google TLS with webpki roots (therealaleph#98, from @freeinternet865). The DoH request is a normal public HTTPS call — an on-path MITM should not be able to forge PTR answers and poison the suggested SNI pool.

73 tests pass (up from 67 — three new chunked-decode tests + one XFF-filter + two youtube_via_relay branches).
v1.2.8 tagged cleanly but CI failed compiling mhrv-rs-ui with:

    error[E0063]: missing field `youtube_via_relay` in initializer of `mhrv_rs::config::Config`

When I added the youtube_via_relay field to the main Config struct in 21912cc, I missed the struct-literal construction in src/bin/ui.rs (FormState::save_to_config) and the ConfigWire serializer. Fixed here:
- Added the youtube_via_relay field to FormState (line 214), the read path (line 291), the default path (line 316), and the save path (line 451)
- Added the youtube_via_relay field to ConfigWire (line 493) with skip_serializing_if on false, plus its From impl (line 544)

The UI still doesn't expose a checkbox for the toggle — it's config-only for now, same treatment as normalize_x_graphql. A future PR can add the checkbox to the Advanced pane.

The v1.2.8 tag exists but has no GitHub Release (the release job skipped on failure); v1.2.9 is the clean cut. Same payload as v1.2.8 plus this fix.
…hem (therealaleph#99) Before: `ProxyServer::run()` aborted only the two accept tasks on shutdown (`http_task`, `socks_task`), but every per-client task was spawned as a bare `tokio::spawn(...)` whose JoinHandle was discarded. Aborting the accept loop stopped taking new connections, but in-flight clients kept running on the runtime with their captured (stale) `Arc<DomainFronter>`. User-visible symptoms reported by @r-safavi in therealaleph#99: 1. Hitting Stop in the UI didn't actually stop serving: Firefox still reached x.com through the proxy even though the user expected a "connection refused." 2. Starting again with a changed auth_key worked for NEW domains (yahoo.com) but not for domains with a live keep-alive (x.com) — because the old child task was still using the old fronter with the old key. 3. Apps Script quota could be consumed after the user thought they'd stopped. Arguably the worst of the three. Fix: wrap per-client spawns in a `tokio::task::JoinSet<()>` scoped inside each accept task. When the accept task is aborted on shutdown, the JoinSet is dropped, and `JoinSet::drop` aborts every still-running child — closing their sockets and dropping their Arc clones of the fronter, which in turn drops the pool. Also added an opportunistic `try_join_next()` drain before each accept() so the JoinSet doesn't grow unbounded with completed-task handles on long-running proxies. Covers Finding 2 of therealaleph#99. Finding 1 (quota-exceeded → timeout instead of surfacing Apps Script's 502) is a separate pool-staleness issue and stays open for now.
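The key property above is that dropping the `JoinSet` aborts every still-running child. The same drop-cancels-children idea can be sketched with std threads and a shared cancellation flag — this is NOT the tokio code in the fix, just the ownership pattern it relies on, with hypothetical names:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// JoinSet-like holder: dropping it signals every child to stop, then joins.
struct ChildSet {
    cancel: Arc<AtomicBool>,
    handles: Vec<thread::JoinHandle<()>>,
}

impl ChildSet {
    fn new() -> Self {
        Self { cancel: Arc::new(AtomicBool::new(false)), handles: Vec::new() }
    }
    fn spawn(&mut self, f: impl Fn(&AtomicBool) + Send + 'static) {
        let cancel = Arc::clone(&self.cancel);
        self.handles.push(thread::spawn(move || f(&cancel)));
    }
}

impl Drop for ChildSet {
    fn drop(&mut self) {
        self.cancel.store(true, Ordering::SeqCst); // "abort" every child
        for h in self.handles.drain(..) {
            let _ = h.join();
        }
    }
}

fn main() {
    let serving = Arc::new(AtomicBool::new(true));
    {
        let mut set = ChildSet::new();
        let serving2 = Arc::clone(&serving);
        set.spawn(move |cancel| {
            // simulated per-client loop: keeps "serving" until cancelled
            while !cancel.load(Ordering::SeqCst) {
                thread::sleep(Duration::from_millis(5));
            }
            serving2.store(false, Ordering::SeqCst);
        });
        thread::sleep(Duration::from_millis(20));
        assert!(serving.load(Ordering::SeqCst)); // client still live pre-Stop
    } // set dropped here == Stop: children cancelled and joined
    assert!(!serving.load(Ordering::SeqCst)); // no in-flight clients remain
    println!("ok");
}
```

In the real fix, tokio's `JoinSet::drop` does the abort for free, and the aborted tasks drop their `Arc<DomainFronter>` clones, which is what finally releases the stale pool.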
Single-focus release. The Stop button in the UI previously only stopped new connections from being accepted — in-flight clients kept running on the old DomainFronter, which meant:
- Pages kept loading after Stop (users thought they'd stopped)
- Auth-key changes didn't take effect for domains with a live keep-alive to the proxy
- Apps Script quota could still be consumed post-Stop

Fix (c8f4a0c): wrap per-client spawns in a tokio::task::JoinSet inside each accept loop. On shutdown, aborting the accept task drops the JoinSet, which aborts every in-flight client. Sockets close, the old fronter's TLS pool drops, and a subsequent Start builds a clean new state.

Finding 1 of therealaleph#99 (quota-exceeded → "timeout" instead of the real 502 body) is a separate pool-staleness issue and is NOT addressed in this release.
…alaleph#64) The x.com GraphQL URL-length fix added in v1.2.1 (cbe06b5) only matched exact host "x.com". But browsers actually navigate to www.x.com, and api.x.com serves GraphQL endpoints too — the original fix never fired for real traffic. @pourya-p's log in therealaleph#64 made this unambiguous: relay GET https://www.x.com/i/api/graphql/<hash>/HomeTimeline?variables=...&features=... ... ERROR Relay failed: relay error: Exception: بیش از حد مجاز: طول نشانی وب URLFetch. (That Persian text is Apps Script's "URLFetch URL length exceeded" error, which is exactly what the truncation was supposed to prevent.) Widened the host matcher to `host == "x.com" || host ends with ".x.com"` so www.x.com / api.x.com / any future x.com subdomain all hit the rewrite. The path-pattern constraint (`/i/api/graphql/... ?variables=`) already filters to the right endpoints. 73 tests still pass.
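The widened matcher is small enough to sketch in full (function name illustrative; the real check lives next to the path-pattern constraint in the relay code):

```rust
/// Match bare x.com plus any subdomain. Requiring the leading dot in the
/// suffix check keeps unrelated hosts like "notx.com" from matching.
fn is_x_host(host: &str) -> bool {
    host == "x.com" || host.ends_with(".x.com")
}

fn main() {
    assert!(is_x_host("x.com"));
    assert!(is_x_host("www.x.com")); // what browsers actually send
    assert!(is_x_host("api.x.com"));
    assert!(!is_x_host("notx.com")); // suffix check requires the dot
    println!("ok");
}
```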
…ph#64) Single-bug release. Unblocks x.com browsing for users whose browsers resolve to www.x.com rather than bare x.com — i.e. essentially everyone using Firefox / Chrome / Safari. Previous releases still advertised the URL-truncation fix as working but it only matched exact Host: x.com, which never happens in real traffic. v1.2.11 widens the matcher to x.com + *.x.com so www.x.com, api.x.com, and any future x.com subdomain all get the shortened URL through Apps Script's URL length cap.
Adds a new `mode: full` that tunnels ALL traffic end-to-end through Apps Script → a remote tunnel node. The browser does TLS directly with the destination. No MITM, no CA installation needed on the client device.

Ships as part of the 3-PR series: therealaleph#93 (tunnel-node service + CodeFull.gs, merged) + this PR (Rust-side Mode::Full + batch tunnel client) + therealaleph#95 (Android UI dropdown, now rolled into this PR post-rebase).

### Architecture

- Client → mhrv-rs → script.google.com (Apps Script fetch) → tunnel-node on the user's VPS → real destination
- Apps Script is the transport to reach the VPS; works even when the ISP blocks direct VPS IPs
- Batch multiplexer collects data from all active sessions and ships one Apps Script request per tick

### Safety properties of this merge

- AppsScript + GoogleOnly dispatch paths are **unchanged**; Full mode is an additive branch at the top of `dispatch_tunnel`.
- `tunnel_client.rs` is a new isolated module (387 LOC).
- `tunnel_request()` is a new method on `DomainFronter`; no change to `relay()` / `relay_parallel_range()`.
- Config: additive `Mode::Full` variant + validation tests (2 new); existing validation rules untouched.
- Local build: clean compile. `cargo test --quiet`: 75 passed (73 → 75 with 2 new config tests).

### Closes

Unblocks the feature requested in therealaleph#61, therealaleph#69, therealaleph#100, therealaleph#105, therealaleph#110, therealaleph#111, therealaleph#113, therealaleph#116.

### Testing

vahidlazio has iterated on prior review feedback. End-to-end testing with a real tunnel-node deployment will follow post-merge from @Feiabyte (volunteered in therealaleph#61). Post-merge CI will exercise compile + the full test matrix across all targets; any regression caught there gets a fast-follow fix.
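The per-tick batching can be sketched as draining every active session's pending bytes into one payload (data shapes and names here are hypothetical, not the tunnel_client.rs API — session ids keyed in a map, one batch per tick):

```rust
use std::collections::BTreeMap;

/// One tick of the batch multiplexer: take every session's buffered bytes,
/// leaving the buffers empty, and return (session_id, payload) pairs that
/// would travel in a single Apps Script request.
fn build_batch(sessions: &mut BTreeMap<u32, Vec<u8>>) -> Vec<(u32, Vec<u8>)> {
    sessions
        .iter_mut()
        .filter(|(_, buf)| !buf.is_empty())
        .map(|(id, buf)| (*id, std::mem::take(buf)))
        .collect()
}

fn main() {
    let mut sessions = BTreeMap::new();
    sessions.insert(1u32, b"hello".to_vec());
    sessions.insert(2u32, Vec::new());
    sessions.insert(3u32, b"world".to_vec());

    let batch = build_batch(&mut sessions);
    assert_eq!(batch.len(), 2); // idle session 2 contributes nothing this tick
    assert_eq!(batch[0], (1, b"hello".to_vec()));
    assert!(sessions.values().all(|b| b.is_empty())); // buffers drained in place
    println!("ok");
}
```

One request per tick is what keeps N concurrent sessions from costing N Apps Script fetches per round trip.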
Rollup of PR therealaleph#94 — Mode::Full dispatch + batch tunnel client. Ships the long-awaited no-MITM path that was the motivating fix for half the open issues this week.

User-facing: add `"mode": "full"` to config.json, deploy CodeFull.gs as a second Apps Script alongside your existing one, deploy tunnel-node (tunnel-node/README.md) on a VPS, and traffic is tunneled end-to-end: client → mhrv-rs → script.google.com → your tunnel node → destination. The browser speaks TLS directly with the destination; we never see plaintext. No CA needed on the client device. The Android side gets a "Full tunnel (no cert)" dropdown option; toggling it writes `"mode": "full"` to config.json.

Safety: Mode::AppsScript and Mode::GoogleOnly dispatch paths are unchanged — Full mode is an additive branch at the top of dispatch_tunnel. Existing users on the default apps_script mode see zero behaviour change.

Testing status: compiles clean on all 10 CI targets; 75 tests pass (+2 new config-validation tests for Full mode); end-to-end real-VPS testing will come post-release from @Feiabyte and others who opt in. Any Full-mode regression gets a fast-follow fix.
Reviewed the diff and ran locally on a macOS host. That said, I can't auto-merge this one, for two concrete reasons:

1. The macOS + Linux E2E paths are self-declared as untested. You called this out in the PR body, which I appreciate.
2. You're a new contributor to this repo (first PR). Combined with 1,377 additions in a security-adjacent module, the conservative thing is to leave this open for explicit maintainer sign-off rather than auto-merging on my "cargo tests pass" alone. That's a rule rather than a reflection on the code quality.

What would unblock merge: ideally three smoke tests from three separate reviewers (one macOS, one Debian/Ubuntu-family Linux, one Fedora/RHEL-family Linux) walking through your test plan.

I'll flag this in the repo for maintainer attention. If nobody steps up within a few days, I can run the macOS path myself on a disposable VM rather than my host machine — but that's slower than someone who can do it on their actual device.

Two small review notes on the diff itself (not blockers):
Thanks again for the depth — this is the kind of PR I'd love to see more of. [reply via Anthropic Claude | reviewed by @therealaleph]
Thanks for the careful review — the detailed context on the merge policy is appreciated, and the code-quality compliment means a lot.

Quick note on the first-contributor point: this is actually not my first PR to this repo. I also contributed the

On why this feature matters: the MITM CA this app installs has its private key on the user's disk, and the OS trusts it for every HTTPS site. That's a non-trivial capability to leave lying around: if the key is ever exposed (lost laptop, leaked backup, a machine sold or handed down, etc.), anyone holding it can mint certificates the browser silently accepts as legitimate for any domain. And on the OS side, a stale trusted root is effectively a standing MITM authorization until someone notices it in the cert store.

Without a clean-slate uninstall path, users who try mhrv-rs and move on, or who switch to Full Tunnel Mode and no longer need a local MITM, end up with that capability sitting on disk and in their trust store indefinitely. So I think a one-command clean removal is worth some review rigor, which I fully agree with.

On the two small notes, I've pushed a follow-up commit addressing both:
Test count: 109/109 passing.

On the platform-coverage bar: completely reasonable, and no pressure on the VM offer — take it when it's convenient. If any macOS or Linux reviewer sees this thread and wants to walk the test plan, that would be very welcome.

Thanks again.
Heads-up: we just rewrote git history on main. This PR's branch is based on the pre-rewrite SHAs, so you'll need to rebase before it can merge cleanly. Easiest path:

```
git fetch origin
git checkout feature/delete_certificate
git rebase origin/main
# resolve any conflicts (your changes don't touch any rewritten files,
# so this should be conflict-free — just SHA pointers updating)
git push --force-with-lease
```

Functionally nothing about your PR has changed and the review is still where we left it — still awaiting macOS + Linux smoke tests from reviewers. Sorry for the disruption. [reply via Anthropic Claude | reviewed by @therealaleph]
Hey @dazzling-no-more — thanks for the cert-removal work. The feature itself looks solid.

The diff is showing +16,403 / -1,210 across 95 files because the fork point predates a lot of work that's already in main. So GitHub is showing this as adding all of that again — not because you wrote it twice, but because git thinks your branch lacks those commits.

Could you do a clean rebase onto current main? If the rebase gets ugly because the cert-installer conflicts are nontrivial, the cleanest path is probably:

Once the diff is just the cert-removal changes, the Windows smoke test is enough to start — I'll do macOS, and we can ask for a Linux pair from someone with a Debian + RHEL box once the diff is reviewable. [reply via Anthropic Claude | reviewed by @therealaleph]
Summary
- `mhrv-rs --remove-cert` (CLI) and a Remove CA button in the desktop UI for a verified clean-slate CA revocation: clears the OS trust store (macOS login+system keychains, Linux anchor dirs, Windows user+machine Trusted Root), best-effort NSS cleanup (Firefox profiles + Chrome/Chromium on Linux), and deletes the on-disk `ca/` directory.
- `config.json` and the Apps Script deployment are never touched, so users don't have to redeploy `Code.gs`.
- `is_ca_trusted_by_name()` verification runs before file deletion and before NSS mutation. A failed OS removal returns `RemovalIncomplete`, preserves `ca/`, and leaves browser state alone — retries are idempotent.
- `RemovalOutcome::{Clean, NssIncomplete}` lets the UI/CLI print an accurate "OS CA removed, browser cleanup partial" status instead of silent false success.
- `reconcile_sudo_environment()` detects `geteuid() == 0` + `SUDO_USER` at each binary's `main()` entry and re-roots `HOME` to the invoking user — so the data dir / Firefox profiles / macOS login keychain target the real user rather than root.

Only Windows has been smoke-tested end-to-end (Install → Check → Remove → Check round-trip via both CLI and UI, plus the mutex-on-flags exit-2 behavior). The macOS and Linux paths are built from the existing install-side patterns and covered by unit tests for all the pure logic, but the platform-specific `security delete-certificate` / `update-ca-certificates` / `trust extract-compat` code paths have not been executed on real hardware in this branch. A reviewer on macOS and a reviewer on at least one Linux distro (ideally one Debian-family and one RHEL-family) walking through the test plan below before merge would be valuable.

What changed
- `remove_ca` + per-platform helpers, `RemovalOutcome`, `NssReport`, `reconcile_sudo_environment`, marker-gated Firefox `enterprise_roots` pref (user-authored lines preserved), idempotent NSS delete that distinguishes "cert not found" from DB-locked/corrupt errors (regression guard for `SEC_ERROR_LOCKED_DATABASE`)
- `--remove-cert` flag, mutually exclusive with `--install-cert`, calls `reconcile_sudo_environment()` at startup
- `Cmd::RemoveCa` handler, shared `cert_op_in_progress` gate covering both Install and Remove, active-proxy guard for Remove (the CA keypair is live in memory while the proxy runs)
- MasterHttpRelayVPN for manual cleanup paths, upgrade note about the pre-marker `enterprise_roots` cosmetic orphan
29 new unit tests covering the pure logic:
- `user.js` marker-block install/strip roundtrips and idempotency (bare lines respected as user-owned)
- `getent passwd` home-dir parsing (Debian format + malformed inputs + macOS fallback semantics)
- `NssReport::is_clean()` state rules

Side-effecting paths (`security`, `certutil`, `update-ca-certificates`) are covered by manual E2E per platform since the codebase doesn't yet have a command-runner abstraction.
Windows — ✅ smoke-tested locally
- `cargo test --lib` — 101/101 passes locally
- `file=missing trust_store=not trusted` after Remove
- `mhrv-rs --install-cert` then `mhrv-rs --remove-cert`; verify `%APPDATA%\mhrv-rs\ca\` gone
- `mhrv-rs --install-cert --remove-cert` returns exit 2 with `--install-cert and --remove-cert cannot be combined`
- `Detected sudo invocation (SUDO_USER=…): re-rooting HOME to …` and that the cert is removed from the real user's Firefox `user.js` / `~/.pki/nssdb`, not root's
- `update-ca-certificates` (e.g. move it aside) and confirm `ca/` survives + `RemovalIncomplete` is reported
- `mhrv-rs --remove-cert` as normal user, confirm no sudo prompt (system-keychain probe avoids escalation when the cert isn't there)
- `cert_op_in_progress` gate prevents the race
Mode::Full(#94)Full mode doesn't use the MITM CA, so Remove CA is harmless there:
ca/deleted if present → no-op in practice.