Decision: Extract lean_release once API stabilizes

Context

MobDev.OtpAudit and Mix.Tasks.Mob.AuditOtp (added 2026-05-02) do something genuinely novel: lib-level reachability analysis of a Mix release tree, with cache-cruft + duplicate-version detection. The audit tool already paid for itself — caught ~10 MB of cruft shipping in every Mob iOS release that nobody had noticed before.

The same tool is useful to anyone shipping an Elixir release — Burrito, Bakeware, Nerves, plain mix release — not just Mob. The existing prior art in the ecosystem is strip_beams: true (debug-info stripping, ~30% wins) and hard-coded strip lists in Nerves; nobody publishes a tool that does empirical reachability + cache hygiene.

Decision

Build the next phases (empirical trace harness, mix mob.release --slim) in mob_dev. Extract to its own repo + Hex package once the public API stabilizes. Working name: lean_release.

Why not extract today

The API will reshape as the trace-harness work lands:

  • OtpAudit.report will gain a :trace_data field
  • OtpAudit.audit/2 will gain a :trace_input opt
  • report.strippable_libs will get a confidence tier (static-only vs static+trace vs hardcoded baseline)
  • The Mix task will likely split into audit_otp (read-only), trace_otp (instrument + capture), slim_release (strip + verify)

Anyone consuming a published v0.1 today would hit constant breaking changes. Without external contributors yet, the public commitment buys us nothing and costs friction.

Why extract eventually

  • Mob is a niche framework; the audit tool is general-purpose
  • Burrito + Bakeware ship full OTP unmodified; they'd benefit
  • Nerves uses hardcoded strip lists; an audit-driven approach is better
  • lean_release shows up on Hex, gets discovered by anyone hitting release-size pain
  • A clean public artifact attracts collaborators on the harder empirical-trace work

When to extract

Trigger conditions (any one of these):

  1. The API hasn't changed in 2 consecutive Mob releases
  2. Someone external asks "is this published?"
  3. The empirical-trace harness lands and produces actionable results
  4. We have at least one non-Mob app using the audit (e.g. ran it manually against another Elixir release)

Prior art and references

mix_unused (Hauleth)

https://hexdocs.pm/mix_unused/Mix.Tasks.Compile.Unused.html — community pointer when this work was discussed. Static AST analysis of project source, flags public functions never called.

  • Different layer than OtpAudit. We do app/module reachability across the whole release; mix_unused does dead-public-function detection inside one project. They stack, they don't compete.
  • Blind to dynamic dispatch (apply/2, apply/3, runtime module lookup). Mob and its apps use a fair bit of this — render-tree dispatch, NIF stub lookup, component registry — so expect false positives needing an ignore list.
  • Trial plan when work resumes: install in mob_dev first (least dynamic, highest signal), then square_triangle, then mob itself. Decide whether the ignore-list maintenance pays for itself before wiring it into mix mob.doctor.

Peer Stritzinger / GRiSP — closest prior art for shrinking

Stritzinger has been doing the same thing we're doing, at one-tenth our scale. Headline result (mid-2025, Code BEAM Stockholm): BEAM boots in 16 MB on GRiSP Nano. Reaches an Erlang shell, runs OTP, TCP/IP, USB.

References:

  • https://github.com/grisp/rebar3_grisp — their build plugin. Most useful artifact: shows how they decide which OTP modules to include and how they assemble a stripped ERTS. This is the rebar3 analog of what mix lean_release.slim should do. Read the source when starting the slim-release implementation, not before — it'll inform the design but isn't load-bearing for the audit work.
  • https://www.grisp.org/resources — current talk index. The 2025 Stockholm talk ("Squeezing the BEAM into 16MB" or similar) is the current technical reference; the 2017-era YouTube video is older and superseded.
  • Open question whether lean_release should reuse any GRiSP code or just the techniques. They're rebar3-native; we're Mix-native. Likely a re-implementation, not a port.

Outreach

When lean_release is closer to extraction (per the trigger conditions above), reach out to Stritzinger directly. He's an active community member; the GRiSP work is the closest prior art in Erlang-land. Trading notes is likely valuable both ways — our trace harness (empirical reachability from a running app) is something embedded developers don't need but Phoenix/LiveView shops would use.

Naming

lean_release — descriptive, available on Hex, reads well in mix lean_release.audit / mix lean_release.slim.

Considered + rejected: beam_diet (cute but unprofessional), unship (too clever), release_inspector (boring), otp_audit (we're already calling our internal module that, fine for internal but generic on Hex).

Pre-extraction checklist (when the trigger fires)

  • Move MobDev.OtpAudit → LeanRelease.Audit (or just LeanRelease)
  • Move Mix.Tasks.Mob.AuditOtp → Mix.Tasks.LeanRelease.Audit
  • Add Mix.Tasks.LeanRelease.Slim (strip command)
  • Generic path discovery: look for _build/prod/rel/<app>/lib (standard Mix release output) instead of mob-specific dirs
  • Mob keeps a thin Mix.Tasks.Mob.AuditOtp shim that adds mob's release-tree path to LeanRelease's search list
  • mob_dev gains {:lean_release, "~> 0.1"} dep
  • README + guide on hexdocs
  • Initial Hex release as 0.1.0
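
The generic path discovery item in the checklist could look roughly like the sketch below. It assumes the standard layout named above (_build/prod/rel/<app>/lib); the module name ReleaseLibDirs and the function discover/1 are hypothetical and not part of mob_dev.

```elixir
defmodule ReleaseLibDirs do
  # Hypothetical helper: returns the lib/ directory of every Mix release
  # found under <project_root>/_build/prod/rel/<app>/lib.
  def discover(project_root) do
    [project_root, "_build", "prod", "rel", "*", "lib"]
    |> Path.join()
    |> Path.wildcard()
    # Keep directories only, in case a stray file is named "lib".
    |> Enum.filter(&File.dir?/1)
    |> Enum.sort()
  end
end
```

A Mob shim would then only need to append its own release-tree path to whatever ReleaseLibDirs.discover(File.cwd!()) returns.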

When work resumes — quick start

Before doing anything else:

  1. Read this whole file, including the prior art section above.
  2. Re-run mix mob.audit_otp against a current Mob iOS release to establish the baseline (saved cruft total, current strip list).
  3. Decide whether the next phase is: (a) mix_unused evaluation, (b) empirical-trace harness, or (c) mix mob.release --slim. Pick one; don't fan out.

Progress log

2026-05-11 — Slim pass extracted from MobDev.NativeBuild

MobDev.OtpAudit.Slim now owns the in-place strip pass that mix mob.deploy --slim runs. The hardcoded prefix list is its source of truth (Slim.hardcoded_prefixes/0); per-app mob.exs overrides (:slim sub-keyword with :keep_libs / :drop_libs) let users expand or restrict the strip set without code changes. 22 unit tests against fixture trees pin every phase.

Deliberately deferred: audit-driven auto-expansion of the strip set. A baseline mix mob.audit_otp run against ~/code/pigeon showed audit.strippable_libs flagging exqlite (1.3 MB) as unreachable. That is a genuine false positive: exqlite loads via :erlang.load_nif, which the static call graph can't see. Auto-union is blocked on either (a) tighter foreign-app detection (cross-reference _build/dev/lib/ to distinguish leftover cache from real runtime deps) or (b) trace data from MobDev.OtpTrace providing the empirical reachability signal. Both are higher-leverage next steps than mix_unused.

Same baseline run also surfaced: the audit's looks_like_user_app? heuristic missed obvious foreign apps (pigeon, push_notify, phase2q_lv, phase2q_smoke, pythonx_ios_spike) because the prefix list is hardcoded too narrowly (test_, toy_, mob_test). Tightening that is its own task — should land before the audit-driven slim union since it removes false positives there too.

Headline numbers from the baseline run (against ~/code/pigeon's cached iOS device tree):

| Slice | Size |
| --- | --- |
| Total shipped | 103.0 MB |
| Reachable (kernel/stdlib/etc seed) | 25.5 MB |
| Strippable (audit, 0 reachable) | 17.3 MB |
| Duplicate versions | 8.0 MB |
| Hardcoded baseline only catches | ~28 MB extra (megaco, snmp, compiler, …) |
| Unreachable modules INSIDE partly-used libs | ~52 MB (megaco 64/65 dead, snmp 83/90 dead, …) |

That last row is the prize per-module stripping would unlock, but it's also the riskiest: it requires confident "this module is never called" answers that only trace data provides.

2026-05-11 (cont'd) — Audit improvements: foreign-app allow-list + trace input

Two related improvements landed in close succession after the Slim extraction.

Foreign-app allow-list (:project_deps): OtpAudit.audit/2 now accepts a list of atoms naming the project's runtime deps. Any lib in the bundle that isn't OTP-shipped, isn't Elixir-shipped, isn't the app under test, and isn't in :project_deps is classified as foreign and lands in report.foreign_apps (out of report.strippable_libs). mix mob.audit_otp auto-derives :project_deps from _build/dev/lib/ — Mix's view of what's installed. The legacy name-pattern heuristic (test_/toy_/mob_test/scratch_) is preserved when :project_deps is omitted, for backwards compat. This catches the pigeon / push_notify / phase2q_lv / etc. false-negative cluster the baseline audit surfaced.
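
A minimal sketch of the classification order described above, under stated assumptions: the OTP and Elixir lists are tiny illustrative subsets, and the module name, the return atoms, and the :unclassified fallback are hypothetical rather than the real OtpAudit internals.

```elixir
defmodule ForeignAppSketch do
  # Illustrative subsets only; the real OTP and Elixir lib lists are longer.
  @otp_libs ~w(kernel stdlib erts crypto ssl sasl)a
  @elixir_libs ~w(elixir logger eex mix iex ex_unit)a
  @legacy_prefixes ["test_", "toy_", "mob_test", "scratch_"]

  # Returns :otp | :elixir | :app | :project_dep | :foreign | :unclassified.
  def classify(lib, app, opts \\ []) do
    cond do
      lib in @otp_libs -> :otp
      lib in @elixir_libs -> :elixir
      lib == app -> :app
      true -> classify_other(lib, Keyword.get(opts, :project_deps))
    end
  end

  # No allow-list given: fall back to the legacy name-pattern heuristic.
  defp classify_other(lib, nil) do
    if String.starts_with?(Atom.to_string(lib), @legacy_prefixes),
      do: :foreign,
      else: :unclassified
  end

  # Allow-list given: anything outside it is foreign.
  defp classify_other(lib, deps) when is_list(deps) do
    if lib in deps, do: :project_dep, else: :foreign
  end
end
```

With the allow-list, a lib such as push_notify is classified as foreign even though its name matches none of the legacy prefixes, which is the gap the baseline run exposed.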

Trace input (:trace_input): OtpAudit.audit/2 accepts a runtime-traced module set (MapSet, list, OtpTrace.result, or remote-trace shape — normalizer handles all four) and exposes report.trace_strippable_libs — libs whose modules are entirely absent from the trace. Each lib_report grows :modules_traced and :untraced_modules. The intersection strippable_libs ∩ trace_strippable_libs is the high-confidence strip set; the trace-only difference is the "static graph reaches it but trace says never called" set that unlocks megaco / snmp / diameter / compiler / etc.
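
The two sets described above reduce to plain MapSet arithmetic. The sketch below is illustrative only: ConfidenceSketch and tiers/2 are hypothetical names, and the input lists are made-up examples rather than real audit output.

```elixir
defmodule ConfidenceSketch do
  # static_strippable: libs the static call graph finds unreachable.
  # trace_strippable: libs whose modules never appear in the trace.
  def tiers(static_strippable, trace_strippable) do
    static = MapSet.new(static_strippable)
    traced = MapSet.new(trace_strippable)

    %{
      # Both signals agree: the high-confidence strip set.
      high_confidence: MapSet.intersection(static, traced),
      # Static graph reaches it, but the trace never saw it called.
      trace_only: MapSet.difference(traced, static)
    }
  end
end

tiers = ConfidenceSketch.tiers([:wx, :debugger], [:wx, :megaco, :snmp])
IO.inspect(MapSet.to_list(tiers.high_confidence), label: "both static + trace")
IO.inspect(MapSet.to_list(tiers.trace_only), label: "trace-only")
```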

mix mob.audit_otp --trace-json path/to/trace.json reads a JSON file written by mix mob.trace_otp --json and feeds it through. The CLI report now shows a "Trace-strippable" section split into "both static + trace" (high confidence) and "trace-only" (unlocked by trace), with statically-reachable module counts on the trace-only entries so the user can see how aggressive each strip would be.

Mob_new wheel-filter cherry-pick (parallel work): between the two audit steps, a .so-filter for iOS wheels was cherry-picked from a parallel pigeon-side branch into NativeBuild: copy_ios_safe_project_python_wheels/2 skips wheels containing any .so (cffi, cryptography ship Android-only binaries). 10 tests pinning the filter behaviour came along. Unrelated to the audit work but landed in the same session.

2026-05-12 — First real device trace + safety guardrail

Captured a 60-second trace against pigeon running on iPhone 17 Pro simulator (pigeon_ios_8a4250e9@127.0.0.1), saved at /tmp/pigeon_trace.json. 60s of UI driving → 133 modules / 1287 MFAs touched.

Feeding that trace through OtpAudit.audit/2 + Slim.compute_strip_set/1 surfaced a real safety issue: the trace correctly flagged megaco, snmp, compiler, diameter, mnesia, inets, etc. as never-called (~36 MB of safe new strip targets) — but ALSO flagged crypto, sasl, public_key, asn1 as never-called, which would crash any non-trivial app the moment it tried TLS or completed OTP boot.

Those four are essential-but-rarely-called from the trace's perspective: sasl runs at boot before tracing opens; crypto/ssl/public_key/asn1 fire on TLS handshakes the UI driving didn't exercise.

Fix landed: Slim.@always_keep_libs hardcoded guardrail (kernel stdlib erts elixir logger sasl crypto public_key asn1 ssl). audit_expansion/1 subtracts this set after building the expansion union. The guardrail's scope is strictly the audit-driven expansion; the hardcoded baseline doesn't touch any always-keep lib by design.

User escape hatches preserved:

  • :keep_libs — wins over everything, last word.
  • :drop_libs — adds to the strip set after the guardrail filters, so a user who knows their app has zero TLS / no boot-time sasl ref can force-strip a guarded lib with eyes open.

Verified on real pigeon data: of 16 trace-only strippable candidates, 12 land in the strip set (~36 MB savings), 4 (public_key, crypto, asn1, sasl) are kept by the guardrail.
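
The precedence described above (guardrail filters the audit expansion, :drop_libs is added afterwards, :keep_libs has the last word) can be sketched as follows. This is an assumption-laden illustration, not the code of Slim.compute_strip_set/1: the module name, the compute/3 signature, and the use of strings for lib names are hypothetical.

```elixir
defmodule StripSetSketch do
  # Guardrail list as quoted in this log.
  @always_keep ~w(kernel stdlib erts elixir logger sasl crypto public_key asn1 ssl)

  def compute(baseline, audit_candidates, opts \\ []) do
    keep = MapSet.new(Keyword.get(opts, :keep_libs, []))
    drop = MapSet.new(Keyword.get(opts, :drop_libs, []))

    # The guardrail applies only to the audit-driven expansion.
    expansion =
      audit_candidates
      |> MapSet.new()
      |> MapSet.difference(MapSet.new(@always_keep))

    baseline
    |> MapSet.new()
    |> MapSet.union(expansion)
    # :drop_libs is added after the guardrail, so it can force-strip a guarded lib.
    |> MapSet.union(drop)
    # :keep_libs wins over everything.
    |> MapSet.difference(keep)
  end
end
```

For example, StripSetSketch.compute(["megaco"], ["snmp", "crypto"]) keeps crypto out of the strip set, while passing drop_libs: ["crypto"] puts it back in.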

2026-05-12 (cont'd) — Multi-trace union + exqlite stale-lock guard

Two more cleanups landed before pausing for device-driving:

Multi-trace union (union_trace_jsons/2): MobDev.OtpAudit gained a public helper that reads N trace JSONs and returns a unioned MapSet. Caller supplies an on_read_error/2 callback so mix mob.audit_otp (CLI) can Mix.raise on a typo while the slim build path (NativeBuild.maybe_run_audit) just warns and skips that trace.

mix mob.audit_otp now accepts --trace-json repeated (OptionParser :keep). mob.exs accepts slim: [trace_jsons: ["a.json", "b.json"]] in addition to the single :trace_json (both shapes coexist for back-compat).
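
A small sketch of both input shapes, assuming nothing beyond the standard library. OptionParser's :keep modifier retains repeated switches; the module TraceArgsSketch and its two functions are hypothetical names used only for illustration.

```elixir
defmodule TraceArgsSketch do
  # Collect every --trace-json occurrence from the command line.
  def cli_paths(argv) do
    {opts, _rest, _invalid} = OptionParser.parse(argv, strict: [trace_json: :keep])
    Keyword.get_values(opts, :trace_json)
  end

  # Merge the single (:trace_json) and list (:trace_jsons) shapes of the
  # slim keyword list into one list of paths.
  def config_paths(slim) do
    List.wrap(slim[:trace_json]) ++ List.wrap(slim[:trace_jsons])
  end
end
```

TraceArgsSketch.cli_paths(["--trace-json", "a.json", "--trace-json", "b.json"]) returns both paths in order, and config_paths/1 treats a missing key as an empty list.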

Defensive: when every read fails, the helper returns nil rather than an empty set (an empty set would have let the audit-driven expansion strip every partly-used lib). Pinned in tests.
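
The union-with-callback behaviour, including the nil result when nothing could be read, might be sketched like this. It is not the real union_trace_jsons/2: to stay self-contained it takes the file reader as an injected function (read_fun), and the module name and arity are hypothetical.

```elixir
defmodule TraceUnionSketch do
  # read_fun: path -> {:ok, [module]} | {:error, reason}
  # on_read_error: (path, reason) -> any (raise in a CLI, warn in a build)
  def union(paths, read_fun, on_read_error) do
    sets =
      Enum.flat_map(paths, fn path ->
        case read_fun.(path) do
          {:ok, modules} ->
            [MapSet.new(modules)]

          {:error, reason} ->
            on_read_error.(path, reason)
            []
        end
      end)

    case sets do
      # Nothing readable: return nil so callers can tell "no trace data"
      # apart from "trace saw zero modules".
      [] -> nil
      _ -> Enum.reduce(sets, MapSet.new(), &MapSet.union/2)
    end
  end
end
```

A CLI caller could pass a callback that raises, while a build step could pass one that only logs a warning; in this sketch an empty path list also yields nil.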

Exqlite stale-lock guard: install_exqlite_otp_lib now uses install_exqlite_decision/2 (public for tests) that returns :noop | :stale | {:install, vsn}. Surfaced by pigeon's iOS-device deploy: mix.lock had an exqlite entry left over from a long-removed ecto_sqlite3 dep, but _build/dev/lib/exqlite was empty. Old code crashed in File.cp!; new code logs "[exqlite] stale mix.lock entry — skipping" and proceeds.
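
A pure decision function with the three outcomes above could look like the following sketch. The inputs are assumptions: the log only gives the return shapes, so the arguments (a lock version or nil, plus the build lib directory) and the module name are hypothetical, as is treating a missing directory the same as an empty one.

```elixir
defmodule ExqliteDecisionSketch do
  # lock_vsn: version string from mix.lock, or nil when there is no entry.
  # build_lib_dir: a path such as _build/dev/lib/exqlite.
  def decide(nil, _build_lib_dir), do: :noop

  def decide(lock_vsn, build_lib_dir) when is_binary(lock_vsn) do
    case File.ls(build_lib_dir) do
      # Directory exists and has content: safe to install.
      {:ok, [_ | _]} -> {:install, lock_vsn}
      # Empty or missing directory: the lock entry is stale.
      _ -> :stale
    end
  end
end
```

Keeping the decision separate from the copy step means the stale case can be tested without touching File.cp!.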

What's next

  1. Capture multi-mode traces. Done 2026-05-12. Capture as many windows as you like, point trace_jsons: at all of them. The audit unions them automatically.

  2. Per-module stripping inside partly-used libs. Still the biggest un-claimed prize (~52 MB of dead modules inside libs the static graph keeps alive). Now needs only:

    • Comprehensive multi-trace coverage (multi-trace exists; you just need to capture boot + UI + auth + idle + every screen and feed them all)
    • .app file rewriting — drop stripped modules from the {modules, [...]} list or the application controller will try to load them at boot
    • Backup safety from mix mob.verify_strip (already exists; eager-loads every shipped .beam)

    Material regression risk: defer until the multi-trace flow has driven a few apps end-to-end.

  3. mix_unused evaluation — still orthogonal, still anytime.

  4. Drive the flow. The bonus territory is done; what's left is exercise. Capture multiple traces against real apps, set slim: [audit: true, trace_jsons: [...]], deploy, watch for crashes / unexpected strips. Bugs surfaced this way are the next round of work.
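
The .app file rewriting step from item 2 could be sketched as below. This is a minimal illustration, assuming a conventional single-term .app file; AppFileSketch and drop_modules/2 are hypothetical names, and nothing like this exists in mob_dev yet.

```elixir
defmodule AppFileSketch do
  # Rewrites an OTP .app file in place, removing `stripped` modules from
  # the {modules, [...]} entry. Returns the modules that remain.
  def drop_modules(app_path, stripped) do
    {:ok, [{:application, name, props}]} = :file.consult(String.to_charlist(app_path))

    {:modules, mods} = List.keyfind(props, :modules, 0, {:modules, []})
    kept = mods -- stripped

    # keystore keeps the entry at its original position in the list.
    new_props = List.keystore(props, :modules, 0, {:modules, kept})

    contents =
      "~tp.~n"
      |> :io_lib.format([{:application, name, new_props}])
      |> IO.chardata_to_string()

    File.write!(app_path, contents)
    kept
  end
end
```

Running a verification pass such as mix mob.verify_strip after a rewrite like this would remain the safety net, since the sketch does not check that the remaining modules still load.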

How to use trace-augmented slim today

cd ~/code/<mob_app>
mix mob.connect --no-iex          # discover node, set up tunnels
mix mob.trace_otp \
  --remote <node>@127.0.0.1 \
  --duration 60000 \
  --json /tmp/mob_trace.json      # drive the app during the window

Then in mob.exs:

config :mob_dev,
  slim: [
    audit: true,
    trace_json: "/tmp/mob_trace.json",
    # Optional: force-keep if the trace's coverage is incomplete:
    keep_libs: ["specific_lib_you_need"],
    # Optional: force-strip a guarded lib if you're sure:
    drop_libs: ["crypto"]  # only do this if you have ZERO TLS
  ]

Inspect via mix mob.audit_otp --trace-json /tmp/mob_trace.json to preview the audit + trace classification before letting Slim strip.

Known caveats

  • Pigeon's iOS device build (physical iPhone) currently fails on MobDev.NativeBuild.install_exqlite_otp_lib/1 because pigeon doesn't depend on exqlite (only mix mob.new-generated projects do). The slim work above used the simulator deploy, which doesn't hit that path. Filed as future work — guard install_exqlite_otp_lib with File.exists? so non-exqlite projects can deploy to physical iOS too.