Add experiments/mnist VPD family (memorization-vs-generalization study) #890

Draft
lee-goodfire wants to merge 2 commits into feature/silico-integration from silico/mnist-vpd-memorization
@lee-goodfire

Collaborator

Adds a reusable MNIST MLP experiment family for adVersarial Parameter Decomposition (VPD), mirroring experiments/resid_mlp, used for the memorization-vs-generalization decomposition study (Silico issue 6).

What's added

  • param_decomp_lab/experiments/mnist/
    • models.py — MnistMLP (named fc_in/fc_h.*/fc_out Linears, GELU) + train config + MnistTargetRunInfo
    • data.py — raw-tensor MNIST loader, deterministic label-corruption/subsample builder, full-batch memorized-set iterator (drops the partial final batch so VPD's per-datapoint persistent-PGD adversary stays batch-uniform)
    • train_mnist.py — pd-mnist-pretrain CLI (label-noise sweep + size ladder)
    • run.py — pd-mnist CLI, categorical KL reconstruction path (recon_loss_kl) + run_batch_first_element, SavedMnistRun
  • pd-mnist-pretrain / pd-mnist entry points in param_decomp_lab/pyproject.toml
  • Registered the existing generic UnmaskedReconLoss as a YAML eval metric (gives the kl_unmasked faithfulness check on categorical, non-LM targets; CEandKLLosses is LM-only)
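
For reviewers who want the data.py behaviour in miniature, here is a rough stdlib-only sketch of the two ideas described above: seeded label corruption and a memorized-set iterator that skips the partial final batch. The names `corrupt_labels` and `memorized_batches`, their signatures, and the choice to always flip a corrupted label to a different class are assumptions made for illustration; the actual code in this PR may differ.

```python
import random
from itertools import islice
from typing import Iterator, Sequence


def corrupt_labels(
    labels: Sequence[int], noise_frac: float, seed: int, n_classes: int = 10
) -> list[int]:
    """Hypothetical helper: reassign a fixed fraction of labels, deterministically."""
    rng = random.Random(seed)  # local RNG, so the result depends only on `seed`
    out = list(labels)
    n_corrupt = round(noise_frac * len(out))
    for i in rng.sample(range(len(out)), n_corrupt):
        # Assumption: a corrupted label is drawn uniformly from the *other* classes.
        out[i] = rng.choice([c for c in range(n_classes) if c != out[i]])
    return out


def memorized_batches(n_examples: int, batch_size: int) -> Iterator[list[int]]:
    """Hypothetical helper: yield index batches forever, in a fixed order.

    The partial final batch is skipped, so every batch has the same size and
    the same example always sits at the same position in the same batch.
    """
    n_full = n_examples // batch_size
    if n_full == 0:
        raise ValueError("batch_size exceeds the dataset size")
    while True:
        for b in range(n_full):
            yield list(range(b * batch_size, (b + 1) * batch_size))


# Usage: 25 examples, 20% label noise, batches of 8 (index 24 is never visited).
labels = [i % 10 for i in range(25)]
noisy = corrupt_labels(labels, noise_frac=0.2, seed=0)
batches = list(islice(memorized_batches(len(labels), batch_size=8), 4))
```

Keeping batch composition fixed is what lets per-datapoint adversary state line up with the same examples on every pass; the cost is that the trailing `n_examples % batch_size` examples are never seen.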

Result (Silico issue 6)

At matched decomposition faithfulness, a pure memorizer decomposes into ~130x more live components (and ~240x more per input) than a generalizer, but the components are distributed/redundant, not per-example (density, ablation, and specimen evidence).
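
As a reading aid for "matched decomposition faithfulness": on a categorical target the natural check is a KL divergence between the target model's class distribution and the reconstructed model's. The sketch below is a stdlib-only illustration, not the recon_loss_kl from this PR; the function names and the KL(target || reconstruction) direction are assumptions.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vector of class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_from_logits(target_logits: Sequence[float], recon_logits: Sequence[float]) -> float:
    """KL(target || recon) for one example (direction is an assumption)."""
    log_p = log_softmax(target_logits)
    log_q = log_softmax(recon_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl(
    target_batch: Sequence[Sequence[float]], recon_batch: Sequence[Sequence[float]]
) -> float:
    """Batch-mean KL: 0 when the reconstruction matches the target exactly."""
    per_example = [kl_from_logits(t, r) for t, r in zip(target_batch, recon_batch)]
    return sum(per_example) / len(per_example)


# Usage: made-up logits for two examples over three classes.
target = [[2.0, 0.5, -1.0], [0.0, 0.0, 3.0]]
recon = [[1.5, 0.7, -0.8], [0.2, -0.1, 2.5]]
kl_same = mean_kl(target, target)
kl_diff = mean_kl(target, recon)
```

Comparing component counts only makes sense when this number is about equal across the runs being compared, since a looser reconstruction can get away with fewer components.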

🤖 Generated with Claude Code

lee-goodfire and others added 2 commits June 24, 2026 02:03
…decomposition glue)

New reusable experiment family mirroring resid_mlp for the MNIST
memorization-vs-generalization study:
- models.py: MnistMLP (fc_in/fc_h.*/fc_out, GELU) + train config + run info
- data.py: raw MNIST loader, deterministic label-corruption/subsample builder,
  infinite memorized-set batch iterator
- train_mnist.py: pd-mnist-pretrain CLI (label-noise sweep + size ladder)
- run.py: pd-mnist CLI, categorical KL recon path (recon_loss_kl) + run_batch_first_element
- register UnmaskedReconLoss as a YAML eval metric (generic kl_unmasked check;
  CEandKLLosses is LM-only)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ay batch-uniform

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>