Merged
22 commits
de60f70
git subrepo push --branch=rfc-for-monorepo-license-and-update shared/…
svwingerden Jun 25, 2025
a017b64
Merge branch 'main' into williamsnell/eng-512
williamsnell Jun 30, 2025
676d63c
Merge branch 'williamsnell/move-tests-to-triangle' of https://github.…
GarretteBaker Jul 18, 2025
f3a787e
Merge branch 'main' into williamsnell/move-tests-to-triangle
williamsnell Jul 21, 2025
335b190
Merge branch 'main' into williamsnell/move-tests-to-triangle
GarretteBaker Jul 22, 2025
ab0b9e3
Disable test test_accuracy_rrr as it is not reliable (#1139)
BorisTheBrave Oct 30, 2025
18cf05a
Add docs back in (#1147)
svwingerden Nov 4, 2025
e0b6a2a
Take the axe to TPUs (#1143)
svwingerden Nov 7, 2025
05bff48
Adam/eng 832 enable ruff linter for shared (#1174)
BorisTheBrave Nov 10, 2025
c1099b9
Remove typing.Dict, List etc. (#1198)
BorisTheBrave Nov 11, 2025
ea41235
Weight Restrictions Overhaul (#1181)
williamsnell Nov 13, 2025
f5a6c29
ENG-882 docs autobuild (#1465)
Jan 19, 2026
65fc194
Try ylecun/mnist (#1494)
williamsnell Jan 27, 2026
25f39d7
Refactor sampler metrics for sgld.py (#1276)
williamsnell Mar 25, 2026
1cbfb96
Count sketch metrics (#1757)
Apr 9, 2026
b485644
Johan/eng 1086 update devinterp to a lightweight aether port (#1791)
svwingerden Apr 21, 2026
ccba7f7
change reqs, install, deps, etc. for devinterp port (#1837)
svwingerden Apr 22, 2026
cba1b2d
readme, docs, notebook fixes
svwingerden Apr 23, 2026
abd7979
undo ipynb output
svwingerden Apr 23, 2026
78df3ad
actually update the docs, and some minor revisions
svwingerden Apr 23, 2026
c4c4062
once more
svwingerden Apr 23, 2026
f28e5a2
Merge branch 'main' into stan/rfc-for-monorepo-docs
svwingerden Apr 23, 2026
73 changes: 41 additions & 32 deletions README.md
@@ -1,19 +1,14 @@
# DevInterp

[![PyPI version](https://badge.fury.io/py/devinterp.svg)](https://badge.fury.io/py/devinterp) ![Python version](https://img.shields.io/pypi/pyversions/devinterp) ![Contributors](https://img.shields.io/github/contributors/timaeus-research/devinterp) [![Docs](https://img.shields.io/badge/Read_the_Docs!-white?style=flat&logo=Read-the-Docs&logoColor=black)](https://devinterp.timaeus.co/)
[![PyPI version](https://badge.fury.io/py/devinterp.svg)](https://badge.fury.io/py/devinterp) ![Python version](https://img.shields.io/pypi/pyversions/devinterp) ![Contributors](https://img.shields.io/github/contributors/timaeus-research/devinterp) [![Docs](https://img.shields.io/badge/docs-devinterp.timaeus.co-blue?style=flat)](https://devinterp.timaeus.co/)


## A Python Library for Developmental Interpretability Research

DevInterp is a python library for conducting research on developmental interpretability, a novel AI safety research agenda rooted in Singular Learning Theory (SLT). DevInterp proposes tools for detecting, locating, and ultimately _controlling_ the development of structure over training.

[Read more about developmental interpretability](https://www.lesswrong.com/posts/TjaeCWvLZtEDAS5Ex/towards-developmental-interpretability).
DevInterp is [Timaeus](https://timaeus.co)' open-source research package, built to let external researchers do SLT/DevInterp-style research on large language models.

## Features

- **SGLD Sampling** with per-token loss storage to xarray/Zarr
- **Local Learning Coefficient (LLC)** estimation from sampling results
- **Susceptibilities** measuring first-order posterior response to data perturbations, localized on model components
- **Susceptibilities** measuring first-order posterior response to data perturbations, optionally restricted to specific model components
- **Bayesian Influence Functions (BIF)** as posterior correlations (or covariances) between per-sample losses
- **Weight restrictions** for sampling over parameter subsets (e.g., individual attention heads)

@@ -27,51 +22,51 @@ uv add devinterp

## Example

See [`examples/quickstart.py`](examples/quickstart.py) for a runnable script that computes LLC and susceptibilities on Qwen2.5-0.5B.
See the [Quickstart Notebook](examples/quickstart.ipynb) ([open in Colab](https://colab.research.google.com/github/timaeus-research/devinterp/blob/main/examples/quickstart.ipynb)) or the [Quickstart Script](examples/quickstart.py) for examples of how to compute LLCs and susceptibilities on Qwen2.5-0.5B (GPU required).

## Quick Start

### Compute the Local Learning Coefficient
### Sampling with Observables

```python
from devinterp.slt.llc import llc
from devinterp.slt.sampling import sample

result = llc(
tree = sample(
model=model,
dataset=dataset, # HuggingFace Dataset with "input_ids"
observables={"train": dataset},
dataset=train_data,
observables={
"train": train_data,
"code": (code_data, 5), # (dataset, batches_per_draw)
},
lr=0.001,
n_beta=30,
num_chains=4,
num_draws=200,
)

print(result["llc_mean"]) # scalar LLC
print(result["llc_per_chain"]) # (num_chains,) per-chain LLC
print(result["loss_trace"]) # (num_chains, num_steps) per-step loss, num_steps = num_draws * num_steps_bw_draws + num_burnin_steps
# tree is an xr.DataTree backed by Zarr with full per-token loss traces
```

### Sample with Observables
### Computing the Local Learning Coefficient

```python
from devinterp.slt.sampling import sample
from devinterp.slt.llc import llc

tree = sample(
result = llc(
model=model,
dataset=train_data,
observables={
"train": train_data,
"code": (code_data, 5), # (dataset, batches_per_draw)
},
dataset=dataset, # HuggingFace Dataset with "input_ids"
observables={"train": dataset},
lr=0.001,
n_beta=30,
num_chains=4,
num_draws=200,
)
# tree is an xr.DataTree backed by Zarr with full per-token loss traces

print(result["llc_mean"]) # scalar LLC
print(result["llc_per_chain"]) # (num_chains,) per-chain LLC
print(result["loss_trace"]) # (num_chains, num_steps) per-step loss, num_steps = num_draws * num_steps_bw_draws + num_burnin_steps
```
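
As a rough illustration of how the `loss_trace` shape and the LLC relate, the sketch below builds a synthetic trace and applies the standard estimator from the LLC paper, `n_beta * (mean sampled loss - initial loss)`. The values of `num_steps_bw_draws` and `num_burnin_steps`, the synthetic data, and the helper `estimate_llc` are assumptions made for illustration. This is not the library's implementation, and `llc()` may differ in details such as which steps it averages over.

```python
import numpy as np

# Hypothetical settings: the names follow the README comment, the values are made up.
num_chains, num_draws = 4, 200
num_steps_bw_draws = 2   # assumed value
num_burnin_steps = 100   # assumed value
n_beta = 30.0

# Total steps per chain, following the formula in the loss_trace comment.
num_steps = num_draws * num_steps_bw_draws + num_burnin_steps


def estimate_llc(loss_trace, init_loss, n_beta, num_burnin_steps=0):
    """Per-chain estimate: n_beta * (mean post-burn-in loss - init_loss)."""
    post_burnin = np.asarray(loss_trace, dtype=float)[:, num_burnin_steps:]
    return n_beta * (post_burnin.mean(axis=1) - init_loss)


# Synthetic stand-in for result["loss_trace"]: sampled loss sits about 0.1
# above the loss at the initial weights, plus a little Gaussian noise.
rng = np.random.default_rng(0)
init_loss = 2.0
loss_trace = init_loss + 0.1 + 0.01 * rng.standard_normal((num_chains, num_steps))

llc_per_chain = estimate_llc(loss_trace, init_loss, n_beta, num_burnin_steps)
llc_mean = float(llc_per_chain.mean())
print(loss_trace.shape, llc_per_chain.shape, round(llc_mean, 2))
```

With these made-up numbers the trace has 500 steps per chain, and the estimate lands near `n_beta * 0.1`. Real traces are rarely this clean, which is why the per-chain values are worth inspecting alongside the mean.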

### Compute Susceptibilities
### Computing Susceptibilities

```python
from devinterp.slt.susceptibilities import susceptibilities
@@ -96,7 +91,7 @@ result = susceptibilities(
`create_param_masks` supports 85+ HuggingFace model types and TransformerLens.
Restriction patterns: `"full"`, `"l0"`, `"l0h1"`, `"l0g0"` (GQA group), `"l0 attn"`, `"l0 mlp"`, `"embed"`, `"unembed"`.

### Compute BIF
### Computing Bayesian Influence Functions

```python
from devinterp.slt.bif import bif
@@ -172,16 +167,24 @@ llc_value = float(result["llc_mean"])

## Hyperparameter selection

All sampling is sensitive to hyperparameters. See our [Sampling Hyperparameter Guide](https://timaeus.co/research/2026-04-21-sampling-guide).
All sampling is sensitive to hyperparameters. Our [Sampling Hyperparameter Guide](https://timaeus.co/research/2026-04-21-sampling-guide) covers the three primary knobs — step size (`lr`), inverse temperature (`n_beta`), and localization strength (`localization`) — along with burn-in, steps between draws, and chain count, and walks through diagnosing common failure modes (non-convergence, spikes, NaNs, low signal-to-noise) from the loss traces.
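
A minimal sketch of what checking a loss trace for some of these failure modes could look like is given here. It assumes a NumPy array of shape `(num_chains, num_steps)`. The function `diagnose_loss_trace`, its thresholds, and its heuristics are hypothetical and are not taken from the guide or from the library. Signal-to-noise is left out because it depends on the observable being measured.

```python
import numpy as np


def diagnose_loss_trace(loss_trace, spike_factor=10.0, drift_tol=0.05):
    """Flag NaNs/infs, spikes, and drift in a (num_chains, num_steps) trace.

    Thresholds are illustrative defaults, not recommended values.
    """
    trace = np.asarray(loss_trace, dtype=float)

    # NaNs or infs anywhere in the trace.
    has_nonfinite = not bool(np.isfinite(trace).all())

    # Spikes: a step-to-step jump far larger than the chain's typical jump.
    jumps = np.abs(np.diff(trace, axis=1))
    typical = np.nanmedian(jumps, axis=1, keepdims=True)
    has_spikes = bool((jumps > spike_factor * typical).any())

    # Non-convergence: the last quarter still differs from the third quarter.
    q = trace.shape[1] // 4
    early = np.nanmean(trace[:, 2 * q:3 * q], axis=1)
    late = np.nanmean(trace[:, 3 * q:], axis=1)
    rel_drift = np.abs(late - early) / np.maximum(np.abs(early), 1e-12)
    still_drifting = bool((rel_drift > drift_tol).any())

    return {
        "has_nonfinite": has_nonfinite,
        "has_spikes": has_spikes,
        "still_drifting": still_drifting,
    }


# Synthetic traces standing in for real sampler output.
rng = np.random.default_rng(0)
good = 2.0 + 0.01 * rng.standard_normal((4, 500))

spiky = good.copy()
spiky[1, 100] += 5.0          # one large jump in chain 1

ramp = good + np.linspace(0.0, 1.0, 500)  # loss still rising at the end

with_nan = good.copy()
with_nan[0, 10] = np.nan

print(diagnose_loss_trace(good))
```

Checks like these only flag candidates for a closer look; plotting the per-chain traces remains the more reliable way to judge whether a given `lr` or `n_beta` is reasonable.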


## Further Reading

- [You're Measuring Model Complexity Wrong](https://www.lesswrong.com/posts/6g8cAftfQufLmFDYT/you-re-measuring-model-complexity-wrong) - Introduction to LLC and phase transitions (2024)
Blog Posts:
- [Spectroscopy at Scale: Finding Interpretable Structure in Pythia-1.4B](https://timaeus.co/research/2026-04-21-spectroscopy-main) (2026)
- [Guide for Sampling Hyperparameter Selection](https://timaeus.co/research/2026-04-21-sampling-guide) (2026)

Papers:
- [Structural Inference with Susceptibilities](https://arxiv.org/abs/2504.18274) (2025)
- [Towards Spectroscopy: Susceptibility Clusters in Language Models](https://arxiv.org/abs/2601.12703) (2026)
- [The Local Learning Coefficient: A Singularity-Aware Complexity Measure](https://arxiv.org/pdf/2308.12108) (2023)
- [Algebraic Geometry and Statistical Learning Theory](https://www.cambridge.org/core/books/algebraic-geometry-and-statistical-learning-theory/9C8FD1BDC817E2FC79117C7F41544A3A#fndtn-information) Watanabe (2009)

Background:
- [Algebraic Geometry and Statistical Learning Theory](https://www.cambridge.org/core/books/algebraic-geometry-and-statistical-learning-theory/9C8FD1BDC817E2FC79117C7F41544A3A#fndtn-information), Watanabe (2009)
- [Interpreting the Ising Model](https://timaeus.co/research/2026-04-21-spectroscopy-ising) (2026)
- [You're Measuring Model Complexity Wrong](https://www.lesswrong.com/posts/6g8cAftfQufLmFDYT/you-re-measuring-model-complexity-wrong) (2024)

## Credits & Citations

@@ -201,3 +204,9 @@ If this package was useful in your work, please cite it as:
howpublished = {\url{https://github.com/timaeus-research/devinterp}},
}
```

The authors would like to thank Zach Furman, Matthew Farrugia-Roberts, Rohan Hitchcock, and Edmund Lau for useful advice.

## About Timaeus

Timaeus is a non-profit advancing AI safety through research in Singular Learning Theory (SLT). We use SLT to understand how training data shapes AI behavior, combining deep mathematical insights from algebraic geometry and statistical physics with empirical research to develop interpretability tools for how capabilities and values emerge during neural network training. This foundational work enables us to build interventions that ensure models are aligned with human values.