Popular repositories
-
trait-inoculation (Public)
Inoculation prompting experiment: trait distillation + checkpoint evaluation (French/Playful traits)
Python 1
-
rl-misalignment-envs (Public)
RL environments that produce emergent misalignment in LLMs: replications of Sycophancy→Subterfuge, Goal Misgeneralization, and Natural EM
Python 1
-
openweights (Public, forked from longtermrisk/openweights)
A Python SDK for LLM fine-tuning and inference on RunPod infrastructure
Python 1
-
shaping-motiv-expl (Public)
Shaping motivations experiment: disentangling mechanisms that prevent emergent misalignment
Python 1
-
claudex-demo (Public)
Demo: Gradient Leading Terms in Attention-Only Transformers (Im et al., ICLR 2026)
Python
-


