<img width="551" alt="Image" src="https://github.com/user-attachments/assets/fd14a657-547c-40ee-83f0-2b8ce0285da6" />
How can we tweak/ablate the attention mechanism to understand the router's distribution without the effects of attention?