https://notdesigned.github.io/2026/04/20/Error-Entropy/?
Introduction

https://arxiv.org/pdf/2510.04067

The paper analyzes a phenomenon in which the cross-entropy (CE) loss improves more slowly as the model is scaled up. The authors break the CE loss into three parts: Err