Disentangling Continued Pre-training: Attention-Driven Routing and Semantic Hub Preservation in Language Adaptation
Khanh-Tung Tran, Vinh-Khanh Tran, Barry O'Sullivan, Hoang D. Nguyen
Accepted to ACL 2026
Continued Pre-training (CPT) effectively enables Large Language Models (LLMs) to acquire new target-language capabilities, yet the mechanisms underlying this second-language adaptation remain poorly understood. In this work, we investigate how CPT adapts model representations to accommodate new languages. Our extensive experiments reveal that second-language abilities emerge through a selective adaptation mechanism: task-solving capabilities are preserved in the semantic hub, while interface layers retarget to accommodate shifted token distributions. Through layer-swapping experiments, we demonstrate that semantic understanding can be surgically transferred between base and CPT models while maintaining cross-lingual functionality (e.g., swapping 50% of the parameters reduces performance by only 0.7%). Furthermore, we establish that attention components route language adaptation: they undergo larger parameter changes than FFN components, correlate more strongly with language-specific neurons, and, unlike FFN components, degrade performance substantially when surgically replaced. Overall, our work provides a mechanistic understanding of CPT, guiding future work on efficient strategies for language adaptation.
Evaluate cross-lingual sentence retrieval on base and CPT models:
python sentence-retrieval/sr_experiment.py \
--model_path <model_path> \
--base <base_language> \
--target <target_language>

Run the layer specialization analysis:

python layer-specialize/layer_specialize.py

Evaluate downstream tasks with lm-evaluation-harness (note: we extend it to include the SIB200 task in the lm_eval_sib200 folder):
lm_eval --model hf \
--model_args pretrained=<model> \
--tasks <task>

Compute weight differences for interface and semantic hub layers:

python weight-diff/int_and_sem.py

Correlate the results with downstream performance:
python downstream-correlation/downstream_correlation.py

Swap layers between the base and CPT models (sketched below):

python layer-swap/layer-swap.py

Evaluate to verify that:
- Swapping semantic hub layers results in only a small performance drop
- Swapping the same number of randomly selected layers results in a severe performance drop
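A layer swap of this kind amounts to copying whole transformer blocks from one checkpoint into the other. Below is a minimal sketch with Hugging Face transformers, assuming a Llama-style model whose blocks live under model.model.layers and two checkpoints that share the same architecture; the paths and layer indices are placeholders, not the repository's settings:

import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint paths; the repository scripts take these as arguments.
base = AutoModelForCausalLM.from_pretrained("path/to/base", torch_dtype=torch.bfloat16)
cpt = AutoModelForCausalLM.from_pretrained("path/to/cpt", torch_dtype=torch.bfloat16)

# Hypothetical choice of middle ("semantic hub") blocks to transplant from the
# base model into the CPT model; the paper selects the indices from its analysis.
swap_layers = range(8, 24)

with torch.no_grad():
    for i in swap_layers:
        # Copy the entire transformer block, overwriting the CPT weights.
        cpt.model.layers[i].load_state_dict(base.model.layers[i].state_dict())

cpt.save_pretrained("swapped_model/")

The saved checkpoint can then be evaluated with sentence retrieval as the <swapped_model_path>: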
python sentence-retrieval/sr_experiment.py \
--model_path <swapped_model_path> \
--base <base_language> \
--target <target_language>

Compare attention and FFN weight changes between the base and CPT models:

python weight-diff/attn_vs_ffn.py \
--base-model <base-model> \
--ckpt-model <ckpt-model>

Tokenize the multilingual OSCAR corpus for training:
python lsn/load_oscar.py \
--languages en,zh,ga \
--model-id model_name \
--tokenizer path/to/tokenizer \
--output-dir oscar_ids/

Get activation patterns:
python lsn/activation.py \
-m <model> \
-i <id_path> \
-t <type> \
-s <save_folder> \
-n <name>

Detect language-specific neurons (LSNs) by analyzing activation patterns:
python lsn/identify.py \
-t <tag> \
-l <lang>

Monitor LSN activations throughout the training trajectory:
python lsn/lsn_activation_cpt.py \
--model <model> \
--id_path <id_path> \
--type <type> \
--save_folder <save> \
--name <name> \
--mask_file <mask> \
--lang <lang>

Compute LSN weight changes between the base and CPT models:

python weight-diff/gate_lsn.py \
--base-model path/to/base \
--target-model path/to/cpt \
--mask-file masks.json \
--lang target_lang \
--output results/output.json

Analyze the correlation between the following (a minimal sketch follows the list):
- Attention weight changes in interface layers and LSN activation across checkpoints.
- LSN weight changes and LSN activation across checkpoints.
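Both checks reduce to correlating two per-checkpoint series. Below is a minimal sketch, assuming each quantity has already been summarised as one scalar per saved checkpoint; the numbers are placeholders, not measured values:

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Placeholder per-checkpoint series (one value per saved CPT checkpoint).
attn_weight_change = np.array([0.00, 0.12, 0.18, 0.22, 0.25, 0.27])  # e.g. mean attention weight change in interface layers
lsn_activation = np.array([0.05, 0.21, 0.34, 0.41, 0.46, 0.48])      # e.g. mean activation of target-language LSNs

r, p = pearsonr(attn_weight_change, lsn_activation)
rho, p_s = spearmanr(attn_weight_change, lsn_activation)
print(f"Pearson r = {r:.3f} (p = {p:.3g}); Spearman rho = {rho:.3f} (p = {p_s:.3g})")

The repository's script below runs the full analysis: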
python3 activation-correlation/activation-correlation.py

Swap individual components between the base and CPT models:

python component-swap/attn_int_swap.py
python component-swap/ffn_lsn_swap.py

Run sentence retrieval on swapped models to verify:
- Semantic preservation for the FFN swap
- Degradation for the attention swap
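Cross-lingual sentence retrieval of this kind is typically scored by encoding a parallel corpus in both languages, pooling hidden states from a chosen layer, and measuring top-1 accuracy of nearest-neighbour matching under cosine similarity. The sketch below illustrates that procedure with placeholder sentences, a placeholder layer index, and a placeholder model path; it is not the repository's sr_experiment.py implementation:

import torch
from transformers import AutoModel, AutoTokenizer

model_path = "swapped_model/"  # placeholder; any base, CPT, or swapped checkpoint
tok = AutoTokenizer.from_pretrained(model_path)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # many causal-LM tokenizers ship without a pad token
model = AutoModel.from_pretrained(model_path, output_hidden_states=True).eval()

# Placeholder parallel sentences (base language, target language).
base_sents = ["The cat is sleeping.", "Water boils at one hundred degrees."]
target_sents = ["Tá an cat ina chodladh.", "Fiuchann uisce ag céad céim."]

def embed(sentences, layer=16):
    # Mean-pool hidden states from one layer; the layer index is an assumption.
    batch = tok(sentences, return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = model(**batch).hidden_states[layer]
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

src, tgt = embed(base_sents), embed(target_sents)
sims = torch.nn.functional.cosine_similarity(src.unsqueeze(1), tgt.unsqueeze(0), dim=-1)
accuracy = (sims.argmax(dim=1) == torch.arange(len(base_sents))).float().mean()
print(f"Top-1 retrieval accuracy: {accuracy.item():.2%}")

Semantic preservation should show up as high retrieval accuracy after the FFN swap, while the attention swap should produce a sharp drop.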