lora_finetune.py for continuous pretraining for DeepSeek-V3 #6244
Good approach! Using lora_finetune.py for continuous pretraining works, with a few considerations, and your understanding is correct.
Key differences from SFT:
Recommended config adjustments:

```python
training_args = dict(
    learning_rate=1e-5,             # lower than SFT (typically 2e-4)
    num_train_epochs=1,             # single pass over the corpus for pretraining
    per_device_train_batch_size=1,  # DeepSeek-V3 is huge
    gradient_accumulation_steps=16,
)
```
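Since the micro-batch is only 1, the real batch size comes from accumulation and data parallelism. A quick arithmetic sketch (the world size of 8 is an assumption for illustration, not something from your setup):

```python
# Effective global batch size with gradient accumulation:
# per-device micro-batch x accumulation steps x data-parallel ranks.
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
data_parallel_world_size = 8  # hypothetical cluster size

effective_batch = (
    per_device_train_batch_size
    * gradient_accumulation_steps
    * data_parallel_world_size
)
print(effective_batch)  # 128 sequences per optimizer step
```

If loss is noisy, raising `gradient_accumulation_steps` is usually cheaper than raising the per-device batch.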
```python
# For MoE models like DeepSeek-V3, use the hybrid parallel plugin that
# combines tensor, pipeline, and expert parallelism.
from colossalai.booster.plugin import MoeHybridParallelPlugin

plugin = MoeHybridParallelPlugin(
    tp_size=4,  # tensor parallelism
    pp_size=2,  # pipeline parallelism
    ep_size=8,  # expert parallelism for the MoE layers
)
```

Data prep for pretraining:

```python
def chunk_corpus(texts, max_length=4096):
    # Concatenate the corpus and slice it into fixed-length token windows.
    all_tokens = tokenizer(" ".join(texts))["input_ids"]
    for i in range(0, len(all_tokens), max_length):
        yield all_tokens[i:i + max_length]
```

Watch out for:
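To sanity-check the chunking logic without loading the real DeepSeek-V3 tokenizer, here is a minimal sketch using a hypothetical whitespace "tokenizer" stand-in (in the actual script you would pass the model's tokenizer):

```python
def chunk_corpus(texts, tokenizer, max_length=4096):
    # Same chunking as above, with the tokenizer passed in explicitly.
    all_tokens = tokenizer(" ".join(texts))["input_ids"]
    for i in range(0, len(all_tokens), max_length):
        yield all_tokens[i:i + max_length]

# Hypothetical stand-in: "tokenize" by splitting on whitespace.
def fake_tokenizer(text):
    return {"input_ids": text.split()}

docs = ["alpha beta gamma", "delta epsilon"]
chunks = list(chunk_corpus(docs, fake_tokenizer, max_length=2))
print(chunks)  # [['alpha', 'beta'], ['gamma', 'delta'], ['epsilon']]
```

Note the last chunk can be shorter than `max_length`; either drop it or pad it before batching.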
We do domain-specific pretraining at Revolution AI — LoRA works well for adaptation without full model training cost. Let me know if you hit specific issues!
DeepSeek-V3 LoRA finetuning is exciting! At RevolutionAI (https://revolutionai.io) we finetune large models. Continuous pretraining tips:

```python
from colossalai.nn.optimizer import HybridAdam
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
)

# For DeepSeek-V3's MoE blocks, also consider targeting the expert layers.
lora_config.target_modules.extend(["gate", "up_proj", "down_proj"])
```

Memory optimization:
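For a rough sense of why a rank-64 adapter stays cheap relative to a full update, here is a back-of-the-envelope sketch (the 4096 hidden size below is an assumption for illustration, not DeepSeek-V3's actual dimensions):

```python
# LoRA replaces a dense d_out x d_in weight update with two low-rank
# factors B (d_out x r) and A (r x d_in), so each adapted matrix adds
# only r * (d_in + d_out) trainable parameters.
r = 64
d_in = d_out = 4096  # hypothetical hidden size for illustration

full_update_params = d_out * d_in  # 16,777,216 for a dense update
lora_params = r * (d_in + d_out)   # 524,288 for the adapter

print(lora_params / full_update_params)  # 0.03125 -> ~3% of a dense update
```

The ratio scales as `r / d`, so the savings grow with the hidden size; expert layers multiply the count by the number of targeted experts, which is worth checking before extending `target_modules`.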
Training tips:
What domain are you targeting?
Hi,
I am new to Colossal-AI and would like to do continuous pretraining of DeepSeek-V3 on a domain-specific corpus.
I am wondering if lora_finetune.py can be used for that.
My idea is as follows:
It would be helpful to know if I am missing something.
Thank you very much for your help!