Under testing. (This framework is currently for internal team use; parts of the code were modified for this upload, so there may be minor bugs. Testing is in progress.)

This is a text pretraining framework for LLaDA models, modified from the MMaDA codebase.
Features:
- Text-only training pipeline
- Distributed training support with DeepSpeed and Accelerate
- YAML-based configuration
- Memory-efficient training options
Installation:

```bash
pip install -r requirements.txt
```

Quick start:

```bash
# Update paths in configs/llada_pretraining.yaml
bash scripts/train.sh
```

Configuration:

Edit configs/llada_pretraining.yaml:
```yaml
model:
  pretrained_model_path: ".../LLaDA-8B-Base/"
  # LLaDA-specific configuration
  llada_config:
    gradient_checkpointing: false  # disable gradient checkpointing
    new_vocab_size: 126464
    # Add other LLaDA-specific configs here if needed

dataset:
  params:
    train_shards_path_or_url: "path/to/data"

training:
  batch_size: 16
  max_train_steps: 100000
  mixed_precision: "bf16"
```

Accelerate config:

You can also use the provided configuration files in accelerate_configs/ for different hardware and distributed setups:
- `1_gpu.yaml` - Single GPU
- `1_node_only.yaml` - Single node, single process (CPU or GPU)
- `1_node_8_gpus_deepspeed_zero1.yaml` - 8 GPUs with DeepSpeed ZeRO-1
- `1_node_8_gpus_deepspeed_zero2.yaml` - 8 GPUs with DeepSpeed ZeRO-2
- `1_node_8_gpus_deepspeed_zero3.yaml` - 8 GPUs with DeepSpeed ZeRO-3
- `8_node_8_gpus_deepspeed_zero2.yaml` - 8 nodes, each with 8 GPUs, DeepSpeed ZeRO-2
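For orientation, a Hugging Face Accelerate + DeepSpeed config of this kind typically has roughly the shape sketched below (shown for the single-node, 8-GPU, ZeRO-1 case). This is an illustrative assumption, not the contents of the shipped file; check `accelerate_configs/1_node_8_gpus_deepspeed_zero1.yaml` for the actual values.

```yaml
# Illustrative sketch only; the shipped 1_node_8_gpus_deepspeed_zero1.yaml may differ.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 1                    # ZeRO stage (1/2/3 in the corresponding files)
  gradient_accumulation_steps: 1
  offload_optimizer_device: none
  offload_param_device: none
mixed_precision: bf16
num_machines: 1
num_processes: 8                   # one process per GPU
machine_rank: 0
main_training_function: main
use_cpu: false
```

Whichever file you use, pass it to accelerate launch via --config_file.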
Launch training with:

```bash
accelerate launch \
    --config_file accelerate_configs/1_node_8_gpus_deepspeed_zero1.yaml \
    --main_process_port=8888 \
    training/train_llada.py \
    config=configs/llada_pretraining.yaml
```
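The trailing config=configs/llada_pretraining.yaml argument follows the OmegaConf command-line style used in MMaDA-style codebases. The sketch below is an assumption about how training/train_llada.py might read it, not a description of the actual script:

```python
# Rough sketch of OmegaConf-style config loading (an assumption about how
# training/train_llada.py consumes the `config=...` argument; verify against the script).
from omegaconf import OmegaConf

def get_config():
    cli_conf = OmegaConf.from_cli()              # picks up config=... (and any overrides) from argv
    yaml_conf = OmegaConf.load(cli_conf.config)  # load the YAML training config
    return OmegaConf.merge(yaml_conf, cli_conf)  # command-line values take precedence

if __name__ == "__main__":
    config = get_config()
    print(config.model.pretrained_model_path)
    print(config.training.batch_size, config.training.mixed_precision)
```

If the script follows this pattern, individual values can also be overridden at launch time, e.g. by appending training.batch_size=8 after the config= argument.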
Project structure:

```
LLaDA_pretraining/
├── accelerate_configs/   # Accelerate configurations
├── configs/              # Training configurations
├── models/               # Model implementations
├── parquet/              # Data loading utilities
├── training/             # Training scripts
└── scripts/              # Shell scripts
```
Data format:

The files under the path set in `dataset.params.train_shards_path_or_url` should be in JSONL format, one JSON object per line:

```json
{"text": "Training text content"}
```

It is recommended to split the dataset evenly into multiple files, with the number of files greater than the number of GPUs.
License:

MIT License - see the LICENSE file.

Based on MMaDA by Yang et al.