A powerful Python script for merging multiple SafeTensor models using various averaging methods including Simple Moving Average (SMA), Exponential Moving Average (EMA), and Weighted Moving Average (WMA).
- Multiple Averaging Methods: SMA, EMA, and WMA support
- Memory Efficient: Processes large models without excessive memory usage
- GPU Acceleration: CUDA support for faster processing
- Scalar Tensor Support: Properly handles raw float tensors and alpha weights
- Flexible Weighting: Custom weights for fine-tuned merging control
- Metadata Preservation: Stores merge parameters and source information
```bash
# Required dependencies
pip install torch tqdm
```

```bash
python safetensor_merger.py <input_directory> <output_file> [OPTIONS]
```

| Argument | Description |
|---|---|
| `input_dir` | Directory containing `.safetensors` files to merge |
| `output_file` | Output path for the merged model |
| Argument | Type | Default | Choices | Description |
|---|---|---|---|---|
| `--method` | str | `sma` | `sma`, `ema`, `wma` | Merging method to use |
| `--alpha` | float | `0.5` | 0.0 < α ≤ 1.0 | Smoothing factor for EMA |
| `--weights` | str | `None` | comma-separated | Custom weights for WMA |
| `--device` | str | `cpu` | `cpu`, `cuda` | Device for tensor operations |
Formula: Mavg = (1/N) × Σ(Mi)
Equal weight averaging of all models. Each model contributes equally to the final result.
```bash
python safetensor_merger.py ./models ./merged_sma.safetensors --method sma
```

Use Case: When all models are equally important and you want a balanced merge.
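The formula reduces to an element-wise mean across models. A minimal sketch on plain Python lists (the script itself operates on torch tensors; `sma_merge` is an illustrative name, not the script's API):

```python
def sma_merge(models):
    """Equal-weight average of corresponding elements: M_avg = (1/N) * sum(M_i)."""
    n = len(models)
    return [sum(values) / n for values in zip(*models)]

# Three "models", two parameters each
sma_merge([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # -> [3.0, 4.0]
```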
Formula: EMA(t) = α × Current + (1-α) × EMA(t-1)
Exponentially weighted average where recent models have more influence.
```bash
# Higher alpha = more weight to recent models
python safetensor_merger.py ./models ./merged_ema.safetensors --method ema --alpha 0.7

# Lower alpha = more weight to earlier models
python safetensor_merger.py ./models ./merged_ema.safetensors --method ema --alpha 0.3
```

Alpha Parameter Guide:
- α = 0.1-0.3: Heavily favor earlier models
- α = 0.4-0.6: Balanced weighting
- α = 0.7-0.9: Heavily favor recent models
Use Case: When model order matters and you want recent models to have more influence.
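To make the recurrence concrete, here is a sketch of how EMA folds models together in file order, on plain Python lists (the real script applies this per tensor; `ema_merge` is a hypothetical name):

```python
def ema_merge(models, alpha=0.5):
    """EMA(t) = alpha * current + (1 - alpha) * EMA(t-1), seeded with the first model."""
    result = list(models[0])
    for model in models[1:]:
        result = [alpha * cur + (1 - alpha) * prev
                  for cur, prev in zip(model, result)]
    return result

# With alpha = 0.7, the last model dominates the merge
ema_merge([[0.0], [0.0], [1.0]], alpha=0.7)  # -> [0.7]
```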
Formula: WMA = Σ(Wi × Mi) / Σ(Wi)
Custom weighted average with user-defined weights.
```bash
# Custom weights (must match number of models)
python safetensor_merger.py ./models ./merged_wma.safetensors --method wma --weights "0.5,0.3,0.2"

# Default linear decreasing weights [N, N-1, ..., 2, 1]
python safetensor_merger.py ./models ./merged_wma.safetensors --method wma
```

Weight Examples:
- `"1,1,1"`: Equal weights (same as SMA)
- `"0.6,0.3,0.1"`: Heavily favor first model
- `"0.1,0.3,0.6"`: Heavily favor last model
- `"2,1,1"`: Double weight to first model
Use Case: When you know specific models should have different importance levels.
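A sketch of the weighted average on plain Python lists — note the division by the weight sum, which is why weights need not sum to 1 (`wma_merge` is an illustrative name, not the script's API):

```python
def wma_merge(models, weights):
    """WMA = sum(W_i * M_i) / sum(W_i); weights are normalized by their sum."""
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, values)) / total
            for values in zip(*models)]

# Pyramid weighting over three single-parameter "models"
wma_merge([[1.0], [2.0], [3.0]], weights=[0.5, 0.3, 0.2])  # approximately [1.7]
```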
```bash
# Simple merge with equal weights
python safetensor_merger.py ./my_models ./output.safetensors

# Use GPU acceleration
python safetensor_merger.py ./my_models ./output.safetensors --device cuda
```

```bash
# Conservative EMA (favor earlier models)
python safetensor_merger.py ./models ./conservative_merge.safetensors --method ema --alpha 0.2

# Aggressive EMA (favor recent models)
python safetensor_merger.py ./models ./aggressive_merge.safetensors --method ema --alpha 0.8

# Balanced EMA
python safetensor_merger.py ./models ./balanced_merge.safetensors --method ema --alpha 0.5
```

```bash
# Equal importance to all 3 models
python safetensor_merger.py ./models ./equal_merge.safetensors --method wma --weights "1,1,1"

# Pyramid weighting (decreasing importance)
python safetensor_merger.py ./models ./pyramid_merge.safetensors --method wma --weights "0.5,0.3,0.2"

# Focus on middle model
python safetensor_merger.py ./models ./middle_focus.safetensors --method wma --weights "0.2,0.6,0.2"

# Binary choice (ignore middle model)
python safetensor_merger.py ./models ./binary_merge.safetensors --method wma --weights "0.5,0,0.5"
```

```bash
# Process multiple model directories
for dir in model_*; do
    python safetensor_merger.py "$dir" "merged_${dir}.safetensors" --method ema --alpha 0.6
done
```

```bash
# For large models, use CPU to avoid GPU memory issues
python safetensor_merger.py ./large_models ./merged.safetensors --device cpu --method sma
```

The script provides detailed output including:
- Model Discovery: Lists all found `.safetensors` files
- Common Tensors: Reports the number of tensors present in all models
- Method Parameters: Shows selected method and parameters
- Progress: Real-time progress bar during processing
- Metadata: Stores merge information in the output file
```
Merging 3 models using EMA (α=0.6)...
Model files:
  1. model_1.safetensors
  2. model_2.safetensors
  3. model_3.safetensors
Finding common tensor keys...
Found 145 common tensors
Computing EMA tensors...
Processing tensors: 100%|████████| 145/145 [00:23<00:00, 6.21it/s]
Using memory efficient save file: ./merged_ema.safetensors
✅ Successfully merged 3 models using EMA
📁 Output saved to: ./merged_ema.safetensors
```
Each merged model contains metadata with:
- `merged_models_count`: Number of source models
- `merge_method`: Method used (SMA/EMA/WMA)
- `source_files`: List of source filenames
- Method-specific parameters (`alpha` for EMA, `weights` for WMA)
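In the safetensors format, this metadata lives in the JSON header, which is prefixed by an 8-byte little-endian length. A stdlib-only sketch of reading it back (the helper name and the demo file it writes are illustrative, not part of the script):

```python
import json
import struct

def read_safetensors_metadata(path):
    """Return the __metadata__ dict from a .safetensors file:
    the header is <8-byte little-endian length><that many bytes of JSON>."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})

# Write a tiny header-only demo file to show the layout
demo = json.dumps({"__metadata__": {"merge_method": "EMA", "alpha": "0.6"}}).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(demo)) + demo)

print(read_safetensors_metadata("demo.safetensors"))
```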
Common errors and solutions:
Error: No .safetensors files found in ./models
Solution: Ensure the directory contains .safetensors files
Error: Number of weights (2) must match number of models (3)
Solution: Provide correct number of comma-separated weights
Error: Alpha for EMA must be between 0 and 1 (exclusive of 0)
Solution: Use alpha value in range (0.0, 1.0]
Warning: CUDA requested but not available. Using CPU instead.
Solution: Install CUDA-enabled PyTorch or use --device cpu
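The fallback behind that warning can be sketched as follows (a hypothetical helper, not the script's actual code; it uses the real `torch.cuda.is_available()` check):

```python
def resolve_device(requested: str) -> str:
    """Fall back to CPU when CUDA is requested but unavailable."""
    if requested == "cuda":
        try:
            import torch
            if torch.cuda.is_available():
                return "cuda"
        except ImportError:
            pass
        print("Warning: CUDA requested but not available. Using CPU instead.")
        return "cpu"
    return requested
```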
- Use GPU: Add `--device cuda` for faster processing on compatible hardware
- Method Selection: SMA is fastest, WMA is most flexible, and EMA is good for sequential importance
- Memory: For very large models, stick with CPU to avoid memory issues
- Weights: Pre-calculate optimal weights for WMA based on model performance
The script handles all standard tensor types including:
- Standard tensors (F32, F16, BF16, etc.)
- Integer tensors (I8, I16, I32, I64)
- Boolean tensors
- Scalar tensors (alpha weights, bias terms, etc.)
- Float8 types (if supported by PyTorch version)
- Tested on Python 3.11.9
- PyTorch 1.9+
- tqdm
- Standard library modules (json, struct, pathlib, argparse)
This script is provided as-is for educational and research purposes.