# FP8 Quantization Tool

A powerful tool for converting PyTorch models (safetensors format) to FP8 using learned-rounding techniques inspired by AdaRound. It provides both command-line and GUI interfaces for easy model quantization.

## Features
- Advanced Quantization: Uses SVD-based spectral norm scaling and learned rounding for optimal quantization
- FP8 Support: Converts models to `torch.float8_e4m3fn` format for improved inference efficiency
- Bias Correction: Automatic bias adjustment to compensate for quantization errors
- Memory Efficient: Optimized loading and saving to handle large models
- Model Compatibility: Special modes for T5XXL and distillation models
- Dual Interface: Both command-line script and user-friendly GUI
- Stochastic Rounding: Manual implementation for better quantization quality
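For intuition, stochastic rounding rounds each value up or down with probability given by its fractional part, so the expected rounded value equals the original. A minimal sketch of the idea (not the script's exact implementation):

```python
import torch

def stochastic_round(x: torch.Tensor) -> torch.Tensor:
    # Round up with probability equal to the fractional part, else down.
    floor = x.floor()
    return floor + (torch.rand_like(x) < (x - floor)).to(x.dtype)
```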
## Installation

```bash
pip install -r requirements.txt
```

Verify that your PyTorch build supports FP8:
```python
import torch

try:
    torch.zeros(1, dtype=torch.float8_e4m3fn)
    print("FP8 support available!")
except (AttributeError, RuntimeError):
    print("FP8 support not available. Please update PyTorch.")
```

## Usage

### GUI Mode

Run the graphical interface for easy model quantization:

```bash
python fp8_quantization_gui.py
```
GUI Features:
- File browser for input/output selection
- Real-time progress monitoring
- Parameter adjustment with spinboxes
- Settings save/load functionality (sketched after this list)
- Live output logging
- Automatic output filename generation
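The settings persistence can be pictured as simple JSON round-tripping through the `fp8_gui_settings.json` file listed in the project structure. This is a hypothetical sketch; the GUI's actual keys and logic may differ:

```python
import json
from pathlib import Path

SETTINGS_FILE = Path("fp8_gui_settings.json")  # auto-generated by the GUI

def load_settings(defaults: dict) -> dict:
    # Merge saved values over the defaults, if a settings file exists.
    if SETTINGS_FILE.exists():
        return {**defaults, **json.loads(SETTINGS_FILE.read_text())}
    return defaults

def save_settings(settings: dict) -> None:
    SETTINGS_FILE.write_text(json.dumps(settings, indent=2))
```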
### Command-Line Mode

For batch processing or scripting:
```bash
python convert_fp8_scaled_learned_stochastic_bias_svdv2_memeff.py \
    --input model.safetensors \
    --output model_fp8.safetensors \
    --calib_samples 1024 \
    --num_iter 256 \
    --lr 0.02 \
    --reg_lambda 1.0
```
| Option | Default | Description |
|---|---|---|
| `--input` | Required | Input safetensors file path |
| `--output` | Auto-generated | Output safetensors file path |
| `--calib_samples` | 1024 | Number of calibration samples for bias correction |
| `--num_iter` | 256 | Optimization iterations per tensor |
| `--lr` | 0.02 | Learning rate for the rounding optimizer |
| `--reg_lambda` | 1.0 | Regularization strength |
| `--t5xxl` | False | Enable T5XXL compatibility mode |
| `--keep_distillation` | False | Preserve distillation layers unquantized |
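For example, converting a T5XXL model while keeping distillation layers in their original precision (the input filename here is a placeholder; the output name is auto-generated):

```bash
python convert_fp8_scaled_learned_stochastic_bias_svdv2_memeff.py \
    --input t5xxl_model.safetensors \
    --t5xxl \
    --keep_distillation
```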
## How It Works

This tool implements a sophisticated quantization approach:
- SVD-based Scaling: Uses the spectral norm (largest singular value) for scale factor calculation (sketched after this list)
- Learned Rounding: Optimizes rounding decisions using gradient-based methods
- Bias Correction: Automatically adjusts biases to compensate for quantization errors
- Stochastic Rounding: Custom implementation for improved quantization quality
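A rough sketch of the spectral-norm scaling idea (the function names are illustrative, not the script's exact code):

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for e4m3fn

def spectral_scale(weight: torch.Tensor) -> torch.Tensor:
    # Spectral norm = largest singular value of the 2-D view of the weight.
    sigma_max = torch.linalg.svdvals(weight.reshape(weight.shape[0], -1).float())[0]
    # Map the spectral norm onto the FP8 representable range.
    return sigma_max / FP8_MAX

def quantize_fp8(weight: torch.Tensor):
    scale = spectral_scale(weight)
    q = (weight.float() / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q, scale  # dequantize later as q.to(torch.float32) * scale
```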
### Memory Optimization

- Streaming Processing: Processes tensors one at a time to minimize memory usage (sketched after this list)
- Efficient I/O: Custom safetensors loading/saving optimized for large files
- GPU Memory Management: Automatic cleanup and cache clearing
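A minimal sketch of the streaming pattern using the `safetensors` API (`process` is a hypothetical callback; the real script's I/O is more involved):

```python
import torch
from safetensors import safe_open
from safetensors.torch import save_file

def convert_streaming(in_path: str, out_path: str, process) -> None:
    out = {}
    with safe_open(in_path, framework="pt", device="cpu") as f:
        for name in f.keys():
            tensor = f.get_tensor(name)        # load one tensor at a time
            out[name] = process(name, tensor)  # quantize or pass through
            torch.cuda.empty_cache()           # release cached GPU memory
    save_file(out, out_path)
```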
### Model Compatibility

- T5XXL Mode: Excludes decoder layers and certain parameter types (illustrated after this list)
- Distillation Support: Option to preserve distillation layers in original precision
- General Models: Works with standard PyTorch models in safetensors format
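The exclusion logic can be pictured as a name-based filter; the patterns below are hypothetical and the script's actual rules may differ:

```python
def should_quantize(name: str, t5xxl: bool, keep_distillation: bool) -> bool:
    # T5XXL mode: skip decoder layers and certain parameter types.
    if t5xxl and ("decoder." in name or "layer_norm" in name):
        return False
    # Optionally keep distillation layers in their original precision.
    if keep_distillation and "distill" in name:
        return False
    return name.endswith(".weight")
```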
## Project Structure

```
fp8-quantization-tool/
├── convert_fp8_scaled_learned_stochastic_bias_svdv2_memeff.py  # Core conversion script
├── fp8_quantization_gui.py                                     # GUI interface
├── requirements.txt                                            # Python dependencies
├── README.md                                                   # This file
└── fp8_gui_settings.json                                       # GUI settings (auto-generated)
```
## System Requirements

- GPU: CUDA-capable GPU recommended for performance
- RAM: Minimum 16GB, 32GB+ recommended for large models
- Storage: SSD recommended for faster I/O operations
## Performance Tips

- Use fewer calibration samples (`--calib_samples`) for faster processing
- Reduce iterations (`--num_iter`) if quality is acceptable
- Enable T5XXL mode for compatible models to reduce processing time
- Monitor GPU memory usage during processing
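To monitor GPU memory from Python during processing (assuming a CUDA build):

```python
import torch

if torch.cuda.is_available():
    print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```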
## Quality Preservation

The quantized models maintain high quality through:
- Adaptive Scaling: Per-tensor scale factors optimized for each layer
- Learned Rounding: Minimizes quantization error through optimization (see the sketch after this list)
- Bias Compensation: Corrects for systematic quantization bias
- Spectral Normalization: Preserves important spectral properties
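A toy sketch of an AdaRound-style optimization loop, wired to the CLI defaults (`--num_iter`, `--lr`, `--reg_lambda`); the script's actual loss and parameterization may differ:

```python
import torch

def learned_rounding(w_scaled: torch.Tensor, num_iter: int = 256,
                     lr: float = 0.02, reg_lambda: float = 1.0) -> torch.Tensor:
    # w_scaled is the weight already divided by its scale factor.
    floor = w_scaled.floor().detach()
    # Continuous per-element rounding variable, initialized at the fractional part.
    h = (w_scaled - floor).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([h], lr=lr)
    for _ in range(num_iter):
        opt.zero_grad()
        soft_q = floor + h.clamp(0.0, 1.0)
        recon = (soft_q - w_scaled).pow(2).mean()  # reconstruction error
        # Regularizer pushing each h toward a hard 0/1 decision.
        reg = (1.0 - (2.0 * h.clamp(0.0, 1.0) - 1.0).abs().pow(3)).mean()
        (recon + reg_lambda * reg).backward()
        opt.step()
    return floor + (h.detach() >= 0.5).to(w_scaled.dtype)  # hard rounding
```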
## Troubleshooting

### Common Issues

- **FP8 Not Supported**
  - Solution: Update to a PyTorch build with FP8 support
- **Out of Memory**
  - Solution: Reduce `--calib_samples` or use CPU processing
  - Close other applications using GPU memory
- **Import Errors**
  - Solution: Ensure all files are in the same directory
  - Check that all dependencies are installed
- **Slow Processing**
  - Solution: Use GPU acceleration
  - Reduce `--num_iter` for faster processing
  - Use SSD storage for model files
- "Tensor is all zeros, skipping optimization": Normal for empty tensors
- "No calibration data found": Internal error, check model structure
- "Unsupported float8 type": Update PyTorch version
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request
## Credits

- Inspired by the AdaRound paper: "Up or Down? Adaptive Rounding for Post-Training Quantization"
- Memory-efficient loading/saving and GUI code by marduk191
- Core quantization implementation by Clybius
- Initial merging and raw tensor fix by silveroxides
## Changelog

- Initial release with learned rounding quantization
- GUI interface implementation
- SVD-based scaling method
- Bias correction functionality
- T5XXL and distillation model support