Description
cmd example:
device=3
for model in Qwen3-32B
do
CUDA_VISIBLE_DEVICES=$device /home/weiweiz1/mambaforge/envs/expr/bin/python auto_round \
--model_name $dir/$model \
--avg_bits 4 \
--options "mxfp4,mxfp8" \
--iters 200 \
--device_map auto \
--ignore_scale_zp_bits \
--format fake \
--tasks "lambada_openai,hellaswag,winogrande,piqa,mmlu,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,gsm8k" \
--enable_alg_ext \
--enable_deterministic_algorithms \
--eval_task_by_task \
--eval_bs 16 \
--output_dir tmp/ \
2>&1 | tee -a tmp.txt
done
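For reference, a quick way to see where accelerate actually places the weights when device_map is auto, and whether a single decoder block ends up split across GPUs, is to load the model the same way and dump the device map. The sketch below is a minimal diagnostic using the standard transformers API, not part of auto-round; "Qwen/Qwen3-32B" stands in for $dir/$model, and model.model.layers[0] assumes the usual HF causal-LM layout.

# Minimal diagnostic sketch (not auto-round code): load the model with the same
# placement strategy and print which devices each module / the first decoder
# block land on. The model id below is illustrative.
from collections import Counter

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",            # assumed HF id; substitute $dir/$model
    torch_dtype=torch.bfloat16,
    device_map="auto",           # same placement strategy as the failing run
)

# hf_device_map is filled in by accelerate when device_map is used.
placement = Counter(model.hf_device_map.values())
print("device placement summary:", dict(placement))

# Devices actually holding parameters of the first decoder block.
block0_devices = {p.device for p in model.model.layers[0].parameters()}
print("model.layers.0 parameter devices:", block0_devices)

Running this under the same CUDA_VISIBLE_DEVICES setting should show whether the cross-device split already exists at load time or is introduced later during tuning.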
error log:
2025-11-23 04:44:35 INFO base.py L367: using torch.bfloat16 for quantization tuning
2025-11-23 04:44:35 WARNING base.py L389: using algorithm extension for quantization.
2025-11-23 04:44:35 WARNING main.py: algorithm extension has only undergone limited validation on INT2,mxfp4 and nvfp4; use with caution.
2025-11-23 04:44:35 INFO base.py L1631: start to cache block inputs
2025-11-23 04:44:39 INFO base.py L1646: caching done
Quantizing model.layers.0:   0%| | 0/64 [00:01<?, ?it/s]
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "autoround/auto_round/main.py", line 897, in
run()
File "autoround/auto_round/main.py", line 878, in run
tune(args)
File "autoround/auto_round/main.py", line 647, in tune
model, folders = autoround.quantize_and_save(export_dir, format=args.format) # pylint: disable=E1101
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
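To clarify what the RuntimeError itself means: PyTorch raises it whenever a single op receives tensors that live on different CUDA devices, which is what appears to happen here while quantizing model.layers.0. A minimal, self-contained sketch (assumes at least two visible GPUs; not auto-round code):

# Reproduces the error class only: any op mixing tensors on different CUDA
# devices fails with "Expected all tensors to be on the same device".
import torch

a = torch.randn(4, 4, device="cuda:0")
b = torch.randn(4, 4, device="cuda:1")

try:
    _ = a @ b  # raises RuntimeError: Expected all tensors to be on the same device ...
except RuntimeError as e:
    print(e)

# Moving one operand resolves it; presumably the cached block inputs and the
# block weights need the same explicit alignment inside the tuning loop.
_ = a @ b.to(a.device)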