This repository was archived by the owner on Jun 4, 2025. It is now read-only.

Commit 92bfd81

Use INT8 input for quantized models (#201)
* Added a call to the function that skips input quantization if the model is quantized
* Pass the model path as a string
1 parent 56e2d04 · commit 92bfd81

1 file changed: +6 −0

utils/neuralmagic/utils.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -7,6 +7,7 @@
 import torch
 import yaml
 from sparseml.pytorch.optim import ScheduledModifierManager
+from sparseml.pytorch.sparsification.quantization import skip_onnx_input_quantize
 from sparseml.pytorch.utils import ModuleExporter, download_framework_model_by_recipe_type
 from sparseml.onnx.utils import override_model_input_shape
 from sparsezoo import Model
@@ -214,6 +215,11 @@ def neuralmagic_onnx_export(

     saved_model_path = save_dir / onnx_file_name

+    try:
+        skip_onnx_input_quantize(str(saved_model_path), str(saved_model_path))
+    except Exception:
+        pass
+
     # set model input shape to a static shape (graph is still dynamic compatible)
     # for performance with deepsparse engine + extractable shape for analysis
     sample_data_shape = list(sample_data.shape)
```
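
For context, a minimal sketch of the pattern this commit introduces, applied to a standalone export. The file path and the printed dtype codes are illustrative assumptions; only the `(input_path, output_path)` call and the try/except guard are taken from the diff above.

```python
# Sketch of the commit's pattern, under stated assumptions (not the repo's code).
import onnx
from sparseml.pytorch.sparsification.quantization import skip_onnx_input_quantize

model_path = "model.onnx"  # hypothetical path to a quantized (QAT) ONNX export

# A QAT export typically takes FP32 input and quantizes it inside the graph.
print(onnx.load(model_path).graph.input[0].type.tensor_type.elem_type)  # 1 = FLOAT

try:
    # Rewrite the graph in place so it accepts integer input directly,
    # folding away the leading input-quantization step.
    skip_onnx_input_quantize(model_path, model_path)
except Exception:
    # As in the commit: the call fails when there is no input quantization
    # to skip (e.g. a dense FP32 export), in which case the model is left as-is.
    pass

# After the rewrite the graph input is an integer tensor (assumed here).
print(onnx.load(model_path).graph.input[0].type.tensor_type.elem_type)  # e.g. 2 = UINT8
```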
