Mask prompts are not working in the ONNX pipeline, while click and box prompts work fine.
In the PyTorch model, sam_prompt_encoder.mask_input_size is defined as (256, 256). In the exported ONNX model, however, the prompt encoder expects a 3D mask input ([B, H, W]), whereas the existing code checks for 4 dimensions, so execution breaks when the prompt encoder runs. As a result, the dense mask branch does not produce valid outputs (all values collapse to -1024).
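As a possible workaround for the rank mismatch alone, the mask prompt can be reshaped to whatever rank the ONNX input declares before calling the session. The sketch below is a hypothetical helper (`match_mask_rank` is not part of any library). It assumes the only difference between the layouts is a singleton channel axis ([B, 1, H, W] vs [B, H, W]) and that the spatial size should match mask_input_size (256, 256). It does not restore the dense mask branch if that path was dropped during export; it only avoids the 4D/3D check failure.

```python
import numpy as np


def match_mask_rank(mask, expected_rank, expected_hw=(256, 256)):
    """Hypothetical helper: convert a mask prompt between [B, 1, H, W] and [B, H, W].

    expected_rank is the rank the ONNX mask input declares (3 or 4).
    expected_hw defaults to the PyTorch mask_input_size of (256, 256).
    """
    mask = np.asarray(mask, dtype=np.float32)
    if mask.ndim == 4 and expected_rank == 3:
        # Drop the singleton channel axis: [B, 1, H, W] -> [B, H, W]
        if mask.shape[1] != 1:
            raise ValueError(f"expected a single mask channel, got {mask.shape[1]}")
        mask = mask[:, 0]
    elif mask.ndim == 3 and expected_rank == 4:
        # Add the channel axis back: [B, H, W] -> [B, 1, H, W]
        mask = mask[:, np.newaxis]
    elif mask.ndim != expected_rank:
        raise ValueError(f"cannot convert rank-{mask.ndim} mask to rank {expected_rank}")
    # Verify the spatial size matches what the prompt encoder was built for
    if tuple(mask.shape[-2:]) != tuple(expected_hw):
        raise ValueError(f"mask spatial size {tuple(mask.shape[-2:])} != {tuple(expected_hw)}")
    return mask
```

With onnxruntime, the expected rank could be read from the exported graph instead of being hard-coded, e.g. `len(session.get_inputs()[k].shape)` for the mask input's index `k`; the input's name and position depend on how the model was exported, so they are left unspecified here.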
It seems the mask path in the prompt encoder was not fully preserved during ONNX export, so only sparse prompts function correctly.
This is the debug output:
begin image encoder onnx
0
(1, 3, 1024, 1024)
begin prompt encoder onnx
begin mask decoder onnx
backbone_features 11108.974
image_pe 34842.402
sparse_embeddings -1.4240006
dense_embeddings -18153.611
high_res_features 28492.957
high_res_features 25796.06
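The log above prints a single number per tensor (these appear to be sums), which cannot show whether values are saturated. A small, hypothetical diagnostic like the one below could be printed alongside it. It assumes the tensors are available as NumPy arrays and treats -1024 as the "collapsed" value reported above; `tensor_stats` is an illustrative name, not an existing API.

```python
import numpy as np


def tensor_stats(arr, sentinel=-1024.0):
    """Hypothetical diagnostic: shape, sum, min, max and fraction of elements equal to sentinel."""
    a = np.asarray(arr, dtype=np.float64)
    if a.size == 0:
        # Empty tensors have no min/max; report them explicitly instead of raising
        return {"shape": a.shape, "sum": 0.0, "min": None, "max": None, "sentinel_frac": 0.0}
    return {
        "shape": a.shape,
        "sum": float(a.sum()),
        "min": float(a.min()),
        "max": float(a.max()),
        # 1.0 means every element collapsed to the sentinel value
        "sentinel_frac": float(np.mean(a == sentinel)),
    }
```

For example, calling `print("dense_embeddings", tensor_stats(dense_embeddings))` for each output would show the shape and whether a tensor is entirely at -1024, rather than only an aggregate.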