This is how one can interpret YOLO v5 segmentation output!!! #12834
Replies: 6 comments 3 replies
-
Great! Thank you!
-
@defend1234 Thanks, you are welcome!
-
It's very helpful to me. I was struggling with this when using YOLOv5. Thank you!
-
It's very helpful, thank you!
-
@aravindchakravarti Thank you for the detailed explanation! It's really helpful! But I still can't figure out how to output masks at a higher resolution (like 640x640). Any advice? Thank you!
-
It's super great work!!! Thanks for your contribution! But I've got a question: what are the 32 prototype channels? How are they produced, and which lines of the code implement them?
-
Hi All,
There are a lot of tutorials online (and discussions in the YOLOv5 issues section) about the detection output and post-processing of the YOLOv5 detection model. I had been looking for something similar for the segmentation model for a long time, but there seems to be very little information online, or at least I didn't find much. Then yesterday I understood that the YOLOv5 segmentation model is inspired by YOLACT. So I am sharing some notes here on how YOLOv5 processes its segmentation output.

Before we get to the YOLOv5 segmentation model, let's see how YOLACT works, using a simple diagram from its paper. Basically, YOLACT produces two outputs.

**Prototype Generation**

These are prototype masks generated by a fully convolutional network. The number of masks is determined by the last layer, which has `k` channels: number of masks generated = `k`.

**Mask Coefficients**

The masks generated above are just prototypes; we still need to combine these channels. For that, YOLACT also generates mask coefficients. YOLACT multiplies the prototypes by the mask coefficients to produce the final masks, as shown below. These are later cropped and thresholded.
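The combination step can be sketched like this (a minimal NumPy illustration with made-up shapes, not YOLACT's actual code):

```python
import numpy as np

def assemble_masks(prototypes, coeffs):
    """Linearly combine k prototype masks with per-instance coefficients,
    then apply a sigmoid, as YOLACT does.

    prototypes: (k, H, W) prototype masks
    coeffs:     (n, k) mask coefficients, one row per detected instance
    returns:    (n, H, W) instance masks with values in (0, 1)
    """
    k, H, W = prototypes.shape
    masks = coeffs @ prototypes.reshape(k, -1)   # (n, H*W) linear combination
    masks = 1.0 / (1.0 + np.exp(-masks))         # sigmoid squashes to (0, 1)
    return masks.reshape(-1, H, W)

# Toy example: k = 4 prototypes of size 8x8, n = 2 detected instances
protos = np.random.randn(4, 8, 8)
coeffs = np.random.randn(2, 4)
print(assemble_masks(protos, coeffs).shape)  # (2, 8, 8)
```

Each row of coefficients picks out how much of each prototype goes into that instance's mask, which is why the prototypes themselves don't need to correspond to any single object.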
Now, let's come to YOLOv5 segmentation. I am using `zidane.jpg` for the illustration below. Look at the code in `segment/predict.py`: the model produces two outputs, `pred` and `proto`.

The output dimension of `pred` is `torch.Size([1, 25200, 117])`, and the output dimension of `proto` is `torch.Size([1, 32, 160, 160])`.

For `pred`: `1` is the batch size, `25200` is the total number of predictions across all anchors (same as the detection network), and `117` is `85 + 32`. That is, 80 class scores, 5 localization values (`x, y, w, h, conf`), and the last `32` are the mask coefficients.
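The 25200 and 117 can be reproduced with a little arithmetic, assuming YOLOv5's defaults (640x640 input, detection strides 8/16/32, 3 anchors per grid cell):

```python
# Assumed YOLOv5 defaults: 640x640 input, strides 8/16/32,
# 3 anchors per grid cell, 80 COCO classes, 32 mask coefficients.
img_size = 640
strides = [8, 16, 32]
anchors_per_cell = 3

# Predictions per scale: 3 * 80*80 + 3 * 40*40 + 3 * 20*20
num_predictions = sum(anchors_per_cell * (img_size // s) ** 2 for s in strides)
print(num_predictions)  # 25200

# Per-prediction vector: x, y, w, h, conf + 80 class scores + 32 mask coefficients
per_prediction = 5 + 80 + 32
print(per_prediction)   # 117
```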
And for `proto`: `1` is the batch size and `32` is the number of prototype masks, each mask being `160x160` pixels. So what do these 32 prototypes look like? Let's see some examples. The model generates 32 such prototype masks, and none of them clearly identifies any object in the image (2 persons and 1 tie) on its own. That is where we need the help of the mask coefficients.
We know the model has produced `pred` with size `[1, 25200, 117]`, which contains the mask coefficients. This `pred` is then passed to NMS. If we check the dimension of `pred` after NMS, for `zidane.jpg` it is `torch.Size([3, 38])`.

Why `3`? Because we have 3 objects in the input image (2 persons and 1 tie).

Why `38`? Because YOLO segmentation also outputs bounding boxes, so `38` is actually `6 + 32`: the first `6` values are `x, y, w, h, conf, class`, and the remaining `32` are the mask coefficients.

You can check this by printing `pred[0][0]`. Now, set aside the first 6 values (they correspond to the bounding box information), take the remaining 32 values, and multiply them with the 32 prototype channels. You get the final mask!!! One example is below, which picks out the tie!
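Putting that last step together, here is a minimal sketch of the final-mask computation (NumPy with random stand-in tensors matching the shapes above; YOLOv5 itself does the equivalent in its `process_mask` helper, which additionally crops each mask to its bounding box):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def final_masks(det, proto, threshold=0.5):
    """det:   (n, 38) post-NMS detections: x, y, w, h, conf, class + 32 coefficients
    proto: (32, H, W) prototype masks
    returns: boolean (n, H, W) instance masks."""
    coeffs = det[:, 6:]                              # (n, 32) mask coefficients
    k, H, W = proto.shape
    masks = sigmoid(coeffs @ proto.reshape(k, -1))   # (n, H*W) combined masks
    return masks.reshape(-1, H, W) > threshold       # threshold to binary masks

# Stand-in tensors with the shapes from this post: 3 objects, 32 protos of 160x160
det = np.random.randn(3, 38)
proto = np.random.randn(32, 160, 160)
print(final_masks(det, proto).shape)  # (3, 160, 160)
```

Note the masks come out at the prototype resolution (160x160 here); to get them at input resolution they are typically upsampled (e.g. bilinear interpolation) before thresholding.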
Hope this is helpful!!