Hello, thank you very much for your work.
I would like to use depth maps obtained from monocular depth estimation as an additional input modality in my experiments. However, since monocular depth estimation typically produces relative rather than metric depth maps, I would like to ask whether your model can use relative depth directly, or whether it includes a way to process relative depth so that it can be used as input.
I look forward to your reply.