This repository contains a framework for converting monocular videos into side-by-side (SBS) 3D videos. It utilizes a combination of image processing techniques and depth map predictions to generate separate views for each eye, creating a 3D effect when viewed with appropriate hardware.
Thanks for open-sourcing the code. I have a few questions:
1/ Why not define disparity = scale_factor * (1/depth), since depth is inversely proportional to disparity?
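To make question 1 concrete, here is a minimal sketch of what I mean (my own illustration, not this repo's code; `scale_factor` and the normalization step are assumptions on my part):

```python
import numpy as np

def depth_to_disparity(depth, scale_factor=30.0, eps=1e-6):
    """Convert a depth map to per-pixel disparity via disparity ~ scale_factor * (1/depth).

    scale_factor and the [0, 1] normalization are illustrative choices,
    not values taken from the repository.
    """
    inv = 1.0 / np.maximum(depth, eps)                        # inverse depth
    inv = (inv - inv.min()) / (inv.max() - inv.min() + eps)   # normalize to [0, 1]
    return scale_factor * inv                                  # disparity in pixels

depth = np.array([[1.0, 2.0],
                  [4.0, 8.0]])
disp = depth_to_disparity(depth)
# nearest pixel (smallest depth) gets the largest disparity
```

Under this formulation, `scale_factor` directly caps the maximum pixel shift between the two eye views, which is why I am asking how its value should be chosen.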
2/ How is scale_factor defined, and how should we choose its value?
3/ shift_threshold is used in the depth_anything notebook but not in the Marigold notebook. Why is it needed there?
4/ I am getting blurry results after inpainting, and the output video is not very stable (it shows artifacts). Why not use modern deep-learning-based inpainting models for that step?
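For context on question 4, here is a minimal sketch (my own, not the repo's code) of why disparity-based view synthesis leaves holes that the inpainting step then has to fill; the quality of that fill is what I am asking about:

```python
import numpy as np

def shift_row(row, disp, hole=-1.0):
    """Forward-warp one image row by integer disparities.

    Pixels that no source pixel maps onto stay marked as `hole`;
    these are the occlusion gaps that inpainting must fill.
    Illustrative sketch only, not the repository's implementation.
    """
    out = np.full_like(row, hole)
    for x in range(len(row)):
        tx = x + int(round(disp[x]))   # destination column in the shifted view
        if 0 <= tx < len(row):
            out[tx] = row[x]
    return out

row = np.array([10.0, 20.0, 30.0, 40.0])
disp = np.array([0, 0, 1, 1])          # right half shifts by one pixel
shifted = shift_row(row, disp)
# shifted == [10., 20., -1., 30.]: index 2 is a hole needing inpainting
```

Because these holes appear independently in every frame, a per-frame classical inpaint tends to flicker, which is why I was curious whether a learned (ideally temporally aware) inpainting model was considered.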