Joe Robinson

How Computers See Depth: Recent Advances in Deep Learning-Based Methods

Part 2: Image-based stereo vision

Mapping a stereo pair to disparities via a learned function f_Δ.
Stereo vision with deep learning. The input is a stereo image pair (i.e., images captured from the left and right cameras); the output is a depth map with respect to the left image, defined for all pixels visible in both views. Hence, modern end-to-end solutions learn to map two RGB images to a depth map. The objective is to use supervision to minimize the distance (or maximize the similarity) between the predicted and ground-truth disparities. The stereo pair (leftmost above) is the input to the deep network (middle), which transforms the images into the corresponding depth prediction (right). Note that the closer an object is to the camera, the larger its disparity (i.e., the smaller its depth). The output is a dense disparity map displayed on the right, with warmer colors representing larger depth values (and smaller disparities). Visualization created by the author.
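To make the pipeline concrete, here is a minimal sketch of the disparity-depth relationship and a typical supervised objective. This is illustrative only: the focal length and baseline values, the `disparity_to_depth` helper, and the smooth-L1 loss are my assumptions (common choices for rectified stereo rigs and disparity regression), not details taken from the network pictured above.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to a depth map (meters).

    For a rectified stereo pair, depth Z = f * B / d, so a larger
    disparity means a smaller depth (the object is closer).
    """
    return focal_length_px * baseline_m / np.maximum(disparity, eps)

def smooth_l1_loss(pred_disparity, gt_disparity, beta=1.0):
    """One common supervised objective: penalize the distance between
    predicted and ground-truth disparities, robust to large outliers
    (e.g., near occlusions visible in only one camera)."""
    diff = np.abs(pred_disparity - gt_disparity)
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).mean()

# Toy example with a 2x2 disparity map and KITTI-like rig parameters
# (f ~ 721 px, B ~ 0.54 m -- illustrative values, not from this article).
disp = np.array([[50.0, 25.0],
                 [10.0,  5.0]])
depth = disparity_to_depth(disp, focal_length_px=721.0, baseline_m=0.54)
print(depth)  # larger disparities map to smaller depths (closer objects)
```

In an end-to-end network, `pred_disparity` would come from the model's forward pass, and the loss above would be minimized over pairs with ground-truth disparity supervision.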

Our perception of depth is an essential part of how we construct the 3D world around us. This has been understood for centuries, and one man who knew it well was Leonardo da Vinci, who used his expertise to create art that would remain famous throughout history, pieces such as “The Last Supper” and “Salvator Mundi.” Technically, an understanding of binocular vision traces back to around 280 B.C., when Euclid realized that our depth perception comes from focusing on the same objects with two eyes. Still, today, stereo vision remains an interesting problem. As I familiarize myself with the topic, the notes I kept along the way are being transformed into a series of blog posts for others to reference.


Continue reading on Medium (friend link).




