
Our study presents a novel 2D-to-3D video conversion algorithm that can be adopted in a 3D image sensor. In our algorithm, a depth map is generated for each frame of the 2D video input by combining a global depth gradient with a local depth refinement: the global depth gradient is computed according to the image type, while the local depth refinement is driven by color information. Because the input 2D video consists of a number of video shots, the proposed algorithm reuses the global depth gradient across frames within the same shot to generate time-coherent depth maps. Experimental results show that the method adapts to different image types, reduces computational complexity, and improves the temporal smoothness of the generated 3D video.

This study also introduces an image-based 2D-to-3D conversion system that provides a strong stereoscopic visual effect for human viewers. Linear and atmospheric perspective cues, which compensate for each other, are employed to estimate depth information. Rather than retrieving a precise depth value for each pixel from these depth cues, the direction angle of the image is estimated, and the depth gradient corresponding to that angle is integrated with superpixels to obtain the depth map. However, the stereoscopic effect of views synthesized from this depth map alone is limited and fails to satisfy viewers. To obtain a more impressive visual effect, the viewer's main focus is also considered: salient object detection is performed to locate the region that attracts visual attention.
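The core idea of combining a direction-dependent global depth gradient with a color-driven local refinement can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the linear ramp parameterized by `angle_deg` and the luminance-similarity refinement in `local_depth_refinement` are hypothetical stand-ins for the image-type classification and color-based refinement described above.

```python
import numpy as np

def global_depth_gradient(h, w, angle_deg=90.0):
    """Planar depth ramp along an estimated direction angle.

    A hypothetical parameterization: angle_deg = 90 yields a
    bottom-near / top-far ramp, a common prior for outdoor scenes.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    theta = np.deg2rad(angle_deg)
    # Project each pixel onto the gradient direction, normalize to [0, 1].
    proj = xs * np.cos(theta) + ys * np.sin(theta)
    proj -= proj.min()
    return proj / max(float(proj.max()), 1e-6)

def local_depth_refinement(depth, rgb, sigma_c=0.1):
    """Pull depth toward the scene mean where colors are similar.

    A simplified stand-in for color-based refinement: pixels whose
    luminance is close to the global mean are smoothed more strongly.
    """
    lum = rgb.mean(axis=2)
    wt = np.exp(-((lum - lum.mean()) ** 2) / (2 * sigma_c ** 2))
    # Convex blend of the input depth and its mean, weighted per pixel.
    return (1 - 0.5 * wt) * depth + 0.5 * wt * depth.mean()

# One frame: the global gradient is computed once per shot and reused,
# while the local refinement runs per frame on that frame's colors.
h, w = 4, 6
rgb = np.random.rand(h, w, 3).astype(np.float32)
g = global_depth_gradient(h, w)      # shared by all frames in the shot
d = local_depth_refinement(g, rgb)   # per-frame, color-dependent
```

Reusing `g` across all frames of a shot is what gives the time-coherent depth maps mentioned above: only the cheap per-frame refinement changes between consecutive frames, so the depth does not flicker when the camera and scene layout are stable.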

3D entertainment is becoming increasingly popular and can be found in many contexts, such as TV and home gaming equipment. 3D image sensors are a new way to produce stereoscopic video content conveniently and at low cost, and can thus meet the urgent demand for 3D video in the 3D entertainment market. Generally, a 3D image sensor can be composed of a 2D image sensor and a 2D-to-3D conversion chip. In this paper, we propose a novel 2D-to-3D video conversion method for 3D entertainment applications.
