GAMES Webinar 2019 – Session 81 (CVPR 2018 3D Vision Paper Talks) | Jiajun Wu (Massachusetts Institute of Technology), Zhenheng Yang (University of Southern California)
[GAMES Webinar 2019, Session 81 (CVPR 2018 3D Vision Paper Talks)]
Talk title: Physical Scene Understanding with Compositional Structure
Human intelligence goes beyond pattern recognition. From a single image, we’re able to explain what we see, reconstruct it in 3D, predict what’s going to happen, and plan our actions. In this talk, I will present our recent work on physical scene understanding: reverse-engineering these capacities to build machines that are versatile, data-efficient, and better at generalizing. The core idea is to exploit the scene’s compositional structure by integrating neural networks with generative, approximate simulation engines. I’ll focus on a few topics: building an object representation that captures both geometry and physics; learning a compact, interpretable dynamics model for generative visual scene modeling; and perception and reasoning beyond vision.
Jiajun Wu is a Ph.D. student in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, advised by Professor Bill Freeman and Professor Josh Tenenbaum. His research interests lie at the intersection of computer vision, machine learning, and computational cognitive science. Before coming to MIT, he received his B.Eng. from the Institute for Interdisciplinary Information Sciences, Tsinghua University, China, working with Professor Zhuowen Tu. He has also spent time working at the research labs of Microsoft, Facebook, and Baidu. He received the Facebook Fellowship, the Nvidia Fellowship, and the Adobe Fellowship.
Talk title: Unsupervised Learning of Holistic 3D Scene Understanding
3D scene understanding is a fundamental problem in computer vision. The task aims to estimate different 3D geometrical cues of the scene layout. Despite decades of effort devoted to this topic, it is still considered an open problem. 3D estimation has many diverse real-world applications, such as robotics, augmented reality, and autonomous driving.
Depending on the type of geometrical cue, the problem can be divided into estimating static and dynamic geometrical information. Static 3D estimation includes depth estimation, surface normal estimation, and object boundary estimation. These geometrical cues can be estimated from a single image. Dynamic 3D information includes optical flow, odometry, and 3D object motion. These dynamic cues carry temporal information and can only be estimated from consecutive frames or video sequences. The static and dynamic geometrical cues are highly correlated and can reinforce each other during estimation. For example, a good depth estimate can help in understanding the 3D movement of objects in the scene; conversely, accurate optical flow estimation provides a strong prior for rigid objects and can reinforce object boundary estimation. We propose to jointly estimate the static and dynamic geometrical cues in a holistic way and achieve better performance on various tasks.
Zhenheng Yang is currently a Ph.D. student in the Department of Electrical Engineering at the University of Southern California. He received his B.Eng. degree from Tsinghua University in 2014. His research interests include computer vision and machine learning.
The "User Guide" section of the GAMES homepage has information on "How to watch the GAMES Webinar live stream?" and "How to join the GAMES WeChat group?".