International Workshop on Computer Vision @ SIAT (IWCV @ SIAT)


Date: July 14, 2010 (Wed)

Venue: B917, SIAT (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)

Topic: Processing and Understanding of Community Photographs



Visualizing the World from Internet Photo Collections


Speaker: Noah Snavely, Cornell University

Abstract: The Internet is an unprecedented source of visual information about our world, with billions of images on photo-sharing websites such as Flickr. These images include thousands of photos of virtually every famous world site. For instance, a Flickr search for "Rome" returns over two million photos, taken by thousands of different people from myriad viewpoints, at all times of day and in every season of the year. While this collection contains a nearly complete description of Rome's appearance, it is also highly unorganized, making it difficult to get a sense of the rich scene structure underlying the photos.


In this talk, I will show how we can use 3D computer vision algorithms to automatically recover structure from Internet photo collections, and will demonstrate new photo navigation interfaces, such as Microsoft's Photosynth, that enable powerful new ways to explore our world.  The ultimate goal of this work is to create a rich 3D model of all of the world's sites and cities.


Efficient and Accurate 3D Reconstruction of Urban Scenes


Speaker: Gerard Medioni, University of Southern California

Abstract: We present a novel approach to 3D reconstruction of urban scenes from a set of aerial images. Images of urban environments are characterized by significant occlusions, sharp edges, and textureless regions, which lead to poor 3D reconstructions with standard multi-view stereo algorithms. Our approach makes the general assumption that urban scenes consist of planar facets that are either horizontal or vertical. It follows that most edges in urban scenes are also horizontal or vertical. These two assumptions provide very strong constraints on the underlying geometry. The contribution of this work is to translate these constraints into intra-image-column and inter-image-column constraints, respectively, and to formulate dense reconstruction as a two-pass dynamic programming problem that can be solved efficiently. Moreover, our algorithm is fully parallelizable: it reconstructs 1M points (with 160 discrete height levels) in a hundred seconds on a GPU. Our results preserve a high level of detail and have high visual quality. I will also talk about recognizing 3D objects from a single photo, an extension of several existing phone tools (http://www.iqengines.com/).
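The abstract casts dense reconstruction as dynamic programming over discrete height levels along image columns. The sketch below is a minimal illustration under stated assumptions, not the speaker's two-pass algorithm: it runs a single Viterbi-style pass over one column, takes a hypothetical per-pixel cost table `cost[i][h]` (for example, a photo-consistency score) as given, and uses a hypothetical constant `jump_penalty` charged whenever neighboring pixels change height level, which favors piecewise-constant heights such as flat roofs. The inter-column pass and GPU parallelization are omitted.

```python
# Hypothetical single-column DP sketch (not the published two-pass method).
def reconstruct_column(cost, jump_penalty):
    """Return (levels, total_cost) minimizing data cost plus jump penalties.

    cost[i][h]: assumed data cost of assigning height level h to pixel i.
    jump_penalty: assumed constant cost for any level change between
    neighboring pixels (a Potts-style smoothness term).
    """
    n, num_levels = len(cost), len(cost[0])
    best = list(cost[0])  # best[h]: min cost of pixels 0..i with pixel i at level h
    back = []             # back[i - 1][h]: best level of pixel i - 1 given pixel i at h
    for i in range(1, n):
        # With a constant jump cost, the best "jump" predecessor is simply
        # the globally cheapest level so far.
        cheapest = min(range(num_levels), key=lambda h: best[h])
        new_best, pointers = [], []
        for h in range(num_levels):
            stay, jump = best[h], best[cheapest] + jump_penalty
            prev = h if stay <= jump else cheapest
            new_best.append(min(stay, jump) + cost[i][h])
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Backtrack from the cheapest final level.
    h = min(range(num_levels), key=lambda k: best[k])
    total = best[h]
    levels = [h]
    for pointers in reversed(back):
        h = pointers[h]
        levels.append(h)
    levels.reverse()
    return levels, total


cost = [
    [0, 5, 5],
    [0, 5, 5],
    [3, 2, 5],  # noisy pixel that slightly prefers level 1
    [0, 5, 5],
    [5, 5, 0],
    [5, 5, 0],
]
levels, total = reconstruct_column(cost, jump_penalty=2)
print(levels, total)  # [0, 0, 0, 0, 2, 2] 5
```

Because the jump cost is constant, each step only needs the cheapest previous level, so a column with L levels costs O(L) per pixel rather than O(L^2); with 160 levels that saving is substantial. The noisy pixel is smoothed away, while the genuine height change near the end of the column is kept.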


Using Geographical Multimedia to Bridge Physical and Digital Worlds


Speaker: Hongxun Yao, Harbin Institute of Technology

Abstract: In recent years, geographical tags on Internet multimedia have grown steadily. With the popularity of photo-sharing websites such as Flickr and Live Spaces, both explicit GPS tags and implicit geographical descriptions are now prevalent on the Web. Such geographical tagging enables wide-ranging applications, such as identifying photos’ physical locations, discovering photographers’ visual interests, and mining users’ social connections. These data have recently attracted increasing attention from both academic and industrial researchers. In this talk, we survey recent advances in the recognition and mining of geography-aware Internet multimedia. We investigate both implicit and explicit geographical tagging for collecting and mapping photos from the Web. In particular, we present in-depth studies of two systems we developed: Photo2Search, a street-view location recognition system for Beijing, and Visual Tourism, a city landmark mining system. Another important issue is how to leverage such data for commercial recommendation. We further present three related application systems from our previous research: image-based advertising, vision-based geo-location recommendation, and tourism recommendation from community-contributed geo-multimedia. Finally, we discuss open challenges and future research trends for geography-aware Internet multimedia.


Building Rome on a Cloudless Day


Speaker: Jan-Michael Frahm, University of North Carolina at Chapel Hill

Abstract: In recent years, photo-sharing websites like Flickr have become increasingly popular, and millions of photos are now uploaded every day. These photos cover large parts of the world across different seasons, weather conditions, and times of day. Given the scale of the data, processing it within a reasonable time frame is a significant challenge. In the talk I will present my work on highly efficient organization of millions of images and reconstruction of 3D models from them on a single PC within a day. The approach tackles a variety of current challenges that must be addressed to obtain consistent 3D models from these data: estimation of the geometric and radiometric camera calibration from videos and photos, efficient and robust camera motion estimation for (quasi-)degenerate estimation problems, high-performance stereo estimation from multiple views, automatic selection of correct views from noisy photo collections, and image-based location recognition for topology detection. I will discuss the details of our optimal appearance- and geometry-based image organization method, and I will also explain our efficient stereo technique for determining scene depths from photo collection images. This technique performs scene depth estimation at multiple frames per second from a large set of views with considerable variation in appearance.



3D Shape Representation, Matching and Recognition


Speaker: Hongbin Zha, Peking University

Abstract: Developing new methods for describing 3D shapes is an important topic in object recognition, model-based manipulation, and digital geometry processing. In the early days of computer vision, an object was usually modeled with global representations such as constructive solid geometry, generalized cylinders, or deformed superquadrics. Recently, more sophisticated representations such as shape distributions have been developed, which allow objects to be matched under general similarity metrics. One drawback of such global schemes is that they are unsuitable for matching against scenes in which the target objects are only partially visible due to occlusion or limited fields of view. In the talk, I will report our efforts to combine global and local representations to develop efficient methods for partial object matching and analysis. The topics include a new shape representation scheme that uses a probabilistic bag-of-words model, a shape matching algorithm based on a dimension-amnesic pyramid match kernel, and a shape-space approach to animating 3D human faces.
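The talk builds on a pyramid match kernel. As a point of reference, the sketch below implements the standard pyramid match kernel of Grauman and Darrell, not the dimension-amnesic variant named in the abstract. Two sets of integer feature vectors are binned into histograms whose bin side doubles at each level, and matches first found at level i are weighted by 1/2^i (one common weighting choice; treat it as an assumption). The function names and the `max_coord` parameter are hypothetical. Histogram intersection naturally handles sets of different sizes, which is what makes this family of kernels attractive for partial matching.

```python
import math
from collections import Counter


def histogram(points, level):
    """Bin integer feature vectors into cells of side 2**level."""
    side = 2 ** level
    return Counter(tuple(c // side for c in p) for p in points)


def intersection(hx, hy):
    """Histogram intersection: number of points matched within shared bins."""
    return sum(min(count, hy[b]) for b, count in hx.items())


def pyramid_match(xs, ys, max_coord):
    """Standard pyramid match score for coordinates in [0, max_coord)."""
    num_levels = math.ceil(math.log2(max_coord)) + 1  # top level: one bin
    score, prev_matches = 0.0, 0
    for i in range(num_levels):
        matches = intersection(histogram(xs, i), histogram(ys, i))
        # Only matches that are new at this level count, weighted by 1 / 2**i,
        # so pairs that agree at finer resolution contribute more.
        score += (matches - prev_matches) / 2 ** i
        prev_matches = matches
    return score


# Two 1-D feature sets: (5,) matches at level 0, (0,) and (1,) only at level 1.
print(pyramid_match([(0,), (5,)], [(1,), (5,)], max_coord=8))  # 1.5
```

The same functions work unchanged for higher-dimensional tuples, since each coordinate is binned independently.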