January 19, 2023

CVPR 2020 Tutorial “Image Retrieval in the Wild”

The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) is one of the world’s top conferences in computer vision. We organized a half-day tutorial, “Image Retrieval in the Wild,” at CVPR 2020.


Content-based image retrieval is one of the most essential techniques for interacting with visual collections. Over the last decade, advances in deep learning and similarity search have driven significant progress in this area. Although commercial applications of these technologies are increasing, there has been little discussion of how to build a practical, large-scale visual search system.

The organizers of this tutorial

This tutorial covered several important components of building an image retrieval system for real-world applications. The organizers were Yusuke Matsui (The University of Tokyo), Zheng Wang (National Institute of Informatics), and Takuma Yamaguchi (Mercari, Inc.).

All presentation slides and videos are available on our project site.


Billion-scale Approximate Nearest Neighbor Search

Yusuke Matsui introduced state-of-the-art algorithms for approximate nearest neighbor (ANN) search. Because many algorithms and libraries have been proposed in this field, and the choice of search algorithm is critical to application performance, selecting one can be time-consuming. To make this easier, he provided a practical guide for choosing the best algorithm and similarity search library for a given task, depending on the database size and the vector dimensionality.
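
The guide itself is not reproduced here. As a minimal sketch, assuming vectors are plain Python lists, the exact linear-scan baseline below shows why database size N and dimensionality D drive the choice: every query touches all N·D values, and merely storing one billion uncompressed 512-dimensional float32 vectors takes about 2 TB. The function names are illustrative and do not come from any particular library.

```python
import heapq
from typing import List, Sequence, Tuple


def l2_sq(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(query: Sequence[float], database: List[Sequence[float]],
              k: int) -> List[Tuple[int, float]]:
    """Exact k-NN by linear scan: visits all N vectors of dimension D, so O(N * D) per query."""
    scored = ((l2_sq(query, vec), idx) for idx, vec in enumerate(database))
    # heapq.nsmallest keeps only k candidates while streaming over the database.
    best = heapq.nsmallest(k, scored)
    return [(idx, dist) for dist, idx in best]


def raw_index_bytes(n: int, d: int, bytes_per_value: int = 4) -> int:
    """Memory needed to store N uncompressed D-dimensional float32 vectors."""
    return n * d * bytes_per_value


if __name__ == "__main__":
    db = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [0.1, 0.0]]
    print(exact_knn([0.0, 0.0], db, k=2))       # indices 0 and 3 are nearest
    print(raw_index_bytes(10**9, 512) / 1e12)   # 2.048 (terabytes)
```

Approximate methods, such as graph-based indexes or compressed vector codes, trade a small amount of accuracy to avoid this full scan and memory footprint.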

A Large-scale Visual Search System in the C2C Marketplace App Mercari

Takuma Yamaguchi presented an example of how such an algorithm is used in Mercari, an online C2C marketplace app with over one billion listings and over 16 million monthly active users. He showed how to productionize a highly scalable, highly available visual search system for the app on Kubernetes. He also introduced a technique for handling an issue specific to C2C marketplaces, which caused general deep learning-based feature extraction to perform poorly.
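
The serving architecture is not detailed here, so the following is only a hypothetical sketch of one common pattern for scaling similarity search: split the index into shards, send each query to every shard, and merge the per-shard top-k results. The `Shard` class and `sharded_search` function are illustrative names. In a Kubernetes deployment each shard might run as its own replicated service for availability, but that is an assumption, not a description of Mercari's system.

```python
import heapq
from itertools import islice
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[float, str]  # (squared L2 distance, item_id)


class Shard:
    """Hypothetical shard holding one slice of the item vectors in memory."""

    def __init__(self, items: Dict[str, Sequence[float]]) -> None:
        self.items = items

    def search(self, query: Sequence[float], k: int) -> List[Hit]:
        # Local exact top-k within this shard, sorted by ascending distance.
        scored = (
            (sum((q - v) ** 2 for q, v in zip(query, vec)), item_id)
            for item_id, vec in self.items.items()
        )
        return heapq.nsmallest(k, scored)


def sharded_search(shards: List[Shard], query: Sequence[float], k: int) -> List[Hit]:
    """Fan the query out to every shard and merge the per-shard top-k lists."""
    partial = [shard.search(query, k) for shard in shards]
    # Each partial list is already sorted, so heapq.merge yields a globally sorted stream.
    return list(islice(heapq.merge(*partial), k))
```

Merging only each shard's local top-k is sufficient, because any item in the global top-k must also be in the top-k of its own shard.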

Beyond Intra-modality Discrepancy: A Survey of Heterogeneous Person Re-identification

Zheng Wang presented a systematic review of heterogeneous person re-identification, where the main challenge is the discrepancy between modalities. The survey covered four cross-modality application scenarios: low resolution (LR), infrared (IR), sketch, and text. It also included the latest work presented at CVPR 2020. Finally, he introduced the datasets available in each category and compared and summarized the representative approaches.
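
As a simplified sketch of how such approaches are commonly compared, the code below computes Rank-1 accuracy and mean average precision (mAP) for cross-modality retrieval. It assumes that queries from one modality (for example, infrared) and gallery images from another have already been embedded into a shared feature space. The function names are hypothetical, and the sketch ignores the camera-based gallery filtering that many re-identification evaluation protocols apply.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance in the shared embedding space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def evaluate_cross_modal(
    query_feats: List[Sequence[float]], query_ids: List[int],
    gallery_feats: List[Sequence[float]], gallery_ids: List[int],
) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mAP) for queries of one modality against a gallery of another."""
    rank1_hits = 0
    ap_sum = 0.0
    for qf, qid in zip(query_feats, query_ids):
        # Rank gallery items from nearest to farthest.
        order = sorted(range(len(gallery_feats)), key=lambda j: sq_dist(qf, gallery_feats[j]))
        matches = [gallery_ids[j] == qid for j in order]
        if matches and matches[0]:
            rank1_hits += 1
        # Average precision: mean of precision@rank at each correct match.
        hits = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precisions.append(hits / rank)
        if precisions:
            ap_sum += sum(precisions) / len(precisions)
    n = len(query_ids)
    return rank1_hits / n, ap_sum / n
```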

Live-coding Demo to Implement an Image Search Engine from Scratch

Yusuke Matsui gave a live-coding demo in which he implemented an image search engine from scratch within 30 minutes, without copying and pasting any code. Only 100 lines of Python code, built on a pre-trained deep learning model, were needed to realize an image search web API. The demo should be very useful for anyone building their own image search system for the first time.
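
The demo code itself is not reproduced here. The following is a minimal, standard-library-only sketch of the same idea under stated assumptions: `extract_feature` is a hypothetical toy stand-in (an L2-normalized byte histogram) for the pre-trained deep learning model, and the `/search` endpoint, `ImageIndex` class, and `serve` function are illustrative names. Swapping in real deep features would leave the indexing and search logic unchanged.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Tuple


def extract_feature(image_bytes: bytes, bins: int = 16) -> List[float]:
    """Hypothetical stand-in for a pre-trained model: L2-normalized histogram of byte values."""
    hist = [0.0] * bins
    for b in image_bytes:
        hist[b * bins // 256] += 1.0
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


class ImageIndex:
    """In-memory index mapping image names to feature vectors."""

    def __init__(self) -> None:
        self.features: Dict[str, List[float]] = {}

    def add(self, name: str, image_bytes: bytes) -> None:
        self.features[name] = extract_feature(image_bytes)

    def search(self, image_bytes: bytes, k: int = 3) -> List[Tuple[str, float]]:
        q = extract_feature(image_bytes)
        # Cosine similarity reduces to a dot product because features are L2-normalized.
        scored = [(name, sum(a * b for a, b in zip(q, f))) for name, f in self.features.items()]
        scored.sort(key=lambda t: t[1], reverse=True)
        return scored[:k]


INDEX = ImageIndex()


class SearchHandler(BaseHTTPRequestHandler):
    """POST raw image bytes to /search; the response is the top matches as JSON."""

    def do_POST(self) -> None:
        if self.path != "/search":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        results = INDEX.search(self.rfile.read(length))
        body = json.dumps([{"name": n, "score": s} for n, s in results]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Block forever, serving search requests against INDEX."""
    HTTPServer((host, port), SearchHandler).serve_forever()
```

After adding images with `INDEX.add(...)` and calling `serve()`, a request such as `curl --data-binary @query.jpg http://127.0.0.1:8000/search` returns the nearest images as JSON.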


CVPR 2020 was held as a virtual conference. All of us ran the tutorial from Japan, where it started at 12:30 a.m. The late hour was a minor matter; before the tutorial, we were more concerned about network or machine trouble, microphone quality, the number of participants, and so on.

Fortunately, the tutorial went smoothly: we had many participants, lively discussions, and no network or machine trouble. We were also very happy to hear that some participants were satisfied with the content.

Thank you to all the participants and the CVPR 2020 organizers.

Takuma Yamaguchi (Kumon)
Engineering Manager/Machine Learning Engineer at Mercari.