Large-scale image-to-video face retrieval with convolutional neural network features

Imane Hachchane, Abdelmajid Badri, Aïcha Sahel, Yassine Ruichek


Convolutional neural network (CNN) features are becoming the norm in instance retrieval. This work investigates the relevance of using an off-the-shelf object detection network, such as Faster R-CNN, as a feature extractor for an image-to-video face retrieval pipeline, in place of hand-crafted features. We use the object proposals learned by a Region Proposal Network (RPN), together with their associated CNN representations, for the filtering and re-ranking steps. We also study the relevance of features from a fine-tuned network, explore the use of face detection, Fisher vectors (FV), and bag of visual words (BOVW) with those CNN features, and test the impact of different similarity metrics. The results obtained are very promising.
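As a rough illustration of the two-stage idea described above (filtering, then re-ranking), the following minimal sketch ranks video frames against a query descriptor. It is not the authors' implementation. It assumes that region features have already been extracted, that the frame-level descriptor is obtained by sum pooling, and that cosine similarity is the metric; the function names (`filter_stage`, `rerank_stage`, `sum_pool`) and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def sum_pool(regions):
    """Aggregate region descriptors into one frame-level descriptor (assumed pooling)."""
    dim = len(regions[0])
    return [sum(r[i] for r in regions) for i in range(dim)]


def filter_stage(query, frames, top_n):
    """Filtering: rank all frames by their pooled descriptor, keep the top_n."""
    scored = [(cosine(query, sum_pool(regs)), fid) for fid, regs in frames.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [fid for _, fid in scored[:top_n]]


def rerank_stage(query, frames, candidates):
    """Re-ranking: score each shortlisted frame by its best-matching region."""
    scored = [(max(cosine(query, r) for r in frames[fid]), fid) for fid in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [fid for _, fid in scored]


# Toy data: each frame maps to a list of (hypothetical) region descriptors.
frames = {
    "f1": [[1.0, 0.0], [0.0, 1.0]],
    "f2": [[0.9, 0.1], [0.8, 0.2]],
    "f3": [[0.0, 1.0], [0.1, 0.9]],
}
query = [1.0, 0.0]

shortlist = filter_stage(query, frames, top_n=2)
ranked = rerank_stage(query, frames, shortlist)
```

In this toy case the pooled descriptor favours frame "f2" during filtering, while the region-level comparison moves "f1" to the top because one of its regions matches the query exactly. This is the kind of reordering a re-ranking step is meant to provide.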


Image Processing; Classification; Object Recognition; CNN; Faster R-CNN; Image-To-Video Instance Retrieval; Face Retrieval; Video Retrieval; FV; BOVW



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.