2018

Journal article Open Access

Aggregating binary local descriptors for image retrieval

Amato G., Falchi F., Vadicamo L.

Keywords: Binary local feature; Convolutional neural network; Hardware and Architecture; Computer Networks and Communications; Bag of words; VLAD; Computer Vision and Pattern Recognition (cs.CV); FOS: Computer and information sciences; Fisher vector; Media Technology; Software; Content-based image retrieval; Computer Science - Computer Vision and Pattern Recognition

Content-Based Image Retrieval based on local features is computationally expensive because of the complexity of both extracting and matching local features. On one hand, the cost of extracting, representing, and comparing local visual descriptors has been dramatically reduced by recently proposed binary local features. On the other hand, aggregation techniques provide a meaningful summarization of all the features extracted from an image into a single descriptor, allowing us to speed up and scale up the image search. Only a few recent works have combined these two research directions, defining aggregation methods for binary local features in order to leverage the advantages of both approaches. In this paper, we report an extensive comparison among state-of-the-art aggregation methods applied to binary features. Then, we mathematically formalize the application of Fisher Kernels to Bernoulli Mixture Models. Finally, we investigate the combination of the aggregated binary features with the emerging Convolutional Neural Network (CNN) features. Our results show that aggregation methods on binary features are effective and represent a worthwhile alternative to direct matching. Moreover, the combination of the CNN with the Fisher Vector (FV) built upon binary features allowed us to obtain a relative improvement over the CNN results that is in line with that recently obtained using the combination of the CNN with the FV built upon SIFTs. The advantage of using the FV built upon binary features is that the extraction process of binary features is about two orders of magnitude faster than that of SIFTs.
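As a rough illustration of what aggregating binary local features into a single descriptor can look like, the sketch below builds a bag-of-words histogram by assigning each binary descriptor to its nearest codeword under the Hamming distance. It is a minimal toy example, not the authors' implementation: the function names (`hamming`, `bow_histogram`), the 8-bit descriptors, and the hand-picked codebook are all assumptions made for illustration. Real binary descriptors are usually hundreds of bits long, and the codebook would be learned from training data.

```python
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def bow_histogram(descriptors: Sequence[int], codebook: Sequence[int]) -> List[float]:
    """Aggregate binary local descriptors into one L1-normalized histogram.

    Each descriptor votes for its nearest codeword under the Hamming
    distance (ties go to the lowest codeword index).
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: hamming(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical toy data: three 8-bit codewords and four 8-bit local descriptors.
codebook = [0b00000000, 0b11111111, 0b11110000]
image_descriptors = [0b00000001, 0b11111110, 0b11100000, 0b11110001]

print(bow_histogram(image_descriptors, codebook))
```

Two images can then be compared through their fixed-length histograms instead of matching every pair of local descriptors, which is the speed-up that aggregation is meant to provide.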

Source: Multimedia tools and applications 77 (2018): 5385–5415. doi:10.1007/s11042-017-4450-2

Publisher: Kluwer Academic Publishers, Dordrecht, United States of America

Cite as

BibTeX entry

@article{oai:it.cnr:prodotti:378357,
	title = {Aggregating binary local descriptors for image retrieval},
	author = {Amato G. and Falchi F. and Vadicamo L.},
	publisher = {Kluwer Academic Publishers, Dordrecht ;, Stati Uniti d'America},
	doi = {10.1007/s11042-017-4450-2 and 10.48550/arxiv.1608.00813},
	journal = {Multimedia tools and applications},
	volume = {77},
	pages = {5385–5415},
	year = {2018}
}