2019

Journal article Open Access

Learning visual features for relational CBIR

Messina N., Amato G., Carrara F., Falchi F., Gennaro C.

Deep learning; Relation networks; Library and Information Sciences; Media Technology; Information Systems; CLEVR; Content-based image retrieval; Deep features; Relational reasoning

Recent work in deep learning has highlighted the remarkable relational reasoning capabilities of some carefully designed architectures. In this work, we employ a relationship-aware deep learning model to extract compact visual features that are used as relational image descriptors. In particular, we are interested in relational content-based image retrieval (R-CBIR), a task that consists of finding images containing similar inter-object relationships. Inspired by the relation networks (RN) employed in relational visual question answering (R-VQA), we present novel architectures that explicitly capture relational information from images in the form of network activations, which can subsequently be extracted and used as visual features. We describe a two-stage relation network module (2S-RN), trained on the R-VQA task, that is able to collect non-aggregated visual features. Then, we propose the aggregated visual features relation network (AVF-RN) module, which produces better relationship-aware features by learning the aggregation directly inside the network. To rank images in a relational fashion, we employ an R-CBIR ground truth built by exploiting the scene-graph similarities available in the CLEVR dataset. Experiments show that features extracted from our 2S-RN model improve retrieval performance with respect to standard non-relational methods. Moreover, we demonstrate that the features extracted from the novel AVF-RN further improve the performance measured on the R-CBIR task, reaching the state of the art on the proposed dataset.
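The general idea of pooling pairwise object relations into one descriptor and ranking images by descriptor similarity can be illustrated with a toy sketch. It is not the 2S-RN or AVF-RN architecture: it assumes the usual relation-network formulation, in which a function is applied to every ordered pair of objects and the results are summed, and it replaces the learned pairwise function with a fixed absolute coordinate difference. The names `pair_relation`, `relational_feature`, `cosine` and `rank_images`, as well as the toy scenes, are hypothetical and introduced only for illustration.

```python
import math
from itertools import permutations


def pair_relation(o_i, o_j):
    # Toy stand-in (assumption) for a learned pairwise function:
    # the absolute difference between two object vectors.
    return [abs(a - b) for a, b in zip(o_i, o_j)]


def relational_feature(objects):
    # Sum the pairwise relation vectors over all ordered pairs of
    # distinct objects to obtain one descriptor per image.
    feature = [0.0] * len(objects[0])
    for o_i, o_j in permutations(objects, 2):
        for k, value in enumerate(pair_relation(o_i, o_j)):
            feature[k] += value
    return feature


def cosine(u, v):
    # Cosine similarity; defined as 0.0 when either vector is all zeros.
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def rank_images(query_feature, database):
    # Return image names sorted from most to least similar to the query.
    return sorted(
        database,
        key=lambda name: cosine(query_feature, database[name]),
        reverse=True,
    )


# Hypothetical scenes: each object is reduced to a 2-D position.
scene_a = [[0.0, 0.0], [1.0, 0.0]]  # two objects side by side
scene_b = [[5.0, 5.0], [6.0, 5.0]]  # same arrangement, shifted
scene_c = [[0.0, 0.0], [0.0, 3.0]]  # two objects stacked vertically

database = {
    "b": relational_feature(scene_b),
    "c": relational_feature(scene_c),
}
ranking = rank_images(relational_feature(scene_a), database)
```

In this toy setting, the scene with the same relative arrangement as the query is ranked first even though its objects sit elsewhere in the image, which is the kind of behaviour a relational descriptor is meant to provide.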

Source: International journal of multimedia information retrieval Print 9 (2019): 113–124. doi:10.1007/s13735-019-00178-7

Publisher: Springer, London, United Kingdom

Cite as

BibTeX entry

@article{oai:it.cnr:prodotti:416050,
	title = {Learning visual features for relational CBIR},
	author = {Messina, N. and Amato, G. and Carrara, F. and Falchi, F. and Gennaro, C.},
	publisher = {Springer, London, United Kingdom},
	doi = {10.1007/s13735-019-00178-7},
	journal = {International journal of multimedia information retrieval Print},
	volume = {9},
	pages = {113--124},
	year = {2019}
}