Messina N., Amato G., Carrara F., Falchi F., Gennaro C.
Deep learning, Relation networks, Library and Information Sciences, Media Technology, Information Systems, CLEVR, Content-based image retrieval, Deep features, Relational reasoning
Recent work in deep learning has highlighted the remarkable relational reasoning capabilities of some carefully designed architectures. In this work, we employ a relationship-aware deep learning model to extract compact visual features that are used as relational image descriptors. In particular, we are interested in relational content-based image retrieval (R-CBIR), a task that consists of finding images containing similar inter-object relationships. Inspired by the relation networks (RN) employed in relational visual question answering (R-VQA), we present novel architectures that explicitly capture relational information from images in the form of network activations, which can subsequently be extracted and used as visual features. We describe a two-stage relation network module (2S-RN), trained on the R-VQA task, that is able to collect non-aggregated visual features. We then propose the aggregated visual features relation network (AVF-RN) module, which produces better relationship-aware features by learning the aggregation directly inside the network. To rank images in a relational fashion, we employ an R-CBIR ground truth built by exploiting the scene-graph similarities available in the CLEVR dataset. Experiments show that features extracted from our 2S-RN model provide improved retrieval performance with respect to standard non-relational methods. Moreover, we demonstrate that the features extracted from the novel AVF-RN further improve performance on the R-CBIR task, reaching the state of the art on the proposed dataset.
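As a rough illustration of the retrieval step mentioned in the abstract, the sketch below ranks a small set of images by the similarity of their feature vectors to a query vector. It is not the authors' 2S-RN or AVF-RN model: the feature vectors are hypothetical toy values, the image names are invented, and cosine similarity is an assumed choice of metric that the abstract does not specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, database):
    """Return image names sorted from most to least similar to the query."""
    scores = {name: cosine_similarity(query, feat)
              for name, feat in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional features; in practice these would be
# activations extracted from a trained network.
database = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.7, 0.7, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]

ranking = rank_by_similarity(query, database)
print(ranking)
```

In an evaluation like the one the abstract describes, a ranking of this kind would be compared against the ground-truth ranking derived from scene-graph similarities.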
Source: International journal of multimedia information retrieval Print 9 (2019): 113–124. doi:10.1007/s13735-019-00178-7
Publisher: Springer, London, United Kingdom
@article{oai:it.cnr:prodotti:416050,
  title = {Learning visual features for relational CBIR},
  author = {Messina N. and Amato G. and Carrara F. and Falchi F. and Gennaro C.},
  publisher = {Springer, London, United Kingdom},
  doi = {10.1007/s13735-019-00178-7},
  journal = {International journal of multimedia information retrieval Print},
  volume = {9},
  pages = {113--124},
  year = {2019}
}