2019
Conference article  Open Access

Artpedia: a new visual-semantic dataset with visual and contextual sentences in the artistic domain

Stefanini M., Cornia M., Baraldi L., Corsini M., Cucchiara R.

Keywords: Cross-modal retrieval · Visual-semantic models · Cultural heritage

As vision and language techniques are increasingly applied to realistic images, there is growing interest in designing visual-semantic models suitable for more complex and challenging scenarios. In this paper, we address the problem of cross-modal retrieval of images and sentences from the artistic domain. To this end, we collect and manually annotate the Artpedia dataset, which contains paintings paired with textual sentences that describe either the visual content of the paintings or other contextual information. The problem is therefore not only to match images and sentences, but also to identify which sentences actually describe the visual content of a given image. We devise a visual-semantic model that jointly addresses these two challenges by exploiting the latent alignment between visual and textual chunks. Experimental evaluations, obtained by comparing our model to different baselines, demonstrate the effectiveness of our solution and highlight the challenges of the proposed dataset. The Artpedia dataset is publicly available at: http://aimagelab.ing.unimore.it/artpedia.
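The two subtasks described above, scoring image-sentence similarity in a joint embedding space and deciding which sentences are visual rather than contextual, can be sketched in a few lines. This is a minimal illustration only, not the authors' model: the toy embeddings, the cosine scoring, and the threshold-based visual/contextual split are all assumptions made for the sake of the example.

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_sentences(image_emb, sentence_embs):
    # Rank candidate sentences by similarity to the image,
    # as in cross-modal (image-to-text) retrieval.
    scores = [cosine_sim(image_emb, s) for s in sentence_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

def label_sentences(scores, threshold=0.5):
    # Split sentences into "visual" vs "contextual" by thresholding
    # their similarity score -- a simplistic stand-in for the paper's
    # joint classification of sentence type.
    return ["visual" if s >= threshold else "contextual" for s in scores]

# Toy 2-D vectors standing in for learned joint-space embeddings.
image = [1.0, 0.0]
sentences = [[1.0, 0.0],   # describes the visual content
             [0.0, 1.0],   # unrelated / contextual
             [0.6, 0.8]]   # partially visual
order, scores = rank_sentences(image, sentences)
labels = label_sentences(scores)
```

In a real system the embeddings would come from learned visual and textual encoders trained so that matching pairs score highly; the retrieval ranking and the visual/contextual decision would then be evaluated jointly, as the paper proposes.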

Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 11752, pp. 729-740. Trento, Italy, 9-13 September, 2019


BibTeX entry
@inproceedings{oai:iris.cnr.it:20.500.14243/525832,
	title = {Artpedia: a new visual-semantic dataset with visual and contextual sentences in the artistic domain},
	author = {Stefanini, M. and Cornia, M. and Baraldi, L. and Corsini, M. and Cucchiara, R.},
	doi = {10.1007/978-3-030-30645-8_66},
	booktitle = {Lecture Notes in Computer Science, vol. 11752},
	pages = {729--740},
	address = {Trento, Italy},
	year = {2019}
}