11 results
2021 Journal article Open Access OPEN

Solving the same-different task with convolutional neural networks
Messina N., Amato G., Carrara F., Gennaro C., Falchi F.
Deep learning has demonstrated major abilities in solving many kinds of real-world problems in the computer vision literature. However, deep networks still struggle with simple reasoning tasks that humans consider easy to solve. In this work, we probe current state-of-the-art convolutional neural networks on a difficult set of tasks known as the same-different problems. All the problems share the same prerequisite for a correct solution: understanding whether two random shapes inside the same image are the same or not. With the experiments carried out in this work, we demonstrate that residual connections, and more generally skip connections, seem to have only a marginal impact on the learning of the proposed problems. In particular, we experiment with DenseNets, and we examine the contribution of residual and recurrent connections in two previously tested architectures, ResNet-18 and CorNet-S respectively. Our experiments show that older feed-forward networks, AlexNet and VGG, are almost unable to learn the proposed problems, except in some specific scenarios. We show that recently introduced architectures can converge even when important parts of their architecture are removed. We finally carry out some zero-shot generalization tests, and we discover that in these scenarios residual and recurrent connections can have a stronger impact on the overall test accuracy. On four difficult problems from the SVRT dataset, we reach state-of-the-art results with respect to previous approaches, obtaining super-human performance on three of the four problems.
Source: Pattern Recognition Letters 143 (2021): 75–80. doi:10.1016/j.patrec.2020.12.019
DOI: 10.1016/j.patrec.2020.12.019
Project(s): AI4EU via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | Pattern Recognition Letters Open Access | ISTI Repository Open Access | Pattern Recognition Letters Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted
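
For readers who want to reproduce the basic experimental setup, the same-different task reduces to binary image classification. Below is a minimal sketch, assuming PyTorch and torchvision; the data pipeline is a hypothetical placeholder, and the optimizer and learning rate are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal sketch: a same-different problem as binary image classification.
# Only the model setup reflects the idea in the abstract; the data loader
# for SVRT-style images is a hypothetical placeholder.
model = models.resnet18(weights=None)          # trained from scratch in this setting
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: "same" vs "different"

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step on a batch of images with same/different labels."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```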


2020 Conference article Restricted

Re-implementing and Extending Relation Network for R-CBIR
Messina N., Amato G., Falchi F.
Relational reasoning is an emerging theme in Machine Learning in general and in Computer Vision in particular. DeepMind recently proposed a module called Relation Network (RN) that has shown impressive results on visual question answering tasks. Unfortunately, the implementation of the proposed approach was not public. To reproduce their experiments and extend their approach in the context of Information Retrieval, we had to re-implement everything, testing many parameters and conducting many experiments. Our implementation is now public on GitHub, and it is already used by a large community of researchers. Furthermore, we recently presented a variant of the Relation Network module that we called Aggregated Visual Features RN (AVF-RN). This network can produce and aggregate, at inference time, compact visual relationship-aware features for the Relational-CBIR (R-CBIR) task. R-CBIR consists of retrieving images with given relationships among objects. In this paper, we discuss the details of our Relation Network implementation and report more experimental results than the original paper. Relational reasoning is a very promising topic for better understanding and retrieving inter-object relationships, especially in digital libraries.
Source: 16th Italian Research Conference on Digital Libraries, IRCDL 2020, pp. 82–92, Bari, Italy, 30-31/01/2020
DOI: 10.1007/978-3-030-39905-4_9

See at: academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | link.springer.com Restricted | CNR ExploRA Restricted
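
The Relation Network the paper re-implements follows the published structure from DeepMind: a function g scores every ordered pair of object features, conditioned on the question; the pair outputs are summed; and a function f maps the aggregate to an answer. A minimal PyTorch sketch follows; layer widths and the answer-vocabulary size are illustrative, not the exact configuration tested in the paper.

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Sketch of the Relation Network module: g scores all ordered pairs of
    object features (conditioned on the question), the pair outputs are
    summed, and f maps the aggregate to an answer distribution."""

    def __init__(self, obj_dim=24, q_dim=32, hidden=256, n_answers=28):
        super().__init__()
        self.g = nn.Sequential(
            nn.Linear(2 * obj_dim + q_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_answers),
        )

    def forward(self, objects, question):
        # objects: (batch, n_objects, obj_dim); question: (batch, q_dim)
        b, n, d = objects.shape
        o_i = objects.unsqueeze(2).expand(b, n, n, d)   # first element of each pair
        o_j = objects.unsqueeze(1).expand(b, n, n, d)   # second element of each pair
        q = question.unsqueeze(1).unsqueeze(1).expand(b, n, n, question.size(-1))
        pairs = torch.cat([o_i, o_j, q], dim=-1)        # all ordered object pairs
        relations = self.g(pairs).sum(dim=(1, 2))       # aggregate pairwise relations
        return self.f(relations)
```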


2020 Journal article Open Access OPEN

Virtual to real adaptation of pedestrian detectors
Ciampi L., Messina N., Falchi F., Gennaro C., Amato G.
Pedestrian detection through Computer Vision is a building block for a multitude of applications. Recently, there has been increasing interest in convolutional neural network-based architectures for executing such a task. One of the critical goals of these supervised networks is to generalize the knowledge learned during the training phase to new scenarios with different characteristics. A suitably labeled dataset is essential to achieve this purpose. The main problem is that manually annotating a dataset usually requires a lot of human effort and is costly. To this end, we introduce ViPeD (Virtual Pedestrian Dataset), a new synthetically generated set of images collected with the highly photo-realistic graphical engine of the video game GTA V (Grand Theft Auto V), where annotations are automatically acquired. However, when training solely on the synthetic dataset, the model experiences a Synthetic2Real domain shift, leading to a performance drop when applied to real-world images. To mitigate this gap, we propose two different domain adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection. Experiments show that the network trained with ViPeD can generalize over unseen real-world scenarios better than a detector trained on real-world data, exploiting the variety of our synthetic dataset. Furthermore, we demonstrate that our domain adaptation techniques can reduce the Synthetic2Real domain shift, making the two domains closer and improving performance when testing the network on real-world images.
Source: Sensors (Basel) 20 (2020). doi:10.3390/s20185250
DOI: 10.3390/s20185250

See at: Sensors Open Access | arXiv.org e-Print Archive Open Access | Europe PubMed Central Open Access | ISTI Repository Open Access | CNR ExploRA Open Access
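
The sketch below is a generic reading of the Synthetic2Real recipe the abstract describes: pre-train a detector on the synthetic images, then fine-tune on (fewer) real ones. It assumes PyTorch/torchvision; `viped_dataset` and `real_dataset` are hypothetical placeholders, and the paper's actual detector and adaptation techniques may differ.

```python
import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Generic Synthetic2Real sketch: pre-train on synthetic data, fine-tune on
# real data. Datasets are hypothetical placeholders yielding (image, target)
# pairs, where each target holds boxes and labels.
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)  # background + pedestrian
optimizer = torch.optim.SGD(model.parameters(), lr=5e-3, momentum=0.9)

def train(loader, epochs):
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            losses = model(list(images), list(targets))  # dict of detection losses
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# collate = lambda batch: tuple(zip(*batch))
# train(DataLoader(viped_dataset, batch_size=4, collate_fn=collate), epochs=10)  # synthetic pre-training
# train(DataLoader(real_dataset,  batch_size=4, collate_fn=collate), epochs=2)   # real-world fine-tuning
```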


2020 Report Open Access OPEN

AIMH research activities 2020
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Carrara F., Casarosa V., Ciampi L., Concordia C., Corbara S., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Thanos C., Trupiano L., Vadicamo L., Vairo C.
Annual Report of the Artificial Intelligence for Media and Humanities laboratory (AIMH) research activities in 2020.

See at: ISTI Repository Open Access | CNR ExploRA Open Access


2020 Conference article Open Access OPEN

Relational visual-textual information retrieval
Messina N.
With the advent of deep learning, multimedia information processing gained a huge boost, and astonishing results have been observed on a multitude of interesting visual-textual tasks. Relation networks paved the way towards an attentive processing methodology that considers images and texts as sets of basic interconnected elements (regions and words). These winning ideas recently helped to reach the state of the art on the image-text matching task. Cross-media information retrieval has been proposed as a benchmark to test the capabilities of the proposed networks to match complex multi-modal concepts in the same common space. Modern deep-learning-powered networks are complex, and almost none of them can provide concise multi-modal descriptions usable in fast multi-modal search engines. In fact, the latest image-sentence matching networks use cross-attention and early-fusion approaches, which force all the elements of the database to be considered at query time. In this work, I try to lay down some ideas to bridge the gap between the effectiveness of modern deep-learning multi-modal matching architectures and their efficiency, as far as fast and scalable visual-textual information retrieval is concerned.
Source: SISAP 2020 - 13th International Conference on Similarity Search and Applications, pp. 405–411, Copenhagen, Denmark, September 30 - October 2, 2020
DOI: 10.1007/978-3-030-60936-8_33

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | link.springer.com Restricted | CNR ExploRA Restricted
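
The efficiency gap the abstract points at can be made concrete: with a common embedding space, database images are encoded once offline and a text query reduces to a single similarity scan (or an index lookup), whereas cross-attention models must re-process every database item per query. A minimal NumPy sketch; `encode_images` and `encode_text` are hypothetical stand-ins for the visual and textual encoders.

```python
import numpy as np

def search(text_embedding: np.ndarray, image_embeddings: np.ndarray, k: int = 10):
    """Return indices of the k images closest to the query in the common space."""
    # Normalize so the dot product equals cosine similarity.
    q = text_embedding / np.linalg.norm(text_embedding)
    db = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    scores = db @ q                   # one vector-matrix product per query
    return np.argsort(-scores)[:k]    # top-k by similarity

# image_embeddings = encode_images(dataset)   # precomputed offline and stored
# results = search(encode_text("a dog on a skateboard"), image_embeddings)
```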


2019 Conference article Open Access OPEN

Learning relationship-aware visual features
Messina N., Amato G., Carrara F., Falchi F., Gennaro C.
Relational reasoning in Computer Vision has recently shown impressive results on visual question answering tasks. On the challenging CLEVR dataset, the recently proposed Relation Network (RN), a simple plug-and-play module and one of the state-of-the-art approaches, has obtained very good accuracy (95.5%) answering relational questions. In this paper, we define a sub-field of Content-Based Image Retrieval (CBIR) called Relational-CBIR (R-CBIR), in which we are interested in retrieving images with given relationships among objects. To this aim, we employ the RN architecture to extract relation-aware features from CLEVR images. To prove the effectiveness of these features, we extended both the CLEVR and Sort-of-CLEVR datasets, generating a ground truth for R-CBIR by exploiting relational data embedded in scene graphs. Furthermore, we propose a modification of the RN module - a two-stage Relation Network (2S-RN) - that enabled us to extract relation-aware features by using a preprocessing stage able to focus on the image content, leaving the question aside. Experiments show that our RN features, especially the 2S-RN ones, outperform the state-of-the-art R-MAC features on this new challenging task.
Source: ECCV 2018 - European Conference on Computer Vision, pp. 486–501, Munich, Germany, 8-14 September 2018
DOI: 10.1007/978-3-030-11018-5_40

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | doi.org Restricted | link.springer.com Restricted | openaccess.thecvf.com Restricted | CNR ExploRA Restricted
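
The two-stage split described above can be sketched as follows: a first, question-agnostic stage computes pairwise relational activations from the image alone (these are the relation-aware features extracted for R-CBIR), and a second stage injects the question for the VQA objective. A PyTorch sketch with illustrative sizes, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class TwoStageRN(nn.Module):
    """Sketch of the two-stage Relation Network (2S-RN) idea: stage one is
    vision-only and question-agnostic; stage two injects the question."""

    def __init__(self, obj_dim=24, q_dim=32, hidden=256, n_answers=28):
        super().__init__()
        self.g1 = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU())     # vision-only stage
        self.g2 = nn.Sequential(nn.Linear(hidden + q_dim, hidden), nn.ReLU())  # question-aware stage
        self.f = nn.Linear(hidden, n_answers)

    def pair_activations(self, objects):
        # objects: (batch, n_objects, obj_dim) -> (batch, n, n, hidden)
        b, n, d = objects.shape
        o_i = objects.unsqueeze(2).expand(b, n, n, d)
        o_j = objects.unsqueeze(1).expand(b, n, n, d)
        return self.g1(torch.cat([o_i, o_j], dim=-1))

    def extract_visual_features(self, objects):
        # Question-agnostic, relation-aware activations usable as R-CBIR features.
        return self.pair_activations(objects)

    def forward(self, objects, question):
        h = self.pair_activations(objects)
        q = question.unsqueeze(1).unsqueeze(1).expand(*h.shape[:3], question.size(-1))
        return self.f(self.g2(torch.cat([h, q], dim=-1)).sum(dim=(1, 2)))
```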


2019 Conference article Open Access OPEN

Intelligenza Artificiale per Ricerca in Big Multimedia Data
Carrara F., Amato G., Debole F., Di Benedetto M., Falchi F., Gennaro C., Messina N.
The widespread production of digital images and media has made automatic, large-scale analysis and indexing methods necessary for their use. The AIMIR group of ISTI-CNR has specialized in this field for years and has embraced Deep Learning techniques based on artificial neural networks for many aspects of this discipline, such as the analysis, automatic annotation, and description of visual content, and its retrieval on a large scale.
Source: Ital-IA, Rome, Italy, 18-19/3/2019

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.ital-ia.it Open Access


2019 Conference article Open Access OPEN

Testing Deep Neural Networks on the Same-Different Task
Messina N., Amato G., Carrara F., Falchi F., Gennaro C.
Developing abstract reasoning abilities in neural networks is an important goal towards the achievement of human-like performance on many tasks. As of now, some works have tackled this problem, developing ad-hoc architectures and reaching good overall generalization performance. In this work we try to understand to what extent state-of-the-art convolutional neural networks for image classification are able to deal with a challenging abstract problem, the so-called same-different task. This problem consists of understanding whether two random shapes inside the same image are the same or not. A recent work demonstrated that simple convolutional neural networks are almost unable to solve this problem. We extend their work, showing that ResNet-inspired architectures are able to learn, while VGG cannot converge. In light of this, we suppose that residual connections play an important role in the learning process, while the depth of the network seems less relevant. In addition, we carry out some targeted tests on the converged architectures to figure out to what extent they are able to generalize to never-seen patterns. However, further investigation is needed to understand the architectural peculiarities and limits as far as abstract reasoning is concerned.
Source: 2019 International Conference on Content-Based Multimedia Indexing (CBMI), Dublin, Ireland, 4-6/9/2019
DOI: 10.1109/cbmi.2019.8877412
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA Restricted | xplorestaging.ieee.org Restricted


2019 Conference article Open Access OPEN

Learning pedestrian detection from virtual worlds
Amato G., Ciampi L., Falchi F., Gennaro C., Messina N.
In this paper, we present a real-time pedestrian detection system trained using a virtual environment. This is a very popular research topic with countless practical applications, and recently there has been increasing interest in deep learning architectures for performing such a task. However, the availability of large labeled datasets is a key point for the effective training of such algorithms. For this reason, in this work we introduce ViPeD, a new synthetically generated set of images extracted from a realistic 3D video game, where the labels can be automatically generated by exploiting 2D pedestrian positions extracted from the graphics engine. We exploit this new synthetic dataset by fine-tuning a state-of-the-art, computationally efficient Convolutional Neural Network (CNN). A preliminary experimental evaluation, compared with the performance of other existing approaches trained on real-world images, shows encouraging results.
Source: Image Analysis and Processing - ICIAP 2019, pp. 302–312, Trento, Italy, 9-13/9/2019
DOI: 10.1007/978-3-030-30642-7_27
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | doi.org Restricted | link.springer.com Restricted | CNR ExploRA Restricted | rd.springer.com Restricted


2019 Report Open Access OPEN

AIMIR 2019 Research Activities
Amato G., Bolettieri P., Carrara F., Ciampi L., Di Benedetto M., Debole F., Falchi F., Gennaro C., Lagani G., Massoli F. V., Messina N., Rabitti F., Savino P., Vadicamo L., Vairo C.
The Artificial Intelligence for Multimedia Information Retrieval (AIMIR) research group is part of the NeMIS laboratory of the Information Science and Technologies Institute "A. Faedo" (ISTI) of the Italian National Research Council (CNR). The AIMIR group has long experience in topics related to Artificial Intelligence, Multimedia Information Retrieval, Computer Vision, and large-scale similarity search. We aim at investigating the use of Artificial Intelligence and Deep Learning for Multimedia Information Retrieval, addressing both effectiveness and efficiency. Multimedia information retrieval techniques should be able to provide users with pertinent results, fast, on huge amounts of multimedia data. Application areas of our research results range from cultural heritage to smart tourism, from security to smart cities, from mobile visual search to augmented reality. This report summarizes the 2019 activities of the research group.
Source: AIMIR Annual Report, 2019

See at: ISTI Repository Open Access | CNR ExploRA Open Access


2019 Journal article Open Access OPEN

Learning visual features for relational CBIR
Messina N., Amato G., Carrara F., Falchi F., Gennaro C.
Recent works in deep-learning research highlighted remarkable relational reasoning capabilities of some carefully designed architectures. In this work, we employ a relationship-aware deep learning model to extract compact visual features used as relational image descriptors. In particular, we are interested in relational content-based image retrieval (R-CBIR), a task consisting of finding images containing similar inter-object relationships. Inspired by the relation networks (RN) employed in relational visual question answering (R-VQA), we present novel architectures to explicitly capture relational information from images in the form of network activations that can subsequently be extracted and used as visual features. We describe a two-stage relation network module (2S-RN), trained on the R-VQA task, able to collect non-aggregated visual features. Then, we propose the aggregated visual features relation network (AVF-RN) module, which is able to produce better relationship-aware features by learning the aggregation directly inside the network. We employ an R-CBIR ground truth built by exploiting scene-graph similarities available in the CLEVR dataset in order to rank images in a relational fashion. Experiments show that features extracted from our 2S-RN model provide improved retrieval performance with respect to standard non-relational methods. Moreover, we demonstrate that the features extracted from the novel AVF-RN can further improve the performance measured on the R-CBIR task, reaching the state of the art on the proposed dataset.
Source: International Journal of Multimedia Information Retrieval 9 (2019): 113–124. doi:10.1007/s13735-019-00178-7
DOI: 10.1007/s13735-019-00178-7
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | International Journal of Multimedia Information Retrieval Restricted | CNR ExploRA Restricted
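
The AVF-RN idea of learning the aggregation inside the network could look like the following attention-weighted pooling over pairwise relation activations; this is an assumed stand-in for illustration, not the published AVF-RN design. The resulting compact descriptors can then be compared with cosine similarity to rank images for R-CBIR.

```python
import torch
import torch.nn as nn

class LearnedAggregation(nn.Module):
    """Hypothetical stand-in for in-network aggregation: pool pairwise
    relation activations with learned attention weights into one compact,
    relationship-aware descriptor per image."""

    def __init__(self, hidden=256, out_dim=128):
        super().__init__()
        self.score = nn.Linear(hidden, 1)       # learned importance of each pair
        self.proj = nn.Linear(hidden, out_dim)  # compact descriptor for indexing

    def forward(self, pair_feats):
        # pair_feats: (batch, n_pairs, hidden) relation activations (e.g., from g)
        w = torch.softmax(self.score(pair_feats), dim=1)  # (batch, n_pairs, 1)
        pooled = (w * pair_feats).sum(dim=1)              # attention-weighted pooling
        return self.proj(pooled)                          # (batch, out_dim)
```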