190 result(s)
2021 Journal article Restricted

Re-ranking via local embeddings: A use case with permutation-based indexing and the nSimplex projection
Vadicamo L., Gennaro C., Falchi F., Chavez E., Connor R., Amato G.
Approximate Nearest Neighbor (ANN) search is a prevalent paradigm for searching intrinsically high-dimensional objects in large-scale data sets. Recently, the permutation-based approach for ANN has attracted a lot of interest due to its versatility in being used in the more general class of metric spaces. In this approach, the entire database is ranked by a permutation distance to the query. Permutations allow the efficient selection of a candidate set of results, but typically, to achieve high recall or precision, this set has to be reviewed using the original metric and data. This can lead to a sizeable percentage of the database being recalled, along with many expensive distance calculations. To reduce the number of metric computations and the number of database elements accessed, we propose here a re-ranking based on a local embedding using the nSimplex projection. The nSimplex projection produces Euclidean vectors from objects in metric spaces that possess the n-point property. The mapping is obtained from the distances to a set of reference objects, and the original metric can be lower- and upper-bounded by the Euclidean distance between objects sharing the same set of references. Our approach is particularly advantageous for extensive databases or expensive metric functions. We reuse the distances computed in the permutations in the first stage, so the memory footprint of the index is not increased. An extensive experimental evaluation of our approach is presented, demonstrating excellent results even on a set of hundreds of millions of objects.
Source: Information Systems (Oxf.) 95 (2021). doi:10.1016/j.is.2020.101506
DOI: 10.1016/j.is.2020.101506
Project(s): AI4EU via OpenAIRE

See at: Information Systems Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted

2021 Journal article Open Access OPEN

Solving the same-different task with convolutional neural networks
Messina N., Amato G., Carrara F., Gennaro C., Falchi F.
Deep learning models have demonstrated major abilities in solving many kinds of real-world problems in the computer vision literature. However, they are still strained by simple reasoning tasks that humans consider easy to solve. In this work, we probe current state-of-the-art convolutional neural networks on a difficult set of tasks known as the same-different problems. All the problems require the same prerequisite to be solved correctly: understanding whether two random shapes inside the same image are the same or not. With the experiments carried out in this work, we demonstrate that residual connections, and more generally skip connections, seem to have only a marginal impact on the learning of the proposed problems. In particular, we experiment with DenseNets, and we examine the contribution of residual and recurrent connections in already tested architectures, ResNet-18 and CorNet-S respectively. Our experiments show that older feed-forward networks, AlexNet and VGG, are almost unable to learn the proposed problems, except in some specific scenarios. We show that recently introduced architectures can converge even in the cases where important parts of their architecture are removed. We finally carry out some zero-shot generalization tests, and we discover that in these scenarios residual and recurrent connections can have a stronger impact on the overall test accuracy. On four difficult problems from the SVRT dataset, we reach state-of-the-art results with respect to previous approaches, obtaining super-human performance on three of the four problems.
Source: Pattern Recognition Letters 143 (2021): 75–80. doi:10.1016/j.patrec.2020.12.019
DOI: 10.1016/j.patrec.2020.12.019
Project(s): AI4EU via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | Pattern Recognition Letters Open Access | ISTI Repository Open Access | Pattern Recognition Letters Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted

2021 Conference article Open Access OPEN

Defending Neural ODE Image Classifiers from Adversarial Attacks with Tolerance Randomization
Carrara F., Caldelli R., Falchi F., Amato G.
Deep learned models are now largely adopted in different fields, and they generally provide superior performance with respect to classical signal-based approaches. Notwithstanding this, their actual reliability when working in an unprotected environment is still far from being proven. In this work, we consider a novel deep neural network architecture, named Neural Ordinary Differential Equations (N-ODE), that is getting particular attention due to an attractive property: a test-time tunable trade-off between accuracy and efficiency. This paper analyzes the robustness of N-ODE image classifiers when facing a strong adversarial attack and how their effectiveness changes when varying such a tunable trade-off. We show that adversarial robustness is increased when the networks operate in different tolerance regimes during test time and training time. On this basis, we propose a novel adversarial detection strategy for N-ODE nets based on the randomization of the adaptive ODE solver tolerance. Our evaluation, performed on standard image classification benchmarks, shows that our detection technique provides high rejection of adversarial examples while maintaining most of the original samples under white-box attacks and zero-knowledge adversaries.
Source: International Conference on Pattern Recognition ICPR 2021, pp. 425–438, Milano (virtual), 10-15/01/2021
DOI: 10.1007/978-3-030-68780-9_35
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA Restricted
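The detection strategy described above reduces to a few lines: query the classifier several times with randomly drawn solver tolerances and reject the input if the prediction is unstable. The interface below is hypothetical; a real implementation would integrate the N-ODE with an adaptive solver at each sampled tolerance.

```python
import random

def detect_adversarial(x, predict, n_runs=5, tol_range=(1e-4, 1e-1), seed=0):
    """Flag `x` as suspicious if the classifier's prediction changes across
    randomly drawn ODE-solver tolerances. `predict(x, tol)` is a stand-in
    for running the N-ODE classifier with adaptive-solver tolerance `tol`."""
    rng = random.Random(seed)
    preds = {predict(x, rng.uniform(*tol_range)) for _ in range(n_runs)}
    return len(preds) > 1  # unstable prediction -> likely adversarial
```

The intuition from the abstract is that adversarial examples are more sensitive than clean inputs to the tolerance regime, so a clean image yields the same label across runs while a perturbed one does not.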

2021 Journal article Open Access OPEN

TweepFake: about detecting deepfake tweets
Fagni T., Falchi F., Gambini M., Martella A., Tesconi M.
The recent advances in language modeling significantly improved the generative capabilities of deep neural models: in 2019 OpenAI released GPT-2, a pre-trained language model that can autonomously generate coherent, non-trivial and human-like text samples. Since then, ever more powerful text generative models have been developed. Adversaries can exploit these tremendous generative capabilities to enhance social bots that will have the ability to write plausible deepfake messages, hoping to contaminate public debate. To prevent this, it is crucial to develop deepfake social media message detection systems. However, to the best of our knowledge, no one has ever addressed the detection of machine-generated texts on social networks like Twitter or Facebook. With the aim of helping research in this detection field, we collected the first dataset of real deepfake tweets, TweepFake. It is real in the sense that each deepfake tweet was actually posted on Twitter. We collected tweets from a total of 23 bots, imitating 17 human accounts. The bots are based on various generation techniques, i.e., Markov Chains, RNN, RNN+Markov, LSTM, GPT-2. We also randomly selected tweets from the humans imitated by the bots to obtain an overall balanced dataset of 25,572 tweets (half human- and half bot-generated). The dataset is publicly available on Kaggle. Lastly, we evaluated 13 deepfake text detection methods (based on various state-of-the-art approaches) to both demonstrate the challenges that TweepFake poses and create a solid baseline of detection techniques. We hope that TweepFake can offer the opportunity to tackle deepfake detection on social media messages as well.
Source: PLoS ONE 16 (2021). doi:10.1371/journal.pone.0251415
DOI: 10.1371/journal.pone.0251415
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: journals.plos.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access | ZENODO Open Access
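A minimal hand-rolled baseline in the spirit of the classical detectors evaluated on such datasets is a nearest-centroid classifier over unigram counts. This is an illustrative toy, not one of the paper's 13 methods:

```python
from collections import Counter

def features(text):
    """Unigram counts; the paper's baselines range from classical ML on
    text features up to fine-tuned transformer language models."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def centroid(texts):
    c = Counter()
    for t in texts:
        c.update(features(t))
    return c

def classify(tweet, human_texts, bot_texts):
    """Label a tweet 'bot' if it is closer to the bot centroid than the human one."""
    h, b = centroid(human_texts), centroid(bot_texts)
    f = features(tweet)
    return "bot" if cosine(f, b) > cosine(f, h) else "human"
```

A balanced dataset like TweepFake (half human, half bot) makes plain accuracy a meaningful metric for comparing such baselines.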

2021 Journal article Open Access OPEN

The VISIONE video search system: exploiting off-the-shelf text search engines for large-scale video retrieval
Amato G., Bolettieri P., Carrara F., Debole F., Falchi F., Gennaro C., Vadicamo L., Vairo C.
This paper describes in detail VISIONE, a video search system that allows users to search for videos using textual keywords, the occurrence of objects and their spatial relationships, the occurrence of colors and their spatial relationships, and image similarity. These modalities can be combined together to express complex queries and meet users' needs. The peculiarity of our approach is that we encode all information extracted from the keyframes, such as visual deep features, tags, color and object locations, using a convenient textual encoding that is indexed in a single text retrieval engine. This offers great flexibility when results corresponding to various parts of the query (visual, text and locations) need to be merged. In addition, we report an extensive analysis of the retrieval performance of the system, using the query logs generated during the Video Browser Showdown (VBS) 2019 competition. This allowed us to fine-tune the system by choosing the optimal parameters and strategies from those we tested.
Source: Journal of Imaging 7 (2021). doi:10.3390/jimaging7050076
DOI: 10.3390/jimaging7050076

See at: ISTI Repository Open Access | CNR ExploRA Open Access | www.mdpi.com Open Access

2021 Conference article Open Access OPEN

Transformer reasoning network for image-text matching and retrieval
Messina N., Falchi F., Esuli A., Amato G.
Image-text matching is an interesting and fascinating task in modern AI research. Despite the evolution of deep-learning-based image and text processing systems, multi-modal matching remains a challenging problem. In this work, we consider the problem of accurate image-text matching for the task of multi-modal large-scale information retrieval. State-of-the-art results in image-text matching are achieved by interplaying image and text features from the two different processing pipelines, usually using mutual attention mechanisms. However, this invalidates any chance to extract separate visual and textual features needed for later indexing steps in large-scale retrieval systems. In this regard, we introduce the Transformer Encoder Reasoning Network (TERN), an architecture built upon one of the modern relationship-aware self-attentive architectures, the Transformer Encoder (TE). This architecture is able to separately reason on the two different modalities and to enforce a final common abstract concept space by sharing the weights of the deeper transformer layers. Thanks to this design, the implemented network is able to produce compact and very rich visual and textual features available for the successive indexing step. Experiments are conducted on the MS-COCO dataset, and we evaluate the results using a discounted cumulative gain metric with relevance computed exploiting caption similarities, in order to assess possibly non-exact but relevant search results. We demonstrate that on this metric we are able to achieve state-of-the-art results in the image retrieval task. Our code is freely available at https://github.com/mesnico/TERN.
Source: ICPR 2021 - International Conference on Pattern Recognition, pp. 5222–5229, Online conference, 10-15/01/2021
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: link.springer.com Open Access | ISTI Repository Open Access | CNR ExploRA Open Access

2021 Journal article Open Access OPEN

Hebbian semi-supervised learning in a sample efficiency setting
Lagani G., Falchi F., Gennaro C., Amato G.
We propose to address the issue of sample efficiency, in Deep Convolutional Neural Networks (DCNN), with a semi-supervised training strategy that combines Hebbian learning with gradient descent: all internal layers (both convolutional and fully connected) are pre-trained using an unsupervised approach based on Hebbian learning, and the last fully connected layer (the classification layer) is trained using Stochastic Gradient Descent (SGD). In fact, as Hebbian learning is an unsupervised learning method, its potential lies in the possibility of training the internal layers of a DCNN without labels. Only the final fully connected layer has to be trained with labeled examples. We performed experiments on various object recognition datasets, in different regimes of sample efficiency, comparing our semi-supervised (Hebbian for internal layers + SGD for the final fully connected layer) approach with end-to-end supervised backprop training, and with semi-supervised learning based on Variational Auto-Encoder (VAE). The results show that, in regimes where the number of available labeled samples is low, our semi-supervised approach outperforms the other approaches in almost all the cases.
Source: Neural Networks 143 (2021): 719–731. doi:10.1016/j.neunet.2021.08.003
DOI: 10.1016/j.neunet.2021.08.003
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: ISTI Repository Open Access | ZENODO Open Access | CNR ExploRA Restricted | www.sciencedirect.com Restricted
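The strategy above, unsupervised Hebbian pre-training of internal layers plus a supervised readout, can be sketched for a single linear layer using Oja's subspace rule. This is one Hebbian variant chosen here for brevity; the rules and architectures used in the paper differ, and all sizes and learning rates below are illustrative.

```python
import numpy as np

def hebbian_layer(X, n_units=4, lr=0.01, epochs=20, seed=0):
    """Unsupervised pre-training of one linear layer with Oja's subspace rule,
    a stabilised Hebbian update: dW = lr * y (x - y W). No labels are used."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_units, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x
            W += lr * np.outer(y, x - y @ W)
    return W

def train_readout(H, labels, lr=0.1, epochs=200):
    """Supervised logistic readout on the frozen Hebbian features,
    a stand-in for the SGD-trained final layer of the paper."""
    w = np.zeros(H.shape[1])
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(H @ w)))
        w += lr * H.T @ (labels - p) / len(labels)
    return w
```

Only the readout sees labels, which is exactly why the approach is attractive in low-label (sample-efficiency) regimes.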

2021 Conference article Open Access OPEN

AIMH at SemEval-2021 - Task 6: multimodal classification using an ensemble of transformer models
Messina N., Falchi F., Gennaro C., Amato G.
This paper describes the system used by the AIMH Team to approach SemEval Task 6. We propose an approach that relies on an architecture based on the transformer model to process multimodal content (text and images) in memes. Our architecture, called DVTT (Double Visual Textual Transformer), approaches Subtasks 1 and 3 of Task 6 as multi-label classification problems, where the text and/or images of the meme are processed, and the probabilities of the presence of each possible persuasion technique are returned as a result. DVTT uses two complete transformer networks that work on text and images and are mutually conditioned. One of the two modalities acts as the main one and the second one intervenes to enrich the first, thus obtaining two distinct modes of operation. The two transformers' outputs are merged by averaging the inferred probabilities for each possible label, and the overall network is trained end-to-end with a binary cross-entropy loss.
Source: SemEval-2021 - 15th International Workshop on Semantic Evaluation, pp. 1020–1026, Bangkok, Thailand, 5-6/08/2021
DOI: 10.18653/v1/2021.semeval-1.140
Project(s): AI4EU via OpenAIRE, AI4Media via OpenAIRE

See at: aclanthology.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access
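The late-fusion step described above (averaging the per-label probabilities of the two transformers, trained with binary cross-entropy) reduces to a few lines; the probability vectors below are placeholders for the two networks' sigmoid outputs:

```python
import math

def merge_predictions(p_text, p_image, threshold=0.5):
    """Average the per-label probabilities inferred by the two transformers,
    then threshold to obtain the multi-label output."""
    fused = [(a + b) / 2 for a, b in zip(p_text, p_image)]
    return fused, [i for i, p in enumerate(fused) if p >= threshold]

def bce(probs, targets, eps=1e-7):
    """Binary cross-entropy over the multi-label outputs (the training loss)."""
    return -sum(t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
                for p, t in zip(probs, targets)) / len(probs)
```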

2021 Conference article Open Access OPEN

Towards efficient cross-modal visual textual retrieval using transformer-encoder deep features
Messina N., Amato G., Falchi F., Gennaro C., Marchand-Maillet S.
Cross-modal retrieval is an important functionality in modern search engines, as it increases the user experience by allowing queries and retrieved objects to pertain to different modalities. In this paper, we focus on the image-sentence retrieval task, where the objective is to efficiently find relevant images for a given sentence (image-retrieval) or the relevant sentences for a given image (sentence-retrieval). Computer vision literature reports the best results on the image-sentence matching task using deep neural networks equipped with attention and self-attention mechanisms. They evaluate the matching performance on the retrieval task by performing sequential scans of the whole dataset. This method does not scale well with an increasing number of images or captions. In this work, we explore different preprocessing techniques to produce sparsified deep multi-modal features, extracting them from state-of-the-art deep-learning architectures for image-text matching. Our main objective is to lay down the paths for efficient indexing of complex multi-modal descriptions. We use the recently introduced TERN architecture as an image-sentence features extractor. It is designed for producing fixed-size 1024-d vectors describing whole images and sentences, as well as variable-length sets of 1024-d vectors describing the various building components of the two modalities (image regions and sentence words respectively). All these vectors are enforced by the TERN design to lie in the same common space. Our experiments show interesting preliminary results on the explored methods and suggest further experimentation in this important research direction.
Source: CBMI - International Conference on Content-Based Multimedia Indexing, Lille, France, 28-30/06/2021
DOI: 10.1109/cbmi50038.2021.9461890
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | ieeexplore.ieee.org Restricted | CNR ExploRA Restricted

2020 Conference article Open Access OPEN

Edge-Based Video Surveillance with Embedded Devices
Kavalionak H., Gennaro C., Amato G., Vairo C., Perciante C., Meghini C., Falchi F., Rabitti F.
Video surveillance systems have become indispensable tools for the security and organization of public and private areas. In this work, we propose a novel distributed protocol for an edge-based face recognition system that takes advantage of the computational capabilities of the surveillance devices (i.e., cameras) to perform person recognition. The cameras fall back to a centralized server if their hardware capabilities are not enough to perform the recognition. We evaluate the proposed algorithm via extensive experiments on a freely available dataset. As a prototype of surveillance embedded devices, we have considered a Raspberry Pi with the camera module. Using simulations, we show that our algorithm can reduce up to 50% of the load of the server with no negative impact on the quality of the surveillance service.
Source: 28th Symposium on Advanced Database Systems (SEBD), pp. 278–285, Villasimius, Sardinia, Italy, 21-24/06/2020

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access

2020 Conference article Open Access OPEN

Multi-Resolution Face Recognition with Drones
Amato G., Falchi F., Gennaro C., Massoli F. V., Vairo C.
Smart cameras have recently seen a large diffusion and represent a low-cost solution for improving public security in many scenarios. Moreover, they are light enough to be lifted by a drone. Face recognition enabled by drones equipped with smart cameras has already been reported in the literature. However, the use of the drone generally imposes tighter constraints than other facial recognition scenarios. First, weather conditions, such as the presence of wind, pose a severe limit on image stability. Moreover, the distance at which drones fly is typically much higher than that of fixed ground cameras, which inevitably translates into a degraded resolution of the face images. Furthermore, the drones' operational altitudes usually require the use of optical zoom, thus amplifying the harmful effects of their movements. For all these reasons, in drone scenarios, image degradation strongly affects the behavior of face detection and recognition systems. In this work, we studied the performance of deep neural networks for face re-identification specifically designed for low-quality images and applied them to a drone scenario using a publicly available dataset known as DroneSURF.
Source: 3rd International Conference on Sensors, Signal and Image Processing, pp. 13–18, Praga, Czech Republic (virtual), 23-25/10/2020
DOI: 10.1145/3441233.3441237

See at: ISTI Repository Open Access | dl.acm.org Restricted | CNR ExploRA Restricted

2020 Conference article Open Access OPEN

Scalar Quantization-Based Text Encoding for Large Scale Image Retrieval
Amato G., Carrara F., Falchi F., Gennaro C., Rabitti F., Vadicamo L.
The great success of visual features learned from deep neural networks has led to a significant effort to develop efficient and scalable technologies for image retrieval. This paper presents an approach to transform neural network features into text codes suitable for being indexed by a standard full-text retrieval engine such as Elasticsearch. The basic idea is providing a transformation of neural network features with the twofold aim of promoting sparsity without the need of unsupervised pre-training. We validate our approach on a recent convolutional neural network feature, namely Regional Maximum Activations of Convolutions (R-MAC), which is a state-of-the-art descriptor for image retrieval. An extensive experimental evaluation conducted on standard benchmarks shows the effectiveness and efficiency of the proposed approach and how it compares to state-of-the-art main-memory indexes.
Source: 28th Italian Symposium on Advanced Database Systems, pp. 258–265, virtual (online) due to COVID-19, 21-24/06/2020

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access
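The core idea, turning a deep feature into a surrogate text document whose term frequencies mimic the feature's values, can be sketched as follows. The quantization scheme below is a simplified assumption for illustration, not the exact transformation of the paper:

```python
def feature_to_text(feature, levels=4):
    """Surrogate-text encoding sketch: each component of a non-negative
    (e.g. ReLU-activated) feature vector is scalar-quantized, and a synthetic
    term for that dimension is repeated proportionally, so the term-frequency
    scoring of a full-text engine approximates the dot product of features."""
    m = max(feature) or 1.0
    words = []
    for i, v in enumerate(feature):
        reps = round(levels * max(v, 0.0) / m)  # quantize to 0..levels repetitions
        words.extend([f"f{i}"] * reps)
    return " ".join(words)
```

The resulting strings can be indexed verbatim by Elasticsearch or any full-text engine; sparsity comes for free, since zero components emit no terms.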

2020 Journal article Open Access OPEN

Cross-resolution learning for face recognition
Massoli F. V., Amato G., Falchi F.
Convolutional Neural Network models have reached extremely high performance on the Face Recognition task. The most used datasets, such as VGGFace2, focus on gender, pose, and age variations, in the attempt of balancing them to empower models to better generalize to unseen data. Nevertheless, image resolution variability is not usually discussed, which may lead to a resizing of 256 pixels. While specific datasets for very low-resolution faces have been proposed, less attention has been paid to the task of cross-resolution matching. Hence, the discrimination power of a neural network might seriously degrade in such a scenario. Surveillance systems and forensic applications are particularly susceptible to this problem since, in these cases, it is common that a low-resolution query has to be matched against higher-resolution galleries. Although it is always possible to either increase the resolution of the query image or to reduce the size of the gallery (less frequently), to the best of our knowledge, extensive experimentation of cross-resolution matching was missing in the recent deep learning-based literature. In the context of low- and cross-resolution Face Recognition, the contribution of our work is fourfold: i) we proposed a training procedure to fine-tune a state-of-the-art model to empower it to extract resolution-robust deep features; ii) we conducted an extensive test campaign using high-resolution datasets (IJB-B and IJB-C) and surveillance-camera-quality datasets (QMUL-SurvFace, TinyFace, and SCface), showing the effectiveness of our algorithm in training a resolution-robust model; iii) even though our main focus was cross-resolution Face Recognition, by using our training algorithm we also improved upon state-of-the-art model performance on low-resolution matches; iv) we showed that our approach can be more effective than preprocessing faces with super-resolution techniques.
The Python code of the proposed method will be available at https://github.com/fvmassoli/cross-resolution-face-recognition.
Source: Image and Vision Computing 99 (2020). doi:10.1016/j.imavis.2020.103927
DOI: 10.1016/j.imavis.2020.103927
Project(s): AI4EU via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | Image and Vision Computing Open Access | ISTI Repository Open Access | Image and Vision Computing Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted

2020 Conference article Restricted

Re-implementing and Extending Relation Network for R-CBIR
Messina N., Amato G., Falchi F.
Relational reasoning is an emerging theme in Machine Learning in general and in Computer Vision in particular. DeepMind has recently proposed a module called Relation Network (RN) that has shown impressive results on visual question answering tasks. Unfortunately, the implementation of the proposed approach was not public. To reproduce their experiments and extend their approach in the context of Information Retrieval, we had to re-implement everything, testing many parameters and conducting many experiments. Our implementation is now public on GitHub and it is already used by a large community of researchers. Furthermore, we recently presented a variant of the relation network module that we called Aggregated Visual Features RN (AVF-RN). This network can produce and aggregate, at inference time, compact visual relationship-aware features for the Relational-CBIR (R-CBIR) task. R-CBIR consists of retrieving images with given relationships among objects. In this paper, we discuss the details of our Relation Network implementation and more experimental results than the original paper. Relational reasoning is a very promising topic for better understanding and retrieving inter-object relationships, especially in digital libraries.
Source: 16th Italian Research Conference on Digital Libraries, IRCDL 2020, pp. 82–92, Bari, Italy, 30-31/01/2020
DOI: 10.1007/978-3-030-39905-4_9

See at: academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | link.springer.com Restricted | CNR ExploRA Restricted

2020 Conference article Open Access OPEN

Continuous ODE-defined image features for adaptive retrieval
Carrara F., Amato G., Falchi F., Gennaro C.
In recent years, content-based image retrieval largely benefited from representations extracted from deeper and more complex convolutional neural networks, which became more effective but also more computationally demanding. Despite existing hardware acceleration, query processing times may be easily saturated by deep feature extraction in high-throughput or real-time embedded scenarios, and usually, a trade-off between efficiency and effectiveness has to be accepted. In this work, we experiment with the recently proposed continuous neural networks defined by parametric ordinary differential equations, dubbed ODE-Nets, for adaptive extraction of image representations. Given the continuous evolution of the network hidden state, we propose to approximate the exact feature extraction by taking a previous "near-in-time" hidden state as features, with a reduced computational cost. To understand the potential and the limits of this approach, we also evaluate an ODE-only architecture in which we minimize the number of classical layers in order to delegate most of the representation learning process, and thus the feature extraction process, to the continuous part of the model. Preliminary experiments on standard benchmarks show that we are able to dynamically control the trade-off between efficiency and effectiveness of feature extraction at inference time by controlling the evolution of the continuous hidden state. Although ODE-only networks provide the best fine-grained control on the effectiveness-efficiency trade-off, we observed that mixed architectures perform better or comparably to standard residual nets in both the image classification and retrieval setups, while using fewer parameters and retaining the controllability of the trade-off.
Source: ICMR '20 - International Conference on Multimedia Retrieval, pp. 198–206, Dublin, Ireland, 8-11/06/2020
DOI: 10.1145/3372278.3390690
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | dblp.uni-trier.de Restricted | dl.acm.org Restricted | CNR ExploRA Restricted
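The "near-in-time" early-exit idea can be illustrated with a fixed-step Euler integrator: truncating the integration earlier yields a cheaper approximation of the final hidden state. Real ODE-Nets use adaptive solvers and learned dynamics; the function `f` below is an arbitrary stand-in.

```python
def ode_features(x, f, t_end=1.0, step=0.1, early_exit=1.0):
    """Fixed-step Euler sketch of an ODE-Net block dh/dt = f(h, t).
    Stopping integration at `early_exit` < t_end returns an earlier
    hidden state as a cheaper, approximate feature."""
    h, t = x, 0.0
    while t < min(early_exit, t_end) - 1e-12:
        h = h + step * f(h, t)  # one Euler step
        t += step
    return h
```

Varying `early_exit` at inference time is the knob that trades feature quality for extraction cost, analogous to tuning the solver in the paper.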

2020 Report Open Access OPEN

Automatic Pass Annotation from Soccer Video Streams Based on Object Detection and LSTM
Sorano D., Carrara F., Cintia P., Falchi F., Pappalardo L.
Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of data that describe all the spatio-temporal events that occur in each match. These events (e.g., passes, shots, fouls) are collected by human operators manually, constituting a considerable cost for data providers in terms of time and economic resources. In this paper, we describe PassNet, a method to recognize the most frequent events in soccer, i.e., passes, from video streams. Our model combines a set of artificial neural networks that perform feature extraction from video streams, object detection to identify the positions of the ball and the players, and classification of frame sequences as passes or not passes. We test PassNet on different scenarios, depending on the similarity of conditions to the match used for training. Our results show good classification results and significant improvement in the accuracy of pass detection with respect to baseline classifiers, even when the video conditions of the test and training matches are considerably different. PassNet is the first step towards an automated event annotation system that may cut down the time and costs of event annotation, enabling data collection for minor and non-professional divisions, youth leagues and, in general, competitions whose matches are not currently annotated by data providers.
Source: Research report, H2020 SoBigData++, 871042, 2020
Project(s): SoBigData via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access

2020 Journal article Open Access OPEN

Virtual to real adaptation of pedestrian detectors
Ciampi L., Messina N., Falchi F., Gennaro C., Amato G.
Pedestrian detection through Computer Vision is a building block for a multitude of applications. Recently, there has been an increasing interest in convolutional neural network-based architectures to execute such a task. One of these supervised networks' critical goals is to generalize the knowledge learned during the training phase to new scenarios with different characteristics. A suitably labeled dataset is essential to achieve this purpose. The main problem is that manually annotating a dataset usually requires a lot of human effort, and it is costly. To this end, we introduce ViPeD (Virtual Pedestrian Dataset), a new synthetically generated set of images collected with the highly photo-realistic graphical engine of the video game GTA V (Grand Theft Auto V), where annotations are automatically acquired. However, when training solely on the synthetic dataset, the model experiences a Synthetic2Real domain shift leading to a performance drop when applied to real-world images. To mitigate this gap, we propose two different domain adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection. Experiments show that the network trained with ViPeD can generalize over unseen real-world scenarios better than the detector trained over real-world data, exploiting the variety of our synthetic dataset. Furthermore, we demonstrate that with our domain adaptation techniques, we can reduce the Synthetic2Real domain shift, making the two domains closer and obtaining a performance improvement when testing the network over the real-world images.
Source: Sensors (Basel) 20 (2020). doi:10.3390/s20185250
DOI: 10.3390/s20185250

See at: Sensors Open Access | arXiv.org e-Print Archive Open Access | Europe PubMed Central Open Access | ISTI Repository Open Access | CNR ExploRA Open Access

2020 Journal article Embargo

Cross-resolution face recognition adversarial attacks
Massoli F. V., Falchi F., Amato G.
Face Recognition is among the best examples of computer vision problems where the supremacy of deep learning techniques compared to standard ones is undeniable. Unfortunately, it has been shown that they are vulnerable to adversarial examples - input images to which a human-imperceptible perturbation is added to lead a learning model to output a wrong prediction. Moreover, in applications such as biometric systems and forensics, cross-resolution scenarios are easily met, with a non-negligible impact on the recognition performance and the adversary's success. Although such vulnerabilities set a harsh limit on the spread of deep learning-based face recognition systems in real-world applications, a comprehensive analysis of their behavior when threatened in a cross-resolution setting is missing in the literature. In this context, we posit our study, where we harness several of the strongest adversarial attacks against deep learning-based face recognition systems considering the cross-resolution domain. To craft adversarial instances, we exploit attacks based on three different metrics, i.e., L1, L2, and L∞, and we study the resilience of the models across resolutions. We then evaluate the performance of the systems against the face identification protocol, open- and closed-set. In our study, we find that deep representation attacks represent a much more dangerous menace to a face recognition system than those based on the classification output, independently of the metric used. Furthermore, we notice that the input image's resolution has a non-negligible impact on an adversary's success in deceiving a learning model. Finally, by comparing the performance of the threatened networks under analysis, we show how they can benefit from a cross-resolution training approach in terms of resilience to adversarial attacks.Source: Pattern recognition letters 140 (2020): 222–229. doi:10.1016/j.patrec.2020.10.008
DOI: 10.1016/j.patrec.2020.10.008
Project(s): AI4EU via OpenAIRE

See at: Pattern Recognition Letters Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted

2020 Journal article Open Access OPEN

Detection of Face Recognition Adversarial Attacks
Massoli F. V., Carrara F., Amato G., Falchi F.
Deep Learning methods have become state-of-the-art for solving tasks such as Face Recognition (FR). Unfortunately, despite their success, it has been pointed out that these learning models are exposed to adversarial inputs - images to which an amount of noise imperceptible to humans is added to maliciously fool a neural network - thus limiting their adoption in sensitive real-world applications. While it is true that an enormous effort has been spent to train robust models against this type of threat, adversarial detection techniques have recently started to draw attention within the scientific community. The advantage of using a detection approach is that it does not require re-training any model; thus, it can be added to any system. In this context, we present our work on adversarial detection in forensics, mainly focused on detecting attacks against FR systems in which the learning model is typically used only as a features extractor. Thus, training a more robust classifier might not be enough to counteract the adversarial threats. In this frame, the contribution of our work is four-fold: (i) we test our proposed adversarial detection approach against classification attacks, i.e., adversarial samples crafted to fool an FR neural network acting as a classifier; (ii) using a k-Nearest Neighbor (k-NN) algorithm as a guide, we generate deep features attacks against an FR system based on a neural network acting as a features extractor, followed by a similarity-based procedure which returns the query identity; (iii) we use the deep features attacks to fool an FR system on the 1:1 face verification task, and we show their superior effectiveness with respect to classification attacks in evading such type of system; (iv) we use the detectors trained on the classification attacks to detect the deep features attacks, thus showing that such an approach is generalizable to different classes of offensives.Source: Computer vision and image understanding (Print) 202 (2020). doi:10.1016/j.cviu.2020.103103
DOI: 10.1016/j.cviu.2020.103103
Project(s): AI4EU via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | Computer Vision and Image Understanding Open Access | ISTI Repository Open Access | Computer Vision and Image Understanding Restricted | CNR ExploRA Restricted | www.sciencedirect.com Restricted

2020 Journal article Open Access OPEN

Learning accurate personal protective equipment detection from virtual worlds
Di Benedetto M., Carrara F., Meloni E., Amato G., Falchi F., Gennaro C.
Deep learning has achieved impressive results in many machine learning tasks such as image recognition and computer vision. Its applicability to supervised problems is however constrained by the availability of high-quality training data consisting of large numbers of human-annotated examples (e.g., millions). To overcome this problem, the AI world is increasingly exploiting artificially generated images or video sequences using realistic photo rendering engines such as those used in entertainment applications. In this way, large sets of training images can be easily created to train deep learning algorithms. In this paper, we generated photo-realistic synthetic image sets to train deep learning models to recognize the correct use of personal safety equipment (e.g., worker safety helmets, high-visibility vests, ear protection devices) during at-risk work activities. Then, we performed domain adaptation to real-world images using a very small set of real images. We demonstrated that training on the generated synthetic set, combined with a domain adaptation phase, is an effective solution for applications where no training set is available.Source: Multimedia tools and applications (2020). doi:10.1007/s11042-020-09597-9
DOI: 10.1007/s11042-020-09597-9
Project(s): AI4EU via OpenAIRE

See at: ISTI Repository Open Access | Multimedia Tools and Applications Restricted | CNR ExploRA Restricted