103 result(s)
2019 Journal article Open Access OPEN
LSTM-based real-time action detection and prediction in human motion streams
Carrara F., Elias P., Sedmidubsky J., Zezula P.
Motion capture data digitally represent human movements by sequences of 3D skeleton configurations. Such spatio-temporal data, often recorded as continuous streams, need to be efficiently processed to detect high-interest actions, for example, in human-computer interaction to understand hand gestures in real time. Alternatively, automatically annotated parts of a continuous stream can be persistently stored to become searchable, and thus reusable for future retrieval or pattern mining. In this paper, we focus on multi-label detection of user-specified actions in unsegmented sequences as well as continuous streams. In particular, we utilize recent advances in recurrent neural networks and adopt a unidirectional LSTM model to effectively encode the skeleton frames within the hidden network states. During the training phase, the model learns which subsequences of encoded frames belong to the specified action classes. The learned class representations are then employed within the annotation phase to infer the probability that an incoming skeleton frame belongs to a given action class. The computed probabilities are finally compared against a learned threshold to automatically determine the beginnings and endings of actions. To further enhance annotation accuracy, we utilize a bidirectional LSTM model to estimate class probabilities by considering not only the past frames but also the future ones. We extensively evaluate both models on three use cases: real-time stream annotation, offline annotation of long sequences, and early action detection and prediction. The experiments demonstrate that our models outperform the state of the art in effectiveness and are at least one order of magnitude more efficient, being able to annotate 10k frames per second.
Source: Multimedia tools and applications 78 (2019): 27309–27331. doi:10.1007/s11042-019-07827-3
DOI: 10.1007/s11042-019-07827-3
See at: ISTI Repository Open Access | Multimedia Tools and Applications Restricted | link.springer.com Restricted | CNR ExploRA
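As a rough illustration of the frame-level annotation scheme described in the entry above -- a minimal sketch assuming PyTorch, not the authors' released code, with hypothetical feature size, class count, and threshold -- a unidirectional LSTM emits per-class probabilities for each incoming frame, and thresholding those probabilities yields action start/end boundaries:

```python
# Minimal sketch (assumed PyTorch): per-frame multi-label action probabilities
# from a unidirectional LSTM, thresholded into action segments.
# Feature size, class count, and the threshold value are hypothetical.
import torch
import torch.nn as nn

class FrameAnnotator(nn.Module):
    def __init__(self, feat_dim=75, hidden=512, num_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)  # unidirectional
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames, state=None):
        # frames: (batch, time, feat_dim) skeleton frames
        out, state = self.lstm(frames, state)   # encode frames in hidden states
        probs = torch.sigmoid(self.head(out))   # per-frame, per-class probabilities
        return probs, state                     # returned state allows stream-based processing

def segments_from_probs(probs, threshold=0.5):
    """Turn a (time, num_classes) probability matrix into (class, start, end) segments."""
    segments, active = [], {}
    for t, frame in enumerate(probs):
        for c, p in enumerate(frame):
            if p >= threshold and c not in active:
                active[c] = t                            # action c starts
            elif p < threshold and c in active:
                segments.append((c, active.pop(c), t))   # action c ends
    for c, start in active.items():
        segments.append((c, start, len(probs)))
    return segments

model = FrameAnnotator()
stream = torch.randn(1, 120, 75)                         # 120 synthetic skeleton frames
probs, _ = model(stream)
print(segments_from_probs(probs[0].detach().numpy()))
```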


2019 Conference article Open Access OPEN
Surrogate text representation of visual features for fast image retrieval
Carrara F.
We propose a simple and effective methodology to index and retrieve image features without the need for a time-consuming codebook learning step. We employ a scalar quantization approach combined with Surrogate Text Representation (STR) to perform large-scale image retrieval relying on the latest text search engine technologies. Experiments on large-scale image retrieval benchmarks show that we improve the effectiveness-efficiency trade-off of current STR approaches while performing comparably to state-of-the-art main-memory methods without requiring a codebook learning procedure.
Source: SEBD 2019 - 27th Italian Symposium on Advanced Database Systems, Castiglione della Pescaia, Grosseto, Italy, 16-19 June 2019

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA
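As a hedged, schematic example of the scalar-quantization-plus-STR idea summarized above (the term names and quantization factor are hypothetical, not the paper's exact scheme), each non-negative feature component can be mapped to a synthetic term repeated proportionally to its quantized value, so that a full-text engine's term-frequency scoring approximates the inner product between features:

```python
# Sketch of a surrogate text representation via scalar quantization (assumptions:
# non-negative features, a hypothetical quantization factor, synthetic terms "fN").
import numpy as np

def surrogate_text(feature, factor=30):
    """Encode a feature vector as space-separated terms; term frequency ~ component value."""
    terms = []
    quantized = np.floor(np.maximum(feature, 0) * factor).astype(int)
    for i, q in enumerate(quantized):
        terms.extend([f"f{i}"] * q)      # repeat the i-th synthetic term q times
    return " ".join(terms)

vec = np.array([0.00, 0.12, 0.51, 0.03])
print(surrogate_text(vec))               # e.g. "f1 f1 f1 f2 f2 f2 ..."
```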


2023 Journal article Restricted
Conditioned cooperative training for semi-supervised weapon detection
Salazar González J. L., Álvarez-García J. A., Rendón-Segador F. J., Carrara F.
Violent assaults and homicides occur daily, and the number of victims of mass shootings increases every year. However, this number can be reduced with the help of Closed Circuit Television (CCTV) and weapon detection models, as generic object detectors have become increasingly accurate with more training data. We present a new semi-supervised learning methodology based on conditioned cooperative student-teacher training with optimal pseudo-label generation using a novel confidence threshold search method, improving both models by conditional knowledge transfer. Furthermore, a novel firearms image dataset of 458,599 images was collected using Instagram hashtags to evaluate our approach and compare the improvements obtained using a specific unsupervised dataset instead of a general one such as ImageNet. We compared our methodology with supervised, semi-supervised and self-supervised learning techniques, outperforming approaches such as YOLOv5m (up to +19.86), YOLOv5l (up to +6.52), Unbiased Teacher (up to +10.5 AP), DETReg (up to +2.8 AP) and UP-DETR (up to +1.22 AP).
Source: Neural networks 167 (2023): 489–501. doi:10.1016/j.neunet.2023.08.043
DOI: 10.1016/j.neunet.2023.08.043
See at: www.sciencedirect.com Restricted | CNR ExploRA
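One ingredient mentioned in the abstract above is the search for a pseudo-label confidence threshold. The toy sketch below illustrates that idea only under an explicit assumption: the threshold is picked by maximizing F1 of the teacher's hard pseudo-labels on a small labelled validation set, which is not necessarily the paper's exact criterion.

```python
# Toy sketch (assumption: F1 on a small labelled validation set is the selection
# criterion) of searching a confidence threshold for teacher pseudo-labels.
import numpy as np
from sklearn.metrics import f1_score

def search_threshold(val_scores, val_labels, candidates=np.linspace(0.1, 0.9, 17)):
    """Pick the confidence threshold whose hard pseudo-labels best match ground truth."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        preds = (val_scores >= t).astype(int)   # keep detections above the threshold
        f1 = f1_score(val_labels, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

scores = np.array([0.92, 0.35, 0.71, 0.10, 0.66, 0.88])   # teacher confidences
labels = np.array([1, 0, 1, 0, 0, 1])                      # ground-truth presence
print(search_threshold(scores, labels))
```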


2020 Contribution to book Open Access OPEN
Preface - SISAP 2020
Satoh S., Vadicamo L., Zimek A., Carrara F., Bartolini I., Aumüller M., Jónsson B. Þór, Pagh R.
Preface of Volume 12440 LNCS, 2020, pages v-vi, 13th International Conference on Similarity Search and Applications, SISAP 2020.
Source: Similarity Search and Applications, pp. v–vi. New York: Springer Science and Business Media, 2020
DOI: 10.1007/978-3-030-60936-8
See at: link.springer.com Open Access | ISTI Repository Open Access | link.springer.com Restricted | link.springer.com Restricted | CNR ExploRA


2022 Conference article Open Access OPEN
Tuning neural ODE networks to increase adversarial robustness in image forensics
Caldelli R., Carrara F., Falchi F.
Although deep-learning-based solutions are pervading different application sectors, many doubts have arisen about their reliability and, above all, their security against threats that can mislead their decision mechanisms. In this work, we considered a particular kind of deep neural network, the Neural Ordinary Differential Equations (N-ODE) networks, which have shown intrinsic robustness against adversarial samples by properly tuning their tolerance parameter at test time. Their behaviour has never been investigated in image forensics tasks such as distinguishing between an original and an altered image. Following this direction, we demonstrate how tuning the tolerance parameter during the prediction phase can control and increase N-ODE's robustness versus adversarial attacks. We performed experiments on basic image transformations used to generate tampered data, providing encouraging results in terms of adversarial rejection and preservation of the correct classification of pristine images.
Source: ICIP 2022 - IEEE International Conference on Image Processing, pp. 1496–1500, Bordeaux, France, 16-19/10/2022
DOI: 10.1109/icip46576.2022.9897662
Project(s): AI4Media via OpenAIRE
See at: ISTI Repository Open Access | ieeexplore.ieee.org Restricted | CNR ExploRA
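The mechanism discussed above is tuning the ODE solver tolerance at prediction time. A minimal sketch, assuming the torchdiffeq package and a toy ODE-defined block (dimensions and tolerance values are illustrative only, not the paper's configuration):

```python
# Sketch (assumes the torchdiffeq package): an ODE-defined block whose solver
# tolerance can be changed at test time, the knob studied in the paper above.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, t, h):
        return self.net(h)                        # dh/dt = f(h, t; theta)

class ODEBlock(nn.Module):
    def __init__(self, func, tol=1e-3):
        super().__init__()
        self.func, self.tol = func, tol

    def forward(self, h0):
        t = torch.tensor([0.0, 1.0])
        # rtol/atol control the adaptive solver precision; changing the tolerance
        # at prediction time is the robustness knob investigated above.
        hT = odeint(self.func, h0, t, rtol=self.tol, atol=self.tol)
        return hT[-1]

block = ODEBlock(ODEFunc())
x = torch.randn(8, 64)
block.tol = 1e-5                                   # tighter tolerance at test time
print(block(x).shape)
```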


2023 Conference article Open Access OPEN
SegmentCodeList: unsupervised representation learning for human skeleton data retrieval
Sedmidubsky J., Carrara F., Amato G.
Recent progress in pose-estimation methods enables the extraction of sufficiently precise 3D human skeleton data from ordinary videos, which offers great opportunities for a wide range of applications. However, such spatio-temporal data are typically extracted in the form of a continuous skeleton sequence without any information about semantic segmentation or annotation. To make the extracted data reusable for further processing, there is a need to access them based on their content. In this paper, we introduce a universal retrieval approach that compares any two skeleton sequences based on the temporal order and similarities of their underlying segments. The similarity of segments is determined by their content-preserving low-dimensional code representation, which is learned in an unsupervised way using the Variational AutoEncoder principle. The quality of the proposed representation is validated in retrieval and classification scenarios; our proposal outperforms the state-of-the-art approaches in effectiveness and reaches speed-ups of up to 64x on common skeleton sequence datasets.
Source: ECIR 2023 - 45th European Conference on Information Retrieval, pp. 110–124, Dublin, Ireland, 2-6/4/2023
DOI: 10.1007/978-3-031-28238-6_8
Project(s): AI4Media via OpenAIRE
See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA
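As a schematic illustration of the unsupervised segment encoding described above -- a sketch assuming PyTorch and hypothetical segment/latent dimensions, not the authors' implementation -- a small VAE reconstructs fixed-length skeleton segments, and the latent mean serves as the low-dimensional segment code used for retrieval:

```python
# Schematic VAE over fixed-length skeleton segments (dimensions are hypothetical);
# the latent mean is used as the low-dimensional segment code for retrieval.
import torch
import torch.nn as nn

class SegmentVAE(nn.Module):
    def __init__(self, seg_dim=8 * 75, code_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(seg_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, code_dim)
        self.logvar = nn.Linear(256, code_dim)
        self.dec = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(), nn.Linear(256, seg_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    rec = ((recon - x) ** 2).sum(dim=1).mean()                    # reconstruction term
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return rec + kld

model = SegmentVAE()
segments = torch.randn(16, 8 * 75)              # 16 segments of 8 frames x 75 coordinates
recon, mu, logvar = model(segments)
print(vae_loss(recon, segments, mu, logvar).item(), mu.shape)     # mu = segment codes
```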


2023 Conference article Open Access OPEN
A workflow for developing biohybrid intelligent sensing systems
Fazzari E., Carrara F., Falchi F., Stefanini C., Romano D.
Animals are sometimes exploited as biosensors for assessing the presence of volatile organic compounds (VOCs) in the environment by interpreting their stereotyped behavioral responses. However, current approaches rely on direct human observation to assess the changes in animal behavior associated with specific environmental stimuli. We propose a general workflow based on artificial intelligence that uses pose estimation and sequence classification techniques to automate this process. This study also provides an example of its application, studying the antennal movements of an insect (a cricket) in response to the presence of two chemical stimuli.
Source: Ital-IA 2023, pp. 555–560, Pisa, Italy, 29-31/05/2023

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA


2023 Journal article Open Access OPEN
Using AI to decode the behavioral responses of an insect to chemical stimuli: towards machine-animal computational technologies
Fazzari E., Carrara F., Falchi F., Stefanini C., Romano D.
Orthoptera are insects with excellent olfactory abilities due to their antennae being richly equipped with receptors. This makes them interesting model organisms to be used as biosensors for environmental and agricultural monitoring. Herein, we investigated whether the house cricket Acheta domesticus can be used to detect different chemical cues by examining the movements of its antennae and attempting to identify specific antennal displays associated with the chemical cues presented (e.g., sucrose or ammonia powder). A neural network based on state-of-the-art pose-estimation techniques (i.e., SLEAP) was built to identify the proximal and distal ends of the antennae. The network was optimised via grid search, resulting in a mean Average Precision (mAP) of 83.74%. To classify the stimulus type, another network was employed that takes in a series of keypoint sequences and outputs the stimulus classification. To find the best one-dimensional convolutional and recurrent neural networks, a genetic algorithm-based optimisation method was used. These networks were validated with iterated K-fold validation, obtaining an average accuracy of 45.33% for the former and 44% for the latter. Notably, we introduce and publish the first dataset of cricket recordings that relates this animal's behaviour to chemical stimuli. Overall, this study proposes a novel and simple automated method that can be extended to other animals for the creation of Biohybrid Intelligent Sensing Systems (e.g., automated video analysis of an organism's behaviour) to be exploited in various ecological scenarios.
Source: International journal of machine learning and cybernetics (Print) (2023). doi:10.1007/s13042-023-02009-y
DOI: 10.1007/s13042-023-02009-y
See at: link.springer.com Open Access | ISTI Repository Open Access | ISTI Repository Open Access | ISTI Repository Open Access | CNR ExploRA
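The second stage described in the two entries above (classifying the stimulus from tracked keypoint sequences) could look roughly like the following sketch, which assumes PyTorch and hypothetical channel and class counts; it is an illustration, not the networks found by the paper's genetic-algorithm search:

```python
# Sketch of the second stage described above: a small 1D-convolutional classifier
# over keypoint coordinate sequences (channel/class counts are hypothetical).
import torch
import torch.nn as nn

class KeypointSequenceClassifier(nn.Module):
    def __init__(self, in_channels=8, num_classes=3):
        # in_channels: x/y coordinates of the tracked antennal keypoints per frame
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        # x: (batch, channels, time) keypoint trajectories
        return self.fc(self.conv(x).squeeze(-1))

clf = KeypointSequenceClassifier()
clip = torch.randn(4, 8, 300)                  # 4 clips, 300 frames each
print(clf(clip).shape)                         # (4, 3) stimulus-class logits
```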


2015 Report Open Access OPEN
Efficient foreground-background segmentation using local features for object detection
Carrara F., Amato G., Falchi F., Gennaro C.
In this work, a local-feature-based background model for background-foreground feature segmentation is presented. In local-feature-based computer vision applications, such a model offers advantages over classical pixel-based ones in terms of informativeness, robustness and segmentation performance. The method discussed in this paper is a block-wise background model in which we propose to store the positions of only the most frequent local feature configurations for each block. Incoming local features are classified as background or foreground depending on their position with respect to the stored configurations. The resulting classification is refined by applying a block-level analysis. Experiments on a public dataset were conducted to compare the presented method to classical pixel-based background modelling.
Source: ISTI Technical reports, 2015

See at: ISTI Repository Open Access | CNR ExploRA
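A simplified sketch of the block-wise idea described above (the quantization grid, block size and "most frequent" cut-off are assumptions for illustration): each block keeps the most frequent quantized positions of local features seen so far, and a new feature falling on a frequent position is labelled background, otherwise foreground.

```python
# Simplified block-wise background model: frequent local-feature positions per block
# are treated as background; everything else is foreground.
from collections import Counter, defaultdict

class BlockBackgroundModel:
    def __init__(self, block_size=32, cell=4, top_k=5):
        self.block_size, self.cell, self.top_k = block_size, cell, top_k
        self.counts = defaultdict(Counter)     # block -> Counter of quantized positions

    def _keys(self, x, y):
        block = (x // self.block_size, y // self.block_size)
        pos = (x // self.cell, y // self.cell)  # quantized position inside the frame
        return block, pos

    def update(self, keypoints):
        for x, y in keypoints:                  # local features from a training frame
            block, pos = self._keys(x, y)
            self.counts[block][pos] += 1

    def classify(self, x, y):
        block, pos = self._keys(x, y)
        frequent = {p for p, _ in self.counts[block].most_common(self.top_k)}
        return "background" if pos in frequent else "foreground"

model = BlockBackgroundModel()
for _ in range(50):                             # simulate a repeatedly seen static feature
    model.update([(10, 12)])
print(model.classify(10, 12), model.classify(200, 150))
```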


2015 Conference article Open Access OPEN
Efficient foreground-background segmentation using local features for object detection
Carrara F., Amato G., Falchi F., Gennaro C.
In this work, a local-feature-based background model for background-foreground feature segmentation is presented. In local-feature-based computer vision applications, such a model offers advantages over classical pixel-based ones in terms of informativeness, robustness and segmentation performance. The method discussed in this paper is a block-wise background model in which we propose to store the positions of only the most frequent local feature configurations for each block. Incoming local features are classified as background or foreground depending on their position with respect to the stored configurations. The resulting classification is refined by applying a block-level analysis. Experiments on a public dataset were conducted to compare the presented method to classical pixel-based background modelling.
Source: 9th International Conference on Distributed Smart Cameras, pp. 175–180, Seville, Spain, 8-11/09/2015
DOI: 10.1145/2789116.2789136
See at: ISTI Repository Open Access | dl.acm.org Restricted | doi.org Restricted | CNR ExploRA


2015 Conference article Open Access OPEN
Semiautomatic learning of 3D objects from video streams
Carrara F., Falchi F., Gennaro C.
Object detection and recognition are classical problems in computer vision, but they remain challenging without a priori knowledge of the objects and with limited user interaction. In this work, a semiautomatic system for visual object learning from video streams is presented. The system detects movable foreground objects relying on FAST interest points. Once a view of an object has been segmented, the system relies on ORB features to create its descriptor, store it and compare it with the descriptors of previously seen views. To this end, a visual similarity function based on the geometric consistency of the local features is used. The system groups similar views of the same object into clusters relying on the transitivity of similarity among them. Each cluster identifies a 3D object, and the system learns to autonomously recognize a particular view by assessing its cluster membership. When ambiguities arise, the user is asked to validate the membership assignments. Experiments have demonstrated the ability of the system to group together unlabeled views, reducing the labeling work of the user.
Source: Similarity Search and Applications. 8th International Conference (SISAP 2015), pp. 217–228, Glasgow, UK, 12-14/10/2015
DOI: 10.1007/978-3-319-25087-8_20
See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA


2017 Conference article Open Access OPEN
Detecting adversarial example attacks to deep neural networks
Carrara F., Falchi F., Caldelli R., Amato G., Fumarola R., Becarelli R.
Deep learning has recently become the state of the art in many computer vision applications and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause the deep neural network to make a mistake. They are like optical illusions for machines, containing changes unnoticeable to the human eye. This represents a serious threat for machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network, analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification in order to distinguish between correctly classified authentic images and adversarial examples. The results show that hidden layer activations can be used to detect incorrect classifications caused by adversarial attacks.
Source: CBMI '17 - 15th International Workshop on Content-Based Multimedia Indexing, Firenze, Italy, 19-21 June 2017
DOI: 10.1145/3095713.3095753
See at: ISTI Repository Open Access | dl.acm.org Restricted | doi.org Restricted | CNR ExploRA
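A toy sketch of the kind of kNN scoring mentioned above -- not the paper's exact scoring function -- scores a test input by how well the nearest neighbours of its hidden-layer activation agree with the classifier's predicted class; random features stand in for real deep activations:

```python
# Toy sketch: flag a possible adversarial input when the k nearest neighbours of its
# hidden-layer activation disagree with the predicted class.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(500, 128))      # hidden-layer activations (reference set)
train_labels = rng.integers(0, 10, size=500)   # classes of the reference images

nn_index = NearestNeighbors(n_neighbors=10).fit(train_feats)

def knn_agreement(feat, predicted_class):
    """Fraction of nearest neighbours sharing the classifier's predicted class;
    a low score suggests the input may be adversarial."""
    _, idx = nn_index.kneighbors(feat.reshape(1, -1))
    return float(np.mean(train_labels[idx[0]] == predicted_class))

test_feat = rng.normal(size=128)
print(knn_agreement(test_feat, predicted_class=3))
```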


2017 Conference article Open Access OPEN
Efficient indexing of regional maximum activations of convolutions using full-text search engines
Amato G., Carrara F., Falchi F., Gennaro C.
In this paper, we adapt a surrogate text representation technique to develop efficient instance-level image retrieval using Regional Maximum Activations of Convolutions (R-MAC). R-MAC features have recently shown outstanding performance in visual instance retrieval. However, contrary to the activations of hidden layers adopting ReLU (Rectified Linear Unit), these features are dense. This constitutes an obstacle to the direct use of inverted indexes, which rely on data sparsity. We propose the use of deep permutations, a recent approach for efficient evaluation of permutations, to generate surrogate text representations of R-MAC features, enabling the indexing of visual features as text in a standard search engine. The experiments, conducted on Lucene, show the effectiveness and efficiency of the proposed approach.
Source: 2017 ACM on International Conference on Multimedia Retrieval (ICMR 2017), pp. 420–423, Bucharest, Romania, 6-9 June 2017
DOI: 10.1145/3078971.3079035
See at: ISTI Repository Open Access | dl.acm.org Restricted | doi.org Restricted | CNR ExploRA
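As a hedged sketch of the deep-permutation encoding mentioned above (term names and the truncation length are hypothetical, and this is only one simple variant), each feature dimension becomes a synthetic term repeated more times the higher that dimension ranks within the vector, turning a dense R-MAC feature into sparse indexable text:

```python
# Sketch of a permutation-based surrogate text encoding of a dense feature vector.
import numpy as np

def deep_permutation_text(feature, top_k=4):
    """Keep the top_k largest components; repeat term 'fN' (top_k - rank) times."""
    order = np.argsort(-feature)                    # dimensions sorted by decreasing value
    terms = []
    for rank, dim in enumerate(order[:top_k]):
        terms.extend([f"f{dim}"] * (top_k - rank))  # highest-ranked dimension repeated most
    return " ".join(terms)

rmac = np.array([0.1, 0.9, 0.3, 0.7, 0.0, 0.5])
print(deep_permutation_text(rmac))                  # "f1 f1 f1 f1 f3 f3 f3 f5 f5 f2"
```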


2017 Report Open Access OPEN
Exploring epoch-dependent stochastic residual networks
Carrara F., Esuli A., Falchi F., Moreo Fernández A.
The recently proposed stochastic residual networks selectively activate or bypass layers during training, based on independent stochastic choices, each following a probability distribution that is fixed in advance. In this paper we present a first exploration of the use of an epoch-dependent distribution, starting with a higher probability of bypassing deeper layers and then activating them more frequently as training progresses. Preliminary results are mixed, yet they show some potential in adding an epoch-dependent management of distributions, worthy of further investigation.
Source: Research report, 2017

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA
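A minimal sketch of the epoch-dependent schedule idea above, assuming PyTorch and an illustrative linear schedule (the exact schedule shape studied in the report may differ): deeper blocks start with a high bypass probability that decays as training progresses.

```python
# Sketch: stochastic residual (stochastic depth) blocks with an epoch-dependent
# keep probability; deeper blocks are bypassed more often early in training.
import torch
import torch.nn as nn

class StochasticResidualBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.keep_prob = 1.0                       # updated from outside at each epoch

    def forward(self, x):
        if self.training and torch.rand(1).item() > self.keep_prob:
            return x                               # bypass the block entirely
        return x + self.f(x)                       # standard residual update

def set_epoch_schedule(blocks, epoch, total_epochs, min_keep=0.5):
    """Early epochs: deep blocks are often bypassed; later epochs: (almost) always kept."""
    progress = epoch / max(total_epochs - 1, 1)
    for depth, block in enumerate(blocks, start=1):
        depth_factor = depth / len(blocks)         # deeper blocks get lower keep_prob
        start_keep = 1.0 - (1.0 - min_keep) * depth_factor
        block.keep_prob = start_keep + (1.0 - start_keep) * progress

blocks = nn.ModuleList(StochasticResidualBlock() for _ in range(6))
set_epoch_schedule(blocks, epoch=0, total_epochs=30)
print([round(b.keep_prob, 2) for b in blocks])
```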


2018 Conference article Open Access OPEN
Large-scale image retrieval with Elasticsearch
Amato G., Bolettieri P., Carrara F., Falchi F., Gennaro C.
Content-based image retrieval in large archives through the use of visual features has become a very attractive research topic in recent years. This strong impulse is largely to be attributed to the use of Convolutional Neural Network (CNN) activations as features and their outstanding performance. However, practically all the available image retrieval systems are implemented in main memory, limiting their applicability and preventing their usage in big-data applications. In this paper, we propose to transform CNN features into textual representations and index them with the well-known full-text retrieval engine Elasticsearch. We validate our approach on a novel CNN feature, namely Regional Maximum Activations of Convolutions. A preliminary experimental evaluation, conducted on the standard benchmark INRIA Holidays, shows the effectiveness and efficiency of the proposed approach and how it compares to state-of-the-art main-memory indexes.
Source: SIGIR 2018: 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 925–928, Ann Arbor, Michigan, USA, 8-12 July 2018
DOI: 10.1145/3209978.3210089
See at: ISTI Repository Open Access | dl.acm.org Restricted | doi.org Restricted | CNR ExploRA
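A hedged sketch of the indexing step described above, assuming the official elasticsearch Python client (8.x-style API), a running local node, and a hypothetical index/field name; the surrogate text would be produced from the CNN feature as in the STR entries earlier in this list:

```python
# Sketch (assumed elasticsearch-py 8.x API and a local node): index surrogate-text
# versions of CNN features and query them with a standard full-text match query.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def index_image(image_id, surrogate_text):
    # surrogate_text: space-separated synthetic terms derived from the CNN feature
    es.index(index="images", id=image_id, document={"feature_text": surrogate_text})

def search_similar(query_surrogate_text, k=10):
    resp = es.search(index="images", size=k,
                     query={"match": {"feature_text": query_surrogate_text}})
    return [(hit["_id"], hit["_score"]) for hit in resp["hits"]["hits"]]

index_image("holidays_0001", "f12 f12 f12 f87 f87 f301")
print(search_similar("f12 f12 f87"))
```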


2019 Conference article Open Access OPEN
Adversarial Examples Detection in Features Distance Spaces
Carrara F., Becarelli R., Caldelli R., Falchi F., Amato G.
Maliciously manipulated inputs for attacking machine learning methods -- in particular deep neural networks -- are emerging as a relevant issue for the security of recent artificial intelligence technologies, especially in computer vision. In this paper, we focus on attacks targeting image classifiers implemented with deep neural networks, and we propose a method for detecting adversarial images which focuses on the trajectory of internal representations (i.e. hidden layer neuron activations, also known as deep features) from the very first up to the last. We argue that the representations of adversarial inputs follow a different evolution with respect to genuine inputs, and we define a distance-based embedding of features to efficiently encode this information. We train an LSTM network that analyzes the sequence of deep features embedded in a distance space to detect adversarial examples. The results of our preliminary experiments are encouraging: our detection scheme is able to detect adversarial inputs targeted to the ResNet-50 classifier pre-trained on the ILSVRC'12 dataset and generated by a variety of crafting algorithms.
Source: ECCV: European Conference on Computer Vision, pp. 313–327, Munich, Germany, 8-14 September 2018
DOI: 10.1007/978-3-030-11012-3_26
See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA
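A toy sketch of the pipeline described above, assuming PyTorch: each hidden layer's activation is embedded as its distances to per-class reference points (here simplified to random class centroids, a stand-in for the paper's distance-based encoding), and an LSTM reads this layer-by-layer sequence to output an adversarial probability.

```python
# Toy sketch: distance-space embedding of layer activations + LSTM detector.
import torch
import torch.nn as nn

num_layers, num_classes, feat_dim = 6, 10, 128
centroids = torch.randn(num_layers, num_classes, feat_dim)   # per-layer class centroids

def distance_embedding(layer_activations):
    # layer_activations: (num_layers, feat_dim) deep features of one input
    # returns (num_layers, num_classes) distances to each class centroid
    return torch.cdist(layer_activations.unsqueeze(1), centroids).squeeze(1)

class TrajectoryDetector(nn.Module):
    def __init__(self, num_classes=10, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(num_classes, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, dist_seq):
        # dist_seq: (batch, num_layers, num_classes) distance embeddings
        out, _ = self.lstm(dist_seq)
        return torch.sigmoid(self.head(out[:, -1]))           # P(adversarial)

acts = torch.randn(num_layers, feat_dim)
seq = distance_embedding(acts).unsqueeze(0)                    # add batch dimension
print(TrajectoryDetector()(seq))
```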


2019 Conference article Open Access OPEN
Evaluation of continuous image features learned by ODE nets
Carrara F., Amato G., Falchi F., Gennaro C.
Deep-learning approaches to data-driven modeling rely on learning a finite number of transformations (and representations) of the data that are structured in a hierarchy and are often instantiated as deep neural networks (and their internal activations). State-of-the-art models for visual data usually implement deep residual learning: the network learns to predict a finite number of discrete updates that are applied to the internal network state to enrich it. Pushing the residual learning idea to the limit, ODE Net--a novel network formulation involving continuously evolving internal representations that gained the best paper award at NeurIPS 2018--has recently been proposed. Differently from traditional neural networks, in this model the dynamics of the internal states are defined by an ordinary differential equation with learnable parameters that defines a continuous transformation of the input representation. These representations can be computed using standard ODE solvers, and their dynamics can be steered to learn the input-output mapping by adjusting the ODE parameters via standard gradient-based optimization. In this work, we investigate the image representations learned in the continuous hidden states of ODE Nets. In particular, we train image classifiers including ODE-defined continuous layers and perform preliminary experiments to assess the quality, in terms of transferability and generality, of the learned image representations, comparing them to standard representations extracted from residual networks. Experiments on the CIFAR-10 and Tiny-ImageNet-200 datasets show that representations extracted from ODE Nets are more transferable and suggest improved robustness to overfitting.
Source: Image Analysis and Processing - ICIAP 2019, pp. 432–442, Trento, Italy, 9-13/9/2019
DOI: 10.1007/978-3-030-30642-7_39
Project(s): AI4EU via OpenAIRE
See at: ISTI Repository Open Access | Lecture Notes in Computer Science Restricted | link.springer.com Restricted | CNR ExploRA


2019 Journal article Open Access OPEN
Efficient evaluation of image quality via deep-learning approximation of perceptual metrics
Artusi A., Banterle F., Moreo A., Carrara F.
Image metrics based on the Human Visual System (HVS) play a remarkable role in the evaluation of complex image processing algorithms. However, mimicking the HVS is known to be complex and computationally expensive (both in terms of time and memory), and its usage is thus limited to a few applications and to small input data. All of this makes such metrics not fully attractive in real-world scenarios. To address these issues, we propose the Deep Image Quality Metric (DIQM), a deep-learning approach to learn the global image quality feature (mean opinion score). DIQM can emulate existing visual metrics efficiently, reducing the computational costs by more than an order of magnitude with respect to existing implementations.
Source: IEEE transactions on image processing (Online) 29 (2019): 1843–1855. doi:10.1109/TIP.2019.2944079
DOI: 10.1109/tip.2019.2944079
Project(s): ENCORE via OpenAIRE, RISE via OpenAIRE
See at: ISTI Repository Open Access | ZENODO Open Access | IEEE Transactions on Image Processing Open Access | IEEE Transactions on Image Processing Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA


2019 Journal article Open Access OPEN
Detecting adversarial inputs by looking in the black box
Carrara F., Falchi F., Amato G., Becarelli R., Caldelli R.
The astonishing and cryptic effectiveness of Deep Neural Networks comes with the critical vulnerability to adversarial inputs - samples maliciously crafted to confuse and hinder machine learning models. Insights into the internal representations learned by deep models can help to explain their decisions and estimate their confidence, which can enable us to trace, characterise, and filter out adversarial attacks.
Source: ERCIM news (2019): 16–17.

See at: ercim-news.ercim.eu Open Access | ISTI Repository Open Access | CNR ExploRA


2019 Conference article Open Access OPEN
Exploiting CNN layer activations to improve adversarial image classification
Caldelli R., Becarelli R., Carrara F., Falchi F., Amato G.
Neural networks are now used in many sectors of our daily life thanks to the efficient solutions such instruments provide for diverse tasks. Leaving to artificial intelligence the chance to make choices on behalf of humans inevitably exposes these tools to fraudulent attacks. In fact, adversarial examples, intentionally crafted to fool a neural network, can dangerously induce a misclassification while appearing innocuous to a human observer. On this basis, this paper focuses on the problem of image classification and proposes an analysis to better understand what happens inside a convolutional neural network (CNN) when it evaluates an adversarial example. In particular, the activations of the internal network layers have been analyzed and exploited to design possible countermeasures to reduce CNN vulnerability. Experimental results confirm that layer activations can be adopted to detect adversarial inputs.
Source: ICIP 2019 - IEEE International Conference on Image Processing, pp. 2289–2293, Taipei, Taiwan, 22-25 September 2019
DOI: 10.1109/icip.2019.8803776
See at: ISTI Repository Open Access | doi.org Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA