2018 Other Open Access

Market Research, Deep Learning, and Quantification
Esuli A, Moreo Fernandez A, Sebastiani F
An abstract is not available

See at: goo.gl Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted

2018 Other Open Access

L'Epistola a Cangrande al vaglio della authorship verification
Corbara S, Moreo Fernandez A, Sebastiani F, Tavoni M
[Abstract not available]

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2018 Other Open Access

Polylingual Text Classification via Funnelling
Esuli A, Moreo Fernandez A, Sebastiani F
[Abstract not available]

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2018 Software Metadata Only Access

PyDCI repository
Moreo Fernandez A
A Python implementation of the Distributional Correspondence Indexing (DCI) algorithm for cross-domain and cross-lingual domain adaptation, described in https://arxiv.org/abs/1810.09311.

See at: github.com Restricted | CNR IRIS

2019 Software Metadata Only Access

Funnelling repository
Moreo Fernandez Ad
Software repository containing the Python code implementing Funnelling, a new ensemble method for heterogeneous transfer learning described in https://arxiv.org/abs/1901.11459.

See at: github.com Restricted | CNR IRIS

2018 Software Metadata Only Access

QuaNet repository
Esuli A, Moreo Fernandez Ad
This repository contains the Python code implementing the QuaNet model for quantification (described in https://arxiv.org/pdf/1809.00836.pdf), together with everything needed to reproduce all experiments.

See at: github.com Restricted | CNR IRIS

2018 Software Metadata Only Access

inntt: Interactive NeuralNet Trainer for pyTorch
Moreo Fernandez A
Interactive NeuralNet Trainer for pyTorch (INNTT) is a Python class that allows the practitioner to modify many hyperparameters involved in the training of neural networks in PyTorch on the fly, interacting with the keyboard.

See at: github.com Restricted | CNR IRIS

2017 Journal article Open Access

Lightweight random indexing for polylingual text classification
Moreo Fernandez A, Esuli A, Sebastiani F
Researchers from ISTI-CNR, Pisa (in a joint effort with the Qatar Computing Research Institute), have undertaken an effort aimed at producing more accurate and more efficient means of performing poly-lingual text classification, i.e., automatic text classification in which classifying text in one language can also leverage training data expressed in a different language.
Source: ERCIM NEWS, p. 41

See at: ercim-news.ercim.eu Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted

2017 Journal article Open Access

Distributional correspondence indexing for cross-lingual and cross-domain sentiment classification
Moreo Fernandez A, Esuli A, Sebastiani F
Researchers from ISTI-CNR, Pisa (in a joint effort with the Qatar Computing Research Institute), have developed a transfer learning method that allows cross-domain and cross-lingual sentiment classification to be performed accurately and efficiently. This means sentiment classification efforts can leverage training data originally developed for performing sentiment classification on other domains and/or in other languages.
Source: ERCIM NEWS, vol. 111, p. 48

See at: ercim-news.ercim.eu Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted

2020 Software Metadata Only Access

PyDRO: A Python reimplementation of the Distributional Random Oversampling method for binary text classification
Moreo Fernandez Ad
This repo is a stand-alone (re)implementation of the Distributional Random Oversampling (DRO) method presented at SIGIR'16. The original implementation was part of the JaTeCS framework for Java. DRO is an oversampling method to counter data imbalance in binary text classification. It generates new random minority-class synthetic documents by exploiting the distributional properties of the terms in the collection. The variability introduced by the oversampling method is enclosed in a latent space; the original space is replicated and left untouched.

See at: github.com Restricted | CNR IRIS

2017 Other Open Access

Exploring epoch-dependent stochastic residual networks
Carrara F, Esuli A, Falchi F, Moreo Fernández A
The recently proposed stochastic residual networks selectively activate or bypass the layers during training, based on independent stochastic choices, each of which follows a probability distribution that is fixed in advance. In this paper we present a first exploration of the use of an epoch-dependent distribution, starting with a higher probability of bypassing deeper layers and then activating them more frequently as training progresses. Preliminary results are mixed, yet they show some potential in adding an epoch-dependent management of distributions, worthy of further investigation.

See at: arxiv.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted
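
As a rough illustration of the epoch-dependent idea described in the abstract above, the sketch below computes a bypass probability that grows with layer depth and shrinks as training progresses. The linear form of the schedule, the function names, and the `max_bypass` parameter are assumptions made for illustration; the abstract does not specify the distribution that was actually explored.

```python
import random

def bypass_probability(layer, num_layers, epoch, num_epochs, max_bypass=0.5):
    """Hypothetical schedule: deeper layers are bypassed more often early on,
    and every layer is bypassed less often as training progresses."""
    depth_factor = (layer + 1) / num_layers      # in (0, 1], larger for deeper layers
    progress = epoch / max(1, num_epochs - 1)    # 0.0 at the first epoch, 1.0 at the last
    return max_bypass * depth_factor * (1.0 - progress)

def sample_active_layers(num_layers, epoch, num_epochs, rng):
    """Draw one independent Bernoulli choice per layer; True means 'activate'."""
    return [
        rng.random() >= bypass_probability(layer, num_layers, epoch, num_epochs)
        for layer in range(num_layers)
    ]

rng = random.Random(0)
for epoch in (0, 5, 9):
    probs = [round(bypass_probability(l, 4, epoch, 10), 3) for l in range(4)]
    print(epoch, probs, sample_active_layers(4, epoch, 10, rng))
```

With this particular schedule the bypass probability reaches zero at the last epoch, so every layer is active by the end of training.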

2016 Software Metadata Only Access

Java Text Categorization System
Esuli A, Fagni T, Moreo Fernandez A
JaTeCS is an open source Java library focused on Automatic Text Categorization (ATC). It covers all the steps of an experimental activity, from reading the corpus to the evaluation of the experimental results. JaTeCS focuses on text as the central input, and its code is optimized for this type of data. As with many other machine learning (ML) frameworks, it provides data readers for many formats and well-known corpora, NLP tools, feature selection and weighting methods, the implementation of many ML algorithms, as well as wrappers for well-known external software (e.g., libSVM, SVM_light). JaTeCS also provides implementations of methods related to ATC that are rarely, if ever, provided by other ML frameworks (e.g., active learning, quantification, transfer learning).

See at: github.com Restricted | CNR IRIS

2018 Conference article Open Access

Distributional correspondence indexing for cross-lingual and cross-domain sentiment classification (Extended Abstract)
Moreo Fernandez A, Esuli A, Sebastiani F
Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a "target" domain when the only available training data belongs to a different "source" domain. In this extended abstract we briefly describe a new DA method called Distributional Correspondence Indexing (DCI) for sentiment classification. DCI derives term representations in a vector space common to both domains where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. The experiments we have conducted show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification.

See at: CNR IRIS Open Access | ISTI Repository | www.ijcai.org | CNR IRIS Restricted
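
The idea of representing a term by its correspondence to a set of pivots can be sketched as follows. This is a minimal illustration rather than the published method: it assumes binary document-occurrence profiles and cosine similarity as the correspondence function, and the toy documents, pivot list, and function names are all hypothetical.

```python
import math

def occurrence_profile(term, docs):
    """Binary vector marking the documents in which the term occurs."""
    return [1.0 if term in doc else 0.0 for doc in docs]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def term_representation(term, pivots, docs):
    """One dimension per pivot: how similarly the term and the pivot are distributed."""
    profile = occurrence_profile(term, docs)
    return [cosine(profile, occurrence_profile(p, docs)) for p in pivots]

# Toy collection; each document is a set of terms.
docs = [{"great", "plot"}, {"great", "battery"}, {"poor", "plot"}, {"poor", "battery"}]
pivots = ["great", "poor"]
print(term_representation("plot", pivots, docs))
```

Because every term is described only through the pivots, terms from different domains end up in the same low-dimensional space as long as the pivots are meaningful in both.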

2018 Conference article Open Access

Lightweight random indexing for polylingual text classification
Moreo Fernandez A, Esuli A, Sebastiani F
Polylingual Text Classification (PLC) is a supervised learning task that consists of assigning class labels to documents written in different languages, assuming that a representative set of training documents is available for each language. This scenario is more and more frequent, given the large quantity of multilingual platforms and communities emerging on the Internet. In this work we analyse some important methods proposed in the literature that are machine-translation-free and dictionary-free, and we propose a particular configuration of the Random Indexing method (that we dub Lightweight Random Indexing). We show that it outperforms all compared algorithms and also displays a significantly reduced computational cost.
Source: IJCAI, pp. 5642-5646. Stockholm, SE, 13/07/2018, 19/07/2018

See at: CNR IRIS Open Access | ISTI Repository | www.ijcai.org | CNR IRIS Restricted | CNR IRIS
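
The abstract does not spell out the proposed configuration, so the following is only a generic sketch of Random Indexing: each term receives a sparse random index vector with a few nonzero entries of +1 or -1, and a document is the count-weighted sum of the index vectors of its terms. The dimensionality `dim`, the number of nonzeros `k`, and all function names are illustrative assumptions, not the settings of Lightweight Random Indexing.

```python
import random
from collections import Counter

def index_vector(term, dim, k, seed=0):
    """Sparse random index vector for a term: k positions set to +1 or -1.
    Seeding on the term makes the vector reproducible across documents."""
    rng = random.Random(f"{seed}:{term}")
    positions = rng.sample(range(dim), k)
    return {pos: rng.choice((-1.0, 1.0)) for pos in positions}

def project(tokens, dim=16, k=2, seed=0):
    """Represent a document as the count-weighted sum of its terms' index vectors."""
    doc = [0.0] * dim
    for term, count in Counter(tokens).items():
        for pos, sign in index_vector(term, dim, k, seed).items():
            doc[pos] += count * sign
    return doc

# Terms from different languages share one reduced space without translation.
english = project("the cat sat on the mat".split())
italian = project("il gatto sul tappeto".split())
print(len(english), len(italian))
```

No dictionary or machine translation is involved: the mapping depends only on the term string, which is what makes this family of methods cheap to apply to mixed-language collections.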

2019 Contribution to book Open Access

The Epistle to Cangrande Through the Lens of Computational Authorship Verification
Corbara S., Moreo Fernandez A. D., Sebastiani F., Tavoni M.
The Epistle to Cangrande is one of the most controversial among the works of Italian poet Dante Alighieri. For more than a hundred years now, scholars have been debating over its real paternity, i.e., whether it should be considered a true work by Dante or a forgery by an unnamed author. In this work we address this philological problem through the methodologies of (supervised) Computational Authorship Verification, by training a classifier that predicts whether a given work is by Dante Alighieri or not. We discuss the system we have set up for this endeavour, the training set we have assembled, the experimental results we have obtained, and some issues that this work leaves open.

See at: CNR IRIS Open Access | link.springer.com | ISTI Repository | CNR IRIS Restricted | CNR IRIS

2019 Other Open Access

Word-Class Embeddings for Multiclass Text Classification
Moreo Fernandez Ad, Esuli A, Sebastiani F
Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using four popular neural architectures and six widely used and publicly available datasets for multiclass text classification. Our code that implements WCEs is publicly available.

See at: arxiv.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted
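
The abstract does not state how WCEs are computed. The sketch below uses one simple stand-in, the share of a word's training documents that fall in each class, purely to show what a supervised per-class word vector looks like and how it is concatenated to a pre-trained one. The toy data, the made-up pre-trained vectors, and the function name are assumptions.

```python
def word_class_embeddings(docs, labels, classes):
    """For each word, one dimension per class: the share of the documents
    containing the word that carry that class label (an illustrative choice)."""
    counts = {}
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            per_class = counts.setdefault(word, {c: 0 for c in classes})
            per_class[label] += 1
    return {
        word: [per_class[c] / sum(per_class.values()) for c in classes]
        for word, per_class in counts.items()
    }

docs = [["goal", "match"], ["match", "vote"], ["vote", "election"]]
labels = ["sport", "politics", "politics"]
classes = ["sport", "politics"]
wce = word_class_embeddings(docs, labels, classes)

# Placeholder pre-trained vectors (made up for the example).
pretrained = {"goal": [0.1, 0.3], "match": [0.2, 0.1], "vote": [0.0, 0.5], "election": [0.4, 0.4]}

# Concatenate the unsupervised and the supervised representation of each word.
combined = {w: pretrained[w] + wce[w] for w in pretrained}
print(combined["match"])
```

The concatenated vector keeps the general-purpose dimensions intact and appends one task-specific dimension per class.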

2019 Other Open Access

Cross-Lingual Sentiment Quantification
Esuli A, Moreo Fernandez A D, Sebastiani F
We discuss Cross-Lingual Text Quantification (CLTQ), the task of performing text quantification (i.e., estimating the relative frequency $p_c(D)$ of all classes $c \in C$ in a set $D$ of unlabelled documents) when training documents are available for a source language S but not for the target language T for which quantification needs to be performed. CLTQ has never been discussed before in the literature; we establish baseline results for the binary case by combining state-of-the-art quantification methods with methods capable of generating cross-lingual vectorial representations of the source and target documents involved. We present experimental results obtained on publicly available datasets for cross-lingual sentiment classification; the results show that the presented methods can perform CLTQ with a surprising level of accuracy.

See at: arxiv.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted

2019 Conference article Open Access

Learning to quantify: Estimating class prevalence via supervised learning
Moreo Fernandez Ad, Sebastiani F
Quantification (also known as "supervised prevalence estimation", or "class prior estimation") is the task of estimating, given a set $\sigma$ of unlabelled items and a set of classes $C = \{c_1, \ldots, c_{|C|}\}$, the relative frequency (or "prevalence") $p(c_i)$ of each class $c_i \in C$, i.e., the fraction of items in $\sigma$ that belong to $c_i$. The goal of this course is to introduce the audience to the problem of quantification and to its importance, to the main supervised learning techniques that have been proposed for solving it, to the metrics used to evaluate them, and to what appear to be the most promising directions for further research.

See at: dl.acm.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted | CNR IRIS
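
To make the definition of prevalence concrete, the sketch below computes class prevalences from a list of labels and compares the true prevalences with the estimate obtained by simply counting a classifier's predictions, the baseline usually called "classify and count". The toy labels, the helper names, and the use of mean absolute error as the evaluation measure are assumptions chosen for illustration.

```python
from collections import Counter

def prevalence(labels, classes):
    """Fraction of items belonging to each class."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}

def mean_absolute_error(true_prev, estimated_prev):
    """Average absolute difference between true and estimated prevalences."""
    return sum(abs(true_prev[c] - estimated_prev[c]) for c in true_prev) / len(true_prev)

classes = ["pos", "neg"]
true_labels = ["pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
predicted   = ["pos", "neg", "neg", "neg", "pos", "pos", "neg", "neg"]

true_prev = prevalence(true_labels, classes)
cc_estimate = prevalence(predicted, classes)   # classify and count
print(true_prev, cc_estimate, mean_absolute_error(true_prev, cc_estimate))
```

In this toy case the classifier makes three errors that do not cancel out, so the counted prevalence of "pos" (0.375) overshoots the true one (0.25); errors of this kind are what dedicated quantification methods try to correct.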

2020 Other Open Access

AIMH research activities 2020
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Carrara F., Casarosa V., Ciampi L., Concordia C., Corbara S., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Thanos C., Trupiano L., Vadicamo L., Vairo C.
Annual Report of the Artificial Intelligence for Media and Humanities laboratory (AIMH) research activities in 2020.
DOI: 10.32079/isti-ar-2020/001

See at: CNR IRIS Open Access | ISTI Repository | CNR IRIS Restricted

2019 Conference article Open Access

The Epistle to Cangrande through the Lens of computational authorship verification
Corbara S., Moreo Fernandez A., Sebastiani F., Tavoni M.
The Epistle to Cangrande is one of the most debated documents in the production of the Italian poet Dante Alighieri. For more than a hundred years scholars have been debating over its real paternity, i.e., whether it should be considered a work by Dante or a malicious forgery by an unnamed author. In this work, we try to address this philological problem through the methodologies of computational authorship verification and machine learning, by training a classifier on a dataset of medieval Latin prose texts and by using a set of authorship-related features. Although the project is still in a preliminary phase, the early results seem to confirm the hypothesis of a forgery.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 29-35. Milan, Italy, July 17-18, 2019

See at: ceur-ws.org Open Access | CNR IRIS | ISTI Repository | CNR IRIS Restricted