74 result(s)
2018 Software Unknown
PyDCI repository
Moreo Fernandez A.
A Python implementation of the Distributional Correspondence Indexing (DCI) algorithm for cross-domain and cross-lingual domain adaptation, described in https://arxiv.org/abs/1810.09311.

See at: github.com | CNR ExploRA


2019 Software Unknown
Funnelling repository
Moreo Fernandez A. D.
Software repository containing the Python code implementing Funnelling, a new ensemble method for heterogeneous transfer learning described in https://arxiv.org/abs/1901.11459.

See at: github.com | CNR ExploRA


2018 Software Unknown
inntt: Interactive NeuralNet Trainer for pyTorch
Moreo Fernandez A.
Interactive NeuralNet Trainer for pyTorch (INNTT) is a Python class that allows the practitioner to modify many hyperparameters involved in the training of neural networks in PyTorch on the fly, interacting with the keyboard.

See at: github.com | CNR ExploRA


2020 Software Unknown
PyDRO: A Python reimplementation of the Distributional Random Oversampling method for binary text classification
Moreo Fernandez A. D.
This repo is a stand-alone (re)implementation of the Distributional Random Oversampling (DRO) method presented in SIGIR'16. The original implementation was part of the JaTeCs framework for Java. Distributional Random Oversampling (DRO) is an oversampling method to counter data imbalance in binary text classification. DRO generates new random minority-class synthetic documents by exploiting the distributional properties of the terms in the collection. The variability introduced by the oversampling method is enclosed in a latent space; the original space is replicated and left untouched.

See at: github.com | CNR ExploRA
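The general idea behind distribution-based oversampling can be illustrated with a loose sketch (this is NOT the actual DRO algorithm, which encloses the added variability in a latent space and leaves the original feature space untouched; all names and data here are hypothetical): synthetic minority-class documents are drawn from the term distribution observed in the minority class.

```python
# Loose illustration of distribution-based oversampling for text (not DRO
# itself): fit a term distribution on minority-class documents, then sample
# new synthetic bag-of-words documents from it.
import random
from collections import Counter

def oversample_minority(minority_docs, n_new, doc_len=5, seed=0):
    """Draw synthetic bag-of-words documents from the minority-class
    term distribution (each document is a list of terms)."""
    rng = random.Random(seed)
    term_counts = Counter(t for doc in minority_docs for t in doc)
    terms = list(term_counts)
    weights = [term_counts[t] for t in terms]
    return [rng.choices(terms, weights=weights, k=doc_len)
            for _ in range(n_new)]

# Toy minority class of two short documents.
minority = [["refund", "broken", "refund"], ["broken", "late"]]
synthetic = oversample_minority(minority, n_new=3)
print(synthetic)  # three 5-term synthetic documents over the same vocabulary
```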


2022 Conference article Open Access OPEN
Rhythmic and psycholinguistic features for authorship tasks in the Spanish parliament: evaluation and analysis
Corbara S., Chulvi B., Rosso P., Moreo A.
Among the many tasks of the authorship field, Authorship Identification aims at uncovering the author of a document, while Author Profiling focuses on the analysis of personal characteristics of the author(s), such as gender, age, etc. Methods devised for such tasks typically focus on the style of the writing, and are expected not to make inferences grounded on the topics that certain authors tend to write about. In this paper, we present a series of experiments evaluating the use of topic-agnostic feature sets for Authorship Identification and Author Profiling tasks in Spanish political language. In particular, we propose to employ features based on rhythmic and psycholinguistic patterns, obtained via different approaches of text masking that we use to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results that are comparable to those obtained by a BETO transformer, when the latter is trained on the original text, i.e., potentially learning from topical information. Moreover, we further investigate the results for the different authors, showing that variations in performance are partially explainable in terms of the authors' political affiliation and communication style.Source: CLEF 2022 - 13th Conference of the CLEF Association, pp. 79–92, Bologna, Italy, 5-8/9/2022
DOI: 10.1007/978-3-031-13643-6_6
Project(s): AI4Media via OpenAIRE
Metrics:


See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA


2022 Conference article Open Access OPEN
Investigating topic-agnostic features for authorship tasks in Spanish political speeches
Corbara S., Chulvi Ferriols B., Rosso P., Moreo A.
Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to make inferences grounded on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating the use of feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, via different approaches of text distortion used to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results that are comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when potentially learning from topical information.Source: NLDB 2022 - 27th International Conference on Applications of Natural Language to Information Systems, pp. 394–402, Valencia, Spagna, 15-17/6/2022
DOI: 10.1007/978-3-031-08473-7_36
Project(s): AI4Media via OpenAIRE
Metrics:


See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA


2015 Conference article Restricted
Distributional correspondence indexing for cross-language text categorization
Esuli A., Fernandez A. M.
Cross-Language Text Categorization (CLTC) aims at producing a classifier for a target language when the only available training examples belong to a different source language. Existing CLTC methods are usually affected by high computational costs, require external linguistic resources, or demand a considerable human annotation effort. This paper presents a simple, yet effective, CLTC method based on projecting features from both source and target languages into a common vector space, by using a computationally lightweight distributional correspondence profile with respect to a small set of pivot terms. Experiments on a popular sentiment classification dataset show that our method performs favorably to state-of-the-art methods, requiring a significantly reduced computational cost and minimal human intervention.Source: ECIR 2015 - Advances in Information Retrieval. 37th European Conference on IR Research, pp. 104–109, Vienna, Austria, 29 March - 2 April 2015
DOI: 10.1007/978-3-319-16354-3_12
Metrics:


See at: doi.org Restricted | link.springer.com Restricted | CNR ExploRA


2016 Conference article Open Access OPEN
Transductive Distributional Correspondence Indexing for cross-domain topic classification
Fernandez A. M., Esuli A., Sebastiani F.
Obtaining high-quality annotated data for training a classifier for a new domain is often costly. Domain Adaptation (DA) aims at leveraging the annotated data available from a different but related source domain in order to deploy a classification model for the target domain of interest, thus alleviating the aforementioned costs. To that aim, the learning model is typically given access to a set of unlabelled documents collected from the target domain. These documents might consist of a representative sample of the target distribution, and they could thus be used to infer a general classification model for the domain (inductive inference). Alternatively, these documents could be the entire set of documents to be classified; this happens when there is only one set of documents we are interested in classifying (transductive inference). Many of the DA methods proposed so far have focused on transductive classification by topic, i.e., the task of assigning class labels to a specific set of documents based on the topics they are about. In this work, we report on new experiments we have conducted in transductive classification by topic using the Distributional Correspondence Indexing (DCI) method, a DA method we have recently developed that delivered state-of-the-art results in inductive classification by sentiment. The results we have obtained on three popular datasets show DCI to be competitive with the state of the art also in this scenario, and to be superior to all compared methods in many cases.Source: 7th Italian Information Retrieval Workshop, pp. 8–11, Venezia, Italy, 30-31 May 2016

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA


2018 Software Unknown
QuaNet repository
Esuli A., Moreo Fernandez A. D.
This repository contains the Python code implementing the QuaNet (described in https://arxiv.org/pdf/1809.00836.pdf) model for quantification and everything needed to reproduce all experiments.

See at: github.com | CNR ExploRA


2018 Contribution to conference Open Access OPEN
L'Epistola a Cangrande al vaglio della authorship verification
Corbara S., Moreo Fernandez A., Sebastiani F., Tavoni M.
[Abstract not available]Source: Workshop "Nuove Inchieste sull'Epistola a Cangrande", Pisa, IT, 18/12/2018

See at: ISTI Repository Open Access | CNR ExploRA


2019 Contribution to conference Open Access OPEN
Learning to quantify: Estimating class prevalence via supervised learning
Moreo Fernandez A. D., Sebastiani F.
Quantification (also known as "supervised prevalence estimation", or "class prior estimation") is the task of estimating, given a set of unlabelled items and a set of classes C = {c1, ..., c|C|}, the relative frequency (or "prevalence") p(ci) of each class ci ∈ C, i.e., the fraction of items in the set that belong to ci. The goal of this course is to introduce the audience to the problem of quantification and to its importance, to the main supervised learning techniques that have been proposed for solving it, to the metrics used to evaluate them, and to what appear to be the most promising directions for further research.Source: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1415–1416, Paris, France, 21-25/06/2019
DOI: 10.1145/3331184.3331389
Metrics:


See at: ISTI Repository Open Access | dl.acm.org Restricted | doi.org Restricted | CNR ExploRA
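The baseline that the quantification literature summarised above builds on can be sketched in a few lines (a minimal illustration with hypothetical names and data, not code from any of the listed repositories): classify every item in the unlabelled sample, then report the fraction assigned to each class as the prevalence estimate.

```python
# Minimal sketch of the "classify and count" (CC) quantification baseline:
# estimate each class's prevalence as the fraction of sample items that a
# classifier assigns to it.
from collections import Counter

def classify_and_count(classifier, unlabelled_items, classes):
    """Return {class: estimated prevalence} for the given sample."""
    predictions = [classifier(x) for x in unlabelled_items]
    counts = Counter(predictions)
    n = len(unlabelled_items)
    return {c: counts.get(c, 0) / n for c in classes}

# Toy classifier: a text is "positive" iff it contains the word "good".
toy_clf = lambda text: "positive" if "good" in text else "negative"
sample = ["good movie", "bad plot", "really good", "awful"]
print(classify_and_count(toy_clf, sample, ["positive", "negative"]))
# → {'positive': 0.5, 'negative': 0.5}
```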


2019 Conference article Open Access OPEN
Tutorial: Supervised Learning for Prevalence Estimation
Moreo Fernandez A. D., Sebastiani F.
Quantification is the task of estimating, given a set of unlabelled items and a set of classes, the relative frequency (or "prevalence") of each class. Quantification is important in many disciplines (such as e.g., market research, political science, the social sciences, and epidemiology) which usually deal with aggregate (as opposed to individual) data. In these contexts, classifying individual unlabelled instances is usually not a primary goal, while estimating the prevalence of the classes of interest in the data is. Quantification may in principle be solved via classification, i.e., by classifying each item in the set and counting, for each class, how many items have been labelled with it. However, it has been shown in a multitude of works that this "classify and count" (CC) method yields suboptimal quantification accuracy, one of the reasons being that most classifiers are optimized for classification accuracy, and not for quantification accuracy. As a result, quantification has come to be no longer considered a mere byproduct of classification, and has evolved as a task of its own, devoted to designing methods and algorithms that deliver better prevalence estimates than CC. The goal of this tutorial is to introduce the main supervised learning techniques that have been proposed for solving quantification, the metrics used to evaluate them, and the most promising directions for further research.Source: International Conference on Flexible Query Answering Systems, pp. 13–17, Amantea, Italy, 2-5/06/2019
DOI: 10.1007/978-3-030-27629-4_3
Metrics:


See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com Restricted | CNR ExploRA
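One classical example of a method that "delivers better prevalence estimates than CC" is the well-known adjusted-count correction, which rescales the CC estimate using the classifier's true- and false-positive rates (estimated, e.g., via cross-validation on the training set). This is a generic textbook sketch, not code from any of the works listed here.

```python
# Adjusted-count correction for binary quantification: given the CC estimate
# p_cc and the classifier's true/false positive rates, recover a corrected
# prevalence via p = (p_cc - fpr) / (tpr - fpr), clipped to [0, 1].
def adjusted_classify_and_count(p_cc, tpr, fpr):
    """Correct a CC prevalence estimate for classifier bias."""
    if tpr == fpr:
        return p_cc  # correction undefined; fall back to the raw CC estimate
    p = (p_cc - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))  # clip to a valid prevalence

# Example: CC reports 40% positives, but the classifier has tpr=0.8, fpr=0.2;
# the corrected estimate is (0.4 - 0.2) / 0.6 ≈ 0.333.
print(adjusted_classify_and_count(0.4, 0.8, 0.2))
```

The clipping step matters in practice: with extreme test prevalences the raw correction can fall outside [0, 1].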


2020 Report Open Access OPEN
Tweet Sentiment Quantification: An Experimental Re-Evaluation
Moreo A., Sebastiani F.
Sentiment quantification is the task of estimating the relative frequency (or "prevalence") of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts; this is especially important when these texts are tweets, since most sentiment classification endeavours carried out on Twitter data actually have quantification (and not the classification of individual tweets) as their ultimate goal. It is well-known that solving quantification via "classify and count" (i.e., by classifying all unlabelled items via a standard classifier and counting the items that have been assigned to a given class) is suboptimal in terms of accuracy, and that more accurate quantification methods exist. In 2016, Gao and Sebastiani carried out a systematic comparison of quantification methods on the task of tweet sentiment quantification. In hindsight, we observe that the experimental protocol followed in that work is flawed, and that its results are thus unreliable. We now re-evaluate those quantification methods on the very same datasets, this time following a now consolidated and much more robust experimental protocol, that involves 5,775 times as many experiments as run in the original study. Our experimentation yields results dramatically different from those obtained by Gao and Sebastiani, and thus provides a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.Source: Research report, SoBigData++ and AI4Media, 2020
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA


2020 Report Open Access OPEN
Re-Assessing the "Classify and Count" Quantification Method
Moreo A., Sebastiani F.
Learning to quantify (a.k.a. quantification) is a task concerned with training unbiased estimators of class prevalence via supervised learning. This task originated with the observation that "Classify and Count" (CC), the trivial method of obtaining class prevalence estimates, is often a biased estimator, and thus delivers suboptimal quantification accuracy; following this observation, several methods for learning to quantify have been proposed that have been shown to outperform CC. In this work we contend that previous works have failed to use properly optimised versions of CC. We thus reassess the real merits of CC (and its variants), and argue that, while still inferior to some cutting-edge methods, they deliver near-state-of-the-art accuracy once (a) hyperparameter optimisation is performed, and (b) this optimisation is performed by using a true quantification loss instead of a standard classification-based loss. Experiments on three publicly available binary sentiment classification datasets support these conclusions.Source: Research report, SoBigData++ and AI4Media, 2020
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA


2021 Conference article Open Access OPEN
Re-assessing the "Classify and Count" quantification method
Moreo A., Sebastiani F.
Learning to quantify (a.k.a. quantification) is a task concerned with training unbiased estimators of class prevalence via supervised learning. This task originated with the observation that "Classify and Count" (CC), the trivial method of obtaining class prevalence estimates, is often a biased estimator, and thus delivers suboptimal quantification accuracy. Following this observation, several methods for learning to quantify have been proposed and have been shown to outperform CC. In this work we contend that previous works have failed to use properly optimised versions of CC. We thus reassess the real merits of CC and its variants, and argue that, while still inferior to some cutting-edge methods, they deliver near-state-of-the-art accuracy once (a) hyperparameter optimisation is performed, and (b) this optimisation is performed by using a truly quantification-oriented evaluation protocol. Experiments on three publicly available binary sentiment classification datasets support these conclusions.Source: ECIR 2021 - 43rd European Conference on Information Retrieval, pp. 75–91, Online conference, 28/03-01/04/2021
DOI: 10.1007/978-3-030-72240-1_6
DOI: 10.5281/zenodo.4468276
DOI: 10.48550/arxiv.2011.02552
DOI: 10.5281/zenodo.4468277
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE
Metrics:


See at: arXiv.org e-Print Archive Open Access | arxiv.org Open Access | ZENODO Open Access | ZENODO Open Access | ISTI Repository Open Access | Lecture Notes in Computer Science Restricted | doi.org Restricted | link.springer.com Restricted | CNR ExploRA


2022 Journal article Open Access OPEN
Report on the 1st International Workshop on Learning to Quantify (LQ 2021)
Del Coz J. J., González P., Moreo A., Sebastiani F.
The 1st International Workshop on Learning to Quantify (LQ 2021 - https://cikmlq2021.github.io/), organized as a satellite event of the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021), took place on two separate days, November 1 and 5, 2021. Like the main CIKM 2021 conference, the workshop was held entirely online, due to the COVID-19 pandemic. This report presents a summary of each keynote speech and contributed paper presented in this event, and discusses the issues that were raised during the workshop.Source: SIGKDD explorations (Online) 24 (2022): 49–51.
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: kdd.org Open Access | ISTI Repository Open Access | CNR ExploRA


2022 Journal article Open Access OPEN
Syllabic quantity patterns as rhythmic features for Latin authorship attribution
Corbara S., Moreo A., Sebastiani F.
It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs), show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.Source: Journal of the Association for Information Science and Technology (2022). doi:10.1002/asi.24660
DOI: 10.1002/asi.24660
Metrics:


See at: asistdl.onlinelibrary.wiley.com Open Access | ISTI Repository Open Access | CNR ExploRA


2022 Journal article Open Access OPEN
MedLatinEpi and MedLatinLit: two datasets for the computational authorship analysis of medieval Latin texts
Corbara S., Moreo A., Sebastiani F., Tavoni M.
We present and make available MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatinEpi and MedLatinLit consist of 294 and 30 curated texts, respectively, labelled by author; MedLatinEpi texts are of epistolary nature, while MedLatinLit texts consist of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification. Along with the datasets we provide experimental results, obtained on these datasets, for the authorship verification task, i.e., the task of predicting whether a text of unknown authorship was written by a candidate author or not. We also make available the source code of the authorship verification system we have used, thus allowing our experiments to be reproduced, and to be used as baselines, by other researchers. We also describe the application of the above authorship verification system, using these datasets as training data, for investigating the authorship of two medieval epistles whose authorship has been disputed by scholars.Source: ACM journal on computing and cultural heritage (Print) 3 (2022). doi:10.1145/3485822
DOI: 10.1145/3485822
Metrics:


See at: ISTI Repository Open Access | dl.acm.org Restricted | CNR ExploRA


2022 Journal article Open Access OPEN
Tweet sentiment quantification: an experimental re-evaluation
Moreo A., Sebastiani F.
Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called "prevalence") of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts. This task is especially important when these texts are tweets, since the final goal of most sentiment classification efforts carried out on Twitter data is actually quantification (and not the classification of individual tweets). It is well-known that solving quantification by means of "classify and count" (i.e., by classifying all unlabelled items by means of a standard classifier and counting the items that have been assigned to a given class) is less than optimal in terms of accuracy, and that more accurate quantification methods exist. Gao and Sebastiani (2016) carried out a systematic comparison of quantification methods on the task of tweet sentiment quantification. In hindsight, we observe that the experimentation carried out in that work was weak, and that the reliability of the conclusions that were drawn from the results is thus questionable. We here re-evaluate those quantification methods (plus a few more modern ones) on exactly the same datasets, this time following a now consolidated and robust experimental protocol (which also involves simulating the presence, in the test data, of class prevalence values very different from those of the training set). This experimental protocol (even without counting the newly added methods) involves a number of experiments 5,775 times larger than that of the original study.
Due to the above-mentioned presence, in the test data, of samples characterised by class prevalence values very different from those of the training set, the results of our experiments are dramatically different from those obtained by Gao and Sebastiani, and provide a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.Source: PloS one 17 (2022). doi:10.1371/journal.pone.0263449
DOI: 10.1371/journal.pone.0263449
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE
Metrics:


See at: journals.plos.org Open Access | ISTI Repository Open Access | CNR ExploRA


2022 Contribution to conference Open Access OPEN
Proceedings of the 2nd International Workshop on Learning to Quantify (LQ 2022)
Del Coz J. J., González P., Moreo A., Sebastiani F.
The 2nd International Workshop on Learning to Quantify (LQ 2022 - https://lq-2022.github.io/) was held in Grenoble, FR, on September 23, 2022, as a satellite workshop of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2022). While the 1st edition of the workshop (LQ 2021 - https://cikmlq2021.github.io/, which was instead co-located with the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021)) had to be an entirely online event, LQ 2022 was a hybrid event, with presentations given in-presence and both in-presence attendees and remote attendees. The workshop was a half-day event, and consisted of a keynote talk by Marco Saerens (Université Catholique de Louvain), presentations of four contributed papers, and a final collective discussion on the open problems of learning to quantify and on future initiatives. The present volume contains the four contributed papers that were accepted for presentation at the workshop. Each of these papers was submitted as a response to the call for papers, was reviewed by at least three members of the international program committee, and was revised by the authors so as to take into account the feedback provided by the reviewers. We hope that the availability of the present volume will increase the interest in the subject of quantification on the part of researchers and practitioners alike, and will contribute to making quantification better known to potential users of this technology and to researchers interested in advancing the field.Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: lq-2022.github.io Open Access | ISTI Repository Open Access | CNR ExploRA