2023 Conference article Open Access

Ordinal quantification through regularization
Bunse M., Moreo A., Sebastiani F., Senz M.
Quantification, i.e., the task of training predictors of the class prevalence values in sets of unlabelled data items, has received increased attention in recent years. However, most quantification research has concentrated on developing algorithms for binary and multiclass problems in which the classes are not ordered. We here study the ordinal case, i.e., the case in which a total order is defined on the set of n > 2 classes. We give three main contributions to this field. First, we create and make available two datasets for ordinal quantification (OQ) research that overcome the inadequacies of the previously available ones. Second, we experimentally compare the most important OQ algorithms proposed in the literature so far. To this end, we bring together algorithms that are proposed by authors from very different research fields, who were unaware of each other's developments. Third, we propose three OQ algorithms, based on the idea of preventing ordinally implausible estimates through regularization. Our experiments show that these algorithms outperform the existing ones if the ordinal plausibility assumption holds.
Source: ECML/PKDD 2022 - 33rd European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 36–52, Grenoble, France, 19-23/09/2022
DOI: 10.1007/978-3-031-26419-1_3
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA

2023 Book Open Access

Learning to Quantify
Esuli A., Fabris A., Moreo A., Sebastiani F.
This open access book provides an introduction and an overview of learning to quantify (a.k.a. "quantification"), i.e. the task of training estimators of class proportions in unlabeled data by means of supervised learning. In data science, learning to quantify is a task of its own related to classification yet different from it, since estimating class proportions by simply classifying all data and counting the labels assigned by the classifier is known to often return inaccurate ("biased") class proportion estimates. The book introduces learning to quantify by looking at the supervised learning methods that can be used to perform it, at the evaluation measures and evaluation protocols that should be used for evaluating the quality of the returned predictions, at the numerous fields of human activity in which the use of quantification techniques may provide improved results with respect to the naive use of classification techniques, and at advanced topics in quantification research. The book is suitable for researchers, data scientists, or PhD students, who want to come up to speed with the state of the art in learning to quantify, but also for researchers wishing to apply data science technologies to fields of human activity (e.g., the social sciences, political science, epidemiology, market research) which focus on aggregate ("macro") data rather than on individual ("micro") data.
DOI: 10.1007/978-3-031-20467-8
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: link.springer.com Open Access | ISTI Repository | CNR ExploRA

2023 Journal article Open Access

Measuring fairness under unawareness of sensitive attributes: a quantification-based approach
Fabris A., Esuli A., Moreo A., Sebastiani F.
Algorithms and models are increasingly deployed to inform decisions about people, inevitably affecting their lives. As a consequence, those in charge of developing these models must carefully evaluate their impact on different groups of people and favour group fairness, that is, ensure that groups determined by sensitive demographic attributes, such as race or sex, are not treated unjustly. To achieve this goal, the availability (awareness) of these demographic attributes to those evaluating the impact of these models is fundamental. Unfortunately, collecting and storing these attributes is often in conflict with industry practices and legislation on data minimisation and privacy. For this reason, it can be hard to measure the group fairness of trained models, even from within the companies developing them. In this work, we tackle the problem of measuring group fairness under unawareness of sensitive attributes, by using techniques from quantification, a supervised learning task concerned with directly providing group-level prevalence estimates (rather than individual-level class labels). We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem, as they are robust to inevitable distribution shifts while at the same time decoupling the (desirable) objective of measuring group fairness from the (undesirable) side effect of allowing the inference of sensitive attributes of individuals. In more detail, we show that fairness under unawareness can be cast as a quantification problem and solved with proven methods from the quantification literature. We show that these methods outperform previous approaches to measure demographic parity in five experimental protocols, corresponding to important challenges that complicate the estimation of classifier fairness under unawareness.
Source: Journal of artificial intelligence research (Online) 76 (2023): 1117–1180. doi:10.1613/jair.1.14033
DOI: 10.1613/jair.1.14033
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | www.jair.org | CNR ExploRA

2023 Journal article Open Access

NoR-VDPNet++: real-time no-reference image quality metrics
Banterle F., Artusi A., Moreo A., Carrara F., Cignoni P.
Efficiency and efficacy are desirable properties for any evaluation metric having to do with Standard Dynamic Range (SDR) imaging or with High Dynamic Range (HDR) imaging. However, it is a daunting task to satisfy both properties simultaneously. On the one hand, existing evaluation metrics like HDR-VDP 2.2 can accurately mimic the Human Visual System (HVS), but this typically comes at a very high computational cost. On the other hand, computationally cheaper alternatives (e.g., PSNR, MSE, etc.) fail to capture many crucial aspects of the HVS. In this work, we present NoR-VDPNet++, a deep learning architecture for converting full-reference accurate metrics into no-reference metrics, thus reducing the computational burden. We show NoR-VDPNet++ can be successfully employed in different application scenarios.
Source: IEEE access 11 (2023): 34544–34553. doi:10.1109/ACCESS.2023.3263496
DOI: 10.1109/access.2023.3263496
Project(s): ENCORE via OpenAIRE

See at: IEEE Access Open Access | ieeexplore.ieee.org | ISTI Repository | CNR ExploRA

2023 Journal article Open Access

Multi-label quantification
Moreo A., Francisco M., Sebastiani F.
Quantification, variously called supervised prevalence estimation or learning to quantify, is the supervised learning task of generating predictors of the relative frequencies (a.k.a. prevalence values) of the classes of interest in unlabelled data samples. While many quantification methods have been proposed in the past for binary problems and, to a lesser extent, single-label multiclass problems, the multi-label setting (i.e., the scenario in which the classes of interest are not mutually exclusive) remains by and large unexplored. A straightforward solution to the multi-label quantification problem could simply consist of recasting the problem as a set of independent binary quantification problems. Such a solution is simple but naïve, since the independence assumption upon which it rests is, in most cases, not satisfied. In these cases, knowing the relative frequency of one class could be of help in determining the prevalence of other related classes. We propose the first truly multi-label quantification methods, i.e., methods for inferring estimators of class prevalence values that strive to leverage the stochastic dependencies among the classes of interest in order to predict their relative frequencies more accurately. We show empirical evidence that natively multi-label solutions outperform the naïve approaches by a large margin. The code to reproduce all our experiments is available online.
Source: ACM transactions on knowledge discovery from data (Online) 18 (2023). doi:10.1145/3606264
DOI: 10.1145/3606264
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | ZENODO | dl.acm.org Restricted | CNR ExploRA

2023 Journal article Open Access

Same or different? Diff-vectors for authorship analysis
Corbara S., Moreo A., Sebastiani F.
In this paper we investigate the effects on authorship identification tasks (including authorship verification, closed-set authorship attribution, and closed-set and open-set same-author verification) of a fundamental shift in how to conceive the vectorial representations of documents that are given as input to a supervised learner. In "classic" authorship analysis a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document. We instead investigate the situation in which a feature vector represents an unordered pair of documents, the value of a feature represents the absolute difference in the relative frequencies (or increasing functions thereof) of the feature in the two documents, and the class label indicates whether the two documents are from the same author or not. This latter (learner-independent) type of representation has been occasionally used before, but has never been studied systematically. We argue that it is advantageous, and that in some cases (e.g., authorship verification) it provides a much larger quantity of information to the training process than the standard representation. The experiments that we carry out on several publicly available datasets (among which one that we here make available for the first time) show that feature vectors representing pairs of documents (that we here call Diff-Vectors) bring about systematic improvements in the effectiveness of authorship identification tasks, and especially so when training data are scarce (as is often the case in real-life authorship identification scenarios). Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared for solving the 1st, we also provide two novel methods for solving the 2nd and 3rd that use a solver for the 1st as a building block. The code to reproduce our experiments is open-source and available online.
Source: ACM transactions on knowledge discovery from data (Online) (2023). doi:10.1145/3609226
DOI: 10.1145/3609226
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | dl.acm.org Restricted | CNR ExploRA
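
The pair-based representation described in the abstract above can be illustrated with a minimal sketch. It assumes tokenized documents and a fixed feature list, and it uses plain relative frequencies rather than any increasing function of them. The function names (`relative_frequencies`, `diff_vector`, `build_pairs`) and the toy data are hypothetical and are not taken from the authors' released code.

```python
from collections import Counter
from itertools import combinations


def relative_frequencies(tokens, features):
    """Relative frequency of each feature in one tokenized document."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[f] / total if total else 0.0 for f in features]


def diff_vector(doc_a, doc_b, features):
    """Absolute difference of relative frequencies for an unordered pair."""
    freq_a = relative_frequencies(doc_a, features)
    freq_b = relative_frequencies(doc_b, features)
    return [abs(a - b) for a, b in zip(freq_a, freq_b)]


def build_pairs(docs, authors, features):
    """Turn labelled documents into (diff-vector, same-author label) pairs."""
    X, y = [], []
    for i, j in combinations(range(len(docs)), 2):
        X.append(diff_vector(docs[i], docs[j], features))
        y.append(int(authors[i] == authors[j]))  # 1 = same author, 0 = different
    return X, y


# Toy data (assumption, for illustration only).
features = ["the", "and", "of"]
docs = [
    ["the", "cat", "and", "the", "dog"],
    ["the", "end", "of", "the", "day"],
    ["of", "mice", "and", "men"],
]
authors = ["A", "A", "B"]

X, y = build_pairs(docs, authors, features)
```

With n labelled documents this yields n(n-1)/2 training pairs, which is one way to read the abstract's remark that the pair representation can supply more information to the training process when data are scarce. Any binary learner can then be trained on `X` and `y`.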

2023 Conference article Open Access

Enhancing adversarial authorship verification with data augmentation
Corbara S., Moreo A.
It has been shown that many Authorship Identification systems are vulnerable to adversarial attacks, where an author actively tries to fool the classifier. We propose to tackle the adversarial Authorship Verification task by augmenting the training set with synthetic textual examples. In this ongoing study, we present preliminary results using two learning algorithms (SVM and Neural Network), and two generation strategies (based on language modeling and GAN training) for two generator models, on three datasets. We empirically show that data augmentation may help improve the performance of the classifier in an adversarial setup.
Source: IIR 2023 - 13th Italian Information Retrieval Workshop, pp. 73–78, Pisa, Italy, 8-9/6/23.

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2023 Report Open Access

AIMH Research Activities 2023
Aloia N., Amato G., Bartalesi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application in digital media and humanities. This lab's objective is to enhance the current state of AI technology particularly on deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report encapsulates the laboratory's progress and activities throughout the year 2023.
Source: ISTI Annual Reports, 2023
DOI: 10.32079/isti-ar-2023/001

See at: ISTI Repository Open Access | CNR ExploRA

2022 Conference article Open Access

LeQua@CLEF2022: learning to quantify
Esuli A., Moreo A., Sebastiani F.
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting. For each such setting we provide data either in ready-made vector form or in raw document form.
Source: ECIR 2022 - 44th European Conference on IR Research, pp. 374–381, Stavanger, Norway, 10-14/04/2022
DOI: 10.1007/978-3-030-99739-7_47
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA

2022 Journal article Open Access

Report on the 1st International Workshop on Learning to Quantify (LQ 2021)
Del Coz J. J., González P., Moreo A., Sebastiani F.
The 1st International Workshop on Learning to Quantify (LQ 2021 - https://cikmlq2021.github.io/), organized as a satellite event of the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021), took place on two separate days, November 1 and 5, 2021. Like the main CIKM 2021 conference, the workshop was held entirely online, due to the COVID-19 pandemic. This report presents a summary of each keynote speech and contributed paper presented in this event, and discusses the issues that were raised during the workshop.
Source: SIGKDD explorations (Online) 24 (2022): 49–51.
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: kdd.org Open Access | ISTI Repository | CNR ExploRA

2022 Journal article Open Access

Syllabic quantity patterns as rhythmic features for Latin authorship attribution
Corbara S., Moreo A., Sebastiani F.
It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a basis for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs), show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
Source: Journal of the Association for Information Science and Technology (2022). doi:10.1002/asi.24660
DOI: 10.1002/asi.24660

See at: asistdl.onlinelibrary.wiley.com Open Access | ISTI Repository | CNR ExploRA

2022 Journal article Open Access

MedLatinEpi and MedLatinLit: two datasets for the computational authorship analysis of medieval Latin texts
Corbara S., Moreo A., Sebastiani F., Tavoni M.
We present and make available MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatinEpi and MedLatinLit consist of 294 and 30 curated texts, respectively, labelled by author; MedLatinEpi texts are of epistolary nature, while MedLatinLit texts consist of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification. Along with the datasets we provide experimental results, obtained on these datasets, for the authorship verification task, i.e., the task of predicting whether a text of unknown authorship was written by a candidate author or not. We also make available the source code of the authorship verification system we have used, thus allowing our experiments to be reproduced, and to be used as baselines, by other researchers. We also describe the application of the above authorship verification system, using these datasets as training data, for investigating the authorship of two medieval epistles whose authorship has been disputed by scholars.
Source: ACM journal on computing and cultural heritage (Print) 3 (2022). doi:10.1145/3485822
DOI: 10.1145/3485822

See at: ISTI Repository Open Access | dl.acm.org Restricted | CNR ExploRA

2022 Conference article Open Access

A detailed overview of LeQua 2022: learning to quantify
Esuli A., Moreo A., Sebastiani F., Sperduti G.
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest Y = {y1, ..., yn} in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting; this is the first time that an evaluation exercise solely dedicated to quantification is organized. For both the binary setting and the single-label multiclass setting, data were provided to participants both in ready-made vector form and in raw document form. In this overview article we describe the structure of the lab, we report the results obtained by the participants in the four proposed tasks and subtasks, and we comment on the lessons that can be learned from these results.
Source: CLEF 2022 - 13th Conference and Labs of the Evaluation Forum, pp. 1849–1868, Bologna, Italy, 5-8/9/2022
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository | CNR ExploRA

2022 Conference article Open Access

A concise overview of LeQua@CLEF 2022: Learning to Quantify
Esuli A., Moreo A., Sebastiani F., Sperduti G.
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest Y = {y1, ..., yn} in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting; this is the first time that an evaluation exercise solely dedicated to quantification is organized. For both the binary setting and the single-label multiclass setting, data were provided to participants both in ready-made vector form and in raw document form. In this overview article we describe the structure of the lab, we report the results obtained by the participants in the four proposed tasks and subtasks, and we comment on the lessons that can be learned from these results.
Source: CLEF 2022 - 13th Conference and Labs of the Evaluation Forum, pp. 362–381, Bologna, Italy, 5-8/9/2022
DOI: 10.1007/978-3-031-13643-6_23
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA

2022 Journal article Open Access

Tweet sentiment quantification: an experimental re-evaluation
Moreo A., Sebastiani F.
Sentiment quantification is the task of training, by means of supervised learning, estimators of the relative frequency (also called "prevalence") of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts. This task is especially important when these texts are tweets, since the final goal of most sentiment classification efforts carried out on Twitter data is actually quantification (and not the classification of individual tweets). It is well-known that solving quantification by means of "classify and count" (i.e., by classifying all unlabelled items by means of a standard classifier and counting the items that have been assigned to a given class) is less than optimal in terms of accuracy, and that more accurate quantification methods exist. Gao and Sebastiani 2016 carried out a systematic comparison of quantification methods on the task of tweet sentiment quantification. In hindsight, we observe that the experimentation carried out in that work was weak, and that the reliability of the conclusions that were drawn from the results is thus questionable. We here re-evaluate those quantification methods (plus a few more modern ones) on exactly the same datasets, this time following a now consolidated and robust experimental protocol (which also involves simulating the presence, in the test data, of class prevalence values very different from those of the training set). This experimental protocol (even without counting the newly added methods) involves a number of experiments 5,775 times larger than that of the original study.
Due to the above-mentioned presence, in the test data, of samples characterised by class prevalence values very different from those of the training set, the results of our experiments are dramatically different from those obtained by Gao and Sebastiani, and provide a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.
Source: PloS one 17 (2022). doi:10.1371/journal.pone.0263449
DOI: 10.1371/journal.pone.0263449
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: journals.plos.org Open Access | ISTI Repository | CNR ExploRA
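
The "classify and count" baseline mentioned in the abstract above can be sketched in a few lines. This is only the naive baseline that the abstract describes as less than optimal, not any of the more accurate quantification methods it evaluates. The function name `classify_and_count`, the keyword-based `toy_classifier`, and the sample texts are hypothetical and serve only as illustration.

```python
from collections import Counter


def classify_and_count(items, classify, classes):
    """Estimate class prevalence by labelling every item and counting labels."""
    if not items:
        raise ValueError("cannot estimate prevalence on an empty sample")
    counts = Counter(classify(item) for item in items)
    return {c: counts[c] / len(items) for c in classes}


def toy_classifier(text):
    """Stand-in for a trained sentiment classifier (assumption, keyword rule)."""
    if "good" in text:
        return "Positive"
    if "bad" in text:
        return "Negative"
    return "Neutral"


sample = ["good day", "bad day", "a day", "good food"]
prevalence = classify_and_count(
    sample, toy_classifier, ["Positive", "Neutral", "Negative"]
)
print(prevalence)
```

Because the estimate is just the fraction of predicted labels, any systematic classifier error carries over directly into the prevalence estimate, and this is more likely to matter when the class prevalence of the test sample differs from that of the training set.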

2022 Contribution to conference Open Access

Proceedings of the 2nd International Workshop on Learning to Quantify (LQ 2022)
Del Coz J. J., González P., Moreo A., Sebastiani F.
The 2nd International Workshop on Learning to Quantify (LQ 2022 - https://lq-2022.github.io/) was held in Grenoble, FR, on September 23, 2022, as a satellite workshop of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2022). While the 1st edition of the workshop (LQ 2021 - https://cikmlq2021.github.io/, which was instead co-located with the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021)) had to be an entirely online event, LQ 2022 was a hybrid event, with presentations given in-presence and both in-presence attendees and remote attendees. The workshop was a half-day event, and consisted of a keynote talk by Marco Saerens (Université Catholique de Louvain), presentations of four contributed papers, and a final collective discussion on the open problems of learning to quantify and on future initiatives. The present volume contains the four contributed papers that were accepted for presentation at the workshop. Each of these papers was submitted as a response to the call for papers, was reviewed by at least three members of the international program committee, and was revised by the authors so as to take into account the feedback provided by the reviewers. We hope that the availability of the present volume will increase the interest in the subject of quantification on the part of researchers and practitioners alike, and will contribute to making quantification better known to potential users of this technology and to researchers interested in advancing the field.
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: lq-2022.github.io Open Access | ISTI Repository | CNR ExploRA

2022 Journal article Open Access

Generalized funnelling: ensemble learning and heterogeneous document embeddings for cross-lingual text classification
Moreo A., Pedrotti A., Sebastiani F.
Funnelling (Fun) is a recently proposed method for cross-lingual text classification (CLTC) based on a two-tier learning ensemble for heterogeneous transfer learning (HTL). In this ensemble method, 1st-tier classifiers, each working on a different and language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLTC systems in which these correlations cannot be brought to bear. In this paper we describe Generalized Funnelling (gFun), a generalisation of Fun consisting of an HTL architecture in which 1st-tier components can be arbitrary view-generating functions, i.e., language-dependent functions that each produce a language-independent representation ("view") of the (monolingual) document. We describe an instance of gFun in which the meta-classifier receives as input a vector of calibrated posterior probabilities (as in Fun) aggregated to other embedded representations that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings), word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings), and word-context correlations (as encoded by multilingual BERT). We show that this instance of gFun substantially improves over Fun and over state-of-the-art baselines, by reporting experimental results obtained on two large, standard datasets for multilingual multilabel text classification. Our code that implements gFun is publicly available.
Source: ACM transactions on information systems 41 (2022). doi:10.1145/3544104
DOI: 10.1145/3544104
Project(s): AI4Media via OpenAIRE, ARIADNEplus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | dl.acm.org Restricted | CNR ExploRA

2022 Conference article Open Access

Rhythmic and psycholinguistic features for authorship tasks in the Spanish parliament: evaluation and analysis
Corbara S., Chulvi B., Rosso P., Moreo A.
Among the many tasks of the authorship field, Authorship Identification aims at uncovering the author of a document, while Author Profiling focuses on the analysis of personal characteristics of the author(s), such as gender, age, etc. Methods devised for such tasks typically focus on the style of the writing, and are expected not to make inferences grounded on the topics that certain authors tend to write about. In this paper, we present a series of experiments evaluating the use of topic-agnostic feature sets for Authorship Identification and Author Profiling tasks in Spanish political language. In particular, we propose to employ features based on rhythmic and psycholinguistic patterns, obtained via different approaches of text masking that we use to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results that are comparable to those obtained by a BETO transformer, when the latter is trained on the original text, i.e., potentially learning from topical information. Moreover, we further investigate the results for the different authors, showing that variations in performance are partially explainable in terms of the authors' political affiliation and communication style.
Source: CLEF 2022 - 13th Conference of the CLEF Association, pp. 79–92, Bologna, Italy, 5-8/9/2022
DOI: 10.1007/978-3-031-13643-6_6
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | link.springer.com Restricted | CNR ExploRA

2022 Conference article Open Access

Investigating topic-agnostic features for authorship tasks in Spanish political speeches
Corbara S., Chulvi Ferriols B., Rosso P., Moreo A.
Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to make inferences grounded on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating the use of feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, via different approaches of text distortion used to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results that are comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when potentially learning from topical information.
Source: NLDB 2022 - 27th International Conference on Applications of Natural Language to Information Systems, pp. 394–402, Valencia, Spain, 15-17/6/2022
DOI: 10.1007/978-3-031-08473-7_36
Project(s): AI4Media via OpenAIRE

See at: ISTI Repository Open Access | doi.org Restricted | link.springer.com | CNR ExploRA

2022 Report Open Access

AIMH research activities 2022
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and taking also into account issues related to scalability. This report summarizes the 2022 activities of the research group.
Source: ISTI Annual reports, 2022
DOI: 10.32079/isti-ar-2022/002

See at: ISTI Repository Open Access | CNR ExploRA