2022
Conference article
Open Access
Rhythmic and psycholinguistic features for authorship tasks in the Spanish parliament: evaluation and analysis
Corbara S., Chulvi B., Rosso P., Moreo Fernandez A.
Among the many tasks of the authorship field, Authorship Identification aims at uncovering the author of a document, while Author Profiling focuses on the analysis of personal characteristics of the author(s), such as gender, age, etc. Methods devised for such tasks typically focus on the style of the writing, and are expected not to make inferences grounded on the topics that certain authors tend to write about. In this paper, we present a series of experiments evaluating the use of topic-agnostic feature sets for Authorship Identification and Author Profiling tasks in Spanish political language. In particular, we propose to employ features based on rhythmic and psycholinguistic patterns, obtained via different approaches to text masking that we use to actively mask the underlying topic. We feed these feature sets to an SVM learner, and show that they lead to results that are comparable to those obtained by a BETO transformer, when the latter is trained on the original text, i.e., potentially learning from topical information. Moreover, we further investigate the results for the different authors, showing that variations in performance are partially explainable in terms of the authors' political affiliation and communication style.
Project(s): AI4Media
See at:
CNR IRIS | link.springer.com | ISTI Repository
2019
Contribution to book
Open Access
The Epistle to Cangrande Through the Lens of Computational Authorship Verification
Corbara S., Moreo A., Sebastiani F., Tavoni M.
The Epistle to Cangrande is one of the most controversial among the works of Italian poet Dante Alighieri. For more than a hundred years now, scholars have been debating over its real paternity, i.e., whether it should be considered a true work by Dante or a forgery by an unnamed author. In this work we address this philological problem through the methodologies of (supervised) Computational Authorship Verification, by training a classifier that predicts whether a given work is by Dante Alighieri or not. We discuss the system we have set up for this endeavour, the training set we have assembled, the experimental results we have obtained, and some issues that this work leaves open.
See at:
CNR IRIS | link.springer.com | ISTI Repository
2020
Other
Open Access
MedLatin1 and MedLatin2: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts
Corbara S, Moreo A, Sebastiani F, Tavoni M
We present and make available MedLatin1 and MedLatin2, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatin1 and MedLatin2 consist of 294 and 30 curated texts, respectively, labelled by author, with MedLatin1 texts being of an epistolary nature and MedLatin2 texts consisting of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification.
See at:
arxiv.org | CNR IRIS | ISTI Repository
2019
Conference article
Open Access
The Epistle to Cangrande through the Lens of computational authorship verification
Corbara S., Moreo Fernandez A., Sebastiani F., Tavoni M.
The Epistle to Cangrande is one of the most debated documents in the production of the Italian poet Dante Alighieri. For more than a hundred years scholars have been debating over its real paternity, whether it should be considered a work by Dante or a malicious forgery by an unnamed author. In this work, we try to address this philological problem through the methodologies of computational authorship verification and machine learning, by training a classifier on a dataset of medieval Latin prose texts and by using a set of authorship-related features. Although the project is still in a preliminary phase, the early results seem to confirm the hypothesis of a forgery.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 29-35. Milan, Italy, July 17-18, 2019
See at:
ceur-ws.org | CNR IRIS | ISTI Repository
2022
Journal article
Open Access
MedLatinEpi and MedLatinLit: two datasets for the computational authorship analysis of medieval Latin texts
Corbara S., Moreo Fernandez A., Sebastiani F., Tavoni M.
We present and make available MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatinEpi and MedLatinLit consist of 294 and 30 curated texts, respectively, labelled by author; MedLatinEpi texts are of an epistolary nature, while MedLatinLit texts consist of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification. Along with the datasets we provide experimental results, obtained on these datasets, for the authorship verification task, i.e., the task of predicting whether a text of unknown authorship was written by a candidate author or not. We also make available the source code of the authorship verification system we have used, thus allowing our experiments to be reproduced, and to be used as baselines, by other researchers. We also describe the application of the above authorship verification system, using these datasets as training data, for investigating the authorship of two medieval epistles whose authorship has been disputed by scholars.
Source: ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, vol. 15 (issue 3)
See at:
dl.acm.org | CNR IRIS | ISTI Repository
2024
Journal article
Open Access
Explainable authorship identification in Cultural Heritage applications
Setzu M., Corbara S., Monreale A., Moreo Fernandez A., Sebastiani F.
While a substantial amount of work has recently been devoted to improving the accuracy of computational Authorship Identification (AId) systems for textual data, little to no attention has been paid to endowing AId systems with the ability to explain the reasons behind their predictions. This substantially hinders the practical application of AId methods, since the predictions returned by such systems are hardly useful unless they are supported by suitable explanations. In this article, we explore the applicability of existing general-purpose eXplainable Artificial Intelligence (XAI) techniques to AId, with a focus on explanations addressed to scholars working in cultural heritage. In particular, we assess the relative merits of three different types of XAI techniques (feature ranking, probing, factual and counterfactual selection) on three different AId tasks (authorship attribution, authorship verification and same-authorship verification) by running experiments on real AId textual data. Our analysis shows that, while these techniques make important first steps towards XAI, more work remains to be done to provide tools that can be profitably integrated into the workflows of scholars.
Source: ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, vol. 17 (issue 3), pp. 1-23
DOI: 10.1145/3654675
Project(s): SoBigData-PlusPlus
See at:
CNR IRIS | Journal on Computing and Cultural Heritage
2020
Other
Open Access
AIMH research activities 2020
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Carrara F., Casarosa V., Ciampi L., Concordia C., Corbara S., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Thanos C., Trupiano L., Vadicamo L., Vairo C.
Annual Report of the Artificial Intelligence for Media and Humanities laboratory (AIMH) research activities in 2020.
DOI: 10.32079/isti-ar-2020/001
See at:
CNR IRIS | ISTI Repository
2021
Other
Open Access
AIMH research activities 2021
Aloia N, Amato G, Bartalesi V, Benedetti F, Bolettieri P, Cafarelli D, Carrara F, Casarosa V, Coccomini D, Ciampi L, Concordia C, Corbara S, Di Benedetto M, Esuli A, Falchi F, Gennaro C, Lagani G, Massoli Fv, Meghini C, Messina N, Metilli D, Molinari A, Moreo A, Nardi A, Pedrotti A, Pratelli N, Rabitti F, Savino P, Sebastiani F, Sperduti G, Thanos C, Trupiano L, Vadicamo L, Vairo C
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2021 activities of the research group.
DOI: 10.32079/isti-ar-2021/003
See at:
CNR IRIS | ISTI Repository
2022
Other
Open Access
AIMH research activities 2022
Aloia N., Amato G., Bartalesi Lenzi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Metilli D., Molinari A., Moreo Fernandez A. D., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, and also taking into account issues related to scalability. This report summarizes the 2022 activities of the research group.
DOI: 10.32079/isti-ar-2022/002
See at:
CNR IRIS | ISTI Repository
2023
Other
Open Access
AIMH Research Activities 2023
Aloia N, Amato G, Bartalesi V, Bianchi L, Bolettieri P, Bosio C, Carraglia M, Carrara F, Casarosa V, Ciampi L, Coccomini Da, Concordia C, Corbara S, De Martino C, Di Benedetto M, Esuli A, Falchi F, Fazzari E, Gennaro C, Lagani G, Lenzi E, Meghini C, Messina N, Molinari A, Moreo A, Nardi A, Pedrotti A, Pratelli N, Puccetti G, Rabitti F, Savino P, Sebastiani F, Sperduti G, Thanos C, Trupiano L, Vadicamo L, Vairo C, Versienti L
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application in digital media and humanities. The lab's objective is to advance the current state of AI technology, particularly in deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report covers the laboratory's progress and activities throughout the year 2023.
DOI: 10.32079/isti-ar-2023/001
See at:
CNR IRIS | ISTI Repository