329 result(s)
2025 Conference article Restricted
A simple method for classifier accuracy prediction under prior probability shift
Volpi L., Moreo Fernandez A., Sebastiani F.
The standard technique for predicting the accuracy that a classifier will have on unseen data (classifier accuracy prediction – CAP) is cross-validation (CV). However, CV relies on the assumption that the training data and the test data are sampled from the same distribution, an assumption that is often violated in many real-world scenarios. When such violations occur (i.e., in the presence of dataset shift), the estimates returned by CV are unreliable. In this paper we propose a CAP method specifically designed to address prior probability shift (PPS), an instance of dataset shift in which the training and test distributions are characterized by different class priors. By solving a system of n² independent linear equations, with n the number of classes, our method estimates the entries of the contingency table of the test data, and thus allows estimating any specific evaluation measure. Since a key step in this method involves predicting the class priors of the test data, we further observe a connection between our method and the field of “learning to quantify”. Our experiments show that, when combined with state-of-the-art quantification techniques, under PPS our method tends to outperform existing CAP methods.
Source: LECTURE NOTES IN COMPUTER SCIENCE, vol. 15244, pp. 267-283. Pisa, Italy, 14-16/10/2024
DOI: 10.1007/978-3-031-78980-9_17
Project(s): Quantification in the Context of Dataset Shift


See at: CNR IRIS Restricted | link.springer.com Restricted
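The reconstruction step described in the abstract can be pictured with a small sketch (illustrative numbers only, not the authors' exact algorithm): if the classifier's class-conditional behaviour is stable under PPS, the test contingency table follows from error rates measured on validation data plus an estimate of the test priors.

```python
import numpy as np

# P(predicted=j | true=i), measured on held-out validation data (rows sum to 1)
cond_pred = np.array([[0.9, 0.1],
                      [0.2, 0.8]])

# Estimated class priors of the *test* data (e.g. from a quantification method)
test_priors = np.array([0.3, 0.7])

# Entry (i, j) of the normalised contingency table: P(true=i, predicted=j);
# under PPS, cond_pred is assumed unchanged between training and test
contingency = test_priors[:, None] * cond_pred

# Any evaluation measure can now be read off the table, e.g. accuracy
accuracy = np.trace(contingency)  # 0.9*0.3 + 0.8*0.7 ≈ 0.83
print(accuracy)
```

The same table also yields precision, recall, F1, etc., which is why estimating its entries suffices for predicting "any specific evaluation measure".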


2024 Journal article Open Access OPEN
Explainable authorship identification in Cultural Heritage applications
Setzu M., Corbara S., Monreale A., Moreo Fernandez A., Sebastiani F.
While a substantial amount of work has recently been devoted to improving the accuracy of computational Authorship Identification (AId) systems for textual data, little to no attention has been paid to endowing AId systems with the ability to explain the reasons behind their predictions. This substantially hinders the practical application of AId methods, since the predictions returned by such systems are hardly useful unless they are supported by suitable explanations. In this article, we explore the applicability of existing general-purpose eXplainable Artificial Intelligence (XAI) techniques to AId, with a focus on explanations addressed to scholars working in cultural heritage. In particular, we assess the relative merits of three different types of XAI techniques (feature ranking, probing, factual and counterfactual selection) on three different AId tasks (authorship attribution, authorship verification and same-authorship verification) by running experiments on real AId textual data. Our analysis shows that, while these techniques make important first steps towards XAI, more work remains to be done to provide tools that can be profitably integrated into the workflows of scholars.
Source: ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, vol. 17 (issue 3), pp. 1-23
DOI: 10.1145/3654675
Project(s): SoBigData-PlusPlus via OpenAIRE


See at: CNR IRIS Open Access | Journal on Computing and Cultural Heritage Restricted | CNR IRIS Restricted


2024 Journal article Open Access OPEN
Regularization-based methods for ordinal quantification
Bunse M., Moreo Fernandez A., Sebastiani F., Senz M.
Quantification, i.e., the task of predicting the class prevalence values in bags of unlabeled data items, has received increased attention in recent years. However, most quantification research has concentrated on developing algorithms for binary and multi-class problems in which the classes are not ordered. Here, we study the ordinal case, i.e., the case in which a total order is defined on the set of n > 2 classes. We give three main contributions to this field. First, we create and make available two datasets for ordinal quantification (OQ) research that overcome the inadequacies of the previously available ones. Second, we experimentally compare the most important OQ algorithms proposed in the literature so far. To this end, we bring together algorithms proposed by authors from very different research fields, such as data mining and astrophysics, who were unaware of each other's developments. Third, we propose a novel class of regularized OQ algorithms, which outperforms existing algorithms in our experiments. The key to this gain in performance is that our regularization prevents ordinally implausible estimates, assuming that ordinal distributions tend to be smooth in practice. We informally verify this assumption for several real-world applications.
Source: DATA MINING AND KNOWLEDGE DISCOVERY
DOI: 10.1007/s10618-024-01067-2
Project(s): AI4Media via OpenAIRE, Quantification in the Context of Dataset Shift, SoBigData-PlusPlus via OpenAIRE


See at: CNR IRIS Open Access | CNR IRIS Restricted
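The smoothness assumption behind the proposed regularization can be illustrated with a toy penalty; the squared second difference used below is one common smoothness measure, chosen here purely for illustration and not necessarily the paper's actual regularizer.

```python
import numpy as np

def smoothness_penalty(prevalences):
    # sum of squared second differences across the *ordered* classes;
    # jagged (ordinally implausible) prevalence vectors score high
    p = np.asarray(prevalences, dtype=float)
    second_diff = p[:-2] - 2 * p[1:-1] + p[2:]
    return float(np.sum(second_diff ** 2))

smooth = [0.1, 0.2, 0.4, 0.2, 0.1]   # plausible for ordered classes
jagged = [0.4, 0.0, 0.4, 0.0, 0.2]   # ordinally implausible

print(smoothness_penalty(smooth) < smoothness_penalty(jagged))  # True
```

Adding such a penalty to a quantifier's objective pushes its estimates toward the smooth shapes that ordinal class distributions tend to have in practice.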


2024 Conference article Open Access OPEN
Multimodal heterogeneous transfer learning for multilingual image-text classification
Pedrotti A., Moreo Fernandez A., Sebastiani F.
The Multilingual Image-Text Classification (MITC) task is a specific instance of the Image-Text Classification (ITC) task, where each item to be classified consists of a visual representation and a textual description written in one of several possible languages. In this paper we propose MM-gFun, an extension of the gFun learning architecture originally developed for cross-lingual text classification. We extend its original text-only implementation to handle perceptual modalities.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3928. Pisa, Italy, 14-16/10/2024
Project(s): SoBigData via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted


2024 Journal article Open Access OPEN
A noise-oriented and redundancy-aware instance selection framework
Cunha W., Moreo Fernandez A., Esuli A., Sebastiani F., Rocha L., Gonçalves M. A.
Fine-tuning transformer-based deep learning models is currently at the forefront of natural language processing (NLP) and information retrieval (IR) tasks. However, fine-tuning these transformers for specific tasks, especially when dealing with ever-expanding volumes of data, constant retraining requirements, and budget constraints, can be computationally and financially costly, requiring substantial energy consumption and contributing to carbon dioxide emissions. This article focuses on advancing the state-of-the-art (SOTA) on instance selection (IS) – a range of document filtering techniques designed to select the most representative documents for the sake of training. The objective is to either maintain or enhance classification effectiveness while reducing the overall training (fine-tuning) total processing time. In our prior research, we introduced the E2SC framework, a redundancy-oriented IS method focused on transformers and large datasets – currently the state-of-the-art in IS. Nonetheless, important research questions remained unanswered in our previous work, mostly due to E2SC's sole emphasis on redundancy. In this article, we take our research a step further by proposing biO-IS – an extended bi-objective instance selection solution, a novel IS framework aimed at simultaneously removing redundant and noisy instances from the training. biO-IS estimates redundancy based on scalable, fast, and calibrated weak classifiers and captures noise with the support of a new entropy-based step. We also propose a novel iterative process to estimate near-optimum reduction rates for both steps. Our extended solution is able to reduce the training sets by 41% on average (up to 60%) while maintaining the effectiveness in all tested datasets, with speedup gains of 1.67x on average (up to 2.46x). No other baseline, not even our previous SOTA solution, was capable of achieving results with this level of quality, considering the tradeoff among training reduction, effectiveness, and speedup. To ensure reproducibility, our documentation, code, and datasets can be accessed on GitHub – https://github.com/waashk/bio-is.
Source: ACM TRANSACTIONS ON INFORMATION SYSTEMS
DOI: 10.1145/3705000
Project(s): Future Artificial Intelligence Research, Italian Strengthening of the ESFRI RI RESILIENCE, SoBigData.it


See at: dl.acm.org Open Access | CNR IRIS Open Access | ACM Transactions on Information Systems Restricted | CNR IRIS Restricted
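The entropy-based noise step mentioned in the abstract can be pictured with an illustrative fragment (this is not the biO-IS implementation): each training instance is scored by the entropy of a weak classifier's posterior, with high entropy flagging instances the classifier finds maximally ambiguous.

```python
import numpy as np

def posterior_entropy(posteriors):
    # Shannon entropy of each row of class-posterior probabilities;
    # clipping avoids log(0) for confident predictions
    p = np.clip(np.asarray(posteriors, dtype=float), 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=1)

posteriors = np.array([[0.98, 0.02],   # confidently classified -> keep
                       [0.50, 0.50]])  # maximally uncertain -> noise candidate

entropies = posterior_entropy(posteriors)
print(entropies)  # low for the confident row, log(2) for the uniform row
```

An instance-selection pipeline would then drop the highest-entropy instances up to some reduction rate, alongside a separate redundancy-removal step.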


2024 Journal article Open Access OPEN
Binary quantification and dataset shift: an experimental investigation
González P., Moreo Fernandez A. D., Sebastiani F.
Quantification is the supervised learning task that consists of training predictors of the class prevalence values of sets of unlabelled data, and is of special interest when the labelled data on which the predictor has been trained and the unlabelled data are not IID, i.e., suffer from dataset shift. To date, quantification methods have mostly been tested only on a special case of dataset shift, i.e., prior probability shift; the relationship between quantification and other types of dataset shift remains, by and large, unexplored. In this work we carry out an experimental analysis of how current quantification algorithms behave under different types of dataset shift, in order to identify limitations of current approaches and hopefully pave the way for the development of more broadly applicable methods. We do this by proposing a fine-grained taxonomy of types of dataset shift, by establishing protocols for the generation of datasets affected by these types of shift, and by testing existing quantification methods on the datasets thus generated. One finding that results from this investigation is that many existing quantification methods that had been found robust to prior probability shift are not necessarily robust to other types of dataset shift. A second finding is that no existing quantification method seems to be robust enough to deal with all the types of dataset shift we simulate in our experiments. The code needed to reproduce all our experiments is publicly available at https://github.com/pglez82/quant_datasetshift.
Source: DATA MINING AND KNOWLEDGE DISCOVERY, vol. 38 (issue 4), pp. 1670-1712
DOI: 10.1007/s10618-024-01014-1
DOI: 10.48550/arxiv.2310.04565
Project(s): AI4Media via OpenAIRE, Quantification in the Context of Dataset Shift, SoBigData RI PPP via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: arXiv.org e-Print Archive Open Access | Data Mining and Knowledge Discovery Open Access | CNR IRIS Open Access | arXiv.org e-Print Archive Restricted | doi.org Restricted | CNR IRIS Restricted


2024 Book Open Access OPEN
Proceedings of the 4th international workshop on Learning to Quantify (LQ 2024)
Bunse M., Gonzalez P., Moreo Fernandez A., Sebastiani F.
The 4th International Workshop on Learning to Quantify (LQ 2024 – https://lq-2024.github.io/) was held in Vilnius, LT, on September 13, 2024, as a satellite workshop of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2024). While the 1st edition of the workshop (LQ 2021 – https://cikmlq2021.github.io/) had to be an entirely online event, LQ 2024 (like the 2nd edition LQ 2022 – https://lq-2022.github.io/ and 3rd edition LQ 2023 – https://lq-2023.github.io/) was a hybrid event, with presentations given in-presence, and both in-presence attendees and remote attendees. The workshop was the second part (Sep 13 afternoon) of a full-day event, whose first part (Sep 13 morning) consisted of a tutorial on Learning to Quantify presented by Mirko Bunse and Alejandro Moreo. The LQ 2024 workshop consisted of the presentations of three contributed papers, plus a number of invited contributions about the LeQua 2024 challenge, i.e., an overview of the challenge presented by the organisers, plus five brief presentations by LeQua 2024 participants. The program ended with a final collective discussion on LeQua 2024, on the open problems of learning to quantify, and on future initiatives. The present volume contains the text of these nine contributions. We hope that the availability of the present volume will increase the interest in the subject of quantification on the part of researchers and practitioners alike, and will contribute to making quantification better known to potential users of this technology and to researchers interested in advancing the field.
Project(s): Quantification in the Context of Dataset Shift, SoBigData via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: CNR IRIS Open Access | lq-2024.github.io Open Access | CNR IRIS Restricted


2024 Conference article Open Access OPEN
An overview of LeQua 2024, the 2nd international data challenge on Learning to Quantify
Esuli A., Moreo Fernandez A., Sebastiani F., Sperduti G.
LeQua 2024 is a data challenge about methods and systems for “learning to quantify” (a.k.a. “quantification”, or “class prior estimation”), i.e., for training predictors of the relative frequencies of classes Y = {y1, ..., yn} in sets of unlabelled datapoints. While these predictions could be easily achieved by first classifying all datapoints via a classifier and then counting how many datapoints have been assigned to each class, a growing body of literature has shown this approach to be suboptimal, especially when the training data and the test data are affected by some form of dataset shift, and has proposed better methods. The goal of this data challenge is to provide a setting for the comparative evaluation of methods for learning to quantify. LeQua 2024 is the 2nd edition of the LeQua challenge, following the successful 1st edition of 2022. In LeQua 2024, four tasks were offered. The first three tasks (T1, T2, T3) tackle learning to quantify under prior probability shift, while the fourth task (T4) tackles learning to quantify under covariate shift; T1 and T4 are about binary quantification, T2 is about single-label multiclass quantification, while T3 is about ordinal quantification. For all such tasks, data are provided to participants in ready-made vector form. In this overview article we describe in detail the structure of the data challenge and the results obtained by the participating teams.
Project(s): Future Artificial Intelligence Research, Quantification in the Context of Dataset Shift, SoBigData-PlusPlus via OpenAIRE, SoBigData.it

See at: CNR IRIS Open Access | lq-2024.github.io Open Access | CNR IRIS Restricted


2023 Conference article Open Access OPEN
Ordinal quantification through regularization
Bunse M, Moreo A, Sebastiani F, Senz M
Quantification, i.e., the task of training predictors of the class prevalence values in sets of unlabelled data items, has received increased attention in recent years. However, most quantification research has concentrated on developing algorithms for binary and multiclass problems in which the classes are not ordered. We here study the ordinal case, i.e., the case in which a total order is defined on the set of n > 2 classes. We give three main contributions to this field. First, we create and make available two datasets for ordinal quantification (OQ) research that overcome the inadequacies of the previously available ones. Second, we experimentally compare the most important OQ algorithms proposed in the literature so far. To this end, we bring together algorithms that are proposed by authors from very different research fields, who were unaware of each other's developments. Third, we propose three OQ algorithms, based on the idea of preventing ordinally implausible estimates through regularization. Our experiments show that these algorithms outperform the existing ones if the ordinal plausibility assumption holds.
DOI: 10.1007/978-3-031-26419-1_3
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: CNR IRIS Open Access | link.springer.com Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2023 Book Open Access OPEN
Learning to Quantify
Esuli A, Fabris A, Moreo A, Sebastiani F
This open access book provides an introduction and an overview of learning to quantify (a.k.a. "quantification"), i.e. the task of training estimators of class proportions in unlabeled data by means of supervised learning. In data science, learning to quantify is a task of its own related to classification yet different from it, since estimating class proportions by simply classifying all data and counting the labels assigned by the classifier is known to often return inaccurate ("biased") class proportion estimates. The book introduces learning to quantify by looking at the supervised learning methods that can be used to perform it, at the evaluation measures and evaluation protocols that should be used for evaluating the quality of the returned predictions, at the numerous fields of human activity in which the use of quantification techniques may provide improved results with respect to the naive use of classification techniques, and at advanced topics in quantification research. The book is suitable for researchers, data scientists, or PhD students who want to come up to speed with the state of the art in learning to quantify, but also for researchers wishing to apply data science technologies to fields of human activity (e.g., the social sciences, political science, epidemiology, market research) which focus on aggregate ("macro") data rather than on individual ("micro") data.
DOI: 10.1007/978-3-031-20467-8
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: CNR IRIS Open Access | link.springer.com Open Access | ISTI Repository Open Access | CNR IRIS Restricted
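The bias of "classify and count" that motivates the book, and the standard "adjusted classify and count" correction, can be sketched as follows (the tpr/fpr values are made up for illustration):

```python
def classify_and_count(predictions):
    # naive estimate: fraction of items the classifier labels positive
    return sum(predictions) / len(predictions)

def adjusted_classify_and_count(predictions, tpr, fpr):
    # correct the CC estimate using the classifier's error rates, since
    # E[CC] = prevalence * tpr + (1 - prevalence) * fpr
    cc = classify_and_count(predictions)
    return (cc - fpr) / (tpr - fpr)

# A classifier with tpr=0.8, fpr=0.1, applied to a sample whose true positive
# prevalence is 0.5, labels about 0.5*0.8 + 0.5*0.1 = 0.45 of the items positive:
preds = [1] * 45 + [0] * 55
print(classify_and_count(preds))                     # 0.45 (biased)
print(adjusted_classify_and_count(preds, 0.8, 0.1))  # ≈ 0.5 (corrected)
```

This is only the simplest pair of methods; the book surveys many more sophisticated quantifiers and the protocols for evaluating them.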


2023 Journal article Open Access OPEN
Measuring fairness under unawareness of sensitive attributes: a quantification-based approach
Fabris A, Esuli A, Moreo A, Sebastiani F
Algorithms and models are increasingly deployed to inform decisions about people, inevitably affecting their lives. As a consequence, those in charge of developing these models must carefully evaluate their impact on different groups of people and favour group fairness, that is, ensure that groups determined by sensitive demographic attributes, such as race or sex, are not treated unjustly. To achieve this goal, the availability (awareness) of these demographic attributes to those evaluating the impact of these models is fundamental. Unfortunately, collecting and storing these attributes is often in conflict with industry practices and legislation on data minimisation and privacy. For this reason, it can be hard to measure the group fairness of trained models, even from within the companies developing them. In this work, we tackle the problem of measuring group fairness under unawareness of sensitive attributes, by using techniques from quantification, a supervised learning task concerned with directly providing group-level prevalence estimates (rather than individual-level class labels). We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem, as they are robust to inevitable distribution shifts while at the same time decoupling the (desirable) objective of measuring group fairness from the (undesirable) side effect of allowing the inference of sensitive attributes of individuals. More in detail, we show that fairness under unawareness can be cast as a quantification problem and solved with proven methods from the quantification literature. We show that these methods outperform previous approaches to measure demographic parity in five experimental protocols, corresponding to important challenges that complicate the estimation of classifier fairness under unawareness.
Source: JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, vol. 76, pp. 1117-1180
DOI: 10.1613/jair.1.14033
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: CNR IRIS Open Access | ISTI Repository Open Access | www.jair.org Open Access | CNR IRIS Restricted


2023 Journal article Open Access OPEN
Improved risk minimization algorithms for technology-assisted review
Molinari A, Esuli A, Sebastiani F
MINECORE is a recently proposed decision-theoretic algorithm for technology-assisted review that attempts to minimise the expected costs of review for responsiveness and privilege in e-discovery. In MINECORE, two probabilistic classifiers that classify documents by responsiveness and by privilege, respectively, generate posterior probabilities. These latter are fed to an algorithm that returns as output, after applying risk minimization, two ranked lists, which indicate exactly which documents the annotators should review for responsiveness and which documents they should review for privilege. In this paper we attempt to find out if the performance of MINECORE can be improved (a) by using, for the purpose of training the two classifiers, active learning (implemented either via relevance sampling, or via uncertainty sampling, or via a combination of them) instead of passive learning, and (b) by using the Saerens-Latinne-Decaestecker algorithm to improve the quality of the posterior probabilities that MINECORE receives as input. We address these two research questions by carrying out extensive experiments on the RCV1-v2 benchmark. We make publicly available the code and data for reproducing all our experiments.
Source: INTELLIGENT SYSTEMS WITH APPLICATIONS, vol. 18
DOI: 10.1016/j.iswa.2023.200209
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: Intelligent Systems with Applications Open Access | CNR IRIS Open Access | ISTI Repository Open Access | www.sciencedirect.com Open Access | CNR IRIS Restricted
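The Saerens-Latinne-Decaestecker algorithm mentioned in the abstract adjusts a classifier's posterior probabilities for a shift in class priors via expectation-maximization. A compact sketch (toy posteriors, not the paper's experimental setup):

```python
import numpy as np

def sld(posteriors, train_priors, n_iter=100):
    # EM loop: rescale posteriors by the ratio of the current test-prior
    # estimate to the training priors, renormalise, re-estimate the priors
    posteriors = np.asarray(posteriors, dtype=float)
    train_priors = np.asarray(train_priors, dtype=float)
    priors = train_priors.copy()
    for _ in range(n_iter):
        scaled = posteriors * (priors / train_priors)   # E-step
        scaled /= scaled.sum(axis=1, keepdims=True)
        priors = scaled.mean(axis=0)                    # M-step
    return scaled, priors

# posteriors from a classifier trained with balanced priors [0.5, 0.5]
posteriors = np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])
adjusted, est_priors = sld(posteriors, train_priors=[0.5, 0.5])
print(est_priors)  # estimated test priors, summing to 1
```

Feeding MINECORE these adjusted posteriors, rather than the raw ones, is option (b) investigated in the paper.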


2023 Journal article Open Access OPEN
Unravelling interlanguage facts via explainable machine learning
Berti B, Esuli A, Sebastiani F
Native language identification (NLI) is the task of training (via supervised machine learning) a classifier that guesses the native language of the author of a text. This task has been extensively researched in the last decade, and the performance of NLI systems has steadily improved over the years. We focus on a different facet of the NLI task, i.e. that of analysing the internals of an NLI classifier trained by an explainable machine learning (EML) algorithm, in order to obtain explanations of its classification decisions, with the ultimate goal of gaining insight into which linguistic phenomena 'give a speaker's native language away'. We use this perspective in order to tackle both NLI and a (much less researched) companion task, i.e. guessing whether a text has been written by a native or a non-native speaker. Using three datasets of different provenance (two datasets of English learners' essays and a dataset of social media posts), we investigate which kind of linguistic traits (lexical, morphological, syntactic, and statistical) are most effective for solving our two tasks, namely, are most indicative of a speaker's L1; our experiments indicate that the most discriminative features are the lexical ones, followed by the morphological, syntactic, and statistical features, in this order. We also present two case studies, one on Italian and one on Spanish learners of English, in which we analyse individual linguistic traits that the classifiers have singled out as most important for spotting these L1s; we show that the traits identified as most discriminative well align with our intuition, i.e. represent typical patterns of language misuse, underuse, or overuse, by speakers of the given L1. Overall, our study shows that the use of EML can be a valuable tool for the scholar who investigates interlanguage facts and language transfer.
Source: DIGITAL SCHOLARSHIP IN THE HUMANITIES
DOI: 10.1093/llc/fqad019
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: academic.oup.com Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2023 Journal article Open Access OPEN
Multi-label quantification
Moreo A, Francisco M, Sebastiani F
Quantification, variously called supervised prevalence estimation or learning to quantify, is the supervised learning task of generating predictors of the relative frequencies (a.k.a. prevalence values) of the classes of interest in unlabelled data samples. While many quantification methods have been proposed in the past for binary problems and, to a lesser extent, single-label multiclass problems, the multi-label setting (i.e., the scenario in which the classes of interest are not mutually exclusive) remains by and large unexplored. A straightforward solution to the multi-label quantification problem could simply consist of recasting the problem as a set of independent binary quantification problems. Such a solution is simple but naïve, since the independence assumption upon which it rests is, in most cases, not satisfied. In these cases, knowing the relative frequency of one class could be of help in determining the prevalence of other related classes. We propose the first truly multi-label quantification methods, i.e., methods for inferring estimators of class prevalence values that strive to leverage the stochastic dependencies among the classes of interest in order to predict their relative frequencies more accurately. We show empirical evidence that natively multi-label solutions outperform the naïve approaches by a large margin. The code to reproduce all our experiments is available online.
Source: ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (ONLINE), vol. 18 (issue 1)
DOI: 10.1145/3606264
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: dl.acm.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | ZENODO Open Access | CNR IRIS Restricted


2023 Journal article Open Access OPEN
Same or different? Diff-vectors for authorship analysis
Corbara S., Moreo Fernandez A. D., Sebastiani F.
In this paper we investigate the effects on authorship identification tasks (including authorship verification, closed-set authorship attribution, and closed-set and open-set same-author verification) of a fundamental shift in how to conceive the vectorial representations of documents that are given as input to a supervised learner. In "classic" authorship analysis a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document. We instead investigate the situation in which a feature vector represents an unordered pair of documents, the value of a feature represents the absolute difference in the relative frequencies (or increasing functions thereof) of the feature in the two documents, and the class label indicates whether the two documents are from the same author or not. This latter (learner-independent) type of representation has been occasionally used before, but has never been studied systematically. We argue that it is advantageous, and that in some cases (e.g., authorship verification) it provides a much larger quantity of information to the training process than the standard representation. The experiments that we carry out on several publicly available datasets (among which one that we here make available for the first time) show that feature vectors representing pairs of documents (that we here call Diff-Vectors) bring about systematic improvements in the effectiveness of authorship identification tasks, and especially so when training data are scarce (as is often the case in real-life authorship identification scenarios). Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared for solving the 1st, we also provide two novel methods for solving the 2nd and 3rd that use a solver for the 1st as a building block. The code to reproduce our experiments is open-source and available online.
Source: ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA (ONLINE)
DOI: 10.1145/3609226
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: dl.acm.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted
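The pair-based representation described in the abstract can be sketched directly (the feature values below are illustrative, not taken from the paper's datasets): a Diff-Vector encodes a pair of documents, and the label says whether they share an author.

```python
import numpy as np

def diff_vector(doc_a_freqs, doc_b_freqs):
    # absolute differences of the two documents' relative feature frequencies;
    # the pair is unordered, so the representation is symmetric by construction
    return np.abs(np.asarray(doc_a_freqs) - np.asarray(doc_b_freqs))

# relative frequencies of, say, three function words in two documents
doc1 = [0.031, 0.012, 0.007]
doc2 = [0.029, 0.020, 0.001]

dv = diff_vector(doc1, doc2)   # feature vector for the *pair* (doc1, doc2)
label = 1                      # 1 = same author, 0 = different authors
print(dv)
```

A binary classifier trained on such (diff-vector, label) pairs solves same-author verification; the paper builds its authorship-verification and attribution methods on top of that solver.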


2023 Other Open Access OPEN
AIMH Research Activities 2023
Aloia N., Amato G., Bartalesi Lenzi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Molinari A., Moreo Fernandez A., Nardi A., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application in digital media and humanities. The lab's objective is to enhance the current state of AI technology, particularly in deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report encapsulates the laboratory's progress and activities throughout the year 2023.
DOI: 10.32079/isti-ar-2023/001


See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Conference article Open Access OPEN
LeQua@CLEF2022: learning to quantify
Esuli A, Moreo A, Sebastiani F
LeQua 2022 is a new lab for the evaluation of methods for "learning to quantify" in textual datasets, i.e., for training predictors of the relative frequencies of the classes of interest in sets of unlabelled textual documents. While these predictions could be easily achieved by first classifying all documents via a text classifier and then counting the numbers of documents assigned to the classes, a growing body of literature has shown this approach to be suboptimal, and has proposed better methods. The goal of this lab is to provide a setting for the comparative evaluation of methods for learning to quantify, both in the binary setting and in the single-label multiclass setting. For each such setting we provide data either in ready-made vector form or in raw document form.
DOI: 10.1007/978-3-030-99739-7_47
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE


See at: CNR IRIS Open Access | link.springer.com Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Journal article Open Access OPEN
Report on the 1st International Workshop on Learning to Quantify (LQ 2021)
Del Coz J. J., González P., Moreo Fernandez A. D., Sebastiani F.
The 1st International Workshop on Learning to Quantify (LQ 2021 - https://cikmlq2021.github.io/), organized as a satellite event of the 30th ACM International Conference on Information and Knowledge Management (CIKM 2021), took place on two separate days, November 1 and 5, 2021. Like the main CIKM 2021 conference, the workshop was held entirely online, due to the COVID-19 pandemic. This report presents a summary of each keynote speech and contributed paper presented at this event, and discusses the issues that were raised during the workshop.
Source: SIGKDD EXPLORATIONS, vol. 24 (issue 1), pp. 49-51
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | CNR IRIS Restricted | kdd.org Restricted


2022 Journal article Open Access OPEN
Syllabic quantity patterns as rhythmic features for Latin authorship attribution
Corbara S, Moreo A, Sebastiani F
It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility to employ syllabic quantity as a base for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs), show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
Source: JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY
DOI: 10.1002/asi.24660


See at: asistdl.onlinelibrary.wiley.com Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Journal article Open Access OPEN
MedLatinEpi and MedLatinLit: two datasets for the computational authorship analysis of medieval Latin texts
Corbara S., Moreo Fernandez A., Sebastiani F., Tavoni M.
We present and make available MedLatinEpi and MedLatinLit, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatinEpi and MedLatinLit consist of 294 and 30 curated texts, respectively, labelled by author; MedLatinEpi texts are of epistolary nature, while MedLatinLit texts consist of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification. Along with the datasets we provide experimental results, obtained on these datasets, for the authorship verification task, i.e., the task of predicting whether a text of unknown authorship was written by a candidate author or not. We also make available the source code of the authorship verification system we have used, thus allowing our experiments to be reproduced, and to be used as baselines, by other researchers. We also describe the application of the above authorship verification system, using these datasets as training data, for investigating the authorship of two medieval epistles whose authorship has been disputed by scholars.
Source: ACM JOURNAL ON COMPUTING AND CULTURAL HERITAGE, vol. 3 (issue 15)
DOI: 10.1145/3485822


See at: dl.acm.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted | CNR IRIS Restricted