280 result(s)
2021 Journal article Open Access OPEN

A critical reassessment of the Saerens-Latinne-Decaestecker algorithm for posterior probability adjustment
Esuli A., Molinari A., Sebastiani F.
We critically re-examine the Saerens-Latinne-Decaestecker (SLD) algorithm, a well-known method for estimating class prior probabilities ("priors") and adjusting posterior probabilities ("posteriors") in scenarios characterized by distribution shift, i.e., difference in the distribution of the priors between the training and the unlabelled documents. Given a machine-learned classifier and a set of unlabelled documents for which the classifier has returned posterior probabilities and estimates of the prior probabilities, SLD updates them both in an iterative, mutually recursive way, with the goal of making both more accurate; this is of key importance in downstream tasks such as single-label multiclass classification and cost-sensitive text classification. Since its publication, SLD has become the standard algorithm for improving the quality of the posteriors in the presence of distribution shift, and SLD is still considered a top contender when we need to estimate the priors (a task that has become known as "quantification"). However, its real effectiveness in improving the quality of the posteriors has been questioned. We here present the results of systematic experiments conducted on a large, publicly available dataset, across multiple amounts of distribution shift and multiple learners. Our experiments show that SLD improves the quality of the posterior probabilities and of the estimates of the prior probabilities, but only when the number of classes in the classification scheme is very small and the classifier is calibrated. As the number of classes grows, or as we use non-calibrated classifiers, SLD converges more slowly (and often does not converge at all), performance degrades rapidly, and the impact of SLD on the quality of the prior estimates and of the posteriors becomes negative rather than positive.
Source: ACM Transactions on Information Systems 39 (2021). doi:10.1145/3433164
DOI: 10.1145/3433164
Project(s): AI4Media via OpenAIRE, ARIADNEplus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ZENODO Open Access | ACM Transactions on Information Systems Open Access | ACM Transactions on Information Systems Restricted | dl.acm.org Restricted | CNR ExploRA Restricted
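For readers who want the gist of the update rule the abstract above describes, here is a minimal Python sketch of the EM-style SLD loop (our own rendering, with a simplified initialization and stopping criterion; not the authors' code):

```python
import numpy as np

def sld(posteriors, train_priors, max_iter=1000, tol=1e-6):
    """EM-style SLD loop: rescale each posterior by the ratio of the
    current prior estimate to the training prior, renormalize per item,
    and re-estimate the priors as the mean of the updated posteriors."""
    priors = train_priors.copy()                        # initial estimate
    updated = posteriors
    for _ in range(max_iter):
        updated = posteriors * (priors / train_priors)  # reweight classes
        updated /= updated.sum(axis=1, keepdims=True)   # renormalize rows
        new_priors = updated.mean(axis=0)               # new prior estimate
        if np.abs(new_priors - priors).max() < tol:     # simple stop rule
            break
        priors = new_priors
    return priors, updated
```

The paper's findings map directly onto this sketch: with many classes, or with uncalibrated `posteriors`, the loop converges slowly (or not at all) and can make both estimates worse.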


2021 Journal article Open Access OPEN

Word-class embeddings for multiclass text classification
Moreo A., Esuli A., Sebastiani F.
Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis, to name a few. In supervised tasks such as multiclass text classification (the focus of this article) it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models in multiclass classification by topic. We show empirical evidence that WCEs yield a consistent improvement in multiclass classification accuracy, using six popular neural architectures and six widely used and publicly available datasets for multiclass text classification. One further advantage of this method is that it is conceptually simple and straightforward to implement. Our code that implements WCEs is publicly available at https://github.com/AlexMoreo/word-class-embeddings.
Source: Data Mining and Knowledge Discovery 35 (2021): 911–963. doi:10.1007/s10618-020-00735-3
DOI: 10.1007/s10618-020-00735-3
DOI: 10.5281/zenodo.4468312
DOI: 10.5281/zenodo.4468313
Project(s): AI4Media via OpenAIRE, ARIADNEplus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | ISTI Repository Open Access | link.springer.com Restricted | Data Mining and Knowledge Discovery Restricted | CNR ExploRA Restricted
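As a rough illustration of the idea (not the authors' implementation; see the repository linked above for that), a word-class embedding can be built from term-class co-occurrence statistics and concatenated to pre-trained embeddings:

```python
import numpy as np

def word_class_embeddings(X, Y):
    """Build one |C|-dimensional embedding per term from term-class
    co-occurrence counts (a simple normalized-count variant; the paper
    studies correlation-based formulations)."""
    # X: (n_docs, n_terms) term counts; Y: (n_docs, n_classes) binary labels
    counts = (X > 0).T.astype(float) @ Y      # (n_terms, n_classes)
    return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)

# The WCE matrix is then horizontally concatenated to the unsupervised
# embedding matrix: E = np.hstack([pretrained, word_class_embeddings(X, Y)])
```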


2021 Journal article Open Access OPEN

Lost in transduction: transductive transfer learning in text classification
Moreo A., Esuli A., Sebastiani F.
Obtaining high-quality labelled data for training a classifier in a new application domain is often costly. Transfer Learning (a.k.a. "Inductive Transfer") tries to alleviate these costs by transferring, to the "target" domain of interest, knowledge available from a different "source" domain. In transfer learning the lack of labelled information from the target domain is compensated by the availability at training time of a set of unlabelled examples from the target distribution. Transductive Transfer Learning denotes the transfer learning setting in which the only set of target documents that we are interested in classifying is known and available at training time. Although this definition is indeed in line with Vapnik's original definition of "transduction", current terminology in the field is confused. In this article, we discuss how the term "transduction" has been misused in the transfer learning literature, and propose a clarification consistent with the original characterization of this term given by Vapnik. We go on to observe that the above terminology misuse has brought about misleading experimental comparisons, with inductive transfer learning methods that have been incorrectly compared with transductive transfer learning methods. We then give empirical evidence that the difference in performance between the inductive version and the transductive version of a transfer learning method can indeed be statistically significant (i.e., that knowing at training time the only data one needs to classify indeed gives an advantage). Our clarification allows a reassessment of the field, and of the relative merits of the major, state-of-the-art algorithms for transfer learning in text classification.
Source: ACM Transactions on Knowledge Discovery from Data 16 (2021). doi:10.1145/3453146
DOI: 10.1145/3453146
Project(s): ARIADNEplus via OpenAIRE

See at: ISTI Repository Open Access | dl.acm.org Restricted | CNR ExploRA Restricted


2021 Conference article Open Access OPEN

Heterogeneous document embeddings for cross-lingual text classification
Moreo A., Pedrotti A., Sebastiani F.
Funnelling (Fun) is a method for cross-lingual text classification (CLC) based on a two-tier ensemble for heterogeneous transfer learning. In Fun, 1st-tier classifiers, each working on a different, language-dependent feature space, return a vector of calibrated posterior probabilities (with one dimension for each class) for each document, and the final classification decision is taken by a meta-classifier that uses this vector as its input. The meta-classifier can thus exploit class-class correlations, and this (among other things) gives Fun an edge over CLC systems where these correlations cannot be leveraged. We here describe Generalized Funnelling (gFun), a learning ensemble where the meta-classifier receives as input the above vector of calibrated posterior probabilities, concatenated with document embeddings (aligned across languages) that embody other types of correlations, such as word-class correlations (as encoded by Word-Class Embeddings) and word-word correlations (as encoded by Multilingual Unsupervised or Supervised Embeddings). We show, in experiments on two large, standard multilingual datasets for multi-label text classification, that gFun improves on Fun.
Source: SAC 2021: 36th ACM/SIGAPP Symposium On Applied Computing, pp. 685–688, Online conference, 22-26/03/2021
DOI: 10.1145/3412841.3442093
Project(s): AI4Media via OpenAIRE, ARIADNEplus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | ZENODO Open Access | dl.acm.org Restricted | CNR ExploRA Restricted


2021 Conference article Open Access OPEN

Re-assessing the "Classify and Count" quantification method
Moreo A., Sebastiani F.
Learning to quantify (a.k.a. quantification) is a task concerned with training unbiased estimators of class prevalence via supervised learning. This task originated with the observation that "Classify and Count" (CC), the trivial method of obtaining class prevalence estimates, is often a biased estimator, and thus delivers suboptimal quantification accuracy. Following this observation, several methods for learning to quantify have been proposed and have been shown to outperform CC. In this work we contend that previous works have failed to use properly optimised versions of CC. We thus reassess the real merits of CC and its variants, and argue that, while still inferior to some cutting-edge methods, they deliver near-state-of-the-art accuracy once (a) hyperparameter optimisation is performed, and (b) this optimisation is performed by using a truly quantification-oriented evaluation protocol. Experiments on three publicly available binary sentiment classification datasets support these conclusions.
Source: ECIR 2021 - 43rd European Conference on Information Retrieval, pp. 75–91, Online conference, 28/03-01/04/2021
DOI: 10.1007/978-3-030-72240-1_6
DOI: 10.5281/zenodo.4468277
DOI: 10.5281/zenodo.4468276
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | ISTI Repository Open Access | ZENODO Open Access | link.springer.com Restricted | CNR ExploRA Restricted
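The two simplest estimators under reassessment here are easy to state in code; a minimal sketch (our notation):

```python
import numpy as np

def classify_and_count(hard_labels, n_classes):
    """CC: classify every item and report the label proportions."""
    return np.bincount(hard_labels, minlength=n_classes) / len(hard_labels)

def probabilistic_classify_and_count(posteriors):
    """PCC, one of the CC variants: average the posterior probabilities
    instead of counting hard assignments."""
    return posteriors.mean(axis=0)
```

The paper's point is that even these estimators become competitive once the underlying classifier's hyperparameters are tuned under a quantification-oriented evaluation protocol rather than a classification-oriented one.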


2021 Contribution to conference Open Access OPEN

Advances in Information Retrieval. 43rd European Conference on IR Research, ECIR 2021. Proceedings
Hiemstra D., Moens M. F., Mothe J., Perego R., Potthast M., Sebastiani F.
This two-volume set LNCS 12656 and 12657 constitutes the refereed proceedings of the 43rd European Conference on IR Research, ECIR 2021, held virtually in March/April 2021, due to the COVID-19 pandemic. The 50 full papers presented together with 11 reproducibility papers, 39 short papers, 15 demonstration papers, 12 CLEF lab descriptions papers, 5 doctoral consortium papers, 5 workshop abstracts, and 8 tutorials abstracts were carefully reviewed and selected from 436 submissions. The accepted contributions cover the state of the art in IR: deep learning-based information retrieval techniques, use of entities and knowledge graphs, recommender systems, retrieval methods, information extraction, question answering, topic and prediction models, multimedia retrieval, and much more.
DOI: 10.1007/978-3-030-72240-1

See at: ISTI Repository Open Access | CNR ExploRA Open Access


2021 Contribution to journal Open Access OPEN

Report on the 43rd European Conference on Information Retrieval (ECIR 2021)
Perego R., Sebastiani F.
Source: SIGIR forum 55 (2021).

See at: ISTI Repository Open Access | CNR ExploRA Open Access | sigir.org Open Access


2020 Journal article Open Access OPEN

Evaluation measures for quantification: an axiomatic approach
Sebastiani F.
Quantification is the task of estimating, given a set D of unlabelled items and a set of classes C = {c1, ..., c|C|}, the prevalence (or "relative frequency") in D of each class ci ∈ C. While quantification may in principle be solved by classifying each item in D and counting how many such items have been labelled with ci, it has long been shown that this "classify and count" method yields suboptimal quantification accuracy. As a result, quantification is no longer considered a mere byproduct of classification, and has evolved as a task of its own. While the scientific community has devoted a lot of attention to devising more accurate quantification methods, it has not devoted much to discussing what properties an evaluation measure for quantification (EMQ) should enjoy, and which EMQs should be adopted as a result. This paper lays down a number of interesting properties that an EMQ may or may not enjoy, discusses if (and when) each of these properties is desirable, surveys the EMQs that have been used so far, and discusses whether or not they enjoy the above properties. As a result of this investigation, some of the EMQs that have been used in the literature turn out to be severely unfit, while others emerge as closer to what the quantification community actually needs. However, a significant result is that no existing EMQ satisfies all the properties identified as desirable, thus indicating that more research is needed in order to identify (or synthesize) a truly adequate EMQ.
Source: Information Retrieval 23 (2020): 255–288. doi:10.1007/s10791-019-09363-y
DOI: 10.1007/s10791-019-09363-y

See at: arXiv.org e-Print Archive Open Access | Information Retrieval Open Access | ISTI Repository Open Access | Information Retrieval Restricted | CNR ExploRA Restricted
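Two of the evaluation measures the paper examines, absolute error and Kullback-Leibler divergence, in a minimal sketch (the smoothing constant is our choice):

```python
import numpy as np

def absolute_error(p_true, p_hat):
    """Mean absolute difference between true and estimated prevalences."""
    return np.abs(p_true - p_hat).mean()

def kld(p_true, p_hat, eps=1e-12):
    """KL divergence of the estimated prevalence distribution from the
    true one; both are smoothed to keep the logarithm finite."""
    p, q = p_true + eps, p_hat + eps
    return float(np.sum(p * np.log(p / q)))

# e.g. absolute_error(np.array([0.7, 0.3]), np.array([0.6, 0.4])) -> 0.1
```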


2020 Journal article Open Access OPEN

Cross-Lingual Sentiment Quantification
Esuli A., Moreo A., Sebastiani F.
Sentiment Quantification is the task of estimating the relative frequency of sentiment-related classes (such as Positive and Negative) in a set of unlabeled documents. It is an important topic in sentiment analysis, as the study of sentiment-related quantities and trends across a population is often of higher interest than the analysis of individual instances. In this article, we propose a method for cross-lingual sentiment quantification, the task of performing sentiment quantification when training documents are available for a source language S but not for the target language T, for which sentiment quantification needs to be performed. Cross-lingual sentiment quantification (and cross-lingual text quantification in general) has never been discussed before in the literature; we establish baseline results for the binary case by combining state-of-the-art quantification methods with methods capable of generating cross-lingual vectorial representations of the source and target documents involved. Experiments on publicly available datasets for cross-lingual sentiment classification show that the presented method performs cross-lingual sentiment quantification with high accuracy.
Source: IEEE Intelligent Systems 35 (2020): 106–113. doi:10.1109/MIS.2020.2979203
DOI: 10.1109/MIS.2020.2979203
Project(s): SoBigData-PlusPlus via OpenAIRE

See at: IEEE Intelligent Systems Open Access | ISTI Repository Open Access | IEEE Intelligent Systems Restricted | ieeexplore.ieee.org Restricted | CNR ExploRA Restricted


2020 Report Open Access OPEN

Tweet Sentiment Quantification: An Experimental Re-Evaluation
Moreo A., Sebastiani F.
Sentiment quantification is the task of estimating the relative frequency (or "prevalence") of sentiment-related classes (such as Positive, Neutral, Negative) in a sample of unlabelled texts; this is especially important when these texts are tweets, since most sentiment classification endeavours carried out on Twitter data actually have quantification (and not the classification of individual tweets) as their ultimate goal. It is well known that solving quantification via "classify and count" (i.e., by classifying all unlabelled items via a standard classifier and counting the items that have been assigned to a given class) is suboptimal in terms of accuracy, and that more accurate quantification methods exist. In 2016, Gao and Sebastiani carried out a systematic comparison of quantification methods on the task of tweet sentiment quantification. In hindsight, we observe that the experimental protocol followed in that work is flawed, and that its results are thus unreliable. We now re-evaluate those quantification methods on the very same datasets, this time following a now consolidated and much more robust experimental protocol that involves 5775 times as many experiments as were run in the original study. Our experimentation yields results dramatically different from those obtained by Gao and Sebastiani, and thus provides a different, much more solid understanding of the relative strengths and weaknesses of different sentiment quantification methods.
Source: Research report, SoBigData++ and AI4Media, 2020
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access
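The abstract does not spell the protocol out, but robust quantification evaluations of this kind typically draw many test samples at controlled prevalence values; a sketch of that sampling step, under that assumption:

```python
import numpy as np

def sample_at_prevalence(pos_idx, neg_idx, prevalence, size, rng):
    """Draw a test sample whose positive-class prevalence is fixed by the
    protocol rather than by the dataset (artificial-prevalence-style
    sampling; our assumption about the protocol the abstract refers to)."""
    n_pos = int(round(prevalence * size))
    pos = rng.choice(pos_idx, n_pos, replace=True)
    neg = rng.choice(neg_idx, size - n_pos, replace=True)
    return np.concatenate([pos, neg])

rng = np.random.default_rng(0)
# quantifiers are then scored across the whole prevalence spectrum, e.g.:
# for p in np.linspace(0.0, 1.0, 21): sample_at_prevalence(pos, neg, p, 100, rng)
```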


2020 Report Open Access OPEN

Re-Assessing the" Classify and Count" Quantification Method
Moreo A., Sebastiani F.
Learning to quantify (a.k.a. quantification) is a task concerned with training unbiased estimators of class prevalence via supervised learning. This task originated with the observation that "Classify and Count" (CC), the trivial method of obtaining class prevalence estimates, is often a biased estimator, and thus delivers suboptimal quantification accuracy; following this observation, several methods for learning to quantify have been proposed that have been shown to outperform CC. In this work we contend that previous works have failed to use properly optimised versions of CC. We thus reassess the real merits of CC (and its variants), and argue that, while still inferior to some cutting-edge methods, they deliver near-state-of-the-art accuracy once (a) hyperparameter optimisation is performed, and (b) this optimisation is performed by using a true quantification loss instead of a standard classification-based loss. Experiments on three publicly available binary sentiment classification datasets support these conclusions.
Source: Research report, SoBigData++ and AI4Media, 2020
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access


2020 Report Open Access OPEN

MedLatin1 and MedLatin2: Two Datasets for the Computational Authorship Analysis of Medieval Latin Texts
Corbara S., Moreo A., Sebastiani F., Tavoni M.
We present and make available MedLatin1 and MedLatin2, two datasets of medieval Latin texts to be used in research on computational authorship analysis. MedLatin1 and MedLatin2 consist of 294 and 30 curated texts, respectively, labelled by author, with MedLatin1 texts being of an epistolary nature and MedLatin2 texts consisting of literary comments and treatises about various subjects. As such, these two datasets lend themselves to supporting research in authorship analysis tasks, such as authorship attribution, authorship verification, or same-author verification.
Source: Research report, 2020

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access


2020 Contribution to conference Open Access OPEN

Evaluation Measures for Quantification: An Axiomatic Approach
Sebastiani F.
Source: 42nd European Conference on Information Retrieval, pp. 862–862, Lisbon, PT, 14-17/04/2020
DOI: 10.1007/978-3-030-45439-5

See at: link.springer.com Open Access | ISTI Repository Open Access | CNR ExploRA Open Access


2019 Journal article Open Access OPEN

Jointly minimizing the expected costs of review for responsiveness and privilege in e-discovery
Oard D. W., Sebastiani F., Vinjumur J. K.
Discovery is an important aspect of the civil litigation process in the United States of America, in which all parties to a lawsuit are permitted to request relevant evidence from other parties. With the rapid growth of digital content, the emerging need for "e-discovery" has created a strong demand for techniques that can be used to review massive collections both for "responsiveness" (i.e., relevance) to the request and for "privilege" (i.e., presence of legally protected content that the party performing the review may have a right to withhold). In this process, the party performing the review may incur costs of two types, namely, annotation costs (deriving from the fact that human reviewers need to be paid for their work) and misclassification costs (deriving from the fact that failing to correctly determine the responsiveness or privilege of a document may adversely affect the interests of the parties in various ways). Relying exclusively on automatic classification would minimize annotation costs but could result in substantial misclassification costs, while relying exclusively on manual classification could generate the opposite consequences. This article proposes a risk minimization framework (called MINECORE, for "minimizing the expected costs of review") that seeks to strike an optimal balance between these two extreme stands. In MINECORE (a) the documents are first automatically classified for both responsiveness and privilege, and then (b) some of the automatically classified documents are annotated by human reviewers for responsiveness (typically by junior reviewers) and/or, in cascade, for privilege (typically by senior reviewers), with the overall goal of minimizing the expected cost (i.e., the risk) of the entire process. Risk minimization is achieved by optimizing, for both responsiveness and privilege, the choice of which documents to manually review. We present a simulation study in which classes from a standard text classification test collection (RCV1-v2) are used as surrogates for responsiveness and privilege. The results indicate that MINECORE can yield substantially lower total cost than any of a set of strong baselines.
Source: ACM Transactions on Information Systems 37 (2019). doi:10.1145/3268928
DOI: 10.1145/3268928

See at: ACM Transactions on Information Systems Open Access | ISTI Repository Open Access | ACM Transactions on Information Systems Restricted | dl.acm.org Restricted | CNR ExploRA Restricted
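The cost trade-off at the heart of the framework can be illustrated with a single-dimension, binary sketch (a simplification of ours; MINECORE itself optimizes responsiveness and privilege jointly, in cascade):

```python
def expected_cost_of_auto(p_pos, cost_fn, cost_fp):
    """Expected misclassification cost of accepting the automatic label:
    labelling positive risks a false positive, labelling negative a false
    negative; the system picks whichever label is cheaper in expectation."""
    return min((1.0 - p_pos) * cost_fp, p_pos * cost_fn)

def route_to_reviewer(p_pos, cost_fn, cost_fp, annotation_cost):
    """Hand the document to a human only when the expected cost of the
    automatic decision exceeds the cost of an (assumed error-free)
    manual annotation."""
    return expected_cost_of_auto(p_pos, cost_fn, cost_fp) > annotation_cost
```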


2019 Report Open Access OPEN

Funnelling: a new ensemble method for heterogeneous transfer learning and its application to polylingual text classification
Esuli A., Moreo Fernandez A. D., Sebastiani F.
Polylingual Text Classification (PLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when naively classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle multilabel PLC via funnelling, a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system where all documents, irrespectively of language, are classified by the same (2nd-tier) classifier. For this classifier all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by 1st-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language. We present substantial experiments, run on publicly available polylingual text collections, in which funnelling is shown to significantly outperform a number of state-of-the-art baselines. All code and datasets (in vector form) are made publicly available.
Source: Research report, 2019

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access


2019 Report Open Access OPEN

Building automated survey coders via interactive machine learning
Esuli A., Moreo A. D., Sebastiani F.
Software systems trained via machine learning to automatically classify open-ended answers (a.k.a. verbatims) are by now a reality. Still, their adoption in the survey coding industry has been less widespread than it might have been. Among the factors that have hindered a more massive takeup of this technology are the effort involved in manually coding a sufficient amount of training data, the fact that small studies do not seem to justify this effort, and the fact that the process needs to be repeated anew when brand new coding tasks arise. In this paper we will argue for an approach to building verbatim classifiers that we will call "Interactive Learning", and that addresses all the above problems. We will show that, for the same amount of training effort, interactive learning delivers much better coding accuracy than standard "non-interactive" learning. This is especially true when the amount of data we are willing to manually code is small, which makes this approach attractive also for small-scale studies. Interactive learning also lends itself to reusing previously trained classifiers for dealing with new (albeit related) coding tasks. Interactive learning also integrates better in the daily workflow of the survey specialist, and delivers a better user experience overall.
Source: Research report, 2019

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access


2019 Report Open Access OPEN

Learning to weight for text classification
Moreo Fernández A. D., Esuli A., Sebastiani F.
In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although "supervised term weighting" approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach Learning to Weight (LTW). The experiments that we run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
Source: arXiv:1903.12090 [cs.LG], 2019

See at: arxiv.org Open Access | ISTI Repository Open Access | CNR ExploRA Open Access
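For context, a classic member of the supervised-term-weighting family the abstract refers to is tf-rf (relevance frequency); a minimal sketch (illustrating the family, not the learned LTW function itself):

```python
import numpy as np

def tf_rf(tf, tp, fp):
    """Lan et al.'s tf-rf: scale term frequency by a factor that grows
    with how much more often the term occurs in positive than in
    negative training documents (tp, fp are document counts)."""
    return tf * np.log2(2.0 + tp / np.maximum(fp, 1.0))
```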


2019 Journal article Open Access OPEN

Building automated survey coders via interactive machine learning
Esuli A., Moreo Fernandez A. D., Sebastiani F.
Software systems trained via machine learning to automatically classify open-ended answers (a.k.a. verbatims) are by now a reality. Still, their adoption in the survey coding industry has been less widespread than it might have been. Among the factors that have hindered a more massive takeup of this technology are the effort involved in manually coding a sufficient amount of training data, the fact that small studies do not seem to justify this effort, and the fact that the process needs to be repeated anew when brand new coding tasks arise. In this article, we will argue for an approach to building verbatim classifiers that we will call 'Interactive Learning,' and that addresses all the above problems. We will show that, for the same amount of training effort, interactive learning delivers much better coding accuracy than standard "non-interactive" learning. This is especially true when the amount of data we are willing to manually code is small, which makes this approach attractive also for small-scale studies. Interactive learning also lends itself to reusing previously trained classifiers for dealing with new (albeit related) coding tasks. Interactive learning also integrates better in the daily workflow of the survey specialist and delivers a better user experience overall.
Source: International Journal of Market Research 61 (2019): 1–22. doi:10.1177/1470785318824244
DOI: 10.1177/1470785318824244

See at: arXiv.org e-Print Archive Open Access | International Journal of Market Research Open Access | ISTI Repository Open Access | International Journal of Market Research Restricted | journals.sagepub.com Restricted | CNR ExploRA Restricted
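The abstract does not detail the training loop; one plausible human-in-the-loop round, sketched under the assumption that the system asks for the verbatims the current classifier is least certain about:

```python
import numpy as np

def next_batch_to_code(clf, X_pool, batch_size):
    """Select the uncoded verbatims the current classifier is least
    confident about, to be hand-coded in the next interactive round
    (uncertainty sampling; our assumption, not the paper's exact policy)."""
    proba = clf.predict_proba(X_pool)
    uncertainty = 1.0 - proba.max(axis=1)   # low top-class confidence
    return np.argsort(-uncertainty)[:batch_size]
```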


2019 Journal article Open Access OPEN

Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification
Esuli A., Moreo Fernandez A. D., Sebastiani F.
Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when "naïvely" classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle "multilabel" CLC via funnelling, a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system where all documents, irrespectively of language, are classified by the same (2nd-tier) classifier. For this classifier all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by 1st-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language. We present substantial experiments, run on publicly available multilingual text collections, in which funnelling is shown to significantly outperform a number of state-of-the-art baselines. All code and datasets (in vector form) are made publicly available.
Source: ACM Transactions on Information Systems 37 (2019): 1–30. doi:10.1145/3326065
DOI: 10.1145/3326065
Project(s): ARIADNEplus via OpenAIRE

See at: ISTI Repository Open Access | ACM Transactions on Information Systems Restricted | dl.acm.org Restricted | CNR ExploRA Restricted
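A minimal monolabel sketch of the two-tier architecture (scikit-learn based, our simplification; the paper addresses the multilabel case, and this sketch assumes every language's training set covers the same class set):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

def train_funnelling(X_by_lang, y_by_lang):
    """1st tier: one calibrated, language-specific classifier per language.
    2nd tier: a single meta-classifier trained on the calibrated posterior
    probabilities of all documents, irrespective of language."""
    first_tier, Z, z_y = {}, [], []
    for lang, X in X_by_lang.items():
        clf = CalibratedClassifierCV(LinearSVC()).fit(X, y_by_lang[lang])
        first_tier[lang] = clf
        Z.append(clf.predict_proba(X))   # language-independent features
        z_y.append(y_by_lang[lang])
    meta = LogisticRegression().fit(np.vstack(Z), np.concatenate(z_y))
    return first_tier, meta

def predict_funnelling(first_tier, meta, lang, X):
    """Route a document through its language's 1st-tier classifier, then
    classify the resulting posterior vector with the shared 2nd tier."""
    return meta.predict(first_tier[lang].predict_proba(X))
```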


2019 Conference article Open Access OPEN

The Epistle to Cangrande Through the Lens of Computational Authorship Verification
Corbara S., Moreo A., Sebastiani F., Tavoni M.
The Epistle to Cangrande is one of the most controversial among the works of Italian poet Dante Alighieri. For more than a hundred years now, scholars have been debating over its real paternity, i.e., whether it should be considered a true work by Dante or a forgery by an unnamed author. In this work we address this philological problem through the methodologies of (supervised) Computational Authorship Verification, by training a classifier that predicts whether a given work is by Dante Alighieri or not. We discuss the system we have set up for this endeavour, the training set we have assembled, the experimental results we have obtained, and some issues that this work leaves open.
Source: International Conference on Image Analysis and Processing, pp. 148–158, Trento, Italy, 9-13 September 2019
DOI: 10.1007/978-3-030-30754-7_15

See at: ISTI Repository Open Access | academic.microsoft.com Restricted | ceur-ws.org Restricted | dblp.uni-trier.de Restricted | link.springer.com Restricted | CNR ExploRA Restricted