11 result(s)
2020 Report Open Access OPEN
AIMH research activities 2020
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Carrara F., Casarosa V., Ciampi L., Concordia C., Corbara S., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Thanos C., Trupiano L., Vadicamo L., Vairo C.
Annual Report of the Artificial Intelligence for Media and Humanities laboratory (AIMH) research activities in 2020.
Source: ISTI Annual Report, ISTI-2020-AR/001, 2020
DOI: 10.32079/isti-ar-2020/001

See at: ISTI Repository Open Access | CNR ExploRA


2023 Doctoral thesis Open Access OPEN
Posterior probabilities, active learning, and transfer learning in technology-assisted review
Molinari A.
Technology-Assisted Review (TAR) refers to the human-in-the-loop machine learning process whose goal is to maximize the cost-effectiveness of a review (i.e., the task of labeling items to satisfy an information need). This thesis explores and thoroughly analyzes: the applicability of the SLD algorithm to TAR scenarios; the use of active learning combined with the MINECORE framework, effectively improving the framework's performance; and the portability of machine/deep learning models for the production of systematic reviews in empirical medicine. Finally, the thesis proposes a new SLD-based algorithm, called SALt, which improves class prevalence estimates in active learning scenarios with respect to the current state of the art.

See at: etd.adm.unipi.it Open Access | ISTI Repository Open Access | CNR ExploRA


2023 Journal article Open Access OPEN
SALt: efficiently stopping TAR by improving priors estimates
Molinari A., Esuli A.
In high-recall retrieval tasks, human experts review a large pool of documents with the goal of satisfying an information need. Documents are prioritized for review through an active learning policy, and the process is usually referred to as Technology-Assisted Review (TAR). TAR tasks also aim to stop the review process once the target recall is achieved, so as to minimize the annotation cost. In this paper, we introduce a new stopping rule called SALt (SLD for Active Learning), a modified version of the Saerens-Latinne-Decaestecker (SLD) algorithm that has been adapted for use in active learning. Experiments show that our algorithm stops the review well ahead of the current state-of-the-art methods, while providing the same guarantees of achieving the target recall. (A brief illustrative sketch of a posterior-based stopping rule follows this record.)
Source: Data mining and knowledge discovery (Dordrecht. Online) (2023). doi:10.1007/s10618-023-00961-5
DOI: 10.1007/s10618-023-00961-5

See at: link.springer.com Open Access | ISTI Repository Open Access | CNR ExploRA
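
A minimal Python sketch of the general idea of a posterior-based stopping rule, under the assumption that (possibly SLD-adjusted) posterior probabilities over the unreviewed pool are available; the function name, the estimator, and the default target recall are illustrative assumptions, not details taken from the paper:

def should_stop(n_relevant_found, pool_posteriors, target_recall=0.8):
    """Generic posterior-based stopping heuristic (not the exact SALt rule):
    estimate how many relevant documents remain in the unreviewed pool from
    the posterior probabilities, then stop once the target recall is expected
    to have been reached."""
    expected_remaining = sum(pool_posteriors)          # E[# relevant left in the pool]
    expected_total = n_relevant_found + expected_remaining
    estimated_recall = n_relevant_found / max(expected_total, 1e-9)
    return estimated_recall >= target_recall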


2021 Report Open Access OPEN
AIMH research activities 2021
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Coccomini D., Ciampi L., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Massoli F. V., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, while also taking into account issues related to scalability. This report summarizes the 2021 activities of the research group.
Source: ISTI Annual Report, ISTI-2021-AR/003, pp. 1–34, 2021
DOI: 10.32079/isti-ar-2021/003

See at: ISTI Repository Open Access | CNR ExploRA


2022 Report Open Access OPEN
AIMH research activities 2022
Aloia N., Amato G., Bartalesi V., Benedetti F., Bolettieri P., Cafarelli D., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., Di Benedetto M., Esuli A., Falchi F., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Metilli D., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C.
The Artificial Intelligence for Media and Humanities laboratory (AIMH) has the mission to investigate and advance the state of the art in the Artificial Intelligence field, specifically addressing applications to digital media and digital humanities, while also taking into account issues related to scalability. This report summarizes the 2022 activities of the research group.
Source: ISTI Annual reports, 2022
DOI: 10.32079/isti-ar-2022/002

See at: ISTI Repository Open Access | CNR ExploRA


2022 Journal article Open Access OPEN
Transferring knowledge between topics in systematic reviews
Molinari A., Kanoulas E.
In the medical domain, a systematic review (SR) is a well-structured process aimed at reviewing all available literature on a research question. It is, however, a laborious task, in terms of both money and time. As such, the automation of SRs with the aid of technology has received interest in several research communities, among them the Information Retrieval community. In this work, we experiment with the possibility of leveraging previously conducted systematic reviews to train a classifier/ranker which is later applied to a new SR. We also investigate the possibility of pre-training Deep Learning models and eventually tuning them in an Active Learning process. Our results show that pre-training these models delivers a good zero-shot (i.e., with no fine-tuning) ranking, achieving an improvement of 79% in the MAP metric with respect to a standard classifier trained on a few in-domain documents. However, the pre-trained deep learning algorithms fail to deliver consistent results when continuously trained in an Active Learning scenario: our analysis shows that using smaller models and employing adapter modules might enable effective active learning training. (A brief illustrative sketch of the MAP metric follows this record.)
Source: Intelligent systems with applications 16 (2022). doi:10.1016/j.iswa.2022.200150
DOI: 10.1016/j.iswa.2022.200150

See at: Intelligent Systems with Applications Open Access | ISTI Repository Open Access | www.sciencedirect.com Open Access | CNR ExploRA
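
Mean Average Precision (MAP), the evaluation measure cited above, can be computed as in the following minimal Python sketch; the function names are chosen for the example rather than taken from the paper:

def average_precision(ranked_relevance):
    """Average precision of one ranked list; ranked_relevance holds 0/1
    relevance judgements in ranking order."""
    hits, ap = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            ap += hits / rank
    # Dividing by hits assumes the whole pool is ranked, so every relevant
    # document appears somewhere in the list (as in TAR settings).
    return ap / max(hits, 1)

def mean_average_precision(topics):
    """MAP over several topics (here: several systematic reviews)."""
    return sum(average_precision(t) for t in topics) / len(topics)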


2023 Journal article Open Access OPEN
Improved risk minimization algorithms for technology-assisted review
Molinari A., Esuli A., Sebastiani F.
MINECORE is a recently proposed decision-theoretic algorithm for technology-assisted review that attempts to minimise the expected costs of review for responsiveness and privilege in e-discovery. In MINECORE, two probabilistic classifiers, which classify documents by responsiveness and by privilege respectively, generate posterior probabilities. The latter are fed to an algorithm that, after applying risk minimization, returns as output two ranked lists, which indicate exactly which documents the annotators should review for responsiveness and which documents they should review for privilege. In this paper we attempt to find out whether the performance of MINECORE can be improved (a) by using, for the purpose of training the two classifiers, active learning (implemented via relevance sampling, via uncertainty sampling, or via a combination of them) instead of passive learning, and (b) by using the Saerens-Latinne-Decaestecker algorithm to improve the quality of the posterior probabilities that MINECORE receives as input. We address these two research questions by carrying out extensive experiments on the RCV1-v2 benchmark. We make publicly available the code and data for reproducing all our experiments. (A brief illustrative sketch of the expected-cost idea follows this record.)
Source: Intelligent systems with applications 18 (2023). doi:10.1016/j.iswa.2023.200209
DOI: 10.1016/j.iswa.2023.200209
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: Intelligent Systems with Applications Open Access | ISTI Repository Open Access | www.sciencedirect.com Open Access | CNR ExploRA
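
The risk-minimization idea underlying MINECORE can be illustrated, in a heavily simplified single-dimension form, as reviewing manually only those documents for which trusting the classifier is expected to cost more than the annotation itself. The following Python sketch is an assumption-laden illustration (the names, the binary setting, and the cost model are ours), not the actual MINECORE algorithm, which handles responsiveness and privilege jointly:

def expected_misclassification_cost(p_pos, cost_fp, cost_fn):
    """Expected cost of accepting the classifier's decision for one document:
    pick the label (positive or negative) that minimizes expected cost."""
    cost_if_labelled_pos = (1 - p_pos) * cost_fp   # risk of a false positive
    cost_if_labelled_neg = p_pos * cost_fn         # risk of a false negative
    return min(cost_if_labelled_pos, cost_if_labelled_neg)

def docs_worth_reviewing(posteriors, cost_fp, cost_fn, review_cost):
    """Manually review only the documents whose expected misclassification
    cost exceeds the cost of one manual annotation."""
    return [i for i, p in enumerate(posteriors)
            if expected_misclassification_cost(p, cost_fp, cost_fn) > review_cost]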


2023 Report Open Access OPEN
AIMH Research Activities 2023
Aloia N., Amato G., Bartalesi V., Bianchi L., Bolettieri P., Bosio C., Carraglia M., Carrara F., Casarosa V., Ciampi L., Coccomini D. A., Concordia C., Corbara S., De Martino C., Di Benedetto M., Esuli A., Falchi F., Fazzari E., Gennaro C., Lagani G., Lenzi E., Meghini C., Messina N., Molinari A., Moreo A., Nardi A., Pedrotti A., Pratelli N., Puccetti G., Rabitti F., Savino P., Sebastiani F., Sperduti G., Thanos C., Trupiano L., Vadicamo L., Vairo C., Versienti L.
The AIMH (Artificial Intelligence for Media and Humanities) laboratory is dedicated to exploring and pushing the boundaries in the field of Artificial Intelligence, with a particular focus on its application to digital media and humanities. The lab's objective is to enhance the current state of AI technology, particularly in deep learning, text analysis, computer vision, multimedia information retrieval, multimedia content analysis, recognition, and retrieval. This report encapsulates the laboratory's progress and activities throughout the year 2023.
Source: ISTI Annual Reports, 2023
DOI: 10.32079/isti-ar-2023/001

See at: ISTI Repository Open Access | CNR ExploRA


2020 Journal article Open Access OPEN
A critical reassessment of the Saerens-Latinne-Decaestecker algorithm for posterior probability adjustment
Esuli A., Molinari A., Sebastiani F.
We critically re-examine the Saerens-Latinne-Decaestecker (SLD) algorithm, a well-known method for estimating class prior probabilities ("priors") and adjusting posterior probabilities ("posteriors") in scenarios characterized by distribution shift, i.e., a difference between the distribution of the priors in the training documents and in the unlabelled documents. Given a machine-learned classifier and a set of unlabelled documents for which the classifier has returned posterior probabilities and estimates of the prior probabilities, SLD updates both in an iterative, mutually recursive way, with the goal of making both more accurate; this is of key importance in downstream tasks such as single-label multiclass classification and cost-sensitive text classification. Since its publication, SLD has become the standard algorithm for improving the quality of the posteriors in the presence of distribution shift, and it is still considered a top contender when we need to estimate the priors (a task that has become known as "quantification"). However, its real effectiveness in improving the quality of the posteriors has been questioned. We here present the results of systematic experiments conducted on a large, publicly available dataset, across multiple amounts of distribution shift and multiple learners. Our experiments show that SLD improves the quality of the posterior probabilities and of the estimates of the prior probabilities, but only when the number of classes in the classification scheme is very small and the classifier is calibrated. As the number of classes grows, or as we use non-calibrated classifiers, SLD converges more slowly (and often does not converge at all), performance degrades rapidly, and the impact of SLD on the quality of the prior estimates and of the posteriors becomes negative rather than positive. (A brief illustrative sketch of the SLD update follows this record.)
Source: ACM transactions on information systems 39 (2020). doi:10.1145/3433164
DOI: 10.1145/3433164
Project(s): AI4Media via OpenAIRE, ARIADNEplus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | ZENODO Open Access | ACM Transactions on Information Systems Open Access | dl.acm.org Restricted | ACM Transactions on Information Systems Restricted | CNR ExploRA
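
A minimal NumPy sketch of the mutually recursive prior/posterior update described above, in the spirit of the SLD/EM scheme; the function name, convergence criterion, and iteration cap are illustrative assumptions rather than details taken from the paper:

import numpy as np

def sld_adjust(posteriors, train_priors, max_iter=1000, tol=1e-6):
    """Iteratively re-estimate class priors and adjust posteriors, EM-style.
    posteriors: (n_docs, n_classes) matrix returned by a (calibrated) classifier.
    train_priors: (n_classes,) class prevalences observed in the training set."""
    priors = train_priors.copy()
    adjusted = posteriors.copy()
    for _ in range(max_iter):
        # E-step: rescale each posterior by the ratio of current to training priors
        scaled = posteriors * (priors / train_priors)
        adjusted = scaled / scaled.sum(axis=1, keepdims=True)
        # M-step: new prior estimates are the means of the adjusted posteriors
        new_priors = adjusted.mean(axis=0)
        if np.abs(new_priors - priors).max() < tol:
            break
        priors = new_priors
    return priors, adjusted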


2019 Conference article Open Access OPEN
Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review
Molinari A.
MINECORE is a recently proposed algorithm for minimizing the expected costs of review for topical relevance (a.k.a. "responsiveness") and sensitivity (a.k.a. "privilege") in e-discovery. Given a set of documents that must be classified by both responsiveness and privilege, for each such document and for both classification criteria MINECORE determines whether the class assigned by an automated classifier should be manually reviewed or not. This determination is heavily dependent on the ("posterior") probabilities of class membership returned by the automated classifiers, on the costs of manually reviewing a document (for responsiveness, for privilege, or for both), and on the costs that different types of misclassification would bring about. We attempt to improve on MINECORE by leveraging the transductive nature of e-discovery, i.e., the fact that the set of documents that must be classified is finite and available at training time. This allows us to use EMQ, a well-known algorithm that attempts to improve the quality of the posterior probabilities of unlabelled documents in transductive settings, with the goal of improving the quality (a) of the posterior probabilities that are input to MINECORE, and thus (b) of MINECORE's output. We report experimental results obtained on a large (≈ 800K) dataset of textual documents.
Source: FDIA 2019 - 9th PhD Symposium on Future Directions in Information Access co-located with 12th European Summer School in Information Retrieval (ESSIR 2019), pp. 72–78, Milan, Italy, July 17-18, 2019

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA


2022 Conference article Open Access OPEN
Active learning and the Saerens-Latinne-Decaestecker algorithm: an evaluation
Molinari A., Esuli A., Sebastiani F.
The Saerens-Latinne-Decaestecker (SLD) algorithm is a method whose goal is to improve the quality of the posterior probabilities (or simply "posteriors") returned by a probabilistic classifier in scenarios characterized by prior probability shift (PPS) between the training set and the unlabelled ("test") set. This is an important task, (a) because posteriors are of the utmost importance in downstream tasks such as, e.g., multiclass classification and cost-sensitive classification, and (b) because PPS is ubiquitous in many applications. In this paper we explore whether using SLD can indeed improve the quality of posteriors returned by a classifier trained via active learning (AL), a class of machine learning (ML) techniques that tend to generate substantial PPS. Specifically, we target AL via relevance sampling (ALvRS) and AL via uncertainty sampling (ALvUS), two AL techniques that are especially well known because their low computational cost makes them suitable for scenarios characterized by large datasets. We present experimental results obtained on the RCV1-v2 dataset, showing that SLD fails to deliver better-quality posteriors with both ALvRS and ALvUS, thus contradicting previous findings in the literature, and that this is due not to the amount of PPS that these techniques generate, but to how the examples they prioritize for annotation are distributed. (A brief illustrative sketch of the two sampling policies follows this record.)
Source: CIRCLE 2022 - 2nd Joint Conference of the Information Retrieval Communities in Europe, Samatan, Gers, France, 4-7/07/2022
Project(s): AI4Media via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA
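
Relevance sampling and uncertainty sampling can be sketched in a few lines; the following minimal Python version assumes a binary classifier returning P(relevant | d), with names chosen for the example rather than taken from the paper:

import numpy as np

def select_batch(posterior_pos, batch_size, policy="uncertainty"):
    """Pick the next documents to annotate from the unlabelled pool.
    posterior_pos: array of P(relevant | d) for each unlabelled document."""
    posterior_pos = np.asarray(posterior_pos, dtype=float)
    if policy == "relevance":      # ALvRS: most probably relevant documents first
        scores = posterior_pos
    elif policy == "uncertainty":  # ALvUS: documents closest to the decision boundary first
        scores = -np.abs(posterior_pos - 0.5)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return np.argsort(-scores)[:batch_size]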