2019
Conference article  Open Access

Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review

Molinari A.

Automated classifiers, Classification criterion, Cost-sensitive, Expected costs, Misclassifications, Posterior probability, Textual documents, Training time

MINECORE is a recently proposed algorithm for minimizing the expected costs of review for topical relevance (a.k.a. "responsiveness") and sensitivity (a.k.a. "privilege") in e-discovery. Given a set of documents that must be classified by both responsiveness and privilege, for each such document and for both classification criteria MINECORE determines whether the class assigned by an automated classifier should be manually reviewed or not. This determination is heavily dependent on the ("posterior") probabilities of class membership returned by the automated classifiers, on the costs of manually reviewing a document (for responsiveness, for privilege, or for both), and on the costs that different types of misclassification would bring about. We attempt to improve on MINECORE by leveraging the transductive nature of e-discovery, i.e., the fact that the set of documents that must be classified is finite and available at training time. This allows us to use EMQ, a well-known algorithm that attempts to improve the quality of the posterior probabilities of unlabelled documents in transductive settings, with the goal of improving the quality (a) of the posterior probabilities that are input to MINECORE, and thus (b) of MINECORE's output. We report experimental results obtained on a large (≈ 800K) dataset of textual documents.
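
To illustrate the posterior-adjustment step mentioned in the abstract, the following is a minimal sketch of EMQ in its commonly described form (the expectation-maximization prior adjustment of Saerens, Latinne and Decaestecker, 2002). It is not the paper's implementation: the function name `emq`, its parameters, and the stopping rule are assumptions made for illustration, and the sketch assumes that every training prior is strictly positive and that each row of posteriors sums to 1.

```python
def emq(posteriors, train_priors, max_iter=1000, tol=1e-8):
    """Illustrative EMQ sketch (hypothetical helper, not the paper's code).

    posteriors:   list of rows, one per unlabelled document; each row holds the
                  classifier's class-membership probabilities and sums to 1.
    train_priors: class prevalences observed in the training set (all > 0).
    Returns (adjusted_posteriors, estimated_priors).
    """
    n_classes = len(train_priors)
    priors = list(train_priors)          # start from the training prevalences
    adjusted = [list(row) for row in posteriors]

    for _ in range(max_iter):
        # E-step: rescale each original posterior by the ratio between the
        # current prior estimate and the training prior, then renormalise.
        adjusted = []
        for row in posteriors:
            weighted = [row[c] * priors[c] / train_priors[c]
                        for c in range(n_classes)]
            total = sum(weighted)
            adjusted.append([w / total for w in weighted])

        # M-step: the new prior estimate is the mean adjusted posterior.
        new_priors = [sum(r[c] for r in adjusted) / len(adjusted)
                      for c in range(n_classes)]

        # Stop when the prior estimates no longer move (assumed criterion).
        converged = max(abs(a - b) for a, b in zip(new_priors, priors)) < tol
        priors = new_priors
        if converged:
            break

    return adjusted, priors
```

In a setting like the one the abstract describes, the adjusted posteriors returned by such a routine would replace the raw classifier outputs before any cost-based decision about manual review is made.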

Source: FDIA 2019 - 9th PhD Symposium on Future Directions in Information Access co-located with 12th European Summer School in Information Retrieval (ESSIR 2019), pp. 72–78, Milan, Italy, July 17-18, 2019

Publisher: M. Jeusfeld c/o Redaktion Sun SITE, Informatik V, RWTH Aachen, Aachen, Germany



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:442377,
	title = {Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review},
	author = {Molinari, A.},
	publisher = {M. Jeusfeld c/o Redaktion Sun SITE, Informatik V, RWTH Aachen, Aachen, Germany},
	booktitle = {FDIA 2019 - 9th PhD Symposium on Future Directions in Information Access co-located with 12th European Summer School in Information Retrieval (ESSIR 2019), Milan, Italy, July 17-18, 2019},
	pages = {72--78},
	year = {2019}
}
Also available from

ceur-ws.org (Open Access)