2013
Journal article, Open Access

An enhanced CRFs-based system for information extraction from radiology reports

Esuli A., Marcheggiani D., Sebastiani F.

Keywords: Computer Science Applications, Clinical text, Health Informatics, Information extraction, Conditional random fields

We discuss the problem of performing information extraction (IE) from free-text radiology reports via supervised learning. In this task, segments of text (not necessarily coinciding with entire sentences, and possibly crossing sentence boundaries) need to be annotated with tags representing concepts of interest in the radiological domain. In this paper we present two novel approaches to IE for radiology reports: (i) a cascaded, two-stage method based on pipelining two taggers generated via the well-known linear-chain conditional random fields (LC-CRFs) learner, and (ii) a confidence-weighted ensemble method that combines standard LC-CRFs and the proposed two-stage method. We also report on the use of "positional features", a novel type of feature intended to aid in the automatic annotation of texts in which the instances of a given concept may be hypothesized to systematically occur in specific areas of the text. We present experiments on a dataset of mammography reports in which the proposed ensemble is shown to outperform a traditional, single-stage CRFs system in two different, practically relevant application scenarios.
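Two ingredients named in the abstract, positional features and a confidence-weighted combination of two taggers, can be illustrated with a short Python sketch. It shows one plausible reading of each idea and is not the authors' implementation. It assumes (hypothetically) that a positional feature records which of `n_bins` equal-width regions of the report a token falls in, and that the ensemble keeps, token by token, the label whose weighted confidence (for example, a per-token marginal probability produced by each CRF) is higher. The function names, the `pos_bin`/`BIN_k` feature encoding, the `weight_a` parameter and the BIO labels are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def positional_features(tokens: Sequence[str], n_bins: int = 4) -> List[Dict[str, str]]:
    """Return one feature dict per token, including the region of the report
    the token falls in. Hypothetical scheme: the report is split into
    `n_bins` equal-width regions by token index."""
    n = len(tokens)
    feats: List[Dict[str, str]] = []
    for i, tok in enumerate(tokens):
        # i < n, so i * n_bins // n always lies in 0 .. n_bins - 1
        bin_idx = i * n_bins // n
        feats.append({"word": tok.lower(), "pos_bin": f"BIN_{bin_idx}"})
    return feats


def confidence_weighted_merge(
    pred_a: Sequence[Tuple[str, float]],
    pred_b: Sequence[Tuple[str, float]],
    weight_a: float = 0.5,
) -> List[str]:
    """Combine two taggers' (label, confidence) outputs token by token.
    Hypothetical rule: keep the label with the higher weighted confidence;
    ties go to tagger A."""
    if len(pred_a) != len(pred_b):
        raise ValueError("both taggers must label the same token sequence")
    merged: List[str] = []
    for (label_a, conf_a), (label_b, conf_b) in zip(pred_a, pred_b):
        score_a = weight_a * conf_a
        score_b = (1.0 - weight_a) * conf_b
        merged.append(label_a if score_a >= score_b else label_b)
    return merged


if __name__ == "__main__":
    tokens = ["Clinical", "history", ":", "lump", "in", "left", "breast", "."]
    print([f["pos_bin"] for f in positional_features(tokens)])
    # ['BIN_0', 'BIN_0', 'BIN_1', 'BIN_1', 'BIN_2', 'BIN_2', 'BIN_3', 'BIN_3']

    one_stage = [("O", 0.9), ("B-FINDING", 0.55), ("I-FINDING", 0.6)]  # standard LC-CRF
    two_stage = [("O", 0.8), ("B-FINDING", 0.9), ("O", 0.7)]  # cascaded tagger
    print(confidence_weighted_merge(one_stage, two_stage))
    # ['O', 'B-FINDING', 'O']
```

A purely token-level merge like this one can yield inconsistent BIO sequences (for example, an `I-` tag with no preceding `B-`). A span-level combination rule may therefore be preferable, and the rule actually used in the paper may differ.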

Source: Journal of Biomedical Informatics 46 (2013): 425–435. doi:10.1016/j.jbi.2013.01.006

Publisher: Academic Press, San Diego, CA, United States of America


BibTeX entry
@article{oai:it.cnr:prodotti:276788,
	title = {An enhanced {CRFs}-based system for information extraction from radiology reports},
	author = {Esuli, A. and Marcheggiani, D. and Sebastiani, F.},
	publisher = {Academic Press},
	address = {San Diego, CA, United States of America},
	doi = {10.1016/j.jbi.2013.01.006},
	journal = {Journal of Biomedical Informatics},
	volume = {46},
	pages = {425--435},
	year = {2013}
}