2003
Conference article

Research in automated classification of texts: trends and perspectives

Sebastiani F.

Text classification; Classifier Design and Evaluation; Learning; Information Search and Retrieval

Text categorization (also known as text classification, or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. This task has several applications, including automated indexing of scientific articles according to predefined thesauri of technical terms, filing patents into patent directories, selective dissemination of information to information consumers, automated population of hierarchical catalogues of Web resources, spam filtering, identification of document genre, authorship attribution, automated survey coding, and even automated essay grading. Automated text classification is attractive because it frees organizations from the need to manually organize document bases, which can be too expensive, or simply infeasible given the time constraints of the application or the number of documents involved. The accuracy of modern text classification systems rivals that of trained human professionals, thanks to a combination of information retrieval (IR) technology and machine learning (ML) technology. This paper outlines the fundamental traits of the technologies involved, of the applications that can feasibly be tackled through text classification, and of the tools and resources available to researchers and developers wishing to adopt these technologies to deploy real-world applications.
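The combination of IR and ML described above can be illustrated with a minimal sketch; it is not the method presented in this paper. It assumes naive whitespace tokenization, TF-IDF term weighting (the IR side), and a simplified Rocchio-style centroid classifier learned from labeled examples (the ML side), which is only one of many possible learners. The names `tokenize`, `l2_normalize`, and `CentroidClassifier`, as well as the toy spam/ham data, are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenization (assumption: no stemming or stop-word removal).
    return text.lower().split()


def l2_normalize(vec):
    # Scale a sparse vector (dict term -> weight) to unit Euclidean length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


class CentroidClassifier:
    """Hypothetical Rocchio-style centroid classifier over TF-IDF vectors (illustrative sketch)."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(tokenized)
        # IR side: document frequencies and idf(t) = log(N / df(t)) from the training set.
        df = Counter(t for toks in tokenized for t in set(toks))
        self.idf = {t: math.log(n_docs / d) for t, d in df.items()}
        # ML side: learn one prototype vector (centroid) per category from labeled examples.
        sums = {}
        for toks, label in zip(tokenized, labels):
            acc = sums.setdefault(label, Counter())
            for t, w in self._vectorize(toks).items():
                acc[t] += w
        self.centroids = {label: l2_normalize(acc) for label, acc in sums.items()}
        return self

    def _vectorize(self, toks):
        # TF-IDF weighting; terms unseen during training get weight 0.
        tf = Counter(toks)
        return l2_normalize({t: c * self.idf.get(t, 0.0) for t, c in tf.items()})

    def predict(self, text):
        # Assign the category whose centroid has the highest cosine similarity to the document.
        vec = self._vectorize(tokenize(text))

        def cosine(centroid):
            return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

        return max(self.centroids, key=lambda label: cosine(self.centroids[label]))


# Usage on a hypothetical toy training set (spam filtering, one of the applications listed above).
train_docs = [
    "cheap pills buy now limited offer",
    "win money now click offer",
    "meeting agenda for project review",
    "please review the attached project report",
]
train_labels = ["spam", "spam", "ham", "ham"]
clf = CentroidClassifier().fit(train_docs, train_labels)
print(clf.predict("buy cheap pills now"))  # → spam
```

Because every category is represented by a single prototype vector, training amounts to one pass over the labeled documents and classification to one similarity computation per category; more accurate learners trade this simplicity for better decision boundaries.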

Source: Fourth International Colloquium on Library and Information Science, pp. 298–311, Salamanca, 5–7 May 2003

Publisher: Universidad de Salamanca, Salamanca, ESP



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:90975,
	title = {Research in automated classification of texts: trends and perspectives},
	author = {Sebastiani F.},
	publisher = {Universidad de Salamanca, Salamanca, ESP},
	booktitle = {Fourth International Colloquium on Library and Information Science},
	pages = {298--311},
	address = {Salamanca, Spain},
	month = may,
	year = {2003}
}