2004
Contribution to book  Restricted

Supervised term weighting for automated text categorization

Debole F., Sebastiani F.

Keywords: Supervised term weighting, Text classification

The construction of a text classifier usually involves (i) a phase of term selection, in which the most relevant terms for the classification task are identified, (ii) a phase of term weighting, in which document weights for the selected terms are computed, and (iii) a phase of classifier learning, in which a classifier is generated from the weighted representations of the training documents. This process involves an activity of supervised learning, in which information on the membership of training documents in categories is used. Traditionally, supervised learning enters only phases (i) and (iii). In this paper we propose instead that learning from the training data should also affect phase (ii), i.e., that information on the membership of training documents in categories should be used to determine term weights. We call this idea supervised term weighting (STW). As an example of STW, we propose a number of supervised variants of tfidf weighting, obtained by replacing the idf function with the function that has been used in phase (i) for term selection. With STW, the terms that are distributed most differently in the positive and negative examples of the categories of interest receive the highest weights. We present experimental results obtained on the standard Reuters-21578 benchmark with three classifier learning methods (Rocchio, k-NN, and support vector machines), three term selection functions (information gain, chi-square, and gain ratio), and both local and global term selection and weighting.
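The substitution described in the abstract (the idf factor replaced by a term selection function) can be illustrated with a minimal sketch. It assumes chi-square as the selection function, a single binary category, a logarithmic tf factor, and cosine normalization. The abstract does not specify the tf formula or the normalization, so those choices, the function names `chi_square` and `stw_weights`, and the toy data are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter


def chi_square(term, docs, labels):
    """Chi-square score of `term` for one binary category.

    docs   : list of token lists (the training documents)
    labels : list of booleans, True if the document belongs to the category
    """
    n = len(docs)
    # 2x2 contingency table: term presence vs. category membership.
    a = sum(1 for d, y in zip(docs, labels) if term in d and y)          # positive, has term
    b = sum(1 for d, y in zip(docs, labels) if term in d and not y)      # negative, has term
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y)      # positive, no term
    e = sum(1 for d, y in zip(docs, labels) if term not in d and not y)  # negative, no term
    denom = (a + b) * (c + e) * (a + c) * (b + e)
    return n * (a * e - b * c) ** 2 / denom if denom else 0.0


def stw_weights(doc, docs, labels):
    """Weight each term of `doc` as tf * chi-square, then cosine-normalize.

    The tf factor 1 + log(count) is an assumption made for this sketch.
    """
    counts = Counter(doc)
    raw = {t: (1 + math.log(k)) * chi_square(t, docs, labels)
           for t, k in counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: (w / norm if norm else 0.0) for t, w in raw.items()}


# Toy training set: "wheat" occurs only in positive documents,
# while "price" is spread evenly over positives and negatives.
docs = [["wheat", "price"], ["wheat", "export"],
        ["oil", "price"], ["oil", "export"]]
labels = [True, True, False, False]

weights = stw_weights(docs[0], docs, labels)
print(weights)  # {'wheat': 1.0, 'price': 0.0}
```

In this toy example the term that separates positive from negative documents takes all of the weight, while the evenly distributed term gets zero. Under the same assumptions, information gain or gain ratio could be substituted for `chi_square` without changing the rest of the sketch.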

Source: Text Mining and its Applications, edited by Spiros Sirmakessis, pp. 81–97. Heidelberg: Physica Verlag, 2004

Publisher: Physica Verlag, Heidelberg, DEU



BibTeX entry
@incollection{oai:it.cnr:prodotti:138939,
	title = {Supervised term weighting for automated text categorization},
	author = {Debole, F. and Sebastiani, F.},
	booktitle = {Text Mining and its Applications},
	editor = {Sirmakessis, Spiros},
	pages = {81--97},
	publisher = {Physica Verlag},
	address = {Heidelberg},
	year = {2004}
}
Also available from: www.isti.cnr.it (restricted access)