2011
Conference article  Restricted

Representing document lengths with identifiers

Tonellotto Nicola, Silvestri Fabrizio, Perego Raffaele

Information Retrieval 

The length of each indexed document is needed by most common text retrieval scoring functions to rank it with respect to the current query. For efficiency, information retrieval systems keep this information in main memory. This paper proposes a novel strategy to encode the length of each document directly in its document identifier, thus reducing main memory demand. The technique is based on a simple document identifier assignment method and a function that allows the approximate length of each indexed document to be computed analytically. The paper discusses the implications of adopting the proposed technique and reports the encouraging results of experiments conducted with the 2009 TREC Web Track dataset.
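
The abstract does not state which assignment method or which analytic function the paper uses. Purely as an illustration of the general idea, the Python sketch below assumes that identifiers are assigned in increasing order of document length and that length is approximated by linear interpolation between a few sampled breakpoints. The names `assign_docids`, `build_breakpoints`, and `approx_length` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def assign_docids(lengths):
    """Assumed assignment: new docid = rank of the document by length.

    Returns (order, sorted_lengths), where order[new_id] is the
    original index of the document that receives identifier new_id.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return order, [lengths[i] for i in order]


def build_breakpoints(sorted_lengths, step):
    """Keep one (docid, length) sample every `step` docids, plus the last docid.

    Only these samples need to stay in memory instead of one length per document.
    """
    n = len(sorted_lengths)
    if n == 0:
        raise ValueError("at least one document is required")
    ids = list(range(0, n, step))
    if ids[-1] != n - 1:
        ids.append(n - 1)
    return ids, [sorted_lengths[i] for i in ids]


def approx_length(docid, ids, lens):
    """Approximate a document's length from its identifier alone.

    Assumed function: linear interpolation between the two breakpoints
    that surround docid (not the paper's actual formula).
    """
    k = bisect_right(ids, docid) - 1
    if k == len(ids) - 1:
        return float(lens[k])
    x0, x1 = ids[k], ids[k + 1]
    y0, y1 = lens[k], lens[k + 1]
    return y0 + (y1 - y0) * (docid - x0) / (x1 - x0)


# Toy collection of five documents with made-up lengths.
order, sorted_lengths = assign_docids([50, 10, 30, 20, 40])
ids, lens = build_breakpoints(sorted_lengths, step=2)
estimate = approx_length(1, ids, lens)
```

In this toy collection the lengths happen to grow linearly with the identifier, so the estimate is exact; on a real collection the error would depend on how many breakpoints are kept, which is the memory versus accuracy trade-off the abstract alludes to.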

Source: Advances in Information Retrieval. 33rd European Conference on IR Research, ECIR 2011, pp. 665–669, Dublin, Ireland, 18-21 April 2011

BibTeX entry
@inproceedings{oai:it.cnr:prodotti:206212,
	title = {Representing document lengths with identifiers},
	author = {Tonellotto, Nicola and Silvestri, Fabrizio and Perego, Raffaele},
	doi = {10.1007/978-3-642-20161-5_66},
	booktitle = {Advances in Information Retrieval. 33rd European Conference on IR Research, ECIR 2011, Dublin, Ireland, 18-21 April 2011},
	pages = {665--669},
	year = {2011}
}