2011

Conference article Restricted

Representing document lengths with identifiers

Tonellotto Nicola, Silvestri Fabrizio, Perego Raffaele

Information Retrieval

Most common text retrieval scoring functions need the length of each indexed document to rank it with respect to the current query. For efficiency, information retrieval systems keep this information in main memory. This paper proposes a novel strategy that encodes the length of each document directly in its document identifier, thus reducing main memory demand. The technique is based on a simple document identifier assignment method and a function that allows the approximate length of each indexed document to be computed analytically. The paper discusses the implications of adopting the proposed technique and reports encouraging results from experiments conducted with the 2009 TREC Web Track dataset.
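The abstract does not spell out the assignment method or the analytic function, so the following is only a minimal sketch of one way the idea could work, not the authors' method. It assumes that identifiers are assigned in order of increasing document length and that the length is then approximated by linear interpolation between a few stored sample points. All function names and the `step` parameter are hypothetical.

```python
from bisect import bisect_right


def assign_ids_by_length(lengths):
    """Assign new ids in order of increasing length (assumed scheme).

    Returns (order, sorted_lengths), where order[new_id] is the original
    position of the document that receives identifier new_id.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return order, [lengths[i] for i in order]


def build_samples(sorted_lengths, step):
    """Keep every step-th (id, length) pair, plus the last document.

    Assumes a non-empty collection. Only these samples need to stay in
    memory instead of one length per document.
    """
    n = len(sorted_lengths)
    ids = list(range(0, n, step))
    if ids[-1] != n - 1:
        ids.append(n - 1)
    return ids, [sorted_lengths[i] for i in ids]


def approx_length(doc_id, sample_ids, sample_lengths):
    """Approximate a document's length from its id alone.

    Uses linear interpolation between the two nearest stored samples.
    """
    k = bisect_right(sample_ids, doc_id) - 1
    if k >= len(sample_ids) - 1:
        return float(sample_lengths[-1])
    x0, x1 = sample_ids[k], sample_ids[k + 1]
    y0, y1 = sample_lengths[k], sample_lengths[k + 1]
    return y0 + (y1 - y0) * (doc_id - x0) / (x1 - x0)


# Toy collection: document lengths in tokens, in original order.
lengths = [120, 15, 300, 42, 42, 980, 77, 150]
order, sorted_lengths = assign_ids_by_length(lengths)
sample_ids, sample_lengths = build_samples(sorted_lengths, step=3)

for doc_id, true_len in enumerate(sorted_lengths):
    estimate = approx_length(doc_id, sample_ids, sample_lengths)
    print(doc_id, true_len, round(estimate, 1))
```

In this sketch, memory use drops from one stored length per document to one stored pair per `step` documents. The price is an approximation error between sample points, which is the kind of trade-off the abstract describes.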

Source: Advances in Information Retrieval. 33rd European Conference on IR Research, ECIR 2011, pp. 665–669, Dublin, Ireland, 18-21 April 2011

Cite as

BibTeX entry

@inproceedings{oai:it.cnr:prodotti:206212,
	title = {Representing document lengths with identifiers},
	author = {Tonellotto, Nicola and Silvestri, Fabrizio and Perego, Raffaele},
	doi = {10.1007/978-3-642-20161-5_66},
	booktitle = {Advances in Information Retrieval. 33rd European Conference on IR Research, ECIR 2011},
	pages = {665--669},
	address = {Dublin, Ireland},
	month = apr,
	year = {2011}
}

CNR authors and affiliations

CNR authors

Perego, Raffaele
0000-0001-7189-4724
Silvestri, Fabrizio
0000-0001-7669-9055
Tonellotto, Nicola
0000-0002-7427-1001

Laboratories

High Performance Computing (2002-ongoing)

Bibliographic record

DOI

10.1007/978-3-642-20161-5_66
