2015
Conference article  Open Access

Optimal space-time tradeoffs for inverted indexes

Ottaviano G., Tonellotto N., Venturini R.

Compression  Inverted indexes  Knapsack problems 

Inverted indexes are usually represented by dividing posting lists into constant-sized blocks and representing them with an encoder for sequences of integers. Different encoders yield a different point in the space-time trade-off curve, with the fastest being several times larger than the most space-efficient. An important design decision for an index is thus the choice of the fastest encoding method such that the index fits in the available memory. However, a better usage of the space budget could be obtained by using faster encoders for frequently accessed blocks, and more space-efficient ones those that are rarely accessed. To perform this choice optimally, we introduce a linear time algorithm that, given a query distribution and a set of encoders, selects the best encoder for each index block to obtain the lowest expected query processing time respecting a given space constraint. To demonstrate the effectiveness of this approach we perform an extensive experimental analysis, which shows that our algorithm produces indexes which are significantly faster than single-encoder indexes under several query processing strategies, while respecting the same space constraints.

Source: 8th International Conference on Web search and data mining (WSDM 2015), pp. 47–56, Shanghai, China, 31/01/2015-06/02/2015


Metrics



Back to previous page
BibTeX entry
@inproceedings{oai:it.cnr:prodotti:336713,
	title = {Optimal space-time tradeoffs for inverted indexes},
	author = {Ottaviano G. and Tonellotto N. and Venturini R.},
	doi = {10.1145/2684822.2685297},
	booktitle = {8th International Conference on Web search and data mining (WSDM 2015), pp. 47–56, Shanghai, China, 31/01/2015-06/02/2015},
	year = {2015}
}