Part of book or chapter of book
Inverted Index Compression
Pibiri G. E., Venturini R.
The data structure at the core of today's large-scale search engines, social networks and storage architectures is the inverted index, which can be regarded as a collection of sorted integer sequences called inverted lists. Because search engines index so many documents and must meet stringent performance requirements under a heavy load of user queries, the inverted lists often store millions or even billions of integers and must be searched efficiently.
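As an illustration of the structure described above, the following minimal Python sketch builds a toy inverted index in which each term maps to a sorted list of document identifiers, and answers a conjunctive query by intersecting two lists. The function names `build_inverted_index` and `intersect`, the whitespace tokenization and the sample documents are assumptions made for this example only; they are not taken from the chapter.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it.

    Hypothetical helper: document IDs are simply positions in `docs`,
    and terms are obtained by lowercasing and splitting on whitespace.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # set() ensures each document ID is appended at most once per term.
        for term in set(text.lower().split()):
            # IDs are visited in increasing order, so each list stays sorted.
            index[term].append(doc_id)
    return dict(index)


def intersect(a, b):
    """Intersect two sorted integer lists with a linear merge."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


docs = ["the quick fox", "the lazy dog", "quick dog"]
index = build_inverted_index(docs)
both = intersect(index["quick"], index["dog"])  # documents containing both terms
```

Because every list is sorted, the intersection runs in time linear in the lengths of the two lists, which is one reason the sorted-sequence view of the index is convenient.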
In this scenario, compressing the inverted lists is an essential design step, because it offers a twofold advantage over an uncompressed representation: more data fits in the faster levels of the memory hierarchy, which speeds up query processing, and fewer storage machines are needed to host the whole index. This chapter surveys the most important encoding algorithms developed for efficient inverted index compression.
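To make the idea of compressing a sorted integer sequence concrete, here is a minimal sketch of one classical approach: storing the gaps between consecutive integers and writing each gap with a variable-byte code. It is chosen for illustration only and is not presented as the specific method of the chapter; the names `encode` and `decode`, the byte layout (7 payload bits per byte, high bit set on the last byte of each gap) and the sample list are assumptions of this example.

```python
def encode(postings):
    """Gap-encode a strictly increasing list of non-negative integers
    and write each gap with a variable-byte code (illustrative layout)."""
    out = bytearray()
    prev = 0
    for x in postings:
        gap = x - prev
        prev = x
        # Emit 7 bits at a time, least significant group first.
        while gap >= 128:
            out.append(gap & 0x7F)
            gap >>= 7
        # The high bit marks the last byte of this gap.
        out.append(gap | 0x80)
    return bytes(out)


def decode(data):
    """Invert encode(): rebuild the original sorted list from the bytes."""
    postings = []
    value = 0
    shift = 0
    prev = 0
    for byte in data:
        if byte & 0x80:
            # Last byte of the gap: finish the value and undo the gap encoding.
            value |= (byte & 0x7F) << shift
            prev += value
            postings.append(prev)
            value = 0
            shift = 0
        else:
            value |= byte << shift
            shift += 7
    return postings


postings = [3, 7, 300, 100000]
compressed = encode(postings)
restored = decode(compressed)
```

In this sketch the four integers occupy 7 bytes instead of the 16 bytes needed by fixed 32-bit integers, because small gaps fit in a single byte.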