2020
Journal article  Restricted

Weighting passages enhances accuracy

Muntean C. I., Nardini F. M., Perego R., Tonellotto N., Frieder O.

Weighting models; Salient terms; Evaluation; Computer Science Applications; Passage retrieval; BM25P; General Business, Management and Accounting; Information Systems

We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.
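
The scoring idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that BM25P combines term statistics from different portions of the document linearly. The equal-length split, the function names, the example weights, and the choice to plug the weighted term frequency into the usual BM25 saturation formula are all assumptions made here for illustration.

```python
import math


def split_into_passages(tokens, n_passages):
    """Split a token list into n_passages contiguous chunks of roughly equal size.

    Assumption: equal-length passages; the paper studies several partitioning schemes.
    """
    size = math.ceil(len(tokens) / n_passages) if tokens else 0
    return [tokens[i * size:(i + 1) * size] for i in range(n_passages)]


def weighted_tf(term, passages, weights):
    """Linear combination of per-passage term frequencies (hypothetical helper)."""
    return sum(w * p.count(term) for w, p in zip(weights, passages))


def bm25p_score(query, doc_tokens, weights, idf, avg_doc_len, k1=1.2, b=0.75):
    """BM25-style score in which the term frequency is replaced by a
    passage-weighted term frequency.

    Assumptions: `idf` is a precomputed dict mapping term -> IDF value,
    and k1 and b take commonly used BM25 defaults.
    """
    passages = split_into_passages(doc_tokens, len(weights))
    # Standard BM25 document-length normalization.
    norm = k1 * (1 - b + b * len(doc_tokens) / avg_doc_len)
    score = 0.0
    for term in query:
        tf = weighted_tf(term, passages, weights)
        if tf > 0:
            score += idf.get(term, 0.0) * tf * (k1 + 1) / (tf + norm)
    return score


# Two toy documents of equal length: the query term sits at the start of
# one and in the middle of the other.
doc_start = ["salient"] + ["x"] * 8
doc_middle = ["x"] * 4 + ["salient"] + ["x"] * 4
idf = {"salient": 1.5}   # made-up IDF value
weights = [2.0, 1.0, 2.0]  # made-up weights favoring the first and last passages

score_start = bm25p_score(["salient"], doc_start, weights, idf, avg_doc_len=9)
score_middle = bm25p_score(["salient"], doc_middle, weights, idf, avg_doc_len=9)
```

With all weights set to 1 the sketch reduces to plain BM25, so the two toy documents receive the same score. With weights that favor the first and last passages, the document whose salient term occurs at the start ranks higher.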

Source: ACM Transactions on Information Systems 39 (2020). doi:10.1145/3428687

Publisher: Association for Computing Machinery, New York, NY, United States of America


BibTeX entry
@article{oai:it.cnr:prodotti:440068,
	title = {Weighting passages enhances accuracy},
	author = {Muntean C. I. and Nardini F. M. and Perego R. and Tonellotto N. and Frieder O.},
	publisher = {Association for Computing Machinery, New York, NY, United States of America},
	doi = {10.1145/3428687},
	journal = {ACM Transactions on Information Systems},
	volume = {39},
	year = {2020}
}