2024
Conference article  Open Access

SE-PQA: Personalized community Question Answering

Kasela P., Braga M., Pasi G., Perego R.

Information Retrieval (cs.IR); Personalization; FOS: Computer and information sciences; Computer Science - Information Retrieval; Question Answering; User Model

Personalization in Information Retrieval has been studied for a long time. Nevertheless, there is still a lack of high-quality, real-world datasets for conducting large-scale experiments and evaluating models for personalized search. This paper contributes to filling this gap by introducing SE-PQA (StackExchange - Personalized Question Answering), a new curated resource for designing and evaluating personalized models for the task of community Question Answering (cQA). The contributed dataset includes more than 1 million queries and 2 million answers, annotated with a rich set of features modeling the social interactions among the users of a popular cQA platform. We describe the characteristics of SE-PQA and detail the features associated with questions and answers. We also provide reproducible baseline methods for the cQA task based on the resource, including deep learning models and personalization approaches. The results of our preliminary experiments show that SE-PQA is suitable for training effective cQA models; they also show that personalization remarkably improves the effectiveness of all the methods tested. Furthermore, we show the benefits, in terms of robustness and generalization, of combining data from multiple communities for personalization purposes.

Publisher: Association for Computing Machinery, Inc



BibTeX entry
@inproceedings{oai:iris.cnr.it:20.500.14243/499881,
	title = {SE-PQA: Personalized community Question Answering},
	author = {Kasela P. and Braga M. and Pasi G. and Perego R.},
	publisher = {Association for Computing Machinery, Inc},
	doi = {10.1145/3589335.3651445},
	note = {Also available as arXiv preprint, doi:10.48550/arxiv.2306.16261},
	year = {2024}
}

EFRA
Extreme Food Risk Analytics

