2015
Conference article  Open Access

On the impact of Entity Linking in microblog real-time filtering

Berardi G., Ceccarelli D., Esuli A., Marcheggiani D.

Microblogging  Information Retrieval (cs.IR)  Computer Science - Information Retrieval  FOS: Computer and information sciences  Entity Linking  Real-time filtering  H.4 

Microblogging is a model of content sharing in which the temporal locality of posts with respect to important events, either of foreseeable or unforeseeable nature, makes applications of real-time filtering of great practical interest. We propose the use of Entity Linking (EL) to improve retrieval effectiveness by enriching the representation of microblog posts and filtering queries. EL is the process of recognizing, in an unstructured text, the mentions of relevant entities described in a knowledge base. EL of short pieces of text is a difficult task, but it is also a scenario in which the information EL adds to the text can have a substantial impact on the retrieval process. We implement a state-of-the-art filtering method, based on the best systems from the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and extend it with a Wikipedia-based EL method. Results show that the use of EL yields a significant improvement over the non-EL-based versions of the filtering methods. Copyright is held by the owner/author(s).
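To make the idea of enriching posts and queries with linked entities concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a tiny hand-written knowledge base (`TOY_KB`), a greedy dictionary lookup in place of a real Wikipedia-based EL system, and a plain cosine-similarity threshold in place of the TREC-derived filtering model; all names, entity identifiers, and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base: surface form -> entity identifier.
# A real EL system would resolve mentions against Wikipedia instead.
TOY_KB = {
    "new york": "ENTITY:New_York_City",
    "nyc": "ENTITY:New_York_City",
    "big apple": "ENTITY:New_York_City",
    "marathon": "ENTITY:Marathon",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_entities(tokens, kb=TOY_KB, max_len=3):
    """Greedy longest-match lookup of token n-grams in the toy knowledge base."""
    entities = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            if mention in kb:
                entities.append(kb[mention])
                i += n
                break
        else:
            # No mention starts at this token; move on.
            i += 1
    return entities


def represent(text, use_el):
    """Bag of terms, optionally enriched with the linked entity identifiers."""
    tokens = tokenize(text)
    features = Counter(tokens)
    if use_el:
        features.update(link_entities(tokens))
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def passes_filter(query, post, use_el, threshold=0.1):
    """Accept the post if it is similar enough to the filtering query."""
    return cosine(represent(query, use_el), represent(post, use_el)) >= threshold


query = "NYC marathon"
post = "Running in the Big Apple today"
print(passes_filter(query, post, use_el=False))
print(passes_filter(query, post, use_el=True))
```

In this toy setting the query and the post share no terms, so the term-only representation rejects the post, while the enriched representation accepts it because both texts link to the same entity. This is the kind of vocabulary mismatch in short texts that entity-based enrichment is meant to reduce.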

Source: SAC'15 - 30th Annual ACM Symposium on Applied Computing, pp. 1066–1071, Salamanca, Spain, 13-17 April 2015



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:368744,
	title = {On the impact of Entity Linking in microblog real-time filtering},
	author = {Berardi G. and Ceccarelli D. and Esuli A. and Marcheggiani D.},
	doi = {10.1145/2695664.2695761},
	eprint = {1611.03350},
	archiveprefix = {arXiv},
	booktitle = {SAC'15 - 30th Annual ACM Symposium on Applied Computing},
	pages = {1066--1071},
	address = {Salamanca, Spain},
	month = apr,
	year = {2015}
}