Compressed String Dictionary Look-up with Edit Distance One Belazzougui D., Venturini R. In this paper we present dierent solutions for the problem of indexing a dictionary of strings in compressed space. Given a pattern P, the index has to report all the strings in the dictionary having edit distance at most one with P. Our rst solution is able to solve queries in (almost optimal) O(|P|+occ) time where occ is the number of strings in the dictionary having edit distance at most one with P. The space complexity of this solution is bounded in terms of the k-th order entropy of the indexed dictionary. Our second solution further improves this space complexity at the cost of increasing the query timeSource: 23rd Annual Symposium on Combinatorial Pattern Matching, pp. 280–292, Helsinky, 1 Giugno 2012 DOI: 10.1007/978-3-642-31265-6_23 Metrics:
How random walks can help tourism Lucchese C., Perego R., Silvestri F., Vahabi H., Venturini R. On-line photo sharing services allow users to share their touristic experiences. Tourists can publish photos of interesting locations or monuments visited, and they can also share comments, annotations, and even the GPS traces of their visits. By analyzing such data, it is possible to turn colorful photos into metadata-rich trajectories through the points of interest present in a city. In this paper we propose a novel algorithm for the interactive gen- eration of personalized recommendations of touristic places of interest based on the knowledge mined from photo albums and Wikipedia. The distinguishing features of our approach are multiple. First, the underlying recommendation model is built fully automatically in an unsupervised way and it can be easily extended with heterogeneous sources of infor- mation. Moreover, recommendations are personalized according to the places previously visited by the user. Finally, such personalized recom- mendations can be generated very efficiently even on-line from a mobile device.Source: Advances in Information Retrieval. 34th European Conference on IR Research, pp. 195–206, Barcelona, Spain, 1-5 April 2012 DOI: 10.1007/978-3-642-28997-2_17 Metrics:
Efficient Query Recommendations in the Long Tail via Center-Piece Subgraphs Bonchi F., Perego R., Silvestri F., Vahabi H., Venturini R. We present a recommendation method based on the wellknown concept of center-piece subgraph, that allows for the time/space efficient generation of suggestions also for rare, i.e., long-tail queries. Our method is scalable with respect to both the size of datasets from which the model is computed and the heavy workloads that current web search engines have to deal with. Basically, we relate terms contained into queries with highly correlated queries in a query-flow graph. This enables a novel recommendation generation method able to produce recommendations for approximately 99% of the workload of a real-world search engine. The method is based on a graph having term nodes, query nodes, and two kinds of connections: term-query and query-query. The first connects a term to the queries in which it is contained, the second connects two query nodes if the likelihood that a user submits the second query after having issued the first one is sufficiently high. On such large graph we compute the center-piece subgraph induced by terms contained into queries and we reduce the cost of this computation using a novel and efficient method based on an inverted index representation of the model. We experiment our solution on two real-world query logs and we show that its effectiveness is comparable (and in some case better) than state-of-the-art methods for head-queries. More importantly, the quality of the recommendations generated remains very high also for long-tail queries, where other methods fail even to produce any suggestion. Finally, we extensively investigate scalability and efficiency issues and we show the viability of our method in real world search engines.Source: 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 345–354, Portland, OR, USA, 12-16 August 2012 DOI: 10.1145/2348283.2348332 Metrics: