2009
Conference article  Unknown

Scalable similarity self join in a metric DHT system

Gennaro C.

Similarity Join, Content-Addressable Network, Metric Space

Efficient processing of similarity joins is important for a large class of data analysis and data-mining applications. This primitive finds all pairs of records within a predefined distance threshold of each other. We present MCAN+, an extension of MCAN (a Content-Addressable Network for metric objects) that supports similarity self join queries. The main challenge is the intrinsic quadratic complexity of similarity joins; the proposed approach aims to bound the processing time by involving an increasing number of computational nodes as the dataset size grows. To test the scalability of MCAN+, we used a real-life dataset of color features extracted from one million images from the Flickr photo-sharing website.
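
For readers unfamiliar with the primitive, the similarity self join described in the abstract can be illustrated with a brute-force baseline. This is only a sketch of the definition, not the MCAN+ algorithm: the function name `similarity_self_join`, the use of Euclidean distance, and the inclusive threshold (`<=`) are assumptions made for illustration. It compares every pair of records once, which is the quadratic cost that the paper aims to spread across nodes.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def similarity_self_join(records, threshold):
    """Return index pairs (i, j), i < j, whose distance is <= threshold.

    Brute-force baseline: examines all n*(n-1)/2 pairs.
    Euclidean distance is an assumption; any metric could be substituted.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if dist(a, b) <= threshold
    ]


# Hypothetical toy data: four 2-D points forming two close pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 5.5)]
pairs = similarity_self_join(points, 1.0)
print(pairs)
```

Doubling the number of records roughly quadruples the number of distance computations in this baseline, which is why a distributed approach becomes attractive for large datasets.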

Source: 17th Italian Symposium on Advanced Database Systems, pp. 81–88, Camogli, Genova, 21-24 June 2009

Publisher: Seneca Edizioni, Torino, ITA



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:92003,
	title = {Scalable similarity self join in a metric DHT system},
	author = {Gennaro C.},
	publisher = {Seneca Edizioni, Torino, ITA},
	booktitle = {17th Italian Symposium on Advanced Database Systems, Camogli, Genova, 21-24 June 2009},
	pages = {81--88},
	year = {2009}
}