Document - Hierarchical agglomerative clustering in presence of expensive metrics (Extended Tech. Rep.)

2005

Report Open Access

Hierarchical agglomerative clustering in presence of expensive metrics (Extended Tech. Rep.)

Nanni M.

I.5.3 Clustering Data Mining

In several contexts and domains, hierarchical agglomerative clustering (HAC) offers best-quality results, but at the price of a high complexity which reduces the size of datasets which can be handled. In some contexts, in particular, computing distances between objects is the most expensive task. In all such situations the standard approach to HAC, which first computes all object-to-object distances and then performs the real clustering process, quickly yields high computational costs and large running times. One of the key means for containing such problem naturally lies in methods that can save a significant portion of distance computations, resulting in a smaller complexity. In this paper we propose a pruning heuristics well integrated in all the phases of the HAC process, developed for two HAC variants: single-linkage and complete-linkage. After describing the method, we provide some theoretical evidence of its pruning power, followed by an empirical study of its effectiveness over different data domains, with a special focus on dimensionality issues.

Source: ISTI Technical reports, 2005

Back to previous page

Cite as

BibTeX entry

@techreport{oai:it.cnr:prodotti:160251,
	title = {Hierarchical agglomerative clustering in presence of expensive metrics (Extended Tech. Rep.)},
	author = {Nanni M.},
	institution = {ISTI Technical reports, 2005},
	year = {2005}
}

CNR authors and affiliations

CNR authors

Nanni, Mirco
0000-0003-3534-4332

Laboratories

Knowledge Discovery and Data Mining (2002-ongoing)

Download

CNR ExploRA

Bibliographic record

ISTI Repository

Deposited version

Hierarchical agglomerative clustering in presence of expensive metrics (Extended Tech. Rep.)

Share

Cite as

CNR authors and affiliations

Download