
Filters

A clustering algorithm is first applied to the dataset of interest. In a second step, the results are filtered using one, or a combination, of four methods: a) density, b) haircut operation, c) best neighbour and d) cutting edge. These filters reduce noise in the clusters and improve the quality of the results: "unwanted" nodes that were "accidentally" assigned to a cluster can be removed, whereas nodes that should belong to a cluster but were not assigned to it can be added. In this way a cluster can be enriched with important nodes that the clustering method did not initially detect. The filters can also be used as thresholds to narrow down the clustering results according to the user's preferences.

Cluster density method

This filter can isolate or filter out areas or clusters whose density falls below a certain value. The density of a subgraph is calculated by the formula 2|E| / ( |V| ( |V| - 1 ) ), where |E| is the number of edges and |V| the number of vertices of the subgraph. Values for cluster density can vary from 0 to 1.
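The density formula can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own implementation: it assumes an undirected simple graph given as a list of (u, v) pairs, and the function names cluster_density and filter_by_density are hypothetical.

```python
def cluster_density(cluster, edges):
    """Density 2|E| / (|V| (|V| - 1)) of the subgraph induced by `cluster`.

    Assumption: `edges` is an iterable of (u, v) pairs of an undirected
    simple graph; self-loops and duplicate pairs are ignored.
    """
    nodes = set(cluster)
    n = len(nodes)
    if n < 2:
        return 0.0  # density is undefined for fewer than two vertices
    # Keep each undirected edge once, and only if both ends are in the cluster.
    inside = {
        frozenset((u, v))
        for u, v in edges
        if u != v and u in nodes and v in nodes
    }
    return 2 * len(inside) / (n * (n - 1))


def filter_by_density(clusters, edges, threshold):
    """Keep only the clusters whose density reaches `threshold` (0 to 1)."""
    return [c for c in clusters if cluster_density(c, edges) >= threshold]


# Hypothetical toy graph: a triangle a-b-c with a pendant node d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
triangle = {"a", "b", "c"}
with_pendant = {"a", "b", "c", "d"}
kept = filter_by_density([triangle, with_pendant], edges, threshold=0.8)
```

In this toy graph the triangle has density 1 and the four-node cluster has density 2/3, so a threshold of 0.8 keeps only the triangle.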

Haircut operation method

The haircut operation detects vertices with a low degree of connectivity and excludes them from the cluster they would otherwise belong to. The lower the connectivity of a node, the lower the probability that it belongs to a cluster. Deleting such nodes therefore removes noise from the cluster. Values for this threshold can vary from 0 to 10.
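The following sketch shows one possible reading of the haircut operation. It assumes that the threshold is a minimum number of connections to other members of the same cluster and that nodes are removed in a single pass; both points are assumptions, since the text does not specify them, and the function name haircut is hypothetical.

```python
def haircut(cluster, edges, min_degree):
    """Return the cluster without its weakly connected vertices.

    Assumptions: `edges` is an iterable of (u, v) pairs of an undirected
    graph, connectivity is counted inside the cluster only, and removal
    is done in one pass (not repeated until the cluster is stable).
    """
    nodes = set(cluster)
    degree = {v: 0 for v in nodes}
    seen = set()
    for u, v in edges:
        key = frozenset((u, v))
        if u == v or key in seen:
            continue  # skip self-loops and duplicate edges
        if u in nodes and v in nodes:
            seen.add(key)
            degree[u] += 1
            degree[v] += 1
    return {v for v in nodes if degree[v] >= min_degree}


# Hypothetical toy graph: a triangle a-b-c with a pendant node d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
trimmed = haircut({"a", "b", "c", "d"}, edges, min_degree=2)
```

Here node d has only one connection inside the cluster, so a minimum degree of 2 removes it and leaves the triangle a, b, c.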

Cutting edge method

To address cases where groups of nodes are densely connected among themselves, a filtering criterion called cutting edge is applied. It is defined as:

|inside edges| / |total edges|

where |inside edges| is the number of edges inside a cluster and |total edges| is the number of edges that are adjacent to at least one vertex of the cluster. Clusters whose cutting edge metric is below a user-defined threshold are discarded by the filter. Values for this threshold can vary from 0 to 1.
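The cutting edge ratio can be computed as in the sketch below. It is an illustration under stated assumptions: the graph is undirected and given as a list of (u, v) pairs, and the names cutting_edge and filter_by_cutting_edge are hypothetical rather than taken from the tool.

```python
def cutting_edge(cluster, edges):
    """Ratio |inside edges| / |total edges| for one cluster.

    An edge is "inside" when both ends are in the cluster and counts
    towards the total when at least one end is in the cluster.
    """
    nodes = set(cluster)
    unique = {frozenset((u, v)) for u, v in edges if u != v}
    inside = total = 0
    for edge in unique:
        touching = len(edge & nodes)  # how many ends lie in the cluster
        if touching >= 1:
            total += 1
        if touching == 2:
            inside += 1
    return inside / total if total else 0.0


def filter_by_cutting_edge(clusters, edges, threshold):
    """Discard clusters whose ratio is below `threshold` (0 to 1)."""
    return [c for c in clusters if cutting_edge(c, edges) >= threshold]


# Hypothetical toy graph: triangle a-b-c linked to the chain d-e through c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
kept = filter_by_cutting_edge([{"a", "b", "c"}, {"d", "e"}], edges, 0.6)
```

The triangle has three inside edges out of four adjacent edges (0.75), while the pair d, e has one inside edge out of two (0.5), so a threshold of 0.6 keeps only the triangle.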

Best neighbour method

In contrast to the haircut operation, the best neighbour method detects candidate vertices that are considered good "neighbours" and enriches the clusters with them. Such a node is one for which the number of its edges adjacent to the cluster, divided by the total degree of the vertex, is above a threshold defined by the user:

( |adjacent edges| / |total edges| ) > threshold

Another advantage of the best neighbour method is that a node can be assigned to more than one cluster. Values for this threshold can vary from 0 to 1.
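A minimal sketch of the best neighbour criterion follows. It assumes an undirected graph given as a list of (u, v) pairs and tests every candidate against the original cluster rather than against the cluster as it grows; the function name best_neighbours is hypothetical.

```python
def best_neighbours(cluster, edges, threshold):
    """Return the cluster enriched with its good "neighbours".

    A vertex outside the cluster is added when the share of its edges
    that lead into the cluster is strictly above `threshold` (0 to 1).
    """
    nodes = set(cluster)
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    enriched = set(nodes)
    for vertex, adjacent in neighbours.items():
        if vertex in nodes:
            continue
        # |adjacent edges| / |total edges| for this candidate vertex
        ratio = len(adjacent & nodes) / len(adjacent)
        if ratio > threshold:
            enriched.add(vertex)
    return enriched


# Hypothetical toy graph: d has two of its three edges into the triangle.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "a"), ("d", "b"), ("d", "e")]
enriched = best_neighbours({"a", "b", "c"}, edges, threshold=0.5)
```

Node d sends two of its three edges into the cluster (ratio 2/3), so it is added at a threshold of 0.5 but not at 0.7. Because each cluster is evaluated independently, calling the function once per cluster lets the same node join several clusters.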