T - type of the points to clusterpublic class DBSCANClusterer<T extends Clusterable> extends Clusterer<T>
The DBSCAN algorithm forms clusters based on the idea of density connectivity, i.e. a point p is density connected to another point q, if there exists a chain of points pi, with i = 1 .. n and p1 = p and pn = q, such that each pair <pi, pi+1> is directly density-reachable. A point q is directly density-reachable from point p if it is in the ε-neighborhood of this point.
Any point that is not density-reachable from a formed cluster is treated as noise, and will thus not be present in the result.
The algorithm requires two parameters:
| Modifier and Type | Class and Description |
|---|---|
private static class |
DBSCANClusterer.PointStatus
Status of a point during the clustering process.
|
| Modifier and Type | Field and Description |
|---|---|
private double |
eps
Maximum radius of the neighborhood to be considered.
|
private int |
minPts
Minimum number of points needed for a cluster.
|
| Constructor and Description |
|---|
DBSCANClusterer(double eps,
int minPts)
Creates a new instance of a DBSCANClusterer.
|
DBSCANClusterer(double eps,
int minPts,
DistanceMeasure measure)
Creates a new instance of a DBSCANClusterer.
|
| Modifier and Type | Method and Description |
|---|---|
java.util.List<Cluster<T>> |
cluster(java.util.Collection<T> points)
Performs DBSCAN cluster analysis.
|
private Cluster<T> |
expandCluster(Cluster<T> cluster,
T point,
java.util.List<T> neighbors,
java.util.Collection<T> points,
java.util.Map<Clusterable,DBSCANClusterer.PointStatus> visited)
Expands the cluster to include density-reachable items.
|
double |
getEps()
Returns the maximum radius of the neighborhood to be considered.
|
int |
getMinPts()
Returns the minimum number of points needed for a cluster.
|
private java.util.List<T> |
getNeighbors(T point,
java.util.Collection<T> points)
Returns a list of density-reachable neighbors of a
point. |
private java.util.List<T> |
merge(java.util.List<T> one,
java.util.List<T> two)
Merges two lists together.
|
distance, getDistanceMeasureprivate final double eps
private final int minPts
public DBSCANClusterer(double eps,
int minPts)
throws NotPositiveException
The euclidean distance will be used as default distance measure.
eps - maximum radius of the neighborhood to be consideredminPts - minimum number of points needed for a clusterNotPositiveException - if eps < 0.0 or minPts < 0public DBSCANClusterer(double eps,
int minPts,
DistanceMeasure measure)
throws NotPositiveException
eps - maximum radius of the neighborhood to be consideredminPts - minimum number of points needed for a clustermeasure - the distance measure to useNotPositiveException - if eps < 0.0 or minPts < 0public double getEps()
public int getMinPts()
public java.util.List<Cluster<T>> cluster(java.util.Collection<T> points) throws NullArgumentException
cluster in class Clusterer<T extends Clusterable>points - the points to clusterNullArgumentException - if the data points are nullprivate Cluster<T> expandCluster(Cluster<T> cluster, T point, java.util.List<T> neighbors, java.util.Collection<T> points, java.util.Map<Clusterable,DBSCANClusterer.PointStatus> visited)
cluster - Cluster to expandpoint - Point to add to clusterneighbors - List of neighborspoints - the data setvisited - the set of already visited pointsprivate java.util.List<T> getNeighbors(T point, java.util.Collection<T> points)
point.point - the point to look forpoints - possible neighborsCopyright (c) 2003-2015 Apache Software Foundation