uhho / density-clustering
Density Based Clustering in JavaScript
License: MIT License
E.g.
var DBSCAN = require('density-clustering').DBSCAN;
Seems obvious but different libraries have different conventions.
density-clustering/lib/KMEANS.js
Line 67 in 6fd0803
It seems the maxDim variable has not been initialized on this line.
I modified the sample and all the points are marked as noise if you increase the min points value to 3:
var dataset = [
[1,1],[0,1],[1,0],
[10,10],[10,13],[13,13],
[54,54],[55,55],[89,89],[57,55]
];
var clustering = require('density-clustering');
var dbscan = new clustering.DBSCAN();
var clusters = dbscan.run(dataset, 5, 3);
console.log(clusters);
console.log(dbscan.noise);
I'm not sure whether this is the intended behavior for DBSCAN. I think this should be fixed or documented. What do you think?
I guess this could be fixed around here: https://github.com/LukaszKrawczyk/clustering/blob/master/lib/DBSCAN.js#L68
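For comparison, here is a minimal textbook DBSCAN sketch run on the same dataset with a radius of 5 and min points of 3. It is not the library's code: the names euclidean, regionQuery and dbscanSketch are made up for this example, and it assumes that a point counts as its own neighbor and that "within the radius" means distance <= epsilon. Under those assumptions the three groups survive and only index 8 is noise.

```javascript
// Textbook DBSCAN sketch (NOT the library's implementation).
// Assumptions: a point is its own neighbor; neighborhood test is dist <= eps.
function euclidean(a, b) {
  return Math.sqrt(a.reduce(function (sum, v, i) {
    return sum + (v - b[i]) * (v - b[i]);
  }, 0));
}

// Indices of all points within eps of point i (including i itself).
function regionQuery(data, i, eps) {
  var out = [];
  for (var j = 0; j < data.length; j++) {
    if (euclidean(data[i], data[j]) <= eps) out.push(j);
  }
  return out;
}

function dbscanSketch(data, eps, minPts) {
  var UNVISITED = -2, NOISE = -1;
  var labels = data.map(function () { return UNVISITED; });
  var clusters = [];
  for (var i = 0; i < data.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    var seeds = regionQuery(data, i, eps);
    if (seeds.length < minPts) { labels[i] = NOISE; continue; }
    var id = clusters.length;
    clusters.push([i]);
    labels[i] = id;
    // The seed list grows while it is being scanned.
    for (var s = 0; s < seeds.length; s++) {
      var q = seeds[s];
      if (labels[q] === NOISE) {            // former noise becomes a border point
        labels[q] = id; clusters[id].push(q);
        continue;
      }
      if (labels[q] !== UNVISITED) continue; // already assigned
      labels[q] = id; clusters[id].push(q);
      var qNeighbors = regionQuery(data, q, eps);
      if (qNeighbors.length >= minPts) seeds = seeds.concat(qNeighbors);
    }
  }
  var noise = [];
  labels.forEach(function (l, idx) { if (l === NOISE) noise.push(idx); });
  return { clusters: clusters, noise: noise };
}

var dataset = [
  [1,1],[0,1],[1,0],
  [10,10],[10,13],[13,13],
  [54,54],[55,55],[89,89],[57,55]
];
var result = dbscanSketch(dataset, 5, 3);
console.log(result.clusters); // [[0,1,2],[3,4,5],[6,7,9]]
console.log(result.noise);    // [8]
```

If an implementation excludes the point itself from its neighborhood, min points of 3 would require two other points plus one more, which changes the outcome, so the convention is worth documenting.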
The source code has a lot of incorrect type declarations which is confusing my IDE.
Here are some examples:
/**
* DBSCAN class construcotr
* @constructor
*
* @param {Array} dataset
* @param {number} epsilon
* @param {number} minPts
* @param {function} distanceFunction
* @returns {DBSCAN}
*/
function DBSCAN(dataset, epsilon, minPts, distanceFunction) {
The parameters are declared as mandatory, but in fact they are optional.
/**
* Start clustering
*
* @param {Array} dataset
* @param {number} epsilon
* @param {number} minPts
* @param {function} distanceFunction
* @returns {undefined}
* @access public
*/
DBSCAN.prototype.run = function(dataset, epsilon, minPts, distanceFunction) {
Same here, and the return type is declared to be undefined
but in fact it returns the clusters.
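As a sketch of what corrected annotations could look like, the stub below uses the JSDoc bracket syntax for optional parameters and a concrete return type. ClusteringStub and its placeholder body are hypothetical and exist only to carry the comments; they are not the library's code.

```javascript
/**
 * Hypothetical stand-in class; only the JSDoc shape matters here.
 * @constructor
 *
 * @param {Array<Array<number>>} [dataset] - optional, can be given to run() instead
 * @param {number} [epsilon] - optional neighborhood radius
 * @param {number} [minPts] - optional minimum neighborhood size
 * @param {function(Array<number>, Array<number>): number} [distanceFunction]
 */
function ClusteringStub(dataset, epsilon, minPts, distanceFunction) {
  this.dataset = dataset || [];
  this.epsilon = epsilon;
  this.minPts = minPts;
  this.distance = distanceFunction;
}

/**
 * Start clustering
 *
 * @param {Array<Array<number>>} [dataset]
 * @param {number} [epsilon]
 * @param {number} [minPts]
 * @param {function(Array<number>, Array<number>): number} [distanceFunction]
 * @returns {Array<Array<number>>} clusters, each an array of dataset indices
 * @access public
 */
ClusteringStub.prototype.run = function (dataset, epsilon, minPts, distanceFunction) {
  if (dataset) this.dataset = dataset;
  // Placeholder body: put every point into one cluster so the return type is visible.
  return [this.dataset.map(function (_, i) { return i; })];
};
```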
Hi again,
I ran DBSCAN on the following data: [0,1,2,3,4,5,6,390,393,396,399,402,405]
with the following configuration:
new DBSCAN(dbscan_data, 8, 2, function (x1, x2) {
return Math.abs(x1 - x2);
});
The output is: [[0,1,2,3,4,5,6],[7,8,9],[10,11],[12]]. Mapped back to my data, these are the clusters:
0,1,2,3,4,5,6
| 390,393,396
| 399,402
| 405
Shouldn't it be?:
0,1,2,3,4,5,6
| 390,393,396, 399,402, 405
If not, do you know what kind of clustering this is? I thought two points are connected whenever the distance between them is less than 8.
I would be very happy to get an answer :)
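The expectation above can be checked without any DBSCAN implementation. For sorted one-dimensional data with min points of 2, density-connected groups are just runs of values whose consecutive gaps stay below the radius. The helper splitByGap below is a hypothetical name, and it swaps in this simpler gap-splitting technique rather than running DBSCAN itself; it assumes the input is sorted ascending and that "connected" means gap < eps.

```javascript
// Split sorted 1-D values into groups of indices; a new group starts
// whenever the gap to the previous value is eps or more.
function splitByGap(values, eps) {
  var groups = [];
  for (var i = 0; i < values.length; i++) {
    if (i === 0 || values[i] - values[i - 1] >= eps) groups.push([]);
    groups[groups.length - 1].push(i);
  }
  return groups;
}

var data = [0,1,2,3,4,5,6,390,393,396,399,402,405];
var groups = splitByGap(data, 8);
console.log(groups); // [[0,1,2,3,4,5,6],[7,8,9,10,11,12]]
```

Every gap inside 390..405 is 3, so those six values form one group, which matches the two-cluster result expected in the question. A group containing a single index would correspond to noise in DBSCAN terms.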
Hi,
when using a tiny toy data set I get really weird results:
const data = [
[4,5],
[5,5],
[4,4],
[4,4],
[10,10],
[9,9],
[10,9],
[9,10],
[1,1] // outlier
]
I would have expected two clusters to be found, at indices 0-3 and 4-7, with index 8 as an outlier.
If I run the same dataset with sklearn, I get the following output:
[ 0 0 0 0 1 1 1 1 -1]
which is what I expected.
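To compare the two outputs directly, the index-array format can be converted to sklearn-style labels. The helper toLabels is a hypothetical name for this example; it assumes clusters are arrays of dataset indices and that any index not listed in a cluster is noise, labeled -1.

```javascript
// Convert clusters given as arrays of point indices into a flat label array.
// Points that appear in no cluster keep the label -1 (noise).
function toLabels(clusters, pointCount) {
  var labels = new Array(pointCount).fill(-1);
  clusters.forEach(function (members, clusterId) {
    members.forEach(function (pointIndex) {
      labels[pointIndex] = clusterId;
    });
  });
  return labels;
}

// The clustering expected for the 9-point toy data set above.
var labels = toLabels([[0, 1, 2, 3], [4, 5, 6, 7]], 9);
console.log(labels); // [0, 0, 0, 0, 1, 1, 1, 1, -1]
```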
Hello, these are very good algorithms.
I want to show the results. Is there any chart that can display the clusters?
Google Charts or another library, for example.
Any chance to support ESM and not only CommonJS?
The lib/KMEANS.js file is encoded as UTF-8 text with a byte order mark (BOM). That encoding can cause issues with some bundlers (specifically webpack).
The rest of the library is encoded as UTF-8 without a BOM. Can the KMEANS clustering file be updated to use the same format?
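Until the file is re-saved, a consumer could strip the mark on their side. When a UTF-8 file with a BOM is decoded to a string, the mark shows up as the single character U+FEFF at position 0. The helper stripBom below is a hypothetical name, and the sample string stands in for the file's contents.

```javascript
// Remove a leading byte order mark (U+FEFF) from decoded text, if present.
function stripBom(text) {
  return text.charCodeAt(0) === 0xFEFF ? text.slice(1) : text;
}

var withBom = '\uFEFFfunction KMEANS() {}';
var cleaned = stripBom(withBom);
console.log(cleaned.length === withBom.length - 1); // true
```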
In DBSCAN.js I get the error "Unresolved variable" (line 109).
When passing an array of values instead of an array of arrays to DBSCAN or OPTICS, strange clustering occurs instead of throwing an error. This is easy to do by mistake when using one-dimensional data.
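A guard along these lines could catch the mistake before clustering starts. Both helpers, assertPoints and wrapScalars, are hypothetical names and not part of the library; the sketch only checks that each element is an array and offers a one-line conversion for one-dimensional data.

```javascript
// Throw early if the dataset is not an array of coordinate arrays.
function assertPoints(dataset) {
  if (!Array.isArray(dataset)) {
    throw new TypeError('dataset must be an array');
  }
  dataset.forEach(function (point, i) {
    if (!Array.isArray(point)) {
      throw new TypeError('dataset[' + i + '] must be an array of coordinates, got ' + typeof point);
    }
  });
  return dataset;
}

// Turn [1, 2, 3] into [[1], [2], [3]] so 1-D values become 1-D points.
function wrapScalars(values) {
  return values.map(function (v) { return [v]; });
}

var points = assertPoints(wrapScalars([0, 1, 2, 390, 393]));
console.log(points); // [[0],[1],[2],[390],[393]]
```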
Hi,
Thanks for your work! I'm having an issue with your implementation: in my case I'm working with 21k+ elements in the table, and your PriorityQueue implementation (which uses splice) eats up all the available stack memory.
I don't think I'm doing anything wrong, but tell me if I am. A possible solution would be to use setImmediate(...) in Node.js and setTimeout(..., 0) in the browser. That would give the garbage collector time to do its job.
What do you think?
Cheers, and thanks for this great work!
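A different direction from the setImmediate idea would be a queue that avoids splice altogether. The binary min-heap below is a generic sketch, not the library's PriorityQueue and not a drop-in replacement for it; MinHeap and its methods are names chosen for this example. Whether it resolves the memory problem reported here is untested. It also does not support changing the priority of an element that is already queued, which OPTICS needs; one common workaround is to push the element again and skip stale entries when they are popped.

```javascript
// Array-backed binary min-heap: push and pop are O(log n), no splice involved.
function MinHeap() {
  this.items = []; // each item: { value: ..., priority: number }
}

MinHeap.prototype.push = function (value, priority) {
  var a = this.items, i = a.length;
  a.push({ value: value, priority: priority });
  // Sift the new item up until its parent has a smaller or equal priority.
  while (i > 0) {
    var parent = (i - 1) >> 1;
    if (a[parent].priority <= a[i].priority) break;
    var tmp = a[parent]; a[parent] = a[i]; a[i] = tmp;
    i = parent;
  }
};

MinHeap.prototype.pop = function () {
  var a = this.items;
  if (a.length === 0) return undefined;
  var top = a[0], last = a.pop();
  if (a.length > 0) {
    a[0] = last;
    var i = 0;
    // Sift the moved item down to restore the heap property.
    for (;;) {
      var left = 2 * i + 1, right = left + 1, smallest = i;
      if (left < a.length && a[left].priority < a[smallest].priority) smallest = left;
      if (right < a.length && a[right].priority < a[smallest].priority) smallest = right;
      if (smallest === i) break;
      var tmp = a[smallest]; a[smallest] = a[i]; a[i] = tmp;
      i = smallest;
    }
  }
  return top.value;
};

var heap = new MinHeap();
heap.push('c', 3);
heap.push('a', 1);
heap.push('b', 2);
console.log(heap.pop(), heap.pop(), heap.pop()); // a b c
```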
Hey,
I believe that the core distance calculation is wrong.
Per definition, the core-distance of an object o is the smallest distance for o to be a core object.
Your function distanceToCore(pointId, neighbors):
minDistance = undefined;
if (neighbors.length >= this.minPts) {
var minDistance = this.epsilon;
neighbors.forEach(function(pointId2) {
var dist = self.distance(self.dataset[pointId], self.dataset[pointId2]);
if (dist < minDistance) minDistance = dist;
});
}
return minDistance;
As far as I understand, you set the core distance to the distance to the nearest neighbor. That is wrong in general and would only be correct if a total of minPts points were as near as the nearest neighbor.
Here's a suggestion for distanceToCore:
OPTICS.prototype._distanceToCore = function(pointId) {
var coreDistCand, l; // declare l as well so it does not leak as an implicit global
for(coreDistCand = 0, l = this.epsilon; coreDistCand < l; coreDistCand++) {
var neighbors = [];
for (var id = 0, k = this.dataset.length; id < k; id++) {
if (this.distance(this.dataset[pointId], this.dataset[id]) <= coreDistCand) {
neighbors.push(id);
}
}
if(neighbors.length >= this.minPts) {
return coreDistCand;
}
}
return -1;
}
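Stepping the candidate distance in whole units only works when distances are integers. An alternative sketch that follows the definition directly is to collect the distances inside the epsilon-neighborhood, sort them, and take the minPts-th smallest. The standalone function coreDistance is a hypothetical name; it assumes that a point counts as its own neighbor and that the neighborhood test is distance <= epsilon, and it returns undefined when the point is not a core point.

```javascript
// Core distance of dataset[pointId]: the distance to its minPts-th nearest
// neighbor inside the epsilon-neighborhood, or undefined if the neighborhood
// holds fewer than minPts points (the point itself included).
function coreDistance(dataset, pointId, epsilon, minPts, distance) {
  var dists = [];
  for (var id = 0; id < dataset.length; id++) {
    var d = distance(dataset[pointId], dataset[id]);
    if (d <= epsilon) dists.push(d);
  }
  if (dists.length < minPts) return undefined;
  dists.sort(function (a, b) { return a - b; }); // numeric ascending
  return dists[minPts - 1];
}

var absDiff = function (a, b) { return Math.abs(a - b); };
var sample = [0, 1, 4, 10];
console.log(coreDistance(sample, 0, 5, 3, absDiff)); // 4
console.log(coreDistance(sample, 3, 5, 3, absDiff)); // undefined
```

For the point 0 the neighborhood distances are 0, 1 and 4, so with minPts of 3 the core distance is 4, not the nearest-neighbor distance of 1.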
Sorry for the dumb question, but what is the meaning of the result? Initially I thought it was a bug in the docs, but I just installed the package and confirmed the output:
var dataset = [
[1,1],[0,1],[1,0],
[10,10],[10,13],[13,13],
[54,54],[55,55],[89,89],[57,55]
];
var clustering = require('density-clustering');
var dbscan = new clustering.DBSCAN();
// parameters: 5 - neighborhood radius, 2 - number of points in neighborhood to form a cluster
var clusters = dbscan.run(dataset, 5, 2);
console.log(clusters, dbscan.noise);
/*
RESULT:
[
[0,1,2],
[3,4,5],
[6,7,9],
[8]
]
NOISE: [ 8 ]
*/
Why do 3 different clusters of 2D points produce a 3x3 matrix as the result?
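One reading that fits the output is that each inner array lists indices into dataset rather than coordinates, so three clusters of three points each happen to look like a 3x3 matrix. Under that assumption the indices can be mapped back to points as sketched below; the clusters array is copied from the first three rows of the result above instead of being computed.

```javascript
var dataset = [
  [1,1],[0,1],[1,0],
  [10,10],[10,13],[13,13],
  [54,54],[55,55],[89,89],[57,55]
];

// First three clusters from the result above, as arrays of dataset indices.
var clusters = [[0,1,2],[3,4,5],[6,7,9]];

// Replace every index with the point it refers to.
var clusterPoints = clusters.map(function (cluster) {
  return cluster.map(function (index) { return dataset[index]; });
});

console.log(clusterPoints[0]); // [[1,1],[0,1],[1,0]]
console.log(clusterPoints[2]); // [[54,54],[55,55],[57,55]]
```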