
density-clustering's People

Contributors

alastaircoote, lukaszkrawczyk

density-clustering's Issues

Support esm

Any chance of supporting ESM instead of CommonJS?

DBSCAN - am I understanding it wrong?

Hi again,

I ran DBSCAN on the following data: [0,1,2,3,4,5,6,390,393,396,399,402,405]

with the following configuration:

new DBSCAN(dbscan_data, 8, 2, function (x1, x2) {
    return Math.abs(x1 - x2);
});

The output is [[0,1,2,3,4,5,6],[7,8,9],[10,11],[12]]. Mapped back to my data, these are the clusters:

0,1,2,3,4,5,6 | 390,393,396 | 399,402 | 405

Shouldn't it be:

0,1,2,3,4,5,6 | 390,393,396,399,402,405

If not, do you know what kind of clustering this is? I thought every point is connected to another point if their distance is < 8.
I would be very happy to get an answer :)
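For comparison, here is a minimal textbook DBSCAN sketch run on the same data. The `dbscan` helper is hypothetical and is not the library's implementation; it assumes a point counts as its own neighbor and that neighbors satisfy `distance <= epsilon`. Under those assumptions the points 390 to 405, spaced 3 apart, are all core points, so density-reachability chains them into one cluster.

```javascript
// Minimal textbook DBSCAN sketch (hypothetical helper, not density-clustering's code).
// Assumptions: a point counts as its own neighbor, and neighbors satisfy dist <= epsilon.
function dbscan(points, epsilon, minPts, dist) {
  const n = points.length;
  const NOISE = -1;
  const labels = new Array(n).fill(undefined); // undefined = not yet visited

  // Indices of all points within epsilon of point i (including i itself).
  const regionQuery = (i) => {
    const result = [];
    for (let j = 0; j < n; j++) {
      if (dist(points[i], points[j]) <= epsilon) result.push(j);
    }
    return result;
  };

  let clusterId = 0;
  for (let i = 0; i < n; i++) {
    if (labels[i] !== undefined) continue;
    const neighbors = regionQuery(i);
    if (neighbors.length < minPts) {
      labels[i] = NOISE; // may later be claimed as a border point
      continue;
    }
    labels[i] = clusterId;
    const queue = neighbors.filter((j) => j !== i);
    while (queue.length > 0) {
      const j = queue.shift();
      if (labels[j] === NOISE) labels[j] = clusterId; // border point: join, but don't expand
      if (labels[j] !== undefined) continue;
      labels[j] = clusterId;
      const jNeighbors = regionQuery(j);
      if (jNeighbors.length >= minPts) queue.push(...jNeighbors); // j is core: expand from it
    }
    clusterId++;
  }

  // Group point indices by cluster label.
  const clusters = Array.from({ length: clusterId }, () => []);
  const noise = [];
  labels.forEach((label, i) => (label === NOISE ? noise.push(i) : clusters[label].push(i)));
  return { clusters, noise };
}

const data = [0, 1, 2, 3, 4, 5, 6, 390, 393, 396, 399, 402, 405];
const { clusters, noise } = dbscan(data, 8, 2, (a, b) => Math.abs(a - b));
console.log(clusters); // two clusters: indices 0-6 and indices 7-12
console.log(noise);    // no noise points
```

With this textbook behavior the result matches the expectation stated in the issue: 390 is within 8 of 393 and 396, 396 is within 8 of 402, and so on, so every point from index 7 to 12 ends up in the same cluster.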

Type checking

When passing an array of values instead of an array of arrays to DBSCAN or OPTICS, strange clustering occurs instead of throwing an error. This is easy to do by mistake when using one-dimensional data.
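A minimal sketch of the kind of check this issue asks for. `assertDataset` is a hypothetical name, not something the library exposes; it assumes a valid dataset is an array whose elements are arrays of coordinates.

```javascript
// Hypothetical validation helper (not part of density-clustering): rejects a flat
// array of numbers so the mistake surfaces as an error instead of odd clusters.
function assertDataset(dataset) {
  if (!Array.isArray(dataset)) {
    throw new TypeError("dataset must be an array of points");
  }
  dataset.forEach((point, i) => {
    if (!Array.isArray(point)) {
      throw new TypeError(
        `dataset[${i}] must be an array of coordinates, got ${typeof point}; ` +
          "wrap 1-D values, e.g. values.map((v) => [v])"
      );
    }
  });
  return dataset;
}

// One-dimensional values have to be wrapped into single-coordinate points.
const values = [0, 1, 2, 390, 393];
const points = assertDataset(values.map((v) => [v])); // [[0], [1], [2], [390], [393]]
```

Wrapping each value with `values.map((v) => [v])` also avoids the problem on the caller's side without any library change.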

Add info on how to use it.

E.g.

var DBSCAN = require('density-clustering').DBSCAN;

It seems obvious, but different libraries follow different conventions.

UTF8-BOM Encoding

The lib/KMEANS.js file is encoded as UTF-8 with a byte order mark (BOM). That encoding can cause issues with some bundlers (specifically webpack).

The rest of the library is encoded as UTF-8 without a BOM. Can the KMEANS clustering file be updated to use the same format?
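A UTF-8 BOM is the three-byte sequence EF BB BF at the very start of a file. A minimal Node.js sketch for detecting and removing it, using only the built-in `Buffer` API; the helper names `hasUtf8Bom` and `stripUtf8Bom` are hypothetical.

```javascript
// Sketch: detect and strip a UTF-8 byte order mark (bytes EF BB BF) from a Buffer.
// Helper names are hypothetical; only Node's built-in Buffer API is used.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

function hasUtf8Bom(buffer) {
  return buffer.length >= 3 && buffer.subarray(0, 3).equals(UTF8_BOM);
}

function stripUtf8Bom(buffer) {
  return hasUtf8Bom(buffer) ? buffer.subarray(3) : buffer;
}

// In-memory example; on disk this would be
// fs.writeFileSync(path, stripUtf8Bom(fs.readFileSync(path))).
const withBom = Buffer.concat([UTF8_BOM, Buffer.from("// file contents\n", "utf8")]);
console.log(hasUtf8Bom(withBom));               // true
console.log(hasUtf8Bom(stripUtf8Bom(withBom))); // false
```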

Confused by DBSCAN min points

I modified the sample, and all the points are marked as noise if the min points value is increased to 3:

var dataset = [
    [1,1],[0,1],[1,0],
    [10,10],[10,13],[13,13],
    [54,54],[55,55],[89,89],[57,55]
];

var clustering = require('density-clustering');
var dbscan = new clustering.DBSCAN();
var clusters = dbscan.run(dataset, 5, 3);
console.log(clusters);
console.log(dbscan.noise);

I'm not sure whether this is the intended behavior for DBSCAN. I think it should be either fixed or documented. What do you think?

I guess this could be fixed around here: https://github.com/LukaszKrawczyk/clustering/blob/master/lib/DBSCAN.js#L68
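One way to see where the minPts threshold matters is to count each point's neighbors within radius 5. The sketch below uses a hypothetical `neighborCounts` helper and assumes Euclidean distance; it counts the point itself, as in the textbook DBSCAN definition. With that convention every point except `[89,89]` has exactly 3 neighbors, so minPts = 3 would still give three clusters. If an implementation does not count the point itself, each of those points has only 2 neighbors and all of them become noise, which would be consistent with the behavior reported here.

```javascript
// Count the points within `epsilon` of each point (Euclidean distance, point itself included).
// `neighborCounts` is a hypothetical helper for inspecting the data, not a library function.
function neighborCounts(dataset, epsilon) {
  const euclidean = (a, b) => Math.hypot(...a.map((v, k) => v - b[k]));
  return dataset.map((p) => dataset.filter((q) => euclidean(p, q) <= epsilon).length);
}

const dataset = [
  [1, 1], [0, 1], [1, 0],
  [10, 10], [10, 13], [13, 13],
  [54, 54], [55, 55], [89, 89], [57, 55],
];

console.log(neighborCounts(dataset, 5)); // 3 for every point except index 8 ([89, 89]), which has 1
```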

Incorrect type declarations

The source code has a lot of incorrect type declarations, which confuse my IDE.

Here are some examples:

/**
 * DBSCAN class construcotr
 * @constructor
 *
 * @param {Array} dataset
 * @param {number} epsilon
 * @param {number} minPts
 * @param {function} distanceFunction
 * @returns {DBSCAN}
 */
function DBSCAN(dataset, epsilon, minPts, distanceFunction) {

These parameters are declared as mandatory, but in fact they are optional.

/**
 * Start clustering
 *
 * @param {Array} dataset
 * @param {number} epsilon
 * @param {number} minPts
 * @param {function} distanceFunction
 * @returns {undefined}
 * @access public
 */
DBSCAN.prototype.run = function(dataset, epsilon, minPts, distanceFunction) {

Same here, and the return type is declared as undefined, but in fact the method returns the clusters.
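A sketch of how such annotations could look, using standard JSDoc syntax: square brackets mark a parameter as optional, and `@returns` names the actual return type. `AnnotatedClusterer` is a hypothetical stub for illustration only, not the library's DBSCAN; the return type assumes clusters are arrays of point indices.

```javascript
/**
 * Hypothetical stub illustrating JSDoc for optional parameters; not density-clustering's DBSCAN.
 * @constructor
 * @param {Array<Array<number>>} [dataset] - square brackets mark the parameter as optional
 * @param {number} [epsilon]
 * @param {number} [minPts]
 * @param {function(Array<number>, Array<number>): number} [distanceFunction]
 */
function AnnotatedClusterer(dataset, epsilon, minPts, distanceFunction) {
  this.dataset = dataset || [];
  this.epsilon = epsilon;
  this.minPts = minPts;
  this.distance = distanceFunction;
  this.clusters = [];
}

/**
 * @param {Array<Array<number>>} [dataset]
 * @param {number} [epsilon]
 * @param {number} [minPts]
 * @param {function(Array<number>, Array<number>): number} [distanceFunction]
 * @returns {Array<Array<number>>} clusters, each an array of point indices
 */
AnnotatedClusterer.prototype.run = function (dataset, epsilon, minPts, distanceFunction) {
  if (dataset) this.dataset = dataset;
  // ... clustering would happen here ...
  return this.clusters;
};
```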

Optics Error?

Hi,

When using a tiny toy dataset, I get really weird results:

const data = [
    [4,5],
    [5,5],
    [4,4],
    [4,4],
    [10,10],
    [9,9],
    [10,9],
    [9,10],
    [1,1] // outlier
]

I would have expected two clusters to be found, one with indices 0-3 and one with indices 4-7. Index 8 should be an outlier.

If I run the same dataset with sklearn, I get the following output:

[ 0  0  0  0  1  1  1  1 -1]

which is what I expected.

Splice eating up the stack memory

Hi,

Thanks for your work! I'm having an issue with your implementation: I'm working with 21k+ elements in the table, and your PriorityQueue implementation (which uses splice) eats up all the available stack memory.

I don't think I'm doing anything wrong, but tell me if I am. One solution could be to use setImmediate(...) in Node.js and setTimeout(..., 0) in the browser. That would give the garbage collector time to do its job.

What do you think?

Cheers, and thanks for this great work!
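The issue proposes yielding with `setImmediate`/`setTimeout`. A different, commonly used option is to replace a sorted array maintained with `splice` by a binary heap, which inserts and removes in O(log n) time without shifting array elements. A minimal sketch follows; `MinHeap` is a hypothetical name and this is not the library's PriorityQueue API.

```javascript
// Binary min-heap priority queue (hypothetical sketch, not the library's PriorityQueue).
// push and pop are O(log n) and never shift the whole array the way splice can.
class MinHeap {
  constructor() {
    this.items = []; // entries of the form { value, priority }, heap-ordered by priority
  }

  get size() {
    return this.items.length;
  }

  push(value, priority) {
    const a = this.items;
    a.push({ value, priority });
    let i = a.length - 1;
    // Sift up: swap with the parent while the parent has a larger priority.
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (a[parent].priority <= a[i].priority) break;
      [a[parent], a[i]] = [a[i], a[parent]];
      i = parent;
    }
  }

  pop() {
    const a = this.items;
    if (a.length === 0) return undefined;
    const top = a[0];
    const last = a.pop();
    if (a.length > 0) {
      a[0] = last;
      // Sift down: swap with the smaller child until neither child is smaller.
      let i = 0;
      for (;;) {
        const left = 2 * i + 1;
        const right = left + 1;
        let smallest = i;
        if (left < a.length && a[left].priority < a[smallest].priority) smallest = left;
        if (right < a.length && a[right].priority < a[smallest].priority) smallest = right;
        if (smallest === i) break;
        [a[smallest], a[i]] = [a[i], a[smallest]];
        i = smallest;
      }
    }
    return top.value;
  }
}

const queue = new MinHeap();
queue.push("a", 0.7);
queue.push("b", 0.2);
queue.push("c", 0.5);
console.log(queue.pop(), queue.pop(), queue.pop()); // b c a
```

OPTICS also needs to lower a point's reachability after it is already queued. A heap without a decrease-key operation usually handles this by pushing a new entry and skipping stale entries for already-processed points when they are popped.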

OPTICS: Core Distance Calculation

Hey,

I believe that the core distance calculation is wrong.

By definition, the core-distance of an object o is the smallest radius at which o would be a core object.

Your function distanceToCore(pointId, neighbors):

minDistance = undefined;
if (neighbors.length >= this.minPts) {
    var minDistance = this.epsilon;
    neighbors.forEach(function(pointId2) {
        var dist = self.distance(self.dataset[pointId], self.dataset[pointId2]);
        if (dist < minDistance) minDistance = dist;
    });
}
return minDistance;

As far as I understand, you set the core distance to the distance to the nearest neighbor. That's wrong in general; it would only be correct if at least minPts points were as near as the nearest neighbor.

Here's a suggestion for distanceToCore:

OPTICS.prototype._distanceToCore = function(pointId) {

    var coreDistCand, l;
    // Note: only integer candidate radii below epsilon are tried.
    for(coreDistCand = 0, l = this.epsilon; coreDistCand < l; coreDistCand++) {
      var neighbors = [];
      for (var id = 0, k = this.dataset.length; id < k; id++) {
        if (this.distance(this.dataset[pointId], this.dataset[id]) <= coreDistCand) {
          neighbors.push(id);
        }
      }
      if(neighbors.length >= this.minPts) {
        return coreDistCand;
      }
    }
    return -1;
}
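For reference, the core-distance can also be computed directly instead of stepping through integer candidate radii: collect the distances from the point to every point within epsilon, sort them, and take the minPts-th smallest. A minimal sketch, assuming a hypothetical standalone `coreDistance` function; like the suggestion above it counts the point itself toward minPts, and like the current code it returns `undefined` for a non-core point. It also works for non-integer distances.

```javascript
// Core distance = distance to the minPts-th nearest point within epsilon (the point itself counts).
// Returns undefined when the point is not a core point.
function coreDistance(dataset, pointId, epsilon, minPts, distance) {
  const distances = [];
  for (let id = 0; id < dataset.length; id++) {
    const d = distance(dataset[pointId], dataset[id]);
    if (d <= epsilon) distances.push(d);
  }
  if (distances.length < minPts) return undefined;
  distances.sort((a, b) => a - b); // numeric sort; the default sort compares strings
  return distances[minPts - 1];
}

const euclidean = (a, b) => Math.hypot(...a.map((v, k) => v - b[k]));
const points = [[0], [1], [3], [10]];
console.log(coreDistance(points, 0, 5, 3, euclidean)); // 3 (not 1, the nearest-neighbor distance)
console.log(coreDistance(points, 0, 5, 4, euclidean)); // undefined ([10] lies outside epsilon)
```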

What is the meaning of the result?

Sorry for the basic question, but what does the result mean? Initially I thought the documentation was buggy, but I just installed the package and confirmed the output:

var dataset = [
    [1,1],[0,1],[1,0],
    [10,10],[10,13],[13,13],
    [54,54],[55,55],[89,89],[57,55]
];
 
var clustering = require('density-clustering');
var dbscan = new clustering.DBSCAN();
// parameters: 5 - neighborhood radius, 2 - number of points in neighborhood to form a cluster
var clusters = dbscan.run(dataset, 5, 2);
console.log(clusters, dbscan.noise);
 
/*
RESULT:
[
    [0,1,2],
    [3,4,5],
    [6,7,9],
    [8]
]
 
NOISE: [ 8 ]
*/

Why do three clusters of 2D points produce what looks like a 3x3 matrix?
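Judging by the printed output, each inner array is a cluster of indices into `dataset`, not a point's coordinates: `[6,7,9]` groups `[54,54]`, `[55,55]` and `[57,55]`, and index 8 (`[89,89]`) is the noise point. It looks like a 3x3 matrix only because each of the first three clusters happens to contain three points. Under that reading, a small hypothetical helper maps the indices back to coordinates:

```javascript
// Map clusters of point indices back to coordinates (hypothetical helper).
function clustersToPoints(dataset, clusters) {
  return clusters.map((cluster) => cluster.map((index) => dataset[index]));
}

const dataset = [
  [1, 1], [0, 1], [1, 0],
  [10, 10], [10, 13], [13, 13],
  [54, 54], [55, 55], [89, 89], [57, 55],
];
const clusters = [[0, 1, 2], [3, 4, 5], [6, 7, 9]]; // first three clusters from the output above

console.log(clustersToPoints(dataset, clusters)[2]); // [ [ 54, 54 ], [ 55, 55 ], [ 57, 55 ] ]
```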
