richard-moulton / d-stream

Implementation of the D-Stream clustering algorithm for use in MOA. An earlier version is included as part of the MOA 17.06 release.

License: Apache License 2.0

Java 100.00%

d-stream's People

Contributors

richard-moulton

Forkers

bigredbull

d-stream's Issues

When to check that Grid Clusters remain valid Grid Groups

Grid Clusters and Grid Groups are both defined in Chen and Tu 2007 (Definitions 3.6 and 3.4 respectively).

A Grid Cluster could degenerate to the point where it is no longer a Grid Group, and this should be checked for at some point in the adjustClustering method. Investigate whether to include this check in the cleanClusters method or to implement it separately.
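A minimal sketch of such a validity check follows. It assumes each grid is identified by an integer coordinate array, and that two grids are neighbours when they differ by exactly 1 in a single dimension. The class and method names (`GridGroupCheck`, `areNeighbours`, `isGridGroup`) are hypothetical and are not taken from this repository.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the repository's Dstream class.
final class GridGroupCheck {

    // Assumption: two grids are neighbours when their coordinates match in every
    // dimension except one, where they differ by exactly 1.
    static boolean areNeighbours(int[] a, int[] b) {
        if (a.length != b.length) {
            return false;
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Math.abs(a[i] - b[i]);
            if (distance > 1) {
                return false;
            }
        }
        return distance == 1;
    }

    // Breadth-first search from the first grid. The set is treated as a grid
    // group when every grid is reachable through a chain of neighbouring grids.
    static boolean isGridGroup(List<int[]> grids) {
        if (grids.isEmpty()) {
            return false;
        }
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(0);
        queue.add(0);
        while (!queue.isEmpty()) {
            int[] current = grids.get(queue.poll());
            for (int i = 0; i < grids.size(); i++) {
                if (!visited.contains(i) && areNeighbours(current, grids.get(i))) {
                    visited.add(i);
                    queue.add(i);
                }
            }
        }
        return visited.size() == grids.size();
    }
}
```

The check is quadratic in the number of grids in the cluster, so it may be cheaper to run it only for clusters that lost a grid since the last adjustment.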

Proper processing of numerical dimensions

The datasets in Chen and Tu 2007 are preprocessed so that each numerical dimension is normalized and partitioned.

It isn't clear how (or whether) the D-Stream algorithm should handle numerical attributes. This implementation keeps track of the max and min values seen in each numerical dimension and partitions them into integer bins, recalculating N as required.
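One possible reading of this approach is sketched below. It assumes unit-width bins aligned to integer boundaries, and takes N to be the product of the bin counts across dimensions. `NumericDimension` and its methods are hypothetical names, and the repository's actual code may differ.

```java
import java.util.List;

// Hypothetical helper illustrating min/max tracking for one numerical dimension.
final class NumericDimension {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    // Widens the observed range. Returns true when the range changed, which is
    // the point at which the total grid count would need recalculating.
    boolean observe(double value) {
        boolean changed = false;
        if (value < min) {
            min = value;
            changed = true;
        }
        if (value > max) {
            max = value;
            changed = true;
        }
        return changed;
    }

    // Number of unit-width bins needed to cover the observed range;
    // 0 before any value has been seen.
    int binCount() {
        if (min > max) {
            return 0;
        }
        return (int) (Math.floor(max) - Math.floor(min)) + 1;
    }

    // Index of the bin containing the value, counted from the lowest bin.
    int binOf(double value) {
        return (int) (Math.floor(value) - Math.floor(min));
    }

    // Assumed meaning of N: the product of the bin counts of all dimensions.
    static long totalGrids(List<NumericDimension> dimensions) {
        long total = 1;
        for (NumericDimension dimension : dimensions) {
            total *= dimension.binCount();
        }
        return total;
    }
}
```

Note that in this sketch bin indices are relative to the current minimum, so they shift whenever a new minimum is observed. Any grid keys stored earlier would need to be remapped at that point.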

Unclear action for adjustForDenseGrid() when no neighbours are clustered

The procedure for dynamically adjusting clusters, Figure 4 of Chen and Tu 2007, describes adjusting a newly dense grid based on which of its neighbours belongs to the largest cluster. It is not stated what should be done if none of the grid's neighbours are clustered.

This implementation generates a new cluster in the adjustForDenseGrid method around the grid in question, using the initial clustering procedure. Otherwise, completely valid clusters would fail to form because they cannot bootstrap themselves into existence.
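The decision described above can be sketched as follows. This is a simplified illustration that covers only the choice of cluster label and omits the further case analysis in Figure 4 of the paper. `DenseGridAdjustment`, `chooseCluster`, `NO_CLUSTER` and the `nextFreeLabel` parameter are hypothetical names and do not come from the repository.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the repository's adjustForDenseGrid method.
final class DenseGridAdjustment {
    static final int NO_CLUSTER = -1; // assumed marker for an unclustered grid

    // Chooses the label for a newly dense grid: the label of the largest cluster
    // among its neighbours, or a fresh label when no neighbour is clustered.
    static int chooseCluster(List<Integer> neighbourLabels,
                             Map<Integer, Integer> clusterSizes,
                             int nextFreeLabel) {
        int best = NO_CLUSTER;
        int bestSize = -1;
        for (int label : neighbourLabels) {
            if (label == NO_CLUSTER) {
                continue; // unclustered neighbours cannot donate a label
            }
            int size = clusterSizes.getOrDefault(label, 0);
            if (size > bestSize) {
                best = label;
                bestSize = size;
            }
        }
        // Fallback: start a new cluster so isolated dense grids can still form one.
        return best == NO_CLUSTER ? nextFreeLabel : best;
    }
}
```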

divided by 0

Hi!

When I used your code, I found a potential division-by-zero bug:

Exception in thread "main" java.lang.ArithmeticException: / by zero
at moa.clusterers.dstream.Dstream.trainOnInstanceImpl(Dstream.java:348)
at moa.clusterers.AbstractClusterer.trainOnInstance(AbstractClusterer.java:131)
at mytest.TestDStream.main(TestDStream.java:24)

My testing code is:

package mytest;

import com.yahoo.labs.samoa.instances.Instance;

import moa.cluster.Clustering;
import moa.clusterers.dstream.Dstream;
import moa.streams.clustering.ClusteringStream;
import moa.streams.clustering.FileStream;

public class TestDStream {

	public static void main(String[] args) {
		FileStream fStream = new FileStream();
		fStream.arffFileOption.setValue("test.arff");// set the ARFF file name
		fStream.normalizeOption.setValue(false);// set normalized to be true or false
		fStream.prepareForUse();
		int numLines = 0;
		//DStream
		Dstream dstream = new Dstream();
		dstream.resetLearning();
		ClusteringStream stream = fStream;
		while (stream.hasMoreInstances()) {
			Instance curr = stream.nextInstance().getData();
			dstream.trainOnInstance(curr);
			numLines++;
		}
		Clustering resDstream = dstream.getClusteringResult();
		dstream.getMicroClusteringResult();
		System.out.println("Size of result from Dstream: " + resDstream.size());
		System.out.println(numLines + " lines have been read");
	}
}

Thanks.
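For context on the trace above: in Java, the `/ by zero` ArithmeticException is raised only by integer (int or long) division or remainder, while floating-point division by zero yields NaN or an infinity without throwing. The short demonstration below shows the difference. It makes no claim about what line 348 of Dstream.java actually computes, and `DivisionByZeroDemo` is a hypothetical name.

```java
// Hypothetical demonstration class, unrelated to the repository's code.
final class DivisionByZeroDemo {

    // Integer division by zero throws ArithmeticException.
    // Returns true when the exception was raised.
    static boolean intDivisionThrows(int numerator, int denominator) {
        try {
            int ignored = numerator / denominator;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Floating-point division never throws; 0.0 / 0.0 evaluates to NaN.
    static double doubleDivision(double numerator, double denominator) {
        return numerator / denominator;
    }
}
```

This suggests the failing expression uses an integer divisor that is still zero when the first instances arrive, so a guard or a different initial value for that divisor would be the place to look.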
