richard-moulton / d-stream
Implementation of the D-Stream clustering algorithm for use in MOA. An earlier version is included as part of the MOA 17.06 release.
License: Apache License 2.0
Grid Clusters and Grid Groups are both defined in Chen and Tu 2007 (Definitions 3.6 and 3.4 respectively).
A Grid Cluster could degenerate to the point where it is no longer a Grid Group, and the adjustClustering method should check for this at some point. It remains to be investigated whether this check belongs in the cleanClusters method or should be implemented separately.
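The connectivity part of such a check could be sketched as follows. This is only an illustration, not code from this repository: the class `GridGroupCheck`, its methods, and the representation of a grid as an `int[]` of partition indices are all hypothetical. It assumes that two grids are neighbours when they differ by exactly one step in exactly one dimension and that the grids passed in are distinct. It tests only whether every grid can be reached from every other through neighbours; it does not test the density conditions on inside and outside grids.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper, not part of the d-stream code base. */
public class GridGroupCheck {

    /** Assumption: neighbours differ by exactly 1 in exactly one dimension. */
    public static boolean areNeighbours(int[] a, int[] b) {
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Math.abs(a[i] - b[i]);
        }
        return distance == 1;
    }

    /** Returns true if all grids are mutually reachable through neighbouring grids. */
    public static boolean isGridGroup(List<int[]> grids) {
        if (grids.isEmpty()) {
            return false;
        }
        boolean[] visited = new boolean[grids.size()];
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        visited[0] = true;
        int reached = 1;
        // Breadth-first search starting from the first grid.
        while (!queue.isEmpty()) {
            int[] current = grids.get(queue.poll());
            for (int i = 0; i < grids.size(); i++) {
                if (!visited[i] && areNeighbours(current, grids.get(i))) {
                    visited[i] = true;
                    reached++;
                    queue.add(i);
                }
            }
        }
        return reached == grids.size();
    }
}
```

A cluster that fails this test has split into disconnected pieces, which is the degenerate case described above.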
The datasets in Chen and Tu 2007 are preprocessed so that each numerical dimension is normalized and partitioned.
It isn't clear how (or whether) the D-Stream algorithm should handle numerical attributes. This implementation keeps track of the maximum and minimum values seen in each numerical dimension and partitions each dimension into integer bins, recalculating N as required.
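A minimal sketch of this kind of range tracking is shown below. It is not the code used in this repository: the class `RangeBinner` and its methods are hypothetical, and it assumes unit-width bins aligned to integer boundaries and that N is the product of the bin counts across dimensions.

```java
/** Hypothetical helper, not part of the d-stream code base. */
public class RangeBinner {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Widens the observed range; returns true if the range changed. */
    public boolean observe(double value) {
        boolean changed = false;
        if (value < min) { min = value; changed = true; }
        if (value > max) { max = value; changed = true; }
        return changed;
    }

    /** Number of unit-width integer bins covering the observed range. */
    public int binCount() {
        if (min > max) {
            return 0; // nothing observed yet
        }
        return (int) (Math.floor(max) - Math.floor(min)) + 1;
    }

    /** Bin index of a value; call observe(value) first so it lies in range. */
    public int binIndex(double value) {
        return (int) (Math.floor(value) - Math.floor(min));
    }

    /** Assumption: N is the product of the bin counts of all dimensions. */
    public static long totalGrids(RangeBinner[] dimensions) {
        long n = 1;
        for (RangeBinner d : dimensions) {
            n *= d.binCount();
        }
        return n;
    }
}
```

Whenever `observe` returns true, the bin count of that dimension may have changed, so N would need to be recomputed with `totalGrids`. Note that a new minimum also shifts the indices returned by `binIndex` for previously seen values.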
The procedure for dynamically adjusting clusters, Figure 4 of Chen and Tu 2007, describes adjusting a newly dense grid based on which of its neighbours belongs to the largest cluster. It is not stated what should be done if none of the grid's neighbours are clustered.
This implementation generates a new cluster in the adjustForDenseGrid method around the grid in question, using the initial clustering procedure. Without this, otherwise valid clusters would fail to form because they could not bootstrap themselves into existence.
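The decision described above can be illustrated with a small sketch. It is a simplification, not the adjustForDenseGrid method itself: the class `DenseGridAdjustment`, the `NO_CLASS` marker, and the parameters are hypothetical, and the fallback only returns a fresh label rather than running the initial clustering procedure.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of the d-stream code base. */
public class DenseGridAdjustment {
    /** Assumed marker for a grid that belongs to no cluster. */
    public static final int NO_CLASS = -1;

    /**
     * Picks the label of the largest cluster among the neighbours of a newly
     * dense grid. If no neighbour is clustered, returns nextFreeLabel so that
     * a new cluster can be started around the grid.
     */
    public static int chooseCluster(List<Integer> neighbourLabels,
                                    Map<Integer, Integer> clusterSizes,
                                    int nextFreeLabel) {
        int best = NO_CLASS;
        int bestSize = -1;
        for (int label : neighbourLabels) {
            if (label == NO_CLASS) {
                continue; // unclustered neighbour, nothing to join
            }
            int size = clusterSizes.getOrDefault(label, 0);
            if (size > bestSize) {
                best = label;
                bestSize = size;
            }
        }
        return best == NO_CLASS ? nextFreeLabel : best;
    }
}
```

The final line is the case that Figure 4 leaves open: with no clustered neighbour there is nothing to join, so a new label is handed out instead of leaving the dense grid unclustered.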
How to run the project?
Hi!
When I used your code, I found a potential division-by-zero bug:
Exception in thread "main" java.lang.ArithmeticException: / by zero
at moa.clusterers.dstream.Dstream.trainOnInstanceImpl(Dstream.java:348)
at moa.clusterers.AbstractClusterer.trainOnInstance(AbstractClusterer.java:131)
at mytest.TestDStream.main(TestDStream.java:24)
My testing code is:
package mytest;

import com.yahoo.labs.samoa.instances.Instance;
import moa.cluster.Clustering;
import moa.clusterers.dstream.Dstream;
import moa.streams.clustering.ClusteringStream;
import moa.streams.clustering.FileStream;

public class TestDStream {
    public static void main(String[] args) {
        FileStream fStream = new FileStream();
        fStream.arffFileOption.setValue("test.arff"); // set the ARFF file name
        fStream.normalizeOption.setValue(false); // set normalized to be true or false
        fStream.prepareForUse();
        int numLines = 0;

        // DStream
        Dstream dstream = new Dstream();
        dstream.resetLearning();
        ClusteringStream stream = fStream;
        while (stream.hasMoreInstances()) {
            Instance curr = stream.nextInstance().getData();
            dstream.trainOnInstance(curr);
            numLines++;
        }
        Clustering resDstream = dstream.getClusteringResult();
        dstream.getMicroClusteringResult();
        System.out.println("Size of result from Dstream: " + resDstream.size());
        System.out.println(numLines + " lines have been read");
    }
}
Thanks.