
Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)

Home Page: http://genieclust.gagolewski.com/


genie's Introduction

Genie (R Package)

This project has been superseded by genieclust (see also: GitHub), which features a faster and more feature-rich implementation of Genie (available for both R and Python).

A Fast and Robust Hierarchical Clustering Algorithm

The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, this constraint puts all the classical linkage criteria at a disadvantage, with the exception of single linkage. However, single linkage clustering is known to be very sensitive to outliers and to produce highly skewed dendrograms, so it usually does not reflect the true underlying structure of the analysed data unless the clusters are well separated.

To overcome these limitations, we proposed a hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that the Gini measure of inequity of the cluster sizes does not exceed a given threshold. This method most often outperforms the Ward or average linkage in terms of clustering quality on benchmark data.

At the same time, Genie retains the high speed of the single linkage approach, so it is also suitable for analysing larger data sets. The algorithm is easily parallelisable and thus may be run on multiple threads to speed up its execution further. Its memory overhead is small: there is no need to precompute the complete distance matrix to obtain a desired clustering.
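To make the Gini measure of the cluster sizes more concrete, here is a minimal Python sketch. It is not taken from the package sources. It assumes the normalised Gini index, i.e. the sum of absolute pairwise differences between cluster sizes divided by (n - 1) times their total, which is 0 for equally sized clusters and approaches 1 when one cluster dominates. The names `gini_index` and `exceeds_threshold` are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Sequence


def gini_index(sizes: Sequence[int]) -> float:
    """Normalised Gini index of cluster sizes (assumed form, see lead-in).

    Returns a value in [0, 1]: 0 when all clusters have the same size,
    values close to 1 when a single cluster holds almost all points.
    """
    n = len(sizes)
    total = sum(sizes)
    if n < 2 or total == 0:
        # With fewer than two clusters there is no inequity to measure.
        return 0.0
    # Sum of |c_i - c_j| over all unordered pairs of clusters.
    pairwise = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return pairwise / ((n - 1) * total)


def exceeds_threshold(sizes: Sequence[int], threshold: float) -> bool:
    """Hypothetical helper: is the current partition too unbalanced?"""
    return gini_index(sizes) > threshold


balanced = [5, 5, 5, 5]
skewed = [1, 1, 1, 97]
print(gini_index(balanced))            # equal sizes give 0.0
print(round(gini_index(skewed), 2))    # one dominant cluster gives 0.96
print(exceeds_threshold(skewed, 0.3))  # True for this illustrative threshold
```

A partition such as `skewed` is the kind of highly unbalanced outcome that plain single linkage tends to produce in the presence of outliers. A threshold on this index is what lets the linkage criterion steer away from it.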

A detailed description of the algorithm can be found in:

Gagolewski M., Bartoszuk M., Cena A., Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm, Information Sciences 363, 2016, 8–23. doi:10.1016/j.ins.2016.05.003.

See also:

Gagolewski M., genieclust: Fast and robust hierarchical clustering, SoftwareX 15, 2021, 100722. doi:10.1016/j.softx.2021.100722.

Authors: Marek Gagolewski, Maciej Bartoszuk, and Anna Cena

CRAN entry: https://cran.r-project.org/web/packages/genie/

genieclust: https://genieclust.gagolewski.com/, https://github.com/gagolews/genieclust/, https://cran.r-project.org/web/packages/genieclust/

genie's Issues

OWA-based linkages

Add support for the "smoothed" single linkage (OWAs over a few nearest neighbours) as described in our manuscript submitted for publication in Information Sciences.

installation of genie

I am trying to install the genie package (R 3.4.0, 64-bit Linux Mint) and get the message below. I can't tell from the message what the problem is. Can you help me out?

> install.packages("genie", dep = T)
Installing package into '/home/henk/R/x86_64-pc-linux-gnu-library/3.4'
(as 'lib' is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/genie_1.0.4.tar.gz'
Content type 'application/x-gzip' length 30845 bytes (30 KB)
==================================================
downloaded 30 KB

* installing *source* package ‘genie’ ...
** package ‘genie’ successfully unpacked and MD5 sums checked
** libs
g++ -std=gnu++11 -I/usr/share/R/include -DNDEBUG  -I"/home/henk/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include"   -fopenmp -fpic  -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
In file included from /home/henk/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp.h:34:0,
                 from RcppExports.cpp:4:
/home/henk/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/include/Rcpp/grow.h:73:47: fatal error: Rcpp/generated/grow__pairlist.h: No such file or directory
compilation terminated.
/usr/lib/R/etc/Makeconf:168: recipe for target 'RcppExports.o' failed
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘genie’
* removing ‘/home/henk/R/x86_64-pc-linux-gnu-library/3.4/genie’
Warning in install.packages :
  installation of package 'genie' had non-zero exit status

The downloaded source packages are in
	'/tmp/RtmpCiD3pB/downloaded_packages'

how to find the best thresholdGini?

Hi @gagolews,
What does thresholdGini mean in hclust2? Is it a cutoff above which an observation belongs to group A and below which it belongs to group B, or something else?
How can I find the best thresholdGini in hclust2?
