code-pcskm's Introduction

This repository contains the code used to produce the results in the manuscript "A semi-supervised sparse K-Means algorithm" (arXiv version).

Code-PCSKM

exeSimus.m: Runs the whole analysis and stores the results inside the ./GenRes/results folder. This file contains the following options:

  • DETERM: 0/1 start without or with a random seed.

  • JMPCKM_OVERLOAD: 0/1 use overloaded or non-overloaded MPCK-Means. The WekaUT library is used for the MPCK-Means algorithm. See Bilenko, M., et al. (2004).

  • CONSTR_PERC: 0/1 use a flat number of constraints or percentages based on size.

  • LOG: (0) no log file and no display, (1) log file only, (2) display only, (else) both display and log file.

  • constraints_type: Type of constraints to use; 0/1 to activate ML (must-link) and/or CL (cannot-link) constraints. When both are 1, an equal number of constraints per type is selected; when either is -1, random constraints are picked from all the available constraints.

  • constraints_number: flat number or percentage of constraints to use (see CONSTR_PERC).

  • citer: number of iterations per constraints setting.

  • sstep: step size for the sparsity parameter; values are tested from 1.1 to sqrt(dimensions) in increments of sstep.

  • maxIter: maximum number of iterations allowed for the algorithm to reach convergence.

  • kfolds: selection of k for k-fold validation.
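
The sstep, CONSTR_PERC, constraints_number and constraints_type options above can be illustrated with a small sketch. It is written in Python for illustration only and is not the repository's MATLAB code. The function names, the 0-100 percentage scale, the use of the number of samples as "size", and the derivation of ML/CL pairs from known labels are all assumptions made here, not details confirmed by the source.

```python
import math
import random
from itertools import combinations


def sparsity_grid(n_features, sstep):
    """Candidate sparsity values from 1.1 up to sqrt(n_features), spaced by sstep."""
    upper = math.sqrt(n_features)
    grid = []
    k = 0
    # Index-based construction limits floating-point drift from repeated addition.
    while 1.1 + k * sstep <= upper + 1e-12:
        grid.append(round(1.1 + k * sstep, 10))
        k += 1
    return grid


def resolve_constraint_count(constraints_number, n_samples, constr_perc):
    """Assumption: constr_perc=1 reads constraints_number as a percentage
    (0-100) of the number of samples; constr_perc=0 reads it as a flat count."""
    if constr_perc:
        return int(round(n_samples * constraints_number / 100.0))
    return int(constraints_number)


def sample_constraints(labels, n_constraints, use_ml, use_cl, seed=0):
    """Return (must_link, cannot_link) lists of index pairs.

    Assumption: pairs with equal labels are ML candidates, pairs with
    different labels are CL candidates.
    """
    rng = random.Random(seed)
    pairs = list(combinations(range(len(labels)), 2))
    ml = [p for p in pairs if labels[p[0]] == labels[p[1]]]
    cl = [p for p in pairs if labels[p[0]] != labels[p[1]]]

    if use_ml == -1 or use_cl == -1:
        # Random draw from all available constraints, regardless of type.
        pool = ml + cl
        chosen = rng.sample(pool, min(n_constraints, len(pool)))
        return ([p for p in chosen if labels[p[0]] == labels[p[1]]],
                [p for p in chosen if labels[p[0]] != labels[p[1]]])
    if use_ml and use_cl:
        # Both types active: split the budget equally between ML and CL.
        half = n_constraints // 2
        return (rng.sample(ml, min(half, len(ml))),
                rng.sample(cl, min(half, len(cl))))
    if use_ml:
        return rng.sample(ml, min(n_constraints, len(ml))), []
    if use_cl:
        return [], rng.sample(cl, min(n_constraints, len(cl)))
    return [], []
```

For example, under these assumptions a data set with 16 features and sstep = 0.5 yields a grid that starts at 1.1 and stops at the last value not exceeding sqrt(16) = 4, and a constraints_number of 10 with CONSTR_PERC = 1 on 200 samples resolves to 20 constraints.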

CVstatsPer.m: Generates statistics about the data sets such as percentage of used constraints during the k-fold validation.

Citations for software and code that we have used in this project

Density K-Means++:

Nidheesh, N., KA Abdul Nazeer, and P. M. Ameer. "An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data." Computers in biology and medicine 91 (2017): 213-221.

MATLAB code was based on the R implementation of the algorithm; code: dkmpp_0.1.0

MPCK-Means:

Bilenko, Mikhail, Sugato Basu, and Raymond J. Mooney. "Integrating constraints and metric learning in semi-supervised clustering." Proceedings of the twenty-first international conference on Machine learning. 2004.

WekaUT was modified to read initial centroids from text files and to write results to text files.

Sparse clustering:

Witten, Daniela M., and Robert Tibshirani. "A framework for feature selection in clustering." Journal of the American Statistical Association 105.490 (2010): 713-726.

Brodinová, Šárka, et al. "Robust and sparse k-means clustering for high-dimensional data." Advances in Data Analysis and Classification (2017): 1-28.

MATLAB code was based on the R implementation of the algorithm; packages: sparcl and wrsk
