K-Means Clustering:

K-means clustering is a classical clustering algorithm that uses an expectation-maximization-like technique to partition a set of data points into k clusters.
K-means clustering is commonly used for unsupervised classification. Because k-means is often run on large data sets, and because the nearest-centroid assignment of each point is independent of every other point, it is a good candidate for parallelization.

The goal of this project was to implement a framework in Java for performing k-means clustering using Hadoop MapReduce.

In this problem, the input is a set of n one-dimensional points, and the desired number of clusters is k = 3.
Once the k initial centers are chosen, the mapper calculates the (Euclidean) distance from every point in the set to each of the 3 centers and emits each point together with its nearest center. The reducer collects all of the points assigned to a particular centroid, calculates the new centroid as their mean, and emits it.
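The map and reduce logic described above can be sketched in plain Java. This is a minimal illustration, not the project's Hadoop code: it uses no Hadoop classes, and the names KMeansPass, nearest, mean and step are hypothetical. It assumes 1-D points (so Euclidean distance is just |x - c|), breaks distance ties in favor of the lower-indexed centroid, and lets a centroid with no assigned points keep its old value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of one k-means MapReduce pass over 1-D points.
// No Hadoop classes are used; "map" and "reduce" are ordinary methods here.
class KMeansPass {

    // Map logic: index of the centroid nearest to the point.
    // In 1-D, Euclidean distance is the absolute difference |x - c|.
    // Ties go to the lower-indexed centroid (an assumption of this sketch).
    static int nearest(double point, List<Double> centroids) {
        int best = 0;
        double bestDist = Math.abs(point - centroids.get(0));
        for (int i = 1; i < centroids.size(); i++) {
            double dist = Math.abs(point - centroids.get(i));
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    // Reduce logic: the new centroid is the mean of the points grouped under one key.
    static double mean(List<Double> points) {
        double sum = 0.0;
        for (double p : points) {
            sum += p;
        }
        return sum / points.size();
    }

    // One pass: "emit" (centroid index, point) pairs, group them by key
    // (Hadoop's shuffle does this between map and reduce), then reduce each group.
    // A centroid that receives no points keeps its old value (another assumption).
    static List<Double> step(List<Double> points, List<Double> centroids) {
        Map<Integer, List<Double>> groups = new HashMap<>();
        for (double p : points) {
            groups.computeIfAbsent(nearest(p, centroids), key -> new ArrayList<>()).add(p);
        }
        List<Double> updated = new ArrayList<>();
        for (int i = 0; i < centroids.size(); i++) {
            List<Double> members = groups.get(i);
            updated.add(members == null ? centroids.get(i) : mean(members));
        }
        return updated;
    }
}
```

With the sample centroids 20, 30, 40 and the sample data listed below, one call to step yields centroids 21.0, 30.6 and 43.0. The points 25 and 35 each sit exactly between two centroids, so the tie rule affects this result.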

Termination Condition:
When the difference between the old and new centroids is less than or equal to 0.1.
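A minimal sketch of this termination check, assuming the difference is measured per centroid and that convergence requires every centroid to have moved by at most 0.1 (how the differences of the 3 centroids are combined is an assumption here). The class name Convergence is hypothetical.

```java
import java.util.List;

// Termination check: converged only if every centroid moved by at most THRESHOLD.
// Per-centroid comparison is an assumption of this sketch.
class Convergence {
    static final double THRESHOLD = 0.1;

    static boolean converged(List<Double> oldCentroids, List<Double> newCentroids) {
        if (oldCentroids.size() != newCentroids.size()) {
            throw new IllegalArgumentException("centroid lists differ in length");
        }
        for (int i = 0; i < oldCentroids.size(); i++) {
            // Any single centroid moving more than the threshold means "keep iterating".
            if (Math.abs(oldCentroids.get(i) - newCentroids.get(i)) > THRESHOLD) {
                return false;
            }
        }
        return true;
    }
}
```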

		
Algorithm:
Step1: Initially, the centroids are selected randomly based on the data. In our implementation we used 3 centroids.
Step2: The input files contain the initial centroids and the data.
Step3: In the Mapper class, the "configure" function first opens the centroid file, reads the centroids, and stores them in a data structure (we used an ArrayList).
Step4: The mapper reads the data file and, for each point, emits the nearest centroid together with the point to the reducer.
Step5: The reducer collects this data, calculates the new corresponding centroids, and emits them.
Step6: In the job configuration, we read both the old and the new centroid files and check:
		if the difference between the old and new centroids is less than 0.1, then
			convergence is reached
		else
			repeat from Step2 with the new centroids.
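Steps 1 to 6 can be sketched as a single-process Java loop in which each iteration stands in for one MapReduce job and in-memory lists stand in for the centroid and data files. This is an illustration under stated assumptions, not the project's driver: the names KMeansDriver, parseLines, iterate, run and MAX_ITERATIONS are hypothetical, the iteration cap is an addition, ties go to the lower-indexed centroid, and the convergence test follows Step6's strict "less than 0.1" applied to every centroid.

```java
import java.util.ArrayList;
import java.util.List;

// Single-process sketch of the iterative k-means driver (Steps 1-6).
// Each loop iteration stands in for one MapReduce job; in-memory lists
// stand in for the centroid and data files. All names are hypothetical.
class KMeansDriver {
    static final double THRESHOLD = 0.1;
    static final int MAX_ITERATIONS = 100; // safety cap added for this sketch

    // Final centroids plus the number of passes that were run.
    static final class Result {
        final List<Double> centroids;
        final int iterations;

        Result(List<Double> centroids, int iterations) {
            this.centroids = centroids;
            this.iterations = iterations;
        }
    }

    // Step 3: parse one number per line (blank lines skipped), like the sample files.
    static List<Double> parseLines(List<String> lines) {
        List<Double> values = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                values.add(Double.parseDouble(trimmed));
            }
        }
        return values;
    }

    // Steps 4-5: assign each point to its nearest centroid, then average each group.
    // Ties go to the lower-indexed centroid; an empty cluster keeps its old centroid.
    static List<Double> iterate(List<Double> points, List<Double> centroids) {
        int k = centroids.size();
        double[] sums = new double[k];
        int[] counts = new int[k];
        for (double p : points) {
            int best = 0;
            for (int i = 1; i < k; i++) {
                if (Math.abs(p - centroids.get(i)) < Math.abs(p - centroids.get(best))) {
                    best = i;
                }
            }
            sums[best] += p;
            counts[best]++;
        }
        List<Double> updated = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            updated.add(counts[i] == 0 ? centroids.get(i) : sums[i] / counts[i]);
        }
        return updated;
    }

    // Step 6: stop once every centroid moved by less than THRESHOLD,
    // otherwise repeat with the new centroids.
    static Result run(List<Double> points, List<Double> initial) {
        List<Double> current = initial;
        for (int iter = 1; iter <= MAX_ITERATIONS; iter++) {
            List<Double> next = iterate(points, current);
            boolean converged = true;
            for (int i = 0; i < next.size(); i++) {
                if (Math.abs(next.get(i) - current.get(i)) >= THRESHOLD) {
                    converged = false;
                    break;
                }
            }
            if (converged) {
                return new Result(next, iter);
            }
            current = next;
        }
        return new Result(current, MAX_ITERATIONS);
    }
}
```

Fed the sample centroid and data lines below, run stops after two passes with centroids 21.0, 30.6 and 43.0: the first pass moves the centroids, and the second pass reproduces the same clusters, so no centroid moves.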
		  

Samples:
For the centroid file, something like this should be fine:
20.0
30.0
40.0

For the data file, something this simple should work:
20
23
19
29
33
29
43
35
18
25
27
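Working the first iterations by hand on these samples shows what each MapReduce pass produces. In 1-D the Euclidean distance is |x - c|. Two points are equidistant from two centroids (25 from 20 and 30, 35 from 30 and 40); this calculation assumes ties go to the lower centroid, and a different tie rule would give different clusters.

```latex
\begin{aligned}
&\text{Iteration 1, centroids } (20, 30, 40):\\
&C_1 = \{20, 23, 19, 18, 25\}, \quad \mu_1 = 105/5 = 21.0\\
&C_2 = \{29, 33, 29, 35, 27\}, \quad \mu_2 = 153/5 = 30.6\\
&C_3 = \{43\}, \quad \mu_3 = 43.0\\
&\text{Shifts } |21 - 20| = 1,\ |30.6 - 30| = 0.6,\ |43 - 40| = 3 \text{, all exceed } 0.1 \Rightarrow \text{repeat}\\
&\text{Iteration 2, centroids } (21.0, 30.6, 43.0): \text{ same clusters, all shifts } 0 \Rightarrow \text{converged}
\end{aligned}
```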

Contributors:

himank
