
joshlk / k-means-constrained


K-Means clustering - constrained with minimum and maximum cluster size. Documentation: https://joshlk.github.io/k-means-constrained

Home Page: https://github.com/joshlk/k-means-constrained

License: BSD 3-Clause "New" or "Revised" License

Python 32.34% Jupyter Notebook 62.97% Makefile 0.44% Batchfile 0.17% Cython 4.08%
clustering k-means kmeans-constrained maximum-cluster-sizes minimum-cluster-sizes ml optimization python

k-means-constrained's People

Contributors

esmail, joshlk

k-means-constrained's Issues

Possibility to use this on k-modes

I want to use exactly the constrained clusters, as this package offers, since I'd like to create equal-sized clusters.

However, I'm working with categorical data. Is there also a k-modes (or k-prototype) alternative for this algorithm?

Constrained K-Means not implemented to sparse Matrix

I'm trying to apply a Constrained K-means to my data and I get this error "NotImplementedError: Not implemented for sparse X"

Originally I have a dataframe with 132034 rows of titles. I convert them to a list and then apply a tf-idf fit_transform to it.

The result is a 132034x17693 sparse matrix of type '<class 'numpy.float64'>' with 694509 stored elements in Compressed Sparse Row format.

Then, I try to apply the model

from k_means_constrained import KMeansConstrained

true_k = 25
smin = 5300
smax = 13200
model = KMeansConstrained(n_clusters=true_k, size_min=smin, size_max=smax, random_state=0,
                          init='k-means++', n_init=10, max_iter=1000)

true_k was chosen with the elbow method using ordinary k-means.
smin and smax are both based on hypotheses about the sample.

But I get this error and can't get past it. The usual K-means runs without any problem. I have 128 GB of RAM, so it's not a lack of memory either.
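The error message suggests the estimator only accepts dense input. One possible workaround, sketched here rather than taken from the package, is to project the tf-idf matrix onto a smaller dense space before clustering. `TruncatedSVD` comes from scikit-learn, and `n_components=100` is an arbitrary illustrative value; `dense_size_gib` and `cluster_sparse` are hypothetical helper names. The helper also shows that a plain dense copy of this matrix would need roughly 17 GiB, and that 25 clusters with a minimum of 5300 members need 132,500 rows, which is more than the 132,034 available, so size_min would have to be lowered in any case.

```python
def dense_size_gib(n_rows, n_cols, itemsize=8):
    """Memory needed to hold an n_rows x n_cols float64 array, in GiB."""
    return n_rows * n_cols * itemsize / 2**30


def cluster_sparse(X_sparse, n_clusters, size_min, size_max, n_components=100):
    """Reduce a sparse matrix to a dense one, then run constrained k-means.

    Assumption: scikit-learn and k-means-constrained are installed.
    n_components=100 is an illustrative value, not a recommendation.
    """
    # Imported lazily so dense_size_gib works without these packages.
    from sklearn.decomposition import TruncatedSVD
    from k_means_constrained import KMeansConstrained

    svd = TruncatedSVD(n_components=n_components, random_state=0)
    X_dense = svd.fit_transform(X_sparse)  # dense ndarray of shape (n_rows, n_components)
    model = KMeansConstrained(n_clusters=n_clusters, size_min=size_min,
                              size_max=size_max, random_state=0)
    return model.fit_predict(X_dense)


# Figures taken from the report above.
n_rows, n_cols, true_k, smin = 132034, 17693, 25, 5300
print(f"dense copy: {dense_size_gib(n_rows, n_cols):.1f} GiB")
print("size_min feasible:", true_k * smin <= n_rows)
```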

Python package dependency issue

Hello! I was hoping to use your k-means-constrained package to get equal cluster sizes. However, I have run into a conflict where different Python packages need different numpy versions. Your package requires numpy >1.22, whereas numba (a dependency of the umap package) requires numpy <=1.21. Hence, I am unable to cluster data from umap with your package because of the numpy version requirement.

With the latest update, numba can only use numpy <=1.21 so I can't upgrade it any further.

Do you happen to know a way around this?

Thanks!

Warning correction

Hello,

Amazing repo and very useful algorithm !

I opened this issue to suggest fixing a deprecation that generates a lot of warning messages when running the fit_predict method of KMeansConstrained:

"""
k_means_constrained/sklearn_import/metrics/pairwise.py:575: DeprecationWarning: np.float is a deprecated alias for the builtin float. To silence this warning, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
dtype = np.float
"""

In a nutshell, it suggests replacing np.float with float (or np.float64) in pairwise.py :)
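Until the source is patched, the messages can be silenced on the caller's side. This is a minimal sketch using only the standard `warnings` module; `quiet_deprecations` is a hypothetical helper name, and the module prefix is read off the path in the warning above.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_deprecations(module_prefix):
    """Temporarily ignore DeprecationWarning raised from one package."""
    with warnings.catch_warnings():
        # `module` is a regex matched against the start of the module name.
        warnings.filterwarnings(
            "ignore",
            category=DeprecationWarning,
            module=re.escape(module_prefix),
        )
        yield


# Intended usage (assumes `model` and `X` already exist):
# with quiet_deprecations("k_means_constrained.sklearn_import"):
#     labels = model.fit_predict(X)
```

The filter is removed again when the `with` block exits, so deprecation warnings from other code stay visible.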

Resource intensity

Hi,

It seems quite resource-intensive compared to regular k-means. I'm working on a dataset with 23,000 examples and 50 features, and it took almost 15 minutes to find 10 clusters. I set the minimum to 0.5x the mean cluster size (1150) and the maximum to 2x the mean cluster size (4600). Does that seem reasonable, or is something possibly wrong with my setup? Those don't seem like very unreasonable constraints.

I would like to go larger, though: maybe 1M examples and 150 features, and up to 20 or 50 clusters. Do you have any advice on this? How many iterations is it reasonable to use? How big a dataset is reasonable if we would like it to complete in less than 15 minutes? Do you have experience to share on finding cluster centers from a sub-sample instead of the full sample? Or any papers or other references you would recommend on these questions?

Thanks, any advice helps!
Carl
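The sub-sampling idea raised above can be sketched as follows. This is only an illustration under stated assumptions: `subsample_plan` and `assignment_arcs` are hypothetical helpers, the size bounds are shrunk in proportion to the sample, and the arc count assumes the assignment step links every sample to every cluster, which is a guess about the problem size rather than a measured cost.

```python
import random


def assignment_arcs(n_samples, n_clusters):
    """Rough problem size, assuming one arc per (sample, cluster) pair."""
    return n_samples * n_clusters


def subsample_plan(n_samples, sample_size, size_min, size_max, seed=0):
    """Pick a random sub-sample and shrink the size bounds in proportion."""
    indices = sorted(random.Random(seed).sample(range(n_samples), sample_size))
    scaled_min = size_min * sample_size // n_samples        # round down
    scaled_max = -(-size_max * sample_size // n_samples)    # round up
    return indices, scaled_min, scaled_max


# Numbers from the question: 23,000 rows, bounds 1150 and 4600, 10% sample.
indices, smin, smax = subsample_plan(23_000, 2_300, 1150, 4600)
print(len(indices), smin, smax)
print(assignment_arcs(23_000, 10), assignment_arcs(1_000_000, 50))
```

The centres found on the sub-sample could then be used to label the remaining rows, at the cost of the size bounds no longer being guaranteed on the full data.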

Problem with max size of clusters

Hi

Thank you for sharing your code! I used it to cluster my data into 10 clusters with size_min = 3 and size_max = 5. Unfortunately, it sometimes returns clusters with more than size_max elements; for example, it occasionally gives me a cluster with 7 elements.
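A quick way to confirm such a report is to count the members of each cluster and list the ones outside the bounds. The sketch below uses only the standard library; `size_violations` is a hypothetical helper, and the labels are made up to mimic a cluster of 7.

```python
from collections import Counter


def size_violations(labels, size_min, size_max):
    """Return {cluster: size} for clusters outside [size_min, size_max]."""
    counts = Counter(labels)
    return {c: n for c, n in counts.items() if not size_min <= n <= size_max}


# Made-up labels: cluster 0 has 3 members, cluster 1 has 7.
labels = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
print(size_violations(labels, size_min=3, size_max=5))
```

With a fitted model, passing `model.labels_.tolist()` should work the same way, assuming `labels_` is a NumPy array as in scikit-learn estimators.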

[BUG] Incompatible with `ortools >= 9.4` (`No module named 'ortools.graph.pywrapgraph'`)

Describe the bug
k-means-constrained is incompatible with ortools version 9.4+

Minimum working example
from k_means_constrained import KMeansConstrained

Results in:

...
    from k_means_constrained import KMeansConstrained
  File "/home/ubuntu/hbx/python/lib/python3.8/site-packages/k_means_constrained/__init__.py", line 4, in <module>
    from .k_means_constrained_ import KMeansConstrained
  File "/home/ubuntu/hbx/python/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py", line 29, in <module>
    from k_means_constrained.mincostflow_vectorized import SimpleMinCostFlowVectorized
  File "/home/ubuntu/hbx/python/lib/python3.8/site-packages/k_means_constrained/mincostflow_vectorized.py", line 4, in <module>
    from ortools.graph.pywrapgraph import SimpleMinCostFlow
ModuleNotFoundError: No module named 'ortools.graph.pywrapgraph'

Versions:

  • Python: Python 3.8.10
  • Operating system: Linux
  • k-means-constrained: k-means-constrained==0.5.1
  • numpy: numpy==1.23.1
  • scipy: scipy==1.8.1
  • ortools: ortools==9.4.1874
  • joblib: joblib==1.1.0
  • cython (if installed):
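A version-tolerant import is one way to cope with a module that moved between releases. The sketch below is generic and standard-library only; `import_first` is a hypothetical helper. The second module path is an assumption about where newer OR-Tools releases expose the solver and should be verified against the installed version, and method names may also differ between the two locations.

```python
import importlib


def import_first(candidates):
    """Return the first module in `candidates` that imports cleanly."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported:\n" + "\n".join(errors))


# Old location (from the traceback above) first, then the assumed new one.
ORTOOLS_FLOW_MODULES = [
    "ortools.graph.pywrapgraph",
    "ortools.graph.python.min_cost_flow",
]

# Intended usage (requires ortools to be installed):
# flow_module = import_first(ORTOOLS_FLOW_MODULES)
# SimpleMinCostFlow = flow_module.SimpleMinCostFlow
```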

[Enhancement] Add support for sample_weight in the fit function

The scikit-learn KMeans algorithm allows support for supplying a weight for each sample in the fit function. See the docs here.

Is this possible to add into the algorithm? i.e. can we have the minimum and maximum bounds account for the sum of all weights instead of the count of all samples? I haven't read into the MinCostFlow algorithm so I don't know how feasible this would be.

Facing an issue when running

I get the following error:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Can't install k-means-constrained

When I try to run pip install k-means-constrained, it won't work. Below is the message. Is there a way to get around this?

❯ pip install k-means-constrained
Collecting k-means-constrained
  Using cached k-means-constrained-0.7.2.tar.gz (2.6 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting ortools>=9.4.1874 (from k-means-constrained)
  Using cached ortools-9.6.2534-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.4 MB)
Requirement already satisfied: scipy>=1.6.3 in /home/user-name/miniconda3/envs/kmeans/lib/python3.9/site-packages (from k-means-constrained) (1.9.3)
Requirement already satisfied: numpy>=1.23.0 in /home/user-name/miniconda3/envs/kmeans/lib/python3.9/site-packages (from k-means-constrained) (1.23.5)
Requirement already satisfied: six in /home/user-name/miniconda3/envs/kmeans/lib/python3.9/site-packages (from k-means-constrained) (1.16.0)
Requirement already satisfied: joblib in /home/user-name/miniconda3/envs/kmeans/lib/python3.9/site-packages (from k-means-constrained) (1.1.1)
Collecting absl-py>=0.13 (from ortools>=9.4.1874->k-means-constrained)
  Using cached absl_py-1.4.0-py3-none-any.whl (126 kB)
Collecting protobuf>=4.21.12 (from ortools>=9.4.1874->k-means-constrained)
  Using cached protobuf-4.23.3-cp37-abi3-manylinux2014_x86_64.whl (304 kB)
Collecting scipy>=1.6.3 (from k-means-constrained)
  Using cached scipy-1.10.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)
Building wheels for collected packages: k-means-constrained
  Building wheel for k-means-constrained (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for k-means-constrained (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [49 lines of output]
      <string>:9: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!
      
              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************
      
      !!
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-39
      creating build/lib.linux-x86_64-cpython-39/k_means_constrained
      copying k_means_constrained/k_means_constrained_.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained
      copying k_means_constrained/__init__.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained
      creating build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import
      copying k_means_constrained/sklearn_import/funcsigs.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import
      copying k_means_constrained/sklearn_import/exceptions.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import
      copying k_means_constrained/sklearn_import/fixes.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import
      copying k_means_constrained/sklearn_import/base.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import
      copying k_means_constrained/sklearn_import/__init__.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import
      creating build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/preprocessing
      copying k_means_constrained/sklearn_import/preprocessing/data.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/preprocessing
      copying k_means_constrained/sklearn_import/preprocessing/__init__.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/preprocessing
      creating build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/externals
      copying k_means_constrained/sklearn_import/externals/funcsigs.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/externals
      copying k_means_constrained/sklearn_import/externals/__init__.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/externals
      creating build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/cluster
      copying k_means_constrained/sklearn_import/cluster/k_means_.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/cluster
      copying k_means_constrained/sklearn_import/cluster/__init__.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/cluster
      creating build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/metrics
      copying k_means_constrained/sklearn_import/metrics/pairwise.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/metrics
      copying k_means_constrained/sklearn_import/metrics/__init__.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/metrics
      creating build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/utils
      copying k_means_constrained/sklearn_import/utils/validation.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/utils
      copying k_means_constrained/sklearn_import/utils/fixes.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/utils
      copying k_means_constrained/sklearn_import/utils/extmath.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/utils
      copying k_means_constrained/sklearn_import/utils/sparsefuncs.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/utils
      copying k_means_constrained/sklearn_import/utils/__init__.py -> build/lib.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/utils
      running build_ext
      building 'k_means_constrained.sklearn_import.cluster._k_means' extension
      creating build/temp.linux-x86_64-cpython-39
      creating build/temp.linux-x86_64-cpython-39/k_means_constrained
      creating build/temp.linux-x86_64-cpython-39/k_means_constrained/sklearn_import
      creating build/temp.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/cluster
      gcc -pthread -B /home/user-name/miniconda3/envs/kmeans/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /home/user-name/miniconda3/envs/kmeans/include -I/home/user-name/miniconda3/envs/kmeans/include -fPIC -O2 -isystem /home/user-name/miniconda3/envs/kmeans/include -fPIC -I/tmp/pip-build-env-a9dlicfw/overlay/lib/python3.9/site-packages/numpy/core/include -I/home/user-name/miniconda3/envs/kmeans/include/python3.9 -c k_means_constrained/sklearn_import/cluster/_k_means.c -o build/temp.linux-x86_64-cpython-39/k_means_constrained/sklearn_import/cluster/_k_means.o
      error: command 'gcc' failed: No such file or directory
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for k-means-constrained
Failed to build k-means-constrained
ERROR: Could not build wheels for k-means-constrained, which is required to install pyproject.toml-based projects

Fitting the k-means-constrained on training samples and predicting on test samples raises error

Hi,
I'm trying to fit the k-means-constrained on training samples and then call it to predict test samples. I am getting the following error message:

~\anaconda3\lib\site-packages\k_means_constrained\k_means_constrained_.py in predict(self, X, size_min, size_max)
708 raise ValueError("size_max must be larger than size_min")
709 if size_min * n_clusters > n_samples:
--> 710 raise ValueError("The product of size_min and n_clusters cannot exceed the number of samples (X)")
711
712 labels, inertia = \

ValueError: The product of size_min and n_clusters cannot exceed the number of samples (X)

It seems there is not enough data in the test sample to meet the cluster size constraints (here size_min). Is there a way to apply the cluster size constraints only during fitting and not during prediction?
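One workaround that avoids the size check altogether is to label test points by their nearest fitted centre. This is a sketch of plain nearest-centroid assignment, not the package's own predict logic; `nearest_centroid_labels` is a hypothetical helper, and the centroid values stand in for a fitted model's `cluster_centers_`.

```python
import math


def nearest_centroid_labels(points, centroids):
    """Label each point with the index of its closest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


centroids = [(0.0, 0.0), (10.0, 10.0)]   # stand-in for clf.cluster_centers_
test_points = [(1.0, 2.0), (9.0, 8.0), (4.0, 4.0)]
print(nearest_centroid_labels(test_points, centroids))
```

The resulting labels ignore size_min and size_max, which is usually what is wanted when the test set is much smaller than the training set.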

Cannot install on Docker

Background

I have the following Dockerfile:

FROM python:3.7

ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

ADD . /app

CMD gunicorn -b :$PORT main:app --timeout 60

My requirements.txt file contains k-means-constrained==0.3.3.

Problem

I get the following error on the add-requirements step:

    ERROR: Command errored out with exit status 1:
     command: /env/bin/python3.7 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-08ldu34e/k-means-constrained/setup.py'"'"'; __file__='"'"'/tmp/pip-install-08ldu34e/k-means-constrained/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-zj209e8i
         cwd: /tmp/pip-install-08ldu34e/k-means-constrained/
    Complete output (5 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-08ldu34e/k-means-constrained/setup.py", line 9, in <module>
        from Cython.Build import cythonize
    ModuleNotFoundError: No module named 'Cython'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
The command '/bin/sh -c pip install -r /app/requirements.txt' returned a non-zero code: 1
ERROR

Attempted solutions

  • Changed base image to python:3.7, but got the same result. So the base image is not the problem.
  • Added Cython to requirements.txt above k-means-constrained, but still no luck.

"Relative Tolerance" Definition

There is an error in the documentation that shadows a previous error in the KMeans documentation: specifically, it states that the relative tolerance is with respect to the inertia, when in fact it is with respect to the norm of the change in centroid positions.

See: scikit-learn/scikit-learn#16058

Incidentally, it might be helpful when outputting data when verbose=True to include the current norm at each iteration. For example:

Iteration 51, inertia 497.000

would become:

Iteration 51, inertia 497.000, centre-shift 0.01425.

That being said sklearn doesn't do that so for the sake of compatibility, probably best not to.
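To make the quantity concrete, here is a minimal sketch of the centre-shift value computed as the Frobenius norm of the change in centroid positions. The centroid values are invented, `centre_shift` is a hypothetical helper, and how the library scales its tolerance before comparing is not shown here.

```python
import math


def centre_shift(old_centres, new_centres):
    """Frobenius norm of the change in centroid positions."""
    return math.sqrt(sum(
        (a - b) ** 2
        for old, new in zip(old_centres, new_centres)
        for a, b in zip(old, new)
    ))


old = [(0.0, 0.0), (1.0, 1.0)]
new = [(3.0, 4.0), (1.0, 1.0)]   # only the first centre moved
shift = centre_shift(old, new)
print(f"Iteration 51, inertia 497.000, centre-shift {shift:.5f}")
```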

Issue in importing k-means-constrained in Google Colab notebook

Hi Josh,

I am having an issue importing the k-means-constrained package in my Colab notebook. This was working a week ago, but I have been receiving the following error this week and was wondering if there is a package incompatibility due to the numpy version I am using:

RuntimeError                              Traceback (most recent call last)
__init__.pxd in numpy.import_array()

RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf . Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem .

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
[<ipython-input-11-89081c05e795>](https://localhost:8080/#) in <cell line: 1>()
----> 1 from k_means_constrained import KMeansConstrained
      2 get_ipython().system('pip install numpy --upgrade')
      3 import numpy as np

2 frames
[/usr/local/lib/python3.10/dist-packages/k_means_constrained/sklearn_import/metrics/pairwise.py](https://localhost:8080/#) in <module>
      8 from joblib import cpu_count, delayed, Parallel
      9 
---> 10 from k_means_constrained.sklearn_import.metrics.pairwise_fast import _sparse_manhattan
     11 
     12 from k_means_constrained.sklearn_import.preprocessing.data import normalize

k_means_constrained/sklearn_import/metrics/pairwise_fast.pyx in init k_means_constrained.sklearn_import.metrics.pairwise_fast()

__init__.pxd in numpy.import_array()

ImportError: numpy.core.multiarray failed to import

Could you please let me know how to resolve this? I have tried installing and uninstalling numpy, and trying to install specific versions of numpy but am still receiving this error. Thank you!

Different constraints per cluster

Is there a way to have different constraints for different clusters? For example: I want three clusters, one with a maximum of 6 points and two with a maximum of 4.

How to classify new instances after obtaining a constrained clustering

Hi there,

I want to use constrained k-means for clustering instances, but these instances are divided into two parts (say, instance sets I1 and I2). After I obtain the clustering result from I1, I want to obtain the labels of I2 according to that clustering. I cannot simply do this:

clf.fit(I1)
clf.predict(I2)

This is because the cluster size constraints set during fitting are also applied during prediction.

For example, I1 has 2,000 instances and I2 has 500 instances. If I set a minimum cluster size of 50, a maximum cluster size of 200 and 30 clusters as constraints in clf.fit(I1), then clf.predict(I2) reports an error saying that the number of instances must be at least size_min times the number of clusters. That is, prediction needs at least 50*30 = 1500 instances, but I only have 500 instances in I2.
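The arithmetic above can be checked before calling fit or predict. The first condition mirrors the check quoted in the traceback of the earlier predict issue (size_min * n_clusters > n_samples); the second is the analogous upper-bound check and is an assumption. `check_size_bounds` is a hypothetical helper.

```python
def check_size_bounds(n_samples, n_clusters, size_min, size_max):
    """Return a list of reasons the bounds cannot be met (empty if feasible)."""
    problems = []
    if size_min * n_clusters > n_samples:
        problems.append(
            f"size_min * n_clusters = {size_min * n_clusters} exceeds {n_samples} samples"
        )
    if size_max * n_clusters < n_samples:
        problems.append(
            f"size_max * n_clusters = {size_max * n_clusters} is below {n_samples} samples"
        )
    return problems


print(check_size_bounds(2000, 30, 50, 200))  # I1: fitting
print(check_size_bounds(500, 30, 50, 200))   # I2: prediction
```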

Installation Command

The installation command pip install k-mean-constrained needs to be changed to pip install k-means-constrained.

[BUG] import error with python 3.8 and numpy < 1.20

Describe the bug
When installing k-means-constrained in a Python 3.8 environment with numpy < 1.20 (e.g. 1.19.5) on Linux, the import fails due to a numpy binary incompatibility.

Details:
I think the root cause is that during package installation (PEP 517), the numpy used to compile the package is numpy>1.13 (as specified in pyproject.toml). On Python 3.8 this resolves to 1.20 (on Python 3.6 it resolves to 1.19.5).
So the package is built against numpy 1.20, but if my environment has a different numpy (1.19.5), there is a binary incompatibility.
Maybe it's better to remove numpy from pyproject.toml and let the build use the installed numpy instead?

Minimum working example

Python 3.8.9 (default, Apr  3 2021, 01:00:00) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from k_means_constrained import KMeansConstrained
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/e101364/views/1v_main/Qstreams/provision-algobb/assaf/lib/python3.8/site-packages/k_means_constrained/__init__.py", line 4, in <module>
    from .k_means_constrained_ import KMeansConstrained
  File "/home/e101364/views/1v_main/Qstreams/provision-algobb/assaf/lib/python3.8/site-packages/k_means_constrained/k_means_constrained_.py", line 18, in <module>
    from .sklearn_import.metrics.pairwise import euclidean_distances
  File "/home/e101364/views/1v_main/Qstreams/provision-algobb/assaf/lib/python3.8/site-packages/k_means_constrained/sklearn_import/metrics/pairwise.py", line 10, in <module>
    from k_means_constrained.sklearn_import.metrics.pairwise_fast import _sparse_manhattan
  File "k_means_constrained/sklearn_import/metrics/pairwise_fast.pyx", line 1, in init k_means_constrained.sklearn_import.metrics.pairwise_fast
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

Versions:

  • Python: 3.8.9
  • Operating system: linux (ubuntu18.04)
  • k-means-constrained: 0.5.2
  • numpy: 1.19.5
  • scipy: 1.6.3
  • ortools: 9.0.9048
  • joblib: 1.0.1
  • cython (if installed):

Thanks
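Since the report attributes the failure to a mismatch between the numpy used at build time and the one installed at run time, collecting the installed versions is a useful first diagnostic. This sketch uses only `importlib.metadata` from the standard library (Python 3.8+); `installed_versions` is a hypothetical helper, and the package list follows the issue template.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(
    ["k-means-constrained", "numpy", "scipy", "ortools", "joblib", "Cython"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

This reads package metadata without importing the packages, so it still works when the import itself fails.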

Issue with min cost flow input

I'm receiving the following error when calling KMeansConstrained.predict():

"There was an issue with the min cost flow input."

This model contains 5 clusters and is running on 850 data points. During training, I have set the min and max size to be 140 and 200 respectively. If I relax the size constraint to 120, 220, predict() works fine.

Usage for size_max < sample size

Describe the bug
Not a bug but a question. Fitting KMeansConstrained with X.shape[0] < size_max throws ValueError: size_min and size_max must be a positive number smaller than the number of data points or None, which I understand. However, in my case, this may be violated without any consequence to the output. See MWE below.

Minimum working example

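import numpy as np
from k_means_constrained import KMeansConstrained

# Two samples, but size_max=3 exceeds the number of samples.
X = np.array([[0, 0, 0], [0, 0, 0]])
clst = KMeansConstrained(
    n_clusters=1,
    size_min=1,
    size_max=3,
)
clst.fit(X)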

In this case, it should fit a single cluster. Not the biggest of deals as I could implement a try/except or pre-check input array shape and size_max similar to the source code to bypass the ValueError. I am just wondering if this is an edge case.

Some context: In the analysis I am trying to run, I am running

n = math.ceil(X.shape[0] / 3)
clst = KMeansConstrained(n_clusters=n, size_min=1, size_max=3)
clst.fit(X)

over different Xs, most of which have >100 samples, apart from a few odd ones that have only 1-2 samples. Again, I could work my way around it; I am just wondering about the size_max < sample size check.

Thanks for the great library!
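The pre-check mentioned above can be sketched as a small helper that clamps the bounds to the sample count before constructing the estimator. `plan_clusters` is a hypothetical name. If the library's check also rejects a size_max equal to the number of samples, passing size_max=None for such inputs may be needed instead, since the error message lists None as accepted.

```python
import math


def plan_clusters(n_samples, size_min=1, size_max=3):
    """Pick n_clusters and clamp the bounds so they never exceed n_samples."""
    n_clusters = math.ceil(n_samples / size_max)
    return n_clusters, min(size_min, n_samples), min(size_max, n_samples)


print(plan_clusters(2))     # the two-sample edge case from the example
print(plan_clusters(100))   # a typical larger input
```

The returned tuple maps directly onto the n_clusters, size_min and size_max arguments used in the snippet above.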

Segmentation fault when import k_means_constrained

Hi,
I successfully installed k_means_constrained with pip but when I try to import it in python I get a segmentation fault error, as follows.

$ python3
Python 3.9.0 (default, Nov 21 2020, 14:01:50)
[Clang 12.0.0 (clang-1200.0.32.27)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import k_means_constrained
Segmentation fault: 11

Versions:

  • Python: 3.9.0
  • Operating system: MacOS 10.15.4
  • k-means-constrained: 0.7.3
  • numpy: 1.23.2
  • scipy: 1.9.3
  • ortools: 9.8.3296
  • joblib: 1.2.0

Weighting observations

As far as I can tell, it is not currently possible to give weights to observations. One obvious workaround is to duplicate points but this would be very inefficient. Has this been considered as a future development?

In my example, I have a series of places that I'm clustering and would like to take their population into account in the minimum and maximum cluster sizes.

Thanks
Rob

AttributeError: 'KMeansConstrained' object has no attribute '_check_fit_data' getting while fitting the data

AttributeError Traceback (most recent call last)
in
6 if a>0:
7 km_constrained = KMeansConstrained(n_clusters=a,size_max=4,size_min=3,init='k-means++',n_init=100,random_state=None)
----> 8 km_constrained.fit(X)
9 #k_cons.append(km_constrained)

~\AppData\Local\Continuum\anaconda3.1\lib\site-packages\k_means_constrained-0.2.0-py3.7-win-amd64.egg\k_means_constrained\k_means_constrained_.py in fit(self, X, y)
621 """
622 random_state = check_random_state(self.random_state)
--> 623 X = self._check_fit_data(X)
624
625 self.cluster_centers_, self.labels_, self.inertia_, self.n_iter_ = \

AttributeError: 'KMeansConstrained' object has no attribute '_check_fit_data'

[BUG] Failed to build k-means-constrained

Describe the bug
When trying to install k-means-constrained 0.7.1 on a Linux machine under Spark, I get the following error:

Failed to build k-means-constrained

22/09/05 10:06:58 INFO SharedDriverContext: Failed to attach library dbfs:/FileStore/jars/df01ffad_9c82_47d0_af9f_d58e5a9167f7/web_extension-4.0-py3-none-any.whl to Spark
org.apache.spark.SparkException: Process List(/databricks/python/bin/pip, install, --upgrade, --find-links=/local_disk0/spark-418a7f7d-d386-417e-badc-e56ea3a21c6e/userFiles-6ca93792-2269-488d-b718-a0c3fb538085, /local_disk0/spark-418a7f7d-d386-417e-badc-e56ea3a21c6e/userFiles-6ca93792-2269-488d-b718-a0c3fb538085/activity_discovery_web_extension-4.0-py3-none-any.whl, --disable-pip-version-check) exited with code 1.   ERROR: Command errored out with exit status 1:
   command: /databricks/python3/bin/python /databricks/python3/lib/python3.8/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmp4ehqh6yz
       cwd: /tmp/pip-install-_5fx8ys_/k-means-constrained_4ee597c098d6445ea4b8d6ca9744eeb4
  Complete output (38 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.8
  creating build/lib.linux-x86_64-3.8/k_means_constrained
  copying k_means_constrained/mincostflow_vectorized.py -> build/lib.linux-x86_64-3.8/k_means_constrained
  copying k_means_constrained/__init__.py -> build/lib.linux-x86_64-3.8/k_means_constrained
  copying k_means_constrained/k_means_constrained_.py -> build/lib.linux-x86_64-3.8/k_means_constrained
  creating build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import
  copying k_means_constrained/sklearn_import/__init__.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import
  copying k_means_constrained/sklearn_import/fixes.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import
  copying k_means_constrained/sklearn_import/base.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import
  copying k_means_constrained/sklearn_import/funcsigs.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import
  copying k_means_constrained/sklearn_import/exceptions.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import
  creating build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/metrics
  copying k_means_constrained/sklearn_import/metrics/__init__.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/metrics
  copying k_means_constrained/sklearn_import/metrics/pairwise.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/metrics
  creating build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/utils
  copying k_means_constrained/sklearn_import/utils/__init__.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/utils
  copying k_means_constrained/sklearn_import/utils/sparsefuncs.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/utils
  copying k_means_constrained/sklearn_import/utils/fixes.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/utils
  copying k_means_constrained/sklearn_import/utils/extmath.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/utils
  copying k_means_constrained/sklearn_import/utils/validation.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/utils
  creating build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/preprocessing
  copying k_means_constrained/sklearn_import/preprocessing/__init__.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/preprocessing
  copying k_means_constrained/sklearn_import/preprocessing/data.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/preprocessing
  creating build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/externals
  copying k_means_constrained/sklearn_import/externals/__init__.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/externals
  copying k_means_constrained/sklearn_import/externals/funcsigs.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/externals
  creating build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/cluster
  copying k_means_constrained/sklearn_import/cluster/__init__.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/cluster
  copying k_means_constrained/sklearn_import/cluster/k_means_.py -> build/lib.linux-x86_64-3.8/k_means_constrained/sklearn_import/cluster
  running build_ext
  creating build/temp.linux-x86_64-3.8
  creating build/temp.linux-x86_64-3.8/k_means_constrained
  x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/tmp/pip-build-env-j0gh3wdg/overlay/lib/python3.8/site-packages/numpy/core/include -I/databricks/python3/include -I/usr/include/python3.8 -c k_means_constrained/mincostflow_vectorized_.c -o build/temp.linux-x86_64-3.8/k_means_constrained/mincostflow_vectorized_.o
  error: command 'x86_64-linux-gnu-gcc' failed: No such file or directory
  ----------------------------------------
  ERROR: Failed building wheel for k-means-constrained
ERROR: Could not build wheels for k-means-constrained which use PEP 517 and cannot be installed directly

Minimum working example
pip install k_means_constrained

Versions:

  • Python: 3.8
  • Operating system: Linux
  • k-means-constrained: 0.7.1
  • numpy: 1.23.2
  • scipy: 1.8.0
  • ortools: 9.3.10497
  • joblib: 1.1.0
  • cython (if installed): 0.29.32
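
The log above ends with `error: command 'x86_64-linux-gnu-gcc' failed: No such file or directory`, which suggests that no C compiler is on the PATH of the build machine. A minimal sketch for checking this before running pip follows. The helper name `find_compiler` and the candidate list are illustrative assumptions, not part of k-means-constrained:

```python
import shutil

# Candidate compiler names. The first is taken from the log above;
# the others are common alternatives (an assumption).
CANDIDATES = ["x86_64-linux-gnu-gcc", "gcc", "cc", "clang"]


def find_compiler(candidates=CANDIDATES):
    """Return the path of the first compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_compiler()
    if found is None:
        print("No C compiler found on PATH; a source build will likely fail.")
    else:
        print(f"C compiler found: {found}")
```

If no compiler is reported, installing one through the system package manager (on Debian or Ubuntu, typically the `build-essential` package) is the usual remedy when no prebuilt wheel is available.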

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Describe the bug

Importing the library raises this error:
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Minimum working example
from k_means_constrained import KMeansConstrained

Versions:

  • Python: 3.9.7
  • Operating system: macOS
  • k-means-constrained: 0.7.2
  • numpy: 1.23.4
  • scipy: 1.7.3
  • ortools: 9.4.1874
  • joblib: 1.1.0
  • cython (if installed): 0.29.24

Installing k_means_constrained upgraded numpy to 1.23.4, and importing k_means_constrained then raised:

ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
So I ran the following commands to downgrade numpy and install k_means_constrained again:

!pip uninstall numpy -y
!conda install -y -c conda-forge numpy
!pip uninstall k_means_constrained -y
!pip install k_means_constrained --no-binary k_means_constrained
I now have k-means-constrained 0.7.2 and numpy 1.23.4 and still get the same error. I even changed to k-means-constrained 0.7.0 and numpy 1.21.5, with the same result. This happens on my Apple M1 Pro, in an Anaconda environment with Python 3.9.7.
Any advice?
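
The "Expected 96 from C header, got 88 from PyObject" message generally means that a compiled extension was built against a different numpy than the one imported at runtime. Since both conda and pip are involved here, it may help to confirm which versions the running interpreter actually sees. A minimal sketch, where the helper name `installed_versions` is illustrative:

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    packages = ["numpy", "k-means-constrained", "scipy", "ortools"]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same notebook kernel that raises the error shows whether the numpy that kernel imports matches the version reported by `pip` or `conda` on the command line.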

Update imports to conform to scikit-learn 0.23

This warning appears when using the k-means-constrained library with scikit-learn < 0.23:

sklearn/externals/six.py:31: FutureWarning: The module is deprecated in version 0.21 and will be removed in version 0.23 since we've dropped support for Python 2.7. Please rely on the official version of six (https://pypi.org/project/six/)

After updating to scikit-learn >= 0.23, the imports raise errors and the code fails.
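
One common way to survive the removal of a vendored module is to try several import locations in order. The sketch below shows that pattern in general form. The helper name `import_first` is hypothetical, and this is not necessarily how the library resolved the issue:

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that is importable."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    raise ImportError(f"none of these modules could be imported: {module_names}")
```

For this issue the call would be `six = import_first("six", "sklearn.externals.six")`, which prefers the standalone `six` package recommended by the warning and only falls back to the vendored copy on older scikit-learn installations.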

There was an issue with the min cost flow input.

I get the following exception when attempting to cluster my dataset, following the example in the readme.
My data consists of ~17k 512-dimensional floating point vectors.
I receive this error regardless of the parameter values I set.

Can't import K-means-constrained

Hello,
When I try to import the package, I get the following error:

 from k_means_constrained import KMeansConstrained
  File "C:\Users\Public\Anaconda3\lib\site-packages\k_means_constrained\__init__.py", line 4, in <module>
    from .k_means_constrained_ import KMeansConstrained
  File "C:\Users\Public\Anaconda3\lib\site-packages\k_means_constrained\k_means_constrained_.py", line 29, in <module>
    from k_means_constrained.mincostflow_vectorized import SimpleMinCostFlowVectorized
  File "C:\Users\Public\Anaconda3\lib\site-packages\k_means_constrained\mincostflow_vectorized.py", line 4, in <module>
    from ortools.graph.pywrapgraph import SimpleMinCostFlow
  File "C:\Users\Public\Anaconda3\lib\site-packages\ortools\graph\pywrapgraph.py", line 13, in <module>
    from . import _pywrapgraph
ImportError: DLL load failed while importing _pywrapgraph: The specified module could not be found.

These are the versions of the required packages:

Name: k-means-constrained
Version: 0.6.0 

Name: numpy
Version: 1.20.3

Name: scipy
Version: 1.6.3

Name: ortools
Version: 9.0.9048

Thanks in advance

Maybe tag a new release?

I'm getting the following build error while doing a pip install after upgrading to Python 3.11.

Downloading k_means_constrained-0.3.3.tar.gz (369 kB)
     369.8/369.8 kB 38.7 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'error'
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-h7ivomt8/k-means-constrained_9e6bbb958901423eb92830efe879d106/setup.py", line 9, in <module>
          from Cython.Build import cythonize
      ModuleNotFoundError: No module named 'Cython'   
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

From your source code I see that you fixed this issue using a try ... except block, and I was able to install the package by specifying the constraint

k-means-constrained @ git+https://github.com/joshlk/k-means-constrained.git@master#egg=k_means_constrained

Please create a new tag with these changes so that they are picked up by a plain pip install. :)
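
The traceback shows `setup.py` importing Cython unconditionally, which fails when Cython is not installed at metadata-generation time. The try/except pattern mentioned above could look roughly like the sketch below. This is an illustration of the pattern, not the project's actual `setup.py`; `HAVE_CYTHON` and `source_for` are hypothetical names:

```python
# Guarded import: fall back to pre-generated C sources when Cython is absent.
try:
    from Cython.Build import cythonize  # noqa: F401
    HAVE_CYTHON = True
except ImportError:
    HAVE_CYTHON = False


def source_for(stem, have_cython=HAVE_CYTHON):
    """Return the source file to compile for an extension module stem.

    With Cython available the .pyx file is compiled; otherwise the build
    relies on a .c file that was generated ahead of time and shipped in
    the source distribution (an assumption about the packaging).
    """
    return stem + (".pyx" if have_cython else ".c")
```

The fallback only works if the generated `.c` files are included in the sdist, so the two choices need to be made together.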

numpy.ndarray size changed?

[Screenshot: error]

I get the error shown above when I try to use this package. The output below is what I saw when I installed the latest version, which was the recommended solution to a similar issue:

[Screenshot: install]

I use Ubuntu 20.04.03 and Python 3.8.10, and am working in JupyterLab 3.2.4.

This is a similar issue I found for another package (which I don't currently use) while researching this problem, if that is helpful: MaartenGr/BERTopic#392

It seems notable to me that now it's asking for 96 and getting 88, when before (in #12) it was asking for 88 and getting 80, but I am unfamiliar with this topic area.

Apologies if this is just me doing something wrong.

[BUG] Won't install

Describe the bug
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for k_means_constrained
Building wheel for rpy2 (setup.py) ... done
Created wheel for rpy2: filename=rpy2-3.5.3-py3-none-any.whl size=207845 sha256=0c527502a43f4996efb61dfeb687e1794c223af9740121fac9bb43e8b27d4175
Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\58\f3\83\9105378219a010ded1729668fe32e073186237ba8a223ea4ce
Successfully built rpy2
Failed to build k_means_constrained
ERROR: Could not build wheels for k_means_constrained, which is required to install pyproject.toml-based projects

(JupyterLab) C:\Users\User\Documents\wiki\wiki\dev\python\Python-Stock\code\Screener>pip install --upgrade wheel
Requirement already satisfied: wheel in c:\users\user\appdata\local\programs\jupyterlab\lib\site-packages (0.37.1)

Note: I tried to install Microsoft Visual C++ 14.0 using two files, BuildTools_MSBuild.msi and BuildTools_MSBuild.exe. I found the .msi but could not get it to work properly. The .exe installed v14, but pip install still threw the same error.

Minimum working example
pip install k-means_constrained

Versions:

  • Python: 3.10.5
  • Operating system: Windows 10
  • k-means-constrained:
  • numpy:
  • scipy:
  • ortools:
  • joblib:
  • cython (if installed):

[BUG] installation issues with numpy < 1.23

Describe the bug
While k-means-constrained claims to support numpy>=1.22.0 in setup.py, installing it and importing it with numpy 1.22.4 results in an error related to the numpy API version:

Python 3.10.4 (main, Jun 29 2022, 12:14:53) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import k_means_constrained
Traceback (most recent call last):
  File "__init__.pxd", line 942, in numpy.import_array
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sconstable/local-dev/venv/lib/python3.10/site-packages/k_means_constrained/__init__.py", line 4, in <module>
    from .k_means_constrained_ import KMeansConstrained
  File "/home/sconstable/local-dev/venv/lib/python3.10/site-packages/k_means_constrained/k_means_constrained_.py", line 18, in <module>
    from .sklearn_import.metrics.pairwise import euclidean_distances
  File "/home/sconstable/local-dev/venv/lib/python3.10/site-packages/k_means_constrained/sklearn_import/metrics/pairwise.py", line 10, in <module>
    from k_means_constrained.sklearn_import.metrics.pairwise_fast import _sparse_manhattan
  File "k_means_constrained/sklearn_import/metrics/pairwise_fast.pyx", line 26, in init k_means_constrained.sklearn_import.metrics.pairwise_fast
  File "__init__.pxd", line 944, in numpy.import_array
ImportError: numpy.core.multiarray failed to import

Minimum working example
The following Dockerfile is enough to reproduce the issue for me:

FROM ubuntu:20.04
RUN apt-get update && apt-get install -y python3-pip
RUN pip3 install numpy==1.22.4 k-means-constrained
RUN python3 -c 'import k_means_constrained'

Versions:

  • Python: 3.8, 3.10
  • Operating system: Ubuntu 20.04, 22.04
  • k-means-constrained: 0.7.1
  • numpy: 1.22.4
  • scipy: 1.9.0
  • ortools: 9.0.9048
  • joblib: 1.1.0
  • cython (if installed): 0.29.32
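
The error says the extension was compiled against a newer numpy C API (0x10) than the installed numpy provides (0xf). A guard that compares the installed version against a required minimum before importing could look like the sketch below. The minimum of 1.23.0 is an assumption taken from the issue title, and the helper names are illustrative:

```python
def parse_version(text):
    """Turn '1.22.4' into (1, 22, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '1.23' and '1.23.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    # 1.22.4 is the version pinned in the Dockerfile above.
    print(meets_minimum("1.22.4", "1.23.0"))
```

This simple parser ignores pre-release suffixes such as `rc1`; a production check would more likely rely on the `packaging` library.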

[BUG] error compiling on apple silicon.

When trying to pip install k-means-constrained on an Apple silicon Mac running macOS 12.6.2, I get the following error:

      gcc -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/pringle/opt/miniconda3/envs/parcels_Devel_mar2023/include -arch arm64 -fPIC -O2 -isystem /Users/pringle/opt/miniconda3/envs/parcels_Devel_mar2023/include -arch arm64 -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/pringle/opt/miniconda3/envs/parcels_Devel_mar2023/include -D_FORTIFY_SOURCE=2 -isystem /Users/pringle/opt/miniconda3/envs/parcels_Devel_mar2023/include -I/private/var/folders/m7/lqt6wj8n3f5f9lzgf80t8l3m0000gn/T/pip-build-env-wt28n0t8/overlay/lib/python3.11/site-packages/numpy/core/include -I/Users/pringle/opt/miniconda3/envs/parcels_Devel_mar2023/include/python3.11 -c k_means_constrained/sklearn_import/cluster/_k_means.c -o build/temp.macosx-11.0-arm64-cpython-311/k_means_constrained/sklearn_import/cluster/_k_means.o
      clang: error: the clang compiler does not support '-march=core2'
      error: command '/usr/bin/gcc' failed with exit code 1

Happy to learn from others' experiences before I dig into it myself...

Jamie Pringle
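
The failing command combines `-arch arm64` with `-march=core2 -mtune=haswell -mssse3`, which are x86 options that clang rejects for an arm64 target. One plausible source, stated here as an assumption, is compiler flags exported by the conda environment. A small sketch that lists such flags in the usual environment variables; the helper name `x86_only_flags` is illustrative:

```python
import os
import shlex

# The x86-specific flags that appear in the failing compile command above.
X86_ONLY_FLAGS = {"-march=core2", "-mtune=haswell", "-mssse3"}


def x86_only_flags(flags):
    """Return the x86-specific flags found in a compiler flag string."""
    return [flag for flag in shlex.split(flags) if flag in X86_ONLY_FLAGS]


if __name__ == "__main__":
    for var in ("CFLAGS", "CPPFLAGS", "CXXFLAGS"):
        hits = x86_only_flags(os.environ.get(var, ""))
        if hits:
            print(f"{var} contains x86-only flags: {' '.join(hits)}")
```

If any of these variables carry the flags, unsetting them for the duration of the `pip install` may be worth trying.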

Wrong clustering

Hello,
I have a set of house points, and I have laid out a line that carries glass fiber to each house.
I would now like to cluster the points so that a distributor can be assigned to each cluster, with at most 20 house points per cluster.
As an example, for a data set of 61 points I computed an adjacency matrix of distances along the glass fiber line. I then cluster with this library using the precomputed adjacency matrix. However, the clustering is sometimes wrong, as can be seen in the picture.

Here is my code ("am" is the adjacency matrix of distances):
from k_means_constrained import KMeansConstrained
db = KMeansConstrained(n_clusters=4, size_max=20, random_state=0)
result = db.fit_predict(am)

In the picture, the black line is the glass fiber line on which the calculation is based, and the colored points are the points as clustered by the algorithm. As you can see, the green and yellow clusters are not optimal.
I sometimes have the same issue with other datasets as well.

[Image: kmeans constrained]

I appreciate any help to improve the result.

Versions:

  • Python: 3.9
  • Operating system: Windows
  • k-means-constrained: 0.7.2
  • numpy: 1.23.2
  • scipy: 1.9.1
  • ortools: 9.4.1874
  • joblib: 1.1.0
  • cython (if installed): not installed

Best regards,
Mostafa
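
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if `fit_predict` treats each row of its input as a feature vector, as ordinary k-means does, then the distances the algorithm effectively works with are Euclidean distances between rows of `am`, not the entries of `am` themselves. The toy matrix below is made up to show how the two can differ:

```python
import math

# Toy path distances between three points on a line: 0 --1-- 1 --2-- 2
am = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]


def row_distance(matrix, i, j):
    """Euclidean distance between rows i and j, treated as feature vectors."""
    return math.dist(matrix[i], matrix[j])


for i, j in [(0, 1), (1, 2), (0, 2)]:
    entry = am[i][j]
    between_rows = row_distance(am, i, j)
    print(f"pair ({i}, {j}): matrix entry {entry:.3f}, row distance {between_rows:.3f}")
```

If that assumption holds for this library, converting the distance matrix into point coordinates first (for example with multidimensional scaling) and clustering those coordinates might give results closer to what is expected.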

Microsoft Visual C++ is required

Hello,
When installing k-means-constrained, I get the error: "Microsoft Visual C++ 14.0 is required."
Do you know if the Visual C++ Redistributable package is sufficient, or do we need the full Microsoft Visual Studio installation?
Thank you!

import k_means_constrained

Hi,

I'm having some trouble importing k-means-constrained.

I've already installed it, but it seems that I'm missing something:
Requirement already satisfied: k-means-constrained in c:\apps\anaconda\lib\site-packages (0.7.3)
Requirement already satisfied: ortools>=9.4.1874 in c:\apps\anaconda\lib\site-packages (from k-means-constrained) (9.7.2996)
Requirement already satisfied: scipy>=1.6.3 in c:\apps\anaconda\lib\site-packages (from k-means-constrained) (1.10.1)
Requirement already satisfied: numpy>=1.23.0 in c:\apps\anaconda\lib\site-packages (from k-means-constrained) (1.24.3)
Requirement already satisfied: six in c:\apps\anaconda\lib\site-packages (from k-means-constrained) (1.16.0)
Requirement already satisfied: joblib in c:\apps\anaconda\lib\site-packages (from k-means-constrained) (1.2.0)
Requirement already satisfied: absl-py>=0.13 in c:\apps\anaconda\lib\site-packages (from ortools>=9.4.1874->k-means-constrained) (2.0.0)
Requirement already satisfied: protobuf>=4.23.3 in c:\apps\anaconda\lib\site-packages (from ortools>=9.4.1874->k-means-constrained) (4.24.4)

When I try to import it like this, I get the error "No module named 'k_means_constrained'":
import k_means_constrained as kmc

I don't get what I'm doing wrong. Can you please advise?

Thank you,
ML
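
When pip reports a package as installed but the import fails, a frequent cause is that pip and the running Python belong to different environments. The sketch below prints which interpreter is running and where it would load the module from. The helper name `module_location` is illustrative:

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("k_means_constrained:", module_location("k_means_constrained"))
```

If the interpreter path is not under `c:\apps\anaconda`, installing with `python -m pip install k-means-constrained` from that same interpreter should put the package where the import looks for it.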
