
A deep clustering algorithm. Code to reproduce results for our paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.

License: GNU General Public License v3.0

Python 88.27% Shell 11.73%
clustering deep-clustering-algorithms machine-learning

n2d's Introduction

N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.

Abstract

Deep clustering has increasingly been demonstrating superiority over conventional shallow clustering algorithms. Deep clustering algorithms usually combine representation learning with deep neural networks to achieve this performance, typically optimizing a clustering and non-clustering loss. In such cases, an autoencoder is typically connected with a clustering network, and the final clustering is jointly learned by both the autoencoder and clustering network. Instead, we propose to learn an autoencoded embedding and then search this further for the underlying manifold. For simplicity, we then cluster this with a shallow clustering algorithm, rather than a deeper network. We study a number of local and global manifold learning methods on both the raw data and autoencoded embedding, concluding that UMAP in our framework is able to find the best clusterable manifold of the embedding. This suggests that local manifold learning on an autoencoded embedding is effective for discovering higher quality clusters. We quantitatively show across a range of image and time-series datasets that our method has competitive performance against the latest deep clustering algorithms, including out-performing current state-of-the-art on several. We postulate that these results show a promising research direction for deep clustering.
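In code, the pipeline in the abstract amounts to: train an autoencoder, run UMAP on the learned embedding, and cluster the resulting manifold with a shallow algorithm such as a GMM. The sketch below only illustrates the idea; the layer sizes, epochs and UMAP settings are assumptions rather than the exact configuration in this repository (see n2d.py for the real implementation).

from sklearn.mixture import GaussianMixture
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
import umap

def n2d_sketch(x, n_clusters=10, epochs=50):
    # 1) Learn an autoencoded embedding (dimension = number of clusters).
    inp = Input(shape=(x.shape[1],))
    h = Dense(500, activation='relu')(inp)
    h = Dense(500, activation='relu')(h)
    h = Dense(2000, activation='relu')(h)
    z = Dense(n_clusters)(h)                 # the embedding layer
    h = Dense(2000, activation='relu')(z)
    h = Dense(500, activation='relu')(h)
    h = Dense(500, activation='relu')(h)
    out = Dense(x.shape[1])(h)
    autoencoder = Model(inp, out)
    encoder = Model(inp, z)                  # shares its weights with the autoencoder
    autoencoder.compile(optimizer='adam', loss='mse')
    autoencoder.fit(x, x, batch_size=256, epochs=epochs, verbose=0)

    # 2) Search the embedding for its local manifold with UMAP.
    embedding = encoder.predict(x)
    manifold = umap.UMAP(n_components=n_clusters, n_neighbors=20,
                         min_dist=0.0).fit_transform(embedding)

    # 3) Cluster the manifold with a shallow algorithm (a GMM here).
    gmm = GaussianMixture(n_components=n_clusters)
    return gmm.fit_predict(manifold)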

Results

N2D results

Visualizations

MNIST

HAR (Human Activity Recognition)

Note: based on clustering metrics, the clusters 'look' better in higher dimensions than they do here in 2D. The intended use of n2d is clustering. Visualized here are the first 5000 points.

Paper

https://arxiv.org/abs/1908.05968

Install

Install Anaconda

wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh
bash Anaconda3-2019.07-Linux-x86_64.sh
source anaconda3/bin/activate

Create environment

conda create -n n2d python=3.7  
conda activate n2d

Clone repo

git clone https://github.com/rymc/n2d.git

Install packages

pip install -r requirements.txt

Reproduce results

bash run.sh

For training a new network

If you remove the --ae_weights argument when running n2d, it will train a new network rather than load the pretrained weights.

To add a new dataset, write a load function in datasets.py (the existing ones show the pattern) and a function in n2d.py that calls your data loading function.
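For illustration only, a hypothetical loader might look like the sketch below. The function name, file paths and return format are assumptions; mirror the existing functions in datasets.py for the exact signature that n2d.py expects.

import numpy as np

def load_mydata(data_path='data/mydata'):
    # Load a feature matrix of shape (n_samples, n_features) and, if
    # available, integer labels that are only used for evaluation metrics.
    x = np.load(data_path + '/x.npy').astype('float32')
    y = np.load(data_path + '/y.npy')
    # Flatten and scale to [0, 1], as the image loaders typically do.
    x = x.reshape((x.shape[0], -1))
    x = x / float(np.max(x))
    return x, y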

I used the following packages for training the networks on a GPU.

conda install tensorflow-gpu=1.13.1 cudatoolkit=9.0

Visualization

If you would like to produce some plots for visualization purposes, add the argument '--visualize'. I also recommend setting the argument '--umap_dim' to 2.
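Conceptually, the visualization is a 2D scatter of the UMAP output coloured by the predicted cluster. The sketch below is a generic example of that kind of plot, not the repository's plotting code.

import matplotlib.pyplot as plt

def plot_embedding(embedding_2d, cluster_labels, path='clusters.png'):
    # Scatter the 2D manifold, coloured by the predicted cluster.
    plt.figure(figsize=(8, 8))
    plt.scatter(embedding_2d[:, 0], embedding_2d[:, 1],
                c=cluster_labels, s=3, cmap='Spectral')
    plt.savefig(path)
    plt.close()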

Citation

@inproceedings{McConville2020,
  author = {Ryan McConville and Raul Santos-Rodriguez and Robert J Piechocki and Ian Craddock},
  title = {N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding},
  booktitle = {25th International Conference on Pattern Recognition, {ICPR} 2020},
  publisher = {{IEEE} Computer Society},
  year = {2020},
}


n2d's Issues

Choice of architecture / relation to t-SNE

Hi Ryan, when you get the chance, do you think you could walk through the choice of the [500, 500, 2000, c] architecture, especially in relation to t-SNE (as mentioned in the paper)? I've been trying to understand it so I can confidently explain it to my advisor, but I'm coming up pretty short :)

Best,

David

How to use it if there are no labels

Thanks for the implementation. I am trying to use the code on an unsupervised clustering problem. Is it possible not to feed labels? Thanks!

Exception after exception - code is not running

Hello!
I hope there are still people hanging around in the repo who might know the issue. After completing the steps described in the README, I get errors when I try to run the model. Here is what I got:

/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/errors.py", line 744, in new_error_context
yield
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 230, in lower_block
self.lower_inst(inst)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 328, in lower_inst
self.storevar(val, inst.target.name)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 1278, in storevar
raise AssertionError(msg)
AssertionError: Storing i64 to ptr of i32 ('dim'). FE type int32

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "n2d.py", line 15, in
import umap
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/init.py", line 1, in
from .umap_ import UMAP
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/umap_.py", line 54, in
from umap.layouts import (
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/layouts.py", line 36, in
"dim": numba.types.int32,
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/decorators.py", line 221, in wrapper
disp.compile(sig)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/dispatcher.py", line 909, in compile
cres = self._compiler.compile(args, return_type)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/dispatcher.py", line 79, in compile
status, retval = self._compile_cached(args, return_type)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/dispatcher.py", line 93, in _compile_cached
retval = self._compile_core(args, return_type)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/dispatcher.py", line 111, in _compile_core
pipeline_class=self.pipeline_class)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 606, in compile_extra
return pipeline.compile_extra(func)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 353, in compile_extra
return self._compile_bytecode()
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 415, in _compile_bytecode
return self._compile_core()
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 395, in _compile_core
raise e
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 386, in _compile_core
pm.run(self.state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_machinery.py", line 339, in run
raise patched_exception
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_machinery.py", line 330, in run
self._runPass(idx, pass_inst, state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_machinery.py", line 289, in _runPass
mutated |= check(pss.run_pass, internal_state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_machinery.py", line 262, in check
mangled = func(compiler_state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/typed_passes.py", line 463, in run_pass
NativeLowering().run_pass(state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/typed_passes.py", line 384, in run_pass
lower.lower()
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 136, in lower
self.lower_normal_function(self.fndesc)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 190, in lower_normal_function
entry_block_tail = self.lower_function_body()
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 216, in lower_function_body
self.lower_block(block)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 230, in lower_block
self.lower_inst(inst)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/contextlib.py", line 130, in exit
self.gen.throw(type, value, traceback)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/errors.py", line 751, in new_error_context
raise newerr.with_traceback(tb)
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/layouts.py", line 52:
def rdist(x, y):

result = 0.0
dim = x.shape[0]
^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=)" at /home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/layouts.py (52)

Tested on Ubuntu 20.04 running through WSL2 and on another workstation with Ubuntu 18.04.
Any ideas what I have done wrong?

Thanks.

Results on Fashion-MNIST are worse than the results in your paper

I tried to run n2d on the Fashion-MNIST dataset, but I got the following results, which are worse than the results in your paper.

0.51409
0.55714
0.37784

I used the same parameters as in run.sh and the trained model downloaded with the code.
Can you please tell me how to achieve the ACC: 0.672 and NMI: 0.684 results on this dataset?

Unable to replicate the results

I am not sure whether this question belongs under issues, but I'm still going to ask.

Hi @rymc, I am trying to replicate the results on the MNIST dataset alone. I am a little confused about the way the model is being trained.

  • creating an autoencoder model (#line354)
  • extracting only the modules related to the encoder, and creating a separate model for the encoder (#line357)
  • then training the autoencoder model
  • but predicting on images using the encoder model? (#line385)

How did the encoder model get trained, since we are using it to predict on images?
Thanks in advance.
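One likely explanation (based on how Keras handles shared layers, not necessarily the exact code in n2d.py): when the encoder model is built from the same layers/tensors as the autoencoder, the two models share weights, so fitting the autoencoder also updates the encoder. A minimal, self-contained illustration with arbitrary sizes:

import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

inp = Input(shape=(784,))
z = Dense(10, name='embedding')(inp)     # encoder output
out = Dense(784)(z)                      # decoder output

autoencoder = Model(inp, out)
encoder = Model(inp, z)                  # built from the same tensors/layers

autoencoder.compile(optimizer='adam', loss='mse')
x = np.random.rand(256, 784).astype('float32')
autoencoder.fit(x, x, epochs=1, verbose=0)

# The encoder was never fit directly, but its weights changed, because they
# are the very same weight tensors the autoencoder just trained.
print(encoder.get_layer('embedding').get_weights()[0].mean())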

I cannot reproduce results when retraining

I am trying to reproduce the results in the paper on the MNIST dataset by retraining the model from scratch. These are the parameters and the output:

Using GPU

Missing MulticoreTSNE package.. Only important if evaluating other manifold learners.

Namespace(ae_weights=None, batch_size=256, cluster='GMM', dataset='mnist', eval_all=False, gpu='0', manifold_learner='UMAP', n_clusters=10, pretrain_epochs=1000, save_dir='MYEXPS', umap_dim=10, umap_metric='euclidean', umap_min_dist='0.00', umap_neighbors=20, visualize=False)

Time to train the autoencoder: 1251.820315361023

=======================================

mnist | UMAP on autoencoded embedding with GMM - N2D
ACC 0.83611
NMI 0.8986
ARI 0.82823

============================================

They are well below the ones in the paper (which I can reproduce using the provided weights):
ACC 0.979
NMI 0.942

Could you help me? Did I make a mistake?
Thanks

The theoretical derivation of the model

Hello,
Thank you for your paper and code. I have a question: the theoretical derivation of the model is not given in the paper. Is that because each component is trained independently, so no formula derivation is required?
Hope to get your advice! Thanks!

Feature Selection

Hello,

Can I ask you to share the HAR feature selection code?
Thank you

Number of dimensions equal to the number of clusters

Hello,

I read the paper a couple of times, and everything was clear to me (for now) except for two points.

First, why do you set the number of dimensions to the number of clusters (when using the manifold learning algorithm, e.g. UMAP)?
Second, for the visualization, do you change the number of dimensions to 2, or do you add one more manifold learning step with the number of dimensions set to 2?

Dimitris.

Training

Hello, can you provide an example of how to train the N2D network?

Problem with running the code

Hello,

When I want to run the code for UMAP, I get the following error:

Compilation is falling back to object mode WITH looplifting enabled because Function "make_euclidean_tree" failed type inference due to: Cannot unify RandomProjectionTreeNode(array(int64, 1d, C), bool, none, none, none, none) and RandomProjectionTreeNode(none, bool, array(float32, 1d, C), float64, RandomProjectionTreeNode(array(int64, 1d, C), bool, none, none, none, none), RandomProjectionTreeNode(array(int64, 1d, C), bool, none, none, none, none)) for '$46call_function.15', defined at/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/rp_tree.py (457)

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/rp_tree.py", line 457:
def make_euclidean_tree(data, indices, rng_state, leaf_size=30):

    left_node = make_euclidean_tree(data, left_indices, rng_state, leaf_size)
    ^

During: resolving callee type: recursive(type(CPUDispatcher(<function make_euclidean_tree at 0x7f959eb37dd0>)))
During: typing of call at/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/rp_tree.py (457)

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/rp_tree.py", line 457:
def make_euclidean_tree(data, indices, rng_state, leaf_size=30):

    left_node = make_euclidean_tree(data, left_indices, rng_state, leaf_size)
    ^

@numba.jit()
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/object_mode_passes.py:178: NumbaWarning: Function "make_euclidean_tree" was compiled in object mode without forceobj=True.

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/rp_tree.py", line 451:
@numba.jit()
def make_euclidean_tree(data, indices, rng_state, leaf_size=30):
^

state.func_ir.loc))
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/object_mode_passes.py:188: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/rp_tree.py", line 451:
@numba.jit()
def make_euclidean_tree(data, indices, rng_state, leaf_size=30):
^

state.func_ir.loc))
/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/nndescent.py:92: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/utils.py", line 409:
@numba.njit(parallel=True)
def build_candidates(current_graph, n_vertices, n_neighbors, max_candidates, rng_state):
^

current_graph, n_vertices, n_neighbors, max_candidates, rng_state
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/typed_passes.py:314: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/nndescent.py", line 47:
@numba.njit(parallel=True)
def nn_descent(
^

state.func_ir.loc))
/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py:349: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "fuzzy_simplicial_set" failed type inference due to: Untyped global name 'nearest_neighbors': cannot determine Numba type of <class 'function'>

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 467:
def fuzzy_simplicial_set(

if knn_indices is None or knn_dists is None:
knn_indices, knn_dists, _ = nearest_neighbors(
^

@numba.jit()
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/object_mode_passes.py:178: NumbaWarning: Function "fuzzy_simplicial_set" was compiled in object mode without forceobj=True.

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 350:
@numba.jit()
def fuzzy_simplicial_set(
^

state.func_ir.loc))
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/object_mode_passes.py:188: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 350:
@numba.jit()
def fuzzy_simplicial_set(
^

state.func_ir.loc))
Traceback (most recent call last):
File "n2d.py", line 391, in
hl, y, label_names)
File "n2d.py", line 192, in cluster_manifold_in_embedding
min_dist=md).fit_transform(hl)
File /anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 1596, in fit_transform
self.fit(X, y)
File "/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 1454, in fit
self._search_graph.transpose()
File "/anaconda3/envs/n2d/lib/python3.7/site-packages/scipy/sparse/lil.py", line 437, in transpose
return self.tocsr(copy=copy).transpose(axes=axes, copy=False).tolil(copy=False)
File "/anaconda3/envs/n2d/lib/python3.7/site-packages/scipy/sparse/lil.py", line 462, in tocsr
_csparsetools.lil_get_lengths(self.rows, indptr[1:])
File "_csparsetools.pyx", line 109, in scipy.sparse._csparsetools.lil_get_lengths
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Could you please help?

Meaning of the plots

Hello,
I successfully applied this technique to my own dataset and it is producing really good results. Thanks for that!
But I have one question: what is the difference between the two plots called "n2d-predicted.png" and "n2d.png"?
