A deep clustering algorithm. Code to reproduce results for our paper N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.
N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding.
Abstract
Deep clustering has increasingly been demonstrating superiority over conventional shallow clustering algorithms.
Deep clustering algorithms usually combine representation learning with deep neural networks to achieve this performance, typically optimizing a clustering and non-clustering loss.
In such cases, an autoencoder is typically connected with a clustering network, and the final clustering is jointly learned by both the autoencoder and clustering network.
Instead, we propose to learn an autoencoded embedding and then search this further for the underlying manifold.
For simplicity, we then cluster this with a shallow clustering algorithm, rather than a deeper network.
We study a number of local and global manifold learning methods on both the raw data and autoencoded embedding, concluding that UMAP in our framework is able to find the best clusterable manifold of the embedding. This suggests that local manifold learning on an autoencoded embedding is effective for discovering higher quality clusters.
We quantitatively show across a range of image and time-series datasets that our method has competitive performance against the latest deep clustering algorithms, including out-performing current state-of-the-art on several.
We postulate that these results show a promising research direction for deep clustering.
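The three stages described above (learn an autoencoded embedding, learn a manifold of that embedding, then cluster it with a shallow algorithm) can be sketched roughly as follows. This is not the code from n2d.py. It is a minimal illustration that assumes scikit-learn is installed, uses a small `MLPRegressor` trained to reconstruct its input as a stand-in autoencoder, t-SNE as a stand-in manifold learner, and a Gaussian mixture as the shallow clusterer. The function names, layer sizes and toy data are all hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor


def autoencoded_embedding(x, hidden=(64, 8, 64), seed=0):
    """Fit a small autoencoder and return the bottleneck activations."""
    ae = MLPRegressor(hidden_layer_sizes=hidden, activation="tanh",
                      max_iter=300, random_state=seed)
    ae.fit(x, x)  # reconstruction: the regression target is the input itself
    bottleneck = len(hidden) // 2  # index of the middle hidden layer
    h = x
    # Forward pass through the encoder half only (up to the bottleneck).
    for w, b in list(zip(ae.coefs_, ae.intercepts_))[: bottleneck + 1]:
        h = np.tanh(h @ w + b)
    return h


def cluster_local_manifold(x, n_clusters, manifold_learner=None, seed=0):
    """Embed with the autoencoder, learn a manifold of the embedding, cluster it."""
    embedding = autoencoded_embedding(x, seed=seed)
    if manifold_learner is None:
        # Stand-in manifold learner (assumption): t-SNE in 2 dimensions.
        manifold_learner = TSNE(n_components=2, init="pca", random_state=seed)
    manifold = manifold_learner.fit_transform(embedding)
    # Shallow clustering step; the Gaussian mixture is an illustrative choice.
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    return gmm.fit_predict(manifold)


# Toy data: three Gaussian blobs in 10 dimensions (for illustration only).
rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(3, 10))
x = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])
labels = cluster_local_manifold(x, n_clusters=3)
print(np.bincount(labels, minlength=3))  # cluster sizes
```

If umap-learn is installed, passing `manifold_learner=umap.UMAP(n_components=n_clusters)` would swap in UMAP, which the abstract reports as finding the most clusterable manifold.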
Results
Visualizations
MNIST
HAR (Human Activity Recognition)
Note: clusters 'look' better in higher dimensions (based on clustering metrics) than they do here in 2d. The intended use of n2d is for clustering. Visualized here are the first 5000 points.
If you remove the --ae_weights argument when running n2d then it will train a new network, rather than load the pretrained weights.
To add a new dataset, add a load function to datasets.py (the existing ones show the expected pattern) and a function in n2d.py that calls your data loading function.
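As a rough illustration of what such a load function might look like, here is a minimal sketch. It assumes, without confirmation from datasets.py, that a loader returns a pair `(x, y)` of a float feature array and an integer label array. The name `load_mydata`, the CSV layout (a header row, feature columns, then a label column) and the min-max scaling are all hypothetical choices.

```python
import os
import tempfile

import numpy as np


def load_mydata(path):
    """Hypothetical loader: read a CSV and return (features, labels)."""
    raw = np.loadtxt(path, delimiter=",", skiprows=1, ndmin=2)
    x = raw[:, :-1]                   # every column except the last is a feature
    y = raw[:, -1].astype("int64")    # last column holds the class label
    # Min-max scale each feature to [0, 1]; constant columns are left at 0.
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    x = ((x - lo) / span).astype("float32")
    return x, y


# Tiny demonstration with a temporary CSV file.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "mydata.csv")
    with open(csv_path, "w") as f:
        f.write("f1,f2,label\n0.0,10.0,0\n5.0,20.0,1\n10.0,30.0,1\n")
    x, y = load_mydata(csv_path)

print(x.shape, y)
```

The labels are only needed for evaluating the clustering; check the existing loaders in datasets.py for the exact shapes and dtypes they return before copying this pattern.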
I used the following packages for training the networks using the GPU.
If you would like to produce some plots for visualization purposes, add the argument '--visualize'. I also recommend setting the argument '--umap_dim' to 2.
Citation
@inproceedings{McConville2020,
author = {Ryan McConville and Raul Santos-Rodriguez and Robert J Piechocki and Ian Craddock},
title = {N2D: (Not Too) Deep Clustering via Clustering the Local Manifold of an Autoencoded Embedding},
booktitle = {25th International Conference on Pattern Recognition, {ICPR} 2020},
publisher = {{IEEE} Computer Society},
year = {2020},
}
Hi Ryan, when you get the chance do you think you could walk through the choice of the [500, 500, 2000, c] architecture, especially with relation to t-sne (as mentioned in the paper)? I’ve been trying to understand it so I can confidently explain it to my advisor, but coming up pretty short :)
Hello!
I hope there are still people hanging around in the repo who might know about this issue. After completing the steps described in the README, I get errors when I try to run the model. Here is what I got:
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/errors.py", line 744, in new_error_context
yield
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 230, in lower_block
self.lower_inst(inst)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 328, in lower_inst
self.storevar(val, inst.target.name)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 1278, in storevar
raise AssertionError(msg)
AssertionError: Storing i64 to ptr of i32 ('dim'). FE type int32
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "n2d.py", line 15, in <module>
import umap
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/__init__.py", line 1, in <module>
from .umap_ import UMAP
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/umap_.py", line 54, in <module>
from umap.layouts import (
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/layouts.py", line 36, in <module>
"dim": numba.types.int32,
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/decorators.py", line 221, in wrapper
disp.compile(sig)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/dispatcher.py", line 909, in compile
cres = self._compiler.compile(args, return_type)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/dispatcher.py", line 79, in compile
status, retval = self._compile_cached(args, return_type)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/dispatcher.py", line 93, in _compile_cached
retval = self._compile_core(args, return_type)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/dispatcher.py", line 111, in _compile_core
pipeline_class=self.pipeline_class)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 606, in compile_extra
return pipeline.compile_extra(func)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 353, in compile_extra
return self._compile_bytecode()
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 415, in _compile_bytecode
return self._compile_core()
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 395, in _compile_core
raise e
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler.py", line 386, in _compile_core
pm.run(self.state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_machinery.py", line 339, in run
raise patched_exception
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_machinery.py", line 330, in run
self._runPass(idx, pass_inst, state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
return func(*args, **kwargs)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_machinery.py", line 289, in _runPass
mutated |= check(pss.run_pass, internal_state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/compiler_machinery.py", line 262, in check
mangled = func(compiler_state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/typed_passes.py", line 463, in run_pass
NativeLowering().run_pass(state)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/typed_passes.py", line 384, in run_pass
lower.lower()
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 136, in lower
self.lower_normal_function(self.fndesc)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 190, in lower_normal_function
entry_block_tail = self.lower_function_body()
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 216, in lower_function_body
self.lower_block(block)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/lowering.py", line 230, in lower_block
self.lower_inst(inst)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/home/anton/miniconda3/envs/n2d2/lib/python3.7/site-packages/numba/core/errors.py", line 751, in new_error_context
raise newerr.with_traceback(tb)
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32
File "../miniconda3/envs/n2d2/lib/python3.7/site-packages/umap/layouts.py", line 52:
def rdist(x, y):
I tried to run n2d on the Fashion-MNIST dataset, but I got the following results, which are worse than those reported in your paper.
0.51409
0.55714
0.37784
I used the same parameters as in run.sh and the trained model downloaded with the code.
Can you please tell me how to achieve the ACC: 0.672 and NMI: 0.684 results on this dataset?
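For readers comparing numbers like these: clustering accuracy (ACC) is commonly computed by finding the one-to-one mapping between cluster ids and ground-truth labels that maximizes the number of matches, then reporting the matched fraction. The sketch below shows that common definition using SciPy's `linear_sum_assignment`. Whether it matches the exact computation in n2d.py is an assumption, and `cluster_acc` is a hypothetical name. NMI is available in scikit-learn as `sklearn.metrics.normalized_mutual_info_score`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cluster_acc(y_true, y_pred):
    """Best-match clustering accuracy (cluster ids are arbitrary, so match them first)."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    n = int(max(y_true.max(), y_pred.max())) + 1
    # counts[p, t] = number of samples in predicted cluster p with true label t
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    # Negate so that the minimum-cost assignment maximizes total matches.
    rows, cols = linear_sum_assignment(-counts)
    return counts[rows, cols].sum() / y_true.size


# Cluster ids are permuted relative to the labels, with one sample misassigned.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 0]
acc = cluster_acc(y_true, y_pred)
print(round(acc, 4))
```

Because the mapping is optimized after the fact, a clustering that is perfect up to a relabelling of the clusters scores 1.0.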
I am trying to reproduce the paper's results on the MNIST dataset by retraining the model from scratch. These are the parameters and the output:
Using GPU
Missing MulticoreTSNE package.. Only important if evaluating other manifold learners.
Hello,
Thank you for your paper and code. I have a question: the paper does not give a theoretical derivation of the model. Is that because each component is trained independently, so no derivation is required?
Hope to get your advice! Thanks!
I read the paper a couple of times, and everything is clear to me (for now) except two points.
First, why do you set the number of dimensions equal to the number of clusters when using a manifold learning algorithm (UMAP, etc.)?
Second, for the visualization, do you change the number of dimensions to 2, or do you add a second manifold learning step with the number of dimensions set to 2?
I love this research, and would love to see it in an even more portable/applicable fashion in the form of a library. I have started with an object oriented framework for this stuff here, https://github.com/josephsdavid/N2D-OOP, and would love to make this into an entire library where it can be widely used :)
When I want to run the code for UMAP, I get the following error:
Compilation is falling back to object mode WITH looplifting enabled because Function "make_euclidean_tree" failed type inference due to: Cannot unify RandomProjectionTreeNode(array(int64, 1d, C), bool, none, none, none, none) and RandomProjectionTreeNode(none, bool, array(float32, 1d, C), float64, RandomProjectionTreeNode(array(int64, 1d, C), bool, none, none, none, none), RandomProjectionTreeNode(array(int64, 1d, C), bool, none, none, none, none)) for '$46call_function.15', defined at/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/rp_tree.py (457)
File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/rp_tree.py", line 457:
def make_euclidean_tree(data, indices, rng_state, leaf_size=30):
@numba.jit()
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/object_mode_passes.py:178: NumbaWarning: Function "make_euclidean_tree" was compiled in object mode without forceobj=True.
state.func_ir.loc))
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/object_mode_passes.py:188: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.
state.func_ir.loc))
/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/nndescent.py:92: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
current_graph, n_vertices, n_neighbors, max_candidates, rng_state
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/typed_passes.py:314: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/nndescent.py", line 47:
@numba.njit(parallel=True)
def nn_descent(
^
state.func_ir.loc))
/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py:349: NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "fuzzy_simplicial_set" failed type inference due to: Untyped global name 'nearest_neighbors': cannot determine Numba type of <class 'function'>
File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 467:
def fuzzy_simplicial_set(
@numba.jit()
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/object_mode_passes.py:178: NumbaWarning: Function "fuzzy_simplicial_set" was compiled in object mode without forceobj=True.
File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 350:
@numba.jit()
def fuzzy_simplicial_set(
^
state.func_ir.loc))
/anaconda3/envs/n2d/lib/python3.7/site-packages/numba/core/object_mode_passes.py:188: NumbaDeprecationWarning:
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.
File "../anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 350:
@numba.jit()
def fuzzy_simplicial_set(
^
state.func_ir.loc))
Traceback (most recent call last):
File "n2d.py", line 391, in <module>
hl, y, label_names)
File "n2d.py", line 192, in cluster_manifold_in_embedding
min_dist=md).fit_transform(hl)
File "/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 1596, in fit_transform
self.fit(X, y)
File "/anaconda3/envs/n2d/lib/python3.7/site-packages/umap/umap_.py", line 1454, in fit
self._search_graph.transpose()
File "/anaconda3/envs/n2d/lib/python3.7/site-packages/scipy/sparse/lil.py", line 437, in transpose
return self.tocsr(copy=copy).transpose(axes=axes, copy=False).tolil(copy=False)
File "/anaconda3/envs/n2d/lib/python3.7/site-packages/scipy/sparse/lil.py", line 462, in tocsr
_csparsetools.lil_get_lengths(self.rows, indptr[1:])
File "_csparsetools.pyx", line 109, in scipy.sparse._csparsetools.lil_get_lengths
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
Hello,
I successfully applied this technique to my own dataset and it's producing really good results. Thanks for that!
But I have one question. What is the difference between the two plots called "n2d-predicted.png" and "n2d.png"?