ggiecold-zz / dbscan_multiplex
A fast and efficient implementation of DBSCAN clustering.
License: MIT License
I have the following code to run, but HDF5 tells me in the error back trace that I have no permission to create the temporary file.
The code I run is a script that uses DBSCAN_multiplex.py:
import numpy as np
import DBSCAN_multiplex as DB
print("hey ;-)")
data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] / 10
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = int)
for i in xrange(N_iterations):
    subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False)
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
The error message in PyCharm is:
C:\Anaconda2\python.exe E:/Dropbox/DATA/research/GhostTowns/WB/untitled/dbscan.py
hey ;-)
INFO: DBSCAN_multiplex @ load:
starting the determination of an appropriate value of 'eps' for this data-set and for the other parameter of the DBSCAN algorithm set to 3.
This might take a while.
INFO: DBSCAN_multiplex @ load:
done with evaluating parameter 'eps' from the data-set provided. This took 1.737 seconds. Value of epsilon: 0.921.
INFO: DBSCAN_multiplex @ load:
identifying the neighbors within an hypersphere of radius 0.921 around each sample, while at the same time evaluating the number of epsilon-neighbors for each sample.
This might take a fair amount of time.
Traceback (most recent call last):
File "E:/Dropbox/DATA/research/GhostTowns/WB/untitled/dbscan.py", line 9, in <module>
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
File "C:\Anaconda2\lib\site-packages\DBSCAN_multiplex.py", line 656, in DBSCAN
eps = load(f.name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
File "C:\Anaconda2\lib\site-packages\DBSCAN_multiplex.py", line 354, in load
fileh = tables.open_file(hdf5_file_name, mode = 'r+')
File "C:\Anaconda2\lib\site-packages\tables\file.py", line 318, in open_file
return File(filename, mode, title, root_uep, filters, **kwargs)
File "C:\Anaconda2\lib\site-packages\tables\file.py", line 784, in __init__
self._g_new(filename, mode, **params)
File "tables\hdf5extension.pyx", line 488, in tables.hdf5extension.File._g_new (tables\hdf5extension.c:5458)
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5F.c", line 604, in H5Fopen
unable to open file
File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5Fint.c", line 990, in H5F_open
unable to open file: time = Mon Dec 07 18:12:12 2015
, name = 'F:\temp\tmpthbz3d.h5', tent_flags = 1
File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5FD.c", line 993, in H5FD_open
open failed
File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5FDsec2.c", line 343, in H5FD_sec2_open
unable to open file: name = 'F:\temp\tmpthbz3d.h5', errno = 13, error message = 'Permission denied', flags = 1, o_flags = 2
End of HDF5 error back trace
Unable to open/create file 'F:\temp\tmpthbz3d.h5'
Process finished with exit code 1
What it boils down to is the last line, which says that permission is denied. However, I have allowed full read and write permissions on the folder, and I also tried running Python with administrator rights. My guess is that the issue is somehow connected to HDF5: HDF5 is trying to create this temporary file but fails. At this point I am clueless about how to solve this issue and would be happy about any input.
This question might be connected to http://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file, but since the mode there is set to 'w', I had no luck with it.
Thanks
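One plausible cause on Windows (an assumption, not confirmed by this traceback alone): DBSCAN_multiplex creates its HDF5 scratch file with NamedTemporaryFile(..., delete = True), which keeps the file open, and on Windows a file opened that way generally cannot be opened a second time by another handle, so PyTables' tables.open_file(..., mode = 'r+') fails with errno 13. A minimal sketch of the safer pattern, creating the file with tempfile.mkstemp and closing our descriptor before a second opener touches the path:

```python
import os
import tempfile

# Create the temp file, then close our own handle so that another
# library (e.g. PyTables) can reopen the same path, even on Windows.
fd, h5_path = tempfile.mkstemp(suffix='.h5')
os.close(fd)

try:
    # Stand-in for tables.open_file(h5_path, mode='r+'):
    # any second opener can now access the path freely.
    with open(h5_path, 'r+b') as f:
        f.write(b'placeholder')
finally:
    os.remove(h5_path)  # clean up manually; delete-on-close is gone
```

The trade-off is that cleanup becomes manual (the finally block above), whereas NamedTemporaryFile with delete = True removes the file automatically.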
I have the Windows 10 operating system and I am using Python 2.7 from the python.org distribution.
When I execute the example given on the wiki page:
import numpy as np
import DBSCAN_multiplex as DB
data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] / 10
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = int)
for i in xrange(N_iterations):
subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False)
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
I get the following error:
---------------------------------------------------------------------------
HDF5ExtError Traceback (most recent call last)
<ipython-input-77-1dd50a293e38> in <module>()
1 #eps, labels_matrix = DB.DBSCAN(data, minPts = 3, verbose = True,eps=0.3,metric='cosine')
----> 2 eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix)
c:\python27\lib\site-packages\DBSCAN_multiplex.pyc in DBSCAN(data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
606
607 with NamedTemporaryFile('w', suffix = '.h5', delete = True, dir = './') as f:
--> 608 eps = load(f.name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
609
610 for run in xrange(N_runs):
c:\python27\lib\site-packages\DBSCAN_multiplex.pyc in load(hdf5_file_name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
318 beg_neigh = time.time()
319
--> 320 fileh = tables.open_file(hdf5_file_name, mode = 'r+')
321 DBSCAN_group = fileh.create_group(fileh.root, 'DBSCAN_group')
322
c:\python27\lib\site-packages\tables\file.pyc in open_file(filename, mode, title, root_uep, filters, **kwargs)
318
319 # Finally, create the File instance, and return it
--> 320 return File(filename, mode, title, root_uep, filters, **kwargs)
321
322
c:\python27\lib\site-packages\tables\file.pyc in __init__(self, filename, mode, title, root_uep, filters, **kwargs)
781
782 # Now, it is time to initialize the File extension
--> 783 self._g_new(filename, mode, **params)
784
785 # Check filters and set PyTables format version for new files.
tables\hdf5extension.pyx in tables.hdf5extension.File._g_new (tables\hdf5extension.c:5519)()
HDF5ExtError: HDF5 error back trace
File "C:\conda\conda-bld\work\hdf5-1.8.17\src\H5F.c", line 604, in H5Fopen
unable to open file
File "C:\conda\conda-bld\work\hdf5-1.8.17\src\H5Fint.c", line 992, in H5F_open
unable to open file: time = Mon Jan 29 00:05:33 2018
, name = 'c:\Python27\Scripts\tmp0ak3rd.h5', tent_flags = 1
File "C:\conda\conda-bld\work\hdf5-1.8.17\src\H5FD.c", line 993, in H5FD_open
open failed
File "C:\conda\conda-bld\work\hdf5-1.8.17\src\H5FDsec2.c", line 339, in H5FD_sec2_open
unable to open file: name = 'c:\Python27\Scripts\tmp0ak3rd.h5', errno = 13, error message = 'Permission denied', flags = 1, o_flags = 2
End of HDF5 error back trace
Unable to open/create file 'c:\Python27\Scripts\tmp0ak3rd.h5'
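Note that this traceback shows the scratch file landing in c:\Python27\Scripts, because the library passes dir = './' to NamedTemporaryFile (line 607 above), so the temp file is created in the current working directory. A quick sanity check you can run before calling DB.DBSCAN, to rule out an unwritable working directory:

```python
import os

cwd = os.getcwd()
# DBSCAN_multiplex creates its scratch .h5 file in the current
# working directory (NamedTemporaryFile(..., dir='./')), so that
# directory must be writable by the current user.
if not os.access(cwd, os.W_OK):
    raise RuntimeError("current directory %r is not writable; "
                       "cd to a user-writable folder first" % cwd)
print("ok: %s is writable" % cwd)
```

Running the script from a user-writable folder (rather than from inside the Python installation tree) sidesteps at least this variant of the errno 13 failure.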
eps, labels_matrix = DB.DBSCAN(textVect, minPts = 1000, verbose = True,metric='euclidean')
I tried running DBSCAN on input data of dimension 300000*300 which came from doc2vec output. However, I am getting a memory error, as follows.
INFO: DBSCAN_multiplex @ load:
starting the determination of an appropriate value of 'eps' for this data-set and for the other parameter of the DBSCAN algorithm set to 1000.
This might take a while.
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-15-20e71b464c7a> in <module>()
----> 1 eps, labels_matrix = DB.DBSCAN(textVect, minPts = 1000, verbose = True,metric='euclidean')
c:\python27\lib\site-packages\DBSCAN_multiplex.pyc in DBSCAN(data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
600
601 with open(path.join(getcwd(), 'tmp.h5'), 'w') as f:
--> 602 eps = load(f.name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
603
604 for run in xrange(N_runs):
c:\python27\lib\site-packages\DBSCAN_multiplex.pyc in load(hdf5_file_name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
271 quantile = np.clip(quantile, 0, 100)
272
--> 273 k_distances = kneighbors_graph(data, minPts, mode = 'distance', metric = metric, p = p).data
274
275 radii = np.zeros(N_samples, dtype = float)
c:\python27\lib\site-packages\sklearn\neighbors\graph.pyc in kneighbors_graph(X, n_neighbors, mode, metric, p, metric_params, include_self, n_jobs)
101
102 query = _query_include_self(X, include_self)
--> 103 return X.kneighbors_graph(X=query, n_neighbors=n_neighbors, mode=mode)
104
105
c:\python27\lib\site-packages\sklearn\neighbors\base.pyc in kneighbors_graph(self, X, n_neighbors, mode)
487 elif mode == 'distance':
488 A_data, A_ind = self.kneighbors(
--> 489 X, n_neighbors, return_distance=True)
490 A_data = np.ravel(A_data)
491
c:\python27\lib\site-packages\sklearn\neighbors\base.pyc in kneighbors(self, X, n_neighbors, return_distance)
383 delayed(self._tree.query, check_pickle=False)(
384 X[s], n_neighbors, return_distance)
--> 385 for s in gen_even_slices(X.shape[0], n_jobs)
386 )
387 if return_distance:
c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self, iterable)
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
780 self._iterating = True
781 else:
c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.pyc in dispatch_one_batch(self, iterator)
623 return False
624 else:
--> 625 self._dispatch(tasks)
626 return True
627
c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.pyc in _dispatch(self, batch)
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
589 self._jobs.append(job)
590
c:\python27\lib\site-packages\sklearn\externals\joblib\_parallel_backends.pyc in apply_async(self, func, callback)
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
112 if callback:
113 callback(result)
c:\python27\lib\site-packages\sklearn\externals\joblib\_parallel_backends.pyc in __init__(self, batch)
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
333
334 def get(self):
c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
sklearn\neighbors\binary_tree.pxi in sklearn.neighbors.kd_tree.BinaryTree.query()
c:\python27\lib\site-packages\sklearn\utils\validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:
MemoryError:
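A back-of-the-envelope estimate suggests why this fails: kneighbors_graph with mode = 'distance' materializes one float64 distance and one index per (sample, neighbor) pair, so 300000 samples with minPts = 1000 need several gigabytes before DBSCAN proper even starts. A rough sketch of the arithmetic, assuming float64/int64 storage:

```python
n_samples, n_neighbors = 300000, 1000

# kneighbors_graph(mode='distance') stores one float64 distance and
# one integer index per (sample, neighbor) pair.
distances_gb = n_samples * n_neighbors * 8 / 1e9   # float64 distances
indices_gb = n_samples * n_neighbors * 8 / 1e9     # int64 indices
print("distances: %.1f GB, indices: %.1f GB" % (distances_gb, indices_gb))
# -> distances: 2.4 GB, indices: 2.4 GB
```

Reducing minPts, subsampling the 300000 rows, or reducing the 300-dimensional vectors (e.g. with PCA) would each shrink this footprint considerably.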
When I run this code:
import numpy as np
import DBSCAN_multiplex as DB
data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] / 10
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = float)#int)
for i in range(N_iterations):
subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False)
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
TypeError: 'float' object cannot be interpreted as an integer
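The TypeError most likely comes from true division under Python 3: 9 * data.shape[0] / 10 yields a float, and both np.zeros and np.random.choice require integer sizes (the dtype = float on the matrix compounds the problem, since its rows are meant to hold integer sample indices). A sketch of the corrected setup, using floor division and an integer dtype:

```python
import numpy as np

data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] // 10  # floor division -> int, not float
# Rows hold sample indices, so the matrix must have an integer dtype.
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype=int)
for i in range(N_iterations):
    subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace=False)
```

With these two changes the subsampling loop runs under both Python 2 and Python 3.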
Trying to run the setup, I get
File "setup.py", line 20
SyntaxError: Non-ASCII character '\xe2' in file setup.py on line 21, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
This should be fixable by adding
# -*- coding: utf-8 -*-
to the top of the file. The same applies to DBSCAN_multiplex.py.
I'll draft a quick PR!
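For context, PEP 263 only recognizes the declaration on line 1 or 2 of the file and matches it with a specific pattern. A small illustrative sketch (the helper name is mine, not part of any library) that mirrors that rule to check whether a source file declares an encoding:

```python
import re

# PEP 263: the declaration must appear on line 1 or 2 and match
# the pattern coding[:=]\s*([-\w.]+).
CODING_RE = re.compile(r'coding[:=]\s*([-\w.]+)')

def declared_encoding(source_text):
    """Return the declared source encoding, or None if absent."""
    for line in source_text.splitlines()[:2]:
        m = CODING_RE.search(line)
        if m:
            return m.group(1)
    return None

print(declared_encoding("# -*- coding: utf-8 -*-\nprint('hi')"))  # utf-8
print(declared_encoding("print('hi')"))                           # None
```

Under Python 2, a file containing non-ASCII bytes and no such declaration fails to compile with exactly the SyntaxError quoted above; Python 3 defaults to UTF-8, so the declaration is no longer strictly needed there.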
Here is the error message:
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.8.17, library is 1.8.18
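If rebuilding PyTables against matching HDF5 headers is not an option, the message itself offers an at-your-own-risk escape hatch: set HDF5_DISABLE_VERSION_CHECK before the library is first imported. A sketch:

```python
import os

# Must be set before the first `import tables`. Per the message:
# '1' disables the abort but keeps the warning; '2' or higher
# suppresses the warning entirely. Use at your own risk.
os.environ['HDF5_DISABLE_VERSION_CHECK'] = '2'

# import tables  # now skips the header/library version check
```

The clean fix remains installing a PyTables build whose bundled headers match the linked HDF5 library (here, both at 1.8.18 or both at 1.8.17), e.g. by reinstalling pytables and the hdf5 package from the same channel.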