ggiecold-zz / dbscan_multiplex
A fast and efficient implementation of DBSCAN clustering.
License: MIT License
I have the following code to run, but HDF5 tells me in the error back trace that I have no permission to create the temporary file.
The code I run is a script that uses DBSCAN_multiplex.py:
import numpy as np
import DBSCAN_multiplex as DB
print("hey ;-)")
data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] / 10
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = int)
for i in xrange(N_iterations):
    subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False)
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
The error message in PyCharm is:
C:\Anaconda2\python.exe E:/Dropbox/DATA/research/GhostTowns/WB/untitled/dbscan.py
hey ;-)
INFO: DBSCAN_multiplex @ load:
starting the determination of an appropriate value of 'eps' for this data-set and for the other parameter of the DBSCAN algorithm set to 3.
This might take a while.
INFO: DBSCAN_multiplex @ load:
done with evaluating parameter 'eps' from the data-set provided. This took 1.737 seconds. Value of epsilon: 0.921.
INFO: DBSCAN_multiplex @ load:
identifying the neighbors within an hypersphere of radius 0.921 around each sample, while at the same time evaluating the number of epsilon-neighbors for each sample.
This might take a fair amount of time.
Traceback (most recent call last):
File "E:/Dropbox/DATA/research/GhostTowns/WB/untitled/dbscan.py", line 9, in <module>
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
File "C:\Anaconda2\lib\site-packages\DBSCAN_multiplex.py", line 656, in DBSCAN
eps = load(f.name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
File "C:\Anaconda2\lib\site-packages\DBSCAN_multiplex.py", line 354, in load
fileh = tables.open_file(hdf5_file_name, mode = 'r+')
File "C:\Anaconda2\lib\site-packages\tables\file.py", line 318, in open_file
return File(filename, mode, title, root_uep, filters, **kwargs)
File "C:\Anaconda2\lib\site-packages\tables\file.py", line 784, in __init__
self._g_new(filename, mode, **params)
File "tables\hdf5extension.pyx", line 488, in tables.hdf5extension.File._g_new (tables\hdf5extension.c:5458)
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5F.c", line 604, in H5Fopen
unable to open file
File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5Fint.c", line 990, in H5F_open
unable to open file: time = Mon Dec 07 18:12:12 2015
, name = 'F:\temp\tmpthbz3d.h5', tent_flags = 1
File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5FD.c", line 993, in H5FD_open
open failed
File "C:\Python27\conda-bld\work\hdf5-1.8.15-patch1\src\H5FDsec2.c", line 343, in H5FD_sec2_open
unable to open file: name = 'F:\temp\tmpthbz3d.h5', errno = 13, error message = 'Permission denied', flags = 1, o_flags = 2
End of HDF5 error back trace
Unable to open/create file 'F:\temp\tmpthbz3d.h5'
Process finished with exit code 1
What it boils down to is the last line, which says that permission is denied. However, I have allowed full read and write permissions on the folder, and I also tried running Python with administrator rights. My guess is that the issue is somehow connected to HDF5: HDF5 is trying to create this temporary file but fails. At this point I am clueless about how to solve this issue and would be happy about any input.
This question might be connected to http://stackoverflow.com/questions/23212435/permission-denied-to-write-to-my-temporary-file, but since the mode there is set to 'w', I had no luck with it.
Thanks
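One plausible cause on Windows (an assumption, not confirmed by this traceback alone): DBSCAN_multiplex creates its HDF5 scratch file with NamedTemporaryFile(..., delete = True), which keeps the file open, and on Windows a file opened that way generally cannot be opened a second time by another handle, so PyTables' tables.open_file(..., mode = 'r+') fails with errno 13. A minimal sketch of the safer pattern, creating the file with tempfile.mkstemp and closing our descriptor before a second opener touches the path:

```python
import os
import tempfile

# Create the temp file, then close our own handle so that another
# library (e.g. PyTables) can reopen the same path, even on Windows.
fd, h5_path = tempfile.mkstemp(suffix='.h5')
os.close(fd)

try:
    # Stand-in for tables.open_file(h5_path, mode='r+'):
    # any second opener can now access the path freely.
    with open(h5_path, 'r+b') as f:
        f.write(b'placeholder')
finally:
    os.remove(h5_path)  # clean up manually; delete-on-close is gone
```

The trade-off is that cleanup becomes manual (the finally block above), whereas NamedTemporaryFile with delete = True removes the file automatically.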
I have the Windows 10 operating system and I am using Python 2.7 from the python.org distribution.
When I execute the example given on the wiki page:
import numpy as np
import DBSCAN_multiplex as DB
data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] / 10
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = int)
for i in xrange(N_iterations):
subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False)
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
I get the following error:
---------------------------------------------------------------------------
HDF5ExtError Traceback (most recent call last)
<ipython-input-77-1dd50a293e38> in <module>()
1 #eps, labels_matrix = DB.DBSCAN(data, minPts = 3, verbose = True,eps=0.3,metric='cosine')
----> 2 eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix)
c:\python27\lib\site-packages\DBSCAN_multiplex.pyc in DBSCAN(data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
606
607 with NamedTemporaryFile('w', suffix = '.h5', delete = True, dir = './') as f:
--> 608 eps = load(f.name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
609
610 for run in xrange(N_runs):
c:\python27\lib\site-packages\DBSCAN_multiplex.pyc in load(hdf5_file_name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
318 beg_neigh = time.time()
319
--> 320 fileh = tables.open_file(hdf5_file_name, mode = 'r+')
321 DBSCAN_group = fileh.create_group(fileh.root, 'DBSCAN_group')
322
c:\python27\lib\site-packages\tables\file.pyc in open_file(filename, mode, title, root_uep, filters, **kwargs)
318
319 # Finally, create the File instance, and return it
--> 320 return File(filename, mode, title, root_uep, filters, **kwargs)
321
322
c:\python27\lib\site-packages\tables\file.pyc in __init__(self, filename, mode, title, root_uep, filters, **kwargs)
781
782 # Now, it is time to initialize the File extension
--> 783 self._g_new(filename, mode, **params)
784
785 # Check filters and set PyTables format version for new files.
tables\hdf5extension.pyx in tables.hdf5extension.File._g_new (tables\hdf5extension.c:5519)()
HDF5ExtError: HDF5 error back trace
File "C:\conda\conda-bld\work\hdf5-1.8.17\src\H5F.c", line 604, in H5Fopen
unable to open file
File "C:\conda\conda-bld\work\hdf5-1.8.17\src\H5Fint.c", line 992, in H5F_open
unable to open file: time = Mon Jan 29 00:05:33 2018
, name = 'c:\Python27\Scripts\tmp0ak3rd.h5', tent_flags = 1
File "C:\conda\conda-bld\work\hdf5-1.8.17\src\H5FD.c", line 993, in H5FD_open
open failed
File "C:\conda\conda-bld\work\hdf5-1.8.17\src\H5FDsec2.c", line 339, in H5FD_sec2_open
unable to open file: name = 'c:\Python27\Scripts\tmp0ak3rd.h5', errno = 13, error message = 'Permission denied', flags = 1, o_flags = 2
End of HDF5 error back trace
Unable to open/create file 'c:\Python27\Scripts\tmp0ak3rd.h5'
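Note that this traceback shows the scratch file landing in c:\Python27\Scripts, because the library passes dir = './' to NamedTemporaryFile (line 607 above), so the temp file is created in the current working directory. A quick sanity check you can run before calling DB.DBSCAN, to rule out an unwritable working directory:

```python
import os

cwd = os.getcwd()
# DBSCAN_multiplex creates its scratch .h5 file in the current
# working directory (NamedTemporaryFile(..., dir='./')), so that
# directory must be writable by the current user.
if not os.access(cwd, os.W_OK):
    raise RuntimeError("current directory %r is not writable; "
                       "cd to a user-writable folder first" % cwd)
print("ok: %s is writable" % cwd)
```

Running the script from a user-writable folder (rather than from inside the Python installation tree) sidesteps at least this variant of the errno 13 failure.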
eps, labels_matrix = DB.DBSCAN(textVect, minPts = 1000, verbose = True,metric='euclidean')
I tried running DBSCAN on input data of dimension 300000*300 which came from doc2vec output. However, I am getting a memory error, as follows.
INFO: DBSCAN_multiplex @ load:
starting the determination of an appropriate value of 'eps' for this data-set and for the other parameter of the DBSCAN algorithm set to 1000.
This might take a while.
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-15-20e71b464c7a> in <module>()
----> 1 eps, labels_matrix = DB.DBSCAN(textVect, minPts = 1000, verbose = True,metric='euclidean')
c:\python27\lib\site-packages\DBSCAN_multiplex.pyc in DBSCAN(data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
600
601 with open(path.join(getcwd(), 'tmp.h5'), 'w') as f:
--> 602 eps = load(f.name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
603
604 for run in xrange(N_runs):
c:\python27\lib\site-packages\DBSCAN_multiplex.pyc in load(hdf5_file_name, data, minPts, eps, quantile, subsamples_matrix, samples_weights, metric, p, verbose)
271 quantile = np.clip(quantile, 0, 100)
272
--> 273 k_distances = kneighbors_graph(data, minPts, mode = 'distance', metric = metric, p = p).data
274
275 radii = np.zeros(N_samples, dtype = float)
c:\python27\lib\site-packages\sklearn\neighbors\graph.pyc in kneighbors_graph(X, n_neighbors, mode, metric, p, metric_params, include_self, n_jobs)
101
102 query = _query_include_self(X, include_self)
--> 103 return X.kneighbors_graph(X=query, n_neighbors=n_neighbors, mode=mode)
104
105
c:\python27\lib\site-packages\sklearn\neighbors\base.pyc in kneighbors_graph(self, X, n_neighbors, mode)
487 elif mode == 'distance':
488 A_data, A_ind = self.kneighbors(
--> 489 X, n_neighbors, return_distance=True)
490 A_data = np.ravel(A_data)
491
c:\python27\lib\site-packages\sklearn\neighbors\base.pyc in kneighbors(self, X, n_neighbors, return_distance)
383 delayed(self._tree.query, check_pickle=False)(
384 X[s], n_neighbors, return_distance)
--> 385 for s in gen_even_slices(X.shape[0], n_jobs)
386 )
387 if return_distance:
c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self, iterable)
777 # was dispatched. In particular this covers the edge
778 # case of Parallel used with an exhausted iterator.
--> 779 while self.dispatch_one_batch(iterator):
780 self._iterating = True
781 else:
c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.pyc in dispatch_one_batch(self, iterator)
623 return False
624 else:
--> 625 self._dispatch(tasks)
626 return True
627
c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.pyc in _dispatch(self, batch)
586 dispatch_timestamp = time.time()
587 cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588 job = self._backend.apply_async(batch, callback=cb)
589 self._jobs.append(job)
590
c:\python27\lib\site-packages\sklearn\externals\joblib\_parallel_backends.pyc in apply_async(self, func, callback)
109 def apply_async(self, func, callback=None):
110 """Schedule a func to be run"""
--> 111 result = ImmediateResult(func)
112 if callback:
113 callback(result)
c:\python27\lib\site-packages\sklearn\externals\joblib\_parallel_backends.pyc in __init__(self, batch)
330 # Don't delay the application, to avoid keeping the input
331 # arguments in memory
--> 332 self.results = batch()
333
334 def get(self):
c:\python27\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
132
133 def __len__(self):
sklearn\neighbors\binary_tree.pxi in sklearn.neighbors.kd_tree.BinaryTree.query()
c:\python27\lib\site-packages\sklearn\utils\validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:
MemoryError:
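A back-of-the-envelope estimate suggests why this fails: kneighbors_graph with mode = 'distance' materializes one float64 distance and one index per (sample, neighbor) pair, so 300000 samples with minPts = 1000 need several gigabytes before DBSCAN proper even starts. A rough sketch of the arithmetic, assuming float64/int64 storage:

```python
n_samples, n_neighbors = 300000, 1000

# kneighbors_graph(mode='distance') stores one float64 distance and
# one integer index per (sample, neighbor) pair.
distances_gb = n_samples * n_neighbors * 8 / 1e9   # float64 distances
indices_gb = n_samples * n_neighbors * 8 / 1e9     # int64 indices
print("distances: %.1f GB, indices: %.1f GB" % (distances_gb, indices_gb))
# -> distances: 2.4 GB, indices: 2.4 GB
```

Reducing minPts, subsampling the 300000 rows, or reducing the 300-dimensional vectors (e.g. with PCA) would each shrink this footprint considerably.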
When I run this code:
import numpy as np
import DBSCAN_multiplex as DB
data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] / 10
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype = float)#int)
for i in range(N_iterations):
subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace = False)
eps, labels_matrix = DB.DBSCAN(data, minPts = 3, subsamples_matrix = subsamples_matrix, verbose = True)
TypeError: 'float' object cannot be interpreted as an integer
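The TypeError most likely comes from true division under Python 3: 9 * data.shape[0] / 10 yields a float, and both np.zeros and np.random.choice require integer sizes (the dtype = float on the matrix compounds the problem, since its rows are meant to hold integer sample indices). A sketch of the corrected setup, using floor division and an integer dtype:

```python
import numpy as np

data = np.random.randn(15000, 7)
N_iterations = 50
N_sub = 9 * data.shape[0] // 10  # floor division -> int, not float
# Rows hold sample indices, so the matrix must have an integer dtype.
subsamples_matrix = np.zeros((N_iterations, N_sub), dtype=int)
for i in range(N_iterations):
    subsamples_matrix[i] = np.random.choice(data.shape[0], N_sub, replace=False)
```

With these two changes the subsampling loop runs under both Python 2 and Python 3.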
Trying to run the setup, I get
File "setup.py", line 20
SyntaxError: Non-ASCII character '\xe2' in file setup.py on line 21, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
This should be fixable by adding
# -*- coding: utf-8 -*-
to the top of the file. The same applies to DBSCAN_multiplex.py.
I'll draft a quick PR!
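For context, PEP 263 only recognizes the declaration on line 1 or 2 of the file and matches it with a specific pattern. A small illustrative sketch (the helper name is mine, not part of any library) that mirrors that rule to check whether a source file declares an encoding:

```python
import re

# PEP 263: the declaration must appear on line 1 or 2 and match
# the pattern coding[:=]\s*([-\w.]+).
CODING_RE = re.compile(r'coding[:=]\s*([-\w.]+)')

def declared_encoding(source_text):
    """Return the declared source encoding, or None if absent."""
    for line in source_text.splitlines()[:2]:
        m = CODING_RE.search(line)
        if m:
            return m.group(1)
    return None

print(declared_encoding("# -*- coding: utf-8 -*-\nprint('hi')"))  # utf-8
print(declared_encoding("print('hi')"))                           # None
```

Under Python 2, a file containing non-ASCII bytes and no such declaration fails to compile with exactly the SyntaxError quoted above; Python 3 defaults to UTF-8, so the declaration is no longer strictly needed there.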
Here is the error message:
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.8.17, library is 1.8.18
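If rebuilding PyTables against matching HDF5 headers is not an option, the message itself offers an at-your-own-risk escape hatch: set HDF5_DISABLE_VERSION_CHECK before the library is first imported. A sketch:

```python
import os

# Must be set before the first `import tables`. Per the message:
# '1' disables the abort but keeps the warning; '2' or higher
# suppresses the warning entirely. Use at your own risk.
os.environ['HDF5_DISABLE_VERSION_CHECK'] = '2'

# import tables  # now skips the header/library version check
```

The clean fix remains installing a PyTables build whose bundled headers match the linked HDF5 library (here, both at 1.8.18 or both at 1.8.17), e.g. by reinstalling pytables and the hdf5 package from the same channel.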