davisidarta / dbmap

A fast, accurate, and modular dimensionality reduction approach based on diffusion harmonics and graph layouts. Scales to millions of samples on a personal laptop. Adds the intrinsic structure of high-dimensional big data to your clustering and data visualization workflow.

Home Page: https://dbmap.readthedocs.io/en/latest/

License: GNU General Public License v2.0

Python 100.00%
denoising diffusion-process dimensionality-reduction graph-layout high-dimensional machine-learning nearest-neighbors single-cell umap visualization

dbmap's Introduction

Hi! I'm Davi

I develop tools to understand and interpret high-dimensional data, with a focus on single-cell omics.

  • I developed TopOMetry, a comprehensive framework for high-dimensional data analysis. TopOMetry learns similarity graphs, estimates the dimensionality of the data, obtains latent dimensions using topological operators, clusters samples, and lays out topological graphs as two-dimensional visualizations. TopOMetry learns and evaluates dozens of possible visualizations so that users do not have to stick with any predetermined model (e.g. t-SNE or UMAP). It was designed to be compatible with a scikit-learn-centered workflow, as most classes and functions can be pipelined. The TopOMetry manuscript is freely available on bioRxiv.

  • I'm currently a postdoc in Ana Domingos' lab at the University of Oxford. We are generating and analyzing single-cell datasets from a variety of tissues relevant to obesity and metabolism to build updated, comprehensive neuroanatomical maps with cellular resolution. These will serve as a foundation for new studies investigating cell-specific therapeutic targets for obesity and its comorbidities.

I'm always open to interesting conversations and enjoy getting involved in many projects. Feel free to reach me by email.

I tweet about medicine, neuroscience, computational biology, machine learning, and sometimes about my personal life.

dbmap's People

Contributors

apcamargo, davisidarta

dbmap's Issues

[Bug] `Metric jaccard available for string data only.` Jaccard should also be available for boolean data

I'm trying to use this package to compute diffusion maps on my boolean data. However, I'm unable to use Jaccard distance because it is expecting strings. Jaccard should also be able to work on boolean data.

Is there any way I can use the approximate nearest neighbors with boolean data and Jaccard distance?

X.dtypes.unique()
# array([dtype('bool')], dtype=object)

model = dm.diffusion.Diffusor(n_neighbors=int(X.shape[0] * (1/3)), ann_dist="jaccard", knn_dist="jaccard", ann=True)
model.fit(X)
# Input data is <class 'pandas.core.frame.DataFrame'> .Converting input to sparse...

# ---------------------------------------------------------------------------
# RuntimeError                              Traceback (most recent call last)
# File <timed exec>:26

# File ~/miniconda3/envs/mns-interactive_env/lib/python3.8/site-packages/dbmap/diffusion.py:142, in Diffusor.fit(self, data)
#     139     raise Exception('Kernel must be either \'simple\', \'simple_adaptive\', \'decay\' or \'decay_adaptive\'.') 
#     140 if self.ann:
#     141     # Construct an approximate k-nearest-neighbors graph
# --> 142     anbrs = ann.NMSlibTransformer(n_neighbors=self.n_neighbors,
#     143                                   metric=self.ann_dist,
#     144                                   p=self.p,
#     145                                   method='hnsw',
#     146                                   n_jobs=self.n_jobs,
#     147                                   M=self.M,
#     148                                   efC=self.efC,
#     149                                   efS=self.efS,
#     150                                   verbose=self.verbose).fit(data)
#     151     knn = anbrs.transform(data)
#     152     # X, y specific stds: Normalize by the distance of median nearest neighbor to account for neighborhood size.

# File ~/miniconda3/envs/mns-interactive_env/lib/python3.8/site-packages/dbmap/ann.py:188, in NMSlibTransformer.fit(self, data)
#     186         print('Metric ' + self.metric + 'available for string data only. Trying to compute distances...')
#     187         data = data.toarray()
# --> 188         self.nmslib_ = nmslib.init(method=self.method,
#     189                                    space=self.space,
#     190                                    data_type=nmslib.DataType.OBJECT_AS_STRING)
#     191 else:
#     192     self.space = {
#     193         'sqeuclidean': 'l2',
#     194         'euclidean': 'l2',
#    (...)
#     204         'jansen-shan': 'jsmetrfastapprox'
#     205     }[self.metric]

# RuntimeError: 2024-05-23 00:14:26 spacefactory.h:50 (CreateSpace) It looks like the space jaccard is not defined for the distance type : FLOAT
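The log line above says the Jaccard metric is "available for string data only". As a rough sketch of what that may imply, the snippet below turns each boolean row into a space-separated string of the column indices that are True, and also computes the exact Jaccard distance directly on boolean rows for comparison. The helper names `bool_row_to_id_string` and `jaccard_distance` are hypothetical and not part of dbmap, and whether the ANN backend accepts this exact string format is an assumption that would need checking against its documentation.

```python
from typing import List, Sequence


def bool_row_to_id_string(row: Sequence[bool]) -> str:
    """Encode a boolean row as a space-separated string of the indices set to True.

    Hypothetical helper: the target string format is an assumption, not dbmap's API.
    """
    return " ".join(str(i) for i, flag in enumerate(row) if flag)


def jaccard_distance(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Exact Jaccard distance on boolean rows: 1 - |A and B| / |A or B|."""
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-False rows are treated as identical (distance 0) by convention here.
    return 0.0 if union == 0 else 1.0 - intersection / union


# Toy boolean data standing in for the reporter's DataFrame rows.
rows: List[List[bool]] = [
    [True, False, True, True],
    [True, True, False, True],
    [False, False, False, False],
]

encoded = [bool_row_to_id_string(r) for r in rows]
print(encoded)
print(jaccard_distance(rows[0], rows[1]))
```

The exact distance is only practical for small inputs, since comparing every pair of rows is quadratic in the number of samples. It can still serve as a reference when validating an approximate search.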

[Bug] .transform method returns incorrect shapes

I'm trying to use dbMAP to fit a model and then transform new data into the diffusion space. However, I noticed that when I transform new data it returns the original data's shape. Further, the number of components is off in both cases.

Am I doing something incorrectly here?

import dbmap as dm
dm.__version__
# '1.2.0.4'

print(X.shape, X.dtypes.unique())
# (1000, 100) [dtype('bool')]

print(Y.shape, Y.dtypes.unique())
# (199, 100) [dtype('bool')]

model = dm.diffusion.Diffusor(n_neighbors=10, n_components=50, ann=False, knn_dist="jaccard")
model.fit(X)
print(model.transform(X).shape)
# (1000, 62)

print(model.transform(Y).shape)
# (1000, 72)
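To make the mismatch explicit, here is a minimal sketch of the check the report implies, assuming the scikit-learn convention that `transform` returns one row per input sample and `n_components` columns. Whether dbmap intends to follow that convention is not confirmed by this thread, and `shape_problems` is a hypothetical helper rather than part of the package.

```python
from typing import List, Tuple


def shape_problems(actual: Tuple[int, int], n_samples: int, n_components: int) -> List[str]:
    """List the ways an embedding shape differs from (n_samples, n_components)."""
    problems = []
    if actual[0] != n_samples:
        problems.append(f"rows: got {actual[0]}, expected {n_samples}")
    if actual[1] != n_components:
        problems.append(f"columns: got {actual[1]}, expected {n_components}")
    return problems


# Shapes copied from the report: X has 1000 samples, Y has 199, n_components=50.
problems_x = shape_problems((1000, 62), n_samples=1000, n_components=50)
problems_y = shape_problems((1000, 72), n_samples=199, n_components=50)
print(problems_x)
print(problems_y)
```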

dbmap in reticulate

Hi!

Is it possible to use this through reticulate?

The fast approximate nearest-neighbour step works well. However, in the fast adaptive multiscale diffusion maps step, the last part of the code does not work:

ind, dist, grad, graph = diff.ind_dist_grad(data)

In R, I called it as:

diff$ind_dist_grad(data)

However, this call raises the following error:

Error in py_call_impl(callable, dots$args, dots$keywords) :
RuntimeError: 2021-01-04 17:55:21 spacefactory.h:50 (CreateSpace) It looks like the space cosine is not defined for the distance type : FLOAT

Detailed traceback:
File "/Users/knight05/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/dbmap/diffusion.py", line 406, in ind_dist_grad
).fit(mms)
File "/Users/knight05/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/dbmap/ann.py", line 145, in fit
data_type=nmslib.DataType.DENSE_VECTOR)

Are you able to indicate why this could be happening?
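The error message suggests the metric name `cosine` reached the backend untranslated. A traceback elsewhere on this page shows dbmap's `ann.py` mapping user-facing metric names to backend space names (for example `'euclidean': 'l2'`). The sketch below illustrates that kind of lookup with an explicit error for unknown names. The `'cosine': 'cosinesimil'` entry is an assumption about the backend's naming, and `resolve_space` is a hypothetical helper, not dbmap's code.

```python
# Entries other than 'cosine' appear in the ann.py traceback on this page.
# The 'cosine' entry is an assumption about the backend's space name.
METRIC_TO_SPACE = {
    "sqeuclidean": "l2",
    "euclidean": "l2",
    "jansen-shan": "jsmetrfastapprox",
    "cosine": "cosinesimil",  # assumed, verify against the backend's documentation
}


def resolve_space(metric: str) -> str:
    """Translate a user-facing metric name into a backend space name."""
    try:
        return METRIC_TO_SPACE[metric]
    except KeyError:
        supported = ", ".join(sorted(METRIC_TO_SPACE))
        raise ValueError(
            f"Unknown metric {metric!r}. Supported metrics: {supported}"
        ) from None


print(resolve_space("cosine"))
```

Failing early with the list of supported names gives a clearer message than the backend's `CreateSpace` error, which only reports that the space is undefined for the distance type.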

[Error|Feature Request] - Illegal instruction: 4 when loading package. Please make available through conda.

I've tried installing via pip and made sure all dependencies including scikit-build were installed prior. I get a fatal error when importing the package.

Can you build a conda package? Even if you only push it to a personal channel, it would be very helpful.

Python 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:36:46)
[Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dbmap as dm
Illegal instruction: 4
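An "Illegal instruction" crash at import time often means a compiled extension uses CPU instructions, or targets an architecture, that the running machine does not support. That has not been confirmed as the cause here. A small sketch for collecting the details worth attaching to a report like this one, using only the standard library (`describe_runtime` is a hypothetical helper name):

```python
import platform
import sys


def describe_runtime() -> dict:
    """Collect interpreter and machine details relevant to native-extension crashes."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),    # e.g. 'Darwin' on macOS
        "machine": platform.machine(),  # CPU architecture as reported by the OS
        "executable": sys.executable,
    }


info = describe_runtime()
for key, value in info.items():
    print(f"{key}: {value}")
```

Run this in the same environment that crashes, before importing the package, so the output reflects the interpreter that actually loads the compiled dependencies.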
