
Numba support (dcor issue, closed)

vnmabus commented on June 17, 2024
Numba support

from dcor.

Comments (3)

vnmabus commented on June 17, 2024

Can you please clarify what you mean by "can't support Numba"? This package already uses Numba to accelerate its internal computations whenever the "fast distance covariance algorithm" can be used instead of the original O(N^2) algorithm.
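
For readers unfamiliar with the quadratic algorithm mentioned above, here is a minimal NumPy sketch of it. It illustrates the textbook biased estimator built from full pairwise distance matrices and is not dcor's own code; the function name `naive_distance_correlation` is made up for this example.

```python
import numpy as np


def naive_distance_correlation(x, y):
    """Biased sample distance correlation from full pairwise distance matrices."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D input as n observations of a scalar variable.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]

    def centered(v):
        # n x n Euclidean distance matrix: this is the O(N^2) time and memory cost.
        d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
        # Double centering: subtract row and column means, add the grand mean.
        return (d - d.mean(axis=0, keepdims=True)
                - d.mean(axis=1, keepdims=True) + d.mean())

    a, b = centered(x), centered(y)
    dcov2_xy = (a * b).mean()
    dvar2 = np.sqrt((a * a).mean() * (b * b).mean())
    if dvar2 == 0:
        return 0.0  # a constant sample has no distance variance
    return float(np.sqrt(max(dcov2_xy, 0.0) / dvar2))
```

For a sample `x` and any exact affine transform such as `3 * x + 2`, this returns 1 up to rounding, which makes a quick sanity check.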


asemic-horizon commented on June 17, 2024

Hi. First let me apologize for the tone of that question. Besides being vague, it comes off as really rude. I don't know how that came from me. I wanted to know if there was something marginal that I could fix and make everything work.

Second, I can't reproduce the problem for arbitrary data. The problem doesn't happen with iris (which is small, 150 rows x 4 cols), but it does repeat with fashion-mnist (which is larger, on the order of tens of thousands of rows and (28*28) columns -- but not huge either.)

I came to believe that this was a dcor problem because I had used other custom metrics successfully -- but, as it turns out, not on datasets as large as fashion-mnist.

I'm including some details on my reproduction attempts, but it's not clearly a dcor problem; it's probably the approximate nearest-neighbors algorithm that UMAP uses.


First, since UMAP is built on the idea of nearest neighbors (although it doesn't use the stock/exact algorithm), I tried the following (code 1), which works:

from dcor import distance_correlation
from numba import jit
from sklearn.datasets import load_iris
from sklearn.neighbors import kneighbors_graph

iris = load_iris()


@jit
def distcor(x, y):
    # Turn the correlation (in [0, 1]) into a dissimilarity.
    return 1 - distance_correlation(x, y)


g = kneighbors_graph(iris.data, 2, mode='distance', metric='pyfunc',
                     metric_params={'func': distcor})

Attempting to run this for fashion-mnist takes more than 20 minutes (the exact algorithm is expected to blow up on large datasets anyway), so I gave up before any errors came up.

The following (code 2) runs UMAP itself, and it works:

from umap import UMAP
embedding = UMAP(metric = distcor, n_neighbors = 4).fit_transform(iris.data)

Code 2 for fashion-mnist fails very loudly.

TypingError: Failed at nopython (nopython frontend)
Invalid usage of type(CPUDispatcher(<function distcor at 0x000001FC83B2B378>)) with parameters (array(float32, 1d, C), array(float32, 1d, C))
 * parameterized
[1] During: resolving callee type: type(CPUDispatcher(<function distcor at 0x000001FC83B2B378>))
[2] During: typing of call at C:\Users\Diego Navarro - FGV\Anaconda3b\lib\site-packages\umap\nndescent.py (65)


File "..\..\Anaconda3b\lib\site-packages\umap\nndescent.py", line 65:
    def nn_descent(
        <source elided>
            for j in range(indices.shape[0]):
                d = dist(data[i], data[indices[j]], *dist_args)
                ^

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Numba visit:
http://numba.pydata.org/numba-doc/dev/reference/pysupported.html
and
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

For more information about typing errors and how to debug them visit:
http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile

If you think your code should work with Numba, please report the error message
and traceback, along with a minimal reproducer at:
https://github.com/numba/numba/issues/new
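
One plausible reading of this traceback (an assumption, not something confirmed in this thread): UMAP's `nn_descent` is compiled in nopython mode, so the metric it calls has to compile in nopython mode as well. `distcor` calls into dcor's Python code, so a plain `@jit` wrapper can at best fall back to object mode, and an object-mode function cannot be typed as a callee from nopython code. If UMAP takes an exact-distance path for small inputs such as iris, that would also explain why only the larger dataset fails. A hedged workaround sketch is to write the metric using only Numba-supported operations. The names `_centered_distances` and `distcor_nopython` are hypothetical, and the `try/except` is only there so the sketch also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without Numba
    def njit(func):
        return func


@njit
def _centered_distances(v):
    # Pairwise |v[i] - v[j]| for a 1-D vector, then double centering.
    n = v.shape[0]
    d = np.empty((n, n), dtype=np.float64)
    for i in range(n):
        for j in range(n):
            d[i, j] = abs(v[i] - v[j])
    row_mean = np.empty(n, dtype=np.float64)
    for i in range(n):
        row_mean[i] = d[i].mean()
    grand_mean = row_mean.mean()
    # d is symmetric, so column means equal row means.
    for i in range(n):
        for j in range(n):
            d[i, j] = d[i, j] - row_mean[i] - row_mean[j] + grand_mean
    return d


@njit
def distcor_nopython(x, y):
    # 1 - (biased) distance correlation, treating each component of the
    # two vectors as one scalar observation.
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2_xy = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 1.0  # constant vector: correlation taken as 0
    return 1.0 - np.sqrt(max(dcov2_xy, 0.0) / denom)
```

In principle this could then be passed as `UMAP(metric=distcor_nopython, ...)`, but that has not been tried here, and the values may differ slightly from dcor's depending on which estimator dcor uses.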

Third, I ran some experiments with the pynndescent library, which appears to be a close cousin of the nndescent module used inside UMAP. Surprisingly, this doesn't work even for iris:

from pynndescent import NNDescent

index = NNDescent(iris.data, metric=distcor)
# Query the index with the same data it was built from.
u, _ = index.query(iris.data, k=1)

Since UMAP and pynndescent share maintainers, I should probably take this up with them.


vnmabus commented on June 17, 2024

Ok, thank you for the clarification. I did not see your question as rude at all, but I am not a native speaker. When I read your question, I thought that you were asking for GPU or compiled versions of the distance covariance/correlation functions via Numba. I think that would be useful for speeding up computations, but I do not have time to implement it right now. However, if I have understood your answer correctly, your problem lay elsewhere. I hope you can find and fix it easily.

