Giter Site home page Giter Site logo

Performance about hirola HOT 5 CLOSED

coldfix avatar coldfix commented on July 21, 2024
Performance

from hirola.

Comments (5)

bwoodsend avatar bwoodsend commented on July 21, 2024 1

I somehow thought I had to pass the desired number of elements and hirola would then choose a good hash table length for me. I don't know where I got that idea.

For the record, I've just pushed 41e0ba3 to master which adds a warning if you let your table get too full so the next person to make that mistake should find out otherwise pretty quickly.

from hirola.

coldfix avatar coldfix commented on July 21, 2024

Part of the performance difference can be mitigated by not redoing the lookups in the case where we have many repeated values:

def hiro2(sought):
    unique, indices = np.unique(sought, return_inverse=True)
    return tab.get(unique)[indices]

print("hiro2", timeit(lambda: hiro2(sought), number=100))

Output:

hiro2 0.2717397840006015

Still a bit slower than pandas, but already much better in this situation.

from hirola.

bwoodsend avatar bwoodsend commented on July 21, 2024

Thanks for the interest and benchmark code. I hadn't heard of pandas.Categorical().

The solution to why hirola is performing slower is actually very simple although the explanation is less so. If I change:

tab = hirola.HashTable(len(known), known.dtype)

from your code to:

tab = hirola.HashTable(len(known) * 1.5, known.dtype)

then then my benchmark values go from:

hirola 5.591865312977461
pandas 0.24903952999738976

to:

hirola 0.11547893600072712
pandas 0.2469974830164574

The problem is that you're letting the table get full. You have 500 unique but a table size of 500. Once a hash table gets more than around 23 full the algorithm very quickly loses efficiciency, devolving towards plain brute force searching. I guess I need to put these last sentences somewhere more prominent.

And if you don't mind eating extra memory, you can keep adding extra space. With that 1.5 multiplier set to 2 I get 0.08231 and with it set to 3 I get 0.06492.

from hirola.

coldfix avatar coldfix commented on July 21, 2024

Oh, many thanks, that makes sense. I somehow thought I had to pass the desired number of elements and hirola would then choose a good hash table length for me. I don't know where I got that idea.

from hirola.

coldfix avatar coldfix commented on July 21, 2024

Awesome!

from hirola.

Related Issues (1)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.