Comments (5)
I somehow thought I had to pass the desired number of elements and hirola would then choose a good hash table length for me. I don't know where I got that idea.
For the record, I've just pushed 41e0ba3 to master which adds a warning if you let your table get too full so the next person to make that mistake should find out otherwise pretty quickly.
from hirola.
Part of the performance difference can be mitigated by not redoing the lookups in the case where we have many repeated values:
def hiro2(sought):
unique, indices = np.unique(sought, return_inverse=True)
return tab.get(unique)[indices]
print("hiro2", timeit(lambda: hiro2(sought), number=100))
Output:
hiro2 0.2717397840006015
Still a bit slower than pandas, but already much better in this situation.
from hirola.
Thanks for the interest and benchmark code. I hadn't heard of pandas.Categorical()
.
The solution to why hirola is performing slower is actually very simple although the explanation is less so. If I change:
tab = hirola.HashTable(len(known), known.dtype)
from your code to:
tab = hirola.HashTable(len(known) * 1.5, known.dtype)
then then my benchmark values go from:
hirola 5.591865312977461
pandas 0.24903952999738976
to:
hirola 0.11547893600072712
pandas 0.2469974830164574
The problem is that you're letting the table get full. You have 500 unique but a table size of 500. Once a hash table gets more than around 2⁄3 full the algorithm very quickly loses efficiciency, devolving towards plain brute force searching. I guess I need to put these last sentences somewhere more prominent.
And if you don't mind eating extra memory, you can keep adding extra space. With that 1.5
multiplier set to 2
I get 0.08231 and with it set to 3
I get 0.06492.
from hirola.
Oh, many thanks, that makes sense. I somehow thought I had to pass the desired number of elements and hirola would then choose a good hash table length for me. I don't know where I got that idea.
from hirola.
Awesome!
from hirola.
Related Issues (1)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from hirola.