Comments (15)
I do not quite understand the term "key offset".
You can just store an array of keys with slim, without values. Just pass nil as the value argument:
Line 192 in d27f7e9
from slim.
The feature I need from the trie is this:
Application 1 should load all the keys with their offsets, along with the data:
st, err := index.NewSlimIndex(keyOffsets, data)
Application 2 should only look up by key, without knowing all the keys and values. But initialising the trie object requires all keys and values:
st, err := index.NewSlimIndex(keyOffsets, data)
Is it possible to initialise the trie object without actually loading all the keys and data?
We are going to store 500 million key-value pairs, so having every application load all the keys and values just to initialise the trie object is a significant overhead. Rebuilding the whole trie for every version also keeps doubling the memory used, which hurts space complexity badly.
from slim.
Thanks for the explanation!
Is it possible to initialise the trie object without actually loading all the keys and data?
If what you want is to build a sparse index, e.g. to create an index for every 5 items, slimtrie provides a Range mode:
- Choose 1 key/value for every 5 key/values in your dataset. Build a slice.
- Build slimtrie from this slice with the option Complete: Bool(true). This eliminates false positives for range-get queries, e.g. searching for an item with a key in the range [1000, 1200).
  slim/trie/slimtrie_complete_test.go, lines 28 to 29 in d27f7e9
- Query slimtrie with RangeGet (line 985 in d27f7e9).
Reference: lines 64 to 75 in d27f7e9
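The first step above, choosing one key for every five, might be sketched as follows. This is only an illustration: sampleEvery is a hypothetical helper, not part of slim, and it assumes the full key set is already sorted. It records the position of each chosen key so that a lookup can later scan the small block that starts there.

```go
package main

import (
	"fmt"
	"sort"
)

// sampleEvery returns every n-th key of a sorted key slice, starting with the
// first one, together with the position of each chosen key in the full slice.
// Hypothetical helper for illustration; it is not part of the slim library.
func sampleEvery(sortedKeys []string, n int) (keys []string, positions []int32) {
	if n <= 0 {
		n = 1 // guard against a non-advancing loop
	}
	for i := 0; i < len(sortedKeys); i += n {
		keys = append(keys, sortedKeys[i])
		positions = append(positions, int32(i))
	}
	return keys, positions
}

func main() {
	all := []string{"k07", "k01", "k03", "k02", "k05", "k04", "k06", "k08"}
	sort.Strings(all) // the sampled keys must come from sorted input

	keys, positions := sampleEvery(all, 5)
	fmt.Println(keys, positions)
}
```

The sampled keys and positions are what would then be handed to the index constructor in place of the full data set.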
E.g., with a slimtrie built from the following keys:
{
"aa": 1,
"az": 2,
"cd": 3,
}
- RangeGet("a") returns nil, false
- RangeGet("ab") returns 1, true
- RangeGet("az") returns 2, true
- RangeGet("azz") returns 2, true
- RangeGet("b") returns 2, true
- RangeGet("d") returns 3, true
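The results listed above are consistent with a "largest indexed key not greater than the query" lookup. A minimal sketch of that semantic follows, using a sorted slice and sort.Search as a stand-in; it does not use slimtrie or its API, and sparseIndex and rangeGet are names made up for this illustration. Unlike the real RangeGet, it returns 0 rather than nil when nothing is found.

```go
package main

import (
	"fmt"
	"sort"
)

// sparseIndex is a stand-in for illustration only: a sorted slice searched
// with sort.Search, not slimtrie's compressed trie.
type sparseIndex struct {
	keys   []string // sorted ascending
	values []int    // values[i] belongs to keys[i]
}

// rangeGet returns the value of the largest indexed key that is <= key.
// It reports false when key sorts before every indexed key.
func (s sparseIndex) rangeGet(key string) (int, bool) {
	// i is the first position whose key is strictly greater than the query.
	i := sort.Search(len(s.keys), func(j int) bool { return s.keys[j] > key })
	if i == 0 {
		return 0, false
	}
	return s.values[i-1], true
}

func main() {
	idx := sparseIndex{
		keys:   []string{"aa", "az", "cd"},
		values: []int{1, 2, 3},
	}
	for _, q := range []string{"a", "ab", "az", "azz", "b", "d"} {
		v, ok := idx.rangeGet(q)
		fmt.Printf("rangeGet(%q) = %d, %v\n", q, v, ok)
	}
}
```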
from slim.
@drmingdrmer What would be the CPU, memory, and time cost of loading and initialising the trie object with 1 lakh (100,000) unique key-value pairs, up to and including this call: st, err := index.NewSlimIndex(keyOffsets, data)
from slim.
I'm not very sure about the memory cost.
For the CPU cost, BenchmarkNewSlimTrie should tell you.
There is a benchmark with a setup similar to your case:
BenchmarkNewSlimTrie/200kweb2-4 1000000 1002 ns/op
Building a slimtrie from 2 lakh words collected from the web takes about 1 µs per key, i.e. about 100 milliseconds for 1 lakh words.
But the performance varies with different key sets. You may want to benchmark it yourself :D
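The arithmetic behind that estimate can be written out as a small sketch. It assumes the build cost scales linearly with the number of keys, which the benchmark alone does not guarantee, so treat the numbers as rough. estimateBuild is a made-up helper name.

```go
package main

import (
	"fmt"
	"time"
)

// estimateBuild scales a measured per-key cost linearly to n keys.
// Linear scaling is an assumption, not a measured property of slimtrie.
func estimateBuild(perKey time.Duration, n int) time.Duration {
	return perKey * time.Duration(n)
}

func main() {
	perKey := 1002 * time.Nanosecond // from the benchmark line above

	fmt.Println(estimateBuild(perKey, 100_000))     // 1 lakh keys
	fmt.Println(estimateBuild(perKey, 500_000_000)) // the 500 million keys mentioned earlier
}
```

Under the same assumption, 1 lakh keys comes to roughly 100 ms and 500 million keys to several minutes, so a 30-second build for 1 lakh keys would be far outside what the benchmark suggests.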
from slim.
But this wasn't the case for me. It was taking around 30 seconds for 1 lakh words. Please advise.
I am attaching the sample code and the key-value file:
ild.go.txt
kv.csv.txt
from slim.
May I have your complete csv file for a test?
from slim.
I mean, a test with at least 1 Lakh of lines.
from slim.
Here is the test csv file
from slim.
This is quite small. Did you mean that creating a slimtrie from this 142-byte file takes 30 seconds?
from slim.
I may need your 1 Lakh words file to see what takes so much time.
from slim.
Yes, creating a slimtrie object from the csv file was taking around 30 seconds. The file contains only the key (a 12-digit number) and the value (2 to 3 characters) used to create the slimtrie object. You can refer to the attached sample program as well.
from slim.
With the file you provided, it takes only 0.6 seconds.
The file kv.csv.txt is quite small; I do not know why you said it takes 30 seconds 🤔
time go run ild.go
Status false 88.153µs
real 0m0.667s
user 0m0.524s
sys 0m0.258s
from slim.
My system has a 2-core CPU and 1 GB of RAM, so I am testing on a machine with limited resources. What processor do you have?
Does the trie implementation use persistent storage?
from slim.
I mainly test slimtrie on my iMac: 3.8 GHz Core i5, 4 cores.
No. Creating a slimtrie is a purely in-memory operation.
Maybe you could profile it on your machine with a benchmark, to see what costs most of the time:
go test ./... -cpuprofile prof.cpu -memprofile prof.mem -bench=. -run=none
And I still cannot believe that a 7-line input file takes that much time.
wc kv.csv.txt
7 8 142 kv.csv.txt
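Besides profiling, one way to narrow this down is to time each phase inside the program itself. A wall-clock measurement such as time go run also includes compiling the program, which may be a large share on a small machine. The sketch below is an assumption-laden illustration: timed is a made-up helper, the keys are generated 12-digit strings rather than the real csv, and sorting stands in for index construction, since the real build call is not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// timed runs fn and returns how long it took.
func timed(fn func()) time.Duration {
	start := time.Now()
	fn()
	return time.Since(start)
}

func main() {
	keys := make([]string, 100_000)

	// Phase 1: produce the keys (stand-in for reading and parsing the csv).
	loadTime := timed(func() {
		for i := range keys {
			keys[i] = fmt.Sprintf("%012d", (i*7919)%len(keys))
		}
	})

	// Phase 2: stand-in for building the index; replace with the real build call.
	buildTime := timed(func() {
		sort.Strings(keys)
	})

	fmt.Println("load:", loadTime, "build:", buildTime)
}
```

If both phases report small durations while the overall run still takes tens of seconds, the time is being spent outside the index code.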
from slim.