Can the Trie Lookup happen without organizing the key offsets? I wan

I do not quite understand the term key offset . <p

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

I'm not very sure about the memory cost. For cpu cost, <code class="

Organising the Key Offsets ,about openacid/slim

drmingdrmer commented on June 2, 2024

I do not quite understand the term key offset.

You can just store an array of keys with slim without values. Just pass a nil to the value argument:

slim/trie/slimtrie.go

Line 192 in d27f7e9

    
           func NewSlimTrie(e encode.Encoder, keys []string, values interface{}, opts ...Opt) (*SlimTrie, error) {

from slim.

vasundhara785 commented on June 2, 2024

Feature i need using the Trie is

Application 1 should load all the keys with their offsets along with data.
st, err := index.NewSlimIndex(keyOffsets, data)

Application 2 should just look up with the key without actually knowing all keys and values but initialising the trie object is requiring all keys and values. st, err := index.NewSlimIndex(keyOffsets, data)

Is it possible to initialise the trie object without actually loading all the keys and data?

As we are going to have 500Million key value pairs to store. So every application loading all the keys and values to intialise the Trie object is an overhead for the application.Creating the whole trie every time for every version keeps doubling up memory and affects the Space Complexity very badly

from slim.

drmingdrmer commented on June 2, 2024

Thanks for the explanation!

Is it possible to initialise the trie object without actually loading all the keys and data?

If what you want is to build a sparse index, e.g. to create an index for every 5 items, slimtrie provides a Range mode:

Choose 1 key/value for every 5 key/values in your dataset. Build a slice.
Build slimtrie from this slice with option: Complete: Bool(true). This eliminates the false-positive for range-get query, e.g. searching for an item of a key in range [1000, 1200).

slim/trie/slimtrie_complete_test.go

Lines 28 to 29 in d27f7e9

    
           st, err := NewSlimTrie(encode.I32{}, keys, values, Opt{Complete: Bool(true)}) 
        
           ta.NoError(err)

Query slimtrie with RangeGet:

slim/trie/slimtrie_test.go

Line 985 in d27f7e9

v, found := st.RangeGet(key)

Reference:

slim/trie/slimtrie_query.go

Lines 64 to 75 in d27f7e9

    
           // RangeGet look for a range that contains a key in SlimTrie. 
        
           // 
        
           // A range that contains a key means range-start <= key <= range-end. 
        
           // 
        
           // It returns the value the range maps to, and a bool indicate if a range is 
        
           // found. 
        
           // 
        
           // A positive return value does not mean the range absolutely exists, which in 
        
           // this case, is a "false positive". 
        
           // 
        
           // Since 0.4.3 
        
           func (st *SlimTrie) RangeGet(key string) (interface{}, bool) {

E.g., with a slimtrie built from the following keys:

{
"aa": 1,
"az": 2,
"cd": 3,
}

RangeGet("a") returns nil, false
RangeGet("ab") returns 1, true
RangeGet("az") returns 2, true
RangeGet("azz") returns 2, true
RangeGet("b") returns 2, true
RangeGet("d") returns 3, true

from slim.

vasundhara785 commented on June 2, 2024

@drmingdrmer What would be the CPU , Memory and time taken for loading and intialising the Trie object with 1Lakh unique key value pairs till this st, err := index.NewSlimIndex(keyOffsets, data)

from slim.

drmingdrmer commented on June 2, 2024

I'm not very sure about the memory cost.

For cpu cost, BenchmarkNewSlimTrie should tell you.

There is a benchmark with a similar setup to your case:

BenchmarkNewSlimTrie/200kweb2-4          1000000              1002 ns/op

Building a slimtrie from 2 Lakh of words collected from web takes 1 us/key, i.e., 100 milliseconds for 1 Lakh words.

But the performance varies with different key sets. You may like to benchmark it yourself:DDD

from slim.

vasundhara785 commented on June 2, 2024

But this wasn't the case for me. It was taking around 30seconds for 1Lakh words. Please suggest .

Attaching the sample code and key value.
ild.go.txt
kv.csv.txt

from slim.

drmingdrmer commented on June 2, 2024

May I have your complete csv file for a test?

from slim.

drmingdrmer commented on June 2, 2024

I mean, a test with at least 1 Lakh of lines.

from slim.

vasundhara785 commented on June 2, 2024

kv.csv.txt

Here is the test csv file

from slim.

drmingdrmer commented on June 2, 2024

This is quite small. Did you mean that creating slimtrie from this 142 bytes file takes 30 seconds??

from slim.

drmingdrmer commented on June 2, 2024

I may need your 1 Lakh words file to see what takes so much time.

from slim.

vasundhara785 commented on June 2, 2024

Yes, To create a slim trie object from the csv file it was taking around 30 seconds . The information in the file is only the key(12 digit number) and value( 2 to 3 characters) used to create slim trie object.You can refer to the sample program attached as well.

from slim.

drmingdrmer commented on June 2, 2024

With the file you provided it takes only 0.6 seconds.
The file kv.csv.txt is quite small, I do not know why you said it takes 30 seconds 🤔

time go run ild.go
Status  false 88.153µs

real    0m0.667s
user    0m0.524s
sys     0m0.258s

from slim.

vasundhara785 commented on June 2, 2024

My system has 2 cpu core processor and 1GB RAM. Testing on Low System Limitations. How about your processors?

Is the Trie Implemetation is persistent storage ?

from slim.

drmingdrmer commented on June 2, 2024

I mainly test slimtrie on my iMac, 3.8Ghz core i5 4 cores.

No. when creating, it is purely in-memory operation.

Maybe you could have a profile on your machine to see what costs most of the time with a benchmark: go test ./... -cpuprofile prof.cpu -memprofile prof.mem -bench=. -run=none

And I still can not believe that a 7 lines input file takes that much time.

wc kv.csv.txt
       7       8     142 kv.csv.txt

from slim.

Organising the Key Offsets about slim HOT 15 CLOSED

Comments (15)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	st, err := NewSlimTrie(encode.I32{}, keys, values, Opt{Complete: Bool(true)})
	ta.NoError(err)

	// RangeGet look for a range that contains a key in SlimTrie.
	//
	// A range that contains a key means range-start <= key <= range-end.
	//
	// It returns the value the range maps to, and a bool indicate if a range is
	// found.
	//
	// A positive return value does not mean the range absolutely exists, which in
	// this case, is a "false positive".
	//
	// Since 0.4.3
	func (st *SlimTrie) RangeGet(key string) (interface{}, bool) {