Giter Site home page Giter Site logo

Comments (15)

drmingdrmer avatar drmingdrmer commented on June 2, 2024

I do not quite understand the term key offset.

You can just store an array of keys with slim without values. Just pass a nil to the value argument:

func NewSlimTrie(e encode.Encoder, keys []string, values interface{}, opts ...Opt) (*SlimTrie, error) {

from slim.

vasundhara785 avatar vasundhara785 commented on June 2, 2024

Feature i need using the Trie is

Application 1 should load all the keys with their offsets along with data.
st, err := index.NewSlimIndex(keyOffsets, data)

Application 2 should just look up with the key without actually knowing all keys and values but initialising the trie object is requiring all keys and values. st, err := index.NewSlimIndex(keyOffsets, data)

Is it possible to initialise the trie object without actually loading all the keys and data?

As we are going to have 500Million key value pairs to store. So every application loading all the keys and values to intialise the Trie object is an overhead for the application.Creating the whole trie every time for every version keeps doubling up memory and affects the Space Complexity very badly

from slim.

drmingdrmer avatar drmingdrmer commented on June 2, 2024

Thanks for the explanation!

Is it possible to initialise the trie object without actually loading all the keys and data?

If what you want is to build a sparse index, e.g. to create an index for every 5 items, slimtrie provides a Range mode:

  • Choose 1 key/value for every 5 key/values in your dataset. Build a slice.
  • Build slimtrie from this slice with option: Complete: Bool(true). This eliminates the false-positive for range-get query, e.g. searching for an item of a key in range [1000, 1200).

st, err := NewSlimTrie(encode.I32{}, keys, values, Opt{Complete: Bool(true)})
ta.NoError(err)

Reference:

// RangeGet look for a range that contains a key in SlimTrie.
//
// A range that contains a key means range-start <= key <= range-end.
//
// It returns the value the range maps to, and a bool indicate if a range is
// found.
//
// A positive return value does not mean the range absolutely exists, which in
// this case, is a "false positive".
//
// Since 0.4.3
func (st *SlimTrie) RangeGet(key string) (interface{}, bool) {

E.g., with a slimtrie built from the following keys:

{
"aa": 1,
"az": 2,
"cd": 3,
}
  • RangeGet("a") returns nil, false
  • RangeGet("ab") returns 1, true
  • RangeGet("az") returns 2, true
  • RangeGet("azz") returns 2, true
  • RangeGet("b") returns 2, true
  • RangeGet("d") returns 3, true

from slim.

vasundhara785 avatar vasundhara785 commented on June 2, 2024

@drmingdrmer What would be the CPU , Memory and time taken for loading and intialising the Trie object with 1Lakh unique key value pairs till this st, err := index.NewSlimIndex(keyOffsets, data)

from slim.

drmingdrmer avatar drmingdrmer commented on June 2, 2024

I'm not very sure about the memory cost.

For cpu cost, BenchmarkNewSlimTrie should tell you.

There is a benchmark with a similar setup to your case:

BenchmarkNewSlimTrie/200kweb2-4          1000000              1002 ns/op

Building a slimtrie from 2 Lakh of words collected from web takes 1 us/key, i.e., 100 milliseconds for 1 Lakh words.

But the performance varies with different key sets. You may like to benchmark it yourself:DDD

from slim.

vasundhara785 avatar vasundhara785 commented on June 2, 2024

But this wasn't the case for me. It was taking around 30seconds for 1Lakh words. Please suggest .

Attaching the sample code and key value.
ild.go.txt
kv.csv.txt

from slim.

drmingdrmer avatar drmingdrmer commented on June 2, 2024

May I have your complete csv file for a test?

from slim.

drmingdrmer avatar drmingdrmer commented on June 2, 2024

I mean, a test with at least 1 Lakh of lines.

from slim.

vasundhara785 avatar vasundhara785 commented on June 2, 2024

kv.csv.txt

Here is the test csv file

from slim.

drmingdrmer avatar drmingdrmer commented on June 2, 2024

This is quite small. Did you mean that creating slimtrie from this 142 bytes file takes 30 seconds??

from slim.

drmingdrmer avatar drmingdrmer commented on June 2, 2024

I may need your 1 Lakh words file to see what takes so much time.

from slim.

vasundhara785 avatar vasundhara785 commented on June 2, 2024

Yes, To create a slim trie object from the csv file it was taking around 30 seconds . The information in the file is only the key(12 digit number) and value( 2 to 3 characters) used to create slim trie object.You can refer to the sample program attached as well.

from slim.

drmingdrmer avatar drmingdrmer commented on June 2, 2024

With the file you provided it takes only 0.6 seconds.
The file kv.csv.txt is quite small, I do not know why you said it takes 30 seconds 🤔

time go run ild.go
Status  false 88.153µs

real    0m0.667s
user    0m0.524s
sys     0m0.258s

from slim.

vasundhara785 avatar vasundhara785 commented on June 2, 2024

My system has 2 cpu core processor and 1GB RAM. Testing on Low System Limitations. How about your processors?

Is the Trie Implemetation is persistent storage ?

from slim.

drmingdrmer avatar drmingdrmer commented on June 2, 2024

I mainly test slimtrie on my iMac, 3.8Ghz core i5 4 cores.

No. when creating, it is purely in-memory operation.

Maybe you could have a profile on your machine to see what costs most of the time with a benchmark: go test ./... -cpuprofile prof.cpu -memprofile prof.mem -bench=. -run=none

And I still can not believe that a 7 lines input file takes that much time.

wc kv.csv.txt
       7       8     142 kv.csv.txt

from slim.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.