

huichen commented on September 24, 2024

@xiocode You can put the range condition inside the ScoringCriteria and return an empty slice when a document falls outside the range. For example:

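import "github.com/huichen/wukong/types"

// RangedScoringFields is the per-document scoring data. The original
// snippet does not define it; a single Score field is assumed here.
type RangedScoringFields struct {
        Score float32
}

// RangedScoringCriteria keeps only documents whose score lies in
// [MinScore, MaxScore].
type RangedScoringCriteria struct {
        MinScore, MaxScore float32
}

func (criteria RangedScoringCriteria) Score(
        doc types.IndexedDocument, fields interface{}) []float32 {
        // Reject documents whose fields have an unexpected type.
        rsf, ok := fields.(RangedScoringFields)
        if !ok {
                return []float32{}
        }

        // Check whether the score lies within the range; an empty
        // slice drops the document from the results.
        if rsf.Score < criteria.MinScore || rsf.Score > criteria.MaxScore {
                return []float32{}
        }

        // Continue with the actual scoring; returning the field score
        // itself is only a placeholder.
        return []float32{rsf.Score}
}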
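To illustrate why returning an empty slice works as a range filter, here is a self-contained sketch of a simplified ranking loop. It assumes, as the reply above implies, that the engine discards any document whose score slice is empty. The names `rank`, `rangeFilter`, `scoreFunc` and `scoredDoc` are hypothetical and are not part of wukong's API.

```go
package main

import "sort"

// scoredDoc pairs a document ID with its score vector. This is a
// hypothetical, simplified stand-in for wukong's ranking structures.
type scoredDoc struct {
	DocId  uint64
	Scores []float32
}

// scoreFunc mirrors the shape of a ScoringCriteria.Score method:
// returning an empty slice rejects the document.
type scoreFunc func(docId uint64, fields interface{}) []float32

// rank scores every candidate, drops those with an empty score slice,
// and sorts the rest by their first score (descending, ties by ID).
func rank(fields map[uint64]interface{}, score scoreFunc) []scoredDoc {
	var out []scoredDoc
	for id, f := range fields {
		s := score(id, f)
		if len(s) == 0 {
			continue // empty slice: the document is filtered out
		}
		out = append(out, scoredDoc{DocId: id, Scores: s})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Scores[0] != out[j].Scores[0] {
			return out[i].Scores[0] > out[j].Scores[0]
		}
		return out[i].DocId < out[j].DocId
	})
	return out
}

// rangeFilter keeps only float32 fields within [lo, hi] and scores
// each kept document by its field value.
func rangeFilter(lo, hi float32) scoreFunc {
	return func(_ uint64, fields interface{}) []float32 {
		v, ok := fields.(float32)
		if !ok || v < lo || v > hi {
			return []float32{}
		}
		return []float32{v}
	}
}
```

The filter and the scoring live in the same function, so no separate filtering pass over the results is needed.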

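The Wukong engine does not yet support index persistence; this will be implemented in a later version. Even so, the time needed to load the index from disk into memory after a restart is probably unavoidable, so if you need zero downtime you should consider running multiple replicated servers.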

from wukong.

xiocode commented on September 24, 2024

@huichen OK, thanks.
Reading the index from disk should still be better than rebuilding it from scratch, and more practical.


xiocode commented on September 24, 2024

Hi, I'd like to ask one more question about the index's memory usage: for 100 GB of text, roughly how much memory would the index take?


xiocode commented on September 24, 2024

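I just ran a test with 68 MB of text stored in leveldb, roughly 640,000 Weibo posts. Memory usage was 1.25 GB, of which the dictionary took 417 MB and the index over 800 MB. Isn't that excessive?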


huichen commented on September 24, 2024

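@xiocode In my measurements the dictionary takes about 100 MB of memory; your 417 MB was probably measured before garbage collection ran. As for the index, you are most likely using LocationsIndex, which uses a lot of memory because it stores the position of every token: each docid-token pair needs about 24 bytes. If you don't need token positions, consider one of the other index types; see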

https://github.com/huichen/wukong/blob/master/types/indexer_init_options.go#L4
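The 24-bytes-per-pair figure allows a back-of-the-envelope estimate of index size. This sketch simply multiplies documents by average distinct tokens per document by bytes per pair; the function names are hypothetical, and the model ignores dictionary, map and GC overhead.

```go
package main

// bytesPerPairLocations is the rough cost per docid-token pair quoted
// above for LocationsIndex (about 24 bytes).
const bytesPerPairLocations = 24

// estimateIndexBytes returns numDocs * avgTokensPerDoc * bytesPerPair,
// i.e. (number of docid-token pairs) * (bytes stored per pair).
func estimateIndexBytes(numDocs, avgTokensPerDoc, bytesPerPair int64) int64 {
	return numDocs * avgTokensPerDoc * bytesPerPair
}

// impliedPairs inverts the estimate: given a measured index size, it
// returns the approximate number of docid-token pairs it represents.
func impliedPairs(indexBytes, bytesPerPair int64) int64 {
	return indexBytes / bytesPerPair
}
```

For example, with the 640,000 posts from the test above and an assumed (not measured) average of 50 distinct tokens per post, `estimateIndexBytes(640000, 50, bytesPerPairLocations)` gives 768,000,000 bytes, the same order of magnitude as the 800-odd MB observed. Conversely, 800,000,000 bytes at 24 bytes per pair corresponds to about 33 million pairs.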


hihus commented on September 24, 2024

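1. If I only want to query all IDs by attribute value, with no keywords, how does wukong support this kind of query? From reading the source code, it seems that without keywords no scoring can take place. Would a special table holding all docIDs be needed, so that when there are no keywords the search scores documents by attributes alone?
2. What if I want to first find all docIDs by attribute-value scoring, and then treat whether the keywords appear as just one of the scoring criteria? Can wukong do that?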
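The following is not wukong code and does not show how wukong handles these cases; it is a hypothetical standalone sketch of the approach the question proposes. It scans a table of all documents (so an empty keyword list still returns results), scores each by an attribute, and adds a bonus for every keyword present. The `doc` type, `attributeSearch`, and the additive bonus are illustrative assumptions.

```go
package main

import (
	"sort"
	"strings"
)

// doc is a hypothetical record: an attribute used for primary scoring
// plus the text checked for keyword presence.
type doc struct {
	ID   uint64
	Attr float32
	Text string
}

// attributeSearch drops documents with Attr < minAttr, scores the rest
// by Attr plus keywordBonus per keyword found in Text, and returns the
// IDs in descending score order (stable for equal scores).
func attributeSearch(all []doc, minAttr float32, keywords []string, keywordBonus float32) []uint64 {
	type hit struct {
		id    uint64
		score float32
	}
	var hits []hit
	for _, d := range all {
		if d.Attr < minAttr {
			continue
		}
		score := d.Attr
		for _, kw := range keywords {
			if strings.Contains(d.Text, kw) {
				score += keywordBonus
			}
		}
		hits = append(hits, hit{d.ID, score})
	}
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].score > hits[j].score })
	ids := make([]uint64, len(hits))
	for i, h := range hits {
		ids[i] = h.id
	}
	return ids
}
```

A full scan is linear in the number of documents, which is the cost of not having a keyword index to narrow the candidates.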

