
sea-team / gofound


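GoFound is a full-text search engine written in Go, with millisecond-level queries. It is called through an HTTP API and ships with an integrated Admin management UI, so any system can use it.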

License: Apache License 2.0

Go 67.62% HTML 0.97% Vue 25.40% JavaScript 2.24% Dockerfile 0.43% Shell 3.34%

gofound's People

Contributors

baletu, codfrm, issueye, leaf-dawn, liu-cn, makonike, ncwsky, newpanjing, nightzjp, songzhibin97, tom-debug110, xiaok29, zhangclb


gofound's Issues

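How is the Score in search results calculated?

Does the score go up as more keywords are matched?
I have run many tests without finding a pattern, so I do not know how to make use of the Score field.
My searches return many imprecise results, and I would like to use the score to filter the result set.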

process panic

Using the latest release, I ran test.py with the Python client:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x188 pc=0xcc1df7]

goroutine 7 [running]:
github.com/syndtr/goleveldb/leveldb.(*DB).isClosed(...)
        /home/runner/go/pkg/mod/github.com/syndtr/[email protected]/leveldb/db_state.go:230
github.com/syndtr/goleveldb/leveldb.(*DB).ok(...)
        /home/runner/go/pkg/mod/github.com/syndtr/[email protected]/leveldb/db_state.go:235
github.com/syndtr/goleveldb/leveldb.(*DB).Get(0xc00040a680?, {0xc002ed87a8?, 0x8?, 0x6?}, 0xc002ed87a8?)
        /home/runner/go/pkg/mod/github.com/syndtr/[email protected]/leveldb/db.go:838 +0x57
gofound/searcher/storage.(*LeveldbStorage).Get(0xc00040a680, {0xc002ed87a8, 0x6, 0x8})
        /home/runner/work/gofound/gofound/searcher/storage/leveldb_storage.go:88 +0x4b
gofound/searcher.(*Engine).addInvertedIndex(0xc0004380c0, {0xc0051bb8b0, 0x6}, 0x3e8)
        /home/runner/work/gofound/gofound/searcher/engine.go:208 +0x165
gofound/searcher.(*Engine).AddDocument(0xc0004380c0, 0xc002ee4080)
        /home/runner/work/gofound/gofound/searcher/engine.go:187 +0xf7
gofound/searcher.(*Engine).DocumentWorkerExec(0x0?, 0x0?)
        /home/runner/work/gofound/gofound/searcher/engine.go:125 +0x45
created by gofound/searcher.(*Engine).Init
        /home/runner/work/gofound/gofound/searcher/engine.go:71 +0x24a

Binary search algorithm bug

The binary search in find in fast.go is written incorrectly.

func (f *FastSort) find(target *uint32) (bool, int) {

	low := 0
	high := f.count - 1
	for low <= high {
		mid := (low + high) / 2
		if f.data[mid].Id == *target {
			return true, mid
		} else if f.data[mid].Id < *target {  // this less-than sign should be a greater-than sign
			high = mid - 1
		} else {
			low = mid + 1
		}
	}
	return false, -1
	//for index, item := range f.data {
	//	if item.Id == *target {
	//		return true, index
	//	}
	//}
	//return false, -1
}
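
A minimal sketch of the corrected search follows, assuming the data is sorted by Id in ascending order (the report implies this but does not state it). The item and fastSort types are simplified stand-ins for the types in fast.go, not copies of them.

```go
package main

import "fmt"

// item and fastSort are simplified stand-ins for the types in fast.go.
type item struct {
	Id uint32
}

type fastSort struct {
	data []item
}

// find reports whether target is present and, if so, its index.
// It assumes data is sorted by Id in ascending order.
func (f *fastSort) find(target uint32) (bool, int) {
	low, high := 0, len(f.data)-1
	for low <= high {
		mid := low + (high-low)/2 // avoids overflow of low+high
		switch {
		case f.data[mid].Id == target:
			return true, mid
		case f.data[mid].Id > target:
			high = mid - 1 // target can only be in the lower half
		default:
			low = mid + 1 // target can only be in the upper half
		}
	}
	return false, -1
}

func main() {
	f := &fastSort{data: []item{{1}, {3}, {5}, {7}, {9}}}
	fmt.Println(f.find(7))
	fmt.Println(f.find(4))
}
```

If the data were sorted in descending order instead, the original comparison would be the correct one, so the sort order is worth confirming before changing the operator.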

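Adding to the index is very slow

Adding documents through add_documents in gofound-python is extremely slow, roughly one second per doc by my estimate.
The parameters and environment are:
(1) the indexed text field is limited to 200 characters;
(2) add_documents is called once every 100 docs;
(3) all other parameters are left at their defaults.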

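Searching all-English content returns no data

I added the content "the Hypertext Transfer Protocol HTTP is an application layer protocol in the Internet protocol". Searching for layer returns no data, and neither does searching for protocol. Debug is on by default, and the console shows no errors.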

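Could the API accept custom tokenization parameters?

The built-in tokenizer does not cover every case. When creating a document, could the API accept a parameter for custom tokens?

For example, add a custom tokenization attribute "token" that accepts user-supplied, space-separated tokens, such as "token"="π 3.131415926", so that the document can be found when searching for "π" or "3.131415926".

That way the built-in default tokenization stays in place while business-specific extensions remain easy.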

Error when opening admin in the browser: index.3e4347a8.js is returned with MIME type "text/plain"

Full error message:
index.3e4347a8.js:1 Failed to load module script: Expected a JavaScript module script but the server responded with a MIME type of "text/plain". Strict MIME type checking is enforced for module scripts per HTML spec.

Add this code to gofound/web/admin/admin.go:

import (
	"mime"
)

// Register the MIME types explicitly so static assets are not served as text/plain.
func init() {
	mime.AddExtensionType(".html", "text/html")
	mime.AddExtensionType(".css", "text/css")
	mime.AddExtensionType(".js", "text/javascript")
}

Could someone help take a look? The system is CentOS 7.6 and gofound will not start

[GIN-debug] GET /api/db/list --> github.com/sea-team/gofound/web/controller.DBS (6 handlers)
[GIN-debug] GET /api/db/drop --> github.com/sea-team/gofound/web/controller.DatabaseDrop (6 handlers)
[GIN-debug] GET /api/db/create --> github.com/sea-team/gofound/web/controller.DatabaseCreate (6 handlers)
[GIN-debug] GET /api/word/cut --> github.com/sea-team/gofound/web/controller.WordCut (6 handlers)
2023/05/31 14:19:34 API Url: http://:8080/api

2023/05/31 14:19:44 waiting: 0
2023/05/31 14:19:44 waiting: 0
2023/05/31 14:19:54 waiting: 0
2023/05/31 14:19:54 waiting: 0
^C2023/05/31 14:19:59 Shutdown Server ...
2023/05/31 14:19:59 Server exiting

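Can JSON documents support child nodes?

A "one-dimensional" structure like {"name":"张三","age":18} is too simple. Real-world data is more complex, and the lack of support for nested nodes limits practical use.

When I add {"name":"张三","age":18,"node":{"test":"testnode"}} through the admin page, it appears to succeed, but the query results are wrong, and I cannot tell whether the document was actually added correctly.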
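
Until nested documents are handled, one possible client-side workaround is to flatten nested objects into dotted keys before indexing. The sketch below shows that idea only; flatten is a hypothetical helper and not part of gofound, and whether dotted keys suit a given query pattern is an assumption to verify.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatten copies src into dst, joining the keys of nested objects with dots.
// Non-object values (including arrays) are copied unchanged.
func flatten(prefix string, src, dst map[string]interface{}) {
	for k, v := range src {
		key := k
		if prefix != "" {
			key = prefix + "." + k
		}
		if child, ok := v.(map[string]interface{}); ok {
			flatten(key, child, dst)
			continue
		}
		dst[key] = v
	}
}

func main() {
	raw := `{"name":"张三","age":18,"node":{"test":"testnode"}}`
	var doc map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		panic(err)
	}
	flat := map[string]interface{}{}
	flatten("", doc, flat)
	fmt.Println(flat["node.test"])
}
```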

Updating data occasionally fails even though the API returns success

{"document":{"sys_org_code":null,"owner_id":"10008","name":"测试测试1","background_zosid":"20220714100592881081549866861280","create_by":"","image_zosid":"20220818101555966537523750432670","create_time":"2022-08-18 09:15:15","update_time":"2022-08-18 10:35:31"},"text":"10013^10013^测试测试2","id":10013}

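When I update data in this format, the update occasionally does not take effect. The API reports success with no error, but the data content is unchanged.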

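Could a tokenization option for CSV data be added?

Hello, author!
Could you add a tokenization option for the CSV data format? CSV uses commas as separators, and importing whole CSV rows into GoFound is a strong need of ours. I believe adding this feature would benefit the project.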
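
As a rough illustration of the request, a CSV row can be split into field-level tokens with Go's standard encoding/csv package. This is a client-side sketch, not an existing gofound option; csvTokens is a hypothetical helper name.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// csvTokens parses a single CSV line and returns its trimmed, non-empty fields.
// Quoted fields may contain commas and are kept as one token.
func csvTokens(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	fields, err := r.Read()
	if err != nil {
		return nil, err
	}
	tokens := make([]string, 0, len(fields))
	for _, f := range fields {
		if f = strings.TrimSpace(f); f != "" {
			tokens = append(tokens, f)
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := csvTokens(`10013,"Zhang, San",,Beijing`)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(tokens), tokens)
}
```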

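Consider supporting configuration through system environment variables?

Deployments are mostly containerized these days. Configuring through command-line flags is cumbersome and file-based configuration is not flexible enough, and many more custom parameters are likely to appear.

Giving every parameter a default value that an environment variable can override is a flexible approach and makes containerized deployment convenient. Please consider it, and best wishes for the project!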

A golang SDK example without tcp

package main
import (
	"gofound/core"
	"gofound/global"
	"gofound/searcher/model"
	service2 "gofound/web/service"
	"log"
	"runtime"
)

type Services struct {
	Base     *service2.Base
	Index    *service2.Index
	Database *service2.Database
	Word     *service2.Word
}

func NewServices() *Services {
	return &Services{
		Base:     service2.NewBase(),
		Index:    service2.NewIndex(),
		Database: service2.NewDatabase(),
		Word:     service2.NewWord(),
	}
}

func main() {
	// Initialize
	//global.CONFIG = core.Parser() // if you need config.yaml
	global.CONFIG = &global.Config{
		//Addr:        *addr,
		Data:       "./data",
		Debug:      true,
		Dictionary: "./data/dictionary.txt",
		//EnableAdmin: false,
		Gomaxprocs: runtime.NumCPU(),
		//Auth:        "",
		//EnableGzip:  false,
		Timeout:   600,
		BufferNum: 1000,
	}
	// Initialize the tokenizer
	tokenizer := core.NewTokenizer(global.CONFIG.Dictionary)
	global.Container = core.NewContainer(tokenizer)

	srv := NewServices()

	log.Println(srv.Base.Status())

	request := &model.IndexDoc{}
	request.Id = 1
	request.Text = "下列关于静态代码块的描述中,正确的是(  )"
	t := `
			a. 使用静态代码块可以实现类的初始化
		b. 静态代码块随着类的加载而加载
		c. 每次创建对象时,类中的静态代码块都会被执行一次
		d. 静态代码块指的是被static关键字修饰的代码块
		`
	request.Document = map[string]interface{}{
		"content": t,
		"answer":  "静态代码块指的是被static关键字修饰的代码块, 静态代码块随着类的加载而加载, 使用静态代码块可以实现类的初始化",
	}
	log.Println(srv.Index.AddIndex("default", request))

}

remove returns 404 Not Found

gofound revision: 59d4e00
gofound-python revision: e170d832a486c3588ddb7e61a0c84eea9e99829b

Using the Python script from https://github.com/newpanjing/gofound-python/blob/master/README.md, I got:

$ python test.py
{'state': True, 'message': 'success'}
{'state': True, 'message': 'success', 'data': {'time': 33.977121, 'total': 1, 'pageCount': 1, 'page': 1, 'limit': 10, 'documents': [{'id': 1000, 'text': '探访海南自贸港“样板间”', 'document': {'content': '洋浦经济开发区地处海南西北部洋浦半岛,是21世纪海上丝绸之路与西部陆海新通道的交汇节点。是国务院1992年批准设立的。我国第一个由外商成片开发、享受保税区政策的国家级开发区'}, 'score': 3, 'keys': ['海南', '自贸港', '样板', '样板间', '探访']}], 'words': ['探访', '海南', '自贸港']}}
{'id': 1000, 'text': '探访海南自贸港“样板间”', 'document': {'content': '洋浦经济开发区地处海南西北部洋浦半岛,是21世纪海上丝绸之路与西部陆海新通道的交汇节点。是国务院1992年批准设立的。我国第一个由外商成片开发、享受保税区政策的国家级开发区'}, 'score': 3, 'keys': ['海南', '自贸港', '样板', '样板间', '探访']}
Traceback (most recent call last):
  File "test.py", line 46, in <module>
    remove()
  File "test.py", line 39, in remove
    res = client.remove_document(1000)
  File "/home/zhangclb/sandbox/gofound/gofound-python/gofound/client.py", line 78, in remove_document
    res = self._post("remove", json={
  File "/home/zhangclb/sandbox/gofound/gofound-python/gofound/client.py", line 43, in _post
    raise DBException("Error:", res.status_code)
gofound.exceptions.DBException: ('Error:', 404)

Then I opened http://localhost:8080/admin/#/ and queried with "海南".
Request sent:

{
  "query": "海南",
  "page": 1,
  "limit": 10,
  "highlight": {
    "preTag": "<em style='color:red'>",
    "postTag": "</em>"
  },
  "order": "DESC"
}

got:
{"state":true,"message":"success","data":{"time":0.38622900000000004,"total":1,"pageCount":1,"page":1,"limit":10,"documents":[{"id":1000,"text":"探访\u003cem style='color:red'\u003e海南\u003c/em\u003e自贸港“样板间”","document":{"content":"洋浦经济开发区地处海南西北部洋浦半岛,是21世纪海上丝绸之路与西部陆海新通道的交汇节点。是国务院1992年批准设立的。我国第一个由外商成片开发、享受保税区政策的国家级开发区"},"originalText":"探访海南自贸港“样板间”","score":1,"keys":["海南","自贸港","样板","样板间","探访"]}],"words":["海南"]}}

Removing the document failed, however. Request sent:
{id: 1000}
Response, a 404:

Request URL: http://localhost:8080/api/remove?database=default
Request Method: POST
Status Code: 404 Not Found
Remote Address: 127.0.0.1:8080
Referrer Policy: strict-origin-when-cross-origin

panic: resource temporarily unavailable

After running for a while the program exits. With debug mode on, the last log output was:

panic: resource temporarily unavailable

goroutine 3364545 [running]:
gofound/searcher/storage.(*LeveldbStorage).ReOpen(0xc00007a3c0)
/home/runner/work/gofound/gofound/searcher/storage/leveldb_storage.go:81 +0x115
gofound/searcher/storage.(*LeveldbStorage).autoOpenDB(0xc00007a3c0)
/home/runner/work/gofound/gofound/searcher/storage/leveldb_storage.go:26 +0x2e
gofound/searcher/storage.(*LeveldbStorage).Get(0xc00007a3c0, {0xc001a29e8c, 0x4, 0x4})
/home/runner/work/gofound/gofound/searcher/storage/leveldb_storage.go:91 +0x2d
gofound/searcher.(*Engine).GetDocById(0xc0029da4e0?, 0x29da360?)
/home/runner/work/gofound/gofound/searcher/engine.go:550 +0x65
gofound/searcher.(*Engine).getDocument(0x0?, {0x43fb65?, 0xc0052f8768?}, 0xc000000230, 0xc002b6a600, 0xc002694280, 0xc000394a40)
/home/runner/work/gofound/gofound/searcher/engine.go:466 +0x68
created by gofound/searcher.(*Engine).MultiSearch.func2
/home/runner/work/gofound/gofound/searcher/engine.go:419 +0x9cd

/api/status endpoint returns an error on a Mac M1

Request:

curl -H "Content-Type:application/json" -X GET http://127.0.0.1:5678/api/status

{"state":false,"message":"runtime error: index out of range [0] with length 0"}
goroutine 39 [running]:
runtime/debug.Stack()
	/opt/hostedtoolcache/go/1.18.5/x64/src/runtime/debug/stack.go:24 +0x68
runtime/debug.PrintStack()
	/opt/hostedtoolcache/go/1.18.5/x64/src/runtime/debug/stack.go:16 +0x20
gofound/web/middleware.Exception.func1.1()
	/home/runner/work/gofound/gofound/web/middleware/exception.go:15 +0x40
panic({0x10178b720, 0x14002e19ec0})
	/opt/hostedtoolcache/go/1.18.5/x64/src/runtime/panic.go:838 +0x204
gofound/searcher/system.GetCPUStatus()
	/home/runner/work/gofound/gofound/searcher/system/cpu.go:19 +0xd0
gofound/web/service.(*Base).Status(0x14000067100)
	/home/runner/work/gofound/gofound/web/service/base.go:44 +0x10c
gofound/web/controller.Status(0x14000071630?)
	/home/runner/work/gofound/gofound/web/controller/base.go:39 +0x38
github.com/gin-gonic/gin.(*Context).Next(...)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
gofound/web/middleware.Exception.func1(0x1400017c300)
	/home/runner/work/gofound/gofound/web/middleware/exception.go:20 +0x6c
github.com/gin-gonic/gin.(*Context).Next(...)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
gofound/web/middleware.Cors.func1(0x1400017c300)
	/home/runner/work/gofound/gofound/web/middleware/cors.go:25 +0x140
github.com/gin-gonic/gin.(*Context).Next(...)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0x1400017c300)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/recovery.go:99 +0x80
github.com/gin-gonic/gin.(*Context).Next(...)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
github.com/gin-gonic/gin.LoggerWithConfig.func1(0x1400017c300)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/logger.go:241 +0xb0
github.com/gin-gonic/gin.(*Context).Next(...)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/context.go:168
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0x140052b1040, 0x1400017c300)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/gin.go:555 +0x568
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0x140052b1040, {0x1017ccf80?, 0x140002781c0}, 0x1400017c000)
	/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.7.7/gin.go:511 +0x1d4
net/http.serverHandler.ServeHTTP({0x140002c81e0?}, {0x1017ccf80, 0x140002781c0}, 0x1400017c000)
	/opt/hostedtoolcache/go/1.18.5/x64/src/net/http/server.go:2916 +0x3fc
net/http.(*conn).serve(0x1400032a000, {0x1017cd6e8, 0x140051c5f50})
	/opt/hostedtoolcache/go/1.18.5/x64/src/net/http/server.go:1966 +0x56c
created by net/http.(*Server).Serve
	/opt/hostedtoolcache/go/1.18.5/x64/src/net/http/server.go:3071 +0x450
[GIN] 2022/11/02 - 16:31:23 | 200 |      9.7025ms |       127.0.0.1 | GET      "/api/status"
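
The trace points at an index into an empty slice inside GetCPUStatus. The source of that function is not shown in this report, so the following is only a sketch of the kind of guard that would avoid the panic when a platform returns no CPU samples; firstPercent is a hypothetical helper, not gofound code.

```go
package main

import "fmt"

// firstPercent returns the first CPU usage sample. The boolean is false
// when the platform returned no samples, so callers can report
// "unavailable" instead of indexing an empty slice.
func firstPercent(samples []float64) (float64, bool) {
	if len(samples) == 0 {
		return 0, false
	}
	return samples[0], true
}

func main() {
	if _, ok := firstPercent(nil); !ok {
		fmt.Println("cpu usage unavailable on this platform")
	}
	if p, ok := firstPercent([]float64{12.5}); ok {
		fmt.Println("cpu usage:", p)
	}
}
```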

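Tokenizer question

When tokenizing, why are punctuation and spaces all removed? Doesn't that make it impossible to tokenize English? Or am I using it incorrectly?
text = utils.RemovePunctuation(text)
// remove all spaces
text = utils.RemoveSpace(text)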
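
To make the concern concrete: once spaces are stripped, English words run together and their boundaries cannot be recovered. The sketch below contrasts that with treating whitespace and punctuation as separators. Both functions are hypothetical stand-ins written with the standard library; they are not the project's utils functions or its tokenizer.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripAll mimics removing punctuation and spaces before tokenizing.
func stripAll(text string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) || unicode.IsPunct(r) {
			return -1 // a negative value drops the rune
		}
		return r
	}, text)
}

// splitWords lowercases the text and treats whitespace and punctuation
// as separators, so word boundaries survive.
func splitWords(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return unicode.IsSpace(r) || unicode.IsPunct(r)
	})
}

func main() {
	text := "HTTP is an application layer protocol."
	fmt.Println(stripAll(text))
	fmt.Println(splitWords(text))
}
```

With the second approach a query for layer or protocol has a matching token, which may also be relevant to the report above about all-English content returning no results.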

Engine::Drop method does not work.

Environment

After building from source (go get & go build), I put the binary into a Docker image.

FROM ubuntu:20.04

COPY ./gofound /app/gofound
RUN mkdir /app/data &&\
    chmod 555 /app/gofound

WORKDIR /app

EXPOSE 5678

CMD ["./gofound", "--addr=:5678", "--data=./data"]

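What happened

After a database is dropped, its record is removed from memory, but the files generated for it are not deleted.

The code in the Engine::Drop method appears to be at fault.

Also, it is unclear why line 551 there calls os.Remove(e.IndexPath) every time; that could probably be optimized.

Expected result

After a database is dropped, the corresponding database files should be deleted in addition to the in-memory record.

Other

I am not sure whether this is the intended behavior, but it looks like a bug to me.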
