openacid / slim Goto Github PK
View Code? Open in Web Editor NEWSurprisingly space efficient trie in Golang(11 bits/key; 100 ns/get).
Home Page: https://openacid.github.io/
License: MIT License
Surprisingly space efficient trie in Golang(11 bits/key; 100 ns/get).
Home Page: https://openacid.github.io/
License: MIT License
serialize.readFull -> io.ReadFull
io.SectionReader
to unify Unmarshal
and UnmarshalAt
n += cnt
should be placed after error checking?
func readFull(reader io.Reader, buf []byte) (cnt int, err error) {
n, cnt, toRead := 0, 0, len(buf)
for n < toRead {
cnt, err = reader.Read(buf[n:])
n += cnt
if err != nil {
break
}
}
return n, err
}
Trie Node: add field squash
Currently the (not slim) Trie node is defined as follow:
type Node struct {
Children map[int]*Node
Branches []int
Step uint16
Value interface{}
isStartLeaf bool
NodeCnt int
}
Add a field squash bool
, to indicate this trie to squash preceding branches every time after Append
a new key.
When creating child node, the child should inherit the squash
value from its parent.
Benifit:
Append()
: user does not need to pass a squash
argument any more.Here's the Weekly Digest for openacid/slim:
Last week, no issues were created.
Last week, no pull requests were created, updated or merged.
Last week there were no commits.
Last week there were no contributors.
Last week there were 4 stagazers.
⭐ TheBits
⭐ gnomeria
⭐ eeelinc
⭐ vislee
You all are the stars! 🌟
Last week there were no releases.
That's all for last week, please 👀 Watch and ⭐ Star the repository openacid/slim to receive next weekly updates. 😃
You can also view all Weekly Digests by clicking here.
Your Weekly Digest bot. 📆
Looking at the imports of the github.com/openacid/slim/trie package, I see multiple packages that appear to be used for testing. Among them are testing
, unsafe
, github.com/openacid/slim/trie/benchmark
, github.com/google/btree
and more. This appears to be because of the file trie/test_utils.go
. If this file were called trie/utils_test.go
these imports disappear.
To Reproduce
Expected behavior
I expected a list of imports without testing
Actual behavior
Includes testing
and a few others that probably don't belong there.
I think this can be fixed by renaming the file, but I'm not 100% sure, because I don't know the code too well.
compacted array provides 3 method: Init, Get and Has;
It'd be better if compacted array adopts existent popular map-like datatype API.
One to the uncomfortable spot is that Get()
returns nil when not found.
But golang map returns a second bool value ok
to indicate if it is found.
Describe the solution you'd like
Find a popular key-value implementation. Adopt its API.
Range-scan is supported!(under development)
When to release this feature?
This feature will be very useful!
What document to add
Additional context
定义一个描述错误的数据格式, 用来实现golang在运行时的错误传递, 目标:
设计数据结构的前提假设:
可以用于API层的错误描述, 一般API处理时收到错误并需要将错误返回给客户端, 要求错误有:
Message通过Code和Resource生成出来. 只提供一个Message接口用来输出message.
希望这个模块可以作为go的errors包的无修改替代品, 和 https://github.com/pkg/errors 的无修改替代品.
提供一个方便的接口让用户直接取得最底层的错误.
记录引发错误的底层错误是什么. 类似在存储服务中, 一个底层的IO错误导致API失败, 如果能在日志中记录引发错误的错误, 可以方便定位问题.
有一类上层错误由几个下层错误引起, 例如多数派写中, nw = 5,3, 这时写失败3个后端会引起整个API调用失败, 这时需要记录多个下层错误.
error 结构里可选的带有stacktrace 信息,方便打印日志(参考了 https://github.com/pkg/errors )
type Error struct {
Code string // error code
Cause []error // 0 or seveal error that cause this error.
Resource string // optional
*stack // optional traceback info of this error
}
// 实现系统error interface
func (e *Error) Error() string // 直接返回Code的string
// 兼容 https://github.com/pkg/errors 的接口:
func (e *Error) Cause() string // 返回第一个cause
// 其他接口没差别不列出了
// 扩展的接口
func (e *Error) AllCause() string // 返回所有cause的slice
func (e *Error) RootCause() string // 返回最初的cause
func (e *Error) AllRootCause() string // 返回所有的最底层的cause; cause的树的叶子节点.
func (e *Error) Message() string // 给人看的, 通过Code和Resource拼装起来.
以s2中场景为例, 描述下如何表示一个具体的错误,
假设一个错误是group没读取到.
而引发这个错误的是通过dbproxy读取group信息失败引发的.
而读dbproxy时重试了2次都失败了, 一次是mysql被置位readonly, 一次是socket读取超时:
Code: GroupNotFound
Resource: "group_id://123"
Cause:
- Code: DBProxyReadError
Resource: "dbproxy://127.0.0.1:3303"
Cause:
- Code: MysqlReadonly
Resource: "mysql://192.168.2.2:3306"
- Code: InvalidResponse
Resource: "mysql://192.168.9.9:3306"
Cause:
- Code: SocketReadTimeout
Resource: "tcp4://192.168.9.9:3306"
Code: BlockBuildError
Resource: "block://aabbccddeeff"
Cause:
- Code: NeedleWriteError
Resource: "needle://3cp/foo/bar.jpg"
Cause:
- Code: FSIsReadonly
Resource: "file:///ecdrives/bbb" # schema of local file url in browser
Cause:
- <a native fs error> # may be an error with errno.
Here's the Weekly Digest for openacid/slim:
Last week, no issues were created.
Last week, no pull requests were created, updated or merged.
Last week there were no commits.
Last week there were no contributors.
Last week there was 1 stargazer.
⭐ x0rzkov
You are the star! 🌟
Last week there were no releases.
That's all for last week, please 👀 Watch and ⭐ Star the repository openacid/slim to receive next weekly updates. 😃
You can also view all Weekly Digests by clicking here.
Your Weekly Digest bot. 📆
I created a trie with ~70k unique keys and corresponding integer values with trie.NewSlimTrie(encode.Int{}, keys, values)
I then measured how many of my keys were in the trie. I was lacking roughly 14%
To Reproduce
This doesn't return the same number of keys, but shows the problem:
func TestSlimTrie(t *testing.T) {
var keys []string
var indexes []int
for i := 0; i < 70000; i++ {
keys = append(keys, randStringRunes(10))
indexes = append(indexes, i)
}
sort.Strings(keys)
tr, err := trie.NewSlimTrie(encode.Int{}, keys, indexes)
if err != nil {
t.Error(err)
return
}
count := 0
for _, key := range keys {
_, found := tr.Get(key)
if !found {
count++
}
}
if count > 0 {
t.Errorf("Couldn't find %f%% of the keys entered. Missing %d keys", float64(count)/float64(len(keys))*100, count)
}
}
Environment:
Language:
Range support
A range SlimTrie is same as a key-value SlimTrie internally.
The only difference is a range SlimTrie uses two keys(starting and ending key) to represent a range.
// Create a range SlimTrie with 3 range:
// ["a", "p"] --> 1
// ["q", "q"] --> 5 // a range with only one item.
// ["r", "z"] --> 3
st, err := NewSlimTrie(xxx, []string{"a", "p", "q", "r" "z"}, []int32{1,1, 5, 3, 3})
Creating a SlimTrie with range support is the same as creating a normal SlimTrie, except that two adjacent keys(start, end for a range) have the same value.
Add example_range_test.go
to illustrate how to create/use a range SlimTrie.
Rewrite tests for the new range support. Also the max number of keys in a range SlimTrie can be determined before creating it.
Add trie/README.md
explaining how range SlimTrie works.
trie.Node.Append(): remove argument isStartLeaf
and attribute Node.isStartLeaf
, which is not required anymore according to latest design.
Push to branch "release-0.3"
To describe:
What document to add
裁剪之后的SlimTrie:
每个节点都有大于1个子节点, 除LeafParent节点之外。
SlimTrie的大小跟key的长度无关. 也就是说任意长度的key的集合都可以使用有限大小的SlimTrie来索引: SlimTrie最多有2n-1个节点: 最多节点是出现在所有inner节点都是二分的时候(空间复杂度为O(n))
//why the most node number of SlimTrie is 2n-1. Could you plase add more comments?
Additional context
Describe the bug
running make test
outputted errors below
# github.com/openacid/slim/array_test [github.com/openacid/slim/array.test]
array/int_test.go:196:4: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/int_test.go:235:3: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/int_test.go:428:4: a.XXX_sizecache undefined (type *array.U32 has no field or method XXX_sizecache)
array/int_test.go:467:3: a.XXX_sizecache undefined (type *array.U32 has no field or method XXX_sizecache)
array/int_test.go:660:4: a.XXX_sizecache undefined (type *array.U64 has no field or method XXX_sizecache)
array/int_test.go:699:3: a.XXX_sizecache undefined (type *array.U64 has no field or method XXX_sizecache)
array/marshal_test.go:73:4: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
array/marshal_test.go:112:3: a.XXX_sizecache undefined (type *array.U16 has no field or method XXX_sizecache)
To Reproduce
go get github.com/openacid/slim
make gen
mak test
Environment (please complete the following information):
Language (please complete the following information):
go doc guide:
https://blog.golang.org/godoc-documenting-go-code
A data-type such as array would evolve and finally there are several implementation in a package.
A versionizing mechanism should be introduced to:
type Version string
type VersionGetter interface {
func GetVer() Version
}
Marshaller
interface to let data-types provides versioned marshal API.Marshaller.Marshal()
should call serialize.xxx
to dump version.Marshaller.Unmarshal()
should check version and choose a data type with correct version.type Marshaller interface {
Marshal(w io.Writer) (written int64, o interface{}, err error)
}
type UnMarshaller interface {
Unmarshal(r io.Reader) (read int64, o interface{}, err error)
}
type MarshallerAt interface {
MarshalAt(w io.Writer, offset int64) (written int64, o interface{}, err error)
}
type UnMarshallerAt interface {
UnmarshalAt(r io.Reader, offset int64) (read int64, o interface{}, err error)
}
MarshalHelper
struct to simplify Marshaling by converting MarshalAt to Marshal(see iohelper
).type MarshalHelper struct {
Marshaller
UnMarshaller
}
func (m *MarshalHelper) MarshalAt(offset int64) (written int64, err error)
array
should then implement 4 marshal interface.so people know what's benefit this library provide compared to map other than ordered data.
func bytesToString(buf []byte, delimter byte) (str string, err error) {
delimPos := bytes.IndexByte(buf, delimter)
if delimPos == -1 {
delimPos = len(buf)
}
return string(buf[:delimPos]), err
}
Imported serialize is contained in branch "serialize".
And its merge target should be "serialize-base", which contains unmodified imported codes from ec.
related functions:
MarshalAt
UnMarshalAt
Imported serialize is contained in branch "serialize".
And its merge target should be "serialize", which contains unmodified imported codes from ec.
Version comparison should be implemented according to version semantics, not with plain string comparison.
func makeDataHeader(verStr string, headerSize uint64, dataSize uint64) *DataHeader {
...
if verStr > version.VERSION { // <-- Bug
panic("forward compatibility is not supported")
}
A great practical version semantics package is here: https://github.com/Masterminds/semver
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the > MAJOR.MINOR.PATCH format.
几个相关的数据结构:
各位看看下, 名字ok不?
pre-store popcount for every 8-bit word.
use a 256-elts array to store popcount.
use static mask.
Describe the solution
In slim there are several sub package.
Add multiple links to their url at godoc.com
Support proto.Marshal(SlimTrie)
SlimTrie should implement two interface: proto.Marshaler
and proto.Unmarhsaler
.
type Marshaler interface {
Marshal() ([]byte, error)
}
type Unmarshaler interface {
Unmarshal([]byte) error
}
To let the upper level apps easily marshal/unmarshal a SlimTrie, just like operating on a standard proto.Message
.
To utilize protobuf
functionalities.
API changes:
Currently SlimTrie provides Marshal and Unmarshal but the return value are different: func (st *SlimTrie) marshal(writer io.Writer) (cnt int64, err error)
On-disk data structure changes:
Currently on disk format is:
Children-header
Children-array
Steps-header
Steps-array
Leaves-header
Leaves-array
The header
is written by package serialize
.
After adapting to proto.Marshaler
, the header is not stored any more:
Children-array
Steps-array
Leaves-array
It should be responsibility of upper level to add the header.
E.g. by calling serialize.Marshal(slimtrie)
instead of calling slimtrie.Marshal
.
serialize.Marshal
writes a header(consists of version and length of the marshaled object), then finaly the on-disk layout should be:
SlimTrie-header
Children-array
Steps-array
Leaves-array
Make package bit a drop-in replacer of math/bits
.
Describe what to do
bit: rename to bits; and should respect golang math/bits
APIs
Here's the Weekly Digest for openacid/slim:
Last week 3 issues were created.
Of these, 1 issues have been closed and 2 issues are still open.
💚 #116 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.1, by dependabot-preview[bot]
💚 #115 [Snyk] Security upgrade pyyaml from 5.1 to 5.2, by snyk-bot
❤️ #114 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.0, by dependabot-preview[bot]
🔈 #114 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.0, by dependabot-preview[bot]
It received 2 comments.
Last week, 2 pull requests were created, updated or merged.
Last week, 2 pull requests were updated.
💛 #116 build(deps): bump github.com/stretchr/testify from 1.3.0 to 1.5.1, by dependabot-preview[bot]
💛 #115 [Snyk] Security upgrade pyyaml from 5.1 to 5.2, by snyk-bot
Last week there were no commits.
Last week there were no contributors.
Last week there were 5 stagazers.
⭐ Demonstrandum
⭐ vrischmann
⭐ niciyan
⭐ an63
⭐ gvelo
You all are the stars! 🌟
Last week there were no releases.
That's all for last week, please 👀 Watch and ⭐ Star the repository openacid/slim to receive next weekly updates. 😃
You can also view all Weekly Digests by clicking here.
Your Weekly Digest bot. 📆
go native if x != y { t.Fatalf...
is ugly.
Need a new one with more human readable syntax.
Is the requested enhancement related to a problem? Please describe.
No
Describe the solution
Affect other component or side effect
It's better if a new test framework compatible with go native testing.
Otherwise tests need to be rewritten.
Describe what to do
Let st.neighborBranches
not deal with leaf
branch, it should be dealt with outside this function.
This function should only deal with non-leaf branch.
Thus the searching logic would be more clear.
The code shows that: if word == LeafWord
then:
ltIdx
is always -1
eqIdx
is always eqIdx
rtIdx
is the first child of current node, or -1
(if no child).ltLeaf
(indicate if ltIdx
is a leaf) is always false
Thus moving logic about leaf out of this function seems better and clear.
func (st *SlimTrie) neighborBranches(idx uint16, word byte) (ltIdx, eqIdx, rtIdx int32, ltLeaf bool) {
ltIdx, eqIdx, rtIdx = int32(-1), int32(-1), int32(-1)
ltLeaf = false
isLeaf := st.Leaves.Has(int32(idx))
if word == LeafWord {
if isLeaf {
eqIdx = int32(idx)
}
} else {
if isLeaf {
ltIdx = int32(idx)
ltLeaf = true
}
}
bm, of, found := st.getChild(idx)
if !found {
return
}
if (bm >> word & 1) == 1 {
eqIdx = int32(getChildIdx(bm, of, uint16(word)))
}
ltStart := word & WordMask
for i := int8(ltStart) - 1; i >= 0; i-- {
if (bm >> uint8(i) & 1) == 1 {
ltIdx = int32(getChildIdx(bm, of, uint16(i)))
ltLeaf = false
break
}
}
rtStart := word + 1
if word == LeafWord {
rtStart = uint8(0)
}
for i := rtStart; i < LeafWord; i++ {
if (bm >> i & 1) == 1 {
rtIdx = int32(getChildIdx(bm, of, uint16(i)))
break
}
}
return
}
Create a struct that implements interface error
instead of wrapping struct error
.
pkg/errors
seems like a good choice:
https://github.com/pkg/errors
There is a great sample of README:
https://gist.github.com/PurpleBooth/109311bb0361f32d87a2
Here's the Weekly Digest for openacid/slim:
Last week 1 issue was created.
It is still open.
💚 #122 build(deps): bump github.com/golang/protobuf from 1.3.1 to 1.4.0, by dependabot-preview[bot]
Last week, 1 pull request was created, updated or merged.
Last week, 1 pull request was updated.
💛 #122 build(deps): bump github.com/golang/protobuf from 1.3.1 to 1.4.0, by dependabot-preview[bot]
Last week there were no commits.
Last week there were no contributors.
Last week there was 1 stargazer.
⭐ shijiayun
You are the star! 🌟
Last week there were no releases.
That's all for last week, please 👀 Watch and ⭐ Star the repository openacid/slim to receive next weekly updates. 😃
You can also view all Weekly Digests by clicking here.
Your Weekly Digest bot. 📆
Synopsis should show how to create and query a slimtrie
Here's the Weekly Digest for openacid/slim:
This week, no issues have been created or closed.
This week, no pull requests has been proposed by the users.
This week, no user has contributed to this repository.
This week, no user has starred this repository.
This week, there have been no commits.
This week, no releases were published.
That's all for this week, please watch 👀 and star ⭐ openacid/slim to receive next weekly updates. 😃
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.