Giter Site home page Giter Site logo

smarty-archives / mafsa Goto Github PK

View Code? Open in Web Editor NEW
295.0 295.0 25.0 35 KB

Package mafsa implements Minimal Acyclic Finite State Automata in Go, essentially a high-speed, memory-efficient, Unicode-friendly set of strings.

Home Page: https://godoc.org/github.com/smartystreets/mafsa

License: Other

Go 100.00%

mafsa's People

Contributors

joliver avatar mdwhatcott avatar mholt avatar shugyousha avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

mafsa's Issues

Impossible to use utf

It seems something wrong with encoding or decoding of utf. Consider this test:

import (
	"github.com/smartystreets/mafsa"
	"github.com/stretchr/testify/require"
	"testing"
)

func TestRussian(t *testing.T) {
	a := mafsa.New()
	a.Insert("я") // I'm in russian
	a.Finish()
	a.Save("test")
	require.True(t, a.Contains("я")) // Fine

	b, err := mafsa.Load("test")
	require.NoError(t, err)
	require.True(t, b.Contains("я")) // Failure!!
}

func TestSpanish(t *testing.T) {
	a := mafsa.New()
	a.Insert("Gracías")
	a.Finish()
	a.Save("test")
	require.True(t, a.Contains("Gracías")) // Thank you in spanish

	b, err := mafsa.Load("test")
	require.NoError(t, err)
	require.True(t, b.Contains("Gracías")) // Failure!!
}

Support Unicode encodings

The encoding currently only supports ASCII (oops - forgot to upgrade that) since each character only gets 1 byte. This should be expanded to more bytes when the user wants to encode trees with Unicode characters.

However, since this greatly increases the file size and may not be necessary, so this option should be configurable.

When encoding, adjust pointer size

To minimize the size of the file even more, the size of the pointer should be adjusted based on how many items are in the tree that are being encoded. Right now we simply use 4 bytes, but small trees don't need this many. 4 bytes allows over 4 billion items...

The pointer size should be automatically adjusted or configurable.

unused Entry and unnecessary entry

I've been getting a deeper understanding of the code and found the following.

  1. The type Entry defined in mintree.go does not appear to be used anywhere. It is defined here: https://github.com/smartystreets/mafsa/blob/1575156d598714bb1c41713191eefb7f5ccf3e8b/mintree.go#L26

  2. The entry argument to decodeEdge(...) seems to serve no purpose. We only ever append to it and pass to recursive calls, but its never used for anything. Further, when I remove it, and all usage of it, all tests in the package continue to pass. See: https://github.com/smartystreets/mafsa/blob/1575156d598714bb1c41713191eefb7f5ccf3e8b/decoder.go#L85

Is this just left over from some previous version of the implementation? Or am I missing some key aspect of the way things are working?

Elaborate on fuzzy matching and autocomplete?

From the README,

Basically, it's a set of strings that lets you test for membership, do spelling correction (fuzzy matching) and autocomplete, but with higher memory efficiency than a regular trie.

Can you please elaborate on how this can be used for fuzzy matching and autocomplete?

As far as I can tell, you can only test for strict presence of a full string. It's not possible to check if a substring or a fuzzy match exists in the set, is it?

Share the testing data

Thanks for making this. Can you please share the 8M strings of testing data you used for this? This will give a good point of comparison with other libs out there.

Unusual Encoder/Decoder signatures?

Please correct me if I'm wrong. I don't think there's a standard interface for Encoder/Decoder types, but as far as I can see, they all typically take an io.Reader to create a Decoder, and io.Writer to create an Encoder.

For example, look at:

http://godoc.org/encoding/json#NewDecoder
http://godoc.org/encoding/xml#NewDecoder
http://godoc.org/encoding/ascii85#NewDecoder

Then, their Encode/Decode methods take one parameter and return just error.

The Encoder/Decoder in this package do not take a writer/reader, but instead return the encoded/decoded data alongside the error.

My question is, would it be a better API if they did take a Reader/Writer?

Consistent use of []rune and string

We should consider making more consistent use of either []rune or string rather than both in arguments of the exported functions/methods.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.