blevesearch / bleve
A modern text/numeric/geo-spatial/vector indexing library for Go
License: Apache License 2.0
See also full ICU analysis
ability to load synonyms from files (like stop word lists)
ability to either expand (index all synonyms)
or contract (consolidate synonyms to single version)
also, investigate wordnet: http://wordnet.princeton.edu/
Depends on #19
While we can't use them for the index keys, which we craft to get the desired sort order, we should use protobufs to encode the index values. This will make the binary serialization/deserialization less error-prone, more compact, and easier to evolve over time.
Two modes:
Currently index term entries are:
't'
Would like to add support for also storing the position of this term in any arrays that were a part of the field path.
Not 100% decided that this must be in the key, but that would be the only way to have some hope of efficiently querying on this information.
The idea is to be able to further qualify queries and say that in addition to other query criteria, matching items must occur in the same parent element.
Consider the following documents in an index.
{
  "name": "a",
  "children": [
    {
      "name": "c",
      "age": 25
    },
    {
      "name": "d",
      "age": 15
    }
  ]
}
{
  "name": "b",
  "children": [
    {
      "name": "c",
      "age": 15
    },
    {
      "name": "d",
      "age": 25
    }
  ]
}
Logically we want to query:
child.name = "c" AND child.age < 20 AND same child
Both documents have a child named "c" and a child whose age is less than 20, but ONLY "b" satisfies both criteria in the same child.
The implementation idea is to include the position in the children array, and the query criteria "same child" is accomplished by verifying that matching items have the same value.
Initial implementation should just operate at query time.
If we swap the field and term order in the index key we can support faceting at query time. For every document satisfying the original query, we can look up the document in the back index, and find entries for the field that is being faceted. Seems like we don't even have to load that key, just be able to parse the field id and terms. For categorical facets the terms are bucketed and counted. For numerical range facets the parsed terms are bucketed and counted. The top-N facets are then returned with the query results.
Update the README to reference the Google Group.
Once the new API is done, the project README should illustrate how easy it is to index data and query it.
Currently the back index contains 2 separate lists of more strongly typed data. This should be changed to just a flat list of keys. This will make it easier to introduce new index row types in the future without having to keep updating the way the back index works.
truncate token at the specified max length
useful for fields left as a single token
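A minimal sketch of the truncation, assuming the max length is measured in runes rather than bytes (the issue does not say which). The `truncate` function is a hypothetical name.

```go
package main

import "fmt"

// truncate returns at most maxLen runes from the start of token.
// Counting runes is an assumption; it avoids cutting a multi-byte
// character in half.
func truncate(token string, maxLen int) string {
	if maxLen < 0 {
		maxLen = 0
	}
	count := 0
	// Ranging over a string yields the byte offset of each rune.
	for i := range token {
		if count == maxLen {
			return token[:i]
		}
		count++
	}
	return token
}

func main() {
	fmt.Println(truncate("İstanbul", 3)) // İst
	fmt.Println(truncate("go", 5))       // go
}
```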
The top-level bleve package should be all one needs to import to achieve the following:
Depends on #23
use unicode/utf8 package RuneCount function
also rename existing length filter to ByteLength
will improve Turkish analyzer
options min length, max length
for each input token, compute ngram tokens based on parameters, emit all resulting tokens
Need to ensure we have the ability to store things like ngram entries and not confuse them with indexed terms.
will improve analyzer for French
initial wiki pages for:
ability to tag words as keywords
keywords should then be ignored by the stemmer
Arabic, German, Hindi, Indic, Kurdish, Persian, and Scandinavian
options min length, max length, side (front/back)
for each input token, compute ngram tokens based on parameters, emit all resulting tokens
Users should be able to query an index using nothing but the top-level bleve package