nextapps-de / flexsearch


Next-Generation full text search library for Browser and Node.js

License: Apache License 2.0

JavaScript 100.00%
search search-algorithm search-engine searching-algorithms searching search-in-text full-text-search fulltext-search elasticsearch nodejs

flexsearch's Issues

Error on compile.js

I tried to build this library with 'npm run build-compact' and got some errors like below :

/bin/sh: -c: line 0: unexpected EOF while looking for matching '' /bin/sh: -c: line 1: syntax error: unexpected end of file { Error: Command failed: java -jar node_modules/google-closure-compiler-java/compiler.jar --compilation_level=ADVANCED_OPTIMIZATIONS --use_types_for_optimization=true --new_type_inf=true --jscomp_warning=newCheckTypes --generate_exports=true --export_local_property_definitions=true --language_in=ECMASCRIPT6_STRICT --language_out=ECMASCRIPT6_STRICT --process_closure_primitives=true --summary_detail_level=3 --warning_level=VERBOSE --emit_use_strict=true --output_manifest=log/manifest.log --output_module_dependencies=log/module_dependencies.log --property_renaming_report=log/renaming_report.log' --js='flexsearch.js' --js='lang/**.js' --js='!lang/**.min.js' --define='RELEASE=compact' --define='DEBUG=false' --define='PROFILER=false' --define='SUPPORT_WORKER=false' --define='SUPPORT_ENCODER=true' --define='SUPPORT_CACHE=false' --define='SUPPORT_ASYNC=true' --define='SUPPORT_PRESETS=true' --define='SUPPORT_SUGGESTIONS=false' --define='SUPPORT_SERIALIZE=false' --define='SUPPORT_INFO=false' --define='SUPPORT_DOCUMENTS=true' --define='SUPPORT_WHERE=false' --define='SUPPORT_LANG_DE=false' --define='SUPPORT_LANG_EN=false' --js_output_file='dist/flexsearch.compact.js' && exit 0

and just found a simple error in 'compile.js(116:92)'.

exec("java -jar node_modules/google-closure-compiler-java/compiler.jar" + parameter + "' --js='flexsearch.js' --js='lang/**.js' --js='!lang/**.min.js'" + flag_str + " --js_output_file='dist/flexsearch." + (options["RELEASE"] || "custom") + ".js' && exit 0", function(){

After removing the unnecessary single quote that follows parameter + " in that line, the build process worked fine.
I think it's just a typo... maybe. 😓
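
A minimal sketch of how the corrected command string could be assembled. The values of parameter, flag_str and options below are placeholders chosen for illustration (assumptions); in compile.js they are built earlier in the script.

```javascript
// Placeholder values (assumptions): the real ones are built earlier in compile.js.
const parameter = " --compilation_level=ADVANCED_OPTIMIZATIONS";
const flag_str = " --define='RELEASE=compact'";
const options = { RELEASE: "compact" };

// The string after `parameter` now starts with a plain space:
// the stray single quote reported above has been removed.
const command =
    "java -jar node_modules/google-closure-compiler-java/compiler.jar" +
    parameter +
    " --js='flexsearch.js' --js='lang/**.js' --js='!lang/**.min.js'" +
    flag_str +
    " --js_output_file='dist/flexsearch." + (options["RELEASE"] || "custom") + ".js' && exit 0";

// Sanity check: every opening single quote should have a closing partner.
const quoteCount = (command.match(/'/g) || []).length;
console.log(quoteCount % 2 === 0 ? "quotes balanced" : "unbalanced quotes");
```

An odd number of single quotes is what makes /bin/sh report "unexpected EOF while looking for matching" in the log above, so a parity check like this is a cheap way to catch the same mistake again.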

Error while loading language files on node.js

I'm trying to load the language files, e.g. to use them with the stemmer, but I'm getting a TypeError: Cannot read property 'registerLanguage' of undefined error.

var FlexSearch = require('flexsearch')
require('flexsearch/lang/en')

The error seems to indicate that the flexsearch object is not in scope, but when I pass it as a global variable I get the same error. Am I missing something here?

How does suggestion work?

I tried to activate the suggestion function but it does not change anything in the result. How does it work?

thanks.

TypeError C is not a function

Thanks for this capability. I am excited to learn how this works for several use cases I have.

I ran your 'best practice' example with some modifications. I cannot find the source of the error:
"c is not a function"

Here is my code

const FlexSearch = require("flexsearch")

const bookstore = new FlexSearch();
const pizzashop = new FlexSearch();
const votingbooth = new FlexSearch();

let settings = {
action: "score",
adventure: {
encode: "extra",
tokenize: "strict",
depth: 5,
threhold: 5,
doc: {
id: "id",
field: ["intent", "text"]
} },

comedy: {
    encode: "advanced",
    tokenize: "forward",
    threshold: 5
}

}
let index = {}

const add = (id, cat, intent, text) => {
console.log(gr(`Starting on Index ${id}`))
console.log(`for ${cat}, ${intent}, ${text}`)
try {
(index[cat] || (
index[cat] = new FlexSearch(settings[cat])
)).add(id, intent, text);
} catch(error) {
console.log(error)
}

}

const search = (cat, query) => {
return index[cat] ? index[cat].search(query) : [];
}

let x = 0
training.map((t) => {
console.log(b(`Creating index ${x}`))
x++
add(x, "bookstore", t.intent, t.text);
add(x, "pizzashop", t.intent, t.text);
add(x, "votingbooth", t.intent, t.text);
})

//add(1, "action", "Movie Title");
//add(2, "adventure", "Movie Title");
//add(3, "comedy", "Movie Title");

console.log(r(`THIS SHOULD EXECUTE LAST`))
//index.update(10025, "Road Runner");
//index.remove(10025);
var result1 = search("bookstore", "i am searching for a book"); // --> [1]
var result2 = search("pizzashop", "howdy"); // --> [1]
var result3 = search("votingboooth", "i need directions"); // --> [1]

console.log(`========== FAST SEARCH TEST ==========`)
console.log(result1)
console.log(result2)
console.log(result3)

The log shows an empty array for each search.

Contextual scoring doesn't seem to be working

When I set a depth, I would expect that if I search for multiple terms, documents that contain those terms near each other would score higher.

Example:

const FlexSearch = require(`flexsearch`)

const index = new FlexSearch({
	tokenize: `strict`,
	encode: `advanced`,
	cache: false,
	doc: {
		id: `id`,
		field: {
			content: {
				threshold: 9,
				resolution: 10,
				depth: 2,
			},
		},
	},
})

index.add([{
	id: 1,
	content: `billy who now what billy okay so what now thorton?`,
}, {
	id: 2,
	content: `billy bob thorton`,
}])

console.log(
	index.search(`billy thorton`)
)
// => [ { id: 1,
//    content: 'billy who now what billy okay so what now thorton?' },
//  { id: 2, content: 'billy bob thorton' } ]

I would expect document id 2 to be the top result, since it contains "billy" and "thorton" within two words of each other, but the top result is actually document id 1.

Tested in [email protected].
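
To make the expectation concrete, here is a toy proximity measure over the two documents above. It is only an illustration of "terms near each other"; it is not FlexSearch's scoring algorithm, and the helper names tokenize and minTokenDistance are hypothetical.

```javascript
// Toy illustration only: this is NOT how FlexSearch scores results.
// Splits text into lowercase word tokens and drops punctuation.
function tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Smallest distance (in tokens) between any occurrence of termA and termB,
// or Infinity if either term does not occur.
function minTokenDistance(text, termA, termB) {
    const tokens = tokenize(text);
    let best = Infinity;
    tokens.forEach((a, i) => {
        if (a !== termA) return;
        tokens.forEach((b, j) => {
            if (b === termB) best = Math.min(best, Math.abs(i - j));
        });
    });
    return best;
}

const docs = [
    { id: 1, content: "billy who now what billy okay so what now thorton?" },
    { id: 2, content: "billy bob thorton" },
];

for (const doc of docs) {
    console.log(doc.id, minTokenDistance(doc.content, "billy", "thorton"));
}
// prints "1 5" and "2 2": document 2 has the two terms closer together
```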

Multiple documents update by query?

Hello, first of all, thanks for creating a nice new search engine. We are looking to use it instead of Elasticsearch, which is very complex, carries a lot of legacy in its DSL, and makes it difficult to get the desired results. Currently we are interested in whether there are any plans to implement updating multiple documents with a single query. This is necessary, for example, to disable some products when their category is disabled.

Also, to avoid creating another ticket, I would like to know whether it is possible to boost search results based on a numeric value stored in the search index itself.

Thanks in advance.
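
Until something like this exists in the library, one possible workaround is to keep the documents in a Map next to the index and re-submit every changed document. The sketch below assumes only an index.update(id, text) call as used elsewhere on this page; updateWhere, textOf and the stub index are hypothetical names for illustration.

```javascript
// Hypothetical helper, not part of FlexSearch: applies `patch` to every stored
// document matching `predicate` and pushes the new text through index.update().
function updateWhere(index, store, predicate, patch, textOf) {
    let updated = 0;
    for (const [id, doc] of store) {
        if (!predicate(doc)) continue;
        const next = { ...doc, ...patch };
        store.set(id, next);              // replacing an existing key is safe while iterating
        index.update(id, textOf(next));   // re-index the changed document
        updated++;
    }
    return updated;
}

// Demo with a stub index that only records the calls it receives.
const updateCalls = [];
const stubIndex = { update: (id, text) => updateCalls.push([id, text]) };

const products = new Map([
    [1, { id: 1, title: "Red shirt", category: 7, enabled: true }],
    [2, { id: 2, title: "Blue shirt", category: 7, enabled: true }],
    [3, { id: 3, title: "Green mug", category: 9, enabled: true }],
]);

const changed = updateWhere(
    stubIndex,
    products,
    (doc) => doc.category === 7,   // "all products of the disabled category"
    { enabled: false },
    (doc) => doc.title
);
console.log(changed, updateCalls.length); // prints "2 2"
```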

Any reason for all the weird linebreaks in flexsearch.js?

I say weird, but I should rather say… unconventional.

Like:

while(i < length){

                        tmp = arr[i++];

                        const index = "@" + tmp;

                        if(check[index]){

Are they on purpose?

If so, what is their purpose?

If not, could using tools like Prettier (or Prettier + ESLint) help?

Settings get overridden

We use flexsearch in a React app. It performs pretty well, thanks!
We store the flexsearch settings in a constant outside of a component. We also store documents rather than key-value pairs.
The first initialization of the component works perfectly. All subsequent ones behave incorrectly: the doc property is null. I guess flexsearch accesses the settings object by reference and somehow replaces the doc property.
Is this behavior expected?
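
If the settings object really is modified by reference, a defensive copy per initialization would sidestep the problem. A minimal sketch: the field names are assumptions, and mutatingConsumer is a stand-in that mimics the reported behaviour, not actual FlexSearch code.

```javascript
// Shared settings, defined once outside the component (field names are assumptions).
const SEARCH_SETTINGS = {
    doc: { id: "id", field: ["title", "content"] },
};

// Returns a fresh deep copy, so whatever consumes it cannot alter the shared constant.
// Note: a JSON round trip drops functions, so it only suits plain-data settings.
function freshSettings() {
    return JSON.parse(JSON.stringify(SEARCH_SETTINGS));
}

// Stand-in for a consumer that mutates the object it receives
// (assumption: this mimics the behaviour described in the report).
function mutatingConsumer(settings) {
    settings.doc = null;
}

mutatingConsumer(freshSettings()); // first initialization
mutatingConsumer(freshSettings()); // second initialization still sees an intact doc
console.log(SEARCH_SETTINGS.doc !== null); // prints true
```

In the app this would mean calling new FlexSearch(freshSettings()) instead of passing the shared constant directly.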

How to create an index for a book

Hey
First thanks for the amazing library!

I would like to know if you can index a number and get the subject name, sub-topic, and paragraph number,
and whether it is possible to find a match that spans two paragraphs together.
For example:

book:
[
    {
        "topic": "topic",
        "content": [
            {
                "title": "title",
                "parts": [
                    "word1, word2, word3, word4, word5",
                    "word6, word7, word8, word9, word10",
                ]
            }
        ]
    }
]

index.search("word2 word3") // = [{topic: "topic1", title: "title1", part: 0}]
index.search("word5 word6") // = [{topic: "topic1", title: "title1", part: 0}, {topic: "topic1", title: "title1", part: 1}]

Thanks

Multivalue attributes

What is the best way to handle documents with multi value attributes?
For example a document with a m:n relation to another entity.

Exception thrown when searching for a value containing whitespace where suggest is set to true

Hi Thomas

Using the following example

const FlexSearch = require('./flexsearch')

const fs = new FlexSearch({
  encode: 'extra',
  tokenize: 'full',
  threshold: 1,
  depth: 4,
  resolution: 9,
  async: false,
  worker: 1,
  cache: true,
  suggest: true,
  doc: {
    id: 'id',
    field: [ 'intent', 'text' ]
  }
})

fs.add([
  {
    id: 0,
    intent: 'intent',
    text: 'text'
  }, {
    id: 1,
    intent: 'intent',
    text: 'howdy - how are you doing'
  }
])

console.log('INFO', fs.info())

const result = fs.search('howdy', { bool: 'or' })
console.log('RESULT', result)

const result2 = fs.search('howdy -', { bool: 'or' })
console.log('RESULT', result2)

An exception is thrown when using 'howdy -' as the search parameter. When suggest is set to false, the search succeeds, but the search for 'howdy -' does not find any results.

The exception thrown is

.../search/flexsearch.js:3308
                    z = suggestions.length;
                                    ^

TypeError: Cannot read property 'length' of undefined
    at intersect (.../servers/search/flexsearch.js:3308:37)
    at FlexSearch.merge_and_sort (.../servers/search/flexsearch.js:1393:22)
    at FlexSearch.search (.../servers/search/flexsearch.js:1561:43)
    at Object.<anonymous> (.../servers/search/test2.js:33:19)
    at Module._compile (internal/modules/cjs/loader.js:734:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:745:10)
    at Module.load (internal/modules/cjs/loader.js:626:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:566:12)
    at Function.Module._load (internal/modules/cjs/loader.js:558:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:797:12)

In flexsearch on line 3068

function intersect(arrays, limit, cursor, suggest, bool, has_not) {

            let result = [];
            let suggestions;
            const length_z = arrays.length;

suggestions is never assigned, because the condition of the while loop on line 3133 is false

while(++z < length_z){

so the assignment of the suggestion variable on line 3211 is bypassed

                    let found = false;

                    i = 0;
                    suggestions = [];

                    while(i < length){

The reason the search for howdy -, when suggestions is false, is unsuccessful is probably because of the options passed in. Should I implement my own tokenizer if I would like to find queries like howdy -?

Thanks in advance

Regards
William

What are "depth" and "threshold"?

I don't know enough fulltext index terminology to infer what these two settings actually mean.

I'm guessing from context that "depth" is the maximum number of words/tokens away a term can be and still be considered relevant.

I have no idea what the "threshold" number implies. :-x

I know I want that sweet contextual searching, so I'd love to figure this out so I can pick numbers appropriate to my use case.

Property 'length' of undefined when using web-worker

I tried setting the "worker" option to false and everything worked very well. But when I enable this option and set it to any number different than false, my console prints "Uncaught (in promise) TypeError: Cannot read property 'length' of undefined".

Here is the screenshot: [image: screenshot_7]

I have around 30,000 items, that's why I want to use the web worker feature.

Any ideas? I can provide more information if necessary.

Paging with multiple fields/boost.

Setup

var index = FlexSearch.create({
    doc: {
        id: "url",
        field: [
            "title",
            "content"
        ]
    }
});

Working

Invoke:

index.search(
    "test",
    {
        page: true,
        limit: 5
    })

Result:

{
  "page": "0",
  "next": "5",
  "result": [
    {
      "title": "Load Testing V. 1.0.1",
      "content": "test",
      "url": "/Project_Management/validations/validation2"
    },
    {
      "title": "Pre Test Inpsection Report",
      "content": "test",
      "url": "/V_and_V/5016-09-F21"
    },
    {
      "title": "Packaging Validaiton Test Report",
      "content": "test",
      "url": "/V_and_V/5016-09-F19"
    },
    {
      "title": "EMC 60601 Test Plan",
      "content": "test",
      "url": "/V_and_V/5016-09-F23"
    },
    {
      "title": "Third Party Testing",
      "content": "test",
      "url": "/3rd_Party_Testing"
    }
  ]
}

Not working

Invoke:

index.search(
    [
        {
            field: "title",
            query: "test",
            boost: 1
        },
        {
            field: "content",
            query: "test",
            boost: 0.5
        }
    ],
    {
        page: true,
        limit: 5
    })

Result:

{
  "page": "0",
  "next": null,
  "result": [
  ]
}

Comments

I need to be able to page the results while also searching multiple fields with different boost values.
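
One possible workaround, sketched under assumptions: run one search per field, merge the id lists with a boost-weighted rank score, and slice the merged list into pages yourself. The scoring formula and the helpers mergeBoosted and pageOf are hypothetical; this is not FlexSearch's internal scoring, and it uses page numbers rather than the cursor strings shown in the output above.

```javascript
// perField: [{ boost, ids }], where ids are ordered best-first (one list per field search).
function mergeBoosted(perField) {
    const scores = new Map();
    for (const { boost, ids } of perField) {
        ids.forEach((id, rank) => {
            // Earlier ranks get a higher base score; boost scales the field's weight.
            const score = boost * (ids.length - rank) / ids.length;
            scores.set(id, (scores.get(id) || 0) + score);
        });
    }
    // Highest combined score first.
    return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Cuts one page out of the merged list and reports the next page number (or null).
function pageOf(ids, page, limit) {
    const start = page * limit;
    const next = start + limit < ids.length ? page + 1 : null;
    return { page, next, result: ids.slice(start, start + limit) };
}

// Example input: pretend these id lists came from two separate field searches.
const merged = mergeBoosted([
    { boost: 1, ids: ["/a", "/b"] },          // matches in "title"
    { boost: 0.5, ids: ["/c", "/b", "/d"] },  // matches in "content"
]);
console.log(pageOf(merged, 0, 2)); // { page: 0, next: 1, result: [ '/a', '/b' ] }
console.log(pageOf(merged, 1, 2)); // { page: 1, next: null, result: [ '/c', '/d' ] }
```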

Logical Operator (Please Vote)

Which kind of expression do you prefer?

1. required / optional / prohibited

var results = index.search([{
    field: "title",
    query: "foobar",
    presence: "required"
},{
    field: "body",
    query: "content",
    presence: "optional"
},{
    field: "blacklist",
    query: "xxx",
    presence: "prohibited"
}]);

2. and / or / not

var results = index.search([{
    field: "title",
    query: "foobar",
    bool: "and"
},{
    field: "body",
    query: "content",
    bool: "or"
},{
    field: "blacklist",
    query: "xxx",
    bool: "not"
}]);

3. + / -

var results = index.search([{
    field: "+title",
    query: "foobar"
},{
    field: "body",
    query: "content"
},{
    field: "-blacklist",
    query: "xxx"
}]);

Distinct values and distinct count

Hello, is it possible to count the distinct values of a field and/or get the distinct values for some fields? For example, when searching products in a catalog, it's good to know the distinct category IDs of the results.
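
As a client-side fallback, distinct values can be counted over the returned documents after the search. A minimal sketch, assuming the search returns whole documents; the categoryId field and the countDistinct helper are hypothetical.

```javascript
// Counts how often each value of `field` occurs in a list of result documents.
function countDistinct(results, field) {
    const counts = new Map();
    for (const doc of results) {
        counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
    }
    return counts;
}

// Example result set (made-up data).
const searchResults = [
    { id: 1, title: "Red shirt", categoryId: 7 },
    { id: 2, title: "Blue shirt", categoryId: 7 },
    { id: 3, title: "Green mug", categoryId: 9 },
];

const perCategory = countDistinct(searchResults, "categoryId");
console.log([...perCategory.keys()]); // [ 7, 9 ]
console.log(perCategory.get(7));      // 2
```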

Search results depend on the order of fields

NOTE: I've rewritten the entire issue because I've found a way to reproduce my issue on a very small dataset.

I've noticed that I'm missing search results depending on the order of fields that I provide when creating the index.

In the following example, there are two objects where notation:0 matches the search term WW 8840, and one object where prefLabel:de matches WW 8840. In the first example, only the latter object is returned as a search result even though all fields are supposed to be searched. The second example returns the correct search results just by reordering the fields (putting notation:0 to the end). Note that when specifying notation:0 as the only field to search, it will return the correct results in both cases.

Non-working example (prints 1 and 2 even though the first query should return 3 results):

const FlexSearch = require("flexsearch")

let index = new FlexSearch({
  doc: {
    id: "uri",
    field: [
      "prefLabel:de",
      "notation",
      "editorialNote:de",
    ]
  },
  profile: "score"
})

// Example dataset
let concepts = [
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WW%208720%20-%20WW%209239"}],"created":"2012-07-05","editorialNote":{"de":"(Blutgruppen s. XD 3200)"},"http://www.w3.org/2004/02/skos/core#closeMatch":[{"uri":"http://d-nb.info/gnd/4130604-1"},{"uri":"http://d-nb.info/gnd/4022814-9"},{"uri":"http://d-nb.info/gnd/4070945-0"},{"uri":"http://d-nb.info/gnd/4074195-3"}],"identifier":["152145:13422"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WW 8840 - WW 8879","prefLabel":{"de":"Blutkörperchen (Erythrozyt, Leukozyt), Hämoglobin"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WW%208840%20-%20WW%208879"},
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WD%205000%20-%20WD%205970"}],"created":"2012-07-05","editorialNote":{"de":"(Antibiotika s. XI 3500)"},"http://www.w3.org/2004/02/skos/core#closeMatch":[{"uri":"http://d-nb.info/gnd/4155845-5"},{"uri":"http://d-nb.info/gnd/4276935-8"},{"uri":"http://d-nb.info/gnd/4176522-9"},{"uri":"http://d-nb.info/gnd/4175383-5"},{"uri":"http://d-nb.info/gnd/4148701-1"}],"identifier":["148204:"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WD 5380","prefLabel":{"de":"Pyrrolfarbstoffe, Cytochrome, Chromoproteine (Hämoglobin s. WW 8840)"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WD%205380"},
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WW%208840%20-%20WW%208879"}],"created":"2012-07-05","editorialNote":{},"identifier":["152145:13423"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WW 8840","prefLabel":{"de":"Allgemeines"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WW%208840"}
]

index.add(concepts)

let results
results = index.search("WW 8840")
console.log(results.length) // only matches the second concept (which mentions "WW 8840" in label)

results = index.search("WW 8840", {
  field: "notation"
})
console.log(results.length) // correctly matches two concepts
// with large dataset, also correctly matches the two concepts

Working example (prints 3 and 2 as expected, just by reordering fields):

const FlexSearch = require("flexsearch")

let index = new FlexSearch({
  doc: {
    id: "uri",
    field: [
      "prefLabel:de",
      "editorialNote:de",
      "notation",
    ]
  },
  profile: "score"
})

// Example dataset
let concepts = [
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WW%208720%20-%20WW%209239"}],"created":"2012-07-05","editorialNote":{"de":"(Blutgruppen s. XD 3200)"},"http://www.w3.org/2004/02/skos/core#closeMatch":[{"uri":"http://d-nb.info/gnd/4130604-1"},{"uri":"http://d-nb.info/gnd/4022814-9"},{"uri":"http://d-nb.info/gnd/4070945-0"},{"uri":"http://d-nb.info/gnd/4074195-3"}],"identifier":["152145:13422"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WW 8840 - WW 8879","prefLabel":{"de":"Blutkörperchen (Erythrozyt, Leukozyt), Hämoglobin"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WW%208840%20-%20WW%208879"},
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WD%205000%20-%20WD%205970"}],"created":"2012-07-05","editorialNote":{"de":"(Antibiotika s. XI 3500)"},"http://www.w3.org/2004/02/skos/core#closeMatch":[{"uri":"http://d-nb.info/gnd/4155845-5"},{"uri":"http://d-nb.info/gnd/4276935-8"},{"uri":"http://d-nb.info/gnd/4176522-9"},{"uri":"http://d-nb.info/gnd/4175383-5"},{"uri":"http://d-nb.info/gnd/4148701-1"}],"identifier":["148204:"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WD 5380","prefLabel":{"de":"Pyrrolfarbstoffe, Cytochrome, Chromoproteine (Hämoglobin s. WW 8840)"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WD%205380"},
  {"@context":"https://gbv.github.io/jskos/context.json","broader":[{"uri":"http://rvk.uni-regensburg.de/nt/WW%208840%20-%20WW%208879"}],"created":"2012-07-05","editorialNote":{},"identifier":["152145:13423"],"inScheme":[{"uri":"http://uri.gbv.de/terminology/rvk/"}],"modified":"2018-12-14","notation":"WW 8840","prefLabel":{"de":"Allgemeines"},"type":["http://www.w3.org/2004/02/skos/core#Concept"],"uri":"http://rvk.uni-regensburg.de/nt/WW%208840"}
]

index.add(concepts)

let results
results = index.search("WW 8840")
console.log(results.length) // now matches all three concepts

results = index.search("WW 8840", {
  field: "notation"
})
console.log(results.length) // correctly matches two concepts
// with large dataset, also correctly matches the two concepts

Any idea why this is happening? Thanks!

Serializing as stream instead of string

I'm trying to create an index over a large dataset and I want to separate the script that's creating the index from the script that's using the index. The index creation seems to work very well, but when I use index.export(), I'm getting a RangeError: Invalid string length error. Is there a way to export the index as a file without getting this error? A possible solution would be to allow exporting via a stream that could be written to a file directly.

Thanks!
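
A sketch of the streaming idea, independent of FlexSearch's own export format: if the data to persist is available as a plain object (an assumption), it can be emitted as newline-delimited JSON, one small string per top-level key, so no single string has to hold everything. The helpers toJsonLines and fromJsonLines are hypothetical.

```javascript
// Yields one small JSON line per top-level key (newline-delimited JSON),
// so the whole object is never serialized into one giant string.
function* toJsonLines(data) {
    for (const [key, value] of Object.entries(data)) {
        yield JSON.stringify([key, value]) + "\n";
    }
}

// Rebuilds the object from the lines produced by toJsonLines().
function fromJsonLines(lines) {
    const data = {};
    for (const line of lines) {
        if (!line.trim()) continue; // skip blank lines
        const [key, value] = JSON.parse(line);
        data[key] = value;
    }
    return data;
}

// Made-up data standing in for a large structure.
const indexData = { alpha: [1, 2, 3], beta: { nested: "value" }, gamma: "text" };
const restored = fromJsonLines([...toJsonLines(indexData)]);
console.log(JSON.stringify(restored) === JSON.stringify(indexData)); // true
```

In Node.js each yielded line could be passed to the write() method of a stream created with fs.createWriteStream, which keeps memory usage bounded by the largest single entry.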

Pagination: forwards and backwards

Going to the next page is not a problem, but going back to the previous one is. When I request the previous page, I get an array instead of an object, and the paging fields are missing.
Could you give a simple example of paginating back and forth?
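
One way to page in both directions is to remember the cursors already visited and replay them when going back. The sketch below works with any function that returns the { page, next, result } shape shown in the paging issue above; createPager and fakeSearch are hypothetical names, and fakeSearch merely imitates that shape over a small array.

```javascript
// Wraps any function fetchPage(cursor) -> { page, next, result } and keeps a
// stack of visited cursors so that previous() can replay them.
function createPager(fetchPage, firstCursor = "0") {
    const history = [];            // cursors of the pages before the current one
    let cursor = firstCursor;
    let current = fetchPage(cursor);
    return {
        current: () => current,
        next() {
            if (current.next === null) return current; // already on the last page
            history.push(cursor);
            cursor = current.next;
            current = fetchPage(cursor);
            return current;
        },
        previous() {
            if (history.length === 0) return current;  // already on the first page
            cursor = history.pop();
            current = fetchPage(cursor);
            return current;
        },
    };
}

// Stand-in for a paged search: two items per page, cursor is a string offset.
const items = ["a", "b", "c", "d", "e"];
const fakeSearch = (cursor) => {
    const start = Number(cursor);
    const end = start + 2;
    return {
        page: cursor,
        next: end < items.length ? String(end) : null,
        result: items.slice(start, end),
    };
};

const pager = createPager(fakeSearch);
console.log(pager.current().result);  // [ 'a', 'b' ]
console.log(pager.next().result);     // [ 'c', 'd' ]
console.log(pager.next().result);     // [ 'e' ]
console.log(pager.previous().result); // [ 'c', 'd' ]
```

With FlexSearch, fetchPage would wrap the search call, on the assumption that the value returned in next can be passed back as the page option.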

Can't destroy index if created with doc parameter

flexsearch version 0.5.1

Problem

Can't destroy the index instance in the browser because of an error.

Details

Here is test HTML:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Benchmark Presets</title>
    <style>
        body{
            font-family: sans-serif;
        }
        table td{
            padding: 1em 2em;
        }
        button{
            padding: 5px 10px;
        }
    </style>
</head>
<body>
<div id="container"></div>
<script src="../dist/flexsearch.min.js"></script>
<script>
  (function(){
    var index = new FlexSearch({
        doc: {
            id: 'id',
            field: 'title'
        }
    });
    index.add([
      { id: 1, title: 'foo' },
      { id: 2, title: 'bar' }
    ])
    console.log(index.search('foo'))
    index.destroy()
  })();
</script>
</body>
</html>

Window console displays error:

TypeError: a is undefined
flexsearch.min.js:33:45

Error when search "john wick" on demo site

Hi,
Your work is great.

After playing around, I found an issue with your demo.

Search string "john wi" is shown fine.
But search "john wic" is empty.

Could you check it?


Cyrillic languages support

Hello,

I've run into the following behaviour.

This example works as expected:

const FlexSearch = require('flexsearch');
const index = new FlexSearch();

index.add(1, 'Foobar')
console.log(index.search('Foobar'));
// [ 1 ]

But this one shows no results.

const FlexSearch = require('flexsearch');
const index = new FlexSearch();

index.add(1, 'Фообар')
console.log(index.search('Фообар'));
// []

I've tested in node and in browser.

The benchmarks with different presets seem unfair

The benchmarks for query and memory tests use different presets, but compare to same config of other libraries.
It would be helpful to be able to compare the difference of flexsearch performance between presets, while showing a full, unbiased picture.

Question on 0.7.0/Field boosting

I see the documentation on indexing different fields in a document has been fleshed out, which is great, I was wondering how that would work.

The readme claims that field searching is a thing in 0.7.0, but the changelog only goes up to 0.6.0 and the version on npm is 0.6.2 – what's the deal there?

Besides wondering where I could find 0.7.0 I have one question: how does boosting work?

I have a document with a title, and a body. I want matches in the title to count towards the score 10x more than matches in the body.

Could I achieve that by setting the boost on the title field to 10, and the boost on the body field to 1? Is that how boost works, or have I misguessed? What is the default boost for a field?

Results are not unique when matches in more than one field

I expected matching documents to be unique within the result. What is the reason for repeating them?

Example:

const f = new FlexSearch({
	doc: {
		id: 'id',
		field: ['field1', 'field2']
	}
})

const docs = [
	{id: 1, field1: 'phrase', field2: 'phrase'}
]

f.add(docs)
console.log(f.search('phrase'))
// Result = [{id: 1, field1: "phrase", field2: "phrase"}, {id: 1, field1: "phrase", field2: "phrase"}]
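
As a workaround on the caller's side, duplicates can be filtered out by ID after the search. A minimal sketch; uniqueById is a hypothetical helper, not a library function.

```javascript
// Keeps only the first occurrence of each document ID, preserving result order.
function uniqueById(results, idKey = "id") {
    const seen = new Set();
    return results.filter((doc) => {
        if (seen.has(doc[idKey])) return false;
        seen.add(doc[idKey]);
        return true;
    });
}

// The duplicated result reported above.
const duplicated = [
    { id: 1, field1: "phrase", field2: "phrase" },
    { id: 1, field1: "phrase", field2: "phrase" },
];
console.log(uniqueById(duplicated).length); // 1
```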

Port of this library for Ruby

Hi @ts-thomas,

I am a beginner in open source contribution and projects. I want to work on a port of this library for Ruby. If possible, can you point me towards any reference, article, or blog post related to the scoring algorithm and the other implementations used in this library? If anyone is already working on a Ruby port, please let me know; I would also love to contribute to that project.

Sorting

Pretty neat. Performs really well.

I read in #7 "Flexsearch is a micro library whose complexity we want to keep as low as possible in the core. "

What about sorting? We are currently considering replacing our list filters by flexsearch. It would be nice to use the same index also for sorting.

How best to return unindexed data for each match (as well as the ID)?

For each item that matches a query, I'd like to be able to get unindexed arbitrary data — not just its ID.

For example: for matches when searching Shakespeare plays, I'd like to be able to return the text of an individual line (which is indexed) but also play name, location, speaker, etc.

What's the best way to achieve this?

I can do this in Elasticlunr (for example) like this:

const index = elasticlunr(function() {
    this.addField('text'); // doc property to be indexed
    this.setRef('id'); // doc property that is the ID of each item
    for (const doc of docs) {
      // doc includes additional arbitrary data for each item: play, speaker, location, etc.
      this.addDoc(doc); 
    }
});

Would I simply need to create an object that maps IDs with item data, or is there a better way to do this?

Great project by the way — thanks so much for building this.
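
The mapping approach mentioned in the question could look like the sketch below: index only the text, keep the full records in a Map keyed by ID, and look them up from the IDs a search returns. The sample records and the hydrate helper are illustrative assumptions.

```javascript
// Full records, including fields that are never indexed.
const lines = [
    { id: 1, text: "To be, or not to be, that is the question", play: "Hamlet", speaker: "Hamlet" },
    { id: 2, text: "All the world's a stage", play: "As You Like It", speaker: "Jaques" },
];

// Lookup table from ID to full record.
const byId = new Map(lines.map((line) => [line.id, line]));

// Turns a list of IDs (as returned by an ID-only search) into full records,
// silently skipping IDs that are no longer in the table.
function hydrate(ids) {
    return ids.map((id) => byId.get(id)).filter(Boolean);
}

console.log(hydrate([2])[0].speaker); // Jaques
```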

Data doesn't get indexed

Hi

I am trying to run the example you posted in issue #30 without any luck.

Here is the code:

const FlexSearch = require('flexsearch')

// provide a document descriptor for each index
// the field "id" and at least one "field" is mandatory.

const settings = {
  'bookstore': {
    preset: 'score',
    doc: {
      id: 'id',
      field: ['intent', 'text']
    }
  },
  'pizzashop': {
    encode: 'extra',
    tokenize: 'strict',
    depth: 5,
    threshold: 5,
    doc: {
      id: 'id',
      field: ['intent', 'text']
    }
  },
  'votingbooth': {
    encode: 'advanced',
    tokenize: 'forward',
    threshold: 5,
    doc: {
      id: 'id',
      field: ['intent', 'text']
    }
  }
}

const index = {}

const add = (cat, doc) => {
  const i = index[cat] || (
    index[cat] = new FlexSearch(settings[cat])
  )
  i.add(doc)
}

const search = (cat, query) => {
  return index[cat] ? index[cat].search(query) : []
}

// provide documents which have the same structure as defined in the document descriptor above

const bookstore = [{
  id: 0,
  intent: 'intent',
  text: 'text'
}, {
  id: 1,
  intent: 'intent',
  text: 'i am searching for a book'
}]

const pizzashop = [{
  id: 0,
  intent: 'intent',
  text: 'text'
}, {
  id: 1,
  intent: 'intent',
  text: 'howdy'
}]

const votingbooth = [{
  id: 0,
  intent: 'intent',
  text: 'text'
}, {
  id: 1,
  intent: 'intent',
  text: 'i need directions'
}]

// add a full document or an array of documents to the index

add('bookstore', bookstore)
add('pizzashop', pizzashop)
add('votingbooth', votingbooth)

console.log('INFO', index['bookstore'].info())
console.log('INFO', index['pizzashop'].info())
console.log('INFO', index['votingbooth'].info())

console.log('INFO', index['bookstore'])
// search

const result1 = search('bookstore', 'i am searching for a book') // --> [1]
const result2 = search('pizzashop', 'howdy') // --> [1]
const result3 = search('votingbooth', 'i need directions') // --> [1]

console.log('========== FAST SEARCH TEST ==========')
console.log(result1)
console.log(result2)
console.log(result3)

and the output I get is:

INFO { id: 0,
  memory: 0,
  items: 0,
  sequences: 0,
  chars: 0,
  cache: false,
  matcher: 0,
  worker: undefined,
  threshold: 1,
  depth: 4,
  contextual: true }
INFO { id: 3,
  memory: 0,
  items: 0,
  sequences: 0,
  chars: 0,
  cache: false,
  matcher: 0,
  worker: undefined,
  threshold: 5,
  depth: 5,
  contextual: true }
INFO { id: 6,
  memory: 0,
  items: 0,
  sequences: 0,
  chars: 0,
  cache: false,
  matcher: 0,
  worker: undefined,
  threshold: 5,
  depth: 0,
  contextual: 0 }
INFO k {
  id: 0,
  o: [],
  f: 'strict',
  w: false,
  async: false,
  threshold: 1,
  b: 9,
  depth: 4,
  C: false,
  m: false,
  s: [Function: bound ],
  a:
   { id: [ 'id' ],
     field: [ [Array], [Array] ],
     index: { intent: [k], text: [k] },
     keys: [ 'intent', 'text' ] },
  h:
   [ [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {},
     [Object: null prototype] {} ],
  i: [Object: null prototype] {},
  c: [Object: null prototype] {},
  g:
   [Object: null prototype] {
     '0': { id: 0, intent: 'intent', text: 'text' },
     '1':
      { id: 1, intent: 'intent', text: 'i am searching for a book' } },
  v: true,
  cache: false,
  j: false }
========== FAST SEARCH TEST ==========
[]
[]
[]

Am I missing something?

Node version: 11.9.0

Thanks in advance...

Development Roadmap (Please Participate)

Please make suggestions or give some feedback.

1. Extract Core Functionality

Extracting the core functionality is a basic requirement for many upcoming features as well as for already existing ones, like:

  • Plugin API
  • Custom Tooling
  • Language-specific ports or migrations
  • Pluggable Workflows
  • All kinds of extensions

These already existing features have to remain core functionality:

  1. Lexical Pre-Scored Index
  2. Contextual-based Map
  3. Index-related Settings:
    • threshold
    • resolution
    • depth
    • rtl
  4. Matching Tokens (Query)
  5. Cursor-based Pagination
  6. Logical Operators
  7. Cross-Process Intersection
  8. Index-based Suggestions

The basic core API should have these methods:

  1. create
  2. init
  3. add
  4. update
  5. remove
  6. destroy
  7. match (search)

These missing features also need to be integrated as core functionality:

  • Providing abstract I/O, supporting various kinds of index storage:
    • In-Memory
    • Partial Persistent Storage (persistent documents, in-memory index)
    • Storage-only (persistent documents, persistent index)

These functions should be extracted as an optional tooling:

  • System-specific Features (Browser, Node.js):
    • Web Worker
    • Async
  • Language-specific Features:
    • Encoder
    • Tokenizer
    • Matcher, Stemmer, Filter
  • Documents (Field-Search)
  • Custom Search
  • Find / Where / Tags
  • Export / Import (Serialization)
  • Cache
  • Presets

2. Plugin API

The plugin API is required to provide additional tooling and features in a modular and extendable manner. The plugin API should have these capabilities:

  1. Extend via ad hoc methods
  2. Extend via pipeline
  3. Extend via events (callbacks)
  4. Plugin Package Descriptor

3. Prerequisites

  1. Extract language-specific logic
  2. Provide process connectivity and refactor

4. Language Port

There are several requests for a TypeScript port. The advantage of TypeScript over plain JavaScript may be too small, since TypeScript also compiles to JavaScript and its output is less optimized than that of the Google Closure Compiler.

Technically there are two targets:

  1. Browser
  2. System (OS)

Browsers are actually covered as well as Node.js. Making a TypeScript port will do not cover any additional ecosystem. Only the formal codebase will differ and at the end it is just a different pattern for the same result. That's why I prefer a browser-less system-wide port over TypeScript. The language Rust is pretty close to TypeScript/JavaScript and covers 2., so this might be a better candidate for a port.

There is no final decision at the moment, so let us discuss the pros and cons here.

Serialize/Deserialize for SSR ?

Does the library support serializing and deserializing a FlexSearch object as JSON?
I'd love to create the index in Node, then deserialize the object in the browser for client-side searching.
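
The feature list earlier on this page mentions Export / Import (Serialization). Independent of the library's own API, the general round trip described in this question can be sketched as follows. A hypothetical `Map`-based index stands in for the real one, and the function names are made up for illustration.

```javascript
// Hypothetical stand-in for a search index: term -> Set of document ids.
function buildIndex(docs) {
  const index = new Map();
  for (const { id, content } of docs) {
    for (const term of content.toLowerCase().split(/\s+/).filter(Boolean)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// Server side (Node): Map and Set are not JSON-serializable as-is,
// so convert them to nested arrays before stringifying.
function serializeIndex(index) {
  return JSON.stringify([...index].map(([term, ids]) => [term, [...ids]]));
}

// Client side (browser): rebuild the Map/Set structure from the JSON string.
function deserializeIndex(json) {
  return new Map(JSON.parse(json).map(([term, ids]) => [term, new Set(ids)]));
}
```

The JSON string produced on the server can be embedded in the rendered page or fetched separately, so the browser never has to re-index the documents.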

Relevance: can it be based on number of times a term occurs?

I would expect that if I search for a term, and that term appears once in document A but several times in document B, that B would have a higher position in the results than A. But that does not seem to be the case.

Example:

const FlexSearch = require(`flexsearch`)

const index = new FlexSearch({
	tokenize: `strict`,
	encode: `advanced`,
	cache: false,
	doc: {
		id: `id`,
		field: {
			content: {
				threshold: 9,
				resolution: 10,
			},
		},
	},
})

index.add([{
	id: 1,
	content: `billy bob thorton`,
}, {
	id: 2,
	content: `billy who now what billy okay so what now thorton?`,
}])

console.log(
	index.search(`billy`)
)
// => [ { id: 1, content: 'billy bob thorton' },
//  { id: 2,
//    content: 'billy who now what billy okay so what now thorton?' } ]

I would expect that a search for billy would have a higher score for document id 2 than document id 1, but the search returns document id 1 as the top result.

Tested with [email protected].
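
One possible workaround, independent of how the library scores internally, is to re-rank the returned documents by raw term frequency. The helper below is only a sketch: `rankByTermFrequency` is a hypothetical name, and its simple split on non-alphanumeric characters is an assumption that does not mirror the `advanced` encoder.

```javascript
// Count how often `term` occurs as a whole word in `text` (case-insensitive).
function termFrequency(text, term) {
  const needle = term.toLowerCase();
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word === needle).length;
}

// Return a new array sorted by descending term frequency in `field`.
// Array.prototype.sort is stable, so ties keep the original result order.
function rankByTermFrequency(results, term, field = "content") {
  return [...results].sort(
    (a, b) => termFrequency(b[field], term) - termFrequency(a[field], term)
  );
}

const searchResults = [
  { id: 1, content: "billy bob thorton" },
  { id: 2, content: "billy who now what billy okay so what now thorton?" },
];
const ranked = rankByTermFrequency(searchResults, "billy");
// ranked[0].id is 2, because "billy" occurs twice in that document
```

Raw counts favor long documents, so a length-normalized variant may be preferable for real data.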

Contextual Search documentation is missing

The readme includes the line

Note: This feature is actually not enabled by default. Read here how to enable.

but the "here" link doesn't go to any page, and I can't find the intended target in the repo :-o

Remove Features: Where / Find / Tags

I am thinking about removing these features:

  • index.find() (get document by ID will remain)
  • index.where()
  • tag fields
  • where clause in custom search

The main reasons for this are:

  • they do not scale properly; they are only useful up to a medium document length
  • tags cannot be serialized; instead they have to be recovered from the original documents, which slows down the import function
  • a custom helper function can replace this functionality and is also faster and less redundant
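
Such a helper could be as small as a filter over the search results. The following is a sketch, assuming results are plain document objects; `where` here is a hypothetical standalone function, not the `index.where()` method proposed for removal.

```javascript
// Hypothetical helper: keep only documents whose fields equal the given values,
// or that satisfy a predicate when a function is passed instead.
function where(documents, criteria) {
  if (typeof criteria === "function") {
    return documents.filter(criteria);
  }
  const entries = Object.entries(criteria);
  return documents.filter((doc) =>
    entries.every(([key, value]) => doc[key] === value)
  );
}

const found = [
  { id: 1, category: "movie", title: "A" },
  { id: 2, category: "music", title: "B" },
];
where(found, { category: "movie" }); // keeps only id 1
where(found, (doc) => doc.id > 1);   // keeps only id 2
```

Because it runs on the result set rather than inside the index, nothing extra has to be stored or serialized.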

What do you think about this?

Benchmark with Algolia?

Can someone do a benchmark between this library and Algolia?
I just want to know whether I should drop Algolia for a better copycat.
Thank you ;)

Unexpected exception when attempting to call Index.search method

I tried to use a code example from the unit tests, but got the following error:

Code to reproduce:

const FlexSearch = require('flexsearch')

// tslint:disable

;(async () => {
  const index = new FlexSearch({
    async: true,
    doc: {
      id: 'id',
      field: [ 'data:name' ]
    }
  })

  const data = [{
    id: 2,
    data: {
      title: 'Title 3',
      body: 'Body 3'
    }
  }, {
    id: 1,
    data: {
      title: 'Title 2',
      body: 'Body 2'
    }
  }, {
    id: 0,
    data: {
      title: 'Title 1',
      body: 'Body 1'
    }
  }]

  await index.add(data)

  console.log(index.search)

  const result = await index.search({
    field: 'data:body',
    query: 'body'
  })

  console.dir(result)
})()

Output:

[Function]
(node:10016) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'search' of undefined
    at h.search (C:\Users\User\Documents\Projects\test\node_modules\flexsearch\dist\flexsearch.node.js:24:281)
    at C:\Users\User\Documents\Projects\test\index.js:38:30
    at process._tickCallback (internal/process/next_tick.js:43:7)
    at Function.Module.runMain (internal/modules/cjs/loader.js:778:11)
    at startup (internal/bootstrap/node.js:300:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:826:3)
(node:10016) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:10016) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Environment: Node
Node version: v11.2.0
Flexsearch version: "^0.5.2"
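
One detail in the reproduction stands out: the index is configured with the field `data:name`, the documents contain only `data.title` and `data.body`, and the search targets `data:body`. Whether that mismatch is what triggers the exception is an assumption, not something confirmed in this report. A small guard such as the hypothetical `assertFieldIndexed` below would surface a mismatch like this with a clearer message before `index.search` is called.

```javascript
// Hypothetical guard, independent of FlexSearch: verify that the field being
// searched is one of the fields the index was configured with.
function assertFieldIndexed(configuredFields, searchedField) {
  const fields = [].concat(configuredFields); // accept a string or an array
  if (!fields.includes(searchedField)) {
    throw new Error(
      `Field "${searchedField}" is not indexed; configured fields: ${fields.join(", ")}`
    );
  }
}
```

With the configuration from the report, `assertFieldIndexed(['data:name'], 'data:body')` throws, while listing `data:body` among the configured fields lets the check pass.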
