search-index's People

Contributors

anandthakker, craigshoemaker, cshum, dependabot[bot], ehimah, eivindee, eklem, fergiemcdowall, gburgett, giladshoham, gitter-badger, greenkeeperio-bot, gueneler, holger-will, ilblog, jeffsee55, jimkang, kldavis4, lannka, mewwts, mikaelkaron, mistermoe, mrrefactoring, n1k0, nhhagen, nlaplante, orangeswim, rngadam, sbiaudet, timosaikkonen

search-index's Issues

Error thrown when searching using multiple words

I know that the example in the documentation uses a different notation (['a, b']), but this is the only way multiple-word search works for me:

    var request = {
        "query": {
          "slug": ['a', 'b']
        },
               ...
      };

Everything works fine if there are some results. If no results are found, a "Cannot get length property of null" error is thrown at:

      if (RIKeySet.length == 1) seekCutOff = (q.pageSize + q.offset);
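
A guard along the following lines might avoid the crash when nothing matches. This is only a sketch: `computeSeekCutOff` is a hypothetical helper and not part of search-index, and it assumes `RIKeySet` is either an array or null/undefined when there are no results.

```javascript
// Hypothetical helper (not part of search-index): compute the seek cut-off
// without assuming that RIKeySet is a non-null array.
function computeSeekCutOff(RIKeySet, q) {
  // Treat null/undefined (no results) the same as an empty key set.
  var keys = Array.isArray(RIKeySet) ? RIKeySet : [];
  if (keys.length === 1) {
    return q.pageSize + q.offset;
  }
  // No single key set: leave the cut-off undefined so the caller keeps its default.
  return undefined;
}
```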

empty function does not use indexPath

If you define an indexPath in options, the empty function removes the 'si' folder and recreates a new db named 'si'. The database defined in options is never emptied.

"index warm up" functionality

Faceted search and longer search strings are very fast.

However single word queries that have a large recall can be slower. On the test dataset (reuters) on low end systems, a search for 'usa' takes around 250-300ms to return 12500 docs.

This can be sped up by iterating through all docvector tokens in the index and caching the search results for each of them.
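
The warm-up idea can be sketched roughly as follows. Everything here is an assumption made for illustration: `tokens` stands in for the list of docvector tokens, `searchFn` stands in for whatever function runs a single-token search, and the search is treated as synchronous to keep the example short.

```javascript
// Sketch only: pre-compute results for every token so that single-word
// queries can be answered from memory. `tokens` and `searchFn` are stand-ins.
function warmUp(tokens, searchFn) {
  var cache = new Map();
  tokens.forEach(function (token) {
    cache.set(token, searchFn(token));
  });
  return cache;
}

// Answer from the warm cache when possible, otherwise fall back to a real search.
function cachedSearch(cache, searchFn, token) {
  return cache.has(token) ? cache.get(token) : searchFn(token);
}
```

The trade-off is memory and a longer start-up in exchange for faster high-recall single-word queries; the cache would also need rebuilding after new documents are indexed.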

TypeError: undefined is not a function search-index.js:55

Hi,

I have a small snippet to add all files and subfolder files to the search-index with the help of the following lines:

var si = require('search-index')({ indexPath: 'index.gz' });

...


for (var i = 0; i < files.length; i++) {
    var f = files[i];  // f = file path
    debug_si('Add ' + f + '.');

    var batchName = 'sona';
    var filters = ['path'];
    var data = { };
    data[f] = { 'path': f };
    si.add({'batchName': batchName, 'filters': filters}, data, function (err) {
        if (err) {
            debug_si('Error adding' + key + '.');
            callback(err);
        }
    });
}

After the code runs, a lot of informational debug logs are printed:

....
....
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "sorting tf sets"
[information] "reinserting tf sets"
[information] "[success] incremental calibration complete"
[success] "indexed batch: [object Object]"

But finally I got an exception:

D:\workspace_js\node-track-file-changes\node_modules\search-index\lib\search-index.js:55
    callback(msg);
    ^
TypeError: undefined is not a function
    at D:\workspace_js\node-track-file-changes\node_modules\search-index\lib\search-index.js:55:5
    at D:\workspace_js\node-track-file-changes\node_modules\search-index\lib\indexing\indexer.js:233:11
    at D:\workspace_js\node-track-file-changes\node_modules\search-index\lib\indexing\calibrater.js:43:9
    at D:\workspace_js\node-track-file-changes\node_modules\search-index\node_modules\level\node_modules\level-packager\node_modules\levelup\lib\levelup.js:351:9

I don't know why I get this error. Another question: what do 'batchName' and 'filters' actually do?

Maybe someone can help me.

dependencies missing?

When simply doing var si = require('search-index'); the following error is thrown:

module.js:340
    throw err;
    ^
Error: Cannot find module 'fstream'
    at Function.Module._resolveFilename (module.js:338:15)
    at Function.Module._load (module.js:280:25)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at Object.<anonymous> (D:\programming\node_modules\search-index\lib\indexing\replicator.js:3:11)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)

fstream is referenced in ./lib/indexing/replicator.js but is not listed in package.json.

Index Schema Info

Discussion?

As search-index is upgraded, it may not be compatible with existing index schemas. Perhaps it's time to store a 'semver' flag in the index itself, check it against the version of search-index in use, and [foobar] to the user if required.

If agreed, I'll try to submit an update when I get a moment.
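
A compatibility check of this kind could look roughly like the sketch below. The function names are hypothetical, and the rule used (an index is compatible only when the major versions match) is a simplifying assumption rather than anything search-index defines.

```javascript
// Sketch only: parse a "major.minor.patch" string into numbers.
function parseVersion(version) {
  var parts = String(version).split('.').map(Number);
  return { major: parts[0], minor: parts[1] || 0, patch: parts[2] || 0 };
}

// Assumed rule: the stored index is usable only if its major version
// matches the major version of the library that is opening it.
function isIndexCompatible(indexVersion, libraryVersion) {
  return parseVersion(indexVersion).major === parseVersion(libraryVersion).major;
}
```

Note that for 0.x releases a stricter rule (matching the minor version as well) may be more appropriate, since breaking changes are allowed there.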

Is it possible to make search-index silent?

Hello,

Thank you for the search-index module. I have a question. In many cases, I don't want to see the search-index logs. How can I tell search-index not to log information?

Frank

Proper object instantiation

It should be possible to make totally distinct search engine objects.

At the moment, instantiating two or more search-index instances in the same program is problematic.

Do instantiation like this:

var SearchIndex = require("search-index");
var si1 = new SearchIndex(options1);
var si2 = new SearchIndex(options2);

Range filters

It should be possible to filter on a range of values, for instance time intervals or lat/lon.
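
As an illustration of what a range filter means, here is a minimal in-memory sketch. The `gte`/`lte` option names and both functions are assumptions for this example, not an existing search-index query syntax.

```javascript
// Sketch only: inclusive range test using assumed `gte` / `lte` bounds.
function inRange(value, range) {
  return value >= range.gte && value <= range.lte;
}

// Keep only the documents whose `field` value falls inside the range.
function filterByRange(docs, field, range) {
  return docs.filter(function (doc) {
    return inRange(doc[field], range);
  });
}
```

A time interval would use timestamps as bounds; a lat/lon box would apply one range per coordinate.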

Fielded search

Users can query on one field only. For example, only return docs with the term 'banana' in the title field

Testing

Build up a new test stack with karma and jasmine

Concentrate more on logic in search index, and compileability/correctness in Forage. Take web service testing out of Forage.

Stemming

Some support for English-language stemming.

"buying" should give hits for "buy", "buys", "buyer", etc

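To make the stemming request above concrete, here is a deliberately naive suffix-stripping sketch. It is not the Porter algorithm or any real stemmer, the suffix list is an assumption chosen only to cover the "buy" example, and it will mis-stem many English words.

```javascript
// Sketch only: a naive suffix stripper, NOT a real stemming algorithm.
var SUFFIXES = ['ing', 'ers', 'er', 's'];

function naiveStem(word) {
  var w = word.toLowerCase();
  for (var i = 0; i < SUFFIXES.length; i++) {
    var suffix = SUFFIXES[i];
    // Only strip when at least three characters of stem remain.
    if (w.length - suffix.length >= 3 && w.slice(-suffix.length) === suffix) {
      return w.slice(0, w.length - suffix.length);
    }
  }
  return w;
}
```

Applying the same function at index time and at query time is what lets "buying" match documents containing "buys" or "buyer".
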
Slow indexing after 200 or so documents

I have about 900 52k documents that have been created by reading from my SQLite database.

When I loop through the 900 52k docs and call si.add, it takes forever. The process slows down at around 200 documents and then indexes really slowly. Is this expected, or am I missing something?

*Update:
I had about 52k documents and not 900

Streaming API

I haven't checked the code thoroughly, but since search-index uses levelup, a streaming API shouldn't be impossible?

Indexing from an fs or http stream makes a lot of sense to me.

TypeError: Cannot read property 'length' of undefined

I get this error:

/home/fatih/test/node_modules/search-index/lib/mapreduce/searcher.js:42
      totalHits = intersection.length;
                              ^
TypeError: Cannot read property 'length' of undefined
    at /home/fatih/test/node_modules/search-index/lib/mapreduce/searcher.js:42:31
    at /home/fatih/test/node_modules/search-index/node_modules/level-multiply/level-multiply.js:17:15
    at proxy (/home/fatih/test/node_modules/search-index/node_modules/level-multiply/node_modules/after/lib/after.js:22:39)
    at /home/fatih/test/node_modules/search-index/node_modules/level-multiply/level-multiply.js:29:13
    at dispatchError (/home/fatih/test/node_modules/search-index/node_modules/level/node_modules/level-packager/node_modules/levelup/lib/util.js:131:7)
    at /home/fatih/test/node_modules/search-index/node_modules/level/node_modules/level-packager/node_modules/levelup/lib/levelup.js:197:14

My code is this:

var si = require('search-index');
var colors = require('colors');
var fs = require('fs');

var data = JSON.parse(fs.readFileSync('node_modules/search-index/test/testdata/reuters-021.json'));

si.add(data, 'reuters-021.json', [], function(indexingMsg)
{
    console.log(indexingMsg);
});

console.log("Search data *".underline.red);

si.search({
    'query': {
        '*':'*'
    }
}, function(searchResults) {
    console.log(searchResults.green);
});

Strip out bloom filter functionality

Following the principle of Occam's razor, strip out the bloom filter functionality. It's not really needed for the direction that search-index is going in.

Multinode functionality

Should be able to index and retrieve documents from one or more remote indexes. Support sharding

Building with MSVC 2008 fails

The compiler complains that the file stdint.h cannot be found. It is available only in the recent MSVC versions. One workaround was suggested on stackoverflow; downloading the file compatible with MSVC from msinttypes.

Would you accept a patch fixing the build on older versions of MSVC, please? I'd detect such versions and include the file from msinttypes stored with a different name than stdint.h.

Where are the leveldb files stored?

Does search-index use leveldb to store the index? If so, where are the leveldb files stored, and can I specify a location when initializing search-index?
Another question: if I use Chinese characters, should I segment the sentence into words separated by spaces before inserting the document into the index?
Like this:

si.add({'doc1':{'title':"中文 字体"}}, batchName, filters, function(msg) {
  res.send(msg);
});

And use the search API like this:

si.search({"query": {"*": ["中文 字体"]}}, function(msg) {
  res.send(msg);
});

Is this right?

Index Location Configurability

It would be helpful if there were a configuration option that allows a consumer to set the location of the index.

In reality, indexes are big, should be housed in a different place than the application server code, etc.

Is this something I can add as a pull request?

Ability to facet and filter on a _range_ of values

At the moment you can facet and filter on single values, but not value 'buckets' or ranges. This functionality was present in earlier builds, but has fallen out of the most recent build because of lazy documentation (my bad) and gaps in the test coverage (also my bad).

More examples and documentation

For newcomers wanting to try search-index as an indexing solution, the documentation is a bit slim and short of examples. For example, what are "facets"? The snippets in the documentation are anything but clear about their purpose and how to use them.

Could there be, at least, a full "working" example of the engine? And perhaps more than

Q: What is a facet?
R: Allows faceted navigation.

for the different options?

Also, what output should one expect when using teaser? "Creates a field that shows where the search terms exist in the given field." Can an example result be given?

Separate index process from search

Hello There,

I might not have come to grips with the internals of the module yet, but my initial attempt was to create one node script that indexes some data and another that provides a query HTTP API via express.

It seems that when one process is started there is an IO lock which does not allow the other to read information. I understand that this is locking to ensure the validity of the index.

OpenError: IO error: lock si/LOCK: Resource temporarily unavailable

Is there any way that the above scenario could be tackled?

Thanks,
Fotis

Can `matcher()` return docIDs along with strings?

matcher (and the whole project, really) seems very handy, but it feels like there's a gap between it and getting search results.

In the use case of a search box with typeahead support, matcher will get you the suggestions to present to the user, but once the user selects one of the suggestions, you then have to call search with that selection to retrieve the actual relevant documents.

It would be great if matcher gave you back not only matching field strings but also the docID, so you could just call get instead of search. Would the right approach for this be for indexDoc in indexer.js to stuff the docID in with the reverseIndex key?

Option for case insensitive indexing/searching

I don't know if I have forgotten something, but I indexed a few documents, and when I tried to run a query it seemed to be case sensitive. If I look for "this" and a field contains "This is an example", search-index does not return it.

Is this the expected behaviour? Should I index all the documents in lowercase?
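
One common way to get case-insensitive matching is to lowercase text at both index time and query time. The sketch below only illustrates that idea with hypothetical helpers; it does not describe how search-index tokenizes internally.

```javascript
// Sketch only: normalise text to lowercase tokens split on whitespace.
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// A field matches when the lowercased query term is one of its tokens.
function fieldMatches(fieldValue, queryTerm) {
  return tokenize(fieldValue).indexOf(queryTerm.toLowerCase()) !== -1;
}
```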

Sort functionality for facets

Facets should be sortable (alphabetic, numeric, magnitude). There should also be a query vocabulary to express this.
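
A sketch of what sortable facets could mean in practice is shown below. The facet shape (`{ key, count }`) and the `sortBy` values are assumptions for illustration, not an existing query vocabulary.

```javascript
// Sketch only: sort facet buckets either by count (largest first)
// or alphabetically by key. Returns a new array; the input is untouched.
function sortFacets(facets, sortBy) {
  var sorted = facets.slice();
  if (sortBy === 'count') {
    sorted.sort(function (a, b) { return b.count - a.count; });
  } else {
    sorted.sort(function (a, b) { return a.key.localeCompare(b.key); });
  }
  return sorted;
}
```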

Take field length norm into account

This is a way of taking field length into account when talking about term frequency. Shorter fields are typically more meaningful, therefore terms appearing in shorter fields are given a higher value. Should be calculated on indexing.
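
One common formulation, used here purely as an assumption, scales term frequency by the inverse square root of the number of terms in the field, so shorter fields weigh more. The function names are hypothetical.

```javascript
// Sketch only: norm = 1 / sqrt(number of terms in the field).
function fieldLengthNorm(termCount) {
  if (termCount <= 0) return 0; // empty field contributes nothing
  return 1 / Math.sqrt(termCount);
}

// Term frequency scaled by the field length norm, computed at indexing time.
function normalizedTermWeight(termFrequency, termCount) {
  return termFrequency * fieldLengthNorm(termCount);
}
```

With this rule, a term that appears once in a 4-term title gets weight 0.5, while the same term appearing once in a 100-term body gets weight 0.1.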

Wildcard search

Users can return all docs in the index by using, say, an asterisk ('*') as the query term.

levelUP module configuration

Would you consider having search-index use the levelup module directly instead of via level? Along with some way to pass configuration options to the constructor, this would allow use of level-js so search-index would be usable in the browser as well as via Node.js. I'd be willing to work on a patch for this.

pageSize instead of pagesize

I believe the correct key for page size is pageSize, but readme.MD shows pagesize.
If using pagesize, totalHits shows the correct value but the hits array is empty
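
Until the readme is corrected, a caller could normalise the key before searching. The helper below is a hypothetical sketch on the caller's side, not part of search-index.

```javascript
// Sketch only: accept the misspelled `pagesize` key and map it to `pageSize`.
function normalizeQuery(query) {
  var out = Object.assign({}, query); // shallow copy, input is not mutated
  if (out.pageSize === undefined && out.pagesize !== undefined) {
    out.pageSize = out.pagesize;
  }
  delete out.pagesize;
  return out;
}
```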

cache functionality

Cache queries, so that results can be returned without performing an actual search.

Must be kept off of a multinode installation since old caches cannot be removed if the keys are hashed, and it is faster if it is local.
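
A local query cache could be sketched as a small in-memory LRU map keyed by the serialised query. The names and the eviction policy are assumptions for illustration; the issue does not specify either.

```javascript
// Sketch only: an in-memory LRU cache keyed by the JSON form of the query.
function createQueryCache(maxEntries) {
  var entries = new Map(); // Map preserves insertion order, oldest first
  return {
    get: function (query) {
      var key = JSON.stringify(query);
      if (!entries.has(key)) return undefined;
      var value = entries.get(key);
      entries.delete(key); // re-insert to mark as most recently used
      entries.set(key, value);
      return value;
    },
    set: function (query, results) {
      var key = JSON.stringify(query);
      entries.delete(key);
      entries.set(key, results);
      if (entries.size > maxEntries) {
        entries.delete(entries.keys().next().value); // evict the oldest entry
      }
    },
    clear: function () {
      entries.clear(); // e.g. after new documents are indexed
    }
  };
}
```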

si.del callback err is true?

Why is the err param of the si.del callback true when everything is OK?

In deleteDoc, wouldn't it be better to return callback(null, true) instead of callback(true)?
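
The Node.js error-first callback convention that this issue asks for can be sketched as follows. This `deleteDoc` is a standalone illustration using a Map as a stand-in store; it is not the actual function in search-index.

```javascript
// Sketch only: error-first callback. `store` is a Map used as a stand-in.
function deleteDoc(store, docID, callback) {
  if (!store.has(docID)) {
    // Failure: the first argument is an Error.
    return callback(new Error('document not found: ' + docID));
  }
  store.delete(docID);
  // Success: the first argument is null, the result follows.
  return callback(null, true);
}
```

With this shape, callers can rely on `if (err)` meaning that something actually went wrong.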

Phrase Search

Search for "a phrase bounded" by inverted commas. Could possibly be implemented using the magic of ngrams
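
The ngram idea can be illustrated with a small in-memory sketch: split the field into word ngrams of the same length as the phrase and look for an exact match. The helper names are hypothetical, and a real implementation would presumably store the ngrams in the index rather than compute them per query.

```javascript
// Sketch only: build word ngrams of length n from a token array.
function ngrams(tokens, n) {
  var out = [];
  for (var i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(' '));
  }
  return out;
}

// True when the words of `phrase` appear in `text` adjacent and in order.
function containsPhrase(text, phrase) {
  var phraseTokens = phrase.toLowerCase().split(/\s+/).filter(Boolean);
  if (phraseTokens.length === 0) return false;
  var textTokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  return ngrams(textTokens, phraseTokens.length).indexOf(phraseTokens.join(' ')) !== -1;
}
```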
