pouchdb-community / pouchdb-quick-search

Full-text search engine on top of PouchDB

License: Apache License 2.0

JavaScript 99.35% Shell 0.13% HTML 0.52%

pouchdb-quick-search's People

jfgirard rhiokim nihgwu akjoshi chandu0101 myl142857 szwork2013 casualuser wegewerk-mdt diamondio craftzdog tresiwald npow gdmuzzillo alpinetech rlugojr spicemix bruceyue blackforestboi jhiker shanfei mmpowers rafiyajaved datagenno hbcodexci schmaluk cremalab daviddea85 jonfk seantanly cristiang540 harbour-software valu-digital cah-ricksuggs d4tocchini mqckingbird nguyencongvu mikegreatorex mtlink mdaparte burak-koksal-jotform unoffices abwalters harryd31 coroleu curlycabbage antoinefotso anirudhk94 arc-archive freshy969 gstvg osmanorhan leonid-shevtsov ahaikal andaniel2029 trenta3 micueldev p12y diogoqueiros amadeuspagel cefn m0essy chxru curltech sharecode-id jbriales turbobit toolstation isabella232 garbados shresthabijay librocco irnc maciejlutostanski cyphercider baronrustamov femiblack mochi-cards

pouchdb-quick-search's Issues

Japanese tokenizer does not work

Hi,

I'm using this module with lunr.jp.js, but it doesn't work.
The problem is that lunr.jp.js replaces the original lunr.tokenizer with a Japanese tokenizer.

pouchdb-quick-search does not seem to use global.lunr.tokenizer, because it uses an internally initialized lunr that doesn't have the Japanese tokenizer.

I want to replace the tokenizer in pouchdb-quick-search with the Japanese one, but the plugin doesn't expose lunr.
It would be nice to add an option for passing in an arbitrary lunr instance.

Is it possible to build the index server-side and then sync it to the client?

I'm building a Magic: The Gathering search engine that will work both online and offline.

I have quite a lot of data for client-side search (about 16.9 MB of JSON), so it would be awesome if I could somehow build the index on the server and send the built version to the client.

Do you think this would be possible?

Index fields from nested array of objects?

Given a document like:

{
  "some_field": "foo",
  "meta": [
    {
      "nested": "First meta"
    },
    {
      "nested": "Second meta"
    }
  ]
}

How can I index the "First meta" and "Second meta" values? I tried meta.nested and meta[].nested but they do not seem to work.
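This thread does not confirm any fields syntax that reaches into arrays, so one workaround (a sketch, assuming you control how docs are saved) is to copy the nested values into a plain top-level string field and index that field instead. getPathValues, withFlattenedField and the "[]" path convention are hypothetical helpers for illustration, not part of pouchdb-quick-search.

```javascript
// Hypothetical helper (not part of pouchdb-quick-search): collect every value
// reachable through a path such as "meta[].nested", where a trailing "[]"
// means "descend into each element of this array".
function getPathValues(obj, path) {
  let current = [obj];
  for (const part of path.split('.')) {
    const isArray = part.endsWith('[]');
    const key = isArray ? part.slice(0, -2) : part;
    const next = [];
    for (const value of current) {
      if (value == null || typeof value !== 'object') continue;
      const child = value[key];
      if (child === undefined) continue;
      if (isArray && Array.isArray(child)) {
        next.push(...child);
      } else {
        next.push(child);
      }
    }
    current = next;
  }
  return current;
}

// Return a copy of the doc with the string values joined into one top-level
// field, which can then be indexed with a plain field name.
function withFlattenedField(doc, path, targetField) {
  const values = getPathValues(doc, path).filter(v => typeof v === 'string');
  return Object.assign({}, doc, { [targetField]: values.join(' ') });
}
```

For the doc above, withFlattenedField(doc, 'meta[].nested', 'meta_nested') adds meta_nested: "First meta Second meta", which could then be searched with fields: ['meta_nested']. The same helper accepts paths like 'persons[].lastname'.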

How to query two keys?

I want to query:
class == "book" && bookname == "pouch"

There is no example for this.
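The plugin has a filter option (it comes up in several issues below): a function that receives each doc and decides whether it is indexed. Assuming that option, one way to express the two conditions is to restrict the index to class === "book" with filter and put "pouch" in the full-text query over bookname. Note this is a full-text match on bookname, not an exact equality check. bookSearchOptions is a hypothetical helper; the snippet only builds the options object, so it runs without PouchDB.

```javascript
// Options for pouch.search(): the filter limits which docs get indexed,
// and the full-text query then runs against the 'bookname' field only.
function bookSearchOptions(bookname) {
  return {
    query: bookname,
    fields: ['bookname'],
    include_docs: true,
    // Only docs whose class is exactly "book" are indexed.
    // No captured variables, so the function's source fully describes it.
    filter: function (doc) {
      return doc.class === 'book';
    }
  };
}

// Usage (requires PouchDB with the plugin loaded):
// pouch.search(bookSearchOptions('pouch')).then(function (res) { /* ... */ });
```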

Why can't I get the correct result?

Below is the example. When I change the keyword to "me" or "it", I get no results. Why?

Thanks

(function () {
  var pouch;
  var doc = {_id: 'mydoc', title: "Guess  who?", text: "It's-a me, Mario!"};

  function log(str) {
    document.getElementById('display').innerHTML += str + '\n';
  }

  // destroy the db so you can see the document being put
  // each time you load the page. obviously you wouldn't want
  // to do this in production.
  new PouchDB('mydb').destroy().then(function () {
    pouch = new PouchDB('mydb');
  }).then(function () {
    log('putting doc: ' + JSON.stringify(doc));
    return pouch.put(doc);
  }).then(function () {
    var query = {
      query: 'it', // 'me' also returns nothing, while 'guess' works
      fields: ['title', 'text'],
      include_docs: true,
      highlighting: true
    };
    log('searching with query: ' + JSON.stringify(query));
    return pouch.search(query);
  }).then(function (res) {
    log('result: ' + JSON.stringify(res));
  }).catch(function (err) {
    log('error: ' + err);
  });
})();
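A likely explanation (not confirmed in this thread) is lunr's English stop-word filter: "it", "me" and "who" are all on lunr's English stop-word list, so those queries are reduced to no terms at all, while "guess" survives. The sketch imitates that step with whitespace tokenizing, edge-punctuation trimming and a small illustrative subset of the list; STOP_WORDS is not lunr's full list, and indexableTerms is a hypothetical name.

```javascript
// Illustrative subset of lunr's English stop words (not the full list).
const STOP_WORDS = new Set(['a', 'i', 'is', 'it', 'its', 'me', 'the', 'who']);

// Lowercase, split on whitespace, strip punctuation from both ends of each
// token (ASCII-only, fine for this English example), then drop stop words.
// Documents and queries go through the same steps.
function indexableTerms(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map(t => t.replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, ''))
    .filter(t => t.length > 0 && !STOP_WORDS.has(t));
}
```

With these rules the example doc's text yields only "it's-a" and "mario", and a query of "it" or "me" yields an empty term list, so there is nothing left to look up.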

Support for partial match for single token query

Full disclosure: I am a total noob when it comes to FTS and have never used lunr before.

From what I understand of the source code, the actual searching of terms from the query is done here:
https://github.com/nolanlawson/pouchdb-quick-search/blob/master/index.js#L174-L176

Wouldn't it be possible, if queryTerms.length is 1, to do a startkey/endkey search instead? That way, querying for "somethi" would return docs containing the term "something".

I imagine some adaptations would be needed in the rest of the search algorithm. If you think it might work and would merge a PR for it, I will work on it over the weekend.

Thanks
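The idea can be illustrated outside PouchDB: if the terms are kept sorted, the range [prefix, prefix + '\uffff'] (the same bounds one would pass as startkey/endkey) selects the terms that start with the prefix, ignoring the edge case of terms that themselves contain '\uffff'. This in-memory sketch uses binary search over a sorted array; lowerBound and prefixRange are hypothetical names, and a real index would get the same effect from the view's key range.

```javascript
// First index i in the sorted array such that arr[i] >= key.
function lowerBound(arr, key) {
  let lo = 0;
  let hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (arr[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// All terms in [prefix, prefix + '\uffff'), i.e. the terms that start with
// the prefix, mirroring a startkey/endkey range query.
function prefixRange(sortedTerms, prefix) {
  const start = lowerBound(sortedTerms, prefix);
  const end = lowerBound(sortedTerms, prefix + '\uffff');
  return sortedTerms.slice(start, end);
}
```

For example, prefixRange(['other', 'some', 'something', 'somewhere', 'zebra'], 'somethi') returns ['something'].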

Search with empty filter

How can I search with an empty filter in order to get all records?

Can search still work without tokenization or stemming?

If I don't care about tokenization or stemming, is it still possible to make search work for languages like Korean or Chinese? I thought some kind of simple string match would take place if tokenization and stemming are not used, but that doesn't seem to be the case: when I search for Korean or Chinese text, I don't get any results back.

Is it just a matter of adding a new language in lunr-languages?
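Chinese is written without spaces between words, and Korean attaches particles directly to words, so a whitespace tokenizer produces long tokens that a short query never equals. A common workaround, which is not something this plugin or lunr-languages is shown to do in this thread, is to index overlapping character bigrams and split the query the same way. cjkBigrams and matchesAllBigrams are hypothetical helpers.

```javascript
// Split text into overlapping two-character chunks ("bigrams") per
// whitespace-separated run, so a two-character query can match inside a
// longer run of Chinese or Korean characters.
function cjkBigrams(text) {
  const grams = [];
  for (const run of text.split(/\s+/)) {
    const chars = Array.from(run); // iterate by code point, not UTF-16 unit
    if (chars.length === 1) grams.push(chars[0]);
    for (let i = 0; i + 1 < chars.length; i++) {
      grams.push(chars[i] + chars[i + 1]);
    }
  }
  return grams;
}

// A document matches if every bigram of the query occurs among its bigrams.
function matchesAllBigrams(docText, query) {
  const docGrams = new Set(cjkBigrams(docText));
  return cjkBigrams(query).every(g => docGrams.has(g));
}
```

Bigram matching can produce false positives (all bigrams present but not adjacent), so a final substring check on the candidate docs is a reasonable addition.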

support requirejs setup

As the docs say, to set up this library in the browser (vs. in Node.js) we include two script tags: one for pouchdb and one for pouchdb-quick-search, in that order. The first script tag attaches PouchDB to the window object. The second script tag looks for window.PouchDB to register the plugin: window.PouchDB.plugin(exports);

RequireJS does not attach PouchDB to the window object. Consequently, pouchdb-quick-search cannot be registered as a plugin. Any suggestions?

automatically fill the db from a text file

Hi, I want to create a book reader using PouchDB.
Since the db is static (the book will not change in the future), filling it is a one-time operation.

The idea is to automatically fill the db by reading the text from a plain CSV file, where each line is a record and the fields are separated by a comma or a semicolon.

Can I connect to the db from a Linux shell?

I'm thinking of creating a bash script with something like:

# PouchDBinsertCommand stands for whatever command would insert one record
while IFS= read -r Line; do
  PouchDBinsertCommand "$Line"
done < book.csv

Does anyone have a solution for that?

Regards
MaX
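Instead of a shell loop, one option (a sketch, assuming Node.js and PouchDB's bulkDocs method) is to parse the file into an array of docs and insert them in a single call. Only the parsing is shown; it naively splits on "," or ";" and does not handle quoted fields, so text containing commas would be split too. csvToDocs, the "line_" id scheme and the field names are hypothetical.

```javascript
// Turn CSV text (one record per line, fields separated by ',' or ';')
// into plain objects ready for db.bulkDocs(). Quoted fields are NOT handled.
function csvToDocs(csvText, fieldNames) {
  return csvText
    .split(/\r?\n/)
    .filter(line => line.trim() !== '')
    .map((line, i) => {
      const values = line.split(/[;,]/).map(v => v.trim());
      // Zero-padded ids keep allDocs order identical to line order.
      const doc = { _id: 'line_' + String(i + 1).padStart(6, '0') };
      fieldNames.forEach((name, j) => {
        doc[name] = values[j] !== undefined ? values[j] : '';
      });
      return doc;
    });
}

// Usage in Node.js (not run here):
// const fs = require('fs');
// const docs = csvToDocs(fs.readFileSync('book.csv', 'utf8'), ['chapter', 'text']);
// db.bulkDocs(docs).then(function (result) { /* ... */ });
```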

how to setup this library with requirejs

To include this library as a plugin for PouchDB, the docs say:

var PouchDB = require('pouchdb');
PouchDB.plugin(require('pouchdb-quick-search'));

My question is: where does this require() function come from? Is it related to Node.js? (I don't know anything about Node.js.)

I am using RequireJS + Angular, and this is my attempt:

define([
  'angular',
  'pouchdb',
  'pouchdb_quick_search'
], function (angular, PouchDB, pouchdb_quick_search) {
  var mod = angular.module('service/db', []);

  mod.factory('service/db/get', function () {
    return function (db_name) {
      PouchDB.plugin(pouchdb_quick_search);
      return new PouchDB(db_name);
    };
  });
});

The PouchDB instance returned from 'service/db/get' has an undefined search method.

Thank you in advance.

Can I remove the stemmer and keep only the stop word function?

I would like to remove stemming, and possibly trimming, because I search for non-language words with Greek letters in them. Is that possible?
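The trimmer may matter as much as the stemmer here. Assuming the bundled lunr uses a \W-based trimmer (older lunr releases strip leading and trailing \W), Greek tokens suffer because JavaScript's \W treats every non-ASCII letter as a non-word character. The sketch contrasts such a trimmer with a Unicode-aware one; both functions are illustrative stand-ins, not lunr's code.

```javascript
// ASCII-style trimmer: \W matches anything outside [A-Za-z0-9_],
// so Greek letters at either end of a token are stripped.
function asciiTrimmer(token) {
  return token.replace(/^\W+/, '').replace(/\W+$/, '');
}

// Unicode-aware trimmer: only strip characters that are not letters (\p{L}),
// digits (\p{N}) or underscore. Needs the 'u' flag (ES2018+).
function unicodeTrimmer(token) {
  return token.replace(/^[^\p{L}\p{N}_]+|[^\p{L}\p{N}_]+$/gu, '');
}
```

For example, asciiTrimmer('αβγ') returns an empty string, while unicodeTrimmer('(αβγ)') returns 'αβγ'. Swapping either the trimmer or the stemmer means changing the lunr pipeline, which, as the Japanese tokenizer issue above notes, the plugin does not expose.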

Support for regex

I've tried to pass a regex (with 'igm' flags) without much success... I want to be able to get "Mario" results even if I search for "ario".

Any plan to support it?
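Because the index stores whole terms, a pattern like /ario/ has nothing in it to match against. For small data sets, one fallback is to scan the documents directly. This in-memory sketch assumes you already have the docs (for example from allDocs with include_docs: true); regexSearch is a hypothetical helper, and as a linear scan it will not scale the way the index does.

```javascript
// Return the docs where any of the given fields matches the regex.
// Linear scan over all docs; no index is used.
function regexSearch(docs, fields, pattern) {
  // Drop the 'g' flag: with it, RegExp.prototype.test() is stateful
  // (it advances lastIndex) and would skip matches on later calls.
  const re = new RegExp(pattern.source, pattern.flags.replace('g', ''));
  return docs.filter(doc =>
    fields.some(f => typeof doc[f] === 'string' && re.test(doc[f]))
  );
}
```

For example, regexSearch(docs, ['text'], /ario/igm) finds a doc whose text is "It's-a me, Mario!".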

Changing search algorithm to better suit numbers

I'm mainly searching across strings of numbers rather than English words, and the TF-IDF algorithm doesn't seem well suited to this. For example, if there is an entry with id = 123456 and I search for 3456, it doesn't show up as a result, even though no other document contains a 3456 string.

How would one go about changing the search algorithm to something else?
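Matching appears to happen on whole tokens, so 3456 cannot hit 123456 regardless of how results are scored. One alternative technique, different from lunr's TF-IDF approach and sketched here in memory with hypothetical names, is an n-gram index: store every 3-character substring of each value, look up the query's trigrams, keep the docs that contain all of them, and confirm with a real substring check.

```javascript
// All distinct 3-character substrings of a value (lowercased).
function trigrams(value) {
  const s = String(value).toLowerCase();
  const out = new Set();
  for (let i = 0; i + 3 <= s.length; i++) {
    out.add(s.slice(i, i + 3));
  }
  return out;
}

// Map each trigram to the set of doc ids whose field contains it.
function buildTrigramIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (typeof doc[field] !== 'string') continue;
    for (const g of trigrams(doc[field])) {
      if (!index.has(g)) index.set(g, new Set());
      index.get(g).add(doc._id);
    }
  }
  return index;
}

// Candidates must contain every trigram of the query; a final substring
// check removes false positives. Queries shorter than 3 characters return [].
function substringSearch(docs, field, index, query) {
  const grams = Array.from(trigrams(query));
  if (grams.length === 0) return [];
  let ids = new Set(index.get(grams[0]) || []);
  for (const g of grams.slice(1)) {
    const hit = index.get(g) || new Set();
    ids = new Set(Array.from(ids).filter(id => hit.has(id)));
  }
  const q = String(query).toLowerCase();
  return docs.filter(d =>
    ids.has(d._id) && typeof d[field] === 'string' && d[field].toLowerCase().includes(q));
}
```

In PouchDB terms, the same idea could be a map function that emits each trigram as a key; the in-memory version just keeps the sketch self-contained.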

Index object?

It seems that there is an index object identified by the search parameters. What do you think about making it explicit?

var index = pouch.searchIndex({ fields: ..., filter: ... })
index.search({ query: ..., include_docs: ...})

This way it would be easier to maintain multiple indexes, build them in advance, and reason about them.
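The proposal can be approximated with a thin wrapper that fixes the index-defining options (fields, filter, and so on) once and merges them into every query. searchIndex here is a hypothetical helper, not a plugin method; it relies only on db.search(opts) and on the build: true option that appears in other issues in this list.

```javascript
// Hypothetical wrapper: bind the options that identify an index
// (fields, filter, ...) so every query targets the same index.
function searchIndex(db, indexOpts) {
  const indexOptions = Object.assign({}, indexOpts); // shallow copy
  return {
    // Build the index ahead of time via the build: true option.
    build() {
      return db.search(Object.assign({}, indexOptions, { build: true }));
    },
    // Per-query options are merged first and index options last, so a
    // caller cannot accidentally switch to a different index.
    search(queryOpts) {
      return db.search(Object.assign({}, queryOpts, indexOptions));
    }
  };
}

// Usage (requires PouchDB with the plugin loaded):
// var index = searchIndex(pouch, { fields: ['title', 'text'] });
// index.build().then(function () { return index.search({ query: 'mario' }); });
```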

Possible filter cache issue

Hi, I'm running into an odd bug where the filter option for searching seems to be cached (at least I think that's what's happening).

I've managed to reproduce the issue with this code: http://bl.ocks.org/Darkle/9bcf54994859b53dc3da

The second search should return doc7, but it seems to have cached the first filter, even though domainToSearchFor has changed.

Also, I found that if I ran the code in the browser's dev console, did the searches manually, and then changed the code in the filter for the second search (even just adding a blank new line), it worked as expected and the filter was updated.
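The symptom fits an index that is identified by the filter function's source text rather than by its runtime behavior (an assumption; the thread does not confirm how the plugin names its indexes). Two closures created from the same code have identical toString() output even when the values they capture differ, which would also explain why adding a blank line "fixes" it. makeDomainFilter and indexKey are hypothetical names for illustration.

```javascript
// Two filters built from the same factory capture different values...
function makeDomainFilter(domainToSearchFor) {
  return function (doc) {
    return doc.domain === domainToSearchFor;
  };
}

const filterA = makeDomainFilter('example.com');
const filterB = makeDomainFilter('example.org');

// ...but their source text is identical, so a cache key derived from
// filter.toString() (a hypothetical scheme) cannot tell them apart.
function indexKey(fields, filter) {
  return JSON.stringify({ fields: fields, filter: filter ? filter.toString() : null });
}
```

If this is the cause, one workaround is a filter whose source does not depend on captured variables, for example writing the domain literally into each filter function.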

Add prefix search

I know I said in the readme that you don't need this for prefix search, but I realize now that you do, because we have the benefit of being case-insensitive.

is it possible to get all docs by passing empty query ?

I want to get all docs by passing an empty query "". Is that possible?
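Whether the plugin itself accepts an empty query is not answered here. A simple workaround is to branch before calling it and use PouchDB's allDocs with include_docs for the empty case. searchOrAll is a hypothetical helper, and the two calls return somewhat differently shaped rows, so callers may need to normalize them.

```javascript
// Hypothetical helper: run a full-text search, or list every document
// when the query is empty or only whitespace.
function searchOrAll(db, opts) {
  const query = typeof opts.query === 'string' ? opts.query.trim() : '';
  if (query === '') {
    // Note: allDocs also returns design docs (ids starting with "_design/").
    return db.allDocs({ include_docs: !!opts.include_docs });
  }
  return db.search(opts);
}
```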

Index fields from nested array of objects

Hello!
I'm having problems searching in a nested array with this doc structure:

{
  "_id": "1077",
  "_rev": "1-805dcb10756e24f3cf31b5eaf826a430",
  "updated": "0",
  "name": "Swegon",
  "persons": [
    {
      "firstname": "Henrik",
      "person_id": "2005",
      "lastname": "Bork",
      "updated": "1319029129"
    },
    {
      "firstname": "Ernst Børge",
      "person_id": "2006",
      "lastname": "Johansen",
      "updated": "0"
    },
    {
      "firstname": "Leif",
      "person_id": "2463",
      "lastname": "Hamrebø",
      "updated": "0"
    }
  ]
}

Search works with the first field of the array items, like firstname, but not with the others, like person_id or lastname.

Query:
var opts = {
  fields: ['persons[].lastname'],
  query: 'Johansen'
};

Confusion in the doc (instantiate/modify/index lunr)

I'm a little confused by the lunr implementation in this plugin.

Let me explain...

In the past, I've tried to add multi-language support without success (I'm not using RequireJS, but I am using Angular 2, and I'd like to avoid RequireJS if possible), and I don't really understand how to use pipelines (I'd like to remove accents when indexing). The problem is that I don't know how to follow the lunr tutorials, because this plugin's implementation is different and confusing...

It would be awesome if someone could give me an example of implementing a pipeline that modifies the index (like removing accents) and adding multi-language support (without using RequireJS).

We could even add these examples to the documentation...
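For the accent-removal part, the core is a plain function that lowercases a token and strips combining diacritical marks after Unicode NFD decomposition. This sketch covers only that function; how to register it in the plugin's lunr pipeline is exactly the open question here, so that part is left out. NFD does not decompose every letter (for example Vietnamese đ or Danish/Norwegian ø), hence the small explicit map, which is illustrative rather than complete; foldAccents is a hypothetical name.

```javascript
// Letters with no Unicode decomposition need explicit mapping (illustrative only).
const EXTRA_FOLDS = { 'đ': 'd', 'ł': 'l', 'ø': 'o', 'ß': 'ss' };

// Lowercase, decompose (NFD), drop combining marks in U+0300..U+036F,
// then apply the explicit folds for letters NFD leaves intact.
function foldAccents(token) {
  return token
    .toLowerCase()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .replace(/[đłøß]/g, ch => EXTRA_FOLDS[ch]);
}
```

Applied at both index time and query time, this would make "Børge" and "borge", or "Nguyễn" and "nguyen", produce the same term.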

Avoid requiring global `lunr` for lunr-languages

Apparently this is fixed on the lunr-languages side, so we don't need to require this anymore: MihaiValentin/lunr-languages#2.

Changes feed for search results?

Is it possible?

Unable to get language support

I'm a little confused about how to enable language support (in a web browser).

I've tried to add the following:

  var idx = lunr(function () {
         this.use(lunr.multiLanguage('en', 'fr'));
  });

But I get the following warning:

 Overwriting existing registered function: lunr-multi-trimmer-en-fr
 Function is not registered with pipeline. This may cause problems when serialising the index.

I tried different things, like adding lunr.Index.load(idx);, but I get another warning (version mismatch: current 0.7.1 importing undefined) and I can't even build the index...

Any suggestion would be really appreciated

Search only returns results for certain words

Somehow I cannot get search to find anything unless the query is "work" or "pattern". Other terms like "computer", "lion" and "dog" do not return any results. The test code is below. None of these terms are stop words, so I don't understand why no results are found.

var assert = require('assert');
var uuid = require('node-uuid');
var PouchDB = require('pouchdb');
PouchDB.plugin(require('pouchdb-quick-search'));

var text = 'lion';

var db = new PouchDB('test-db');
var obj = {
  _id: uuid.v4(),
  text: text
};

// Placeholder for a rebuild step; it currently does nothing.
function build(cb) {
  cb();
}

db.put(obj, function (err, res) {
  if (err) return console.log(err, res);
  db.get(obj._id, function (err, o) {
    assert(!err && o._id === obj._id);

    var options = {
      query: text,
      fields: ['text'],
    };

    db.search(options, function (err, result) {
      assert(!err);
      assert(result.total_rows > 0);

      build(function (err) {
        assert(!err);

        db.search(options, function (err, result) {
          assert(!err);
          assert(result.total_rows > 0);
        });
      });
    });
  });
});

Multiple languages support?

Is it possible? My database contains multiple languages. It would be nice to be able to search not just English.

search function is causing issue on Safari (OS X and iOS)

Search works very well for me in Chrome and Firefox. However, as soon as I call search in Safari (Mac OS X or iOS), I get this dialog box:

Allow this website to use space on your disk? The website “http://localhost:3000” is requesting 5 MB of disk space to store “_pouch_glossary-search-41aedf802eabccbf360b0ea56f611df1” as a database on your disk. Currently, this website is allowed to use 5 MB of disk space.

When I click "Allow", the dialog box goes away and then comes back again and again. This happens about 20 times, and when the dialog box finally goes away, the search index has not been created.

Here's the relevant code:

return db.search({
    fields: ['name', 'tags'],
    build: true
});

Query with utf-8

Hi.
I'm having problems with search. I hope somebody can help me!

With UTF-8:
I convert the search string to latin1 so that the search finds all docs that match it.

Example:

{query: "nguyễn",
fields: ["display_name"],
highlighting: true,
include_docs: true}

However, I have docs whose display_name is "nguyen" or "nguyễn", and I want to list all of them.

Like a LIKE query in MySQL:

Example:

{query: "%guye%",
fields: ["display_name"],
highlighting: true,
include_docs: true}

And I want to show all docs that contain "guye", for example: nguyễn, nguyen, nguye, guyen...

Thanks for reading!

Filter doesn't seem to work with build:true

Is it possible to create the index with a filter, so we can index before we query (for a faster "first" query)?

When I create it, I get the ok response, but when I run the first query, it still takes a lot of time.

Rollup compatibility issue

As discussed in the issue:
ionic-team/ionic-framework#8356

This library has a compatibility issue with Rollup because of a 'strict error' in the "md5-jkmyers" library, which no longer seems to be maintained.

Knowing that, could you switch to the library PouchDB itself uses, js-spark-md5? Alternatively, you could fork the "md5-jkmyers" repository and apply the bug fixes there.

Thank you.

Not able to search

Hi,
I am using PouchDB on Node.js with this plugin, but I'm not able to search because of the problem described below...
var PouchDB = require('pouchdb');
PouchDB.plugin(require('pouchdb-quick-search'));
var db = new PouchDB('http://127.0.0.1:5984/users');
db.search({
  query: 'maria',
  fields: ['name']
}).then(function (result) {
  // handle results
  console.log(result, "========");
  res.status(200).send(result);
}).catch(function (err) {
  // handle error
  console.log(err, "========");
  res.status(500).send(err);
});
The output of the above code is:
{
  "code": "ESOCKETTIMEDOUT",
  "connect": false,
  "status": 500
}
I then checked the pouchdb-server logs and found:

Please double-check your map/reduce function.
ReferenceError: isFiltered is not defined
at evalmachine.:3:9
at /usr/local/lib/node_modules/pouchdb-server/node_modules/pouchdb-mapreduce/lib/index.js:147:14
at tryMap (/usr/local/lib/node_modules/pouchdb-server/node_modules/pouchdb-abstract-mapreduce/lib/index.js:166:7)
at createDocIdsToChangesAndEmits (/usr/local/lib/node_modules/pouchdb-server/node_modules/pouchdb-abstract-mapreduce/lib/index.js:589:13)
at processBatch (/usr/local/lib/node_modules/pouchdb-server/node_modules/pouchdb-abstract-mapreduce/lib/index.js:572:37)
at process._tickCallback (internal/process/next_tick.js:103:7)

What am I doing wrong? I have more than 30k users in the database.
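The ReferenceError comes from the view being evaluated on the server. A plausible reading (not confirmed in the thread) is that with a remote database the map function is sent to CouchDB/pouchdb-server as source text, and any helper it reaches through a closure does not exist there. The sketch reproduces that failure mode in plain JavaScript; isFiltered and the map function are stand-ins, not the plugin's actual code, and emit is passed in as a parameter only to keep the example self-contained.

```javascript
// A helper that the map function can see only through its closure.
function makeMapFunction() {
  function isFiltered(doc) {
    return doc.type !== 'user';
  }
  return function (doc, emit) {
    if (!isFiltered(doc)) {
      emit(doc.name);
    }
  };
}

// Locally, the closure works.
function runLocally(doc) {
  const emitted = [];
  makeMapFunction()(doc, key => emitted.push(key));
  return emitted;
}

// A server only receives the source text and re-evaluates it in a fresh
// scope, where isFiltered does not exist, so calling it throws.
function runFromSource(doc) {
  const source = makeMapFunction().toString();
  const fn = new Function('return (' + source + ');')();
  const emitted = [];
  fn(doc, key => emitted.push(key));
  return emitted;
}
```

If that is the cause, it matches the "Support Http Pouch on Node" issue below: the index would need to be built in a local PouchDB (for example a replicated copy) rather than through the HTTP adapter.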

Cannot read property 'Promise' of undefined

Running this in Chrome with PouchDB 5.4.4, using the latest pouchdb-quick-search dist build.

How can I search by '_id'?

Like below:

    db.search({
      query: 'foo',
      fields: ['_id'],
      include_docs: true
    });

Support Http Pouch on Node

Hi,

I have been looking for a while for a full-text index that works both in the browser (offline) and with CouchDB. My current setup uses a PostgreSQL cache with a full-text index on the server and a primitive search using a simple map function in Pouch.

Your solution using lunr is much better. I would use it on the server too, but I don't want to duplicate my data from CouchDB to PouchDB (it can be very large, and that is why I want to get rid of the PG index).

I did some tests and managed to create a CouchDB map/reduce view using CommonJS (in my fork: https://github.com/jfgirard/pouchdb-quick-search). I added the required libs (lunr + stemmerSupport, lunr-LANG if needed) with a tweaked version of your map function. All 35 tests pass (with the missing stale option added, pouchdb/mapreduce#197) using TEST_DB=http://localhost:5984/quick-search.

Is this something you want to add to your code?
If so, I can make a PR... But I had to make some changes to hook in the code that adds/removes the design documents in CouchDB. With your help, it could be done better. Also, it only works with Pouch in Node.js and reads the libs from the couchdb_libs folder.

Jeff

Partial word matching?

Great plugin. Just curious whether partial word matching is available.

For example, a user searches for:
mega
and the document title (not _id) is:
megaman

I know I can handle this with a secondary index; I'm just curious whether this plugin supports it.

Peer dependency needs update to PouchDB 3.0

Alternatively, the whole peer dependency could be removed.

npm ERR! peerinvalid Peer [email protected] wants pouchdb@>= 2.2.0

Issues with promises

I'm having trouble: the promise always stays pending:
var promise = db.search({query:'java',fields: [ 'value.volumeInfo.title', 'value.volumeInfo.subtitle', 'value.volumeInfo.publisher', 'value.volumeInfo.authors', 'value.volumeInfo.publishedDate' ]});
and I get the following value for promise:
Promise {cancel: function, [[PromiseStatus]]: "pending", [[PromiseValue]]: undefined}

Any idea?

Won't persist data when run in IE 11

PouchDB Version 5.4.4

When developing and testing on my desktop computer with IE 11 on Windows 10 (sorry, I'm a Visual Studio guy, and I find it convenient to build and test in IE; it's a work thing), the PouchDB database fails to persist data whenever the app shuts down. When the app restarts, the data is gone. Also, PouchDB frequently throws an error saying the database has been corrupted. When that happens, the database can no longer be used, and you must destroy it and create a new one to recover.

Need to search terms with wildcards on both ends

When I type "test", it should match all of:

1. "hey test"
2. "test 123"
3. "notest"
4. "testyes"

2 and 4 work fine with the following code, but 1 and 3 don't. Could anyone please help?

function searchPages(searchTerm) {
        var deferred = $q.defer();

        pouchdb.query(searchMap, {
            startkey     : searchTerm,
            endkey       : searchTerm + '\uFFFF',
            include_docs : true
        }).then(function (result) {

            console.log(result);

            var results = result.rows.map(function(r) {
                 return r.doc;
            });

            deferred.resolve(results);

        }).catch(function (err) {
            // handle errors
            deferred.reject(err);
        });

        function searchMap(doc) {
            if (doc.type === 'page') {
                emit(doc.content);
            }
        }

        return deferred.promise;
    }

P.S. I've already gone through the comments in #8.

I then decided to use map/reduce, so this issue might not be directly related to this plugin, but I still wanted to give it a try :)
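With a startkey/endkey prefix range, only keys that begin with the search term can match, which is why "test 123" and "testyes" work, while "hey test" (emitted as one whole string) and "notest" do not. One common fix, sketched here as an assumption rather than something this plugin does, is to have the map function emit every suffix of every word: "notest" emits "notest", "otest", "test", and so on, so the prefix range for "test" now finds it. The trade-off is a much larger index, and a doc can match several times, so results need de-duplicating. suffixMap and uniqueDocs are hypothetical names.

```javascript
// Map function for pouchdb.query(): emit every suffix of every word, so a
// prefix range (startkey/endkey) matches the term anywhere inside a word.
// emit() is provided by PouchDB when this runs as a map function.
function suffixMap(doc) {
  if (doc.type === 'page' && typeof doc.content === 'string') {
    var words = doc.content.toLowerCase().split(/\s+/);
    for (var w = 0; w < words.length; w++) {
      for (var i = 0; i < words[w].length; i++) {
        emit(words[w].substring(i));
      }
    }
  }
}

// Rows for the same doc can appear several times; keep one per doc id.
function uniqueDocs(rows) {
  var seen = {};
  return rows.filter(function (r) {
    if (seen[r.id]) return false;
    seen[r.id] = true;
    return true;
  }).map(function (r) {
    return r.doc;
  });
}
```

It would be queried with the same startkey/endkey range as before (lowercasing the search term first), passing result.rows through uniqueDocs instead of the plain map.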

I would like Portuguese support.

I'm asking because you said we could ask for support for other languages.

There is info on this here: olivernn/lunr.js#16
And implementations for lunr here: https://github.com/MihaiValentin/lunr-languages

But you probably know that.

arabic support?

Hi,
does this plugin support Arabic search?
Also,
can I use it instead of the find plugin? What's the main difference between them?

Rebase on latest map/reduce

I'm using a map/reduce fork for performance reasons, but I should update because it's had some changes recently.

How to Split words on Hyphen ?

I noticed a difference in the way words are tokenized compared to PostgreSQL.

select to_tsvector('Pseudo-Mercator');
"'mercat':3 'pseudo':2 'pseudo-merc':1"

Basically, PG indexes both "Pseudo-Mercator" and the subwords "Pseudo" and "Mercator".

Searching for "mercator" gives me a result with PG.

But because lunr only tokenizes on whitespace, a search for "mercator" won't work.

I could create an afterTokenizer function that splits each token on hyphens and adds the parts to the list.

function afterTokenizer(tokens) {
  var split;
  tokens.forEach(function (token) {
    split = token.split(/-/g);
    if (split.length > 1) {
      tokens = tokens.concat(split);
    }
  });
  return tokens;
}

So index.pipeline.run(lunr.tokenizer(text)); would become index.pipeline.run(afterTokenizer(lunr.tokenizer(text)));

Is this the best way to achieve the same behavior?

Filter Document Option

I need a way to filter the documents included in the index. For example, documents with the property trashed set to true would be ignored. Another example is indexing only docs of a specific type (when multiple doc types share the same property, such as name or title).

I want to avoid applying the filter to the results, since that makes queries with limit and skip more complicated and less efficient.

It would also be faster to build the index if it contains only the docs I want to search.

With CouchDB-Lucene, the "fulltext" function defined in the design document lets me filter which docs to index.

This is my attempt to achieve it: jfgirard@cd1a926

It relies on an "evil" new Function call, though.

Is this something you want?

angular-js search

I have the search in my AngularJS controller, and I get the results fine.
However, my Angular template does not receive the results.

I am updating the scope variable in the .then() function.

Is there another event I have to use?

Fuzzy search?

Is it possible?

TypeError: pouch.search is not a function

I'm trying to use this plugin configured with webpack.

I'm sure pouchdb.js and this plugin are loaded successfully, and I followed your instructions,

but I get the error "TypeError: pouch.search is not a function".
Is anyone using this plugin successfully with webpack?

Search returns nothing when the searchable text is HTML

Hi there!

In my database, the value of the "content" property is wrapped in HTML, e.g.

Test Page 1

this is page 1 to test any search function

The doc structure is very simple: _id, displayName and content.
The search is based on the field "content".

The search does not return any results. If I remove all HTML tags, the search works.

Do you have any ideas on how to make my database searchable?

Regards
Carsten
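One possible cause is that the markup is never removed before tokenizing, so words glued to tags (for example "<h1>Test") end up as tokens that a plain query such as "test" never equals. A workaround is to store a tag-free copy of the text in a separate field and search that field instead. The regex-based stripHtml below is a rough sketch (it does not handle comments, scripts, or ">" inside attribute values), a real HTML parser is safer, and the contentText field name is hypothetical.

```javascript
// Rough tag stripper: replace each tag with a space so words on either side
// stay separate, decode a few common entities, and collapse whitespace.
function stripHtml(html) {
  return html
    .replace(/<[^>]*>/g, ' ')
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, '&') // last, so "&amp;lt;" is not double-decoded
    .replace(/\s+/g, ' ')
    .trim();
}

// Keep the original HTML and add a plain-text copy to index,
// e.g. search with fields: ['contentText'].
function withPlainText(doc) {
  return Object.assign({}, doc, { contentText: stripHtml(doc.content || '') });
}
```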