rnewson / couchdb-lucene
Enables full-text searching of CouchDB documents using Lucene
License: Apache License 2.0
When I enter documents containing utf8-chars like öäüß in a couchdb via Futon and query them afterwards with
curl http://127.0.0.1:5984/notes_development/_fti/lucene/by_title?q=pop*
I get the following error:
{"error":"ucs","reason":"{bad_utf8_character_code}"}
(Running under OS X 10.5.)
~ > java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)
-- Frank
I'm trying to get c-l 0.5 working on couchdb 0.10.1 on fedora core 5.3. I have c-l successfully running on a different system running 0.9 and c-l 0.4. Perhaps I'm doing something very obvious wrong here, but I can't find the same issue anywhere else.
This is the trace I'm getting, with one of the example methods:
It doesn't matter what I put in the index function for the _fti index, it keeps coming up with this error, even if I change the function to a mere 'return null;', it returns the same error. On 0.4, no problems whatsoever.
I'll downgrade to 0.4 for now, but I just wanted to check if perhaps anyone else has bumped into this issue with 0.5 as well.
The 'info' request in the python external hook should look like:
path = '/'.join(['', 'info', host, str(port)] + path)
EDIT: Removed inline patch
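For illustration, here is a minimal sketch of how that expression composes the request path. It assumes `path` is a list of URL path segments; the function name and the sample values are hypothetical and not taken from the hook script.

```python
def build_info_path(host, port, path):
    """Join host, port and the remaining path segments under /info."""
    # The leading empty string makes the joined result start with '/'.
    return '/'.join(['', 'info', host, str(port)] + list(path))


# Hypothetical sample values, for illustration only.
print(build_info_path('localhost', 5984, ['mydb', 'lucene', 'by_title']))
```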
I recently started getting this error which is preventing new document changes from being indexed. Any ideas??
Oh, and this is with couchdb 0.9.1 and couchdb-lucene 0.4.
[couchdb-lucene] ERROR Error updating index.
java.lang.ClassCastException: JSON keys must be strings.
at net.sf.json.JSONObject._fromJSONObject(JSONObject.java:1067)
at net.sf.json.JSONObject.fromObject(JSONObject.java:177)
at net.sf.json.JSONSerializer.toJSON(JSONSerializer.java:108)
at net.sf.json.JSONArray._processValue(JSONArray.java:2535)
at net.sf.json.JSONArray.processValue(JSONArray.java:2593)
at net.sf.json.JSONArray.addValue(JSONArray.java:2580)
at net.sf.json.JSONArray.element(JSONArray.java:1753)
at net.sf.json.JSONArray.fromObject(JSONArray.java:183)
at net.sf.json.JSONSerializer.toJSON(JSONSerializer.java:113)
at net.sf.json.JSONObject._processValue(JSONObject.java:2759)
at net.sf.json.JSONObject.processValue(JSONObject.java:2852)
at net.sf.json.JSONObject.element(JSONObject.java:1891)
at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:1175)
at net.sf.json.JSONObject.fromObject(JSONObject.java:181)
at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:370)
at net.sf.json.JSONArray._fromJSONTokener(JSONArray.java:1160)
at net.sf.json.JSONArray.fromObject(JSONArray.java:149)
at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:373)
at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:1147)
at net.sf.json.JSONObject._fromString(JSONObject.java:1337)
at net.sf.json.JSONObject.fromObject(JSONObject.java:187)
at net.sf.json.JSONObject.fromObject(JSONObject.java:156)
at com.github.rnewson.couchdb.lucene.Database.getAllDocsBySeq(Database.java:87)
at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:262)
at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:199)
at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:95)
at java.lang.Thread.run(Thread.java:619)
In the response headers, couchdb uses 'Content-Type', and couchdb-lucene uses 'content-type'.
According to section 4.2 of the HTTP/1.1 RFC (RFC 2616), header field names are case-insensitive, but the difference from CouchDB tripped up the library I'm using.
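On the client side, one way to avoid depending on header capitalisation is to compare names case-insensitively. A minimal sketch, assuming the response headers are available as a plain dict (the function name is hypothetical, and this is not how any particular HTTP library exposes headers):

```python
def get_header(headers, name):
    """Return the value of header `name`, ignoring case, or None if absent."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


# Both spellings resolve to the same value.
print(get_header({'content-type': 'application/json'}, 'Content-Type'))
```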
When inserting/creating a large number of new documents via CouchDB's bulk API, the Lucene index is not updated accordingly.
I'm having a crashing issue with CouchDB 0.10.0 and couchdb-lucene 0.4 on OS X 10.6.1.
Here's my local.ini: http://gist.github.com/231189
Here's the couch.log output: http://gist.github.com/231188
I see something that might be the issue:
[Tue, 10 Nov 2009 19:24:53 GMT] [debug] [<0.5768.0>] OS Proc: Unknown info: {#Port<0.4098>,
{data,{eol,<<"Error occurred during initialization of VM">>}}}
[Tue, 10 Nov 2009 19:24:53 GMT] [debug] [<0.5768.0>] OS Proc: Unknown info: {#Port<0.4098>,
{data,{eol,<<"Unable to load native library: libjava.jnilib">>}}}
...but I can't tell how to go about fixing and/or troubleshooting this further.
Any help very much appreciated. Thank you!
With an indexing function such as:
function(doc) { var ret=new Document(); ret.add(doc.name); return ret }
If doc.name is ever null, indexing fails, causing couchdb-lucene to no longer recognize that view as valid, which is difficult to diagnose. Also, CouchDB throws the error "first argument must be non-null." on startup.
Workaround:
function(doc) { var ret=new Document(); if (doc.name) { ret.add(doc.name); return ret } }
couchdb-lucene should ignore null fields rather than dying.
I have the following problem: I have a field of type string that I want to make a query on, say field:[00.0 TO 00.17]. Unfortunately, this doesn't work correctly, since CustomQueryParser assumes everything that looks like a Number will have been indexed as a number, which is too strong an assumption IMHO.
BTW, field:[-00.0 TO 00.17] works, because CustomQueryParser only recognizes nonnegative numbers (but that's a different issue).
Followed the instructions line by line. As soon as the java -index process begins, it pegs the CPUs, even if there are no databases/documents present in CouchDB.
OS: Leopard 10.5.7
CouchDB: 0.9.1
Java: 1.5.0_19
I have successfully built and run couchdb-lucene 0.5 (github HEAD), with couchdb 0.10.1 (via MacPorts) on OS X following the instructions.
But I don't get any indication that it is trying to index my databases. The JVM daemon starts, but the indexes folder is empty, couchdb.log shows no attempt from couchdb-lucene to do anything with the DBs, and no couchdb-lucene.log is created.
Even if I have more work to do to set up the index functions, I'd still expect to see some activity from couchdb-lucene to query couchdb to get a list of database, design docs, _changes, etc. Right?
I feel like I'm missing something simple, but lots of re-reading the docs and tweaking configs hasn't helped.
First of all, thanks for your work!
Here is a small patch to make the 0.4x branch work with CouchDB trunk, mostly because of the new "_changes" feature:
http://friendpaste.com/7933r3BlrJ4pJM3WBjanM6
Here is a Python test case for this patch:
http://friendpaste.com/6ChcO6yunH4jvi5c2sUfM
The patch passes this simple test case. I hope I haven't forgotten anything.
Regards,
xav
Hi,
Building the project with 'mvn' on OS X (Java 1.6 or 1.5) results in some failing tests (see gist below).
-- Frank
...
2009-12-23 13:21:35,489 INFO [localhost/5984/cgm/couchapp/by_name] Committed checkpoint at update_seq 357773
(used Ctrl+C to stop process)
[root@localhost couchdb-lucene]# bin/run
2009-12-23 14:38:31,722 INFO [Main] Index output goes to: /usr/lib/couchdb/couchdb-lucene/indexes
2009-12-23 14:38:31,789 INFO [Main] Accepting connections with SelectChannelConnector@localhost:5985
2009-12-23 14:39:10,750 INFO [localhost/5984/cgm/couchapp/by_name] Starting.
2009-12-23 14:39:22,220 INFO [localhost/5984/cgm/couchapp/by_name] Committed checkpoint at update_seq 209712
2009-12-23 14:39:32,643 INFO [localhost/5984/cgm/couchapp/by_name] Committed checkpoint at update_seq 211543
...
Using the most recent version of couchdb-lucene at the time of this posting, and couchdb 0.10.0
Hi,
I'm using couchdb-lucene 0.5 as of today (01-07-2010) and couchdb 0.10.
When couchdb-lucene is indexing my documents, during the first access of the view I'm getting a TypeError exception that's preventing the document from being indexed:
2010-01-07 23:45:08,859 WARN couchdb.lucene.ViewIndexer.localhost/5984/knownet/bag/knsearch 16bb3f66e8d4c88d11b4e5c092ba38ab caused TypeError: Cannot find default value for object. (unnamed script#37)
I modified the DocumentConverterTest.java class to use my documents and my functions, and the exception is still being thrown:
marcosvm@pepita:~/Servers/couchdb-lucene-0.5-SNAPSHOT $ jruby document_converter_test.rb
Loaded suite document_converter_test
Started
E
Finished in 0.258 seconds.
constructError' org/mozilla/javascript/ScriptRuntime.java:3632:in
constructError'typeError' org/mozilla/javascript/ScriptRuntime.java:3672:in
typeError1'getDefaultValue' org/mozilla/javascript/ScriptableObject.java:700:in
getDefaultValue'toString' org/mozilla/javascript/ScriptRuntime.java:3741:in
notFunctionError'getPropFunctionAndThisHelper' org/mozilla/javascript/ScriptRuntime.java:2214:in
getPropFunctionAndThis'_c0' org/mozilla/javascript/gen/single:-1:in
call'doTopCall' org/mozilla/javascript/ScriptRuntime.java:3065:in
doTopCall'call' com/github/rnewson/couchdb/lucene/DocumentConverter.java:59:in
convert'1 tests, 0 assertions, 0 failures, 1 errors
I'm not sure how to provide this default value required by Rhino or if it's an actual bug during the document conversion.
I put a copy of the function and one document here: http://gist.github.com/271912
Any help would be appreciated.
Thanks in advance,
Marcos
Trying to search a database that contains no documents (besides the couchdb-lucene design document) brings up the error "foo is not a valid view." This is misleading, as the view works properly when there are documents in the database.
I've been reinstalling my whole system. I got couchdb-lucene from git and CouchDB from svn, and I've created an index (exactly the same as the one in the documentation).
And I get:
[info] [<0.98.0>] 127.0.0.1 - - 'GET' /fisica-nist3/_all_docs_by_seq?startkey=0&limit=250&include_docs=true 404
[couchdb-lucene] WARN no rows found ({"error":"not_found","reason":"missing"}).
Has something changed in the latest CouchDB?
Occasionally the server will return:
HTTP/1.1 500 Internal Server Error
Server: CouchDB/0.10.0 (Erlang OTP/R12B)
Date: Wed, 23 Dec 2009 04:08:46 GMT
Content-Type: text/plain
Content-Length: 603
Traceback (most recent call last):
File "/usr/lib/couchdb/couchdb-lucene/couchdb-external-hook.py", line 41, in main
resp = respond(res, req, opts.local_host, opts.local_port)
File "/usr/lib/couchdb/couchdb-lucene/couchdb-external-hook.py", line 86, in respond
resp = res.getresponse()
File "/usr/lib/python2.4/httplib.py", line 872, in getresponse
response.begin()
File "/usr/lib/python2.4/httplib.py", line 336, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.4/httplib.py", line 300, in _read_status
raise BadStatusLine(line)
BadStatusLine
If I index with "index":"not_analyzed" I get into trouble when trying to search for these terms, as there is no way to bypass the analyzer when issuing search queries. The possibility to use a URI parameter like "analyzer=none" or "analyzer=null" etc. would be great. Or am I missing something?
Handling of list values in couchdb-lucene full-text search is broken.
Ref: see the "Index Everything" example at http://github.com/rnewson/couchdb-lucene/blob/master/README.md
Ideally, if the value of a key is an array, its contents should be indexed as a joined string.
Currently, the key is ignored and additional keys are created from the positions of the array items, which is undesirable.
For example, say we have a JSON data structure (DS) like this:
{'cars':['alto','mercedes','mahindra']}
The keys generated while indexing this DS are 0, 1 and 2, with the values alto, mercedes and mahindra respectively, while the "cars" key is ignored, which is clearly not what we want.
The expected behaviour is to generate a single key "cars" whose value is the array items joined with a comma delimiter, "alto,mercedes,mahindra", and then index that.
I hope it's clear!
thanks
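The expected behaviour described above can be sketched as a plain data transformation. This is only an illustration in Python of the requested mapping, not couchdb-lucene's index-function API; the function name and the `delimiter` parameter are hypothetical.

```python
def flatten_fields(doc, delimiter=","):
    """Map each key to one value, joining array items into a single string."""
    fields = {}
    for key, value in doc.items():
        if isinstance(value, (list, tuple)):
            # Keep the original key and join the items, rather than
            # emitting one key per array position (0, 1, 2, ...).
            fields[key] = delimiter.join(str(item) for item in value)
        else:
            fields[key] = value
    return fields


print(flatten_fields({'cars': ['alto', 'mercedes', 'mahindra']}))
```

With the document from the example, the result contains the single key "cars" with the value "alto,mercedes,mahindra".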
It would be nice to have the default Lucene operator configurable per query, or at least per design document. Imagine you have two applications served from the same CouchDB instance. In one you would like OR as the default and in the other one AND. Or maybe you have different query types...
I was just wondering if it is possible to integrate local lucene into couchdb-lucene. I have not looked deeper into l-l, but it sounds very interesting. local lucene is available here: http://sourceforge.net/projects/locallucene/
Maybe I'll try to integrate local lucene into c-l myself, when I find the mood to dig into Java ;)
Line 82 of the python external has "method" hardcoded. We should probably be testing for "verb" in req and use that if it's there, otherwise fall back to "method" for compatibility with older versions.
http://mail-archives.apache.org/mod_mbox/couchdb-commits/200912.mbox/%[email protected]%3E
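A minimal sketch of that fallback, assuming `req` is the request dict the external hook receives from CouchDB (the function name is hypothetical). It deliberately uses a plain `if` statement rather than a conditional expression, since conditional expressions are only available from Python 2.5 onwards.

```python
def request_method(req):
    """Prefer the newer "verb" key and fall back to the older "method" key."""
    if "verb" in req:
        return req["verb"]
    return req["method"]


print(request_method({"verb": "GET"}))     # newer request format
print(request_method({"method": "POST"}))  # older request format
```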
I'm using rev 9fa591c (master) and I'm getting some errors that make me think the log config option in the .ini file isn't being respected:
https://gist.github.com/a6d8c7c36581bb20b72c
Any help would be very much appreciated.
Lucene allows you to change the default operator for queries from OR to AND. It would be nice if this option could be utilized when making queries through couchdb.
Exception looks like:
java.io.IOException: Cannot create directory: /some/inaccessible/dir/lucene
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:175)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:139)
at com.github.rnewson.couchdb.lucene.Index.main(Index.java:324)
at com.github.rnewson.couchdb.lucene.Main.main(Main.java:32)
See [skarab/couchdb-lucene@e0ec731] for small patch to catch unhandled IOException.
As already mentioned / discussed in IRC, it would be really cool to be able to pause the indexer. This would be useful when inserting or updating a huge number of documents, to increase (write) performance. I thought of something like: a) stop the c-l indexer, b) insert/update a few tens of thousands of docs, c) restart the indexer.
Robert,
Is there a way to update a fulltext function without losing the current index while the new one is building? I need to rebuild an index that originally took a couple of days to build, and I can't lose my search while it's rebuilding.
I've tried adding a new design doc and renaming it to the original once the index was complete, but couch won't let me change the id of the doc. The only other thing I could think of doing was to create the new design doc and change my code to use it once it's complete (I'm hoping to avoid this if possible).
Thanks,
Dave
I have many documents like this:
{Doctype:UNITS} or
{Doctype:SOURCE}
I've looked in CouchDB and the document preserves the capitals. I've also looked at the Lucene index with Luke and the value still has the capitals. But if I try a search with the capitals, like this:
http://localhost:5984/fisica-nist3/_fti/lucene/by_subject/?q=Doctype:UNITS
I get no documents, and in the query echoed back in the output I can see that Doctype preserves the capitals but UNITS is lowercased:
{"q":"Doctype:units","etag":"122bb567222","view_sig":"76efdc8dfb9d98ed577f2b7640228de4","skip":0,"limit":25,"total_rows":0,"search_duration":1,"fetch_duration":0,"rows":[]}
Maybe something is changing it before the query is launched against Lucene?
When building couchdb-lucene with maven2 on Mac OS X 10.5.8 I got the following error:
Failure executing javac, but could not parse the error: An exception has occurred in the compiler (1.5.0_22). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
com.sun.tools.javac.code.Symbol$CompletionFailure: file net/jcip/annotations/GuardedBy.class not found
This is caused by a known issue in HttpClient that is only fixed in the 4.1alpha version.
https://issues.apache.org/jira/browse/HTTPCLIENT-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
The fix is to edit the pom.xml of couchdb-lucene and add jcip annotations to the dependencies by putting this in:
    <dependency>
      <groupId>net.jcip</groupId>
      <artifactId>jcip-annotations</artifactId>
      <version>1.0</version>
    </dependency>
After I did this it compiles fine. Probably once the new (non alpha) version of HttpClient is released this workaround won't be needed anymore.
Robert,
I'm still seeing the memory issue. It doesn't seem to be related to couchdb.lucene.ram, which I have set to 512MB. It does, however, seem to be related to the -Xms/-Xmx heap size settings: more heap means it runs for longer. Right now, I have set it up to build the Lucene view as we copy/update documents over. That seems to be working for now.
I'm not too familiar with how the JSON -> Rhino conversion happens, but it seems that the wrong thing is done when a javascript property exists but its value is null. It seems that in JsonToRhinoConverter, ScriptableObject.putProperty will place a converted JSONNull value when it should probably be deleting it (or maybe handling it differently). The current behavior results in unexpected strings such as "com.github.rnewson.couchdb.rhino.JsonToRhinoConverter..." to end up indexed instead of the fields we want.
I apologize if this is unclear, but I did put a workaround in for now that results in the behavior I expect, though I'm pretty sure I'm not doing this the right way :)
I'm trying to build from the v0.4 tag, and I'm getting the following from maven:
Running com.github.rnewson.couchdb.lucene.LanguageIdentifierTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 2.164 sec
Running com.github.rnewson.couchdb.lucene.RhinoTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.574 sec
Running com.github.rnewson.couchdb.lucene.IntegrationTest
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 52.468 sec <<< FAILURE!
Running com.github.rnewson.couchdb.lucene.LuceneTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.007 sec <<< FAILURE!
Running com.github.rnewson.couchdb.lucene.TikaTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.748 sec
Running org.apache.nutch.analysis.lang.LanguageIdentifierTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.507 sec
Results :
Tests in error:
longIndex(com.github.rnewson.couchdb.lucene.IntegrationTest)
index(com.github.rnewson.couchdb.lucene.IntegrationTest)
initializationError(com.github.rnewson.couchdb.lucene.LuceneTest)
Tests run: 28, Failures: 0, Errors: 3, Skipped: 1
If you want the surefire-reports files, I can email them to you...
Thanks,
Zach
CouchDB throws:
Invalid or corrupt jarfile /home/jri/couchdb-lucene-0.4-jar-with-dependencies.jar
[error] [<0.49.0>] OS Process died with status: 1
This is consistent with the result of an integrity check:
wget http://cloud.github.com/downloads/rnewson/couchdb-lucene/couchdb-lucene-0.4-jar-with-dependencies.jar.gz
gunzip couchdb-lucene-0.4-jar-with-dependencies.jar.gz
unzip -t couchdb-lucene-0.4-jar-with-dependencies.jar
results in
Archive: couchdb-lucene-0.4-jar-with-dependencies.jar
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
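The same integrity check can be scripted. A minimal sketch using Python's standard `zipfile` module (the function name is hypothetical), which returns False for a file like the one above that is not a readable zip archive:

```python
import zipfile


def jar_is_intact(path):
    """Return True if `path` is a zip/jar archive whose members pass their CRC check."""
    # is_zipfile() returns False when the end-of-central-directory
    # record cannot be found, e.g. for a truncated download.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        return archive.testzip() is None
```

Running it against the downloaded jar before pointing CouchDB at it would separate a broken download from a configuration problem.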
I'm using collectd's new curl_json plugin to pull couchdb stats and wanted to add some for couchdb-lucene. The plugin fails to parse the json returned by couchdb-lucene because the last_modified value is passed back as an integer that is too large. The actual failure is in yajl, the json parsing library used by the curl_json plugin.
The collectd config I was using:
<URL "http://localhost:5984/history/_fti">
  Instance "lucene"
  <Key "/doc_count">
    Type "gauge"
  </Key>
  <Key "/doc_del_count">
    Type "counter"
  </Key>
  <Key "/disk_size">
    Type "bytes"
  </Key>
</URL>
The error from collectd:
2009-10-06_18:31:41.38722 [2009-10-06 18:31:41] curl_json plugin: yajl_parse failed: parse error: integer overflow
2009-10-06_18:31:41.38723 roup","room"],"last_modified":1254853876000,"optimized":fals
2009-10-06_18:31:41.38724 (right here) ------^
The data returned from the curl:
{"current":true,"disk_size":4939405,"doc_count":42658,"doc_del_count":2,"fields":["body","group","room"],"last_modified":1254853876000,"optimized":false}
I noticed that couchdb returns its instance_start_time statistic as a string in json instead of an integer and was wondering if the same could be done for couchdb-lucene. For now, I'm just using a proxy script to do that conversion for me.
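Such a conversion could look roughly like the sketch below. It is not the proxy script mentioned above; the function name is hypothetical, and the 32-bit signed limit is an assumption about where the parser overflows, chosen only because 1254853876000 exceeds it.

```python
import json


def stringify_large_ints(value, limit=2**31 - 1):
    """Recursively replace integers larger than `limit` with strings."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so leave it untouched.
        return value
    if isinstance(value, int) and abs(value) > limit:
        return str(value)
    if isinstance(value, dict):
        return {k: stringify_large_ints(v, limit) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_large_ints(v, limit) for v in value]
    return value


raw = '{"doc_count":42658,"last_modified":1254853876000,"optimized":false}'
print(json.dumps(stringify_large_ints(json.loads(raw))))
```

Small counters such as doc_count stay numeric, so collectd can still read them as gauges or counters.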
Versions:
I'm getting the following error with the latest version of couchdb-lucene:
SyntaxError: invalid syntax
File "/usr/lib/couchdb/couchdb-lucene/couchdb-lucene-0.5-0.2/tools/couchdb-external-hook.py", line 82
method = req["method"] if "method" in req else req["verb"]
$ python -V
Python 2.4.3
Changing the line to this fixes the error:
method = req["verb"]
if "method" in req:
method = req["method"]
dave
A fulltext query with the "callback" parameter returns a (JSON-encoded?) double-quoted string. For callbacks to work, this must be a method invocation in JavaScript, without quotes, just as in normal CouchDB views with the "callback" parameter.
So instead of
"cb({"q":"default: ... "})"
return
cb({"q":"default:..."})
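To make the difference concrete, here is a small sketch of the intended wrapping, written in Python for illustration only (couchdb-lucene itself is Java, and the function name is hypothetical). The result object is JSON-encoded once and the callback name is wrapped around it as raw text. Encoding the wrapped text a second time is what would produce the quoted string.

```python
import json


def jsonp_body(callback, result):
    """Return the response body, wrapped in a callback call if one is given."""
    payload = json.dumps(result)
    if callback:
        # Concatenate as raw text; do not JSON-encode the wrapped string again.
        return callback + "(" + payload + ")"
    return payload


print(jsonp_body("cb", {"q": "default:foo"}))
```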
Currently, the "_design/" component isn't considered when parsing an _fti request.
This doesn't conform to standard couchdb URL conventions and seems to create a problem with multiple index functions in multiple design documents.
The problem at least exists when the index is queried, but I think the problem persists throughout, including places where c-l queries CouchDB.
When I enable couch lucene in my local.ini file and restart couchdb, couchdb itself works fine but every few seconds the lucene indexer crashes. Configuration info and logs included below.
-----INFO-----
Mac OS X 10.5.8 on powerpc
Apache CouchDB 0.10.0a799093
compiled couchdb-lucene with standard mac os x java version "1.5.0_19"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304)
Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing)
-----T E S T S-----
Running com.github.rnewson.couchdb.lucene.LanguageIdentifierTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 46.392 sec
Running com.github.rnewson.couchdb.lucene.RhinoTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.149 sec
Running com.github.rnewson.couchdb.lucene.TikaTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.351 sec
Running org.apache.nutch.analysis.lang.LanguageIdentifierTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.844 sec
Results :
Tests run: 25, Failures: 0, Errors: 0, Skipped: 1
local.ini = http://gist.github.com/169027
couch.log = http://gist.github.com/169025
crash report = http://gist.github.com/169028
I just upgraded to 0.4. When I launch CouchDB I get this from the indexer.
With other, smaller databases I get no problem.
I've tried deleting the index and trying again, checking the permissions of the files and all that stuff, and the problem is still there.
[info] [<0.59.0>] 127.0.0.1 - - 'GET' /fisica-nist3/_all_docs_by_seq?startkey=53250&limit=250&include_docs=true 200
[couchdb-lucene] WARN Exception while updating index.
java.io.FileNotFoundException: /usr/local/bin/lucene/_b.fnm (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:482)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:58)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:341)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:306)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:236)
at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:915)
at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:4339)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3579)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2748)
at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2683)
at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:239)
at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:96)
at java.lang.Thread.run(Thread.java:636)
[couchdb-lucene] INFO indexer stopped.
[info] [<0.59.0>] 127.0.0.1 - - 'GET' /fisica-nist3/_all_docs_by_seq?startkey=53500&limit=250&include_docs=true 200
CouchDB has added JSONP callback support in the latest version (0.10).
Adding this feature to couchdb-lucene entails modifying SearchRequest.java in the following way (untested):
SearchRequest constructor:
// Parse callback argument
this.callback = query.optString("callback");
execute method:
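// Before the end of the method, check whether to return the result callback-style.
// Compare string contents with equals(); != would compare object references.
if (!"".equals(callback)) {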
result.put("json", callback + "(" + json + ")");
} else {
result.put("json", json);
}
I'm finding that trying to figure out what's in the index is a PITA. I've been trying to debug whether view indexing ran and whether my function is working. It'd be nice to have access to a JSON blob that showed what 'sub-indexes' exist and the list of fields in each index, or similar. And perhaps things like the term count per field, similar to what Luke shows in its overview.
That is all.
I can't find the special _db field anywhere in the 0.5 code. Is it still there?
I want to add the database name of the source document when indexing. I can add an extra field for the database name to the documents, but that doesn't feel right from a programmer's point of view : )
The documents resulting from a Lucene query should be formattable using CouchDB's _list API.
For instance with a URI like:
/database/_fti/lucene/_list/listname/lucene_idx_name?q=
If I need to query multiple terms at the same time, one option is to concatenate them with "OR" and pass them as the "q" parameter via a GET request. This will give me all the hits I want, but the results lose track of which hit was returned for which term.
I am wondering if it's possible to submit multiple terms via POST (e.g. '{"queries": ["q1", "q2", ...]}') and return a list of hits in the order of the input queries. It's just like a POST request of '{"keys": ["key1", "key2", ...]}' to a CouchDB view URL for multiple queries.
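Until something like that exists, the ordering can be kept on the client by issuing one request per term. A minimal sketch, where `search` is a hypothetical callable that runs a single "q" query against the _fti URL and returns its parsed result (the stand-in below only fakes a response for demonstration):

```python
def search_many(queries, search):
    """Run each query separately and keep results aligned with the input order."""
    return [{"q": q, "result": search(q)} for q in queries]


def fake_search(q):
    # Stand-in for a real HTTP call; returns a made-up result shape.
    return {"total_rows": len(q)}


for entry in search_many(["q1", "longer query"], fake_search):
    print(entry["q"], entry["result"])
```

This costs one round trip per term, which is the overhead a server-side "queries" POST would avoid.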
I see duplicate results in the search, because of earlier revisions of the document. Is there a way to have couchdb-lucene only index the latest revision?
How do I boost the ranks for certain search fields? I know this is in lucene somewhere, but how do I access it in cdb-l?
Cheers
Rohit
I've tried to move to couchdb-lucene 0.5 and I get the following error. I have Rhino 1.7R1 and mozjs 1.9.0.
2009-10-02 09:08:18,511 [couchdb-lucene] INFO Indexing fisica-nist3/lucene/by_subject from scratch.
[info] [<0.98.0>] 127.0.0.1 - - 'GET' /fisica-nist3/_changes?since=0&limit=250&include_docs=true 200
2009-10-02 09:08:18,590 [couchdb-lucene] ERROR Error updating index.
java.lang.RuntimeException: Invalid object type: org.mozilla.javascript.NativeObject
at com.github.rnewson.couchdb.lucene.Rhino.map(Rhino.java:129)
at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:267)
at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:194)
at com.github.rnewson.couchdb.lucene.Index$Indexer.access$100(Index.java:51)
at com.github.rnewson.couchdb.lucene.Index.main(Index.java:347)
at com.github.rnewson.couchdb.lucene.Main.main(Main.java:33)
2009-10-02 09:08:19,070 [couchdb-lucene] INFO Committed changes to index (1 documents in index, 0 deletes).