couchdb-lucene's People

Contributors

adamlofts, anandology, artikh, cliffano, czue, darylyu, davisp, georgekankava, huijari, inator, janl, karmi, lucag, moonmaster9000, nesteffe, olleolleolle, rnewson, roger, smola, stefankoegl, streunerlein, vjt, xristy

couchdb-lucene's Issues

querying utf-8 documents fails

When I enter documents containing utf8-chars like öäüß in a couchdb via Futon and query them afterwards with

curl http://127.0.0.1:5984/notes_development/_fti/lucene/by_title?q=pop*

I get the following error:

{"error":"ucs","reason":"{bad_utf8_character_code}"}

Running under OS X 10.5:
~ > java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

-- Frank

0.5 compileFunction bug?

I'm trying to get c-l 0.5 working on couchdb 0.10.1 on fedora core 5.3. I have c-l successfully running on a different system running 0.9 and c-l 0.4. Perhaps I'm doing something very obvious wrong here, but I can't find the same issue anywhere else.

This is the trace I'm getting, with one of the example methods:

http://pastie.org/765409

It doesn't matter what I put in the index function for the _fti index, it keeps coming up with this error, even if I change the function to a mere 'return null;', it returns the same error. On 0.4, no problems whatsoever.

I'll downgrade to 0.4 for now, but I just wanted to check if perhaps anyone else has bumped into this issue with 0.5 as well.

java.lang.ClassCastException: JSON keys must be strings.

I recently started getting this error which is preventing new document changes from being indexed. Any ideas??

Oh, and this is with couchdb 0.9.1 and couchdb-lucene 0.4.

[couchdb-lucene] ERROR Error updating index.
java.lang.ClassCastException: JSON keys must be strings.
    at net.sf.json.JSONObject._fromJSONObject(JSONObject.java:1067)
    at net.sf.json.JSONObject.fromObject(JSONObject.java:177)
    at net.sf.json.JSONSerializer.toJSON(JSONSerializer.java:108)
    at net.sf.json.JSONArray._processValue(JSONArray.java:2535)
    at net.sf.json.JSONArray.processValue(JSONArray.java:2593)
    at net.sf.json.JSONArray.addValue(JSONArray.java:2580)
    at net.sf.json.JSONArray.element(JSONArray.java:1753)
    at net.sf.json.JSONArray.fromObject(JSONArray.java:183)
    at net.sf.json.JSONSerializer.toJSON(JSONSerializer.java:113)
    at net.sf.json.JSONObject._processValue(JSONObject.java:2759)
    at net.sf.json.JSONObject.processValue(JSONObject.java:2852)
    at net.sf.json.JSONObject.element(JSONObject.java:1891)
    at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:1175)
    at net.sf.json.JSONObject.fromObject(JSONObject.java:181)
    at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:370)
    at net.sf.json.JSONArray._fromJSONTokener(JSONArray.java:1160)
    at net.sf.json.JSONArray.fromObject(JSONArray.java:149)
    at net.sf.json.util.JSONTokener.nextValue(JSONTokener.java:373)
    at net.sf.json.JSONObject._fromJSONTokener(JSONObject.java:1147)
    at net.sf.json.JSONObject._fromString(JSONObject.java:1337)
    at net.sf.json.JSONObject.fromObject(JSONObject.java:187)
    at net.sf.json.JSONObject.fromObject(JSONObject.java:156)
    at com.github.rnewson.couchdb.lucene.Database.getAllDocsBySeq(Database.java:87)
    at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:262)
    at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:199)
    at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:95)
    at java.lang.Thread.run(Thread.java:619)

[Minor] Header labels could be properly cased

In the response headers, CouchDB uses 'Content-Type' and couchdb-lucene uses 'content-type'.
According to RFC 2616, section 4.2, header field names are case-insensitive, but the difference from CouchDB tripped up the library I'm using.
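
Until the casing is aligned, a client can sidestep the difference by comparing header names case-insensitively. The following is a minimal sketch, assuming the response headers are available as a plain dict; `get_header` is a hypothetical helper and not part of CouchDB or couchdb-lucene:

```python
def get_header(headers, name, default=None):
    """Return the value of header `name`, ignoring case in the header name."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


# CouchDB-style and couchdb-lucene-style casing resolve to the same value.
couchdb_headers = {"Content-Type": "application/json"}
lucene_headers = {"content-type": "application/json"}
same = get_header(couchdb_headers, "content-type") == get_header(lucene_headers, "Content-Type")
print(same)
```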

Crashes on OS X 10.6.1

I'm having a crashing issue with CouchDB 0.10.0 and couchdb-lucene 0.4 on OS X 10.6.1.

Here's my local.ini: http://gist.github.com/231189
Here's the couch.log output: http://gist.github.com/231188

I see something that might be the issue:

[Tue, 10 Nov 2009 19:24:53 GMT] [debug] [<0.5768.0>] OS Proc: Unknown info: {#Port<0.4098>,
{data,{eol,<<"Error occurred during initialization of VM">>}}}

[Tue, 10 Nov 2009 19:24:53 GMT] [debug] [<0.5768.0>] OS Proc: Unknown info: {#Port<0.4098>,
{data,{eol,<<"Unable to load native library: libjava.jnilib">>}}}

...but I can't tell how to go about fixing and/or troubleshooting this further.

Any help very much appreciated. Thank you!

Errors from adding a null field to an index are hard to diagnose.

With an indexing function such as:

function(doc) { var ret=new Document(); ret.add(doc.name); return ret }

If doc.name is ever null, indexing fails and couchdb-lucene no longer recognizes that view as valid, which is difficult to diagnose. CouchDB also throws the error "first argument must be non-null." on startup.

Workaround:

function(doc) { var ret=new Document(); if (doc.name) { ret.add(doc.name); return ret } }

couchdb-lucene should ignore null fields rather than dying.

NumberRangeQuery too eagerly created in couchdb-lucene's CustomQueryParser

I have the following problem: I have a field of type string that I want to make a query on, say field:[00.0 TO 00.17]. Unfortunately, this doesn't work correctly, since CustomQueryParser assumes everything that looks like a Number will have been indexed as a number, which is too strong an assumption IMHO.

BTW, field:[-00.0 TO 00.17] works, because CustomQueryParser only recognizes non-negative numbers (but that's a different issue).
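
The behaviour described in this issue can be illustrated with a toy classifier. This is only a sketch and not the actual CustomQueryParser code: the regular expression, the rule that both bounds must look numeric, and the names `looks_numeric` and `is_numeric_range` are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: matches non-negative integers and decimals only.
NON_NEGATIVE_NUMBER = re.compile(r"\d+(\.\d+)?")


def looks_numeric(term):
    """Return True if a range bound would be mistaken for a number."""
    return NON_NEGATIVE_NUMBER.fullmatch(term) is not None


def is_numeric_range(lower, upper):
    """Treat a range as numeric only if both bounds look numeric (assumed rule)."""
    return looks_numeric(lower) and looks_numeric(upper)


print(is_numeric_range("00.0", "00.17"))   # both bounds look like numbers
print(is_numeric_range("-00.0", "00.17"))  # the minus sign defeats the pattern
```

A parser that consulted the field's indexed type, rather than the shape of the bounds, would avoid misclassifying string fields such as the one above.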

couchdb-lucene pegs CPU on Leopard 10.5.7

Followed the instructions line by line. As soon as the java -index process begins, it pegs the CPUs, even if there are no databases/documents present in CouchDB.

OS: Leopard 10.5.7
CouchDB: 0.9.1
Java: 1.5.0_19

latest couchdb-lucene not indexing

I have successfully built and run couchdb-lucene 0.5 (github HEAD), with couchdb 0.10.1 (via MacPorts) on OS X following the instructions.

But I don't get any indication that it is trying to index my databases. The JVM daemon starts, but the indexes folder is empty, couchdb.log shows no attempt from couchdb-lucene to do anything with the DBs, and no couchdb-lucene.log is created.

Even if I have more work to do to set up the index functions, I'd still expect to see some activity from couchdb-lucene querying couchdb for a list of databases, design docs, _changes, etc. Right?

I feel like I'm missing something simple, but lots of re-reading the docs and tweaking configs hasn't helped.

Losing checkpoints after restarting bin/run

...
2009-12-23 13:21:35,489 INFO [localhost/5984/cgm/couchapp/by_name] Committed checkpoint at update_seq 357773
(used Ctrl+C to stop process)
[root@localhost couchdb-lucene]# bin/run
2009-12-23 14:38:31,722 INFO [Main] Index output goes to: /usr/lib/couchdb/couchdb-lucene/indexes
2009-12-23 14:38:31,789 INFO [Main] Accepting connections with SelectChannelConnector@localhost:5985
2009-12-23 14:39:10,750 INFO [localhost/5984/cgm/couchapp/by_name] Starting.
2009-12-23 14:39:22,220 INFO [localhost/5984/cgm/couchapp/by_name] Committed checkpoint at update_seq 209712
2009-12-23 14:39:32,643 INFO [localhost/5984/cgm/couchapp/by_name] Committed checkpoint at update_seq 211543
...

Using the most recent version of couchdb-lucene at the time of this posting, and couchdb 0.10.0

TypeError when creating lucene document

Hi,

I'm using couchdb-lucene 0.5 as of today (01-07-2010) and couchdb 0.10.

When couchdb-lucene is indexing my documents, during the first access of the view I get a TypeError exception that prevents the document from being indexed:

2010-01-07 23:45:08,859 WARN couchdb.lucene.ViewIndexer.localhost/5984/knownet/bag/knsearch 16bb3f66e8d4c88d11b4e5c092ba38ab caused TypeError: Cannot find default value for object. (unnamed script#37)

I modified the DocumentConverterTest.java class to use my documents and my functions, and the exception is still thrown:

marcosvm@pepita:~/Servers/couchdb-lucene-0.5-SNAPSHOT $ jruby document_converter_test.rb
Loaded suite document_converter_test
Started
E
Finished in 0.258 seconds.

  1. Error:
    test_document_conversion(DocumentConverterTest):
    NativeException: org.mozilla.javascript.EcmaError: TypeError: Cannot find default value for object. (single#49)
    org/mozilla/javascript/ScriptRuntime.java:3654:in `constructError'
    org/mozilla/javascript/ScriptRuntime.java:3632:in `constructError'
    org/mozilla/javascript/ScriptRuntime.java:3660:in `typeError'
    org/mozilla/javascript/ScriptRuntime.java:3672:in `typeError1'
    org/mozilla/javascript/ScriptableObject.java:781:in `getDefaultValue'
    org/mozilla/javascript/ScriptableObject.java:700:in `getDefaultValue'
    org/mozilla/javascript/ScriptRuntime.java:724:in `toString'
    org/mozilla/javascript/ScriptRuntime.java:3741:in `notFunctionError'
    org/mozilla/javascript/ScriptRuntime.java:2247:in `getPropFunctionAndThisHelper'
    org/mozilla/javascript/ScriptRuntime.java:2214:in `getPropFunctionAndThis'
    org/mozilla/javascript/gen/single:49:in `_c0'
    org/mozilla/javascript/gen/single:-1:in `call'
    org/mozilla/javascript/ContextFactory.java:398:in `doTopCall'
    org/mozilla/javascript/ScriptRuntime.java:3065:in `doTopCall'
    org/mozilla/javascript/gen/single:-1:in `call'
    com/github/rnewson/couchdb/lucene/DocumentConverter.java:59:in `convert'
    document_converter_test.rb:23:in `test_document_conversion'

1 tests, 0 assertions, 0 failures, 1 errors

I'm not sure how to provide this default value required by Rhino or if it's an actual bug during the document conversion.

I put a copy of the function and one document here: http://gist.github.com/271912

Any help would be appreciated.

Thanks in advance,
Marcos

Problem with couchdb-lucene 0.4 and couchdb 0.11.0b820580

I've been trying to reinstall my whole system. I got couchdb-lucene from git and couchdb from svn. I've made an index (exactly the same as the one in the documentation).
And I get:

[info] [<0.98.0>] 127.0.0.1 - - 'GET' /fisica-nist3/_all_docs_by_seq?startkey=0&limit=250&include_docs=true 404
[couchdb-lucene] WARN no rows found ({"error":"not_found","reason":"missing"}).

Perhaps something changed in the latest couchdb?

Occasional 500 errors

Occasionally the server will return:
HTTP/1.1 500 Internal Server Error
Server: CouchDB/0.10.0 (Erlang OTP/R12B)
Date: Wed, 23 Dec 2009 04:08:46 GMT
Content-Type: text/plain
Content-Length: 603

Traceback (most recent call last):
  File "/usr/lib/couchdb/couchdb-lucene/couchdb-external-hook.py", line 41, in main
    resp = respond(res, req, opts.local_host, opts.local_port)
  File "/usr/lib/couchdb/couchdb-lucene/couchdb-external-hook.py", line 86, in respond
    resp = res.getresponse()
  File "/usr/lib/python2.4/httplib.py", line 872, in getresponse
    response.begin()
  File "/usr/lib/python2.4/httplib.py", line 336, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.4/httplib.py", line 300, in _read_status
    raise BadStatusLine(line)
BadStatusLine
  • Using the most recent version of couchdb-lucene (at time of posting) and couchdb 0.10.0
  • The majority of the time the response is fine.
  • There is no load on the server.
  • bin/run does not report any errors.

Searching without using an analyzer not possible

If I index with "index":"not_analyzed", I get into trouble when trying to search for these terms, as there is no way to bypass the analyzer when issuing search queries. The ability to use a URI parameter like "analyzer=none" or "analyzer=null" would be great. Or am I missing something?

Handling list values In fulltext search couchdb-lucene is broken.

Reference: see the "Index Everything" example at http://github.com/rnewson/couchdb-lucene/blob/master/README.md

Ideally, if the value of a key is an array, its contents should be indexed as a joined string.
Currently, however, the key is ignored and additional keys are created from the positions of the array items, which is undesirable.

For example, let's say we have a JSON data structure (DS) like this:
{'cars':['alto','mercedes','mahindra']}
The keys generated while indexing this DS are 0, 1 and 2, with the corresponding values alto, mercedes and mahindra, and the "cars" key is ignored, which is clearly not what we want.

The expected behaviour is to generate a single key "cars" whose value is the array items joined with a comma delimiter, "alto,mercedes,mahindra", and then index that.
I hope it's clear!

thanks
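
The expected behaviour requested in this issue can be sketched as follows. This is an illustration in Python of the desired key/value mapping only, not couchdb-lucene's indexing code; `flatten_for_index` is a hypothetical name, and only top-level keys are handled.

```python
def flatten_for_index(doc, delimiter=","):
    """Map each top-level key to a single indexable string.

    List values are joined under their own key instead of being
    split into one field per array position.
    """
    fields = {}
    for key, value in doc.items():
        if isinstance(value, list):
            fields[key] = delimiter.join(str(item) for item in value)
        else:
            fields[key] = str(value)
    return fields


fields = flatten_for_index({"cars": ["alto", "mercedes", "mahindra"]})
print(fields)
```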

Make couchdb.lucene.operator a query-parameter

It would be nice to have the default Lucene operator configurable per query, or at least per design document. Imagine you have two applications served from the same couchdb instance. In one you want OR as the default and in the other AND. Or maybe you have different query types...

Indexer silently dies when FSDirectory cannot be created

Exception looks like:

java.io.IOException: Cannot create directory: /some/inaccessible/dir/lucene
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:175)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:139)
        at com.github.rnewson.couchdb.lucene.Index.main(Index.java:324)
        at com.github.rnewson.couchdb.lucene.Main.main(Main.java:32)

See [skarab/couchdb-lucene@e0ec731] for small patch to catch unhandled IOException.

Ability for pausing the indexer

As already mentioned and discussed in IRC, it would be really cool to be able to pause the indexer. This would be useful when inserting or updating a huge number of documents, to increase (write) performance. I thought of something like: a) stop the c-l indexer, b) insert/update a few tens of thousands of docs, c) restart the indexer.

updating an index function

Robert,

Is there a way to update a fulltext function without losing the current index while the new one is building? I need to rebuild an index that originally took a couple of days to build, and I can't lose my search while it's rebuilding.

I've tried adding a new design doc and renaming it to the original once the index was complete, but couch won't let me change the id of the doc. The only other thing I could think of doing was to create the new design doc and change my code to use it once it's complete (I'm hoping to avoid this if possible).

Thanks,
Dave

Case sensitive problem

I have many documents like this:
{Doctype:UNITS} or
{Doctype:SOURCE}

I've looked in CouchDB and the documents preserve the capitals. I also looked at the Lucene index with Luke and the values still have the capitals, but if I search with the capitals like this:

http://localhost:5984/fisica-nist3/_fti/lucene/by_subject/?q=Doctype:UNITS

I get no documents, and in the rewritten query in the output I can see that Doctype preserves its capitals but UNITS is lowercased:

{"q":"Doctype:units","etag":"122bb567222","view_sig":"76efdc8dfb9d98ed577f2b7640228de4","skip":0,"limit":25,"total_rows":0,"search_duration":1,"fetch_duration":0,"rows":[]}

Maybe something is changing it before the query is launched against Lucene?

Build couchdb-lucene: jcip-annotations error

When building couchdb-lucene with maven2 on Mac OS X 10.5.8 I got the following error:

Failure executing javac, but could not parse the error: An exception has occurred in the compiler (1.5.0_22). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
com.sun.tools.javac.code.Symbol$CompletionFailure: file net/jcip/annotations/GuardedBy.class not found

This is caused by a known issue in HttpClient that is only fixed in the 4.1alpha version.
https://issues.apache.org/jira/browse/HTTPCLIENT-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Fix is to edit the pom.xml of couchdb-lucene and add jcip annotations to the dependencies by putting this in:

<dependency>
  <groupId>net.jcip</groupId>
  <artifactId>jcip-annotations</artifactId>
  <version>1.0</version>
</dependency>

After I did this it compiles fine. Probably once the new (non alpha) version of HttpClient is released this workaround won't be needed anymore.

Memory Issue

Robert,

I'm still seeing the memory issue. It doesn't seem to be related to couchdb.lucene.ram, which I have set to 512MB. It does, however, seem to be related to the -Xms/-Xmx heap size settings: a larger heap means it runs for longer. Right now, I have set it up to build the lucene view as we copy/update documents over. That seems to be working for now.

JsonToRhinoConverter issue when null value specified in doc

I'm not too familiar with how the JSON -> Rhino conversion happens, but it seems that the wrong thing is done when a javascript property exists but its value is null. It seems that in JsonToRhinoConverter, ScriptableObject.putProperty will place a converted JSONNull value when it should probably be deleting it (or maybe handling it differently). The current behavior results in unexpected strings such as "com.github.rnewson.couchdb.rhino.JsonToRhinoConverter..." to end up indexed instead of the fields we want.

I apologize if this is unclear, but I did put a workaround in for now that results in the behavior I expect, though I'm pretty sure I'm not doing this the right way :)

http://gist.github.com/278496

Problems building v0.4

I'm trying to build from the v0.4 tag, and I'm getting the following from maven:


T E S T S

Running com.github.rnewson.couchdb.lucene.LanguageIdentifierTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 2.164 sec
Running com.github.rnewson.couchdb.lucene.RhinoTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.574 sec
Running com.github.rnewson.couchdb.lucene.IntegrationTest
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 52.468 sec <<< FAILURE!
Running com.github.rnewson.couchdb.lucene.LuceneTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.007 sec <<< FAILURE!
Running com.github.rnewson.couchdb.lucene.TikaTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.748 sec
Running org.apache.nutch.analysis.lang.LanguageIdentifierTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.507 sec

Results :

Tests in error:
longIndex(com.github.rnewson.couchdb.lucene.IntegrationTest)
index(com.github.rnewson.couchdb.lucene.IntegrationTest)
initializationError(com.github.rnewson.couchdb.lucene.LuceneTest)

Tests run: 28, Failures: 0, Errors: 3, Skipped: 1

If you want the surefire-reports files, I can email them to you...

Thanks,

Zach

jar file in downloads section is invalid

CouchDB throws:
Invalid or corrupt jarfile /home/jri/couchdb-lucene-0.4-jar-with-dependencies.jar
[error] [<0.49.0>] OS Process died with status: 1

This is consistent with an integrity check:
wget http://cloud.github.com/downloads/rnewson/couchdb-lucene/couchdb-lucene-0.4-jar-with-dependencies.jar.gz
gunzip couchdb-lucene-0.4-jar-with-dependencies.jar.gz
unzip -t couchdb-lucene-0.4-jar-with-dependencies.jar

results in
Archive: couchdb-lucene-0.4-jar-with-dependencies.jar
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
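
The same integrity check can be scripted before the jar is handed to CouchDB. The sketch below uses only Python's standard `zipfile` module; `is_valid_jar` and the throwaway file names are hypothetical, and the demonstration builds its own sample files rather than downloading anything.

```python
import os
import tempfile
import zipfile


def is_valid_jar(path):
    """Return True if `path` is a readable zip archive with no corrupt members."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first bad member, or None.
        return archive.testzip() is None


# Demonstration with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.jar")
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    bad = os.path.join(tmp, "bad.jar")
    with open(bad, "wb") as handle:
        handle.write(b"not a zip archive")
    good_ok = is_valid_jar(good)
    bad_ok = is_valid_jar(bad)

print(good_ok, bad_ok)
```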

last_modified values passed back as large integer in json

I'm using collectd's new curl_json plugin to pull couchdb stats and wanted to add some for couchdb-lucene. The plugin fails to parse the json returned by couchdb-lucene because the last_modified value is passed back as an integer that is too large. The actual failure is in yajl, the json parsing library used by the curl_json plugin.

The collectd config I was using:

<URL "http://localhost:5984/history/_fti">
  Instance "lucene"
  <Key "/doc_count">
    Type "gauge"
  </Key>
  <Key "/doc_del_count">
    Type "counter"
  </Key>
  <Key "/disk_size">
    Type "bytes"
  </Key>
</URL>

The error from collectd:

2009-10-06_18:31:41.38722 [2009-10-06 18:31:41] curl_json plugin: yajl_parse failed: parse error: integer overflow
2009-10-06_18:31:41.38723           roup","room"],"last_modified":1254853876000,"optimized":fals
2009-10-06_18:31:41.38724                      (right here) ------^

The data returned from the curl:
{"current":true,"disk_size":4939405,"doc_count":42658,"doc_del_count":2,"fields":["body","group","room"],"last_modified":1254853876000,"optimized":false}

I noticed that couchdb returns its instance_start_time statistic as a string in json instead of an integer and was wondering if the same could be done for couchdb-lucene. For now, I'm just using a proxy script to do that conversion for me.

Versions:

  • couchdb-lucene 0.4
  • couchdb 0.9.1
  • collectd 4.8.0
  • yajl trunk
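
A conversion of the kind described above might look like the following sketch. It is not the reporter's actual proxy script; it assumes the consumer can only handle 32-bit signed integers, and `stringify_large_ints` and `INT32_MAX` are names introduced here for illustration. The sample input is a shortened form of the response quoted in this issue.

```python
import json

# Assumed limit: largest value a 32-bit signed integer can hold.
INT32_MAX = 2**31 - 1


def stringify_large_ints(stats, limit=INT32_MAX):
    """Return a copy of `stats` with integers above `limit` turned into strings."""
    converted = {}
    for key, value in stats.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        converted[key] = str(value) if is_int and value > limit else value
    return converted


raw = '{"current":true,"disk_size":4939405,"doc_count":42658,"last_modified":1254853876000}'
result = stringify_large_ints(json.loads(raw))
print(json.dumps(result))
```

Only `last_modified` exceeds the assumed limit here, so the other counters stay numeric and remain usable as gauge or counter values.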

invalid syntax on line 82 of couchdb-external-hook.py

I'm getting the following error with the latest version of couchdb-lucene:

SyntaxError: invalid syntax
File "/usr/lib/couchdb/couchdb-lucene/couchdb-lucene-0.5-0.2/tools/couchdb-external-hook.py", line 82
method = req["method"] if "method" in req else req["verb"]

$ python -V
Python 2.4.3

Changing the line to this fixes the error:

method = req["verb"]
if "method" in req:
    method = req["method"]

dave
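
For context, conditional expressions (`x if cond else y`) were added in Python 2.5, which is why the line fails to parse on 2.4.3. A sketch of an equivalent helper that avoids the newer syntax is shown below; `request_method` is a hypothetical name and not part of couchdb-external-hook.py.

```python
def request_method(req):
    """Return the HTTP method from a request dict.

    Prefers the "method" key and falls back to "verb", without using
    the conditional expression syntax that Python 2.4 lacks.
    """
    if "method" in req:
        return req["method"]
    return req["verb"]


print(request_method({"verb": "GET"}))
print(request_method({"method": "POST", "verb": "GET"}))
```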

JSONP callback should not be returned in quotes

A Fulltext query with the "callback" parameter returns a (JSON-encoded ?) double-quoted string. For callbacks to work, this must be a method invocation in JavaScript, without quotes. Just as in normal CouchDB views with the "callback" Parameter.

So instead of
"cb({"q":"default: ... "})"

return
cb({"q":"default:..."})
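
The difference between the two outputs comes down to whether the wrapped string is JSON-encoded a second time. The sketch below illustrates the intended behaviour in Python; it is not couchdb-lucene's code, and `jsonp_body` is a hypothetical helper.

```python
import json


def jsonp_body(payload, callback=None):
    """Serialize `payload`, wrapping it in a callback call when one is given."""
    body = json.dumps(payload)
    if callback:
        # Concatenate after serializing; encoding the wrapped string again
        # would produce the quoted form described in this issue.
        return callback + "(" + body + ")"
    return body


correct = jsonp_body({"q": "default:foo"}, callback="cb")
double_encoded = json.dumps(correct)  # the problematic, quoted variant
print(correct)
print(double_encoded)
```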

Include "_design/" prefix in design doc pathname parser

Currently, the "_design/" component isn't considered when parsing an _fti request.

This doesn't conform to standard couchdb URL conventions and seems to create a problem with multiple index functions in multiple design documents.

crashes on Mac OS X 10.5.8

When I enable couch lucene in my local.ini file and restart couchdb, couchdb itself works fine but every few seconds the lucene indexer crashes. Configuration info and logs included below.

-----INFO-----
Mac OS X 10.5.8 on powerpc
Apache CouchDB 0.10.0a799093
compiled couchdb-lucene with standard mac os x java version "1.5.0_19"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_19-b02-304)
Java HotSpot(TM) Client VM (build 1.5.0_19-137, mixed mode, sharing)
-----T E S T S-----
Running com.github.rnewson.couchdb.lucene.LanguageIdentifierTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 46.392 sec
Running com.github.rnewson.couchdb.lucene.RhinoTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.149 sec
Running com.github.rnewson.couchdb.lucene.TikaTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.351 sec
Running org.apache.nutch.analysis.lang.LanguageIdentifierTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.844 sec
Results :
Tests run: 25, Failures: 0, Errors: 0, Skipped: 1

local.ini = http://gist.github.com/169027
couch.log = http://gist.github.com/169025
crash report = http://gist.github.com/169028

Problem with too much documents and 0.4 version

I just upgraded to 0.4. When I launch couchdb, I get this from the indexer.
With other, smaller databases I have no problem.
I've tried deleting the index and starting again, checking the permissions of the files and so on, and the problem is still there.

[info] [<0.59.0>] 127.0.0.1 - - 'GET' /fisica-nist3/_all_docs_by_seq?startkey=53250&limit=250&include_docs=true 200
[couchdb-lucene] WARN Exception while updating index.
java.io.FileNotFoundException: /usr/local/bin/lucene/_b.fnm (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:482)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:58)
at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:341)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:306)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:236)
at org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:915)
at org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:4339)
at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3579)
at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3450)
at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1638)
at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2748)
at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2683)
at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:239)
at com.github.rnewson.couchdb.lucene.Index$Indexer.run(Index.java:96)
at java.lang.Thread.run(Thread.java:636)
[couchdb-lucene] INFO indexer stopped.
[info] [<0.59.0>] 127.0.0.1 - - 'GET' /fisica-nist3/_all_docs_by_seq?startkey=53500&limit=250&include_docs=true 200

Support JSONP style callbacks [enhancement]

CouchDB has added JSONP callback support in the latest version (0.10).

Adding this feature to couchdb-lucene entails modifying SearchRequest.java in the following way (untested):

SearchRequest constructor:

// Parse callback argument
this.callback = query.optString("callback");

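execute method:
// Check whether to return the result callback-style, before the end of the method.
// optString returns "" when the parameter is absent; test the length,
// because != on strings compares references in Java, not contents.
if (callback.length() > 0) {
    result.put("json", callback + "(" + json + ")");
} else {
    result.put("json", json);
}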

Retrieve index statistics

I'm finding that trying to figure out what's in the index is a PITA. When debugging whether view indexing ran or whether my function is working, it would be nice to have access to a JSON blob that shows which 'sub-indexes' exist and the list of fields in each index, or similar. And perhaps things like the term count per field, similar to what Luke shows in its overview.

That is all.

missing special field _db

I can't find the special _db field anywhere in the 0.5 code. Is it still there?

I want to add the database name of the source document when indexing. I can add an extra field for the database name to the documents, but that doesn't feel right from a programmer's point of view : )

Support for couchdb _list functions

The documents resulting from a Lucene query should be formattable using CouchDB's _list API.

For instance with a URI like:

/database/_fti/lucene/**_list/listname/**lucene _idx_name?q=

Query multiple terms via POST?

If I need to query multiple terms at the same time, one option is to concatenate them with "OR" and pass them as the "q" parameter in a GET request. This gives me all the hits I want, but the results lose track of which hit was returned for which term.

I am wondering if it's possible to submit multiple terms via POST (e.g. '{"queries": ["q1", "q2", ...]}') and return a list of hits in the order of the input queries. It would work just like a POST of '{"keys": ["key1", "key2", ...]}' to a couchdb view URL for multiple queries.
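
In the absence of such an endpoint, a client can issue one GET per term and keep the results grouped by the originating query. The following sketch shows that client-side workaround, not the requested server feature; the base URL, `build_query_urls`, `search_all` and the `fetch` callable are all hypothetical, and a stand-in `fetch` is used so that no server is contacted.

```python
from urllib.parse import urlencode


def build_query_urls(base_url, queries):
    """Return (query, url) pairs, one per term, in the order given."""
    return [(q, base_url + "?" + urlencode({"q": q})) for q in queries]


def search_all(base_url, queries, fetch):
    """Run each query separately so hits stay grouped by originating query.

    `fetch` is any callable that takes a URL and returns the parsed result.
    """
    return [(q, fetch(url)) for q, url in build_query_urls(base_url, queries)]


def fake_fetch(url):
    """Stand-in for an HTTP GET; echoes the URL instead of contacting a server."""
    return {"requested": url}


base = "http://localhost:5984/db/_fti/ddoc/index"  # hypothetical index URL
results = search_all(base, ["q1", "q2"], fake_fetch)
for query, hits in results:
    print(query, hits)
```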

Multiple Revisions included in Search

I see duplicate results in the search, because of earlier revisions of the document. Is there a way to have couchdb-lucene only index the latest revision?

Problem with couchdb-lucene 0.5 and couchdb 0.11.0b820580

I've tried to move to couchdb-lucene 0.5 and I get the following error. I have Rhino 1.7R1 and mozjs 1.9.0.

2009-10-02 09:08:18,511 [couchdb-lucene] INFO Indexing fisica-nist3/lucene/by_subject from scratch.
[info] [<0.98.0>] 127.0.0.1 - - 'GET' /fisica-nist3/_changes?since=0&limit=250&include_docs=true 200
2009-10-02 09:08:18,590 [couchdb-lucene] ERROR Error updating index.
java.lang.RuntimeException: Invalid object type: org.mozilla.javascript.NativeObject
at com.github.rnewson.couchdb.lucene.Rhino.map(Rhino.java:129)
at com.github.rnewson.couchdb.lucene.Index$Indexer.updateDatabase(Index.java:267)
at com.github.rnewson.couchdb.lucene.Index$Indexer.updateIndex(Index.java:194)
at com.github.rnewson.couchdb.lucene.Index$Indexer.access$100(Index.java:51)
at com.github.rnewson.couchdb.lucene.Index.main(Index.java:347)
at com.github.rnewson.couchdb.lucene.Main.main(Main.java:33)
2009-10-02 09:08:19,070 [couchdb-lucene] INFO Committed changes to index (1 documents in index, 0 deletes).
