stratio / cassandra-lucene-index
Lucene based secondary indexes for Cassandra
License: Apache License 2.0
This is probably not an issue but a question.
I have a script which bootstraps the data model and inserts some data. The inserts have some hefty logic (they insert > search > update) and run in parallel. (Just to mention, searching is using the lucene index.)
I used to run it on a single node and recently added one more to form a 2-node cluster. I'm experiencing timeouts, and I tried both the old stratio-cassandra-2.1.5.0 and the new cassandra-2.1.9 + cassandra-lucene-index-2.1.9.0. The database nodes and custom indexes were set up with pretty much the default settings from the documentation.
After some investigation, it seemed that the cluster could not keep up with that concurrency, which was on the order of tens of parallel operations. That seems pretty strange, given that the database is benchmarked way above that.
To conclude, I'm left with a few questions:
Hi,
Is there any resource that details how the indexes are managed by cassandra-lucene-index, i.e. the actions that the library takes during refresh?
Specifically, I am looking at the behaviour when my indexed map column is updated during the refresh period, and whether there are any limitations on the number of updates, read repairs and such.
Also, when I compared the performance of my UPSERT on the indexed column between the C* secondary index and the cassandra-lucene-index, I noticed as much as 20% performance degradation for my UPSERTs. Of course, the C* secondary index is not an option for us, but would you know how index management differs between the two? I could understand the cost difference if it were during reads (because we are allowing much more there), but I could not figure out why that would be the case for my UPSERT.
Thanks for reading.
It is mentioned that Cassandra Lucene Index is compatible with Spark and Hadoop. Can you please provide any Spark SQL or Hive SQL examples for creating a Cassandra Lucene index in Spark and Hadoop? Thank you.
A search returns only the rows, not the number of matched rows. For example, a query might match 1000 rows while we only need 10 of them returned; we would then page by advancing an offset by 10 rows each time, as in ElasticSearch and Solr.
When I use the current release with Cassandra 3.0, I get this error. It works OK with 2.1.11.
error:unable to find custom indexer class 'com.stratio.cassandra.lucene.Index'
Will a release with Cassandra 3.0 support come soon?
Is there anything I can help with here?
Best Regards
Running com.stratio.cassandra.lucene.schema.mapping.BitemporalMapperTest
Tests run: 85, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.92 sec <<< FAILURE! - in com.stratio.cassandra.lucene.schema.mapping.BitemporalMapperTest
testToString(com.stratio.cassandra.lucene.schema.mapping.BitemporalMapperTest) Time elapsed: 0.011 sec <<< FAILURE!
org.junit.ComparisonFailure: expected:<.../dd, nowValue=176644[44]00000}> but was:<.../dd, nowValue=176644[08]00000}>
at org.junit.Assert.assertEquals(Assert.java:123)
at org.junit.Assert.assertEquals(Assert.java:145)
at com.stratio.cassandra.lucene.schema.mapping.BitemporalMapperTest.testToString(BitemporalMapperTest.java:1205)
I think it's related to time zone.
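One way to check the time zone theory is to decode the two nowValue numbers from the failure message. The sketch below is only an illustration: the class name and the UTC+1 / UTC+2 offsets are assumptions, not something taken from the test itself. It shows that the two values differ by exactly one hour and that each is midnight of the same calendar day under a different offset.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper, not part of cassandra-lucene-index.
final class NowValueDelta {
    // Values copied from the ComparisonFailure message above.
    static final long EXPECTED = 1766444400000L;
    static final long ACTUAL = 1766440800000L;

    // Difference between the expected and the actual epoch millis.
    static Duration delta() {
        return Duration.ofMillis(EXPECTED - ACTUAL);
    }

    // Local wall-clock time of an epoch value at a fixed UTC offset (assumed offsets).
    static String wallClock(long epochMillis, int offsetHours) {
        return Instant.ofEpochMilli(epochMillis)
                .atOffset(ZoneOffset.ofHours(offsetHours))
                .toLocalDateTime()
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(delta());                // PT1H
        System.out.println(wallClock(EXPECTED, 1)); // 2025-12-23T00:00
        System.out.println(wallClock(ACTUAL, 2));   // 2025-12-23T00:00
    }
}
```

Both values map to the same local midnight, which is consistent with the test parsing a zone-less date using the JVM default time zone.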
In the documentation here:
https://github.com/Stratio/cassandra-lucene-index/blob/branch-2.1.8/doc/src/site/sphinx/documentation.rst#example-1
the test-users-create.cql link is broken.
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building cassandra-lucene-index 2.1.6.1
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ cassandra-lucene-index ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ cassandra-lucene-index ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ cassandra-lucene-index ---
[INFO] Compiling 116 source files to /home/rav/cassandra-lucene-index/target/classes
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ cassandra-lucene-index ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/rav/cassandra-lucene-index/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ cassandra-lucene-index ---
[INFO] Compiling 56 source files to /home/rav/cassandra-lucene-index/target/test-classes
[INFO]
[INFO] --- maven-surefire-plugin:2.12.4:test (default-test) @ cassandra-lucene-index ---
[INFO] Surefire report directory: /home/rav/cassandra-lucene-index/target/surefire-reports
-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running com.stratio.cassandra.lucene.query.SearchTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.555 sec
Running com.stratio.cassandra.lucene.query.LuceneConditionTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.075 sec
Running com.stratio.cassandra.lucene.query.ConditionTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.01 sec
Running com.stratio.cassandra.lucene.query.RegexpConditionTest
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.1 sec
Running com.stratio.cassandra.lucene.query.ContainsConditionTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.074 sec
Running com.stratio.cassandra.lucene.query.GeoDistanceConditionTest
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.1 sec
Running com.stratio.cassandra.lucene.query.PhraseConditionTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.049 sec
Running com.stratio.cassandra.lucene.query.FuzzyConditionTest
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.041 sec
Running com.stratio.cassandra.lucene.query.GeoBBoxConditionTest
Tests run: 23, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.038 sec
Running com.stratio.cassandra.lucene.query.SingleFieldConditionTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec
Running com.stratio.cassandra.lucene.query.MatchConditionTest
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.042 sec
Running com.stratio.cassandra.lucene.query.WildcardConditionTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.038 sec
Running com.stratio.cassandra.lucene.query.MatchAllConditionTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec
Running com.stratio.cassandra.lucene.query.builder.SearchBuilderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.006 sec
Running com.stratio.cassandra.lucene.query.builder.MatchConditionBuilderTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.001 sec
Running com.stratio.cassandra.lucene.query.builder.LuceneConditionBuilderTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec
Running com.stratio.cassandra.lucene.query.builder.SortBuilderTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running com.stratio.cassandra.lucene.query.builder.MatchAllConditionBuilderTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running com.stratio.cassandra.lucene.query.builder.PrefixConditionBuilderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.005 sec
Running com.stratio.cassandra.lucene.query.builder.SearchBuildersTest
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.015 sec
Running com.stratio.cassandra.lucene.query.builder.FuzzyConditionBuilderTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running com.stratio.cassandra.lucene.query.builder.SortFieldBuilderTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec
Running com.stratio.cassandra.lucene.query.builder.RangeConditionBuilderTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running com.stratio.cassandra.lucene.query.builder.ContainsConditionBuilderTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.002 sec
Running com.stratio.cassandra.lucene.query.builder.PhraseConditionBuilderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running com.stratio.cassandra.lucene.query.builder.RegexpConditionBuilderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running com.stratio.cassandra.lucene.query.builder.WildcardConditionBuilderTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0 sec
Running com.stratio.cassandra.lucene.query.BooleanConditionTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.039 sec
Running com.stratio.cassandra.lucene.query.PrefixConditionTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.023 sec
Running com.stratio.cassandra.lucene.query.SortFieldTest
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.053 sec
Running com.stratio.cassandra.lucene.query.DateRangeConditionTest
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.051 sec
Running com.stratio.cassandra.lucene.query.RangeConditionTest
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.017 sec
Running com.stratio.cassandra.lucene.schema.ColumnsTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.005 sec
Running com.stratio.cassandra.lucene.schema.ColumnTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec
Running com.stratio.cassandra.lucene.schema.mapping.DoubleMapperTest
Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.056 sec
Running com.stratio.cassandra.lucene.schema.mapping.BooleanMapperTest
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.037 sec
Running com.stratio.cassandra.lucene.schema.mapping.DateMapperTest
Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.036 sec
Running com.stratio.cassandra.lucene.schema.mapping.StringMapperTest
Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.021 sec
Running com.stratio.cassandra.lucene.schema.mapping.BigIntegerMapperTest
Tests run: 60, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.043 sec
Running com.stratio.cassandra.lucene.schema.mapping.LongMapperTest
Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.018 sec
Running com.stratio.cassandra.lucene.schema.mapping.InetMapperTest
Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.029 sec
Running com.stratio.cassandra.lucene.schema.mapping.BigDecimalMapperTest
Tests run: 64, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.098 sec
Running com.stratio.cassandra.lucene.schema.mapping.FloatMapperTest
Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.021 sec
Running com.stratio.cassandra.lucene.schema.mapping.UUIDMapperTest
Tests run: 28, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.117 sec
Running com.stratio.cassandra.lucene.schema.mapping.BlobMapperTest
Tests run: 27, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec
Running com.stratio.cassandra.lucene.schema.mapping.MapperTest
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.019 sec
Running com.stratio.cassandra.lucene.schema.mapping.TextMapperTest
Tests run: 29, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.017 sec
Running com.stratio.cassandra.lucene.schema.mapping.IntegerMapperTest
Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.034 sec
Running com.stratio.cassandra.lucene.schema.mapping.GeoPointMapperTest
Tests run: 37, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.963 sec
Running com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest
Tests run: 34, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 0.04 sec <<< FAILURE!
testGetStopFromStringColumnWithDefaultPattern(com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest) Time elapsed: 0.006 sec <<< FAILURE!
java.lang.AssertionError: expected:<1425081723004> but was:<1425078123004>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.junit.Assert.assertEquals(Assert.java:454)
at com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest.testGetStopFromStringColumnWithDefaultPattern(DateRangeMapperTest.java:212)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
testGetStopFromStringColumnWithCustomPattern(com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest) Time elapsed: 0.003 sec <<< FAILURE!
java.lang.AssertionError: expected:<1425078000000> but was:<1425074400000>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.junit.Assert.assertEquals(Assert.java:454)
at com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest.testGetStopFromStringColumnWithCustomPattern(DateRangeMapperTest.java:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
testGetStartFromStringColumnWithCustomPattern(com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest) Time elapsed: 0.002 sec <<< FAILURE!
java.lang.AssertionError: expected:<1425078000000> but was:<1425074400000>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.junit.Assert.assertEquals(Assert.java:454)
at com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest.testGetStartFromStringColumnWithCustomPattern(DateRangeMapperTest.java:150)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
testGetStartFromStringColumnWithDefaultPattern(com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest) Time elapsed: 0.002 sec <<< FAILURE!
java.lang.AssertionError: expected:<1425081723004> but was:<1425078123004>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.junit.Assert.assertEquals(Assert.java:454)
at com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest.testGetStartFromStringColumnWithDefaultPattern(DateRangeMapperTest.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
Running com.stratio.cassandra.lucene.schema.analysis.ClasspathAnalyzerBuilderTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.01 sec
Running com.stratio.cassandra.lucene.schema.analysis.PreBuiltAnalyzersTest
Tests run: 43, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.317 sec
Running com.stratio.cassandra.lucene.schema.analysis.SnowballAnalyzerBuilderTest
Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec
Running com.stratio.cassandra.lucene.schema.SchemaTest
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.11 sec
Running com.stratio.cassandra.lucene.service.LuceneIndexTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.743 sec
Results :
Failed tests: testGetStopFromStringColumnWithDefaultPattern(com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest): expected:<1425081723004> but was:<1425078123004>
testGetStopFromStringColumnWithCustomPattern(com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest): expected:<1425078000000> but was:<1425074400000>
testGetStartFromStringColumnWithCustomPattern(com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest): expected:<1425078000000> but was:<1425074400000>
testGetStartFromStringColumnWithDefaultPattern(com.stratio.cassandra.lucene.schema.mapping.DateRangeMapperTest): expected:<1425081723004> but was:<1425078123004>
Tests run: 834, Failures: 4, Errors: 0, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 21.626 s
[INFO] Finished at: 2015-06-20T10:36:33+03:00
[INFO] Final Memory: 30M/77M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project cassandra-lucene-index: There are test failures.
[ERROR]
[ERROR] Please refer to /home/rav/cassandra-lucene-index/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
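All four DateRangeMapperTest failures in this log are off by exactly 3,600,000 ms, which is what happens when a date string without a zone is parsed under a different default time zone. A minimal sketch of that effect, assuming a hypothetical input string and pattern (the real test inputs are not shown in the log) and two example zones, Europe/Madrid and Europe/Athens:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of cassandra-lucene-index.
final class ZoneDependentParse {
    // Parses a zone-less timestamp and interprets it in the given zone.
    static long toEpochMillis(String text, String pattern, String zoneId) {
        LocalDateTime local = LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern));
        return local.atZone(ZoneId.of(zoneId)).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        String text = "2015/02/28 01:02:03.004";    // assumed input
        String pattern = "yyyy/MM/dd HH:mm:ss.SSS"; // assumed pattern
        System.out.println(toEpochMillis(text, pattern, "Europe/Madrid")); // 1425081723004
        System.out.println(toEpochMillis(text, pattern, "Europe/Athens")); // 1425078123004
    }
}
```

Under these assumptions the two results match the "expected" and "but was" values in the log, so pinning the zone in the parser or running the build with a fixed -Duser.timezone would likely make the tests independent of the machine.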
When I executed the CQL below, I got two separate count results:
fred@cqlsh:opensignal> SELECT COUNT(*) FROM os_snapshot2 WHERE ts = 0 and network_name_mapped = 'Verizon' and network_type_mapped = '4' and lucene = '{ filter : {type:"boolean", must:[ {type:"geo_bbox", field:"place", min_latitude: 25, max_latitude: 26, min_longitude: -100, max_longitude: -97} ]} }';
count
-------
100
---MORE---
count
-------
59
So I wonder whether this output style is by design or an issue.
According to my schema, the CQL will only contact the coordinator and one cohort node, since all the data with ts = 0 and network_name_mapped = 'Verizon' is stored on one node (say, node-4).
After checking the CQL trace, I found that all operations were processed on the coordinator node and node-4. So I thought both the coordinator and the cohort node had each executed the same CQL once and sent their outputs to the node where the CQL was issued.
Please let me know whether this is by design or some other issue.
Here is my table schema
CREATE TABLE opensignal.os_snapshot2 (
ts timestamp,
network_name_mapped text,
network_type_mapped text,
box_lat double,
box_long double,
created_at timeuuid,
averaged_over double,
download_speed double,
lat double,
long double,
lucene text,
network_id_mapped text,
ping_time double,
reliability double,
rssi double,
speed_averaged_over double,
upload_speed double,
PRIMARY KEY ((ts, network_name_mapped), network_type_mapped, box_lat, box_long, created_at)
) WITH CLUSTERING ORDER BY (network_type_mapped ASC, box_lat ASC, box_long ASC, created_at ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
CREATE CUSTOM INDEX os_snapshot2_spatial_index ON opensignal.os_snapshot2 (lucene) USING 'com.stratio.cassandra.lucene.Index';
And my index schema
CREATE CUSTOM INDEX os_snapshot2_spatial_index ON os_snapshot2 (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds' : '60',
'schema' : '{
fields : {
place : { type : "geo_point", latitude:"lat", longitude:"long" },
network_type_mapped: { type: "string" }
}
}'
};
I get this warning after adding the Lucene index jar file to the Cassandra lib directory:
19:48:03,074 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs multiple times on the classpath.
19:48:03,074 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [jar:file:/cassandra/lib/cassandra-lucene-index-plugin-2.1.8.4.jar!/logback.xml]
19:48:03,074 |-WARN in ch.qos.logback.classic.LoggerContext[default] - Resource [logback.xml] occurs at [file:/cassandra/conf/logback.xml]
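To see where duplicate resources come from, the class loader can be asked for every copy of a resource name. This is a generic JVM diagnostic sketch, not part of the plugin, and the class name is hypothetical. Run with the Cassandra classpath, it would be expected to list the same two locations as the warning.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of cassandra-lucene-index.
final class ClasspathDuplicates {
    // Returns every location on the system classpath that provides the resource.
    static List<URL> locate(String resourceName) {
        try {
            return Collections.list(
                    ClassLoader.getSystemClassLoader().getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // More than one printed URL means the resource is duplicated.
        for (URL url : locate("logback.xml")) {
            System.out.println(url);
        }
    }
}
```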
Invalid time ranges (where the from is later than the to) prevent Cassandra from starting up while replaying the commit log. In the short term, is there a way to correct this with the cluster offline? Longer term, I would think this shouldn't prevent the cluster from even starting up.
ERROR [main] 2015-08-25 08:04:48,502 CassandraDaemon.java:541 - Exception encountered during startup
java.lang.RuntimeException: java.util.concurrent.ExecutionException: com.stratio.cassandra.lucene.IndexException: Error while indexing row java.nio.HeapByteBuffer[pos=0 lim=4 cap=4] in Lucene index onestore.bt.bt_index
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:403) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:392) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:463) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:119) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:148) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:128) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:352) [apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:524) [apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:613) [apache-cassandra-2.1.8.jar:2.1.8]
Caused by: java.util.concurrent.ExecutionException: com.stratio.cassandra.lucene.IndexException: Error while indexing row java.nio.HeapByteBuffer[pos=0 lim=4 cap=4] in Lucene index onestore.bt.bt_index
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.get(AbstractTracingAwareExecutorService.java:200) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:399) ~[apache-cassandra-2.1.8.jar:2.1.8]
... 8 common frames omitted
Caused by: com.stratio.cassandra.lucene.IndexException: Error while indexing row java.nio.HeapByteBuffer[pos=0 lim=4 cap=4] in Lucene index onestore.bt.bt_index
at com.stratio.cassandra.lucene.Index.index(Index.java:159) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
at org.apache.cassandra.db.index.SecondaryIndexManager$StandardUpdater.updateRowLevelIndexes(SecondaryIndexManager.java:834) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:229) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.Memtable.put(Memtable.java:210) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1237) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:400) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:363) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:455) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.8.jar:2.1.8]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_51]
at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.8.jar:2.1.8]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.1.8.jar:2.1.8]
at java.lang.Thread.run(Thread.java:744) ~[na:1.7.0_51]
Caused by: java.lang.IllegalArgumentException: Wrong order: 2014-01-01T05 TO 2013-01-01T05:00:00.000
at org.apache.lucene.spatial.prefix.tree.NumberRangePrefixTree.toRangeShape(NumberRangePrefixTree.java:109) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
at com.stratio.cassandra.lucene.schema.mapping.BitemporalMapper.makeShape(BitemporalMapper.java:282) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
at com.stratio.cassandra.lucene.schema.mapping.BitemporalMapper.addFields(BitemporalMapper.java:329) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
at com.stratio.cassandra.lucene.schema.Schema.addFields(Schema.java:185) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
at com.stratio.cassandra.lucene.service.RowMapperWide.document(RowMapperWide.java:107) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
at com.stratio.cassandra.lucene.service.RowServiceWide.documents(RowServiceWide.java:129) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
at com.stratio.cassandra.lucene.service.RowServiceWide.index(RowServiceWide.java:91) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
at com.stratio.cassandra.lucene.Index.index(Index.java:154) ~[cassandra-lucene-index-plugin-2.1.8.2.jar:na]
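The `Wrong order` message in the innermost cause suggests that a row reached the bitemporal mapper with a start instant later than its end instant. A minimal client-side guard, sketched here in Python, could reject such rows before they are written. The function name `check_range` and the `strptime` pattern are assumptions made for illustration; they are not part of the plugin.

```python
from datetime import datetime

# Assumed client-side equivalent of a "yyyy/MM/dd" mapper pattern.
PATTERN = "%Y/%m/%d"


def check_range(start: str, end: str, pattern: str = PATTERN) -> None:
    """Raise ValueError if `start` is later than `end` (hypothetical helper)."""
    s = datetime.strptime(start, pattern)
    e = datetime.strptime(end, pattern)
    if s > e:
        raise ValueError(f"wrong order: {start} is after {end}")


# A well-ordered range passes silently.
check_range("2013/01/01", "2014/01/01")
```

Calling `check_range("2014/01/01", "2013/01/01")` raises `ValueError`, which lets the writer fix or skip the row instead of failing during commit log replay.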
ERROR [pool-4-thread-1] 2015-07-12 16:28:48,730 Log.java:53 - Unrecoverable error during asynchronously indexing
java.lang.IllegalArgumentException: DocValuesField "y" is too large, must be <= 32766
at org.apache.lucene.index.SortedDocValuesWriter.addValue(SortedDocValuesWriter.java:68) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at org.apache.lucene.index.DefaultIndexingChain.indexDocValue(DefaultIndexingChain.java:434) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:376) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:458) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1350) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at com.stratio.cassandra.lucene.service.LuceneIndex.upsert(LuceneIndex.java:185) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at com.stratio.cassandra.lucene.service.RowServiceWide.doIndex(RowServiceWide.java:95) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at com.stratio.cassandra.lucene.service.RowService$2.run(RowService.java:170) ~[cassandra-lucene-index-plugin-2.1.8.0.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [na:1.8.0_45]
at java.util.concurrent.FutureTask.run(Unknown Source) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.8.0_45]
at java.lang.Thread.run(Unknown Source) [na:1.8.0_45]
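The error above reports that the value for field "y" exceeds 32766. A rough client-side check, assuming the limit applies to the UTF-8 encoded length of the value, might look like the sketch below. The helper name `fits_docvalue` is hypothetical.

```python
# Limit quoted in the error message above.
MAX_DOCVALUE_BYTES = 32766


def fits_docvalue(value: str, limit: int = MAX_DOCVALUE_BYTES) -> bool:
    """True if the UTF-8 encoding of `value` is within `limit` bytes.

    Assumption: the limit is measured in encoded bytes, not characters.
    """
    return len(value.encode("utf-8")) <= limit
```

Note that non-ASCII text reaches the limit sooner than its character count suggests, because each character may take several bytes.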
Hello,
During a node failure in a cluster, what is the impact on the secondary indexes stored on that node? Are the indexes lost? If they are lost, are they reconstructed again?
Regards,
Vinay
When evaluating the query performance for the same schema between DSE Search with Solr and this project, we are noticing the DSE to be at least 10 times faster in the same cluster. Would there be any reason why DSE performs the Solr queries much faster?
Hello,
Is it possible to upload last version (2.1.8.2) in Maven Central?
Regards,
Julien
Hi,
Tried with both 2.1.6 and 2.1.8 in different environments, and in both I am seeing the exception stack below when the criteria value is longer than 12 characters:
Query: select * from
where cassandra_lucene_index = '{filter : {type: "match", field: "", value: "<greater than 12 characters>"}}';
Works fine for criteria values that are <= 12 characters.
Traceback (most recent call last):
File "./cqlsh.py", line 1150, in perform_simple_statement
rows = future.result(self.session.default_timeout)
File "//apache-cassandra-2.2.0_i1/bin/../lib/cassandra-driver-internal-only-2.6.0c2.post.zip/cassandra-driver-2.6.0c2.post/cassandra/cluster.py", line 3296, in result
raise self._final_exception
Please let me know if you need any additional information.
I have a simple entry (in a single table):
Table structure:
CREATE TABLE users (
user_id int,
name text,
vt_from text,
vt_to text,
tt_from text,
tt_to text,
lucene text,
PRIMARY KEY (user_id, vt_from, tt_from)
);
CREATE CUSTOM INDEX users_index on users(lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds' : '1',
'schema' : '{
fields : {
bitemporal : {
type : "bitemporal",
tt_from : "tt_from",
tt_to : "tt_to",
vt_from : "vt_from",
vt_to : "vt_to",
pattern : "yyyy/MM/dd",
now_value : "2200/12/31"}
}
}'};
value
INSERT INTO users (user_id, name, vt_from, vt_to, tt_from, tt_to)
VALUES (42, 'Bob', '2015/01/01', '2200/12/31', '2015/01/01', '2015/01/05');
The "valid" date range is from 2015-01-01 to 2200-12-31, so I should be able to read it while filtering on a sub-range: [2015-01-06, 2200-12-31]. But this does not work.
SELECT name FROM users
WHERE lucene = '{
filter : {
type : "bitemporal",
field : "bitemporal",
vt_from : "2015/01/06",
vt_to : "2200/12/31"
}
}';
This query does not find any value, even though the query VT range intersects the data VT range (and the TT range too, as I left it at the default).
The result is the same with a specific TT in the query filter (tt_from : "2015/01/01", tt_to : "2200/12/31").
But the same query does find data with vt_from : "2015/01/04".
So it seems the data tt_to (2015/01/05) is somehow filtered by the query vt_from (2015/01/06)!
Why can't I query my data?
The query [tt_from, tt_to] range includes that of the data, so it should not be affected by the query VT.
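The expectation described above can be written down as plain interval intersection on both axes. The sketch below is only a model of that expectation, using the row from the INSERT and the explicit TT range from the second attempt; it is not the plugin's implementation, which may treat `now_value` specially. The function name `intersects` is made up for illustration.

```python
from datetime import date


def intersects(a_from, a_to, b_from, b_to):
    """Closed intervals [a_from, a_to] and [b_from, b_to] overlap."""
    return a_from <= b_to and b_from <= a_to


# Row from the INSERT statement.
row_vt = (date(2015, 1, 1), date(2200, 12, 31))
row_tt = (date(2015, 1, 1), date(2015, 1, 5))

# Query filter, with the explicit TT range that was also tried.
query_vt = (date(2015, 1, 6), date(2200, 12, 31))
query_tt = (date(2015, 1, 1), date(2200, 12, 31))

# Under plain intersection semantics the row is expected to match.
expected_match = intersects(*row_vt, *query_vt) and intersects(*row_tt, *query_tt)
print(expected_match)
```

Under this model both axes overlap independently, so the query `vt_from` would have no reason to interact with the row's `tt_to`.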
Hi
Any plans for supporting cassandra 2.2.0 ?
Morten
Hello,
Do you have an idea of when it will be possible to page over relevance searches (queries and sorts)?
Regards,
Julien
Just a question about deployment and use of this plugin. Suppose I have one cluster in data center 1 with cassandra with main db using vnodes. And now I would like to add another cluster in data center 2 and only there I would like to keep lucene indexes. Is it possible to separate things like that? I saw that there are some concepts with mixed deployments with cassandra where some nodes are used for main db, some for solr, spark etc.
Hello,
I get a timeout if part of a Boolean query is null, for example:
SELECT * FROM table WHERE lucene='{"query":{"type":"boolean", must:null}}'; // timeout
// OperationTimedOut: errors={}, last_host=127.0.0.1
But,
SELECT * FROM table WHERE lucene='{"query":{"type":"boolean", must:{}}}';
//InvalidRequest: code=2200 [Invalid query] message="Unformateable JSON search: Can not deserialize instance of java.util.ArrayList out of START_OBJECT token at [Source: java.io.StringReader@48575ddc; line: 1, column: 27] (through reference chain: com.stratio.cassandra.lucene.search.SearchBuilder["query"]->com.stratio.cassandra.lucene.search.condition.builder.BooleanConditionBuilder["must"])"
Can you add some validations?
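Until such validation exists on the server side, a client could check the search before sending it. The following is a minimal sketch, assuming the search is built as a Python dict and that each boolean clause must be a list (as the second error message implies for `must`). The clause names `should` and `not` and the function name `validate_boolean` are assumptions; only `must` appears in this report.

```python
def validate_boolean(condition: dict, clauses=("must", "should", "not")) -> None:
    """Reject boolean conditions whose clauses are not lists.

    Hypothetical client-side check; clause names other than "must"
    are assumed, not taken from the report above.
    """
    for clause in clauses:
        if clause not in condition:
            continue
        value = condition[clause]
        if not isinstance(value, list):
            raise ValueError(
                f"'{clause}' must be a list, got {type(value).__name__}"
            )


# A well-formed condition passes silently.
validate_boolean({"type": "boolean", "must": []})
```

With this check, both `must: null` and `must: {}` fail fast on the client with a `ValueError` instead of producing a timeout or a deserialization error.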
Hello,
is there any plan to add aggregations support?
Regards,
Julien
In my scenario the index is refreshed every second.
There is NO NEED to run this refresh every second if Cassandra has not been accessed, or at least has had no dirty access (changes to the data), since the last build. (Read-only access is not considered dirty access.)
As it stands, the refresh runs unnecessarily and creates workload on the machine while no changes or accesses are made to Cassandra.
Since Lucene runs inside Cassandra itself, you could add a listener/filter to the Lucene/Cassandra refresh loop that is aware of dirtiness, or at least of the absence of access, and skips such runs.
CREATE INDEX log_os ON sregion.logevent ( os_type );
ON sregion.logevent (stratio_col)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds':'1',
'num_cached_filters':'1',
'ram_buffer_mb':'64',
'max_merge_mb':'5',
'max_cached_mb':'30',
'schema':'{
default_analyzer:"standard",
fields:{
event_code:{type:"string"},
application:{type:"string"},
event_time:{type:"date", pattern:"yyyy/MM/dd"},
username:{type:"string"},
ip_address:{type:"string"},
os_type:{type:"integer"},
data:{type:"text",
analyzer:"english"}}
}'};
hello,
Is it possible to index collections (precisely a set) elements?
For example, I have a field, which is a set containing these values : "5", "7", "9"
Is it possible to search rows where "8" is contained (or not) in this set?
Julien
@adelapena
Can I use current release with Cassandra 2.2 or 3.0?
Do you have plan to release for 2.2 and 3.0?
Thanks.
I just wanted to know what the roadmap plans are for supporting cassandra releases. I see there is an active branch for latest cassandra 2.2 support for this plugin. While cassandra 2.1.9 has also been released, is there any plan to also release a 2.1.9 supporting version of this plugin?
Will there be two separate releases supporting the latest vs the stable version of cassandra? Will the existing 2.1.8 release of this plugin also work properly with cassandra 2.1.9?
Hello,
When performing a geodistance search, is it possible to sort results by increasing/decreasing distance from the geo point?
Regards,
Julien
We use the query builder in the client but don't want to include all transitive dependencies of the main package in our build.
If you create a table and custom index like this:
CREATE KEYSPACE IF NOT EXISTS test
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
USE test;
CREATE TABLE IF NOT EXISTS test_update (
pk int, k0 varchar, lucene text, primary key (pk)
);
CREATE CUSTOM INDEX test_update_index ON test_update (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds' : '1',
'schema' : '{
fields : {
k0 : {type : "string"}
}
}'
};
Then upsert a row like this:
UPDATE test_update SET k0='foo' WHERE pk=0;
And select like this:
SELECT * FROM test_update WHERE lucene='{
query : {type:"match", field:"k0", value:"foo"}
}';
There is no result.
If you replace the UPDATE with INSERT like this:
INSERT INTO test_update(pk,k0) VALUES(0,'foo');
and repeat the SELECT, you get the correct result.
Using stop words with "Contains query" will throw:
InvalidRequest: code=2200 [Invalid query] message="Value discarded by analyzer"
Shouldn't stop words be discarded without any errors? Or is it intended?
Eg:
...
analyzers : {
my_custom_analyzer : {
type:"snowball",
language:"Spanish",
stopwords : "el,la,lo,loas,las,a,ante,bajo,cabe,con,contra"
}
},
...
SELECT * FROM <table> WHERE stratio_col='{ query : { type : "contains", field : "<field>", values : [ "el", "<other_non_stop_word_term>" ] } }';
InvalidRequest: code=2200 [Invalid query] message="Value discarded by analyzer"
Hey,
I've benchmarked insertions on a Cassandra table with and without an index using your code, and it appears that having the secondary index halves the insertion rate on a table with a few fields and slows it far more on one with many fields (up to 6 times slower). Are you aware of this?
I thought secondary indexes were updated asynchronously but they are for sure updated when the corresponding table is updated.
Switching from stratio-cassandra 2.1.5 to a bare C* 2.1.6 with the cassandra-lucene-index plugin (compiled today) shows inconsistencies on Match, Phrase and Fuzzy queries, that seem to behave differently than expected in the plugin.
Match and phrase seem to have switched from single word search to sentence search, which prevents using them to search for a single word or a group of words in a string (and leaving wildcard for that, which is not as fast).
Wildcard seems to work fine, but is noticeably slower than match/phrase/fuzzy:
select ref_expediteur, adresse_1_destinataire
from vision_dev.lt
where lucene = '
{filter : {type:"range", field:"date_depot_lt", lower:"2015-01-25", upper:"2015-04-01"},
query : {type:"wildcard", field:"adresse_1_destinataire", value:"*DANIEL*COGNAC*"}}';
I get the following 7 rows :
ref_expediteur | adresse_1_destinataire
----------------+-------------------------
95762968 | 15 RUE DANIEL DE COGNAC
89162952 | 15 RUE DANIEL DE COGNAC
94262880 | 15 RUE DANIEL DE COGNAC
95823340 | 15 RUE DANIEL DE COGNAC
95162706 | 15 RUE DANIEL DE COGNAC
95042969 | 15 RUE DANIEL DE COGNAC
94443320 | 15 RUE DANIEL DE COGNAC
If I use the following query :
select ref_expediteur, adresse_1_destinataire
from vision_dev.lt
where lucene = '
{filter : {type:"range", field:"date_depot_lt", lower:"2015-01-25", upper:"2015-04-01"},
query : {type:"fuzzy", field:"adresse_1_destinataire", value:"15 RUE DANIEL DECOGNAC"}}';
I get the following 8 rows :
ref_expediteur | adresse_1_destinataire
----------------+-------------------------
94262880 | 15 RUE DANIEL DE COGNAC
95042969 | 15 RUE DANIEL DE COGNAC
RI36692335 | 15 RUE DANIEL DE COSNAC
95762968 | 15 RUE DANIEL DE COGNAC
89162952 | 15 RUE DANIEL DE COGNAC
95823340 | 15 RUE DANIEL DE COGNAC
95162706 | 15 RUE DANIEL DE COGNAC
94443320 | 15 RUE DANIEL DE COGNAC
This seems like a reasonable result.
Then, if we deviate a bit more from the "real" string:
select ref_expediteur, adresse_1_destinataire
from vision_dev.lt
where lucene = '
{filter : {type:"range", field:"date_depot_lt", lower:"2015-01-25", upper:"2015-04-01"},
query : {type:"fuzzy", field:"adresse_1_destinataire", value:"15 RUE DANIEL COGNAC"}}';
Then we get no match at all (0 rows).
This could be related to me misunderstanding the accuracy needed for fuzzy to return results.
Using the same dataset, we can use match to get results providing the full string (exactly as stored) :
select ref_expediteur, adresse_1_destinataire
from vision_dev.lt
where lucene = '
{filter : {type:"range", field:"date_depot_lt", lower:"2015-01-25", upper:"2015-04-01"},
query : {type:"match", field:"adresse_1_destinataire", value:"15 RUE DANIEL DE COGNAC"}}';
returns 7 rows :
ref_expediteur | adresse_1_destinataire
----------------+-------------------------
95762968 | 15 RUE DANIEL DE COGNAC
89162952 | 15 RUE DANIEL DE COGNAC
94262880 | 15 RUE DANIEL DE COGNAC
95823340 | 15 RUE DANIEL DE COGNAC
95162706 | 15 RUE DANIEL DE COGNAC
95042969 | 15 RUE DANIEL DE COGNAC
94443320 | 15 RUE DANIEL DE COGNAC
Previously, this didn't work since "match" didn't allow multiple words at once.
Now if I restrict the match to a single word :
select ref_expediteur, adresse_1_destinataire
from vision_dev.lt
where lucene = '
{filter : {type:"range", field:"date_depot_lt", lower:"2015-01-25", upper:"2015-04-01"},
query : {type:"match", field:"adresse_1_destinataire", value:"COGNAC"}}';
Then I get no result at all.
Shouldn't it return all rows containing the word "COGNAC" in it ?
Phrase gets it right, like match does, when the full string is searched as stored :
select ref_expediteur, adresse_1_destinataire
from vision_dev.lt
where lucene = '
{filter : {type:"range", field:"date_depot_lt", lower:"2015-01-25", upper:"2015-04-01"},
query : {type:"phrase", field:"adresse_1_destinataire", value:"15 RUE DANIEL DE COGNAC"}}';
returns 7 rows :
ref_expediteur | adresse_1_destinataire
----------------+-------------------------
95762968 | 15 RUE DANIEL DE COGNAC
89162952 | 15 RUE DANIEL DE COGNAC
94262880 | 15 RUE DANIEL DE COGNAC
95823340 | 15 RUE DANIEL DE COGNAC
95162706 | 15 RUE DANIEL DE COGNAC
95042969 | 15 RUE DANIEL DE COGNAC
94443320 | 15 RUE DANIEL DE COGNAC
But if I try to skip a word like "DE" and set slop like this :
select ref_expediteur, adresse_1_destinataire
from vision_dev.lt
where lucene = '
{filter : {type:"range", field:"date_depot_lt", lower:"2015-01-25", upper:"2015-04-01"},
query : {type:"phrase", field:"adresse_1_destinataire", value:"15 RUE DANIEL COGNAC", slop:2}}';
Once again I get no results.
Here's the create statement for the index :
CREATE CUSTOM INDEX lt_lucene_index ON vision_dev.lt (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds' : '10',
'schema' : '{
fields : {
adresse_1_destinataire : {type : "string"},
adresse_1_expediteur : {type : "string"},
adresse_2_destinataire : {type : "string"},
adresse_2_expediteur : {type : "string"},
date_depot_lt : {type : "date", pattern : "yyyy-MM-dd"}
}
}'
};
Am I misunderstanding something ?
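One possible reading of these results is that the fields are mapped with type "string" in the index above. The toy model below assumes that a `string` field is indexed as one un-analyzed term, while a `text` field is split into lowercased words by an analyzer. That assumption is not confirmed in this thread, and the functions `index_terms` and `match` are illustrative only; they are not Lucene or plugin code.

```python
def index_terms(value: str, analyzed: bool) -> set:
    """Terms stored for a value: lowercased words, or the whole string."""
    return set(value.lower().split()) if analyzed else {value}


def match(terms: set, query: str, analyzed: bool) -> bool:
    """A match succeeds when every query term is among the indexed terms."""
    return index_terms(query, analyzed) <= terms


stored = "15 RUE DANIEL DE COGNAC"

# Assumed behaviour of a "string" field: one term, exact matches only.
as_string = index_terms(stored, analyzed=False)
print(match(as_string, "15 RUE DANIEL DE COGNAC", analyzed=False))
print(match(as_string, "COGNAC", analyzed=False))

# Assumed behaviour of a "text" field: single words can match.
as_text = index_terms(stored, analyzed=True)
print(match(as_text, "COGNAC", analyzed=True))
```

In this model the full-string match succeeds and the single-word match fails for the un-analyzed field, which mirrors the behaviour reported above; the analyzed field matches the single word.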
Apologies if there is a better forum for this question.
Documentation specifies that indexing_threads = 0 means synchronous indexing. Does this mean synchronous with the write to the indexed partition? What implications does this have for index reads?
For my scenario, if I have:
Will each index read reflect all previous writes to the partition, or is there still potential for the index to lag behind?
Thanks.
Hi,
Using filter match request with map field does not retrieve the value correctly.
Below the steps to reproduce:
//create table
create table mytesttable (idcol int, testtextcol text, testmapcol map<text,text>, cassandra_lucene_index text, primary key (idcol));
//create index
CREATE CUSTOM INDEX IF NOT EXISTS lucene_test_idx ON mytesttable (cassandra_lucene_index) USING 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds' : '1', 'schema' : '{ fields : {testtextcol: {type: "string"}, testmapcol: {type: "string"}}}'};
//create test data
insert into mytesttable (idcol, testtextcol, testmapcol) values (1, 'row1', {'attb1': 'row1attb1Val', 'attb2': 'row1attb2Val', 'attb3': 'row1attb3Val'});
insert into mytesttable (idcol, testtextcol, testmapcol) values (2, 'someLongRow2Value', {'attb1': 'someLongattb1Val', 'attb2': 'attb2Val', 'attb3': 'attb3Val'});
insert into mytesttable (idcol, testtextcol, testmapcol) values (3, 'row3', {'attb1': 'row2attb1Val', 'attb2': 'row2attb2Val', 'attb3': 'row2attb3Val'});
insert into mytesttable (idcol, testtextcol, testmapcol) values (4, 'row4', {'attb2': 'row3attb2Val', 'attb3': 'row3attb3Val'});
//this query retrieves the data fine
select * from mytesttable where cassandra_lucene_index = '{filter : {type: "match", field: "testtextcol", value: "row1"}}';
//but this query does not
select * from mytesttable where cassandra_lucene_index = '{filter : {type: "match", field: "testmapcol.attb1", value: "row1attb1Val"}}';
Please let me know if you need any other details.
We have a problem with the consistency of the lucene index.
Our context :
//Create table with the schema below
CREATE TABLE keyspace_1.table_1 (
field_1 text,
field_2 int,
field_3 timestamp,
field_4 text,
field_5 text,
field_6 text,
field_7 text,
field_8 text,
field_9 text,
field_10 text,
field_11 text,
field_12 text,
field_13 int,
field_14 text,
field_15 int,
field_16 int,
field_17 text,
field_18 MAP<text, text>,
field_19 text,
field_20 text,
field_21 text,
field_22 int,
field_23 int,
field_24 int,
field_25 text,
field_26 text,
field_27 text,
field_28 text,
field_29 text,
PRIMARY KEY (field_1, field_2, field_3)
);
//Load data into the table (approximately 100 GB of data)
// Create index
alter table keyspace_1.table_1 add indextx text ;
CREATE CUSTOM INDEX table_index ON keyspace_1.table_1 (indextx) USING 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = { 'refresh_seconds' : '1', 'schema' : '{ fields : { field_6 : {type : "string", sorted: "false"}, field_3 : {type : "date", pattern : "yyyy-MM-dd"} } }' };
// The lucene index does not return the correct data
select field_1, field_6, field_3 from keyspace_1.table_1 WHERE indextx='{filter :{type:"contains", field:"field_6", values:["DC"]}}';
select field_1, field_6, field_3 from keyspace_1.table_1 WHERE indextx='{filter :{type:"match", field:"field_6", value:"DC"}}';
We use Cassandra 2.1.7 with plugin Stratio 2.1.7.1
JDK 1.7.0_45
Thanks,
Any word on when support for TTL columns will be added?
From what I understand, to add a new index on a field, we have to drop the entire custom index and rebuild everything to include the new index. Could addition of new indexes avoid the need to drop existing indexes? This is going to be problematic for large tables.
If I add a new field (column) and would like to use it in the Lucene index, how can I add it to the existing Lucene index?
One possible option is to delete the custom index and create it again with the new field.
But how can I restart the indexing process on the old data?
Cassandra version (running on Ubuntu 14.04): Cassandra 2.1.8
cassandra-lucene-index: cassandra-lucene-index-plugin-2.1.8.1.jar
Simple to reproduce. I use the example schema:
CREATE KEYSPACE demo
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 1};
USE demo;
CREATE TABLE tweets (
id INT PRIMARY KEY,
user TEXT,
body TEXT,
time TIMESTAMP,
latitude FLOAT,
longitude FLOAT,
lucene TEXT
);
CREATE CUSTOM INDEX tweets_index ON tweets (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds' : '1',
'schema' : '{
fields : {
id : {type : "integer"},
user : {type : "string"},
body : {type : "text", analyzer : "english"},
time : {type : "date", pattern : "yyyy/MM/dd", sorted : true},
place : {type : "geo_point", latitude:"latitude", longitude:"longitude"}
}
}'
};
Insert some data:
INSERT INTO demo.tweets ( id, body, latitude, longitude, time, user ) VALUES (21, 'Utah is a great state', 79.13032, 67.2017, 1440780495620, 'Kenny');
INSERT INTO demo.tweets ( id, body, latitude, longitude, time, user ) VALUES (22, 'Indiana is a great state', 79.13232, 67.2727, 1440780995620, 'Andrea');
Then use the following query as a sanity check:
SELECT * FROM tweets WHERE lucene='{
query : {
type:"phrase",
field:"body",
value: "is a great state"}
}';
id | body | latitude | longitude | lucene | time | user
----+--------------------------+----------+-----------+--------+--------------------------+--------
22 | Indiana is a great state | 79.13232 | 67.2727 | 1.0 | 2015-08-28 16:56:35+0000 | Andrea
21 | Utah is a great state | 79.13032 | 67.2017 | 1.0 | 2015-08-28 16:48:15+0000 | Kenny
Now try the following:
SELECT * FROM tweets WHERE lucene='{
query : {
type:"phrase",
field:"body",
value: "is a great state"},
sort :
{
fields: [ {field : "user", reverse : true} ]
}
}';
You'll get the following:
OperationTimedOut: errors={}, last_host=x.x.x.x
If this is a legitimate timeout on two rows of data we are in trouble. ;)
If I sort on numeric fields I get back an immediate response. The problem only manifests itself with TEXT fields.
I have some rows in a test table:
cqlsh:lucene_test> select id, subject, user, time from emails;
id | subject | user | time
----+-----------------------------------------+--------+--------------------------
5 | this is the real thing | robbie | 2015-05-26 21:30:00-0400
1 | this is a test | robbie | 2015-05-26 17:30:00-0400
2 | this is another test | robbie | 2015-05-26 18:30:00-0400
4 | this is a fourth test | robbie | 2015-05-26 20:30:00-0400
7 | this is it | robbie | 2015-05-26 23:30:00-0400
6 | this is even better than the real thing | robbie | 2015-05-26 22:30:00-0400
3 | this is a third test | robbie | 2015-05-26 19:30:00-0400
(7 rows)
It has this index:
create custom index email_index on emails(lucene)
using 'com.stratio.cassandra.lucene.Index'
with options = {
'refresh_seconds':'1',
'schema': '{
fields: {
id : {type : "integer"},
user : {type : "string"},
subject : {type : "text", analyzer : "english"},
body : {type : "text", analyzer : "english"},
time : {type : "date", pattern : "yyyy-MM-dd hh:mm:ss"}
}
}'
};
The index works for some queries. Here's an example:
cqlsh:lucene_test> SELECT * FROM emails WHERE lucene='{
... filter : {type:"range", field:"time", lower:"2015-05-26 18:29:59"},
... query : {type:"fuzzy", field:"subject", value:"thingy", max_edits:1}
... }';
id | body | lucene | subject | time | user
----+--------------------------------------------------+-----------+-----------------------------------------+--------------------------+--------
5 | this is a test of the emergency broadcast system | 1.1713032 | this is the real thing | 2015-05-26 21:30:00-0400 | robbie
6 | this is a test of the emergency broadcast system | 1.0541729 | this is even better than the real thing | 2015-05-26 22:30:00-0400 | robbie
(2 rows)
I run a variety of phrase queries, all of which fail to return results:
cqlsh:lucene_test> SELECT * FROM emails WHERE lucene='{query : {type:"phrase", field:"subject", values:["this", "is", "test"], slop: 1}}';
id | body | lucene | subject | time | user
----+------+--------+---------+------+------
(0 rows)
cqlsh:lucene_test> SELECT * FROM emails WHERE lucene='{query : {type:"phrase", field:"subject", values:["this", "is", "test"], slop: 2}}';
id | body | lucene | subject | time | user
----+------+--------+---------+------+------
(0 rows)
cqlsh:lucene_test> SELECT * FROM emails WHERE lucene='{query : {type:"phrase", field:"subject", values:["this", "is", "test"], slop: 5}}';
id | body | lucene | subject | time | user
----+------+--------+---------+------+------
(0 rows)
cqlsh:lucene_test> SELECT * FROM emails WHERE lucene='{query : {type:"phrase", field:"subject", values:["this", "is", "a", "test"], slop: 5}}';
id | body | lucene | subject | time | user
----+------+--------+---------+------+------
(0 rows)
cqlsh:lucene_test> SELECT * FROM emails WHERE lucene='{query : {type:"phrase", field:"subject", values:["this", "is", "a", "test"]}}';
id | body | lucene | subject | time | user
----+------+--------+---------+------+------
(0 rows)
cqlsh:lucene_test> SELECT * FROM emails WHERE lucene='{query : {type:"phrase", field:"subject", values:["this"]}}';
id | body | lucene | subject | time | user
----+------+--------+---------+------+------
(0 rows)
Creating the index before inserting data works fine, but when I create the index on existing data (millions of rows), queries always return 0 rows.
Hello,
I simulated a node being down for a "long time" and then coming back up.
I set up a 3-node cluster with RF 3 replication and hinted handoff deactivated.
I insert a row such as:
{ id : 1 , user : manu, tweet : hello world }
with an appropriate custom index.
I take node 3 offline and update the content of "tweet":
UPDATE documents SET tweet='hello paris' WHERE id=1;
I bring node 3 back online and perform this request:
SELECT * FROM documents WHERE lucene='{query : {type:"phrase", field:"tweet", value:"hello world", slop:1}}';
which returns 0 or 1 row depending on the responding nodes.
Is there a way to prevent this behaviour, or is it totally normal?
Hi,
Tried with both 2.1.6 and 2.1.8 in different environments, but sorting over a Map field does not seem to work.
Please note that sorting on other C* types, as well as filter/query over map fields, works fine; only sorting over Map fields has no effect.
Hope the usage is correct.
I'm running a simple bitemporal query using version 2.1.8.4. I run the query twice (one right after the other) and get a different number of results each time. No inserts/updates/etc. are running against the database during this time. The data has been static for about 12 hours and the index was refreshed prior to running either query. If I run without the lucene condition, I always get the same results.
Query:
SELECT * FROM tab1 WHERE lucene='{filter : {
type:"boolean",
must:[
{type : "bitemporal", field : "bitemporal", vt_from : "2015/08/28 10:46:23:629", vt_to : "2015/08/28 10:46:23:629", tt_from : "2015/08/28 10:46:23:629", tt_to : "2015/08/28 10:46:23:629" }
]
}
}
' AND key='someKey';
Found: 393053.
****** finished query 2015-08-28T10:36:18.512-04:00
Found: 393052.
****** finished query 2015-08-28T10:37:57.464-04:00
Found: 393070.
****** finished query 2015-08-28T10:48:03.837-04:00
Found: 393085.
****** finished query 2015-08-28T10:49:39.656-04:00
Hi, I tried to query my example table using lucene index with timestamp sorting. If I use only one filter like 'range' sorting by date works great, I can reverse it also. But after I change my filter to boolean type to have two different conditions like range and prefix then reverse sorting stops working - I cannot sort in reverse anymore by date. So this works fine (can sort in both directions by date):
SELECT modifytime,noteid,userid FROM epodb.notes
WHERE fts_index = '{filter : {type:"range", field:"modifytime", lower:"2015/09/28 00:00:00"},
sort : {fields: [ {field:"modifytime", reverse:true}]}}'
limit 20;
But this one stops sorting in reverse by date (regardless of the reverse parameter, results are always sorted ascending):
SELECT modifytime,noteid,userid FROM epodb.notes
WHERE fts_index = '{filter : {
type : "boolean",
must : [{type:"range", field:"modifytime", lower:"2015/09/28 00:00:00"},
{type : "prefix", field : "value", value : "a"}],
sort : {fields: [ {field:"modifytime", reverse:true}]}}}'
limit 20;
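One thing that may be worth checking: in the second query as posted, the braces place `sort` inside the `filter` object, whereas in the first query `sort` is a sibling of `filter`. Whether this explains the behaviour is a guess. Building the search as a data structure makes the nesting explicit; the sketch below uses Python's standard `json` module and assumes the plugin accepts strict JSON with quoted keys.

```python
import json

search = {
    "filter": {
        "type": "boolean",
        "must": [
            {"type": "range", "field": "modifytime", "lower": "2015/09/28 00:00:00"},
            {"type": "prefix", "field": "value", "value": "a"},
        ],
    },
    # "sort" is a sibling of "filter" here, as in the first query.
    "sort": {"fields": [{"field": "modifytime", "reverse": True}]},
}

# None of the values contain a single quote, so no CQL escaping is needed here.
cql = (
    "SELECT modifytime,noteid,userid FROM epodb.notes "
    f"WHERE fts_index = '{json.dumps(search)}' LIMIT 20;"
)
print(cql)
```

If values could contain single quotes, they would need to be doubled before being embedded in the CQL string literal.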
I created this table:
CREATE TABLE IF NOT EXISTS flat3 (
id uuid,
title text,
description text,
creationDate timestamp,
lastModificationDate timestamp,
lucene TEXT,
PRIMARY KEY(id)
);
And I would like to create the following index (I'm using the 2.1.8.1 version):
CREATE CUSTOM INDEX flat3_index ON flat3 (lucene)
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
'refresh_seconds' : '10',
'schema' : '{
fields : {
id : {type : "uuid"},
title : {type : "text"},
description : {type : "text"},
lastModificationDate : {type : "date"}
}
}'
};
But I get the following error:
ErrorMessage code=2300 [Query invalid because of configuration issue] message="Error while validating Lucene index options: 'schema' is invalid : No column definition lastModificationDate for mapper lastModificationDate"
Any idea?
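One possible cause: Cassandra folds unquoted CQL identifiers to lower case, so the table above would hold a column named `lastmodificationdate`, while the index schema refers to `lastModificationDate`. Whether the mapper lookup is case-sensitive is an assumption based on the error message. The sketch below models only the folding rule; `cql_identifier` is a hypothetical helper, not a driver function.

```python
def cql_identifier(name: str) -> str:
    """Name under which a column is stored, given how it is written in CQL."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        # Quoted identifier: case is preserved, doubled quotes are unescaped.
        return name[1:-1].replace('""', '"')
    # Unquoted identifier: folded to lower case.
    return name.lower()


print(cql_identifier("lastModificationDate"))
print(cql_identifier('"lastModificationDate"'))
```

Under this reading, either declaring the mapper as `lastmodificationdate` or creating the column with a double-quoted name would make the two names agree.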
hi all,
Excuse me if this is not the right place to post my questions, but for lack of information about a more suitable website I am posting them here:
we are investigating different options that exist to run a relatively small 'big data' infrastructure for our customer, and are looking currently at the Stratio-platform offering .
The total amount of data that we need to process is roughly 5 TB.
We have budget for 6 servers, each with 16 dual-thread cores and 128 GB of RAM.
To be sure that your solution can be implemented within the restrictions of our customer's budget, security policies, and IT-departmental policies, we have the following questions:
Many thanks in advance for your swift replies ! Please reply on each question, feel free to redirect us for each individual question to detailed (not generic) answers somewhere on the web.
Rob
I'm loading config files into Cassandra and indexing them with Lucene.
A small column family with 4 fields and a config file of 15M gives no problem.
But with a column family that has more than 100 fields, one of which is a config of only 271K, I get an error: Can't index column value of size 140952 for index null?