mgaare / cumulusrdf

Automatically exported from code.google.com/p/cumulusrdf

cumulusrdf's Issues

the RepositoryConnection object cannot clear the repository with null context

What steps will reproduce the problem?
1. The HTTPRepository sends a clear request with a null context.
2. The RepositoryConnection obtained from the repository in the servletContext executes conn.clear(null).
3. The call fails with the message "not supported: contexts == null || contexts.length == 0".

What is the expected output? What do you see instead?
According to the Sesame API, if the context is null, the whole repository is cleared, so this operation should be supported.
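A minimal sketch of the expected behaviour, using a hypothetical in-memory `ContextStore` with string context names instead of the real Sesame `RepositoryConnection` and `Resource` types: a null or empty context array clears everything instead of throwing. A single null element still targets only the default (null) context, which is how this sketch models the vararg convention.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory store; contexts are plain strings, not Sesame Resources.
class ContextStore {
    // context name (null = default context) -> statements in that context
    private final Map<String, Set<String>> byContext = new HashMap<>();

    void add(String context, String statement) {
        byContext.computeIfAbsent(context, k -> new HashSet<>()).add(statement);
    }

    // Vararg style like RepositoryConnection.clear(Resource... contexts):
    // a null or empty array clears the whole store instead of being rejected.
    void clear(String... contexts) {
        if (contexts == null || contexts.length == 0) {
            byContext.clear();
            return;
        }
        for (String context : contexts) {
            byContext.remove(context); // a null element removes only the default context
        }
    }

    int size() {
        int total = 0;
        for (Set<String> statements : byContext.values()) {
            total += statements.size();
        }
        return total;
    }
}
```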

Original issue reported on code.google.com by [email protected] on 8 Apr 2014 at 12:51

SPARQLServlet fails to send error

SPARQLServlet does not send error properly. See attached stacktrace.

Original issue reported on code.google.com by andreas.josef.wagner on 24 Jan 2014 at 5:04

Attachments:

sparqlservlet-bug.txt

Simple keyword search

Simple keyword search: just a conjunction of terms tokenised from literals. 

  * Could be done using CQL collections: http://www.datastax.com/documentation/cql/3.0/webhelp/index.html#cql/cql_using/use_collections_c.html#useCollections
  * Lucene/Solr integration
    * Stargate: http://tuplejump.github.io/stargate/index.html //looks cool
    * Lucandra/Solandra: https://github.com/tjake/Solandra //not maintained
    * DataStax Enterprise Search (DSE) //not open-source
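One way to read "a conjunction of terms tokenised from literals" is a plain inverted index. The sketch below is an in-memory illustration under that assumption (the class name and tokenisation rule are hypothetical), not a design for the CQL-collections or Lucene options listed above. A query returns only the resources whose literals contain every query term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: inverted index from literal tokens to resource identifiers.
class KeywordIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lower-case the literal and split on anything that is not a letter or digit.
    static List<String> tokenize(String literal) {
        List<String> tokens = new ArrayList<>();
        for (String t : literal.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    void index(String resource, String literal) {
        for (String token : tokenize(literal)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(resource);
        }
    }

    // Conjunctive query: intersect the posting sets of all query terms.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> hits = postings.getOrDefault(term, Collections.<String>emptySet());
            if (result == null) {
                result = new HashSet<>(hits);
            } else {
                result.retainAll(hits);
            }
            if (result.isEmpty()) {
                break; // no resource can match all terms any more
            }
        }
        return result == null ? Collections.<String>emptySet() : result;
    }
}
```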

Original issue reported on code.google.com by andreas.josef.wagner on 12 Feb 2014 at 4:18

For proxy mode not all indexes are needed

What steps will reproduce the problem?
1. Loader creates 4 indexes but only CSPO would be needed for proxy mode

Original issue reported on code.google.com by [email protected] on 23 May 2012 at 10:12

Better documentation re configuration

The documentation is unclear.

The webapp can be configured either with a config file in /etc or with WEB-INF properties.

The client does not read the config file.

Possible solutions:
* improve documentation to make current setup clearer
* get rid of the client and also do loading via the webapp HTTP interface (so 
only the webapp needs to be configured); this should be possible with the 
current setup, as a thread buffers the input and thus can iterate over the 
in-memory buffer to build multiple indexes
* generate *.deb which installs webapp and config file (and log files with 
logrotate) in the right directories and with cassandra dependencies
* ?

Original issue reported on code.google.com by [email protected] on 2 May 2012 at 8:56

Timeout to connect to Cassandra too low

What steps will reproduce the problem?
1. Start Cassandra
2. Start Tomcat

What is the expected output? What do you see instead?

The CumulusRDF webapp should connect to Cassandra, but it fails because Cassandra is still booting up.
Increase the connection timeout (or retry connecting).
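A sketch of the "do retries" option: a generic helper, with hypothetical names, that retries a connection attempt with exponential backoff. The `Callable` stands for whatever actually opens the Hector/Cassandra connection; the attempt count and initial delay are assumptions to tune against Cassandra's boot time.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retry an attempt with exponentially growing delays.
final class Retry {
    private Retry() {}

    static <T> T withBackoff(Callable<T> attempt, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int i = 1; ; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (i == maxAttempts) {
                    throw e; // give up and rethrow the last failure
                }
                Thread.sleep(delay); // e.g. Cassandra still booting: wait before retrying
                delay *= 2; // double the wait after every failed attempt
            }
        }
    }
}
```

A caller would wrap the connection setup, for example `Retry.withBackoff(() -> connect(), 8, 500)`, where `connect()` is a placeholder for the real connection code.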

Original issue reported on code.google.com by [email protected] on 25 Jan 2013 at 2:21

Sort-merge join instead of nested-loop join

Currently, we use the standard nested-loop join (with index support) from 
Sesame. 

However, storing SPO-style indexes in sorted order is fairly easy in 
Cassandra (and already implemented to some extent). Thus, a sort-merge join could 
be implemented without much work. See, e.g., [1].

- Andreas

[1] http://www.informatik.uni-freiburg.de/~mschmidt/docs/sp2b_exp.pdf
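A minimal sketch of the sort-merge join idea, assuming both inputs are already sorted by the join variable (as a sorted index scan would deliver them) and using plain strings in place of CumulusRDF's node IDs; the class and method names are hypothetical. Runs of equal keys on both sides are combined as a cross product, which keeps the join correct when a subject has several matching triples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: sort-merge equi-join of two binding lists sorted by key.
final class MergeJoin {
    private MergeJoin() {}

    // One binding: the join key (e.g. the value of ?x) and one other value.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Both inputs must be sorted by key in String natural order.
    static List<String[]> join(List<Row> left, List<Row> right) {
        List<String[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++; // advance the side with the smaller key
            } else if (cmp > 0) {
                j++;
            } else {
                String key = left.get(i).key;
                // Find the end of the run of equal keys on each side.
                int iEnd = i;
                while (iEnd < left.size() && left.get(iEnd).key.equals(key)) iEnd++;
                int jEnd = j;
                while (jEnd < right.size() && right.get(jEnd).key.equals(key)) jEnd++;
                // Emit the cross product of the two runs: {key, leftValue, rightValue}.
                for (int a = i; a < iEnd; a++) {
                    for (int b = j; b < jEnd; b++) {
                        out.add(new String[] {key, left.get(a).value, right.get(b).value});
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }
}
```

Each input is read once, in order, so the cost is linear in the input sizes plus the output, instead of one index probe per left-hand binding as in the nested-loop join.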

Original issue reported on code.google.com by andreas.josef.wagner on 29 Jan 2014 at 9:14

Support transactions in Sesame

CumulusRDF does not support transactions through its Sesame interface; see the 
Sesame documentation: 
http://openrdf.callimachus.net/sesame/2.7/docs/users.docbook?view#section-repository-api6

Original issue reported on code.google.com by andreas.josef.wagner on 22 Nov 2013 at 12:29

Error message (compression-related) for empty results

What steps will reproduce the problem?
1. With empty results, the browser shows a wrong, compression-related error message.

What is the expected output? What do you see instead?

Empty results should be compressed correctly.

Original issue reported on code.google.com by [email protected] on 25 Jan 2013 at 2:22

Build errors in branch 1.0.1

What steps will reproduce the problem?
1. svn co https://cumulusrdf.googlecode.com/svn/branches/1.0.1 cumulusRDF
2. cd cumulusRDF
3. mvn clean install

Expected output is a build success but instead a build failure is reported.
Specifically, there are two problems:

1) cannot find symbol LRUMap

LRUMap (used, for example, in NodeDictionaryBase) comes from sesame-sail-rdbms. 
I'm not able to build the project using Maven because that jar is 
(indirectly) declared with "runtime" scope.

That means:

- in an Eclipse workspace everything works fine (no compilation errors) because 
m2e imports runtime jars into the build path (actually, it makes no distinction 
between scopes);
- running an m2e or Maven build fails because that dependency is not found 
at compile time.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
...
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project cumulusrdf: Compilation failure: Compilation failure:
[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/store/dict/NodeDictionaryBase.java:[13,34] package org.openrdf.sail.rdbms.util does not exist
[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/store/dict/NodeDictionaryBase.java:[47,9] cannot find symbol
...
[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/util/hector/CassandraHectorMap.java:[29,34] package org.openrdf.sail.rdbms.util does not exist
[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/util/hector/CassandraHectorMap.java:[118,9] cannot find symbol
[ERROR] symbol  : class LRUMap
[ERROR] location: class edu.kit.aifb.cumulus.util.hector.CassandraHectorMap<K,V>
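Besides declaring the sesame-sail-rdbms dependency with compile scope, another option is to drop the dependency for this class altogether. The sketch below is a JDK-only LRU map (the name `JdkLruMap` is hypothetical) built on `LinkedHashMap`'s access order and `removeEldestEntry`; it assumes `NodeDictionaryBase` and `CassandraHectorMap` only need a size-bounded, least-recently-used cache, which should be checked against how they actually use `LRUMap`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDK-only LRU cache that could stand in for Sesame's LRUMap.
class JdkLruMap<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    JdkLruMap(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the tail
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently accessed entry
    }
}
```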

2) RestServletPojoTest

This class, which is in the src/test folder, is referenced in a @see comment 
in RESTApplicationResource (line 483), which belongs to the src/main folder. 
As a consequence, RESTApplicationResource imports a class that belongs to the 
tests, which are not visible during the build. 

This is not immediately visible in an IDE (e.g. Eclipse), where there are no 
compilation errors, but running an m2e or Maven build I get:

[ERROR] /home/agazzarini/workspaces/cumulusRDF/cumulusrdf/src/main/java/edu/kit/aifb/cumulus/webapp/rest/RESTApplicationResource.java:[44,34] cannot find symbol
[ERROR] symbol  : class RestServletPojoTest
[ERROR] location: package edu.kit.aifb.cumulus.webapp

Original issue reported on code.google.com by [email protected] on 23 Jan 2014 at 10:31

TTL (Time to live) support

Time to live for added data, to be able to use CumulusRDF as a buffer for 
streams (e.g., always keep one year's worth of data of a given stream).

Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 8:22

Add a checkstyle configuration for coding convention checks

We will provide the Checkstyle configuration directly in the project. It 
could be used by Maven for continuous integration builds and by developers in 
Eclipse (see the dedicated wiki page on how to configure it).

Original issue reported on code.google.com by [email protected] on 4 Feb 2014 at 10:35

Switch to Sesame API 2.7.10 (BNode prefix issue)

As discussed here 

https://groups.google.com/forum/#!topic/cumulusrdf-dev-list/1oW1mhOSHRY

we should move to Sesame API 2.7.10, which solves the BNode prefix issue.

Original issue reported on code.google.com by [email protected] on 4 Feb 2014 at 11:35

Remove unnecessary sesame dependencies

We currently have 

<groupId>org.openrdf.sesame</groupId>
<artifactId>sesame-runtime</artifactId>

in our pom. This adds (almost) all Sesame libraries, regardless of whether 
they are needed. TODO: remove unnecessary Sesame dependencies. This would make 
the jar/war more lightweight in terms of space.

Original issue reported on code.google.com by andreas.josef.wagner on 24 Jan 2014 at 8:24

Deletion performance can be very bad

Deleting from the store triggers, for every triple, a separate check for 
deletion from a secondary index. Using a hash table or sorted tree as a buffer 
would improve performance here.
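A sketch of the sorted-tree buffer idea, with hypothetical names: deletions are collected in a `TreeSet`, so duplicates collapse and keys come out in sorted order, and are handed to the secondary index in batches. The `Consumer` stands in for the real index deletion call and the string keys for encoded triples; both are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Consumer;

// Hypothetical sketch: buffer deletions and apply them to the index in sorted batches.
class DeletionBuffer {
    private final SortedSet<String> pending = new TreeSet<>();
    private final int batchSize;
    private final Consumer<List<String>> batchDeleter; // placeholder for the index deletion call

    DeletionBuffer(int batchSize, Consumer<List<String>> batchDeleter) {
        this.batchSize = batchSize;
        this.batchDeleter = batchDeleter;
    }

    void delete(String tripleKey) {
        pending.add(tripleKey); // duplicate deletions collapse into one entry
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Hand all buffered keys, in sorted order, to the index in one call.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        batchDeleter.accept(new ArrayList<>(pending));
        pending.clear();
    }
}
```

Sorted batches also suit an index that is stored in sorted order, since neighbouring keys can be handled together; `flush()` must still be called at the end of a delete operation so that a partial batch is not lost.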

Original issue reported on code.google.com by [email protected] on 11 Feb 2014 at 11:02

Error in build 20

00:37:45,114 ERROR [edu.kit.aifb.cumulus.util.hector.CassandraHectorCounterFactory] counter: TRIPLE_COUNTER suffered an overflow! current counter value: -3

Original issue reported on code.google.com by andreas.josef.wagner on 3 Mar 2014 at 2:01

Dictionary Performance

SimpleCassandraMapDictionary has terrible performance. This, in turn, leads to 
bad performance for RDF insert operations.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 8:32

Provide dump URI or CLI functionality

We need access to all the data, either via a dump CLI, the HTTP interface, or both.

Original issue reported on code.google.com by [email protected] on 8 Feb 2013 at 12:09

License unclear

Cannot find the license.

Original issue reported on code.google.com by [email protected] on 9 Oct 2013 at 4:07

Test failures in build 23

Lots of test failures after running the whole suite with the new Asynch Bulk loader.

See http://dev.aifb.kit.edu/jenkins/job/CumulusRDF-Milestone-v1.1/lastBuild/testReport/

Original issue reported on code.google.com by [email protected] on 5 Mar 2014 at 4:29

LoadCLI does not support multithreading any more ...

LoadCLI does not support multithreading any more. It simply uses Sesame to add 
the file. This is not the intended way LoadCLI should work.

Original issue reported on code.google.com by andreas.josef.wagner on 13 Mar 2014 at 11:03

Sparql test cases

As briefly discussed with Andreas, I would like to create a whole SPARQL test 
suite that covers as many scenarios as possible.

To do that, we could use the examples (both .ttl and .rq files) from the book 
"Learning SPARQL" [1]. I asked the author and he is OK with it; I'm waiting 
for permission from O'Reilly.

So we will create a test case with several test methods that use and assert the 
examples in the book.

If O'Reilly doesn't allow such usage, I'll use those examples as a basis for 
our own set of data files.

[1] http://www.learningsparql.com/

Original issue reported on code.google.com by [email protected] on 19 Feb 2014 at 2:42

Error in build 20

00:38:06,669 ERROR [edu.kit.aifb.cumulus.store.CassandraRdfHectorTriple] caught java.lang.ArithmeticException: / by zero while inserting 0 [0, tries left: 10]
java.lang.ArithmeticException: / by zero
    at com.ecyrd.speed4j.StopWatch.toString(StopWatch.java:258)
    at edu.kit.aifb.cumulus.util.Util.logAndStopTimer(Util.java:245)
    at edu.kit.aifb.cumulus.util.Util.logAndStopTimer(Util.java:218)
    at edu.kit.aifb.cumulus.store.CassandraRdfHectorTriple.batchInsert(CassandraRdfHectorTriple.java:419)

Original issue reported on code.google.com by andreas.josef.wagner on 3 Mar 2014 at 1:59

AbstractCassandraRdfStore.close() doesn't shutdown bulk load workers

The close() method of edu.kit.aifb.cumulus.store.AbstractCassandraRdfStore must 
stop the internal worker pool (as its last step); otherwise the workers prevent 
a proper shutdown of the system.
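A minimal sketch of the requested `close()` ordering on a hypothetical `BulkLoadingStore` (not the real `AbstractCassandraRdfStore`): other resources are released first, then the worker pool is shut down and awaited. Threads created by `Executors.newFixedThreadPool` are non-daemon, so a pool that is never shut down keeps the JVM alive; the pool size and the 30-second grace period are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical store whose close() shuts down its bulk-load workers as the last step.
class BulkLoadingStore implements AutoCloseable {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    void submitLoad(Runnable task) {
        workers.submit(task);
    }

    @Override
    public void close() {
        // ... release other resources (connections, caches) here first ...
        workers.shutdown(); // stop accepting new tasks; already queued tasks still run
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                workers.shutdownNow(); // interrupt workers that did not finish in time
            }
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    boolean isTerminated() {
        return workers.isTerminated();
    }
}
```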

Original issue reported on code.google.com by [email protected] on 2 Apr 2014 at 6:21

Bad link on Project Home

This is just a little problem with the home page of the software, rather than 
the software itself. 

What steps will reproduce the problem?
1. Go to Project Home (https://code.google.com/p/cumulusrdf/)
2. Under overview, click on the link to Apache Cassandra
3. You will be redirected to the dead link http://casssandra.apache.org/ 
("cassandra" spelled with three s's).

What is the expected output? What do you see instead?

I suppose it should be http://cassandra.apache.org/ (with two s's).

Original issue reported on code.google.com by [email protected] on 30 Oct 2013 at 1:16

CumulusRDF webapp GUI

Although this is not a real priority for CumulusRDF, I believe we should create 
a nicer (simple) GUI for the web pages.

In order to keep things simple, lightweight and fast, I suggest using

- Bootstrap [1] for the graphical side: there's a dashboard [2] sample page that 
should fit our needs perfectly;
- Velocity [3] for dynamic pages: it has a very easy and powerful scripting 
language.

In this way we could, at least, replace the info and welcome pages with a 
more attractive dashboard. On top of that, we could gradually add further 
functionality to the sidebar, as the Sesame admin console does 
(e.g. summary, reports, export, add data, query, explore, remove data, SPARQL 
query & update).

[1] http://getbootstrap.com/
[2] http://getbootstrap.com/examples/dashboard
[3] http://velocity.apache.org/

Original issue reported on code.google.com by [email protected] on 16 Feb 2014 at 4:24

CRUDServlet.Put assumes objects are URI

What steps will reproduce the problem?
Send a PUT request with 

s=<http://a.b.c#d>
p=<http://a.b.c#e>
o="A literal"

s2=<http://a.b.c#d>
p2=<http://a.b.c#e>
o2="Another literal"

What is the expected output? What do you see instead?
I would expect the following triple in the store:

<http://a.b.c#d> <http://a.b.c#e> "Another literal"

Instead, the servlet throws an exception because the object is always assumed 
to be a valid URI (i.e. the line URI o = valueFactory.createURI("A literal") 
fails).
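A sketch of the check the servlet seems to need before calling the value factory, assuming objects arrive in N-Triples-like form (`<...>` for URIs, `"..."` for literals) as in the request above; `ObjectTerm` is a hypothetical helper. In a Sesame 2.x servlet, `Kind.URI` would map to `valueFactory.createURI(value)` and `Kind.LITERAL` to `valueFactory.createLiteral(value)`. Language tags and datatypes are not handled here.

```java
// Hypothetical sketch: classify a request parameter as URI or literal
// instead of always creating a URI from it.
final class ObjectTerm {
    enum Kind { URI, LITERAL }

    final Kind kind;
    final String value;

    private ObjectTerm(Kind kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    static ObjectTerm parse(String raw) {
        String s = raw.trim();
        if (s.length() >= 2 && s.startsWith("<") && s.endsWith(">")) {
            return new ObjectTerm(Kind.URI, s.substring(1, s.length() - 1));
        }
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return new ObjectTerm(Kind.LITERAL, s.substring(1, s.length() - 1));
        }
        // Anything else is treated as a plain literal rather than rejected.
        return new ObjectTerm(Kind.LITERAL, s);
    }
}
```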

Original issue reported on code.google.com by [email protected] on 5 Feb 2014 at 3:40

Remove CompositeColumns wherever possible

As discussed in [1], we could remove the CompositeColumns in favor of simple 
byte arrays (byte array concatenations).

[1] https://groups.google.com/d/msgid/cumulusrdf-dev-list/52FCE956.3040606%40gmail.com
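A sketch of what replacing a composite column name with a byte array concatenation could look like, assuming each component is prefixed with its 2-byte length so that the key can be split again; `ByteKeys` and this encoding are illustrative, not CumulusRDF's actual format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: length-prefixed concatenation of key components.
final class ByteKeys {
    private ByteKeys() {}

    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) {
            if (p.length > 0xFFFF) {
                throw new IllegalArgumentException("component longer than 65535 bytes");
            }
            total += 2 + p.length; // 2-byte length prefix + payload
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] p : parts) {
            buf.putShort((short) p.length);
            buf.put(p);
        }
        return buf.array();
    }

    static List<byte[]> split(byte[] key) {
        List<byte[]> parts = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.wrap(key);
        while (buf.hasRemaining()) {
            int len = buf.getShort() & 0xFFFF; // read the prefix as unsigned
            byte[] p = new byte[len];
            buf.get(p);
            parts.add(p);
        }
        return parts;
    }
}
```

One trade-off to weigh: because the length prefix comes first, byte-wise comparison sorts keys by the length of the first component before its content. If range scans rely on lexicographic order, fixed-width components (for example numeric IDs of constant size) keep the byte order meaningful.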

Original issue reported on code.google.com by andreas.josef.wagner on 16 Feb 2014 at 1:01

Better selectivity estimation

Better selectivity estimation, i.e., collect meaningful statistics for, e.g., 
triple patterns and join patterns.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Nov 2013 at 12:21

Evaluate entity queries via a single scan

Currently, entity queries, e.g., 

?x knows ?y .
?x name "x" .
?x age "18" .

are evaluated via joins along their subject (in the above example: ?x). That 
is, one would need to compute bindings for each triple pattern and join them 
using two equi-joins. 

However, this (probably) could be done much more efficiently with a single 
scan. That is, one would start with a scan of the pattern with the least 
matches (e.g., ?x age "18"):

x1 age "18" --> scan for x1 ?p ?o
x2 age "18" --> scan for x2 ?p ?o
x3 age "18" --> scan for x3 ?p ?o
...

Each such scan (x1 ?p ?o) would result in additional property/object pairs; 
these could be pushed to subsequent triple pattern accesses. For instance, 

the scan x1 ?p ?o could find x1 knows y1, x1 knows y2, x1 name "x", ... The 
first two triples could be pushed to the access for ?x knows ?y, and the last 
one (x1 name "x") to the access for ?x name "x".

The key advantage is really that scans (sorted accesses) are fairly cheap in 
comparison to random access probes. Thus, when finding the first potential 
result entity (e.g., x1), we could just scan over (all) its associated triples 
...

- Andreas
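A sketch of the single-scan idea for subject-star queries, under simplifying assumptions: `spo` stands in for the sorted `s ?p ?o` scans, `candidates` for the matches of the most selective pattern (e.g. `?x age "18"`), and a pattern with a null object plays the role of `?x knows ?y`. All names are hypothetical. Each candidate's triples are read once and routed to every pattern they match; the result keeps, per subject, the objects matched by each pattern rather than the flattened cross product of bindings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: evaluate ?x p1 o1 . ?x p2 o2 ... with one scan per candidate subject.
final class StarQuery {
    private StarQuery() {}

    // Pattern "?x predicate object"; object == null means the object is a variable.
    static final class Pattern {
        final String predicate;
        final String object;

        Pattern(String predicate, String object) {
            this.predicate = predicate;
            this.object = object;
        }
    }

    static Map<String, List<List<String>>> evaluate(Map<String, List<String[]>> spo,
                                                    Collection<String> candidates,
                                                    List<Pattern> patterns) {
        Map<String, List<List<String>>> result = new LinkedHashMap<>();
        for (String s : candidates) {
            List<List<String>> matches = new ArrayList<>();
            for (int k = 0; k < patterns.size(); k++) {
                matches.add(new ArrayList<>());
            }
            // Single scan over all (predicate, object) pairs of subject s.
            for (String[] po : spo.getOrDefault(s, Collections.<String[]>emptyList())) {
                for (int k = 0; k < patterns.size(); k++) {
                    Pattern pat = patterns.get(k);
                    if (pat.predicate.equals(po[0]) && (pat.object == null || pat.object.equals(po[1]))) {
                        matches.get(k).add(po[1]); // route the triple to this pattern
                    }
                }
            }
            // s is an answer only if every pattern matched at least one of its triples.
            boolean all = true;
            for (List<String> m : matches) {
                if (m.isEmpty()) {
                    all = false;
                    break;
                }
            }
            if (all) {
                result.put(s, matches);
            }
        }
        return result;
    }
}
```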

Original issue reported on code.google.com by andreas.josef.wagner on 29 Jan 2014 at 9:33

Support further RDF serializations

Support further RDF serializations, e.g., JSON-LD, Turtle, etc. These 
serializations could be used, e.g., in

* Dump CLI
* Loader CLI
* Servlets

Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 8:50

SimpleRDFXMLFormatter escapes Literals too much

What steps will reproduce the problem?
1. Serve data containing a Literal with a space using the SimpleRDFXMLFormatter.

What is the expected output? What do you see instead?
Expected output is something like >Luiz Felipe<
Instead we get >Luiz+Felipe<

The reason is that the same escape function is used for Literals and Resources.
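A sketch of the fix implied by the diagnosis, keeping two separate escape functions. The names are hypothetical, and the assumption that resources are URL-encoded is inferred from the `+` in the symptom (`URLEncoder` turns spaces into `+`). Literal text in an RDF/XML element body only needs XML character escaping, so spaces survive.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch: separate escaping for literal text and for resources.
final class RdfXmlEscaping {
    private RdfXmlEscaping() {}

    // XML-escape literal text for an element body; spaces are left untouched.
    static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // URL-encode a resource segment; this form-style encoding turns spaces into '+'.
    static String encodeResource(String segment) {
        try {
            return URLEncoder.encode(segment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```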

Original issue reported on code.google.com by [email protected] on 23 May 2012 at 7:46

Loading large files gives timeouts

What steps will reproduce the problem?
1. Load a large file (more than 2 million triples).
2. You will see timeout messages.

What is the expected output? What do you see instead?

Higher timeouts, perhaps combined with slowing down the input.

Original issue reported on code.google.com by [email protected] on 25 Jan 2013 at 2:23

Remove dependencies on NxParser and Yars

Remove dependencies on NxParser and Yars; only use Sesame 
model/parsers/writers.
model/parsers/writers.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Nov 2013 at 12:18

Upgrade to Sesame 2.7.11

Upgrade to Sesame 2.7.11, see [1].

[1] https://openrdf.atlassian.net/browse/SES/fixforversion/11701

Original issue reported on code.google.com by andreas.josef.wagner on 14 Apr 2014 at 11:14

Add HTTPRepository

Implement a Sesame HTTPRepository. See:

* org.openrdf.http.client.HTTPClient
* http://answers.semanticweb.com/questions/22068/exposing-a-triple-store-as-a-sesame-http-repository

Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 9:04

Logging framework + message catalog

Two enhancements are included in this issue:

1) Refactor the code to use a more flexible and faster logging framework 
(log4j or logback). At the moment JULI is used, which is optimized for Tomcat 
but relies on standard java.util.logging, which offers a limited set of 
capabilities. 

2) A message catalog, consisting of an enumerative interface 
(IMessageCatalog) where all CumulusRDF messages are defined. That would allow a 
structured log with (for example) messages like this:

...
2014-01-15 17:05:42,105 INFO  <CRDF-00011> : CUMULUS-RDF 1.0.0 open for e-business.
...

As you can see, besides having all messages classified, we could associate a 
code with each message and, for relevant messages (e.g. errors), create a 
wiki page with something like:

- Code: CRDF-000034
- Level: ERROR
- Message: Malformed configuration file.
- Suggested action: check your configuration file ...

I know that would require more effort each time we need to write an 
additional log message, but at the same time it would provide a very powerful 
and meaningful logging subsystem.
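A sketch of the proposed catalog, written as a Java enum rather than an interface so that code, level and text stay together; the constant names are hypothetical, and the codes and texts are adapted from the examples in this issue. The output of `format` does not depend on a logging framework, so it could be passed to a log4j or SLF4J/logback logger, e.g. `log.info(MessageCatalog.STARTUP.format("1.0.0"))`.

```java
// Hypothetical message catalog: every message has a stable code, a level and a text template.
enum MessageCatalog {
    STARTUP("CRDF-00011", "INFO", "CUMULUS-RDF %s open for e-business."),
    MALFORMED_CONFIGURATION("CRDF-000034", "ERROR", "Malformed configuration file.");

    private final String code;
    private final String level;
    private final String template;

    MessageCatalog(String code, String level, String template) {
        this.code = code;
        this.level = level;
        this.template = template;
    }

    String code() {
        return code;
    }

    String level() {
        return level;
    }

    // Renders "<CODE> : text", the shape of the sample log line above.
    String format(Object... args) {
        return "<" + code + "> : " + String.format(template, args);
    }
}
```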

Original issue reported on code.google.com by [email protected] on 27 Jan 2014 at 10:11

Output complete URI in error msg

What steps will reproduce the problem?
1. Proxy Mode
2. curl -x http://localhost:8080 -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Karlsruhe
ERROR /resource/Karlsruhe 404: resource not found

The error message should contain the full URI, including the host part.

Original issue reported on code.google.com by [email protected] on 2 May 2012 at 8:36

NodeDictionaryBase fails to create (datatyped) literals

NodeDictionaryBase (line 136) creates literals assuming the N3 string contains 
only the value (no language, no datatype). 
For example, given

n3 = "2012-02-01T09:53:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>

creating the Literal instance with

Literal l = valueFactory.createLiteral(n3);

leads to a wrong value because the datatype (and language) part is treated as 
part of the value. That is, a new Literal is created with the following value:

""2012-02-01T09:53:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>"

What is the expected output? What do you see instead?
I would expect a Literal correctly created, with value, language and datatype.

This blocks a lot of unit tests, because the system indexes the triples but is 
not able to return them correctly as part of a SELECT or DESCRIBE query.
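A sketch of parsing the N3 string into its parts before creating the literal; `N3Literal` is hypothetical and the regular expression covers only the three forms discussed here (plain, `@lang`, `^^<datatype>`), leaving escape sequences in the label as they are. With Sesame 2.x, the parts could then go to `valueFactory.createLiteral(label, valueFactory.createURI(datatype))` or `valueFactory.createLiteral(label, language)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: split an N3 literal into label, language tag and datatype.
final class N3Literal {
    // "label" optionally followed by @lang or ^^<datatype>; \x escapes are allowed in the label.
    private static final Pattern LITERAL = Pattern.compile(
            "^\"((?:[^\"\\\\]|\\\\.)*)\"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*)|\\^\\^<([^>]*)>)?$");

    final String label;
    final String language; // null if absent
    final String datatype; // null if absent

    private N3Literal(String label, String language, String datatype) {
        this.label = label;
        this.language = language;
        this.datatype = datatype;
    }

    static N3Literal parse(String n3) {
        Matcher m = LITERAL.matcher(n3);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an N3 literal: " + n3);
        }
        // Groups that did not participate in the match are returned as null.
        return new N3Literal(m.group(1), m.group(2), m.group(3));
    }
}
```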

Original issue reported on code.google.com by [email protected] on 2 Feb 2014 at 9:36

Make proper shell based on our plain CLI

Make a proper shell based on our plain CLI functionality. 

See also: 
* http://stackoverflow.com/questions/14080604/libraries-for-constructing-an-interactive-shell-for-java-application
* http://java.dzone.com/announcements/clamshell-cli-framework

Original issue reported on code.google.com by andreas.josef.wagner on 14 Mar 2014 at 12:10

Thread Pooling in batchBulkLoad

LoadThread instances should be managed in a pool instead of creating new 
threads for each bulk load.

Original issue reported on code.google.com by [email protected] on 26 Feb 2014 at 12:09

Upgrade to CQL

Switch from Hector thrift client to Datastax CQL client.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Jan 2014 at 8:24

New (Maven) project layout

As discussed here [1], in order to enable several perspectives of the project 
test suite, we should change the project layout a bit. The layout that emerged 
from the initial discussion [1] looks something like this:

cumulusrdf
--cumulusrdf-kernel
--cumulusrdf-integration-tests
--cumulusrdf-benchmark
--??

Where 

a) cumulusrdf: a top level project with pom packaging
b) cumulusrdf-kernel: please suggest a more appropriate name :), this is the 
current cumulusrdf module (war packaging). It includes sources and unit tests.
c) cumulusrdf-integration-tests: as the name suggests, this module includes 
only integration / system tests
d) cumulusrdf-benchmark: a special test module dedicated to benchmarking the 
corresponding release artifact

Another interesting module could be a "distribution" module that uses the Maven 
Assembly Plugin to produce different kinds of artifacts (e.g. onejar, war, 
directory).


[1] https://groups.google.com/forum/#!topicsearchin/cumulusrdf-dev-list/maven|sort:date|spell:true/cumulusrdf-dev-list/z3JegSK17gY

Original issue reported on code.google.com by [email protected] on 16 Feb 2014 at 4:15

Support for Cassandra 2.x

CumulusRDF currently only supports Cassandra 1.x. Add support for Cassandra 2.x.

Original issue reported on code.google.com by andreas.josef.wagner on 22 Nov 2013 at 12:12

$pageName on load page

The literal text "$pageName" appears on the load page of the web GUI after a successful upload.

Original issue reported on code.google.com by andreas.josef.wagner on 7 Mar 2014 at 2:28

Replace Value [] with SesameStatement

As discussed here

https://groups.google.com/forum/#!topic/cumulusrdf-dev-list/wRZ-2coKPs0

Value arrays will be replaced by SesameStatement(s)

Original issue reported on code.google.com by [email protected] on 2 Feb 2014 at 3:21

Defect in CLI Loader

CLI Loader does not load data ...

Original issue reported on code.google.com by andreas.josef.wagner on 11 Dec 2013 at 1:42

Complex Accept header parsing does not work

What steps will reproduce the problem?
1. access a CumulusRDF URI with a complex accept header (e.g., using multiple 
content types with preferences)
2. problem

What is the expected output? What do you see instead?

The client should get the correctly negotiated format.
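A simplified sketch of Accept header negotiation (RFC 7231, section 5.3.2) with hypothetical names: each media range is matched against the supported formats, including `*/*` and `type/*` wildcards, and the supported type with the highest q-value wins. It deliberately ignores the rule that a more specific range overrides a wildcard's q-value, as well as other media type parameters.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: choose the supported media type with the highest q-value.
final class AcceptNegotiator {
    private AcceptNegotiator() {}

    // Returns null if nothing is acceptable (a server could then answer 406).
    static String negotiate(String acceptHeader, List<String> supported) {
        String best = null;
        double bestQ = 0.0; // q=0 means "not acceptable", so it can never win
        for (String range : acceptHeader.split(",")) {
            String[] params = range.trim().split(";");
            String type = params[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // default quality when no q parameter is given
            for (int i = 1; i < params.length; i++) {
                String p = params[i].trim();
                if (p.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(p.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat malformed q-values as not acceptable
                    }
                }
            }
            for (String candidate : supported) {
                if (matches(type, candidate) && q > bestQ) {
                    best = candidate;
                    bestQ = q;
                }
            }
        }
        return best;
    }

    // "*/*" matches everything; "text/*" matches any text subtype.
    private static boolean matches(String range, String candidate) {
        if (range.equals("*/*")) {
            return true;
        }
        if (range.endsWith("/*")) {
            return candidate.startsWith(range.substring(0, range.length() - 1));
        }
        return range.equals(candidate);
    }
}
```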

Original issue reported on code.google.com by [email protected] on 3 Feb 2013 at 3:34

Running several cassandra-unit tests is not possible

What steps will reproduce the problem?
1. Run more than one unit test that uses cassandra-unit for starting Cassandra

What is the expected output? What do you see instead?
While I expect all tests to run correctly, only the first one succeeds: from 
the second one onwards, the embedded Cassandra complains about a duplicate 
index. This seems to be related to cassandra-unit, which doesn't provide a way 
to shut down the embedded instance between tests.

Original issue reported on code.google.com by [email protected] on 31 Jan 2014 at 2:21

Evaluation: Composites vs Byte arrays

This is not really a bug. Instead, as discussed here

https://groups.google.com/forum/#!topic/cumulusrdf-dev-list/vOKdDAXJEqg

We could run some benchmarks/tests to see whether we really need Composites. 

We are already working with the low-level form of serialization (byte arrays), 
so maybe the abstraction and the "complexity" offered by Composites could be 
avoided.

Original issue reported on code.google.com by [email protected] on 16 Feb 2014 at 2:25

Merged into: #36
