trellis-ldp-archive / trellis-cassandra
Trellis LDP using Apache Cassandra for persistence
License: Other
To avoid potential inter-front-end conflicts, it's better to use C*'s timeuuid type instead of a simple timestamp.
To prevent slippage between asynchronous calls to C* and to simplify RDF management, we will change the workflow to buffer RDF on resource retrieval (on ResourceService::get) instead of spooling it (on Resource::stream).
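A minimal sketch of the buffer-on-retrieval idea, assuming triples can be modelled as plain strings. The class `BufferedResource` and its methods are hypothetical illustrations, not the Trellis `Resource` API: the backing rows are drained once when the resource is fetched, and later calls to `stream()` are served from memory.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a resource whose RDF is buffered at retrieval time.
final class BufferedResource {
    private final List<String> triples;

    private BufferedResource(List<String> triples) {
        this.triples = triples;
    }

    // Drain the row-backed source exactly once, when the resource is fetched.
    static BufferedResource get(Supplier<Stream<String>> rows) {
        try (Stream<String> source = rows.get()) {
            List<String> buffered = source.collect(Collectors.toList());
            return new BufferedResource(Collections.unmodifiableList(buffered));
        }
    }

    // Every call replays the in-memory copy; the database is not consulted again.
    Stream<String> stream() {
        return triples.stream();
    }
}
```

The trade-off is memory on the front end: a resource with a very large graph is held in full rather than spooled.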
Currently, the Cassandra consistency level is set only by server configuration. It should also be possible to vary it on a per-request basis, using an HTTP header or another mechanism.
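One way this could look, as a sketch only: the header name `X-Cassandra-Consistency` and the `Consistency` enum below are assumptions made for illustration (the enum stands in for the driver's own consistency type). An absent or unrecognized header value falls back to the configured default.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical enum standing in for the driver's consistency levels.
enum Consistency { ONE, QUORUM, LOCAL_QUORUM, ALL }

final class RequestConsistency {
    // Hypothetical header name; not defined by Trellis or by any standard.
    static final String HEADER = "X-Cassandra-Consistency";

    private RequestConsistency() { }

    // Use the level named in the header, else the server-configured default.
    static Consistency resolve(String headerValue, Consistency configuredDefault) {
        return Optional.ofNullable(headerValue)
                .map(value -> value.trim().toUpperCase(Locale.ROOT))
                .flatMap(RequestConsistency::parse)
                .orElse(configuredDefault);
    }

    private static Optional<Consistency> parse(String name) {
        try {
            return Optional.of(Consistency.valueOf(name));
        } catch (IllegalArgumentException e) {
            // Unknown level names are ignored rather than failing the request.
            return Optional.empty();
        }
    }
}
```

Silently ignoring an unknown value is a design choice; rejecting the request with a client error would be an equally reasonable alternative.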
Several new versions of the Cassandra driver have been released since we selected 3.6. Upgrade to the current best choice.
Thorntail startup succeeds, but I don't see Trellis start or any Trellis-related errors. I am running as follows:
the build:
$ mvn -P docker clean package
then within webapp/target:
$ java -jar webapp-0.0.1-SNAPSHOT-hollow-thorntail.jar webapp-0.0.1-SNAPSHOT.war
Console output: https://gist.github.com/gregjan/21c1824bbc2ba71675ec1573f0db410b
@gregjan, question for you--
The current query for checking whether a binary exists is:
SELECT identifier FROM Binarydata WHERE identifier = ? and chunk_index = 0;
Am I right in thinking that the and chunk_index = 0 is unnecessary and can be removed? If there is any chunk, even if it isn't the first, we can answer the question with a "yes, this binary exists". Right?
Travis-CI has a painfully outdated Cassandra service. Circle CI may be better, or there may be other alternatives.
Use threadpooling for all asynchronous activities.
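As a rough illustration, assuming a fixed-size pool is acceptable: the sketch below passes an explicit `ExecutorService` to every asynchronous stage instead of relying on the common fork-join pool. The class name `PooledAsync` and the placeholder work inside `load` are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper that runs all asynchronous stages on one dedicated pool.
final class PooledAsync {
    private final ExecutorService pool;

    PooledAsync(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Both the initial task and the follow-up transformation use the same pool.
    CompletableFuture<String> load(String id) {
        return CompletableFuture
                .supplyAsync(() -> "resource:" + id, pool)   // placeholder for a database call
                .thenApplyAsync(String::toUpperCase, pool);  // placeholder for post-processing
    }

    void shutdown() {
        pool.shutdown();
    }
}
```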
Blocked by trellis-ldp/trellis#324
Based on feedback from @gregjan et al., it would be more appropriate to fail fast on startup in the absence of a Cassandra connection, rather than waiting for it to arrive.
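A minimal sketch of failing fast, under the assumption that the connection attempt can be expressed as a `Callable`. `FailFast.connectOrFail` is a hypothetical helper, not part of Trellis or the Cassandra driver: it gives the attempt a bounded amount of time and throws if no connection is available.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical startup helper: one bounded connection attempt, no retry loop.
final class FailFast {
    private FailFast() { }

    static <T> T connectOrFail(Callable<T> connector, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> attempt = executor.submit(connector);
        try {
            return attempt.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Abort startup instead of waiting for the database to appear.
            attempt.cancel(true);
            throw new IllegalStateException("No Cassandra connection at startup", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while connecting", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```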
CassandraResourceService and CassandraBinaryService must support Memento action from the HTTP layer. This implies a new CassandraMementoService and corresponding schema changes.
After trellis-ldp/trellis@db1792d and 47cf68c, it should be possible to offer an HTTP header that overrides the default (configured) chunk length for storing binaries.
Using Travis-CI is currently blocked by travis-ci/travis-ci#6420. When Travis upgrades to a more recent version of C*, we can use it.
While it's necessary to have a chunk length in hand to write a bitstream, it's not clear that it is necessary to read one. If not, the configured value should not be used, to provide the future possibility of varying it more dynamically.
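The asymmetry can be illustrated with an in-memory sketch, assuming chunks come back ordered by their index. The `Chunks` class is hypothetical and does not reflect the actual `CassandraBinaryService` code: writing needs a chunk length, while reading only concatenates whatever chunks exist.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory model of chunked binary storage.
final class Chunks {
    private Chunks() { }

    // Writing: the chunk length decides where each chunk boundary falls.
    static List<byte[]> split(byte[] data, int chunkLength) {
        if (chunkLength <= 0) {
            throw new IllegalArgumentException("chunkLength must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkLength) {
            int end = Math.min(data.length, offset + chunkLength);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    // Reading: no chunk length is needed, only the chunks in index order.
    static byte[] join(List<byte[]> chunksInIndexOrder) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunksInIndexOrder) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Because `join` never consults the chunk length, binaries written with different lengths can be read back by the same code path.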
The .distinct() stream method will create buffering on the front-end nodes that may be significant for containers with many modifications/mementos. Instead we can use CQL, perhaps adding a "LIMIT 1" to the query in question. Since any row will suffice to establish the contains relationship and since the containment table is partitioned by container id, it should work.
trellis-cassandra should support the use of Direct Containers.
While exploring my populated test repository, I encountered a 500 when trying to get back the non-RDF resources. I am seeing similar behavior as below for all such resources. Here is the example session:
jansen@X1:~$ curl -v http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090
* Trying 128.8.216.153...
* TCP_NODELAY set
* Connected to ciber-vs1.umd.edu (128.8.216.153) port 10080 (#0)
> GET /srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090 HTTP/1.1
> Host: ciber-vs1.umd.edu:10080
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Link-Template: <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090{?version}>; rel="http://mementoweb.org/ns#Memento"
< Link-Template: <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090{?subject,predicate,object}>; rel="http://www.w3.org/ns/ldp#RDFSource"
< Accept-Patch: application/sparql-update
< Date: Tue, 12 Mar 2019 17:26:03 GMT
< Allow: GET,HEAD,OPTIONS,PATCH,PUT,DELETE,POST
< Connection: keep-alive
< ETag: W/"0a56b5f371e277b53e8d1e686148d9c2"
< Last-Modified: Tue, 12 Mar 2019 16:44:29 GMT
< Vary: Accept
< Vary: Prefer
< Vary: Accept-Datetime
< Accept-Post: text/turtle,application/ld+json,application/n-triples
< Transfer-Encoding: chunked
< Content-Type: text/turtle;charset=UTF-8
< Link: <http://www.w3.org/ns/ldp#BasicContainer>; rel="type"
< Link: <http://www.w3.org/ns/ldp#Container>; rel="type"
< Link: <http://www.w3.org/ns/ldp#RDFSource>; rel="type"
< Link: <http://www.w3.org/ns/ldp#Resource>; rel="type"
< Link: <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090>; rel="original timegate"
< Link: <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090?ext=timemap>; rel="timemap"; from="Tue, 12 Mar 2019 16:44:29 GMT"; until="Tue, 12 Mar 2019 16:44:29 GMT"; type="application/link-format"
< Link: <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090?version=1552409069>; rel="memento"; datetime="Tue, 12 Mar 2019 16:44:29 GMT"
< Link: <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090>; rel="self"
<
<http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090>
<http://purl.org/dc/terms/title> "7090" ;
<http://purl.org/dc/terms/extent> "15" ;
<http://www.w3.org/ns/ldp#contains> <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/1d0e18fc-e03c-468a-9fb4-5340097e0a75> ;
<http://www.w3.org/ns/ldp#contains> <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/28c91744-c5c7-41f8-b6f0-ba79f2e2d9af> ;
<http://www.w3.org/ns/ldp#contains> <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/4222abbf-1e6c-4c5a-9d21-eac3acfba6ad> ;
<http://www.w3.org/ns/ldp#contains> <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/5a3650a6-ad6d-41c3-b308-5571b66804bb> ;
<http://www.w3.org/ns/ldp#contains> <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/1d0e18fc-e03c-468a-9fb4-5340097e0a75> ;
<http://www.w3.org/ns/ldp#contains> <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/28c91744-c5c7-41f8-b6f0-ba79f2e2d9af> ;
<http://www.w3.org/ns/ldp#contains> <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/4222abbf-1e6c-4c5a-9d21-eac3acfba6ad> ;
<http://www.w3.org/ns/ldp#contains> <http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/5a3650a6-ad6d-41c3-b308-5571b66804bb> .
* Connection #0 to host ciber-vs1.umd.edu left intact
jansen@X1:~$ curl -v http://ciber-vs1.umd.edu:10080/srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/4222abbf-1e6c-4c5a-9d21-eac3acfba6ad
* Trying 128.8.216.153...
* TCP_NODELAY set
* Connected to ciber-vs1.umd.edu (128.8.216.153) port 10080 (#0)
> GET /srv/ciber/Transfer+Notes/nara1_vault10/National_Archives/Federal_Records/RG+255+-+Records+of+the+National+Aeronautics+and+Space+Administration/EOS+Data+Files/Crystal+Dynamics/pub/slr/data/fr/jason1/daily/7090/4222abbf-1e6c-4c5a-9d21-eac3acfba6ad HTTP/1.1
> Host: ciber-vs1.umd.edu:10080
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Connection: keep-alive
< Content-Type: text/html;charset=UTF-8
< Content-Length: 80
< Date: Tue, 12 Mar 2019 17:26:28 GMT
<
* Connection #0 to host ciber-vs1.umd.edu left intact
<html><head><title>Error</title></head><body>Internal Server Error</body></html>
Netflix Astyanax used a clever API for chunking large bitstreams. Although Astyanax itself has been deprecated in favor of the Datastax client we currently use, it might be useful to "resuscitate" some of those ideas.
I'm getting a startup error that is related to wildfly. It seems like it is missing a protocol dependency? Have you seen this before?
[email protected] | 2018-11-26 16:19:02,222 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("add") failed - address: ([
[email protected] | ("subsystem" => "undertow"),
[email protected] | ("server" => "default-server"),
[email protected] | ("http-listener" => "default")
[email protected] | ]) - failure description: {"WFLYCTL0080: Failed services" => {"org.wildfly.undertow.listener.default" => "WFLYUT0082: Could not start 'default' listener.
[email protected] | Caused by: java.net.SocketException: Protocol family unavailable"}}
As described here, Thorntail can offer the Wildfly management API. t-c* might provide that.
The current Memento storage design uses a time series and incurs the cost of table scanning. We can do better by shunting Mementos to a separate table and using the main mutabledata table only for current information.
Having added the WebDAV resource and two filters to my Trellis application, I'm now getting test failures:
[ERROR] Failures:
[ERROR] LdpBasicContainerIT.testCreateContainerViaPut Check for an ldp:contains triple ==> expected: <false> but was: <true>
[ERROR] MementoBinaryIT Check for a valid response to PUTting an LDP-NR ==> expected: <SUCCESSFUL> but was: <CLIENT_ERROR>
[ERROR] MementoResourceIT Check for a valid response to PUTting an LDP-NR ==> expected: <SUCCESSFUL> but was: <CLIENT_ERROR>
[ERROR] MementoTimeGateIT Check for a valid response to PUTting an LDP-NR ==> expected: <SUCCESSFUL> but was: <CLIENT_ERROR>
[ERROR] MementoTimeMapIT Check for a valid response to PUTting an LDP-NR ==> expected: <SUCCESSFUL> but was: <CLIENT_ERROR>
(replace IT with Test). Do we have any information about how WebDAV and Memento APIs interact?
Trawl the code and minimize instances of boxing/unboxing, to lower pressure on the GC.
Consistency levels for write and read operations to Cassandra can be configured on a per-statement basis. As a first step, a globally constant consistency level can be configured for read statements and another for write statements: two settings in all.
A custom CQL (aggregating) function or other means could be used.
For the 0.8 release, migrate trellis-cassandra to the trellis-ldp/trellis GitHub org.
The chunking size for binaries persisted via CassandraBinaryService is currently fixed for injected services at 1MB. There is a constructor that accepts a chunk length, but it is not injectable. There is currently no way to use Tamaya config to set the chunk length; providing one is the purpose of this ticket.
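A sketch of what configurable chunk length could amount to, with configuration modelled as a plain `Map` rather than the Tamaya API. The property key `trellis.cassandra.chunkLength` and the class `ChunkLengthConfig` are assumptions for illustration only; the 1MB default comes from the description above.

```java
import java.util.Map;

// Hypothetical lookup of the chunk length from key/value configuration.
final class ChunkLengthConfig {
    // Hypothetical property key; the real key, if any, may differ.
    static final String KEY = "trellis.cassandra.chunkLength";
    // Default of 1MB, matching the currently fixed value.
    static final int DEFAULT_CHUNK_LENGTH = 1024 * 1024;

    private ChunkLengthConfig() { }

    static int chunkLength(Map<String, String> config) {
        String raw = config.get(KEY);
        if (raw == null) {
            return DEFAULT_CHUNK_LENGTH;
        }
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive, was " + value);
        }
        return value;
    }
}
```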
Documentation is currently crammed into the single README file. This should be unpacked and moved into the GitHub wiki.
I was running performance tests. What I found after two tests, between which I failed to reset the database, was that the root folder showed two identical ldp:contains relationships. Presumably there is only one contained resource, since the object of both triples was the same. I will look at the C* tables and report what I find there in a follow up comment.
Use resolution to https://issues.apache.org/jira/browse/TAMAYA-358 to provide optional persistent config.