robstewart57 / rdf4h Goto Github PK

View Code? Open in Web Editor NEW

76.0 76.0 28.0 2.46 MB

rdf4h is a library for working with RDF in Haskell

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.12% Haskell 98.70% HTML 1.07% Dhall 0.01% Nix 0.09%

rdf4h's Introduction

rdf4h - An RDF library for Haskell

rdf4h is a library for working with RDF in Haskell.

For details see the GitHub project page:

http://robstewart57.github.io/rdf4h/

Supports GHC versions from 9.2.5 (stackage lts-20.11).

Development with Nix and direnv

To enter a development environment, you can use Nix and direnv which install all required software, also allowing you to use your preferred shell. Once installed, just run

$ direnv allow

and you'll have a working development environment for now and the future whenever you enter this directory.

This development environment allows to use either Stack or Cabal for building the software.

RDF formats

The coverage of the W3C RDF standards are:

Format	Parsing	Serialising
NTriples	complete	complete
Turtle	complete	complete
RDF/XML	complete	not supported

These results are produced with version 4.0 of this library.

These tests are run on the W3C unit tests for RDF formats: https://github.com/w3c/rdf-tests.

Feature requests

The parsers in this library parse large files/strings contents entirely before generating RDF triples. This doesn't scale for very large files. Implementing stream based RDF parsers would overcome this problem, e.g. by creating input streams enabling output streams in the io-streams library to consume triples on-the-fly during parsing. This is discussed here: #56 (comment) and #44 (comment)
RDF/XML serialisation of RDF graphs.

Running tests

To run all the tests (parsers and the library API):

$ git submodule update --init --recursive
$ git submodule foreach git pull origin gh-pages
$ stack test --test-arguments="--quickcheck-tests 1000"

To run specific parser tests when bug fixing:

$ stack test --test-arguments="--pattern /parser-w3c-tests-ntriples/"
$ stack test --test-arguments="--pattern /parser-w3c-tests-turtle/"
$ stack test --test-arguments="--pattern /parser-w3c-tests-xml/"

Running benchmarks

To run the bencharks:

$ wget https://www.dropbox.com/s/z1it340emcreowj/bills.099.actions.rdf
$ gzip -d bills.099.actions.rdf.gz
$ stack bench

rdf4h's People

Contributors

Stargazers

Watchers

rdf4h's Issues

GHC 8.6.3 test run

Running cabal -O2 new-build and cabal -O2 new-test --enable coverage results in the following:
rdf4h-3.1.1-test-rdf4h.log

Check `parse . serialize == id` for NTriples, Turtle and RDF/XML

Discussion moved from #67 (comment) .

serialize . parse == id

Something like this would be a good property check for all three Ntriples, Turtle and RDF/XML formats.

Ideally we'd have:

parse . serialize == id

Since we use the generator instances we already have for RDF graphs in rdf4h. i.e.

Use the Arbitrary instances in testsuite/tests/Data/RDF/PropertyTests.hs to generate RDF graphs.
Serialise the graph to NTriples, Turle and RDF/XML formats.
Parse that data back into RDF graphs in Haskell.

Check the graphs are equivalent.

As you say @wismill , it comes down the how equivalence check is performed. We do have isIsomorphic and isGraphIsomorphic in https://github.com/robstewart57/rdf4h/blob/master/src/Data/RDF/Query.hs .

The problem we have with property based testing of serialize . parse == id, is that we'd need predefined NTriples, Turtle and RDF/XML inputs to parse.

test-rdf4h: rdf-tests/turtle/manifest.ttl: openFile: does not exist (No such file or directory)

When building for Stackage LTS 8:

/tmp/stackage-build13/rdf4h-3.0.1$ dist/build/test-rdf4h/test-rdf4h
test-rdf4h: rdf-tests/turtle/manifest.ttl: openFile: does not exist (No such file or directory)

New rdf4h release coinciding with xmlbf 0.6 release

The new XML parser depends on monad transformer additions added by @wismill to the xmlbf library. This dependency will be reflected in the release of 0.6 of the xmlbf library.

Once xmlbf-0.6 has been uploaded to hackage, the stack.yaml file for rdf4h should add xmlbf-0.6 as an extra dependency, removing the information about the xmlbf git repo and commit ID, since stack will find xmlbf-0.6 on hackage.

Base URI not detected with TurtleParser

Here is my simple Turtle file :

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://example.org> .

<http://example.org> rdf:type owl:Ontology ;
                     owl:versionIRI <http://example.org/0.1> .

The following code is trying to read the input file and read its baseUri.

main :: IO ()
main = do
  graphOpt <- (parseFile (TurtleParser Nothing Nothing) "test.owl" :: IO (Either ParseFailure (RDF TList)))
  case graphOpt of
    Left _ -> putStrLn "Error..."
    Right graph -> do
      let myBaseUri = unBaseUrl $ fromJust $ baseUrl graph
      putStrLn myBaseUri

However, I'm getting a Maybe.fromJust: Nothing error. Something I'm doing wrong ?

NTriplesParser optimization

I worked on various optimizations of the NTriplesParser. It's not ready for a PR yet, but if you're interested, you can check it out from the optimize1 branch of my fork (https://github.com/mgmeier/rdf4h/tree/optimize1).

There are two major optimizations:

Blank node label parsing
oneOf / noneOf parsers with large sets of characters are always a red flag with parsec. Particularly one oneOf parser had a range over thousands of characters combined. The problem with these is, that all these values have to be constructed (by enumeration) for the parser to run; in the worst case, multiple times, in case the corresponding closure gets GC'ed. I replaced them with simple range checks: if (c >= x && c <=y) || (c >= x' && c <=y') ...
This brought down parsing a .nt file (~600000 triples) with many blank node labels from completely unusable to 45 seconds.
UNode parsing
I used a combination of different optimizations:
- Don't construct Text values prematurely, especially not thousands of singletons; at the same time avoid conversions to and from String for running several validations. I had to duplicate some of the functionality in Types.hs for that.
- Avoid constructing the same string values over and over. For that, I use a map from hash value to string as a parser state. Only previously unencountered strings are memorized. The Triple type is replaced by a intermediate type containing hash values; the triples are filled in only at the end of a parse.
- As an extension to the former, skip the well-formedness check of a URI of which the hash value is already in the parser state (meaning that it has been validated previously). The validation check is a somewhat expensive operation, and we only need it once per URI.
These optimizations gave a speed-up of ~35% parsing an .nt file with loads of UNodes/URIs.

If you're interested in my approach and can verify that it does improve things, we could definitely discuss on how to proceed from here (maybe IRC or mail), since what I suggested here is a proof of concept, which may need more cleanup and streamling the design with the rest of the library. Also the optimizations imply some complex changes and are more than simple one-liners. However, I'd be happy to cotribute to rdf4h, so let me know what you think.

BTW I have not touched any other parsers, the library's core types or any RDF representations.

TurtleParse produces invalid parsing tree (W3C manifest.ttl)

Parsing the following:

<#datatypes-intensional-xsd-integer-decimal-compatible> a mf:NegativeEntailmentTest;
  mf:name "datatypes-intensional-xsd-integer-decimal-compatible";
  rdfs:comment """
    The claim that xsd:integer is a subClassOF xsd:decimal is not
    incompatible with using the intensional semantics for
    datatypes.
  """;
  rdfs:approval rdft:Approved;
  mf:entailmentRegime "RDFS" ;
  mf:recognizedDatatypes ( xsd:decimal xsd:integer ) ;
  mf:unrecognizedDatatypes ( ) ;
  mf:action <datatypes-intensional/test001.nt>;
  mf:result false .

(from http://www.w3.org/2013/rdf-mt-tests/manifest.ttl) produces invalid output:

Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#NegativeEntailmentTest"))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#name"),LNode(PlainL(datatypes-intensional-xsd-integer-decimal-compatible)))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2000/01/rdf-schema#comment"),LNode(PlainL(
    The claim that xsd:integer is a subClassOF xsd:decimal is not
    incompatible with using the intensional semantics for
    datatypes.
  )))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2000/01/rdf-schema#approval"),UNode("http://www.w3.org/ns/rdftest#Approved"))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#entailmentRegime"),LNode(PlainL(RDFS)))
Triple(UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#recognizedDatatypes"),BNodeGen(40))
Triple(BNodeGen(40),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#first"),UNode("http://www.w3.org/2001/XMLSchema#decimal"))
Triple(BNodeGen(40),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"),BNodeGen(41))
Triple(BNodeGen(41),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#first"),UNode("http://www.w3.org/2001/XMLSchema#integer"))
Triple(BNodeGen(41),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#rest"),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"))
Triple(BNodeGen(41),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#unrecognizedDatatypes"),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"))
Triple(BNodeGen(41),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#action"),UNode("xxxdatatypes-intensional/test001.nt"))
Triple(BNodeGen(41),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#result"),LNode(TypedL(false,"http://www.w3.org/2001/XMLSchema#boolean")))

Note that it breaks on Triple(BNodeGen(41),UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#unrecognizedDatatypes"),UNode("http://www.w3.org/1999/02/22-rdf-syntax-ns#nil")) where "subject" must be UNode("xxx#datatypes-intensional-xsd-integer-decimal-compatible"), not BNodeGen(41).
Also, the last 2 triples must also have UNode("http://www.w3.org/2001/sw/DataAccess/tests/test-manifest#unrecognizedDatatypes") for their subject.

However, parsing of a list with mf:entries (earlier in this file) is correct.

So, the parser either stumbled on "inline" presentation of a list, trailing ";" or something in between ;-)

typedL does not produce acceptable triple for integer 0

the function typedL applied to a value of 0 and a schema "http://www.w3.org/2001/XMLSchema#integer" produces an empty string, not "0"; the empty string is not an acceptable value for an integer according to https://www.w3.org/TR/xmlschema-2/#integer.
The problem originates with the lines

_integerStr, _decimalStr, _doubleStr :: T.Text -> T.Text
_integerStr = T.dropWhile (== '0')

which consumes all zeros from "0".
I think a conversion using printf would be simpler?
`

https://www.govtrack.us/data/rdf/bills.099.actions.rdf.gz no longer available

It might be good to have a different common benchmark. There are some datasets here: https://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets

Cannot install (upgrade) rdf4h

I have rdf4h-1.2.5 installed and want to upgrade it to the latest rdf4h-1.2.7. I do cabal install rdf4h --reinstall and it fails with:

src/Text/RDF/RDF4H/ParserUtils.hs:46:33:
    Couldn't match expected type `network-2.5.0.0:Network.URI.URI'
                with actual type `URI'
    In the `rqURI' field of a record
    In the expression:
      Request
        {rqURI = uri, rqMethod = GET,
         rqHeaders = [Header HdrConnection "close"], rqBody = B.empty}
    In an equation for `request':
        request uri
          = Request
              {rqURI = uri, rqMethod = GET,
               rqHeaders = [Header HdrConnection "close"], rqBody = B.empty}
Failed to install rdf4h-1.2.7
cabal: Error: some packages failed to install:
rdf4h-1.2.7 failed during the building phase. The exception was:
ExitFailure 1

It should probably be noted that cabal decided to also upgrade to network-uri-2.6.0.1 package.

dependency missing: ghc-prim

if using rdf4h with ghc 7.4.1 the dependency for ghc-prim is noted. it is required for the import of GHC.Generics in Types.

i would suggest to separate the rdf4h library from the code used for testing and move this code to a separate package; some of the librariess, e.g. quickcheck, are not available on the currently popular ARM plattform (e.g. raspberry, cubie etc). The requirements for the rdf4h library is much smaller than for the testing harness.
(it is not a big problem to edit the cabal file for such installations, but it is again one step away from a standardized procedure).
thank you for consideration!
andrew

relative IRI resolution in w3c tests

Some w3c tests are supposed to be the retrieval IRI for that file, e.g. turtle-subm-01.ttl

@prefix : <#> .
[] :x :y .

Should be parsed into (turtle-subm-01.nt):

_:genid1 <http://www.w3.org/2013/TurtleTests/turtle-subm-01.ttl#x> <http://www.w3.org/2013/TurtleTests/turtle-subm-01.ttl#y> .

This is explicit at http://www.w3.org/2013/TurtleTests/ :

Relative IRI Resolution: The home of the test suite is the URL of this page. Per RFC 3986 section 5.1.3, the base IRI for parsing each file is the retrieval IRI for that file. For example, the tests turtle-subm-01 and turtle-subm-27 require relative IRI resolution against a base of http://www.w3.org/2013/TurtleTests/turtle-subm-01.ttl and http://www.w3.org/2013/TurtleTests/turtle-subm-27.ttl respectively.

How should the w3c rdf4h tests deal with this?

Custom functions for processing URIs

I stumbled upon isAbsoluteUrl function which detects whether a given string presents an absolute URL. It does so by merely detecting a ":" within the given string.

Now, since we already import Network.URI, I thought we could reuse its isAbsoluteURI to rely on "industry-proven" power instead of reinventing the wheel. Replacing our custom isAbsoluteUrl with isAbsoluteURI resulted in +33 failing test cases. This should be investigated and fixed, I believe.

There are, perhaps, more cases where we could replace our functions with those from Network.URI. I suggest that we do so at some point.

Fix failing Turtle parser test turtle-subm-27

I'm seeing two tests failing. It's the same test, and both the attoparsec and the parsec instances of Parser fail.

turtle-subm-27:                                                       FAIL
          Exception: HUnitFailure (Just (SrcLoc {srcLocPackage = "main", srcLocModule = "W3C.W3CAssertions", srcLocFile = "testsuite/tests/W3C/W3CAssertions.hs", srcLocStartLine = 24, srcLocStartCol = 3, srcLocEndLine = 24, srcLocEndCol = 99})) (Reason "not isomorphic: Triple (UNode \"http://w3c.github.io/rdf-tests/turtle/a1\") (UNode \"http://w3c.github.io/rdf-tests/turtle/b1\") (UNode \"http://w3c.github.io/rdf-tests/turtle/c1\")\nTriple (UNode \"http://example.org/ns/a2\") (UNode \"http://example.org/ns/b2\") (UNode \"http://example.org/ns/c2\")\nTriple (UNode \"http://example.org/ns/foo/a3\") (UNode \"http://example.org/ns/foo/b3\") (UNode \"http://example.org/ns/foo/c3\")\nTriple (UNode \"http://example.org/ns/foo/bar#a4\") (UNode \"http://example.org/ns/foo/bar#b4\") (UNode \"http://example.org/ns/foo/bar#c4\")\nTriple (UNode \"http://example.org/ns2#a5\") (UNode \"http://example.org/ns2#b5\") (UNode \"http://example.org/ns2#c5\")\n compared with Triple (UNode \"http://www.w3.org/2013/TurtleTests/a1\") (UNode \"http://www.w3.org/2013/TurtleTests/b1\") (UNode \"http://www.w3.org/2013/TurtleTests/c1\")\nTriple (UNode \"http://example.org/ns/a2\") (UNode \"http://example.org/ns/b2\") (UNode \"http://example.org/ns/c2\")\nTriple (UNode \"http://example.org/ns/foo/a3\") (UNode \"http://example.org/ns/foo/b3\") (UNode \"http://example.org/ns/foo/c3\")\nTriple (UNode \"http://example.org/ns/foo/bar#a4\") (UNode \"http://example.org/ns/foo/bar#b4\") (UNode \"http://example.org/ns/foo/bar#c4\")\nTriple (UNode \"http://example.org/ns2#a5\") (UNode \"http://example.org/ns2#b5\") (UNode \"http://example.org/ns2#c5\")\n")

TripleSerializer treatment of prefixes

the documentation gives the impression that the conversion from triples to an rdf (eg. TripleGraph) will handle the prefixes which are defined in the namespace. in my tests (and my perusal of the code) this seems not to be the case.
i suggest to update the documentaiton accoringly (or to implement the mapping of prefixes).
thank you for very useful code!
andrew frank

add read to automatic derived operations for triples

In an application I produce and read triples (mostly for error tracking) and found that Triple is automatically instantiated for Show but not for Read. Is this intentional?
Changing the code to include Read seems to work (at least I could compile and my code does not show any problem). I would appreciate if this change could be incorporated in the hackage version.
thank you!
andrew

Support for XML literals

This bug is identified by the conformance-xml-example09 test.

comparing-graphs:                                                       FAIL
        Exception: user error (Graph xml-example09 not equivalent to expected:
        Expected:
          Triple (UNode "http://example.org/item01") (UNode "http://example.org/stuff/1.0/prop") (LNode (TypedL "<a:Box xmlns:a=\"http://example.org/a#\" required=\"true\">\n         <a:widget size=\"10\"></a:widget>\n         <a:grommit id=\"23\"></a:grommit></a:Box>\n    " "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"))
        Found:
          Triple (UNode "http://example.org/item01") (UNode "http://example.org/stuff/1.0/prop") (LNode (TypedL "<a:Box required=\"true\">\n         <a:widget size=\"10\"></a:widget>\n         <a:grommit id=\"23\"></a:grommit></a:Box>\n    " "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"))
        )

See https://travis-ci.org/robstewart57/rdf4h/jobs/540783693#L1739 .

Issue with `query` for `AdjHashMap`

In the following code, both query results are expected equal, but they are not. Only the first result with TList is correct.

rdf = "PREFIX ex: <ex:> ex:s1  ex:p1 ex:o1 ;  ex:p2 ex:o2  ."
query' g = query g Nothing (Just $ unode "ex:p1") Nothing
parser = TurtleParser Nothing Nothing
g1 = parseString parser rdf  :: Either ParseFailure (RDF TList)
query' <$> g1
-- Right [Triple (UNode "ex:s1") (UNode "ex:p1") (UNode "ex:o1")]
g2 = parseString parser rdf :: Either ParseFailure (RDF AdjHashMap)
query' <$> g2
-- Right [Triple (UNode "ex:s1") (UNode "ex:p1") (UNode "ex:o1"),Triple (UNode "ex:s1") (UNode "ex:p1") (UNode "ex:o2")]

Adapting xmlbf based RDF/XML parser without transformers

It's quite likely that xmlbf will proceed with the removal of monad transformers support.

https://gitlab.com/k0001/xmlbf/issues/25

Currently, the XML parser relies on a commit in the xmlbf repository that has transformers support:

https://github.com/robstewart57/rdf4h/blob/master/src/Text/RDF/RDF4H/XmlParser.hs

Could the implementation of XmlParser.hs be adapted, as suggested in https://gitlab.com/k0001/xmlbf/issues/25#note_178094971 , removing the need to rely on transformers but preserving the stateful nature of the RDF/XML parser with StateT?

CC @wismill

Help parsing large file

Hi there,
I am working on parsing a large turtle file, ideally I would like to turn it into an equivalent haskell program.
I have been profiling the read function and see a growth over time in memory and other things 👍

For 30k lines of the file I got these stats from rdf4h-3.0.1 release from stack.

        total alloc = 29,235,026,136 bytes  (excludes profiling overheads)

COST CENTRE      MODULE                        SRC                                                    %time %alloc

>>=              Text.Parsec.Prim              Text/Parsec/Prim.hs:202:5-29                            17.4    7.1
satisfy          Text.Parsec.Char              Text/Parsec/Char.hs:(140,1)-(142,71)                    16.2   32.7
noneOf.\         Text.Parsec.Char              Text/Parsec/Char.hs:40:38-52                            14.3    0.0

We can see that a large amount of memory and time is spent in the parsec. I am wondering the following :

can we parse this data incrementally ? Would it make sense to read this in line by line and feed that to the parser or something?
can we convert the rdf into an equivalent haskell source program that would be compiled and strongly typed.
will attoparsec help?

Examples of the files are here:
https://gist.github.com/h4ck3rm1k3/e1b4cfa58c4dcdcfc18cecab013cc6c9

IRIError and SchemaError are not used

ping @wismill

Where are IRIError and SchemaError used?

I would've thought that the left value for the following functions:

mkIRI :: Text -> Either String IRI
parseIRI :: Text -> Either String IRIRef
parseRelIRI :: Text -> Either String IRIRef
validateIRI :: Text -> Either String Text
resolveIRI :: Text -> Text -> Either String Text

Would be IRIError or SchemaError rather than String?

Turtle parser does not restore subject context after parsing RDF collections

When parsing a RDF collection, the turtle parser does not restore the subject context. Hence, all further predicate-object tupels are added to the last list node of the created collection:

Example:

@prefix : <http://example.org/foo#> .
:subject
  :predicate1 ( :a ) ;
  :predicate2 :b .

The parser will add :predicate2 :b to the last list blank node instead of the :subject.

Test suite files missing in hackage release

[10 of 10] Compiling Main             ( src/Rdf4hParseMain.hs, dist/build/rdf4h/rdf4h-tmp/Main.dyn_o )
Linking dist/build/rdf4h/rdf4h ...
Preprocessing test suite 'test-rdf4h' for rdf4h-1.3.2...

testsuite/tests/Test.hs:5:18:
    Could not find module ‘Data.RDF.TriplesGraph_Test’
    Perhaps you meant
      Data.RDF.TriplesGraph (needs flag -package-key rdf4h-1.3.2@rdf4h_H6kO2G7c2mkK5Er9o3qrbC)
    Use -v to see a list of the files searched for.

testsuite/tests/Test.hs:6:18:
    Could not find module ‘Data.RDF.MGraph_Test’
    Perhaps you meant
      Data.RDF.MGraph (needs flag -package-key rdf4h-1.3.2@rdf4h_H6kO2G7c2mkK5Er9o3qrbC)
    Use -v to see a list of the files searched for.

testsuite/tests/Test.hs:7:18:
    Could not find module ‘Data.RDF.PatriciaTreeGraph_Test’
    Perhaps you meant
      Data.RDF.PatriciaTreeGraph (needs flag -package-key rdf4h-1.3.2@rdf4h_H6kO2G7c2mkK5Er9o3qrbC)
    Use -v to see a list of the files searched for.

testsuite/tests/Test.hs:8:18:
    Could not find module ‘Text.RDF.RDF4H.XmlParser_Test’
    Perhaps you meant
      Text.RDF.RDF4H.XmlParser (needs flag -package-key rdf4h-1.3.2@rdf4h_H6kO2G7c2mkK5Er9o3qrbC)
    Use -v to see a list of the files searched for.

testsuite/tests/Test.hs:9:18:
    Could not find module ‘Text.RDF.RDF4H.TurtleParser_ConformanceTest’
    Use -v to see a list of the files searched for.

testsuite/tests/Test.hs:11:18:
    Could not find module ‘W3C.RdfXmlTest’
    Use -v to see a list of the files searched for.

testsuite/tests/Test.hs:12:18:
    Could not find module ‘W3C.NTripleTest’
    Use -v to see a list of the files searched for.

testsuite/tests/Test.hs:13:8:
    Could not find module ‘Data.RDF.GraphTestUtils’
    Use -v to see a list of the files searched for.

testsuite/tests/W3C/TurtleTest.hs:8:8:
    Could not find module ‘W3C.Manifest’
    Use -v to see a list of the files searched for.
builder for ‘/nix/store/cll0fh0h91di2c75fncw8k6w8gfw93dj-rdf4h-1.3.2.drv’ failed with exit code 1
error: build of ‘/nix/store/cll0fh0h91di2c75fncw8k6w8gfw93dj-rdf4h-1.3.2.drv’ failed

Tried with 1.3.3, too, the same problem.

Push 1.3.1 to hackage

See title, basically a follow-up to robstewart57/hsparql#13.

adopt network-uri more widely

Migrate Unode Text to:

data Node =
  UNode Network.URI
  | BNode !T.Text
  | BNodeGen !Int
  | LNode !LValue
    deriving Generic

The primary benefit is the URI validation that network-uri implements according to the RFC3986 standard. Two current blockers are:

network-uri uses String representations, not Text . An issue has been opened to inquire. haskell/network-uri#11
rdf4h automatically derives Hashable for nodes This uses the Generic instance for Node. However, URI has no Generic instance in network-uri. I pull request has been opened. haskell/network-uri#12

XMLParser doesn't seem to recognize blank nodes

I posted a question on stackoverflow about this recently. Basically, if you have an XML data structure like this:

    <pgterms:bookshelf>
      <rdf:Description rdf:nodeID="N8d8ab517be5d4d24a574a79c302445fc">
        <dcam:memberOf rdf:resource="2009/pgterms/Bookshelf"/>
        <rdf:value>Napoleonic(Bookshelf)</rdf:value>
      </rdf:Description>
    </pgterms:bookshelf>

It seems to be completely ignored by RDF4H. The blank nodes show up if I first convert the XML file to a Turtle file.

Criterion benchmarks for RDF graph instances

We don't know what the construction or query performance is for MGraph or TriplesGraph. We should know about this so that we can spot which is faster for each use case of the rdf4h API. We should also have this in place so that any new implementations of an RDF instance can be measured against existing ones, which relates to #19 .

Criterion would give us a robust benchmarking platform to understand the performance of the rdf4h API, and its performance limitations. https://hackage.haskell.org/package/criterion

Remove old HXT based RDF/XML parser

@wismill has completely re-implemented the RDF/XML parser, using the xmlbf library. Not only is it a correct implementation (passing w3c unit tests), it is also faster. See #67 (comment) .

It doesn't useful to expose both XML parsers to users. The options are:

Keep the XmlParserHXT module in this repository, for reference, but don't expose it in the cabal file.
Remove the XmlParserHXT module, i.e the srcText/RDF/RDF4H/XmlParserHXT.hs file.

@wismill thoughts?

turtle serializer does not properly deal with prefixes

dear robert
the bug which lead to the long discussion in the other bug report was hitting me again today. the error is very simply in the turtle serializer code: the code expects a map from url to prefix (not prefix to url - it is reversed in a function a bit above, for reasons i do not understand). then the test for the match of the prefix must be with the second element (not the first). i changed from (k,) to (,k) and it works.
can you check my fix and put it into the repository? thank you!

-- Expects a map from uri to prefix, and returns the (prefix, uri_expansion)
-- from the mappings such that uri_expansion is a prefix of uri, or Nothing if
-- there is no such mapping. This function does a linear-time search over the
-- map, but the prefix mappings should always be very small, so it's okay for now.
findMapping :: Map T.Text T.Text -> T.Text -> Maybe (T.Text, T.Text)
findMapping pms uri =
case mapping of
Nothing -> Nothing
Just (u, p) -> Just (p, T.drop (T.length u) uri) -- empty localName is permitted
where
mapping = find ((_, k) -> T.isPrefixOf k uri) (Map.toList pms)
-- exchanged _ and k: the map is from uri to prefix, check for k match as prefix to uri
-- it was reversed in writeTriples

Import of Data.Text.Lazy.Binary

In Data.RDF.Types there is an import of instances

import Data.Text.Lazy.Binary ()

however, I cannot find a package that has or ever had such a module. I changed it to import Data.Text.Binary () from text-binary and that seems to work, but I am puzzled.

hWriteRdf reverses node text

URIs and abbreviated forms are reversed. Literals are written correctly.

I see commit messages about optimizing comparison by storing URIs reversed. Is this some part of that?

W3C test suite

I suggest we discuss implementing W3C test suite in this thread.
It is currently developed in my repository, in "w3tests" branch: https://github.com/cordawyn/rdf4h/tree/w3tests

Examples don't work

Unless I'm missing something, the ParseURLs example casts the result as a TriplesList, which doesn't seem to be exposed anywhere.

Support for GHC 7.10.3 - 8.2.2

TravisCI shows that the versions in the cabal and/or stack yaml file are incompatible with:

lts-6 (7.10.3)
lts-9 (8.0.2
lts-11 (8.2.2)

See https://travis-ci.org/robstewart57/rdf4h/builds/540783687 .

Backslash character parser bugs in NTriples parser

let Right (g::RDF TList) =
    parseString
         NTriplesParser
         (Data.Text.pack "<http://a.example/s> <http://a.example/p> \"\\r\" .")

Evaluating g:

Triple (UNode "http://a.example/s") (UNode "http://a.example/p") (LNode (PlainL "r"))

So the \r character is being parsed just as r.

This is the reason for a number of TurtleParser tests failing, since their results are compared against NTriple golden references parsed with the NTriples parser, e.g.

stack test --test-arguments="--pattern literal_with_escaped_CARRIAGE_RETURN"
stack test --test-arguments="--pattern literal_with_CHARACTER_TABULATION"

JSON-LD support

Is there any thought to adding support for JSON-LD? If no, any interest in pull requests that attempt to add rudimentary support?

RDF graph instance using alga

Alga is a library for algebraic construction and manipulation of graphs in Haskell. See the Haskell 2017 paper Algebraic Graphs with Class (Functional Pearl) (link).

The idea would be to implement a new module Data.RDF.Graph.Alga, with an implementation for all methods in the Rdf class, i.e.

instance Rdf Alga where ...

Ensure a clear definition of the base URI

It is not always clear what the "base URI" means in various places of this library (see this comment.

Therefore we should be more precise in naming and documentation to ensure consistency.

References:

RFC3986, section 5.1
https://www.w3.org/TR/turtle/#sec-iri-references

`same` function passed to p_select_match_fn in Data.RDF.GraphTestUtils

I don't understand some of the same definitions for the property test cases for select*. E.g. two triples t1 and t2 are apparently the same for p_select_match_spo in

same t1 t2 = subjectOf t1 == subjectOf t2 && predicateOf t1 == predicateOf t2 &&
                 objectOf t1 /= objectOf t2

Why objectOf t1 /= objectOf t2 ? I'd have thought objectOf t1 == objectOf t2 .

This oddity is seen in p_select_match_sp and p_select_match_so and p_select_match_spo.

https://github.com/robstewart57/rdf4h/blob/master/testsuite/tests/Data/RDF/GraphTestUtils.hs

Improve performance

Now that rdf4h has a complete support for NTriples & Turtle, it may be a good time to focus on performance:

Parsers
Graph implementation
IRI handling

As mentioned in #35 and #44, there are several places where we could improve the parsers. I think it would be a good idea to keep only one modern parser library (attoparsec or megaparsec) to keep the implementation simple and make it more efficient.

I think the handling of prefixes in UNode is not satisfying. For instance, several important operations require expandTriples which is very expensive. I propose that we remove expandTriples and make use of a smart constructor unode :: Text -> Either IRIError UNode for IRI (currently merely a constructor synonym) that ensure the IRI is a valid absolute IRI. Then have a function mkIRI that accept a namespace (or a prefix mapping using a new type class) to create IRIs, from a relative IRI or a prefixed IRI (see expandURI).

Edit: change the proposed signature of unode to use Either rather than Maybe.

Parsing rdf-schema files

Hello,

First of all thank you for implementing this library. I got how it works pretty quickly and it helped me to solve a lot of implementation details pretty quickly.

Currently I am working on a project to generate code for schema.org schemas which are implemented in rdf-schema format. Here is an example:
https://raw.githubusercontent.com/schemaorg/schemaorg/master/data/releases/3.7/schema.nt

I wanted to parse this format into a schema object which will represent the class structure of rdf schema. After that, I am planing to generate code for different programming languages.

Unfortunately, I am stuck on parsing the schema. I can parse rdf schema as an rdf document like this:
https://github.com/huseyinyilmaz/schemaorg/blob/master/library/Download.hs#L30

This format does not represent the schema itself, instead I would have just have triples that I need to parse to a schema structure. Also it turns out that rdf-schema documents can have references to other schema files with their internet address. So those references should also be downloaded and parsed. Here is an example

<http://schema.org/spatial> <http://www.w3.org/2002/07/owl#equivalentProperty> <http://purl.org/dc/terms/spatial> .

So my question is, does rdf4h library parse rdf-schema links to validate the documents? If not is there a plan to support rdf-schema validation?

Cannot parse literals with single quotes

TurtlePaser fails to parse literals with single quotes, e.g.:

<#literal_with_dquote> rdfs:comment 'literal with dquote "x\"y"' .

(see line 418 in http://www.w3.org/2013/N-TriplesTests/manifest.ttl)

Error message:

Left (ParseFailure "(line 418, column 17):\nunexpected \"'\"\nexpecting whitespace-or-comment or object")

nb: Also applies to http://www.w3.org/2013/N-QuadsTests/manifest.ttl

Implement rapper-like RDF format conversion executable

Inspired by the rapper executable, which is built on top of the Raptor library:

http://librdf.org/raptor/rapper.html

E.g.

rapper -o ntriples http://planetrdf.com/guide/rss.rdf
rapper -i rss-tag-soup -o rss-1.0 pile-of-rss.xml http://example.org/base/
rapper --count http://example.org/index.rdf

This functionality is already partially supported by the rdf4h executable.

fix the RDF/XML parser (#17).
match the executable flags of rapper.
benchmark rapper against the rdf4h executable for all RDF parsing/serialising conversion combinations.

XML/RDF parser doesn't handle files with an XML specification header

This is a known bug. A solution has been sought on the HXT issue track here, and on stackoverflow here, but currently no replies :-/

Corner case property failure for query_match_spo

query_match_spo:                                                        FAIL

        *** Failed! Falsifiable (after 29 tests):

        Triple (UNode "ex:o1") (UNode "ex:o1") (LNode (PlainLL "earth" "fr"))

        Triple (UNode "ex:o1") (UNode "http://www.example.org/foo1") (LNode (TypedL "earth" "http://www.w3.org/2001/XMLSchema#string"))

        Triple (UNode "ex:p1") (UNode "ex:s1") (BNode ":_genid3")

        Triple (UNode "ex:s1") (UNode "http://www.example.org/bar1") (UNode "ex:s1")

        Triple (UNode "ex:s1") (UNode "http://www.example.org/bar1") (LNode (TypedL "world" "http://www.w3.org/2001/XMLSchema#token"))

        Triple (UNode "http://www.example.org/foo1") (UNode "ex:o1") (UNode "ex:s2")

        Triple (BNode ":_genid1") (UNode "http://www.example.org/bar1") (LNode (TypedL "hello" "http://www.w3.org/2001/XMLSchema#int"))

        Triple (BNode ":_genid5") (UNode "http://www.example.org/foo0") (UNode "http://www.example.org/bar0")

        

        Just (Triple (UNode "ex:s1") (UNode "http://www.example.org/bar1") (UNode "ex:s1"))

        Use --quickcheck-replay=566405 to reproduce.

ToXML instance for RDF graphs

From #67 (comment) .

From xmlbf, we could get XML/RDF serialisation for free from a ToXml instance in rdf4h? I.e.

instance RdfSerializer XmlSerializer where ...

in a new file, src/Text/RDF/RDF4H/XmlSerialzer.hs .

URI validity check

Data.RDF.Types.isAbsoluteURI assumes that the URI will be valid (and uses fromJust), which is not guaranteed to be the case. The functions that use it directly also don't seem to allow a failure. Perhaps everything affected should be wrapped into Maybe, or something that would report the cause of a failure, in order to save potential debugging effort when it happens.

main :: IO ()
main = do
  -- empty list based RDF graph
  let myEmptyGraph = empty :: RDF TList

  triple1 = triple (unode "...") (unode "...") (unode "...")
  graph1 = addTriple myEmptyGraph triple1

  triple2 = triple (unode "...") (unode "...") (unode "...")
  graph2 = addTriple graph1 triple2

  graph3 = removeTriple graph2 triple1

  putStrLn (showGraph graph3)

Is it currently possible to simply combine those operations (addTriple, removeTriple) without creating temporary variables (graph1, graph2) ?

Something in monadic style would be nice. For example:

createGraph = do
  let triple1 = triple (unode "...") (unode "...") (unode "...")
  addTriple $ triple1
  addTriple $ triple (unode "...") (unode "...") (unode "...")
  removeTriple $ triple1