innovimax-sarl / quixdm

QuiXDM is a ubiquitous open-source implementation of a streaming data model to process XML, JSON, YAML, RDF, CSV and HTML.

Home Page: http://innovimax-sarl.github.io/QuiXDM/

License: Apache License 2.0

Java 100.00%
java xdm xml stream-interface jackson rdf yaml csv json

quixdm's Introduction

QuiXDM

QuiXDM is a ubiquitous open-source data model for processing XML, JSON, YAML, RDF, CSV and HTML in a streaming fashion.

Getting Started

To install it

Why QuiXDM?

SAX, StAX, DOM, Jackson, Jena, CSVParser and HTMLParser already exist for processing data. Here is how some of them compare with QuiXDM:

Feature \ API         | SAX           | StAX          | DOM           | Jackson        | QuiXDM
in memory / streaming | streaming     | streaming     | in memory     | streaming      | streaming
push / pull           | push          | pull          | --            | pull           | pull
data model            | low-level XML | low-level XML | low-level XML | low-level JSON | XPath Data Model
handle sequence       | no            | no            | no            | no             | yes
handle JSON/YAML      | no            | no            | no            | yes            | yes
handle RDF            | no            | no            | no            | no             | yes
handle CSV            | no            | no            | no            | no             | yes
handle HTML           | no            | no            | no            | no             | yes

How does it work?

It uses a consistent datamodel to represent all those contents in streaming.

// Here is the grammar of events
sequence       := START_SEQUENCE, (document|json_yaml|table|semantic)*, END_SEQUENCE
document       := START_DOCUMENT, (PROCESSING-INSTRUCTION|COMMENT)*, element, (PROCESSING-INSTRUCTION|COMMENT)*, END_DOCUMENT
json_yaml      := START_JSON, object, END_JSON
table          := START_TABLE, header*, array_of_array, END_TABLE
semantic       := START_RDF, statement*, END_RDF
element        := START_ELEMENT, (NAMESPACE|ATTRIBUTE)*, (TEXT|element|PROCESSING-INSTRUCTION|COMMENT)*, END_ELEMENT
object         := START_OBJECT, (KEY_NAME, value)*, END_OBJECT
value          := object|array|flat_value
flat_value     := VALUE_FALSE|VALUE_TRUE|VALUE_NUMBER|VALUE_NULL|VALUE_STRING
array          := START_ARRAY, value*, END_ARRAY
array_of_array := START_ARRAY, flat_array+, END_ARRAY
flat_array     := START_ARRAY, flat_value*, END_ARRAY
statement      := START_PREDICATE, SUBJECT, OBJECT, GRAPH?, END_PREDICATE
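
The JSON/YAML rules of this grammar can be read as a small recursive-descent checker, with one method per rule. The following is only a sketch: `SketchToken` and `JsonYamlGrammarSketch` are hypothetical names made up for illustration, not the real QuiXToken enum or any QuiXDM class, and only the `json_yaml`, `object`, `array`, `value` and `flat_value` rules are covered.

```java
import java.util.List;

// Hypothetical token names mirroring the JSON/YAML part of the grammar;
// this is NOT the real QuiXToken enum.
enum SketchToken {
    START_JSON, END_JSON, START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY,
    KEY_NAME, VALUE_FALSE, VALUE_TRUE, VALUE_NUMBER, VALUE_NULL, VALUE_STRING
}

// Recursive-descent checker: one method per grammar rule.
final class JsonYamlGrammarSketch {
    private final List<SketchToken> tokens;
    private int pos = 0;

    private JsonYamlGrammarSketch(List<SketchToken> tokens) {
        this.tokens = tokens;
    }

    // json_yaml := START_JSON, object, END_JSON
    static boolean isValid(List<SketchToken> tokens) {
        JsonYamlGrammarSketch p = new JsonYamlGrammarSketch(tokens);
        return p.accept(SketchToken.START_JSON) && p.object()
                && p.accept(SketchToken.END_JSON) && p.pos == tokens.size();
    }

    // Returns the current token, or null at end of input.
    private SketchToken peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // Consumes the current token only if it is the expected one.
    private boolean accept(SketchToken expected) {
        if (peek() == expected) {
            pos++;
            return true;
        }
        return false;
    }

    // object := START_OBJECT, (KEY_NAME, value)*, END_OBJECT
    private boolean object() {
        if (!accept(SketchToken.START_OBJECT)) return false;
        while (accept(SketchToken.KEY_NAME)) {
            if (!value()) return false;
        }
        return accept(SketchToken.END_OBJECT);
    }

    // array := START_ARRAY, value*, END_ARRAY
    private boolean array() {
        if (!accept(SketchToken.START_ARRAY)) return false;
        while (peek() != null && peek() != SketchToken.END_ARRAY) {
            if (!value()) return false;
        }
        return accept(SketchToken.END_ARRAY);
    }

    // value := object | array | flat_value
    private boolean value() {
        SketchToken t = peek();
        if (t == null) return false;
        switch (t) {
            case START_OBJECT: return object();
            case START_ARRAY:  return array();
            case VALUE_FALSE: case VALUE_TRUE: case VALUE_NUMBER:
            case VALUE_NULL: case VALUE_STRING:
                pos++; // flat_value: a single scalar token
                return true;
            default: return false;
        }
    }
}
```

For instance, the token sequence START_JSON, START_OBJECT, KEY_NAME, VALUE_NUMBER, END_OBJECT, END_JSON is accepted, while a KEY_NAME that is not followed by a value is rejected.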

For the event types themselves, mostly look at QuiXToken.java.

Use

With Object creation (à la javax.xml.stream.XMLEventReader)

The simplest way to use it is to instantiate innovimax.quixproc.datamodel.in.QuiXEventStreamReader:

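import java.io.File;
import java.util.Arrays;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;

// Assumption: Source is javax.xml.transform.Source, so each path is wrapped in a StreamSource.
Iterable<Source> sources = Arrays.<Source>asList(
		new StreamSource(new File("/tmp/file/file_aaa.xml")),
		new StreamSource(new File("/tmp/file/file_aab.json")),
		new StreamSource(new File("/tmp/file/file_aac.csv")),
		new StreamSource(new File("/tmp/file/file_aad.yml")),
		new StreamSource(new File("/tmp/file/file_aae.n3"))
);
QuiXEventStreamReader qesr = new QuiXEventStreamReader(sources);
// Pull events one by one until every source is exhausted.
while (qesr.hasNext()) {
	System.out.println(qesr.next());
}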

Lightweight iterator without Object creation (à la javax.xml.stream.XMLStreamReader)

TODO

QuiXCharStream and QuiXQName come from the fact that a streaming interface for XML should really be streaming. The truth is that there is no such character-streaming interface in Java:

  • String is definitely not streamable and is limited to 2^31 - 1 characters
  • CharSequence, which could have been streamable, is not either, because it has length()
  • CharIterator doesn't exist in the JDK (third-party libraries provide one)
  • CharSequence.chars() returns an IntStream (there is no CharStream, because the Java 8 designers chose not to add one)
  • A Java 8 Stream of characters (Stream<Character>) boxes every char, which is highly inefficient
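
To make the points above concrete, here is a minimal sketch of what a primitive, length-free character iterator could look like. `CharIteratorSketch`, `ReaderCharIterator` and `CharStreamingDemo` are hypothetical names invented for this illustration; they are not QuiXCharStream and do not reflect its actual API. The sketch only relies on `java.io.Reader` and `CharSequence.chars()` from the JDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;

// Hypothetical primitive iterator over chars: no length(), no boxing.
interface CharIteratorSketch {
    boolean hasNext();
    char nextChar();
}

// Adapts any Reader to the iterator, holding one char of lookahead.
final class ReaderCharIterator implements CharIteratorSketch {
    private final Reader reader;
    private int lookahead; // -1 once the reader is exhausted

    ReaderCharIterator(Reader reader) {
        this.reader = reader;
        this.lookahead = read();
    }

    private int read() {
        try {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return lookahead >= 0;
    }

    @Override
    public char nextChar() {
        if (lookahead < 0) throw new NoSuchElementException();
        char c = (char) lookahead;
        lookahead = read();
        return c;
    }
}

final class CharStreamingDemo {
    // Counts occurrences of a char without materializing the input as a String.
    static long count(CharIteratorSketch it, char wanted) {
        long n = 0;
        while (it.hasNext()) {
            if (it.nextChar() == wanted) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Streaming count over a Reader: prints 3
        System.out.println(count(new ReaderCharIterator(new StringReader("banana")), 'a'));
        // For comparison, CharSequence.chars() yields an IntStream of char values: prints 3
        System.out.println("banana".chars().filter(c -> c == 'a').count());
    }
}
```

Because the iterator never exposes a length, the underlying Reader can be arbitrarily large; only one char of lookahead is kept in memory.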

Given this context, QuiXCharStream and QuiXQName were introduced in order to:

  • be able to address the TEXT recombination issue (XDM does not allow two adjacent text() nodes)
  • be able to stream even corner-case XML:
    • huge strings
    • huge names
    • huge namespace URIs
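
The TEXT recombination issue can be illustrated with a naive approach: wrap the event iterator and merge runs of adjacent TEXT events into one. This is a sketch under stated assumptions, not QuiXDM's implementation; `SketchEvent` and `TextCoalescingIterator` are hypothetical names. Note that this naive version buffers the merged text in a StringBuilder, so it stops being streaming for huge strings, which is the kind of corner case listed above.

```java
import java.util.Iterator;

// Hypothetical minimal event: a type plus an optional text payload.
final class SketchEvent {
    enum Type { START_ELEMENT, TEXT, END_ELEMENT }

    final Type type;
    final String text;

    SketchEvent(Type type, String text) {
        this.type = type;
        this.text = text;
    }

    @Override
    public String toString() {
        return type == Type.TEXT ? "TEXT(" + text + ")" : type.name();
    }
}

// Wraps an event iterator and merges runs of adjacent TEXT events into one.
final class TextCoalescingIterator implements Iterator<SketchEvent> {
    private final Iterator<SketchEvent> source;
    private SketchEvent pending; // one event of lookahead, or null

    TextCoalescingIterator(Iterator<SketchEvent> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return pending != null || source.hasNext();
    }

    @Override
    public SketchEvent next() {
        SketchEvent current = pending != null ? pending : source.next();
        pending = null;
        if (current.type != SketchEvent.Type.TEXT) return current;
        // Naive merge: buffers the whole text run in memory.
        StringBuilder merged = new StringBuilder(current.text);
        while (source.hasNext()) {
            SketchEvent following = source.next();
            if (following.type != SketchEvent.Type.TEXT) {
                pending = following; // keep it for the next call
                break;
            }
            merged.append(following.text);
        }
        return new SketchEvent(SketchEvent.Type.TEXT, merged.toString());
    }
}
```

Feeding it START_ELEMENT, TEXT(to), TEXT(to), END_ELEMENT yields START_ELEMENT, TEXT(toto), END_ELEMENT.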

Contributors

Innovimax is contributing to this work

Related Projects

QuiXDM can be used standalone.

It is also the data model of two bigger projects, QuiXPath and QuiXProc.

quixdm's Issues

java.lang.IllegalStateException in the Load class

I got an exception while reading the following document with the Load class:
<a><a>toto</a><b></b></a>

public void test() throws XMLStreamException, QuixException {
  final Load load = new Load(new ByteArrayInputStream("<a><a>toto</a><b></b></a>".getBytes()), "");
  while (load.hasNext()) {
    load.next();
  }
}

java.lang.IllegalStateException: Current state END_ELEMENT is not among the statesCHARACTERS, COMMENT, CDATA, SPACE, ENTITY_REFERENCE, DTD valid for getText()
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.getText(Unknown Source)
    at innovimax.quixproc.datamodel.Load.updateText(Load.java:169)
    at innovimax.quixproc.datamodel.Load.next(Load.java:115)
    at com.quixpath.tests.quixdm.Bugs.test(Bugs.java:26)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at junit.framework.TestCase.runTest(TestCase.java:168)
    at junit.framework.TestCase.runBare(TestCase.java:134)
    at junit.framework.TestResult$1.protect(TestResult.java:110)
    at junit.framework.TestResult.runProtected(TestResult.java:128)
    at junit.framework.TestResult.run(TestResult.java:113)
    at junit.framework.TestCase.run(TestCase.java:124)
    at junit.framework.TestSuite.runTest(TestSuite.java:232)
    at junit.framework.TestSuite.run(TestSuite.java:227)
    at junit.framework.TestSuite.runTest(TestSuite.java:232)
    at junit.framework.TestSuite.run(TestSuite.java:227)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:49)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)


Original issue reported on code.google.com by [email protected] on 24 Aug 2011 at 10:28
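
The exception message says that getText() was called while the StAX reader was positioned on END_ELEMENT. As an illustration only (this is not the code of Load.updateText, which is not shown in the report), here is a sketch that uses the JDK's StAX API on the same document and calls getText() only for event types that carry text. `GetTextGuardSketch` is a hypothetical class name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class GetTextGuardSketch {
    // Collects text content, calling getText() only in states where StAX allows it.
    static List<String> collectText(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> texts = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA
                        || event == XMLStreamConstants.SPACE) {
                    texts.add(reader.getText());
                }
                // START_ELEMENT, END_ELEMENT, etc. carry no text: getText() is skipped.
            }
        } finally {
            reader.close();
        }
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(collectText("<a><a>toto</a><b></b></a>"));
    }
}
```

With this guard, the empty element `<b></b>` simply contributes no text instead of triggering an IllegalStateException.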

XMLInputFactory.newFactory - java.lang.NoSuchMethodError

What steps will reproduce the problem?
I installed QuiXDM on a new machine. I compiled the code but I got a problem at 
run-time:

java.lang.NoSuchMethodError: 
javax.xml.stream.XMLInputFactory.newFactory()Ljavax/xml/stream/XMLInputFactory;
at innovimax.quixproc.datamodel.Load.<init>(Load.java:38)

I never had this problem before.

Some hints:
 - The JVM of the new machine is OpenJDK 1.6. To my knowledge, the javax package is standard, but it could be the problem.
 - According to my search, the newFactory method was introduced in Java 6. It does not exist in Java 5. As I use OpenJDK 1.6, it should not be a problem...



Original issue reported on code.google.com by [email protected] on 30 Aug 2011 at 3:14
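
The report does not establish the cause. One possible cause of this kind of NoSuchMethodError is an older copy of the StAX API on the classpath shadowing the JDK's own. The following diagnostic sketch (with the hypothetical class name `StaxDiagnosticsSketch`) prints where the XMLInputFactory class was loaded from, and shows XMLInputFactory.newInstance(), which has been part of the StAX API since its first version, as a possible alternative to newFactory().

```java
import java.security.CodeSource;
import javax.xml.stream.XMLInputFactory;

final class StaxDiagnosticsSketch {
    // Reports where the XMLInputFactory API class was loaded from.
    // A jar location here (rather than the JDK itself) would suggest that a
    // separate StAX API jar on the classpath is being used.
    static String apiLocation() {
        CodeSource source = XMLInputFactory.class.getProtectionDomain().getCodeSource();
        return source == null
                ? "JDK (no code source reported)"
                : String.valueOf(source.getLocation());
    }

    // newInstance() is available even where newFactory() is not.
    static XMLInputFactory createFactory() {
        return XMLInputFactory.newInstance();
    }

    public static void main(String[] args) {
        System.out.println("XMLInputFactory API loaded from: " + apiLocation());
        System.out.println("Implementation: " + createFactory().getClass().getName());
    }
}
```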

Load.hasNext() method

What steps will reproduce the problem?
Use the Load class to get a Stream of events for an InputStream. For example, 
the input stream can be <doc></doc> 

What is the expected output? What do you see instead?
The stream of events is empty.

Please provide any additional information below.
The first call to Load.hasNext() returns false and it should return true. The current code is 'this.state == State.FINISH;'. It should be 'this.state != State.FINISH;'. Right?

Original issue reported on code.google.com by [email protected] on 24 Aug 2011 at 8:49
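
The proposed condition can be illustrated with a tiny state-driven iterator. This is a sketch, not the Load class: `StateReaderSketch`, the START and CONTENT states and the event strings are hypothetical; only State.FINISH is taken from the report.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical three-state reader; apart from State.FINISH, which is quoted
// in the report, none of these names come from the real Load class.
final class StateReaderSketch implements Iterator<String> {
    enum State { START, CONTENT, FINISH }

    private State state = State.START;

    @Override
    public boolean hasNext() {
        // More events remain as long as the final state has NOT been reached.
        return this.state != State.FINISH;
    }

    @Override
    public String next() {
        switch (state) {
            case START:
                state = State.CONTENT;
                return "START_DOCUMENT";
            case CONTENT:
                state = State.FINISH;
                return "END_DOCUMENT";
            default:
                throw new NoSuchElementException();
        }
    }
}
```

With `==` instead of `!=`, hasNext() would return false on the very first call and the consumer would see an empty stream, which matches the behaviour described in the report.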

QuixStreamReader - This should never happen since attribute are processed

Use the QuixStreamReader with the following stream:
QuixEvent.getStartDocument("");
QuixEvent.getStartElement("doc", "");
QuixEvent.getStartElement("a", "");
QuixEvent.getAttribute("id", "", "0");
QuixEvent.getAttribute("att", "", "att0");
QuixEvent.getEndElement("a", "");
QuixEvent.getStartElement("a", "");
QuixEvent.getAttribute("id", "", "1");
QuixEvent.getAttribute("att", "", "att1");
QuixEvent.getEndElement("a", "");
QuixEvent.getEndElement("doc", "");
QuixEvent.getEndDocument("");

If an element has two attributes, the second attribute is not returned by the next method. In the previous example, there is no problem with 'id' but there is a problem with 'att'.

In debug mode, I saw that the following code is reached (in the next method):
case ATTRIBUTE: 
  // This should never happen since attribute are processed
  // DO NOTHING
  break;

In my opinion, the bug is in the START_ELEMENT case. The infinite loop 'while (true)' is exited as soon as the future event is not a namespace. In the previous example, since an attribute is not a namespace, the loop is exited after the first attribute.

The correct code could be:
if (future.isAttribute()) {
  attributes.add(future.asAttribute());
} else if (future.isNamespace()) { // new: the 'else' is added here
  namespaces.add(future.asNamespace());
} else {
  break;
}

Original issue reported on code.google.com by [email protected] on 24 Aug 2011 at 12:02
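
The suggested fix can be tried in isolation. The sketch below is not the QuixStreamReader code: `StartElementLookaheadSketch`, `Kind` and `Event` are hypothetical stand-ins for the real event classes. It shows a lookahead loop that keeps consuming events while they are attributes or namespaces and stops at the first event of any other kind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StartElementLookaheadSketch {
    // Hypothetical event kinds standing in for the real event classes.
    enum Kind { START_ELEMENT, ATTRIBUTE, NAMESPACE, END_ELEMENT }

    static final class Event {
        final Kind kind;
        final String name;

        Event(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    // Consumes every ATTRIBUTE and NAMESPACE event that follows a START_ELEMENT,
    // stopping at the first event of any other kind.
    // Returns two lists: attribute names first, then namespace names.
    static List<List<String>> collect(List<Event> eventsAfterStart) {
        List<String> attributes = new ArrayList<>();
        List<String> namespaces = new ArrayList<>();
        for (Event future : eventsAfterStart) {
            if (future.kind == Kind.ATTRIBUTE) {
                attributes.add(future.name);
            } else if (future.kind == Kind.NAMESPACE) {
                namespaces.add(future.name);
            } else {
                break; // first non-attribute, non-namespace event ends the start tag
            }
        }
        return Arrays.asList(attributes, namespaces);
    }
}
```

For the events ATTRIBUTE id, ATTRIBUTE att, END_ELEMENT a, both attributes are collected and the loop stops at END_ELEMENT.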
