mathmlcan's People

Contributors

davidluptak, dependabot[bot], formanek, jakubadler, jimmyli97, michal-ruzicka, petrsojka, physikerwelt, robsis, witiko

mathmlcan's Issues

WstxEOFException: Unexpected EOF in prolog

Hello, I am getting a strange exception when trying to index some documents:

May 09, 2018 8:42:41 PM cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer execute
SEVERE: error while parsing the input file. 
com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
	at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
	at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
	at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.minimizeElements(ElementMinimizer.java:134)
	at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.execute(ElementMinimizer.java:84)
	at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.executeStreamModules(MathMLCanonicalizer.java:375)
	at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.canonicalize(MathMLCanonicalizer.java:326)
	at cz.muni.fi.mias.math.MathTokenizer.parseMathML(MathTokenizer.java:304)
	at cz.muni.fi.mias.math.MathTokenizer.processFormulae(MathTokenizer.java:280)
	at cz.muni.fi.mias.math.MathTokenizer.reset(MathTokenizer.java:246)
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:613)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1500)
	at cz.muni.fi.mias.indexing.Indexing.indexDocsThreaded(Indexing.java:145)
	at cz.muni.fi.mias.indexing.Indexing.indexFiles(Indexing.java:89)
	at cz.muni.fi.mias.MIaS.main(MIaS.java:39)

I run the following command:

java -jar MIaS-1.6.6-4.10.4-SNAPSHOT.jar -conf ~/sandbox/mias/mias.properties -overwrite ~/sandbox/mias/data/samples/sample-mathml.xhtml ~/sandbox/mias/data/samples

My setup is:

MathMLCan: develop branch
MIaS: master branch
MIaSMath: master branch

$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

The mias.properties configuration is:

INDEXDIR=~/sandbox/mias/indexes/index-0
UPDATE=false
THREADS=16
MAXRESULTS=10000
DOCLIMIT=-1
FORMULA_DOCUMENTS=true

The sample-mathml.xhtml file is as simple as:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
  </head>
  <body>
    <math>
      <mfrac linethickness="1">
        <!-- numerator -->
        <mrow>
          <mi> x </mi>
          <mo> + </mo>
          <mi mathcolor="red"> y </mi>
          <mo> + </mo>
          <mi> z </mi>
        </mrow>
        <!-- denominator -->
        <mrow>
          <mi> x </mi>
          <mphantom>
            <mo> + </mo>
            <mi> y </mi>
          </mphantom>
          <mo> + </mo>
          <mi> z </mi>
        </mrow>
      </mfrac>
      <mfenced open=":" close="?">
      </mfenced>
    </math>
  </body>
</html>

Can anyone help, or does anyone have an idea what the problem could be?
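
One thing that may be worth checking: an "Unexpected EOF in prolog" at [1,0] is what a StAX parser reports when it is handed a stream with no bytes left in it, for instance an empty string or a stream that has already been read to the end. Whether that is what happens inside ElementMinimizer here is an assumption, not something confirmed from the MathMLCan source. The sketch below reproduces the symptom with the JDK's built-in StAX API only (not Woodstox, so the exception message will differ); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper for illustration; not part of MathMLCan or MIaS. */
final class EmptyInputCheck {

    /** Returns true if a StAX parser rejects the given text as XML. */
    static boolean failsToParse(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            // Depending on the implementation, an empty stream may already
            // be rejected here or only on the first call to next().
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(bytes));
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return false;
        } catch (XMLStreamException e) {
            // With zero bytes of input the parser hits EOF before any prolog.
            return true;
        }
    }
}
```

Calling `EmptyInputCheck.failsToParse("")` should return true, while a small well-formed fragment such as `<math><mi>x</mi></math>` should parse cleanly. If the same check on the string that reaches the canonicalizer shows it is empty, the problem would lie upstream of the parser rather than in the sample file itself.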

Operator to identifier conversion (OperatorNormalizer module)

The opposite conversion (mo -> mi) should also be allowed, e.g. for function identifiers: exp sin cos tan tg cot cotan cotg ctg ctn sec csc cosec arcsin arccos arctan arccot arcsec arccsc sinh cosh tanh coth sech csch arcsinh arcosh artanh arcoth arsech arcsch log lg ln
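
A minimal sketch of what such a conversion could look like, assuming a plain JDK DOM tree rather than whatever tree API OperatorNormalizer actually uses. The class name, the method name and the shortened name list are all hypothetical; a real module would presumably read the list from its configuration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch; not the OperatorNormalizer implementation. */
final class FunctionNameSketch {

    // Shortened, illustrative subset of the function names listed above.
    private static final Set<String> FUNCTION_NAMES = new HashSet<>(
            Arrays.asList("exp", "sin", "cos", "tan", "log", "lg", "ln"));

    /** Replaces every mo whose text is a known function name by an mi. */
    static String convert(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // getElementsByTagName returns a live list, so take a snapshot
            // before modifying the tree.
            NodeList live = doc.getElementsByTagName("mo");
            List<Element> operators = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                operators.add((Element) live.item(i));
            }

            for (Element mo : operators) {
                if (!FUNCTION_NAMES.contains(mo.getTextContent().trim())) {
                    continue;
                }
                // Attributes are not carried over in this sketch.
                Element mi = doc.createElement("mi");
                while (mo.hasChildNodes()) {
                    mi.appendChild(mo.getFirstChild()); // moves the child node
                }
                mo.getParentNode().replaceChild(mi, mo);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("conversion failed", e);
        }
    }
}
```

For an input such as `<math><mo>sin</mo><mi>x</mi><mo>+</mo><mn>1</mn></math>`, the `sin` operator becomes an identifier while `+` stays an mo.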

Removing operators can break MathML validity

OperatorNormalizer should not remove an mo element if it is a required argument of its parent. Removing it can change the meaning of the formula and even turn valid input into invalid MathML.
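
A possible guard, sketched under the assumption that "required argument" means "child of an element that takes a fixed number of arguments" (mfrac, msub, msup and similar). The class, the method names and the exact set of parent elements are hypothetical, and the sketch uses the JDK DOM rather than the module's own tree API.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch; not the OperatorNormalizer implementation. */
final class RequiredArgumentSketch {

    // Elements whose children are positional arguments: dropping one child
    // shifts the others or leaves too few of them.
    private static final Set<String> FIXED_ARITY_PARENTS = new HashSet<>(Arrays.asList(
            "mfrac", "mroot", "msub", "msup", "msubsup",
            "munder", "mover", "munderover"));

    /** True if removing this mo would take away an argument of its parent. */
    static boolean isRequiredArgument(Element mo) {
        Node parent = mo.getParentNode();
        return parent instanceof Element
                && FIXED_ARITY_PARENTS.contains(((Element) parent).getTagName());
    }

    /** Counts the mo elements that are not positional arguments. */
    static int countOutsideFixedArityParents(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList operators = doc.getElementsByTagName("mo");
            int count = 0;
            for (int i = 0; i < operators.getLength(); i++) {
                if (!isRequiredArgument((Element) operators.item(i))) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }
}
```

In `<msup><mi>x</mi><mo>*</mo></msup>` the mo is the superscript argument, so removing it would leave msup with a single child; an mo inside an mrow is not affected by this check.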

TableConvertor module

A new module should convert between table representations and their equivalents, e.g. binomial coefficients can be expressed either with mtable elements or with mfrac elements with linethickness=0.
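
To make one direction of the conversion concrete, here is a rough sketch that rewrites an mfrac with linethickness="0" into a two-row mtable. It is only an illustration on top of the JDK DOM: the class and method names are hypothetical, only the literal attribute value "0" is recognised, and which of the two forms should be the canonical one is left open.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch of one TableConvertor direction. */
final class BinomialTableSketch {

    /** Rewrites mfrac linethickness="0" as an mtable with two rows. */
    static String fracToTable(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Snapshot the matching fractions; the NodeList itself is live.
            NodeList live = doc.getElementsByTagName("mfrac");
            List<Element> fractions = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                Element frac = (Element) live.item(i);
                if ("0".equals(frac.getAttribute("linethickness"))) {
                    fractions.add(frac);
                }
            }

            for (Element frac : fractions) {
                List<Element> args = elementChildren(frac);
                if (args.size() != 2) {
                    continue; // not a well-formed fraction, leave it alone
                }
                Element table = doc.createElement("mtable");
                for (Element arg : args) {
                    Element row = doc.createElement("mtr");
                    Element cell = doc.createElement("mtd");
                    cell.appendChild(arg); // moves the argument out of mfrac
                    row.appendChild(cell);
                    table.appendChild(row);
                }
                frac.getParentNode().replaceChild(table, frac);
            }

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("conversion failed", e);
        }
    }

    /** Element children only, skipping whitespace text and comments. */
    private static List<Element> elementChildren(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }
}
```

Fractions with a visible line are left untouched, so ordinary mfrac elements in the same formula are not affected.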

MrowNormalizer - fenced expression format

Apart from removing redundant mrow elements, MrowNormalizer also adds mrow elements to bring detected fenced expressions into the form:

<mrow><mo>(</mo><mrow> ... </mrow><mo>)</mo></mrow>

I'm not sure which should take precedence when this format breaks the rules for mrow minimization (e.g. the inner mrow contains only one element). Should the redundant elements be removed, or should the exact format be kept?
