
fred-wang / texzilla


LALR JavaScript LaTeX-to-MathML converter compatible with Unicode

Home Page: http://fred-wang.github.io/TeXZilla/

JavaScript 10.86% Python 17.68% HTML 18.80% XSLT 3.70% Shell 3.75% Makefile 3.86% M4 0.79% Yacc 38.87% Lex 1.68%

texzilla's Introduction

TeXZilla

License

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

Description

TeXZilla is a JavaScript LaTeX-to-MathML converter compatible with Unicode. According to recent research in this field (see [1]), it performed as the fastest state-of-the-art LaTeX-to-MathML converter. This is still a work in progress and things may change in the future. Please report any bugs you find to the issue tracker.

For a quick overview, you can try a live demo, install a Firefox add-on or try a Firefox OS webapp.

You can download a release archive or install an npm package.

Please read the wiki to get more information on how to integrate TeXZilla in your Web page or project as well as a description of the TeXZilla syntax. See also the examples/ directory.

Build Instructions

The following dependencies are required:

On Debian-based Linux distributions, try

  sudo apt-get install coreutils sed curl make xsltproc python npm phantomjs bash closure-compiler

and install Jison with

  npm install jison -g

To build TeXZilla, run the tests and generate the minified version:

  ./configure
  make all
  make minify

Type make help for more commands.

References

[1] "Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context" by M. Schubotz, et al. In: Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL). Fort Worth, USA, June 2018. DOI:10.1145/3197026.3197058

texzilla's People

Contributors

andreg-p, bkardell, fred-wang, runarberg


texzilla's Issues

\hline is not supported

Is it a bug? Is there any alternative way to draw a horizontal line to separate rows in a matrix?

Parse error for some matrix environment

The amsmath package defines six matrix environments:

  • matrix,
  • pmatrix,
  • bmatrix,
  • Bmatrix,
  • vmatrix,
  • Vmatrix.

For bmatrix, Bmatrix and Vmatrix we get a parse error.

Steps to reproduce

  • Build the parser
  • Open index.html with Firefox.
  • Type in the box (you can replace bmatrix with Bmatrix or Vmatrix):
\begin{bmatrix}
a_11 & a_12 \\
a_21 & a_22
\end{bmatrix}
  • Press 'TAB'.

Do not use multiple values for tabular attributes (columnalign etc)

From https://mathml-refresh.github.io/mathml/chapter3.html#presm.mtable :

In the above specifications for attributes affecting rows (respectively, columns, or the gaps between rows or columns), the notation (...)+ means that multiple values can be given for the attribute as a space separated list (see Section 2.1.5 MathML Attribute Values). In this context, a single value specifies the value to be used for all rows (resp., columns or gaps). A list of values are taken to apply to corresponding rows (resp., columns or gaps) in order, that is starting from the top row for rows or first column (left or right, depending on directionality) for columns. If there are more rows (resp., columns or gaps) than supplied values, the last value is repeated as needed. If there are too many values supplied, the excess are ignored.

The parser could convert them to single values on mtd elements.
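The rule quoted above (repeat the last value when there are too few, ignore the excess) could be sketched as follows. This is only an illustration: expandAttributeValues is a hypothetical helper, not an existing TeXZilla function, and it assumes the number of columns is already known.

```javascript
// Hypothetical helper, not part of TeXZilla: resolve a space-separated
// attribute value list (e.g. columnalign="left center") to one value
// per column, following the rule quoted from the MathML specification.
function expandAttributeValues(aValueList, aCount) {
  var values = aValueList.trim().split(/\s+/);
  var result = [];
  for (var i = 0; i < aCount; i++) {
    // Fewer values than columns: repeat the last supplied value.
    // More values than columns: the excess is never read, so it is ignored.
    result.push(values[Math.min(i, values.length - 1)]);
  }
  return result;
}

// Each entry could then be written as a single-valued attribute on the
// corresponding mtd element.
var perColumn = expandAttributeValues("left center", 4);
```

Here perColumn holds one value for each of the four columns, with "center" repeated for the last three.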

Improve Travis Continuous Integration

I've connected the TeXZilla repository to Travis. After several attempts, I ended up doing "build only".

https://travis-ci.org/fred-wang/TeXZilla/builds

Ideally we should also:

  1. run the tests with nodejs
  2. run the tests with phantomjs
  3. run the tests with slimerjs (not available as a package for Ubuntu precise)
  4. run "make minify" (closure-compiler is not available as a package for Ubuntu precise, but it is in more recent versions)

Prime sign treated as superscript

Consider the input "A'". TeXZilla.toMathMLString("A'") gives:

<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msup><mi>A</mi><mo></mo></msup><annotation encoding="TeX">A'</annotation></semantics></math>

Unfortunately, the macro \prime seems unsupported. Moreover, typing the Unicode character "’" gives a parse error. These are all the workarounds I am aware of.

<math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><merror><mtext>Parse error on line 1:
\\(A’\\)
----^
</mtext></merror><annotation encoding=\"TeX\">A’</annotation></semantics></math>

Styling the page

Would it be fine if I styled this page to be a lot more visually friendly?

Abstract away the element/attribute structure

In general, the structure handled by TeXZilla is a string representing an element.

The serialization could be deferred and we could instead handle a structure containing:

  • an element name
  • a list of (attribute, value) pairs on that element.
  • a string representing the children

This would allow us to:

  • avoid duplicate attributes (not well-formed XML). As in itex2MML, this may happen with table attributes.
  • avoid useless mstyle or mrow elements. For example, the mathvariant or color attributes can just be set on token elements.
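A minimal sketch of such a structure, assuming attributes are kept in a map until serialization. The names newElement, setAttribute and serialize are hypothetical and chosen for illustration only; attribute values are not escaped here.

```javascript
// Hypothetical structure; these names are illustrative, not TeXZilla's API.
function newElement(aName, aChildren) {
  return { name: aName, attributes: {}, children: aChildren };
}

function setAttribute(aElement, aAttribute, aValue) {
  // Attributes live in a map, so a later value replaces an earlier one and
  // the serialized element never carries the same attribute twice.
  aElement.attributes[aAttribute] = aValue;
  return aElement;
}

function serialize(aElement) {
  // Serialization is deferred until the whole element is known.
  var tag = "<" + aElement.name;
  Object.keys(aElement.attributes).forEach(function (aKey) {
    tag += " " + aKey + "=\"" + aElement.attributes[aKey] + "\"";
  });
  return tag + ">" + aElement.children + "</" + aElement.name + ">";
}

// The attribute is set directly on the token element, with no mstyle wrapper.
var mi = newElement("mi", "x");
setAttribute(mi, "mathvariant", "bold");
setAttribute(mi, "mathvariant", "italic");
var xml = serialize(mi);
```

Setting mathvariant twice leaves a single attribute on the serialized mi element, which is the property the first bullet asks for.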

Support itex2MML commands

That should almost be the case.

  • Issues #12 and #5
  • Need an option to treat xy as <mi>xy</mi>
  • Need to write unit tests for each itex2MML command to ensure they are supported.

switch observer model in custom element?

I was thinking the other day that the observer model is kind of flawed for custom elements because the lifecycle is so wonky... for example, currently

document.body.innerHTML = `<la-tex>f(x)=\\sum_{n=-\\infty}^\\infty c_n e^{2\\pi i(n/T) x} = \\sum_{n=-\\infty}^\\infty \\hat{f}(\\xi_n) e^{2\\pi i\\xi_n x}\\Delta\\xi</la-tex>`

This will fail to fire (as it probably would if you included the element at the end of your document), because the nodes already exist and so are not treated 'as parsed'. I think I added it this way and just never added the workarounds. I think the simplest fix is to not use a MutationObserver directly at all and instead rely on the slotchange event, which seems to fire consistently regardless. Maybe we should switch to that? Something like this: https://glitch.com/edit/#!/mathml-examples?path=js/elements/la-tex.js:21:5 (though obviously we'd rework it a little to handle attributes correctly too). The snippet above will work if you try it in the console on https://mathml-examples.glitch.me/foundation-expansion.html

If we want I can work this up and send a pull...

Add an option to throw an exception when parsing fails

This was reported by @r-gaia-cs. My proposal is to add an optional parameter aThrowExceptionOnError:

toMathMLString = function(aTeX, aDisplay, aRTL, aThrowExceptionOnError)
toMathML = function(aTeX, aDisplay, aRTL, aThrowExceptionOnError)

so that when aThrowExceptionOnError = true, TeXZilla will throw an exception instead of returning an merror element.
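The proposed behavior could look roughly like the sketch below. It is not TeXZilla's implementation: parseStub is a hypothetical stand-in for the generated parser, and aDisplay and aRTL are accepted only to mirror the proposed signature.

```javascript
// Hypothetical stand-in for the generated parser: it accepts only the
// input "x" and throws on anything else.
function parseStub(aTeX) {
  if (aTeX !== "x") {
    throw new Error("Parse error on line 1");
  }
  return "<math><mi>x</mi></math>";
}

// aDisplay and aRTL are unused in this sketch.
function toMathMLStringSketch(aTeX, aDisplay, aRTL, aThrowExceptionOnError) {
  try {
    return parseStub(aTeX);
  } catch (e) {
    if (aThrowExceptionOnError) {
      // The caller asked for exceptions: propagate the parse error.
      throw e;
    }
    // Default behavior: report the error inside an merror element.
    return "<math><merror><mtext>" + e.message + "</mtext></merror></math>";
  }
}
```

Because the new parameter comes last and defaults to a falsy value, existing callers that pass one to three arguments would keep getting an merror element.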

Tilde for a non-breaking space is not supported

The tilde character ~ should be treated as a non-breaking space. See the LaTeX FAQ. Original issue: josdejong/mathjs#1299

Input

x~y~z

Expected output

<mi>x</mi>
<mspace linebreak="nobreak" width="mediummathspace" />
<mi>y</mi>
<mspace linebreak="nobreak" width="mediummathspace" />
<mi>z</mi>

Actual output

<mi>x</mi>
<mo stretchy="false">~</mo>
<mi>y</mi>
<mo stretchy="false">~</mo>
<mi>z</mi>

Strange value for lspace

Line 493 of TeXZilla.jison has

$$ = newTag("mpadded", $2, "width=\"0em\" lspace=\"-100%width\"");

Is the value for lspace correct?

Strict Mode

$ git log HEAD -1
commit 71f92bc764761fb69db8030e5d548ddcc83d98ca
Author: Frédéric Wang <[email protected]>
Date:   Sun May 31 12:59:01 2015 +0200

    Travis: Just run "make build" without testing for now...
$ texzilla
/usr/local/lib/texzilla-js/TeXZilla.js:876
        function lex() {
        ^^^^^^^^
SyntaxError: In strict mode code, functions can only be declared at top level or immediately within another function.
    at exports.runInThisContext (vm.js:73:16)
    at Module._compile (module.js:443:25)
    at Object.Module._extensions..js (module.js:478:10)
    at Module.load (module.js:355:32)
    at Function.Module._load (module.js:310:12)
    at Function.Module.runMain (module.js:501:10)
    at startup (node.js:124:16)
    at node.js:842:3
$ node --version
v0.11.14

Can't resolve 'webserver'

Hey,
when packing our site, which depends on texzilla, with webpack, we run into #53. If we then add "system" as a dependency, we get this warning message instead:

Module not found: Error: Can't resolve 'webserver' in './node_modules/texzilla'
 @ ./node_modules/texzilla/TeXZilla.js
...

It seems like both this issue and #53 still need to be fixed. In #53 it sounds like you are not planning to fix it here - so maybe somewhere else?

Java JDK7 Rhino ScriptEngine issues

Hello,
I've been using TeXZilla as a library in Java ScriptEngine API,
(http://docs.oracle.com/javase/7/docs/technotes/guides/scripting/programmer_guide/)
I found a couple of issues with TeXZilla 0.9.7 when running on the following engine:

JS engine name: Rhino
version: Rhino 1.7 release 0.7.r2.2.el6 2010 08 21
language version: 1.7

This is the default engine for Java 1.7

  1. Math object is sealed, cannot add the log2 property
  2. Object.getPrototypeOf doesn't exist

I've attached a quick patch that fixes the issues.

Improve error message

The default error message doesn't help very much.

Steps to reproduce

  1. Try to parse \frac{1}.

It shows

Parse error on line 1:
\frac{1}
--------^
Expecting '{', 'LEFT', 'OPFS', '.', 'BIG', 'BBIG', 'BIGG', 'BBIGG', 'BIGL', 'BBIGL', 'BIGGL', 'BBIGGL', 'NUM', 'TEXT', 'A', 'F', 'MI', 'MN', 'MO', 'OP', 'OPS', 'OPAS', 'MS', 'MTEXT', 'HIGH_SURROGATE', 'BMP_CHARACTER', 'OPERATORNAME', 'MATHOP', 'MATHBIN', 'MATHREL', 'FRAC', 'ROOT', 'SQRT', 'UNDERSET', 'OVERSET', 'UNDEROVERSET', 'XARROW', 'MATHRLAP', 'MATHLLAP', 'MATHCLAP', 'PHANTOM', 'TFRAC', 'BINOM', 'TBINOM', 'PMOD', 'UNDERBRACE', 'UNDERLINE', 'OVERBRACE', 'ACCENT', 'ACCENTNS', 'BOXED', 'SLASH', 'QUAD', 'QQUAD', 'NEGSPACE', 'NEGMEDSPACE', 'NEGTHICKSPACE', 'THINSPACE', 'MEDSPACE', 'THICKSPACE', 'SPACE', 'MATHRAISEBOX', 'MATHBB', 'MATHBF', 'MATHBIT', 'MATHSCR', 'MATHBSCR', 'MATHSF', 'MATHFRAK', 'MATHIT', 'MATHTT', 'MATHRM', 'HREF', 'STATUSLINE', 'TOOLTIP', 'TOGGLE', 'BTOGGLE', 'TENSOR', 'MULTI', 'BMATRIX', 'BGATHERED', 'BPMATRIX', 'BBMATRIX', 'BVMATRIX', 'BBBVMATRIX', 'BVVVMATRIX', 'BSMALLMATRIX', 'BCASES', 'BALIGNED', 'BARRAY', 'SUBSTACK', 'ARRAY', got 'EOF'\frac{1}

Comparison with LaTeX

$ cat e.tex
\documentclass{article}
\begin{document}
$\frac{1}$
\end{document}
$ pdflatex -interaction=nonstopmode e
...
! Missing } inserted.
<inserted text> 
                }
l.3 $\frac{1}$

! Too many }'s.
\frac #1#2->{\begingroup #1\endgroup \over #2}

l.3 $\frac{1}$
...

Solution

Replace Jison's parser.parseError as suggested by @zaach here and here.

Use non-breaking space in token elements

Currently, TeXZilla collapses whitespace in token elements:
https://github.com/fred-wang/TeXZilla/blob/master/TeXZilla.jison#L235

Perhaps instead of calling trim(), the leading/trailing space should be replaced by a non-breaking space (U+00A0). This can be done by modifying the trim polyfill here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/Trim#Compatibility

So

$1.replace(/^\s+|\s+$/g, "\u00A0").replace(/\s+/g, " ");

will work I think.
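One thing to check in the snippet above: in ECMAScript 5 and later, \s also matches U+00A0, so the second replace would turn the freshly inserted non-breaking spaces back into regular spaces. A possible variant, offered as a sketch only (normalizeTokenWhitespace is a hypothetical name), collapses whitespace first and protects the edges afterwards:

```javascript
// Hypothetical helper, not TeXZilla code.
function normalizeTokenWhitespace(aString) {
  return aString
    // Collapse every whitespace run (U+00A0 included) to one regular space.
    .replace(/\s+/g, " ")
    // Then turn a single leading and/or trailing space into U+00A0.
    .replace(/^ | $/g, "\u00A0");
}

var normalized = normalizeTokenWhitespace("  x   y  ");
```

With this ordering the leading and trailing characters stay U+00A0 while inner whitespace runs still collapse to a single space.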

Escape Text

In TeXZilla.jison we have

function escapeText(aString)
{
  /* Escape reserved XML characters for use as text nodes. */
  return aString.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

Should we replace > with &gt;?
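For reference, XML does not require escaping > in text nodes except in the sequence ]]>, which must not appear literally in character data. Escaping it everywhere is the simplest way to cover that case. A sketch of the extended function, named escapeTextSketch here to make clear it is not the existing code:

```javascript
function escapeTextSketch(aString)
{
  /* Escape reserved XML characters for use as text nodes.
     & must be replaced first so the entities added below are not re-escaped. */
  return aString.replace(/&/g, "&amp;")
                .replace(/</g, "&lt;")
                .replace(/>/g, "&gt;");
}
```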

Use `phantom.injectJs` instead of `require`

The file commonJS.js exists because of the line

var parser = require("./TeXZilla").parser;

in unit-tests.js.

What about replacing this line with

phantom.injectJs("./TeXZilla.js");

and removing the file commonJS.js?

Arabic characters are not accepted

Using Arabic characters (in either regular or math blocks) causes a parse error, e.g.:

ط^٢

No problem with digits, or Latin math characters.

Do not make uppercase Greek letters italic

For example \alpha generates α which has automatic mathvariant=italic. LaTeX does not seem to do that by default so we should add an explicit mathvariant="normal".

mtext test fail

This depends on #31

$ make tests
....
Test 26... FAIL
TeXZilla.toMathMLString, unexpected result:
  Actual: '<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtext> x y </mtext><annotation encoding="TeX">\\mtext{  x   y  }</annotation></semantics></math>'
  Expected: '<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtext> x y </mtext><annotation encoding="TeX">\\mtext{  x   y  }</annotation></semantics></math>'
...

Merge master and gh-pages

I was wondering: since GitHub now allows any branch to be hosted and gh-pages is pretty much the same as the master branch, perhaps it would be a good idea to delete one of these branches (preferably gh-pages).

Structure not converted to string

This is related to issue #10 and introduced with r-gaia-cs@635732f.

Some structures/objects aren't converted to string properly:

Test 19... FAIL
  Actual: '<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mroot>[object Object][object Object]</mroot><annotation encoding="TeX">\\sqrt[3]x</annotation></semantics></math>'
  Expected: '<math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mroot><mi>x</mi><mn>3</mn></mroot><annotation encoding="TeX">\\sqrt[3]x</annotation></semantics></math>'

Can't resolve system

I installed TeXZilla using NPM. When I tried to import it, this error appeared:

./node_modules/texzilla/TeXZilla.js
Module not found: Can't resolve 'system' in '/home/m93a/Dokumenty/Vývoj/punk-tex/node_modules/texzilla'

Improve expression grouping using operator precedence

See http://www.w3.org/Math/MathML3/chapter3.html#id.3.3.1.3 for a discussion of grouping of sub-expressions.

Expression grouping was removed in 2f3e5f4 because it didn't work very well and was a serious performance burden for Jison.

I believe that, in general, correct operator grouping is not possible because one has to infer some semantics from the TeX source, especially when fences are involved, as in y = { [0,1) ; (2,3] }^2.

However, I think it would still be possible to improve the grouping via operator precedence:

  • do this in newMrow instead of relying on the grammar's operator precedence
  • only do this for a restricted set of operators like unary/binary operators and relations.
  • fences should not be involved or only at the beginning and end of the mrow

Cleanup the this.parseError code

TeXZilla generates the following code

    if (true) {
         this.parseError = parseError;
     } else {
    }

in order to avoid an error with Object.getPrototypeOf in Rhino (issue #38). This could be cleaned up a bit further.
