transpect / docx2tex

Converts Microsoft Word docx to LaTeX

License: BSD 2-Clause "Simplified" License

XProc 43.67% Shell 6.28% XSLT 47.69% Batchfile 2.36%
latex docx office ooxml msword mtef mathtype omml

Contributors

fr4nze, gimsieke, lwittmar, md-5, mkraetke, polypunkt

docx2tex's Issues

possible error in some of the last commits

[low priority]
If I pull the current docx2tex code from the repository, the d2t script fails and I get the errors below in the .d2t.log.
This does not happen if I download release 1.1.
It is not a problem for me, but I thought I should report it.

ERROR: http://transpect.github.io/../index.html:1:107:Not a pipeline or library: html
ERROR: err:XS0044:Unexpected step name: tr:load-cascaded
ERROR: It is a static error if any element in the XProc namespace or any step has element children other than those specified for it by this specification. In particular, the presence of atomic steps for which there is no visible declaration may raise this error.

Citavi→BibTeX

Citavi 6 seems to store its references as base64-encoded JSON in field codes. There has been a request to transform them into BibTeX.
We will do this if we receive at least € 960 of external funding for it. The user who requested the feature is currently considering sponsorship (that is, committing to the full amount).

LaTeX Error: There's no line here to end.

I have a docx file that leads to the following pdflatex error (I can provide the file by private email).

Underfull \hbox (badness 10000) in paragraph at lines 36--36
\OT1/cmr/bx/n/10.95 mun-the-ra-pie

Underfull \hbox (badness 10000) in paragraph at lines 36--36
\OT1/cmr/bx/n/10.95 nicht ge-eig-net

Underfull \hbox (badness 10000) in paragraph at lines 36--36
[]|\OT1/cmr/bx/n/10.95 nicht

Underfull \hbox (badness 10000) in paragraph at lines 36--36
\OT1/cmr/bx/n/10.95 quan-ti-fi-

Underfull \hbox (badness 10000) in paragraph at lines 36--36
[]|\OT1/cmr/m/n/10.95 Idelalisib/Rituximab f[]uhrt zu ei-ner

Underfull \hbox (badness 3354) in paragraph at lines 36--36
\OT1/cmr/m/n/10.95 Verl[]angerung der pro-gres-si-ons-frei-en und des

Underfull \hbox (badness 3536) in paragraph at lines 36--36
\OT1/cmr/m/n/10.95 Ge-samt[]uberlebenszeit so-wie zu ei-ner Stei-ge-

Underfull \hbox (badness 10000) in alignment at lines 36--36
[][][]

! LaTeX Error: There's no line here to end.

See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...

l.40 \newline

Beginner's question on how to use docx2tex

Dear Sirs/Madams,
I need to convert a number of docx files to LaTeX so I have downloaded your tool on my xubuntu 19.04 laptop. Regrettably, when I try to run your code an error message is displayed:

$ ./d2t ~/Documents/Introduction.docx 
starting docx2tex
Errors encountered while running docx2tex. Please see /home/eidon/Documents/Introduction.d2t.log for details.
$ cat Introduction.d2t.log
./d2t: line 203: /home/eidon/packages/docx2tex-master/calabash/calabash.sh: No such file or directory

From this I understood that I needed to install calabash, which I did by running

$ java -jar xmlcalabash-1.1.27-99.jar

Despite this, the error is still there. Would you be so kind as to help me? Thank you very much!

Kind regards,
Eidon
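
The log points at a missing calabash/calabash.sh inside the checkout rather than at a missing Java library, so running the Calabash jar on its own would not be expected to help. A possible cause, assuming the download was a ZIP archive or a non-recursive clone, is that the git submodules were never fetched; another issue on this page installs with "git clone https://github.com/transpect/docx2tex --recursive". A minimal sketch of a check follows (the function name check_calabash is hypothetical and not part of d2t):

```shell
# Hypothetical helper: report whether the Calabash frontend is present
# in a docx2tex checkout. The path mirrors the one in the log above.
check_calabash() {
  root="$1"
  if [ -f "$root/calabash/calabash.sh" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Usage (assumes the current directory is the docx2tex checkout):
if [ "$(check_calabash .)" = "missing" ]; then
  echo "calabash/calabash.sh not found; if this is a git clone, try:"
  echo "  git submodule update --init --recursive"
fi
```

If the directory is not a git clone at all, cloning again with --recursive is probably the simpler route.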

hidden text not "tagged"

Desiderata

Text that has font effect "hidden" is translated as normal text, even if it does not display in a pdf generated from the docx and it does not display in the document when the "formatting symbol" button (¶) is not active.

In the following example, the word "bbb" is "hidden".
Would it be possible to have it translated into something like \@gobble{bbb}?

https://medialab.sissa.it/owncloud/index.php/s/VFtUaKfo3chdV82
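
In WordprocessingML, the "hidden" font effect is stored as a w:vanish element in the run properties. A rough way to see whether a document contains such runs, assuming unzip is installed, is to count those elements in word/document.xml. The function name count_hidden_runs is made up for this sketch; it only detects hidden runs and does not change what docx2tex emits.

```shell
# Count <w:vanish> markers in WordprocessingML read from stdin.
# Note: this also counts <w:vanish w:val="0"/>, which switches hiding off.
count_hidden_runs() {
  grep -o '<w:vanish[ />]' | wc -l | tr -d ' '
}

# Usage (file name is a placeholder):
#   unzip -p example.docx word/document.xml | count_hidden_runs
```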

d2t does not play well with filenames containing whitespace

./d2t "6 Interferometrische Sensoren/160228 6_Interferometrische Sensoren.docx" 
./d2t: line 87: [: too many arguments
./d2t: line 121: [: too many arguments
./d2t: line 143: $LOG: ambiguous redirect
./d2t: line 146: [: too many arguments
starting docx2tex
./d2t: line 167: $LOG: ambiguous redirect
Errors encountered while running docx2tex. Please see /Users/ajung/src/docx2tex/6 Interferometrische Sensoren/6
Interferometrische
160228
6_Interferometrische
Sensoren.docx
.docx.d2t.log for details.
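
The messages above ("[: too many arguments", "$LOG: ambiguous redirect") are what a shell typically prints when a variable holding a path with spaces is expanded without double quotes. The sketch below is not taken from d2t; the function log_path_for and the variable names are assumptions, used only to show the quoted forms.

```shell
# Hypothetical helper: derive the log path from the input path, following
# the Introduction.docx -> Introduction.d2t.log pattern seen on this page.
log_path_for() {
  printf '%s\n' "${1%.docx}.d2t.log"
}

FILE="6 Interferometrische Sensoren/160228 6_Interferometrische Sensoren.docx"
LOG="$(log_path_for "$FILE")"

# Quoted: the test sees exactly one operand, whatever the path contains.
if [ -n "$FILE" ]; then
  echo "log file: $LOG"
fi

# A redirect needs the same quoting, e.g.:
#   echo "starting docx2tex" >> "$LOG"
# Unquoted, [ -n $FILE ] or >> $LOG would split the path at each space.
```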

question re conf.xml

How would I configure the conf.xml to produce an article instead of a book? Is there a basic conf.xml for articles?

Problems with Endnote references

I have a document where I used Endnote to manage the references. The file is the same as in #3.

What happens is that the superscripted numbers in the main text corresponding to the references are all replaced by \href{}{}, which causes the resulting pdf to show nothing instead of the superscripted numbers.

xproc-util/load/xpl/load.xpl:0:load-error:Could not load...

Hello,

When I run d2t, the following error occurs:

cp: '../modelo-resumo-semana-conhecimento-2019.docx' and '/usr/src/modelo-resumo-semana-conhecimento-2019.docx' are the same file
INFO : xpl/docx2tex.xpl:197:38:No custom-font-maps loaded.
ERROR: xproc-util/load/xpl/load.xpl:0:load-error:Could not load file:/usr/src/docx2tex/conf/conf.csv (file:///usr/src/docx2tex/xproc-util/load/xpl/load.xpl) dtd-validate=false
ERROR: xproc-util/load/xpl/load.xpl:0:load-error:Could not load file:/usr/src/docx2tex/conf/conf.csv (file:///usr/src/docx2tex/xproc-util/load/xpl/load.xpl) dtd-validate=false
Message: Mode: insert-xpath
Message: Mode: docx2hub:preprocess-styles
Message: Mode: docx2hub:resolve-tblBorders
Message: Mode: docx2hub:add-props
Message: Mode: docx2hub:props2atts
Message: Mode: docx2hub:remove-redundant-run-atts
Message: Mode: docx2hub:join-instrText-runs
Message: Mode: docx2hub:field-functions
Message: Mode: wml-to-dbk
Message: Mode: docx2hub:join-runs
Message: Mode: hub:twipsify-lengths
Message: Mode: hub:split-at-tab
Message: Mode: hub:identifiers
Message: Mode: hub:tabs-to-indent
Message: Mode: hub:handle-indent
Message: Mode: hub:prepare-lists
Message: Mode: hub:lists
Message: Mode: hub:postprocess-lists
Message: Mode: docx2tex-preprocess
Message: Mode: docx2tex-postprocess
INFO : cascade/xpl/load-cascaded.xpl:43:59:load-cascaded: using file:/usr/src/docx2tex/xml2tex/xsl/xml2tex.xsl
INFO : cascade/xpl/load-cascaded.xpl:43:59:load-cascaded: using file:/usr/src/docx2tex/xml2tex/xsl/calstable2tabular.xsl
WARN : file:///usr/src/docx2tex/xslt-util/functx/xsl/functx.xsl:35:66:Stylesheet module http://transpect.io/xslt-util/functx/xsl/functx.xsl is included or imported more than once. This is permitted, but may lead to errors or unexpected behavior
INFO : cascade/xpl/load-cascaded.xpl:43:59:load-cascaded: using file:///usr/src/docx2tex/mml-normalize/xsl/mml-normalize.xsl
Message: Mode: mml2tex-grouping
Message: Mode: mml2tex-preprocess
INFO : cascade/xpl/load-cascaded.xpl:43:59:load-cascaded: using file:/usr/src/docx2tex/mml2tex/xsl/invoke-mml2tex.xsl
WARN : err:SXXP0005:The source document is in namespace http://docbook.org/ns/docbook, but none of the template rules match elements in this namespace (Use --suppressXsltNamespaceCheck:on to avoid this warning)
Message: Mode: escape-bad-chars
Message: Stylesheet compilation failed: 2 errors reported
Message: [FATAL ERROR]: XSLT mode 'escape-bad-chars' failed due to conversion errors. 
ERROR: xproc-util/xslt-mode/xpl/xslt-mode.xpl:0:xslt-mode-escape-bad-chars:Stylesheet compilation failed: 2 errors reported
ERROR: xproc-util/xslt-mode/xpl/xslt-mode.xpl:0:xslt-mode-escape-bad-chars:Stylesheet compilation failed: 2 errors reported
ERROR: xproc-util/xslt-mode/xpl/xslt-mode.xpl:0:xslt-mode-escape-bad-chars:Stylesheet compilation failed: 2 errors reported
ERROR: Unknown error

Java version:

java -version
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

"! Undefined control sequence" error

Improper TeX output is being generated for the attached DOCX file.

Overfull \hbox (18.6093pt too wide) in paragraph at lines 65--66
\OT1/cmr/m/n/10.95 ovial-sarkome. Die h^^?aufigsten We-ichteil-sarkome des Erwa
ch-se-nen sind in Tabelle 1 aufgef^^?uhrt.
4 [5] [6]
Chapter 3.
! Undefined control sequence.
l.74 ... 3-gradige Klassifikationsschema der {\grq
}French Federation of Canc...

Handling of embedded .emf files

We have DOCX files in which the authors often embed PowerPoint files.
This case is not handled properly.

! LaTeX Error: Unknown graphics extension: .emf.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              

l.429 ...16t125157.docx.tmp/word/media/image1.emf}

? x

Ideally, .emf files would be converted to proper SVGs or PNGs.
If this is not possible, they should be removed and not carried forward into the LaTeX output.
Perhaps a removed image could be replaced with a placeholder or a warning message.
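
Until the converter handles this case, one possible workaround is to convert the extracted .emf files and patch the references in the generated .tex by hand. The sketch below assumes LibreOffice is installed and provides the soffice command with its --headless, --convert-to and --outdir options; whether a given EMF renders faithfully that way is not guaranteed. The function names are hypothetical, and the reference pattern (a path ending in .emf followed by a closing brace) is taken from the log line above.

```shell
# Rewrite graphics references from .emf to .png (reads stdin, writes stdout).
rewrite_emf_refs() {
  sed 's/\.emf}/.png}/g'
}

# Convert every .emf in a media directory to PNG, if LibreOffice is available.
convert_emf_dir() {
  dir="$1"
  if ! command -v soffice >/dev/null 2>&1; then
    echo "soffice not found; skipping conversion" >&2
    return 1
  fi
  for f in "$dir"/*.emf; do
    [ -e "$f" ] || continue
    soffice --headless --convert-to png --outdir "$dir" "$f"
  done
}

# Usage (paths are placeholders):
#   convert_emf_dir example.docx.tmp/word/media
#   rewrite_emf_refs < example.tex > example-png.tex
```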

! Undefined control sequence.

(/opt/local/share/texmf-texlive/tex/latex/latexconfig/epstopdf-sys.cfg))
(/opt/local/share/texmf-texlive/tex/latex/hyperref/nameref.sty
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/gettitlestring.sty))
Chapter 1.
! Undefined control sequence.
l.37 ...den Sie daf"{u}r die Formatvorlage {\glqq
}"{U}berschrift 1{\grqq}.

I can send you the related DOCX file by private email.

"...FATAL: Failed to parse Saxon configuration file..."

I tried to convert Word files to LaTeX files, but it failed.
The failure message is:
"...FATAL: Failed to parse Saxon configuration file.
java.nio.file.InvalidPathException: Illegal char <*> at index 96: C:\Users\alpac\Documents\GitHub\docx2tex\calabash\extensions\transpect\javascript-extension\lib*..."
help me, please...OTL

Is this project related to docx2latex.com?

Hi, I am looking for the source code that generates a LaTeX document from a Google Doc.

I wonder if this is the script used by docx2latex.com. With this script I am not getting the same result, so perhaps I am missing something.

Thanks in advance,

how to obtain a tex file with utf8 chars?

I would like certain characters in the output tex file to stay in UTF-8 (à) rather than being translated into LaTeX macros (\`a). Is this possible?

I looked in the <charmap> of conf.xml, but the characters I want are not there.
I also looked at fontmaps but, if I understand correctly, they do the opposite of what I want.

thanks
aaa.docx

[][] Overfull \hbox (1.19997pt too wide) in alignment at lines 141--141 [][] ! Undefined control sequence. <argument> \Micro _{0}ɛ_{0} l.154 \end{tabularx}

[][] 

Underfull \hbox (badness 10000) in alignment at lines 90--90
[][][] 
[1{/usr/local/texlive/2015/texmf-var/fonts/map/pdftex/updmap/pdftex.map}]
Overfull \hbox (1.19997pt too wide) in alignment at lines 102--102
[][] 

Overfull \hbox (1.19997pt too wide) in alignment at lines 114--114
[][] 

Overfull \hbox (1.19997pt too wide) in alignment at lines 128--128
[][] 

Overfull \hbox (1.19997pt too wide) in alignment at lines 141--141
[][] 
! Undefined control sequence.
<argument> \Micro 
                  _{0}ɛ_{0}
l.154 \end{tabularx}

? 

Exception in thread "main" java.lang.NoClassDefFoundError: javax/activation/

Hello,

After updating the JVM, I can no longer use docx2tex to produce tex files from docx files with the ./d2t command. I run the conversions in a terminal on macOS, and I currently have the following version of Java:
java version "1.8.0_211"
Java (TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot (TM) 64-Bit Server VM (build 25.211-b12, mixed mode).
I would like to know whether the problem really is the Java version on my computer, whether anyone else has already encountered it and, if possible, how to fix it.

Thank you very much in advance.

The generated log follows.

Exception in thread "main" java.lang.NoClassDefFoundError: javax/activation/DataSource
at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
at java.base/java.lang.Class.privateGetDeclaredMethods(Class.java:3172)
at java.base/java.lang.Class.getMethodsRecursive(Class.java:3313)
at java.base/java.lang.Class.getMethod0(Class.java:3299)
at java.base/java.lang.Class.getMethod(Class.java:2112)
at com.xmlcalabash.core.XProcRuntime.initializeSteps(XProcRuntime.java:317)
at com.xmlcalabash.core.XProcRuntime.<init>(XProcRuntime.java:272)
at com.xmlcalabash.drivers.Main.run(Main.java:100)
at com.xmlcalabash.drivers.Main.main(Main.java:83)
Caused by: java.lang.ClassNotFoundException: javax.activation.DataSource
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 9 more
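
Stack frames prefixed with java.base/ only appear on Java 9 or later, so the JVM that d2t actually launched is probably newer than the 1.8.0_211 reported above (several JVMs may be installed side by side). The javax.activation classes are no longer bundled with the JDK from Java 11 on, which would match the ClassNotFoundException. A small sketch for checking which major version is first on the PATH follows; java_major is a hypothetical helper, not part of docx2tex.

```shell
# Hypothetical helper: reduce a Java version string to its major number.
# "1.8.0_211" -> 8 (old scheme), "11.0.2" -> 11 (new scheme).
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}"; echo "${v%%.*}" ;;
    *)   echo "${v%%[.+-]*}" ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; pick out the quoted version string.
  ver="$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
  echo "java on PATH: $ver (major $(java_major "$ver"))"
fi
```

If the major version turns out to be 9 or higher, pointing JAVA_HOME and PATH at a Java 8 installation before running d2t is one thing to try.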

Issues with d2t.bat

The current version of the d2t.bat file doesn't work correctly; it has two issues:

  • The path to calabash is incorrect
  • Exiting the script also exits the shell

The attached patch fixes both these issues
windows_fixes.diff.txt

Installation problem

Hi, I'm trying to install docx2tex on my Mac running El Capitan. Towards the end I get this:

Submodule path 'mml2tex': checked out '03430be79a70b283679cfc1cb1529da5a044f41f'
Cloning into 'schema/hub'...
The authenticity of host 'github.com (192.30.252.130)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'github.com,192.30.252.130' (RSA) to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
Clone of 'git@github.com:le-tex/Hub.git' into submodule path 'schema/hub' failed

I'm not sure if this is a problem on my end or not, as I'm something of a newbie to Github.
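
"Permission denied (publickey)" indicates that the submodule was fetched over SSH (a git@github.com:... URL) from a machine with no SSH key registered at GitHub. One common workaround, assuming the repository is also reachable over HTTPS, is to let git rewrite such URLs. The helper ssh_to_https below is hypothetical and only illustrates the mapping; the git config line in the comment uses git's url.<base>.insteadOf setting.

```shell
# Illustrative mapping from the SSH form to the HTTPS form of a GitHub URL.
ssh_to_https() {
  sed 's#^git@github\.com:#https://github.com/#'
}

echo 'git@github.com:le-tex/Hub.git' | ssh_to_https

# To make git apply the same rewrite when fetching submodules:
#   git config --global url."https://github.com/".insteadOf "git@github.com:"
#   git submodule update --init --recursive
```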

Java 1.8 and above?

─diamon@diamon-ThinkPad-13 ~/projects/docx2tex ‹system› ‹master*›
╰─$ ./d2t test.docx
starting docx2tex
Errors encountered while running docx2tex. Please see /home/diamon/projects/docx2tex/test.d2t.log for details.
╭─diamon@diamon-ThinkPad-13 ~/projects/docx2tex ‹system› ‹master*›
╰─$ cat test.d2t.log 1 ↵
cp: 'test.docx' and '/home/diamon/projects/docx2tex/test.docx' are the same file
Exception in thread "main" java.lang.NoClassDefFoundError: javax/activation/DataSource
at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
at java.base/java.lang.Class.privateGetDeclaredMethods(Class.java:3119)
at java.base/java.lang.Class.getMethodsRecursive(Class.java:3260)
at java.base/java.lang.Class.getMethod0(Class.java:3246)
at java.base/java.lang.Class.getMethod(Class.java:2065)
at com.xmlcalabash.core.XProcRuntime.initializeSteps(XProcRuntime.java:317)
at com.xmlcalabash.core.XProcRuntime.<init>(XProcRuntime.java:272)
at com.xmlcalabash.drivers.Main.run(Main.java:100)
at com.xmlcalabash.drivers.Main.main(Main.java:83)
Caused by: java.lang.ClassNotFoundException: javax.activation.DataSource
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:190)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499)
... 9 more

ERROR: An empty sequence is not allowed as the third argument of replace()

I am trying to convert a .docx file that is an article with equations, figures, and even references introduced with Endnote.

Using the master (since with the last pre-release (0.3), I was getting the same reported error that was solved recently), and running:

$ docx2tex-master/d2t -o test test.docx

I am getting the following errors:
ERROR: docx2tex-master/xproc-util/load/xpl/load.xpl:0:load-error:Could not load file:/usr/people/jmdamas/docx2tex-master/conf/conf.csv (file:///usr/people/jmdamas/docx2tex-master/xproc-util/load/xpl/load.xpl) dtd-validate=false
ERROR: file:///usr/people/jmdamas/docx2tex-master/mml2tex/xsl/mml2tex.xsl:339:err:XPTY0004:An empty sequence is not allowed as the third argument of replace()
ERROR: An empty sequence is not allowed as the third argument of replace()
ERROR: cause: file:///usr/people/jmdamas/docx2tex-master/mml2tex/xsl/mml2tex.xsl:339:err:XPTY0004:An empty sequence is not allowed as the third argument of replace()
ERROR: An empty sequence is not allowed as the third argument of replace()
ERROR: cause: file:///usr/people/jmdamas/docx2tex-master/mml2tex/xsl/mml2tex.xsl:339:err:XPTY0004:An empty sequence is not allowed as the third argument of replace()
ERROR: Pipeline failed: An empty sequence is not allowed as the third argument of replace()
ERROR: Underlying exception: An empty sequence is not allowed as the third argument of replace()

In the first ERROR, I don't understand why the file can't be loaded, since it is there.
The other errors are all related to a replace function, but I can't understand the origin.

I tried to run with a shorter version of the .docx (just the first 5 or 6 pages, with some equations), and I didn't get any errors. I tried to remove the Endnote references only (I thought they might be a problem) and tested it, and it gave me the errors again. I could go in a trial-and-error mode, trying to identify which part of the document is causing the problem, but I don't think that's a solution.

Can you give me some tips on how to solve this?

Oh, I am running this on an Ubuntu 12.04 with JAVA 1.7.0_80.

Meanwhile, I am using an older (?) version of this software in codeplex (https://docx2tex.codeplex.com/releases/view/19618), that is working well on Windows.

conf.csv in d2t

Why is there a default value of conf.csv when the pipeline actually expects an XML file (that it then loads with tr:load)?

funny translations: \TimesNewRoman{56 41 43...

The translation of the attached file results in a lot of macros in the following form:
\TimesNewRoman{41 43...

As soon as I open the docx in word and save it, the problem disappears.

Might be related to issues/25

(please do not distribute the attached file)
wj.docx

ERROR: An empty sequence is not allowed as the result of function tr:theme-font()

Hi guys,

I am having this issue. Does this sound familiar to you?

./d2t -o tmpp ~/workspace/ets/phd/thesis/versions/v1.11.docx
starting docx2tex
Errors encountered while running docx2tex. Please see /Users/david/opt/docx2tex/tmpp/v1.11.d2t.log for details.

Log file:

Message: Mode: insert-xpath
ERROR: file:/Users/david/opt/docx2tex/docx2hub/xsl/insert-xpath.xsl:223:err:XTTE0780:An empty sequence is not allowed as the result of function tr:theme-font()
ERROR: An empty sequence is not allowed as the result of function tr:theme-font()
ERROR:     cause: file:/Users/david/opt/docx2tex/docx2hub/xsl/insert-xpath.xsl:223:err:XTTE0780:An empty sequence is not allowed as the result of function tr:theme-font()
ERROR: An empty sequence is not allowed as the result of function tr:theme-font()
ERROR:     cause: file:/Users/david/opt/docx2tex/docx2hub/xsl/insert-xpath.xsl:223:err:XTTE0780:An empty sequence is not allowed as the result of function tr:theme-font()
ERROR: An empty sequence is not allowed as the result of function tr:theme-font()
ERROR:     cause: file:/Users/david/opt/docx2tex/docx2hub/xsl/insert-xpath.xsl:223:err:XTTE0780:An empty sequence is not allowed as the result of function tr:theme-font()
ERROR: An empty sequence is not allowed as the result of function tr:theme-font()
ERROR:     cause: file:/Users/david/opt/docx2tex/docx2hub/xsl/insert-xpath.xsl:223:err:XTTE0780:An empty sequence is not allowed as the result of function tr:theme-font()
ERROR: Pipeline failed: An empty sequence is not allowed as the result of function tr:theme-font()
ERROR: Underlying exception: An empty sequence is not allowed as the result of function tr:theme-font()

I am using the master branch.

In fact, I don't really care about the system font. I don't know if there is a way to ignore this error and continue?

FATAL: Failed to parse Saxon configuration file.

I want to use docx2tex to test whether it can convert MathType equations to LaTeX.
I use the most recent docx2tex release.
I got the error message below:

FATAL: Failed to parse Saxon configuration file.
java.nio.file.InvalidPathException: Illegal char <*> at index 67: C:\docx2tex\calabash\extensions\transpect\javascript-extension\lib*

This is my docx file.
equation.docx

lost spaces and formatting in hyperlink

Please note that, in a fragment similar to the following,
the spaces after "Instrum." and after "88" are lost
and the boldface of "88" is also lost (or applied to the whole hyperlink maybe)

...<w:t xml:space="preserve">Rev. Sci. Instrum. </w:t></w:r>
<w:r w:rsidR="00A51604" w:rsidRPr="002E01AB"><w:rPr><w:rStyle w:val="af1"/><w:b/><w:bCs/><w:color w:val="auto"/><w:lang w:val="en-US"/></w:rPr>
<w:t xml:space="preserve">88 </w:t>...

TeX becomes:

Rev. Sci. Instrum.88(2017) 033504

MWE here:
https://medialab.sissa.it/owncloud/index.php/s/qPdO4qMBWdU28RH

*.tmp folder outside the -o folder

This is a minor issue, but when I run docx2tex with the -o option, the *.tmp folder, which contains things like the media folder with the images to be inserted, is placed outside the -o folder, so the image paths are wrong and the images are not loaded when the .tex is compiled.

Cheers

lost image and caption (probably related to east-asia chars)

Please note that in the translation of the file given below, figure 4 and its caption become (line 89 in the tex file):

\等线{46 69 ...

There is something similar at line 330.

I tried to isolate the problematic part only, but I get a different output (below), so I'm providing the whole document, but please do not distribute it

\DengXian{46 69 ...

Problematic file and translation:
https://medialab.sissa.it/owncloud/index.php/s/4APWPgtO5slLkuA

[OT] I'm opening many issues, please feel free to stop me if I'm too pesky :-)
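
The numbers inside these macros look like hexadecimal character codes: 46 69 decodes to "Fi", which would fit the start of a figure caption. This is an observation about the output shown here, not a statement about how docx2tex produces it. A small sketch for decoding such a sequence by hand follows (hex_to_text is a made-up name; it handles single-byte codes only):

```shell
# Decode space-separated hex codes (one byte each) into text.
hex_to_text() {
  out=""
  for h in "$@"; do
    # Turn the hex code into a three-digit octal escape, which printf
    # understands portably in its format string.
    oct="$(printf '%03o' "0x$h")"
    out="$out$(printf "\\$oct")"
  done
  printf '%s\n' "$out"
}

hex_to_text 46 69   # prints "Fi"
```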

Problems with symbols and accented characters

The reference file is the same as #3.

I am not sure how docx2tex should deal with these issues, but while symbols like the alpha or beta characters are converted to $\alpha$ or $\beta$, other symbols are not being recognized. Examples include the minus and times signs, the apostrophe (for example, see «Kramers'» in the file), and tildes (which should be converted to \~{} or \textasciitilde{}).

Moreover, some accented characters, like those in my name, are not being recognized. Comparing with the output from codeplex's docx2tex, I traced this problem to the missing \usepackage[utf8]{inputenc} in the preamble. Am I correct?

About the lone symbols, is there anything that can be done?
Also, can docx2tex recognize the differences between types of dashes (see here)?

Cheers

Improper LaTeX output

docx2tex running on

http://public.zopyx.com/lungenkarzinom-nicht-kleinzellig-nsclc.docx

generates improper LaTeX. The DOCX structure may itself be improper, but the converter should
perhaps write a message to the log rather than generate improper output.


[Loading MPS to PDF converter (version 2006.09.02).]
) (/opt/local/share/texmf-texlive/tex/latex/oberdiek/epstopdf-base.sty
(/opt/local/share/texmf-texlive/tex/latex/oberdiek/grfext.sty)
(/opt/local/share/texmf-texlive/tex/latex/latexconfig/epstopdf-sys.cfg))
(/opt/local/share/texmf-texlive/tex/latex/hyperref/nameref.sty
(/opt/local/share/texmf-texlive/tex/generic/oberdiek/gettitlestring.sty))
[1{/opt/local/var/db/texmf/fonts/map/pdftex/updmap/pdftex.map}] [2]
Chapter 1.

! LaTeX Error: Lonely \item--perhaps a missing list environment.

See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...

l.29 2.\item \chapter
{Grundlagen}
? c
Type <return> to proceed, S to scroll future error messages,
R to run without stopping, Q to run quietly,
I to insert something, E to edit your file,
1 or ... or 9 to ignore the next 1 to 9 tokens of input,
H for help, X to quit.

Message: docx2hub error on unzipping.

Message: docx2hub error on unzipping.
Zip file seems to be corrupted: /infektionen-bei-haematologischen-und-onkologischen-patienten-uebersicht.docx (No such file or directory)

ERROR: err:XD0001:Only whitespace text nodes can appear at the top level in a document
ERROR: err:XD0001:Only whitespace text nodes can appear at the top level in a document
ERROR: err:XD0001:Only whitespace text nodes can appear at the top level in a document
ERROR: It is a dynamic error if a non-XML resource is produced on a step output or arrives on a step input.

I can provide the sample file by email since Github does not support DOCX uploads.

The issue appears to be specific to MacOSX. Converting the same file on Linux works.
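
The message combines two different diagnoses: "Zip file seems to be corrupted" and "(No such file or directory)". The path also starts directly at the filesystem root, so one thing worth ruling out is that the file was simply not found at the location the pipeline looked in. A minimal pre-flight check, with a hypothetical function name, distinguishes a missing file from a file that is not a ZIP container (a .docx is a ZIP archive and starts with the bytes PK):

```shell
# Hypothetical pre-flight check: prints "missing", "zip" or "not-zip".
check_docx() {
  f="$1"
  if [ ! -f "$f" ]; then
    echo "missing"
  elif [ "$(head -c 2 "$f")" = "PK" ]; then
    echo "zip"
  else
    echo "not-zip"
  fi
}

# Usage (file name is a placeholder):
#   check_docx /path/to/input.docx
```

If this prints "zip" for the absolute path of the input, the file itself is probably fine and the problem lies in how the path reaches the unzip step.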

! LaTeX Error: Lonely \item--perhaps a missing list environment.

I am receiving the following error for a given DOCX document (sorry, I can not provide the source
due to non-disclosure reasons).

Package hyperref Warning: Suppressing link with empty target on input line 59.

Package hyperref Warning: Suppressing link with empty target on input line 61.

! LaTeX Error: Lonely \item--perhaps a missing list environment.

See the LaTeX manual or LaTeX Companion for explanation.
Type H <return> for immediate help.
...

l.61 \href{}{1.1}\item \href
{}{Besondere Darstellungen im Handbuch }

58
59 \href{}{1. Aufbau des Handbuchs }
60
61 \href{}{1.1}\item \href{}{Besondere Darstellungen im Handbuch }
62
63 \href{}{1.2}\item \href{}{Zielgruppe }
64
65 \href{}{1.3}\item \href{}{Die Themenabschnitte des Handbuchs im "{U}berblick }

cp ....are the same file

Fresh installation using:

git clone https://github.com/transpect/docx2tex --recursive

Any conversion with d2t gives me the same error

cp: ‘/tmp/docx-samples/160229_Wolff_Sensor_Technologien/all/1_Einleitung.docx’ and ‘/tmp/docx-samples/160229_Wolff_Sensor_Technologien/all/1_Einleitung.docx’ are the same file
ERROR: xml2tex/xpl/xml2tex.xpl:71:65:err:XS0052:Cannot import: http://transpect.io/mml2tex/xpl/mml2tex.xpl
ERROR:     cause: I/O error reported by XML parser processing http://transpect.io/mml2tex/xpl/mml2tex.xpl: http://transpect.github.io/mml2tex/xpl/mml2tex.xpl
ERROR: It is a static error if the URI of a p:import cannot be retrieved or if, once retrieved, it does not point to a p:library, p:declare-step, or p:pipeline.
ERROR: Underlying exception: I/O error reported by XML parser processing http://transpect.io/mml2tex/xpl/mml2tex.xpl: http://transpect.github.io/mml2tex/xpl/mml2tex.xpl

Formatting issues with in-text (sub|super)scripts

The reference file is the same as #3.

I am experiencing some issues with subscripts and superscripts in the reference file.

  1. I think braces are missing for longer subscripted words, with kcat being converted to \textit{k}$_cat$/ instead of \textit{k}$_{cat}$/.
  2. There are issues with superscripts when the following character is a minus sign, as in $^−1$ (maybe related to issue #5?).
  3. I am getting double equation mark-up for Cα, like this: C$^$\alpha$$. This breaks things, and a lot of the non-equation text following it is pulled into math mode.

Cheers
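
As a stopgap for the first two points, the generated .tex could be patched after conversion. The sketch below is a crude post-processing pass, not part of docx2tex: it wraps a multi-character script in braces when a math span consists of nothing but that script. The function name brace_scripts is an assumption. It does not address the nested C$^$\alpha$$ case, and on unusual input it could match a span it should not.

```shell
# Wrap $_abc$ or $^abc$ (two or more characters) as $_{abc}$ or $^{abc}$.
brace_scripts() {
  sed 's/\$\([_^]\)\([^${}][^${}]\{1,\}\)\$/$\1{\2}$/g'
}

printf '%s\n' '\textit{k}$_cat$/' | brace_scripts   # prints \textit{k}$_{cat}$/
```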

xslt-util/calstable/xpl and com.xmlcalabash conversion errors

Bug Report:
My OS: Linux Gentoo Base System release 2.24.1.12 64 bit PC desktop
Java: 1.8.0_66
Shell: bash 4.3.42 (x86_64-pc-linux-gnu)
Install: cd /home/el/bin; git clone https://github.com/transpect/docx2tex --recursive
The input docx has a few unicode shenanigans, but nothing too out of band: http://www.filedropper.com/examplefail
Run your code: cd /home/el/bin/docx2tex; ./d2t ExampleFail.docx
Failure .log File: http://www.filedropper.com/examplefaild2t

What I expected: an output file ExampleFail.tex containing LaTeX code.

Quarantining the bug, proving the bug isn't on my side:

  1. Use libreoffice version 5.2.3.3 -writer to create a new empty .docx document containing the ascii text asdf.

  2. Save the above file as Untitled.docx using format Microsoft Word 2007-2013 XML (.docx) format.

  3. Openoffice -writer produces this Untitled.docx: http://www.filedropper.com/untitled_22

  4. Run the code: cd /home/el/bin/docx2tex; ./d2t Untitled.docx

  5. docx2tex works as expected, the contents of Untitled.tex render by pdflatex to a similar looking pdf:

The problem is in the table layouts.

Equation-related issues (\underset, \sum\nolimits, \substack, cases environment)

The reference file is the same as #3.

  1. I use some sum-class symbols in-text, and the limits are always to the right of the sigma symbol. docx2tex recognizes this but uses \underset{<limits>}{\sum} instead of the more common \sum\nolimits. In fact, this also happens in the equations that are not in-text, but I am guessing that's because they are inside the tabular environment.
  2. Limits of sums with more than one line (see equation 2 of the reference file) are being transformed into an array environment. I think the best solution would be to use \substack (maybe \mathclap could also help here).

Cheers
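
For reference, the forms requested above look as follows in LaTeX. This is a hand-written illustration with placeholder symbols (a_i, b_{ij}, n), assuming amsmath is loaded for \substack; it is not output of docx2tex.

```latex
% Sum with its limits forced to the right of the sigma:
$\sum\nolimits_{i=1}^{n} a_i$

% Display sum with a two-line lower limit:
\[
  \sum_{\substack{1 \le i \le n \\ j \ne i}} b_{ij}
\]
```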

lost newline

In the following file, a newline is lost between "...Fast ICA Algorithm" and "FastICA disintegrates...":
https://medialab.sissa.it/owncloud/index.php/s/PXm3ktw0LFYXVti

However, please note that in the original docx, there are a couple of spaces missing in "3.1Denoisingby" (probably a typo), but when I tried to add them (to get "3.1 Denoising by") and save the file, the missing newline magically appears in the TeX translation and I can see no error.

Problem about time improvement

Hi, I have tons of docx files to transform, but this project is very time-consuming. I found that it generates some temporary files in the process; I guess this may be the problem. I am not good at shell scripting. Could you please offer a solution? Many thanks.
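
d2t is invoked once per file, so the simplest way to handle many files is a loop. This does not make a single conversion faster; it only removes the manual work. The function name convert_all and its optional second parameter are assumptions of this sketch; ./d2t is the script used throughout this page.

```shell
# Run a converter (default: ./d2t) on every .docx in a directory.
convert_all() {
  dir="$1"
  d2t="${2:-./d2t}"
  for f in "$dir"/*.docx; do
    [ -e "$f" ] || continue          # directory contains no .docx files
    "$d2t" "$f" || echo "failed: $f" >&2
  done
}

# Usage (directory is a placeholder):
#   convert_all "$HOME/Documents/papers"
```

Several conversions could also be run in parallel, for example with xargs -P, at the cost of more memory; whether d2t's temporary files tolerate parallel runs is not something this page confirms.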

runtime error

I get the following error when I try to translate this file:

INFO : file:..docx2tex/xpl/docx2tex.xpl:188:38:No custom-font-maps loaded.
ERROR: file:...docx2tex/xproc-util/load/xpl/load.xpl:0:load-error:Could not load file:...docx2tex/conf/conf.csv (file:...docx2tex/xproc-util/load/xpl/load.xpl) dtd-validate=false
...

https://medialab.sissa.it/owncloud/index.php/s/BZFHlref5mB3uAS

(I think my installation is ok, because I can translate other documents)
