jessicategner / pypandoc

Thin wrapper for "pandoc" (MIT)

Home Page: http://pypi.python.org/pypi/pypandoc/

License: Other

Python 98.71% Dockerfile 0.95% HTML 0.35%

pypandoc formatting markdown html pdf python pandoc pdf-generation hacktoberfest hacktoberfest2022

pypandoc's Introduction

Hi there 👋 I'm Jessica Tegner

pypandoc's People

Contributors

Stargazers

Watchers

Forkers

erunamojazz mlaloux benjaoming rosscdh pierrebizouard b-rich gabeos dastanko msabramo mindw rhiaro relekang jankatins binaryaaron thatcher pombreda pombredanne jontsai dlukes valholl rossant posborne felixonmars prigio pycontribs willingc gwgundersen swistakm tkob momingxu kubilayeksioglu cwt1 kristoff3r sameersingh7 jwaschkau lawrennd nocarryr radek-sprta lionawurscht lbartoletti ahom jsmedmar wolfv burgerbecky chong-jiao dairoot jarrahg thombashi schq garethjv elben10 aristotle-metadata-enterprises nishair artemeger alserkli apostropheeditor abarrafo najuzilu jwidner abdealiloko sbland sternenseemann extratone eorlbruder gly-git ztencmcp jayvdb sthagen roman2git mcm teeekay cybertailor scottclowe h0r5e edwardbetts hey-thanks hey24sheep ewertonbello kershuilong hugobuddel keszybz stjordanis ssahgal davidkorczynski kianmeng arpitjain799 dissupeng iq-scm pouyanpi lwd-temp wangdong09 kencx homaggroup rdhyee juprigh strogo cplushua sofurs leftomelas befeleme

pypandoc's Issues

native format fails

This

pypandoc.convert('A', 'latex', format='native')

gives this

RuntimeError: Pandoc died with exitcode "1" during conversion: pandoc: Cannot parse document

This is with

>>> pypandoc.__version__
'1.1.2'

Workaround to make the wheel platform specific

See here: http://lucumr.pocoo.org/2014/1/27/python-on-wheels/

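from setuptools import setup
from setuptools.dist import Distribution

class BinaryDistribution(Distribution):
    # Reporting the distribution as impure is meant to make bdist_wheel
    # (in the versions this workaround targeted) build a platform-specific
    # wheel instead of a pure-Python one.
    def is_pure(self):
        return False

setup(
    ...,
    include_package_data=True,
    distclass=BinaryDistribution,
)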

Add a method to get the pandoc version

In knitpy, I have the need to ensure that the used pandoc version is higher than 1.10 (and probably 1.12, similar to ipython nbconvert).

ipython has such a version check: https://github.com/jupyter/jupyter_nbconvert/blob/master/jupyter_nbconvert/utils/pandoc.py#L58, maybe that could be copied to pypandoc?
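A minimal sketch of such a check, assuming the first line of `pandoc --version` starts with the program name followed by a dotted version number (e.g. `pandoc 1.16.0.2`). The helper names are hypothetical; newer pypandoc releases also expose `pypandoc.get_pandoc_version()`, whose string could be parsed the same way.

```python
import re


def parse_pandoc_version(version_output):
    """Return a tuple such as (1, 16, 0, 2) from `pandoc --version` output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty pandoc --version output")
    # Take the first dotted number on the first line, ignoring any prefix.
    match = re.search(r"\d+(?:\.\d+)*", lines[0])
    if match is None:
        raise ValueError("no version number in %r" % lines[0])
    return tuple(int(part) for part in match.group(0).split("."))


def pandoc_at_least(version_output, minimum):
    """True if the reported version is >= minimum, e.g. minimum=(1, 12)."""
    # Tuples compare element by element, so (1, 16, 0, 2) >= (1, 12) is True.
    return parse_pandoc_version(version_output) >= tuple(minimum)
```

The raw text would come from something like `subprocess.check_output(["pandoc", "--version"]).decode("utf-8")`; comparing tuples avoids the classic string-comparison bug where "1.9" sorts after "1.10".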

@msabramo Any idea what would be a good way to resolve this? Now we're getting nasty failures. Example of a failure: https://travis-ci.org/bebraw/pypandoc/jobs/52296424 . Pandoc is giving pandoc: Unknown reader: markdown+strikeout.

Can we benefit from http://docs.travis-ci.com/user/build-configuration/#Installing-Packages-Using-apt ?

can't convert to docx

Using this to convert to docx does not work:

pandoc: Cannot write docx output to stdout.
Specify an output file using the -o option.
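The error means pandoc needs `-o` for docx. A minimal sketch of a wrapper-side guard, assuming a hypothetical `build_pandoc_args` helper and an illustrative (not exhaustive) set of file-only formats:

```python
# Hypothetical set of output formats that pandoc only writes to a file.
# It is an illustrative assumption, not an exhaustive list.
FILE_ONLY_FORMATS = {"docx", "odt", "epub", "epub3"}


def build_pandoc_args(to, outputfile=None, extra_args=()):
    """Hypothetical helper: build a pandoc argv, insisting on -o for file-only formats."""
    if to in FILE_ONLY_FORMATS and not outputfile:
        raise ValueError(
            "pandoc cannot write %s output to stdout; pass an outputfile" % to)
    args = ["pandoc", "--to=" + to]
    if outputfile:
        args += ["-o", outputfile]
    args += list(extra_args)
    return args
```

Failing early in Python gives a clearer message than pandoc's exit code. With pypandoc itself, the equivalent is passing `outputfile=` to the conversion call.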

Add option to use Docverter

I have an idea but I think it can wait until after 0.90.

Idea is to add option to use and/or fallback to http://docverter.com instead of local pandoc. This could be huge, because then you could possibly use pypandoc without having pandoc installed.

html5 and new formats

Please update the input/output formats for Pandoc 1.11.1.


from_formats = ('native', 'json', 'markdown', 'markdown_strict', 
               'markdown_phpextra', 'markdown_github',  
               'markdown_mmd', 'rst', 'mediawiki', 'docbook',  
               'textile', 'html', 'latex')

to_formats = ('native', 'json', 'docx', 'odt', 'epub', 'epub3', 
              'fb2', 'html', 'html5', 's5', 'slidy', 'slideous', 
              'dzslides', 'docbook', 'opendocument', 'latex', 
              'beamer', 'context', 'texinfo', 'man', 'markdown', 
              'markdown_strict', 'markdown_phpextra', 
              'markdown_github', 'markdown_mmd', 'plain', 'rst', 
              'mediawiki', 'textile', 'rtf', 'org', 'asciidoc')

Problem with parsing of format names

There seems to be a minor parsing problem with the twiki entry returned by get_pandoc_formats():

['commonmark', 'docbook', 'docx', 'epub', 'haddock', 'html', 'json', 'latex', 'markdown', 'markdown_github', 'markdown_mmd', 'markdown_phpextra', 'markdown_strict', 'mediawiki', 'native', 'odt', 'opml', 'org', 'rst', 't2t', 'textile', "twiki [ only Pandoc's JSON version of native AST]"]

py35 wheel names are not platform specific...

This is due to the trick of overwriting the pure-lib attribute: it no longer works, because bdist_wheel doesn't use that attribute anymore on 3.5...

I deleted the packages from PyPI so that they are hopefully not used yet... (if users have a pandoc install, it doesn't matter...)

I have a patch (currently testing) and uploaded newly built py35 packages for Windows (funnily, Mac wasn't affected?). As we currently only have README and setup.py fixes, I don't think we need a new release.

use newer pandoc

See here for the problem: #44 (comment)
-> travis only installs 1.9, but pandoc >= 1.10 is needed to support extensions

There are several possible ideas to fix this (apart from "build pandoc from source"):

use IPython's version (see their travis config: https://github.com/ipython/ipython/blob/master/.travis.yml) -> this will break if IPython removes this version, but it is only a 19 MB download
use the one from RStudio (travis uses that in their r environment): https://github.com/travis-ci/travis-build/pull/386/files#diff-5b7e1aa01f1c6f058df6e0b9e2486c62R370 -> probably stable, but is a 100+MB download
use a language=r travis config, which has a newer pandoc version and install python via conda :-)

No prebuilt wheels for v1.2.0 on pypi?

Hello, was it intentional to stop publishing prebuilt wheels to pypi? Just wondering, because the instructions still say the wheel packages include pandoc for Windows and Mac OS X, but the wheels are not listed under the files section on pypi. https://pypi.python.org/pypi/pypandoc/1.2.0

When using the latest version 1.2.0 my builds/checks fail with the pandoc not found error...using download_pandoc() or reverting to v1.1.3 gets them working again.

Figure out why Travis yields `ValueError: 'Options:' is not in list`

See https://travis-ci.org/bebraw/pypandoc/builds/64269432 .

Why would it fail to find Options: in Pandoc output? Any ideas?

cc @msabramo @xysmas

Pandoc died with exitcode "9" during conversion: b'pandoc: Unknown writer: pdf

To be honest, I have no idea where exactly the problem is. I have the following piece of code:

import argparse
import pypandoc

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--fmt")
    parser.add_argument("--fail", dest="fail", action="store_true")
    parser.set_defaults(fail=False)
    args = parser.parse_args()

    if args.fail:
        to = args.fmt 
    else:
        # Ridiculous but works  
        to = "pdf" if args.fmt == "pdf" else args.fmt 

    pypandoc.convert(
        source="# Foo", to=to,
        format="markdown",
        outputfile="foo.pdf"
    )

if __name__ == "__main__":
    main()

which works just fine when I run it with python ex2.py --fmt pdf and let it use completely meaningless "pdf" if args.fmt == "pdf" else args.fmt but fails otherwise with

RuntimeError: Pandoc died with exitcode "9" during conversion: b'pandoc: Unknown writer: pdf
To create a pdf with pandoc, use the latex or beamer writer and specify
an output file with .pdf extension (pandoc -t latex -o filename.pdf).'

Environment:

Debian GNU/Linux
Pandoc Version: 1.16.0.2~dfsg-1
pypandoc==1.1.3
Python 3.5.1, Python 2.7.11

Converting from .docx / .odt

Pandoc is also able to take e.g. a .docx / .odt file and output plain text / markdown / html etc., but pypandoc's current approach of first reading the input file in memory as plain text (pypandoc.py#147) and then piping the string to pandoc makes it impossible to take advantage of this (obviously since .docx / .odt is a compressed archive, not a plain text file).

Maybe this is by design (?); if not, it would be great if pypandoc supported this part of pandoc functionality as well :) (TBH, I don't fully understand why it's necessary to read in the input file and pipe it to pandoc instead of just providing it to the pandoc subprocess call as an argument, but I have only taken a glance at the code and there may be ramifications I'm not aware of...)

Great job otherwise!

Enable travis builds on push?

It seems that pushes to master do not trigger a Travis run.

@bebraw Would you mind going to https://travis-ci.org/bebraw/pypandoc/settings and enable Build pushes?

I would like to make a release so that I can supply wheels for macosx and windows (it seems that you can't upload wheels for linux) and build the conda packages for all three platforms.

My release checklist is

make it (travis) green
bump version and tag as 'v1.1.1'
push tags and commit to master

Anything I missed?

*.rst Titles and Tables unrecognized by Sphinx or my reStructuredText editor

When I convert any format to *.rst, the titles created are not recognized by Sphinx or my reStructuredText editor. This seems to be due to an extra blank line that is inserted between the title and the row of = characters.

I am using this code:

from pypandoc import convert
convert(source=clashing_html, to='rst', format='html')

HTML Input

<h1>Property Aliases</h1>

Current reStructured Text Output

Property Aliases

================

Desired reStructured Text Output

Property Aliases
================

In the case of tables the output looks like this:

+---------------------+--------------------------+

| Property Name       | Clashing Aliases         |

+---------------------+--------------------------+

| background-color    | ``bc-``                  |

+---------------------+--------------------------+

| background-repeat   | ``br-`` ``repeat``       |

+---------------------+--------------------------+

| border-collapse     | ``border-c-`` ``bc-``    |

+---------------------+--------------------------+

| border-color        | ``border-c-`` ``bc-``    |

+---------------------+--------------------------+

It needs to look like this for sphinx to understand it.

+---------------------+--------------------------+
| Property Name       | Clashing Aliases         |
+---------------------+--------------------------+
| background-color    | ``bc-``                  |
+---------------------+--------------------------+
| background-repeat   | ``br-`` ``repeat``       |
+---------------------+--------------------------+
| border-collapse     | ``border-c-`` ``bc-``    |
+---------------------+--------------------------+
| border-color        | ``border-c-`` ``bc-``    |
+---------------------+--------------------------+

pandoc fails with "invalid argument (invalid character)"

Hey,

unfortunately, pandoc seems to fail with the following string:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pypandoc as pandoc

markdown = '''This is a test

&#55357; &#56862;'''

html = pandoc.convert(markdown, 'html', format='md')

This is the error message:

$ python test.py 
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    html = pandoc.convert(markdown, 'html', format='md')
  File "/Library/Python/2.7/site-packages/pypandoc.py", line 94, in convert
    outputfile=outputfile, filters=filters)
  File "/Library/Python/2.7/site-packages/pypandoc.py", line 135, in _convert
    filters=filters)
  File "/Library/Python/2.7/site-packages/pypandoc.py", line 204, in _process_file
    'Pandoc died with exitcode "%s" during conversation: %s' % (p.returncode, stderr)
RuntimeError: Pandoc died with exitcode "1" during conversation: pandoc: <stdout>: commitBuffer: invalid argument (invalid character)

I am using pandoc version 1.14.0.1 and pypandoc 0.9.7. Interestingly, the same script works like a charm on Ubuntu 14.04 (Python 2.7.5) but fails on OS X 10.9.5 (Python 2.7.5).

I assume it's some kind of encoding issue, but I have no idea how to fix it.

Kind regards,
Christoph
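One plausible cause: `&#55357;` (0xD83D) and `&#56862;` (0xDE1E) are the two UTF-16 surrogate halves of a single emoji, not valid code points on their own, so they cannot be written out as UTF-8. A hedged pre-processing sketch (a hypothetical workaround, not pypandoc behavior) that merges such pairs, optionally separated by whitespace as in the example, into one entity before calling pandoc:

```python
import re

_ENTITY = re.compile(r"&#(\d+);")


def merge_surrogate_entities(text):
    """Replace '&#high; &#low;' surrogate-pair entities with one real code point."""
    matches = list(_ENTITY.finditer(text))
    pieces, last, i = [], 0, 0
    while i < len(matches):
        m = matches[i]
        high = int(m.group(1))
        if 0xD800 <= high <= 0xDBFF and i + 1 < len(matches):
            nxt = matches[i + 1]
            low = int(nxt.group(1))
            gap = text[m.end():nxt.start()]
            if 0xDC00 <= low <= 0xDFFF and gap.strip() == "":
                # Standard UTF-16 decoding of a surrogate pair.
                code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
                pieces.append(text[last:m.start()])
                pieces.append("&#%d;" % code_point)
                last = nxt.end()
                i += 2
                continue
        i += 1
    pieces.append(text[last:])
    return "".join(pieces)
```

Entities that are not surrogate pairs are left untouched, so ordinary references such as `&#233;` pass through unchanged.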

Make pandoc download a regular library function

I think we should refactor the code so that the downloader becomes part of the library, e.g. pypandoc.download_pandoc(to_path='.', url=None). You could then download to a specific location (setup.py download_pandoc would set it to pypandoc/files) and also update pandoc when a new release comes out, while the setup just calls it when building wheels. This would make it possible for users to download pandoc themselves if no wheel was available.

What is the syntax for converting to pdf?

I have tried naively with output = pypandoc.convert('paper.md', 'pdf') which won't work.

Make sure that the dir of the pandoc executable is in PATH

We also install citeproc and there is a high probability that next to pandoc other filters are installed. So let's make sure that pandoc finds these.

If pandoc does not currently look in that directory: whenever we call pandoc via a full path (i.e. it is not on PATH), we should also add that directory to PATH before calling pandoc.
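A minimal sketch of that idea, assuming a hypothetical `env_with_pandoc_dir` helper whose result is passed as `env=` to `subprocess`, so filters such as pandoc-citeproc installed next to pandoc become discoverable without touching the caller's own environment:

```python
import os


def env_with_pandoc_dir(pandoc_path, base_env=None):
    """Return a copy of the environment whose PATH starts with pandoc's directory."""
    env = dict(os.environ if base_env is None else base_env)
    pandoc_dir = os.path.dirname(os.path.abspath(pandoc_path))
    current = env.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    # Only prepend once, so repeated calls don't grow PATH.
    if pandoc_dir not in parts:
        env["PATH"] = os.pathsep.join([pandoc_dir] + parts)
    return env
```

Usage would look like `subprocess.Popen([pandoc_path, ...], env=env_with_pandoc_dir(pandoc_path))`.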

Please add tests and examples to the released tarball

It would be useful to have tests and examples in the released tarball, to be able to sanity check the module before using it (and as documentation): if you are ok with the idea I can prepare a pull request which adds the proper lines to MANIFEST.in.

pandoc-citeproc not recognized as an option

I have a small script that uses pypandoc to put a file's current YAML frontmatter into the pandoc'ed output file (for jekyll).

The file works great until I attempt to add in a filter option:

pdoc_args = ['--mathjax',
             '--smart',
             '--filter pandoc-citeproc'
             ]

produces:

$ ./pyconverter.py yaml_frontmatter.md 
Converting yaml_frontmatter.md
pandoc: unrecognized option `--filter pandoc-citeproc'

FWIW, I have pandoc-citeproc installed, and it works fine with the same options from the command line:

$ pandoc --mathjax --smart --filter pandoc-citeproc yaml_frontmatter.md

Produces:

<p class="notice-info">
<strong>Content Warning:</strong> Machine Learning, Python
</p>
<h2 id="section">Section</h2>
<p>See <span class="citation">(see Fenner 2012)</span></p>
<p>Inline Math: <span class="math">\(\sum_{i}^{n_1}\)</span></p>
<p>display math:</p>
<p>[ \begin{align} x = 1 y = 2 ]</p>
<pre class="sourceCode python"><code class="sourceCode python">testing <span class="kw">for</span> <span class="dt">all</span> in <span class="dt">all</span></code></pre>
<div class="references">
<h1 id="references" class="unnumbered">References</h1>
<p>Fenner, Martin. 2012. “One-Click Science Marketing.” <em>Nature Materials</em> 11 (4). Nature Publishing Group: 261–63. doi:<a href="http://dx.doi.org/10.1038/nmat3283">10.1038/nmat3283</a>.</p>
</div>

Thoughts?
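A likely explanation: the arguments are passed to pandoc without a shell, so each list item becomes exactly one argv entry, and `'--filter pandoc-citeproc'` arrives as a single unknown option. The option and its value need to be separate items. A small sketch using the standard library's `shlex.split` to turn the working shell command line into such a list:

```python
import shlex

# The options that work on the command line (minus "pandoc" and the input file).
cli_options = "--mathjax --smart --filter pandoc-citeproc"

# shlex.split applies shell-like word splitting, so "--filter" and
# "pandoc-citeproc" end up as two separate list items.
pdoc_args = shlex.split(cli_options)
print(pdoc_args)  # ['--mathjax', '--smart', '--filter', 'pandoc-citeproc']
```

The resulting list can then be passed as `extra_args=pdoc_args`; writing `'--filter=pandoc-citeproc'` as one item should also work.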

Update pandoc

https://github.com/jgm/pandoc/releases -> pandoc 1.16.0.2 is current, we ship 1.15... Should probably be done after #83

End-of-line differences in the released file

Hello

In the released zip found on https://pypi.python.org/pypi/pypandoc, a few files (including pypandoc/__init__.py and README.md) have been converted to DOS-style line endings (\r\n) instead of the Unix-style ones (\n) they used to have (and still have in the git repo).

While this doesn't break anything, IMHO it is somewhat ugly and it makes it a bit harder to compare files around.

Could you please investigate what happened or just check that it's not happening any longer the next time you'll do a release?

Thanks
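If the release is built on a machine that rewrites line endings, one option is to normalize the files just before building the sdist. A minimal sketch with hypothetical helper names, stdlib only:

```python
def to_unix_eol(data):
    """Convert CRLF (and stray lone CR) line endings in bytes to LF."""
    # Replace CRLF first so that each CRLF becomes exactly one LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file_eol(path):
    """Rewrite a file in place with LF endings; return True if it changed."""
    with open(path, "rb") as handle:
        data = handle.read()
    fixed = to_unix_eol(data)
    if fixed != data:
        with open(path, "wb") as handle:
            handle.write(fixed)
    return fixed != data
```

Working on bytes avoids any decoding step, so the file's encoding is preserved exactly.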

Enable appveyor on PRs

@bebraw It seems that this has to be enabled in the GitHub repo: http://help.appveyor.com/discussions/questions/203-auto-run-tests-on-pull-requests#comment_33212404 Unfortunately, I don't have access to these config options. Could you enable this?

FYI: appveyor currently runs under my account (which seems to be OK: unlike travis, the appveyor build is not paired with the GitHub repo but with a person who has access to the repo, so on appveyor multiple people can run builds for the same repo) with my pypi upload credentials.

Travis also runs with my pypi upload credentials

Incorrect path appended to pandoc search_paths

pypandoc attempts to search for /usr/bin/pandoc, but accidentally just adds /usr/bin to the search_paths in the _ensure_pandoc_path() function. The problematic line is https://github.com/bebraw/pypandoc/blob/master/pypandoc/__init__.py#L310.

This is only a big problem if /usr/bin is not in your PATH, but it does result in some diagnostic error messages being printed (since subprocess.Popen attempts to execute a directory).

The solution is to further append the binary name ('pandoc') to the path.
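A minimal sketch of the suggested fix, assuming hypothetical `candidate_pandoc_paths`/`find_pandoc` helpers: join each directory with the executable name before checking it, so a directory is never handed to `subprocess.Popen`.

```python
import os


def candidate_pandoc_paths(search_dirs, exe_name="pandoc"):
    """Join each search directory with the executable name (the step the bug omitted)."""
    return [os.path.join(directory, exe_name) for directory in search_dirs]


def find_pandoc(search_dirs, exe_name="pandoc"):
    """Return the first candidate that is an executable file, or None."""
    for path in candidate_pandoc_paths(search_dirs, exe_name):
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None
```

On Python 3.3+, `shutil.which("pandoc", path=os.pathsep.join(search_dirs))` performs the same lookup using the standard library.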

converting string does not work as advertised

In [1]: import pypandoc

In [2]: output = pypandoc.convert('#some title', 'rst', format='md')
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-2-031b23b69b18> in <module>()
----> 1 output = pypandoc.convert('#some title', 'rst', format='md')

/home/esc/anaconda/lib/python2.7/site-packages/pypandoc/pypandoc.pyc in convert(source, to, format, extra_args)
      8     Raises OSError if pandoc is not found! Make sure it has been installed and is available at path.
      9     '''
---> 10     return _convert(_read_file, _process_file, source, to, format, extra_args)
     11 
     12 def _convert(reader, processor, source, to, format=None, extra_args=()):

/home/esc/anaconda/lib/python2.7/site-packages/pypandoc/pypandoc.pyc in _convert(reader, processor, source, to, format, extra_args)
     11 
     12 def _convert(reader, processor, source, to, format=None, extra_args=()):
---> 13     source, format = reader(source, format)
     14 
     15     formats = {

/home/esc/anaconda/lib/python2.7/site-packages/pypandoc/pypandoc.pyc in _read_file(source, format)
     41 
     42 def _read_file(source, format):
---> 43     with open(source) as f:
     44         format = format or os.path.splitext(source)[1].strip('.')
     45         source = f.read()

IOError: [Errno 2] No such file or directory: '#some title'

extra_args=['--latex-engine=xelatex', '-V', 'mainfont="Font Name"'] doesn't work

My conversion class looks like this:

import pypandoc

class PandocPDFConverter(object):
    def generate_output(self, docx_file, **kwargs):
        extra_args = [
            '--latex-engine=xelatex',
            '-V',
            'mainfont="Noto Serif"'
        ]
        output = pypandoc.convert(docx_file,
                                  'pdf',
                                  outputfile='test.pdf',
                                  extra_args=extra_args)

... and it doesn't produce a PDF file. However, when the same arguments are passed on the command line, everything works fine.
Error message:
RuntimeError: Pandoc died with exitcode "43" during conversation: ! LaTeX Error: Missing \begin{document}.

See the LaTeX manual or LaTeX Companion for explanation.
Type H for immediate help.
...

l.18 \setmainfont{"Noto Serif"}

pandoc: Error producing PDF from TeX source

But when I replace 'mainfont="Noto Serif"' with 'mainfont="[NotoSerif-Regular.ttf]"', the conversion goes well.

P.S.
IMHO, you meant to write "conversion", not "conversation" (pypandoc.py, line No 214).
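A likely explanation: on the command line the shell removes the double quotes, so pandoc receives `mainfont=Noto Serif`. In `extra_args` there is no shell, so the quotes reach LaTeX literally, which matches the `\setmainfont{"Noto Serif"}` line in the error. `shlex.split` shows what the shell would hand to pandoc:

```python
import shlex

# What is typed on the command line.
shell_form = '--latex-engine=xelatex -V mainfont="Noto Serif"'

# What pandoc actually receives after shell quote removal.
extra_args = shlex.split(shell_form)
print(extra_args)  # ['--latex-engine=xelatex', '-V', 'mainfont=Noto Serif']
```

So writing the list item as `'mainfont=Noto Serif'` (no inner quotes) should match the working command line.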

Do a beta release

Address #97
Add a changelog in chronological order to see what changed between the last version and now
Version and tag with a 0b0 suffix: http://python-packaging-user-guide.readthedocs.io/en/latest/distributing/#pre-release-versioning
Upload to pypi and a conda package in my private repo
Write a short blog post to be distributed via @bebraw (#96 (comment))

Adding support for pandoc 1.9.4

I've been successful with pypandoc in several environments including windows and osx. However in my production environment I'm stuck with Red Hat Enterprise 6 which only supports pandoc 1.9.4 from the official rpm repository.

I'm trying to make use of the great new project knitpy by @JanSchulz for dynamic reports (I have some background in R but prefer Python, though the knitr approach suits my application better than IPython notebooks).

Here are the related threads that get at the gist of the issue before I give you my specific code:

jgm/pandoc#751

If it's possible to add some introspection and backward-compatible support for calling pandoc 1.9 (documenting what is still unsupported when you have to fall back), that would get me a lot further.

I can do the coding with the project contributors' support, but it would go faster if I had a little direction, since pandoc is not something I've worked with.

Here is my ticket for knitpy but this is something I'd see better positioned as a patch to pypandoc.

jankatins/knitpy#8

Add more info to PyPI page

A little ironic that pypandoc of all things has a nice README.md on GitHub, but doesn't have a great PyPI page? :)

Can you make the setup.py use pypandoc itself to convert Markdown to reST?

Support Python3

See https://gist.github.com/1360404 .

pypandoc install fails if pandoc isn't installed

This is pretty easy to repro. Just do the following in a virtualenv on a machine where pandoc is not installed:

pip install pypandoc

I think errors related to pandoc not being installed should be deferred until execution time and should not prevent individuals from installing the library. The particular use case where this is really impacting me currently is when trying to use setuptools-markdown (https://github.com/msabramo/setuptools-markdown). Here, it is reasonable for a maintainer to install pandoc to convert markdown to restructured text when releasing to pypi. Unfortunately, right now we need to throw the baby out with the bathwater because setup_requires also installs dependencies for every client installing the package.

Since most clients don't have pandoc installed (and they shouldn't need it), using setuptools-markdown isn't an option and we can't have nice things. The stack trace that shows up seems to indicate that fixing this might be as easy as putting a try clause around your setup.py:

Collecting pypandoc
  Downloading pypandoc-1.0.4.tar.gz
    ---------------------------------------------------------------
    An error occurred while trying to run `pandoc`
    Maybe try:
        sudo yum install pandoc
    See http://johnmacfarlane.net/pandoc/installing.html
    for installation options
    ---------------------------------------------------------------
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/tmp/pip-build-SRTyxW/pypandoc/setup.py", line 6, in <module>
        long_description = pypandoc.convert('README.md', 'rst')
      File "pypandoc.py", line 94, in convert
        outputfile=outputfile, filters=filters)
      File "pypandoc.py", line 114, in _convert
        from_formats, to_formats = get_pandoc_formats()
      File "pypandoc.py", line 276, in get_pandoc_formats
        raise OSError("You probably do not have pandoc installed.")
    OSError: You probably do not have pandoc installed.
    Complete output from command python setup.py egg_info:
    ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-SRTyxW/pypandoc
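A minimal sketch of the suggested try clause for setup.py, assuming pypandoc >= 1.2 (where `convert_file` exists) and using a hypothetical `read_long_description` helper. If pypandoc or pandoc is unavailable, it falls back to the raw Markdown so installation can continue:

```python
def read_long_description(path="README.md"):
    """Convert the README to reST if pypandoc and pandoc work, else return it raw."""
    try:
        import pypandoc
        return pypandoc.convert_file(path, "rst")
    except (ImportError, OSError, RuntimeError):
        # pypandoc is missing, pandoc is not installed, or the conversion failed:
        # fall back to the unconverted Markdown instead of aborting the install.
        with open(path, encoding="utf-8") as handle:
            return handle.read()
```

In setup.py this would be used as `setup(..., long_description=read_long_description())`.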

Fix problem with uploads for multiple py3 files on appveyor

https://ci.appveyor.com/project/JanSchulz/pypandoc-apo32/build/1.0.29/job/lwftccqdyuh7n6qd
-> we only upload one py2 and one py3 file, so we fail on the two more py3 builds

Pandoc 1.18 compatibility

Looks like the first change here broke some things for pypandoc.

Without looking at anything other than the error message, it looks like before you were parsing the output of pandoc --version to get the list of input and output formats. Now you should just be able to parse pandoc --list-input-formats, likewise for output.
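A minimal sketch of parsing the newer flags, assuming pandoc >= 1.18 prints one format name per line for `--list-input-formats` and `--list-output-formats`. The helper names are hypothetical:

```python
import subprocess


def parse_format_list(output):
    """Turn one-name-per-line pandoc output into a list of format names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_formats(pandoc="pandoc"):
    """Return (input_formats, output_formats) using the pandoc >= 1.18 flags."""
    def run(flag):
        raw = subprocess.check_output([pandoc, flag])
        return parse_format_list(raw.decode("utf-8"))
    return run("--list-input-formats"), run("--list-output-formats")
```

Because each name sits on its own line, this also avoids stray help text ending up in a format name, as in the twiki case above.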

Decode as utf-8

Since pypandoc encodes its input as UTF-8 but does not decode the output, it throws an error if there is UTF-8 in the output. For example,

import pypandoc
pypandoc.convert(u'Φ', 'html', format='md')

pandoc uses UTF-8 for both input and output, so fixing this should be pretty quick. I'll dig around a bit!
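A minimal sketch of symmetric handling (Python 3.5+, hypothetical helper name): encode the text as UTF-8 on the way in and decode stdout as UTF-8 on the way out.

```python
import subprocess
import sys


def run_text_filter(cmd, text):
    """Send `text` to a subprocess as UTF-8 and decode its stdout as UTF-8."""
    completed = subprocess.run(
        cmd,
        input=text.encode("utf-8"),  # encode on the way in...
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return completed.stdout.decode("utf-8")  # ...and decode on the way out
```

With pandoc this would be called as `run_text_filter(["pandoc", "--from=markdown", "--to=html"], u"Φ")`; any command that writes UTF-8 to stdout can stand in for pandoc when trying it out.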

Coordinate a release - 0.90

Add @msabramo's contributions to README
Figure out what to do with #9. @msabramo, I would appreciate your input here.

Am I missing something? I can't think of any other tasks.

pypandoc.convert() doesn't handle errors properly

Function pypandoc.convert() doesn't handle errors properly.

When I call pypandoc.convert(filepath, 'rst') with a filepath that doesn't
exist, I get the puzzling exception RuntimeError: Missing format! instead of
a statement of the actual root cause of the error.

A similar issue happens when an empty string or an int is passed as the filepath.

Reproducer

Here is simple reproducer:

#!/usr/bin/env python
# -*- coding: utf8 -*-


import traceback
import sys

import pypandoc


files = [
    '/home/martin/projects/pyp2rpm/README.md',
    '/home/martin/projects/pypandoc/README.md',
    '/home/martin/tmp/doesnotexists.md',
    '',
    42,
    None,
    ]

for filepath in files:
    print("converting '{0}' (type: {1})".format(filepath, type(filepath)))
    try:
        pypandoc.convert(filepath, 'rst')
    except Exception:
        traceback.print_exc(file=sys.stdout)
    print()

When I run this script on my machine, where the first two files exist, I get
the following output:

$ ./reproducer.py
converting '/home/martin/projects/pyp2rpm/README.md' (type: <class 'str'>)

converting '/home/martin/projects/pypandoc/README.md' (type: <class 'str'>)

converting '/home/martin/tmp/doesnotexists.md' (type: <class 'str'>)
Traceback (most recent call last):
  File "./reproducer.py", line 23, in <module>
    pypandoc.convert(filepath, 'rst')
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 50, in convert
    outputfile=outputfile, filters=filters)
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 68, in _convert
    raise RuntimeError('Missing format!')
RuntimeError: Missing format!

converting '' (type: <class 'str'>)
Traceback (most recent call last):
  File "./reproducer.py", line 23, in <module>
    pypandoc.convert(filepath, 'rst')
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 50, in convert
    outputfile=outputfile, filters=filters)
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 68, in _convert
    raise RuntimeError('Missing format!')
RuntimeError: Missing format!

converting '42' (type: <class 'int'>)
Traceback (most recent call last):
  File "./reproducer.py", line 23, in <module>
    pypandoc.convert(filepath, 'rst')
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 50, in convert
    outputfile=outputfile, filters=filters)
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 68, in _convert
    raise RuntimeError('Missing format!')
RuntimeError: Missing format!

converting 'None' (type: <class 'NoneType'>)
Traceback (most recent call last):
  File "./reproducer.py", line 23, in <module>
    pypandoc.convert(filepath, 'rst')
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 50, in convert
    outputfile=outputfile, filters=filters)
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 55, in _convert
    source, format, input_type = reader(source, format, encoding=encoding)
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/site-packages/pypandoc-1.1.2-py3.4.egg/pypandoc/__init__.py", line 96, in _read_file
    path = os.path.exists(source)
  File "/home/martin/projects/pypandoc/.env/lib/python3.4/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: can't specify None for path argument
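A sketch of the kind of up-front validation that would report the real cause. `resolve_source` is a hypothetical helper, not pypandoc's internal code. It assumes that when no format is given, the source is meant to be a file whose extension names the format:

```python
import os


def resolve_source(source, format=None):
    """Hypothetical pre-check that fails with the real cause, not 'Missing format!'."""
    if not isinstance(source, str):
        raise TypeError("source must be a path or text string, not %s"
                        % type(source).__name__)
    if format is None:
        # Without an explicit format, the source is treated as a file path and
        # the format is guessed from its extension, so the file must exist.
        if not source or not os.path.isfile(source):
            raise FileNotFoundError("input file not found: %r" % source)
        ext = os.path.splitext(source)[1].lstrip(".")
        if not ext:
            raise ValueError("cannot infer input format from %r; pass format=" % source)
        return source, ext
    return source, format
```

With this check, the nonexistent path and the empty string report a missing file, while 42 and None report a type error.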

latex not properly converted

This

pypandoc.convert('\\"o{A}', 'latex', format='markdown')

gives

u'"o\\{A\\}\n'

when it should give

u'öA'

This is with

>>> pypandoc.__version__
'1.1.2'

Idea: Create wheels bundled with pandoc

I think the one tricky thing about this module is that it relies on having pandoc installed and that's not Python.

I wonder if you can create wheels on OS X and Windows that bundle a pre-built pandoc binary. You'd have to check with John MacFarlane to see if he's OK with this, but I think it would meet the license (GPL) as long as you include a notice with a link to the source code.

I mention OS X and Windows and not Linux because PyPI won't let you upload binary wheels for Linux, because it's very hard to build one that works reliably across different Linux versions.

Support URL paths

The _identify_path function doesn't treat web URLs as paths, but it should.

pandoc supports stuff like: pandoc https://www.xyplorer.com/whatsnew.php -o "file.pdf"
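A minimal sketch of URL detection, assuming a hypothetical `looks_like_url` helper and that only http/https URLs should be handed to pandoc as-is:

```python
from urllib.parse import urlparse


def looks_like_url(source):
    """True for http(s) URLs that pandoc can be given directly as input."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Requiring a network location keeps Windows drive paths like `C:\docs\file.md` (which `urlparse` reports with scheme `c`) from being mistaken for URLs.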

pypandoc fails with: ValueError: stat: path too long for Windows

Hi,
I wrote a small script using pypandoc, by passing it a preformatted string, and it works like a charm on Linux, but when I tried it on Windows it fails with the following Traceback:

Traceback (most recent call last):
  File ".\PycharmProjects\reddit2ebook\reddit2ebook\main.py", line 101, in <module>
    main()
  File ".\PycharmProjects\reddit2ebook\reddit2ebook\main.py", line 45, in main
    create_ebook(text, directory, bookname)
  File ".\PycharmProjects\reddit2ebook\reddit2ebook\main.py", line 96, in create_ebook
    outputfile=directory + ebook_name + '.epub')
  File "C:\Python34\lib\site-packages\pypandoc.py", line 94, in convert
    outputfile=outputfile, filters=filters)
  File "C:\Python34\lib\site-packages\pypandoc.py", line 99, in _convert
    source, format = reader(source, format, encoding=encoding)
  File "C:\Python34\lib\site-packages\pypandoc.py", line 140, in _read_file
    path = os.path.exists(source)
  File "C:\Python34\lib\genericpath.py", line 19, in exists
    os.stat(path)
ValueError: stat: path too long for Windows

It seems that _read_file calls os.path.exists() on the source to check whether it's the path to a valid file or a string containing formatted text. While this works on *nix systems, it fails on Windows because of the path-length limit (MAX_PATH, 260 characters), which most texts easily exceed.

For now, I worked around it by adding:

    except ValueError:
        path = ''

after the try block around line 142.
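A minimal sketch of the same workaround as a standalone check (hypothetical helper name): any error raised while probing the path is treated as "this is text, not a file". Note that Python 3.8+ already makes `os.path.isfile` return False for some of these cases, so the `except` mainly matters on older versions.

```python
import os


def is_existing_file(source):
    """Return True only if `source` names an existing file.

    os.stat() can raise ValueError (e.g. 'path too long' on older Windows
    Pythons, or embedded NUL characters), so treat any such error as text.
    """
    try:
        return os.path.isfile(source)
    except (ValueError, OSError):
        return False
```

Newer pypandoc releases sidestep the ambiguity by offering separate `convert_text` and `convert_file` functions.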

Pypandoc can not be used if $HOME is not set

Pandoc, and therefore every pypandoc function, fails if the $HOME environment variable is not set:

>>> import os, pypandoc
>>> os.unsetenv('HOME')
>>> pypandoc.get_pandoc_version()
pandoc: HOME: getAppUserDataDirectory: does not exist (no environment variable)
Maybe try:

    sudo yum install pandoc
See http://johnmacfarlane.net/pandoc/installing.html
for installation options
---------------------------------------------------------------

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/pypandoc/__init__.py", line 251, in get_pandoc_version
    _ensure_pandoc_path()
  File "/usr/lib/python2.7/site-packages/pypandoc/__init__.py", line 336, in _ensure_pandoc_path
    raise OSError("No pandoc was found: either install pandoc and add it\n"
OSError: No pandoc was found: either install pandoc and add it
to your PATH or install pypandoc wheels with included pandoc.

As the first line of the output (which comes from stderr) shows, pandoc --version tries to look up $HOME (to display the default data directory) and fails because it is not set.

This is a problem when pypandoc runs as a user without $HOME, such as the apache user.

A workaround is to use os.putenv, but its effect is somewhat non-local, since it changes the environment of the whole process. A better solution would be to use pandoc's --data-dir option.
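Until pypandoc offers something like --data-dir for this case, one way to contain the os.putenv workaround is a context manager. It sets HOME only when HOME is missing and removes it again afterwards. This is a sketch: home_fallback is a hypothetical name. The change is still visible to the whole process, including any threads, while the with block runs:

```python
import os
from contextlib import contextmanager


@contextmanager
def home_fallback(fallback_dir):
    """Hypothetical helper: set HOME to fallback_dir only if it is missing.

    If this helper added HOME, it is removed again on exit; an existing HOME
    is never touched.
    """
    added = "HOME" not in os.environ
    if added:
        # Assigning via os.environ also calls putenv, so child processes
        # such as pandoc inherit the value.
        os.environ["HOME"] = fallback_dir
    try:
        yield os.environ["HOME"]
    finally:
        if added:
            os.environ.pop("HOME", None)
```

Usage would look like `with home_fallback(tempfile.gettempdir()): pypandoc.get_pandoc_version()`. The helper checks os.environ, and an os.unsetenv call like the one in the reproduction above does not update os.environ. It is therefore aimed at processes that start without HOME, such as a service account.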

passing variables to pypandoc?

I'm writing a small script that takes an input file via argparse, converts it, and publishes the .md file to git repos and two mediawikis. I'm super happy to have found this wrapper, but I'm running into an issue and can't work out whether it's my bad Python knowledge or pypandoc. Here is where I'm at:

#!/usr/bin/env python

import argparse
import mwclient
import pypandoc

parser = argparse.ArgumentParser(description='Cross publish your texts to Mediawikis and git repos')
parser.add_argument('-f', '--file', type=argparse.FileType('r'), help='your input text file')
args = parser.parse_args()
print args.file.readlines()

input = args.file.readlines()
print input
output = pypandoc.convert_file(input, 'mediawiki')
print output

but when I run it, I get this response:

Traceback (most recent call last):
  File "crosspublish.py", line 14, in <module>
    output = pypandoc.convert_file(input, 'mediawiki')
  File "/usr/local/lib/python2.7/dist-packages/pypandoc/__init__.py", line 137, in convert_file
    raise RuntimeError("source_file is not a valid path")
RuntimeError: source_file is not a valid path

I can't find a solution for this. Is it not possible to pass variables to pypandoc?

Many many thanks, and all the best.
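Two things in the script above would likely cause this, assuming the goal is to convert the file's contents. First, convert_file expects a file path, but input is a list of lines. Second, args.file.readlines() is called twice, so the second call starts at end-of-file and returns an empty list. The sketch below is Python 3, while the script is Python 2. The function names read_input and to_mediawiki are hypothetical. pypandoc.convert_text takes the text itself:

```python
import argparse


def read_input(argv=None):
    """Parse -f/--file and return the whole file as a single string."""
    parser = argparse.ArgumentParser(
        description="Cross publish your texts to Mediawikis and git repos")
    parser.add_argument("-f", "--file", type=argparse.FileType("r"),
                        required=True, help="your input text file")
    args = parser.parse_args(argv)
    # Read the handle exactly once: a second read on the same file object
    # starts at end-of-file and returns nothing.
    with args.file as fh:
        return fh.read()


def to_mediawiki(text):
    # Imported here so read_input() can be exercised without pypandoc installed.
    import pypandoc
    # convert_text() takes the document contents; convert_file() expects a path.
    return pypandoc.convert_text(text, "mediawiki", format="markdown")
```

Alternatively, if the input is a real file on disk, passing args.file.name to convert_file should also work.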

Build on py3.5 fails

$ obvci_conda_build_dir ./recipes janschulz --channel main --build-condition "python >=3.5,<3.6"
Fetching package metadata: ......
Resolving distributions from 5 recipes... 
Computed that there are 4 distributions from the 5 recipes:
Resolved dependencies, will be built in the following order: 
    matplotlib-1.5.0+532.g3d942f7.dirty-py35_0 (will be built: False)
    pypandoc-1.1.3-py35_0 (will be built: True)
    statsmodels-0.6.1-py35_0 (will be built: True)
    ipyext-0.1.0-py35_0 (will be built: False)
Nothing to be done for matplotlib - it is already on main.
Building  pypandoc-1.1.3-py35_0
Removing old build environment
Removing old work directory
BUILD START: pypandoc-1.1.3-py35_0
Fetching package metadata: ........
Solving package specifications: ..........
The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    python-3.5.1               |                0        12.7 MB  defaults
    setuptools-19.2            |           py35_0         349 KB  defaults
    wheel-0.26.0               |           py35_1          77 KB  defaults
    pip-7.1.2                  |           py35_0         1.4 MB  defaults
    ------------------------------------------------------------
                                           Total:        14.6 MB
The following NEW packages will be INSTALLED:
    openssl:    1.0.2e-0      defaults
    pip:        7.1.2-py35_0  defaults
    python:     3.5.1-0       defaults
    readline:   6.2-2         defaults
    setuptools: 19.2-py35_0   defaults
    sqlite:     3.9.2-0       defaults
    tk:         8.5.18-0      defaults
    wheel:      0.26.0-py35_1 defaults
    xz:         5.0.5-0       defaults
    zlib:       1.2.8-0       defaults
Removing old work directory
checkout: u'v1.1.3'
==> git log -n1 <==
commit 7f03f316f2015fd8254fb2387681008f2b597444
Author: Jan Schulz <[email protected]>
Date:   Fri Jan 15 21:27:17 2016 +0100
    pypandoc v1.1.3
==> git describe --tags --dirty <==
v1.1.3
==> git status <==
HEAD detached at v1.1.3
nothing to commit, working directory clean
Package: pypandoc-1.1.3-py35_0
source tree in: /Users/travis/miniconda/conda-bld/work
+ /Users/travis/miniconda/envs/_build/bin/python setup.py download_pandoc
Maybe try:
    brew install pandoc
See http://johnmacfarlane.net/pandoc/installing.html
for installation options
---------------------------------------------------------------
!!! pandoc not found, long_description is bad, don't upload this to PyPI !!!
running download_pandoc
* Downloading pandoc from https://github.com/jgm/pandoc/releases/download/1.15.1/pandoc-1.15.1-osx.pkg ...
* Unpacking pandoc-1.15.1-osx.pkg to tempfolder...
x .
x ./usr
x ./usr/local
x ./usr/local/bin
x ./usr/local/bin/pandoc
x ./usr/local/bin/pandoc-citeproc
x ./usr/local/share
x ./usr/local/share/man
x ./usr/local/share/man/man1
x ./usr/local/share/man/man1/pandoc-citeproc.1
x ./usr/local/share/man/man1/pandoc.1
x ./usr/local/share/man/man5
* Copying pandoc to /Users/travis/miniconda/conda-bld/work/pypandoc/files ...
* Copying pandoc-citeproc to /Users/travis/miniconda/conda-bld/work/pypandoc/files ...
* Done.
++ uname -s
+ '[' Darwin '!=' Linux ']'
+ echo 'Build wheels...'
Build wheels...
+ /Users/travis/miniconda/envs/_build/bin/python setup.py bdist_wheel
Maybe try:
    brew install pandoc
See http://johnmacfarlane.net/pandoc/installing.html
for installation options
---------------------------------------------------------------
!!! pandoc not found, long_description is bad, don't upload this to PyPI !!!
Patching wheel building...
Making sure that wheel is platform specific...
running bdist_wheel
Traceback (most recent call last):
  File "setup.py", line 238, in <module>
    cmdclass=cmd_classes
  File "/Users/travis/miniconda/envs/_build/lib/python3.5/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/Users/travis/miniconda/envs/_build/lib/python3.5/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/Users/travis/miniconda/envs/_build/lib/python3.5/distutils/dist.py", line 973, in run_command
    cmd_obj.ensure_finalized()
  File "/Users/travis/miniconda/envs/_build/lib/python3.5/distutils/cmd.py", line 107, in ensure_finalized
    self.finalize_options()
  File "setup.py", line 210, in finalize_options
    assert "any" not in self.get_archive_basename(), "bdist_wheel will not generate platform specific names, aborting!"
  File "/Users/travis/miniconda/envs/_build/lib/python3.5/site-packages/wheel/bdist_wheel.py", line 162, in get_archive_basename
    impl_tag, abi_tag, plat_tag = self.get_tag()
  File "setup.py", line 206, in get_tag
    assert tag == supported_tags[0]
AssertionError
Command failed: /bin/bash -x -e /Users/travis/build/JanSchulz/package-builder/recipes/pypandoc/build.sh
The command "obvci_conda_build_dir ./recipes janschulz --channel main --build-condition "python >=3.5,<3.6"" exited with 1.

Will probably be fixed by #84

JSON is not recognized as input format

When converting a JSON document (pandoc AST) to Markdown, I get the following error:

97      def _convert(reader, processor, source, to, format=None, extra_args=(), encoding=None,
 98                  outputfile=None, filters=None):
 99         source, format, input_type = reader(source, format, encoding=encoding)
100     
101         formats = {
102             'dbk': 'docbook',
103             'md': 'markdown',
104             'rest': 'rst',
105             'tex': 'latex',
106         }
107     
108         format = formats.get(format, format)
109         to = formats.get(to, to)
110     
111         if not format:
112             raise RuntimeError('Missing format!')
113     
114         from_formats, to_formats = get_pandoc_formats()
115     
116         if _get_base_format(format) not in from_formats:
117             raise RuntimeError(
118                 'Invalid input format! Expected one of these: ' +
119  ->             ', '.join(from_formats))
120     
121         base_to_format = _get_base_format(to)
122         if base_to_format not in to_formats:
123             raise RuntimeError(
124                 'Invalid output format! Expected one of these: ' +
125                 ', '.join(to_formats))
126     
127         # list from https://github.com/jgm/pandoc/blob/master/pandoc.hs
128         # `[...] where binaries = ["odt","docx","epub","epub3"] [...]`
129         if base_to_format in ["odt", "docx", "epub", "epub3"] and not outputfile:
130             raise RuntimeError(
131                 'Output to %s only works by using a outputfile.' % base_to_format
132             )
133     
134         return processor(source, input_type, to, format, extra_args,
135                          outputfile=outputfile, filters=filters)
(Pdb) p from_formats
['commonmark', 'docbook', 'docx', 'epub', 'haddock', 'html', 'json*', 'latex', 'markdown', 'markdown_github', 'markdown_mmd', 'markdown_phpextra', 'markdown_strict', 'mediawiki', 'native', 'opml', 'org', 'rst', 't2t', 'textile', "twiki                 [ *only Pandoc's JSON version of native AST]"]

It looks like the list contains 'json*' whereas format == 'json', so the input-format test on line 116 fails.

OS X 10.10.5
Python 3.4.3
pypandoc 1.0.2
pandoc 1.15.0.6
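One option is to normalise the format list before the membership test. This sketch assumes the raw entries look like the from_formats output above. clean_format_names is a hypothetical helper that keeps the first token of each entry and strips the trailing '*' footnote marker. As a result, 'json*' becomes 'json', and the footnote text glued onto 'twiki' is dropped:

```python
def clean_format_names(raw_names):
    """Hypothetical helper: normalise entries from pandoc's format listing."""
    cleaned = []
    for entry in raw_names:
        tokens = entry.split()
        if not tokens:
            continue
        # The first token is the format name; anything after it is footnote
        # text, and a trailing '*' marks a footnote reference.
        name = tokens[0].rstrip("*")
        if name:
            cleaned.append(name)
    return cleaned
```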

Specifying path to pandoc

Is there some way to specify the path to pandoc within a script? I'm importing pypandoc into a Flask app that runs on an Apache server, and since the server runs as a special Apache user, the app can't find pandoc even though it's installed in my own user path. I know pyandoc had a way around this with a variable for the pandoc path, but I can't tell whether there's a way to manage that here.
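Later pypandoc releases document a PYPANDOC_PANDOC environment variable that holds the full path to the pandoc binary. This issue may predate that variable, so check your installed version's README to confirm it is supported. The sketch below assumes it is. point_pypandoc_at is a hypothetical helper that searches a list of directories for pandoc and exports the match. It must run before pypandoc first looks pandoc up:

```python
import os
import shutil


def point_pypandoc_at(candidate_dirs, executable="pandoc"):
    """Hypothetical helper: locate pandoc in candidate_dirs and export PYPANDOC_PANDOC.

    Returns the full path that was exported, or None if nothing was found
    (in which case the environment is left unchanged). The executable
    parameter exists only so the helper can be tried without pandoc.
    """
    # shutil.which() searches only the given directories, not the server's PATH.
    found = shutil.which(executable, path=os.pathsep.join(candidate_dirs))
    if found:
        os.environ["PYPANDOC_PANDOC"] = found
    return found
```

In a Flask app, this could be called once at startup with the directory where pandoc is installed for your own user.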

Please add license text to the released tarball

I've noticed that the tarball released on pypi is missing the LICENSE file, and while there are references to the MIT license, the only way to know the actual licensing terms is from the git repo.

Also, the code taken from IPython's py3compat.py and encoding.py is marked as being under the Modified BSD License, but the actual text of that license (plus its copyright notice) is not reproduced anywhere in the package, as the license requires ("The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."). IANAL, but I believe that adding the text of the Modified BSD License to LICENSE (including the copyright notice and specifying the parts to which it applies) should be enough.

Thanks in advance

Bad test file generated from spec during setup

$ pip install pypandoc
...
Installing collected packages: pypandoc
  Running setup.py install for pypandoc
    SyntaxError: ("'return' outside function", ('/usr/local/lib/python2.7/dist-packages/pypandoc/test_pypandoc.py', 15, None, 'return _convert(reader, processor, source, to, format, extra_args)\n'))

The generated test_pypandoc.py file looks a bit mangled:

def test_converter(to, format=None, extra_args=()):

    def reader(*args):

        pass

import unittest
import pypandoc
class TestPypandoc(unittest.TestCase):
    def processor(*args):

        return 'ok'

        source = 'foo'
    return _convert(reader, processor, source, to, format, extra_args)


    def test_converts_valid_format(self):
        self.assertEqual(test_converter(format='md', to='rest'), 'ok')

    def test_does_not_convert_to_invalid_format(self):
        try:test_converter(format='md', to='invalid') 
        except RunTimeError: pass

    def test_does_not_convert_from_invalid_format(self):
        try:test_converter(format='invalid', to='rest') 
        except RunTimeError: pass
suite = unittest.TestLoader().loadTestsFromTestCase(TestPypandoc)
unittest.TextTestRunner(verbosity=2).run(suite)

More generally, can you tell me which package turns the .spec file into test_X.py? I've never seen a .spec file before.

Check the return code of pandoc

In[7]: output = pypandoc.convert('<h1>Primary Heading</h1>',
    'md', format='html',
    extra_args=['--atx-headers', "--blah"])
pandoc: unrecognized option `--blah'
Try pandoc --help for more information.
In[8]: output
Out[8]: u''
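This is a minimal sketch of the check this issue asks for. It is written against a generic command so it runs without pandoc installed. run_checked is a hypothetical helper, not pypandoc's API. It feeds text on stdin and raises RuntimeError with the program's stderr, instead of silently returning an empty string:

```python
import subprocess


def run_checked(cmd, input_text):
    """Hypothetical helper: run cmd with input_text on stdin.

    Returns stdout on success; raises RuntimeError (including stderr)
    when the process exits with a non-zero status.
    """
    proc = subprocess.run(cmd, input=input_text, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with code {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout
```

Called as run_checked(["pandoc", "-f", "html", "-t", "markdown", "--blah"], "<h1>Primary Heading</h1>"), it would raise on the unrecognized option instead of returning u'' as in the session above.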