

mdtopdf


Introduction: Markdown to PDF

This package depends on two other packages:

  • The gomarkdown parser to read the markdown source
  • The fpdf package to generate the PDF

Both of the above are documented at Go Docs.

The tests included here are from the BlackFriday package. See the "testdata" folder. The tests create PDF files and thus while the tests may complete without errors, visual inspection of the created PDF is the only way to determine if the tests really pass!

The tests create log files that trace the gomarkdown parser callbacks. This is a valuable debugging tool: it shows each callback, and the data passed to it, as the AST is traversed.

Supported Markdown

The supported elements of markdown are:

  • Emphasized and strong text
  • Headings 1-6
  • Ordered and unordered lists
  • Nested lists
  • Images
  • Tables (but see limitations below)
  • Links
  • Code blocks and backticked text

The use of non-Latin fonts and languages is documented in a section below.

Limitations and Known Issues

  1. It is common for Markdown to include HTML. HTML is treated as a "code block". There is no attempt to convert raw HTML to PDF.

  2. GitHub-flavored Markdown permits strikethrough using tildes. This is not supported at present by fpdf as a font style.

  3. The markdown link title, which would show when converted to HTML as hover-over text, is not supported. The generated PDF will show the actual URL that will be used if clicked, but this is a function of the PDF viewer.

  4. Currently all levels of unordered lists use a dash for the bullet. This is a planned fix; see here.

  5. Definition lists are not supported (not sure that markdown supports them -- I need to research this)

  6. The following text features may be tweaked: font, size, spacing, style, fill color, and text color. These are exported and available via the Styler struct. Note that fill color only works if the text is output using CellFormat(). This is the case for tables, code blocks, and backticked text.

  7. Tables are supported, but no attempt is made to ensure fit. You can, however, change the font size and spacing to make it smaller. See example.

Installation

To install the package, run the usual go get:

$ go get github.com/mandolyte/mdtopdf

You can also install the md2pdf binary directly onto your $GOBIN dir with:

$ go install github.com/mandolyte/mdtopdf/cmd/md2pdf@latest

Syntax highlighting

mdtopdf supports colourised output via the gohighlight module.

For examples, see testdata/Markdown Documentation - Syntax.text and testdata/Markdown Documentation - Syntax.pdf

Quick start

In the cmd folder is an example using the package. It demonstrates a number of features. The test PDF was created with this command:

$ go run md2pdf.go -i test.md -o test.pdf

To benefit from syntax highlighting, invoke it like this:

$ go run md2pdf.go -i syn_test.md -s /path/to/syntax_files -o test.pdf

To convert multiple MD files into a single PDF, use:

$ go run md2pdf.go -i /path/to/md/directory -o test.pdf

This repo has the gohighlight module configured as a submodule so if you clone with --recursive, you will have the highlight dir in its root. Alternatively, you may issue the below to update an existing clone:

git submodule update --remote

Note 1: the cmd folder has an example for the syntax highlighting. See the script run_syntax_highlighting.sh. This example assumes that the folder with the syntax files is at the relative path ../../../jessp01/gohighlight/syntax_files.

Note 2: when annotating the code block to specify the language, the annotation name must match the base filename of the syntax file.
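To make Note 2 concrete, here is a minimal sketch of a base-filename match. It is an illustration only, not the lookup that mdtopdf or gohighlight performs; the file names (go.yaml, python.yaml), the exact-match rule, and the helpers syntaxBaseName and findSyntaxFile are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// syntaxBaseName strips the directory and the extension from a syntax file path.
func syntaxBaseName(path string) string {
	base := filepath.Base(path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

// findSyntaxFile returns the first path whose base name equals the annotation.
func findSyntaxFile(annotation string, paths []string) (string, bool) {
	for _, p := range paths {
		if syntaxBaseName(p) == annotation {
			return p, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical file names, for illustration only.
	paths := []string{"syntax_files/go.yaml", "syntax_files/python.yaml"}
	for _, lang := range []string{"go", "golang"} {
		p, ok := findSyntaxFile(lang, paths)
		fmt.Printf("%q -> %q (found: %v)\n", lang, p, ok)
	}
}
```

Under these assumptions a block annotated "go" finds a file, while "golang" does not, which is the kind of mismatch the note warns about.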

Additional options

  -i string
	Input filename, dir consisting of .md|.markdown files or HTTP(s) URL; default is os.Stdin
  -o string
    	Output PDF filename; required
  -s string
    	Path to github.com/jessp01/gohighlight/syntax_files
  --new-page-on-hr
    	Interpret HR as a new page; useful for presentations
  --page-size string
    	[A3 | A4 | A5] (default "A4")
  --theme string
    	[light|dark] (default "light")
  --title string
    	Presentation title
  --author string
    	Author; used if -footer is passed
  --font-file string
    	path to font file to use
  --font-name string
    	Font name ID; e.g 'Helvetica-1251'
  --unicode-encoding string
    	e.g 'cp1251'
  --with-footer
    	Print doc footer (author  title  page number)
  --help
    	Show usage message

For example, the below will:

  • Set the title to My Grand Title
  • Set Random Bloke as the author (used in the footer)
  • Set the dark theme
  • Start a new page when encountering a HR (---); useful for creating presentations
  • Print a footer (author name, title, page number)

$ go run md2pdf.go -i /path/to/md \
    -o /path/to/pdf --title "My Grand Title" --author "Random Bloke" \
    --theme dark --new-page-on-hr --with-footer

Using non-ASCII Glyphs/Fonts

In order to use a non-ASCII language there are a number of things that must be done. The PDF generator must be configured WithUnicodeTranslator:

// https://en.wikipedia.org/wiki/Windows-1251
pf := mdtopdf.NewPdfRenderer("", "", *output, "trace.log", mdtopdf.WithUnicodeTranslator("cp1251")) 

In addition, this package's Styler must be used to set a font that matches the one configured for the PDF generator.

A complete working example for Russian may be found in the cmd folder, in the file named russian.go.

For a full example, run:

$ go run md2pdf.go -i russian.md -o russian.pdf \
    --unicode-encoding cp1251 --font-file helvetica_1251.json --font-name Helvetica_1251

Note to Self

In order to update pkg.go.dev with the latest release, the following will do the trick. Essentially, it creates a module and then runs the go get command for the desired release. Using the proxy has the side effect of updating the info on the Go package web site.

$ pwd
/home/cecil/Downloads
$ mkdir tmp
$ cd tmp
$ ls
$ go mod init example.com/mypkg
go: creating new go.mod: module example.com/mypkg
$ cat go.mod 
module example.com/mypkg

go 1.20
$ GOPROXY=https://proxy.golang.org GO111MODULE=on go get github.com/mandolyte/mdtopdf@v1.4.1
go: added github.com/go-pdf/fpdf v0.8.0
go: added github.com/jessp01/gohighlight v0.21.1-7
go: added github.com/mandolyte/mdtopdf v1.4.1
go: added github.com/gomarkdown/markdown 
go: added gopkg.in/yaml.v2 v2.4.0

mdtopdf's Issues

could not locate "helveticaBB"

Problem

While working with this library I found an interesting case. If you create this markdown

### **Email:** issuer\_email

and then try to convert it, you get an error:

➜  go run convert.go -i bad-markdown.md -o result.pdf 
2023/01/20 16:37:07 pdf.OutputFileAndClose() error:Pdf.OutputFileAndClose() error on result.pdf:could not locate "helveticaBB" among embedded core font definition files

Additional info
My understanding is that gotopdf is missing some fonts for italic headers.

Support for a "book form" PDF

Sorry about the issue title. What I mean specifically is to support multiple markdown files to create one PDF.

This would permit a scenario of writing a book or article with multiple chapters or sections, where each chapter/section is in its own markdown file.

Can you support scaling of images?

Can you support scaling of images in this project? They come out at 100%, which can be disproportionate for the page. In the context of an eBook or white paper, scaling would be particularly useful.
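As a rough illustration of the request, the arithmetic for fitting an image to the printable width while keeping its aspect ratio is short. This is only a sketch: fitWidth is a hypothetical helper, not part of mdtopdf or fpdf, and the numbers are invented.

```go
package main

import "fmt"

// fitWidth scales (w, h) down so that w does not exceed maxW,
// keeping the aspect ratio. Images that already fit are returned unchanged.
func fitWidth(w, h, maxW float64) (float64, float64) {
	if w <= 0 || w <= maxW {
		return w, h
	}
	return maxW, h * maxW / w
}

func main() {
	// Hypothetical numbers: a 400x300 image and 190 units of printable width.
	w, h := fitWidth(400, 300, 190)
	fmt.Printf("%.1f x %.1f\n", w, h)
}
```

A user-supplied scale factor could be applied in the same place, before or instead of the fit-to-width rule.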

Incorrect url in the readme.md

In the Syntax Highlighting section, in the README.md file, the link to gohighlight module is incorrect (https://github.com/mandolyte/mdtopdf/blob/master/github.com/jessp01/gohighlight) which causes a 404.

Table rendering issues

Hi. First off great project.

I seem to be having problems rendering tables. Columns with very long lines overlap the table's boundary.

### Some Title Goes Here

| Horizonal Header | Horizonal Value                                                      |
|-------------------------|----------------------------------------------------------------------|
| Period                  | 2023/1                                                                                                  |
| Project          | Project #1                                                                                             |
| Manager       | Very long name goes here without wrapping with ellipses if possible. |
| Access ID     | 1234567890                                                           |

This is what I get, which is not what I need

image

Also, is it possible to have a horizontal header/value layout?
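The "ellipses instead of overflow" behaviour asked for above could look roughly like the following. It is a sketch under a simplifying assumption: width is counted in characters, whereas a real renderer would have to measure the text in PDF units for the current font. The helper truncate is hypothetical.

```go
package main

import "fmt"

// truncate shortens s to at most max runes, ending in "..." when text is cut.
// For very small limits (3 or fewer) it cuts without adding an ellipsis.
func truncate(s string, max int) string {
	r := []rune(s)
	if max <= 0 || len(r) <= max {
		return s
	}
	if max <= 3 {
		return string(r[:max])
	}
	return string(r[:max-3]) + "..."
}

func main() {
	cell := "Very long name goes here without wrapping with ellipses if possible."
	fmt.Println(truncate(cell, 30))
}
```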

Panic when markdown has dollar signs in it

The following markdown doc causes mdtopdf to panic:

Bob had $20 and but lost $10.

The panic is:

$ ./md2pdf -i  /tmp/foo.md  -o /tmp/out.md.pdf
panic: Unknown node type *ast.Math

goroutine 1 [running]:
github.com/mandolyte/mdtopdf.(*PdfRenderer).RenderNode(0x4000030800?, {0x0?, 0x40001a1b08?}, {0x45ac90?, 0x4000020aa0?}, 0xc0?)
        /home/psanford/projects/thirdparty/mdtopdf/mdtopdf.go:454 +0x5e4
github.com/gomarkdown/markdown.Render.func1({0x45ac90?, 0x4000020aa0?}, 0x80?)
        /home/psanford/.cache/gopath/pkg/mod/github.com/gomarkdown/[email protected]/markdown.go:63 +0x50
github.com/gomarkdown/markdown/ast.NodeVisitorFunc.Visit(0x4000172840?, {0x45ac90?, 0x4000020aa0?}, 0x68?)
        /home/psanford/.cache/gopath/pkg/mod/github.com/gomarkdown/[email protected]/ast/node.go:574 +0x38
github.com/gomarkdown/markdown/ast.Walk({0x45ac90, 0x4000020aa0}, {0x457420, 0x400007b3e0})
        /home/psanford/.cache/gopath/pkg/mod/github.com/gomarkdown/[email protected]/ast/node.go:546 +0x58
github.com/gomarkdown/markdown/ast.Walk({0x45a430, 0x4000172840}, {0x457420, 0x400007b3e0})
        /home/psanford/.cache/gopath/pkg/mod/github.com/gomarkdown/[email protected]/ast/node.go:557 +0x144
github.com/gomarkdown/markdown/ast.Walk({0x45a3e8, 0x40001727e0}, {0x457420, 0x400007b3e0})
        /home/psanford/.cache/gopath/pkg/mod/github.com/gomarkdown/[email protected]/ast/node.go:557 +0x144
github.com/gomarkdown/markdown/ast.WalkFunc(...)
        /home/psanford/.cache/gopath/pkg/mod/github.com/gomarkdown/[email protected]/ast/node.go:580
github.com/gomarkdown/markdown.Render({0x45a3e8, 0x40001727e0}, {0x458c20, 0x40001d0000})
        /home/psanford/.cache/gopath/pkg/mod/github.com/gomarkdown/[email protected]/markdown.go:62 +0xd0
github.com/mandolyte/mdtopdf.(*PdfRenderer).Run(0x40001d0000, {0x400003e200?, 0x242?, 0x40001a1d30?})
        /home/psanford/projects/thirdparty/mdtopdf/mdtopdf.go:344 +0xb4
github.com/mandolyte/mdtopdf.(*PdfRenderer).Process(0x40001d0000, {0x400003e200?, 0x1e?, 0x200?})
        /home/psanford/projects/thirdparty/mdtopdf/mdtopdf.go:318 +0x1b8
main.main()
        /home/psanford/projects/thirdparty/mdtopdf/cmd/md2pdf/md2pdf.go:157 +0x9fc

Seems like the issue is that you have enabled the MathJax extension via parser.CommonExtensions but you don't actually support those AST types.

I'm not trying to use MathJax so it would be nice if there was an option to override the markdown extensions enabled.
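The override asked for here amounts to clearing one bit in the extension set before the parser is created. The sketch below shows that technique with local stand-in constants; they are not the gomarkdown identifiers, and the flag names and values are assumptions made so the example runs on its own. With gomarkdown the analogous expression would presumably be parser.CommonExtensions with the parser.MathJax bit cleared.

```go
package main

import "fmt"

// Extensions is a local stand-in for a parser's extension bit set.
type Extensions uint

// Stand-in flags; the real parser defines its own set and values.
const (
	Tables Extensions = 1 << iota
	FencedCode
	MathJax
)

// Common is a stand-in for a default extension set that includes MathJax.
const Common = Tables | FencedCode | MathJax

// without returns set with the given flags cleared (Go's AND NOT operator).
func without(set, flags Extensions) Extensions {
	return set &^ flags
}

func main() {
	ext := without(Common, MathJax)
	fmt.Println("MathJax enabled:", ext&MathJax != 0)
	fmt.Println("Tables enabled:", ext&Tables != 0)
}
```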

Installing latest binary results in a binary named "cmd", not "mdtopdf"

@jessp01 - am I doing something wrong? Below is a transcript.

$ echo $GOBIN
/home/cecil/go/bin
$ go install github.com/mandolyte/mdtopdf/cmd@latest
$ mdtopdf -h
-bash: mdtopdf: command not found
$ ls $GOBIN
catcsv      csvq      gomodifytags  impl         reordercsv        staticcheck
cmd         dedupcsv  goplay        pivotcsv     simplehttpserver  transformcsv
comparecsv  diffcsv   gopls         recursecsv   sortcsv
convert     dlv       gotests       recursedata  splitcsv
$ cmd -h
Usage of cmd:
  -author string
        Author; used if -footer is passed
  -font-file string
        path to font file to use
  -font-name string
        Font name ID; e.g 'Helvetica-1251'
  -help
        Show usage message
  -i string
        Input filename or HTTP(s) URL; default is os.Stdin
  -new-page-on-hr
        Interpret HR as a new page; useful for presentations
  -o string
        Output PDF filename; required
  -orientation string
        [portrait | landscape] (default "portrait")
  -page-size string
        [A3 | A4 | A5] (default "A4")
  -s string
        Path to github.com/jessp01/gohighlight/syntax_files
  -theme string
        [light|dark] (default "light")
  -title string
        Presentation title
  -unicode-encoding string
        e.g 'cp1251'
  -with-footer
        Print doc footer (author  title  page number)
$ go install github.com/mandolyte/mdtopdf/cmd/mdtopdf@latest
go: github.com/mandolyte/mdtopdf/cmd/mdtopdf@latest: module github.com/mandolyte/mdtopdf@latest found (v1.5.0), but does not contain package github.com/mandolyte/mdtopdf/cmd/mdtopdf
$ 

Chinese encoding problems

I noticed that the Chinese PDF example you generated is garbled.

https://github.com/mandolyte/mdtopdf/blob/master/cmd/chinese.pdf

Thanks and...

This project is awesome, thank you! We gophers are just too awesome!

Would it ever make sense to do a "dark mode" feature or, better yet, have a pluggable palette of 3-4 colors? :D

Thanks again,
Drew

Table support

Need to review gofpdf examples (if any) of creating tables. BlackFriday provides events for:

  • table
  • table head
  • table body
  • table row
  • table cell

I have not yet tested the BF support for tables... so that should be the first action - make sure it really works.

Consider using gofpdf

Note that github.com/jung-kurt/gofpdf is no longer being maintained, whereas github.com/go-pdf/fpdf is a drop-in replacement that is actively maintained and has performance enhancements. I have already made the switch on my local copy.

Non-local and non-png images cause conversion error

Use a markdown document that references an image that is either not local or isn't a png. An example is the README.md for this project, which has a godoc image link. Run mktopdf -i <the_markdown_doc> -o test.pdf.

Expected behaviour is that the output simply omits the image, which is fine since not all markdown/HTML syntax is supported. A warning might be useful.

Instead, export completely fails with an error indicating that it cannot open the file path and dumps the http URL for the image link.
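The expected behaviour could be sketched as a check that runs before an image is handed to the PDF generator. This is an illustration, not code from mdtopdf: the helpers isRemote and canEmbed are hypothetical, and the rule "local and .png" simply mirrors the two failure cases described above.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// isRemote reports whether ref is an http or https URL.
func isRemote(ref string) bool {
	u, err := url.Parse(ref)
	if err != nil {
		return false
	}
	return u.Scheme == "http" || u.Scheme == "https"
}

// canEmbed reports whether ref is a local file with a .png extension.
func canEmbed(ref string) bool {
	if isRemote(ref) {
		return false
	}
	return strings.ToLower(filepath.Ext(ref)) == ".png"
}

func main() {
	// Hypothetical image references, for illustration only.
	refs := []string{"images/logo.png", "images/photo.jpg", "https://example.com/badge.svg"}
	for _, ref := range refs {
		if canEmbed(ref) {
			fmt.Println("embed:", ref)
			continue
		}
		// Warn and carry on instead of aborting the whole conversion.
		fmt.Println("warning: skipping unsupported image:", ref)
	}
}
```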

Idea: themed output

Since markdown has no real styling information, and PDF has no real equivalent to after-the-fact styling like CSS, it would be interesting if this tool had some theming concept. Out of the box, it would be great to support certain common styles of PDF documents. One example that comes to mind is an academic paper style (e.g. https://arxiv.org/pdf/1801.00013.pdf). Another could be a user guide style that resembles certain common manuals. I think it would also be useful to have some way to add more themes to the tool, whether programmatically or declaratively.

Does it support non latin characters?

source
Влюбиться можно в красоту, но полюбить – лишь только душу!

result
ВлюбитÑOEÑ•Ñ• можно в краѕоту, но полюбитÑOE
– лишÑOE толÑOEко душу!

This is how it renders text with non-Latin characters. The required font is in the project folder and the font name is used in the Styler options. How can this be fixed?

get the latest version

Hi author, I noticed that the latest version has reached v2.2.4, but when I go get it, I only get v1.5.3. Is there any way to use the latest code?

List support

The following remains to be done:

  • different bullets for nested items
  • paragraphs within an item are not working correctly (will likely require a push/pop stack to manage proper context)
  • loose vs tight is not supported; only "tight" is working at present.
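For the first item, one simple approach is to pick the bullet from a small table indexed by nesting depth. The sketch below is only an illustration; the glyphs and the helper bulletFor are assumptions, not what the package does today.

```go
package main

import (
	"fmt"
	"strings"
)

// bullets holds one glyph per nesting level; the choice here is hypothetical.
var bullets = []string{"-", "*", "+"}

// bulletFor returns the glyph for a zero-based nesting level,
// cycling through the table when lists nest deeper than it is long.
func bulletFor(level int) string {
	if level < 0 {
		level = 0
	}
	return bullets[level%len(bullets)]
}

func main() {
	for level := 0; level < 4; level++ {
		indent := strings.Repeat("  ", level)
		fmt.Printf("%s%s item at level %d\n", indent, bulletFor(level), level)
	}
}
```

The same depth counter would fall out naturally from the push/pop stack mentioned in the second item, since the stack height is the nesting level.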

Needed changes in light of the creation of v2

Hi @mandolyte ,

Since we're now on v2, some structure changes are required as, when one runs:

go install github.com/mandolyte/mdtopdf/cmd/md2pdf@latest

The received version is v1.5.3, not v2.2.1.

When trying to explicitly install v2.2.1:

$ go install github.com/mandolyte/mdtopdf/cmd/md2pdf@v2.2.1

One would get this message:

go: github.com/mandolyte/mdtopdf/cmd/md2pdf@v2.2.1: github.com/mandolyte/mdtopdf@v2.2.1: invalid version: module contains a go.mod file, so module path must match major version ("github.com/mandolyte/mdtopdf/v2")

Please see https://go.dev/blog/v2-go-modules

Happy to help with the restructure if you'd like.

Cheers,

md2pdf_trace.log by default

I noticed that the tool creates an output trace file by default. This isn't ideal, since the trace contains the contents of the file and leaves it on the system.

Truncation of lines in long code blocks

Try converting a markdown document with a wide line in the code block:

This line is really quite long, long and even more long so that the text reaches the edge of the page. More and more and more.

It is expected that either the text is wrapped or the code block font size adjusted to accommodate the width of the line.

Instead, the text goes outside of the edge of the grey code block box and gets truncated.
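The wrapping option could be approximated as below. This is a sketch under a stated assumption: the code font is monospaced, so a fixed number of characters fits per line. A real fix would need the renderer to measure string widths in PDF units. The helper wrapRunes is hypothetical.

```go
package main

import "fmt"

// wrapRunes splits line into chunks of at most width runes.
// A non-positive width disables wrapping.
func wrapRunes(line string, width int) []string {
	if width <= 0 {
		return []string{line}
	}
	runes := []rune(line)
	var out []string
	for len(runes) > width {
		out = append(out, string(runes[:width]))
		runes = runes[width:]
	}
	return append(out, string(runes))
}

func main() {
	long := "This line is really quite long, long and even more long so that the text reaches the edge of the page."
	for _, part := range wrapRunes(long, 40) {
		fmt.Println(part)
	}
}
```

Splitting on runes rather than bytes keeps multi-byte characters intact, which matters for the non-Latin text discussed elsewhere in this document.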
