lruiz / markdownpapers Goto Github PK


149.0 15.0 23.0 449 KB

Markdown parser and transformer implemented in Java

Home Page: markdown.tautua.org

Shell 0.15% Java 46.78% HTML 52.96% Batchfile 0.11%

markdown java javacc maven-plugin

markdownpapers's Issues

Incorrect ending tag

When I have

        <i class="icon-download"></i> // explicitly close tag

It becomes

        <i class="icon-download"/> // the close tag is gone

Note that the <i> tag cannot be self-closed; the resulting HTML causes page rendering problems in browsers.

Would like to request a new release

Thanks for fixing some bugs in the last few months; could you please create a new release at https://github.com/lruiz/MarkdownPapers/downloads?

Embedded html tags get escaped

MarkdownPapers escapes my embedded HTML tags that carry a data-xx attribute:

<div data-bind="foreach:...">
foo
</div>

Becomes:

<p>&lt;div data-bind=&quot;foreach:...&quot;&gt;
foo
&lt;/div&gt;</p>

code block with explicit language

In GitHub Flavored Markdown I can specify the language for a code block: https://help.github.com/articles/github-flavored-markdown#syntax-highlighting
Some syntax highlighters (like google prettify) enable extended highlighting for that language.
How can I implement this using markdownpapers?
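
One possible workaround, sketched here under the assumption that MarkdownPapers emits code blocks as `<pre><code>` (as the output quoted in other issues on this page suggests), is to post-process the generated HTML and add the `prettyprint` class that google prettify looks for. The class and method names below are hypothetical, and the sketch does not carry a per-block language through; the highlighter would have to guess it.

```java
public class PrettifyPostProcessor {

    // Hypothetical helper: marks every <pre><code> block so that a
    // client-side highlighter can find it. Assumes the emitter writes
    // the opening tags exactly as "<pre><code>".
    public static String addPrettyPrintClass(String html) {
        return html.replace("<pre><code>", "<pre class=\"prettyprint\"><code>");
    }

    public static void main(String[] args) {
        String html = "<p>Intro</p>\n<pre><code>int x = 10;\n</code></pre>\n";
        System.out.println(addPrettyPrintClass(html));
    }
}
```

This is plain string replacement on the output, so it needs no change to the parser, but it would also touch any hand-written `<pre><code>` passed through as inline HTML.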

** * _ not interpreted

I used the unit test from the Gitblit project to run the following tests. Bold, italic, and underscore emphasis are the most common uses of Markdown.

Test string -> error

"*B*" -> null pointer
"* B *" -> null pointer
"**B**" -> null pointer
"** B **" -> java.text.ParseException: Encountered " "*" "* "" at line 1, column 2.

Was expecting one of:
" " ...
"\t" ...
" " ...

at com.gitblit.utils.MarkdownUtils.transformMarkdown(MarkdownUtils.java:68)
at com.gitblit.utils.MarkdownUtils.transformMarkdown(MarkdownUtils.java:44)
at com.gitblit.tests.MarkdownUtilsTest.testMarkdown(MarkdownUtilsTest.java:27)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

=========== http://daringfireball.net/projects/markdown/basics
Markdown uses asterisks and underscores to indicate spans of emphasis.

Markdown:

Some of these words *are emphasized*.
Some of these words _are emphasized also_.

Use two asterisks for **strong emphasis**.
Or, if you prefer, __use two underscores instead__.

Output:

Some of these words are emphasized. Some of these words are emphasized also.

Use two asterisks for strong emphasis. Or, if you prefer, use two underscores instead.

Strikethrough support

It would be useful to support strikethrough mode via leading and trailing double-dashes. E.g., dash-dash-foo-dash-dash becomes ~~foo~~.

Image will not parse with the format ![](http://some.url)

I am using version 1.4.3 and found that an image is not parsed when written in this format:

![](https://some.url) // will not parse

It is parsed when written in this format:

![alt](https://some.url) // can parse

I think an image with empty alt text should be parsed; this may be a bug.

The only way to specify a "title" tag on an image is to use the "id" mechanism

I'm able to specify alt-tag on an image using this:

![text](url/to/image.jpg)

But I cannot specify a title tag unless I use this mechanism:

[arbitrary_id]: path/to/image2.png "title_text"
![alt_text][arbitrary_id]

Lines that begin with a valid `something` sequence are not transformed

An issue was reported against Gitblit for not properly supporting Markdown text with lines that either begin with a something sequence or else lines that only have a something sequence. Gitblit ships with MarkdownPapers v1.3.2 from MavenCentral.

Expected Output
Gitblit/MarkdownPapers output

downstream bug report

Markdown inside span-level elements does not get parsed

If I put Markdown inside <span> or <a> tags, it does not get parsed. But according to http://daringfireball.net/projects/markdown/syntax#html:

"Unlike block-level HTML tags, Markdown syntax is processed within span-level tags."

Doesn't recognise code blocks within list items

The following doesn't render 'some code' in a code block

* title   
text

        some code

URL with "!" character produces ParseException

Markdown content with a link to a URL containing a "!" character (such as a Twitter URL) produces a ParseException.
For example, the following content produces a ParseException: [My twitter account](https://twitter.com/#!/clacote)

Emphasis content lost in generated HTML

When I use * or _ to emphasize some text, the content between the * or _ markers is lost.

See the second line of the doxia-module page. The doxia-module.md file contains the following Markdown:

Write all your docs in **src/site/markdown/** with **md** as file extension, ...

The generated web page (http://markdown.tautua.org/doxia-module.html) shows:

Write all your docs in with as file extension, ...

Rendering problems, wrong emphasis recognition

Wrong emphasis recognition

# First header

*   Invisible underscore: _, c.  

# Second header

*   Invisible underscore: myimage_widthxheight.png

Unicode characters are converted into HTML special chars

Is this expected behavior or a bug? It is not what I would expect.

Fail to parse "<!-- -->" inside paragraph

The parser only detects comments when they appear at block level. Example:

<!--
good comment
-->

inside paragraph <!-- parse
error --> .

A newline inside a bracket breaks the parser

If I write

[
linkme]

I get an IndexOutOfBoundsException:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.AbstractStringBuilder.charAt(AbstractStringBuilder.java:191)
at java.lang.StringBuilder.charAt(StringBuilder.java:72)
at org.tautua.markdownpapers.ast.Link.getText(Link.java:40)
at org.tautua.markdownpapers.ast.Link.getResource(Link.java:64)
at org.tautua.markdownpapers.HtmlEmitter.visit(HtmlEmitter.java:151)
at org.tautua.markdownpapers.ast.Link.accept(Link.java:92)
at org.tautua.markdownpapers.ast.SimpleNode.childrenAccept(SimpleNode.java:94)
at org.tautua.markdownpapers.HtmlEmitter.visit(HtmlEmitter.java:139)
at org.tautua.markdownpapers.ast.Line.accept(Line.java:31)
at org.tautua.markdownpapers.HtmlEmitter.visitChildrenAndAppendSeparator(HtmlEmitter.java:267)
at org.tautua.markdownpapers.HtmlEmitter.visit(HtmlEmitter.java:212)
at org.tautua.markdownpapers.ast.Paragraph.accept(Paragraph.java:39)
at org.tautua.markdownpapers.HtmlEmitter.visitChildrenAndAppendSeparator(HtmlEmitter.java:267)
at org.tautua.markdownpapers.HtmlEmitter.visit(HtmlEmitter.java:66)
at org.tautua.markdownpapers.ast.Document.accept(Document.java:55)
at org.tautua.markdownpapers.Markdown.transform(Markdown.java:34)

PlainTextEmitter

It would be very convenient if you created a PlainTextEmitter that produces plain text. This could sometimes be a useful feature.

Rendering issues

I was testing your engine and found multiple differences with other implementations:

Especially this one http://joncom.be/experiments/markdown-editor/edit/

The resulting html, when processed with markdownPapers is slightly different.

Here is the test document I am using

Syntax Cheatsheet

Phrase Emphasis

italic bold
italic bold

Links

Inline:
An example

Reference-style labels (titles are optional):
An example. Then, anywhere
else in the doc, define the link:

Images

Inline (titles are optional):

Reference-style:

Headers

Setext-style:

Header 1

Header 2

atx-style (closing #'s are optional):

Header 1

Header 2

Header 6

Lists

Ordered, without paragraphs:

Unordered, with paragraphs:

A list item.

With multiple paragraphs.

You can nest them:

Abacus
answer
Bubbles

bunk
bupkis
- BELITTLER
burper

Cunning

Blockquotes

Email-style angle brackets
are used for blockquotes.

And, they can be nested.

Headers in blockquotes

You can quote a list.

Etc.

Code Spans

<code> spans are delimited
by backticks.

You can include literal backticks like this.

Preformatted Code Blocks

Indent every line of a code block by at least 4 spaces or 1 tab.
This is a normal paragraph.

This is a preformatted
code block.

Horizontal Rules

Three or more dashes or asterisks:

Manual Line Breaks

End a line with two or more spaces:
Roses are red,
Violets are blue.

ParseException in case of "-" characters

Hi,

I have just realized that "-" characters are not handled correctly when they appear in path strings. The example below leads to a ParseException:

![id](path/to/my-photo.png)

The same is true of links.

An asterisk with a space causes a syntax error

If I'm writing a bulleted list and skip an entry, I get an error.

For example:

(The second line is asterisk,space)

The error is

Encountered " "\n "" at line 2, column 3. Was expecting one of: " " ... "\t" ... "&" ... "" ... "" ... "!" ... ":" ... "\"" ... "=" ... ">" ... "[" ... "(" ... "<" ... "-" ... "+" ... "]" ... ")" ... "#" ... "\'" ... "/" ... "*" ... "_" ... "" ... <CODE_SPAN> ... <NUMBERING> ... <CHAR_ENTITY_REF> ... <NUMERIC_CHAR_REF> ... <ESCAPED_CHAR> ... <CHAR_SEQUENCE> ... " " ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... " " ... " " ... " " ... " " ... <NUMERIC_CHAR_REF> ... <CHAR_ENTITY_REF> ... <CODE_SPAN> ... "[" ... "!" ... "<" ... "_" ... "*" ... " " ... "<" ... <CHAR_SEQUENCE> ... " " ... "&" ... "\\" ... "" ... "!" ... ":" ... "-->" ... "<!--" ... """ ... "=" ... <ESCAPED_CHAR> ... ">" ... "[" ... "(" ... "<" ... "-" ... ... "+" ... "]" ... ")" ... "#" ... "'" ... "/" ... "" ... "\t" ... "" ... "[" ... "!" ... "<" ... "" ... "" ... " " ... "<" ...

Incorrect generation of HTML when entities are positioned inside emphasis underscores

This test case shows incorrect HTML generation when using entities positioned inside emphasis underscores.

__Bold&trade;__

should appear as:

Bold™

import java.io.StringReader;
import java.io.StringWriter;

import org.junit.Test;
import org.tautua.markdownpapers.Markdown;
import org.tautua.markdownpapers.parser.ParseException;

import static org.junit.Assert.assertEquals;

public class MarkdownTest
{

    @Test
    public void testIncorrectEntityTransform()
    {
        // incorrectly generates:
        //            <p>__Bold&trade;__</p>
        assertEquals("<p><strong>Bold&trade;</strong></p>\n", transform("__Bold&trade;__"));
    }

    @Test
    public void testNoEntityTransform()
    {
        assertEquals("<p><strong>Bold</strong></p>\n", transform("__Bold__"));
    }

    @Test
    public void testEntityNotWrappedTransform()
    {
        assertEquals("<p><strong>Bold</strong>&trade;<em>Italics</em></p>\n", transform("__Bold__&trade;_Italics_"));
    }

    private String transform(String in)
    {
        StringWriter out = new StringWriter();
        Markdown md = new Markdown();

        try
        {
            md.transform(new StringReader(in), out);
        }
        catch (ParseException e)
        {
            throw new RuntimeException("Error parsing Markdown", e);
        }

        return out.toString();        
    }

}

Using markdownpapers-core 1.2.7

Embedding code snippets does not work

Embedding code snippets does not work with markdownpapers version 1.2.3.

The following code

<code>
String s1 = new String("");
String s2 = new String("");
</code>

Is displayed as

String s = new String(""); String s1 = new String("");

I think the two statements should be displayed on separate lines.

Parser fails when it encounters a lone "!"

The parser fails when it encounters a "!" on its own, for example:

  Hello World !

line [1] Encountered " "!" "! "" at line 1, column 13.
[ERROR] Was expecting one of:
[ERROR] " " ...
[ERROR] "\t" ...
[ERROR] "&" ...
[ERROR] "`" ...

Parsing error

ParseException occurred: Encountered " "-" "- "" at line 75, column 19. Was expecting one of: " " ... "\t" ... "&" ... "" ... "" ... "\"" ... ">" ... "(" ... "<" ... ")" ... "\'" ... "/" ... ... ... ... ... " " ... "&" ... ... ... ">" ... "(" ... "<" ... ")" ... "\t" ... "/" ... "\\" ... "" ... "'" ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... ... "<" ... " " ... "<" ... ... " " ... "&" ... "" ... "" ... "\"" ... ... ">" ... "(" ... "<" ... ... " " ... "&" ... ... ... ">" ... "(" ... "<" ... ")" ... "\t" ... "/" ... "\\" ... "" ... "'" ... """ ...

Line 75 is the opening div; the parser actually fails on custom-class.

<div class="custom-class" markdown="1">
This is a div wrapping some Markdown plus.  Without the DIV attribute, it ignores the 
block. 
</div>

Is there a recommended way to prevent JS injection on user generated content?

Hi,

We are using MarkdownPapers through the Markdown Play! module for user-generated content: users can fill in their profiles with Markdown-enabled content.

MarkdownPapers does not prevent JavaScript injection (and it might not be its job to do it).
For example, the following Markdown content produces working HTML and JavaScript:

<script language="javascript">function yes(){document.location.href="https://github.com/lruiz/MarkdownPapers";}</script>
<div onmouseover="yes()">Hover with your mouse</div>

Is there a recommended workaround to prevent JavaScript injection?
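
One possible stopgap, sketched here as an assumption rather than a recommended or library-provided mechanism, is to escape every `<` in untrusted input before handing it to the transformer, so that no raw tag can reach the output. The class and method names are hypothetical. The trade-offs are real: legitimate inline HTML and `<http://...>` autolinks are disabled too, a `<` inside a code block may end up displayed as a literal `&lt;`, and `javascript:` URLs inside Markdown links are not covered. Running the generated HTML through a dedicated allowlist-based sanitizer is likely the more robust route.

```java
public class RawHtmlEscaper {

    // Hypothetical pre-processing step for untrusted Markdown:
    // every '<' becomes "&lt;", so <script> and on* attributes can no
    // longer form a tag. '>' is left alone so blockquotes keep working.
    public static String escapeRawHtml(String markdown) {
        return markdown.replace("<", "&lt;");
    }

    public static void main(String[] args) {
        String untrusted = "<div onmouseover=\"yes()\">Hover</div>\n> a quote";
        System.out.println(escapeRawHtml(untrusted));
    }
}
```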

support markdown plus table

As shown in http://warpedvisions.org/projects/markdown-cheat-sheet/, a table could look something like this:

| Header | Header | Right  |
| ------ | ------ | -----: |
|  Cell  |  Cell  |   $10  |
|  Cell  |  Cell  |   $20  |

Content "<3 Play!" produces java.text.ParseException

User generated content "<3 Play!" produces java.text.ParseException : java.text.ParseException: Encountered " "!" "! "" at line 1, column 8. Was expecting one of: "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ...

Merge site generation into main project

Merge site generation into the main project. This requires the doxia-module to be merged too.

Code block within a list item

According to Markdown syntax, it should be possible to include code blocks within a list item as follows:

This is a list:

* first item, with some code:

        int x = 10;

* second item

Thank you

(Note that the code block is indented twice with 8 spaces as required)

The output from this parser is:

<p>This is a list:</p>
<ul>
<li>
<p>first item, with some code:</p>
<p>        int x = 10;</p>
</li>
<li><p>second item</p></li>
</ul>
<p>Thank you</p>

However, the expected output should be:

<p>This is a list:</p>
<ul>
<li>
<p>first item, with some code:</p>
<pre><code>int x = 10;</code></pre>
</li>
<li><p>second item</p></li>
</ul>
<p>Thank you</p>

Tested on MarkdownPapers v1.2.3 and v1.3.3, Java 1.6.0.41

Parser error in a link's title when "-" appears

The following Markdown will not parse:

[Heise-Online](http://www.heise.de "Heise-Online")

Error message

org.tautua.markdownpapers.parser.ParseException: Encountered " "-" "- "" at line 1, column 42.

Reported by Andreas Schouten

Emphasized line in a list is not highlighted correctly

As I understand Markdown, the following code

- Hello

  *World*

Should create a list with 1 item and 2 paragraphs, where the second one is emphasized.

But the following test does not work:

import java.io.StringReader;
import java.io.StringWriter;
import static org.junit.Assert.*;
import org.junit.Test;
import org.tautua.markdownpapers.Markdown;
import org.tautua.markdownpapers.parser.ParseException;

public class MarkdownTest {

    @Test
    public void emphasisAroundElementInAList() {
        String strong = transform("- Hello\n\n  **World**");
        String     em = transform("- Hello\n\n  *World*");

        assertEquals(strong.replace("strong>", "em>"), em);
    }

    private String transform(String in) {
        StringWriter out = new StringWriter();
        Markdown md = new Markdown();

        try {
            md.transform(new StringReader(in), out);
        } catch (ParseException e) {
            throw new RuntimeException("Error parsing Markdown", e);
        }

        return out.toString();
    }
}

Can't find org.tautua.markdownpapers.parser.Parser

I can't find `org.tautua.markdownpapers.parser.Parser`

Any way to display syntax highlighting in code blocks?

GitHub's Markdown supports GFM, which lets you specify the language of a code block for syntax highlighting. Is there any way to do something similar in MarkdownPapers?

Merge doxia-module into the project

a number in image name leads to runtime exception in parsing

Try this syntax:

![my image](/test/image-1.png)

You will get a runtime exception during parsing: "1." is interpreted as the start of a list.

Provide a main class so it can be called in command line

Please consider providing a main class so that it can be invoked from the command line.

java -jar markdownpapers.jar input.md output.html
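
A minimal sketch of what such an entry point might look like follows. `MarkdownCli` and its `Transformer` interface are hypothetical names introduced only so the sketch compiles without the library on the classpath. With MarkdownPapers available, the transformer would presumably wrap the `new Markdown().transform(reader, writer)` call that appears in the test code of other issues on this page; a pass-through copy stands in for it here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MarkdownCli {

    /** Hypothetical adapter around whatever performs the conversion. */
    public interface Transformer {
        void transform(Reader in, Writer out) throws Exception;
    }

    /** Opens both files as UTF-8 and streams the input through the transformer. */
    public static void run(String inputPath, String outputPath, Transformer transformer)
            throws Exception {
        try (Reader in = Files.newBufferedReader(Paths.get(inputPath), StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(Paths.get(outputPath), StandardCharsets.UTF_8)) {
            transformer.transform(in, out);
        }
    }

    /** Stand-in transformer: copies the input unchanged. */
    public static void copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java -jar markdownpapers.jar input.md output.html");
            System.exit(1);
        }
        // Assumption: with the library present, replace MarkdownCli::copy with
        //   (in, out) -> new org.tautua.markdownpapers.Markdown().transform(in, out)
        run(args[0], args[1], MarkdownCli::copy);
    }
}
```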

<img src="http://..."> tag causes parser error

Hi,

I tried some markdown processors and all parsed img-tags properly. But somehow, I got a Parser exception because the parser wouldn't accept the ":" from "http://"

example:
(edit: uhm ... this editor directly replaces the <img ... with the image, so it obviously is valid in Markdown) ... but you certainly understand what I mean ...

caused:
Exception in thread "main" org.tautua.markdownpapers.parser.ParseException: Encountered " ":" ": "" at line 32, column 14.
Was expecting one of:
......

Thx in advance for helping me here

All the best,
Thomas

Regressions in 1.2 compared to 1.1?

Hi Larry,

I use your library within my Git hosting project, Gitblit. I'm trying to keep updated on dependencies but the 1.2.x series is causing me trouble. I can not pinpoint the exact trouble as there seems to be more than one problem. I generate my project website from hybrid Markdown/Html sources which are kept alongside my project sources right here on GitHub. The project website built with the 1.1.1 release from the Markdown/Html sources is available here. I recognize that this is not a well defined issue, but maybe if you took a peek at my Markdown sources (docs folder) the flaw (either in my documents or in the library) would stand out.

Thanks,
James

Line breaks are not recognized

Line breaks are not recognized.

Javascript Injection

It's possible to inject malicious javascript code into markdown text.

You should disallow the script tag and all event attributes (onXXXX).

Incorrect parsing of HTML entities representing numerical entities

In jjtree/Markdown.jjt, the expression that checks for numeric character references appears to be missing the '#' character that follows the ampersand. So while this expression searches for e.g. "&1234;", according to e.g. http://en.wikipedia.org/wiki/Numeric_character_reference it should rather look for "&#1234;". The same holds for hexadecimal numeric entities, which start with "&#x" and not just "&x" as implemented here.

MarkdownPapers/core/src/main/jjtree/Markdown.jjt

Line 301 in 306f95b

    
           | < NUMERIC_CHAR_REF : "&" ( ( ["0"-"9"] ){1,4} | "x" ( ["0"-"9", "a"-"f", "A"-"F"] ){1,4} ) ";" >
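
To make the difference concrete, the token above can be transcribed into a Java regular expression and compared with a variant that requires the '#'. This is only an illustration of the matching behaviour, not the JavaCC grammar itself; the class name and the `PROPOSED` pattern are assumptions, and the `{1,4}` length limits are copied unchanged from the quoted line.

```java
import java.util.regex.Pattern;

public class NumericCharRefCheck {

    // Regex transcription of the token as quoted above (no '#').
    public static final Pattern CURRENT =
            Pattern.compile("&([0-9]{1,4}|x[0-9a-fA-F]{1,4});");

    // Assumed fix: '#' must follow the ampersand.
    public static final Pattern PROPOSED =
            Pattern.compile("&#([0-9]{1,4}|x[0-9a-fA-F]{1,4});");

    public static void main(String[] args) {
        String[] samples = { "&1234;", "&#1234;", "&x4D2;", "&#x4D2;" };
        for (String s : samples) {
            System.out.println(s
                    + " current=" + CURRENT.matcher(s).matches()
                    + " proposed=" + PROPOSED.matcher(s).matches());
        }
    }
}
```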

Quoted entities in code blocks are quoted twice

The following markdown text:

Some text &quot;foo&quot; `hehe` &lt; 3

    some code `with` &quot;stuff&quot; &lt; 2

boo

Results in the following HTML:

<p>Some text &quot;foo&quot; <code>hehe</code> &lt; 3</p>

<pre><code>some code `with` &amp;quot;stuff&amp;quot; &amp;lt; 2
</code></pre>

<p>boo</p>

Note that the &quot; entity inside the code block is escaped twice.

Rel attribute in links

When rendering Markdown with this library, generated links always take the form <a href="..">. I want to emit <a rel="nofollow" href="..."> instead, to avoid passing page rank through links in my user-generated content.

Maybe you could add a constructor flag to enable this.

Thanks.
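
Until such a flag exists, one workaround is to post-process the generated HTML. The sketch below assumes, as the report states, that every generated link starts with exactly `<a href=`; the class and method names are hypothetical.

```java
public class NofollowPostProcessor {

    // Hypothetical helper: inserts rel="nofollow" into every anchor that
    // begins with exactly "<a href=". Anchors written differently
    // (for example with other attributes first) are left untouched.
    public static String addNofollow(String html) {
        return html.replace("<a href=", "<a rel=\"nofollow\" href=");
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com\">example</a></p>";
        System.out.println(addNofollow(html));
    }
}
```

Because this works on the output string, it also rewrites hand-written inline `<a href=...>` anchors that pass through the transformer unchanged.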

Underscores inside links break the parser

In markdownpapers, the following links should work (i.e. the URLs should preserve underscores), but do not:

This is http://example.com/some_underscored_url

This is a link reference

images cannot be inserted

Hi!

While using the markdownpapers-doxia-module with the maven site plugin, some problems were found. The following code snippets led to double img tags in the html file. (<img<img)

![alt](path/to/image1.png "title")

[id]: path/to/image2.png "title"
![alt][id]

As a result, the images did not show up. In addition, this line resulted in a NullPointerException.

![alt](path/to/image.png)

anchor links don't close

<a id='anchor-point> results in all content after the link being wrapped inside the link

Two problems with parsing (found in gitblit)

Please see

https://code.google.com/p/gitblit/issues/detail?id=326

and

https://code.google.com/p/gitblit/issues/detail?id=327

Not 100% sure if this is what markdown specs say should happen, but other parsers don't appear to have the problem

lt and gt entities in tables

Release 1.2.3 doesn't like lt and gt entities within a td.

assertEquals("<table><tr><td>&lt;test&gt;</td></tr></table>", MarkdownUtils.transformMarkdown("<table><tr><td>&lt;test&gt;</td></tr></table>"));

Links don't work with digits in URL

(using version 1.3.1 from Maven)
This markup:

Try a [reference][here]. 
[here]: http://www.example-0.com/

gives

Try a reference.

0.com/

Also inline links give a parse error,

org.tautua.markdownpapers.parser.ParseException: Encountered " <NUMBERING> "0. ""

It seems to be punctuation followed by a number; www.example1.com is OK, www.example-1.com is not.
