lruiz / markdownpapers
Markdown parser and transformer implemented in Java
Home Page: markdown.tautua.org
When I have
<i class="icon-download"></i> // explicitly close tag
It becomes
<i class="icon-download"/> // the close tag is gone
Note that the <i> tag cannot be self-closed; the resulting HTML causes page rendering problems in browsers.
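One possible workaround for the self-closed tag problem is to post-process the generated HTML and expand self-closed elements that are not HTML void elements. The following is only a minimal sketch under stated assumptions: the class `SelfClosingFixer` and its `expand` method are hypothetical names, not part of MarkdownPapers, and a regular expression is adequate only for simple, well-formed output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MarkdownPapers.
final class SelfClosingFixer {
    // HTML void elements, which may legitimately appear without a close tag.
    private static final Set<String> VOID = new HashSet<>(Arrays.asList(
        "area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"));

    // Matches <name attrs/>; group 1 = tag name, group 2 = attributes (may be empty).
    private static final Pattern SELF_CLOSED =
        Pattern.compile("<([A-Za-z][A-Za-z0-9]*)((?:\\s[^<>]*?)?)\\s*/>");

    static String expand(String html) {
        Matcher m = SELF_CLOSED.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement = VOID.contains(name.toLowerCase())
                ? m.group()                                      // leave <br/>, <img .../> alone
                : "<" + name + m.group(2) + "></" + name + ">";  // add an explicit close tag
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Applied to the output above, `SelfClosingFixer.expand("<i class=\"icon-download\"/>")` restores the explicit close tag while leaving void elements such as `<br/>` untouched.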
Thanks for fixing some bugs in the last few months; could you please create a new release at https://github.com/lruiz/MarkdownPapers/downloads?
MarkdownPapers escapes my embedded HTML tags that have a data-xx attribute:
<div data-bind="foreach:...">
foo
</div>
Becomes:
<p><div data-bind="foreach:...">
foo
</div></p>
In GitHub Flavored Markdown I can specify the language for a code block: https://help.github.com/articles/github-flavored-markdown#syntax-highlighting
Some syntax highlighters (like Google Prettify) enable extended highlighting for the specified language.
How can I implement this using MarkdownPapers?
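One way to approximate this without parser support is to pre-process the Markdown source and turn fenced blocks into raw HTML before handing the text to the parser. The sketch below rests on several assumptions: `FencedCodePreprocessor` is a hypothetical helper, the `language-` class prefix is just one common highlighter convention, and whether MarkdownPapers passes the resulting block-level HTML through unchanged would need to be tested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processor, not part of MarkdownPapers.
final class FencedCodePreprocessor {
    // A fence opened by ```lang at a line start and closed by ``` on its own line.
    // Group 1 = language name (may be empty), group 2 = code body.
    private static final Pattern FENCE = Pattern.compile(
        "(?m)^```([A-Za-z0-9_+-]*)[ \\t]*\\n(.*?)\\n```[ \\t]*$", Pattern.DOTALL);

    static String toHtmlBlocks(String markdown) {
        Matcher m = FENCE.matcher(markdown);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String lang = m.group(1);
            // Assumed convention: highlighters pick up class="language-xxx".
            String cls = lang.isEmpty() ? "" : " class=\"language-" + lang + "\"";
            String html = "<pre><code" + cls + ">" + escape(m.group(2)) + "</code></pre>";
            m.appendReplacement(out, Matcher.quoteReplacement(html));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Escape the characters that would otherwise be read as HTML markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Text outside the fences is left as it was, so the result can still be passed to the Markdown transformer.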
I used the unit test from the Gitblit project to run the following tests. Bold, italic, and underscore emphasis are the most common uses of Markdown.
Test string -> error
"*B*" -> null pointer
"* B *" -> null pointer
"**B**" -> null pointer
"** B **" -> java.text.ParseException: Encountered " "*" "* "" at line 1, column 2.
Was expecting one of:
" " ...
"\t" ...
" " ...
at com.gitblit.utils.MarkdownUtils.transformMarkdown(MarkdownUtils.java:68)
at com.gitblit.utils.MarkdownUtils.transformMarkdown(MarkdownUtils.java:44)
at com.gitblit.tests.MarkdownUtilsTest.testMarkdown(MarkdownUtilsTest.java:27)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
=========== http://daringfireball.net/projects/markdown/basics
Markdown uses asterisks and underscores to indicate spans of emphasis.
Markdown:
Some of these words *are emphasized*.
Some of these words _are emphasized also_.
Use two asterisks for **strong emphasis**.
Or, if you prefer, use two underscores __instead__.
Output:
Some of these words are emphasized. Some of these words are emphasized also.
Use two asterisks for strong emphasis. Or, if you prefer, use two underscores instead.
It would be useful to support strikethrough mode via leading and trailing double-dashes. E.g., dash-dash-foo-dash-dash becomes foo.
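Until such a mode exists, the requested behaviour can be approximated by rewriting double-dash spans into `<del>` elements before parsing, since Markdown allows inline HTML. This is a rough sketch only: `StrikethroughPreprocessor` is a hypothetical helper, and it does not skip code spans or code blocks, which a real implementation would have to handle.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processor, not part of MarkdownPapers.
final class StrikethroughPreprocessor {
    // --text-- on a single line, where text starts and ends with a character
    // that is neither whitespace nor a dash, and the markers are not glued
    // to a word or to further dashes (so "---" rules are left alone).
    private static final Pattern STRIKE =
        Pattern.compile("(?<![-\\w])--(?=[^\\s-])(.+?)(?<=[^\\s-])--(?![-\\w])");

    static String apply(String markdown) {
        return STRIKE.matcher(markdown).replaceAll("<del>$1</del>");
    }
}
```

For example, `StrikethroughPreprocessor.apply("dash --foo-- bar")` yields `dash <del>foo</del> bar`, while a horizontal rule written as `---` is left unchanged.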
I am using version 1.4.3 and found that an image with empty alt text does not parse:
![](https://some.url) // will not parse
It parses when alt text is present:
![alt](https://some.url) // can parse
I think an image with empty alt text should parse; this may be a bug.
I'm able to specify alt-tag on an image using this:
![text](url/to/image.jpg)
But I cannot specify a title unless I use this mechanism:
[arbitrary_id]: path/to/image2.png "title_text" ![alt_text][arbitrary_id]
An issue was reported against Gitblit for not properly supporting Markdown text with lines that either begin with a something
sequence or else lines that only have a something
sequence. Gitblit ships with MarkdownPapers v1.3.2 from MavenCentral.
If I put Markdown inside <span> or <a> tags, it does not get parsed. But according to http://daringfireball.net/projects/markdown/syntax#html, "unlike block-level HTML tags, Markdown syntax is processed within span-level tags."
The following doesn't render 'some code' in a code block
* title
text
some code
Markdown content with a link to a URL containing the "!" character (like a Twitter URL) produces a ParseException.
For example, the following content produces a ParseException: [My twitter account](https://twitter.com/#!/clacote)
When I use * or _ to emphasize some text, the content between the * or _ characters is lost.
See the second line of the doxia-module page; the doxia-module.md file contains the following Markdown:
Write all your docs in **src/site/markdown/** with **md** as file extension, ...
The generated web page (http://markdown.tautua.org/doxia-module.html) shows:
Write all your docs in with as file extension, ...
Wrong emphasis recognition
# First header
* Invisible underscore: _, c.
# Second header
* Invisible underscore: myimage_widthxheight.png
Is this expected behavior or a bug? It looks like something I wouldn't expect.
The parser only detects comments when they are found at block level, for example:
<!--
good comment
-->
inside paragraph <!-- parse
error --> .
If I write
[
linkme]
I get an IndexOutOfBoundsException:
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.AbstractStringBuilder.charAt(AbstractStringBuilder.java:191)
at java.lang.StringBuilder.charAt(StringBuilder.java:72)
at org.tautua.markdownpapers.ast.Link.getText(Link.java:40)
at org.tautua.markdownpapers.ast.Link.getResource(Link.java:64)
at org.tautua.markdownpapers.HtmlEmitter.visit(HtmlEmitter.java:151)
at org.tautua.markdownpapers.ast.Link.accept(Link.java:92)
at org.tautua.markdownpapers.ast.SimpleNode.childrenAccept(SimpleNode.java:94)
at org.tautua.markdownpapers.HtmlEmitter.visit(HtmlEmitter.java:139)
at org.tautua.markdownpapers.ast.Line.accept(Line.java:31)
at org.tautua.markdownpapers.HtmlEmitter.visitChildrenAndAppendSeparator(HtmlEmitter.java:267)
at org.tautua.markdownpapers.HtmlEmitter.visit(HtmlEmitter.java:212)
at org.tautua.markdownpapers.ast.Paragraph.accept(Paragraph.java:39)
at org.tautua.markdownpapers.HtmlEmitter.visitChildrenAndAppendSeparator(HtmlEmitter.java:267)
at org.tautua.markdownpapers.HtmlEmitter.visit(HtmlEmitter.java:66)
at org.tautua.markdownpapers.ast.Document.accept(Document.java:55)
at org.tautua.markdownpapers.Markdown.transform(Markdown.java:34)
It would be very convenient if you created a PlainTextEmitter, which would produce plain text. Sometimes this could be a useful feature.
I was testing your engine and found multiple differences from other implementations,
especially this one: http://joncom.be/experiments/markdown-editor/edit/
The resulting HTML, when processed with MarkdownPapers, is slightly different.
Here is the test document I am using
italic bold
italic bold
Inline:
An example
Reference-style labels (titles are optional):
An example. Then, anywhere
else in the doc, define the link:
Setext-style:
atx-style (closing #'s are optional):
Ordered, without paragraphs:
Unordered, with paragraphs:
With multiple paragraphs.
You can nest them:
Email-style angle brackets
are used for blockquotes.
And, they can be nested.
Headers in blockquotes
- You can quote a list.
- Etc.
<code>
spans are delimited
by backticks.
You can include literal backticks like this
.
Indent every line of a code block by at least 4 spaces or 1 tab.
This is a normal paragraph.
This is a preformatted
code block.
Three or more dashes or asterisks:
End a line with two or more spaces:
Roses are red,
Violets are blue.
Hi,
I have just realized that "-" characters are not handled correctly if they are in path strings. The example below leads to a ParseException:
![id](path/to/my-photo.png)
The same is true for links.
If I'm writing a bulleted list and skip an entry, I get an error.
For example:
(The second line is asterisk,space)
The error is
Encountered " "\n "" at line 2, column 3. Was expecting one of: " " ... "\t" ... "&" ... "" ... "" ... "!" ... ":" ... "\"" ... "=" ... ">" ... "[" ... "(" ... "<" ... "-" ... "+" ... "]" ... ")" ... "#" ... "\'" ... "/" ... "*" ... "_" ... "<!--" ... "-->" ... <CODE_SPAN> ... <NUMBERING> ... <CHAR_ENTITY_REF> ... <NUMERIC_CHAR_REF> ... <ESCAPED_CHAR> ... <CHAR_SEQUENCE> ... " " ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... "*" ... " " ... " " ... " " ... " " ... <NUMERIC_CHAR_REF> ... <CHAR_ENTITY_REF> ... <CODE_SPAN> ... "[" ... "!" ... "<" ... "_" ... "*" ... " " ... "<" ... <CHAR_SEQUENCE> ... " " ... "&" ... "\\" ... "
" ... "!" ... ":" ... "-->" ... "<!--" ... """ ... "=" ... <ESCAPED_CHAR> ... ">" ... "[" ... "(" ... "<" ... "-" ... ... "+" ... "]" ... ")" ... "#" ... "'" ... "/" ... "" ... "\t" ... "" ... "[" ... "!" ... "<" ... "" ... "" ... " " ... "<" ...
This test case shows incorrect HTML generation when using entities positioned inside emphasis underscores.
__Bold™__
should appear as:
Bold™
import java.io.StringReader;
import java.io.StringWriter;
import org.junit.Test;
import org.tautua.markdownpapers.Markdown;
import org.tautua.markdownpapers.parser.ParseException;
import static org.junit.Assert.assertEquals;

public class MarkdownTest
{
    @Test
    public void testIncorrectEntityTransform()
    {
        // incorrectly generates:
        // <p>__Bold™__</p>
        assertEquals("<p><strong>Bold™</strong></p>\n", transform("__Bold™__"));
    }

    @Test
    public void testNoEntityTransform()
    {
        assertEquals("<p><strong>Bold</strong></p>\n", transform("__Bold__"));
    }

    @Test
    public void testEntityNotWrappedTransform()
    {
        assertEquals("<p><strong>Bold</strong>™<em>Italics</em></p>\n", transform("__Bold__™_Italics_"));
    }

    // Runs the Markdown transformer on the input and returns the generated HTML.
    private String transform(String in)
    {
        StringWriter out = new StringWriter();
        Markdown md = new Markdown();
        try
        {
            md.transform(new StringReader(in), out);
        }
        catch (ParseException e)
        {
            throw new RuntimeException("Error parsing Markdown", e);
        }
        return out.toString();
    }
}
Using markdownpapers-core 1.2.7
Embedding code snippets does not work with markdownpapers version 1.2.3.
The following code
<code>
String s1 = new String("");
String s2 = new String("");
</code>
is displayed as
String s1 = new String(""); String s2 = new String("");
whereas I think the two statements should be displayed on separate lines.
The parser fails when it encounters a "!" on its own, for example:
Hello World !
line [1] Encountered " "!" "! "" at line 1, column 13.
[ERROR] Was expecting one of:
[ERROR] " " ...
[ERROR] "\t" ...
[ERROR] "&" ...
[ERROR] "`" ...
ParseException occured : Encountered " "-" "- "" at line 75, column 19. Was expecting one of: " " ... "\t" ... "&" ... "" ... "" ... "\"" ... ">" ... "(" ... "<" ... ")" ... "\'" ... "/" ... ... ... ... ... " " ... "&" ... ... ... ">" ... "(" ... "<" ... ")" ... "\t" ... "/" ... "\\" ... "
" ... "'" ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... """ ... ... "<" ... " " ... "<" ... ... " " ... "&" ... "" ... "" ... "\"" ... ... ">" ... "(" ... "<" ... ... " " ... "&" ... ... ... ">" ... "(" ... "<" ... ")" ... "\t" ... "/" ... "\\" ... "
" ... "'" ... """ ...
Line 75 is the opening div; it actually fails on custom-class:
<div class="custom-class" markdown="1">
This is a div wrapping some Markdown plus. Without the DIV attribute, it ignores the
block.
</div>
Hi,
We are using MarkdownPapers through the Markdown Play! module for user-generated content: users are able to fill in their profile with Markdown-enabled content.
MarkdownPapers does not prevent JavaScript injection (and it might not be its job to do it).
For example, the following Markdown content produces working HTML and JavaScript:
<script language="javascript">function yes(){document.location.href="https://github.com/lruiz/MarkdownPapers";}</script>
<div onmouseover="yes()">Hover with your mouse</div>
Is there a recommended workaround to prevent JavaScript injection?
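One possible stopgap is to scrub the generated HTML before it is served. The sketch below removes `<script>` elements and inline `on...` event-handler attributes with regular expressions. It is an illustration under stated assumptions, not a complete defence: `HtmlScrubber` is a hypothetical helper, regex-based filtering is easy to bypass (for example via `javascript:` URLs), and the attribute pattern can also match plain text that merely looks like an attribute. For real user-generated content, a dedicated HTML sanitizer library with an allow-list policy is the safer choice.

```java
import java.util.regex.Pattern;

// Hypothetical post-processor, not part of MarkdownPapers and not a full sanitizer.
final class HtmlScrubber {
    // A whole <script>...</script> element, case-insensitive, spanning lines.
    private static final Pattern SCRIPT = Pattern.compile(
        "<script\\b[^>]*>.*?</script\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // An inline event-handler attribute such as onmouseover="..." or onclick=foo.
    private static final Pattern ON_ATTR = Pattern.compile(
        "\\s+on\\w+\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s>]+)", Pattern.CASE_INSENSITIVE);

    static String scrub(String html) {
        String withoutScripts = SCRIPT.matcher(html).replaceAll("");
        return ON_ATTR.matcher(withoutScripts).replaceAll("");
    }
}
```

With the example above, the script element is dropped entirely and the div is reduced to `<div>Hover with your mouse</div>`.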
As shown in http://warpedvisions.org/projects/markdown-cheat-sheet/, a table could look something like:
| Header | Header | Right |
| ------ | ------ | -----: |
| Cell | Cell | $10 |
| Cell | Cell | $20 |
User generated content "<3 Play!" produces java.text.ParseException : java.text.ParseException: Encountered " "!" "! "" at line 1, column 8. Was expecting one of: "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ... "=" ...
Merge site generation into the main project; this requires doxia-module to be merged too.
According to Markdown syntax, it should be possible to include code blocks within a list item as follows:
This is a list:
* first item, with some code:
int x = 10;
* second item
Thank you
(Note that the code block is indented twice with 8 spaces as required)
The output from this parser is:
<p>This is a list:</p>
<ul>
<li>
<p>first item, with some code:</p>
<p> int x = 10;</p>
</li>
<li><p>second item</p></li>
</ul>
<p>Thank you</p>
However, the expected output should be:
<p>This is a list:</p>
<ul>
<li>
<p>first item, with some code:</p>
<pre><code>int x = 10;</code></pre>
</li>
<li><p>second item</p></li>
</ul>
<p>Thank you</p>
Tested on MarkdownPapers v1.2.3 and v1.3.3, Java 1.6.0.41
The following md will not parse
[Heise-Online](http://www.heise.de "Heise-Online")
Error message
org.tautua.markdownpapers.parser.ParseException: Encountered " "-" "- "" at line 1, column 42.
Reported by Andreas Schouten
As I understand Markdown, the following code
- Hello
*World*
Should create a list with 1 item and 2 paragraphs, where the second one is emphasized.
But the following test does not work:
import java.io.StringReader;
import java.io.StringWriter;
import static org.junit.Assert.*;
import org.junit.Test;
import org.tautua.markdownpapers.Markdown;
import org.tautua.markdownpapers.parser.ParseException;

public class MarkdownTest {
    @Test
    public void emphasisAroundElementInAList() {
        String strong = transform("- Hello\n\n **World**");
        String em = transform("- Hello\n\n *World*");
        // Both inputs should produce the same structure, differing only in the tag name.
        assertEquals(strong.replace("strong>", "em>"), em);
    }

    private String transform(String in) {
        StringWriter out = new StringWriter();
        Markdown md = new Markdown();
        try {
            md.transform(new StringReader(in), out);
        } catch (ParseException e) {
            throw new RuntimeException("Error parsing Markdown", e);
        }
        return out.toString();
    }
}
I can't find org.tautua.markdownpapers.parser.Parser
GitHub's Markdown supports GFM, which allows you to specify the language of a code block for syntax highlighting. Is there any way to do something similar in MarkdownPapers?
Merge doxia-module into the project
Try this syntax:
![my image](/test/image-1.png)
You will get a runtime exception during parsing: "1." is interpreted as the start of a list.
Please consider providing a main class so that one can invoke it from the command line:
java -jar markdownpapers.jar input.md output.html
Hi,
I tried some markdown processors and all parsed img-tags properly. But somehow, I got a Parser exception because the parser wouldn't accept the ":" from "http://"
example:
(edit: uhm ... this editor directly replaces the <img ... with the image, so it obviously is valid in Markdown) ... but you certainly understand what I mean ...
caused:
Exception in thread "main" org.tautua.markdownpapers.parser.ParseException: Encountered " ":" ": "" at line 32, column 14.
Was expecting one of:
......
Thx in advance for helping me here
All the best,
Thomas
Hi Larry,
I use your library within my Git hosting project, Gitblit. I'm trying to keep updated on dependencies but the 1.2.x series is causing me trouble. I can not pinpoint the exact trouble as there seems to be more than one problem. I generate my project website from hybrid Markdown/Html sources which are kept alongside my project sources right here on GitHub. The project website built with the 1.1.1 release from the Markdown/Html sources is available here. I recognize that this is not a well defined issue, but maybe if you took a peek at my Markdown sources (docs folder) the flaw (either in my documents or in the library) would stand out.
Thanks,
James
Line breaks are not recognized.
It's possible to inject malicious JavaScript code into Markdown text.
You should prevent the use of the script tag and all event attributes (onXXXX).
<script language="javascript">
function yes(){
document.location.href="http://www.mysite.com";
}
</script>
<div onmouseover="yes()">
Bla Bla
</div>
In jjtree/Markdown.jjt, the expression that checks for numeric character references appears to be missing the '#' character that follows the ampersand. So while this expression searches for e.g. "&1234;", according to e.g. http://en.wikipedia.org/wiki/Numeric_character_reference it should rather look for "&#1234;". The same holds for hexadecimal numeric references, which start with "&#x" and not just "&x" as implemented here.
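To make the expected syntax concrete, here is the rule written as a Java regular expression. This is only an illustration of the reference format, not the grammar's actual token definition; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustration only: the parser grammar itself is not written in java.util.regex.
final class NumericCharRef {
    // Decimal form: &#1234;   Hexadecimal form: &#x4D2; or &#X4D2;
    static final Pattern NUMERIC_CHAR_REF =
        Pattern.compile("&#(?:[0-9]+|[xX][0-9A-Fa-f]+);");

    static boolean isNumericCharRef(String s) {
        return NUMERIC_CHAR_REF.matcher(s).matches();
    }
}
```

With this pattern, "&#1234;" and "&#x4D2;" are accepted, while the forms without '#' ("&1234;" and "&x4D2;") are rejected.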
The following markdown text:
Some text "foo" `hehe` < 3
some code `with` "stuff" < 2
boo
Results in the following HTML:
<p>Some text "foo" <code>hehe</code> < 3</p>
<pre><code>some code `with` &quot;stuff&quot; &lt; 2
</code></pre>
<p>boo</p>
Note the &quot;, which is the result of double escaping.
When rendering Markdown with this library, generated links always have the form <a href="..">, and I want to emit <a rel="nofollow" href="..."> instead to eliminate direct page rank referrals from my user-generated content.
Maybe you could add a flag to the constructor to enable this.
Thanks.
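In the absence of such a flag, the attribute can be added by post-processing the generated HTML. The sketch below is one way to do it, assuming simple anchor tags as output; `NoFollowPostProcessor` is a hypothetical helper, not part of MarkdownPapers, and anchors that already carry a rel attribute are deliberately left alone.

```java
import java.util.regex.Pattern;

// Hypothetical post-processor, not part of MarkdownPapers.
final class NoFollowPostProcessor {
    // An opening <a ...> tag that does not yet contain a rel attribute.
    // Group 1 = the existing attributes, kept as they are.
    private static final Pattern ANCHOR_WITHOUT_REL = Pattern.compile(
        "<a\\b(?![^>]*\\brel\\s*=)([^>]*)>", Pattern.CASE_INSENSITIVE);

    static String addNoFollow(String html) {
        return ANCHOR_WITHOUT_REL.matcher(html).replaceAll("<a rel=\"nofollow\"$1>");
    }
}
```

For example, `<a href="http://example.com">x</a>` becomes `<a rel="nofollow" href="http://example.com">x</a>`, and other elements such as `<abbr>` are not affected.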
In markdownpapers, the following links should work (i.e. the URLs should preserve underscores), but do not:
This is http://example.com/some_underscored_url
This is a link reference
Hi!
While using the markdownpapers-doxia-module with the maven site plugin, some problems were found. The following code snippets led to double img tags in the html file. (<img<img)
![alt](path/to/image1.png "title")
[id]: path/to/image2.png "title"
![alt][id]
As a result, the images did not show up. In addition, this line resulted in a NullPointerException.
![alt](path/to/image.png)
<a id='anchor-point> results in all content after the link being wrapped inside the link
Please see
https://code.google.com/p/gitblit/issues/detail?id=326
and
https://code.google.com/p/gitblit/issues/detail?id=327
Not 100% sure if this is what markdown specs say should happen, but other parsers don't appear to have the problem
Release 1.2.3 doesn't like lt and gt entities within a td.
assertEquals("<table><tr><td>&lt;test&gt;</td></tr></table>", MarkdownUtils.transformMarkdown("<table><tr><td>&lt;test&gt;</td></tr></table>"));
(using version 1.3.1 from Maven)
This markup:
Try a [reference][here].
[here]: http://www.example-0.com/
gives
Try a reference.
0.com/
Also inline links give a parse error,
org.tautua.markdownpapers.parser.ParseException: Encountered " <NUMBERING> "0. ""
It seems to be punctuation followed by a number; www.example1.com is OK, www.example-1.com is not.