ije / md4w

A Markdown renderer written in Zig & C, compiled to WebAssembly for all JS runtimes.

License: MIT License

Zig 59.61% JavaScript 40.39%
html markdown md4c parser streaming wasm webassembly zig renderer

md4w's Introduction

md4w

A Markdown renderer written in Zig & C, compiled to WebAssembly for all JS runtimes.

  • Compliance: powered by md4c, which is fully compliant with CommonMark 0.31 and partially supports GFM features such as task lists and tables.
  • Fast: written in Zig & C, compiled to WebAssembly (it's about 2.5x faster than markdown-it, see benchmark).
  • Small: ~28KB gzipped.
  • Simple: zero dependencies, easy to use.
  • Streaming: supports the web streaming API for large markdown files.
  • Universal: works in any JavaScript runtime (Node.js, Deno, Bun, Browsers, Cloudflare Workers, etc).

Usage

// npm i md4w (Node.js, Bun, Cloudflare Workers, etc.)
import { init, mdToHtml, mdToJSON, mdToReadableHtml } from "md4w";
// or use the CDN url (Deno, Browsers)
import { init, mdToHtml, mdToJSON, mdToReadableHtml } from "https://esm.sh/md4w";

// waiting for md4w.wasm...
await init();

// markdown -> HTML
const html = mdToHtml("Stay _foolish_, stay **hungry**!");

// markdown -> HTML (ReadableStream)
const readable = mdToReadableHtml("Stay _foolish_, stay **hungry**!");
const response = new Response(readable, {
  headers: { "Content-Type": "text/html" },
});

// markdown -> JSON
const tree = mdToJSON("Stay _foolish_, stay **hungry**!");

Wasm Mode

md4w provides two WebAssembly binary files:

  • md4w-fast.wasm: Faster but larger binary file. (270KB gzipped)
  • md4w-small.wasm: Tiny but slower binary file. (28KB gzipped)

By default, md4w loads the md4w-fast.wasm binary when reading from the file system and the md4w-small.wasm binary when loading from a CDN. You can also choose the binary explicitly by passing the wasmMode option to init.

import { init } from "md4w";

await init("fast"); // or "small"

If you are using a bundler like Vite, you need to configure the wasm input manually.

import { init } from "md4w";
import wasmUrl from "md4w/js/md4w-fast.wasm?url";

await init(wasmUrl);

Parse Flags

By default, md4w uses the following parse flags:

  • COLLAPSE_WHITESPACE: Collapse non-trivial whitespace into single space.
  • PERMISSIVE_ATX_HEADERS: Do not require space in ATX headers (###header).
  • PERMISSIVE_URL_AUTO_LINKS: Recognize URLs as links.
  • STRIKETHROUGH: Text enclosed in tilde marks, e.g. ~foo bar~.
  • TABLES: Support GitHub-style tables.
  • TASKLISTS: Support GitHub-style task lists.

You can use the parseFlags option to change the renderer behavior:

mdToHtml("Stay _foolish_, stay **hungry**!", {
  parseFlags: [
    "DEFAULT",
    "NO_HTML",
    "LATEX_MATH_SPANS",
    // ... other parse flags
  ],
});

All available parse flags are:

export enum ParseFlags {
  /** Collapse non-trivial whitespace into single space. */
  COLLAPSE_WHITESPACE,
  /** Do not require space in ATX headers ( ###header ) */
  PERMISSIVE_ATX_HEADERS,
  /** Recognize URLs as links. */
  PERMISSIVE_URL_AUTO_LINKS,
  /** Recognize e-mails as links.*/
  PERMISSIVE_EMAIL_AUTO_LINKS,
  /** Disable indented code blocks. (Only fenced code works.) */
  NO_INDENTED_CODE_BLOCKS,
  /** Disable raw HTML blocks. */
  NO_HTML_BLOCKS,
  /** Disable raw HTML (inline). */
  NO_HTML_SPANS,
  /** Support GitHub-style tables. */
  TABLES,
  /** Support strike-through spans (text enclosed in tilde marks, e.g. ~foo bar~). */
  STRIKETHROUGH,
  /** Support WWW autolinks (without proto; just 'www.') */
  PERMISSIVE_WWW_AUTO_LINKS,
  /** Support GitHub-style task lists. */
  TASKLISTS,
  /** Support LaTeX math spans ($...$) and LaTeX display math spans ($$...$$). (Note that the HTML renderer outputs them verbatim in a custom tag <x-equation>.) */
  LATEX_MATH_SPANS,
  /** Support wiki-style links ([[link label]] and [[target article|link label]]). (Note that the HTML renderer outputs them in a custom tag <x-wikilink>.) */
  WIKI_LINKS,
  /** Treat underscores as underline instead of ordinary emphasis or strong emphasis. */
  UNDERLINE,
  /** Treat all soft line breaks as hard line breaks. */
  HARD_SOFT_BREAKS,
  /** Shorthand for NO_HTML_BLOCKS | NO_HTML_SPANS */
  NO_HTML,
  /** Default flags: COLLAPSE_WHITESPACE | PERMISSIVE_ATX_HEADERS | PERMISSIVE_URL_AUTO_LINKS | STRIKETHROUGH | TABLES | TASKLISTS */
  DEFAULT,
}
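
Since NO_HTML and DEFAULT are described as shorthands that OR other flags together, a list of flag names can be read as a bitmask. The sketch below illustrates only that idea. The bit values are an assumption (chosen to mirror md4c's MD_FLAG_* constants, not taken from md4w's source), and resolveFlags is a hypothetical helper, not part of md4w's API.

```javascript
// Illustrative bit values, ASSUMED to mirror md4c's MD_FLAG_* constants.
// Only a subset of the flags is listed here.
const FLAG_BITS = {
  COLLAPSE_WHITESPACE: 0x0001,
  PERMISSIVE_ATX_HEADERS: 0x0002,
  PERMISSIVE_URL_AUTO_LINKS: 0x0004,
  NO_HTML_BLOCKS: 0x0020,
  NO_HTML_SPANS: 0x0040,
  TABLES: 0x0100,
  STRIKETHROUGH: 0x0200,
  TASKLISTS: 0x0800,
  LATEX_MATH_SPANS: 0x1000,
};

// Shorthands are plain ORs of the flags they stand for.
FLAG_BITS.NO_HTML = FLAG_BITS.NO_HTML_BLOCKS | FLAG_BITS.NO_HTML_SPANS;
FLAG_BITS.DEFAULT =
  FLAG_BITS.COLLAPSE_WHITESPACE |
  FLAG_BITS.PERMISSIVE_ATX_HEADERS |
  FLAG_BITS.PERMISSIVE_URL_AUTO_LINKS |
  FLAG_BITS.STRIKETHROUGH |
  FLAG_BITS.TABLES |
  FLAG_BITS.TASKLISTS;

// Hypothetical helper: fold a list of flag names into one bitmask.
function resolveFlags(names) {
  let mask = 0;
  for (const name of names) {
    if (!Object.prototype.hasOwnProperty.call(FLAG_BITS, name)) {
      throw new Error(`Unknown parse flag: ${name}`);
    }
    mask |= FLAG_BITS[name];
  }
  return mask;
}

const mask = resolveFlags(["DEFAULT", "NO_HTML", "LATEX_MATH_SPANS"]);
console.log((mask & FLAG_BITS.TABLES) !== 0); // true
console.log((mask & FLAG_BITS.NO_HTML_SPANS) !== 0); // true
```

Listing "DEFAULT" first and then adding other names extends the defaults; leaving "DEFAULT" out would start from an empty set under this model.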

Code Highlighter

md4w does not add colors to code blocks by default. However, it provides a setCodeHighlighter function so you can plug in any code highlighter you like.

import { setCodeHighlighter } from "md4w";

// `hl` stands for any highlighter function that returns HTML-escaped markup.
setCodeHighlighter((code, lang) => {
  return `<pre><code class="language-${lang}">${hl(code)}</code></pre>`;
});

Caveats

  • The returned code is inserted into the HTML directly, without escaping. You are responsible for HTML-escaping it yourself.
  • Although the highlighted code is not sent back to the wasm module, the code highlighter still affects performance.
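
Because the callback's return value is inserted unescaped, a callback that does no highlighting at all still has to escape the code itself. The sketch below is one way to do that. escapeHtml and plainHighlighter are hypothetical names, not part of md4w.

```javascript
// Characters that must be escaped in HTML text and attribute values.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// A callback with the (code, lang) signature used in the example above.
// It adds no colors; it only wraps the escaped code.
function plainHighlighter(code, lang) {
  const cls = lang ? ` class="language-${escapeHtml(lang)}"` : "";
  return `<pre><code${cls}>${escapeHtml(code)}</code></pre>`;
}

// With md4w loaded, this would be registered as:
//   setCodeHighlighter(plainHighlighter);
console.log(plainHighlighter('if (a < b) alert("hi")', "js"));
// <pre><code class="language-js">if (a &lt; b) alert(&quot;hi&quot;)</code></pre>
```

A real highlighter would replace the escapeHtml(code) call, provided its output is already safe to insert as HTML.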

Web Streaming API

md4w supports the web streaming API for large markdown files. This is also useful for an HTTP server that streams the output HTML.

import { mdToReadableHtml } from "md4w";

// Deno example; the whole file is read into memory first
const readable = mdToReadableHtml(await Deno.readTextFile("large.md"));

// write to file
const file = await Deno.open("/foo/bar.html", { write: true, create: true });
await readable.pipeTo(file.writable);

// or send to browser
const response = new Response(readable, {
  headers: { "Content-Type": "text/html" },
});

Buffer Size

By default, md4w uses a 4KB buffer for streaming. You can change it with the bufferSize option.

mdToReadableHtml(largeMarkdown, {
  bufferSize: 16 * 1024,
});

Caveats

The streaming API currently uses the buffer only for output; you still need to load the whole markdown input into memory.
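
To make the role of bufferSize concrete, the sketch below splits already-rendered output into fixed-size chunks and exposes them as a web ReadableStream. It is a model of the behavior described above, not md4w's implementation. chunkBytes and toReadable are hypothetical helpers.

```javascript
// Split a Uint8Array into views of at most `bufferSize` bytes each.
function* chunkBytes(bytes, bufferSize = 4 * 1024) {
  for (let offset = 0; offset < bytes.length; offset += bufferSize) {
    yield bytes.subarray(offset, offset + bufferSize);
  }
}

// Wrap the chunks in a ReadableStream; one chunk is enqueued per pull,
// so a slow consumer never forces more than one buffer to be queued.
function toReadable(bytes, bufferSize) {
  const chunks = chunkBytes(bytes, bufferSize);
  return new ReadableStream({
    pull(controller) {
      const { value, done } = chunks.next();
      if (done) controller.close();
      else controller.enqueue(value);
    },
  });
}

// 12,000 bytes of stand-in HTML output.
const html = new TextEncoder().encode("<p>hello</p>".repeat(1000));
const sizes = [...chunkBytes(html, 4 * 1024)].map((chunk) => chunk.length);
console.log(JSON.stringify(sizes)); // [4096,4096,3808]
```

In this model a larger bufferSize means fewer, bigger chunks. The input is still fully in memory either way, which matches the caveat above.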

Rendering to JSON

md4w also provides an mdToJSON function that renders markdown to a JSON tree.

const traverse = (node) => {
  // text node
  if (typeof node === "string") {
    console.log(node);
    return;
  }

  // element type
  console.log(node.type);

  // element attributes (may be undefined)
  console.log(node.props);

  // element children (may be undefined)
  node.children?.forEach(traverse);
};

const tree = mdToJSON("Stay _foolish_, stay **hungry**!");
traverse(tree);
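
The same traversal pattern can return a value instead of logging. The sketch below collects the plain text of a tree. The sample tree is hand-written in the shape described above (strings for text, objects with type, props and children for elements), so it runs without md4w. Type 9 is NodeType.P as quoted in this README; the other type numbers are placeholders, and textOf is a hypothetical helper.

```javascript
// Hand-written stand-in for mdToJSON output. The numbers 100 and 101 are
// placeholders, not real md4w node types.
const sampleTree = [
  {
    type: 9,
    children: [
      "Stay ",
      { type: 100, children: ["foolish"] },
      ", stay ",
      { type: 101, children: ["hungry"] },
      "!",
    ],
  },
];

// Concatenate every text node in document order.
// Accepts a string, a node object, or an array of nodes.
function textOf(node) {
  if (typeof node === "string") return node;
  if (Array.isArray(node)) return node.map(textOf).join("");
  return (node.children ?? []).map(textOf).join("");
}

console.log(textOf(sampleTree)); // Stay foolish, stay hungry!
```

Handling both an array and a single object at the root keeps the helper usable whichever shape the returned tree has.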

Node Type

The node type is a number that represents the type of the node. You can import the NodeType enum to get the human-readable node type.

import { NodeType } from "md4w";

console.log(NodeType.P); // 9
console.log(NodeType.IMG); // 33

if (node.type === NodeType.IMG) {
  console.log("This is an image node, `src` is", node.props.src);
}

All available node types are defined in the NodeType enum.
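
As a usage sketch, the type check above combines with a recursive walk to collect every image source in a tree. To stay runnable without md4w, the example hard-codes the two NodeType values quoted above and uses a hand-written tree. collectImageSources is a hypothetical helper.

```javascript
// Values quoted in this README; with md4w installed, import NodeType instead.
const NodeType = { P: 9, IMG: 33 };

// Walk a node (string, object, or array of nodes) and gather image `src` props.
function collectImageSources(node, out = []) {
  if (typeof node === "string") return out;
  if (Array.isArray(node)) {
    node.forEach((child) => collectImageSources(child, out));
    return out;
  }
  if (node.type === NodeType.IMG && node.props?.src) {
    out.push(node.props.src);
  }
  (node.children ?? []).forEach((child) => collectImageSources(child, out));
  return out;
}

// Hand-written stand-in for mdToJSON output.
const sample = [
  {
    type: NodeType.P,
    children: [
      "See ",
      { type: NodeType.IMG, props: { src: "/images/hello-world.png" } },
    ],
  },
];

console.log(JSON.stringify(collectImageSources(sample)));
// ["/images/hello-world.png"]
```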

Development

The renderer is written in Zig. Make sure you have Zig 0.11.0 installed.

zig build && deno test -A

Benchmark

screenshot

zig build && deno bench -A test/benchmark.js

Prior Art

  • md4c - C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.
  • markdown-wasm - Very fast Markdown parser and HTML generator implemented in WebAssembly, based on md4c.

License

MIT

md4w's People

Contributors

ije

md4w's Issues

In browser loading

I'm a big fan of this lib and thanks for sharing it.

I wanted to ask about the preferred way to load md4w in the browser (client side).

I tried it using Vite (without a CDN). However, the dynamic nature of the loading prevents Vite from building the final bundle.

<script>
  import { init, mdToHtml } from "md4w";
  await init();
</script>

Error:

✘ [ERROR] Top-level await is not available in the configured target environment ("chrome87", "edge88", "es2020", "firefox78", "safari14" + 2 overrides)

node_modules/.pnpm/[email protected]/node_modules/md4w/js/index.js:12:23:
  12 │   const { readFile } = await import(m);
     ╵                        ~~~~~

Shouldn't there be a dedicated build for browsers on npm?

JSON parse error with `mdToJSON` on CommonMark

reproduction:

import { init, mdToJSON } from 'md4w'

const commonMark = await fetch('https://github.com/ije/md4w/raw/main/test/commonmark-spec.md').then(r => r.text())

await init()

try {
  console.log(mdToJSON(commonMark))
} catch (error) {
  console.error(error.stack.slice(0, 200))
}
Error: Failed to parse JSON: Bad escaped character in JSON at position 4959
[{"type":5},{"type":9,"children":["title: CommonMark Spec","\n","author: John MacFarlane","\n","version: '0.31.2'","\n","data
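
For context, "Bad escaped character" is the message V8 reports when a JSON string contains a backslash followed by a character that is not a valid JSON escape. The sketch below reproduces that class of error in isolation and contrasts it with JSON.stringify, which always emits valid escapes. It is not a diagnosis of this issue, and the input string is made up.

```javascript
// The JSON text here is ["a\qb"]; "\q" is not a valid JSON escape.
const badJson = '["a\\qb"]';

let parseError = null;
try {
  JSON.parse(badJson);
} catch (error) {
  parseError = error;
}
console.log(parseError instanceof SyntaxError); // true

// JSON.stringify escapes the backslash itself, producing ["a\\qb"],
// which parses back to the original string.
const goodJson = JSON.stringify(["a\\qb"]);
console.log(JSON.parse(goodJson)[0]); // a\qb
```

A serializer that writes string contents by hand has to escape backslashes, quotes, and control characters the same way.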

Support WebAssembly ESM Integration / unwasm

Hi! Kinda excited to see this project since I was planning to work on some markdown utils pkg and was looking for a fast native parser, this is an amazing effort 🔥

Some context:

There is an outstanding proposal from the WebAssembly working group to support .wasm imports as ESM imports, resolving the long-standing issues of inconsistent WASM support across platforms and tools.

Besides this, Cloudflare Workers has its own requirement that a wasm import be predictable (and not compiled on demand), and other worker runtimes don't like top-level await for init.

To allow wasm libraries to be widely adopted within the Nuxt and UnJS ecosystems, I have recently started working on unwasm, which is an effort to allow adopting ESM WASM modules ahead of time in tooling and to allow a universal way of consuming wasm modules. unwasm is under development but is also constantly tested to make sure it has maximum compatibility at the same time.

What is required for it to work?

A small refactor to split the init functionality from the rest of the logic (memory allocation and util exports).

If you are interested, I can happily make a minimal environment for you to easily try this.

feature: add render page mode

---
title: Hello World
desc: Hello world!
slug: hello-world
cover: /images/hello-world.png
---

![Cover]({cover})

# {title}

Hello world!

👇

<html>
<head>
  <title>Hello World</title>
  <meta name="description" content="Hello world!">
</head>
<body>
  <p><img alt="Cover" src="/images/hello-world.png"></p>
  <h1>Hello World</h1>
  <p>Hello world!</p>
</body>
</html>
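
A minimal sketch of the substitution step in this proposal, assuming a flat `key: value` front matter block and `{key}` placeholders as in the example above. parseFrontMatter and fillPlaceholders are hypothetical names; this is not an existing md4w feature, and real front matter is usually YAML, which this sketch does not parse.

```javascript
// Split a document into front matter data and the remaining markdown body.
function parseFrontMatter(source) {
  const match = /^---\n([\s\S]*?)\n---\n?/.exec(source);
  if (!match) return { data: {}, body: source };
  const data = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines that are not `key: value`
    data[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { data, body: source.slice(match[0].length) };
}

// Replace `{key}` with the matching front matter value; unknown keys are kept.
function fillPlaceholders(body, data) {
  return body.replace(/\{(\w+)\}/g, (whole, key) =>
    Object.prototype.hasOwnProperty.call(data, key) ? data[key] : whole
  );
}

const page = [
  "---",
  "title: Hello World",
  "cover: /images/hello-world.png",
  "---",
  "",
  "![Cover]({cover})",
  "",
  "# {title}",
].join("\n");

const { data, body } = parseFrontMatter(page);
console.log(fillPlaceholders(body, data));
```

The filled-in markdown could then be passed to a renderer, with the remaining fields (such as title and desc) used to build the surrounding head element.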

Use global writer

Currently we create a new writer for each render call. A global writer, without a memory alloc/free in each render, should perform better.

Exposing parse utils

Hi. I quickly made this tracker issue while writing unjs/automd#32 to see if you are interested to also expose a simple parse util? (could be either stream or returning whole AST). This can be used as parser core in unjs/omark ❤️
