breakdance / breakdance

It's time for your markup to get down! HTML to markdown converter. Breakdance is highly pluggable, flexible and easy to use.

Home Page: https://breakdance.github.io/breakdance/

License: MIT License

JavaScript 70.45% CSS 24.47% HTML 5.08%
markdown html convert parse compile render to-markdown html-to-markdown converter markup

breakdance's Introduction

breakdance

Breakdance is a node.js library for converting HTML to markdown. Highly pluggable, flexible and easy to use. It's time for your markup to get down.

You can use breakdance to:

  • Migrate HTML blog posts to markdown
  • Convert wiki pages to markdown
  • Convert HTML documentation to markdown
  • Convert HTML presentations or slide decks to markdown
  • Convert busy web pages into readable markdown documents.

Visit our website for detailed documentation, examples, recipes, and advice on authoring and finding plugins.

Why should I use breakdance?

Breakdance uses cheerio to parse HTML, and snapdragon for rendering, which provides granular control over the entire conversion process in a way that is easy to understand, reason about, and customize. If you see something you don't like, it's easy to change!

Generates well-formatted markdown

  • Comprehensive HTML tag coverage.
  • Granular control over every HTML element and attribute
  • Even converts HTML tables to markdown!

Extremely pluggable

Every part of the conversion is customizable:

  • options are available for customizing output of any HTML tag if you don't like the defaults
  • plugins are easy to write if you'd like to share your customizations with the world
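The idea of per-tag customization can be illustrated with a small, self-contained sketch. This is not breakdance's API: `defaultHandlers` and `renderInline` are hypothetical names used only to show how user-supplied overrides can take precedence over built-in defaults.

```javascript
// Hypothetical sketch, NOT breakdance's real API: a table of per-tag
// handlers that users can override for individual HTML tags.
const defaultHandlers = {
  strong: (text) => `**${text}**`,
  b: (text) => `**${text}**`,
  em: (text) => `_${text}_`,
  i: (text) => `_${text}_`,
  code: (text) => `\`${text}\``,
};

function renderInline(tag, text, overrides = {}) {
  // User-supplied handlers take precedence over the defaults.
  const handler = overrides[tag] || defaultHandlers[tag];
  // Tags without a handler fall back to their plain text content.
  return handler ? handler(text) : text;
}

console.log(renderInline('em', 'hi'));                          // _hi_
console.log(renderInline('em', 'hi', { em: (t) => `*${t}*` })); // *hi*
console.log(renderInline('span', 'hi'));                        // hi
```

Keeping the defaults and the overrides in separate objects means a single tag can be changed without copying the whole table.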

HTML-to-markdown example

Tables

The following HTML table from Bootstrap's docs:

<h2 id=tables-hover-rows>Hover rows</h2>
<p>Add <code>.table-hover</code> to enable a hover state on table rows within a <code>&lt;tbody&gt;</code>.</p>
<div class=bs-example data-example-id=hoverable-table>
  <table class="table table-hover">
    <thead>
      <tr>
        <th>#</th>
        <th>First Name</th>
        <th>Last Name</th>
        <th>Username</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <th scope=row>1</th>
        <td>Mark</td>
        <td>Otto</td>
        <td>@mdo</td>
      </tr>
      <tr>
        <th scope=row>2</th>
        <td>Jacob</td>
        <td>Thornton</td>
        <td>@fat</td>
      </tr>
      <tr>
        <th scope=row>3</th>
        <td>Larry</td>
        <td>the Bird</td>
        <td>@twitter</td>
      </tr>
    </tbody>
  </table>
</div>

Would render to the following markdown:

## Hover rows

Add `.table-hover` to enable a hover state on table rows within a `<tbody>`.

| # | First Name | Last Name | Username |
| --- | --- | --- | --- |
| 1 | Mark | Otto | @mdo |
| 2 | Jacob | Thornton | @fat |
| 3 | Larry | the Bird | @twitter |
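To make the table format concrete, here is a minimal sketch (not breakdance's implementation) of how a header row and body rows map to the pipe syntax above. `toMarkdownTable` is a hypothetical helper that takes already-extracted cell text; it does not parse HTML.

```javascript
// Hypothetical helper: build a markdown table from plain cell strings.
// Cell text is assumed to be already extracted from the HTML and to
// contain no pipes or newlines.
function toMarkdownTable(header, rows) {
  const line = (cells) => `| ${cells.join(' | ')} |`;
  // One "---" per column marks the first row as the header.
  const separator = line(header.map(() => '---'));
  return [line(header), separator, ...rows.map(line)].join('\n');
}

console.log(toMarkdownTable(
  ['#', 'First Name', 'Last Name', 'Username'],
  [
    ['1', 'Mark', 'Otto', '@mdo'],
    ['2', 'Jacob', 'Thornton', '@fat'],
    ['3', 'Larry', 'the Bird', '@twitter'],
  ]
));
```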

See the documentation for more examples.

About

Community

Get updates on Breakdance's development and chat with the project maintainers and community members.

  • Follow @breakdancejs on Twitter.
  • Join the conversation on Gitter
  • Implementation help may be found on Stack Overflow (please use the tag breakdance).
  • For maximum discoverability, plugin developers should use the keyword breakdance on packages which modify or add to the functionality of Breakdance when distributing through npm or similar delivery mechanisms.

Related projects

Contributing

Pull requests and stars are always welcome. For bugs and feature requests, please create an issue.

Please read the contributing guide for advice on opening issues, pull requests, and coding standards.

Contributors

Commits Contributor
118 jonschlinkert
3 doowb
1 davidbgk

Release history

Changelog entries are classified using the following labels from keep-a-changelog:

  • added: for new features
  • changed: for changes in existing functionality
  • deprecated: for once-stable features removed in upcoming releases
  • removed: for deprecated features removed in this release
  • fixed: for any bug fixes

Custom labels used in this changelog:

  • dependencies: bumps dependencies
  • housekeeping: code re-organization, minor edits, or other changes that don't fit in one of the other categories.
3.0.0 - 2017-05-12

Removed

  • CLI was externalized to breakdance-cli
2.0.0 - 2017-04-25

Changed

  • <b>: now renders as **bold**, same as <strong> tags
  • <i>: now renders as _italics_, same as <em> tags
1.1.0 - 2017-04-21

Fixed

  • <code>: improvements to whitespace handling
  • <code>: no longer renders empty tags
  • <p>: normalize Unicode U+00A0 non-breaking spaces to "normal" Unicode U+0020 spaces. Non-breaking spaces are useful in HTML, but cause flow problems in markdown.
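The non-breaking space fix above amounts to a character substitution. A minimal sketch of the idea; `normalizeNbsp` is a hypothetical name, and breakdance's own code may do this differently:

```javascript
// Hypothetical helper: replace every U+00A0 non-breaking space with an
// ordinary U+0020 space so markdown text wraps and flows normally.
function normalizeNbsp(text) {
  return text.replace(/\u00A0/g, ' ');
}

console.log(JSON.stringify(normalizeNbsp('Hello,\u00A0world'))); // "Hello, world"
```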

Added

  • documentation for options.comments, options.unsmarty and options.trailingWhitespace, all of which were previously undocumented. See breakdance's options.
1.0.0 - 2017-03-12

Added

  • Adds support for <base>, closes issue #3

Changed

  • Changed the CLI command from tomd to br. As a fallback, you can also use breakdance if there is a conflict. The CLI has not yet been documented, so hopefully this doesn't cause any issues for anyone.

Fixed

  • An extra trailing newline was being added on <code> tags

Added

  • Adds the keepEmpty option, for selectively keeping empty tags that would otherwise be omitted by the built-in omitEmpty tags
  • Adds documentation for omit, pick and omitEmpty and keepEmpty options

Changed

  • Externalized utils.js to breakdance-util, to allow plugin authors to use the same utilities as breakdance, for consistency.

Fixed

  • Better whitespace handling in table, a and dl tags

Added

  • Adds documentation for url option
0.1.0

First release.

(Changelog generated by helper-changelog)

Running tests

Running and reviewing unit tests is a great way to get familiar with a library and its API. You can install dependencies and run tests with the following command:

$ npm install && npm test

Author

Jon Schlinkert

License

Copyright © 2017, Jon Schlinkert. Released under the MIT license.


This file was generated by verb-generate-readme, v0.5.0, on May 12, 2017.

breakdance's People

Contributors

davidbgk, doowb, jonschlinkert, validark

breakdance's Issues

Breakdance in browser

I was looking to improve the HTML-to-markdown conversion in a web
application I have, but after installing breakdance with npm,
importing the module causes a "TypeError: fs.readdirSync is not a function".

My suspicion is that this happens because the breakdance CLI is part of the
module, even though it isn't needed when calling the
breakdance(html_text) function in a browser context.

Table conversion fails for tables without headers and for cells that contain <p> tags

Cells that contain <p>

Input:

<table>
    <thead>
        <tr>
            <th>Heading 1</th>
            <th>Heading 2</th>
            <th>Heading 3</th>
        </tr>
    </thead>
    <tbody>
    <tr>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
    </tr>
    <tr>
        <td>Table cell</td>
        <td>
            <p>p #1</p>
            <p>p #2</p>
        </td>
        <td>Table cell</td>
        <td>Table cell</td>
    </tr>
    </tbody>
</table>

Output:

| Heading 1 | Heading 2 | Heading 3 |
| --- | --- | --- |
| Table cell | Table cell | Table cell | Table cell |
| Table cell |

p #1

p #2  | Table cell | Table cell |

Preferred output:

| Heading 1 | Heading 2 | Heading 3 |
| --- | --- | --- |
| Table cell | Table cell | Table cell | Table cell |
| Table cell | p #1 <br> p #2  | Table cell | Table cell |

or even just removing the line break entirely

Seems related to #7
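One way to get the preferred output is to flatten a cell's paragraphs into a single line before writing the row. A minimal sketch, assuming the paragraph texts have already been extracted from the cell; `flattenCell` is a hypothetical name, not part of breakdance:

```javascript
// Hypothetical helper: flatten the paragraphs of one table cell into a
// single line so the markdown row is not broken. Paragraphs are joined
// with an inline <br>, matching the preferred output above; pass ' '
// as the joiner to drop the line break entirely instead.
function flattenCell(paragraphs, joiner = ' <br> ') {
  return paragraphs
    .map((p) => p.replace(/\s+/g, ' ').trim()) // collapse internal whitespace and newlines
    .filter((p) => p.length > 0)               // skip empty paragraphs
    .join(joiner);
}

console.log(flattenCell(['p #1', 'p #2']));      // p #1 <br> p #2
console.log(flattenCell(['p #1', 'p #2'], ' ')); // p #1 p #2
```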

Tables without a header

Input:

<table>
    <tbody>
    <tr>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
    </tr>
    <tr>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
        <td>Table cell</td>
    </tr>
    </tbody>
</table>

Output (does not show as a table at all):

| Table cell | Table cell | Table cell | Table cell |
| Table cell | Table cell | Table cell | Table cell |

Preferred output:
Add a blank header to the table

| | | | |
| - | - | - | - |
| Table cell | Table cell | Table cell | Table cell |
| Table cell | Table cell | Table cell | Table cell |
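The suggested fix can be sketched as follows. GitHub Flavored Markdown only recognizes a table when a header row and a delimiter row are present, so a header-less table needs both prepended. `tableWithBlankHeader` is a hypothetical helper working on already-extracted cell text, not breakdance code:

```javascript
// Hypothetical helper: render header-less rows as a markdown table by
// prepending an empty header row and a separator row. Without those two
// rows, GFM-style renderers treat the pipes as ordinary paragraph text.
function tableWithBlankHeader(rows) {
  if (rows.length === 0) return '';
  const columns = Math.max(...rows.map((row) => row.length));
  const blankHeader = '|' + ' |'.repeat(columns);
  const separator = '|' + ' - |'.repeat(columns);
  const body = rows.map((row) => {
    // Pad short rows so every row has the same number of cells.
    const cells = row.concat(Array(columns - row.length).fill(''));
    return `| ${cells.join(' | ')} |`;
  });
  return [blankHeader, separator, ...body].join('\n');
}

const cell = 'Table cell';
console.log(tableWithBlankHeader([
  [cell, cell, cell, cell],
  [cell, cell, cell, cell],
]));
```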

<b>, <br> and <i> tags

Hello, thanks for the awesome project

console.log(breakdance('<b>fail</b><br><i>fail</i><br><strong>success</strong>'))
<b>fail</b><br>
<i>fail</i><br>
**success**

As I found while playing with the markup for this issue, <b>, <i> and <br> are kept as-is instead of being transpiled into markdown (**bold**, *italic* and \n). It would be great if this were documented in breakdance.

I need this behavior. Of course, I can write my own regexps or tune the handlers, and I'll do so, but I think there should be an option key that changes this behavior.

Thanks for your work

breakdance dot io is no longer your domain

Some of your docs still link to breakdance dot io, but the domain is currently showing a for-sale page. If the project is still alive, all of those links should be re-pointed to the GitHub Pages version!

<a> with <img> inside returns an empty string

The following HTML code:

<a href="/"><img src="/image.jpg" alt="Alt text" /></a>

gets converted to an empty string.

I wrote a simple failing test:

it('should convert an anchor with img tag inside to markdown', function () {
  isEqual.inline('<a href="/"><img src="/image.jpg" alt="Alt text" /></a>', '[![Alt text](/image.jpg)](/)');
});
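The expected string in the test above is just an image nested inside a link. A minimal sketch of that composition; `imageMarkdown` and `linkMarkdown` are hypothetical helpers, not breakdance functions:

```javascript
// Hypothetical helpers: the image is rendered first, and its markdown then
// becomes the text of the surrounding link.
function imageMarkdown(src, alt = '') {
  return `![${alt}](${src})`;
}

function linkMarkdown(href, innerMarkdown) {
  return `[${innerMarkdown}](${href})`;
}

console.log(linkMarkdown('/', imageMarkdown('/image.jpg', 'Alt text')));
// [![Alt text](/image.jpg)](/)
```

Rendering children before the parent is what lets the link keep the image instead of collapsing to an empty string.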

Links do not take <base ...> into account

Hi and thanks for a great lib!

Given an html document like this:

<!doctype html>
<html>
  <head>
    <base href="/pages/">
  </head>
  <body>
    <a href="page2.html">Hi</a>
  </body>
</html>

The actual location for page2.html is /pages/page2.html because there is a <base> element which sets the base url for all relative urls.

But when I compile it to markdown with Breakdance it yields:

[Hi](page2.html)

When it should instead be:

[Hi](/pages/page2.html)

It took me a while to trace the domain option down to the breakdance-util package, and because it depends on state and options from the compiler, I haven't figured out a good way to solve it.

My quick fix is to use cheerio myself like this:

const cheerio = require('cheerio');
const url = require('url');
const $ = cheerio.load('...the html above...');
breakdance($.html(), {domain: url.resolve(myUrl, $('base').first().attr('href'))});

This works but it would be better if Breakdance did support the <base> element, which I think it should.

What do you think?
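The resolution rule described in this issue can be sketched with the standard WHATWG `URL` class (a global in browsers and in Node.js 10+). Because `/pages/` is itself a relative URL, this sketch first resolves it against a placeholder origin (`placeholder.invalid` is an arbitrary, hypothetical choice) and returns root-relative paths for results on that origin. `resolveWithBase` is a hypothetical name, not breakdance's API:

```javascript
// Hypothetical helper: resolve an href against a <base href> value.
const PLACEHOLDER_ORIGIN = 'http://placeholder.invalid';

function resolveWithBase(href, baseHref) {
  // A relative base such as "/pages/" needs an origin before URL accepts it.
  const base = new URL(baseHref, PLACEHOLDER_ORIGIN + '/');
  const resolved = new URL(href, base);
  if (resolved.origin === PLACEHOLDER_ORIGIN) {
    // Same (fake) origin: keep only path, query and fragment.
    return resolved.pathname + resolved.search + resolved.hash;
  }
  return resolved.href; // absolute hrefs (or an absolute base) stay absolute
}

console.log(resolveWithBase('page2.html', '/pages/'));            // /pages/page2.html
console.log(resolveWithBase('https://example.com/x', '/pages/')); // https://example.com/x
```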

Preferences & Element Studio options disappeared

Under builder settings there used to be two more options, which have now disappeared for all my sites:

Preferences & Element Studio

I need to upload a custom font, which used to be done under Preferences. Is there any other way to upload a custom font?

Table conversion fails when cells contain <h> tags

Breakdance is unable to convert tables that contain text with <h> heading tags.
Example:

<table>
  <tr>
    <th><h3>Firstname</h3></th>
    <th>Lastname</th> 
    <th>Age</th>
  </tr>
</table>

(this converts to markdown that looks identical to the above input)

This behavior seems partially correct, since tables do not support <h> tags.
However, a total failure to convert to markdown can't be justified on that basis: table cells can contain arbitrary text like '#%&@#&!', so any unsupported or unrecognizable syntax should also be treated as text.
Therefore the expected output should be:
| <h3>Firstname</h3> | Lastname | Age |

The problem is further amplified by the fact that currently Breakdance leaves the whole <table> HTML syntax in the markdown output, thus making it no longer usable as markdown. At minimum any bad syntax could be removed or replaced with a warning like:
[Unsupported HTML syntax - cannot be converted to markdown]

BTW, thanks Breakdance developers for your great contributions!
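The "treat it as text" behavior requested above can be sketched as follows: cell content, including raw HTML such as `<h3>...</h3>`, is kept on one line, and only the characters that would break the row are neutralized. `cellAsText` and `rowAsText` are hypothetical helpers, not breakdance code:

```javascript
// Hypothetical helpers: keep any cell content as literal text inside a
// markdown table row.
function cellAsText(content) {
  return content
    .replace(/\r?\n/g, ' ') // a newline would end the table row
    .replace(/\|/g, '\\|')  // a bare pipe would start a new column
    .trim();
}

function rowAsText(cells) {
  return `| ${cells.map(cellAsText).join(' | ')} |`;
}

console.log(rowAsText(['<h3>Firstname</h3>', 'Lastname', 'Age']));
// | <h3>Firstname</h3> | Lastname | Age |
```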

IMG loses width, height

Parsing something like this:

<img src="url1" width="394" height="106" />

Produces output like this:

![](url1)

While it is unfortunate that markdown has no way to specify width and height, I think it would be more useful to (at least optionally) get output like this:

<img src="url1" width="394" height="106" />

(Which is to say, the input).

I hit upon this while working with tools that round-trip between HTML and markdown. This was the simplest common case I found that is "lossy".

(Already tried: { omit: ['img'] })
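The optional behavior requested here can be sketched as a fallback: when an image carries `width` or `height`, which markdown image syntax cannot express, emit the original tag instead. `renderImage` and its `keepSizedImagesAsHtml` flag are hypothetical, not existing breakdance options, and attribute values are assumed to be already HTML-escaped:

```javascript
// Hypothetical helper: render an <img> from its attributes, falling back
// to raw HTML when width or height would otherwise be lost.
function renderImage(attrs, { keepSizedImagesAsHtml = true } = {}) {
  const { src = '', alt = '', width, height } = attrs;
  if (keepSizedImagesAsHtml && (width !== undefined || height !== undefined)) {
    // Re-emit the attributes in their original order.
    const parts = Object.entries(attrs).map(([k, v]) => `${k}="${v}"`);
    return `<img ${parts.join(' ')} />`;
  }
  return `![${alt}](${src})`;
}

console.log(renderImage({ src: 'url1', width: '394', height: '106' }));
// <img src="url1" width="394" height="106" />
console.log(renderImage({ src: 'url1' }));
// ![](url1)
```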

Is there any way to get the outerHTML of a node?

Hi,
I want to render some nodes as HTML, i.e. leave them as-is. Is that possible?

    const page = `
    <figure>
          <img src="/media-narrow.png" width="800" height="400" alt="A stacked component, image on the top, text below" />
      <figcaption>The component as a single column</figcaption>
    </figure>
    `;
    const breakdance = new Breakdance();
    breakdance.before(['figure'], function(node) {
      this.emit(`I DONT KNOW HOW TO PUT NODE INNER HTML HERE`, node);
    });
    const md = breakdance.render(page);

Also, I think it would be nice to transform this to:

    ![A stacked component, image on the top, text below][The component as a single column]
    [The component as a single column]: /media-narrow.png "A stacked component, image on the top, text below"

But I'm not sure how to solve this.
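The transformation suggested in this issue can be sketched as a small string builder. One deliberate difference from the suggested output: a blank line separates the image from its definition, because in CommonMark a link reference definition cannot interrupt a paragraph. `figureToReferenceImage` and its field names are hypothetical; extracting the values from the `<figure>` is left out:

```javascript
// Hypothetical helper: build a reference-style markdown image from the
// pieces of a <figure>. The caption doubles as the reference label and the
// alt text as the title, following the suggestion above. Assumes none of
// the strings contain "]" or '"'.
function figureToReferenceImage({ src, alt, caption }) {
  const image = `![${alt}][${caption}]`;
  const definition = `[${caption}]: ${src} "${alt}"`;
  // Blank line so the definition does not directly follow the paragraph.
  return `${image}\n\n${definition}`;
}

console.log(figureToReferenceImage({
  src: '/media-narrow.png',
  alt: 'A stacked component, image on the top, text below',
  caption: 'The component as a single column',
}));
```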

Latest snapdragon crashes breakdance

Hi, I'm using the latest version of snapdragon and this is the error:

> var breakdance = require('breakdance');
> breakdance('<strong>The freaks come out at night!</strong>')
Error: expected node to be an instance of Node
    at assert (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/node_modules/snapdragon-util/index.js:1018:19)
    at Object.utils.isOpen (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/node_modules/snapdragon-util/index.js:584:3)
    at Compiler.visit (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/lib/compiler.js:187:14)
    at Compiler.compiler.visit (/Users/cristi.constantin/Dev/clean-mark/node_modules/breakdance/lib/compiler.js:66:22)
    at Compiler.mapVisit (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/lib/compiler.js:229:23)
    at Compiler.compile (/Users/cristi.constantin/Dev/clean-mark/node_modules/snapdragon/lib/compiler.js:261:12)
    at Breakdance.compile (/Users/cristi.constantin/Dev/clean-mark/node_modules/breakdance/index.js:348:24)
    at Breakdance.render (/Users/cristi.constantin/Dev/clean-mark/node_modules/breakdance/index.js:373:18)
    at Breakdance (/Users/cristi.constantin/Dev/clean-mark/node_modules/breakdance/index.js:26:25)
❯ node --version
v9.11.1
❯ npm --version
5.8.0

I only get the error with the latest snapdragon; previous versions (0.11.3 and lower) work fine.

<br> tags in output

How do I avoid ending up with <br> tags in the output?

For example, if the input is: <i>italics</i> <br/> <h1> header 1 </h1>

The output is:

 _italics_ <br>

# header 1

The <br/> became a <br> (great!) but then isn't converted to markdown (not great).
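In CommonMark, a backslash at the end of a line is a hard line break, but a hard line break at the very end of a block has no effect. A minimal sketch of a conversion that follows those two rules, assuming it receives the already-converted inline text of a single block; `convertBr` is a hypothetical name, and the regexes are only for illustration (a real converter would work on the parsed DOM):

```javascript
// Hypothetical helper operating on the inline text of ONE block
// (e.g. one paragraph).
function convertBr(blockText) {
  return blockText
    // <br> tags at the very end of a block do nothing in markdown: drop them.
    .replace(/(?:\s*<br\s*\/?>)+\s*$/i, '')
    // Any remaining <br> becomes a backslash hard line break.
    .replace(/\s*<br\s*\/?>\s*/gi, '\\\n');
}

console.log(convertBr('_italics_ <br/>'));                      // _italics_
console.log(JSON.stringify(convertBr('line one<br>line two'))); // "line one\\\nline two"
```

For the example above, the `<br/>` sits right before the heading, so it falls under the first rule and is simply dropped.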
