
tim-gromeyer / html2md


Transform your HTML into clean, easy-to-read markdown with html2md.

Home Page: https://tim-gromeyer.github.io/html2md/

License: MIT License

Languages: CMake 13.85%, C++ 76.96%, Python 8.71%, Shell 0.17%, HTML 0.31%
Topics: html-to-markdown, html5, markdown, cpp17, cpp-library, html, python3, html2markdown, html2md

html2md's People

Contributors: dependabot[bot], tim-gromeyer

html2md's Issues

html2md inserts unnecessary line breaks

Hey Tim!

Thanks for this library. I'm planning to use it in my block editor (https://github.com/nuttyartist/notes/tree/block-editor): when a user pastes HTML content into the editor, I want to convert it to Markdown.

But I'm running into the same problem I had with QTextDocument::toMarkdown (after calling setHtml): for some reason, both insert unnecessary line breaks (\n). For example, I took the following random text from the internet (https://news.ycombinator.com/item?id=38108048).
m_clipboard->mimeData(QClipboard::Clipboard)->html() returns:

<meta charset='utf-8'>
<span style=\"color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;\">My partner is an Astrophysicist who relies on Gnu Emacs as her daily driver. Her work involves managing a treasure trove of legacy code written in a variety of languages like Fortran, Matlab, IDL, and IRAF. This code is essential for her data reduction pipelines, supporting instruments across observatories such as Keck 1 &amp; 2, the AAT, Gemini, and more.</span>
<p style=\"margin-top: 8px; margin-bottom: 0px; color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\">Each time she acquires a new Mac, she embarks on a week-long odyssey to set up her computing environment from scratch. It's not because she enjoys it; rather, it's a necessity because the built-in migration assistant just doesn't cut it for her specialised needs.</p>
<p style=\"margin-top: 8px; margin-bottom: 0px; color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\">While she currently wields the power of an M1 Max MacBook Pro and runs on the Monterey operating system, she tends to stick with the pre-installed OS for the lifespan of her hardware, which often spans several years. In her case, this could be another 2-3 years or even more before she retires the machine or hands it over to a postdoc or student.</p>
<p style=\"margin-top: 8px; margin-bottom: 0px; color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\">But why does she avoid the annual OS upgrades? It's simple. About a decade ago, every OS update would wreak havoc on her meticulously set-up environment. Paths would break, software would malfunction, and libraries that used to reside in one place mysteriously migrated to another. The headache and disruptions were just not worth it.</p>
<p style=\"margin-top: 8px; margin-bottom: 0px; color: rgb(0, 0, 0); font-family: Verdana, Geneva, sans-serif; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\">She decided to call it quits on annual OS upgrades roughly 7-8 years ago. While I've suggested Docker as a potential solution, it still requires her to take on the role of administrator and caretaker, which, in her busy world of astrophysical research, can be quite the distraction.</p>"

Output from html2md:

My partner is an Astrophysicist who relies on Gnu Emacs as her daily driver. Her\nwork involves managing a treasure trove of legacy code written in a variety of languages\nlike Fortran, Matlab, IDL, and IRAF. This code is essential for her data reduction\npipelines, supporting instruments across observatories such as Keck 1 &amp; 2, the\nAAT, Gemini, and more.\nEach time she acquires a new Mac, she embarks on a week-long odyssey to set up her\ncomputing environment from scratch. It's not because she enjoys it; rather, it's\na necessity because the built-in migration assistant just doesn't cut it for her\nspecialised needs.\n\nWhile she currently wields the power of an M1 Max MacBook Pro and runs on the Monterey\noperating system, she tends to stick with the pre-installed OS for the lifespan of\nher hardware, which often spans several years. In her case, this could be another\n2-3 years or even more before she retires the machine or hands it over to a postdoc\nor student.\n\nBut why does she avoid the annual OS upgrades? It's simple. About a decade ago, every\nOS update would wreak havoc on her meticulously set-up environment. Paths would break,\nsoftware would malfunction, and libraries that used to reside in one place mysteriously\nmigrated to another. The headache and disruptions were just not worth it.\n\nShe decided to call it quits on annual OS upgrades roughly 7-8 years ago. While I've\nsuggested Docker as a potential solution, it still requires her to take on the role\nof administrator and caretaker, which, in her busy world of astrophysical research,\ncan be quite the distraction.\n

It should return:

My partner is an Astrophysicist who relies on Gnu Emacs as her daily driver. Her work involves managing a treasure trove of legacy code written in a variety of languages like Fortran, Matlab, IDL, and IRAF. This code is essential for her data reduction pipelines, supporting instruments across observatories such as Keck 1 & 2, the AAT, Gemini, and more.\nEach time she acquires a new Mac, she embarks on a week-long odyssey to set up her computing environment from scratch. It's not because she enjoys it; rather, it's a necessity because the built-in migration assistant just doesn't cut it for her specialised needs.\n\nWhile she currently wields the power of an M1 Max MacBook Pro and runs on the Monterey operating system, she tends to stick with the pre-installed OS for the lifespan of her hardware, which often spans several years. In her case, this could be another 2-3 years or even more before she retires the machine or hands it over to a postdoc or student.\n\nBut why does she avoid the annual OS upgrades? It's simple. About a decade ago, every OS update would wreak havoc on her meticulously set-up environment. Paths would break, software would malfunction, and libraries that used to reside in one place mysteriously migrated to another. The headache and disruptions were just not worth it.\n\nShe decided to call it quits on annual OS upgrades roughly 7-8 years ago. While I've suggested Docker as a potential solution, it still requires her to take on the role of administrator and caretaker, which, in her busy world of astrophysical research, can be quite the distraction.

What can be done about this? (QTextMarkdown has the same problem.)
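The extra "\n" characters in the html2md output fall at line lengths of roughly 80 to 90 characters, which suggests the converter hard-wraps long paragraphs. In CommonMark, a single newline inside a paragraph is a soft line break that renderers normally display as a space, so the wrapping only causes trouble when the consumer (here, a block editor) treats every newline as a block boundary. Whether html2md exposes a setting to turn the wrapping off depends on the version in use, so its options are worth checking first. Failing that, the wrapping can be undone after conversion. The sketch below assumes the Markdown is plain prose; `unwrapParagraphs` is a hypothetical helper, not part of html2md.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of html2md): re-joins lines that a
// converter wrapped at a fixed column. An empty line is treated as a
// paragraph separator and kept; every other newline becomes a space.
// Assumption: the input is plain prose. List items, headings, table rows
// and fenced code would also be joined, so this is not a general fix.
std::string unwrapParagraphs(const std::string &markdown) {
    std::istringstream in(markdown);
    std::string line;
    std::string out;
    bool inParagraph = false;  // true once the current paragraph has text

    while (std::getline(in, line)) {
        if (line.empty()) {
            // Empty line: close the current paragraph, if one is open.
            // Runs of empty lines collapse into a single separator.
            if (inParagraph) {
                out += "\n\n";
                inParagraph = false;
            }
            continue;
        }
        if (inParagraph)
            out += ' ';  // wrapped line inside a paragraph -> space
        out += line;
        inParagraph = true;
    }
    // Drop a separator left over after the last paragraph.
    while (!out.empty() && out.back() == '\n')
        out.pop_back();
    return out;
}
```

Applied to the html2md output above, this would also join "and more." with the sentence that follows it, which the expected output keeps on a separate line, so it approximates the expected text rather than reproducing it exactly. The output also keeps "&amp;" where the expected text has "&"; CommonMark treats "&amp;" as an entity reference and renders it as "&", so that difference is only visible in the raw Markdown.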

System error when executing html2md.exe

Describe the bug
System error when executing html2md.exe

To Reproduce
Steps to reproduce the behavior:

  1. Run 'html2md.exe'
  2. See error


Desktop

  • OS: Windows 10 64-bit

Crash: libc++abi: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc

Hey again Tim!

I tried to paste some HTML from Hacker News into my note app, and it crashed in the convert() function. This is the HTML that was copied from the clipboard:

<meta charset='utf-8'><table border=\"0\" style=\"font-family: Verdana, Geneva, sans-serif; letter-spacing: normal; orphans: 2; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(246, 246, 239); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;\"><tbody><tr><td class=\"ind\" indent=\"0\" style=\"font-family: Verdana, Geneva, sans-serif; font-size: 10pt; color: rgb(130, 130, 130);\"><img src=\"https://news.ycombinator.com/s.gif\" height=\"1\" width=\"0\"></td><td valign=\"top\" class=\"votelinks\" style=\"font-family: Verdana, Geneva, sans-serif; font-size: 10pt; color: rgb(130, 130, 130);\"><center><a id=\"up_21885445\" class=\"clicky\" href=\"https://news.ycombinator.com/vote?id=21885445&amp;how=up&amp;auth=4ab46530c8158343f958f2cda580e250bcc8e667&amp;goto=item%3Fid%3D21884828#21885445\" style=\"color: rgb(0, 0, 0); text-decoration: none;\"><div class=\"votearrow\" title=\"upvote\" style=\"width: 10px; height: 10px; border: 0px; margin: 3px 2px 6px; background: url(&quot;triangle.svg&quot;) 0% 0% / 10px, linear-gradient(transparent, transparent) no-repeat;\"></div></a></center></td><td class=\"default\" style=\"font-family: Verdana, Geneva, sans-serif; font-size: 10pt; color: rgb(130, 130, 130);\"><div style=\"margin-top: 2px; margin-bottom: -10px;\"><span class=\"comhead\" style=\"font-family: Verdana, Geneva, sans-serif; font-size: 8pt; color: rgb(130, 130, 130);\"><a href=\"https://news.ycombinator.com/user?id=brudgers\" class=\"hnuser\" style=\"color: rgb(130, 130, 130); text-decoration: none;\">brudgers</a><span> </span><span class=\"age\" title=\"2019-12-26T17:52:06\"><a href=\"https://news.ycombinator.com/item?id=21885445\" style=\"color: rgb(130, 130, 130); text-decoration: none;\">on Dec 27, 2019</a></span><span> </span><span id=\"unv_21885445\"></span><span class=\"navs\">|<span> </span><a 
href=\"https://news.ycombinator.com/item?id=21884828#21894436\" class=\"clicky\" aria-hidden=\"true\" style=\"color: rgb(130, 130, 130); text-decoration: none;\">next</a><span> </span><a class=\"togg clicky\" id=\"21885445\" n=\"28\" href=\"javascript:void(0)\" style=\"color: rgb(130, 130, 130); text-decoration: none;\">[–]</a><span class=\"onstory\"></span></span></span></div><br><div class=\"comment\" style=\"font-family: Verdana, Geneva, sans-serif; font-size: 9pt; max-width: 970px; overflow-wrap: anywhere; overflow: hidden;\"><span class=\"commtext c00\" style=\"color: rgb(0, 0, 0);\">Excel alternatives might be uncountable. Implementing spreadsheet basics is an advanced beginner exercise. But even Google’s billions only get it a distant second best because Microsoft is still working hard despite the lead. Sure Google and Apple can meet most needs most of the time. They’re good enough mainly because they are free beer. Not because they are open source. Obviously.</span></div></td></tr></tbody></table>

Can you check whether you can reproduce this?
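The message in the title is printed by libc++abi's default terminate handler: a std::bad_alloc was thrown during conversion and nothing caught it, so std::terminate ended the process. The root cause lies in the converter and needs fixing there, but the application can at least avoid crashing by catching the exception around the call. A minimal sketch, assuming the conversion is passed in as a callable; `safeConvert` is a hypothetical name, not an html2md function.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <new>
#include <optional>
#include <string>

// Hypothetical wrapper (not part of html2md): runs any HTML-to-Markdown
// conversion and turns a thrown exception into an empty result, so the
// exception never reaches std::terminate.
std::optional<std::string> safeConvert(
    const std::function<std::string(const std::string &)> &convert,
    const std::string &html) {
    try {
        return convert(html);
    } catch (const std::bad_alloc &) {
        // The failure reported in this issue.
        std::cerr << "conversion ran out of memory\n";
    } catch (const std::exception &e) {
        std::cerr << "conversion failed: " << e.what() << '\n';
    }
    return std::nullopt;  // caller decides on a fallback
}
```

In a note app, an empty result could fall back to pasting the clipboard's plain text instead. Catching the exception only helps when the allocation failure actually surfaces as std::bad_alloc; on systems that overcommit memory, the operating system may kill the process before the exception is ever thrown.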
