
artiomn / markdown_articles_tool

Stars: 107, Watchers: 5, Forks: 23, Size: 308 KB

Parse Markdown articles, download images, and replace image URLs with local paths

License: MIT License

Python 99.93% Shell 0.07%
markdown markdown-converter images md markdown-parser downloader markdown-to-html markdown-to-pdf html markdown-articles

markdown_articles_tool's People

Contributors

artiomn, dependabot[bot]

markdown_articles_tool's Issues

Local image processing

Hello!
Can you add processing of "local" images? That is, copying images referenced by relative and absolute paths and replacing those paths in the file.
For example, with these files:

1\Test.md
1\_d\1.png
_resources\2.png

Content of Test.md:

![](https://pandoc.org/diagram.jpg)

![](_d/1.png)

![](../_resources/2.png)

and the command:

markdown_tool.py Test.md -D -d out -p out -O Test2.md

the tool produces the file:

out\diagram.jpg

and the file Test2.md:

![](out/diagram.jpg)

![](_d/1.png)

![](../_resources/2.png)

Needed:

out\diagram.jpg
out\1.png
out\2.png

and the file:

![](out/diagram.jpg)

![](out/1.png)

![](out/2.png)
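A minimal sketch of the requested behavior, written independently of the tool's own code: it finds `![alt](target)` links, leaves anything with a URL scheme (such as `https://`) to the existing downloader, copies local files referenced by relative or absolute paths into the output folder, and rewrites each link to `<prefix>/<file name>`. The function `localize_local_images` and its parameters are hypothetical, and the simple regular expression assumes link targets contain no spaces or parentheses.

```python
import re
import shutil
from pathlib import Path
from urllib.parse import urlsplit

# Matches ![alt](target) where the target has no spaces or closing parenthesis.
IMAGE_LINK = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<target>[^)\s]+)\)")


def localize_local_images(md_path: Path, out_dir: Path, public_prefix: str) -> str:
    """Copy locally referenced images into out_dir and rewrite their links.

    Remote links (anything with a URL scheme such as https://) are left unchanged.
    Returns the rewritten Markdown text.
    """
    text = md_path.read_text(encoding="utf-8")
    out_dir.mkdir(parents=True, exist_ok=True)

    def replace(match: re.Match) -> str:
        target = match["target"]
        # Caveat: a Windows path like C:/img.png parses with scheme "c"
        # and would be treated as remote, i.e. skipped.
        if urlsplit(target).scheme:
            return match.group(0)  # remote image: leave it for the downloader
        source = Path(target)
        if not source.is_absolute():
            source = md_path.parent / source  # relative to the Markdown file
        if not source.is_file():
            return match.group(0)  # missing file: keep the original link
        shutil.copy2(source, out_dir / source.name)
        return f"![{match['alt']}]({public_prefix}/{source.name})"

    return IMAGE_LINK.sub(replace, text)
```

Because images are copied by file name alone, two different local images that are both called `1.png` would overwrite each other in the output folder; a content-based naming scheme would avoid that.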

Option to replace image file name with hash

Hello!
Can you add an option to replace image file names with a hash?
I need it for a large shared "media" folder for Markdown notes: it would deduplicate images while processing many notes and guarantee unique file names.
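One way such an option could work, sketched under the assumption that the hash is taken over the image's bytes (not its URL), so identical images downloaded for different notes collapse into one file. `hashed_name` and `store_deduplicated` are hypothetical helpers, not part of the tool.

```python
import hashlib
import shutil
from pathlib import Path


def hashed_name(path: Path, algorithm: str = "sha256") -> str:
    """Return '<hex digest of the file contents><original suffix>'."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as f:
        # Read in chunks so large images need not fit in memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() + path.suffix.lower()


def store_deduplicated(path: Path, media_dir: Path) -> Path:
    """Copy path into media_dir under its hashed name; skip if already stored."""
    media_dir.mkdir(parents=True, exist_ok=True)
    target = media_dir / hashed_name(path)
    if not target.exists():
        shutil.copy2(path, target)
    return target
```

Hashing the contents rather than the URL means the same picture fetched from two different addresses still ends up as a single file in the media folder.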

Image links and folder don't match

The script downloaded the images perfectly into an /images folder in the same directory as the files. However, the Markdown links reference each image only by its file name; they should be prefixed with /images/. Maybe I just need to specify this when running the command, but the instructions don't make it clear to me how to do that.
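As a post-processing workaround, assuming every downloaded image sits in `/images` and every bare file name in an image link refers to one of them, the links could be rewritten as below. `prefix_bare_image_links` is a hypothetical helper, not a feature of the tool; links that already contain a path or a URL are left alone.

```python
import posixpath
import re

# Matches ![alt](target) where the target has no spaces or closing parenthesis.
IMAGE_LINK = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<target>[^)\s]+)\)")


def prefix_bare_image_links(text: str, prefix: str = "/images") -> str:
    """Prepend prefix to image links whose target is a bare file name."""

    def replace(match: re.Match) -> str:
        target = match["target"]
        if "/" in target or ":" in target:
            return match.group(0)  # already a path or a URL: leave it unchanged
        return f"![{match['alt']}]({posixpath.join(prefix, target)})"

    return IMAGE_LINK.sub(replace, text)
```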

Downloading images whose link contains a space

Some tools, such as CodiMD, use a space in the link to set the image size, as in ![name](url =300px).

A simple fix that lets such images be downloaded is:

diff --git a/markdown_toolset/www_tools.py b/markdown_toolset/www_tools.py
index bc2e58c..ac54304 100644
--- a/markdown_toolset/www_tools.py
+++ b/markdown_toolset/www_tools.py
@@ -34,6 +34,7 @@ def download_from_url(url: str, timeout=None):
     :param url: URL to download.
     :param timeout: timeout before fail.
     """
+    url = url.split()[0]
 
     try:
         response = requests.get(url, allow_redirects=True, timeout=timeout, headers=NECESSARY_HEADERS)

Link is deleted even though no image was found

Hello

My users sometimes make a mistake when writing Markdown and tag an important link as an image. The tool then downloads the HTML page and deletes the link. Losing the link from the article is a problem.

For example, in an .md file containing

Important link to remember: ![](https://www.google.com/)

the link would be deleted when the article is processed, resulting in the following:

Important link to remember: ![](.html)

A solution for this particular case would be to raise an exception when replacing the image link if the file name is empty. In www_tools.py, after line 73 in the get_filename_from_url function, add:

if f_name == "": raise ValueError(f'F_name is empty {req.url}')

However, the problem would persist for certain links.

Important link: ![](https://github.com/artiomn/markdown_articles_tool)
would still be replaced by
Important link: ![](markdown_articles_tool.html)

In this case, it would be necessary to check the MIME type of the downloaded content before replacing the link.
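A sketch of that check, assuming the server's Content-Type header is trustworthy enough to decide whether a response is an image. It uses the standard library's `urllib.request` for self-containment, whereas the tool itself uses `requests` (as the diff in the previous issue shows); `is_image_content_type` and `fetch_image` are hypothetical names.

```python
import urllib.request
from email.message import Message


def is_image_content_type(content_type: str) -> bool:
    """True if a Content-Type header value denotes an image (e.g. 'image/png')."""
    msg = Message()
    msg["Content-Type"] = content_type
    # A missing or malformed value falls back to 'text/plain', i.e. not an image.
    return msg.get_content_maintype() == "image"


def fetch_image(url: str, timeout: float = 10.0) -> bytes:
    """Download url, refusing to return anything whose Content-Type is not image/*."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        if not is_image_content_type(content_type):
            raise ValueError(f"{url} is not an image (Content-Type: {content_type!r})")
        return response.read()
```

If `fetch_image` raises, the caller can keep the original link untouched instead of replacing it with an `.html` file. Some servers send a wrong Content-Type, so checking the file's magic bytes could complement this.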

Option to specify output filename

Thanks for this nice tool!
Currently, the output file name is non-deterministic, which makes the tool hard to use in batch scripts.
Please provide a parameter such as --out FILENAME or something similar.

Image download is skipped when using md image size syntax

Hello

Since the 0.1.2 update, I have noticed that some images are not downloaded. This is because, in Markdown, you can specify an image's width or height by appending " =WIDTHxHEIGHT" to its URL, but when downloading the image, the tool keeps this suffix as part of the URL. For instance, if a Markdown file contains

My avatar scaled to 300 pixels width: ![](https://avatars.githubusercontent.com/u/32387838 =300x)

the tool will try to download the image at

https://avatars.githubusercontent.com/u/32387838 =300x

which is an invalid URL. As a result, the error message for an unrecognized MIME type is printed and the download is skipped.

Notes:

  • This syntax is not recognized by every Markdown parser, but it works in CodiMD.
  • A link may still be valid if the URL contains a query, because =300x is then treated as part of a parameter. For example, https://avatars.githubusercontent.com/u/32387838?s=80&v=4 =300x is a valid URL.
  • I found this syntax described in this StackOverflow answer: https://stackoverflow.com/a/21242579
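A sketch of separating the size hint from the URL before downloading, assuming the hint always has the " =WIDTHxHEIGHT" form described in this issue, with either dimension optional. Unlike `url.split()[0]`, it also keeps the parsed size, in case it should be preserved in the rewritten link. `split_size_hint` and `ImageTarget` are hypothetical names; a "=300px" variant would need an extra pattern.

```python
import re
from typing import NamedTuple, Optional

# An optional " =WIDTHxHEIGHT" hint at the end of a link target;
# either dimension may be omitted, e.g. "=300x" or "=x200".
_SIZE_HINT = re.compile(r"\s+=(?P<width>\d*)x(?P<height>\d*)\s*$")


class ImageTarget(NamedTuple):
    url: str
    width: Optional[int]
    height: Optional[int]


def split_size_hint(target: str) -> ImageTarget:
    """Split a link target such as 'https://x/y.png =300x' into URL and size."""
    target = target.strip()
    match = _SIZE_HINT.search(target)
    if match is None:
        return ImageTarget(target, None, None)
    width = int(match["width"]) if match["width"] else None
    height = int(match["height"]) if match["height"] else None
    # Everything before the whitespace that starts the hint is the URL.
    return ImageTarget(target[: match.start()], width, height)
```

Usage: `split_size_hint("https://avatars.githubusercontent.com/u/32387838 =300x")` yields the bare avatar URL with width 300 and no height, and a URL with a query string such as `?s=80&v=4` is split the same way.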

Support for misplaced local images

Hi

It would be great if this plugin could also search the vault for missing images. So far, I have not been able to find a plugin that can locate missing images in a vault. Sometimes we move notes around and their attachments/images are not moved with them, so the images end up missing even though they still exist somewhere in the vault.

Thanks
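The request amounts to a search by file name over the whole vault. A minimal sketch, assuming a broken link can be matched to a file with the same name (compared case-insensitively) anywhere under the vault root; `index_vault_files` and `suggest_fixed_links` are hypothetical helpers. When several files share a name, all candidates are returned so a person can choose.

```python
import os
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def index_vault_files(vault: Path) -> Dict[str, List[Path]]:
    """Map each file name in the vault (lowercased) to every path where it occurs."""
    index: Dict[str, List[Path]] = defaultdict(list)
    for path in vault.rglob("*"):
        if path.is_file():
            index[path.name.lower()].append(path)
    return index


def suggest_fixed_links(note: Path, link_target: str, index: Dict[str, List[Path]]) -> List[str]:
    """Return replacement relative links for a broken image link in note."""
    if (note.parent / link_target).is_file():
        return []  # the link still works
    candidates = index.get(Path(link_target).name.lower(), [])
    # Express each candidate relative to the note, with forward slashes for Markdown.
    return [Path(os.path.relpath(c, note.parent)).as_posix() for c in candidates]
```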

--replace-image-names Option Not Implemented

I tried using the --replace-image-names option described in the README, but it seems that this feature is not implemented yet: when I use it, no image names are replaced. Could you please confirm whether this feature is currently available? If it is not implemented yet, it would be a really useful addition to the tool.

Steps to Reproduce:

Run the command markdown_articles_tool --replace-image-names ...
Observe that image names are not replaced.

Expected Outcome:
Image names should be replaced as per the documentation.

Actual Outcome:
No image names were replaced.

Thank you for looking into this issue.

Images with an unrecognized MIME type are handled incorrectly

Hello!
I use image links in this format:

![](https://cubox.pro/c/filters:no_upscale()?valid=false&imageUrl=https%3A%2F%2Fpicx.zhimg.com%2F50%2Fv2-53de590b6bb3f42d1a06d28c806c698d_720w.jpg%3Fsource%3D1940ef5c)

and I run the command:

python markdown_articles_tool/markdown_tool.py 1.md -E

The program treats some different image links as identical and maps them to the same local files, as this log shows:

root@pdf:/home/guang# python markdown_articles_tool/markdown_tool.py 1.md -E
Markdown tool version 0.1.3 started...
02.08.2023 05:10:39 File "1.md" will be processed...
02.08.2023 05:10:39 Image public path: 
02.08.2023 05:10:39 Images links count = 17
02.08.2023 05:10:39 Downloading image 1 of 17 from "https://cubox.pro/c/filters:no_upscale()?valid=false&imageUrl=https%3A%2F%2Fpicx.zhimg.com%2F50%2Fv2-53de590b6bb3f42d1a06d28c806c698d_720w.jpg%3Fsource%3D1940ef5c"...
02.08.2023 05:10:40 Image will be written to the file "/home/guang/images/1.png"...
02.08.2023 05:10:40 Downloading image 2 of 17 from "https://cubox.pro/c/filters:no_upscale()?valid=false&imageUrl=https%3A%2F%2Fpica.zhimg.com%2F50%2Fv2-872d10f75dfa52172835fe6fbf22c5fe_720w.jpg%3Fsource%3D1940ef5c"...
02.08.2023 05:10:40 Image will be written to the file "/home/guang/images/1.jpg"...
02.08.2023 05:10:40 Downloading image 3 of 17 from "https://cubox.pro/c/filters:no_upscale()?valid=false&imageUrl=https%3A%2F%2Fpic1.zhimg.com%2F50%2Fv2-c4b89a30d2a3fe1897cfe24388ec935e_720w.jpg%3Fsource%3D1940ef5c"...
02.08.2023 05:10:40 Image "/home/guang/images/1.jpg" already exists and will not be written...
02.08.2023 05:10:40 Downloading image 4 of 17 from "https://cubox.pro/c/filters:no_upscale()?valid=false&imageUrl=https%3A%2F%2Fpic1.zhimg.com%2F50%2Fv2-2a53a9691dd1823bf8e268bccd5ddc33_720w.jpg%3Fsource%3D1940ef5c"...
02.08.2023 05:10:41 Image "/home/guang/images/1.png" already exists and will not be written...
02.08.2023 05:10:41 Downloading image 5 of 17 from "https://cubox.pro/c/filters:no_upscale()?valid=false&imageUrl=https%3A%2F%2Fpic1.zhimg.com%2F50%2Fv2-0efb5de65201ba08c47863d88b61669f_720w.jpg%3Fsource%3D1940ef5c"...

I hope you can help me solve this problem. Thanks!
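Every link in the log shares the same outer path, `/c/filters:no_upscale()`; only the `imageUrl` query parameter identifies the real image. Whatever naming scheme the tool uses internally, one possible way to get distinct names for such links, sketched here as an assumption, is to take the file name from the nested URL when that parameter is present. `image_file_name` is a hypothetical helper, and treating `imageUrl` as the parameter to look for is specific to these cubox.pro links.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def image_file_name(url: str, nested_param: str = "imageUrl") -> str:
    """Derive a file name from url, preferring an image URL nested in a query parameter."""
    parts = urlsplit(url)
    nested = parse_qs(parts.query).get(nested_param)
    if nested:
        # parse_qs has already percent-decoded the value, so it is a plain URL again.
        parts = urlsplit(nested[0])
    return posixpath.basename(parts.path)
```

For the first link in the log, this gives `v2-53de590b6bb3f42d1a06d28c806c698d_720w.jpg`, so the second and third links no longer collide with it.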

images

Are only the images referenced in the article downloaded, or all images?
