In particular, it would be nice if images posted in github issues would get pulled int

Support images linked via URLs about rdm HOT 9 CLOSED

innolitics commented on August 29, 2024

Support images linked via URLs

from rdm.

Comments (9)

johndgiese commented on August 29, 2024

Would it make sense to assume that, if an image is in regulatory/tmp, we don't want to download it again?

orwonthe commented on August 29, 2024

To fully untangle image references, rdm tex and rdm render need:
(1 sources) A search path or list of places to use as a base for dereferencing relative locations.

I think rdm tex and rdm render need to have an optional document search path argument.
For example, suppose there is a relative reference to '../images/medical/panel_design.svg'
Suppose the search path consists of this list: ['./documents/', './release/images/']

Then rdm would try to find the image
first at './documents/../images/medical/panel_design.svg'
and then at './release/images/../images/medical/panel_design.svg'
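The lookup described above can be sketched in a few lines of Python. This is only an illustration of the proposal, not code from rdm; the function name `find_on_search_path` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_on_search_path(reference: str, search_path: Iterable[str]) -> Optional[Path]:
    """Return the first existing file for a relative reference, or None.

    Each entry of search_path is tried in order as the base folder,
    mirroring the './documents/' then './release/images/' example.
    """
    for base in search_path:
        candidate = Path(base) / reference
        if candidate.is_file():
            return candidate
    return None


# Example call, using the paths from the comment above:
# find_on_search_path('../images/medical/panel_design.svg',
#                     ['./documents/', './release/images/'])
```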

(2 destinations) Also needed is the document base of the output file.
Both rdm tex and rdm render currently write to standard output, so there is currently no way to know where to put a downloaded or copied image: the output folder is unknown.

For the example above, the image would be copied by rdm render to '../images/medical/panel_design.svg' relative to the document base. rdm tex would then create a .pdf version at the same location.

At least two solutions to problem (2):
(A) Don't use standard out. Require a --output command argument.
Since we are producing either a .tex or .md output PLUS images, we have already broken the standard out metaphor. The output folder becomes the implied document base.

(B) Include an argument that provides the document base.
This would allow the body of the output (but not the images) to be piped through further filters.

Options (A) and (B) are not mutually exclusive. Both command line arguments could be available.

With both options (A) and (B), the document base would be automatically added as an additional source search path.
With either solution, downloaded images will be placed in ../images relative to the document base.

Absolute file references would be left alone.
rdm tex would treat the .pdf conversion of an absolute file reference to an .svg file the same as downloaded images, placing it in ../images with a unique name.

johndgiese commented on August 29, 2024

> A search path or list of places to use as a base for dereferencing relative locations.

Could we resolve relative paths using the current document's path as the base path?

> (B) Include an argument that provides the document base.

This option feels good to me.

> With both options (A) and (B), the document base would be automatically added as an additional source search path.

To avoid the need for a search path, perhaps we could add a setting to config.yml, or a command-line argument, that points to the image directory. Then, if we need to download images, we place them there and swap out the URL for a relative path from the document to this location.

> Absolute file references would be left alone.

Definitely! I guess relative file references would also be left alone, right?

johndgiese commented on August 29, 2024

Here is an example:

If we have

documents/file.md

which contains:

![](./images/image.png)
![](/users/dg/file.png)
![](https://githubcdn.com/asdf/asdf/other.png)

And we run `rdm render --download-to=release/images documents/file.md > release/file.md`

It would download https://githubcdn.com/asdf/asdf/other.png into release/images/ (using some name generation algorithm), and the output file would then be:

![](./images/image.png)
![](/users/dg/file.png)
![](./images/hashhash.png)
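A minimal sketch of that rewrite step, assuming images use the plain `![alt](target)` Markdown form and that the "name generation algorithm" is a SHA-1 hash of the URL plus the original extension. Both are assumptions for illustration only; `rewrite_image_urls` and `hashed_name` are hypothetical names, not part of rdm. The function only computes the new references and returns a URL-to-filename map; the actual download is left to the caller.

```python
import hashlib
import posixpath
import re
from urllib.parse import urlparse

# Matches ![alt](target); titles and nested brackets are not handled.
IMAGE_PATTERN = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')


def hashed_name(url):
    """Assumed naming scheme: SHA-1 of the URL plus the original extension."""
    extension = posixpath.splitext(urlparse(url).path)[1]
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + extension


def rewrite_image_urls(markdown, image_dir):
    """Point http(s) image references at image_dir; leave file paths alone.

    Returns the rewritten text and a dict mapping each URL to its new filename.
    """
    downloads = {}

    def replace(match):
        alt, target = match.group(1), match.group(2)
        if urlparse(target).scheme not in ("http", "https"):
            return match.group(0)  # relative or absolute file reference
        name = downloads.setdefault(target, hashed_name(target))
        return "![{}]({}/{})".format(alt, image_dir.rstrip("/"), name)

    return IMAGE_PATTERN.sub(replace, markdown), downloads
```

Calling `rewrite_image_urls(text, "./images")` on the three-line example leaves the first two references untouched and rewrites only the URL.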

johndgiese commented on August 29, 2024

Perhaps if no download-to argument is passed in, we could just leave the URLs alone? This way, if someone didn't want to fill their git repo with a bunch of images from GitHub issues, they wouldn't have to. Just an idea... I guess we could also make it so the download-to option worked with rdm tex in the same way, so if people wanted to download images in the second stage, that would work too. I think this is how we would operate with photonicare... since the markdown files don't really matter for them.

Hmm, this is kind of complicated

orwonthe commented on August 29, 2024

I am thinking that using the input document path as a one-item search path works for all the cases so far. A 'download-to' option for URLs works if it is presumed to be either absolute, or relative to the output document base, or None for no downloads. rdm tex can look for a translate-to option. If it is missing, rdm tex could simply translate into the source folder, changing only the extension.
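A rough Python sketch of those two rules, assuming the option values arrive as plain strings or None. The helper names `resolve_download_dir` and `translated_path` are hypothetical, and the `.pdf` default extension is taken from the earlier .svg-to-.pdf discussion.

```python
import os
from typing import Optional


def resolve_download_dir(download_to: Optional[str], output_base: str) -> Optional[str]:
    """Interpret download-to: None disables downloads, absolute paths are
    used as given, relative paths are taken from the output document base."""
    if download_to is None:
        return None
    if os.path.isabs(download_to):
        return download_to
    return os.path.normpath(os.path.join(output_base, download_to))


def translated_path(source_image: str, translate_to: Optional[str] = None,
                    extension: str = ".pdf") -> str:
    """Where a converted image would go: the translate-to folder if given,
    otherwise next to the source image with only the extension changed."""
    stem = os.path.splitext(os.path.basename(source_image))[0]
    folder = translate_to if translate_to is not None else os.path.dirname(source_image)
    return os.path.join(folder, stem + extension)
```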

johndgiese commented on August 29, 2024

I think there are a few requirements that weren't apparent initially:

1. We need to be able to download images from URLs so that we can generate PDFs from the tex files. This was the original requirement I had in mind.
2. I can see use cases where we would want to download images from URLs when we render the markdown files too. However, if we implemented this, we would still need to support not downloading the images from URLs, since some users (and in particular our two clients that use RDM) won't want to muck up their git repos with a bunch of miscellaneous screenshots etc. downloaded from GitHub issues.

It seems like meeting requirement 2 is much more difficult and complicated than meeting requirement 1. This is because the tex files involved with requirement 1 are temporary intermediate files, and the PDFs store images within themselves. Thus, to solve 1 we can simply download each image URL, save it as a hashed filename in regulatory/tmp and expand the filenames in the generated tex files to absolute paths. We wouldn't need to add any new command line options or configuration, and we wouldn't need to come up with a good url-to-filename algorithm.
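The approach for requirement 1 could look roughly like the sketch below, which also folds in the earlier suggestion to skip images that are already in regulatory/tmp. It is an illustration under stated assumptions, not rdm's implementation: the SHA-1-of-URL filename, the helper names, and the injectable `download` parameter (there so the logic can be exercised without network access) are all hypothetical.

```python
import hashlib
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def cached_image_path(url, cache_dir="regulatory/tmp"):
    """Absolute path of the hashed filename for a URL (assumed scheme:
    SHA-1 of the URL plus the original file extension)."""
    extension = posixpath.splitext(urlparse(url).path)[1]
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + extension
    return os.path.abspath(os.path.join(cache_dir, name))


def fetch_image(url, cache_dir="regulatory/tmp", download=urllib.request.urlretrieve):
    """Download url into cache_dir unless it is already there.

    Returns the absolute path to write into the generated tex file.
    """
    path = cached_image_path(url, cache_dir)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        download(url, path)
    return path
```

Because the returned path is absolute, the generated tex file does not depend on where the intermediate file itself is written.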

I think requirement 2 only matters if somebody wants the markdown files to be the "official" version of the documents. I can envision somebody wanting this, but so far, everybody wants to use PDFs or Word Documents for the official version, since the documents need to be shared with non-developers who don't use GitHub and in general have little interest in even signing in to browse a markdown file. If the PDFs or Word Documents are "official", then there is much less concern that linked images will change out from under you.

I say "less", since there still is "some" concern. Furthermore, if there was more concern about the instability of the images in URLs, it seems that there are other ways to download and store them in git that would be cleaner than a fully automated solution. E.g., you could write a script that downloads the GitHub issue images into a folder, and add a phony recipe to the makefile for this. By doing it this way, users could retain some organization instead of just seeing hashes in the file names. Of course, this is more work, but it is a possible workaround if someone really wanted it.

Given some of the complexity involved with meeting 2 with paths, and with having to store images in git, I'm starting to think we should punt on requirement 2 for now, and just meet 1. This way, we can move on to more pressing issues that have an immediate need (like producing word documents). If we went this route, we should create an issue for requirement 2 and write out our progress so far on this, for the record, and so other RDM users can comment on it.

Let me know what you think about this reasoning.

orwonthe commented on August 29, 2024

URLs can be downloaded in rdm tex or rdm render or neither. That is all easy. The only difficulty is the ambiguity of file based images that use relative paths. This is most easily solved in rdm render since it has the starting source path. We could either copy images or adjust relative paths.

johndgiese commented on August 29, 2024

This is now addressed using pandoc's --extract-media option:

--extract-media=DIR
Extract images and other media contained in or linked from the source document to the path DIR, creating it if necessary, and adjust the images references in the document so they point to the extracted files. If the source format is a binary container (docx, epub, or odt), the media is extracted from the container and the original filenames are used. Otherwise the media is read from the file system or downloaded, and new filenames are constructed based on SHA1 hashes of the contents.
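As an illustration, a pandoc invocation using this option could be assembled from Python as below. The file and folder names are reused from the earlier `documents/file.md` example and are assumptions, as is the helper name; how rdm itself calls pandoc is not shown in this thread.

```python
def pandoc_extract_media_command(source, output, media_dir):
    """Build a pandoc argument list that writes `output` and extracts any
    linked or embedded media into `media_dir`."""
    return ["pandoc", "--extract-media=" + media_dir, source, "-o", output]


command = pandoc_extract_media_command(
    "documents/file.md", "release/file.md", "release/images"
)
# The list can be handed to subprocess.run(command, check=True)
# on a machine where pandoc is installed.
print(" ".join(command))
```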
