Hello, is there any option to download attachments and save them locally too? Some webp


Attachment(s) download (goscrape, closed)

cornelk commented on August 16, 2024

Attachment(s) download

from goscrape.

Comments (7)

lotphi commented on August 16, 2024

Why are PDF documents converted to HTML files? Is it a bug?

cornelk commented on August 16, 2024

Thanks for the report. Do you have an example website to test with?

lotphi commented on August 16, 2024

https://eetaa722.fr/
The PDF links are at the end of the main page. Here is one of them:
<p style="padding-left: 30px;"><a href="https://eetaa722.fr/wp-content/uploads/2023/02/4-STI2D-term.pdf" rel="attachment wp-att-6320">4-STI2D Term</a></p>

[Screenshot]

This page also has a problem:
http://eetaa722.fr/index.php/mentions-legales/

cornelk commented on August 16, 2024

This has now been fixed in the latest version on the main branch. Let me know if you still see any issues.

lotphi commented on August 16, 2024

Sorry, but I still see issues. Here is a log demonstrating that PDF files are downloaded as HTML:

```
2023-04-01T19:53:10.986+0200 DEBUG HTML Element relinked {"URL": "https://eetaa722.fr/wp-content/uploads/2023/01/Aide-pour-linscription-dans-demarches-simplifiees.pdf", "Fixed": "wp-content/uploads/2023/01/Aide-pour-linscription-dans-demarches-simplifiees.html"}
2023-04-01T19:53:10.989+0200 DEBUG HTML Element relinked {"URL": "https://eetaa722.fr/wp-content/uploads/2023/01/Declaration-du-representant-legal.pdf", "Fixed": "wp-content/uploads/2023/01/Declaration-du-representant-legal.html"}
2023-04-01T19:53:10.989+0200 DEBUG HTML Element relinked {"URL": "https://eetaa722.fr/wp-content/uploads/2023/01/Attestation-sur-lhonneur.pdf", "Fixed": "wp-content/uploads/2023/01/Attestation-sur-lhonneur.html"}
```

cornelk commented on August 16, 2024

@lotphi make sure to install the latest dev version:

```
go install github.com/cornelk/goscrape@main
goscrape -v https://eetaa722.fr/
...
2023-04-04 12:21:57  DEBUG   HTML Element relinked {"url":"https://eetaa722.fr/wp-content/uploads/2023/01/Declaration-du-representant-legal.pdf","fixed_url":"wp-content/uploads/2023/01/Declaration-du-representant-legal.pdf"}
```
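The fixed log shows the `.pdf` extension now survives relinking instead of being rewritten to `.html`. To illustrate the kind of rule involved, here is a minimal Go sketch. `relinkPath` is a hypothetical helper, not goscrape's actual code, and it assumes that only extensionless, directory-style, or `.php` URLs should be mapped to `.html`, while attachments such as `.pdf` or `.webp` keep their original extension.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// relinkPath is a hypothetical helper (NOT goscrape's real API) that maps a
// remote URL to the local relative path written into a relinked page.
// Assumption: only "page-like" URLs get an .html extension; files that
// already carry a non-page extension (e.g. .pdf) keep it unchanged.
func relinkPath(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	p := strings.TrimPrefix(u.Path, "/")

	// Directory-style URLs (root or trailing slash) become an index page.
	if p == "" || strings.HasSuffix(p, "/") {
		return p + "index.html", nil
	}

	switch strings.ToLower(path.Ext(p)) {
	case "":
		// No extension: assume the server returns an HTML page.
		return p + ".html", nil
	case ".php":
		// Assumed server-side script that renders HTML: swap the extension.
		return strings.TrimSuffix(p, path.Ext(p)) + ".html", nil
	default:
		// Attachments (.pdf, .webp, ...) and .html files keep their extension.
		return p, nil
	}
}

func main() {
	for _, u := range []string{
		"https://eetaa722.fr/wp-content/uploads/2023/01/Declaration-du-representant-legal.pdf",
		"http://eetaa722.fr/index.php/mentions-legales/",
	} {
		p, err := relinkPath(u)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(p)
	}
}
```

Deciding by extension alone is a simplification; a scraper could also consult the response's `Content-Type` header, which would handle servers that serve PDFs from extensionless URLs.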

lotphi commented on August 16, 2024

Perfect! Thanks a lot.
