evmer / perlego-downloader

Download books from Perlego.com in PDF format

License: MIT License

Python 100.00%

download ebook ebook-downloader pdf perlego

perlego-downloader's Issues

uvloop does not support Windows at the moment

uvloop does not support Windows at the moment, which makes the whole app incompatible with Windows. Are there any workarounds?
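
One possible workaround, sketched here under the assumption that the script uses uvloop only as a faster drop-in event loop: make the import optional and fall back to the standard asyncio loop on Windows. The helper name `install_fast_loop` and the `main` coroutine are hypothetical and not part of the project.

```python
import asyncio
import sys


def install_fast_loop() -> bool:
    """Switch asyncio to uvloop when possible; return True if it was installed."""
    if sys.platform == "win32":
        # uvloop refuses to build on Windows, so keep the default asyncio loop.
        return False
    try:
        import uvloop  # optional dependency
    except ImportError:
        return False
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True


async def main() -> str:
    # Placeholder for the script's real coroutine.
    await asyncio.sleep(0)
    return "done"


using_uvloop = install_fast_loop()
result = asyncio.run(main())
```

With this pattern uvloop could also be dropped from the Windows install step, since the import is only attempted on other platforms.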

Script stops after downloading a few chapters

I'm not sure whether the script has a timeout limit, but after printing "chapters 74-75 downloaded" it stops making any progress. I also can't see the cache file created in the directory.
I'm running Ubuntu Linux.

browser closed

Browser closed unexpectedly:
line 171, in html2pdf
browser = await launch(options={
line 249, in
asyncio.run(html2pdf())

Request: Make this work with books that are in PDF format already

Hi, this seems great, but I believe that it only works with books that are in ePub format on Perlego. Is there any chance it could be expanded to include books that are in .pdf format already?

Incorrect PDF

Hi, don't know if you can help with this issue...

I eventually got the script to run after making a few modifications:
I added encoding to line 140
f = open(f'epub_{BOOK_ID}/{page_no}.html', 'w', encoding='utf-8')
and on line 146 I included the option to enable-local-file-access
pdfkit.from_file([f'epub_{BOOK_ID}/{i}.html' for i in range(page_no)], f'{BOOK_ID}.pdf', options={'encoding': 'UTF-8', 'enable-local-file-access': None})

However, the resulting PDF seems incomplete.
When I delete the PDF and re-run the script the PDF is always in the wrong order, and sometimes I get chapters 1, 3, 9, and 11 (and no others), and other times I get different chapters. The number of pages in the resulting PDF varies as well.

x

uvloop does not support windows

Collecting uvloop
Using cached uvloop-0.16.0.tar.gz (2.1 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Hp\AppData\Local\Temp\pip-install-8yu86g2k\uvloop_b8a859a0e28f49d49f1cf699eb8d3af0\setup.py", line 8, in
raise RuntimeError('uvloop does not support Windows at the moment')
RuntimeError: uvloop does not support Windows at the moment
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Images in PDF file not visible

It seems that images are not extracted from the epub web version correctly - only placeholders appear in the PDF file. So far I only had these problems with the publisher 'Kohlhammer'.

Example: 1074828/treatment-of-eating-disorders-by-emotion-regulation-pdf

Import "requests" /"websocket" /"pyppeteer" /"PyPDF2" could not be resolved from source Pylance

How can I solve this?
I installed the requirements in requirements.txt with pip install, but VS Code still gives me this error.

Not an issue but a feature request.

Hi, very useful script.
I was wondering if it's possible to make a lightweight version for Scribd too (scribd-downloader.py). There is only one solution on GitHub and it has issues.
Thanks.

The read operation timed out

My internet is fine. What does this "read operation timed out" error mean?

ModuleNotFoundError: No module named 'PIL'

Script Error: "PUPPETEER_THREADS" not defined

Facing this issue on windows 11.

Unavailable in your region?

Hi again,

I've managed to get the script working. However, for some books in this series the script fails with "download error: connection timed out". Could this be due to my region?

Could someone check whether they have the same issue? Here is one of the book IDs: 3286443
The title is: PPI California Civil Seismic Principles Practice Exams, 12th Edition

As always, thanks in advance!

Issue on Windows 10

I received this message: "perlego-downloader-main\downloader.py", line 2, in from PIL import Image ModuleNotFoundError: No module named 'PIL'. I'm not sure which step I did wrong.

No module named 'PIL'

C:\Users\Ioana>python3 C:\Users\Ioana\Desktop\perlego\downloader.py
Traceback (most recent call last):
File "C:\Users\Ioana\Desktop\perlego\downloader.py", line 2, in
from PIL import Image
ModuleNotFoundError: No module named 'PIL'
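
`PIL` is the import name of the Pillow package, so `pip install Pillow`, run with the same interpreter that runs the script, normally resolves this. The sketch below reports which imports are missing and what to install for them. The helper `missing_packages` and the module list are illustrative assumptions, not copied from the project's requirements.txt.

```python
import importlib.util

# Import name -> name to pass to "pip install".
PIP_NAMES = {
    "PIL": "Pillow",
    "websocket": "websocket-client",
    "requests": "requests",
}


def missing_packages(import_names):
    """Return the pip names of the modules that cannot be imported."""
    missing = []
    for name in import_names:
        if importlib.util.find_spec(name) is None:
            missing.append(PIP_NAMES.get(name, name))
    return missing


if __name__ == "__main__":
    todo = missing_packages(PIP_NAMES)
    if todo:
        print("Missing, try: pip install " + " ".join(todo))
    else:
        print("All listed modules can be imported.")
```

If the check passes in a terminal but the script or editor still fails, the packages were probably installed for a different Python interpreter than the one being used.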

Error Code 2

Hi! I can't download a book. The download output is:

chapters 0-1 downloaded
chapters 1-2 downloaded
chapters 2-4 downloaded
chapters 4-5 downloaded
chapters 5-8 downloaded
chapters 8-13 downloaded
chapters 13-22 downloaded
chapters 22-45 downloaded
chapters 45-70 downloaded
chapters 70-79 downloaded
chapters 79-90 downloaded
SystemExit: {'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 2}}

What does the error with 'code 2' mean?
The same error occurs on Windows 10 and Linux systems.

I am getting an issue with the script

The download takes so long that I have not been able to download a single book with this script.
I only get this screen, and surprisingly it shows a file size of 86.8 MB for every book.
Please help me with this. I am a medical student and I want to download some books.
It would be great if you could reply.

Issues: CERTIFICATE VERIFY FAILED

After running the script, the following error is shown:

init_book_delivery() error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)

I am sure I copied and pasted everything correctly.

Please help.

Pyppeteer Error

Hi there,

I'm having issues with PDF books. So far all EPUBs work well.

The book ID is 3361318. A folder is created with all the downloaded pages in HTML format, but an error comes up before the script converts the HTML files to PDF and stitches them together.

I also had to change line 19 (PUPPETEER_THREADS = 50) to 400, because originally only the first 50 pages were output as HTML. The error occurs with both 50 and 400, but with 400 at least all 255 pages are output as HTML rather than just the first 50.

This is the last part of the macOS terminal output:

page 221-222 downloaded
page 222-223 downloaded
page 223-224 downloaded
page 224-225 downloaded
Traceback (most recent call last):
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 242, in
asyncio.run(html2pdf())
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 238, in html2pdf
await asyncio.gather(*[render_page(chapter_no, sem) for chapter_no in contents])
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 221, in render_page
await page.goto(f'file://{cache_dir}/{chapter_no}.html', {"waitUntil" : ["load", "domcontentloaded", "networkidle0", "networkidle2"]})
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyppeteer/page.py", line 837, in goto
raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.

Download all book except one page

Hi!
There were several books I wanted to download but couldn't, because one of the pages in each book does not display on the Perlego website.
Is there a way to change the code to download the whole book except for a few selected pages?
Thanks a lot.

Page UTF-8 encoding issue?

I have come across some books with what look like character encoding issues.
They affect seemingly random pages (though always the same pages if I redo the download) in only certain books.

Characters such as â€¢Â or ÂÂÂ appear across these pages.
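
Sequences like these are typical of UTF-8 bytes being decoded as Windows-1252. The sketch below reproduces the effect for a bullet and a non-breaking space and then reverses it. Whether this is the actual cause on the affected pages is an assumption.

```python
# A bullet (U+2022) is three bytes in UTF-8: E2 80 A2.
original = "\u2022"

# Decoding those bytes as Windows-1252 yields three unrelated characters.
mojibake = original.encode("utf-8").decode("cp1252")

# Reversing the wrong decode recovers the original text.
repaired = mojibake.encode("cp1252").decode("utf-8")

# A non-breaking space (U+00A0) turns into "Â" followed by the space itself,
# which explains the stray "Â" characters.
nbsp_mojibake = "\u00a0".encode("utf-8").decode("cp1252")

print(repr(mojibake), repr(repaired), repr(nbsp_mojibake))
```

If the pages are affected this way, reading and writing the HTML files with an explicit `encoding='utf-8'` is the usual fix.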

High Resources usage

The script works, but I can't finish a download because I have to kill it due to high resource usage. I tried ulimit for RAM and cpulimit too, but whenever the code launches Chrome, Chrome's resource usage skyrockets to the point where I have to shut down the computer if I wait too long. I gather it might be possible to limit resource usage from within the script, but I'm not sure whether that would apply to the Chrome processes launched by the script.

Connection timed out

Greetings,

I'm on Mac OS 16 and I followed the instructions exactly, step by step, but every time I run the command it says:
perlego % python3 downloader.py
download error: Connection timed out

Please help. What should I do?

OSError: [WinError 14001] The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail

help

Please help me. I tried on Ubuntu but it gave me an error.

Ubuntu under WSL: Browser closed unexpectedly

Traceback (most recent call last):
File "downloader.py", line 243, in
asyncio.run(html2pdf())
File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "downloader.py", line 164, in html2pdf
browser = await launch(options={
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

init_book_delivery() error

I am using macOS. I followed all the steps and installed the required modules.
After filling in all the tokens and running the script, it shows:
init_book_delivery() error: module 'websocket' has no attribute 'create_connection'

How to fix this problem?

thank you!

KeyError: 'event' line 70, in <module>

Greetings everyone,

I've updated to the latest files, but I still have the following issue. I hope you can help.

% python3 downloader.py
chapters 0-1 downloaded
chapters 1-2 downloaded
chapters 2-3 downloaded
chapters 3-4 downloaded
chapters 4-5 downloaded
chapters 5-22 downloaded
chapters 22-34 downloaded
chapters 34-48 downloaded
chapters 48-49 downloaded
Traceback (most recent call last):
File "/Users/user/perlego/downloader.py", line 70, in
if data['event'] == 'error':
KeyError: 'event'
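
The traceback shows the script indexing `data['event']` on a message that has no such key. A defensive check along the following lines would avoid the KeyError. The message shapes are assumptions based on the error output quoted in these issues, and `is_error_message` is a hypothetical helper, not a function from the script.

```python
import json


def is_error_message(raw: str) -> bool:
    """Return True only for JSON objects whose 'event' field equals 'error'."""
    data = json.loads(raw)
    # dict.get returns None for a missing key instead of raising KeyError.
    return isinstance(data, dict) and data.get("event") == "error"


print(is_error_message('{"event": "error", "data": {"code": 2}}'))
print(is_error_message('{"data": {"content": "..."}}'))
```

This only prevents the crash; a message without an `event` field may still indicate that the server sent something the script does not expect.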

RuntimeError: asyncio.run() cannot be called from a running event loop

Hi, thank you for providing this nice code.

There is an error like this:
"
page 136-137 downloaded
page 137-138 downloaded
page 138-139 downloaded
Traceback (most recent call last):

File ~\untitled0.py:250 in
asyncio.run(html2pdf())

File ~\anaconda3\lib\asyncio\runners.py:33 in run
raise RuntimeError(

RuntimeError: asyncio.run() cannot be called from a running event loop
"

How can I fix it, please?
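
This error usually appears when a script is run inside an IDE or notebook console that already has an event loop running; starting it from a plain terminal with `python downloader.py` normally avoids it. As an alternative, the sketch below runs a coroutine in a separate thread when a loop is already active. The names `run_coroutine` and `sample` are hypothetical, and `sample` merely stands in for the script's `html2pdf()`.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_coroutine(coro):
    """Run a coroutine to completion, even when an event loop is already running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run works as usual.
        return asyncio.run(coro)
    # A loop is already running here, so give the coroutine its own loop
    # in a worker thread and wait for the result.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def sample() -> int:
    # Stand-in for the real coroutine.
    await asyncio.sleep(0)
    return 42


print(run_coroutine(sample()))
```

Note that the calling thread blocks until the coroutine finishes, which is acceptable for a one-off script but not for code that must keep the outer loop responsive.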

PermissionError: [Errno 13] Permission denied:

Traceback (most recent call last):
File "/home/xxxxx/perlego-downloader/downloader.py", line 156, in
os.mkdir(cache_dir)
PermissionError: [Errno 13] Permission denied: '/home/xxxx/perlego-downloader/PDF_144322/'

Error : Too many files open

Firstly, this is a good piece of code and a great learning resource for us novices.
Secondly, the code breaks when the number of pages exceeds approximately 1000, with the error "Too many files open".
I've reduced the number of threads from 50 to 20:

PUPPETEER_THREADS = 20

I'm still facing the same issue.
Thanks.
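
On Linux and macOS this error means the process hit its limit on open file descriptors. One thing to try, sketched below, is raising the soft limit toward the hard limit before the conversion starts. The function name `raise_open_file_limit` and the target of 4096 are assumptions, the `resource` module is not available on Windows, and this does not help if the script never closes the files it opens.

```python
import resource  # Unix only


def raise_open_file_limit(target: int = 4096) -> int:
    """Raise the soft limit on open files up to `target`; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot go above the hard limit.
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft


print("soft limit on open files:", raise_open_file_limit())
```

The shell equivalent is `ulimit -n 4096` in the terminal session that launches the script.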

'Browser closed unexpectedly' error

I have downloaded the new script but I'm still getting errors. The book ID I'm trying is 1323963; maybe it has something to do with it? I hope this helps.

Connection to remote host was lost?

Hi all!

Sorry, I'm relatively new to coding, so hopefully it's an easy fix for someone out there. I followed the instructions in your video, but I'm receiving the following error:

"websocket._exceptions.WebSocketConnectionClosedException: Connection to remote host was lost."

Any help would be appreciated. Thanks in advance

error

Hello,
I am getting this error:

building pdf...
Traceback (most recent call last):
File "downloader.py", line 141, in
f.write(content)
File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ue000' in position 15534: character maps to <undefined>
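
On Windows, `open()` without an `encoding` argument falls back to the system code page (cp1252 here), which cannot represent characters such as `'\ue000'`. A minimal sketch of the usual fix follows, assuming the file at line 141 is opened in text mode; the file name and content are made up for the example.

```python
import os
import tempfile

content = "<p>private-use glyph: \ue000</p>"
path = os.path.join(tempfile.mkdtemp(), "0.html")

# An explicit encoding makes the write independent of the system code page.
with open(path, "w", encoding="utf-8") as f:
    f.write(content)

# Read the file back with the same encoding.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == content)
```

The same `encoding='utf-8'` argument is what another reporter in this list added to the `open()` call in the script.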

Unexpected TOKEN - Windows 7 - Error

Hello, first, thank you very much for this great tool.
Second, I'm having this error on Windows 7:

Unexpected token 'C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe' in expression or statement.
At line:1 char:914

'C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe --disable-bac
kground-networking --disable-background-timer-throttling --disable-breakpad --disable-browser-side-navigation --disable
-client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=
site-per-process --disable-hang-monitor --disable-popup-blocking--disable-prompt-on-repost --disable-sync --disable-tra
nslate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=
basic--use-mock-keychain --headless --hide-scrollbars --mute-audio --disable-gpu about:blank --remote-debugging-port=17
61 --user-data-dir=C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\.dev_profile\tmpa5fw9qio'C:\Users\Mac\AppD
ata\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe <<<< --disable-background-networkin
g --disable-background-timer-throttling --disable-breakpad --disable-browser-side-navigation --disable-client-side-phis
hing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process
--disable-hang-monitor --disable-popup-blocking--disable-prompt-on-repost --disable-sync --disable-translate --metrics-
recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=basic--use-mock-k
eychain --headless --hide-scrollbars --mute-audio --disable-gpu about:blank --remote-debugging-port=1761 --user-data-di
r=C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\.dev_profile\tmpa5fw9qio
- CategoryInfo : ParserError: (C:\Users\Mac...n32\chrome.exe:String) [], ParentContainsErrorRecordExc
  eption
- FullyQualifiedErrorId : UnexpectedToken

PS C:\python\python38>

Could you help me please?
Thanks in advance.

help

First of all, thank you very much for this script.
Could you please create a similar one for Scribd books?
They have a very large library and it would be very helpful for people :)

websocket exception error

PDF font size

Is there any way to increase the PDF font size on the output? By default it comes out quite small and it's hard on my old eyes. I found a way to adjust the page margins, but no luck on font size. Any guidance?

Errors when merging pdf pages

Hi all, nice script.
I failed to merge the created pdfs.
How can I fix it? Thanks

download error: Connection timed out

Hello again,

Sorry to bother you again. I tried running the script again, to no avail :(

It's giving me an error "download error: Connection timed out"

Thanks in advance for your assistance !

Please could someone help

After trying for hours, I'm unable to get past the browser error problem.

I'm just looking to download book ID 2059382.

Could someone kindly email me the EPUB or PDF file at: [email protected]

Sorry to post here, and thank you so much in advance.

Crash on Ubuntu 22.04

Running on Ubuntu 22.04 on an AWS VM. It downloads all the pieces but then when it launches the Chromium module it crashes.

Traceback (most recent call last):
File "/home/ubuntu/src/perlego-downloader/downloader.py", line 250, in
asyncio.run(html2pdf())
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/home/ubuntu/src/perlego-downloader/downloader.py", line 171, in html2pdf
browser = await launch(options={
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

paragraphs issue

Please check: the whole book has this text issue.

This is the book link:
https://www.perlego.com/book/3294395/second-language-pronunciation-bridging-the-gap-between-research-and-teaching-pdf

How to solve 'Memory error'?

Hi all! It seems that my computer does not have enough RAM to process.

I could only download around 600 pages of the PDF before my cmd showed a memory error.

Is it possible to specify which page to start from, so that I could download the book in two parts?

I am using a Ryzen 5 3600 and 16 GB of RAM.

line 2, in <module> from PIL import Image ModuleNotFoundError: No module named 'PIL'

x

JSONDecodeError("Expecting value", s, err.value) from None

I am new to Python and don't know much about it. I'm getting the error below. Can you please help me with it?

C:\Users\dell\Downloads\perlego-downloader-main>python3 downloader.py
Traceback (most recent call last):
File "C:\Users\dell\Downloads\perlego-downloader-main\downloader.py", line 38, in
data_content = json.loads(json.loads(data['data']['content']))
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)