evmer / perlego-downloader
Download books from Perlego.com in PDF format
License: MIT License
uvloop does not support Windows at the moment, which makes the whole app incompatible with Windows. Are there any workarounds?
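One possible workaround, sketched under the assumption that the script only uses uvloop as a faster drop-in event loop: make the import optional and fall back to the standard asyncio loop on Windows. The helper name `install_fast_loop` is hypothetical and not part of the repository.

```python
import asyncio
import sys


def install_fast_loop() -> bool:
    """Install uvloop if it is usable; return True when it was installed.

    Assumption: nothing else in the script depends on uvloop-specific
    behaviour, so the default asyncio loop is an acceptable fallback.
    """
    if sys.platform == "win32":
        # uvloop refuses to build on Windows, so do not even try to import it.
        return False
    try:
        import uvloop
    except ImportError:
        return False
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True


# Call once before asyncio.run(...); the rest of the script stays unchanged.
using_uvloop = install_fast_loop()
```

On Windows the uvloop line would also have to be skipped at install time, for example with an environment marker such as `uvloop; sys_platform != "win32"` in requirements.txt.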
I'm not sure whether the script has a timeout limit, but after printing "chapters 74-75 downloaded"
it stops making any progress, and I can't see the cache file created in the directory.
I'm running Ubuntu Linux.
Browser closed unexpectedly:
line 171, in html2pdf
browser = await launch(options={
line 249, in <module>
asyncio.run(html2pdf())
Hi, this seems great, but I believe that it only works with books that are in ePub format on Perlego. Is there any chance it could be expanded to include books that are in .pdf format already?
Hi, don't know if you can help with this issue...
I eventually got the script to run after making a few modifications:
I added encoding to line 140
f = open(f'epub_{BOOK_ID}/{page_no}.html', 'w', encoding='utf-8')
and on line 146 I included the option to enable-local-file-access
pdfkit.from_file([f'epub_{BOOK_ID}/{i}.html' for i in range(page_no)], f'{BOOK_ID}.pdf', options={'encoding': 'UTF-8', 'enable-local-file-access': None})
However, the resulting PDF seems incomplete.
When I delete the PDF and re-run the script the PDF is always in the wrong order, and sometimes I get chapters 1, 3, 9, and 11 (and no others), and other times I get different chapters. The number of pages in the resulting PDF varies as well.
Collecting uvloop
Using cached uvloop-0.16.0.tar.gz (2.1 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Hp\AppData\Local\Temp\pip-install-8yu86g2k\uvloop_b8a859a0e28f49d49f1cf699eb8d3af0\setup.py", line 8, in
raise RuntimeError('uvloop does not support Windows at the moment')
RuntimeError: uvloop does not support Windows at the moment
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
It seems that images are not extracted from the epub web version correctly - only placeholders appear in the PDF file. So far I only had these problems with the publisher 'Kohlhammer'.
Example: 1074828/treatment-of-eating-disorders-by-emotion-regulation-pdf
how to solve this?
I installed the requirements from requirements.txt with pip install, but it is still giving me this error in VS Code.
Hi, very useful script.
I was wondering if it's possible to make a lightweight version for Scribd too (scribd-downloader.py). There is only one solution on GitHub and it has issues.
Thanks.
My internet connection is fine. What does this "read operation timed out" error mean?
Hi again,
I've managed to get the script working. However, for some books in this series the script fails with "download error: connection timed out" (perhaps due to region?).
Could someone check whether they have the same issue? Here is one of the book IDs: 3286443
The title is: PPI California Civil Seismic Principles Practice Exams, 12th Edition
As always, thanks in advance!
I received a message that says: "perlego-downloader-main\downloader.py", line 2, in <module> from PIL import Image ModuleNotFoundError: No module named 'PIL'. I'm not sure which step I got wrong.
C:\Users\Ioana>python3 C:\Users\Ioana\Desktop\perlego\downloader.py
Traceback (most recent call last):
File "C:\Users\Ioana\Desktop\perlego\downloader.py", line 2, in
from PIL import Image
ModuleNotFoundError: No module named 'PIL'
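`PIL` is the import name of the Pillow package, so this error normally means Pillow is not installed for the interpreter that runs the script (installing with `python3 -m pip install Pillow` targets the same `python3`). A small sketch for checking this before running the downloader; the function name and the module list are assumptions drawn from the errors quoted in these issues, not from the repository's requirements.txt.

```python
import importlib.util

# Import name -> name to pass to pip. The two differ for these packages.
# Assumed list, based only on modules mentioned in the error reports here.
REQUIRED = {
    "PIL": "Pillow",
    "websocket": "websocket-client",
    "pyppeteer": "pyppeteer",
}


def missing_packages(required):
    """Return the pip names of modules that cannot be imported."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Install with: python -m pip install " + " ".join(missing))
```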
Hi! I can't download a book. The download output is:
chapters 0-1 downloaded
chapters 1-2 downloaded
chapters 2-4 downloaded
chapters 4-5 downloaded
chapters 5-8 downloaded
chapters 8-13 downloaded
chapters 13-22 downloaded
chapters 22-45 downloaded
chapters 45-70 downloaded
chapters 70-79 downloaded
chapters 79-90 downloaded
SystemExit: {'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 2}}
What does the error with 'code': 2 mean?
The same error occurs on Windows 10 and Linux systems.
It takes so long to download that I have not been able to download a single book with this script.
I only get this screen; surprisingly, for every book it shows that the file size would be 86.8 MB.
Please help me out with this; I am a medical student and I want to download some books.
It would be great if you could reply.
After running the script, the error below is shown:
init_book_delivery() error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)
I am sure I copied and pasted everything correctly.
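This error usually means Python could not find a CA bundle to verify the server certificate against, rather than a copy-paste mistake. A minimal sketch, assuming the optional third-party `certifi` package is an acceptable source of CA certificates; the helper name `build_ssl_context` is hypothetical and not part of the script.

```python
import ssl


def build_ssl_context() -> ssl.SSLContext:
    """Return a verifying SSL context, preferring certifi's CA bundle.

    Falls back to the system default store when certifi is not installed.
    Certificate verification stays enabled in both cases.
    """
    try:
        import certifi
    except ImportError:
        return ssl.create_default_context()
    return ssl.create_default_context(cafile=certifi.where())


context = build_ssl_context()
```

If the script connects through websocket-client, that library's `sslopt` argument (for example `sslopt={"ca_certs": certifi.where()}`) is one place where such a bundle could be passed; whether the script exposes this is not confirmed here.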
Hi there,
I'm having issues with PDF books; so far all EPUBs work well.
The book ID is 3361318. A folder is created with all the downloaded pages in HTML format, but an error comes up before the script converts the HTML files to PDF and stitches them together.
I also had to change line 19 (PUPPETEER_THREADS = 50) to 400, because it was originally only outputting the first 50 pages to HTML. The error occurs with both 50 and 400, but at least with 400 all 255 pages are output as HTML rather than just the first 50.
page 221-222 downloaded
page 222-223 downloaded
page 223-224 downloaded
page 224-225 downloaded
Traceback (most recent call last):
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 242, in
asyncio.run(html2pdf())
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 238, in html2pdf
await asyncio.gather(*[render_page(chapter_no, sem) for chapter_no in contents])
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 221, in render_page
await page.goto(f'file://{cache_dir}/{chapter_no}.html', {"waitUntil" : ["load", "domcontentloaded", "networkidle0", "networkidle2"]})
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyppeteer/page.py", line 837, in goto
raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.
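The 30000 ms in this message is pyppeteer's default navigation timeout; `page.goto` accepts a `timeout` option in milliseconds (0 disables it). Besides raising that value, a page that times out could be retried instead of aborting the whole run. The sketch below is a generic retry helper under that assumption; `retry_async` is a hypothetical name, and the commented usage only illustrates how it might wrap the `page.goto` call from the traceback.

```python
import asyncio


async def retry_async(make_coro, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Await make_coro() up to `attempts` times.

    `make_coro` must be a zero-argument callable returning a fresh
    awaitable, because a coroutine object cannot be awaited twice.
    The last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_coro()
        except retry_on:
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)


# Hypothetical use inside the render step:
#   await retry_async(
#       lambda: page.goto(url, {"waitUntil": "networkidle0", "timeout": 120_000}),
#       retry_on=(pyppeteer.errors.TimeoutError,),
#   )
```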
Hi!
There were several books I wanted to download but couldn't, because one of the books' pages does not display on the Perlego website.
Is there a way to change the code to download the whole book except for a few selected pages?
Thanks a lot.
I come across some books with what look like character encoding issues.
They affect seemingly random pages (though always the same pages if I redo the download), and only in certain books.
Characters such as • or ÂÂÂ appear across these pages.
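Sequences like these are typical of UTF-8 bytes being decoded as Windows-1252: a bullet (•) is stored as the bytes E2 80 A2, which read as the three characters "â€¢" under cp1252. One possible cause, not confirmed for this script, is that the saved HTML files carry no charset declaration, so the browser guesses the encoding. The sketch below only demonstrates the mechanism and a reversal for diagnosis; `fix_mojibake` is a hypothetical helper.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 read as cp1252' if the text round-trips.

    Text that is not mojibake of this kind is returned unchanged,
    because the re-encode or re-decode step fails for it.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# How the garbling arises: correct text, wrong decoder.
garbled = "• item".encode("utf-8").decode("cp1252")
repaired = fix_mojibake(garbled)
```

Writing the cached HTML with an explicit UTF-8 encoding and a `<meta charset="utf-8">` tag is the more durable fix, assuming the cause above applies.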
The script works, but I can't get the job done because I have to kill it due to high resource usage. I tried ulimit for RAM and cpulimit too, but whenever the code launches Chrome, Chrome's resource usage literally skyrockets to the point where I have to shut down the computer if I wait too long. I gather it could be possible to limit resource usage through the script, but I'm not sure it would apply to the Chrome processes launched from the script.
Greetings,
I have Mac OS 16 and followed the instructions exactly, step by step, but every time I run the command it says:
perlego % python3 downloader.py
download error: Connection timed out
Please help.
OSError: [WinError 14001] The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail
Please help. I tried on Ubuntu, but it gave me an error:
Traceback (most recent call last):
File "downloader.py", line 243, in
asyncio.run(html2pdf())
File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "downloader.py", line 164, in html2pdf
browser = await launch(options={
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
I am using macOS; I followed all the steps and installed the required modules.
After filling in all the tokens and running it, it shows:
init_book_delivery() error: module 'websocket' has no attribute 'create_connection'
How can I fix this problem?
Thank you!
Greetings everyone,
I've updated to the latest files, but I still have an issue. I hope you can help with the following:
% python3 downloader.py
chapters 0-1 downloaded
chapters 1-2 downloaded
chapters 2-3 downloaded
chapters 3-4 downloaded
chapters 4-5 downloaded
chapters 5-22 downloaded
chapters 22-34 downloaded
chapters 34-48 downloaded
chapters 48-49 downloaded
Traceback (most recent call last):
File "/Users/user/perlego/downloader.py", line 70, in
if data['event'] == 'error':
KeyError: 'event'
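The KeyError indicates that a message arrived without an `event` key, which the check on line 70 does not expect. A minimal sketch of a more tolerant check, assuming messages are plain dicts as the traceback suggests; `raise_on_error` is a hypothetical helper, and what a message without `event` actually means is not known from this report.

```python
def raise_on_error(data: dict) -> dict:
    """Exit on an explicit error message; pass every other message through.

    dict.get returns None for a missing key instead of raising KeyError.
    """
    if data.get("event") == "error":
        raise SystemExit(data)
    return data


message = raise_on_error({"data": {"content": "..."}})  # no 'event' key: accepted
```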
Hi, thank you for providing nice code.
However, there is an error like this:
"
page 136-137 downloaded
page 137-138 downloaded
page 138-139 downloaded
Traceback (most recent call last):
File ~\untitled0.py:250 in
asyncio.run(html2pdf())
File ~\anaconda3\lib\asyncio\runners.py:33 in run
raise RuntimeError(
RuntimeError: asyncio.run() cannot be called from a running event loop
"
How can I fix it, please?
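This error appears when the script is run inside an environment that already has an event loop running (the `untitled0.py` and anaconda paths suggest an IDE console such as Spyder, though that is a guess). Running `python downloader.py` from a plain terminal avoids it. If the script has to run inside such a console, one option is to run the coroutine in a separate thread with its own loop; the helper below is a hypothetical sketch of that idea.

```python
import asyncio
import threading


def run_coro_blocking(coro):
    """Run `coro` to completion even if the caller is inside a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: asyncio.run works as usual.
        return asyncio.run(coro)

    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Hypothetical replacement for the last line of the script:
#   run_coro_blocking(html2pdf())
```

One caveat, stated as an assumption: signal handlers can only be registered from the main thread, so pyppeteer may need its `handleSIGINT`, `handleSIGTERM` and `handleSIGHUP` launch options set to False when launched this way.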
Traceback (most recent call last):
File "/home/xxxxx/perlego-downloader/downloader.py", line 156, in
os.mkdir(cache_dir)
PermissionError: [Errno 13] Permission denied: '/home/xxxx/perlego-downloader/PDF_144322/'
Firstly, this is a good piece of code and a lot of learning for us novices.
Secondly, the code breaks when the number of pages exceeds roughly 1000, with the error: Too many open files
I've reduced the number of threads from 50 to 20:
PUPPETEER_THREADS = 20
I'm still facing the same issue.
Thanks.
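"Too many open files" means the process hit its per-process file descriptor limit, which on many Linux and macOS setups defaults to a low soft value. A sketch of raising the soft limit from inside the script, assuming a Unix system (the `resource` module does not exist on Windows); `raise_open_file_limit` and the target of 4096 are illustrative choices, not values from the repository.

```python
import resource


def raise_open_file_limit(target: int = 4096) -> int:
    """Raise the soft RLIMIT_NOFILE towards `target`; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit may not exceed the hard limit
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        return soft  # the OS refused; keep the old limit
    return target


limit_in_effect = raise_open_file_limit()
```

The shell equivalent is `ulimit -n 4096` before starting the script. If descriptors are leaking because pages or files are never closed, raising the limit only postpones the error.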
Hi all!
Sorry, I'm relatively new to all this coding, so hopefully it's an easy fix for someone out there. I followed the instructions in your video, but I'm receiving the following error:
"websocket._exceptions.WebSocketConnectionClosedException: Connection to remote host was lost."
Any help would be appreciated. Thanks in advance
Hello,
I am getting this error:
building pdf...
Traceback (most recent call last):
File "downloader.py", line 141, in
f.write(content)
File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ue000' in position 15534: character maps to <undefined>
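On Windows, `open()` without an `encoding` argument uses the locale code page (cp1252 here), which cannot represent characters such as '\ue000'. Passing `encoding='utf-8'` when the file is opened avoids this, the same change another reporter describes making on line 140. A self-contained sketch; `write_text_utf8` is a hypothetical helper, not the script's own function.

```python
import os
import tempfile


def write_text_utf8(path: str, content: str) -> None:
    """Write text as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


# Demonstration with the character from the traceback.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "0.html")
write_text_utf8(demo_path, "<p>\ue000 private-use character</p>")
```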
Hello, first, thank you very much for this great tool.
Second, I'm having this error on Windows 7:
Unexpected token 'C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe' in expression or statement.
At line: 1 Char: 914
PS C:\python\python38>
Could you help me please?
Thanks in advance.
First of all, thank you very much for this script.
Could you please create a similar script for Scribd books?
They have a very large library, and it would be very helpful for people :)
Is there any way to increase the PDF font size on the output? By default it comes out quite small and it's hard on my old eyes. I found a way to adjust the page margins, but no luck on font size. Any guidance?
After trying for hours, I'm unable to get past the browser error problem.
I'm just looking to download book ID 2059382.
Could someone kindly email me the EPUB or PDF file at: [email protected]
Sorry to post here, but thank you so much in advance.
Running on Ubuntu 22.04 on an AWS VM. It downloads all the pieces but then when it launches the Chromium module it crashes.
`Traceback (most recent call last):
File "/home/ubuntu/src/perlego-downloader/downloader.py", line 250, in
asyncio.run(html2pdf())
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/home/ubuntu/src/perlego-downloader/downloader.py", line 171, in html2pdf
browser = await launch(options={
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
`
Please check: the whole book has this text issue.
This is the book link:
https://www.perlego.com/book/3294395/second-language-pronunciation-bridging-the-gap-between-research-and-teaching-pdf
Hi all! It seems that my computer does not have enough RAM for the process.
I could only download around 600 pages of the PDF before cmd showed a memory error.
Is it possible to specify which page to start from, so that I could download the book in two parts?
I am using a Ryzen 5 3600 and 16 GB of RAM.
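Whether the script offers a start-page option is not stated in this thread. As an illustration of the idea in the question, the sketch below splits a page range into fixed-size batches, so that each batch could be rendered and written out before the next one starts. `page_batches` and the batch size are hypothetical, and wiring this into the script's rendering step is left open.

```python
def page_batches(total_pages: int, batch_size: int) -> list:
    """Split pages 0..total_pages-1 into consecutive ranges of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        range(start, min(start + batch_size, total_pages))
        for start in range(0, total_pages, batch_size)
    ]


# Example: a 1200-page book processed 300 pages at a time.
for batch in page_batches(1200, 300):
    print(f"would render pages {batch.start}-{batch.stop - 1}")
```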
I am new to Python and don't know much. I am getting the error below; can you please help with this?
C:\Users\dell\Downloads\perlego-downloader-main>python3 downloader.py
Traceback (most recent call last):
File "C:\Users\dell\Downloads\perlego-downloader-main\downloader.py", line 38, in
data_content = json.loads(json.loads(data['data']['content']))
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
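"Expecting value: line 1 column 1 (char 0)" means the string handed to `json.loads` was empty or did not start with JSON at all. For this script that may point at an unexpected server response, for instance because of expired or mistyped tokens, but that is an assumption the traceback does not confirm. A small sketch that makes the failure easier to diagnose by showing what was actually received; `parse_content` is a hypothetical helper.

```python
import json


def parse_content(raw: str):
    """Parse a JSON string, reporting the start of the payload on failure."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        preview = raw[:80]
        raise ValueError(
            f"response is not valid JSON (length {len(raw)}, starts with {preview!r})"
        ) from exc


parsed = parse_content('{"bookType": "epub"}')  # illustrative payload only
```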