
perlego-downloader's People

Contributors

evmer, jajosheni, owohai

perlego-downloader's Issues

Script stops after downloading a few chapters

I'm not sure whether the script has a timeout limit, but after printing "chapters 74-75 downloaded"
it stops making any progress. I also can't see the cache file created in the directory.
I'm running Ubuntu Linux.

browser closed

Browser closed unexpectedly:
line 171, in html2pdf
browser = await launch(options={
line 249, in
asyncio.run(html2pdf())

Incorrect PDF

Hi, don't know if you can help with this issue...

I eventually got the script to run after making a few modifications:
I added encoding to line 140
f = open(f'epub_{BOOK_ID}/{page_no}.html', 'w', encoding='utf-8')
and on line 146 I included the option to enable-local-file-access
pdfkit.from_file([f'epub_{BOOK_ID}/{i}.html' for i in range(page_no)], f'{BOOK_ID}.pdf', options={'encoding': 'UTF-8', 'enable-local-file-access': None})

However, the resulting PDF seems incomplete.
When I delete the PDF and re-run the script the PDF is always in the wrong order, and sometimes I get chapters 1, 3, 9, and 11 (and no others), and other times I get different chapters. The number of pages in the resulting PDF varies as well.

uvloop does not support Windows

Collecting uvloop
Using cached uvloop-0.16.0.tar.gz (2.1 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Hp\AppData\Local\Temp\pip-install-8yu86g2k\uvloop_b8a859a0e28f49d49f1cf699eb8d3af0\setup.py", line 8, in
raise RuntimeError('uvloop does not support Windows at the moment')
RuntimeError: uvloop does not support Windows at the moment
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Not an issue but a feature request.

Hi, very useful script.
I was wondering if it's possible to make a lightweight version for Scribd too (scribd-downloader.py). There is only one solution on GitHub and it has issues.
Thanks.

Unavailable in your region?

Hi again,

I've managed to get the script working. However, for some books in this series the script fails with "download error: connection timed out". Could this be due to my region?

Could someone check whether they have the same issue? Here is one of the book IDs: 3286443
The title is: PPI California Civil Seismic Principles Practice Exams, 12th Edition

As always, thanks in advance!


Issue on Windows 10

I received a message that says: "perlego-downloader-main\downloader.py", line 2, in from PIL import Image ModuleNotFoundError: No module named 'PIL'. I'm not sure which step I did wrong.

No module named 'PIL'

C:\Users\Ioana>python3 C:\Users\Ioana\Desktop\perlego\downloader.py
Traceback (most recent call last):
File "C:\Users\Ioana\Desktop\perlego\downloader.py", line 2, in
from PIL import Image
ModuleNotFoundError: No module named 'PIL'
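
The `PIL` module is provided by the Pillow distribution, so this error usually means Pillow is not installed for the interpreter that runs the script (`pip install Pillow`). A minimal sketch for checking which modules a given interpreter can actually see; the helper name `missing_modules` is made up for illustration:

```python
import importlib.util


def missing_modules(names):
    """Return the module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; "PIL" is only present if Pillow is installed.
missing = missing_modules(["json", "PIL"])
print("missing modules:", missing)
```

Running this with the same command used to start the script (for example `python3`) shows whether that particular interpreter is the one the packages were installed into.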

Error Code 2

Hi! I can't download a book. The download output is:

chapters 0-1 downloaded
chapters 1-2 downloaded
chapters 2-4 downloaded
chapters 4-5 downloaded
chapters 5-8 downloaded
chapters 8-13 downloaded
chapters 13-22 downloaded
chapters 22-45 downloaded
chapters 45-70 downloaded
chapters 70-79 downloaded
chapters 79-90 downloaded
SystemExit: {'event': 'error', 'data': {'message': 'An unexpected error occurred.', 'code': 2}}

What does the error with 'code': 2 mean?
The same error occurs on Windows 10 and Linux systems.

I am getting an issue with the script

It takes very long to download; I have not been able to download a single book with this script.
I only get this screen and, surprisingly, for every book it shows that the file size would be 86.8 MB.
Please help me with this. I am a medical student and I want to download some books.
It would be great if you could reply.

Issues: CERTIFICATE VERIFY FAILED

After running the script, the following error is shown:

init_book_delivery() error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:992)

I am sure I copied and pasted everything correctly.

Pyppeteer Error

Hi there,

I'm having issues with PDF books. So far all EPUBs work well.

The book ID is 3361318. A folder is created with all the downloaded pages in HTML format, but an error comes up before the HTML files are converted to PDF and stitched together.

I also had to change line 19 (PUPPETEER_THREADS = 50) to 400. I'm not sure why, but originally only the first 50 pages were output as HTML. The error occurs with both 50 and 400, but with 400 at least all 255 pages are output as HTML, rather than just the first 50.

This is the last part of the macOS terminal output:

page 221-222 downloaded
page 222-223 downloaded
page 223-224 downloaded
page 224-225 downloaded
Traceback (most recent call last):
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 242, in
asyncio.run(html2pdf())
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 238, in html2pdf
await asyncio.gather(*[render_page(chapter_no, sem) for chapter_no in contents])
File "/Users/Moon/Desktop/Perlego-Downloader/perlego-downloader-main/downloader.py", line 221, in render_page
await page.goto(f'file://{cache_dir}/{chapter_no}.html', {"waitUntil" : ["load", "domcontentloaded", "networkidle0", "networkidle2"]})
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyppeteer/page.py", line 837, in goto
raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 30000 ms exceeded.


Download all book except one page

Hi!
There are several books I wanted to download but couldn't, because in each of them one page does not display on the Perlego website.
Is there a way to change the code to download the whole book except for a few selected pages?
Thanks a lot.

Page UTF-8 encoding issue?

I would come across some books with what looks to be character encoding issues.
These would be seemingly random pages (though always the same pages if I redo the download) in only certain books.

characters such as • or ÂÂÂ, etc will appear across these pages.
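
Sequences like `•` are what UTF-8 bytes look like when they are decoded with a single-byte Windows code page. The sketch below reproduces the effect for a bullet character and reverses it; that cp1252 is the encoding involved in these books is an assumption based on the characters quoted above.

```python
original = "\u2022"  # the bullet character

# UTF-8 encodes the bullet as three bytes; cp1252 shows each byte as one character.
garbled = original.encode("utf-8").decode("cp1252")

# Reversing the two steps recovers the original text, provided nothing was lost in between.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repr(garbled), repr(repaired))
```

If the same pages are always affected, it may be that those pages are read or written with the platform default encoding instead of an explicit `encoding="utf-8"`.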

High Resources usage

The script works, but I can't get the job done because I have to kill it due to high resource usage. I tried ulimit for RAM and cpulimit too, but whenever the code uses Chrome, Chrome's resource usage literally skyrockets to the point where I have to shut down the computer if I wait too long. I gather it could be possible to limit resource usage through the script, but I'm not sure whether that would apply to the Chrome processes launched from the script.

Connection timed out

Greetings,

I have Mac OS 16, and I followed the instructions exactly, step by step, but every time I run the command it says:
perlego % python3 downloader.py
download error: Connection timed out

Please help.

What should I do?

OSError: [WinError 14001] The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail

help

Please help me. I tried on Ubuntu but it gave me an error.

Ubuntu under WSL: Browser closed unexpectedly

Traceback (most recent call last):
File "downloader.py", line 243, in
asyncio.run(html2pdf())
File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "downloader.py", line 164, in html2pdf
browser = await launch(options={
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/home/forsite/.local/lib/python3.8/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

init_book_delivery() error

I am using macOS. I followed all the steps and installed the required modules.
After filling in all the tokens and running the script, it shows:
init_book_delivery() error: module 'websocket' has no attribute 'create_connection'

How to fix this problem?

thank you!

KeyError: 'event' line 70, in <module>

Greetings everyone,

I've updated to the latest files, yet I still have an issue.
I hope you can help with the following:

% python3 downloader.py
chapters 0-1 downloaded
chapters 1-2 downloaded
chapters 2-3 downloaded
chapters 3-4 downloaded
chapters 4-5 downloaded
chapters 5-22 downloaded
chapters 22-34 downloaded
chapters 34-48 downloaded
chapters 48-49 downloaded
Traceback (most recent call last):
File "/Users/user/perlego/downloader.py", line 70, in
if data['event'] == 'error':
KeyError: 'event'
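
The `KeyError` means a received message had no `event` key at all. A minimal sketch of reading the key defensively with `dict.get`; the function name `classify` and the sample messages are hypothetical, and it assumes each message is a JSON object:

```python
import json


def classify(message: str) -> str:
    """Classify a JSON message without assuming that 'event' is present."""
    data = json.loads(message)
    event = data.get("event")  # returns None instead of raising KeyError
    if event == "error":
        return "error"
    if event is None:
        return "unknown"
    return "ok"


print(classify('{"event": "error", "data": {"code": 2}}'))
print(classify('{"data": {}}'))
```

Logging the full message in the "unknown" case would show what the server actually sent when the key is missing.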

RuntimeError: asyncio.run() cannot be called from a running event loop

Hi, thank you for providing this nice code.

However, there is an error like this:
"
page 136-137 downloaded
page 137-138 downloaded
page 138-139 downloaded
Traceback (most recent call last):

File ~\untitled0.py:250 in
asyncio.run(html2pdf())

File ~\anaconda3\lib\asyncio\runners.py:33 in run
raise RuntimeError(

RuntimeError: asyncio.run() cannot be called from a running event loop
"

How can I fix it? Please. ..
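
`asyncio.run()` refuses to start when the current thread already has a running event loop, which is typically the case inside IPython-based consoles (the traceback format and the anaconda3 path suggest one, but that is a guess). Running the script from a plain terminal avoids the problem. Alternatively, a rough sketch of a helper that works in both situations; `run_coroutine` and `work` are made-up names:

```python
import asyncio
import threading


async def work() -> str:
    await asyncio.sleep(0)  # stands in for the real asynchronous job
    return "done"


def run_coroutine(coro):
    """Run a coroutine to completion whether or not a loop is already running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no running loop: the normal case

    # A loop is already running in this thread, so use a fresh loop
    # in a helper thread and wait for it to finish.
    result = {}

    def target():
        result["value"] = asyncio.run(coro)

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    return result["value"]


async def inside_loop():
    return run_coroutine(work())


plain_result = run_coroutine(work())        # called with no loop running
nested_result = asyncio.run(inside_loop())  # called from inside a running loop
```

The helper blocks the calling loop while the thread runs and does not forward exceptions from the coroutine, so it is only meant to show the idea.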

PermissionError: [Errno 13] Permission denied:

Traceback (most recent call last):
File "/home/xxxxx/perlego-downloader/downloader.py", line 156, in
os.mkdir(cache_dir)
PermissionError: [Errno 13] Permission denied: '/home/xxxx/perlego-downloader/PDF_144322/'

Error : Too many files open

Firstly, this is a good piece of code and a lot of learning for us novices.
Secondly, the code breaks down when the number of pages exceeds roughly 1000, with the error: Too many files open
I've reduced the number of threads from 50 to 20.

PUPPETEER_THREADS = 20

Still facing the same issue
Thanx
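
For reference, this is how a concurrency cap normally behaves when it is enforced with `asyncio.Semaphore`: no more than the configured number of tasks hold a resource at once, however many pages there are. The sketch is generic; whether `PUPPETEER_THREADS` covers every place where the script opens files is not known, and `render` is a stand-in rather than the script's own function.

```python
import asyncio

MAX_CONCURRENT = 20  # mirrors the value tried in the report
active = 0
peak = 0


async def render(page_no: int, sem: asyncio.Semaphore) -> int:
    global active, peak
    async with sem:  # at most MAX_CONCURRENT tasks are inside this block
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.001)  # stands in for opening a file or browser tab
        active -= 1
    return page_no


async def main(total: int):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(render(n, sem) for n in range(total)))


results = asyncio.run(main(200))
print("peak concurrency:", peak)
```

If the error persists with a low cap, the files may be opened outside the capped section, or never closed; on Linux and macOS the per-process limit can be inspected with `ulimit -n`.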

'Browser closed unexpectedly' error

I have downloaded the new script but I'm still getting errors. The book ID I'm trying is 1323963; maybe it has something to do with it? I don't know, but I hope this helps.

Connection to remote host was lost?

Hi all!

Sorry, I'm relatively new to coding, so hopefully it's an easy fix for someone out there. I followed the instructions as per your video, but I'm receiving the following error:

"websocket._exceptions.WebSocketConnectionClosedException: Connection to remote host was lost."

Any help would be appreciated. Thanks in advance


error

Hello,
I am getting this error:

building pdf...
Traceback (most recent call last):
File "downloader.py", line 141, in
f.write(content)
File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ue000' in position 15534: character maps to
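
The traceback above is what Python raises on Windows when a file is opened in text mode without an explicit encoding, so the platform default (cp1252 here) is used for writing. A minimal sketch with a made-up file name and content, showing that cp1252 cannot represent the character and that an explicit UTF-8 encoding can:

```python
import os
import tempfile

content = "private-use glyph: \ue000"  # same character as in the traceback

# cp1252 has no mapping for U+E000, so encoding fails.
try:
    content.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Passing encoding="utf-8" makes the write independent of the platform default.
path = os.path.join(tempfile.mkdtemp(), "page.html")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(content)

with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(cp1252_ok, round_trip == content)
```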

Unexpected TOKEN - Windows 7 - Error

Hello, first, thank you very much for this great tool.
Second, I'm having this error on Windows 7:

Unexpected token 'C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe' in expression or statement.
At line: 1 char: 914

  • 'C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe --disable-bac
    kground-networking --disable-background-timer-throttling --disable-breakpad --disable-browser-side-navigation --disable
    -client-side-phishing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=
    site-per-process --disable-hang-monitor --disable-popup-blocking--disable-prompt-on-repost --disable-sync --disable-tra
    nslate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=
    basic--use-mock-keychain --headless --hide-scrollbars --mute-audio --disable-gpu about:blank --remote-debugging-port=17
    61 --user-data-dir=C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\.dev_profile\tmpa5fw9qio'C:\Users\Mac\AppD
    ata\Local\pyppeteer\pyppeteer\local-chromium\588429\chrome-win32\chrome.exe <<<< --disable-background-networkin
    g --disable-background-timer-throttling --disable-breakpad --disable-browser-side-navigation --disable-client-side-phis
    hing-detection --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process
    --disable-hang-monitor --disable-popup-blocking--disable-prompt-on-repost --disable-sync --disable-translate --metrics-
    recording-only --no-first-run --safebrowsing-disable-auto-update --enable-automation --password-store=basic--use-mock-k
    eychain --headless --hide-scrollbars --mute-audio --disable-gpu about:blank --remote-debugging-port=1761 --user-data-di
    r=C:\Users\Mac\AppData\Local\pyppeteer\pyppeteer\.dev_profile\tmpa5fw9qio
    • CategoryInfo : ParserError: (C:\Users\Mac...n32\chrome.exe:String) [], ParentContainsErrorRecordExc
      eption
    • FullyQualifiedErrorId : UnexpectedToken

PS C:\python\python38>

Could you help me please?
Thanks in advance.

help

First of all, thank you very much for this script.
Could you please create a similar one for Scribd books?
They have a very large library and it would be very helpful for people :)

PDF font size

Is there any way to increase the PDF font size on the output? By default it comes out quite small and it's hard on my old eyes. I found a way to adjust the page margins, but no luck on font size. Any guidance?

download error: Connection timed out

Hello again,

Sorry to bother you again. I tried running the script again and to no avail :(

It's giving me an error "download error: Connection timed out"

Thanks in advance for your assistance !

please could someone help

after trying for hours i'm unable to get past the browser error problem

i'm just looking to download book id: 2059382

could someone kindly email me the epub or pdf file to: [email protected]

sorry to post here but thank you so much in advance

Crash on Ubuntu 22.04

Running on Ubuntu 22.04 on an AWS VM. It downloads all the pieces but then when it launches the Chromium module it crashes.

Traceback (most recent call last):
File "/home/ubuntu/src/perlego-downloader/downloader.py", line 250, in
asyncio.run(html2pdf())
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/home/ubuntu/src/perlego-downloader/downloader.py", line 171, in html2pdf
browser = await launch(options={
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 307, in launch
return await Launcher(options, **kwargs).launch()
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 168, in launch
self.browserWSEndpoint = get_ws_endpoint(self.url)
File "/home/ubuntu/.local/lib/python3.10/site-packages/pyppeteer/launcher.py", line 227, in get_ws_endpoint
raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

How to solve 'Memory error'?

Hi all! It seems that my computer does not have enough RAM for the process.

I could only download around 600 pages of the PDF before cmd showed a memory error.

Is it possible to specify which page to start from, so that I could download the book in two parts?

I am using a Ryzen 5 3600 and 16 GB of RAM.

JSONDecodeError("Expecting value", s, err.value) from None

I am new to Python and don't know much. I'm getting the error below; can you please help with this?

C:\Users\dell\Downloads\perlego-downloader-main>python3 downloader.py
Traceback (most recent call last):
File "C:\Users\dell\Downloads\perlego-downloader-main\downloader.py", line 38, in
data_content = json.loads(json.loads(data['data']['content']))
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2032.0_x64__qbz5n2kfra8p0\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
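
"Expecting value: line 1 column 1 (char 0)" means the string handed to `json.loads` did not start with JSON at all; an empty string produces exactly this message. A small sketch of a guard that reports what was actually received instead of crashing; `parse_json` is a made-up helper name:

```python
import json
from typing import Any, Optional


def parse_json(raw: str) -> Optional[Any]:
    """Return the parsed value, or None if raw is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Show the start of the payload so the real cause is visible.
        print(f"not JSON (line {exc.lineno}, column {exc.colno}): {raw[:80]!r}")
        return None


print(parse_json('{"content": "ok"}'))
print(parse_json(""))
```

Printing the raw payload this way usually reveals whether the response was empty or an error message rather than the expected data.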
