jenni / obooks
O'Books :books::sparkles: Download books from O'Reilly | Safaribooks
My library provides access to Safari Books Online. However, no cookie ID is present when you log in, because it is not a personal login. It's a generic volume login that anyone with a library card in the system can use.
When downloading a book the log says [Success] cover image downloaded successfully!
but the final file shows no cover. At least not on my iOS or MacOS apps (Finder, Books).
I also tried an app called PocketBook Reader but it also did not show a cover.
I usually use the provided docker container.
Enable passing multiple book ids in the -b <book_id> flag.
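A minimal sketch of one way the flag value could be split into several ids. The comma-separated format and the parseBookIds helper are assumptions for illustration, not existing obooks behavior:

```javascript
// Hypothetical helper: turn a comma-separated -b value into a list of book ids.
// The comma-separated format is an assumption, not something obooks defines.
function parseBookIds(value) {
  return value
    .split(',')
    .map((id) => id.trim())          // tolerate spaces around each id
    .filter((id) => id.length > 0);  // drop empty entries such as a trailing comma
}

console.log(parseBookIds('9781491952016, 9781787125360'));
```

Each id in the resulting array could then go through the same download path that a single id uses today.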
obooks depends on epub-gen to convert the downloaded HTML into an .epub. Epub takes an options object with a cover: property that can be a URL or a path to a file.
The meta page fetched in the OBook class contains all the needed info and links about the book. The cover lives in titlepage.xhtml. Joining asset_base_url with the cover.jpg entry found in the images array gives the full path to the actual cover at an acceptable quality.
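A minimal sketch of that join, assuming the meta response is an object with an asset_base_url string and an images array of relative file names. The sample values and the resolveCoverUrl helper are made up for illustration:

```javascript
// Hypothetical meta object: only asset_base_url and images are named in the
// issue text; the values below are placeholders.
const meta = {
  asset_base_url: 'https://example.com/library/assets/9781491952016/',
  images: ['images/fig01.png', 'cover.jpg'],
};

// Find the cover entry in the images array and join it with the base URL.
function resolveCoverUrl(meta) {
  const cover = meta.images.find((name) => /(^|\/)cover\.jpe?g$/i.test(name));
  if (!cover) return null; // no cover listed, let the caller decide what to do
  // Ensure the base ends with a slash so URL resolution keeps the last segment.
  const base = meta.asset_base_url.endsWith('/')
    ? meta.asset_base_url
    : meta.asset_base_url + '/';
  return new URL(cover, base).href;
}

// The result could be passed as the cover: property of the Epub options object.
const options = { cover: resolveCoverUrl(meta) };
console.log(options.cover);
```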
Trying to use obooks to download a book for the first time. I'm following the example in the readme.
Open the folder that contains the cloned repo
command: cli.js -b "9781491952016" -c "nk321sl58xb4kg0hhidim67fv09is4mdh"
note: I modified the cookie here so it's not my actual cookie.
When I ran this, a small window appeared saying that an error happened in cli.js, line 1, character 1.
Error: Invalid Character
I've tried other ways to enter the command:
cli.js -b <9781491952016> -c
cli.js -b 9781491952016 -c nk321sl58xb4kg0hhidim67fv09is4mdh
cli.js -b "<9781491952016>" -c "" this command will just open up cli.js in the text editor...And that's it
When I open the epub on a site that provides an online epub reader, the Table of Contents is broken and does not work.
here's the ebook id: 9781787125360
Implement logic to execute external calls in parallel
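One possible shape for this is a small worker pool that runs several calls at once but caps how many are in flight. This is a sketch only; mapWithConcurrency and the worker callback are hypothetical names, not part of obooks:

```javascript
// Run worker(item) for every item, with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to pick up

  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` runners that pull items until none are left.
  const runners = Array.from({ length: Math.min(limit, items.length) }, run);
  await Promise.all(runners);
  return results;
}

// Usage with a stand-in worker; a real worker would fetch a chapter or image.
mapWithConcurrency([1, 2, 3, 4], 2, async (n) => n * 2).then((out) => {
  console.log(out);
});
```

Capping the concurrency keeps the number of simultaneous requests to the remote server bounded, which a plain Promise.all over every chapter would not.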
I installed Node.js and its dependencies
I ran npm install
I'm trying to download a book using the CLI method, but I keep getting StatusCodeError: 404 - {"error":"Could not fetch work"}
Here is the command I ran: node cli.js -b "<9781491952016>" -c "<uhn7fxvjku8442ejdljcnn2kcueke3df>"
Here is the full output. I don't know JavaScript, so maybe this is something super simple to solve?
:::::::: ::: ::::::::: :::::::: :::::::: ::: ::: ::::::::
:+: :+: :+ :+: :+: :+: :+: :+: :+: :+: :+: :+: :+:
+:+ +:+ +:+ +:+ +:+ +:+ +:+ +:+ +:+ +:+ +:+
+#+ +:+ +#++:++#+ +#+ +:+ +#+ +:+ +#++:++ +#++:++#++
+#+ +#+ +#+ +#+ +#+ +#+ +#+ +#+ +#+ +#+ +#+
#+# #+# #+# #+# #+# #+# #+# #+# #+# #+# #+# #+#
######## ######### ######## ######## ### ### ########
+:++:++:++:+ using stored cookies
StatusCodeError: 404 - {"error":"Could not fetch work"}
at new StatusCodeError (D:\dev\obooks\node_modules\request-promise-core\lib\errors.js:32:15)
at Request.plumbing.callback (D:\dev\obooks\node_modules\request-promise-core\lib\plumbing.js:104:33)
at Request.RP$callback [as _callback] (D:\dev\obooks\node_modules\request-promise-core\lib\plumbing.js:46:31)
at Request.self.callback (D:\dev\obooks\node_modules\request\request.js:185:22)
at Request.emit (events.js:315:20)
at Request.<anonymous> (D:\dev\obooks\node_modules\request\request.js:1161:10)
at Request.emit (events.js:315:20)
at IncomingMessage.<anonymous> (D:\dev\obooks\node_modules\request\request.js:1083:12)
at Object.onceWrapper (events.js:421:28)
at IncomingMessage.emit (events.js:327:22)
at endReadableNT (internal/streams/readable.js:1327:12)
at processTicksAndRejections (internal/process/task_queues.js:80:21) {
statusCode: 404,
error: { error: 'Could not fetch work' },
options: {
uri: 'https://learning.oreilly.com/api/v1/book/<9781491952016>',
method: 'GET',
followAllRedirects: true,
resolveWithFullResponse: true,
headers: {
Accept: '*/*',
'Cache-Control': 'no-cache',
Cookie: '<uhn7oxvj9u8444ejdlj8nn2kcueke3cf>',
Connection: 'keep-alive'
},
body: null,
json: true,
callback: [Function: RP$callback],
transform: undefined,
simple: true,
transform2xxOnly: false
},
response: <ref *1> IncomingMessage {
_readableState: ReadableState {
objectMode: false,
highWaterMark: 16384,
buffer: BufferList { head: null, tail: null, length: 0 },
length: 0,
pipes: [],
flowing: true,
ended: true,
endEmitted: true,
reading: false,
sync: false,
needReadable: false,
emittedReadable: false,
readableListening: false,
resumeScheduled: false,
errorEmitted: false,
emitClose: true,
autoDestroy: false,
destroyed: true,
errored: null,
closed: false,
closeEmitted: false,
defaultEncoding: 'utf8',
awaitDrainWriters: null,
multiAwaitDrain: false,
readingMore: false,
decoder: null,
encoding: null,
[Symbol(kPaused)]: false
},
_events: [Object: null prototype] {
end: [Array],
close: [Array],
data: [Function (anonymous)],
error: [Function (anonymous)]
},
_eventsCount: 4,
_maxListeners: undefined,
socket: null,
httpVersionMajor: 1,
httpVersionMinor: 1,
httpVersion: '1.1',
complete: true,
headers: {
connection: 'keep-alive',
'content-length': '32',
server: 'istio-envoy',
'content-type': 'application/json',
'strict-transport-security': 'max-age=31536000; includeSubDomains',
'x-envoy-upstream-service-time': '98',
etag: 'W/"20-VP1zPcsrMiwJerbVIDyOLy0If+4"',
'x-content-type-options': 'nosniff',
'x-powered-by': 'Express',
'accept-ranges': 'bytes',
date: 'Mon, 12 Apr 2021 18:04:08 GMT',
via: '1.1 varnish',
'x-client-ip': '45.19.192.15',
'x-served-by': 'cache-dal21221-DAL',
'x-cache': 'MISS',
'x-cache-hits': '0',
'x-timer': 'S1618250648.303722,VS0,VE359',
vary: 'Accept, Accept-Encoding, Authorization, Cookie'
},
rawHeaders: [
'Connection',
'keep-alive',
'Content-Length',
'32',
'server',
'istio-envoy',
'content-type',
'application/json',
'strict-transport-security',
'max-age=31536000; includeSubDomains',
'x-envoy-upstream-service-time',
'98',
'etag',
'W/"20-VP1zPcsrMiwJerbVIDyOLy0If+4"',
'x-content-type-options',
'nosniff',
'x-powered-by',
'Express',
'Accept-Ranges',
'bytes',
'Date',
'Mon, 12 Apr 2021 18:04:08 GMT',
'Via',
'1.1 varnish',
'X-Client-IP',
'45.19.192.15',
'X-Served-By',
'cache-dal21221-DAL',
'X-Cache',
'MISS',
'X-Cache-Hits',
'0',
'X-Timer',
'S1618250648.303722,VS0,VE359',
'Vary',
'Accept, Accept-Encoding, Authorization, Cookie'
],
trailers: {},
rawTrailers: [],
aborted: false,
upgrade: false,
url: '',
method: null,
statusCode: 404,
statusMessage: 'Not Found',
client: TLSSocket {
_tlsOptions: [Object],
_secureEstablished: true,
_securePending: false,
_newSessionPending: false,
_controlReleased: true,
secureConnecting: false,
_SNICallback: null,
servername: 'learning.oreilly.com',
alpnProtocol: false,
authorized: true,
authorizationError: null,
encrypted: true,
_events: [Object: null prototype],
_eventsCount: 8,
connecting: false,
_hadError: false,
_parent: null,
_host: 'learning.oreilly.com',
_readableState: [ReadableState],
_maxListeners: undefined,
_writableState: [WritableState],
allowHalfOpen: false,
_sockname: null,
_pendingData: null,
_pendingEncoding: '',
server: undefined,
_server: null,
ssl: null,
_requestCert: true,
_rejectUnauthorized: true,
parser: null,
_httpMessage: [ClientRequest],
[Symbol(res)]: [TLSWrap],
[Symbol(verified)]: true,
[Symbol(pendingSession)]: null,
[Symbol(async_id_symbol)]: 10,
[Symbol(kHandle)]: null,
[Symbol(kSetNoDelay)]: false,
[Symbol(lastWriteQueueSize)]: 0,
[Symbol(timeout)]: null,
[Symbol(kBuffer)]: null,
[Symbol(kBufferCb)]: null,
[Symbol(kBufferGen)]: null,
[Symbol(kCapture)]: false,
[Symbol(kBytesRead)]: 614,
[Symbol(kBytesWritten)]: 238,
[Symbol(connect-options)]: [Object],
[Symbol(RequestTimeout)]: undefined
},
_consuming: true,
_dumped: false,
req: ClientRequest {
_events: [Object: null prototype],
_eventsCount: 5,
_maxListeners: undefined,
outputData: [],
outputSize: 0,
writable: true,
destroyed: true,
_last: true,
chunkedEncoding: false,
shouldKeepAlive: true,
_defaultKeepAlive: true,
useChunkedEncodingByDefault: false,
sendDate: false,
_removedConnection: false,
_removedContLen: false,
_removedTE: false,
_contentLength: null,
_hasBody: true,
_trailer: '',
finished: true,
_headerSent: true,
socket: [TLSSocket],
_header: 'GET /api/v1/book/%3C9781491952016%3E HTTP/1.1\r\n' +
'Accept: */*\r\n' +
'Cache-Control: no-cache\r\n' +
'Cookie: <uhn7oxvj9u8444ejdlj8nn2kcueke3cf>\r\n' +
'Connection: keep-alive\r\n' +
'host: learning.oreilly.com\r\n' +
'content-type: application/json\r\n' +
'content-length: 4\r\n' +
'\r\n',
_keepAliveTimeout: 0,
_onPendingData: [Function: noopPendingOutput],
agent: [Agent],
socketPath: undefined,
method: 'GET',
maxHeaderSize: undefined,
insecureHTTPParser: undefined,
path: '/api/v1/book/%3C9781491952016%3E',
_ended: true,
res: [Circular *1],
aborted: false,
timeoutCb: null,
upgradeOrConnect: false,
parser: null,
maxHeadersCount: null,
reusedSocket: false,
host: 'learning.oreilly.com',
protocol: 'https:',
[Symbol(kCapture)]: false,
[Symbol(kNeedDrain)]: false,
[Symbol(corked)]: 0,
[Symbol(kOutHeaders)]: [Object: null prototype]
},
request: Request {
_events: [Object: null prototype],
_eventsCount: 5,
_maxListeners: undefined,
uri: [Url],
method: 'GET',
followAllRedirects: true,
resolveWithFullResponse: true,
headers: [Object],
body: 'null',
readable: true,
writable: true,
explicitMethod: true,
_qs: [Querystring],
_auth: [Auth],
_oauth: [OAuth],
_multipart: [Multipart],
_redirect: [Redirect],
_tunnel: [Tunnel],
_rp_resolve: [Function (anonymous)],
_rp_reject: [Function (anonymous)],
_rp_promise: [Promise [Object]],
_rp_callbackOrig: undefined,
callback: [Function (anonymous)],
_rp_options: [Object],
setHeader: [Function (anonymous)],
hasHeader: [Function (anonymous)],
getHeader: [Function (anonymous)],
removeHeader: [Function (anonymous)],
localAddress: undefined,
pool: {},
dests: [],
__isRequestRequest: true,
_callback: [Function: RP$callback],
proxy: null,
tunnel: true,
setHost: true,
originalCookieHeader: '<uhn7oxvj9u8444ejdlj8nn2kcueke3cf>',
_disableCookies: true,
_jar: undefined,
port: 443,
host: 'learning.oreilly.com',
path: '/api/v1/book/%3C9781491952016%3E',
_json: true,
httpModule: [Object],
agentClass: [Function: Agent],
agent: [Agent],
_started: true,
href: 'https://learning.oreilly.com/api/v1/book/%3C9781491952016%3E',
req: [ClientRequest],
ntick: true,
response: [Circular *1],
originalHost: 'learning.oreilly.com',
originalHostHeaderName: 'host',
responseContent: [Circular *1],
_destdata: true,
_ended: true,
_callbackCalled: true,
[Symbol(kCapture)]: false
},
toJSON: [Function: responseToJSON],
caseless: Caseless { dict: [Object] },
body: { error: 'Could not fetch work' },
[Symbol(kCapture)]: false,
[Symbol(RequestTimeout)]: undefined
}
}
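The request path in the output above, /api/v1/book/%3C9781491952016%3E, shows the angle brackets from the command being URL-encoded into the book id, which may be why the API answers 404. A minimal sketch of stripping such placeholder characters before the URL is built; normalizeBookId is a hypothetical helper, not part of obooks:

```javascript
// Hypothetical helper: remove surrounding <, >, and quote characters that were
// copied literally from a usage example such as -b "<book_id>".
function normalizeBookId(raw) {
  return raw.trim().replace(/^[<"']+|[>"']+$/g, '');
}

// With the brackets left in, the id is percent-encoded into the path.
console.log(encodeURIComponent('<9781491952016>'));
// After normalizing, only the id itself remains.
console.log(normalizeBookId('<9781491952016>'));
```

The same stripping would apply to the cookie value, which also appears wrapped in angle brackets in the Cookie header of the log.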
I did exactly what is described, but it only downloads images.
Two things because the second doesn't warrant its own issue:
Handle 500 error when downloading a chapter and O'Reilly goes unavailable.
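A minimal sketch of retrying a request when the server answers with a 5xx status. It assumes the thrown error carries a statusCode property, as the StatusCodeError objects in the logs on this page do; withRetry and its options are hypothetical names:

```javascript
// Retry an async task when it fails with a 5xx status code.
// retries: how many extra attempts to make; delayMs: base wait between attempts.
async function withRetry(task, { retries = 3, delayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      const retryable = Boolean(err) && err.statusCode >= 500;
      // Give up on non-server errors (such as 401 or 404) or when out of attempts.
      if (!retryable || attempt >= retries) throw err;
      // Wait a little longer after each failed attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs * (attempt + 1)));
    }
  }
}
```

A chapter download could be wrapped as withRetry(() => downloadChapter(id)), where downloadChapter stands in for whatever function performs the request.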
Hi, this is my first time using Docker and I got an error. Please help. This is the full error:
docker: Error response from daemon: create $(pwd)/obooks: "$(pwd)/obooks" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
See 'docker run --help'.
'datestamp' is not recognized as an internal or external command, operable program or batch file.
'version' is not recognized as an internal or external command, operable program or batch file.
'browserGpcFlag' is not recognized as an internal or external command, operable program or batch file.
'isIABGlobal' is not recognized as an internal or external command, operable program or batch file.
'landingPath' is not recognized as an internal or external command, operable program or batch file.
groups: '=C0001%3A1%2CC0002%3A1%2CC0003%3A1%2CC0004%3A1': no such user
'hosts' is not recognized as an internal or external command, operable program or batch file.
'genVendors' is not recognized as an internal or external command, operable program or batch file.
I also tried the CLI and got an error:
node:internal/modules/cjs/loader:1080
throw err;
^
Error: Cannot find module 'commander'
Require stack:
- C:\Users\Admin\Downloads\obooks\obooks-master\cli.js
at Module._resolveFilename (node:internal/modules/cjs/loader:1077:15)
at Module._load (node:internal/modules/cjs/loader:922:27)
at Module.require (node:internal/modules/cjs/loader:1143:19)
at require (node:internal/modules/cjs/helpers:110:18)
at Object.<anonymous> (C:\Users\Admin\Downloads\obooks\obooks-master\cli.js:3:17)
at Module._compile (node:internal/modules/cjs/loader:1256:14)
at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
at Module.load (node:internal/modules/cjs/loader:1119:32)
at Module._load (node:internal/modules/cjs/loader:960:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12) {
code: 'MODULE_NOT_FOUND',
requireStack: [ 'C:\\Users\\Admin\\Downloads\\obooks\\obooks-master\\cli.js' ]
}
Node.js v18.17.0
'datestamp' is not recognized as an internal or external command,
operable program or batch file.
'version' is not recognized as an internal or external command,
operable program or batch file.
'browserGpcFlag' is not recognized as an internal or external command,
operable program or batch file.
'isIABGlobal' is not recognized as an internal or external command,
operable program or batch file.
'landingPath' is not recognized as an internal or external command,
operable program or batch file.
groups: '=C0001%3A1%2CC0002%3A1%2CC0003%3A1%2CC0004%3A1': no such user
'hosts' is not recognized as an internal or external command,
operable program or batch file.
'genVendors' is not recognized as an internal or external command,
operable program or batch file.
O'Reilly might fail halfway through the download. When downloading the book again, check whether each image already exists before downloading it, to speed up the process.
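A minimal sketch of such a check using Node's fs module. The needsDownload helper and the idea of treating an empty file as incomplete are assumptions for illustration:

```javascript
const fs = require('fs');
const path = require('path');

// Return true when the image is missing or empty and should be downloaded.
function needsDownload(dir, fileName) {
  const target = path.join(dir, fileName);
  try {
    // An empty file is treated as a failed earlier download (an assumption).
    return fs.statSync(target).size === 0;
  } catch (err) {
    if (err.code === 'ENOENT') return true; // file does not exist yet
    throw err; // surface unexpected errors such as permission problems
  }
}
```

The image download loop could call needsDownload before each request and skip files that are already present.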
Hi,
Can you please provide an example of what the cookie string should look like? Whatever I do, I always get this:
(node:6131) UnhandledPromiseRejectionWarning: StatusCodeError: 401 - {"detail":"Authentication credentials were not provided."}
Thanks
> node --version
v14.17.0
> npm -v
6.14.13
> npm list --depth=0
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
`-- [email protected]
> npm i
npm WARN [email protected] No repository field.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules\fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"win32","arch":"x64"})
added 466 packages from 343 contributors and audited 468 packages in 7.234s
24 packages are looking for funding
run `npm fund` for details
found 2 moderate severity vulnerabilities
run `npm audit fix` to fix them, or `npm audit` for details
After running the following command: node cli.js -b "9781492077992" -c "actual cookies"
it outputs:
+:++:++:++:+ using stored cookies
+:++:++:++:+ downloading: Head First Design Patterns, 2nd Edition
+:++:++:++:+ 22 chapters to download, please wait...
+:++:++:++:+ assembling book...
The cookie was collected by logging in to O'Reilly via Chrome and copying the cookie value from the request header in the Network tab of DevTools. Yes, I have an active subscription to O'Reilly where I can read this particular book using the browser. I checked whether it was parsing/storing the correct cookie by observing the generated session.json file:
{
"cookie": "BrowserCookie=x; salesforce_id=x; groot_sessionid=x; logged_in=y; csrfsafari=x; orm-rt=x; sessionid=x; csrftoken=x; orm-jwt=x"
}
I am having the same issue as #30 where it only downloads images. I have tried different books, it's all the same.
On large books, O'Reilly might suddenly throw a 500 error halfway through. Create a chapter cache to avoid downloading the same chapters again on retry.
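A minimal sketch of a file-based cache for chapters. The cachedChapter helper, the JSON file layout, and the fetchChapter callback are hypothetical, and the sketch assumes chapter ids are safe to use as file names:

```javascript
const fs = require('fs');
const path = require('path');

// Return the chapter from the cache directory if present; otherwise fetch it,
// store it, and return it. fetchChapter is supplied by the caller.
async function cachedChapter(cacheDir, chapterId, fetchChapter) {
  const file = path.join(cacheDir, `${chapterId}.json`);
  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  }
  const chapter = await fetchChapter(chapterId);
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(file, JSON.stringify(chapter));
  return chapter;
}
```

On a retry after a 500 error, chapters fetched in the earlier run are read from disk, so only the missing ones hit the network.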
Running cli.js generates an EPUB file that:
1- Has an <h1> element right after <body> at the beginning of each chapter in the .xhtml files, which results in duplicate title text. The additional <h1> element is not part of the .xhtml file served from the O'Reilly website.
2- Doesn't download the cover image; instead, the cover.jpg file displays a generated table of contents.
3- Has a first page, toc.xhtml, that is a table of contents that's not part of the book (it's a duplicate of the actual toc).
4- Doesn't download the CSS file.
5- Has anchor links in the actual toc that do not work; they point to a non-existing path, which results in ERR_FILE_NOT_FOUND when clicked (not related to the toc on the first page, which is expected to be removed).
Expected behavior:
1- No duplicate titles in each chapter/section.
2- Display the correct cover image.
3- Removal of the first page toc.xhtml, since the book already contains its own table of contents.
4- Page styling from the CSS (fonts, italic text, etc.) identical to the book as viewed on O'Reilly.
5- Clicking on an anchor link in the toc should jump to the right page and not redirect to an incorrect file location.
Steps to reproduce the issues:
$ ./cli.js -b "9781800560871" -c "Your Cookies"
Once the command prints done 📚✨, you will be able to view the issues mentioned above by opening the .epub file using Calibre. I am not aware whether these issues are reproducible only in specific books.
Hi @jenni
I have recently installed obooks and when running:
./cli.js -b "9781492045342" -e "[email protected]" -p "password"
I am facing the following error:
(node:89676) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'oauth' of undefined
at GoogleAuthentication.goToLoginPage (/Users/MustiDarsh/Desktop/tmp/obooks/lib/authentication.js:99:73)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async GoogleAuthentication.authenticate (/Users/MustiDarsh/Desktop/tmp/obooks/lib/authentication.js:45:27)
at async main (/Users/MustiDarsh/Desktop/tmp/obooks/cli.js:64:18)
(node:89676) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:89676) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
I am wondering whether the project is still working and maintained.
Looking forward to hearing from you.
Regards,
MD