jenni / obooks

O'Books 📚✨ Download books from O'Reilly | Safaribooks

JavaScript 98.60% Dockerfile 1.40%
safaribooks oreilly oreilly-books oreilly-books-downloader safaribooksonline ebooks epub hacktoberfest

obooks's People

Contributors

arithmomaniac, dependabot[bot], jenni, rukia-a, tralafarlaw

obooks's Issues

No cookie ID present when logged in with a library card

My library provides access to Safari Books Online. However, there is no cookie ID present when you log in, because it is not a personal login. It's a generic volume login that anyone with a library card in the system can use.

Cover is not properly saved

When downloading a book, the log says "[Success] cover image downloaded successfully!" but the final file shows no cover, at least not in my iOS or macOS apps (Finder, Books).
I also tried an app called PocketBook Reader, but it did not show a cover either.

I usually use the provided docker container.

Cover image missing from the books

obooks depends on epub-gen to convert the downloaded HTML into an .epub.
Epub takes an options object with a cover: property that can be a URL or a path to a file.
The meta page fetched in the OBook class contains all the needed info and links about the book. The cover lives in titlepage.xhtml. Joining asset_base_url with the cover.jpg entry found in the images array gives the full path to the actual cover at an acceptable quality.
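A minimal sketch of the join described above, not the project's actual code. The shape of the meta object is an assumption: only asset_base_url and images are named in this issue, and the example URL is made up.

```javascript
// Hypothetical helper: build the cover URL from the book metadata.
// Assumes meta.asset_base_url is an absolute URL and meta.images is an
// array of relative paths, one of which ends in cover.jpg.
function resolveCoverUrl(meta) {
  const images = Array.isArray(meta.images) ? meta.images : [];
  // Pick the first entry whose file name is cover.jpg (or cover.jpeg).
  const coverPath = images.find((p) => /(^|\/)cover\.jpe?g$/i.test(p));
  if (!meta.asset_base_url || !coverPath) {
    return null; // nothing to join
  }
  // A trailing slash makes URL resolution append instead of replacing
  // the last path segment of the base.
  const base = meta.asset_base_url.endsWith("/")
    ? meta.asset_base_url
    : meta.asset_base_url + "/";
  return new URL(coverPath, base).href;
}

// Made-up metadata for illustration only.
const sampleMeta = {
  asset_base_url: "https://example.com/library/view/some-book/123/",
  images: ["images/fig1.png", "images/cover.jpg"],
};
console.log(resolveCoverUrl(sampleMeta));
```

The returned string could then be passed as the cover: property of the options object mentioned above.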

"Invalid Character" Error when trying to download a book

Trying to use obooks to download a book for the first time. I'm following the example in the README.
Open the folder that contains the cloned repo
command: cli.js -b "9781491952016" -c "nk321sl58xb4kg0hhidim67fv09is4mdh"

note: I modified the cookie here so it's not my actual cookie.

When I ran this, a small window appeared saying that an error happened in cli.js, line 1, character 1.

Error: Invalid Character

I've tried other ways to enter the command:

cli.js -b <9781491952016> -c
cli.js -b 9781491952016 -c nk321sl58xb4kg0hhidim67fv09is4mdh
cli.js -b "<9781491952016>" -c "" (this command just opens cli.js in the text editor, and that's it)

broken generated epub file

When I open the EPUB on a site that provides an online EPUB reader, the table of contents seems broken and does not work.

Here's the ebook ID: 9781787125360

Trying to make this work, but I'm not understanding what I'm doing wrong

I installed Node.js and its dependencies
I ran npm install

I'm trying to download a book using the CLI method, but I keep getting StatusCodeError: 404 - {"error":"Could not fetch work"}
Here is the command I ran: node cli.js -b "<9781491952016>" -c "<uhn7fxvjku8442ejdljcnn2kcueke3df>"

Here is the full output. I don't know JavaScript, so maybe this is something super simple to solve?

          ::::::::  ::: :::::::::   ::::::::   ::::::::  :::    ::: ::::::::
        :+:    :+: :+  :+:    :+: :+:    :+: :+:    :+: :+:   :+: :+:    :+:
       +:+    +:+     +:+    +:+ +:+    +:+ +:+    +:+ +:+  +:+  +:+
      +#+    +:+     +#++:++#+  +#+    +:+ +#+    +:+ +#++:++   +#++:++#++
     +#+    +#+     +#+    +#+ +#+    +#+ +#+    +#+ +#+  +#+         +#+
    #+#    #+#     #+#    #+# #+#    #+# #+#    #+# #+#   #+# #+#    #+#
    ########      #########   ########   ########  ###    ### ########














+:++:++:++:+      using stored cookies
StatusCodeError: 404 - {"error":"Could not fetch work"}
    at new StatusCodeError (D:\dev\obooks\node_modules\request-promise-core\lib\errors.js:32:15)
    at Request.plumbing.callback (D:\dev\obooks\node_modules\request-promise-core\lib\plumbing.js:104:33)
    at Request.RP$callback [as _callback] (D:\dev\obooks\node_modules\request-promise-core\lib\plumbing.js:46:31)
    at Request.self.callback (D:\dev\obooks\node_modules\request\request.js:185:22)
    at Request.emit (events.js:315:20)
    at Request.<anonymous> (D:\dev\obooks\node_modules\request\request.js:1161:10)
    at Request.emit (events.js:315:20)
    at IncomingMessage.<anonymous> (D:\dev\obooks\node_modules\request\request.js:1083:12)
    at Object.onceWrapper (events.js:421:28)
    at IncomingMessage.emit (events.js:327:22)
    at endReadableNT (internal/streams/readable.js:1327:12)
    at processTicksAndRejections (internal/process/task_queues.js:80:21) {
  statusCode: 404,
  error: { error: 'Could not fetch work' },
  options: {
    uri: 'https://learning.oreilly.com/api/v1/book/<9781491952016>',
    method: 'GET',
    followAllRedirects: true,
    resolveWithFullResponse: true,
    headers: {
      Accept: '*/*',
      'Cache-Control': 'no-cache',
      Cookie: '<uhn7oxvj9u8444ejdlj8nn2kcueke3cf>',
      Connection: 'keep-alive'
    },
    body: null,
    json: true,
    callback: [Function: RP$callback],
    transform: undefined,
    simple: true,
    transform2xxOnly: false
  },
  response: <ref *1> IncomingMessage {
    _readableState: ReadableState {
      objectMode: false,
      highWaterMark: 16384,
      buffer: BufferList { head: null, tail: null, length: 0 },
      length: 0,
      pipes: [],
      flowing: true,
      ended: true,
      endEmitted: true,
      reading: false,
      sync: false,
      needReadable: false,
      emittedReadable: false,
      readableListening: false,
      resumeScheduled: false,
      errorEmitted: false,
      emitClose: true,
      autoDestroy: false,
      destroyed: true,
      errored: null,
      closed: false,
      closeEmitted: false,
      defaultEncoding: 'utf8',
      awaitDrainWriters: null,
      multiAwaitDrain: false,
      readingMore: false,
      decoder: null,
      encoding: null,
      [Symbol(kPaused)]: false
    },
    _events: [Object: null prototype] {
      end: [Array],
      close: [Array],
      data: [Function (anonymous)],
      error: [Function (anonymous)]
    },
    _eventsCount: 4,
    _maxListeners: undefined,
    socket: null,
    httpVersionMajor: 1,
    httpVersionMinor: 1,
    httpVersion: '1.1',
    complete: true,
    headers: {
      connection: 'keep-alive',
      'content-length': '32',
      server: 'istio-envoy',
      'content-type': 'application/json',
      'strict-transport-security': 'max-age=31536000; includeSubDomains',
      'x-envoy-upstream-service-time': '98',
      etag: 'W/"20-VP1zPcsrMiwJerbVIDyOLy0If+4"',
      'x-content-type-options': 'nosniff',
      'x-powered-by': 'Express',
      'accept-ranges': 'bytes',
      date: 'Mon, 12 Apr 2021 18:04:08 GMT',
      via: '1.1 varnish',
      'x-client-ip': '45.19.192.15',
      'x-served-by': 'cache-dal21221-DAL',
      'x-cache': 'MISS',
      'x-cache-hits': '0',
      'x-timer': 'S1618250648.303722,VS0,VE359',
      vary: 'Accept, Accept-Encoding, Authorization, Cookie'
    },
    rawHeaders: [
      'Connection',
      'keep-alive',
      'Content-Length',
      '32',
      'server',
      'istio-envoy',
      'content-type',
      'application/json',
      'strict-transport-security',
      'max-age=31536000; includeSubDomains',
      'x-envoy-upstream-service-time',
      '98',
      'etag',
      'W/"20-VP1zPcsrMiwJerbVIDyOLy0If+4"',
      'x-content-type-options',
      'nosniff',
      'x-powered-by',
      'Express',
      'Accept-Ranges',
      'bytes',
      'Date',
      'Mon, 12 Apr 2021 18:04:08 GMT',
      'Via',
      '1.1 varnish',
      'X-Client-IP',
      '45.19.192.15',
      'X-Served-By',
      'cache-dal21221-DAL',
      'X-Cache',
      'MISS',
      'X-Cache-Hits',
      '0',
      'X-Timer',
      'S1618250648.303722,VS0,VE359',
      'Vary',
      'Accept, Accept-Encoding, Authorization, Cookie'
    ],
    trailers: {},
    rawTrailers: [],
    aborted: false,
    upgrade: false,
    url: '',
    method: null,
    statusCode: 404,
    statusMessage: 'Not Found',
    client: TLSSocket {
      _tlsOptions: [Object],
      _secureEstablished: true,
      _securePending: false,
      _newSessionPending: false,
      _controlReleased: true,
      secureConnecting: false,
      _SNICallback: null,
      servername: 'learning.oreilly.com',
      alpnProtocol: false,
      authorized: true,
      authorizationError: null,
      encrypted: true,
      _events: [Object: null prototype],
      _eventsCount: 8,
      connecting: false,
      _hadError: false,
      _parent: null,
      _host: 'learning.oreilly.com',
      _readableState: [ReadableState],
      _maxListeners: undefined,
      _writableState: [WritableState],
      allowHalfOpen: false,
      _sockname: null,
      _pendingData: null,
      _pendingEncoding: '',
      server: undefined,
      _server: null,
      ssl: null,
      _requestCert: true,
      _rejectUnauthorized: true,
      parser: null,
      _httpMessage: [ClientRequest],
      [Symbol(res)]: [TLSWrap],
      [Symbol(verified)]: true,
      [Symbol(pendingSession)]: null,
      [Symbol(async_id_symbol)]: 10,
      [Symbol(kHandle)]: null,
      [Symbol(kSetNoDelay)]: false,
      [Symbol(lastWriteQueueSize)]: 0,
      [Symbol(timeout)]: null,
      [Symbol(kBuffer)]: null,
      [Symbol(kBufferCb)]: null,
      [Symbol(kBufferGen)]: null,
      [Symbol(kCapture)]: false,
      [Symbol(kBytesRead)]: 614,
      [Symbol(kBytesWritten)]: 238,
      [Symbol(connect-options)]: [Object],
      [Symbol(RequestTimeout)]: undefined
    },
    _consuming: true,
    _dumped: false,
    req: ClientRequest {
      _events: [Object: null prototype],
      _eventsCount: 5,
      _maxListeners: undefined,
      outputData: [],
      outputSize: 0,
      writable: true,
      destroyed: true,
      _last: true,
      chunkedEncoding: false,
      shouldKeepAlive: true,
      _defaultKeepAlive: true,
      useChunkedEncodingByDefault: false,
      sendDate: false,
      _removedConnection: false,
      _removedContLen: false,
      _removedTE: false,
      _contentLength: null,
      _hasBody: true,
      _trailer: '',
      finished: true,
      _headerSent: true,
      socket: [TLSSocket],
      _header: 'GET /api/v1/book/%3C9781491952016%3E HTTP/1.1\r\n' +
        'Accept: */*\r\n' +
        'Cache-Control: no-cache\r\n' +
        'Cookie: <uhn7oxvj9u8444ejdlj8nn2kcueke3cf>\r\n' +
        'Connection: keep-alive\r\n' +
        'host: learning.oreilly.com\r\n' +
        'content-type: application/json\r\n' +
        'content-length: 4\r\n' +
        '\r\n',
      _keepAliveTimeout: 0,
      _onPendingData: [Function: noopPendingOutput],
      agent: [Agent],
      socketPath: undefined,
      method: 'GET',
      maxHeaderSize: undefined,
      insecureHTTPParser: undefined,
      path: '/api/v1/book/%3C9781491952016%3E',
      _ended: true,
      res: [Circular *1],
      aborted: false,
      timeoutCb: null,
      upgradeOrConnect: false,
      parser: null,
      maxHeadersCount: null,
      reusedSocket: false,
      host: 'learning.oreilly.com',
      protocol: 'https:',
      [Symbol(kCapture)]: false,
      [Symbol(kNeedDrain)]: false,
      [Symbol(corked)]: 0,
      [Symbol(kOutHeaders)]: [Object: null prototype]
    },
    request: Request {
      _events: [Object: null prototype],
      _eventsCount: 5,
      _maxListeners: undefined,
      uri: [Url],
      method: 'GET',
      followAllRedirects: true,
      resolveWithFullResponse: true,
      headers: [Object],
      body: 'null',
      readable: true,
      writable: true,
      explicitMethod: true,
      _qs: [Querystring],
      _auth: [Auth],
      _oauth: [OAuth],
      _multipart: [Multipart],
      _redirect: [Redirect],
      _tunnel: [Tunnel],
      _rp_resolve: [Function (anonymous)],
      _rp_reject: [Function (anonymous)],
      _rp_promise: [Promise [Object]],
      _rp_callbackOrig: undefined,
      callback: [Function (anonymous)],
      _rp_options: [Object],
      setHeader: [Function (anonymous)],
      hasHeader: [Function (anonymous)],
      getHeader: [Function (anonymous)],
      removeHeader: [Function (anonymous)],
      localAddress: undefined,
      pool: {},
      dests: [],
      __isRequestRequest: true,
      _callback: [Function: RP$callback],
      proxy: null,
      tunnel: true,
      setHost: true,
      originalCookieHeader: '<uhn7oxvj9u8444ejdlj8nn2kcueke3cf>',
      _disableCookies: true,
      _jar: undefined,
      port: 443,
      host: 'learning.oreilly.com',
      path: '/api/v1/book/%3C9781491952016%3E',
      _json: true,
      httpModule: [Object],
      agentClass: [Function: Agent],
      agent: [Agent],
      _started: true,
      href: 'https://learning.oreilly.com/api/v1/book/%3C9781491952016%3E',
      req: [ClientRequest],
      ntick: true,
      response: [Circular *1],
      originalHost: 'learning.oreilly.com',
      originalHostHeaderName: 'host',
      responseContent: [Circular *1],
      _destdata: true,
      _ended: true,
      _callbackCalled: true,
      [Symbol(kCapture)]: false
    },
    toJSON: [Function: responseToJSON],
    caseless: Caseless { dict: [Object] },
    body: { error: 'Could not fetch work' },
    [Symbol(kCapture)]: false,
    [Symbol(RequestTimeout)]: undefined
  }
}
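
In the output above, the request path is /api/v1/book/%3C9781491952016%3E, so the angle brackets from the command were sent as part of the book ID, which may be why the API answers 404. A small sketch of input cleanup follows. normalizeBookId is a hypothetical helper, not part of obooks, and the assumption that IDs are 10 to 13 digits may not hold for every title.

```javascript
// Hypothetical helper: strip placeholder brackets and quotes that were
// copied literally from documentation, then sanity-check the result.
function normalizeBookId(raw) {
  const cleaned = String(raw)
    .trim()
    .replace(/^["'<\s]+|["'>\s]+$/g, "");
  // Assumption: book IDs look like ISBNs (10 to 13 digits).
  if (!/^\d{10,13}$/.test(cleaned)) {
    throw new Error("Unexpected book ID: " + JSON.stringify(raw));
  }
  return cleaned;
}

// The brackets are percent-encoded when placed in a URL path, which
// matches the path shown in the log.
console.log(encodeURIComponent("<9781491952016>")); // %3C9781491952016%3E
console.log(normalizeBookId("<9781491952016>"));
```

The same applies to the cookie value: in the log, the Cookie header is also sent with its surrounding angle brackets.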

docker: Error response from daemon:

Hello, this is my first time using Docker and I got an error. Please help. This is the full error:

docker: Error response from daemon: create $(pwd)/obooks: "$(pwd)/obooks" includes invalid characters for a local volume name, only "[a-zA-Z0-9][a-zA-Z0-9_.-]" are allowed. If you intended to pass a host directory, use absolute path.
See 'docker run --help'.
'datestamp' is not recognized as an internal or external command, operable program or batch file.
'version' is not recognized as an internal or external command, operable program or batch file.
'browserGpcFlag' is not recognized as an internal or external command, operable program or batch file.
'isIABGlobal' is not recognized as an internal or external command, operable program or batch file.
'landingPath' is not recognized as an internal or external command, operable program or batch file.
groups: '=C0001%3A1%2CC0002%3A1%2CC0003%3A1%2CC0004%3A1': no such user
'hosts' is not recognized as an internal or external command, operable program or batch file.
'genVendors' is not recognized as an internal or external command, operable program or batch file.

I also tried the CLI and got an error:

node:internal/modules/cjs/loader:1080
  throw err;
  ^
Error: Cannot find module 'commander'
Require stack:
- C:\Users\Admin\Downloads\obooks\obooks-master\cli.js
    at Module._resolveFilename (node:internal/modules/cjs/loader:1077:15)
    at Module._load (node:internal/modules/cjs/loader:922:27)
    at Module.require (node:internal/modules/cjs/loader:1143:19)
    at require (node:internal/modules/cjs/helpers:110:18)
    at Object.<anonymous> (C:\Users\Admin\Downloads\obooks\obooks-master\cli.js:3:17)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [ 'C:\\Users\\Admin\\Downloads\\obooks\\obooks-master\\cli.js' ]
}
Node.js v18.17.0
'datestamp' is not recognized as an internal or external command,
operable program or batch file.
'version' is not recognized as an internal or external command,
operable program or batch file.
'browserGpcFlag' is not recognized as an internal or external command,
operable program or batch file.
'isIABGlobal' is not recognized as an internal or external command,
operable program or batch file.
'landingPath' is not recognized as an internal or external command,
operable program or batch file.
groups: '=C0001%3A1%2CC0002%3A1%2CC0003%3A1%2CC0004%3A1': no such user
'hosts' is not recognized as an internal or external command,
operable program or batch file.
'genVendors' is not recognized as an internal or external command,
operable program or batch file. 

Cookie format example

Hi,
Can you please provide an example of what the cookie string should look like? Whatever I do, I always get this:
(node:6131) UnhandledPromiseRejectionWarning: StatusCodeError: 401 - {"detail":"Authentication credentials were not provided."}
Thanks

Not downloading book, images only.

> node --version
v14.17.0

> npm -v
6.14.13

> npm list --depth=0
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
+-- [email protected]
`-- [email protected]
> npm i
npm WARN [email protected] No repository field.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules\fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"win32","arch":"x64"})

added 466 packages from 343 contributors and audited 468 packages in 7.234s

24 packages are looking for funding
  run `npm fund` for details

found 2 moderate severity vulnerabilities
  run `npm audit fix` to fix them, or `npm audit` for details

After running the following command: node cli.js -b "9781492077992" -c "actual cookies" it outputs:

+:++:++:++:+      using stored cookies
+:++:++:++:+      downloading: Head First Design Patterns, 2nd Edition
+:++:++:++:+      22 chapters to download, please wait...
+:++:++:++:+      assembling book...

I collected the cookie by logging in to O'Reilly via Chrome and copying the cookie value from the request header in the Network tab of DevTools. Yes, I have an active O'Reilly subscription and can read this particular book in the browser. I checked whether the tool was parsing and storing the correct cookie by inspecting the generated session.json file:

{
    "cookie": "BrowserCookie=x; salesforce_id=x; groot_sessionid=x; logged_in=y; csrfsafari=x; orm-rt=x; sessionid=x; csrftoken=x; orm-jwt=x"
}
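
The stored value above is a single string of name=value pairs separated by "; ". As a rough sketch, a string like that can be split and checked for names that are absent. Both functions are hypothetical and not part of obooks, and the list of names passed to missingCookies is only an example taken from the sample above; which cookies the service actually requires is not stated in this issue.

```javascript
// Hypothetical helper: split a Cookie header string into name/value pairs.
function parseCookieString(cookie) {
  const pairs = new Map();
  for (const part of String(cookie).split(";")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue; // skip fragments without a value
    const name = part.slice(0, idx).trim();
    const value = part.slice(idx + 1).trim();
    if (name) pairs.set(name, value);
  }
  return pairs;
}

// Hypothetical helper: report which of the given names are absent or empty.
function missingCookies(cookie, names) {
  const parsed = parseCookieString(cookie);
  return names.filter((name) => !parsed.get(name));
}

const stored = "BrowserCookie=x; sessionid=x; csrftoken=x; orm-jwt=x";
// Example names only; the required set is an assumption.
console.log(missingCookies(stored, ["sessionid", "orm-jwt", "orm-rt"]));
```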

I am having the same issue as #30, where it only downloads images. I have tried different books, and the result is always the same.

Create cache for chapters

On large books, O'Reilly might suddenly return a 500 error partway through the download. Create a chapter cache so that chapters already downloaded are not fetched again on retry.
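
One possible shape for such a cache, as a sketch only: each chapter is written to a file the first time it is fetched and read back from disk on later runs. fetchChapterCached, the cache directory layout, and the fetchChapter callback are all hypothetical and do not reflect how obooks is structured.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: return a chapter from the on-disk cache if it is
// there, otherwise call fetchChapter(chapterId) and store the result.
async function fetchChapterCached(cacheDir, chapterId, fetchChapter) {
  fs.mkdirSync(cacheDir, { recursive: true });
  // Encode the ID so that slashes cannot escape the cache directory.
  const file = path.join(cacheDir, encodeURIComponent(chapterId) + ".html");
  if (fs.existsSync(file)) {
    return fs.readFileSync(file, "utf8"); // cache hit, no network request
  }
  const html = await fetchChapter(chapterId);
  // Written only after a successful fetch, so a failed request leaves
  // no cache entry and is simply retried next time.
  fs.writeFileSync(file, html, "utf8");
  return html;
}
```

With this approach, a retry after a 500 error would only request the chapters that are not yet on disk. A per-book cache directory (for example, named after the book ID) would keep different books apart.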

Issues with the generated EPUB file (duplicate ch titles & toc, no cover img & css, toc anchor links not working)

Issues:

Running cli.js generates an EPUB file that:

1- Has an <h1> element right after <body> at the beginning of each chapter in the .xhtml files, which results in duplicate title text. The additional <h1> element is not part of the .xhtml file served by the O'Reilly website.
2- Doesn't download the cover image; instead, the cover.jpg file displays a generated table of contents.
3- Has a first page, toc.xhtml, that is a table of contents not part of the book (it duplicates the actual toc).
4- Doesn't download the CSS file.
5- Has anchor links in the actual toc that do not work; they point to a non-existing path, which results in ERR_FILE_NOT_FOUND when clicked (this is not related to the toc on the first page, which is expected to be removed).

Expected Behavior:

1- No duplicate titles in each chapter/section.
2- Display the correct cover image.
3- Remove the first page, toc.xhtml, since the book already contains its own table of contents.
4- Page style identical to the book's CSS (fonts, italic text, etc.) as seen on O'Reilly.
5- Clicking an anchor link in the toc should jump to the target page and not redirect to an incorrect file location.

Steps To Reproduce:

Steps to reproduce the issues:

  1. Run the following command $ ./cli.js -b "9781800560871" -c "Your Cookies"
  2. After the tool completes running successfully and outputs done 📚✨, you will be able to view the issues mentioned above by opening the .epub file using Calibre.

Environment:

  • OS: Ubuntu 21.10 x86_64
  • Kernel: 5.13.0-20-generic
  • Shell: bash 5.1.8
  • Node: v12.22.5
  • npm: v8.1.1
  • Python3: v3.9.7

Anything else:

I do not know whether these issues are reproducible only with specific books.

Issues running obooks

Hi @jenni

I have recently installed obooks and when running:

./cli.js -b "9781492045342" -e "[email protected]" -p "password"

I am facing the following error:

(node:89676) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'oauth' of undefined
    at GoogleAuthentication.goToLoginPage (/Users/MustiDarsh/Desktop/tmp/obooks/lib/authentication.js:99:73)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at async GoogleAuthentication.authenticate (/Users/MustiDarsh/Desktop/tmp/obooks/lib/authentication.js:45:27)
    at async main (/Users/MustiDarsh/Desktop/tmp/obooks/cli.js:64:18)
(node:89676) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:89676) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
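
The warnings above say the rejection came from an async function without a catch block. As a general Node.js pattern, and not a fix for the underlying 'oauth' error, an entry point can be wrapped so that failures are reported once and turned into an exit code. runSafely and the failing main below are hypothetical stand-ins, not code from obooks.

```javascript
// Hypothetical wrapper: run an async entry point and map the outcome
// to an exit code instead of leaving the rejection unhandled.
async function runSafely(main) {
  try {
    await main();
    return 0;
  } catch (err) {
    const message = err && err.message ? err.message : String(err);
    console.error("[Error] " + message);
    return 1;
  }
}

// Stand-in that fails the same way as the report: reading a property
// of undefined inside an async function.
async function failingMain() {
  const response = undefined;
  return response.oauth;
}

// Typical use in an entry script (left commented out here):
// runSafely(failingMain).then((code) => { process.exitCode = code; });
```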

I am wondering whether the project is still working and maintained.

Looking forward to hearing from you.

Regards,

MD
