epub-parser

What It Does

Epub-parser is a simple way to make working with epub files more programmer- and web-friendly by converting their data structures to JSON and providing some simplified structures for convenient access to that data.

Specifically, it:

  • takes a lot of essential data from any valid Epub file and makes it available in an "easy" JSON namespace
  • converts the rest of an Epub's metadata to JSON structures, and makes that available via a "raw" namespace
  • takes care of a lot of dirty work in determining the primary ID of a file, the location of the NCX file, the OPS root, etc.
  • builds some boilerplate HTML you can use in your web app to display a user-friendly version of NCX data (note: this functionality is expanded in the epub2html module if you need more like it)

Requirements

This module depends on the xml2js and node-zip modules. The eyes module is also required, but you can use console instead if you prefer.

Installing

npm install epub-parser

Usage

See the example directory for a sample of how to use this module. See also the epub2web, epub-cache, and epub-editor modules, which build on this one, for further examples of how it can be used.

In a nutshell though, it's as simple as this:

var epubParser = require('epub-parser');

epubParser.open(epubFullPath, function (err, epubData) {

	if(err) return console.log(err);
	console.log(epubData.easy);
	console.log(epubData.raw.json.ncx);

});
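
If you prefer promises to callbacks, the call above can be wrapped. The following is a minimal sketch that assumes only the `open(path, callback)` signature shown above. `openEpub` is a hypothetical helper name, not part of the module, and the parser is passed in as an argument so the helper can be tried with a stub.

```javascript
// Hypothetical helper (not part of epub-parser): wraps a Node-style
// `open(path, callback)` call in a Promise.
// `parser` is expected to be the object returned by require('epub-parser').
function openEpub(parser, epubFullPath) {
  return new Promise(function (resolve, reject) {
    parser.open(epubFullPath, function (err, epubData) {
      if (err) return reject(err);
      resolve(epubData);
    });
  });
}

// Usage (assumes epub-parser is installed and the path points to an epub file):
// var epubParser = require('epub-parser');
// openEpub(epubParser, '/path/to/book.epub')
//   .then(function (epubData) { console.log(epubData.easy); })
//   .catch(function (err) { console.log(err); });
```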

Data structure

Use the example with a valid epub file to see how epub-parser builds its data structure. The eyes module will help you inspect it at all levels in a clean way (it's currently a dependency).

Here's an example of the "easy" namespace using a sample file from Jellybooks:

{
    navMapHTML: '<ul><li><a href="safe_house_cover.html">Cover</a></li>\n<li><a href="safe_house_booktitlepage.html">Title Page</a></li>\n<li><a href="safe_house_dedication.html">Dedication</a></li>\n<li><a href="safe_house_toc.html">Table of Contents</a></li>\n<li><a href="safe_house_aboutthisbook.html">A Note on the Isle of Man</a></li>\n<li><a href="safe_house_chapter_01.html">I don’t remember much . . .</a></li>\n<li><a href="safe_house_part_01.html">Part One</a><ul><li><a href="safe_house_chapter_02.html">Chapter One</a></li>\n<li><a href="safe_house_chapter_03.html">Chapter Two</a></li>\n<li><a href="safe_house_chapter_04.html">Chapter Three</a></li>\n<li><a href="safe_house_chapter_05.html">Chapter Four</a></li>\n<li><a href="safe_house_chapter_06.html">Chapter Five</a></li>\n</ul>\n</li>\n<li><a href="jellybooks.html">Jellybooks sweet page</a></li>\n</ul>\n',
    primaryID: {
        name: 'BookId',
        scheme: 'ISBN',
        value: 'safe_house'
    }
}
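
As a rough illustration of working with the "easy" namespace, the sketch below formats the primary ID and pulls the links out of `navMapHTML`. Both helper names are hypothetical, and the regular expression assumes the markup always looks like the sample above (double-quoted `href`, plain-text label); a real HTML parser would be more robust.

```javascript
// Hypothetical helper: formats an easy.primaryID object as "scheme:value".
function formatPrimaryID(primaryID) {
  return primaryID.scheme + ':' + primaryID.value;
}

// Hypothetical helper: extracts { href, label } pairs from an easy.navMapHTML
// string, in document order. Assumes the simple <a href="...">label</a> shape
// seen in the sample output.
function listNavLinks(navMapHTML) {
  var linkPattern = /<a href="([^"]*)">([^<]*)<\/a>/g;
  var links = [];
  var match;
  while ((match = linkPattern.exec(navMapHTML)) !== null) {
    links.push({ href: match[1], label: match[2] });
  }
  return links;
}

// A shortened copy of the sample data shown above.
var sampleEasy = {
  navMapHTML: '<ul><li><a href="safe_house_cover.html">Cover</a></li>\n' +
    '<li><a href="safe_house_part_01.html">Part One</a><ul>' +
    '<li><a href="safe_house_chapter_02.html">Chapter One</a></li>\n' +
    '</ul>\n</li>\n</ul>\n',
  primaryID: { name: 'BookId', scheme: 'ISBN', value: 'safe_house' }
};

console.log(formatPrimaryID(sampleEasy.primaryID)); // ISBN:safe_house
console.log(listNavLinks(sampleEasy.navMapHTML));
```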

Here's another example showing how the NCX data structure is mapped:

{
    docTitle: [
        {
            text: [ 'Safe House' ]
        }
    ],
    docAuthor: [
        {
            text: [ 'Ewan, Chris' ]
        }
    ],
    $: { xmlns: 'http://www.daisy.org/z3986/2005/ncx/', version: '2005-1' },
    navMap: [
        {
            navPoint: [
                {
                    content: [
                        {
                            $: { src: 'safe_house_cover.html' }
                        }
                    ],
                    navLabel: [
                        {
                            text: [ 'Cover' ]
                        }
                    ],
                    $: {
                        class: 'chapter',
                        playOrder: '1',
                        id: 'navPoint-1'
                    }
                },
                {
                    content: [
                        {
                            $: { src: 'safe_house_booktitlepage.html' }
                        }
                    ],
                    navLabel: [
                        {
                            text: [ 'Title Page' ]
                        }
                    ],
                    $: {
                        class: 'chapter',
                        playOrder: '2',
                        id: 'navPoint-2'
                    }
                },
                {
                    content: [
                        {
                            $: { src: 'safe_house_dedication.html' }
                        }
                    ],
                    navLabel: [
                        {
                            text: [ 'Dedication' ]
                        }
                    ],
                    $: {
                        class: 'chapter',
                        playOrder: '3',
                        id: 'navPoint-3'
                    }
                },
                {
                    content: [
                        {
                            $: { src: 'safe_house_toc.html' }
                        }
                    ],
                    navLabel: [
                        {
                            text: [ 'Table of Contents' ]
                        }
                    ],
                    $: {
                        class: 'chapter',
                        playOrder: '4',
                        id: 'navPoint-4'
                    }
                },
                {
                    content: [
                        {
                            $: { src: 'safe_house_aboutthisbook.html' }
                        }
                    ],
                    navLabel: [
                        {
                            text: [ 'A Note on the Isle of Man' ]
                        }
                    ],
                    $: {
                        class: 'chapter',
                        playOrder: '5',
                        id: 'navPoint-5'
                    }
                },
                {
                    content: [
                        {
                            $: { src: 'safe_house_chapter_01.html' }
                        }
                    ],
                    navLabel: [
                        {
                            text: [ 'I don’t remember much . . .' ]
                        }
                    ],
                    $: {
                        class: 'chapter',
                        playOrder: '6',
                        id: 'navPoint-6'
                    }
                },
                {
                    content: [
                        {
                            $: { src: 'safe_house_part_01.html' }
                        }
                    ],
                    navLabel: [
                        {
                            text: [ 'Part One' ]
                        }
                    ],
                    $: {
                        class: 'chapter',
                        playOrder: '7',
                        id: 'navPoint-7'
                    },
                    navPoint: [
                        {
                            content: [
                                {
                                    $: { src: 'safe_house_chapter_02.html' }
                                }
                            ],
                            navLabel: [
                                {
                                    text: [ 'Chapter One' ]
                                }
                            ],
                            $: {
                                class: 'chapter',
                                playOrder: '8',
                                id: 'navPoint-8'
                            }
                        },
                        {
                            content: [
                                {
                                    $: { src: 'safe_house_chapter_03.html' }
                                }
                            ],
                            navLabel: [
                                {
                                    text: [ 'Chapter Two' ]
                                }
                            ],
                            $: {
                                class: 'chapter',
                                playOrder: '9',
                                id: 'navPoint-9'
                            }
                        },
                        {
                            content: [
                                {
                                    $: { src: 'safe_house_chapter_04.html' }
                                }
                            ],
                            navLabel: [
                                {
                                    text: [ 'Chapter Three' ]
                                }
                            ],
                            $: {
                                class: 'chapter',
                                playOrder: '10',
                                id: 'navPoint-10'
                            }
                        },
                        {
                            content: [
                                {
                                    $: { src: 'safe_house_chapter_05.html' }
                                }
                            ],
                            navLabel: [
                                {
                                    text: [ 'Chapter Four' ]
                                }
                            ],
                            $: {
                                class: 'chapter',
                                playOrder: '11',
                                id: 'navPoint-11'
                            }
                        },
                        {
                            content: [
                                {
                                    $: { src: 'safe_house_chapter_06.html' }
                                }
                            ],
                            navLabel: [
                                {
                                    text: [ 'Chapter Five' ]
                                }
                            ],
                            $: {
                                class: 'chapter',
                                playOrder: '12',
                                id: 'navPoint-12'
                            }
                        }
                    ]
                },
                {
                    content: [
                        {
                            $: { src: 'jellybooks.html' }
                        }
                    ],
                    navLabel: [
                        {
                            text: [ 'Jellybooks sweet page' ]
                        }
                    ],
                    $: {
                        class: 'chapter',
                        playOrder: '13',
                        id: 'navPoint-13'
                    }
                }
            ]
        }
    ],
    head: [
        {
            meta: [
                {
                    $: { name: 'dtb:ISBN', content: 'safe_house' }
                },
                {
                    $: { name: 'dtb:generator', content: 'EPUBLib version 3.0' }
                },
                {
                    $: { name: 'dtb:depth', content: '2' }
                },
                {
                    $: { name: 'dtb:totalPageCount', content: '0' }
                },
                {
                    $: { name: 'dtb:maxPageNumber', content: '0' }
                }
            ]
        }
    ]
}
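
Because every element is wrapped in an array and attributes live under `$`, reading the table of contents takes a little unwrapping. The sketch below flattens the nested `navPoint` arrays into an ordered list. `flattenNavPoints` is a hypothetical helper, and it assumes an object shaped like the example above, where every `navPoint` has a `navLabel` and a `content` element.

```javascript
// Hypothetical helper: walks an NCX navPoint array (shaped like the example
// above) and returns a flat, ordered list of { title, src, depth } entries.
function flattenNavPoints(navPoints, depth) {
  depth = depth || 1;
  var entries = [];
  (navPoints || []).forEach(function (navPoint) {
    entries.push({
      title: navPoint.navLabel[0].text[0],
      src: navPoint.content[0].$.src,
      depth: depth
    });
    // A nested navPoint array holds the children of this entry.
    entries = entries.concat(flattenNavPoints(navPoint.navPoint, depth + 1));
  });
  return entries;
}

// A shortened copy of the sample NCX data shown above.
var sampleNcx = {
  navMap: [{
    navPoint: [
      {
        content: [{ $: { src: 'safe_house_cover.html' } }],
        navLabel: [{ text: ['Cover'] }],
        $: { class: 'chapter', playOrder: '1', id: 'navPoint-1' }
      },
      {
        content: [{ $: { src: 'safe_house_part_01.html' } }],
        navLabel: [{ text: ['Part One'] }],
        $: { class: 'chapter', playOrder: '7', id: 'navPoint-7' },
        navPoint: [{
          content: [{ $: { src: 'safe_house_chapter_02.html' } }],
          navLabel: [{ text: ['Chapter One'] }],
          $: { class: 'chapter', playOrder: '8', id: 'navPoint-8' }
        }]
      }
    ]
  }]
};

flattenNavPoints(sampleNcx.navMap[0].navPoint).forEach(function (entry) {
  // Indent each title by its nesting depth.
  console.log(new Array(entry.depth).join('  ') + entry.title + ' -> ' + entry.src);
});
```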

Troubleshooting

Epub-parser is non-validating. Therefore, it will be fairly tolerant as long as the XML is well-formed and the metadata files can be found where they're expected to be found. However, you should use the IDPF's epubcheck validator to make sure you're only using valid epubs as input.

Bug reporting

Please report bugs to vaporbook on github. Your help is appreciated. Also, subscribe to this repo, as it will be used in several projects I'm working on, and I hope to make frequent fixes and updates.

epub-parser's Issues

Fails to open Gutenberg ePubs

Although the test ePub works fine, I get this same error trying to open any ePub from Project Gutenberg:

[TypeError: Cannot read property '0' of undefined]
metadata element error: Cannot read property '0' of undefined
manifest element error: Cannot read property '0' of undefined
must throw this

Here's an example book:

https://dl.dropboxusercontent.com/u/382588/ocean2.0/development/gutenberg-710.epub


Seems this is where the error is thrown:

        function parsePackageElements() {         

          // operates on global vars
          try {
              metadata = opf[opfPrefix+"metadata"][0];
          } catch(e) {
              console.log('metadata element error: '+e.message);
          }
          try {
              manifest = opf[opfPrefix+"manifest"][0];

          } catch (e) {
              console.log('manifest element error: '+e.message);
              console.log('must throw this');
              throw (e);
          }
          try {
              spine = opf[opfPrefix+"spine"][0];
          } catch(e) {
              console.log('spine element error: '+e.message);
              console.log('must throw this');
              throw (e);
          }
          try {
              guide = opf[opfPrefix+"guide"][0];
          } catch (e) {
              ;
          }
        }

It looks like the manifest, spine and guide keys are expected to be prefixed with 'opf:' but they are not so in the object:

{ '$': 
   { 'xmlns:opf': 'http://www.idpf.org/2007/opf',
     'xmlns:dcterms': 'http://purl.org/dc/terms/',
     'xmlns:dc': 'http://purl.org/dc/elements/1.1/',
     'xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
     xmlns: 'http://www.idpf.org/2007/opf',
     version: '2.0',
     'unique-identifier': 'id' },
  metadata: 
   [ { 'dc:rights': [Object],
       'dc:identifier': [Object],
       'dc:creator': [Object],
       'dc:title': [Object],
       'dc:language': [Object],
       'dc:subject': [Object],
       'dc:date': [Object],
       'dc:source': [Object],
       meta: [Object] } ],
  manifest: [ { item: [Object] } ],
  spine: [ { '$': [Object], itemref: [Object] } ],
  guide: [ { reference: [Object] } ] }
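
One possible workaround, sketched here under the assumption that the parsed package object looks like the dump above: try the prefixed key first and fall back to the bare element name. `getPackageElement` is a hypothetical helper, not the module's actual fix.

```javascript
// Hypothetical helper: returns the first child element called `name` from a
// parsed OPF package object, trying the prefixed key (e.g. "opf:manifest")
// first and the bare key ("manifest") second. Returns undefined if neither
// key holds a non-empty array.
function getPackageElement(opf, opfPrefix, name) {
  var candidates = [opfPrefix + name, name];
  for (var i = 0; i < candidates.length; i++) {
    var value = opf[candidates[i]];
    if (Array.isArray(value) && value.length > 0) {
      return value[0];
    }
  }
  return undefined;
}

// A reduced version of the object from the dump above (no "opf:" prefixes).
var sampleOpf = {
  metadata: [{ 'dc:title': ['Example'] }],
  manifest: [{ item: [] }],
  spine: [{ itemref: [] }]
};

console.log(getPackageElement(sampleOpf, 'opf:', 'manifest')); // { item: [] }
console.log(getPackageElement(sampleOpf, 'opf:', 'guide'));    // undefined
```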

Error importing epub

I've encountered an error while parsing, here's the console output including a stack trace:

opsRoot is: (derived from content.opf)
parsing OPF data

events.js:72
throw er; // Unhandled 'error' event
^
TypeError: Cannot read property 'length' of undefined
at /Users/steveridout/code/readlang/node_modules/epub-parser/lib/epub-parser.js:234:36
at Parser.<anonymous> (/Users/steveridout/code/readlang/node_modules/epub-parser/node_modules/xml2js/lib/xml2js.js:384:20)
at Parser.emit (events.js:95:17)
at Object.onclosetag (/Users/steveridout/code/readlang/node_modules/epub-parser/node_modules/xml2js/lib/xml2js.js:348:26)
at emit (/Users/steveridout/code/readlang/node_modules/epub-parser/node_modules/xml2js/node_modules/sax/lib/sax.js:615:33)
at emitNode (/Users/steveridout/code/readlang/node_modules/epub-parser/node_modules/xml2js/node_modules/sax/lib/sax.js:620:3)
at closeTag (/Users/steveridout/code/readlang/node_modules/epub-parser/node_modules/xml2js/node_modules/sax/lib/sax.js:861:5)
at Object.write (/Users/steveridout/code/readlang/node_modules/epub-parser/node_modules/xml2js/node_modules/sax/lib/sax.js:1294:29)
at Parser.exports.Parser.Parser.parseString (/Users/steveridout/code/readlang/node_modules/epub-parser/node_modules/xml2js/lib/xml2js.js:403:31)
at Parser.parseString (/Users/steveridout/code/readlang/node_modules/epub-parser/node_modules/xml2js/lib/xml2js.js:6:61)

If you want I can privately share the offending .epub file to help reproduce the issue, I don't want to share it publicly since I don't own the copyright.

PS: thanks very much for this module!

How to contact Vaporbook?

Your project page says "Please report bugs to vaporbook on github", but there is no way for one user on GitHub to contact another user on GitHub.

I don't have a bug; I just want to know what epub-parser does. I want to extract the book text (not the metadata) from epubs, but it doesn't seem to do that.

License?

Is there no license for this repository? I can't seem to find one, though it could just be me not looking hard enough. I'm not sure that I'd be able to use this module without a license explicitly granting the permissions.

This line in package.json seems to suggest that the license is BSD.

Accessing an Epub's Content

Hi, thanks for this amazing tech. Right now I can access the epub's metadata. My question is: is it possible to access all of the contents?
