aerkalov / ebooklib

Python e-book library for handling books in EPUB2/EPUB3 format.

Home Page: https://ebooklib.readthedocs.io/

License: GNU Affero General Public License v3.0

Python 100.00%

epub python python-library

ebooklib's People

171230839 computamike hongquan jeroanan the-happy-hippo booktype mcccclean eos87 rdhyee mjmeintjes fdeandao punchagan leleu madevelopers andyroberts biggani yomguy parisson tomirendo kkucherenkov shyn olexpono unixtech ecrowdmedia dnlzsy clach04 57uff3r kotnik psypherpunk alexrakowski durden digglife rocktee edwardbetts gislab ocrack pombredanne birkbeckctp momingxu alin23 apprendimento wannaphong shanhaiying mr-eddy-n ride90 benjhastings fbigun danielhjames magickcoding bookronin kennyl ifarhankhan olethanh kodeworker michaelstorm petrchpetr foodpoison deborahgu ywzhaiqi yuchou tantale sealemar francofaa rjshaver surya10197 tyronebj lnrsoft afshinamiree optimiz-net wenxq kyuhwas woyun-qing mataoct walelile linrongbin juggernautbooks oscargibson ashadulhoque mediakraken-dependancies einverne tylerwhipple liuyanzhi aserun feitianyiren soroushj maanijou nipundiwan1992 tom-gardner pylixm muhammadzeeshan34 ruthlessruler takishima 06linux rec nmrta easily44 jftavares geeksivan gesmvstasr yishuihanhan

ebooklib's Issues

The EPUB folder name is not configurable.

The default folder name is hard-coded in the container's XML template and does not reflect the name assigned to the book.

Implement document type

It would be handy to have a Document type as well. We should be able to know "this is an HTML document", but we should also know whether it is cover.xhtml, nav.xhtml or just another chapter.
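A minimal sketch of what such a document type could look like, assuming a hypothetical `DocumentKind` enum and a hypothetical `guess_document_kind()` helper that decides from the file name alone. Neither name exists in EbookLib, and a real implementation would more reliably rely on manifest metadata (for example the `nav` property) than on file names.

```python
from enum import Enum
import posixpath


class DocumentKind(Enum):
    # Hypothetical kinds of HTML documents; not EbookLib constants.
    COVER = "cover"
    NAVIGATION = "nav"
    CHAPTER = "chapter"


def guess_document_kind(file_name):
    """Classify an XHTML document by its base file name (heuristic only)."""
    base = posixpath.basename(file_name).lower()
    if base in ("cover.xhtml", "cover.html"):
        return DocumentKind.COVER
    if base in ("nav.xhtml", "nav.html"):
        return DocumentKind.NAVIGATION
    # Anything else is treated as an ordinary chapter.
    return DocumentKind.CHAPTER
```

For example, `guess_document_kind("Text/nav.xhtml")` returns `DocumentKind.NAVIGATION`, while `"chap01.xhtml"` falls through to `DocumentKind.CHAPTER`.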

Make some function names more pythonic + update docs + update examples

writeEPUB and readEPUB should really be write_epub and read_epub.
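One common way to rename functions without breaking existing callers is to keep the old camelCase names as thin aliases that emit a `DeprecationWarning`. In this sketch `write_epub` is only a placeholder with an assumed signature; the alias pattern is the point.

```python
import warnings


def write_epub(name, book, options=None):
    """Placeholder for the real writer; just returns its arguments here."""
    return (name, book, options)


def writeEPUB(*args, **kwargs):
    """Deprecated camelCase alias kept for backward compatibility."""
    warnings.warn("writeEPUB is deprecated, use write_epub instead",
                  DeprecationWarning, stacklevel=2)
    # Forward everything unchanged to the new name.
    return write_epub(*args, **kwargs)
```

`stacklevel=2` makes the warning point at the caller's line rather than at the alias itself.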

Basic plugin for filtering non HTML5 content

We need a basic plugin that can filter out most non-HTML5 tags, attributes and the like.

Later we would also need to replace unsupported tags with new syntax. For instance, replace a tag with an element and CSS, etc.
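A rough sketch of such a filter using the standard library's ElementTree instead of the HTML5 parser EbookLib uses. The allowlists are tiny hypothetical examples, not a real HTML5 tag list. Disallowed elements are unwrapped rather than deleted, so their text survives; the replacement with new syntax is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical allowlists; a real filter would cover all of HTML5.
ALLOWED_TAGS = {"html", "head", "title", "body", "p", "h1", "h2",
                "em", "strong", "a", "span", "div"}
ALLOWED_ATTRS = {"href", "id", "class", "lang", "dir"}


def _local(name):
    """Strip an ElementTree namespace prefix like '{uri}p'."""
    return name.rsplit("}", 1)[-1]


def _append_text(parent, index, text):
    """Append text to whatever precedes position `index` inside `parent`."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def filter_html5(parent):
    """Recursively unwrap disallowed elements and drop disallowed attributes."""
    i = 0
    while i < len(parent):
        child = parent[i]
        filter_html5(child)  # clean descendants first
        if _local(child.tag) in ALLOWED_TAGS:
            for name in list(child.attrib):
                if _local(name) not in ALLOWED_ATTRS:
                    del child.attrib[name]
            i += 1
            continue
        # Unwrap: splice the child's text, children and tail into the parent.
        parent.remove(child)
        _append_text(parent, i, child.text)
        grandchildren = list(child)
        for offset, grandchild in enumerate(grandchildren):
            parent.insert(i + offset, grandchild)
        _append_text(parent, i + len(grandchildren), child.tail)
        i += len(grandchildren)  # grandchildren were already filtered
    return parent
```

For example, `<p onclick="x">Hi <font color="red">big <em>bold</em> text</font> end</p>` becomes `<p>Hi big <em>bold</em> text end</p>`.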

Implement new API for Plugins

API will change a lot, but for now we just need something to start working.

def before_write(self, book):
    "Processing before write."

def after_write(self, book):
    "Processing after write."

def before_read(self, book):
    "Processing before read."

def after_read(self, book):
    "Processing after read."

def item_after_read(self, book, item):
    "Process a general item after read."

def item_before_write(self, book, item):
    "Process a general item before write."

def html_after_read(self, book, chapter):
    "Process HTML after read."

def html_before_write(self, book, chapter):
    "Process HTML before write."
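A sketch of how a writer could drive these hooks. `BasePlugin`, `StripWhitespacePlugin` and `write_book` are hypothetical stand-ins (the book is a plain dict here), and only the write-side hooks are shown; the read-side hooks would be called the same way from the reader.

```python
class BasePlugin(object):
    """No-op defaults so a plugin only overrides the hooks it needs."""

    def before_write(self, book):
        pass

    def after_write(self, book):
        pass

    def item_before_write(self, book, item):
        pass

    def html_before_write(self, book, chapter):
        pass


class StripWhitespacePlugin(BasePlugin):
    """Example plugin: trims whitespace around HTML chapter content."""

    def html_before_write(self, book, chapter):
        chapter["content"] = chapter["content"].strip()


def write_book(book, plugins):
    """Hypothetical writer loop showing the order in which hooks run."""
    for plugin in plugins:
        plugin.before_write(book)
    for item in book["items"]:
        for plugin in plugins:
            plugin.item_before_write(book, item)
            if item["type"] == "html":
                plugin.html_before_write(book, item)
    # ... serialization of the book would happen here ...
    for plugin in plugins:
        plugin.after_write(book)
    return book
```

Giving the base class no-op methods means the writer can call every hook unconditionally, without `hasattr` checks.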

Add additional item types for audio and video files

Add different item types like ITEM_AUDIO and ITEM_VIDEO.
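A sketch of mapping manifest media types to item types, including audio and video. The constant names echo the issue, but their values here are hypothetical and may not match EbookLib's.

```python
# Hypothetical item-type constants; EbookLib's own values may differ.
ITEM_UNKNOWN = 0
ITEM_IMAGE = 1
ITEM_STYLE = 2
ITEM_DOCUMENT = 3
ITEM_AUDIO = 4
ITEM_VIDEO = 5

# Exact media types checked first.
EXACT_TYPES = {
    "text/css": ITEM_STYLE,
    "application/xhtml+xml": ITEM_DOCUMENT,
}

# Top-level media types matched by prefix (e.g. "audio/mpeg" -> "audio").
PREFIX_TYPES = {
    "audio": ITEM_AUDIO,
    "video": ITEM_VIDEO,
    "image": ITEM_IMAGE,
}


def item_type_for(media_type):
    """Return an item-type constant for a manifest media type."""
    media_type = media_type.lower()
    if media_type in EXACT_TYPES:
        return EXACT_TYPES[media_type]
    prefix = media_type.split("/", 1)[0]
    return PREFIX_TYPES.get(prefix, ITEM_UNKNOWN)
```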

Remove dependency on the itertools module

There is no need for it. Just use a plain generator expression.

Handle properties in manifest file when writing to epub

Handle the properties attribute when creating an EPUB file.
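In an EPUB3 package document, `properties` is an attribute on each manifest `<item>` holding a space-separated list of values such as `nav` or `cover-image`. A minimal sketch with the standard library's ElementTree; `manifest_item` is a hypothetical helper, not EbookLib API.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def manifest_item(manifest, item_id, href, media_type, properties=()):
    """Append an <item>; write `properties` only when there is a value."""
    attrs = {"id": item_id, "href": href, "media-type": media_type}
    if properties:
        # EPUB3 stores properties as one space-separated attribute value.
        attrs["properties"] = " ".join(properties)
    return ET.SubElement(manifest, "{%s}item" % OPF_NS, attrs)
```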

Parse EPUB2 guide

The EPUB2 guide element of the OPF file is not parsed when an EPUB file is loaded.
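A minimal sketch of reading the guide with the standard library's ElementTree, assuming the OPF document is already available as bytes. `parse_guide` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}


def parse_guide(opf_xml):
    """Return the EPUB2 <guide> references as a list of dicts ([] if absent)."""
    root = ET.fromstring(opf_xml)
    return [
        {"type": ref.get("type"), "title": ref.get("title"), "href": ref.get("href")}
        for ref in root.findall("opf:guide/opf:reference", NS)
    ]
```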

Creating EPUB files does not work in Python 3.3

The problems are with strings passed to the lxml parse function and with the dictionary iteritems method, which no longer exists in Python 3.

Implement add_item method for EpubHtml

When working with a chapter, we should be able to add other items (like scripts and stylesheets) to it, and EbookLib should create the links for us automatically.

For instance, you might have one style file that you would like to add to several HTML files.

from ebooklib import epub

book = epub.EpubBook()

style = '''BODY { text-align: justify;}'''

default_css = epub.EpubItem(uid="style_default", file_name="style/default.css", media_type="text/css", content=style)
book.add_item(default_css)

c2 = epub.EpubHtml(title='About this book', file_name='about.xhtml')
c2.content = '<h1>About this book</h1><p>Helou, this is my book! There are many books, but this one is mine.</p>'
# Link the stylesheet from this chapter (the API proposed in this issue).
c2.add_item(default_css)
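One way the automatic linking could work is to compute each stylesheet's path relative to the chapter's own directory. `stylesheet_links` is a hypothetical helper, and emitting the tags as strings is a simplification; only the path arithmetic is the point.

```python
import posixpath


def stylesheet_links(chapter_file, css_files):
    """Build <link> tags pointing from a chapter to each stylesheet.

    Paths are relative to the chapter's directory, so a chapter in
    "text/" linking "style/default.css" gets "../style/default.css".
    """
    base = posixpath.dirname(chapter_file) or "."
    return [
        '<link rel="stylesheet" type="text/css" href="%s"/>'
        % posixpath.relpath(css, base)
        for css in css_files
    ]
```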

Preserve XML declaration when creating XML files

We do not preserve XML declarations when creating XML files. What we should do is pass the xml_declaration option to the etree.tostring function.

Example:
from lxml import etree

tree = etree.Element('package')  # placeholder document root
tree_str = etree.tostring(tree, pretty_print=True, encoding='utf-8', xml_declaration=True)

Fix error in README file

Sample code will not run. Fix error in README file.

Head and body elements missing in some cases

If the original document has an empty body with no children, the body and head elements are missing from the generated content.
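A defensive sketch that guarantees both elements exist before serialization, using the standard library's ElementTree. It assumes non-namespaced tags (with the XHTML namespace the tag names would need the `{http://www.w3.org/1999/xhtml}` prefix), and `ensure_head_and_body` is a hypothetical helper rather than a description of how EbookLib builds content.

```python
import xml.etree.ElementTree as ET


def ensure_head_and_body(html):
    """Make sure <html> has <head> and <body> children, even if body was empty."""
    head = html.find("head")
    if head is None:
        head = ET.Element("head")
        html.insert(0, head)  # head must come before body
    body = html.find("body")
    if body is None:
        body = ET.SubElement(html, "body")
    return head, body
```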

Navigation xhtml file should behave like other xhtml files

The navigation XHTML file should extend the standard XHTML chapter class. We should also be able to use nav.add_item() to add CSS style definitions; for now, they are just hard-coded.
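A stripped-down sketch of the proposed class relationship. These `EpubHtml` and `EpubNav` classes are hypothetical stand-ins that carry only the state needed to show the idea; EbookLib's real classes hold much more.

```python
class EpubHtml(object):
    """Hypothetical minimal chapter class that tracks linked items."""

    def __init__(self, file_name, title=""):
        self.file_name = file_name
        self.title = title
        self.links = []

    def add_item(self, item):
        # Remember the item (e.g. a CSS file) so a <link> can be generated later.
        self.links.append(item)


class EpubNav(EpubHtml):
    """Navigation document: inherits add_item instead of hard-coding CSS."""

    def __init__(self, file_name="nav.xhtml", title="Navigation"):
        super(EpubNav, self).__init__(file_name, title)
```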

Implement API to add files which are not present in the manifest

There is a need to add files which are not present in the manifest (for instance - iTunesMetadata.plist, META-INF/com.apple.ibooks.display-options.xml).

Probably keep it as it is now, but add an argument: .add_item(item, manifest=False).
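A sketch of the proposed `manifest` flag: every item is packed into the ZIP, but only manifest items would be listed in the OPF. `Item` and `Book` are hypothetical stand-ins, not EbookLib classes.

```python
import io
import zipfile
from collections import namedtuple

# Hypothetical item: a path inside the ZIP plus its data.
Item = namedtuple("Item", "file_name content")


class Book(object):
    """Hypothetical book that remembers which items belong in the manifest."""

    def __init__(self):
        self.items = []  # list of (item, in_manifest) pairs

    def add_item(self, item, manifest=True):
        self.items.append((item, manifest))

    def manifest_items(self):
        """Items that would be listed in the OPF manifest."""
        return [item for item, in_manifest in self.items if in_manifest]

    def write(self):
        """Pack every item into the ZIP, whether or not it is in the manifest."""
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w") as zf:
            for item, _ in self.items:
                zf.writestr(item.file_name, item.content)
        return buf.getvalue()
```

Usage: `book.add_item(Item("iTunesMetadata.plist", "<plist/>"), manifest=False)` ships the file without declaring it in the manifest.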

Make it work on Python 2.7 / Python 3.3

We will need to change a couple of API calls. Python 2.7 will then be the minimum supported version.

Faulty navigation points in the NCX

Navigation points in the NCX document that correspond to book sections are assigned an empty URL for their content. The epubcheck program reports this as an error.
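One possible fix, which is an assumption rather than necessarily what EbookLib does, is to point a section without its own file at its first descendant that has one, and to refuse to write an empty `src`. Sections are plain dicts here, and `section_href` / `append_content` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def section_href(section):
    """Return the section's href, or the first href found among its descendants."""
    if section.get("href"):
        return section["href"]
    for child in section.get("children", []):
        href = section_href(child)
        if href:
            return href
    return None


def append_content(nav_point, section):
    """Append the NCX <content src=...> element, refusing to write an empty src."""
    href = section_href(section)
    if href is None:
        raise ValueError("TOC entry %r points to nothing" % section["title"])
    return ET.SubElement(nav_point, "content", {"src": href})
```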

Support for guide

Implement the API and create the guide element in the OPF file. The guide is a deprecated feature, but we should still be able to support it.

We should also be able to support the landmarks feature:
http://www.idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-nav-def-types-landmarks

Guide

Example of deprecated guide:

The guide supports the following types:

cover
title-page
toc
index
glossary
acknowledgements
bibliography
colophon
copyright-page
dedication
epigraph
foreword
loi
lot
notes
preface
text
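A sketch of building an EPUB2 `<guide>` that accepts the types listed above. The OPF 2.0 specification also allows custom types whose names start with `other.`. `build_guide` is a hypothetical helper using the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

GUIDE_TYPES = {
    "cover", "title-page", "toc", "index", "glossary", "acknowledgements",
    "bibliography", "colophon", "copyright-page", "dedication", "epigraph",
    "foreword", "loi", "lot", "notes", "preface", "text",
}


def build_guide(references):
    """Build an EPUB2 <guide> from (type, title, href) tuples."""
    guide = ET.Element("guide")
    for ref_type, title, href in references:
        # Custom types are allowed if they are prefixed with "other.".
        if ref_type not in GUIDE_TYPES and not ref_type.startswith("other."):
            raise ValueError("unknown guide type: %r" % ref_type)
        ET.SubElement(guide, "reference",
                      {"type": ref_type, "title": title, "href": href})
    return guide
```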

Remove print statements from source code

Do not use ZIP_STORED for every item in zip file

For unknown reasons we are using the ZIP_STORED flag for every single item in the ZIP file. We should use it only for the mimetype file.
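A sketch with the standard `zipfile` module: the archive defaults to `ZIP_DEFLATED`, and only `mimetype` overrides it with `ZIP_STORED`. The EPUB container format requires `mimetype` to be the first entry and uncompressed. `write_epub_zip` is a hypothetical helper.

```python
import io
import zipfile


def write_epub_zip(files):
    """Write an EPUB container: `mimetype` first and stored, the rest deflated."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Only this entry overrides the archive default with ZIP_STORED.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            zf.writestr(name, data)  # uses the archive default, ZIP_DEFLATED
    return buf.getvalue()
```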

Items in the spine can have a linear flag

We need to support the linear flag in the spine. Ideally there would be an option on Item, plus a way to mark it when defining the spine.
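A sketch of one possible spine convention: plain ids stay linear, while `(id, "no")` pairs mark an item as non-linear. The tuple convention and `build_spine` are assumptions for illustration, not confirmed EbookLib API.

```python
import xml.etree.ElementTree as ET


def build_spine(entries):
    """Build a <spine> from ids or (id, linear) pairs.

    A plain string keeps the default (linear); ("notes", "no") marks the
    item as non-linear, e.g. footnotes reached only through links.
    """
    spine = ET.Element("spine")
    for entry in entries:
        if isinstance(entry, str):
            idref, linear = entry, None
        else:
            idref, linear = entry
        attrs = {"idref": idref}
        if linear is not None:
            if linear not in ("yes", "no"):
                raise ValueError("linear must be 'yes' or 'no', got %r" % linear)
            attrs["linear"] = linear
        ET.SubElement(spine, "itemref", attrs)
    return spine
```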

Put parsing function in the utils module

The HTML5 parsing code is repeated in far too many places. Just put it in the utils module.

Book title and UID fields not assigned when reading from file

When reading an EPUB from a file, the book title and its UID fields are not updated. As this information is contained in the metadata, make sure that it is also assigned to corresponding fields.
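A sketch of pulling both values out of the OPF metadata with the standard library's ElementTree. The package's `unique-identifier` attribute names the `dc:identifier` element that holds the book UID. `read_title_and_uid` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf",
      "dc": "http://purl.org/dc/elements/1.1/"}


def read_title_and_uid(opf_xml):
    """Return (title, uid) from the OPF metadata; missing values are None."""
    root = ET.fromstring(opf_xml)
    title = root.findtext("opf:metadata/dc:title", namespaces=NS)
    uid_id = root.get("unique-identifier")
    uid = None
    for ident in root.iterfind("opf:metadata/dc:identifier", NS):
        # Pick the identifier the package element points at.
        if ident.get("id") == uid_id:
            uid = ident.text
            break
    return title, uid
```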

Have EpubItem for remote resources

EPUB3 supports the remote-resources property for video and audio elements, meaning they can be stored somewhere remotely. This applies only to audio and video tags. These items must still be placed in the list of resources. Our EpubItem should be aware of this and, in this case, not create a local file in the EPUB3.
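A sketch of the distinction the item would need to make: remote entries stay in the resource list but are skipped when packing the ZIP. Treating any absolute `http`/`https` URL as remote is an assumption, and both helpers are hypothetical.

```python
from urllib.parse import urlparse


def is_remote(href):
    """True for absolute http(s) URLs, which must not be packed into the EPUB."""
    return urlparse(href).scheme in ("http", "https")


def files_to_pack(items):
    """Keep every item listed, but only pack local ones into the ZIP."""
    return [item for item in items if not is_remote(item["href"])]
```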

We must not use .wait() for waiting Popen to end

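We must use communicate() instead: wait() can deadlock when stdout=PIPE or stderr=PIPE is used and the child process fills the pipe buffer. Here is a little tip: read the documentation and look at the big red warning boxes.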
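A small demonstration of the safe pattern. The child command is only a stand-in for whatever external program is being run; it writes about 1 MB, far more than a typical pipe buffer.

```python
import subprocess
import sys

# Stand-in child process that writes far more than a pipe buffer holds.
cmd = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() drains stdout/stderr while waiting, so the child never blocks
# on a full pipe; calling proc.wait() here instead could hang forever.
out, err = proc.communicate()
print(proc.returncode, len(out))  # 0 1000000
```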

Extend API with methods for filtering data

Implement methods for fetching and filtering data in a book or chapter.
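A sketch of a generator-based filter over a book's items. Items are plain dicts with hypothetical `file_name` and `media_type` keys, and `iter_items` is not EbookLib API.

```python
def iter_items(items, media_type=None, predicate=None):
    """Yield items matching an optional media type and an optional predicate."""
    for item in items:
        if media_type is not None and item["media_type"] != media_type:
            continue
        if predicate is not None and not predicate(item):
            continue
        yield item
```

Returning a generator keeps the call cheap for large books; callers wrap it in `list()` when they need all matches.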

Check type of Item in epub

We should be able to check the type of items in an EPUB file and return some kind of ID for each kind of item (image, HTML, CSS, ...).

Fix typo in setup.py

Implement different methods for fetching different items from a book

Implement a call like get_link_of_href to fetch an item from a book. The question is whether it should return just one item or more than one. I guess returning just one item would be more useful.
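A sketch of the single-item variant. `find_item_by_href` is a hypothetical name, and items are plain dicts. Returning one item fits the usual case, since each file normally appears only once in the manifest.

```python
def find_item_by_href(items, href):
    """Return the item whose file name matches `href`, or None.

    A fragment such as "#sec2" is ignored, since it points inside the file.
    """
    file_name = href.split("#", 1)[0]
    return next((item for item in items if item["file_name"] == file_name), None)
```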

Cover file should also extend EpubHtml class

It would be best if the cover file also extended EpubHtml. That would make it possible to add other CSS or JavaScript files dynamically through the API.

Move common functions to ebooklib.utils

There is some common functionality that should really be in ebooklib.utils: things like debug, parse, ...

Cover image item

When an EPUB3 manifest is loaded, the item with cover-image property assigned is not recognized as the cover image.
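A sketch of recognizing the cover image when reading an EPUB3 manifest. `properties` is a space-separated list, so it is split rather than compared as a whole string. `find_cover_image` is a hypothetical helper using the standard library's ElementTree.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_cover_image(opf_xml):
    """Return the href of the manifest item marked with the cover-image property."""
    root = ET.fromstring(opf_xml)
    for item in root.iterfind("opf:manifest/opf:item", NS):
        # properties may hold several values, e.g. "cover-image svg".
        if "cover-image" in item.get("properties", "").split():
            return item.get("href")
    return None
```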

Epubcheck fails for some tag attributes

For instance, <p dir="RTL"> triggers an error because RTL is in uppercase; epubcheck expects the value to be lowercase.
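A sketch that lowercases `dir` values before writing, using the standard library's ElementTree. Only `dir` is handled here; whether other enumerated attributes need the same treatment is left open.

```python
import xml.etree.ElementTree as ET


def normalize_dir(root):
    """Lowercase dir attribute values (RTL -> rtl) throughout the tree."""
    for element in root.iter():
        value = element.get("dir")
        if value is not None:
            element.set("dir", value.lower())
    return root
```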

Delete temporary directory

Somehow a temporary directory with an unextracted EPUB ended up in the repository. I wonder who put it there.

Wrong copyright info

I copy-pasted the copyright notice from Booktype. References to Booktype should be removed.

Do not create new title tag in chapter if it already exists

When creating chapter content we do two things: we copy the old tags from the original document, and we add a new title tag. We should not add a new title tag if one already exists. We also should not set an empty title (when no title is defined) if one already exists.
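A sketch of the intended rule with the standard library's ElementTree: add `<title>` only when it is missing, and never overwrite an existing title with an empty value. `ensure_title` is a hypothetical helper and assumes non-namespaced tags.

```python
import xml.etree.ElementTree as ET


def ensure_title(head, title=None):
    """Add <title> only if missing; never blank out an existing one."""
    existing = head.find("title")
    if existing is not None:
        if title:  # replace only when a real title was given
            existing.text = title
        return existing
    # No title yet: XHTML needs one in <head>, even if it is empty.
    element = ET.SubElement(head, "title")
    element.text = title or ""
    return element
```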

standard tidy
https://github.com/w3c/tidy-html5