aerkalov / ebooklib
Python E-book library for handling books in EPUB2/EPUB3 format
Home Page: https://ebooklib.readthedocs.io/
License: GNU Affero General Public License v3.0
The default folder name is hard-coded in the container's XML template and will not reflect the name assigned for the book.
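One possible fix is to render the container XML from a template that takes the folder name as a parameter. The sketch below is only an illustration: `CONTAINER_TEMPLATE`, `render_container`, the `content.opf` file name and the default folder `EPUB` are assumptions, not the library's actual names.

```python
from xml.sax.saxutils import escape

# Hypothetical template: the content folder is a placeholder instead of a fixed string.
CONTAINER_TEMPLATE = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<container version="1.0" '
    'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n'
    '  <rootfiles>\n'
    '    <rootfile full-path="{folder}/content.opf" '
    'media-type="application/oebps-package+xml"/>\n'
    '  </rootfiles>\n'
    '</container>\n'
)


def render_container(folder_name="EPUB"):
    """Return container.xml text pointing at the given content folder."""
    # Escape the name so it is safe inside a double-quoted XML attribute.
    safe_name = escape(folder_name, {'"': "&quot;"})
    return CONTAINER_TEMPLATE.format(folder=safe_name)


container_xml = render_container("OEBPS")
```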
It would be handy to have a document type as well. We should be able to tell that an item is an HTML document, and also whether it is cover.xhtml, nav.xhtml or just another chapter.
writeEPUB and readEPUB should really be write_epub and read_epub.
We need a basic plugin which will be able to filter out most non-HTML5 tags, attributes and the like.
What we will need later is to replace unsupported tags with the new syntax. For instance, replace an unsupported tag with a supported element plus CSS, and so on.
The API will change a lot, but for now we just need something to start working with.
def before_write(self, book):
    "Processing before save."
def after_write(self, book):
    "Processing after save."
def before_read(self, book):
    "Processing before read."
def after_read(self, book):
    "Processing after read."
def item_after_read(self, book, item):
    "Process general item after read."
def item_before_write(self, book, item):
    "Process general item before write."
def html_after_read(self, book, chapter):
    "Processing HTML after read."
def html_before_write(self, book, chapter):
    "Processing HTML before write."
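As a rough sketch of how hooks like these could be used, a plugin might override only the hooks it cares about while a small dispatcher calls the same hook on every registered plugin. The class names, the `run_hook` helper and the dictionaries standing in for a book and a chapter are all hypothetical.

```python
class BasePlugin(object):
    """Hypothetical base class: every hook is a no-op by default."""

    def before_write(self, book):
        pass

    def html_before_write(self, book, chapter):
        pass


class UppercaseTitlePlugin(BasePlugin):
    """Toy plugin that upper-cases chapter titles before they are written."""

    def html_before_write(self, book, chapter):
        chapter["title"] = chapter["title"].upper()


def run_hook(plugins, hook_name, *args):
    """Call the named hook on each plugin, in registration order."""
    for plugin in plugins:
        getattr(plugin, hook_name)(*args)


# Plain dictionaries stand in for the real book and chapter objects.
book = {"title": "demo"}
chapter = {"title": "intro"}
run_hook([UppercaseTitlePlugin()], "html_before_write", book, chapter)
```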
Add different item types like ITEM_AUDIO and ITEM_VIDEO.
No need for this. Just use normal generator expression.
Handle properties tag when creating epub file.
The EPUB2 guide element of the OPF file is not parsed when an EPUB file is loaded.
Issue with passing a string to the lxml parse function and with the dictionary iteritems method.
When manipulating a chapter we should be able to add other items (such as scripts and stylesheets) to it, and EbookLib should be able to create the links automatically for us.
For instance, you have one style file which you would like to add to other HTML files.
from ebooklib import epub

book = epub.EpubBook()
style = '''BODY { text-align: justify;}'''
default_css = epub.EpubItem(uid="style_default", file_name="style/default.css", media_type="text/css", content=style)
book.add_item(default_css)
c2 = epub.EpubHtml(title='About this book', file_name='about.xhtml')
c2.content='<h1>About this book</h1><p>Helou, this is my book! There are many books, but this one is mine.</p>'
# Link the stylesheet to this chapter
c2.add_item(default_css)
We do not preserve XML declarations when creating XML files. What we should do is use the xml_declaration option when calling the etree.tostring function.
Example:
tree_str = etree.tostring(tree, pretty_print=True, encoding='utf-8', xml_declaration=True)
Sample code will not run. Fix error in README file.
Increase version to 0.15
If the original document has empty body with no children, body and head elements will be missing from the generated content.
The navigation xhtml file should extend the standard xhtml chapter file class. Also, we should be able to use nav.add_item() to add CSS style definitions. For now, this is just hard-coded.
There is a need to add files which are not present in the manifest (for instance - iTunesMetadata.plist, META-INF/com.apple.ibooks.display-options.xml).
Probably keep it as it is right now but add an argument: .add_item(item, manifest=False).
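A minimal sketch of that idea, assuming a simplified stand-in for the book class: every item is written to the archive, but only items added with `manifest=True` are listed in the manifest. The `Book` class and its `manifest_items` method are hypothetical.

```python
class Book(object):
    """Hypothetical, simplified stand-in for the book container."""

    def __init__(self):
        # Each entry is (item, in_manifest); every entry ends up in the archive.
        self.items = []

    def add_item(self, item, manifest=True):
        self.items.append((item, manifest))

    def manifest_items(self):
        """Return only the items that should be listed in the OPF manifest."""
        return [item for item, in_manifest in self.items if in_manifest]


book = Book()
book.add_item("chapter1.xhtml")
book.add_item("iTunesMetadata.plist", manifest=False)
```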
We will need to change a couple of API calls. This will make the library work on Python 2.7 at minimum.
Navigation points in the NCX documents that correspond to book sections have an empty URL assigned for content. This is reported as an error by the epubcheck program.
Implement the API and create the guide element in the manifest. Guide is a deprecated feature, but we should be able to support it.
We should also be able to support the landmarks feature:
http://www.idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-nav-def-types-landmarks
Example of deprecated guide:
The guide supports different types:
For unknown reasons we are using the ZIP_STORED flag for every single item in the zip file. We should use it only for the mimetype file.
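With the standard `zipfile` module this could look roughly like the sketch below: the archive defaults to `ZIP_DEFLATED`, and only the `mimetype` entry, written first, is stored uncompressed. The `write_epub_archive` helper and the sample file name are made up for illustration.

```python
import io
import zipfile


def write_epub_archive(files):
    """Build a zip archive in memory from a dict of archive name -> bytes."""
    buffer = io.BytesIO()
    # Compress entries by default.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry goes first and is the only one stored uncompressed.
        archive.writestr("mimetype", b"application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            # No compress_type given, so the archive default (deflate) applies.
            archive.writestr(name, data)
    return buffer.getvalue()


archive_bytes = write_epub_archive({"EPUB/chapter.xhtml": b"<html/>" * 100})
entries = zipfile.ZipFile(io.BytesIO(archive_bytes)).infolist()
```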
We need to support the linear flag in the spine. The best approach would be to have an option on the item and to be able to mark it somehow when defining the spine.
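One way this could be expressed when defining the spine is to accept either a plain item id or an (id, linear) tuple. This is a sketch using the standard library's ElementTree; `build_spine` and the tuple convention are assumptions, not the library's API.

```python
import xml.etree.ElementTree as ET


def build_spine(entries):
    """Build a spine element from item ids or (item_id, linear) tuples."""
    spine = ET.Element("spine")
    for entry in entries:
        if isinstance(entry, tuple):
            item_id, linear = entry
        else:
            # A bare id means the item is part of the linear reading order.
            item_id, linear = entry, True
        itemref = ET.SubElement(spine, "itemref", idref=item_id)
        if not linear:
            # Only emit the attribute when it differs from the default.
            itemref.set("linear", "no")
    return spine


spine_xml = ET.tostring(build_spine(["intro", ("notes", False)]),
                        encoding="unicode")
```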
We are using HTML5 parser way too many times. Just put it in the utils module.
When reading an EPUB from a file, the book title and its UID fields are not updated. As this information is contained in the metadata, make sure that it is also assigned to corresponding fields.
EPUB3 supports the remote-resources property for video and audio elements, meaning they can be stored somewhere remotely. This applies only to audio and video tags. These items must also be placed in the list of resources. Our EpubItem should be aware of this and not create a local file in EPUB3 in this case.
We must use communicate. Here is a little tip... Read documentation and look at the big red boxes in the documentation.
Implement methods for fetching and filtering data in book or chapter.
We should be able to check type of items in EPUB file. Return some kind of ID for different items (image, html, css, ...)
Implement a call like get_link_of_href to fetch an item from a book. The question is whether it should return just one item or more than one item. I guess it would be more useful to return just one item.
It would be best if Cover file also extends EpubHtml. It would be possible to add dynamically other CSS files or JavaScript files with API.
There is some common functionality which should really be in ebooklib.utils. Things like debug, parse, ....
When an EPUB3 manifest is loaded, the item with cover-image property assigned is not recognized as the cover image.
For instance, P dir="RTL" will complain because RTL is in uppercase. Epubcheck expects them to be in lowercase.
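A small sketch of such a normalization step, using the standard library's ElementTree on a well-formed fragment. The `lowercase_dir_attributes` helper is hypothetical, and a real plugin would more likely work on the parsed chapter tree.

```python
import xml.etree.ElementTree as ET


def lowercase_dir_attributes(xhtml):
    """Return the markup with every dir attribute value lower-cased."""
    root = ET.fromstring(xhtml)
    for element in root.iter():
        if "dir" in element.attrib:
            element.set("dir", element.get("dir").lower())
    return ET.tostring(root, encoding="unicode")


fixed = lowercase_dir_attributes('<div><p dir="RTL">text</p></div>')
```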
Somehow temporary directory with unextracted epub ended up in the repository. I wonder who has put it inside.
I copy-pasted the copyright notice from Booktype. We should remove the references to Booktype from it.
When creating chapter content we do two things: we copy the old tags from the original document and we also add a new title tag. We should not add a new title tag if one already exists. Likewise, if our title is not defined, we should not overwrite an existing title with an empty one.
Write basic code samples for read/write operations.
Just use six package to make it work better on Python2/Python3.
When reading an EPUB file, if the NCX file is not present the TOC structure should be obtained by parsing the NAV document instead.
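As an illustration of the fallback, the sketch below extracts a flat list of (title, href) pairs from the `toc` nav element of a NAV document using the standard library's ElementTree. The `toc_from_nav` helper and the sample document are assumptions, and nested sections are flattened rather than preserved.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"


def toc_from_nav(nav_xhtml):
    """Return (title, href) pairs found in the nav element marked as the toc."""
    root = ET.fromstring(nav_xhtml)
    entries = []
    for nav in root.iter(XHTML_NS + "nav"):
        if nav.get(EPUB_TYPE) != "toc":
            continue  # skip landmarks, page-list and other nav types
        for link in nav.iter(XHTML_NS + "a"):
            title = "".join(link.itertext()).strip()
            entries.append((title, link.get("href")))
    return entries


SAMPLE_NAV = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:epub="http://www.idpf.org/2007/ops"><body>'
    '<nav epub:type="toc"><ol>'
    '<li><a href="intro.xhtml">Introduction</a></li>'
    '<li><a href="about.xhtml">About this book</a></li>'
    '</ol></nav></body></html>'
)
toc = toc_from_nav(SAMPLE_NAV)
```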
We should unquote filenames when reading them from zip file.
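In Python 3 the percent-decoding can be done with `urllib.parse.unquote`. A minimal sketch follows; the `normalize_href` name and the sample path are made up for illustration.

```python
from urllib.parse import unquote


def normalize_href(href):
    """Decode percent-escapes so the name matches the entry stored in the zip."""
    return unquote(href)


decoded = normalize_href("Text/chapter%201.xhtml")
```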
When metadata found inside the OPF file is parsed, a bogus entry is read and placed in the book's metadata container.
Add some basic sample files. Something to show how to use EbookLib library.
We have separate cover template and we used duplicated methods for the same thing. Just extend EpubHtml and use its methods for processing HTML.
Read and store linear value from spine structure.
As the title says, the MIME type is not guessed correctly. mimetypes.guess_type can return a string OR a tuple. We only handle the case where it returns a tuple. The end result is the value None for our MIME type.
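A defensive wrapper could accept both shapes and fall back to a default when nothing is guessed. This is only a sketch: `guess_media_type` and the `application/octet-stream` fallback are assumptions. Note that current CPython documents `mimetypes.guess_type` as returning a `(type, encoding)` tuple, so the string branch exists only to cover the case described in this issue.

```python
import mimetypes


def guess_media_type(file_name, default="application/octet-stream"):
    """Return a media type string for file_name, never None."""
    guessed = mimetypes.guess_type(file_name)
    # Handle the (type, encoding) tuple; a bare string passes through unchanged.
    if isinstance(guessed, tuple):
        guessed = guessed[0]
    # An unknown extension yields None, so fall back to the default.
    return guessed or default


css_type = guess_media_type("style/default.css")
```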
Add license info, author info and setup.py file.
We need a standard plugin which will use tidy to clean chapter content before it is saved in the EPUB.