aerkalov / ebooklib
Python E-book library for handling books in EPUB2/EPUB3 format
Home Page: https://ebooklib.readthedocs.io/
License: GNU Affero General Public License v3.0
We will need to change a couple of API calls. This will make the library work on Python 2.7 at minimum.
When reading an EPUB file, if the NCX file is not present the TOC structure should be obtained by parsing the NAV document instead.
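One way the NAV fallback could look, as a minimal sketch using only the standard library rather than the library's own parser. The function name `toc_from_nav` and the sample document are hypothetical, and nested sections are flattened into a single list for brevity.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"


def toc_from_nav(nav_xml):
    """Return a flat list of (title, href) pairs from the toc nav element."""
    root = ET.fromstring(nav_xml)
    entries = []
    for nav in root.iter("{%s}nav" % XHTML_NS):
        # Only the nav element marked epub:type="toc" describes the TOC.
        if nav.get("{%s}type" % EPUB_NS) != "toc":
            continue
        for link in nav.iter("{%s}a" % XHTML_NS):
            title = "".join(link.itertext()).strip()
            entries.append((title, link.get("href")))
    return entries


# Hypothetical NAV document used only for illustration.
SAMPLE_NAV = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <nav epub:type="toc">
      <ol>
        <li><a href="intro.xhtml">Introduction</a></li>
        <li><a href="chap1.xhtml">Chapter 1</a></li>
      </ol>
    </nav>
  </body>
</html>"""

print(toc_from_nav(SAMPLE_NAV))
```

A real implementation would presumably preserve the nesting of the ol/li structure instead of flattening it.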
There is an issue with passing strings to the lxml parse function and with the dictionary iteritems method.
API will change a lot, but for now we just need something to start working.
def before_write(self, book):
    "Processing before save"

def after_write(self, book):
    "Processing after save"

def before_read(self, book):
    "Processing before read"

def after_read(self, book):
    "Processing after read"

def item_after_read(self, book, item):
    "Process general item after read."

def item_before_write(self, book, item):
    "Process general item before write."

def html_after_read(self, book, chapter):
    "Processing HTML after read."

def html_before_write(self, book, chapter):
    "Processing HTML before write"
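To illustrate how such hooks might be used, here is a small sketch. Everything in it is an assumption made for illustration: `TitleCasePlugin`, `run_hook`, and the dictionary standing in for a chapter object are hypothetical and are not the library's API.

```python
class TitleCasePlugin(object):
    """Hypothetical plugin implementing only one of the hooks."""

    def html_before_write(self, book, chapter):
        "Normalize the chapter title before it is written."
        chapter["title"] = chapter["title"].strip().title()


def run_hook(plugins, hook_name, *args):
    """Call hook_name on every plugin that defines it."""
    for plugin in plugins:
        hook = getattr(plugin, hook_name, None)
        if callable(hook):
            hook(*args)


# A plain dict stands in for a chapter object in this sketch.
chapter = {"title": "  about this book "}
run_hook([TitleCasePlugin()], "html_before_write", None, chapter)
print(chapter["title"])
```

Looking hooks up by name keeps plugins free to implement only the callbacks they care about.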
The navigation XHTML file should extend the standard XHTML chapter file class. We should also be able to use nav.add_item() to add CSS style definitions. For now, this is hard-coded.
I copy-pasted the copyright notice from Booktype. We should remove the references to Booktype from it.
No need for this. Just use normal generator expression.
When an EPUB3 manifest is loaded, the item with cover-image property assigned is not recognized as the cover image.
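A minimal sketch of how the cover item could be detected, assuming the OPF content is available as a string. It uses the standard library's ElementTree instead of the library's own loader; `find_cover_image` and the sample package document are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def find_cover_image(opf_xml):
    """Return (id, href) of the manifest item flagged as cover-image, or None."""
    root = ET.fromstring(opf_xml)
    for item in root.iter("{%s}item" % OPF_NS):
        # The properties attribute is a space-separated list of values.
        properties = (item.get("properties") or "").split()
        if "cover-image" in properties:
            return item.get("id"), item.get("href")
    return None


# Hypothetical package document used only for illustration.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="cover-img" href="images/cover.jpg" media-type="image/jpeg" properties="cover-image"/>
  </manifest>
</package>"""

print(find_cover_image(SAMPLE_OPF))
```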
We should be able to check type of items in EPUB file. Return some kind of ID for different items (image, html, css, ...)
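One possible shape for such a check is to derive an ID from the item's media type. The constant names and values below are hypothetical placeholders for illustration, not identifiers confirmed by the library.

```python
# Hypothetical identifiers; names and values are illustrative only.
ITEM_UNKNOWN = 0
ITEM_IMAGE = 1
ITEM_STYLE = 2
ITEM_DOCUMENT = 3
ITEM_AUDIO = 4
ITEM_VIDEO = 5

_PREFIX_TYPES = {"image": ITEM_IMAGE, "audio": ITEM_AUDIO, "video": ITEM_VIDEO}


def item_type(media_type):
    """Map a media type string to one of the ITEM_* identifiers."""
    if media_type == "application/xhtml+xml":
        return ITEM_DOCUMENT
    if media_type == "text/css":
        return ITEM_STYLE
    # Fall back to the top-level type, e.g. "image" in "image/jpeg".
    prefix = media_type.split("/", 1)[0]
    return _PREFIX_TYPES.get(prefix, ITEM_UNKNOWN)


print(item_type("image/jpeg"), item_type("text/css"), item_type("font/woff"))
```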
The EPUB2 guide element of the OPF file is not parsed when an EPUB file is loaded.
Navigation points in the NCX documents that correspond to book sections have an empty URL assigned for content. This is reported as an error by the epubcheck program.
It would be handy to have Document type also. We should be able to know "this is html document" but we should also know if it is cover.xhtml, nav.xhtml or just another chapter.
When creating chapter content we do two things: we copy the old tags from the original document and we also add a new title tag. We should not add a new title tag if one already exists. Likewise, if no title is defined, we should not replace an existing title with an empty one.
We need a basic plugin which will be able to filter out most of non HTML5 tags, attributes and things like that.
Later we will also need to replace unsupported tags with newer syntax, for instance replacing a tag with an element and CSS, and so on.
Implement a call like get_link_of_href to fetch an item from a book. The question is whether it should return just one item or more than one. I guess returning just one item would be more useful.
We need to support linear flag in spine. The best would be to have option in Item and to be able to mark it somehow when defining spine.
There is some common functionality which should really be in ebooklib.utils. Things like debug, parse, ....
As the title says, the MIME type is not guessed correctly. mimetypes.guess_type can return a string OR a tuple. We only handle the case where it returns a tuple. The end result is the value None for our MIME type.
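A defensive helper along these lines could avoid ending up with None. This is only a sketch: the standard library documents the return value of `mimetypes.guess_type` as a `(type, encoding)` tuple, and the helper simply tolerates both shapes described in the report. The function name and the fallback value are assumptions.

```python
import mimetypes


def guess_media_type(file_name, default="application/octet-stream"):
    """Guess a media type from a file name, never returning None."""
    guessed = mimetypes.guess_type(file_name)
    # Tolerate both a (type, encoding) tuple and a bare string.
    if isinstance(guessed, (tuple, list)):
        guessed = guessed[0]
    # guess_type yields None for unknown extensions; use the fallback then.
    return guessed or default


print(guess_media_type("style/default.css"))
print(guess_media_type("data.zzzunknown"))
```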
Handle properties tag when creating epub file.
If the original document has empty body with no children, body and head elements will be missing from the generated content.
There is a need to add files which are not present in the manifest (for instance - iTunesMetadata.plist, META-INF/com.apple.ibooks.display-options.xml).
Probably keep it as it is right now, but add an argument, for example .add_item(item, manifest=False).
Implement methods for fetching and filtering data in book or chapter.
Add license info, author info and setup.py file.
Add different item types like ITEM_AUDIO and ITEM_VIDEO.
We should unquote filenames when reading them from zip file.
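A short sketch of the unquoting step on Python 3, where the function lives in `urllib.parse` (on Python 2 the equivalent is in the `urllib` module). The helper name and the sample file name are hypothetical.

```python
from urllib.parse import unquote


def normalize_file_name(name):
    """Decode percent-escapes such as %20 in a file name or href."""
    return unquote(name)


print(normalize_file_name("Text/chapter%201.xhtml"))
```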
When metadata found inside the OPF file is parsed, a bogus entry is read and placed in the book's metadata container.
Increase version to 0.15
EPUB3 supports the remote-resources property for video and audio elements, meaning they can be stored somewhere remotely. This applies only to audio and video tags. These items must also be placed in the list of resources. Our EpubItem should be aware of this and not create a local file in EPUB3 in this case.
We need standard plugin which will use tidy to clean chapter content before they are saved in EPUB.
We are using HTML5 parser way too many times. Just put it in the utils module.
It would be best if Cover file also extends EpubHtml. It would be possible to add dynamically other CSS files or JavaScript files with API.
writeEPUB and readEPUB should really be write_epub and read_epub.
Just use six package to make it work better on Python2/Python3.
For instance, epubcheck will complain about P dir="RTL" because RTL is in uppercase. Epubcheck expects such values to be in lowercase.
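A minimal sketch of the normalization, assuming the chapter content is well-formed XML. It uses the standard library's ElementTree for brevity and only touches the `dir` attribute; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def lowercase_dir_attributes(xhtml):
    """Return the markup with every dir attribute value lowercased."""
    root = ET.fromstring(xhtml)
    for element in root.iter():
        value = element.get("dir")
        if value is not None:
            element.set("dir", value.lower())
    return ET.tostring(root, encoding="unicode")


print(lowercase_dir_attributes('<div><p dir="RTL">text</p></div>'))
```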
We must use communicate. Here is a little tip... Read documentation and look at the big red boxes in the documentation.
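Assuming this refers to `subprocess.Popen.communicate`, the pattern could look like the sketch below. The helper name is hypothetical, and the child process is just the current Python interpreter, used to keep the example self-contained.

```python
import subprocess
import sys


def run_and_capture(args, input_bytes=b""):
    """Run a command, feed it input, and collect its output safely."""
    process = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() reads both pipes until EOF, which avoids the deadlock
    # that can occur when a pipe buffer fills up while the parent waits.
    stdout, stderr = process.communicate(input_bytes)
    return process.returncode, stdout, stderr


child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
code, out, err = run_and_capture([sys.executable, "-c", child_code], b"hello")
print(code, out)
```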
We have separate cover template and we used duplicated methods for the same thing. Just extend EpubHtml and use its methods for processing HTML.
For unknown reasons we are using ZIP_STORED flag for every single item in zip file. We should use it only for mimetype file.
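The intended behaviour can be sketched with the standard `zipfile` module: store only the mimetype entry uncompressed and deflate everything else. The function name and the sample file are hypothetical, and the sketch assumes zlib is available for `ZIP_DEFLATED`.

```python
import io
import zipfile


def write_epub_archive(target, files):
    """Write mimetype uncompressed first, then deflate all other files."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        # Only the mimetype entry overrides the archive-wide compression.
        archive.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        for name, data in files.items():
            archive.writestr(name, data)


buffer = io.BytesIO()
write_epub_archive(buffer, {"EPUB/chapter.xhtml": "<p>text</p>" * 100})

with zipfile.ZipFile(buffer) as archive:
    entries = [(info.filename, info.compress_type) for info in archive.infolist()]
print(entries)
```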
The default folder name is hard-coded in the container's XML template and will not reflect the name assigned for the book.
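One way to avoid the hard-coded value is to make the folder name a placeholder in the template. The template text, the `content.opf` file name, and `render_container` are assumptions made for this sketch.

```python
# Hypothetical container template with the folder name as a placeholder.
CONTAINER_XML = """<?xml version="1.0" encoding="utf-8"?>
<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
  <rootfiles>
    <rootfile media-type="application/oebps-package+xml" full-path="%(folder_name)s/content.opf"/>
  </rootfiles>
</container>
"""


def render_container(folder_name):
    """Fill in the folder name chosen for the book."""
    return CONTAINER_XML % {"folder_name": folder_name}


print(render_container("OEBPS"))
```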
Somehow temporary directory with unextracted epub ended up in the repository. I wonder who has put it inside.
Implement an API and create the guide element in the manifest. The guide is a deprecated feature, but we should be able to support it.
We should also be able to support landmark feature:
http://www.idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-nav-def-types-landmarks
Example of deprecated guide:
The guide supports different types:
When reading an EPUB from a file, the book title and its UID fields are not updated. As this information is contained in the metadata, make sure that it is also assigned to corresponding fields.
Read and store linear value from spine structure.
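Reading the value could look like the sketch below, which uses the standard library's ElementTree and treats a missing attribute as "yes", the default in the EPUB specification. The function name and the sample spine are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def read_spine(opf_xml):
    """Return a list of (idref, linear) pairs in spine order."""
    root = ET.fromstring(opf_xml)
    spine = []
    for itemref in root.iter("{%s}itemref" % OPF_NS):
        # A missing linear attribute means the item is linear.
        spine.append((itemref.get("idref"), itemref.get("linear", "yes")))
    return spine


# Hypothetical package document used only for illustration.
SAMPLE_SPINE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <spine>
    <itemref idref="cover" linear="no"/>
    <itemref idref="chap1"/>
  </spine>
</package>"""

print(read_spine(SAMPLE_SPINE))
```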
Sample code will not run. Fix error in README file.
Write basic code samples for read/write operations.
When manipulating a chapter we should be able to add other items (like scripts and stylesheets) to it, and EbookLib should be able to create the links automatically for us.
For instance, you have one style file which you would like to add to other Html files.
from ebooklib import epub

book = epub.EpubBook()

# Shared stylesheet, added to the book as a regular item
style = '''BODY { text-align: justify;}'''
default_css = epub.EpubItem(uid="style_default", file_name="style/default.css", media_type="text/css", content=style)
book.add_item(default_css)

# Chapter that should be linked to the stylesheet
c2 = epub.EpubHtml(title='About this book', file_name='about.xhtml')
c2.content = '<h1>About this book</h1><p>Helou, this is my book! There are many books, but this one is mine.</p>'
c2.add_item(default_css)
Add some basic sample files. Something to show how to use EbookLib library.
We do not preserve XML declarations when creating XML files. What we should do is use the xml_declaration option when calling the etree.tostring function.
Example:
tree_str = etree.tostring(tree, pretty_print=True, encoding='utf-8', xml_declaration=True)