Giter Site home page Giter Site logo

html4docx's Introduction

HTML FOR DOCX

Convert html to docx, this project is a fork from descontinued pqzx/html2docx.

How install

pip install html-for-docx

Usage

The basic usage: Add HTML formatted to an existing Docx

from html4docx import HtmlToDocx

parser = HtmlToDocx()
html_string = '<h1>Hello world</h1>'
parser.add_html_to_document(html_string, filename_docx)

You can use python-docx to manipulate the file as well, here an example

from docx import Document
from html4docx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()

html_string = '<h1>Hello world</h1>'
new_parser.add_html_to_document(html_string, document)

document.save('your_file_name')

Convert files directly

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.parse_html_file(input_html_file_path, output_docx_file_path)

Convert files from a string

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
docx = new_parser.parse_html_string(input_html_file_string)

Change table styles

Tables are not styled by default. Use the table_style attribute on the parser to set a table style. The style is used for all tables.

from html4docx import HtmlToDocx

new_parser = HtmlToDocx()
new_parser.table_style = 'Light Shading Accent 4'

To add borders to tables, use the TableGrid style:

new_parser.table_style = 'TableGrid'

Default table styles can be found here: https://python-docx.readthedocs.io/en/latest/user/styles-understanding.html#table-styles-in-default-template

Why

My goal to fork and fix/update this package was to complete my current task at work that envolves manipulating a html to docs which the original couldn't complete because was lacking of few features and bugs, so instead creating a package from zero, I prefer update this one.

Differences (fixes and new features)

Fixes

  • Handle missing run for leading br tag | dashingdove from PR
  • Fix base64 images | djplaner from Issue
  • Handle img tag without src attribute | johnjor from PR
  • Fix bug when any style has !important | Dfop02
  • Fix 'style lookup by style_id is deprecated.' | Dfop02

New Features

  • Add Witdh/Height style to images | maifeeulasad from PR
  • Support px, cm, pt and % for style margin-left to paragraphs | Dfop02
  • Improve performance on large tables | dashingdove from PR
  • Support for HTML Pagination | Evilran from PR
  • Support Table style | Evilran from PR
  • Support alternative encoding | HebaElwazzan from PR
  • Refactory Tests to be more consistent and less 'human validation' | Dfop02

License

This project is licensed under the MIT License - see the LICENSE file for details

html4docx's People

Contributors

dfop02 avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

Forkers

kdipippo

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.