springload / draftjs_exporter

Convert Draft.js ContentState to HTML

Home Page: https://www.draftail.org/blog/2018/03/13/rethinking-rich-text-pipelines-with-draft-js

License: MIT License

Python 97.73% Makefile 0.84% Shell 1.30% JavaScript 0.13%
draft-js python exporter draftjs-exporter draftail rich-text

draftjs_exporter's Introduction

Library to convert rich text from Draft.js raw ContentState to HTML.

It is developed alongside the Draftail rich text editor, for Wagtail. Check out the online demo, and our introductory blog post.

Why

Draft.js is a rich text editor framework for React. Its approach is different from most rich text editors because it does not store data as HTML, but rather in its own representation called ContentState. This exporter is useful when the ContentState to HTML conversion has to be done in a Python ecosystem.

The initial use case was to gain more control over the content managed by rich text editors in a Wagtail/Django site. If you want to read the full story, have a look at our blog post: Rethinking rich text pipelines with Draft.js.

Features

This project adheres to Semantic Versioning, and measures performance and code coverage. Code is checked with mypy.

  • Extensive configuration of the generated HTML.
  • Default, extensible block & inline style maps for common HTML elements.
  • Convert line breaks to <br> elements.
  • Define any attribute in the block map – custom class names for elements.
  • React-like API to create custom components.
  • Automatic conversion of entity data to HTML attributes (int & boolean to string, style object to style string).
  • Nested lists (<li> elements go inside <ul> or <ol>, with multiple levels).
  • Output inline styles as inline elements (<em>, <strong>, pick and choose, with any attribute).
  • Overlapping inline style and entity ranges.
  • Static type annotations.
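As a rough illustration of the "style object to style string" conversion mentioned in the list above, here is a minimal sketch. The helper names `camel_to_dash` and `style_to_string`, and the exact output formatting, are assumptions made for this example; the exporter's own implementation may differ.

```python
import re


def camel_to_dash(name):
    """Convert a camelCase CSS property name to its dashed form."""
    return re.sub(r'([A-Z])', lambda m: '-' + m.group(1).lower(), name)


def style_to_string(style):
    """Serialise a style dict into a CSS declaration string."""
    return ''.join(f'{camel_to_dash(prop)}: {value};' for prop, value in style.items())


css = style_to_string({'textDecoration': 'underline'})
```

With this sketch, a `style` prop written as a dict ends up as a plain string that can be placed in an HTML `style` attribute.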

Usage

Draft.js stores data in a JSON representation based on blocks, representing lines of content in the editor, annotated with entities and styles to represent rich text. For more information, this article covers the concepts further.
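To make the block/range model a little more concrete, the sketch below uses a hypothetical block (the key, text, and ranges are made up for illustration) and shows how each range's `offset` and `length` select a slice of the block's text. The helper `ranged_text` is not part of the exporter.

```python
# A hypothetical block, shaped like the raw ContentState blocks described above.
block = {
    'key': 'a1b2c',
    'text': 'Hello, world!',
    'type': 'unstyled',
    'depth': 0,
    'inlineStyleRanges': [{'offset': 0, 'length': 5, 'style': 'BOLD'}],
    'entityRanges': [{'offset': 7, 'length': 5, 'key': 0}],
}


def ranged_text(block, text_range):
    """Return the slice of the block's text covered by a style or entity range."""
    start = text_range['offset']
    return block['text'][start:start + text_range['length']]


styled = [(r['style'], ranged_text(block, r)) for r in block['inlineStyleRanges']]
linked = [(r['key'], ranged_text(block, r)) for r in block['entityRanges']]
```

Here the `BOLD` style covers "Hello", and the entity with key `0` (looked up in the `entityMap`) covers "world".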

Getting started

This exporter takes the Draft.js ContentState data as input, and outputs HTML based on its configuration. To get started, install the package:

pip install draftjs_exporter

We support the following Python versions: 3.8, 3.9, 3.10, 3.11, 3.12, 3.13. For legacy Python versions, find compatible releases in the CHANGELOG.

In your code, create an exporter and use the render method to create HTML:

from draftjs_exporter.dom import DOM
from draftjs_exporter.html import HTML

# Configuration options are detailed below.
config = {}

# Initialise the exporter.
exporter = HTML(config)

# Render a Draft.js `contentState`
html = exporter.render({
    'entityMap': {},
    'blocks': [{
        'key': '6mgfh',
        'text': 'Hello, world!',
        'type': 'unstyled',
        'depth': 0,
        'inlineStyleRanges': [],
        'entityRanges': []
    }]
})

print(html)

You can also run an example by downloading this repository and then using python example.py, or by using our online Draft.js demo.

Configuration

The exporter output is extensively configurable to cater for varied rich text requirements.

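# draftjs_exporter provides default configurations and predefined constants for reuse.
import re

from draftjs_exporter.constants import BLOCK_TYPES, ENTITY_TYPES, INLINE_STYLES
from draftjs_exporter.defaults import BLOCK_MAP, STYLE_MAP
from draftjs_exporter.dom import DOM

# The components (blockquote, list_item, image, br, ...) and LINKIFY_RE used below are user-defined and not shown here.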

config = {
    # `block_map` is a mapping from Draft.js block types to a definition of their HTML representation.
    # Extend BLOCK_MAP to start with sane defaults, or make your own from scratch.
    'block_map': dict(BLOCK_MAP, **{
        # The most basic mapping format, block type to tag name.
        BLOCK_TYPES.HEADER_TWO: 'h2',
        # Use a dict to define props on the block.
        BLOCK_TYPES.HEADER_THREE: {'element': 'h3', 'props': {'class': 'u-text-center'}},
        # Add a wrapper (and wrapper_props) to wrap adjacent blocks.
        BLOCK_TYPES.UNORDERED_LIST_ITEM: {
            'element': 'li',
            'wrapper': 'ul',
            'wrapper_props': {'class': 'bullet-list'},
        },
        # Use a custom component for more flexibility (reading block data or depth).
        BLOCK_TYPES.BLOCKQUOTE: blockquote,
        BLOCK_TYPES.ORDERED_LIST_ITEM: {
            'element': list_item,
            'wrapper': ordered_list,
        },
        # Provide a fallback component (advanced).
        BLOCK_TYPES.FALLBACK: block_fallback
    }),
    # `style_map` defines the HTML representation of inline elements.
    # Extend STYLE_MAP to start with sane defaults, or make your own from scratch.
    'style_map': dict(STYLE_MAP, **{
        # Use the same mapping format as in the `block_map`.
        'KBD': 'kbd',
        # The `style` prop can be defined as a dict, that will automatically be converted to a string.
        'HIGHLIGHT': {'element': 'strong', 'props': {'style': {'textDecoration': 'underline'}}},
        # Provide a fallback component (advanced).
        INLINE_STYLES.FALLBACK: style_fallback,
    }),
    'entity_decorators': {
        # Map entities to components so they can be rendered with their data.
        ENTITY_TYPES.IMAGE: image,
        ENTITY_TYPES.LINK: link,
        # Lambdas work too.
        ENTITY_TYPES.HORIZONTAL_RULE: lambda props: DOM.create_element('hr'),
        # Discard those entities.
        ENTITY_TYPES.EMBED: None,
        # Provide a fallback component (advanced).
        ENTITY_TYPES.FALLBACK: entity_fallback,
    },
    'composite_decorators': [
        # Use composite decorators to replace text based on a regular expression.
        {
            'strategy': re.compile(r'\n'),
            'component': br,
        },
        {
            'strategy': re.compile(r'#\w+'),
            'component': hashtag,
        },
        {
            'strategy': LINKIFY_RE,
            'component': linkify,
        },
    ],
}

See example.py in the repository for more details.
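The `strategy` of a composite decorator is a compiled regular expression. As a hedged, standard-library-only sketch of the idea (not the exporter's own code), the hypothetical helper `split_by_strategy` below splits text into matched and unmatched chunks; a decorator's component would then be applied to the matched ones.

```python
import re

HASHTAG_RE = re.compile(r'#\w+')


def split_by_strategy(text, strategy):
    """Split text into (matched, chunk) pairs, preserving order."""
    parts = []
    cursor = 0
    for match in strategy.finditer(text):
        if match.start() > cursor:
            # Text between the previous match and this one is left untouched.
            parts.append((False, text[cursor:match.start()]))
        parts.append((True, match.group(0)))
        cursor = match.end()
    if cursor < len(text):
        parts.append((False, text[cursor:]))
    return parts


parts = split_by_strategy('Try #draftjs and #python today', HASHTAG_RE)
```

Only the chunks flagged `True` ("#draftjs" and "#python") would be handed to a component such as `hashtag`.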

Advanced usage

Custom components

To generate arbitrary markup with dynamic data, the exporter comes with an API to create rendering components. This API mirrors React’s createElement API (what JSX compiles to).

# All of the API is available from a single `DOM` namespace
from draftjs_exporter.dom import DOM


# Components are simple functions that take `props` as parameter and return DOM elements.
def image(props):
    # This component creates an image element, with the relevant attributes.
    return DOM.create_element('img', {
        'src': props.get('src'),
        'width': props.get('width'),
        'height': props.get('height'),
        'alt': props.get('alt'),
    })


def blockquote(props):
    # This component uses block data to render a blockquote.
    block_data = props['block']['data']

    # Here, we want to display the block's content so we pass the `children` prop as the last parameter.
    return DOM.create_element('blockquote', {
        'cite': block_data.get('cite')
    }, props['children'])


def button(props):
    href = props.get('href', '#')
    icon_name = props.get('icon', None)
    text = props.get('text', '')

    return DOM.create_element('a', {
            'class': 'icon-text' if icon_name else None,
            'href': href,
        },
        # There can be as many `children` as required.
        # It is also possible to reuse other components and render them instead of HTML tags.
        DOM.create_element(icon, {'name': icon_name}) if icon_name else None,
        DOM.create_element('span', {'class': 'icon-text'}, text) if icon_name else text
    )
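The `button` component above passes `None` both as a prop value and as a child. To illustrate how a createElement-style function can treat those values, here is a simplified string-based stand-in. It is a sketch written for this example, not the exporter's `DOM.create_element`: it assumes children are already-rendered strings and ignores void elements such as `img`.

```python
from html import escape


def create_element(tag, props=None, *children):
    """Render a tag as a string, skipping props and children that are None."""
    attrs = ''.join(
        f' {name}="{escape(str(value))}"'
        for name, value in (props or {}).items()
        if value is not None
    )
    inner = ''.join(child for child in children if child is not None)
    return f'<{tag}{attrs}>{inner}</{tag}>'


link = create_element('a', {'class': None, 'href': '#'}, None, 'Read more')
```

Because the `class` prop and the first child are `None`, the result contains only the `href` attribute and the text.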

Apart from create_element, a parse_html method is also available. Use it to interface with other HTML generators, like template engines.

See example.py in the repository for more details.

Fallback components

When dealing with changes in the content schema, as part of ongoing development or migrations, some content can go stale. To solve this, the exporter allows the definition of fallback components for blocks, styles, and entities. This feature is only used for development at the moment. If you have a use case for this in production, we would love to hear from you. Please get in touch!

Add the following to the exporter config:

config = {
    'block_map': dict(BLOCK_MAP, **{
        # Provide a fallback for block types.
        BLOCK_TYPES.FALLBACK: block_fallback
    }),
}

This fallback component can now control the exporter behavior when normal components are not found. Here is an example:

import logging

from draftjs_exporter.dom import DOM


def block_fallback(props):
    type_ = props['block']['type']

    if type_ == 'example-discard':
        logging.warning(f'Missing config for "{type_}". Discarding block, keeping content.')
        # Directly return the block's children to keep its content.
        return props['children']
    elif type_ == 'example-delete':
        logging.error(f'Missing config for "{type_}". Deleting block.')
        # Return None to not render anything, removing the whole block.
        return None
    else:
        logging.warning(f'Missing config for "{type_}". Using div instead.')
        # Provide a fallback.
        return DOM.create_element('div', {}, props['children'])

See example.py in the repository for more details.

Alternative backing engines

By default, the exporter uses a dependency-free engine called string to build the DOM tree. There are alternatives:

  • html5lib (via BeautifulSoup)
  • lxml
  • string_compat (a variant of string with no backwards-incompatible changes since its first release)

The string engine is the fastest, and does not have any dependencies. Its only drawback is that the parse_html method does not escape/sanitise HTML like that of other engines.

  • For html5lib, do pip install draftjs_exporter[html5lib].
  • For lxml, do pip install draftjs_exporter[lxml]. It also requires libxml2 and libxslt to be available on your system.
  • There are no additional dependencies for string_compat.

Then, use the engine attribute of the exporter config:

config = {
    # Specify which DOM backing engine to use.
    'engine': DOM.HTML5LIB,
    # Or for lxml:
    # 'engine': DOM.LXML,
    # Or to use the "maximum output stability" string_compat engine:
    # 'engine': DOM.STRING_COMPAT,
}

Custom backing engines

The exporter supports using custom engines to generate its output via the DOM API. This can be useful to implement custom export formats, e.g. to Markdown (experimental).

Here is an example implementation:

from draftjs_exporter.engines.base import DOMEngine
from draftjs_exporter.html import HTML

class DOMListTree(DOMEngine):
    """
    Element tree using nested lists.
    """

    @staticmethod
    def create_tag(t, attr=None):
        return [t, attr, []]

    @staticmethod
    def append_child(elt, child):
        elt[2].append(child)

    @staticmethod
    def render(elt):
        return elt


exporter = HTML({
    # Use the dotted module syntax to point to the DOMEngine implementation.
    'engine': 'myproject.example.DOMListTree'
})
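The "dotted module syntax" is a common Python convention for pointing at a class by its import path. As a sketch of how such a path can be resolved with the standard library (the helper name `import_from_path` is an assumption for this example, and the exporter may resolve engines differently):

```python
import importlib


def import_from_path(dotted_path):
    """Import `package.module.Name` and return the `Name` attribute."""
    module_path, _, attribute = dotted_path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, attribute)


# A standard-library class stands in for 'myproject.example.DOMListTree' here.
decoder_class = import_from_path('json.JSONDecoder')
```

For the config above to work, the module containing the engine class has to be importable from wherever the exporter runs.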

Type annotations

The exporter’s codebase uses static type annotations, checked with mypy. Reusable types are made available:

from draftjs_exporter.dom import DOM
from draftjs_exporter.types import Element, Props


# Components are simple functions that take `props` as parameter and return DOM elements.
def image(props: Props) -> Element:
    # This component creates an image element, with the relevant attributes.
    return DOM.create_element('img', {
        'src': props.get('src'),
        'width': props.get('width'),
        'height': props.get('height'),
        'alt': props.get('alt'),
    })

Contributing

See anything you like in here? Anything missing? We welcome all support, whether on bug reports, feature requests, code, design, reviews, tests, documentation, and more. Please have a look at our contribution guidelines.

If you just want to set up the project on your own computer, the contribution guidelines also contain all of the setup commands.

Credits

This project is made possible by the work of Springload, a New Zealand digital agency. The beautiful demo site is the work of @thibaudcolas.

View the full list of contributors. MIT licensed.

draftjs_exporter's People

Contributors

ericpai, everyonesdesign, hugovk, joshbarr, likewhoa, loicteixeira, mojeto, stormheg, su27, thibaudcolas, tpict


draftjs_exporter's Issues

Unicode decode error installing on Ubuntu 16.04

pip 10.0.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)

  Downloading https://files.pythonhosted.org/packages/6d/e5/fcf88a5dab82ca619ff6824a062b46d9315ba91e64204275213a1a712125/draftjs_exporter-2.1.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-rmsz5l7l/draftjs-exporter/setup.py", line 91, in <module>
        long_description=md2pypi('README.md'),
      File "/tmp/pip-install-rmsz5l7l/draftjs-exporter/setup.py", line 59, in md2pypi
        content = io.open(filename).read()
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1727: ordinal not in range(128)

Any chance for a version update to master that doesn't run the markdown conversion in the setup.py script?

Unicode decode reading markdown file

Ubuntu 16.04

python 3.5

Collecting draftjs_exporter (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/c3/98/2ae0db16e3841d9d0623b1a2248987e1edd037ab7eaa04e45e4fdf18873b/draftjs_exporter-2.1.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-f_emtego/draftjs-exporter/setup.py", line 38, in <module>
        long_description = f.read()
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1956: ordinal not in range(128)
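Both tracebacks above show a file being read with the platform's default encoding, which is ASCII on the reporters' systems. Byte 0xe2 is how several typographic characters (for instance a curly apostrophe or an en dash) begin in UTF-8. The sketch below reproduces the failure on a sample string and shows the usual remedy of passing the encoding explicitly; the file name and contents are made up for the example.

```python
import os
import tempfile

text = 'React\u2019s createElement API'  # contains U+2019, a non-ASCII apostrophe

# The UTF-8 encoding of U+2019 starts with byte 0xe2, which ASCII cannot decode.
try:
    text.encode('utf-8').decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'README.md')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    # Passing the encoding explicitly avoids depending on the system locale.
    with open(path, encoding='utf-8') as f:
        long_description = f.read()
```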

Add benchmarking tests with real-world content

At the moment the exports test suite is used to measure performance. While the content in it is significant, it's not really representative of real-world workloads.

Once someone has an open-source site using this, I would be keen to dump all of its content in this repository so we have more relevant numbers to look at.

Custom components switch places

I have two custom components: image and direct speech. There's nothing very fancy in their code, e.g. image is rendered like this:

class Image:
    def render(self, props):
        data = props['block']['data']
        return self.wrapper(data)

    def wrapper(self, data):
        caption = data.get('caption', '')

        return tags.div(
            {'class': 'contentImage'},
            self.image(data),
            DOM.create_text_node(caption))

    def image(self, data):
        src = data.get('image', '')
        caption = data.get('caption', '')
        return tags.img({
            'src': src,
            'class': 'contentImage-image',
            'alt': caption})

Then I generate the content for the blocks (unimportant fields are omitted):

{  
   "blocks":[  
      {  
         "text":"Start text",
         "type":"unstyled",
         ...
      },
      {  
         "type":"image",
         "data":{  
            "caption":"Some image caption",
            "image":"http://placekitten.com/401/402"
         },
         ...
      },
      {  
         "text":"Middle text",
         "type":"unstyled",
         ...
      },
      {  
         "type":"direct-speech",
         "data":{  
            "name":"Steve Jobs",
            "image":"http://placekitten.com/500/501",
            "text":"Hello, world!!!!111"
         },
         ...
      },
      {  
         "text":"Last text",
         "type":"unstyled",
         ...
      }
   ]
   ...
}

I expect "Middle text" block to be between the image and direct speech, but for some reason these two blocks are rendered together and the text appears before them:

<p>Start text</p>
<p>Middle text</p>
<div>
   <div class="contentImage">
      <img class="contentImage-image" src="http://placekitten.com/401/402">Some image caption
   </div>
   <blockquote class="directSpeech">
      <div class="directSpeech-image" style="background-image: url(http://placekitten.com/500/501);"></div>
      <div class="directSpeech-wrapper">
         <div class="directSpeech-title">
            Steve Jobs
         </div>
         <p class="directSpeech-text">Hello, world!!!!111</p>
      </div>
   </blockquote>
</div>
<p>Last text</p>

I hope the bug is reproducible; if you have any questions or need more details, I'll provide any information you need.

use case

Forgive my naivete, but why is this package needed? What is its purpose / use case for Python devs? I just found out about draft.js, and don't know a lot about javascript. Thanks.

Update build environments to latest Python version

At the moment the project is tested on Python 2.7, 3.4, 3.5. We need to update this to test Python 3.6 (or whatever the latest version is when we get to this task).

Also, I'm very keen on limiting the number of environments we test on. Would it be reasonable to:

  • Only build on Python 2.7, and the oldest supported Python 3 version (3.4) and the latest, not all intermediary versions?
  • Only build on Python 2.7, and the latest Python 3 version?

Entities with adjacent offset are rendered incorrectly

Test version: v2.1.4

Test with the following code:

from draftjs_exporter.constants import BLOCK_TYPES
from draftjs_exporter.constants import ENTITY_TYPES
from draftjs_exporter.defaults import BLOCK_MAP
from draftjs_exporter.dom import DOM
from draftjs_exporter.html import HTML
import json

a = '''{
    "blocks": [
        {
            "key": "bh6r4",
            "text": "🙄😖",
            "type": "unstyled",
            "depth": 0,
            "inlineStyleRanges": [],
            "entityRanges": [
                {
                "offset": 0,
                "length": 1,
                "key": 7
                },
                {
                "offset": 1,
                "length": 1,
                "key": 8
                }
            ],
            "data": {}
        }
    ],
    "entityMap": {
        "7": {
            "type": "emoji",
            "mutability": "IMMUTABLE",
            "data": {
                "emojiUnicode": "🙄"
            }
        },
        "8": {
            "type": "emoji",
            "mutability": "IMMUTABLE",
            "data": {
                "emojiUnicode": "😖"
            }
        }
    }
}'''


def emoji(props):
    emoji_encode = []
    for c in props.get('emojiUnicode'):
        code = '%04x' % ord(c)
        if code != '200d':
            emoji_encode.append(code)

    return DOM.create_element('span', {
        'data-emoji': '-'.join(emoji_encode),
        'class': 'emoji',
    }, props['children'])


def entity_fallback(props):
    return DOM.create_element('span', {'class': 'missing-entity'},
                              props['children'])


def style_fallback(props):
    return props['children']


def block_fallback(props):
    return DOM.create_element('div', {}, props['children'])


DRAFTJS_EXPORTER_CONFIG = {
    'entity_decorators': {
        'emoji': emoji,
        ENTITY_TYPES.FALLBACK: entity_fallback,
    },
    'block_map': dict(BLOCK_MAP, **{
        BLOCK_TYPES.FALLBACK: block_fallback,
    })
}

exporter = HTML(DRAFTJS_EXPORTER_CONFIG)

if __name__ == '__main__':
    print(exporter.render(json.loads(a)))

Actual output:

<p><span class="emoji" data-emoji="1f644">🙄</span>😖<span class="emoji" data-emoji="1f616"></span></p>

Expected output:

<p><span class="emoji" data-emoji="1f644">🙄</span><span class="emoji" data-emoji="1f616">😖</span></p>
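For reference, the snippet below applies the two entity ranges from the test data to the block text using plain Python slicing, and computes the same hexadecimal code points as the `emoji` component. It only illustrates what each range is expected to cover; it does not diagnose where the exporter goes wrong.

```python
text = '🙄😖'
entity_ranges = [
    {'offset': 0, 'length': 1, 'key': 7},
    {'offset': 1, 'length': 1, 'key': 8},
]

# Each range should cover exactly one emoji (Python indexes strings by code point).
covered = [text[r['offset']:r['offset'] + r['length']] for r in entity_ranges]

# Same formatting as in the `emoji` component above.
code_points = ['%04x' % ord(c) for c in text]
```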

Publish as a wheel – and hash issues with newly-published distributions on existing releases v2.1.7, v2.1.6, v2.1.5

Edit: 🚧 For hash issues since the package got published as a wheel – see comments below.


It looks like the exporter isn’t using wheels as its published format. We should switch over to wheels, which have a number of advantages as described on https://pythonwheels.com/.

It’s not clear to me whether switching from eggs to wheels is a breaking change or not, so the first step would be to research this and decide what to do.

Entity renderers should be given more data on render

Follow-up to #87. At the moment, entity components are only given a small amount of data about an entity for rendering:

props = entity_details['data'].copy()
props['entity'] = {
    'type': entity_details['type'],
}
nodes = DOM.create_element()
for n in self.element_stack:
    DOM.append_child(nodes, n)
elt = DOM.create_element(opts.element, props, nodes)

There are a few more problems here:

  • The shape of the props is different than how this is stored in the Draft.js entityMap.
  • The entity data is given as the top-level props, which makes it impossible to add extra props without risking overwriting the entity data (say if the entity has a field called type that does something different from the entity type).

We should refactor this to something like:

props = {}
props['entity'] = entity_details
props['entity']['key'] = key
props['block'] = block
props['blocks'] = blocks
# (Potentially the `entity_range` as well?)

This way, it's easy to add arbitrary props without risking overwrites. The components get full access to the entity data, mutability, type, and key. And to the current block and blocks list, like styles, decorators, blocks.

Compared to the changes introduced in #90, this would be a breaking change, so should be considered carefully. Users of the exporter will have to rewrite their renderers slightly to use the new props shape.
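To make the overwrite risk concrete, here is a small sketch comparing the two props shapes with plain dicts. The entity data is hypothetical and deliberately contains a field called `entity`; unlike the proposal above, the sketch copies `entity_details` instead of mutating it.

```python
entity_details = {
    'type': 'LINK',
    'mutability': 'MUTABLE',
    # Hypothetical data with a field that happens to be called `entity`.
    'data': {'url': 'https://www.example.com/', 'entity': 'custom value'},
}

# Current shape: entity data is spread over the top-level props,
# so adding the `entity` prop silently replaces the data field of the same name.
current = entity_details['data'].copy()
current['entity'] = {'type': entity_details['type']}

# Proposed shape: everything about the entity lives under a single key.
proposed = {'entity': dict(entity_details, key='0')}
```

In the current shape the original `entity` value is lost, while in the proposed shape it is still reachable as `proposed['entity']['data']['entity']`.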

Drop support for Python 2.7

There are no plans to do this immediately, just opening this early so people interested in this project have a fair heads-up.


I'm looking into Python versions support of the exporter as part of #101. Although that issue isn't anything like a pressing feature / bug to address, I think it's important to be clear that the exporter isn't going to support old versions of Python forever.

Python 2.7 will officially stop being supported as a language on January 1st 2020 (PEP-373), 8 months from now (https://pythonclock.org/). The exporter should also drop Python 2.7 support then, if not before, like many other projects (https://python3statement.org/). The exporter is relatively stable, so there is no need to do this sooner than needed, either in 2020 to align with other projects in the Python ecosystem, or earlier if there is important work done on the exporter and it feels like Python 2.7 support is a hindrance.


Many people rely on this project while using Python 2.7 at the moment. Here are the pip installs over the last 30 days:

| python_version | download_count |
| -------------- | -------------- |
| 3.6            |         18,249 |
| 2.7            |         11,815 |
| 3.7            |          8,106 |
| 3.5            |          3,112 |
| 3.4            |            384 |
| 3.3            |              4 |
| 3.8            |              3 |
| 2.6            |              1 |
| Total          |         41,674 |

If you're one of them, it would be good to hear from you! Please be assured that if upgrading Python versions isn't an option for you, older versions of the exporter will still be available and keep on working. Considering how stable this project is, they'll also most likely keep on being relevant.

Support the "decorator" feature?

Hi there,

I'm working on integrating this library in my own project. It works really nicely, great work, thanks a lot.

And is there a plan to support the decorator rendering? It'll be quite useful.

Missing 0.4 tag

v0.4 is up on pypi but the corresponding tag is missing on GitHub :(

Support arbitrary nesting levels

The exporter should support blocks going from depth 0 to depth 3, or any depth jump, as this frequently happens with real-world HTML.

Remove support for class based decorator

Capturing offline discussions for a public discussion.

Currently a decorator can be written as a function (accepting a single positional argument props) or a class (with a single render method accepting a single positional argument props) which has a few issues:

  • Nothing is passed to the __init__ of the class but instead everything is passed to props which makes the class quite useless. Moving the props to the __init__ will either create boilerplate or will force the user to inherit from a custom class which does that for them. In short, it's not clear which benefit there is to use a class.
  • To keep naming consistent, function names are camel-cased (so when you update your config file, you don't have to think about whether it's a function or a class), which gets the linter to complain. The library should not encourage non-PEP8-compliant code.

We were therefore thinking of removing support for class based decorators. Any thoughts?
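For context, the two decorator styles described above can be sketched as follows. The dispatcher `render_component` and both example components are hypothetical and only illustrate the discussion; they are not taken from the exporter's code, and they return plain strings to stay self-contained.

```python
import inspect


def link(props):
    """Function-based component: receives props, returns the rendered output."""
    return '<a href="%s">%s</a>' % (props['url'], props['children'])


class Link:
    """Class-based component: nothing reaches __init__, everything is in props."""

    def render(self, props):
        return '<a href="%s">%s</a>' % (props['url'], props['children'])


def render_component(component, props):
    """Hypothetical dispatcher supporting both styles."""
    if inspect.isclass(component):
        return component().render(props)
    return component(props)


props = {'url': '#top', 'children': 'Back to top'}
from_function = render_component(link, props)
from_class = render_component(Link, props)
```

Both calls produce the same output, which is the point made above: the class adds an extra layer without receiving any state of its own.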

DOM class variable issue with multiple engines (draftjs_exporter_markdown)

First of all, I would like to thank you as someone who is using this library.
I'm using this library with a markdown version. (draftjs_exporter_markdown)
Because the dom attribute of draftjs_exporter.dom.DOM is a class-level variable, creating two HTML objects with different engines can cause problems, depending on the order in which they are created.
Furthermore, I'm using this on a web server, where this can cause bigger problems in a multithreaded environment.
This affects not only the markdown version, but also the 3 engines built into draftjs_exporter.

Could you solve this problem?
Attached test code and results.

from draftjs_exporter.dom import DOM
from draftjs_exporter.html import HTML
from draftjs_exporter.constants import ENTITY_TYPES
from draftjs_exporter_markdown import ENGINE as MARKDOWN_ENGINE

template = {
    'blocks': [
        {'key': 'rrwx',
         'type': 'unstyled',
         'text': '<a href="https://www.google.com">google.com</a>',
         'depth': 0,
         'inlineStyleRanges': [],
         'entityRanges': [{'offset': 9, 'length': 22, 'key': 0}],
         'data': {}}],
    'entityMap': {'0': {'type': 'LINK',
                        'mutability': 'MUTABLE',
                        'data': {'url': 'https://www.google.com'}}}}

html_exporter = HTML({
    'entity_decorators': {
        ENTITY_TYPES.LINK: lambda props: DOM.create_element('a', {
            'href': props['url']
        }, props['children']),
    },
    'engine': DOM.LXML
})
html1 = html_exporter.render(template)

markdown_exporter = HTML({
    'engine': MARKDOWN_ENGINE
})
html2 = html_exporter.render(template)

print(html1)
print(html2)

output:

<p>&lt;a href="<a href="https://www.google.com">https://www.google.com</a>"&gt;google.com&lt;/a&gt;</p>
<p><a href="<a href="https://www.google.com">https://www.google.com</a>">google.com</a></p>
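The behaviour reported above can be reproduced in miniature with stand-in classes. The `DOM` and `Exporter` classes below are simplified assumptions written for this sketch, not the library's code; they only show how state stored on a class is shared by every instance that uses it.

```python
class DOM:
    # Class-level attribute: a single value shared by all users of the class.
    dom = 'string'

    @classmethod
    def use(cls, engine):
        cls.dom = engine


class Exporter:
    def __init__(self, engine):
        # Configuring the engine here changes it for every other exporter too.
        DOM.use(engine)

    def render(self):
        return DOM.dom


html_exporter = Exporter('lxml')
first = html_exporter.render()

markdown_exporter = Exporter('markdown')
# The first exporter now renders with the engine chosen by the second one.
second = html_exporter.render()
```

Storing the engine on each exporter instance, rather than on a shared class, is one way to avoid this kind of interference.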

Nested inline style configuration question

  • In the https://www.draftail.org/ documentation I searched for: nested inline style.
  • In the issues / pull requests, I searched for: nested inline style
  • In Stack Overflow, I searched for: draftjs exporter nested inline style

I have this block

            {
                "key": "",
                "text": "Bold Italic Underline",
                "type": "unstyled",
                "depth": 0,
                "inlineStyleRanges": [
                    {"offset": 0, "length": 21, "style": "BOLD"},
                    {"offset": 5, "length": 16, "style": "ITALIC"},
                    {"offset": 12, "length": 9, "style": "UNDERLINE"},
                ],
                "entityRanges": [],
                "data": {},
            },

I expect it to render HTML as

<p><strong>Bold <em>Italic <u>Underline</u></em></strong></p>

but it renders HTML as

<p><strong>Bold </strong><strong><em>Italic </em></strong><strong><em><u>Underline</u></em></strong></p>

Is it a bug or is there any configuration?
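One way to read the rendered output above is that the text is cut at every range boundary and each resulting segment is wrapped in all of the styles that cover it. The sketch below implements that reading for the block in question. It is an assumption about the general approach, written for illustration, and not the exporter's actual algorithm.

```python
def style_segments(text, ranges):
    """Cut text at every range boundary and list the styles covering each piece."""
    boundaries = {0, len(text)}
    for r in ranges:
        boundaries.add(r['offset'])
        boundaries.add(r['offset'] + r['length'])
    points = sorted(boundaries)

    segments = []
    for start, end in zip(points, points[1:]):
        styles = sorted(
            r['style'] for r in ranges
            if r['offset'] <= start and end <= r['offset'] + r['length']
        )
        segments.append((text[start:end], styles))
    return segments


segments = style_segments('Bold Italic Underline', [
    {'offset': 0, 'length': 21, 'style': 'BOLD'},
    {'offset': 5, 'length': 16, 'style': 'ITALIC'},
    {'offset': 12, 'length': 9, 'style': 'UNDERLINE'},
])
```

The three segments match the three groups of tags in the rendered HTML: each one repeats the styles of the previous segment and adds one more.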

[WIP] Corner cases to address

Discuss the tricky problems the exporter needs to address. They can be addressed by "doing the right thing", documenting the shortcomings, and/or letting the user configure the output they want.

  • Custom attributes (not data-* attributes but invalid ones like *ngFor) – ok with html5lib
  • Order in which ranges are applied (em in strong or strong in em)
    • Stable (alphabetical) order for now, but not configurable.
  • Order in which attributes are rendered can be different between Python versions - use something like from collections import OrderedDict?
    • html5lib inserts attributes in alphabetical order
  • Nested blocks where nesting jumps one level of depth (depth = 1, then depth = 3)
    • Not supported for now
  • unstyled blocks without text, should they render as empty p tags? br?
    • Empty elements for now.
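On the attribute order point in the list above: sorting attributes by name is one simple way to get the same output regardless of dict ordering or Python version. A minimal sketch, with a hypothetical helper name:

```python
from html import escape


def render_attrs(props):
    """Render attributes in alphabetical order so the output is stable."""
    return ''.join(
        f' {name}="{escape(str(value))}"'
        for name, value in sorted(props.items())
    )


attrs = render_attrs({'src': 'cat.png', 'alt': 'A cat', 'width': 100})
```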

Experiment with PEP-484 type hints

Since Pyre got released, I've been thinking the exporter would be a good project to have type annotations in, either with Pyre or Mypy.

Corresponding PEPs:

Also worth checking out:


As far as I understand, in order to release anything useful, PEP-3107 is the bare minimum, and PEP-484 is a good baseline, so Python 3.5+ only. This means this package would need to drop support for Python 2.7 and 3.4. Wagtail, the main project relying on this package, has already dropped Python 2.7 compatibility, and the last version of Django to support Python 3.4 was v2.0 (https://docs.djangoproject.com/en/dev/faq/install/)

Edit: ^ I might be wrong, since the annotation syntax is supported starting with Python 3 the package should work in versions below 3.5. But type checking will only be doable starting in v3.5+?

This means that starting when Wagtail makes a new release without Django 2.0 support (or without Python 3.4 support, if that comes first), it will be possible to release the exporter with type annotations included (some time in 2019, see https://docs.wagtail.io/en/latest/releases/upgrading.html).

If anyone wants to experiment with this in the meantime, I would be interested to see what bugs this would surface. In my opinion the first step would be to use https://github.com/dropbox/pyannotate. I think there is a similar project from Google that does annotations based on instrumentation of running code.

Exporter loads the html5lib engine even if it isn't used

At the moment, the exporter loads the html5lib engine even if it is configured to use another one. The impact is 11.6 MiB of memory taken for nothing:

Line #    Mem usage    Increment   Line Contents
================================================
     8     11.8 MiB      0.0 MiB   @profile
     9                             def run():
    10     23.4 MiB     11.6 MiB       from draftjs_exporter.html import HTML
    11
    12     23.4 MiB      0.0 MiB       HTML({'engine': 'string'})
[...]

Small demo script:

from memory_profiler import profile


@profile
def run():
    from draftjs_exporter.html import HTML

    HTML({'engine': 'string'}).render({
        'entityMap': {},
        'blocks': [
            {
                "key": "2nols",
                "text": "Test",
                "type": "unstyled",
                "depth": 0,
                "inlineStyleRanges": [],
                "entityRanges": [],
                "data": {}
            },
        ]
    })


run()

This could be a relatively simple fix (see patch below) if DOM.use was used consistently, but there are many tests that aren't calling it explicitly (and thus implicitly rely on the html5lib default). I would suggest waiting for #79 to be taken care of, so we don't have to update all those tests and they can then rely on the new default.

--- a/draftjs_exporter/dom.py
+++ b/draftjs_exporter/dom.py
@@ -3,7 +3,6 @@ from __future__ import absolute_import, unicode_literals
 import inspect
 import re

-from draftjs_exporter.engines.html5lib import DOM_HTML5LIB
 from draftjs_exporter.error import ConfigException

 # Python 2/3 unicode compatibility hack.
@@ -28,7 +27,7 @@ class DOM(object):
     LXML = 'lxml'
     STRING = 'string'

-    dom = DOM_HTML5LIB
+    dom = HTML5LIB

     @staticmethod
     def camel_to_dash(camel_cased_str):
@@ -37,7 +36,7 @@ class DOM(object):
         return dashed_case_str.replace('--', '-')

     @classmethod
-    def use(cls, engine=DOM_HTML5LIB):
+    def use(cls, engine=HTML5LIB):
         """
         Choose which DOM implementation to use.
         """
@@ -45,6 +44,7 @@ class DOM(object):
             if inspect.isclass(engine):
                 cls.dom = engine
             elif engine.lower() == cls.HTML5LIB:
+                from draftjs_exporter.engines.html5lib import DOM_HTML5LIB
                 cls.dom = DOM_HTML5LIB
             elif engine.lower() == cls.LXML:
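
The idea behind the patch above is a deferred import: the engine module is only imported once that engine is actually selected. Here is a minimal, self-contained sketch of the same pattern. The registry and the load_engine function are hypothetical names for this example, and stdlib modules stand in for the real engine modules.

```python
import importlib

# Stand-in registry: stdlib modules play the role of engine modules here.
ENGINES = {
    'string': 'json',
    'etree': 'xml.etree.ElementTree',
}


def load_engine(name):
    """Import the module backing `name` only when it is requested."""
    try:
        module_path = ENGINES[name.lower()]
    except KeyError:
        raise ValueError('Unknown engine: {}'.format(name))
    # Nothing is imported until this line runs for a given engine.
    return importlib.import_module(module_path)


engine = load_engine('string')
print(engine.__name__)
```

With this shape, an engine that is never requested is never imported, so it costs no memory.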

Nested items are not inserted in the right wrapper when lowering depth and increasing it again

Infringing test case:

def test_render_with_backtracking_nested_wrapping(self):

[
                {
                    'text': 'Backtracking, two at once... (2)',
                    'depth': 2,
                },
                {
                    'text': 'Uh oh (1)',
                    'depth': 1,
                },
                {
                    'text': 'Up, up, and away! (2)',
                    'depth': 2,
                },
                {
                    'text': 'Arh! (1)',
                    'depth': 1,
                },
                {
                    'text': 'Did this work? (0)',
                    'depth': 0,
                },
]

Should be:

  • [...]
    • [...]
      • 2 A
    • 1 B
      • 2 C
    • 1 D
  • 0 E

Is (C is not in the right spot):

  • [...]
    • [...]
      • 2 A
      • 2 C
    • 1 B
    • 1 D
  • 0 E
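
For reference, here is a small sketch of how the expected nesting can be derived from depths alone, using a stack of open parents. This is not the exporter's wrapper_state code, only an illustration of the expected behavior; the function names and the placeholder items standing in for the elided "[...]" entries are made up for this example.

```python
def nest_by_depth(items):
    """Build a tree from (text, depth) pairs given in document order."""
    root = {'text': None, 'children': []}
    # stack[d] is the current parent for items at depth d.
    stack = [root]
    for text, depth in items:
        node = {'text': text, 'children': []}
        # Forget every open level deeper than this item's parent.
        del stack[depth + 1:]
        stack[-1]['children'].append(node)
        stack.append(node)
    return root['children']


def outline(nodes, level=0):
    """Flatten the tree back into indented lines, for display."""
    lines = []
    for node in nodes:
        lines.append('  ' * level + node['text'])
        lines.extend(outline(node['children'], level + 1))
    return lines


items = [('0 X', 0), ('1 Y', 1), ('2 A', 2), ('1 B', 1),
         ('2 C', 2), ('1 D', 1), ('0 E', 0)]
print('\n'.join(outline(nest_by_depth(items))))
```

The key step is dropping the deeper levels when the depth decreases, so that a later depth-2 item attaches to the most recent depth-1 item rather than to the previous depth-2 wrapper.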

Change default engine to the new dependency-free one introduced in #77

The exporter now has an engine that doesn't have any dependencies. It should probably be the one activated by default, to make the whole package dependency-free unless another engine is configured. It also happens to be faster, and less memory-hungry.

This is a breaking change though, no matter how little difference there is with the output of the current default (html5lib + BeautifulSoup), so should be part of the 2.0.0 release.

As part of this change, it will also be necessary to move the html5lib / BS4 dependencies to a separate extra like for lxml (pip install draftjs_exporter[html5lib]), as well as update the documentation.

Block-level data support

I began to use draft-js and I discovered a feature called 'block-level data'. Via Modifier#setBlockData you can set additional metadata for a specific block.

In that case the generated JSON looks like this:

{  
   "entityMap": {},
   "blocks": [  
      ...
      {  
         "key": "a4dpo",
         "text": "asdfasdfsadfatsetsd",
         "type": "direct-speech",
         "depth": 0,
         "inlineStyleRanges": [],
         "entityRanges": [],
         "data": {  
            "name": "test"
         }
      },
      ...
   ]
}

Is there any way to render a block differently depending on these attribute values?
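
To illustrate what is being asked, here is a rough sketch of a block renderer that reads block-level data. It assumes the renderer receives a props dict with the block's type, depth and data plus the rendered children; the function name, the blockquote element and the data-speaker attribute are hypothetical. It returns a plain string to stay self-contained, whereas a real component would presumably build an element through the exporter's DOM helper.

```python
from html import escape


def direct_speech(props):
    """Hypothetical block renderer: picks attributes from block-level data."""
    data = props['block']['data']
    # Fall back to a plain blockquote when no name was set on the block.
    name = data.get('name')
    attr = ' data-speaker="{}"'.format(escape(name, quote=True)) if name else ''
    return '<blockquote{}>{}</blockquote>'.format(attr, escape(props['children']))


props = {
    'block': {'type': 'direct-speech', 'depth': 0, 'data': {'name': 'test'}},
    'children': 'asdfasdfsadfatsetsd',
}
print(direct_speech(props))
```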

Create a new dependency-free DOM backing engine

Make a dependency-free implementation of DOMEngine based on xml.etree.ElementTree or xml.etree.cElementTree. This might have a positive performance impact, as well as facilitating the use of the exporter.

For reference, here is my tentative implementation of an engine using ElementTree. For some reason it outputs wrappers twice. I suspect a bug in wrapper_state that this particular implementation surfaces.

class DOM_ETREE(DOMEngine):
    """
    ElementTree implementation of the DOM API.
    """
    @staticmethod
    def create_tag(type_, attr=None):
        if not attr:
            attr = {}

        return etree.Element(type_, attrib=attr)

    @staticmethod
    def parse_html(markup):
        pass

    @staticmethod
    def append_child(elt, child):
        if hasattr(child, 'tag'):
            elt.append(child)
        else:
            c = etree.Element('fragment')
            c.text = child
            elt.append(c)

    @staticmethod
    def render(elt):
        return re.sub(r'(</?fragment>)', '', etree.tostring(elt, method='html'))

    @staticmethod
    def render_debug(elt):
        return etree.tostring(elt, method='html')

Edit: once implemented, this could become the default engine so people can more easily "choose their own adventure" with any other engine, but still have a working default when doing pip install draftjs_exporter.
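
As a side note on the 'fragment' wrapper used above: ElementTree keeps text in the .text and .tail attributes of elements, so text children can be appended without a placeholder element. Below is a minimal sketch of that approach with the standard library only (Python 3). The function names mirror the engine methods above but are standalone, and this does not claim to address the duplicated wrappers.

```python
from xml.etree import ElementTree as etree


def create_tag(type_, attr=None):
    return etree.Element(type_, attrib=dict(attr or {}))


def append_child(elt, child):
    if isinstance(child, str):
        if len(elt):
            # Text after the last child element lives in that child's tail.
            last = elt[-1]
            last.tail = (last.tail or '') + child
        else:
            # Text before any child element lives in the parent's text.
            elt.text = (elt.text or '') + child
    else:
        elt.append(child)


def render(elt):
    # encoding='unicode' returns str rather than bytes.
    return etree.tostring(elt, method='html', encoding='unicode')


p = create_tag('p', {'class': 'intro'})
append_child(p, 'Hello ')
strong = create_tag('strong')
append_child(strong, 'world')
append_child(p, strong)
append_child(p, '!')
print(render(p))
```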

How to apply text align inline style inside header tag?

Example block:

{
    "key": "cu1e2",
    "data": {},
    "text": "test",
    "type": "header-three",
    "depth": 0,
    "entityRanges": [],
    "inlineStyleRanges": [
        {
            "style": "right",
            "length": 4,
            "offset": 0
        }
    ]
},

style_map components should be given data on render

At the moment, style_map components do not receive any data beyond the text to style (as props['children']).

def render_styles(self, text_node):
    node = text_node
    if not self.is_empty():
        # Nest the tags.
        for s in sorted(self.styles, reverse=True):
            opt = Options.for_style(self.style_map, s)
            node = DOM.create_element(opt.element, opt.props, node)

This is ok for common use cases (BOLD, ITALIC, etc), but it makes the style_map fallback rather useless – there is no way to know what style needs the fallback, or have any other information about the context to adjust the fallback behavior.

Here's what the block_map fallback has access to for comparison:

props['block'] = {
    'type': type_,
    'depth': depth,
    'data': data,
}

In retrospect I think this could've been all of the block's attributes, not just a cherry-picked shortlist, so for inline styles we could pass the following exhaustive props:

{
    # The style range to render.
    "range": {
        "offset": 10,
        "length": 17,
        "style": "BOLD"
    },
    # The full block data, eg.
    "block": {
        "key": "t7k7",
        "text": "Unstyled test test test test test",
        "type": "unstyled",
        "depth": 0,
        "inlineStyleRanges": [
            {
                "offset": 10,
                "length": 17,
                "style": "BOLD"
            }
        ],
        "entityRanges": [
            {
                "offset": 0,
                "length": 4,
                "key": 6
            }
        ],
        "data": {}
    },
}

Here's the approximate change:

-    def render_styles(self, text_node):
+    def render_styles(self, text_node, block):
        node = text_node
        if not self.is_empty():
            # Nest the tags.
            for s in sorted(self.styles, reverse=True):
                opt = Options.for_style(self.style_map, s)
+                props['block'] = block
+                props['range'] = s
                node = DOM.create_element(opt.element, opt.props, node)

        return node

Ideally I'd like entities and blocks to also be given more data (enough data to recreate the whole ContentState, thus making the exporter usable to create content migrations), but that's a separate issue.
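
To show why the extra data matters, here is a sketch of what a style fallback could look like if it received the proposed props. This is hypothetical: the "range" and "block" keys follow the proposal in this issue, the function name and the output markup are made up, and it returns a string to stay self-contained.

```python
from html import escape


def style_fallback(props):
    """Hypothetical fallback for styles that are missing from the style_map."""
    # With the proposed props, the fallback knows which style it stands in for
    # and what kind of block the styled text belongs to.
    style = props['range']['style']
    block_type = props['block']['type']
    return '<span data-style="{}" data-block-type="{}">{}</span>'.format(
        escape(style, quote=True),
        escape(block_type, quote=True),
        escape(props['children']),
    )


block = {'type': 'unstyled', 'text': 'Unstyled test test test test test'}
style_range = {'offset': 10, 'length': 17, 'style': 'BOLD'}
start = style_range['offset']
props = {
    'range': style_range,
    'block': block,
    'children': block['text'][start:start + style_range['length']],
}
print(style_fallback(props))
```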

About the performance of bs4 + html5lib

Hi Thibaud Colas,

I've run into another problem. In my project, a real-world note that is not very long but contains a lot of entities takes more than 5 seconds to render. That's unacceptable for an online service, so I tried reducing the number of temporary wrapper elements to speed things up. That brought it down to a little over 3 seconds, which is the best I could do.

But when I tried to use lxml instead of html5lib, the rendering time decreased to less than 1 second!

WTH? Then I found someone's benchmark, which explains the huge difference (with Python 2).

And here's my simple test case with a few images and "subjects". With lxml, the rendering takes 0.17 seconds:

def Soup(raw_str):
    """
    Wrapper around BeautifulSoup to keep the code DRY.
    """
    return BeautifulSoup(raw_str, 'lxml')

         47459 function calls (46117 primitive calls) in 0.164 seconds

   Ordered by: cumulative time
   List reduced from 257 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.165    0.165 html.py:24(render)
       16    0.001    0.000    0.115    0.007 html.py:39(render_block)
      133    0.002    0.000    0.091    0.001 dom.py:35(create_element)
       38    0.000    0.000    0.091    0.002 entity_state.py:55(render_entitities)
        6    0.001    0.000    0.073    0.012 benchmark.py:173(render)
      174    0.001    0.000    0.072    0.000 dom.py:17(Soup)
      174    0.008    0.000    0.071    0.000 __init__.py:87(__init__)
      172    0.001    0.000    0.059    0.000 dom.py:28(create_tag)
        1    0.000    0.000    0.049    0.049 wrapper_state.py:111(to_string)
        1    0.000    0.000    0.049    0.049 dom.py:124(render)
      174    0.001    0.000    0.042    0.000 __init__.py:285(_feed)
      174    0.001    0.000    0.041    0.000 _lxml.py:246(feed)
 1166/515    0.002    0.000    0.037    0.000 {hasattr}
      107    0.001    0.000    0.036    0.000 element.py:1029(__getattr__)
      107    0.000    0.000    0.034    0.000 element.py:1273(find)
      107    0.001    0.000    0.034    0.000 element.py:1284(find_all)
      174    0.007    0.000    0.034    0.000 {method 'feed' of 'lxml.etree._FeedParser' objects}
      107    0.003    0.000    0.033    0.000 element.py:518(_find_all)
        2    0.000    0.000    0.028    0.014 element.py:1077(__unicode__)
    336/2    0.008    0.000    0.028    0.014 element.py:1105(decode)

But with html5lib it takes about 0.65 seconds.

         178504 function calls (177142 primitive calls) in 0.663 seconds

   Ordered by: cumulative time
   List reduced from 455 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.663    0.663 html.py:24(render)
      174    0.001    0.000    0.572    0.003 dom.py:17(Soup)
      174    0.008    0.000    0.571    0.003 __init__.py:87(__init__)
       16    0.001    0.000    0.551    0.034 html.py:39(render_block)
      174    0.001    0.000    0.543    0.003 __init__.py:285(_feed)
      174    0.003    0.000    0.542    0.003 _html5lib.py:57(feed)
      172    0.001    0.000    0.488    0.003 dom.py:28(create_tag)
      133    0.002    0.000    0.412    0.003 dom.py:35(create_element)
       38    0.000    0.000    0.382    0.010 entity_state.py:55(render_entitities)
      174    0.011    0.000    0.350    0.002 html5parser.py:55(__init__)
        6    0.001    0.000    0.256    0.043 benchmark.py:173(render)
      174    0.001    0.000    0.188    0.001 html5parser.py:225(parse)
      174    0.002    0.000    0.187    0.001 html5parser.py:81(_parse)
     5916    0.104    0.000    0.176    0.000 utils.py:49(__init__)
      174    0.007    0.000    0.145    0.001 html5parser.py:157(mainLoop)
        1    0.000    0.000    0.104    0.104 wrapper_state.py:111(to_string)
        1    0.000    0.000    0.104    0.104 dom.py:124(render)
      174    0.019    0.000    0.090    0.001 html5parser.py:874(__init__)
       16    0.000    0.000    0.089    0.006 wrapper_state.py:125(element_for)
      174    0.054    0.000    0.086    0.000 html5parser.py:422(getPhases)

So, any suggestions for optimizing? And I don't know whether html5lib is good enough for us to ignore the performance issue. What do you think? Thank you~
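
For anyone who wants to reproduce this kind of report, here is a minimal sketch using cProfile and pstats from the standard library, sorted by cumulative time and limited to 20 rows like the listings above. The profile_call helper and the workload function are stand-ins for this example; in practice the profiled call would be the exporter's render.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=20, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Same ordering and row limit as the listings in this issue.
    stats.sort_stats('cumulative').print_stats(limit)
    return result, stream.getvalue()


def workload(n):
    # Stand-in for the real rendering call.
    return sum(len(str(i)) for i in range(n))


result, report = profile_call(workload, 10000)
print(report)
```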

Improve dependencies compatibility definition, and testing

Raised by @loicteixeira in https://github.com/springload/draftjs_exporter/pull/74/files#r135967185. Opening as a separate issue because it's worth discussing and there is no point in holding that PR for this.

At the moment, draftjs_exporter defines its dependencies as:

dependencies = [
    'beautifulsoup4>=4.4.1,<5',
    'html5lib>=0.999,<=1.0b10',
]
lxml_dependencies = [
    'lxml>=3.6.0',
]

Those ranges are purposefully wide (we want to support as many versions as possible for something as fundamental to people's tech stacks), which is good, but as @loicteixeira puts it, it would then make sense to test accordingly, with at least the lower and upper bounds.


More info to help in the decision,

    "beautifulsoup4>=4.5.1",
    "html5lib>=0.999,<1",

Finally, bear in mind that our usage of the APIs of those dependencies is very small (HTML string -> DOM nodes conversion, DOM nodes -> HTML string conversion, create nodes, append child to node), which means that the potential breakage would only likely be in how those engines handle specific content, which is hard to test for. We do however have a small test suite of "potential engine quirks": https://github.com/springload/draftjs_exporter/blob/master/tests/engines/test_engines_differences.py

How to interlink with mathjax to export math equations HTML

"entityMap": {
    "0": {
        "type": "INLINE-EQUATION",
        "mutability": "IMMUTABLE",
        "data": {"text": "u=1/5, and v=3/5"}
    }
}

I have tried something like this, but it's not working:

"entity_decorators": {
    "INLINE-EQUATION": lambda props: DOM.create_element(
        "span", {"class": "mjx-process", "data-inline-math": props["text"]}
    ),
}

I need to generate the math HTML. Can anyone help me with this?
