Custom html entity defs (python-ftfy issue, closed)

rspeer commented on August 20, 2024
Custom html entity defs

from python-ftfy.

Comments (4)

rspeer commented on August 20, 2024

Sorry for letting this suggestion lapse.

I think it would be argument bloat to make the entities customizable, but on the other hand, my design goal is to make the right thing happen as often as possible, so I could just code in some extra entities. Is this the entire list of sometimes-accepted but non-standard HTML entities, or are there more?

chbrown commented on August 20, 2024

That's not the entire list by any means, just the entity-looking things that showed up in my particular messily-encoded file. :)

I don't know if there's a good standard list somewhere, since those characters are outside the W3C spec, but here's some relevant Chromium source code and some from WebKit (using "Sacute" as a relatively unambiguous search term). Or maybe it'd be easier to incorporate the already-Python constants from html5lib?

That full html5lib list might be overboard, though? There are quite a few characters in there!

In [1]: import html5lib; len(html5lib.constants.entities)
Out[1]: 2231
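
For readers without html5lib installed, the gap between the older and newer entity lists can be illustrated with the standard library alone. This is a rough sketch, assuming Python 3's html.entities tables as stand-ins for the lists discussed above: name2codepoint (HTML 4 names) and html5 (HTML5 names). The helper entity_coverage is a hypothetical name used only for this illustration.

```python
from html.entities import html5, name2codepoint


def entity_coverage(name):
    """Report whether a bare entity name appears in each stdlib table.

    Hypothetical helper for illustration; not part of ftfy or html5lib.
    """
    return {
        "html4": name in name2codepoint,
        # Keys in the html5 table include the trailing semicolon.
        "html5": name + ";" in html5,
    }


sacute = entity_coverage("Sacute")  # the search term used in this thread
eacute = entity_coverage("eacute")  # a name present in both tables
```

Here "Sacute" is missing from the HTML 4 table but present in the HTML5 one, which is the kind of entity this issue is about.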

rspeer commented on August 20, 2024

A desirable outcome for me is a list of entities that I don't have to curate myself, and html5lib looks like a good choice there. If it's important to decode these entities in an HTML5 parser, it's probably also important in ftfy.
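
As a minimal sketch of what decoding from a borrowed, uncurated table could look like (not the code in ftfy itself), the snippet below uses the standard library's html.entities.html5 dict as a stand-in for html5lib's constants. The names ENTITY_RE and decode_named_entities are hypothetical, and the choice to handle only semicolon-terminated names is an assumption made to keep the example short.

```python
import re
from html.entities import html5  # stdlib stand-in for an external entity table

# Match only semicolon-terminated names such as "&Sacute;".
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")


def decode_named_entities(text):
    """Replace known named entities and leave unknown ones untouched."""
    def replace(match):
        # Table keys look like "Sacute;"; fall back to the original text.
        return html5.get(match.group(1), match.group(0))

    return ENTITY_RE.sub(replace, text)


sample = decode_named_entities("&Sacute;roda &notanentity; &amp")
```

Unknown names pass through unchanged, so text that merely looks like an entity is not damaged.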

(The best thing would be to replace the whole function with html.unescape from the standard library, but that's a Python 3.4+ thing, and, y'know, long before I drop support for Python 3.3, I have to drop support for 2.6 and 2.7 and see how mad people get.)
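
For reference, on Python 3.4 and later the standard-library route mentioned above looks roughly like this. It is only an illustration of html.unescape, not a description of how ftfy is implemented; the sample string is made up.

```python
import html

# html.unescape handles HTML5 named entities as well as numeric references.
decoded = html.unescape("&Sacute;wi&eogon;to &amp; &#233;")
print(decoded)
```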

chbrown commented on August 20, 2024

html5lib is a pretty sizable project to depend on, but its dependencies, six and webencodings, seem reasonable. I agree with your judgment here and the solution in c960156.
