Is this UTF8-friendly? (escapeless, open issue)

DonaldTsang commented on August 19, 2024
Is this UTF8-friendly?

from escapeless.

Comments (11)

kosarev commented on August 19, 2024

It depends on what you want to achieve. The two most common Unicode-related tasks are probably making a given sequence string-safe by removing the quote characters, and encoding binary data as a Unicode string. To be efficient, the latter requires an encoding designed specifically for Unicode, I believe, which the escapeless encodings are not. For the former, the algorithm would need some generalization to make sure that the quote characters are not mapped to bytes that would result in invalid UTF-8 sequences.
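To illustrate the invalid-UTF-8 concern, here is a minimal Python sketch. It does not use the escapeless API; the substitution of the quote byte with `0xFF` is a hypothetical stand-in for any byte-level mapping that is unaware of UTF-8, and `is_valid_utf8` is a helper introduced only for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A valid UTF-8 byte sequence that contains quote characters (0x22).
original = 'say "hi"'.encode("utf-8")

# Hypothetical byte-level mapping (NOT escapeless itself): every quote
# byte is replaced by 0xFF, a value that never occurs in valid UTF-8.
mapped = original.replace(b'"', b"\xff")

print(is_valid_utf8(original))  # the input is valid UTF-8
print(b'"' in mapped)           # the quotes are gone...
print(is_valid_utf8(mapped))    # ...but the result no longer decodes
```

The quote bytes are removed as intended, yet the output can no longer be treated as a UTF-8 string, which is why a UTF-8-aware generalization would be needed.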

DonaldTsang commented on August 19, 2024

@kosarev So escapeless is not a Python/JS affair then (if it is that Unicode-unsafe)?
See https://github.com/rinick/base2e15 https://github.com/grandchild/base32k https://github.com/qntm/base32768

kosarev commented on August 19, 2024

Yes, compared to the Unicode-specific encodings mentioned, escapeless is a different animal. It is most efficient when you need to strip certain characters/bytes from a stream at the cost of a low fixed-size overhead.

DonaldTsang commented on August 19, 2024

@kosarev So is it possible to create a compatibility format that can be converted from a Unicode-safe "alt-format" to escapeless, without the need for a Python-like bytes format?

kosarev commented on August 19, 2024

If I understand the idea correctly, sure, there should be no problem using escapeless in the middle of a chain of Unicode-specific encodings. As for the representation of binary data, I guess you mean JS, in which case an array of bytes sounds like a good replacement for Python's byte strings, likely with no changes to the algorithms themselves.

DonaldTsang commented on August 19, 2024

@kosarev so basically data => escapeless => Unicode or JSON compatible string => escapeless => data

kosarev commented on August 19, 2024

Yes, provided that by "Unicode or JSON compatible string" you mean Unicode-safe binary-to-text and text-to-binary encodings, and not just emitting raw binary data into strings.
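As a rough sketch of such a chain in Python: the binary stage below is a placeholder (a simple XOR), not the escapeless API, and Base64 is chosen here only as one example of a text-safe binary-to-text encoding. The function names and the `payload` key are assumptions made for illustration.

```python
import base64
import json


def binary_stage_encode(data: bytes) -> bytes:
    # Placeholder for a purely binary transform such as escapeless.
    # This XOR is NOT the escapeless algorithm, only a reversible stand-in.
    return bytes(b ^ 0x5A for b in data)


def binary_stage_decode(data: bytes) -> bytes:
    # XOR with the same constant undoes the placeholder transform.
    return bytes(b ^ 0x5A for b in data)


def to_json(data: bytes) -> str:
    # data => binary stage => binary-to-text encoding => JSON string
    text = base64.b64encode(binary_stage_encode(data)).decode("ascii")
    return json.dumps({"payload": text})


def from_json(document: str) -> bytes:
    # JSON string => text-to-binary decoding => binary stage => data
    text = json.loads(document)["payload"]
    return binary_stage_decode(base64.b64decode(text))


sample = bytes(range(256))
document = to_json(sample)
restored = from_json(document)
print(restored == sample)
```

The binary stage never sees or produces text; only the outer encoding decides which characters end up in the JSON string.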

DonaldTsang commented on August 19, 2024

@kosarev I mean the JSON spec does allow certain "special characters" to slip through, right?
{"string": "<as many types of characters as possible>"}
The angle brackets only disallow whitespace characters, I think. What else can you think of?

kosarev commented on August 19, 2024

Well, escapeless wouldn't allow you to exclude those special characters, if that's what you mean, because it has to sit in the middle of the encoding chain: it processes purely binary data and so has to be surrounded by Unicode-specific encodings at both ends of the chain. By removing certain bytes from the binary data in the middle, we generally can't affect which characters appear in the encoded JSON string, as that depends on the Unicode-specific encoding used.

DonaldTsang commented on August 19, 2024

@kosarev But escapeless can go down to 225 characters, so surely some of the forbidden code space can be stripped off, right?

kosarev commented on August 19, 2024

It can strip off even more characters; it just won't be efficient compared to other approaches. To answer your question: removing certain characters from binary data doesn't mean that these or some other characters will disappear from the Unicode-encoded version, because most likely there is no 1-to-1 correspondence.
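A small Python example of the missing 1-to-1 correspondence, using Base64 as the text encoding purely for illustration (any other binary-to-text encoding would show a similar effect with different characters):

```python
import base64

# Binary data that contains no 0x41 byte (the ASCII code of "A").
data = b"\x00\x00\x00"
print(0x41 in data)

# Base64 regroups the bits into 6-bit units, so the output alphabet
# is unrelated to which byte values were present in the input.
encoded = base64.b64encode(data).decode("ascii")
print(encoded)
print("A" in encoded)
```

Even though the byte `0x41` is absent from the input, the character `A` appears in the encoded text, so stripping bytes in the binary stage does not control which characters the final string contains.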
