whatever_disentangler's Introduction

whatever-disentangler is a brute-force disentangler for text garbled by legacy encodings (mojibake).

Use cases

  • When you already know what the expected (disentangled) string looks like
  • When you know which encodings you want to try, even if you don't know what the expected string looks like
  • Tough cases that need two-step disentangling
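The brute-force idea behind these use cases can be sketched with the standard library alone: try every (encode-with, decode-with) pair of candidate encodings, repeat the round trip up to a given depth to cover two-step cases, and keep the chains that yield the expected string. This is an illustrative sketch, not the package's implementation; brute_force_disentangle and its parameters are hypothetical names.

```python
from itertools import product


def brute_force_disentangle(str_to_fix, expected_str, encodings, depth=2):
    """Return every chain of (encode_with, decode_with) round trips, at most
    `depth` steps long, that turns `str_to_fix` into `expected_str`."""
    solutions = []
    # Each frontier item: (current string, list of steps taken to reach it)
    frontier = [(str_to_fix, [])]
    for _ in range(depth):
        next_frontier = []
        for current, chain in frontier:
            for enc_from, enc_to in product(encodings, repeat=2):
                if enc_from == enc_to:
                    continue  # encoding and decoding with one codec changes nothing
                try:
                    candidate = current.encode(enc_from).decode(enc_to)
                except UnicodeError:
                    continue  # the pair cannot represent this string; skip it
                steps = chain + [(enc_from, enc_to)]
                if candidate == expected_str:
                    solutions.append(steps)
                next_frontier.append((candidate, steps))
        frontier = next_frontier
    return solutions


if __name__ == "__main__":
    # Garble a word twice: UTF-8 bytes misread as cp1252, two times over
    garbled = "échéancier".encode("utf_8").decode("cp1252")
    garbled = garbled.encode("utf_8").decode("cp1252")
    print(garbled)  # ÃƒÂ©chÃƒÂ©ancier
    print(brute_force_disentangle(garbled, "échéancier", ["cp1252", "utf_8"]))
    # [[('cp1252', 'utf_8'), ('cp1252', 'utf_8')]]
```

The search space grows as (number of encoding pairs) to the power of the depth, which is why restricting the candidate encodings and keeping the depth small matters. Without an expected string, the same loop could return every candidate for manual review, which corresponds to the second use case.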

Installation with Poetry

git clone https://github.com/kirisakow/whatever-disentangler.git

cd whatever-disentangler

poetry install

Use whatever-disentangler as a CLI executable

Run the script with no arguments to see the complete usage note. The key points are:

  1. str_to_fix is the only required argument and the only positional one. Being positional, it takes no key, only a value. Put it either at the very beginning or at the very end of the command line; prefer the beginning, because at the end it may be mistaken for one more value of an argument that accepts multiple values. If the string contains spaces, enclose it in quotation marks.
  2. All other arguments are optional. Each key must be followed by its value: --expected_str "the actual expected string". Keys may be written with either underscores or hyphens; in other words, both snake_case (--expected_str) and kebab-case (--expected-str) are valid.
  3. The optional arguments --encoding-from and --encoding-to can take multiple values, separated by spaces or other IFS characters.
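These rules map naturally onto Python's standard argparse module. The parser below is a hypothetical reconstruction, assuming an argparse-based CLI; it is not taken from the tool's source. It registers both spellings of each key as aliases and uses nargs="+" for the multi-value options.

```python
import argparse


def build_parser():
    """Hypothetical parser following the documented CLI rules (not the tool's source)."""
    parser = argparse.ArgumentParser(prog="whatever_disentangler")
    # The only positional (and only required) argument: no key, just the value
    parser.add_argument("str_to_fix")
    # Registering both spellings makes kebab-case and snake_case keys equivalent
    parser.add_argument("--expected-str", "--expected_str", dest="expected_str")
    parser.add_argument("--recursivity-depth", "--recursivity_depth",
                        dest="recursivity_depth", type=int)
    # nargs="+" accepts one or more space-separated values after the key
    parser.add_argument("--encoding-from", "--encoding_from",
                        dest="encoding_from", nargs="+")
    parser.add_argument("--encoding-to", "--encoding_to",
                        dest="encoding_to", nargs="+")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["Ã©chÃ©ancier", "--expected_str", "échéancier",
         "--recursivity-depth", "2", "--encoding-from", "cp1250", "cp1252"]
    )
    print(args.encoding_from)  # ['cp1250', 'cp1252']
```

The sketch also shows why the positional string is safer at the beginning: in argparse, an option declared with nargs="+" keeps consuming values until the next key, so a str_to_fix placed right after --encoding-from cp1250 cp1252 would typically be read as a third encoding.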

Examples:

python whatever_disentangler "échéancier" --recursivity-depth 2 --expected-str "échéancier" --encoding_from cp1250 cp1251 cp1252
...
'échéancier' ('cp1252') -> 'échéancier' ('utf_8')
    -> 'échéancier' ('cp1252') -> 'échéancier' ('utf_8')
    -> 'échéancier' ('cp1252') -> 'échéancier' ('utf_8_sig')
...
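Each arrow in the report appears to describe one round trip: encode the string on the left with the first codec, then decode the resulting bytes with the second. Assuming that reading, a two-step chain can be reproduced with plain str.encode and bytes.decode. apply_step is a hypothetical helper, and the garbled input is built on purpose so that two steps are really needed.

```python
def apply_step(text, enc_from, enc_to):
    """One arrow of the report (assumed reading): encode with enc_from, decode with enc_to."""
    return text.encode(enc_from).decode(enc_to)


# Build a string that needs two fixing steps: UTF-8 bytes misread as cp1252, twice
garbled = apply_step(apply_step("échéancier", "utf_8", "cp1252"), "utf_8", "cp1252")

step1 = apply_step(garbled, "cp1252", "utf_8")  # first arrow: 'Ã©chÃ©ancier'
fixed = apply_step(step1, "cp1252", "utf_8")    # second arrow: 'échéancier'

# utf_8_sig decodes like utf_8 but drops a leading BOM if there is one;
# these bytes have none, so both chains end on the same string
fixed_sig = apply_step(step1, "cp1252", "utf_8_sig")

print(step1, fixed, fixed_sig)
```

This is why the sample output lists the same final string twice, once with utf_8 and once with utf_8_sig as the last decoder.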

Use whatever-disentangler as an importable library in Python code

Add whatever-disentangler as a dependency so you can import it:

cd your-project

poetry add --editable ../rel/path/to/whatever-disentangler/

poetry install

Use whatever-disentangler either as an offline disentangler or as a client of a remote HTTP API:

import asyncio
from whatever_disentangler import whatever_disentangler as wd

# this one is an offline disentangler:
disentangler = wd.Disentangler()
disentangler.flatten_legibly(
  disentangler.disentangle(str_to_fix="боз▌з╤з╙з╤ б░з▄зтз╤Б0Ж3з▀Б0┌1! Б0┘5з╓зтзрз┴з▐ зуз▌з╤з╙з╤!", expected_str="Слава Україні! Героям слава!", recursivity_depth=2)
)

# and this one is remote: it calls a homemade REST API.
# fetch_response() is awaited, so call it from inside an async function
# (a bare top-level await only works in async-aware shells such as IPython/Jupyter):
async def main():
    remote_disentangler = wd.RemoteDisentangler(endpoint='https://crac.ovh/fix_legacy_encoding')
    response_obj = await remote_disentangler.fetch_response(str_to_fix="Ţč޻޹ަ ŢÓަޮޢ޴޷޵޺! Ţč޻޹ަ ޹ަŢŔްޢ!", expected_str="Жыве Беларусь! Жыве вечна!", recursivity_depth=2)
    remote_disentangler.flatten_legibly(response_obj)

asyncio.run(main())

To see whatever_disentangler in action,

Contributors

kirisakow
