Comments (5)

bnoordhuis commented on July 18, 2024

Your example doesn't make sense. You're converting from macintosh to utf-8, but you feed it utf-8 input instead of macintosh. Try passing bytes (a Buffer or Uint8Array) instead of a string to convert().

Note "macintosh" is an alias for MacRoman and that's a one-byte encoding, see https://en.wikipedia.org/wiki/Mac_OS_Roman. E.g. the ç is encoded as a single byte 0x8D, not as √ß (5 bytes in utf-8) like in your example.
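
The byte counts can be checked with Node's built-in Buffer alone. This is a minimal sketch; the variable names are illustrative and not taken from the original example.

```javascript
// Buffer.from(string) encodes the string as UTF-8 by default.
const correct = Buffer.from("ç");  // U+00E7
const garbled = Buffer.from("√ß"); // U+221A followed by U+00DF

console.log(correct); // <Buffer c3 a7>
console.log(garbled); // <Buffer e2 88 9a c3 9f>
console.log(correct.length, garbled.length); // 2 5
```

So the garbled pair costs 5 bytes in utf-8, while the real character is 2 bytes in utf-8 and 1 byte (0x8D) in MacRoman.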

from node-iconv.

deepak786 commented on July 18, 2024

But according to the link https://string-functions.com/encodingtable.aspx?encoding=65001&decoding=10000 √ß is ç

bnoordhuis commented on July 18, 2024

You misunderstand (but I don't blame you - that page is hella confusing.)

The page is showing examples of corrupted output. You're taking that output and using it as input, corrupting it even further.

ç in utf-8 is the byte sequence 0xC3 0xA7. Interpreting (corrupting) that as macintosh gives √ß, because 0xC3 is √ and 0xA7 is ß in MacRoman; see the Wikipedia table.

Compare the (uncorrupted) string in both encodings:

const { Iconv } = require("iconv");

console.log(Buffer.from("Manutenção")); // utf-8
console.log(new Iconv("utf-8", "macintosh").convert("Manutenção")); // macintosh
//     utf-8: <Buffer 4d 61 6e 75 74 65 6e c3 a7 c3 a3 6f>
// macintosh: <Buffer 4d 61 6e 75 74 65 6e 8d 8b 6f>
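
The corruption itself can be reproduced without any library. The sketch below assumes a hand-written, deliberately partial MacRoman table that covers only the three high bytes in this string; `MAC_ROMAN_PARTIAL` and `decodeAsMacRoman` are hypothetical names, not part of node-iconv.

```javascript
// Partial MacRoman table: only the high bytes needed for this example.
// (A real converter such as iconv covers all 256 byte values.)
const MAC_ROMAN_PARTIAL = {
  0xc3: "√",
  0xa7: "ß",
  0xa3: "£",
};

// Hypothetical helper: decode bytes as MacRoman. ASCII passes through unchanged.
function decodeAsMacRoman(bytes) {
  return Array.from(bytes, (b) => {
    if (b < 0x80) return String.fromCharCode(b);
    if (b in MAC_ROMAN_PARTIAL) return MAC_ROMAN_PARTIAL[b];
    throw new RangeError(`byte 0x${b.toString(16)} is not in the partial table`);
  }).join("");
}

// Take the correct utf-8 bytes and misread them as MacRoman.
const utf8Bytes = Buffer.from("Manutenção", "utf8");
console.log(decodeAsMacRoman(utf8Bytes)); // Manuten√ß√£o
```

Each two-byte utf-8 sequence turns into two unrelated MacRoman characters, which is the garbling shown on the encoding-table page.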

deepak786 commented on July 18, 2024

Checking these threads, it seems to be an issue on macOS:
https://stackoverflow.com/questions/58863905/how-can-i-convert-encoding-of-special-characters-in-python
https://stackoverflow.com/questions/15283189/how-to-decode-these-characters-%E2%88%9A-%E2%88%9A-%E2%88%9A%E2%89%A0

Solution:

  • First I check if the string is MacRoman encoded
function isMacRomanEncoded(data){
  return (data.indexOf('¬') > -1) || (data.indexOf('√') > -1);
}
  • Then, using the iconv-lite library, encode and decode the string
const iconv = require('iconv-lite');
let data = ...;
if (isMacRomanEncoded(data)) {
  console.log('MacRoman encoded');
  // Re-encode the garbled text as MacRoman to recover the original bytes,
  // then decode those bytes as UTF-8.
  const buffer = iconv.encode(data, 'MacRoman');
  console.log(iconv.decode(buffer, 'utf-8'));
} else {
  console.log('Not MacRoman encoded');
  console.log(data);
}

bnoordhuis commented on July 18, 2024

Using a heuristic like that seems pretty flawed. Treating every string containing √ as corrupted macroman is bound to generate lots of false positives.

Aside: data.includes('√') is more idiomatic nowadays than data.indexOf('√') > -1
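
To illustrate the false-positive problem, here is a rough sketch. `looksMacRomanEncoded` is the heuristic from this thread rewritten with includes(); `tryRepair` and the partial reverse table are hypothetical, and a stricter check along these lines is only one possible approach, not something the thread or any library prescribes.

```javascript
// Partial reverse MacRoman table (character -> byte); illustrative only.
const TO_MAC_ROMAN_PARTIAL = new Map([
  ["√", 0xc3],
  ["ß", 0xa7],
  ["£", 0xa3],
  ["≈", 0xc5],
]);

// The heuristic from the thread, written with includes().
function looksMacRomanEncoded(data) {
  return data.includes("¬") || data.includes("√");
}

// Hypothetical stricter check: re-encode as MacRoman and accept the result
// only if the bytes are well-formed UTF-8. Returns the decoded string or null.
function tryRepair(data) {
  const bytes = [];
  for (const ch of data) {
    const code = ch.codePointAt(0);
    if (code < 0x80) bytes.push(code);
    else if (TO_MAC_ROMAN_PARTIAL.has(ch)) bytes.push(TO_MAC_ROMAN_PARTIAL.get(ch));
    else return null; // character outside the partial table
  }
  const buf = Buffer.from(bytes);
  const decoded = buf.toString("utf8");
  // Invalid sequences decode to U+FFFD, so the round trip no longer matches.
  return Buffer.from(decoded, "utf8").equals(buf) ? decoded : null;
}

console.log(looksMacRomanEncoded("√2 ≈ 1.41")); // true (a false positive)
console.log(tryRepair("√2 ≈ 1.41"));            // null
console.log(tryRepair("Manuten√ß√£o"));         // Manutenção
```

The validity check rejects the math expression that the plain heuristic flags. It reduces false positives rather than eliminating them: a string that legitimately contains the characters √ß would still be rewritten.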
