
ledger: handling unicode (go, CLOSED)

exercism commented on July 16, 2024
ledger: handling unicode

from go.

Comments (9)

soniakeys commented on July 16, 2024

Hmm, lemme try working the exercise...


soniakeys commented on July 16, 2024

It looks like it would be a little better to count only runes that pass unicode.IsGraphic? (Didn't quite work.)

Edit: And a little better would be to use godoc.org/golang.org/x/text/width...

Ha! I have my program passing with descriptions of "Freude schöner Götterfunken" (those are combining characters) and "Ode To Joy 기쁨 에 송시 Hyuna". I'll hold off on posting an iteration so I don't spoil your fun.


soniakeys commented on July 16, 2024

Huh. I was trying to use godoc.org/golang.org/x/text/currency but sadly ran into this:

    // TODO: use pattern.
    io.WriteString(s, opt.symbol(lang, cur))
    if v.amount != nil {
        s.Write(space)

The "pattern" mentioned in the comment would be a CLDR Number Pattern, which would do the right thing with a space between the currency symbol and the amount -- rather than hard coding a space.


petertseng commented on July 16, 2024

Ahhh, so we should think about wide characters as well, and this is super interesting for your example "Ode To Joy 기쁨 에 송시 Hyuna" because the cut should happen in the middle of a wide character (in the middle of that 시). What would you say the result should be when that happens? Replace the wide character with a " ...", or should it just be "..." without the space?

Date       | Description               | Change
01/01/2015 | Ode To Joy 기쁨 에 송...  |        $1.00 

or

Date       | Description               | Change
01/01/2015 | Ode To Joy 기쁨 에 송 ... |        $1.00 

(Oy, that doesn't even look right on my browser because the wide characters aren't exactly the width of two narrows... but it does look fine in a monospace font in my editor)
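
For illustration, here is a minimal sketch of the first variant (plain "..." with no extra space), using only the standard library. Everything in it is an assumption rather than anyone's submitted iteration: `runeWidth`, `displayWidth`, and `truncate` are hypothetical helpers, and treating Hangul, Han, Hiragana, and Katakana runes as two columns wide is only a rough stand-in for a proper lookup such as golang.org/x/text/width. It also assumes `max` is at least 3.

```go
package main

import (
	"fmt"
	"unicode"
)

// runeWidth is a rough approximation of terminal column width (assumption):
// combining marks take 0 columns, common CJK scripts take 2, the rest take 1.
func runeWidth(r rune) int {
	switch {
	case unicode.Is(unicode.Mn, r):
		return 0
	case unicode.In(r, unicode.Hangul, unicode.Han, unicode.Hiragana, unicode.Katakana):
		return 2
	default:
		return 1
	}
}

// displayWidth sums the approximate column widths of all runes in s.
func displayWidth(s string) int {
	w := 0
	for _, r := range s {
		w += runeWidth(r)
	}
	return w
}

// truncate shortens s so that its approximate display width, including a
// trailing "...", does not exceed max. A wide rune that would straddle the
// cut is dropped entirely, so the result may be one column short of max.
func truncate(s string, max int) string {
	if displayWidth(s) <= max {
		return s
	}
	const ellipsis = "..."
	budget := max - len(ellipsis)
	w := 0
	for i, r := range s {
		rw := runeWidth(r)
		if w+rw > budget {
			return s[:i] + ellipsis
		}
		w += rw
	}
	return s + ellipsis
}

func main() {
	fmt.Println(truncate("Ode To Joy 기쁨 에 송시 Hyuna", 25))
}
```

With a 25-column description, the cut lands before 시 and the result is 24 columns wide, so the caller would still need to pad by one column to keep the table aligned.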


petertseng commented on July 16, 2024

Eh, submitted an iteration that deals with wide characters and combining characters, though I'm not confident that I'm really doing the combining characters the right way. I'm somewhat wondering whether we should have people deal with wide characters at all. The argument for it is that if we're dealing with one aspect of Unicode (that some code points take more than one byte) then we should deal with other aspects too (that characters have varying widths). On the other hand, it could be a lot to deal with in an exercise that is mostly about refactoring some existing code. Then again, maybe it's not that bad if we write some existing code in the initial ledger.go.

I am almost tempted to say "this variable width business even deserves its own exercise", but I don't know.


soniakeys commented on July 16, 2024

Cool. I uploaded an iteration at http://exercism.io/submissions/fbab39b6626b498d96e9f0466d4a5c6d. You can see I handle combining characters with if !unicode.Is(unicode.Mn, r) {.
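
As a minimal sketch of that idea (not the linked iteration itself), the check can be wrapped in a small counting function. The name `visibleLen` is hypothetical, and the sketch assumes every rune outside the Mn (nonspacing mark) category occupies exactly one column, which does not hold for wide characters.

```go
package main

import (
	"fmt"
	"unicode"
)

// visibleLen counts the runes in s that are not nonspacing combining marks
// (Unicode category Mn). Assumption: every other rune is one column wide.
func visibleLen(s string) int {
	n := 0
	for _, r := range s {
		if !unicode.Is(unicode.Mn, r) {
			n++
		}
	}
	return n
}

func main() {
	// "o" followed by U+0308 (combining diaeresis) renders as a single "ö".
	s := "Freude scho\u0308ner Go\u0308tterfunken"
	fmt.Println(len(s), len([]rune(s)), visibleLen(s))
}
```

Here the byte length, the rune count, and the visible length all differ, which is why padding based on `len` or on the rune count alone misaligns the column.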

Unicode is a rabbit hole. To keep the exercise focused on refactoring, we could just take out the multibyte test case and make it all ASCII. If we leave it in, we should at least handle the existing test case properly. Beyond that, I don't know. The exercise gets more complex, but I think in the real world most anyone who handles Unicode has to handle combining characters. I hear Go is popular in China. It's good stuff to be able to do.


petertseng commented on July 16, 2024

Ah, I see. I guess my solution just kinda works by chance. I also do a width.LookupRune(r).Kind(), but it so happened that the combining characters were EastAsianAmbiguous and so I just treated them as zero... but I'm sure there have to be other EastAsianAmbiguous characters that have nonzero widths, so my solution is not quite good enough.

Given all this, maybe the best thing to do is to go with ASCII only in the ledger tests, and then possibly think of some separate exercise that will deal with Unicode. Will probably be about 24 hours before I can do the first thing so someone could feel free to beat me to it. The Unicode exercise will of course need a bit of thought.


kytrinyx commented on July 16, 2024

I think you're both right that

a) Unicode handling is important, and
b) it's a rabbit hole, and
c) we should probably put it in a separate exercise.


soniakeys commented on July 16, 2024

resolved with #199

