
Comments (9)

chen-wu commented on June 16, 2024

I also investigated this a little more. The problem here is the difference between the .NET code pages and the character sets defined by the DICOM standard. So I tried to use ICU to convert the UTF-16 string to the target character set. However, the .NET version of the ICU library did not implement that conversion.

I will check whether other encodings have a similar problem. If not, maybe we can just add a patch for ISO 2022 IR 13 only, or we just mark it as an exception that we cannot handle.

from fo-dicom.

mrbean-bremen commented on June 16, 2024

I'll have a look at this, though probably not for the current release.

chen-wu commented on June 16, 2024

It seems that whether we switch to another encoder depends on whether the current encoder throws an exception or not.

However, the encodings in .NET may differ from the DICOM standard: some code pages cover more characters than the corresponding DICOM character set.
For example, according to https://dicom.nema.org/medical/dicom/current/output/html/part05.html#sect_H.3.2, "ISO 2022 IR 13" is JIS X 0201 and "ISO 2022 IR 87" is JIS X 0208.

But Shift JIS contains characters from both JIS X 0201 and JIS X 0208. So when we use IR 13 (mapped to Shift JIS) to encode the whole string, no exception is thrown.

These bytes can also be decoded "correctly" in the same way, because they are eventually processed as Shift JIS again.

So I think this feature is too difficult to implement. Maybe we should just use UTF-8 when necessary. Otherwise, we will need to identify the differences between the .NET encodings and the DICOM character sets.

mrbean-bremen commented on June 16, 2024

@chen-wu - yes, handling JIS X 0201 vs. JIS X 0208 can indeed be a problem, and we probably need to add specific code for that. I was actually aware that there might be some problems with it, but decided to ignore this in the PR for the time being: it may get a little complicated, and I first wanted to cover the main cases (to my knowledge, this is the only case that may cause such problems).
Maybe we could handle this in a separate issue after this one is finished, to avoid too much complexity in a single PR. Anyway, all of this will be integrated only after the next release (which will hopefully be soon).

mrbean-bremen commented on June 16, 2024

@chen-wu - do you have an example or a test that shows the problem? E.g. a string that is encoded for ISO 2022 IR 13 but is not valid JIS X 0201 (or the same with ISO 2022 IR 87 / JIS X 0208)?

chen-wu commented on June 16, 2024

@mrbean-bremen This example should work.

You can also use the following code:

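        using System;
        using System.Text;

        // Needed on .NET Core / .NET 5+ to make legacy code pages such as Shift JIS available
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Expected ISO 2022 IR 13 \ ISO 2022 IR 87 bytes of the PS3.5 H.3.2 example (for comparison):
        // half-width katakana (JIS X 0201), then kanji/hiragana after ESC $ B (JIS X 0208),
        // switching back with ESC ( J
        var raw = new byte[] {
            0xD4, 0xCF, 0xC0, 0xDE, 0x5E, 0xC0, 0xDB, 0xB3, 0x3D, 0x1B, 0x24, 0x42, 0x3B, 0x33, 0x45, 0x44, 0x1B, 0x28,
            0x4A, 0x5E, 0x1B, 0x24, 0x42, 0x42, 0x40, 0x4F, 0x3A, 0x1B, 0x28, 0x4A, 0x3D, 0x1B, 0x24, 0x42, 0x24, 0x64,
            0x24, 0x5E, 0x24, 0x40, 0x1B, 0x28, 0x4A, 0x5E, 0x1B, 0x24, 0x42, 0x24, 0x3F, 0x24, 0x6D, 0x24, 0x26, 0x1B,
            0x28, 0x4A,
        };
        var value = "ヤマダ^タロウ=山田^太郎=やまだ^たろう";
        var value2 = "山田";

        var shiftJisEncoding = Encoding.GetEncoding("shift_jis");
        var iso2022JpEncoding = Encoding.GetEncoding("iso-2022-jp");

        // Shift JIS encodes the kanji without any error, but with different bytes
        // than the JIS X 0208 bytes written by ISO 2022 IR 87
        var result1 = shiftJisEncoding.GetBytes(value);
        var result2 = shiftJisEncoding.GetBytes(value2);
        var result3 = iso2022JpEncoding.GetBytes(value);
        var result4 = iso2022JpEncoding.GetBytes(value2);

        foreach (var bytes in new[] { result1, result2, result3, result4 })
            Console.WriteLine(BitConverter.ToString(bytes));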

You can see that "山田" does not trigger an exception with the Shift JIS encoding, and the four results show that "山田" is encoded into different bytes by the two encodings.

mrbean-bremen commented on June 16, 2024

Thank you, I will have a look at this!

chen-wu commented on June 16, 2024

I think we can just give up here; we have too many problems to solve. 😂

The DICOM standard defines some character sets, but these character sets are just "standards": different vendors have different implementations. They may add extra characters at code points that are not used in the standard, or combine several standards into one code page.

The standards also have different versions. For example, the DICOM standard uses "JIS X 0208-1990", while "iso-2022-jp" contains both "JIS X 0208-1978" and "JIS X 0208-1983", and "iso-2022-jp-2" contains "JIS X 0208-1990". I did not check the implementations, so it may be even more complex.

In our case, "shift-jis" contains characters from both "JIS X 0201-1997" and "JIS X 0208-1997". However, its JIS X 0208 part is not byte-compatible with the JIS X 0208 encoding itself: you need to do a shift conversion (see https://en.wikipedia.org/wiki/Shift_JIS). EUC-JP may be correct.
That's why we cannot encode Japanese text correctly.
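To make the "shift conversion" concrete, here is a minimal C# sketch for a single JIS X 0208 character. `JisToShiftJis` and `ShiftJisToJis` are hypothetical helpers, not fo-dicom or .NET APIs; they implement the standard arithmetic Shift JIS mapping and deliberately ignore vendor extensions such as the extra rows of code page 932. The sample values are 山 (3B 33 in the `raw` bytes above) and 田 in Shift JIS.

```csharp
using System;

// Hypothetical helpers (not fo-dicom or .NET APIs) implementing the standard
// arithmetic mapping between a JIS X 0208 row/cell pair (bytes 0x21-0x7E, as
// written after ESC $ B for ISO 2022 IR 87) and a Shift JIS double-byte pair.
// Vendor extensions (e.g. extra rows in code page 932) are not handled.

static (byte S1, byte S2) JisToShiftJis(byte j1, byte j2)
{
    if (j1 < 0x21 || j1 > 0x7E || j2 < 0x21 || j2 > 0x7E)
        throw new ArgumentException("Not a JIS X 0208 row/cell byte pair.");

    // Two consecutive JIS rows share one Shift JIS lead byte; lead bytes skip
    // 0xA0-0xDF, which Shift JIS uses for JIS X 0201 half-width katakana.
    int s1 = ((j1 + 1) >> 1) + (j1 <= 0x5E ? 0x70 : 0xB0);

    // Odd rows use trail bytes 0x40-0x9E (skipping 0x7F), even rows 0x9F-0xFC.
    int s2 = (j1 & 1) == 1
        ? j2 + (j2 < 0x60 ? 0x1F : 0x20)
        : j2 + 0x7E;

    return ((byte)s1, (byte)s2);
}

// Assumes (s1, s2) is a valid Shift JIS double-byte pair from the JIS X 0208 area.
static (byte J1, byte J2) ShiftJisToJis(byte s1, byte s2)
{
    // Undo the lead-byte shift; this yields the even row of the pair.
    int j1 = (s1 <= 0x9F ? s1 - 0x70 : s1 - 0xB0) * 2;
    int j2;
    if (s2 >= 0x9F)
    {
        j2 = s2 - 0x7E;                         // even row
    }
    else
    {
        j1 -= 1;                                // odd row
        j2 = s2 - (s2 >= 0x80 ? 0x20 : 0x1F);   // undo the gap at 0x7F
    }
    return ((byte)j1, (byte)j2);
}

// 山 is 3B 33 in the ISO 2022 IR 87 bytes; 田 is 93 63 in Shift JIS.
var (lead, trail) = JisToShiftJis(0x3B, 0x33);
Console.WriteLine($"{lead:X2} {trail:X2}");     // 8E 52
var (row, cell) = ShiftJisToJis(0x93, 0x63);
Console.WriteLine($"{row:X2} {cell:X2}");       // 45 44
```

Applied per character to the Shift JIS output, plus the ESC $ B / ESC ( J switches, this would be one way to keep using the .NET encoder, under the assumption that only standard JIS X 0208 characters occur.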

So, to make our implementation fully DICOM compliant, we may need to import the correct versions of the character set standards from ISO (or somewhere else) and write the encoder ourselves, to eliminate the differences between the character set standards and the .NET code pages.

If we still want to use the .NET encodings, it would be better to also check the range of the encoded bytes to decide whether we need to move on to the next encoder.
It may need some additional work to confirm about that, but I think this will only happen in Japanese. Chinese and Korean text usually does not need multiple character sets.
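A rough C# sketch of that byte-range check, assuming the value has first been encoded with Shift JIS for the ISO 2022 IR 13 slot. `FitsIsoIr13` is a hypothetical helper, not an existing fo-dicom API. It relies on the fact that every Shift JIS double-byte character starts with a lead byte in 0x81-0x9F or 0xE0-0xFC, outside the single-byte JIS X 0201 ranges; it only inspects byte ranges and does not cover other differences between the code page and the DICOM definition.

```csharp
using System;

// Hypothetical helper (not part of fo-dicom): returns true if every byte produced
// by the Shift JIS encoder is representable in ISO 2022 IR 13, i.e. JIS X 0201:
// 0x00-0x7F (G0, Roman) or 0xA1-0xDF (G1, half-width katakana).
// Any other byte belongs to a double-byte JIS X 0208 character, so the caller
// would have to move on to the next encoder (e.g. ISO 2022 IR 87) instead.
static bool FitsIsoIr13(ReadOnlySpan<byte> shiftJisBytes)
{
    foreach (var b in shiftJisBytes)
    {
        if (b > 0x7F && (b < 0xA1 || b > 0xDF))
            return false;
    }
    return true;
}

// Shift JIS bytes of half-width "ﾔﾏﾀﾞ" (as in the raw example) and of "山田"
byte[] katakana = { 0xD4, 0xCF, 0xC0, 0xDE };
byte[] kanji = { 0x8E, 0x52, 0x93, 0x63 };

Console.WriteLine(FitsIsoIr13(katakana)); // True
Console.WriteLine(FitsIsoIr13(kanji));    // False: 山田 needs ISO 2022 IR 87
```

A full implementation would probably split the value into runs per character set and emit the escape sequences; this check alone only tells whether the switch is needed.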

Note: if you want to check PS3.3 of the DICOM Standard, DO NOT use the HTML version.

mrbean-bremen commented on June 16, 2024

> It may need some additional work to confirm about that, but I think this will only happen in Japanese

That's what I thought. I think we have to go that extra step to be compliant, but it should probably be done in a separate PR after #1791 has been merged (I'm keeping that one as a draft until the next release is out).
