Comments (9)
I also investigated this a little more. The problem is the difference between the .NET code pages and the character sets defined by the DICOM standard. I tried to use ICU to convert the UTF-16 string to the target character set, but the .NET version of the ICU library does not implement that conversion.
I will check whether other encodings have a similar problem. If not, maybe we can add a patch for ISO 2022 IR 13 only, or just document it as a case that we cannot handle.
from fo-dicom.
I'll have a look at this, though probably not for the current release.
from fo-dicom.
It seems that whether we need to switch to another encoder depends on whether the current encoder throws an exception.
However, a .NET encoding may differ from the corresponding DICOM character set: some code pages cover more characters.
For example, see https://dicom.nema.org/medical/dicom/current/output/html/part05.html#sect_H.3.2.
"ISO 2022 IR 13" is JIS X 0201, and "ISO 2022 IR 87" is JIS X 0208.
But Shift JIS contains the characters of both JIS X 0201 and JIS X 0208. So when we use the IR 13 encoder (Shift JIS) to encode the whole string, it does not throw any exception.
The resulting bytes can also be decoded correctly by the same encoding, because they are simply processed as Shift JIS.
So I think it is too difficult to implement such a feature. Maybe we should just use UTF-8 if necessary. Otherwise, we would need to identify every difference between the .NET encodings and the DICOM ones.
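To make the point about exceptions concrete, here is a minimal sketch, not fo-dicom code. `CanEncode` and `strictShiftJis` are hypothetical names introduced for illustration. It assumes a .NET version where `CodePagesEncodingProvider` is available and uses the strict exception fallback, so unencodable characters throw instead of being replaced with `?`.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Strict variant of the code page: unencodable characters throw instead of
// being silently replaced with '?'.
var strictShiftJis = Encoding.GetEncoding(
    "shift_jis",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

// Hypothetical helper: true when the encoder accepts every character.
bool CanEncode(Encoding encoding, string text)
{
    try
    {
        encoding.GetBytes(text);
        return true;
    }
    catch (EncoderFallbackException)
    {
        return false;
    }
}

// Kanji belong to JIS X 0208, not JIS X 0201, yet Shift JIS accepts them.
Console.WriteLine(CanEncode(strictShiftJis, "山田"));
// Hangul is not part of Shift JIS, so only this kind of input is rejected.
Console.WriteLine(CanEncode(strictShiftJis, "한"));
```

The exception therefore only signals "not in Shift JIS at all". It cannot tell us that a character is outside JIS X 0201 and needs the IR 87 encoder.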
from fo-dicom.
@chen-wu - yes, handling JIS X 0201 vs JIS X 0208 can indeed be a problem, and we probably need to add specific code for that. I was actually aware that there might be some problems with that, but decided to ignore this in the PR for the time being, as it may get a little complicated, and I first wanted to cover the main cases (to my knowledge, this is the only case that may cause such problems).
Maybe we could handle this in a separate issue after this one is finished, to avoid too much complexity in a single PR. Anyway, all of this will get integrated only after the next release (which hopefully will be soon).
from fo-dicom.
@chen-wu - do you have an example or test that shows the problem? E.g. a string that is decoded for ISO 2022 IR 13 but is not valid JIS X 0201 (or the same with ISO 2022 IR 87 / JIS X 0208)?
from fo-dicom.
@mrbean-bremen This example should work.
You can use the following code:
using System.Text;

// Make the legacy code pages (shift_jis, iso-2022-jp) available.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
// Expected DICOM encoding of the example name, for comparison with the results below.
var raw = new byte[] {
0xD4, 0xCF, 0xC0, 0xDE, 0x5E, 0xC0, 0xDB, 0xB3, 0x3D, 0x1B, 0x24, 0x42, 0x3B, 0x33, 0x45, 0x44, 0x1B, 0x28,
0x4A, 0x5E, 0x1B, 0x24, 0x42, 0x42, 0x40, 0x4F, 0x3A, 0x1B, 0x28, 0x4A, 0x3D, 0x1B, 0x24, 0x42, 0x24, 0x64,
0x24, 0x5E, 0x24, 0x40, 0x1B, 0x28, 0x4A, 0x5E, 0x1B, 0x24, 0x42, 0x24, 0x3F, 0x24, 0x6D, 0x24, 0x26, 0x1B,
0x28, 0x4A,
};
var value = "ヤマダ^タロウ=山田^太郎=やまだ^たろう";
var value2 = "山田";
var shiftJisEncoding = Encoding.GetEncoding("shift_jis");
var iso2022JpEncoding = Encoding.GetEncoding("iso-2022-jp");
// Encode the full name and the kanji-only part with both encodings.
var result1 = shiftJisEncoding.GetBytes(value);
var result2 = shiftJisEncoding.GetBytes(value2);
var result3 = iso2022JpEncoding.GetBytes(value);
var result4 = iso2022JpEncoding.GetBytes(value2);
You can see that "山田" does not trigger an exception in the Shift JIS encoding, and the four results show that "山田" is encoded into different bytes by the two encodings.
from fo-dicom.
Thank you, I will have a look at this!
from fo-dicom.
I think we can just give up here, there are too many problems to solve. 😂
The DICOM standard refers to a number of character sets, but these character sets are only "standards". Different vendors have different implementations: they may add characters at code points that the standard leaves unused, or combine several standards into one code page.
The character set standards also exist in different versions. For example, the DICOM standard uses "JIS X 0208-1990", while "iso-2022-jp" contains both "JIS X 0208-1978" and "JIS X 0208-1983", and "iso-2022-jp-2" contains "JIS X 0208-1990". I did not check the implementations, so the real situation may be more complex.
In our case, "shift-jis" contains the characters of both "JIS X 0201-1997" and "JIS X 0208-1997". However, the bytes of its 0208 part are not compatible with the byte values defined by the 0208 standard; you need to apply a shift conversion to get them. EUC-JP may be correct in this respect.
https://en.wikipedia.org/wiki/Shift_JIS
That's why we cannot encode Japanese text correctly.
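The remark about EUC-JP can be sketched as follows. This is an illustration under assumptions, not a verified DICOM encoder. `TryToJisX0208` and `strictEucJp` are hypothetical names. The sketch assumes that the .NET "euc-jp" code page stores each two-byte character as its JIS X 0208 code with the high bit set on both bytes, and it ignores version differences and vendor extensions in that code page.

```csharp
using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

var strictEucJp = Encoding.GetEncoding(
    "euc-jp",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

// Hypothetical helper: produces the 7-bit two-byte codes (without escape
// sequences). Fails when the text contains anything other than two-byte
// EUC-JP characters.
bool TryToJisX0208(string text, out byte[] bytes)
{
    bytes = Array.Empty<byte>();
    byte[] euc;
    try { euc = strictEucJp.GetBytes(text); }
    catch (EncoderFallbackException) { return false; }

    var result = new byte[euc.Length];
    for (int i = 0; i < euc.Length; i++)
    {
        // Two-byte characters only use bytes 0xA1..0xFE. ASCII and the
        // single-shift prefixes 0x8E and 0x8F fall outside this range.
        if (euc[i] < 0xA1 || euc[i] > 0xFE) return false;
        result[i] = (byte)(euc[i] - 0x80); // clear the high bit
    }
    bytes = result;
    return true;
}

if (TryToJisX0208("山田", out var jis))
{
    Console.WriteLine(BitConverter.ToString(jis));
}
```

For "山田" this should give 3B-33-45-44, which are the bytes that follow the first ESC $ B (0x1B 0x24 0x42) in the `raw` array of the earlier example.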
So, to make our implementation fully DICOM compliant, we may need to take the correct versions of the character set standards from ISO or somewhere else and write the encoders ourselves, to eliminate the differences between the character set standards and the .NET code pages.
If we still want to use the .NET encodings, it would be better to also check the range of the encoded bytes to see whether we need to move on to the next encoder.
This needs some additional work to confirm, but I think it will only happen with Japanese. Chinese and Korean usually do not need multiple character sets to encode the text.
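The byte-range check could look roughly like this. It is a sketch under assumptions, not the fo-dicom implementation, and `FitsJisX0201` is a hypothetical name. It relies on the Shift JIS layout, in which single-byte characters occupy 0x00..0x7F and 0xA1..0xDF, and every two-byte character starts with a lead byte outside those ranges.

```csharp
using System;
using System.Linq;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Strict fallback, so unencodable characters throw instead of turning into
// '?', which would wrongly pass the range check below.
var strictShiftJis = Encoding.GetEncoding(
    "shift_jis",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

// Hypothetical helper: true when the text uses only the single-byte part of
// Shift JIS, i.e. what ISO 2022 IR 13 (JIS X 0201) can represent.
bool FitsJisX0201(string text)
{
    byte[] bytes;
    try { bytes = strictShiftJis.GetBytes(text); }
    catch (EncoderFallbackException) { return false; }

    // Any lead byte of a two-byte character (0x81..0x9F, 0xE0..0xFC)
    // fails this test.
    return bytes.All(b => b <= 0x7F || (b >= 0xA1 && b <= 0xDF));
}

Console.WriteLine(FitsJisX0201("ﾔﾏﾀﾞ^ﾀﾛｳ")); // half-width katakana
Console.WriteLine(FitsJisX0201("山田"));     // kanji, needs the next encoder
```

When the check fails, the caller would move on to the next character set (for example ISO 2022 IR 87) for that part of the string. This sketch does not address the differences between JIS Roman and ASCII for the backslash and tilde code points.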
Note: if you want to check PS3.3 of the DICOM Standard, do NOT use the HTML version.
from fo-dicom.
"It may need some additional work to confirm about that, but I think this will only happen in Japanese"
That's what I thought. I think we have to go that extra step to be compliant, but it should probably be done in a separate PR after #1791 has been merged (I'm holding that one as a draft until the next release is out).
from fo-dicom.