DicomValidation for VR PN about fo-dicom HOT 17 CLOSED

alpha-yanagida commented on September 21, 2024

DicomValidation for VR PN

from fo-dicom.

Comments (17)

alpha-yanagida commented on September 21, 2024 2

I'm not sure yet if this behavior is expected or a bug (I may have another look some time later), but you could avoid this problem by using the code I showed above, e.g.:

I will leave this issue as it may be investigated at a later date from the above.
If you do not need this issue, you can close it. Or I will close it.

from fo-dicom.

mrbean-bremen commented on September 21, 2024 1

Closing this issue after creating the spin-off, as discussed.

from fo-dicom.

mrbean-bremen commented on September 21, 2024

Can you please add the SpecificCharacterSet value of the dataset? Also, can you show the byte values for the encoded patient name? From the ASCII output above (�$Bh=**�(B=), they are not completely apparent.

from fo-dicom.

mrbean-bremen commented on September 21, 2024

So, you have "\ISO 2022 IR 87" as SpecificCharacterSet in the result, but I'm still not sure what is the value of the returned patient name tag (in bytes). Can you please add this?

I understand that you are showing some code that reproduces the behavior, but I don't see where that code is used (and it is incomplete). As far as I understood, you get the problem with a dataset received via a C-FIND response, correct?

from fo-dicom.

alpha-yanagida commented on September 21, 2024

I added the following

resultDataset.AddOrUpdate(DicomTag.SpecificCharacterSet, "\ISO 2022 IR 87");
Byte data of patient name from Wireshark.
Added stack trace to Actual error content

I understand that you are showing some code that reproduces the behavior, but I don't see where that code is used (and it is incomplete). As far as I understood, you get the problem with a dataset received via a C-FIND response, correct?

Sorry for the poor explanation, but the app has the ability to act as a MWM server (SCP) and return data such as patient name according to the C=FIND received from the modality (SCU).

The role of the application is to mediate DICOM communication between the core system and the modalities.

csv file of patient information from core system falls into folder after barcode reading
when the app receives C=FIND from the modality, it reads the csv file, formats it for DICOM communication, and sends it to the modality.

The problem occurred when the patient name was formatted for DICOM communication, i.e., when the patient name was added to the data set. (See also the actual error description.)

from fo-dicom.

mrbean-bremen commented on September 21, 2024

I have checked how fo-dicom handles that patient name, and I did not have any problems. To test it, I also patched a real DICOM file with that patient name and encoding, and the patient name is correctly read, decoded and validated on reading the dataset.

It would help if you could show some code with that behavior. My assumption is that the patient name is somehow wrongly decoded, so that that "=" will still be in the string and be seen as a delimiter.

What is done in the code is something like this (not exactly, this is for illustration), which works without problems:

          var bytes = new byte[]  // your patient name
            {
                0x4e, 0x49, 0x53, 0x48, 0x49, 0x4b, 0x41, 0x57, 0x41, 0x5e, 0x52, 0x49, 0x53, 0x41, 0x54, 0x4f,
                0x3d, 0x1b, 0x24, 0x42, 0x40, 0x3e, 0x40, 0x6e, 0x1b, 0x28, 0x42, 0x5e, 0x1b, 0x24, 0x42, 0x68,
                0x3d, 0x4e, 0x24, 0x1b, 0x28, 0x42, 0x3d, 0x1b, 0x24, 0x42, 0x25, 0x4b, 0x25, 0x37, 0x25, 0x2b,
                0x25, 0x6f, 0x1b, 0x28, 0x42, 0x5e, 0x1b, 0x24, 0x42, 0x25, 0x6a, 0x25, 0x35, 0x25, 0x48, 0x1b,
                0x28, 0x42
            };
            var dataset = new DicomDataset();
            result.Add(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");

            var patientName = new DicomPersonName(DicomTag.PatientName,
                dataset.GetEncodingsForSerialization(), new MemoryByteBuffer(bytes));
            // patientName has now the correct decoded value
            dataset.Add(patientName);  // does validation

from fo-dicom.

alpha-yanagida commented on September 21, 2024

The following is an excerpt of the code.

PatientNameAll in the WorklistItem contained "=".

Individual names will be withheld.
� is control code ESC.

// in worklist item model
public string patientName { get; private set; }
public string patientNameAll => DicomEncoding.GetEncoding("ISO 2022 IR 6").GetString(patientNameAllByte); }
private byte[] patientNameAllByte { get; set; }

List<string> namesKanji = {"**", "莉*"}; // patient name read from csv
// Format patientName to ROMAJI^NAME=KANJI^NAME=KANA^NAME
var encoding1 = DicomEncoding.GetEncoding("ISO 2022 IR 6");
var encoding2 = DicomEncoding.GetEncoding("ISO 2022 IR 87");

var nameByte = encoding1.GetBytes($"{string.Join("^", namesRomaji)}=").ToList(); // romaji
for (int i = 0; i < namesKanji.Count; i++) // Kanji
{
    nameByte.AddRange(encoding2.GetBytes($"{namesKanji[i]}")); if (i ! = namesKanji.Count; i++)
    if (i ! = namesKanji.Count - 1)
    {
        nameByte.AddRange(encoding1.GetBytes("^")); if (i ! = namesKanji.Count - 1)
    }
}
// Kana is treated in the same way as Kanji

patientNameAllByte = nameByte.ToArray();
patientName = patientNameAll; // patientNameAll is *********^******=�$B@>@n�(B^�$Bh=N$�(B=�$B%K%7%+%o�(B^�$B%j%5%H�(B

// in MyDicomService file

var resultDataset = new DicomDataset();
resultDataset.AddOrUpdate(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");
resultDataset.AddOrUpdate(DicomTag.PatientName, patientName);

Byte data of nameByte

  | [0] | 78 | byte
  | [1] | 73 | byte
  | [2] | 83 | byte
  | [3] | 72 | byte
  | [4] | 73 | byte
  | [5] | 75 | byte
  | [6] | 65 | byte
  | [7] | 87 | byte
  | [8] | 65 | byte
  | [9] | 94 | byte
  | [10] | 82 | byte
  | [11] | 73 | byte
  | [12] | 83 | byte
  | [13] | 65 | byte
  | [14] | 84 | byte
  | [15] | 79 | byte
  | [16] | 61 | byte
  | [17] | 27 | byte
  | [18] | 36 | byte
  | [19] | 66 | byte
  | [20] | 64 | byte
  | [21] | 62 | byte
  | [22] | 64 | byte
  | [23] | 110 | byte
  | [24] | 27 | byte
  | [25] | 40 | byte
  | [26] | 66 | byte
  | [27] | 94 | byte
  | [28] | 27 | byte
  | [29] | 36 | byte
  | [30] | 66 | byte
  | [31] | 104 | byte
  | [32] | 61 | byte
  | [33] | 78 | byte
  | [34] | 36 | byte
  | [35] | 27 | byte
  | [36] | 40 | byte
  | [37] | 66 | byte
  | [38] | 61 | byte
  | [39] | 27 | byte
  | [40] | 36 | byte
  | [41] | 66 | byte
  | [42] | 37 | byte
  | [43] | 75 | byte
  | [44] | 37 | byte
  | [45] | 55 | byte
  | [46] | 37 | byte
  | [47] | 43 | byte
  | [48] | 37 | byte
  | [49] | 111 | byte
  | [50] | 27 | byte
  | [51] | 40 | byte
  | [52] | 66 | byte
  | [53] | 94 | byte
  | [54] | 27 | byte
  | [55] | 36 | byte
  | [56] | 66 | byte
  | [57] | 37 | byte
  | [58] | 106 | byte
  | [59] | 37 | byte
  | [60] | 53 | byte
  | [61] | 37 | byte
  | [62] | 72 | byte
  | [63] | 27 | byte
  | [64] | 40 | byte
  | [65] | 66 | byte

from fo-dicom.

mrbean-bremen commented on September 21, 2024

I'm not sure yet if this behavior is expected or a bug (I may have another look some time later), but you could avoid this problem by using the code I showed above, e.g.:

var dataset = new DicomDataset();
result.Add(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");

// assuming patientNameAllByte is of type byte[] and has the correct value
var patientName = new DicomPersonName(DicomTag.PatientName,
     dataset.GetEncodingsForSerialization(), new MemoryByteBuffer(patientNameAllByte));
dataset.Add(patientName);

from fo-dicom.

alpha-yanagida commented on September 21, 2024

Sorry for the late reply.

I tried the above code and was able to work around the problem.

Thanks for your time!

from fo-dicom.

chen-wu commented on September 21, 2024

It seems the "dataset.Add" function cannot accept CJK string. If you are using some CJK characters in the string, it may not be encoded correctly.

    var raw = new byte[] {
        0x59, 0x61, 0x6d, 0x61, 0x64, 0x61, 0x5e, 0x54, 0x61, 0x72, 0x6f, 0x75, 0x3d, 0x1b, 0x24, 0x42, 0x3b, 0x33,
        0x45,
        0x44, 0x1b, 0x28, 0x42, 0x5e, 0x1b, 0x24, 0x42, 0x42, 0x40, 0x4f, 0x3a, 0x1b, 0x28, 0x42, 0x3d, 0x1b, 0x24,
        0x42,
        0x24, 0x64, 0x24, 0x5e, 0x24, 0x40, 0x1b, 0x28, 0x42, 0x5e, 0x1b, 0x24, 0x42, 0x24, 0x3f, 0x24, 0x6d, 0x24,
        0x26,
        0x1b, 0x28, 0x42
    }; // https://dicom.nema.org/medical/dicom/current/output/html/part05.html#table_H.3-1

    var ds1 = new DicomDataset();
    ds1.Add(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");
    ds1.Add(DicomTag.PatientName, "Yamada^Tarou=山田^太郎=やまだ^たろう");
    
    var ds2 = new DicomDataset();
    ds2.Add(DicomTag.SpecificCharacterSet, "\\ISO 2022 IR 87");
    var pn = new DicomPersonName(DicomTag.PatientName,
        DicomEncoding.GetEncodings(ds2.GetValues<string>(DicomTag.SpecificCharacterSet)),
        new MemoryByteBuffer(raw));
    ds2.Add(pn);
    
    var out1 = ds1.GetString(DicomTag.PatientName); // "Yamada^Tarou=山田^太郎=やまだ^たろう"
    var out2 = ds2.GetString(DicomTag.PatientName); // "Yamada^Tarou=山田^太郎=やまだ^たろう"
    
    var outbuf1 = ds1.GetValues<byte>(DicomTag.PatientName); // length of outbuf1 is 26
    var outbuf2 = ds2.GetValues<byte>(DicomTag.PatientName); // length of outbuf2 is 60

The ds2 is a correct way to add such DicomItem. However, the GetEncodingsForSerialization() seems to be an internal method. So, I replaced it to GetEncodings.
26 is exactly the length of "Yamada^Tarou=山田^太郎=やまだ^たろう" string. But it should be encoded as a 60 bytes array.

from fo-dicom.

mrbean-bremen commented on September 21, 2024

@chen-wu - yes, this is exactly the described behavior, including the workaround, and the reason why this issue is still open.
I suspect that the problem has to do with the multi-valued encoding (e.g. only the first encoding is used, which is ASCII in this case).
And yes, I used an internal instead of the public method for the workaround, good catch.

from fo-dicom.

mrbean-bremen commented on September 21, 2024

I just confirmed that saving with a single-value encoding works well, it only does not work with a multi-valued encoding.
Note that the problem only occurs with newly added data, that cannot be represented by the first encoding.

I just had forgotten that this is not implemented (and never has been), as I wrote a comment to that extent myself in the code in a PR I made some time ago...

from fo-dicom.

gofal commented on September 21, 2024

patientNameAllByte = nameByte.ToArray();
patientName = patientNameAll; // patientNameAll is ***^=�$B@>@n�(B^�$Bh=N$�(B=�$B%K%7%+%o�(B^�$B%j%5%H�(B

// in MyDicomService file

var resultDataset = new DicomDataset();
resultDataset.AddOrUpdate(DicomTag.SpecificCharacterSet, "\ISO 2022 IR 87");
resultDataset.AddOrUpdate(DicomTag.PatientName, patientName);

The problem, that I see in your example is, that you have various strings that should be binary serialized using different encodings.
Then you use the various encodings to generate an array of bytes. But then you join them together and instead of passing this already serialized byte array to the DicomDataset, you convert it back into a .net string into the variable patientName. This might cause the error, because when the byte-array is converted to a string, then .net takes one encoding and decodes it into a string. And after passing this string to DicomDataset, then this string is again binary serialized using one encoding.

This is why you did not face any issue as soon as you pass the bytearray directly into the DicomDataset

from fo-dicom.

mrbean-bremen commented on September 21, 2024

@gofal - that is true, but maybe the example by @chen-wu shows the problem better. If using a single value encoding that can encode the name (like GB18030 or UTF-8), this works without a problem.
The problem is that fo-dicom always uses the first encoding to encode the string, and ignores the rest - thus the comment in the code. This is something that I should have implemented at the time but didn't, probably because I wasn't sure how to do it.
If the whole string can be encoded using one of the encodings, this would be relatively easy - we could just set the fallback encoding to EncoderFallback.ExceptionFallback while encoding, and handling the exception (which will appear rarely, so this will probably not be a performance problem) by trying the other encodings. For PN though special handling is needed, as each component has to be encoded separately.
The code where the encoding should be changed inside the string is more complicated. I'm not sure how to do this with Encoding, but then I'm not sure if this a realistic use case at all.

from fo-dicom.

gofal commented on September 21, 2024

The code where the binary data is added directly seems to solve the issue that no validation error is raised. But does it also work on the modality side? Is the modality able, to render the PatientName correctly?
https://dicom.nema.org/dicom/2013/output/chtml/part05/chapter_6.html Dicom defines that it is not sufficient, just to concanate the various binary streams of the various encodings, but it also has to contain the separator 0x1b followed by 2 bytes that tell the parser which codec to use. This ESC-separtor followed by the two bytes referencin the codec ISO 2022 IR 87 seems to be missing.

fo-dicom is able to decode text that is encoded with mutliple encodings. But I also highly recommend taht if someone creates new data to use an encoding that contains all used characters, e.g. UTF-8. I also cannot imagine a usecase, where this is not possible.

If we wanted to support encoding a text with various encodings, then this would require some API like StringBuilder, where you can add one text after the other and with each text also pass the encoding to use.
And those values should then be excluded from transcoding in case the SpecificCharacterSet of the DicomDataset changes.
Not a trivial task.

from fo-dicom.

mrbean-bremen commented on September 21, 2024

Yes, I agree. I had a closer look, and there is a reason why I didn't implement this last time - it can get quite complicated.
It should be possible to implement this without manually adding the encoding (I did this in pydicom), but there are ways to workaround the problem, and the code to do this would be quite complicated, and also possibly slower.

So while I'm not really happy with the current state (especially because I did some implementation on the decoding side and should have also done the encoding), I think we can live with this for now.

from fo-dicom.

gofal commented on September 21, 2024

Thank you! I could not find a issue about that encoding. Could you please open a new issue with label "Enhancement" about the feature of encodig a text with codeextensions? Then we could close this issue, but we have the feature in our backlog for future versions,

from fo-dicom.

DicomValidation for VR PN about fo-dicom HOT 17 CLOSED

Comments (17)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent