Comparing the primary sample information for Patient 43 and 35 from v11 to the corresp

Histology file fields about d3b-bixu-data-assembly HOT 18 OPEN

komalsrathi commented on August 18, 2024

Histology file fields

from d3b-bixu-data-assembly.

Comments (18)

zhangb1 commented on August 18, 2024

Checking. Something seems wrong with the BS_AG4BP2PM CNVkit cns file; I am modifying the cns output and rerunning the data assembly now.

@komalsrathi consensus_wgs_plus_cnvkit_wxs.BS_AG4BP2PM.tsv.gz should only have BS_AG4BP2PM plus v11 samples.
consensus_wgs_plus_cnvkit_wxs.BS_HSXARQ1K.tsv.gz will have BS_HSXARQ1K plus v11 samples

jharenza commented on August 18, 2024

@komalsrathi is this an issue in the warehouse or a transformed file?

komalsrathi commented on August 18, 2024

Found in the data assembly file which I have linked in the ticket - the same file is being used for reporting code.

zhangb1 commented on August 18, 2024

Yeah, I didn't have that information when doing the data assembly, so I didn't add those fields.

aadamk commented on August 18, 2024

For 1, this may be reliant on migrating out of DT to Redcap for future PNOC trials so that pathology free text gets mapped to the CBTN term upstream by the CRU.
For 2 and 3, we need to get the short and broad hist PR merged into the toolkit.
For 4, do you mean that the PFS_days contains an RNA_library name for that biospecimen?

Also @zhangb1 - for a PNOC008 recurrence sample that was completed last week, BS_AG4BP2PM, the absolute copy number (copy_number field in the linked output file) is a floating point number rather than integer. It seems the depth field from cnvkit may have somehow been pulled into the merged output file, whereas for BS_HSXARQ1K, this is not the case - the copy_number field contains an integer. As such, BS_AG4BP2PM contains many events erroneously called as amplifications. Can you dig into why this may have happened and implement a fix?
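
A quick way to screen a merged file for this symptom might look like the sketch below. It assumes a tab-separated file with a `copy_number` column (named in this thread) and a `Kids_First_Biospecimen_ID` column (an assumption borrowed from the histology file); the helper name and the demo rows are hypothetical.

```python
import csv
import io
from collections import Counter

def non_integer_copy_numbers(handle, id_col="Kids_First_Biospecimen_ID", cn_col="copy_number"):
    """Count rows per sample whose copy_number is not a whole number."""
    bad = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        value = row[cn_col]
        if value in ("", "NA"):
            continue  # missing values are not flagged here
        try:
            is_whole = float(value).is_integer()
        except ValueError:
            is_whole = False  # non-numeric text is also suspicious
        if not is_whole:
            bad[row[id_col]] += 1
    return dict(bad)

# Made-up rows for illustration only; not taken from the real output file.
demo = io.StringIO(
    "Kids_First_Biospecimen_ID\tcopy_number\n"
    "BS_HSXARQ1K\t2\n"
    "BS_AG4BP2PM\t37.52\n"
    "BS_AG4BP2PM\t3\n"
)
print(non_integer_copy_numbers(demo))  # {'BS_AG4BP2PM': 1}
```

For a gzipped file, the same function should work on a handle from `gzip.open(path, "rt", newline="")`. This only detects the symptom; it does not tell you whether the depth column was the source of the values.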

komalsrathi commented on August 18, 2024

For 4, do you mean that the PFS_days contains an RNA_library name for that biospecimen?

Yes, here is what I meant:

> dat %>% filter(Kids_First_Biospecimen_ID == "BS_A7Y1Y314") %>% dplyr::select(RNA_library, PFS_days)
  RNA_library PFS_days
1        <NA> stranded
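
The same kind of check can be run across the whole histology file rather than one biospecimen at a time. The sketch below is in Python rather than R and assumes a tab-separated export with `Kids_First_Biospecimen_ID` and `PFS_days` columns; the function name is hypothetical and the second demo row is made up.

```python
import csv
import io

def rows_with_non_numeric(handle, column="PFS_days", id_col="Kids_First_Biospecimen_ID"):
    """Return (sample ID, value) pairs where `column` is neither missing nor numeric."""
    flagged = []
    for row in csv.DictReader(handle, delimiter="\t"):
        value = (row.get(column) or "").strip()
        if value in ("", "NA"):
            continue  # treat empty and NA as missing, not as errors
        try:
            float(value)
        except ValueError:
            flagged.append((row[id_col], value))
    return flagged

# First row mirrors the case reported above; the second row is invented.
demo = io.StringIO(
    "Kids_First_Biospecimen_ID\tRNA_library\tPFS_days\n"
    "BS_A7Y1Y314\tNA\tstranded\n"
    "BS_EXAMPLE01\tstranded\t120\n"
)
print(rows_with_non_numeric(demo))  # [('BS_A7Y1Y314', 'stranded')]
```

A text value in a numeric field alongside a missing value in the neighbouring field is consistent with a column shift, but the check itself cannot confirm the cause.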

komalsrathi commented on August 18, 2024

Also @zhangb1 - for a PNOC008 recurrence sample that was completed last week, BS_AG4BP2PM, the absolute copy number (copy_number field in the linked output file) is a floating point number rather than integer. It seems the depth field from cnvkit may have somehow been pulled into the merged output file, whereas for BS_HSXARQ1K, this is not the case - the copy_number field contains an integer. As such, BS_AG4BP2PM contains many events erroneously called as amplifications. Can you dig into why this may have happened and implement a fix?

Actually @aadamk I am just noticing that in addition to the floating numbers for BS_AG4BP2PM (patient 43 recurrence), the latest merged file consensus_wgs_plus_cnvkit_wxs.BS_AG4BP2PM.tsv.gz does not have any rows corresponding to patient 35 recurrence sample i.e. BS_HSXARQ1K which is why we don't have any CNV findings for this patient...
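
A presence check along these lines could catch a missing patient before reporting runs. This is only a sketch: it assumes the merged file is tab-separated and carries a `Kids_First_Biospecimen_ID` column, and the helper name and demo rows are hypothetical.

```python
import csv
import io

def missing_samples(handle, expected_ids, id_col="Kids_First_Biospecimen_ID"):
    """Return the expected sample IDs that have no rows in the merged file."""
    seen = {row[id_col] for row in csv.DictReader(handle, delimiter="\t")}
    return sorted(set(expected_ids) - seen)

# Made-up file content: only one of the two recurrence samples is present.
demo = io.StringIO(
    "Kids_First_Biospecimen_ID\tcopy_number\n"
    "BS_AG4BP2PM\t3\n"
)
print(missing_samples(demo, ["BS_AG4BP2PM", "BS_HSXARQ1K"]))  # ['BS_HSXARQ1K']
```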

komalsrathi commented on August 18, 2024

But the last merged file should have both patients. Because we want to compare all patients up to that point...

zhangb1 commented on August 18, 2024

We only do N+1.....

aadamk commented on August 18, 2024

We only do N+1.....

Can you set things up so that each run recursively adds on to the latest file?
e.g. N = v11 file; N + 1 = v11 + new patient;
then, when a new patient is enrolled, the next CWL run adds on to N + 1 to generate N + 2 (e.g. v11 + previously enrolled patient + new patient), and so on?
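
The cumulative scheme described here can be sketched as follows. This is not the CWL workflow itself, just an illustration of the N, N + 1, N + 2 logic on in-memory rows; the function name, the `Kids_First_Biospecimen_ID` key, and the v11 placeholder sample are assumptions.

```python
def append_new_samples(cumulative_rows, new_rows, id_col="Kids_First_Biospecimen_ID"):
    """Return the cumulative rows plus rows for samples not already present."""
    existing = {row[id_col] for row in cumulative_rows}
    return cumulative_rows + [row for row in new_rows if row[id_col] not in existing]

# N: stand-in for the v11 file (placeholder sample ID, not a real one).
n0 = [{"Kids_First_Biospecimen_ID": "BS_V11_EXAMPLE"}]
# N + 1: v11 plus the first newly enrolled patient.
n1 = append_new_samples(n0, [{"Kids_First_Biospecimen_ID": "BS_HSXARQ1K"}])
# N + 2: built on N + 1, not on v11, so the earlier patient is kept.
n2 = append_new_samples(n1, [{"Kids_First_Biospecimen_ID": "BS_AG4BP2PM"}])

print(sorted({row["Kids_First_Biospecimen_ID"] for row in n2}))
# ['BS_AG4BP2PM', 'BS_HSXARQ1K', 'BS_V11_EXAMPLE']
```

Skipping samples that are already present makes a rerun harmless, but it also means a corrected rerun of an existing sample would be ignored; a real implementation would likely need to replace that sample's rows instead.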

aadamk commented on August 18, 2024

and for that logic to apply to all files (snv, sv, cnv, rnaseq, fusion, etc)?

zhangb1 commented on August 18, 2024

Can we set up a meeting to discuss this? We should discuss with @yuankunzhu how to process that.

We may need to wrap up the new CWL to do that.

zhangb1 commented on August 18, 2024

Also @komalsrathi in the task here : https://cavatica.sbgenomics.com/u/kfdrc-harmonization/sd-8y99qzjj-data-assembly/tasks/1ba0c658-6c10-49c8-8ec9-cf89bf5b88bf/

the float issue for BS_AG4BP2PM has been fixed.

aadamk commented on August 18, 2024

Can we set up a meeting to discuss this? We should discuss with @yuankunzhu how to process that.

We may need to wrap up the new CWL to do that.

ok - tomorrow's toolkit meeting should be fine.

komalsrathi commented on August 18, 2024

Thanks @zhangb1 I'll merge those files on my end.

komalsrathi commented on August 18, 2024

@aadamk Added a 5th point to the data assembly histology issues.

komalsrathi commented on August 18, 2024

@jharenza I added a 6th point to this ticket - wasn't sure if this is an issue with the v10/v11 file or something during data assembly.

jharenza commented on August 18, 2024

@jharenza I added a 6th point to this ticket - wasn't sure if this is an issue with the v10/v11 file or something during data assembly.

Just noticed the following discrepancy, two samples BS_J4E9SW51 and BS_H1XPVS9A are annotated as LGAT in v10/v11 but HGAT in data assembly histology files:

Thanks @komalsrathi. The comparison between data assembly and v11 won't be apples to apples because the data assembly doesn't include molecular subtyping. These two samples changed following subtyping from pathology_diagnosis == High-grade glioma/astrocytoma (WHO grade III/IV) to Pleomorphic xanthoastrocytoma, BRAF V600E. LGG is the module which subtypes PXAs. That said, they are also special cases because they are "malignant pxa" according to pathology_free_text_diagnosis. Unfortunately, we do not have a field for grade in the histology file (aside from the LGG or HGG pathology_diagnosis field), and PXA can be either grade II or III.

One thing we can do in the future is add a manual review of path reports, plus discussion with pathologists, for tumors which are subtyped via LGG subtyping and can be higher grade. This is something I had recently thought about, but it is difficult to do perfectly with the given data, so thanks for bringing this up.

These two are likely high-grade PXAs, but that also raises the question of whether we want to lump them in with other HGGs. There is a new (2021) category of tumors called "Circumscribed astrocytic gliomas", separate from diffuse HGGs (of interest here), which contains Pilocytic astrocytoma, High-grade astrocytoma with piloid features, Pleomorphic xanthoastrocytoma, Subependymal giant cell astrocytoma, Chordoid glioma, and Astroblastoma, MN1-altered; these are of mixed grade. So my advice would be to keep these two samples out of the HGG group since they are likely biologically quite different, even though they are potentially higher-grade tumors. cc @aadamk
