
Comments (18)

zhangb1 commented on August 18, 2024

Checking. Something seems wrong with the BS_AG4BP2PM CNVkit cns file; I am modifying the cns output and rerunning the data assembly now.

@komalsrathi consensus_wgs_plus_cnvkit_wxs.BS_AG4BP2PM.tsv.gz should only have BS_AG4BP2PM plus v11 samples.
consensus_wgs_plus_cnvkit_wxs.BS_HSXARQ1K.tsv.gz will have BS_HSXARQ1K plus v11 samples

from d3b-bixu-data-assembly.

jharenza commented on August 18, 2024

@komalsrathi is this an issue in the warehouse or a transformed file?

komalsrathi commented on August 18, 2024

Found in the data assembly file linked in the ticket; the same file is used by the reporting code.

zhangb1 commented on August 18, 2024

Yeah, I don't have that information when doing the data assembly, so I didn't add it.

aadamk commented on August 18, 2024

For 1, this may be reliant on migrating out of DT to Redcap for future PNOC trials so that pathology free text gets mapped to the CBTN term upstream by the CRU.
For 2 and 3, we need to get the short and broad hist PR merged into the toolkit.
For 4, do you mean that the PFS_days contains an RNA_library name for that biospecimen?

Also @zhangb1 - for a PNOC008 recurrence sample that was completed last week, BS_AG4BP2PM, the absolute copy number (copy_number field in the linked output file) is a floating point number rather than integer. It seems the depth field from cnvkit may have somehow been pulled into the merged output file, whereas for BS_HSXARQ1K, this is not the case - the copy_number field contains an integer. As such, BS_AG4BP2PM contains many events erroneously called as amplifications. Can you dig into why this may have happened and implement a fix?
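The symptom described above (a non-integer `copy_number` for one sample) can be screened for before reporting. The following is a minimal sketch, not the data assembly code: it assumes the merged file is tab-separated, that the sample column is named `Kids_First_Biospecimen_ID` (an assumption for this file), and it uses a tiny made-up table in place of the real output.

```python
import csv
import io
from collections import defaultdict


def non_integer_copy_numbers(rows, sample_col="Kids_First_Biospecimen_ID",
                             cn_col="copy_number"):
    """Return {sample_id: number of rows whose copy number is not a whole number}."""
    bad = defaultdict(int)
    for row in rows:
        value = row[cn_col].strip()
        if value in ("", "NA"):
            continue  # missing values are not flagged by this check
        try:
            is_whole = float(value).is_integer()
        except ValueError:
            is_whole = False  # non-numeric text is treated as suspicious too
        if not is_whole:
            bad[row[sample_col]] += 1
    return dict(bad)


# Made-up rows standing in for the merged CNV file (values are illustrative only).
example = io.StringIO(
    "Kids_First_Biospecimen_ID\tcopy_number\n"
    "BS_HSXARQ1K\t2\n"
    "BS_HSXARQ1K\t3\n"
    "BS_AG4BP2PM\t187.42\n"
    "BS_AG4BP2PM\t2\n"
)
flagged = non_integer_copy_numbers(csv.DictReader(example, delimiter="\t"))
print(flagged)
```

For a real `.tsv.gz` file, the same function could be fed from `gzip.open(path, "rt")` wrapped in `csv.DictReader`; any sample that shows up in the result would be worth inspecting before amplification calls are trusted.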

komalsrathi commented on August 18, 2024

> For 4, do you mean that the PFS_days contains an RNA_library name for that biospecimen?

Yes, here is what I meant:

> dat %>% filter(Kids_First_Biospecimen_ID == "BS_A7Y1Y314") %>% dplyr::select(RNA_library, PFS_days)
  RNA_library PFS_days
1        <NA> stranded
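A value such as `stranded` in `PFS_days` suggests that columns were shifted or mismatched somewhere during assembly. As a rough sketch of how such rows could be caught automatically, the check below flags any non-numeric entry in a column expected to hold numbers. The helper name and the second row's ID (`BS_EXAMPLE`) are hypothetical; only `BS_A7Y1Y314` and its values come from the output above.

```python
def non_numeric_values(rows, column):
    """Return (row_index, value) pairs where `column` holds something that is not a number."""
    problems = []
    for i, row in enumerate(rows):
        value = (row.get(column) or "").strip()
        if value in ("", "NA"):
            continue  # missing values are allowed
        try:
            float(value)
        except ValueError:
            problems.append((i, value))
    return problems


rows = [
    # Mirrors the row shown above: RNA_library is missing, PFS_days holds a library type.
    {"Kids_First_Biospecimen_ID": "BS_A7Y1Y314", "RNA_library": "NA", "PFS_days": "stranded"},
    # Hypothetical well-formed row for contrast.
    {"Kids_First_Biospecimen_ID": "BS_EXAMPLE", "RNA_library": "stranded", "PFS_days": "120"},
]
shifted = non_numeric_values(rows, "PFS_days")
print(shifted)
```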

komalsrathi commented on August 18, 2024

> Also @zhangb1 - for a PNOC008 recurrence sample that was completed last week, BS_AG4BP2PM, the absolute copy number (copy_number field in the linked output file) is a floating point number rather than integer. It seems the depth field from cnvkit may have somehow been pulled into the merged output file, whereas for BS_HSXARQ1K, this is not the case - the copy_number field contains an integer. As such, BS_AG4BP2PM contains many events erroneously called as amplifications. Can you dig into why this may have happened and implement a fix?

Actually @aadamk, I am just noticing that, in addition to the floating-point values for BS_AG4BP2PM (patient 43 recurrence), the latest merged file consensus_wgs_plus_cnvkit_wxs.BS_AG4BP2PM.tsv.gz does not have any rows for the patient 35 recurrence sample, i.e. BS_HSXARQ1K, which is why we don't have any CNV findings for this patient...
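A missing sample like this can be detected with a simple presence check on the merged file before it reaches the reporting code. The sketch below is illustrative only: the function name is hypothetical, the sample column name `Kids_First_Biospecimen_ID` is an assumption for this file, and the rows are a stand-in for the real table.

```python
def missing_samples(rows, expected_ids, sample_col="Kids_First_Biospecimen_ID"):
    """Return the expected sample IDs that have no rows at all, sorted for stable output."""
    present = {row[sample_col] for row in rows}
    return sorted(set(expected_ids) - present)


# Stand-in for a merged file that only contains one of the two recurrence samples.
merged_rows = [
    {"Kids_First_Biospecimen_ID": "BS_AG4BP2PM", "copy_number": "2"},
    {"Kids_First_Biospecimen_ID": "BS_AG4BP2PM", "copy_number": "3"},
]
absent = missing_samples(merged_rows, ["BS_AG4BP2PM", "BS_HSXARQ1K"])
print(absent)
```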

komalsrathi commented on August 18, 2024

But the last merged file should have both patients, because we want to compare all patients up to that point...

zhangb1 commented on August 18, 2024

We only do N+1.....

aadamk commented on August 18, 2024

> We only do N+1.....

Can you set things up so that you keep recursively adding on to the latest file?
E.g. N = v11 file; N + 1 = v11 + new patient.
Then, when a new patient is enrolled, the next CWL run is added on to N + 1 to generate N + 2 (e.g. v11 + previously enrolled patient + new patient), and so on?
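The accumulation scheme proposed here can be sketched in a few lines. This is not the CWL workflow, just an illustration of the N, N + 1, N + 2 idea under stated assumptions: rows are dictionaries keyed by an assumed `Kids_First_Biospecimen_ID` column, `extend_cumulative` is a hypothetical helper, `BS_V11_PLACEHOLDER` stands in for the v11 samples, and the order in which the two real sample IDs are added is arbitrary.

```python
def extend_cumulative(previous_rows, new_rows, sample_col="Kids_First_Biospecimen_ID"):
    """Return previous_rows plus new_rows; rows for a re-run sample replace the old ones."""
    new_ids = {row[sample_col] for row in new_rows}
    kept = [row for row in previous_rows if row[sample_col] not in new_ids]
    return kept + list(new_rows)


# N: the base release (placeholder ID, illustrative values).
v11 = [{"Kids_First_Biospecimen_ID": "BS_V11_PLACEHOLDER", "copy_number": "2"}]
first_patient = [{"Kids_First_Biospecimen_ID": "BS_HSXARQ1K", "copy_number": "3"}]
second_patient = [{"Kids_First_Biospecimen_ID": "BS_AG4BP2PM", "copy_number": "1"}]

n_plus_1 = extend_cumulative(v11, first_patient)        # v11 + first new patient
n_plus_2 = extend_cumulative(n_plus_1, second_patient)  # built on N + 1, not on v11

samples = sorted({row["Kids_First_Biospecimen_ID"] for row in n_plus_2})
print(samples)
```

Replacing rows for a sample that is re-run, rather than appending them again, keeps a corrected run from leaving stale rows in the cumulative file.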

aadamk commented on August 18, 2024

And can that logic apply to all files (snv, sv, cnv, rnaseq, fusion, etc.)?

zhangb1 commented on August 18, 2024

Can we set up a meeting to discuss this? We should discuss with @yuankunzhu how to process that.

We may need to wrap up the new CWL to do that.

zhangb1 commented on August 18, 2024

Also @komalsrathi, in the task here: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/sd-8y99qzjj-data-assembly/tasks/1ba0c658-6c10-49c8-8ec9-cf89bf5b88bf/

the float issue for BS_AG4BP2PM has been fixed.

aadamk commented on August 18, 2024

> Can we set up a meeting to discuss this? We should discuss with @yuankunzhu how to process that.
>
> We may need to wrap up the new CWL to do that.

ok - tomorrow's toolkit meeting should be fine.

komalsrathi commented on August 18, 2024

Thanks @zhangb1 I'll merge those files on my end.

komalsrathi commented on August 18, 2024

@aadamk Added a 5th point to the data assembly histology issues.

komalsrathi commented on August 18, 2024

@jharenza I added a 6th point to this ticket - wasn't sure if this is an issue with the v10/v11 file or something during data assembly.

jharenza commented on August 18, 2024

> @jharenza I added a 6th point to this ticket - wasn't sure if this is an issue with the v10/v11 file or something during data assembly.
>
> Just noticed the following discrepancy, two samples BS_J4E9SW51 and BS_H1XPVS9A are annotated as LGAT in v10/v11 but HGAT in data assembly histology files:

Thanks @komalsrathi. The comparison between data assembly and v11 won't be apples to apples because the data assembly doesn't include molecular subtyping. These two samples changed following subtyping from pathology_diagnosis == High-grade glioma/astrocytoma (WHO grade III/IV) to Pleomorphic xanthoastrocytoma, BRAF V600E. LGG is the module which subtypes PXAs. That said, they are also special cases because they are "malignant pxa" according to pathology_free_text_diagnosis.

Unfortunately, we do not have a field for grade in the histology file (aside from the LGG or HGG pathology_diagnosis field), and PXA can be either grade II or III. One thing we can do in the future is add a manual review of path reports, plus discussion with pathologists, for tumors which are subtyped via LGG subtyping and can be higher grade. This is something I had recently thought about, but it is difficult to do perfectly with the given data, so thanks for bringing this up.

These two are likely high-grade PXAs, but that also raises the question of whether we want to lump them in with other HGGs. There is a new (2021) category of tumors called "Circumscribed astrocytic gliomas", separate from diffuse HGGs (of interest here), which contains Pilocytic astrocytoma; High-grade astrocytoma with piloid features; Pleomorphic xanthoastrocytoma; Subependymal giant cell astrocytoma; Chordoid glioma; and Astroblastoma, MN1-altered. These are of mixed grade. So my advice would be to keep these two samples out of the HGG group, since they are likely quite different biologically, even though they are potentially higher-grade tumors. cc @aadamk
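Discrepancies of this kind between the v11 histology file and the data assembly histology file could be listed systematically and then routed to manual review. The sketch below is only an illustration: the column name `short_histology` is an assumption about where the LGAT/HGAT annotation lives, the helper is hypothetical, and the two tables are reduced to the two samples named in this thread.

```python
def histology_discrepancies(reference, assembled, column="short_histology"):
    """Map sample ID -> (reference value, assembled value) for shared samples that differ."""
    shared = reference.keys() & assembled.keys()
    return {
        sample_id: (reference[sample_id][column], assembled[sample_id][column])
        for sample_id in sorted(shared)
        if reference[sample_id][column] != assembled[sample_id][column]
    }


# Per-sample annotations keyed by biospecimen ID (column name is assumed).
v11_histology = {
    "BS_J4E9SW51": {"short_histology": "LGAT"},
    "BS_H1XPVS9A": {"short_histology": "LGAT"},
}
assembly_histology = {
    "BS_J4E9SW51": {"short_histology": "HGAT"},
    "BS_H1XPVS9A": {"short_histology": "HGAT"},
}
diffs = histology_discrepancies(v11_histology, assembly_histology)
print(diffs)
```

A listing like this only surfaces the differences; as noted above, deciding which annotation is appropriate still depends on subtyping results and pathology review.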
