xiamaz / pedia-workflow Goto Github PK


This project forked from pedia-charite/pedia-workflow


This is the global workflow for analysing data: 1. Quality check; 2. Phenomization; 3. Simulation; 4. Classification

Python 96.53% Ruby 0.31% R 3.16%

pedia-workflow's Issues

Corrected hgvs are not returned as hgvs objects

When corrected hgvs strings are used, they are returned directly as strings rather than as hgvs objects. This causes an error in the reference transcript check with Mutalyzer, because that check edits the acc attribute of the hgvs objects to correct the reference transcript field.

Current behavior:
When the error fixer has hgvs string overrides, these are returned as a list of strings ([str]).

Expected behavior:
The parse function should always return a list of hgvs objects.

Proposed fix:
Create hgvs objects before returning from the parser step after loading the hgvs strings from the error fixer.
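
A minimal sketch of the proposed fix, assuming the error fixer hands back a mix of raw strings and already parsed objects. `Variant`, `parse_hgvs_string` and `ensure_variants` are hypothetical stand-ins and not names from the codebase; in the real parser the conversion would go through the hgvs library's own parser instead of the simplified split used here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Union


@dataclass
class Variant:
    """Stand-in for a parsed hgvs object; only the fields used here."""
    acc: str      # reference transcript accession, e.g. "NM_015937.5"
    posedit: str  # remainder of the HGVS string, e.g. "c.1079G>T"


def parse_hgvs_string(text: str) -> Variant:
    """Simplified parser (assumption): split 'accession:change' at the colon."""
    acc, sep, posedit = text.strip().partition(":")
    if not sep or not acc or not posedit:
        raise ValueError(f"not an HGVS string: {text!r}")
    return Variant(acc=acc, posedit=posedit)


def ensure_variants(
    items: Iterable[Union[str, Variant]],
    parse: Callable[[str], Variant] = parse_hgvs_string,
) -> List[Variant]:
    """Return a list of objects only, parsing any string overrides first."""
    return [parse(item) if isinstance(item, str) else item for item in items]
```

With this in place, later steps such as the reference transcript check can set `acc` on every returned element without checking its type first.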

geneList is exported as an array

The geneList generated through the case object to OldJson exporter contains entries with a genes list, instead of gene_id, gene_omim_id and gene_symbol as separate keys of the gene entry.

Observed behavior:

{    ....
genes : [ {gene_symbol: 'xx', gene_id: 'xx', gene_omim_id: 'xx'} ]
...   }

Expected behavior:

{    ....
gene_symbol: 'xx',
gene_id: 'xx',
gene_omim_id: 'xx'
...   } // each genes entry with multiple genes should be separated into multiple geneList objects.

Proposed solution:
Create multiple rows in the process of the geneList export by exploding the pandas dataframe on the genes column. Afterwards separate the genes information, which is a python dict at this stage, into three separate columns of gene_symbol, gene_id, gene_omim_id.
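
The row-splitting logic can be sketched in plain Python as below. This is a stand-in for the pandas approach proposed above (exploding the genes column and then splitting the dict into columns), written on lists of dicts so it does not assume a particular dataframe layout. `explode_gene_list` and the `case_id` key in the usage are hypothetical.

```python
from typing import Dict, List

GENE_KEYS = ("gene_symbol", "gene_id", "gene_omim_id")


def explode_gene_list(entries: List[Dict]) -> List[Dict]:
    """Turn each entry with a 'genes' list into one flat row per gene."""
    rows = []
    for entry in entries:
        # Everything except the nested list is copied onto every row.
        shared = {key: value for key, value in entry.items() if key != "genes"}
        for gene in entry.get("genes", []):
            row = dict(shared)
            for key in GENE_KEYS:
                row[key] = gene.get(key)
            rows.append(row)
    return rows
```

Note that an entry with an empty genes list produces no rows in this sketch, whereas a dataframe explode would keep one row with missing values; which behavior is wanted is left open here.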

genomicData is incomplete

Genomic data should contain specific mutation information. Currently only the HGVS string is being exported in genomicData.
Expected behavior:
genomicData is a list of objects with the following structure:

{
    "Test Information": {
        "Molecular Test": "TARGETED_TESTING",
        "Notation": "CDNA_LEVEL",
        "Genotype": "HOMOZYGOUS",
        "Mutation Type": "Monogenic",
        "Gene Name": "PIGT"
    },
    "Mutations": {
        "additional info": "freeform notes here",
        "Build": "",
        "result": "VARIANTS_DETECTED",
        "Inheritance Mode": "Autosomal Recessive",
        "HGVS-code": "NM_015937.5:c.1079G>T"
    }
}

Current behavior:
genomicData is a list of objects containing only HGVS-code.

{
     HGVS-code: 'hgvs code string'
}

Proposed fix:
The missing fields can be filled in from information held in the hgvsparser object. As in #2, we will need to ensure that this object is saved in the case object, and add serialization functions that export the expected structure of the genomicData object.
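
A possible shape for such a serialization function is sketched below. The output keys follow the expected structure above; the input is assumed to be a flat dict, and its key names (`molecular_test`, `hgvs_code` and so on) are hypothetical, since how the parser stores these values is not specified here.

```python
from typing import Dict


def to_genomic_data(variant: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Build one genomicData object from a flat record (input keys are assumed)."""
    return {
        "Test Information": {
            "Molecular Test": variant.get("molecular_test", ""),
            "Notation": variant.get("notation", ""),
            "Genotype": variant.get("genotype", ""),
            "Mutation Type": variant.get("mutation_type", ""),
            "Gene Name": variant.get("gene_name", ""),
        },
        "Mutations": {
            "additional info": variant.get("notes", ""),
            "Build": variant.get("build", ""),
            "result": variant.get("result", ""),
            "Inheritance Mode": variant.get("inheritance_mode", ""),
            # The HGVS code is the one field already exported, so it is required.
            "HGVS-code": variant["hgvs_code"],
        },
    }
```

Optional fields default to an empty string, matching the empty "Build" value in the expected structure.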

genomicEntry is incomplete

Currently the genomicEntry is not being exported by the case to OldJson converter.

Expected behavior:
A genomicEntry containing the following information:

{
    "entry_id": 0000,
    "gene": {},
    "result": "VARIANTS_DETECTED",
    "test_type": "EXOME_SEQUENCING",
    "variant_type": "",
    "variants": {
        "gene": {
            "gene_id": 1301,
            "gene_symbol": "COL11A1",
            "omim_id": "120280"
        },
        "mutation": {
            "location": "3816+5",
            "mutation_type": "SUBSTITUTION",
            "original_base": "G",
            "substituted_base": "A",
            "transcript": "NM_001854.3"
        },
        "notes": "adfaklsdjflkasdjfasjdkflas",
        "variant_information": "CDNA_LEVEL",
        "zygosity": "HETEROZYGOUS"
    }
}

Current behavior:
No genomicEntry is being exported.

Proposed fix:
The genomic entry can be assembled from the information contained in the HgvsParser object used to create correct hgvs objects. We will need to save these objects in the case object. Afterwards, genomic entries can be exported by parsing the hgvs objects in the case class.
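
To illustrate how the "mutation" block could be derived, here is a sketch that handles only simple cDNA substitutions and works on the HGVS string with a regular expression. `mutation_from_hgvs` is a hypothetical helper; the actual export would read these values from the parsed hgvs objects and would need to cover other mutation types as well.

```python
import re
from typing import Dict

# Matches strings such as "NM_001854.3:c.3816+5G>A" (cDNA substitutions only).
_SUBSTITUTION = re.compile(
    r"^(?P<transcript>[^:\s]+):c\."
    r"(?P<location>[0-9*+\-]+)"
    r"(?P<original_base>[ACGT])>(?P<substituted_base>[ACGT])$"
)


def mutation_from_hgvs(hgvs_string: str) -> Dict[str, str]:
    """Return the 'mutation' block of a genomic entry for a substitution."""
    match = _SUBSTITUTION.match(hgvs_string.strip())
    if match is None:
        raise ValueError(f"unsupported or malformed HGVS string: {hgvs_string!r}")
    mutation = match.groupdict()
    mutation["mutation_type"] = "SUBSTITUTION"
    return mutation
```

For the example entry above, a string of the form `NM_001854.3:c.3816+5G>A` would yield the transcript, location and bases shown in its "mutation" block.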

Submitter format is wrong

The submitter object in the old JSON format is missing the name field, and the existing fields are misnamed.
Expected behavior:
{ user_team: 'xx', user_email: 'xx', user_name: 'xx' }

Current behavior:
{ team: 'xx', email: 'xx'}

Earlier specifications of the old JSON format seem to be inconsistent about the submitter field, so the proposed format may still need to be adapted. Only the Case to OldJson conversion will need to change.
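
The renaming itself is small. A sketch, assuming the submitter is available as a dict with `name`, `team` and `email` keys (the `name` key is an assumption, since the current export does not carry it):

```python
from typing import Dict


def convert_submitter(submitter: Dict[str, str]) -> Dict[str, str]:
    """Map the current submitter keys onto the expected user_* keys."""
    return {
        "user_name": submitter.get("name", ""),
        "user_team": submitter.get("team", ""),
        "user_email": submitter.get("email", ""),
    }
```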

JSON export from new format

Currently a case object can only be used to create an OldJson object, which can thereafter be saved.

NewJson objects currently cannot be created from case objects, but it would be helpful to be able to write our fixes back to the original specification.

This will require some changes to the current codebase including:

The NewJson constructor will need to be adapted to support multiple constructors. This means that the linker function should only be called when reading a new JSON file.
An additional constructor will need to be written for creating NewJson from case objects. It should be quite similar to the OldJson converter, just with different fields.
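
One common way to get several constructors in Python is to use classmethods. The sketch below is an illustration only: the `Case` fields, the `_link` placeholder and the keys written by `from_case` are all assumptions, not the actual NewJson layout.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Case:
    """Stand-in for the case object; the real class has many more fields."""
    case_id: str
    algo_version: str


class NewJson:
    def __init__(self, data: Dict[str, Any]):
        # The plain constructor only stores data and never runs the linker.
        self.data = data
        self.linked = False

    def _link(self) -> None:
        """Placeholder for the linker step applied to freshly read files."""
        self.linked = True

    @classmethod
    def from_file(cls, path: str) -> "NewJson":
        """Read a new-format JSON file; this is the only path that links."""
        with open(path, encoding="utf-8") as handle:
            obj = cls(json.load(handle))
        obj._link()
        return obj

    @classmethod
    def from_case(cls, case: Case) -> "NewJson":
        """Build a NewJson from a case object without calling the linker."""
        return cls({"case_id": case.case_id, "algo_version": case.algo_version})
```

Keeping `__init__` free of side effects means each entry point decides for itself whether linking is needed.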

Algo deploy version not included in oldJson export

Currently the algorithm version deployed by Face2Gene for the generation of the suggested syndromes results is not being exported into the oldJson object.

Expected behavior:
Field in json data containing algo_deploy_version.

Current behavior:
Algorithm version is not present in the json object.

Proposed fix:
The algorithm deploy version is not carried over from the NewJson format into our case object. Adding an algo_version attribute to the case object makes this information available to the case to OldJson export process.

HGVS Transcript validation

Currently hgvs objects are generated only on the basis of the parseability of the hgvs string. No variant validation is being done.

Expected result:
All hgvs variants should be consistent with a reference. This needs to be ensured for all variants present in the case object.

Current result:
Variants in the case object are not validated.

Proposed fix:
Transcript validation can be done via Mutalyzer or UTA (present in the biocommons/hgvs library). We will need to implement the necessary API bindings.

HGVS strings failing validation should be added to the errorfixer for manual resolution.

Since additional external API calls can reduce the reliability of the entire pipeline, some form of storage of externally validated hgvs strings should be implemented. This could also be done via the errorfixer. The generated dictionary of genomic_entry_id to hgvs strings can thereafter be used to quickly translate the raw data into correct hgvs variants.
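
The caching idea can be sketched as follows, assuming the cache is a JSON file mapping genomic_entry_id to a validated hgvs string. The `validate` callable is a hypothetical placeholder for whatever Mutalyzer or UTA binding ends up being used, and `validate_with_cache` is not an existing function.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def validate_with_cache(
    raw: Dict[str, str],
    validate: Callable[[str], bool],
    cache_path: Path,
) -> Tuple[Dict[str, str], List[str]]:
    """Validate only uncached entries; return (cache, ids that failed)."""
    cache: Dict[str, str] = {}
    if cache_path.exists():
        cache = json.loads(cache_path.read_text(encoding="utf-8"))

    failed: List[str] = []
    for entry_id, hgvs_string in raw.items():
        key = str(entry_id)
        if key in cache:
            continue  # already validated earlier, no external call needed
        if validate(hgvs_string):
            cache[key] = hgvs_string
        else:
            failed.append(key)  # candidates for manual resolution

    cache_path.write_text(json.dumps(cache, indent=2, sort_keys=True), encoding="utf-8")
    return cache, failed
```

The returned list of failed ids is what would be handed to the errorfixer, and a second run over the same data makes no external calls for entries already in the cache.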
