mutalyzer's Introduction

Mutalyzer

Package designed to check descriptions of sequence variants according to the Human Genome Variation Society (HGVS) guidelines.

Please see ReadTheDocs for the latest documentation.

mutalyzer's People

Contributors

jkvis, marksantcroos, mihailefter

mutalyzer's Issues

Usage of legacy locus selectors.

The name checker crashes on the following description.

NG_012337.1(SDHD):c.274G>T

It would be nice if whenever a legacy locus selector is used, we try to find it in the reference model and present the user with a selectable list of options. E.g., in this particular example, we could say something like

Transcript "SDHD" not found, but a gene was found by that name. Please choose from:
NG_012337.1(NM_003002.2):c.274G>T (succinate dehydrogenase complex, subunit D, integral membrane protein)

Likewise, we could allow for the HGNC id in the same way.

Note: I do not suggest to resolve the full legacy locus selectors (e.g., SDHD_v1). In this case I would discard everything after the _ and follow the same procedure described above.
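
A minimal sketch of the procedure described above, not the actual Mutalyzer implementation. The function names, the regular expression and the `transcripts_by_gene` lookup table are all hypothetical; in practice the lookup would come from the reference model.

```python
import re

# Hypothetical pattern for REFERENCE(SELECTOR):VARIANT descriptions.
_DESCRIPTION = re.compile(
    r"^(?P<reference>[^():]+)\((?P<selector>[^()]+)\):(?P<variant>.+)$"
)


def split_legacy_selector(description):
    """Return (reference, gene_name, variant), or None if there is no selector.

    Assumption: this is only called after the selector was NOT found as a
    transcript id, so everything after the first underscore (e.g. "_v1")
    can be discarded.
    """
    match = _DESCRIPTION.match(description)
    if match is None:
        return None
    gene_name = match.group("selector").split("_", 1)[0]
    return match.group("reference"), gene_name, match.group("variant")


def suggest_descriptions(description, transcripts_by_gene):
    """List candidate descriptions, one per transcript of the named gene."""
    parts = split_legacy_selector(description)
    if parts is None:
        return []
    reference, gene_name, variant = parts
    return [
        f"{reference}({transcript}):{variant}"
        for transcript in transcripts_by_gene.get(gene_name, [])
    ]


# Hypothetical lookup table built from the reference model.
options = suggest_descriptions(
    "NG_012337.1(SDHD_v1):c.274G>T", {"SDHD": ["NM_003002.2"]}
)
print(options)
```

The same lookup could be keyed on the HGNC id instead of the gene name.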

Multiple entry points for position converter.

In the position_convert endpoint, there seem to be multiple ways of providing input (i.e., via a description and via a combination of other input fields). It would be cleaner to split this into two separate endpoints.

Name checker bug.

This variant inserts two consecutive Cs. However, it is corrected to a duplication that does not contain two consecutive Cs.

Start and end positions swapped.

When the following request is done to the API:

curl -X GET "http://v3.mutalyzer.nl/api/reference_model/NM_002001.2" -H  "accept: application/json"

we get the following response:

"model": {
  "id": "NM_002001.2",
  "type": "record",
  "location": {
    "type": "range",
    "start": {
      "type": "point",
      "position": 1191
    },
    "end": {
      "type": "point",
      "position": 0
    }
  },
...

The start and end positions seem to be swapped.
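
A small sanity check along these lines could catch this in a test. This is only a sketch: it assumes the location layout shown in the response above, and `range_is_ordered` is a hypothetical helper, not part of the API.

```python
def range_is_ordered(location):
    """Return True when a 'range' location has start.position <= end.position."""
    return location["start"]["position"] <= location["end"]["position"]


# The location from the response above.
location = {
    "type": "range",
    "start": {"type": "point", "position": 1191},
    "end": {"type": "point", "position": 0},
}

if not range_is_ordered(location):
    print("start and end positions are swapped")
```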

Suggestion for performance.

Perhaps we should not send the entire reference model and reference sequence to the JavaScript client by default. This could be done on request, if it is absolutely needed.

Incorrect mapping around splice sites.

Variant

NM_002001.4:c.55_56insTTTT

is converted to:

NC_000001.11(NM_002001.4):c.55_56insTTTT

This is not correct because there is an intron between c.55 and c.56.

It is unclear how to map this variant; for now, it would be nice to raise an error.

Wrong title.

The title of description_extract says: "Convert a position".

Repeated sequences.

Add support for repeated sequences using the following format:

start _ end SEQ [ repeat_number ]

where SEQ is the repeat unit, which:

  1. occurs repeat_number_seq times between start and end locations in the reference sequence.
    1.1. repeat_number_seq >= 0
    1.2. (end - start + 1) % |SEQ| = 0
  2. occurs repeat_number times in the observed sequence, with repeat_number >= 0.
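
The reference-side conditions could be checked roughly as follows. This is a sketch under stated assumptions: positions are taken to be 1-based and inclusive, and `count_repeat_units` is a hypothetical helper name.

```python
def count_repeat_units(reference, start, end, unit):
    """Return how often `unit` tiles reference[start..end], or None if it does not.

    Assumption: `start` and `end` are 1-based, inclusive positions.
    """
    if not unit:
        raise ValueError("the repeat unit must not be empty")
    region = reference[start - 1:end]
    # Condition 1.2: the region length must be a multiple of the unit length.
    if len(region) % len(unit) != 0:
        return None
    count = len(region) // len(unit)
    # Condition 1: the region must consist of exact copies of the unit.
    if unit * count != region:
        return None
    return count


# Positions 3 to 11 of this toy sequence are CAGCAGCAG.
print(count_repeat_units("AACAGCAGCAGTT", 3, 11, "CAG"))
```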

Short sequence repeats.

When checking the following description:

LRG_24:g.5525C[4]

The non-informative message "Some response error occured." appears. I would expect either a message stating that the operation is not supported, or a normalised result.

cDNA to genomic converter: is the data up to date?

Hi everyone!

We are trying to use your API to convert cDNA positions to genomic positions. Overall, it works fine, but we had a problem with one conversion:
https://v3.mutalyzer.nl/positionconverter?referenceId=NC_000003.11&fromSelectorId=NM_014850.4&fromCoordinateSystem=c&position=2392&toSelectorId=&toCoordinateSystem=g&includeOverlapping=true
The problem is the version of the NM: NM_014850.4 doesn't work, but NM_014850.3 works fine.
According to the NCBI, the .4 version has been the accepted one since November 2018 (https://www.ncbi.nlm.nih.gov/nuccore/NM_014850). Is this time gap normal? If so, where can I find the list of accepted NMs for a given NC?

Thanks,

Quentin Riché-Piotaix, PhD
Bioinformatic Engineer,
CHU Poitiers

Missing warning messages.

The following descriptions are (rightfully) silently corrected. However, a warning explaining why they were corrected would be in order.

NG_012337.1:g.7125+1G>T
NG_012337.1:g.7125G>TA

For the first description I would expect a warning about using an intronic position without a proper exon boundary.
For the second description I would expect a warning about the type (operator) used.

Normalised description model missing.

I can see the description model of the (possibly wrong) input, but the description model after normalisation is missing. Arguably, we should only offer the normalised model, if any at all.

Wrong insertion of a range.

A description like NG_123.4:g.ins100_110 is shorthand for NG_123.4:g.insNG_123.4:100_110. However, this variant differs from this one.

I suspect that the selection of a transcript may have something to do with this.

RNA descriptions.

The following description (generated by Mutalyzer) is not accepted: NG_012337.1(NM_003002.2):r.([274g>u;278u>g])

bug converting cDNA to genomic position with Mutalyzer API v3

Hi team,

We are trying to use your API to convert cDNA positions to genomic positions: https://v3.mutalyzer.nl/positionconverter?referenceId=NM_000334.4&fromSelectorId&fromCoordinateSystem=c&position=9877&toSelectorId&toCoordinateSystem=g&includeOverlapping=true

The result obtained with this version of the API is not valid. In this example it should be NC_000017.10:g.62013765C>T, which is what we correctly obtain when using Mutalyzer v2: https://mutalyzer.nl/position-converter?assembly_name_or_alias=GRCh37&description=NM_000334.4%3Ac.9877G%3EA

Thanks!
Leslie Matalonga

--
Leslie Matalonga, PhD
Clinical Genomics Specialist
CNAG-CRG
Tel:934020828

Missing feedback.

The following description:

NC_000016.9:g.[15815278C>T;15815278del]

is normalised to:

NC_000016.9:g.15815278C>T

Part of the description is discarded, but no warning or errors are given.

Server error.

Some internal server error is triggered when checking the following variant description.

NC_000001.11:g.114750024_114750025ins[(123);114750025_114750040]

Wrong normalisation.

The following variant description :

NC_000016.9:g.[15815278C>A;15815279del]

is erroneously normalised to:

NC_000016.9:g.15815277_15815279dup

No exons in transcript model.

When a transcript is used in the name checker, error ESELECTORMODELNOEXONS may be raised. The name checker can and should continue in this case, by assuming that the whole transcript is one big exon.
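
A sketch of the suggested fallback. The representation of exons as `(start, end)` tuples and the function name are assumptions made for illustration; the real selector model is structured differently.

```python
def exons_or_whole_transcript(exons, transcript_start, transcript_end):
    """Return the exon ranges, or one exon spanning the whole transcript.

    Assumption: exons are (start, end) tuples in the same coordinates as
    the transcript boundaries.
    """
    if exons:
        return list(exons)
    # No exons annotated: treat the whole transcript as one big exon.
    return [(transcript_start, transcript_end)]


print(exons_or_whole_transcript([], 0, 1191))
```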

Halt on ambiguous descriptions.

In the following example, the description cannot be interpreted because of internal inconsistencies.

NG_012337.1:g.7125delGACinsT

According to the position, one nucleotide is deleted, but according to the (optional) sequence, three nucleotides are deleted. In case of such inconsistencies, I would suggest to halt instead of silently correcting the description.
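
The check could look something like this. It is a sketch only: `check_deleted_sequence` is a hypothetical name and the positions are assumed to be inclusive.

```python
def check_deleted_sequence(start, end, deleted_sequence):
    """Raise ValueError when the position range and the deleted sequence disagree.

    Assumption: `start` and `end` are inclusive positions; an empty
    `deleted_sequence` means the optional sequence was not given.
    """
    range_length = end - start + 1
    if deleted_sequence and range_length != len(deleted_sequence):
        raise ValueError(
            f"position range covers {range_length} nucleotide(s), "
            f"but {len(deleted_sequence)} are given in the description"
        )


# g.7125delGACinsT: one position, three deleted nucleotides.
try:
    check_deleted_sequence(7125, 7125, "GAC")
except ValueError as error:
    print(f"halt: {error}")
```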

Incorrect allele descriptions.

The following variant description:

LRG_303:g.6883_6884insTTTCGCCCC

is correctly normalised to:

LRG_303:g.6875_6883dup

However, when another variant is added upstream, e.g.:

LRG_303:g.[11del;6883_6884insTTTCGCCCC]

it is incorrectly normalised to:

LRG_303:g.[11del;6883_6884insCGCCCCTTT]

Perhaps this is a bug in the mutator module?

mutalyzer_name_checker error?

Hi,

I've installed mutalyzer 3.0.0a2.dev0 from source, but it cannot be run.

The error is shown below.

  File "/bioinfo/software/miniconda3/bin/mutalyzer_name_checker", line 33, in <module>
    sys.exit(load_entry_point('mutalyzer==3.0.0a2.dev0', 'console_scripts', 'mutalyzer_name_checker')())
  File "/bioinfo/software/miniconda3/bin/mutalyzer_name_checker", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/bioinfo/software/miniconda3/lib/python3.7/site-packages/importlib_metadata/__init__.py", line 167, in load
    module = import_module(match.group('module'))
  File "/bioinfo/software/miniconda3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/bioinfo/software/miniconda3/lib/python3.7/site-packages/mutalyzer-3.0.0a2.dev0-py3.7.egg/mutalyzer/cli.py", line 4, in <module>
    from mutalyzer.name_checker import name_check
  File "/bioinfo/software/miniconda3/lib/python3.7/site-packages/mutalyzer-3.0.0a2.dev0-py3.7.egg/mutalyzer/name_checker.py", line 1, in <module>
    from .description import Description
  File "/bioinfo/software/miniconda3/lib/python3.7/site-packages/mutalyzer-3.0.0a2.dev0-py3.7.egg/mutalyzer/description.py", line 10, in <module>
    from mutalyzer_mutator import mutate
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "/bioinfo/software/miniconda3/lib/python3.7/site-packages/mutalyzer_mutator-0.2.0-py3.7.egg/mutalyzer_mutator/__init__.py", line 17, in <module>
  File "/bioinfo/software/miniconda3/lib/python3.7/site-packages/mutalyzer_mutator-0.2.0-py3.7.egg/mutalyzer_mutator/__init__.py", line 7, in _get_metadata
  File "/bioinfo/software/miniconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 482, in get_distribution
    raise TypeError("Expected string, Requirement, or Distribution", dist)
TypeError: ('Expected string, Requirement, or Distribution', None)

Any tips to fix this error?

Thanks,
Junfeng

Missing default return values.

The following pattern is found a number of times (e.g., 1, 2, 3, 4) in this project.

if something:
    return a
elif something_else:
    return b

This however leads to an inconsistency in return type when neither something nor something_else is true. A default return value is preferred here.

Also see the recommendation "Either all return statements in a function should return an expression, or none of them should." (PEP 8).
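
For illustration, the same pattern with an explicit default. The function and its return values are made up for this example and are not taken from the project.

```python
def classify(value):
    """Return a label for `value`; every path returns a string."""
    if value > 0:
        return "positive"
    elif value < 0:
        return "negative"
    # Explicit default, so the function never falls through to an implicit None.
    return "zero"


print(classify(0))
```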

Incorrect example.

The example on the Name Checker page results in an error. It would be better to only show working examples.

Negative strand shift

For variants on the negative strand the 3' rule is not applied.

Example:

  • NG_008835.1(NM_001168390.2):c.*3186del should be normalized to NG_008835.1(NM_001168390.2):c.*3188del. The genomic description should be NG_008835.1:g.320804del

Duplications normalization problem

It seems that normalizing duplications on the reverse strand is not performed correctly:

  • NC_000001.11(NM_032833.5):c.65_66insGGCTTCCGGTTCTGGCC is wrongly normalized to NC_000001.11(NM_032833.5):c.66_82dup. On the transcript reference it seems fine: NM_032833.5:c.65_66insGGCTTCCGGTTCTGGCC is normalized to NM_032833.5:c.49_65dup.
  • NC_000009.11:g.21974758_21974759insC should be normalized to NC_000009.11(NM_000077.5):c.68dup and not to NC_000009.11(NM_000077.5):c.69dup. Next, when NC_000009.11(NM_000077.5):c.69dup is used as input it is wrongly normalized to NC_000009.11(NM_000077.4):c.70dup. It seems like there is a shifting problem.
  • NG_012337.1(NM_012459.2):c.5_6dup is wrongly normalized to NG_012337.1(NM_012459.2):c.7_8dup.
