
lifs-tools / goslin

Goslin is the Grammar on succinct lipid nomenclature.

Home Page: https://lifs-tools.org/goslin

License: Other

Languages: ANTLR 100.00%
Topics: parsing, nomenclature, grammar, metabolomics, mass-spectrometry, lipid, lipidomics

goslin's Introduction

goslin

Goslin defines multiple grammars, compatible with ANTLRv4, for different sources of shorthand lipid nomenclature. These grammars can be used to generate parsers. The parsers give immediate feedback on whether a lipid shorthand notation string complies with a particular grammar.
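Goslin itself generates full parsers from its ANTLRv4 grammars. The sketch below is not one of those parsers. It only illustrates the yes/no compliance feedback, using a hand-written regular expression. The pattern `TOY_SHORTHAND` and the function `is_compliant` are hypothetical and cover a tiny, assumed subset of the shorthand: a class abbreviation, a space, and "carbons:double-bonds" chains.

```python
import re

# Hypothetical toy pattern, NOT a Goslin grammar: a class abbreviation
# (e.g. PC), a space, then one or more "carbons:double-bonds" chains
# separated by "_" (sn-position unknown) or "/" (sn-position known).
TOY_SHORTHAND = re.compile(r"[A-Z][A-Za-z]* \d+:\d+(?:[_/]\d+:\d+)*")


def is_compliant(name: str) -> bool:
    """Return True only if the whole string matches the toy pattern."""
    return TOY_SHORTHAND.fullmatch(name) is not None


for candidate in ["PC 18:1_18:1", "PC 34:2", "PC(18:1/18:1)", "Coenzyme Q10"]:
    print(candidate, "->", is_compliant(candidate))
```

A real grammar also handles headgroup variants, ether prefixes, double-bond positions and functional groups. A single regular expression quickly stops scaling to these, which is why a grammar-based parser is used.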

Overview of Goslin and Tutorials

Goslin 2.0 supports the updated lipid shorthand nomenclature with new structural levels.

Citing Goslin

If you use Goslin or any of the specific implementations in your work, we kindly ask you to cite the original publication:

If you are using any of the new features of Goslin 2.0, please cite the following, updated Goslin 2.0 publication:

References

Related Projects

Test data

  1. testfiles/lipidmaps-names-Feb-10-2020.tsv - generated from the LipidMAPS LMSDB export of Feb. 10th, 2020. All entries without an abbreviation were filtered out.
  2. testfiles/swisslipids-names-Feb-10-2020.tsv - generated from the Swiss Lipids (lipids table) export of Feb. 10th, 2020.

Short samples of lipid names used for testing the implementations are available in the testfiles directory.
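A sketch of loading names from one of these tab-separated test files and measuring how many pass a validator is shown below. The layout of the files is an assumption: the name is assumed to be in column 0, below a header row. Check this against the actual files. `read_names` and `pass_rate` are hypothetical helpers, and the validator can be any callable that returns a bool.

```python
import csv
import io
from typing import Callable, Iterable, TextIO


def read_names(handle: TextIO, column: int = 0, skip_header: bool = True) -> list[str]:
    """Read non-empty lipid names from one column of a TSV stream.

    The column index and the header row are assumptions about the file layout.
    """
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)  # discard the header row, if any
    return [row[column] for row in reader if len(row) > column and row[column]]


def pass_rate(names: Iterable[str], validator: Callable[[str], bool]) -> float:
    """Fraction of names accepted by the validator (0.0 for no names)."""
    names = list(names)
    return sum(validator(n) for n in names) / len(names) if names else 0.0


# Usage with an in-memory sample; replace with open(path, newline="") for a real file.
sample = io.StringIO("Name\tOther\nPC 34:2\tx\nFoo\ty\n")
names = read_names(sample)
print(names, pass_rate(names, lambda n: n.startswith("PC")))
```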

License

The Goslin grammars are licensed under the terms of the MIT license (see LICENSE).

goslin's People

Contributors

dominik-kopczynski, nilshoffmann, renovate-bot

Forkers

gregfa

goslin's Issues

Ether lipid "O-" not recognized

Describe the bug
Ether lipids are misclassified as diacyl species.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the Goslin web application, version 1.1.4
  2. Enter "PC O-18:1_18:1"
  3. Click validate
  4. See the result in the screenshot "not working": the sn-1 ether linkage is not recognized and is classified as "Ester". Nevertheless, the lipid is named "PC O-18:1-18:1".

Expected behavior
The same works when using "PC P-18:1/18:1"; see the screenshot "working".
Otherwise, if the lipid name is not correct, there should be an error. However, the "O-" is passed through without an error.
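The expected behavior can be summarized as a minimal sketch. The prefix on the first chain decides the linkage: "O-" marks an alkyl ether, "P-" marks an alkenyl ether (plasmalogen), and no prefix means an ester. `sn1_linkage` is a hypothetical helper. It inspects only the first chain of a simple shorthand string and is not how Goslin classifies lipids internally.

```python
import re

# Hypothetical helper: looks only at the first chain after the class name,
# e.g. "O-18:1" in "PC O-18:1_18:1".
FIRST_CHAIN = re.compile(r"^\S+ ([OP]-)?\d+:\d+")


def sn1_linkage(name: str) -> str:
    match = FIRST_CHAIN.match(name)
    if match is None:
        raise ValueError(f"unrecognised shorthand: {name!r}")
    prefix = match.group(1)
    if prefix == "O-":
        return "ether (alkyl)"
    if prefix == "P-":
        return "ether (alkenyl, plasmalogen)"
    return "ester"


for name in ["PC O-18:1_18:1", "PC P-18:1/18:1", "PC 18:1_18:1"]:
    print(name, "->", sn1_linkage(name))
```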

Screenshots
not working:
[screenshot: not_working]

working:
[screenshot: working]

Additional context
Thank you for your work. We plan to use Goslin starting with version 2.0.

Problems parsing complex HMDB names

Some HMDB lipid names cannot be parsed.
I have implemented some patches in @gbaquer/LipidParser and would be happy to incorporate them into goslin and test them. Let me know.

Issue 1: Functional groups

  • PG(20:4(8Z,11Z,14Z,17Z)-2OH(5S,6R)/22:6(4Z,7Z,10Z,13Z,16Z,19Z))
  • PC(20:5(7Z,9Z,11E,13E,17Z)-3OH(5,6,15)/14:0)
  • PS(18:2(10E,12Z)+=O(9)/18:1(9Z))

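Solution:
Remove them with a regular expression (written as a raw string so that the backslash escapes reach the regex engine unchanged):
import re
x = re.sub(r'(-\w+\(\d+[ A-Za-z,\d]*\))|\+=\w\(\d+\)', '', x)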
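The workaround above can be applied to the three example names from this issue as follows. `strip_functional_groups` is a hypothetical wrapper around the pattern from the issue. Note that this approach discards the hydroxy/oxo annotations rather than parsing them.

```python
import re

# Pattern from the issue: "-<group>(<positions>)" suffixes such as "-2OH(5S,6R)",
# or "+=<atom>(<position>)" such as "+=O(9)".
FUNCTIONAL_GROUPS = re.compile(r"(-\w+\(\d+[ A-Za-z,\d]*\))|\+=\w\(\d+\)")


def strip_functional_groups(name: str) -> str:
    """Remove functional-group annotations from an HMDB-style lipid name."""
    return FUNCTIONAL_GROUPS.sub("", name)


for hmdb_name in [
    "PG(20:4(8Z,11Z,14Z,17Z)-2OH(5S,6R)/22:6(4Z,7Z,10Z,13Z,16Z,19Z))",
    "PC(20:5(7Z,9Z,11E,13E,17Z)-3OH(5,6,15)/14:0)",
    "PS(18:2(10E,12Z)+=O(9)/18:1(9Z))",
]:
    print(strip_functional_groups(hmdb_name))
```

This prints `PG(20:4(8Z,11Z,14Z,17Z)/22:6(4Z,7Z,10Z,13Z,16Z,19Z))`, `PC(20:5(7Z,9Z,11E,13E,17Z)/14:0)` and `PS(18:2(10E,12Z)/18:1(9Z))`.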

Issue 2: Specific fatty acids (FAs)

  • PA(P-16:0/LTE4)
  • PE(PGJ2/20:5(5Z,8Z,11Z,14Z,17Z))
  • PE(22:0/5-iso PGF2VI)

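Solution:
Replace them with the corresponding shorthand nomenclature, for example PGJ2 -> 20:5:
for name, repname in zip(fa.name, fa.repname):  # fa: table of names and their replacements
    x = x.replace(name, repname)  # literal replacement; names are not treated as regexes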
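A self-contained version of this replacement step is sketched below, using a plain dictionary instead of the `fa` table. Only the PGJ2 -> 20:5 pair comes from this issue. `FA_REPLACEMENTS` and `replace_named_fatty_acids` are hypothetical, and further entries would need a curated mapping. Longer names are tried first, so a short name cannot overwrite part of a longer one.

```python
# Hypothetical mapping; only "PGJ2" -> "20:5" is taken from the issue.
FA_REPLACEMENTS: dict[str, str] = {"PGJ2": "20:5"}


def replace_named_fatty_acids(name: str, table: dict[str, str] = FA_REPLACEMENTS) -> str:
    """Replace named fatty acids by their shorthand, longest names first."""
    for fa_name in sorted(table, key=len, reverse=True):
        name = name.replace(fa_name, table[fa_name])
    return name


print(replace_named_fatty_acids("PE(PGJ2/20:5(5Z,8Z,11Z,14Z,17Z))"))
```

Using `str.replace` instead of `re.sub` avoids surprises when a name contains regex metacharacters, such as parentheses.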

Parsing Ubiquinone or Coenzyme

Hi,

I am currently using rgoslin release version 1.1.2

I was trying to use it to parse a ubiquinone, namely coenzyme Q10.

Unfortunately, I do not know which name I should use as input. I tried the following:

  • CoQ 10
  • Coenzyme Q10
  • Ubiquinone-10
library("rgoslin")

isValidLipidName("CoQ 10")
#> Warning in rcpp_is_valid_lipid_name(lipidName): Parsing of lipid name 'CoQ 10'
#> caused an exception: Lipid not found
#> [1] FALSE
isValidLipidName("Coenzyme Q10")
#> Warning in rcpp_is_valid_lipid_name(lipidName): Parsing of lipid name 'Coenzyme
#> Q10' caused an exception: Lipid not found
#> [1] FALSE
isValidLipidName("Ubiquinone-10")
#> Warning in rcpp_is_valid_lipid_name(lipidName): Parsing of lipid name
#> 'Ubiquinone-10' caused an exception: Lipid not found
#> [1] FALSE
Created on 2021-09-02 by the reprex package (v2.0.1)

Kindly advise.

Thank you.

Update LIPID MAPS grammar for updated shorthand notation

Is your feature request related to a problem? Please describe.
LIPID MAPS has recently updated their shorthand nomenclature / abbreviation following this publication:
https://www.jlr.org/content/early/2020/10/09/jlr.S120001025.full.pdf

However, in some cases the new rules (removal of unnecessary parentheses) are applied only at the abbreviation levels and not in the common name:
https://www.lipidmaps.org/data/LMSDRecord.php?LMID=LMGL02010003

Describe the solution you'd like
Goslin should be able to parse the updated names and should also respect the updated hierarchy introduced in the paper linked to above.

Describe alternatives you've considered
None

Additional context
None
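The parenthesis change can be illustrated with a simplified sketch. The updated shorthand writes, for example, `PC 16:0/18:1(9Z)` where the older style used `PC(16:0/18:1(9Z))`. `drop_outer_parentheses` is a hypothetical helper that only rewrites the parentheses wrapping the whole chain list. It does not implement the new structural hierarchy.

```python
import re

# Class name (letters, digits, "_" or "-"), then parentheses around the entire
# remainder of the string. Nested parentheses such as "(9Z)" are kept.
OLD_STYLE = re.compile(r"^([A-Za-z][\w-]*)\((.*)\)$")


def drop_outer_parentheses(name: str) -> str:
    """Convert 'PC(16:0/18:1(9Z))' to 'PC 16:0/18:1(9Z)'; leave other names unchanged."""
    return OLD_STYLE.sub(r"\1 \2", name)


for name in ["PC(16:0/18:1(9Z))", "TG(16:0/18:1/18:2)", "PC 16:0/18:1(9Z)"]:
    print(name, "->", drop_outer_parentheses(name))
```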

Add Lipidclass nomenclature from ALEX123

For the LUX score template SMILES, the naming conventions and the list of lipid classes from Pauling, Hermansson, et al., PLoS One 2017 (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0188394) were used.
Some lipid class names are not used in LIPID MAPS or SwissLipids. It would be great to have them as additional aliases or as completely new classes, such as:

SQDG ~ Sulfoquinovosyl diacylglycerol (https://www.lipidmaps.org/data/LMSDRecord.php?LMID=LMGL05010004)
aPG = n-acylated PG (no entry in DBs, fatty acid at the head group)
aPE = NAPE ~ additional name
DMPE = PE-NMe2 ~ additional name
MMPE = PE-NMe ~ additional name
PEt = PEth = PEtOH ~ Phosphatidylethanol without the amine part
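One way such aliases could be resolved is a lookup on the class part of a name. This is a minimal sketch: `CLASS_ALIASES` and `canonical_class` are hypothetical, and the mapping direction is an assumption. Here the ALEX123 names map to the existing names listed above, and "PEt" is an arbitrary choice of canonical spelling for phosphatidylethanol. SQDG and aPG would be new classes rather than aliases, so they are not included.

```python
# Hypothetical alias table derived from the list above; direction is an assumption.
CLASS_ALIASES = {
    "aPE": "NAPE",
    "DMPE": "PE-NMe2",
    "MMPE": "PE-NMe",
    "PEth": "PEt",
    "PEtOH": "PEt",
}


def canonical_class(name: str) -> str:
    """Replace an aliased class prefix (text before the first space)."""
    head, sep, rest = name.partition(" ")
    return CLASS_ALIASES.get(head, head) + sep + rest


for name in ["DMPE 34:1", "PEtOH 16:0_18:1", "PC 34:1"]:
    print(name, "->", canonical_class(name))
```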

Head group formula for ceramides is missing two oxygens

The sum formula calculation for ceramides is incorrect. Ceramides have two additional oxygens, which are not included in Goslin's head group definition:

SP | Ceramides [SP02] | [2] | 2 | H | 1.0078 | [Cer, Ceramide]

This corresponds to the following line of the table:

Cer,SP,Ceramides [SP02],2,2,H,Ceramide,,,,,

To Reproduce

  1. Go to https://apps.lifs.isas.de/goslin
  2. Enter Cer 30:1 and Cer 18:1/16:0 into the form
  3. Click on submit
  4. Check the sum formulas and corresponding m/z values.
  5. For Cer 30:1, the reported sum formula is C30H59NO and the neutral exact mass is 449.4597.

Expected behavior
The sum formula should contain two more oxygens, i.e. the head group formula should read HO2. The full sum formula for Cer 30:1 should therefore be C30H59NO3.
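The effect of the two missing oxygens on the neutral exact mass can be checked with a small monoisotopic mass calculation. This is a sketch, not Goslin's formula code. `neutral_mass` is a hypothetical helper that only handles simple formulas made of C, H, N and O, without charges or brackets.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def neutral_mass(formula: str) -> float:
    """Sum monoisotopic masses for a simple formula such as 'C30H59NO3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


print(f"{neutral_mass('C30H59NO'):.4f}")   # formula currently reported for Cer 30:1
print(f"{neutral_mass('C30H59NO3'):.4f}")  # with the two missing oxygens added
```

The first value reproduces the 449.4597 from the issue. The corrected formula gives 481.4495, a difference of two oxygen masses (about 31.9898 u).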

Add ACer to Goslin grammars

Is your feature request related to a problem? Please describe.
ACer (Acyl Ceramides) are currently not supported in the Goslin Grammars.

Describe the solution you'd like
ACers should be parseable and mapped similarly to other Ceramides.

ACers are apparently not currently available in LIPID MAPS.
SwissLipids has some examples: https://www.swisslipids.org/#/search/Acylceramide

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

github-actions
.github/workflows/check-grammars.yml
  • actions/checkout v2
  • actions/setup-java v2
  • gradle/gradle-build-action v1.5.0
gradle
build.gradle
  • org.antlr:antlr4 4.9.2

  • Check this box to trigger a request for Renovate to run again on this repository
