kalininalab / glyles

A tool to convert IUPAC representations of glycans into SMILES strings.

Home Page: https://glyles.readthedocs.io

License: MIT License

Python 99.24% ANTLR 0.76%
glycans iupac smiles antlr4 grammar-parser

GlyLES

A tool to convert IUPAC representations of glycans into SMILES strings. This repository is still under development, so please report any errors or issues you encounter. The code is available on GitHub, and the documentation is hosted on ReadTheDocs.

Specification and (current) Limitations

The exact specification we refer to when talking about "IUPAC representations of glycans" or "IUPAC-condensed" is given in the "Notes" section of this website. Because this package is still under development, not all of the specification is implemented yet; in particular, not every side chain that can be attached to a monomer is supported. The structure of a glycan can be represented as a tree of monosaccharides with a maximal branching factor of 4, i.e., each monomer in the glycan has at most 4 children.
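
To make the tree view concrete, here is a minimal sketch of such a structure. The `Monomer` class, its methods, and `MAX_CHILDREN` are hypothetical names chosen for illustration and are not part of the GlyLES API; the sketch also assumes that the rightmost monomer of the IUPAC string acts as the root.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MAX_CHILDREN = 4  # maximal branching factor stated in the text


@dataclass
class Monomer:
    """Hypothetical tree node for illustration; not a GlyLES class."""

    name: str
    # Each child is stored together with the linkage that attaches it, e.g. "a1-2".
    children: List[Tuple[str, "Monomer"]] = field(default_factory=list)

    def add_child(self, linkage: str, child: "Monomer") -> "Monomer":
        """Attach a child, refusing to exceed the branching limit."""
        if len(self.children) >= MAX_CHILDREN:
            raise ValueError(f"{self.name} already has {MAX_CHILDREN} children")
        self.children.append((linkage, child))
        return child

    def size(self) -> int:
        """Count the monomers in the subtree rooted at this node."""
        return 1 + sum(child.size() for _, child in self.children)


# "Man(a1-2)Man": assumed here to have the right-hand Man as root
# and the left-hand Man as its only child.
root = Monomer("Man")
root.add_child("a1-2", Monomer("Man"))
print(root.size())
```

Storing the linkage next to each child keeps the node type small while still recording how every branch is attached.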

Installation

So far, this package can only be installed from the Python Package Index (PyPI), so installation with pip is straightforward. Just type

pip install glyles

and you're ready to use it as described below. Use

pip install --upgrade glyles

to upgrade the glyles package to the most recent version.

Basic Usage

As a Python Package

Convert an IUPAC string into its SMILES representation using the convert function:

from glyles import convert

convert(glycan="Man(a1-2)Man", output_file="./test.txt")

You can also use the convert_generator function to get a generator over all SMILES strings:

from glyles import convert_generator

for smiles in convert_generator(glycan_list=["Man(a1-2)Man a", "Man(a1-2)Man b"]):
    print(smiles)

For more examples of how to use this package, please see the notebooks in the examples folder and check out the documentation on ReadTheDocs.

On the Command Line

As of version 0.5.9, GlyLES has a command-line interface, which is installed automatically when GlyLES is installed through pip. The CLI accepts one or more IUPAC inputs as individual arguments. Because IUPAC-condensed notation contains parentheses, which shells interpret specially, the IUPAC strings must be quoted.

glyles -i "Man(a1-2)Man" -o test_output.txt
glyles -i "Man(a1-2)Man" "Fuc(a1-6)Glc" -o test_output.txt

File input is also possible.

glyles -i input_file.txt -o test_output.txt

Providing multiple files and IUPAC-condensed names is also supported.

glyles -i input_file1.txt "Man(a1-2)Man" input_file2.txt input_file13.txt "Fuc(a1-6)Glc" -o test_output.txt
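
The same mix of files and IUPAC strings can be resolved in Python before conversion. The helper below is only a sketch: `collect_glycans` is a hypothetical name, and it assumes that an argument naming an existing file holds one glycan per line while anything else is an IUPAC string. It is not taken from the GlyLES source and may differ from how the CLI decides.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def collect_glycans(inputs: Iterable[str]) -> List[str]:
    """Expand a mixed list of file paths and IUPAC strings into IUPAC strings.

    Assumption of this sketch: existing files contain one glycan per line;
    every other argument is treated as an IUPAC string.
    """
    glycans: List[str] = []
    for item in inputs:
        path = Path(item)
        if path.is_file():
            # Read the file and skip empty lines.
            lines = path.read_text().splitlines()
            glycans.extend(line.strip() for line in lines if line.strip())
        else:
            glycans.append(item)
    return glycans


# Demonstration with a temporary input file.
with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "input_file.txt"
    file_path.write_text("Man(a1-2)Man\nFuc(a1-6)Glc\n")
    print(collect_glycans([str(file_path), "Gal(b1-4)Glc"]))
    # prints ['Man(a1-2)Man', 'Fuc(a1-6)Glc', 'Gal(b1-4)Glc']
```

The resulting list can then be passed on, for example as the glycan_list argument of convert_generator.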

Notation of glycans

There are multiple notations for glycans in IUPAC. According to the SNFG specification, Man(a1-4)Gal, Mana1-4Gal, and Mana4Gal all describe the same disaccharide. This package covers this as well: all three notations are parsed into the same tree of monosaccharides and yield the same SMILES string.

This is described in more detail in a section on ReadTheDocs.
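
As a rough illustration of why the three spellings are equivalent, the sketch below rewrites each linkage into the bracketed form. It is a toy regular expression, not the ANTLR grammar GlyLES uses; `normalize` is a hypothetical helper, and the short form is expanded under the assumption that the anomeric carbon is 1, which fits the aldoses in this example but not every monosaccharide.

```python
import re

# Optional "(", the anomer (a or b), one or two carbon positions, optional ")".
LINKAGE = re.compile(r"\(?([ab])(\d)(?:-(\d))?\)?")


def normalize(iupac: str) -> str:
    """Rewrite every linkage in the bracketed form, e.g. "(a1-4)"."""

    def expand(match: "re.Match[str]") -> str:
        anomer, first, second = match.groups()
        if second is None:
            # Short form such as "a4": only the target position is written.
            # Assumption: the linkage starts at carbon 1.
            first, second = "1", first
        return f"({anomer}{first}-{second})"

    return LINKAGE.sub(expand, iupac)


spellings = ["Man(a1-4)Gal", "Mana1-4Gal", "Mana4Gal"]
print({normalize(s) for s in spellings})
# prints {'Man(a1-4)Gal'}
```

The pattern only fires when an a or b is directly followed by a digit, so the letters inside plain names such as Man or Gal are left alone; monomer names in which a digit follows an a or b would need a real parser.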

Poetry

To develop this package, we use the Poetry package manager (see here for detailed instructions). It offers roughly the same functionality as conda but has better support for package management and distinguishes the dependencies needed to use the package from those needed only to develop it. To enable others to work on this repository, we also publish the exact specification of our Poetry environment.

Citation

If you use GlyLES in your work, please cite

@article{joeres2023glyles,
  title={GlyLES: Grammar-based Parsing of Glycans from IUPAC-condensed to SMILES},
  author={Joeres, Roman and Bojar, Daniel and Kalinina, Olga V},
  journal={Journal of Cheminformatics},
  volume={15},
  number={1},
  pages={1--11},
  year={2023},
  publisher={BioMed Central}
}

glyles's Issues

Support Wildcards

Extend the grammar and code to accommodate for wildcards. Not in the get_smiles() functionality, but in the parsing of IUPACs. Also, add a method to check if glycans contain wildcards.
