
beam's Introduction

Beam

DOI

Beam - to express by means of a radiant smile

Beam is a free toolkit dedicated to parsing and generating Simplified Molecular-Input Line-Entry System (SMILES™) line notations. The primary focus of the library is to handle the SMILES™ syntax elegantly and as fast as possible.

Beaming

Note: Beam is still in development and some APIs will likely change until a release is made.

One of the primary types in Beam is the Graph, which provides convenience methods for reading SMILES™ notation directly

Graph g = Graph.fromSmiles("CCO");

and for writing it back to SMILES™ notation.

String smi = g.toSmiles();

Beam provides excellent round-tripping, preserving exactly how the input was specified. Disregarding inputs with redundant brackets and erroneous/repeated ring numbers, the output will generally be identical to the input.

// bond labels
Graph.fromSmiles("C1=CC=CC=C1").toSmiles();    // kekule      (implicit single bonds)
Graph.fromSmiles("C-1=C-C=C-C=C1").toSmiles(); // kekule      (explicit single bonds)
Graph.fromSmiles("c1ccccc1").toSmiles();       // delocalised (implicit aromatic bonds)
Graph.fromSmiles("c:1:c:c:c:c:c1").toSmiles(); // delocalised (explicit aromatic bonds)

// bracket atoms stay as bracket atoms
Graph.fromSmiles("[CH]1=[CH][CH]=[CH][CH]=[CH]1").toSmiles();
Graph.fromSmiles("[CH]1=[CH]C=C[CH]=[CH]1").toSmiles();       // mix bracket and subset atoms

Although preserving the representation was one of the design goals for Beam, it is common to normalise output SMILES™.

Collapse a graph with labelled hydrogens [CH3][CH2][OH] to one with implicit hydrogens CCO.

Graph g = Graph.fromSmiles("[CH3][CH2][OH]");
Graph h = Functions.collapse(g);
h.toSmiles().equals("CCO");

Expand a graph where the hydrogens are implicit CCO to one with labelled hydrogens [CH3][CH2][OH].

Graph g = Graph.fromSmiles("CCO");
Graph h = Functions.expand(g);
h.toSmiles().equals("[CH3][CH2][OH]");
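Whether a bracket atom such as [CH3] can be written as a bare C comes down to the OpenSMILES implicit-hydrogen rule for the organic subset: the hydrogen count is the smallest default valence that is at least the atom's bond-order sum, minus that sum. The sketch below illustrates that rule only; the class and method names are hypothetical (not part of Beam's API), and it assumes aliphatic atoms with no charge, isotope, atom class or stereo label. Beam's own Functions.collapse and Functions.expand handle more cases.

```java
import java.util.Map;

// Hypothetical helper illustrating the OpenSMILES organic-subset hydrogen rule.
// Assumes aliphatic, uncharged atoms without isotope, class or stereo labels.
final class ImplicitHydrogens {

    // Default valences of the aliphatic organic subset, in ascending order.
    private static final Map<String, int[]> VALENCES = Map.of(
            "B", new int[]{3},
            "C", new int[]{4},
            "N", new int[]{3, 5},
            "O", new int[]{2},
            "P", new int[]{3, 5},
            "S", new int[]{2, 4, 6},
            "F", new int[]{1},
            "Cl", new int[]{1},
            "Br", new int[]{1},
            "I", new int[]{1});

    // Implicit hydrogen count of a bare (non-bracket) atom with the given bond-order sum.
    static int implicitHydrogens(String element, int bondOrderSum) {
        int[] valences = VALENCES.get(element);
        if (valences == null)
            throw new IllegalArgumentException(element + " is not in the organic subset");
        for (int v : valences)
            if (v >= bondOrderSum)
                return v - bondOrderSum;
        return 0; // bond-order sum exceeds every default valence
    }

    // A bracket atom [XHn] can be collapsed to bare X when n equals the implicit count.
    static boolean canCollapse(String element, int hCount, int bondOrderSum) {
        return VALENCES.containsKey(element)
                && implicitHydrogens(element, bondOrderSum) == hCount;
    }

    public static void main(String[] args) {
        // [CH3][CH2][OH]: the bond-order sums are 1, 2 and 1
        System.out.println(canCollapse("C", 3, 1)); // true
        System.out.println(canCollapse("C", 2, 2)); // true
        System.out.println(canCollapse("O", 1, 1)); // true
        // [CH2] with a single bond (a radical) must keep its brackets
        System.out.println(canCollapse("C", 2, 1)); // false
    }
}
```

Expanding applies the same rule the other way round: the implicit count of each bare atom becomes the explicit H label of its bracket form.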

Stereo specification is preserved through rearrangements. The example below randomly generates arbitrary SMILES™ while preserving the correct stereo-configuration.

Graph g  = Graph.fromSmiles("CCC[C@@](C)(O)[C@H](C)N");
StringBuilder sb = new StringBuilder(g.toSmiles());
for (int i = 0; i < 25; i++)
    sb.append('.').append(Functions.randomise(g).toSmiles());
System.out.println(sb);
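Keeping @/@@ correct under rearrangement comes down to permutation parity: a tetrahedral tag is defined relative to the order in which the atom's neighbours are written, so if the new order is an odd permutation of the old one the tag must flip, and if it is even the tag stays. A minimal sketch of that rule follows; the class and method names are hypothetical, it is not Beam's internal code, and it assumes all four neighbours (counting an implicit hydrogen) are given as distinct ids.

```java
// Hypothetical sketch (not Beam's internals): how a tetrahedral @/@@ tag must change
// when the order in which an atom's four neighbours are written changes.
final class TetrahedralParity {

    // True if 'to' is an odd permutation of 'from' (both list the same neighbour ids).
    static boolean isOddPermutation(int[] from, int[] to) {
        if (from.length != to.length)
            throw new IllegalArgumentException("neighbour lists differ in length");
        int n = to.length;
        int[] pos = new int[n];          // pos[i] = index of to[i] within 'from'
        boolean[] used = new boolean[n];
        for (int i = 0; i < n; i++) {
            int idx = indexOf(from, to[i]);
            if (idx < 0 || used[idx])
                throw new IllegalArgumentException("'to' is not a permutation of 'from'");
            used[idx] = true;
            pos[i] = idx;
        }
        // The parity of a permutation equals the parity of its number of inversions.
        int inversions = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (pos[i] > pos[j]) inversions++;
        return inversions % 2 == 1;
    }

    private static int indexOf(int[] xs, int x) {
        for (int i = 0; i < xs.length; i++)
            if (xs[i] == x) return i;
        return -1;
    }

    // Returns the tag for the new neighbour order: odd permutations swap @ and @@.
    static String reorder(String tag, int[] from, int[] to) {
        if (!tag.equals("@") && !tag.equals("@@"))
            throw new IllegalArgumentException("expected @ or @@");
        if (!isOddPermutation(from, to)) return tag;
        return tag.equals("@") ? "@@" : "@";
    }

    public static void main(String[] args) {
        int[] written = {0, 1, 2, 3};  // neighbour ids in the order they were written
        System.out.println(reorder("@", written, new int[]{1, 0, 2, 3})); // @@ (one swap)
        System.out.println(reorder("@", written, new int[]{1, 2, 0, 3})); // @  (3-cycle, even)
    }
}
```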

Bond-based double-bond configuration is the norm in SMILES™ but can be problematic: a single symbol may specify two adjacent configurations. A proposed extension was to use atom-based double-bond configuration.

Beam will input, output and convert atom and bond-based double-bond stereo specification.

Graph  g   = Graph.fromSmiles("F/C=C/F");
Graph  h   = Functions.atomBasedDBStereo(g);
String smi = h.toSmiles();
smi.equals("F[C@H]=[C@@H]F");

and back again:

Graph  g   = Graph.fromSmiles("F[C@H]=[C@@H]F");
Graph  h   = Functions.bondBasedDBStereo(g);
String smi = h.toSmiles();
smi.equals("F/C=C/F");
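The ambiguity of bond-based labels shows up with a simple rule: in X/C=C/Y or X\C=C\Y (matching symbols before the first and after the second double-bond atom) the substituents are trans, and with mismatched symbols they are cis. In F/C=C/C=C/C the middle / is the trailing label of the first double bond and the leading label of the second, so editing that one symbol changes both configurations. The hypothetical helper below (not Beam's API) assumes exactly that layout: one label written immediately before the left atom and one immediately after the right atom, with no branches.

```java
// Hypothetical helper: classifies a bond-based double-bond configuration.
// Assumes the layout X{a}C=C{b}Y, where label 'a' is written immediately
// before the left double-bond atom and label 'b' immediately after the right one.
final class DirectionalLabels {

    static String configuration(char a, char b) {
        if (!isDirectional(a) || !isDirectional(b))
            throw new IllegalArgumentException("labels must be '/' or '\\'");
        // Same symbol on both sides: substituents on opposite sides (trans).
        return a == b ? "trans" : "cis";
    }

    static boolean isDirectional(char c) {
        return c == '/' || c == '\\';
    }

    public static void main(String[] args) {
        System.out.println(configuration('/', '/'));  // F/C=C/F: trans
        System.out.println(configuration('/', '\\')); // F/C=C\F: cis
        // F/C=C/C=C/C: the middle '/' is label b of the first bond and label a of the second
        System.out.println(configuration('/', '/') + " " + configuration('/', '/'));   // trans trans
        // F/C=C\C=C/C: flipping that single symbol changes both double bonds
        System.out.println(configuration('/', '\\') + " " + configuration('\\', '/')); // cis cis
    }
}
```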

Convert a graph with delocalised bonds to kekulé representation.

Graph  furan        = Graph.fromSmiles("o1cccc1");
Graph  furan_kekule = furan.kekule();
String smi          = furan_kekule.toSmiles();
smi.equals("O1C=CC=C1");
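Kekulisation can be framed as a perfect matching: every atom that needs a double bond must be paired with exactly one neighbour that also needs one. In furan the oxygen contributes a lone pair, so only the four carbons are matched, giving O1C=CC=C1. The sketch below assumes the set of atoms needing a double bond is already known (deciding that is the harder part) and uses plain backtracking, which is exponential in the worst case; the names are hypothetical and this is not Beam's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: assign double bonds by finding a perfect matching over the
// atoms that need one. Plain backtracking, suitable for small ring systems only.
final class KekuleSketch {

    // adj: adjacency lists for all atoms; needsDouble: atoms that must get one double bond.
    // Returns mate[i] (the double-bond partner of atom i, or -1), or null if none exists.
    static int[] kekulise(List<List<Integer>> adj, boolean[] needsDouble) {
        int[] mate = new int[adj.size()];
        Arrays.fill(mate, -1);
        return assign(0, adj, needsDouble, mate) ? mate : null;
    }

    private static boolean assign(int from, List<List<Integer>> adj, boolean[] needsDouble, int[] mate) {
        // Find the next atom that needs a double bond but has no partner yet.
        int v = from;
        while (v < mate.length && (!needsDouble[v] || mate[v] >= 0)) v++;
        if (v == mate.length) return true; // every atom that needs a double bond has one
        for (int w : adj.get(v)) {
            if (needsDouble[w] && mate[w] < 0) {
                mate[v] = w;
                mate[w] = v;
                if (assign(v + 1, adj, needsDouble, mate)) return true;
                mate[v] = -1; // undo and try the next neighbour
                mate[w] = -1;
            }
        }
        return false;
    }

    // Builds a simple ring 0-1-2-...-(n-1)-0.
    static List<List<Integer>> ring(int n) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int i = 0; i < n; i++) {
            int j = (i + 1) % n;
            adj.get(i).add(j);
            adj.get(j).add(i);
        }
        return adj;
    }

    public static void main(String[] args) {
        // furan o1cccc1: atom 0 is the oxygen, atoms 1-4 are carbons needing a double bond
        boolean[] furan = {false, true, true, true, true};
        System.out.println(Arrays.toString(kekulise(ring(5), furan))); // [-1, 2, 1, 4, 3]
    }
}
```

The result pairs atoms 1=2 and 3=4, which is the O1C=CC=C1 form above. A five-membered ring in which all five atoms need a double bond has no perfect matching, so the sketch returns null.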

With bond-based double-bond stereo specification, there are two possible ways to write each configuration. Beam allows you to normalise the labels such that the first symbol is always a forward slash (/). Some examples are shown below.

Graph   g   = Graph.fromSmiles("F\\C=C/F");
Graph   h   = Functions.normaliseDirectionalLabels(g);
String  smi = h.toSmiles();
smi.equals("F/C=C\\F");
F/C=C/C              is normalised to F/C=C/C
F\C=C\C              is normalised to F/C=C/C
F/C=C\C              is normalised to F/C=C\C
F\C=C/C              is normalised to F/C=C\C
C(\F)(/C)=C\C        is normalised to C(/F)(\C)=C/C
FC=C(F)C=C(F)\C=C\C  is normalised to FC=C(F)C=C(F)/C=C/C
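Swapping every / and \ inverts both ends of each double bond, so every relative (cis/trans) configuration is unchanged; normalising therefore only needs to flip all labels when the first one is a backslash. The string-level sketch below reproduces the examples above, assuming the string holds a single connected double-bond stereo system. The class is hypothetical; Beam's Functions.normaliseDirectionalLabels operates on the Graph rather than on text.

```java
// Hypothetical string-level sketch of directional-label normalisation.
// Assumes the whole string is one connected double-bond stereo system.
final class NormaliseLabels {

    static String normalise(String smi) {
        // Inspect the first directional label in the string.
        for (int i = 0; i < smi.length(); i++) {
            char c = smi.charAt(i);
            if (c == '/') return smi;           // already starts with '/'
            if (c == '\\') return flipAll(smi); // flip every label
        }
        return smi; // no directional labels at all
    }

    // Swapping every '/' and '\' inverts both ends of each double bond,
    // so every relative configuration is preserved.
    private static String flipAll(String smi) {
        StringBuilder sb = new StringBuilder(smi.length());
        for (char c : smi.toCharArray()) {
            if (c == '/') sb.append('\\');
            else if (c == '\\') sb.append('/');
            else sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalise("F\\C=C/F"));        // F/C=C\F
        System.out.println(normalise("C(\\F)(/C)=C\\C")); // C(/F)(\C)=C/C
    }
}
```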

Beam me up

Beam is still in development, but you can obtain the latest build from the EBI snapshots repository. An example configuration for Maven is shown below.

<project>
...
<repositories>
   <repository>
      <id>ebi-repo</id>
      <url>http://www.ebi.ac.uk/intact/maven/nexus/content/repositories/ebi-repo/</url>
   </repository>
   <repository>
      <id>ebi-repo-snapshots</id>
      <url>http://www.ebi.ac.uk/intact/maven/nexus/content/repositories/ebi-repo-snapshots/</url>
   </repository>
</repositories>
...
<dependencies>
    <dependency>
        <groupId>uk.ac.ebi.beam</groupId>
        <artifactId>beam-core</artifactId>
        <version>LATEST</version>
    </dependency>
    <dependency>
        <groupId>uk.ac.ebi.beam</groupId>
        <artifactId>beam-func</artifactId>
        <version>LATEST</version>
    </dependency>
</dependencies>
...
</project>

License BSD 2-Clause

Copyright (c) 2013, European Bioinformatics Institute (EMBL-EBI) All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the FreeBSD Project.

How to cite

Use the DOI at the top of this README or: John Mayfield, BEAM v1.0, 2017, www.github.com/johnmay/beam


™: SMILES is a trademark of Daylight Chemical Information Systems


beam's Issues

Problem parsing tetrahedral stereochemistry with ring closure as last element

(With CDK 2.1) The following SMILES strings represent the same molecule:

[C@]1(Cl)(F)I.Br1
[C@@](Cl)1(F)I.Br1
[C@](Cl)(F)1I.Br1
[C@@](Cl)(F)(I)1.Br1

The last SMILES string triggers an exception:

org.openscience.cdk.exception.InvalidSmilesException: org.openscience.cdk.exception.InvalidSmilesException: could not parse '[C@@](Cl)(F)(I)1.Br1', Invaid number of verticies for TH1/TH2 stereo chemistry

Please add a "how to cite" section

... reflecting how you would like Beam to be cited. Perhaps release 0.9.1 could be put on Zenodo, since I could then use that DOI to cite Beam.

Using matching implementation in jgrapht

Hi John,
I'm one of the developers/maintainers of the open-source graph library JGraphT. I'm currently revising some of the matching algorithms in our library. While doing so I came across your matching implementation for maximum cardinality matchings (beam/core/src/main/java/uk/ac/ebi/beam/MaximumMatching.java). It seems that this implementation is faster than the implementation we currently have in our library. Therefore, with your permission, I would like to include your version in the library. Obviously, you'll remain the author of the code, and you'll be mentioned on our 'Contributors' page. No additional effort from your side is required.

I made some modifications to your matching code to make it compatible with JGraphT:
Proposed matching implementation in JGraphT
(This is not the final version; I'll have to do some additional testing, streamlining and cleanup.)

Thanks!

InvalidSmilesException should be public

Graph.fromSmiles throws the checked exception InvalidSmilesException, but this class is package-private, which complicates catching it. InvalidSmilesException should probably be public.
An alternative fix would be to declare Graph.fromSmiles as throwing IOException, since InvalidSmilesException extends it.

Is "C0CCCCC%0" a valid SMILES?

Is "C0CCCCC%0" a correct SMILES string? I think it should be "C0CCCCC%00" or "C%00CCCCC%00".

Quote from the OpenSMILES Specification:

Two-digit ring numbers are permitted, but must be preceeded by the percent "%" symbol, such as "C%25CCCCC%25" for cyclohexane. Three-digit numbers and larger are never permitted. However, note that three digits are not invalid; for example, "C%123" is the same as "C3%12", that is, an atom with two rnum specifications.

The digit(s) representing a ring-closure are interpreted as a number, not a symbol, and two rnums match if their numbers match. Thus, C1CCCCC%01 is a valid SMILES and is the same as C1CCCCC1. Likewise, C%00CCCCC%00 is a valid SMILES.
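Following the quoted rule, ring-bond numbers can be read with a small scanner: a bare digit is a one-digit ring number, % must be followed by exactly two digits, and closures pair up by numeric value. Under that rule "C0CCCCC%0" is rejected because "%0" has only one digit, while "C1CCCCC%01" and "C%00CCCCC%00" are accepted. This is a hypothetical sketch that skips bracket atoms and ignores the rest of the SMILES grammar; it is not Beam's parser.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the OpenSMILES ring-bond rule: a ring number is either a
// single digit or '%' followed by exactly two digits, and numbers (not symbols) pair up.
final class RingNumbers {

    // Returns ring-bond numbers in order of appearance; bracket atoms are skipped.
    static List<Integer> ringNumbers(String smi) {
        List<Integer> rnums = new ArrayList<>();
        for (int i = 0; i < smi.length(); i++) {
            char c = smi.charAt(i);
            if (c == '[') { // digits inside [...] are isotopes, counts or charges, not ring bonds
                int close = smi.indexOf(']', i);
                if (close < 0) throw new IllegalArgumentException("unclosed bracket atom");
                i = close;
            } else if (isDigit(c)) {
                rnums.add(c - '0');
            } else if (c == '%') {
                if (i + 2 >= smi.length() || !isDigit(smi.charAt(i + 1)) || !isDigit(smi.charAt(i + 2)))
                    throw new IllegalArgumentException("'%' must be followed by two digits at index " + i);
                rnums.add((smi.charAt(i + 1) - '0') * 10 + (smi.charAt(i + 2) - '0'));
                i += 2;
            }
        }
        return rnums;
    }

    // Ring bonds open and close in pairs, so every opened number must be closed again.
    static boolean ringClosuresBalanced(String smi) {
        Set<Integer> open = new HashSet<>();
        for (int r : ringNumbers(smi))
            if (!open.remove(r)) open.add(r);
        return open.isEmpty();
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        System.out.println(ringNumbers("C%123"));                 // [12, 3]
        System.out.println(ringClosuresBalanced("C1CCCCC%01"));   // true
        System.out.println(ringClosuresBalanced("C%00CCCCC%00")); // true
        try {
            ringNumbers("C0CCCCC%0");
        } catch (IllegalArgumentException e) {
            System.out.println("invalid: " + e.getMessage());
        }
    }
}
```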
