
1-butanol smiles code (opsin, closed)

dan2097 commented on August 29, 2024
1-butanol smiles code

from opsin.

Comments (3)

dan2097 commented on August 29, 2024

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


The reason for this is that OPSIN's SMILES writer starts writing SMILES from the first atom in the molecule which, due to the order it creates groups in, is the 1 position of the butane. From this atom it can either write out C(CCC)O or the equally ugly C(O)CCC.

While I agree that CCCCO is prettier, C(CCC)O is still the same structure. Hence I'm not sure the added complication and slight computational cost in the SMILES writer is a good trade-off, as SMILES are primarily intended to be read by machines. The hypothetical fix, I think, would be to try to start on the first terminal atom (if such an atom exists).
(On opsin.ch.cam.ac.uk, the primary purposes of the outputs are: the depiction for humans, SMILES for input to other software, StdInChI for checking structure identity [OPSIN's SMILES are NOT canonical, and even if they were, StdInChI is better at handling mesomers/tautomers], and StdInChIKey for easily searching for documents mentioning that structure.)
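
To make the effect of the start atom concrete, here is a minimal sketch of a depth-first SMILES writer for acyclic molecules with single bonds only. It is not OPSIN's actual writer: the function names `write_smiles` and `first_terminal_atom` and the atom/bond encoding are assumptions made purely for illustration. The atoms are listed with the 1 position of the butane first, as described above.

```python
def write_smiles(symbols, bonds, start):
    """Write SMILES by depth-first traversal (acyclic, single bonds only)."""
    adjacency = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def visit(atom, parent):
        children = [n for n in adjacency[atom] if n != parent]
        out = symbols[atom]
        # Every child except the last one becomes a parenthesised branch.
        for child in children[:-1]:
            out += "(" + visit(child, atom) + ")"
        if children:
            out += visit(children[-1], atom)
        return out

    return visit(start, None)


def first_terminal_atom(symbols, bonds):
    """Return the index of the first atom with exactly one neighbour, else 0."""
    degree = [0] * len(symbols)
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    for index, d in enumerate(degree):
        if d == 1:
            return index
    return 0


# 1-butanol with atom 0 as the 1 position of the butane chain (hypothetical
# ordering chosen to mirror the description above).
symbols = ["C", "C", "C", "C", "O"]
bonds = [(0, 1), (1, 2), (2, 3), (0, 4)]

from_first_atom = write_smiles(symbols, bonds, 0)
from_terminal = write_smiles(symbols, bonds, first_terminal_atom(symbols, bonds))
print(from_first_atom)  # C(CCC)O
print(from_terminal)    # CCCCO
```

Starting in the middle of the chain forces one neighbour into a branch, while starting at a terminal atom lets the whole chain be written linearly. Both strings describe the same molecule.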


dan2097 commented on August 29, 2024

Original comment by dbgerhard (Bitbucket: dbgerhard, GitHub: dbgerhard).


Hi Daniel, thanks for getting back to me and I appreciate your response. Let me explain my application and see if there's any possible solution.

We were hoping to use your system, in combination with a bunch of other stuff, to produce a large set (thousands) of automatically generated multiple-choice and JSME questions for our students learning organic chemistry nomenclature and reactions. Our system begins by randomly generating a correct reaction based on a database of IUPAC names, removes one component, and then generates a series of likely incorrect answers the student can choose from. Our JSME questions require the student to draw the missing molecule, and the question then checks the SMILES generated by JSME against the SMILES generated by OPSIN from the IUPAC name. I know we're cobbling bits together, but given our limited development time and budget we really wanted to avoid writing our own IUPAC-to-SMILES parser, so when we discovered OPSIN could do this we were very excited.

The question system itself is based on Moodle, and as such requires a perfect match between the student's response (SMILES as produced by JSME) and the SMILES representing the molecule, since Moodle isn't smart enough to know that CCCCO is the same as C(CCC)O.

So we're hoping to match JSME-generated SMILES to OPSIN-generated SMILES, and since they produce different variants, the systems can't talk to each other. Do you know of any way to start from SMILES (or an IUPAC name) and generate all the possible SMILES variants? We would then need to hard-code each possible variant into the Moodle questions to ensure a match happens when the student draws a molecule in JSME.


dan2097 commented on August 29, 2024

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


It isn't going to be possible to get OPSIN to generate the same SMILES as JSME. As well as differing in atom ordering, SMILES can also represent aromatic systems in two ways. OPSIN will give C1=CC=CC=C1 for benzene, but I think JSME will give you c1ccccc1; again, both are the same structure. Canonical SMILES algorithms have been developed to address these issues... but almost all implementations differ from each other, so you can compare canonical SMILES generated by one implementation, but they are not comparable to those generated by another! (This is part of the reason why OPSIN doesn't even try to produce canonical SMILES.)

If it's possible to hook in another web service, then my recommendation would be to convert the SMILES from JSME to StdInChI using the NCI's resolver (OPSIN can produce StdInChI directly). It can be used RESTfully to perform such conversions,
e.g.
https://cactus.nci.nih.gov/chemical/structure/C(CCC)O/stdinchi
https://cactus.nci.nih.gov/chemical/structure/CCCCO/stdinchi
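
As a rough sketch of how a checker might call the resolver, the following builds the same URLs as the two examples above and compares the returned StdInChI strings. The helper names (`stdinchi_url`, `fetch_stdinchi`, `same_structure`) are hypothetical, and the choice to percent-encode characters such as `#` is an assumption about what the service accepts; error handling for unresolvable input is left out.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the example links in this thread.
RESOLVER = "https://cactus.nci.nih.gov/chemical/structure"


def stdinchi_url(smiles):
    """Build the resolver URL that converts a SMILES string to StdInChI."""
    # Keep "()" and "=" readable as in the example URLs; percent-encode
    # characters such as "#" or "/" that have a special meaning in URLs.
    encoded = quote(smiles, safe="()=")
    return f"{RESOLVER}/{encoded}/stdinchi"


def fetch_stdinchi(smiles, timeout=10):
    """Fetch the StdInChI for a SMILES string (requires network access)."""
    with urlopen(stdinchi_url(smiles), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def same_structure(smiles_a, smiles_b):
    """Treat two SMILES as the same structure if their StdInChIs match."""
    return fetch_stdinchi(smiles_a) == fetch_stdinchi(smiles_b)


print(stdinchi_url("C(CCC)O"))
print(stdinchi_url("CCCCO"))
```

With network access, `same_structure("C(CCC)O", "CCCCO")` would be expected to return `True`, since both SMILES describe 1-butanol.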

Although probably out of scope for a system where you are either right or wrong, InChIs have the nice property of being layered, so if they are not identical you can work out at what point two compounds differ, e.g. stereochemistry, isotopes, hydrogen positions, connectivity, atomic composition.
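
The layered comparison could be sketched roughly as follows. This is a simplification, not a full InChI parser: it assumes the second `/`-separated field is always the formula, keeps only the first occurrence of each layer prefix (so stereo layers repeated after an isotopic layer are ignored), and the names `split_layers` and `first_difference` are made up for illustration. The two InChI strings are those of butan-1-ol and butan-2-ol, used here only as example input.

```python
# Layer prefixes in the order they are checked, with readable names.
LAYER_ORDER = [
    ("version", "InChI version"),
    ("formula", "atomic composition"),
    ("c", "connectivity"),
    ("h", "hydrogen positions"),
    ("q", "charge"),
    ("p", "protonation"),
    ("b", "double-bond stereochemistry"),
    ("t", "tetrahedral stereochemistry"),
    ("m", "stereo inversion flag"),
    ("s", "stereo type"),
    ("i", "isotopes"),
]


def split_layers(inchi):
    """Split an InChI string into a dict of layers keyed by prefix."""
    parts = inchi.split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            # Keep only the first occurrence of each prefix (simplification).
            layers.setdefault(part[0], part[1:])
    return layers


def first_difference(inchi_a, inchi_b):
    """Name the first layer in which two InChIs differ, or None if none do."""
    a, b = split_layers(inchi_a), split_layers(inchi_b)
    for key, name in LAYER_ORDER:
        if a.get(key, "") != b.get(key, ""):
            return name
    return None


butan_1_ol = "InChI=1S/C4H10O/c1-2-3-4-5/h5H,2-4H2,1H3"
butan_2_ol = "InChI=1S/C4H10O/c1-3-4(2)5/h4-5H,3H2,1-2H3"
print(first_difference(butan_1_ol, butan_2_ol))  # connectivity
```

A quiz could use the returned layer name to give feedback such as "right atoms, wrong connectivity" instead of a bare right/wrong result.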

The same service can also convert SMILES to its canonical SMILES. Some example differences between canonical SMILES and InChI: InChI will consider a nitro group represented as N(=O)=O equivalent to one represented as [N+](=O)[O-]. InChI will consider common tautomers of a compound to be equivalent.

To give an idea of why enumerating the possible SMILES for a structure is not tractable, the following might be useful:
https://nextmovesoftware.com/blog/2014/07/15/how-do-i-write-thee-let-me-count-the-ways/

The NCI's chemical identifier resolver is more fully documented here:
https://cactus.nci.nih.gov/chemical/structure
It actually uses OPSIN for converting systematic chemical names... although I think it might still be using quite an old version.
