
One vs. two word esters (opsin issue, closed)

dan2097 commented on August 29, 2024

from opsin.

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


In IUPAC nomenclature, formally only the space-separated version is an ester.
In the non-space-separated case I do not think there is sufficient information to determine that the ester interpretation was intended.
The absence of a counterion does make the non-ester interpretation suspicious, but ultimately if someone wants to talk about an "[9-Hydroxy-6-methyl-3-(5-phenylpentan-2-yl)oxy-5,6,6a,7,8,9,10,10a-octahydrophenanthridin-1-yl]acetate" ion they should be able to.
Hence I'm leaning towards "working as intended". There is probably room for improvement for ester names with more than one substituent, e.g. "ethyl2-aminoacetate", which was clearly intended to be an ester even though the space is missing.

Original comment by Steve Chapman (Bitbucket: isomerdesign, ).


I agree with each of your points. The worry in this case, for instance, is that the substance is listed correctly in the Misuse of Drugs Act but incorrectly in the ACMD report that recommended its addition: http://www.homeoffice.gov.uk/publications/alcohol-drugs/drugs/acmd1/acmd-report-agonists?view=Binary, which causes confusion.

I suppose what I'd like is a Google-type intervention of the "did you mean finite **state** machine" kind when one mistypes finite **stale** machine, or *some* indication that the name is suspect.

Another concern is that a missing locant defaults to 2, e.g. phenyldecanoate = 2-phenyldecanoate. Omitting a locant seems increasingly frowned upon by IUPAC unless there is pretty much no possible ambiguity. Not so here. Consider the difference between 3-hexyl decanoate = hex-3-yl decanoate and 3-hexyldecanoate = 3-(hexyl)decanoate. Even if a missing locant does not defeat the parser, couldn't it whine a little about it?
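
To make the request concrete, here is a rough sketch of such a "whine", written as a purely textual pre-check in Python rather than as anything inside OPSIN. The function name `locant_warning` and both regular expressions are hypothetical, and the check is deliberately crude: it only looks at one-word names of the form "...yl...ate" or "...yl...ite" that do not start with a locant.

```python
import re
from typing import Optional

# Hypothetical patterns, not taken from OPSIN.
# A leading locant such as "2-", "3,4-" or "[9-" at the start of the name.
_LOCANT_PREFIX = re.compile(r"^[\[\(\{]*\d+[a-z]?(?:,\d+[a-z]?)*-")
# A one-word name in which an "-yl" substituent runs straight into an -ate/-ite parent.
_ONE_WORD_ATE = re.compile(r"yl[\)\]\}]*[a-z]+(?:ate|ite)$")


def locant_warning(name: str) -> Optional[str]:
    """Return a warning string if the name looks like a one-word
    '-yl...-ate' name whose leftmost substituent has no locant."""
    cleaned = name.strip().lower()
    if " " in cleaned:
        # Space-separated names are read as esters; nothing to flag here.
        return None
    if not _ONE_WORD_ATE.search(cleaned):
        return None
    if _LOCANT_PREFIX.match(cleaned):
        return None
    return (f"'{name}': leftmost substituent has no locant; "
            "a default position will be assumed, or an ester may have been intended")


if __name__ == "__main__":
    for candidate in ("phenyldecanoate", "2-phenyldecanoate", "3-hexyl decanoate"):
        print(candidate, "->", locant_warning(candidate))
```

With this sketch, phenyldecanoate is flagged, while 2-phenyldecanoate, 3-hexyldecanoate and the two-word 3-hexyl decanoate pass silently.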

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


Adding detection for ambiguity would be nice, although doing so rigorously is not completely straightforward, e.g. hexyl is not ambiguous even though there are non-equivalent carbons from which a carbon could be removed. If ambiguity detection were to be introduced, I would be keen to keep the number of false positives to an absolute minimum. A charge imbalance could be a good reason to produce a warning (although such structures do exist in some databases), but to actually suggest a cause or solution would require adding a rule to detect this particular problem.
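
As an illustration of the charge-imbalance idea only, the following Python sketch sums the formal charges written in the bracket atoms of a SMILES string and reports when the total is non-zero. It assumes the parser's output is available as SMILES; the names `net_charge` and `charge_warning` are hypothetical and this is not how OPSIN is implemented.

```python
import re
from typing import Optional

# Bracket atoms such as [O-], [NH4+], [Fe+2] or [Fe++].
_BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")
# A run of identical signs, optionally followed by a magnitude.
_CHARGE = re.compile(r"(\++|-+)(\d*)")


def net_charge(smiles: str) -> int:
    """Sum the formal charges written in the bracket atoms of a SMILES string."""
    total = 0
    for body in _BRACKET_ATOM.findall(smiles):
        body = body.split(":")[0]  # drop an atom-class suffix such as ":1"
        match = _CHARGE.search(body)
        if match is None:
            continue
        signs, digits = match.groups()
        magnitude = int(digits) if digits else len(signs)
        total += magnitude if signs[0] == "+" else -magnitude
    return total


def charge_warning(smiles: str) -> Optional[str]:
    """Return a warning if the structure carries a net charge, else None."""
    charge = net_charge(smiles)
    if charge == 0:
        return None
    return f"net charge {charge:+d}: no counterion present, name may be suspect"


if __name__ == "__main__":
    print(charge_warning("CC(=O)[O-]"))        # bare acetate anion
    print(charge_warning("CC(=O)[O-].[Na+]"))  # sodium acetate
```

A warning of this kind only says that something looks odd. As noted above, it cannot by itself say that a missing space was the cause.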

While I would be happy to accept contributions to this area of the project I don't think I am going to be able to find the time to look into it personally (my PhD is currently focusing on the automatic extraction of chemical reactions).
I have started looking into the fused ring numbering problem you brought up and will update your original post when/if my generalisation of the code is successful.

Original comment by Steve Chapman (Bitbucket: isomerdesign, ).


Thank you, Daniel. I agree it's not a pressing issue--I just felt it should be noted, really. The fused ring numbering problem is more important.

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


I'm not sure whether or not it's more important, but from a completionist point of view the deficiency in fused ring numbering is very annoying.
The version of fused ring numbering I am currently playing with works with 3-, 4-, 5- and 6-membered rings in all combinations, and with ring sizes >6 involved in 2 or fewer rings. The code for aligning the ring system in the direction with the most rings in a line does not seem to work quite right yet with 5-membered rings, and considering only two different variants of the 5-membered rings may not be sufficient for systems where the 5-membered ring is not part of the row with the most rings.

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


I have added heuristics for treating cases where the space is missing as esters. This version is now up on the web service for testing.
The heuristics are:

  • The leftmost substituent in the wordrule must have no locant and the rightmost group must be an "ate" or "ite" group
  • If the leftmost substituent has no locant but other substituents do --> ester
  • If the substituent has the multiplier "mono" --> ester
  • If the parent group has no substitutable positions --> ester
  • If substitution is ambiguous --> ester
  • If the name is of the form alkyl(methanoate|ethanoate|formate|acetate) --> ester

The last rule is required as there is only one possible position for substitution on these structures.

The detection of ambiguity is pretty good although not completely fool-proof (due to things like double bonds not having been formally assigned yet, rather than problems with the atom environment perception algorithm). I'm a bit dubious about this heuristic as it can result in different interpretations of otherwise very similar names, e.g. diethylmalonate --> not ester, diethylsuccinate --> ester, but ethylsuccinate --> not ester (as the position for the ethyl is unambiguous).
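
For readers who prefer code, the heuristics listed above can be sketched as a small Python decision function. This is an illustrative restatement, not OPSIN's Java implementation: the `Substituent` and `OneWordName` classes are hypothetical, and the number of substitutable positions and the ambiguity flag are taken as given inputs instead of being computed from the structure.

```python
from dataclasses import dataclass
from typing import List, Optional

ESTER_SUFFIXES = ("ate", "ite")
# Parents with only one possible position for substitution (last rule in the list).
SINGLE_POSITION_PARENTS = {"methanoate", "ethanoate", "formate", "acetate"}


@dataclass
class Substituent:
    name: str                          # e.g. "ethyl"
    locant: Optional[str] = None       # e.g. "2"; None if no locant was given
    multiplier: Optional[str] = None   # e.g. "di" or "mono"


@dataclass
class OneWordName:
    substituents: List[Substituent]    # in left-to-right order
    parent: str                        # e.g. "succinate"
    substitutable_positions: int       # assumed to be supplied by the parser
    substitution_ambiguous: bool       # assumed to be supplied by the parser


def looks_like_ester(n: OneWordName) -> bool:
    """Apply the listed heuristics to a name written without a space."""
    if not n.substituents:
        return False
    leftmost = n.substituents[0]
    # Precondition: leftmost substituent has no locant, parent is an -ate/-ite group.
    if leftmost.locant is not None or not n.parent.endswith(ESTER_SUFFIXES):
        return False
    if any(s.locant is not None for s in n.substituents[1:]):
        return True
    if leftmost.multiplier == "mono":
        return True
    if n.substitutable_positions == 0:
        return True
    if n.substitution_ambiguous:
        return True
    if leftmost.name.endswith("yl") and n.parent in SINGLE_POSITION_PARENTS:
        return True
    return False


if __name__ == "__main__":
    diethylsuccinate = OneWordName(
        [Substituent("ethyl", multiplier="di")], "succinate", 2, True)
    print(looks_like_ester(diethylsuccinate))
```

Fed the flags implied by the examples above, the sketch gives the same split: diethylmalonate and ethylsuccinate are not treated as esters, while diethylsuccinate is.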
