The drawbacks of backtracking (opsin issue, closed)

dan2097 commented on August 29, 2024
The drawbacks of backtracking

from opsin.

Comments (4)

dan2097 commented on August 29, 2024

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


The common name is in my opinion formally ambiguous. Unfortunately OPSIN does not currently detect ambiguity in chemical names and instead tends to stop looking for possibilities as soon as it has found a sensible outcome.

The heuristic that is employed is to start from the rightmost group in the bracket and work right to left checking whether the group has the desired locant.

I have changed this heuristic to first check the adjacent group when the following criteria are satisfied:

  • The locant is of the form \d+[a-z]?'* i.e. numeric
  • Neither a hyphen nor a locant is present after that group, so e.g. 1-pentyl-3-(2-methyl-phenylacetyl)indole or 1-pentyl-3-(2-methyl2phenylacetyl)indole or 1-pentyl-3-(2-methyl-2-phenylacetyl)indole will retain OPSIN's original interpretation.
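
As a rough illustration of the two criteria above (not OPSIN's actual code), a minimal Python sketch might look like this. Only the locant pattern comes from the comment; the function names and the idea of passing in the text that follows the adjacent group are assumptions made for the example.

```python
import re

# Locant pattern quoted in the comment: digits, an optional lowercase
# letter, then any number of primes (e.g. "2", "4a", "3'").
NUMERIC_LOCANT = re.compile(r"\d+[a-z]?'*")


def is_numeric_locant(locant: str) -> bool:
    """Return True when the whole locant has the numeric form."""
    return NUMERIC_LOCANT.fullmatch(locant) is not None


def prefers_adjacent_group(locant: str, text_after_group: str) -> bool:
    """Hypothetical helper: decide whether to try the adjacent group first.

    text_after_group is whatever follows the group next to the locant,
    e.g. "phenylacetyl" for "2-methylphenylacetyl" (an assumed input shape).
    """
    if not is_numeric_locant(locant):
        return False
    # A hyphen or a further locant right after the group keeps the
    # original right-to-left search.
    starts_with_hyphen = text_after_group.startswith("-")
    starts_with_locant = text_after_group[:1].isdigit()
    return not (starts_with_hyphen or starts_with_locant)
```

Under these assumptions, "2" followed by "phenylacetyl" would take the new path, while "-phenylacetyl", "2phenylacetyl" and "-2-phenylacetyl" would all fall back to the original behaviour.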

In my regression sets, especially a set of polymer names, this change is a uniform improvement, although it only affects names that are formally ambiguous.

Thanks for the bug report. The fixed version is up on the web service. Let me know if it doesn't perform as expected.

Daniel

P.S. OPSIN actually generates only one parse for this name, as there is no ambiguity in tokenizing this name or assigning meaning to the tokens.

dan2097 commented on August 29, 2024

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


1-pentyl-3-(1-methylphenylacetyl)indole is probably interpreted incorrectly (as in it produces a structure rather than a valency error). OPSIN currently doesn't treat phenyl as being explicitly phen-1-yl, and it really should do.

dan2097 commented on August 29, 2024

Original comment by Anonymous.


Outstanding! Fastest Debug Ever.

Thanks for the explanation as well, Daniel. I had just read your paper and had alternative parses on my mind, but in future I'll stick to reporting the symptoms and leave the diagnosis to you.

The problem is solved, but just fyi, here are a few more curious/degenerate cases:

  • 1-pentyl-3-(methylphenylacetyl)indole = 1-pentyl-3-(2-methyl-2-phenylacetyl)indole
  • 1-pentyl-3-(1-methylphenylacetyl)indole = 1-pentyl-3-(2-methylphenylacetyl)indole
  • 1-pentyl-3-(3-methyl-2-phenylacetyl)indole = 1-pentyl-3-(3-methylphenylacetyl)indole

All the best, and props to the OPSIN team for this extraordinarily useful service.

/Steve

dan2097 commented on August 29, 2024

Original comment by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


Yeah, alternative parses are an important part of OPSIN, but as you can see from the pie chart in that paper, multiple parses are fortunately moderately rare. Ideally, multiple parses should only happen if there are multiple ways of interpreting something that are either non-trivial or impossible to disambiguate between, e.g. 2-methylthiophenyl, which could be 2-(methylthio)phenyl or 2-methyl-thiophen-yl.
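
To make the 2-methylthiophenyl case concrete, here is a toy sketch of how one string can split into tokens in more than one way. It is only an illustration under stated assumptions: the tiny vocabulary and the `tokenizations` function are hypothetical and have nothing to do with OPSIN's real grammar or tokenizer.

```python
from typing import List


def tokenizations(name: str, vocabulary: List[str]) -> List[List[str]]:
    """Return every way of splitting name into tokens from vocabulary."""
    if not name:
        return [[]]  # one way to split the empty string: no tokens
    results = []
    for token in vocabulary:
        if name.startswith(token):
            # Recurse on the remainder and prepend the matched token.
            for rest in tokenizations(name[len(token):], vocabulary):
                results.append([token] + rest)
    return results


# Hypothetical toy vocabulary, just large enough to show the ambiguity.
TOY_VOCABULARY = ["methyl", "thio", "thiophen", "phenyl", "yl"]

for parse in tokenizations("methylthiophenyl", TOY_VOCABULARY):
    print("|".join(parse))
```

With this vocabulary the string splits two ways, methyl|thio|phenyl and methyl|thiophen|yl, matching the two readings in the comment. Adding a separate "phen" token would introduce a third split (methyl|thio|phen|yl).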

  • 1-pentyl-3-(methylphenylacetyl)indole acting as 1-pentyl-3-(2-methyl-2-phenylacetyl)indole I think is working as intended (although the name is clearly ambiguous)
  • 1-pentyl-3-(1-methylphenylacetyl)indole was being incorrectly interpreted as 1-pentyl-3-(1-methylphen-2-ylacetyl)indole which clearly makes no sense as the position of the radical on phenyl is always at locant 1. Its current interpretation isn't too much better but at least now is clearly a case of garbage in, garbage out.
  • Interpreting 1-pentyl-3-(3-methyl-2-phenylacetyl)indole in that way is the only way to generate a structure so I think it is working as intended.

The change which stops phenyl being broken down into phen and yl is now live.

Thanks for the feedback
