
opsin's People

Contributors

baoilleach, dan2097, jimdowning, johnmay, mailaender, merkys, mjw99, rapodaca, rogersayle

opsin's Issues

Add support for detecting substitution ambiguity

Original report by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


The most common source of ambiguity in names is substitution ambiguity. At the point where substituents are multiplied for unlocanted substitution, the StereoAnalyser should be used to detect the number of degenerate substitutable hydrogens and to ensure that the multiplier equals this number (or is one less, e.g. pentachlorobenzene is also unambiguous).

Ambiguity detection would be off by default, and produce a result status of WARNING if detected.
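A minimal Python sketch of the proposed check, assuming the number of symmetry-equivalent (degenerate) substitutable hydrogens has already been computed (in OPSIN that would be the StereoAnalyser's job; here it is simply passed in). The function name `substitution_status` and its status strings are hypothetical, and treating a multiplier of 1 as unambiguous is an added assumption beyond the "this number or one less" rule above.

```python
def substitution_status(multiplier: int, degenerate_hydrogens: int) -> str:
    """Classify an unlocanted substitution on a set of equivalent hydrogens.

    multiplier           -- number of identical substituents, e.g. 5 for "pentachloro"
    degenerate_hydrogens -- symmetry-equivalent substitutable hydrogens on the
                            parent (assumed precomputed by a symmetry analyser)
    """
    if not 1 <= multiplier <= degenerate_hydrogens:
        raise ValueError("multiplier must be between 1 and the number of hydrogens")
    # Replacing one, all, or all but one of a set of equivalent hydrogens
    # gives a single product, so the name is unambiguous.
    if multiplier in (1, degenerate_hydrogens, degenerate_hydrogens - 1):
        return "SUCCESS"
    # Otherwise several isomers fit, e.g. dichlorobenzene (1,2-, 1,3- or 1,4-).
    return "WARNING"
```

With benzene's six equivalent hydrogens, `substitution_status(5, 6)` (pentachlorobenzene) returns `"SUCCESS"`, while `substitution_status(2, 6)` (unlocanted dichlorobenzene) returns `"WARNING"`.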

Rewrite alpha/beta stereochemistry assignment handling code

Original report by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


Currently alpha/beta stereo terms are treated like other stereochemistry terms and applied to the stereoparent after it has been constructed. Alpha/beta stereochemistry indicates the position of a group relative to the plane of the unsubstituted parent; hence, to handle this in general, the term must be associated with the atom through which the added group is attached.
[The current implementation assumes that the non-hydrogen atom is the added atom, and hence fails when a position is doubly substituted.]
Proposal: Keep alpha/beta stereochemistry much as it is originally parsed. In bridge assignment, suffix assignment and substituent attachment, additionally consider the group being preceded either by a locant or by a stereochemistry element of this type. Once the group is attached, the stereochemistry term should be moved in front of the stereoparent and associated with the atom ID of the atom that attached to the stereoparent.

Bracketed alpha/beta stereochemistry can be handled as it is currently. Alpha/beta stereo directly in front of a stereoparent should probably be recognized at the point that it is currently applied.

syntax for greek letters

Original report by Anonymous.


The website gives, among others, the following two ways to write a name containing Greek letters:

The romanised name for the letter e.g. lambda

The romanised name for the letter surrounded by dots e.g. .lambda.

However,
1,1-dioxo-1,2-dihydro-1lambda^6-thieno[2,3-d]isothiazol-3-one

is properly parsed while

1,1-dioxo-1,2-dihydro-1.lambda.^6-thieno[2,3-d]isothiazol-3-one and
1,1-dioxo-1,2-dihydro-1.lambda.6-thieno[2,3-d]isothiazol-3-one fail.
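Until the dotted form is accepted, a caller could normalise it before parsing. The following minimal Python sketch (the helper `undot_greek` is hypothetical, not part of OPSIN) only removes the dots around a romanised Greek letter name, so `1.lambda.^6` becomes `1lambda^6`, the form reported above to parse correctly.

```python
import re

# Romanised names of the Greek letters.
GREEK_LETTERS = [
    "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta",
    "iota", "kappa", "lambda", "mu", "nu", "xi", "omicron", "pi", "rho",
    "sigma", "tau", "upsilon", "phi", "chi", "psi", "omega",
]

# A letter name enclosed in dots, e.g. ".lambda." (case-insensitive).
_DOTTED = re.compile(r"\.(" + "|".join(GREEK_LETTERS) + r")\.", re.IGNORECASE)


def undot_greek(name: str) -> str:
    """Replace dotted Greek letter names (.lambda.) with the bare form (lambda).

    Anything after the closing dot, such as the '^6' in '.lambda.^6', is kept.
    """
    return _DOTTED.sub(r"\1", name)
```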

The drawbacks of backtracking

Original report by Steve Chapman (Bitbucket: isomerdesign, ).


Synthetic cannabinoid JWH-251 is commonly named "1-pentyl-3-(2-methylphenylacetyl)indole" which is parsed as "1-pentyl-3-(2-methyl-2-phenylacetyl)indole" rather than "1-pentyl-3-[(2-methylphenyl)acetyl]indole." The latter name evokes the correct depiction of JWH-251. (Credit Lee Fadness for this discovery)

However, "1-pentyl-3-(3-methylphenylacetyl)indole" is parsed as "1-pentyl-3-[(3-methylphenyl)acetyl]indole", presumably because the parse that worked for the prior name fails for this one. Pragmatic, but inconsistent.

Also, this variant parses correctly and consistently regardless of the locant: "1-pentyl-3-(2-methylphenacyl)indole".

HTML tag sparks existential question

Original report by Steve Chapman (Bitbucket: isomerdesign, ).


An HTML italic tag embedded in a name consistently evokes this response:

Problem retrieving server error message! Is this server running?

Example:

  • Works: (R)-chlorobromoiodomethane
  • Fails: (R) chlorobromoiodomethane

This is a trivial issue when the cause is recognized and corrected, but the error message is sufficiently distracting that I suspect some never get past it.
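Until the service reports this more gracefully, a client could strip inline formatting tags before submitting a name. A minimal Python sketch; `strip_inline_html` is a hypothetical helper, and the set of tags it removes is an assumption rather than anything OPSIN defines.

```python
import re

# Inline formatting tags that commonly leak into names copied from web pages
# (this list is an assumption, not an OPSIN specification).
_INLINE_TAG = re.compile(r"</?(?:i|em|b|strong|sub|sup)\s*>", re.IGNORECASE)


def strip_inline_html(name: str) -> str:
    """Remove simple inline HTML tags such as <i>...</i> from a chemical name."""
    return _INLINE_TAG.sub("", name)
```

For example, `strip_inline_html("(<i>R</i>)-chlorobromoiodomethane")` gives `"(R)-chlorobromoiodomethane"`.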

Issue parsing this molecule

Original report by Anonymous.


I am trying to use OPSIN to convert this molecule name into SMILES:

acetic acid 2'-methyl-2'-(1",2",4"-trimethylpent-2"-enyloxy)propyl ester

Can you tell me why it does not work?

Best regards,

Guillaume Godin

Return CML as a String rather than a XOM element

Original report by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


OPSIN currently returns CML (Chemical Markup Language) as a XOM element. Changing this interface to return a String would allow OPSIN's dependency on XOM to be removed and would provide access to the CML without requiring knowledge of a third-party library. On the downside, the signature of parseToCML would change.
This change would be part of a v2.0 release.

Trivial typo when vexed by a name (quite correctly) deemed unparsable

Original report by Steve Chapman (Bitbucket: isomerdesign, ).


"3-methanone(cyclopropyl)indole has no tokens unknown to OPSIN but does not conform to its grammar. From left to right it is unparsable due to the following being uninterpretable:(cyclopropyl)indole The following or which was not parseable: cyclopropyl)indole"

[Unparsable and vexatious name courtesy of Health Canada's Office of Controlled Substances and Creative Writing]

Handling isotopomers

Original report by Anonymous.


I believe that 3-methyl-4-(propyl-2,3-13C)octane is a valid IUPAC specification for an isotopomer with 13C at atoms 2 and 3 of the propyl group. However, OPSIN does not recognize it.

Many stable isotope tracing experiments are generating NMR datasets with assigned isotopomers. Our ability to use IUPAC names depends on tools being able to process the isotopomer part of the IUPAC standard.
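To make concrete what the descriptor in `propyl-2,3-13C` encodes, here is a minimal Python sketch that splits it into locants, mass number and element. The helper `parse_isotope_label` and the `IsotopeLabel` type are hypothetical (not OPSIN code), and only this one form is handled; variants such as an atom-count suffix are not covered.

```python
import re
from typing import NamedTuple, Tuple


class IsotopeLabel(NamedTuple):
    locants: Tuple[str, ...]  # labelled positions, e.g. ("2", "3")
    mass_number: int          # e.g. 13
    element: str              # element symbol, e.g. "C"


# "<locant>[,<locant>...]-<mass number><element symbol>", e.g. "2,3-13C".
_LABEL = re.compile(r"(\d+'*(?:,\d+'*)*)-(\d+)([A-Z][a-z]?)")


def parse_isotope_label(text: str) -> IsotopeLabel:
    """Split a descriptor such as '2,3-13C' into locants, mass number and element."""
    m = _LABEL.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognised isotope descriptor: {text!r}")
    locants, mass_number, element = m.groups()
    return IsotopeLabel(tuple(locants.split(",")), int(mass_number), element)
```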

Lower-case r/s stereochemical notation

Original report by pmortenson (Bitbucket: pmortenson, ).


Me again!

We use Chemaxon's structure-to-name package, and then check the names by round-tripping them with OPSIN. We have noticed one class of names that always fail - here is an example:

[(1s,3r)-3-methylcyclobutyl]methanol

It was generated from this SMILES: C[C@H]1CC@HC1

Here there is relative stereochemistry across a symmetrical ring, so (R,S) and (S,R) are equivalent. Is this incorrect naming from Chemaxon, or just not a feature that is currently in OPSIN?

Thanks for all your help,
Paul

Deoxide and deoxido

Original report by Takayuki Kimura (Bitbucket: takayukiii, GitHub: takayukiii).


Hi, I used OPSIN to convert IUPAC names to SMILES and it worked well. Thank you.
I noticed one thing: the program asked me to change 'oxidetetrahydro-' to 'oxidotetrahydro-', and this might be wrong. Thanks again.

'non' assumed when 'nona' submitted in name

Original report by Roy Garvey (Bitbucket: rgarvey, GitHub: rgarvey).


N-butyl-4-nitrobenzeneamine nona(4-nitrobenzeneamine)
elicits the error message: "non" indicates none. "nona" should probably be used. My guess is that the parser does not check the next character before deciding that an error has occurred. The error seems to occur generally whenever nine items are involved.

Numbering non-linear fused rings: An example of a known limitation?

Original report by Steve Chapman (Bitbucket: isomerdesign, ).


"Fused ring systems involving non 6-membered rings which are not in a "chain" cannot currently be numbered e.g. indeno[2,1-c]pyridine can be numbered, benzo[cd]indole cannot"

I'm guessing that's why this parses fine:

  • [1,4]oxazino[2,3,4-hi]indole

while this does not:

  • 2,3-dihydro[1,4]oxazino[2,3,4-hi]indole

I hesitate to label this a bug, but the alternatives are few. Perhaps "test-case" would be a useful option?

One vs. two word esters

Original report by Steve Chapman (Bitbucket: isomerdesign, ).


Omitting the space makes a difference:

[9-Hydroxy-6-methyl-3-(5-phenylpentan-2-yl)oxy-5,6,6a,7,8,9,10,10a-octahydrophenanthridin-1-yl]acetate

[9-Hydroxy-6-methyl-3-(5-phenylpentan-2-yl)oxy-5,6,6a,7,8,9,10,10a-octahydrophenanthridin-1-yl] acetate

Simpler cases exhibit the same behaviour: hexylacetate vs hexyl acetate.

R groups (alkyl)

Original report by Jiri Novotny (Bitbucket: gorgitko, GitHub: gorgitko).


Hello,

For this IUPAC name: alkyl(dimethyl)[2-(hydroxyimino)-2-(pyridin-2-yl)ethyl]ammonium

I get this error: alkyl(dimethyl)[2-(hydroxyimino)-2-(pyridin-2-yl)ethyl]ammonium was uninterpretable due to the following section of the name: alkyl The following was not understandable in the context it was used: alk

Is it possible to also support these alkyl (R) groups? They are normally noted with * in SMILES.

Thanks,

Jiri

(9S)-Cinchonan-9-ol

Original report by Joe Polak (Bitbucket: JoePolak, GitHub: JoePolak).


Hello,

A user of ChemDoodle has reported that the name "(9S)-Cinchonan-9-ol" isn't supported by the name parser, which uses OPSIN 1.5. I checked with the live version at http://opsin.ch.cam.ac.uk/ and got the following output:

"(9S)-Cinchonan-9-ol was uninterpretable due to the following section of the name: (9S)-Cinchonan-9-ol The following was not understandable in the context it was used: Cinch"

Thought you should know.

Joe Polak
Senior Developer
iChemLabs

Amines vs amides and their glyoxalyl indoles

Original report by Steve Chapman (Bitbucket: isomerdesign, ).


http://opsin.ch.cam.ac.uk/opsin/glyoxalylamine.png
http://opsin.ch.cam.ac.uk/opsin/glyoxalylamide.png
http://opsin.ch.cam.ac.uk/opsin/indol-3-ylglyoxalylamide.png
http://opsin.ch.cam.ac.uk/opsin/2-(indol-3-yl)glyoxalylamine.png

1. The amide has a charge on the nitrogen; the amine does not.

2. The commonplace name for a tryptamine precursor is indol-3-ylglyoxalylamide, which may be imperfect IUPAC, but there you go. The structure this name produces is not the one typically desired. Adding a proper locant gives the correct structure.

InChI generation error

Original report by Anonymous.


InChI generation fails, although SMILES and CML descriptions are created, for:
Ammonium 3,13,16,19,22-pentaoxo-5,8,11-triaza-2,14,15,20,21-pentaoxa-1-ytterbatetracyclo[6.6.3.3{1,5}.3{1,11}]tricosane hydrate

Uninterpretable molecule

Original report by AlisonChoy (Bitbucket: AlisonChoy, GitHub: AlisonChoy).


2-endoamino-benzobicyclo[2,2,1]-heptane

2-endoamino-benzobicyclo[2,2,1]-heptane was uninterpretable due to the following section of the name: 2-endo The following was not understandable in the context it was used: end

Lambda notation bug?

Original report by Anonymous.


Hi Dan,

I noticed that OPSIN fails to parse the following name:

1lambda6,4-thiomorpholine-1,1-dione

however, both of the following are parsed fine:

1lambda6,4lambda3-thiomorpholine-1,1-dione (different name for the same compound)
1lambda6,2-benzothiazole-1,1-dione

So it looks like there isn't a problem with the 1,4 thiomorpholine numbering, and it isn't always necessary to specify lambda values for all locants either. Is this a bug, or have I missed something?

Fantastic piece of software, by the way....

Allow unprimed locants on acetophenone

Original report by Daniel Lowe (Bitbucket: dan2097, GitHub: dan2097).


acetophenone has 2,2',3',4',5',6' locants. In many names, the primed locants are treated as unprimed and, if required, the 2 position is referred to as alpha.

In principle, changing the locants if an unprimed 3 or 4 is detected would give better recall.
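A minimal Python sketch of that heuristic, assuming locants arrive as plain strings and that the methyl carbon is written as "alpha" in the unprimed convention. The function `remap_acetophenone_locants` is hypothetical; only the trigger (an unprimed 3 or 4) comes from the issue.

```python
from typing import List

# Per the issue, an unprimed 3 or 4 signals that the ring was numbered unprimed.
TRIGGER_LOCANTS = {"3", "4"}
RING_LOCANTS = {"2", "3", "4", "5", "6"}


def remap_acetophenone_locants(locants: List[str]) -> List[str]:
    """Hypothetical recall heuristic for acetophenone substituent locants.

    Formally the methyl carbon is 2 and the ring positions are 2'-6'. If an
    unprimed 3 or 4 is present, assume the ring was numbered without primes
    and the methyl carbon was called alpha, and convert to formal locants.
    """
    if not TRIGGER_LOCANTS.intersection(locants):
        return list(locants)  # formal (or undecidable): leave as-is
    remapped = []
    for locant in locants:
        if locant == "alpha":
            remapped.append("2")            # methyl carbon
        elif locant in RING_LOCANTS:
            remapped.append(locant + "'")   # ring position
        else:
            remapped.append(locant)
    return remapped
```

Under this sketch `["2", "4"]` becomes `["2'", "4'"]` and `["alpha", "4"]` becomes `["2", "4'"]`, while a lone `["2"]` is left alone because it cannot be distinguished from the formal methyl locant.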

Thioacetic acid. Ambiguity by IUPAC design?

Original report by Steve Chapman (Bitbucket: isomerdesign, ).


OPSIN parses thioacetic acid as thioacetic O-acid, i.e., the hydroxy moiety is undisturbed. ACD/Labs Name parses thioacetic acid as thioacetic S-acid, i.e., the carbonyl moiety is undisturbed. Wikipedia also favours the S-acid.

OPSIN rejects the qualifier for the acid, e.g. thioacetic S-acid, thioacetic O-acid, but accepts it in an ester, e.g. Thioacetic acid S-methyl ester, Thioacetic acid O-methyl ester.

I don't know that there is a right or wrong answer, just an ill-conceived overload of thio in this context with no clear guidance from IUPAC that I can find.

Add support for levulinyl

Original report by Noel O'Boyle (Bitbucket: baoilleach, GitHub: baoilleach).


Hi Daniel,
I'm not sure if people are misusing this or not, but levulinyl is sometimes used in the context of "Levulinyl chloride" or as a substituent on sugars (e.g. 3-O-levulinyl). OPSIN is happy with "Levulinic acid" but the chloride gives:
"Suffix: yl does not apply to the group it was associated with (type: acidStem) due to the group's subType: ylForNothing according to suffixApplicability.xml"

1-butanol smiles code

Original report by dbgerhard (Bitbucket: dbgerhard, GitHub: dbgerhard).


Hi folks, thanks for this amazing site.

I entered 1-butanol and expected the SMILES to be CCCCO, but your system produced C(CCC)O. Can you check why your SMILES output inserts a branch? C(CCC)O would suggest a substituent where none exists. Thanks.

Mancude rings refuse to fuse?

Original report by Steve Chapman (Bitbucket: isomerdesign, ).


I tried to construct this structure: 11,12-dihydroindolo[2,3-a]carbazole, by instead approaching the name as 11,12-dihydrobenzo[2,1-b:3,4-b']diindole.

That does in fact work, but the fully conjugated (didehydro) name benzo[2,1-b:3,4-b']diindole is rebuffed with "Could not assign all higher order bonds."

Still, this name benzo[1,2-b:3,4-b']diindole, and this name benzo[2,1-b:4,5-b']diindole are accepted, which seems odd. This name indolo[2,3-a]carbazole is also accepted.

whitespace important?

Original report by Anonymous.


Inserting a space into the failing example
4,4',4''-trimethyl(2,2':6',2''-terpyridine)-4,4',4''-tricarboxylate
to give
4,4',4''-trimethyl (2,2':6',2''-terpyridine)-4,4',4''-tricarboxylate
made it parseable.

spelling for "Invalid fusion descriptor"

Original report by Anonymous.


This is nit-picky but for the error given when the name "6,6-dimethyl-4-morpholino-2-(1H-pyrrolo[2,3-b]pyridin-5-yl)-8,9-dihydro-6H-[1,4]oxazino[3,4-e]purine" is given the error reads: "Invalid fusion descriptor: Heteroatom placement is ambigous as it is not present in both components of the fusion"

The word ambigous should be ambiguous.

Galactose

Original report by Andrius Merkys (Bitbucket: merkys, GitHub: merkys).


Galactose should be cyclic, however, it is perceived as a linear molecule:

$ echo galactose | java -jar src/opsin-2.3.0-jar-with-dependencies.jar 
Run the jar using the -h flag for help. Enter a chemical name to begin:
O=C[C@H](O)[C@@H](O)[C@@H](O)[C@H](O)CO

Hemisulfate

Original report by Richard Apodaca (Bitbucket: rapodaca, GitHub: rapodaca).


Salts containing the "hemisulfate" form are not recognized, for example, "pyridinium hemisulfate" (two pyridines and one sulfuric acid).

The term appears regularly in chemical supplier catalogs.

For charged groups, the behavior of "sulfate" groups is the same as that needed for "hemisulfate". For example, "pyridinium sulfate" gives the same structure that "pyridinium hemisulfate" should give.
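The "hemi" prefix is a fractional multiplier (one half), so the expected stoichiometry can be read off as an integer ratio. A minimal Python sketch; `component_ratio` is a hypothetical helper, and "sesqui" (three halves, as in sesquihydrate) is included only for comparison.

```python
from fractions import Fraction
from typing import Tuple

# Fractional multiplying prefixes: hemi = 1/2, sesqui = 3/2.
FRACTIONAL_PREFIXES = {"hemi": Fraction(1, 2), "sesqui": Fraction(3, 2)}


def component_ratio(prefix: str) -> Tuple[int, int]:
    """Return (parent units, component units) for a fractional prefix.

    'pyridinium hemisulfate' has 1/2 sulfate per pyridinium, i.e. 2 : 1.
    """
    amount = FRACTIONAL_PREFIXES[prefix]
    return amount.denominator, amount.numerator
```

Here `component_ratio("hemi")` gives `(2, 1)`: two pyridinium units per sulfate, matching the structure described for "pyridinium sulfate".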
