event_db (ddimdl issue, open)

ShenggengLin avatar ShenggengLin commented on June 27, 2024
event_db

from ddimdl.

Comments (4)

YifanDengWHU avatar YifanDengWHU commented on June 27, 2024

Hi, Shenggeng!
For the first problem: the same drug-drug pair is recorded twice in the data, once as (sildenafil, Isosorbide mononitrate) and once as (Isosorbide mononitrate, sildenafil). These are in fact the same pair, so we deleted half of the records.
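The deduplication described above can be sketched roughly as follows. This is only an illustration, not the project's actual preprocessing code: the function name `dedupe_unordered_pairs` and the tuple-based record layout are assumptions made for the example.

```python
def dedupe_unordered_pairs(pairs):
    """Keep one record per unordered drug pair, preserving first-seen order."""
    seen = set()
    unique = []
    for drug_a, drug_b in pairs:
        # frozenset ignores order, so (A, B) and (B, A) share one key
        key = frozenset((drug_a, drug_b))
        if key not in seen:
            seen.add(key)
            unique.append((drug_a, drug_b))
    return unique


# Hypothetical input: the same pair recorded in both directions
records = [
    ("sildenafil", "Isosorbide mononitrate"),
    ("Isosorbide mononitrate", "sildenafil"),
]
deduped = dedupe_unordered_pairs(records)
print(deduped)
```

When every pair appears in both orders, this keeps exactly half of the records.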
For the second problem, try learning how to use the RDKit package. For example, for the drug Isosorbide mononitrate, we can collect its SMILES, [H][C@]12OC[C@@H](O[N+]([O-])=O)[C@@]1([H])OC[C@@H]2O, from DrugBank.
Here is the code:

from rdkit import Chem
from rdkit.Chem import AllChem

# SMILES for Isosorbide mononitrate, taken from DrugBank
smile = '[H][C@]12OC[C@@H](O[N+]([O-])=O)[C@@]1([H])OC[C@@H]2O'
mol = Chem.MolFromSmiles(smile)
# Morgan fingerprint with radius 2, hashed to 881 bits
morgan_hashed = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=881)
morgan_hashed.ToBitString()

The result is a bit vector of length 881.

ShenggengLin avatar ShenggengLin commented on June 27, 2024

Hello, Yifan!

Thank you very much for your reply. I now understand the first question.

But I still have doubts about the second one.

For drug DB01296, the SMILES is 'N[C@H]1C(O)O[C@H](CO)[C@@H](O)[C@@H]1O'. With the code you provided, I did get an 881-dimensional vector. But in event.db, its SMILES features are 9|10|14|18|19|20|178|181|283|284|285|286|299|308|332|338|339|340|341|344|345|346|347|351|352|365|366|367|380|393|405|406|528|563|566|567|571|582|592|614|615|617|637|638|639|643|661|662|663|679|680|681|682|683|689|690|691|701|703.
I wonder what these numbers mean. Do they mean that these positions are 1 in the 881-dimensional vector? If so, for drug DB01296 the ninth digit is 0, yet 9 appears among these numbers, and the 16th digit is 1, yet 16 does not appear.

YifanDengWHU avatar YifanDengWHU commented on June 27, 2024

Yes, you are right.
The reason is that the fingerprint methods are different. The fingerprint in the current dataset was produced by a former student, who used RDKit in Java.
The code I posted uses the Morgan fingerprint, which is the most common method. I have tested the results: there is little difference between the current dataset's fingerprint and the Morgan fingerprint.
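To check a computed fingerprint against the pipe-separated feature string in event.db, the bit string can be turned into a list of set-bit positions. The sketch below assumes the numbers in event.db are indices of bits equal to 1; whether they are zero-based or one-based is not stated in this thread, so the `one_based` flag is left as an option. All function names here are hypothetical helpers, not part of RDKit or ddimdl.

```python
def on_bits(bit_string, one_based=False):
    """Return the positions of '1' characters in a bit string."""
    offset = 1 if one_based else 0
    return [i + offset for i, bit in enumerate(bit_string) if bit == "1"]


def to_pipe_format(indices):
    """Join indices with '|', mirroring the layout seen in event.db."""
    return "|".join(str(i) for i in indices)


def from_pipe_format(text):
    """Parse a pipe-separated index string back into a list of ints."""
    return [int(token) for token in text.split("|") if token]


# Toy 10-bit example; a real input would be the 881-character
# string returned by ToBitString()
toy_bits = "0100100001"
indices = on_bits(toy_bits)
encoded = to_pipe_format(indices)
print(indices, encoded)
```

Because the dataset fingerprint and the Morgan fingerprint come from different methods, their index lists are not expected to match position by position.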

ShenggengLin avatar ShenggengLin commented on June 27, 2024

OK, I see. Thank you very much for your reply!
