yifandengwhu / ddimdl

Python 100.00%

ddimdl's Issues

Request full code

Thank you so much for your contribution Prof.
I'd like to kindly ask for the full code, as my team and I need to understand it better in order to implement it in our program. Neural networks are a fairly new area for us, and we'd greatly appreciate it. Would it be alright to email you further questions about it?

Also, can you explain in simple words what the output would be?

Thank you. [email protected]

Request of Raw Data set

Hi, I hope you are doing well. Can you share the raw data as it is available on DrugBank, rather than these 4 processed tables?
Thank you. I look forward to a positive response.

task2 and task3

@YifanDengWHU Prof. Deng, I hope this message finds you well. I would like to know how you constructed the dataset in Task 2 and Task 3 of DDIMDL. Can you email the full code of the three tasks? This is my QQ email [email protected]
Thanks in advance.

Request Full Code of 3 Tasks for Further Study

@YifanDengWHU Thank you for sharing the DDIMDL research on GitHub. After some investigation, we have gained a basic understanding. Could you email the full code of the three tasks for our further study? My email is [email protected].

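A question about data processing

Hello!

I am a third-year student at Guangdong University of Technology, and I have recently been studying DDI problems as well.
In your paper you mention that rare events with fewer than 10 DDIs were removed, but I found events in your event.db that have fewer than 10. For example (collected from your event.db):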
('the vasopressor activities increase', 9), ('a decrease in the absorption resulting in a reduced serum concentration and potentially a decrease in efficacy cause', 9), ('the hyponatremic activities increase', 9), ('the excretion rate which could result in a lower serum level and potentially a reduction in efficacy increase', 7), ('The risk or severity of myopathy and rhabdomyolysis increase', 7), ('the neuromuscular blocking activities decrease', 6), ('The risk of a hypersensitivity reaction increase', 5), ('the hypocalcemic activities increase', 5), ('the vasodilatory activities increase', 5), ('the myelosuppressive activities increase', 5), ('the hyperglycemic activities increase', 5)
Have I misunderstood something, or is there another explanation?

Originally posted by @Savant-HO in #14 (comment)

Ask for the division of DDI types

SMILES not extracted by DRKG_drug_spider.py

Hi @YifanDengWHU, I hope you are doing well. I ran DRKG_drug_spider.py. It scraped fields such as enzymes and targets, but it did not extract the SMILES field from the DrugBank database: the smile field is empty for every drug. (A screenshot was attached to the original issue.)

About the AUPR calculation

Hi, thanks for sharing. I have a question about the AUPR calculation.

In your code, the AUPR is

precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
return auc(precision, recall, reorder=True)

However, I believe the correct usage is

precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
return auc(recall, precision)

The results of these two calculations are different. Would you like to check the calculation?
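
For reference, here is a minimal pure-Python sketch of the distinction raised above, assuming distinct scores and trapezoidal integration: the area under the precision-recall curve takes recall as the x-axis. The helper names `pr_points` and `trapezoid` are illustrative and do not come from the repository. In scikit-learn the corresponding call is `auc(recall, precision)`; as far as I know, the `reorder` argument was removed in scikit-learn 0.22, so the first form fails on recent versions.

```python
def pr_points(y_true, y_score):
    """Return (recall, precision) lists, one point per score threshold.

    Assumes binary labels, at least one positive, and distinct scores.
    """
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    n_pos = sum(y_true)
    recall, precision = [0.0], [1.0]  # conventional starting point of the curve
    tp = fp = 0
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        recall.append(tp / n_pos)
        precision.append(tp / (tp + fp))
    return recall, precision


def trapezoid(x, y):
    """Trapezoidal rule over the points in the order given."""
    return sum((x[k + 1] - x[k]) * (y[k] + y[k + 1]) / 2 for k in range(len(x) - 1))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
recall, precision = pr_points(y_true, y_score)

aupr = trapezoid(recall, precision)     # recall on the x-axis: area under the PR curve
swapped = trapezoid(precision, recall)  # axes swapped: not an area under the PR curve
print(aupr, swapped)
```

On this toy input the two values differ, and the swapped one is even negative, because precision is not monotonic along the curve.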

About the running rounds

I am sorry to contact you but I have a question: You set epoch=100 in your source code. The early_stopping condition can be satisfied when epoch is 20+ or 30+ generally. I think in this situation, early stopping means that the model has been trained well. However, it will automatically run a few more rounds like this, beginning from epoch=1 until it meets the early stopping condition. I wonder why. Would you please kindly reply to me? Thank you very much.
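
One possible explanation, offered as an assumption rather than a statement about the repository: if the script trains a fresh network for every cross-validation fold and for every feature type, each of those runs starts again at epoch 1 and stops early on its own. The toy simulation below illustrates that pattern. The fold count, the `patience` value, and the synthetic loss curve are all made up for illustration, and `stop_epoch` is a hypothetical helper.

```python
def stop_epoch(val_losses, patience=10):
    """Return the 1-based epoch at which patience-based early stopping halts."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


max_epochs = 100
# Synthetic validation loss: improves until epoch 20, then plateaus.
val_losses = [max(20 - epoch, 0) for epoch in range(1, max_epochs + 1)]

runs = []
for fold in range(5):                              # assumed 5-fold cross-validation
    for feature in ["smile", "target", "enzyme"]:  # assumed: one network per feature type
        runs.append((fold, feature, stop_epoch(val_losses)))

for fold, feature, epoch in runs:
    print(f"fold {fold}, {feature}: started at epoch 1, stopped at epoch {epoch}")
```

Under these assumptions the log shows fifteen separate trainings, each ending well before epoch 100, which would look like training restarting after every early stop.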

table 2 results in paper

When I run your code I get two result files: smile+target+enzyme_each_DDIMDL.csv and smile+target+enzyme_all_DDIMDL.csv. I understand the second file, which holds the overall results reported in the paper. Can you please tell me what the first file contains? Are these the results for each of the 65 events?
My second question: where do the results in Table 2 of your paper come from? (A picture of the table was attached to the original issue.)

I need some help testing the model after getting the files

event_db

Hello, Deng Yifan. I'm very interested in your article. I think it's a very good job. So right now I'm trying to replicate it.

I would like to ask you two questions:

First: the article says that you extracted 74528 DDI pairs, but event.db contains only 37264 DDI pairs. Did your experiment use only 37264 DDI pairs?

Second: In the drug table of event.db, the smile feature of each drug is a series of numbers. Did you use rdkit to convert the SMILES string into an 881-dimensional fingerprint? I am a fourth-year undergraduate student. I have been searching online for a long time, but I still don't know how to do the conversion. If it's convenient, could you share this code?

Looking forward to your reply, thank you very much!

Request full code

Problem testing the model

I require assistance with testing the model. Once I have obtained the files "model.h5," "smile+target+enzyme_each_DDIMDL.csv," and "smile+target+enzyme_all_DDIMDL.csv," I am using a Flask server to establish a connection with an HTML file. However, I encountered an error after adding drug names to the HTML form.

I would like to understand the specific data requirements for the ".h5" file. What kind of data should be passed to it, and what are the necessary details for it to function correctly? Additionally, I would appreciate guidance on how to effectively test the model after it is up and running.

About the number of interactions and the substructure dimension

Hi, thanks for your kind reply. I have two questions:

The number of interactions in your paper is 74528, while the data in the code contains 37264. I guess "74528" counts both "drugA-drugB" and "drugB-drugA"? Can we treat "drugA-drugB" and "drugB-drugA" as the same event with the same label?
The substructure dimension when I run the code is 583 instead of the 881 in your paper. Is something wrong?
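
The arithmetic is consistent with the guess in the first question, since 2 × 37264 = 74528. Below is a small sketch of that reading, assuming (this is not confirmed by the author here) that the paper counts each unordered pair in both directions with the same label. The records are toy values and `symmetrize` is a hypothetical helper.

```python
def symmetrize(pairs):
    """Expand unordered (drug_a, drug_b, label) records into both directions."""
    expanded = []
    for drug_a, drug_b, label in pairs:
        expanded.append((drug_a, drug_b, label))
        expanded.append((drug_b, drug_a, label))  # same event, same label
    return expanded


stored = [("DB00001", "DB00002", 0), ("DB00001", "DB00003", 4)]  # toy records
both_directions = symmetrize(stored)

print(len(stored), len(both_directions))
print(37264 * 2)
```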

Fingerprints

Can you please share the script that converts drug SMILES to fingerprints? I want values in the same form as in the event database: 0|12|23|.........

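from rdkit import Chem
from rdkit.Chem import AllChem

# Atorvastatin SMILES, including its two stereocentres.
mol = Chem.MolFromSmiles('CC(C)C1=C(C(=C(N1CC[C@H](C[C@H](CC(=O)O)O)O)C2=CC=C(C=C2)F)C3=CC=CC=C3)C(=O)NC4=CC=CC=C4')
# Morgan fingerprint (radius 2) folded to 881 bits. This is not the PubChem
# substructure fingerprint, which also happens to have 881 bits.
fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=881)
print(fp.ToBitString())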
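
To get the `0|12|23|...` layout asked for above, the bit string can be turned into a `|`-separated list of the positions that are set. The sketch below assumes that this is what the smile column in the event database stores, which the issue suggests but does not confirm. `bits_to_index_string` and `index_string_to_bits` are hypothetical helper names. It works on any bit string, for example the output of `fp.ToBitString()`.

```python
def bits_to_index_string(bit_string):
    """Join the positions of all '1' characters with '|'."""
    return "|".join(str(pos) for pos, bit in enumerate(bit_string) if bit == "1")


def index_string_to_bits(index_string, n_bits):
    """Inverse of bits_to_index_string for a fingerprint of n_bits bits."""
    on = {int(token) for token in index_string.split("|") if token}
    return "".join("1" if pos in on else "0" for pos in range(n_bits))


# Toy 24-bit string with bits 0, 12 and 23 set.
example = "1" + "0" * 11 + "1" + "0" * 10 + "1"
encoded = bits_to_index_string(example)
print(encoded)

# Round trip back to the original bit string.
assert index_string_to_bits(encoded, len(example)) == example
```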

Database connectivity issue

Hi,
Please take a look at this issue. How can I solve it?

Traceback (most recent call last):
  File "/opt/software/applications/miniconda3/lib/python3.7/site-packages/pandas/io/sql.py", line 2056, in execute
    cur.execute(*args, **kwargs)
sqlite3.OperationalError: disk I/O error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "DDIMDL.py", line 387, in <module>
    main(args)
  File "DDIMDL.py", line 333, in main
    df_drug = pd.read_sql('select * from drug;', conn)
  File "/opt/software/applications/miniconda3/lib/python3.7/site-packages/pandas/io/sql.py", line 608, in read_sql
    chunksize=chunksize,
  File "/opt/software/applications/miniconda3/lib/python3.7/site-packages/pandas/io/sql.py", line 2116, in read_query
    cursor = self.execute(*args)
  File "/opt/software/applications/miniconda3/lib/python3.7/site-packages/pandas/io/sql.py", line 2068, in execute
    raise ex from exc
pandas.io.sql.DatabaseError: Execution failed on sql 'select * from drug;': disk I/O error

drawing

Hello, Deng Yifan. I'm very interested in your article. I would like to ask you two questions:

First: There is a drawing method in ddimdl.py, but it is never called in the code. Is some part of the code not uploaded? Several figures in the paper also have no corresponding code. If it is convenient, could you share that code?

Second: At the end of the code run, only the results of the DDIMDL model appear. What can I do to run the results of "RF","KNN" and "LR"?

Looking forward to your reply, thank you very much!

KeyError: 'DB00001'

I am trying to run DRKG_drug_spider.py and it throws KeyError: 'DB00001'.
I created the drug table first and then ran your code, which gives the error below. Can you please help me understand what I am doing wrong?

KeyError Traceback (most recent call last)
in
51 transporter=''
52 #Creat a table named drug first, so that you can use the insert sql code.
---> 53 cur.execute("insert into drug(id,name,interaction,smile,target,enzyme,carrier,transporter)values(?,?,?,?,?,?,?,?)",(drug[0][i],name,interaction,smile,target,enzyme,carrier,transporter))
54 conn.commit()
55 conn.close()

~\Anaconda3\lib\site-packages\pandas\core\series.py in getitem(self, key)
1066 key = com.apply_if_callable(key, self)
1067 try:
-> 1068 result = self.index.get_value(self, key)
1069
1070 if not is_scalar(result):

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4728 k = self._convert_scalar_indexer(k, kind="getitem")
4729 try:
-> 4730 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4731 except KeyError as e1:
4732 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'DB00001'
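
The last frames show a pandas `Series` with an integer index being looked up with the string `'DB00001'`, so `i` in `drug[0][i]` appears to hold a drug ID rather than a row position. That reading is an inference from the traceback, not from the script itself. Below is a toy reproduction with made-up data, plus two ways to avoid the failing lookup.

```python
import pandas as pd

# Toy stand-in for the ID list; column 0 holds DrugBank IDs.
drug = pd.DataFrame({0: ["DB00001", "DB00002", "DB00003"]})
ids = drug[0]

# Looking up an ID as if it were an index label fails: the labels are 0, 1, 2.
try:
    ids["DB00001"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Option 1: loop over positions and read each value with .iloc.
by_position = [ids.iloc[pos] for pos in range(len(ids))]

# Option 2: loop over the values directly and use them as they are.
by_value = [drug_id for drug_id in ids]

print(lookup_failed, by_position == by_value)
```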

NLPProcess running error

Hello, Deng Yifan. After installing the environment according to requirement.txt, I still get an error when running NLPProcess: RuntimeError: index_select(): Expected dtype int32 or int64 for index. Could you please help me solve it? It would help me a lot.

Download Issue

I am running the DRKG_drug_spider.py file, but it always prints 150. Why?

error related to sklearn.linear_model.logistic

Hello,
I am trying to run this script, but it's giving me an error.
from sklearn.linear_model.logistic import LogisticRegression
ModuleNotFoundError: No module named 'sklearn.linear_model.logistic'
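
`sklearn.linear_model.logistic` was a private module path that recent scikit-learn releases no longer expose (to my knowledge it was deprecated in 0.22 and removed in 0.24). Importing the class from the public package should work on both old and new versions:

```python
# Public import path for the classifier.
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
print(type(clf).__name__)
```

Alternatively, installing the scikit-learn version the repository was written against would keep the original import working.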

NLP_Process

Can anyone please share the complete NLPProcess code that works with stanza?

task

Hi, where in the code are the three tasks mentioned in the paper?

Input shapes for LSTMs

Hi @YifanDengWHU, I hope you are doing well. I want to try an LSTM or RNN on your dataset using your feature extraction methods. I changed the DNN function in your code, but it throws an input-shape error. Can you please help me with what the input shapes should be for LSTMs or RNNs? I tried to work out the shapes myself, but because of my laptop's specs it took a lot of time. The updated DNN function is pasted below. Thanks.
def DNN():
    train_input=Input(shape=(vector_size*2,),name='Inputlayer')
    train_in=LSTM(512)(train_input)
    train_in=BatchNormalization()(train_in)
    train_in=Dropout(droprate)(train_in)
    train_in=LSTM(256)(train_in)
    train_in=BatchNormalization()(train_in)
    train_in=Dropout(droprate)(train_in)
    train_in=Dense(event_num)(train_in)
    out=Activation('softmax')(train_in)
    model=Model(input=train_input,output=out)
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    return model
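
Keras recurrent layers expect 3-D input of shape (batch, timesteps, features), whereas `Input(shape=(vector_size*2,))` describes a flat vector per sample, which is the likely source of the shape error. In addition, when LSTM layers are stacked, every LSTM except the last needs `return_sequences=True` so that the next one still receives a sequence. One way to obtain a sequence, assuming each sample is the concatenation of two drug vectors of length `vector_size`, is to treat the pair as two timesteps. The pure-Python sketch below shows only that reshaping. `pair_to_sequence` is a hypothetical helper, and the matching Keras input would be `Input(shape=(2, vector_size))`.

```python
def pair_to_sequence(flat, vector_size):
    """Split a flat [drug A | drug B] vector into two timesteps of vector_size features."""
    if len(flat) != 2 * vector_size:
        raise ValueError(f"expected {2 * vector_size} values, got {len(flat)}")
    return [flat[:vector_size], flat[vector_size:]]


vector_size = 3  # toy size for illustration
# Two samples, each a flat vector of length 2 * vector_size.
batch = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
sequences = [pair_to_sequence(sample, vector_size) for sample in batch]

# Resulting shape: (batch, timesteps, features).
shape = (len(sequences), len(sequences[0]), len(sequences[0][0]))
print(shape)
```

With NumPy arrays the same step is `x.reshape(-1, 2, vector_size)`.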