China Biotechnology (中国生物工程杂志)
2021, Vol. 41, Issue 11: 40-47. DOI: 10.13523/j.cb.2106027
Techniques and Methods
Drug-target Affinity Prediction Based on Deep Learning and Multi-layered Information Fusion*
TANG Yue-wei (唐跃威)1, LIU Zhi-ping (刘治平)1,2,**
1 School of Control Science and Engineering, Shandong University, Jinan 250061, China
2 Center for Intelligent Medicine, Shandong University, Jinan 250061, China
Abstract:

Drug discovery is important but costly in both labor and resources. Computational prediction of drug-protein binding affinity can substantially accelerate it. The key to drug-target affinity prediction is an accurate and detailed representation of the drug and the protein. This paper proposes a drug-target affinity prediction model based on deep learning and multi-layered information fusion, which aims to improve prediction performance by integrating multiple levels of information about drugs and proteins. First, each drug is represented both as a molecular graph and as an extended-connectivity fingerprint (ECFP); the graph is learned by a graph convolutional network (GCN) module and the fingerprint by fully connected (FC) layers. Second, the protein sequence and the protein K-mer features are fed into a convolutional neural network (CNN) module and FC layers, respectively, to learn latent protein features. Finally, the features learned by the four channels are concatenated, and FC layers produce the affinity prediction. The effectiveness of the method is verified on two benchmark drug-target affinity datasets and compared with existing models. The proposed model achieves better prediction performance than the baseline models, which indicates that integrating multi-layered drug and protein information is an effective strategy for drug-target affinity prediction.

Key words: Drug-target affinity; Drug; Protein; Deep learning; Multi-layered information fusion
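
The K-mer protein channel named in the abstract can be illustrated with a small sketch. The abstract does not state the value of K, the amino-acid alphabet, or the normalization, so the 20-letter alphabet, the frequency normalization, the default `k=3`, and the function name `kmer_features` below are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

# Standard 20 amino acids in alphabetical order (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=3):
    """Return a fixed-length vector of K-mer frequencies for a protein sequence.

    The vector has len(AMINO_ACIDS) ** k entries, one per possible K-mer,
    in lexicographic order. K-mers containing letters outside the assumed
    alphabet are skipped.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    # Slide a window of width k over the sequence and count each K-mer.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer in index:
            counts[index[kmer]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(vocab)
    # Normalize counts to frequencies so sequences of different length are comparable.
    return [c / total for c in counts]


# Toy usage: a 4-residue sequence with k=2 yields a 400-dimensional vector.
vec = kmer_features("ACAC", k=2)
print(len(vec))
```

A fixed-length vector of this kind is what makes the K-mer channel suitable for fully connected layers, whereas the raw sequence channel goes through a CNN.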
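Received: 2021-06-16  Published: 2021-12-01
CLC number: Q819
Funding: * National Key Research and Development Program of China (2020YFA0712402); National Natural Science Foundation of China (61973190)
Corresponding author: LIU Zhi-ping  E-mail: zpliu@sdu.edu.cn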
Cite this article:

TANG Yue-wei, LIU Zhi-ping. Drug-target Affinity Prediction Based on Deep Learning and Multi-layered Information Fusion. China Biotechnology, 2021, 41(11): 40-47.

Link to this article:

https://manu60.magtech.com.cn/biotech/CN/10.13523/j.cb.2106027
https://manu60.magtech.com.cn/biotech/CN/Y2021/V41/I11/40

Dataset   Proteins   Drugs    Interactions   Density/%
Davis     442        68       30 056         100
KIBA      229        2 111    118 254        24.4
Table 1  The Davis and KIBA datasets
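
The density column is consistent with the share of all possible protein-drug pairs that have a measured affinity. The page does not state this formula, so the definition and the function name `interaction_density` below are assumptions; the counts are copied from the table.

```python
def interaction_density(n_proteins, n_drugs, n_interactions):
    """Percentage of protein-drug pairs that have a measured affinity."""
    return 100.0 * n_interactions / (n_proteins * n_drugs)


# (proteins, drugs, interactions) as listed for each dataset.
datasets = {
    "Davis": (442, 68, 30056),
    "KIBA": (229, 2111, 118254),
}

for name, (proteins, drugs, interactions) in datasets.items():
    density = interaction_density(proteins, drugs, interactions)
    print(f"{name}: {density:.2f}%")
```

Under this definition Davis is fully populated, while KIBA comes out at roughly 24.46%, which agrees with the listed 24.4 if the published figure was truncated rather than rounded.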
Figure 1  Model framework
Method                 Drug            Protein        Evaluation
                       Graph   ECFP    PS    K-mer    CI      MSE
Proposed model   M1    0       1       0     1        0.886   0.240
                 M2    0       1       1     0        0.887   0.250
                 M3    1       0       0     1        0.868   0.287
                 M4    1       0       1     0        0.876   0.282
                 M5    0       1       1     1        0.889   0.241
                 M6    1       0       1     1        0.873   0.281
                 M7    1       1       0     1        0.884   0.245
                 M8    1       1       1     0        0.882   0.254
                 M9    1       1       1     1        0.885   0.246
Table 2  Mean CI/MSE scores of the proposed model's predictions on the independent test set of the Davis dataset
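
The two evaluation columns, CI (concordance index) and MSE (mean squared error), can be computed as in the sketch below. The page itself does not give the formulas; the CI shown here is the pairwise definition commonly used in drug-target affinity work (the fraction of comparable pairs whose predicted order matches the true order, with prediction ties counted as one half), and it is assumed rather than confirmed to be the exact variant used by the authors. The function names are illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that the predictions rank correctly.

    A pair (i, j) is comparable when y_true[i] > y_true[j]. It scores 1 if
    y_pred[i] > y_pred[j], 0.5 if the predictions tie, and 0 otherwise.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                comparable += 1
                diff = y_pred[i] - y_pred[j]
                if diff > 0:
                    concordant += 1.0
                elif diff == 0:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Toy usage: one of the three comparable pairs is ranked in the wrong order.
truth = [1.0, 2.0, 3.0]
preds = [1.0, 3.0, 2.0]
print(concordance_index(truth, preds), mean_squared_error(truth, preds))
```

A higher CI and a lower MSE both indicate better predictions, which is how the rows in these tables should be read. The double loop is quadratic in the number of samples, so a vectorized or sorted implementation would be preferable on datasets of this size.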
Method                 Drug            Protein        Evaluation
                       Graph   ECFP    PS    K-mer    CI      MSE
Proposed model   M1    0       1       0     1        0.879   0.152
                 M2    0       1       1     0        0.880   0.154
                 M3    1       0       0     1        0.866   0.167
                 M4    1       0       1     0        0.869   0.165
                 M5    0       1       1     1        0.882   0.147
                 M6    1       0       1     1        0.873   0.163
                 M7    1       1       0     1        0.877   0.157
                 M8    1       1       1     0        0.880   0.152
                 M9    1       1       1     1        0.882   0.149
Table 3  Mean CI/MSE scores of the proposed model's predictions on the independent test set of the KIBA dataset
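
The 0/1 entries under Graph, ECFP, PS and K-mer appear to mark which input channels each variant M1 to M9 uses; that reading is an assumption, since the page gives no legend. Under it, the fusion step described in the abstract (concatenating the features learned by the active channels) can be sketched as follows. The toy feature vectors stand in for the outputs of the GCN, FC and CNN modules, and the names `CHANNELS`, `VARIANTS` and `fuse` are hypothetical.

```python
CHANNELS = ("graph", "ecfp", "ps", "kmer")

# Flags copied from the Graph, ECFP, PS and K-mer columns of the ablation tables.
VARIANTS = {
    "M1": (0, 1, 0, 1),
    "M2": (0, 1, 1, 0),
    "M3": (1, 0, 0, 1),
    "M4": (1, 0, 1, 0),
    "M5": (0, 1, 1, 1),
    "M6": (1, 0, 1, 1),
    "M7": (1, 1, 0, 1),
    "M8": (1, 1, 1, 0),
    "M9": (1, 1, 1, 1),
}


def fuse(features, flags):
    """Concatenate the feature vectors of the enabled channels, in CHANNELS order."""
    fused = []
    for name, enabled in zip(CHANNELS, flags):
        if enabled:
            fused.extend(features[name])
    return fused


# Placeholder per-channel features; real ones would be learned representations.
toy = {
    "graph": [0.1, 0.2],
    "ecfp": [0.3, 0.4, 0.5],
    "ps": [0.6, 0.7],
    "kmer": [0.8],
}

fused_m5 = fuse(toy, VARIANTS["M5"])
print(len(fused_m5))  # 6: ECFP (3) + PS (2) + K-mer (1)
```

In a full model the fused vector would be passed to fully connected layers that output the predicted affinity; that part is omitted here.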
Method           Drug          Protein       CI      MSE
KronRLS          PubChem Sim   S-W           0.871   0.379
SimBoost         PubChem Sim   S-W           0.872   0.282
DeepDTA          LS            PS            0.878   0.261
WideDTA          LS+LMCS       PS+PDM        0.886   0.262
Proposed model   ECFP          PS+K-mer      0.889   0.241
Table 4  CI/MSE scores of the proposed model and the baseline models on the Davis dataset
Method           Drug          Protein       CI      MSE
KronRLS          PubChem Sim   S-W           0.782   0.411
SimBoost         PubChem Sim   S-W           0.836   0.222
DeepDTA          LS            PS            0.863   0.194
WideDTA          LS+LMCS       PS+PDM        0.875   0.194
Proposed model   ECFP          PS+K-mer      0.882   0.147
Table 5  CI/MSE scores of the proposed model and the baseline models on the KIBA dataset
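
To put the MSE gaps in the two comparison tables on a common scale, the relative reduction against a baseline can be computed directly from the listed values. This is plain arithmetic on the published numbers, not a statistic reported by the authors, and the helper name `relative_mse_reduction` is illustrative.

```python
def relative_mse_reduction(baseline_mse, model_mse):
    """Percentage by which model_mse is lower than baseline_mse."""
    return 100.0 * (baseline_mse - model_mse) / baseline_mse


# (WideDTA MSE, proposed-model MSE) as listed for each dataset.
comparisons = {
    "Davis": (0.262, 0.241),
    "KIBA": (0.194, 0.147),
}

for name, (baseline, proposed) in comparisons.items():
    reduction = relative_mse_reduction(baseline, proposed)
    print(f"{name}: {reduction:.1f}% lower MSE than WideDTA")
```

On these numbers the reduction relative to WideDTA is about 8% on Davis and about 24% on KIBA, while the CI gains are small on both datasets (0.003 and 0.007).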