
China Biotechnology (中国生物工程杂志), 2022, Vol. 42, Issue 4: 40-48. DOI: 10.13523/j.cb.2111037

Research Progress of Drug Target Interaction Prediction Based on Machine Learning
LIU Hao-miao, YANG Zhi-wei**, WANG Li-zhuo, ZHOU Yan-zhang, LONG Jian-gang
Center of Mitochondrial Biology and Medicine, Key Laboratory of Biomedical Information Engineering, Ministry of Education, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China

Abstract  

In recent years, continued advances in computer hardware, software efficiency, and data availability have steadily broadened the applications of artificial intelligence, particularly machine learning, and have greatly promoted progress in biology, medicine, pharmacy, and especially drug R&D. Within drug R&D, the identification of drug-target interactions (DTIs) is an important problem and a popular direction for interdisciplinary work with artificial intelligence. As the starting point of innovative drug development, DTI prediction can supply high-probability candidate drug targets for biological experiments, thereby raising the rate of lead compound discovery, improving the success rate of late-stage drug development, and shortening the overall development cycle. Researchers have already done extensive work on DTI prediction methods by building databases, developing software, and designing machine learning algorithms. In most studies, the data are first transformed into feature vectors or similarities, and suitable machine learning methods are then used to build predictive models. This paper introduces the basic workflow of machine learning-based DTI prediction and reviews its research progress. It also briefly summarizes the advantages and disadvantages of existing prediction methods, in order to facilitate the development of more efficient prediction algorithms.



Key words: Machine learning; Drug-target interaction; Drug research; Algorithm
Received: 18 November 2021      Published: 05 May 2022
ZTFLH:  Q819  
Corresponding Authors: Zhi-wei YANG     E-mail: yzws-123@xjtu.edu.cn
Cite this article:

LIU Hao-miao,YANG Zhi-wei,WANG Li-zhuo,ZHOU Yan-zhang,LONG Jian-gang. Research Progress of Drug Target Interaction Prediction Based on Machine Learning. China Biotechnology, 2022, 42(4): 40-48.

URL:

https://manu60.magtech.com.cn/biotech/10.13523/j.cb.2111037     OR     https://manu60.magtech.com.cn/biotech/Y2022/V42/I4/40

Fig.1 Technical roadmap of machine learning applied to DTI prediction
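Database | Link | Description
DrugBank[7] | https://go.drugbank.com | Contains detailed drug data and comprehensive drug target information; one of the most popular databases
PubChem[8] | https://pubchem.ncbi.nlm.nih.gov | A collection of compounds and their associated activities; supports complex queries and download of search results
TTD[9] | https://db.idrblab.net/ttd | Records information on known protein and nucleic acid targets, together with the diseases these targets address, the related pathways, and the corresponding interacting drug-target molecules
BindingDB[10] | https://www.bindingdb.org/bind/index.jsp | Provides a large volume of activity data for small molecules acting on many kinds of proteins, including interaction affinity information
KEGG DRUG[11] | https://www.genome.jp/kegg/drug | A collection of genomes and biological pathways, containing information on various diseases, drugs, and compounds
ChEMBL[12] | https://www.ebi.ac.uk/chembl | Contains detailed information on bioactive molecules with drug-like properties and provides bioactivity data against drug targets
STITCH[13] | https://stitch.embl.de | Stores interaction information between proteins and small molecules, collected from other databases and the literature
ZINC[14] | https://zinc.docking.org | Provides purchasing information, targets, clinical trial data, and related information for compounds, and includes a target prediction function
DGIdb[15] | https://dgidb.genome.wustl.edu/ | Contains information on drug-gene interactions; interacting drugs can be found by entering a gene, and interacting genes by entering a drug
BRENDA[16] | https://www.brenda-enzymes.org | A comprehensive enzyme database containing a large number of enzymes and the corresponding enzyme-ligand information
UniProt[17] | https://www.uniprot.org | A protein database containing information on protein sequences and their biological functions
SIDER[18] | https://sideeffects.embl.de | Integrates data on drugs, targets, and drug side effects to give a comprehensive view of drug action and adverse reactions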
 
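Tool | Link | Description
CDK[24] | http://cdk.github.io/ | The software should be installed under Linux; can compute 16 types of molecular fingerprints
PaDEL[25] | http://www.yapcwsoft.com/dd/padeldescriptor | Software for calculating molecular descriptors and fingerprints; can compute 12 types of fingerprints
RDKit[26] | http://www.rdkit.org | A toolkit that generates various descriptors for compounds; runs on a range of operating systems
ChemDes[27] | http://www.scbdd.com/chemdes/list-fingerprints/ | A web platform providing format conversion, descriptor calculation, fingerprint generation, similarity calculation, and related functions
Rcpi[28] | http://bioconductor.org/packages/release/bioc/html/Rcpi.html | Used for complex representation of drugs, proteins, and their interactions; computes various chemical, physicochemical, and structural descriptors
PyDPI[29] | http://sourceforge.net/projects/pydpicao/ | Designed for DTI studies; can compute molecular descriptors of drugs and structural and physicochemical properties of proteins
Table 2 Tools for calculating drug and protein descriptors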
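
Fingerprints such as those produced by the tools in Table 2 are commonly compared with a similarity coefficient before being passed to a learning method. The following is a minimal sketch, not taken from the paper, that assumes each fingerprint is represented as a set of "on" bit indices and uses the Tanimoto coefficient as the similarity measure. The function names and the example fingerprints are hypothetical and are not output from any listed tool.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similarity_matrix(fingerprints):
    """Pairwise Tanimoto similarities for a list of fingerprints."""
    n = len(fingerprints)
    return [[tanimoto(fingerprints[i], fingerprints[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical on-bit indices for three drugs (illustrative values only).
drug_fps = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3}]
S = similarity_matrix(drug_fps)
```

Here `S` is symmetric with ones on the diagonal, and could serve as a drug-drug similarity input for similarity-based prediction methods.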
Fig.2 Binary label matrix R
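
As a rough illustration of a binary label matrix like the one named in Fig.2, the sketch below builds R from a list of known drug-target pairs. It assumes the common convention that rows index drugs, columns index targets, and R[i][j] = 1 marks a known interaction. The layout, the function name, and the identifiers D1, T1, and so on are assumptions for illustration, not details given in the source.

```python
def build_label_matrix(drugs, targets, known_pairs):
    """Return R as a list of rows; R[i][j] is 1 if (drugs[i], targets[j]) is known."""
    drug_idx = {d: i for i, d in enumerate(drugs)}
    target_idx = {t: j for j, t in enumerate(targets)}
    # Start with all zeros: a zero means "no recorded interaction".
    R = [[0] * len(targets) for _ in drugs]
    for d, t in known_pairs:
        R[drug_idx[d]][target_idx[t]] = 1
    return R


# Hypothetical drug and target identifiers (illustrative only).
drugs = ["D1", "D2", "D3"]
targets = ["T1", "T2"]
known_pairs = [("D1", "T1"), ("D3", "T2")]
R = build_label_matrix(drugs, targets, known_pairs)
```

A zero entry in such a matrix usually means that no interaction has been recorded, not that the pair has been shown experimentally not to interact.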
[1]   Adams C P, Brantner V V. Estimating the cost of new drug development: is it really $802 million? Health Affairs, 2006, 25(2): 420-428.
doi: 10.1377/hlthaff.25.2.420
[2]   Chen S C, Zhu Y L, Zhang D Q, et al. Feature extraction approaches based on matrix pattern: MatPCA and MatFLDA. Pattern Recognition Letters, 2005, 26(8): 1157-1167.
doi: 10.1016/j.patrec.2004.10.009
[3]   Dejori M, Schuermann B, Stetter M. Hunting drug targets by systems-level modeling of gene expression profiles. IEEE Transactions on Nanobioscience, 2004, 3(3): 180-191.
doi: 10.1109/TNB.2004.833690
[4]   Russ A P, Lampel S. The druggable genome: an update. Drug Discovery Today, 2005, 10(23-24): 1607-1610.
doi: 10.1016/S1359-6446(05)03666-4
[5]   Li Z P, Wang R S, Zhang X S. Two-stage flux balance analysis of metabolic networks for drug target identification. BMC Systems Biology, 2011, 5(Suppl 1): S11.
[6]   Chatr-Aryamontri A, Ceol A, Palazzi L M, et al. MINT: the molecular INTeraction database. Nucleic Acids Research, 2007, 35(Database): D572-D574.
[7]   Wishart D S, Knox C, Guo A C, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Research, 2006, 34(suppl_1): D668-D672.
doi: 10.1093/nar/gkj067
[8]   Kim S, Thiessen P A, Bolton E E, et al. PubChem substance and compound databases. Nucleic Acids Research, 2015, 44(D1): D1202-D1213.
doi: 10.1093/nar/gkv951
[9]   Chen X, Ji Z L, Chen Y Z. TTD: therapeutic target database. Nucleic Acids Research, 2002, 30(1): 412-415.
[10]   Liu T Q, Lin Y, Wen X, et al. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Research, 2006, 35(suppl_1): D198-D201.
[11]   Kanehisa M, Furumichi M, Tanabe M, et al. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research, 2016, 45(D1): D353-D361.
doi: 10.1093/nar/gkw1092
[12]   Gaulton A, Bellis L J, Bento A P, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Research, 2011, 40(D1): D1100-D1107.
[13]   Szklarczyk D, Santos A, von Mering C, et al. STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic Acids Research, 2015, 44(D1): D380-D384.
doi: 10.1093/nar/gkv1277
[14]   Sterling T, Irwin J J. ZINC 15-ligand discovery for everyone. Journal of Chemical Information and Modeling, 2015, 55(11): 2324-2337.
doi: 10.1021/acs.jcim.5b00559 pmid: 26479676
[15]   Cotto K C, Wagner A H, Feng Y Y, et al. DGIdb 3.0: a redesign and expansion of the drug-gene interaction database. Nucleic Acids Research, 2018, 46(D1): D1068-D1073.
[16]   Schomburg I, Chang A, Ebeling C, et al. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Research, 2004, 32(suppl_1): D431-D433.
[17]   Consortium U. UniProt: a hub for protein information. Nucleic Acids Research, 2015, 43(Database issue): D204-D212.
doi: 10.1093/nar/gku989
[18]   Kuhn M, Letunic I, Jensen L J, et al. The SIDER database of drugs and side effects. Nucleic Acids Research, 2016, 44(D1): D1075-D1079.
[19]   Pozzan A. Molecular descriptors and methods for ligand based virtual high throughput screening in drug discovery. Current Pharmaceutical Design, 2006, 12(17): 2099-2110.
doi: 10.2174/138161206777585247
[20]   Chen I J, Hubbard R E. Lessons for fragment library design: analysis of output from multiple screening campaigns. Journal of Computer-Aided Molecular Design, 2009, 23(8): 603-620.
doi: 10.1007/s10822-009-9280-5 pmid: 19495994
[21]   Feng H W, Zhang L, Li S M, et al. Predicting the reproductive toxicity of chemicals using ensemble learning methods and molecular fingerprints. Toxicology Letters, 2021, 340: 4-14.
doi: 10.1016/j.toxlet.2021.01.002
[22]   Batista J, Godden J W, Bajorath J. Assessment of molecular similarity from the analysis of randomly generated structural fragment populations. Journal of Chemical Information and Modeling, 2006, 46(5): 1937-1944.
doi: 10.1021/ci0601261 pmid: 16995724
[23]   Biasini M, Bienert S, Waterhouse A, et al. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Research, 2014, 42(Web Server issue): W252-W258.
doi: 10.1093/nar/gku340
[24]   Steinbeck C, Han Y Q, Kuhn S, et al. The chemistry development kit (CDK): an open-source Java library for chemo- and bioinformatics. ChemInform, 2003, 34(21): 493-500.
[25]   Yap C W. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. Journal of Computational Chemistry, 2011, 32(7): 1466-1474.
doi: 10.1002/jcc.21707
[26]   Lovrić M, Molero J M, Kern R. PySpark and RDKit: moving towards big data in cheminformatics. Molecular Informatics, 2019, 38(6): 1800082.
doi: 10.1002/minf.201800082
[27]   Dong J, Cao D S, Miao H Y, et al. ChemDes: an integrated web-based platform for molecular descriptor and fingerprint computation. Journal of Cheminformatics, 2015, 7: 60.
doi: 10.1186/s13321-015-0109-z pmid: 26664458
[28]   Cao D S, Xiao N, Xu Q S, et al. Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions. Bioinformatics, 2014, 31(2): 279-281.
doi: 10.1093/bioinformatics/btu624
[29]   Cao D S, Liang Y Z, Yan J, et al. PyDPI: freely available Python package for chemoinformatics, bioinformatics, and chemogenomics studies. Journal of Chemical Information and Modeling, 2013, 53(11): 3086-3096.
doi: 10.1021/ci400127q
[30]   Johnson M, Maggiora G. Concepts and applications of molecular similarity. New York: Wiley Interscience, 1990.
[31]   González-Díaz H, Prado-Prado F, García-Mera X, et al. MIND-BEST: web server for drugs and target discovery; design, synthesis, and assay of MAO-B inhibitors and theoretical-experimental study of G3PDH protein from Trichomonas gallinae. Journal of Proteome Research, 2011, 10(4): 1698-1718.
doi: 10.1021/pr101009e pmid: 21184613
[32]   Shoichet B K, Kuntz I D, Bodian D L. Molecular docking using shape descriptors. Journal of Computational Chemistry, 1992, 13(3): 380-397.
doi: 10.1002/jcc.540130311
[33]   Chen X, Liu X E, Wu J. Research progress on drug representation learning. Journal of Tsinghua University (Science and Technology), 2020(2): 171-180.
[34]   Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. CoRR, 2012, abs/1201.0490.
[35]   Quinlan J R. Induction of decision trees. Machine Learning, 1986, 1(1): 81-106.
[36]   Deb K, Pratap A, Agarwal S, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 2002, 6(2): 182-197.
doi: 10.1109/4235.996017
[37]   Mountrakis G, Im J, Ogole C. Support vector machines in remote sensing: a review. ISPRS Journal of Photogrammetry and Remote Sensing, 2011, 66(3): 247-259.
doi: 10.1016/j.isprsjprs.2010.11.001
[38]   Biau G. Analysis of a random forests model. Journal of Machine Learning Research, 2012, 13: 1063-1095.
[39]   Peduzzi P, Concato J, Kemper E, et al. A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 1996, 49(12): 1373-1379.
doi: 10.1016/s0895-4356(96)00236-3 pmid: 8970487
[40]   Srivastava N, Hinton G E, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[41]   Wu Z R, Li W H, Liu G X, et al. Network-based methods for prediction of drug-target interactions. Frontiers in Pharmacology, 2018, 9: 1134.
doi: 10.3389/fphar.2018.01134
[42]   Zeng X X, Zhu S Y, Liu X R, et al. deepDR: a network-based deep learning approach to in silico drug repositioning. Bioinformatics, 2019, 35(24): 5191-5198.
doi: 10.1093/bioinformatics/btz418
[43]   Zhang R L, Ding Y R. Identification of key features of CNS drugs based on SVM and greedy algorithm. Current Computer-Aided Drug Design, 2020, 16(6): 725-733.
doi: 10.2174/1573409915666191212095340
[44]   Madhukar N S, Khade P K, Huang L, et al. A Bayesian machine learning approach for drug target identification using diverse data types. Nature Communications, 2019, 10: 5221.
doi: 10.1038/s41467-019-12928-6 pmid: 31745082
[45]   Mahmud S M H, Chen W Y, Liu Y S, et al. PreDTIs: prediction of drug-target interactions based on multiple feature information using gradient boosting framework with data balancing and feature selection techniques. Briefings in Bioinformatics, 2021, 22(5): bbab046.
doi: 10.1093/bib/bbab046
[46]   Piazza I, Beaton N, Bruderer R, et al. A machine learning-based chemoproteomic approach to identify drug targets and binding sites in complex proteomes. Nature Communications, 2020, 11: 4200.
doi: 10.1038/s41467-020-18071-x
[47]   Chu Y Y, Kaushik A C, Wang X G, et al. DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features. Briefings in Bioinformatics, 2021, 22(1): 451-462.
doi: 10.1093/bib/bbz152
[48]   Li Y, Liu X Z, You Z H, et al. A computational approach for predicting drug-target interactions from protein sequence and drug substructure fingerprint information. International Journal of Intelligent Systems, 2021, 36(1): 593-609.
doi: 10.1002/int.22332
[49]   Sachdev K, Gupta M K. A comprehensive review of feature based methods for drug target interaction prediction. Journal of Biomedical Informatics, 2019, 93: 103159.
doi: 10.1016/j.jbi.2019.103159
[50]   Li X Y, Li W K, Zeng M, et al. Network-based methods for predicting essential genes or proteins: a survey. Briefings in Bioinformatics, 2020, 21(2): 566-583.
doi: 10.1093/bib/bbz017
[51]   Huang K, Xiao C, Glass L M, et al. SkipGNN: predicting molecular interactions with skip-graph networks. Scientific Reports, 2020, 10: 21092.
doi: 10.1038/s41598-020-77766-9
[52]   Parvizi P, Azuaje F, Theodoratou E, et al. A network-based embedding method for drug-target interaction prediction. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2020, 2020: 5304-5307.
[53]   Yue Y, He S. DTI-HeNE: a novel method for drug-target interaction prediction based on heterogeneous network embedding. BMC Bioinformatics, 2021, 22(1): 418.
doi: 10.1186/s12859-021-04327-w
[54]   Wan F P, Hong L X, Xiao A, et al. NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions. Bioinformatics, 2018, 35(1): 104-111.
doi: 10.1093/bioinformatics/bty543
[55]   Mohamed S K, Nováček V, Nounu A. Discovering protein drug targets using knowledge graph embeddings. Bioinformatics, 2019, 36(2): 603-610.
[56]   Shang Y F, Gao L, Zou Q, et al. Prediction of drug-target interactions based on multi-layer network representation learning. Neurocomputing, 2021, 434: 80-89.
doi: 10.1016/j.neucom.2020.12.068
[57]   Zhao T Y, Hu Y, Valsdottir L R, et al. Identifying drug-target interactions based on graph convolutional network and deep neural network. Briefings in Bioinformatics, 2020, 22(2): 2141-2150.
doi: 10.1093/bib/bbaa044
[58]   Xu X, Xuan P, Zhang T, et al. Inferring drug-target interactions based on random walk and convolutional neural network. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021.
doi: 10.1109/TCBB.2021.3066813
[59]   Lee D D, Seung H S. Learning the parts of objects by non-negative matrix factorization. Nature, 1999, 401 (6755): 788-791.
doi: 10.1038/44565
[60]   Stokes J M, Yang K, Swanson K, et al. A deep learning approach to antibiotic discovery. Cell, 2020, 180(4): 688-702.e13.
doi: 10.1016/j.cell.2020.01.021
[61]   Meng Y J, Jin M, Tang X F, et al. Drug repositioning based on similarity constrained probabilistic matrix factorization: COVID-19 as a case study. Applied Soft Computing, 2021, 103: 107135.
doi: 10.1016/j.asoc.2021.107135
[62]   Bagherian M, Kim R B, Jiang C, et al. Coupled matrix-matrix and coupled tensor-matrix completion methods for predicting drug-target interactions. Briefings in Bioinformatics, 2020, 22(2): 2161-2171.
doi: 10.1093/bib/bbaa025 pmid: 32186716
[63]   Yang M Y, Wu G Y, Zhao Q C, et al. Computational drug repositioning based on multi-similarities bilinear matrix factorization. Briefings in Bioinformatics, 2020, 22(4): bbaa267.
doi: 10.1093/bib/bbaa267
[64]   Ceddia G, Pinoli P, Ceri S, et al. Matrix factorization-based technique for drug repurposing predictions. IEEE Journal of Biomedical and Health Informatics, 2020, 24(11): 3162-3172.
doi: 10.1109/JBHI.2020.2991763
[65]   Hao M, Bryant S H, Wang Y. Predicting drug-target interactions by dual-network integrated logistic matrix factorization. Scientific Reports, 2017, 7: 40376.
doi: 10.1038/srep40376
[66]   Wang M H, Tang C, Chen J J. Drug-target interaction prediction via dual Laplacian graph regularized matrix completion. BioMed Research International, 2018, 2018: 1425608.
[67]   Peng Y H, Gao P P, Shi L, et al. Central and peripheral metabolic defects contribute to the pathogenesis of Alzheimer’s disease: targeting mitochondria for diagnosis and prevention. Antioxidants & Redox Signaling, 2020, 32(16): 1188-1236.
[68]   Hao J J, Shen W L, Tian C, et al. Mitochondrial nutrients improve immune dysfunction in the type 2 diabetic Goto-Kakizaki rats. Journal of Cellular and Molecular Medicine, 2009, 13(4): 701-711.
doi: 10.1111/j.1582-4934.2008.00342.x