Research Progress of Drug Target Interaction Prediction Based on Machine Learning
LIU Hao-miao, YANG Zhi-wei, WANG Li-zhuo, ZHOU Yan-zhang, LONG Jian-gang
Center of Mitochondrial Biology and Medicine, Key Laboratory of Biomedical Information Engineering, Ministry of Education, School of Life Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China |
Abstract In recent years, continued advances in computer hardware, software efficiency, and data availability have steadily broadened the applications of artificial intelligence, particularly machine learning, and have greatly promoted progress in biology, medicine, pharmacy, and especially drug R&D. Within drug R&D, the identification of drug-target interactions (DTI) is an important problem and a popular direction for interdisciplinary research involving artificial intelligence. As a starting point for innovative drug development, drug-target interaction prediction can supply high-probability candidate drug targets for biological experiments, thereby raising the rate of lead compound discovery, improving the success rate of late-stage drug development, and shortening the overall development cycle. Researchers have already done extensive work on prediction methods for drug-target interactions by building databases, developing software, and establishing machine learning algorithms. In most studies, data are transformed into feature vectors or similarity measures, and suitable machine learning methods are then used to build predictive models. This paper introduces the basic workflow of machine-learning-based drug-target interaction prediction and reviews its research progress. In addition, the advantages and disadvantages of existing prediction methods are briefly summarized to facilitate the development of more efficient prediction algorithms and methods.
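
The general idea of turning data into similarities and then scoring candidate interactions can be illustrated with a minimal Python sketch. It is not a method taken from this paper: the set-of-on-bits fingerprint representation, the nearest-neighbour scoring rule, the function names, and the toy data are all assumptions chosen only for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Assumption: each drug is described by the indices of the "on" bits
# of some binary structural fingerprint.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score_pair(
    drug: str,
    target: str,
    fingerprints: Dict[str, Fingerprint],
    known: Set[Tuple[str, str]],
) -> float:
    """Score a candidate pair by the most similar drug already known to hit the target."""
    query = fingerprints[drug]
    neighbours = [d for (d, t) in known if t == target and d != drug]
    return max((tanimoto(query, fingerprints[d]) for d in neighbours), default=0.0)


# Toy data, invented for illustration only.
fingerprints = {
    "drug_A": frozenset({1, 2, 3, 4}),
    "drug_B": frozenset({1, 2, 3, 5}),
    "drug_C": frozenset({7, 8, 9}),
}
known = {("drug_A", "target_X"), ("drug_C", "target_Y")}

score_x = score_pair("drug_B", "target_X", fingerprints, known)
score_y = score_pair("drug_B", "target_Y", fingerprints, known)
print(score_x, score_y)
```

In this toy case drug_B shares most of its fingerprint bits with drug_A, so the pair (drug_B, target_X) scores higher than (drug_B, target_Y). A practical model would replace the nearest-neighbour rule with a trained classifier and would also use target-side features.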
Received: 18 November 2021
Published: 05 May 2022
Corresponding Authors:
Zhi-wei YANG
E-mail: yzws-123@xjtu.edu.cn