Comparison of the Performance of Risk Prediction Models for Type 2 Diabetes
GUO Jin-dan1, GAO Yan-yan2, GAO Huai-lin3,**, CHEN Yu-bao1,**
1 Institute of Laboratory Animal Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, National Human Diseases Animal Model Resource Center, NHC Key Laboratory of Human Disease Comparative Medicine, Beijing 100021, China
2 General Internal Medicine Department, Majiagou Hospital, Kailuan Medical Health Group, Tangshan 063006, China
3 Diabetes Research Institute of Hebei Yiling Hospital, Shijiazhuang 050090, China
Abstract Objective: To compare the performance and application value of five common machine learning algorithms in risk prediction models for type 2 diabetes mellitus (T2DM). Methods: Public diabetes datasets, including the Pima Indians diabetes dataset (PIDD), were used to build models with five common algorithms: logistic regression (LR), support vector machine (SVM), decision tree (DT), naive Bayes (NB) and k-nearest neighbor (KNN). Models were trained with different training set proportions under repeated random sampling, and accuracy and stability were the main criteria for comparison. Results: The outcomes of all five models were closely related to sample size. A training set proportion of 0.8 to 0.85 is recommended, and a certain amount of missing values can be tolerated. Prediction performance was also closely related to how the datasets were sampled, and the LR, SVM and NB methods showed better prediction performance. Conclusions: The prediction models differed significantly in performance, with the LR model performing best and showing clear advantages over the other models. These findings can guide the choice of base algorithm for a clinical T2DM risk prediction model.
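The evaluation protocol described in the Methods (varying the training set proportion, repeating random splits, and summarizing accuracy and stability) can be sketched roughly as follows. This is only an illustration under stated assumptions and not the authors' implementation: the data are synthetic rather than PIDD, a toy k-nearest neighbor classifier stands in for the five models, the standard deviation of accuracy across repeats is used as one possible reading of "stability", and the names `make_synthetic_data`, `knn_predict` and `evaluate` are hypothetical.

```python
import random
import statistics


def make_synthetic_data(n=300, seed=0):
    """Two noisy 2-D clusters; an assumed stand-in for a real dataset such as PIDD."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 * label  # class 0 near (0, 0), class 1 near (2, 2)
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k=5):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def evaluate(data, train_fraction, repeats=20, seed=0):
    """Repeat random train/test splits; return (mean accuracy, standard deviation)."""
    rng = random.Random(seed)
    n_train = int(len(data) * train_fraction)
    accuracies = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        correct = sum(knn_predict(train, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    # Mean accuracy measures performance; the spread across repeats
    # is used here as an assumed proxy for stability.
    return statistics.mean(accuracies), statistics.stdev(accuracies)


if __name__ == "__main__":
    data = make_synthetic_data()
    for fraction in (0.6, 0.7, 0.8, 0.85, 0.9):
        mean_acc, sd = evaluate(data, fraction)
        print(f"train fraction {fraction:.2f}: accuracy {mean_acc:.3f} +/- {sd:.3f}")
```

Swapping `knn_predict` for another classifier with the same call shape would allow several models to be compared under identical splits, which is the kind of comparison the abstract reports.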
Received: 30 May 2023
Published: 01 December 2023
Corresponding Authors:
**Huai-lin GAO, Yu-bao CHEN
E-mail: chenyubao@cnilas.org; gaohuailin@126.com