China Biotechnology
China Biotechnology  2023, Vol. 43 Issue (11): 35-42    DOI: 10.13523/j.cb.2305049
    
Comparison on the Performance of Risk Prediction Models for Type 2 Diabetes
GUO Jin-dan1, GAO Yan-yan2, GAO Huai-lin3,**, CHEN Yu-bao1,**
1 Institute of Laboratory Animal Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, National Human Diseases Animal Model Resource Center, NHC Key Laboratory of Human Disease Comparative Medicine, Beijing 100021, China
2 General Internal Medicine Department, Majiagou Hospital, Kailuan Medical Health Group, Tangshan 063006, China
3 Diabetes Research Institute of Hebei Yiling Hospital, Shijiazhuang 050090, China

Abstract  Objective: To compare the performance and application value of five common machine learning algorithms in risk prediction models for type 2 diabetes mellitus (T2DM). Methods: Public diabetes datasets, including the Pima Indians diabetes dataset (PIDD), were used to build models with five common algorithms: logistic regression (LR), support vector machine (SVM), decision tree (DT), naive Bayes (NB) and k-nearest neighbor (KNN). Different training set proportions were set and random repeated sampling was performed. Accuracy and stability were the main criteria for comparing the models. Results: The comparison of the five models showed that their results were closely related to sample size. It is recommended that the training set proportion be in the range of 0.8 to 0.85; a certain amount of missing values can be tolerated. In addition, the predictive performance of a model was closely related to how the dataset was sampled, and the LR, SVM and NB methods gave better predictions. Conclusions: The prediction models differed significantly in performance, with the LR model performing best and showing clear advantages over the other models. These findings can serve as a guide when choosing the base algorithm for a clinical T2DM risk prediction model.
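
The evaluation protocol described in the Methods (repeated random sampling at several training set proportions, summarized by accuracy and its spread) can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic data, the nearest-centroid classifier standing in for LR/SVM/DT/NB/KNN, the 30 repeats, and all function names are assumptions made for the example.

```python
import random
import statistics


def nearest_centroid_fit_predict(train, test):
    """Toy stand-in classifier (assumption): label each test point by the closer class mean."""
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return [predict(x) for x, _ in test]


def repeated_split_accuracy(data, train_ratio, n_repeats, fit_predict, seed=0):
    """Shuffle, split by train_ratio, score accuracy; repeat and return (mean, standard error)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        train, test = shuffled[:cut], shuffled[cut:]
        preds = fit_predict(train, test)
        correct = sum(p == y for p, (_, y) in zip(preds, test))
        scores.append(correct / len(test))
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, se


def make_synthetic(n, seed=1):
    """Hypothetical two-class, two-feature data; not one of the paper's datasets."""
    rng = random.Random(seed)
    return [
        ([rng.gauss((i % 2) * 1.5, 1.0), rng.gauss((i % 2) * 1.5, 1.0)], i % 2)
        for i in range(n)
    ]


data = make_synthetic(400)
results = {}
for ratio in (0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9):
    results[ratio] = repeated_split_accuracy(data, ratio, 30, nearest_centroid_fit_predict)
    mean, se = results[ratio]
    print(f"TR={ratio:.2f}  mean accuracy={mean:.4f}  SE={se:.4f}")
```

Passing the classifier in as a `fit_predict` callable keeps the sampling loop independent of the algorithm, so the same loop could in principle wrap each of the five methods being compared.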

Key words: Type 2 diabetes mellitus; Prediction model; Performance comparison
Received: 30 May 2023      Published: 01 December 2023
CLC number: Q141
Corresponding Authors: **Huai-lin GAO, Yu-bao CHEN     E-mail: chenyubao@cnilas.org; gaohuailin@126.com
Cite this article:

GUO Jin-dan, GAO Yan-yan, GAO Huai-lin, CHEN Yu-bao. Comparison on the Performance of Risk Prediction Models for Type 2 Diabetes. China Biotechnology, 2023, 43(11): 35-42.

URL:

https://manu60.magtech.com.cn/biotech/10.13523/j.cb.2305049     OR     https://manu60.magtech.com.cn/biotech/Y2023/V43/I11/35

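| Variable | Mean (±SD) | Variable | Mean (±SD) |
|---|---|---|---|
| Age | 47.0 (±13.4) | Red blood cell count (RBC) | 5.0 (±0.5) |
| Aspartate aminotransferase (AST) | 27.0 (±13.9) | Hemoglobin (HB) | 150.0 (±16.3) |
| Alanine aminotransferase (ALT) | 28.2 (±23.1) | Hematocrit (HCT) | 0.45 (±0.04) |
| Alkaline phosphatase (ALP) | 87.7 (±25.2) | Mean corpuscular volume (MCV) | 89.3 (±4.3) |
| γ-Glutamyl transferase (γ-GT) | 40.0 (±42.0) | Mean corpuscular hemoglobin (MCH) | 30.0 (±1.9) |
| Total protein (TP) | 76.7 (±4.0) | Mean corpuscular hemoglobin concentration (MCHC) | 336.0 (±11.3) |
| Albumin (ALB) | 45.8 (±2.6) | Red cell distribution width (RDW) | 12.7 (±0.9) |
| Globulin (GLB) | 30.9 (±3.5) | Platelet count (PLT) | 250.3 (±58.6) |
| Albumin/globulin ratio (A/G) | 1.5 (±0.2) | Mean platelet volume (MPV) | 10.7 (±1.0) |
| Triglycerides (TG) | 1.9 (±1.8) | Platelet distribution width (PDW) | 13.3 (±2.1) |
| Total cholesterol (TC) | 5.2 (±1.0) | Plateletcrit (PCT) | 0.27 (±0.06) |
| High-density lipoprotein cholesterol (HDL-C) | 1.4 (±0.3) | Neutrophils (NE) | 56.6 (±7.7) |
| Low-density lipoprotein cholesterol (LDL-C) | 3.4 (±0.9) | Lymphocytes (LYM) | 33.8 (±7.2) |
| Urea (BUN) | 5.0 (±1.3) | Monocytes (MONO) | 6.9 (±1.6) |
| Creatinine (Cr) | 78.7 (±13.7) | Eosinophils (EOS) | 2.1 (±1.7) |
| Uric acid (UA) | 356.9 (±95.2) | Basophils (BAS) | 0.6 (±0.3) |
| White blood cell count (WBC) | 6.6 (±1.6) | | |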
Table 1 Information statistics of dataset 3
| Method | TR | Dataset 1 Mean | Dataset 1 SE | Dataset 1 10-fold | Dataset 2 Mean | Dataset 2 SE | Dataset 2 10-fold | Dataset 3 10-fold |
|---|---|---|---|---|---|---|---|---|
| LR | 0.6 | 0.7672 | 0.0060 | 0.7722 | 0.7636 | 0.0050 | 0.7726 | 0.9109 |
| | 0.65 | 0.7721 | 0.0062 | | 0.7626 | 0.0079 | | |
| | 0.7 | 0.7701 | 0.0070 | | 0.7594 | 0.0061 | | |
| | 0.75 | 0.7678 | 0.0082 | | 0.7691 | 0.0061 | | |
| | 0.8 | 0.7786 | 0.0074 | | 0.7693 | 0.0074 | | |
| | 0.85 | 0.7819 | 0.0060 | | 0.7583 | 0.0105 | | |
| | 0.9 | 0.7662 | 0.0116 | | 0.7611 | 0.0199 | | |
| SVM | 0.6 | 0.7523 | 0.0077 | 0.7604 | 0.7672 | 0.0052 | 0.7504 | 0.9099 |
| | 0.65 | 0.7617 | 0.0072 | | 0.7637 | 0.0084 | | |
| | 0.7 | 0.7649 | 0.0076 | | 0.7664 | 0.0085 | | |
| | 0.75 | 0.7620 | 0.0078 | | 0.7707 | 0.0051 | | |
| | 0.8 | 0.7727 | 0.0075 | | 0.7634 | 0.0082 | | |
| | 0.85 | 0.7741 | 0.0095 | | 0.7635 | 0.0114 | | |
| | 0.9 | 0.7468 | 0.0150 | | 0.7740 | 0.0199 | | |
| DT | 0.6 | 0.7292 | 0.0081 | 0.7625 | 0.7275 | 0.0061 | 0.7192 | 0.9084 |
| | 0.65 | 0.7309 | 0.0090 | | 0.7285 | 0.0108 | | |
| | 0.7 | 0.7528 | 0.0095 | | 0.7223 | 0.0074 | | |
| | 0.75 | 0.7302 | 0.0093 | | 0.7173 | 0.0127 | | |
| | 0.8 | 0.7409 | 0.0098 | | 0.7183 | 0.0121 | | |
| | 0.85 | 0.7509 | 0.0092 | | 0.7183 | 0.0106 | | |
| | 0.9 | 0.7286 | 0.0149 | | 0.6974 | 0.0196 | | |
| NB | 0.6 | 0.7523 | 0.0065 | 0.7382 | 0.7590 | 0.0072 | 0.7320 | 0.8614 |
| | 0.65 | 0.7606 | 0.0053 | | 0.7614 | 0.0077 | | |
| | 0.7 | 0.7611 | 0.0076 | | 0.7598 | 0.0073 | | |
| | 0.75 | 0.7542 | 0.0080 | | 0.7660 | 0.0072 | | |
| | 0.8 | 0.7565 | 0.0078 | | 0.7627 | 0.0078 | | |
| | 0.85 | 0.7603 | 0.0061 | | 0.7609 | 0.0058 | | |
| | 0.9 | 0.7416 | 0.0125 | | 0.7636 | 0.0167 | | |
| KNN | 0.6 | 0.7152 | 0.0073 | 0.7565 | 0.7344 | 0.0100 | 0.7556 | 0.9091 |
| | 0.65 | 0.7216 | 0.0106 | | 0.7292 | 0.0067 | | |
| | 0.7 | 0.7204 | 0.0123 | | 0.7289 | 0.0077 | | |
| | 0.75 | 0.7261 | 0.0116 | | 0.7345 | 0.0099 | | |
| | 0.8 | 0.7331 | 0.0114 | | 0.7340 | 0.0121 | | |
| | 0.85 | 0.7474 | 0.0058 | | 0.7243 | 0.0109 | | |
| | 0.9 | 0.7234 | 0.0089 | | 0.7312 | 0.0194 | | |
Table 2 Calculation results of training sets of different sizes
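
As a small worked example of reading Table 2, the sketch below finds, for each method, the TR value with the highest mean accuracy on dataset 1. The numbers are transcribed from the dataset 1 Mean column of the table; the variable and function names are illustrative, and picking the single highest mean is only one simple way to summarize the table, not necessarily the criterion the authors used.

```python
# TR values and dataset 1 mean accuracies, transcribed from Table 2.
TRAIN_RATIOS = [0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
DATASET1_MEAN = {
    "LR": [0.7672, 0.7721, 0.7701, 0.7678, 0.7786, 0.7819, 0.7662],
    "SVM": [0.7523, 0.7617, 0.7649, 0.7620, 0.7727, 0.7741, 0.7468],
    "DT": [0.7292, 0.7309, 0.7528, 0.7302, 0.7409, 0.7509, 0.7286],
    "NB": [0.7523, 0.7606, 0.7611, 0.7542, 0.7565, 0.7603, 0.7416],
    "KNN": [0.7152, 0.7216, 0.7204, 0.7261, 0.7331, 0.7474, 0.7234],
}


def best_ratio(means):
    """Return (TR, mean accuracy) for the highest mean accuracy in one method's row."""
    idx = max(range(len(means)), key=means.__getitem__)
    return TRAIN_RATIOS[idx], means[idx]


best = {method: best_ratio(means) for method, means in DATASET1_MEAN.items()}
for method, (ratio, acc) in best.items():
    print(f"{method}: best TR={ratio}, mean accuracy={acc:.4f}")
```

On these figures LR, SVM and KNN peak at TR = 0.85, while DT and NB peak at TR = 0.7 with TR = 0.85 a close second.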
Fig.1 Model prediction results
Fig.2 Analysis results of the different models A. Line chart of the average accuracy of the models at different sampling proportions B. Standard errors of the models plotted against sampling proportion C. Box plot of t-test P-values comparing random sampling with 10-fold cross-validation results for 70 groups D. Paired test results for the mean differences between models
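
Panel C compares random-sampling results with 10-fold cross-validation results. One way such a comparison could be set up is a one-sample t statistic, t = (mean - cv) / SE, shown below for LR on dataset 1 with values transcribed from Table 2. This is a sketch under assumptions: it treats the SE column as the standard error of the mean over repeated samples, and it stops at the t statistic because the number of repeats (needed for degrees of freedom and a P-value) is not stated on this page. The test the authors actually ran may differ.

```python
# LR on dataset 1, transcribed from Table 2: TR -> (mean accuracy, SE).
LR_DATASET1 = {
    0.6: (0.7672, 0.0060),
    0.65: (0.7721, 0.0062),
    0.7: (0.7701, 0.0070),
    0.75: (0.7678, 0.0082),
    0.8: (0.7786, 0.0074),
    0.85: (0.7819, 0.0060),
    0.9: (0.7662, 0.0116),
}
LR_DATASET1_10FOLD = 0.7722  # 10-fold cross-validation accuracy from Table 2


def t_statistic(mean, se, reference):
    """One-sample t statistic of a sample mean against a fixed reference value."""
    return (mean - reference) / se


t_stats = {
    ratio: t_statistic(mean, se, LR_DATASET1_10FOLD)
    for ratio, (mean, se) in LR_DATASET1.items()
}
for ratio, t in t_stats.items():
    print(f"TR={ratio:.2f}  t={t:+.3f}")
```

For this row every |t| is below 2, with the largest value (about 1.62) at TR = 0.85.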