China Biotechnology (中国生物工程杂志)
2023, Vol. 43, Issue 11: 35-42    DOI: 10.13523/j.cb.2305049
Techniques and Methods
Comparison on the Performance of Risk Prediction Models for Type 2 Diabetes*
GUO Jin-dan1, GAO Yan-yan2, GAO Huai-lin3,**, CHEN Yu-bao1,**
1 Institute of Laboratory Animal Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, National Human Diseases Animal Model Resource Center, NHC Key Laboratory of Human Disease Comparative Medicine, Beijing 100021, China
2 General Internal Medicine Department, Majiagou Hospital, Kailuan Medical Health Group, Tangshan 063006, China
3 Diabetes Research Institute of Hebei Yiling Hospital, Shijiazhuang 050090, China
Abstract: Objective: To compare the predictive performance and application value of five common machine learning algorithms in building risk prediction models for type 2 diabetes mellitus (T2DM). Methods: Public diabetes datasets, including the Pima Indians diabetes dataset (PIDD), were used to build models with five common algorithms: logistic regression (LR), support vector machine (SVM), decision tree (DT), naive Bayes (NB) and k-nearest neighbor (KNN). Different training-set proportions were set and random repeated sampling was performed, with accuracy and stability as the main criteria for comparing the models. Results: Model outcomes were closely related to sample size. For all models, prediction was best when the training-set proportion was between 0.8 and 0.85, and a certain amount of missing values could be tolerated. The random sampling of the training set also affected prediction performance. Performance differed clearly among the models, with LR, SVM and NB performing better. Conclusions: The LR model performed best overall, with clear advantages over the other models. The findings can serve as a reference for evaluating clinical T2DM risk prediction models and for choosing their core algorithm.
Key words: Type 2 diabetes mellitus    Prediction model    Performance comparison
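The evaluation protocol described in the Methods (several training-set proportions, each with repeated random sampling, summarized by accuracy and its spread) can be sketched as follows. This is a minimal illustration and not the authors' code: the data are synthetic two-class points, only a simple KNN classifier is used, the number of repeats (20) is arbitrary, and SE is assumed to mean the standard deviation over repeats divided by the square root of the number of repeats. All function names are hypothetical.

```python
import math
import random
import statistics

def make_toy_data(n, rng):
    """Two overlapping Gaussian classes in 2-D (synthetic, not a real dataset)."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 * label
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data

def knn_predict(train, x, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def holdout_accuracy(data, train_ratio, rng):
    """One random split: use the training part as neighbors, score on the rest."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    train, test = shuffled[:cut], shuffled[cut:]
    hits = sum(knn_predict(train, x) == y for x, y in test)
    return hits / len(test)

def repeated_holdout(data, ratios, repeats, seed=0):
    """Mean accuracy and assumed standard error for each training ratio."""
    rng = random.Random(seed)
    summary = {}
    for ratio in ratios:
        accs = [holdout_accuracy(data, ratio, rng) for _ in range(repeats)]
        se = statistics.stdev(accs) / math.sqrt(repeats)
        summary[ratio] = (statistics.mean(accs), se)
    return summary

data = make_toy_data(200, random.Random(42))
results = repeated_holdout(data, [0.6, 0.7, 0.8, 0.9], repeats=20)
for ratio, (mean_acc, se) in results.items():
    print(f"TR={ratio:.2f}  mean={mean_acc:.4f}  SE={se:.4f}")
```

Larger training proportions leave fewer test samples per split, so the spread of the accuracy estimate tends to grow. This is one reason to report a stability measure alongside the mean.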
Received: 2023-05-30    Published: 2023-12-01
CLC number: Q141
Funding: Natural Science Foundation of Hebei Province (H2019106062)
Corresponding authors: **GAO Huai-lin, CHEN Yu-bao     E-mail: chenyubao@cnilas.org; gaohuailin@126.com

Cite this article:

GUO Jin-dan, GAO Yan-yan, GAO Huai-lin, CHEN Yu-bao. Comparison on the Performance of Risk Prediction Models for Type 2 Diabetes. China Biotechnology, 2023, 43(11): 35-42.

Link to this article:

https://manu60.magtech.com.cn/biotech/CN/10.13523/j.cb.2305049
https://manu60.magtech.com.cn/biotech/CN/Y2023/V43/I11/35

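| Variable | Mean (±SD) | Variable | Mean (±SD) |
|---|---|---|---|
| Age | 47.0 (±13.4) | Red blood cell count (RBC) | 5.0 (±0.5) |
| Aspartate aminotransferase (AST) | 27.0 (±13.9) | Hemoglobin (HB) | 150.0 (±16.3) |
| Alanine aminotransferase (ALT) | 28.2 (±23.1) | Hematocrit (HCT) | 0.45 (±0.04) |
| Alkaline phosphatase (ALP) | 87.7 (±25.2) | Mean corpuscular volume (MCV) | 89.3 (±4.3) |
| γ-Glutamyl transferase (γ-GT) | 40.0 (±42.0) | Mean corpuscular hemoglobin (MCH) | 30.0 (±1.9) |
| Total protein (TP) | 76.7 (±4.0) | Mean corpuscular hemoglobin concentration (MCHC) | 336.0 (±11.3) |
| Albumin (ALB) | 45.8 (±2.6) | Red cell distribution width (RDW) | 12.7 (±0.9) |
| Globulin (GLB) | 30.9 (±3.5) | Platelet count (PLT) | 250.3 (±58.6) |
| Albumin/globulin ratio (A/G) | 1.5 (±0.2) | Mean platelet volume (MPV) | 10.7 (±1.0) |
| Triglycerides (TG) | 1.9 (±1.8) | Platelet distribution width (PDW) | 13.3 (±2.1) |
| Total cholesterol (TC) | 5.2 (±1.0) | Plateletcrit (PCT) | 0.27 (±0.06) |
| High-density lipoprotein cholesterol (HDL-C) | 1.4 (±0.3) | Neutrophils (NE) | 56.6 (±7.7) |
| Low-density lipoprotein cholesterol (LDL-C) | 3.4 (±0.9) | Lymphocytes (LYM) | 33.8 (±7.2) |
| Urea (BUN) | 5.0 (±1.3) | Monocytes (MONO) | 6.9 (±1.6) |
| Creatinine (Cr) | 78.7 (±13.7) | Eosinophils (EOS) | 2.1 (±1.7) |
| Uric acid (UA) | 356.9 (±95.2) | Basophils (BAS) | 0.6 (±0.3) |
| White blood cell count (WBC) | 6.6 (±1.6) | | |

Table 1  Summary statistics of dataset 3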
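| Method | TR | Dataset 1 Mean | Dataset 1 SE | Dataset 1 10-fold | Dataset 2 Mean | Dataset 2 SE | Dataset 2 10-fold | Dataset 3 10-fold |
|---|---|---|---|---|---|---|---|---|
| LR | 0.6 | 0.7672 | 0.0060 | 0.7722 | 0.7636 | 0.0050 | 0.7726 | 0.9109 |
| | 0.65 | 0.7721 | 0.0062 | | 0.7626 | 0.0079 | | |
| | 0.7 | 0.7701 | 0.0070 | | 0.7594 | 0.0061 | | |
| | 0.75 | 0.7678 | 0.0082 | | 0.7691 | 0.0061 | | |
| | 0.8 | 0.7786 | 0.0074 | | 0.7693 | 0.0074 | | |
| | 0.85 | 0.7819 | 0.0060 | | 0.7583 | 0.0105 | | |
| | 0.9 | 0.7662 | 0.0116 | | 0.7611 | 0.0199 | | |
| SVM | 0.6 | 0.7523 | 0.0077 | 0.7604 | 0.7672 | 0.0052 | 0.7504 | 0.9099 |
| | 0.65 | 0.7617 | 0.0072 | | 0.7637 | 0.0084 | | |
| | 0.7 | 0.7649 | 0.0076 | | 0.7664 | 0.0085 | | |
| | 0.75 | 0.7620 | 0.0078 | | 0.7707 | 0.0051 | | |
| | 0.8 | 0.7727 | 0.0075 | | 0.7634 | 0.0082 | | |
| | 0.85 | 0.7741 | 0.0095 | | 0.7635 | 0.0114 | | |
| | 0.9 | 0.7468 | 0.0150 | | 0.7740 | 0.0199 | | |
| DT | 0.6 | 0.7292 | 0.0081 | 0.7625 | 0.7275 | 0.0061 | 0.7192 | 0.9084 |
| | 0.65 | 0.7309 | 0.0090 | | 0.7285 | 0.0108 | | |
| | 0.7 | 0.7528 | 0.0095 | | 0.7223 | 0.0074 | | |
| | 0.75 | 0.7302 | 0.0093 | | 0.7173 | 0.0127 | | |
| | 0.8 | 0.7409 | 0.0098 | | 0.7183 | 0.0121 | | |
| | 0.85 | 0.7509 | 0.0092 | | 0.7183 | 0.0106 | | |
| | 0.9 | 0.7286 | 0.0149 | | 0.6974 | 0.0196 | | |
| NB | 0.6 | 0.7523 | 0.0065 | 0.7382 | 0.7590 | 0.0072 | 0.7320 | 0.8614 |
| | 0.65 | 0.7606 | 0.0053 | | 0.7614 | 0.0077 | | |
| | 0.7 | 0.7611 | 0.0076 | | 0.7598 | 0.0073 | | |
| | 0.75 | 0.7542 | 0.0080 | | 0.7660 | 0.0072 | | |
| | 0.8 | 0.7565 | 0.0078 | | 0.7627 | 0.0078 | | |
| | 0.85 | 0.7603 | 0.0061 | | 0.7609 | 0.0058 | | |
| | 0.9 | 0.7416 | 0.0125 | | 0.7636 | 0.0167 | | |
| KNN | 0.6 | 0.7152 | 0.0073 | 0.7565 | 0.7344 | 0.0100 | 0.7556 | 0.9091 |
| | 0.65 | 0.7216 | 0.0106 | | 0.7292 | 0.0067 | | |
| | 0.7 | 0.7204 | 0.0123 | | 0.7289 | 0.0077 | | |
| | 0.75 | 0.7261 | 0.0116 | | 0.7345 | 0.0099 | | |
| | 0.8 | 0.7331 | 0.0114 | | 0.7340 | 0.0121 | | |
| | 0.85 | 0.7474 | 0.0058 | | 0.7243 | 0.0109 | | |
| | 0.9 | 0.7234 | 0.0089 | | 0.7312 | 0.0194 | | |

Table 2  Results for training sets of different sizes (TR: training-set proportion; each 10-fold value is reported once per method)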
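As a reading aid for Table 2, the short sketch below takes the dataset 1 mean accuracies exactly as listed in the table and reports, for each method, the training-set proportion with the highest mean, plus a ranking of methods by their average over all proportions. The helper names are hypothetical, and averaging over proportions is only one possible way to rank the methods; it is not stated to be the authors' criterion.

```python
# Mean accuracy per training-set proportion (TR), dataset 1, copied from Table 2.
TRAIN_RATIOS = [0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
DATASET1_MEAN = {
    "LR":  [0.7672, 0.7721, 0.7701, 0.7678, 0.7786, 0.7819, 0.7662],
    "SVM": [0.7523, 0.7617, 0.7649, 0.7620, 0.7727, 0.7741, 0.7468],
    "DT":  [0.7292, 0.7309, 0.7528, 0.7302, 0.7409, 0.7509, 0.7286],
    "NB":  [0.7523, 0.7606, 0.7611, 0.7542, 0.7565, 0.7603, 0.7416],
    "KNN": [0.7152, 0.7216, 0.7204, 0.7261, 0.7331, 0.7474, 0.7234],
}

def best_ratio(means):
    """Return (TR, mean accuracy) for the highest mean in one method's row group."""
    best_index = max(range(len(means)), key=means.__getitem__)
    return TRAIN_RATIOS[best_index], means[best_index]

def average(values):
    """Plain arithmetic mean over all training-set proportions."""
    return sum(values) / len(values)

# Methods ordered from highest to lowest average accuracy on dataset 1.
ranking = sorted(DATASET1_MEAN, key=lambda m: average(DATASET1_MEAN[m]), reverse=True)

for method in ranking:
    ratio, acc = best_ratio(DATASET1_MEAN[method])
    print(f"{method}: average={average(DATASET1_MEAN[method]):.4f}, "
          f"best TR={ratio} (mean={acc:.4f})")
```

Differences between neighboring proportions are often of the same order as the reported SE, so the highest mean alone should be read with caution.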
Figure 1  Model prediction results
Figure 2  Results of the different models