Application of PCA and K-Means Clustering Methods to Identify Diabetes Mellitus Patient Groups Based on Risk Factors

Anisa Simanjuntak, Muhammad Siddik Hasibuan


Diabetes mellitus is a chronic disease characterized by blood glucose (sugar) levels that remain high over a long period of time. Identification is the process of recognizing and determining the characteristics of a particular object or entity. Risk factors such as hypertension (high blood pressure), smoking, and lack of physical activity can affect the condition of diabetes mellitus patients. Therefore, an approach is needed that can identify groups of diabetic patients based on their risk factors, so that appropriate management and treatment can be carried out. The purpose of this study is to apply Principal Component Analysis (PCA) to reduce the dimensionality of diabetes mellitus patient data and identify the linear combinations of risk factors that contribute most, and then to apply K-Means Clustering to group patients with similar risk factors. This is quantitative, analytic research in which the variables are risk factors for diabetes mellitus. PCA yielded 9 principal components (PCs) with a cumulative explained variance of 86.9275%. From the correlations between attributes and principal components, a component matrix of loadings was formed: the larger the absolute value of a loading, the stronger the attribute's correlation with that principal component, with a cutoff of |loading| > 0.4 regardless of sign. K-Means Clustering divided the diabetes patients into 3 groups based on their risk factors. Centroid C1 represents patients whose condition is mild, centroid C2 represents patients at a moderate level, and centroid C3 represents patients with severe or dangerous diabetes mellitus.
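The pipeline described in the abstract can be illustrated with a minimal Python sketch using NumPy. It is not the authors' implementation: the abstract does not state the software, the dataset, or the variance threshold used, so the synthetic data, the 0.85 threshold, and the helper names `pca_loadings` and `kmeans` are all assumptions made for illustration. Only the loading cutoff of 0.4 and the choice of 3 clusters come from the abstract.

```python
import numpy as np

def pca_loadings(X, var_threshold=0.85):
    """PCA on standardized data (hypothetical helper).

    Returns component scores, the loading matrix (attribute-by-component
    correlations), and the cumulative explained variance of the kept PCs.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-score each attribute
    corr = np.cov(Z, rowvar=False)                     # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)            # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]                  # sort by descending variance
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    # Keep the smallest number of PCs whose cumulative variance reaches the threshold.
    n_keep = min(int(np.searchsorted(cum, var_threshold)) + 1, len(eigvals))
    loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
    scores = Z @ eigvecs[:, :n_keep]
    return scores, loadings, cum[:n_keep]

def kmeans(points, k=3, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centroids (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster becomes empty.
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

# Synthetic stand-in for patient data: 150 rows, 12 made-up risk-factor columns.
rng = np.random.default_rng(42)
group_shift = np.repeat([0.0, 2.0, 4.0], 50)[:, None]
X = group_shift + rng.normal(size=(150, 12))

scores, loadings, cum_var = pca_loadings(X, var_threshold=0.85)
labels, centroids = kmeans(scores, k=3)          # 3 groups, as in the abstract
strong = np.abs(loadings) > 0.4                  # loading cutoff from the abstract

print("components kept:", scores.shape[1])
print("cumulative explained variance:", np.round(cum_var, 4))
print("attributes with |loading| > 0.4 per PC:", strong.sum(axis=0))
print("cluster sizes:", np.bincount(labels, minlength=3))
```

In this sketch the loadings are the eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues, so each entry is the correlation between an attribute and a component, which is what makes a fixed cutoff such as 0.4 interpretable. Mapping the resulting clusters to mild, moderate, and severe groups would still require inspecting the centroids against the original risk factors.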


Diabetes Mellitus, Principal Component Analysis, K-Means Clustering








Copyright (c) 2023 Anisa Simanjuntak, Muhammad Siddik Hasibuan

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

J-PS (Prisma Sains: Jurnal Pengkajian Ilmu dan Pembelajaran Matematika dan IPA IKIP Mataram) p-ISSN (print) 2338-4530, e-ISSN (online) 2540-7899 is licensed under a Creative Commons Attribution 4.0 International License.
