Dr. Ashish Tripathi, Satish M. Srinivasan


Abstract: Predictive analytics encompasses a wide range of techniques, including, but not limited to, statistical modeling, machine learning, artificial intelligence, and data mining. It is useful in many application areas, such as data-driven decision making, business intelligence, public health, and disaster management and response, among many others. In this study, we design and implement a predictive analytics system that forecasts the likelihood that a diabetic patient will be readmitted to the hospital. After extensively cleaning the Diabetes 130-US hospitals dataset, which contains patient records spanning over nine years (1999 to 2008), we modeled the relationship between the predictors and the response variable using the XGBoost classifier. After hyperparameter optimization, the XGBoost classifier achieved a maximum AUC of 0.671. Our study reveals that attributes such as the number of lab procedures, number of medications, time in hospital, discharge disposition, and number of inpatient visits are strong predictors of the response variable (i.e., patient readmission). Hospitals can use these findings to design protocols that ensure patients with a higher probability of readmission are recovering well, possibly reducing their risk of future readmission. In the long run, our study will not only help improve the quality of life of diabetic patients but also help reduce the medical expenses associated with readmission.
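The abstract reports model quality as an AUC of 0.671. To make that number concrete, the sketch below computes the ROC AUC from binary readmission labels (1 = readmitted, 0 = not) and predicted probabilities using the Mann-Whitney rank-sum identity. It is an illustration only, not the authors' code: the function name `roc_auc` and the toy inputs are hypothetical. An AUC of 0.5 corresponds to ranking patients at random and 1.0 to perfect separation, so 0.671 lies between chance and perfect discrimination.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    AUC equals the probability that a randomly chosen positive example
    (readmitted = 1) receives a higher score than a randomly chosen negative
    example (readmitted = 0), with ties counted as one half.
    Hypothetical helper for illustration; not taken from the study.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Sort indices by score and give tied scores the average of their 1-based ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Rank sum of the positives minus its smallest possible value gives U.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: two readmitted and two non-readmitted patients.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

In the toy example, three of the four (readmitted, not readmitted) pairs are ranked correctly, giving 0.75. Libraries such as scikit-learn expose the same quantity through `sklearn.metrics.roc_auc_score`, which would be the usual choice in practice.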
