Introduction: Diabetes Mellitus (DM) is a significant health problem in the United States, with over 20 million adults diagnosed with the condition. Type 2 Diabetes Mellitus, which is characterized by insulin resistance, has been associated with adverse conditions such as chronic kidney disease and peripheral artery disease. Its presence is also associated with risk factors such as genetic markers and ethnicity. Native Americans are especially susceptible, being more than twice as likely as non-Hispanic whites to present with Type 2 DM. Of particular concern is the Pima Indian population in Arizona, which has the highest prevalence of Type 2 DM in the world. Many risk factors have been associated with this population, such as genetic markers and lifestyle changes, but little research has used raw data to identify the factors most pertinent to diabetes incidence.
Objective: The study had three main objectives. The first was to elucidate potential new relationships among the measured factors via linear regression. The second was to determine which factors were indicative of Type 2 DM in the population. The third was to compare the incidence of Type 2 DM in the dataset to trends seen elsewhere.
Methods: The dataset was loaded into Python from an open-source site, with citation. The dataset, created in 1990, comprised 768 female patients described by 9 attributes (Number of Pregnancies, Plasma Glucose Levels, Systolic Blood Pressure, Triceps Skin Thickness, Insulin Levels, BMI, Diabetes Pedigree Function, Age, and Diabetes Presence (0 or 1)). The dataset was cleaned using mean or median imputation. After cleaning, linear regression was used to assess the relationships between pairs of factors in the population, excluding the Diabetes Pedigree Function and Diabetes Presence, and each relationship was tested for significance via its p-value. Reverse stepwise logistic regression was then used to determine the factors most pertinent to Type 2 DM, based on the Akaike Information Criterion (AIC) and on the statistical significance of each factor in the model. Finally, data from the Centers for Disease Control and Prevention (CDC) Diabetes Surveillance was assessed for relationships between Female DM Percentage in Pinal County and either Obesity or Physical Inactivity, using simple logistic regression to test for statistical significance.
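The cleaning and model-selection steps described in the Methods can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the choice of `None` as the marker for a missing value, and the `fit_log_likelihood` callable (standing in for whatever routine fits the logistic model and returns its maximized log-likelihood) are all assumptions made for illustration.

```python
from statistics import median


def impute_median(values, missing=None):
    """Replace entries equal to `missing` with the median of the observed entries."""
    observed = [v for v in values if v != missing]
    fill = median(observed)
    return [fill if v == missing else v for v in values]


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def backward_stepwise(features, fit_log_likelihood):
    """Drop one feature at a time for as long as doing so lowers the AIC.

    `fit_log_likelihood(subset)` is a hypothetical callable that fits a
    logistic model on `subset` and returns its maximized log-likelihood.
    """
    current = list(features)
    # The "+ 1" counts the intercept as a fitted parameter.
    best = aic(fit_log_likelihood(current), len(current) + 1)
    while len(current) > 1:
        # Score every model that omits exactly one remaining feature.
        candidates = []
        for feature in current:
            subset = [f for f in current if f != feature]
            score = aic(fit_log_likelihood(subset), len(subset) + 1)
            candidates.append((score, subset))
        cand_aic, cand_subset = min(candidates, key=lambda c: c[0])
        if cand_aic >= best:
            break  # no single removal improves the AIC
        best, current = cand_aic, cand_subset
    return current, best
```

In practice, `fit_log_likelihood` would wrap a statistics library's logistic regression fit. Keeping it as a parameter separates the elimination loop from any particular fitting routine.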
Results: The majority of the relationships examined were statistically significant. The factors most pertinent to Type 2 DM in the dataset were the number of pregnancies, the plasma glucose level, and the blood pressure. In the USDS data from the CDC, the relationships between Female DM Percentage and both the obesity and the physical inactivity percentages were statistically significant.
Conclusion: The trends found in the study matched the trends found in the literature. Based on the results, recommendations for better diabetes control include more medical education and better blood sugar monitoring. Further analysis could examine other factors, such as genetic factors, and could include epidemiological analysis. In conclusion, the study accomplished its main objectives.