The Elucidation of Potential New Factors that Influence and Impact Type 2 Diabetes Mellitus Prevalence in Pima Indian populations

Description

Introduction: Diabetes Mellitus (DM) is a significant health problem in the United States, with over 20 million adults diagnosed with the condition. Type 2 Diabetes Mellitus, which is characterized by insulin resistance, has been associated with adverse conditions such as chronic kidney disease and peripheral artery disease. The presence of Type 2 DM in an individual is also associated with risk factors such as genetic markers and ethnicity. Native Americans are especially susceptible, being more than twice as likely to present with Type 2 DM as non-Hispanic whites. Of particular concern is the Pima Indian population in Arizona, which has the highest prevalence of Type 2 DM in the world. Many risk factors, such as genetic markers and lifestyle changes, have been associated with this population, but little research has used raw data to identify the most pertinent factors for diabetes incidence.

Objective: There were three main objectives of the study. One objective was to elucidate potential new relationships via linear regression. Another objective was to determine which factors were indicative of Type 2 DM in the population. Finally, the last objective was to compare the incidence of Type 2 DM in the dataset to trends seen elsewhere.

Methods: The dataset was loaded into Python from an open-source site, with citation. The dataset, created in 1990, comprised 768 female patients and 9 attributes (Number of Pregnancies, Plasma Glucose Levels, Systolic Blood Pressure, Triceps Skin Thickness, Insulin Levels, BMI, Diabetes Pedigree Function, Age, and Diabetes Presence (0 or 1)). The dataset was then cleaned using mean or median imputation. After cleaning, linear regression was used to assess the relationships between certain factors in the population, excluding the Diabetes Pedigree Function and Diabetes Presence, and each relationship was assessed for significance via the probability statistic. Reverse stepwise logistic regression was used to determine the most pertinent factors for Type 2 DM, based on the Akaike Information Criterion and the statistical significance of terms in the model. Finally, data from the Centers for Disease Control and Prevention (CDC) Diabetes Surveillance was assessed for relationships between Female DM Percentage in Pinal County and Obesity or Physical Inactivity, using simple logistic regression to test for statistical significance.
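As a rough illustration of two of the steps named in the Methods, the following is a minimal sketch of mean or median imputation and of backward (reverse) stepwise elimination driven by the Akaike Information Criterion. It is not the author's code. The function names, the use of None to mark missing values, and the toy AIC function with its made-up log-likelihood gains are all assumptions for illustration only; in a real analysis the AIC would come from fitting a logistic regression on each candidate feature set.

```python
from statistics import mean, median


def impute(values, strategy="median"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = median(observed) if strategy == "median" else mean(observed)
    return [fill if v is None else v for v in values]


def backward_eliminate(features, aic):
    """Drop one feature at a time for as long as doing so lowers the AIC.

    `aic` is any callable that maps a list of feature names to an AIC value.
    """
    current = list(features)
    best = aic(current)
    while len(current) > 1:
        # Score every model that omits exactly one of the remaining features.
        candidates = [
            (aic([f for f in current if f != drop]), drop) for drop in current
        ]
        score, drop = min(candidates)
        if score >= best:
            break  # no single removal improves the criterion
        best = score
        current.remove(drop)
    return current, best


# Hypothetical stand-in for a fitted model: each feature adds an invented
# amount to the log-likelihood, and AIC = 2k - 2 * log-likelihood, where k
# counts the features plus an intercept. These numbers are not study results.
TOY_GAIN = {"a": 40.0, "b": 5.0, "c": 0.2}


def toy_aic(features):
    k = len(features) + 1
    log_likelihood = -100.0 + sum(TOY_GAIN[f] for f in features)
    return 2 * k - 2 * log_likelihood


cleaned = impute([1, None, 3, 100], strategy="median")
selected, final_aic = backward_eliminate(["a", "b", "c"], toy_aic)
```

In this toy setup a feature is removed only when its contribution to the log-likelihood is smaller than the penalty of one extra parameter, which is the trade-off the AIC encodes.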

Results: The majority of the relationships tested were statistically significant. The most pertinent factors for Type 2 DM in the dataset were the number of pregnancies, plasma glucose levels, and blood pressure. In the USDS data from the CDC, the relationships between Female DM Percentage and the obesity and inactivity percentages were statistically significant.

Conclusion: The trends found in the study matched the trends found in the literature. Based on the results, recommendations for better diabetes control include more medical education as well as better blood sugar monitoring. Further analysis could examine other factors, such as genetic factors, and incorporate epidemiological analysis. In conclusion, the study accomplished its main objectives.

Contributors: Kondury, Kasyap Krishna (Author) / Scotch, Matthew (Thesis director) / Aliste, Marcela (Committee member) / College of Health Solutions (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Machine Learning: A Sentiment Analysis of Customer Reviews

Description

Machine learning is the process of training a computer with algorithms to learn from data and make informed predictions. In a world where large amounts of data are constantly collected, machine learning is an important tool for analyzing this data to find patterns and extract useful information. Machine learning applications extend to numerous fields; however, for this thesis I chose to focus on machine learning from a business perspective, specifically e-commerce.

The e-commerce market uses information to target customers and drive business. More and more online services have become available, allowing consumers to make purchases and interact with an online system. For example, Amazon is one of the largest Internet-based retail companies. As people shop through its website, Amazon gathers huge amounts of data on its customers, from personal information to shopping history to viewing history. After purchasing a product, the customer may leave a review and give a rating based on their experience. Performing analytics on all of this data can provide insights that support more informed business and marketing decisions, which can lead to business growth and improve the customer experience.

For this thesis, I have trained binary classification models on a publicly available product review dataset from Amazon to predict whether a review has a positive or negative sentiment. The sentiment analysis process includes analyzing and encoding the human language, then extracting the sentiment from the resulting values. In the business world, sentiment analysis provides value by revealing insights into customer opinions and behaviors. In this thesis, I will explain how to perform a sentiment analysis and analyze several different machine learning models. The algorithms whose results I compared are KNN, Logistic Regression, Decision Trees, Random Forest, Naïve Bayes, Linear Support Vector Machines, and Support Vector Machines with an RBF kernel.
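To make the idea of encoding review text and predicting a binary sentiment concrete, here is a minimal sketch of one of the listed algorithms, Naïve Bayes, using a bag-of-words encoding with add-one smoothing. It is not the thesis code. The class name, the whitespace tokenizer, and the four example reviews are invented for illustration; the thesis used a publicly available Amazon review dataset that is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude encoder)."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the log prior, then add one log likelihood per known word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log(
                        (counts[word] + 1) / (total_words + len(self.vocab))
                    )
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy reviews, used only to exercise the classifier.
train_texts = [
    "great product works well",
    "love it great quality",
    "terrible broke quickly",
    "bad quality waste of money",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
prediction_a = model.predict("great quality")
prediction_b = model.predict("terrible waste")
```

The other algorithms named above would consume the same kind of encoded input; only the rule that maps word counts to a label differs.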

Contributors: Madaan, Shreya (Author) / Meuth, Ryan (Thesis director) / Nakamura, Mutsumi (Committee member) / Computer Science and Engineering Program (Contributor) / Dean, W.P. Carey School of Business (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05