Matching Items (71)
Description
Medical records are increasingly being recorded in the form of electronic health records (EHRs), with a significant amount of patient data recorded as unstructured natural language text. Consequently, being able to extract and utilize the clinical data present within these records is an important step in furthering clinical care. One important aspect of these records is the prescription information they contain. Existing techniques for extracting prescription information (medication names, dosages, frequencies, reasons for taking, and mode of administration) from unstructured text have focused on rule- and classifier-based methods. While state-of-the-art systems can be effective in extracting many types of information, they require significant effort to develop hand-crafted rules and to conduct effective feature engineering. This paper presents a bidirectional LSTM with a CRF tagging layer, initialized with precomputed word embeddings, for extracting prescription information from sentences without requiring significant feature engineering. Experiments on the i2b2 2009 dataset achieve a macro F1 score of 0.8562, with scores above 0.9449 on four of the six categories, indicating significant potential for this model.
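A minimal sketch (not the thesis code) of the kind of biLSTM-CRF tagger described, written in PyTorch and assuming the third-party pytorch-crf package for the CRF layer; the embedding matrix, vocabulary size, and tag count below are placeholders.

```python
# Sketch of a biLSTM-CRF sequence tagger with precomputed word embeddings.
import numpy as np
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLSTMCRFTagger(nn.Module):
    def __init__(self, embedding_matrix, num_tags, hidden=128):
        super().__init__()
        vocab, dim = embedding_matrix.shape
        self.embed = nn.Embedding.from_pretrained(
            torch.as_tensor(embedding_matrix, dtype=torch.float), freeze=False)
        self.lstm = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.emissions = nn.Linear(2 * hidden, num_tags)  # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, tokens, tags, mask):
        scores = self.emissions(self.lstm(self.embed(tokens))[0])
        return -self.crf(scores, tags, mask=mask)  # negative log-likelihood

    def predict(self, tokens, mask):
        scores = self.emissions(self.lstm(self.embed(tokens))[0])
        return self.crf.decode(scores, mask=mask)  # Viterbi-decoded tag ids

emb = np.random.default_rng(0).normal(size=(5000, 100))  # stand-in for word2vec/GloVe
model = BiLSTMCRFTagger(emb, num_tags=13)  # e.g., B/I tags for 6 categories plus O
```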
Contributors: Rawal, Samarth Chetan (Author) / Baral, Chitta (Thesis director) / Anwar, Saadat (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
The prevalence of bots, or automated accounts, on social media is a well-known problem. The ways bots harm social media users include, but are not limited to, spreading misinformation, influencing topic discussions, and dispersing harmful links. Bots have affected the field of disaster relief on social media as well: they prevent rescuers from determining credible calls for help, spread fake news and other malicious content, and generate large volumes of content that burden rescuers attempting to provide aid in the aftermath of disasters. To address these problems, this research seeks to detect bots participating in discussions of disaster events and to increase the recall, or the number of bots removed from the network, of Twitter bot detection methods. Removing these bots also prevents human users from accidentally interacting with the accounts and being manipulated by them. To accomplish this goal, an existing bot detection classification algorithm known as BoostOR was employed. BoostOR is an ensemble learning algorithm originally designed to increase bot detection recall on a dataset, and it has the potential to address the social media bot dilemma in which several different types of bots may be present in the data. BoostOR was first introduced as an adjustment to existing ensemble classifiers to increase recall. However, testing the BoostOR algorithm on unobserved datasets showed that it does not perform as expected. This study attempts to improve the BoostOR algorithm by comparing it with a baseline classification algorithm, AdaBoost, and then discussing the intentional differences between the two. Additionally, this study presents the main factors contributing to the shortcomings of the BoostOR algorithm and proposes a solution to improve it. These recommendations should ensure that the BoostOR algorithm can be applied to new and unobserved datasets in the future.
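As a rough illustration of the baseline side of that comparison, the sketch below trains scikit-learn's AdaBoost classifier on synthetic stand-in data and reports recall, the metric BoostOR is designed to raise; the feature count and class balance are placeholders, not the study's dataset.

```python
# AdaBoost baseline with recall as the headline metric (1 = bot, 0 = human).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)  # 20% of accounts are "bots"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=0)
clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("recall:", recall_score(y_te, pred))       # fraction of bots caught
print("precision:", precision_score(y_te, pred))
```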
Contributors: Davis, Matthew William (Author) / Liu, Huan (Thesis director) / Nazer, Tahora H. (Committee member) / Computer Science and Engineering Program (Contributor) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-12
Description
In the field of machine learning, reinforcement learning stands out for its ability to explore approaches to complex, high-dimensional problems that outperform even expert humans. For robotic locomotion tasks, reinforcement learning provides a way to solve them without the need for task-specific controllers. In this thesis, two reinforcement learning algorithms, Deep Deterministic Policy Gradient (DDPG) and Group Factor Policy Search, are compared based upon their performance in the bipedal walking environment provided by OpenAI Gym. The algorithms are evaluated on the returns they achieve in the environment and on their sample efficiency.
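A schematic of the DDPG half of such a comparison, using the third-party Stable-Baselines3 implementation and the classic Gym step API (a 4-tuple return); Group Factor Policy Search has no widely available implementation, so it is not shown, and the timestep budget is a placeholder.

```python
# Train DDPG on BipedalWalker and evaluate average episode return.
import gym
from stable_baselines3 import DDPG

env = gym.make("BipedalWalker-v3")
model = DDPG("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=100_000)  # sample efficiency: return per timestep spent

episodes, total = 10, 0.0
for _ in range(episodes):
    obs, done = env.reset(), False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, _ = env.step(action)  # classic (pre-0.26) Gym API
        total += reward
print("mean episode return:", total / episodes)
```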
Contributors: McDonald, Dax (Author) / Ben Amor, Heni (Thesis director) / Yang, Yezhou (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2018-12
Description
Cryptocurrencies have become one of the most fascinating forms of currency and economics due to their fluctuating values and lack of centralization. This project attempts to use machine learning methods to effectively model in-sample data for Bitcoin and Ethereum using rule induction methods. The dataset is cleaned by removing entries with missing data, and a new column measuring the price difference is created to allow a more accurate analysis of the change in price. Eight relevant variables are selected using cross-validation: the total number of bitcoins, the total size of the blockchain, the hash rate, mining difficulty, revenue from mining, transaction fees, the cost of transactions, and the estimated transaction volume. The in-sample data is modeled using a simple tree fit, first with one variable and then with all eight. Using all eight variables, the in-sample model and data have a correlation of 0.6822657. The in-sample model is improved by first applying bootstrap aggregation (also known as bagging) to fit 400 decision trees to the in-sample data using one variable, and then applying the random forests technique to the data using all eight variables. This results in a correlation between the model and data of 0.9443413. The random forests technique is then applied to an Ethereum dataset, resulting in a correlation of 0.6904798. Finally, an out-of-sample model is created for Bitcoin and Ethereum using random forests, against a benchmark correlation of 0.03 for financial data. The correlation between the training model and the testing data was 0.06957639 for Bitcoin and -0.171125 for Ethereum. In conclusion, cryptocurrencies can be modeled accurately in-sample by applying the random forests method to a dataset. Out-of-sample modeling is more difficult, but in some cases performs better than typical financial data. It should also be noted that cryptocurrency data has similar properties to other related financial datasets, suggesting future potential for modeling cryptocurrency systems within the financial world.
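A rough sketch of the random-forests step, using scikit-learn rather than whatever toolkit the thesis used; the column names and the synthetic DataFrame below are placeholders standing in for the cleaned Bitcoin dataset with its eight selected predictors.

```python
# Fit a 400-tree random forest and report the in-sample correlation.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

features = ["total_bitcoins", "blockchain_size", "hash_rate", "difficulty",
            "mining_revenue", "transaction_fees", "cost_per_transaction",
            "est_transaction_volume"]  # hypothetical column names
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(500, 9)), columns=features + ["price_diff"])

model = RandomForestRegressor(n_estimators=400, random_state=0)
model.fit(df[features], df["price_diff"])

# Correlation between fitted values and observed price differences.
fitted = model.predict(df[features])
print("in-sample correlation:", np.corrcoef(fitted, df["price_diff"])[0, 1])
```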
Contributors: Browning, Jacob Christian (Author) / Meuth, Ryan (Thesis director) / Jones, Donald (Committee member) / McCulloch, Robert (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
Description
The use of machine learning for analytics has increased exponentially in the past few years due to its ability to identify hidden insights in data. It also has a plethora of applications in healthcare, ranging from improving image recognition in CT scans to extracting semantic meaning from thousands of medical-form PDFs. The BioElectrical Systems and Technology Lab is currently developing a biosensor whose data is retrieved and analyzed manually. As a proof of concept, this project uses a neural network architecture to automatically parse and classify a cardiac disease dataset, as well as to explore health-related factors impacting cardiac disease in patients of all ages.
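A minimal sketch of the kind of neural-network classifier described, using scikit-learn's MLPClassifier; the lab's actual data and architecture are not public, so the features and layer sizes below are synthetic stand-ins.

```python
# Small feed-forward network classifying a stand-in cardiac dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)  # stand-in patient features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
net.fit(X_tr, y_tr)
print("held-out accuracy:", net.score(X_te, y_te))
```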
Created: 2018-05
Description
Lyric classification and generation are trending topics in the machine learning community. Long Short-Term Memory (LSTM) networks are effective tools for classifying and generating text. We explored their effectiveness in the generation and classification of lyrical data and proposed methods of evaluating their accuracy. We found that LSTM networks with dropout layers were effective at lyric classification, and that LSTM networks with word embeddings were extremely effective at lyric generation.
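A minimal Keras sketch of the two model shapes described: an LSTM classifier with a dropout layer, and a word-embedding LSTM that predicts the next word for generation. The vocabulary size, genre count, and layer widths are placeholders, not the thesis's settings.

```python
from tensorflow.keras import Sequential, layers

VOCAB, GENRES = 20_000, 5  # placeholder vocabulary and genre counts

# Classifier: embeddings -> LSTM -> dropout -> genre probabilities.
classifier = Sequential([
    layers.Embedding(VOCAB, 128),
    layers.LSTM(128),
    layers.Dropout(0.5),  # the dropout layer credited with aiding classification
    layers.Dense(GENRES, activation="softmax"),
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Generator: predicts a distribution over the next word given a prefix.
generator = Sequential([
    layers.Embedding(VOCAB, 128),  # learned word embeddings
    layers.LSTM(256),
    layers.Dense(VOCAB, activation="softmax"),
])
generator.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```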
Contributors: Tallapragada, Amit (Author) / Ben Amor, Heni (Thesis director) / Caviedes, Jorge (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
Prostate cancer is the second most common kind of cancer in men. Fortunately, it has a 99% survival rate. To achieve such a survival rate, a variety of aggressive therapies are used to treat prostate cancers that are caught early. Androgen deprivation therapy (ADT) is one such therapy, given to patients in cycles. This study attempted to analyze which factors, in a group of 79 patients, led them to continue or discontinue the treatment. This was done using naïve Bayes classification, a machine learning algorithm. The algorithm identified high testosterone as an indicator that a patient would persevere with the treatment, but it failed to produce statistically significant rates of correct prediction.
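A small sketch of the naïve Bayes analysis described, using scikit-learn; the clinical features below are illustrative stand-ins rather than the study's actual variables (aside from testosterone, which the abstract names).

```python
# Gaussian naive Bayes on a 79-patient stand-in dataset, scored by cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(79, 4))     # e.g., testosterone, PSA, age, cycle length
y = rng.integers(0, 2, size=79)  # 1 = continued ADT, 0 = discontinued
clf = GaussianNB()
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```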
Contributors: Millea, Timothy Michael (Author) / Kostelich, Eric (Thesis director) / Kuang, Yang (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12
Description
Social media users are inundated with information, especially on Instagram, a social media service based on sharing photos, where missing important posts is a common issue for many users. By creating a recommendation system that learns each user's preferences and gives them a curated list of posts, this information overload can be mitigated and the experience of Instagram users enhanced. This paper explores methods for creating such a recommendation system. The proposed method employs a learning model called Factorization Machines, which combines the advantages of linear models and latent factor models. In this work I derived features from Instagram post data, including the image, social data about the post, and information about the user who created the post. I also collected user-post interaction data describing which users "liked" which posts; this data was used in the models leveraging latent factors. The proposed model successfully improves the rate of interesting content seen by the user by anywhere from 2 to 12 times.
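A compact sketch of the second-order factorization-machine prediction rule, computed with the standard linear-time identity for the pairwise interaction term; the weights below are randomly initialized placeholders rather than trained parameters.

```python
# FM score: bias + linear term + factorized pairwise interactions.
import numpy as np

def fm_predict(x, w0, w, V):
    """x: features (n,); w0: bias; w: linear weights (n,); V: latent factors (n, k)."""
    # sum_{i<j} <v_i, v_j> x_i x_j, via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    inter = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return w0 + w @ x + inter

n, k = 100, 8
rng = np.random.default_rng(0)
x = rng.random(n)  # stand-in post/user feature vector
print(fm_predict(x, 0.0, rng.normal(size=n), rng.normal(size=(n, k))))
```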
Contributors: Fakhri, Kian (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12
Description
Food safety is vital to the well-being of society; therefore, it is important to inspect food products to ensure that minimal health risks are present. A crucial phase of food inspection is the identification of foreign particles found in the sample, such as insect body parts. The presence of certain species of insects, especially storage beetles, is a reliable indicator of possible contamination during storage and food processing. However, the current approach to identifying species is visual examination by human analysts; this method is subjective and time-consuming, and confident identification requires extensive experience and training. To aid this inspection process, we have developed, in collaboration with FDA analysts, image-analysis-based machine intelligence that achieves species identification with up to 90% accuracy. The current project is a continuation of this development effort. Here we present an image analysis environment that allows practical deployment of the machine intelligence on computers with limited processing power and memory. Using this environment, users can prepare input sets by selecting images for analysis and can inspect these images through the integrated pan, zoom, and color-analysis capabilities. After species analysis, the results panel allows the user to compare the analyzed images with reference images of the proposed species. Future additions to this environment should include a log of previously analyzed images and, eventually, interaction with a central cloud repository of images through a web-based interface. Additional issues to address include standardization of image layout, extension of the feature-extraction algorithm, and use of image classification to build a central search engine for widespread usage.
Contributors: Martin, Daniel Luis (Author) / Ahn, Gail-Joon (Thesis director) / Doupé, Adam (Committee member) / Xu, Joshua (Committee member) / Computer Science and Engineering Program (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description
This paper is centered on the use of generative adversarial networks (GANs) to convert or generate RGB images from grayscale ones. The primary goal is to create sensible and colorful versions of a set of grayscale images by training a discriminator to recognize fake, or generated, images and training a generator to attempt to satisfy the discriminator. The network design is described in further detail below; however, several potential issues arise, including the averaging of colors in certain images, such that small details are not assigned unique colors and instead blend into a neutral tone. We attempt to mitigate this issue as much as possible.
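A schematic PyTorch sketch of this adversarial setup (not the authors' architecture): the generator maps 1-channel grayscale images to 3-channel RGB, and the discriminator scores RGB images as real or generated. The network bodies and batch are deliberately tiny stand-ins.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())
D = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

gray = torch.rand(8, 1, 32, 32)          # stand-in grayscale batch
real = torch.rand(8, 3, 32, 32) * 2 - 1  # stand-in RGB batch in [-1, 1]

# Discriminator step: push real images toward 1, generated images toward 0.
fake = G(gray).detach()
loss_d = bce(D(real), torch.ones(8, 1)) + bce(D(fake), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make D label freshly generated images as real.
loss_g = bce(D(G(gray)), torch.ones(8, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```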

Contributors: Markabawi, Jah (Co-author) / Masud, Abdullah (Co-author) / Lobo, Ian (Co-author) / Koleber, Keith (Co-author) / Yang, Yingzhen (Thesis director) / Wang, Yancheng (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05