Topological Descriptors for Parkinson's Disease Classification and Regression Analysis

Description

At present, the vast majority of patients with neurological disease are still diagnosed through in-person assessments and qualitative analysis of patient data. In this paper, we propose to use Topological Data Analysis (TDA) together with machine learning tools to automate Parkinson’s disease classification and severity assessment. An automated, stable, and accurate method for evaluating Parkinson’s would streamline diagnosis and give families more time to take corrective measures. We propose a methodology that incorporates TDA into the analysis of Parkinson’s disease postural-shift data by representing the data as persistence images. Topological features have been shown to be robust to small perturbations in the data and to perform well in discrimination tasks. The contributions of the paper are twofold: we propose a method to 1) distinguish healthy subjects from those with the disease and 2) assess the severity of the disease. We explore the proposed method on a Parkinson’s disease dataset composed of healthy-elderly, healthy-young, and Parkinson’s disease subjects.
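
A persistence image can be illustrated with a short sketch. This is not the authors' implementation. It assumes the persistence diagram is a list of (birth, death) pairs. Each pair is mapped to birth/persistence coordinates and weighted linearly by its persistence. The image is then a sum of isotropic Gaussians evaluated at pixel centers. The usual construction integrates each Gaussian over the pixel instead of sampling it at the center. The function name, grid bounds, resolution, and sigma are illustrative choices.

```python
import math


def persistence_image(diagram, bounds, resolution=(10, 10), sigma=0.1):
    """Evaluate a persistence image on a grid of pixel centers.

    diagram    -- list of (birth, death) pairs with death > birth
    bounds     -- (x_min, x_max, y_min, y_max) in birth/persistence coordinates
    resolution -- (number of columns, number of rows)
    Returns a list of rows; row index grows with persistence.
    """
    # Map each (birth, death) pair to (birth, persistence).
    points = [(b, d - b) for b, d in diagram if d > b]
    nx, ny = resolution
    x_min, x_max, y_min, y_max = bounds
    dx = (x_max - x_min) / nx
    dy = (y_max - y_min) / ny
    max_p = max((p for _, p in points), default=0.0)
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)  # 2-D Gaussian normalizing constant

    image = []
    for i in range(ny):
        y = y_min + (i + 0.5) * dy  # persistence at the pixel center
        row = []
        for j in range(nx):
            x = x_min + (j + 0.5) * dx  # birth at the pixel center
            value = 0.0
            for b, p in points:
                # Linear weight: short-lived features near the diagonal count less.
                weight = p / max_p
                sq_dist = (x - b) ** 2 + (y - p) ** 2
                value += weight * norm * math.exp(-sq_dist / (2.0 * sigma ** 2))
            row.append(value)
        image.append(row)
    return image


# Illustrative diagram with one long-lived and one short-lived feature.
img = persistence_image([(0.0, 1.0), (0.2, 0.4)], bounds=(0, 1, 0, 1), resolution=(4, 4), sigma=0.2)
feature_vector = [v for row in img for v in row]  # fixed-length input for a classifier
```

Flattening the grid gives a feature vector of the same length for every diagram, however many points the diagram contains. A standard classifier or regressor needs fixed-length input, which is why persistence images suit that role.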

Contributors: Rahman, Farhan Nadir (Co-author) / Nawar, Afra (Co-author) / Turaga, Pavan (Thesis director) / Krishnamurthi, Narayanan (Committee member) / Electrical Engineering Program (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Data Management Behind Machine Learning

Description

This thesis explores artificial intelligence by examining how a single-layer artificial neural network works through a simple housing price classification example, while also considering its impact from a data management perspective at both the software and hardware levels. To begin, the universally accepted model of an artificial neuron is broken down into its key components, and each component's function is explained by relating it to its biological counterpart. The role of a neuron is then described in the context of a neural network, with equal emphasis on how a single neuron is trained and how an entire network is trained. Using supervised learning, the neural network is trained on three input features for housing price classification: a house's total number of rooms, number of bathrooms, and square footage. After being trained on most of the generated data set, the network is tested for accuracy on the remainder by comparing its computed output for each set of inputs with the target value. The artificial neuron is implemented in C so that the program is more closely tied to the operating system, making the profiler data collected during execution more precise. The program breaks each stage of the neuron's training process into a distinct function. In addition to this function-based design, the struct data type serves as the underlying data structure, representing both the neuron itself and its training and test data. Once the neuron is fully trained, its test results are graphed to show how well it learned from its training set. Finally, the profiler data is analyzed to describe how the program operated from a data management perspective at the software and hardware levels.
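
The sketch below shows a single neuron trained with the perceptron learning rule. It is written in Python for illustration, whereas the thesis itself used C with structs. A dataclass plays the role of the struct here. The step activation, learning rate, and feature scaling (square footage in thousands) are assumptions. The example houses are hypothetical, not the author's generated data.

```python
from dataclasses import dataclass, field


@dataclass
class Neuron:
    """A single artificial neuron; the dataclass stands in for a C struct."""
    weights: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    bias: float = 0.0
    learning_rate: float = 0.1

    def activate(self, inputs):
        # Weighted sum of the inputs plus bias, passed through a step function.
        total = self.bias + sum(w * x for w, x in zip(self.weights, inputs))
        return 1 if total > 0 else 0

    def train(self, samples, epochs=100):
        # Perceptron rule: nudge weights toward the target whenever the output is wrong.
        for _ in range(epochs):
            mistakes = 0
            for inputs, target in samples:
                error = target - self.activate(inputs)
                if error != 0:
                    mistakes += 1
                    self.weights = [w + self.learning_rate * error * x
                                    for w, x in zip(self.weights, inputs)]
                    self.bias += self.learning_rate * error
            if mistakes == 0:  # every training sample is classified correctly
                break


def accuracy(neuron, samples):
    """Fraction of held-out samples whose predicted class matches the target."""
    correct = sum(1 for inputs, target in samples if neuron.activate(inputs) == target)
    return correct / len(samples)


# Hypothetical houses: (rooms, bathrooms, square feet in thousands) -> 1 = high price.
train_set = [((3, 2, 1.2), 0), ((2, 1, 0.9), 0), ((5, 3, 2.8), 1), ((6, 4, 3.5), 1)]
test_set = [((4, 2, 1.1), 0), ((7, 4, 3.9), 1)]

neuron = Neuron()
neuron.train(train_set)
print(accuracy(neuron, test_set))
```

Splitting `train` and `accuracy` into separate functions mirrors the thesis's idea of giving each stage of training and testing its own function.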

Contributors: Richards, Nicholas Giovanni (Author) / Miller, Phillip (Thesis director) / Meuth, Ryan (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Categorizing and Discovering Social Bots

Description

Bots tamper with social media networks by artificially inflating the popularity of certain topics. In this paper, we define what a bot is, detail different motivations for bots, describe previous work in bot detection and observation, and then perform bot detection of our own. For our bot detection, we focus on Twitter bots that tweet Arabic extremist-like phrases. A testing dataset is collected using the honeypot method, and five different heuristics are measured for their effectiveness in detecting bots. The model underperformed, but we have laid the groundwork for a largely untapped area of bot detection: the diffusion of extremist ideals through bots.

Contributors: Karlsrud, Mark C. (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of public interest. With this motivation, this paper develops a model of trends that builds on previous work with k-nearest neighbors and dynamic time warping. Developing this model provides insight into the length and features of trends, and the model generalizes successfully, identifying 74.3% of trends in the time period of interest. The model also helps explain why particular words trend on Twitter.
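
The two techniques named above can be combined as in the following sketch. It is a generic illustration, not the thesis's model. Dynamic time warping (DTW) measures the distance between two activity time series that may be stretched or shifted in time. A k-nearest-neighbors vote then labels a new series by its closest labeled examples. The hourly tweet counts and labels are invented for illustration.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_classify(query, examples, k=3):
    """Majority label among the k examples closest to `query` under DTW."""
    nearest = sorted(examples, key=lambda ex: dtw_distance(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical hourly tweet counts for a term, labeled by whether it trended.
examples = [
    ([1, 1, 2, 5, 9], "trend"),
    ([0, 1, 3, 7, 12], "trend"),
    ([2, 2, 2, 2, 2], "no trend"),
    ([3, 2, 3, 2, 3], "no trend"),
]
print(knn_classify([1, 2, 4, 8, 11], examples, k=3))  # trend
```

Both k and the local cost (absolute difference here) are tunable. An odd k avoids ties in a two-class vote.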

Contributors: Marshall, Grant A (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

Query System for epiDMS and EnergyPlus

Description

With the development of technology, there has been a dramatic increase in the number of machine learning programs. These complex programs draw conclusions and can predict or perform actions based on models from previous runs or on input information. However, such programs require storing a very large amount of data. Queries allow users to extract only the information relevant to their investigation. The purpose of this thesis was to create a system with two important components: querying and visualization. Metadata was stored in Sedna as XML, and time series data was stored in OpenTSDB as JSON. To connect the two databases, the time series ID was stored as a metric in the XML metadata. Queries should be simple and flexible, and should return all data that fits the query parameters. The query language used was an extension of XQuery FLWOR that added time series parameters. Visualizations should be easy to understand and organized so that important information and details are easy to find. Because a query may return a large amount of data, a multivariate heat map was used to visualize the time series results. The system was used to query data from two programs: EnergyPlus and the Epidemic Simulation Data Management System (epiDMS). Such a system makes it easier for researchers in these fields to find the relationships between metadata that lead to the desired results over time. By the end of the thesis project, the software was complete; however, it must still be optimized to handle the enormous amount of data expected from the system.
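
The link between the metadata store and the time series store can be sketched in a few lines. This is an illustration only. An XML string and a Python dictionary stand in for Sedna and OpenTSDB, and neither database's actual API is used. The element names (`run`, `building`, `metric`) and the series IDs are hypothetical. The query filters runs on a metadata field, follows each run's stored time series ID, and keeps only the points inside the requested time range.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; each run stores the ID of its time series as <metric>.
METADATA = """
<runs>
  <run name="baseline"><building>office</building><metric>ts-001</metric></run>
  <run name="retrofit"><building>office</building><metric>ts-002</metric></run>
  <run name="clinic"><building>hospital</building><metric>ts-003</metric></run>
</runs>
"""

# Stand-in for the time series store: ID -> list of (timestamp, value) points.
TIME_SERIES = {
    "ts-001": [(0, 21.0), (1, 22.5), (2, 23.1)],
    "ts-002": [(0, 20.4), (1, 20.9), (2, 21.3)],
    "ts-003": [(0, 19.8), (1, 20.0), (2, 20.2)],
}


def query(metadata_xml, series_store, building, start, end):
    """Return {run name: points with start <= t <= end} for runs matching `building`."""
    root = ET.fromstring(metadata_xml.strip())
    results = {}
    for run in root.findall("run"):
        if run.findtext("building") != building:
            continue  # metadata filter, playing the role of a FLWOR "where" clause
        series_id = run.findtext("metric")  # join key into the time series store
        points = series_store.get(series_id, [])
        results[run.get("name")] = [(t, v) for t, v in points if start <= t <= end]
    return results


print(query(METADATA, TIME_SERIES, "office", 1, 2))
```

Storing the series ID inside the metadata means only one lookup per matching run is needed. Metadata filtering happens first, so the time series store is never scanned for runs that cannot match.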

Contributors: Tse, Adam Yusof (Author) / Candan, Selcuk (Thesis director) / Chen, Xilun (Committee member) / Barrett, The Honors College (Contributor) / School of Music (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2015-05

Jaipur Simulation and AI

Description

This paper details the process of designing both a simulation of the board game Jaipur and an artificial intelligence (AI) agent that can play the game against a human player. When designing an AI for a card game, two major problems can arise. The first is the difficulty of searching every possible set of future moves. Because the deck is shuffled randomly, the number of potential game states to analyze grows exponentially when one tries to look more than one turn ahead. The second difficulty is the uncertainty introduced by the opponent's responses. Certain moves are vulnerable to specific opponent reactions, which are difficult to predict because of hidden information. To circumvent these problems, the AI uses a greedy approach to decision making, attempting to maximize the value of its plays immediately rather than planning for future turns. The agent uses conditional statements to evaluate the game state and choose the game action it deems optimal. It also uses a heuristic that assigns an expected value (EV) to each of the goods it can choose from and selects the best one based on this evaluation. The simulation was first implemented in C++ as a terminal application and was then ported to a graphical interface using Unity and C#.
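
The greedy policy described above can be sketched as follows. This illustration rests on simplifying assumptions and is not the thesis's agent. It considers only sell actions and scores each one by the tokens it would earn immediately. It ignores Jaipur rules such as minimum sale sizes and bonus tokens. The helper names and token values are hypothetical.

```python
def sell_value(good, count, token_stacks):
    """Points earned by selling `count` cards of `good` right now.

    Simplification: the sale takes the next `count` tokens from the top of that good's stack.
    """
    return sum(token_stacks.get(good, [])[:count])


def greedy_sale(hand, token_stacks):
    """Pick the sale with the highest immediate value, ignoring future turns."""
    best_good, best_value = None, 0
    for good, count in hand.items():
        value = sell_value(good, count, token_stacks)
        if value > best_value:
            best_good, best_value = good, value
    return best_good, best_value


# Illustrative hand and token stacks (highest-value token listed first).
hand = {"leather": 3, "diamond": 1, "spice": 0}
token_stacks = {"leather": [4, 3, 2, 1, 1], "diamond": [7, 7, 5, 5, 5], "spice": [5, 3, 3, 2, 2]}
print(greedy_sale(hand, token_stacks))  # ('leather', 9)
```

Looking only one move ahead keeps each decision cheap. The trade-off named in the abstract remains: the agent cannot anticipate how the opponent will respond to the sale.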

Contributors: Orr, James Christopher (Author) / Kobayashi, Yoshihiro (Thesis director) / Selgrad, Justin (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Comparison of sentiment analysis systems and an application in signed link prediction

Description

Social media sites are platforms on which individuals discuss a wide range of topics and share a huge amount of information about themselves and their interests. Much of this information is encoded in the unstructured text that users post on these sites. There has been a considerable amount of work on sentiment analysis of these sites to infer users' opinions and preferences. However, it can be difficult to infer how a user feels about particular pages or topics for which they have not expressed their sentiment in an observable form. Collaborative filtering is a common method for solving this problem with user data, but it has only infrequently been used with sentiment information to make inferences about users' preferences. In this paper, we extend previous work on leveraging sentiment in collaborative filtering, specifically to approximate user sentiment and, from it, users' votes for candidates in an online election. Sentiment is shown to be an effective tool for making these types of predictions in the absence of other, more explicit user preference information. In addition, we present an evaluation of the sentiment analysis methods and tools used in state-of-the-art sentiment analysis systems, in order to determine which of these methods to leverage in our experiments.
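
A generic user-based collaborative filtering sketch shows how sentiment can stand in for explicit ratings. It is not the method from the thesis. It assumes each user's sentiment toward each candidate is a score in [-1, 1]. User similarity is measured with cosine similarity over the candidates both users have scored. A missing score is predicted as the similarity-weighted average of other users' scores. User names, candidates, and values are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two users over the candidates both have scores for."""
    common = set(u) & set(v)
    dot = sum(u[c] * v[c] for c in common)
    norm_u = math.sqrt(sum(u[c] ** 2 for c in common))
    norm_v = math.sqrt(sum(v[c] ** 2 for c in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no overlap, or all-neutral sentiment
    return dot / (norm_u * norm_v)


def predict_sentiment(target, candidate, scores):
    """Similarity-weighted average of other users' sentiment toward `candidate`."""
    numerator = denominator = 0.0
    for user, user_scores in scores.items():
        if user == target or candidate not in user_scores:
            continue
        sim = cosine_similarity(scores[target], user_scores)
        numerator += sim * user_scores[candidate]
        denominator += abs(sim)
    return numerator / denominator if denominator else 0.0


# Hypothetical sentiment scores in [-1, 1] inferred from users' posts.
scores = {
    "alice": {"A": 0.8, "B": -0.5},
    "bob": {"A": 0.9, "B": -0.4, "C": 0.7},
    "carol": {"A": -0.6, "B": 0.8, "C": -0.9},
}
print(predict_sentiment("alice", "C", scores))
```

Alice's known scores agree with Bob's and oppose Carol's. Her predicted sentiment toward C therefore comes out positive. A predicted vote could then be taken as the candidate with the highest predicted sentiment.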

Contributors: Baird, James Daniel (Author) / Liu, Huan (Thesis director) / Wang, Suhang (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

First Impressions: A Multimodal Analysis of Movie Trailers and Film Success

Description

Due to the popularity of the movie industry, a film's opening weekend box-office performance is of great interest not only to movie studios but also to the general public. In hopes of maximizing a film's opening weekend revenue, movie studios invest heavily in pre-release advertising. The most visible advertisement is the movie trailer, which, in no more than two minutes and thirty seconds, serves as many people's first introduction to a film. The question, however, is how we can be confident that a trailer will succeed in its promotional task and bring in the audience a studio expects. In this thesis, we use machine learning classification techniques to determine how effectively a movie trailer promotes its film. We accomplish this by creating a predictive model that automatically analyzes the audio and visual characteristics of a movie trailer to determine whether a film's opening will be successful, defined as earning at least 35% of the film's production budget during its first U.S. box-office weekend. Our predictive model performed reasonably well, achieving an accuracy of 68.09% in binary classification. Accuracy increased to 78.62% when genre was included in the model.
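
The success criterion behind the binary labels reduces to a single comparison. The following is a minimal sketch, assuming amounts in whole dollars. The function name and the example figures are hypothetical.

```python
def opening_successful(opening_weekend_gross, production_budget):
    """Binary label: True if the first U.S. weekend earned at least 35% of the budget.

    Comparing integers (gross * 100 vs. budget * 35) avoids floating-point
    rounding right at the 35% boundary.
    """
    return opening_weekend_gross * 100 >= production_budget * 35


# Hypothetical film with a $100M production budget.
print(opening_successful(35_000_000, 100_000_000))  # True
print(opening_successful(34_999_999, 100_000_000))  # False
```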

Contributors: Williams, Terrance D'Mitri (Author) / Pon-Barry, Heather (Thesis director) / Zafarani, Reza (Committee member) / Maciejewski, Ross (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2014-05

Malware Analysis and Classification Framework: Detecting Financial Malware Using Machine Learning Techniques

Description

Malware that performs identity theft or steals bank credentials is becoming increasingly common and can cause millions of dollars of damage annually. Because such malware affects millions of people each year, its automated detection and removal is a major area of research. An automated detector would benefit any industry that is regularly targeted by malware, such as the financial sector. Typical detection approaches, such as those found in commercial anti-malware software, include signature-based scanning, in which malware executables are identified by a unique signature or fingerprint developed for that malware. However, as malware authors continue to modify and obfuscate their malware, heuristic detection is becoming increasingly popular. In this approach, the malware's behaviors are identified and patterns are recognized. We explore a malware analysis and classification framework that uses machine learning to train classifiers to distinguish between malware and benign programs based on their features and behaviors. Using both decision tree learning and support vector machines as classifier models, we obtained overall classification accuracies of around 80%. Due to limitations, chiefly the use of a small data set, our approach may not be suitable for practical classification of malware and benign programs, as evidenced by its high error rate.
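
Decision trees and support vector machines are normally trained with a library. The core step of decision tree learning, choosing the split that best separates the classes, can still be shown with a one-level tree (a decision stump) in plain Python. This is a deliberate simplification, not the classifiers used in the thesis. The behavioral features and samples below are hypothetical. A full tree would apply the same split search recursively, usually scoring splits with an impurity measure such as Gini or entropy rather than the raw error count used here.

```python
def train_stump(samples):
    """Learn a one-level decision tree on boolean behavior features.

    samples: list of (features, label) where features maps name -> bool
    and label is 1 for malware, 0 for benign.
    Returns (feature, value): predict malware when features[feature] == value.
    """
    best = None
    for feature in sorted(samples[0][0]):
        for value in (True, False):
            errors = sum(1 for x, y in samples if int(x[feature] == value) != y)
            if best is None or errors < best[2]:
                best = (feature, value, errors)
    return best[0], best[1]


def predict(stump, features):
    """Return 1 (malware) or 0 (benign) for one program's feature dict."""
    feature, value = stump
    return int(features[feature] == value)


# Hypothetical observed behaviors for four programs.
samples = [
    ({"modifies_hosts": True, "hooks_browser": True, "signed": False}, 1),
    ({"modifies_hosts": False, "hooks_browser": True, "signed": False}, 1),
    ({"modifies_hosts": False, "hooks_browser": False, "signed": True}, 0),
    ({"modifies_hosts": False, "hooks_browser": False, "signed": False}, 0),
]
stump = train_stump(samples)
print(stump)  # ('hooks_browser', True)
```

With only four samples, the learned rule fits the training data perfectly but says little about unseen programs. This echoes the small-data-set limitation the abstract reports.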

Contributors: Anwar, Sajid (Co-author) / Chan, Tsz (Co-author) / Ahn, Gail-Joon (Thesis director) / Zhao, Ziming (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Development of an Educational Video Game

Description

The objective of this creative project was to gain experience in digital modeling, animation, coding, shader development and implementation, model integration techniques, and the application of gaming principles and design by developing a professional educational game. The team collaborated with Glendale Community College (GCC) to produce an interactive product intended to supplement instruction on nutrition. In the resulting educational game, "Nutribots," the player acts as a nutrition-based nanobot sent to the small intestine to help the body. Throughout the game, the player is asked nutrition-based questions that test their knowledge of proteins, carbohydrates, and lipids. If the player is unable to answer a question, they must use game mechanics to progress and receive the information as a reward. A level is completed as soon as its question is answered correctly. If the player answers incorrectly twenty times over the course of the game, the team loses faith in the player, and the player must restart from the title screen. This limits guessing and ensures, through repetition, that the player retains the information once it is clear they do not know the answers. The team was split into two groups for the development of this game. The first group developed models, animations, and textures using Autodesk Maya 2016 and Marvelous Designer. The second group developed code and shaders, and integrated the first group's assets using Unity and Visual Studio. Once a prototype of the game was developed, it was showcased among peers to gather feedback, and the team then implemented the requested changes. Development for this project began in November 2015 and ended in April 2017. Special thanks to Laura Avila, Department Chair, and Jennifer Nolz of the Glendale Community College Technology and Consumer Sciences, Food and Nutrition Department.

Contributors: Nolz, Daisy (Co-author) / Martin, Austin (Co-author) / Quinio, Santiago (Co-author) / Armstrong, Jessica (Co-author) / Kobayashi, Yoshihiro (Thesis director) / Valderrama, Jamie (Committee member) / School of Arts, Media and Engineering (Contributor) / School of Film, Dance and Theatre (Contributor) / Department of English (Contributor) / Computer Science and Engineering Program (Contributor) / Computing and Informatics Program (Contributor) / Herberger Institute for Design and the Arts (Contributor) / School of Sustainability (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-05
