Computational methods for knowledge integration in the analysis of large-scale biological networks

Description

As we migrate into an era of personalized medicine, understanding how bio-molecules interact with one another to form cellular systems is one of the key focus areas of systems biology. Several challenges such as the dynamic nature of cellular systems, uncertainty due to environmental influences, and the heterogeneity between individual patients render this a difficult task. In the last decade, several algorithms have been proposed to elucidate cellular systems from data, resulting in numerous data-driven hypotheses. However, due to the large number of variables involved in the process, many of which are unknown or not measurable, such computational approaches often lead to a high proportion of false positives. This renders interpretation of the data-driven hypotheses extremely difficult. Consequently, a dismal proportion of these hypotheses are subject to further experimental validation, eventually limiting their potential to augment existing biological knowledge. This dissertation develops a framework of computational methods for the analysis of such data-driven hypotheses leveraging existing biological knowledge. Specifically, I show how biological knowledge can be mapped onto these hypotheses and subsequently augmented through novel hypotheses. Biological hypotheses are learnt in three levels of abstraction -- individual interactions, functional modules and relationships between pathways, corresponding to three complementary aspects of biological systems. The computational methods developed in this dissertation are applied to high throughput cancer data, resulting in novel hypotheses with potentially significant biological impact.

Contributors: Ramesh, Archana (Author) / Kim, Seungchan (Thesis advisor) / Langley, Patrick W (Committee member) / Baral, Chitta (Committee member) / Kiefer, Jeffrey (Committee member) / Arizona State University (Publisher)

Created: 2012

Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends leveraging previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and successfully generalizes to identify 74.3% of trends in the time period of interest. The model developed in this work provides insight into why particular words trend on Twitter.
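As a rough illustration of how k-nearest-neighbors can be combined with dynamic time warping (DTW), the sketch below classifies a short activity series by its DTW distance to labeled examples. It is not the thesis's model: the function names, the absolute-difference cost, the labels, and the toy series are all assumptions made for illustration.

```python
from collections import Counter
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used b element
                cost[i][j - 1],      # b[j-1] aligned with an already-used a element
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def knn_predict(query, labeled_series, k=1):
    """Majority label among the k examples closest to `query` under DTW."""
    ranked = sorted(labeled_series, key=lambda item: dtw_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical hourly mention counts (toy data, not from the thesis).
examples = [
    ([0, 1, 5, 9, 5, 1, 0], "trend"),
    ([1, 1, 1, 1, 1, 1, 1], "no_trend"),
    ([0, 0, 2, 8, 9, 4, 1], "trend"),
]
prediction = knn_predict([0, 0, 1, 5, 9, 5, 1], examples, k=1)
```

DTW tolerates shifts and stretches along the time axis, which is why a spike that arrives an hour late can still match a known trend shape closely.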

Contributors: Marshall, Grant A (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

Categorizing and Discovering Social Bots

Description

Bots tamper with social media networks by artificially inflating the popularity of certain topics. In this paper, we define what a bot is, we detail different motivations for bots, we describe previous work in bot detection and observation, and then we perform bot detection of our own. For our bot detection, we are interested in bots on Twitter that tweet Arabic extremist-like phrases. A testing dataset is collected using the honeypot method, and five different heuristics are measured for their effectiveness in detecting bots. The model underperformed, but we have laid the groundwork for a largely untapped area of bot detection: the diffusion of extremist ideals through bots.

Contributors: Karlsrud, Mark C. (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Barrett, The Honors College (Contributor) / Computing and Informatics Program (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

An Image Analysis Environment for Species Identification of Food Contaminating Beetles

Description

Food safety is vital to the well-being of society; therefore, it is important to inspect food products to ensure minimal health risks are present. A crucial phase of food inspection is the identification of foreign particles found in the sample, such as insect body parts. The presence of certain species of insects, especially storage beetles, is a reliable indicator of possible contamination during storage and food processing. However, the current approach to identifying species is visual examination by human analysts; this method is rather subjective and time-consuming. Furthermore, confident identification requires extensive experience and training. To aid this inspection process, we have developed in collaboration with FDA analysts some image analysis-based machine intelligence to achieve species identification with up to 90% accuracy. The current project is a continuation of this development effort. Here we present an image analysis environment that allows practical deployment of the machine intelligence on computers with limited processing power and memory. Using this environment, users can prepare input sets by selecting images for analysis, and inspect these images through the integrated pan, zoom, and color analysis capabilities. After species analysis, the results panel allows the user to compare the analyzed images with referenced images of the proposed species. Further additions to this environment should include a log of previously analyzed images, and eventually extend to interaction with a central cloud repository of images through a web-based interface. Additional issues to address include standardization of image layout, extension of the feature-extraction algorithm, and utilizing image classification to build a central search engine for widespread usage.

Contributors: Martin, Daniel Luis (Author) / Ahn, Gail-Joon (Thesis director) / Doupé, Adam (Committee member) / Xu, Joshua (Committee member) / Computer Science and Engineering Program (Contributor) / Department of Finance (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

First Impressions: A Multimodal Analysis of Movie Trailers and Film Success

Description

Due to the popularity of the movie industry, a film's opening weekend box-office performance is of great interest not only to movie studios, but to the general public, as well. In hopes of maximizing a film's opening weekend revenue, movie studios invest heavily in pre-release advertisement. The most visible advertisement is the movie trailer, which, in no more than two minutes and thirty seconds, serves as many people's first introduction to a film. The question, however, is how can we be confident that a trailer will succeed in its promotional task, and bring about the audience a studio expects? In this thesis, we use machine learning classification techniques to determine the effectiveness of a movie trailer in the promotion of its namesake. We accomplish this by creating a predictive model that automatically analyzes the audio and visual characteristics of a movie trailer to determine whether or not a film's opening will be successful by earning at least 35% of a film's production budget during its first U.S. box office weekend. Our predictive model performed reasonably well, achieving an accuracy of 68.09% in a binary classification. Accuracy increased to 78.62% when including genre in our predictive model.
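The binary target described above (an opening counts as successful when the first U.S. box office weekend earns at least 35% of the production budget) can be written as a small labeling function. This is only a sketch of the labeling rule, not of the audio-visual classifier; the function name, parameter names, and dollar figures are hypothetical.

```python
def is_successful_opening(opening_weekend_gross, production_budget, threshold=0.35):
    """Label an opening as successful if it earns at least `threshold` of the budget."""
    if production_budget <= 0:
        raise ValueError("production_budget must be positive")
    return opening_weekend_gross >= threshold * production_budget


# Hypothetical films: (opening weekend gross, production budget) in U.S. dollars.
labels = [
    is_successful_opening(40_000_000, 100_000_000),  # 40% of budget
    is_successful_opening(20_000_000, 100_000_000),  # 20% of budget
]
```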

Contributors: Williams, Terrance D'Mitri (Author) / Pon-Barry, Heather (Thesis director) / Zafarani, Reza (Committee member) / Maciejewski, Ross (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2014-05

Query System for epiDMS and EnergyPlus

Description

With the development of technology, there has been a dramatic increase in the number of machine learning programs. These complex programs draw conclusions and can predict or perform actions based on models from previous runs or input information. However, such programs require storing a very large amount of data. Queries allow users to extract only the information relevant to their investigation. The purpose of this thesis was to create a system with two important components: querying and visualization. Metadata was stored in Sedna as XML, and time series data was stored in OpenTSDB as JSON. To connect the two databases, the time series ID was stored as a metric in the XML metadata. Queries should be simple, flexible, and return all data that fits the query parameters. The query language used was an extension of XQuery FLWOR that added time series parameters. Visualization should be easily understood and organized so that important information and details are easy to find. Because a query may return a large amount of data, a multivariate heat map was used to visualize the time series results. The two programs on which the system performed queries were EnergyPlus and the Epidemic Simulation Data Management System (epiDMS). With such a system, it would be easier for people in the project's fields to find the relationships between metadata that lead to the desired results over time. Over the course of the thesis project, the overall software was completed; however, the software must be optimized to handle the enormous amount of data expected from the system.
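The idea of linking the two stores through a time series ID kept in the XML metadata can be sketched with the Python standard library alone. The example below is an assumption-heavy stand-in: it does not use Sedna, OpenTSDB, or the thesis's XQuery extension. The element names, the `query` function, and the in-memory dictionary that plays the role of the time series database are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata documents; each run records the ID of its time series.
METADATA_XML = """
<simulations>
  <simulation name="run-a">
    <parameter name="zone_count">4</parameter>
    <timeseries_id>ts-001</timeseries_id>
  </simulation>
  <simulation name="run-b">
    <parameter name="zone_count">12</parameter>
    <timeseries_id>ts-002</timeseries_id>
  </simulation>
</simulations>
"""

# In-memory stand-in for the time series store: ID -> [(timestamp, value), ...].
TIME_SERIES_STORE = {
    "ts-001": [(0, 20.5), (1, 21.0), (2, 21.4)],
    "ts-002": [(0, 18.0), (1, 18.2), (2, 19.1)],
}


def query(metadata_xml, store, param_name, min_value, start, end):
    """Filter runs on a metadata parameter, then fetch their series in [start, end]."""
    root = ET.fromstring(metadata_xml)
    results = {}
    for sim in root.findall("simulation"):
        for param in sim.findall("parameter"):
            if param.get("name") == param_name and float(param.text) >= min_value:
                ts_id = sim.findtext("timeseries_id")  # the link between the two stores
                points = store.get(ts_id, [])
                results[sim.get("name")] = [(t, v) for t, v in points if start <= t <= end]
    return results


matches = query(METADATA_XML, TIME_SERIES_STORE, "zone_count", 10, start=1, end=2)
```

The two-step shape (filter on metadata, then restrict by time range) mirrors the split between metadata conditions and time series parameters that the abstract describes.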

Contributors: Tse, Adam Yusof (Author) / Candan, Selcuk (Thesis director) / Chen, Xilun (Committee member) / Barrett, The Honors College (Contributor) / School of Music (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2015-05

Using Machine Learning Models to Detect Fake News, Bots, and Rumors on Social Media

Description

In this paper, I introduce the fake news problem and detail how it has been exacerbated through social media. I explore current practices for fake news detection using natural language processing and current benchmarks in ranking the efficacy of various language models. Using a Twitter-specific benchmark, I attempt to reproduce the scores of six language models demonstrating their effectiveness in seven tweet classification tasks. I explain the successes and challenges in reproducing these results and provide analysis for the future implications of fake news research.

Contributors: Chang, Ariz Bay (Author) / Liu, Huan (Thesis director) / Tahir, Anique (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Using Neural Networks to Perform Colorwork on Images

Description

This paper is centered on the use of generative adversarial networks (GANs) to convert or generate RGB images from grayscale ones. The primary goal is to create sensible and colorful versions of a set of grayscale images by training a discriminator to recognize failed or generated images and training a generator to attempt to satisfy the discriminator. The network design is described in further detail below; however, several potential issues arise, including the averaging of colors in certain images, such that small details are not assigned unique colors and the result is a neutral blend. We attempt to mitigate this issue as much as possible.

Contributors: Markabawi, Jah (Co-author) / Masud, Abdullah (Co-author) / Lobo, Ian (Co-author) / Koleber, Keith (Co-author) / Yang, Yingzhen (Thesis director) / Wang, Yancheng (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Using Neural Networks to Perform Colorwork on Images

Description

This paper is centered on the use of generative adversarial networks (GANs) to convert or generate RGB images from grayscale ones. The primary goal is to create sensible and colorful versions of a set of grayscale images by training a discriminator to recognize failed or generated images and training a generator to attempt to satisfy the discriminator. The network design is described in further detail below; however, several potential issues arise, including the averaging of colors in certain images, such that small details are not assigned unique colors and the result is a neutral blend. We attempt to mitigate this issue as much as possible.

Contributors: Masud, Abdullah Bin (Co-author) / Koleber, Keith (Co-author) / Lobo, Ian (Co-author) / Markabawi, Jah (Co-author) / Yang, Yingzhen (Thesis director) / Wang, Yancheng (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Self Play Machine Learning and Pokemon

Description

Video games often feature agents that the human player must interact with and overcome. Designing these agents to cover every case of human interaction is difficult, and usually imperfect, as human players are capable of learning to overcome these agents in unintended ways. Artificial intelligence is a growing field that seeks to solve problems by simulating learning in specific environments. The aim of this paper is to explore the applications that the self-play learning branch of artificial intelligence may have for game development in the future, and to attempt to implement a working version of a self-play agent learning to play a Pokemon battle. The originally designed Pokemon battle behavior is often suboptimal, getting stuck making ineffective or incorrect choices, so training a self-play model to learn the strategy and structure of Pokemon battles from a clean slate would result in an organic agent that would outperform the original behavior of the computer-controlled agents. Though my implementation was unsuccessful, this paper serves as a record of the exploration of this field, and a log of what worked and what did not, in order to benefit any future person interested in the same topics.

Contributors: Ciudad, Erick Marcel (Author) / Meuth, Ryan (Thesis director) / Kobayashi, Yoshihiro (Committee member) / Computing and Informatics Program (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-12