Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends, building on previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and the model successfully generalizes to identify 74.3% of trends in the time period of interest. The model developed in this work also provides insight into why particular words trend on Twitter.
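The abstract names k-nearest-neighbors and dynamic time warping (DTW) but gives no implementation details, so the following is only a minimal sketch of how the two techniques are commonly combined, not the thesis's actual model. The function names `dtw_distance` and `knn_predict`, the labels, and the sample counts are all hypothetical and invented for illustration.

```python
from collections import Counter
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b
                cost[i][j - 1],      # b[j-1] also matches an earlier a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def knn_predict(query, labeled_series, k=3):
    """Label `query` by majority vote among its k DTW-nearest neighbours."""
    ranked = sorted(labeled_series, key=lambda item: dtw_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical hourly mention counts; these numbers are made up, not thesis data.
examples = [
    ([1, 2, 4, 9, 20, 45], "trend"),
    ([2, 3, 8, 18, 40, 80], "trend"),
    ([5, 4, 6, 5, 4, 5], "no_trend"),
    ([3, 3, 2, 4, 3, 3], "no_trend"),
]

rising = knn_predict([1, 3, 5, 10, 22, 50], examples, k=3)
flat = knn_predict([4, 5, 4, 5, 4, 5], examples, k=3)
```

DTW is used here instead of a point-by-point distance because it tolerates series that follow the same shape at slightly different speeds, which is one plausible reason to pair it with a nearest-neighbor classifier for trend curves.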

Contributors: Marshall, Grant A (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2015-05

Reliance Dashboard: An Automated Real Estate Data Analysis Dashboard

Description

Investment real estate is unique among similar financial instruments by nature of each property's internal complexities and interaction with the external economy. Where a majority of tradable assets are static goods within a dynamic market, real estate investments are dynamic goods within a dynamic market. Furthermore, investment real estate, particularly commercial properties, not only interacts with the surrounding economy but also reflects it. Alive with tenancy, each commercial investment property provides a microeconomic view of the businesses that make up the local economy. Management of commercial investment real estate captures this economic snapshot in a unique abundance of untapped statistical data. While analysis of such data is undeniably valuable, the effort involved in this process is time-consuming. Given this unutilized potential, our team has developed proprietary software to analyze this data and communicate the results automatically through an easy-to-use interface. We have worked with a local real estate property management and ownership firm, Reliance Management, to develop this system through the use of their current, historical, and future data. Our team has also built a relationship with the executives of Reliance Management to review the functionality and pertinence of the system, which we have dubbed Reliance Dashboard.

Contributors: Burton, Daryl (Co-author) / Workman, Jack (Co-author) / LePine, Marcie (Thesis director) / Atkinson, Robert (Committee member) / Barrett, The Honors College (Contributor) / Department of Finance (Contributor) / Department of Management (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2015-05

Data Analytics to Identify the Genetic Basis for Resilience to Temperature Stress in Soybeans

Description

This paper explores the ability to predict yields of soybeans based on genetics and environmental factors. Based on the biology of soybeans, it has been shown that yields are best when soybeans grow within a certain temperature range. An event in which a soybean is exposed to temperatures outside its accepted range is labeled as an instance of stress. Currently, there are few models that use genetic information to predict how crops may respond to stress. Using data provided by an agricultural business, a model was developed that can categorically label soybean varieties by their yield response to stress using genetic data. The model clusters varieties based on their yield production in response to stress. The clustering criteria are based on variance distribution and correlation. A logistic regression is then fitted to identify significant gene markers in varieties with minimal yield variance. Such characteristics provide a probabilistic outlook of how certain varieties will perform when planted in different regions. Given changing global climate conditions, this model demonstrates the potential of using data to efficiently develop and grow crops adapted to climate change.

Contributors: Dean, Arlen (Co-author) / Ozcan, Ozkan (Co-author) / Travis, Daniel (Co-author) / Gel, Esma (Thesis director) / Armbruster, Dieter (Committee member) / Parry, Sam (Committee member) / Industrial, Systems and Operations Engineering Program (Contributor) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Big Data Network Analysis of Genetic Variation and Gene Expression in Individuals with Breast Cancer

Description

The advent of big data analytics tools and frameworks has allowed for a plethora of new approaches to research and analysis, making data sets that were previously too large or complex more accessible and providing methods to collect, store, and investigate non-traditional data. These tools are starting to be applied in more creative ways, and are being used to improve upon traditional computation methods through distributed computing. Statistical analysis of expression quantitative trait loci (eQTL) data has classically been performed using the open source tool PLINK, which runs on high performance computing (HPC) systems. However, progress has been made in running the statistical analysis in the ecosystem of the big data framework Hadoop, resulting in decreased run time, reduced storage footprint, reduced job micromanagement, and increased data accessibility. Now that the data can be more readily manipulated, analyzed, and accessed, there are opportunities to use the modularity and power of Hadoop to further process the data. This project focuses on adding a component to the data pipeline that will perform graph analysis on the data. This will provide more insight into the relation between various genetic differences in individuals with breast cancer and the resulting variation, if any, in gene expression. Further, the investigation will examine whether anything can be gained from a shift in perspective: applying tools used in classical networking contexts (such as the Internet) to genetically derived networks.

Contributors: Randall, Jacob Christopher (Author) / Buetow, Kenneth (Thesis director) / Meuth, Ryan (Committee member) / Almalih, Sara (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-12

Introduction to Unstructured Case Management

Description

Unstructured data management is proving increasingly valuable for organizations today as the amount of data they own increases every year. The purpose of this project is to detail the process that ServiceNow and CommonSpirit Health use in developing their new IntelliRoute model, which aims to classify and auto-resolve a significant portion of CommonSpirit Health’s more than 3,000,000 HR service-related cases. This paper examines typical strategies used to manage unstructured data, as well as ServiceNow’s approach. Their approach focuses on data labelling: attaching a criticality sentiment to unstructured data and relating helpful knowledge base articles to it. The labelled data is then used to train an Artificial Intelligence model that automatically labels cases and refers to appropriate knowledge articles.

Contributors: De Waard, Jan (Author) / Bergsagel, Matteo (Co-author) / Chavez-Echeagaray, Maria Elena (Thesis director) / Burns, Christopher (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor)

Created: 2022-05
