Description
This work challenges the conventional perceptions surrounding the utility and use of the CMS Open Payments data. Following an exploratory analysis of the 2014 research dataset, I suggest previously unconsidered methodologies for extracting meaningful information from these data that, in turn, enhance their value as a public good. The research dataset is favored for analysis over the general payments dataset because generating transparency in the pharmaceutical and medical device R&D process is believed to offer the greatest benefit to public health. This dataset has been largely ignored by analysts, and this may be one of the few works to accomplish a comprehensive exploratory analysis of these data. If we are to extract valuable information from it, we must both alter our approach and re-conceptualize the questions that we ask. Adopting the theoretical framework of complex systems serves as the foundation for our interpretation of the research dataset. This framework, in conjunction with a methodological toolkit for network analysis, may set a precedent for the development of alternative perspectives that allow for novel interpretations of the information that big data attempts to convey. By proposing such a perspective, it becomes possible to gain insight into the emergent dynamics of the collaborative relationships established during the pharmaceutical and medical device R&D process.
Created: 2016-05
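The network-analysis framing described in the abstract above can be made concrete with a small sketch. The following is a minimal, hypothetical example of building a payer-recipient collaboration network from Open Payments research records with pandas and networkx; the file name and the column names ("payer", "recipient", "amount") are placeholders rather than the official CMS field names, and this is not the author's actual pipeline.

```python
import pandas as pd
import networkx as nx

# Hypothetical input: one row per research payment record.
payments = pd.read_csv("research_payments_2014.csv")  # placeholder file name

# Build a weighted payer-recipient network, summing repeated payments.
G = nx.Graph()
for _, row in payments.iterrows():
    payer, recipient = row["payer"], row["recipient"]  # placeholder columns
    if G.has_edge(payer, recipient):
        G[payer][recipient]["weight"] += row["amount"]
    else:
        G.add_edge(payer, recipient, weight=row["amount"])

# Emergent structure: how fragmented is the collaboration network?
components = sorted(nx.connected_components(G), key=len, reverse=True)
print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges, "
      f"largest component holds {len(components[0])} nodes")
```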
Description
Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends that leverages previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and the model successfully generalizes to identify 74.3% of trends in the time period of interest. It also helps explain why particular words trend on Twitter.
Contributors: Marshall, Grant A (Author) / Liu, Huan (Thesis director) / Morstatter, Fred (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2015-05
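The pairing of k-nearest-neighbors with dynamic time warping described in the abstract above lends itself to a compact illustration. The sketch below runs against toy hourly hashtag counts rather than the author's Twitter data, and shows one plausible way to combine a plain DTW distance with a k-nearest-neighbor majority vote; the series, labels, and parameter values are illustrative assumptions, not the thesis's model.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series (plain DP)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def knn_dtw_predict(train_series, train_labels, query, k=3):
    """Label a query series by majority vote among its k DTW-nearest neighbors."""
    dists = [dtw_distance(query, s) for s in train_series]
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy usage: rising vs. flat hourly counts for candidate trending words.
train = [np.array([1, 2, 4, 8, 16]), np.array([3, 3, 2, 3, 3]),
         np.array([1, 3, 6, 12, 20]), np.array([5, 4, 5, 5, 4])]
labels = ["trend", "no-trend", "trend", "no-trend"]
print(knn_dtw_predict(train, labels, np.array([2, 3, 7, 13, 22]), k=3))
```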
Description
Investment real estate is unique among similar financial instruments by nature of each property's internal complexities and interaction with the external economy. Where a majority of tradable assets are static goods within a dynamic market, real estate investments are dynamic goods within a dynamic market. Furthermore, investment real estate, particularly commercial property, not only interacts with the surrounding economy, it reflects it. Alive with tenancy, each and every commercial investment property provides a microeconomic view of the businesses that make up the local economy. Management of commercial investment real estate captures this economic snapshot in a unique abundance of untapped statistical data. While analysis of such data is undeniably valuable, the effort involved in this process is time consuming. Given this unutilized potential, our team has developed proprietary software to analyze this data and communicate the results automatically through an easy-to-use interface. We have worked with a local real estate property management and ownership firm, Reliance Management, to develop this system using their current, historical, and future data. Our team has also built a relationship with the executives of Reliance Management to review the functionality and pertinence of the system we have dubbed the Reliance Dashboard.
Contributors: Burton, Daryl (Co-author) / Workman, Jack (Co-author) / LePine, Marcie (Thesis director) / Atkinson, Robert (Committee member) / Barrett, The Honors College (Contributor) / Department of Finance (Contributor) / Department of Management (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2015-05
Description
In the era of big data, the impact of information technologies on organizational performance is growing as unstructured data becomes increasingly important to business intelligence. Daily data gives businesses opportunities to respond to changing markets. As a result, many companies invest heavily in big data in order to obtain favorable outcomes. In particular, analysis of commercial websites may reveal relations among different parties in digital markets that hold great value for businesses. However, complex e-commerce sites present significant challenges for novice web analysts. While some resources and tutorials on web analysis are available, some learners, especially entry-level analysts, still struggle to get satisfying results. Thus, I am interested in developing a computer program in the Python programming language for investigating the relation between sellers' listings and their seller levels in a darknet market. To investigate this relation, I couple web data retrieval techniques with doc2vec, a machine learning algorithm. This approach not only allows me to analyze the potential relation between sellers' listings and reputations in the context of darknet markets, but can also assist other users of business intelligence with similar analyses of online markets. I present several conclusions found through the analysis. Key findings suggest that no relation exists between the similarities of different sellers' listings and their seller levels in rsClub Market. This study can serve as a unique example of web analysis and create potential value for modern enterprises.
Contributors: Wang, Zhen (Author) / Benjamin, Victor (Thesis director) / Santanam, Raghu (Committee member) / Barrett, The Honors College (Contributor)
Created: 2018-05
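The coupling of web data retrieval with doc2vec described in the abstract above can be sketched briefly. Below is a minimal gensim (4.x API) example that trains Doc2Vec on a handful of invented listing strings and compares seller-level document vectors by cosine similarity; the seller ids and listing text are illustrative placeholders, not data from rsClub Market, and this is not the author's program.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

# Invented listings keyed by placeholder seller ids; real input would be
# the scraped listing text described in the abstract.
listings = {
    "seller_a": "bulk research supplies fast shipping discreet packaging",
    "seller_b": "premium quality fast discreet shipping worldwide",
    "seller_c": "digital goods accounts instant delivery",
}

docs = [TaggedDocument(simple_preprocess(text), [seller])
        for seller, text in listings.items()]

model = Doc2Vec(vector_size=50, min_count=1, epochs=100)
model.build_vocab(docs)
model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)

# Cosine similarity between seller-level document vectors.
print(model.dv.similarity("seller_a", "seller_b"))
print(model.dv.similarity("seller_a", "seller_c"))
```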
Description
As the use of Big Data gains momentum and transitions into mainstream adoption, marketers are racing to generate valuable insights that can inform strategic business decisions. The retail market is a fiercely competitive industry, and the rapid adoption of smartphones and tablets has led e-commerce rivals to grow at a remarkable rate. Retailers are able to collect and analyze data from both their physical stores and e-commerce platforms, placing them in a unique position to fully capitalize on the power of Big Data. This thesis is an examination of Big Data and how marketers can use it to create better experiences for consumers. Insights generated from the use of Big Data can result in increased customer engagement, loyalty, and retention for an organization. Businesses of all sizes, from enterprises to small-to-midsize firms and even purely e-commerce organizations, have successfully implemented Big Data technology. However, there are challenges, as well as ethical and legal concerns, that need to be addressed as the world continues to adopt Big Data analytics and insights. With the abundance of data collected in today's digital world, marketers must take advantage of available resources to improve the overall customer experience.
Contributors: Haghgoo, Sam (Author) / Ostrom, Amy (Thesis director) / Giles, Bret (Committee member) / Barrett, The Honors College (Contributor) / Department of Marketing (Contributor) / W. P. Carey School of Business (Contributor) / Department of Management (Contributor)
Created: 2014-05
Description
Through the advancement of technology, social media, and society's ever-growing connectedness with the digital world, the automotive industry's market paradigm has been uprooted and turned on its head. There is a global race to be the first company to achieve a truly autonomous vehicle, and one of the major testing grounds is the state of Arizona. The technology is still under development, and there are many challenges and snags, such as the need for big data, that companies are encountering along the way. A smart city could share the necessary data with driverless vehicles, and the back-and-forth communication between cars and cities could provide the context and understanding needed to bring the promise of safer driving to life. Currently, companies are tight-lipped about their research and development, so governments are struggling to manage the upcoming changes with so little information. The challenge is how to deal with newly emerging inventions, such as autonomous cars, that managers have not yet figured out. This thesis covers the difficulties governments and companies will face when attempting to integrate driverless cars and smart cities into their infrastructure: public approval, legislation, infrastructure reforms, and communication between municipalities and corporations. Through a survey conducted specifically for this thesis, interviews with government officials and corporate managers, and additional research, it provides clearer insight into the situation and offers recommendations for managers and governments alike.
Contributors: Stone, Mindi (Author) / Lynch, Patrick (Thesis director) / Nelson, Roy C. (Committee member) / Thunderbird School of Global Management (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description
This paper explores the ability to predict soybean yields based on genetics and environmental factors. Based on the biology of soybeans, it has been shown that yields are best when soybeans grow within a certain temperature range. An event in which a soybean is exposed to temperatures outside this accepted range is labeled as an instance of stress. Currently, there are few models that use genetic information to predict how crops may respond to stress. Using data provided by an agricultural business, a model was developed that can categorically label soybean varieties by their yield response to stress using genetic data. The model clusters varieties based on their yield production in response to stress; the clustering criteria are based on variance distribution and correlation. A logistic regression is then fitted to identify significant gene markers in varieties with minimal yield variance. Such characteristics provide a probabilistic outlook on how certain varieties will perform when planted in different regions. Given changing global climate conditions, this model demonstrates the potential of using data to efficiently develop and grow crops adapted to a changing climate.
Contributors: Dean, Arlen (Co-author) / Ozcan, Ozkan (Co-author) / Travis, Daniel (Co-author) / Gel, Esma (Thesis director) / Armbruster, Dieter (Committee member) / Parry, Sam (Committee member) / Industrial, Systems and Operations Engineering Program (Contributor) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05
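The two-stage method summarized in the abstract above (cluster varieties on their yield response, then regress stability on gene markers) can be illustrated with synthetic stand-ins. The sketch below uses K-Means on simple mean/variance features and scikit-learn's LogisticRegression; the data, feature choices, and cluster count are assumptions made for illustration, not the proprietary dataset or the thesis's exact clustering criteria.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic placeholders: per-variety yield observations under stress and
# binary gene markers. The real data came from the agricultural partner.
n_varieties, n_trials, n_markers = 200, 12, 30
yields = rng.normal(50, rng.uniform(1, 10, n_varieties)[:, None], (n_varieties, n_trials))
markers = rng.integers(0, 2, (n_varieties, n_markers))

# Step 1: cluster varieties on yield-response statistics (mean and variance
# stand in here for the thesis's variance-distribution/correlation criteria).
features = np.column_stack([yields.mean(axis=1), yields.var(axis=1)])
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

# Step 2: call the lowest-variance cluster "stable" and fit a logistic
# regression of stability on the gene markers.
variance_by_cluster = [yields[clusters == c].var(axis=1).mean() for c in range(3)]
stable = (clusters == int(np.argmin(variance_by_cluster))).astype(int)
logit = LogisticRegression(max_iter=1000).fit(markers, stable)

# Markers with the largest absolute coefficients are candidate significant genes.
top = np.argsort(np.abs(logit.coef_[0]))[::-1][:5]
print("candidate markers:", top)
```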
Description
The advent of big data analytics tools and frameworks has allowed for a plethora of new approaches to research and analysis, making data sets that were previously too large or complex more accessible and providing methods to collect, store, and investigate non-traditional data. These tools are starting to be applied in more creative ways and are being used to improve upon traditional computation methods through distributed computing. Statistical analysis of expression quantitative trait loci (eQTL) data has classically been performed using the open-source tool PLINK, which runs on high-performance computing (HPC) systems. However, progress has been made in running the statistical analysis in the ecosystem of the big data framework Hadoop, resulting in decreased run time, reduced storage footprint, reduced job micromanagement, and increased data accessibility. Now that the data can be more readily manipulated, analyzed, and accessed, there are opportunities to use the modularity and power of Hadoop to further process the data. This project focuses on adding a component to the data pipeline that performs graph analysis on the data. This will provide more insight into the relation between various genetic differences in individuals with breast cancer and the resulting variation, if any, in gene expression. Further, the investigation will examine whether anything can be garnered from a perspective shift: applying tools used in classical networking contexts (such as the Internet) to genetically derived networks.
Contributors: Randall, Jacob Christopher (Author) / Buetow, Kenneth (Thesis director) / Meuth, Ryan (Committee member) / Almalih, Sara (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-12
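The graph-analysis component described in the abstract above can be sketched with networkx once a statistical stage has produced variant-gene associations. The example below builds a small bipartite network from invented association tuples and ranks nodes by degree centrality, in the spirit of applying network metrics from classical networking contexts to genetic data; the variant ids, gene names, and scores are placeholders, not results from the breast-cancer eQTL data or the Hadoop pipeline.

```python
import networkx as nx

# Invented (variant, gene, association score) tuples standing in for the
# output of the statistical eQTL step described above.
associations = [
    ("rs001", "GENE_A", 7.2), ("rs001", "GENE_B", 5.1),
    ("rs002", "GENE_B", 6.4), ("rs003", "GENE_C", 4.8),
    ("rs004", "GENE_A", 3.9), ("rs004", "GENE_C", 5.5),
]

# Bipartite variant-gene network weighted by association strength.
G = nx.Graph()
for variant, gene, score in associations:
    G.add_edge(variant, gene, weight=score)

# A metric borrowed from classical network analysis: high-degree nodes are
# hubs shared across many variant-gene associations.
centrality = nx.degree_centrality(G)
for node, value in sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(node, round(value, 3))
```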
Description

In recent years, biological research and clinical healthcare have been disrupted by the ability to retrieve vast amounts of information pertaining to an organism's health and biological systems. From increasingly accessible wearables collecting real-time biometric data to cutting-edge high-throughput biological sequencing methodologies providing snapshots of an organism's molecular profile, biological data is rapidly increasing in its prevalence. As more biological data continues to be harvested, artificial intelligence and machine learning are well positioned to aid in leveraging this big data for breakthrough scientific outcomes and revolutionized medical care.

The coming decade's intersection between biology and computational science will be ripe with opportunities to utilize biological big data to advance human health and mitigate disease. Standardization, aggregation, and centralization of this biological data will be critical to drawing novel scientific insights that lead to a more robust understanding of disease etiology and therapeutic avenues. Future development of cheaper, more accessible molecular sensing technology, in conjunction with the emergence of more precise wearables, will pave the road to a truly personalized and preventative healthcare system. However, with these vast opportunities come significant threats. As biological big data advances, privacy and security concerns may hinder society's adoption of these technologies and subsequently dampen the positive impacts this information can have. Moreover, the openness of biological data poses a national security threat, given that the data can be used to identify medical vulnerabilities in a population, highlighting the dual-use implications of biological big data.

Additional factors to be considered by academia, private industry, and defense include the ongoing relationship between science and society at large, as well as the political and social dimensions surrounding the public's trust in science. Organizations that seek to contribute to the future of biological big data must also remain vigilant to equity, representation, and bias in their data sets and data processing techniques. Finally, the positive impacts of biological big data rest on a foundation of responsible innovation, as these emerging technologies do not operate in standalone fashion but rather form a complex ecosystem.

Contributors: Dave, Nikhil (Author) / Johnson, Brian David (Thesis director) / Dudley, Sean (Committee member) / Levinson, Rachel (Committee member) / School for the Future of Innovation in Society (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2021-05
Description
As Big Data becomes more relevant, existing grouping and clustering algorithms will need to be evaluated for their effectiveness with large amounts of data. Previous work in Similarity Grouping proposes a possible alternative to existing data analytics tools that acts as a hybrid between fast grouping and insightful clustering. We, the SimCloud Team, proposed Distributed Similarity Group-by (DSG), a distributed implementation of Similarity Group-by. Experimental results show that DSG is effective at generating meaningful clusters and has a lower runtime than K-Means, a commonly used clustering algorithm. This document presents my personal contributions to this team effort. These contributions include the multi-dimensional synthetic data generator, execution of the Increasing Scale Factor experiment, and presentations at the NCURIE Symposium and the SISAP 2019 Conference.
Contributors: Wallace, Xavier Guillermo (Author) / Silva, Yasin (Thesis director) / Kuai, Xu (Committee member) / School for the Future of Innovation in Society (Contributor) / School of Mathematical and Natural Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-12
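One contribution named in the abstract above, the multi-dimensional synthetic data generator used in the Increasing Scale Factor experiment, is easy to sketch. The example below draws Gaussian clusters around random centers and grows the point count linearly with the scale factor; the function name, parameters, and structure are assumptions for illustration and do not reproduce the SimCloud team's actual generator.

```python
import numpy as np

def generate_synthetic_data(scale_factor, n_clusters=5, dims=3,
                            points_per_cluster=1000, spread=0.5, seed=0):
    """Generate multi-dimensional points around random cluster centers.

    The number of points grows linearly with scale_factor, mirroring an
    increasing-scale-factor experiment; all parameter values are illustrative.
    """
    rng = np.random.default_rng(seed)
    centers = rng.uniform(0, 100, size=(n_clusters, dims))
    points = []
    for center in centers:
        n = points_per_cluster * scale_factor
        points.append(rng.normal(loc=center, scale=spread, size=(n, dims)))
    return np.vstack(points)

# Usage: datasets of growing size for timing a grouping algorithm against K-Means.
for sf in (1, 2, 4):
    data = generate_synthetic_data(sf)
    print(sf, data.shape)
```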