Data Analytics to Identify the Genetic Basis for Resilience to Temperature Stress in Soybeans

Description

This paper explores the ability to predict soybean yields from genetic and environmental factors. Soybean biology shows that yields are best when the plants grow within a certain temperature range. Any event in which a soybean is exposed to temperatures outside its accepted range is labeled an instance of stress. Currently, few models use genetic information to predict how crops may respond to stress. Using data provided by an agricultural business, a model was developed that can categorically label soybean varieties by their yield response to stress using genetic data. The model clusters varieties based on their yield in response to stress; the clustering criteria are based on variance distribution and correlation. A logistic regression is then fitted to identify significant gene markers in varieties with minimal yield variance. Such characteristics provide a probabilistic outlook on how certain varieties will perform when planted in different regions. Given changing global climate conditions, this model demonstrates the potential of using data to efficiently develop and grow crops adapted to climate change.
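The pipeline described above (flag stress events, group varieties by how much their yield varies under stress, then relate the stable group to gene markers with a logistic regression) can be sketched roughly as follows. This is an illustration only, not the authors' model: the temperature bounds, the toy observations, the marker values, and the median-variance split standing in for the paper's clustering criteria are all assumptions made for the example.

```python
import math
from statistics import median, pvariance

# Hypothetical temperature bounds (deg C); the abstract does not state the real range.
TEMP_LOW, TEMP_HIGH = 20.0, 30.0

def is_stress(temp_c):
    """An observation counts as stress when the temperature is outside the range."""
    return temp_c < TEMP_LOW or temp_c > TEMP_HIGH

def yield_variance_under_stress(observations):
    """observations: iterable of (variety, temp_c, yield). Returns {variety: variance}."""
    by_variety = {}
    for variety, temp_c, yld in observations:
        if is_stress(temp_c):
            by_variety.setdefault(variety, []).append(yld)
    # Varieties with fewer than two stress observations are skipped.
    return {v: pvariance(ys) for v, ys in by_variety.items() if len(ys) >= 2}

def label_stable(variances):
    """Assumed stand-in for clustering: 1 if variance is at or below the median, else 0."""
    cutoff = median(variances.values())
    return {v: int(var <= cutoff) for v, var in variances.items()}

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain gradient-descent logistic regression; returns [bias, w1, ..., wk]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Toy data, invented for illustration: (variety, temperature in deg C, yield).
observations = [
    ("A", 33, 50), ("A", 35, 51), ("A", 17, 49), ("A", 25, 55),
    ("B", 34, 48), ("B", 16, 48), ("B", 36, 49), ("B", 24, 54),
    ("C", 33, 40), ("C", 35, 55), ("C", 15, 30), ("C", 26, 56),
    ("D", 34, 35), ("D", 18, 60), ("D", 37, 45), ("D", 25, 57),
]
# Hypothetical binary gene markers per variety: [marker_1, marker_2].
markers = {"A": [1, 0], "B": [1, 1], "C": [0, 1], "D": [0, 0]}

labels = label_stable(yield_variance_under_stress(observations))
varieties = sorted(labels)
weights = fit_logistic([markers[v] for v in varieties], [labels[v] for v in varieties])
```

In this toy setup, a large positive weight on a marker suggests it is associated with the low-variance group. A real analysis would also need significance testing and far more data than four varieties.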

Contributors: Dean, Arlen (Co-author) / Ozcan, Ozkan (Co-author) / Travis, Daniel (Co-author) / Gel, Esma (Thesis director) / Armbruster, Dieter (Committee member) / Parry, Sam (Committee member) / Industrial, Systems and Operations Engineering Program (Contributor) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

The Future of Biological Big Data

Description

In recent years, biological research and clinical healthcare have been disrupted by the ability to retrieve vast amounts of information pertaining to an organism’s health and biological systems. From increasingly accessible wearables collecting real-time biometric data to cutting-edge high-throughput biological sequencing methodologies providing snapshots of an organism’s molecular profile, biological data is rapidly becoming more prevalent. As more biological data continues to be harvested, artificial intelligence and machine learning are well positioned to aid in leveraging this big data for breakthrough scientific outcomes and revolutionized medical care.

The coming decade’s intersection between biology and computational science will be ripe with opportunities to utilize biological big data to advance human health and mitigate disease. Standardization, aggregation and centralization of this biological data will be critical to drawing novel scientific insights that will lead to a more robust understanding of disease etiology and therapeutic avenues. Future development of cheaper, more accessible molecular sensing technology, in conjunction with the emergence of more precise wearables, will pave the road to a truly personalized and preventative healthcare system. However, with these vast opportunities come significant threats. As biological big data advances, privacy and security concerns may hinder society’s adoption of these technologies and subsequently dampen the positive impacts this information can have on society. Moreover, the openness of biological data poses a national security threat, given that this data can be used to identify medical vulnerabilities in a population, highlighting the dual-use implications of biological big data.

Additional factors to be considered by academia, private industry, and defense include the ongoing relationship between science and society at large, as well as the political and social dimensions surrounding the public’s trust in science. Organizations that seek to contribute to the future of biological big data must also remain vigilant about equity, representation and bias in their data sets and data processing techniques. Finally, the positive impacts of biological big data rest on a foundation of responsible innovation, as these emerging technologies do not operate in standalone fashion but rather form a complex ecosystem.

Contributors: Dave, Nikhil (Author) / Johnson, Brian David (Thesis director) / Dudley, Sean (Committee member) / Levinson, Rachel (Committee member) / School for the Future of Innovation in Society (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Big Data Generator and Evaluation of a Similarity Grouping Operator

Description

As Big Data becomes more relevant, existing grouping and clustering algorithms will need to be evaluated for their effectiveness with large amounts of data. Previous work in Similarity Grouping proposes a possible alternative to existing data analytics tools, which acts as a hybrid between fast grouping and insightful clustering. We, the SimCloud Team, proposed Distributed Similarity Group-by (DSG), a distributed implementation of Similarity Group By. Experimental results show that DSG is effective at generating meaningful clusters and has a lower runtime than K-Means, a commonly used clustering algorithm. This document presents my personal contributions to this team effort. The contributions include the multi-dimensional synthetic data generator, execution of the Increasing Scale Factor experiment, and presentations at the NCURIE Symposium and the SISAP 2019 Conference.
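To give a feel for what a multi-dimensional synthetic data generator for scale-factor experiments might look like, here is a minimal sketch. The abstract does not describe the actual generator's design, so the function name, its parameters (`spread`, `scale_factor`, `seed`), and the choice of Gaussian noise around uniformly placed centers are assumptions for illustration only.

```python
import random

def generate_clusters(num_clusters, points_per_cluster, dims,
                      spread=0.5, scale_factor=1, seed=0):
    """Return a list of (cluster_id, point) records.

    Each cluster gets a random center in [0, 100]^dims, and its points are
    drawn from a Gaussian around that center. scale_factor multiplies the
    number of points per cluster, mimicking an increasing-scale experiment.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    records = []
    for cluster_id in range(num_clusters):
        center = [rng.uniform(0.0, 100.0) for _ in range(dims)]
        for _ in range(points_per_cluster * scale_factor):
            point = [rng.gauss(c, spread) for c in center]
            records.append((cluster_id, point))
    return records

# Same configuration at two hypothetical scale factors.
small = generate_clusters(3, 10, dims=4, scale_factor=1, seed=42)
large = generate_clusters(3, 10, dims=4, scale_factor=5, seed=42)
```

Keeping the true `cluster_id` alongside each point makes it possible to check afterward whether a grouping operator recovers the intended clusters.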

Contributors: Wallace, Xavier Guillermo (Author) / Silva, Yasin (Thesis director) / Kuai, Xu (Committee member) / School for the Future of Innovation in Society (Contributor) / School of Mathematical and Natural Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12