Description
Infectious diseases spread at a rapid rate due to the increasing mobility of the human population, so it is important to have a variety of containment and assessment strategies to prevent and limit their spread. In the ongoing COVID-19 pandemic, telehealth services, including daily health surveys, are used to study the prevalence and severity of the disease. Daily health surveys can also help to study the progression and fluctuation of symptoms, as recalling, tracking, and explaining symptoms to doctors can often be challenging for patients. Aggregates of data collected from daily health surveys can be used to identify a surge of a disease in a community. This thesis enhances a well-known boosting algorithm, XGBoost, to predict COVID-19 from the anonymized self-reported survey responses provided by the Carnegie Mellon University (CMU) Delphi research group in collaboration with Facebook. Despite the tremendous COVID-19 surge in the United States, this survey dataset is highly imbalanced, with 84% negative and 16% positive COVID-19 cases. Learning from an imbalanced dataset is difficult, especially when the dataset may also be noisy, as is common in self-reported surveys. This thesis addresses these challenges by enhancing XGBoost with a tunable loss function, α-loss, that interpolates between the exponential loss (α = 1/2), the log-loss (α = 1), and the 0-1 loss (α = ∞). Results show that tuning XGBoost with α-loss can enhance performance over the standard XGBoost with log-loss (α = 1).
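As a rough illustration of how such a tunable loss can be plugged into XGBoost, the sketch below implements α-loss as a custom objective (gradient and Hessian of the loss applied to sigmoid-transformed margins). The derivation and the ALPHA value are illustrative assumptions, not the thesis's exact formulation:

```python
import numpy as np
import xgboost as xgb

ALPHA = 0.8  # hypothetical tuning value; in practice alpha is tuned per dataset

def alpha_loss(preds, dtrain):
    """alpha-loss as a custom XGBoost objective on raw margin scores.
    alpha = 1 recovers log-loss, alpha = 1/2 the exponential loss,
    and alpha -> infinity approaches the 0-1 loss."""
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))        # sigmoid of raw margins
    s = np.where(y == 1, 1.0, -1.0)         # sign of d(p_true)/d(margin)
    t = np.clip(np.where(y == 1, p, 1.0 - p), 1e-12, 1.0)  # prob. of true class
    pq = p * (1.0 - p)
    grad = -s * t ** (-1.0 / ALPHA) * pq
    hess = (t ** (-1.0 / ALPHA - 1.0) / ALPHA) * pq ** 2 \
           - s * t ** (-1.0 / ALPHA) * pq * (1.0 - 2.0 * p)
    return grad, np.maximum(hess, 1e-12)    # clamp Hessian for numerical stability

# Usage sketch: booster = xgb.train(params, dtrain, num_boost_round=200, obj=alpha_loss)
```

A quick sanity check on the implementation: at α = 1 the gradient reduces to p − y, the standard log-loss gradient.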
Contributors: Vikash Babu, Gokulan (Author) / Sankar, Lalitha (Thesis advisor) / Berisha, Visar (Committee member) / Zhao, Ming (Committee member) / Trieu, Ni (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
The proliferation of semantic data in the form of RDF (Resource Description Framework) triples demands efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. Three open issues with distributed RDF data management systems are not well addressed together in existing work: first, query efficiency; second, that existing solutions are optimized for certain query patterns and do not necessarily work well for all types; and third, reducing pre-processing cost. The rapid growth of RDF data therefore raises the need for a partitioning strategy over distributed data management systems that improves SPARQL (SPARQL Protocol and RDF Query Language) query performance regardless of query shape while minimizing pre-processing overhead. In this context, the first contribution of this work is a distributed RDF data partitioning schema called 3CStore that extends the existing VP (Vertical Partitioning) approach by using subsets of triples from the VP tables based on different join correlations. This approach speeds up queries at the cost of additional pre-processing overhead. To address this, a relational partitioning schema called VPExp was developed that splits predicates based on explicit type information of objects. This approach gains significant query performance only for the specific type of query where the object is bound to a value for a particular predicate. To obtain efficient query performance on a wide range of query patterns, an improved solution extends the existing Property Table approach to a Subset-Property Table and combines it with the VP approach. Further investigation of distributed RDF processing and querying systems based on typical use cases led to a novel relational partitioning schema called PTP (Property Table Partitioning) that further partitions the Property Table into one table per unique property to minimize query input size and the number of join operations during query evaluation. Finally, an RDF data management system based on the SPARQL-over-SQL approach, called S3QLRDF, was developed; it generates optimal query execution plans using statistics of the PTP tables to provide efficient SPARQL query processing on a distributed system.
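To make the partitioning ideas concrete, here is a minimal in-memory sketch contrasting Vertical Partitioning with the PTP idea of one partition per unique property; the toy triples and table layout are assumptions for illustration, not S3QLRDF's actual schema:

```python
from collections import defaultdict

# Toy triples: (subject, predicate, object)
triples = [
    ("s1", "name", "Alice"), ("s1", "knows", "s2"),
    ("s2", "name", "Bob"),   ("s2", "age", "30"),
]

# Vertical Partitioning (VP): one two-column (subject, object) table per predicate.
vp = defaultdict(list)
for s, p, o in triples:
    vp[p].append((s, o))

# Property Table: one wide row per subject, one column per property.
prop_table = defaultdict(dict)
for s, p, o in triples:
    prop_table[s][p] = o

# PTP: split the Property Table into one partition per unique property, so a
# query touching property p scans only the rows of subjects that have p.
ptp = defaultdict(list)
for s, props in prop_table.items():
    for p in props:
        ptp[p].append((s, props))

print(ptp["age"])  # only s2's row is scanned, shrinking query input size
```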
Contributors: Hassan, P M Mahmudul (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Davulcu, Hasan (Committee member) / Sarwat Abdelghany Aly Elsayed, Mohamed (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Embedded software differs in many respects from traditional software; as such, a software developer may face issues when attempting to transition from traditional to embedded software development. This thesis explores providing feedback and applying optimizations at the source-code level of embedded software. The aim is to measure the impact of these optimizations on teaching embedded software design principles, as well as to assess the relative success of each optimization in terms of a variety of metrics. Altering code involves many considerations, and the opacity of such changes is a known limitation of most software optimization schemes. By applying optimizations at the source level, the aim is to demonstrate what the optimizations do and how they add value to the resulting software. To fulfill these goals, the Embedded C Source Optimizer was developed, which is used to import and export code, select which optimizations are applied, and provide feedback to the end user. This utility abstracts away the lower-level operations performed by each optimization while conveying the resulting changes to the end user. Since embedded systems are generally quite limited compared to modern computers, someone transitioning from traditional software design to embedded software may find it challenging to understand how to overcome these limitations. Clearly conveying ways to improve a naive implementation of an embedded program helps by demonstrating what changes need to be made to satisfy embedded design rules. The optimizations the utility can apply range from simple replacement operations to more complex implicit uses of built-in hardware peripherals on supported microcontrollers. Each optimization comes with its own set of considerations, risks, and potential level of improvement to the resulting code. These optimization options are evaluated by comparing embedded software before and after each is applied across a variety of metrics, allowing the relative success of each to be determined as effectively as possible. The end goal of this utility is to aid in crossing the hurdle from traditional to embedded software in a comprehensive and educational manner, with the provided optimization options serving as an avenue for teaching embedded concepts.
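As a toy illustration of the replacement-style end of that spectrum (not the Embedded C Source Optimizer's actual implementation), the sketch below rewrites power-of-two modulo operations in C source as bitwise ANDs, a classic strength reduction that avoids costly division on small microcontrollers:

```python
import re

# Matches "identifier % integer-literal" in C source text.
POW2_MOD = re.compile(r"(\w+)\s*%\s*(\d+)")

def reduce_pow2_modulo(c_source: str) -> str:
    """Rewrite 'x % 2^k' as 'x & (2^k - 1)'; leave other moduli untouched."""
    def repl(m):
        var, n = m.group(1), int(m.group(2))
        if n > 0 and (n & (n - 1)) == 0:   # n is a power of two
            return f"{var} & {n - 1}"
        return m.group(0)                   # non-power-of-two: no rewrite
    return POW2_MOD.sub(repl, c_source)

print(reduce_pow2_modulo("idx = counter % 8;"))  # -> "idx = counter & 7;"
```

Note the kind of consideration each optimization carries: this particular rewrite is only safe for unsigned operands, which is exactly the sort of risk such a utility would need to convey to the user.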
Contributors: Lisonbee, Tanner Boyd (Author) / Heinrichs, Robert (Thesis advisor) / Acuna, Ruben (Committee member) / Jordan, Shawn (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Self-driving cars are a long-standing ambition of many AI scientists and engineers. In the last decade alone, self-driving cars such as Google's Waymo, Tesla's Autopilot, and Uber's vehicles have been roaming the streets of many cities. In this rapidly expanding field, researchers all over the world are attempting to develop safer and more efficient AI agents that can navigate through our cities. However, driving is a very complex task to master even for a human, let alone for a robot: it requires attention to, and input from, the car's surroundings, and it is nearly impossible to program all the possible factors affecting this complex task. As a solution, imitation learning was introduced, wherein agents learn a policy mapping observations to actions through demonstrations given by humans. Through imitation learning, one can easily teach self-driving cars the expected behavior in many scenarios. Despite their autonomous nature, humans undeniably play a vital role in the development and execution of safe and trustworthy self-driving cars, and hence form the strongest link in this application of human-robot interaction. Several approaches have incorporated this link between humans and self-driving cars, one of which involves communicating humans' navigational instructions to self-driving cars. This communicative channel gives humans control over the agent's decisions as well as the ability to guide it in real time. This work explores the ability of imitation learning to create a self-driving agent that can follow natural language instructions given by humans based on descriptions of environmental objects. The proposed model architecture is capable of handling latent temporal context in these instructions, making the agent capable of taking multiple decisions along its course. The work shows promising results that push the boundaries of natural language instructions and their complexity in navigating self-driving cars through towns.
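A minimal sketch of what such a language-conditioned imitation policy might look like is shown below; the recurrent encoder, layer sizes, and fusion scheme are assumptions for illustration, not the thesis's architecture:

```python
import torch
import torch.nn as nn

class InstructionConditionedPolicy(nn.Module):
    """Toy behavioral-cloning policy conditioned on a natural language
    instruction; a GRU keeps temporal context across instruction tokens."""
    def __init__(self, vocab_size=1000, obs_dim=64, hidden=128, n_actions=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 32)
        self.gru = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden + obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, tokens, obs):
        _, h = self.gru(self.embed(tokens))      # h: (layers, batch, hidden)
        fused = torch.cat([h[-1], obs], dim=-1)  # fuse language with observation
        return self.head(fused)                  # action logits

policy = InstructionConditionedPolicy()
logits = policy(torch.randint(0, 1000, (2, 12)), torch.randn(2, 64))
# Trained by imitation: cross-entropy between logits and expert actions.
```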
Contributors: Moudhgalya, Nithish B (Author) / Amor, Heni Ben (Thesis advisor) / Baral, Chitta (Committee member) / Yang, Yezhou (Committee member) / Zhang, Wenlong (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Traditionally, databases have been categorized as either row-oriented or column-oriented. Row-oriented databases store each row of a table's data contiguously on disk, whereas column-oriented databases store each column's data contiguously on disk. In recent years, columnar database management systems have become increasingly popular because deep and narrow queries run faster on them; column-oriented databases are thus highly optimized for analytical (OLAP) workloads (Mike Freedman 2019). That is why they are frequently used in business intelligence (BI), data warehouses, and similar settings that involve large data sets, intensive queries, and aggregate computation. As the size of data keeps growing, efficient compression becomes an important consideration for these databases, both to optimize storage and to improve query performance. Since column-oriented databases store data of the same type contiguously, most modern compression techniques achieve better compression ratios on them than on row-oriented databases. This thesis introduces a new compression technique for column-oriented databases called SA128, a multi-stage technique that performs a column-wise compression followed by a table-wide compression of database tables. In the first stage, SA128 analyzes the characteristics of the data (such as data type and distribution) and determines which combination of lossless compression algorithms would yield the best compression ratio. In the second stage, SA128 uses an entropy-encoding technique such as rANS (Duda, J., 2013) to further improve the compression ratio.
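The first-stage idea can be sketched as follows, using trial compression with standard-library codecs as a stand-in for SA128's type- and distribution-driven analysis (the codec set and column data are assumptions, and the rANS stage is not shown):

```python
import bz2, lzma, zlib

CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}

def best_codec_per_column(columns):
    """Pick, per column, whichever lossless codec yields the smallest output.
    Column-wise choice works because each column holds one data type."""
    choice = {}
    for name, values in columns.items():
        raw = "\x00".join(map(str, values)).encode()
        choice[name] = min(CODECS, key=lambda c: len(CODECS[c](raw)))
    return choice

cols = {"price": [9.99] * 500, "city": ["Tempe", "Mesa"] * 250}
print(best_codec_per_column(cols))  # e.g. {'price': 'zlib', 'city': 'lzma'}
```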
Contributors: Anand, Sukhpreet Singh (Author) / Bansal, Ajay (Thesis advisor) / Heinrichs, Robert R (Committee member) / Gonzalez-Sanchez, Javier (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Student retention is a critical metric for many universities whose intention is to support student success. The goal of this thesis is to create retention models using machine learning (ML) techniques, with factors limited to those known during the admissions process. The models have two goals: first, to correctly predict as many non-returning students as possible while minimizing the number of students falsely predicted as non-returning; second, to identify important features in student retention and provide a practical explanation for a student's decision not to persist. The models are then used to provide outreach to students who need more support. The findings indicate that the current top-performing model is AdaBoost, which successfully predicts non-returning students with an accuracy of 54 percent.
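A minimal sketch of this kind of setup, using scikit-learn's AdaBoost on synthetic stand-in data (the real admissions-time features are not reproduced here, and the threshold value is illustrative), might look like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for admissions-time features; y = 1 marks a
# non-returning student, the minority class.
X, y = make_classification(n_samples=5000, n_features=12,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Lowering the decision threshold catches more at-risk students at the cost
# of more false alarms, matching the thesis's dual objective.
probs = model.predict_proba(X_te)[:, 1]
flagged_for_outreach = probs > 0.35  # threshold tuned on validation data in practice
```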
Contributors: Wade, Alexis N (Author) / Gel, Esma (Thesis advisor) / Yan, Hao (Thesis advisor) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)
Created: 2021
Description

Standardization is sorely lacking in the field of musical machine learning. This thesis project endeavors to contribute to this standardization by training three machine learning models on the same dataset and comparing them using the same metrics. The music-specific metrics utilized provide more relevant information for diagnosing the shortcomings of each model.

Contributors: Hilliker, Jacob (Author) / Li, Baoxin (Thesis director) / Libman, Jeffrey (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created: 2021-12
Description
Systematic Reviews (SRs) aim to synthesize the totality of evidence for clinical practice and are important in making clinical practice guidelines and health policy decisions. However, conducting SRs manually is a laborious and time-consuming process, a challenge that is growing with the increasing number of databases to search and papers being published. Hence, automation of SRs is an essential task. The goal of this thesis work is to develop Natural Language Processing (NLP)-based classifiers that automate title- and abstract-based screening for clinical SRs based on inclusion/exclusion criteria. In clinical SRs, high sensitivity is a key requirement. Most existing methods for SRs use binary classification systems trained on labeled data to predict inclusion/exclusion. While previous studies have shown that NLP-based classification methods can automate title- and abstract-based screening for SRs, methods for achieving high sensitivity have not been empirically studied. In addition, the binary classification training strategy has several limitations: (1) it ignores the inclusion/exclusion criteria, (2) it lacks generalization ability, (3) it suffers when labeled data are scarce, and (4) it fails to achieve reasonable precision at high sensitivity levels. This thesis work contributes to several aspects of the clinical systematic review domain. First, it presents an empirical study of NLP-based supervised text classification and high-sensitivity methods on datasets developed from six different SRs in the clinical domain. Second, it provides a novel approach that views SR screening as a Question Answering (QA) problem in order to overcome the limitations of the binary classification training strategy, and proposes a more general abstract-screening model for different SRs. Finally, it provides a new QA-based dataset for six different SRs, which is made available to the community.
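One simple way to realize the high-sensitivity requirement in a baseline screener is to choose the decision threshold on validation data so that recall on included studies meets a target; the sketch below uses a TF-IDF plus logistic regression baseline on toy data, which is an assumption for illustration, not the thesis's models:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in corpus; real inputs are titles and abstracts labeled
# include/exclude for one systematic review.
abstracts = ["randomized trial of drug X", "mouse study of pathway Y",
             "cohort study of drug X outcomes", "review of unrelated topic"]
labels = np.array([1, 0, 1, 0])  # 1 = include

X = TfidfVectorizer().fit_transform(abstracts)
clf = LogisticRegression().fit(X, labels)

def threshold_for_sensitivity(clf, X_val, y_val, target_recall=0.99):
    """Highest threshold whose recall on includes meets the target; in SRs,
    missing a relevant study costs far more than reading an extra abstract."""
    probs = clf.predict_proba(X_val)[:, 1]
    for thr in sorted(set(probs), reverse=True):
        preds = probs >= thr
        if preds[y_val == 1].mean() >= target_recall:
            return thr
    return 0.0

print(threshold_for_sensitivity(clf, X, labels))
```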
Contributors: Parmar, Mihir Prafullsinh (Author) / Baral, Chitta (Thesis advisor) / Devarakonda, Murthy (Thesis advisor) / Riaz, Irbaz B (Committee member) / Arizona State University (Publisher)
Created: 2021