This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about each dissertation or thesis includes degree information, committee members, an abstract, and any supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Description
Artificial intelligence, one of the hottest research topics today, is driven largely by data. Data is undoubtedly king in the age of AI. However, naturally occurring high-quality data is precious and rare, so data processing is almost always required to obtain enough suitable data to support AI tasks. Worse still, data preprocessing tasks are often tedious and demand substantial human labor. Statistics show that data scientists spend 70% to 80% of their time on data integration. Among the various reasons, schema changes, which are common in data warehouses, are a significant obstacle to automating the end-to-end data integration process. Traditional data integration applications rely on data processing operators such as join, union, and aggregation. These operators are fragile and easily broken by schema changes. Whenever a schema changes, the data integration application requires human intervention to resolve the resulting interruptions and downtime. Industry and data scientists alike need a new mechanism for handling schema changes in data integration tasks. This work proposes a new direction for data integration applications based on deep learning models. The data integration problem is defined in the scenario of integrating tabular data with natural schema changes, using a cell-based data abstraction. In addition, data augmentation and adversarial learning are investigated to improve model robustness to schema changes. Experiments are conducted on two real-world data integration scenarios, and the results demonstrate the effectiveness of the proposed approach.
Contributors: Wang, Zijie (Author) / Zou, Jia (Thesis advisor) / Baral, Chitta (Committee member) / Candan, K. Selcuk (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
T-cells are an integral component of the immune system, enabling the body to distinguish between pathogens and the self. The primary mechanism that enables this is the T-cell receptor (TCR), which binds to antigen epitopes foreign to the body. This detection mechanism allows the T-cell to determine when an immune response is necessary. The computational prediction of TCR-epitope binding is important to researchers both for medical applications and for furthering their understanding of the biological mechanisms that affect immunity. Models that have been developed for this purpose fail to account for the interrelationships between amino acids and demonstrate poor out-of-sample performance. Small changes to the amino acids in these protein sequences can drastically change their structure and function. In recent years, attention-based deep learning models have shown success in learning rich contextual representations of data. To capture the contextual biological relationships between the amino acids, a multi-head self-attention model was created to predict the binding affinity between given TCR and epitope sequences. By learning the structural nuances of the sequences, this model improves upon the performance of existing models and grants insights into the underlying mechanisms that affect binding.
Contributors: Cai, Michael Ray (Author) / Lee, Heewook (Thesis advisor) / Bang, Seojin (Committee member) / Baral, Chitta (Committee member) / Arizona State University (Publisher)
Created: 2021