Filtering by
The first topic of this dissertation focuses on the data clustering. Data clustering is often the first step for analyzing a dataset without the label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging, yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner.
The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability.
The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification with emphasis on both the accuracy and the interpretation. GCRNN contains a convolutional network component to extract high-level features, and a recurrent network component to enhance the modeling of the temporal characteristics. A feed-forward fully connected network with the sparse group lasso regularization is used to generate the final classification and provide good interpretability.
The last topic centers around the dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making and pattern visualization for time series data. The CRNN autoencoder is proposed to not only achieve low reconstruction error, but also generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.
Breast cancer is one of the most common types of cancer worldwide. Early detection and diagnosis are crucial for improving the chances of successful treatment and survival. In this thesis, many different machine learning algorithms were evaluated and compared to predict breast cancer malignancy from diagnostic features extracted from digitized images of breast tissue samples, called fine-needle aspirates. Breast cancer diagnosis typically involves a combination of mammography, ultrasound, and biopsy. However, machine learning algorithms can assist in the detection and diagnosis of breast cancer by analyzing large amounts of data and identifying patterns that may not be discernible to the human eye. By using these algorithms, healthcare professionals can potentially detect breast cancer at an earlier stage, leading to more effective treatment and better patient outcomes. The results showed that the gradient boosting classifier performed the best, achieving an accuracy of 96% on the test set. This indicates that this algorithm can be a useful tool for healthcare professionals in the early detection and diagnosis of breast cancer, potentially leading to improved patient outcomes.
(LC-MS/MS) is used to identify and quantify peptides and proteins. LC-MS/MS produces mass spectra, which must be searched by one or more engines, which employ
algorithms to match spectra to theoretical spectra derived from a reference database.
These engines identify and characterize proteins and their component peptides. By
training a convolutional neural network on a dataset of over 6 million MS/MS spectra
derived from human proteins, we aim to create a tool that can quickly and effectively
identify spectra as peptides prior to database searching. This can significantly reduce search space and thus run time for database searches, thereby accelerating LCMS/MS-based proteomics data acquisition. Additionally, by training neural networks
on labels derived from the search results of three different database search engines, we
aim to examine and compare which features are best identified by individual search
engines, a neural network, or a combination of these.
This thesis is part of a collaboration between ASU’s Interactive Robotics Laboratory and NASA’s Jet Propulsion Laboratory. In this thesis, the training pipeline from Sharma’s paper “Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks” was modified to perform pose estimation on a complex object - specifically, a segment of a hollow truss. After initial attempts to replicate the architecture used in the paper and train solely on synthetic images, a combination of synthetic dataset generation and transfer learning on an ImageNet-pretrained AlexNet model was implemented to mitigate the difficulty of gathering large amounts of real-world data. Experimentation with pose estimation accuracy and hyperparameters of the model resulted in gradual test accuracy improvement, and future work is suggested to improve pose estimation for complex objects with some form of rotational symmetry.