Description
With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus, knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also study variable selection for matched data sets and propose a solution for when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy, and the aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets, where variables are only predictive for a subset of the predictor classes. An Asymmetric Random Forest (ARF) is proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. A Matched Random Forest (MRF) is proposed to find variables that are able to distinguish case and control without the restrictions that exist in linear models; MRF detects such variables even in the presence of interactions and qualitative variables.
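As an illustration of the general idea of asymmetric misclassification costs, a minimal sketch using scikit-learn's class weighting is shown below. The class_weight trick is only a stand-in for the idea of an asymmetric loss; it is not the dissertation's aSVM formulation, and the synthetic data and weights are assumptions.

```python
# A minimal sketch of asymmetric misclassification costs in an SVM, assuming
# scikit-learn. class_weight is a stand-in for the idea of asymmetric loss;
# it is NOT the dissertation's aSVM formulation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score

# Synthetic, mildly imbalanced two-class data.
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Symmetric loss: both kinds of errors cost the same.
plain = SVC(kernel="rbf").fit(X_tr, y_tr)

# Asymmetric loss: penalize errors on class 0 more heavily, which makes
# predictions of class 1 more conservative, trading recall for precision
# on the target class.
asym = SVC(kernel="rbf", class_weight={0: 5.0, 1: 1.0}).fit(X_tr, y_tr)

print("precision on class 1 (symmetric):", precision_score(y_te, plain.predict(X_te)))
print("precision on class 1 (asymmetric):", precision_score(y_te, asym.predict(X_te)))
```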
Contributors: Koh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Coronary heart disease (CHD) is the most prevalent cause of death worldwide. Atherosclerosis, the buildup of plaque on the inside of the coronary artery wall, is the main cause of CHD. Rupture of unstable atherosclerotic coronary plaque is known to be the cause of acute coronary syndrome. The composition of plaque is important for detecting plaque vulnerability. Due to the prognostic importance of early-stage identification, non-invasive assessment of plaque characteristics is necessary. Computed tomography (CT) has emerged as a non-invasive alternative to coronary angiography. Recently, dual energy CT (DECT) coronary angiography has been performed clinically. DECT scanners use two different X-ray energies in order to determine the energy dependency of tissue attenuation values for each voxel. They generate virtual monochromatic energy images, as well as material basis pair images. The characterization of plaque components by DECT is still an active research topic, since the overlap between the CT attenuations measured in plaque components and contrast material shows that a single mean density might not be an appropriate measure for characterization. This dissertation proposes feature extraction, feature selection, and learning strategies for supervised characterization of coronary atherosclerotic plaques. In my first study, I proposed an approach for calcium quantification in contrast-enhanced examinations of the coronary arteries, potentially eliminating the need for an extra non-contrast X-ray acquisition. The ambiguity in separating calcium from contrast material was resolved by using virtual non-contrast images. The additional attenuation data provided by DECT offer valuable information for separating lipid from fibrous plaque, since their attenuations change differently as the energy level changes. My second study proposed these multi-energy attenuation measurements as input to supervised learners for a more precise classification of lipid and fibrous plaques. My last study aimed at automatic segmentation of the coronary arteries, characterizing plaque components and lumen on contrast-enhanced monochromatic X-ray images. This required extraction of features from regions of interest, so the study proposed feature extraction strategies and the selection of important features. The results show that supervised learning on the proposed features provides promising results for automatic characterization of coronary atherosclerotic plaques by DECT.
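A rough sketch of the second study's supervised-classification idea follows. The energy levels, attenuation numbers, and classifier are illustrative assumptions on synthetic data, not the study's measurements or pipeline.

```python
# Sketch: classify plaque ROIs as lipid vs. fibrous from attenuation values
# at several assumed monochromatic energy levels (synthetic data only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
# Each row: mean attenuation of a plaque ROI at four assumed energy levels
# (e.g., 40, 70, 100, 140 keV). The two tissue types are given different
# energy-dependent trends, mimicking the separation DECT is said to enable.
lipid = rng.normal(loc=[30, 45, 55, 60], scale=8, size=(n, 4))
fibrous = rng.normal(loc=[95, 85, 80, 75], scale=10, size=(n, 4))

X = np.vstack([lipid, fibrous])
y = np.array([0] * n + [1] * n)  # 0 = lipid, 1 = fibrous

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```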
Contributors: Yamak, Didem (Author) / Akay, Metin (Thesis advisor) / Muthuswamy, Jit (Committee member) / Akay, Yasemin (Committee member) / Pavlicek, William (Committee member) / Vernon, Brent (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension, and feature selection is preferred in order to reduce noise, save computational cost, and learn interpretable models. Due to the multi-modal nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a. bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a unified bi-level learning model that contains several existing feature selection models as special cases. The proposed model is then further extended to tackle block-wise missing data, one of the major challenges in the diagnosis of Alzheimer's Disease (AD). Moreover, I propose a novel interpretable sparse group feature selection model that greatly facilitates the procedure of parameter tuning and model selection. Last but not least, I show that by solving the sparse group hard thresholding problem directly, the sparse group feature selection model can be further improved in terms of both algorithmic complexity and efficiency. Promising results are demonstrated in an extensive evaluation on multiple real-world data sets.
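For intuition about bi-level (group plus within-group) sparsity, a toy hard-thresholding heuristic is sketched below: it keeps the s groups with the largest norms and then the k largest-magnitude coefficients within them. This is an illustrative simplification under assumed semantics, not the thesis's sparse group hard thresholding algorithm.

```python
# Toy bi-level hard-thresholding heuristic (for intuition only; not the
# thesis's sparse group hard thresholding algorithm).
import numpy as np

def bilevel_hard_threshold(w, groups, s, k):
    """w: coefficient vector; groups: list of index arrays; keep s groups, k entries."""
    norms = np.array([np.linalg.norm(w[g]) for g in groups])
    keep_groups = np.argsort(norms)[-s:]          # group-level selection
    mask = np.zeros(w.shape, dtype=bool)
    for gi in keep_groups:
        mask[groups[gi]] = True
    w_out = np.where(mask, w, 0.0)                # zero out discarded groups
    order = np.argsort(np.abs(w_out))[::-1]
    w_out[order[k:]] = 0.0                        # element-level selection
    return w_out

w = np.array([0.1, 2.0, -1.5, 0.05, 0.02, 3.0, -0.3, 0.0])
groups = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]
print(bilevel_hard_threshold(w, groups, s=2, k=3))  # -> [0.  2. -1.5 0. 0. 3. 0. 0.]
```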
Contributors: Xiang, Shuo (Author) / Ye, Jieping (Thesis advisor) / Mittelmann, Hans D (Committee member) / Davulcu, Hasan (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Query expansion is a functionality of search engines that suggests a set of related queries for a user-issued keyword query. In the case of exploratory or ambiguous keyword queries, the main goal of the user is to identify and select a specific category of query results among the different categorical options, in order to narrow down the search and reach the desired result. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. These empirical methods fail to cover all semantics of the categories present in the query results. More importantly, these methods do not consider the semantic relationships between the keywords featured in an expanded query. Contrary to a normal keyword search setting, these factors are non-trivial in an exploratory and ambiguous query setting, where the user's precise discernment of the different categories present in the query results is more important for making subsequent search decisions. In this thesis, I propose a new framework for keyword query expansion: generating a set of queries that correspond to a categorization of the original query results, referred to as categorizing query expansion. Two categories of algorithms are proposed: one performs clustering as a pre-processing step and then generates categorizing expanded queries based on the clusters; the other handles the case of generating quality expanded queries in the presence of imperfect clusters.
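A minimal sketch of the clustering-based flavor of this idea follows; the toy documents, cluster count, and term-selection rule are illustrative assumptions, not the thesis's algorithms.

```python
# Sketch: cluster the results of an ambiguous query and surface each cluster's
# top terms as a candidate expanded query (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

results = [
    "jaguar speed of the big cat hunting in the rainforest",
    "jaguar xf review luxury sedan engine and interior",
    "jaguar habitat and conservation of the big cat",
    "jaguar xe price and used car dealer listings",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(results)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[::-1][:3]  # highest-weight terms per cluster
    print(f"expanded query {c}: jaguar " + " ".join(terms[i] for i in top))
```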
Contributors: Natarajan, Sivaramakrishnan (Author) / Chen, Yi (Thesis advisor) / Candan, Selcuk (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
This dissertation transforms a set of system complexity reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection; the subset can then be summarized into rule-based classifiers. Experiments show that RCSS can substantially improve classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear or Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods on the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted; these features can be further reduced and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve this bias problem: one uses an out-of-bag sampling method, called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show that the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
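To make the interval-feature idea behind a time series forest concrete, here is a minimal sketch: summary statistics (mean, standard deviation, slope) over intervals, fed to a tree ensemble. The fixed intervals, synthetic series, and generic random forest are simplifying assumptions; the published TSF samples intervals randomly and uses its own tree construction, which this does not reproduce.

```python
# Simplified interval features in the spirit of a time series forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def interval_features(series, intervals):
    feats = []
    for a, b in intervals:
        seg = series[a:b]
        slope = np.polyfit(np.arange(len(seg)), seg, 1)[0]  # least-squares slope
        feats += [seg.mean(), seg.std(), slope]
    return feats

rng = np.random.default_rng(0)
length, n_per_class = 100, 50
class0 = [np.sin(np.linspace(0, 6, length)) + rng.normal(0, 0.3, length)
          for _ in range(n_per_class)]
class1 = [np.linspace(0, 1, length) + rng.normal(0, 0.3, length)
          for _ in range(n_per_class)]
intervals = [(0, 25), (25, 50), (50, 75), (75, 100)]

X = np.array([interval_features(s, intervals) for s in class0 + class1])
y = np.array([0] * n_per_class + [1] * n_per_class)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```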
Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Alzheimer's Disease (AD) is the most common form of dementia observed in elderly patients and has a significant socio-economic impact. There are many initiatives that aim to identify the leading causes of AD, and several genetic, imaging, and biochemical markers are being explored to monitor progression of AD and to explore treatment and detection options. The primary focus of this thesis is to identify key biomarkers to understand the pathogenesis and prognosis of Alzheimer's Disease. Feature selection is the process of finding a subset of relevant features to develop efficient and robust learning models. It is an active research topic in diverse areas such as computer vision, bioinformatics, information retrieval, chemical informatics, and computational finance. In this work, state-of-the-art feature selection algorithms, such as Student's t-test, Relief-F, Information Gain, Gini Index, Chi-Square, Fisher Kernel Score, Kruskal-Wallis, Minimum Redundancy Maximum Relevance, and Sparse Logistic Regression with Stability Selection, have been applied extensively to identify informative features for AD using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). An integrative approach using blood plasma protein, magnetic resonance imaging, and psychometric assessment score biomarkers has been explored. This work also analyzes techniques for handling unbalanced data and evaluates the efficacy of sampling techniques. The performance of each feature selection algorithm is evaluated using the relevance of the derived features and the predictive power of the algorithm with Random Forest and Support Vector Machine classifiers. Performance metrics such as accuracy, sensitivity, specificity, and area under the Receiver Operating Characteristic curve (AUC) have been used for evaluation. The feature selection algorithms best suited to analyze AD proteomics data are proposed, and the key biomarkers distinguishing healthy and AD patients, Mild Cognitive Impairment (MCI) converters and non-converters, and healthy and MCI patients have been identified.
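The evaluation pattern described here (a filter-style feature ranking followed by a classifier scored with cross-validated AUC) can be sketched as below; the synthetic data, the specific filter (ANOVA F-test), k, and the linear SVM are placeholder assumptions, not the thesis's exact pipeline or the ADNI data.

```python
# Sketch: univariate feature filter, keep top-k features, score a classifier by AUC.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Many features, few of them informative -- loosely mimicking a biomarker panel.
X, y = make_classification(n_samples=300, n_features=200, n_informative=10,
                           random_state=0)

pipe = make_pipeline(
    SelectKBest(f_classif, k=20),   # univariate filter (ANOVA F; t-test-like for 2 classes)
    SVC(kernel="linear"),           # scored via its decision function
)
auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
print("mean cross-validated AUC:", round(auc, 3))
```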
Contributors: Dubey, Rashmi (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Wu, Tong (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Robust and stable decoding of neural signals is imperative for implementing a useful neuroprosthesis capable of carrying out dexterous tasks. A nonhuman primate (NHP) was trained to perform combined flexions of the thumb, index, and middle fingers in addition to individual flexions and extensions of the same digits. An array of microelectrodes was implanted in the hand area of the motor cortex of the NHP and used to record action potentials during finger movements. A Support Vector Machine (SVM) was used to classify which finger movement the NHP was making based upon action potential firing rates. The effects of four feature selection techniques (Wilcoxon signed-rank test, Relative Importance, Principal Component Analysis, and Mutual Information Maximization) were compared based on SVM classification performance. SVM classification was used to examine the functional parameters of (i) efficacy, (ii) endurance to simulated failure, and (iii) longevity of classification. The effect of using isolated-neuron versus multi-unit firing rates as the feature vector supplied to the SVM was also compared. The best classification performance was observed on post-implantation day 36. On that day, when using multi-unit firing rates, the worst classification accuracy resulted from features selected with the Wilcoxon signed-rank test (51.12 ± 0.65%) and the best from Mutual Information Maximization (93.74 ± 0.32%). When using single-unit firing rates on the same day, the classification accuracy was 88.85 ± 0.61% with the Wilcoxon signed-rank test and 95.60 ± 0.52% with Mutual Information Maximization (degrees of freedom = 10, level of chance = 10%).
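A minimal sketch of this decoding pattern (mutual-information feature selection followed by an SVM on firing-rate feature vectors) is shown below; the synthetic data, the 96-channel and 10-class setup, and k = 30 are assumptions standing in for the study's recordings.

```python
# Sketch: mutual-information feature selection + SVM on firing-rate vectors.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# One row per trial: firing rates on 96 "channels"; 10 movement classes (chance = 10%).
X, y = make_classification(n_samples=500, n_features=96, n_informative=20,
                           n_classes=10, n_clusters_per_class=1, random_state=0)

pipe = make_pipeline(
    StandardScaler(),                              # normalize firing rates
    SelectKBest(mutual_info_classif, k=30),        # mutual-information feature selection
    SVC(kernel="rbf"),                             # multi-class SVM (one-vs-one internally)
)
print("mean cross-validated accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```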
Contributors: Padmanaban, Subash (Author) / Greger, Bradley (Thesis advisor) / Santello, Marco (Thesis advisor) / Helms Tillery, Stephen (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
How do you convey what’s interesting and important to you as an artist in a digital world of constantly shifting attentions? For many young creatives, the answer is original characters, or OCs. An OC is a character that an artist creates for personal enjoyment, whether based on an already existing story or world, or completely from their own imagination.
As creations made purely for personal interest, OCs are an excellent elevator pitch from one creative to another, opening up opportunities for connection in a world where communication is at our fingertips but personal connection is increasingly harder to make. OCs encourage meaningful interaction by offering themselves as muses, avatars, story pieces, and much more; artists can have their characters interact with other creatives through many different avenues, such as art-making, tabletop games, or word of mouth.

In this thesis, I explore the worlds and aesthetics of many creators and their original characters through qualitative research and collaborative art-making. I begin with a short survey of my creative peers, asking general questions about their characters and their thoughts on OCs, then move to sketching characters from various creators. I focus my research on a group of seven core creators and their characters, whom I interview and work closely with in order to create a series of seven final paintings of their original characters.
Contributors: Cote, Jacqueline (Author) / Button, Melissa M (Thesis director) / Dove-Viebahn, Aviva (Committee member) / School of Art (Contributor) / Barrett, The Honors College (Contributor)
Created: 2020-05
Description
In the past ten years, the United States' sound recording industries have experienced significant decreases in employment opportunities for aspiring audio engineers, driven by economic imbalances in the music industry's digital streaming era and by reductions in government funding for career and technical education (CTE). The Recording Industry Association of America reports promising signs of music industry sustainability, pointing to increasing annual revenue from paid streaming services and artists' high creative demand. Yet the rate at which new audio engineers enter the sound recording subsection of the music industry is not sufficient to support streaming artists' high demand for engineering new music recordings. CTE programs in secondary education are rarely offered, and many aspiring engineers lack access to post-secondary or vocational education because of financial and academic limitations. These aspiring engineers seek alternatives: an informal education in audio engineering on the Internet, using video-sharing services like YouTube to search for tutorials and improve their engineering skills. The shortage of accessible educational materials on the Internet restricts engineers from advancing their own audio engineering education, reducing opportunities to enter a job market in desperate need of independent, home studio-based engineers. Content creators on YouTube take advantage of this situation by commercializing their own video tutorial series, releasing some content for free and selling paid subscriptions to exclusive content. This is misleading for newer engineers because these tutorials omit important fundamental engineering concepts. Instead, content creators teach inflexible engineering methodologies that are mostly beneficial to their own way of thinking, and they rarely assess how poorly those personal methodologies serve potential entrants to a profession that demands critical thinking built on applied fundamental audio engineering concepts and techniques. This project analyzes potential solutions to the deficiencies in online audio engineering education and experiments with structuring simple, deliverable, accessible educational content and materials for newcomers to audio engineering. Designing clear, easy-to-follow material for these newcomers is essential for developing the strong understanding of fundamental concepts they will apply throughout their careers. Creating and designing such educational content requires translating complex engineering concepts through simplified mediums that reduce barriers to learning for future audio engineers.
Contributors: Burns, Triston Connor (Author) / Tobias, Evan (Thesis director) / Libman, Jeff (Committee member) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)
Created: 2020-05
Description
There has been a recent push for queer fiction focused on gay and lesbian relationships, especially in the young adult genre. This growth is much needed in terms of visibility and the furthering of acceptance, but there are still subjects within the LGBTQ+ community that need to be addressed, including bisexual, asexual, and non-binary erasure. There are many people who claim that these identities do not exist, are labels used as stepping stones on the journey to discovering that one is homosexual, or are invented excuses for overtly promiscuous or prudish behavior. The existence of negative stereotypes, particularly those of non-binary individuals, is largely due to a lack of visibility and respectful representation within media and popular culture. However, there is still a dearth of non-binary content in popular literature outside of young adult fiction. Can You See Me? aims to fill the gap in bisexual, asexual, and non-binary representation in adult literature. Each of the four stories that make up this collection deals with an aspect of gender and/or sexuality that has been erased, ignored, or denied visibility in American popular culture. The first story, "We'll Grow Lemon Trees," examines bisexual erasure through the lens of sociolinguistics: a bisexual Romanian woman emigrates to Los Angeles in 1989 and must navigate a new culture, learn new languages, and try to move on from her past life under a dictatorship where speaking up could mean imprisonment or death. The second story, "Up, Down, All Around," is about a young genderqueer child and their parents dealing with microaggressions, examining gender norms, and exploring personal identity through imaginary scenarios, each involving an encounter with an unknown entity and a colander. The third story, "Aces High," follows two asexual characters from the day they are born to when they are 28 years old, as they find themselves in pop culture. The two endure identity crises, gender discrimination, erasure, individual obsessions, and prejudice as they learn to accept themselves and embrace who they are. In the fourth and final story, "Mile Marker 72," a gay Mexican man must hide in plain sight as he deals with the death of his partner and with coming out to his best friend, whose brother is his partner's murderer.
Contributors: Ochser, Jordyn M. (Author) / Bell, Matt (Thesis director) / Free, Melissa (Committee member) / Department of English (Contributor) / Barrett, The Honors College (Contributor)
Created: 2018-05