Learning from asymmetric models and matched pairs

Description

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also study variable selection for matched data sets and propose a solution for when there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy. aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets where variables are only predictive for a subset of the predictor classes. Asymmetric Random Forest (ARF) was proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. Matched Random Forest (MRF) was proposed to find variables that are able to distinguish case and control without the restrictions that exist in linear models. MRF detects variables that are able to distinguish case and control even in the presence of interaction and qualitative variables.
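As a rough illustration of the asymmetric loss idea in the abstract above, the sketch below weights hinge-loss margin violations differently per class. It is not the dissertation's aSVM, whose formulation the abstract does not give; the function name `asymmetric_hinge_loss` and the parameters `cost_pos` and `cost_neg` are hypothetical and chosen only for this example.

```python
def asymmetric_hinge_loss(y_true, scores, cost_pos=1.0, cost_neg=5.0):
    """Mean hinge loss with a separate cost per class (illustrative only).

    y_true: labels in {+1, -1}
    scores: real-valued decision values, one per label
    cost_pos: weight on margin violations for positive examples
    cost_neg: weight on margin violations for negative examples
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    total = 0.0
    for y, s in zip(y_true, scores):
        # Standard hinge term: zero once the example clears the margin.
        violation = max(0.0, 1.0 - y * s)
        # Assumed asymmetry: each class gets its own cost.
        weight = cost_pos if y == 1 else cost_neg
        total += weight * violation
    return total / len(y_true)


if __name__ == "__main__":
    labels = [1, -1]
    decision_values = [0.5, 0.5]
    print(asymmetric_hinge_loss(labels, decision_values, 1.0, 1.0))
    print(asymmetric_hinge_loss(labels, decision_values, 1.0, 5.0))
```

Raising `cost_neg` relative to `cost_pos` penalizes high scores on negative examples more heavily, which is one common way to favor precision on the positive class.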

Contributors: Koh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)

Created: 2013

Shi Ge, piano

Contributors: Shi, Ge (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-03-25

Characterization of coronary atherosclerotic plaques by dual energy computed tomography

Description

Coronary heart disease (CHD) is the most prevalent cause of death worldwide. Atherosclerosis, the buildup of plaque on the inside of the coronary artery wall, is the main cause of CHD. Rupture of unstable atherosclerotic coronary plaque is known to be the cause of acute coronary syndrome. The composition of plaque is important for detection of plaque vulnerability. Because of the prognostic importance of early-stage identification, non-invasive characterization of plaque is necessary. Computed tomography (CT) has emerged as a non-invasive alternative to coronary angiography. Recently, dual energy CT (DECT) coronary angiography has been performed clinically. DECT scanners use two different X-ray energies in order to determine the energy dependency of tissue attenuation values for each voxel. They generate virtual monochromatic energy images, as well as material basis pair images. The characterization of plaque components by DECT is still an active research topic, since the overlap between the CT attenuations measured in plaque components and contrast material shows that the single mean density might not be an appropriate measure for characterization. This dissertation proposes feature extraction, feature selection and learning strategies for supervised characterization of coronary atherosclerotic plaques. In my first study, I proposed an approach for calcium quantification in contrast-enhanced examinations of the coronary arteries, potentially eliminating the need for an extra non-contrast X-ray acquisition. The ambiguity of separation of calcium from contrast material was solved by using virtual non-contrast images. The additional attenuation data provided by DECT offer valuable information for separating lipid from fibrous plaque, since their attenuation changes differently as the energy level changes. My second study proposed these as the input to supervised learners for a more precise classification of lipid and fibrous plaques. My last study aimed at automatic segmentation of coronary arteries, characterizing plaque components and lumen on contrast-enhanced monochromatic X-ray images. This required extraction of features from regions of interest. This study proposed feature extraction strategies and selection of important ones. The results show that supervised learning on the proposed features provides promising results for automatic characterization of coronary atherosclerotic plaques by DECT.

Contributors: Yamak, Didem (Author) / Akay, Metin (Thesis advisor) / Muthuswamy, Jit (Committee member) / Akay, Yasemin (Committee member) / Pavlicek, William (Committee member) / Vernon, Brent (Committee member) / Arizona State University (Publisher)

Created: 2013

Kristina Shatuho, piano

Contributors: Shatuho, Kristina (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-03-27

Simultaneous variable and feature group selection in heterogeneous learning: optimization and applications

Description

Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modal nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a. bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a unified bi-level learning model that contains several existing feature selection models as special cases. The proposed model is then extended to tackle block-wise missing data, one of the major challenges in the diagnosis of Alzheimer's Disease (AD). Moreover, I propose a novel interpretable sparse group feature selection model that greatly facilitates the procedure of parameter tuning and model selection. Last but not least, I show that by solving the sparse group hard thresholding problem directly, the sparse group feature selection model can be further improved in terms of both algorithmic complexity and efficiency. Promising results are demonstrated in the extensive evaluation on multiple real-world data sets.

Contributors: Xiang, Shuo (Author) / Ye, Jieping (Thesis advisor) / Mittelmann, Hans D (Committee member) / Davulcu, Hasan (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)

Created: 2014

Daniel Carlisi, piano

Contributors: Carlisi, Daniel (Performer) / ASU Library. Music Library (Publisher)

Created: 2018-04-07

Query expansion for handling exploratory and ambiguous keyword queries

Description

Query expansion is a functionality of search engines that suggests a set of related queries for a user-issued keyword query. In the case of exploratory or ambiguous keyword queries, the main goal of the user would be to identify and select a specific category of query results among different categorical options, in order to narrow down the search and reach the desired result. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. These empirical methods fail to cover all semantics of categories present in the query results. More importantly, these methods do not consider the semantic relationship between the keywords featured in an expanded query. Contrary to a normal keyword search setting, these factors are non-trivial in an exploratory and ambiguous query setting, where the user's precise discernment of different categories present in the query results is more important for making subsequent search decisions. In this thesis, I propose a new framework for keyword query expansion: generating a set of queries that correspond to the categorization of original query results, which is referred to as categorizing query expansion. Two categories of algorithms are proposed. The first performs clustering as a pre-processing step and then generates categorizing expanded queries based on the clusters. The second handles the case of generating quality expanded queries in the presence of imperfect clusters.

Contributors: Natarajan, Sivaramakrishnan (Author) / Chen, Yi (Thesis advisor) / Candan, Selcuk (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2011

System complexity reduction via feature selection

Description

This dissertation transforms a set of system complexity reduction problems to feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection. The subset then can be summarized into rule-based classifiers. Experiments show that classifiers after RCSS can substantially improve the classification interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear, Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time-ordering of the data to extract features, and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of time series, and interpretable features can be extracted. These features can be further reduced, and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values. Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method called OOBForest, and the other, based on the new concept of a partial permutation test, is called a pForest. Experimental results show the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
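The abstract says the time series forest extracts interpretable features from the time-ordering of the data but does not list them. The sketch below is one possible reading, assuming simple summary statistics (mean, standard deviation, and least-squares slope) computed over a single interval of a series. The function name `interval_features` and its arguments are hypothetical, and the code is not taken from the dissertation.

```python
import math


def interval_features(series, start, end):
    """Return (mean, std, slope) of series[start:end] (illustrative only).

    std is the population standard deviation; slope is the least-squares
    slope of the values against their positions 0, 1, ..., n-1.
    """
    window = series[start:end]
    n = len(window)
    if n < 2:
        raise ValueError("interval must contain at least two points")

    mean = sum(window) / n
    variance = sum((v - mean) ** 2 for v in window) / n
    std = math.sqrt(variance)

    # Least-squares slope: covariance of (position, value) over variance of position.
    t_mean = (n - 1) / 2
    numerator = sum((t - t_mean) * (v - mean) for t, v in enumerate(window))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator

    return mean, std, slope


if __name__ == "__main__":
    print(interval_features([1, 2, 3, 4, 10], 0, 4))
```

Each feature is tied to a named interval and a familiar statistic, which is the kind of property that makes interval-based features easy to interpret.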

Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created: 2011

The incorporation of Greek folk melodies in the piano works of Yannis Constantinidis: with special consideration of the 22 songs and dances from the Dodecanese

Description

Yannis Constantinidis was the last of the handful of composers referred to collectively as the Greek National School. The members of this group strove to create a distinctive national style for Greece, founded upon a synthesis of Western compositional idioms with melodic, rhythmic, and modal features of their local folk traditions. Constantinidis particularly looked to the folk melodies of his native Asia Minor and the nearby Dodecanese Islands. His musical output includes operettas, musical comedies, orchestral works, chamber and vocal music, and much piano music, all of which draws upon folk repertories for thematic material. The present essay examines how he incorporates this thematic material in his piano compositions, written between 1943 and 1971, with a special focus on the 22 Songs and Dances from the Dodecanese. In general, Constantinidis's pianistic style is expressed through miniature pieces in which the folk tunes are presented mostly intact, but embedded in accompaniment based on early twentieth-century modal harmony. Following the dictates of the founding members of the Greek National School, Manolis Kalomiris and Georgios Lambelet, the modal basis of his harmonic vocabulary is firmly rooted in the characteristics of the most common modes of Greek folk music. A close study of his 22 Songs and Dances from the Dodecanese not only offers a valuable insight into his harmonic imagination, but also demonstrates how he subtly adapts his source melodies. This work also reveals his care in creating a musical expression of the words of the original folk songs, even in purely instrumental composition.

Contributors: Savvidou, Dina (Author) / Hamilton, Robert (Thesis advisor) / Little, Bliss (Committee member) / Meir, Baruch (Committee member) / Thompson, Janice M (Committee member) / Arizona State University (Publisher)

Created: 2011

Six Chinese piano pieces of the twentieth century: a recording project

Description

This paper describes six representative works by twentieth-century Chinese composers: Jian-Zhong Wang, Er-Yao Lin, Yi-Qiang Sun, Pei-Xun Chen, Ying-Hai Li, and Yi Chen, which are recorded by the author on the CD. The six pieces selected for the CD all exemplify traits of Nationalism, with or without Western influences. Of the six works on the CD, two are transcriptions of Han Chinese folk-like songs, one is a composition in the style of Uyghur folk music, two are transcriptions of traditional Chinese instrumental music dating back to the eighteenth century, and one is an original composition in a contemporary style using folk materials. Two of the composers, who studied in the United States, were strongly influenced by Western compositional style. The other four, who did not study abroad, retained traditional Chinese style in their compositions. The pianistic difficulty of these six pieces ranges from intermediate to advanced. This paper includes biographical information for the six composers, background information on the compositions, and a brief analysis of each work. The author was exposed to these six pieces growing up, always believing that they are beautiful and deserve to be appreciated. When the author came to the United States for her studies, she realized that Chinese compositions, including these six pieces, were not sufficiently known to her peers. This recording and paper are offered in the hopes of promoting a wider familiarity with Chinese music and culture.

Contributors: Luo, Yali, D.M.A (Author) / Hamilton, Robert (Thesis advisor) / Campbell, Andrew (Committee member) / Pagano, Caio (Committee member) / Cosand, Walter (Committee member) / Rogers, Rodney (Committee member) / Arizona State University (Publisher)

Created: 2012