Simultaneous variable and feature group selection in heterogeneous learning: optimization and applications

Description

Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modal nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a. bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a unified bi-level learning model that contains several existing feature selection models as special cases. Then the proposed model is further extended to tackle block-wise missing data, one of the major challenges in the diagnosis of Alzheimer's Disease (AD). Moreover, I propose a novel interpretable sparse group feature selection model that greatly facilitates the procedure of parameter tuning and model selection. Last but not least, I show that by solving the sparse group hard thresholding problem directly, the sparse group feature selection model can be further improved in terms of both algorithmic complexity and efficiency. Promising results are demonstrated in the extensive evaluation on multiple real-world data sets.
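The sparse group hard thresholding step named in the description can be illustrated with a small sketch. It assumes a formulation that the abstract does not spell out: given a vector v, keep at most s1 entries spread over at most s2 groups so that the retained energy is as large as possible. The function name `sght`, the group encoding, and the dynamic program below are illustrative assumptions and not the thesis's implementation.

```python
def sght(v, groups, s1, s2):
    """Keep at most s1 entries of v in at most s2 groups, maximizing retained energy.

    groups is a list of index lists partitioning range(len(v)) (hypothetical encoding).
    Returns a thresholded copy of v as a list.
    """
    NEG = float("-inf")
    # best[j][k]: largest retained energy using exactly j groups and k entries so far.
    best = [[NEG] * (s1 + 1) for _ in range(s2 + 1)]
    best[0][0] = 0.0
    ranked, choices = [], []
    for idx in groups:
        # Inside a group it is always best to keep the largest-magnitude entries.
        order = sorted(idx, key=lambda i: -abs(v[i]))
        ranked.append(order)
        gain = [0.0]  # gain[t]: energy of the t largest entries of this group
        for i in order:
            gain.append(gain[-1] + v[i] ** 2)
        new = [row[:] for row in best]  # default choice: skip this group
        back = [[(j, k, 0) for k in range(s1 + 1)] for j in range(s2 + 1)]
        for j in range(s2):
            for k in range(s1 + 1):
                if best[j][k] == NEG:
                    continue
                for t in range(1, min(len(order), s1 - k) + 1):
                    cand = best[j][k] + gain[t]
                    if cand > new[j + 1][k + t]:
                        new[j + 1][k + t] = cand
                        back[j + 1][k + t] = (j, k, t)
        best = new
        choices.append(back)
    # Best final state, then walk the back-pointers to recover the kept entries.
    j, k = max(((a, b) for a in range(s2 + 1) for b in range(s1 + 1)),
               key=lambda ab: best[ab[0]][ab[1]])
    x = [0.0] * len(v)
    for g in range(len(groups) - 1, -1, -1):
        pj, pk, t = choices[g][j][k]
        for i in ranked[g][:t]:
            x[i] = v[i]
        j, k = pj, pk
    return x


v = [3.0, 1.0, 0.5, 2.0, 2.0, 0.1]
groups = [[0, 1, 2], [3, 4], [5]]
x = sght(v, groups, s1=3, s2=2)
```

In this toy call, three entries may be kept across two groups, so the sketch trades the two small entries of the first group for both entries of the second. The table has (s2+1)(s1+1) states per group, which is only practical for small budgets.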

Contributors: Xiang, Shuo (Author) / Ye, Jieping (Thesis advisor) / Mittelmann, Hans D (Committee member) / Davulcu, Hasan (Committee member) / He, Jingrui (Committee member) / Arizona State University (Publisher)

Created: 2014

Dense non-natural sequence peptide microarrays for epitope mapping and diagnostics

Description

The healthcare system in this country is currently unacceptable. New technologies may contribute to reducing cost and improving outcomes. Early diagnosis and treatment represent the least risky option for addressing this issue. Such a technology needs to be inexpensive, highly sensitive, highly specific, and amenable to adoption in a clinic. This thesis explores an immunodiagnostic technology based on highly scalable, non-natural sequence peptide microarrays designed to profile the humoral immune response and address the healthcare problem. The primary aim of this thesis is to explore the ability of these arrays to map continuous (linear) epitopes. I discovered that, using a technique termed subsequence analysis, epitopes could be decisively mapped to an eliciting protein with a high success rate. This led to the discovery of novel linear epitopes from Plasmodium falciparum (Malaria) and Treponema pallidum (Syphilis), as well as validation of previously discovered epitopes in Dengue and monoclonal antibodies. Next, I developed and tested a classification scheme based on Support Vector Machines for development of a Dengue Fever diagnostic, achieving higher sensitivity and specificity than current FDA approved techniques. The software underlying this method is available for download under the BSD license. Following this, I developed a kinetic model for immunosignatures and tested it against existing data driven by previously unexplained phenomena. This model provides a framework and informs ways to optimize the platform for maximum stability and efficiency. I also explored the role of sequence composition in explaining an immunosignature binding profile, determining a strong role for charged residues that seems to have some predictive ability for disease. Finally, I developed a database, software and indexing strategy based on Apache Lucene for searching motif patterns (regular expressions) in large biological databases.
These projects as a whole have advanced knowledge of how to approach high throughput immunodiagnostics and provide an example of how technology can be fused with biology in order to affect scientific and health outcomes.

Contributors: Richer, Joshua Amos (Author) / Johnston, Stephen A. (Thesis advisor) / Woodbury, Neal (Committee member) / Stafford, Phillip (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)

Created: 2014

The effect of high-intensity interval exercise on postprandial endothelial function in youth

Description

In adults, consuming a high-fat meal can induce endothelial dysfunction while exercise may mitigate postprandial endothelial dysfunction. Whether exercise is protective against postprandial endothelial dysfunction in obese youth is unknown. The purpose of this study was to determine if high-intensity interval exercise (HIIE) performed the evening prior to a high-fat meal protects against postprandial endothelial dysfunction in obese adolescent males. Fourteen obese adolescent males (BMI%tile=98.5±0.6; 14.3±1.0yrs) completed the study. After initial screening, participants arrived fasted at 9:00 in the morning, at which time brachial artery flow-mediated dilation (FMD) was measured using duplex ultrasound after 20min of supine rest (7.0±3.0%), and then completed a maximal exercise test on a cycle ergometer (VO2peak=2.6±0.5 L/min). Participants were randomized and completed 2 conditions: a morning high-fat meal challenge with evening prior HIIE (EX+M) or a morning high-fat meal challenge without prior exercise (MO). The EX+M condition included a single HIIE session on a cycle ergometer (8 X 2min at ≥90%HRmax, with 2min active recovery between bouts, 42min total) which was performed at 17:00 the evening prior to the meal challenge. In both conditions, a mixed meal was tailored to participants' body weight, providing 0.7g of fat/kg of body weight (889±95kcal; 65% Fat, 30% CHO). FMD was measured at fasting (>10hrs) and subsequently measured at 2hr and 4hr after high-fat meal consumption. Exercise did not improve fasting FMD (7.5±3.0 vs. 7.4±2.8%, P=0.927; EX+M and MO, respectively). Despite consuming a high-fat meal, FMD was not reduced at 2hr (8.4±3.4 vs. 7.6±3.9%; EX+M and MO, respectively) or 4hr (8.8±3.9 vs. 8.6±4.0%; EX+M and MO, respectively) in either condition and no differences were observed between conditions (p(condition*time)=0.928). These observations remained after adjusting for baseline artery diameter and shear rate.
We observed that HIIE, the evening prior, had no effect on fasting or postprandial endothelial function when compared with a meal only condition. Future research should examine whether exercise training may be able to improve postprandial endothelial function rather than a single acute bout in obese youth.

Contributors: Ryder, Justin Ross (Author) / Shaibi, Gabriel Q (Thesis advisor) / Gaesser, Glenn A (Committee member) / Vega-Lopez, Sonia (Committee member) / Crespo, Noe C (Committee member) / Katsanos, Christos (Committee member) / Arizona State University (Publisher)

Created: 2014

Measuring the effects of a school and community-based dietary change intervention on the home food environment among parents with school-aged children

Description

Availability and accessibility of foods in the home influence dietary behaviors. However, much of the literature involving measurement of the home food environment (HFE) has examined only self-reported data, and home food inventory tools have not been used to assess behavior change intervention efficacy. Thus, this quasi-experimental study was conducted to test the preliminary efficacy of a 10-week dietary behavioral intervention on the HFE, measured through the presence of fruits, vegetables, and sources of sugars in the household. Participants included 23 parents (21 females; age=36±5.5) of children 6-11 years old living in an ethnically diverse community within a Southwestern metropolitan area. Sociodemographic information was collected at baseline using a survey. A modified version of the Home Food Inventory was completed in the homes of participants by trained research assistants at baseline and following termination of the intervention. Relative to baseline, the intervention resulted in significant increases in availability of different types of fruits (7.7±3.2 vs. 9.4±3.1; p=0.004) and high fiber/low sugar cereal (2.3±1.4 vs. 2.7±1.4; p=0.033). There was a significant reduction in availability of sugar-sweetened beverages (3.2±1.9 vs. 1.7±1.3; p=0.004), and an increase in the number of households with accessible 100% fruit juice (3 vs. 17 households; p=0.001) and bottled/contained water (9 vs. 22 households; p<0.001). Moreover, there were meaningful changes in the number of households with accessible chocolate milk (7 vs. 0), strawberry milk (3 vs. 0), and diet soda pop (2 vs. 0). There was a significant increase in the number of households with accessible ready-to-eat vegetables (8 vs. 19 households; p=0.007), and ready-to-eat fruit (8 vs. 17; p=0.022), and a significant reduction in available prepared desserts (3.0±2.0 vs. 1.7±1.3; p=0.005), and candy (2.0±1.7 vs. 0.6±0.7; p<0.001). 
There were no significant changes in availability of vegetables and sugar-laden cereals, or accessibility of fresh fruit, fresh vegetables, dry cereal, candy, soda pop, desserts, and sports/fruit drinks. Overall, results suggest that the current dietary behavior change intervention resulted in positive changes in the HFE. Further research to confirm these results in a randomized controlled trial is warranted.

Contributors: Cassinat, Rachel (Author) / Vega-Lopez, Sonia (Thesis advisor) / Bruening, Meredith (Committee member) / Crespo, Noe (Committee member) / Arizona State University (Publisher)

Created: 2015

Reconstruction-free inference from compressive measurements

Description

As a promising solution to the problem of acquiring and storing large amounts of image and video data, spatial-multiplexing camera architectures have received a lot of attention in the recent past. Such architectures have the attractive feature of combining the two-step process of acquisition and compression of pixel measurements in a conventional camera into a single step. A popular variant is the single-pixel camera that obtains measurements of the scene using a pseudo-random measurement matrix. Advances in compressive sensing (CS) theory in the past decade have supplied the tools that, in theory, allow near-perfect reconstruction of an image from these measurements even for sub-Nyquist sampling rates. However, current state-of-the-art reconstruction algorithms suffer from two drawbacks: they are (1) computationally very expensive and (2) incapable of yielding high fidelity reconstructions for high compression ratios. In computer vision, the final goal is usually to perform an inference task using the images acquired and not signal recovery. With this motivation, this thesis considers the possibility of inference directly from compressed measurements, thereby obviating the need to use expensive reconstruction algorithms. It is often the case that non-linear features are used for inference tasks in computer vision. However, currently, it is unclear how to extract such features from compressed measurements. Instead, using the theoretical basis provided by the Johnson-Lindenstrauss lemma, discriminative features using smashed correlation filters are derived and it is shown that it is indeed possible to perform reconstruction-free inference at high compression ratios with only a marginal loss in accuracy. As a specific inference problem in computer vision, face recognition is considered, mainly beyond the visible spectrum such as in the short wave infra-red region (SWIR), where sensors are expensive.
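The idea of correlating directly in the compressed domain can be sketched with a toy example. The Johnson-Lindenstrauss lemma implies that a random Gaussian measurement matrix approximately preserves inner products, so a template can be matched against measurements without reconstructing the image. The sketch below is a simplified matched-filter illustration under that assumption, not the smashed correlation filters derived in the thesis; all function names, the toy templates, and the sizes are hypothetical.

```python
import math
import random


def gaussian_matrix(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1/m) entries, so inner products are preserved on average."""
    s = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, s) for _ in range(n)] for _ in range(m)]


def measure(A, x):
    """Compressive measurements y = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def classify_compressed(y, compressed_templates):
    """Index of the compressed template that correlates best with the measurements y."""
    scores = [dot(y, t) for t in compressed_templates]
    return max(range(len(scores)), key=scores.__getitem__)


def unit_templates(n):
    """Two orthogonal unit-norm toy templates (stand-ins for vectorized images, n even)."""
    h = n // 2
    a = 1.0 / math.sqrt(h)
    return [[a] * h + [0.0] * (n - h), [0.0] * h + [a] * (n - h)]


rng = random.Random(0)
A = gaussian_matrix(128, 256, rng)            # 2x compression of a 256-pixel signal
templates = unit_templates(256)
compressed = [measure(A, t) for t in templates]
label = classify_compressed(measure(A, templates[1]), compressed)
```

Only the 128 measurements and the compressed templates are used at decision time; the 256-dimensional scene is never recovered.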

Contributors: Lohit, Suhas Anand (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2015

Exploring latent structure in data: algorithms and implementations

Description

Feature representation for raw data is one of the most important components in a machine learning system. Traditionally, features are hand-crafted by domain experts, which can often be a time-consuming process. Furthermore, they do not generalize well to unseen data and novel tasks. Recently, there have been many efforts to generate data-driven representations using clustering and sparse models. This dissertation focuses on building data-driven unsupervised models for analyzing raw data and developing efficient feature representations.

Simultaneous segmentation and feature extraction approaches for silicon-pores sensor data are considered. Aggregating data into a matrix and performing low rank and sparse matrix decompositions with additional smoothness constraints are proposed to solve this problem. Comparison of several variants of the approaches and results for signal de-noising and translocation/trapping event extraction are presented. Algorithms to improve transform-domain features for ion-channel time-series signals based on matrix completion are presented. The improved features achieve better performance in classification tasks and in reducing the false alarm rates when applied to analyte detection.

Developing representations for multimedia is an important and challenging problem with applications ranging from scene recognition, multi-media retrieval and personal life-logging systems to field robot navigation. In this dissertation, we present a new framework for feature extraction for challenging natural environment sounds. Proposed features outperform traditional spectral features on challenging environmental sound datasets. Several algorithms are proposed that perform supervised tasks such as recognition and tag annotation. Ensemble methods are proposed to improve the tag annotation process.

To facilitate the use of large datasets, fast implementations are developed for sparse coding, the key component in our algorithms. Several strategies to speed up the Orthogonal Matching Pursuit algorithm using CUDA kernels on a GPU are proposed. Implementations are also developed for a large scale image retrieval system. Image-based "exact search" and "visually similar search" using the image patch sparse codes are performed. Results demonstrate large speed-up over CPU implementations and good retrieval performance is also achieved.
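For readers unfamiliar with Orthogonal Matching Pursuit, a plain CPU reference version may help show what the GPU implementations accelerate. This is a minimal textbook-style sketch in pure Python, assuming unit-norm atoms; the names `omp` and `solve` and the toy dictionary are hypothetical, and nothing here reflects the dissertation's CUDA kernels.

```python
import math


def solve(A, b):
    """Solve a small square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def omp(atoms, y, sparsity):
    """Greedy sparse coding of y over unit-norm atoms; returns {atom_index: coefficient}."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    residual, support, coef = list(y), [], []
    for _ in range(sparsity):
        # 1. Select the unused atom most correlated with the current residual.
        j = max((i for i in range(len(atoms)) if i not in support),
                key=lambda i: abs(dot(atoms[i], residual)))
        support.append(j)
        # 2. Re-fit all selected coefficients by least squares (normal equations).
        G = [[dot(atoms[a], atoms[b]) for b in support] for a in support]
        rhs = [dot(atoms[a], y) for a in support]
        coef = solve(G, rhs)
        # 3. Update the residual with the new fit.
        residual = [yi - sum(c * atoms[s][d] for c, s in zip(coef, support))
                    for d, yi in enumerate(y)]
    return dict(zip(support, coef))


r = 1.0 / math.sqrt(2.0)
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [r, r, 0.0]]
code = omp(atoms, [2.0, 0.0, -3.0], sparsity=2)
```

The correlation step and the least-squares re-fit dominate the cost and are repeated for every signal, which is why batching many signals is a natural target for parallel hardware.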

Contributors: Sattigeri, Prasanna S (Author) / Spanias, Andreas (Thesis advisor) / Thornton, Trevor (Committee member) / Goryll, Michael (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created: 2014

Graph-based sparse learning: models, algorithms, and applications

Description

Sparse learning is a powerful tool to generate models of high-dimensional data with high interpretability, and it has many important applications in areas such as bioinformatics, medical image processing, and computer vision. Recently, a priori structural information has been shown to be powerful for improving the performance of sparse learning models. A graph is a fundamental way to represent structural information of features. This dissertation focuses on graph-based sparse learning. The first part of this dissertation aims to integrate a graph into sparse learning to improve the performance. Specifically, the problem of feature grouping and selection over a given undirected graph is considered. Three models are proposed along with efficient solvers to achieve simultaneous feature grouping and selection, enhancing estimation accuracy. One major challenge is that it is still computationally challenging to solve large scale graph-based sparse learning problems. An efficient, scalable, and parallel algorithm for one widely used graph-based sparse learning approach, called anisotropic total variation regularization, is therefore proposed, by explicitly exploring the structure of a graph. The second part of this dissertation focuses on uncovering the graph structure from the data. Two issues in graphical modeling are considered. One is the joint estimation of multiple graphical models using a fused lasso penalty and the other is the estimation of hierarchical graphical models. The key technical contribution is to establish the necessary and sufficient condition for the graphs to be decomposable. Based on this key property, a simple screening rule is presented, which reduces the size of the optimization problem, dramatically reducing the computational cost.

Contributors: Yang, Sen (Author) / Ye, Jieping (Thesis advisor) / Wonka, Peter (Thesis advisor) / Wang, Yalin (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2014

Effect of a short term high fat diet on kidney morphology and function

Description

Long term high fat diets (HFD) are correlated with the development of diabetes and kidney disease. However, the impact of short term high fat intake on the etiology of kidney disease has not been well-studied. Therefore, this study examined the impact of a six week HFD (60% fat) on kidney structure and function in young male Sprague-Dawley rats. Previous studies have shown that these animals develop indices of diabetes compared to rats fed a standard rodent chow (5% fat) for six weeks. The hypothesis of this study is that six weeks of HFD will lead to early stages of kidney disease as evidenced by morphological and functional changes in the kidney. Alterations in morphology were determined by measuring structural changes in the kidneys (changes in mass, fatty acid infiltration, and structural damage). Alterations in kidney function were measured by analyzing urinary biomarkers of oxidative RNA/DNA damage, renal tissue lipid peroxidation, urinary markers of impaired kidney function (urinary protein, creatinine, and hydrogen peroxide (H2O2)), markers of inflammation (tumor necrosis factor alpha (TNFα) and interleukin 6 (IL-6)), as well as cystatin C, a plasma biomarker of kidney function. The results of these studies determined that short term HFD intake is not sufficient to induce early stage kidney disease. Beyond increases in renal mass, there were no significant differences between the markers of renal structure and function in the HFD and standard rodent chow-fed rats.

Contributors: Crinigan, Catherine (Author) / Sweazea, Karen (Thesis advisor) / Johnston, Carol (Committee member) / Mayol-Kreiser, Sandra (Committee member) / Arizona State University (Publisher)

Created: 2015

Food deserts: identifying and overcoming issues in the supply chain

Description

Research related to food deserts, areas with limited access to healthy and affordable food options, has focused primarily on issues of healthy food access, food quality and pricing, dietary outcomes, and increased risk for chronic diseases among residents. However, upstream challenges that might play a major role in the creation and perpetuation of food deserts, namely problems in the supply chain, have been less considered. In this qualitative study, researchers conducted semi-structured interviews with local produce supply chain representatives to understand their perspectives on the barriers to, and potential solutions for, supplying affordable produce to underserved areas in Phoenix, AZ. Through industry and academic experts, six representatives of the supply chain were identified and recruited to take part in one-hour interviews. Interviews were audio-recorded, transcribed, and coded into categories using a general inductive approach. Themes and subthemes were identified with the assistance of the qualitative analysis software NVivo. Results suggested that considerable barriers exist among the representatives for supplying fresh, affordable produce in Phoenix-area food deserts, including minimum delivery requirements beyond the needs of the average small store, a desire to work with high-volume customers due to transportation and production costs, and the higher price point of produce for both store owners and consumers. Conversely, opportunities were identified that could be important in overcoming such barriers, including tax or economic incentives that would make distribution into food deserts financially viable, infrastructural support for the safe handling and storage of fresh foods at existing retail outlets, and the development of novel distribution mechanisms for producers such as mobile markets and food hubs.
Future research is needed to determine if these findings are representative of a larger, more diverse sample of Arizona produce supply chain representatives.

Contributors: Lacagnina, Gina (Author) / Wharton, Christopher (Christopher Mack), 1977- (Thesis advisor) / Hughner, Renee (Committee member) / Barroso, Cristina (Committee member) / Arizona State University (Publisher)

Created: 2015

Understanding social media users via attributes and links

Description

With the rise of social media, hundreds of millions of people all over the globe spend countless hours on social media to connect, interact, share, and create user-generated data. This rich environment provides tremendous opportunities for many different players to easily and effectively reach out to people, interact with them, influence them, or get their opinions. Two pieces of information attract the most attention on social media sites: user preferences and interactions. Businesses and organizations use this information to better understand social media users and therefore provide customized services to them. This data can be used for different purposes such as targeted advertising, product recommendation, or even opinion mining. Social media sites use this information to better serve their users.

Despite the importance of personal information, in many cases people do not reveal this information to the public. Predicting the hidden or missing information is a common response to this challenge. In this thesis, we address the problem of predicting user attributes and future or missing links using an egocentric approach. The current research proposes novel concepts and approaches to better understand social media users in two respects: (a) their attributes, preferences, and interests, and (b) their future or missing connections and interactions. More specifically, the contributions of this dissertation are (1) proposing a framework to study social media users through their attributes and link information, (2) proposing a scalable algorithm to predict user preferences, and (3) proposing a novel approach to predict attributes and links with limited information. The proposed algorithms use an egocentric approach to improve on state-of-the-art algorithms in two directions: first, by improving the prediction accuracy, and second, by increasing the scalability of the algorithms.

Contributors: Abbasi, Mohammad Ali, 1975- (Author) / Liu, Huan (Thesis advisor) / Davulcu, Hasan (Committee member) / Ye, Jieping (Committee member) / Agarwal, Nitin (Committee member) / Arizona State University (Publisher)

Created: 2014
