Matching Items (39)
Description
Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, and medical image processing. Via l1-norm-based regularization, structured sparse learning algorithms can produce highly accurate models while imposing various predefined structures on the data, such as feature groups or graphs. In this thesis, I first propose to solve a sparse learning model with a general group structure, where the predefined groups may overlap with each other. Then, I present three real-world applications that can benefit from the group-structured sparse learning technique. In the first application, I study the Alzheimer's Disease diagnosis problem using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, exhibiting a unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit-fly gene expression pattern images. Combined with spatial information, sparse learning techniques can be used to construct effective representations of the expression images. In the third application, I present a new computational approach to annotate the developmental stage of Drosophila embryos in gene expression images. In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores help illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to interpret results when expression pattern matches are discovered between genes.
Contributors: Yuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
The entire history of HIV-1 is hidden in its ten thousand bases, where information regarding its evolutionary traversal through the human population can only be unlocked with fine-scale sequence analysis. Measurable footprints of mutation and recombination have imparted upon us a wealth of knowledge, from multiple chimpanzee-to-human transmissions to patterns of neutralizing antibody and drug resistance. Extracting maximum understanding from such diverse data can only be accomplished by analyzing the viral population from many angles. This body of work explores two primary aspects of HIV sequence evolution, point mutation and recombination, through cross-sectional (inter-individual) and longitudinal (intra-individual) investigations, respectively. Cross-sectional Analysis: The role of Haiti in the subtype B pandemic has been hotly debated for years; while there have been many studies, up to this point no one has incorporated the well-known mechanism of retroviral recombination into their biological model. Prior to the use of recombination detection, multiple analyses produced trees where subtype B appears to have first entered Haiti, followed by a jump into the rest of the world. The results presented here contest the Haiti-first theory of the pandemic and instead suggest simultaneous entries of subtype B into Haiti and the rest of the world. Longitudinal Analysis: Potential N-linked glycosylation sites (PNGS) are the most evolutionarily dynamic component of one of the most evolutionarily dynamic proteins known to date. While the number of mutations associated with the increase or decrease of PNGS frequency over time is high, there is a set of relatively stable sites that persist within and between longitudinally sampled individuals. Here, I identify the most conserved stable PNGSs and suggest their potential roles in host-virus interplay.
In addition, I have identified, for the first time, what may be a gp120-based environmental preference for N-linked glycosylation sites.
Contributors: Hepp, Crystal Marie, 1981- (Author) / Rosenberg, Michael S. (Thesis advisor) / Hedrick, Philip (Committee member) / Escalante, Ananias (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Studies of ancient pathogens are moving beyond simple confirmatory analysis of diseased bone; bioarchaeologists and ancient geneticists are posing nuanced questions and utilizing novel methods capable of confronting the debates surrounding pathogen origins and evolution, and the relationships between humans and disease in the past. This dissertation examines two ancient human diseases through molecular and bioarchaeological lines of evidence, relying on techniques in paleogenetics and phylogenetics to detect, isolate, sequence, and analyze ancient and modern pathogen DNA within an evolutionary framework. Specifically, this research addresses outstanding issues regarding a) the evolution, origin, and phylogenetic placement of the pathogen causing skeletal tuberculosis in the New World prior to European contact, and b) the phylogeny and origins of the parasite causing the human leishmaniasis disease complex. An additional chapter presents a review of the major technological and theoretical advances in ancient pathogen genomics to frame the contributions of this work within a rapidly developing field. This overview emphasizes that understanding the evolution of human disease is critical to contextualizing relationships between humans and pathogens, and the epidemiological shifts observed both in the past and in the present era of (re)emerging infectious diseases. These questions continue to be at the forefront of not only pathogen research, but also bioarchaeological and paleopathological scholarship.
Contributors: Harkins, Kelly M (Author) / Buikstra, Jane E. (Thesis advisor) / Stone, Anne C (Thesis advisor) / Knudson, Kelly (Committee member) / Kumar, Sudhir (Committee member) / Krause, Johannes (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Like individual organisms, complex social groups are able to maintain predictable trajectories of growth, from initial colony foundation to mature reproductively capable units. They do so while simultaneously responding flexibly to variation in nutrient availability and intake. Leafcutter ant colonies function as tri-trophic systems, in which the ants harvest vegetation to grow a fungus that, in turn, serves as food for the colony. Fungal growth rates and colony worker production are interdependent, regulated by nutritional and behavioral feedbacks. Fungal growth and quality are directly affected by worker foraging decisions, while worker production is, in turn, dependent on the amount and condition of the fungus. In this dissertation, I first characterized the growth relationship between the workers and the fungus of the desert leafcutter ant Acromyrmex versicolor during early stages of colony development, from colony foundation by groups of queens through the beginnings of exponential growth. I found that this relationship undergoes a period of slow growth and instability when workers first emerge, and then becomes allometrically positive. I then evaluated how mass and element ratios of resources collected by the ants are translated into fungus and worker population growth, and refuse, finding that colony digestive efficiency is comparable to digestive efficiencies of other herbivorous insects and ruminants. To test how colonies behaviorally respond to perturbations of the fungus garden, I quantified activity levels and task performance of workers in colonies with either supplemented or diminished fungus gardens, and found that colonies adjusted activity and task allocation in response to the fungus garden size. Finally, to identify possible forms of nutrient limitation, I measured how colony performance was affected by changes in the relative amounts of carbohydrates, protein, and phosphorus available in the resources used to grow the fungus garden. 
From this experiment, I concluded that colony growth is primarily carbohydrate-limited.
Contributors: Clark, Rebecca, 1981- (Author) / Fewell, Jennifer H (Thesis advisor) / Mueller, Ulrich (Committee member) / Liebig, Juergen (Committee member) / Elser, James (Committee member) / Harrison, Jon (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Proteins are a fundamental unit in biology. Although proteins have been extensively studied, there is still much to investigate. The mechanism by which proteins fold into their native state, how evolution shapes structural dynamics, and the dynamic mechanisms of many diseases are not well understood. In this thesis, protein folding is explored using a multi-scale modeling method including (i) geometric constraint-based simulations that efficiently search for native-like topologies and (ii) reservoir replica exchange molecular dynamics, which identifies the low free energy structures and refines these structures toward the native conformation. A test set of eight proteins and three ancestral steroid receptor proteins are folded to 2.7Å all-atom RMSD from their experimental crystal structures. Protein evolution and disease-associated mutations (DAMs) are most commonly studied by in silico multiple sequence alignment methods. Here, however, the structural dynamics are incorporated to give insight into the evolution of three ancestral proteins and the mechanism of several diseases in human ferritin protein. The differences in conformational dynamics of these evolutionarily related, functionally diverged ancestral steroid receptor proteins are investigated by obtaining the most collective motion through essential dynamics. Strikingly, this analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace. Rather, those sharing the same function are simultaneously clustered together and distant from those functionally diverged homologs. This dynamics analysis also identifies 77% of mutations (functional and permissive) necessary to evolve new function. In silico methods for prediction of DAMs rely on differences in evolution rate due to purifying selection, and therefore the accuracy of DAM prediction decreases at fast and slow evolvable sites.
Here, we investigate structural dynamics through computing the contribution of each residue to the biologically relevant fluctuations and from this define a metric: the dynamic stability index (DSI). Using DSI we study the mechanism for three diseases observed in the human ferritin protein. The T30I and R40G DAMs show a loss of dynamic stability at the C-terminus helix and nearby regulatory loop, agreeing with experimental results implicating the same regulatory loop as a cause in cataracts syndrome.
Contributors: Glembo, Tyler J (Author) / Ozkan, Sefika B (Thesis advisor) / Thorpe, Michael F (Committee member) / Ros, Robert (Committee member) / Kumar, Sudhir (Committee member) / Shumway, John (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Multi-task learning (MTL) aims to improve the generalization performance (of the resulting classifiers) by learning multiple related tasks simultaneously. Specifically, MTL exploits the intrinsic task relatedness, based on which the informative domain knowledge from each task can be shared across multiple tasks and thus facilitate the individual task learning. It is particularly desirable to share the domain knowledge (among the tasks) when there are a number of related tasks but only limited training data is available for each task. Modeling the relationship of multiple tasks is critical to the generalization performance of the MTL algorithms. In this dissertation, I propose a series of MTL approaches which assume that multiple tasks are intrinsically related via a shared low-dimensional feature space. The proposed MTL approaches are developed to deal with different scenarios and settings; they are respectively formulated as mathematical optimization problems of minimizing the empirical loss regularized by different structures. For all proposed MTL formulations, I develop the associated optimization algorithms to find their globally optimal solutions efficiently. I also conduct theoretical analysis for certain MTL approaches by deriving the globally optimal solution recovery condition and the performance bound. To demonstrate the practical performance, I apply the proposed MTL approaches to different real-world applications: (1) automated annotation of the Drosophila gene expression pattern images; (2) categorization of the Yahoo web pages. The experimental results demonstrate the efficiency and effectiveness of the proposed algorithms.
Contributors: Chen, Jianhui (Author) / Ye, Jieping (Thesis advisor) / Kumar, Sudhir (Committee member) / Liu, Huan (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
We propose a novel solution to prevent cancer by developing a prophylactic cancer vaccine. Several sources of antigens for cancer vaccines have been published. Among these, antigens that contain a frame-shift (FS) peptide or viral peptide are quite attractive for a variety of reasons. FS sequences, arising from mistakes in either RNA processing or genomic DNA, may lead to generation of neo-peptides that are foreign to the immune system. Viral peptides presumably would originate from exogenous but integrated viral nucleic acid sequences. Both are non-self and therefore lessen concerns about development of autoimmunity. I have developed a bioinformatic approach to identify these aberrant transcripts in the cancer transcriptome. Their suitability for use in a vaccine is evaluated by establishing their frequencies and predicting possible epitopes along with their population coverage according to the prevalence of major histocompatibility complex (MHC) types. Viral transcripts and transcripts with FS mutations from gene fusion, insertion/deletion at coding microsatellite DNA, and alternative splicing were identified in the NCBI Expressed Sequence Tag (EST) database. 48 FS chimeric transcripts were validated in 50 breast cell lines and 68 primary breast tumor samples, with frequencies from 4% to 98%, by RT-PCR and sequencing confirmation. These 48 FS peptides, if translated and presented, could be used to protect more than 90% of the population in Northern America based on the prediction of epitopes derived from them. Furthermore, we synthesized 150 peptides that correspond to FS and viral peptides that we predicted would exist in tumor patients, and we tested over 200 different cancer patient sera. We found a number of serologically reactive peptide sequences in cancer patients that had little to no reactivity in healthy controls, which strongly supports our bioinformatic approach.
This study describes a process used to identify aberrant transcripts that lead to a new source of antigens that can be tested and used in a prophylactic cancer vaccine. The vast amount of transcriptome data of various cancers from the Cancer Genome Atlas (TCGA) project will enhance our ability to further select better cancer antigen candidates.
Contributors: Lee, HoJoon (Author) / Johnston, Stephen A. (Thesis advisor) / Kumar, Sudhir (Committee member) / Miller, Laurence (Committee member) / Stafford, Phillip (Committee member) / Sykes, Kathryn (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Many studies over the past two decades have examined the link between climate patterns and discharge, but few have attempted to study the effects of the El Niño Southern Oscillation (ENSO) on localized and watershed-specific processes such as nutrient loading in the Southwestern United States. The Multivariate ENSO Index (MEI) is used to describe the state of the ENSO, with positive (negative) values referring to an El Niño condition (La Niña condition). This study examined the connection between the MEI and precipitation, discharge, and total nitrogen (TN) and total phosphorus (TP) concentrations in the Upper Salt River Watershed in Arizona. Unrestricted regression models (UMs) and restricted regression models (RMs) were used to investigate the relationship between the discharges in Tonto Creek and the Salt River as functions of the magnitude of the MEI, precipitation, and season (winter/summer). The results suggest that in addition to precipitation, the MEI/season relationship is an important factor for predicting discharge. Additionally, high discharge events were associated with high-magnitude ENSO events, both El Niño and La Niña. A UM including discharge and season, and an RM (restricting the seasonal factor to zero), were applied to TN and TP concentrations in the Salt River. Discharge and seasonality were significant factors describing the variability in TN in the Salt River, while discharge alone was the significant factor describing TP. TN and TP in Roosevelt Lake were evaluated as functions of both discharge and MEI. Some significant correlations were found, but internal nutrient cycling as well as seasonal stratification of the water column of the lake likely masks the true relationships. Based on these results, the MEI is a useful predictor of discharge, as well as nutrient loading in the Salt River Watershed through the Salt River and Tonto Creek.
A predictive model investigating the effect of ENSO on nutrient loading through discharge can illustrate the effects of large scale climate patterns on smaller systems.
Contributors: Sversvold, Darren (Author) / Neuer, Susanne (Thesis advisor) / Elser, James (Committee member) / Fenichel, Eli (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Land management practices such as domestic animal grazing can alter plant communities via changes in soil structure and chemistry, species composition, and plant nutrient content. These changes can affect the abundance and quality of plants consumed by insect herbivores with consequent changes in population dynamics. These population changes can translate to massive crop damage and pest control costs. My dissertation focused on Oedaleus asiaticus, a dominant Asian locust, and had three main objectives. First, I identified morphological, physiological, and behavioral characteristics of the migratory ("brown") and non-migratory ("green") phenotypes. I found that brown morphs had longer wings, larger thoraxes, and higher metabolic rates compared to green morphs, suggesting that developmental plasticity allows greater migratory capacity in the brown morph of this locust. Second, I tested the hypothesis of a causal link between livestock overgrazing and an increase in migratory swarms of O. asiaticus. Current paradigms generally assume that increased plant nitrogen (N) should enhance herbivore performance by relieving protein limitation, increasing herbivorous insect populations. I showed, in contrast to this scenario, that host plant N-enrichment and high-protein artificial diets decreased the size and viability of O. asiaticus. Plant N content was lowest and locust abundance highest in heavily livestock-grazed fields where soils were N-depleted, likely due to enhanced erosion and leaching. These results suggest that heavy livestock grazing promotes outbreaks of this locust by reducing plant protein content. Third, I tested for the influence of dietary imbalance, in conjunction with high population density, on migratory plasticity. While high population density has clearly been shown to induce the migratory morph in several locusts, the effect of diet has been unclear. I found that locusts reared at high population density and fed unfertilized plants (i.e., high-quality plants for O. asiaticus) had the greatest migratory capacity, and maintained a high percentage of brown locusts. These results did not support the hypothesis that poor-quality resources increased expression of migratory phenotypes. This highlights a need to develop new theoretical frameworks for predicting how environmental factors will regulate migratory plasticity in locusts and perhaps other insects.
Contributors: Cease, Arianne (Author) / Harrison, Jon (Thesis advisor) / Elser, James (Thesis advisor) / DeNardo, Dale (Committee member) / Quinlan, Michael (Committee member) / Sabo, John (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
The greatest barrier to understanding how life interacts with its environment is the complexity in which biology operates. In this work, I present experimental designs, analysis methods, and visualization techniques to overcome the challenges of deciphering complex biological datasets. First, I examine an iron limitation transcriptome of Synechocystis sp. PCC 6803 using a new methodology. Until now, iron limitation in experiments of Synechocystis sp. PCC 6803 gene expression has been achieved through media chelation. Notably, chelation also reduces the bioavailability of other metals, whereas naturally occurring low iron settings likely result from a lack of iron influx and not as a result of chelation. The overall metabolic trends of previous studies are well-characterized but within those trends is significant variability in single gene expression responses. I compare previous transcriptomics analyses with our protocol that limits the addition of bioavailable iron to growth media to identify consistent gene expression signals resulting from iron limitation. Second, I describe a novel method of improving the reliability of centroid-linkage clustering results. The size and complexity of modern sequencing datasets often prohibit constructing distance matrices, which prevents the use of many common clustering algorithms. Centroid-linkage circumvents the need for a distance matrix, but has the adverse effect of producing input-order dependent results. In this chapter, I describe a method of cluster edge counting across iterated centroid-linkage results and reconstructing aggregate clusters from a ranked edge list without a distance matrix and input-order dependence. Finally, I introduce dendritic heat maps, a new figure type that visualizes heat map responses through expanding and contracting sequence clustering specificities. Heat maps are useful for comparing data across a range of possible states. 
However, data binning is sensitive to clustering cutoffs which are often arbitrarily introduced by researchers and can substantially change the heat map response of any single data point. With an understanding of how the architectural elements of dendrograms and heat maps affect data visualization, I have integrated their salient features to create a figure type aimed at viewing multiple levels of clustering cutoffs, allowing researchers to better understand the effects of environment on metabolism or phylogenetic lineages.
Contributors: Kellom, Matthew (Author) / Raymond, Jason (Thesis advisor) / Anbar, Ariel (Committee member) / Elser, James (Committee member) / Shock, Everett (Committee member) / Walker, Sarah (Committee member) / Arizona State University (Publisher)
Created: 2017