Structured sparse learning and its applications to biomedical and biological data

Description

Sparsity has become an important modeling tool in areas such as genetics, signal and audio processing, and medical image processing. Through ℓ1-norm-based regularization, structured sparse learning algorithms can produce highly accurate models while imposing predefined structures, such as feature groups or graphs, on the data. In this thesis, I first propose a method for solving a sparse learning model with a general group structure, in which the predefined groups may overlap. I then present three real-world applications that benefit from group-structured sparse learning. In the first application, I study Alzheimer's disease diagnosis using multi-modality neuroimaging data. In this dataset, not every subject has all data sources available, which produces a unique and challenging block-wise missing pattern. In the second application, I study the automatic annotation and retrieval of fruit fly gene expression pattern images. When combined with spatial information, sparse learning techniques can construct effective representations of the expression images. In the third application, I present a new computational approach for annotating the developmental stage of Drosophila embryos in gene expression images. The approach also provides a stage score that allows each embryo to be annotated more finely, dividing embryos into early and late periods of development within standard stage demarcations. Stage scores help illuminate global gene activity and its changes, and more refined stage annotations improve the interpretation of results when expression pattern matches are discovered between genes.
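
The following is a minimal sketch of the kind of penalty used in group-structured sparse learning. It is not the solver proposed in this thesis. The group-lasso regularizer Ω(w) = Σ_g λ_g‖w_g‖₂ takes an ℓ1 sum across groups of the ℓ2 norms within each group. Groups are given as lists of feature indices and may share features. The function names and the equal default weights are illustrative assumptions. The block soft-thresholding step is the proximal operator for a single group. When groups overlap, the proximal operator of the full penalty no longer splits group by group, which is one reason overlapping structures call for dedicated solvers.

```python
import math
from typing import List, Optional, Sequence


def group_norm_penalty(
    w: Sequence[float],
    groups: Sequence[Sequence[int]],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Omega(w) = sum_g lambda_g * ||w_g||_2 over (possibly overlapping) index groups."""
    if weights is None:
        weights = [1.0] * len(groups)  # illustrative default: equal group weights
    if len(weights) != len(groups):
        raise ValueError("need exactly one weight per group")
    # Each group's l2 norm is computed independently, so a feature shared by
    # several groups is penalized once for every group it belongs to.
    return sum(
        lam * math.sqrt(sum(w[i] ** 2 for i in g)) for g, lam in zip(groups, weights)
    )


def block_soft_threshold(v: Sequence[float], tau: float) -> List[float]:
    """Proximal operator of tau * ||v||_2 for ONE group: shrink the whole block
    toward zero, and set it exactly to zero when ||v||_2 <= tau."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= tau:
        return [0.0] * len(v)
    scale = 1.0 - tau / norm
    return [scale * x for x in v]


if __name__ == "__main__":
    w = [3.0, 4.0, 0.0, 0.0]
    overlapping = [[0, 1], [1, 2], [2, 3]]  # features 1 and 2 each sit in two groups
    print(group_norm_penalty(w, overlapping))     # 5 + 4 + 0 = 9.0
    print(block_soft_threshold([3.0, 4.0], 2.5))  # [1.5, 2.0]
```

Take w = [3, 4, 0, 0] with the overlapping groups {0, 1}, {1, 2} and {2, 3}. The penalty is 5 + 4 + 0 = 9, and feature 1 contributes to two of the group norms.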

Contributors: Yuan, Lei (Author) / Ye, Jieping (Thesis advisor) / Wang, Yalin (Committee member) / Xue, Guoliang (Committee member) / Kumar, Sudhir (Committee member) / Arizona State University (Publisher)

Created: 2013

Single cell RT-qPCR based ocean environmental sensing device development

Description

This thesis research focuses on developing a single-cell gene expression analysis method for the marine diatom Thalassiosira pseudonana and on building a chip-level tool for single-cell RT-qPCR analysis. This chip will serve as a conceptual foundation for future deployable ocean monitoring systems. T. pseudonana, a common surface-water microorganism, was detected in the deep ocean, as confirmed by phylogenetic and microbial community functional studies. RT-qPCR revealed six-fold copy number differences between 23S rRNA and 23S rDNA, demonstrating moderate functional activity of the photosynthetic microbes detected in the deep ocean, including T. pseudonana. Because T. pseudonana is ubiquitous, it is a good candidate for an early warning system that monitors ocean environmental perturbations. Such a system will depend on identifying outlier gene expression at the single-cell level. An early warning system based on single-cell analysis is expected to detect environmental perturbations earlier than population-level analysis, in which changes can only be observed after the whole community has reacted. Preliminary work using tube-based, two-step RT-qPCR revealed, for the first time, gene expression heterogeneity in T. pseudonana under different nutrient conditions: individual cells under the same conditions showed different levels of gene expression activity. The single-cell measurements followed a skewed, lognormal distribution, which helped identify outlier cells. The results indicate that the geometric average is more representative of the whole population than the arithmetic average. Population-level analysis, by contrast, is limited to arithmetic averages, which highlights the value of single-cell analysis. To develop a sensor that can be deployed in the ocean, a chip-level device was constructed.
The chip contains surface-adhering droplets, defined by hydrophilic patterning, that serve as real-time PCR reaction chambers when immersed in oil. The chip demonstrated single-cell sensitivity for both DNA and RNA, and the success rate of the chip-based reactions was around 85%. Its sensitivity was equivalent to that of published microfluidic devices with complicated designs and protocols. Its production process, however, was simple, and its materials are readily available in conventional environmental and biology laboratories. On-chip tests provided heterogeneity information about the whole population. These results were validated by comparison with conventional tube-based methods and by p-value analysis. The statistical power of the chip-based single-cell analyses was mostly between 65% and 90%, which is acceptable and could be increased further with higher-throughput devices. With this chip and these single-cell analysis approaches, a new paradigm for robust early warning systems for ocean environmental perturbations becomes possible.
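
The sketch below illustrates why the geometric average can represent lognormally distributed single-cell data better than the arithmetic average. The expression values are hypothetical and invented for illustration; they are not data from this thesis. The outlier rule flags cells whose log-expression lies more than two sample standard deviations from the mean log-expression. This rule is an assumed, simple criterion, because the abstract does not say how outlier cells were identified. The function names are likewise illustrative.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def population_summaries(values: Sequence[float]) -> Tuple[float, float]:
    """Return (arithmetic mean, geometric mean) of positive expression values."""
    return statistics.fmean(values), statistics.geometric_mean(values)


def lognormal_outliers(values: Sequence[float], z_threshold: float = 2.0) -> List[int]:
    """Indices of cells whose log-expression lies more than z_threshold sample
    standard deviations from the mean log-expression (an assumed, simple rule)."""
    logs = [math.log(v) for v in values]  # lognormal data are roughly normal on the log scale
    mu = statistics.fmean(logs)
    sd = statistics.stdev(logs)
    if sd == 0:
        return []  # all cells identical: nothing to flag
    return [i for i, x in enumerate(logs) if abs(x - mu) / sd > z_threshold]


if __name__ == "__main__":
    # Hypothetical per-cell expression values, invented for illustration only.
    cells = [8, 10, 12, 9, 11, 10, 13, 9, 10, 400]
    arith, geo = population_summaries(cells)
    print(f"arithmetic mean = {arith:.1f}, geometric mean = {geo:.1f}")
    print("outlier cells:", lognormal_outliers(cells))
```

In these ten values, one high-expressing cell pulls the arithmetic mean up to 49.2. The geometric mean, about 14.6, stays close to the typical cell, and only that cell (index 9) is flagged as an outlier.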

Contributors: Shi, Xu (Author) / Meldrum, Deirdre R. (Thesis advisor) / Zhang, Weiwen (Committee member) / Chao, Shih-hui (Committee member) / Westerhoff, Paul (Committee member) / Arizona State University (Publisher)

Created: 2013

Signaling pathway deregulation: identification through genomic aberrations and verification through genomic activity

Description

Given their role in tumorigenesis, biological signaling pathways have become of interest in oncology. Many of the regulatory mechanisms altered in cancer are directly related to signal transduction and cellular communication. Identifying signaling pathways that have become deregulated may therefore provide useful information for understanding altered regulatory mechanisms in cancer. Many existing methods for measuring the distinct activity of signaling pathways rely solely on transcription profiles. With advances in comparative genomic hybridization techniques, copy number data have become a valuable source of information about the genomic landscape of cancer. The purpose of this thesis is to develop a methodology that incorporates both gene expression and copy number data to identify signaling pathways that have become deregulated in cancer. The central idea is that copy number data may significantly help identify signaling pathway deregulation by providing a genomic explanation for the aberrant activity measured in gene expression profiles. The method was applied to four subtypes of breast cancer and identified signaling pathways associated with distinct functionalities in each subtype.

Contributors: Trevino, Robert (Author) / Kim, Seungchan (Thesis advisor) / Ringner, Markus (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2011

Optimization of a viral system to produce vaccines and other biopharmaceuticals in plants

Description

Plants are a promising emerging platform for producing vaccine components and other desirable pharmaceutical proteins that, at present, can only be made in living systems. The unique soil microbe Agrobacterium tumefaciens can transfer DNA to plants very efficiently, essentially turning plants into factories capable of expressing virtually any gene. Genetically modified bacteria have historically been used to produce useful biopharmaceuticals such as human insulin. Plants, however, can assemble much more complex proteins, such as human antibodies, that bacterial systems cannot. Because plants do not harbor human pathogens, they are also a safer alternative to animal cell cultures. Additionally, plants can be grown very cheaply and in massive quantities.

In my research, I have studied the genetic mechanisms that underlie gene expression in order to improve plant-based biopharmaceutical production. To do this, I drew inspiration from naturally occurring gene regulatory mechanisms. Plant viruses were an especially rich source, because they have evolved to co-opt the plant cellular machinery to produce high levels of viral proteins. By testing, modifying, and combining genetic elements from diverse sources, I developed an optimized expression system that allows very rapid production of vaccine components, monoclonal antibodies, and other biopharmaceuticals. To improve target gene expression while maintaining the health and function of the plants, I identified, studied, and modified 5’ untranslated regions, combined gene terminators, and a nuclear matrix attachment region. I also studied the replication mechanisms of a plant geminivirus, which led to additional strategies for producing more toxic biopharmaceutical proteins. Finally, I investigated the mechanisms a geminivirus uses to spread between cells. I demonstrated that these movement mechanisms can be functionally transplanted into a separate genus of geminivirus, allowing modified virus-based gene expression vectors to spread between neighboring plant cells. Additionally, my work helps shed light on the basic genetic mechanisms that all living organisms use to control gene expression.

Contributors: Diamos, Andy (Author) / Mason, Hugh S (Thesis advisor) / Mor, Tsafrir (Committee member) / Hogue, Brenda (Committee member) / Stout, Valerie (Committee member) / Arizona State University (Publisher)

Created: 2017

Circular RNA characterization and regulatory network prediction in human tissue

Description

Circular RNAs (circRNAs) are a class of endogenous, non-coding RNAs that form when exons back-splice to each other, and they represent a new area of transcriptomics research. Numerous RNA sequencing (RNAseq) studies since 2012 have revealed that circRNAs are pervasively expressed in eukaryotes, especially in the mammalian brain. While their functional role and impact remain to be clarified, circRNAs have been found to regulate microRNAs (miRNAs) as well as parental gene transcription, and they may thus have key roles in transcriptional regulation. Although circRNAs continue to gain attention, our understanding of their expression in cell-, tissue-, and brain region-specific contexts remains limited. Further, different computational algorithms vary in which circRNAs they detect. This thesis aims to advance current knowledge of region-specific circRNA expression in the human brain and to address these computational challenges.

The overarching goal of my research unfolds over three aims:

(i) evaluating circRNAs and their predicted impact on transcriptional regulatory networks in cell-specific RNAseq data;

(ii) developing a novel assembly-based solution for de novo detection of full-length circRNAs and for in silico validation of selected circRNA junctions; and

(iii) applying these assembly-based detection and validation workflows, together with existing tools, to systematically identify and characterize circRNAs in functionally distinct human brain regions.

To this end, I developed novel bioinformatics workflows that apply to non-polyA-selected RNAseq datasets and can characterize circRNA expression across various sample types and diseases. Further, I establish a reference dataset of brain region-specific circRNA expression profiles and regulatory networks. This resource, along with existing databases such as circBase, will be invaluable for advancing circRNA research. It will also improve our understanding of the role of circRNAs in transcriptional regulation and in various neurological conditions.
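
The back-splice signature that circRNA detection relies on can be sketched as follows. This is a simplified illustration, not the workflow developed in this thesis. It assumes each split read has already been aligned as two segments, listed in transcript (5′→3′) order. The names `Segment`, `back_splice_junction`, and `count_circ_candidates`, and the `min_reads` threshold, are hypothetical. On the plus strand, a read whose second segment maps upstream of its first segment (a head-to-tail junction) supports a circRNA; on the minus strand the comparison is reversed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned part of a split read (hypothetical, simplified record)."""
    chrom: str
    start: int   # leftmost genomic coordinate of the aligned segment
    end: int     # rightmost genomic coordinate of the aligned segment
    strand: str  # transcript strand: "+" or "-"


Junction = Tuple[str, int, int, str]  # (chrom, circ_start, circ_end, strand)


def back_splice_junction(seg1: Segment, seg2: Segment) -> Optional[Junction]:
    """seg1 and seg2 are the two parts of one split read in transcript order.
    Return the implied circRNA span for a head-to-tail junction, else None."""
    if seg1.chrom != seg2.chrom or seg1.strand != seg2.strand:
        return None  # e.g. fusion or mapping artifact; not handled in this sketch
    if seg1.strand == "+" and seg2.start < seg1.start:
        # 3' part of the read maps upstream: downstream exon end joins upstream exon start
        return (seg1.chrom, seg2.start, seg1.end, "+")
    if seg1.strand == "-" and seg2.start > seg1.start:
        # on the minus strand, transcript order runs toward lower coordinates
        return (seg1.chrom, seg1.start, seg2.end, "-")
    return None  # ordinary (linear) splice or unspliced order


def count_circ_candidates(
    split_reads: Iterable[Tuple[Segment, Segment]], min_reads: int = 2
) -> Dict[Junction, int]:
    """Count back-splice-supporting reads per junction and keep well-supported ones."""
    counts: Counter = Counter()
    for seg1, seg2 in split_reads:
        junction = back_splice_junction(seg1, seg2)
        if junction is not None:
            counts[junction] += 1
    return {j: n for j, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    reads = [
        (Segment("chr1", 1000, 1050, "+"), Segment("chr1", 500, 550, "+")),  # back-splice
        (Segment("chr1", 1000, 1050, "+"), Segment("chr1", 500, 550, "+")),  # back-splice
        (Segment("chr1", 500, 550, "+"), Segment("chr1", 1000, 1050, "+")),  # linear splice
    ]
    print(count_circ_candidates(reads))
```

Real detectors typically add further filters, such as splice-site motif checks and mapping-quality thresholds. Differences in such filters are one reason algorithms disagree on which circRNAs they detect.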

Contributors: Sekar, Shobana (Author) / Liang, Winnie S (Thesis advisor) / Dinu, Valentin (Thesis advisor) / Craig, David (Committee member) / Liu, Li (Committee member) / Arizona State University (Publisher)

Created: 2018

DNA methylation and gene expression profiling for Parkinson's biomarker discovery

Description

Parkinson’s disease (PD) is a progressive neurodegenerative disorder that is diagnosed late in its course, on the basis of motor deficits that manifest over years or decades. It is characterized by degeneration of midbrain dopaminergic neurons, with a high prevalence of dementia associated with the spread of pathology to cortical regions. By the time patients show symptoms, they have already undergone significant neuronal loss without chance of recovery. Analyzing disease-specific changes in gene expression directly in human patients can uncover invaluable clues about a still-unknown etiology. This potential grows as additional layers of gene regulation are examined. Epigenetic mechanisms are emerging as important components of neurodegeneration, including PD. However, the extent to which methylation changes correlate with disease progression has not yet been reported. This collection of work aims to define multiple molecular layers of PD. The goal is biomarkers that could not only improve diagnostic accuracy but also enable earlier disease detection. I examined changes in gene expression, alternative splicing of those gene products, and the regulatory mechanism of DNA methylation in PD, as well as in the pathologically related Alzheimer’s disease (AD). I first used RNA sequencing (RNAseq) to evaluate differential gene expression and alternative splicing in the posterior cingulate cortex of patients with PD and PD with dementia (PDD). Next, I performed a longitudinal genome-wide methylation study surveying ~850K CpG methylation sites in whole blood from 189 PD patients and 191 control individuals. Samples were obtained at baseline and at a follow-up visit 2 years later. I also considered how symptom-management medications could affect DNA methylation. In the last chapter of this work, I intersected RNAseq and DNA methylation array datasets from whole-blood patient samples for integrated differential analyses of both PD and AD.
Changes in gene expression and DNA methylation reveal clear patterns of pathway dysregulation that are consistent across brain and blood and from one study to the next. I present a thorough survey of molecular changes occurring in patients with idiopathic Parkinson’s disease and propose candidate targets for potential molecular biomarkers.

Contributors: Henderson, Adrienne Rose (Author) / Huentelman, Matthew J (Thesis advisor) / Newbern, Jason (Thesis advisor) / Dunckley, Travis L (Committee member) / Jensen, Kendall (Committee member) / Wilson, Melissa (Committee member) / Arizona State University (Publisher)

Created: 2019

Multiplexed single-cell spatial proteomics and transcriptomics

Description

Single-cell proteomic and transcriptomic analyses are crucial for gaining insight into healthy physiology and disease pathogenesis. Comprehensive profiling of biomolecules in individual cells of a heterogeneous system can provide deep insights into many important biological questions. Examples include the distinct cellular compositions of healthy and diseased tissues and the regulation of their inter- and intracellular signaling pathways. With multidimensional molecular imaging of many different biomarkers in patient biopsies, diseases can be diagnosed accurately, guiding the selection of the ideal treatment.

To meet the urgent need to advance single-cell analysis, imaging-based technologies have been developed to detect and quantify multiple DNA, RNA, and protein molecules in single cells in situ. Novel fluorescent probes have been designed and synthesized that specifically target either their nucleic acid counterparts or protein epitopes. These highly multiplexed imaging-based platforms have the potential to detect and quantify 100 different proteins and 1,000 different nucleic acids in a single cell.

Using these novel fluorescent probes, a large number of biomolecules were detected and quantified in formalin-fixed paraffin-embedded (FFPE) brain tissue at single-cell resolution. Profiling protein expression levels revealed neuronal heterogeneity in distinct subregions of the human hippocampus.

Contributors: Mondal, Manas (Author) / Guo, Jia (Thesis advisor) / Gould, Ian (Committee member) / Ros, Alexandra (Committee member) / Arizona State University (Publisher)

Created: 2018

Vitellogenin Expression and Deformed Wing Virus Replication in Apis mellifera

Description

Vitellogenin (vg) is a precursor protein of egg yolk in honey bees, but it is also known to have immunological functions. The purpose of this experiment was to determine the effect of vg on the viral load of deformed wing virus (DWV) in worker honey bees (Apis mellifera). I hypothesized that reduced vg expression would increase the viral load. I collected 180 worker bees and split them into four groups. Half of the bees were subjected to vg gene knockdown by injection of double-stranded vg RNA, and the rest were injected with green fluorescent protein (gfp) double-stranded RNA. Half of each group was then injected with DWV, and the other half received a sham injection. Mortality in all four groups was higher than expected, leaving only 17 bees in total. I dissected these bees' fat bodies and extracted their RNA to test for vg and DWV. PCR results showed that, among the few remaining bees, vg levels did not differ significantly between groups. Furthermore, both groups of virus-injected bees showed similar viral loads. Because of the high mortality rate and the lack of differing vg transcript levels between experimental and control groups, I could not draw conclusions from these results. Several factors could have caused the high mortality: temperature-induced stress, repeated stress from the two injections, and stress from viral infection. In addition, the vg dsRNA batch I used may have been faulty. This thesis illustrates that information cannot safely be extracted when the loss of sampling units leaves a small dataset that does not represent the original sampled population.

Contributors: Crable, Emma Lewis (Author) / Amdam, Gro (Thesis director) / Wang, Ying (Committee member) / Dahan, Romain (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-12

Bcl11a and Bcl11b Regulation of the Decision for Murine Cells to Proliferate or Differentiate During Skeletal Muscle Development and Repair

Description

The development of skeletal muscle during embryogenesis, and its repair in adults, depends on an intricate balance between the proliferation of myogenic progenitor cells and their differentiation into functional muscle fibers. Recent studies demonstrate that the Drosophila melanogaster transcription factor CG9650 is expressed in muscle progenitor cells, where it maintains myoblast numbers. We are interested in the Mus musculus orthologs Bcl11a and Bcl11b (C2H2 zinc finger transcription factors). In particular, we want to understand their role as molecular switches that control proliferation/differentiation decisions in muscle progenitor cells. Expression analysis revealed that Bcl11b, but not Bcl11a, is expressed in the region of the mouse embryo populated by myogenic progenitor cells. Gene expression studies in muscle cell culture confirmed that Bcl11b is also selectively transcribed in muscle. Furthermore, Bcl11b is downregulated with differentiation, which is consistent with the hypothesis that the gene plays a role in cell proliferation.

Contributors: Duong, Brittany Bach (Author) / Rawls, Alan (Thesis director) / Wilson-Rawls, Jeanne (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor) / School of Life Sciences (Contributor)

Created: 2014-05

In Situ RNA Expression Analysis Using Two-Photon Laser Lysis (2PLL) and Microfluidic RT-qPCR

Description

A major goal of the Center for Biosignatures Discovery Automation (CBDA) is to design a diagnostic tool that detects novel cancer biosignatures at the single-cell level. We designed the Single-cell QUantitative In situ Reverse Transcription-Polymerase Chain Reaction (SQUIRT-PCR) system by combining a two-photon laser lysis (2PLL) system with a microfluidic reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) platform. Identifying early molecular changes in intact tissues is important for the prognosis of premalignant conditions. It is also important for developing new molecular targets that prevent cancer progression and improve therapies. This project analyzes RNA expression at the single-cell level and faces two major challenges: (1) detecting low levels of RNA and (2) minimizing RNA absorption in the polydimethylsiloxane (PDMS) microfluidic channel. The first challenge was overcome by detecting picogram (pg) levels of RNA using the Fluidigm (FD) BioMark™ HD System (Fluidigm Corporation, South San Francisco, CA) for RT-qPCR analysis. This technology incorporates a highly precise integrated fluidic circuit (IFC) that allows high-throughput genetic screening using microarrays. To address the second challenge, we collected data from RNA samples that had flowed through microfluidic channels. One channel was coated with polyethylene glycol (PEG), and the other remained untreated. Various flow-through samples were subjected to RT-qPCR and analyzed using the FD FLEXsix™ Gene Expression IFC. As predicted, the treated PDMS channel absorbed less RNA than the untreated one. Once optimization of the PDMS microfluidic platform is complete, it will be integrated into the 2PLL system. This novel technology will be able to identify cell populations in situ and could have a large impact on cancer diagnostics.

Contributors: Blatt, Amy Elissa (Author) / Meldrum, Deirdre R. (Thesis director) / Tran, Thai (Committee member) / Chao, Joseph (Committee member) / Barrett, The Honors College (Contributor) / Harrington Bioengineering Program (Contributor)

Created: 2014-05