An exploration of statistical modelling methods on simulation data case study: biomechanical predator-prey simulations

Description

Modern, advanced statistical tools from data mining and machine learning have become commonplace in molecular biology in large part because of the “big data” demands of various kinds of “-omics” (e.g., genomics, transcriptomics, metabolomics, etc.). However, in other fields of biology where empirical data sets are conventionally smaller, more traditional statistical methods of inference are still very effective and widely used. Nevertheless, with the decrease in cost of high-performance computing, these fields are starting to employ simulation models to generate insights into questions that have been elusive in the laboratory and field. Although these computational models allow for exquisite control over large numbers of parameters, they also generate data at a qualitatively different scale than most experts in these fields are accustomed to. Thus, more sophisticated methods from big-data statistics have an opportunity to better facilitate the often-forgotten area of bioinformatics that might be called “in-silicomics”.

As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate how other biomechanical parameters relating to the agility of animals in predator–prey pairs are better predictors of pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, new modelling techniques such as LASSO regularization are used and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and instances in the leaves of classification trees provide the basis for future simulation with an economical number of additional runs. In general, this thesis shows the increased utility of these sophisticated statistical techniques with simulated ecological data compared to the approaches traditionally used in these fields. These techniques combined with methods from industrial Design of Experiments will help ecologists extract novel insights from simulations that combine habitat complexity, population structure, and biomechanics.
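The contrast drawn above between stepwise selection and LASSO regularization can be illustrated with a minimal sketch. This is not the thesis's code: the synthetic data, the penalty value `lam`, and the function names `soft_threshold` and `lasso_cd` are all assumptions made for illustration. The sketch fits a LASSO by cyclic coordinate descent and shows how the L1 penalty drives the coefficients of irrelevant predictors to exactly zero in a single fit, rather than searching over subsets of variables.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/(2n))*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, with b starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Hypothetical data: six candidate predictors, only columns 0 and 3 matter.
random.seed(0)
n, p = 200, 6
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0.0, 0.1) for row in X]

coef = lasso_cd(X, y, lam=0.1)
print([round(c, 2) for c in coef])
```

With many parameters and interaction terms, a single penalized fit of this kind scales more gracefully than enumerating candidate models, which is one plausible reading of the computational comparison the abstract mentions.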

Contributors: Seto, Christian (Author) / Pavlic, Theodore (Thesis advisor) / Li, Jing (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)

Created: 2018

Stabilization of 3D DNA nanostructures for in vivo applications and developing an assay to estimate stability

Description

Although DNA nanostructures (DNs) have become interesting subjects of drug delivery, in vivo imaging, and biosensor research, real biological applications require them to be ‘long circulating’ in blood. One of the crucial requirements for DN stability is a high salt concentration (e.g., ~5–20 mM Mg2+) that is unavailable in a cell culture medium or in blood. Hence, DNs denature promptly when injected into living systems. Another important factor is the presence of nucleases that cause fast degradation of unprotected DNs. The third factor is ‘opsonization’, the immune process by which phagocytes target foreign particles introduced into the bloodstream. The primary aim of this thesis is to design strategies that can improve the in vivo stability of DNs, thus improving their pharmacodynamics and biodistribution.

Several strategies were investigated to address the three previously mentioned limitations. The first attempt was to study the effect of the length and conformation of polyethylene glycol (PEG) on DN stability. DNs were also coated with PEG-lipid and human serum albumin (HSA), and their stealth efficiencies were compared. The findings reveal that both PEGylation and albumin coating enhance low-salt stability, increase resistance to nuclease action, and reduce uptake of DNs by macrophages. Any protective coating around a DN increases its hydrodynamic radius, which is a crucial parameter influencing its clearance. Keeping this in mind, intrinsically stable DNs that can survive low salt concentrations without any polymer coating were built. Several DNA compaction agents and DNA binders were screened for their ability to stabilize DNs in low-magnesium conditions. Among them, arginine, lysine, bis-lysine and hexamine cobalt showed the potential to enhance DN stability.

This thesis also presents a sensitive assay, the Proximity Ligation Assay (PLA), for estimating DN stability over time. It requires only very simple modifications to the DNs and can yield precise results from a very small amount of sample. The applicability of PLA was successfully tested on several DNs ranging from a simple wireframe tetrahedron to a 3D origami, and a protocol to collect in vivo samples, isolate the DNs, and measure their stability was developed.

Contributors: Banerjee, Saswata (Author) / Yan, Hao (Thesis advisor) / Angell, Austen (Committee member) / Woodbury, Neal (Committee member) / Liu, Yan (Committee member) / Arizona State University (Publisher)

Created: 2018

Bowties, barcodes, and DNA origami: a novel approach for paired-chain immune receptor repertoire analysis

Description

There are many biological questions that require single-cell analysis of gene sequences, including analysis of clonally distributed dimeric immunoreceptors on lymphocytes (T cells and B cells) and/or the accumulation of driver/accessory mutations in polyclonal tumors. Lysis of bulk cell populations results in mixing of gene sequences, making it impossible to know which pairs of gene sequences originated from any particular cell and obfuscating analysis of rare sequences within large populations. Although current single-cell sorting technologies can be used to address some of these questions, such approaches are expensive, require specialized equipment, and lack the necessary high-throughput capacity for comprehensive analysis. Water-in-oil emulsion approaches for single cell sorting have been developed but droplet-based single-cell lysis and analysis have proven inefficient and yield high rates of false pairings. Ideally, molecular approaches for linking gene sequences from individual cells could be coupled with next-generation high-throughput sequencing to overcome these obstacles, but conventional approaches for linking gene sequences, such as by transfection with bridging oligonucleotides, result in activation of cellular nucleases that destroy the template, precluding this strategy. Recent advances in the synthesis and fabrication of modular deoxyribonucleic acid (DNA) origami nanostructures have resulted in new possibilities for addressing many current and long-standing scientific and technical challenges in biology and medicine. One exciting application of DNA nanotechnology is the intracellular capture, barcode linkage, and subsequent sequence analysis of multiple messenger RNA (mRNA) targets from individual cells within heterogeneous cell populations. 
DNA nanostructures can be transfected into individual cells to capture and protect mRNA for specific expressed genes, and incorporation of origami-specific bowtie-barcodes into the origami nanostructure facilitates pairing and analysis of mRNA from individual cells by high-throughput next-generation sequencing. This approach is highly modular and can be adapted to virtually any two (and possibly more) gene target sequences, and therefore has a wide range of potential applications for analysis of diverse cell populations such as understanding the relationship between different immune cell populations, development of novel immunotherapeutic antibodies, or improving the diagnosis or treatment for a wide variety of cancers.

Contributors: Schoettle, Louis (Author) / Blattman, Joseph N (Thesis advisor) / Yan, Hao (Committee member) / Chang, Yung (Committee member) / Lindsay, Stuart (Committee member) / Arizona State University (Publisher)

Created: 2017

Molecular profiling plasma extracellular vesicles from breast cancer patients

Description

Extracellular vesicles (EVs) represent a heterogeneous population of small vesicles, consisting of a phospholipid bilayer surrounding a soluble interior cargo. These vesicles play an important role in cellular communication by virtue of their protein, RNA, and lipid content, which can be transferred among cells. Peripheral blood is a rich source of circulating EVs. An analysis of EVs in peripheral blood could provide access to an unparalleled number of biomarkers of great diagnostic, prognostic, and therapeutic value. In the current study, a plasma EV enrichment method based on a pluronic co-polymer was first established and characterized. Plasma EVs from breast cancer patients were then enriched, profiled, and compared to non-cancer controls. Protein signatures that contributed to distinguishing cancer samples from non-cancer controls were created by a random-forest-based cross-validation approach. We found that a large portion of these signatures were related to breast cancer aggression. To verify such findings, KIAA0100, one of the features identified, was chosen for in vitro molecular and cellular studies in the breast cancer cell line MDA-MB-231. We found that KIAA0100 regulates cancer cell aggression in MDA-MB-231 in an anchorage-independent manner and is particularly associated with anoikis resistance through its interaction with HSPA1A. Lastly, plasma EVs contain not only individual proteins, but also numerous molecular complexes. In order to measure millions of proteins, isoforms, and complexes simultaneously, the Adaptive Dynamic Artificial Poly-ligand Targeting (ADAPT) platform was applied. ADAPT employs an enriched library of single-stranded oligodeoxynucleotides to profile complex biological samples, thus achieving a deep coverage of system-wide, native biomolecules. Profiling of EVs from breast cancer patients achieved a prediction AUC of 0.73 when comparing biopsy-positive cancer patients to healthy controls and 0.64 when comparing them to biopsy-negative controls, and this performance was not associated with the physical breast condition indicated by BI-RADS scores. Taken together, the current research demonstrates the potential of profiling plasma EVs in searching for therapeutic targets as well as diagnostic signatures.

Contributors: Zhong, Zhenyu (Author) / Spetzler, David (Thesis advisor) / Yan, Hao (Thesis advisor) / Lake, Douglas (Committee member) / Mangone, Marco (Committee member) / Arizona State University (Publisher)

Created: 2018

Sensing and regulation from nucleic acid devices

Description

The highly predictable structural and thermodynamic behavior of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) has made them versatile tools for creating artificial nanostructures over a broad range. Moreover, DNA and RNA are able to interact with biological ligands as either synthetic aptamers or natural components, conferring direct biological functions on nucleic acid devices. The applications of nucleic acids rely greatly on their bio-reactivity and specificity when applied to highly complex biological systems.

This dissertation aims to 1) develop a new strategy to identify high-affinity nucleic acid aptamers against biological ligands; and 2) explore highly orthogonal RNA riboregulators in vivo for constructing multi-input gene circuits with NOT logic. With the aid of a DNA nanoscaffold, pairs of hetero-bivalent aptamers for human alpha thrombin were identified with ultra-high binding affinity in the femtomolar range and potent modulation of the enzyme's activity. The newly identified bivalent aptamers enrich the aptamer toolbox for future therapeutic applications in hemostasis, and the strategy can potentially be developed for other target molecules. Secondly, by incorporating a three-way junction (3WJ) structure into the riboregulator through de novo design, we identified a family of high-performance RNA-sensing translational repressors that down-regulate gene translation in response to cognate RNAs with remarkable dynamic range and orthogonality. Harnessing the 3WJ repressors as modular parts, we integrate them into biological circuits that execute universal NAND and NOR logic with up to four independent RNA inputs in Escherichia coli.

Contributors: Zhou, Yu (Ph.D.) (Author) / Yan, Hao (Thesis advisor) / Green, Alexander (Thesis advisor) / Woodbury, Neal (Committee member) / Ros, Alexandra (Committee member) / Arizona State University (Publisher)

Created: 2019

Image-based process monitoring via generative adversarial autoencoder with applications to rolling defect detection

Description

Image-based process monitoring has recently attracted increasing attention due to the advancement of sensing technologies. However, existing process monitoring methods fail to fully utilize the spatial information of images because of their complex characteristics, including high dimensionality and complex spatial structures. Recent advances in unsupervised deep models such as the generative adversarial network (GAN) and the generative adversarial autoencoder (AAE) have made it possible to learn complex spatial structures automatically. Inspired by these advances, we propose an AAE-based framework for unsupervised anomaly detection in images. AAE combines the power of GAN with the variational autoencoder, which serves as a nonlinear dimension reduction technique with regularization from the discriminator. Based on this, we propose a monitoring statistic that efficiently captures changes in the image data. The performance of the proposed AAE-based anomaly detection algorithm is validated through a simulation study and a real case study on rolling defect detection.

Contributors: Yeh, Huai-Ming (Author) / Yan, Hao (Thesis advisor) / Pan, Rong (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)

Created: 2019

Computational design and study of structural and dynamic nucleic acid systems

Description

DNA and RNA are generally regarded as among the central molecules in molecular biology. Recent advancements in the field of DNA/RNA nanotechnology have demonstrated the successful use of DNA/RNA as programmable molecules to construct nano-objects with predefined shapes and dynamic molecular machines for various functions. From the perspective of structural design with nucleic acids, there are basically two types of assembly method, DNA tile based assembly and DNA origami based assembly, used to construct infinite-sized crystal structures and finite-sized molecular structures. The assembled structures can be used to arrange other molecules or nanoparticles with nanometer resolution to create new types of materials. Dynamic nucleic acid machines are based on DNA strand displacement, which allows two nucleic acid strands to hybridize with each other and displace one or more prehybridized strands in the process. Strand displacement reactions have been implemented to construct a variety of dynamic molecular systems, such as molecular computers, oscillators, and in vivo devices for gene expression control.

This thesis will focus on the computational design of structural and dynamic nucleic acid systems, particularly for new types of DNA structure design and high-precision control of gene expression in vivo. Firstly, a new type of fundamental DNA structural motif, the layered-crossover motif, will be introduced. The layered-crossover motif allows non-parallel alignment of DNA helices at a precisely controlled angle. By using the layered-crossover motif, the scaffold can run through 3D framework DNA origami structures. The precise angle control of the layered-crossover tiles can also be used to assemble 2D and 3D crystals. On the dynamic control side, a de-novo-designed riboregulator is developed that can recognize single nucleotide variation. The riboregulators can also be used to develop paper-based diagnostic devices.

Contributors: Hong, Fan, Ph. D (Author) / Yan, Hao (Thesis advisor) / Liu, Yan (Thesis advisor) / Green, Alexander A. (Committee member) / Borges, Chad (Committee member) / Arizona State University (Publisher)

Created: 2019

New statistical transfer learning models for health care applications

Description

Transfer learning is a sub-field of statistical modeling and machine learning. It refers to methods that integrate the knowledge of other domains (called source domains) and the data of the target domain in a mathematically rigorous and intelligent way, to develop a better model for the target domain than a model using the data of the target domain alone. While transfer learning is a promising approach in various application domains, my dissertation research focuses on the particular application in health care, including telemonitoring of Parkinson’s Disease (PD) and radiomics for glioblastoma.

The first topic is a Mixed Effects Transfer Learning (METL) model that can flexibly incorporate mixed effects and a general-form covariance matrix to better account for similarity and heterogeneity across subjects. I further develop computationally efficient procedures to handle unknown parameters and large covariance structures. Domain relations, such as domain similarity and domain covariance structure, are automatically quantified in the estimation steps. I demonstrate METL in an application of smartphone-based telemonitoring of PD.

The second topic focuses on an MRI-based transfer learning algorithm for non-invasive surgical guidance of glioblastoma patients. The limited number of biopsy samples per patient makes it challenging to build a patient-specific model for glioblastoma. A transfer learning framework helps to leverage other patients’ knowledge for building a better predictive model. When modeling a target patient, not every patient’s information is helpful. Deciding the subset of other patients from which to transfer information to the modeling of the target patient is an important task in building an accurate predictive model. I define the subset of “transferrable” patients as those who have a positive rCBV-cell density correlation, because a positive correlation is confirmed by imaging theory and its respective literature.

The last topic is a Privacy-Preserving Positive Transfer Learning (P3TL) model. Although negative transfer has been recognized as an important issue by the transfer learning research community, there is a lack of theoretical studies in evaluating the risk of negative transfer for a transfer learning method and identifying what causes the negative transfer. My work addresses this issue. Driven by the theoretical insights, I extend Bayesian Parameter Transfer (BPT) to a new method, i.e., P3TL. The unique features of P3TL include intelligent selection of patients to transfer in order to avoid negative transfer and maintain patient privacy. These features make P3TL an excellent model for telemonitoring of PD using an At-Home Testing Device.

Contributors: Yoon, Hyunsoo (Author) / Li, Jing (Thesis advisor) / Wu, Teresa (Committee member) / Yan, Hao (Committee member) / Hu, Leland S. (Committee member) / Arizona State University (Publisher)

Created: 2018

Fast forward and inverse wave propagation for tomographic imaging of defects in solids

Description

Aging-related damage and failure in structures, such as fatigue cracking, corrosion, and delamination, are critical for structural integrity. Most engineering structures have embedded defects such as voids, cracks, and inclusions from manufacturing. The properties and locations of embedded defects are generally unknown and hard to detect in complex engineering structures. Therefore, early detection of damage is beneficial for prognosis and risk management of aging infrastructure systems.

Non-destructive testing (NDT) and structural health monitoring (SHM) are widely used for this purpose. Different types of NDT techniques have been proposed for damage detection, such as optical imaging, ultrasound waves, thermography, eddy current, and microwave. The focus of this study is on wave-based detection methods, which are grouped into two major categories: feature-based damage detection and model-assisted damage detection. Both approaches have their own pros and cons. Feature-based damage detection is usually very fast and does not involve solving the physical model. The key idea is dimension reduction of the signals to achieve efficient damage detection. The disadvantage is that the loss of information due to feature extraction can induce significant uncertainties and reduce the resolution. The resolution of the feature-based approach depends strongly on the sensing path density. Model-assisted damage detection sits at the opposite end. It is capable of high-resolution imaging with a limited number of sensing paths, since the entire signal histories are used for damage identification. However, model-based methods are time-consuming because they require an inverse wave propagation solution, which is especially true for large 3D structures.

The motivation of the proposed method is to develop an efficient and accurate model-based damage imaging technique with limited data. The special focus is on the efficiency of the damage imaging algorithm, as it is the major bottleneck of the model-assisted approach. Computational efficiency is achieved through two complementary components. First, a fast forward wave propagation solver is developed, which is verified against the classical Finite Element Method (FEM) solution and is 10-20 times faster. Next, an efficient inverse wave propagation algorithm is proposed. Classical gradient-based optimization algorithms usually require the finite difference method for gradient calculation, which is prohibitively expensive for a large number of degrees of freedom. An adjoint method-based optimization algorithm is proposed, which avoids the repetitive finite difference calculations for every imaging variable. Thus, superior computational efficiency can be achieved by combining these two methods for damage imaging. A coupled piezoelectric (PZT) damage imaging model is proposed to include the interaction between the PZT and the host structure. Following the formulation of the framework, experimental validation is performed on isotropic and anisotropic materials with defects such as cracks, delamination, and voids. The results show that the proposed method can detect and reconstruct multiple damage sites simultaneously and efficiently, which makes it promising for application to complex large-scale engineering structures.
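To make the cost argument above concrete, the following is a minimal sketch of an adjoint gradient checked against finite differences. It is not the thesis's solver: a toy static linear system A(m) u = f, with A(m) = K + diag(m), stands in for the wave propagation model, and the matrix `K`, source `f`, data `d`, and all function names are assumptions made for illustration. For the misfit J = 0.5 * ||u - d||^2, solving the adjoint system A^T lam = u - d gives dJ/dm_k = -lam_k * u_k, so the full gradient costs two solves regardless of the number of imaging variables, whereas central finite differences need two solves per variable.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


# Hypothetical toy model: fixed operator K, source f, and measured data d.
K = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
f = [1.0, 0.0, 0.0]
d = [0.30, 0.10, 0.02]


def forward(m):
    """Assemble A(m) = K + diag(m) and solve the forward problem A u = f."""
    n = len(f)
    A = [[K[i][j] + (m[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    return A, solve(A, f)


def misfit(m):
    _, u = forward(m)
    return 0.5 * sum((ui - di) ** 2 for ui, di in zip(u, d))


def adjoint_gradient(m):
    """One forward solve plus one adjoint solve yields the whole gradient."""
    A, u = forward(m)
    At = [list(col) for col in zip(*A)]  # transpose of A
    lam = solve(At, [ui - di for ui, di in zip(u, d)])
    return [-lam[k] * u[k] for k in range(len(m))]


def fd_gradient(m, h=1e-6):
    """Central differences: two forward solves per imaging variable."""
    grad = []
    for k in range(len(m)):
        m_plus, m_minus = list(m), list(m)
        m_plus[k] += h
        m_minus[k] -= h
        grad.append((misfit(m_plus) - misfit(m_minus)) / (2.0 * h))
    return grad


m0 = [0.5, 1.0, 1.5]
g_adj = adjoint_gradient(m0)
g_fd = fd_gradient(m0)
print(g_adj)
print(g_fd)
```

The two gradients should agree to within finite-difference error; the saving from the adjoint approach grows with the number of unknowns, which is the regime the abstract describes for large 3D structures.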

Contributors: Chang, Qinan (Author) / Liu, Yongming (Thesis advisor) / Mignolet, Marc (Committee member) / Chattopadhyay, Aditi (Committee member) / Yan, Hao (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)

Created: 2019

Designing and Testing of Large 2D Arrays of DNA Origami

Description

Repeating tiles made of DNA were used to try to form an indefinitely large structure. Both the tiles and the structure were 2D. Two different patterns were tested, one corrugated and one not. Corrugation means that the tiles alternated between facing up and facing down, canceling out any curvature in the tiles and creating a slightly corrugated but largely 2D pattern. Annealing methods were also tested. One experiment compared annealing the structure in two separate steps with annealing it in one. Another compared cyclic and linear annealing. Linear annealing is defined by a linear decrease in temperature, whereas the cyclic method involves a linear drop to a certain temperature, followed by a slight increase in temperature and cooling back down again. This cycle is repeated several times before the linear cool-down continues. Both corrugated and non-corrugated structures could be made. In both cases, tiles that make up a larger section of the overall pattern were more successful. This is especially important for the non-corrugated pattern. Linear and two-step annealing methods seem to yield the best results.
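The difference between the linear and cyclic annealing protocols described above can be sketched as two temperature schedules. This is only an illustration: every temperature, step size, and cycle count below is a hypothetical placeholder, not a value from the thesis, and the function names are invented for the example.

```python
def linear_schedule(start_c, end_c, step_c):
    """Temperatures from start_c down to end_c in equal steps (degrees C)."""
    temps = []
    t = start_c
    while t > end_c:
        temps.append(t)
        t -= step_c
    temps.append(end_c)
    return temps


def cyclic_schedule(start_c, end_c, step_c, cycle_at_c, rebound_c, n_cycles):
    """Linear drop to cycle_at_c, then n_cycles of a slight reheat and
    re-cooling, then the linear cool-down continues to end_c."""
    temps = linear_schedule(start_c, cycle_at_c, step_c)
    for _ in range(n_cycles):
        reheated = cycle_at_c + rebound_c
        temps.append(reheated)  # slight increase in temperature
        # Cool back down to the cycling temperature (skip the repeated start).
        temps.extend(linear_schedule(reheated, cycle_at_c, step_c)[1:])
    # Resume the linear cool-down from the cycling temperature.
    temps.extend(linear_schedule(cycle_at_c, end_c, step_c)[1:])
    return temps


# Hypothetical settings chosen only to make the shape of each ramp visible.
linear = linear_schedule(90, 20, 10)
cyclic = cyclic_schedule(90, 20, 10, cycle_at_c=50, rebound_c=5, n_cycles=3)
print(linear)
print(cyclic)
```

In practice each temperature would also carry a hold time, which is omitted here to keep the sketch short.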

Contributors: Hunt, Ashley Elizabeth (Author) / Yan, Liu (Thesis director) / Yan, Hao (Committee member) / Barrett, The Honors College (Contributor) / Department of Chemistry and Biochemistry (Contributor)

Created: 2015-05