Matching Items (2)
Description
While techniques for reading DNA in some capacity has been possible for decades,
the ability to accurately edit genomes at scale has remained elusive. Novel techniques
have been introduced recently to aid in the writing of DNA sequences. While writing
DNA is more accessible, it still remains expensive, justifying the increased interest in
in silico predictions of cell behavior. In order to accurately predict the behavior of
cells it is necessary to extensively model the cell environment, including gene-to-gene
interactions as completely as possible.
Significant algorithmic advances have been made for identifying these interactions,
but despite these improvements current techniques fail to infer some edges, and
fail to capture some complexities in the network. Much of this limitation is due to
heavily underdetermined problems, whereby tens of thousands of variables are to be
inferred using datasets with the power to resolve only a small fraction of the variables.
Additionally, failure to correctly resolve gene isoforms using short reads contributes
significantly to noise in gene quantification measures.
This dissertation introduces novel mathematical models, machine learning techniques,
and biological techniques to solve the problems described above. Mathematical
models are proposed for simulation of gene network motifs, and raw read simulation.
Machine learning techniques are shown for DNA sequence matching, and DNA
sequence correction.
Results provide novel insights into the low level functionality of gene networks. Also
shown is the ability to use normalization techniques to aggregate data for gene network
inference leading to larger data sets while minimizing increases in inter-experimental
noise. Results also demonstrate that high error rates experienced by third generation
sequencing are significantly different than previous error profiles, and that these errors can be modeled, simulated, and rectified. Finally, techniques are provided for amending this DNA error that preserve the benefits of third generation sequencing.
the ability to accurately edit genomes at scale has remained elusive. Novel techniques
have been introduced recently to aid in the writing of DNA sequences. While writing
DNA is more accessible, it still remains expensive, justifying the increased interest in
in silico predictions of cell behavior. In order to accurately predict the behavior of
cells it is necessary to extensively model the cell environment, including gene-to-gene
interactions as completely as possible.
Significant algorithmic advances have been made for identifying these interactions,
but despite these improvements current techniques fail to infer some edges, and
fail to capture some complexities in the network. Much of this limitation is due to
heavily underdetermined problems, whereby tens of thousands of variables are to be
inferred using datasets with the power to resolve only a small fraction of the variables.
Additionally, failure to correctly resolve gene isoforms using short reads contributes
significantly to noise in gene quantification measures.
This dissertation introduces novel mathematical models, machine learning techniques,
and biological techniques to solve the problems described above. Mathematical
models are proposed for simulation of gene network motifs, and raw read simulation.
Machine learning techniques are shown for DNA sequence matching, and DNA
sequence correction.
Results provide novel insights into the low level functionality of gene networks. Also
shown is the ability to use normalization techniques to aggregate data for gene network
inference leading to larger data sets while minimizing increases in inter-experimental
noise. Results also demonstrate that high error rates experienced by third generation
sequencing are significantly different than previous error profiles, and that these errors can be modeled, simulated, and rectified. Finally, techniques are provided for amending this DNA error that preserve the benefits of third generation sequencing.
ContributorsFaucon, Philippe Christophe (Author) / Liu, Huan (Thesis advisor) / Wang, Xiao (Committee member) / Crook, Sharon M (Committee member) / Wang, Yalin (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)
Created2017
Description
Many systems in the world \u2014 such as cellular networks, the post service, or transportation pathways \u2014 can be modeled as networks or graphs. The practical applications of graph algorithms generally seek to achieve some goal while minimizing some cost such as money or distance. While the minimum linear arrangement (MLA) problem has been widely-studied amongst graph ordering and embedding problems, there have been no developments into versions of the problem involving degree higher than 2. An application of our problem can be seen in overlay networks in telecommunications. An overlay network is a virtual network that is built on top of another network. It is a logical network where the links between nodes represent the physical paths connecting the nodes in the underlying infrastructure. The underlying physical network may be incomplete, but as long as it is connected, we can build a complete overlay network on top of it. Since some nodes may be overloaded by traffic, we can reduce the strain on the overlay network by limiting the communication between nodes. Some edges, however, may have more importance than others so we must be careful about our selection of which nodes are allowed to communicate with each other. The balance of reducing the degree of the network while maximizing communication forms the basis of our d-degree minimum arrangement problem. In this thesis we will look at several approaches to solving the generalized d-degree minimum arrangement d-MA problem where we embed a graph onto a subgraph of a given degree. We first look into the requirements and challenges of solving the d-MA problem. We will then present a polynomial-time heuristic and compare its performance with the optimal solution derived from integer linear programming. We will show that a simple (d-1)-ary tree construction provides the optimal structure for uniform graphs with large requests sets. Finally, we will present experimental data gathered from running simulations on a variety of graphs to evaluate the efficiency of our heuristic and tree construction.
ContributorsWang, Xiao (Author) / Richa, Andrea (Thesis director) / Nakamura, Mutsumi (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2016-05