Matching Items (2)
156080-Thumbnail Image.png
Description
While techniques for reading DNA in some capacity has been possible for decades,

the ability to accurately edit genomes at scale has remained elusive. Novel techniques

have been introduced recently to aid in the writing of DNA sequences. While writing

DNA is more accessible, it still remains expensive, justifying the increased interest in

in

While techniques for reading DNA in some capacity has been possible for decades,

the ability to accurately edit genomes at scale has remained elusive. Novel techniques

have been introduced recently to aid in the writing of DNA sequences. While writing

DNA is more accessible, it still remains expensive, justifying the increased interest in

in silico predictions of cell behavior. In order to accurately predict the behavior of

cells it is necessary to extensively model the cell environment, including gene-to-gene

interactions as completely as possible.

Significant algorithmic advances have been made for identifying these interactions,

but despite these improvements current techniques fail to infer some edges, and

fail to capture some complexities in the network. Much of this limitation is due to

heavily underdetermined problems, whereby tens of thousands of variables are to be

inferred using datasets with the power to resolve only a small fraction of the variables.

Additionally, failure to correctly resolve gene isoforms using short reads contributes

significantly to noise in gene quantification measures.

This dissertation introduces novel mathematical models, machine learning techniques,

and biological techniques to solve the problems described above. Mathematical

models are proposed for simulation of gene network motifs, and raw read simulation.

Machine learning techniques are shown for DNA sequence matching, and DNA

sequence correction.

Results provide novel insights into the low level functionality of gene networks. Also

shown is the ability to use normalization techniques to aggregate data for gene network

inference leading to larger data sets while minimizing increases in inter-experimental

noise. Results also demonstrate that high error rates experienced by third generation

sequencing are significantly different than previous error profiles, and that these errors can be modeled, simulated, and rectified. Finally, techniques are provided for amending this DNA error that preserve the benefits of third generation sequencing.
ContributorsFaucon, Philippe Christophe (Author) / Liu, Huan (Thesis advisor) / Wang, Xiao (Committee member) / Crook, Sharon M (Committee member) / Wang, Yalin (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)
Created2017
135251-Thumbnail Image.png
Description
Many systems in the world \u2014 such as cellular networks, the post service, or transportation pathways \u2014 can be modeled as networks or graphs. The practical applications of graph algorithms generally seek to achieve some goal while minimizing some cost such as money or distance. While the minimum linear arrangement

Many systems in the world \u2014 such as cellular networks, the post service, or transportation pathways \u2014 can be modeled as networks or graphs. The practical applications of graph algorithms generally seek to achieve some goal while minimizing some cost such as money or distance. While the minimum linear arrangement (MLA) problem has been widely-studied amongst graph ordering and embedding problems, there have been no developments into versions of the problem involving degree higher than 2. An application of our problem can be seen in overlay networks in telecommunications. An overlay network is a virtual network that is built on top of another network. It is a logical network where the links between nodes represent the physical paths connecting the nodes in the underlying infrastructure. The underlying physical network may be incomplete, but as long as it is connected, we can build a complete overlay network on top of it. Since some nodes may be overloaded by traffic, we can reduce the strain on the overlay network by limiting the communication between nodes. Some edges, however, may have more importance than others so we must be careful about our selection of which nodes are allowed to communicate with each other. The balance of reducing the degree of the network while maximizing communication forms the basis of our d-degree minimum arrangement problem. In this thesis we will look at several approaches to solving the generalized d-degree minimum arrangement d-MA problem where we embed a graph onto a subgraph of a given degree. We first look into the requirements and challenges of solving the d-MA problem. We will then present a polynomial-time heuristic and compare its performance with the optimal solution derived from integer linear programming. We will show that a simple (d-1)-ary tree construction provides the optimal structure for uniform graphs with large requests sets. Finally, we will present experimental data gathered from running simulations on a variety of graphs to evaluate the efficiency of our heuristic and tree construction.
ContributorsWang, Xiao (Author) / Richa, Andrea (Thesis director) / Nakamura, Mutsumi (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2016-05