the ability to accurately edit genomes at scale has remained elusive. Novel techniques
have recently been introduced to aid in writing DNA sequences. Although writing DNA has
become more accessible, it remains expensive, which justifies the growing interest in
in silico predictions of cell behavior. Accurately predicting the behavior of cells
requires modeling the cell environment extensively, with gene-to-gene interactions
captured as completely as possible.
Significant algorithmic advances have been made in identifying these interactions,
but current techniques still fail to infer some edges and to capture some complexities
in the network. Much of this limitation stems from heavily underdetermined problems,
in which tens of thousands of variables must be inferred from datasets with the power
to resolve only a small fraction of them. Additionally, failure to correctly resolve
gene isoforms using short reads contributes significantly to noise in gene
quantification measures.
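
To make the underdetermination concrete, the following is a minimal sketch rather than anything taken from the dissertation. It assumes a purely linear toy model in which one target gene's expression is a weighted sum of 20 candidate regulators, observed in only 5 experiments. The sizes, the weights, and names such as `w_true` are hypothetical. It shows that two very different sets of edge weights reproduce the data equally well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: far fewer experiments than unknown edge weights.
n_samples, n_genes = 5, 20
X = rng.normal(size=(n_samples, n_genes))  # expression of candidate regulators
w_true = np.zeros(n_genes)
w_true[[2, 7]] = [1.5, -2.0]               # only two real regulatory edges
y = X @ w_true                             # noiseless target-gene expression

# Minimum-norm least-squares solution: it fits the data exactly.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any vector in the null space of X can be added without changing the fit.
_, _, vt = np.linalg.svd(X)                # full SVD: last rows of vt span the null space
null_vec = vt[-1]
w_alt = w_hat + 3.0 * null_vec

fit_hat = np.allclose(X @ w_hat, y)
fit_alt = np.allclose(X @ w_alt, y)
rank = np.linalg.matrix_rank(X)
print(f"rank={rank}, unknowns={n_genes}, both fit: {fit_hat and fit_alt}")
```

Because the data constrain only as many directions as the rank of `X`, the remaining directions are free, and additional data or prior assumptions (for example sparsity) are needed to choose among the candidate networks.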
This dissertation introduces novel mathematical models, machine learning techniques,
and biological techniques to address the problems described above. Mathematical
models are proposed for simulating gene network motifs and raw reads. Machine learning
techniques are presented for DNA sequence matching and DNA sequence correction.
The results provide novel insights into the low-level functionality of gene networks.
They also show that normalization techniques can be used to aggregate data for gene
network inference, yielding larger datasets while minimizing increases in
inter-experimental noise. The results further demonstrate that the high error rates of
third-generation sequencing follow profiles that differ significantly from those of
earlier sequencing technologies, and that these errors can be modeled, simulated, and
corrected. Finally, techniques are provided for correcting these sequencing errors
while preserving the benefits of third-generation sequencing.
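
As an illustration of what modeling and simulating read errors can look like, here is a minimal sketch. It is not the dissertation's simulator. It assumes independent per-base substitution, insertion, and deletion events, and the function `simulate_read` and every rate below are hypothetical placeholders chosen only to contrast an indel-heavy profile, as long-read errors are commonly described, with a substitution-only one.

```python
import random

def simulate_read(template, sub_rate, ins_rate, del_rate, rng):
    """Copy `template`, injecting substitutions, insertions, and deletions."""
    bases = "ACGT"
    out = []
    for base in template:
        r = rng.random()
        if r < del_rate:
            continue  # deletion: the base is dropped from the read
        if r < del_rate + sub_rate:
            # substitution: emit any base other than the true one
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)  # correct base call
        if rng.random() < ins_rate:
            out.append(rng.choice(bases))  # insertion after this position
    return "".join(out)

rng = random.Random(42)
template = "ACGTACGTGGCCTTAA" * 4

# Placeholder rates only, not measured values.
long_read_like = simulate_read(template, sub_rate=0.02, ins_rate=0.06,
                               del_rate=0.05, rng=rng)
short_read_like = simulate_read(template, sub_rate=0.01, ins_rate=0.0,
                                del_rate=0.0, rng=rng)
print(len(template), len(long_read_like), len(short_read_like))
```

A substitution-only profile preserves read length, whereas insertions and deletions shift every downstream position, which is one reason correction methods built for one profile do not transfer directly to the other.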
The purpose of this project is to create a useful tool for musicians that utilizes the harmonic content of their playing to recommend new, relevant chords to play. This is done by training various Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) on the lead sheets of 100 different jazz standards. A total of 200 unique datasets were produced and tested, resulting in the prediction of nearly 51 million chords. A note-prediction accuracy of 82.1% and a chord-prediction accuracy of 34.5% were achieved across all datasets. Methods of data representation that were rooted in valid music theory frameworks were found to increase the efficacy of harmonic prediction by up to 6%. Optimal LSTM input sizes were also determined for each method of data representation.
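
The abstract does not define its data representations or its two accuracy metrics, so the following is only a sketch under stated assumptions. It assumes chords are encoded as 12-dimensional binary pitch-class vectors, that LSTM inputs are fixed-length windows of preceding chords, that note accuracy is the fraction of pitch-class entries predicted correctly, and that chord accuracy requires an exact match of the whole vector. The chord vocabulary and all function names are hypothetical.

```python
import numpy as np

# Hypothetical chord vocabulary: intervals above the root, in semitones.
CHORD_INTERVALS = {"maj7": (0, 4, 7, 11), "m7": (0, 3, 7, 10), "7": (0, 4, 7, 10)}
PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def encode_chord(root, quality):
    """Return a 12-dimensional binary pitch-class vector for one chord."""
    vec = np.zeros(12, dtype=int)
    for interval in CHORD_INTERVALS[quality]:
        vec[(PITCH_CLASSES[root] + interval) % 12] = 1
    return vec

def make_windows(sequence, input_size):
    """Split a chord sequence into (input window, next-chord target) pairs."""
    seq = np.asarray(sequence)
    inputs = np.stack([seq[i:i + input_size] for i in range(len(seq) - input_size)])
    targets = seq[input_size:]
    return inputs, targets

def note_accuracy(pred, true):
    """Fraction of individual pitch-class entries that match."""
    return float(np.mean(pred == true))

def chord_accuracy(pred, true):
    """Fraction of chords whose full 12-entry vector matches exactly."""
    return float(np.mean(np.all(pred == true, axis=1)))

# A ii-V-I progression in C, repeated twice.
progression = [encode_chord("D", "m7"), encode_chord("G", "7"),
               encode_chord("C", "maj7")] * 2
inputs, targets = make_windows(progression, input_size=2)

# Predicting C7 where Cmaj7 was played gets most notes right but the chord wrong.
true = np.array([encode_chord("C", "maj7")])
pred = np.array([encode_chord("C", "7")])
print(inputs.shape, targets.shape, note_accuracy(pred, true), chord_accuracy(pred, true))
```

Under these assumptions a prediction that misses a single chord tone still scores highly on notes while scoring zero on the chord, which is one way a note-level figure can sit well above a chord-level figure.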
My first project focuses on a new strategy for preparing solid-state nanopore sensors for DNA sequencing. Challenges for existing nanopore approaches include specificity of detection, controllability of translocation, and scalability of fabrication. In a new solid-state pore architecture, top-down fabrication of an initial electrode gap embedded in a sealed nanochannel is followed by feedback-controlled electrochemical deposition of metal to shrink the gap and define the nanopore size. The resulting structure allows for the use of an electric field to control the motion of DNA through the pore and the direct detection of a tunnel current through a DNA molecule.
My second project focuses on top-down fabrication strategies for a fixed nanogap device to explore the electronic conductance of proteins. Here, a metal-insulator-metal junction is fabricated with top-down techniques, and the resulting electrode surfaces are chemically modified with molecules that bind strongly to a target protein. When proteins bind to molecules on either side of the dielectric gap, a molecular junction is formed, with observed conductances on the order of nanosiemens. These devices can be used in applications such as DNA sequencing or to gain insight into fundamental questions such as the mechanism of electron transport in proteins.
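
For a sense of scale, the short calculation below converts a nanosiemens conductance into the current a measurement would need to resolve, using I = G * V. It is a sketch only: the 50 mV bias and the exact 1 nS value are assumed for illustration and are not taken from the abstract. The conductance quantum is a physical constant.

```python
G0 = 7.748091729e-5  # conductance quantum 2e^2/h, in siemens

def junction_current(conductance_s, bias_v):
    """Ohmic estimate of junction current in amperes: I = G * V."""
    return conductance_s * bias_v

conductance = 1e-9   # 1 nS, the order of magnitude quoted in the abstract
bias = 0.05          # assumed 50 mV bias, for illustration only

current = junction_current(conductance, bias)
ratio = conductance / G0

print(f"current = {current * 1e12:.0f} pA")
print(f"G / G0  = {ratio:.2e}")
```

With these assumed values the current is 50 pA, and the junction conducts roughly a hundred-thousandth of a single quantum channel, which indicates why low-noise current amplification matters for such devices.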