Description

With the rising data output and falling costs of Next Generation Sequencing technologies, research into data compression is crucial to controlling storage requirements and costs. High-throughput sequencers such as the HiSeq X Ten can produce up to 1.8 terabases of data per run, and such large storage demands are even more important to consider for institutions that rely on their own servers rather than large data centers (cloud storage) [1]. Compression algorithms reduce the space taken up by large genomic datasets by encoding the most frequently occurring symbols with the shortest bit codewords and by reordering the data to make it easier to encode. Depending on the probability distribution of the symbols in the dataset or the structure of the data, choosing the wrong algorithm can yield a compressed file larger than the original, or a poorly compressed file that wastes both time and space [2]. To compare efficiency across file types, 37 open-source compression algorithms were used to compress six types of genomic datasets (FASTA, VCF, BCF, GFF, GTF, and SAM) and evaluated on compression speed, decompression speed, compression ratio, and compressed file size using the benchmark tool lzbench. Compressors that outperformed Gzip (zlib -6), the compressor most commonly used in bioinformatics, were then compared against one another by ratio and speed for each file type and across the geometric means of all file types. Compressors that exhibited fast compression and decompression speeds were also evaluated by transmission time through internet pipes of varying speed, in scenarios where the file was compressed only once or compressed multiple times.
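As a rough illustration of the transmission-time comparison described above (a back-of-the-envelope sketch with hypothetical sizes and throughputs, not the thesis's lzbench pipeline), the compress-once and compress-per-transfer scenarios can be written as:

def transfer_time_compressed(raw_mb, ratio, comp_mbps, decomp_mbps, link_mbps,
                             n_transfers, compress_once=True):
    """Total time to send a file n_transfers times through a link, with compression.

    ratio       -- compression ratio (raw size / compressed size)
    comp_mbps   -- compression throughput, MB/s of raw input
    decomp_mbps -- decompression throughput, MB/s of raw output
    link_mbps   -- network throughput, MB/s
    """
    compressed_mb = raw_mb / ratio
    n_compressions = 1 if compress_once else n_transfers
    return (n_compressions * raw_mb / comp_mbps        # time spent compressing
            + n_transfers * compressed_mb / link_mbps  # time on the wire
            + n_transfers * raw_mb / decomp_mbps)      # time spent decompressing

def transfer_time_raw(raw_mb, link_mbps, n_transfers):
    """Total time to send the uncompressed file n_transfers times."""
    return n_transfers * raw_mb / link_mbps

# Hypothetical numbers: a 10 GB SAM file, 4x ratio, 12.5 MB/s (100 Mbit/s) link.
print(transfer_time_compressed(10_000, 4.0, 200.0, 800.0, 12.5, n_transfers=5))
print(transfer_time_raw(10_000, 12.5, n_transfers=5))

Comparing these two quantities across link speeds reproduces the kind of trade-off between compression overhead and transfer savings that the abstract describes.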
Contributors: Howell, Abigail (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Taylor, Jay (Committee member) / Barrett, The Honors College (Contributor)
Created: 2017-05
Description

Many bacteria actively import environmental DNA and incorporate it into their genomes. This behavior, referred to as transformation, has been described in many species from diverse taxonomic backgrounds. Transformation is expected to carry some selective advantages similar to those postulated for meiotic sex in eukaryotes. However, the accumulation of loss-of-function alleles at transformation loci and an increased mutational load from recombining with DNA from dead cells create additional costs to transformation. These costs have been shown to outweigh many of the benefits of recombination under a variety of likely parameters. We investigate an additional proposed benefit of sexual recombination, the Red Queen hypothesis, as it relates to bacterial transformation. Here we describe a computational model showing that host-pathogen coevolution may provide a large selective benefit to transformation and allow transforming cells to invade an environment dominated by otherwise equal non-transformers. Furthermore, we observe that host-pathogen dynamics cause the selection pressure on transformation to vary extensively in time, explaining the tight regulation and wide variety of rates observed in naturally competent bacteria. Host-pathogen dynamics may explain the evolution and maintenance of natural competence despite its associated costs.
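To illustrate the kind of fluctuating selection the abstract refers to, here is a toy matching-allele host-parasite recursion (parameter values are arbitrary and this is not the thesis's simulation): the lag between host and parasite allele frequencies makes the direction of selection on hosts change repeatedly over time.

# Toy matching-allele model, for illustration only (not the thesis's model).
# Parasite type i infects host type i: host fitness drops with the frequency of
# the matching parasite, and parasite fitness rises with the frequency of the
# matching host. The lag between the two produces Red Queen-style cycling.
s_host, s_par = 0.3, 0.5   # arbitrary selection coefficients
p, q = 0.6, 0.5            # frequencies of host allele H1 and parasite allele P1

for gen in range(201):
    w_h1, w_h2 = 1 - s_host * q, 1 - s_host * (1 - q)   # host fitnesses
    w_p1, w_p2 = 1 + s_par * p, 1 + s_par * (1 - p)     # parasite fitnesses
    p = p * w_h1 / (p * w_h1 + (1 - p) * w_h2)          # selection on hosts
    q = q * w_p1 / (q * w_p1 + (1 - q) * w_p2)          # selection on parasites
    if gen % 25 == 0:
        print(f"gen {gen:3d}  H1 freq {p:.3f}  P1 freq {q:.3f}")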
Contributors: Palmer, Nathan David (Author) / Cartwright, Reed (Thesis director) / Wang, Xuan (Committee member) / Sievert, Chris (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description

Mammary gland development in humans during puberty involves the enlargement of breast tissue, but this is not true in non-human primates. To identify potential causes of this difference, I examined variation in substitution rates across genes related to mammary development. Genes undergoing purifying selection show slower-than-average substitution rates, while genes undergoing positive selection show faster rates; genes with shifted rates may therefore be related to the difference between humans and other primates. Three genes were found to be accelerated: FOXF1, IGFBP5, and ATP2B2. However, only the last of these was found in humans, and it seems unlikely to be related to the differences in mammary gland development at puberty between humans and non-human primates.
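The classification logic in the abstract amounts to comparing each gene's substitution rate with a genome-wide average; a minimal sketch (the rates and thresholds below are made-up placeholders, not results from the thesis) might look like:

# Illustration only: flag genes whose substitution rate departs from the
# genome-wide average. The rates and cutoffs below are hypothetical.
rates = {"FOXF1": 0.012, "IGFBP5": 0.011, "ATP2B2": 0.010, "GENE_X": 0.003}
genome_mean = 0.006  # hypothetical average substitutions per site

for gene, rate in rates.items():
    if rate > 1.5 * genome_mean:
        label = "accelerated (candidate for positive selection)"
    elif rate < 0.5 * genome_mean:
        label = "slow (candidate for purifying selection)"
    else:
        label = "near average"
    print(f"{gene}: {rate:.3f} subs/site -> {label}")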
Contributors: Arroyo, Diana (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Schwartz, Rachel (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description

This thesis focuses on how domain formation and local disorder mediate non-equilibrium order in condensed matter physics. More specifically, the data support c-axis charge density wave (CDW) ordering in the rare-earth tritellurides. Experimental studies were performed on Pd:ErTe3 using ultrafast pump-probe spectroscopy and an x-ray free-electron laser (XFEL). Ginzburg-Landau models were used to simulate domain formation. Universal scaling analysis of the data reveals that topological defects govern the relaxation of domain walls in Pd:ErTe3. This thesis also reports progress toward using light to control material domains.
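As a toy analogue of the domain-formation simulations mentioned above (a generic 1D time-dependent Ginzburg-Landau relaxation with arbitrary parameters, not the thesis's CDW model), quenching a noisy order parameter and letting it relax produces domains separated by walls:

import numpy as np

# Minimal 1D time-dependent Ginzburg-Landau (model A) relaxation.
# Free energy density: f = -a*phi**2/2 + b*phi**4/4 + kappa*(grad phi)**2/2,
# relaxational dynamics: dphi/dt = a*phi - b*phi**3 + kappa*laplacian(phi).
N, dx, dt = 256, 1.0, 0.05
a, b, kappa = 1.0, 1.0, 1.0
rng = np.random.default_rng(0)
phi = 0.01 * rng.standard_normal(N)   # small random seed, as after a quench

for _ in range(20000):
    lap = (np.roll(phi, 1) - 2.0 * phi + np.roll(phi, -1)) / dx**2  # periodic BCs
    phi += dt * (a * phi - b * phi**3 + kappa * lap)

walls = int(np.sum(np.abs(np.diff(np.sign(phi))) > 0))  # sign changes = domain walls
print(f"{walls} domain walls remain after relaxation")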

Contributors: Miller, Alex (Author) / Teitelbaum, Samuel (Thesis director) / Belitsky, Andrei (Committee member) / Kaindl, Robert (Committee member) / Barrett, The Honors College (Contributor) / Department of Physics (Contributor) / School of Mathematical and Statistical Sciences (Contributor)
Created: 2023-05
Description

Understanding the diversity, evolutionary relationships, and geographic distribution of species is foundational knowledge in biology. However, this knowledge is lacking for many diverse lineages of the tree of life. This is the case for the desert stink beetles in the tribe Amphidorini LeConte, 1862 (Coleoptera: Tenebrionidae) – a lineage of arid-adapted flightless beetles found throughout western North America. Four interconnected studies that jointly increase our knowledge of this group are presented. First, the darkling beetle fauna of the Algodones sand dunes in southern California is examined as a case study to explore the scientific practice of checklist creation. An updated list of the species known from this region is presented, with a critical focus on material now made available through digitization and global aggregation. This part concludes with recommendations for future biodiversity checklist authors. Second, the psammophilic genus Trogloderus LeConte, 1879 is revised. Six new species are described, and the first multi-gene phylogeny for the genus is inferred. In addition, historical biogeographic reconstructions along with novel hypotheses of speciation patterns within the Intermountain Region are given. In particular, the Kaibab Plateau and Kaiparowits Formation are found to have promoted speciation on the Colorado Plateau, while the Owens Valley and prehistoric Bouse Embayment are similarly hypothesized to have driven species diversification in southern California. Third, a novel phylogenomic analysis for the tribe Amphidorini is presented, based on 29 de novo partial transcriptomes. Three putative ortholog sets were discovered and analyzed to infer the relationships between species groups and genera. The existing classification of the tribe is found to be highly inadequate, though the earliest-diverging relationships within the tribe remain in question. Finally, the new phylogenetic framework is used to provide a genus-level revision of the Amphidorini, which previously contained six valid genera and 253 valid species. This updated classification includes more than 100 taxonomic changes and results in a revised tribe consisting of 16 genera, three of which are described as new to science.
Contributors: Johnston, Murray Andrew (Author) / Franz, Nico M (Thesis advisor) / Cartwright, Reed (Committee member) / Taylor, Jesse (Committee member) / Pigg, Kathleen (Committee member) / Arizona State University (Publisher)
Created: 2018
Description

In this thesis, I present the study of nucleon structure from distinct perspectives. I start by elaborating the motivations behind these endeavors and then introduce the key concept, the generalized parton distribution functions (GPDs), which serve as the framework describing hadronic particles in terms of their fundamental constituents. The second chapter is devoted to a detailed phenomenological study of the Virtual Compton Scattering (VCS) process, where a more comprehensive parametrization is suggested. In the third chapter, the renormalization kernels that enter the QCD evolution equations at twist-4 accuracy are computed in terms of Feynman diagrams in momentum space; this can be viewed as an extension of the work of Bukhvostov, Frolov, Lipatov, and Kuraev (BFLK). The results can be used for determining the QCD background interaction for future precision measurements.
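Schematically (and only schematically; the notation here is generic, the twist-4 case actually involves multi-parton operators with several momentum fractions, and sign and normalization conventions vary), the renormalization kernels $K_{ij}$ enter evolution equations of the form
\[
\mu \frac{d}{d\mu}\, \mathcal{O}_i(x;\mu) \;=\; -\sum_j \int dy\; K_{ij}(x,y;\alpha_s(\mu))\, \mathcal{O}_j(y;\mu),
\]
so computing the kernels at twist-4 accuracy fixes how the corresponding distributions change with the resolution scale $\mu$.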
Contributors: Ji, Yao, Ph.D. (Author) / Belitsky, Andrei (Thesis advisor) / Lebed, Richard (Committee member) / Schmidt, Kevin E (Committee member) / Vachaspati, Tanmay (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

The work presented in this dissertation examines three different nonequilibrium particle physics processes that could play a role in answering the question “how was the particle content of today’s universe produced after the big bang?” Cosmic strings produced from spontaneous breaking of a hidden sector $U(1)_{\rm X}$ symmetry could couple to Standard Model fields through Higgs Portal or Kinetic Mixing operators and radiate particles that contribute to the diffuse gamma ray background. In this work we calculate the properties of these strings, including effective couplings between the strings and Standard Model fields. Explosive particle production after inflation, known as preheating, would have produced a stochastic background of gravitational waves (GW). This work shows how the presence of realistic additional fields and interactions can affect this prediction dramatically. Specifically, it considers the inflaton to be coupled to a light scalar field and shows that even a very small quartic self-interaction term will reduce the amplitude of the gravitational wave spectrum. For self-coupling $\lambda_{\chi} \gtrsim g^2$, where $g^2$ is the inflaton-scalar coupling, the peak energy density scales as $\Omega_{\rm GW}^{(\lambda_{\chi})} / \Omega_{\rm GW}^{(\lambda_{\chi}=0)} \sim (g^2/\lambda_{\chi})^{2}$. Finally, leptonic charge-parity (CP) violation could be an important clue to understanding the origin of our universe's matter-antimatter asymmetry, and long-baseline neutrino oscillation experiments in the coming decade may uncover it. The CP-violating effects of a possible fourth "sterile" neutrino can interfere with those of the usual three neutrinos; this work shows how combining various measurements can help break the resulting degeneracies.
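Taking the quoted scaling at face value, a short worked example shows how quickly the self-coupling suppresses the signal:
\[
\frac{\Omega_{\rm GW}^{(\lambda_\chi)}}{\Omega_{\rm GW}^{(\lambda_\chi=0)}}
\sim \left(\frac{g^2}{\lambda_\chi}\right)^{2}
\quad\Longrightarrow\quad
\lambda_\chi = 10\,g^2 \;\Rightarrow\; \sim 10^{-2},
\qquad
\lambda_\chi = 100\,g^2 \;\Rightarrow\; \sim 10^{-4},
\]
i.e., even a modest quartic self-interaction reduces the peak gravitational wave energy density by orders of magnitude.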
Contributors: Hyde, Jeffrey Morgan (Author) / Vachaspati, Tanmay (Thesis advisor) / Easson, Damien (Committee member) / Belitsky, Andrei (Committee member) / Comfort, Joseph (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

With the discovery of the Higgs Boson in 2012, particle physics has decidedly moved beyond the Standard Model into a new epoch. Though the Standard Model particle content is now completely accounted for, there remain many theoretical issues about the structure of the theory in need of resolution. Among these is the hierarchy problem: since the renormalized Higgs mass receives corrections that are quadratic in a higher cutoff scale, what keeps the Higgs boson light? Many possible solutions to this problem have been advanced, such as supersymmetry, Randall-Sundrum models, or sub-millimeter corrections to gravity. One such solution is advanced by the Lee-Wick Standard Model. In this theory, higher-derivative operators are added to the Lagrangian for each Standard Model field, which result in propagators that possess two physical poles and fall off more rapidly in the ultraviolet regime. An auxiliary field transformation shows that the higher-derivative theory is equivalent to a second, manifestly renormalizable theory containing new fields with opposite-sign kinetic and mass terms. These so-called Lee-Wick fields have opposite-sign propagators and famously cancel the quadratic divergences that plague the renormalized Higgs mass. The states in the Hilbert space corresponding to Lee-Wick particles have negative norm, and the implications for causality and unitarity are examined.
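As a standard illustration (a massless toy example, not a formula quoted from the dissertation), a higher-derivative kinetic term modifies the propagator so that it splits into two poles of opposite sign,
\[
\frac{1}{p^2\left(1 - p^2/M^2\right)} \;=\; \frac{1}{p^2} \;-\; \frac{1}{p^2 - M^2},
\]
where $M$ is the Lee-Wick scale; the opposite-sign second pole is what softens the ultraviolet behavior and cancels the quadratic divergence in the Higgs mass.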

This dissertation explores a variant of the theory called the N = 3 Lee-Wick Standard Model. The Lagrangian of this theory features a yet-higher-derivative operator, which produces a propagator with three physical poles and possesses even better high-energy behavior than the minimal Lee-Wick theory. An analogous auxiliary field transformation takes this higher-derivative theory into a renormalizable theory with states of alternating positive, negative, and positive norm. The phenomenology of this theory is examined in detail, with particular emphasis on the collider signatures of Lee-Wick particles, electroweak precision constraints on the masses that the new particles can take on, and scenarios in early-universe cosmology in which Lee-Wick particles can play a significant role.
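For the N = 3 case, the analogous massless toy propagator with two Lee-Wick scales $M_1 < M_2$ splits into three poles with residues of alternating sign (again an illustrative textbook-style decomposition, not a quotation from the dissertation),
\[
\frac{1}{p^2\left(1 - p^2/M_1^2\right)\left(1 - p^2/M_2^2\right)}
\;=\; \frac{1}{p^2}
\;-\; \frac{M_2^2}{M_2^2 - M_1^2}\,\frac{1}{p^2 - M_1^2}
\;+\; \frac{M_1^2}{M_2^2 - M_1^2}\,\frac{1}{p^2 - M_2^2},
\]
matching the pattern of positive, negative, and positive norm states described above.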
Contributors: TerBeek, Russell Henry (Author) / Lebed, Richard F (Thesis advisor) / Alarcon, Ricardo (Committee member) / Belitsky, Andrei (Committee member) / Chamberlin, Ralph (Committee member) / Parikh, Maulik (Committee member) / Arizona State University (Publisher)
Created: 2015
Description

This thesis introduces new techniques for clustering distributional data according to their geometric similarities. The work builds on the optimal transportation (OT) problem, which seeks the globally minimum-cost matching between distributions, and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on a variational principle that allows hard cluster assignments to be differentiated, a capability previously missing in the literature. The thesis presents multiple techniques to regularize and generalize OT to handle various tasks, including clustering, aligning, and interpolating distributional data. It also discusses the connections between the new formulation and other OT and clustering formulations, to better understand their gaps and the means to close them. Finally, it demonstrates the advantages of the proposed OT techniques on machine learning problems and their downstream applications in computer graphics, computer vision, and image processing.
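The thesis develops a variational, power-diagram-based OT formulation; purely as a generic illustration of the underlying matching problem (not the thesis's method), here is a minimal entropically regularized OT (Sinkhorn) sketch on toy 1D histograms:

import numpy as np

# Entropically regularized optimal transport via Sinkhorn iterations.
def sinkhorn(a, b, cost, eps=0.05, n_iter=1000):
    """Return an approximate transport plan between histograms a and b."""
    K = np.exp(-cost / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Toy data: two 1D histograms on a grid, squared-distance ground cost.
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
cost = (x[:, None] - x[None, :]) ** 2
plan = sinkhorn(a, b, cost)
print("approximate transport cost:", float((plan * cost).sum()))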
Contributors: Mi, Liang (Author) / Wang, Yalin (Thesis advisor) / Chen, Kewei (Committee member) / Karam, Lina (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates. This can make mutation detection difficult; and while increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increased depth. The problem of accurate genotyping is exacerbated when there is not a reference genome or other auxiliary information available.

I explore several methods for sensitively detecting mutations in non-model organisms using an example Eucalyptus melliodora individual. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, with structured data, a likelihood framework that is aware of this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm.

I also use this data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. I simulate data and show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores while requiring no reference or other information and performs as well as other methods.

Finally, I use the Eucalyptus data to investigate the impact of quality score calibration on the quality of output variant calls and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
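For context on what "recalibration" targets, the sketch below illustrates the Phred scale and empirical per-bin quality with made-up counts; it is not KBBQ's k-mer-based algorithm.

import math

# A Phred score Q encodes an error probability p via Q = -10*log10(p). Comparing
# the reported Q of a bin of bases with the error rate actually observed in that
# bin shows whether the scores are well calibrated.
def empirical_phred(n_errors, n_bases):
    p = (n_errors + 1) / (n_bases + 2)   # pseudocounts avoid log10(0)
    return -10.0 * math.log10(p)

# Made-up tallies: reported quality -> (observed errors, total bases in the bin)
bins = {20: (450, 30_000),   # Q20 claims 1% error; ~1.5% observed -> miscalibrated
        30: (40, 50_000)}    # Q30 claims 0.1% error; ~0.08% observed -> close

for q, (err, tot) in sorted(bins.items()):
    print(f"reported Q{q}: empirical Q{empirical_phred(err, tot):.1f}")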
Contributors: Orr, Adam James (Author) / Cartwright, Reed (Thesis advisor) / Wilson, Melissa (Committee member) / Kusumi, Kenro (Committee member) / Taylor, Jesse (Committee member) / Pfeifer, Susanne (Committee member) / Arizona State University (Publisher)
Created: 2020