Genie: A Population Genetics Simulation Built with JavaScript

Description

The modern web presents an opportunity for educators and researchers to create tools that are highly accessible. Because of the near-ubiquity of modern web browsers, developers who hope to create educational and analytical tools can reach a large audience by creating web applications. Using JavaScript, HTML, and other modern web development technologies, Genie was developed as a simulator to help educators in biology, genetics, and evolution classrooms teach their students about population genetics. Because Genie was designed for the modern web, it is highly accessible to both educators and students, who can access the web application using any modern web browser on virtually any device. Genie demonstrates the efficacy of web development technologies for illustrating and simulating complex processes, and it will be a unique educational tool for educators who teach population genetics.

Contributors: Roos, Benjamin Hirsch (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Mayron, Liam (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2015-05

A composite genome approach to identify phylogenetically informative data from next-generation sequencing

Description

Background
Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results
In simulations, SISRS identified large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.

Contributors: Schwartz, Rachel (Author) / Harkins, Kelly (Author) / Stone, Anne (Author) / Cartwright, Reed (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / School of Life Sciences (Contributor)

Created: 2015-06-11

A Curation of the Callithrix penicillata Draft Genome

Description

Callithrix penicillata, also known as the black-tufted marmoset, lives primarily in the Brazilian highlands and has been the subject of little research. For this project, I curated the newly assembled genome of this species. The scaffolds obtained from the Dovetail Genomics reads were organized and labeled into chromosomes using the 2014 Callithrix jacchus genome as a reference. Then, using that same genome as a reference, 13 of the chromosomes were reverse complemented so that their orientation matched the 2014 Callithrix jacchus genome. The N50 of the assembly was calculated and found to be 124 Mb. Quality scores were computed for the final genome using referee and visualized with a bar plot, with 99% of sites scoring above 0. Heterozygosity was also calculated and found to be 0.3%. Finally, the final version of the genome was visually compared to the 2017 Callithrix jacchus genome and the GRCh38 human genome. This genome was submitted to the NCBI database to await further approval.
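
Two of the operations named in this abstract, the N50 statistic and reverse complementing a sequence, have standard definitions that can be sketched briefly. The Python below is only an illustration of those definitions; the function names are hypothetical and it is not the code used in the thesis.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy scaffold lengths, not data from the thesis.
    print(n50([2, 3, 4, 5, 6]))
    print(reverse_complement("AAGC"))
```

Sorting the lengths in descending order and accumulating until half the total is reached is the usual way to compute N50; other assembly statistics such as N90 follow by changing the threshold.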

Contributors: Johnson, Joelle Genevieve (Author) / Cartwright, Reed (Thesis director) / Stone, Anne (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12

A Review of the Human Vermiform Appendix and its Proposed Function

Description

Since its discovery in 1524, the vermiform appendix has been characterized by many people. Charles Darwin considered the human appendix to be a vestige and a useless structure. Others at the time opposed this hypothesis. However, Darwin's hypothesis remained the prevalent one until recently, when interest in the appendix was renewed because of advancements in microscopes, knowledge of the immune system, and phylogenetics. In this review, I will argue that the vermiform appendix, although still not completely understood, has important functions. First, I will describe the anatomy of the appendix. I will discuss its comparative anatomy across different animals and among primates. I will address the effects of appendicitis and appendectomy. I will give background on vestigial structures and will discuss whether the appendix is a vestige. Next, I will review the evolution of the appendix. Finally, I will argue that the function of the appendix is as an immune organ, including discussion of gut-associated lymphoid tissue (GALT), development of lymphoid follicles in GALT and their comparison within different organs, Immunoglobulin A (IgA) function in the gut, biofilms as evidence that the appendix is a safe-house for beneficial bacteria, re-inoculation of the bowel, and protection against recurring infection. I will conclude with future studies that should be conducted to further our understanding of the vermiform appendix.

Contributors: Prestwich, Shelby Elizabeth (Author) / Cartwright, Reed (Thesis director) / Lynch, John (Committee member) / Furstenau, Tara (Committee member) / School of Geographical Sciences and Urban Planning (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

An Analysis of the Benchmark Test lzbench for Open-Source Compressors

Description

With the rising data output and falling costs of Next Generation Sequencing technologies, research into data compression is crucial to maintaining storage efficiency and controlling costs. High-throughput sequencers such as the HiSeqX Ten can produce up to 1.8 terabases of data per run, and such large storage demands are even more important to consider for institutions that rely on their own servers rather than large data centers (cloud storage) [1]. Compression algorithms aim to reduce the amount of space taken up by large genomic datasets by encoding the most frequently occurring symbols with the shortest bit codewords and by changing the order of the data to make it easier to encode. Depending on the probability distribution of the symbols in the dataset or the structure of the data, choosing the wrong algorithm could result in a compressed file larger than the original or a poorly compressed file that wastes time and space [2]. To test efficiency among compression algorithms for each file type, 37 open-source compression algorithms were used to compress six types of genomic datasets (FASTA, VCF, BCF, GFF, GTF, and SAM) and evaluated on compression speed, decompression speed, compression ratio, and file size using the benchmark test lzbench. Compressors that outperformed the popular bioinformatics compressor Gzip (zlib -6) were evaluated against one another by ratio and speed for each file type and across the geometric means of all file types. Compressors that exhibited fast compression and decompression speeds were also evaluated by transmission time through variable-speed internet pipes in scenarios where the file was compressed only once or compressed multiple times.
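
The quantities compared in this abstract (compressed size, compression ratio, and transmission time) can be illustrated with a minimal Python sketch using the standard-library `zlib` module at level 6, the baseline setting the abstract names. This is not lzbench and does not reproduce the thesis's measurements; the function names are hypothetical, and defining the ratio as original size divided by compressed size is an assumption made here for illustration.

```python
import zlib


def compression_stats(data: bytes, level: int = 6) -> dict:
    """Compress data with zlib and report sizes and ratio.

    Assumption: ratio = original size / compressed size, so values
    above 1 mean the data shrank and values below 1 mean it grew.
    """
    compressed = zlib.compress(data, level)
    # Round-trip check: decompression must restore the input exactly.
    assert zlib.decompress(compressed) == data
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        "ratio": len(data) / len(compressed),
    }


def transfer_seconds(n_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Idealized transmission time, ignoring latency and protocol overhead."""
    return n_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    repetitive = b"ACGT" * 10_000        # highly repetitive toy sequence
    no_repeats = bytes(range(256))       # every byte value once, nothing to exploit
    for name, payload in [("repetitive", repetitive), ("no_repeats", no_repeats)]:
        stats = compression_stats(payload)
        print(name, stats, transfer_seconds(stats["compressed_bytes"], 1_000_000))
```

The second payload shows the failure case the abstract mentions: when the data offer no redundancy, the container overhead makes the compressed output larger than the input.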

Contributors: Howell, Abigail (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Taylor, Jay (Committee member) / Barrett, The Honors College (Contributor)

Created: 2017-05

Strong Episodic Selection for Natural Competence for Transformation due to Host-Pathogen Dynamics

Description

Many bacteria actively import environmental DNA and incorporate it into their genomes. This behavior, referred to as transformation, has been described in many species from diverse taxonomic backgrounds. Transformation is expected to carry some selective advantages similar to those postulated for meiotic sex in eukaryotes. However, the accumulation of loss-of-function alleles at transformation loci and an increased mutational load from recombining with DNA from dead cells create additional costs to transformation. These costs have been shown to outweigh many of the benefits of recombination under a variety of likely parameters. We investigate an additional proposed benefit of sexual recombination, the Red Queen hypothesis, as it relates to bacterial transformation. Here we describe a computational model showing that host-pathogen coevolution may provide a large selective benefit to transformation and allow transforming cells to invade an environment dominated by otherwise equal non-transformers. Furthermore, we observe that host-pathogen dynamics cause the selection pressure on transformation to vary extensively in time, explaining the tight regulation and wide variety of rates observed in naturally competent bacteria. Host-pathogen dynamics may explain the evolution and maintenance of natural competence despite its associated costs.

Contributors: Palmer, Nathan David (Author) / Cartwright, Reed (Thesis director) / Wang, Xuan (Committee member) / Sievert, Chris (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Identifying Variation Within Substitution Rates in Mammary Gland Development Genes within Primate Genomes

Description

Mammary gland development in humans during puberty involves the enlargement of breast tissue, but this is not true in non-human primates. To identify potential causes of this difference, I examined variation in substitution rates across genes related to mammary development. Genes undergoing purifying selection show slower-than-average substitution rates, while genes undergoing positive selection show faster rates. These may be related to the difference between humans and other primates. Three genes were found to be accelerated: FOXF1, IGFBP5, and ATP2B2. However, only the last of these was found in humans, and it seems unlikely that it is related to the differences in mammary gland development at puberty between humans and non-human primates.

Contributors: Arroyo, Diana (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Schwartz, Rachel (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Diversity and Distribution of the Desert Stink Beetles: Systematics of the Amphidorini LeConte, 1862 (Coleoptera: Tenebrionidae)

Description

Understanding the diversity, evolutionary relationships, and geographic distribution of species is foundational knowledge in biology. However, this knowledge is lacking for many diverse lineages of the tree of life. This is the case for the desert stink beetles in the tribe Amphidorini LeConte, 1862 (Coleoptera: Tenebrionidae) – a lineage of arid-adapted flightless beetles found throughout western North America. Four interconnected studies that jointly increase our knowledge of this group are presented. First, the darkling beetle fauna of the Algodones sand dunes in southern California is examined as a case study to explore the scientific practice of checklist creation. An updated list of the species known from this region is presented, with a critical focus on material now made available through digitization and global aggregation. This part concludes with recommendations for future biodiversity checklist authors. Second, the psammophilic genus Trogloderus LeConte, 1879 is revised. Six new species are described, and the first multi-gene phylogeny for the genus is inferred. In addition, historical biogeographic reconstructions along with novel hypotheses of speciation patterns within the Intermountain Region are given. In particular, the Kaibab Plateau and Kaiparowits Formation are found to have promoted speciation on the Colorado Plateau. The Owens Valley and prehistoric Bouse Embayment are similarly hypothesized to have driven species diversification in southern California. Third, a novel phylogenomic analysis for the tribe Amphidorini is presented, based on 29 de novo partial transcriptomes. Three putative ortholog sets were discovered and analyzed to infer the relationships between species groups and genera. The existing classification of the tribe is found to be highly inadequate, though the earliest-diverging relationships within the tribe are still in question. Finally, the new phylogenetic framework is used to provide a genus-level revision for the Amphidorini, which previously contained six valid genera and 253 valid species. This updated classification includes more than 100 taxonomic changes and results in the revised tribe consisting of 16 genera, with three being described as new to science.

Contributors: Johnston, Murray Andrew (Author) / Franz, Nico M (Thesis advisor) / Cartwright, Reed (Committee member) / Taylor, Jesse (Committee member) / Pigg, Kathleen (Committee member) / Arizona State University (Publisher)

Created: 2018

Transportation Techniques for Geometric Clustering

Description

This thesis introduces new techniques for clustering distributional data according to their geometric similarities. This work builds upon the optimal transportation (OT) problem, which seeks the global minimum cost for matching distributional data, and leverages the connection between OT and power diagrams to solve different clustering problems. The OT formulation is based on the variational principle to differentiate hard cluster assignments, which was missing in the literature. This thesis shows multiple techniques to regularize and generalize OT to cope with various tasks including clustering, aligning, and interpolating distributional data. It also discusses the connections of the new formulation to other OT and clustering formulations to better understand their gaps and the means to close them. Finally, this thesis demonstrates the advantages of the proposed OT techniques in solving machine learning problems and their downstream applications in computer graphics, computer vision, and image processing.

Contributors: Mi, Liang (Author) / Wang, Yalin (Thesis advisor) / Chen, Kewei (Committee member) / Karam, Lina (Committee member) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2020

Methods for Detecting Mutations in Non-model Organisms

Description

Next-generation sequencing is a powerful tool for detecting genetic variation. However, it is also error-prone, with error rates that are much larger than mutation rates. This can make mutation detection difficult; and while increasing sequencing depth can often help, sequence-specific errors and other non-random biases cannot be detected by increased depth. The problem of accurate genotyping is exacerbated when there is not a reference genome or other auxiliary information available.
I explore several methods for sensitively detecting mutations in non-model organisms using an example Eucalyptus melliodora individual. I use the structure of the tree to find bounds on its somatic mutation rate and evaluate several algorithms for variant calling. I find that conventional methods are suitable if the genome of a close relative can be adapted to the study organism. However, with structured data, a likelihood framework that is aware of this structure is more accurate. I use the techniques developed here to evaluate a reference-free variant calling algorithm.
I also use this data to evaluate a k-mer based base quality score recalibrator (KBBQ), a tool I developed to recalibrate base quality scores attached to sequencing data. Base quality scores can help detect errors in sequencing reads, but are often inaccurate. The most popular method for correcting this issue requires a known set of variant sites, which is unavailable in most cases. I simulate data and show that errors in this set of variant sites can cause calibration errors. I then show that KBBQ accurately recalibrates base quality scores while requiring no reference or other information and performs as well as other methods.
Finally, I use the Eucalyptus data to investigate the impact of quality score calibration on the quality of output variant calls and show that improved base quality score calibration increases the sensitivity and reduces the false positive rate of a variant calling algorithm.
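
For readers unfamiliar with base quality scores, the standard Phred convention relates a quality score Q to an error probability p by Q = -10 log10(p). The Python sketch below illustrates only that conversion and the common FASTQ encoding (ASCII offset 33 is assumed); it is not KBBQ's recalibration algorithm, and the function names are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality score implied by an error probability: Q = -10 log10(p)."""
    return -10 * math.log10(p)


def decode_fastq_qualities(qual_string: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: the Phred+33 encoding, where '!' represents Q0.
    """
    return [ord(ch) - offset for ch in qual_string]


if __name__ == "__main__":
    for q in decode_fastq_qualities("!5I"):
        print(q, phred_to_error_prob(q))
```

A score is well calibrated when the observed error rate among bases reported at quality Q is close to the probability this conversion gives, which is the property a recalibrator aims to restore.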

Contributors: Orr, Adam James (Author) / Cartwright, Reed (Thesis advisor) / Wilson, Melissa (Committee member) / Kusumi, Kenro (Committee member) / Taylor, Jesse (Committee member) / Pfeifer, Susanne (Committee member) / Arizona State University (Publisher)

Created: 2020