Matching Items (3)

Filtering by

All Subjects: Bioinformatics
Creators: Cartwright, Reed
Member of: Barrett, The Honors College Thesis/Creative Project Collection

Identifying Variation Within Substitution Rates in Mammary Gland Development Genes within Primate Genomes

Description

Mammary gland development in humans during puberty involves the enlargement of breast tissue, but this is not true in non-human primates. To identify potential causes of this difference, I examined variation in substitution rates across genes related to mammary development. Genes undergoing purifying selection show slower-than-average substitution rates, while genes undergoing positive selection show faster rates. Such genes may be related to the difference between humans and other primates. Three genes were found to be accelerated: FOXF1, IGFBP5, and ATP2B2. However, only the last of these was found in humans, and it seems unlikely to be related to the differences in mammary gland development at puberty between humans and non-human primates.

Contributors: Arroyo, Diana (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Schwartz, Rachel (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

An Analysis of the Benchmark Test lzbench for Open-Source Compressors

Description

With the rising data output and falling costs of Next Generation Sequencing technologies, research into data compression is crucial to maintaining storage efficiency and controlling costs. High-throughput sequencers such as the HiSeqX Ten can produce up to 1.8 terabases of data per run, and such large storage demands are even more important to consider for institutions that rely on their own servers rather than large data centers (cloud storage) [1]. Compression algorithms aim to reduce the amount of space taken up by large genomic datasets by encoding the most frequently occurring symbols with the shortest bit codewords and by reordering the data to make it easier to encode. Depending on the probability distribution of the symbols in the dataset or the structure of the data, choosing the wrong algorithm could result in a compressed file larger than the original, or in a poorly compressed file that wastes time and space [2]. To test efficiency among compression algorithms for each file type, 37 open-source compression algorithms were used to compress six types of genomic datasets (FASTA, VCF, BCF, GFF, GTF, and SAM) and were evaluated on compression speed, decompression speed, compression ratio, and file size using the benchmark test lzbench. Compressors that outperformed the popular bioinformatics compressor Gzip (zlib -6) were evaluated against one another by ratio and speed for each file type and across the geometric means of all file types. Compressors that exhibited fast compression and decompression speeds were also evaluated by transmission time through variable-speed internet pipes in scenarios where the file was compressed only once or compressed multiple times.
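
The measurements described above can be illustrated with a minimal Python sketch. It is not the thesis's lzbench workflow: it times a single zlib level 6 round trip using the standard library, defines the ratio as original size divided by compressed size (an assumption; benchmark tools may report it differently), and uses a hypothetical `transfer_time` cost model and a made-up FASTA-like sample to contrast the "compressed once" and "compressed every time" scenarios.

```python
import time
import zlib


def benchmark_zlib(data: bytes, level: int = 6) -> dict:
    """Time one compress/decompress round trip and report sizes and ratio."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_s = time.perf_counter() - start

    if restored != data:
        raise ValueError("round trip did not reproduce the input")

    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        # Assumed definition: original size / compressed size (higher is better).
        "ratio": len(data) / len(compressed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


def transfer_time(result: dict, bandwidth_bytes_per_s: float,
                  compress_each_time: bool = True) -> float:
    """Hypothetical cost model: seconds to deliver the file over a link.

    If the file was compressed once ahead of time, the compression cost
    is not paid again for this transfer.
    """
    send_s = result["compressed_bytes"] / bandwidth_bytes_per_s
    compress_s = result["compress_s"] if compress_each_time else 0.0
    return compress_s + send_s + result["decompress_s"]


# Toy, highly repetitive FASTA-like record (invented for illustration only).
sample = b">seq1\n" + b"ACGTTGCAACGT" * 5000 + b"\n"
stats = benchmark_zlib(sample)
print(f"ratio: {stats['ratio']:.1f}")
print(f"transfer at 1 MB/s: {transfer_time(stats, 1e6):.4f} s")
```

Real genomic files are far less repetitive than this sample, so the ratio printed here says nothing about the results reported in the thesis.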

Contributors: Howell, Abigail (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Taylor, Jay (Committee member) / Barrett, The Honors College (Contributor)

Created: 2017-05

A Curation of the Callithrix penicillata Draft Genome

Description

Callithrix penicillata, also known as the black-tufted marmoset, lives primarily in the Brazilian highlands and has had little research conducted on it. For this project I curated the newly assembled genome of this species. The scaffolds obtained from the Dovetail Genomics reads were organized and labeled into chromosomes using the 2014 Callithrix jacchus genome as a reference. Then, using that same genome as a reference, 13 of the chromosomes were reverse complemented to be continuous with the 2014 Callithrix jacchus genome. The N50 statistic of the assembly was calculated and found to be 124 Mb. Quality scores were computed for the final genome using Referee and visualized with a bar plot, with 99% of sites scoring above 0. Heterozygosity was also calculated and found to be 0.3%. Finally, the finished genome was visually compared to the 2017 Callithrix jacchus genome and the GRCh38 human genome. This genome was submitted to the NCBI database to await further approval.
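
Two of the steps above, reverse complementing a sequence and computing N50, can be sketched in a few lines of Python. This is an illustrative sketch under the standard definitions, not the pipeline used in the thesis; the function names and the toy scaffold lengths are hypothetical.

```python
# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def n50(lengths: list[int]) -> int:
    """Return the N50 of a set of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths given")


# Toy values, invented for illustration only.
print(reverse_complement("AACGTN"))       # NACGTT
print(n50([80, 70, 50, 40, 30, 20]))      # 70
```

In the toy example the total is 290 bases, so the two longest scaffolds (80 + 70 = 150) are the first to cover half of it, giving an N50 of 70.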

Contributors: Johnson, Joelle Genevieve (Author) / Cartwright, Reed (Thesis director) / Stone, Anne (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-12