An Analysis of the Benchmark Test lzbench for Open-Source Compressors

Description

With the rising data output and falling costs of Next Generation Sequencing technologies, research into data compression is crucial to maintaining storage efficiency and controlling costs. High-throughput sequencers such as the HiSeqX Ten can produce up to 1.8 terabases of data per run, and such large storage demands are even more important to consider for institutions that rely on their own servers rather than large data centers (cloud storage)1. Compression algorithms aim to reduce the amount of space taken up by large genomic datasets by encoding the most frequently occurring symbols with the shortest bit codewords and by changing the order of the data to make it easier to encode. Depending on the probability distribution of the symbols in the dataset or the structure of the data, choosing the wrong algorithm could result in a compressed file larger than the original or a poorly compressed file that wastes time and space2. To test efficiency among compression algorithms for each file type, 37 open-source compression algorithms were used to compress six types of genomic datasets (FASTA, VCF, BCF, GFF, GTF, and SAM) and evaluated on compression speed, decompression speed, compression ratio, and file size using the benchmark test lzbench. Compressors that outperformed the popular bioinformatics compressor Gzip (zlib -6) were evaluated against one another by ratio and speed for each file type and across the geometric means of all file types. Compressors that exhibited fast compression and decompression speeds were also evaluated by transmission time through variable-speed internet pipes in scenarios where the file was compressed only once or compressed multiple times.
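The metrics named in this abstract can be illustrated with a minimal sketch. It is not the thesis's method and does not use lzbench; it times a single round trip with Python's standard zlib module at level 6 as a stand-in for the Gzip baseline. The helper names (benchmark, transfer_time, geometric_mean), the definition of ratio as original size divided by compressed size, and the reading of "compressed only once" as a case where compression time is excluded from the transfer are all assumptions made for illustration.

```python
import math
import time
import zlib


def benchmark(data: bytes, level: int = 6) -> dict:
    """Time one zlib round trip and report sizes, ratio, and timings."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    decompress_s = time.perf_counter() - start

    if restored != data:
        raise ValueError("round trip changed the data")
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(packed),
        # Assumption: ratio = original / compressed, so larger is better.
        "ratio": len(data) / len(packed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


def transfer_time(result: dict, bandwidth_bytes_per_s: float,
                  compress_once: bool = False) -> float:
    """Seconds to deliver a file: (compress) + send + decompress.

    Assumption: when the file was compressed once ahead of time, the
    compression cost is not charged to this transfer.
    """
    send_s = result["compressed_bytes"] / bandwidth_bytes_per_s
    compress_s = 0.0 if compress_once else result["compress_s"]
    return compress_s + send_s + result["decompress_s"]


def geometric_mean(values) -> float:
    """Geometric mean of positive values, e.g. ratios across file types."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

For example, benchmark(b"ACGT" * 10000) returns the sizes and timings for a highly repetitive toy sequence, and transfer_time(result, 1e6) estimates delivery over a hypothetical link of one million bytes per second. Real genomic files and the 37 compressors in the study would behave differently.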

Contributors: Howell, Abigail (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Taylor, Jay (Committee member) / Barrett, The Honors College (Contributor)

Created: 2017-05

Identifying Variation Within Substitution Rates in Mammary Gland Development Genes within Primate Genomes

Description

Mammary gland development in humans during puberty involves the enlargement of breast tissue, but this is not true in non-human primates. To identify potential causes of this difference, I examined variation in substitution rates across genes related to mammary development. Genes undergoing purifying selection show slower-than-average substitution rates, while genes undergoing positive selection show faster rates. Such rate differences may be related to the difference between humans and other primates. Three genes were found to be accelerated: FOXF1, IGFBP5, and ATP2B2. Only the last of these was found in humans, however, and it seems unlikely to be related to the differences in mammary gland development at puberty between humans and non-human primates.

Contributors: Arroyo, Diana (Author) / Cartwright, Reed (Thesis director) / Wilson Sayres, Melissa (Committee member) / Schwartz, Rachel (Committee member) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Genetic Variants in GC, CYP2R1, and VDR Genes and Associations of Serum 25-Hydroxyvitamin D Concentrations in a Population of Hispanic and Non-Hispanic Adults Residing in San Diego County, California

Description

Vitamin D is a nutrient that is obtained through the diet and vitamin D supplementation and created from exposure to Ultraviolet B (UVB) radiation. While many factors determine the serum 25-hydroxyvitamin D (25(OH)D) concentration in the body, little is known about how genetic variation in vitamin D-related genes influences serum 25(OH)D concentrations resulting from daily vitamin D intake and exposure to direct sunlight. Previous studies show that common genetic variants rs10741657 (CYP2R1), rs4588 (GC), rs228678 (GC), and rs4516035 (VDR) act as moderators and alter the effect of outdoor time and vitamin D intake on serum 25(OH)D concentrations. The objective of this study is to analyze the associations between serum 25(OH)D concentrations resulting from outdoor time and vitamin D intake, and genetic risk scores (GRS) established from previous studies involving single nucleotide polymorphisms (SNPs) located on or near genes involved in vitamin D synthesis, transport, activation, and degradation in 102 Hispanic and Non-Hispanic adults in San Diego County, California. This study is a secondary analysis of data from the Community of Mine study. Global Positioning System (GPS) data collected by the Qstarz GPS device worn by each participant were used to measure outdoor time, a proxy measurement for sun exposure time. Vitamin D intake was assessed using two 24-hour dietary recalls. Blood samples were measured for serum 25(OH)D concentrations. DNA was provided to assess each participant for the various genetic variants. Adjusted analyses of the GRS and serum 25(OH)D concentrations showed that individuals with high GRS (3-4) had lower serum 25(OH)D concentrations than individuals with low GRS (0-2) for both Nissen GRS and Rivera-Paredez GRS.

Contributors: Anderson, Heather Ray (Author) / Sears, Dorothy (Thesis advisor) / Alexon, Christy (Committee member) / Dinu, Valentin (Committee member) / Jankowska, Marta (Committee member) / Arizona State University (Publisher)

Created: 2022

Investigation of DNA methylation in obesity and its underlying insulin resistance

Description

Obesity and its underlying insulin resistance are caused by environmental and genetic factors. DNA methylation provides a mechanism by which environmental factors can regulate transcriptional activity. The overall goal of the work herein was to (1) identify alterations in DNA methylation in human skeletal muscle with obesity and its underlying insulin resistance, (2) determine whether these changes in methylation can be altered through weight loss induced by bariatric surgery, and (3) identify DNA methylation biomarkers in whole blood that can be used as a surrogate for skeletal muscle.

Assessment of DNA methylation was performed on human skeletal muscle and blood using reduced representation bisulfite sequencing (RRBS) for high-throughput identification and pyrosequencing for site-specific confirmation. Sorbin and SH3 homology domain 3 (SORBS3) was identified in skeletal muscle to be increased in methylation (+5.0 to +24.4%) in the promoter and 5' untranslated region (UTR) in the obese participants (n=10) compared to lean (n=12), and this finding corresponded with a decrease in gene expression (fold change: -1.9, P=0.0001). Furthermore, SORBS3 was demonstrated in a separate cohort of morbidly obese participants (n=7) undergoing weight loss induced by surgery to decrease in methylation (-5.6 to -24.2%) and increase in gene expression (fold change: +1.7; P=0.05) post-surgery. Moreover, SORBS3 promoter methylation was demonstrated in vitro to inhibit transcriptional activity (P=0.000003). The methylation and transcriptional changes for SORBS3 were significantly (P≤0.05) correlated with obesity measures and fasting insulin levels. SORBS3 was not identified in the blood methylation analysis of lean (n=10) and obese (n=10) participants, suggesting that it is a muscle-specific marker. However, solute carrier family 19 member 1 (SLC19A1) was identified in blood and skeletal muscle to have decreased 5' UTR methylation in obese participants, and this was significantly (P≤0.05) predicted by insulin sensitivity.

These findings suggest SLC19A1 as a potential blood-based biomarker for obese, insulin-resistant states. The collective findings of SORBS3 DNA methylation and gene expression, together with the dynamic changes to SORBS3 in response to metabolic improvements and weight loss induced by surgery, present an exciting novel target in skeletal muscle for further understanding obesity and its underlying insulin resistance.

Contributors: Day, Samantha Elaine (Author) / Coletta, Dawn K. (Thesis advisor) / Katsanos, Christos (Committee member) / Mandarino, Lawrence J. (Committee member) / Shaibi, Gabriel Q. (Committee member) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2017

Fine Mapping Functional Noncoding Genetic Elements Via Machine Learning

Description

All biological processes, such as cell growth, cell differentiation, development, and aging, require a series of steps that are characterized by gene regulation. Studies have shown that gene regulation is the key to various traits and diseases. Various factors affect gene regulation, including genetic signals, epigenetic tracks, and genetic variants. Deciphering and cataloging these functional genetic elements in the non-coding regions of the genome is one of the biggest challenges in precision medicine and genetic research. This thesis presents two different approaches to identifying these elements: TreeMap and DeepCORE. The first approach involves identifying putative causal genetic variants in cis-eQTL, accounting for multisite effects and genetic linkage at a locus. TreeMap performs an organized search for individual and multiple causal variants using a tree-guided nested machine learning method. DeepCORE, on the other hand, explores novel deep learning techniques that model the relationship between genetic, epigenetic, and transcriptional patterns across tissues and cell lines and identify co-operative regulatory elements that affect gene regulation. These two methods are believed to be the link for genotype-phenotype association and a necessary step toward explaining various complex diseases and missing heritability.

Contributors: Chandrashekar, Pramod Bharadwaj (Author) / Liu, Li (Thesis advisor) / Runger, George C. (Committee member) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created: 2020

Pathways of Distinction Analysis of Liver Cancer Data: Genetic Differences Between Males and Females

Description

The Pathways of Distinction Analysis (PoDA) program calculates relationships between a given group of genes contained within a pathway, and a disease state. It was used here to investigate liver cancer, and to explore how genetic variability may contribute to the different rates of development of the disease in males and females. The goal of the study was to identify germline variation that differs by sex in hepatocellular carcinoma. Using the program, multiple pathways and genes were identified to have significant differences in their relationship to liver cancer in males and females. In animal studies, the genes which were identified using the PoDA analysis have been shown to impact liver cancer, often with different results for males and females. While these genes are often the focus in animal models, they are absent from current Genome Wide Association Studies (GWAS) catalogs for humans. By working to bridge the results of animal studies and human studies, the results help to identify the causes of liver cancer, and more specifically, the reason the disease affects males at much higher rates. The differences in pathways identified to be significant for the two sexes indicate the germline variance may play sex-specific roles in the development of hepatocellular carcinoma. Additionally, these results reinforce the capacity of the PoDA analysis to identify genes that may be missed by more traditional GWAS methods. This study lays the groundwork for further investigations into the identified genes and pathways, and how they behave differently within males and females.

Contributors: Olson, Erik Jon (Author) / Buetow, Kenneth (Thesis advisor) / Wilson, Melissa (Committee member) / Cartwright, Reed (Committee member) / Arizona State University (Publisher)

Created: 2021
