Matching Items (4)

An Analysis of the Benchmark Test lzbench for Open-Source Compressors

Description

With the rising data output and falling costs of Next Generation Sequencing technologies, research into data compression is crucial to keeping storage efficient and affordable. High-throughput sequencers such as the HiSeqX Ten can produce up to 1.8 terabases of data per run, and such large storage demands are even more important to consider for institutions that rely on their own servers rather than large data centers (cloud storage) [1]. Compression algorithms aim to reduce the space taken up by large genomic datasets by encoding the most frequently occurring symbols with the shortest codewords and by reordering the data to make it easier to encode. Depending on the probability distribution of the symbols in the dataset or the structure of the data, choosing the wrong algorithm could produce a compressed file larger than the original, or a poorly compressed file that wastes time and space [2]. To test the efficiency of compression algorithms for each file type, 37 open-source compression algorithms were used to compress six types of genomic datasets (FASTA, VCF, BCF, GFF, GTF, and SAM) and were evaluated on compression speed, decompression speed, compression ratio, and file size using the benchmark test lzbench. Compressors that outperformed the popular bioinformatics compressor Gzip (zlib -6) were evaluated against one another by ratio and speed for each file type and across the geometric means of all file types. Compressors that exhibited fast compression and decompression speeds were also evaluated by transmission time through variable-speed internet pipes in scenarios where the file was compressed only once or compressed multiple times.
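The metrics named in this abstract can be illustrated with a minimal sketch. It is not the lzbench harness or the thesis's procedure: it uses Python's standard `zlib` module at level 6 as a stand-in for the Gzip baseline, and the helper names `benchmark_zlib` and `total_time`, the ratio definition (original size divided by compressed size), and the transfer-time model are all assumptions made for illustration.

```python
import time
import zlib


def benchmark_zlib(data: bytes, level: int = 6) -> dict:
    """Time one compress/decompress round trip and report size metrics."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_s = time.perf_counter() - start

    if restored != data:
        raise ValueError("round trip did not reproduce the input")
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        # Assumed definition: values above 1 mean the file got smaller.
        "ratio": len(data) / len(compressed),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


def total_time(result: dict, bandwidth_bytes_per_s: float,
               n_transfers: int = 1, compress_once: bool = True) -> float:
    """Assumed cost model: compression + transmission + decompression.

    If compress_once is True the compression cost is paid a single time;
    otherwise it is paid again for every transfer.
    """
    per_transfer = (result["compressed_bytes"] / bandwidth_bytes_per_s
                    + result["decompress_s"])
    compressions = 1 if compress_once else n_transfers
    return compressions * result["compress_s"] + n_transfers * per_transfer


if __name__ == "__main__":
    sample = b"ACGT" * 10_000  # toy, highly repetitive sequence data
    stats = benchmark_zlib(sample)
    print(stats["ratio"], total_time(stats, bandwidth_bytes_per_s=1_000_000))
```

Under this model a slow but strong compressor can still win when a file is compressed once and sent many times over a slow link, while a fast compressor is favored when every transfer requires a fresh compression.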

Date Created
2017-05

Beginning to investigate Lactase Persistence in Turkana

Description

Lactase persistence is the ability of adults to digest lactose in milk (Segurel & Bon, 2017). Mammals are generally distinguished by their mammary glands, which give females the ability to produce milk and feed their newborn children. The newborn therefore requires the ability to break down the lactose in the milk to ensure its proper digestion (Segurel & Bon, 2017). Generally, humans lose the expression of lactase after weaning, which prevents them from breaking down lactose from dairy (Flatz, 1987).
My research is focused on the people of Turkana, a human pastoral population inhabiting Northwest Kenya. The people of Turkana are a Nilotic people native to the Turkana district. There are currently no conclusive studies on genetic evidence for lactase persistence in the Turkana. Therefore, my research addresses the evolution of lactase persistence in the people of Turkana. The goal of this project is to investigate the evolutionary history of two genes with known involvement in lactase persistence, LCT and MCM6, in the Turkana. Variants in these genes have previously been shown to confer the ability to digest lactose after weaning. Furthermore, an additional study found that a population closely related to the Turkana, the Maasai, showed stronger signals of recent selection for lactase persistence in these genes than Europeans did. My goal is to characterize known variants associated with lactase persistence by calculating their allele frequencies in the Turkana and to conduct selection scans to determine whether LCT/MCM6 show signatures of positive selection. To do this, we conducted a pilot study consisting of 10 female Turkana individuals and 10 females from each of four populations from the 1000 Genomes Project, namely: the Yoruba in Ibadan, Nigeria (YRI); the Luhya in Webuye, Kenya (LWK); Utah Residents with Northern and Western European Ancestry (CEU); and the Southern Han Chinese (CHS). The allele frequency calculation suggested that the CEU population had a higher frequency of lactase persistence-associated alleles than all the other populations analyzed here, including the Turkana population. Our Tajima’s D calculations and analysis suggested that both the Turkana population and the four haplotype map populations show signatures of positive selection in the same region.
The iHS selection scans we conducted on all five populations to detect signatures of positive selection showed that the Southern Han Chinese (CHS), the Luhya in Webuye, Kenya (LWK), and the Yoruba in Ibadan, Nigeria (YRI) populations had stronger signatures of positive selection than the Turkana population. The LWK and YRI populations showed the strongest signatures of positive selection in this region. This project serves as a first step in the investigation of lactase persistence in the Turkana population and its evolution over time.
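The allele frequency and Tajima's D calculations mentioned in this abstract can be sketched with the textbook estimators. This is a minimal illustration, not the thesis's pipeline: it assumes biallelic sites coded as "0" (ancestral) and "1" (derived) in equal-length haplotype strings, and the function names and toy data are hypothetical.

```python
import math


def derived_counts(haplotypes: list[str]) -> list[int]:
    """Count derived ('1') alleles at each site across all haplotypes."""
    n_sites = len(haplotypes[0])
    return [sum(h[i] == "1" for h in haplotypes) for i in range(n_sites)]


def allele_frequencies(haplotypes: list[str]) -> list[float]:
    """Derived allele frequency at each site."""
    n = len(haplotypes)
    return [k / n for k in derived_counts(haplotypes)]


def tajimas_d(haplotypes: list[str]) -> float:
    """Tajima's D (Tajima 1989) from 0/1 haplotype strings.

    Returns NaN when there are no segregating sites.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = derived_counts(haplotypes)
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return float("nan")

    # Mean number of pairwise differences (pi), summed over sites.
    n_pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in seg) / n_pairs

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    toy = ["1000", "0100", "0010", "0001"]  # every variant is a singleton
    print(allele_frequencies(toy), tajimas_d(toy))
```

Negative values of D indicate an excess of rare variants relative to the neutral expectation, a pattern often read as consistent with a recent sweep or population expansion; positive values indicate an excess of intermediate-frequency variants.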

Date Created
2019-05

Identifying Variation Within Substitution Rates in Mammary Gland Development Genes within Primate Genomes

Description

Mammary gland development in humans during puberty involves the enlargement of breast tissue, but this is not true in non-human primates. To identify potential causes of this difference, I examined variation in substitution rates across genes related to mammary development. Genes undergoing purifying selection show slower-than-average substitution rates, while genes undergoing positive selection show faster rates. Such rate differences may be related to the difference between humans and other primates. Three genes were found to be accelerated: FOXF1, IGFBP5, and ATP2B2. Only the last of these was found to be accelerated in humans, and it seems unlikely to be related to the differences in mammary gland development at puberty between humans and non-human primates.

Date Created
2016-05

Genetic diversity across the pseudoautosomal boundary varies across human populations

Description

Unlike the autosomes, recombination on the sex chromosomes is limited to the pseudoautosomal regions (PARs) at each end of the chromosome. PAR1 spans approximately 2.7 Mb from the tip of the proximal arm of each sex chromosome, and a pseudoautosomal boundary between the PAR1 and non-PAR region is thought to have evolved from a Y-specific inversion that suppressed recombination across the boundary. In addition to the two PARs, there is also a human-specific X-transposed region (XTR) that was duplicated from the X to the Y chromosome. Genetic diversity is expected to be higher in recombining than in non-recombining regions, particularly because recombination reduces the effects of linked selection, allowing neutral variation to accumulate. We previously showed that diversity decreases linearly across the previously defined pseudoautosomal boundary (rather than dropping suddenly at the boundary), suggesting that the pseudoautosomal boundary may not be as strict as previously thought. In this study, we analyzed data from 1271 genetic females to explore the extent to which the pseudoautosomal boundary varies among human populations (broadly, African, European, South Asian, East Asian, and the Americas). We found that, in all populations, genetic diversity was significantly higher in the PAR1 and XTR than in the non-PAR regions, and that diversity decreased linearly from the PAR1, reaching a non-PAR value only well past the pseudoautosomal boundary. However, we also found that the location at which diversity changes from reflecting the higher PAR1 diversity to the lower non-PAR diversity varied by as much as 500 kb among populations.
The lack of genetic evidence for a strict pseudoautosomal boundary and the variability in patterns of diversity across the pseudoautosomal boundary are consistent with two potential explanations: (1) the boundary itself may vary across populations, or (2) population-specific demographic histories have shaped diversity across the pseudoautosomal boundary.
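The idea of diversity declining linearly across the boundary can be illustrated with a small sketch that averages per-site diversity in non-overlapping windows and fits a straight line to the window means. The abstract does not describe the study's actual method, so the window scheme, the function names, and the toy numbers below are assumptions for illustration only.

```python
def windowed_mean(positions, values, window_size):
    """Average per-site values in non-overlapping windows.

    Returns (window_start, mean_value) pairs sorted by window start.
    """
    sums, counts = {}, {}
    for pos, val in zip(positions, values):
        start = (pos // window_size) * window_size
        sums[start] = sums.get(start, 0.0) + val
        counts[start] = counts.get(start, 0) + 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]


def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical per-site diversity values at increasing coordinates.
    positions = [0, 50, 100, 150, 200, 250]
    diversity = [0.6, 0.4, 0.4, 0.2, 0.2, 0.0]
    windows = windowed_mean(positions, diversity, window_size=100)
    starts = [w[0] for w in windows]
    means = [w[1] for w in windows]
    print(windows, least_squares_line(starts, means))
```

A negative fitted slope that persists past the annotated boundary would match the gradual decline described above, whereas a strict boundary would look more like a step between two flat levels.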

Date Created
2016-12