HBS1L-MYB loci involvement in Fetal Hemoglobin Expression

Description

This project studies two single nucleotide polymorphisms (SNPs) within the HBS1L-MYB loci. Both SNPs are associated with heightened expression of fetal hemoglobin. DNA samples from NCAA athletes who have sickle cell trait were genotyped to find the allele frequency of each SNP. Across all populations in the Human Genome Project data provided on Ensembl, the minor A allele has a frequency of 22% and the major G allele has a frequency of 78%. The minor allele frequency in the population data was 15% higher than the frequency obtained from the sampled data. This means that the samples, which are heterozygous for sickle cell, display a lower frequency of the mutation than the global population.
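
As a minimal sketch of how an allele frequency is derived from genotype calls, the Python snippet below counts alleles across diploid genotypes. The genotype counts are hypothetical and chosen only for illustration; they are not data from the study, and reading the 15% gap as percentage points is an assumption.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return the frequency of each allele across diploid genotype calls."""
    # Each genotype such as "AG" contributes two alleles to the count.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}

# Hypothetical genotype calls for one SNP (illustrative, not study data).
sample_genotypes = ["GG"] * 43 + ["AG"] * 7
sample_freqs = allele_frequencies(sample_genotypes)

# Population minor allele frequency quoted in the abstract.
population_minor_freq = 0.22
gap = population_minor_freq - sample_freqs["A"]
print(f"sample A frequency: {sample_freqs['A']:.2f}, gap: {gap:.2f}")
```

Counting alleles rather than individuals matters here: a heterozygous sample contributes one copy of each allele, so 50 diploid samples yield 100 allele observations.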

Contributors: Ciambella, Michelle Lynn (Author) / Stone, Anne (Thesis director) / Foy, Joseph (Committee member) / Madrigal, Lorena (Committee member) / Barrett, The Honors College (Contributor) / School of Life Sciences (Contributor)

Created: 2014-05

The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents

Description

In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although there exist different solutions to extract text embedded in PDF files or run OCR on images, they typically require additional training (for example, scholars have to learn how to use the command line) or are difficult to automate without programming skills. The Giles Ecosystem is a distributed system based on Apache Kafka that allows users to upload documents for text and image extraction. The system components are implemented using Java and the Spring Framework and are available under an Open Source license on GitHub (https://github.com/diging/).

Contributors: Lessios-Damerow, Julia (Contributor) / Peirson, Erick (Contributor) / Laubichler, Manfred (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2017-09-28

Novel DNA Extraction Methods for Mollusks and the History and Significance of Bermuda Land Snails

Description

Bermuda Land Snails make up a genus called Poecilozonites that is endemic to Bermuda and is extensively present in its fossil record. These snails were also integral to the creation of the theory of punctuated equilibrium. The DNA of mollusks is difficult to sequence because of a class of compounds called mucopolysaccharides that are present in high concentrations in mollusk tissue and are not removed by standard DNA extraction methods. They inhibit Polymerase Chain Reactions (PCRs) and interfere with Next Generation Sequencing methods. This paper discusses DNA extraction methods designed to remove these inhibitors, which were tested on another gastropod species (Pomacea canaliculata). This species was chosen because it is invasive and, while it is not a pulmonate, it is similar enough to Bermuda Land Snails to reliably test extraction methods. The methods tested included two commercially available kits, the Qiagen Blood and Tissue Kit and the Omega Biotek Mollusc Extraction Kit, and one Hexadecyltrimethylammonium Bromide (CTAB) extraction method that was modified for use on mollusk tissue. The Blood and Tissue kit produced some DNA, the mollusk kit produced almost none, and the CTAB extraction method produced the highest concentrations on average and may prove to be the most viable option for future extractions. PCRs attempted with the extracted DNA have all failed, though this is likely due to an issue with reagents. Further spectrographic analysis of the DNA from the test extractions has shown that they were successful at removing mucopolysaccharides. When the protocol is optimized, it will be used to extract DNA from the tissue of six individuals from each of the two extant species of Bermuda Land Snails. This DNA will be used in several experiments involving Next Generation Sequencing, with the goal of assembling a variety of genome data. These data will then be used to construct a reference genome for Bermuda Land Snails.
The genomes generated by this project will be used in population genetic analyses between individuals of the same species and between individuals of different species. These analyses will then be used to aid in conservation efforts for the species.

Contributors: Clark, Patrick Louis (Author) / Stone, Anne (Thesis director) / Winingear, Stevie (Committee member) / School of Life Sciences (Contributor) / School of Human Evolution & Social Change (Contributor) / Barrett, The Honors College (Contributor)

Created: 2021-05

Image-level and group-level models for Drosophila gene expression pattern annotation

Description

Background
Drosophila melanogaster has been established as a model organism for investigating developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will seriously impede the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
Results
We present a computational framework to perform anatomical keyword annotation for Drosophila gene expression images. A spatial sparse coding approach is used to represent local image patches and is compared with the well-known bag-of-words (BoW) method. Three pooling functions (max pooling, average pooling, and Sqrt pooling, the square root of mean squared statistics) are employed to transform the sparse codes into image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, undersampling is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
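
The three pooling functions named above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: it assumes each image is represented as a list of per-patch sparse code vectors, and the function name `pool` and the toy codes are hypothetical. Max pooling here takes the raw maximum; some sparse coding pipelines pool absolute values instead.

```python
import math

def pool(codes, method):
    """Collapse per-patch sparse codes (one row per patch) into one feature vector."""
    # Transpose so that each column holds one dictionary atom's values across patches.
    columns = list(zip(*codes))
    if method == "max":
        return [max(col) for col in columns]
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    if method == "sqrt":
        # Square root of the mean of squared values.
        return [math.sqrt(sum(v * v for v in col) / len(col)) for col in columns]
    raise ValueError(f"unknown pooling method: {method}")

# Toy sparse codes for two patches over a two-atom dictionary (hypothetical values).
patch_codes = [[0.0, 3.0], [4.0, 0.0]]
for name in ("max", "average", "sqrt"):
    print(name, pool(patch_codes, name))
```

Each method returns a vector whose length equals the dictionary size, regardless of how many patches the image contains, which is what makes the pooled output usable as a fixed-length image feature.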
Conclusion
In our experiment, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding with the image-level scheme leads to consistent performance improvement in keyword annotation.
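
Undersampling with majority vote can be illustrated as follows. This is a sketch under stated assumptions rather than the paper's method: the toy one-dimensional nearest-mean classifier, the function names, and the data are all hypothetical stand-ins for whatever classifier and features are actually used.

```python
import random
from statistics import mean

def train_nearest_mean(pos, neg):
    """Toy 1-D classifier: predict the class whose training mean is closer."""
    pos_mean, neg_mean = mean(pos), mean(neg)
    return lambda x: 1 if abs(x - pos_mean) < abs(x - neg_mean) else 0

def train_undersampled_ensemble(pos, neg, n_models=5, seed=0):
    """Train each model on all minority samples plus an equal-sized
    random draw from the majority class."""
    rng = random.Random(seed)
    pos_is_minority = len(pos) <= len(neg)
    minority, majority = (pos, neg) if pos_is_minority else (neg, pos)
    models = []
    for _ in range(n_models):
        drawn = rng.sample(majority, len(minority))
        if pos_is_minority:
            models.append(train_nearest_mean(pos, drawn))
        else:
            models.append(train_nearest_mean(drawn, neg))
    return models

def majority_vote(models, x):
    """Return the label predicted by more than half of the models."""
    votes = sum(model(x) for model in models)
    return 1 if votes * 2 > len(models) else 0

# Hypothetical imbalanced data: 3 positives versus 30 negatives.
positives = [0.9, 1.0, 1.1]
negatives = [i / 100 for i in range(30)]
ensemble = train_undersampled_ensemble(positives, negatives)
print(majority_vote(ensemble, 0.95), majority_vote(ensemble, 0.05))
```

Using an odd number of models avoids tied votes, and drawing a fresh majority subset for each model lets the ensemble see more of the majority class than any single balanced training set would.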

Contributors: Sun, Qian (Author) / Muckatira, Sherin (Author) / Yuan, Lei (Author) / Ji, Shuiwang (Author) / Newfeld, Stuart (Author) / Kumar, Sudhir (Author) / Ye, Jieping (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2013-12-03

GRASP [Genomic Resource Access for Stoichioproteomics]: comparative explorations of the atomic content of 12 Drosophila proteomes

Description

Background
“Stoichioproteomics” relates the elemental composition of proteins and proteomes to variation in the physiological and ecological environment. To help harness and explore the wealth of hypotheses made possible under this framework, we introduce GRASP (http://www.graspdb.net), a public bioinformatic knowledgebase containing information on the frequencies of 20 amino acids and atomic composition of their side chains. GRASP integrates comparative protein composition data with annotation data from multiple public databases. Currently, GRASP includes information on proteins of 12 sequenced Drosophila (fruit fly) proteomes, which will be expanded to include increasingly diverse organisms over time. In this paper we illustrate the potential of GRASP for testing stoichioproteomic hypotheses by conducting an exploratory investigation into the composition of 12 Drosophila proteomes, testing the prediction that protein atomic content is associated with species ecology and with protein expression levels.
Results
Elements varied predictably along multivariate axes. Species were broadly similar, with the D. willistoni proteome a clear outlier. As expected, individual protein atomic content within proteomes was influenced by protein function and amino acid biochemistry. Evolution in elemental composition across the phylogeny followed less predictable patterns, but was associated with broad ecological variation in diet. Using expression data available for D. melanogaster, we found evidence consistent with selection for efficient usage of elements within the proteome: as expected, nitrogen content was reduced in highly expressed proteins in most tissues, most strongly in the gut, where nutrients are assimilated, and least strongly in the germline.
Conclusions
The patterns identified here using GRASP provide a foundation on which to base future research into the evolution of atomic composition in Drosophila and other taxa.
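
To make the idea of side-chain atomic content concrete, the sketch below counts nitrogen atoms in amino acid side chains for a protein sequence. It is not GRASP's interface or scoring scheme; the function name and the example sequence are hypothetical. The side-chain nitrogen counts themselves are standard biochemistry (arginine 3, histidine 2, and lysine, asparagine, glutamine, and tryptophan 1 each).

```python
# Nitrogen atoms per amino acid side chain (one-letter codes).
# Residues not listed have no side-chain nitrogen.
SIDE_CHAIN_NITROGEN = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}

def nitrogen_per_residue(sequence):
    """Average number of side-chain nitrogen atoms per residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = sum(SIDE_CHAIN_NITROGEN.get(aa, 0) for aa in sequence.upper())
    return total / len(sequence)

# Hypothetical peptide: M(0) + K(1) + R(3) + H(2) + W(1) = 7 nitrogens over 5 residues.
print(nitrogen_per_residue("MKRHW"))
```

Normalizing by sequence length allows proteins of different sizes to be compared, which is the kind of comparison needed to ask whether highly expressed proteins carry less nitrogen.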

Contributors: Gilbert, James D. J. (Author) / Acquisti, Claudia (Author) / Martinson, Holly M. (Author) / Elser, James (Author) / Kumar, Sudhir (Author) / Fagan, William F. (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor)

Created: 2013-09-04

A composite genome approach to identify phylogenetically informative data from next-generation sequencing

Description

Background
Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation.
Results
In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.
Conclusions
SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.
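
The notion of a variable site can be illustrated with a short sketch. This is not SISRS's algorithm, which works from sequencing reads without a reference genome; it only shows what it means for a position to vary across species once homologous bases are lined up. The function name and the toy sequences are hypothetical, and treating `N` and `-` as missing data is an assumption.

```python
MISSING = {"N", "-"}

def variable_sites(alignment):
    """Return 0-based positions where species differ, ignoring missing data."""
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    sites = []
    for i in range(length):
        # Collect the distinct called bases at this position.
        bases = {seq[i].upper() for seq in sequences} - MISSING
        if len(bases) > 1:
            sites.append(i)
    return sites

# Toy aligned sequences (hypothetical, not real genomic data).
toy_alignment = {
    "human": "ACGTA",
    "chimp": "ACGTA",
    "gorilla": "ACTTA",
    "orangutan": "ANTTC",
}
print(variable_sites(toy_alignment))
```

Position 1 is not reported even though one species has `N` there, because missing data alone does not make a site variable.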

Contributors: Schwartz, Rachel (Author) / Harkins, Kelly (Author) / Stone, Anne (Author) / Cartwright, Reed (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / School of Life Sciences (Contributor)

Created: 2015-06-11

Evolutionary Diagnosis of Non-Synonymous Variants Involved in Differential Drug Response

Description

Background:
Many pharmaceutical drugs are known to be ineffective or have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. Therefore, it is desirable to evaluate nsSNVs of interest in their ability to modulate the drug response.

Results:
We found that the available data on the link between drug response and nsSNV is rather modest. There were only 31 distinct drug response-altering (DR-altering) and 43 distinct drug response-neutral (DR-neutral) nsSNVs in the whole Pharmacogenomics Knowledge Base (PharmGKB). However, even with this modest dataset, it was clear that existing bioinformatics tools have difficulties in correctly predicting the known DR-altering and DR-neutral nsSNVs. They exhibited an overall accuracy of less than 50%, which was not better than random diagnosis. We found that the underlying problem is the markedly different evolutionary properties between positions harboring nsSNVs linked to drug responses and those observed for inherited diseases. To solve this problem, we developed a new diagnosis method, Drug-EvoD, which was trained on the evolutionary properties of nsSNVs associated with drug responses in a sparse learning framework. Drug-EvoD achieves a TPR of 84% and a TNR of 53%, with a balanced accuracy of 69%, which improves upon other methods significantly.

Conclusions:
The new tool will enable researchers to computationally identify nsSNVs that may affect drug responses. However, much larger training and testing datasets are needed to develop more reliable and accurate tools.
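
The balanced accuracy quoted in the results follows directly from the reported true positive rate (TPR) and true negative rate (TNR). The sketch below shows the arithmetic; the function name is hypothetical and the code is not part of Drug-EvoD.

```python
def balanced_accuracy(tpr, tnr):
    """Balanced accuracy is the mean of the true positive and true negative rates."""
    for rate in (tpr, tnr):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must lie between 0 and 1")
    return (tpr + tnr) / 2

# Rates reported in the abstract: TPR of 84% and TNR of 53%.
score = balanced_accuracy(0.84, 0.53)
print(f"{score:.3f}")
```

The result is about 0.685, consistent with the reported 69% after rounding. Averaging the two rates, rather than pooling all predictions, keeps the larger DR-neutral class from dominating the score.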

Contributors: Gerek, Nevin Z. (Author) / Liu, Li (Author) / Gerold, Kristyn (Author) / Biparva, Pegah (Author) / Thomas, Eric D. (Author) / Kumar, Sudhir (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor)

Created: 2015-01-15

Natural and Anthropogenic Hybridization in Two Species of Eastern Brazilian Marmosets (Callithrix jacchus and C. penicillata)

Description

Animal hybridization is well documented, but evolutionary outcomes and conservation priorities often differ for natural and anthropogenic hybrids. Among primates, an order with many endangered species, the two contexts can be hard to disentangle from one another, which carries important conservation implications. Callithrix marmosets give us a unique glimpse of genetic hybridization effects under distinct natural and human-induced contexts. Here, we use a panel of 44 autosomal microsatellite markers to examine genome-wide admixture levels and introgression at a natural C. jacchus and C. penicillata species border along the Sao Francisco River in NE Brazil and in an area of Rio de Janeiro state where humans introduced these species as exotics. Additionally, we describe for the first time autosomal genetic diversity in wild C. penicillata and expand previous C. jacchus genetic data. We characterize admixture within the natural zone as bimodal, with hybrid ancestry biased toward one parental species or the other. We also show evidence that Sao Francisco River islands are gateways for bidirectional gene flow across the species border. In the anthropogenic zone, marmosets essentially form a hybrid swarm with intermediate levels of admixture, likely because of the absence of strong physical barriers to interspecific breeding. Our data show that while hybridization can occur naturally, the presence of physical, even if leaky, barriers to hybridization is important for maintaining species genetic integrity. Thus, we suggest further study of hybridization under different contexts to set well-informed conservation guidelines for hybrid populations that often fit somewhere between "natural" and "man-made."

Contributors: Malukiewicz, Joanna (Author) / Boere, Vanner (Author) / Fuzessy, Lisieux F. (Author) / Grativol, Adriana D. (Author) / de Oliveira e Silva, Ita (Author) / Pereira, Luiz C. M. (Author) / Ruiz-Miranda, Carlos R. (Author) / Valenca, Yuri M. (Author) / Stone, Anne (Author) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor)

Created: 2015-06-10

Validation of qPCR Methods for the Detection of Mycobacterium in New World Animal Reservoirs

Description

Zoonotic pathogens that cause leprosy (Mycobacterium leprae) and tuberculosis (Mycobacterium tuberculosis complex, MTBC) continue to impact modern human populations. Therefore, methods able to survey mycobacterial infection in potential animal hosts are necessary for proper evaluation of human exposure threats. Here we tested for mycobacterial-specific single- and multi-copy loci using qPCR. In a trial study in which armadillos were artificially infected with M. leprae, these techniques were both specific and sensitive for pathogen detection, while more traditional ELISAs were only specific. These assays were then employed in a case study to detect M. leprae as well as MTBC in wild marmosets. All marmosets were negative for M. leprae DNA, but 14 were positive in the mycobacterial rpoB gene assay. Targeted capture and sequencing of rpoB and other MTBC genes validated the presence of mycobacterial DNA in these samples and revealed that qPCR is useful for identifying mycobacterial-infected animal hosts.

Contributors: Housman, Genevieve (Author) / Malukiewicz, Joanna (Author) / Boere, Vanner (Author) / Grativol, Adriana D. (Author) / Pereira, Luiz Cezar M. (Author) / de Oliveira e Silva, Ita (Author) / Ruiz-Miranda, Carlos R. (Author) / Truman, Richard (Author) / Stone, Anne (Author) / College of Liberal Arts and Sciences (Contributor) / School of Human Evolution and Social Change (Contributor) / Biodesign Institute (Contributor) / Personalized Diagnostics (Contributor)

Created: 2015-11-16

The Evolutionary History of Amino Acid Variations Mediating Increased Resistance of S. aureus Identifies Reversion Mutations in Metabolic Regulators

Description

The evolution of resistance in Staphylococcus aureus occurs rapidly, and in response to all known antimicrobial treatments. Numerous studies of model species describe compensatory roles of mutations in mediating competitive fitness, and there is growing evidence that these mutation types also drive adaptation of S. aureus strains. However, few studies have tracked amino acid changes during the complete evolutionary trajectory of antibiotic adaptation or been able to predict their functional relevance. Here, we have assessed the efficacy of computational methods to predict biological resistance of a collection of clinically known Resistance Associated Mutations (RAMs). We have found that >90% of known RAMs are incorrectly predicted to be functionally neutral by at least one of the prediction methods used. By tracing the evolutionary histories of all of the false negative RAMs, we have discovered that a significant number are reversion mutations to ancestral alleles also carried in the MSSA476 methicillin-sensitive isolate. These genetic reversions are most prevalent in strains following daptomycin treatment and show a tendency to accumulate in biological pathway reactions that are distinct from those accumulating non-reversion mutations. Our studies therefore show that in addition to non-reversion mutations, reversion mutations arise in isolates exposed to new antibiotic treatments. It is possible that acquisition of reversion mutations in the genome may prevent substantial fitness costs during the progression of resistance. Our findings pose an interesting question to be addressed by further clinical studies regarding whether or not these reversion mutations lead to a renewed vulnerability of a vancomycin or daptomycin resistant strain to antibiotics administered at an earlier stage of infection.

Contributors: Champion, Mia (Author) / Gray, Vanessa (Author) / Eberhard, Carl (Author) / Kumar, Sudhir (Author) / Biodesign Institute (Contributor) / Center for Evolution and Medicine (Contributor)

Created: 2013-02-12