Search Content

The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents

Description

In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although there exist different solutions to extract text embedded in PDF files or run OCR on images, they…

In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although there exist different solutions to extract text embedded in PDF files or run OCR on images, they typically require additional training (for example, scholars have to learn how to use the command line) or are difficult to automate without programming skills. The Giles Ecosystem is a distributed system based on Apache Kafka that allows users to upload documents for text and image extraction. The system components are implemented using Java and the Spring Framework and are available under an Open Source license on GitHub (https://github.com/diging/).

ContributorsLessios-Damerow, Julia (Contributor) / Peirson, Erick (Contributor) / Laubichler, Manfred (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created2017-09-28

A context-dependent alarm signal in the ant Temnothorax rugatulus

Description

Because collective cognition emerges from local signaling among group members, deciphering communication systems is crucial to understanding the underlying mechanisms. Alarm signals are widespread in the social insects and can elicit a variety of behavioral responses to danger, but the functional plasticity of these signals has not been well studied.…

Because collective cognition emerges from local signaling among group members, deciphering communication systems is crucial to understanding the underlying mechanisms. Alarm signals are widespread in the social insects and can elicit a variety of behavioral responses to danger, but the functional plasticity of these signals has not been well studied. Here we report an alarm pheromone in the ant Temnothorax rugatulus that elicits two different behaviors depending on context. When an ant was tethered inside an unfamiliar nest site and unable to move freely, she released a pheromone from her mandibular gland that signaled other ants to reject this nest as a potential new home, presumably to avoid potential danger. When the same pheromone was presented near the ants' home nest, they were instead attracted to it, presumably to respond to a threat to the colony. We used coupled gas chromatography/mass spectrometry to identify candidate compounds from the mandibular gland and tested each one in a nest choice bioassay. We found that 2,5-dimethylpyrazine was sufficient to induce rejection of a marked new nest and also to attract ants when released at the home nest. This is the first detailed investigation of chemical communication in the leptothoracine ants. We discuss the possibility that this pheromone's deterrent function can improve an emigrating colony's nest site selection performance.

ContributorsSasaki, Takao (Author) / Hoelldobler, Bert (Author) / Millar, Jocelyn G. (Author) / Pratt, Stephen (Author) / College of Liberal Arts and Sciences (Contributor) / School of Life Sciences (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor) / Center for Social Dynamics and Complexity (Contributor)

Created2014-09-01

Differential Gene Expression in Type II Diabetes

Description

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develo…

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develop a gene regulatory pathway, and 2) utilize this pathway to determine suitable drug therapeutics for prevention and treatment. Using a Gene Set Enrichment Analysis (GSEA), a set of 1000 gene identifiers from a Mayo Clinic database was analyzed to determine the most significant genetic variants related to insulin signaling pathways involved in Type II Diabetes. The following genes were identified: NRAS, KRAS, PIK3CA, PDE3B, TSC1, AKT3, SOS1, NEU1, PRKAA2, AMPK, and ACC. In an extensive literature review and cross-analysis with Kegg and Reactome pathway databases, novel SNPs located on these gene variants were identified and used to determine suitable drug therapeutics for treatment. Overall, understanding how genetic mutations affect target gene function related to Type II Diabetes disease pathology is crucial to the development of effective diagnosis and treatment. This project provides new insight into the molecular basis of the Type II Diabetes, serving to help untangle the regulatory complexity of the disease and aid in the advancement of diagnosis and treatment. Keywords: Type II Diabetes mellitus, Gene Set Enrichment Analysis, genetic variants, KEGG Insulin Pathway, gene-regulatory pathway

ContributorsBucklin, Lindsay (Co-author) / Davis, Vanessa (Co-author) / Holechek, Susan (Thesis director) / Wang, Junwen (Committee member) / Nyarige, Verah (Committee member) / School of Human Evolution & Social Change (Contributor) / School of Life Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Differential Gene Expression in Type II Diabetes

Description

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develo…

This research project investigated known and novel differential genetic variants and their associated molecular pathways involved in Type II diabetes mellitus for the purpose of improving diagnosis and treatment methods. The goal of this investigation was to 1) identify the genetic variants and SNPs in Type II diabetes to develop a gene regulatory pathway, and 2) utilize this pathway to determine suitable drug therapeutics for prevention and treatment. Using a Gene Set Enrichment Analysis (GSEA), a set of 1000 gene identifiers from a Mayo Clinic database was analyzed to determine the most significant genetic variants related to insulin signaling pathways involved in Type II Diabetes. The following genes were identified: NRAS, KRAS, PIK3CA, PDE3B, TSC1, AKT3, SOS1, NEU1, PRKAA2, AMPK, and ACC. In an extensive literature review and cross-analysis with Kegg and Reactome pathway databases, novel SNPs located on these gene variants were identified and used to determine suitable drug therapeutics for treatment. Overall, understanding how genetic mutations affect target gene function related to Type II Diabetes disease pathology is crucial to the development of effective diagnosis and treatment. This project provides new insight into the molecular basis of the Type II Diabetes, serving to help untangle the regulatory complexity of the disease and aid in the advancement of diagnosis and treatment.

ContributorsDavis, Vanessa Brooke (Co-author) / Bucklin, Lindsay (Co-author) / Holechek, Susan (Thesis director) / Wang, Junwen (Committee member) / School of Molecular Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Novel Bioinformatics Methods for Co-expression Analysis of Single Cell RNA Sequencing and Circular RNA Sequencing Time Series Data

Description

High throughput transcriptome data analysis like Single-cell Ribonucleic Acid sequencing (scRNA-seq) and Circular Ribonucleic Acid (circRNA) data have made significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is core in identifying time point(s) where drastic changes in gene transcription are associated with homeostatic to non-homeostatic cellular…

High throughput transcriptome data analysis like Single-cell Ribonucleic Acid sequencing (scRNA-seq) and Circular Ribonucleic Acid (circRNA) data have made significant breakthroughs, especially in cancer genomics. Analysis of transcriptome time series data is core in identifying time point(s) where drastic changes in gene transcription are associated with homeostatic to non-homeostatic cellular transition (tipping points). In Chapter 2 of this dissertation, I present a novel cell-type specific and co-expression-based tipping point detection method to identify target gene (TG) versus transcription factor (TF) pairs whose differential co-expression across time points drive biological changes in different cell types and the time point when these changes are observed. This method was applied to scRNA-seq data sets from a SARS-CoV-2 study (18 time points), a human cerebellum development study (9 time points), and a lung injury study (18 time points). Similarly, leveraging transcriptome data across treatment time points, I developed methodologies to identify treatment-induced and cell-type specific differentially co-expressed pairs (DCEPs). In part one of Chapter 3, I presented a pipeline that used a series of statistical tests to detect DCEPs. This method was applied to scRNA-seq data of patients with non-small cell lung cancer (NSCLC) sequenced across cancer treatment times. However, this pipeline does not account for correlations among multiple single cells from the same sample and correlations among multiple samples from the same patient. In Part 2 of Chapter 3, I presented a solution to this problem using a mixed-effect model. In Chapter 4, I present a summary of my work that focused on the cross-species analysis of circRNA transcriptome time series data. I compared circRNA profiles in neonatal pig and mouse hearts, identified orthologous circRNAs, and discussed regulation mechanisms of cardiomyocyte proliferation and myocardial regeneration conserved between mouse and pig at different time points.

ContributorsNyarige, Verah Mocheche (Author) / Liu, Li (Thesis advisor) / Wang, Junwen (Thesis advisor) / Dinu, Valentin (Committee member) / Arizona State University (Publisher)

Created2022

Mining Associations between MRI Morphometry Measurements and Beta-Amyloid/tau Burden

Description

Beta-Amyloid(Aβ) plaques and tau protein tangles in the brain are now widely recognized as the defining hallmarks of Alzheimer’s disease (AD), followed by structural atrophy detectable on brain magnetic resonance imaging (MRI) scans. However, current methods to detect Aβ/tau pathology are either invasive (lumbar puncture) or quite costly and not…

Beta-Amyloid(Aβ) plaques and tau protein tangles in the brain are now widely recognized as the defining hallmarks of Alzheimer’s disease (AD), followed by structural atrophy detectable on brain magnetic resonance imaging (MRI) scans. However, current methods to detect Aβ/tau pathology are either invasive (lumbar puncture) or quite costly and not widely available (positron emission tomography (PET)). And one of the particular neurodegenerative regions is the hippocampus to which the influence of Aβ/tau on has been one of the research projects focuses in the AD pathophysiological progress. In this dissertation, I proposed three novel machine learning and statistical models to examine subtle aspects of the hippocampal morphometry from MRI that are associated with Aβ /tau burden in the brain, measured using PET images. The first model is a novel unsupervised feature reduction model to generate a low-dimensional representation of hippocampal morphometry for each individual subject, which has superior performance in predicting Aβ/tau burden in the brain. The second one is an efficient federated group lasso model to identify the hippocampal subregions where atrophy is strongly associated with abnormal Aβ/Tau. The last one is a federated model for imaging genetics, which can identify genetic and transcriptomic influences on hippocampal morphometry. Finally, I stated the results of these three models that have been published or submitted to peer-reviewed conferences and journals.

ContributorsWu, Jianfeng (Author) / Wang, Yalin (Thesis advisor) / Li, Baoxin (Committee member) / Liang, Jianming (Committee member) / Wang, Junwen (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)

Created2022

Statistical Methods for Analysis of Genomic Data with Applications in Oncology

Description

This dissertation presents three novel algorithms with real-world applications to genomic oncology. While the methodologies presented here were all developed to overcome various challenges associated with the adoption of high throughput genomic data in clinical oncology, they can be used in other domains as well. First, a network informed feature…

This dissertation presents three novel algorithms with real-world applications to genomic oncology. While the methodologies presented here were all developed to overcome various challenges associated with the adoption of high throughput genomic data in clinical oncology, they can be used in other domains as well. First, a network informed feature ranking algorithm is presented, which shows a significant increase in ability to select true predictive features from simulated data sets when compared to other state of the art graphical feature ranking methods. The methodology also shows an increased ability to predict pathological complete response to preoperative chemotherapy from genomic sequencing data of breast cancer patients utilizing domain knowledge from protein-protein interaction networks. Second, an algorithm that overcomes population biases inherent in the use of a human reference genome developed primarily from European populations is presented to classify microsatellite instability (MSI) status from next-generation-sequencing (NGS) data. The methodology significantly increases the accuracy of MSI status prediction in African and African American ancestries. Finally, a single variable model is presented to capture the bimodality inherent in genomic data stemming from heterogeneous diseases. This model shows improvements over other parametric models in the measurements of receiver-operator characteristic (ROC) curves for bimodal data. The model is used to estimate ROC curves for heterogeneous biomarkers in a dataset containing breast cancer and cancer-free specimen.

ContributorsSaul, Michelle (Author) / Dinu, Valentin (Thesis advisor) / Liu, Li (Committee member) / Wang, Junwen (Committee member) / Arizona State University (Publisher)

Created2021

Leaning on the Ethical Crutch: A Critique of Codes of Ethics

Description

What's a profession without a code of ethics? Being a legitimate profession almost requires drafting a code and, at least nominally, making members follow it. Codes of ethics (henceforth “codes”) exist for a number of reasons, many of which can vary widely from profession to profession - but above all…

What's a profession without a code of ethics? Being a legitimate profession almost requires drafting a code and, at least nominally, making members follow it. Codes of ethics (henceforth “codes”) exist for a number of reasons, many of which can vary widely from profession to profession - but above all they are a form of codified self-regulation. While codes can be beneficial, it argues that when we scratch below the surface, there are many problems at their root. In terms of efficacy, codes can serve as a form of ethical window dressing, rather than effective rules for behavior. But even more that, codes can degrade the meaning behind being a good person who acts ethically for the right reasons.

ContributorsSadowski, Jathan (Author) / Consortium for Science, Policy and Outcomes (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created2013-11-30

The Role of Diverse Strategies in Sustainable Knowledge Production

Description

Online communities are becoming increasingly important as platforms for large-scale human cooperation. These communities allow users seeking and sharing professional skills to solve problems collaboratively. To investigate how users cooperate to complete a large number of knowledge-producing tasks, we analyze Stack Exchange, one of the largest question and answer systems…

Online communities are becoming increasingly important as platforms for large-scale human cooperation. These communities allow users seeking and sharing professional skills to solve problems collaboratively. To investigate how users cooperate to complete a large number of knowledge-producing tasks, we analyze Stack Exchange, one of the largest question and answer systems in the world. We construct attention networks to model the growth of 110 communities in the Stack Exchange system and quantify individual answering strategies using the linking dynamics on attention networks. We identify two answering strategies. Strategy A aims at performing maintenance by doing simple tasks, whereas strategy B aims at investing time in doing challenging tasks. Both strategies are important: empirical evidence shows that strategy A decreases the median waiting time for answers and strategy B increases the acceptance rate of answers. In investigating the strategic persistence of users, we find that users tends to stick on the same strategy over time in a community, but switch from one strategy to the other across communities. This finding reveals the different sets of knowledge and skills between users. A balance between the population of users taking A and B strategies that approximates 2:1, is found to be optimal to the sustainable growth of communities.

ContributorsWu, Lingfei (Author) / Baggio, Jacopo (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created2016-03-02

Morals, Materials, and Technoscience: The Energy Security Imaginary in the United States

Description

This article advances recent scholarship on energy security by arguing that the concept is best understood as a sociotechnical imaginary, a collective vision for a “good society” realized through technoscientific-oriented policies. Focusing on the 1952 Resources for Freedom report, the authors trace the genealogy of energy security, elucidating how it…

This article advances recent scholarship on energy security by arguing that the concept is best understood as a sociotechnical imaginary, a collective vision for a “good society” realized through technoscientific-oriented policies. Focusing on the 1952 Resources for Freedom report, the authors trace the genealogy of energy security, elucidating how it establishes a morality of efficiency that orients policy action under the guise of security toward the liberalizing of markets in resource states and a robust program of energy research and development in the United States. This evidence challenges the pervasive historical anchoring of the concept in the 1970s and illustrates the importance of the genealogical approach for the emerging literature on energy and sociotechnical imaginaries. Exploring the genealogy of energy security also unpacks key social, political, and economic undercurrents that disrupt the seeming universality of the language of energy, leading the authors to question whether energy security discourse is appropriate for guiding policy action during ongoing global energy transitions.

ContributorsTidwell, Abraham (Author) / Smith, Jessica M. (Author) / Consortium for Science, Policy and Outcomes (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created2015-09-01

Filtering by