The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents

Description

In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although there exist different solutions to extract text embedded in PDF files or run OCR on images, they typically require additional training (for example, scholars have to learn how to use the command line) or are difficult to automate without programming skills. The Giles Ecosystem is a distributed system based on Apache Kafka that allows users to upload documents for text and image extraction. The system components are implemented using Java and the Spring Framework and are available under an Open Source license on GitHub (https://github.com/diging/).

Contributors: Lessios-Damerow, Julia (Contributor) / Peirson, Erick (Contributor) / Laubichler, Manfred (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2017-09-28

The Immunosignature of Canine Lymphoma: Characterization and Diagnostic Application

Description

Background: Cancer diagnosis in both dogs and humans is complicated by the lack of a non-invasive diagnostic test. To meet this clinical need, we apply the recently developed immunosignature assay to spontaneous canine lymphoma as a clinical proof of concept. Here we evaluate the immunosignature as a diagnostic for spontaneous canine lymphoma both at initial diagnosis and during the disease-free interval following treatment.

Methods: Sera from dogs with confirmed lymphoma (B cell n = 38, T cell n = 11) and clinically normal dogs (n = 39) were analyzed. Serum antibody responses were characterized by analyzing the binding pattern, or immunosignature, of serum antibodies on a non-natural sequence peptide microarray. Peptides were selected and tested for their ability to distinguish healthy dogs from those with lymphoma and to distinguish lymphoma subtypes based on immunophenotype. The immunosignatures of dogs with lymphoma were evaluated for individual signatures. Changes in the immunosignatures were evaluated following treatment and eventual relapse.

Results: Although lymphoma is a clonal disease, both an individual immunosignature and a generalized lymphoma immunosignature were observed in each dog. The general lymphoma immunosignature identified in the initial set of dogs (n = 32) was able to predict disease status in an independent set of dogs (n = 42, 97% accuracy). A separate immunosignature was able to distinguish lymphoma subtypes by immunophenotype (n = 25, 88% accuracy). The individual immunosignature was capable of confirming remission three months following diagnosis. The immunosignature at diagnosis was able to predict which dogs with B cell lymphoma would relapse in less than 120 days (n = 33, 97% accuracy).

Conclusion: We conclude that the immunosignature can serve as a multilevel diagnostic for canine, and potentially human, lymphoma.

Contributors: Johnston, Stephen (Author) / Thamm, Douglas H. (Author) / Legutki, Joseph Barten (Author) / Biodesign Institute (Contributor)

Created: 2014-09-08

Challenges and Opportunities in Coding the Commons: Problems, Procedures, and Potential Solutions in Large-N Comparative Case Studies

Description

Ongoing efforts to understand the dynamics of coupled social-ecological (or, more broadly, coupled infrastructure) systems and common-pool resources have led to the generation of numerous datasets based on a large number of case studies. These data have facilitated the identification of important factors and fundamental principles that increase our understanding of such complex systems. However, the data at our disposal are often not easily comparable, have limited scope and scale, and are based on disparate underlying frameworks, which inhibits synthesis, meta-analysis, and the validation of findings. Research efforts are further hampered when case inclusion criteria, variable definitions, coding schemas, and inter-coder reliability testing are not made explicit in the presentation of research and shared with the research community. This paper first outlines challenges experienced by researchers engaged in a large-scale coding project, then highlights valuable lessons learned, and finally discusses opportunities for further research on comparative case study analysis focusing on social-ecological systems and common-pool resources. Includes supplemental materials and appendices published in the International Journal of the Commons 2016 Special Issue, Volume 10, Issue 2 (2016).
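
The abstract asks that inter-coder reliability testing be reported explicitly but does not say which statistic the project used. One common choice for two coders assigning categorical codes to the same cases is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each coder's code frequencies. The sketch below assumes exactly two coders and uses hypothetical yes/no codes; it is an illustration, not the procedure used in the paper.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same cases, in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of cases")
    n = len(coder_a)
    # Observed agreement: share of cases where both coders chose the same code.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both coders used one identical code for every case; kappa is undefined,
        # and this sketch reports it as perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical codes for six cases (e.g. "is a monitoring rule present?").
coder_a = ["yes", "yes", "no", "no", "yes", "no"]
coder_b = ["yes", "no", "no", "no", "yes", "no"]
print(round(cohens_kappa(coder_a, coder_b), 3))
```

Here the coders agree on five of six cases (p_o = 5/6) and chance agreement is 0.5, giving kappa = 2/3. Projects with more than two coders typically use a multi-rater statistic instead, such as Fleiss' kappa or Krippendorff's alpha.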

Contributors: Ratajczyk, Elicia (Author) / Brady, Ute (Author) / Baggio, Jacopo (Author) / Barnett, Allain J. (Author) / Perez Ibarra, Irene (Author) / Rollins, Nathan (Author) / Rubinos, Cathy (Author) / Shin, Hoon Cheol (Author) / Yu, David (Author) / Aggarwal, Rimjhim (Author) / Anderies, John (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2016-09-09

The Tragedy of the Unexamined Cat: Why K-12 and University Education Are Still in the Dark Ages and How Citizen Science Allows for a Renaissance

Description

At the end of the dark ages, anatomy was taught as though everything that could be known was known. Scholars learned about what had been discovered rather than how to make discoveries. This was true even though the body (and the rest of biology) was very poorly understood. The renaissance eventually brought a revolution in how scholars (and graduate students) were trained and worked. This revolution never occurred in K-12 or university education, such that we now teach young students in much the way that scholars were taught in the dark ages: we teach them what is already known rather than the process of knowing. Citizen science offers a way to change K-12 and university education and, in doing so, complete the renaissance. Here we offer an example of such an approach and call for change in the way students are taught science, a change that is more possible than it has ever been and is, nonetheless, five hundred years delayed.

Contributors: Dunn, Robert R. (Author) / Urban, Julie (Author) / Cavalier, Darlene (Author) / Cooper, Caren B. (Author) / Consortium for Science, Policy and Outcomes (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2016-03-01

A Microbial Survey of the International Space Station (ISS)

Description

Background: Modern advances in sequencing technology have enabled the census of microbial members of many natural ecosystems. Increasingly, attention is being paid to the microbial residents of human-made, built ecosystems, both private (homes) and public (subways, office buildings, and hospitals). Here, we report results of the characterization of the microbial ecology of a singular built environment, the International Space Station (ISS). This sampling involved the collection and microbial analysis (via 16S rRNA gene PCR) of swabs from 15 surfaces onboard the ISS, and was a component of Project MERCCURI (Microbial Ecology Research Combining Citizen and University Researchers on ISS). Learning more about the microbial inhabitants of the “buildings” in which we travel through space will become increasingly important as plans for human exploration continue, with the possibility of colonizing other planets and moons.

Results: Sterile swabs were used to sample 15 surfaces onboard the ISS. The sites sampled were designed to be analogous to samples collected for (1) the Wildlife of Our Homes project and (2) a study of cell phones and shoes that were concurrently being collected for another component of Project MERCCURI. Sequencing of the 16S rRNA genes amplified from DNA extracted from each swab was used to produce a census of the microbes present on each surface sampled. We compared the microbes found on the ISS swabs to those from both homes on Earth and data from the Human Microbiome Project.

Conclusions: While significantly different from both the homes on Earth and the Human Microbiome Project samples analyzed here, the microbial community composition on the ISS was more similar to home surfaces than to the human microbiome samples. The ISS surfaces are OTU-rich, with 1,036–4,294 operational taxonomic units (OTUs) per sample. There was no discernible biogeography of microbes on the 15 ISS surfaces, although this may reflect the small sample size we were able to obtain.

Contributors: Lang, Jenna M. (Author) / Coil, David A. (Author) / Neches, Russell Y. (Author) / Brown, Wendy E. (Author) / Cavalier, Darlene (Author) / Severance, Mark (Author) / Hampton-Marcell, Jarrad T. (Author) / Gilbert, Jack A. (Author) / Eisen, Jonathan A. (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2017-12-05

Peptide Sequencing Directly on Solid Surfaces Using MALDI Mass Spectrometry

Description

There is an increasing variety of applications in which peptides are both synthesized and used while attached to solid surfaces. This has created a need for high-throughput sequence analysis directly on surfaces. However, common sequencing approaches that can be adapted to surface-bound peptides lack the throughput often needed in library-based applications. Here we describe a simple approach for sequence analysis directly on solid surfaces that is both high-speed and high-throughput, using equipment available in most protein analysis facilities. In this approach, surface-bound peptides, selectively labeled at their N-termini with a positive charge-bearing group, are subjected to controlled degradation in ammonia gas, resulting in a set of fragments differing by a single amino acid that remain spatially confined on the surface to which they were bound. These fragments can then be analyzed by MALDI mass spectrometry, and the peptide sequences read directly from the resulting spectra.
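
Reading a sequence from such a fragment ladder comes down to matching the mass gap between neighbouring peaks to an amino acid residue mass. The sketch below is a simplified illustration under stated assumptions, not the published analysis: it assumes an idealized, already-picked list of singly charged fragment peaks, standard monoisotopic residue masses, and a hypothetical matching tolerance of 0.02 Da. Peak picking, calibration, and the absolute mass offsets introduced by the label and surface chemistry are not modeled, and `read_ladder` is an illustrative name.

```python
# Standard monoisotopic residue masses in daltons. Leu and Ile have identical
# masses and cannot be told apart this way, so they share one entry.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(fragment_masses, tolerance=0.02):
    """Infer one residue per gap between consecutive peaks of a mass ladder.

    Residues are returned in order of increasing fragment mass; gaps that match
    no residue within `tolerance` (e.g. a missing peak) are reported as "?".
    """
    peaks = sorted(fragment_masses)
    residues = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        gap = heavier - lighter
        # Closest residue mass to the observed gap.
        name, mass = min(RESIDUE_MASSES.items(), key=lambda item: abs(item[1] - gap))
        residues.append(name if abs(mass - gap) <= tolerance else "?")
    return residues


# Hypothetical ladder: a 500.0 Da starting fragment extended by G, A, F, K, Q.
ladder = [500.0, 557.02146, 628.05857, 775.12698, 903.22194, 1031.28052]
print(read_ladder(ladder))
```

The tolerance matters for near-isobaric pairs: Gln and Lys differ by only about 0.036 Da, so a looser tolerance or a poorly calibrated spectrum would make them ambiguous.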

Contributors: Zhao, Zhan-Gong (Author) / Cordovez, Lalaine Anne (Author) / Johnston, Stephen (Author) / Woodbury, Neal (Author) / Biodesign Institute (Contributor)

Created: 2017-12-19

A Simple Platform for the Rapid Development of Antimicrobials

Description

Recent infectious outbreaks highlight the need for platform technologies that can be quickly deployed to develop the therapeutics needed to contain an outbreak. We present a simple concept for the rapid development of new antimicrobials. The goal was to produce thousands of doses of an intervention for a new pathogen in as little as one week. We tested the feasibility of a system based on antimicrobial synbodies. The system involves creating an array of 100 peptides that have been selected for broad capability to bind and/or kill viruses and bacteria. The peptides are pre-screened for low cell toxicity prior to large-scale synthesis. Any pathogen is then assayed on the chip to find peptides that bind or kill it. Peptides are combined in pairs as synbodies and further screened for activity and toxicity. The lead synbody can be quickly produced at large scale, with the entire process completed in one week.

Contributors: Johnston, Stephen (Author) / Domenyuk, Valeriy (Author) / Gupta, Nidhi (Author) / Tavares Batista, Milene (Author) / Lainson, John (Author) / Zhao, Zhan-Gong (Author) / Lusk, Joel (Author) / Loskutov, Andrey (Author) / Cichacz, Zbigniew (Author) / Stafford, Phillip (Author) / Legutki, Joseph Barten (Author) / Diehnelt, Chris (Author) / Biodesign Institute (Contributor)

Created: 2017-12-14

The Collective Direction of Attention Diffusion

Description

We find that the flow of attention on the Web forms a directed, tree-like structure implying the time-sensitive browsing behavior of users. Using data from a news-sharing website, we construct clickstream networks in which nodes are news stories and edges represent consecutive clicks between two stories. To identify the flow direction of clickstreams, we define the “flow distance” of nodes (L_i), which measures the average number of steps a random walker takes to reach the i-th node. We observe that L_i is related to the clicks (C_i) to news stories and the age (T_i) of stories. Putting these three variables together helps us understand the rise and decay of news stories from a network perspective. We also find that the studied clickstream networks preserve a stable structure over time, leading to a scaling relation between users and clicks. This universal scaling behavior is confirmed in data from 1,000 Web forums. We suggest that the tree-like, stable structure of clickstream networks reveals the time-sensitive preference of users in online browsing. To test this hypothesis, we discuss three models of individual browsing behavior and compare the simulation results with empirical data.
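
The flow distance L_i can be made concrete with a small simulation. The sketch below assumes a clickstream network given as click counts on directed edges, a hypothetical entry node named `source` and exit node named `sink`, and walkers that choose each next story with probability proportional to the click count of the outgoing edge. It estimates L_i by Monte Carlo as the mean step count at a walker's first arrival at node i, averaged over the walks that reach it. The paper may compute L_i analytically or handle revisits differently, so treat this as an illustration of the idea rather than the authors' procedure.

```python
import random
from collections import defaultdict


def flow_distances(edges, source="source", sink="sink",
                   walks=20000, max_steps=1000, seed=0):
    """Monte Carlo estimate of the flow distance L_i of every node reached.

    edges maps (from_node, to_node) -> click count. A walker starts at
    `source`, follows outgoing edges with probability proportional to their
    click counts, and stops at `sink`, at a dead end, or after max_steps.
    """
    outgoing = defaultdict(list)
    for (u, v), clicks in edges.items():
        outgoing[u].append((v, clicks))

    rng = random.Random(seed)  # fixed seed keeps estimates reproducible
    step_sums = defaultdict(int)
    arrivals = defaultdict(int)
    for _ in range(walks):
        node, steps, seen = source, 0, {source, sink}
        while node != sink and outgoing[node] and steps < max_steps:
            targets, weights = zip(*outgoing[node])
            node = rng.choices(targets, weights=weights)[0]
            steps += 1
            if node not in seen:  # record only the first arrival at each node
                seen.add(node)
                step_sums[node] += steps
                arrivals[node] += 1
    # Mean steps to first arrival, over the walks that reached each node.
    return {node: step_sums[node] / arrivals[node] for node in arrivals}


# Hypothetical network: half the users click story "a" first, and everyone
# eventually reads story "b", either directly or via "a".
clicks = {("source", "a"): 1, ("source", "b"): 1, ("a", "b"): 1, ("b", "sink"): 1}
print(flow_distances(clicks))
```

In this toy network, story "a" is always reached in one step, while story "b" is reached in one or two steps with equal probability, so its flow distance is about 1.5. Stories with a larger L_i therefore sit further "downstream" in the flow of attention.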

Contributors: Wang, Cheng-Jun (Author) / Wu, Lingfei (Author) / Zhang, Jiang (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2016-09-28

Control of Finite Critical Behaviour in a Small-Scale Social System

Description

Many adaptive systems sit near a tipping or critical point. For systems near a critical point, small changes to component behaviour can induce large-scale changes in aggregate structure and function. Criticality can be adaptive when the environment is changing, but it entails reduced robustness through sensitivity. This tradeoff can be resolved when criticality can be tuned. We address the control of finite measures of criticality using data on fight sizes from an animal society model system (Macaca nemestrina, n = 48). We find that a heterogeneous, socially organized system, like homogeneous, spatial systems (flocks and schools), sits near a critical point; that the contributions individuals make to collective phenomena can be quantified; that there is heterogeneity in these contributions; and that distance from the critical point (DFC) can be controlled through biologically plausible mechanisms exploiting heterogeneity. We propose two alternative hypotheses for why a system decreases the distance from the critical point.

Contributors: Daniels, Bryan (Author) / Krakauer, David (Author) / Flack, Jessica (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2017-02-10

Comparative Study of Classification Algorithms for Immunosignaturing Data

Description

Background: High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and other conditions. Typically, one trains a classification system by gathering large amounts of probe-level data, selecting informative features, and classifying test samples using a small number of features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms, and many biological assumptions are built into classifiers designed for these data. One of the more problematic is the assumption of independence, both at the probe level and at the biological level. At the probe level, probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways, where co-regulation of transcriptional units may make many genes appear completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable for technologies with different binding characteristics. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random-sequence peptides. It relies on many-to-many binding: each peptide can bind multiple antibodies, and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear which classification algorithm is optimal for analyzing this new type of data.

Results: We characterized several classification algorithms to analyze immunosignaturing data. We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Using a wide variety of assessment criteria, we found ‘Naïve Bayes’ far more useful than other widely used methods due to its simplicity, robustness, speed and accuracy.

Conclusions: The ‘Naïve Bayes’ algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data due to its fundamental mathematical properties.
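
The abstract reports that Naïve Bayes performed best of the 17 algorithms but does not state which variant, preprocessing, or software was used. To show how such a classifier scores a sample, here is a minimal Gaussian Naive Bayes written from scratch. It assumes each feature is one peptide's (already preprocessed) binding intensity and that labels are class names such as "healthy" or "disease". The class, its methods, and the data are hypothetical; this is not the authors' code.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Gaussian Naive Bayes: features are assumed independent given the class."""

    def __init__(self, var_floor=1e-9):
        # Lower bound on per-feature variance so constant features do not divide by zero.
        self.var_floor = var_floor
        self.params = {}

    def fit(self, samples, labels):
        rows_by_class = defaultdict(list)
        for row, label in zip(samples, labels):
            rows_by_class[label].append(row)
        total = len(samples)
        for label, rows in rows_by_class.items():
            columns = list(zip(*rows))  # one tuple of values per feature (peptide)
            means = [sum(col) / len(col) for col in columns]
            variances = [max(sum((v - m) ** 2 for v in col) / len(col), self.var_floor)
                         for col, m in zip(columns, means)]
            self.params[label] = (math.log(len(rows) / total), means, variances)
        return self

    def log_score(self, row, label):
        """Unnormalized log posterior: log prior plus per-feature Gaussian log-likelihoods."""
        log_prior, means, variances = self.params[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances))

    def predict(self, row):
        # The class with the highest score wins; normalization is unnecessary for argmax.
        return max(self.params, key=lambda label: self.log_score(row, label))


# Hypothetical intensities for two peptides in six training sera.
train = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.2]]
labels = ["healthy"] * 3 + ["disease"] * 3
clf = GaussianNaiveBayes().fit(train, labels)
print(clf.predict([1.1, 2.0]), clf.predict([5.1, 6.1]))
```

The per-feature sum inside `log_score` is where the independence assumption enters. Because training only requires one mean and one variance per peptide and class, the method stays fast even with many thousands of peptides, which is consistent with the speed and simplicity the authors cite.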

Contributors: Kukreja, Muskan (Author) / Johnston, Stephen (Author) / Stafford, Phillip (Author) / Biodesign Institute (Contributor)

Created: 2012-06-21

ASU Scholarship Showcase