The Giles Ecosystem – Storage, Text Extraction, and OCR of Documents

Description

In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although there exist different solutions to extract text embedded in PDF files or run OCR on images, they typically require additional training (for example, scholars have to learn how to use the command line) or are difficult to automate without programming skills. The Giles Ecosystem is a distributed system based on Apache Kafka that allows users to upload documents for text and image extraction. The system components are implemented using Java and the Spring Framework and are available under an Open Source license on GitHub (https://github.com/diging/).

Contributors: Lessios-Damerow, Julia (Contributor) / Peirson, Erick (Contributor) / Laubichler, Manfred (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2017-09-28

The Immunosignature of Canine Lymphoma: Characterization and Diagnostic Application

Description

Background: Cancer diagnosis in both dogs and humans is complicated by the lack of a non-invasive diagnostic test. To meet this clinical need, we apply the recently developed immunosignature assay to spontaneous canine lymphoma as clinical proof-of-concept. Here we evaluate the immunosignature as a diagnostic for spontaneous canine lymphoma both at initial diagnosis and in evaluating the disease-free interval following treatment.

Methods: Sera from dogs with confirmed lymphoma (B cell n = 38, T cell n = 11) and clinically normal dogs (n = 39) were analyzed. Serum antibody responses were characterized by analyzing the binding pattern, or immunosignature, of serum antibodies on a non-natural sequence peptide microarray. Peptides were selected and tested for the ability to distinguish healthy dogs from those with lymphoma and to distinguish lymphoma subtypes based on immunophenotype. The immunosignatures of dogs with lymphoma were evaluated for individual signatures. Changes in the immunosignatures were evaluated following treatment and eventual relapse.

Results: Despite being a clonal disease, both an individual immunosignature and a generalized lymphoma immunosignature were observed in each dog. The general lymphoma immunosignature identified in the initial set of dogs (n = 32) was able to predict disease status in an independent set of dogs (n = 42, 97% accuracy). A separate immunosignature was able to distinguish the lymphoma based on immunophenotype (n = 25, 88% accuracy). The individual immunosignature was capable of confirming remission three months following diagnosis. Immunosignature at diagnosis was able to predict which dogs with B cell lymphoma would relapse in less than 120 days (n = 33, 97% accuracy).

Conclusion: We conclude that the immunosignature can serve as a multilevel diagnostic for canine, and potentially human, lymphoma.

Contributors: Johnston, Stephen (Author) / Thamm, Douglas H. (Author) / Legutki, Joseph Barten (Author) / Biodesign Institute (Contributor)

Created: 2014-09-08

Leaning on the Ethical Crutch: A Critique of Codes of Ethics

Description

What's a profession without a code of ethics? Being a legitimate profession almost requires drafting a code and, at least nominally, making members follow it. Codes of ethics (henceforth “codes”) exist for a number of reasons, many of which can vary widely from profession to profession, but above all they are a form of codified self-regulation. While codes can be beneficial, this paper argues that when we scratch below the surface, there are many problems at their root. In terms of efficacy, codes can serve as a form of ethical window dressing, rather than effective rules for behavior. But even more than that, codes can degrade the meaning behind being a good person who acts ethically for the right reasons.

Contributors: Sadowski, Jathan (Author) / Consortium for Science, Policy and Outcomes (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2013-11-30

Early Effect in Time-Dependent, High-Dimensional Nonlinear Dynamical Systems With Multiple Resonances

Description

We investigate high-dimensional nonlinear dynamical systems exhibiting multiple resonances under adiabatic parameter variations. Our motivations come from experimental considerations where time-dependent sweeping of parameters is a practical approach to probing and characterizing the bifurcations of the system. The question is whether bifurcations so detected are faithful representations of the bifurcations intrinsic to the original stationary system. Utilizing a harmonically forced, closed fluid flow system that possesses multiple resonances and solving the Navier-Stokes equation under proper boundary conditions, we uncover the phenomenon of the early effect. Specifically, as a control parameter, e.g., the driving frequency, is adiabatically increased from an initial value, resonances emerge at frequency values that are lower than those in the corresponding stationary system. The phenomenon is established by numerical characterization of physical quantities through the resonances, which include the kinetic energy and the vorticity field, and a heuristic analysis based on the concept of instantaneous frequency. A simple formula is obtained which relates the resonance points in the time-dependent and time-independent systems. Our findings suggest that, in general, any true bifurcation of a nonlinear dynamical system can be unequivocally uncovered through adiabatic parameter sweeping, in spite of a shift in the bifurcation point, which is of value to experimental studies of nonlinear dynamical systems.

Contributors: Park, Youngyong (Author) / Do, Younghae (Author) / Altmeyer, Sebastian (Author) / Lai, Ying-Cheng (Author) / Lee, GyuWon (Author) / Ira A. Fulton Schools of Engineering (Contributor)

Created: 2015-02-09

The Role of Diverse Strategies in Sustainable Knowledge Production

Description

Online communities are becoming increasingly important as platforms for large-scale human cooperation. These communities allow users seeking and sharing professional skills to solve problems collaboratively. To investigate how users cooperate to complete a large number of knowledge-producing tasks, we analyze Stack Exchange, one of the largest question and answer systems in the world. We construct attention networks to model the growth of 110 communities in the Stack Exchange system and quantify individual answering strategies using the linking dynamics on attention networks. We identify two answering strategies. Strategy A aims at performing maintenance by doing simple tasks, whereas strategy B aims at investing time in doing challenging tasks. Both strategies are important: empirical evidence shows that strategy A decreases the median waiting time for answers and strategy B increases the acceptance rate of answers. In investigating the strategic persistence of users, we find that users tend to stick to the same strategy over time within a community, but switch from one strategy to the other across communities. This finding reveals the different sets of knowledge and skills between users. A balance between the populations of users taking strategies A and B that approximates 2:1 is found to be optimal for the sustainable growth of communities.

Contributors: Wu, Lingfei (Author) / Baggio, Jacopo (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2016-03-02

A General Method to Discover Epitopes From Sera

Description

Antigen-antibody complexes are central players in an effective immune response. However, finding those interactions relevant to a particular disease state can be arduous. Nonetheless, many paths to discovery have been explored, since deciphering these interactions can greatly facilitate the development of new diagnostics, therapeutics, and vaccines. In silico B cell epitope mapping approaches have been widely pursued, though success has not been consistent. Antibody mixtures in immune sera have been used as handles for biologically relevant antigens, but these and other experimental approaches have proven resource-intensive and time-consuming. In addition, these methods are often tailored to individual diseases or a specific proteome, rather than providing a universal platform. Most of these methods are not able to identify a specific antibody’s epitopes from unknown antigens, such as unannotated neoantigens in cancer. Alternatively, a peptide library composed of sequences unrestricted by naturally occurring protein space provides for a universal search for mimotopes of an antibody’s epitope. Here we present the utility of such a non-natural random sequence library of 10,000 peptides physically addressed on a microarray for mimotope discovery without sequence information of the specific antigen. The peptide arrays were probed with serum from an antigen-immunized rabbit, or alternatively probed with serum pre-absorbed with the same immunizing antigen. With this positive and negative screening scheme, we identified the library peptides as the mimotopes of the antigen. The unique library peptides were successfully used to isolate antigen-specific antibodies from complete immune serum. Sequence analysis of these peptides revealed the epitopes in the immunized antigen. We present this as an inexpensive, efficient method for identifying mimotopes of any antibody’s targets. These mimotopes should be useful in defining both components of the antigen-antibody complex.

Contributors: Whittemore, Kurt (Author) / Johnston, Stephen (Author) / Sykes, Kathryn (Author) / Shen, Luhui (Author) / Biodesign Institute (Contributor)

Created: 2016-06-14

Novel Immune-Modulator Identified by a Rapid, Functional Screen of the Parapoxvirus Ovis (Orf Virus) Genome

Description

Background: The success of new sequencing technologies and informatic methods for identifying genes has made establishing gene product function a critical rate-limiting step in advancing the molecular sciences. We present a method to functionally mine genomes for useful activities in vivo, using an unusual property of a member of the poxvirus family to demonstrate this screening approach.

Results: The genome of Parapoxvirus ovis (Orf virus) was sequenced, annotated, and then used to PCR-amplify its open-reading-frames. Employing a cloning-independent protocol, a viral expression-library was rapidly built and arrayed into sub-library pools. These were directly delivered into mice as expressible cassettes and assayed for an immune-modulating activity associated with parapoxvirus infection. The product of the B2L gene, a homolog of vaccinia F13L, was identified as the factor eliciting immune cell accumulation at sites of skin inoculation. Administration of purified B2 protein also elicited immune cell accumulation activity, and additionally was found to serve as an adjuvant for antigen-specific responses. Co-delivery of the B2L gene with an influenza gene-vaccine significantly improved protection in mice. Furthermore, delivery of the B2L expression construct, without antigen, non-specifically reduced tumor growth in murine models of cancer.

Conclusion: A streamlined, functional approach to genome-wide screening of a biological activity in vivo is presented. Its application to screening in mice for an immune activity elicited by the pathogen genome of Parapoxvirus ovis yielded a novel immunomodulator. In this inverted discovery method, it was possible to identify the adjuvant responsible for a function of interest prior to a mechanistic study of the adjuvant. The non-specific immune activity of this modulator, B2, is similar to that associated with administration of inactivated particles to a host or to a live viral infection. Administration of B2 may provide the opportunity to significantly impact host immunity while being itself only weakly recognized. The functional genomics method used to pinpoint B2 within an ORFeome may be more broadly applicable to screening for other biological activities in an animal.

Contributors: McGuire, Michael J. (Author) / Johnston, Stephen (Author) / Sykes, Kathryn (Author) / Biodesign Institute (Contributor)

Created: 2012-01-13

Statistical Methods for Analyzing Immunosignatures

Description

Background: Immunosignaturing is a new peptide microarray based technology for profiling of humoral immune responses. Despite new challenges, immunosignaturing gives us the opportunity to explore new and fundamentally different research questions. In addition to classifying samples based on disease status, the complex patterns and latent factors underlying immunosignatures, which we attempt to model, may have a diverse range of applications.

Methods: We investigate the utility of a number of statistical methods to determine model performance and address challenges inherent in analyzing immunosignatures. These methods include exploratory and confirmatory factor analyses, classical significance testing, structural equation modeling, and mixture modeling.

Results: We demonstrate an ability to classify samples based on disease status and show that immunosignaturing is a very promising technology for screening and presymptomatic screening of disease. In addition, we are able to model complex patterns and latent factors underlying immunosignatures. These latent factors may serve as biomarkers for disease and may play a key role in a bioinformatic method for antibody discovery.

Conclusion: Based on this research, we lay out an analytic framework illustrating how immunosignatures may be useful as a general method for screening and presymptomatic screening of disease as well as antibody discovery.

Contributors: Brown, Justin (Author) / Stafford, Phillip (Author) / Johnston, Stephen (Author) / Dinu, Valentin (Author) / College of Health Solutions (Contributor)

Created: 2011-08-19

Comparative Study of Classification Algorithms for Immunosignaturing Data

Description

Background: High-throughput technologies such as DNA, RNA, protein, antibody and peptide microarrays are often used to examine differences across drug treatments, diseases, transgenic animals, and others. Typically one trains a classification system by gathering large amounts of probe-level data and selecting informative features, and then classifies test samples using a small number of features. As new microarrays are invented, classification systems that worked well for other array types may not be ideal. Expression microarrays, arguably one of the most prevalent array types, have been used for years to help develop classification algorithms. Many biological assumptions are built into classifiers that were designed for these types of data. One of the more problematic is the assumption of independence, both at the probe level and again at the biological level. Probes for RNA transcripts are designed to bind single transcripts. At the biological level, many genes have dependencies across transcriptional pathways where co-regulation of transcriptional units may make many genes appear as being completely dependent. Thus, algorithms that perform well for gene expression data may not be suitable when other technologies with different binding characteristics exist. The immunosignaturing microarray is based on complex mixtures of antibodies binding to arrays of random sequence peptides. It relies on many-to-many binding of antibodies to the random sequence peptides. Each peptide can bind multiple antibodies and each antibody can bind multiple peptides. This technology has been shown to be highly reproducible and appears promising for diagnosing a variety of disease states. However, it is not clear what the optimal classification algorithm is for analyzing this new type of data.

Results: We characterized several classification algorithms to analyze immunosignaturing data. We selected several datasets that range from easy to difficult to classify, from simple monoclonal binding to complex binding patterns in asthma patients. We then classified the biological samples using 17 different classification algorithms. Using a wide variety of assessment criteria, we found ‘Naïve Bayes’ far more useful than other widely used methods due to its simplicity, robustness, speed and accuracy.

Conclusions: The ‘Naïve Bayes’ algorithm appears to accommodate the complex patterns hidden within multilayered immunosignaturing microarray data due to its fundamental mathematical properties.
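
For readers unfamiliar with the classifier named above, the following is a minimal sketch of a Gaussian Naïve Bayes classifier in plain Python. It is illustrative only: the abstract does not describe the study's implementation, feature selection, or which Naïve Bayes variant was used, so the Gaussian likelihood, the function names (`fit_gaussian_nb`, `predict`), and the toy peptide intensities are all assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a log prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(samples, labels):
        grouped[label].append(row)

    model = {}
    total = len(samples)
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(params):
        log_prior, means, variances = params
        # "Naive" step: features are treated as independent given the class,
        # so the per-feature log likelihoods are simply summed.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical toy intensities for three peptides (not data from the study).
train = [[1.0, 5.0, 2.0], [1.2, 4.8, 2.1], [3.0, 1.0, 4.0], [3.2, 1.1, 3.9]]
labels = ["healthy", "healthy", "disease", "disease"]
model = fit_gaussian_nb(train, labels)
print(predict(model, [1.1, 4.9, 2.0]))
```

The model stores only a prior, a mean, and a variance per feature and class, which is one plausible reading of the simplicity and speed the abstract attributes to the method.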

Contributors: Kukreja, Muskan (Author) / Johnston, Stephen (Author) / Stafford, Phillip (Author) / Biodesign Institute (Contributor)

Created: 2012-06-21

Challenges and Opportunities in Coding the Commons: Problems, Procedures, and Potential Solutions in Large-N Comparative Case Studies

Description

Ongoing efforts to understand the dynamics of coupled social-ecological (or more broadly, coupled infrastructure) systems and common pool resources have led to the generation of numerous datasets based on a large number of case studies. These data have facilitated the identification of important factors and fundamental principles which increase our understanding of such complex systems. However, the data at our disposal are often not easily comparable, have limited scope and scale, and are based on disparate underlying frameworks, inhibiting synthesis, meta-analysis, and the validation of findings. Research efforts are further hampered when case inclusion criteria, variable definitions, coding schema, and inter-coder reliability testing are not made explicit in the presentation of research and shared among the research community. This paper first outlines challenges experienced by researchers engaged in a large-scale coding project; then highlights valuable lessons learned; and finally discusses opportunities for further research on comparative case study analysis focusing on social-ecological systems and common pool resources. Includes supplemental materials and appendices published in the International Journal of the Commons 2016 Special Issue, Volume 10, Issue 2, 2016.

Contributors: Ratajczyk, Elicia (Author) / Brady, Ute (Author) / Baggio, Jacopo (Author) / Barnett, Allain J. (Author) / Perez Ibarra, Irene (Author) / Rollins, Nathan (Author) / Rubinos, Cathy (Author) / Shin, Hoon Cheol (Author) / Yu, David (Author) / Aggarwal, Rimjhim (Author) / Anderies, John (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)

Created: 2016-09-09

ASU Scholarship Showcase