This growing collection consists of scholarly works authored by ASU-affiliated faculty, staff, and community members, and it contains many open access articles. ASU-affiliated authors are encouraged to Share Your Work in KEEP.


Description

In the digital humanities, there is a constant need to turn images and PDF files into plain text in order to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although different solutions exist to extract text embedded in PDF files or to run OCR on images, they typically require additional training (for example, scholars have to learn how to use the command line) or are difficult to automate without programming skills. The Giles Ecosystem is a distributed system based on Apache Kafka that allows users to upload documents for text and image extraction. The system components are implemented in Java using the Spring Framework and are available under an open-source license on GitHub (https://github.com/diging/).
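
As an illustration of the pattern described above (Giles itself is implemented in Java with the Spring Framework), here is a minimal Python sketch of a Kafka-based upload/extraction pipeline using the kafka-python client. The topic names, message schema, and extract_text helper are hypothetical, not part of the Giles Ecosystem.

```python
# Minimal sketch of a Kafka-backed document-extraction pipeline.
# Topic names and message schema are hypothetical.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

def extract_text(path: str) -> str:
    # Placeholder: a real worker would run OCR or a PDF text extractor here.
    return ""

def submit_document(doc_id: str, path: str) -> None:
    """Publish an uploaded document for downstream text extraction."""
    producer.send("documents.uploaded", {"id": doc_id, "path": path})
    producer.flush()

def run_extraction_worker() -> None:
    """Consume upload events, extract plain text, and re-publish it."""
    consumer = KafkaConsumer(
        "documents.uploaded",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    for message in consumer:
        doc = message.value
        producer.send("documents.extracted",
                      {"id": doc["id"], "text": extract_text(doc["path"])})
```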
ContributorsLessios-Damerow, Julia (Contributor) / Peirson, Erick (Contributor) / Laubichler, Manfred (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)
Created2017-09-28
Description


Five immunocompetent C57BL/6-cBrd/cBrd/Cr (albino C57BL/6) mice were injected with GL261-luc2 cells, a cell line sharing characteristics of human glioblastoma multiforme (GBM). The mice were imaged using magnetic resonance (MR) at five separate time points to characterize growth and development of the tumor. After 25 days, the final tumor volumes of the mice varied from 12 mm³ to 62 mm³, even though the mice were inoculated from the same tumor cell line under carefully controlled conditions. We generated hypotheses to explore the large variance in final tumor size and tested them with our simple reaction-diffusion model, using both a 3-dimensional (3D) finite difference method and a 2-dimensional (2D) level set method. The parameters obtained from a best-fit procedure, designed to yield simulated tumors as close as possible to the observed ones, vary by an order of magnitude among the three mice analyzed in detail. These differences may reflect morphological and biological variability in tumor growth, as well as errors in the mathematical model, perhaps from an oversimplification of the tumor dynamics or nonidentifiability of parameters. The best-fit parameters are consistent with other experimental in vitro and in vivo measurements. Additionally, we calculate the tumor wave speed, which matches measurements in rats and humans.
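
For readers unfamiliar with the model class, here is a minimal sketch of a Fisher–KPP-type reaction-diffusion simulation on a 2D grid with an explicit finite-difference scheme; all parameter values and grid settings are illustrative, not those fitted in the study.

```python
# Minimal Fisher-KPP reaction-diffusion sketch: du/dt = D*lap(u) + rho*u*(1 - u/K).
# Parameters are illustrative; periodic boundaries are used for brevity.
import numpy as np

D, rho, K = 0.05, 0.2, 1.0   # diffusion (mm^2/day), growth (1/day), capacity
dx, dt = 0.1, 0.01           # grid spacing (mm) and time step (days)
n = 100
u = np.zeros((n, n))
u[n // 2, n // 2] = 0.1      # small initial tumor seed at the center

def step(u):
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
           np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u) / dx**2
    return u + dt * (D * lap + rho * u * (1 - u / K))

for _ in range(int(25 / dt)):    # simulate 25 days of growth
    u = step(u)

# Fisher's classical estimate of the asymptotic traveling-wave speed:
wave_speed = 2 * np.sqrt(D * rho)
```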

ContributorsRutter, Erica (Author) / Stepien, Tracy (Author) / Anderies, Barrett (Author) / Plasencia, Jonathan (Author) / Woolf, Eric C. (Author) / Scheck, Adrienne C. (Author) / Turner, Gregory H. (Author) / Liu, Qingwei (Author) / Frakes, David (Author) / Kodibagkar, Vikram (Author) / Kuang, Yang (Author) / Preul, Mark C. (Author) / Kostelich, Eric (Author) / College of Liberal Arts and Sciences (Contributor)
Created2017-05-31
Description


Background:
Data assimilation refers to methods for updating the state vector (initial condition) of a complex spatiotemporal model (such as a numerical weather model) by combining new observations with one or more prior forecasts. We consider the potential feasibility of this approach for making short-term (60-day) forecasts of the growth and spread of a malignant brain cancer (glioblastoma multiforme) in individual patient cases, where the observations are synthetic magnetic resonance images of a hypothetical tumor.

Results:
We apply a modern state estimation algorithm (the Local Ensemble Transform Kalman Filter), previously developed for numerical weather prediction, to two different mathematical models of glioblastoma, taking into account likely errors in model parameters and measurement uncertainties in magnetic resonance imaging. The filter can accurately shadow the growth of a representative synthetic tumor for 360 days (six 60-day forecast/update cycles) in the presence of a moderate degree of systematic model error and measurement noise.

Conclusions:
The mathematical methodology described here may prove useful for other modeling efforts in biology and oncology. An accurate forecast system for glioblastoma could also be valuable in clinical settings for treatment planning and patient counseling.
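
As a rough illustration of the analysis step underlying ensemble filters of this kind, here is a minimal sketch of a basic stochastic ensemble Kalman filter update. It is not the LETKF used in the paper (which adds localization and a transform-based update), and all shapes and noise levels are assumptions.

```python
# Minimal stochastic ensemble Kalman filter analysis step (not the LETKF).
import numpy as np

def enkf_update(ensemble, H, y, obs_cov, rng):
    """ensemble: (n_state, n_members); H: (n_obs, n_state); y: (n_obs,)."""
    n_mem = ensemble.shape[1]
    X = ensemble - ensemble.mean(axis=1, keepdims=True)  # state anomalies
    Y = H @ X                                            # obs-space anomalies
    P_yy = Y @ Y.T / (n_mem - 1) + obs_cov               # innovation covariance
    P_xy = X @ Y.T / (n_mem - 1)                         # cross covariance
    K = P_xy @ np.linalg.inv(P_yy)                       # Kalman gain
    # Perturb observations so the analysis ensemble keeps the right spread.
    y_pert = y[:, None] + rng.multivariate_normal(
        np.zeros(len(y)), obs_cov, size=n_mem).T
    return ensemble + K @ (y_pert - H @ ensemble)
```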

ContributorsKostelich, Eric (Author) / Kuang, Yang (Author) / McDaniel, Joshua (Author) / Moore, Nina Z. (Author) / Martirosyan, Nikolay L. (Author) / Preul, Mark C. (Author) / College of Liberal Arts and Sciences (Contributor)
Created2011-12-21
Description


Ongoing efforts to understand the dynamics of coupled social-ecological (or, more broadly, coupled infrastructure) systems and common pool resources have led to the generation of numerous datasets based on a large number of case studies. These data have facilitated the identification of important factors and fundamental principles that increase our understanding of such complex systems. However, the data at our disposal are often not easily comparable, have limited scope and scale, and are based on disparate underlying frameworks, inhibiting synthesis, meta-analysis, and the validation of findings. Research efforts are further hampered when case inclusion criteria, variable definitions, coding schema, and inter-coder reliability testing are not made explicit in the presentation of research and shared among the research community. This paper first outlines challenges experienced by researchers engaged in a large-scale coding project, then highlights valuable lessons learned, and finally discusses opportunities for further research on comparative case study analysis focusing on social-ecological systems and common pool resources. Supplemental materials and appendices were published in the International Journal of the Commons, Volume 10, Issue 2 (2016 Special Issue).
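
As one concrete example of the inter-coder reliability testing mentioned above, this short sketch computes Cohen's kappa for two coders applying the same categorical coding schema; the example codes are hypothetical.

```python
# Cohen's kappa: chance-corrected agreement between two coders.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Two coders applying the same schema to ten hypothetical cases:
print(cohens_kappa(list("AABBCABCAA"), list("AABBCBBCAA")))  # ~0.84
```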

ContributorsRatajczyk, Elicia (Author) / Brady, Ute (Author) / Baggio, Jacopo (Author) / Barnett, Allain J. (Author) / Perez Ibarra, Irene (Author) / Rollins, Nathan (Author) / Rubinos, Cathy (Author) / Shin, Hoon Cheol (Author) / Yu, David (Author) / Aggarwal, Rimjhim (Author) / Anderies, John (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)
Created2016-09-09
Description


A relatively unexplored issue in cybersecurity science and engineering is whether there exist intrinsic patterns of cyberattacks. Conventional wisdom favors the absence of such patterns due to the overwhelming complexity of the modern cyberspace. Surprisingly, through a detailed analysis of an extensive data set that records the time-dependent frequencies of attacks over a relatively wide range of consecutive IP addresses, we successfully uncover intrinsic spatiotemporal patterns underlying cyberattacks, where the term “spatio” refers to the IP address space. In particular, we focus on analyzing macroscopic properties of the attack traffic flows and identify two main patterns with distinct spatiotemporal characteristics: deterministic and stochastic. Strikingly, a very small number of sets of major attackers commit almost all the attacks: their attack “fingerprints” and target-selection schemes can be unequivocally identified from a very limited number of unique spatiotemporal characteristics, each of which exists only on a consecutive IP region and differs significantly from the others. We utilize a number of quantitative measures, including the flux-fluctuation law, the Markov state transition probability matrix, and predictability measures, to characterize the attack patterns in a comprehensive manner. A general finding is that the attack patterns possess high degrees of predictability, potentially paving the way to anticipating and, consequently, mitigating or even preventing large-scale cyberattacks using macroscopic approaches.
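
To make one of the quantitative measures named above concrete, here is a minimal sketch that estimates a Markov state transition probability matrix from a discretized attack-frequency time series; the synthetic series and the binning scheme are assumptions, not the paper's data.

```python
# Empirical Markov transition matrix from a discretized time series.
import numpy as np

def transition_matrix(series, n_states):
    # Discretize into equal-width bins (states 0 .. n_states-1).
    bins = np.linspace(series.min(), series.max(), n_states + 1)
    states = np.clip(np.digitize(series, bins) - 1, 0, n_states - 1)
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)

rng = np.random.default_rng(0)
attacks_per_hour = rng.poisson(20, size=1000).astype(float)  # synthetic counts
P = transition_matrix(attacks_per_hour, n_states=5)
```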

ContributorsChen, Yu-Zhong (Author) / Huang, Zi-Gang (Author) / Xu, Shouhuai (Author) / Lai, Ying-Cheng (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created2015-05-20
Description


Supply-demand processes take place on a large variety of real-world networked systems, ranging from power grids and the internet to social networking and urban systems. In modern infrastructure, supply-demand systems are constantly expanding, leading to a constant increase in the load requirement for resources and, consequently, to problems such as low efficiency, resource scarcity, and partial system failures. Under certain conditions, a global catastrophe on the scale of the whole system can occur through the dynamical process of cascading failures. We investigate the optimization and resilience of time-varying supply-demand systems by constructing network models of such systems, in which resources are transported from supplier sites to users through various links. Here, by optimization we mean minimization of the maximum load on links, and system resilience is characterized by the cascading-failure size, i.e., the number of users who fail to connect with suppliers.

We consider two representative classes of supply schemes: load-driven supply and fixed-fraction supply. Our findings are: (1) optimized systems are more robust, since relatively smaller cascading failures occur when triggered by external perturbation to the links; (2) a large fraction of links can be free of load if resources are directed along the shortest paths; (3) redundant links can help reroute traffic but may undesirably propagate and enlarge failures; (4) the patterns of cascading failures depend strongly on the capacity of links; (5) the specific location of the trigger determines the specific route of a cascading failure but has little effect on the final cascade size; (6) system expansion typically reduces efficiency; and (7) when the locations of the suppliers are optimized over a long expanding period, fewer suppliers are required. These results hold for heterogeneous networks in general, providing insights into designing optimal and resilient complex supply-demand systems that expand constantly in time.
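
As a sketch of the load measure described above, the following fragment routes one unit of demand per user to its nearest supplier along shortest paths (using networkx) and reports the maximum link load, the quantity the optimization seeks to minimize; the toy network and supplier placement are hypothetical.

```python
# Maximum link load under shortest-path routing from nearest suppliers.
import networkx as nx

def max_link_load(G, suppliers, users):
    load = {frozenset(e): 0 for e in G.edges}
    for user in users:
        # Each user draws one unit of resource from its closest supplier.
        _, path = nx.multi_source_dijkstra(G, set(suppliers), target=user)
        for a, b in zip(path[:-1], path[1:]):
            load[frozenset((a, b))] += 1
    return max(load.values())

G = nx.connected_watts_strogatz_graph(20, 4, 0.3, seed=1)  # toy network
print(max_link_load(G, suppliers={0, 1}, users=set(range(2, 20))))
```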

ContributorsZhang, Si-Ping (Author) / Huang, Zi-Gang (Author) / Dong, Jia-Qi (Author) / Eisenberg, Daniel (Author) / Seager, Thomas (Author) / Lai, Ying-Cheng (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created2015-06-23
Description


At the end of the dark ages, anatomy was taught as though everything that could be known was known. Scholars learned about what had been discovered rather than how to make discoveries, even though the body (and the rest of biology) was very poorly understood. The renaissance eventually brought a revolution in how scholars (and graduate students) were trained and worked. This revolution never occurred in K-12 or university education, so we now teach young students in much the way that scholars were taught in the dark ages: we teach them what is already known rather than the process of knowing. Citizen science offers a way to change K-12 and university education and, in doing so, complete the renaissance. Here we offer an example of such an approach and call for change in the way students are taught science, change that is more possible than it has ever been and is, nonetheless, five hundred years delayed.

Created2016-03-01
Description


Background: Modern advances in sequencing technology have enabled the census of microbial members of many natural ecosystems. Recently, attention has increasingly turned to the microbial residents of human-made, built ecosystems, both private (homes) and public (subways, office buildings, and hospitals). Here, we report results of the characterization of the microbial ecology of a singular built environment, the International Space Station (ISS). This sampling involved the collection and microbial analysis (via 16S rRNA gene PCR) of 15 surfaces sampled by swabs onboard the ISS. The sampling was a component of Project MERCCURI (Microbial Ecology Research Combining Citizen and University Researchers on ISS). Learning more about the microbial inhabitants of the “buildings” in which we travel through space will take on increasing importance as plans for human exploration continue, with the possibility of colonization of other planets and moons.

Results: Sterile swabs were used to sample 15 surfaces onboard the ISS. The sites sampled were chosen to be analogous to (1) samples collected for the Wildlife of Our Homes project and (2) samples of cell phones and shoes concurrently being collected for another component of Project MERCCURI. Sequencing of the 16S rRNA genes amplified from DNA extracted from each swab was used to produce a census of the microbes present on each surface sampled. We compared the microbes found on the ISS swabs to those from both homes on Earth and data from the Human Microbiome Project.

Conclusions: While significantly different from both the homes on Earth and the Human Microbiome Project samples analyzed here, the microbial community composition on the ISS was more similar to the home surfaces than to the human microbiome samples. The ISS surfaces are OTU-rich, with 1,036–4,294 operational taxonomic units (OTUs) per sample. There was no discernible biogeography of microbes across the 15 ISS surfaces, although this may be a reflection of the small sample size we were able to obtain.
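
For concreteness, here is a minimal sketch of the OTU-richness calculation reported above, applied to a hypothetical OTU table (rows = OTUs, columns = samples) standing in for the actual sequencing data.

```python
# OTU richness per sample: count of OTUs with nonzero abundance.
import numpy as np

rng = np.random.default_rng(42)
otu_table = rng.poisson(0.5, size=(5000, 15))  # hypothetical counts: OTUs x samples

richness = (otu_table > 0).sum(axis=0)         # OTUs observed in each sample
print(richness.min(), richness.max())
```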

ContributorsLang, Jenna M. (Author) / Coil, David A. (Author) / Neches, Russell Y. (Author) / Brown, Wendy E. (Author) / Cavalier, Darlene (Author) / Severance, Mark (Author) / Hampton-Marcell, Jarrad T. (Author) / Gilbert, Jack A. (Author) / Eisen, Jonathan A. (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)
Created2017-12-05
Description


Online communities are becoming increasingly important as platforms for large-scale human cooperation. These communities allow users seeking and sharing professional skills to solve problems collaboratively. To investigate how users cooperate to complete a large number of knowledge-producing tasks, we analyze Stack Exchange, one of the largest question-and-answer systems in the world. We construct attention networks to model the growth of 110 communities in the Stack Exchange system and quantify individual answering strategies using the linking dynamics on attention networks. We identify two answering strategies. Strategy A aims at performing maintenance by doing simple tasks, whereas strategy B aims at investing time in doing challenging tasks. Both strategies are important: empirical evidence shows that strategy A decreases the median waiting time for answers and strategy B increases the acceptance rate of answers. In investigating the strategic persistence of users, we find that users tend to stick to the same strategy over time within a community but switch from one strategy to the other across communities. This finding reveals the different sets of knowledge and skills that users bring to different communities. A balance of roughly 2:1 between the populations of users taking strategies A and B is found to be optimal for the sustainable growth of communities.
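
As a rough illustration of the attention-network idea, this sketch links users to the questions they answer and assigns a crude strategy label based on how long a question waited before being answered; the event data and this difficulty proxy are assumptions, since the abstract does not give the paper's exact operationalization.

```python
# Toy attention network: users -> questions answered, with a naive
# strategy label from mean waiting time (hypothetical proxy for difficulty).
from collections import defaultdict

# (user, question, hours the question waited before this answer)
events = [("u1", "q1", 0.2), ("u1", "q2", 0.5), ("u2", "q3", 30.0),
          ("u2", "q4", 48.0), ("u3", "q5", 1.0)]

attention = defaultdict(set)   # user -> set of questions attended to
waits = defaultdict(list)
for user, question, wait in events:
    attention[user].add(question)
    waits[user].append(wait)

for user, ws in sorted(waits.items()):
    score = sum(ws) / len(ws)
    label = "A (quick maintenance)" if score < 12 else "B (challenging tasks)"
    print(user, round(score, 1), label)
```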

ContributorsWu, Lingfei (Author) / Baggio, Jacopo (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)
Created2016-03-02
Description


What's a profession without a code of ethics? Being a legitimate profession almost requires drafting a code and, at least nominally, making members follow it. Codes of ethics (henceforth “codes”) exist for a number of reasons, many of which vary widely from profession to profession, but above all they are a form of codified self-regulation. While codes can be beneficial, this paper argues that when we scratch below the surface, there are many problems at their root. In terms of efficacy, codes can serve as a form of ethical window dressing rather than effective rules for behavior. But even more than that, codes can degrade the meaning of being a good person who acts ethically for the right reasons.

Created2013-11-30