This growing collection consists of scholarly works authored by ASU-affiliated faculty, staff, and community members, and it contains many open access articles. ASU-affiliated authors are encouraged to Share Your Work in KEEP.

Description
In the digital humanities, there is a constant need to turn images and PDF files into plain text to apply analyses such as topic modelling, named entity recognition, and other techniques. However, although there exist different solutions to extract text embedded in PDF files or run OCR on images, they typically require additional training (for example, scholars have to learn how to use the command line) or are difficult to automate without programming skills. The Giles Ecosystem is a distributed system based on Apache Kafka that allows users to upload documents for text and image extraction. The system components are implemented using Java and the Spring Framework and are available under an Open Source license on GitHub (https://github.com/diging/).
Contributors: Lessios-Damerow, Julia (Contributor) / Peirson, Erick (Contributor) / Laubichler, Manfred (Contributor) / ASU-SFI Center for Biosocial Complex Systems (Contributor)
Created: 2017-09-28
Description

Two classes of scaling behaviours, namely the super-linear scaling of links or activities, and the sub-linear scaling of area, diversity, or time elapsed with respect to size, have been found to prevail in the growth of complex networked systems. Despite some pioneering modelling approaches proposed for specific systems, whether there exist general mechanisms that account for the origins of such scaling behaviours in different contexts, especially in socioeconomic systems, remains an open question. We address this problem by introducing a geometric network model without free parameters, finding that both super-linear and sub-linear scaling behaviours can be simultaneously reproduced and that the scaling exponents are exclusively determined by the dimension of the Euclidean space in which the network is embedded. We implement some realistic extensions to the basic model to offer more accurate predictions for cities of various scaling behaviours and the Zipf distribution reported in the literature and observed in our empirical studies. All of the empirical results can be precisely recovered by our model with analytical predictions of all major properties. By virtue of these general findings concerning scaling behaviour, our models with simple mechanisms offer new insights into the evolution and development of complex networked systems.

Contributors: Zhang, Jiang (Author) / Li, Xintong (Author) / Wang, Xinran (Author) / Wang, Wen-Xu (Author) / Wu, Lingfei (Author) / College of Liberal Arts and Sciences (Contributor)
Created: 2015-04-29
Description

Ongoing efforts to understand the dynamics of coupled social-ecological (or more broadly, coupled infrastructure) systems and common pool resources have led to the generation of numerous datasets based on a large number of case studies. These data have facilitated the identification of important factors and fundamental principles which increase our understanding of such complex systems. However, the data at our disposal are often not easily comparable, have limited scope and scale, and are based on disparate underlying frameworks, inhibiting synthesis, meta-analysis, and the validation of findings. Research efforts are further hampered when case inclusion criteria, variable definitions, coding schema, and inter-coder reliability testing are not made explicit in the presentation of research and shared among the research community. This paper first outlines challenges experienced by researchers engaged in a large-scale coding project; then highlights valuable lessons learned; and finally discusses opportunities for further research on comparative case study analysis focusing on social-ecological systems and common pool resources. Includes supplemental materials and appendices published in the International Journal of the Commons 2016 Special Issue. Volume 10 - Issue 2 - 2016.

Contributors: Ratajczyk, Elicia (Author) / Brady, Ute (Author) / Baggio, Jacopo (Author) / Barnett, Allain J. (Author) / Perez Ibarra, Irene (Author) / Rollins, Nathan (Author) / Rubinos, Cathy (Author) / Shin, Hoon Cheol (Author) / Yu, David (Author) / Aggarwal, Rimjhim (Author) / Anderies, John (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)
Created: 2016-09-09
Description

At the end of the Dark Ages, anatomy was taught as though everything that could be known was known. Scholars learned about what had been discovered rather than how to make discoveries. This was true even though the body (and the rest of biology) was very poorly understood. The Renaissance eventually brought a revolution in how scholars (and graduate students) were trained and worked. This revolution never occurred in K-12 or university education, so we now teach young students in much the way that scholars were taught in the Dark Ages: we teach them what is already known rather than the process of knowing. Citizen science offers a way to change K-12 and university education and, in doing so, complete the Renaissance. Here we offer an example of such an approach and call for change in the way students are taught science, change that is more possible than it has ever been and is, nonetheless, five hundred years delayed.

Created: 2016-03-01
Description

Background: Modern advances in sequencing technology have enabled the census of microbial members of many natural ecosystems. Recently, attention is increasingly being paid to the microbial residents of human-made, built ecosystems, both private (homes) and public (subways, office buildings, and hospitals). Here, we report results of the characterization of the microbial ecology of a singular built environment, the International Space Station (ISS). This ISS sampling involved the collection and microbial analysis (via 16S rRNA gene PCR) of 15 surfaces sampled by swabs onboard the ISS. This sampling was a component of Project MERCCURI (Microbial Ecology Research Combining Citizen and University Researchers on ISS). Learning more about the microbial inhabitants of the “buildings” in which we travel through space will take on increasing importance, as plans for human exploration continue, with the possibility of colonization of other planets and moons.

Results: Sterile swabs were used to sample 15 surfaces onboard the ISS. The sites sampled were designed to be analogous to samples collected for (1) the Wildlife of Our Homes project and (2) a study of cell phones and shoes that were concurrently being collected for another component of Project MERCCURI. Sequencing of the 16S rRNA genes amplified from DNA extracted from each swab was used to produce a census of the microbes present on each surface sampled. We compared the microbes found on the ISS swabs to those from both homes on Earth and data from the Human Microbiome Project.

Conclusions: While significantly different from homes on Earth and the Human Microbiome Project samples analyzed here, the microbial community composition on the ISS was more similar to home surfaces than to the human microbiome samples. The ISS surfaces are OTU-rich, with 1,036–4,294 operational taxonomic units (OTUs) per sample. There was no discernible biogeography of microbes on the 15 ISS surfaces, although this may be a reflection of the small sample size we were able to obtain.

Contributors: Lang, Jenna M. (Author) / Coil, David A. (Author) / Neches, Russell Y. (Author) / Brown, Wendy E. (Author) / Cavalier, Darlene (Author) / Severance, Mark (Author) / Hampton-Marcell, Jarrad T. (Author) / Gilbert, Jack A. (Author) / Eisen, Jonathan A. (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)
Created: 2017-12-05
Description

Given a complex geospatial network with nodes distributed in a two-dimensional region of physical space, can the locations of the nodes be determined and their connection patterns be uncovered based solely on data? We consider the realistic situation where time series/signals can be collected from a single location. A key challenge is that the signals collected are necessarily time delayed, due to the varying physical distances from the nodes to the data collection centre. To meet this challenge, we develop a compressive-sensing-based approach enabling reconstruction of the full topology of the underlying geospatial network and, more importantly, accurate estimation of the time delays. A standard triangularization algorithm can then be employed to find the physical locations of the nodes in the network. We further demonstrate successful detection of a hidden node (or a hidden source or threat), from which no signal can be obtained, through accurate detection of all its neighbouring nodes. As a geospatial network has the feature that a node tends to connect with geophysically nearby nodes, the localized region that contains the hidden node can be identified.

Contributors: Su, Riqi (Author) / Wang, Wen-Xu (Author) / Wang, Xiao (Author) / Lai, Ying-Cheng (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created: 2016-01-06
Description

Recent works revealed that the energy required to control a complex network depends on the number of driving signals and the energy distribution follows an algebraic scaling law. If one implements control using a small number of drivers, e.g. as determined by the structural controllability theory, there is a high probability that the energy will diverge. We develop a physical theory to explain the scaling behaviour through identification of the fundamental structural elements, the longest control chains (LCCs), that dominate the control energy. Based on the LCCs, we articulate a strategy to drastically reduce the control energy (e.g. in a large number of real-world networks). Owing to their structural nature, the LCCs may shed light on energy issues associated with control of nonlinear dynamical networks.

Contributors: Chen, Yu-Zhong (Author) / Wang, Le-Zhi (Author) / Wang, Wen-Xu (Author) / Lai, Ying-Cheng (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created: 2016-04-20
Description

A challenging problem in network science is to control complex networks. In existing frameworks of structural or exact controllability, the ability to steer a complex network toward any desired state is measured by the minimum number of required driver nodes. However, if we implement actual control by imposing input signals on the minimum set of driver nodes, an unexpected phenomenon arises: due to computational or experimental error, there is a great probability that convergence to the final state cannot be achieved. In fact, the associated control cost can become unbearably large, effectively preventing actual control from being realized physically. The difficulty is particularly severe when the network is deemed controllable with a small number of drivers. Here we develop a physical controllability framework based on the probability of achieving actual control. Using a recently identified fundamental chain structure underlying the control energy, we offer strategies to turn physically uncontrollable networks into physically controllable ones by imposing a slightly augmented set of input signals on properly chosen nodes. Our findings indicate that, although full control can be theoretically guaranteed by the prevailing structural controllability theory, it is necessary to balance the number of driver nodes and control cost to achieve physical control.

Contributors: Wang, Le-Zhi (Author) / Chen, Yu-Zhong (Author) / Wang, Wen-Xu (Author) / Lai, Ying-Cheng (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created: 2017-01-11
Description

Network reconstruction is a fundamental problem for understanding many complex systems with unknown interaction structures. In many complex systems, there are indirect interactions between two individuals without immediate connection but with common neighbors. Despite recent advances in network reconstruction, we continue to lack an approach for reconstructing complex networks with indirect interactions. Here we introduce a two-step strategy to resolve the reconstruction problem: in the first step, we recover both direct and indirect interactions by employing the Lasso to solve a sparse signal reconstruction problem, and in the second step, we use matrix transformation and optimization to distinguish between direct and indirect interactions. The network structure corresponding to direct interactions can be fully uncovered. We exploit the public goods game occurring on complex networks as a paradigm for characterizing indirect interactions and test our reconstruction approach. We find that high reconstruction accuracy can be achieved for both homogeneous and heterogeneous networks, and a number of empirical networks, in spite of insufficient data measurement contaminated by noise. Although a general framework for reconstructing complex networks with arbitrary types of indirect interactions is still lacking, our approach opens new routes to separate direct and indirect interactions in a representative complex system.

Contributors: Han, Xiao (Author) / Shen, Zhesi (Author) / Wang, Wen-Xu (Author) / Lai, Ying-Cheng (Author) / Grebogi, Celso (Author) / Ira A. Fulton Schools of Engineering (Contributor)
Created: 2016-07-22
Description

We find that the flow of attention on the Web forms a directed, tree-like structure implying the time-sensitive browsing behavior of users. Using the data of a news sharing website, we construct clickstream networks in which nodes are news stories and edges represent the consecutive clicks between two stories. To identify the flow direction of clickstreams, we define the “flow distance” of nodes (Li), which measures the average number of steps a random walker takes to reach the ith node. It is observed that Li is related to the clicks (Ci) to news stories and the age (Ti) of stories. Putting these three variables together helps us understand the rise and decay of news stories from a network perspective. We also find that the studied clickstream networks preserve a stable structure over time, leading to the scaling between users and clicks. The universal scaling behavior is confirmed by 1,000 Web forums. We suggest that the tree-like, stable structure of clickstream networks reveals the time-sensitive preference of users in online browsing. To test our assumption, we discuss three models of individual browsing behavior, and compare the simulation results with empirical data.

Contributors: Wang, Cheng-Jun (Author) / Wu, Lingfei (Author) / Zhang, Jiang (Author) / Janssen, Marco (Author) / ASU-SFI Center for Biosocial Complex Systems (Contributor)
Created: 2016-09-28