Search Content

On summarization of non-linear narratives

Description

Navigating within non-linear structures is a challenge for all users when the space is large but the problem is most pronounced when the users are blind or visually impaired. Such users access digital content through screen readers like JAWS which read out the text on the screen. However presentation of…

Navigating within non-linear structures is a challenge for all users when the space is large but the problem is most pronounced when the users are blind or visually impaired. Such users access digital content through screen readers like JAWS which read out the text on the screen. However presentation of non-linear narratives in such a manner without visual cues and information about spatial dependencies is very inefficient for such users. The NSDL Science Literacy StrandMaps are visual layouts to help students and teachers browse educational resources. A Strandmap shows relationships between concepts and how they build upon one another across grade levels. NSDL Strandmaps are non-linear narratives which need to be presented to users who are blind in an effective way. A good summary of the Strandmap can give the users an idea about the concepts that are explained in it. This can help them decide whether to view the map or not. In addition, a preview-based navigation mechanism can help users decide which direction they want to take, based on a preview of upcoming content in each direction. Given a non-linear narrative like a Strandmap which has both text and structure, and a word limit w, the goal of this thesis is to find the best way to create its summary. The following approaches are considered: – Purely Text-based Approach using a Multi-document Text Summarizer – Purely Structure-based Approach using PageRank – Approaches Combining both Text and Structure → CUTS-Based Approach (Topic Segmentation) → PageRank with Content Since no reference summaries for such structures were available, user studies were conducted to evaluate these algorithms. PageRank with Content approach performed the best. Another important conclusion was that text and structure are intertwined in a Strandmap by design.

ContributorsGaur, Shruti (Author) / Candan, Kasim Selcuk (Thesis advisor) / Sundaram, Hari (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2011

Adaptive decentralized routing and detection of overlapping communities

Description

This dissertation studies routing in small-world networks such as grids plus long-range edges and real networks. Kleinberg showed that geography-based greedy routing in a grid-based network takes an expected number of steps polylogarithmic in the network size, thus justifying empirical efficiency observed beginning with Milgram. A counterpart for the grid-based…

This dissertation studies routing in small-world networks such as grids plus long-range edges and real networks. Kleinberg showed that geography-based greedy routing in a grid-based network takes an expected number of steps polylogarithmic in the network size, thus justifying empirical efficiency observed beginning with Milgram. A counterpart for the grid-based model is provided; it creates all edges deterministically and shows an asymptotically matching upper bound on the route length. The main goal is to improve greedy routing through a decentralized machine learning process. Two considered methods are based on weighted majority and an algorithm of de Farias and Megiddo, both learning from feedback using ensembles of experts. Tests are run on both artificial and real networks, with decentralized spectral graph embedding supplying geometric information for real networks where it is not intrinsically available. An important measure analyzed in this work is overpayment, the difference between the cost of the method and that of the shortest path. Adaptive routing overtakes greedy after about a hundred or fewer searches per node, consistently across different network sizes and types. Learning stabilizes, typically at overpayment of a third to a half of that by greedy. The problem is made more difficult by eliminating the knowledge of neighbors' locations or by introducing uncooperative nodes. Even under these conditions, the learned routes are usually better than the greedy routes. The second part of the dissertation is related to the community structure of unannotated networks. A modularity-based algorithm of Newman is extended to work with overlapping communities (including considerably overlapping communities), where each node locally makes decisions to which potential communities it belongs. To measure quality of a cover of overlapping communities, a notion of a node contribution to modularity is introduced, and subsequently the notion of modularity is extended from partitions to covers. The final part considers a problem of network anonymization, mostly by the means of edge deletion. The point of interest is utility preservation. It is shown that a concentration on the preservation of routing abilities might damage the preservation of community structure, and vice versa.

ContributorsBakun, Oleg (Author) / Konjevod, Goran (Thesis advisor) / Richa, Andrea (Thesis advisor) / Syrotiuk, Violet R. (Committee member) / Czygrinow, Andrzej (Committee member) / Arizona State University (Publisher)

Created2011

Sequence-based web page template detection

Description

Templates are wildly used in Web sites development. Finding the template for a given set of Web pages could be very important and useful for many applications like Web page classification and monitoring content and structure changes of Web pages. In this thesis, two novel sequence-based Web page template detection…

Templates are wildly used in Web sites development. Finding the template for a given set of Web pages could be very important and useful for many applications like Web page classification and monitoring content and structure changes of Web pages. In this thesis, two novel sequence-based Web page template detection algorithms are presented. Different from tree mapping algorithms which are based on tree edit distance, sequence-based template detection algorithms operate on the Prüfer/Consolidated Prüfer sequences of trees. Since there are one-to-one correspondences between Prüfer/Consolidated Prüfer sequences and trees, sequence-based template detection algorithms identify the template by finding a common subsequence between to Prüfer/Consolidated Prüfer sequences. This subsequence should be a sequential representation of a common subtree of input trees. Experiments on real-world web pages showed that our approaches detect templates effectively and efficiently.

ContributorsHuang, Wei (Author) / Candan, Kasim Selcuk (Thesis advisor) / Sundaram, Hari (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2011

Post-Optimization of Permutation Coverings

Description

Covering subsequences with sets of permutations arises in many applications, including event-sequence testing. Given a set of subsequences to cover, one is often interested in knowing the fewest number of permutations required to cover each subsequence, and in finding an explicit construction of such a set of permutations that has…

Covering subsequences with sets of permutations arises in many applications, including event-sequence testing. Given a set of subsequences to cover, one is often interested in knowing the fewest number of permutations required to cover each subsequence, and in finding an explicit construction of such a set of permutations that has size close to or equal to the minimum possible. The construction of such permutation coverings has proven to be computationally difficult. While many examples for permutations of small length have been found, and strong asymptotic behavior is known, there are few explicit constructions for permutations of intermediate lengths. Most of these are generated from scratch using greedy algorithms. We explore a different approach here. Starting with a set of permutations with the desired coverage properties, we compute local changes to individual permutations that retain the total coverage of the set. By choosing these local changes so as to make one permutation less "essential" in maintaining the coverage of the set, our method attempts to make a permutation completely non-essential, so it can be removed without sacrificing total coverage. We develop a post-optimization method to do this and present results on sequence covering arrays and other types of permutation covering problems demonstrating that it is surprisingly effective.

ContributorsMurray, Patrick Charles (Author) / Colbourn, Charles (Thesis director) / Czygrinow, Andrzej (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Department of Physics (Contributor)

Created2014-12

The first-fit algorithm uses many colors on some interval graphs

Description

Graph coloring is about allocating resources that can be shared except where there are certain pairwise conflicts between recipients. The simplest coloring algorithm that attempts to conserve resources is called first fit. Interval graphs are used in models for scheduling (in computer science and operations research) and in biochemistry for…

Graph coloring is about allocating resources that can be shared except where there are certain pairwise conflicts between recipients. The simplest coloring algorithm that attempts to conserve resources is called first fit. Interval graphs are used in models for scheduling (in computer science and operations research) and in biochemistry for one-dimensional molecules such as genetic material. It is not known precisely how much waste in the worst case is due to the first-fit algorithm for coloring interval graphs. However, after decades of research the range is narrow. Kierstead proved that the performance ratio R is at most 40. Pemmaraju, Raman, and Varadarajan proved that R is at most 10. This can be improved to 8. Witsenhausen, and independently Chrobak and Slusarek, proved that R is at least 4. Slusarek improved this to 4.45. Kierstead and Trotter extended the method of Chrobak and Slusarek to one good for a lower bound of 4.99999 or so. The method relies on number sequences with a certain property of order. It is shown here that each sequence considered in the construction satisfies a linear recurrence; that R is at least 5; that the Fibonacci sequence is in some sense minimally useless for the construction; and that the Fibonacci sequence is a point of accumulation in some space for the useful sequences of the construction. Limitations of all earlier constructions are revealed.

ContributorsSmith, David A. (Author) / Kierstead, Henry A. (Thesis advisor) / Czygrinow, Andrzej (Committee member) / Gelb, Anne (Committee member) / Hurlbert, Glenn H. (Committee member) / Kadell, Kevin W. J. (Committee member) / Arizona State University (Publisher)

Created2010

Leveraging metadata for extracting robust multi-variate temporal features

Description

In recent years, there are increasing numbers of applications that use multi-variate time series data where multiple uni-variate time series coexist. However, there is a lack of systematic of multi-variate time series. This thesis focuses on (a) defining a simplified inter-related multi-variate time series (IMTS) model and (b) developing robust…

In recent years, there are increasing numbers of applications that use multi-variate time series data where multiple uni-variate time series coexist. However, there is a lack of systematic of multi-variate time series. This thesis focuses on (a) defining a simplified inter-related multi-variate time series (IMTS) model and (b) developing robust multi-variate temporal (RMT) feature extraction algorithm that can be used for locating, filtering, and describing salient features in multi-variate time series data sets. The proposed RMT feature can also be used for supporting multiple analysis tasks, such as visualization, segmentation, and searching / retrieving based on multi-variate time series similarities. Experiments confirm that the proposed feature extraction algorithm is highly efficient and effective in identifying robust multi-scale temporal features of multi-variate time series.

ContributorsWang, Xiaolan (Author) / Candan, Kasim Selcuk (Thesis advisor) / Sapino, Maria Luisa (Committee member) / Fainekos, Georgios (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2013

Augmenting Academic Research Search And Reading With Richer Context

Description

The volume of scientific research is growing at an exponential rate over the past100 years. With the advent of the internet and ubiquitous access to the web, academic research search engines such as Google Scholar, Microsoft Academic, etc., have become the go-to platforms for systemic reviews and search. Although many…

The volume of scientific research is growing at an exponential rate over the past100 years. With the advent of the internet and ubiquitous access to the web, academic research search engines such as Google Scholar, Microsoft Academic, etc., have become the go-to platforms for systemic reviews and search. Although many academic search engines host lots of content, they provide minimal context about where the search terms matched. Many of these search engines also fail to provide additional tools which can help enhance a researcher’s understanding of research content outside their respective websites. An example of such a tool can be a browser extension/plugin that surfaces context-relevant information about a research article when the user reads a research article. This dissertation discusses a solution developed to bring more intrinsic characteristics of research documents such as the structure of the research document, tables in the document, the keywords associated with the document to improve search capabilities and augment the information a researcher may read. The prototype solution named Sci-Genie(https://sci-genie.com/) is a search engine over scientific articles from Computer Science ArXiv. Sci-Genie parses research papers and indexes research documents’ structure to provide context-relevant information about the matched search fragments. The same search engine also powers a browser extension to augment the information about a research article the user may be reading. The browser extension augments the user’s interface with information about tables from the cited papers, other papers by the same authors, and even the citations to and from the current article. The browser extension is further powered with access endpoints that leverage a machine learning model to filter tables comparing various entities. The dissertation further discusses these machine learning models and some baselines that help classify whether a table is comparing various entities or not. The dissertation finally concludes by discussing the current shortcomings of Sci-Genie and possible future research scope based on learnings after building Sci-Genie.

ContributorsDave, Valay (Author) / Zou, Jia (Thesis advisor) / Ben Amor, Heni (Thesis advisor) / Candan, Kasim Selcuk (Committee member) / Arizona State University (Publisher)

Created2021

Physical System Knowledge Extraction and Transfer Using Machine Learning

Description

Modern physical systems are experiencing tremendous evolutions with growing size, more and more complex structures, and the incorporation of new devices. This calls for better planning, monitoring, and control. However, achieving these goals is challenging since the system knowledge (e.g., system structures and edge parameters) may be unavailable for a…

Modern physical systems are experiencing tremendous evolutions with growing size, more and more complex structures, and the incorporation of new devices. This calls for better planning, monitoring, and control. However, achieving these goals is challenging since the system knowledge (e.g., system structures and edge parameters) may be unavailable for a normal system, let alone some dynamic changes like maintenance, reconfigurations, and events, etc. Therefore, extracting system knowledge becomes a central topic. Luckily, advanced metering techniques bring numerous data, leading to the emergence of Machine Learning (ML) methods with efficient learning and fast inference. This work tries to propose a systematic framework of ML-based methods to learn system knowledge under three what-if scenarios: (i) What if the system is normally operated? (ii) What if the system suffers dynamic interventions? (iii) What if the system is new with limited data? For each case, this thesis proposes principled solutions with extensive experiments. Chapter 2 tackles scenario (i) and the golden rule is to learn an ML model that maintains physical consistency, bringing high extrapolation capacity for changing operational conditions. The key finding is that physical consistency can be linked to convexity, a central concept in optimization. Therefore, convexified ML designs are proposed and the global optimality implies faithfulness to the underlying physics. Chapter 3 handles scenario (ii) and the goal is to identify the event time, type, and locations. The problem is formalized as multi-class classification with special attention to accuracy and speed. Subsequently, Chapter 3 builds an ensemble learning framework to aggregate different ML models for better prediction. Next, to tackle high-volume data quickly, a tensor as the multi-dimensional array is used to store and process data, yielding compact and informative vectors for fast inference. Finally, if no labels exist, Chapter 3 uses physical properties to generate labels for learning. Chapter 4 deals with scenario (iii) and a doable process is to transfer knowledge from similar systems, under the framework of Transfer Learning (TL). Chapter 4 proposes cutting-edge system-level TL by considering the network structure, complex spatial-temporal correlations, and different physical information.

ContributorsLi, Haoran (Author) / Weng, Yang (Thesis advisor) / Tong, Hanghang (Committee member) / Dasarathy, Gautam (Committee member) / Sankar, Lalitha (Committee member) / Arizona State University (Publisher)

Created2022

Explaining the Vulnerabilities of Machine Learning through Visual Analytics

Description

Machine learning models are increasingly being deployed in real-world applications where their predictions are used to make critical decisions in a variety of domains. The proliferation of such models has led to a burgeoning need to ensure the reliability and safety of these models, given the potential negative consequences of…

Machine learning models are increasingly being deployed in real-world applications where their predictions are used to make critical decisions in a variety of domains. The proliferation of such models has led to a burgeoning need to ensure the reliability and safety of these models, given the potential negative consequences of model vulnerabilities. The complexity of machine learning models, along with the extensive data sets they analyze, can result in unpredictable and unintended outcomes. Model vulnerabilities may manifest due to errors in data input, algorithm design, or model deployment, which can have significant implications for both individuals and society. To prevent such negative outcomes, it is imperative to identify model vulnerabilities at an early stage in the development process. This will aid in guaranteeing the integrity, dependability, and safety of the models, thus mitigating potential risks and enabling the full potential of these technologies to be realized. However, enumerating vulnerabilities can be challenging due to the complexity of the real-world environment. Visual analytics, situated at the intersection of human-computer interaction, computer graphics, and artificial intelligence, offers a promising approach for achieving high interpretability of complex black-box models, thus reducing the cost of obtaining insights into potential vulnerabilities of models. This research is devoted to designing novel visual analytics methods to support the identification and analysis of model vulnerabilities. Specifically, generalizable visual analytics frameworks are instantiated to explore vulnerabilities in machine learning models concerning security (adversarial attacks and data perturbation) and fairness (algorithmic bias). In the end, a visual analytics approach is proposed to enable domain experts to explain and diagnose the model improvement of addressing identified vulnerabilities of machine learning models in a human-in-the-loop fashion. The proposed methods hold the potential to enhance the security and fairness of machine learning models deployed in critical real-world applications.

ContributorsXie, Tiankai (Author) / Maciejewski, Ross (Thesis advisor) / Liu, Huan (Committee member) / Bryan, Chris (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created2023

Access Balancing in Storage Systems by Labelling Steiner Systems

Description

A storage system requiring file redundancy and on-line repairability can be represented as a Steiner system, a combinatorial design with the property that every $t$-subset of its points occurs in exactly one of its blocks. Under this representation, files are the points and storage units are the blocks of the…

A storage system requiring file redundancy and on-line repairability can be represented as a Steiner system, a combinatorial design with the property that every $t$-subset of its points occurs in exactly one of its blocks. Under this representation, files are the points and storage units are the blocks of the Steiner system, or vice-versa. Often, the popularities of the files of such storage systems run the gamut, with some files receiving hardly any attention, and others receiving most of it. For such systems, minimizing the difference in the collective popularity between any two storage units is nontrivial; this is the access balancing problem. With regard to the representative Steiner system, the access balancing problem in its simplest form amounts to constructing either a point or block labelling: an assignment of a set of integer labels (popularity ranks) to the Steiner system's point set or block set, respectively, requiring of the former assignment that the sums of the labelled points of any two blocks differ as little as possible and of the latter that the sums of the labels assigned to the containing blocks of any two distinct points differ as little as possible. The central aim of this dissertation is to supply point and block labellings for Steiner systems of block size greater than three, for which up to this point no attempt has been made. Four major results are given in this connection. First, motivated by the close connection between the size of the independent sets of a Steiner system and the quality of its labellings, a Steiner triple system of any admissible order is constructed with a pair of disjoint independent sets of maximum cardinality. Second, the spectrum of resolvable Bose triple systems is determined in order to label some Steiner 2-designs with block size four. Third, several kinds of independent sets are used to point-label Steiner 2-designs with block size four. Finally, optimal and close to optimal block labellings are given for an infinite class of 1-rotational resolvable Steiner 2-designs with arbitrarily large block size by exploiting their underlying group-theoretic properties.

ContributorsLusi, Dylan (Author) / Colbourn, Charles J (Thesis advisor) / Czygrinow, Andrzej (Committee member) / Fainekos, Georgios (Committee member) / Richa, Andrea (Committee member) / Arizona State University (Publisher)

Created2021

Filtering by