This collection includes most ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about each dissertation or thesis includes degree information, committee members, an abstract, and any supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Displaying 1 - 10 of 97
Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also on additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method, RAProp, on 16 million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than the current state-of-the-art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches that I propose, as well as an external evaluation in comparison to the current state-of-the-art method.
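As an illustration of the reputation-propagation idea summarized above, the following Python sketch propagates an assumed initial reputation score over a content-similarity "agreement graph." The initial scores, similarity threshold, and damped update rule are illustrative assumptions, not RAProp's actual features or parameters.

```python
# Hedged sketch: propagate a per-tweet reputation score over an agreement graph
# built from content similarity. Scores and threshold are invented for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

tweets = [
    "new study links exercise to better sleep",
    "exercise improves sleep quality, study finds",
    "buy followers now cheap cheap cheap",
]
initial_reputation = np.array([0.6, 0.7, 0.1])  # assumed user/web-page based scores

# Agreement graph: edge weight = TF-IDF cosine similarity above a threshold.
sim = cosine_similarity(TfidfVectorizer().fit_transform(tweets))
np.fill_diagonal(sim, 0.0)
agreement = np.where(sim > 0.2, sim, 0.0)

# Damped propagation: tweets that agree with reputable tweets gain reputation.
alpha, scores = 0.5, initial_reputation.copy()
for _ in range(10):
    row_sums = agreement.sum(axis=1)
    neighbor_avg = np.divide(agreement @ scores, row_sums,
                             out=scores.copy(), where=row_sums > 0)
    scores = alpha * initial_reputation + (1 - alpha) * neighbor_avg

print(np.argsort(-scores))  # indices of tweets ranked by final reputation
```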
Contributors: Ravikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.
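As a concrete (and much simplified) illustration of linear-chain CRF NER with a small hand-crafted feature set, the sketch below uses the third-party sklearn-crfsuite package on a toy BIO-labeled sentence; it is not BANNER's actual feature set or training corpus.

```python
# Hedged sketch of linear-chain CRF NER with a tiny "rich" feature set.
# Requires the third-party package sklearn-crfsuite. Toy data for illustration only.
import sklearn_crfsuite

def token_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_upper": w.isupper(),
        "has_digit": any(c.isdigit() for c in w),
        "suffix3": w[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

sents = [["BRCA1", "mutations", "cause", "breast", "cancer"]]
labels = [["B-GENE", "O", "O", "B-DISEASE", "I-DISEASE"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))  # predicted BIO tags for each token sequence
```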
Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Under the framework of intelligent management of power grids by leveraging advanced information, communication and control technologies, a primary objective of this study is to develop novel data mining and data processing schemes for several critical applications that can enhance the reliability of power systems. Specifically, this study is broadly organized into the following two parts: I) spatio-temporal wind power analysis for wind generation forecast and integration, and II) data mining and information fusion of synchrophasor measurements toward secure power grids. Part I is centered around wind power generation forecast and integration. First, a spatio-temporal analysis approach for short-term wind farm generation forecasting is proposed. Specifically, using extensive measurement data from an actual wind farm, the probability distribution and the level crossing rate of wind farm generation are characterized using tools from graphical learning and time-series analysis. Built on these spatial and temporal characterizations, finite state Markov chain models are developed, and a point forecast of wind farm generation is derived using the Markov chains. Then, multi-timescale scheduling and dispatch with stochastic wind generation and opportunistic demand response is investigated. Part II focuses on incorporating the emerging synchrophasor technology into the security assessment and the post-disturbance fault diagnosis of power systems. First, a data-mining framework is developed for on-line dynamic security assessment by using adaptive ensemble decision tree learning of real-time synchrophasor measurements. Under this framework, novel on-line dynamic security assessment schemes are devised, aiming to handle various factors (including variations of operating conditions, forced system topology change, and loss of critical synchrophasor measurements) that can have significant impact on the performance of conventional data-mining-based on-line DSA schemes. Then, in the context of post-disturbance analysis, fault detection and localization of line outages is investigated using a dependency graph approach. It is shown that a dependency graph for voltage phase angles can be built according to the interconnection structure of the power system, and line outage events can be detected and localized through networked data fusion of the synchrophasor measurements collected from multiple locations in the power grid. Along a more practical avenue, a decentralized networked data fusion scheme is proposed for efficient fault detection and localization.
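The finite-state Markov chain point forecast mentioned above can be sketched as follows: discretize measured generation into states, estimate the state transition matrix by counting, and forecast the expected generation of the next state. The synthetic data, number of states, and bin edges are assumptions for illustration only.

```python
# Hedged sketch: finite-state Markov chain point forecast of wind farm generation.
# The actual models also exploit spatial data and level-crossing statistics not shown here.
import numpy as np

rng = np.random.default_rng(0)
power = np.cumsum(rng.normal(0, 5, 1000)) % 100  # synthetic generation series (MW)

n_states = 10
bins = np.linspace(0, 100, n_states + 1)
states = np.clip(np.digitize(power, bins) - 1, 0, n_states - 1)

# Estimate the transition matrix by counting observed state transitions.
P = np.zeros((n_states, n_states))
for s, s_next in zip(states[:-1], states[1:]):
    P[s, s_next] += 1
P = P / np.maximum(P.sum(axis=1, keepdims=True), 1)

# Point forecast: expected bin-center generation of the next state.
centers = (bins[:-1] + bins[1:]) / 2
current = states[-1]
forecast = P[current] @ centers
print(f"current state {current}, one-step-ahead forecast {forecast:.1f} MW")
```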
Contributors: He, Miao (Author) / Zhang, Junshan (Thesis advisor) / Vittal, Vijay (Thesis advisor) / Hedman, Kory (Committee member) / Si, Jennie (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

With the rapid growth of power systems and the concomitant technological advancements, the goal of achieving smart grids is no longer a vision but a foreseeable reality. Hence, the existing grids are undergoing infrastructural modifications to achieve the diverse characteristics of a smart grid. While there are many subjects associated with the operation of smart grids, this dissertation addresses two important aspects of smart grids: increased penetration of renewable resources, and increased reliance on sensor systems to improve reliability and performance of critical power system components. Present renewable portfolio standards are changing both structural and performance characteristics of power systems by replacing conventional generation with alternate energy resources such as photovoltaic (PV) systems. The present study investigates the impact of increased penetration of PV systems on the steady state performance as well as the transient stability of a large power system which is a portion of the Western U.S. interconnection. Utility scale and residential rooftop PVs are added to replace a portion of conventional generation resources. Steady state voltages are observed under various PV penetration levels, and the impact of reduced inertia on transient stability performance is also examined. The simulation results identify both detrimental and beneficial impacts of increased PV penetration on steady state and transient stability performance. With increased penetration of renewable energy resources and the current loading scenario, more transmission system components such as transformers and circuit breakers are subject to increased stress and overloading. This research work explores the feasibility of increasing system reliability by applying condition monitoring systems to selected circuit breakers and transformers. An important feature of the smart grid philosophy is that deploying condition monitoring systems decreases maintenance costs by informing the operator of impending failures or by ameliorating problematic conditions. A method to identify the most critical transformers and circuit breakers with the aid of contingency ranking methods is presented in this study. The work reported in this dissertation parallels an industry sponsored study in which a considerable level of industry input and industry reported concerns are reflected.
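A contingency-ranking calculation of the kind used to flag critical transformers and circuit breakers can be sketched as below: score each N-1 outage with a simple voltage-deviation performance index and rank the associated components. The component names and post-contingency voltages are invented; in practice they would come from power-flow studies of the actual system.

```python
# Hedged sketch: rank components by a voltage-deviation performance index computed
# over simulated N-1 contingencies. All numbers are illustrative placeholders.
import numpy as np

components = ["XFMR-A", "XFMR-B", "CB-101", "CB-202"]
# Rows = outage of each component; columns = resulting bus voltages (per unit).
post_contingency_voltages = np.array([
    [0.97, 0.95, 1.02, 0.99],
    [0.92, 0.94, 1.04, 0.98],
    [0.99, 1.00, 1.01, 1.00],
    [0.95, 0.93, 1.03, 0.97],
])

# Performance index: sum of squared deviations from nominal 1.0 p.u. voltage.
pi = ((post_contingency_voltages - 1.0) ** 2).sum(axis=1)
for name, score in sorted(zip(components, pi), key=lambda x: x[1], reverse=True):
    print(f"{name}: performance index {score:.4f}")
```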
Contributors: Eftekharnejad, Sara (Author) / Heydt, Gerald (Thesis advisor) / Vittal, Vijay (Thesis advisor) / Si, Jennie (Committee member) / Tylavsky, Daniel (Committee member) / Arizona State University (Publisher)
Created: 2012
Description

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties such as relevant consumption data stored in inconsistent formats, and insufficient data about project attributes with which to interpret the consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the data preprocessing is completed, data mining techniques such as clustering are applied to find projects that involve resources of similar skill sets and that involve similar complexities and sizes. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Project characteristics that generate this diversity in headcounts and skill sets are then identified. These characteristics are not currently contained in the database and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match product technical features with the resource requirements of past projects as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross validation of the training data, as past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast resource demand for several future projects.
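The two modeling steps described above, clustering past projects into resource-utilization templates and fitting a cross-validated linear regression from project characteristics to resource demand, might look roughly like the sketch below. The feature names, synthetic data, and model settings are hypothetical.

```python
# Hedged sketch: cluster historical projects into "resource utilization templates",
# then fit a cross-validated linear regression from project characteristics to headcount.
# All data and column meanings are hypothetical illustrations.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
consumption = rng.gamma(shape=2.0, scale=10.0, size=(30, 5))  # person-weeks per skill set
templates = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(consumption)

characteristics = rng.normal(size=(30, 4))   # project attributes elicited from managers
headcount = consumption.sum(axis=1)          # total resource demand per project

scores = cross_val_score(LinearRegression(), characteristics, headcount,
                         cv=5, scoring="neg_mean_absolute_error")
print("template sizes:", np.bincount(templates))
print("cross-validated mean absolute error:", -scores.mean())
```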
Contributors: Bhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

The combined heat and power (CHP)-based distributed generation (DG) or distributed energy resources (DERs) are mature options available in the present energy market, considered to be an effective solution to promote energy efficiency. In the urban environment, the electricity, water and natural gas distribution networks are becoming increasingly interconnected with the growing penetration of the CHP-based DG. Subsequently, this emerging interdependence leads to new topics meriting serious consideration: how much of the CHP-based DG can be accommodated and where to locate these DERs, and given preexisting constraints, how to quantify the mutual impacts on operation performances between these urban energy distribution networks and the CHP-based DG. The early research work was conducted to investigate the feasibility and design methods for one residential microgrid system based on existing electricity, water and gas infrastructures of a residential community, mainly focusing on the economic planning. However, this proposed design method cannot determine the optimal DG sizing and siting for a larger test bed with the given information of energy infrastructures. In this context, a more systematic as well as generalized approach should be developed to solve these problems. In the later study, the model architecture that integrates urban electricity, water and gas distribution networks, and the CHP-based DG system was developed. The proposed approach addressed the challenge of identifying the optimal sizing and siting of the CHP-based DG on these urban energy networks and the mutual impacts on operation performances were also quantified. For this study, the overall objective is to maximize the electrical output and recovered thermal output of the CHP-based DG units. The electricity, gas, and water system models were developed individually and coupled by the developed CHP-based DG system model. The resultant integrated system model is used to constrain the DG's electrical output and recovered thermal output, which are affected by multiple factors and thus analyzed in different case studies. The results indicate that the designed typical gas system is capable of supplying sufficient natural gas for the DG normal operation, while the present water system cannot support the complete recovery of the exhaust heat from the DG units.
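A drastically simplified version of the sizing problem described above can be written as a small linear program: choose the DG electrical output at candidate sites to maximize electrical plus recovered thermal output subject to gas-supply and heat-recovery limits. All coefficients and capacities below are invented for illustration and do not represent the dissertation's integrated network models.

```python
# Hedged sketch: linear program for CHP-based DG output under simplified gas and
# heat-recovery (water-side) constraints. Coefficients are illustrative assumptions.
from scipy.optimize import linprog

n_sites = 3
heat_to_power = 1.2                 # assumed recoverable kW_th per kW_e
gas_per_kwe = 0.3                   # assumed gas flow required per kW_e
gas_capacity = [40.0, 25.0, 30.0]   # per-site gas network limits (assumed)
heat_recovery_capacity = 90.0       # shared water-side heat recovery limit (assumed)

# Maximize electrical + recovered thermal output (linprog minimizes, hence the sign).
c = [-(1.0 + heat_to_power)] * n_sites

# Per-site gas constraints and one shared heat-recovery constraint.
A_ub = [[gas_per_kwe if j == i else 0.0 for j in range(n_sites)] for i in range(n_sites)]
A_ub.append([heat_to_power] * n_sites)
b_ub = gas_capacity + [heat_recovery_capacity]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 60)] * n_sites, method="highs")
print("DG electrical output per site (kW):", res.x)
```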
Contributors: Zhang, Xianjun (Author) / Karady, George G. (Thesis advisor) / Ariaratnam, Samuel T. (Committee member) / Holbert, Keith E. (Committee member) / Si, Jennie (Committee member) / Arizona State University (Publisher)
Created: 2013
Description

Contemporary online social platforms present individuals with social signals in the form of news feeds on their peers' activities. On networks such as Facebook and Quora, the network operator decides how that information is shown to an individual. The user, with her own interests and resource constraints, then selectively acts on a subset of the items presented to her. The network operator again shows that activity to a selection of peers, thus creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions, such as: can the network operator design social signals to promote a particular activity, like sustainability or public health care awareness, or to promote a specific product? The focus of my thesis is to answer that question. In this thesis, I develop a framework to personalize social signals for users to guide their activities on an online platform. As a result, we gradually nudge the activity distribution on the platform from the initial distribution p to the target distribution q. My work is particularly applicable to guiding collaborations, guiding collective actions, and online advertising. In particular, I first propose a probabilistic model of how users behave and how information flows on the platform. The main part of this thesis then discusses the Influence Individuals through Social Signals (IISS) framework. IISS consists of four main components: (1) Learner: it learns users' interests and characteristics from their historical activities using a Bayesian model; (2) Calculator: it uses a gradient descent method to compute the intermediate activity distributions; (3) Selector: it selects users who can be influenced to adopt or drop specific activities; (4) Designer: it personalizes social signals for each user. I evaluate the performance of the IISS framework by simulation on several network topologies such as preferential attachment, small world, and random graphs. I show that the framework gradually nudges users' activities toward the target distribution. I use both simulation and mathematical analysis to study convergence properties such as how fast and how closely we can approach the target distribution. When the number of activities is 3, I show that for about 45% of target distributions we can achieve a KL-divergence as low as 0.05, but for some other distributions the KL-divergence can be as large as 0.5.
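The Calculator step, computing intermediate activity distributions by gradient descent, can be illustrated by stepping a current distribution p toward a target q along the gradient of KL(p || q). The step size and the projection-by-renormalization below are assumed choices, not the framework's exact update.

```python
# Hedged sketch: nudge the activity distribution p toward the target q by gradient
# descent on KL(p || q), re-projecting onto the probability simplex after each step.
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])   # current distribution over 3 activities
q = np.array([0.3, 0.4, 0.3])   # target distribution

step = 0.1
for _ in range(50):
    grad = np.log(p / q) + 1.0           # gradient of sum_i p_i * log(p_i / q_i)
    p = np.clip(p - step * grad, 1e-6, None)
    p = p / p.sum()                      # keep p a valid probability distribution

print("final distribution:", np.round(p, 3), " KL divergence:", round(kl(p, q), 4))
```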
Contributors: Le, Tien D (Author) / Sundaram, Hari (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

This dissertation considers an integrated approach to system design and controller design based on analyzing limits of system performance. Historically, plant design methodologies have not incorporated control relevant considerations. Such an approach could result in a system that might not meet its specifications (or one that requires a complex control architecture to do so). System and controller designers often go through several iterations in order to converge to an acceptable plant and controller design. The focus of this dissertation is on the design and control of an air-breathing hypersonic vehicle using such an integrated system-control design framework. The goal is to reduce the number of system-control design iterations (by explicitly incorporating control considerations in the system design process), as well as to influence the guidance/trajectory specifications for the system. Due to the high computational costs associated with obtaining a dynamic model for each plant configuration considered, approximations to the system dynamics are used in the control design process. By formulating the control design problem using bilinear and polynomial matrix inequalities, several common control and system design constraints can be simultaneously incorporated into a vehicle design optimization. Several design problems are examined to illustrate the effectiveness of this approach (and to compare the computational burden of this methodology against more traditional approaches).
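To give a flavor of the matrix-inequality machinery involved, the sketch below checks a standard Lyapunov linear matrix inequality (find P > 0 with A^T P + P A < 0) for a fixed illustrative plant using cvxpy. The integrated plant/controller design described above couples plant and controller variables, which is what makes the resulting inequalities bilinear or polynomial rather than this simple linear case.

```python
# Hedged sketch: Lyapunov LMI feasibility check for a fixed plant matrix A,
# as a stand-in for the (bilinear/polynomial) matrix inequalities used in the work.
import cvxpy as cp
import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # illustrative stable plant

P = cp.Variable((2, 2), symmetric=True)
eps = 1e-6
constraints = [P >> eps * np.eye(2),
               A.T @ P + P @ A << -eps * np.eye(2)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print("quadratic Lyapunov certificate found:", prob.status == cp.OPTIMAL)
```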
Contributors: Sridharan, Srikanth (Author) / Rodriguez, Armando A (Thesis advisor) / Mittelmann, Hans D (Committee member) / Si, Jennie (Committee member) / Tsakalis, Konstantinos S (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

The basal ganglia are four sub-cortical nuclei associated with motor control and reward learning. They are part of numerous larger, mostly segregated loops in which the basal ganglia receive inputs from specific regions of cortex. Converging on these inputs are dopaminergic neurons that alter their firing based on received and/or predicted rewarding outcomes of a behavior. The basal ganglia's output feeds through the thalamus back to the areas of the cortex where the loop originated. Understanding the dynamic interactions between the various parts of these loops is critical to understanding the basal ganglia's role in motor control and reward-based learning. This work developed several experimental techniques that can be applied to further study basal ganglia function. The first technique used micro-volume injections of low-concentration muscimol to decrease the firing rates of recorded neurons in a limited area of cortex in rats. Afterwards, an artificial cerebrospinal fluid flush was injected to rapidly eliminate the muscimol's effects. This technique was able to contain the effects of muscimol to a volume of approximately 1 mm radius and limited the duration of the drug effect to less than one hour. This technique could be used to temporarily perturb a small portion of the loops involving the basal ganglia and then observe how these effects propagate in other connected regions. The second part applied self-organizing maps (SOM) to find temporal patterns in neural firing rates that are independent of behavior. The distribution of detected pattern frequencies on these maps can then be used to determine whether changes in neural activity are occurring over time. The final technique focused on the role of the basal ganglia in reward learning. A new conditioning technique was created to increase the occurrence of selected patterns of neural activity without utilizing any external reward or behavior. A pattern of neural activity in the cortex of rats was selected using an SOM. The pattern was then reinforced by being paired with electrical stimulation of the medial forebrain bundle, triggering dopamine release in the basal ganglia. Ultimately, this technique proved unsuccessful, possibly due to poor selection of the patterns being reinforced.
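As an illustration of the SOM-based pattern analysis described above, the sketch below maps short windows of binned firing rates onto a small self-organizing map (using the third-party MiniSom package) and counts how often each map unit wins; comparing such histograms across epochs is one way to judge whether the mix of temporal patterns is changing over time. The window length, grid size, and synthetic spike-count data are assumptions.

```python
# Hedged sketch: SOM over short windows of binned firing rates. Synthetic data and
# parameters are illustrative. Requires the third-party MiniSom package.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
rates = rng.poisson(lam=5.0, size=2000).astype(float)   # binned firing rates (synthetic)

window = 10   # each SOM input vector = 10 consecutive rate bins
X = np.array([rates[i:i + window] for i in range(len(rates) - window)])

som = MiniSom(6, 6, window, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 5000)

# Histogram of winning units: a proxy for how often each temporal pattern occurs.
wins = np.zeros((6, 6), dtype=int)
for x in X:
    i, j = som.winner(x)
    wins[i, j] += 1
print(wins)
```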
Contributors: Baldwin, Nathan Aaron (Author) / Helms Tillery, Stephen I (Thesis advisor) / Castaneda, Edward (Committee member) / Buneo, Christopher A (Committee member) / Muthuswamy, Jitendran (Committee member) / Si, Jennie (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean version of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates for a dirty tuple; picking any single one of them discards the other two equally plausible repairs. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. For example, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would it affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates for each dirty tuple, along with the probability that each is the correct cleaned version. I represent these alternatives as tuples in a tuple-disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe-PDB) is compared to a deterministic reconstruction (called BayesWipe-DET), in which the most likely clean candidate for each tuple is chosen and the rest of the alternatives are discarded.
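The representational choice discussed above, keeping several clean candidates per dirty tuple, can be sketched with a toy tuple-disjoint probabilistic relation: each dirty tuple maps to a set of exclusive alternatives with probabilities, truncation to the top-k alternatives (with renormalization) trades reconstruction size against recall, and a selection query sums the probabilities of the qualifying alternatives. This is a toy stand-in, not the BayesWipe or Mystiq implementation.

```python
# Hedged sketch of a tuple-disjoint probabilistic relation with top-k truncation.
# Values and probabilities are illustrative only.
def truncate_top_k(alternatives, k):
    """Keep the k most likely clean candidates for a tuple and renormalize."""
    top = sorted(alternatives, key=lambda a: a[1], reverse=True)[:k]
    z = sum(p for _, p in top)
    return [(value, p / z) for value, p in top]

# Candidate repairs of one dirty tuple's 'city' attribute.
candidates = [("Tempe", 0.34), ("Tampa", 0.33), ("Temple", 0.33)]
pdb = {"t1": truncate_top_k(candidates, k=2)}

# Selection query city = 'Tempe': sum the probabilities of qualifying alternatives
# (alternatives of one tuple are mutually exclusive under tuple-disjoint semantics).
answer = {tid: sum(p for v, p in alts if v == "Tempe") for tid, alts in pdb.items()}
print(answer)   # {'t1': 0.507...} after truncating to the top 2 candidates
```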
Contributors: Rihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2013