This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses in the ASU Digital Repository, ASU Theses and Dissertations can also be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Description
Wind measurements are fundamental inputs for the evaluation of potential energy yield and performance of wind farms. Three-dimensional scanning coherent Doppler lidar (CDL) may provide a new basis for wind farm site selection, design, and control. In this research, CDL measurements obtained from multiple wind energy developments are analyzed and a novel wind farm control approach has been modeled. The possibility of using lidar measurements to more fully characterize the wind field is discussed, specifically terrain effects, spatial variation of winds, power density, and the effect of shear at different layers within the rotor swept area. Various vector retrieval methods have been applied to the lidar data, and results are presented on an elevated terrain-following surface at hub height. The vector retrieval estimates are compared with tower measurements, after interpolation to the appropriate level. CDL data is used to estimate the spatial power density at hub height. Since CDL can measure winds at different vertical levels, an approach for estimating wind power density over the wind turbine rotor-swept area is explored. Sample optimized layouts of a wind farm using lidar data and global optimization algorithms, accounting for wake interaction effects, have been explored. An approach to evaluate spatial wind speed and direction estimates from a standard nested Coupled Ocean and Atmosphere Mesoscale Prediction System (COAMPS) model and CDL is presented. The magnitude of the spatial differences between observations and simulations for wind energy assessment is investigated. Diurnal effects and ramp events as estimated by CDL and COAMPS were intercompared. Novel wind farm control based on incoming wind speed and direction input from CDLs is developed. Both yaw and pitch control using scanning CDL for efficient wind farm operation are analyzed. The wind farm control optimizes power production and reduces loads on wind turbines for various lidar wind speed and direction inputs, accounting for wind farm wake losses and wind speed evolution. Several wind farm control configurations were developed for enhanced integrability into the electrical grid. Finally, the value proposition of CDL for a wind farm development, based on uncertainty reduction and return on investment, is analyzed.
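To make the rotor-swept-area estimate mentioned above concrete, the following is a minimal sketch of area-averaging a vertically sheared wind profile into a single power density figure for the rotor disc. It assumes a power-law shear profile and illustrative values for hub height, rotor diameter, shear exponent, and air density; the thesis instead derives the vertical profile from multi-level CDL measurements, so every name and parameter here is a hypothetical stand-in.

```python
import numpy as np

def rotor_power_density(u_hub, z_hub=80.0, rotor_d=90.0, alpha=0.14, rho=1.225, n=200):
    """Area-averaged wind power density (W/m^2) over a rotor disc.

    Assumes a power-law shear profile u(z) = u_hub * (z / z_hub)**alpha and
    weights each horizontal strip of the disc by its width. Illustrative only;
    measured level-by-level lidar speeds could replace the assumed profile.
    """
    r = rotor_d / 2.0
    z = np.linspace(z_hub - r, z_hub + r, n)                        # heights across the disc
    u = u_hub * (z / z_hub) ** alpha                                # sheared wind speed
    chord = 2.0 * np.sqrt(np.maximum(r**2 - (z - z_hub)**2, 0.0))   # strip widths of the disc
    power_density = 0.5 * rho * u**3                                # local power density, W/m^2
    return float(np.average(power_density, weights=chord))          # area-weighted mean

print(f"{rotor_power_density(8.0):.1f} W/m^2")   # e.g. an 8 m/s hub-height wind
```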
ContributorsKrishnamurthy, Raghavendra (Author) / Calhoun, Ronald J (Thesis advisor) / Chen, Kangping (Committee member) / Huang, Huei-Ping (Committee member) / Fraser, Matthew (Committee member) / Phelan, Patrick (Committee member) / Arizona State University (Publisher)
Created2013
Description
The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also on additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16 million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than the current state-of-the-art method. I present a detailed internal empirical evaluation of my method, RAProp, in comparison to several alternative approaches that I propose, as well as an external evaluation in comparison to the current state-of-the-art method.
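As an illustration of the propagation step described above, here is a minimal sketch that spreads per-tweet reputation scores over an agreement graph built from pairwise content similarity. The seed scores, damping factor, and update rule are illustrative assumptions, not the actual RAProp features or algorithm.

```python
import numpy as np

def propagate_reputation(seed_scores, similarity, damping=0.85, iters=50):
    """Propagate tweet reputation over an agreement graph.

    seed_scores: initial reputation from ecosystem features, shape [n].
    similarity:  pairwise content-similarity matrix, shape [n, n].
    Illustrative only; RAProp's actual scoring is defined in the thesis.
    """
    # Row-normalize the agreement graph so reputation diffuses to similar tweets.
    row_sums = similarity.sum(axis=1, keepdims=True)
    W = np.divide(similarity, row_sums, out=np.zeros_like(similarity), where=row_sums > 0)
    scores = seed_scores.astype(float)
    for _ in range(iters):
        scores = (1 - damping) * seed_scores + damping * (W.T @ scores)
    return scores   # rank tweets by descending score

seed = np.array([0.9, 0.2, 0.5])                  # hypothetical seed reputations
sim = np.array([[0.0, 0.8, 0.1],
                [0.8, 0.0, 0.2],
                [0.1, 0.2, 0.0]])                 # hypothetical content similarity
print(propagate_reputation(seed, sim))
```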
ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created2013
Description
Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.
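The second semantics-based method above compares the contexts in which a candidate mention appears against the contexts of known entity mentions. Below is a minimal sketch of that idea using raw co-occurrence counts and cosine similarity over a toy corpus; the window size, corpus, and scoring are illustrative assumptions rather than the representation actually used in the thesis.

```python
from collections import Counter
import math

def context_vector(token_seq, corpus_sentences, window=2):
    """Count words co-occurring with token_seq within a window, pooled over
    every occurrence of the sequence in an unlabeled corpus."""
    n = len(token_seq)
    counts = Counter()
    for sent in corpus_sentences:
        for i in range(len(sent) - n + 1):
            if sent[i:i + n] == token_seq:
                counts.update(sent[max(0, i - window):i])    # left context
                counts.update(sent[i + n:i + n + window])    # right context
    return counts

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: does the candidate "p53" appear in contexts like a known gene mention?
corpus = [["the", "p53", "protein", "binds", "dna"],
          ["the", "brca1", "protein", "regulates", "repair"]]
print(cosine(context_vector(["p53"], corpus), context_vector(["brca1"], corpus)))
```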
ContributorsLeaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created2013
Description
Derived from the necessity to increase the testing capabilities of hybrid rocket motor (HRM) propulsion systems for Daedalus Astronautics at Arizona State University, a small-scale motor and test stand were designed and developed to characterize all components of the system. The motor is designed for simple integration and setup, such that both the forward-end enclosure and end cap can be easily removed for rapid integration of components during testing. Each of the components of the motor is removable, allowing for a broad range of testing capabilities. In examining injectors and their potential, the goal is to obtain the highest regression rates and overall motor performance possible. The oxidizer and fuel are N2O and hydroxyl-terminated polybutadiene (HTPB), respectively, chosen for previous experience and simplicity. The injector designs, selected for the same reasons, vary only in swirl angle. This system provides the platform for characterizing the effects of varying the swirl angle on HRM performance.
ContributorsSummers, Matt H (Author) / Lee, Taewoo (Thesis advisor) / Chen, Kangping (Committee member) / Wells, Valana (Committee member) / Arizona State University (Publisher)
Created2013
Description
The atomization of a liquid jet by a high speed cross-flowing gas has many applications such as gas turbines and augmentors. The mechanisms by which the liquid jet initially breaks up, however, are not well understood. Experimental studies suggest the dependence of spray properties on operating conditions and nozzle geometry. Detailed numerical simulations can offer better understanding of the underlying physical mechanisms that lead to the breakup of the injected liquid jet. In this work, detailed numerical simulation results of turbulent liquid jets injected into turbulent gaseous cross flows for different density ratios are presented. A finite volume, balanced force fractional step flow solver to solve the Navier-Stokes equations is employed and coupled to a Refined Level Set Grid method to follow the phase interface. To enable the simulation of atomization of high density ratio fluids, we ensure discrete consistency between the solution of the conservative momentum equation and the level set based continuity equation by employing the Consistent Rescaled Momentum Transport (CRMT) method. The impact of different inflow jet boundary conditions on different jet properties including jet penetration is analyzed and results are compared to those obtained experimentally by Brown & McDonell (2006). In addition, instability analysis is performed to find the dominant instability mechanism that causes the liquid jet to break up. Linear analysis is carried out using linear theories for Rayleigh-Taylor and Kelvin-Helmholtz instabilities, and non-linear analysis is performed using our flow solver with different inflow jet boundary conditions.
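For reference, the sketch below evaluates the classical inviscid Kelvin-Helmholtz temporal growth rate for a planar interface between two semi-infinite fluids (gravity neglected, surface tension retained), the kind of linear-theory expression the analysis above draws on. The fluid properties and wavenumber range are illustrative assumptions, not values from the thesis, and the actual analysis may use different base flows and corrections.

```python
import numpy as np

def kh_growth_rate(k, rho_l, rho_g, du, sigma):
    """Temporal growth rate from the classical inviscid dispersion relation:
    omega_i^2 = k^2 rho_l rho_g du^2 / (rho_l + rho_g)^2 - sigma k^3 / (rho_l + rho_g).
    Stable modes (negative argument) are clipped to zero growth."""
    inertial = k**2 * rho_l * rho_g * du**2 / (rho_l + rho_g) ** 2
    capillary = sigma * k**3 / (rho_l + rho_g)
    return np.sqrt(np.maximum(inertial - capillary, 0.0))

k = np.linspace(1.0e3, 3.0e5, 2000)     # wavenumbers, 1/m (illustrative range)
omega = kh_growth_rate(k, rho_l=1000.0, rho_g=1.2, du=100.0, sigma=0.072)
k_star = k[np.argmax(omega)]
print(f"most unstable wavelength ~ {2 * np.pi / k_star * 1e6:.0f} micrometers")
```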
ContributorsGhods, Sina (Author) / Herrmann, Marcus (Thesis advisor) / Squires, Kyle (Committee member) / Chen, Kangping (Committee member) / Huang, Huei-Ping (Committee member) / Tang, Wenbo (Committee member) / Arizona State University (Publisher)
Created2013
Description
Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties such as data that contain the relevant consumption information but are stored in inconsistent formats, and insufficient data about project attributes with which to interpret the consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the preprocessing of the data is completed, data mining techniques such as clustering are applied to find projects that involve resources of similar skill sets and similar complexity and size. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Project characteristics that generate this diversity in headcounts and skill sets are then identified. These characteristics are not currently contained in the database and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match the product technical features with the resource requirements of projects in the past as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross-validation of the training data, as past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast resource demand for several future projects.
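A minimal sketch of the two steps above, grouping past projects into resource utilization templates and then forecasting headcount by skill from project features with cross-validated linear regression, is given below using scikit-learn. The synthetic data, cluster count, and feature dimensions are illustrative assumptions; the actual project attributes and model are those described in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy stand-ins for historical projects: rows are projects, columns are
# product/technical features elicited from managers (illustrative only).
rng = np.random.default_rng(0)
features = rng.uniform(0.0, 1.0, size=(30, 4))
headcount_by_skill = features @ np.array([[3.0, 1.0],
                                          [0.5, 2.0],
                                          [1.5, 0.0],
                                          [0.0, 1.0]]) + rng.normal(scale=0.2, size=(30, 2))

# Step 1: group projects into "resource utilization templates" by consumption profile.
templates = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(headcount_by_skill)
print("projects per template:", np.bincount(templates))

# Step 2: forecast headcount per skill from project features, cross-validating
# because the number of past project executions is small.
for skill in range(headcount_by_skill.shape[1]):
    scores = cross_val_score(LinearRegression(), features, headcount_by_skill[:, skill],
                             cv=5, scoring="r2")
    print(f"skill {skill}: mean CV R^2 = {scores.mean():.2f}")
```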
ContributorsBhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created2013
Description
The heat transfer enhancements available from expanding the cross-section of a boiling microchannel are explored analytically and experimentally. Evaluation of the literature on critical heat flux in flow boiling and associated pressure drop behavior is presented with predictive critical heat flux (CHF) and pressure drop correlations. An optimum channel configuration allowing maximum CHF while reducing pressure drop is sought. A perturbation of the channel diameter is employed to examine CHF and pressure drop relationships from the literature with the aim of identifying those adequately general and suitable for use in a scenario with an expanding channel. Several CHF criteria are identified which predict an optimizable channel expansion, though many do not. Pressure drop relationships admit improvement with expansion, and no optimum presents itself. The relevant physical phenomena surrounding flow boiling pressure drop are considered, and a balance of dimensionless numbers is presented that may be of qualitative use. The design, fabrication, inspection, and experimental evaluation of four copper microchannel arrays of different channel expansion rates with R-134a refrigerant is presented. Optimum rates of expansion which maximize the critical heat flux are considered at multiple flow rates, and experimental results are presented demonstrating optima. The effect of expansion on the boiling number is considered, and experiments demonstrate that expansion produces a notable increase in the boiling number in the region explored, though no optima are observed. Significant decrease in the pressure drop across the evaporator is observed with the expanding channels, and no optima appear. Discussion of the significance of this finding is presented, along with possible avenues for future work.
ContributorsMiner, Mark (Author) / Phelan, Patrick E (Thesis advisor) / Baer, Steven (Committee member) / Chamberlin, Ralph (Committee member) / Chen, Kangping (Committee member) / Herrmann, Marcus (Committee member) / Arizona State University (Publisher)
Created2013
Description
Contemporary online social platforms present individuals with social signals in the form of a news feed on their peers' activities. On networks such as Facebook and Quora, the network operator decides how that information is shown to an individual. The user, with her own interests and resource constraints, then selectively acts on a subset of the items presented to her. The network operator in turn shows that activity to a selection of peers, thus creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions, such as: can the network operator design social signals to promote a particular activity, such as sustainability or public health care awareness, or to promote a specific product? The focus of my thesis is to answer that question. In this thesis, I develop a framework to personalize social signals for users to guide their activities on an online platform. As a result, we gradually nudge the activity distribution on the platform from the initial distribution p to the target distribution q. My work is particularly applicable to guiding collaborations, guiding collective actions, and online advertising. In particular, I first propose a probabilistic model of how users behave and how information flows on the platform. The main part of this thesis after that discusses the Influence Individuals through Social Signals (IISS) framework. IISS consists of four main components: (1) Learner: it learns users' interests and characteristics from their historical activities using a Bayesian model; (2) Calculator: it uses a gradient descent method to compute the intermediate activity distributions; (3) Selector: it selects users who can be influenced to adopt or drop specific activities; (4) Designer: it personalizes social signals for each user. I evaluate the performance of the IISS framework by simulation on several network topologies such as preferential attachment, small world, and random. I show that the framework gradually nudges users' activities to approach the target distribution. I use both simulation and mathematical analysis to examine convergence properties, such as how fast and how closely we can approach the target distribution. When the number of activities is 3, I show that for about 45% of target distributions we can achieve a KL-divergence as low as 0.05. For some other distributions, however, the KL-divergence can be as large as 0.5.
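To make the nudging objective concrete, the sketch below measures the KL-divergence between the platform's current activity distribution p and the target q, and moves p a small step toward q each round. The simple interpolation step and the numbers are illustrative assumptions; in IISS the intermediate distributions are computed by the Calculator component via gradient descent and realized through personalized social signals.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete activity distributions, with a small floor
    to avoid log(0) and division by zero."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def nudge(p, q, step=0.1):
    """Move the current distribution a small step toward the target and
    renormalize (illustrative, not the actual IISS update rule)."""
    new_p = (1 - step) * np.asarray(p, float) + step * np.asarray(q, float)
    return new_p / new_p.sum()

p = np.array([0.7, 0.2, 0.1])    # current activity distribution (3 activities)
q = np.array([0.3, 0.4, 0.3])    # target distribution
for _ in range(20):
    p = nudge(p, q)
print(f"KL(p || q) after 20 rounds: {kl_divergence(p, q):.4f}")
```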
ContributorsLe, Tien D (Author) / Sundaram, Hari (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created2014
Description
Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean version of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates for a dirty tuple; picking any single one of them discards the other two, even though they are just as likely to be correct. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. First, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would it affect the precision of query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates for each dirty tuple, along with the probability that each is the correct cleaned version. I represent these alternatives as tuples in a tuple-disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe-PDB) is compared to a deterministic reconstruction (called BayesWipe-DET), in which the most likely clean candidate for each tuple is chosen and the rest of the alternatives are discarded.
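As an illustration of the tuple-disjoint representation discussed above, the sketch below stores each dirty tuple's clean candidates with their probabilities and answers a selection query by summing the probabilities of the alternatives that satisfy it. The relation, attribute names, and values are hypothetical; BayesWipe's candidate generation and Mystiq's query processing are not shown.

```python
# Each dirty tuple becomes a block of mutually exclusive clean alternatives
# (a tuple-disjoint probabilistic relation). All values are hypothetical.
probabilistic_db = {
    "t1": [({"make": "Honda", "model": "Civic"}, 0.6),
           ({"make": "Honda", "model": "Civic LX"}, 0.4)],
    "t2": [({"make": "Toyota", "model": "Corolla"}, 1.0)],
}

def answer_probability(db, predicate):
    """Probability that each dirty tuple satisfies a selection predicate:
    sum the probabilities of its alternatives that satisfy it."""
    return {tid: sum(prob for alt, prob in alts if predicate(alt))
            for tid, alts in db.items()}

# Query: which tuples are Honda Civics (any trim)?
result = answer_probability(
    probabilistic_db,
    lambda t: t["make"] == "Honda" and t["model"].startswith("Civic"))
print(result)   # t1 qualifies with probability 1.0; t2 with probability 0
```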
ContributorsRihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2013
Description
As the size and scope of valuable datasets have exploded across many industries and fields of research in recent years, an increasingly diverse audience has sought out effective tools for their large-scale data analytics needs. Over this period, machine learning researchers have also been very prolific in designing improved algorithms which are capable of finding the hidden structure within these datasets. As consumers of popular Big Data frameworks have sought to apply and benefit from these improved learning algorithms, the problems encountered with the frameworks have motivated a new generation of Big Data tools to address the shortcomings of the previous generation. One important example of this is the improved performance in the newer tools with the large class of machine learning algorithms which are highly iterative in nature. In this thesis project, I set out to implement a low-rank matrix completion algorithm (as an example of a highly iterative algorithm) within a popular Big Data framework, and to evaluate its performance processing the Netflix Prize dataset. I begin by describing several approaches which I attempted, but which did not perform adequately. These include an implementation of the Singular Value Thresholding (SVT) algorithm within the Apache Mahout framework, which runs on top of the Apache Hadoop MapReduce engine. I then describe an approach which uses the Divide-Factor-Combine (DFC) algorithmic framework to parallelize the state-of-the-art low-rank completion algorithm Orthogonal Rank-One Matrix Pursuit (OR1MP) within the Apache Spark engine. I describe the results of a series of tests running this implementation with the Netflix dataset on clusters of various sizes, with various degrees of parallelism. For these experiments, I utilized the Amazon Elastic Compute Cloud (EC2) web service. In the final analysis, I conclude that the Spark DFC + OR1MP implementation does indeed produce competitive results, in both accuracy and performance. In particular, the Spark implementation performs nearly as well as the MATLAB implementation of OR1MP without any parallelism, and improves performance to a significant degree as the parallelism increases. In addition, the experience demonstrates how Spark's flexible programming model makes it straightforward to implement this parallel and iterative machine learning algorithm.
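To give a feel for the highly iterative computation involved, the sketch below performs greedy rank-one matrix completion on a small synthetic matrix: each pass takes the leading singular pair of the masked residual as a new basis and re-fits the combination weights on the observed entries. It is a simplified, single-machine illustration in the spirit of rank-one pursuit, not the published OR1MP algorithm, the DFC partitioning, or the Spark implementation; all names and sizes are illustrative.

```python
import numpy as np

def rank_one_pursuit(M, mask, rank=5):
    """Greedy low-rank completion on the observed entries of M (mask == True).
    Simplified sketch; not the published OR1MP algorithm."""
    residual = M * mask
    bases = []
    estimate = np.zeros_like(M)
    for _ in range(rank):
        u, s, vt = np.linalg.svd(residual, full_matrices=False)
        bases.append(np.outer(u[:, 0], vt[0]))                  # new rank-one basis
        A = np.stack([b[mask] for b in bases], axis=1)           # bases on observed entries
        theta, *_ = np.linalg.lstsq(A, M[mask], rcond=None)      # re-fit combination weights
        estimate = sum(t * b for t, b in zip(theta, bases))
        residual = (M - estimate) * mask
    return estimate

rng = np.random.default_rng(0)
truth = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))      # true low-rank matrix
mask = rng.random(truth.shape) < 0.5                             # ~50% of entries observed
completed = rank_one_pursuit(truth, mask, rank=3)
print(f"RMSE on held-out entries: {np.sqrt(np.mean((completed - truth)[~mask] ** 2)):.3f}")
```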
ContributorsKrouse, Brian (Author) / Ye, Jieping (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2014