This collection includes most of the ASU Theses and Dissertations from 2011 to present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection contact or visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Displaying 1 - 10 of 327
Filtering by

Clear all filters

150019-Thumbnail Image.png
Description
Currently Java is making its way into the embedded systems and mobile devices like androids. The programs written in Java are compiled into machine independent binary class byte codes. A Java Virtual Machine (JVM) executes these classes. The Java platform additionally specifies the Java Native Interface (JNI). JNI allows Java

Currently Java is making its way into the embedded systems and mobile devices like androids. The programs written in Java are compiled into machine independent binary class byte codes. A Java Virtual Machine (JVM) executes these classes. The Java platform additionally specifies the Java Native Interface (JNI). JNI allows Java code that runs within a JVM to interoperate with applications or libraries that are written in other languages and compiled to the host CPU ISA. JNI plays an important role in embedded system as it provides a mechanism to interact with libraries specific to the platform. This thesis addresses the overhead incurred in the JNI due to reflection and serialization when objects are accessed on android based mobile devices. It provides techniques to reduce this overhead. It also provides an API to access objects through its reference through pinning its memory location. The Android emulator was used to evaluate the performance of these techniques and we observed that there was 5 - 10 % performance gain in the new Java Native Interface.
ContributorsChandrian, Preetham (Author) / Lee, Yann-Hang (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created2011
150026-Thumbnail Image.png
Description
As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently access structured data and potentially significantly enhances the usability of structured data. However, processing keyword search on structured data is challenging due to various types of ambiguities such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as the efficiency challenges due to large search space. This dissertation performs an expansive study on keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data.
ContributorsLiu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)
Created2011
149992-Thumbnail Image.png
Description
Process variations have become increasingly important for scaled technologies starting at 45nm. The increased variations are primarily due to random dopant fluctuations, line-edge roughness and oxide thickness fluctuation. These variations greatly impact all aspects of circuit performance and pose a grand challenge to future robust IC design. To improve robustness,

Process variations have become increasingly important for scaled technologies starting at 45nm. The increased variations are primarily due to random dopant fluctuations, line-edge roughness and oxide thickness fluctuation. These variations greatly impact all aspects of circuit performance and pose a grand challenge to future robust IC design. To improve robustness, efficient methodology is required that considers effect of variations in the design flow. Analyzing timing variability of complex circuits with HSPICE simulations is very time consuming. This thesis proposes an analytical model to predict variability in CMOS circuits that is quick and accurate. There are several analytical models to estimate nominal delay performance but very little work has been done to accurately model delay variability. The proposed model is comprehensive and estimates nominal delay and variability as a function of transistor width, load capacitance and transition time. First, models are developed for library gates and the accuracy of the models is verified with HSPICE simulations for 45nm and 32nm technology nodes. The difference between predicted and simulated σ/μ for the library gates is less than 1%. Next, the accuracy of the model for nominal delay is verified for larger circuits including ISCAS'85 benchmark circuits. The model predicted results are within 4% error of HSPICE simulated results and take a small fraction of the time, for 45nm technology. Delay variability is analyzed for various paths and it is observed that non-critical paths can become critical because of Vth variation. Variability on shortest paths show that rate of hold violations increase enormously with increasing Vth variation.
ContributorsGummalla, Samatha (Author) / Chakrabarti, Chaitali (Thesis advisor) / Cao, Yu (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Arizona State University (Publisher)
Created2011
149794-Thumbnail Image.png
Description
Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance as compared to two well-known gene prioritization algorithms. Essentially, no bias in the performance was seen as it was applied to diseases of diverse ethnology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of associationbased gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are mostly unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our networkbased model for prioritization, with encouraging results. Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance as compared to two well-known gene prioritization algorithms. Essentially, no bias in the performance was seen as it was applied to diseases of diverse ethnology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of associationbased gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are mostly unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our networkbased model for prioritization, with encouraging results. To validate these approaches in a disease-specific context, we built a schizophreniaspecific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets are very positive.
ContributorsLee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)
Created2011
149907-Thumbnail Image.png
Description
Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and

Most existing approaches to complex event processing over streaming data rely on the assumption that the matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and user credit card purchase pattern monitoring, however the matches to the user queries are in fact plentiful and the system has to efficiently sift through these many matches to locate only the few most preferable matches. In this work, we propose a complex pattern ranking (CPR) framework for specifying top-k pattern queries over streaming data, present new algorithms to support top-k pattern queries in data streaming environments, and verify the effectiveness and efficiency of the proposed algorithms. The developed algorithms identify top-k matching results satisfying both patterns as well as additional criteria. To support real-time processing of the data streams, instead of computing top-k results from scratch for each time window, we maintain top-k results dynamically as new events come and old ones expire. We also develop new top-k join execution strategies that are able to adapt to the changing situations (e.g., sorted and random access costs, join rates) without having to assume a priori presence of data statistics. Experiments show significant improvements over existing approaches.
ContributorsWang, Xinxin (Author) / Candan, K. Selcuk (Thesis advisor) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2011
149695-Thumbnail Image.png
Description
Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view. Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment.
ContributorsChaudhari, Mahesh Balkrishna (Author) / Dietrich, Suzanne W (Thesis advisor) / Urban, Susan D (Committee member) / Davulcu, Hasan (Committee member) / Chen, Yi (Committee member) / Arizona State University (Publisher)
Created2011
150167-Thumbnail Image.png
Description
Redundant Binary (RBR) number representations have been extensively used in the past for high-throughput Digital Signal Processing (DSP) systems. Data-path components based on this number system have smaller critical path delay but larger area compared to conventional two's complement systems. This work explores the use of RBR number representation for

Redundant Binary (RBR) number representations have been extensively used in the past for high-throughput Digital Signal Processing (DSP) systems. Data-path components based on this number system have smaller critical path delay but larger area compared to conventional two's complement systems. This work explores the use of RBR number representation for implementing high-throughput DSP systems that are also energy-efficient. Data-path components such as adders and multipliers are evaluated with respect to critical path delay, energy and Energy-Delay Product (EDP). A new design for a RBR adder with very good EDP performance has been proposed. The corresponding RBR parallel adder has a much lower critical path delay and EDP compared to two's complement carry select and carry look-ahead adder implementations. Next, several RBR multiplier architectures are investigated and their performance compared to two's complement systems. These include two new multiplier architectures: a purely RBR multiplier where both the operands are in RBR form, and a hybrid multiplier where the multiplicand is in RBR form and the other operand is represented in conventional two's complement form. Both the RBR and hybrid designs are demonstrated to have better EDP performance compared to conventional two's complement multipliers. The hybrid multiplier is also shown to have a superior EDP performance compared to the RBR multiplier, with much lower implementation area. Analysis on the effect of bit-precision is also performed, and it is shown that the performance gain of RBR systems improves for higher bit precision. Next, in order to demonstrate the efficacy of the RBR representation at the system-level, the performance of RBR and hybrid implementations of some common DSP kernels such as Discrete Cosine Transform, edge detection using Sobel operator, complex multiplication, Lifting-based Discrete Wavelet Transform (9, 7) filter, and FIR filter, is compared with two's complement systems. It is shown that for relatively large computation modules, the RBR to two's complement conversion overhead gets amortized. In case of systems with high complexity, for iso-throughput, both the hybrid and RBR implementations are demonstrated to be superior with lower average energy consumption. For low complexity systems, the conversion overhead is significant, and overpowers the EDP performance gain obtained from the RBR computation operation.
ContributorsMahadevan, Rupa (Author) / Chakrabarti, Chaitali (Thesis advisor) / Kiaei, Sayfe (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created2011
150212-Thumbnail Image.png
Description
This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management

This thesis addresses the problem of online schema updates where the goal is to be able to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach which is completely client-driven and does not require specialized database management systems (DBMS). Also, unlike other client-driven work, this approach provides support for a richer set of schema updates including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference and intersection. The update process automatically generates a runtime update client from a mapping between the old the new schemas. The solution has been validated by testing it on a relatively small database of around 300,000 records per table and less than 1 Gb, but with limited memory buffer size of 24 Mb. This thesis presents the study of the overhead of the update process as a function of the transaction rates and the batch size used to copy data from the old to the new schema. It shows that the overhead introduced is minimal for medium size applications and that the update can be achieved with no more than one minute of downtime.
ContributorsTyagi, Preetika (Author) / Bazzi, Rida (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2011
150187-Thumbnail Image.png
Description
Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry,

Genomic and proteomic sequences, which are in the form of deoxyribonucleic acid (DNA) and amino acids respectively, play a vital role in the structure, function and diversity of every living cell. As a result, various genomic and proteomic sequence processing methods have been proposed from diverse disciplines, including biology, chemistry, physics, computer science and electrical engineering. In particular, signal processing techniques were applied to the problems of sequence querying and alignment, that compare and classify regions of similarity in the sequences based on their composition. However, although current approaches obtain results that can be attributed to key biological properties, they require pre-processing and lack robustness to sequence repetitions. In addition, these approaches do not provide much support for efficiently querying sub-sequences, a process that is essential for tracking localized database matches. In this work, a query-based alignment method for biological sequences that maps sequences to time-domain waveforms before processing the waveforms for alignment in the time-frequency plane is first proposed. The mapping uses waveforms, such as time-domain Gaussian functions, with unique sequence representations in the time-frequency plane. The proposed alignment method employs a robust querying algorithm that utilizes a time-frequency signal expansion whose basis function is matched to the basic waveform in the mapped sequences. The resulting WAVEQuery approach is demonstrated for both DNA and protein sequences using the matching pursuit decomposition as the signal basis expansion. The alignment localization of WAVEQuery is specifically evaluated over repetitive database segments, and operable in real-time without pre-processing. It is demonstrated that WAVEQuery significantly outperforms the biological sequence alignment method BLAST for queries with repetitive segments for DNA sequences. A generalized version of the WAVEQuery approach with the metaplectic transform is also described for protein sequence structure prediction. For protein alignment, it is often necessary to not only compare the one-dimensional (1-D) primary sequence structure but also the secondary and tertiary three-dimensional (3-D) space structures. This is done after considering the conformations in the 3-D space due to the degrees of freedom of these structures. As a result, a novel directionality based 3-D waveform mapping for the 3-D protein structures is also proposed and it is used to compare protein structures using a matched filter approach. By incorporating a 3-D time axis, a highly-localized Gaussian-windowed chirp waveform is defined, and the amino acid information is mapped to the chirp parameters that are then directly used to obtain directionality in the 3-D space. This mapping is unique in that additional characteristic protein information such as hydrophobicity, that relates the sequence with the structure, can be added as another representation parameter. The additional parameter helps tracking similarities over local segments of the structure, this enabling classification of distantly related proteins which have partial structural similarities. This approach is successfully tested for pairwise alignments over full length structures, alignments over multiple structures to form a phylogenetic trees, and also alignments over local segments. Also, basic classification over protein structural classes using directional descriptors for the protein structure is performed.
ContributorsRavichandran, Lakshminarayan (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Spanias, Andreas S (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Lacroix, Zoé (Committee member) / Arizona State University (Publisher)
Created2011
150189-Thumbnail Image.png
Description
This thesis research attempts to observe, measure and visualize the communication patterns among developers of an open source community and analyze how this can be inferred in terms of progress of that open source project. Here I attempted to analyze the Ubuntu open source project's email data (9 subproject log

This thesis research attempts to observe, measure and visualize the communication patterns among developers of an open source community and analyze how this can be inferred in terms of progress of that open source project. Here I attempted to analyze the Ubuntu open source project's email data (9 subproject log archives over a period of five years) and focused on drawing more precise metrics from different perspectives of the communication data. Also, I attempted to overcome the scalability issue by using Apache Pig libraries, which run on a MapReduce framework based Hadoop Cluster. I described four metrics based on which I observed and analyzed the data and also presented the results which show the required patterns and anomalies to better understand and infer the communication. Also described the usage experience with Pig Latin (scripting language of Apache Pig Libraries) for this research and how they brought the feature of scalability, simplicity, and visibility in this data intensive research work. These approaches are useful in project monitoring, to augment human observation and reporting, in social network analysis, to track individual contributions.
ContributorsMotamarri, Lakshminarayana (Author) / Santanam, Raghu (Thesis advisor) / Ye, Jieping (Thesis advisor) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2011