Matching Items (116)

Description
Java is currently making its way into embedded systems and mobile devices such as Android. Programs written in Java are compiled into machine-independent binary bytecode classes, which a Java Virtual Machine (JVM) executes. The Java platform additionally specifies the Java Native Interface (JNI), which allows Java code running within a JVM to interoperate with applications or libraries written in other languages and compiled to the host CPU ISA. JNI plays an important role in embedded systems because it provides a mechanism for interacting with platform-specific libraries. This thesis addresses the overhead incurred in the JNI due to reflection and serialization when objects are accessed on Android-based mobile devices, and it provides techniques to reduce this overhead. It also provides an API for accessing objects by reference by pinning their memory locations. The Android emulator was used to evaluate the performance of these techniques, and a 5-10% performance gain was observed with the new Java Native Interface.
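To make the overhead concrete, below is a minimal Java-side sketch with hypothetical library, class, and method names (not the thesis's actual API). In standard JNI, native code reaches into Java objects through reflective lookups such as GetFieldID/GetObjectField, one per field access; pinning functions such as GetPrimitiveArrayCritical instead let native code read an array's memory in place.

```java
// Minimal sketch of the Java side of a JNI call path (hypothetical names).
public class SensorBridge {
    static {
        System.loadLibrary("sensorbridge"); // hypothetical native library
    }

    // Reflection-style path: the native implementation would look up and
    // read each Reading's fields individually via GetFieldID/GetObjectField,
    // which is the kind of per-access overhead the thesis targets.
    public static native double sumViaFieldAccess(Reading[] readings);

    // Pinned path: the native implementation would call
    // GetPrimitiveArrayCritical to pin the array, read the values directly
    // from its memory location, then release it.
    public static native double sumViaPinnedArray(double[] values);

    public static class Reading {
        public double value;
        public Reading(double value) { this.value = value; }
    }
}
```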
Contributors: Chandrian, Preetham (Author) / Lee, Yann-Hang (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
The North American Monsoon System (NAMS) contributes ~55% of the annual rainfall in the Chihuahuan Desert during the summer months. Relatively frequent, intense storms during the NAMS increase soil moisture, reduce surface temperature, and lead to runoff in ephemeral channels. Quantifying these processes, however, is difficult due to the sparse nature of coordinated observations. In this study, I present results from a field network of rain gauges (n = 5), soil probes (n = 48), channel flumes (n = 4), and meteorological equipment in a small desert shrubland watershed (~0.05 km²) in the Jornada Experimental Range. Using this high-resolution network, I characterize the temporal and spatial variability of rainfall, soil conditions, and channel runoff within the watershed from June 2010 to September 2011, covering two NAMS periods. In addition, CO2, water, and energy measurements at an eddy covariance tower quantify seasonal, monthly, and event-scale changes in land-atmosphere states and fluxes. Results from this study indicate a strong seasonality in water and energy fluxes, with a reduction in Bowen ratio (B, the ratio of sensible to latent heat fluxes) from winter (B = 14) to summer (B = 3.3). This reduction is tied to shallow soil moisture availability during the summer (s = 0.040 m³/m³) as compared to the winter (s = 0.004 m³/m³). During the NAMS, I analyzed four consecutive rainfall-runoff events to quantify the soil moisture and channel flow responses and how water availability impacted the land-atmosphere fluxes. Spatial hydrologic variations during events occur over distances as short as ~15 m. The field network also allowed comparisons of several approaches to estimating evapotranspiration (ET). I found a more accurate ET estimate (a reduction of mean absolute error by 38%) when using distributed soil moisture data, as compared to a standard water balance approach based on the tower site. In addition, use of spatially-varied soil moisture data yielded a more reasonable relationship between ET and soil moisture, an important parameterization in many hydrologic models. The analyses illustrate the value of high-resolution sampling for quantifying seasonal fluxes in desert shrublands and for closing the water balance in small watersheds.
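The two quantities at the center of this analysis can be shown with a short illustrative sketch; the numbers and variable names below are assumptions for illustration, not the study's data.

```java
// Illustrative sketch: the Bowen ratio and a simple bucket water balance of
// the kind compared against the distributed soil moisture ET estimate.
public class FluxExample {
    // Bowen ratio B = H / LE: sensible heat flux over latent heat flux (W/m^2).
    static double bowenRatio(double sensibleH, double latentLE) {
        return sensibleH / latentLE;
    }

    // Daily bucket water balance: ET = P - R - dS, where P is rainfall, R is
    // channel runoff, and dS is the change in soil water storage (all in mm).
    // With distributed probes, dS can be averaged over many locations rather
    // than taken from the single tower site.
    static double waterBalanceET(double rainfallP, double runoffR, double deltaStorage) {
        return rainfallP - runoffR - deltaStorage;
    }

    public static void main(String[] args) {
        System.out.println(bowenRatio(140.0, 10.0));        // winter-like, B = 14
        System.out.println(bowenRatio(165.0, 50.0));        // summer-like, B = 3.3
        System.out.println(waterBalanceET(12.0, 2.0, 4.0)); // 6.0 mm/day
    }
}
```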
Contributors: Templeton, Ryan (Author) / Vivoni, Enrique R (Thesis advisor) / Mays, Larry (Committee member) / Fox, Peter (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
Query expansion is a search engine functionality that suggests a set of related queries for a user-issued keyword query. In the case of exploratory or ambiguous keyword queries, the user's main goal is to identify and select a specific category of query results among the different categorical options, in order to narrow down the search and reach the desired result. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. These empirical methods fail to cover all semantics of the categories present in the query results. More importantly, these methods do not consider the semantic relationship between the keywords featured in an expanded query. Contrary to a normal keyword search setting, these factors are non-trivial in an exploratory and ambiguous query setting, where the user's precise discernment of the different categories present in the query results is more important for making subsequent search decisions. In this thesis, I propose a new framework for keyword query expansion that generates a set of queries corresponding to the categorization of the original query results, referred to as categorizing query expansion. Two classes of algorithms are proposed: one performs clustering as a pre-processing step and then generates categorizing expanded queries based on the clusters; the other handles the case of generating quality expanded queries in the presence of imperfect clusters.
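A minimal sketch of the cluster-first approach follows, under assumed structures: the clustering itself is taken as given, and raw term frequency stands in for whatever quality criteria the thesis actually uses to pick category-describing keywords.

```java
// Illustrative sketch (assumed structure, not the thesis's algorithm) of the
// cluster-first approach: given query results already grouped into clusters,
// emit one expanded query per cluster built from its most frequent terms.
import java.util.*;
import java.util.stream.*;

public class CategorizingExpansion {
    static List<String> expand(List<List<String>> clusters, int k) {
        List<String> expandedQueries = new ArrayList<>();
        for (List<String> cluster : clusters) {
            // Count term frequencies across the cluster's result snippets.
            Map<String, Long> freq = cluster.stream()
                .flatMap(snippet -> Arrays.stream(snippet.toLowerCase().split("\\W+")))
                .filter(t -> t.length() > 2)
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
            // The expanded query for this category is its k most frequent terms.
            String query = freq.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.joining(" "));
            expandedQueries.add(query);
        }
        return expandedQueries;
    }

    public static void main(String[] args) {
        List<List<String>> clusters = List.of(
            List.of("jaguar car dealer", "jaguar car price"),
            List.of("jaguar cat habitat", "jaguar cat diet"));
        System.out.println(expand(clusters, 2)); // e.g. [jaguar car, jaguar cat]
    }
}
```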
Contributors: Natarajan, Sivaramakrishnan (Author) / Chen, Yi (Thesis advisor) / Candan, Selcuk (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
Most existing approaches to complex event processing over streaming data rely on the assumption that matches to the queries are rare and that the goal of the system is to identify these few matches within the incoming deluge of data. In many applications, such as stock market analysis and user credit card purchase pattern monitoring, however, the matches to the user queries are in fact plentiful, and the system has to efficiently sift through these many matches to locate only the few most preferable ones. In this work, we propose a complex pattern ranking (CPR) framework for specifying top-k pattern queries over streaming data, present new algorithms to support top-k pattern queries in data streaming environments, and verify the effectiveness and efficiency of the proposed algorithms. The developed algorithms identify the top-k matching results that satisfy both the patterns and additional criteria. To support real-time processing of the data streams, instead of computing top-k results from scratch for each time window, we maintain the top-k results dynamically as new events arrive and old ones expire. We also develop new top-k join execution strategies that are able to adapt to changing conditions (e.g., sorted and random access costs, join rates) without having to assume the a priori presence of data statistics. Experiments show significant improvements over existing approaches.
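A minimal sketch of the incremental maintenance idea, with assumed event and window structures (not the CPR framework itself): keep the live matches in two views, one ordered by arrival time for expiry and one ordered by score, so the top-k never has to be recomputed from scratch.

```java
// Illustrative sketch: maintain top-k matches over a sliding time window.
// Assumes distinct (score, timestamp) pairs for simplicity of tie-breaking.
import java.util.*;

public class SlidingTopK {
    record Match(long timestamp, double score) {}

    private final long windowMillis;
    private final int k;
    private final Deque<Match> window = new ArrayDeque<>();     // arrival order
    private final TreeSet<Match> byScore = new TreeSet<>(
        Comparator.comparingDouble(Match::score).reversed()
                  .thenComparingLong(Match::timestamp));        // score order

    SlidingTopK(long windowMillis, int k) {
        this.windowMillis = windowMillis;
        this.k = k;
    }

    // On each new event: expire old matches, insert the new one. Both views
    // stay in sync, so the current top-k is always the first k of byScore.
    void onMatch(Match m) {
        while (!window.isEmpty()
                && window.peekFirst().timestamp() <= m.timestamp() - windowMillis) {
            byScore.remove(window.pollFirst());
        }
        window.addLast(m);
        byScore.add(m);
    }

    List<Match> topK() {
        return byScore.stream().limit(k).toList();
    }
}
```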
Contributors: Wang, Xinxin (Author) / Candan, K. Selcuk (Thesis advisor) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
K-nearest-neighbors (KNN) search is a fundamental problem in many application domains, such as databases and data mining, information retrieval, machine learning, pattern recognition, and plagiarism detection. Locality-sensitive hashing (LSH) is so far the most practical approximate KNN search algorithm for high-dimensional data. Algorithms such as Multi-Probe LSH and LSH-Forest improve upon the basic LSH algorithm by varying hash bucket size dynamically at query time, so these two algorithms can answer different KNN queries adaptively. However, both need a data access post-processing step after candidate collection in order to produce the final answer to the KNN query. In this thesis, the Multi-Probe LSH with data access post-processing (Multi-Probe LSH with DAPP) algorithm and the LSH-Forest with data access post-processing (LSH-Forest with DAPP) algorithm are improved by replacing the costly data access post-processing (DAPP) step with a much faster histogram-based post-processing (HBPP). Two HBPP algorithms, LSH-Forest with HBPP and Multi-Probe LSH with HBPP, are presented in this thesis; both achieve the three goals for KNN search in large-scale, high-dimensional data sets: high search quality, high time efficiency, and high space efficiency. No previous KNN algorithm achieves all three goals. More specifically, it is shown that the HBPP algorithms always achieve high search quality (as good as LSH-Forest with DAPP and Multi-Probe LSH with DAPP) at a much lower time cost (one to several orders of magnitude speedup) and the same memory usage. It is also shown that, with almost the same time cost and memory usage, the HBPP algorithms always achieve better search quality than LSH-Forest with random pick (LSH-Forest with RP) and Multi-Probe LSH with random pick (Multi-Probe LSH with RP). Moreover, to achieve very high search quality, Multi-Probe LSH with HBPP is always a better choice than LSH-Forest with HBPP, regardless of the distribution, size, and dimensionality of the data set.
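For context, here is a minimal sketch of basic LSH including the costly DAPP step that the thesis replaces. The random-hyperplane hash family used below is a standard choice for illustration, not necessarily the one used in the thesis.

```java
// Illustrative sketch: basic LSH with candidate collection followed by data
// access post-processing (DAPP), i.e., touching every candidate's data to
// compute exact distances before returning the k nearest.
import java.util.*;

public class BasicLsh {
    private final double[][] hyperplanes;  // one random hyperplane per signature bit
    private final Map<Integer, List<double[]>> buckets = new HashMap<>();

    BasicLsh(int bits, int dim, Random rng) {
        hyperplanes = new double[bits][dim];
        for (double[] h : hyperplanes)
            for (int d = 0; d < dim; d++) h[d] = rng.nextGaussian();
    }

    private int signature(double[] v) {
        int sig = 0;
        for (int b = 0; b < hyperplanes.length; b++) {
            double dot = 0;
            for (int d = 0; d < v.length; d++) dot += hyperplanes[b][d] * v[d];
            if (dot >= 0) sig |= 1 << b;   // one bit per hyperplane side
        }
        return sig;
    }

    void insert(double[] v) {
        buckets.computeIfAbsent(signature(v), s -> new ArrayList<>()).add(v);
    }

    // Candidate collection + DAPP: this exact-distance pass over candidate
    // data is the expensive step HBPP replaces with a histogram-based one.
    List<double[]> query(double[] q, int k) {
        List<double[]> candidates = buckets.getOrDefault(signature(q), List.of());
        return candidates.stream()
            .sorted(Comparator.comparingDouble((double[] c) -> distance(q, c)))
            .limit(k).toList();
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }
}
```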
Contributors: Yu, Renwei (Author) / Candan, Kasim S (Thesis advisor) / Sapino, Maria L (Committee member) / Chen, Yi (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
Local municipalities in the Phoenix Metropolitan Area have voiced an interest in purchasing alternate source water with lower disinfection byproduct (DBP) precursors. Along the primary source is a hydroelectric dam from which water will be diverted. This project is an assessment of optimizing the potential blends of source water delivered to a water treatment plant (WTP), in an effort to enable the plant to more readily meet DBP regulations. To perform this analysis, existing water treatment models were used in conjunction with historic water quality sampling data to predict the chemical usage necessary to meet DBP regulations. A retrospective analysis for the summer months of 2007 showed the potential for the WTP to reduce costs through source water optimization by an average of 30% over the four-month period, amounting to overall treatment savings of $154 per MG ($82 per AF).
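As a simplified illustration of the blending calculation (assumed concentrations, costs, and a linear mixing model; not the thesis's treatment models or data), the sketch below solves for the smallest alternate-source fraction that meets a precursor target and prices the resulting blend.

```java
// Illustrative sketch: cheapest blend fraction of a low-DBP alternate source
// that keeps the blended precursor level under a target. All numbers assumed.
public class BlendOptimizer {
    public static void main(String[] args) {
        double primaryTOC    = 4.0;  // mg/L DBP precursor (e.g., TOC), primary source
        double alternateTOC  = 2.5;  // mg/L, alternate (hydro dam) source
        double targetTOC     = 3.0;  // blended level needed to meet DBP rules
        double primaryCost   = 200;  // $/MG, assumed
        double alternateCost = 320;  // $/MG, assumed (alternate water costs more)

        // Linear blending: blendTOC = f*alternateTOC + (1 - f)*primaryTOC.
        // Solve for the smallest alternate fraction f meeting the target.
        double f = (primaryTOC - targetTOC) / (primaryTOC - alternateTOC);
        f = Math.max(0, Math.min(1, f));
        double blendCost = f * alternateCost + (1 - f) * primaryCost;

        System.out.printf("alternate fraction = %.2f, blend cost = $%.0f/MG%n",
                          f, blendCost);  // -> 0.67, $280/MG
    }
}
```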
Contributors: Rice, Jacelyn (Author) / Westerhoff, Paul (Thesis advisor) / Fox, Peter (Committee member) / Hristovski, Kiril (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
This thesis addresses the problem of online schema updates, where the goal is to update relational database schemas without reducing the database system's availability. Unlike some other work in this area, this thesis presents an approach that is completely client-driven and does not require a specialized database management system (DBMS). Also, unlike other client-driven work, this approach supports a richer set of schema updates, including vertical split (normalization), horizontal split, vertical and horizontal merge (union), difference, and intersection. The update process automatically generates a runtime update client from a mapping between the old and new schemas. The solution has been validated by testing it on a relatively small database of around 300,000 records per table and less than 1 GB, but with a limited memory buffer size of 24 MB. This thesis presents a study of the overhead of the update process as a function of the transaction rate and the batch size used to copy data from the old to the new schema. It shows that the overhead introduced is minimal for medium-size applications and that the update can be achieved with no more than one minute of downtime.
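A minimal sketch of the batched copy loop such a generated runtime update client might perform, with hypothetical table and column names: rows are copied in small batches keyed by a high-water mark, so regular transactions keep running between batches.

```java
// Illustrative sketch: client-driven batched copy from the old schema to the
// new one while the database stays online. SQL dialect details (e.g., LIMIT)
// vary by DBMS; table and column names are hypothetical.
import java.sql.*;

public class BatchCopier {
    static void copy(Connection conn, int batchSize) throws SQLException {
        conn.setAutoCommit(false);
        long lastId = 0;
        try (PreparedStatement read = conn.prepareStatement(
                 "SELECT id, name, address FROM customers_old WHERE id > ? ORDER BY id LIMIT ?");
             PreparedStatement write = conn.prepareStatement(
                 "INSERT INTO customers_new (id, name, address) VALUES (?, ?, ?)")) {
            while (true) {
                read.setLong(1, lastId);
                read.setInt(2, batchSize);
                int copied = 0;
                try (ResultSet rs = read.executeQuery()) {
                    while (rs.next()) {
                        write.setLong(1, rs.getLong("id"));
                        write.setString(2, rs.getString("name"));
                        write.setString(3, rs.getString("address"));
                        write.addBatch();
                        lastId = rs.getLong("id");   // advance the high-water mark
                        copied++;
                    }
                }
                if (copied == 0) break;   // old table fully copied
                write.executeBatch();
                conn.commit();            // short transactions keep blocking minimal
            }
        }
    }
}
```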
Contributors: Tyagi, Preetika (Author) / Bazzi, Rida (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
Disinfection byproducts (DBPs) are the result of reactions between natural organic matter (NOM) and a disinfectant. The formation and speciation of DBPs depend largely on the disinfectant used and on the NOM concentration and composition. This study examined the use of photocatalysis with titanium dioxide for the oxidation and removal of DBP precursors (NOM) and the inhibition of DBP formation. Water samples were collected from various points in the treatment process, treated with photocatalysis, and chlorinated to analyze the implications for total trihalomethane (TTHM) and five haloacetic acid (HAA5) formation. The three sub-objectives of this study were: comparing enhanced and standard coagulation to photocatalysis for the removal of DBP precursors; analyzing photocatalysis and characterizing organic matter using size exclusion chromatography and fluorescence spectroscopy with excitation-emission matrices; and analyzing photocatalysis before granular activated carbon (GAC) filtration. The trends for each objective were consistent, including reduced DBP precursors, measured as dissolved organic carbon (DOC) concentration and UV absorbance at 254 nm. Both of these parameters decreased with increased photocatalytic treatment, which could be due in part to the adsorption of NOM onto, as well as its oxidation on, the TiO2 surface. This resulted in lower THM and HAA concentrations at the Medium and High photocatalytic treatment levels. However, at the No UV exposure and Low photocatalytic treatment levels, where oxidation reactions were inherently incomplete, THM and HAA formation potentials increased, in most cases significantly exceeding those found in the raw water or Control samples. The size exclusion chromatography (SEC) results suggest that photocatalysis preferentially degrades the higher-molecular-mass fraction of NOM, releasing lower-molecular-mass (LMM) compounds that have not been completely oxidized. The molecular weight distributions could explain the THM and HAA formation potentials that decreased in the No UV exposure samples but increased at the Low photocatalytic treatment levels. The use of photocatalysis before GAC adsorption appears to increase the bed life of the contactors; however, higher photocatalytic treatment levels have been shown to completely mineralize NOM and would therefore not require additional GAC adsorption after photocatalysis.
Contributors: Daugherty, Erin (Author) / Abbaszadegan, Morteza (Thesis advisor) / Fox, Peter (Committee member) / Mayer, Brooke (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
This thesis research attempts to observe, measure, and visualize the communication patterns among developers of an open source community, and to analyze what these patterns suggest about the progress of that open source project. I analyzed the Ubuntu open source project's email data (nine subproject log archives over a period of five years), focusing on drawing more precise metrics from different perspectives of the communication data. I also addressed the scalability issue by using the Apache Pig libraries, which run on a MapReduce-based Hadoop cluster. I describe four metrics on which the observation and analysis of the data are based, and present results showing the patterns and anomalies needed to better understand and characterize the communication. I also describe the experience of using Pig Latin (the scripting language of the Apache Pig libraries) for this research and how it brought scalability, simplicity, and visibility to this data-intensive work. These approaches are useful in project monitoring, to augment human observation and reporting, and in social network analysis, to track individual contributions.
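For a flavor of the aggregations involved, the sketch below computes one metric of this kind, message volume per sender per month, in plain Java streams rather than Pig Latin (where it would be a LOAD / GROUP / FOREACH pipeline). The log file name and tab-separated format are assumptions, not the thesis's actual data layout.

```java
// Illustrative sketch: group-and-count over mailing-list log lines of the
// assumed form "ISO-timestamp<TAB>sender<TAB>subject".
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

public class MailMetrics {
    public static void main(String[] args) throws Exception {
        Map<String, Long> messagesPerSenderMonth;
        try (Stream<String> lines = Files.lines(Path.of("ubuntu-devel.log"))) {
            messagesPerSenderMonth = lines
                .map(line -> line.split("\t"))
                .filter(fields -> fields.length >= 2)
                .collect(Collectors.groupingBy(
                    // key = sender plus "YYYY-MM" from an ISO timestamp
                    fields -> fields[1] + " " + fields[0].substring(0, 7),
                    Collectors.counting()));
        }
        messagesPerSenderMonth.forEach((key, n) -> System.out.println(key + "\t" + n));
    }
}
```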
Contributors: Motamarri, Lakshminarayana (Author) / Santanam, Raghu (Thesis advisor) / Ye, Jieping (Thesis advisor) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2011

Description
Source selection is one of the foremost challenges in searching the deep web. For a user query, source selection involves selecting a subset of deep-web sources expected to provide relevant answers to that query. Existing source selection models employ query-similarity-based local measures for assessing source quality. These local measures are necessary but not sufficient, as they are agnostic to source trustworthiness and result importance, which, given the autonomous and uncurated nature of the deep web, have become indispensable for searching it. SourceRank provides a global measure for assessing source quality based on source trustworthiness and result importance. SourceRank's effectiveness has been evaluated in single-topic deep-web environments. The goal of this thesis is to extend SourceRank to a multi-topic deep-web environment. Topic-sensitive SourceRank is introduced as an effective way of extending SourceRank to a deep-web environment containing a set of representative topics. In topic-sensitive SourceRank, multiple SourceRank vectors are created, each biased towards a representative topic. At query time, using the topic of the query keywords, a query-topic-sensitive composite SourceRank vector is computed as a linear combination of these pre-computed biased SourceRank vectors. Extensive experiments on more than a thousand sources in multiple domains show 18-85% improvements in result quality over Google Product Search and other existing methods.
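A minimal sketch of the query-time combination step, under an assumed data layout (the topic classifier that produces the query's topic weights is taken as given):

```java
// Illustrative sketch: combine pre-computed topic-biased SourceRank vectors
// into one composite vector using the query's topic weights.
import java.util.*;

public class TopicSensitiveSourceRank {
    // topicVectors: topic -> per-source rank vector (same source order in all).
    // topicWeights: weight of each topic for the query keywords, summing to 1.
    static double[] compositeRank(Map<String, double[]> topicVectors,
                                  Map<String, Double> topicWeights,
                                  int numSources) {
        double[] composite = new double[numSources];
        for (Map.Entry<String, Double> e : topicWeights.entrySet()) {
            double[] biased = topicVectors.get(e.getKey());
            for (int s = 0; s < numSources; s++)
                composite[s] += e.getValue() * biased[s];   // linear combination
        }
        return composite;
    }

    public static void main(String[] args) {
        Map<String, double[]> vectors = Map.of(
            "movies", new double[]{0.7, 0.2, 0.1},
            "books",  new double[]{0.1, 0.3, 0.6});
        // A query judged 80% "movies", 20% "books":
        double[] composite = compositeRank(vectors,
            Map.of("movies", 0.8, "books", 0.2), 3);
        System.out.println(Arrays.toString(composite)); // ≈ [0.58, 0.22, 0.20]
    }
}
```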
Contributors: Jha, Manishkumar (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2011