Search Content

Enhancing the usability of complex structured data by supporting keyword searches

Description

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily…

As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently access structured data and potentially significantly enhances the usability of structured data. However, processing keyword search on structured data is challenging due to various types of ambiguities such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as the efficiency challenges due to large search space. This dissertation performs an expansive study on keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data.

ContributorsLiu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created2011

Association based prioritization of genes

Description

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them…

Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance as compared to two well-known gene prioritization algorithms. Essentially, no bias in the performance was seen as it was applied to diseases of diverse ethnology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of associationbased gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are mostly unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our networkbased model for prioritization, with encouraging results. Genes have widely different pertinences to the etiology and pathology of diseases. Thus, they can be ranked according to their disease-significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which will then be ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds have different reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance as compared to two well-known gene prioritization algorithms. Essentially, no bias in the performance was seen as it was applied to diseases of diverse ethnology, e.g., monogenic, polygenic and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of associationbased gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are mostly unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our networkbased model for prioritization, with encouraging results. To validate these approaches in a disease-specific context, we built a schizophreniaspecific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets are very positive.

ContributorsLee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)

Created2011

Field effect modulation of ion transport in silicon-on-insulator nanopores and their application as nanoscale coulter counters

Description

In the last few years, significant advances in nanofabrication have allowed tailoring of structures and materials at a molecular level enabling nanofabrication with precise control of dimensions and organization at molecular length scales, a development leading to significant advances in nanoscale systems. Although, the direction of progress seems to follow…

In the last few years, significant advances in nanofabrication have allowed tailoring of structures and materials at a molecular level enabling nanofabrication with precise control of dimensions and organization at molecular length scales, a development leading to significant advances in nanoscale systems. Although, the direction of progress seems to follow the path of microelectronics, the fundamental physics in a nanoscale system changes more rapidly compared to microelectronics, as the size scale is decreased. The changes in length, area, and volume ratios due to reduction in size alter the relative influence of various physical effects determining the overall operation of a system in unexpected ways. One such category of nanofluidic structures demonstrating unique ionic and molecular transport characteristics are nanopores. Nanopores derive their unique transport characteristics from the electrostatic interaction of nanopore surface charge with aqueous ionic solutions. In this doctoral research cylindrical nanopores, in single and array configuration, were fabricated in silicon-on-insulator (SOI) using a combination of electron beam lithography (EBL) and reactive ion etching (RIE). The fabrication method presented is compatible with standard semiconductor foundries and allows fabrication of nanopores with desired geometries and precise dimensional control, providing near ideal and isolated physical modeling systems to study ion transport at the nanometer level. Ion transport through nanopores was characterized by measuring ionic conductances of arrays of nanopores of various diameters for a wide range of concentration of aqueous hydrochloric acid (HCl) ionic solutions. Measured ionic conductances demonstrated two distinct regimes based on surface charge interactions at low ionic concentrations and nanopore geometry at high ionic concentrations. Field effect modulation of ion transport through nanopore arrays, in a fashion similar to semiconductor transistors, was also studied. Using ionic conductance measurements, it was shown that the concentration of ions in the nanopore volume was significantly changed when a gate voltage on nanopore arrays was applied, hence controlling their transport. Based on the ion transport results, single nanopores were used to demonstrate their application as nanoscale particle counters by using polystyrene nanobeads, monodispersed in aqueous HCl solutions of different molarities. Effects of field effect modulation on particle transition events were also demonstrated.

ContributorsJoshi, Punarvasu (Author) / Thornton, Trevor J (Thesis advisor) / Goryll, Michael (Thesis advisor) / Spanias, Andreas (Committee member) / Saraniti, Marco (Committee member) / Arizona State University (Publisher)

Created2011

Materialized views over heterogeneous structured data sources in a distributed event stream processing environment

Description

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query…

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view. Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment.

ContributorsChaudhari, Mahesh Balkrishna (Author) / Dietrich, Suzanne W (Thesis advisor) / Urban, Susan D (Committee member) / Davulcu, Hasan (Committee member) / Chen, Yi (Committee member) / Arizona State University (Publisher)

Created2011

Comparative analysis of simulation of trap induced threshold voltage fluctuations for 45 nm gate length n-MOSFET and analytical model predictions

Description

In very small electronic devices the alternate capture and emission of carriers at an individual defect site located at the interface of Si:SiO2 of a MOSFET generates discrete switching in the device conductance referred to as a random telegraph signal (RTS) or random telegraph noise (RTN). In this research work,…

In very small electronic devices the alternate capture and emission of carriers at an individual defect site located at the interface of Si:SiO2 of a MOSFET generates discrete switching in the device conductance referred to as a random telegraph signal (RTS) or random telegraph noise (RTN). In this research work, the integration of random defects positioned across the channel at the Si:SiO2 interface from source end to the drain end in the presence of different random dopant distributions are used to conduct Ensemble Monte-Carlo ( EMC ) based numerical simulation of key device performance metrics for 45 nm gate length MOSFET device. The two main performance parameters that affect RTS based reliability measurements are percentage change in threshold voltage and percentage change in drain current fluctuation in the saturation region. It has been observed as a result of the simulation that changes in both and values moderately decrease as the defect position is gradually moved from source end to the drain end of the channel. Precise analytical device physics based model needs to be developed to explain and assess the EMC simulation based higher VT fluctuations as experienced for trap positions at the source side. A new analytical model has been developed that simultaneously takes account of dopant number variations in the channel and depletion region underneath and carrier mobility fluctuations resulting from fluctuations in surface potential barriers. Comparisons of this new analytical model along with existing analytical models are shown to correlate with 3D EMC simulation based model for assessment of VT fluctuations percentage induced by a single interface trap. With scaling of devices beyond 32 nm node, halo doping at the source and drain are routinely incorporated to combat the threshold voltage roll-off that takes place with effective channel length reduction. As a final study on this regard, 3D EMC simulation method based computations of threshold voltage fluctuations have been performed for varying source and drain halo pocket length to illustrate the threshold voltage fluctuations related reliability problems that have been aggravated by trap positions near the source at the interface compared to conventional 45 nm MOSFET.

ContributorsAshraf, Nabil Shovon (Author) / Vasileska, Dragica (Thesis advisor) / Schroder, Dieter (Committee member) / Goodnick, Stephen (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)

Created2011

Codoped zinc oxide by a novel co-spray deposition technique for solar cells applications

Description

Zinc oxide (ZnO), a naturally n-type semiconductor has been identified as a promising candidate to replace indium tin oxide (ITO) as the transparent electrode in solar cells, because of its wide bandgap (3.37 eV), abundant source materials and suitable refractive index (2.0 at 600 nm). Spray deposition is a convenient…

Zinc oxide (ZnO), a naturally n-type semiconductor has been identified as a promising candidate to replace indium tin oxide (ITO) as the transparent electrode in solar cells, because of its wide bandgap (3.37 eV), abundant source materials and suitable refractive index (2.0 at 600 nm). Spray deposition is a convenient and low cost technique for large area and uniform deposition of semiconductor thin films. In particular, it provides an easier way to dope the film by simply adding the dopant precursor into the starting solution. In order to reduce the resistivity of undoped ZnO, many works have been done by doping in the ZnO with either group IIIA elements or VIIA elements using spray pyrolysis. However, the resistivity is still too high to meet TCO's resistivity requirement. In the present work, a novel co-spray deposition technique is developed to bypass a fundamental limitation in the conventional spray deposition technique, i.e. the deposition of metal oxides from incompatible precursors in the starting solution. With this technique, ZnO films codoped with one cationic dopant, Al, Cr, or Fe, and an anionic dopant, F, have been successfully synthesized, in which F is incompatible with all these three cationic dopants. Two starting solutions were prepared and co-sprayed through two separate spray heads. One solution contained only the F precursor, NH 4F. The second solution contained the Zn and one cationic dopant precursors, Zn(O 2CCH 3) 2 and AlCl 3, CrCl 3, or FeCl 3. The deposition was carried out at 500 &degC; on soda-lime glass in air. Compared to singly-doped ZnO thin films, codoped ZnO samples showed better electrical properties. Besides, a minimum sheet resistance, 55.4 Ω/sq, was obtained for Al and F codoped ZnO films after vacuum annealing at 400 &degC;, which was lower than singly-doped ZnO with either Al or F. The transmittance for the Al and F codoped ZnO samples was above 90% in the visible range. This co-spray deposition technique provides a simple and cost-effective way to synthesize metal oxides from incompatible precursors with improved properties.

ContributorsZhou, Bin (Author) / Tao, Meng (Thesis advisor) / Goryll, Michael (Committee member) / Vasileska, Dragica (Committee member) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created2013

Resistance switching in chalcogenide based programmable metallization cells (PMC) and sensors under gamma-rays

Description

Chalcogenide glass (ChG) materials have gained wide attention because of their applications in conductive bridge random access memory (CBRAM), phase change memories (PC-RAM), optical rewritable disks (CD-RW and DVD-RW), microelectromechanical systems (MEMS), microfluidics, and optical communications. One of the significant properties of ChG materials is the change in the resistivity…

Chalcogenide glass (ChG) materials have gained wide attention because of their applications in conductive bridge random access memory (CBRAM), phase change memories (PC-RAM), optical rewritable disks (CD-RW and DVD-RW), microelectromechanical systems (MEMS), microfluidics, and optical communications. One of the significant properties of ChG materials is the change in the resistivity of the material when a metal such as Ag or Cu is added to it by diffusion. This study demonstrates the potential radiation-sensing capabilities of two metal/chalcogenide glass device configurations. Lateral and vertical device configurations sense the radiation-induced migration of Ag+ ions in germanium selenide glasses via changes in electrical resistance between electrodes on the ChG. Before irradiation, these devices exhibit a high-resistance `OFF-state' (in the order of 10E12) but following irradiation, with either 60-Co gamma-rays or UV light, their resistance drops to a low-resistance `ON-state' (around 10E3). Lateral devices have exhibited cyclical recovery with room temperature annealing of the Ag doped ChG, which suggests potential uses in reusable radiation sensor applications. The feasibility of producing inexpensive flexible radiation sensors has been demonstrated by studying the effects of mechanical strain and temperature stress on sensors formed on flexible polymer substrate. The mechanisms of radiation-induced Ag/Ag+ transport and reactions in ChG have been modeled using a finite element device simulator, ATLAS. The essential reactions captured by the simulator are radiation-induced carrier generation, combined with reduction/oxidation for Ag species in the chalcogenide film. Metal-doped ChGs are solid electrolytes that have both ionic and electronic conductivity. The ChG based Programmable Metallization Cell (PMC) is a technology platform that offers electric field dependent resistance switching mechanisms by formation and dissolution of nano sized conductive filaments in a ChG solid electrolyte between oxidizable and inert electrodes. This study identifies silver anode agglomeration in PMC devices following large radiation dose exposure and considers device failure mechanisms via electrical and material characterization. The results demonstrate that by changing device structural parameters, silver agglomeration in PMC devices can be suppressed and reliable resistance switching may be maintained for extremely high doses ranging from 4 Mrad(GeSe) to more than 10 Mrad (ChG).

ContributorsDandamudi, Pradeep (Author) / Kozicki, Michael N (Thesis advisor) / Barnaby, Hugh J (Committee member) / Holbert, Keith E. (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)

Created2013

The classification of domain concepts in object-oriented systems

Description

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software…

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities, however the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing his ability to see the role that domain level concepts play in the system. In this thesis, we present a technique that facilitates filtering of classes from existing systems at the source level based on their relationship to concepts in the domain via a classification method using machine learning. We showed that concepts can be identified using a machine learning classifier based on source level metrics. We developed an Eclipse plugin to assist with the process of manually classifying Java source code, and collecting metrics and classifications into a standard file format. We developed an Eclipse plugin to act as a concept identifier that visually indicates a class as a domain concept or not. We minimized the size of training sets to ensure a useful approach in practice. This allowed us to determine that a training set of 7:5 to 10% is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one to one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction. We also discussed composite concepts representing a domain concept implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.

ContributorsCarey, Maurice (Author) / Colbourn, Charles (Thesis advisor) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Low power, high throughput continuous flow PCR instruments for environmental applications

Description

Continuous monitoring in the adequate temporal and spatial scale is necessary for a better understanding of environmental variations. But field deployments of molecular biological analysis platforms in that scale are currently hindered because of issues with power, throughput and automation. Currently, such analysis is performed by the collection of large…

Continuous monitoring in the adequate temporal and spatial scale is necessary for a better understanding of environmental variations. But field deployments of molecular biological analysis platforms in that scale are currently hindered because of issues with power, throughput and automation. Currently, such analysis is performed by the collection of large sample volumes from over a wide area and transporting them to laboratory testing facilities, which fail to provide any real-time information. This dissertation evaluates the systems currently utilized for in-situ field analyses and the issues hampering the successful deployment of such bioanalytial instruments for environmental applications. The design and development of high throughput, low power, and autonomous Polymerase Chain Reaction (PCR) instruments, amenable for portable field operations capable of providing quantitative results is presented here as part of this dissertation. A number of novel innovations have been reported here as part of this work in microfluidic design, PCR thermocycler design, optical design and systems integration. Emulsion microfluidics in conjunction with fluorinated oils and Teflon tubing have been used for the fluidic module that reduces cross-contamination eliminating the need for disposable components or constant cleaning. A cylindrical heater has been designed with the tubing wrapped around fixed temperature zones enabling continuous operation. Fluorescence excitation and detection have been achieved by using a light emitting diode (LED) as the excitation source and a photomultiplier tube (PMT) as the detector. Real-time quantitative PCR results were obtained by using multi-channel fluorescence excitation and detection using LED, optical fibers and a 64-channel multi-anode PMT for measuring continuous real-time fluorescence. The instrument was evaluated by comparing the results obtained with those obtained from a commercial instrument and found to be comparable. To further improve the design and enhance its field portability, this dissertation also presents a framework for the instrumentation necessary for a portable digital PCR platform to achieve higher throughputs with lower power. Both systems were designed such that it can easily couple with any upstream platform capable of providing nucleic acid for analysis using standard fluidic connections. Consequently, these instruments can be used not only in environmental applications, but portable diagnostics applications as well.

ContributorsRay, Tathagata (Author) / Youngbull, Cody (Thesis advisor) / Goryll, Michael (Thesis advisor) / Blain Christen, Jennifer (Committee member) / Yu, Hongyu (Committee member) / Arizona State University (Publisher)

Created2013

Systems integration for biosensing: design, fabrication, and packaging of microelectronics, sensors, and microfluidics

Description

Over the past fifty years, the development of sensors for biological applications has increased dramatically. This rapid growth can be attributed in part to the reduction in feature size, which the electronics industry has pioneered over the same period. The decrease in feature size has led to the production of…

Over the past fifty years, the development of sensors for biological applications has increased dramatically. This rapid growth can be attributed in part to the reduction in feature size, which the electronics industry has pioneered over the same period. The decrease in feature size has led to the production of microscale sensors that are used for sensing applications, ranging from whole-body monitoring down to molecular sensing. Unfortunately, sensors are often developed without regard to how they will be integrated into biological systems. The complexities of integration are underappreciated. Integration involves more than simply making electrical connections. Interfacing microscale sensors with biological environments requires numerous considerations with respect to the creation of compatible packaging, the management of biological reagents, and the act of combining technologies with different dimensions and material properties. Recent advances in microfluidics, especially the proliferation of soft lithography manufacturing methods, have established the groundwork for creating systems that may solve many of the problems inherent to sensor-fluidic interaction. The adaptation of microelectronics manufacturing methods, such as Complementary Metal-Oxide-Semiconductor (CMOS) and Microelectromechanical Systems (MEMS) processes, allows the creation of a complete biological sensing system with integrated sensors and readout circuits. Combining these technologies is an obstacle to forming complete sensor systems. This dissertation presents new approaches for the design, fabrication, and integration of microscale sensors and microelectronics with microfluidics. The work addresses specific challenges, such as combining commercial manufacturing processes into biological systems and developing microscale sensors in these processes. This work is exemplified through a feedback-controlled microfluidic pH system to demonstrate the integration capabilities of microscale sensors for autonomous microenvironment control.

ContributorsWelch, David (Author) / Blain Christen, Jennifer (Thesis advisor) / Muthuswamy, Jitendran (Committee member) / Frakes, David (Committee member) / LaBelle, Jeffrey (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)

Created2012

Filtering by