Enhancing the usability of complex structured data by supporting keyword searches

Description

As pointed out by H. V. Jagadish in his SIGMOD'07 keynote speech, and as commonly agreed in the database community, the usability of structured data by casual users is as important as the functionality of data management systems. A major difficulty in using structured data is retrieving information from it easily, given a user's information needs. Learning and using a structured query language (e.g., SQL or XQuery) is overwhelmingly burdensome for most users: not only are these languages sophisticated, but users also need to know the data schema. Keyword search offers a convenient way to access structured data and can significantly enhance its usability. However, processing keyword search on structured data is challenging because of several types of ambiguity, namely structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), and user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as efficiency challenges due to the large search space. This dissertation presents an extensive study of keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, and composing query results based on relevant matches and return information. (2) Resolving structural, keyword, and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, and result summarization/query expansion. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data.

Contributors: Liu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created: 2011

Association based prioritization of genes

Description

Genes differ widely in their pertinence to the etiology and pathology of diseases. Thus, they can be ranked according to their disease significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which are then ranked by the association they exhibit with the given set of known genes. Experimental and computational data of various kinds differ in their reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance than two well-known gene prioritization algorithms. Essentially no bias in performance was seen when it was applied to diseases of diverse etiology, e.g., monogenic, polygenic, and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of association-based gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms, and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests of the improvement method using synthetic patterns under various conditions showed that the method is very robust and that the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our network-based model for prioritization, with encouraging results. To validate these approaches in a disease-specific context, we built a schizophrenia-specific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets is very positive.
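
As a rough illustration of association-based ranking (not the method developed in this thesis), the sketch below scores each candidate gene by the total weight of its network edges to known disease genes. The function name `prioritize`, the edge weights, and the gene labels are all hypothetical; the weights merely stand in for the reliability and relevance of a data source.

```python
def prioritize(edges, seeds, candidates):
    """Rank candidates by summed edge weight to the seed (known) genes.

    edges: dict mapping an undirected pair (gene_a, gene_b) to a weight
           (hypothetical stand-in for source reliability/relevance).
    seeds: set of genes already known to be related to the disease.
    candidates: iterable of genes to rank.
    """
    scores = {gene: 0.0 for gene in candidates}
    for (a, b), weight in edges.items():
        # Credit a candidate for every edge that links it to a seed gene.
        if a in seeds and b in scores:
            scores[b] += weight
        if b in seeds and a in scores:
            scores[a] += weight
    # Highest association score first.
    return sorted(scores, key=lambda gene: scores[gene], reverse=True)


# Hypothetical toy network: s1 and s2 are known genes, c1..c3 are candidates.
toy_edges = {
    ("s1", "c1"): 0.9,
    ("s2", "c1"): 0.5,
    ("s1", "c2"): 0.4,
    ("c2", "c3"): 0.8,  # candidate-to-candidate edge, ignored by this score
}
ranking = prioritize(toy_edges, {"s1", "s2"}, ["c3", "c2", "c1"])
```

A sparse network leaves many candidates with a score of zero under such a scheme, which is the kind of limitation the abstract describes.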

Contributors: Lee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)

Created: 2011

Materialized views over heterogeneous structured data sources in a distributed event stream processing environment

Description

Data-driven applications are becoming increasingly complex with support for processing events and data streams in a loosely-coupled distributed environment, providing integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple query optimization in a distributed event stream processing framework that supports such applications involving various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the results of the computed view so that subsequent access to the view retrieves the materialized results, avoiding the cost of recomputing the entire view from base data sources. Using a service-based metadata repository that provides metadata level access to the various language components in the system, a heuristics-based algorithm detects the common subexpressions from the queries represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML or a hybrid join over the heterogeneous data sources. This research examines the challenges in the definition and materialization of views when the heterogeneous data sources are retained in their native format, instead of converting the data to a common model. LINQ serves as the materialized view definition language for creating the view definitions. An algorithm is introduced that uses LINQ to create a data structure for the persistence of these hybrid views. Any changes to base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized view. Algorithms are presented that use the magic sets query optimization approach to both efficiently materialize the views and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining the LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions and incrementally maintaining the views by using magic sets enhances the efficiency of the distributed event stream processing environment.
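
The general idea of incremental view maintenance from deltas can be sketched as follows. This is a minimal illustration in Python, not the LINQ-based or magic-sets algorithms of the dissertation; the class `MaterializedSumView`, the delta format, and the sample rows are assumptions made for the example. The view stores a grouped sum and folds each streamed change into the stored result instead of recomputing from the base data.

```python
from collections import defaultdict


class MaterializedSumView:
    """Stores SUM(amount) GROUP BY key and updates it from deltas."""

    def __init__(self, base_rows):
        self.totals = defaultdict(int)
        # The view is computed in full only once, when it is created.
        for key, amount in base_rows:
            self.totals[key] += amount

    def apply_delta(self, op, key, amount):
        """Fold one captured change to the base data into the stored result."""
        if op == "insert":
            self.totals[key] += amount
        elif op == "delete":
            self.totals[key] -= amount
        else:
            raise ValueError(f"unknown delta operation: {op}")

    def read(self):
        # Reads return the stored result without touching the base data.
        return dict(self.totals)


def recompute(rows):
    """Full recomputation, shown only to check the incremental result."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


# Hypothetical base data and a small stream of deltas.
view = MaterializedSumView([("a", 3), ("b", 5), ("b", 2)])
view.apply_delta("insert", "a", 4)
view.apply_delta("delete", "b", 2)
```

After the two deltas, `view.read()` matches a full recomputation over the updated base rows `[("a", 3), ("b", 5), ("a", 4)]`, while touching only the affected groups.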

Contributors: Chaudhari, Mahesh Balkrishna (Author) / Dietrich, Suzanne W (Thesis advisor) / Urban, Susan D (Committee member) / Davulcu, Hasan (Committee member) / Chen, Yi (Committee member) / Arizona State University (Publisher)

Created: 2011

Risk factors, resilient resources, coping & outcomes: a longitudinal model of adaptation to POI

Description

Female infertility can present a significant challenge to quality of life. To date, few, if any, investigations have explored the process by which women adapt over time to premature ovarian insufficiency (POI), a specific type of infertility. The current investigation proposed a bi-dimensional, multi-factor model of adjustment characterized by six latent factors representing personal attributes (resilience resources and vulnerability), coping (adaptive and maladaptive), and outcomes (distress and wellbeing). Measures were collected over a period of one year; personal attributes were assessed at Time 1, coping at Time 2, and outcomes at Time 3. It was hypothesized that coping factors would mediate associations between personal attributes and outcomes. Confirmatory Factor Analysis (CFA), simple regressions, and single mediator models were used to test the study hypotheses. Overall, with the exception of coping, the factor structure was consistent with predictions. Two empirically derived coping factors and a single standalone strategy, avoidance, emerged. The first factor, labeled "approach coping," comprised strategies directly addressing the experience of infertility. The second comprised strategies indicative of "letting go/moving on." Only avoidance significantly mediated the association between vulnerability and distress.

Contributors: Driscoll, Mary (Author) / Davis, Mary C. (Thesis advisor) / Aiken, Leona S. (Committee member) / Luecken, Linda J. (Committee member) / Zautra, Alex J. (Committee member) / Arizona State University (Publisher)

Created: 2011

A pilot study of the benefits of traditional and mindful community gardening for urban older adults' subjective well-being

Description

The population of older adults and the percentage of people living in urban areas are both increasing in the U.S. Social integration, cognitive vitality, and connectedness to nature were conceptualized as critical pathways to maximizing the subjective well-being (SWB) and overall health of city-dwelling older adults. Past research has found that gardening is associated with increased social contact and reduced risk of dementia, and that higher levels of social support, cognitive functioning, mindfulness, and connectedness to nature are positively related to various aspects of SWB. The present pilot study examined the feasibility of conducting a randomized, controlled trial of community gardening and provided an initial assessment of a new intervention--"Mindful Community Gardening," or mindfulness training in the context of gardening. In addition, this study examined whether community gardening, with or without mindfulness training, enhanced SWB among older adults and increased social support, attention and mindfulness, and connectedness to nature. Fifty community-dwelling adults between the ages of 55 and 79 were randomly assigned to one of three groups: Traditional Community Gardening (TCG), Mindful Community Gardening (MCG), or Wait-List Control. The TCG and MCG arms each consisted of two groups of 7 to 10 participants meeting weekly for nine weeks. TCG involved typical gardening activities undertaken collaboratively. MCG involved the same, with the addition of guided development of non-judgmental, present-focused awareness. There was a statistically significant increase in different aspects of mindfulness for the TCG and the MCG arms. The interventions did not measurably impact social support, attention, or connectedness to nature in this small, high-functioning pilot sample. Qualitative analysis of interview data from 12 participants in the TCG and MCG groups revealed that both groups helped some participants to better cope with adversity. It was concluded that it is feasible to conduct randomized, controlled trials of community gardening with urban older adults, and considerations for implementing such interventions are delineated.

Contributors: Okvat, Heather Audrey (Author) / Zautra, Alex J. (Thesis advisor) / Davis, Mary C. (Committee member) / Knopf, Richard C. (Committee member) / Okun, Morris A. (Committee member) / Arizona State University (Publisher)

Created: 2011

The classification of domain concepts in object-oriented systems

Description

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities; however, the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing the engineer's ability to see the role that domain-level concepts play in the system. In this thesis, we present a technique that facilitates filtering of classes from existing systems at the source level based on their relationship to concepts in the domain, via a classification method using machine learning. We showed that concepts can be identified using a machine learning classifier based on source-level metrics. We developed an Eclipse plugin to assist with the process of manually classifying Java source code and collecting metrics and classifications into a standard file format. We developed a second Eclipse plugin to act as a concept identifier that visually indicates whether a class is a domain concept. We minimized the size of training sets to ensure a useful approach in practice. This allowed us to determine that a training set of 7.5 to 10% is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one-to-one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction. We also discussed composite concepts representing a domain concept implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.
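
To make the classification step concrete, the following is a minimal k-nearest-neighbors sketch with a randomly selected training subset. It is not the thesis's implementation or feature set: the two metrics per class (for example, method count and field count), the labels, and the helper names `random_training_subset` and `knn_predict` are hypothetical.

```python
import math
import random
from collections import Counter


def random_training_subset(items, fraction, seed=0):
    """Randomly pick a fraction of the labeled classes as the training set."""
    size = max(1, round(len(items) * fraction))
    return random.Random(seed).sample(items, size)


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (metric vector, label) pairs for six Java classes.
labeled = [
    ((12.0, 5.0), "concept"),
    ((10.0, 6.0), "concept"),
    ((11.0, 4.0), "concept"),
    ((2.0, 0.0), "other"),
    ((1.0, 1.0), "other"),
    ((3.0, 1.0), "other"),
]

subset = random_training_subset(labeled, 0.5)   # half of the labeled classes
prediction_a = knn_predict(labeled, (9.0, 5.0))
prediction_b = knn_predict(labeled, (2.0, 1.0))
```

In practice the metrics would usually be scaled to comparable ranges before computing distances, since Euclidean distance is dominated by the largest-valued feature.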

Contributors: Carey, Maurice (Author) / Colbourn, Charles (Thesis advisor) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Application of a temporal database framework for processing event queries

Description

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams. The research extends a preexisting relational framework for event stream processing to support temporal queries. The language features and formal semantic extensions to extend the relational framework are identified. The extended framework supports continuous, step-wise evaluation of temporal queries. The incremental evaluation of TEQL operators is formalized to avoid re-computation of previous results. The research includes the development of a prototype that supports the integrated event and temporal query processing framework, with support for incremental evaluation and materialization of intermediate results. TEQL enables reporting temporal data in the output, direct specification of conditions over timestamps, and specification of temporal relational operators. Through the integration of temporal database operators with event languages, a new class of temporal queries is made possible for querying event streams. New features include semantic aggregation, extraction of temporal patterns using set operators, and a more accurate specification of event co-occurrence.
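
The notion of co-occurrence for interval-based events can be illustrated with a small sketch. This is plain Python rather than TEQL, whose syntax is not shown in this abstract; the `IntervalEvent` type, the half-open interval convention, and the sample events are assumptions for illustration only.

```python
from typing import NamedTuple, Optional, Tuple


class IntervalEvent(NamedTuple):
    name: str
    start: int  # first time unit at which the event holds (inclusive)
    end: int    # first time unit at which it no longer holds (exclusive)


def co_occurrence(a: IntervalEvent, b: IntervalEvent) -> Optional[Tuple[int, int]]:
    """Return the interval during which both events hold, or None."""
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return (start, end) if start < end else None


# Hypothetical events on an integer timeline.
door_open = IntervalEvent("door_open", 10, 20)
alarm_on = IntervalEvent("alarm_on", 15, 30)
motion = IntervalEvent("motion", 20, 25)

both = co_occurrence(door_open, alarm_on)
neither = co_occurrence(door_open, motion)
```

Under these assumptions the first pair overlaps on the interval from 15 to 20, whereas the second pair only meets at time 20 and therefore does not co-occur. A point-based event model, with one timestamp per event, cannot draw this distinction.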

Contributors: Shiva, Foruhar Ali (Author) / Urban, Susan D (Thesis advisor) / Chen, Yi (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created: 2012

When resilience rides the cycle of fatigue: the role of interpersonal enjoyment on daily fatigue in women with fibromyalgia

Description

Fibromyalgia (FM) is a chronic pain condition characterized by debilitating fatigue. This study examined the dynamic relation between interpersonal enjoyment and fatigue in 102 partnered and 74 unpartnered women with FM. Participants provided three daily ratings for 21 days. They rated their fatigue in late morning and at the end of the day. Both partnered and unpartnered participants reported their interpersonal enjoyment in the combined familial, friendship, and work domains (COMBINED domain) in the afternoon. Additionally, partnered participants reported their interpersonal enjoyment in the spousal domain. The study was guided by three hypotheses at the within-person level, based on daily diaries: (1) elevated late morning fatigue would predict diminished afternoon interpersonal enjoyment; (2) diminished interpersonal enjoyment would predict elevated end-of-day fatigue; (3) interpersonal enjoyment would mediate the late morning to end-of-day fatigue relationship. In cross-level models, the study explored whether individual differences (between-person) in late morning fatigue and afternoon interpersonal enjoyment would moderate within-person relations from late morning fatigue to afternoon interpersonal enjoyment, and from afternoon interpersonal enjoyment to end-of-day fatigue. Furthermore, it explored whether the hypothesized relationships at the within-person level would also emerge at the between-person level (between-person mediation models). Multilevel structural equation modeling and multilevel modeling were employed for model testing, separately for partnered and unpartnered participants. Within-person mediation models showed that on high-fatigue mornings, afternoon interpersonal enjoyment was dampened in the spousal and combined domains in partnered and unpartnered samples. Moreover, low afternoon interpersonal enjoyment in both the spousal and combined domains predicted elevated end-of-day fatigue. Afternoon interpersonal enjoyment mediated the relationship of late morning to end-of-day fatigue in the combined domain but not in the spousal domain. Cross-level moderation analyses showed that individual differences in afternoon spousal enjoyment moderated the day-to-day relation between afternoon spousal enjoyment and end-of-day fatigue. Finally, the mediational chain was not observed at the between-person level. These findings suggest that preserving interpersonal enjoyment in non-spousal relations limits within-day increases in FM fatigue. They highlight the importance of examining domain-specificity in interpersonal enjoyment when studying fatigue, and suggest that targeting enjoyment in social relations may improve the efficacy of existing treatments.

Contributors: Yeung, Wan (Author) / Aiken, Leona S. (Thesis advisor) / Davis, Mary C. (Thesis advisor) / Mackinnon, David P (Committee member) / Zautra, Alex J (Committee member) / Arizona State University (Publisher)

Created: 2013

Representing, reasoning and answering questions about biological pathways: various applications

Description

Biological organisms are made up of cells containing numerous interconnected biochemical processes. Diseases occur when the normal functionality of these processes is disrupted, manifesting as disease symptoms. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for activities including diagnosing diseases and developing drugs. Scientists studying these interconnected processes have identified various pathways involved in drug metabolism, diseases, signal transduction, and more. High-throughput technologies, new algorithms, and speed improvements over the last decade have resulted in deeper knowledge about biological systems, leading to more refined pathways. Such pathways tend to be large and complex, making it difficult for an individual to remember all aspects. Thus, computer models are needed to represent and analyze them. The refinement activity itself requires reasoning with a pathway model by posing queries against it and comparing the results against the real biological system. Many existing models focus on structural and/or factoid questions, relying on surface-level information. These are generally not the kind of questions that a biologist would ask someone to test their understanding of biological processes. Examples of questions requiring understanding of biological processes are available in introductory college-level biology textbooks. Such questions serve as a model for the question answering system developed in this thesis. Thus, the main goal of this thesis is to develop a system that allows the encoding of knowledge about biological pathways to answer questions demonstrating understanding of the pathways. To that end, a language is developed to specify a pathway and pose questions against it. Some existing tools are modified and used to accomplish this goal. The utility of the framework developed in this thesis is illustrated with applications in the biological domain. Finally, the question answering system is used in real-world applications by extracting pathway knowledge from text and answering questions related to drug development.

Contributors: Anwar, Saadat (Author) / Baral, Chitta (Thesis advisor) / Inoue, Katsumi (Committee member) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)

Created: 2014

Somatic ABC's: a theoretical framework for designing, developing and evaluating the building blocks of touch-based information delivery

Description

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications and pervasive, high-speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's computerized devices and displays largely engage--have become overloaded, creating possibilities for distractions, delays, and high cognitive load, which in turn can lead to a loss of situational awareness and increase the chances of life-threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, in particular, is a promising candidate given that the skin is our largest sensory organ, with impressive spatial and temporal acuity. Although some approaches have been proposed for touch-based information delivery, they are not without limitations, including high learning curves, limited applicability, and/or limited expression. This is largely due to the lack of a versatile, comprehensive design theory--specifically, a theory that addresses the design of touch-based building blocks for expandable, efficient, rich, and robust touch languages that are easy to learn and use. Moreover, beyond design, there is a lack of implementation and evaluation theories for such languages. To overcome these limitations, a unified theoretical framework inspired by natural, spoken language is proposed, called Somatic ABC's, for Articulating (designing), Building (developing), and Confirming (evaluating) touch-based languages. To evaluate the usefulness of Somatic ABC's, its design, implementation, and evaluation theories were applied to create communication languages for two distinct application areas: audio described movies and motor learning. These applications were chosen as they presented opportunities for complementing communication by offloading information, typically conveyed visually and/or aurally, to the skin. For both studies, it was found that Somatic ABC's aided the design, development, and evaluation of rich somatic languages with distinct and natural communication units.

Contributors: McDaniel, Troy Lee (Author) / Panchanathan, Sethuraman (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Santello, Marco (Committee member) / Arizona State University (Publisher)

Created: 2012