Enhancing the usability of complex structured data by supporting keyword searches

Description

As pointed out in the keynote speech by H. V. Jagadish at SIGMOD'07, and as commonly agreed in the database community, the usability of structured data by casual users is as important as the functionality of data management systems. A major difficulty in using structured data is retrieving information from it that meets a user's information needs. Learning and using a structured query language (e.g., SQL or XQuery) is overwhelmingly burdensome for most users: not only are these languages sophisticated, but users also need to know the data schema. Keyword search offers a convenient way to access structured data and can significantly enhance its usability. However, processing keyword search on structured data is challenging because of various types of ambiguity:
- structural ambiguity, because keyword queries have no structure;
- keyword ambiguity, because the keywords may not be accurate;
- user preference ambiguity, because the user may have implicit preferences that are not indicated in the query.

Processing is also challenging because the large search space raises efficiency problems. This dissertation presents an extensive study of keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) resolving structural ambiguity in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, and composing query results based on the relevant matches and return information; (2) resolving structural, keyword, and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, and result summarization/query expansion; and (3) resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views.
These works deliver significant technical contributions towards building a full-fledged search engine for structured data.
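Structural ambiguity arises because a keyword query does not say which part of a tree-structured document it refers to. One widely used way to pick meaningful result roots in XML keyword search is smallest lowest common ancestor (SLCA) semantics. The following is a minimal brute-force sketch of that general idea, not the dissertation's algorithm. It assumes nodes carry Dewey-style labels (tuples of child positions), and the function names `lca`, `is_ancestor` and `slca` are hypothetical.

```python
from itertools import product


def lca(a, b):
    """Lowest common ancestor of two Dewey labels = their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def is_ancestor(a, d):
    """True if label a is a proper ancestor of label d."""
    return len(a) < len(d) and d[:len(a)] == a


def slca(match_lists):
    """Return the smallest LCAs for a keyword query.

    match_lists holds, per keyword, the Dewey labels of the nodes that
    contain that keyword. Every combination of one match per keyword is
    enumerated (brute force, exponential in the number of keywords).
    """
    candidates = set()
    for combo in product(*match_lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)
    # Keep only candidates that have no other candidate beneath them.
    return sorted(c for c in candidates
                  if not any(is_ancestor(c, d) for d in candidates))


# Hypothetical bibliography: root (0,), two books (0,0) and (0,1);
# "XML" occurs in both titles, "Chen" only in the first book's author.
print(slca([[(0, 0, 0), (0, 1, 0)], [(0, 0, 1)]]))
```

In this example only the first book contains both keywords. The sketch therefore returns that book's label rather than the bibliography root. Practical SLCA algorithms avoid enumerating every combination of matches.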

Contributors: Liu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created: 2011

Association based prioritization of genes

Description

Genes differ widely in their pertinence to the etiology and pathology of diseases. Thus, they can be ranked according to their disease significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis for determining the significance of other candidate genes. Those candidates are then ranked by the association they exhibit with the known genes.

Experimental and computational data of various kinds differ in their reliability and relevance to the disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance than two well-known gene prioritization algorithms. Essentially no performance bias was seen when it was applied to diseases of diverse etiology, e.g., monogenic, polygenic, and cancer. The method was highly stable and robust against significant levels of noise in the data.

Biological networks are often sparse, which, from a computational perspective, can impede association-based gene prioritization algorithms such as the one presented here. As a potential approach to overcoming this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms, and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns remain unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy.

We developed computational methods for the prediction and improvement of transcription factor binding patterns. Tests of the improvement method using synthetic patterns under various conditions showed that the method is very robust. The patterns it produced invariably converged to nearly identical series of patterns. Preliminary tests incorporating knowledge from transcription factor binding sites into our network-based prioritization model gave encouraging results. To validate these approaches in a disease-specific context, we built a schizophrenia-specific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets has been very positive.
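The abstract ranks candidate genes by their association with known disease genes over an integrated network whose sources differ in reliability, but it does not spell out the scoring algorithm. A random walk with restart is one common association-based scheme and is used here purely as a stand-in.

This is a minimal sketch, assuming the following:
- each source is given as an adjacency matrix;
- each source gets a hypothetical reliability weight;
- the weighted sources are simply summed.

The function names and the toy networks are illustrative only.

```python
import numpy as np


def integrate_networks(networks, reliabilities):
    """Combine per-source adjacency matrices into one weighted network.

    Each source's edges are scaled by an assumed reliability weight and
    the scaled matrices are summed."""
    return sum(w * np.asarray(a, dtype=float)
               for a, w in zip(networks, reliabilities))


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that keeps jumping
    back to the known disease genes (the seeds)."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # isolated genes: avoid division by zero
    transition = adj / col_sums            # column-normalised transition matrix
    restart_vec = np.zeros(n)
    restart_vec[list(seeds)] = 1.0 / len(seeds)
    p = restart_vec.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * restart_vec
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def prioritize(adj, seeds, restart=0.3):
    """Rank non-seed genes by their association score with the seed set."""
    scores = random_walk_with_restart(adj, seeds, restart)
    candidates = [g for g in range(adj.shape[0]) if g not in seeds]
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Hypothetical toy data for four genes; gene 0 is the known disease gene.
PPI = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
COEXPRESSION = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]

adj = integrate_networks([PPI, COEXPRESSION], [0.9, 0.4])
print(prioritize(adj, {0}))
```

Gene 1 is linked to the seed in both sources, so it ranks first. Gene 2 is linked only through the less reliable source, and gene 3 is reachable only through gene 2. Lowering a source's reliability weight shrinks its influence on the ranking without discarding it.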

Contributors: Lee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)

Created: 2011

Humanitarian aid is never a crime: a study of one local public's attempt to negotiate rhetorical agency with the state

Description

At its core, this dissertation is a study of how one group of ordinary people attempted to make change in their local and national community by reframing a public debate. Since 1993, over five thousand undocumented migrants have died, mostly of dehydration, while attempting to cross the US/Mexico border. Volunteers for No More Deaths (NMD), a humanitarian group in Tucson, hike remote trails in the southern Arizona desert and provide food, water, and first aid to undocumented migrants in medical distress. They believe that their actions reduce suffering and deaths in the desert.

On December 4, 2008, Walt Staton, an NMD volunteer, placed multiple one-gallon jugs of water on a known migrant trail. A Fish and Wildlife officer on the Buenos Aires National Wildlife Refuge near Arivaca, Arizona, cited him for littering. Staton refused to pay the fine, believing that he was providing life-saving humanitarian aid, and was taken to court as a result. His trial, held June 1-3, 2009, is the main focus of this dissertation.

The dissertation begins by tracing the history of the rhetorical marker "illegal" and its role in the deaths of thousands of "illegal" immigrants. It then outlines the history of NMD, from its roots in the Sanctuary Movement to its current operation as a counterpublic discursively subverting the state. Next, it examines Staton's trial as a postmodern rhetorical situation in which subjects negotiate their rhetorical agency with the state. Finally, it measures the rhetorical effect of NMD's actions by tracing humanitarian and human rights ideographs in online discussion boards before and after Staton's sentencing. The study finds that subjects are still able to identify and engage with rhetorical opportunities, despite the situational restrictions that the postmodern critique points to. In doing so, they can still subvert the state.

Contributors: Accardi, Steven (Author) / Daly Goggin, Maureen (Thesis advisor) / Miller, Keith (Committee member) / Long, Elenore (Committee member) / Arizona State University (Publisher)

Created: 2011

Materialized views over heterogeneous structured data sources in a distributed event stream processing environment

Description

Data-driven applications are becoming increasingly complex. They must process events and data streams in a loosely coupled distributed environment and provide integrated access to heterogeneous data sources such as relational databases and XML documents. This dissertation explores the use of materialized views over structured heterogeneous data sources to support multiple-query optimization in a distributed event stream processing framework for such applications. These applications involve various query expressions for detecting events, monitoring conditions, handling data streams, and querying data. Materialized views store the computed results of a view, so subsequent accesses retrieve the materialized results and avoid the cost of recomputing the entire view from the base data sources.

The framework uses a service-based metadata repository that provides metadata-level access to the various language components in the system. On top of it, a heuristics-based algorithm detects common subexpressions among the queries, which are represented in a mixed multigraph model over relational and structured XML data sources. These common subexpressions can be relational, XML, or hybrid joins over the heterogeneous data sources. This research examines the challenges in defining and materializing views when the heterogeneous data sources are retained in their native formats instead of being converted to a common model. LINQ serves as the language for the materialized view definitions. An algorithm is introduced that uses LINQ to create a data structure for persisting these hybrid views. Any changes to the base data sources used to materialize views are captured and mapped to a delta structure. The deltas are then streamed within the framework for use in the incremental update of the materialized views.
Algorithms are presented that use the magic sets query optimization approach both to materialize the views efficiently and to propagate the relevant changes to the views for incremental maintenance. Using representative scenarios over structured heterogeneous data sources, an evaluation of the framework demonstrates an improvement in performance. Thus, defining LINQ-based materialized views over heterogeneous structured data sources using the detected common subexpressions, and incrementally maintaining the views using magic sets, enhances the efficiency of the distributed event stream processing environment.
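Incremental maintenance of a join view rests on the standard delta rule. When rows ΔR are inserted into R, the new rows of V = R ⋈ S are exactly ΔR ⋈ S, so only the delta is joined rather than the whole view being recomputed. Deletions are handled the same way and then subtracted.

The Python sketch below illustrates that rule on bags of tuples. It is not the dissertation's implementation: it uses neither LINQ nor the magic sets optimization. Treating flattened XML elements as plain tuples is an assumption made for brevity, and the names `join` and `JoinView` are hypothetical.

```python
from collections import Counter


def join(left, right, key_l, key_r):
    """Bag (multiset) equi-join of two Counters of tuples on one key position each."""
    out = Counter()
    for l, lc in left.items():
        for r, rc in right.items():
            if l[key_l] == r[key_r]:
                out[l + r] += lc * rc
    return out


class JoinView:
    """Materialized view V = R join S, kept up to date from deltas to R."""

    def __init__(self, r, s, key_r, key_s):
        self.r, self.s = Counter(r), Counter(s)
        self.key_r, self.key_s = key_r, key_s
        self.rows = join(self.r, self.s, key_r, key_s)   # initial materialization

    def apply_r_delta(self, inserted=(), deleted=()):
        """Propagate a change to R: only the delta rows are joined with S."""
        plus = join(Counter(inserted), self.s, self.key_r, self.key_s)
        minus = join(Counter(deleted), self.s, self.key_r, self.key_s)
        self.r.update(inserted)
        self.r.subtract(deleted)
        self.rows.update(plus)
        self.rows.subtract(minus)
        self.rows = +self.rows   # unary plus drops rows whose count fell to zero
        self.r = +self.r


# Hypothetical data: orders taken from an XML source, customers from a relational table.
view = JoinView([(1, "pen")], [(1, "Ann"), (2, "Bo")], 0, 0)
view.apply_r_delta(inserted=[(2, "ink")], deleted=[(1, "pen")])
print(view.rows)
```

Only deltas to R are handled here. Changes to S would be propagated symmetrically by joining R with the S delta.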

Contributors: Chaudhari, Mahesh Balkrishna (Author) / Dietrich, Suzanne W (Thesis advisor) / Urban, Susan D (Committee member) / Davulcu, Hasan (Committee member) / Chen, Yi (Committee member) / Arizona State University (Publisher)

Created: 2011

The classification of domain concepts in object-oriented systems

Description

The complexity of the systems that software engineers build has grown continuously since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities; however, the same cannot be said for reverse engineering activities. Introducing abstraction into reverse engineering will allow engineers to move farther away from the details of the system, increasing their ability to see the role that domain-level concepts play in it.

In this thesis, we present a technique that facilitates filtering classes from existing systems at the source level. Classes are filtered by their relationship to concepts in the domain, using a machine learning classification method. We showed that concepts can be identified by a machine learning classifier based on source-level metrics. We developed two Eclipse plugins:
- one assists with manually classifying Java source code and collects metrics and classifications into a standard file format;
- the other acts as a concept identifier, visually indicating whether or not a class is a domain concept.

We minimized the size of training sets to ensure the approach is useful in practice. This allowed us to determine that a training set of 7.5 to 10% of the system is nearly as effective as one representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one-to-one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction.
We also discussed composite concepts representing a domain concept implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.
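The abstract reports that k-nearest neighbors (KNN) was the most consistent learner for deciding whether a class is a domain concept from source-level metrics. The sketch below shows that classification step in its simplest form. The three metrics (method count, field count, fan-in), the labels and the example classes are hypothetical and are not the feature set the thesis settled on.

```python
import math
from collections import Counter

# Hypothetical hand-labelled training data:
# (number of methods, number of fields, fan-in) -> label.
TRAINING = [
    ((12, 6, 9), "concept"),        # e.g. Customer
    ((10, 5, 7), "concept"),        # e.g. Order
    ((9, 7, 8), "concept"),         # e.g. Invoice
    ((20, 0, 30), "not_concept"),   # e.g. StringUtils
    ((4, 1, 25), "not_concept"),    # e.g. Logger
    ((6, 2, 20), "not_concept"),    # e.g. ConfigLoader
]


def knn_predict(training, features, k=3):
    """Label a class by majority vote among its k nearest labelled classes,
    using Euclidean distance over the metric vectors."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAINING, (11, 6, 8)))   # a class resembling Customer/Order
print(knn_predict(TRAINING, (5, 1, 22)))   # a class resembling Logger/ConfigLoader
```

With real metrics of very different scales, the features would normally be normalized before computing distances. An odd k avoids ties between two labels.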

Contributors: Carey, Maurice (Author) / Colbourn, Charles (Thesis advisor) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Application of a temporal database framework for processing event queries

Description

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams.

The research extends a preexisting relational framework for event stream processing to support temporal queries. The language features and formal semantic extensions needed to extend the relational framework are identified. The extended framework supports continuous, step-wise evaluation of temporal queries. The incremental evaluation of TEQL operators is formalized to avoid recomputation of previous results. The research includes a prototype of the integrated event and temporal query processing framework, with support for incremental evaluation and materialization of intermediate results.

TEQL enables reporting temporal data in the output, direct specification of conditions over timestamps, and specification of temporal relational operators. Integrating temporal database operators with event languages makes possible a new class of temporal queries over event streams. New features include semantic aggregation, extraction of temporal patterns using set operators, and a more accurate specification of event co-occurrence.
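Two ideas from the abstract combine in a small example: interval-based events and step-wise incremental evaluation. A co-occurrence query joins two interval streams, and each newly arriving event is compared only with stored events of the other stream, so earlier results are never recomputed.

This is a minimal Python sketch, not TEQL syntax or the dissertation's operator semantics. It assumes half-open integer intervals [start, end), and the class and function names are hypothetical. State grows without bound here, whereas a real engine would expire old events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    start: int   # inclusive
    end: int     # exclusive


def overlap(a, b):
    """Co-occurrence interval of two events, or None if they do not overlap."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return (start, end) if start < end else None


class IncrementalTemporalJoin:
    """Step-wise co-occurrence join of a left and a right interval stream."""

    def __init__(self):
        self.left, self.right = [], []
        self.results = []

    def _emit(self, new, others, new_is_left):
        # Compare the new event only against stored events of the other stream.
        fresh = []
        for other in others:
            interval = overlap(new, other)
            if interval is not None:
                l, r = (new, other) if new_is_left else (other, new)
                fresh.append((l.name, r.name, interval))
        self.results.extend(fresh)
        return fresh

    def add_left(self, ev):
        self.left.append(ev)
        return self._emit(ev, self.right, True)

    def add_right(self, ev):
        self.right.append(ev)
        return self._emit(ev, self.left, False)


# Hypothetical streams: alarms (left) and maintenance windows (right).
j = IncrementalTemporalJoin()
j.add_left(Event("alarm", 0, 10))
print(j.add_right(Event("maint", 5, 15)))
```

Each call returns only the new result tuples produced at that step. This is the property that lets earlier output be reused rather than recomputed.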

Contributors: Shiva, Foruhar Ali (Author) / Urban, Susan D (Thesis advisor) / Chen, Yi (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created: 2012

Representing, reasoning and answering questions about biological pathways: various applications

Description

Biological organisms are made up of cells containing numerous interconnected biochemical processes. Diseases occur when the normal functionality of these processes is disrupted, manifesting as disease symptoms. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for activities such as diagnosing diseases and developing drugs. Scientists studying these interconnected processes have identified various pathways involved in drug metabolism, diseases, signal transduction, and other functions. High-throughput technologies, new algorithms, and speed improvements over the last decade have deepened knowledge about biological systems, leading to more refined pathways. Such pathways tend to be large and complex, making it difficult for an individual to remember all their aspects. Thus, computer models are needed to represent and analyze them. The refinement activity itself requires reasoning with a pathway model by posing queries against it and comparing the results against the real biological system.

Many existing models focus on structural and/or factoid questions, relying on surface-level information. These are generally not the kind of questions that a biologist would ask someone to test their understanding of biological processes. Examples of questions requiring an understanding of biological processes are available in introductory college-level biology textbooks. Such questions serve as a model for the question answering system developed in this thesis. Thus, the main goal of this thesis is to develop a system that encodes knowledge about biological pathways in order to answer questions demonstrating understanding of those pathways. To that end, a language is developed to specify a pathway and pose questions against it. Some existing tools are modified and used to accomplish this goal. The utility of the framework developed in this thesis is illustrated with applications in the biological domain.
Finally, the question answering system is used in real world applications by extracting pathway knowledge from text and answering questions related to drug development.
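The abstract describes questions that test understanding of a pathway rather than recall of facts, such as what happens downstream if a step is blocked. A very small forward-reasoning sketch can illustrate that kind of question. The representation is a hypothetical simplification, not the pathway specification language developed in the thesis:
- reactions are named sets of inputs and outputs;
- inhibition is a set of blocked reactions;
- quantities are ignored entirely, whereas a token-based model such as a Petri net would track them.

The example reactions are a simplified fragment of glycolysis.

```python
def reachable(initial, reactions, blocked=frozenset()):
    """Forward-chain over reactions until no new substance appears.

    reactions: list of (name, inputs, outputs). A reaction fires when all
    its inputs are present and it is not in `blocked` (inhibited).
    Inputs are not consumed: this sketch ignores quantities."""
    present = set(initial)
    changed = True
    while changed:
        changed = False
        for name, inputs, outputs in reactions:
            if name in blocked or not set(inputs) <= present:
                continue
            new = set(outputs) - present
            if new:
                present |= new
                changed = True
    return present


# Simplified glycolysis fragment (names abbreviated; stoichiometry omitted).
GLYCOLYSIS_FRAGMENT = [
    ("r1", {"glucose", "atp"}, {"g6p", "adp"}),   # hexokinase
    ("r2", {"g6p"}, {"f6p"}),                     # phosphoglucose isomerase
    ("r3", {"f6p", "atp"}, {"f16bp", "adp"}),     # phosphofructokinase-1
]

# Question: if r2 is inhibited, is fructose-1,6-bisphosphate still produced?
print("f16bp" in reachable({"glucose", "atp"}, GLYCOLYSIS_FRAGMENT, blocked={"r2"}))
```

Answering such a question means comparing what is reachable with and without the intervention. A richer model would also track amounts and timing.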

Contributors: Anwar, Saadat (Author) / Baral, Chitta (Thesis advisor) / Inoue, Katsumi (Committee member) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)

Created: 2014

Prioritizing phronesis: theorizing change, taking action, inventing possibilities with the Sudanese diaspora in Phoenix

Description

This project draws on sociocognitive rhetoric to ask: How, in complex situations not of our making, do we determine what needs to be done and how to leverage available means for the health of our communities and institutions? The project pulls together two rhetorical concepts:
- the stochastic arts, those that demand the most precise, careful planning in the least predictable places;
- techne, problem-solving tools that transform limits and barriers into possibilities.

From them it puts forward a stochastic techne that grounds contemplative social action in real time, under constraints we can't control and with outcomes we can't predict. This action sits at the intersections of invention and intervention and of mastery and failure. Based on 18 months of fieldwork with the Sudanese refugee diaspora in Phoenix, I offer a method for engaging in postmodern phronesis with community partners in four ways:

1) explanations and examples of public listening and situational mapping;
2) narratives that elucidate the stochastic techne, a heuristic for determining and testing wise rhetorical action;
3) principles for constructing mutually collaborative, mutually beneficial community-university/community-school partnerships for jointly addressing real-world issues that matter in the places where we live; and
4) descriptions and explanations that ground the hard rhetorical work of inventing new paths and destinations. Some of the Sudanese women construct hybridized identities and models of social entrepreneurship that resist aid-to-Africa discourse based on American paternalism and humanitarianism. In doing so, they recast themselves as micro-financers of innovative work here and in Southern Sudan.

Finally, the project pulls back from the Sudanese diaspora to consider implications for refiguring secondary English education around phronesis. Here, I offer a framework for teachers to engage in the real work of problem-posing that aims, as Django Paris calls on us to do, to get something done by confronting the issues that confront our communities.
Grounded in classroom instruction, the chapter provides tools for scaffolding public listening, multi-voiced inquiries, and phronesis with and for local publics. I conclude by calling for English education to abandon all pretense of being a predictive science and to instead embrace productive knowledge-making and the rhetorical work of phronesis as the heart of secondary English studies.

Contributors: Clifton, Jennifer (Author) / Long, Elenore (Thesis advisor) / Gee, James Paul (Committee member) / Paris, Django (Committee member) / Warriner, Doris (Committee member) / Arizona State University (Publisher)

Created: 2012

Reframing the problem of difference: Lillian Smith and hierarchical politics of difference

Description

For many years, difference scholars such as Cornel West, Iris Marion Young, and Janet Atwill have been reminding humanities scholars that if social equity is ever to be realized, difference needs to be reconfigured and reframed. As Janet Atwill puts it, "difference can no longer be the anomaly, the enemy, or the problem to be solved. Difference is the condition" (212). These scholars insightfully recognize that difference needs to be accepted, welcomed, and loved rather than merely tolerated. However, they have not sufficiently addressed the perceptual change that must occur worldwide if difference is to be embraced as an intrinsic underlying condition of human existence.

This project provides a point of departure for carrying out such a dramatic epistemic change by arguing that hierarchical thinking, not difference, is the real agent underwriting societal violence and discord. Hierarchical thinking delineates a more appropriate critical space than difference does for social justice inquiry, invention, and intervention. This project also rhetorically theorizes the realm of intersubjectivity and provides two novel contributions to contemporary rhetorical theory: 1) privilege as a rhetorical construct and 2) the untapped inventional potential of the postmodern understanding of intersubjectivity.

To illustrate the embodied and performative aspects of hierarchical thinking, this work draws upon the writings of Lillian Smith (1897-1966), a white southerner. Her descriptive analyses of the Jim Crow South allude to large systems of privilege of which Jim Crow is merely one representation. Illustrating the invidious nexus of privilege, Smith's writings describe the ways in which individuals embody and perform practices of exclusion and hate to perpetuate larger systems of privilege. Smith shows how privilege operates much as gender and power do: fluidly, variously, and dependent upon context.
Viewing privilege as a rhetorical construct, operating dynamically, always in flux and at play, provides rhetoricians with a theoretically important move that un-yokes privilege from specific identities (e.g., white privilege). Through this more dynamic and precise lens, we can readily perceive how privilege functions as a colonizing, ubiquitously learned, and variegated rhetorical practice of subordination and domination. As a frame of analysis, this practice offers a more fluid and accurate perspective than identity categories provide for discussions of oppression, social justice, and democratic engagement.

Contributors: Holiday, Judy (Author) / Goggin, Maureen D (Thesis advisor) / Long, Elenore (Committee member) / Miller, Keith (Committee member) / Arizona State University (Publisher)

Created: 2012

Otherworldly figures: rhetoric, representation and the public performance of femininity in nineteenth-century spirit mediums' autobiographies

Description

This dissertation theorizes the nineteenth-century public performance of spirit mediums as inherent to the production of autobiography itself. Too often, dominant social discourses are cast as singular cultural phenomena. Yet analyzing the rhetorical strategies of women attempting to access public spheres reveals fractures in what would otherwise appear to be a monolithic patriarchal discourse. These women's resistant performances take advantage of a fractured discourse to reveal a multiplicity of alternative discourses that can be accessed and leveraged to gain social power. This study examines the mediumship of four nineteenth-century Spiritualists from a rhetorical perspective and considers:
- how female spirit mediums used their autobiographies as discursive spaces mediating between private and public spheres;
- how female mediums constructed themselves in the public sphere as women and as spiritual authorities;
- how they negotiated entry into volatile and unpredictable publics;
- how they conceived of the vulnerability of the female body in the public sphere;
- how they coped with complications inherent to Victorian-era constructions of feminine corporeality.

In conclusion, this dissertation offers a highly situated performative theory of subaltern publicity.

Contributors: Lowry, Elizabeth (Author) / Daly Goggin, Maureen (Thesis advisor) / Long, Elenore (Committee member) / Miller, Keith (Committee member) / Arizona State University (Publisher)

Created: 2012
