Enhancing the usability of complex structured data by supporting keyword searches

Description

As pointed out in H. V. Jagadish's keynote speech at SIGMOD'07, and as is commonly agreed in the database community, the usability of structured data by casual users is as important as the functionality of data management systems. A major difficulty in using structured data is retrieving information easily given a user's information needs. Learning and using a structured query language (e.g., SQL or XQuery) is overwhelmingly burdensome for most users: not only are these languages sophisticated, but users also need to know the data schema. Keyword search offers a convenient way to access structured data and can significantly enhance its usability. However, processing keyword search on structured data is challenging because of several types of ambiguity, namely structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), and user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as efficiency challenges caused by the large search space. This dissertation presents an extensive study of keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, and composing query results based on relevant matches and return information. (2) Resolving structural, keyword, and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, and result summarization/query expansion. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data.
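
One widely used way to give a structureless keyword query a structured answer over XML data is to return the smallest subtrees that contain every keyword. The sketch below illustrates only that general idea; it is not the dissertation's algorithm, and the function name `smallest_matching_subtrees`, the matching rule (whole-word match on tag names and direct text), and the sample document are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, not taken from the dissertation.
SAMPLE_XML = """
<bib>
  <paper><title>Keyword search on structured data</title><author>Liu</author></paper>
  <paper><title>Invariant pose features</title><author>Peng</author></paper>
</bib>
"""

def smallest_matching_subtrees(root, keywords):
    """Return elements whose subtree contains all keywords and that have
    no descendant whose subtree also contains all keywords."""
    wanted = {k.lower() for k in keywords}
    results = []

    def visit(node):
        # Words found directly on this node: its tag name and its own text.
        words = set(f"{node.tag} {node.text or ''}".lower().split())
        matched = wanted & words
        found_below = False
        for child in node:
            child_matched, child_found = visit(child)
            matched |= child_matched
            found_below = found_below or child_found
        if matched == wanted and not found_below:
            results.append(node)
            return matched, True
        return matched, found_below

    visit(root)
    return results

root = ET.fromstring(SAMPLE_XML)
hits = smallest_matching_subtrees(root, ["search", "Liu"])
print([hit.tag for hit in hits])
```

The query names no element or path, yet the answer is a single `paper` record rather than the whole `bib` document, which is the kind of structural disambiguation the abstract describes.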

Contributors: Liu, Ziyang (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim S (Committee member) / Davulcu, Hasan (Committee member) / Jagadish, H V (Committee member) / Arizona State University (Publisher)

Created: 2011

Invariant human pose feature extraction for movement recognition and pose estimation

Description

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, multifactor analysis techniques, including multilinear analysis and multifactor Gaussian process methods, are exploited to extract such invariant pose features from video data by decomposing the key factors that contribute to the generation of the image observations, such as pose, view angle, and body shape. Experimental results show that the pose features extracted using the proposed methods exhibit excellent invariance to changes in view angle and body shape. Furthermore, using the proposed invariant multifactor pose features, a suite of simple yet effective algorithms has been developed to solve the movement recognition and pose estimation problems. Using these algorithms, excellent human movement analysis results have been obtained, and most of them are superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed that automatically detects and learns non-gesture movement patterns to improve gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, the optimal data fusion scheme has been investigated for multi-camera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition using multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled. A measure to quantify the complementary strength across cameras has been proposed. Experimental results obtained from a real-life gesture recognition dataset have shown that the optimal camera combinations identified according to the proposed complementary measure always lead to the best gesture recognition results.
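
Decision-level fusion with the product rule, as named in the abstract, can be sketched in a few lines: each camera's classifier outputs a probability per gesture, the per-camera probabilities are multiplied per gesture, and the products are renormalized. This is a minimal illustration of the generic product rule, not the dissertation's implementation; the function name `fuse_product_rule`, the gesture labels, and the probabilities are hypothetical.

```python
from math import prod

def fuse_product_rule(per_camera_posteriors):
    """Combine per-camera class probabilities by multiplying them per class
    and renormalizing so the fused scores sum to 1."""
    labels = list(per_camera_posteriors[0])
    fused = {
        label: prod(camera[label] for camera in per_camera_posteriors)
        for label in labels
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("every class received zero probability from some camera")
    return {label: score / total for label, score in fused.items()}

# Hypothetical outputs of three independent single-camera classifiers.
cameras = [
    {"wave": 0.6, "point": 0.3, "clap": 0.1},
    {"wave": 0.5, "point": 0.4, "clap": 0.1},
    {"wave": 0.2, "point": 0.7, "clap": 0.1},
]
fused = fuse_product_rule(cameras)
best = max(fused, key=fused.get)
print(best)
```

Because the rule operates on classifier outputs rather than on image features, it needs no geometric relationship between the cameras, which fits the uncalibrated setting the abstract mentions.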

Contributors: Peng, Bo (Author) / Qian, Gang (Thesis advisor) / Ye, Jieping (Committee member) / Li, Baoxin (Committee member) / Spanias, Andreas (Committee member) / Arizona State University (Publisher)

Created: 2011

Avian retinal carotenoid accumulation: ecophysiological constraints and behavioral consequences

Description

The elaborate signals of animals are often costly to produce and maintain, thus communicating reliable information about the quality of an individual to potential mates or competitors. The properties of the sensory systems that receive signals can drive the evolution of these signals and shape their form and function. However, relatively little is known about the ecological and physiological constraints that may influence the development and maintenance of sensory systems. In the house finch (Carpodacus mexicanus) and many other bird species, carotenoid pigments are used to create colorful sexually selected displays, and their expression is limited by health and dietary access to carotenoids. Carotenoids also accumulate in the avian retina, protecting it from photodamage and tuning color vision. By analogy with plumage carotenoid accumulation, I hypothesized that avian vision is subject to environmental and physiological constraints imposed by the acquisition and allocation of carotenoids. To test this hypothesis, I carried out a series of field and captive studies of the house finch to assess natural variation in and correlates of retinal carotenoid accumulation and to experimentally investigate the effects of dietary carotenoid availability, immune activation, and light exposure on retinal carotenoid accumulation. Moreover, through dietary manipulations of retinal carotenoid accumulation, I tested the impacts of carotenoid accumulation on visually mediated foraging and mate choice behaviors. My results indicate that avian retinal carotenoid accumulation is variable and significantly influenced by dietary carotenoid availability and immune system activity. Behavioral studies suggest that retinal carotenoid accumulation influences visual foraging performance and mediates a trade-off between color discrimination and photoreceptor sensitivity under dim-light conditions. Retinal accumulation did not influence female choice for male carotenoid-based coloration, indicating that a direct link between retinal accumulation and sexual selection for coloration is unlikely. However, retinal carotenoid accumulation in males was positively correlated with their plumage coloration. Thus, carotenoid-mediated visual health and performance may be part of the information encoded in sexually selected coloration.

Contributors: Toomey, Matthew (Author) / McGraw, Kevin J. (Thesis advisor) / Deviche, Pierre (Committee member) / Smith, Brian (Committee member) / Rutowski, Ronald (Committee member) / Verrelli, Brian (Committee member) / Arizona State University (Publisher)

Created: 2011

Rationalization of protein conformational dynamics by molecular simulations: studies of the ERK2 kinase and the LAC repressor headpiece-O1 operator complex

Description

Molecular dynamics (MD) simulations provide a particularly useful approach to understanding conformational change in biomolecular systems. MD simulations provide an atomistic, physics-based description of the motions accessible to biomolecular systems on the picosecond to microsecond timescale, yielding important insight into the free energy of the system, the dynamical stability of contacts, and the role of correlated motions in directing the motions of the system. In this thesis, I use molecular dynamics simulations to provide molecular mechanisms that rationalize structural, thermodynamic, and mutation data on the interactions between the lac repressor headpiece and its O1 operator DNA as well as on the ERK2 protein kinase. I performed molecular dynamics simulations of the lac repressor headpiece-O1 operator complex at the natural angle as well as at underbent and overbent angles to assess the factors that determine the natural DNA bending angle. I find that both energetic and entropic factors contribute to recognition of the natural angle. At the natural angle, the energy of the system is minimized by optimization of protein-DNA contacts, and the entropy of the system is maximized by release of water from the protein-DNA interface and decorrelation of protein motions. To identify the mechanism by which mutations lead to auto-activation of ERK2, I performed a series of molecular dynamics simulations of ERK1/2 in various stages of activation as well as of the constitutively active Q103A, I84A, L73P, and R65S ERK2 mutants. My simulations indicate the importance of domain closure for auto-activation and activity regulation. My results enable me to predict two loss-of-function mutants of ERK2, G83A and Q64C, that have been confirmed in experiments by collaborators. One of the powerful capabilities of MD simulations in biochemistry is the ability to find low free energy pathways that connect and explain disparate structural data on biomolecular systems. An extension of the targeted molecular dynamics technique using constraints on internal coordinates is presented and evaluated. The method gives good results for the alanine dipeptide but breaks down when applied to the study of conformational changes in GroEL and adenylate kinase.

Contributors: Barr, Daniel Alan (Author) / van der Vaart, Arjan (Thesis advisor) / Matyushov, Dmitry (Committee member) / Wolf, George (Committee member) / Shumway, John (Committee member) / Arizona State University (Publisher)

Created: 2011

Mapping the RNA-protein interface in telomerase RNP

Description

In the 1970s James Watson recognized the inability of conventional DNA replication machinery to replicate the extreme termini of chromosomes, known as telomeres. This inability is due to the requirement of a building block primer and was termed the end replication problem. Telomerase is nature's answer to the end replication problem. Telomerase is a ribonucleoprotein that extends telomeres through reverse transcriptase activity by reiteratively copying a short intrinsic RNA sequence to generate 3' telomeric extensions. Telomeres protect chromosomes from erosion of coding genes during replication and differentiate native chromosome ends from double-stranded breaks. However, controlled erosion of telomeres functions as a naturally occurring molecular clock that limits the replicative capacity of cells. Telomerase is overactivated in many cancers, while its inactivation leads to multiple lifespan-limiting human diseases. To further study the interaction between telomerase RNA (TR) and telomerase reverse transcriptase protein (TERT), vertebrate TERT fragments were screened for solubility and purity following bacterial expression. Soluble fragments of medaka TERT including the RNA binding domain (TRBD) were identified. Recombinant medaka TRBD binds specifically to the CR4/CR5 region of telomerase RNA. Ribonucleotide and amino acid pairs in close proximity within the medaka telomerase RNA-protein complex were identified using photo-activated cross-linking in conjunction with mass spectrometry. The identified cross-linking amino acids were mapped onto known crystal structures of TERTs to reveal the RNA interaction interface of TRBD. The identification of this RNA-TERT interaction interface furthers the understanding of the telomerase complex at a molecular level and could be used for the targeted interruption of the telomerase complex as a potential cancer treatment.

Contributors: Bley, Christopher James (Author) / Chen, Julian (Thesis advisor) / Allen, James (Committee member) / Ghirlanda, Giovanna (Committee member) / Arizona State University (Publisher)

Created: 2011

Association based prioritization of genes

Description

Genes differ widely in their pertinence to the etiology and pathology of diseases. Thus, they can be ranked according to their disease significance on a genomic scale, which is the subject of gene prioritization. Given a set of genes known to be related to a disease, it is reasonable to use them as a basis to determine the significance of other candidate genes, which are then ranked based on the association they exhibit with respect to the given set of known genes. Experimental and computational data of various kinds differ in their reliability and relevance to a disease under study. This work presents a gene prioritization method based on integrated biological networks that incorporates and models the various levels of relevance and reliability of diverse sources. The method is shown to achieve significantly higher performance than two well-known gene prioritization algorithms. Essentially no bias in performance was seen when it was applied to diseases of diverse etiology, e.g., monogenic, polygenic, and cancer. The method was highly stable and robust against significant levels of noise in the data. Biological networks are often sparse, which can impede the operation of association-based gene prioritization algorithms such as the one presented here from a computational perspective. As a potential approach to overcome this limitation, we explore the value that transcription factor binding sites can have in elucidating suitable targets. Transcription factors are needed for the expression of most genes, especially in higher organisms, and hence genes can be associated via their genetic regulatory properties. While each transcription factor recognizes specific DNA sequence patterns, such patterns are unknown for many transcription factors. Even those that are known are inconsistently reported in the literature, implying a potentially high level of inaccuracy. We developed computational methods for prediction and improvement of transcription factor binding patterns. Tests performed on the improvement method by employing synthetic patterns under various conditions showed that the method is very robust and that the patterns produced invariably converge to nearly identical series of patterns. Preliminary tests were conducted to incorporate knowledge from transcription factor binding sites into our network-based model for prioritization, with encouraging results. To validate these approaches in a disease-specific context, we built a schizophrenia-specific network based on the inferred associations and performed a comprehensive prioritization of human genes with respect to the disease. These results are expected to be validated empirically, but computational validation using known targets is very positive.
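
The core idea of association-based prioritization, ranking candidates by how strongly they are linked to known disease genes while weighting each link by the reliability of its data source, can be illustrated with a simple guilt-by-association score. This is a deliberately simplified sketch and not the method developed in the dissertation; the function name `prioritize`, the gene identifiers, the source names, and the reliability values are all hypothetical.

```python
from collections import defaultdict

def prioritize(edges, source_reliability, seed_genes, candidates):
    """Rank candidates by the summed reliability of their direct links
    to the seed (known disease) genes, strongest association first."""
    seeds = set(seed_genes)
    scores = defaultdict(float)
    for gene_a, gene_b, source in edges:
        weight = source_reliability[source]
        # Only links between a seed gene and a non-seed gene contribute.
        if gene_a in seeds and gene_b not in seeds:
            scores[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            scores[gene_a] += weight
    return sorted(candidates, key=lambda gene: scores[gene], reverse=True)

# Hypothetical integrated network: (gene, gene, data source).
edges = [
    ("SEED1", "CAND_A", "ppi"),
    ("SEED2", "CAND_A", "coexpression"),
    ("SEED1", "CAND_B", "text_mining"),
    ("CAND_B", "CAND_C", "ppi"),
]
# Hypothetical per-source reliability weights.
source_reliability = {"ppi": 0.9, "coexpression": 0.5, "text_mining": 0.3}

ranking = prioritize(edges, source_reliability,
                     ["SEED1", "SEED2"], ["CAND_C", "CAND_B", "CAND_A"])
print(ranking)
```

In this toy network `CAND_C` scores zero because it has no direct link to a seed gene, which also shows why sparse networks are a problem for this family of methods and why additional evidence such as shared transcription factor binding sites could help.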

Contributors: Lee, Jang (Author) / Gonzalez, Graciela (Thesis advisor) / Ye, Jieping (Committee member) / Davulcu, Hasan (Committee member) / Gallitano-Mendel, Amelia (Committee member) / Arizona State University (Publisher)

Created: 2011

Privacy preserving service discovery and ranking for multiple user QoS requirements in service-based software systems

Description

Service-based software (SBS) systems are software systems consisting of services based on the service-oriented architecture (SOA). Each service in an SBS system provides part of the functionality and collaborates with other services in workflows to provide the functionality required by the system. These services may be developed and/or owned by different entities and physically distributed across the Internet. Compared with traditional software system components, which are usually designed specifically for the target systems and bound tightly, services have standardized interfaces and communication protocols, which allows SBS systems to support late binding and to provide better interoperability, better flexibility in dynamic business logic, and higher fault tolerance. The development process of SBS systems can be divided into three major phases: 1) SBS specification, 2) service discovery and matching, and 3) service composition and workflow execution. This dissertation focuses on the second phase and presents a privacy-preserving service discovery and ranking approach for multiple user QoS requirements. This approach helps service providers register services and service users search for services through public but untrusted service directories while protecting their privacy against the service directories. The service directories can match the registered services with service requests but do not learn any information about them. Our approach also enforces access control on services during the matching process, which prevents unauthorized users from discovering services. After the service directories match a set of services that satisfy the service users' functionality requirements, the service discovery approach presented in this dissertation further considers service users' QoS requirements in two steps. First, this approach optimizes services' QoS by making tradeoffs among various QoS aspects according to users' QoS requirements and preferences. Second, this approach ranks services based on how well they satisfy users' QoS requirements to help service users select the most suitable service to develop their SBSs.
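
The ranking step, ordering functionally matching services by how well they satisfy a user's QoS requirements and preferences, can be sketched as a weighted satisfaction score. This is only an illustration of the general idea under simple assumptions: it ignores the privacy-preserving and access-control parts of the approach entirely, and the function names, QoS attributes, weights, and service data are hypothetical.

```python
def satisfaction(offered, required, higher_is_better):
    """Degree (0..1] to which an offered QoS value meets the required one."""
    if higher_is_better:
        return min(1.0, offered / required)
    return min(1.0, required / offered)

def rank_services(services, requirements, weights, higher_is_better):
    """Return service names sorted from best to worst weighted satisfaction."""
    def score(qos):
        return sum(
            weights[attr] * satisfaction(qos[attr], requirements[attr],
                                         higher_is_better[attr])
            for attr in requirements
        )
    return sorted(services, key=lambda name: score(services[name]), reverse=True)

# Hypothetical user requirements and preference weights (weights sum to 1).
requirements = {"latency_ms": 200, "availability": 0.99}
weights = {"latency_ms": 0.4, "availability": 0.6}
higher_is_better = {"latency_ms": False, "availability": True}

# Hypothetical services that already matched the functional requirements.
services = {
    "svc_a": {"latency_ms": 150, "availability": 0.95},
    "svc_b": {"latency_ms": 400, "availability": 0.999},
    "svc_c": {"latency_ms": 100, "availability": 0.995},
}

ranking = rank_services(services, requirements, weights, higher_is_better)
print(ranking)
```

Capping each attribute's satisfaction at 1 means a service gains nothing from exceeding a requirement, so a large surplus on one attribute cannot mask a shortfall on another; other scoring choices are equally plausible.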

Contributors: Yin, Yin (Author) / Yau, Stephen S. (Thesis advisor) / Candan, Kasim (Committee member) / Dasgupta, Partha (Committee member) / Santanam, Raghu (Committee member) / Arizona State University (Publisher)

Created: 2011

Life of photosynthetic complexes in the cyanobacterium Synechocystis sp. PCC 6803

Description

The cyanobacterium Synechocystis sp. PCC 6803 performs oxygenic photosynthesis. Light energy conversion in photosynthesis takes place in photosystem I (PSI) and photosystem II (PSII), which contain chlorophyll that absorbs the light energy used as the driving force for photosynthesis. However, excess light energy may lead to the formation of reactive oxygen species that damage photosynthetic complexes, which subsequently need repair or replacement. To gain insight into the degradation/biogenesis dynamics of the photosystems, the lifetimes of photosynthetic proteins and chlorophyll were determined by a combined stable-isotope (15N) and mass spectrometry method. The lifetimes of PSII and PSI proteins ranged from 1-33 and 30-75 hours, respectively. Interestingly, chlorophyll had longer lifetimes than the chlorophyll-binding proteins in these photosystems. Therefore, photosynthetic proteins turn over and are replaced independently of each other, and chlorophyll is recycled from the damaged chlorophyll-binding proteins. In Synechocystis, there are five small Cab-like proteins (SCPs: ScpA-E) that share chlorophyll a/b-binding motifs with LHC proteins in plants. SCPs appear to transiently bind chlorophyll and to regulate chlorophyll biosynthesis. In this study, the association of ScpB, ScpC, and ScpD with damaged and repaired PSII was demonstrated. Moreover, in a mutant lacking SCPs, most PSII protein lifetimes were unaffected, but the lifetime of chlorophyll was decreased and one of the nascent PSII complexes was missing. SCPs appear to bind PSII chlorophyll while PSII is repaired, and SCPs stabilize nascent PSII complexes. Furthermore, aminolevulinic acid biosynthesis, an early step of chlorophyll biosynthesis, was impaired in the absence of SCPs, so that the amount of chlorophyll in the cells was reduced. Finally, a deletion mutation was introduced into the sll1906 gene, which encodes a member of the putative bacteriochlorophyll delivery (BCD) protein family. The Sll1906 sequence contains possible chlorophyll-binding sites, and its homolog in purple bacteria functions in the proper assembly of light-harvesting complexes. However, the sll1906 deletion did not affect chlorophyll degradation/biosynthesis or photosystem assembly. Other (parallel) pathways may exist that fully compensate for the lack of Sll1906. This study has highlighted the dynamics of photosynthetic complexes in their biogenesis and turnover and the coordination between the synthesis of chlorophyll and that of photosynthetic proteins.

Contributors: Yao, Cheng I Daniel (Author) / Vermaas, Wim (Thesis advisor) / Fromme, Petra (Committee member) / Roberson, Robert (Committee member) / Webber, Andrew (Committee member) / Arizona State University (Publisher)

Created: 2011

Predicting creativity in the wild: experience sampling method and sociometric modeling of movement and face-to-face interactions in teams

Description

With the rapid growth of mobile computing and sensor technology, it is now possible to access data from a variety of sources. A big challenge lies in linking sensor-based data with social and cognitive variables in humans in real-world contexts. This dissertation explores the relationship between creativity in teamwork and team members' movement and face-to-face interaction strength in the wild. Using sociometric badges (wearable sensors), electronic Experience Sampling Methods (ESM), the KEYS team creativity assessment instrument, and qualitative methods, three research studies were conducted in academic and industry R&D labs. Sociometric badges captured the movement of team members and face-to-face interaction between team members. The KEYS scale was implemented using ESM for self-rated creativity and expert-coded creativity assessment. Activities (movement and face-to-face interaction) and creativity of one five-member and two seven-member teams were tracked for twenty-five days, eleven days, and fifteen days, respectively. Day-wise values of movement and face-to-face interaction for participants were categorized as creative or non-creative by a mean split on the self-rated and expert-coded creativity measures. Paired-samples t-tests [t(36) = 3.132, p < 0.005; t(23) = 6.49, p < 0.001] confirmed that average daily movement energy during creative days (M = 1.31, SD = 0.04; M = 1.37, SD = 0.07) was significantly greater than the average daily movement energy during non-creative days (M = 1.29, SD = 0.03; M = 1.24, SD = 0.09). The eta squared statistic (0.21; 0.36) indicated a large effect size. A paired-samples t-test also confirmed that the face-to-face interaction tie strength of team members during creative days (M = 2.69, SD = 4.01) was significantly greater [t(41) = 2.36, p < 0.01] than the average face-to-face interaction tie strength of team members during non-creative days (M = 0.9, SD = 2.1). The eta squared statistic (0.11) indicated a large effect size. The combined approach of principal component analysis (PCA) and linear discriminant analysis (LDA) conducted on movement and face-to-face interaction data predicted creativity with 87.5% and 91% accuracy, respectively. This work advances creativity research and provides a foundation for sensor-based real-time creativity support tools for teams.

Contributors: Tripathi, Priyamvada (Author) / Burleson, Winslow (Thesis advisor) / Liu, Huan (Committee member) / VanLehn, Kurt (Committee member) / Pentland, Alex (Committee member) / Arizona State University (Publisher)

Created: 2011

Mining semantics from low-level features in multimedia computing

Description

Bridging the semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signals with their high-level semantic interpretation is mainly due to the fact that semantics are often conveyed implicitly in a context, relying on interactions among multiple levels of concepts or low-level data entities. Also, additional domain knowledge may often be indispensable for uncovering the underlying semantics, but in most cases such domain knowledge is not readily available from the acquired media streams. Thus, making use of various types of contextual information and leveraging the corresponding domain knowledge are vital for associating high-level semantics with low-level signals more accurately in multimedia computing problems. In this work, novel computational methods are explored and developed for incorporating contextual information and domain knowledge in different forms for multimedia computing and pattern recognition problems. Specifically, a novel Bayesian approach with statistical-sampling-based inference is proposed for incorporating a special type of domain knowledge, a spatial prior for the underlying shapes; cross-modality correlations are explored via Kernel Canonical Correlation Analysis, and the learnt space is then used for associating multimedia contents in different forms; and contextual information modeled as a graph is leveraged for regulating interactions among high-level semantic concepts (e.g., category labels) and low-level input signals (e.g., spatial/temporal structure). Four real-world applications, namely visual-to-tactile face conversion, photo tag recommendation, wild web video classification, and unconstrained consumer video summarization, are selected to demonstrate the effectiveness of the approaches. These applications range from classic research challenges to emerging tasks in multimedia computing. Results from experiments on large-scale real-world data, with comparisons to other state-of-the-art methods and subjective evaluations with end users, confirmed that the developed approaches exhibit salient advantages, suggesting that they are promising for leveraging contextual information and domain knowledge for a wide range of multimedia computing and pattern recognition problems.

Contributors: Wang, Zhesheng (Author) / Li, Baoxin (Thesis advisor) / Sundaram, Hari (Committee member) / Qian, Gang (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2011
