Typically, the complete loss or severe impairment of a sense such as vision and/or hearing is compensated through sensory substitution, i.e., the use of an alternative sense for receiving the same information. For individuals who are blind or visually impaired, the alternative senses have predominantly been hearing and touch. For movies, visual content has been made accessible to visually impaired viewers through audio descriptions: additional narration that describes scenes, the characters involved and other pertinent details. However, as audio descriptions should not overlap with dialogue, sound effects and musical scores, there is limited time to convey information, often resulting in stunted and abridged descriptions that leave out many important visual cues and concepts. This work proposes a promising multimodal approach to sensory substitution for movies by providing complementary information through haptics, pertaining to the positions and movements of actors, in addition to a film's audio description and audio content. In a ten-minute presentation of five movie clips to ten individuals who were visually impaired or blind, the novel methodology was found to provide an almost twofold increase in the perception of actors' movements in scenes. Moreover, participants appreciated the overall concept of providing a visual perspective to film through haptics and found it useful.
Reverse engineering gene regulatory networks (GRNs) is an important problem in the domain of Systems Biology. Learning GRNs is challenging due to the inherent complexity of real regulatory networks and the heterogeneity of samples in available biomedical data. Real-world biological data are commonly collected from broad surveys (profiling studies) and aggregate highly heterogeneous biological samples. Popular methods to learn GRNs simplistically assume a single universal regulatory network corresponding to the available data. They neglect regulatory network adaptation due to changes in underlying conditions, cellular phenotype, or both. This dissertation presents a novel computational framework to learn common regulatory interactions and networks underlying the different sets of relatively homogeneous samples from real-world biological data. The characteristic set of samples/conditions and corresponding regulatory interactions defines the cellular context (context). Context, in this dissertation, represents the deterministic transcriptional activity within the specific cellular regulatory mechanism. The major contributions of this framework include: modeling and learning context-specific GRNs; associating enriched samples with contexts to interpret contextual interactions using biological knowledge; pruning extraneous edges from the context-specific GRN to improve the precision of the final GRNs; integrating multisource data to learn inter- and intra-domain interactions and increase confidence in the obtained GRNs; and finally, learning combinatorial conditioning factors from the data to identify regulatory cofactors. The framework, Expattern, was applied to both real-world and synthetic data. Analysis of NCI60 drug activity and gene expression data yielded interesting insights into the mechanisms of action of drugs. Application to refractory cancer data and Glioblastoma multiforme yielded GRNs that were readily annotated with context-specific phenotypic information. Refractory cancer GRNs also displayed associations between distinct cancers that were not observed through clustering alone. Performance comparisons on multi-context synthetic data show that Expattern performs better than other comparable methods.
Given the process of tumorigenesis, biological signaling pathways have become of interest in the field of oncology. Many of the regulatory mechanisms that are altered in cancer are directly related to signal transduction and cellular communication. Thus, identifying signaling pathways that have become deregulated may provide useful information for better understanding altered regulatory mechanisms within cancer. Many methods that have been created to measure the distinct activity of signaling pathways have relied strictly upon transcription profiles. With advancements in comparative genomic hybridization techniques, copy number data has become extremely useful in providing valuable information pertaining to the genomic landscape of cancer. The purpose of this thesis is to develop a methodology that incorporates both gene expression and copy number data to identify signaling pathways that have become deregulated in cancer. The central idea is that copy number data may significantly assist in identifying signaling pathway deregulation by justifying the aberrant activity being measured in gene expression profiles. This method was then applied to four different subtypes of breast cancer, resulting in the identification of signaling pathways associated with distinct functionalities for each of the breast cancer subtypes.
Navigating within non-linear structures is a challenge for all users when the space is large, but the problem is most pronounced when the users are blind or visually impaired. Such users access digital content through screen readers like JAWS, which read out the text on the screen. However, presenting non-linear narratives in this manner, without visual cues and information about spatial dependencies, is very inefficient for such users. The NSDL Science Literacy StrandMaps are visual layouts to help students and teachers browse educational resources. A Strandmap shows relationships between concepts and how they build upon one another across grade levels. NSDL Strandmaps are non-linear narratives which need to be presented to users who are blind in an effective way. A good summary of the Strandmap can give the users an idea about the concepts that are explained in it. This can help them decide whether to view the map or not. In addition, a preview-based navigation mechanism can help users decide which direction they want to take, based on a preview of upcoming content in each direction. Given a non-linear narrative like a Strandmap, which has both text and structure, and a word limit w, the goal of this thesis is to find the best way to create its summary. The following approaches are considered: (1) a purely text-based approach using a multi-document text summarizer; (2) a purely structure-based approach using PageRank; and (3) approaches combining both text and structure, namely a CUTS-based approach (topic segmentation) and PageRank with Content. Since no reference summaries for such structures were available, user studies were conducted to evaluate these algorithms. The PageRank with Content approach performed the best. Another important conclusion was that text and structure are intertwined in a Strandmap by design.
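As a rough illustration of the purely structure-based idea, the sketch below ranks concepts with a plain power-iteration PageRank and then greedily keeps the highest-ranked concepts whose text fits within the word limit w. The toy graph, the concept names, the damping factor, and the greedy selection rule are all assumptions made for this example; the abstract does not describe how the thesis actually builds a summary from the ranks.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over a directed graph given as adjacency lists."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


def summarize(links: Dict[str, List[str]], text: Dict[str, str],
              w: int) -> List[str]:
    """Greedily keep the highest-ranked concepts whose text fits in w words."""
    rank = pagerank(links)
    summary, used = [], 0
    for node in sorted(rank, key=rank.get, reverse=True):
        cost = len(text[node].split())
        if used + cost <= w:
            summary.append(node)
            used += cost
    return summary


# Hypothetical three-concept map: an edge points from a concept to one that builds on it.
toy_links = {
    "counting": ["addition", "multiplication"],
    "addition": ["multiplication"],
    "multiplication": [],
}
toy_text = {
    "counting": "Counting objects one by one",
    "addition": "Addition combines two counts",
    "multiplication": "Multiplication is repeated addition",
}
print(summarize(toy_links, toy_text, w=8))
```

A content-aware variant would presumably weight the ranks by some measure of textual importance, but the abstract gives no detail on how that weighting is done, so it is not attempted here.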
In order to catch the smartest criminals in the world, digital forensics examiners need a means of collaborating and sharing information with each other and outside experts that is not prohibitively difficult. However, standard operating procedures and the rules of evidence generally disallow the use of the collaboration software and techniques that are currently available because they do not fully adhere to the dictated procedures for the handling, analysis, and disclosure of items relating to cases. The aim of this work is to conceive and design a framework that provides a completely new architecture that 1) can perform fundamental functions that are common and necessary to forensic analyses, and 2) is structured such that it is possible to include collaboration-facilitating components without changing the way users interact with the system sans collaboration. This framework is called the Collaborative Forensic Framework (CUFF). CUFF is constructed from four main components: Cuff Link, Storage, Web Interface, and Analysis Block. With the Cuff Link acting as a mediator between components, CUFF is flexible in both the method of deployment and the technologies used in implementation. The details of a realization of CUFF are given, which uses a combination of Java, the Google Web Toolkit, Django with Apache for a RESTful web service, and an Ubuntu Enterprise Cloud using Eucalyptus. The functionality of CUFF's components is demonstrated by the integration of an acquisition script designed for Android OS-based mobile devices that use the YAFFS2 file system. While this work has obvious application to examination labs which work under the mandate of judicial or investigative bodies, security officers at any organization would benefit from the improved ability to cooperate in electronic discovery efforts and internal investigations.
Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing the irrelevant, redundant, and noisy information while considering the correlation among different labels in multi-label learning. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among different labels using a hypergraph. The regularization effect on the classical dimensionality reduction algorithm known as Canonical Correlation Analysis (CCA) is elucidated in this thesis. The relationship between CCA and Orthonormalized Partial Least Squares (OPLS) is also investigated. To perform dimensionality reduction efficiently for large-scale problems, two efficient implementations are proposed for a class of dimensionality reduction algorithms, including canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first approach is a direct least squares approach which allows the use of different regularization penalties, but is applicable under a certain assumption; the second one is a two-stage approach which can be applied in the regularization setting without any assumption. Furthermore, an online implementation for the same class of dimensionality reduction algorithms is proposed when the data comes sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully in the Drosophila gene expression pattern image annotation. The experimental results on some benchmark data sets in multi-label learning also demonstrate the effectiveness and efficiency of the proposed algorithms.
Internet sites that support user-generated content, so-called Web 2.0, have become part of the fabric of everyday life in technologically advanced nations. Users collectively spend billions of hours consuming and creating content on social networking sites, weblogs (blogs), and various other types of sites in the United States and around the world. Given the fundamentally emotional nature of humans and the amount of emotional content that appears in Web 2.0 content, it is important to understand how such websites can affect the emotions of users. This work attempts to determine whether emotion spreads through an online social network (OSN). To this end, a method is devised that employs a model based on a general threshold diffusion model as a classifier to predict the propagation of emotion between users and their friends in an OSN by way of mood-labeled blog entries. The model generalizes existing information diffusion models in that the state machine representation of a node is generalized from being binary to having n states in order to support the n class labels necessary to model emotional contagion. In the absence of ground truth, the prediction accuracy of the model is benchmarked against a baseline method that predicts the majority label of a user's emotion label distribution. The model significantly outperforms the baseline method in terms of prediction accuracy. The experimental results make a strong case for the existence of emotional contagion in OSNs in spite of possible alternative arguments such as confounding influence and homophily, since these alternatives are likely to have negligible effect in a large dataset or simply do not apply to the domain of human emotions. A hybrid manual/automated method to map mood-labeled blog entries to a set of emotion labels is also presented, which enables the application of the model to a large set (approximately 900K) of blog entries from LiveJournal.
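The contrast between an n-state threshold predictor and the majority-label baseline can be sketched as follows. This is only an illustrative reading of the abstract: the function names, the per-friend influence weights, the single scalar threshold, and the fallback rule are all assumptions, not the model actually used in the work.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def majority_label(history: List[str]) -> str:
    """Baseline: predict the label the user has shown most often in the past."""
    return Counter(history).most_common(1)[0][0]


def threshold_predict(friend_labels: Dict[str, str],
                      weights: Dict[str, float],
                      threshold: float,
                      fallback: str) -> str:
    """Predict a user's next emotion label from the friends' current labels.

    Each friend adds its influence weight to the label it currently shows.
    The user adopts the label with the largest total influence if that total
    reaches the threshold; otherwise the fallback label is returned.
    """
    influence: Dict[str, float] = defaultdict(float)
    for friend, label in friend_labels.items():
        influence[label] += weights.get(friend, 0.0)
    if not influence:
        return fallback
    label, total = max(influence.items(), key=lambda kv: kv[1])
    return label if total >= threshold else fallback


# Hypothetical user with three friends and a short mood history.
history = ["happy", "sad", "happy"]
friends = {"a": "sad", "b": "sad", "c": "happy"}
weights = {"a": 0.3, "b": 0.3, "c": 0.2}
baseline = majority_label(history)
print(baseline, threshold_predict(friends, weights, 0.5, baseline))
```

Because a node can take any of n labels rather than just "active" or "inactive", the influence totals are kept per label, which is the part of the sketch that corresponds to the n-state generalization described above.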
Real-world environments are characterized by non-stationary and continuously evolving data. Learning a classification model on this data would require a framework that is able to adapt itself to newer circumstances. Under such circumstances, transfer learning has come to be a dependable methodology for improving classification performance with reduced training costs and without the need for explicit relearning from scratch. In this thesis, a novel instance transfer technique that adapts a "Cost-sensitive" variation of AdaBoost is presented. The method capitalizes on the theoretical and functional properties of AdaBoost to selectively reuse outdated training instances obtained from a "source" domain to effectively classify unseen instances occurring in a different, but related, "target" domain. The algorithm is evaluated on real-world classification problems, namely accelerometer-based 3D gesture recognition, smart home activity recognition and text categorization. The performance on these datasets is analyzed and evaluated against popular boosting-based instance transfer techniques. In addition, supporting empirical studies that investigate some of the less explored bottlenecks of boosting-based instance transfer methods are presented to understand the suitability and effectiveness of this form of knowledge transfer.
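For readers unfamiliar with boosting-based instance transfer, the sketch below shows one round of the weight update used by TrAdaBoost, a widely used technique of this family. It is not the cost-sensitive variant proposed in the thesis, whose details the abstract does not give; the function name and the clamping of the error rate are assumptions made for this example.

```python
import math
from typing import List, Tuple


def tradaboost_update(src_w: List[float], tgt_w: List[float],
                      src_wrong: List[bool], tgt_wrong: List[bool],
                      n_rounds: int) -> Tuple[List[float], List[float]]:
    """One TrAdaBoost-style reweighting step.

    src_w / tgt_w are the current instance weights for source and target data;
    src_wrong / tgt_wrong flag which instances the current weak learner
    misclassified; n_rounds is the total number of boosting rounds.
    """
    # Weighted error of the weak learner, measured on target data only.
    eps = sum(w for w, bad in zip(tgt_w, tgt_wrong) if bad) / sum(tgt_w)
    eps = min(max(eps, 1e-10), 0.499)  # assumption: clamp so the update stays defined
    beta_tgt = eps / (1.0 - eps)
    # Fixed down-weighting factor for source instances.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(src_w)) / n_rounds))
    # Misclassified source instances lose weight (likely unhelpful for the target).
    new_src = [w * (beta_src if bad else 1.0) for w, bad in zip(src_w, src_wrong)]
    # Misclassified target instances gain weight, as in ordinary AdaBoost.
    new_tgt = [w / beta_tgt if bad else w for w, bad in zip(tgt_w, tgt_wrong)]
    return new_src, new_tgt


new_src, new_tgt = tradaboost_update(
    src_w=[0.25, 0.25], tgt_w=[0.25, 0.25, 0.25, 0.25],
    src_wrong=[True, False], tgt_wrong=[True, False, False, False],
    n_rounds=10,
)
print(new_src, new_tgt)
```

The asymmetry is the key design choice: errors on target instances are treated as "hard but relevant" and emphasized, while errors on source instances are treated as evidence that the instance no longer matches the target distribution.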
Free/Libre Open Source Software (FLOSS) is the product of volunteers collaborating to build software in an open, public manner. The large number of FLOSS projects, combined with the data that is inherently archived with this online process, makes studying this phenomenon attractive. Some FLOSS projects are very functional, well-known, and successful, such as Linux, the Apache Web Server, and Firefox. However, for every successful FLOSS project there are hundreds of projects that are unsuccessful. These projects fail to attract sufficient interest from developers and users and become inactive or abandoned before useful functionality is achieved. The goal of this research is to better understand the open source development process and gain insight into why some FLOSS projects succeed while others fail. This dissertation presents an agent-based model of the FLOSS development process. The model is built around the concept that projects must manage to attract contributions from a limited pool of participants in order to progress. In the model, developer and user agents select from a landscape of competing FLOSS projects based on perceived utility. Via the selections that are made and subsequent contributions, some projects are propelled to success while others remain stagnant and inactive. Findings from a diverse set of empirical studies of FLOSS projects are used to formulate the model, which is then calibrated on empirical data from multiple sources of public FLOSS data. The model is able to reproduce key characteristics observed in the FLOSS domain and is capable of making accurate predictions. The model is used to gain a better understanding of the FLOSS development process, including what it means for FLOSS projects to be successful and what conditions increase the probability of project success. It is shown that FLOSS is a producer-driven process, and project factors that are important for developers selecting projects are identified. In addition, it is shown that projects are sensitive to when core developers make contributions, and the exhibited bandwagon effects mean that some projects will be successful regardless of competing projects. Recommendations for improving software engineering in general based on the positive characteristics of FLOSS are also presented.
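The core loop of such an agent-based model, in which agents pick among competing projects by perceived utility, can be sketched in a few lines. The utility function used here (one plus the contributions a project has already received) is purely an assumption chosen to produce a bandwagon effect; the dissertation's actual utility terms, agent types, and calibration are not described in the abstract.

```python
import random
from typing import List


def simulate(n_projects: int = 5, n_agents: int = 20, steps: int = 50,
             seed: int = 0) -> List[int]:
    """Return the number of contributions each project has received.

    At every step each agent contributes to exactly one project, chosen with
    probability proportional to its perceived utility.
    """
    rng = random.Random(seed)
    contributions = [0] * n_projects
    for _ in range(steps):
        for _ in range(n_agents):
            # Assumed utility: projects that already attracted work look more attractive.
            utilities = [1 + c for c in contributions]
            choice = rng.choices(range(n_projects), weights=utilities, k=1)[0]
            contributions[choice] += 1
    return contributions


result = simulate()
print(sorted(result, reverse=True))
```

Even with identical starting conditions, early random choices compound under this rule, so a few projects tend to end up with most of the contributions while the rest stay comparatively inactive.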
Strong communities are important for society. One of the most important community builders, making friends, is poorly supported online. Dating sites support it, but only in romantic contexts. Other major social networks seem not to encourage it, either because their purpose isn't compatible with introducing strangers or because the prevalent methods of introduction aren't effective enough to merit use over real-world alternatives. This paper presents a novel digital social network that emphasizes creating friendships. Research has shown that video chat communication can reach in-person levels of trust. By coupling video chat with a game environment, which eases the discomfort people often have interacting with strangers, and a recommendation engine, Zazzer, the presented system, allows people to meet and get to know each other in a manner much more true to real life than traditional methods. Its network also allows players to continue to communicate afterwards. The evaluation looks at real-world use, measuring the frequency with which players choose the video chat game versus alternative, more traditional methods of online introduction. It also looks at interactions after the initial meeting to discover how effective video chat games are in creating sticky social connections. After initial use, it became apparent that a critical mass of users would be necessary to draw strong conclusions; however, the collected data seemed to give preliminary support to the idea that video chat games are more effective than traditional ways of meeting online in creating new relationships.