Matching Items (89)

Filtering by

Clear all filters

149503-Thumbnail Image.png

Stereo based visual odometry

Description

The exponential rise in unmanned aerial vehicles has necessitated the need for accurate pose estimation under any extreme conditions. Visual Odometry (VO) is the estimation of position and orientation of a vehicle based on analysis of a sequence of images

The exponential rise in unmanned aerial vehicles has necessitated the need for accurate pose estimation under any extreme conditions. Visual Odometry (VO) is the estimation of position and orientation of a vehicle based on analysis of a sequence of images captured from a camera mounted on it. VO offers a cheap and relatively accurate alternative to conventional odometry techniques like wheel odometry, inertial measurement systems and global positioning system (GPS). This thesis implements and analyzes the performance of a two camera based VO called Stereo based visual odometry (SVO) in presence of various deterrent factors like shadows, extremely bright outdoors, wet conditions etc... To allow the implementation of VO on any generic vehicle, a discussion on porting of the VO algorithm to android handsets is presented too. The SVO is implemented in three steps. In the first step, a dense disparity map for a scene is computed. To achieve this we utilize sum of absolute differences technique for stereo matching on rectified and pre-filtered stereo frames. Epipolar geometry is used to simplify the matching problem. The second step involves feature detection and temporal matching. Feature detection is carried out by Harris corner detector. These features are matched between two consecutive frames using the Lucas-Kanade feature tracker. The 3D co-ordinates of these matched set of features are computed from the disparity map obtained from the first step and are mapped into each other by a translation and a rotation. The rotation and translation is computed using least squares minimization with the aid of Singular Value Decomposition. Random Sample Consensus (RANSAC) is used for outlier detection. This comprises the third step. The accuracy of the algorithm is quantified based on the final position error, which is the difference between the final position computed by the SVO algorithm and the final ground truth position as obtained from the GPS. The SVO showed an error of around 1% under normal conditions for a path length of 60 m and around 3% in bright conditions for a path length of 130 m. The algorithm suffered in presence of shadows and vibrations, with errors of around 15% and path lengths of 20 m and 100 m respectively.

Contributors

Agent

Created

Date Created
2010

151716-Thumbnail Image.png

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real

The rapid escalation of technology and the widespread emergence of modern technological equipments have resulted in the generation of humongous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating them with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing human labeling effort in inducing classification models. To utilize the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real world multimedia pattern recognition applications. Four major contributions are proposed in this work: $(i)$ a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, $(ii)$ a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, $(iii)$ batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality and $(iv)$ an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.

Contributors

Agent

Created

Date Created
2013

151926-Thumbnail Image.png

Building adaptive computational systems for physiological and biomedical data

Description

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and biomedical domain. Many of these applications which deal with physiological

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and biomedical domain. Many of these applications which deal with physiological and biomedical data require person specific or person adaptive systems. The greatest challenge in developing such systems is the subject-dependent data variations or subject-based variability in physiological and biomedical data, which leads to difference in data distributions making the task of modeling these data, using traditional machine learning algorithms, complex and challenging. As a result, despite the wide application of machine learning, efficient deployment of its principles to model real-world data is still a challenge. This dissertation addresses the problem of subject based variability in physiological and biomedical data and proposes person adaptive prediction models based on novel transfer and active learning algorithms, an emerging field in machine learning. One of the significant contributions of this dissertation is a person adaptive method, for early detection of muscle fatigue using Surface Electromyogram signals, based on a new multi-source transfer learning algorithm. This dissertation also proposes a subject-independent algorithm for grading the progression of muscle fatigue from 0 to 1 level in a test subject, during isometric or dynamic contractions, at real-time. Besides subject based variability, biomedical image data also varies due to variations in their imaging techniques, leading to distribution differences between the image databases. Hence a classifier learned on one database may perform poorly on the other database. Another significant contribution of this dissertation has been the design and development of an efficient biomedical image data annotation framework, based on a novel combination of transfer learning and a new batch-mode active learning method, capable of addressing the distribution differences across databases. The methodologies developed in this dissertation are relevant and applicable to a large set of computing problems where there is a high variation of data between subjects or sources, such as face detection, pose detection and speech recognition. From a broader perspective, these frameworks can be viewed as a first step towards design of automated adaptive systems for real world data.

Contributors

Agent

Created

Date Created
2013

152361-Thumbnail Image.png

Techniques for soundscape retrieval and synthesis

Description

The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is that of the soundscape: the acoustic environment as perceived by a listener. This

The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is that of the soundscape: the acoustic environment as perceived by a listener. This dissertation examines the application of several computational tools in the realms of digital signal processing, multimedia information retrieval, and computer music synthesis to the analysis of the soundscape. Namely, these tools include a) an open source software library, Sirens, which can be used for the segmentation of long environmental field recordings into individual sonic events and compare these events in terms of acoustic content, b) a graph-based retrieval system that can use these measures of acoustic similarity and measures of semantic similarity using the lexical database WordNet to perform both text-based retrieval and automatic annotation of environmental sounds, and c) new techniques for the dynamic, realtime parametric morphing of multiple field recordings, informed by the geographic paths along which they were recorded.

Contributors

Agent

Created

Date Created
2013

137623-Thumbnail Image.png

Intelligent Input Parser for Organic Chemistry Reagent Questions

Description

Due to its difficult nature, organic chemistry is receiving much research attention across the nation to develop more efficient and effective means to teach it. As part of that, Dr. Ian Gould at ASU is developing an online organic chemistry

Due to its difficult nature, organic chemistry is receiving much research attention across the nation to develop more efficient and effective means to teach it. As part of that, Dr. Ian Gould at ASU is developing an online organic chemistry educational website that provides help to students, adapts to their responses, and collects data about their performance. This thesis creative project addresses the design and implementation of an input parser for organic chemistry reagent questions, to appear on his website. After students used the form to submit questions throughout the Spring 2013 semester in Dr. Gould's organic chemistry class, the data gathered from their usage was analyzed, and feedback was collected. The feedback obtained from students was positive, and suggested that the input parser accomplished the educational goals that it sought to meet.

Contributors

Agent

Created

Date Created
2013-05

153926-Thumbnail Image.png

Leveraging collective wisdom in a multilabeled blog categorization environment

Description

One of the most remarkable outcomes resulting from the evolution of the web into Web 2.0, has been the propelling of blogging into a widely adopted and globally accepted phenomenon. While the unprecedented growth of the Blogosphere has added diversity

One of the most remarkable outcomes resulting from the evolution of the web into Web 2.0, has been the propelling of blogging into a widely adopted and globally accepted phenomenon. While the unprecedented growth of the Blogosphere has added diversity and enriched the media, it has also added complexity. To cope with the relentless expansion, many enthusiastic bloggers have embarked on voluntarily writing, tagging, labeling, and cataloguing their posts in hopes of reaching the widest possible audience. Unbeknown to them, this reaching-for-others process triggers the generation of a new kind of collective wisdom, a result of shared collaboration, and the exchange of ideas, purpose, and objectives, through the formation of associations, links, and relations. Mastering an understanding of the Blogosphere can greatly help facilitate the needs of the ever growing number of these users, as well as producers, service providers, and advertisers into facilitation of the categorization and navigation of this vast environment. This work explores a novel method to leverage the collective wisdom from the infused label space for blog search and discovery. The work demonstrates that the wisdom space can provide a most unique and desirable framework to which to discover the highly sought after background information that could aid in the building of classifiers. This work incorporates this insight into the construction of a better clustering of blogs which boosts the performance of classifiers for identifying more relevant labels for blogs, and offers a mechanism that can be incorporated into replacing spurious labels and mislabels in a multi-labeled space.

Contributors

Agent

Created

Date Created
2015

151151-Thumbnail Image.png

Design and development of an immersive virtual reality team trainer for advance cardiac life support

Description

Technology in the modern day has ensured that learning of skills and behavior may be both widely disseminated and cheaply available. An example of this is the concept of virtual reality (VR) training. Virtual Reality training ensures that learning can

Technology in the modern day has ensured that learning of skills and behavior may be both widely disseminated and cheaply available. An example of this is the concept of virtual reality (VR) training. Virtual Reality training ensures that learning can be provided often, in a safe simulated setting, and it may be delivered in a manner that makes it engaging while negating the need to purchase special equipment. This thesis presents a case study in the form of a time critical, team based medical scenario known as Advanced Cardiac Life Support (ACLS). A framework and methodology associated with the design of a VR trainer for ACLS is detailed. In addition, in order to potentially provide an engaging experience, the simulator was designed to incorporate immersive elements and a multimodal interface (haptic, visual, and auditory). A study was conducted to test two primary hypotheses namely: a meaningful transfer of skill is achieved from virtual reality training to real world mock codes and the presence of immersive components in virtual reality leads to an increase in the performance gained. The participant pool consisted of 54 clinicians divided into 9 teams of 6 members each. The teams were categorized into three treatment groups: immersive VR (3 teams), minimally immersive VR (3 teams), and control (3 teams). The study was conducted in 4 phases from a real world mock code pretest to assess baselines to a 30 minute VR training session culminating in a final mock code to assess the performance change from the baseline. The minimally immersive team was treated as control for the immersive components. The teams were graded, in both VR and mock code sessions, using the evaluation metric used in real world mock codes. The study revealed that the immersive VR groups saw greater performance gain from pretest to posttest than the minimally immersive and control groups in case of the VFib/VTach scenario (~20% to ~5%). Also the immersive VR groups had a greater performance gain than the minimally immersive groups from the first to the final session of VFib/VTach (29% to -13%) and PEA (27% to 15%).

Contributors

Agent

Created

Date Created
2012

151963-Thumbnail Image.png

Robust implementation of NL2KR system and it's application in iRODS domain

Description

Currently, to interact with computer based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a

Currently, to interact with computer based systems one needs to learn the specific interface language of that system. In most cases, interaction would be much easier if it could be done in natural language. For that, we will need a module which understands natural language and automatically translates it to the interface language of the system. NL2KR (Natural language to knowledge representation) v.1 system is a prototype of such a system. It is a learning based system that learns new meanings of words in terms of lambda-calculus formulas given an initial lexicon of some words and their meanings and a training corpus of sentences with their translations. As a part of this thesis, we take the prototype NL2KR v.1 system and enhance various components of it to make it usable for somewhat substantial and useful interface languages. We revamped the lexicon learning components, Inverse-lambda and Generalization modules, and redesigned the lexicon learning algorithm which uses these components to learn new meanings of words. Similarly, we re-developed an inbuilt parser of the system in Answer Set Programming (ASP) and also integrated external parser with the system. Apart from this, we added some new rich features like various system configurations and memory cache in the learning component of the NL2KR system. These enhancements helped in learning more meanings of the words, boosted performance of the system by reducing the computation time by a factor of 8 and improved the usability of the system. We evaluated the NL2KR system on iRODS domain. iRODS is a rule-oriented data system, which helps in managing large set of computer files using policies. This system provides a Rule-Oriented interface langauge whose syntactic structure is like any procedural programming language (eg. C). However, direct translation of natural language (NL) to this interface language is difficult. So, for automatic translation of NL to this language, we define a simple intermediate Policy Declarative Language (IPDL) to represent the knowledge in the policies, which then can be directly translated to iRODS rules. We develop a corpus of 100 policy statements and manually translate them to IPDL langauge. This corpus is then used for the evaluation of NL2KR system. We performed 10 fold cross validation on the system. Furthermore, using this corpus, we illustrate how different components of our NL2KR system work.

Contributors

Agent

Created

Date Created
2013

152003-Thumbnail Image.png

Classifying everyday activity through label propagation with sparse training data

Description

We solve the problem of activity verification in the context of sustainability. Activity verification is the process of proving the user assertions pertaining to a certain activity performed by the user. Our motivation lies in incentivizing the user for engaging

We solve the problem of activity verification in the context of sustainability. Activity verification is the process of proving the user assertions pertaining to a certain activity performed by the user. Our motivation lies in incentivizing the user for engaging in sustainable activities like taking public transport or recycling. Such incentivization schemes require the system to verify the claim made by the user. The system verifies these claims by analyzing the supporting evidence captured by the user while performing the activity. The proliferation of portable smart-phones in the past few years has provided us with a ubiquitous and relatively cheap platform, having multiple sensors like accelerometer, gyroscope, microphone etc. to capture this evidence data in-situ. In this research, we investigate the supervised and semi-supervised learning techniques for activity verification. Both these techniques make use the data set constructed using the evidence submitted by the user. Supervised learning makes use of annotated evidence data to build a function to predict the class labels of the unlabeled data points. The evidence data captured can be either unimodal or multimodal in nature. We use the accelerometer data as evidence for transportation mode verification and image data as evidence for recycling verification. After training the system, we achieve maximum accuracy of 94% when classifying the transport mode and 81% when detecting recycle activity. In the case of recycle verification, we could improve the classification accuracy by asking the user for more evidence. We present some techniques to ask the user for the next best piece of evidence that maximizes the probability of classification. Using these techniques for detecting recycle activity, the accuracy increases to 93%. The major disadvantage of using supervised models is that it requires extensive annotated training data, which expensive to collect. Due to the limited training data, we look at the graph based inductive semi-supervised learning methods to propagate the labels among the unlabeled samples. In the semi-supervised approach, we represent each instance in the data set as a node in the graph. Since it is a complete graph, edges interconnect these nodes, with each edge having some weight representing the similarity between the points. We propagate the labels in this graph, based on the proximity of the data points to the labeled nodes. We estimate the performance of these algorithms by measuring how close the probability distribution of the data after label propagation is to the probability distribution of the ground truth data. Since labeling has a cost associated with it, in this thesis we propose two algorithms that help us in selecting minimum number of labeled points to propagate the labels accurately. Our proposed algorithm achieves a maximum of 73% increase in performance when compared to the baseline algorithm.

Contributors

Agent

Created

Date Created
2013

149977-Thumbnail Image.png

Invariant human pose feature extraction for movement recognition and pose estimation

Description

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, the multifactor analysis techniques, including the multilinear analysis and the multifactor Gaussian process methods,

Reliable extraction of human pose features that are invariant to view angle and body shape changes is critical for advancing human movement analysis. In this dissertation, the multifactor analysis techniques, including the multilinear analysis and the multifactor Gaussian process methods, have been exploited to extract such invariant pose features from video data by decomposing various key contributing factors, such as pose, view angle, and body shape, in the generation of the image observations. Experimental results have shown that the resulting pose features extracted using the proposed methods exhibit excellent invariance properties to changes in view angles and body shapes. Furthermore, using the proposed invariant multifactor pose features, a suite of simple while effective algorithms have been developed to solve the movement recognition and pose estimation problems. Using these proposed algorithms, excellent human movement analysis results have been obtained, and most of them are superior to those obtained from state-of-the-art algorithms on the same testing datasets. Moreover, a number of key movement analysis challenges, including robust online gesture spotting and multi-camera gesture recognition, have also been addressed in this research. To this end, an online gesture spotting framework has been developed to automatically detect and learn non-gesture movement patterns to improve gesture localization and recognition from continuous data streams using a hidden Markov network. In addition, the optimal data fusion scheme has been investigated for multicamera gesture recognition, and the decision-level camera fusion scheme using the product rule has been found to be optimal for gesture recognition using multiple uncalibrated cameras. Furthermore, the challenge of optimal camera selection in multi-camera gesture recognition has also been tackled. A measure to quantify the complementary strength across cameras has been proposed. Experimental results obtained from a real-life gesture recognition dataset have shown that the optimal camera combinations identified according to the proposed complementary measure always lead to the best gesture recognition results.

Contributors

Agent

Created

Date Created
2011