Matching Items (118)

Stereo based visual odometry

Description

The exponential rise in the use of unmanned aerial vehicles has necessitated accurate pose estimation under extreme conditions. Visual Odometry (VO) is the estimation of the position and orientation of a vehicle based on analysis of a sequence of images captured from a camera mounted on it. VO offers a cheap and relatively accurate alternative to conventional odometry techniques such as wheel odometry, inertial measurement systems and the global positioning system (GPS). This thesis implements and analyzes the performance of a two-camera VO system, called stereo based visual odometry (SVO), in the presence of various deterrent factors such as shadows, extremely bright outdoor scenes and wet conditions. To allow the implementation of VO on any generic vehicle, the porting of the VO algorithm to Android handsets is also discussed. The SVO is implemented in three steps. In the first step, a dense disparity map for a scene is computed. To achieve this, the sum of absolute differences technique is used for stereo matching on rectified and pre-filtered stereo frames. Epipolar geometry is used to simplify the matching problem. The second step involves feature detection and temporal matching. Feature detection is carried out by the Harris corner detector. These features are matched between two consecutive frames using the Lucas-Kanade feature tracker. In the third step, the 3D coordinates of the matched features are computed from the disparity map obtained in the first step, and the two sets of points are mapped into each other by a rotation and a translation. The rotation and translation are computed using least squares minimization with the aid of Singular Value Decomposition. Random Sample Consensus (RANSAC) is used for outlier detection. The accuracy of the algorithm is quantified by the final position error, which is the difference between the final position computed by the SVO algorithm and the final ground truth position obtained from the GPS.
The SVO showed an error of around 1% under normal conditions for a path length of 60 m and around 3% in bright conditions for a path length of 130 m. The algorithm suffered in the presence of shadows and vibrations, with errors of around 15% over path lengths of 20 m and 100 m, respectively.
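
The first step described above (stereo matching with the sum of absolute differences along a rectified scanline) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the window size, the disparity range, and the focal length and baseline values are all assumptions chosen for illustration, and the sketch handles a single image row without pre-filtering.

```python
def sad_disparity(left_row, right_row, window=3, max_disp=4):
    """Estimate a disparity for each pixel of one rectified scanline.

    For a pixel at column x in the left row, candidate matches are searched
    at columns x - d in the right row (d = 0..max_disp). Rectification is
    what allows the search to stay on the same row.
    """
    half = window // 2
    n = len(left_row)
    disparities = [0] * n  # border pixels keep disparity 0 in this sketch
    for x in range(half, n - half):
        best_cost, best_d = None, 0
        for d in range(max_disp + 1):
            if x - d - half < 0:
                break  # candidate window would fall outside the row
            # Sum of absolute differences between the two windows.
            cost = sum(
                abs(left_row[x + k] - right_row[x - d + k])
                for k in range(-half, half + 1)
            )
            # Strict comparison: ties are resolved toward the smaller disparity.
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disparities[x] = best_d
    return disparities


def depth_from_disparity(disparity, focal_px, baseline_m):
    """Convert a disparity in pixels to depth in metres (Z = f * B / d)."""
    if disparity <= 0:
        return None  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity


if __name__ == "__main__":
    # Hypothetical scanlines: the bright feature is shifted by 2 pixels.
    left = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
    right = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10]
    print(sad_disparity(left, right))
    print(depth_from_disparity(2, focal_px=700.0, baseline_m=0.5))
```

A dense disparity map would repeat this search for every row and use two-dimensional windows. The depth values obtained this way are what give the tracked features their 3D coordinates in the later steps.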

Date Created
2010

Automated animal coloration quantification in digital images using dominant colors and skin classification

Description

The origin and function of color in animals have been a subject of great interest for taxonomists and ecologists in recent years. Coloration in animals is useful for many important functions such as species identification, camouflage and understanding evolutionary relationships. Quantitative measurements of color signal and patch size in mammals, birds and reptiles, to name a few, are strong indicators of sexual selection cues and individual health. These measurements provide valuable insights into the impact of environmental conditions on the habitat and breeding of mammals, birds and reptiles. Recent advances in digital cameras and sensors have led to a significant increase in the use of digital photography as a means of color quantification in animals. Although a significant amount of research has been conducted on ways to standardize image acquisition conditions and calibrate cameras for use in animal color quantification, almost no work has been done on designing automated methods for animal color quantification. This thesis presents a novel perceptual-based framework for the automated extraction and quantification of animal coloration from digital images with slowly varying (almost homogeneous) background colors. The implemented framework uses a combination of several techniques, including color space quantization using a few dominant colors, foreground-background identification, Bayesian classification and mixture Gaussian modelling of conditional densities, edge-enhanced model-based classification and Saturation-Brightness quantization, to extract the colored patch. This approach assumes no prior information about the color of either the subject or the background, or about the position of the subject in the image. The performance of the proposed method is evaluated on the plumage color of wild house finches.
Segmentation results obtained using the implemented framework are compared with manually scored results to illustrate the performance of this system. The segmentation results show a high correlation with manually scored images. This novel framework also eliminates common problems in manual scoring of digital images such as low repeatability and inter-observer error.
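
The idea of quantizing a color space down to a few dominant colors can be sketched as follows. This is a deliberately simple stand-in (coarse histogram binning followed by nearest-color assignment), not the perceptual-based method of the thesis; the function names, the bin width `step`, and the sample pixel values are assumptions made for illustration.

```python
from collections import Counter


def dominant_colors(pixels, k=2, step=64):
    """Return the k most frequent coarse RGB bins, as bin-centre colors."""
    bins = Counter(
        tuple((channel // step) * step + step // 2 for channel in pixel)
        for pixel in pixels
    )
    return [color for color, _count in bins.most_common(k)]


def quantize(pixels, palette):
    """Map every pixel to the closest palette color (squared RGB distance)."""
    def nearest(pixel):
        return min(
            palette,
            key=lambda color: sum((a - b) ** 2 for a, b in zip(pixel, color)),
        )
    return [nearest(pixel) for pixel in pixels]


if __name__ == "__main__":
    # Hypothetical image: a dark, nearly uniform background and a red patch.
    image = [(20, 20, 20)] * 5 + [(200, 30, 30)] * 3 + [(210, 40, 35)]
    palette = dominant_colors(image, k=2)
    labels = quantize(image, palette)
    patch_color = palette[1]  # second most frequent color in this example
    print(palette, labels.count(patch_color) / len(image))
```

With a slowly varying background, the most frequent dominant color is likely to belong to the background, which is one reason a small palette can already support a rough foreground-background split before any classifier is applied.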

Date Created
2013

Batch mode active learning for multimedia pattern recognition

Description

The rapid escalation of technology and the widespread emergence of modern technological equipment have resulted in the generation of enormous amounts of digital data (in the form of images, videos and text). This has expanded the possibility of solving real world problems using computational learning frameworks. However, while gathering a large amount of data is cheap and easy, annotating it with class labels is an expensive process in terms of time, labor and human expertise. This has paved the way for research in the field of active learning. Such algorithms automatically select the salient and exemplar instances from large quantities of unlabeled data and are effective in reducing the human labeling effort needed to induce classification models. To exploit the possible presence of multiple labeling agents, there have been attempts towards a batch mode form of active learning, where a batch of data instances is selected simultaneously for manual annotation. This dissertation is aimed at the development of novel batch mode active learning algorithms to reduce manual effort in training classification models in real world multimedia pattern recognition applications.
Four major contributions are proposed in this work: (i) a framework for dynamic batch mode active learning, where the batch size and the specific data instances to be queried are selected adaptively through a single formulation, based on the complexity of the data stream in question, (ii) a batch mode active learning strategy for fuzzy label classification problems, where there is an inherent imprecision and vagueness in the class label definitions, (iii) batch mode active learning algorithms based on convex relaxations of an NP-hard integer quadratic programming (IQP) problem, with guaranteed bounds on the solution quality and (iv) an active matrix completion algorithm and its application to solve several variants of the active learning problem (transductive active learning, multi-label active learning, active feature acquisition and active learning for regression). These contributions are validated on the face recognition and facial expression recognition problems (which are commonly encountered in real world applications like robotics, security and assistive technology for the blind and the visually impaired) and also on collaborative filtering applications like movie recommendation.
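
To make the notion of batch mode selection concrete, the sketch below picks a fixed-size batch of unlabeled instances by ranking them on the entropy of a classifier's predicted class probabilities. This is a common baseline heuristic and is not one of the algorithms proposed in the dissertation; the function names and the example predictions are assumptions.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def select_batch(predictions, batch_size):
    """Pick the batch_size instances whose predictions are most uncertain.

    predictions is a list of (instance_id, class_probabilities) pairs.
    """
    ranked = sorted(predictions, key=lambda item: entropy(item[1]), reverse=True)
    return [instance_id for instance_id, _probs in ranked[:batch_size]]


if __name__ == "__main__":
    # Hypothetical classifier outputs for four unlabeled images.
    unlabeled = [
        ("img_a", [0.5, 0.5]),
        ("img_b", [0.9, 0.1]),
        ("img_c", [0.6, 0.4]),
        ("img_d", [1.0, 0.0]),
    ]
    print(select_batch(unlabeled, batch_size=2))
```

A greedy ranking like this fixes the batch size in advance and scores each instance in isolation, so the selected instances may be redundant with one another. Choosing the batch through a single joint formulation, as the contributions listed above do, is aimed at exactly these limitations.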

Date Created
2013

Clinically relevant classification and retrieval of diabetic retinopathy images

Description

Diabetic retinopathy (DR) is a common cause of blindness that results from the prolonged presence of diabetes. The risk of developing DR or having the disease progress increases over time. Despite advances in diabetes care over the years, DR remains a vision-threatening complication and one of the leading causes of blindness among American adults. Recent studies have shown that diagnosis based on digital retinal imaging has potential benefits over traditional face-to-face evaluation. Yet there is a dearth of computer-based systems that can match the level of performance achieved by ophthalmologists. This thesis takes a fresh perspective in developing a computer-based system aimed at improving the diagnosis of DR images. These images are categorized into three classes according to their severity level. The proposed approach explores effective methods to classify new images and to retrieve clinically relevant images from a database that has prior diagnosis information associated with them. Retrieval provides a novel way to utilize the vast knowledge in the archives of previously diagnosed DR images and thereby improve a clinician's performance, while classification can safely reduce the burden on DR screening programs and possibly achieve higher detection accuracy than human experts. To solve the three-class retrieval and classification problem, the approach uses a multi-class multiple-instance medical image retrieval framework that makes use of spectrally tuned color correlogram and steerable Gaussian filter response features. The results show better retrieval and classification performance than prior-art methods and are also observed to be of clinical and visual relevance.

Date Created
2012

Building adaptive computational systems for physiological and biomedical data

Description

In recent years, machine learning and data mining technologies have received growing attention in several areas such as recommendation systems, natural language processing, speech and handwriting recognition, image processing and the biomedical domain. Many of these applications that deal with physiological and biomedical data require person-specific or person-adaptive systems. The greatest challenge in developing such systems is the subject-based variability in physiological and biomedical data, which leads to differences in data distributions and makes modeling these data with traditional machine learning algorithms complex and challenging. As a result, despite the wide application of machine learning, efficient deployment of its principles to model real-world data is still a challenge. This dissertation addresses the problem of subject-based variability in physiological and biomedical data and proposes person-adaptive prediction models based on novel algorithms in transfer learning and active learning, two emerging fields in machine learning. One of the significant contributions of this dissertation is a person-adaptive method for early detection of muscle fatigue using Surface Electromyogram signals, based on a new multi-source transfer learning algorithm. This dissertation also proposes a subject-independent algorithm for grading the progression of muscle fatigue on a scale from 0 to 1 in a test subject, during isometric or dynamic contractions, in real time. Besides subject-based variability, biomedical image data also vary due to differences in imaging techniques, leading to distribution differences between image databases. Hence a classifier learned on one database may perform poorly on another.
Another significant contribution of this dissertation is the design and development of an efficient biomedical image data annotation framework, based on a novel combination of transfer learning and a new batch-mode active learning method, capable of addressing the distribution differences across databases. The methodologies developed in this dissertation are relevant and applicable to a large set of computing problems where there is a high variation of data between subjects or sources, such as face detection, pose detection and speech recognition. From a broader perspective, these frameworks can be viewed as a first step towards the design of automated adaptive systems for real world data.

Date Created
2013

Techniques for soundscape retrieval and synthesis

Description

The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is that of the soundscape: the acoustic environment as perceived by a listener. This dissertation examines the application of several computational tools in the realms of digital signal processing, multimedia information retrieval, and computer music synthesis to the analysis of the soundscape. These tools include a) an open source software library, Sirens, which can be used to segment long environmental field recordings into individual sonic events and to compare these events in terms of acoustic content, b) a graph-based retrieval system that can use these measures of acoustic similarity, together with measures of semantic similarity derived from the lexical database WordNet, to perform both text-based retrieval and automatic annotation of environmental sounds, and c) new techniques for the dynamic, real-time parametric morphing of multiple field recordings, informed by the geographic paths along which they were recorded.

Date Created
2013

Leveraging collective wisdom in a multilabeled blog categorization environment

Description

One of the most remarkable outcomes of the evolution of the web into Web 2.0 has been the propelling of blogging into a widely adopted and globally accepted phenomenon. While the unprecedented growth of the Blogosphere has added diversity and enriched the media, it has also added complexity. To cope with the relentless expansion, many enthusiastic bloggers have embarked on voluntarily writing, tagging, labeling, and cataloguing their posts in hopes of reaching the widest possible audience. Unbeknown to them, this reaching-for-others process triggers the generation of a new kind of collective wisdom, a result of shared collaboration and the exchange of ideas, purpose, and objectives through the formation of associations, links, and relations. A better understanding of the Blogosphere can greatly help serve the needs of the ever-growing number of these users, as well as producers, service providers, and advertisers, by facilitating the categorization and navigation of this vast environment. This work explores a novel method to leverage the collective wisdom from the infused label space for blog search and discovery. The work demonstrates that the wisdom space can provide a unique and desirable framework in which to discover the highly sought-after background information that could aid in the building of classifiers. This work incorporates this insight into the construction of a better clustering of blogs, which boosts the performance of classifiers in identifying more relevant labels for blogs, and offers a mechanism that can be used to replace spurious and incorrect labels in a multi-labeled space.

Date Created
2015

Perceptual-based locally adaptive noise and blur detection

Description

The quality of real-world visual content is typically impaired by many factors including image noise and blur. Detecting and analyzing these impairments are important steps for multiple computer vision tasks. This work focuses on perceptual-based locally adaptive noise and blur detection and their application to image restoration.

In the context of noise detection, this work proposes perceptual-based full-reference and no-reference objective image quality metrics by integrating perceptually weighted local noise into a probability summation model. Results are reported on both the LIVE and TID2008 databases. The proposed metrics consistently achieve good performance across noise types and across databases when compared with many of the best recent quality metrics. The proposed metrics are able to predict with high accuracy the relative amount of perceived noise in images of different content.

In the context of blur detection, existing approaches are either computationally costly or cannot perform reliably when dealing with the spatially-varying nature of defocus blur. In addition, many existing approaches do not take human perception into account. This work proposes a blur detection algorithm that is capable of detecting and quantifying the level of spatially-varying blur by integrating directional edge spread calculation, probability of blur detection and local probability summation. The proposed method generates a blur map indicating the relative amount of perceived local blurriness. In order to detect the flat or near-flat regions that do not contribute to perceivable blur, a perceptual model based on the Just Noticeable Difference (JND) is further integrated into the proposed blur detection algorithm to generate perceptually significant blur maps. The proposed method is compared with six other state-of-the-art blur detection methods. Experimental results show that the proposed method performs the best both visually and quantitatively.

This work further investigates the application of the proposed blur detection methods to image deblurring. Two selective perceptual-based image deblurring frameworks are proposed, to improve the image deblurring results and to reduce the restoration artifacts. In addition, an edge-enhanced super resolution algorithm is proposed, and is shown to achieve better reconstructed results for the edge regions.
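
The probability summation referred to in both the noise and blur parts of the abstract can be illustrated in its generic form: if local impairments are detected independently with probabilities p_i, the probability that at least one is detected is 1 minus the product of (1 - p_i). The sketch below applies that generic rule per image block to build a coarse map. It is an assumption that this captures the spirit of the thesis's model; the perceptual weighting, the edge spread computation and the JND model are not reproduced, and the function names and sample values are hypothetical.

```python
def probability_summation(local_probabilities):
    """Probability that at least one independent local event is detected."""
    not_detected = 1.0
    for p in local_probabilities:
        not_detected *= 1.0 - p
    return 1.0 - not_detected


def block_map(block_probabilities):
    """Pool per-edge detection probabilities into one value per block.

    block_probabilities is a 2D grid (list of rows); each cell holds the
    list of local detection probabilities measured inside that block.
    """
    return [
        [probability_summation(cell) for cell in row]
        for row in block_probabilities
    ]


if __name__ == "__main__":
    # Hypothetical 1x3 grid: a flat block with no edges, a mildly blurred
    # block and a strongly blurred block.
    grid = [[[], [0.1, 0.2], [0.5, 0.5, 0.5]]]
    print(block_map(grid))
```

A block with no measurable edges pools to zero in this sketch, which loosely mirrors the remark above that flat or near-flat regions should not contribute to perceivable blur.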

Date Created
2016

Design and development of an immersive virtual reality team trainer for advance cardiac life support

Description

Modern technology has made it possible for the learning of skills and behavior to be both widely disseminated and cheaply available. An example of this is virtual reality (VR) training. VR training allows learning to be provided often, in a safe simulated setting, and it may be delivered in a manner that makes it engaging while removing the need to purchase special equipment. This thesis presents a case study in the form of a time-critical, team-based medical scenario known as Advanced Cardiac Life Support (ACLS). A framework and methodology for the design of a VR trainer for ACLS is detailed. In addition, to provide a potentially engaging experience, the simulator was designed to incorporate immersive elements and a multimodal interface (haptic, visual, and auditory). A study was conducted to test two primary hypotheses: that a meaningful transfer of skill is achieved from VR training to real world mock codes, and that the presence of immersive components in VR leads to an increase in the performance gained. The participant pool consisted of 54 clinicians divided into 9 teams of 6 members each. The teams were categorized into three treatment groups: immersive VR (3 teams), minimally immersive VR (3 teams), and control (3 teams). The study was conducted in 4 phases, from a real world mock code pretest to assess baselines, through a 30-minute VR training session, to a final mock code to assess the performance change from the baseline. The minimally immersive teams served as the control for the immersive components. The teams were graded, in both VR and mock code sessions, using the evaluation metric used in real world mock codes. The study revealed that the immersive VR groups saw a greater performance gain from pretest to posttest than the minimally immersive and control groups in the case of the VFib/VTach scenario (~20% versus ~5%).
The immersive VR groups also had a greater performance gain than the minimally immersive groups from the first to the final session of VFib/VTach (29% versus -13%) and PEA (27% versus 15%).

Date Created
2012

Robust implementation of NL2KR system and it's application in iRODS domain

Description

Currently, to interact with computer-based systems one needs to learn the specific interface language of each system. In most cases, interaction would be much easier if it could be done in natural language. For that, a module is needed that understands natural language and automatically translates it to the interface language of the system. The NL2KR (Natural language to knowledge representation) v.1 system is a prototype of such a system. It is a learning-based system that learns new meanings of words in terms of lambda-calculus formulas, given an initial lexicon of some words and their meanings and a training corpus of sentences with their translations. As a part of this thesis, we take the prototype NL2KR v.1 system and enhance various components of it to make it usable for more substantial and useful interface languages. We revamped the lexicon learning components, the Inverse-lambda and Generalization modules, and redesigned the lexicon learning algorithm that uses these components to learn new meanings of words. Similarly, we redeveloped the inbuilt parser of the system in Answer Set Programming (ASP) and also integrated an external parser with the system. Apart from this, we added new features to the learning component of the NL2KR system, such as various system configurations and a memory cache. These enhancements helped in learning more meanings of words, boosted the performance of the system by reducing the computation time by a factor of 8, and improved the usability of the system. We evaluated the NL2KR system on the iRODS domain. iRODS is a rule-oriented data system that helps in managing large sets of computer files using policies. This system provides a rule-oriented interface language whose syntactic structure resembles that of a procedural programming language (e.g., C). However, direct translation of natural language (NL) to this interface language is difficult.
So, for automatic translation of NL to this language, we define a simple Intermediate Policy Declarative Language (IPDL) to represent the knowledge in the policies, which can then be directly translated to iRODS rules. We develop a corpus of 100 policy statements and manually translate them to the IPDL language. This corpus is then used for the evaluation of the NL2KR system. We performed 10-fold cross validation on the system. Furthermore, using this corpus, we illustrate how the different components of our NL2KR system work.
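
The evaluation protocol mentioned above, 10-fold cross validation over a corpus of 100 statements, can be sketched as follows. How the thesis actually partitioned its corpus is not stated, so the interleaved split, the function name and the placeholder statements below are assumptions.

```python
def k_fold_splits(items, k=10):
    """Yield (train, test) pairs so that each item is tested exactly once."""
    # Interleaved folds: item i goes to fold i % k.
    folds = [items[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            item
            for index, fold in enumerate(folds)
            if index != held_out
            for item in fold
        ]
        yield train, test


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 100 policy statements.
    corpus = [f"policy_{n}" for n in range(100)]
    for train, test in k_fold_splits(corpus, k=10):
        print(len(train), len(test))
```

With 100 statements and 10 folds, each round would train on 90 statement-translation pairs and test on the remaining 10, so every statement contributes to the reported result exactly once as a test item.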

Date Created
2013