Matching Items (86)
Recent advances in camera architectures and associated mathematical representations now enable compressive acquisition of images and videos at low data-rates. While most computer vision applications of today are composed of conventional cameras, which collect a large amount redundant data and power hungry embedded systems, which compress the collected data for further processing, compressive cameras offer the advantage of direct acquisition of data in compressed domain and hence readily promise to find applicability in computer vision, particularly in environments hampered by limited communication bandwidths. However, despite the significant progress in theory and methods of compressive sensing, little headway has been made in developing systems for such applications by exploiting the merits of compressive sensing. In such a setting, we consider the problem of activity recognition, which is an important inference problem in many security and surveillance applications. Since all successful activity recognition systems involve detection of human, followed by recognition, a potential fully functioning system motivated by compressive camera would involve the tracking of human, which requires the reconstruction of atleast the initial few frames to detect the human. Once the human is tracked, the recognition part of the system requires only the features to be extracted from the tracked sequences, which can be the reconstructed images or the compressed measurements of such sequences. However, it is desirable in resource constrained environments that these features be extracted from the compressive measurements without reconstruction. Motivated by this, in this thesis, we propose a framework for understanding activities as a non-linear dynamical system, and propose a robust, generalizable feature that can be extracted directly from the compressed measurements without reconstructing the original video frames. The proposed feature is termed recurrence texture and is motivated from recurrence analysis of non-linear dynamical systems. We show that it is possible to obtain discriminative features directly from the compressed stream and show its utility in recognition of activities at very low data rates.
Diabetic retinopathy (DR) is a common cause of blindness occurring due to prolonged presence of diabetes. The risk of developing DR or having the disease progress is increasing over time. Despite advances in diabetes care over the years, DR remains a vision-threatening complication and one of the leading causes of blindness among American adults. Recent studies have shown that diagnosis based on digital retinal imaging has potential benefits over traditional face-to-face evaluation. Yet there is a dearth of computer-based systems that can match the level of performance achieved by ophthalmologists. This thesis takes a fresh perspective in developing a computer-based system aimed at improving diagnosis of DR images. These images are categorized into three classes according to their severity level. The proposed approach explores effective methods to classify new images and retrieve clinically-relevant images from a database with prior diagnosis information associated with them. Retrieval provides a novel way to utilize the vast knowledge in the archives of previously-diagnosed DR images and thereby improve a clinician's performance while classification can safely reduce the burden on DR screening programs and possibly achieve higher detection accuracy than human experts. To solve the three-class retrieval and classification problem, the approach uses a multi-class multiple-instance medical image retrieval framework that makes use of spectrally tuned color correlogram and steerable Gaussian filter response features. The results show better retrieval and classification performances than prior-art methods and are also observed to be of clinical and visual relevance.
Audio signals, such as speech and ambient sounds convey rich information pertaining to a user’s activity, mood or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging due to its subjective nature, hence, requiring sophisticated techniques. This dissertation presents a set of computational methods, that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and ambient sounds-based applications such as lifelogging.
The expression and perception of emotions varies across speakers and cultures, thus, determining features and classification methods that generalize well to different conditions is strongly desired. A latent topic models-based method is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches over multiple databases. Cross-corpus studies are conducted to determine the ability of these features to generalize well across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech only approach and further improves the recognition performance.
Besides affecting the acoustic properties of speech, emotions have a strong influence over speech articulation kinematics. A learning approach, which constrains a classifier trained over acoustic descriptors, to also model articulatory data is proposed here. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.
Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation and annotation techniques capable of efficiently handling long duration audio recordings; a complete framework for such applications is presented. The performance is evaluated on real world data and accompanied by a prototypical Android-based user interface.
The proposed methods are also assessed in terms of computation and implementation complexity. Software and field programmable gate array based implementations are considered for emotion recognition, while virtual platforms are used to model the complexities of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.
Software has a great impact on the energy efficiency of any computing system--it can manage the components of a system efficiently or inefficiently. The impact of software is amplified in the context of a wearable computing system used for activity recognition. The design space this platform opens up is immense and encompasses sensors, feature calculations, activity classification algorithms, sleep schedules, and transmission protocols. Design choices in each of these areas impact energy use, overall accuracy, and usefulness of the system. This thesis explores methods software can influence the trade-off between energy consumption and system accuracy. In general the more energy a system consumes the more accurate will be. We explore how finding the transitions between human activities is able to reduce the energy consumption of such systems without reducing much accuracy. We introduce the Log-likelihood Ratio Test as a method to detect transitions, and explore how choices of sensor, feature calculations, and parameters concerning time segmentation affect the accuracy of this method. We discovered an approximate 5X increase in energy efficiency could be achieved with only a 5% decrease in accuracy. We also address how a system's sleep mode, in which the processor enters a low-power state and sensors are turned off, affects a wearable computing platform that does activity recognition. We discuss the energy trade-offs in each stage of the activity recognition process. We find that careful analysis of these parameters can result in great increases in energy efficiency if small compromises in overall accuracy can be tolerated. We call this the ``Great Compromise.'' We found a 6X increase in efficiency with a 7% decrease in accuracy. We then consider how wireless transmission of data affects the overall energy efficiency of a wearable computing platform. We find that design decisions such as feature calculations and grouping size have a great impact on the energy consumption of the system because of the amount of data that is stored and transmitted. For example, storing and transmitting vector-based features such as FFT or DCT do not compress the signal and would use more energy than storing and transmitting the raw signal. The effect of grouping size on energy consumption depends on the feature. For scalar features energy consumption is proportional in the inverse of grouping size, so it's reduced as grouping size goes up. For features that depend on the grouping size, such as FFT, energy increases with the logarithm of grouping size, so energy consumption increases slowly as grouping size increases. We find that compressing data through activity classification and transition detection significantly reduces energy consumption and that the energy consumed for the classification overhead is negligible compared to the energy savings from data compression. We provide mathematical models of energy usage and data generation, and test our ideas using a mobile computing platform, the Texas Instruments Chronos watch.
Continuous monitoring of sensor data from smart phones to identify human activities and gestures, puts a heavy load on the smart phone's power consumption. In this research study, the non-Euclidean geometry of the rich sensor data obtained from the user's smart phone is utilized to perform compressive analysis and efficient classification of human activities by employing machine learning techniques. We are interested in the generalization of classical tools for signal approximation to newer spaces, such as rotation data, which is best studied in a non-Euclidean setting, and its application to activity analysis. Attributing to the non-linear nature of the rotation data space, which involve a heavy overload on the smart phone's processor and memory as opposed to feature extraction on the Euclidean space, indexing and compaction of the acquired sensor data is performed prior to feature extraction, to reduce CPU overhead and thereby increase the lifetime of the battery with a little loss in recognition accuracy of the activities. The sensor data represented as unit quaternions, is a more intrinsic representation of the orientation of smart phone compared to Euler angles (which suffers from Gimbal lock problem) or the computationally intensive rotation matrices. Classification algorithms are employed to classify these manifold sequences in the non-Euclidean space. By performing customized indexing (using K-means algorithm) of the evolved manifold sequences before feature extraction, considerable energy savings is achieved in terms of smart phone's battery life.
Today's world is seeing a rapid technological advancement in various fields, having access to faster computers and better sensing devices. With such advancements, the task of recognizing human activities has been acknowledged as an important problem, with a wide range of applications such as surveillance, health monitoring and animation. Traditional approaches to dynamical modeling have included linear and nonlinear methods with their respective drawbacks. An alternative idea I propose is the use of descriptors of the shape of the dynamical attractor as a feature representation for quantification of nature of dynamics. The framework has two main advantages over traditional approaches: a) representation of the dynamical system is derived directly from the observational data, without any inherent assumptions, and b) the proposed features show stability under different time-series lengths where traditional dynamical invariants fail.
Approximately 1\% of the total world population are stroke survivors, making it the most common neurological disorder. This increasing demand for rehabilitation facilities has been seen as a significant healthcare problem worldwide. The laborious and expensive process of visual monitoring by physical therapists has motivated my research to invent novel strategies to supplement therapy received in hospital in a home-setting. In this direction, I propose a general framework for tuning component-level kinematic features using therapists’ overall impressions of movement quality, in the context of a Home-based Adaptive Mixed Reality Rehabilitation (HAMRR) system.
The rapid technological advancements in computing and sensing has resulted in large amounts of data which requires powerful tools to analyze. In the recent past, topological data analysis methods have been investigated in various communities, and the work by Carlsson establishes that persistent homology can be used as a powerful topological data analysis approach for effectively analyzing large datasets. I have explored suitable topological data analysis methods and propose a framework for human activity analysis utilizing the same for applications such as action recognition.
This work examines two main areas in model-based time-varying signal processing with emphasis in speech processing applications. The first area concentrates on improving speech intelligibility and on increasing the proposed methodologies application for clinical practice in speech-language pathology. The second area concentrates on signal expansions matched to physical-based models but without requiring independent basis functions; the significance of this work is demonstrated with speech vowels.
A fully automated Vowel Space Area (VSA) computation method is proposed that can be applied to any type of speech. It is shown that the VSA provides an efficient and reliable measure and is correlated to speech intelligibility. A clinical tool that incorporates the automated VSA was proposed for evaluation and treatment to be used by speech language pathologists. Two exploratory studies are performed using two databases by analyzing mean formant trajectories in healthy speech for a wide range of speakers, dialects, and coarticulation contexts. It is shown that phonemes crowded in formant space can often have distinct trajectories, possibly due to accurate perception.
A theory for analyzing time-varying signals models with amplitude modulation and frequency modulation is developed. Examples are provided that demonstrate other possible signal model decompositions with independent basis functions and corresponding physical interpretations. The Hilbert transform (HT) and the use of the analytic form of a signal are motivated, and a proof is provided to show that a signal can still preserve desirable mathematical properties without the use of the HT. A visualization of the Hilbert spectrum is proposed to aid in the interpretation. A signal demodulation is proposed and used to develop a modified Empirical Mode Decomposition (EMD) algorithm.
The increased risk of falling and the worse ability to perform other daily physical activities in the elderly cause concern about monitoring and correcting basic everyday movement. In this thesis, a Kinect-based system was designed to assess one of the most important factors in balance control of human body when doing Sit-to-Stand (STS) movement: the postural symmetry in mediolateral direction. A symmetry score, calculated by the data obtained from a Kinect RGB-D camera, was proposed to reflect the mediolateral postural symmetry degree and was used to drive a real-time audio feedback designed in MAX/MSP to help users adjust themselves to perform their movement in a more symmetrical way during STS. The symmetry score was verified by calculating the Spearman correlation coefficient with the data obtained from Inertial Measurement Unit (IMU) sensor and got an average value at 0.732. Five healthy adults, four males and one female, with normal balance abilities and with no musculoskeletal disorders, were selected to participate in the experiment and the results showed that the low-cost Kinect-based system has the potential to train users to perform a more symmetrical movement in mediolateral direction during STS movement.
There has been tremendous technological advancement in the past two decades. Faster computers and improved sensing devices have broadened the research scope in computer vision. With these developments, the task of assessing the quality of human actions, is considered an important problem that needs to be tackled. Movement quality assessment finds wide range of application in motor control, health-care, rehabilitation and physical therapy. Home-based interactive physical therapy requires the ability to monitor, inform and assess the quality of everyday movements. Obtaining labeled data from trained therapists/experts is the main limitation, since it is both expensive and time consuming.
Motivated by recent studies in motor control and therapy, in this thesis an existing computational framework is used to assess balance impairment and disease severity in people suffering from Parkinson's disease. The framework uses high-dimensional shape descriptors of the reconstructed phase space, of the subjects' center of pressure (CoP) tracings while performing dynamical postural shifts. The performance of the framework is evaluated using a dataset collected from 43 healthy and 17 Parkinson's disease impaired subjects, and outperforms other methods, such as dynamical shift indices and use of chaotic invariants, in assessment of balance impairment.
In this thesis, an unsupervised method is also proposed that measures movement quality assessment of simple actions like sit-to-stand and dynamic posture shifts by modeling the deviation of a given movement from an ideal movement path in the configuration space, i.e. the quality of movement is directly related to similarity to the ideal trajectory, between the start and end pose. The S^1xS^1 configuration space was used to model the interaction of two joint angles in sit-to-stand actions, and the R^2 space was used to model the subject's CoP while performing dynamic posture shifts for application in movement quality estimation.
This thesis aims to explore the language of different bodies in the field of dance by analyzing
the habitual patterns of dancers from different backgrounds and vernaculars. Contextually,
the term habitual patterns is defined as the postures or poses that tend to re-appear,
often unintentionally, as the dancer performs improvisational dance. The focus lies in exposing
the movement vocabulary of a dancer to reveal his/her unique fingerprint.
The proposed approach for uncovering these movement patterns is to use a clustering
technique; mainly k-means. In addition to a static method of analysis, this paper uses
an online method of clustering using a streaming variant of k-means that integrates into
the flow of components that can be used in a real-time interactive dance performance. The
computational system is trained by the dancer to discover identifying patterns and therefore
it enables a feedback loop resulting in a rich exchange between dancer and machine. This
can help break a dancer’s tendency to create similar postures, explore larger kinespheric
space and invent movement beyond their current capabilities.
This paper describes a project that distinguishes itself in that it uses a custom database
that is curated for the purpose of highlighting the similarities and differences between various
movement forms. It puts particular emphasis on the process of choosing source movement
qualitatively, before the technological capture process begins.