Matching Items (83)

Description

Recent advances in camera architectures and associated mathematical representations now enable compressive acquisition of images and videos at low data rates. While most computer vision applications today rely on conventional cameras, which collect a large amount of redundant data, and on power-hungry embedded systems, which compress the collected data for further processing, compressive cameras offer the advantage of acquiring data directly in the compressed domain and hence readily promise applicability in computer vision, particularly in environments hampered by limited communication bandwidth. However, despite significant progress in the theory and methods of compressive sensing, little headway has been made in developing systems that exploit its merits for such applications. In this setting, we consider the problem of activity recognition, an important inference problem in many security and surveillance applications. Since all successful activity recognition systems involve detecting humans followed by recognition, a fully functioning system built around a compressive camera would need to track humans, which requires reconstructing at least the first few frames in order to detect them. Once a human is tracked, the recognition part of the system requires only features extracted from the tracked sequences, which can be either the reconstructed images or the compressed measurements of those sequences. In resource-constrained environments, however, it is desirable to extract these features from the compressive measurements without reconstruction. Motivated by this, in this thesis we propose a framework for understanding activities as a non-linear dynamical system, and propose a robust, generalizable feature that can be extracted directly from the compressed measurements without reconstructing the original video frames.
The proposed feature, termed recurrence texture, is motivated by recurrence analysis of non-linear dynamical systems. We show that discriminative features can be obtained directly from the compressed stream and demonstrate their utility for recognizing activities at very low data rates.
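Recurrence analysis starts from a recurrence matrix, which records when the system revisits a nearby state. As a minimal sketch of that starting point (not the thesis's recurrence texture feature itself), the code below sets R[i][j] = 1 when states i and j lie within a distance eps of each other. It assumes each frame's compressed measurement vector is treated as one state; the threshold eps and the function names are hypothetical.

```python
import math

def recurrence_matrix(states, eps):
    """Binary recurrence matrix: R[i][j] = 1 if ||x_i - x_j|| <= eps, else 0.

    `states` is a sequence of equal-length tuples of floats, e.g. one
    compressed measurement vector per video frame (an assumption here).
    """
    n = len(states)
    R = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Euclidean distance between the two states
            R[i][j] = 1 if math.dist(states[i], states[j]) <= eps else 0
    return R

def recurrence_rate(R):
    """Fraction of state pairs that recur (density of ones in R)."""
    n = len(R)
    return sum(map(sum, R)) / (n * n)
```

A texture descriptor would then be computed on this matrix or on the unthresholded distance matrix; the abstract does not specify which descriptor the thesis uses.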

Contributors: Kulkarni, Kuldeep Sharad (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Frakes, David (Committee member) / Arizona State University (Publisher)
Created: 2012
Description

Diabetic retinopathy (DR) is a common cause of blindness resulting from long-standing diabetes. The risk of developing DR, or of the disease progressing, increases over time. Despite advances in diabetes care over the years, DR remains a vision-threatening complication and one of the leading causes of blindness among American adults. Recent studies have shown that diagnosis based on digital retinal imaging has potential benefits over traditional face-to-face evaluation. Yet there is a dearth of computer-based systems that can match the performance of ophthalmologists. This thesis takes a fresh perspective in developing a computer-based system aimed at improving the diagnosis of DR images, which are categorized into three classes according to their severity level. The proposed approach explores effective methods to classify new images and to retrieve clinically relevant images from a database in which each image carries its prior diagnosis. Retrieval provides a novel way to use the knowledge stored in archives of previously diagnosed DR images and thereby improve a clinician's performance, while classification can safely reduce the burden on DR screening programs and possibly achieve higher detection accuracy than human experts. To solve the three-class retrieval and classification problem, the approach uses a multi-class, multiple-instance medical image retrieval framework built on spectrally tuned color correlogram and steerable Gaussian filter response features. The results show better retrieval and classification performance than prior-art methods and are also observed to be of clinical and visual relevance.
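A color correlogram describes how colors co-occur at given pixel distances. The sketch below computes the plain color auto-correlogram: for each color c and distance d, the probability that a pixel at chessboard distance d from a pixel of color c also has color c. It assumes the image is already quantized to integer color indices and does not reproduce the thesis's spectral tuning or steerable Gaussian filter features; the function name is hypothetical.

```python
def auto_correlogram(img, num_colors, distances):
    """Plain color auto-correlogram of a quantized image.

    img: 2D list of color indices in range(num_colors).
    Returns {d: [P(color c at distance d | color c) for c in range(num_colors)]}.
    """
    h, w = len(img), len(img[0])
    result = {}
    for d in distances:
        hits = [0] * num_colors
        totals = [0] * num_colors
        for y in range(h):
            for x in range(w):
                c = img[y][x]
                # Visit the ring of pixels at exactly chessboard distance d.
                for dy in range(-d, d + 1):
                    for dx in range(-d, d + 1):
                        if max(abs(dy), abs(dx)) != d:
                            continue
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            totals[c] += 1
                            if img[yy][xx] == c:
                                hits[c] += 1
        # Colors absent from the image get probability 0.0.
        result[d] = [hits[k] / totals[k] if totals[k] else 0.0
                     for k in range(num_colors)]
    return result
```

For a 2x2 checkerboard, each pixel has one same-colored neighbor (the diagonal) out of three at d = 1, so both colors score 1/3.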

Contributors: Chandakkar, Parag Shridhar (Author) / Li, Baoxin (Thesis advisor) / Turaga, Pavan (Committee member) / Frakes, David (Committee member) / Arizona State University (Publisher)
Created: 2012
Description

Audio signals, such as speech and ambient sounds, convey rich information pertaining to a user’s activity, mood or intent. Enabling machines to understand this contextual information is necessary to bridge the gap in human-machine interaction. This is challenging because of the subjective nature of such information and therefore requires sophisticated techniques. This dissertation presents a set of computational methods that generalize well across different conditions, for speech-based applications involving emotion recognition and keyword detection, and for ambient sound-based applications such as lifelogging.

The expression and perception of emotions vary across speakers and cultures; thus, features and classification methods that generalize well to different conditions are strongly desired. A method based on latent topic models is proposed to learn supra-segmental features from low-level acoustic descriptors. The derived features outperform state-of-the-art approaches on multiple databases. Cross-corpus studies are conducted to determine how well these features generalize across different databases. The proposed method is also applied to derive features from facial expressions; a multi-modal fusion overcomes the deficiencies of a speech-only approach and further improves recognition performance.

Besides affecting the acoustic properties of speech, emotions have a strong influence on speech articulation kinematics. A learning approach is proposed here that constrains a classifier trained on acoustic descriptors to also model articulatory data. This method requires articulatory information only during the training stage, thus overcoming the challenges inherent to large-scale data collection, while simultaneously exploiting the correlations between articulation kinematics and acoustic descriptors to improve the accuracy of emotion recognition systems.

Identifying context from ambient sounds in a lifelogging scenario requires feature extraction, segmentation and annotation techniques capable of efficiently handling long-duration audio recordings; a complete framework for such applications is presented. The framework's performance is evaluated on real-world data, and the framework is accompanied by a prototype Android-based user interface.

The proposed methods are also assessed in terms of computational and implementation complexity. Software and field-programmable gate array (FPGA) based implementations are considered for emotion recognition, while virtual platforms are used to model the complexity of lifelogging. The derived metrics are used to determine the feasibility of these methods for applications requiring real-time capabilities and low power consumption.

Contributors: Shah, Mohit (Author) / Spanias, Andreas (Thesis advisor) / Chakrabarti, Chaitali (Thesis advisor) / Berisha, Visar (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2015
Description

Software has a great impact on the energy efficiency of any computing system: it can manage the components of a system efficiently or inefficiently. The impact of software is amplified in a wearable computing system used for activity recognition. The design space this platform opens up is immense and encompasses sensors, feature calculations, activity classification algorithms, sleep schedules, and transmission protocols. Design choices in each of these areas affect energy use, overall accuracy, and the usefulness of the system. This thesis explores ways in which software can influence the trade-off between energy consumption and system accuracy. In general, the more energy a system consumes, the more accurate it will be. We explore how detecting the transitions between human activities can reduce the energy consumption of such systems without a large loss in accuracy. We introduce the Log-likelihood Ratio Test as a method to detect transitions, and explore how the choice of sensor, feature calculations, and time-segmentation parameters affect the accuracy of this method. We found that an approximately 5X increase in energy efficiency could be achieved with only a 5% decrease in accuracy. We also address how a system's sleep mode, in which the processor enters a low-power state and the sensors are turned off, affects a wearable computing platform that performs activity recognition. We discuss the energy trade-offs in each stage of the activity recognition process. We find that careful analysis of these parameters can yield large increases in energy efficiency if small compromises in overall accuracy can be tolerated. We call this the "Great Compromise." We found a 6X increase in efficiency with a 7% decrease in accuracy. We then consider how wireless transmission of data affects the overall energy efficiency of a wearable computing platform.
We find that design decisions such as feature calculations and grouping size have a great impact on the energy consumption of the system because of the amount of data that is stored and transmitted. For example, storing and transmitting vector-based features such as the FFT or DCT does not compress the signal and would use more energy than storing and transmitting the raw signal. The effect of grouping size on energy consumption depends on the feature. For scalar features, energy consumption is proportional to the inverse of grouping size, so it decreases as grouping size increases. For features that depend on the grouping size, such as the FFT, energy increases with the logarithm of grouping size, so energy consumption grows slowly as grouping size increases. We find that compressing data through activity classification and transition detection significantly reduces energy consumption, and that the energy consumed by the classification overhead is negligible compared with the savings from data compression. We provide mathematical models of energy usage and data generation, and test our ideas using a mobile computing platform, the Texas Instruments Chronos watch.
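The transition detector above is a log-likelihood ratio test. As a minimal sketch, assuming a single scalar feature stream and a Gaussian model (the thesis's exact features, sensors, and segmentation are not given here), the code compares two hypotheses for a pair of adjacent windows: one Gaussian explains both, or each window has its own. With maximum-likelihood variances the constant terms cancel, leaving LLR = ½[n·ln σ₀² − n₁·ln σ₁² − n₂·ln σ₂²]. The window length, threshold, and function names are hypothetical.

```python
import math

def _mle_var(xs):
    """Maximum-likelihood (biased) variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def gaussian_llr(left, right, floor=1e-9):
    """LLR of 'two Gaussians' vs 'one Gaussian' for adjacent windows.

    Variances are floored to avoid log(0) on perfectly constant windows.
    """
    both = left + right
    v0 = max(_mle_var(both), floor)
    v1 = max(_mle_var(left), floor)
    v2 = max(_mle_var(right), floor)
    return 0.5 * (len(both) * math.log(v0)
                  - len(left) * math.log(v1)
                  - len(right) * math.log(v2))

def detect_transitions(signal, window, threshold):
    """Slide two adjacent windows over the signal and return every
    boundary index t whose LLR exceeds the threshold."""
    hits = []
    for t in range(window, len(signal) - window + 1):
        if gaussian_llr(signal[t - window:t], signal[t:t + window]) > threshold:
            hits.append(t)
    return hits
```

Detections cluster around a true change point, because windows that straddle it also look inhomogeneous; a real system would keep only the local maximum of the LLR in each cluster.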

Contributors: Boyd, Jeffrey Michael (Author) / Sundaram, Hari (Thesis advisor) / Li, Baoxin (Thesis advisor) / Shrivastava, Aviral (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

Continuous monitoring of smartphone sensor data to identify human activities and gestures places a heavy load on the smartphone's power consumption. In this research study, the non-Euclidean geometry of the rich sensor data obtained from the user's smartphone is used to perform compressive analysis and efficient classification of human activities with machine learning techniques. We are interested in generalizing classical tools for signal approximation to newer spaces, such as the space of rotation data, which is best studied in a non-Euclidean setting, and in applying them to activity analysis. Because the rotation data space is non-linear, feature extraction on it places a heavier load on the smartphone's processor and memory than feature extraction in Euclidean space; the acquired sensor data are therefore indexed and compacted before feature extraction to reduce CPU overhead and thereby extend battery life, with a small loss in activity recognition accuracy. Unit quaternions, used here to represent the sensor data, are a more intrinsic representation of the smartphone's orientation than Euler angles (which suffer from gimbal lock) or computationally intensive rotation matrices. Classification algorithms are employed to classify these manifold sequences in the non-Euclidean space. By performing customized indexing (using the K-means algorithm) of the evolved manifold sequences before feature extraction, considerable savings in smartphone battery life are achieved.
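The indexing step quantizes quaternion sequences with K-means before feature extraction. The sketch below is one plausible reading, not the thesis's customized algorithm: it uses the distance 1 − |⟨p, q⟩|, which treats q and −q (the same rotation) as identical, aligns signs before averaging, and replaces each orientation sample by the index of its nearest codeword. The naive initialization (first k samples), the iteration count, and all names are assumptions.

```python
import math

def normalize(q):
    """Scale a 4-tuple to a unit quaternion."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def quat_distance(p, q):
    """1 - |<p, q>|: zero for identical rotations, sign-invariant (q ~ -q)."""
    return 1.0 - abs(sum(a * b for a, b in zip(p, q)))

def kmeans_quaternions(quats, k, iters=20):
    """Sign-aware K-means codebook on unit quaternions (naive init)."""
    quats = [normalize(q) for q in quats]
    centers = quats[:k]  # assumption: first k samples as initial codewords
    for _ in range(iters):
        sums = [[0.0] * 4 for _ in range(k)]
        for q in quats:
            j = min(range(k), key=lambda c: quat_distance(q, centers[c]))
            # Flip q onto the same hemisphere as its center before averaging.
            sign = 1.0 if sum(a * b for a, b in zip(q, centers[j])) >= 0 else -1.0
            for i in range(4):
                sums[j][i] += sign * q[i]
        # Empty clusters keep their previous center.
        centers = [normalize(s) if any(s) else centers[j]
                   for j, s in enumerate(sums)]
    return centers

def index_sequence(quats, centers):
    """Replace each orientation sample by its nearest codeword index."""
    return [min(range(len(centers)),
                key=lambda c: quat_distance(normalize(q), centers[c]))
            for q in quats]
```

Features are then computed on the short integer index sequence rather than on the raw manifold-valued samples, which is where the CPU savings would come from.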

Contributors: Sivakumar, Aswin (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

Today's world is seeing rapid technological advancement in many fields, with access to faster computers and better sensing devices. With these advancements, recognizing human activities has been acknowledged as an important problem, with a wide range of applications such as surveillance, health monitoring and animation. Traditional approaches to dynamical modeling have included linear and nonlinear methods, each with its own drawbacks. As an alternative, I propose using descriptors of the shape of the dynamical attractor as a feature representation that quantifies the nature of the dynamics. The framework has two main advantages over traditional approaches: a) the representation of the dynamical system is derived directly from the observational data, without any inherent assumptions, and b) the proposed features remain stable across different time-series lengths, where traditional dynamical invariants fail.
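Describing the shape of an attractor first requires reconstructing the attractor from observed data. The abstract does not say how this is done; a standard choice is Takens-style time-delay embedding, sketched below, which turns a scalar series into points [x(t), x(t+τ), ..., x(t+(m−1)τ)]. The embedding dimension, lag, and function name are assumptions for illustration.

```python
def delay_embed(x, dim, lag):
    """Time-delay embedding of a scalar series.

    Returns a list of dim-tuples (x[t], x[t+lag], ..., x[t+(dim-1)*lag]),
    one per valid start index t.
    """
    n = len(x) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("series too short for this dim/lag")
    return [tuple(x[t + i * lag] for i in range(dim)) for t in range(n)]
```

For example, a sine wave embedded in two dimensions with a lag of a quarter period traces a circle, since x(t + T/4) is the cosine of the same phase; shape descriptors would then be computed on such point clouds.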

Approximately 1% of the world population are stroke survivors, making stroke the most common neurological disorder. The resulting increase in demand for rehabilitation facilities is seen as a significant healthcare problem worldwide. The laborious and expensive process of visual monitoring by physical therapists has motivated my research into novel strategies that supplement in-hospital therapy in a home setting. In this direction, I propose a general framework for tuning component-level kinematic features using therapists’ overall impressions of movement quality, in the context of a Home-based Adaptive Mixed Reality Rehabilitation (HAMRR) system.

Rapid technological advancements in computing and sensing have produced large amounts of data that require powerful tools to analyze. Topological data analysis methods have recently been investigated in various communities, and the work by Carlsson establishes persistent homology as a powerful topological data analysis approach for analyzing large datasets. I have explored suitable topological data analysis methods and propose a framework that uses them for human activity analysis, in applications such as action recognition.
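Persistent homology tracks topological features of a point cloud as a distance scale grows. Full computations (loops, voids) normally use a dedicated library, but the 0-dimensional part, which tracks connected components of a Vietoris–Rips filtration, reduces to single-linkage merging. The sketch below computes only that part with union-find and is not the thesis's framework; the function name is hypothetical.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence pairs (birth, death) for a Rips filtration.

    Every point is born at scale 0; a component dies at the edge length
    where it merges into another. One component never dies (death = inf).
    """
    parent = list(range(len(points)))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components
            parent[ri] = rj
            deaths.append(d)
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]
```

Long-lived pairs (large death values) indicate well-separated clusters, for example in points produced by the delay embedding of an activity signal.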

Contributors: Venkataraman, Vinay (Author) / Turaga, Pavan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Krishnamurthi, Narayanan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

The generation following post-modernism has been left with little to the imagination. In a world defined by continual technological distraction, Millennials absorb their world primarily through visual media. Where, then, is there a place for poetry, and how do writers reconcile themselves to a narcissistic world monopolized by "selfies" and virtual communication? How does a poet use the "I" selflessly in order to achieve the universal? "Poetry as a Development of Human Empathy" attempts to bridge the divide between everyday society and poets, a divide that has been growing since experimental writing became more widely accepted after the atomic bomb, while exploring how poetry has alienated itself as an art and how poets might find their way back to being an important force in the world.

Contributors: Asdel, Bryan (Author) / Dubie, Norman (Thesis advisor) / Rios, Alberto (Committee member) / Goldberg, Beckian (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

Raised on card-catalogues, then expected to save the world with microchips, there is a generation that was left straddling two millennia. Often lumped in with the X’ers or Millennials, this generation didn’t grow up with or without technology; technology grew up with them. The poems in The Aerodynamics of Hunger strike a balance between the easy-going materialism of the ’90s and our current culture of instant gratification, between the tendency to treat science like a God and to prescribe God like science. These poems see straight through the world of hypersex and click-bait, yet they admit their complicity in its creation and distribution. They watch the world become connected on a new level, but testify to the resulting struggle to place oneself in relation to something, anything. The burden is great, but journeying through it is an undeniable pleasure.

Contributors: Bassett, Kyle (Author) / Rios, Alberto A (Thesis advisor) / Dubie, Norman (Committee member) / Bell, Matthew (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

This work examines two main areas in model-based time-varying signal processing, with emphasis on speech processing applications. The first area concentrates on improving speech intelligibility and on increasing the applicability of the proposed methodologies to clinical practice in speech-language pathology. The second area concentrates on signal expansions matched to physically based models but without requiring independent basis functions; the significance of this work is demonstrated with speech vowels.

A fully automated Vowel Space Area (VSA) computation method is proposed that can be applied to any type of speech. It is shown that the VSA provides an efficient and reliable measure that is correlated with speech intelligibility. A clinical tool that incorporates the automated VSA is proposed for use by speech-language pathologists in evaluation and treatment. Two exploratory studies are performed on two databases, analyzing mean formant trajectories in healthy speech across a wide range of speakers, dialects, and coarticulation contexts. It is shown that phonemes crowded in formant space can often have distinct trajectories, possibly due to accurate perception.
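For reference, the classic (non-automated) VSA is the area of the quadrilateral formed by the mean (F1, F2) values of the corner vowels /i/, /æ/, /ɑ/, /u/ in the formant plane. The sketch below computes that area with the shoelace formula; it is not the thesis's automated method, and the dictionary keys and function names are hypothetical. Area units are Hz² when formants are given in Hz.

```python
def polygon_area(vertices):
    """Shoelace formula for a simple polygon with vertices listed in order."""
    n = len(vertices)
    s = 0.0
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def vowel_space_area(corner_formants):
    """Classic quadrilateral VSA.

    corner_formants maps 'i', 'ae', 'a', 'u' to mean (F1, F2) pairs.
    The order i -> ae -> a -> u walks around the quadrilateral:
    low F1/high F2, high F1/high F2, high F1/low F2, low F1/low F2.
    """
    order = ["i", "ae", "a", "u"]
    return polygon_area([corner_formants[v] for v in order])
```

Because the shoelace formula needs the vertices in boundary order, the fixed vowel ordering matters; a reduced (more centralized) vowel space yields a smaller area.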

A theory for analyzing time-varying signal models with amplitude modulation and frequency modulation is developed. Examples are provided that demonstrate other possible signal model decompositions with independent basis functions and corresponding physical interpretations. The Hilbert transform (HT) and the use of the analytic form of a signal are motivated, and a proof is provided showing that a signal can still preserve desirable mathematical properties without the use of the HT. A visualization of the Hilbert spectrum is proposed to aid interpretation. A signal demodulation method is proposed and used to develop a modified Empirical Mode Decomposition (EMD) algorithm.
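As a baseline for the HT-based analysis motivated above, the conventional route to AM-FM demodulation forms the analytic signal by zeroing negative frequencies, then reads instantaneous amplitude as its magnitude and instantaneous frequency as the derivative of its unwrapped phase. The sketch below implements that standard construction with NumPy; it is not the thesis's HT-free demodulation or its modified EMD, and the function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via FFT: keep DC (and Nyquist), double positive
    frequencies, zero negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def am_fm_demodulate(x, fs):
    """Instantaneous amplitude (per sample) and frequency in Hz (per step)."""
    z = analytic_signal(x)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, inst_freq
```

For a cosine with an integer number of cycles in the window, the analytic signal is exactly a complex exponential, so the amplitude is constant and the frequency equals the tone's; real speech needs windowing, and edge effects appear when the signal is not periodic in the window.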

Contributors: Sandoval, Steven, 1984- (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Liss, Julie M (Committee member) / Turaga, Pavan (Committee member) / Kovvali, Narayan (Committee member) / Arizona State University (Publisher)
Created: 2016
Description

The increased risk of falling and the reduced ability to perform other daily physical activities among the elderly raise concern about monitoring and correcting basic everyday movement. In this thesis, a Kinect-based system was designed to assess one of the most important factors in the balance control of the human body during the Sit-to-Stand (STS) movement: postural symmetry in the mediolateral direction. A symmetry score, calculated from data obtained with a Kinect RGB-D camera, was proposed to reflect the degree of mediolateral postural symmetry and was used to drive real-time audio feedback, designed in MAX/MSP, that helps users adjust themselves to move more symmetrically during STS. The symmetry score was verified by calculating the Spearman correlation coefficient with data obtained from an Inertial Measurement Unit (IMU) sensor, yielding an average value of 0.732. Five healthy adults (four males and one female) with normal balance abilities and no musculoskeletal disorders were selected to participate in the experiment, and the results showed that the low-cost Kinect-based system has the potential to train users to perform a more symmetrical movement in the mediolateral direction during STS.

Contributors: Zhou, Henghao (Author) / Turaga, Pavan (Thesis advisor) / Ingalls, Todd (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created: 2016