Matching Items (126)

Filtering by

Clear all filters

152361-Thumbnail Image.png

Techniques for soundscape retrieval and synthesis

Description

The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is that of the soundscape: the acoustic environment as perceived by a listener. This

The study of acoustic ecology is concerned with the manner in which life interacts with its environment as mediated through sound. As such, a central focus is that of the soundscape: the acoustic environment as perceived by a listener. This dissertation examines the application of several computational tools in the realms of digital signal processing, multimedia information retrieval, and computer music synthesis to the analysis of the soundscape. Namely, these tools include a) an open source software library, Sirens, which can be used for the segmentation of long environmental field recordings into individual sonic events and compare these events in terms of acoustic content, b) a graph-based retrieval system that can use these measures of acoustic similarity and measures of semantic similarity using the lexical database WordNet to perform both text-based retrieval and automatic annotation of environmental sounds, and c) new techniques for the dynamic, realtime parametric morphing of multiple field recordings, informed by the geographic paths along which they were recorded.

Contributors

Agent

Created

Date Created
2013

152037-Thumbnail Image.png

Use of a non-invasive acoustical monitoring system to predict ad libitum eating events

Description

Obesity is currently a prevalent health concern in the United States. Essential to combating it are accurate methods of assessing individual dietary intake under ad libitum conditions. The acoustical monitoring system (AMS), consisting of a throat microphone and jaw strain

Obesity is currently a prevalent health concern in the United States. Essential to combating it are accurate methods of assessing individual dietary intake under ad libitum conditions. The acoustical monitoring system (AMS), consisting of a throat microphone and jaw strain sensor, has been proposed as a non-invasive method for tracking free-living eating events. This study assessed the accuracy of eating events tracked by the AMS, compared to the validated vending machine system used by the NIDDK in Phoenix. Application of AMS data toward estimation of mass and calories consumed was also considered. In this study, 10 participants wore the AMS in a clinical setting for 24 hours while all food intake was recorded by the vending machine. Results indicated a correlation of 0.76 between number of eating events by the AMS and the vending machine (p = 0.019). A dependent T-test yielded a p-value of 0.799, illustrating a lack of significant difference between these methods of tracking intake. Finally, number of seconds identified as eating by the AMS had a 0.91 correlation with mass of intake (p = 0.001) and a 0.70 correlation with calories of intake (p = 0.034). These results indicate that the AMS is a valid method of objectively recording eating events under ad libitum conditions. Additional research is required to validate this device under free-living conditions.

Contributors

Agent

Created

Date Created
2013

152165-Thumbnail Image.png

Informatics approach to improving surgical skills training

Description

Surgery as a profession requires significant training to improve both clinical decision making and psychomotor proficiency. In the medical knowledge domain, tools have been developed, validated, and accepted for evaluation of surgeons' competencies. However, assessment of the psychomotor skills still

Surgery as a profession requires significant training to improve both clinical decision making and psychomotor proficiency. In the medical knowledge domain, tools have been developed, validated, and accepted for evaluation of surgeons' competencies. However, assessment of the psychomotor skills still relies on the Halstedian model of apprenticeship, wherein surgeons are observed during residency for judgment of their skills. Although the value of this method of skills assessment cannot be ignored, novel methodologies of objective skills assessment need to be designed, developed, and evaluated that augment the traditional approach. Several sensor-based systems have been developed to measure a user's skill quantitatively, but use of sensors could interfere with skill execution and thus limit the potential for evaluating real-life surgery. However, having a method to judge skills automatically in real-life conditions should be the ultimate goal, since only with such features that a system would be widely adopted. This research proposes a novel video-based approach for observing surgeons' hand and surgical tool movements in minimally invasive surgical training exercises as well as during laparoscopic surgery. Because our system does not require surgeons to wear special sensors, it has the distinct advantage over alternatives of offering skills assessment in both learning and real-life environments. The system automatically detects major skill-measuring features from surgical task videos using a computing system composed of a series of computer vision algorithms and provides on-screen real-time performance feedback for more efficient skill learning. Finally, the machine-learning approach is used to develop an observer-independent composite scoring model through objective and quantitative measurement of surgical skills. To increase effectiveness and usability of the developed system, it is integrated with a cloud-based tool, which automatically assesses surgical videos upload to the cloud.

Contributors

Agent

Created

Date Created
2013

152003-Thumbnail Image.png

Classifying everyday activity through label propagation with sparse training data

Description

We solve the problem of activity verification in the context of sustainability. Activity verification is the process of proving the user assertions pertaining to a certain activity performed by the user. Our motivation lies in incentivizing the user for engaging

We solve the problem of activity verification in the context of sustainability. Activity verification is the process of proving the user assertions pertaining to a certain activity performed by the user. Our motivation lies in incentivizing the user for engaging in sustainable activities like taking public transport or recycling. Such incentivization schemes require the system to verify the claim made by the user. The system verifies these claims by analyzing the supporting evidence captured by the user while performing the activity. The proliferation of portable smart-phones in the past few years has provided us with a ubiquitous and relatively cheap platform, having multiple sensors like accelerometer, gyroscope, microphone etc. to capture this evidence data in-situ. In this research, we investigate the supervised and semi-supervised learning techniques for activity verification. Both these techniques make use the data set constructed using the evidence submitted by the user. Supervised learning makes use of annotated evidence data to build a function to predict the class labels of the unlabeled data points. The evidence data captured can be either unimodal or multimodal in nature. We use the accelerometer data as evidence for transportation mode verification and image data as evidence for recycling verification. After training the system, we achieve maximum accuracy of 94% when classifying the transport mode and 81% when detecting recycle activity. In the case of recycle verification, we could improve the classification accuracy by asking the user for more evidence. We present some techniques to ask the user for the next best piece of evidence that maximizes the probability of classification. Using these techniques for detecting recycle activity, the accuracy increases to 93%. The major disadvantage of using supervised models is that it requires extensive annotated training data, which expensive to collect. Due to the limited training data, we look at the graph based inductive semi-supervised learning methods to propagate the labels among the unlabeled samples. In the semi-supervised approach, we represent each instance in the data set as a node in the graph. Since it is a complete graph, edges interconnect these nodes, with each edge having some weight representing the similarity between the points. We propagate the labels in this graph, based on the proximity of the data points to the labeled nodes. We estimate the performance of these algorithms by measuring how close the probability distribution of the data after label propagation is to the probability distribution of the ground truth data. Since labeling has a cost associated with it, in this thesis we propose two algorithms that help us in selecting minimum number of labeled points to propagate the labels accurately. Our proposed algorithm achieves a maximum of 73% increase in performance when compared to the baseline algorithm.

Contributors

Agent

Created

Date Created
2013

150599-Thumbnail Image.png

Somatic ABC's: a theoretical framework for designing, developing and evaluating the building blocks of touch-based information delivery

Description

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness

Situations of sensory overload are steadily becoming more frequent as the ubiquity of technology approaches reality--particularly with the advent of socio-communicative smartphone applications, and pervasive, high speed wireless networks. Although the ease of accessing information has improved our communication effectiveness and efficiency, our visual and auditory modalities--those modalities that today's computerized devices and displays largely engage--have become overloaded, creating possibilities for distractions, delays and high cognitive load; which in turn can lead to a loss of situational awareness, increasing chances for life threatening situations such as texting while driving. Surprisingly, alternative modalities for information delivery have seen little exploration. Touch, in particular, is a promising candidate given that it is our largest sensory organ with impressive spatial and temporal acuity. Although some approaches have been proposed for touch-based information delivery, they are not without limitations including high learning curves, limited applicability and/or limited expression. This is largely due to the lack of a versatile, comprehensive design theory--specifically, a theory that addresses the design of touch-based building blocks for expandable, efficient, rich and robust touch languages that are easy to learn and use. Moreover, beyond design, there is a lack of implementation and evaluation theories for such languages. To overcome these limitations, a unified, theoretical framework, inspired by natural, spoken language, is proposed called Somatic ABC's for Articulating (designing), Building (developing) and Confirming (evaluating) touch-based languages. To evaluate the usefulness of Somatic ABC's, its design, implementation and evaluation theories were applied to create communication languages for two very unique application areas: audio described movies and motor learning. These applications were chosen as they presented opportunities for complementing communication by offloading information, typically conveyed visually and/or aurally, to the skin. For both studies, it was found that Somatic ABC's aided the design, development and evaluation of rich somatic languages with distinct and natural communication units.

Contributors

Agent

Created

Date Created
2012

149621-Thumbnail Image.png

Mediated social interpersonal communication: evidence-based understanding of multimedia solutions for enriching social situational awareness

Description

Social situational awareness, or the attentiveness to one's social surroundings, including the people, their interactions and their behaviors is a complex sensory-cognitive-motor task that requires one to be engaged thoroughly in understanding their social interactions. These interactions are formed out

Social situational awareness, or the attentiveness to one's social surroundings, including the people, their interactions and their behaviors is a complex sensory-cognitive-motor task that requires one to be engaged thoroughly in understanding their social interactions. These interactions are formed out of the elements of human interpersonal communication including both verbal and non-verbal cues. While the verbal cues are instructive and delivered through speech, the non-verbal cues are mostly interpretive and requires the full attention of the participants to understand, comprehend and respond to them appropriately. Unfortunately certain situations are not conducive for a person to have complete access to their social surroundings, especially the non-verbal cues. For example, a person is who is blind or visually impaired may find that the non-verbal cues like smiling, head nod, eye contact, body gestures and facial expressions of their interaction partners are not accessible due to their sensory deprivation. The same could be said of people who are remotely engaged in a conversation and physically separated to have a visual access to one's body and facial mannerisms. This dissertation describes novel multimedia technologies to aid situations where it is necessary to mediate social situational information between interacting participants. As an example of the proposed system, an evidence-based model for understanding the accessibility problem faced by people who are blind or visually impaired is described in detail. From the derived model, a sleuth of sensing and delivery technologies that use state-of-the-art computer vision algorithms in combination with novel haptic interfaces are developed towards a) A Dyadic Interaction Assistant, capable of helping individuals who are blind to access important head and face based non-verbal communicative cues during one-on-one dyadic interactions, and b) A Group Interaction Assistant, capable of provide situational awareness about the interaction partners and their dynamics to a user who is blind, while also providing important social feedback about their own body mannerisms. The goal is to increase the effective social situational information that one has access to, with the conjuncture that a good awareness of one's social surroundings gives them the ability to understand and empathize with their interaction partners better. Extending the work from an important social interaction assistive technology, the need for enriched social situational awareness is everyday professional situations are also discussed, including, a) enriched remote interactions between physically separated interaction partners, and b) enriched communication between medical professionals during critical care procedures, towards enhanced patient safety. In the concluding remarks, this dissertation engages the readers into a science and technology policy discussion on the potential effect of a new technology like the social interaction assistant on the society. Discussing along the policy lines, social disability is highlighted as an important area that requires special attention from researchers and policy makers. Given that the proposed technology relies on wearable inconspicuous cameras, the discussion of privacy policies is extended to encompass newly evolving interpersonal interaction recorders, like the one presented in this dissertation.

Contributors

Agent

Created

Date Created
2011

149503-Thumbnail Image.png

Stereo based visual odometry

Description

The exponential rise in unmanned aerial vehicles has necessitated the need for accurate pose estimation under any extreme conditions. Visual Odometry (VO) is the estimation of position and orientation of a vehicle based on analysis of a sequence of images

The exponential rise in unmanned aerial vehicles has necessitated the need for accurate pose estimation under any extreme conditions. Visual Odometry (VO) is the estimation of position and orientation of a vehicle based on analysis of a sequence of images captured from a camera mounted on it. VO offers a cheap and relatively accurate alternative to conventional odometry techniques like wheel odometry, inertial measurement systems and global positioning system (GPS). This thesis implements and analyzes the performance of a two camera based VO called Stereo based visual odometry (SVO) in presence of various deterrent factors like shadows, extremely bright outdoors, wet conditions etc... To allow the implementation of VO on any generic vehicle, a discussion on porting of the VO algorithm to android handsets is presented too. The SVO is implemented in three steps. In the first step, a dense disparity map for a scene is computed. To achieve this we utilize sum of absolute differences technique for stereo matching on rectified and pre-filtered stereo frames. Epipolar geometry is used to simplify the matching problem. The second step involves feature detection and temporal matching. Feature detection is carried out by Harris corner detector. These features are matched between two consecutive frames using the Lucas-Kanade feature tracker. The 3D co-ordinates of these matched set of features are computed from the disparity map obtained from the first step and are mapped into each other by a translation and a rotation. The rotation and translation is computed using least squares minimization with the aid of Singular Value Decomposition. Random Sample Consensus (RANSAC) is used for outlier detection. This comprises the third step. The accuracy of the algorithm is quantified based on the final position error, which is the difference between the final position computed by the SVO algorithm and the final ground truth position as obtained from the GPS. The SVO showed an error of around 1% under normal conditions for a path length of 60 m and around 3% in bright conditions for a path length of 130 m. The algorithm suffered in presence of shadows and vibrations, with errors of around 15% and path lengths of 20 m and 100 m respectively.

Contributors

Agent

Created

Date Created
2010

149543-Thumbnail Image.png

Replay debugger for multi threaded android applications

Description

Debugging is a hard task. Debugging multi-threaded applications with their inherit non-determinism is all the more difficult. Non-determinism of any kind adds to the difficulty of cyclic debugging. In Android applications which are written in Java, threads and concurrency constructs

Debugging is a hard task. Debugging multi-threaded applications with their inherit non-determinism is all the more difficult. Non-determinism of any kind adds to the difficulty of cyclic debugging. In Android applications which are written in Java, threads and concurrency constructs introduce non-determinism to the program execution. Even with the same input, consecutive runs may not be the same and reproducing the same bug is a challenging task. This makes it difficult to understand and analyze the execution behavior or to understand the source of a failing execution. This thesis introduces a replay mechanism for Android applications written in Java and is based on the Lamport Clock. This tool provides the user with a controlled debugging environment, where the program execution follows the identical partially ordered happened-before dependency among threads, as during the recorded execution. In this, certain significant events like thread creation, synchronization etc. are recorded during run-time. They can later be replayed off-line, as many times as needed to pinpoint and fix an error in the application. It is software based approach and has been implemented by modifying the Dalvik Virtual Machine in the Android platform. The method of replay described in this thesis is independent of the underlying operating system scheduler.

Contributors

Agent

Created

Date Created
2011

149922-Thumbnail Image.png

Mining semantics from low-level features in multimedia computing

Description

Bridging semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signal with their high-level semantic interpretation is mainly due to the fact that semantics are often conveyed implicitly in a

Bridging semantic gap is one of the fundamental problems in multimedia computing and pattern recognition. The challenge of associating low-level signal with their high-level semantic interpretation is mainly due to the fact that semantics are often conveyed implicitly in a context, relying on interactions among multiple levels of concepts or low-level data entities. Also, additional domain knowledge may often be indispensable for uncovering the underlying semantics, but in most cases such domain knowledge is not readily available from the acquired media streams. Thus, making use of various types of contextual information and leveraging corresponding domain knowledge are vital for effectively associating high-level semantics with low-level signals with higher accuracies in multimedia computing problems. In this work, novel computational methods are explored and developed for incorporating contextual information/domain knowledge in different forms for multimedia computing and pattern recognition problems. Specifically, a novel Bayesian approach with statistical-sampling-based inference is proposed for incorporating a special type of domain knowledge, spatial prior for the underlying shapes; cross-modality correlations via Kernel Canonical Correlation Analysis is explored and the learnt space is then used for associating multimedia contents in different forms; model contextual information as a graph is leveraged for regulating interactions among high-level semantic concepts (e.g., category labels), low-level input signal (e.g., spatial/temporal structure). Four real-world applications, including visual-to-tactile face conversion, photo tag recommendation, wild web video classification and unconstrained consumer video summarization, are selected to demonstrate the effectiveness of the approaches. These applications range from classic research challenges to emerging tasks in multimedia computing. Results from experiments on large-scale real-world data with comparisons to other state-of-the-art methods and subjective evaluations with end users confirmed that the developed approaches exhibit salient advantages, suggesting that they are promising for leveraging contextual information/domain knowledge for a wide range of multimedia computing and pattern recognition problems.

Contributors

Agent

Created

Date Created
2011

Robust margin based classifiers for small sample data

Description

In many classication problems data samples cannot be collected easily, example in drug trials, biological experiments and study on cancer patients. In many situations the data set size is small and there are many outliers. When classifying such data, example

In many classication problems data samples cannot be collected easily, example in drug trials, biological experiments and study on cancer patients. In many situations the data set size is small and there are many outliers. When classifying such data, example cancer vs normal patients the consequences of mis-classication are probably more important than any other data type, because the data point could be a cancer patient or the classication decision could help determine what gene might be over expressed and perhaps a cause of cancer. These mis-classications are typically higher in the presence of outlier data points. The aim of this thesis is to develop a maximum margin classier that is suited to address the lack of robustness of discriminant based classiers (like the Support Vector Machine (SVM)) to noise and outliers. The underlying notion is to adopt and develop a natural loss function that is more robust to outliers and more representative of the true loss function of the data. It is demonstrated experimentally that SVM's are indeed susceptible to outliers and that the new classier developed, here coined as Robust-SVM (RSVM), is superior to all studied classier on the synthetic datasets. It is superior to the SVM in both the synthetic and experimental data from biomedical studies and is competent to a classier derived on similar lines when real life data examples are considered.

Contributors

Agent

Created

Date Created
2011