Search Content

Matching Items (2)

Filtering by

All Subjects: Computer Engineering

Modern Sensory Substitution for Vision in Dynamic Environments

Description

Societal infrastructure is built with vision at the forefront of daily life. For those with

severe visual impairments, this creates countless barriers to the participation and

enjoyment of life’s opportunities. Technological progress has been both a blessing and

a curse in this regard. Digital text together with screen readers and refreshable Braille

displays have made whole libraries readily accessible and rideshare tech has made

independent mobility more attainable. Simultaneously, screen-based interactions and

experiences have only grown in pervasiveness and importance, precluding many of

those with visual impairments.

Sensory Substituion, the process of substituting an unavailable modality with

another one, has shown promise as an alternative to accomodation, but in recent

years meaningful strides in Sensory Substitution for vision have declined in frequency.

Given recent advances in Computer Vision, this stagnation is especially disconcerting.

Designing Sensory Substitution Devices (SSDs) for vision for use in interactive settings

that leverage modern Computer Vision techniques presents a variety of challenges

including perceptual bandwidth, human-computer-interaction, and person-centered

machine learning considerations. To surmount these barriers an approach called Per-

sonal Foveated Haptic Gaze (PFHG), is introduced. PFHG consists of two primary

components: a human visual system inspired interaction paradigm that is intuitive

and flexible enough to generalize to a variety of applications called Foveated Haptic

Gaze (FHG), and a person-centered learning component to address the expressivity

limitations of most SSDs. This component is called One-Shot Object Detection by

Data Augmentation (1SODDA), a one-shot object detection approach that allows a

user to specify the objects they are interested in locating visually and with minimal

effort realizing an object detection model that does so effectively.

The Personal Foveated Haptic Gaze framework was realized in a virtual and real-

world application: playing a 3D, interactive, first person video game (DOOM) and

finding user-specified real-world objects. User study results found Foveated Haptic

Gaze to be an effective and intuitive interface for interacting with dynamic visual

world using solely haptics. Additionally, 1SODDA achieves competitive performance

among few-shot object detection methods and high-framerate many-shot object de-

tectors. The combination of which paves the way for modern Sensory Substitution

Devices for vision.

ContributorsFakhri, Bijan (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy L (Committee member) / Venkateswara, Hemanth (Committee member) / Amor, Heni (Committee member) / Arizona State University (Publisher)

Created2020

Characterizing Dysarthric Speech with Transfer Learning

Description

Speech is known to serve as an early indicator of neurological decline, particularly in motor diseases. There is significant interest in developing automated, objective signal analytics that detect clinically-relevant changes and in evaluating these algorithms against the existing gold-standard: perceptual evaluation by trained speech and language pathologists. Hypernasality, the result of poor control of the velopharyngeal flap---the soft palate regulating airflow between the oral and nasal cavities---is one such speech symptom of interest, as precise velopharyngeal control is difficult to achieve under neuromuscular disorders. However, a host of co-modulating variables give hypernasal speech a complex and highly variable acoustic signature, making it difficult for skilled clinicians to assess and for automated systems to evaluate. Previous work in rating hypernasality from speech relies on either engineered features based on statistical signal processing or machine learning models trained end-to-end on clinical ratings of disordered speech examples. Engineered features often fail to capture the complex acoustic patterns associated with hypernasality, while end-to-end methods tend to overfit to the small datasets on which they are trained. In this thesis, I present a set of acoustic features, models, and strategies for characterizing hypernasality in dysarthric speech that split the difference between these two approaches, with the aim of capturing the complex perceptual character of hypernasality without overfitting to the small datasets available. The features are based on acoustic models trained on a large corpus of healthy speech, integrating expert knowledge to capture known perceptual characteristics of hypernasal speech. They are then used in relatively simple linear models to predict clinician hypernasality scores. These simple models are robust, generalizing across diseases and outperforming comprehensive set of baselines in accuracy and correlation. This novel approach represents a new state-of-the-art in objective hypernasality assessment.

ContributorsSaxon, Michael Stephen (Author) / Berisha, Visar (Thesis advisor) / Panchanathan, Sethuraman (Thesis advisor) / Venkateswara, Hemanth (Committee member) / Arizona State University (Publisher)

Created2020