Matching Items (5)

Voice Reconfigurable Networks

Description

The software element of home and small business networking solutions has failed to keep pace with the annual development of newer and faster hardware. The software running on these devices is an afterthought, oftentimes equipped with minimal features, an obtuse user interface, or both. At the same time, this past year has seen the rise of smart home assistants that represent the next step in human-computer interaction with their advanced use of natural language processing. This project seeks to address the shortcomings of the former by exploring a possible fusion of a powerful, feature-rich software-defined networking stack and the natural language processing tools of smart home assistants. To accomplish this, a piece of software was developed to leverage the natural language processing capabilities of one such smart home assistant, the Amazon Echo. On one end, this software interacts with Amazon Web Services to retrieve information about a user's speech patterns and key information contained in their speech. On the other end, the software joins that information with its previous session state to translate speech into a series of commands for the separate components of a networking stack. The software developed for this project lets a user quickly change several facets of their networking gear, or acquire information about it, using only spoken language: no terminals, Java applets, or web configuration interfaces are needed, which avoids clunky UIs and jumping from shell to shell. It is the author's hope that showing how networking equipment can be configured in this way will draw more attention to the current failings of networking equipment and inspire a new series of intuitive user interfaces.
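The translation step described in this abstract, in which parsed speech is joined with the previous session state to produce commands, can be illustrated with a minimal sketch. Everything named here is hypothetical: the intent names, the slot keys, and the command strings are placeholders for illustration and are not taken from the project or from any Amazon API.

```python
def translate(intent, slots, session):
    """Turn one parsed utterance into (commands, updated_session).

    `intent` and `slots` stand in for whatever the speech service returns;
    `session` is a dict remembered between utterances. All names are
    hypothetical placeholders.
    """
    # If the user did not name a device, fall back to the one remembered
    # from an earlier utterance in the same session.
    device = slots.get("device") or session.get("device")
    if device is None:
        return [], session  # nothing to act on yet

    # Copy the session instead of mutating the caller's dict.
    session = {**session, "device": device}

    if intent == "SetChannel":
        return [f"{device}: set channel {slots['channel']}"], session
    if intent == "GetStatus":
        return [f"{device}: show status"], session
    return [], session  # unknown intent: emit no commands


# The second utterance omits the device and relies on session state.
commands, state = translate("SetChannel", {"device": "router", "channel": "6"}, {})
followup, state = translate("GetStatus", {}, state)
```

Keeping the session as an explicit value passed in and returned makes the carry-over between utterances easy to test in isolation.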

Contributors

Agent

Created

Date Created
  • 2016-12

A Guide to Speech Recognition Algorithms

Description

Many tasks that humans perform from day to day are taken for granted, and their true complexity goes unappreciated. Humans are the only species on the planet that has developed such an in-depth means of auditory communication. Recreating the mechanisms in the brain that recognize speech patterns is no easy task. This paper compares and contrasts various algorithms used in modern-day automatic speech recognition (ASR) systems, focusing primarily on ASR systems in resource-constrained environments. The green blocks in Figure 1, which are the key to building an exceptional ASR system, are examined in greater detail throughout the paper. Deep Neural Networks (DNNs) are the clear and current leader among ASR technologies, and current research in the field centers on them. Although DNNs are very effective, many older ASR methods remain in frequent use because of the complexities involved with DNNs; these difficulties include the large amount of hardware resources and development resources, such as engineers and money, that the method requires.

Contributors

Agent

Created

Date Created
  • 2015-12

The Emblems: Speech-Recognition in Games

Description

Speech recognition is rarely seen in games. This work presents a project, a 2D computer game named "The Emblems," that uses speech recognition as input. The game itself is a two-person strategy game whose goal is to defeat the opposing player's army. This report focuses on the speech-recognition aspect of the project. The players interact on a turn-by-turn basis by speaking commands into the computer's microphone. When the computer recognizes a command, it responds by having the player's unit perform the corresponding action on screen.

Contributors

Created

Date Created
  • 2014-05

In-vehicle multimodal interaction: an approach to mitigate driver distraction

Description

Despite the various driver assistance systems and electronics, the threat to the lives of drivers, passengers, and other people on the road persists. With the growth in technology, the use of in-vehicle devices with a plethora of buttons and features is increasing, resulting in increased distraction. Recently, speech recognition has emerged as a less distracting alternative and has the potential to be beneficial. However, because the automotive environment is dynamic and noisy, distraction may arise not from manual interaction but from cognitive load. Hence, speech recognition on its own cannot be a reliable mode of communication.

The thesis proposes a simultaneous multimodal approach to designing the interface between driver and vehicle, with the goal of enabling the driver to be more attentive to driving tasks and spend less time on distracting ones. By analyzing human-human multimodal interaction techniques, new modes especially suitable for the automotive context were identified and tested. The identified modes are touch, speech, graphics, voice-tip, and text-tip. The multiple modes are intended to work collectively to make the interaction more intuitive and natural. To obtain a minimalist, user-centered design for the center stack, design principles such as the 80/20 rule, contour bias, affordance, and the flexibility-usability trade-off were applied to the prototypes. The prototype was developed on the Android platform, using the Dragon software development kit for speech recognition.

In the present study, driver behavior was investigated in an experiment conducted on the DriveSafety DS-600s driving simulator. Twelve volunteers drove the simulator under two conditions: (1) accessing the center stack applications using touch only and (2) accessing the applications using speech with an offered text-tip. The duration for which the user looked away from the road (eyes-off-road time) was measured manually for each condition. Comparison of the results showed that eyes-off-road time was lower in the second condition. The minimalist design with 8-10 icons per screen proved effective, as all readings were within the driver distraction recommendations defined by NHTSA (eyes-off-road time < 2 s per screen).

Contributors

Agent

Created

Date Created
  • 2015

Approximate neural networks for speech applications in resource-constrained environments

Description

Speech recognition and keyword detection are becoming increasingly popular applications for mobile systems. While deep neural network (DNN) implementations of these systems perform very well, they have large memory and compute resource requirements, making their implementation on a mobile device quite challenging. In this thesis, techniques to reduce the memory and computation cost of keyword detection and speech recognition networks (or DNNs) are presented.

The first technique is based on representing all weights and biases by a small number of bits and mapping all nodal computations into fixed-point ones with minimal degradation in accuracy. Experiments conducted on the Resource Management (RM) database show that for the keyword detection neural network, representing the weights by 5 bits results in a 6-fold reduction in memory compared to a floating-point implementation with very little loss in performance. Similarly, for the speech recognition neural network, representing the weights by 6 bits results in a 5-fold reduction in memory while maintaining an error rate similar to a floating-point implementation. Additional reduction in memory is achieved by a technique called weight pruning, where the weights are classified as sensitive and insensitive and the sensitive weights are represented with higher precision. A combination of these two techniques reduces the memory footprint by 81% and 84% for the speech recognition and keyword detection networks, respectively.
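The low-precision idea in this paragraph can be sketched in a few lines. This is a minimal illustration that assumes plain uniform quantization over the observed weight range and a 32-bit floating-point baseline; the abstract states neither, and the thesis's actual fixed-point scheme may differ. The function names are hypothetical.

```python
def quantize_uniform(weights, bits):
    """Map each weight to the nearest of 2**bits evenly spaced levels.

    Returns integer codes plus the offset and step needed to decode them.
    Assumption: uniform levels spanning [min(weights), max(weights)].
    """
    lo, hi = min(weights), max(weights)
    n_steps = (1 << bits) - 1  # number of intervals between levels
    step = (hi - lo) / n_steps if hi > lo else 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Recover approximate weights from their integer codes."""
    return [lo + c * step for c in codes]


def memory_ratio(baseline_bits, quantized_bits):
    """Ideal storage ratio, ignoring the few bytes for offset and step."""
    return baseline_bits / quantized_bits


weights = [-0.8, -0.1, 0.0, 0.25, 0.9]
codes, lo, step = quantize_uniform(weights, bits=5)
approx = dequantize(codes, lo, step)
```

Under the 32-bit baseline assumption, the ideal ratios are 32/5 = 6.4 for 5-bit weights and 32/6 ≈ 5.3 for 6-bit weights, which is in line with the roughly 6-fold and 5-fold reductions quoted above.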

Further reduction in memory size is achieved by judiciously dropping connections for large blocks of weights. The corresponding technique, termed coarse-grain sparsification, introduces hardware-aware sparsity during DNN training, which leads to efficient weight memory compression and significant reduction in the number of computations during classification without loss of accuracy. Keyword detection and speech recognition DNNs trained with 75% of the weights dropped and classified with 5-6 bit weight precision effectively reduced the weight memory requirement by ~95% compared to a fully connected network with double precision, while showing similar performance in keyword detection accuracy and word error rate.
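Dropping connections in whole blocks, rather than one weight at a time, can be sketched as a tile mask over the weight matrix. The sketch below is an illustration only: it assumes square tiles and a seeded random choice of which tiles to drop, neither of which is stated in the abstract, and the function names are hypothetical.

```python
import random


def block_mask(n_rows, n_cols, block, drop_fraction, seed=0):
    """Build a 0/1 mask that zeroes whole block x block tiles.

    Assumption: tiles to drop are picked at random (seeded for
    reproducibility); the selection rule used in the thesis may differ.
    """
    if n_rows % block or n_cols % block:
        raise ValueError("matrix dimensions must be multiples of block")
    tiles = [(r, c)
             for r in range(0, n_rows, block)
             for c in range(0, n_cols, block)]
    n_drop = int(len(tiles) * drop_fraction)
    dropped = random.Random(seed).sample(tiles, n_drop)

    mask = [[1] * n_cols for _ in range(n_rows)]
    for r0, c0 in dropped:
        for r in range(r0, r0 + block):
            for c in range(c0, c0 + block):
                mask[r][c] = 0
    return mask


def apply_mask(weights, mask):
    """Zero every weight whose mask entry is 0."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


# An 8x8 layer split into 2x2 tiles, with 75% of the tiles dropped.
mask = block_mask(8, 8, block=2, drop_fraction=0.75)
sparse = apply_mask([[1.0] * 8 for _ in range(8)], mask)
```

Because zeros arrive in whole tiles, only the positions of the surviving tiles need to be stored alongside their weights, which is what makes this kind of sparsity convenient for hardware.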

Contributors

Agent

Created

Date Created
  • 2016