Matching Items (14)
Filtering by

Clear all filters

151948-Thumbnail Image.png
Description
Smart home system (SHS) is a kind of information system aiming at realizing home automation. The SHS can connect with almost any kind of electronic/electric device used in a home so that they can be controlled and monitored centrally. Today's technology also allows the home owners to control and monitor

Smart home system (SHS) is a kind of information system aiming at realizing home automation. The SHS can connect with almost any kind of electronic/electric device used in a home so that they can be controlled and monitored centrally. Today's technology also allows the home owners to control and monitor the SHS installed in their homes remotely. This is typically realized by giving the SHS network access ability. Although the SHS's network access ability brings a lot of conveniences to the home owners, it also makes the SHS facing more security threats than ever before. As a result, when designing a SHS, the security threats it might face should be given careful considerations. System security threats can be solved properly by understanding them and knowing the parts in the system that should be protected against them first. This leads to the idea of solving the security threats a SHS might face from the requirements engineering level. Following this idea, this paper proposes a systematic approach to generate the security requirements specifications for the SHS. It can be viewed as the first step toward the complete SHS security requirements engineering process.
ContributorsXu, Rongcao (Author) / Ghazarian, Arbi (Thesis advisor) / Bansal, Ajay (Committee member) / Lindquist, Timothy (Committee member) / Arizona State University (Publisher)
Created2013
Description
The world’s population is currently 9% visually impaired. Medical sciences do not have a biological fix that can cure this visual impairment. Visually impaired people are currently being assisted with biological fixes or assistive devices. The current assistive devices are limited in size as well as resolution. This thesis presents

The world’s population is currently 9% visually impaired. Medical sciences do not have a biological fix that can cure this visual impairment. Visually impaired people are currently being assisted with biological fixes or assistive devices. The current assistive devices are limited in size as well as resolution. This thesis presents the development and experimental validation of a control system for a new vibrotactile haptic display that is currently in development. In order to allow the vibrotactile haptic display to be used to represent motion, the control system must be able to change the image displayed at a rate of at least 30 frames/second. In order to achieve this, this thesis introduces and investigates the use of three improvements: threading, change filtering, and wave libraries. Through these methods, it is determined that an average of 40 frames/second can be achieved.
ContributorsKIM, KENDRA (Author) / Sodemann, Angela (Thesis advisor) / Robertson, John (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)
Created2018
156331-Thumbnail Image.png
Description
Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and efficient, search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph based solutions we propose a new programming language construct - the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is put into a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as a foundation and proof-of-concept of the search statement. A Go compiler is provided which implements this construct.
ContributorsHenderson, Christopher (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)
Created2018
154862-Thumbnail Image.png
Description
The last decade has witnessed a paradigm shift in computing platforms, from laptops and servers to mobile devices like smartphones and tablets. These devices host an immense variety of applications many of which are computationally expensive and thus are power hungry. As most of these mobile platforms are powered by

The last decade has witnessed a paradigm shift in computing platforms, from laptops and servers to mobile devices like smartphones and tablets. These devices host an immense variety of applications many of which are computationally expensive and thus are power hungry. As most of these mobile platforms are powered by batteries, energy efficiency has become one of the most critical aspects of such devices. Thus, the energy cost of the fundamental arithmetic operations executed in these applications has to be reduced. As voltage scaling has effectively ended, the energy efficiency of integrated circuits has ceased to improve within successive generations of transistors. This resulted in widespread use of Application Specific Integrated Circuits (ASIC), which provide incredible energy efficiency. However, these are not flexible and have high non-recurring engineering (NRE) cost. Alternatively, Field Programmable Gate Arrays (FPGA) offer flexibility to implement any application, but at the cost of higher area and energy compared to ASIC.

In this work, a spatially programmable architecture customized for image processing applications is proposed. The intent is to bridge the efficiency gap between ASICs and FPGAs, by offering FPGA-like flexibility and ASIC-like energy efficiency. This architecture minimizes the energy overheads in FPGAs, which result from the use of fine-grained programming style and global interconnect. It is flexible compared to an ASIC and can accommodate multiple applications.

The main contribution of the thesis is the feasibility analysis of the data path of this architecture, customized for image processing applications. The data path is implemented at the register transfer level (RTL), and the synthesis results are obtained in 45nm technology cell library from a leading foundry. The results of image-processing applications demonstrate that this architecture is within a factor of 10x of the energy and area efficiency of ASIC implementations.
ContributorsSatapathy, Saktiswarup (Author) / Brunhaver, John (Thesis advisor) / Clark, Lawrence T (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)
Created2016
155058-Thumbnail Image.png
Description
Coarse-grained Reconfigurable Arrays (CGRAs) are promising accelerators capable

of accelerating even non-parallel loops and loops with low trip-counts. One challenge

in compiling for CGRAs is to manage both recurring and nonrecurring variables in

the register file (RF) of the CGRA. Although prior works have managed recurring

variables via rotating RF, they access the nonrecurring

Coarse-grained Reconfigurable Arrays (CGRAs) are promising accelerators capable

of accelerating even non-parallel loops and loops with low trip-counts. One challenge

in compiling for CGRAs is to manage both recurring and nonrecurring variables in

the register file (RF) of the CGRA. Although prior works have managed recurring

variables via rotating RF, they access the nonrecurring variables through either a

global RF or from a constant memory. The former does not scale well, and the latter

degrades the mapping quality. This work proposes a hardware-software codesign

approach in order to manage all the variables in a local nonrotating RF. Hardware

provides modulo addition based indexing mechanism to enable correct addressing

of recurring variables in a nonrotating RF. The compiler determines the number of

registers required for each recurring variable and configures the boundary between the

registers used for recurring and nonrecurring variables. The compiler also pre-loads

the read-only variables and constants into the local registers in the prologue of the

schedule. Synthesis and place-and-route results of the previous and the proposed RF

design show that proposed solution achieves 17% better cycle time. Experiments of

mapping several important and performance-critical loops collected from MiBench

show proposed approach improves performance (through better mapping) by 18%,

compared to using constant memory.
ContributorsDave, Shail (Author) / Shrivastava, Aviral (Thesis advisor) / Ren, Fengbo (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)
Created2016
155791-Thumbnail Image.png
Description
Caches pose a serious limitation in scaling many-core architectures since the demand of area and power for maintaining cache coherence increases rapidly with the number of cores. Scratch-Pad Memories (SPMs) provide a cheaper and lower power alternative that can be used to build a more scalable many-core architecture. The trade-off

Caches pose a serious limitation in scaling many-core architectures since the demand of area and power for maintaining cache coherence increases rapidly with the number of cores. Scratch-Pad Memories (SPMs) provide a cheaper and lower power alternative that can be used to build a more scalable many-core architecture. The trade-off of substituting SPMs for caches is however that the data must be explicitly managed in software. Heap management on SPM poses a major challenge due to the highly dynamic nature of of heap data access. Most existing heap management techniques implement a software caching scheme on SPM, emulating the behavior of hardware caches. The state-of-the-art heap management scheme implements a 4-way set-associative software cache on SPM for a single program running with one thread on one core. While the technique works correctly, it suffers from signifcant performance overhead. This paper presents a series of compiler-based efficient heap management approaches that reduces heap management overhead through several optimization techniques. Experimental results on benchmarks from MiBenchGuthaus et al. (2001) executed on an SMM processor modeled in gem5Binkert et al. (2011) demonstrate that our approach (implemented in llvm v3.8Lattner and Adve (2004)) can improve execution time by 80% on average compared to the previous state-of-the-art.
ContributorsLin, Jinn-Pean (Author) / Shrivastava, Aviral (Thesis advisor) / Ren, Fengbo (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)
Created2017
155631-Thumbnail Image.png
Description
The information era has brought about many technological advancements in the past

few decades, and that has led to an exponential increase in the creation of digital images and

videos. Constantly, all digital images go through some image processing algorithm for

various reasons like compression, transmission, storage, etc. There is data loss during

The information era has brought about many technological advancements in the past

few decades, and that has led to an exponential increase in the creation of digital images and

videos. Constantly, all digital images go through some image processing algorithm for

various reasons like compression, transmission, storage, etc. There is data loss during this

process which leaves us with a degraded image. Hence, to ensure minimal degradation of

images, the requirement for quality assessment has become mandatory. Image Quality

Assessment (IQA) has been researched and developed over the last several decades to

predict the quality score in a manner that agrees with human judgments of quality. Modern

image quality assessment (IQA) algorithms are quite effective at prediction accuracy, and

their development has not focused on improving computational performance. The existing

serial implementation requires a relatively large run-time on the order of seconds for a single

frame. Hardware acceleration using Field programmable gate arrays (FPGAs) provides

reconfigurable computing fabric that can be tailored for a broad range of applications.

Usually, programming FPGAs has required expertise in hardware descriptive languages

(HDLs) or high-level synthesis (HLS) tool. OpenCL is an open standard for cross-platform,

parallel programming of heterogeneous systems along with Altera OpenCL SDK, enabling

developers to use FPGA's potential without extensive hardware knowledge. Hence, this

thesis focuses on accelerating the computationally intensive part of the most apparent

distortion (MAD) algorithm on FPGA using OpenCL. The results are compared with CPU

implementation to evaluate performance and efficiency gains.
ContributorsGunavelu Mohan, Aswin (Author) / Sohoni, Sohum (Thesis advisor) / Ren, Fengbo (Thesis advisor) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)
Created2017
155292-Thumbnail Image.png
Description
Image processing has changed the way we store, view and share images. One important component of sharing images over the networks is image compression. Lossy image compression techniques compromise the quality of images to reduce their size. To ensure that the distortion of images due to image compression is not

Image processing has changed the way we store, view and share images. One important component of sharing images over the networks is image compression. Lossy image compression techniques compromise the quality of images to reduce their size. To ensure that the distortion of images due to image compression is not highly detectable by humans, the perceived quality of an image needs to be maintained over a certain threshold. Determining this threshold is best done using human subjects, but that is impractical in real-world scenarios. As a solution to this issue, image quality assessment (IQA) algorithms are used to automatically compute a fidelity score of an image.

However, poor performance of IQA algorithms has been observed due to complex statistical computations involved. General Purpose Graphics Processing Unit (GPGPU) programming is one of the solutions proposed to optimize the performance of these algorithms.

This thesis presents a Compute Unified Device Architecture (CUDA) based optimized implementation of full reference IQA algorithm, Visual Signal to Noise Ratio (VSNR) that uses M-level 2D Discrete Wavelet Transform (DWT) with 9/7 biorthogonal filters among other statistical computations. The presented implementation is tested upon four different image quality databases containing images with multiple distortions and sizes ranging from 512 x 512 to 1600 x 1280. The CUDA implementation of VSNR shows a speedup of over 32x for 1600 x 1280 images. It is observed that the speedup scales with the increase in size of images. The results showed that the implementation is fast enough to use VSNR on high definition videos with a frame rate of 60 fps. This work presents the optimizations made due to the use of GPU’s constant memory and reuse of allocated memory on the GPU. Also, it shows the performance improvement using profiler driven GPGPU development in CUDA. The presented implementation can be deployed in production combined with existing applications.
ContributorsGupta, Ayush (Author) / Sohoni, Sohum (Thesis advisor) / Amresh, Ashish (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)
Created2017
155154-Thumbnail Image.png
Description
Achieving human level intelligence is a long-term goal for many Artificial Intelligence (AI) researchers. Recent developments in combining deep learning and reinforcement learning helped us to move a step forward in achieving this goal. Reinforcement learning using a delayed reward mechanism is an approach to machine intelligence which studies decision

Achieving human level intelligence is a long-term goal for many Artificial Intelligence (AI) researchers. Recent developments in combining deep learning and reinforcement learning helped us to move a step forward in achieving this goal. Reinforcement learning using a delayed reward mechanism is an approach to machine intelligence which studies decision making with control and how a decision making agent can learn to act optimally in an environment-unaware conditions.

Q-learning is one of the model-free reinforcement directed learning strategies which uses temporal differences to estimate the performances of state-action pairs called Q values. A simple implementation of Q-learning algorithm can be done using a Q table memory to store and update the Q values. However, with an increase in state space data due to a complex environment, and with an increase in possible number of actions an agent can perform, Q table reaches its space limit and would be difficult to scale well. Q-learning with neural networks eliminates the use of Q table by approximating the Q function using neural networks.

Autonomous agents need to develop cognitive properties and become self-adaptive to be deployable in any environment. Reinforcement learning with Q-learning have been very efficient in solving such problems. However, embedded systems like space rovers and autonomous robots rarely implement such techniques due to the constraints faced like processing power, chip area, convergence rate and cost of the chip. These problems present a need for a portable, low power, area efficient hardware accelerator to accelerate the process of such learning.

This problem is targeted by implementing a hardware schematic architecture for Q-learning using Artificial Neural networks. This architecture exploits the massive parallelism provided by neural network with a dedicated fine grain parallelism provided by a Field Programmable Gate Array (FPGA) thereby processing the Q values at a high throughput. Mars exploration rovers currently use Xilinx-Space-grade FPGA devices for image processing, pyrotechnic operation control and obstacle avoidance. The hardware resource consumption for the architecture has been synthesized considering Xilinx Virtex7 FPGA as the target device.
ContributorsGankidi, Pranay Reddy (Author) / Thangavelautham, Jekanthan (Thesis advisor) / Ren, Fengbo (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)
Created2016
Description
There exists extensive research on the use of twisty puzzles, such as the Rubik's Cube, in educational contexts to assist in developing critical thinking skills and in teaching abstract concepts, such as group theory. However, the existing research does not consider the use of twisty puzzles in developing language proficiency.

There exists extensive research on the use of twisty puzzles, such as the Rubik's Cube, in educational contexts to assist in developing critical thinking skills and in teaching abstract concepts, such as group theory. However, the existing research does not consider the use of twisty puzzles in developing language proficiency. Furthermore, there remain methodological issues in integrating standard twisty puzzles into a class curriculum due to the ease with which erroneous cube twists occur, leading to a puzzle scramble that deviates from the intended teaching goal. To address these issues, an extensive examination of the "smart cube" market took place in order to determine whether a device that virtualizes twisty puzzles while maintaining the intuitive tactility of manipulating such puzzles can be employed both to fill the language education void and to mitigate the potential frustration experienced by students who unintentionally scramble a puzzle due to executing the wrong moves. This examination revealed the presence of Bluetooth smart cubes, which are capable of interfacing with a companion web or mobile application that visualizes and reacts to puzzle manipulations. This examination also revealed the presence of a device called the WOWCube, which is a 2x2x2 smart cube entertainment system that has 24 Liquid Crystal Display (LCD) screens, one for each face's square, enabling better integration of the application with the puzzle hardware. Developing applications both for the Bluetooth smart cube using React Native and for the WOWCube demonstrated the higher feasibility of developing with the WOWCube due to its streamlined development kit as well as its ability to tie the application to the device hardware, enhancing the tactile immersion of the players with the application itself. Using the WOWCube, a word puzzle game featuring three game modes was implemented to assist in teaching players English vocabulary. Due to its incorporation of features that enable dynamic puzzle generation and resetting, players who participated in a user survey found that the game was compelling and that it exercised their critical thinking skills. This demonstrates the feasibility of smart cube applications in both critical thinking and language skills.
ContributorsHreshchyshyn, Jacob (Author) / Bansal, Ajay (Thesis advisor) / Mehlhase, Alexandra (Committee member) / Baron, Tyler (Committee member) / Arizona State University (Publisher)
Created2023