Search Content

Scratchpad Management in Software Managed Manycore Architectures

Description

Caches have long been used to reduce memory access latency. However, the increased complexity of cache coherence brings significant challenges in processor design as the number of cores increases. While making caches scalable is still an important research problem, some researchers are exploring the possibility of a more power-efficient SRAM…

Caches have long been used to reduce memory access latency. However, the increased complexity of cache coherence brings significant challenges in processor design as the number of cores increases. While making caches scalable is still an important research problem, some researchers are exploring the possibility of a more power-efficient SRAM called scratchpad memories or SPMs. SPMs consume significantly less area, and are more energy-efficient per access than caches, and therefore make the design of on-chip memories much simpler. Unlike caches, which fetch data from memories automatically, an SPM requires explicit instructions for data transfers. SPM-only architectures are thus named as software managed manycore (SMM), since the data movements of such architectures rely on software. SMM processors have been widely used in different areas, such as embedded computing, network processing, or even high performance computing. While SMM processors provide a low-power platform, the hardware alone does not guarantee power efficiency, if applications on such processors deliver low performance. Efficient software techniques are therefore required. A big body of management techniques for SMM architectures are compiler-directed, as inserting data movement operations by hand forces programmers to trace flow of data, which can be error-prone and sometimes difficult if not impossible. This thesis develops compiler-directed techniques to manage data transfers for embedded applications on SMMs efficiently. The techniques analyze and find out the proper program points and insert data movement instructions accordingly. The techniques manage code, stack and heap data of applications, and reduce execution time by 14%, 52% and 80% respectively compared to their predecessors on typical embedded applications. On top of managing local data, a technique is also developed for shared data in SMM architectures. Experimental results show it achieves more than 2X speedup than the previous technique on average.

ContributorsCai, Jian (Author) / Shrivastava, Aviral (Thesis advisor) / Wu, Carole (Committee member) / Ren, Fengbo (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)

Created2017

Algorithm Architecture Co-design for Dense and Sparse Matrix Computations

Description

With the end of Dennard scaling and Moore's law, architects have moved towards

heterogeneous designs consisting of specialized cores to achieve higher performance

and energy efficiency for a target application domain. Applications of linear algebra

are ubiquitous in the field of scientific computing, machine learning, statistics,

etc. with matrix computations being fundamental to these…

With the end of Dennard scaling and Moore's law, architects have moved towards

heterogeneous designs consisting of specialized cores to achieve higher performance

and energy efficiency for a target application domain. Applications of linear algebra

are ubiquitous in the field of scientific computing, machine learning, statistics,

etc. with matrix computations being fundamental to these linear algebra based solutions.

Design of multiple dense (or sparse) matrix computation routines on the

same platform is quite challenging. Added to the complexity is the fact that dense

and sparse matrix computations have large differences in their storage and access

patterns and are difficult to optimize on the same architecture. This thesis addresses

this challenge and introduces a reconfigurable accelerator that supports both dense

and sparse matrix computations efficiently.

The reconfigurable architecture has been optimized to execute the following linear

algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM

(Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver),

LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication),

SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where

each core consists of a 2D array of processing elements (PE).

The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 sized matrix

updates efficiently. A sequence of such updates is used to solve a larger problem inside

a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR)

is used to perform sparse kernel updates. Scalable partitioning and mapping schemes

are presented that map input matrices of any given size to the multicore architecture.

Design trade-offs related to the PE array dimension, size of local memory inside a core

and the bandwidth between on-chip memories and the cores have been presented. An

optimal core configuration is developed from this analysis. Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of upto

32 GOPS using a single core.

ContributorsAnimesh, Saurabh (Author) / Chakrabarti, Chaitali (Thesis advisor) / Brunhaver, John (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)

Created2018

Distortion Robust Biometric Recognition

Description

Information forensics and security have come a long way in just a few years thanks to the recent advances in biometric recognition. The main challenge remains a proper design of a biometric modality that can be resilient to unconstrained conditions, such as quality distortions. This work presents a solution to…

Information forensics and security have come a long way in just a few years thanks to the recent advances in biometric recognition. The main challenge remains a proper design of a biometric modality that can be resilient to unconstrained conditions, such as quality distortions. This work presents a solution to face and ear recognition under unconstrained visual variations, with a main focus on recognition in the presence of blur, occlusion and additive noise distortions.

First, the dissertation addresses the problem of scene variations in the presence of blur, occlusion and additive noise distortions resulting from capture, processing and transmission. Despite their excellent performance, ’deep’ methods are susceptible to visual distortions, which significantly reduce their performance. Sparse representations, on the other hand, have shown huge potential capabilities in handling problems, such as occlusion and corruption. In this work, an augmented SRC (ASRC) framework is presented to improve the performance of the Spare Representation Classifier (SRC) in the presence of blur, additive noise and block occlusion, while preserving its robustness to scene dependent variations. Different feature types are considered in the performance evaluation including image raw pixels, HoG and deep learning VGG-Face. The proposed ASRC framework is shown to outperform the conventional SRC in terms of recognition accuracy, in addition to other existing sparse-based methods and blur invariant methods at medium to high levels of distortion, when particularly used with discriminative features.

In order to assess the quality of features in improving both the sparsity of the representation and the classification accuracy, a feature sparse coding and classification index (FSCCI) is proposed and used for feature ranking and selection within both the SRC and ASRC frameworks.

The second part of the dissertation presents a method for unconstrained ear recognition using deep learning features. The unconstrained ear recognition is performed using transfer learning with deep neural networks (DNNs) as a feature extractor followed by a shallow classifier. Data augmentation is used to improve the recognition performance by augmenting the training dataset with image transformations. The recognition performance of the feature extraction models is compared with an ensemble of fine-tuned networks. The results show that, in the case where long training time is not desirable or a large amount of data is not available, the features from pre-trained DNNs can be used with a shallow classifier to give a comparable recognition accuracy to the fine-tuned networks.

ContributorsMounsef, Jinane (Author) / Karam, Lina (Thesis advisor) / Papandreou-Suppapola, Antonia (Committee member) / Li, Baoxin (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)

Created2018

Optimizing a Parallel Computing Stack for Single Board Computers

Description

The current trend of interconnected devices, or the internet of things (IOT) has led to the popularization of single board computers (SBC). This is primarily due to their form-factor and low price. This has led to unique networks of devices that can have unstable network connections and minimal processing power.…

The current trend of interconnected devices, or the internet of things (IOT) has led to the popularization of single board computers (SBC). This is primarily due to their form-factor and low price. This has led to unique networks of devices that can have unstable network connections and minimal processing power. Many parallel program- ming libraries are intended for use in high performance computing (HPC) clusters. Unlike the IOT environment described, HPC clusters will in general look to obtain very consistent network speeds and topologies. There are a significant number of software choices that make up what is referred to as the HPC stack or parallel processing stack. My thesis focused on building an HPC stack that would run on the SCB computer name the Raspberry Pi. The intention in making this Raspberry Pi cluster is to research performance of MPI implementations in an IOT environment, which had an impact on the design choices of the cluster. This thesis is a compilation of my research efforts in creating this cluster as well as an evaluation of the software that was chosen to create the parallel processing stack.

ContributorsO'Meara, Braedon Richard (Author) / Meuth, Ryan (Thesis director) / Dasgupta, Partha (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

A Study of Some Edge-Deletion Algorithms for Reducing Disease Spread on Networks

Description

This thesis discusses three recent optimization problems that seek to reduce disease spread on arbitrary graphs by deleting edges, and it discusses three approximation algorithms developed for these problems. Important definitions are presented including the Linear Threshold and Triggering Set models and the set function properties of submodularity and monotonicity.…

This thesis discusses three recent optimization problems that seek to reduce disease spread on arbitrary graphs by deleting edges, and it discusses three approximation algorithms developed for these problems. Important definitions are presented including the Linear Threshold and Triggering Set models and the set function properties of submodularity and monotonicity. Also, important results regarding the Linear Threshold model and computation of the influence function are presented along with proof sketches. The three main problems are formally presented, and NP-hardness results along with proof sketches are presented where applicable. The first problem seeks to reduce spread of infection over the Linear Threshold process by making use of an efficient tree data structure. The second problem seeks to reduce the spread of infection over the Linear Threshold process while preserving the PageRank distribution of the input graph. The third problem seeks to minimize the spectral radius of the input graph. The algorithms designed for these problems are described in writing and with pseudocode, and their approximation bounds are stated along with time complexities. Discussion of these algorithms considers how these algorithms could see real-world use. Challenges and the ways in which these algorithms do or do not overcome them are noted. Two related works, one which presents an edge-deletion disease spread reduction problem over a deterministic threshold process and the other which considers a graph modification problem aimed at minimizing worst-case disease spread, are compared with the three main works to provide interesting perspectives. Furthermore, a new problem is proposed that could avoid some issues faced by the three main problems described, and directions for future work are suggested.

ContributorsStanton, Andrew Warren (Author) / Richa, Andrea (Thesis director) / Czygrinow, Andrzej (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2018-05

PCI and Personal Data Anonymization

Description

In the last few years, billion-dollar companies like Yahoo and Equifax have had data breaches causing millions of people’s personal information to be leaked online. Other billion-dollar companies like Google and Facebook have gotten in trouble for abusing people’s personal information for financial gain as well. In this new age…

In the last few years, billion-dollar companies like Yahoo and Equifax have had data breaches causing millions of people’s personal information to be leaked online. Other billion-dollar companies like Google and Facebook have gotten in trouble for abusing people’s personal information for financial gain as well. In this new age of technology where everything is being digitalized and stored online, people all over the world are concerned about what is happening to their personal information and how they can trust it is being kept safe. This paper describes, first, the importance of protecting user data, second, one easy tool that companies and developers can use to help ensure that their user’s information (credit card information specifically) is kept safe, how to implement that tool, and finally, future work and research that needs to be done. The solution I propose is a software tool that will keep credit card data secured. It is only a small step towards achieving a completely secure data anonymized system, but when implemented correctly, it can reduce the risk of credit card data from being exposed to the public. The software tool is a script that can scan every viable file in any given system, server, or other file-structured Linux system and detect if there any visible credit card numbers that should be hidden.

ContributorsPappas, Alexander (Author) / Zhao, Ming (Thesis director) / Kuznetsov, Eugene (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2020-05

Polarization in the Supreme Court Nomination Processes of Merrick Garland, Neil Gorsuch, and Brett Kavanaugh

Description

Political polarization is the coalescence of political parties -- and the individuals of which parties are composed -- around opposing ends of the ideological spectrum. Political parties in the United States have always been divided, however, in recent years this division has only intensified. Recently, polarization has also wound its…

Political polarization is the coalescence of political parties -- and the individuals of which parties are composed -- around opposing ends of the ideological spectrum. Political parties in the United States have always been divided, however, in recent years this division has only intensified. Recently, polarization has also wound its way to the Supreme Court and the nomination processes of justices to the Court. This paper examines how prevalent polarization in the Supreme Court nomination process has become by looking specifically at the failed nomination of Judge Merrick Garland and the confirmations of now-Justices Neil Gorsuch and Brett Kavanaugh. This is accomplished by comparing the ideologies and qualifications of the three most recent nominees to those of previous nominees, as well as analysing the ideological composition of the Senate at the times of the individual nominations.

ContributorsJoss, Jacob (Author) / Hoekstra, Valerie (Thesis director) / Critchlow, Donald (Committee member) / Computer Science and Engineering Program (Contributor) / School of Politics and Global Studies (Contributor) / Barrett, The Honors College (Contributor)

Created2020-05

Helix: A First Game Retrospective

Description

The original version of Helix, the one I pitched when first deciding to make a video game
for my thesis, is an action-platformer, with the intent of metroidvania-style progression
and an interconnected world map.

The current version of Helix is a turn based role-playing game, with the intent of roguelike
gameplay and a dark…

The original version of Helix, the one I pitched when first deciding to make a video game
for my thesis, is an action-platformer, with the intent of metroidvania-style progression
and an interconnected world map.

The current version of Helix is a turn based role-playing game, with the intent of roguelike
gameplay and a dark fantasy theme. We will first be exploring the challenges that came
with programming my own game - not quite from scratch, but also without a prebuilt
engine - then transition into game design and how Helix has evolved from its original form
to what we see today.

ContributorsDiscipulo, Isaiah K (Author) / Meuth, Ryan (Thesis director) / Kobayashi, Yoshihiro (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2020-05

RecyclePlus: A Sustainability iOS Application

Description

RecyclePlus is an iOS mobile application that allows users to be knowledgeable in the realms of sustainability. It gives encourages users to be environmental responsible by providing them access to recycling information. In particular, it allows users to search up certain materials and learn about its recyclability and how to…

RecyclePlus is an iOS mobile application that allows users to be knowledgeable in the realms of sustainability. It gives encourages users to be environmental responsible by providing them access to recycling information. In particular, it allows users to search up certain materials and learn about its recyclability and how to properly dispose of the material. Some searches will show locations of facilities near users that collect certain materials and dispose of the materials properly. This is a full stack software project that explores open source software and APIs, UI/UX design, and iOS development.

ContributorsTran, Nikki (Author) / Ganesh, Tirupalavanam (Thesis director) / Meuth, Ryan (Committee member) / Watts College of Public Service & Community Solut (Contributor) / Department of Information Systems (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2020-05

Topological Descriptors for Parkinson's Disease Classification and Regression Analysis

Description

At present, the vast majority of human subjects with neurological disease are still diagnosed through in-person assessments and qualitative analysis of patient data. In this paper, we propose to use Topological Data Analysis (TDA) together with machine learning tools to automate the process of Parkinson’s disease classification and severity assessment.…

At present, the vast majority of human subjects with neurological disease are still diagnosed through in-person assessments and qualitative analysis of patient data. In this paper, we propose to use Topological Data Analysis (TDA) together with machine learning tools to automate the process of Parkinson’s disease classification and severity assessment. An automated, stable, and accurate method to evaluate Parkinson’s would be significant in streamlining diagnoses of patients and providing families more time for corrective measures. We propose a methodology which incorporates TDA into analyzing Parkinson’s disease postural shifts data through the representation of persistence images. Studying the topology of a system has proven to be invariant to small changes in data and has been shown to perform well in discrimination tasks. The contributions of the paper are twofold. We propose a method to 1) classify healthy patients from those afflicted by disease and 2) diagnose the severity of disease. We explore the use of the proposed method in an application involving a Parkinson’s disease dataset comprised of healthy-elderly, healthy-young and Parkinson’s disease patients.

ContributorsRahman, Farhan Nadir (Co-author) / Nawar, Afra (Co-author) / Turaga, Pavan (Thesis director) / Krishnamurthi, Narayanan (Committee member) / Electrical Engineering Program (Contributor) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2020-05