Design of an automated validation environment for a radiation hardened MIPS microprocessor

Description

Ever-shrinking time to market, along with short product lifetimes, has created a need to shorten microprocessor design time. Verification of the design and its analysis are two major components of the design cycle. Design validation techniques can be broadly classified into two categories: simulation-based approaches and formal techniques. Simulation-based microprocessor validation involves running millions of cycles of random or pseudo-random tests and allows verification of the register transfer level (RTL) model against an architectural model, i.e., that the processor executes instructions as required. The validation effort involves model checking against a high-level description or simulating the design against the RTL implementation. Formal techniques exhaustively analyze parts of the design but do not verify the RTL against the architecture specification.

The focus of this work is to implement a fully automated validation environment for a MIPS-based radiation-hardened microprocessor using simulation-based approaches. The basic framework uses the classical validation approach, in which the design to be validated is described in a hardware description language (HDL) such as VHDL or Verilog. To implement a simulation-based approach, a number of random or pseudo-random tests are generated. The output of the HDL-based design is compared against the output of a "perfect" model implementing similar functionality; a mismatch in the results indicates a bug in the HDL-based design.

Effort is made to design the environment so that it can support validation during different stages of the design cycle. The validation environment also accommodates the architecture changes introduced by radiation hardening. The manner in which the validation environment is built is highly dependent on the specifications of the perfect model used for comparisons. This work implements the validation environment with two MIPS simulators serving as reference models. Two bugs have been discovered in the RTL model using simulation-based approaches through the validation environment.
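The comparison loop described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-instruction subset, the function names, and the deliberately injected operand-swap bug are hypothetical and are not taken from the thesis, whose environment drives a real HDL simulation and real MIPS simulators.

```python
import random

# Hypothetical three-instruction subset; register 0 is hard-wired to zero as in MIPS.
OPS = ("addu", "subu", "and")
MASK = 0xFFFFFFFF


def reference_step(regs, instr):
    """Golden ("perfect") model: return the register file after one instruction."""
    op, rd, rs, rt = instr
    if op == "addu":
        val = regs[rs] + regs[rt]
    elif op == "subu":
        val = regs[rs] - regs[rt]
    else:
        val = regs[rs] & regs[rt]
    out = list(regs)
    if rd != 0:  # writes to register 0 are discarded
        out[rd] = val & MASK
    return out


def buggy_dut_step(regs, instr):
    """Stand-in for the HDL design with an injected bug: subu swaps its operands."""
    op, rd, rs, rt = instr
    if op == "subu":
        return reference_step(regs, ("subu", rd, rt, rs))
    return reference_step(regs, instr)


def random_program(rng, length, num_regs):
    """Generate a pseudo-random instruction sequence from a seeded generator."""
    return [
        (rng.choice(OPS), rng.randrange(num_regs),
         rng.randrange(num_regs), rng.randrange(num_regs))
        for _ in range(length)
    ]


def validate(dut_step, seed, num_tests=200, length=20, num_regs=8):
    """Run random tests on both models; return the first mismatch or None."""
    rng = random.Random(seed)
    for test in range(num_tests):
        init = [0] + [rng.getrandbits(32) for _ in range(num_regs - 1)]
        ref, dut = list(init), list(init)
        for index, instr in enumerate(random_program(rng, length, num_regs)):
            ref = reference_step(ref, instr)
            dut = dut_step(dut, instr)
            if ref != dut:
                return test, index, instr
    return None
```

Calling `validate(reference_step, seed=1)` compares the reference model against itself and returns `None`, while `validate(buggy_dut_step, seed=1)` returns the test number, instruction index, and offending `subu` instruction at the first point where the two register files diverge. Seeding the generator keeps any failing test reproducible.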

Contributors: Sharma, Abhishek (Author) / Clark, Lawrence (Thesis advisor) / Holbert, Keith E. (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2011

Implementation of Graph Kernels on Multi core Architecture

Description

Graphs are one of the key data structures for many real-world computing applications such as machine learning, social networks, and genomics. The main challenges of graph processing include the difficulty of parallelizing the workload, which results in workload imbalance; poor memory locality; and a very large number of memory accesses. These challenges make large-scale graph processing very expensive.

This thesis presents an implementation of a select set of graph kernels on a multi-core architecture, Transmuter. The kernels are Breadth-First Search (BFS), Page Rank (PR), and Single Source Shortest Path (SSSP). Transmuter is a multi-tiled architecture with 4 tiles and 16 general processing elements (GPEs) per tile that supports a two-level cache hierarchy. All graph processing kernels have been implemented on Transmuter using the gem5 architectural simulator.

The key pre-processing steps for improving performance are static partitioning by destination and balancing the workload among the processing cores. Processing partitioned graphs shows an almost 3x improvement in performance over processing un-partitioned graphs. The choice of data structure also plays an important role in the amount of storage space consumed and the amount of synchronization required in a parallel implementation. Here the compressed sparse column data format was used. BFS and SSSP are frontier-based algorithms, where a frontier represents the subset of vertices that are active during the current iteration. They were implemented using the Boolean frontier array data structure. PR is an iterative algorithm in which all vertices are active at all times.
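To make the interplay of these pieces concrete, here is a minimal single-threaded sketch of BFS over a compressed sparse column (CSC) graph with Boolean frontier arrays and contiguous destination partitions. It rests on assumptions the abstract does not confirm: a pull-style traversal in which each destination vertex scans its incoming edges, an edge-count heuristic for balancing partitions, and the hypothetical names `build_csc`, `partition_by_destination`, and `bfs`. The partitions are processed sequentially here; on a multi-core target each one would be assigned to a core.

```python
def build_csc(num_vertices, edges):
    """Build CSC arrays: column v of row_idx lists the sources of edges into v."""
    col_ptr = [0] * (num_vertices + 1)
    for _, dst in edges:
        col_ptr[dst + 1] += 1
    for v in range(num_vertices):
        col_ptr[v + 1] += col_ptr[v]
    row_idx = [0] * len(edges)
    fill = list(col_ptr[:-1])  # next free slot in each column
    for src, dst in edges:
        row_idx[fill[dst]] = src
        fill[dst] += 1
    return col_ptr, row_idx


def partition_by_destination(col_ptr, num_parts):
    """Split destinations into contiguous ranges with roughly equal edge counts."""
    n = len(col_ptr) - 1
    total = col_ptr[n]
    bounds = [0]
    for part in range(1, num_parts):
        target = total * part / num_parts
        v = bounds[-1]
        while v < n and col_ptr[v] < target:
            v += 1
        bounds.append(v)
    bounds.append(n)
    return [(bounds[i], bounds[i + 1]) for i in range(num_parts)]


def bfs(col_ptr, row_idx, source, num_parts=4):
    """Return the BFS level of every vertex (-1 if unreachable from source)."""
    n = len(col_ptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [False] * n
    frontier[source] = True
    parts = partition_by_destination(col_ptr, num_parts)
    depth = 0
    while any(frontier):
        depth += 1
        next_frontier = [False] * n
        for lo, hi in parts:  # one partition per core in a parallel version
            for v in range(lo, hi):
                if level[v] != -1:
                    continue  # already visited
                for k in range(col_ptr[v], col_ptr[v + 1]):
                    if frontier[row_idx[k]]:
                        # Only the owner of v writes level[v] and next_frontier[v].
                        level[v] = depth
                        next_frontier[v] = True
                        break
        frontier = next_frontier
    return level
```

For the edges `[(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]` on six vertices, `bfs(*build_csc(6, edges), source=0)` returns `[0, 1, 1, 2, 3, -1]`. Because each partition owns a disjoint range of destination vertices and only reads the previous frontier, a parallel version of this sketch would need no locks or atomics on the frontier arrays, which is one way partitioning by destination can reduce synchronization.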

The performance of the different Transmuter implementations for the 14 nm node was evaluated based on metrics such as power consumption (Watt), Giga Operations Per Second (GOPS), GOPS/Watt, and L1/L2 cache misses. The GOPS/W numbers for graphs with 10k nodes and 10k edges are 33 for BFS, 477 for PR, and 10 for SSSP. Frontier-based algorithms have much lower GOPS/W than iterative algorithms such as PR. This is because all nodes in Page Rank are active at all points in time. For all three kernel implementations, the L1 cache miss rates are quite low while the L2 cache hit rates are high.

Contributors: Renganathan, Srinidhi (Author) / Chakrabarti, Chaitali (Thesis advisor) / Shrivastava, Aviral (Committee member) / Mudge, Trevor (Committee member) / Arizona State University (Publisher)

Created: 2019

ASU Electronic Theses and Dissertations
