Search Content

Reliable arithmetic circuit design inspired by SNP systems

Description

ABSTRACT Developing new non-traditional device models is gaining popularity as the silicon-based electrical device approaches its limitation when it scales down. Membrane systems, also called P systems, are a new class of biological computation model inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a…

ABSTRACT Developing new non-traditional device models is gaining popularity as the silicon-based electrical device approaches its limitation when it scales down. Membrane systems, also called P systems, are a new class of biological computation model inspired by the way cells process chemical signals. Spiking Neural P systems (SNP systems), a certain kind of membrane systems, is inspired by the way the neurons in brain interact using electrical spikes. Compared to the traditional Boolean logic, SNP systems not only perform similar functions but also provide a more promising solution for reliable computation. Two basic neuron types, Low Pass (LP) neurons and High Pass (HP) neurons, are introduced. These two basic types of neurons are capable to build an arbitrary SNP neuron. This leads to the conclusion that these two basic neuron types are Turing complete since SNP systems has been proved Turing complete. These two basic types of neurons are further used as the elements to construct general-purpose arithmetic circuits, such as adder, subtractor and comparator. In this thesis, erroneous behaviors of neurons are discussed. Transmission error (spike loss) is proved to be equivalent to threshold error, which makes threshold error discussion more universal. To improve the reliability, a new structure called motif is proposed. Compared to Triple Modular Redundancy improvement, motif design presents its efficiency and effectiveness in both single neuron and arithmetic circuit analysis. DRAM-based CMOS circuits are used to implement the two basic types of neurons. Functionality of basic type neurons is proved using the SPICE simulations. The motif improved adder and the comparator, as compared to conventional Boolean logic design, are much more reliable with lower leakage, and smaller silicon area. This leads to the conclusion that SNP system could provide a more promising solution for reliable computation than the conventional Boolean logic.

ContributorsAn, Pei (Author) / Cao, Yu (Thesis advisor) / Barnaby, Hugh (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created2013

Towards energy efficient computing with Linux: enabling task level power awareness and support for energy efficient accelerator

Description

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate…

With increasing transistor volume and reducing feature size, it has become a major design constraint to reduce power consumption also. This has given rise to aggressive architectural changes for on-chip power management and rapid development to energy efficient hardware accelerators. Accordingly, the objective of this research work is to facilitate software developers to leverage these hardware techniques and improve energy efficiency of the system. To achieve this, I propose two solutions for Linux kernel: Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though there are many models available in literature to model processor power consumption, there is a lack of such models to capture power consumption at the task-level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management as OS time multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated for Intel Sandy Bridge processor. It has a negligible overhead of less than 1\% hardware resource consumption. The profiler power prediction was demonstrated for various application benchmarks from SPEC to PARSEC with less than 4\% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use case scenarios, which include heterogeneous computing and fine grained per-core DVFS. Along with architectural enhancement in general purpose processors to improve energy efficiency, hardware accelerators like Coarse Grain reconfigurable architecture (CGRA) are gaining popularity. Unlike vector processors, which rely on data parallelism, CGRA can provide greater flexibility and compiler level control making it more suitable for present SoC environment. To provide streamline development environment for CGRA, I propose a flexible framework in Linux to do design space exploration for CGRA. With accurate and flexible hardware models, fine grained integration with accurate architectural simulator, and Linux memory management and DMA support, a user can carry out limitless experiments on CGRA in full system environment.

ContributorsDesai, Digant Pareshkumar (Author) / Vrudhula, Sarma (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created2013

System-level synthesis of dataplane subsystems for MPSoCs

Description

In recent years we have witnessed a shift towards multi-processor system-on-chips (MPSoCs) to address the demands of embedded devices (such as cell phones, GPS devices, luxury car features, etc.). Highly optimized MPSoCs are well-suited to tackle the complex application demands desired by the end user customer. These MPSoCs incorporate a…

In recent years we have witnessed a shift towards multi-processor system-on-chips (MPSoCs) to address the demands of embedded devices (such as cell phones, GPS devices, luxury car features, etc.). Highly optimized MPSoCs are well-suited to tackle the complex application demands desired by the end user customer. These MPSoCs incorporate a constellation of heterogeneous processing elements (PEs) (general purpose PEs and application-specific integrated circuits (ASICS)). A typical MPSoC will be composed of a application processor, such as an ARM Coretex-A9 with cache coherent memory hierarchy, and several application sub-systems. Each of these sub-systems are composed of highly optimized instruction processors, graphics/DSP processors, and custom hardware accelerators. Typically, these sub-systems utilize scratchpad memories (SPM) rather than support cache coherency. The overall architecture is an integration of the various sub-systems through a high bandwidth system-level interconnect (such as a Network-on-Chip (NoC)). The shift to MPSoCs has been fueled by three major factors: demand for high performance, the use of component libraries, and short design turn around time. As customers continue to desire more and more complex applications on their embedded devices the performance demand for these devices continues to increase. Designers have turned to using MPSoCs to address this demand. By using pre-made IP libraries designers can quickly piece together a MPSoC that will meet the application demands of the end user with minimal time spent designing new hardware. Additionally, the use of MPSoCs allows designers to generate new devices very quickly and thus reducing the time to market. In this work, a complete MPSoC synthesis design flow is presented. We first present a technique \cite{leary1_intro} to address the synthesis of the interconnect architecture (particularly Network-on-Chip (NoC)). We then address the synthesis of the memory architecture of a MPSoC sub-system \cite{leary2_intro}. Lastly, we present a co-synthesis technique to generate the functional and memory architectures simultaneously. The validity and quality of each synthesis technique is demonstrated through extensive experimentation.

ContributorsLeary, Glenn (Author) / Chatha, Karamvir S (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Beraha, Rudy (Committee member) / Arizona State University (Publisher)

Created2013

Unified framework for energy-proportional computing in multicore processors: novel algorithms and practical implementation

Description

Multicore processors have proliferated in nearly all forms of computing, from servers, desktop, to smartphones. The primary reason for this large adoption of multicore processors is due to its ability to overcome the power-wall by providing higher performance at a lower power consumption rate. With multi-cores, there is increased need…

Multicore processors have proliferated in nearly all forms of computing, from servers, desktop, to smartphones. The primary reason for this large adoption of multicore processors is due to its ability to overcome the power-wall by providing higher performance at a lower power consumption rate. With multi-cores, there is increased need for dynamic energy management (DEM), much more than for single-core processors, as DEM for multi-cores is no more a mechanism just to ensure that a processor is kept under specified temperature limits, but also a set of techniques that manage various processor controls like dynamic voltage and frequency scaling (DVFS), task migration, fan speed, etc. to achieve a stated objective. The objectives span a wide range from maximizing throughput, minimizing power consumption, reducing peak temperature, maximizing energy efficiency, maximizing processor reliability, and so on, along with much more wider constraints of temperature, power, timing, and reliability constraints. Thus DEM can be very complex and challenging to achieve. Since often times many DEMs operate together on a single processor, there is a need to unify various DEM techniques. This dissertation address such a need. In this work, a framework for DEM is proposed that provides a unifying processor model that includes processor power, thermal, timing, and reliability models, supports various DEM control mechanisms, many different objective functions along with equally diverse constraint specifications. Using the framework, a range of novel solutions is derived for instances of DEM problems, that include maximizing processor performance, energy efficiency, or minimizing power consumption, peak temperature under constraints of maximum temperature, memory reliability and task deadlines. Finally, a robust closed-loop controller to implement the above solutions on a real processor platform with a very low operational overhead is proposed. Along with the controller design, a model identification methodology for obtaining the required power and thermal models for the controller is also discussed. The controller is architecture independent and hence easily portable across many platforms. The controller has been successfully deployed on Intel Sandy Bridge processor and the use of the controller has increased the energy efficiency of the processor by over 30%

ContributorsHanumaiah, Vinay (Author) / Vrudhula, Sarma (Thesis advisor) / Chatha, Karamvir (Committee member) / Chakrabarti, Chaitali (Committee member) / Rodriguez, Armando (Committee member) / Askin, Ronald (Committee member) / Arizona State University (Publisher)

Created2013

Fully automated radiation hardened by design circuit construction

Description

A fully automated logic design methodology for radiation hardened by design (RHBD) high speed logic using fine grained triple modular redundancy (TMR) is presented. The hardening techniques used in the cell library are described and evaluated, with a focus on both layout techniques that mitigate total ionizing dose (TID) and…

A fully automated logic design methodology for radiation hardened by design (RHBD) high speed logic using fine grained triple modular redundancy (TMR) is presented. The hardening techniques used in the cell library are described and evaluated, with a focus on both layout techniques that mitigate total ionizing dose (TID) and latchup issues and flip-flop designs that mitigate single event transient (SET) and single event upset (SEU) issues. The base TMR self-correcting master-slave flip-flop is described and compared to more traditional hardening techniques. Additional refinements are presented, including testability features that disable the self-correction to allow detection of manufacturing defects. The circuit approach is validated for hardness using both heavy ion and proton broad beam testing. For synthesis and auto place and route, the methodology and circuits leverage commercial logic design automation tools. These tools are glued together with custom CAD tools designed to enable easy conversion of standard single redundant hardware description language (HDL) files into hardened TMR circuitry. The flow allows hardening of any synthesizable logic at clock frequencies comparable to unhardened designs and supports standard low-power techniques, e.g. clock gating and supply voltage scaling.

ContributorsHindman, Nathan (Author) / Clark, Lawrence T (Thesis advisor) / Holbert, Keith E. (Committee member) / Barnaby, Hugh (Committee member) / Allee, David (Committee member) / Arizona State University (Publisher)

Created2012

Radiation sensing using chalcogenide glass materials

Description

The dissolution of metal layers such as silver into chalcogenide glass layers such as germanium selenide changes the resistivity of the metal and chalcogenide films by a great extent. It is known that the incorporation of the metal can be achieved by ultra violet light exposure or thermal processes. In…

The dissolution of metal layers such as silver into chalcogenide glass layers such as germanium selenide changes the resistivity of the metal and chalcogenide films by a great extent. It is known that the incorporation of the metal can be achieved by ultra violet light exposure or thermal processes. In this work, the use of metal dissolution by exposure to gamma radiation has been explored for radiation sensor applications. Test structures were designed and a process flow was developed for prototype sensor fabrication. The test structures were designed such that sensitivity to radiation could be studied. The focus is on the effect of gamma rays as well as ultra violet light on silver dissolution in germanium selenide (Ge30Se70) chalcogenide glass. Ultra violet radiation testing was used prior to gamma exposure to assess the basic mechanism. The test structures were electrically characterized prior to and post irradiation to assess resistance change due to metal dissolution. A change in resistance was observed post irradiation and was found to be dependent on the radiation dose. The structures were also characterized using atomic force microscopy and roughness measurements were made prior to and post irradiation. A change in roughness of the silver films on Ge30Se70 was observed following exposure. This indicated the loss of continuity of the film which causes the increase in silver film resistance following irradiation. Recovery of initial resistance in the structures was also observed after the radiation stress was removed. This recovery was explained with photo-stimulated deposition of silver from the chalcogenide at room temperature confirmed with the re-appearance of silver dendrites on the chalcogenide surface. The results demonstrate that it is possible to use the metal dissolution effect in radiation sensing applications.

ContributorsChandran, Ankitha (Author) / Kozicki, Michael N (Thesis advisor) / Holbert, Keith E. (Committee member) / Barnaby, Hugh (Committee member) / Arizona State University (Publisher)

Created2012

Compact modeling of multi-gate transistors

Description

Scaling of the classical planar MOSFET below 20 nm gate length is facing not only technological difficulties but also limitations imposed by short channel effects, gate and junction leakage current due to quantum tunneling, high body doping induced threshold voltage variation, and carrier mobility degradation. Non-classical multiple-gate structures such as…

Scaling of the classical planar MOSFET below 20 nm gate length is facing not only technological difficulties but also limitations imposed by short channel effects, gate and junction leakage current due to quantum tunneling, high body doping induced threshold voltage variation, and carrier mobility degradation. Non-classical multiple-gate structures such as double-gate (DG) FinFETs and surrounding gate field-effect-transistors (SGFETs) have good electrostatic integrity and are an alternative to planar MOSFETs for below 20 nm technology nodes. Circuit design with these devices need compact models for SPICE simulation. In this work physics based compact models for the common-gate symmetric DG-FinFET, independent-gate asymmetric DG-FinFET, and SGFET are developed. Despite the complex device structure and boundary conditions for the Poisson-Boltzmann equation, the core structure of the DG-FinFET and SGFET models, are maintained similar to the surface potential based compact models for planar MOSFETs such as SP and PSP. TCAD simulations show differences between the transient behavior and the capacitance-voltage characteristics of bulk and SOI FinFETs if the gate-voltage swing includes the accumulation region. This effect can be captured by a compact model of FinFETs only if it includes the contribution of both types of carriers in the Poisson-Boltzmann equation. An accurate implicit input voltage equation valid in all regions of operation is proposed for common-gate symmetric DG-FinFETs with intrinsic or lightly doped bodies. A closed-form algorithm is developed for solving the new input voltage equation including ambipolar effects. The algorithm is verified for both the surface potential and its derivatives and includes a previously published analytical approximation for surface potential as a special case when ambipolar effects can be neglected. The symmetric linearization method for common-gate symmetric DG-FinFETs is developed in a form free of the charge-sheet approximation present in its original formulation for bulk MOSFETs. The accuracy of the proposed technique is verified by comparison with exact results. An alternative and computationally efficient description of the boundary between the trigonometric and hyperbolic solutions of the Poisson-Boltzmann equation for the independent-gate asymmetric DG-FinFET is developed in terms of the Lambert W function. Efficient numerical algorithm is proposed for solving the input voltage equation. Analytical expressions for terminal charges of an independent-gate asymmetric DG-FinFET are derived. The new charge model is C-infinity continuous, valid for weak as well as for strong inversion condition of both the channels and does not involve the charge-sheet approximation. This is accomplished by developing the symmetric linearization method in a form that does not require identical boundary conditions at the two Si-SiO2 interfaces and allows for volume inversion in the DG-FinFET. Verification of the model is performed with both numerical computations and 2D TCAD simulations under a wide range of biasing conditions. The model is implemented in a standard circuit simulator through Verilog-A code. Simulation examples for both digital and analog circuits verify good model convergence and demonstrate the capabilities of new circuit topologies that can be implemented using independent-gate asymmetric DG-FinFETs.

ContributorsDessai, Gajanan (Author) / Gildenblat, Gennady (Committee member) / McAndrew, Colin (Committee member) / Cao, Yu (Committee member) / Barnaby, Hugh (Committee member) / Arizona State University (Publisher)

Created2012

Medical implant receiver system

Description

The medical industry has benefited greatly by electronic integration resulting in the explosive growth of active medical implants. These devices often treat and monitor chronic health conditions and require very minimal power usage. A key part of these medical implants is an ultra-low power two way wireless communication system. This…

The medical industry has benefited greatly by electronic integration resulting in the explosive growth of active medical implants. These devices often treat and monitor chronic health conditions and require very minimal power usage. A key part of these medical implants is an ultra-low power two way wireless communication system. This enables both control of the implant as well as relay of information collected. This research has focused on a high performance receiver for medical implant applications. One commonly quoted specification to compare receivers is energy per bit required. This metric is useful, but incomplete in that it ignores Sensitivity level, bit error rate, and immunity to interferers. In this study exploration of receiver architectures and convergence upon a comprehensive solution is done. This analysis is used to design and build a system for validation. The Direct Conversion Receiver architecture implemented for the MICS standard in 0.18 µm CMOS process consumes approximately 2 mW is competitive with published research.

ContributorsStevens, Mark (Author) / Kiaei, Sayfe (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Aberle, James T., 1961- (Committee member) / Barnaby, Hugh (Committee member) / Arizona State University (Publisher)

Created2012

On-line coloring of partial orders, circular arc graphs, and trees

Description

A central concept of combinatorics is partitioning structures with given constraints. Partitions of on-line posets and on-line graphs, which are dynamic versions of the more familiar static structures posets and graphs, are examined. In the on-line setting, vertices are continually added to a poset or graph while a chain partition…

A central concept of combinatorics is partitioning structures with given constraints. Partitions of on-line posets and on-line graphs, which are dynamic versions of the more familiar static structures posets and graphs, are examined. In the on-line setting, vertices are continually added to a poset or graph while a chain partition or coloring (respectively) is maintained. %The optima of the static cases cannot be achieved in the on-line setting. Both upper and lower bounds for the optimum of the number of chains needed to partition a width $w$ on-line poset exist. Kierstead's upper bound of $\frac{5^w-1}{4}$ was improved to $w^{14 \lg w}$ by Bosek and Krawczyk. This is improved to $w^{3+6.5 \lg w}$ by employing the First-Fit algorithm on a family of restricted posets (expanding on the work of Bosek and Krawczyk) . Namely, the family of ladder-free posets where the $m$-ladder is the transitive closure of the union of two incomparable chains $x_1\le\dots\le x_m$, $y_1\le\dots\le y_m$ and the set of comparabilities $\{x_1\le y_1,\dots, x_m\le y_m\}$. No upper bound on the number of colors needed to color a general on-line graph exists. To lay this fact plain, the performance of on-line coloring of trees is shown to be particularly problematic. There are trees that require $n$ colors to color on-line for any positive integer $n$. Furthermore, there are trees that usually require many colors to color on-line even if they are presented without any particular strategy. For restricted families of graphs, upper and lower bounds for the optimum number of colors needed to maintain an on-line coloring exist. In particular, circular arc graphs can be colored on-line using less than 8 times the optimum number from the static case. This follows from the work of Pemmaraju, Raman, and Varadarajan in on-line coloring of interval graphs.

ContributorsSmith, Matthew Earl (Author) / Kierstead, Henry A (Thesis advisor) / Colbourn, Charles (Committee member) / Czygrinow, Andrzej (Committee member) / Fishel, Susanna (Committee member) / Hurlbert, Glenn (Committee member) / Arizona State University (Publisher)

Created2012

Testing of threshold logic latch based hybrid circuits

Description

The advent of threshold logic simplifies the traditional Boolean logic to the single level multi-input function. Threshold logic latch (TLL), among implementations of threshold logic, is functionally equivalent to a multi-input function with an edge triggered flip-flop, which stands out to improve area and both dynamic and leakage power consumption,…

The advent of threshold logic simplifies the traditional Boolean logic to the single level multi-input function. Threshold logic latch (TLL), among implementations of threshold logic, is functionally equivalent to a multi-input function with an edge triggered flip-flop, which stands out to improve area and both dynamic and leakage power consumption, providing an appropriate design alternative. Accordingly, the TLL standard cell library is designed. Through technology mapping, hybrid circuit is generated by absorbing the logic cone backward from each flip-flip to get the smallest remaining feeder. With the scan test methodology adopted, design for testability (DFT) is proposed, including scan element design and scan chain insertion. Test synthesis flow is then introduced, according to the Cadence tool, RTL compiler. Test application is the process of applying vectors and the response analysis, which is mainly about the testbench design. A parameterized generic self-checking Verilog testbench is designed for static fault detection. Test development refers to the fault modeling, and test generation. Firstly, functional truth table test generation on TLL cells is proposed. Before the truth table test of the threshold function, the dependence of sequence of vectors applied, i.e., the dependence of current state on the previous state, should be eliminated. Transition test (dynamic pattern) on all weak inputs is proved to be able to test the reset function, which is supposed to erase the history in the reset phase before every evaluation phase. Remaining vectors in the truth table except the weak inputs are then applied statically (static pattern). Secondly, dynamic patterns for all weak inputs are proposed to detect structural transistor level faults analyzed in the TLL cell, with single fault assumption and stuck-at faults, stuck-on faults, and stuck-open faults under consideration. Containing those patterns, the functional test covers all testable structural faults inside the TLL. Thirdly, with the scope of the whole hybrid netlist, the procedure of test generation is proposed with three steps: scan chain test; test of feeders and other scan elements except TLLs; functional pattern test of TLL cells. Implementation of this procedure is discussed in the automatic test pattern generation (ATPG) chapter.

ContributorsHu, Yang (Author) / Vrudhula, Sarma (Thesis advisor) / Barnaby, Hugh (Committee member) / Yu, Shimeng (Committee member) / Arizona State University (Publisher)

Created2013