An analytical approach to efficient circuit variability analysis in scaled CMOS design

Description

Process variations have become increasingly important for scaled technologies starting at 45nm. The increased variations are primarily due to random dopant fluctuations, line-edge roughness and oxide thickness fluctuation. These variations greatly impact all aspects of circuit performance and pose a grand challenge to future robust IC design. To improve robustness, an efficient methodology is required that considers the effect of variations in the design flow. Analyzing the timing variability of complex circuits with HSPICE simulations is very time consuming. This thesis proposes an analytical model that predicts variability in CMOS circuits quickly and accurately. There are several analytical models to estimate nominal delay performance, but very little work has been done to accurately model delay variability. The proposed model is comprehensive and estimates nominal delay and variability as a function of transistor width, load capacitance and transition time. First, models are developed for library gates and the accuracy of the models is verified with HSPICE simulations for 45nm and 32nm technology nodes. The difference between predicted and simulated σ/μ for the library gates is less than 1%. Next, the accuracy of the model for nominal delay is verified for larger circuits including ISCAS'85 benchmark circuits. For 45nm technology, the model-predicted results are within 4% of the HSPICE simulated results and take a small fraction of the time. Delay variability is analyzed for various paths and it is observed that non-critical paths can become critical because of Vth variation. Variability analysis on the shortest paths shows that the rate of hold violations increases enormously with increasing Vth variation.

Contributors: Gummalla, Samatha (Author) / Chakrabarti, Chaitali (Thesis advisor) / Cao, Yu (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Arizona State University (Publisher)

Created: 2011

Compiler and runtime for memory management on software managed manycore processors

Description

We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are not believed to scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have a scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but they also become difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Programmers must now worry about data management on top of the functional correctness of the program, which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We first developed a Complete Circular Stack Management scheme. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus a Smart Stack Data Management (SSDM) scheme is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm for it. Later on, we propose a general cost estimation algorithm, based on which the CMSM heuristic for the code mapping problem is developed. Finally, heap data is dynamic in nature and therefore hard to manage. We provide two schemes to manage an unlimited amount of heap data in a constant-sized region in the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology.

Contributors: Bai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)

Created: 2014

Multidimensional DFT IP generators for FPGA platforms

Description

The multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input data that have to be stored in external memories based on Synchronous Dynamic RAM (SDRAM). In this dissertation, first an efficient architecture to implement 2-D DFT for large-sized input data is proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than an RC decomposition based implementation under the same memory constraints, and also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board and their performance is measured and analyzed. The results show that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding the matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision data can be processed in less than 27 ms. Finally, transpose-free imaging flows for the range-Doppler algorithm (RDA) and chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture which is implemented on an FPGA platform. When clocked at 100 MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323 ms and 162 ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU.
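
As an illustration of the conventional Row-Column decomposition that the abstract uses as its baseline, the following minimal Python sketch computes a 2-D DFT by applying 1-D DFTs along the rows and then along the columns, and compares the result with the direct 2-D definition. It is not the dissertation's architecture or its novel decomposition; the function names are chosen here for illustration, and a naive O(N^2) 1-D DFT is assumed for clarity.

```python
import cmath


def dft_1d(x):
    """Naive O(N^2) 1-D DFT of a sequence of numbers."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_2d_row_column(a):
    """2-D DFT via Row-Column decomposition (illustrative names)."""
    # Step 1: 1-D DFT along every row.
    rows = [dft_1d(row) for row in a]
    # Step 2: 1-D DFT along every column of the row-transformed data.
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]


def dft_2d_direct(a):
    """2-D DFT straight from the definition, used only as a reference."""
    m, n = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                for r in range(m)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6]]
    rc = dft_2d_row_column(image)
    ref = dft_2d_direct(image)
    max_err = max(abs(rc[u][v] - ref[u][v]) for u in range(2) for v in range(3))
    print("max difference between RC and direct 2-D DFT:", max_err)
```

The column pass in step 2 reads the data with a stride of one full row, which is the kind of access pattern that works against row-wise SDRAM bursts and that the abstract cites as the motivation for a different decomposition.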

Contributors: Yu, Chi-Li (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Karam, Lina (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)

Created: 2012

Compilation of stream programs onto embedded multicore architectures

Description

In recent years, we have observed the prevalence of stream applications in many embedded domains. Stream programs distinguish themselves from traditional sequential programs through well-defined independent actors, explicit data communication, and stable code/data access patterns. In order to achieve high performance and low power, scratch pad memory (SPM) has been introduced in today's embedded multicore processors. Current design frameworks for developing stream applications on SPM enhanced embedded architectures typically do not include a compiler that can perform automatic partitioning, mapping and scheduling under limited on-chip SPM capacities and memory access delays. Consequently, many designs are implemented manually, which leads to lengthy tasks and inferior designs. In this work, optimization techniques that automatically compile stream programs onto embedded multi-core architectures are proposed. As an initial case study, we implemented an automatic target recognition (ATR) algorithm on the IBM Cell Broadband Engine (BE). Then integer linear programming (ILP) and heuristic approaches were proposed to schedule stream programs on a single core embedded processor that has an SPM with code overlay. Later, ILP and heuristic approaches for Compiling Stream programs on SPM enhanced Multicore Processors (CSMP) were studied. The proposed CSMP ILP and heuristic approaches do not optimize for cycles in stream applications. Further, the number of software pipeline stages in the implementation is dependent on the actor to processing engine (PE) mapping and is uncontrollable. We next presented a Retiming technique for Throughput optimization on Embedded Multi-core processors (RTEM). The RTEM approach inherently handles cycles and can accept an upper bound on the number of software pipeline stages to be generated. We further enhanced RTEM by incorporating unrolling (URSTEM), which preserves all the beneficial properties of the RTEM heuristic and also scales with the number of PEs.

Contributors: Che, Weijia (Author) / Chatha, Karam Singh (Thesis advisor) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2012

Predictive modeling for extremely scaled CMOS and post silicon devices

Description

To extend the lifetime of complementary metal-oxide-semiconductor (CMOS) technology, emerging process techniques are being proposed to conquer the manufacturing difficulties. New structures and materials are proposed with superior electrical properties to traditional CMOS, such as strain technology and feedback field-effect transistor (FB-FET). To continue the design success and make an impact on leading products, advanced circuit design exploration must begin concurrently with early silicon development. Therefore, an accurate and scalable model is desired that correctly captures those effects and is flexible enough to extend to alternative process choices. For example, strain technology has been successfully integrated into CMOS fabrication to improve transistor performance, but the stress is non-uniformly distributed in the channel, leading to systematic performance variations. In this dissertation, a new layout-dependent stress model is proposed as a function of layout, temperature, and other device parameters. Furthermore, a method of layout decomposition is developed to partition the layout into a set of simple patterns for model extraction. These solutions significantly reduce the complexity in stress modeling and simulation. On the other hand, semiconductor devices with self-feedback mechanisms are emerging as promising alternatives to CMOS. Fe-FET was proposed to improve the switching by integrating a ferroelectric material as the gate insulator in a MOSFET structure. Under particular circumstances, ferroelectric capacitance is effectively negative, due to the negative slope of its polarization-electrical field curve. This property makes the ferroelectric layer a voltage amplifier that boosts surface potential, achieving fast transition. A new threshold voltage model for Fe-FET is developed, and it is further revealed that the impact of random dopant fluctuation (RDF) can be suppressed. Furthermore, through silicon via (TSV), a key technology that enables the 3D integration of chips, is studied. The TSV structure is usually a cylindrical metal-oxide-semiconductor (MOS) capacitor. A piecewise capacitance model is proposed for 3D interconnect simulation. Due to the mismatch in coefficients of thermal expansion (CTE) among materials, thermal stress is observed in the TSV process and impacts neighboring devices. The stress impact is investigated to support the interaction between silicon process and IC design at the early stage.

Contributors: Wang, Chi-Chao (Author) / Cao, Yu (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Clark, Lawrence (Committee member) / Schroder, Dieter (Committee member) / Arizona State University (Publisher)

Created: 2011

Study of Optical and Radiative Properties of Inhomogeneous Metallic Structures

Description

The objective of this dissertation is to study the optical and radiative properties of inhomogeneous metallic structures. In the ongoing search for new materials with tunable optical characteristics, porous metals and nanowires provide an extensive design space to engineer their optical response based on morphology-dependent phenomena. This dissertation first discusses the use of an aluminum nanopillar array on a quartz substrate as a spectrally selective optical filter with narrowband transmission for thermophotovoltaic systems. The narrow-band transmission enhancement is attributed to the magnetic polariton resonance between neighboring aluminum nanopillars. Tuning of the resonance wavelengths for selective filters was achieved by changing the nanopillar geometry. It concludes by showing improved efficiency of a Gallium-Antimonide thermophotovoltaic system by coupling the designed filter with the cell. Next, isotropic nanoporous gold films are investigated for applications in energy conversion and three-dimensional laser printing. The fabricated nanoporous gold samples are characterized by scanning electron microscopy, and the spectral hemispherical reflectance is measured with an integrating sphere. The effective isotropic optical constants of nanoporous gold with varying pore volume fraction are modeled using the Bruggeman effective medium theory. Nanoporous gold is metastable, and to understand its temperature dependent optical properties, a lab-scale fiber-based optical spectrometer setup is developed to characterize the in-situ specular reflectance of nanoporous gold thin films at temperatures ranging from 25 to 500 °C. The in-situ and the ex-situ measurements suggest that the specular, diffuse, and hemispherical reflectance varies as a function of temperature due to the morphology (ligament diameter) change observed. The dissertation continues with modeling and measurements of the radiative properties of porous powders. The study shows enhanced absorption from mixing porous copper into copper powder. This is important from the viewpoint of scalability, since end products such as sheets and tubes that require high absorptance can be produced through three-dimensional printing. Finally, the dissertation concludes with recommendations on the methods to fabricate the suggested optical filters to improve thermophotovoltaic system efficiencies. The results presented in this dissertation will facilitate not only the manufacturing of materials but also promising applications in solar thermal energy and optical systems.
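
For reference, the Bruggeman effective medium theory mentioned above is commonly written, for a two-phase mixture of gold and pores with spherical inclusions in three dimensions, as the implicit relation below. This is the textbook form and not necessarily the exact variant used in the dissertation; the symbols are chosen here for illustration, with $f$ the pore volume fraction, $\varepsilon_{\mathrm{Au}}$ and $\varepsilon_{\mathrm{pore}}$ the permittivities of the two phases, and $\varepsilon_{\mathrm{eff}}$ the effective permittivity to be solved for.

```latex
(1 - f)\,\frac{\varepsilon_{\mathrm{Au}} - \varepsilon_{\mathrm{eff}}}
              {\varepsilon_{\mathrm{Au}} + 2\,\varepsilon_{\mathrm{eff}}}
\;+\;
f\,\frac{\varepsilon_{\mathrm{pore}} - \varepsilon_{\mathrm{eff}}}
        {\varepsilon_{\mathrm{pore}} + 2\,\varepsilon_{\mathrm{eff}}}
\;=\; 0
```

Clearing the denominators gives a quadratic in $\varepsilon_{\mathrm{eff}}$, and the physically meaningful root is the one consistent with a passive (absorbing, non-amplifying) medium.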

Contributors: Ramesh, Rajagopalan (Author) / Wang, Liping (Thesis advisor) / Azeredo, Bruno (Thesis advisor) / Phelan, Patrick (Committee member) / Yu, Hongbin (Committee member) / Rykaczewski, Konrad (Committee member) / Arizona State University (Publisher)

Created: 2022

Opto-thermal Energy Transport with Selective Metamaterials and Solar Thermal Characterization of Selective Metafilm Absorbers

Description

The objective of this dissertation is to study the use of metamaterials as narrow-band and broadband selective absorbers for opto-thermal and solar thermal energy conversion. Narrow-band selective absorbers have applications such as plasmonic sensing and cancer treatment, while one of the main applications of selective metamaterials with broadband absorption is efficiently converting solar energy into heat as solar absorbers.

This dissertation first discusses the use of gold nanowires as narrow-band selective metamaterial absorbers. An investigation into plasmonic localized heating indicated that film-coupled gold nanoparticles exhibit tunable selective absorption based on the size of the nanoparticles. By using anodized aluminum oxide templates, aluminum nanodisc narrow-band absorbers were fabricated. A metrology instrument to measure the reflectance and transmittance of micro-scale samples was also developed and used to measure the reflectance of the aluminum nanodisc absorbers (220 µm diameter area). Tuning of the resonance wavelengths of these absorbers can be achieved through changing their geometry. Broadband absorption can be achieved by using a combination of geometries for these metamaterials which would facilitate their use as solar absorbers.

Recently, solar energy harvesting has become a topic of considerable research investigation due to it being an environmentally conscious alternative to fossil fuels. The next section discusses the steady-state temperature measurement of a lab-scale multilayer solar absorber, named metafilm. A lab-scale experimental setup is developed to characterize the solar thermal performance of selective solar absorbers. Under a concentration factor of 20.3 suns, a steady-state temperature of ~500 degrees Celsius was achieved for the metafilm compared to 375 degrees Celsius for a commercial black absorber under the same conditions. Thermal durability testing showed that the metafilm could withstand up to 700 degrees Celsius in vacuum conditions and up to 400 degrees Celsius in atmospheric conditions with little degradation of its optical and radiative properties. Moreover, cost analysis of the metafilm found it to cost significantly less ($2.22 per square meter) than commercial solar coatings ($5.41-100 per square meter).

Finally, this dissertation concludes with recommendations for further studies like using these selective metamaterials and metafilms as absorbers and emitters and using the aluminum nanodiscs on glass as selective filters for photovoltaic cells to enhance solar thermophotovoltaic energy conversion.

Contributors: Alshehri, Hassan (Author) / Wang, Liping (Thesis advisor) / Phelan, Patrick (Committee member) / Rykaczewski, Konrad (Committee member) / Wang, Robert (Committee member) / Yu, Hongbin (Committee member) / Arizona State University (Publisher)

Created: 2018

Analysis and design of native file system enhancements for storage class memory

Description

As persistent non-volatile memory solutions become integrated in the computing ecosystem and landscape, traditional commodity file systems architected and developed for traditional block I/O based memory solutions must be reevaluated. A majority of commodity file systems have been architected and designed with the goal of managing data on non-volatile storage devices such as hard disk drives (HDDs) and solid state drives (SSDs). HDDs and SSDs are attached to a computing system via a controller or I/O hub, often referred to as the southbridge. The point of HDD and SSD attachment creates multiple levels of translation for any data managed by the CPU that must be stored in non-volatile memory (NVM) on an HDD or SSD. Storage Class Memory (SCM) devices provide the ability to store data at the CPU and DRAM level of a computing system. A novel set of modifications to the ext2 and ext4 commodity file systems to address the needs of SCM will be presented and discussed. An in-depth analysis of many existing file systems, from multiple sources, will be presented along with an analysis to identify key modifications and extensions that would be necessary to run a file system on SCM devices. From this analysis, modifications and extensions have been applied to the FAT commodity file system, and key functional tests will be presented to demonstrate the operation and execution of the file system extensions.

Contributors: Robles, Raymond (Author) / Syrotiuk, Violet (Thesis advisor) / Sohoni, Sohum (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)

Created: 2016

Memory interference characterization and mitigation for heterogeneous smartphones

Description

The availability of a wide range of general purpose as well as accelerator cores on modern smartphones means that a significant number of applications can be executed on a smartphone simultaneously, resulting in an ever increasing demand on the memory subsystem. While the increased computation capability is intended for improving user experience, memory requests from each concurrent application exhibit unique memory access patterns as well as specific timing constraints. If not considered, this could lead to significant memory contention and result in lowered user experience.

This work first analyzes the impact of the degradation caused by interference at the memory system for a broad range of commonly-used smartphone applications. The real system characterization results show that smartphone applications, such as web browsing and media playback, suffer significant performance degradation. This is caused by shared resource contention at the application processor’s last-level cache, the communication fabric, and the main memory.

Based on the detailed characterization results, the rest of this thesis focuses on the design of an effective memory interference mitigation technique. Since web browsing is one of the most commonly-used smartphone applications and represents many HTML-based smartphone applications, my thesis focuses on meeting the performance requirement of a web browser on a smartphone in the presence of background processes and co-scheduled applications. My thesis proposes a light-weight user space frequency governor to mitigate the degradation caused by interfering applications, by predicting the performance and power consumption of web browsing. The governor periodically selects an optimal energy-efficient frequency setting by using statically-trained performance and power models with dynamically-varying architecture and system conditions, such as the memory access intensity of background processes and/or co-scheduled applications, and the temperature of the cores. The governor has been extensively evaluated on a Nexus 5 smartphone over a diverse range of mobile workloads. By operating at the most energy-efficient frequency setting in the presence of interference, energy efficiency is improved by as much as 35%, and by 18% on average, compared to the existing interactive governor, while keeping web page loading under 3 seconds for satisfactory performance.
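
The selection step described above can be sketched as follows. This is a minimal illustration and not the thesis's governor: the function name, the candidate frequencies, and the toy performance and power models are all assumptions made here, and only the idea of choosing the most energy-efficient frequency that still loads a page within 3 seconds comes from the abstract.

```python
def select_frequency(freqs_mhz, predict_load_time_s, predict_power_w,
                     deadline_s=3.0):
    """Pick the frequency with the lowest predicted energy that meets the deadline.

    predict_load_time_s and predict_power_w are hypothetical stand-ins for
    trained models; each maps a frequency in MHz to a prediction.
    Falls back to the highest frequency if no candidate meets the deadline.
    """
    best_freq, best_energy = None, None
    for f in freqs_mhz:
        load_time = predict_load_time_s(f)
        if load_time > deadline_s:
            continue  # this frequency would miss the page-load target
        energy = predict_power_w(f) * load_time  # joules = watts * seconds
        if best_energy is None or energy < best_energy:
            best_freq, best_energy = f, energy
    return best_freq if best_freq is not None else max(freqs_mhz)


def make_load_time_model(work_units):
    """Toy model: load time is inversely proportional to frequency.

    A larger work_units value stands in for heavier memory interference.
    """
    return lambda f_mhz: work_units / f_mhz


def toy_power_model(f_mhz):
    """Toy model: static power plus a term that grows cubically with frequency."""
    return 0.4 + 2.0 * (f_mhz / 2000.0) ** 3


if __name__ == "__main__":
    freqs = [600, 1000, 1500, 2000]  # hypothetical frequency steps in MHz
    for label, work in [("light", 2400), ("heavy", 4200), ("extreme", 9000)]:
        chosen = select_frequency(freqs, make_load_time_model(work),
                                  toy_power_model)
        print(label, "interference ->", chosen, "MHz")
```

In a real governor this selection would run periodically with the model inputs refreshed from current system conditions, as the abstract describes; the sketch only shows a single decision.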

Contributors: Shingari, Davesh (Author) / Wu, Carole-Jean (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)

Created: 2016

Dynamic control of radiative heat transfer with tunable materials for thermal management in both far and near fields

Description

The proposed research mainly focuses on employing tunable materials to achieve dynamic control of radiative heat transfer in both far and near fields for thermal management. Vanadium dioxide (VO2), which undergoes a phase transition from insulator to metal at the temperature of 341 K, is one tunable material being applied. The other one is graphene, whose optical properties can be tuned by chemical potential through external bias or chemical doping.

In the far field, a VO2-based metamaterial thermal emitter with switchable emittance in the mid-infrared has been theoretically studied. When VO2 is in the insulating phase, high emittance is observed at the resonance frequency of magnetic polaritons (MPs), while the structure becomes highly reflective when VO2 turns metallic. A VO2-based thermal emitter with tunable emittance is also demonstrated due to the excitation of MP at different resonance frequencies when VO2 changes phase. Moreover, an infrared thermal emitter made of graphene-covered SiC grating could achieve frequency-tunable emittance peak via the change of the graphene chemical potential.

In the near field, a radiation-based thermal rectifier is constructed by investigating radiative transfer between VO2 and SiO2 separated by nanometer vacuum gap distances. Compared to the case where VO2 is set as the emitter at 400 K as a metal, when VO2 is considered as the receiver at 300 K as an insulator, the energy transfer is greatly enhanced due to the strong surface phonon polariton (SPhP) coupling between insulating VO2 and SiO2. A radiation-based thermal switch is also explored by setting VO2 as both the emitter and the receiver. When both the VO2 emitter and receiver are in the insulating phase, the switch is in the “on” mode with a much enhanced heat flux due to strong SPhP coupling, while in the “off” mode the near-field radiative transfer is greatly suppressed because the emitting VO2 becomes metallic at temperatures higher than 341 K. In addition, an electrically-gated thermal modulator made of graphene-covered SiC plates is theoretically studied, with radiative transport modulated by varying the graphene chemical potential. Moreover, the MP effect on near-field radiative transport has been investigated by spectrally enhancing radiative heat transfer between two metal gratings.

Contributors: Yang, Yue (Author) / Wang, Liping (Thesis advisor) / Phelan, Patrick (Committee member) / Wang, Robert (Committee member) / Tongay, Sefaattin (Committee member) / Rykaczewski, Konrad (Committee member) / Arizona State University (Publisher)

Created: 2016
