Description
Advances in software and applications continue to demand advances in memory. The ideal memory would be non-volatile and have maximal capacity, speed, retention time, endurance, and radiation hardness while also having minimal physical size, energy usage, and cost. The programmable metallization cell (PMC) is an emerging memory technology that is likely to surpass flash memory in all the listed ideal memory characteristics. A comprehensive physics-based model is needed to fully understand PMC operation and aid in design optimization. With the intent of advancing the PMC modeling effort, this thesis presents two simulation models for the PMC. The first model is a finite element model based on Silvaco Atlas finite element analysis software. Limitations of the software are identified that make this model inconsistent with the operating mechanism of the PMC. The second model is a physics-based numerical model developed for the PMC. This model is successful in matching data measured from a chalcogenide glass PMC designed and manufactured at ASU. Matched operating characteristics observable in the current and resistance vs. voltage data include the OFF/ON resistances and the write/erase and electrodeposition voltage thresholds. Multilevel programming is also explained and demonstrated with the numerical model. The numerical model has already proven useful by revealing information about the operation and characteristics of the PMC.
Contributors: Oleksy, David Ryan (Author) / Barnaby, Hugh J (Thesis advisor) / Kozicki, Michael N (Committee member) / Edwards, Arthur H (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Hundreds of cores per chip are expected in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet it is believed that coherent cache architectures will not scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures emerge as a solution. They have a scalable memory design in which each core has direct access to only its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but it also makes them difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. However, doing this adds a lot of complexity to the programmer's job. Programmers must now worry about data management on top of the functional correctness of the program, which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We first developed Complete Circular Stack Management. It manages stack frames between the local memory and the main memory, and addresses the stack pointer problem as well. Though it works, we found we could further optimize the management for most cases. Thus, Smart Stack Data Management (SSDM) is provided. In this work, we formulate the stack data management problem and propose a greedy algorithm to solve it. Later on, we propose a general cost estimation algorithm, on which the CMSM heuristic for the code mapping problem is based. Finally, heap data is dynamic in nature and therefore hard to manage. We provide two schemes to manage an unlimited amount of heap data in a constant-sized region of the local memory. In addition to those separate schemes for different kinds of data, we also provide a memory partition methodology.
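The idea of keeping stack frames in a small local memory and spilling older frames to main memory can be illustrated with a short simulation. This is only a minimal sketch under stated assumptions, not the dissertation's Complete Circular Stack Management or SSDM: the class name `CircularStackSketch`, its methods, and the policy of evicting the oldest frame first are all hypothetical, and each eviction or fetch is simply counted as one DMA transfer.

```python
class CircularStackSketch:
    """Toy model of stack frames held in a fixed-size local memory.

    Hypothetical illustration only: oldest frames are evicted to main
    memory when a new frame does not fit, and the caller's frame is
    fetched back when the local memory becomes empty on return.
    """

    def __init__(self, local_capacity):
        self.local_capacity = local_capacity
        self.local = []          # (name, size) frames in local memory, oldest first
        self.main = []           # evicted frames in main memory, oldest first
        self.dma_transfers = 0   # each eviction or fetch counts as one transfer

    def _used(self):
        return sum(size for _, size in self.local)

    def call(self, name, size):
        """Push a frame for a function call, evicting old frames if needed."""
        if size > self.local_capacity:
            raise ValueError("frame larger than local memory")
        while self._used() + size > self.local_capacity:
            self.main.append(self.local.pop(0))   # evict the oldest frame
            self.dma_transfers += 1
        self.local.append((name, size))

    def ret(self):
        """Pop the current frame; fetch the caller's frame if it was evicted."""
        frame = self.local.pop()
        if not self.local and self.main:
            self.local.append(self.main.pop())    # most recently evicted frame
            self.dma_transfers += 1
        return frame


stack = CircularStackSketch(local_capacity=100)
stack.call("main", 40)
stack.call("f", 40)
stack.call("g", 40)   # does not fit: the frame of "main" is evicted
```

In this toy run the third call triggers one eviction; returning from `g` and then `f` empties the local memory and triggers one fetch of the `main` frame.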
Contributors: Bai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Chalcogenide glass (ChG) materials have gained wide attention because of their applications in conductive bridge random access memory (CBRAM), phase change memories (PC-RAM), optical rewritable disks (CD-RW and DVD-RW), microelectromechanical systems (MEMS), microfluidics, and optical communications. One of the significant properties of ChG materials is the change in the resistivity of the material when a metal such as Ag or Cu is added to it by diffusion. This study demonstrates the potential radiation-sensing capabilities of two metal/chalcogenide glass device configurations. Lateral and vertical device configurations sense the radiation-induced migration of Ag+ ions in germanium selenide glasses via changes in electrical resistance between electrodes on the ChG. Before irradiation, these devices exhibit a high-resistance 'OFF-state' (on the order of 10E12 Ω), but following irradiation with either 60-Co gamma-rays or UV light, their resistance drops to a low-resistance 'ON-state' (around 10E3 Ω). Lateral devices have exhibited cyclical recovery with room-temperature annealing of the Ag-doped ChG, which suggests potential uses in reusable radiation sensor applications. The feasibility of producing inexpensive flexible radiation sensors has been demonstrated by studying the effects of mechanical strain and temperature stress on sensors formed on a flexible polymer substrate. The mechanisms of radiation-induced Ag/Ag+ transport and reactions in ChG have been modeled using a finite element device simulator, ATLAS. The essential reactions captured by the simulator are radiation-induced carrier generation, combined with reduction/oxidation of Ag species in the chalcogenide film. Metal-doped ChGs are solid electrolytes that have both ionic and electronic conductivity.
The ChG-based Programmable Metallization Cell (PMC) is a technology platform that offers electric-field-dependent resistance switching through the formation and dissolution of nanosized conductive filaments in a ChG solid electrolyte between oxidizable and inert electrodes. This study identifies silver anode agglomeration in PMC devices following large radiation dose exposure and considers device failure mechanisms via electrical and material characterization. The results demonstrate that by changing device structural parameters, silver agglomeration in PMC devices can be suppressed and reliable resistance switching may be maintained for extremely high doses ranging from 4 Mrad(GeSe) to more than 10 Mrad(ChG).
Contributors: Dandamudi, Pradeep (Author) / Kozicki, Michael N (Thesis advisor) / Barnaby, Hugh J (Committee member) / Holbert, Keith E. (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
There is an ever-growing need for larger memories that are reliable and fast. New technologies to implement non-volatile memories that are large, fast, compact, and cost-efficient are being studied extensively. One of the most promising technologies being developed is resistive RAM (ReRAM). In ReRAM, the resistance of the device varies with the voltage applied across it. The programmable metallization cell (PMC) is one of the devices belonging to this category of non-volatile memories.

To advance the development of these devices, simulation models are needed that replicate their behavior in circuits. In this thesis, a Verilog-A model for the PMC has been developed. The behavior of the model has been tested using DC and transient simulations. Experimental data obtained from testing PMC devices fabricated at Arizona State University have been compared to results obtained from simulation.

A basic memory cell known as the 1T1R cell, built using the PMC, has also been simulated and verified. These memory cells have the potential to be building blocks of large-scale memories. I believe that the Verilog-A model developed in this thesis will prove to be a powerful tool for researchers and circuit developers looking to develop non-volatile memories using alternative technologies.
Contributors: Bharadwaj, Vineeth (Author) / Barnaby, Hugh (Thesis advisor) / Kozicki, Michael (Committee member) / Mikkola, Esko (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Nonvolatile memory (NVM) technologies have been an integral part of electronic systems for the past 30 years. The ideal non-volatile memory has minimal physical size, energy usage, and cost while having maximal speed, capacity, retention time, and radiation hardness. A promising candidate for next-generation memory is ion-conducting bridging RAM, variously referred to as the programmable metallization cell (PMC), conductive bridge RAM (CBRAM), or electrochemical metallization memory (ECM), which is likely to surpass flash memory in all the ideal memory characteristics. A comprehensive physics-based model is needed to completely understand PMC operation and assist in design optimization.

To advance the PMC modeling effort, this thesis presents a precise physical model that parameterizes the materials associated with both the ion-rich and ion-poor layers of the PMC's solid electrolyte, so that it captures the static electrical behavior of the PMC in both its low-resistance on-state (LRS) and high-resistance off-state (HRS). The experimental data are measured from a chalcogenide glass PMC designed and manufactured at ASU. The static on- and off-state resistance of a PMC device composed of a layered (Ag-rich/Ag-poor) Ge30Se70 ChG film is characterized and modeled using three-dimensional simulation code written in Silvaco Atlas finite element analysis software. Calibrating the model to experimental data enables the extraction of device parameters such as material bandgaps, workfunctions, density of states, carrier mobilities, dielectric constants, and affinities.

The sensitivity of the modeled PMC's HRS and LRS impedance behavior to variation in its most prominent extracted material parameters is examined.

The resulting accurate set of material parameters for both Ag-rich and Ag-poor ChG systems, together with verification of the effect of process variation on electrical characteristics, enables greater fidelity in PMC device simulation, which significantly enhances our ability to understand the underlying physics of ChG-based resistive switching memory.
Contributors: Rajabi, Saba (Author) / Barnaby, Hugh (Thesis advisor) / Kozicki, Michael (Committee member) / Vasileska, Dragica (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Android has been the dominant platform for mobile development. By the end of the second quarter of 2014, Android had captured 84.7 percent of the worldwide mobile phone market. Android uses a modified Linux kernel as part of its stack. The I/O scheduler is the part of the Linux kernel responsible for scheduling data requests to the internal and external memory devices attached to the mobile system.

The use of solid state drives in Android tablets has also risen owing to their speed of operation and mechanical stability. The I/O schedulers in the present Linux kernel are not well suited to solid state drives; in particular, they do not exploit the inherent parallelism these drives offer. Android provides information to the Linux kernel about the processes running in the foreground and background. Based on this information the kernel decides process scheduling and memory management, but no such information exists for I/O scheduling. Research shows that resource management can be done better if the operating system is aware of the characteristics of the requester. Thus, there is a need for a better I/O scheduler that schedules I/O operations based on the application and also exploits the parallelism in solid state drives. The scheduler proposed in this research does that. It contains two algorithms working in unison, one focusing on solid state drives and the other on application awareness.

The Android application context aware scheduler increases the responsiveness of time-sensitive applications and also increases throughput by scheduling requests in parallel on the solid state drive. The proposed scheduler is tested using standard benchmarks and real-time scenarios; the results show that it outperforms Android's existing default Completely Fair Queuing scheduler.
Contributors: Sivasankaran, Jeevan Prasath (Author) / Lee, Yann Hang (Thesis advisor) / Wu, Carole-Jean (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but they become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. This work presents a source-to-source translator written using CETUS that identifies a conservative superset of all the shared data in a multi-threaded application and maps it to the shared memory such that it enables execution on HSM architectures.
Contributors: Rawat, Tushar (Author) / Shrivastava, Aviral (Thesis advisor) / Dasgupta, Partha (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Programmable metallization cell (PMC) technology is based on an electrochemical phenomenon in which a metallic electrodeposit can be grown or dissolved between two electrodes depending on the voltage applied between them. Devices based on this phenomenon exhibit a unique, self-healing property, as a broken metallic structure can be healed by applying an appropriate voltage between the two broken ends. This work explores methods of fabricating interconnects and switches based on PMC technology on flexible substrates. The objective was to evaluate the feasibility of using this technology in flexible electronics applications in which reliability is a primary concern. The re-healable property of the interconnect is characterized for the silver-doped germanium selenide (Ag-Ge-Se) solid electrolyte system. This property was evaluated by measuring the resistances of the healed interconnect structures and comparing these to the resistances of the unbroken structures. The reliability of the interconnects in both unbroken and healed states is studied by investigating the resistance of the structures under DC voltages, AC voltages, and different temperatures as a function of time. This work also explores replacing silver with copper for these interconnects to enhance their reliability. A model for PMC-based switches on flexible substrates is proposed and compared to the observed device behavior with the objective of developing a formal design methodology for these devices. The switches were subjected to voltage sweeps and their resistance was investigated as a function of sweep voltage. The resistance of the switches as a function of voltage pulse magnitude when placed in series with a resistance was also investigated. A model was then developed to explain the behavior of these devices. All observations were based on statistical measurements to account for random errors.
The results of this work demonstrate that solid-electrolyte-based interconnects display self-healing capability, which depends on the applied healing voltage and the current limit. However, they fail at lower current densities than metal interconnects due to an ion-drift-induced failure mechanism. The results on the PMC-based switches demonstrate that a model comprising a Schottky diode in parallel with a variable resistor predicts the behavior of the device.
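A circuit of the kind named above, a diode in parallel with a variable resistor, can be sketched numerically to show how the two branches combine. This is only an illustration under stated assumptions, not the thesis model: the function name `switch_current`, the ideal exponential diode law, and every parameter value (saturation current, ideality factor, thermal voltage, resistances) are hypothetical placeholders.

```python
import math


def switch_current(voltage, resistance,
                   saturation_current=1e-12,   # assumed value, in amperes
                   ideality=1.0,               # assumed ideality factor
                   thermal_voltage=0.02585):   # kT/q near 300 K, in volts
    """Total current through a diode in parallel with a resistor.

    Parallel branches share the same voltage, so their currents add.
    The diode branch uses the ideal exponential law as a stand-in.
    """
    diode_branch = saturation_current * (
        math.exp(voltage / (ideality * thermal_voltage)) - 1.0
    )
    resistor_branch = voltage / resistance
    return diode_branch + resistor_branch


# Under reverse bias the diode contributes almost nothing, so the
# variable resistor sets the current: a low resistance (ON) conducts
# far more than a high resistance (OFF).
on_current = switch_current(-0.5, resistance=1e3)
off_current = switch_current(-0.5, resistance=1e9)
```

With these placeholder values the reverse-bias current is dominated by the resistor branch, which is how a variable resistor can represent the programmed state in such a model.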
Contributors: Baliga, Sunil Ravindranath (Author) / Kozicki, Michael N (Thesis advisor) / Schroder, Dieter K. (Committee member) / Chae, Junseok (Committee member) / Alford, Terry L. (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
The multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input data that have to be stored in external memories based on synchronous dynamic RAM (SDRAM). In this dissertation, first an efficient architecture to implement 2-D DFT for large-sized input data is proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than an RC decomposition based implementation under the same memory constraints, and also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board and their performance measured and analyzed.
The results show that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding the matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision values can be processed in less than 27 ms. Finally, transpose-free imaging flows for the range-Doppler algorithm (RDA) and chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture which is implemented on an FPGA platform. When clocked at 100 MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323 ms and 162 ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU.
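The traditional Row-Column decomposition mentioned in this abstract can be sketched in a few lines. The example below is a minimal software illustration of that baseline only, not the dissertation's novel decomposition or its FPGA architecture; the function names are hypothetical, and a naive O(N^2) 1-D DFT is used instead of an FFT to keep the sketch self-contained.

```python
import cmath


def dft_1d(x):
    """Naive 1-D DFT: X[u] = sum_k x[k] * exp(-2*pi*i*u*k/n)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
        for u in range(n)
    ]


def dft_2d_row_column(a):
    """2-D DFT by Row-Column decomposition: 1-D DFTs on rows, then columns."""
    rows = [dft_1d(row) for row in a]                  # transform every row
    cols = [dft_1d(list(col)) for col in zip(*rows)]   # transform every column
    return [list(r) for r in zip(*cols)]               # back to row-major order


def dft_2d_direct(a):
    """2-D DFT straight from the definition, used here only as a check."""
    m, n = len(a), len(a[0])
    return [
        [
            sum(
                a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                for r in range(m)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


image = [[1, 2, 3], [4, 5, 6]]
rc_result = dft_2d_row_column(image)
direct_result = dft_2d_direct(image)
```

The column pass reads the intermediate data along the other dimension from the row pass, which is the access pattern the abstract identifies as inefficient when large inputs live in external SDRAM.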
Contributors: Yu, Chi-Li (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Karam, Lina (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Limited Local Memory (LLM) multicore architectures are promising power-efficient architectures with a scalable memory hierarchy. In LLM multicores, each core can access only a small local memory. Accesses to a large shared global memory can only be made explicitly through Direct Memory Access (DMA) operations. The Standard Template Library (STL) is a powerful programming tool and is widely used for software development. The STL provides dynamic data structures, algorithms, and iterators for vector, deque (double-ended queue), list, map (red-black tree), etc. Since the size of the local memory is limited in the cores of the LLM architecture, and data transfer is not automatically supported by a hardware cache or the OS, the use of current STL implementations on LLM multicores is limited. Specifically, there is a hard limit on the amount of data they can handle. In this article, we propose and implement a framework that manages the STL container classes in the local memory of an LLM multicore architecture. Our proposal removes the data size limitation of the STL, and therefore improves programmability on LLM multicore architectures with little change to the original program. Our implementation results in only about a 12%-17% increase in static library code size and reasonable runtime overheads.
Contributors: Lu, Di (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)
Created: 2012