Matching Items (12)

Description

We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are not expected to scale to hundreds and thousands of cores. In addition, caches and coherence logic already take 20-50% of the total power consumption of the processor and 30-60% of the die area. Therefore, a more scalable architecture is needed for manycore architectures. Software Managed Manycore (SMM) architectures have emerged as a solution. They have a scalable memory design in which each core has direct access only to its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in the hardware makes such architectures extremely power-efficient, but it also makes them difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after its use. Doing this, however, adds a lot of complexity to the programmer's job: programmers must now worry about data management on top of the functional correctness of the program, which is already quite complex.

This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We first developed a Complete Circular Stack Management scheme, which manages stack frames between the local memory and the main memory and also addresses the stack pointer problem. Although it works, we found that the management could be further optimized for most cases, so we then developed Smart Stack Data Management (SSDM). In this work, we formulate the stack data management problem and propose a greedy algorithm to solve it. Next, we propose a general cost estimation algorithm, on which the CMSM heuristic for the code mapping problem is built. Finally, heap data is dynamic in nature and therefore hard to manage. We provide two schemes to manage an unlimited amount of heap data in a constant-sized region of the local memory. In addition to these separate schemes for the different kinds of data, we also provide a memory partitioning methodology.
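As a rough illustration of the kind of management the compiler and runtime must insert, the following C++ sketch evicts the oldest stack frames from a fixed-size local scratchpad region to main memory when a new frame does not fit. It assumes a fixed scratchpad budget and uses std::memcpy as a stand-in for a DMA transfer; the function names, eviction policy, and frame placement are illustrative, not the Complete Circular Stack Management or SSDM schemes developed in the dissertation.

```cpp
// Minimal sketch of software-managed stack frames on an SMM core.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <iostream>
#include <vector>

constexpr std::size_t kLocalStackBytes = 256;        // assumed scratchpad budget
static std::uint8_t local_stack[kLocalStackBytes];   // stack region in local memory
static std::vector<std::uint8_t> main_memory;        // backing store in main memory

struct Frame { std::size_t offset; std::size_t size; };
static std::deque<Frame> frames;                     // oldest frame at the front
static std::size_t local_used = 0;

// Stand-in for a DMA transfer of one frame from local memory to main memory.
static void dma_put(const Frame& f) {
    const std::size_t old = main_memory.size();
    main_memory.resize(old + f.size);
    std::memcpy(main_memory.data() + old, local_stack + f.offset, f.size);
}

// Inserted before a call to a function that needs `size` bytes of stack.
void stack_check_in(std::size_t size) {
    while (local_used + size > kLocalStackBytes && !frames.empty()) {
        const Frame oldest = frames.front();         // evict oldest frames first
        dma_put(oldest);
        local_used -= oldest.size;
        frames.pop_front();
    }
    frames.push_back({0 /* simplified placement */, size});
    local_used += size;
}

// Inserted after the function returns; restoring an evicted frame
// (a dma_get in the other direction) is omitted for brevity.
void stack_check_out() {
    if (!frames.empty()) {
        local_used -= frames.back().size;
        frames.pop_back();
    }
}

int main() {
    stack_check_in(128);   // frame for f()
    stack_check_in(200);   // frame for g(): forces eviction of f()'s frame
    std::cout << "bytes resident in local stack: " << local_used << "\n";
    stack_check_out();     // g() returns
    stack_check_out();     // f() returns (its frame was evicted earlier)
}
```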
Contributors: Bai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

There is an ever-growing need for larger memories that are reliable and fast. New technologies to implement non-volatile memories that are large, fast, compact, and cost-efficient are being studied extensively. One of the most promising technologies being developed is resistive RAM (ReRAM). In ReRAM the resistance of the device varies with the voltage applied across it. The programmable metallization cell (PMC) is one of the devices belonging to this category of non-volatile memories.

In order to advance the development of these devices, there is a need for simulation models that replicate the behavior of these devices in circuits. In this thesis, a Verilog-A model for the PMC has been developed. The behavior of the model has been tested using DC and transient simulations. Experimental data obtained from testing PMC devices fabricated at Arizona State University have been compared to results obtained from simulation.

A basic memory cell known as the 1T-1R cell, built using the PMC, has also been simulated and verified. These memory cells have the potential to be building blocks of large-scale memories. I believe that the Verilog-A model developed in this thesis will prove to be a powerful tool for researchers and circuit developers looking to develop non-volatile memories using alternative technologies.
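To give a flavor of what such a behavioral compact model integrates over time, the following C++ sketch steps a generic voltage-controlled resistance state equation (a linear-drift, memristor-style model) through a programming pulse. It is written in C++ rather than Verilog-A purely for illustration, and the state equation and every parameter value are assumptions, not the calibrated PMC model developed in the thesis.

```cpp
// Generic voltage-controlled resistive-state model, integrated explicitly.
#include <algorithm>
#include <iostream>

int main() {
    const double Ron  = 1e3;    // assumed low-resistance (on) value, ohms
    const double Roff = 1e6;    // assumed high-resistance (off) value, ohms
    const double k    = 5e3;    // assumed state-growth coefficient, 1/(V*s)
    const double dt   = 1e-6;   // integration time step, s

    double x = 0.0;             // normalized filament state in [0, 1]
    // Apply a 1 V programming pulse for 1 ms and report the final resistance.
    for (double t = 0.0; t < 1e-3; t += dt) {
        const double v = 1.0;                       // applied voltage
        x = std::clamp(x + k * v * dt, 0.0, 1.0);   // state update
    }
    const double R = Ron * x + Roff * (1.0 - x);    // mix of on/off resistance
    std::cout << "resistance after pulse: " << R << " ohms\n";
}
```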
Contributors: Bharadwaj, Vineeth (Author) / Barnaby, Hugh (Thesis advisor) / Kozicki, Michael (Committee member) / Mikkola, Esko (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

Nonvolatile memory (NVM) technologies have been an integral part of electronic systems for the past 30 years. The ideal non-volatile memory has minimal physical size, energy usage, and cost while having maximal speed, capacity, retention time, and radiation hardness. A promising candidate for next-generation memory is ion-conducting bridging RAM, referred to as the programmable metallization cell (PMC), conductive bridge RAM (CBRAM), or electrochemical metallization memory (ECM), which is likely to surpass flash memory in all of these ideal memory characteristics. A comprehensive physics-based model is needed to completely understand PMC operation and to assist in design optimization.

To advance the PMC modeling effort, this thesis presents a precise physical model parameterizing the materials associated with both the ion-rich and ion-poor layers of the PMC's solid electrolyte, so that it captures the static electrical behavior of the PMC in both its low-resistance on-state (LRS) and high-resistance off-state (HRS). The experimental data are measured from a chalcogenide glass (ChG) PMC designed and manufactured at ASU. The static on- and off-state resistance of a PMC device composed of a layered (Ag-rich/Ag-poor) Ge30Se70 ChG film is characterized and modeled using three-dimensional simulation code written in the Silvaco Atlas finite element analysis software. Calibrating the model to experimental data enables the extraction of device parameters such as material bandgaps, workfunctions, densities of states, carrier mobilities, dielectric constants, and affinities.

The sensitivity of the modeled PMC's HRS and LRS impedance behavior to variation in the extracted material parameters is also examined.

The accurate set of material parameters obtained for both the Ag-rich and Ag-poor ChG systems, together with the verification of process-variation effects on the electrical characteristics, enables greater fidelity in PMC device simulation and significantly enhances our ability to understand the underlying physics of ChG-based resistive switching memory.
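The calibration idea itself can be sketched independently of any particular TCAD tool: sweep a candidate device parameter, simulate the response, and keep the value that best matches the measured data in a least-squares sense. The toy ohmic model, the data points, and the single swept parameter below are illustrative assumptions; the thesis performs this fitting against full 3-D Silvaco Atlas simulations of the layered device.

```cpp
// Toy one-parameter calibration loop: fit an on-state resistance to data.
#include <array>
#include <iostream>
#include <limits>
#include <utility>

int main() {
    // Hypothetical measured (voltage, current) pairs for an on-state device.
    const std::array<std::pair<double, double>, 3> measured = {{
        {0.05, 4.9e-5}, {0.10, 1.02e-4}, {0.15, 1.48e-4}}};

    double best_R = 0.0, best_err = std::numeric_limits<double>::max();
    for (double R = 500.0; R <= 2000.0; R += 10.0) {    // candidate parameter
        double err = 0.0;
        for (const auto& [v, i_meas] : measured) {
            const double i_sim = v / R;                 // toy ohmic "simulation"
            err += (i_sim - i_meas) * (i_sim - i_meas); // squared error
        }
        if (err < best_err) { best_err = err; best_R = R; }
    }
    std::cout << "extracted on-state resistance ~ " << best_R << " ohms\n";
}
```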
Contributors: Rajabi, Saba (Author) / Barnaby, Hugh (Thesis advisor) / Kozicki, Michael (Committee member) / Vasileska, Dragica (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

Android has been the dominant platform on which most mobile development is done. By the end of the second quarter of 2014, Android had captured 84.7 percent of the worldwide mobile phone market. Android internally uses a modified Linux kernel as part of its stack. The I/O scheduler, a part of the Linux kernel, is responsible for scheduling data requests to the internal and external storage devices attached to mobile systems.

The use of solid state drives in Android tablets has also risen owing to their speed of operation and mechanical stability. The I/O schedulers in the present Linux kernel are not well suited to solid state drives; in particular, they do not exploit the inherent parallelism these drives offer. Android provides information to the Linux kernel about which processes are running in the foreground and which in the background. Based on this information the kernel makes process scheduling and memory management decisions, but no such information is used for I/O scheduling. Research shows that resource management can be done better if the operating system is aware of the characteristics of the requester. Thus, there is a need for a better I/O scheduler that schedules I/O operations based on the application and also exploits the parallelism in solid state drives. The scheduler proposed through this research does exactly that: it contains two algorithms working in unison, one focusing on the solid state drive and the other on application awareness.

The Android application-context-aware scheduler increases the responsiveness of time-sensitive applications and also increases throughput through parallel scheduling of requests on the solid state drive. The proposed scheduler is tested using standard benchmarks and real-world scenarios; the results show that it outperforms Android's existing default Completely Fair Queuing (CFQ) scheduler.
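The two ideas the abstract combines can be sketched as follows: prefer requests from foreground (time-sensitive) applications, and dispatch several requests at once so the SSD's internal parallelism is used. The request format, the queue depth, and the policy details are illustrative assumptions, not the actual kernel implementation described in the thesis.

```cpp
// Toy application-context-aware dispatcher with parallel SSD dispatch.
#include <iostream>
#include <queue>
#include <string>
#include <vector>

struct IoRequest {
    std::string app;
    bool foreground;     // context hint Android could pass down to the kernel
    long sector;
};

struct ByPriority {
    bool operator()(const IoRequest& a, const IoRequest& b) const {
        return a.foreground < b.foreground;   // foreground requests come first
    }
};

int main() {
    std::priority_queue<IoRequest, std::vector<IoRequest>, ByPriority> pending;
    pending.push({"background_sync", false, 100});
    pending.push({"web_browser", true, 4096});
    pending.push({"media_player", true, 8192});

    const std::size_t ssd_queue_depth = 2;    // assumed parallel dispatch width
    while (!pending.empty()) {
        std::vector<IoRequest> batch;
        while (!pending.empty() && batch.size() < ssd_queue_depth) {
            batch.push_back(pending.top());   // pull in foreground-first order
            pending.pop();
        }
        std::cout << "dispatching in parallel:";
        for (const auto& r : batch) std::cout << ' ' << r.app;
        std::cout << '\n';
    }
}
```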
Contributors: Sivasankaran, Jeevan Prasath (Author) / Lee, Yann Hang (Thesis advisor) / Wu, Carole-Jean (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non-Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but they become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. This work presents a source-to-source translator, written using CETUS, that identifies a conservative superset of all the shared data in a multi-threaded application and maps it to the shared memory in a way that enables execution on HSM architectures.
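A conservative shared-data rule of the kind such a translator applies can be sketched very simply: treat a variable as shared if it is global, has its address taken, or is accessed by more than one thread. The toy access records below are illustrative; the real analysis is a source-to-source compiler pass over the program's AST in CETUS.

```cpp
// Toy conservative identification of shared data from access records.
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Access { std::string var; int thread_id; bool is_global; bool address_taken; };

int main() {
    const std::vector<Access> accesses = {
        {"counter", 0, true,  false},   // global -> shared
        {"buf",     0, false, true },   // address escapes -> conservatively shared
        {"buf",     1, false, true },
        {"tmp",     1, false, false}};  // thread-private

    std::map<std::string, std::set<int>> threads_touching;
    std::set<std::string> shared;
    for (const auto& a : accesses) {
        threads_touching[a.var].insert(a.thread_id);
        if (a.is_global || a.address_taken) shared.insert(a.var);
    }
    for (const auto& [var, tids] : threads_touching)
        if (tids.size() > 1) shared.insert(var);    // touched by multiple threads

    for (const auto& v : shared)
        std::cout << v << " -> map to shared memory\n";
}
```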
Contributors: Rawat, Tushar (Author) / Shrivastava, Aviral (Thesis advisor) / Dasgupta, Partha (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created: 2014
Description

Limited Local Memory (LLM) multicore architectures are promising power-efficient architectures with a scalable memory hierarchy. In LLM multicores, each core can access only a small local memory; accesses to a large shared global memory can only be made explicitly through Direct Memory Access (DMA) operations. The Standard Template Library (STL) is a powerful programming tool and is widely used for software development. The STL provides dynamic data structures, algorithms, and iterators for vector, deque (double-ended queue), list, map (red-black tree), etc. Since the size of the local memory in the cores of the LLM architecture is limited, and data transfer is not automatically supported by a hardware cache or the OS, the usage of current STL implementations on LLM multicores is limited. Specifically, there is a hard limitation on the amount of data they can handle. In this work, we propose and implement a framework that manages the STL container classes in the local memory of the LLM multicore architecture. Our proposal removes the data size limitation of the STL and therefore improves programmability on LLM multicore architectures with little change to the original program. Our implementation results in only about a 12%-17% increase in static library code size and reasonable runtime overheads.
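The idea can be sketched for a single container: a vector-like class that keeps only a fixed-size block of elements in local memory and pages other blocks in from global memory on demand. std::memcpy stands in for the DMA transfer, and the block size, the global backing store, the single-resident-block policy, and the omitted write-back are all illustrative assumptions, not the framework implemented in the thesis.

```cpp
// Toy software-paged vector: one block of elements resident at a time.
#include <algorithm>
#include <cstring>
#include <iostream>
#include <vector>

template <typename T, std::size_t BlockElems = 4>
class PagedVector {
public:
    void push_back(const T& v) { global_.push_back(v); }   // stored "globally"

    // Read access; write-back of dirty blocks is omitted for brevity.
    T& operator[](std::size_t i) {
        const std::size_t block = i / BlockElems;
        if (block != resident_block_) {                     // miss: page block in
            const std::size_t begin = block * BlockElems;
            const std::size_t n = std::min(BlockElems, global_.size() - begin);
            std::memcpy(local_, global_.data() + begin, n * sizeof(T));  // "DMA"
            resident_block_ = block;
        }
        return local_[i % BlockElems];
    }

private:
    std::vector<T> global_;                  // backing store in main memory
    T local_[BlockElems];                    // block held in the local scratchpad
    std::size_t resident_block_ = static_cast<std::size_t>(-1);
};

int main() {
    PagedVector<int> v;
    for (int i = 0; i < 10; ++i) v.push_back(i * i);
    std::cout << v[1] << ' ' << v[9] << '\n';   // prints 1 81, paging two blocks
}
```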
Contributors: Lu, Di (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)
Created: 2012
Description

In recent years, we have observed the prevalence of stream applications in many embedded domains. Stream programs distinguish themselves from traditional sequential programming languages through well-defined independent actors, explicit data communication, and stable code/data access patterns. In order to achieve high performance and low power, scratchpad memory (SPM) has been introduced in today's embedded multicore processors. Current design frameworks for developing stream applications on SPM-enhanced embedded architectures typically do not include a compiler that can perform automatic partitioning, mapping, and scheduling under limited on-chip SPM capacities and memory access delays. Consequently, many designs are implemented manually, which leads to lengthy design efforts and inferior designs. In this work, optimization techniques that automatically compile stream programs onto embedded multicore architectures are proposed. As an initial case study, we implemented an automatic target recognition (ATR) algorithm on the IBM Cell Broadband Engine (BE). Then, integer linear programming (ILP) and heuristic approaches were proposed to schedule stream programs on a single-core embedded processor that has an SPM with code overlay. Later, ILP and heuristic approaches for Compiling Stream programs on SPM-enhanced Multicore Processors (CSMP) were studied. The proposed CSMP ILP and heuristic approaches do not optimize for cycles in stream applications. Further, the number of software pipeline stages in the implementation depends on the actor-to-processing-engine (PE) mapping and is uncontrollable. We next presented a Retiming technique for Throughput optimization on Embedded Multi-core processors (RTEM). The RTEM approach inherently handles cycles and can accept an upper bound on the number of software pipeline stages to be generated. We further enhanced RTEM by incorporating unrolling (URSTEM), which preserves all the beneficial properties of the RTEM heuristic and also scales with the number of PEs through unrolling.
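A basic building block such heuristics start from can be sketched as a greedy mapping step: assign each stream actor to the processing engine (PE) with the least accumulated work, subject to a per-PE SPM budget. The actor workloads, code sizes, and tie-breaking policy are illustrative assumptions; the CSMP, RTEM, and URSTEM techniques in the thesis additionally handle scheduling, code overlays, retiming, and unrolling.

```cpp
// Toy greedy actor-to-PE mapping under an SPM capacity constraint.
#include <array>
#include <iostream>
#include <vector>

struct Actor { const char* name; int work; int code_bytes; };

int main() {
    const std::vector<Actor> actors = {
        {"src", 10, 2000}, {"fir", 40, 6000}, {"fft", 70, 9000}, {"sink", 5, 1000}};
    constexpr int num_pes = 2;
    constexpr int spm_budget = 12000;                 // assumed SPM bytes per PE

    std::array<int, num_pes> load{}, spm_used{};
    for (const auto& a : actors) {
        int best = -1;
        for (int pe = 0; pe < num_pes; ++pe) {        // least-loaded feasible PE
            if (spm_used[pe] + a.code_bytes > spm_budget) continue;
            if (best < 0 || load[pe] < load[best]) best = pe;
        }
        if (best < 0) { std::cout << a.name << ": no feasible PE\n"; continue; }
        load[best] += a.work;
        spm_used[best] += a.code_bytes;
        std::cout << a.name << " -> PE" << best << '\n';
    }
}
```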
Contributors: Che, Weijia (Author) / Chatha, Karam Singh (Thesis advisor) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2012
Description

In competitive Taekwondo, Electronic Body Protectors (EBPs) are used to register hits made by players during sparring. EBPs comprise three main components: a chest guard, a foot sock, and headgear. These components interact with one another through magnets, electric sensors, transmitters, and a receiver. The receiver is connected to a computer running software that processes the signals from the transmitter and determines whether or not a competitor scored a point. The current design of EBPs, however, has numerous shortcomings, including sensing false positives, failing to register hits, costing too much, and relying on human judgment. This thesis thoroughly delineates the operation of the EBPs currently used and discusses research performed to eliminate these weaknesses.
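The kind of decision the receiver-side software performs can be sketched as a simple check: a hit counts only if the impact reading exceeds a force threshold while the magnetic proximity signal confirms contact from a valid scoring surface. The threshold, field names, and scoring rule below are illustrative assumptions, not the logic of any commercial EBP system or of the design studied in the thesis.

```cpp
// Toy hit-registration check combining impact and magnetic-contact signals.
#include <iostream>

struct SensorPacket {
    double impact_force;     // reading from the chest-guard electric sensor
    bool   magnet_detected;  // foot-sock magnet sensed near the target area
};

bool registers_point(const SensorPacket& p, double force_threshold = 25.0) {
    return p.magnet_detected && p.impact_force >= force_threshold;
}

int main() {
    std::cout << registers_point({30.0, true})  << '\n';   // 1: valid scoring hit
    std::cout << registers_point({30.0, false}) << '\n';   // 0: no valid contact
    std::cout << registers_point({10.0, true})  << '\n';   // 0: impact too weak
}
```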
Contributors: Spell, Valerie Anne (Author) / Kozicki, Michael (Thesis director) / Kitchen, Jennifer (Committee member) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created: 2016-05
Description

Non-volatile memory (NVM) has become a staple in the everyday life of consumers. NVM manifests inside cell phones, laptops, and, most recently, wearable tech such as smart watches. NAND Flash has been an excellent solution for conditions requiring fast, compact NVM. Current technology nodes are nearing the physical limits of scaling, preventing flash from improving further. To combat the limitations of flash and to satisfy consumer demand for progressively faster and denser NVM, new technologies are needed. One possible candidate for the replacement of NAND Flash is the programmable metallization cell (PMC). PMCs are a type of resistive memory, meaning that they do not rely on charge storage to maintain a logic state. Depending on their application, it is possible that devices containing NVM will be exposed to harsh radiation environments. As part of the process of developing a novel memory technology, it is important to characterize the effects irradiation has on the functionality of the devices.

This thesis characterizes the effects that ionizing γ-ray irradiation has on the retention of the programmed resistive state of a PMC. The PMC devices tested used Ge30Se70 doped with Ag as the solid electrolyte layer and were fabricated by the thesis author in a Class 100 clean room. Individual device tiles were wire bonded into ceramic packages and tested in both biased and floating-contact scenarios.

The first scenario presented shows that PMC devices are capable of retaining their programmed state up to the maximum exposed total ionizing dose (TID) of 3.1 Mrad(Si). In this first scenario, the contacts of the PMC devices were left floating during exposure. The second scenario shows that the PMC devices are capable of retaining their state until the maximum TID of 10.1 Mrad(Si) was reached. The contacts in the second scenario were biased, with a 50 mV read voltage applied to the anode contact. Analysis of the results shows that Ge30Se70 PMCs are tolerant of ionizing radiation and can retain a programmed state to a higher TID than NAND Flash memory.
Contributors: Taggart, Jennifer Lynn (Author) / Barnaby, Hugh (Thesis advisor) / Kozicki, Michael (Committee member) / Holbert, Keith E. (Committee member) / Arizona State University (Publisher)
Created: 2015
Description

The availability of a wide range of general purpose as well as accelerator cores on modern smartphones means that a significant number of applications can be executed on a smartphone simultaneously, resulting in an ever-increasing demand on the memory subsystem. While the increased computation capability is intended to improve user experience, memory requests from each concurrent application exhibit unique memory access patterns as well as specific timing constraints. If not considered, this can lead to significant memory contention and a degraded user experience.

This work first analyzes the impact of memory degradation caused by interference at the memory system for a broad range of commonly used smartphone applications. The real-system characterization results show that smartphone applications, such as web browsing and media playback, suffer significant performance degradation. This is caused by shared resource contention at the application processor's last-level cache, the communication fabric, and the main memory.

Based on the detailed characterization results, the rest of this thesis focuses on the design of an effective memory interference mitigation technique. Since web browsing is one of the most commonly used smartphone applications and is representative of many HTML-based smartphone applications, my thesis focuses on meeting the performance requirement of a web browser on a smartphone in the presence of background processes and co-scheduled applications. My thesis proposes a lightweight user-space frequency governor that mitigates the degradation caused by interfering applications by predicting the performance and power consumption of web browsing. The governor periodically selects an optimal energy-efficient frequency setting by combining the statically trained performance and power models with dynamically varying architecture and system conditions, such as the memory access intensity of background processes and/or co-scheduled applications and the temperature of the cores. The governor has been extensively evaluated on a Nexus 5 smartphone over a diverse range of mobile workloads. By operating at the most energy-efficient frequency setting in the presence of interference, energy efficiency is improved by as much as 35%, and by 18% on average, compared to the existing interactive governor, while maintaining satisfactory web page loading performance of under 3 seconds.
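The governor's periodic decision can be sketched as follows: use simple performance and power models, evaluated with the current interference level, to pick the frequency that minimizes predicted energy while keeping predicted page-load time under the 3-second target. The linear model coefficients, frequency list, and interference metric below are illustrative assumptions, not the models trained in the thesis.

```cpp
// Toy frequency-governor decision step driven by predicted time and power.
#include <iostream>
#include <vector>

struct Prediction { double load_time_s; double power_w; };

// Toy models: load time falls with frequency and rises with memory
// interference from co-runners; power grows superlinearly with frequency.
Prediction predict(double freq_ghz, double interference) {
    const double load_time = 1.2 / freq_ghz + 0.8 * interference;
    const double power = 0.3 + 0.9 * freq_ghz * freq_ghz;
    return {load_time, power};
}

int main() {
    const std::vector<double> freqs_ghz = {0.7, 1.2, 1.7, 2.3};
    const double interference = 0.9;        // measured intensity of co-runners
    const double deadline_s = 3.0;          // satisfactory page-load target

    double best_freq = freqs_ghz.back(), best_energy = 1e30;
    for (double f : freqs_ghz) {
        const Prediction p = predict(f, interference);
        if (p.load_time_s > deadline_s) continue;          // violates QoS target
        const double energy = p.power_w * p.load_time_s;   // energy = power * time
        if (energy < best_energy) { best_energy = energy; best_freq = f; }
    }
    std::cout << "selected frequency: " << best_freq << " GHz\n";
}
```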
Contributors: Shingari, Davesh (Author) / Wu, Carole-Jean (Thesis advisor) / Vrudhula, Sarma (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2016