Description
Rapid technology scaling, the main driver of the power and performance improvements of computing solutions, has also rendered our computing systems extremely susceptible to transient errors called soft errors. Among the arsenal of techniques to protect computation from soft errors, Control Flow Checking (CFC) based techniques have gained a reputation as effective yet low-cost protection mechanisms. The basic idea is that a soft fault in program execution will, with high probability, eventually alter the control flow of the program; therefore, just by ensuring that the control flow is correct, significant protection can be achieved. More than a dozen CFC techniques have been developed over the last several decades, spanning hardware, software, and hardware-software hybrid approaches. Our analysis shows that existing CFC techniques are not only ineffective in protecting from soft errors, but also cause additional power and performance overheads. For this analysis, we develop and validate a simulation-based experimental setup to accurately and quantitatively estimate the architectural vulnerability of a program execution on a processor micro-architecture. We model the protection achieved by various state-of-the-art CFC techniques in this quantitative vulnerability estimation setup, and find that software-only CFC protection schemes (CFCSS, CFCSS+NA, CEDA) increase system vulnerability by 18% to 21% with 17% to 38% performance overhead. Hybrid CFC protection (CFEDC) increases vulnerability by 5%, while vulnerability remains almost unchanged for hardware-only CFC protection (CFCET), notwithstanding the design cost, area, and power overheads incurred by the required hardware modifications.
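As a rough illustration of the signature-monitoring idea underlying software CFC schemes such as CFCSS, the sketch below assigns each basic block a compile-time signature and XOR-updates a runtime signature at each block entry; the block graph, signature values, and class name are hypothetical, and this is a simplified caricature rather than the published algorithm:

```python
class SignatureMonitor:
    """Simplified CFCSS-style check: a runtime signature G is XOR-updated
    at each basic-block entry and compared with the block's static signature."""

    def __init__(self, signatures, designated_pred):
        # signatures: block -> compile-time signature s_b
        # designated_pred: block -> its expected predecessor block
        self.sig = signatures
        # d_b = s_pred(b) XOR s_b, computed once at compile time
        self.diff = {b: signatures[p] ^ signatures[b]
                     for b, p in designated_pred.items()}

    def run(self, path):
        """Follow an executed block sequence; a jump from a block that is
        not the designated predecessor leaves G != s_b and is flagged."""
        G = self.sig[path[0]]          # initialize at the entry block
        for b in path[1:]:
            G ^= self.diff[b]          # runtime update at block entry
            if G != self.sig[b]:       # mismatch -> control-flow error
                return False
        return True

# Hypothetical control-flow graph A -> B -> C
sigs = {"A": 0b0001, "B": 0b0010, "C": 0b0100}
preds = {"B": "A", "C": "B"}
mon = SignatureMonitor(sigs, preds)
print(mon.run(["A", "B", "C"]))   # True: legal path
print(mon.run(["A", "C"]))        # False: A is not C's designated predecessor
```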
Contributors: Rhisheekesan, Abhishek (Author) / Shrivastava, Aviral (Thesis advisor) / Colbourn, Charles Joseph (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
We are expecting hundreds of cores per chip in the near future. However, scaling the memory architecture in manycore architectures becomes a major challenge. Cache coherence provides a single image of memory at any time in execution to all the cores, yet coherent cache architectures are not expected to scale to hundreds and thousands of cores. In addition, caches and coherence logic already account for 20-50% of the total power consumption of the processor and 30-60% of the die area. Therefore, a more scalable architecture is needed for manycore systems. Software Managed Manycore (SMM) architectures have emerged as a solution. They have a scalable memory design in which each core has direct access only to its local scratchpad memory, and any data transfers to/from other memories must be done explicitly in the application using Direct Memory Access (DMA) commands. The lack of automatic memory management in hardware makes such architectures extremely power-efficient, but it also makes them difficult to program. If the code/data of the task mapped onto a core cannot fit in the local scratchpad memory, then DMA calls must be added to bring in the code/data before it is required, and it may need to be evicted after use. Doing this, however, adds considerable complexity to the programmer's job: programmers must now worry about data management on top of the functional correctness of the program, which is already quite complex. This dissertation presents a comprehensive compiler and runtime integration to automatically manage the code and data of each task in the limited local memory of the core. We first developed a complete circular stack management scheme, which manages stack frames between the local memory and the main memory and also addresses the stack-pointer problem. Though it works, we found we could further optimize the management for most cases, and therefore developed a Smart Stack Data Management (SSDM) scheme.
In this work, we formulate the stack data management problem and propose a greedy algorithm to solve it. Later, we propose a general cost estimation algorithm, based on which the CMSM heuristic for the code mapping problem is developed. Finally, heap data is dynamic in nature and therefore hard to manage. We provide two schemes to manage an unlimited amount of heap data in a constant-sized region of the local memory. In addition to these separate schemes for different kinds of data, we also provide a memory partitioning methodology.
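The circular stack management idea can be caricatured in a few lines: frames live in a small scratchpad region, the oldest frames are spilled to main memory via DMA when space runs out, and an evicted caller frame is fetched back on return. The sizes, names, and `StackManager` API below are invented for illustration; the actual compiler/runtime integration is far more involved:

```python
SPM_SIZE = 64  # bytes of scratchpad reserved for stack frames (hypothetical)

class StackManager:
    """Toy model of circular stack management between a small scratchpad
    and main memory; 'DMA' transfers are simulated by list moves."""

    def __init__(self, spm_size=SPM_SIZE):
        self.spm_size = spm_size
        self.resident = []   # frames currently in the scratchpad: (name, size)
        self.evicted = []    # frames spilled to main memory, oldest first

    def used(self):
        return sum(size for _, size in self.resident)

    def call(self, name, size):
        """On function entry: evict oldest frames (DMA out) until the
        new frame fits, then place it in the scratchpad."""
        while self.resident and self.used() + size > self.spm_size:
            self.evicted.append(self.resident.pop(0))   # DMA to main memory
        self.resident.append((name, size))

    def ret(self):
        """On return: pop the frame; if the caller's frame was evicted,
        bring the most recently spilled frame back (DMA in)."""
        self.resident.pop()
        if not self.resident and self.evicted:
            self.resident.append(self.evicted.pop())    # DMA from main memory

m = StackManager()
m.call("main", 32)
m.call("f", 24)
m.call("g", 24)                      # 32+24+24 > 64 -> "main" is spilled
print([n for n, _ in m.resident])    # ['f', 'g']
m.ret(); m.ret()                     # returning restores the spilled caller
print([n for n, _ in m.resident])    # ['main']
```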
Contributors: Bai, Ke (Author) / Shrivastava, Aviral (Thesis advisor) / Chatha, Karamvir (Committee member) / Xue, Guoliang (Committee member) / Chakrabarti, Chaitali (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
In recent years, we have observed the prevalence of stream applications in many embedded domains. Stream programs distinguish themselves from traditional sequential programming languages through well-defined independent actors, explicit data communication, and stable code/data access patterns. In order to achieve high performance and low power, scratchpad memory (SPM) has been introduced in today's embedded multicore processors. Current design frameworks for developing stream applications on SPM-enhanced embedded architectures typically do not include a compiler that can perform automatic partitioning, mapping, and scheduling under limited on-chip SPM capacities and memory access delays. Consequently, many designs are implemented manually, which leads to lengthy development times and inferior designs. In this work, optimization techniques that automatically compile stream programs onto embedded multicore architectures are proposed. As an initial case study, we implemented an automatic target recognition (ATR) algorithm on the IBM Cell Broadband Engine (BE). Then, integer linear programming (ILP) and heuristic approaches were proposed to schedule stream programs on a single-core embedded processor that has an SPM with code overlay. Later, ILP and heuristic approaches for Compiling Stream programs on SPM-enhanced Multicore Processors (CSMP) were studied. The proposed CSMP ILP and heuristic approaches do not optimize for cycles in stream applications, and the number of software pipeline stages in the implementation depends on the actor-to-processing-engine (PE) mapping and cannot be controlled. We next present a Retiming technique for Throughput optimization on Embedded Multi-core processors (RTEM). The RTEM approach inherently handles cycles and can accept an upper bound on the number of software pipeline stages to be generated.
We further enhanced RTEM by incorporating unrolling (URSTEM), which preserves the beneficial properties of the RTEM heuristic while scaling with the number of PEs through unrolling.
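To give a flavor of the mapping step, here is a hedged sketch of a greedy actor-to-PE load-balancing heuristic (longest-processing-time rule): the maximum PE load bounds the steady-state iteration interval, so balancing loads raises throughput. It is a stand-in for the ILP/heuristic formulations above, not the CSMP or RTEM algorithm, and the stream graph and costs are made up:

```python
import heapq

def map_actors(workloads, num_pes):
    """Greedily assign each actor to the least-loaded PE, heaviest actors
    first (LPT rule). Returns (max PE load, per-PE assignments)."""
    heap = [(0, pe_id, []) for pe_id in range(num_pes)]   # (load, id, actors)
    heapq.heapify(heap)
    for actor, cost in sorted(workloads.items(), key=lambda kv: -kv[1]):
        load, pe_id, actors = heapq.heappop(heap)         # least-loaded PE
        heapq.heappush(heap, (load + cost, pe_id, actors + [actor]))
    return max(load for load, _, _ in heap), heap

# Hypothetical stream graph: actor -> per-iteration cost (cycles)
work = {"src": 3, "fir": 8, "fft": 10, "eq": 6, "sink": 2}
interval, mapping = map_actors(work, 2)
print(interval)   # 15: the busiest PE's load (total work is 29)
```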
Contributors: Che, Weijia (Author) / Chatha, Karam Singh (Thesis advisor) / Vrudhula, Sarma (Committee member) / Chakrabarti, Chaitali (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Pairing solar PV with energy storage technologies makes it possible to shift the photovoltaic (PV) power curve and make the energy accessible during peak hours. A prototype hybrid air conditioning system (HACS), built under the supervision of project head Patrick Phelan, consists of PV modules running a DC compressor that operates a conventional HVAC system paired with a second evaporator submerged within a thermal storage tank. The thermal storage is a 0.284 m3 (75 gallon) freezer filled with Cryogel balls submerged in a weak glycol solution, and is paired with its own separate air handler circulating the glycol solution. The refrigerant flow is controlled by solenoid valves that are electrically connected to high- and low-temperature thermostats. During daylight hours, the PV modules run the DC compressor. The refrigerant flow is directed to the conventional HVAC air handler when cooling is needed. Once the desired room temperature is met, refrigerant flow is diverted to the thermal storage, storing excess PV power. During peak demand hours, the system uses only a small amount of grid power to pump the glycol solution through the air handler (the compressor is off), yielding cost and energy savings. The conventional HVAC unit can also be scaled down, since during periods of large cooling demand the glycol air handler can be operated in parallel with it. Four major test scenarios were drawn up to characterize the performance of the HACS. Upon initial running of the system, ice was produced and the thermal storage was charged. A simple test consisting of discharging the thermal storage, initially ~¼ frozen, was then performed: the glycol air handler ran for 6 hours and the initial cooling power was 4.5 kW. This test was significant, since greater than 3.5 kW of cooling power was delivered for 3 hours, demonstrating the concept of energy storage and recovery.
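A back-of-the-envelope check of the discharge test can be made from the reported figures; note the 334 kJ/kg latent heat below assumes plain water ice, which only approximates the Cryogel/glycol storage medium:

```python
# Back-of-the-envelope check of the HACS discharge test.
LATENT_FUSION = 334e3          # J/kg, latent heat of water ice (assumption)

power_w = 3.5e3                # sustained cooling power reported, W
duration_s = 3 * 3600          # 3-hour sustained window, s

energy_j = power_w * duration_s              # cooling energy delivered
ice_equiv_kg = energy_j / LATENT_FUSION      # water-ice equivalent melted

print(round(energy_j / 3.6e6, 1))   # 10.5 kWh delivered over the window
print(round(ice_equiv_kg))          # ~113 kg of water-ice equivalent
```

The remainder of the 6-hour run, plus sensible cooling of the cold glycol, accounts for the rest of the recovered energy.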
Contributors: Peyton-Levine, Tobin (Author) / Phelan, Patrick (Thesis advisor) / Trimble, Steve (Committee member) / Wang, Robert (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Portable devices rely on battery systems that contribute substantially to the overall device form factor and limit portability due to recharging. Membraneless microfluidic fuel cells are considered the next generation of portable power sources because of their compatibility with higher-energy-density reactants. Microfluidic fuel cells are potentially cost-effective and robust because they use low-Reynolds-number flow, rather than ion exchange membranes, to maintain fuel and oxidant separation. However, membraneless fuel cells suffer from poor efficiency due to poor mass transport and ohmic losses. Current microfluidic fuel cell designs suffer from reactant cross-diffusion and thick boundary layers at the electrode surfaces, which force a compromise between the cell's power output and fuel utilization. This dissertation presents novel flow field architectures aimed at alleviating the mass transport limitations. The first architecture provides a reactant interface where the reactant diffusive concentration gradients are aligned with the bulk flow, mitigating reactant mixing through diffusion and thus crossover. This cell also uses porous electro-catalysts to improve electrode mass transport, which results in higher extraction of reactant energy. The second architecture uses porous electrodes and an inert conductive electrolyte stream between the reactants to enhance the interfacial electrical conductivity and maintain complete reactant separation. This design is stacked hydrodynamically and electrically, analogous to membrane-based systems, providing increased reactant utilization and power. These fuel cell architectures decouple the fuel cell's power output from its fuel utilization. The fuel cells are tested over a wide range of conditions, including variation of the loads, reactant concentrations, background electrolytes, flow rates, and fuel cell geometries.
These experiments show that the fuel cell's power output increases with higher reactant flow rates, electrolyte conductivity, and ionic exchange area, and with smaller spacing between the electrodes. The experimental and theoretical observations presented in this dissertation will aid the future design and commercialization of a new portable power source with the desired attributes of high power output per unit weight and volume and instant rechargeability.
Contributors: Salloum, Kamil S (Author) / Posner, Jonathan D (Thesis advisor) / Adrian, Ronald (Committee member) / Christen, Jennifer (Committee member) / Phelan, Patrick (Committee member) / Chen, Kangping (Committee member) / Arizona State University (Publisher)
Created: 2010
Description
Many expect renewable energy technologies to play a leading role in a sustainable energy supply system and to aid the shift away from an over-reliance on traditional hydrocarbon resources in the next few decades. This dissertation develops environmental, policy, and social models to help understand various aspects of photovoltaic (PV) technologies. The first part advances the life cycle assessment (LCA) of PV systems by expanding the boundary of included processes using hybrid LCA and by accounting for the technology-driven dynamics of environmental impacts. Hybrid LCA extends the traditional method by combining bottom-up process-sum and top-down economic input-output (EIO) approaches. The embodied energy and carbon of multi-crystalline silicon photovoltaic systems are assessed using hybrid LCA. From 2001 to 2010, the embodied energy and carbon fell substantially, indicating that technological progress is realizing reductions in environmental impacts in addition to lower module prices. A variety of policies support renewable energy adoption, and it is critical to make them function cooperatively. To reveal the interrelationships among these policies, the second part of this dissertation proposes a three-tier policy architecture and develops a model to determine the specific subsidies required to support a Renewable Portfolio Standard (RPS) goal. The financial requirements are calculated under two scenarios and compared with predictable funds from public sources. A main result is that the investments needed to achieve the RPS goal far exceed the economic allocation for subsidizing distributed PV. Even with subsidies, social acceptance often remains a challenge. The third part of this dissertation develops a novel fuzzy logic inference model that relates consumers' attitudes about the technology, such as perceived cost, maintenance, and environmental concern, to their adoption intention.
A fuzzy logic inference model is a type of soft computing model; it has the advantage of handling imprecise and insufficient information and of mimicking the reasoning processes of the human brain. The model is implemented in a case study of residential PV adoption using data from a survey of homeowners in Arizona, and its output is the purchasing probability of PV.
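A minimal fuzzy inference sketch conveys the flavor of such a model: fuzzify the inputs with membership functions, fire a small rule base, and defuzzify to a crisp intention. The membership functions, the two rules, and the output centers below are hypothetical stand-ins, not the dissertation's rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def adoption_intention(cost, concern):
    """cost, concern in [0, 1]; returns a crisp adoption intention in [0, 1]."""
    cost_low  = tri(cost, -0.5, 0.0, 0.6)
    cost_high = tri(cost,  0.4, 1.0, 1.5)
    conc_low  = tri(concern, -0.5, 0.0, 0.6)
    conc_high = tri(concern,  0.4, 1.0, 1.5)

    # Rule 1: low cost AND high concern -> high intention (centered at 0.9)
    # Rule 2: high cost AND low concern -> low intention (centered at 0.1)
    r1 = min(cost_low, conc_high)
    r2 = min(cost_high, conc_low)

    if r1 + r2 == 0:
        return 0.5                     # no rule fires: neutral output
    # Sugeno-style weighted-average defuzzification
    return (r1 * 0.9 + r2 * 0.1) / (r1 + r2)

print(adoption_intention(cost=0.1, concern=0.9) > 0.8)   # True
print(adoption_intention(cost=0.9, concern=0.1) < 0.2)   # True
```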
Contributors: Zhai, Pei (Author) / Williams, Eric D. (Thesis advisor) / Allenby, Braden (Committee member) / Phelan, Patrick (Committee member) / Arizona State University (Publisher)
Created: 2010
Description
Adversarial threats to deep learning are increasingly becoming a concern due to the ubiquitous deployment of deep neural networks (DNNs) in many security-sensitive domains. Among the existing threats, adversarial weight perturbation is an emerging class that attempts to perturb the weight parameters of DNNs to breach security and privacy. In this thesis, the first weight perturbation attack introduced is the Bit-Flip Attack (BFA), which can maliciously flip a small number of bits within a computer's main memory system storing the DNN weight parameters to achieve malicious objectives. Our algorithm can achieve three specific attack objectives: (i) an un-targeted accuracy degradation attack, (ii) a targeted attack, and (iii) a Trojan attack. Moreover, BFA utilizes the Rowhammer technique to demonstrate the bit-flip attack on an actual computer prototype. While the bit-flip attack is conducted in a white-box setting, the subsequent contribution of this thesis is another novel weight perturbation attack in a black-box setting. Accordingly, this thesis presents a new study of DNN model vulnerabilities in a multi-tenant Field Programmable Gate Array (FPGA) cloud under a strict black-box framework. The newly developed attack framework lets a malicious tenant inject faults by duplicating specific DNN weight packages during data transmission between the off-chip memory and the on-chip buffer of a victim FPGA. The proposed attack is also experimentally validated on a multi-tenant cloud FPGA prototype. In the final part, the focus shifts to deep learning model privacy, popularly known as model extraction, in which partial DNN weight parameters are stolen remotely with the aid of a memory side-channel attack. In addition, a novel training algorithm is designed to exploit the partially leaked DNN weight-bit information, making the model extraction attack more effective.
The algorithm effectively leverages the partially leaked bit information and generates a substitute prototype of the victim model with almost identical performance to the victim's.
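The fault model behind a bit-flip attack is easy to illustrate: flipping a single high-order bit of an 8-bit two's-complement quantized weight changes its value drastically. The sketch below shows only this fault model, not the BFA search for which bit to flip or the Rowhammer mechanism:

```python
def flip_bit(byte, pos):
    """Flip bit `pos` (0 = LSB) of an unsigned byte via XOR with a mask."""
    return byte ^ (1 << pos)

def to_signed(byte):
    """Interpret an unsigned byte as an int8 two's-complement weight."""
    return byte - 256 if byte >= 128 else byte

w = 0b0110_0100                        # stored weight byte: +100
print(to_signed(w))                    # 100
print(to_signed(flip_bit(w, 7)))       # -28: one MSB flip swings the weight
```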
Contributors: Rakin, Adnan Siraj (Author) / Fan, Deliang (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Seo, Jae-Sun (Committee member) / Cao, Yu (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Energy storage technologies are essential to overcome the temporal variability in renewable energy. The primary aim of this thesis is to develop reactor solutions to better analyze the potential of thermochemical energy storage (TCES) using non-stoichiometric metal oxides (MOx) for multi-day energy storage applications. A TCES system consists of a reduction reactor and an insulated MOx storage bin. The reduction reactor heats (to ~1100 °C) and partially reduces the MOx, thereby adding sensible and chemical energy (i.e., charging it) under reduced-pO2 environments (~10 Pa). Inert gas removes the oxygen generated during reduction. The storage bin holds the hot, partially reduced MOx (typically particles) until it is used in an energy recovery device (i.e., discharge). Irrespective of the reactor heat source (here electrical) or the particle-inert gas flow arrangement (here countercurrent), the thermal reduction temperature and inert gas (here N2) flow are minimized when the process approaches reversibility, i.e., operates near equilibrium. This study specifically focuses on developing a reduction reactor based on theoretical considerations for approaching reversibility along the reaction path. The proposed Zigzag Flow Reactor (ZFR) can thermally reduce CAM28 particles at temperatures of ~1000 °C under an O2 partial pressure of ~10 Pa. The associated analytical and numerical models analyze the reaction equilibrium under a real (discrete) reaction path and the mass transfer kinetic conditions necessary to approach equilibrium. The discrete equilibrium model minimizes the exergy destroyed in a practical reactor and identifies methods of maximizing the energy storage density and the exergetic efficiency. The mass transfer model analyzes the O2-N2 concentration boundary layers to recommend sizing considerations that maximize the reactor power density.
Two functional ZFR prototypes establish the proof of concept, achieving a reduction extent of Δδ = 0.071 with CAM28 at T ~ 950 °C and pO2 = 10 Pa, 7x higher than a previous attempt in the literature. The first prototype consistently achieved an energy storage density above 100 Wh/kg during more than 10 h of runtime, and the second displayed an improved 130 Wh/kg during more than 5 h of operation with CAM28. A techno-economic model of a grid-scale ZFR with an associated storage bin analyzes the cost of scaling the ZFR for grid energy storage requirements. The scaled ZFR capital costs contribute less than 1% to the levelized cost of thermochemical energy storage, which ranges from 5-20 ¢/kWh depending on the storage temperature and duration.
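A quick order-of-magnitude sizing follows from the reported storage density; the 10 MWh grid-storage target below is an assumption for illustration, not a figure from the thesis:

```python
# Order-of-magnitude storage-bin sizing from the reported density.
DENSITY_WH_PER_KG = 130        # best measured energy storage density above
TARGET_WH = 10e6               # hypothetical 10 MWh grid-storage target

mass_kg = TARGET_WH / DENSITY_WH_PER_KG
print(round(mass_kg / 1000, 1))   # ~76.9 tonnes of CAM28 particles
```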
Contributors: Ghotkar, Rhushikesh (Author) / Milcarek, Ryan (Thesis advisor) / Ermanoski, Ivan (Committee member) / Phelan, Patrick (Committee member) / Wang, Liping (Committee member) / Wang, Robert (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
In recent years, the proliferation of deep neural networks (DNNs) has revolutionized the field of artificial intelligence, enabling advancements in various domains. With the emergence of efficient learning techniques such as quantization and distributed learning, DNN systems have become increasingly accessible for deployment on edge devices. This accessibility brings significant benefits, including real-time inference on the edge, which mitigates communication latency, and on-device learning, which addresses privacy concerns and enables continuous improvement. However, the resource limitations of edge devices pose challenges in equipping them with robust safety protocols, making them vulnerable to various attacks. Two notable attacks that affect edge DNN systems are Bit-Flip Attacks (BFA) and architecture-stealing attacks. BFA compromises the integrity of DNN models, while architecture-stealing attacks aim to extract valuable intellectual property by reverse-engineering the model's architecture. Furthermore, in Split Federated Learning (SFL) scenarios, where training occurs on distributed edge devices, Model Inversion (MI) attacks can reconstruct clients' data, and Model Extraction (ME) attacks can extract sensitive model parameters. This thesis aims to address these four attack scenarios and develop effective defense mechanisms. To defend against BFA, both passive and active defensive strategies are discussed. Furthermore, for both model inference and training, architecture-stealing attacks are mitigated through novel defense techniques, ensuring the integrity and confidentiality of edge DNN systems. In the context of SFL, the thesis showcases defense mechanisms against MI attacks for both supervised and self-supervised learning applications. Additionally, the research investigates ME attacks in SFL and proposes countermeasures to enhance resistance against potential ME attackers.
By examining and addressing these attack scenarios, this research contributes to the security and privacy enhancement of edge DNN systems. The proposed defense mechanisms enable safer deployment of DNN models on resource-constrained edge devices, facilitating the advancement of real-time applications, preserving data privacy, and fostering the widespread adoption of edge computing technologies.
Contributors: Li, Jingtao (Author) / Chakrabarti, Chaitali (Thesis advisor) / Fan, Deliang (Committee member) / Cao, Yu (Committee member) / Trieu, Ni (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
This thesis presents a code generation tool to improve the programmability of systolic array processors such as the Domain Adaptive Processor (DAP), designed by researchers at the University of Michigan for wireless communication workloads. Unlike application-specific integrated circuits, DAP aims to achieve high performance without sacrificing much programmability or reconfigurability. The structure of a typical DAP program for each Processing Element (PE) is very different from that of conventional programming languages. As a result, writing code for DAP requires the programmer to acquire processor-specific knowledge, including configuration rules and the cycle-accurate execution state of the memory and datapath components within each PE. Each program must be carefully handcrafted to meet strict timing and resource constraints, leading to very long programming times and low productivity. In this thesis, a code generation and optimization tool is introduced to improve the programmability of DAP and ease code development. The tool consists of a configuration code generator, an optimizer, and a scheduler. An Instruction Set Architecture (ISA) has been designed specifically for DAP. The programmer writes the assembly code for each PE using the DAP ISA, and the assembly code is then translated into low-level configuration code. This configuration code undergoes several optimization passes. Level 1 (L1) optimization removes instruction redundancy and performs loop optimizations through code movement; Level 2 (L2) optimization performs instruction-level parallelism. Together, the L1 and L2 passes yield code with fewer instructions that requires fewer cycles. In addition, a scheduling tool performs final timing adjustments on the code to match the input data rate.
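A toy version of an L1-style redundancy pass illustrates the idea: a configuration write is dropped when the target register already holds that value. The instruction tuple format and `cfg_write` mnemonic are invented for illustration; DAP's real configuration code differs:

```python
def eliminate_redundant_writes(code):
    """code: list of (op, reg, value) tuples; drop configuration writes
    that restate the register's currently tracked value."""
    state, out = {}, []
    for op, reg, val in code:
        if op == "cfg_write":
            if state.get(reg) == val:
                continue               # redundant: register already holds val
            state[reg] = val           # track the new register contents
        out.append((op, reg, val))
    return out

prog = [("cfg_write", "r0", 5),
        ("cfg_write", "r1", 2),
        ("cfg_write", "r0", 5),        # redundant write, removed by the pass
        ("cfg_write", "r0", 7)]
print(len(eliminate_redundant_writes(prog)))   # 3
```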
Contributors: Vipperla, Anish (Author) / Chakrabarti, Chaitali (Thesis advisor) / Bliss, Daniel (Committee member) / Akoglu, Ali (Committee member) / Arizona State University (Publisher)
Created: 2022