Matching Items (23)
Description
Rapid technology scaling, the main driver of the power and performance improvements of computing solutions, has also rendered our computing systems extremely susceptible to transient errors called soft errors. Among the arsenal of techniques to protect computation from soft errors, Control Flow Checking (CFC) based techniques have gained a reputation as an effective, yet low-cost protection mechanism. The basic idea is that there is a high probability that a soft fault in program execution will eventually alter the control flow of the program; therefore, just by making sure that the control flow of the program is correct, significant protection can be achieved. More than a dozen CFC techniques have been developed over the last several decades, spanning hardware, software, and hardware-software hybrid approaches. Our analysis shows that existing CFC techniques are not only ineffective in protecting from soft errors, but also incur additional power and performance overheads. For this analysis, we develop and validate a simulation-based experimental setup to accurately and quantitatively estimate the architectural vulnerability of a program execution on a processor micro-architecture. We model the protection achieved by various state-of-the-art CFC techniques in this quantitative vulnerability estimation setup, and find that software-only CFC protection schemes (CFCSS, CFCSS+NA, CEDA) increase system vulnerability by 18% to 21% with 17% to 38% performance overhead. Hybrid CFC protection (CFEDC) increases vulnerability by 5%, while vulnerability remains almost the same for hardware-only CFC protection (CFCET), notwithstanding the design cost, area, and power overheads incurred by the hardware modifications required for its implementation.
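As background for the software-only schemes named above, the following is a minimal sketch of the CFCSS-style signature-checking idea; the block names, signature values, and function are hypothetical illustrations, not code from the dissertation.

```python
# Hypothetical sketch of CFCSS-style control flow checking.
# Each basic block gets a compile-time signature; a runtime signature G is
# XOR-updated on every control transfer and compared with the signature of
# the block just entered, so an erroneous jump lands in a block whose
# signature does not match and is detected.

SIG = {"A": 0b0001, "B": 0b0010, "C": 0b0100}    # static block signatures
DIFF = {("A", "B"): SIG["A"] ^ SIG["B"],          # d = sig(pred) XOR sig(succ)
        ("A", "C"): SIG["A"] ^ SIG["C"]}

def enter_block(G, pred, succ):
    """XOR-update the runtime signature on the edge pred -> succ and check it."""
    G ^= DIFF[(pred, succ)]
    if G != SIG[succ]:
        raise RuntimeError("control flow error detected")
    return G

G = SIG["A"]                   # execution starts in block A
G = enter_block(G, "A", "B")   # a legal transfer passes the check
```

A transfer reached with a corrupted signature, or via an edge the compiler never instrumented, fails the comparison; that detection coverage is what the vulnerability analysis above quantifies.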
ContributorsRhisheekesan, Abhishek (Author) / Shrivastava, Aviral (Thesis advisor) / Colbourn, Charles Joseph (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2013
Description
The Mobile Waterway Monitor seeks to monitor water in an unexplored way. The module is buoyant, floats with the current, and harvests solar energy. In short, the Mobile Waterway Monitor excels in size constraints, flexibility, extensibility, and capability. This current-following monitor can show both measured trends, like pH, and interpolated trends, like water speed, river contours, and elevation drop. The MWM strikes a balance between accuracy, portability, and versatility.
ContributorsStribrny, Kody John (Author) / Vrudhula, Sarma (Thesis director) / Wu, Carole-Jean (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)
Created2017-05
Description
Deep neural network-based methods have been proven to achieve outstanding performance on object detection and classification tasks. Deep neural networks follow the "deeper model with deeper confidence" belief to gain higher recognition accuracy. However, reducing these networks' computational costs remains a challenge, which impedes their deployment on embedded devices. For instance, the intersection management of Connected Autonomous Vehicles (CAVs) requires running computationally intensive object recognition algorithms on low-power traffic cameras. This dissertation studies dynamic hardware and software approaches to address this issue. Characteristics of real-world applications can facilitate this dynamic adjustment and reduce the computation. Specifically, this dissertation starts with a dynamic hardware approach that adjusts itself based on the difficulty of the input and extracts deeper features only if needed. Next, an adaptive learning mechanism is studied that uses features extracted from previous inputs to improve system performance. Finally, a system (ARGOS) is proposed and evaluated that can run on embedded systems while maintaining the desired accuracy. This system adopts shallow features at inference time, but it can switch to deep features if a higher accuracy is desired. To improve performance, ARGOS distills temporal knowledge from the deep features into the shallow system. Moreover, ARGOS further reduces computation by focusing on regions of interest. Response time and mean average precision are adopted as the metrics to evaluate the proposed ARGOS system.
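The shallow-first, deep-on-demand policy described above can be sketched as follows; the model interfaces and the confidence threshold are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of adaptive shallow/deep inference (assumed interfaces).
def detect(frame, shallow_model, deep_model, conf_threshold=0.8):
    """Run the cheap shallow path first; pay for deep features only
    when the shallow prediction is not confident enough."""
    boxes, confidence = shallow_model(frame)   # fast, low-power path
    if confidence >= conf_threshold:
        return boxes
    return deep_model(frame)                   # accurate but expensive path

# Toy usage with stand-in models:
print(detect("frame0",
             shallow_model=lambda f: (["car@(10,20)"], 0.95),
             deep_model=lambda f: ["car@(11,21)"]))
```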
ContributorsFarhadi, Mohammad (Author) / Yang, Yezhou (Thesis advisor) / Vrudhula, Sarma (Committee member) / Wu, Carole-Jean (Committee member) / Ren, Yi (Committee member) / Arizona State University (Publisher)
Created2022
Description
Artificial intelligence (AI) has the potential to drive us towards a future in which all of humanity flourishes. It also comes with substantial risks of oppression and calamity. For example, social media platforms have knowingly and surreptitiously promoted harmful content, e.g., the rampant instances of disinformation and hate speech. Machine learning algorithms designed for combating hate speech were also found to be biased against underrepresented and disadvantaged groups. In response, researchers and organizations have been working to publish principles and regulations for the responsible use of AI. However, these conceptual principles also need to be turned into actionable algorithms to materialize AI for good. The broad aim of my research is to design AI systems that responsibly serve users and to develop applications with social impact. This dissertation seeks to develop algorithmic solutions for Socially Responsible AI (SRAI), a systematic framework encompassing responsible AI principles and algorithms, and the responsible use of AI. In particular, it first introduces an interdisciplinary definition of SRAI and the AI responsibility pyramid, in which four types of AI responsibilities are described. It then elucidates the purpose of SRAI: how to bridge from conceptual definitions to responsible AI practice through three human-centered operations -- to Protect and Inform users, and to Prevent negative consequences. These operations are illustrated in the social media domain, given that social media has revolutionized how people live but has also contributed to the rise of many societal issues. The three representative tasks, one for each dimension, are cyberbullying detection, disinformation detection and dissemination, and unintended bias mitigation. The means of SRAI is to develop responsible AI algorithms. Many issues (e.g., discrimination and generalization) can arise when AI systems are trained to improve accuracy without knowing the underlying causal mechanism. Causal inference, therefore, is intrinsically related to understanding and resolving these challenging issues in AI. As a result, this dissertation also seeks to gain an in-depth understanding of AI by looking into the precise relationships between causes and effects. For illustration, it introduces a recent work that applies deep learning to estimating causal effects and shows that causal learning algorithms can outperform traditional methods.
ContributorsCheng, Lu (Author) / Liu, Huan (Thesis advisor) / Varshney, Kush R. (Committee member) / Silva, Yasin N. (Committee member) / Wu, Carole-Jean (Committee member) / Candan, Kasim S. (Committee member) / Arizona State University (Publisher)
Created2022
Description
Advances in semiconductor technology have brought computer-based systems into virtually all aspects of human life. This unprecedented integration of semiconductor-based systems in our lives has significantly increased the domain and the number of safety-critical applications -- applications with unacceptable consequences of failure. Software-level error resilience schemes are attractive because they can provide commercial-off-the-shelf microprocessors with adaptive and scalable reliability.

Among all software-level error resilience solutions, in-application instruction-replication-based approaches have been widely used and are deemed to be the most effective. However, existing instruction-replication schemes protect only part of the computation, i.e., arithmetic and logical instructions, and leave the rest unprotected. To improve the efficacy of instruction-level redundancy-based approaches, we developed several error detection and error correction schemes. nZDC (near Zero silent Data Corruption) is an instruction duplication scheme which protects the execution of the whole application. Rather than detecting errors on the register operands of memory and control flow operations, nZDC checks the results of such operations. nZDC ensures the correct execution of memory write instructions by reloading the stored value and checking it against the redundantly computed value. nZDC also introduces a novel control flow checking mechanism which replicates compare and branch instructions and detects both wrong-direction branches and unwanted jumps. Fault injection experiments show that nZDC can improve the error coverage of the state-of-the-art schemes by more than 10x, without incurring any additional performance penalty.

Furthermore, we introduced two error recovery solutions. InCheck is our backward recovery solution, which makes lightweight error-free checkpoints at basic block granularity. In the case of an error, InCheck reverts the program execution to the beginning of the last executed basic block and resumes execution with the aid of the preserved information. NEMESIS is our forward recovery scheme, which runs three versions of the computation and detects errors by checking the results of all memory write and branch operations. In the case of a mismatch, the NEMESIS diagnosis routine decides whether the error is recoverable; if so, the NEMESIS recovery routine reverts the effect of the error from the program state and resumes normal program execution from the error detection point.
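Abstractly, the store check that distinguishes nZDC from operand-checking schemes can be sketched as follows; this is a conceptual Python model (nZDC itself is a compiler transformation on machine instructions), with all names invented.

```python
# Conceptual model of nZDC-style store verification: compute twice,
# store the master copy, then RELOAD the stored value and compare it with
# the shadow copy, so faults on the store path itself are also caught.

memory = {}

def checked_store(addr, compute):
    master = compute()            # original instruction stream
    shadow = compute()            # duplicated (shadow) instruction stream
    memory[addr] = master         # the actual store
    if memory[addr] != shadow:    # reload and check against the shadow value
        raise RuntimeError("error detected at memory write")

checked_store(0x1000, lambda: 2 + 3)   # a fault-free store passes silently
```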
ContributorsDidehban, Moslem (Author) / Shrivastava, Aviral (Thesis advisor) / Wu, Carole-Jean (Committee member) / Clark, Lawrence (Committee member) / Mahlke, Scott (Committee member) / Arizona State University (Publisher)
Created2018
Description
General-purpose processors propel the advances and innovations that are the subject of humanity’s many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions.

Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of their interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%.
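The eviction criterion described above can be sketched as cost-aware victim selection; the scoring function below is an assumption chosen to illustrate the idea, not ReMAP's exact formulation.

```python
# Sketch of memory-cost-aware LLC victim selection (illustrative scoring).
def pick_victim(cache_set):
    """Evict the line with the lowest expected cost of eviction, i.e. the
    line least likely to be reused or cheapest to re-fetch from memory."""
    return min(cache_set,
               key=lambda line: line["reuse_prob"] * line["refetch_cost"])

victim = pick_victim([
    {"tag": 0xA, "reuse_prob": 0.1, "refetch_cost": 200},  # likely dead
    {"tag": 0xB, "reuse_prob": 0.9, "refetch_cost": 350},  # hot line
])
print(hex(victim["tag"]))   # -> 0xa
```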

Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU’s latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications.

Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPGPUs in the face of slowing Moore's Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future.

In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.
ContributorsArunkumar, Akhil (Author) / Wu, Carole-Jean (Thesis advisor) / Shrivastava, Aviral (Committee member) / Lee, Yann-Hang (Committee member) / Bolotin, Evgeny (Committee member) / Arizona State University (Publisher)
Created2018
Description
One of the main goals of computer architecture design is to improve performance without much increase in power consumption. This cannot be achieved by adding increasingly complex intelligent schemes to the hardware, since they become increasingly less power-efficient. Therefore, parallelism emerges as the solution. In fact, the irrevocable trend of computer design in the near future is to keep increasing the number of cores while reducing the operating frequency. However, it is not easy to scale the number of cores. One important challenge is that existing cores consume too much power. Another challenge is that the cache-based memory hierarchy poses a serious limitation due to the rapidly increasing area and power demands of coherence maintenance.

In this dissertation, opportunities to resolve the aforementioned issues were explored in two aspects.

Firstly, the possibility of removing the hardware cache altogether and replacing it with software-managed scratchpad memory was explored. Scratchpad memory consumes much less power than caches. However, as data management logic is completely shifted to software, reducing the software overhead is challenging. This thesis presents techniques to manage scratchpad memory judiciously by exploiting application semantics and knowledge of data access patterns, thereby enabling optimization of data movement across the memory hierarchy. Experimental results show that the optimization was able to reduce stack data management overhead by 13X, produce better code mapping in more than 80% of the cases, and improve performance by 83% in heap management.
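A minimal sketch of software-managed stack placement on a scratchpad, in the spirit of the techniques above (the spill-oldest policy and the sizes are assumptions for illustration):

```python
# Toy model of software-managed stack data on a scratchpad (SPM):
# frames are pushed into a small SPM; on overflow, the oldest frames are
# spilled to main memory by inserted management code and restored later.

SPM_SIZE = 1024                  # bytes of SPM reserved for stack frames
spm, spilled = [], []            # frame sizes resident in SPM / in DRAM

def push_frame(size):
    while sum(spm) + size > SPM_SIZE:
        spilled.append(spm.pop(0))   # spill the oldest resident frame
    spm.append(size)

def pop_frame():
    spm.pop()                        # frame returns; its SPM space is freed
    if spilled and not spm:
        spm.append(spilled.pop())    # restore a spilled caller frame

push_frame(800); push_frame(400)     # the second push forces a spill
print(spm, spilled)                  # -> [400] [800]
```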

Secondly, the possibility of using software branch hinting to replace hardware branch prediction, thereby completely eliminating the power consumed by the corresponding hardware components, was explored. As the branch predictor is removed from the hardware, software logic becomes responsible for reducing the branch penalty. Techniques were proposed to minimize the branch penalty by optimizing branch hint placement, which reduce the branch penalty by 35.4% over the state of the art.
ContributorsLu, Jing (Author) / Shrivastava, Aviral (Thesis advisor) / Sarjoughian, Hessam S. (Committee member) / Wu, Carole-Jean (Committee member) / Doupe, Adam (Committee member) / Arizona State University (Publisher)
Created2019
Description
The holy grail of computer hardware across all market segments has been to sustain performance improvement at the same pace as silicon technology scales. As the technology scales and the size of transistors shrinks, the power consumption and energy usage per transistor decrease. On the other hand, the transistor density increases significantly with technology scaling. Due to technology factors, the reduction in power consumption per transistor is not sufficient to offset the increase in power consumption per unit area. Therefore, to improve performance, energy efficiency must be addressed at all design levels, from the circuit level to the application and algorithm levels.

At the architectural level, one promising approach is to populate the system with hardware accelerators, each optimized for a specific task. One drawback of hardware accelerators is that they are not programmable; therefore, their utilization can be low, as each performs one specific function. Using software-programmable accelerators is an alternative approach to achieving high energy efficiency and programmability. Due to their intrinsic characteristics, software-programmable accelerators can exploit both instruction-level parallelism and data-level parallelism.

A Coarse-Grained Reconfigurable Architecture (CGRA) is a software-programmable accelerator consisting of a number of word-level functional units. Motivated by the promising characteristics of software-programmable accelerators, the potential of CGRAs in future computing platforms is studied and an end-to-end CGRA research framework is developed. This framework covers three different aspects: CGRA architectural design, integration in a computing system, and the CGRA compiler. First, the design and implementation of a CGRA and its instruction set are presented. This design is then modeled in a cycle-accurate system simulator. The simulation platform enables us to investigate several problems associated with a CGRA when it is deployed as an accelerator in a computing system. Next, the problem of mapping a compute-intensive region of a program to CGRAs is formulated. From this formulation, several efficient algorithms are developed which utilize the CGRA's scarce resources effectively to minimize the running time of input applications. Finally, these mapping algorithms are integrated in a compiler framework to construct a compiler for CGRAs.
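The mapping problem formulated above, in toy form: place dataflow-graph operations on a small grid of functional units so that communicating operations land near each other. The greedy heuristic below is only an illustration; the dissertation develops considerably more effective algorithms.

```python
# Toy greedy placement of a dataflow graph onto a 2-D CGRA fabric.
import itertools

def greedy_map(dfg_edges, grid_w, grid_h):
    slots = list(itertools.product(range(grid_w), range(grid_h)))
    placement = {}
    for producer, consumer in dfg_edges:           # edges in dependence order
        for op, other in ((producer, consumer), (consumer, producer)):
            if op in placement:
                continue
            anchor = placement.get(other, (0, 0))  # stay near the placed neighbor
            free = [s for s in slots if s not in placement.values()]
            placement[op] = min(free, key=lambda s: abs(s[0] - anchor[0])
                                                    + abs(s[1] - anchor[1]))
    return placement

print(greedy_map([("a", "b"), ("b", "c")], grid_w=4, grid_h=4))
# -> {'a': (0, 0), 'b': (0, 1), 'c': (0, 2)}
```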
ContributorsHamzeh, Mahdi (Author) / Vrudhula, Sarma (Thesis advisor) / Gopalakrishnan, Kailash (Committee member) / Shrivastava, Aviral (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2015
Description
Several decades of transistor technology scaling have brought the threat of soft errors to modern embedded processors. Several techniques have been proposed to protect these systems from soft errors. However, their effectiveness in protecting the computation cannot be ascertained without accurate and quantitative estimation of system reliability. Vulnerability -- a metric that defines the probability of system failure (reliability) through analytical models -- is the most effective mechanism for our current estimation and early design space exploration needs. Previous vulnerability estimation tools are based around the Sim-Alpha simulator, which has been shown to have several limitations. In this thesis, I present gemV: an accurate and comprehensive vulnerability estimation tool based on gem5. gem5 is a popular cycle-accurate micro-architectural simulator that can model several different processor models in close to real hardware form. gemV can be used for fast and early design space exploration, and also to evaluate the protection afforded by commodity processors. gemV is comprehensive, since it models almost all sequential components of the processor. gemV is accurate because of fine-grain vulnerability tracking, accurate vulnerability modeling of squashed instructions, and accurate vulnerability modeling of shared data structures in gem5. gemV has been thoroughly validated against extensive fault injection experiments and achieves 97% accuracy with 95% confidence. A micro-architect can use gemV to discover micro-architectural variants of a processor that minimize vulnerability for an allowed performance penalty. A software developer can use gemV to explore the performance-vulnerability trade-off by choosing different algorithms and compiler optimizations, while a system designer can use gemV to explore the performance-vulnerability trade-offs of choosing different Instruction Set Architectures (ISAs).
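The arithmetic behind such vulnerability estimates can be sketched as follows: a structure's vulnerability is the fraction of its bit-cycles occupied by bits that can affect the program outcome (ACE bits). The interval data in this example is invented for illustration.

```python
# Minimal sketch of AVF-style vulnerability arithmetic (illustrative data).
def vulnerability(ace_intervals, num_bits, total_cycles):
    """ace_intervals: (start_cycle, end_cycle, ace_bits) tuples describing
    windows during which some bits of the structure hold outcome-critical
    (ACE) data; the result is the fraction of vulnerable bit-cycles."""
    ace_bit_cycles = sum((end - start) * bits
                         for start, end, bits in ace_intervals)
    return ace_bit_cycles / (num_bits * total_cycles)

# A 64-bit register holding live (ACE) data for 100 + 150 of 1000 cycles:
print(vulnerability([(0, 100, 64), (250, 400, 64)],
                    num_bits=64, total_cycles=1000))   # -> 0.25
```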
ContributorsTanikella, Srinivas Karthik (Author) / Shrivastava, Aviral (Thesis advisor) / Bazzi, Rida (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created2016
Description
Nanoparticle suspensions, popularly termed “nanofluids,” have been extensively investigated for their thermal and radiative properties. Such work has generated great controversy, although it is arguably accepted today that the presence of nanoparticles rarely leads to useful enhancements in either thermal conductivity or convective heat transfer. On the other hand, there are still examples of unanticipated enhancements to some properties, such as the reported specific heat of molten salt-based nanofluids and the critical heat flux. Another largely overlooked example is the apparent effect of nanoparticles on the effective latent heat of vaporization (h_fg) of aqueous nanofluids. A previous study focused on molecular dynamics (MD) modeling, supplemented with limited experimental data, to suggest that h_fg increases with increasing nanoparticle concentration.

Here, this research extends that exploratory work in an effort to determine whether h_fg of aqueous nanofluids can be manipulated, i.e., increased or decreased, by the addition of graphite or silver nanoparticles. Our results to date indicate that h_fg can be substantially impacted, by up to ±30%, depending on the type of nanoparticle. Moreover, this dissertation reports further experiments that vary the surface area through volume fraction (0.005% to 2%) and nanoparticle size to investigate the mechanisms of h_fg modification in aqueous graphite and silver nanofluids. This research also investigates thermophysical properties, i.e., density and surface tension, of aqueous nanofluids to support the experimental h_fg results based on the Clausius-Clapeyron equation. This theoretical investigation agrees well with the experimental results. Furthermore, this research investigates the change in h_fg of aqueous nanofluids at the nanoscale, in terms of the melting of silver nanoparticles and the hydrophobic interactions in graphite nanofluids. As a result, the entropy change due to those mechanisms could be a main cause of the changes in h_fg in silver and graphite nanofluids.
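For reference, the standard Clausius-Clapeyron relation underpinning that supporting analysis is shown below; the integrated form assumes an ideal-gas vapor phase and negligible liquid specific volume (with R the gas constant of the vapor), and is not specific to this dissertation's data.

```latex
\frac{dP}{dT} = \frac{h_{fg}}{T\,\Delta v}
\quad\Longrightarrow\quad
\ln\frac{P_2}{P_1} = -\frac{h_{fg}}{R}\left(\frac{1}{T_2} - \frac{1}{T_1}\right)
```

Measured vapor pressure versus temperature thus constrains h_fg, which is how the density and surface tension measurements feed the theoretical comparison with experiment.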

Finally, applying the latent heat results for graphite and silver nanofluids to an actual solar thermal system with a Rankine cycle is suggested, to show that the tunable latent heat of vaporization in nanofluids could be beneficial for real-world solar thermal applications with improved efficiency.
ContributorsLee, Soochan (Author) / Phelan, Patrick E (Thesis advisor) / Wu, Carole-Jean (Thesis advisor) / Wang, Robert (Committee member) / Wang, Liping (Committee member) / Taylor, Robert A. (Committee member) / Prasher, Ravi (Committee member) / Arizona State University (Publisher)
Created2015