Search Content

Stagioni: Temperature management to enable near-sensor processing for performance, fidelity, and energy-efficiency of vision and imaging workloads

Description

Vision processing on traditional architectures is inefficient due to energy-expensive off-chip data movements. Many researchers advocate pushing processing close to the sensor to substantially reduce data movements. However, continuous near-sensor processing raises the sensor temperature, impairing the fidelity of imaging/vision tasks.

The work characterizes the thermal implications of using 3D stacked…

Vision processing on traditional architectures is inefficient due to energy-expensive off-chip data movements. Many researchers advocate pushing processing close to the sensor to substantially reduce data movements. However, continuous near-sensor processing raises the sensor temperature, impairing the fidelity of imaging/vision tasks.

The work characterizes the thermal implications of using 3D stacked image sensors with near-sensor vision processing units. The characterization reveals that near-sensor processing reduces system power but degrades image quality. For reasonable image fidelity, the sensor temperature needs to stay below a threshold, situationally determined by application needs. Fortunately, the characterization also identifies opportunities -- unique to the needs of near-sensor processing -- to regulate temperature based on dynamic visual task requirements and rapidly increase capture quality on demand.

Based on the characterization, the work proposes and investigate two thermal management strategies -- stop-capture-go and seasonal migration -- for imaging-aware thermal management. The work present parameters that govern the policy decisions and explore the trade-offs between system power and policy overhead. The work's evaluation shows that the novel dynamic thermal management strategies can unlock the energy-efficiency potential of near-sensor processing with minimal performance impact, without compromising image fidelity.

ContributorsKodukula, Venkatesh (Author) / LiKamWa, Robert (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Brunhaver, John (Committee member) / Arizona State University (Publisher)

Created2019

Computer Vision from Spatial-Multiplexing Cameras at Low Measurement Rates

Description

In UAVs and parking lots, it is typical to first collect an enormous number of pixels using conventional imagers. This is followed by employment of expensive methods to compress by throwing away redundant data. Subsequently, the compressed data is transmitted to a ground station. The past decade has seen the…

In UAVs and parking lots, it is typical to first collect an enormous number of pixels using conventional imagers. This is followed by employment of expensive methods to compress by throwing away redundant data. Subsequently, the compressed data is transmitted to a ground station. The past decade has seen the emergence of novel imagers called spatial-multiplexing cameras, which offer compression at the sensing level itself by providing an arbitrary linear measurements of the scene instead of pixel-based sampling. In this dissertation, I discuss various approaches for effective information extraction from spatial-multiplexing measurements and present the trade-offs between reliability of the performance and computational/storage load of the system. In the first part, I present a reconstruction-free approach to high-level inference in computer vision, wherein I consider the specific case of activity analysis, and show that using correlation filters, one can perform effective action recognition and localization directly from a class of spatial-multiplexing cameras, called compressive cameras, even at very low measurement rates of 1\%. In the second part, I outline a deep learning based non-iterative and real-time algorithm to reconstruct images from compressively sensed (CS) measurements, which can outperform the traditional iterative CS reconstruction algorithms in terms of reconstruction quality and time complexity, especially at low measurement rates. To overcome the limitations of compressive cameras, which are operated with random measurements and not particularly tuned to any task, in the third part of the dissertation, I propose a method to design spatial-multiplexing measurements, which are tuned to facilitate the easy extraction of features that are useful in computer vision tasks like object tracking. The work presented in the dissertation provides sufficient evidence to high-level inference in computer vision at extremely low measurement rates, and hence allows us to think about the possibility of revamping the current day computer systems.

ContributorsKulkarni, Kuldeep Sharad (Author) / Turaga, Pavan (Thesis advisor) / Li, Baoxin (Committee member) / Chakrabarti, Chaitali (Committee member) / Sankaranarayanan, Aswin (Committee member) / LiKamWa, Robert (Committee member) / Arizona State University (Publisher)

Created2017

Accelerating Linear Algebra and Machine Learning Kernels on a Massively Parallel Reconfigurable Architecture

Description

This thesis presents efficient implementations of several linear algebra kernels, machine learning kernels and a neural network based recommender systems engine onto a massively parallel reconfigurable architecture, Transformer. The linear algebra kernels include Triangular Matrix Solver (TRSM), LU Decomposition (LUD), QR Decomposition (QRD), and Matrix Inversion. The machine learning kernels…

This thesis presents efficient implementations of several linear algebra kernels, machine learning kernels and a neural network based recommender systems engine onto a massively parallel reconfigurable architecture, Transformer. The linear algebra kernels include Triangular Matrix Solver (TRSM), LU Decomposition (LUD), QR Decomposition (QRD), and Matrix Inversion. The machine learning kernels include an LSTM (Long Short Term Memory) cell, and a GRU (gated Recurrent Unit) cell used in recurrent neural networks. The neural network based recommender systems engine consists of multiple kernels including fully connected layers, embedding layer, 1-D batchnorm, Adam optimizer, etc.

Transformer is a massively parallel reconfigurable multicore architecture designed at the University of Michigan. The Transformer configuration considered here is 4 tiles and 16 General Processing Elements (GPEs) per tile. It supports a two level cache hierarchy where the L1 and L2 caches can operate in shared (S) or private (P) modes. The architecture was modeled using Gem5 and cycle accurate simulations were done to evaluate the performance in terms of execution times, giga-operations per second per Watt (GOPS/W), and giga-floating-point-operations per second per Watt (GFLOPS/W).

This thesis shows that for linear algebra kernels, each kernel achieves high performance for a certain cache mode and that this cache mode can change when the matrix size changes. For instance, for smaller matrix sizes, L1P, L2P cache mode is best for TRSM, while L1S, L2S is the best cache mode for LUD, and L1P, L2S is the best for QRD. For each kernel, the optimal cache mode changes when the matrix size is increased. For instance, for TRSM, the L1P, L2P cache mode is best for smaller matrix sizes ($N=64, 128, 256, 512$) and it changes to L1S, L2P for larger matrix sizes ($N=1024$). For machine learning kernels, L1P, L2P is the best cache mode for all network parameter sizes.

Gem5 simulations show that the peak performance for TRSM, LUD, QRD and Matrix Inverse in the 14nm node is 97.5, 59.4, 133.0 and 83.05 GFLOPS/W, respectively. For LSTM and GRU, the peak performance is 44.06 and 69.3 GFLOPS/W.

The neural network based recommender system was implemented in L1S, L2S cache mode. It includes a forward pass and a backward pass and is significantly more complex in terms of both computational complexity and data movement. The most computationally intensive block is the fully connected layer followed by Adam optimizer. The overall performance of the recommender systems engine is 54.55 GFLOPS/W and 169.12 GOPS/W.

ContributorsSoorishetty, Anuraag (Author) / Chakrabarti, Chaitali (Thesis advisor) / Kim, Hun Seok (Committee member) / LiKamWa, Robert (Committee member) / Arizona State University (Publisher)

Created2019

Viewpoint Recommendation for Aesthetic Photography

Description

This thesis addresses the problem of recommending a viewpoint for aesthetic photography. Viewpoint recommendation is suggesting the best camera pose to capture a visually pleasing photograph of the subject of interest by using any end-user device such as drone, mobile robot or smartphone. Solving this problem enables to capture visually…

This thesis addresses the problem of recommending a viewpoint for aesthetic photography. Viewpoint recommendation is suggesting the best camera pose to capture a visually pleasing photograph of the subject of interest by using any end-user device such as drone, mobile robot or smartphone. Solving this problem enables to capture visually pleasing photographs autonomously in areal photography, wildlife photography, landscape photography or in personal photography.

The viewpoint recommendation problem can be divided into two stages: (a) generating a set of dense novel views based on the basis views captured about the subject. The dense novel views are useful to better understand the scene and to know how the subject looks from different viewpoints and (b) each novel is scored based on how aesthetically good it is. The viewpoint with the greatest aesthetic score is recommended for capturing a visually pleasing photograph.

ContributorsKatukuri, Sathish Kumar (Author) / LiKamWa, Robert (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)

Created2019

A Scalable and Programmable I/O Controller for Region-based Computing

Description

I present my work on a scalable and programmable I/O controller for region-based computing, which will be used in a rhythmic pixel-based camera pipeline. I provide a breakdown of the development and design of the I/O controller and how it fits in to rhythmic pixel regions, along with a studyon…

I present my work on a scalable and programmable I/O controller for region-based computing, which will be used in a rhythmic pixel-based camera pipeline. I provide a breakdown of the development and design of the I/O controller and how it fits in to rhythmic pixel regions, along with a studyon memory traffic of rhythmic pixel regions and how this translates to energy efficiency. This rhythmic pixel region-based camera pipeline has been jointly developed through Dr. Robert LiKamWa’s research lab. High spatiotemporal resolutions allow high precision for vision applications, such as for detecting features for augmented reality or face detection. High spatiotemporal resolution also comes with high memory throughput, leading to higher energy usage. This creates a tradeoff between high precision and energy efficiency, which becomes more important in mobile systems. In addition, not all pixels in a frame are necessary for the vision application, such as pixels that make up the background. Rhythmic pixel regions aim to reduce the tradeoff by creating a pipeline that allows an application developer to specify regions to capture at a non-uniform spatiotemporal resolution. This is accomplished by encoding the incoming image, and only sending the pixels within these specified regions. Later these encoded representations will be decoded to a standard frame representation usable by traditional vision applications. My contribution to this effort has been the design, testing and evaluation of the I/O controller.

ContributorsNguyen, Van (Author) / LiKamWa, Robert (Thesis advisor) / Jayasuriya, Suren (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2020

buff: An AI Assistant Framework to Accelerate Scientific Research and Discovery

Description

The rapid growth of published research has increased the time and energy researchers invest in literature review to stay updated in their field. While existing research tools assist with organizing papers, providing basic summaries, and improving search, there is a need for an assistant that copilots researchers to drive innovation. In…

The rapid growth of published research has increased the time and energy researchers invest in literature review to stay updated in their field. While existing research tools assist with organizing papers, providing basic summaries, and improving search, there is a need for an assistant that copilots researchers to drive innovation. In response, we introduce buff, a research assistant framework employing large language models to summarize papers, identify research gaps and trends, and recommend future directions based on semantic analysis of the literature landscape, Wikipedia, and the broader internet. We demo buff through a user-friendly chat interface, powered by a citation network encompassing over 5600 research papers, amounting to over 133 million tokens of textual information. buff utilizes a network structure to fetch and analyze factual scientific information semantically. By streamlining the literature review and scientific knowledge discovery process, buff empowers researchers to concentrate their efforts on pushing the boundaries of their fields, driving innovation, and optimizing the scientific research landscape.

ContributorsBalamurugan, Neha (Author) / Arani, Punit (Co-author) / LiKamWa, Robert (Thesis director) / Bhattacharjee, Amrita (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor) / Economics Program in CLAS (Contributor)

Created2024-05

Filtering by

Stagioni: Temperature management to enable near-sensor processing for performance, fidelity, and energy-efficiency of vision and imaging workloads

Computer Vision from Spatial-Multiplexing Cameras at Low Measurement Rates

Accelerating Linear Algebra and Machine Learning Kernels on a Massively Parallel Reconfigurable Architecture

Viewpoint Recommendation for Aesthetic Photography

A Scalable and Programmable I/O Controller for Region-based Computing

buff: An AI Assistant Framework to Accelerate Scientific Research and Discovery