Search Content

Fido Tracker

Description

Millions of pets go missing every year and this project has the purpose of offering a pet GPS tracking solution to aid in this issue. An Arduino microcontroller was combined with a GPS module and GSM module to create the hardware of the device, which was then connected to a…

Millions of pets go missing every year and this project has the purpose of offering a pet GPS tracking solution to aid in this issue. An Arduino microcontroller was combined with a GPS module and GSM module to create the hardware of the device, which was then connected to a mobile application that was developed for the explicit purpose of this project. Amazon Web Services was used to significantly bring down the cost of connecting the hardware to the mobile app. Upon the completion of the project, a prototype pet GPS tracking device and mobile application were developed, and instructions were given so that any user could re-create the same solution for their own purposes.

ContributorsKiaei, Ariana (Author) / Ren, Fengbo (Thesis director) / Abraham, Seth (Committee member) / Computer Science and Engineering Program (Contributor) / Electrical Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2021-05

An IoT Solution to Air Quality Monitoring

Description

Pollution is an increasing problem around the world, and one of the main forms it takes is air pollution. Air pollution, from oxides and dioxides to particulate matter, continues to contribute to millions of deaths each year, which is more than the next three leading causes of environment-related death combined.…

Pollution is an increasing problem around the world, and one of the main forms it takes is air pollution. Air pollution, from oxides and dioxides to particulate matter, continues to contribute to millions of deaths each year, which is more than the next three leading causes of environment-related death combined. Plus, the problem is only growing as industrial plants, factories, and transportation continues to rapidly increase across the globe. Those most affected include less developed countries and individuals with pre-existing respiratory conditions. Although many citizens know about this issue, it is often unclear what times and locations are worst in terms of pollutant concentration as it can vary on the time of day, local activity, and other variable factors. As a result, citizens lack the knowledge and resources to properly combat or avoid air pollution, as well as the data and evidence to support any sort of regulatory change. Many companies and organizations have tried to address this through Air Quality Indexes (AQIs) but are not focused enough to help the everyday citizen, and often fail to include many significant pollutants. Thus, we sought to address this issue in a cost-effective way through creating a network of IoT (Internet of Things) devices and deploying them in a select area of Tempe, Arizona. We utilized Arduino Microprocessors and Wireless Radio Frequency Transceivers to send and receive air pollution data in real time. Then, displayed this data in such a way that it could be released to the public via web or mobile app. Furthermore, the product is cheap enough to be reproduced and sold in bulk as well as scaled and customized to be compatible with dozens of different air quality sensors.

ContributorsCoury, Abrahm Philip (Co-author) / Gillespie, Cody (Co-author) / Ren, Fengbo (Thesis director) / Shrivastava, Aviral (Committee member) / Computer Science and Engineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Exploring the Implementation of Multiple Partial Reconfiguration Regions to use FPGAs in Edge Computing

Description

Edge computing is an emerging field that improves upon cloud computing by moving the service from a centralized server to several de-centralized servers that are closer to the end user to decrease the latency, bandwidth, and cost requirements. Field programmable grid array (FPGA) devices are highly reconfigurable and excel in…

Edge computing is an emerging field that improves upon cloud computing by moving the service from a centralized server to several de-centralized servers that are closer to the end user to decrease the latency, bandwidth, and cost requirements. Field programmable grid array (FPGA) devices are highly reconfigurable and excel in highly parallelized tasks, making them popular in many applications including digital signal processing and cryptography, while also making them a great candidate for edge computation. The purpose of this project was to explore existing board support packages for the Arria 10 GX FPGA and propose a BSP design with multiple partial reconfiguration regions to better support the use of FPGAs in edge computing. In this project, the general OpenCL development flow was studied, OpenCL workflow for Altera/Intel FPGAs was researched, the reference OpenCL BSP was explored to understand the connections between the modules, and a customized BSP with two partial reconfiguration regions was proposed. The existing BSP was explored using the Intel Quartus Prime software suite and the block diagrams for the existing and proposed designs were created using Microsoft Visio.

ContributorsLam, Evan (Author) / Ren, Fengbo (Thesis director) / Vrudhula, Sarma (Committee member) / Computer Science and Engineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

FPGAs as an Edge Computing Solution

Description

As the Internet of Things continues to expand, not only must our computing power grow
alongside it, our very approach must evolve. While the recent trend has been to centralize our
computing resources in the cloud, it now looks beneficial to push more computing power
towards the “edge” with so called edge computing,…

As the Internet of Things continues to expand, not only must our computing power grow
alongside it, our very approach must evolve. While the recent trend has been to centralize our
computing resources in the cloud, it now looks beneficial to push more computing power
towards the “edge” with so called edge computing, reducing the immense strain on cloud
servers and the latency experienced by IoT devices. A new computing paradigm also brings
new opportunities for innovation, and one such innovation could be the use of FPGAs as edge
servers. In this research project, I learn the design flow for developing OpenCL kernels and
custom FPGA BSPs. Using these tools, I investigate the viability of using FPGAs as standalone
edge computing devices. Concluding that—although the technology is a great fit—the current
necessity of dynamically reprogrammable FPGAs to be closely coupled with a host CPU is
holding them back from this purpose. I propose a modification to the architecture of the Intel
Arria 10 GX that would allow it to be decoupled from its host CPU, allowing it to truly serve as a
viable edge computing solution.

ContributorsBarth, Brandon Albert (Author) / Ren, Fengbo (Thesis director) / Vrudhula, Sarma (Committee member) / Computer Science and Engineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created2019-05

Internet-of-Things for Pet Care

Description

The purpose of this project was to construct and write code for an Internet-of-Things (IoT) based smart pet feeder to promote owner involvement in the health of their pets through the Internet to combat the high rate of obesity in pets in the United States. To achieve this, a pet…

The purpose of this project was to construct and write code for an Internet-of-Things (IoT) based smart pet feeder to promote owner involvement in the health of their pets through the Internet to combat the high rate of obesity in pets in the United States. To achieve this, a pet feeder was developed to track the weight of the pet as well as how much food the pet eats while allowing the owner to see video of their pet eating, all through the owner's smartphone. In this thesis, current pet feeder strengths and weaknesses were researched, pet owners were surveyed, prototype features were decided upon, physical prototype components were purchased, a physical model of the pet feeder was developed in SolidWorks, a prototype was manufactured, and existing IoT libraries were utilized to develop code.

ContributorsCoote, Gabriela Phyllis (Author) / Ren, Fengbo (Thesis director) / Shrivastava, Aviral (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created2016-12

FPGA-Based Edge-Computing Acceleration

Description

The rapid growth of Internet-of-things (IoT) and artificial intelligence applications have called forth a new computing paradigm--edge computing. Edge computing applications, such as video surveillance, autonomous driving, and augmented reality, are highly computationally intensive and require real-time processing. Current edge systems are typically based on commodity general-purpose hardware such as…

The rapid growth of Internet-of-things (IoT) and artificial intelligence applications have called forth a new computing paradigm--edge computing. Edge computing applications, such as video surveillance, autonomous driving, and augmented reality, are highly computationally intensive and require real-time processing. Current edge systems are typically based on commodity general-purpose hardware such as Central Processing Units (CPUs) and Graphical Processing Units (GPUs) , which are mainly designed for large, non-time-sensitive jobs in the cloud and do not match the needs of the edge workloads. Also, these systems are usually power hungry and are not suitable for resource-constrained edge deployments. Such application-hardware mismatch calls forth a new computing backbone to support the high-bandwidth, low-latency, and energy-efficient requirements. Also, the new system should be able to support a variety of edge applications with different characteristics. This thesis addresses the above challenges by studying the use of Field Programmable Gate Array (FPGA) -based computing systems for accelerating the edge workloads, from three critical angles. First, it investigates the feasibility of FPGAs for edge computing, in comparison to conventional CPUs and GPUs. Second, it studies the acceleration of common algorithmic characteristics, identified as loop patterns, using FPGAs, and develops a benchmark tool for analyzing the performance of these patterns on different accelerators. Third, it designs a new edge computing platform using multiple clustered FPGAs to provide high-bandwidth and low-latency acceleration of convolutional neural networks (CNNs) widely used in edge applications. Finally, it studies the acceleration of the emerging neural networks, randomly-wired neural networks, on the multi-FPGA platform. The experimental results from this work show that the new generation of workloads requires rethinking the current edge-computing architecture. First, through the acceleration of common loops, it demonstrates that FPGAs can outperform GPUs in specific loops types up to 14 times. Second, it shows the linear scalability of multi-FPGA platforms in accelerating neural networks. Third, it demonstrates the superiority of the new scheduler to optimally place randomly-wired neural networks on multi-FPGA platforms with 81.1 times better throughput than the available scheduling mechanisms.

ContributorsBiookaghazadeh, Saman (Author) / Zhao, Ming (Thesis advisor) / Ren, Fengbo (Thesis advisor) / Li, Baoxin (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)

Created2021

A Methodology and Formalism to Handle Timing Uncertainties in Cyber-Physical Systems

Description

Uncertainty is intrinsic in Cyber-Physical Systems since they interact with human and work with both analog and digital worlds. Since even minute deviation from the real values can make catastrophe in a safety-critical application, considering uncertainties in CPS behavior is essential. On the other side, time is a…

Uncertainty is intrinsic in Cyber-Physical Systems since they interact with human and work with both analog and digital worlds. Since even minute deviation from the real values can make catastrophe in a safety-critical application, considering uncertainties in CPS behavior is essential. On the other side, time is a foundational aspect of Cyber-Physical Systems (CPS). Correct timing of system events is critical to optimize responsiveness to the environment, in terms of timeliness, accuracy, and precision in the knowledge, measurement, prediction, and control of CPS behavior. In order to design a more resilient and reliable CPS, first and foremost, there should be a way to specify the timing constraints that a constructed Cyber-Physical System must meet with considering existing uncertainties. Only then, we can seek systematic approaches to check if all timing constraints are being met, and develop correct-by-construction methodologies. In this regard, Timestamp Temporal Logic (TTL) is developed to specify the timing constraints on a distributed CPS. By TTL designers can specify the timing requirements that a CPS must satisfy in a succinct and intuitive manner and express the tolerable error as a part of the language. The proposed deduction system on TTL (TTL reasoning system) gives the ability to check the consistency among expresses system specifications and simplify them to be implemented on FPGA for run-time verification. Regarding CPS run-time verification, Timestamp-based Monitoring Approach(TMA) has been designed that can hook up to a CPS and take its timing specifications in TTL and verify if the timing constraints are being met with considering existing uncertainties in the system. TMA does not need to compute whether the constraint is being met at each and every instance of time but it re-evaluates constraint only when there is an event that can affect the outcome. This enables it to perform online timing monitoring of CPS for less computation and resources. Furthermore, the minimum design parameters of the timing CPS that are required to enable testing the timing of CPS are defined in this dissertation

ContributorsMehrabian, Mohammadreza (Author) / Shrivastava, Aviral (Thesis advisor) / Ren, Fengbo (Committee member) / Sarjoughian, Hessam (Committee member) / Derler, Patricia (Committee member) / Arizona State University (Publisher)

Created2021

Reduced Order Models and Approximations for Hardware Acceleration of Neural Networks

Description

Many real-world engineering problems require simulations to evaluate the design objectives and constraints. Often, due to the complexity of the system model, simulations can be prohibitive in terms of computation time. One approach to overcome this issue is to construct a surrogate model, which approximates the original model. The focus…

Many real-world engineering problems require simulations to evaluate the design objectives and constraints. Often, due to the complexity of the system model, simulations can be prohibitive in terms of computation time. One approach to overcome this issue is to construct a surrogate model, which approximates the original model. The focus of this work is on the data-driven surrogate models, in which empirical approximations of the output are performed given the input parameters. Recently neural networks (NN) have re-emerged as a popular method for constructing data-driven surrogate models. Although, NNs have achieved excellent accuracy and are widely used, they pose their own challenges. This work addresses two common challenges, the need for: (1) hardware acceleration and (2) uncertainty quantification (UQ) in the presence of input variability. The high demand in the inference phase of deep NNs in cloud servers/edge devices calls for the design of low power custom hardware accelerators. The first part of this work describes the design of an energy-efficient long short-term memory (LSTM) accelerator. The overarching goal is to aggressively reduce the power consumption and area of the LSTM components using approximate computing, and then use architectural level techniques to boost the performance. The proposed design is synthesized and placed and routed as an application-specific integrated circuit (ASIC). The results demonstrate that this accelerator is 1.2X and 3.6X more energy-efficient and area-efficient than the baseline LSTM. In the second part of this work, a robust framework is developed based on an alternate data-driven surrogate model referred to as polynomial chaos expansion (PCE) for addressing UQ. In contrast to many existing approaches, no assumptions are made on the elements of the function space and UQ is a function of the expansion coefficients. Moreover, the sensitivity of the output with respect to any subset of the input variables can be computed analytically by post-processing the PCE coefficients. This provides a systematic and incremental method to pruning or changing the order of the model. This framework is evaluated on several real-world applications from different domains and is extended for classification tasks as well.

ContributorsAzari, Elham (Author) / Vrudhula, Sarma (Thesis advisor) / Fainekos, Georgios (Committee member) / Ren, Fengbo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2021

Compiler Design for Accelerating Applications on Coarse-Grained Reconfigurable Architectures

Description

Coarse-Grained Reconfigurable Arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by CGRA relies on the efficient mapping of the compute-intensive loops by the CGRA compiler onto the CGRA. The CGRA mapping problem, being NP-complete, is performed in a two-step process, scheduling,…

Coarse-Grained Reconfigurable Arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by CGRA relies on the efficient mapping of the compute-intensive loops by the CGRA compiler onto the CGRA. The CGRA mapping problem, being NP-complete, is performed in a two-step process, scheduling, and mapping. The scheduling algorithm allocates timeslots to the nodes of the DFG, and the mapping algorithm maps the scheduled nodes onto the PEs of the CGRA. On a mapping failure, the initiation interval (II) is increased, and a new schedule is obtained for the increased II. Most previous mapping techniques use the Iterative Modulo Scheduling algorithm (IMS) to find a schedule for a given II. Since IMS generates a resource-constrained ASAP (as-soon-as-possible) scheduling, even with increased II, it tends to generate a similar schedule that is not mappable and does not explore the schedule space effectively. The problems encountered by IMS-based scheduling algorithms are explored and an improved randomized scheduling algorithm for scheduling of the application loop to be accelerated is proposed. When encountering a mapping failure for a given schedule, existing mapping algorithms either exit and retry the mapping anew, or recursively remove the previously mapped node to find a valid mapping (backtrack).Abandoning the mapping is extreme, but even backtracking may not be the best choice, since the root of the problem may not be the previous node. The challenges in existing algorithms are systematically analyzed and a failure-aware mapping algorithm is presented. The loops in general-purpose applications are often complicated loops, i.e., loops with perfect and imperfect nests and loops with nested if-then-else's (conditionals). The existing hardware-software solutions to execute branches and conditions are inefficient. A co-design approach that efficiently executes complicated loops on CGRA is proposed. The compiler transforms complex loops, maps them to the CGRA, and lays them out in the memory in a specific manner, such that the hardware can fetch and execute the instructions from the right path at runtime. Finally, a CGRA compilation simulator open-source framework is presented. This open-source CGRA simulation framework is based on LLVM and gem5 to extract the loop, map them onto the CGRA architecture, and execute them as a co-processor to an ARM CPU.

ContributorsBalasubramanian, Mahesh (Author) / Shrivastava, Aviral (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Ren, Fengbo (Committee member) / Pozzi, Laura (Committee member) / Arizona State University (Publisher)

Created2021

Vision-inspired Representation and Learning for Data-driven Signal Processing

Description

In the era of data explosion, massive data is generated from various sources at an unprecedented speed. The ever-growing amount of data reveals enormous opportunities for developing novel data-driven solutions to unsolved problems. In recent years, benefiting from numerous public datasets and advances in deep learning, data-driven approaches in the…

In the era of data explosion, massive data is generated from various sources at an unprecedented speed. The ever-growing amount of data reveals enormous opportunities for developing novel data-driven solutions to unsolved problems. In recent years, benefiting from numerous public datasets and advances in deep learning, data-driven approaches in the computer vision domain have demonstrated superior performance with high adaptability on various data and tasks. Meanwhile, signal processing has long been dominated by techniques derived from rigorous mathematical models built upon prior knowledge of signals. Due to the lack of adaptability to real data and applications, model-based methods often suffer from performance degradation and engineering difficulties. In this dissertation, multiple signal processing problems are studied from vision-inspired data representation and learning perspectives to address the major limitation on adaptability. Corresponding data-driven solutions are proposed to achieve significantly improved performance over conventional solutions. Specifically, in the compressive sensing domain, an open-source image compressive sensing toolbox and benchmark to standardize the implementation and evaluation of reconstruction methods are first proposed. Then a plug-and-play compression ratio adapter is proposed to enable the adaptability of end-to-end data-driven reconstruction methods to variable compression ratios. Lastly, the problem of transfer learning from images to bioelectric signals is experimentally studied to demonstrate the improved performance of data-driven reconstruction. In the image subsampling domain, task-adaptive data-driven image subsampling is studied to reduce data redundancy and retain information of interest simultaneously. In the semiconductor analysis domain, the data-driven automatic error detection problem is studied in the context of integrated circuit segmentation for the first time. In the light detection and ranging(LiDAR) camera calibration domain, the calibration accuracy degradation problem in low-resolution LiDAR scenarios is addressed with data-driven techniques.

ContributorsZhang, Zhikang (Author) / Ren, Fengbo (Thesis advisor) / Li, Baoxin (Committee member) / Turaga, Pavan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created2023