
Scratchpad Management in Software Managed Manycore Architectures

Description

Caches have long been used to reduce memory access latency. However, the increased complexity of cache coherence brings significant challenges in processor design as the number of cores increases. While making caches scalable is still an important research problem, some researchers are exploring a more power-efficient form of SRAM called scratchpad memory (SPM). SPMs occupy significantly less area and consume less energy per access than caches, and therefore make the design of on-chip memories much simpler. Unlike caches, which fetch data from memory automatically, an SPM requires explicit instructions for data transfers. SPM-only architectures are thus called software managed manycore (SMM) architectures, since their data movement relies on software. SMM processors have been widely used in different areas, such as embedded computing, network processing, and even high performance computing. While SMM processors provide a low-power platform, the hardware alone does not guarantee power efficiency if applications on such processors deliver low performance. Efficient software techniques are therefore required. A large body of management techniques for SMM architectures is compiler-directed, as inserting data movement operations by hand forces programmers to trace the flow of data, which can be error-prone and sometimes difficult if not impossible. This thesis develops compiler-directed techniques to manage data transfers efficiently for embedded applications on SMMs. The techniques analyze the program to find the proper program points and insert data movement instructions accordingly. The techniques manage the code, stack and heap data of applications, and reduce execution time by 14%, 52% and 80% respectively compared to their predecessors on typical embedded applications. On top of managing local data, a technique is also developed for shared data in SMM architectures. Experimental results show it achieves more than a 2X speedup over the previous technique on average.

Contributors: Cai, Jian (Author) / Shrivastava, Aviral (Thesis advisor) / Wu, Carole (Committee member) / Ren, Fengbo (Committee member) / Dasgupta, Partha (Committee member) / Arizona State University (Publisher)

Created: 2017

Algorithm Architecture Co-design for Dense and Sparse Matrix Computations

Description

With the end of Dennard scaling and Moore's law, architects have moved towards heterogeneous designs consisting of specialized cores to achieve higher performance and energy efficiency for a target application domain. Applications of linear algebra are ubiquitous in scientific computing, machine learning, statistics, and other fields, with matrix computations being fundamental to these linear algebra based solutions. Designing multiple dense (or sparse) matrix computation routines on the same platform is quite challenging. Adding to the complexity, dense and sparse matrix computations differ greatly in their storage and access patterns and are difficult to optimize on the same architecture. This thesis addresses this challenge and introduces a reconfigurable accelerator that supports both dense and sparse matrix computations efficiently.

The reconfigurable architecture has been optimized to execute the following linear algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM (Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver), LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication), and SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where each core consists of a 2D array of processing elements (PEs).

The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 sized matrix updates efficiently. A sequence of such updates is used to solve a larger problem inside a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR) is used to perform sparse kernel updates. Scalable partitioning and mapping schemes are presented that map input matrices of any given size to the multicore architecture. Design trade-offs related to the PE array dimension, the size of local memory inside a core, and the bandwidth between on-chip memories and the cores are presented. An optimal core configuration is developed from this analysis. Synthesis results using a 7nm PDK show that the proposed accelerator can achieve a performance of up to 32 GOPS using a single core.
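
The idea of solving a larger problem as a sequence of 4x4 updates can be illustrated in software with a blocked matrix multiplication. The sketch below is only an illustration under stated assumptions: it handles square dense matrices stored as nested lists, the names `TILE`, `tile_update` and `blocked_gemm` are hypothetical, and it does not model the accelerator's actual scheduling, its PE interconnect, or the PBCSC/PBCSR sparse format.

```python
TILE = 4  # PE array dimension stated in the abstract


def tile_update(C, A, B, i0, j0, k0, n):
    """Accumulate one TILE x TILE update of C from a tile of A and a tile of B.

    Edge tiles are clipped to the matrix size n, so n need not be a
    multiple of TILE.
    """
    for i in range(i0, min(i0 + TILE, n)):
        for j in range(j0, min(j0 + TILE, n)):
            acc = 0.0
            for k in range(k0, min(k0 + TILE, n)):
                acc += A[i][k] * B[k][j]
            C[i][j] += acc


def blocked_gemm(A, B):
    """Compute C = A @ B for square n x n matrices as a sequence of tile updates."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, TILE):          # tile row of C
        for j0 in range(0, n, TILE):      # tile column of C
            for k0 in range(0, n, TILE):  # tiles along the shared dimension
                tile_update(C, A, B, i0, j0, k0, n)
    return C
```

Each call to `tile_update` corresponds to one small update of the kind a 4x4 PE array could perform; the three outer loops are one possible ordering of those updates, chosen here for readability rather than for data reuse.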

Contributors: Animesh, Saurabh (Author) / Chakrabarti, Chaitali (Thesis advisor) / Brunhaver, John (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)

Created: 2018

Distortion Robust Biometric Recognition

Description

Information forensics and security have come a long way in just a few years thanks to the recent advances in biometric recognition. The main challenge remains a proper design of a biometric modality that can be resilient to unconstrained conditions, such as quality distortions. This work presents a solution to face and ear recognition under unconstrained visual variations, with a main focus on recognition in the presence of blur, occlusion and additive noise distortions.

First, the dissertation addresses the problem of scene variations in the presence of blur, occlusion and additive noise distortions resulting from capture, processing and transmission. Despite their excellent performance, "deep" methods are susceptible to visual distortions, which significantly reduce their performance. Sparse representations, on the other hand, have shown great potential in handling problems such as occlusion and corruption. In this work, an augmented SRC (ASRC) framework is presented to improve the performance of the Sparse Representation Classifier (SRC) in the presence of blur, additive noise and block occlusion, while preserving its robustness to scene dependent variations. Different feature types are considered in the performance evaluation, including raw image pixels, HoG and deep learning VGG-Face features. The proposed ASRC framework is shown to outperform the conventional SRC in terms of recognition accuracy, as well as other existing sparse-based methods and blur invariant methods at medium to high levels of distortion, particularly when used with discriminative features.

In order to assess the quality of features in improving both the sparsity of the representation and the classification accuracy, a feature sparse coding and classification index (FSCCI) is proposed and used for feature ranking and selection within both the SRC and ASRC frameworks.

The second part of the dissertation presents a method for unconstrained ear recognition using deep learning features. The unconstrained ear recognition is performed using transfer learning with deep neural networks (DNNs) as a feature extractor followed by a shallow classifier. Data augmentation is used to improve the recognition performance by augmenting the training dataset with image transformations. The recognition performance of the feature extraction models is compared with an ensemble of fine-tuned networks. The results show that, in the case where long training time is not desirable or a large amount of data is not available, the features from pre-trained DNNs can be used with a shallow classifier to give a comparable recognition accuracy to the fine-tuned networks.

Contributors: Mounsef, Jinane (Author) / Karam, Lina (Thesis advisor) / Papandreou-Suppapola, Antonia (Committee member) / Li, Baoxin (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)

Created: 2018

The Effectiveness of Inhibition and Biofilm Disruption on Antibiotic Resistant E. coli

Description

The purpose of this study was to observe the effectiveness of the phenylalanyl arginine β-naphthylamide dihydrochloride inhibitor and Tween 20 when combined with an antibiotic against Escherichia coli. As antibiotic resistance becomes more and more prevalent, it is necessary to think outside the box and do more than just increase the dosage of currently prescribed antibiotics. This study attempted to combat two forms of antibiotic resistance. The first is the AcrAB efflux pump, which is able to pump antibiotics out of the cell. The second is the biofilms that E. coli can form. With an inhibitor present, the pump should be unable to expel the antibiotic from the cell. Tween, in turn, allows biofilm formation to be disrupted or an existing biofilm to be dissolved. By combining these two chemicals with an antibiotic that the efflux pump is known to expel, low concentrations of each chemical should result in an equivalent or greater effect on bacteria compared to any one chemical in higher concentrations. To test this hypothesis, a 96 well plate BEC screen test was performed. A range of antibiotics were used at various concentrations and with varying concentrations of both Tween and the inhibitor to find a starting point. Following this, Erythromycin and Ciprofloxacin were picked as the best candidates and the optimum ranges of the antibiotic, Tween, and inhibitor were established. Finally, all three chemicals were combined to observe the effects they had together as opposed to individually or in pairs. From the results of this experiment several conclusions were made. First, the inhibitor did in fact increase the effectiveness of the antibiotic, as less antibiotic was needed if the inhibitor was present. Second, Tween showed an ability to prevent recovery in the MBEC reading, showing that it has the ability to disrupt or dissolve biofilms. However, Tween also noticeably decreased the effectiveness of the overall treatment. The inhibitor could not compensate for this negative interaction, and so the hypothesis was rejected: combining the three chemicals led to a less effective treatment method.

Contributors: Petrovich Flynn, Chandler James (Author) / Misra, Rajeev (Thesis director) / Bean, Heather (Committee member) / Perkins, Kim (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Effects of Diffusers with and without Vortex Generators on Overall Flow and Velocity Distribution

Description

This work presents an in-depth analysis of how vortex generators affect the boundary layer separation that occurs when an internal flow passes through a diffuser. By understanding the effects vortex generators have on the boundary layer, they can be utilized to improve the performance and efficiency of diffusers and other internal flow applications. An experiment was constructed to acquire physical data that could assess the change in performance of the diffusers once vortex generators were applied. The experiment consisted of pushing air through rectangular diffusers with half angles of 10, 20, and 30 degrees. A velocity distribution model was created for each diffuser, first without vortex generators and then with vortex generators applied. This allowed the two results to be compared directly and the improvements to be quantified, using the velocity distribution model to find the partial mass flow rate through the outer portion of the diffuser's cross-sectional area. The analysis concluded that the vortex generators noticeably increased the performance of the diffusers. This was best seen in the performance of the 30-degree diffuser. Initially the diffuser experienced airflow velocities near zero towards the edges, so that only 0.18% of the mass flow rate occurred in the outer one-fourth portion of the cross-sectional area. With the application of vortex generators, this percentage increased to 5.7%. The 20-degree diffuser improved from 2.5% to 7.9% of the total mass flow rate in the outer portion and the 10-degree diffuser improved from 11.9% to 19.2%. These results demonstrate an increase in performance from the addition of vortex generators and leave room for further investigation into improvements through the design and configuration of the vortex generators.
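
The outer-portion metric described above can be approximated from a discretized velocity distribution. The following is a minimal sketch, not the thesis's model: it assumes constant air density (so mass flow through a cell is proportional to velocity times cell area), and the function name `outer_mass_flow_fraction` and the cell layout are hypothetical.

```python
def outer_mass_flow_fraction(velocities, areas, is_outer):
    """Fraction of the total mass flow that passes through the cells flagged as outer.

    velocities: normal velocity in each cell of the outlet cross-section
    areas:      area of each cell, in matching order
    is_outer:   True for cells that belong to the outer portion of the area
    Density is assumed constant, so it cancels out of the ratio.
    """
    if not (len(velocities) == len(areas) == len(is_outer)):
        raise ValueError("inputs must have the same length")
    total = sum(v * a for v, a in zip(velocities, areas))
    if total == 0:
        raise ValueError("total mass flow is zero")
    outer = sum(v * a for v, a, flag in zip(velocities, areas, is_outer) if flag)
    return outer / total
```

For example, four equal-area cells with velocities 1, 3, 3, 1 and the two end cells flagged as outer give a fraction of 2/8 = 0.25; a separated flow with near-zero edge velocities would drive this fraction toward zero.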

Contributors: Sanchez, Zachary Daniel (Author) / Takahashi, Timothy (Thesis director) / Herrmann, Marcus (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / W.P. Carey School of Business (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Design of an Electrically Driven Centrifugal Pump for Hybrid Sounding Rocket Applications

Description

The objective of this project was to design an electrically driven centrifugal pump for the Daedalus Astronautics @ASU hybrid rocket engine (HRE). The pump design was purposefully simplified due to time, fabrication, calculation, and capability constraints, which resulted in a lower fidelity design that can be improved later. The impeller, shroud, volute, shaft, motor, and ESC were the main focuses of the pump assembly, while the seals, bearings, lubrication methods, and flow path connections were identified as elements that will require future attention. The resulting pump design is intended to be used on the Daedalus Astronautics HRE test cart for design verification. Trade studies and more detailed analyses should be performed before this pump is integrated into the Daedalus Astronautics flight-ready HRE.

Contributors: Shillingburg, Ryan Carl (Author) / White, Daniel (Thesis director) / Brunacini, Lauren (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

A Space Elevator to Mars: Calculating Space Flight Trajectories

Description

Human habitation of other planets requires both cost-effective transportation and low time-of-flight for human passengers and critical supplies. The current methods for interplanetary orbital transfers, such as the Hohmann transfer, require either expensive, high fuel maneuvers or extended space travel. However, by utilizing the high velocities of a super-geosynchronous space elevator, spacecraft released from an apex anchor could achieve interplanetary transfers with minimal Delta V fuel and time of flight requirements. By using Lambert’s Problem and Free Release propagation to determine the minimal fuel transfer from a terrestrial space elevator to Mars under a variety of initial conditions and time-of-flight constraints, this paper demonstrates that the use of a space elevator release can address both needs by dramatically reducing the time-of-flight and the fuel budget.
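
The reason a release point above geosynchronous altitude is attractive can be seen from a back-of-the-envelope comparison of speeds. The sketch below is not the paper's Lambert or Free Release method; it only compares the tangential speed of a point co-rotating with Earth against the local circular and escape speeds, using standard two-body formulas. The function names are hypothetical, and any particular apex anchor radius passed in is an assumption, since the abstract does not state one.

```python
import math

MU_EARTH = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
SIDEREAL_DAY = 86164.0905   # Earth's rotation period, s
OMEGA = 2 * math.pi / SIDEREAL_DAY  # Earth's rotation rate, rad/s


def release_speed(radius_m):
    """Inertial tangential speed of a point co-rotating with Earth at this radius."""
    return OMEGA * radius_m


def circular_speed(radius_m):
    """Speed of a circular orbit at this radius."""
    return math.sqrt(MU_EARTH / radius_m)


def escape_speed(radius_m):
    """Local escape speed from Earth at this radius."""
    return math.sqrt(2 * MU_EARTH / radius_m)


def geosynchronous_radius():
    """Radius at which the co-rotating speed equals the circular orbit speed."""
    return (MU_EARTH / OMEGA ** 2) ** (1.0 / 3.0)
```

Because the co-rotating speed grows linearly with radius while the escape speed falls with its square root, a payload released far enough above geosynchronous radius already exceeds Earth escape speed without burning fuel. For a hypothetical apex radius of 100,000 km from Earth's center, `release_speed(1.0e8)` is well above `escape_speed(1.0e8)`.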

Contributors: Torla, James (Author) / Peet, Matthew (Thesis director) / Swan, Peter (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2020-05

Evaluation of an Original Design for a Cost-Effective Wheel-Mounted Dynamometer for Road Vehicles

Description

This thesis evaluates the viability of an original design for a cost-effective wheel-mounted dynamometer for road vehicles. The goal is to show whether or not a device that generates torque and horsepower curves by processing accelerometer data collected at the edge of a wheel can yield results that are comparable to results obtained using a conventional chassis dynamometer. Torque curves were generated via the experimental method under a variety of circumstances and also obtained professionally by a precision engine testing company. Metrics were created to measure the precision of the experimental device's ability to consistently generate torque curves and also to compare the similarity of these curves to the professionally obtained torque curves. The results revealed that although the test device does not quite provide the same level of precision as the professional chassis dynamometer, it does create torque curves that closely resemble the chassis dynamometer torque curves and exhibit a consistency between trials comparable to the professional results, even on rough road surfaces. The results suggest that the test device provides enough accuracy and precision to satisfy the needs of most consumers interested in measuring their vehicle's engine performance but probably lacks the level of accuracy and precision needed to appeal to professionals.
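
How torque and horsepower could be derived from wheel-edge acceleration can be sketched with basic rigid-body relations. This is an idealized illustration, not the thesis's processing pipeline: it assumes rolling without slip, ignores drag, drivetrain losses, gravity and centripetal components in the accelerometer signal, and treats the whole vehicle mass as accelerated by the wheel torque. All function names and parameters are hypothetical.

```python
WATTS_PER_HP = 745.7  # approximate watts in one mechanical horsepower


def angular_acceleration(tangential_accel, wheel_radius):
    """Angular acceleration (rad/s^2) from tangential acceleration at the wheel edge."""
    return tangential_accel / wheel_radius


def wheel_torque(vehicle_mass, tangential_accel, wheel_radius):
    """Idealized total wheel torque (N*m): force m*a applied at the wheel radius.

    Assumes no slip, so the vehicle's linear acceleration equals the
    tangential acceleration at the tread.
    """
    return vehicle_mass * tangential_accel * wheel_radius


def power_hp(torque_nm, angular_velocity):
    """Mechanical power in horsepower from torque (N*m) and angular velocity (rad/s)."""
    return torque_nm * angular_velocity / WATTS_PER_HP
```

Evaluating `wheel_torque` and `power_hp` at each sample of a run, and plotting them against rotational speed, would give curves of the same kind as a dynamometer's torque and horsepower curves, subject to the simplifications listed above.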

Contributors: King, Michael (Author) / Ren, Yi (Thesis director) / Spanias, Andreas (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Mechanics of Origami Inspired Structures

Description

This research project tests the structural properties of a 3D printed origami inspired structure and compares them with those of a standard honeycomb structure. The models have equal face areas, model heights, and overall volume, but different wall thicknesses. Stress-deformation curves were developed from static loading tests. The area under these curves was used to calculate the toughness of the structures. The curves were analyzed to see which structure takes more load and which deforms more before fracture. Stress-strain plots were also produced. Using 3D printed parts in tough resin printed with a Stereolithography (SLA) printer, the origami inspired structure withstood a larger load, exhibited greater toughness and deformed more before failure than the equivalent honeycomb structure.
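
Computing toughness as the area under a measured curve amounts to numerical integration. The sketch below uses the trapezoidal rule on sampled points; the function name `toughness` is hypothetical, and the choice of the trapezoidal rule is an assumption, since the abstract does not say how the area was evaluated.

```python
def toughness(deformation, stress):
    """Area under a sampled stress-deformation (or stress-strain) curve.

    Uses the trapezoidal rule between consecutive samples. The units of
    the result are the product of the two input units.
    """
    if len(deformation) != len(stress) or len(deformation) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    area = 0.0
    for i in range(1, len(deformation)):
        width = deformation[i] - deformation[i - 1]
        mean_height = 0.5 * (stress[i] + stress[i - 1])
        area += mean_height * width
    return area
```

Applying the same function to the curves of both structures, sampled up to the point of fracture, gives directly comparable toughness values.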

Contributors: McGregor, Alexander (Author) / Jiang, Hanqing (Thesis director) / Kingsbury, Dallas (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Building Management System Integration: Energy Data Analytics

Description

This paper describes research done to quantify two relationships: that between external air temperature and energy consumption, and that between internal air temperature and energy consumption. The study was conducted on a LEED Gold certified building, College Avenue Commons, located on Arizona State University's Tempe campus. It includes background on previous studies in the area, some that agree with the research hypotheses and some that take a different path. Real-time data was collected hourly for energy consumption and external air temperature. Internal air temperature was collected intermittently by the undergraduate researcher, Charles Banke. Regression analysis was used to test the two research hypotheses. The authors found no correlation between external air temperature and energy consumption, nor did they find a relationship between internal air temperature and energy consumption. This paper also includes recommendations for future work to improve the study.
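
A regression of this kind can be sketched as an ordinary least-squares line plus a Pearson correlation coefficient. The example below is a generic illustration, not the authors' analysis: the function name `linear_fit` is hypothetical, and it assumes paired hourly samples of temperature and energy consumption in two equal-length lists.

```python
import math


def linear_fit(x, y):
    """Least-squares slope and intercept of y on x, plus Pearson's r.

    Returns (slope, intercept, r). If y is constant, r is reported as 0.0.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x has no variation, slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r
```

A value of r near zero, as the study reports for both temperature variables, means a straight line explains little of the variation in energy consumption.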

Contributors: Banke, Charles Michael (Author) / Chong, Oswald (Thesis director) / Parrish, Kristen (Committee member) / Mechanical and Aerospace Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05