Matching Items (10)
Description
Despite the wealth of folk music traditions in Portugal and the importance of the clarinet in the music of bandas filarmónicas, it is uncommon to find works featuring the clarinet that draw on Portuguese folk music elements. In the interest of expanding this type of repertoire, three new works were commissioned from three different composers. The resulting works are Seres Imaginários 3 by Luís Cardoso; Delírio Barroco by Tiago Derrica; and Memória by Pedro Faria Gomes. In an effort to submit these new works for inclusion in the mainstream performance literature, the author has recorded them on compact disc. This document includes interview transcripts with each composer, providing first-person discussion of each composition, as well as detailed biographical information on each composer. To provide context, the author has included a brief discussion of Portuguese folk music and, in particular, the role that the clarinet plays in Portuguese folk music culture.
Contributors: Ferreira, Wesley (Contributor) / Spring, Robert S (Thesis advisor) / Bailey, Wayne (Committee member) / Gardner, Joshua (Committee member) / Hill, Gary (Committee member) / Schuring, Martin (Committee member) / Solis, Theodore (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
With the end of Dennard scaling and Moore's law, architects have moved toward heterogeneous designs consisting of specialized cores to achieve higher performance and energy efficiency for a target application domain. Applications of linear algebra are ubiquitous in scientific computing, machine learning, statistics, and related fields, with matrix computations fundamental to these linear algebra based solutions. Designing multiple dense (or sparse) matrix computation routines on the same platform is quite challenging. Adding to the complexity, dense and sparse matrix computations differ greatly in their storage and access patterns and are difficult to optimize on the same architecture. This thesis addresses this challenge and introduces a reconfigurable accelerator that supports both dense and sparse matrix computations efficiently.

The reconfigurable architecture has been optimized to execute the following linear algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM (Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver), LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication), and SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture in which each core consists of a 2D array of processing elements (PEs).

The 2D PE array is of size 4x4 and is scheduled to perform 4x4 matrix updates efficiently; a sequence of such updates solves a larger problem inside a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR) is used to perform sparse kernel updates. Scalable partitioning and mapping schemes are presented that map input matrices of any given size onto the multicore architecture. Design trade-offs related to the PE array dimension, the size of the local memory inside a core, and the bandwidth between on-chip memories and the cores are presented, and an optimal core configuration is derived from this analysis. Synthesis results using a 7 nm PDK show that the proposed accelerator can achieve a performance of up to 32 GOPS using a single core.
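The abstract does not detail the PBCSC/PBCSR layout, so as a rough illustration of the idea, the sketch below stores a matrix as only its nonzero 4x4 blocks in a conventional block-CSR arrangement and performs SpMV as a sequence of 4x4 block updates, the granularity a 4x4 PE array would consume per step. The layout, function names, and block size are illustrative assumptions, not the dissertation's design.

```python
import numpy as np

BLOCK = 4  # matches the 4x4 PE array described above (assumption)

def to_bcsr(dense):
    """Pack a dense matrix into a simple block-CSR layout:
    only 4x4 blocks containing at least one nonzero are kept."""
    rows, cols = dense.shape
    assert rows % BLOCK == 0 and cols % BLOCK == 0
    blocks, col_idx, row_ptr = [], [], [0]
    for bi in range(rows // BLOCK):
        for bj in range(cols // BLOCK):
            blk = dense[bi*BLOCK:(bi+1)*BLOCK, bj*BLOCK:(bj+1)*BLOCK]
            if np.any(blk):
                blocks.append(blk.copy())
                col_idx.append(bj)
        row_ptr.append(len(blocks))
    return blocks, col_idx, row_ptr

def bcsr_spmv(blocks, col_idx, row_ptr, x):
    """SpMV as a stream of 4x4 block updates; each '@' below is one
    matrix-vector step a 4x4 PE array could execute natively."""
    n_brows = len(row_ptr) - 1
    y = np.zeros(n_brows * BLOCK)
    for bi in range(n_brows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            y[bi*BLOCK:(bi+1)*BLOCK] += blocks[k] @ x[bj*BLOCK:(bj+1)*BLOCK]
    return y

# usage: a sparse 8x8 diagonal matrix times a vector of ones
A = np.diag(np.arange(1.0, 9.0))
blocks, cols, ptr = to_bcsr(A)
y = bcsr_spmv(blocks, cols, ptr, np.ones(8))
```

A dense GEMM or GEMV tiles into the same 4x4 update stream, which is what lets one reconfigurable datapath serve both the dense and sparse routines listed above.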
Contributors: Animesh, Saurabh (Author) / Chakrabarti, Chaitali (Thesis advisor) / Brunhaver, John (Committee member) / Ren, Fengbo (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
This project includes a recording and performance guide for three newly commissioned pieces for the clarinet. The first piece, shimmer, was written by Grant Jahn and is for B-flat clarinet and electronics. The second piece, Paragon, is for B-flat clarinet and piano and was composed by Dr. Theresa Martin. The third and final piece, Duality in the Eye of a Bovine, was written by Kurt Mehlenbacher and is for B-flat clarinet, bass clarinet, and piano. In addition to the performance guide, this document also includes background information and program notes for the compositions, as well as composer biographical information, a list of other works featuring the clarinet by each composer, and transcripts of composer and performer interviews. This document is accompanied by a recording of the three pieces.
Contributors: Poupard, Caitlin Marie (Author) / Spring, Robert (Thesis advisor) / Gardner, Joshua (Thesis advisor) / Hill, Gary (Committee member) / Oldani, Robert (Committee member) / Schuring, Martin (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
The primary objective of this research project is to expand the clarinet repertoire with the addition of four new pieces. Each of these new pieces uses contemporary clarinet techniques, including electronics, prerecorded sounds, multiphonics, circular breathing, multiple articulation, demi-clarinet, and the clari-flute. The repertoire composed includes Grant Jahn’s Duo for Two Clarinets, Reggie Berg’s Funkalicious for Clarinet and Piano, Rusty Banks’ Star Juice for Clarinet and Fixed Media, and Chris Malloy’s A Celestial Breath for Clarinet and Electronics. In addition to the musical commissions, this project also includes interviews with the composers discussing how they wrote these works, their influences, and any information pertinent to the performer; professional recordings of each piece; and performance notes and suggestions.
Contributors: Case-Ruchala, Celeste Ann (Contributor) / Gardner, Joshua (Thesis advisor) / Spring, Robert (Thesis advisor) / Hill, Gary (Committee member) / Rogers, Rodney (Committee member) / Schuring, Martin (Committee member) / Arizona State University (Publisher)
Created: 2016
Contributors: Gardner, Joshua (Performer) / Forsthoff, Kyle (Performer) / Novak, Gail (Pianist) (Performer) / ASU Library. Music Library (Publisher)
Created: 2006-04-14
Description
Producing brighter electron beams requires the smallest possible emittance from the cathode at the highest possible current. Several materials, such as ordered-surface, single-crystalline metal surfaces; ordered-surface, epitaxially grown high quantum efficiency alkali-antimonides; topologically non-trivial Dirac semimetals; and nano-structured confined-emission photocathodes, show promise for achieving ultra-low emittance with large currents. This work investigates the various limitations on obtaining the smallest possible emittance from photocathodes and demonstrates the performance of a novel electron gun that can utilize these photocathodes under optimal photoemission conditions.

Chapter 2 discusses the combined effect of physical roughness and work function variation on the emittance. This effect is particularly pronounced in polycrystalline materials and explains their higher-than-expected emittance when operated at the photoemission threshold. A computational method is described for estimating the simultaneous contribution of both types of roughness to the mean transverse energy. This work motivates the need for implementing ordered-surface, single-crystalline, or epitaxially grown photocathodes.

Chapter 3 investigates the effects of Coulomb interactions on electron beams from theoretically low emittance, low total energy spread nanoscale photoemission sources, specifically for electron microscopy applications. This computational work emphasizes the key role that image charge effects have on such cold, dense electron beams. Contrary to initial expectations, the primary limiter of beam brightness for theoretically ultra-low emittance photocathodes is the saturation current.

Chapters 4 and 5 describe the development and commissioning of a high accelerating gradient, cryogenically cooled electron gun and photoemission diagnostics beamline within the Arizona State University Photoemission and Bright Beams research lab. This accelerator is unique in its capability to utilize photocathodes mounted on holders typically used in commercial surface chemistry tools, and it has the necessary features and tools for operating many advanced photocathodes in their optimal regime. A pinhole scan technique implemented on the beamline has produced a full 4-dimensional phase space measurement, demonstrating the ability to measure beam brightness in this gun. This gun will allow the demonstration of ultra-high brightness from next-generation ultra-low emittance photocathodes.
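For context, a standard relation in photocathode physics (included here as a reference point, not quoted from the dissertation) connects the mean transverse energy (MTE) discussed above to the normalized emittance of the emitted beam:

```latex
% Normalized rms emittance of a photoemitted beam, in terms of the
% rms laser spot size \sigma_x and the mean transverse energy (MTE);
% m_e c^2 is the electron rest energy. Standard relation, for context.
\epsilon_{n,x} = \sigma_x \sqrt{\frac{\mathrm{MTE}}{m_e c^2}}
```

Lowering the MTE at a fixed spot size lowers the emittance directly, which is why the roughness-induced MTE contributions analyzed in Chapter 2 matter for beam brightness.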
Contributors: Gevorkyan, Gevork Samvelovich (Author) / Karkare, Siddharth (Thesis advisor) / Padmore, Howard (Committee member) / Alarcon, Ricardo (Committee member) / Kaindl, Robert (Committee member) / Graves, William (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Ultrasound B-mode imaging is an increasingly significant medical imaging modality for clinical applications. Compared to other imaging modalities like computed tomography (CT) or magnetic resonance imaging (MRI), ultrasound imaging is safe, inexpensive, and portable. While two-dimensional (2-D) ultrasound imaging is very popular, three-dimensional (3-D) ultrasound imaging provides distinct advantages over its 2-D counterpart by providing volumetric imaging, which leads to more accurate analysis of tumors and cysts. However, the amount of data received at the front end of a 3-D system is extremely large, making it impractical for power-constrained portable systems.

In this thesis, algorithm and hardware design techniques to support a hand-held 3-D ultrasound imaging system are proposed. Synthetic aperture sequential beamforming (SASB) is chosen since its computations can be split into two stages, where the output generated by Stage 1 is significantly smaller in size than its input. This characteristic enables Stage 1 to be computed in the front end while the Stage 2 data can be sent out to be processed elsewhere.

The contributions of this thesis are as follows. First, 2-D SASB is extended to 3-D. Techniques to increase the volume rate of 3-D SASB through a new multi-line firing scheme and the use of a linear chirp as the excitation waveform are presented. A new sparse array design that reduces the number of active transducers while avoiding the imaging degradation caused by grating lobes is proposed. A combination of these techniques increases the volume rate of 3-D SASB by 4× without introducing extra computations at the front end.

Next, algorithmic techniques to further reduce the Stage 1 computations in the front end are presented. These include reducing the number of distinct apodization coefficients and operating on narrow-bit-width fixed-point data. A 3-D die-stacked architecture is designed for the front end. This highly parallel architecture enables the signals received by 961 active transducers to be digitized, routed by a network-on-chip, and processed in parallel. The processed data are accumulated through a bus-based structure. This architecture is synthesized using the TSMC 28 nm technology node, and the estimated power consumption of the front end is less than 2 W.

Finally, the Stage 2 computations are mapped onto a reconfigurable multi-core architecture, TRANSFORMER, which supports different types of on-chip memory banks and run-time reconfigurable connections between general processing elements and memory banks. The matched filtering step and the beamforming step in Stage 2 are mapped onto TRANSFORMER with different memory configurations. Gem5 simulations show that the private cache mode yields shorter execution time and higher computation efficiency than the other cache modes. The overall execution time for Stage 2 is 14.73 ms. The average power consumption and the average giga-operations-per-second per Watt in the 14 nm technology node are 0.14 W and 103.84, respectively.
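For readers unfamiliar with SASB, Stage 1 amounts to fixed-focus delay-and-sum beamforming with apodization; the sketch below illustrates that first stage under simplified geometry (a single emission, receive focus on the array axis, nearest-sample delays). The function name, sampling rate, sound speed, and Hann window are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def sasb_stage1_line(rf, elem_x, focal_depth, c=1540.0, fs=40e6):
    """Fixed-focus delay-and-sum for one emission. The fixed receive
    focus acts as a virtual source that Stage 2 later refocuses.
    rf          : (n_elem, n_samp) received RF traces
    elem_x      : (n_elem,) lateral element positions [m], 0 at center
    focal_depth : receive focal depth [m] on the array axis"""
    n_elem, n_samp = rf.shape
    apod = np.hanning(n_elem)              # receive apodization window
    # extra path length of each element relative to the array center,
    # for a focus straight ahead at (0, focal_depth)
    path = np.sqrt(elem_x**2 + focal_depth**2)
    shift = np.round((path - focal_depth) / c * fs).astype(int)
    line = np.zeros(n_samp)
    for e in range(n_elem):
        s = shift[e]
        # advance trace e by s samples so focal echoes align, then sum
        line[:n_samp - s] += apod[e] * rf[e, s:]
    return line
```

Because every emission collapses to a single focused line, the data volume leaving the front end drops sharply, which is the property exploited when Stage 1 stays on the probe and Stage 2 is offloaded.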
Contributors: Zhou, Jian (Author) / Chakrabarti, Chaitali (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Wenisch, Thomas F. (Committee member) / Ogras, Umit Y. (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Recent advances in Deep Learning (DL) have demonstrated its potential to surpass, or come close to, human-level performance across multiple domains. Consequently, there is a rising demand to deploy state-of-the-art DL algorithms, e.g., Deep Neural Networks (DNNs), in real-world applications to free human labor from repetitive work. On the one hand, the impressive performance achieved by DNNs is normally accompanied by intensive memory and power usage, due to enormous model sizes and high computational workloads, which significantly hampers their deployment on resource-limited cyber-physical systems and edge devices. Thus, the urgent demand for enhancing the inference efficiency of DNNs has attracted great research interest across various communities. On the other hand, scientists and engineers still have insufficient knowledge of the principles of DNNs, which causes them mostly to be treated as black boxes. Under such circumstances, the DNN is like "the sword of Damocles": its security and fault-tolerance capability are essential concerns that cannot be circumvented.

Motivated by the aforementioned concerns, this dissertation comprehensively investigates the emerging efficiency and security issues of DNNs from both software and hardware design perspectives. From the efficiency perspective, model compression via quantization is elaborated as the foundational technique for efficient inference of a target DNN. To maximize the inference performance boost, the deployment of quantized DNNs on a revolutionary Computing-in-Memory-based neural accelerator is presented in a cross-layer (device/circuit/system) fashion. From the security perspective, the well-known adversarial attack is investigated, spanning from its original input-attack form (a.k.a. adversarial example generation) to its parameter-attack variant.
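As a concrete, deliberately generic instance of the model compression via quantization mentioned above, the sketch below maps a weight tensor to low-bit signed integers with a single per-tensor scale. The bit-width, symmetric scaling rule, and names are illustrative assumptions, not the author's exact scheme.

```python
import numpy as np

def uniform_quantize(w, n_bits=4):
    """Uniform symmetric quantization: float weights -> signed
    integer codes in [-qmax, qmax] plus one per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1                 # e.g. 7 for 4 bits
    scale = np.max(np.abs(w)) / qmax             # per-tensor step size
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    """Recover approximate float weights for accuracy evaluation."""
    return q.astype(np.float32) * scale

# usage: quantize one layer's weights and check the reconstruction error
w = np.random.randn(256, 256).astype(np.float32)
q, s = uniform_quantize(w, n_bits=4)
mean_err = np.abs(w - dequantize(q, s)).mean()
```

Shrinking weights to a few bits is also what makes the Computing-in-Memory deployment attractive: low-bit integer codes map naturally onto memory-array arithmetic.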
Contributors: He, Zhezhi (Author) / Fan, Deliang (Thesis advisor) / Chakrabarti, Chaitali (Committee member) / Cao, Yu (Committee member) / Seo, Jae-Sun (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Over the past decades, the amount of data required to be processed and analyzed by computing systems has been increasing dramatically, toward exascale (10^18 bytes/s or ops). However, modern computing platforms cannot deliver both energy-efficient and high-performance computing solutions, leading to a gap between what applications need and what platforms can deliver, especially in resource-constrained Internet of Things (IoT) devices. Unfortunately, this gap will keep widening, mainly due to limitations in both devices and architectures. With this motivation, this dissertation focuses on cross-layer (device/circuit/architecture/application) co-design of energy-efficient and high-performance Processing-in-Memory (PIM) platforms for implementing complex big data applications, i.e., deep learning, bioinformatics, graph processing tasks, and data encryption. The dissertation shows how to leverage innovations from device, circuit, and architecture to integrate memory and logic, break the existing memory and power walls, and dramatically increase the computing efficiency of today’s non-Von-Neumann computing systems.

The proposed PIM platforms transform current volatile and non-volatile random access memory arrays into computational units capable of working as both memory and low-area-overhead, massively parallel, fast, reconfigurable in-memory logic. Instead of integrating complex logic units into cost-sensitive memory, the explored designs exploit hardware-friendly bit-line computing methods to implement complete Boolean logic functions between operands within a memory array in a reduced clock cycle, overcoming the multi-cycle logic issue in modern PIM platforms. In addition, new customized in-memory algorithms and mapping methods are developed to convert the crucial, iteratively used functions of big data applications to bit-wise, PIM-supported logic.

To quantitatively analyze the performance of various PIM platforms running big data applications, a generic and comprehensive evaluation framework is presented. The overall system computing performance (throughput, latency, energy efficiency) for each application is explored through the developed framework. Device-to-algorithm co-simulation results on neural network acceleration demonstrate that the proposed platforms can obtain 36.8× higher energy efficiency and 22× speed-up compared to a state-of-the-art Graphics Processing Unit (GPU). In accelerating bioinformatics tasks such as biological sequence alignment, the presented PIM designs achieve ~2×, 43.8×, and 458× more throughput per Watt compared to state-of-the-art Application-Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA), and GPU platforms, respectively.
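The bit-line computing style can be illustrated in software: activating two memory rows together yields an element-wise Boolean result across the entire row in a single array cycle, and iterating those Boolean primitives builds arithmetic. The sketch below is a generic emulation of that style under assumptions of its own (packed unsigned operands, AND/XOR primitives); it is not the dissertation's circuit design.

```python
import numpy as np

def pim_row_op(row_a, row_b, op):
    """Emulate one in-array cycle: a bulk Boolean operation between
    two memory rows, applied across every bit of every word at once."""
    ops = {"and": np.bitwise_and, "or": np.bitwise_or,
           "xor": np.bitwise_xor}
    return ops[op](row_a, row_b)

def pim_add(row_a, row_b):
    """Bulk addition built only from the in-array primitives above,
    propagating carries with repeated AND/XOR cycles."""
    a, b = row_a.copy(), row_b.copy()
    while b.any():
        carry = pim_row_op(a, b, "and") << 1   # one AND cycle + shift
        a = pim_row_op(a, b, "xor")            # one XOR cycle
        b = carry
    return a

# usage: add two "rows" of packed operands without any + operator
row1 = np.array([3, 10, 255], dtype=np.uint32)
row2 = np.array([5, 22, 1], dtype=np.uint32)
assert np.array_equal(pim_add(row1, row2), row1 + row2)
```

The appeal is that the cost of one Boolean cycle is independent of how many operands sit on the row, which is where the massive parallelism claimed above comes from.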
Contributors: Angizi, Shaahin (Author) / Fan, Deliang (Thesis advisor) / Seo, Jae-Sun (Committee member) / Awad, Amro (Committee member) / Zhang, Wei (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
The holy grail of computer hardware across all market segments has been to sustain performance improvement at the same pace as silicon technology scales. As the technology scales and the size of transistors shrinks, the power consumption and energy usage per transistor decrease. On the other hand, the transistor density increases significantly by technology scaling. Due to technology factors, the reduction in power consumption per transistor is not sufficient to offset the increase in power consumption per unit area. Therefore, to improve performance, increasing energy-efficiency must be addressed at all design levels from circuit level to application and algorithm levels.

At architectural level, one promising approach is to populate the system with hardware accelerators each optimized for a specific task. One drawback of hardware accelerators is that they are not programmable. Therefore, their utilization can be low as they perform one specific function. Using software programmable accelerators is an alternative approach to achieve high energy-efficiency and programmability. Due to intrinsic characteristics of software accelerators, they can exploit both instruction level parallelism and data level parallelism.

A Coarse-Grained Reconfigurable Architecture (CGRA) is a software programmable accelerator consisting of a number of word-level functional units. Motivated by the promising characteristics of software programmable accelerators, the potential of CGRAs in future computing platforms is studied and an end-to-end CGRA research framework is developed. This framework covers three different aspects: CGRA architectural design, integration in a computing system, and the CGRA compiler. First, the design and implementation of a CGRA and its instruction set are presented. This design is then modeled in a cycle-accurate system simulator. The simulation platform enables the investigation of several problems associated with a CGRA when it is deployed as an accelerator in a computing system. Next, the problem of mapping a compute-intensive region of a program to a CGRA is formulated. From this formulation, several efficient algorithms are developed that utilize the CGRA's scarce resources effectively to minimize the running time of input applications. Finally, these mapping algorithms are integrated in a compiler framework to construct a compiler for CGRAs.
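The mapping problem formulated above (placing the operations of a compute-intensive region onto a grid of word-level functional units) can be sketched with a toy greedy heuristic: place each operation on a free PE adjacent to one of its producers so routes stay short. The grid size, names, and heuristic are illustrative assumptions; the dissertation develops far more capable algorithms.

```python
GRID = 4  # hypothetical 4x4 PE array

def map_dfg(ops, deps):
    """Greedy placement of a dataflow graph onto the PE grid.
    ops  : operation names in topological order
    deps : op -> list of predecessor ops
    Returns op -> (row, col); assumes the graph fits on the grid."""
    placement = {}
    free = {(r, c) for r in range(GRID) for c in range(GRID)}

    def free_neighbors(rc):
        r, c = rc
        cand = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
        return [p for p in cand if p in free]

    for op in ops:
        preds = [placement[p] for p in deps.get(op, [])]
        # prefer a free PE next to a producer; otherwise any free PE
        slots = [s for p in preds for s in free_neighbors(p)]
        spot = slots[0] if slots else sorted(free)[0]
        placement[op] = spot
        free.discard(spot)
    return placement

# usage: map the expression tree (a * b) + c onto the grid
print(map_dfg(["a", "b", "mul", "c", "add"],
              {"mul": ["a", "b"], "add": ["mul", "c"]}))
```

Real CGRA mappers must additionally schedule operations in time and share PEs across loop iterations, which is why the thesis treats mapping as a formal optimization problem rather than a one-shot placement.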
Contributors: Hamzeh, Mahdi (Author) / Vrudhula, Sarma (Thesis advisor) / Gopalakrishnan, Kailash (Committee member) / Shrivastava, Aviral (Committee member) / Wu, Carole-Jean (Committee member) / Arizona State University (Publisher)
Created: 2015