Accelerating Linear Algebra and Machine Learning Kernels on a Massively Parallel Reconfigurable Architecture

Description

This thesis presents efficient implementations of several linear algebra kernels, machine learning kernels, and a neural-network-based recommender systems engine on a massively parallel reconfigurable architecture, Transformer. The linear algebra kernels include Triangular Matrix Solver (TRSM), LU Decomposition (LUD), QR Decomposition (QRD), and Matrix Inversion. The machine learning kernels include an LSTM (Long Short-Term Memory) cell and a GRU (Gated Recurrent Unit) cell used in recurrent neural networks. The neural-network-based recommender systems engine consists of multiple kernels, including fully connected layers, an embedding layer, 1-D batchnorm, the Adam optimizer, etc.
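For orientation, the kind of computation a triangular solver such as TRSM performs can be sketched in a few lines. The following is a plain sequential reference in Python for a single right-hand side, not the parallel Transformer implementation described in the thesis; the function name `solve_lower_triangular` and the sample matrix are assumptions made for illustration.

```python
def solve_lower_triangular(L, b):
    """Solve L x = b by forward substitution, where L is lower triangular."""
    n = len(L)
    x = [0.0] * n
    for i in range(n):
        # Remove the contributions of the unknowns already solved (x[0..i-1]).
        acc = b[i]
        for j in range(i):
            acc -= L[i][j] * x[j]
        if L[i][i] == 0:
            raise ZeroDivisionError("singular triangular matrix")
        x[i] = acc / L[i][i]
    return x


# Illustrative 3x3 system (values chosen for this sketch, not from the thesis).
L = [[2.0, 0.0, 0.0],
     [1.0, 3.0, 0.0],
     [4.0, 5.0, 6.0]]
b = [2.0, 7.0, 32.0]
x = solve_lower_triangular(L, b)
print(x)  # [1.0, 2.0, 3.0]
```

Each row depends on the results of all earlier rows, which is one reason data sharing between processing elements matters for this kernel.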

Transformer is a massively parallel reconfigurable multicore architecture designed at the University of Michigan. The Transformer configuration considered here has 4 tiles and 16 General Processing Elements (GPEs) per tile. It supports a two-level cache hierarchy in which the L1 and L2 caches can each operate in shared (S) or private (P) mode. The architecture was modeled using Gem5, and cycle-accurate simulations were run to evaluate performance in terms of execution time, giga-operations per second per Watt (GOPS/W), and giga-floating-point-operations per second per Watt (GFLOPS/W).
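As a rough illustration of the efficiency metric, GFLOPS/W is the floating-point throughput in units of 10^9 operations per second divided by the average power in Watts. The sketch below assumes the operation count, run time, and power are already known from a simulation; the function name and the sample numbers are hypothetical and are not results from the thesis.

```python
def gflops_per_watt(flop_count, seconds, watts):
    """Return energy efficiency in GFLOPS/W.

    flop_count: total floating-point operations executed
    seconds:    execution time of the kernel
    watts:      average power drawn during execution
    """
    if seconds <= 0 or watts <= 0:
        raise ValueError("seconds and watts must be positive")
    gflops = flop_count / seconds / 1e9  # throughput in GFLOPS
    return gflops / watts


# Hypothetical numbers for illustration only.
efficiency = gflops_per_watt(flop_count=2e9, seconds=0.5, watts=0.25)
print(efficiency)  # 16.0
```

GOPS/W follows the same formula with the count of all operations in place of the floating-point operation count.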

This thesis shows that each linear algebra kernel performs best in a particular cache mode, and that the best mode depends on the matrix size. For smaller matrix sizes, the L1P, L2P cache mode is best for TRSM, L1S, L2S is best for LUD, and L1P, L2S is best for QRD. As the matrix size increases, the optimal mode for each kernel changes; for TRSM, L1P, L2P is best for smaller sizes ($N=64, 128, 256, 512$), while L1S, L2P is best for larger sizes ($N=1024$). For machine learning kernels, L1P, L2P is the best cache mode for all network parameter sizes.
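The TRSM result can be written down as a small lookup table. This is only a sketch that encodes the sizes named in the abstract; the names `TRSM_BEST_CACHE_MODE` and `best_trsm_cache_mode` are hypothetical, and sizes the abstract does not report deliberately return `None` rather than a guess.

```python
# Best (L1, L2) cache modes for TRSM, keyed by matrix size N,
# limited to the sizes stated in the abstract.
TRSM_BEST_CACHE_MODE = {
    64: ("L1P", "L2P"),
    128: ("L1P", "L2P"),
    256: ("L1P", "L2P"),
    512: ("L1P", "L2P"),
    1024: ("L1S", "L2P"),
}


def best_trsm_cache_mode(n):
    """Return the reported (L1, L2) mode for size n, or None if unreported."""
    return TRSM_BEST_CACHE_MODE.get(n)


print(best_trsm_cache_mode(256))   # ('L1P', 'L2P')
print(best_trsm_cache_mode(1024))  # ('L1S', 'L2P')
print(best_trsm_cache_mode(768))   # None
```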

Gem5 simulations show that the peak performance for TRSM, LUD, QRD, and Matrix Inversion at the 14 nm node is 97.5, 59.4, 133.0, and 83.05 GFLOPS/W, respectively. For LSTM and GRU, the peak performance is 44.06 and 69.3 GFLOPS/W, respectively.

The neural-network-based recommender system was implemented in the L1S, L2S cache mode. It includes a forward pass and a backward pass, and it is significantly more complex than the individual kernels in both computation and data movement. The most computationally intensive block is the fully connected layer, followed by the Adam optimizer. The overall performance of the recommender systems engine is 54.55 GFLOPS/W and 169.12 GOPS/W.

Contributors: Soorishetty, Anuraag (Author) / Chakrabarti, Chaitali (Thesis advisor) / Kim, Hun Seok (Committee member) / LiKamWa, Robert (Committee member) / Arizona State University (Publisher)

Created: 2019

The Cavaillé-Coll organ and César Franck's Six pièces

Description

Nineteenth-century French organ builder Aristide Cavaillé-Coll and organist-composer César Franck established a foundation for the revival of organ music in France. Following the French Revolution, organ culture had degenerated because of the instrument's association with the church. Beginning with his instrument at Saint-Denis, Cavaillé-Coll created a new symphonic organ that made it possible for composers to write organ music in the new Romantic aesthetic. In 1859, Franck received a new Cavaillé-Coll organ at Sainte-Clotilde, the Parisian church where he served as organist. He began experimenting with the innovations of this instrument: an expressive division, mechanical assists, new types of tone color, and an expanded pedal division. From about 1860, Franck began composing his first pieces for the Cavaillé-Coll organ; these were published in 1868 as the Six Pièces. With these compositions, Franck led the way in adapting the resources of the French symphonic organ to Romantic music. In this paper, I provide an analysis of the structure of each of the Six Pièces as a foundation for exploring ways in which Franck exploited the new features of his Cavaillé-Coll organ. I have made sound recordings to demonstrate specific examples of how the music fits the organ. Thanks to Cavaillé-Coll's innovations in organ building, Franck was able to write large-scale, multi-thematic works with the sonorous resources necessary to render them convincingly. The Six Pièces reveal a strong creative exchange between organist and organ builder, and they portend many of the subsequent developments of the French symphonic organ school.

Contributors: Sung, Anna (Author) / Marshall, Kimberly (Thesis advisor) / Ryan, Russell (Committee member) / Rogers, Rodney (Committee member) / Pagano, Caio (Committee member) / Arizona State University (Publisher)

Created: 2012