<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-05-24T16:20:58Z</responseDate><request verb="GetRecord" metadataPrefix="oai_dc">https://keep.lib.asu.edu/oai/request</request><GetRecord><record><header><identifier>oai:keep.lib.asu.edu:node-156962</identifier><datestamp>2024-12-20T18:25:12Z</datestamp><setSpec>oai_pmh:all</setSpec><setSpec>oai_pmh:repo_items</setSpec></header><metadata><oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:identifier>156962</dc:identifier>
          <dc:identifier>https://hdl.handle.net/2286/R.I.51737</dc:identifier>
                  <dc:rights>http://rightsstatements.org/vocab/InC/1.0/</dc:rights>
                  <dc:date>2018</dc:date>
                  <dc:format>79 pages</dc:format>
                  <dc:type>Masters Thesis</dc:type>
          <dc:type>Academic theses</dc:type>
          <dc:type>Text</dc:type>
                  <dc:language>eng</dc:language>
                  <dc:contributor>Animesh, Saurabh</dc:contributor>
          <dc:contributor>Chakrabarti, Chaitali</dc:contributor>
          <dc:contributor>Brunhaver, John</dc:contributor>
          <dc:contributor>Ren, Fengbo</dc:contributor>
          <dc:contributor>Arizona State University</dc:contributor>
                  <dc:description>Masters Thesis Computer Engineering 2018</dc:description>
          <dc:description>With the end of Dennard scaling and Moore&#039;s law, architects have moved towards heterogeneous designs consisting of specialized cores to achieve higher performance and energy efficiency for a target application domain. Applications of linear algebra are ubiquitous in scientific computing, machine learning, statistics, etc., with matrix computations being fundamental to these linear-algebra-based solutions. Designing multiple dense (or sparse) matrix computation routines on the same platform is quite challenging. Adding to the complexity, dense and sparse matrix computations differ greatly in their storage and access patterns and are difficult to optimize on the same architecture. This thesis addresses this challenge and introduces a reconfigurable accelerator that supports both dense and sparse matrix computations efficiently.
The reconfigurable architecture has been optimized to execute the following linear algebra routines: GEMV (Dense General Matrix Vector Multiplication), GEMM (Dense General Matrix Matrix Multiplication), TRSM (Triangular Matrix Solver), LU Decomposition, Matrix Inverse, SpMV (Sparse Matrix Vector Multiplication), and SpMM (Sparse Matrix Matrix Multiplication). It is a multicore architecture where each core consists of a 2D array of processing elements (PEs).
The 2D array of PEs is of size 4x4 and is scheduled to perform 4x4 matrix updates efficiently. A sequence of such updates is used to solve a larger problem inside a core. A novel partitioned block compressed sparse data structure (PBCSC/PBCSR) is used to perform sparse kernel updates. Scalable partitioning and mapping schemes are presented that map input matrices of any given size to the multicore architecture. Design trade-offs related to the PE array dimension, the size of local memory inside a core, and the bandwidth between on-chip memories and the cores are presented. An optimal core configuration is developed from this analysis. Synthesis results using a 7 nm PDK show that the proposed accelerator can achieve a performance of up to 32 GOPS using a single core.</dc:description>
                  <dc:subject>Electrical Engineering</dc:subject>
          <dc:subject>Accelerator</dc:subject>
          <dc:subject>Algebras, Linear</dc:subject>
          <dc:subject>matrix</dc:subject>
          <dc:subject>Multicore</dc:subject>
          <dc:subject>Reconfigurable</dc:subject>
          <dc:subject>sparse</dc:subject>
                  <dc:title>Algorithm Architecture Co-design for Dense and Sparse Matrix Computations</dc:title></oai_dc:dc></metadata></record></GetRecord></OAI-PMH>
