This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about each dissertation or thesis includes degree information, committee members, an abstract, and any supporting data or media.

In addition to the electronic theses found in the ASU Digital Repository, ASU Theses and Dissertations can be found in the ASU Library Catalog.

Dissertations and theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Description
Vision Transformers (ViT) achieve state-of-the-art performance on image classification tasks. However, their massive size makes them unsuitable for edge devices. Unlike CNNs, limited research has been conducted on the compression of ViTs. This thesis proposes the "adjoined training technique" to compress any transformer-based architecture. The resulting architecture, Adjoined Vision Transformer (AN-ViT), achieves state-of-the-art performance on the ImageNet classification task. With Swin Transformer as the base network, AN-ViT achieves similar accuracy (within 0.15%) with 4.1× fewer parameters and 5.5× fewer floating-point operations (FLOPs). This work further proposes Differentiable Adjoined ViT (DAN-ViT), which uses neural architecture search to find the hyper-parameters of the model. DAN-ViT outperforms the current state-of-the-art methods, including Swin Transformer, by about 0.07% and achieves 85.27% top-1 accuracy on the ImageNet dataset while using 2.2× fewer parameters and 2.2× fewer FLOPs.
Contributors
Goel, Rajeev (Author) / Yang, Yingzhen (Thesis advisor) / Yang, Yezhou (Committee member) / Zou, Jia (Committee member) / Arizona State University (Publisher)
Created
2023