Matching Items (84)
Description
This thesis explores the development and integration of a wrist-worn pneumatic haptic interface, Pneutouch, into multiplayer virtual reality (VR) environments. The study investigates the impact of haptics on multiplayer experiences, with a specific focus on presence, collaboration, and communication. Evaluation and investigation were performed using three mini-games, each targeting specific interactions and investigating presence, collaboration, and communication. It was found that haptics enhanced user presence and object realism, increased user seriousness towards tasks, and shifted the focus of interactions from user-user to user-object. In collaborative tasks, haptics increased realism but did not improve efficiency for simple tasks. In communication tasks, a unique interaction modality, termed "haptic mirroring," was introduced, which explored a new form of communication that could be implemented with haptic devices. It was found that with new communication modalities, users experience an associated learning curve. Together, these findings suggest a new set of multiplayer haptic design considerations, such as how haptics increase seriousness, shift focus from social to physical interactions, generally increase realism but decrease task efficiency, and have associated learning curves. These findings contribute to the growing body of research on haptics in VR, particularly in multiplayer settings, and provide insights that can be further investigated or utilized in the implementation of VR experiences.
Contributors: Manetta, Mason (Author) / LiKamWa, Robert (Thesis advisor) / Lahey, Byron (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2024
Description
Intelligent transportation systems (ITS) are a boon to modern-day road infrastructure. They support traffic monitoring, road safety improvement, congestion reduction, and other traffic management tasks. For an ITS, roadside perception capability with cameras, LIDAR, and RADAR sensors is key. Among various roadside perception technologies, vehicle keypoint detection is a fundamental problem, which involves detecting and localizing specific points on a vehicle, such as the headlights, wheels, and taillights. These keypoints can be used to track the movement of vehicles and their orientation. However, there are several challenges in vehicle keypoint detection, such as the variation in vehicle models and shapes, the presence of occlusion in traffic scenarios, and the influence of weather and changing lighting conditions. More importantly, existing traffic perception datasets for keypoint detection are mainly limited to the frontal view with sensors mounted on the ego vehicles. These datasets are not designed for traffic monitoring cameras that are mounted on roadside poles. Capturing data from roadside cameras offers a significant advantage, as they can cover a much larger distance with a wider field of view in many different traffic scenes, but such a dataset is usually expensive to construct. In this research, I present SKOPE3D: Synthetic Keypoint Perception 3D dataset, a one-of-its-kind synthetic perception dataset generated using a simulator from the roadside perspective. It comes with 2D bounding boxes, 3D bounding boxes, tracking IDs, and 33 keypoints for each vehicle in the scene. The dataset consists of 25K frames spanning over 28 scenes with over 150K vehicles and 4.9M keypoints. A baseline keypoint RCNN model is trained on the dataset and thoroughly evaluated on the test set. The experiments show the capability of the synthetic dataset and knowledge transferability between synthetic and real-world data.
Contributors: Pahadia, Himanshu (Author) / Yang, Yezhou (Thesis advisor) / Lu, Duo (Committee member) / Farhadi Bajestani, Mohammad (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Insufficient training data poses significant challenges to training a deep convolutional neural network (CNN) to solve a target task. One common solution to this problem is to use transfer learning with pre-trained networks to apply knowledge learned from one domain with sufficient data to a new domain with limited data and avoid training a deep network from scratch. However, for such methods to work in a transfer learning setting, learned features from the source domain need to be generalizable to the target domain, which is not guaranteed since the feature space and distributions of the source and target data may be different. This thesis aims to explore and understand the use of orthogonal convolutional neural networks to improve learning of diverse, generic features that are transferable to a novel task. In this thesis, orthogonal regularization is used to pre-train deep CNNs to investigate if and how orthogonal convolution may improve feature extraction in transfer learning. Experiments using two limited medical image datasets suggest that orthogonal regularization improves generality and reduces redundancy of learned features more effectively in certain deep networks for transfer learning. The results on feature selection and classification demonstrate that the improvement in transferred features helps select more expressive features that improve generalization performance. To understand the effectiveness of orthogonal regularization on different architectures, this work studies the effects of residual learning on orthogonal convolution. Specifically, this work examines the presence of residual connections and their effects on feature similarities, and shows that residual learning blocks help orthogonal convolution better preserve feature diversity across the convolutional layers of a network and alleviate the increase in feature similarities caused by depth, demonstrating the importance of residual learning in making orthogonal convolution more effective.
Contributors: Chan, Tsz (Author) / Li, Baoxin (Thesis advisor) / Liang, Jianming (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2023
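As a rough illustration of the orthogonal regularization mentioned in the record above, the sketch below computes one common soft-orthogonality penalty, the squared Frobenius norm of W Wᵀ − I over flattened filters. The abstract does not state which formulation the thesis uses, so this form, the function name `orthogonality_penalty`, and the toy filters are all assumptions made for illustration.

```python
def orthogonality_penalty(filters):
    """Return ||W W^T - I||_F^2 for a list of flattened filters (rows of W).

    Assumed formulation: this is one common soft-orthogonality penalty,
    not necessarily the one used in the thesis.
    """
    n = len(filters)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of the Gram matrix W W^T.
            dot = sum(a * b for a, b in zip(filters[i], filters[j]))
            # Entry (i, j) of the identity matrix.
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty


# Hypothetical toy filters: two orthonormal rows versus two identical rows.
orthonormal = [[1.0, 0.0], [0.0, 1.0]]
redundant = [[1.0, 0.0], [1.0, 0.0]]

print(orthogonality_penalty(orthonormal))  # no penalty for orthonormal filters
print(orthogonality_penalty(redundant))    # redundant filters are penalized
```

In training, a term like this would typically be added to the task loss with a small weight, so that filters are pushed toward mutually dissimilar directions.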
Description
Vision Transformers (ViT) achieve state-of-the-art performance on image classification tasks. However, their massive size makes them unsuitable for edge devices. Unlike CNNs, limited research has been conducted on the compression of ViTs. This thesis proposes the "adjoined training technique" to compress any transformer-based architecture. The architecture, Adjoined Vision Transformer (AN-ViT), achieves state-of-the-art performance on the ImageNet classification task. With Swin Transformer as the base network, AN-ViT achieves similar accuracy (within 0.15%) with 4.1× fewer parameters and 5.5× fewer floating point operations (FLOPs). This work further proposes Differentiable Adjoined ViT (DAN-ViT), which uses neural architecture search to find the hyper-parameters of the model. DAN-ViT outperforms the current state-of-the-art methods, including Swin Transformer, by about 0.07% and achieves 85.27% top-1 accuracy on the ImageNet dataset while using 2.2× fewer parameters and 2.2× fewer FLOPs.
Contributors: Goel, Rajeev (Author) / Yang, Yingzhen (Thesis advisor) / Yang, Yezhou (Committee member) / Zou, Jia (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Simultaneous localization and mapping (SLAM) has traditionally relied on low-level geometric or optical features. However, these feature-based SLAM methods often struggle with feature-less or repetitive scenes. Additionally, low-level features may not provide sufficient information for robot navigation and manipulation, leaving robots without a complete understanding of the 3D spatial world. Advanced information is necessary to address these limitations. Fortunately, recent developments in learning-based 3D reconstruction allow robots to not only detect semantic meanings, but also recognize the 3D structure of objects from a few images. By incorporating this 3D structural information, SLAM can be improved from a low-level approach to a structure-aware approach. This work proposes a novel approach for multi-view 3D reconstruction using a recurrent transformer. This approach allows robots to accumulate information from multiple views and encode it into a compact latent space. The resulting latent representations are then decoded to produce 3D structural landmarks, which can be used to improve robot localization and mapping.
Contributors: Huang, Chi-Yao (Author) / Yang, Yezhou (Thesis advisor) / Turaga, Pavan (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Detection of anomalies before they are included in the downstream diagnosis/prognosis models is an important criterion for maintaining medical AI imaging model performance across internal and external datasets. Furthermore, the need to curate huge amounts of data to train supervised models that produce precise results also requires an automated model that can accurately identify in-distribution (ID) and out-of-distribution (OOD) data to ensure training dataset quality. However, the core challenges in designing such a system are: (i) given the infinite variations of anomalies, curation of training data is infeasible; (ii) assumptions about the types of anomalies are often hypothetical. The proposed work designs an unsupervised anomaly detection model using a cascade variational autoencoder coupled with a zero-shot learning network that maps the latent vectors to semantic attributes. The performance of the proposed model is shown on two different use cases, skin images and chest radiographs, and is compared against the same class of state-of-the-art generative OOD detection models.
Contributors: Ramasamy, Gokul (Author) / Banerjee, Imon (Thesis advisor) / Sanyal, Arindam (Thesis advisor) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Traditional sports coaching involves face-to-face instruction with athletes or playing back 2D videos of athletes’ training. However, if the coach is not in the same area as the athlete, then the coach will not be able to see the athlete’s full body and thus cannot give precise guidance to the athlete, limiting the athlete’s improvement. To address these challenges, this paper proposes Augmented Coach, an augmented reality platform where coaches can view, manipulate, and comment on volumetric video data of athletes’ movement remotely via the network. In particular, this work includes (a) capturing the athlete’s movement video data with Kinects and converting it into point cloud format; (b) transmitting the point cloud data to the coach’s Oculus headset via a 5G or wireless network; and (c) the coach’s commenting on the athlete’s joints. In addition, the evaluation of Augmented Coach includes an assessment of its performance on five metrics in both the wireless network and 5G network environments, as well as of coaches’ and athletes’ experience of using it. The results show that Augmented Coach enables coaches to instruct athletes from a distance and provide effective feedback for correcting athletes’ motions over the network.
Contributors: Qiao, Yunhan (Author) / LiKamWa, Robert (Thesis advisor) / Bansal, Ajay (Committee member) / Jayasuriya, Suren (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Manipulator motion planning has conventionally been solved using sampling and optimization-based algorithms that are agnostic to embodiment and environment configurations. However, these algorithms plan on a fixed environment representation approximated using shape primitives, and hence struggle to find solutions for cluttered and dynamic environments. Furthermore, these algorithms fail to produce solutions for complex unstructured environments under real-time bounds. Neural Motion Planners (NMPs) are an appealing alternative to algorithmic approaches as they can leverage parallel computing for planning while incorporating arbitrary environmental constraints directly from raw sensor observations. Contemporary NMPs successfully transfer to different environment variations but fail to generalize across embodiments. This thesis proposes "AnyNMP", a generalist motion planning policy for zero-shot transfer across different robotic manipulators and environments. The policy is conditioned on a semantically segmented 3D point cloud representation of the workspace, thus enabling implicit sim2real transfer. In the proposed approach, templates are formulated for manipulator kinematics, and ground truth motion plans are collected for over 3 million procedurally sampled robots in randomized environments. The planning pipeline consists of a state validation model for differentiable collision detection and a sampling-based planner for motion generation. AnyNMP has been validated on 5 different commercially available manipulators and showcases successful cross-embodiment planning, achieving an 80% average success rate on baseline benchmarks.
Contributors: Rath, Prabin Kumar (Author) / Gopalan, Nakul (Thesis advisor) / Yu, Hongbin (Thesis advisor) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2024
Description
Rapid advancements in artificial intelligence (AI) have revolutionized various domains, enabling the development of sophisticated models capable of solving complex problems. However, as AI systems increasingly participate in critical decision-making processes, concerns about their interpretability, robustness, and reliability have intensified. Interpretable AI models, such as the Concept-Centric Transformer (CCT), have emerged as promising solutions to enhance transparency in AI models. Yet, increasing model interpretability often requires enriching training data with concept explanations, escalating training costs. Therefore, intrinsically interpretable models like CCT must be designed to be data-efficient, generalizable (to accommodate smaller training sets), and robust against noise and adversarial attacks. Despite progress in interpretable AI, ensuring the robustness of these models remains a challenge. This thesis enhances the data efficiency and generalizability of the CCT model by integrating four techniques: Perturbation Random Masking (PRM), Attention Random Dropout (ARD), and the integration of manifold mixup and input mixup for memory broadcast. Comprehensive experiments on benchmark datasets such as CIFAR-100, CUB-200-2011, and ImageNet show that the enhanced CCT model achieves modest performance improvements over the original model when using a full training set. Furthermore, this performance gap increases as the training data volume decreases, particularly in few-shot learning scenarios. The enhanced CCT maintains high accuracy with limited data (even without explicitly training on example concept-level explanations), demonstrating its potential for real-world applications where labeled data are scarce. These findings suggest that the enhancements enable more effective use of CCT in settings with data constraints. Ablation studies reveal that no single technique (PRM, ARD, or mixups) dominates in enhancing performance and data efficiency. Each contributes nearly equally, and their combined application yields the best results, indicating a synergistic effect that bolsters the model’s capabilities without any single method being predominant. The results of this research highlight the efficacy of the proposed enhancements in refining CCT models for greater performance, robustness, and data efficiency. By demonstrating improved performance and resilience, particularly in data-limited scenarios, this thesis underscores the practical applicability of advanced AI systems in critical decision-making roles.
Contributors: Park, Keun Hee (Author) / Pavlic, Theodore (Thesis advisor) / Choi, YooJung (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2024
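To make the "input mixup" named in the record above more concrete, here is a minimal sketch of generic input mixup: two examples and their one-hot labels are blended with a coefficient drawn from a Beta distribution. How the thesis applies mixup to CCT's memory broadcast is not described in the abstract; the function name `mixup`, the `alpha` default, and the toy vectors are assumptions for illustration only.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels (generic input mixup).

    Assumption: alpha=0.2 is an illustrative default, not a value from the thesis.
    """
    # Mixing coefficient in [0, 1], drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Hypothetical toy inputs: two 2-dimensional examples from different classes.
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
print(mixed_x, mixed_y, lam)
```

Manifold mixup follows the same interpolation but applies it to hidden-layer activations rather than raw inputs.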
Description
Rapid growth of the internet and connected devices, ranging from cloud systems to the internet of things, has raised critical concerns about securing these systems. In the recent past, security attacks on different kinds of devices have evolved in terms of complexity and diversity. One of the challenges is establishing secure communication in the network among various devices and systems. Despite being protected with authentication and encryption, the network still needs to be protected against cyber-attacks. For this, the network traffic has to be closely monitored so that anomalies and intrusions can be detected. Intrusion detection can be categorized as a network traffic classification problem in machine learning. Existing network traffic classification methods require a lot of training and data preprocessing, and this problem is more serious if the dataset size is huge. In addition, the machine learning and deep learning methods that have been used so far were trained on datasets that contain obsolete attacks. In this thesis, these problems are addressed by using ensemble methods applied to an up-to-date network attacks dataset. Ensemble methods use multiple learning algorithms to obtain better classification accuracy than could be obtained when any one of the constituent learning algorithms is applied alone. This dataset for network traffic classification has recent attack scenarios and contains over fifteen attacks. This approach shows that ensemble methods can be used to classify network traffic and detect intrusions with shorter model training times and less preprocessing, without feature selection. In addition, this thesis also shows that using fewer than ten percent of the total features of the input dataset leads to accuracy similar to that achieved on the whole dataset. This can heavily reduce training times and classification duration in real-time scenarios.
Contributors: Ponneganti, Ramu (Author) / Yau, Stephen (Thesis advisor) / Richa, Andrea (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2019
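The idea of combining multiple learning algorithms, as described in the record above, can be sketched with hard majority voting, one of the simplest ensemble schemes. The abstract does not say which ensemble methods the thesis uses, so the voting rule, the function name `majority_vote`, and the toy predictions below are assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by hard majority voting.

    `predictions` holds one list of labels per classifier, all the same length.
    Ties go to the label encountered first among the tied candidates.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three classifiers on three traffic flows.
clf_a = ["benign", "attack", "attack"]
clf_b = ["benign", "benign", "attack"]
clf_c = ["attack", "attack", "attack"]

print(majority_vote([clf_a, clf_b, clf_c]))
```

With an odd number of classifiers and two classes, as here, ties cannot occur; other setups would need an explicit tie-breaking policy.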