Creators: Panchanathan, Sethuraman; Computer Science and Engineering Program
The dissertation outlines novel domain adaptation approaches across different feature spaces: (i) a linear Support Vector Machine model for domain alignment; (ii) a nonlinear, kernel-based approach that embeds domain-aligned data for enhanced classification; (iii) a hierarchical model, implemented using deep learning, that estimates domain-aligned hash values for the source and target data; and (iv) a proposed feature selection technique to reduce cross-domain disparity. These adaptation procedures are tested and validated across a range of computer vision applications, such as object classification, facial expression recognition, digit recognition, and activity recognition. The dissertation also provides a unique perspective on the domain adaptation literature from the point of view of linear, nonlinear, and hierarchical feature spaces. It concludes with a discussion of future research directions that highlight the role of domain adaptation in an era of rapid advancements in artificial intelligence.
In this thesis, a novel domain adaptation algorithm, Domain Adaptive Fusion (DAF), is proposed. DAF encourages a domain-invariant linear relationship between the pixel space of different domains and the prediction space while being trained under a domain-adversarial signal. The thoughtful combination of key components from unsupervised domain adaptation and semi-supervised learning enables DAF to effectively bridge the gap between source and target domains. Experiments on computer vision benchmark datasets for domain adaptation confirm the efficacy of this hybrid approach, which outperforms all of the baseline architectures on most of the transfer tasks.
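The abstract does not detail the domain-adversarial signal; as a hedged illustration only, a DANN-style objective (a common formulation assumed for this sketch, not necessarily DAF's) trains the feature extractor to minimize the task loss while maximizing the domain classifier's loss:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true labels
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4))               # toy features from both domains
task_logits = feats @ rng.normal(size=(4, 3))  # 3 task classes
dom_logits = feats @ rng.normal(size=(4, 2))   # source vs. target
task_labels = rng.integers(0, 3, size=8)
dom_labels = np.array([0] * 4 + [1] * 4)       # first half source, second half target

lam = 0.1  # adversarial trade-off weight (hypothetical value)
L_task = cross_entropy(softmax(task_logits), task_labels)
L_dom = cross_entropy(softmax(dom_logits), dom_labels)
# Feature extractor's objective: solve the task while confusing the domain classifier.
total = L_task - lam * L_dom
```

In practice this sign flip is implemented with a gradient-reversal layer so the domain classifier itself still minimizes its own loss.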
This thesis presents three models for incremental learning: (i) a generative incremental learning algorithm that uses a pre-trained deep neural network classifier; (ii) a hashing-based clustering algorithm for efficient incremental learning; and (iii) a student-teacher coupled neural network that distills knowledge for incremental learning. The proposed algorithms were evaluated on popular vision datasets for classification tasks. The thesis concludes with a discussion of the feasibility of using these techniques to transfer information between networks and for incremental learning applications.
This thesis provides solutions for the standard problem of unsupervised domain adaptation (UDA) and the more general problem of generalized domain adaptation (GDA). The contributions of this thesis are as follows: (1) a Certain and Consistent Domain Adaptation model for closed-set unsupervised domain adaptation that aligns the features of the source and target domains using deep neural networks; (2) a multi-adversarial deep learning model for generalized domain adaptation; and (3) a gating model that detects out-of-distribution samples for generalized domain adaptation.
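The feature-alignment step in contribution (1) is described only at a high level. One classical way to align source and target second-order statistics is correlation alignment (CORAL), sketched here purely as a generic illustration, not as the model's actual mechanism:

```python
import numpy as np

def coral_align(source, target, eps=1e-6):
    """Whiten source features, then re-color them with the target covariance."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])

    def sqrtm(m, inv=False):
        # symmetric PSD matrix square root via eigendecomposition
        w, v = np.linalg.eigh(m)
        w = np.clip(w, eps, None)
        w = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
        return (v * w) @ v.T

    # source @ Cs^{-1/2} @ Ct^{1/2} has (approximately) the target covariance
    return source @ sqrtm(cs, inv=True) @ sqrtm(ct)

rng = np.random.default_rng(0)
src = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.5])  # toy source features
tgt = rng.normal(size=(200, 3))                             # toy target features
aligned = coral_align(src, tgt)
```

Note this matches only covariances, not means; deep alignment models learn a nonlinear mapping instead.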
The models were tested across multiple computer vision datasets for domain adaptation.
The dissertation concludes with a discussion of the proposed approaches and future directions for research in closed-set and generalized domain adaptation.
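The gating model's mechanism is not detailed in the abstract. A minimal stand-in for out-of-distribution screening is thresholding the maximum softmax confidence, an assumed heuristic used here for illustration only:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gate(logits, threshold=0.7):
    """Return True for samples confident enough to treat as in-distribution.

    The 0.7 threshold is a hypothetical value; real systems tune it on
    held-out data.
    """
    conf = softmax(logits).max(axis=1)
    return conf >= threshold

logits = np.array([[5.0, 0.1, 0.2],    # peaked -> likely a known class
                   [1.0, 1.1, 0.9]])   # flat -> likely out-of-distribution
decisions = gate(logits)
```

Samples rejected by the gate would be routed to an "unknown" label rather than forced into a source class.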
Severe visual impairment creates countless barriers to participation in and enjoyment of life's opportunities. Technological progress has been both a blessing and a curse in this regard. Digital text, together with screen readers and refreshable Braille displays, has made whole libraries readily accessible, and rideshare technology has made independent mobility more attainable. Simultaneously, screen-based interactions and experiences have only grown in pervasiveness and importance, excluding many of those with visual impairments.
Sensory Substitution, the process of substituting an unavailable modality with another one, has shown promise as an alternative to accommodation, but in recent
years meaningful strides in Sensory Substitution for vision have declined in frequency.
Given recent advances in Computer Vision, this stagnation is especially disconcerting.
Designing Sensory Substitution Devices (SSDs) for vision for use in interactive settings that leverage modern Computer Vision techniques presents a variety of challenges, including perceptual bandwidth, human-computer interaction, and person-centered machine learning considerations. To surmount these barriers, an approach called Personal Foveated Haptic Gaze (PFHG) is introduced. PFHG consists of two primary components: Foveated Haptic Gaze (FHG), an interaction paradigm inspired by the human visual system that is intuitive and flexible enough to generalize to a variety of applications; and a person-centered learning component that addresses the expressivity limitations of most SSDs. This component, called One-Shot Object Detection by Data Augmentation (1SODDA), is a one-shot object detection approach that allows users to specify the objects they are interested in locating visually and, with minimal effort, obtain an object detection model that finds them effectively.
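The abstract does not spell out 1SODDA's augmentation pipeline. One plausible sketch, assuming a copy-paste style of augmentation (the function and all details here are hypothetical), synthesizes detection training data from a single object template:

```python
import numpy as np

def synthesize(template, backgrounds, rng):
    """Paste a one-shot object template onto each background at a random
    location, returning (image, bounding_box) training pairs."""
    th, tw = template.shape[:2]
    samples = []
    for bg in backgrounds:
        img = bg.copy()
        h, w = img.shape[:2]
        y = rng.integers(0, h - th + 1)
        x = rng.integers(0, w - tw + 1)
        img[y:y + th, x:x + tw] = template
        samples.append((img, (x, y, x + tw, y + th)))  # box as (x1, y1, x2, y2)
    return samples

rng = np.random.default_rng(0)
template = np.full((8, 8, 3), 255, dtype=np.uint8)   # stand-in object crop
backgrounds = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]
data = synthesize(template, backgrounds, rng)
```

A real pipeline would also vary scale, rotation, lighting, and blending so the detector generalizes beyond the single example.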
The Personal Foveated Haptic Gaze framework was realized in virtual and real-world applications: playing a 3D, interactive, first-person video game (DOOM) and finding user-specified real-world objects. User study results found Foveated Haptic Gaze to be an effective and intuitive interface for interacting with a dynamic visual world using solely haptics. Additionally, 1SODDA achieves performance competitive with both few-shot object detection methods and high-framerate many-shot object detectors. Together, these results pave the way for modern Sensory Substitution Devices for vision.
Recent advancements in machine learning have allowed companies to develop computer-vision-aided production lines that take advantage of the raw and labeled data captured by high-definition cameras mounted at vantage points on their factory floors. We experiment with two different methods of developing one such system to automatically track key components on a production line. By tracking the state of these key components with object detection, we can accurately determine and report production line metrics such as part arrival and start/stop times for key factory processes.

We began by collecting and labeling raw image data from the cameras overlooking the factory floor. Using that data, we trained two dedicated object detection models, applying transfer learning to start from a Faster R-CNN ResNet model pre-trained on Microsoft's COCO dataset. The first model is a binary classifier that detects the state of a single object, while the second is a multiclass classifier that detects the states of two distinct objects on the factory floor. Both models achieved over 95% classification and localization accuracy on our test datasets, and the two additional classes did not reduce the classification or localization accuracy of the multiclass model relative to the binary model.
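Localization accuracy for detectors of this kind is conventionally judged by intersection-over-union (IoU) between predicted and ground-truth boxes, typically with a threshold such as 0.5. A minimal IoU computation, shown as a generic illustration rather than the authors' evaluation code:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # overlap rectangle
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a correct localization if IoU exceeds the chosen
# threshold (0.5 is a common convention, assumed here).
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # partial overlap: 25 / 175
```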