Exploring Deep Learning for Video Understanding

Description

Video analysis and understanding have attracted growing attention in recent years. The research community has also devoted considerable effort to, and made progress in, many related visual tasks, such as video action/event recognition, thumbnail frame or video index retrieval, and zero-shot learning. Finding good representative features of videos is an important objective for these visual tasks.

Thanks to the success of deep neural networks in recent vision tasks, it is natural to consider deep learning methods for better extraction of a global representation of images and videos. In general, a Convolutional Neural Network (CNN) is used to obtain the spatial information, and a Recurrent Neural Network (RNN) is used to capture the temporal information.

This dissertation provides a perspective on the challenging problems in different kinds of videos, which may require different solutions. Therefore, several novel deep learning-based approaches to obtaining representative features are outlined for different visual tasks, such as zero-shot learning, video retrieval, and video event recognition. To better understand and obtain the spatial and temporal information in videos, a Convolutional Neural Network and a Recurrent Neural Network are jointly utilized in most approaches. Different experiments are conducted to show the importance and effectiveness of good representative features for obtaining a better knowledge of video clips in the computer vision field. This dissertation concludes with a discussion of possible future work on obtaining better representative features of more challenging video clips.

Contributors: Li, Yikang (Author) / Li, Baoxin (Thesis advisor) / Karam, Lina (Committee member) / LiKamWa, Robert (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)

Created: 2020

Domain Concretization from Examples: Addressing Missing Domain Knowledge via Robust Planning

Description

Most planning agents assume complete knowledge of the domain, which may not be the case in scenarios where certain domain knowledge is missing. This problem could be due to design flaws or arise from domain ramifications or qualifications. In such cases, planning algorithms could produce highly undesirable behaviors. Planning with incomplete domain knowledge is more challenging than partial observability in the sense that the planning agent is unaware of the existence of such knowledge, in contrast to it being just unobservable or partially observable. That is the difference between known unknowns and unknown unknowns.

In this thesis, I introduce and formulate this as the problem of Domain Concretization, which is the inverse of domain abstraction, a problem that has been studied extensively. Furthermore, I present a solution that starts from the incomplete domain model provided to the agent by the designer and uses teacher traces from human users to determine the candidate model set under a minimalistic model assumption. A robust plan is then generated for the maximum probability of success under the set of candidate models. In addition to a standard search formulation in the model space, I propose a sample-based search method and an online version of it to improve search time. The solution presented has been evaluated on various International Planning Competition domains where incompleteness was introduced by deleting certain predicates from the complete domain model. The solution is also tested in a robot simulation domain to illustrate its effectiveness in handling incomplete domain knowledge. The results show that the plan generated by the algorithm increases the plan success rate without impacting action cost too much.
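
The idea of choosing a plan for the maximum probability of success under a set of candidate models can be illustrated with a minimal sketch. This is not the thesis's algorithm: it simply enumerates a given list of plans instead of searching, and the function name `most_robust_plan`, the `succeeds` callback, the model weights, and the toy domain are all hypothetical choices made for illustration.

```python
def most_robust_plan(plans, models, succeeds):
    """Return the plan with the highest total weight of candidate models
    under which it succeeds.

    plans    -- list of candidate plans (here: tuples of action names)
    models   -- list of (model, weight) pairs; weights are assumed to sum to 1
    succeeds -- hypothetical callback: succeeds(plan, model) -> bool
    """
    def success_probability(plan):
        # Sum the weights of the candidate models in which the plan works.
        return sum(weight for model, weight in models if succeeds(plan, model))

    return max(plans, key=success_probability)


# Toy example (assumption): each candidate model is the set of actions that
# fail under it, and a plan succeeds if it uses none of those actions.
candidate_models = [
    (frozenset({"cross_bridge"}), 0.5),
    (frozenset({"take_tunnel"}), 0.3),
    (frozenset(), 0.2),
]
candidate_plans = [
    ("cross_bridge", "deliver"),
    ("take_tunnel", "deliver"),
]


def plan_succeeds(plan, failing_actions):
    return not any(action in failing_actions for action in plan)


best_plan = most_robust_plan(candidate_plans, candidate_models, plan_succeeds)
```

In this toy setting the tunnel plan succeeds under more of the weighted candidate models than the bridge plan, so it is the one selected.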

Contributors: Sharma, Akshay (Author) / Zhang, Yu (Thesis advisor) / Fainekos, Georgios (Committee member) / Srivastava, Siddharth (Committee member) / Arizona State University (Publisher)

Created: 2020

Nurturing Open Design: Challenges and Opportunities for HCI to Support Crowd-driven Hardware Design

Description

Open Design is a crowd-driven global ecosystem which tries to challenge and alter contemporary modes of capitalistic hardware production. It strives to build on the collective skills, expertise and efforts of people regardless of their educational, social or political backgrounds to develop and disseminate physical products, machines and systems. In contrast to capitalistic hardware production, Open Design practitioners publicly share design files, blueprints and knowhow through various channels including internet platforms and in-person workshops. These designs are typically replicated, modified, improved and reshared by individuals and groups who are broadly referred to as ‘makers’.

This dissertation aims to expand the current scope of Open Design within human-computer interaction (HCI) research through a long-term exploration of Open Design’s socio-technical processes. I examine Open Design from three perspectives: the functional—materials, tools, and platforms that enable crowd-driven open hardware production, the critical—materially-oriented engagements within open design as a site for sociotechnical discourse, and the speculative—crowd-driven critical envisioning of future hardware.

More specifically, this dissertation first explores the growing global scene of Open Design through a long-term ethnographic study of the open science hardware (OScH) movement, a genre of Open Design. This long-term study of OScH provides a focal point for HCI to deeply understand Open Design's growing global landscape. Second, it examines the application of Critical Making within Open Design through an OScH workshop with designers, engineers, artists and makers from local communities. This work foregrounds the role of HCI researchers as facilitators of collaborative critical engagements within Open Design. Third, this dissertation introduces the concept of crowd-driven Design Fiction through the development of a publicly accessible online Design Fiction platform named Dream Drones. Through a six-month-long development and a study with drone-related practitioners, it offers several pragmatic insights into the challenges and opportunities for crowd-driven Design Fiction. Through these explorations, I highlight the broader implications and novel research pathways for HCI to shape and be shaped by the global Open Design movement.

Contributors: Fernando, Kattak Kuttige Rex Piyum (Author) / Kuznetsov, Anastasia (Thesis advisor) / Turaga, Pavan (Committee member) / Middel, Ariane (Committee member) / Takamura, John (Committee member) / Arizona State University (Publisher)

Created: 2020

Image Restoration for Non-Traditional Camera Systems

Description

Cameras have become commonplace, with wide-ranging applications in phone photography, computer vision, and medical imaging. With a growing need to reduce size and cost while maintaining image quality, the need to look past traditional styles of cameras is becoming more apparent. Several non-traditional cameras have been shown to be promising options for size-constrained applications, and while they may offer several advantages, they are also usually limited by image quality degradation due to optical limitations or a need to reconstruct a captured image. In this thesis, we take a look at three of these non-traditional cameras: a pinhole camera, a diffusion-mask lensless camera, and an under-display camera (UDC).

For each of these cases, I present a feasible image restoration pipeline to correct for their particular limitations. For the pinhole camera, I present an early pipeline to allow for practical pinhole photography by reducing noise levels caused by low-light imaging, enhancing exposure levels, and sharpening the blur caused by the pinhole. For lensless cameras, we explore a neural network architecture that performs joint image reconstruction and point spread function (PSF) estimation to robustly recover images captured with multiple PSFs from different cameras. Using adversarial learning, this approach achieves improved reconstruction results that do not require explicit knowledge of the PSF at test time and shows an added improvement in the reconstruction model’s ability to generalize to variations in the camera’s PSF. This allows lensless cameras to be utilized in a wider range of applications that require multiple cameras without the need to explicitly train a separate model for each new camera. For UDCs, we utilize a multi-stage approach to correct for low light transmission, blur, and haze. This pipeline uses a PyNET deep neural network architecture to perform the majority of the restoration, while additionally using a traditional optimization approach, which is then fused in a learned manner in the second stage to improve high-frequency features. I show results from this novel fusion approach that are on par with the state of the art.

Contributors: Rego, Joshua D (Author) / Jayasuriya, Suren (Thesis advisor) / Blain Christen, Jennifer (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2020

Deep Learning Approaches for Inferring Collective Macrostates from Individual Observations in Natural and Artificial Multi-Agent Systems Under Realistic Constraints

Description

A complex social system, whether artificial or natural, can possess its macroscopic properties as a collective, which may change in real time as a result of local behavioral interactions among a number of agents in it. If a reliable indicator is available to abstract the macrolevel states, decision makers could use it to take a proactive action, whenever needed, in order for the entire system to avoid unacceptable states or converge to desired ones. In realistic scenarios, however, there can be many challenges in learning a model of dynamic global states from interactions of agents, such as 1) high complexity of the system itself, 2) absence of holistic perception, 3) variability of group size, 4) biased observations on state space, and 5) identification of salient behavioral cues. In this dissertation, I introduce useful applications of macrostate estimation in complex multi-agent systems and explore effective deep learning frameworks to address the inherited challenges. First of all, Remote Teammate Localization (ReTLo) is developed in multi-robot teams, in which an individual robot can use its local interactions with a nearby robot as an information channel to estimate the holistic view of the group. Within the problem, I will show (a) learning a model of a modular team can generalize to all others to gain the global awareness of the team of variable sizes, and (b) active interactions are necessary to diversify training data and speed up the overall learning process. The complexity of the next focal system escalates to a colony of over 50 individual ants undergoing 18-day social stabilization since a chaotic event. I will utilize this natural platform to demonstrate, in contrast to (b), (c) monotonic samples only from “before chaos” can be sufficient to model the panicked society, and (d) the model can also be used to discover salient behaviors to precisely predict macrostates.

Contributors: Choi, Taeyeong (Author) / Pavlic, Theodore (Thesis advisor) / Richa, Andrea (Committee member) / Ben Amor, Heni (Committee member) / Yang, Yezhou (Committee member) / Liebig, Juergen (Committee member) / Arizona State University (Publisher)

Created: 2020

On Feature Saliency and Deep Neural Networks

Description

Technological advances have allowed for the assimilation of a variety of data, driving a shift away from the use of simpler and constrained patterns to more complex and diverse patterns in retrieval and analysis of such data. This shift has inundated the conventional techniques and has stressed the need for intelligent mechanisms that can model the complex patterns in the data. Although deep neural networks, including the so-called attentioned networks, have shown some success at capturing complex patterns, they have significant shortcomings in distinguishing what is important in data from what is noise. This dissertation observes that traditional neural networks rely primarily on gradient-based learning to model deep feature maps while ignoring key insights in the data that can be leveraged as complementary information to help learn an accurate model. In particular, this dissertation shows that localized multi-scale features (captured implicitly or explicitly) can be leveraged to help improve model performance, as these features capture salient informative points in the data.

This dissertation focuses on “working with the data, not just on data”, i.e., leveraging feature saliency through pre-training, in-training, and post-training analysis of the data. In particular, non-neural localized multi-scale features, in images and time series, are relatively cheap to extract and can provide a rough overview of the patterns in the data. Furthermore, localized features coupled with deep features can help learn a high-performing network. A pre-training analysis of the sizes, complexities, and distribution of these localized features can help intelligently allocate a user-provided kernel budget in the network as a single-shot hyper-parameter search. Additionally, these localized features can be used as a secondary input modality to the network for cross-attention. Retraining pre-trained networks can be a costly process, yet a post-training analysis of model inferences can allow for learning the importance of individual network parameters to the model inferences, thus facilitating a retraining-free network sparsification with minimal impact on the model performance. Furthermore, effective in-training analysis of the intermediate features in the network helps learn the importance of individual intermediate features (neural attention), and this analysis can be achieved through simulating local-extrema detection or learning features simultaneously and understanding their co-occurrences. In summary, this dissertation argues and establishes that, if appropriately leveraged, localized features and their feature saliency can help learn highly accurate, yet cheaper networks.

Contributors: Garg, Yash (Author) / Candan, K. Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Sapino, Maria Luisa (Committee member) / Arizona State University (Publisher)

Created: 2020

Exploring the Impact of Augmented Reality on Collaborative Decision-Making in Small Teams

Description

While significant qualitative, user study-focused research has been done on augmented reality, relatively few studies have been conducted on multiple, co-located synchronously collaborating users in augmented reality. Recognizing the need for more collaborative user studies in augmented reality and the value such studies present, a user study is conducted of collaborative decision-making in augmented reality to investigate the following research question: “Does presenting data visualizations in augmented reality influence the collaborative decision-making behaviors of a team?” This user study evaluates how viewing data visualizations with augmented reality headsets impacts collaboration in small teams compared to viewing together on a single 2D desktop monitor as a baseline. Teams of two participants performed closed and open-ended evaluation tasks to collaboratively analyze data visualized in both augmented reality and on a desktop monitor. Multiple means of collecting and analyzing data were employed to develop a well-rounded context for results and conclusions, including software logging of participant interactions, qualitative analysis of video recordings of participant sessions, and pre- and post-study participant questionnaires. The results indicate that augmented reality doesn’t significantly change the quantity of team member communication but does impact the means and strategies participants use to collaborate.

Contributors: Kintscher, Michael (Author) / Bryan, Chris (Thesis advisor) / Amresh, Ashish (Thesis advisor) / Hansford, Dianne (Committee member) / Johnson, Erik (Committee member) / Arizona State University (Publisher)

Created: 2020

Anticipatory and Invisible Interfaces to Address Impaired Proprioception in Neurological Disorders

Description

The burden of adaptation has been a major limiting factor in the adoption rates of new wearable assistive technologies. This burden has created a necessity for the exploration and combination of two key concepts in the development of upcoming wearables: anticipation and invisibility. The combination of these two topics has created the field of Anticipatory and Invisible Interfaces (AII).

In this dissertation, a novel framework is introduced for the development of anticipatory devices that augment the proprioceptive system in individuals with neurodegenerative disorders in a seamless way that scaffolds off of existing cognitive feedback models. The framework suggests three main categories of consideration in the development of devices which are anticipatory and invisible:

• Idiosyncratic Design: How can a design encapsulate the unique characteristics of the individual in the design of assistive aids?

• Adaptation to Intrapersonal Variations: As individuals progress through the various stages of a disability/neurological disorder, how can the technology adapt thresholds for feedback over time to address these shifts in ability?

• Context Aware Invisibility: How can the mechanisms of interaction be modified in order to reduce cognitive load?

The concepts proposed in this framework can be generalized to a broad range of domains; however, there are two primary applications for this work: rehabilitation and assistive aids. In preliminary studies, the framework is applied in the areas of Parkinsonian freezing of gait anticipation and the anticipation of body non-compliance during rehabilitative exercise.

Contributors: Tadayon, Arash (Author) / Panchanathan, Sethuraman (Thesis advisor) / McDaniel, Troy (Committee member) / Krishnamurthi, Narayanan (Committee member) / Davulcu, Hasan (Committee member) / Li, Baoxin (Committee member) / Arizona State University (Publisher)

Created: 2020

Multi-Perspective Semantic Information Retrieval in the Biomedical Domain

Description

Information Retrieval (IR) is the task of obtaining pieces of data (such as documents or snippets of text) that are relevant to a particular query or need from a large repository of information. IR is a valuable component of several downstream Natural Language Processing (NLP) tasks, such as Question Answering. Practically, IR is at the heart of many widely-used technologies like search engines.

While probabilistic ranking functions, such as the Okapi BM25 function, have been utilized in IR systems since the 1970s, modern neural approaches offer certain advantages compared to their classical counterparts. In particular, the release of BERT (Bidirectional Encoder Representations from Transformers) has had a significant impact on the NLP community by demonstrating how the use of a Masked Language Model (MLM) trained on a considerable corpus of data can improve a variety of downstream NLP tasks, including sentence classification and passage re-ranking.
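
For readers unfamiliar with the classical baseline mentioned above, the following is a minimal sketch of one common formulation of Okapi BM25 scoring. It is an illustration only and not the system described in this work: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    k1 controls term-frequency saturation and b controls length
    normalization; the defaults here are commonly used values (assumption).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency, saturated by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (assumption), already lowercased and tokenized.
docs = [
    ["heart", "attack", "symptoms"],
    ["flu", "symptoms", "fever"],
    ["car", "engine", "repair"],
]
scores = bm25_scores(["heart", "symptoms"], docs)
```

Here the first document matches both query terms, the second matches one, and the third matches none, so the scores decrease in that order. A neural re-ranker such as a BERT-based model would typically be applied on top of candidates retrieved by a function like this.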

IR systems are also important in the biomedical and clinical domains. Given the continuously increasing amount of scientific literature across the biomedical domain, the ability to find answers to specific clinical queries from a repository of millions of articles is a matter of practical value to medics, doctors, and other medical professionals. Moreover, there are domain-specific challenges present in the biomedical domain, including handling clinical jargon and evaluating the similarity or relatedness of various medical symptoms when determining the relevance between a query and a sentence.

This work presents contributions to several aspects of the Biomedical Semantic Information Retrieval domain. First, it introduces Multi-Perspective Sentence Relevance, a novel methodology of utilizing BERT-based models for contextual IR. The system is evaluated using the BioASQ Biomedical IR Challenge. Finally, practical contributions in the form of a live IR system for medics and a proposed challenge on the Living Systematic Review clinical task are provided.

Contributors: Rawal, Samarth (Author) / Baral, Chitta (Thesis advisor) / Devarakonda, Murthy (Committee member) / Anwar, Saadat (Committee member) / Arizona State University (Publisher)

Created: 2020

Interpretable Question Answering using Deep Embedded Knowledge Reasoning to Solve Qualitative Word Problems

Description

One of the measures to determine the intelligence of a system is through Question Answering, as it requires a system to comprehend a question and reason using its knowledge base to accurately answer it. Qualitative word problems are an important subset of such problems, as they require a system to recognize and reason with qualitative knowledge expressed in natural language. Traditional approaches in this domain include multiple modules to parse a given problem and to perform the required reasoning. Recent approaches involve using large pre-trained language models like the Bidirectional Encoder Representations from Transformers for downstream question answering tasks through supervision. These approaches, however, either suffer from errors between multiple modules or are not interpretable with respect to the reasoning process employed. The proposed solution in this work aims to overcome these drawbacks through a single end-to-end trainable model that performs both the required parsing and reasoning. The parsing is achieved through an attention mechanism, whereas the reasoning is performed in vector space using soft logic operations. The model also enforces constraints in the form of auxiliary loss terms to increase the interpretability of the underlying reasoning process. The work achieves state-of-the-art accuracy on the QuaRel dataset and matches the state of the art on the QuaRTz dataset, with additional interpretability.

Contributors: Narayana, Sanjay (Author) / Baral, Chitta (Thesis advisor) / Mitra, Arindam (Committee member) / Anwar, Saadat (Committee member) / Arizona State University (Publisher)

Created: 2020
