Matching Items (14)
Description
Many researchers aspire to create robotic systems that assist humans with common office tasks, especially by taking over delivery and messaging duties. For meaningful interactions to take place, a mobile robot must be able to identify the humans it interacts with and communicate successfully with them. It must also be able to navigate the office environment successfully. While mobile robots are well suited for navigating and interacting with elements of a deterministic office environment, interacting with human beings remains a challenge due to the limited cost-efficient compute power available onboard the robot. In this work, I propose the use of remote cloud services to offload intensive interaction tasks. I detail the interactions required in an office environment and discuss the challenges faced when implementing a human-robot interaction platform in a stochastic office setting. I also experiment with cloud services for facial recognition, speech recognition, and environment navigation and discuss my results. As part of my thesis, I have implemented a human-robot interaction system that uses cloud APIs on a mobile robot, enabling it to navigate the office environment, identify the humans within it, and communicate with them.
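As a rough illustration of the cloud-offloading pattern this abstract describes, the sketch below sends a single camera frame to a remote face-identification service over HTTP. The endpoint URL, request fields, and response schema are placeholders assumed for illustration, not the actual APIs used in the thesis.

```python
# Hypothetical sketch (not the thesis code): offloading face identification
# from a mobile robot to a cloud service over HTTP. The endpoint, request
# fields, and response schema below are illustrative assumptions only.
from typing import Optional
import requests

CLOUD_FACE_API = "https://example-cloud.invalid/v1/face/identify"  # placeholder endpoint

def identify_person(jpeg_bytes: bytes, api_key: str) -> Optional[str]:
    """Send one camera frame to the cloud and return a person ID, if any."""
    resp = requests.post(
        CLOUD_FACE_API,
        headers={"Authorization": f"Bearer {api_key}"},
        files={"image": ("frame.jpg", jpeg_bytes, "image/jpeg")},
        timeout=5,  # the robot should not block on a slow network
    )
    resp.raise_for_status()
    candidates = resp.json().get("candidates", [])
    return candidates[0]["person_id"] if candidates else None
```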
Created: 2017-05
Description
The retinotopic map, the mapping between visual inputs on the retina and neuronal activation in the brain's visual areas, is one of the central topics in visual neuroscience. For human observers, the map is typically obtained by analyzing functional magnetic resonance imaging (fMRI) signals of cortical responses to visual stimuli moving slowly across the retina. Biological evidence shows that retinotopic mapping is topology-preserving (topological) within each visual region, i.e., neighboring relationships are preserved through the brain's processing. Unfortunately, due to the limited spatial resolution and signal-to-noise ratio of fMRI, state-of-the-art retinotopic maps are not topological. This dissertation models the topology-preserving condition mathematically, fixes non-topological retinotopic maps with numerical methods, and improves the quality of retinotopic maps. Imposing the topological condition benefits several applications: with topological retinotopic maps, one may gain better insight into human retinotopic maps, including better quantification of the cortical magnification factor, more precise descriptions of retinotopic maps, and potentially better examination methods in the ophthalmology clinic.
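For readers unfamiliar with the topological condition, one common way to state it for a smooth map is sketched below; the dissertation's exact formulation may differ.

```latex
% Sketch, not necessarily the dissertation's formulation: let
% f : D \subset \mathbb{R}^2 \to \mathbb{R}^2 map visual-field coordinates to
% (flattened) cortical coordinates. f is topological on D if it is continuous,
% injective, and orientation-preserving, which for a smooth map reduces to a
% positivity constraint on the Jacobian determinant (no folds or flips):
\[
  \det J_f(x) \;=\;
  \frac{\partial f_1}{\partial x_1}\frac{\partial f_2}{\partial x_2}
  - \frac{\partial f_1}{\partial x_2}\frac{\partial f_2}{\partial x_1}
  \;>\; 0
  \quad \text{for all } x \in D .
\]
```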
Contributors: Tu, Yanshuai (Author) / Wang, Yalin (Thesis advisor) / Lu, Zhong-Lin (Committee member) / Crook, Sharon (Committee member) / Yang, Yezhou (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Federated Learning (FL) is envisaged to be a promising solution for collaboratively training a machine learning model while keeping the training data decentralized and private. Instead of sharing raw data with the central entity, the participating client devices share focused updates for aggregation to ensure global convergence of the model. Owing to the shortcomings of manually handcrafted neural network architectures, the research community is striving to develop Neural Architecture Search (NAS) approaches that automatically search for optimal networks that fit the clients' data. Despite the inaccessibility of clients' data in an FL setting, the federated NAS literature has recently witnessed great progress in applying these NAS techniques to FL. However, one of the key bottlenecks of Federated Learning is the cost of communication between clients and the server, and state-of-the-art federated NAS techniques search for networks with millions of parameters that require several rounds of communication to find the optimal weight parameters. Moreover, deploying a network with millions of parameters on edge devices (the typical participants in an FL process) is infeasible due to their computational limitations and increased latency. Thus, this work proposes Weight-Agnostic Federated Neural Architecture Search (WFNAS), a novel evolutionary framework to search for well-performing and minimally connected weight-agnostic network architectures in an FL setting. As the connectivity of the networks themselves is the solution, there is no need for weight training and hyperparameter tuning, which reduces the communication overhead significantly. The experiments indicate a gain of nearly 40% for orthogonal (vertical FL) data distributions compared to local training. This work is the first federated NAS technique in the literature for vertical FL. Although the experiments are performed in a resource-constrained environment, the aim of this thesis is to show a new direction of research to the FL community.
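The sketch below illustrates the general flavor of an evolutionary, weight-agnostic search over network connectivity on one client's local data; the fitness function, mutation scheme, network shape, and shared weight value are toy assumptions and not the WFNAS algorithm itself.

```python
# Toy sketch of weight-agnostic architecture search: the individual is a 0/1
# connectivity mask, all connections share one fixed weight, so no weight
# training (and hence far less client-server communication) is needed.
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_OUT, SHARED_WEIGHT = 8, 2, 1.0

def forward(mask: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Single-layer network whose only searchable object is the mask."""
    return np.tanh(x @ (mask * SHARED_WEIGHT))

def fitness(mask: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    accuracy = (forward(mask, x).argmax(axis=1) == y).mean()
    sparsity_bonus = 1.0 - mask.mean()      # prefer minimally connected networks
    return accuracy + 0.1 * sparsity_bonus

def mutate(mask: np.ndarray) -> np.ndarray:
    flip = rng.random(mask.shape) < 0.05    # flip roughly 5% of connections
    return np.where(flip, 1 - mask, mask)

# Toy data standing in for one client's local shard.
x = rng.normal(size=(128, N_IN))
y = (x[:, 0] > 0).astype(int)

population = [rng.integers(0, 2, size=(N_IN, N_OUT)) for _ in range(16)]
for _ in range(50):
    ranked = sorted(population, key=lambda m: fitness(m, x, y), reverse=True)
    parents = ranked[:4]
    population = parents + [mutate(p) for p in parents for _ in range(3)]

best = max(population, key=lambda m: fitness(m, x, y))
print("best fitness:", round(fitness(best, x, y), 3),
      "active connections:", int(best.sum()))
```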
Contributors: Thakkar, Om (Author) / Bazzi, Rida (Thesis advisor) / Li, Baoxin (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
In contemporary society, the proliferation of fake identity documents presents a profound menace that permeates various facets of the social fabric. The advent of artificial intelligence, coupled with sophisticated printing techniques, has significantly exacerbated this issue. The ramifications of counterfeit identity documents extend far beyond the legal infractions and financial losses incurred by victims of identity theft: they pose a severe threat to public safety, national security, and societal trust. Given these multifaceted threats, the imperative to detect and thwart fraudulent identity documents has become paramount. The efficacy of fraud detection tools is contingent upon the availability of extensive identity document datasets for training purposes. However, existing benchmark datasets such as MIDV-500, MIDV-2020, and FMIDV exhibit notable deficiencies, such as a limited number of samples, insufficient coverage of various fraud patterns, and only occasional alterations in critical personal identifier fields, particularly portrait images. These limitations constrain their effectiveness in training models capable of detecting realistic fraud instances while also safeguarding privacy. This thesis delineates research that addresses this gap by proposing a streamlined pipeline for generating synthetic identity documents and introducing the resultant benchmark dataset, named IDNet. IDNet is meticulously crafted to propel advancements in privacy-preserving fraud detection initiatives and comprises 597,900 images of synthetically generated identity documents, amounting to approximately 350 gigabytes of data. These documents are categorized into 20 types, encompassing identity documents from 10 U.S. states and 10 European countries. Additionally, the dataset includes identity documents containing either a single fraud pattern or multiple fraud patterns, to cater to various model training requirements.
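As a toy illustration of the kind of generate-then-tamper step such a pipeline performs (not the IDNet pipeline itself; the card layout, field names, and the portrait-paste tamper are assumptions made for illustration):

```python
# Minimal illustration of generating a synthetic ID-card image and one toy
# "fraud pattern" variant. Layout, fields, and the tamper are assumptions.
from PIL import Image, ImageDraw

def render_card(name: str, doc_id: str) -> Image.Image:
    card = Image.new("RGB", (640, 400), "white")
    draw = ImageDraw.Draw(card)
    draw.rectangle([20, 20, 220, 270], outline="black")   # portrait box
    draw.text((260, 60), f"NAME: {name}", fill="black")
    draw.text((260, 100), f"ID:   {doc_id}", fill="black")
    return card

def tamper_portrait(card: Image.Image, donor: Image.Image) -> Image.Image:
    """One toy fraud pattern: paste a foreign portrait into the portrait box."""
    forged = card.copy()
    forged.paste(donor.resize((200, 250)), (21, 21))
    return forged

genuine = render_card("JANE SAMPLE", "D123-4567")
donor = Image.new("RGB", (120, 160), "gray")               # stand-in portrait
forged = tamper_portrait(genuine, donor)
genuine.save("genuine.png")
forged.save("forged.png")
```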
Contributors: Nag, Soham (Author) / Zou, Jia (Thesis advisor) / Yang, Yingzhen (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2024
Description
Recent advances in Artificial Intelligence (AI) have brought AI closer to laypeople than ever before. This leads to a pervasive problem: how would a user ascertain whether an AI system will be safe, reliable, or useful in a given situation? The problem becomes particularly challenging when one considers that most autonomous systems are not designed by their users; the internal software of these systems may be unavailable or difficult to understand; and the functionality of these systems may even change from the initial specifications as a result of learning. To overcome these challenges, this dissertation proposes a paradigm for third-party autonomous assessment of black-box taskable AI systems. The four main desiderata of such assessment systems are: (i) interpretability: generating a description of the AI system's functionality in a language that the target user can understand; (ii) correctness: ensuring that the description of the AI system's workings is accurate; (iii) generalizability: creating a solution approach that works well for different types of AI systems; and (iv) minimal requirements: creating an assessment system that does not place complex requirements on AI systems to support third-party assessment, since otherwise the manufacturers of AI systems might not support such an assessment. To satisfy these properties, this dissertation presents algorithms and requirements that enable user-aligned autonomous assessment, helping the user understand the limits of a black-box AI system's safe operability. It proposes a personalized AI assessment module that discovers the high-level "capabilities" of an AI system with arbitrary internal planning algorithms/policies and learns an accurate symbolic description of these capabilities in terms of concepts that the user understands. The dissertation also includes the associated theoretical results and empirical evaluations. The results show that (i) a primitive query-response interface can enable the development of autonomous assessment modules that efficiently derive a causally accurate, user-interpretable model of the system's capabilities, and (ii) such descriptions are easier for users to understand and reason with than the agent's primitive actions.
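A schematic sketch of the kind of primitive query-response loop described above follows; the query interface, state representation, and toy agent are illustrative assumptions, not the dissertation's actual algorithms.

```python
# Schematic sketch: the assessment module poses (initial state, goal) queries
# to a black-box agent and records outcomes as a crude capability model.
from typing import Callable

State = tuple[str, ...]   # user-level concepts that currently hold, e.g. ("has_key",)

def assess_capabilities(
    can_achieve: Callable[[State, State], bool],        # assumed query interface
    queries: list[tuple[State, State]],
) -> dict[tuple[State, State], bool]:
    """Pose each query to the black box and record whether it can be solved."""
    model: dict[tuple[State, State], bool] = {}
    for init, goal in queries:
        model[(init, goal)] = can_achieve(init, goal)   # one primitive query-response
    return model

# Toy black-box agent: it can open the door only when it starts with the key.
def toy_agent(init: State, goal: State) -> bool:
    return "door_open" not in goal or "has_key" in init

capability_model = assess_capabilities(
    toy_agent,
    queries=[((), ("door_open",)), (("has_key",), ("door_open",))],
)
print(capability_model)
```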
Contributors: Verma, Pulkit (Author) / Srivastava, Siddharth (Thesis advisor) / Cooke, Nancy (Committee member) / Fainekos, Georgios (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2024
Description
Computer vision technology automatically extracts high-level, meaningful information from visual data such as images or videos, and object recognition and detection algorithms are essential in most computer vision applications. In this dissertation, we focus on developing algorithms for real-life computer vision applications, presenting innovative algorithms for object segmentation and feature extraction for object and action recognition in video data, sparse feature selection algorithms for medical image analysis, and automated feature extraction using a convolutional neural network for blood cancer grading.

To detect and classify objects in video, the objects have to be separated from the background, and discriminant features must then be extracted from the region of interest before being fed to a classifier. Effective object segmentation and feature extraction are often application specific and pose major challenges for object detection and classification tasks. In this dissertation, we present an effective object-flow-based ROI generation algorithm for segmenting moving objects in video data, which can be applied in surveillance and self-driving vehicle settings. Optical flow can also serve as a feature for human action recognition, and we show that using optical flow features with a pre-trained convolutional neural network improves the performance of human action recognition algorithms. Both algorithms outperformed the state of the art at the time.
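As a hedged sketch of one way flow-based ROI generation can work, the snippet below uses dense optical flow from OpenCV and a simple magnitude threshold; the parameters, morphology step, and thresholds are assumptions rather than the dissertation's exact object-flow method.

```python
# Sketch: derive candidate ROIs for moving objects from dense optical flow
# between two consecutive grayscale frames (parameter values are assumptions).
import cv2
import numpy as np

def moving_object_rois(prev_gray: np.ndarray, curr_gray: np.ndarray,
                       mag_thresh: float = 2.0) -> list[tuple[int, int, int, int]]:
    # Dense Farneback optical flow between the two frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Pixels with large motion form a foreground mask; close small holes.
    mask = (mag > mag_thresh).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Each sufficiently large motion blob becomes one candidate ROI (x, y, w, h).
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
```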

Medical images and videos pose unique challenges for image understanding, mainly because tissues and cells are often irregularly shaped, colored, and textured, and hand-selecting the most discriminant features is difficult; an automated feature selection method is therefore desired. Sparse learning is a technique for extracting the most discriminant and representative features from raw visual data. However, sparse learning with L1 regularization only takes sparsity in the feature dimension into consideration; we improve the algorithm so that it also selects the type of features, entirely removing less important or noisy feature types from the feature set. We demonstrate this algorithm by analyzing endoscopy images to detect unhealthy abnormalities in the esophagus and stomach, such as ulcers and cancer. Besides the sparsity constraint, other application-specific constraints and prior knowledge may also need to be incorporated into the loss function of sparse learning to obtain the desired results. We demonstrate how to incorporate a similar-inhibition constraint and gaze and attention priors in sparse dictionary selection for gastroscopic video summarization, enabling intelligent key-frame extraction from gastroscopic video data. With recent advances in multi-layer neural networks, automatic end-to-end feature learning has become feasible. Convolutional neural networks mimic the mammalian visual cortex and can extract the most discriminant features automatically from training samples. We present a convolutional neural network with a hierarchical classifier to grade the severity of Follicular Lymphoma, a type of blood cancer, and it reaches 91% accuracy, on par with analysis by expert pathologists.
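One standard way to contrast plain L1 sparsity with a penalty that can remove entire feature types is the group-style formulation sketched below; the dissertation's exact objective may differ.

```latex
% Sketch in standard notation: entrywise L1 regularization versus a group
% penalty that can zero out a whole feature type g at once.
\[
  \min_{W}\; \lVert Y - X W \rVert_F^2 + \lambda \lVert W \rVert_1
  \qquad \text{vs.} \qquad
  \min_{W}\; \lVert Y - X W \rVert_F^2
  + \lambda \sum_{g \in \mathcal{G}} \sqrt{d_g}\, \lVert W_g \rVert_F ,
\]
% where \mathcal{G} partitions the rows of W into feature types, W_g is the
% block of rows belonging to type g, and d_g is its size; the group penalty
% drives entire blocks (noisy or unimportant feature types) to zero.
```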

Developing real-world computer vision applications involves more than developing core vision algorithms to extract and understand information from visual data; it is also subject to many practical requirements and constraints, such as hardware and computing infrastructure, cost, robustness to lighting changes and deformation, and ease of use and deployment. The general processing pipeline and system architecture of computer-vision-based applications share many design principles. We developed common processing components and a generic framework for computer vision applications, as well as a versatile scale-adaptive template matching algorithm for object detection. We demonstrate these design principles and best practices by developing and deploying a complete computer vision application in real life, a multi-channel water level monitoring system, whose techniques and design methodology can be generalized to other real-life applications. General software engineering principles, such as modularity, abstraction, robustness to requirement changes, and generality, are all demonstrated in this research.
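A minimal sketch of the scale-adaptive template matching idea, using OpenCV's normalized cross-correlation over a range of scales, follows; the scale grid and scoring are assumptions, and the dissertation's algorithm is more involved than this illustration.

```python
# Sketch: search for the best template match over several template scales.
import cv2
import numpy as np

def match_template_multiscale(image_gray: np.ndarray, template_gray: np.ndarray,
                              scales=np.linspace(0.5, 1.5, 11)):
    """Return (score, (x, y, w, h)) of the best match across the scale range."""
    best_score, best_box = -1.0, None
    for s in scales:
        t = cv2.resize(template_gray, None, fx=s, fy=s)
        if t.shape[0] > image_gray.shape[0] or t.shape[1] > image_gray.shape[1]:
            continue  # template larger than the image at this scale
        res = cv2.matchTemplate(image_gray, t, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(res)
        if max_val > best_score:
            best_score = max_val
            best_box = (max_loc[0], max_loc[1], t.shape[1], t.shape[0])
    return best_score, best_box
```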
Contributors: Cao, Jun (Author) / Li, Baoxin (Thesis advisor) / Liu, Huan (Committee member) / Zhang, Yu (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Robots are becoming an important part of our lives and of industry. Although many robot control interfaces have been developed to simplify control and improve the user experience, users still cannot control robots comfortably. As robot capabilities improve, the requirements for universality and ease of use of robot control interfaces also increase. This research introduces a graphical interface for Linear Temporal Logic (LTL) specifications for mobile robots. It is a sketch-based interface built on the Android platform, which makes the LTL control interface friendlier to non-expert users. By predefining a set of areas of interest, the interface can quickly and efficiently create plans that satisfy extended plan goals in LTL. It also allows users to customize the paths for a plan by sketching a set of reference trajectories. Given the custom paths, the LTL specification, and the environment, the interface generates a plan that balances the customized paths and the LTL specification. We also show experimental results with the implemented interface.
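An illustrative LTL specification of the kind such an interface can produce over predefined areas of interest is shown below (the area names are placeholders): eventually visit the mailroom and afterwards the office, while always avoiding a restricted area.

```latex
% Illustrative specification; \Diamond = eventually, \Box = always, and
% \pi_{A} holds when the robot is inside area A.
\[
  \varphi \;=\;
  \Diamond \big( \pi_{\mathrm{mailroom}} \wedge \Diamond\, \pi_{\mathrm{office}} \big)
  \;\wedge\; \Box\, \neg\, \pi_{\mathrm{restricted}}
\]
```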
Contributors: Wei, Wei (Author) / Fainekos, Georgios (Thesis advisor) / Amor, Hani Ben (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Research literature was reviewed to find recommended tools and technologies for operating Unmanned Aerial System (UAS) fleets in an urban environment; however, restrictive legislation prohibits fully autonomous flight without an operator. Existing literature covers considerations for operating UAS fleets in a controlled environment, with an emphasis on the effect different networking approaches have on the topology of the UAS network. The primary technology used to implement UAS communications is the family of 802.11 protocols, which can transmit telemetry and a video stream using off-the-shelf hardware. Other implementations use low-frequency radios for long-distance communication, or higher-latency 4G LTE modems to access existing network infrastructure. However, a gap remains in testing different network topologies outside of a controlled environment.
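As a bare-bones illustration of telemetry over an 802.11 link, the sketch below broadcasts a small JSON packet over plain UDP; the packet fields, port, rate, and coordinates are assumptions, and real deployments would typically use an established telemetry protocol instead.

```python
# Illustrative only: 1 Hz UDP broadcast of a minimal telemetry packet.
import json
import socket
import time

TELEMETRY_PORT = 14555          # placeholder port, an assumption
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)

for seq in range(5):
    packet = {"uas_id": "uas-01", "seq": seq,
              "lat": 33.4242, "lon": -111.9281, "alt_m": 30.0,   # placeholder values
              "battery_pct": 87.5, "t": time.time()}
    sock.sendto(json.dumps(packet).encode(), ("255.255.255.255", TELEMETRY_PORT))
    time.sleep(1.0)             # 1 Hz telemetry rate
```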

With the correct permits in place, further research can explore how different UAS network topologies behave in an urban environment when implemented with off-the-shelf UAS hardware. In addition to testing different network topologies, this thesis covers the implementation of a secure, scalable system, built with modern cloud computation tools and services, that is capable of supporting a variable number of UAS. The system also supports end-to-end simulation, accounting for factors such as battery life and realistic UAS kinematics. The implementation of the system leads to new findings needed to deploy UAS fleets in urban environments.
Contributors: D'Souza, Daniel (Author) / Panchanathan, Sethuraman (Thesis advisor) / Berman, Spring (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Several physical systems in the real world involve continuous as well as discrete changes, ranging from natural dynamic systems, such as a bouncing ball, to robotic dynamic systems, such as planning the motion of a robot across obstacles. The key to effectively describing such dynamic systems is the ability to plan and verify the evolution of the continuous components of the system while simultaneously maintaining critical constraints. A framework that can effectively represent and find solutions for such physical systems would therefore be highly advantageous. Both hybrid automata and action languages are formal models for describing the evolution of dynamic systems. The action language C+ is a rich and expressive framework for formalizing physical systems, but it can be used only with physical systems in the discrete domain and is limited in its support for continuous-domain components of such systems. Hybrid automata are a well-established formalism for representing such complex physical systems at a theoretical level; however, they are not expressive enough to capture the complex relations between the components of the system the way C+ does.
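The bouncing ball mentioned above is the textbook example of such mixed continuous/discrete behavior; its hybrid automaton can be sketched as follows (c is the coefficient of restitution and g the gravitational acceleration).

```latex
% Classic bouncing-ball hybrid automaton: continuous flow under gravity,
% plus a discrete jump (velocity reversal with energy loss) at impact.
\[
  \text{flow: } \dot{h} = v,\;\; \dot{v} = -g \quad (h \ge 0), \qquad
  \text{jump: } h = 0 \,\wedge\, v < 0 \;\Longrightarrow\; v := -c\, v,
  \quad 0 < c < 1 .
\]
```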

This thesis establishes a formal relationship between these two formalisms by showing how to succinctly represent hybrid automata in an action language that is in turn defined as a high-level notation for answer set programming modulo theories (ASPMT), an extension of answer set programs to the first-order level. Furthermore, this encoding framework is shown to be more effective and expressive than hybrid automata by highlighting its ability to let states of a hybrid transition system be defined by complex relations among components that would otherwise be abstracted away in hybrid automata. The framework is realized in the implementation of the system CPLUS2ASPMT, which takes advantage of the state-of-the-art ordinary differential equation (ODE)-based SMT solver dReal to support ODE-based evolution of the continuous components of a dynamic system.
Contributors: Loney, Nikhil (Author) / Lee, Joohyung (Thesis advisor) / Fainekos, Georgios (Committee member) / Zhang, Yu (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
There have been multiple attempts to couple neural networks with external memory components for sequence learning problems. Such architectures have demonstrated success in algorithmic, sequence transduction, question-answering, and reinforcement learning tasks. The most notable of these attempts is the Neural Turing Machine (NTM), an implementation of the Turing Machine with a neural network controller that interacts with a continuous memory. Although the architecture is Turing complete and hence universally computational, it has seen limited success on complex real-world tasks.

In this thesis, I introduce an extension of the Neural Turing Machine, the Neural Harvard Machine, that implements a fully differentiable Harvard Machine framework with a feed-forward neural network controller. Unlike the NTM, it has two different memories - a read-only program memory and a read-write data memory. A sufficiently complex task is divided into smaller, simpler sub-tasks and the program memory stores parameters of pre-trained networks trained on these sub-tasks. The controller reads inputs from an input-tape, uses the data memory to store valuable signals and writes correct symbols to an output tape. The output symbols are a function of the outputs of each sub-network and the state of the data memory. Hence, the controller learns to load the weights of the appropriate program network to generate output symbols.
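A schematic, non-differentiable sketch of the read/execute loop described above is given below; the real Neural Harvard Machine uses a differentiable controller and soft memory addressing, and the program stubs, controller rule, and shapes here are illustrative assumptions only.

```python
# Schematic sketch: a controller selects a pre-trained program from a read-only
# program memory, applies it to the input symbol, and may update data memory.
import numpy as np

rng = np.random.default_rng(0)
program_memory = {                        # stand-ins for pre-trained sub-task networks
    "copy":    lambda x, mem: (x, mem),
    "reverse": lambda x, mem: (x[::-1], mem),
}
data_memory = np.zeros(16)                # read-write data memory

def controller(symbol: np.ndarray, mem: np.ndarray) -> str:
    """Stand-in for the feed-forward controller: decide which program to load."""
    return "reverse" if symbol.sum() > 0 else "copy"

input_tape = [rng.normal(size=4) for _ in range(3)]
output_tape = []
for symbol in input_tape:
    program = program_memory[controller(symbol, data_memory)]
    out, data_memory = program(symbol, data_memory)
    output_tape.append(out)
print(len(output_tape), "symbols written to the output tape")
```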

A wide range of experiments demonstrate that the Harvard Machine framework learns faster and performs better than the NTM and RNNs like LSTM, as the complexity of tasks increases.
Contributors: Bhatt, Manthan Bharat (Author) / Ben Amor, Hani (Thesis advisor) / Zhang, Yu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2020