Description
A skin lesion is an area of skin with abnormal growth or appearance compared with the skin around it. While most lesions are harmless, some can be warning signs of skin cancer. Melanoma is the deadliest form of skin cancer, and its early detection in dermoscopic images is crucial, as it increases the survival rate. The clinical ABCD rule (asymmetry, border irregularity, color variation, and diameter greater than 6 mm) is one of the most widely used methods for early melanoma recognition. However, accurate classification of melanoma remains extremely difficult for reasons including, but not limited to, the strong visual resemblance between melanoma and non-melanoma skin lesions and the low contrast between a lesion and the surrounding skin. There is an ever-growing need for correct and reliable detection of skin cancers. Advances in deep learning make it well suited to automatic detection, and such methods are useful to pathologists because they aid both efficiency and accuracy. In this thesis, various state-of-the-art deep learning frameworks are used, their parameters are analyzed, and innovative techniques are implemented to address the challenges of two tasks on skin lesions:
• Segmentation is the task of delineating regions of interest; it is used here to keep only the ROI and separate it from the background.
• Classification is the task of assigning the image a class, i.e., Melanoma (cancer) or Nevus (not cancer). A pre-trained model is used and fine-tuned to the given problem statement/dataset.
Experimental results show promise, as the implemented techniques reduce the false-negative rate, i.e., the neural network is less likely to misclassify a melanoma.
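As a hedged illustration of the fine-tuning step described above (not the thesis code; the backbone choice, frozen layers, and class weighting are assumptions), the sketch below swaps the classification head of an ImageNet-pretrained network for the two-class Melanoma/Nevus problem:

```python
# Hypothetical fine-tuning sketch (PyTorch); not the thesis implementation.
import torch
import torch.nn as nn
from torchvision import models

def build_melanoma_classifier(num_classes: int = 2) -> nn.Module:
    # Start from ImageNet weights and replace only the final layer.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False                # freeze the pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new trainable head
    return model

model = build_melanoma_classifier()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# Weighting melanoma errors more heavily is one common (assumed) way to
# trade precision for a lower false-negative rate:
criterion = nn.CrossEntropyLoss(weight=torch.tensor([2.0, 1.0]))  # [melanoma, nevus]
```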
Contributors: Verma, Vivek (Author) / Motsch, Sebastien (Thesis advisor) / Berman, Spring (Thesis advisor) / Zhuang, Houlong (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Cameras have become commonplace, with wide-ranging applications in phone photography, computer vision, and medical imaging. With a growing need to reduce size and cost while maintaining image quality, the need to look past the traditional style of camera is becoming more apparent. Several non-traditional cameras have proven to be promising options for size-constrained applications; while they may offer several advantages, they are also usually limited by image-quality degradation, whether from their optics or from the need to computationally reconstruct the captured image. In this thesis, we take a look at three of these non-traditional cameras: a pinhole camera, a diffusion-mask lensless camera, and an under-display camera (UDC).

For each of these cases, I present a feasible image-restoration pipeline to correct for the camera's particular limitations. For the pinhole camera, I present an early pipeline that makes practical pinhole photography possible by reducing the noise caused by low-light imaging, enhancing exposure levels, and sharpening the blur caused by the pinhole. For lensless cameras, we explore a neural network architecture that performs joint image reconstruction and point spread function (PSF) estimation to robustly recover images captured with multiple PSFs from different cameras. Using adversarial learning, this approach achieves improved reconstruction results that do not require explicit knowledge of the PSF at test time, and it shows an added improvement in the reconstruction model's ability to generalize to variations in the camera's PSF. This allows lensless cameras to be utilized in a wider range of applications that require multiple cameras, without the need to explicitly train a separate model for each new camera. For UDCs, we utilize a multi-stage approach to correct for low light transmission, blur, and haze. This pipeline uses a PyNET deep neural network architecture to perform the majority of the restoration, alongside a traditional optimization approach whose output is fused with the network's in a learned manner in the second stage to improve high-frequency features. I show results from this novel fusion approach that are on par with the state of the art.
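For intuition about the role the PSF plays in these pipelines, here is a minimal classical baseline (a Wiener deconvolution in NumPy, offered as a hedged sketch, not the learned joint-estimation network from the thesis); when the assumed PSF deviates from the true one, restoration quality drops, which is exactly the sensitivity the PSF-estimating model above is designed to reduce:

```python
# Minimal Wiener deconvolution baseline (NumPy); illustrates the role of the PSF,
# not the thesis's learned reconstruction/PSF-estimation network.
import numpy as np

def wiener_deconvolve(blurred: np.ndarray, psf: np.ndarray, snr: float = 100.0) -> np.ndarray:
    """Restore an image blurred by a known PSF, assuming additive noise."""
    # Pad the PSF to the image size and shift its center to the origin.
    psf_padded = np.zeros_like(blurred, dtype=float)
    psf_padded[:psf.shape[0], :psf.shape[1]] = psf
    psf_padded = np.roll(psf_padded, (-psf.shape[0] // 2, -psf.shape[1] // 2), axis=(0, 1))
    H = np.fft.fft2(psf_padded)
    # Wiener filter: conj(H) / (|H|^2 + 1/SNR)
    G = np.conj(H) / (np.abs(H) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * G))
```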
Contributors: Rego, Joshua D (Author) / Jayasuriya, Suren (Thesis advisor) / Blain Christen, Jennifer (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Most planning agents assume complete knowledge of the domain, which may not hold in scenarios where certain domain knowledge is missing. Such gaps may stem from design flaws or arise from domain ramifications or qualifications. In these cases, planning algorithms can produce highly undesirable behaviors. Planning with incomplete domain knowledge is more challenging than planning under partial observability, in the sense that the planning agent is unaware of the existence of the missing knowledge, rather than it merely being unobservable or partially observable. That is the difference between known unknowns and unknown unknowns.

In this thesis, I introduce and formulate this as the problem of Domain Concretization, the inverse of the extensively studied problem of domain abstraction. I then present a solution that starts from the incomplete domain model provided to the agent by the designer and uses teacher traces from human users to determine a candidate model set under a minimalistic model assumption. A robust plan is then generated to maximize the probability of success over the set of candidate models. In addition to a standard search formulation in model space, I propose a sample-based search method, along with an online version of it, to improve search time. The solution is evaluated on various International Planning Competition domains in which incompleteness was introduced by deleting certain predicates from the complete domain model, and it is also tested in a robot simulation domain to illustrate its effectiveness in handling incomplete domain knowledge. The results show that the plans generated by the algorithm increase the plan success rate without significantly increasing action cost.
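A toy rendition of the robust-plan selection step (hypothetical interfaces; `simulate` and the candidate/plan representations are assumptions, and the thesis formulates this as search in model space): sample candidate models consistent with the teacher traces and pick the plan that succeeds under the largest fraction of them.

```python
# Toy sketch of robust plan selection over sampled candidate models.
# `simulate(model, plan)` is a hypothetical predicate: does `plan` succeed under `model`?
import random
from typing import Callable, Sequence, TypeVar

Model = TypeVar("Model")
Plan = TypeVar("Plan")

def robust_plan(candidate_models: Sequence[Model],
                plans: Sequence[Plan],
                simulate: Callable[[Model, Plan], bool],
                num_samples: int = 50) -> Plan:
    # Sample-based variant: evaluate on a random subset of the candidate set.
    sample = random.sample(list(candidate_models),
                           min(num_samples, len(candidate_models)))
    # Choose the plan with the highest estimated probability of success.
    return max(plans, key=lambda p: sum(simulate(m, p) for m in sample) / len(sample))
```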
Contributors: Sharma, Akshay (Author) / Zhang, Yu (Thesis advisor) / Fainekos, Georgios (Committee member) / Srivastava, Siddharth (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Autonomous Driving (AD) systems are being actively researched and developed to solve the task of controlling vehicles safely without human intervention. One way to approach this task is deep Reinforcement Learning (RL). In deep RL, the main objective is to find an optimal control behavior, often called a policy, performed by an agent, which here is the AD system. This policy is usually learned by Deep Neural Networks (DNNs) from the observations the agent perceives, along with reward feedback received from the environment. However, recent studies have demonstrated the vulnerability of control policies learned through deep RL to adversarial attacks, which raises concerns about applying such policies to risk-sensitive tasks like AD. Previous work assumes that threats can be realized in two broad ways: targeted attacks that manipulate the agent's complete observation in real time, and untargeted attacks that manipulate objects in the environment. The former assumes full access to the agent's observations at almost all times, while the latter has no control over the outcome of the attack. This research investigates the feasibility of targeted attacks through physical adversarial objects in the environment, a threat that combines the effectiveness of the former with the practicality of the latter. Through simulations on a popular AD system, it is demonstrated that an attacker can make a fixed, otherwise optimal policy malfunction over time when an adversarial object is present, e.g., by inducing an unintended self-parking maneuver. The proposed approach lets the attacker learn a model of the environment's dynamics and also utilize common knowledge of the agent's dynamics to realize the attack. Several experiments empirically show the effectiveness of the proposed attack in different driving scenarios. Lastly, this work studies the robustness of the attack to object location, as well as the trade-off between attack strength and attack length, based on proposed evaluation metrics.
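A schematic of the targeted physical-attack loop (hedged sketch: the `env`, `policy`, and `target_score` interfaces are assumptions, and the thesis's actual method learns a dynamics model rather than using random search): the attacker searches over object parameters, rolls out the fixed victim policy, and keeps the parameters that best induce the target behavior.

```python
# Schematic random-search version of a targeted physical attack; the victim policy is fixed.
# `env.reset(object_params=...)`, `env.step(action) -> (obs, done)`, and `target_score`
# are assumed interfaces for illustration only.
import numpy as np

def attack(env, policy, target_score, horizon=200, iters=500, dim=3):
    best_params, best_score = None, -np.inf
    for _ in range(iters):
        params = np.random.uniform(-1.0, 1.0, size=dim)   # e.g., object pose/appearance
        obs = env.reset(object_params=params)
        total = 0.0
        for _ in range(horizon):
            obs, done = env.step(policy(obs))
            total += target_score(obs)   # how close the agent is to the attacker's goal
            if done:
                break
        if total > best_score:
            best_params, best_score = params, total
    return best_params
```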
Contributors: Buddareddygari, Prasanth (Author) / Yang, Yezhou (Thesis advisor) / Ren, Yi (Committee member) / Fainekos, Georgios (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Self-driving cars are a long-standing ambition of many AI scientists and engineers. In the last decade alone, self-driving systems such as Google Waymo, Tesla Autopilot, and Uber have been roaming the streets of many cities. In this rapidly expanding field, researchers all over the world are attempting to develop safer and more efficient AI agents that can navigate our cities. However, driving is a very complex task to master even for a human, let alone for a robot. It requires attention to and input from the car's surroundings, and it is nearly impossible to program by hand all the factors affecting such a complex task. Imitation learning offers a solution: agents learn a policy mapping observations to actions from demonstrations given by humans. Through imitation learning, one can readily teach self-driving cars the expected behavior in many scenarios. Despite the cars' autonomous nature, humans undeniably play a vital role in the development and execution of safe and trustworthy self-driving cars, and hence form the strongest link in this application of Human-Robot Interaction. Several approaches have been taken to incorporate this link between humans and self-driving cars, one of which is communicating a human's navigational instructions to the car. This communicative channel gives humans control over the agent's decisions as well as the ability to guide it in real time. This work explores the ability of imitation learning to create a self-driving agent that follows natural-language instructions, given by humans, that are grounded in descriptions of objects in the environment. The proposed model architecture handles latent temporal context in these instructions, making the agent capable of taking multiple decisions along its course. The work shows promising results that push the boundaries of natural-language instructions and their complexity in navigating self-driving cars through towns.
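A minimal sketch of a language-conditioned behavioral-cloning network (PyTorch; the shapes, encoders, and fusion choices are assumptions, not the thesis architecture): visual features and an encoded instruction are fused, and a recurrent layer carries the latent temporal context needed for multi-step decisions.

```python
# Minimal language-conditioned imitation-learning network (sketch, not the thesis model).
import torch
import torch.nn as nn

class InstructionFollowingPolicy(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, vis_dim=128, hidden=256, actions=3):
        super().__init__()
        self.vision = nn.Sequential(               # tiny CNN stand-in for the image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, vis_dim))
        self.language = nn.EmbeddingBag(vocab_size, embed_dim)  # bag-of-words instruction encoder
        self.rnn = nn.GRU(vis_dim + embed_dim, hidden, batch_first=True)  # latent temporal context
        self.head = nn.Linear(hidden, actions)     # e.g., steering, throttle, brake

    def forward(self, frames, instruction_tokens, hidden_state=None):
        # frames: (B, T, 3, H, W); instruction_tokens: (B, L)
        B, T = frames.shape[:2]
        vis = self.vision(frames.flatten(0, 1)).view(B, T, -1)
        lang = self.language(instruction_tokens).unsqueeze(1).expand(-1, T, -1)
        out, hidden_state = self.rnn(torch.cat([vis, lang], dim=-1), hidden_state)
        return self.head(out), hidden_state
```

Training would follow standard behavioral cloning: regress the predicted controls against the human demonstrator's actions at each timestep.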
Contributors: Moudhgalya, Nithish B (Author) / Amor, Hani Ben (Thesis advisor) / Baral, Chitta (Committee member) / Yang, Yezhou (Committee member) / Zhang, Wenlong (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
This work considers the task of vision-and-language inference (VLI): predicting whether an input sentence is true for given images or videos. It starts with an investigation of model robustness to a set of 13 linguistic transformations, categorized as Semantics-Preserving or Semantics-Inverting based on whether they change the meaning of the sentence. It is observed that existing VLI models degenerate to close-to-random performance when tested on these linguistic transformations, which include simple phenomena such as synonyms, antonyms, negation, swapping of subject and object, paraphrasing, and the substitution of pronouns, comparatives, and numbers. This observation is utilized to design STAT (Semantics-Transformed Adversarial Training), a model-agnostic and task-agnostic min-max optimization algorithm, with an inner maximization that utilizes semantic perturbations of input sentences to find adversarial samples and an outer minimization that updates model parameters. Extensive experiments on three benchmark datasets (NLVR2, VIOLIN, VQA "Yes-No") not only demonstrate large gains in robustness to adversarial input sentences but also show model-agnostic performance improvements. This work also presents the suite of linguistic transformations as a robustness benchmark that may benefit future research in vision-and-language robustness.
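A condensed view of the min-max training loop described above (hedged sketch; the model and loss interfaces are assumptions, and `transformations` stands in for the 13 linguistic transformations, each paired with a flag for whether it inverts the sentence's truth value): the inner step picks the semantic perturbation the model handles worst, and the outer step updates parameters on that adversarial sample.

```python
# Condensed sketch of STAT-style min-max training; interfaces are assumptions.
import torch

def stat_step(model, loss_fn, optimizer, image, sentence, label, transformations):
    # `transformations` is a list of (transform_fn, flips_label) pairs, e.g.
    # (synonym_substitution, False) or (negation, True).
    # Inner maximization: pick the semantic perturbation with the highest loss.
    with torch.no_grad():
        candidates = [(t(sentence), 1 - label if flips else label)
                      for t, flips in transformations]
        adv_sentence, adv_label = max(
            candidates, key=lambda c: loss_fn(model(image, c[0]), c[1]).item())
    # Outer minimization: gradient step on the worst-case adversarial sample.
    optimizer.zero_grad()
    loss = loss_fn(model(image, adv_sentence), adv_label)
    loss.backward()
    optimizer.step()
    return loss.item()
```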
Contributors: Chaudhary, Abhishek (Author) / Yang, Yezhou (Thesis advisor) / Li, Baoxin (Committee member) / Baral, Chitta (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
In videos that contain actions performed unintentionally, agents do not achieve their desired goals. In such videos, it is challenging for computer vision systems to understand high-level concepts such as goal-directed behavior. On the other hand, from a very early age, humans are able to understand the relation between an agent and their ultimate goal even if the action gets disrupted or unintentional effects occur. Inculcating this ability in artificially intelligent agents would make them better social learners, able to learn not just from their own mistakes (i.e., reinforcement learning) but also from others' mistakes. For example, this could greatly reduce the search space an artificially intelligent agent must explore to find the correct action sequence for a new goal, since it could learn from others what not to do, as well as how and when actions lead to undesired outcomes. To validate the ability of deep learning models to perform this task, the Weakly Augmented Oops (W-Oops) dataset is proposed, built upon the Oops dataset. W-Oops consists of 2,100 unintentional human action videos, with 44 goal-directed and 33 unintentional video-level activity labels collected through human annotation. Inspired by previous methods for tasks such as weakly supervised action localization, which show that good localization is possible without ground-truth segment annotations, this paper proposes a weakly supervised algorithm for localizing both the goal-directed and the unintentional temporal regions of a video using only video-level labels. In particular, an attention-based strategy is employed that predicts the temporal regions contributing most to the classification task, leveraging solely video-level labels. Meanwhile, the designed overlap regularization allows the model to focus on distinct portions of the video when inferring the goal-directed and unintentional activity, while guaranteeing their temporal ordering. Extensive quantitative experiments verify the validity of the localization method.
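To make the attention-plus-regularization idea concrete, here is a hedged sketch (PyTorch; the module names, dimensions, and the exact penalty form are assumptions, not the thesis code): per-frame attention weights pool segment features into a video-level prediction for each stream, and an overlap penalty discourages the goal-directed and unintentional attentions from selecting the same frames.

```python
# Sketch of weakly supervised temporal localization with two attention streams.
import torch
import torch.nn as nn

class TwoStreamAttention(nn.Module):
    def __init__(self, feat_dim=512, goal_classes=44, unint_classes=33):
        super().__init__()
        self.attn_goal = nn.Linear(feat_dim, 1)   # scores frames for goal-directed activity
        self.attn_unint = nn.Linear(feat_dim, 1)  # scores frames for unintentional activity
        self.cls_goal = nn.Linear(feat_dim, goal_classes)
        self.cls_unint = nn.Linear(feat_dim, unint_classes)

    def forward(self, feats):                     # feats: (B, T, feat_dim) segment features
        a_g = torch.softmax(self.attn_goal(feats), dim=1)   # (B, T, 1) over time
        a_u = torch.softmax(self.attn_unint(feats), dim=1)
        logits_g = self.cls_goal((a_g * feats).sum(dim=1))  # attention-pooled video feature
        logits_u = self.cls_unint((a_u * feats).sum(dim=1))
        # Overlap regularizer (assumed form): penalize both attentions peaking
        # on the same frames, so the two temporal regions stay distinct.
        overlap = (a_g.squeeze(-1) * a_u.squeeze(-1)).sum(dim=1).mean()
        return logits_g, logits_u, overlap
```

A separate ordering term (not shown) would be needed to guarantee that the goal-directed region precedes the unintentional one, as the abstract describes.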
Contributors: Chakravarthy, Arnav (Author) / Yang, Yezhou (Thesis advisor) / Davulcu, Hasan (Committee member) / Pavlic, Theodore (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
Human motion is an amalgamation of several physical traits, such as bipedal locomotion, posture, and manual dexterity, together with mental expectation. In addition to the “positive” body form defined by these traits, casting light on the body produces a “negative” of the body: its shadow. We often use silhouettes interchangeably with shadows to emphasize indifference to interior features. In a manner of speaking, the shadow is an alter ego that imitates the individual.

The principal value of the shadow is that it non-invasively and precisely reflects the actions of the individual it is attached to. We can therefore think of the body’s shadow not as the body itself but as its alter ego.

Based on this premise, my thesis creates an experiential system that extracts data describing the contour of the user’s shape and gives it a texture and life of its own, so as to emulate the user’s movements and postures and act as their extension. In technical terms, my thesis retrieves abstractions from a pre-indexed database, which can be generated from an offline data set or in real time, to complement the actions of a user standing in front of a low-cost optical motion-capture device such as the Microsoft Kinect. The result is the system’s interpretation of the action: modularized art created through the abstraction’s ‘similarity’ to the live action.

Through my research, I have developed a stable system that tackles the various connotations associated with shadows and the need to determine the ideal features contributing to the relevance of the actions performed. The implications of Factor Oracle [3] pattern interpretation are tested with a feature bin of videos. The system is also flexible across several nearest-neighbour search methods and a machine learning module that derive the same output. The overall purpose is to run in real time and provide constant feedback to the user; the approach can be expanded to handle larger dynamic data.

In addition to estimating human actions, my thesis tests various nearest-neighbour search methods in real time, depending on the data stream. This provides a basis for understanding the parameters that complement human activity recognition and feature matching in real time.
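As a hedged illustration of the nearest-neighbour matching step (the feature pipeline and dimensions are hypothetical; the thesis also considers Factor Oracle pattern interpretation and a machine learning module): contour features from the live Kinect silhouette are matched against a pre-indexed database of video clips.

```python
# Sketch of real-time nearest-neighbour lookup over a pre-indexed contour-feature database.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Assumed: each database clip is summarized by a fixed-length contour descriptor.
database_features = np.random.rand(1000, 64)      # stand-in for indexed video features
index = NearestNeighbors(n_neighbors=5).fit(database_features)

def match_live_frame(contour_descriptor: np.ndarray) -> np.ndarray:
    """Return indices of the database clips most similar to the live silhouette."""
    distances, indices = index.kneighbors(contour_descriptor.reshape(1, -1))
    return indices[0]                              # candidate clips to blend into the output
```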
Contributors: Seshasayee, Sudarshan Prashanth (Author) / Sha, Xin Wei (Thesis advisor) / Turaga, Pavan (Thesis advisor) / Tinapple, David A (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
To ensure system integrity, robots need to proactively avoid any unwanted physical perturbation that may damage the underlying hardware. In this thesis work, we investigate a machine learning approach that allows robots to anticipate impending physical perturbations from perceptual cues. In contrast to other approaches, which require knowledge about sources of perturbation to be encoded before deployment, our method is based on experiential learning. Robots learn to associate visual cues with subsequent physical perturbations and contacts; in turn, these extracted visual cues are used to predict potential future perturbations acting on the robot. To this end, we introduce a novel deep network architecture that combines multiple sub-networks for dealing with robot dynamics and perceptual input from the environment. We present a self-supervised approach for training the system that does not require any labeling of training data. Extensive experiments in a human-robot interaction task show that a robot can learn to predict physical contact by a human interaction partner without any prior information or labeling. Furthermore, the network successfully predicts physical contact from depth-stream input, from traditional video input, or from both modalities combined.
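A compact sketch of the multi-stream idea (PyTorch; the dimensions, encoders, and fusion are assumptions, not the thesis architecture): one sub-network encodes the visual stream, another encodes the robot's joint state, and a fused head predicts whether a physical contact will occur within a short horizon.

```python
# Sketch of a multi-stream contact-anticipation network (assumed shapes, not the thesis code).
import torch
import torch.nn as nn

class PerturbationPredictor(nn.Module):
    def __init__(self, joint_dim=7, hidden=128):
        super().__init__()
        self.visual = nn.Sequential(               # encodes a single depth frame (1 channel)
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, hidden))
        self.dynamics = nn.Sequential(             # encodes joint positions and velocities
            nn.Linear(joint_dim * 2, hidden), nn.ReLU())
        self.head = nn.Linear(hidden * 2, 1)       # probability of contact within the horizon

    def forward(self, frame, joint_state):
        z = torch.cat([self.visual(frame), self.dynamics(joint_state)], dim=-1)
        return torch.sigmoid(self.head(z))

# Self-supervised labeling (assumed scheme): a frame at time t is marked positive if a
# contact is registered within the next k timesteps, so no human annotation is needed.
```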
Contributors: Sur, Indranil (Author) / Amor, Heni B (Thesis advisor) / Fainekos, Georgios (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2017