Matching Items (107)
Filtering by

Clear all filters

161939-Thumbnail Image.png
Description
Traditional Reinforcement Learning (RL) assumes to learn policies with respect to reward available from the environment but sometimes learning in a complex domain requires wisdom which comes from a wide range of experience. In behavior based robotics, it is observed that a complex behavior can be described by a combination

Traditional Reinforcement Learning (RL) assumes to learn policies with respect to reward available from the environment but sometimes learning in a complex domain requires wisdom which comes from a wide range of experience. In behavior based robotics, it is observed that a complex behavior can be described by a combination of simpler behaviors. It is tempting to apply similar idea such that simpler behaviors can be combined in a meaningful way to tailor the complex combination. Such an approach would enable faster learning and modular design of behaviors. Complex behaviors can be combined with other behaviors to create even more advanced behaviors resulting in a rich set of possibilities. Similar to RL, combined behavior can keep evolving by interacting with the environment. The requirement of this method is to specify a reasonable set of simple behaviors. In this research, I present an algorithm that aims at combining behavior such that the resulting behavior has characteristics of each individual behavior. This approach has been inspired by behavior based robotics, such as the subsumption architecture and motor schema-based design. The combination algorithm outputs n weights to combine behaviors linearly. The weights are state dependent and change dynamically at every step in an episode. This idea is tested on discrete and continuous environments like OpenAI’s “Lunar Lander” and “Biped Walker”. Results are compared with related domains like Multi-objective RL, Hierarchical RL, Transfer learning, and basic RL. It is observed that the combination of behaviors is a novel way of learning which helps the agent achieve required characteristics. A combination is learned for a given state and so the agent is able to learn faster in an efficient manner compared to other similar approaches. Agent beautifully demonstrates characteristics of multiple behaviors which helps the agent to learn and adapt to the environment. Future directions are also suggested as possible extensions to this research.
ContributorsVora, Kevin Jatin (Author) / Zhang, Yu (Thesis advisor) / Yang, Yezhou (Committee member) / Praharaj, Sarbeswar (Committee member) / Arizona State University (Publisher)
Created2021
161967-Thumbnail Image.png
Description
Machine learning models can pick up biases and spurious correlations from training data and projects and amplify these biases during inference, thus posing significant challenges in real-world settings. One approach to mitigating this is a class of methods that can identify filter out bias-inducing samples from the training datasets to

Machine learning models can pick up biases and spurious correlations from training data and projects and amplify these biases during inference, thus posing significant challenges in real-world settings. One approach to mitigating this is a class of methods that can identify filter out bias-inducing samples from the training datasets to force models to avoid being exposed to biases. However, the filtering leads to a considerable wastage of resources as most of the dataset created is discarded as biased. This work deals with avoiding the wastage of resources by identifying and quantifying the biases. I further elaborate on the implications of dataset filtering on robustness (to adversarial attacks) and generalization (to out-of-distribution samples). The findings suggest that while dataset filtering does help to improve OOD(Out-Of-Distribution) generalization, it has a significant negative impact on robustness to adversarial attacks. It also shows that transforming bias-inducing samples into adversarial samples (instead of eliminating them from the dataset) can significantly boost robustness without sacrificing generalization.
ContributorsSachdeva, Bhavdeep Singh (Author) / Baral, Chitta (Thesis advisor) / Liu, Huan (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2021
162001-Thumbnail Image.png
Description
Floating trash objects are very commonly seen on water bodies such as lakes, canals and rivers. With the increase of plastic goods and human activities near the water bodies, these trash objects can pile up and cause great harm to the surrounding environment. Using human workers to clear out these

Floating trash objects are very commonly seen on water bodies such as lakes, canals and rivers. With the increase of plastic goods and human activities near the water bodies, these trash objects can pile up and cause great harm to the surrounding environment. Using human workers to clear out these trash is a hazardous and time-consuming task. Employing autonomous robots for these tasks is a better approach since it is more efficient and faster than humans. However, for a robot to clean the trash objects, a good detection algorithm is required. Real-time object detection on water surfaces is a challenging issue due to nature of the environment and the volatility of the water surface. In addition to this, running an object detection algorithm on an on-board processor of a robot limits the amount of CPU consumption that the algorithm can utilize. In this thesis, a computationally low cost object detection approach for robust detection of trash objects that was run on an on-board processor of a multirotor is presented. To account for specular reflections on the water surface, we use a polarization filter and integrate a specularity removal algorithm on our approach as well. The challenges faced during testing and the means taken to eliminate those challenges are also discussed. The algorithm was compared with two other object detectors using 4 different metrics. The testing was carried out using videos of 5 different objects collected at different illumination conditions over a lake using a multirotor. The results indicate that our algorithm is much suitable to be employed in real-time since it had the highest processing speed of 21 FPS, the lowest CPU consumption of 37.5\% and considerably high precision and recall values in detecting the object.
ContributorsSyed, Danish Faraaz (Author) / Zhang, Wenlong (Thesis advisor) / Yang, Yezhou (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)
Created2021
153928-Thumbnail Image.png
Description
The work presented in this manuscript has the overarching theme of radiation. The two forms of radiation of interest are neutrons, i.e. nuclear, and electric fields. The ability to detect such forms of radiation have significant security implications that could also be extended to very practical industrial applications.

The work presented in this manuscript has the overarching theme of radiation. The two forms of radiation of interest are neutrons, i.e. nuclear, and electric fields. The ability to detect such forms of radiation have significant security implications that could also be extended to very practical industrial applications. The goal is therefore to detect, and even image, such radiation sources.

The method to do so revolved around the concept of building large-area sensor arrays. By covering a large area, we can increase the probability of detection and gather more data to build a more complete and clearer view of the environment. Large-area circuitry can be achieved cost-effectively by leveraging the thin-film transistor process of the display industry. With production of displays increasing with the explosion of mobile devices and continued growth in sales of flat panel monitors and television, the cost to build a unit continues to decrease.

Using a thin-film process also allows for flexible electronics, which could be taken advantage of in-house at the Flexible Electronics and Display Center. Flexible electronics implies new form factors and applications that would not otherwise be possible with their single crystal counterparts. To be able to effectively use thin-film technology, novel ways of overcoming the drawbacks of the thin-film process, namely the lower performance scale.

The two deliverable devices that underwent development are a preamplifier used in an active pixel sensor for neutron detection and a passive electric field imaging array. This thesis will cover the theory and process behind realizing these devices.
ContributorsChung, Hugh E (Author) / Allee, David R. (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Holbert, Keith E. (Committee member) / Arizona State University (Publisher)
Created2015
168821-Thumbnail Image.png
Description
It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions focusing on the prominent interest and aligning with the latent aspects beyond observations. This work presents

It is not merely an aggregation of static entities that a video clip carries, but alsoa variety of interactions and relations among these entities. Challenges still remain for a video captioning system to generate natural language descriptions focusing on the prominent interest and aligning with the latent aspects beyond observations. This work presents a Commonsense knowledge Anchored Video cAptioNing (dubbed as CAVAN) approach. CAVAN exploits inferential commonsense knowledge to assist the training of video captioning model with a novel paradigm for sentence-level semantic alignment. Specifically, commonsense knowledge is queried to complement per training caption by querying a generic knowledge atlas ATOMIC, and form the commonsense- caption entailment corpus. A BERT based language entailment model trained from this corpus then serves as a commonsense discriminator for the training of video captioning model, and penalizes the model from generating semantically misaligned captions. With extensive empirical evaluations on MSR-VTT, V2C and VATEX datasets, CAVAN consistently improves the quality of generations and shows higher keyword hit rate. Experimental results with ablations validate the effectiveness of CAVAN and reveals that the use of commonsense knowledge contributes to the video caption generation.
ContributorsShao, Huiliang (Author) / Yang, Yezhou (Thesis advisor) / Jayasuriya, Suren (Committee member) / Xiao, Chaowei (Committee member) / Arizona State University (Publisher)
Created2022
168842-Thumbnail Image.png
Description
There has been an explosion in the amount of data on the internet because of modern technology – especially image data – as a consequence of an exponential growth in the number of cameras existing in the world right now; from more extensive surveillance camera systems to billions of people

There has been an explosion in the amount of data on the internet because of modern technology – especially image data – as a consequence of an exponential growth in the number of cameras existing in the world right now; from more extensive surveillance camera systems to billions of people walking around with smartphones in their pockets that come with built-in cameras. With this sudden increase in the accessibility of cameras, most of the data that is getting captured through these devices is ending up on the internet. Researchers soon took leverage of this data by creating large-scale datasets. However, generating a dataset – let alone a large-scale one – requires a lot of man-hours. This work presents an algorithm that makes use of optical flow and feature matching, along with utilizing localization outputs from a Mask R-CNN, to generate large-scale vehicle datasets without much human supervision. Additionally, this work proposes a novel multi-view vehicle dataset (MVVdb) of 500 vehicles which is also generated using the aforementioned algorithm.There are various research problems in computer vision that can leverage a multi-view dataset, e.g., 3D pose estimation, and 3D object detection. On the other hand, a multi-view vehicle dataset can be used for a 2D image to 3D shape prediction, generation of 3D vehicle models, and even a more robust vehicle make and model recognition. In this work, a ResNet is trained on the multi-view vehicle dataset to perform vehicle re-identification, which is fundamentally similar to a vehicle make and recognition problem – also showcasing the usability of the MVVdb dataset.
ContributorsGuha, Anubhab (Author) / Yang, Yezhou (Thesis advisor) / Lu, Duo (Committee member) / Banerjee, Ayan (Committee member) / Arizona State University (Publisher)
Created2022
168796-Thumbnail Image.png
Description
Researchers have observed that the frequencies of leading digits in many man-made and naturally occurring datasets follow a logarithmic curve, with digits that start with the number 1 accounting for 30% of all numbers in the dataset and digits that start with the number 9 accounting for 5% of all

Researchers have observed that the frequencies of leading digits in many man-made and naturally occurring datasets follow a logarithmic curve, with digits that start with the number 1 accounting for 30% of all numbers in the dataset and digits that start with the number 9 accounting for 5% of all numbers in the dataset. This phenomenon, known as Benford's Law, is highly repeatable and appears in lists of numbers from electricity bills, stock prices, tax returns, house prices, death rates, lengths of rivers, and naturally occurring images. This paper will demonstrate that human speech spectra also follow Benford's Law. This observation is used to motivate a new set of features that can be efficiently extracted from speech and demonstrate that these features can be used to classify between human speech and synthetic speech.
ContributorsHsu, Leo (Author) / Berisha, Visar (Thesis advisor) / Spanias, Andreas (Committee member) / Papandreou-Suppappola, Antonia (Committee member) / Arizona State University (Publisher)
Created2022