Matching Items (22)
171818-Thumbnail Image.png
Description
Recent advances in autonomous vehicle (AV) technologies have ensured that autonomous driving will soon be present in real-world traffic. Despite the potential of AVs, many studies have shown that traffic accidents in hybrid traffic environments (where both AVs and human-driven vehicles (HVs) are present) are inevitable because of the unpredictability

Recent advances in autonomous vehicle (AV) technologies have ensured that autonomous driving will soon be present in real-world traffic. Despite the potential of AVs, many studies have shown that traffic accidents in hybrid traffic environments (where both AVs and human-driven vehicles (HVs) are present) are inevitable because of the unpredictability of human-driven vehicles. Given that eliminating accidents is impossible, an achievable goal of designing AVs is to design them in a way so that they will not be blamed for any accident in which they are involved in. This work proposes BlaFT – a Blame-Free motion planning algorithm in hybrid Traffic. BlaFT is designed to be compatible with HVs and other AVs, and will not be blamed for accidents in a structured road environment. Also, it proves that no accidents will happen if all AVs are using the BlaFT motion planner and that when in hybrid traffic, the AV using BlaFT will be blame-free even if it is involved in a collision. The work instantiated scores of BlaFT and HV vehicles in an urban road scape loop in the 'Simulation of Urban MObility', ran the simulation for several hours, and observe that as the percentage of BlaFT vehicles increases, the traffic becomes safer. Adding BlaFT vehicles to HVs also increases the efficiency of traffic as a whole by up to 34%.
ContributorsPark, Sanggu (Author) / Shrivastava, Aviral (Thesis advisor) / Wang, Ruoyu (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2022
189209-Thumbnail Image.png
Description
In natural language processing, language models have achieved remarkable success over the last few years. The Transformers are at the core of most of these models. Their success can be mainly attributed to an enormous amount of curated data they are trained on. Even though such language models are trained

In natural language processing, language models have achieved remarkable success over the last few years. The Transformers are at the core of most of these models. Their success can be mainly attributed to an enormous amount of curated data they are trained on. Even though such language models are trained on massive curated data, they often need specific extracted knowledge to understand better and reason. This is because often relevant knowledge may be implicit or missing, which hampers machine reasoning. Apart from that, manual knowledge curation is time-consuming and erroneous. Hence, finding fast and effective methods to extract such knowledge from data is important for improving language models. This leads to finding ideal ways to utilize such knowledge by incorporating them into language models. Successful knowledge extraction and integration lead to an important question of knowledge evaluation of such models by developing tools or introducing challenging test suites to learn about their limitations and improve them further. So to improve the transformer-based models, understanding the role of knowledge becomes important. In the pursuit to improve language models with knowledge, in this dissertation I study three broad research directions spanning across the natural language, biomedical and cybersecurity domains: (1) Knowledge Extraction (KX) - How can transformer-based language models be leveraged to extract knowledge from data? (2) Knowledge Integration (KI) - How can such specific knowledge be used to improve such models? (3) Knowledge Evaluation (KE) - How can language models be evaluated for specific skills and understand their limitations? I propose methods to extract explicit textual, implicit structural, missing textual, and missing structural knowledge from natural language and binary programs using transformer-based language models. I develop ways to improve the language model’s multi-step and commonsense reasoning abilities using external knowledge. Finally, I develop challenging datasets which assess their numerical reasoning skills in both in-domain and out-of-domain settings.
ContributorsPal, Kuntal Kumar (Author) / Baral, Chitta (Thesis advisor) / Wang, Ruoyu (Committee member) / Blanco, Eduardo (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created2023
187772-Thumbnail Image.png
Description
As computers and the Internet have become integral to daily life, the potential gains from exploiting these resources have increased significantly. The global landscape is now rife with highly skilled wrongdoers seeking to steal from and disrupt society. In order to safeguard society and its infrastructure, a comprehensive approach to

As computers and the Internet have become integral to daily life, the potential gains from exploiting these resources have increased significantly. The global landscape is now rife with highly skilled wrongdoers seeking to steal from and disrupt society. In order to safeguard society and its infrastructure, a comprehensive approach to research is essential. This work aims to enhance security from three unique viewpoints by expanding the resources available to educators, users, and analysts. For educators, a capture the flag as-a-service was developed to support cybersecurity education. This service minimizes the skill and time needed to establish the infrastructure for hands-on hacking experiences for cybersecurity students. For users, a tool called CloakX was created to improve online anonymity. CloakX prevents the identification of browser extensions by employing both static and dynamic rewriting techniques, thwarting contemporary methods of detecting installed extensions and thus protecting user identity. Lastly, for cybersecurity analysts, a tool named Witcher was developed to automate the process of crawling and exercising web applications while identifying web injection vulnerabilities. Overall, these contributions serve to strengthen security education, bolster privacy protection for users, and facilitate vulnerability discovery for cybersecurity analysts.
ContributorsTrickel, Erik (Author) / Doupe, Adam (Thesis advisor) / Shoshitaishvili, Yan (Thesis advisor) / Bao, Tiffany (Committee member) / Wang, Ruoyu (Committee member) / Arizona State University (Publisher)
Created2023
171782-Thumbnail Image.png
Description
Security requirements are at the heart of developing secure, invulnerable software. Without embedding security principles in the software development life cycle, the likelihood of producing insecure software increases, putting the consumers of that software at great risk. For large-scale software development, this problem is complicated as there may be hundreds

Security requirements are at the heart of developing secure, invulnerable software. Without embedding security principles in the software development life cycle, the likelihood of producing insecure software increases, putting the consumers of that software at great risk. For large-scale software development, this problem is complicated as there may be hundreds or thousands of security requirements that need to be met, and it only worsens if the software development project is developed by a distributed development team. In this thesis, an approach is provided for software security requirement traceability for large-scale and complex software development projects being developed by distributed development teams. The approach utilizes blockchain technology to improve the automation of security requirement satisfaction and create a more transparent and trustworthy development environment for distributed development teams. The approach also introduces immutability, auditability, and non-repudiation into the security requirement traceability process. The approach is evaluated against existing software security requirement solutions.
ContributorsKulkarni, Adi Deepak (Author) / Yau, Stephen S. (Thesis advisor) / Banerjee, Ayan (Committee member) / Wang, Ruoyu (Committee member) / Baek, Jaejong (Committee member) / Arizona State University (Publisher)
Created2022
157781-Thumbnail Image.png
Description
Virtualization technologies are widely used in modern computing systems to deliver shared resources to heterogeneous applications. Virtual Machines (VMs) are the basic building blocks for Infrastructure as a Service (IaaS), and containers are widely used to provide Platform as a Service (PaaS). Although it is generally believed that containers have

Virtualization technologies are widely used in modern computing systems to deliver shared resources to heterogeneous applications. Virtual Machines (VMs) are the basic building blocks for Infrastructure as a Service (IaaS), and containers are widely used to provide Platform as a Service (PaaS). Although it is generally believed that containers have less overhead than VMs, an important tradeoff which has not been thoroughly studied is the effectiveness of performance isolation, i.e., to what extent the virtualization technology prevents the applications from affecting each other’s performance when they share the resources using separate VMs or containers. Such isolation is critical to provide performance guarantees for applications consolidated using VMs or containers. This paper provides a comprehensive study on the performance isolation for three widely used virtualization technologies, full virtualization, para-virtualization, and operating system level virtualization, using Kernel-based Virtual Machine (KVM), Xen, and Docker containers as the representative implementations of these technologies. The results show that containers generally have less performance loss (up to 69% and 41% compared to KVM and Xen in network latency experiments, respectively) and better scalability (up to 83.3% and 64.6% faster compared to KVM and Xen when increasing number of VMs/containers to 64, respectively), but they also suffer from much worse isolation (up to 111.8% and 104.92% slowdown compared to KVM and Xen when adding disk stress test in TeraSort experiments under full usage (FU) scenario, respectively). The resource reservation tools help virtualization technologies achieve better performance (up to 85.9% better disk performance in TeraSort under FU scenario), but cannot help them avoid all impacts.
ContributorsHuang, Zige (Author) / Zhao, Ming (Thesis advisor) / Sarwat, Mohamed (Committee member) / Wang, Ruoyu (Committee member) / Arizona State University (Publisher)
Created2019
166285-Thumbnail Image.png
Description

A proposed solution for the decompilation of binaries that include Intel Advanced Vector Extension instruction sets is presented, along with an explanation of the methodology and an overview of the difficulties encountered with the current decompilation process. A simple approach was made to convert vector operations into scalar operations reflected

A proposed solution for the decompilation of binaries that include Intel Advanced Vector Extension instruction sets is presented, along with an explanation of the methodology and an overview of the difficulties encountered with the current decompilation process. A simple approach was made to convert vector operations into scalar operations reflected in new assembly code. This new code overwrites instructions using AVX registers so that all available decompilation software is able to properly decompile binaries using these registers. The results show that this approach is functional and successful at resolving the decompilation problem. However, there may be a way to optimize the performance of the output. In conclusion, our theoretical work can easily be extended and applied to a wider range of instructions and instruction sets to further resolve related decompilation issues with binaries utilizing external instructions.

ContributorsEdge, Hannah (Author) / Wang, Ruoyu (Thesis director) / Shoshitaishvili, Yan (Committee member) / Barrett, The Honors College (Contributor) / Computer Science and Engineering Program (Contributor)
Created2022-05
187402-Thumbnail Image.png
Description
Large software tend to have a large number of configuration options that can be tuned to a varying degree in order to run the software in a specific way. These configuration options cause a change in the execution of the software, and therefore affect the code coverage of the software.

Large software tend to have a large number of configuration options that can be tuned to a varying degree in order to run the software in a specific way. These configuration options cause a change in the execution of the software, and therefore affect the code coverage of the software. This gives rise to the problem of understanding how much a certain configuration change affects the code coverage of the software in a measurable way. It also raises the question of effectively mapping code coverage to a configuration change. Solutions to these problems could give way to increasing efficiency in various areas of software security, like maximizing code coverage in fuzz testing and vulnerability identification in specific configurations.In this work, I perform analyze widely used software, such as the database cache `Redis' and web servers like `Nginx' and `Apache httpd'. I perform fuzz tests on multiple configurations of each of these software to measure the difference in code coverage caused by each configuration. I use Coverage Instrumentation to obtain traces for each software in their configurations, and then I analyze these traces to understand the configuration's impact on the software's code coverage. In conclusion, I describe a method to measure how much code coverage differs for each configuration with respect to the default configuration of the software, and how certain configurations have a much larger difference in code coverage with respect to the default configuration than others, analyze the overlap in code coverage between the configurations and finally find the root causes of the differing code coverage.
ContributorsKumbhar, Swapnil (Author) / Shoshitaishvili, Yan (Thesis advisor) / Wang, Ruoyu (Committee member) / Xiao, Xusheng (Committee member) / Arizona State University (Publisher)
Created2023
171778-Thumbnail Image.png
Description
Honeypots – cyber deception technique used to lure attackers into a trap. They contain fake confidential information to make an attacker believe that their attack has been successful. One of the prerequisites for a honeypot to be effective is that it needs to be undetectable. Deploying sniffing and event logging

Honeypots – cyber deception technique used to lure attackers into a trap. They contain fake confidential information to make an attacker believe that their attack has been successful. One of the prerequisites for a honeypot to be effective is that it needs to be undetectable. Deploying sniffing and event logging tools alongside the honeypot also helps understand the mindset of the attacker after successful attacks. Is there any data that backs up the claim that honeypots are effective in real life scenarios? The answer is no.Game-theoretic models have been helpful to approximate attacker and defender actions in cyber security. However, in the past these models have relied on expert- created data. The goal of this research project is to determine the effectiveness of honeypots using real-world data. So, how to deploy effective honeypots? This is where honey-patches come into play. Honey-patches are software patches designed to hinder the attacker’s ability to determine whether an attack has been successful or not. When an attacker launches a successful attack on a software, the honey-patch transparently redirects the attacker into a honeypot. The honeypot contains fake information which makes the attacker believe they were successful while in reality they were not. After conducting a series of experiments and analyzing the results, there is a clear indication that honey-patches are not the perfect application security solution having both pros and cons.
ContributorsChauhan, Purv Rakeshkumar (Author) / Doupe, Adam (Thesis advisor) / Bao, Youzhi (Committee member) / Wang, Ruoyu (Committee member) / Arizona State University (Publisher)
Created2022
171701-Thumbnail Image.png
Description
Reverse engineering is a process focused on gaining an understanding for the intricaciesof a system. This practice is critical in cybersecurity as it promotes the findings and patching of vulnerabilities as well as the counteracting of malware. Disassemblers and decompilers have become essential when reverse engineering due to the readability of information they

Reverse engineering is a process focused on gaining an understanding for the intricaciesof a system. This practice is critical in cybersecurity as it promotes the findings and patching of vulnerabilities as well as the counteracting of malware. Disassemblers and decompilers have become essential when reverse engineering due to the readability of information they transcribe from binary files. However, these tools still tend to produce involved and complicated outputs that hinder the acquisition of knowledge during binary analysis. Cognitive Load Theory (CLT) explains that this hindrance is due to the human brain’s inability to process superfluous amounts of data. CLT classifies this data into three cognitive load types — intrinsic, extraneous, and germane — that each can help gauge complex procedures. In this research paper, a novel program call graph is presented accounting for these CLT principles. The goal of this graphical view is to reduce the cognitive load tied to the depiction of binary information and to enhance the overall binary analysis process. This feature was implemented within the binary analysis tool, angr and it’s user interface counterpart, angr-management. Additionally, this paper will examine a conducted user study to quantitatively and qualitatively evaluate the effectiveness of the newly proposed proximity view (PV). The user study includes a binary challenge solving portion measured by defined metrics and a survey phase to receive direct participant feedback regarding the view. The results from this study show statistically significant evidence that PV aids in challenge solving and improves the overall understanding binaries. The results also signify that this improvement comes with the cost of time. The survey section of the user study further indicates that users find PV beneficial to the reverse engineering process, but additional information needs to be included in future developments.
ContributorsSmits, Sean (Author) / Wang, Ruoyu (Thesis advisor) / Shoshitaishvili, Yan (Thesis advisor) / Doupe, Adam (Committee member) / Arizona State University (Publisher)
Created2022
171711-Thumbnail Image.png
Description
Binary analysis and software debugging are critical tools in the modern softwaresecurity ecosystem. With the security arms race between attackers discovering and exploiting vulnerabilities and the development teams patching bugs ever-tightening, there is an immense need for more tooling to streamline the binary analysis and debugging processes. Whether attempting to find the root

Binary analysis and software debugging are critical tools in the modern softwaresecurity ecosystem. With the security arms race between attackers discovering and exploiting vulnerabilities and the development teams patching bugs ever-tightening, there is an immense need for more tooling to streamline the binary analysis and debugging processes. Whether attempting to find the root cause for a buffer overflow or a segmentation fault, the analysis process often involves manually tracing the movement of data throughout a program’s life cycle. Up until this point, there has not been a viable solution to the human limitation of maintaining a cohesive mental image of the intricacies of a program’s data flow. This thesis proposes a novel data dependency graph (DDG) analysis as an addi- tion to angr’s analyses suite. This new analysis ingests a symbolic execution trace in order to generate a directed acyclic graph of the program’s data dependencies. In addition to the development of the backend logic needed to generate this graph, an angr management view to visualize the DDG was implemented. This user interface provides functionality for ancestor and descendant dependency tracing and sub-graph creation. To evaluate the analysis, a user study was conducted to measure the view’s efficacy in regards to binary analysis and software debugging. The study consisted of a control group and experimental group attempting to solve a series of 3 chal- lenges and subsequently providing feedback concerning perceived functionality and comprehensibility pertaining to the view. The results show that the view had a positive trend in relation to challenge-solving accuracy in its target domain, as participants solved 32% more challenges 21% faster when using the analysis than when using vanilla angr management.
ContributorsCapuano, Bailey Kellen (Author) / Shoshitaishvili, Yan (Thesis advisor) / Wang, Ruoyu (Thesis advisor) / Doupe, Adam (Committee member) / Arizona State University (Publisher)
Created2022