Search Content

Determining the integrity of applications and operating systems using remote and local attesters

Description

This research describes software based remote attestation schemes for obtaining the integrity of an executing user application and the Operating System (OS) text section of an untrusted client platform. A trusted external entity issues a challenge to the client platform. The challenge is executable code which the client must execute,…

This research describes software based remote attestation schemes for obtaining the integrity of an executing user application and the Operating System (OS) text section of an untrusted client platform. A trusted external entity issues a challenge to the client platform. The challenge is executable code which the client must execute, and the code generates results which are sent to the external entity. These results provide the external entity an assurance as to whether the client application and the OS are in pristine condition. This work also presents a technique where it can be verified that the application which was attested, did not get replaced by a different application after completion of the attestation. The implementation of these three techniques was achieved entirely in software and is backward compatible with legacy machines on the Intel x86 architecture. This research also presents two approaches to incorporating software based "root of trust" using Virtual Machine Monitors (VMMs). The first approach determines the integrity of an executing Guest OS from the Host OS using Linux Kernel-based Virtual Machine (KVM) and qemu emulation software. The second approach implements a small VMM called MIvmm that can be utilized as a trusted codebase to build security applications such as those implemented in this research. MIvmm was conceptualized and implemented without using any existing codebase; its minimal size allows it to be trustworthy. Both the VMM approaches leverage processor support for virtualization in the Intel x86 architecture.

ContributorsSrinivasan, Raghunathan (Author) / Dasgupta, Partha (Thesis advisor) / Colbourn, Charles (Committee member) / Shrivastava, Aviral (Committee member) / Huang, Dijiang (Committee member) / Dewan, Prashant (Committee member) / Arizona State University (Publisher)

Created2011

Post-optimization: necessity analysis for combinatorial arrays

Description

Finding the optimal solution to a problem with an enormous search space can be challenging. Unless a combinatorial construction technique is found that also guarantees the optimality of the resulting solution, this could be an infeasible task. If such a technique is unavailable, different heuristic methods are generally used to…

Finding the optimal solution to a problem with an enormous search space can be challenging. Unless a combinatorial construction technique is found that also guarantees the optimality of the resulting solution, this could be an infeasible task. If such a technique is unavailable, different heuristic methods are generally used to improve the upper bound on the size of the optimal solution. This dissertation presents an alternative method which can be used to improve a solution to a problem rather than construct a solution from scratch. Necessity analysis, which is the key to this approach, is the process of analyzing the necessity of each element in a solution. The post-optimization algorithm presented here utilizes the result of the necessity analysis to improve the quality of the solution by eliminating unnecessary objects from the solution. While this technique could potentially be applied to different domains, this dissertation focuses on k-restriction problems, where a solution to the problem can be presented as an array. A scalable post-optimization algorithm for covering arrays is described, which starts from a valid solution and performs necessity analysis to iteratively improve the quality of the solution. It is shown that not only can this technique improve upon the previously best known results, it can also be added as a refinement step to any construction technique and in most cases further improvements are expected. The post-optimization algorithm is then modified to accommodate every k-restriction problem; and this generic algorithm can be used as a starting point to create a reasonable sized solution for any such problem. This generic algorithm is then further refined for hash family problems, by adding a conflict graph analysis to the necessity analysis phase. By recoloring the conflict graphs a new degree of flexibility is explored, which can further improve the quality of the solution.

ContributorsNayeri, Peyman (Author) / Colbourn, Charles (Thesis advisor) / Konjevod, Goran (Thesis advisor) / Sen, Arunabha (Committee member) / Stanzione Jr, Daniel (Committee member) / Arizona State University (Publisher)

Created2011

The classification of domain concepts in object-oriented systems

Description

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software…

The complexity of the systems that software engineers build has continuously grown since the inception of the field. What has not changed is the engineers' mental capacity to operate on about seven distinct pieces of information at a time. The widespread use of UML has led to more abstract software design activities, however the same cannot be said for reverse engineering activities. The introduction of abstraction to reverse engineering will allow the engineer to move farther away from the details of the system, increasing his ability to see the role that domain level concepts play in the system. In this thesis, we present a technique that facilitates filtering of classes from existing systems at the source level based on their relationship to concepts in the domain via a classification method using machine learning. We showed that concepts can be identified using a machine learning classifier based on source level metrics. We developed an Eclipse plugin to assist with the process of manually classifying Java source code, and collecting metrics and classifications into a standard file format. We developed an Eclipse plugin to act as a concept identifier that visually indicates a class as a domain concept or not. We minimized the size of training sets to ensure a useful approach in practice. This allowed us to determine that a training set of 7:5 to 10% is nearly as effective as a training set representing 50% of the system. We showed that random selection is the most consistent and effective means of selecting a training set. We found that KNN is the most consistent performer among the learning algorithms tested. We determined the optimal feature set for this classification problem. We discussed two possible structures besides a one to one mapping of domain knowledge to implementation. We showed that classes representing more than one concept are simply concepts at differing levels of abstraction. We also discussed composite concepts representing a domain concept implemented by more than one class. We showed that these composite concepts are difficult to detect because the problem is NP-complete.

ContributorsCarey, Maurice (Author) / Colbourn, Charles (Thesis advisor) / Collofello, James (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Smart compilers for reliable and power-efficient embedded computing

Description

Thanks to continuous technology scaling, intelligent, fast and smaller digital systems are now available at affordable costs. As a result, digital systems have found use in a wide range of application areas that were not even imagined before, including medical (e.g., MRI, remote or post-operative monitoring devices, etc.), automotive (e.g.,…

Thanks to continuous technology scaling, intelligent, fast and smaller digital systems are now available at affordable costs. As a result, digital systems have found use in a wide range of application areas that were not even imagined before, including medical (e.g., MRI, remote or post-operative monitoring devices, etc.), automotive (e.g., adaptive cruise control, anti-lock brakes, etc.), security systems (e.g., residential security gateways, surveillance devices, etc.), and in- and out-of-body sensing (e.g., capsule swallowed by patients measuring digestive system pH, heart monitors, etc.). Such computing systems, which are completely embedded within the application, are called embedded systems, as opposed to general purpose computing systems. In the design of such embedded systems, power consumption and reliability are indispensable system requirements. In battery operated portable devices, the battery is the single largest factor contributing to device cost, weight, recharging time, frequency and ultimately its usability. For example, in the Apple iPhone 4 smart-phone, the battery is $40\%$ of the device weight, occupies $36\%$ of its volume and allows only $7$ hours (over 3G) of talk time. As embedded systems find use in a range of sensitive applications, from bio-medical applications to safety and security systems, the reliability of the computations performed becomes a crucial factor. At our current technology-node, portable embedded systems are prone to expect failures due to soft errors at the rate of once-per-year; but with aggressive technology scaling, the rate is predicted to increase exponentially to once-per-hour. Over the years, researchers have been successful in developing techniques, implemented at different layers of the design-spectrum, to improve system power efficiency and reliability. Among the layers of design abstraction, I observe that the interface between the compiler and processor micro-architecture possesses a unique potential for efficient design optimizations. A compiler designer is able to observe and analyze the application software at a finer granularity; while the processor architect analyzes the system output (power, performance, etc.) for each executed instruction. At the compiler micro-architecture interface, if the system knowledge at the two design layers can be integrated, design optimizations at the two layers can be modified to efficiently utilize available resources and thereby achieve appreciable system-level benefits. To this effect, the thesis statement is that, ``by merging system design information at the compiler and micro-architecture design layers, smart compilers can be developed, that achieve reliable and power-efficient embedded computing through: i) Pure compiler techniques, ii) Hybrid compiler micro-architecture techniques, and iii) Compiler-aware architectures''. In this dissertation demonstrates, through contributions in each of the three compiler-based techniques, the effectiveness of smart compilers in achieving power-efficiency and reliability in embedded systems.

ContributorsJeyapaul, Reiley (Author) / Shrivastava, Aviral (Thesis advisor) / Vrudhula, Sarma (Committee member) / Clark, Lawrence (Committee member) / Colbourn, Charles (Committee member) / Arizona State University (Publisher)

Created2012

A domain-specific approach to verification & validation of software requirements

Description

Gathering and managing software requirements, known as Requirement Engineering (RE), is a significant and basic step during the Software Development Life Cycle (SDLC). Any error or defect during the RE step will propagate to further steps of SDLC and resolving it will be more costly than any defect in other…

Gathering and managing software requirements, known as Requirement Engineering (RE), is a significant and basic step during the Software Development Life Cycle (SDLC). Any error or defect during the RE step will propagate to further steps of SDLC and resolving it will be more costly than any defect in other steps. In order to produce better quality software, the requirements have to be free of any defects. Verification and Validation (V&V;) of requirements are performed to improve their quality, by performing the V&V; process on the Software Requirement Specification (SRS) document. V&V; of the software requirements focused to a specific domain helps in improving quality. A large database of software requirements from software projects of different domains is created. Software requirements from commercial applications are focus of this project; other domains embedded, mobile, E-commerce, etc. can be the focus of future efforts. The V&V; is done to inspect the requirements and improve the quality. Inspections are done to detect defects in the requirements and three approaches for inspection of software requirements are discussed; ad-hoc techniques, checklists, and scenario-based techniques. A more systematic domain-specific technique is presented for performing V&V; of requirements.

ContributorsChughtai, Rehman (Author) / Ghazarian, Arbi (Thesis advisor) / Bansal, Ajay (Committee member) / Millard, Bruce (Committee member) / Arizona State University (Publisher)

Created2012

Predicting Student Dropout in Self-Paced MOOC Course

Description

One persisting problem in Massive Open Online Courses (MOOCs) is the issue of student dropout from these courses. The prediction of student dropout from MOOC courses can identify the factors responsible for such an event and it can further initiate intervention before such an event to increase student success in…

One persisting problem in Massive Open Online Courses (MOOCs) is the issue of student dropout from these courses. The prediction of student dropout from MOOC courses can identify the factors responsible for such an event and it can further initiate intervention before such an event to increase student success in MOOC. There are different approaches and various features available for the prediction of student’s dropout in MOOC courses.In this research, the data derived from the self-paced math course ‘College Algebra and Problem Solving’ offered on the MOOC platform Open edX offered by Arizona State University (ASU) from 2016 to 2020 was considered. This research aims to predict the dropout of students from a MOOC course given a set of features engineered from the learning of students in a day. Machine Learning (ML) model used is Random Forest (RF) and this model is evaluated using the validation metrics like accuracy, precision, recall, F1-score, Area Under the Curve (AUC), Receiver Operating Characteristic (ROC) curve. The average rate of student learning progress was found to have more impact than other features. The model developed can predict the dropout or continuation of students on any given day in the MOOC course with an accuracy of 87.5%, AUC of 94.5%, precision of 88%, recall of 87.5%, and F1-score of 87.5% respectively. The contributing features and interactions were explained using Shapely values for the prediction of the model. The features engineered in this research are predictive of student dropout and could be used for similar courses to predict student dropout from the course. This model can also help in making interventions at a critical time to help students succeed in this MOOC course.

ContributorsDominic Ravichandran, Sheran Dass (Author) / Gary, Kevin (Thesis advisor) / Bansal, Ajay (Committee member) / Cunningham, James (Committee member) / Sannier, Adrian (Committee member) / Arizona State University (Publisher)

Created2021

Transformer-based Automatic Mapping of Clinical Notes to Specific Clinical Concepts

Description

A significant proportion of medical errors exist in crucial medical information, and most stem from misinterpreting non-standardized clinical notes. Clinical Skills exam offered by the United States Medical Licensing Examination (USMLE) was put in place to certify patient note-taking skills before medical students joined professional practices, offering the first line…

A significant proportion of medical errors exist in crucial medical information, and most stem from misinterpreting non-standardized clinical notes. Clinical Skills exam offered by the United States Medical Licensing Examination (USMLE) was put in place to certify patient note-taking skills before medical students joined professional practices, offering the first line of defense in protecting patients from medical errors. Nonetheless, the exams were discontinued in 2021 following high costs and resource usage in scoring the exams. This thesis compares four transformer-based models, namely BERT (Bidirectional Encoder Representations from Transformers) Base Uncased, Emilyalsentzer Bio_ClinicalBERT, RoBERTa (Robustly Optimized BERT Pre-Training Approach), and DeBERTa (Decoding-enhanced BERT with disentangled attention), with the goal to map free text in patient notes to clinical concepts present in the exam rubric. The impact of context-specific embeddings on BERT was also studied to determine the need for a clinical BERT in Clinical Skills exam. This thesis proposes the use of DeBERTa as a backbone model in patient note scoring for the USMLE Clinical Skills exam after comparing it with three other transformer models. Disentangled attention and enhanced mask decoder integrated into DeBERTa were credited for the high performance of DeBERTa as compared to the other models. Besides, the effect of meta pseudo labeling was also investigated in this thesis, which in turn, further enhanced DeBERTa’s performance.

ContributorsGanesh, Jay (Author) / Bansal, Ajay (Thesis advisor) / Mehlhase, Alexandra (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)

Created2022

Semantic Information Extraction From Natural Language Using a Learning and Rule-Based Approach

Description

Open Information Extraction (OIE) is a subset of Natural Language Processing (NLP) that constitutes the processing of natural language into structured and machine-readable data. This thesis uses data in Resource Description Framework (RDF) triple format that comprises of a subject, predicate, and object. The extraction of RDF triples from…

Open Information Extraction (OIE) is a subset of Natural Language Processing (NLP) that constitutes the processing of natural language into structured and machine-readable data. This thesis uses data in Resource Description Framework (RDF) triple format that comprises of a subject, predicate, and object. The extraction of RDF triples from natural language is an essential step towards importing data into web ontologies as part of the linked open data cloud on the Semantic web. There have been a number of related techniques for extraction of triples from plain natural language text including but not limited to ClausIE, OLLIE, Reverb, and DeepEx. This proposed study aims to reduce the dependency on conventional machine learning models since they require training datasets, and the models are not easily customizable or explainable. By leveraging a context-free grammar (CFG) based model, this thesis aims to address some of these issues while minimizing the trade-offs on performance and accuracy. Furthermore, a deep-dive is conducted to analyze the strengths and limitations of the proposed approach.

ContributorsSingh, Varun (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)

Created2023

Deep Learning-Based Monocular SLAM

Description

SLAM (Simultaneous Localization and Mapping) is a problem that has existed for a long time in robotics and autonomous navigation. The objective of SLAM is for a robot to simultaneously figure out its position in space and map its environment. SLAM is especially useful and mandatory for robots that want…

SLAM (Simultaneous Localization and Mapping) is a problem that has existed for a long time in robotics and autonomous navigation. The objective of SLAM is for a robot to simultaneously figure out its position in space and map its environment. SLAM is especially useful and mandatory for robots that want to navigate autonomously. The description might make it seem like a chicken and egg problem, but numerous methods have been proposed to tackle SLAM. Before the rise in the popularity of deep learning and AI (Artificial Intelligence), most existing algorithms involved traditional hard-coded algorithms that would receive and process sensor information and convert it into some solvable sensor-agnostic problem. The challenge for these sorts of methods is having to tackle dynamic environments. The more variety in the environment, the poorer the results. Also due to the increase in computational power and the capability of deep learning-based image processing, visual SLAM has become extremely viable and maybe even preferable to traditional SLAM algorithms. In this research, a deep learning-based solution to the SLAM problem is proposed, specifically monocular visual SLAM which is solving the problem of SLAM purely with a singular camera as the input, and the model is tested on the KITTI (Karlsruhe Institute of Technology & Toyota Technological Institute) odometry dataset.

ContributorsRupaakula, Krishna Sandeep (Author) / Bansal, Ajay (Thesis advisor) / Baron, Tyler (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created2023

Speedcuber Timer: Creating an Open-Source Platform for Smart Rubik’s Cube Applications

Description

Since the early 2000s the Rubik’s Cube has seen growing usage at speedsolving competitions and as an effective tool to teach Science, Technology, Engineering, Mathematics (STEM) topics at hundreds of schools and universities across the world. Recently, cube manufacturers have begun embedding sensors to enable digital face tracking. The live…

Since the early 2000s the Rubik’s Cube has seen growing usage at speedsolving competitions and as an effective tool to teach Science, Technology, Engineering, Mathematics (STEM) topics at hundreds of schools and universities across the world. Recently, cube manufacturers have begun embedding sensors to enable digital face tracking. The live feedback from these so called “smartcubes” enables a new wave of immersive solution tutorials and interactive educational games using the cube as a controller. Existing smartcube software has several limitations. Manufacturers’ applications support only a narrow set of puzzle form factors and application platforms, fragmenting the ecosystem. Most apps require an active internet connection for key features, limiting where users can practice with a smartcube. Finally, existing applications focus on a single 3x3x3connection, losing opportunities afforded by new form factors. This research demonstrates an open-source smartcube application which mitigates these limitations. Particular attention is given to creating an Application Programming Interface (API) for smartcube communication and building representative solve analysis tools. These innovations have included successful negotiations to re-license existing open-source Rubik’sCube software projects to support deployment on multiple platforms, particularly iOS. The resulting application supports smartcubes from three manufacturers, runs on two platforms (Android and iOS), functions entirely offline after an initial download of remote assets, demonstrates concurrent connections with up to six smartcubes, and supports all current and anticipated smartcube form factors. These foundational elements can accelerate future efforts to build smartcube applications, including automated performance feedback systems and personalized gamification of learning experiences. Such advances will hopefully enhance the Rubik’s Cube’s value both as a competitive toy and as a pedagogical tool in educational institutions worldwide.

ContributorsHale, Joseph (Author) / Bansal, Ajay (Thesis advisor) / Heinrichs, Robert (Committee member) / Gary, Kevin (Committee member) / Arizona State University (Publisher)

Created2023

Filtering by