Search Content

Lossless Data Compression by Representing Data as a Solution to the Diophantine Equations

Description

There has been a substantial development in the field of data transmission in the last two decades. One does not have to wait much for a high-definition video to load on the systems anymore. Data compression is one of the most important technologies that helped achieve this seamless data transmission…

There has been a substantial development in the field of data transmission in the last two decades. One does not have to wait much for a high-definition video to load on the systems anymore. Data compression is one of the most important technologies that helped achieve this seamless data transmission experience. It helps to store or send more data using less memory or network resources. However, it appears that there is a limit on the amount of compression that can be achieved with the existing lossless data compression techniques because they rely on the frequency of characters or set of characters in the data. The thesis proposes a lossless data compression technique in which the data is compressed by representing it as a set of parameters that can reproduce the original data without any loss when given to the corresponding mathematical equation. The mathematical equation used in the thesis is the sum of the first N terms in a geometric series. Various changes are made to this mathematical equation so that any given data can be compressed and decompressed. According to the proposed technique, the whole data is taken as a single decimal number and replaced with one of the terms of the used equation. All the other terms of the equation are computed and stored as a compressed file. The performance of the developed technique is evaluated in terms of compression ratio, compression time and decompression time. The evaluation metrics are then compared with the other existing techniques of the same domain.

ContributorsGrewal, Karandeep Singh (Author) / Gonzalez Sanchez, Javier (Thesis advisor) / Bansal, Ajay (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)

Created2021

Early Detection of At-Risk Students Using LMS Data

Description

Calculus as a math course is important subject students need to succeed in, in order to venture into STEM majors. This thesis focuses on the early detection of at-risk students in a calculus course which can provide the proper intervention that might help them succeed in the course. Calculus has…

Calculus as a math course is important subject students need to succeed in, in order to venture into STEM majors. This thesis focuses on the early detection of at-risk students in a calculus course which can provide the proper intervention that might help them succeed in the course. Calculus has high failure rates which corroborates with the data collected from Arizona State University that shows that 40% of the 3266 students whose data were used failed in their calculus course.This thesis proposes to utilize educational big data to detect students at high risk of failure and their eventual early detection and subsequent intervention can be useful. Some existing studies similar to this thesis make use of open-scale data that are lower in data count and perform predictions on low-impact Massive Open Online Courses(MOOC) based courses. In this thesis, an automatic detection method of academically at-risk students by using learning management systems(LMS) activity data along with the student information system(SIS) data from Arizona State University(ASU) for the course calculus for engineers I (MAT 265) is developed. The method will detect students at risk by employing machine learning to identify key features that contribute to the success of a student. This thesis also proposes a new technique to convert this button click data into a button click sequence which can be used as inputs to classifiers. In addition, the advancements in Natural Language Processing field can be used by adopting methods such as part-of-speech (POS) tagging and tools such as Facebook Fasttext word embeddings to convert these button click sequences into numeric vectors before feeding them into the classifiers. The thesis proposes two preprocessing techniques and evaluates them on 3 different machine learning ensembles to determine their performance across the two modalities of the class.

ContributorsDileep, Akshay Kumar (Author) / Bansal, Ajay (Thesis advisor) / Cunningham, James (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created2021

Predicting Student Dropout in Self-Paced MOOC Course

Description

One persisting problem in Massive Open Online Courses (MOOCs) is the issue of student dropout from these courses. The prediction of student dropout from MOOC courses can identify the factors responsible for such an event and it can further initiate intervention before such an event to increase student success in…

One persisting problem in Massive Open Online Courses (MOOCs) is the issue of student dropout from these courses. The prediction of student dropout from MOOC courses can identify the factors responsible for such an event and it can further initiate intervention before such an event to increase student success in MOOC. There are different approaches and various features available for the prediction of student’s dropout in MOOC courses.In this research, the data derived from the self-paced math course ‘College Algebra and Problem Solving’ offered on the MOOC platform Open edX offered by Arizona State University (ASU) from 2016 to 2020 was considered. This research aims to predict the dropout of students from a MOOC course given a set of features engineered from the learning of students in a day. Machine Learning (ML) model used is Random Forest (RF) and this model is evaluated using the validation metrics like accuracy, precision, recall, F1-score, Area Under the Curve (AUC), Receiver Operating Characteristic (ROC) curve. The average rate of student learning progress was found to have more impact than other features. The model developed can predict the dropout or continuation of students on any given day in the MOOC course with an accuracy of 87.5%, AUC of 94.5%, precision of 88%, recall of 87.5%, and F1-score of 87.5% respectively. The contributing features and interactions were explained using Shapely values for the prediction of the model. The features engineered in this research are predictive of student dropout and could be used for similar courses to predict student dropout from the course. This model can also help in making interventions at a critical time to help students succeed in this MOOC course.

ContributorsDominic Ravichandran, Sheran Dass (Author) / Gary, Kevin (Thesis advisor) / Bansal, Ajay (Committee member) / Cunningham, James (Committee member) / Sannier, Adrian (Committee member) / Arizona State University (Publisher)

Created2021

OntoConnect: Domain-Agnostic Ontology Alignment using Neural Networks

Description

An ontology is a vocabulary that provides a formal description of entities within a domain and their relationships with other entities. Along with basic schema information, it also captures information in the form of metadata about cardinality, restrictions, hierarchy, and semantic meaning. With the rapid growth of semantic (RDF) data…

An ontology is a vocabulary that provides a formal description of entities within a domain and their relationships with other entities. Along with basic schema information, it also captures information in the form of metadata about cardinality, restrictions, hierarchy, and semantic meaning. With the rapid growth of semantic (RDF) data on the web, many organizations like DBpedia, Earth Science Information Partners (ESIP) are publishing more and more data in RDF format. The ontology alignment task aims at linking two or more different ontologies from the same domain or different domains. It is a process of finding the semantic relationship between two or more ontological entities and/or instances. Information/data sharing among different systems is quite limited because of differences in data based on syntax, structures, and semantics. Ontology alignment is used to overcome the limitation of semantic interoperability of current vast distributed systems available on the Web. In spite of the availability of large hierarchical domain-specific datasets, automated ontology mapping is still a complex problem. Over the years, many techniques have been proposed for ontology instance alignment, schema alignment, and link discovery. Most of the available approaches require human intervention or work within a specific domain. The challenge involves representing an entity as a vector that encodes all context information of the entity such as hierarchical information, properties, constraints, etc. The ontological representation is rich in comparison with the regular data schema because of metadata about various properties, constraints, relationship to other entities within the domain, etc. While finding similarities between entities this metadata is often overlooked. The second challenge is that the comparison of two ontologies is an intense operation and highly depends on the domain and the language that the ontologies are expressed in. Most current methods require human intervention that leads to a time-consuming and cumbersome process and the output is prone to human errors. The proposed unsupervised recursive neural network technique achieves an F-measure of 80.3% on the Anatomy dataset and the proposed graph neural network technique achieves an F-measure of 81.0% on the Anatomy dataset.

ContributorsChakraborty, Jaydeep (Author) / Bansal, Srividya (Thesis advisor) / Sherif, Mohamed (Committee member) / Bansal, Ajay (Committee member) / Hsiao, Sharon (Committee member) / Arizona State University (Publisher)

Created2021

Semantic Information Extraction From Natural Language Using a Learning and Rule-Based Approach

Description

Open Information Extraction (OIE) is a subset of Natural Language Processing (NLP) that constitutes the processing of natural language into structured and machine-readable data. This thesis uses data in Resource Description Framework (RDF) triple format that comprises of a subject, predicate, and object. The extraction of RDF triples from…

Open Information Extraction (OIE) is a subset of Natural Language Processing (NLP) that constitutes the processing of natural language into structured and machine-readable data. This thesis uses data in Resource Description Framework (RDF) triple format that comprises of a subject, predicate, and object. The extraction of RDF triples from natural language is an essential step towards importing data into web ontologies as part of the linked open data cloud on the Semantic web. There have been a number of related techniques for extraction of triples from plain natural language text including but not limited to ClausIE, OLLIE, Reverb, and DeepEx. This proposed study aims to reduce the dependency on conventional machine learning models since they require training datasets, and the models are not easily customizable or explainable. By leveraging a context-free grammar (CFG) based model, this thesis aims to address some of these issues while minimizing the trade-offs on performance and accuracy. Furthermore, a deep-dive is conducted to analyze the strengths and limitations of the proposed approach.

ContributorsSingh, Varun (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)

Created2023

Specialized Noise Elimination in Astronomical Data using Deep Learning

Description

Astronomy has a data de-noising problem. The quantity of data produced by astronomical instruments is immense, and a wide variety of noise is present in this data including artifacts. Many types of this noise are not easily filtered using traditional handwritten algorithms. Deep learning techniques present a potential solution to…

Astronomy has a data de-noising problem. The quantity of data produced by astronomical instruments is immense, and a wide variety of noise is present in this data including artifacts. Many types of this noise are not easily filtered using traditional handwritten algorithms. Deep learning techniques present a potential solution to the identification and filtering of these more difficult types of noise. In this thesis, deep learning approaches to two astronomical data de-noising steps are attempted and evaluated. Pre-existing simulation tools are utilized to generate a high-quality training dataset for deep neural network models. These models are then tested on real-world data. One set of models masks diffraction spikes from bright stars in James Webb Space Telescope data. A second set of models identifies and masks regions of the sky that would interfere with sky surface brightness measurements. The results obtained indicate that many such astronomical data de-noising and analysis problems can use this approach of simulating a high-quality training dataset and then utilizing a deep learning model trained on that dataset.

ContributorsJeffries, Charles George (Author) / Bansal, Ajay (Thesis advisor) / Windhorst, Rogier (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)

Created2024

ASU Electronic Theses and Dissertations

Filtering by

Lossless Data Compression by Representing Data as a Solution to the Diophantine Equations

Early Detection of At-Risk Students Using LMS Data

Predicting Student Dropout in Self-Paced MOOC Course

OntoConnect: Domain-Agnostic Ontology Alignment using Neural Networks

Semantic Information Extraction From Natural Language Using a Learning and Rule-Based Approach

Specialized Noise Elimination in Astronomical Data using Deep Learning