Matching Items (8)
Filtering by

Clear all filters

156331-Thumbnail Image.png
Description
Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and

Graph theory is a critical component of computer science and software engineering, with algorithms concerning graph traversal and comprehension powering much of the largest problems in both industry and research. Engineers and researchers often have an accurate view of their target graph, however they struggle to implement a correct, and efficient, search over that graph.

To facilitate rapid, correct, efficient, and intuitive development of graph based solutions we propose a new programming language construct - the search statement. Given a supra-root node, a procedure which determines the children of a given parent node, and optional definitions of the fail-fast acceptance or rejection of a solution, the search statement can conduct a search over any graph or network. Structurally, this statement is modelled after the common switch statement and is put into a largely imperative/procedural context to allow for immediate and intuitive development by most programmers. The Go programming language has been used as a foundation and proof-of-concept of the search statement. A Go compiler is provided which implements this construct.
ContributorsHenderson, Christopher (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)
Created2018
Description
Alzheimer’s disease (AD), is a chronic neurodegenerative disease that usually starts slowly and gets worse over time. It is the cause of 60% to 70% of cases of dementia. There is growing interest in identifying brain image biomarkers that help evaluate AD risk pre-symptomatically. High-dimensional non-linear pattern classification methods have

Alzheimer’s disease (AD), is a chronic neurodegenerative disease that usually starts slowly and gets worse over time. It is the cause of 60% to 70% of cases of dementia. There is growing interest in identifying brain image biomarkers that help evaluate AD risk pre-symptomatically. High-dimensional non-linear pattern classification methods have been applied to structural magnetic resonance images (MRI’s) and used to discriminate between clinical groups in Alzheimers progression. Using Fluorodeoxyglucose (FDG) positron emission tomography (PET) as the pre- ferred imaging modality, this thesis develops two independent machine learning based patch analysis methods and uses them to perform six binary classification experiments across different (AD) diagnostic categories. Specifically, features were extracted and learned using dimensionality reduction and dictionary learning & sparse coding by taking overlapping patches in and around the cerebral cortex and using them as fea- tures. Using AdaBoost as the preferred choice of classifier both methods try to utilize 18F-FDG PET as a biological marker in the early diagnosis of Alzheimer’s . Addi- tional we investigate the involvement of rich demographic features (ApoeE3, ApoeE4 and Functional Activities Questionnaires (FAQ)) in classification. The experimental results on Alzheimer’s Disease Neuroimaging initiative (ADNI) dataset demonstrate the effectiveness of both the proposed systems. The use of 18F-FDG PET may offer a new sensitive biomarker and enrich the brain imaging analysis toolset for studying the diagnosis and prognosis of AD.
ContributorsSrivastava, Anant (Author) / Wang, Yalin (Thesis advisor) / Bansal, Ajay (Thesis advisor) / Liang, Jianming (Committee member) / Arizona State University (Publisher)
Created2017
155468-Thumbnail Image.png
Description
Ensemble learning methods like bagging, boosting, adaptive boosting, stacking have traditionally shown promising results in improving the predictive accuracy in classification. These techniques have recently been widely used in various domains and applications owing to the improvements in computational efficiency and distributed computing advances. However, with the advent of wide

Ensemble learning methods like bagging, boosting, adaptive boosting, stacking have traditionally shown promising results in improving the predictive accuracy in classification. These techniques have recently been widely used in various domains and applications owing to the improvements in computational efficiency and distributed computing advances. However, with the advent of wide variety of applications of machine learning techniques to class imbalance problems, further focus is needed to evaluate, improve and optimize other performance measures such as sensitivity (true positive rate) and specificity (true negative rate) in classification. This thesis demonstrates a novel approach to evaluate and optimize the performance measures (specifically sensitivity and specificity) using ensemble learning methods for classification that can be especially useful in class imbalanced datasets. In this thesis, ensemble learning methods (specifically bagging and boosting) are used to optimize the performance measures (sensitivity and specificity) on a UC Irvine (UCI) 130 hospital diabetes dataset to predict if a patient will be readmitted to the hospital based on various feature vectors. From the experiments conducted, it can be empirically concluded that, by using ensemble learning methods, although accuracy does improve to some margin, both sensitivity and specificity are optimized significantly and consistently over different cross validation approaches. The implementation and evaluation has been done on a subset of the large UCI 130 hospital diabetes dataset. The performance measures of ensemble learners are compared to the base machine learning classification algorithms such as Naive Bayes, Logistic Regression, k Nearest Neighbor, Decision Trees and Support Vector Machines.
ContributorsBahl, Neeraj Dharampal (Author) / Bansal, Ajay (Thesis advisor) / Amresh, Ashish (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)
Created2017
155559-Thumbnail Image.png
Description
Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. The data used for classification is mostly labeled, which is difficult to obtain. The dataset requires both high costs and effort to accurately

Machine learning methodologies are widely used in almost all aspects of software engineering. An effective machine learning model requires large amounts of data to achieve high accuracy. The data used for classification is mostly labeled, which is difficult to obtain. The dataset requires both high costs and effort to accurately label the data into different classes. With abundance of data, it becomes necessary that all the data should be labeled for its proper utilization and this work focuses on reducing the labeling effort for large dataset. The thesis presents a comparison of different classifiers performance to test if small set of labeled data can be utilized to build accurate models for high prediction rate. The use of small dataset for classification is then extended to active machine learning methodology where, first a one class classifier will predict the outliers in the data and then the outlier samples are added to a training set for support vector machine classifier for labeling the unlabeled data. The labeling of dataset can be scaled up to avoid manual labeling and building more robust machine learning methodologies.
ContributorsBatra, Salil (Author) / Femiani, John (Thesis advisor) / Amresh, Ashish (Thesis advisor) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)
Created2017
158206-Thumbnail Image.png
Description
Availability of affordable image and video capturing devices as well as rapid development of social networking and content sharing websites has led to the creation of new type of content, Social Media. Any system serving the end user’s query search request should not only take the relevant images into consideration

Availability of affordable image and video capturing devices as well as rapid development of social networking and content sharing websites has led to the creation of new type of content, Social Media. Any system serving the end user’s query search request should not only take the relevant images into consideration but they also need to be divergent for a well-rounded description of a query. As a result, the automated optimization of image retrieval results that are also divergent becomes exceedingly important.



The main focus of this thesis is to use visual description of a landmark by choosing the most diverse pictures that best describe all the details of the queried location from community-contributed datasets. For this, an end-to-end framework has been built, to retrieve relevant results that are also diverse. Different retrieval re-ranking and diversification strategies are evaluated to find a balance between relevance and diversification. Clustering techniques are employed to improve divergence. A unique fusion approach has been adopted to overcome the dilemma of selecting an appropriate clustering technique and the corresponding parameters, given a set of data to be investigated. Extensive experiments have been conducted on the Flickr Div150Cred dataset that has 30 different landmark locations. The results obtained are promising when evaluated on metrics for relevance and diversification.
ContributorsKalakota, Vaibhav Reddy (Author) / Bansal, Ajay (Thesis advisor) / Bansal, Srividya (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)
Created2020
158310-Thumbnail Image.png
Description
Globalization is driving a rapid increase in motivation for learning new languages, with online and mobile language learning applications being an extremely popular method of doing so. Many language learning applications focus almost exclusively on aiding students in acquiring vocabulary, one of the most important elements in achieving fluency in

Globalization is driving a rapid increase in motivation for learning new languages, with online and mobile language learning applications being an extremely popular method of doing so. Many language learning applications focus almost exclusively on aiding students in acquiring vocabulary, one of the most important elements in achieving fluency in a language. A well-balanced language curriculum must include both explicit vocabulary instruction and implicit vocabulary learning through interaction with authentic language materials. However, most language learning applications focus only on explicit instruction, providing little support for implicit learning. Students require support with implicit vocabulary learning because they need enough context to guess and acquire new words. Traditional techniques aim to teach students enough vocabulary to comprehend the text, thus enabling them to acquire new words. Despite the wide variety of support for vocabulary learning offered by learning applications today, few offer guidance on how to select an optimal vocabulary study set.

This thesis proposes a novel method of student modeling which uses pre-trained masked language models to model a student's reading comprehension abilities and detect words which are required for comprehension of a text. It explores the efficacy of using pre-trained masked language models to model human reading comprehension and presents a vocabulary study set generation pipeline using this method. This pipeline creates vocabulary study sets for explicit language learning that enable comprehension while still leaving some words to be acquired implicitly. Promising results show that masked language modeling can be used to model human comprehension and that the pipeline produces reasonably sized vocabulary study sets.
ContributorsEdgar, Vatricia Cathrine (Author) / Bansal, Ajay (Thesis advisor) / Acuna, Ruben (Committee member) / Mehlhase, Alexandra (Committee member) / Arizona State University (Publisher)
Created2020
161629-Thumbnail Image.png
Description
One persisting problem in Massive Open Online Courses (MOOCs) is the issue of student dropout from these courses. The prediction of student dropout from MOOC courses can identify the factors responsible for such an event and it can further initiate intervention before such an event to increase student success in

One persisting problem in Massive Open Online Courses (MOOCs) is the issue of student dropout from these courses. The prediction of student dropout from MOOC courses can identify the factors responsible for such an event and it can further initiate intervention before such an event to increase student success in MOOC. There are different approaches and various features available for the prediction of student’s dropout in MOOC courses.In this research, the data derived from the self-paced math course ‘College Algebra and Problem Solving’ offered on the MOOC platform Open edX offered by Arizona State University (ASU) from 2016 to 2020 was considered. This research aims to predict the dropout of students from a MOOC course given a set of features engineered from the learning of students in a day. Machine Learning (ML) model used is Random Forest (RF) and this model is evaluated using the validation metrics like accuracy, precision, recall, F1-score, Area Under the Curve (AUC), Receiver Operating Characteristic (ROC) curve. The average rate of student learning progress was found to have more impact than other features. The model developed can predict the dropout or continuation of students on any given day in the MOOC course with an accuracy of 87.5%, AUC of 94.5%, precision of 88%, recall of 87.5%, and F1-score of 87.5% respectively. The contributing features and interactions were explained using Shapely values for the prediction of the model. The features engineered in this research are predictive of student dropout and could be used for similar courses to predict student dropout from the course. This model can also help in making interventions at a critical time to help students succeed in this MOOC course.
ContributorsDominic Ravichandran, Sheran Dass (Author) / Gary, Kevin (Thesis advisor) / Bansal, Ajay (Committee member) / Cunningham, James (Committee member) / Sannier, Adrian (Committee member) / Arizona State University (Publisher)
Created2021
187326-Thumbnail Image.png
Description
Frontend development often involves the repetitive and time-consuming task of transforming a Graphical User interface (GUI) design into Frontend Code. The GUI design could either be an image or a design created on tools like Figma, Sketch, etc. This process can be particularly challenging when the website designs are experimental

Frontend development often involves the repetitive and time-consuming task of transforming a Graphical User interface (GUI) design into Frontend Code. The GUI design could either be an image or a design created on tools like Figma, Sketch, etc. This process can be particularly challenging when the website designs are experimental and undergo multiple iterations before the final version gets deployed. In such cases, developers work with the designers to make continuous changes and improve the look and feel of the website. This can lead to a lot of reworks and a poorly managed codebase that requires significant developer resources. To tackle this problem, researchers are exploring ways to automate the process of transforming image designs into functional websites instantly. This thesis explores the use of machine learning, specifically Recurrent Neural networks (RNN) to generate an intermediate code from an image design and then compile it into a React web frontend code. By utilizing this approach, designers can essentially transform an image design into a functional website, granting them creative freedom and the ability to present working prototypes to stockholders in real-time. To overcome the limitations of existing publicly available datasets, the thesis places significant emphasis on generating synthetic datasets. As part of this effort, the research proposes a novel method to double the size of the pix2code [2] dataset by incorporating additional complex HTML elements such as login forms, carousels, and cards. This approach has the potential to enhance the quality and diversity of training data available for machine learning models. Overall, the proposed approach offers a promising solution to the repetitive and time-consuming task of transforming GUI designs into frontend code.
ContributorsSingh, Ajitesh Janardan (Author) / Bansal, Ajay (Thesis advisor) / Mehlhase, Alexandra (Committee member) / Baron, Tyler (Committee member) / Arizona State University (Publisher)
Created2023