Harnessing Digital Footprints From Paper-based Assessments: An Investigation on Students' Reviewing Behavior

Description

This thesis investigates students' learning behaviors through their interaction with an educational technology, Web Programming Grading Assistant. The technology was developed to facilitate the grading of paper-based examinations in large lecture-based classrooms and to provide richer and more meaningful feedback to students. A classroom study was designed, and data were gathered from an undergraduate computer-programming course in the fall of 2016. Analysis of the data revealed a negative correlation between the time lag before a student's first review attempt and performance. A survey was developed and disseminated that gave insight into how students felt about the technology and what they normally do to study for programming exams. In conclusion, the knowledge gained in this study aids in the quest to better educate students in computer programming in large in-person classrooms.
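A negative correlation of the kind reported above can be illustrated with a Pearson coefficient. The following is only a minimal sketch: the abstract does not state which correlation measure the thesis used, and the variable names and numbers (`lag_days`, `exam_scores`) are hypothetical, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: days between feedback release and first review,
# paired with the same student's later exam score.
lag_days = [0.5, 1, 2, 4, 7, 10]
exam_scores = [92, 88, 85, 74, 70, 61]

r = pearson_r(lag_days, exam_scores)
print(f"r = {r:.2f}")  # a negative r means longer lags go with lower scores
```

A coefficient near -1 would indicate a strong tendency for late reviewers to score lower; it says nothing about causation.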

Contributors: Murphy, Hannah (Author) / Hsiao, Ihan (Thesis director) / Nelson, Brian (Committee member) / School of Computing, Informatics, and Decision Systems Engineering (Contributor) / Department of Supply Chain Management (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-05

Cost-Sensitive Selective Classification and its Applications to Online Fraud Management

Description

Fraud is defined as the use of deception for illegal gain by hiding the true nature of the activity. Organizations lose around $3.7 trillion in revenue worldwide due to financial crimes and fraud, which can significantly affect all levels of society. In this dissertation, I focus on credit card fraud in online transactions. Every online transaction carries a fraud risk, and it is the merchant's liability to detect and stop fraudulent transactions. Merchants use various mechanisms to prevent and manage fraud, such as automated fraud detection systems and manual transaction reviews by expert fraud analysts. Many proposed solutions focus on fraud detection accuracy and ignore financial considerations. The highly effective manual review process is also overlooked. First, I propose Profit Optimizing Neural Risk Manager (PONRM), a selective classifier that (a) constitutes an optimal collaboration between machine learning models and human expertise under industrial constraints and (b) is cost and profit sensitive. I suggest directions on how to characterize fraudulent behavior and assess the risk of a transaction. I show that my framework outperforms cost-sensitive and cost-insensitive baselines on three real-world merchant datasets. While PONRM is able to work with many supervised learners and obtain convincing results, using probability outputs directly from the trained model can pose problems, especially in deep learning, as softmax output is not a true uncertainty measure. This phenomenon, together with the wide and rapid adoption of deep learning by practitioners, has brought unintended consequences in many situations, such as the infamous case of Google Photos' racist image recognition algorithm, and has thus necessitated the use of quantified uncertainty for each prediction.
There have been recent efforts towards quantifying uncertainty in conventional deep learning methods (e.g., dropout as Bayesian approximation); however, their optimal use in decision making is often overlooked and understudied. Thus, I present a mixed-integer programming framework for selective classification called MIPSC, which investigates and combines model uncertainty and predictive mean to identify optimal classification and rejection regions. I also extend this framework to cost-sensitive settings (MIPCSC), focus on the critical real-world problem of online fraud management, and show that my approach significantly outperforms industry-standard methods in real-world settings.
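The idea of combining a predictive mean with model uncertainty to choose between classifying and rejecting can be sketched with a simple rule. This is not PONRM, MIPSC, or MIPCSC: it replaces the mixed-integer program with a fixed uncertainty cutoff plus an expected-cost comparison, and every name and number (`decide`, `margin`, `review_cost`, `max_std`) is a hypothetical assumption for illustration.

```python
from statistics import mean, pstdev


def decide(mc_probs, amount, margin=0.10, review_cost=3.0, max_std=0.15):
    """Pick 'approve', 'decline', or 'review' for one transaction.

    mc_probs: fraud probabilities from several stochastic forward passes
              (for example, dropout kept active at prediction time).
    amount:   transaction value; margin is the assumed profit share.
    """
    p = mean(mc_probs)      # predictive mean
    std = pstdev(mc_probs)  # spread across passes as an uncertainty proxy

    # Rejection region: the model disagrees with itself, so a human decides.
    if std > max_std:
        return "review"

    # Otherwise choose the action with the lowest expected cost.
    costs = {
        "approve": p * amount,                 # expected fraud loss
        "decline": (1 - p) * margin * amount,  # expected lost profit
        "review": review_cost,                 # assumed flat analyst cost
    }
    return min(costs, key=costs.get)


print(decide([0.01, 0.02, 0.01], amount=100))  # confident, low risk
print(decide([0.95, 0.97, 0.96], amount=100))  # confident, high risk
print(decide([0.10, 0.90, 0.50], amount=100))  # passes disagree
```

A learned framework would fit the rejection region to data instead of hard-coding `max_std`; the sketch only shows why accuracy alone is not the quantity being optimized.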

Contributors: Yildirim, Mehmet Yigit (Author) / Davulcu, Hasan (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Huang, Dijiang (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2019

Learning Analytics and Behavior of Distributed Self-assessment and Reflections in Programming Problem Solving

Description

Distributed self-assessments and reflections empower learners to take the lead in evaluating their own knowledge gain. Both provide essential elements for practice and self-regulation in learning settings. Nowadays, many sources of practice opportunities are available to learners, especially in the Computer Science (CS) and programming domain. Learners may choose to use these opportunities to self-assess their learning progress and practice their skills. My objective in this thesis is to understand to what extent the self-assessment process can impact novice programmers' learning and what advanced learning technologies I can provide to enhance learners' outcomes and progress. In this dissertation, I conducted a series of studies to investigate learning analytics and students' behaviors in working on self-assessment and reflection opportunities. To pursue this objective, I designed a personalized learning platform named QuizIT that provides daily quizzes to support learners in the computer science domain. QuizIT adopts an Open Social Student Model (OSSM) that supports personalized learning and serves as a self-assessment system. It aims to ignite self-regulating behavior and engage students in the self-assessment and reflective procedure. I designed and integrated a personalized practice recommender into the platform to investigate the self-assessment process. I also evaluated the self-assessment behavioral trails as a predictor of students' performance. The statistical indicators suggested that the distributed reflections were associated with learners' performance. I proceeded to address whether distributed reflections enable self-regulating behavior and lead to better learning in CS introductory courses. From the student interactions with the system, I found distinct behavioral patterns that showed early signs of the learners' performance trajectory.
The use of the personalized recommender improved students' engagement and performance in the self-assessment procedure. When I focused on enhancing the impact of reflections during self-assessment sessions through weekly opportunities, learners in the CS domain showed better self-regulated learning behavior when using those opportunities. The weekly reflections provided by the learners captured more reflective features than the daily opportunities. Overall, this dissertation demonstrates the effectiveness of learning technologies, including an adaptive recommender and reflection, in supporting novice programming learners and their self-assessment processes.

Contributors: Alzaid, Mohammed (Author) / Hsiao, Ihan (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / VanLehn, Kurt (Committee member) / Nelson, Brian (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created: 2022

AI-assisted Programming Question Generation: Constructing Semantic Networks of Programming Knowledge by Local Knowledge Graph and Abstract Syntax Tree

Description

Persistent self-assessment is the key to proficiency in computer programming. The process involves distributed practice of code tracing and writing skills, which encompasses a large amount of training tailored to the student's learning condition. It requires the instructor to efficiently manage the learning resources and diligently generate related programming questions for the student. However, programming question generation (PQG) is not an easy job. The instructor has to organize heterogeneous types of resources, i.e., conceptual programming knowledge and procedural programming rules. S/he also has to carefully align the learning goals with the design of questions with regard to topic relevance and complexity. Although numerous educational technologies like learning management systems (LMS) have been adopted across levels of programming learning, PQG is still largely based on the demanding creation task performed by the instructor without advanced technological support. To fill this gap, I propose a knowledge-based PQG model that aims to help the instructor generate new programming questions and expand existing assessment items. The PQG model is designed to transform conceptual and procedural programming knowledge from textbooks into a semantic network model using the Local Knowledge Graph (LKG) and the Abstract Syntax Tree (AST). For a given question, the model can generate a set of new questions from the associated LKG/AST semantic structures. I used the model to compare instructor-made questions from 9 undergraduate programming courses with textbook questions, which showed that the instructor-made questions were much less complex than the textbook ones. The analysis also revealed a difference in topic distributions between the two question sets. A classification analysis further showed that the complexity of questions was correlated with student performance.
To evaluate the performance of PQG, a group of experienced instructors from introductory programming courses was recruited. The results showed that the machine-generated questions were semantically similar to the instructor-generated questions. The questions also received significantly positive feedback regarding topic relevance and extensibility. Overall, this work demonstrates a feasible PQG model that sheds light on AI-assisted PQG for the future development of intelligent authoring tools for programming learning.
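How an abstract syntax tree exposes the procedural structure of a code question can be sketched with Python's standard `ast` module. This is an illustration only: the abstract does not specify the programming language, the parser, or the complexity measure used in the thesis, and `control_flow_complexity` is a hypothetical proxy rather than the author's metric.

```python
import ast
from collections import Counter


def node_type_counts(source):
    """Count AST node types (If, For, Assign, ...) in a code snippet."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def control_flow_complexity(source):
    """Hypothetical complexity proxy: number of structural constructs."""
    counts = node_type_counts(source)
    return sum(counts[name] for name in ("FunctionDef", "For", "While", "If"))


snippet = """
def total(values):
    result = 0
    for v in values:
        if v > 0:
            result += v
    return result
"""

counts = node_type_counts(snippet)
print(counts["For"], counts["If"], counts["Return"])
print(control_flow_complexity(snippet))
```

Two questions with similar node-type profiles exercise similar constructs, which is one plausible way structural similarity between a seed question and a generated one could be compared.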

Contributors: Chung, Cheng-Yu (Author) / Hsiao, Ihan (Thesis advisor) / VanLehn, Kurt (Committee member) / Sahebi, Shaghayegh (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created: 2022

Predicting and Interpreting Students Performance using Supervised Learning and Shapley Additive Explanations

Description

Thanks to the large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning outcomes in different ways: student visualization, recommendations for students, student modeling, grouping students, etc. Many programming assignments include features such as automated submission and test-case checking to verify correctness, but few studies have compared different statistical techniques with the latest frameworks and interpreted the models in a unified approach.

In this thesis, several data mining algorithms have been applied to analyze students' code assignment submission data from a real classroom study. The goal of this work is to explore and predict students' performance. Multiple machine learning models were trained, and the models and their accuracy were examined using Shapley Additive Explanations (SHAP).

Cross-validation shows that the Gradient Boosting Decision Tree has the best precision, 85.93%, with an average of 82.90%. Features such as component grade, due date, and submission times have a higher impact than others. The baseline model had lower precision because it lacks non-linear fitting.

Contributors: Tian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2019

A Framework for Spatial Database Explanations

Description

In the last few years, there has been a tremendous increase in the use of big data. Most of this data is hard to understand because of its size and dimensions. The importance of this problem is underscored by the fact that the Big Data Research and Development Initiative was announced by the United States administration in 2012 to address problems faced by the government. Various states and cities in the US gather spatial data about incidents like police calls for service.

Querying large amounts of data may raise a lot of questions. For example, when we look at arithmetic relationships between queries in heterogeneous data, there are a lot of differences. How can we explain what factors account for these differences? If we define the observation as an arithmetic relationship between queries, this kind of problem can be solved by aggravation or intervention. Aggravation views the value of our observation for different sets of tuples, while intervention looks at the value of the observation after removing sets of tuples. We call the predicates that represent these tuples explanations. Observations by themselves have limited importance. For example, if we observe a large number of taxi trips in a specific area, we might ask the question: Why are there so many trips here? Explanations attempt to answer these kinds of questions.
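The difference between aggravation and intervention can be made concrete with a toy table. The sketch below follows only the definitions given in the paragraph above; the data, the `event` attribute, and the ratio observation are hypothetical, and it does not include the spatial partitioning that the thesis proposes.

```python
# Hypothetical taxi trips: pickup zone and whether an event was under way.
trips = (
    [{"zone": "A", "event": True}] * 4
    + [{"zone": "A", "event": False}] * 2
    + [{"zone": "B", "event": True}] * 1
    + [{"zone": "B", "event": False}] * 2
)


def observation(rows):
    """Arithmetic relationship between two queries: trips in A / trips in B."""
    in_a = sum(1 for r in rows if r["zone"] == "A")
    in_b = sum(1 for r in rows if r["zone"] == "B")
    return in_a / in_b if in_b else float("inf")


def aggravation(rows, predicate):
    """Value of the observation restricted to tuples matching the predicate."""
    return observation([r for r in rows if predicate(r)])


def intervention(rows, predicate):
    """Value of the observation after removing tuples matching the predicate."""
    return observation([r for r in rows if not predicate(r)])


def during_event(row):
    """Candidate explanation: the trip happened during an event."""
    return row["event"]


print(observation(trips))                 # overall ratio
print(aggravation(trips, during_event))   # ratio among event trips only
print(intervention(trips, during_event))  # ratio with event trips removed
```

Here the candidate explanation pushes the ratio up when the data is restricted to it and pulls it down to parity when it is removed, which is the pattern that marks a predicate as a good explanation under both views.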

While aggravation and intervention are designed for non-spatial data, we propose a new approach for explaining spatially heterogeneous data. Our approach expands on aggravation and intervention while using spatial partitioning/clustering to improve explanations for spatial data. Our proposed approach was evaluated against a real-world taxi dataset as well as a synthetic disease outbreak dataset. The approach was found to outperform aggravation in precision and recall while outperforming intervention in precision.

Contributors: Tahir, Anique (Author) / Elsayed, Mohamed (Thesis advisor) / Hsiao, Ihan (Committee member) / Maciejewski, Ross (Committee member) / Arizona State University (Publisher)

Created: 2018

Exploring generic features for online large-scale discussion forum comments

Description

Online discussion forums have become an integral part of education and are large repositories of valuable information. They facilitate exploratory learning by allowing users to review and respond to the work of others and approach learning in diverse ways. This research investigates different comment semantic features and the effect they have on the quality of a post in a large-scale discussion forum. We survey the relevant literature and employ the key content quality identification features. We then construct comment semantics features and build several regression models to explore the value of comment semantics dynamics. The results reconfirm the usefulness of several essential quality predictors, including time, reputation, length, and editorship. We also found that comment semantics are valuable in shaping answer quality. Specifically, the diversity of comments significantly contributes to answer quality. In addition, when searching for good-quality answers, it is important to look for global semantics dynamics (diversity) rather than observe local differences (disputable content). Finally, the presence of comments shepherds the community to revise the posts by attracting attention to the posts and eventually facilitates the editing process.

Contributors: Aggarwal, Adithya (Author) / Hsiao, Ihan (Thesis advisor) / Lopez, Claudia (Committee member) / Walker, Erin (Committee member) / Arizona State University (Publisher)

Created: 2016

Feature selection techniques for effective model building and estimation on Twitter data to understand the political scenario in Latvia with supporting visualizations

Description

In supervised learning, machine learning techniques can be applied to learn a model on a small set of labeled documents, which can then be used to classify a larger set of unknown documents. Machine learning techniques can be used to analyze a political scenario in a given society. A lot of research has been going on in this field to understand the interactions of various people in the society in response to actions taken by their organizations.

This paper addresses the Russian influence on people in Latvia. This is done by building an effective model learnt on an initial set of documents containing a combination of official party web pages and important political leaders' social networking sites. Since Twitter is a micro-blogging site that allows people to post their opinions on any topic, the model is used to estimate the tweets supporting the Russian and Latvian political organizations in Latvia. All the documents collected for analysis are in the Latvian and Russian languages, which are rich in vocabulary, resulting in a huge number of features. Hence, feature selection techniques can be used to reduce the vocabulary set to the features relevant to the classification model. This thesis provides a comparative analysis of traditional feature selection techniques and an implementation of a new iterative feature selection method using EM and cross-domain training, along with a supporting visualization tool. This method outperformed other feature selection methods by reducing the number of features by up to 50% while achieving good model accuracy. The results from the classification are used to interpret user behavior and political influence patterns across organizations in Latvia using an interactive dashboard with a combination of powerful widgets.

Contributors: Bollapragada, Lakshmi Gayatri Niharika (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2016

Enhanced topic-based modeling for Twitter sentiment analysis

Description

In this thesis, multiple approaches are explored to enhance sentiment analysis of tweets. A standard sentiment analysis model with customized features is first trained and tested to establish a baseline. This is compared to an existing topic-based mixture model and a newly proposed topic-based vector model, both of which use Latent Dirichlet Allocation (LDA) for topic modeling. The proposed topic-based vector model achieves higher accuracy, measured by averaged F scores, than the other two models.
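An averaged F score can be computed by taking the F1 score of each sentiment class and averaging them. The sketch below uses an unweighted (macro) average, which is an assumption: the abstract does not say which averaging scheme the thesis used, and the labels and predictions are hypothetical.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical gold labels and model predictions for six tweets.
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.3f}")
```

Macro averaging weights every class equally, so a model cannot score well by predicting only the most frequent sentiment.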

Contributors: Baskaran, Swetha (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2016

SearchViz: an interactive visual interface to navigate search-results in online discussion forums

Description

Online programming communities are widely used by programmers for troubleshooting and various problem-solving tasks. The large and ever-increasing volume of posts in these communities demands more effort to read and comprehend, making it harder to find relevant information. In my thesis, I designed and studied an alternate approach that uses interactive network visualization to represent relevant search results for online programming discussion forums.

I conducted a user study to evaluate the effectiveness of this approach. Results show that users were able to identify relevant information more precisely via the visual interface than with the traditional list-based approach. Network visualization provided effective search-result navigation support that facilitated users' tasks and improved query quality for successive queries. Subjective evaluation also showed that visualizing search results conveys more semantic information in an efficient manner and makes searching more effective.

Contributors: Mehta, Vishal Vimal (Author) / Hsiao, Ihan (Thesis advisor) / Walker, Erin (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)

Created: 2015