Matching Items (7)

Description
Fraud is the use of deception for illegal gain by concealing the true nature of an activity. Organizations lose around $3.7 trillion in revenue to financial crimes and fraud worldwide, and these crimes significantly affect all levels of society. In this dissertation, I focus on credit card fraud in online transactions. Every online transaction carries a fraud risk, and it is the merchant's liability to detect and stop fraudulent transactions. Merchants use various mechanisms to prevent and manage fraud, such as automated fraud detection systems and manual transaction reviews by expert fraud analysts. Most proposed solutions focus on fraud detection accuracy and ignore financial considerations; the highly effective manual review process is also overlooked. First, I propose the Profit Optimizing Neural Risk Manager (PONRM), a selective classifier that (a) constitutes optimal collaboration between machine learning models and human expertise under industrial constraints, and (b) is cost- and profit-sensitive. I suggest directions for characterizing fraudulent behavior and assessing the risk of a transaction, and I show that my framework outperforms cost-sensitive and cost-insensitive baselines on three real-world merchant datasets. While PONRM works with many supervised learners and obtains convincing results, using probability outputs directly from the trained model itself can pose problems, especially in deep learning, because softmax output is not a true uncertainty measure. This phenomenon, together with the wide and rapid adoption of deep learning by practitioners, has had unintended consequences in many situations, such as the infamous case of Google Photos' racist image recognition algorithm, and it necessitates the use of quantified uncertainty for each prediction.
There have been recent efforts toward quantifying uncertainty in conventional deep learning methods (e.g., dropout as Bayesian approximation); however, their optimal use in decision making is often overlooked and understudied. Thus, I present a mixed-integer programming framework for selective classification, called MIPSC, that investigates and combines model uncertainty and the predictive mean to identify optimal classification and rejection regions. I also extend this framework to cost-sensitive settings (MIPCSC) and apply it to the critical real-world problem of online fraud management, showing that my approach significantly outperforms industry-standard methods in real-world settings.
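The dissertation's models are not reproduced here, but the "dropout as Bayesian approximation" idea behind its uncertainty estimates can be sketched in a few lines: keep dropout active at prediction time and treat the spread of repeated stochastic forward passes as a model-uncertainty estimate alongside the predictive mean. The tiny network, its weights, and the example transaction below are hypothetical stand-ins, not the author's PONRM or MIPSC models.

```python
import math
import random

random.seed(0)

# Toy trained "network": 4 hidden ReLU units over 2 input features.
# These weights are illustrative stand-ins, not fitted parameters.
W1 = [[0.8, -0.5], [0.3, 0.9], [-0.6, 0.4], [1.1, -0.2]]  # input -> hidden
W2 = [1.2, -0.7, 0.5, -0.9]                               # hidden -> output

def stochastic_forward(x, p_drop=0.5):
    """One forward pass with dropout left ON (Monte Carlo dropout)."""
    out = 0.0
    for w_in, w_out in zip(W1, W2):
        h = max(0.0, sum(wi * xi for wi, xi in zip(w_in, x)))  # ReLU unit
        if random.random() > p_drop:                           # random keep
            out += (h / (1.0 - p_drop)) * w_out                # inverted dropout
    return 1.0 / (1.0 + math.exp(-out))                        # sigmoid score

def mc_dropout_predict(x, n_samples=200):
    """Predictive mean and variance from repeated stochastic passes."""
    preds = [stochastic_forward(x) for _ in range(n_samples)]
    mean = sum(preds) / n_samples
    var = sum((p - mean) ** 2 for p in preds) / n_samples
    return mean, var

mean, var = mc_dropout_predict([0.3, -1.2])  # one hypothetical transaction
print(mean, var)
```

A high variance flags a transaction whose score the model is unsure about, which is exactly the kind of case a selective classifier would route to a rejection region or a human analyst.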
Contributors: Yildirim, Mehmet Yigit (Author) / Davulcu, Hasan (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Huang, Dijiang (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Distributed self-assessments and reflections empower learners to take the lead in evaluating their own knowledge gains. Both provide essential elements for practice and self-regulation in learning settings. Nowadays, many sources of practice opportunities are available to learners, especially in the Computer Science (CS) and programming domain, and learners may choose to use these opportunities to self-assess their learning progress and practice their skills. My objective in this dissertation is to understand to what extent the self-assessment process can impact novice programmers' learning, and what advanced learning technologies I can provide to enhance learners' outcomes and progress. I conducted a series of studies to investigate learning analytics and students' behaviors when working on self-assessment and reflection opportunities. To enable this objective, I designed a personalized learning platform named QuizIT that provides daily quizzes to support learners in the computer science domain. QuizIT adopts an Open Social Student Model (OSSM) that supports personalized learning and serves as a self-assessment system; it aims to ignite self-regulating behavior and engage students in the self-assessment and reflective process. I designed and integrated a personalized practice recommender into the platform to investigate the self-assessment process, and I evaluated self-assessment behavioral trails as a predictor of student performance. The statistical indicators suggested that distributed reflections were associated with learner performance. I then addressed whether distributed reflections enable self-regulating behavior and lead to better learning in introductory CS courses. From the students' interactions with the system, I found distinct behavioral patterns that showed early signs of the learners' performance trajectories.
The personalized recommender improved students' engagement and performance in the self-assessment process. When I focused on enhancing the impact of reflections during self-assessment sessions through weekly opportunities, learners in the CS domain showed better self-regulated learning behavior when using those opportunities, and the weekly reflections captured more reflective features than the daily ones. Overall, this dissertation demonstrates the effectiveness of learning technologies, including an adaptive recommender and reflection support, in supporting novice programming learners and their self-assessment processes.
Contributors: Alzaid, Mohammed (Author) / Hsiao, Ihan (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / VanLehn, Kurt (Committee member) / Nelson, Brian (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Persistent self-assessment is the key to proficiency in computer programming. The process involves distributed practice of code-tracing and code-writing skills, which requires a large amount of training tailored to the student's learning condition. It also requires the instructor to efficiently manage learning resources and diligently generate related programming questions for the student. However, programming question generation (PQG) is not an easy job. The instructor has to organize heterogeneous types of resources, i.e., conceptual programming concepts and procedural programming rules, and has to carefully align the learning goals with the design of questions with regard to topic relevance and complexity. Although numerous educational technologies such as learning management systems (LMS) have been adopted across levels of programming learning, PQG is still largely a demanding creation task performed by the instructor without advanced technological support. To fill this gap, I propose a knowledge-based PQG model that aims to help the instructor generate new programming questions and expand existing assessment items. The PQG model transforms conceptual and procedural programming knowledge from textbooks into a semantic network model using a Local Knowledge Graph (LKG) and an Abstract Syntax Tree (AST). For a given question, the model can generate a set of new questions from the associated LKG/AST semantic structures. I used the model to compare instructor-made questions from 9 undergraduate programming courses with textbook questions, which showed that the instructor-made questions were much simpler than the textbook ones. The analysis also revealed differences in topic distribution between the two question sets, and a classification analysis further showed that question complexity was correlated with student performance.
To evaluate the performance of PQG, a group of experienced instructors from introductory programming courses was recruited. The result showed that the machine-generated questions were semantically similar to the instructor-generated questions. The questions also received significantly positive feedback regarding the topic relevance and extensibility. Overall, this work demonstrates a feasible PQG model that sheds light on AI-assisted PQG for the future development of intelligent authoring tools for programming learning.
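The dissertation's LKG/AST model is not reproduced here, but the AST side of the idea can be sketched with Python's standard `ast` module: parse a code fragment and tally the syntactic constructs that could anchor generated questions. The sample function and the chosen node types below are illustrative assumptions, not the dissertation's actual concept inventory.

```python
import ast

# A small sample program a question might be generated from.
source = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Walk the AST and tally node types that map to programming concepts.
tree = ast.parse(source)
concepts = {}
for node in ast.walk(tree):
    name = type(node).__name__
    if name in {"FunctionDef", "For", "AugAssign", "Return"}:
        concepts[name] = concepts.get(name, 0) + 1

print(concepts)
```

Each tallied construct (a loop, an augmented assignment, a return) is the kind of semantic anchor from which a related tracing or writing question could be generated.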
Contributors: Chung, Cheng-Yu (Author) / Hsiao, Ihan (Thesis advisor) / VanLehn, Kurt (Committee member) / Sahebi, Shaghayegh (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)
Created: 2022
Description
Science instructors need questions for use in exams, homework assignments, class discussions, reviews, and other instructional activities. Textbooks never have enough questions, so instructors must find them in other sources or generate their own. To supply instructors with biology questions, a semantic network approach was developed for generating open-response biology questions, and the generated questions were compared to professionally authored questions.

To boost students' learning experience, adaptive question selection was built on top of the generated questions. Bayesian Knowledge Tracing was used as an embedded assessment of the student's current competence, so that a suitable question could be selected based on the student's previous performance. A between-subjects experiment with 42 participants was performed, where half of the participants studied with adaptively selected questions and the rest studied with a maladaptive ordering of questions. Both groups significantly improved their test scores, and participants in the adaptive group registered larger learning gains than participants in the control group.
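The tutoring system itself is not reproduced here, but the core Bayesian Knowledge Tracing update can be sketched from its standard formulation: compute the posterior probability of mastery from each observed response, then apply the learning transition. The slip, guess, and learn parameters and the response sequence below are illustrative values, not parameters fitted in the study.

```python
def bkt_update(p_mastery, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """Standard BKT step: Bayesian posterior from one response, then learning."""
    if correct:
        num = p_mastery * (1 - p_slip)              # mastered and didn't slip
        den = num + (1 - p_mastery) * p_guess       # ... or guessed correctly
    else:
        num = p_mastery * p_slip                    # mastered but slipped
        den = num + (1 - p_mastery) * (1 - p_guess) # ... or truly didn't know
    posterior = num / den
    # Learning transition: the student may acquire the skill after practice.
    return posterior + (1 - posterior) * p_learn

p = 0.3  # prior probability of mastery
for outcome in [True, True, False, True]:  # observed right/wrong responses
    p = bkt_update(p, outcome)
print(round(p, 3))
```

The running estimate `p` is what an adaptive selector can compare against question difficulty to pick a suitable next item.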

To explore the possibility of generating rich instructional feedback for machine-generated questions, a question-paragraph mapping task was identified: given a set of questions and the list of paragraphs in a textbook, the goal is to map the related paragraphs to each question. An algorithm was developed whose performance was comparable to that of human annotators.

A multiple-choice question with high-quality distractors (incorrect answers) can be pedagogically valuable as well as much easier to grade than an open-response question. Thus, an algorithm was developed to generate good distractors for multiple-choice questions. The machine-generated multiple-choice questions were compared to human-generated questions in terms of three measures: question difficulty, question discrimination, and distractor usefulness. In a study with 200 participants recruited from Amazon Mechanical Turk, the two types of questions performed very similarly on all three measures.
Contributors: Zhang, Lishang (Author) / VanLehn, Kurt (Thesis advisor) / Baral, Chitta (Committee member) / Hsiao, Ihan (Committee member) / Wright, Christian (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Social Computing is an area of computer science concerned with the dynamics of communities and cultures created through computer-mediated social interaction. Various social media platforms, such as social network services and microblogging, enable users to come together and create social movements expressing their opinions on diverse sets of issues, events, complaints, grievances, and goals. Methods are needed for monitoring and summarizing these types of sociopolitical trends, their leaders and followers, their messages, and their dynamics. In this dissertation, a framework comprising community- and content-based computational methods is presented to provide insights into multilingual and noisy political social media content. First, a model is developed to predict the emergence of viral hashtag breakouts using network features. Next, another model is developed to detect and compare individual and organizational accounts using a set of domain- and language-independent features. The third model exposes contentious issues driving reactionary dynamics between opposing camps. The fourth model develops community detection and visualization methods to reveal underlying dynamics and the key messages that drive them. The final model presents a use-case methodology for detecting and monitoring foreign influence, wherein a state actor and news media under its control attempt to shift public opinion by framing information to support multiple adversarial narratives that facilitate their goals. In each case, novel aspects and contributions of the models are discussed, along with quantitative and qualitative evaluations. Multiple conflict situations are analyzed, covering areas in the UK, Bangladesh, Libya, and Ukraine, where adversarial framing led to polarization, declines in social cohesion, social unrest, and even civil war (e.g., Libya and Ukraine).
Contributors: Alzahrani, Sultan (Author) / Davulcu, Hasan (Thesis advisor) / Corman, Steve R. (Committee member) / Li, Baoxin (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Languages, especially gestural and sign languages, are best learned in immersive environments with rich feedback. Computer-Aided Language Learning (CALL) solutions for spoken languages have successfully incorporated some feedback mechanisms, but no such solution exists for signed languages. Computer-Aided Sign Language Learning (CASLL) is a recent and promising field of research made feasible by advances in Computer Vision and Sign Language Recognition (SLR). Leveraging existing SLR systems for feedback-based learning is not feasible because their decision processes are not human-interpretable and do not facilitate conceptual feedback to learners. Thus, fundamental research is needed toward designing systems that are modular and explainable; the explanations from these systems can then be used to produce feedback that aids the learning process.

In this work, I present novel approaches for the recognition of location, movement, and handshape, which are components of American Sign Language (ASL), using both wrist-worn sensors and webcams. Finally, I present Learn2Sign (L2S), a chatbot-based AI tutor that can provide fine-grained conceptual feedback to learners of ASL using these modular recognition approaches. L2S is designed to provide feedback relating directly to the fundamental concepts of ASL using explainable AI. I present system performance results in terms of precision, recall, and F-1 scores, as well as validation results on users' learning outcomes. Results of both retention and execution tests are presented for 26 participants on 14 different ASL words learned using L2S, along with the results of a post-usage usability survey for all participants. I found that learners who received live feedback on their executions improved both their execution and retention performance: the average increase in execution performance was 28 percentage points, and that for retention was 4 percentage points.
Contributors: Paudyal, Prajwal (Author) / Gupta, Sandeep (Thesis advisor) / Banerjee, Ayan (Committee member) / Hsiao, Ihan (Committee member) / Azuma, Tamiko (Committee member) / Yang, Yezhou (Committee member) / Arizona State University (Publisher)
Created: 2020
Description
Students seldom spontaneously collaborate with each other. A system that can measure collaboration in real time could be useful, for example, by helping the teacher locate a group requiring guidance. To address this challenge, the research presented here focuses on building and comparing collaboration detectors for different types of classroom problem solving activities, such as card sorting and handwriting.

Transfer learning using different representations was also studied, with the goal that collaboration detectors built for one task can be used with a new task. Data for building such detectors were collected in the form of verbal interactions and user action logs from students' tablets. Three qualitative levels of interactivity were distinguished: Collaboration, Cooperation, and Asymmetric Contribution. Machine learning was used to induce a classifier that assigns a code to every episode based on a set of features. The results indicate that the machine-learned classifiers were reliable and could transfer.
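The learned classifiers themselves are not reproduced here, but the three-level episode-coding scheme can be illustrated with a hand-written decision rule standing in for the induced classifier; the feature names and thresholds below are hypothetical, not taken from the dissertation.

```python
# Hypothetical per-episode features: the fraction of overlapping talk and the
# balance of tablet actions between students (1.0 = perfectly even split).
def code_episode(talk_overlap, action_balance):
    """Assign one of the three interactivity codes to an episode."""
    if action_balance < 0.4:
        return "Asymmetric Contribution"  # one student dominates the work
    if talk_overlap > 0.3:
        return "Collaboration"            # even work with sustained joint talk
    return "Cooperation"                  # even work split, little joint talk

episodes = [(0.5, 0.8), (0.1, 0.7), (0.2, 0.2)]
codes = [code_episode(t, b) for t, b in episodes]
print(codes)
```

In the actual studies these codes were induced by machine learning from logged features rather than fixed rules, but the input-to-code mapping has this shape.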
Contributors: Viswanathan, Sree Aurovindh (Author) / VanLehn, Kurt (Thesis advisor) / Hsiao, Ihan (Committee member) / Walker, Erin (Committee member) / D'Angelo, Cynthia (Committee member) / Arizona State University (Publisher)
Created: 2020