System complexity reduction via feature selection

Description

This dissertation transforms a set of system complexity reduction problems into feature selection problems. Three systems are considered: classification based on association rules, network structure learning, and time series classification. Furthermore, two variable importance measures are proposed to reduce the feature selection bias in tree models. Associative classifiers can achieve high accuracy, but the combination of many rules is difficult to interpret. Rule condition subset selection (RCSS) methods for associative classification are considered. RCSS aims to prune the rule conditions into a subset via feature selection; the subset can then be summarized into rule-based classifiers. Experiments show that RCSS can substantially improve classifier interpretability without loss of accuracy. An ensemble feature selection method is proposed to learn Markov blankets for either discrete or continuous networks (without linear or Gaussian assumptions). The method is compared to a Bayesian local structure learning algorithm and to alternative feature selection methods in the causal structure learning problem. Feature selection is also used to enhance the interpretability of time series classification. Existing time series classification algorithms (such as nearest-neighbor with dynamic time warping measures) are accurate but difficult to interpret. This research leverages the time ordering of the data to extract features and generates an effective and efficient classifier referred to as a time series forest (TSF). The computational complexity of TSF is only linear in the length of the time series, and interpretable features can be extracted. These features can be further reduced and summarized for even better interpretability. Lastly, two variable importance measures are proposed to reduce the feature selection bias in tree-based ensemble models. It is well known that bias can occur when predictor attributes have different numbers of values.
Two methods are proposed to solve the bias problem. One uses an out-of-bag sampling method and is called OOBForest; the other, based on the new concept of a partial permutation test, is called pForest. Experimental results show that the existing methods are not always reliable for multi-valued predictors, while the proposed methods have advantages.
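Interval-based feature extraction for time series can be illustrated with a short sketch. It summarizes random intervals of a series by their mean, standard deviation, and slope. These three statistics are assumed here, following the published TSF work. The helper names (`interval_features`, `random_intervals`, `tsf_feature_vector`) and the interval-sampling scheme are hypothetical. The tree ensemble that would be trained on the resulting features is omitted. Each summary takes time linear in the interval length.

```python
import random
import statistics


def interval_features(series, start, end):
    """Summarize series[start:end] by mean, standard deviation, and slope.

    Hypothetical helper: the slope is the ordinary least-squares fit of the
    values against their time index within the interval.
    """
    window = series[start:end]
    n = len(window)
    if n < 2:
        raise ValueError("interval must contain at least two points")
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)  # population standard deviation
    t_mean = (n - 1) / 2  # mean of the time indices 0..n-1
    num = sum((t - t_mean) * (x - mean) for t, x in enumerate(window))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return mean, std, slope


def random_intervals(length, k, rng):
    """Draw k random half-open intervals [start, end) of at least two points."""
    intervals = []
    for _ in range(k):
        start = rng.randrange(0, length - 1)
        end = rng.randrange(start + 2, length + 1)
        intervals.append((start, end))
    return intervals


def tsf_feature_vector(series, intervals):
    """Concatenate the (mean, std, slope) summaries of every interval."""
    features = []
    for start, end in intervals:
        features.extend(interval_features(series, start, end))
    return features
```

For example, `interval_features([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 0, 6)` gives a mean of 2.5 and a slope of 1.0 for this perfectly linear series. Because every feature is tied to a specific interval and statistic, a classifier built on them can be inspected more easily than a distance-based one.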

Contributors: Deng, Houtao (Author) / Runger, George C. (Thesis advisor) / Lohr, Sharon L (Committee member) / Pan, Rong (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created: 2011

Analyzing student problem-solving behavior in a step-based tutor and understanding the effect of unsolicited hints

Description

Many previous studies have analyzed human tutoring in great depth and have shown that expert human tutors produce effect sizes twice as large as those produced by an intelligent tutoring system (ITS). However, there has been no consensus on which factor makes them so effective. It is important to know this so that the same phenomena can be replicated in an ITS, allowing it to reach the same level of proficiency as expert human tutors. Also, to the best of my knowledge, no one has looked at how students react when they are working with a computer-based tutor. The answers to both of these questions are needed in order to build a highly effective computer-based tutor. My research focuses on the second question. In the first phase of my thesis, I used verbal-protocol analysis to analyze the behavior of students working with Andes, a step-based tutor. This analysis revealed some of the ways in which students use a step-based tutor, which can pave the way for more effective computer-based tutors. I found in the first phase of the research that students often keep trying to fix errors by guessing repeatedly instead of asking for help by clicking the hint button. This phenomenon is known as hint refusal. Surprisingly, a large portion of the students' floundering was due to hint refusal. The hypothesis tested in the second phase of the research is that hint refusal can be significantly reduced, and learning significantly increased, if Andes gives more unsolicited hints and meta-hints. An unsolicited hint is a hint that is given without the student asking for one. A meta-hint is like an unsolicited hint in that it is given without the student asking for it, but it only prompts the student to click the hint button. Two versions of Andes were compared: the original version and a new version that gave more unsolicited hints and meta-hints.
During a two-hour experiment, there were large, statistically reliable differences in several performance measures, suggesting that the new policy was more effective.

Contributors: Ranganathan, Rajagopalan (Author) / VanLehn, Kurt (Thesis advisor) / Atkinson, Robert (Committee member) / Burleson, Winslow (Committee member) / Arizona State University (Publisher)

Created: 2011

An intelligent co-reference resolver for Winograd schema sentences containing resolved semantic entities

Description

There has been a lot of research in the field of artificial intelligence about thinking machines. Alan Turing proposed a test to observe a machine's intelligent behaviour with respect to natural language conversation. The Winograd schema challenge has been suggested as an alternative to the Turing test. Getting the answer right requires inference capabilities, reasoning abilities, and background knowledge. The challenge involves a coreference resolution task. A machine is given a sentence describing a situation that involves two entities, one pronoun, and some additional information, and it must resolve the pronoun to the correct entity. The task is made harder by the fact that Winograd sentences are not constrained to one domain or to a specific sentence structure, and they also contain many human proper names. This makes it difficult to associate the entities with a particular word in the sentence in order to derive the answer. I have developed a pronoun resolver system for Winograd sentences from a confined domain. I have developed a classifier, or filter, that takes input sentences and decides whether to accept or reject them based on particular criteria. Once a sentence is accepted, I run parsers on it to obtain a detailed analysis. Furthermore, I have developed four answering modules that use world knowledge and inference mechanisms to try to resolve the pronoun. The four techniques I use are: the ConceptNet knowledge base, search engine pattern counts, narrative event chains, and sentiment analysis. I have developed a particular aggregation mechanism that combines the answers from these modules into a final answer. I have used a caching technique for the association relations obtained by the different modules to boost performance. I run my system on the standard ‘nyu dataset’ of Winograd sentences and questions. My classifier restricts this dataset to 90 sentences.
I evaluate my system on this 90-sentence dataset. Compared with the state-of-the-art system on the same dataset, my system achieves nearly a 4.5% improvement in the restricted domain.

Contributors: Budukh, Tejas Ulhas (Author) / Baral, Chitta (Thesis advisor) / VanLehn, Kurt (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2013

LudoNarrare: A Model for Verb Based Interactive Storytelling

Description

Instead of providing the illusion of agency to a reader via a tree or network of prewritten, branching paths, an interactive story should treat the reader as a player who has meaningful influence on the story. An interactive story can accomplish this by giving the player a large toolset for expression in the plot. LudoNarrare, an engine for interactive storytelling, puts "verbs" in this toolset. Verbs are contextual choices of action given to agents in a story that result in narrative events. This paper begins with an analysis and statement of the problem of creating interactive stories. From here, various attempts to solve this problem, ranging from commercial video games to academic research, are briefly surveyed to show which paths have already been forged. With the background set, the model of interactive storytelling that emerged from the research behind LudoNarrare is presented in detail. The section exploring this model explains what storyworlds are and how they are structured. It then discusses how these storyworlds can be brought to life. The exposition of the LudoNarrare model concludes by considering how storyworlds built around this model can be designed. After the concepts of LudoNarrare are explored in the abstract, the story of the engine's research and development and the specifics of its software implementation are given. With LudoNarrare fully explained, the focus then turns to plans for evaluating its quality in terms of entertainment value, robustness, and performance. To conclude, possible further paths of investigation for LudoNarrare and its model of interactive storytelling are proposed to inspire those who wish to continue in the spirit of the project.

Contributors: Stark, Joshua Matthew (Author) / VanLehn, Kurt (Thesis director) / Wetzel, Jon (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2015-12

Extensions to a unified theory of the cognitive architecture

Description

Building computational models of human problem solving has been a longstanding goal in Artificial Intelligence research. Theories of cognitive architecture have addressed this issue by embedding models of problem solving within them. This thesis presents an extended account of human problem solving and describes its implementation within one such theory of cognitive architecture, ICARUS. The document begins by reviewing the standard theory of problem solving, along with how previous versions of ICARUS have incorporated and expanded on it. Next, it discusses some limitations of the existing mechanism and proposes four extensions that eliminate these limitations, elaborate the framework along interesting dimensions, and bring it into closer alignment with human problem-solving abilities. After this, it presents evaluations on four domains that establish the benefits of these extensions. The results demonstrate the system's ability to solve problems across various domains, as well as its generality. In closing, it outlines related work and notes promising directions for additional research.

Contributors: Trivedi, Nishant (Author) / Langley, Patrick W (Thesis advisor) / VanLehn, Kurt (Committee member) / Kambhampati, Subbarao (Committee member) / Arizona State University (Publisher)

Created: 2011

Human-Aware AI Methods for Active Teaming

Description

The future will be replete with Artificial Intelligence (AI) based agents closely collaborating with humans. Although it is challenging to construct such systems for real-world conditions, the Intelligent Tutoring System (ITS) community has proposed several techniques for working closely with students. However, there is a need to extend these systems beyond the controlled environment of the classroom. More recently, the Human-Aware Planning (HAP) community has developed generalized AI techniques for collaborating with humans and providing personalized support or guidance to collaborators. In this thesis, lessons learned from the ITS community are extended to construct such human-aware systems for real-world domains, and these systems are evaluated with real stakeholders. First, the applicability of HAP to ITS is demonstrated by modeling behavior in a classroom and in a state-of-the-art tutoring system called Dragoon. Then these techniques are extended to provide decision support to a human teammate, helping students construct their plan of study (iPOS), and the effectiveness of the framework is evaluated through ablation studies. The results show that these techniques are helpful and can support users in their tasks. In the third section of the thesis, an ITS scenario of asking questions (or problems) in active environments is modeled by constructing questions that elicit a human teammate's model of understanding. The framework is evaluated through a user study, whose results show that the queries can be used to elicit the human teammate's mental model.

Contributors: Grover, Sachin (Author) / Kambhampati, Subbarao (Thesis advisor) / Smith, David (Committee member) / Srivastava, Sidhharth (Committee member) / VanLehn, Kurt (Committee member) / Arizona State University (Publisher)

Created: 2022

Using ML to Predict Online Course Ratings

Description

The pandemic that hit in 2020 boosted the growth of online learning, including a boom in Massive Open Online Courses (MOOCs). To support this growth, it would be helpful to have tools that help students choose among different courses and help instructors understand what students need. One such tool is an online course ratings predictor. Using the predictor, online course instructors can learn which qualities most course takers deem important and adjust their lesson plans to fit those qualities. Meanwhile, students can use it to help them choose a course by comparing the ratings. This research aims to find the best way to predict the rating of online courses using machine learning (ML). The model's inputs are different combinations of the following features:

- the length of the course
- the number of materials it contains
- the price of the course
- the number of students taking the course
- the course’s difficulty level
- the use of jargon or technical terms in the course description
- the rating of the course’s instructors
- the number of reviews the instructors received
- the number of classes the instructors have created on the same platform

The output of the model is the average rating of a course. Data from 350 courses are used for this model, of which 280 are used for training, 35 for testing, and the remaining 35 for validation. After trying out different machine learning models, the wide neural network model consistently gave the best training results, while the medium tree model gave the best testing results. However, further research is needed, as none of the results are highly accurate: the tree model achieved a test R-squared of only 0.51.
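The data split and evaluation metric described above can be made concrete with a small sketch. It assumes a random shuffle before splitting, since the abstract does not say how courses were assigned to each set. The helper names `split_courses` and `r_squared` are hypothetical. R-squared uses the standard definition 1 - SS_res / SS_tot, and no data from the thesis is reproduced.

```python
import random


def split_courses(records, n_train=280, n_test=35, n_val=35, seed=0):
    """Shuffle records, then split them into train/test/validation lists.

    Hypothetical helper: the 280/35/35 sizes (80%/10%/10% of 350) come from
    the abstract; the random shuffle is an assumption.
    """
    if len(records) != n_train + n_test + n_val:
        raise ValueError("split sizes must add up to the number of records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot
```

Under this definition, predicting the mean rating for every course scores 0 and perfect predictions score 1. The reported test value of 0.51 therefore means the tree model explains roughly half of the variance in test-set ratings.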

Contributors: Widodo, Herlina (Author) / VanLehn, Kurt (Thesis director) / Craig, Scotty (Committee member) / Barrett, The Honors College (Contributor) / Department of Management and Entrepreneurship (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2021-12

Predicting Self-Correction Attempts with FACT, an Automated Teaching Assistant for Algebra Classes

Description

Machine learning is a rapidly growing field, no doubt in part due to its countless applications to other fields, including pedagogy and the creation of computer-aided tutoring systems. To extend the functionality of FACT, an automated teaching assistant, we want to predict, using metadata produced by student activity, whether a student is capable of fixing their own mistakes. Logs were collected from previous FACT trials with middle school math teachers and students. The data were converted to time series sequences for deep learning, and ordinary features were extracted for statistical machine learning. Ultimately, deep learning models attained an accuracy of 60%, while tree-based methods attained an accuracy of 65%, showing that a small correlation exists between how a student fixes their mistakes and whether their correction is correct.

Contributors: Zhou, David (Author) / VanLehn, Kurt (Thesis director) / Wetzel, Jon (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created: 2022-05

Bridging the Physical and the Digital Worlds of Learning Analytics in Educational Assessments through Human-AI Collaboration

Description

Experience, whether personal or vicarious, plays an influential role in shaping human knowledge. Through these experiences, one develops an understanding of the world, which leads to learning. The process of gaining knowledge in higher education goes beyond the passive transmission of knowledge from an expert to a novice. Instead, students are encouraged to actively engage in every learning opportunity to achieve mastery in their chosen field. Evaluation of such mastery typically entails educational assessments that provide objective measures of whether students have mastered what is required of them. With the proliferation of educational technology in the modern classroom, information about students is being collected at an unprecedented rate, covering demographic, performance, and behavioral data. In the absence of analytics expertise, stakeholders may miss out on valuable insights that can guide future instructional interventions, especially in helping students understand their strengths and weaknesses. This dissertation presents the Web-Programming Grading Assistant (WebPGA), a homegrown educational technology designed around various learning sciences principles, which has been used by more than 6,000 students. In addition to streamlining and improving the grading process, it encourages students to reflect on their performance. WebPGA integrates learning analytics into educational assessments using students' physical and digital footprints. A series of classroom studies is presented, demonstrating the use of learning analytics and assessment data to make students aware of their misconceptions. The work aims to develop ways for students to learn from previous mistakes made by themselves or by others.
The key findings of this dissertation include the identification of effective strategies used by better-performing students, a demonstration of the importance of individualized guidance during the reviewing process, and the likely impact of validating one's understanding of others' experiences. Moreover, the Personalized Recommender of Items to Master and Evaluate (PRIME) framework is introduced. It is a novel, intelligent approach for diagnosing a student's domain mastery and providing tailored learning opportunities by allowing students to observe others' mistakes. Thus, this dissertation lays the groundwork for further improvement and encourages better use of available data to improve the quality of educational assessments, benefiting both students and teachers.

Contributors: Paredes, Yancy Vance (Author) / Hsiao, I-Han (Thesis advisor) / VanLehn, Kurt (Thesis advisor) / Craig, Scotty D (Committee member) / Bansal, Srividya (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2023

AI-assisted Programming Question Generation: Constructing Semantic Networks of Programming Knowledge by Local Knowledge Graph and Abstract Syntax Tree

Description

Persistent self-assessment is the key to proficiency in computer programming. The process involves distributed practice of code tracing and writing skills, which entails a large amount of training tailored to the student's learning condition. It requires the instructor to efficiently manage learning resources and diligently generate related programming questions for the student. However, programming question generation (PQG) is not an easy job. The instructor has to organize heterogeneous types of resources, i.e., conceptual programming knowledge and procedural programming rules. S/he also has to carefully align the learning goals with the design of questions with respect to topic relevance and complexity. Although numerous educational technologies like learning management systems (LMS) have been adopted across levels of programming learning, PQG still relies largely on demanding manual creation by the instructor, without advanced technological support. To fill this gap, I propose a knowledge-based PQG model that aims to help the instructor generate new programming questions and expand existing assessment items. The PQG model is designed to transform conceptual and procedural programming knowledge from textbooks into a semantic network model using the Local Knowledge Graph (LKG) and the Abstract Syntax Tree (AST). For a given question, the model can generate a set of new questions from the associated LKG/AST semantic structures. I used the model to compare instructor-made questions from 9 undergraduate programming courses with textbook questions, which showed that the instructor-made questions were much less complex than the textbook ones. The analysis also revealed differences in topic distributions between the two question sets. A classification analysis further showed that question complexity was correlated with student performance.
To evaluate the performance of PQG, a group of experienced instructors from introductory programming courses was recruited. The results showed that the machine-generated questions were semantically similar to the instructor-generated questions. The questions also received significantly positive feedback regarding topic relevance and extensibility. Overall, this work demonstrates a feasible PQG model that sheds light on AI-assisted PQG for the future development of intelligent authoring tools for programming learning.
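The AST side of such a model can be illustrated with Python's standard `ast` module. The sketch below turns a code snippet into counts of the syntactic constructs it uses. This is only a crude stand-in for the procedural-knowledge nodes the abstract describes. The helper names and the Jaccard overlap used to compare two questions are hypothetical, and this is not the dissertation's method, which may also target a different programming language.

```python
import ast
from collections import Counter


def construct_profile(source):
    """Count AST node types (e.g. For, If, Call) in a code snippet.

    Hypothetical helper: a question's code is reduced to the multiset of
    syntactic constructs it exercises.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def construct_overlap(source_a, source_b):
    """Jaccard similarity of the sets of constructs used by two snippets."""
    a = set(construct_profile(source_a))
    b = set(construct_profile(source_b))
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


question = "for i in range(3):\n    if i % 2 == 0:\n        print(i)\n"
profile = construct_profile(question)
print(profile["For"], profile["If"], profile["Call"])
```

For the snippet above, the profile records one `For`, one `If`, and two `Call` nodes (`range` and `print`). A question generator could then look for textbook items whose construct sets overlap strongly with an existing question.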

Contributors: Chung, Cheng-Yu (Author) / Hsiao, Ihan (Thesis advisor) / VanLehn, Kurt (Committee member) / Sahebi, Shaghayegh (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created: 2022