SearchViz: an interactive visual interface to navigate search-results in online discussion forums

Description

Programmers widely use online programming communities for troubleshooting and other problem-solving tasks. The large and ever-increasing volume of posts in these communities demands more effort to read and comprehend, making it harder to find relevant information. In my thesis, I designed and studied an alternative approach that uses interactive network visualization to represent relevant search results for online programming discussion forums.

I conducted a user study to evaluate the effectiveness of this approach. Results show that users identified relevant information more precisely with the visual interface than with the traditional list-based approach. The network visualization provided effective search-result navigation support for users’ tasks and improved query quality for successive queries. Subjective evaluation also showed that visualizing search results conveys more semantic information efficiently and makes searching more effective.

Contributors: Mehta, Vishal Vimal (Author) / Hsiao, Ihan (Thesis advisor) / Walker, Erin (Committee member) / Sarwat, Mohamed (Committee member) / Arizona State University (Publisher)

Created: 2015

Assessing Adaptive Learning Styles in Computer Science Through a Virtual World

Description

Programming is quickly becoming as ubiquitous and essential a skill as general mathematics. However, many elementary and high school students are still not aware of what the computer science field entails. To make matters worse, students who are introduced to computer science are frequently being fed only part of what it is about rather than its entire construction. Consequently, they feel out of their depth when they approach college. Research has discovered that by teaching computer science and programming through a problem-driven approach and focusing on a combination of syntax and computational thinking, students can be prepared when entering higher levels of computer science education.

This thesis describes the design, development, and early user testing of a theory-based virtual world for computer science instruction called System Dot. System Dot was designed to visually manifest programming instructions into interactable objects, giving players a way to see coding as tangible entities rather than text on a white screen. In order for System Dot to convey the true nature of computer science, a custom predictive recursive descent parser was embedded in the program to validate any user-generated solutions to pre-defined logical platforming puzzles.
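To illustrate the parsing technique named above, here is a minimal sketch of a predictive recursive descent parser. It does not reproduce System Dot's grammar, which the abstract does not give; the toy arithmetic grammar and the names `tokenize`, `Parser`, and `evaluate` are assumptions chosen for illustration only.

```python
import re

# Hypothetical toy grammar (NOT System Dot's actual language):
#   expr   := term (('+' | '-') term)*
#   term   := factor ('*' factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; one token of lookahead picks the branch."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.take("*")
            value *= self.factor()
        return value

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        tok = self.take()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


def evaluate(text):
    """Parse and evaluate; reject input with leftover tokens."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

A validator in this style rejects malformed input with a `SyntaxError` rather than guessing, which is the property a puzzle-solution checker would need.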

Steps were taken to adapt the virtual world to player behavior by creating a system to detect players’ learning styles while they play the game. Through a dynamic Bayesian network, System Dot aims to classify a player’s learning style based on the Felder-Silverman Learning Style Model (FSLSM). Testers played through the first half of System Dot, which was enough to test the Bayesian network and the initial learning style classification. This classification was then compared to the assessment by Felder’s Index of Learning Styles Questionnaire (ILSQ). Lastly, this thesis discusses ways to use the results of the user testing to implement a personalized feedback system for the virtual world in the future, and what has been learned through the learning style method.

Contributors: Kury, Nizar (Author) / Nelson, Brian C (Thesis advisor) / Hsiao, Ihan (Committee member) / Kobayashi, Yoshihiro (Committee member) / Arizona State University (Publisher)

Created: 2017

Detecting Political Framing Shifts and the Adversarial Phrases within Rival Factions and Ranking Temporal Snapshot Contents in Social Media

Description

Social Computing is an area of computer science concerned with the dynamics of communities and cultures created through computer-mediated social interaction. Various social media platforms, such as social network services and microblogging, enable users to come together and create social movements expressing their opinions on diverse sets of issues, events, complaints, grievances, and goals. Methods for monitoring and summarizing these types of sociopolitical trends, their leaders and followers, messages, and dynamics are needed. In this dissertation, a framework comprising community and content-based computational methods is presented to provide insights into multilingual and noisy political social media content. First, a model is developed to predict the emergence of viral hashtag breakouts, using network features. Next, another model is developed to detect and compare individual and organizational accounts, using a set of domain- and language-independent features. The third model exposes contentious issues that drive reactionary dynamics between opposing camps. The fourth model develops community detection and visualization methods to reveal underlying dynamics and the key messages that drive them. The final model presents a use case methodology for detecting and monitoring foreign influence, wherein a state actor and news media under its control attempt to shift public opinion by framing information to support multiple adversarial narratives that facilitate their goals. In each case, a discussion of the novel aspects and contributions of the models is presented, as well as quantitative and qualitative evaluations. An analysis of multiple conflict situations will be conducted, covering areas in the UK, Bangladesh, Libya, and Ukraine where adversarial framing led to polarization, declines in social cohesion, social unrest, and even civil wars (e.g., Libya and Ukraine).

Contributors: Alzahrani, Sultan (Author) / Davulcu, Hasan (Thesis advisor) / Corman, Steve R. (Committee member) / Li, Baoxin (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2018

Cost-Sensitive Selective Classification and its Applications to Online Fraud Management

Description

Fraud is defined as the utilization of deception for illegal gain by hiding the true nature of the activity. Organizations worldwide lose around $3.7 trillion in revenue to financial crimes and fraud, which can significantly affect all levels of society. In this dissertation, I focus on credit card fraud in online transactions. Every online transaction comes with a fraud risk, and it is the merchant's liability to detect and stop fraudulent transactions. Merchants utilize various mechanisms to prevent and manage fraud, such as automated fraud detection systems and manual transaction reviews by expert fraud analysts. Many proposed solutions focus mostly on fraud detection accuracy and ignore financial considerations. Also, the highly effective manual review process is overlooked. First, I propose the Profit Optimizing Neural Risk Manager (PONRM), a selective classifier that (a) constitutes optimal collaboration between machine learning models and human expertise under industrial constraints and (b) is cost- and profit-sensitive. I suggest directions on how to characterize fraudulent behavior and assess the risk of a transaction. I show that my framework outperforms cost-sensitive and cost-insensitive baselines on three real-world merchant datasets. While PONRM is able to work with many supervised learners and obtain convincing results, utilizing probability outputs directly from the trained model itself can pose problems, especially in deep learning, as softmax output is not a true uncertainty measure. This phenomenon, together with the wide and rapid adoption of deep learning by practitioners, has brought unintended consequences in many situations, such as the infamous case of Google Photos' racist image recognition algorithm, and has thus necessitated quantifying the uncertainty of each prediction.

There have been recent efforts towards quantifying uncertainty in conventional deep learning methods (e.g., dropout as Bayesian approximation); however, their optimal use in decision making is often overlooked and understudied. Thus, I present a mixed-integer programming framework for selective classification called MIPSC, which investigates and combines model uncertainty and predictive mean to identify optimal classification and rejection regions. I also extend this framework to cost-sensitive settings (MIPCSC), focus on the critical real-world problem of online fraud management, and show that my approach significantly outperforms industry-standard methods in real-world settings.
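The general idea of cost-sensitive selective classification (approve, decline, or defer to a human reviewer) can be sketched with a simple expected-cost rule. This is not PONRM, MIPSC, or MIPCSC, whose formulations the abstract does not give; the cost model, the `margin` and `review_cost` parameters, and the assumption that a manual review always reaches the right decision are hypothetical simplifications.

```python
def decide(p_fraud, amount, margin=0.1, review_cost=5.0):
    """Pick the action with the lowest expected cost for one transaction.

    Assumed toy cost model (not taken from the dissertation):
      approve -> lose the full amount if the transaction is fraudulent
      decline -> lose the profit margin if the transaction is legitimate
      review  -> pay a fixed analyst cost; the analyst is assumed perfect
    """
    if not 0.0 <= p_fraud <= 1.0:
        raise ValueError("p_fraud must be a probability")
    expected_cost = {
        "approve": p_fraud * amount,
        "decline": (1.0 - p_fraud) * margin * amount,
        "review": review_cost,
    }
    # min() over the dict keys, ranked by their expected cost
    return min(expected_cost, key=expected_cost.get)
```

Under these assumptions, low-risk transactions are approved, near-certain fraud on small amounts is declined automatically, and uncertain high-value transactions are routed to manual review, because only there does the analyst cost pay for itself.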

Contributors: Yildirim, Mehmet Yigit (Author) / Davulcu, Hasan (Thesis advisor) / Bakkaloglu, Bertan (Committee member) / Huang, Dijiang (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2019

Predicting and Interpreting Students Performance using Supervised Learning and Shapley Additive Explanations

Description

Owing to the large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning effects in several ways: student visualization, recommendations for students, student modeling, grouping students, etc. Many programming assignment systems offer features such as automated submission and test-case checking to verify correctness, but few studies have compared different statistical techniques with the latest frameworks and interpreted the models in a unified approach.

In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore and predict students’ performance. Multiple machine learning models and their accuracy were evaluated based on Shapley Additive Explanations.

Cross-validation shows that the Gradient Boosting Decision Tree has the best precision, 85.93%, with an average of 82.90%. Features such as Component grade, Due Date, and Submission Times have a higher impact than others. The baseline model achieved lower precision because it lacks non-linear fitting.

Contributors: Tian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2019

Harnessing Digital Footprints From Paper-based Assessments: An Investigation on Students' Reviewing Behavior

Description

This thesis investigates students' learning behaviors through their interaction with an educational technology, Web Programming Grading Assistant. The technology was developed to facilitate the grading of paper-based examinations in large lecture-based classrooms and to provide richer and more meaningful feedback to students. A classroom study was designed and data was gathered from an undergraduate computer-programming course in the fall of 2016. Analysis of the data revealed that there was a negative correlation between time lag of first review attempt and performance. A survey was developed and disseminated that gave insight into how students felt about the technology and what they normally do to study for programming exams. In conclusion, the knowledge gained in this study aids in the quest to better educate students in computer programming in large in-person classrooms.

Contributors: Murphy, Hannah (Author) / Hsiao, Ihan (Thesis director) / Nelson, Brian (Committee member) / School of Computing, Informatics, and Decision Systems Engineering (Contributor) / Department of Supply Chain Management (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-05

Visualization tool for islamic radical and counter radical movements and their online followers in South East Asia

Description

With the advent of social media and micro-blogging sites, people have become active in sharing their thoughts, opinions, and ideologies, and even in imposing them on others. Users have become a source for the production and dissemination of real-time information. The content that users post can be used to understand them and track their behavior. Using this content, data analysis can be performed to understand users' social ideology and affinity toward Radical and Counter-Radical Movements. When expressing their opinions, people use hashtags in their messages on Twitter. These hashtags are a rich source of information for understanding the content-based relationships between online users, apart from the existing context-based follower and friend relationships.

An intelligent visual dashboard system is needed that can track the activities of users and the diffusion of online social movements, identify hot spots in the users' network, show the geographic footprint of the users, and help explain the socio-cultural, economic, and political drivers of the relationships among different groups of users.

Contributors: Garipalli, Sravan Kumar (Author) / Davulcu, Hasan (Thesis advisor) / Shakarian, Paulo (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2015

Identifying relevant interaction metrics for predicting student performance in a generic learning content management system

Description

The growing use of Learning Management Systems (LMS) in classrooms has enabled a great amount of data to be collected about the study behavior of students. Previously, research has been conducted to interpret the collected LMS usage data in order to find the most effective study habits for students. Professors can then use the interpretations to predict which students will perform well and which students will perform poorly in the rest of the course, allowing the professor to better assist students in need. However, these research attempts have largely analyzed metrics that are specific to certain graphical interfaces, ways of answering questions, or specific pages on an LMS. As a result, the analysis is only relevant to classrooms that use the specific LMS being analyzed.

For this thesis, behavior metrics obtained by the Organic Practice Environment (OPE) LMS at Arizona State University were compared to student performance in Dr. Ian Gould’s Organic Chemistry I course. Each metric gathered was generic enough to be potentially used by any LMS, allowing the results to be relevant to a larger number of classrooms. By using a combination of bivariate correlation analysis, group mean comparisons, linear regression model generation, and outlier analysis, the metrics that correlate best with exam performance were identified. The results indicate that the total usage of the LMS, the amount of cramming done before exams, the correctness of the responses submitted, and the duration of the responses submitted all demonstrate a strong correlation with exam scores.
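As an illustration of the bivariate correlation step, the sketch below computes a Pearson correlation coefficient between one usage metric and exam scores. The function name `pearson_r` is an assumption, the sample values in the usage note are made-up toy numbers rather than data from the thesis, and the abstract does not say which correlation coefficient was actually used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

For example, with hypothetical weekly LMS usage hours `[1, 2, 3, 4]` and hypothetical exam scores `[2, 4, 6, 8]`, `pearson_r` returns a value very close to 1, indicating a perfectly linear positive relationship in that toy data.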

Contributors: Beerman, Eric (Author) / VanLehn, Kurt (Thesis advisor) / Gould, Ian (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created: 2015

Exploring generic features for online large-scale discussion forum comments

Description

Online discussion forums have become an integral part of education and are large repositories of valuable information. They facilitate exploratory learning by allowing users to review and respond to the work of others and to approach learning in diverse ways. This research investigates different comment semantic features and the effect they have on the quality of a post in a large-scale discussion forum. We survey the relevant literature and employ the key content quality identification features. We then construct comment semantics features and build several regression models to explore the value of comment semantics dynamics. The results reconfirm the usefulness of several essential quality predictors, including time, reputation, length, and editorship. We also found that comment semantics are valuable in shaping answer quality. Specifically, the diversity of comments contributes significantly to answer quality. In addition, when searching for good-quality answers, it is important to look for global semantics dynamics (diversity) rather than to observe local differences (disputable content). Finally, the presence of comments shepherds the community to revise posts by attracting attention to them and eventually facilitates the editing process.

Contributors: Aggarwal, Adithya (Author) / Hsiao, Ihan (Thesis advisor) / Lopez, Claudia (Committee member) / Walker, Erin (Committee member) / Arizona State University (Publisher)

Created: 2016

Biology question generation from a semantic network

Description

Science instructors need questions for use in exams, homework assignments, class discussions, reviews, and other instructional activities. Textbooks never have enough questions, so instructors must find them from other sources or generate their own. To supply instructors with biology questions, a semantic network approach was developed for generating open-response biology questions. The generated questions were compared to professionally authored questions.

To boost students’ learning experience, adaptive selection was built on the generated questions. Bayesian Knowledge Tracing was used as an embedded assessment of the student’s current competence so that a suitable question could be selected based on the student’s previous performance. A between-subjects experiment with 42 participants was performed, in which half of the participants studied with adaptively selected questions and the rest studied with a mal-adaptive order of questions. Both groups significantly improved their test scores, and the participants in the adaptive group registered larger learning gains than participants in the control group.
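The embedded assessment described above can be illustrated with the standard Bayesian Knowledge Tracing update. This is a minimal sketch, not the thesis's implementation: the parameter values, the names `BKTParams`, `bkt_update`, and `pick_skill`, and the rule of choosing the least-mastered skill are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BKTParams:
    # Hypothetical parameter values; real ones are fitted to student data.
    p_transit: float = 0.1  # chance of learning the skill after a practice step
    p_slip: float = 0.1     # chance of answering wrong despite knowing the skill
    p_guess: float = 0.2    # chance of answering right without knowing the skill


def bkt_update(p_known, correct, params):
    """Return the updated mastery probability after one observed answer."""
    if correct:
        evidence = p_known * (1 - params.p_slip)
        total = evidence + (1 - p_known) * params.p_guess
    else:
        evidence = p_known * params.p_slip
        total = evidence + (1 - p_known) * (1 - params.p_guess)
    posterior = evidence / total  # Bayes rule on the observed answer
    # The student may also learn the skill during this practice opportunity.
    return posterior + (1 - posterior) * params.p_transit


def pick_skill(mastery):
    """Assumed selection rule: practice the skill with the lowest mastery."""
    return min(mastery, key=mastery.get)
```

A correct answer raises the mastery estimate and an incorrect one lowers it, so a selector built on these estimates can adapt the next question to the student's previous performance.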

To explore the possibility of generating rich instructional feedback for machine-generated questions, a question-paragraph mapping task was identified. Given a set of questions and a list of paragraphs for a textbook, the goal of the task was to map the related paragraphs to each question. An algorithm was developed whose performance was comparable to human annotators.

A multiple-choice question with high-quality distractors (incorrect answers) can be pedagogically valuable and is much easier to grade than an open-response question. Thus, an algorithm was developed to generate good distractors for multiple-choice questions. The machine-generated multiple-choice questions were compared to human-generated questions in terms of three measures: question difficulty, question discrimination, and distractor usefulness. In a study with 200 participants recruited from Amazon Mechanical Turk, the two types of questions performed very similarly on all three measures.
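Two of the measures named above have common classical test theory definitions, sketched below. The abstract does not state how the thesis computed them, so the formulas (proportion correct for difficulty, upper-minus-lower group difference for discrimination), the 27% group fraction, and the function names are assumptions.

```python
def item_difficulty(responses):
    """Proportion of participants answering the item correctly (1 = correct)."""
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(responses) / len(responses)


def item_discrimination(responses, total_scores, fraction=0.27):
    """Upper-lower discrimination index for one item.

    responses[i] is 1 if participant i answered this item correctly, else 0;
    total_scores[i] is that participant's overall test score.
    """
    if len(responses) != len(total_scores) or len(responses) < 2:
        raise ValueError("need matching sequences of length >= 2")
    n = len(responses)
    group_size = max(1, round(n * fraction))
    # Rank participants by overall score, lowest first.
    ranked = sorted(range(n), key=lambda i: total_scores[i])
    lower = ranked[:group_size]
    upper = ranked[-group_size:]
    p_upper = sum(responses[i] for i in upper) / group_size
    p_lower = sum(responses[i] for i in lower) / group_size
    return p_upper - p_lower
```

Under these definitions, an item that strong participants answer correctly and weak participants miss has a discrimination index near 1, while an item that everyone answers the same way has an index of 0.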

Contributors: Zhang, Lishang (Author) / VanLehn, Kurt (Thesis advisor) / Baral, Chitta (Committee member) / Hsiao, Ihan (Committee member) / Wright, Christian (Committee member) / Arizona State University (Publisher)

Created: 2015
