A semantic framework for integrating and publishing linked data on the Web

Description

The Semantic Web is a web of data that provides a common framework and technologies for sharing and reusing data across applications. In Semantic Web terminology, linked data describes a method of exposing and connecting data on the web from different sources. The purpose of linked data and the Semantic Web is to publish data in an open, standard format and to link this data with existing data on the Linked Open Data Cloud. The goal of this thesis is to develop a semantic framework for integrating and publishing linked data on the web. Traditionally, integrating data from multiple sources involves an Extract-Transform-Load (ETL) framework to generate datasets for analytics and visualization. The thesis proposes introducing a semantic component into the ETL framework to semi-automate the generation and publishing of linked data. Various existing ETL tools and data integration techniques are analyzed and their deficiencies identified. The thesis derives a set of requirements for the semantic ETL framework by conducting a manual process to integrate data from various sources such as weather, holidays, airports, and flight arrivals, departures, and delays. The research questions addressed are: (i) to what extent can the integration, generation, and publishing of linked data to the cloud using a semantic ETL framework be automated; and (ii) does the use of semantic technologies produce a richer data model and integrated data? Details of the methodology, the data collection, and an application that uses the generated linked data are presented. The evaluation compares the traditional data integration approach with the semantic ETL approach in terms of the effort involved in integration, the data model generated, and querying of the generated data.
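To make the idea of publishing a record as linked data concrete, the following is a minimal sketch (not the thesis's framework) that serializes one flat record as N-Triples statements. The `http://example.org/flight/` namespace, the function names, and the sample flight record are all hypothetical and chosen only for illustration; the rule that string values starting with "http" are treated as links to external resources is likewise a simplifying assumption.

```python
# Hypothetical namespace used only for this illustration.
EX = "http://example.org/flight/"


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    text = str(value)
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(subject_iri, record, vocab=EX):
    """Turn a flat dict into one N-Triples statement per key.

    Assumption: a string value that starts with "http" is an IRI linking
    to another resource; every other value becomes a plain literal.
    """
    lines = []
    for key, value in record.items():
        predicate = f"<{vocab}{key}>"
        if isinstance(value, str) and value.startswith("http"):
            obj = f"<{value}>"
        else:
            obj = f'"{escape_literal(value)}"'
        lines.append(f"<{subject_iri}> {predicate} {obj} .")
    return lines


if __name__ == "__main__":
    # Illustrative record; the values are made up.
    flight = {
        "delayMinutes": 15,
        "origin": "http://dbpedia.org/resource/Phoenix_Sky_Harbor_International_Airport",
    }
    for line in record_to_ntriples(EX + "AA100", flight):
        print(line)
```

Linking the `origin` value to an IRI in an existing dataset, rather than storing a bare airport code, is what connects the record to data already on the Linked Open Data Cloud.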

Contributors: Padki, Aparna (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Lindquist, Timothy (Committee member) / Arizona State University (Publisher)

Created: 2016

Semantic keyword search on large-scale semi-structured data

Description

Keyword search provides a simple and user-friendly mechanism for information search, and has become increasingly popular for accessing structured or semi-structured data. However, two open issues of keyword search on structured and semi-structured data are not yet well addressed by existing work.

First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency instead of search quality and may fail to deliver high-quality results from a semantic perspective. The majority of existing work generates minimal sub-graph results that are oblivious to the entity and relationship semantics embedded in the data and in the user query. There are also studies that define results to be subtrees or subgraphs that contain all query keywords but are not necessarily "minimal". However, such result construction methods suffer from the same problem of semantic misalignment between the data and the user query. This work studies how to define results that capture users' search intention, and then how to generate search-intention-aware results.

Second, most existing research is incapable of handling large-scale structured data. As data volume has grown rapidly in recent years, the problem of how to efficiently process keyword queries on large-scale structured data has become important. MapReduce is widely acknowledged as an effective programming model for processing big data. For keyword query processing on a data graph, graph algorithms that efficiently return query results consistent with users' search intention are first proposed, and these algorithms are then migrated to MapReduce to support big data. For keyword query processing on a schema graph, a keyword query is first transformed into multiple SQL queries, and all generated SQL queries are then run on the structured data. It is therefore crucial to find the optimal way to execute a SQL query using MapReduce so as to minimize the processing time. In this work, a system called SOSQL is developed that generates the optimal MapReduce query execution plan for a SQL query Q with time complexity O(n^2), where n is the number of input tables of Q.

Contributors: Shan, Yi (Author) / Chen, Yi (Thesis advisor) / Bansal, Srividya (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2016

An adaptable iOS mobile application for mobile data collection

Description

Mobile data collection (MDC) applications have been growing in the last decade, especially in the fields of education and research. Although many MDC applications are available, almost all of them are tailor-made for a very specific task in a very specific field (e.g., health, traffic, weather forecasts). Since the main users of these apps are researchers, physicians, or data collectors generally, it can be extremely challenging for them to adjust or modify these applications, given that they have limited or no technical background in coding. Another common issue with MDC applications is that their functionality is limited to collecting and storing data. Other functionalities such as data visualization, data sharing, data synchronization, and data updating are rarely found in MDC apps.

This thesis addresses these problems with two enhancements: (a) the ability for data collectors to customize their own applications based on the project they are working on, and (b) new tools that help manage the collected data. This is achieved by creating a standalone Java application that data collectors can use to design their own mobile apps in a user-friendly graphical user interface (GUI). Once the app has been completely designed using the Java tool, a new iOS mobile application is automatically generated based on the user's input. Using this tool, researchers can create mobile applications that are completely tailored to their needs, and gain new features such as visualizing and analyzing data, synchronizing data to a remote database, sharing data with other data collectors, and updating existing data.

Contributors: Al-Kaf, Zahra M (Author) / Lindquist, Timothy E (Thesis advisor) / Bansal, Srividya (Committee member) / Bansal, Ajay (Committee member) / Arizona State University (Publisher)

Created: 2016

Monitoring and Improving User Compliance and Data Quality For Long and Repetitive Self-Reporting MHealth Surveys

Description

Over the past decade, mobile health (mHealth) applications have seen greater acceptance due to their potential to remotely monitor patients and increase patient engagement, particularly for chronic disease. Sickle Cell Disease (SCD) is an inherited chronic disorder of red blood cells requiring careful pain management. A significant number of mHealth applications have been brought to market to help clinicians collect and monitor information on SCD patients. Surveys are the most common way for patients to self-report their condition, but they are non-engaging and suffer from poor compliance. The quality of data gathered from technology-based survey instruments can also be questioned, as patients may be motivated to complete a task but not motivated to do it well. A compromise in the quality and quantity of the collected patient data hinders clinicians' efforts to monitor a patient's health on a regular basis and derive effective treatment measures. This research study has two goals. The first is to monitor user compliance and data quality in mHealth apps that deliver long and repetitive surveys. The second is to identify possible motivational interventions to help improve compliance and data quality. As a form of intervention, I will introduce intrinsic and extrinsic motivational factors within the application and test it on a small target population. I will validate the impact of these motivational factors by performing a comparative analysis of the test results to determine improvements in user performance. This study is relevant because it will help analyze user behavior in long and repetitive self-reporting tasks and derive measures to improve user performance. The results will assist software engineers working with doctors in designing and developing improved self-reporting mHealth applications that collect better-quality data and enhance user compliance.

Contributors: Rallabhandi, Pooja (Author) / Gary, Kevin A (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Srividya (Committee member) / Amresh, Ashish (Committee member) / Arizona State University (Publisher)

Created: 2017

Optimizing Performance Measures in Classification Using Ensemble Learning Methods

Description

Ensemble learning methods such as bagging, boosting, adaptive boosting, and stacking have traditionally shown promising results in improving predictive accuracy in classification. These techniques have recently been widely used in various domains and applications owing to improvements in computational efficiency and advances in distributed computing. However, with the advent of a wide variety of applications of machine learning techniques to class imbalance problems, further focus is needed to evaluate, improve, and optimize other performance measures in classification, such as sensitivity (true positive rate) and specificity (true negative rate). This thesis demonstrates a novel approach to evaluating and optimizing these performance measures (specifically sensitivity and specificity) using ensemble learning methods for classification, which can be especially useful for class-imbalanced datasets. Ensemble learning methods (specifically bagging and boosting) are used to optimize sensitivity and specificity on a UC Irvine (UCI) 130-hospital diabetes dataset to predict whether a patient will be readmitted to the hospital based on various feature vectors. The experiments conducted show empirically that, with ensemble learning methods, although accuracy improves by some margin, both sensitivity and specificity are optimized significantly and consistently across different cross-validation approaches. The implementation and evaluation have been done on a subset of the large UCI 130-hospital diabetes dataset. The performance measures of the ensemble learners are compared to those of base machine learning classification algorithms such as Naive Bayes, Logistic Regression, k-Nearest Neighbor, Decision Trees, and Support Vector Machines.
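As a brief illustration of why sensitivity and specificity matter on imbalanced data, the sketch below computes both measures from binary labels. It is a generic, hedged example and not code from the thesis; the function name and the toy labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN)   (true positive rate)
    specificity = TN / (TN + FP)   (true negative rate)
    A rate is NaN when its denominator is zero.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy imbalanced data: 2 positives, 8 negatives (made-up values).
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    # A classifier that always predicts the majority class.
    y_pred = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, sensitivity_specificity(y_true, y_pred))
```

In the toy data, a classifier that always predicts the majority class is correct on 8 of 10 cases yet detects none of the positives, which is why accuracy alone can be misleading on class-imbalanced problems.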

Contributors: Bahl, Neeraj Dharampal (Author) / Bansal, Ajay (Thesis advisor) / Amresh, Ashish (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created: 2017

A comparative analysis of graph vs relational database for instructional module development system

Description

In today's data-driven world, every datum is connected to a large amount of data. Relational databases have proven themselves pioneers in the field of data storage and manipulation since the 1970s. More recently, however, they have been challenged by NoSQL graph databases in handling data models that have an inherent graphical representation. Graph databases, with their ability to store physical relationships between two nodes and their native graph processing techniques, have performed exceptionally well in graph data storage and management for applications such as recommendation engines, biological modeling, network modeling, and social media applications.

Instructional Module Development System (IMODS) is a web-based software system that guides STEM instructors through the complex task of curriculum design, ensures tight alignment between the various components of a course (i.e., learning objectives, content, assessments), and provides relevant information about research-based pedagogical and assessment strategies. The data model of IMODS is highly connected and has an inherent graphical representation, with numerous relationships between its entities. This thesis focuses on developing an algorithm to determine the completeness of a course design developed using IMODS. As part of this research objective, the study also analyzes the data model to find the best-fit database for running these algorithms. Two separate applications abstracting the data model of IMODS have been developed: one with Neo4j (a graph database) and another with PostgreSQL (a relational database). The research objectives of the thesis are as follows: (i) evaluate the performance of Neo4j and PostgreSQL in handling complex queries that will be fired throughout the life cycle of the course design process; (ii) devise an algorithm to determine the completeness of a course design developed using IMODS. This thesis presents the process of creating a data model for PostgreSQL and converting it into a graph data model to be abstracted by Neo4j, creating SQL and Cypher scripts for undertaking experiments on both platforms, testing, detailed analysis of the results, and evaluation of the databases in the context of IMODS.

Contributors: Saha, Abir Lal (Author) / Bansal, Srividya (Thesis advisor) / Bansal, Ajay (Committee member) / Gonzalez-Sanchez, Javier (Committee member) / Arizona State University (Publisher)

Created: 2017

SMART SCHEDULING FOR INSTRUCTIONAL MODULE DEVELOPMENT SYSTEM

Description

Many organizational course design methodologies feature general guidelines for the chronological and time-management aspects of course design development. Proper course structure and instructional strategy pacing have been shown to facilitate student knowledge acquisition of novel material. That these course-scheduling details influence student learning outcomes implies the need for an effective and tightly coupled component of an instructional module. The Instructional Module Development System (IMODS) seeks to improve science, technology, engineering, and math (STEM) education by equipping educators with a powerful informational tool that helps guide course design by providing information based on contemporary research about pedagogical methodology and assessment practices. This is particularly salient within the higher-education STEM fields because many instructors come from backgrounds that are more technical, and most Ph.D. programs in science fields have traditionally not focused on preparing doctoral candidates to teach. This thesis project aims to apply a multidisciplinary approach, blending educational psychology and computer science, to help improve STEM education. By developing an instructional module-scheduling feature for the web-based IMODS, we can help instructors plan and organize their course work inside and outside of the classroom, while providing them with relevant research that will help them improve their courses. This article illustrates the iterative design process: gathering background research on the pacing of workload and learning activities and their influence on student knowledge acquisition; constructively critiquing and analyzing pre-existing information technology (IT) scheduling tools; synthesizing graphical user interface (GUI) mockups based on the background research; and then implementing a functional working prototype using the IMODS framework.

Contributors: Coomber, Wesley Poblete (Author) / Bansal, Srividya (Thesis director) / Lindquist, Timothy (Committee member) / Software Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Changing the College Experience

Description

College student mental health has been a prominent issue in the US. However, solutions to address this issue are oftentimes not free or convenient for students. This project seeks to aid in improving student mental health by identifying and addressing the most commonly faced stress factors that contribute to poor mental health. These stress factors will be addressed via a free iOS application made available on the Apple App Store. A free iOS application that addresses commonly faced stress factors will provide students with a free and easily accessible resource to aid in their mental health journey.

Contributors: Suman, Faith (Author) / Sandy, Douglas (Thesis director) / Bansal, Srividya (Committee member) / Barrett, The Honors College (Contributor) / Watts College of Public Service & Community Solut (Contributor) / Software Engineering (Contributor)

Created: 2023-05

Learning Analytics and Behavior of Distributed Self-assessment and Reflections in Programming Problem Solving

Description

Distributed self-assessments and reflections empower learners to take the lead in evaluating their own knowledge gains. Both provide essential elements for practice and self-regulation in learning settings. Nowadays, many sources of practice opportunities are available to learners, especially in the Computer Science (CS) and programming domain. Learners may choose to use these opportunities to self-assess their learning progress and practice their skills. My objective in this thesis is to understand to what extent the self-assessment process can impact novice programmers' learning, and what advanced learning technologies I can provide to enhance learners' outcomes and progress. In this dissertation, I conducted a series of studies to investigate learning analytics and students' behaviors in working on self-assessment and reflection opportunities. To this end, I designed a personalized learning platform named QuizIT that provides daily quizzes to support learners in the computer science domain. QuizIT adopts an Open Social Student Model (OSSM) that supports personalized learning and serves as a self-assessment system. It aims to ignite self-regulating behavior and engage students in the self-assessment and reflective procedure. I designed and integrated a personalized practice recommender into the platform to investigate the self-assessment process. I also evaluated the self-assessment behavioral trails as a predictor of students' performance. The statistical indicators suggested that the distributed reflections were associated with the learners' performance. I proceeded to address whether distributed reflections enable self-regulating behavior and lead to better learning in introductory CS courses. From the student interactions with the system, I found distinct behavioral patterns that showed early signs of the learners' performance trajectory.
The use of the personalized recommender improved students' engagement and performance in the self-assessment procedure. When I focused on enhancing the impact of reflections during self-assessment sessions through weekly opportunities, the learners in the CS domain showed better self-regulated learning behavior when using those opportunities. The weekly reflections provided by the learners captured more reflective features than the daily opportunities. Overall, this dissertation demonstrates the effectiveness of learning technologies, including an adaptive recommender and reflection, in supporting novice programming learners and their self-assessment processes.

Contributors: Alzaid, Mohammed (Author) / Hsiao, Ihan (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / VanLehn, Kurt (Committee member) / Nelson, Brian (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created: 2022

Bridging the Physical and the Digital Worlds of Learning Analytics in Educational Assessments through Human-AI Collaboration

Description

Experience, whether personal or vicarious, plays an influential role in shaping human knowledge. Through these experiences, one develops an understanding of the world, which leads to learning. The process of gaining knowledge in higher education goes beyond the passive transmission of knowledge from an expert to a novice. Instead, students are encouraged to actively engage in every learning opportunity to achieve mastery in their chosen field. Evaluation of such mastery typically entails using educational assessments that provide objective measures to determine whether the student has mastered what is required of them. With the proliferation of educational technology in the modern classroom, information about students is being collected at an unprecedented rate, covering demographic, performance, and behavioral data. In the absence of analytics expertise, stakeholders may miss out on valuable insights that can guide future instructional interventions, especially in helping students understand their strengths and weaknesses. This dissertation presents Web-Programming Grading Assistant (WebPGA), a homegrown educational technology designed based on various learning sciences principles, which has been used by 6,000+ students. In addition to streamlining and improving the grading process, it encourages students to reflect on their performance. WebPGA integrates learning analytics into educational assessments using students' physical and digital footprints. A series of classroom studies is presented demonstrating the use of learning analytics and assessment data to make students aware of their misconceptions. It aims to develop ways for students to learn from previous mistakes made by themselves or by others.
The key findings of this dissertation include the identification of effective strategies of better-performing students, the demonstration of the importance of individualized guidance during the reviewing process, and the likely impact of validating one's understanding of another's experiences. Moreover, the Personalized Recommender of Items to Master and Evaluate (PRIME) framework is introduced. It is a novel and intelligent approach for diagnosing one's domain mastery and providing tailored learning opportunities by allowing students to observe others' mistakes. Thus, this dissertation lays the groundwork for further improvement and inspires better use of available data to improve the quality of educational assessments that will benefit both students and teachers.

Contributors: Paredes, Yancy Vance (Author) / Hsiao, I-Han (Thesis advisor) / VanLehn, Kurt (Thesis advisor) / Craig, Scotty D (Committee member) / Bansal, Srividya (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2023