Semantic Information Extraction From Natural Language Using a Learning and Rule-Based Approach

190879-Thumbnail Image.png
Description
Open Information Extraction (OIE) is a subset of Natural Language Processing (NLP) that constitutes the processing of natural language into structured and machine-readable data. This thesis uses data in Resource Description Framework (RDF) triple format that comprises of a subject,

Open Information Extraction (OIE) is a subset of Natural Language Processing (NLP) that constitutes the processing of natural language into structured and machine-readable data. This thesis uses data in Resource Description Framework (RDF) triple format that comprises of a subject, predicate, and object. The extraction of RDF triples from natural language is an essential step towards importing data into web ontologies as part of the linked open data cloud on the Semantic web. There have been a number of related techniques for extraction of triples from plain natural language text including but not limited to ClausIE, OLLIE, Reverb, and DeepEx. This proposed study aims to reduce the dependency on conventional machine learning models since they require training datasets, and the models are not easily customizable or explainable. By leveraging a context-free grammar (CFG) based model, this thesis aims to address some of these issues while minimizing the trade-offs on performance and accuracy. Furthermore, a deep-dive is conducted to analyze the strengths and limitations of the proposed approach.
Date Created
2023
Agent

Bridging the Physical and the Digital Worlds of Learning Analytics in Educational Assessments through Human-AI Collaboration

187457-Thumbnail Image.png
Description
Experience, whether personal or vicarious, plays an influential role in shaping human knowledge. Through these experiences, one develops an understanding of the world, which leads to learning. The process of gaining knowledge in higher education transcends beyond the passive transmission

Experience, whether personal or vicarious, plays an influential role in shaping human knowledge. Through these experiences, one develops an understanding of the world, which leads to learning. The process of gaining knowledge in higher education transcends beyond the passive transmission of knowledge from an expert to a novice. Instead, students are encouraged to actively engage in every learning opportunity to achieve mastery in their chosen field. Evaluation of such mastery typically entails using educational assessments that provide objective measures to determine whether the student has mastered what is required of them. With the proliferation of educational technology in the modern classroom, information about students is being collected at an unprecedented rate, covering demographic, performance, and behavioral data. In the absence of analytics expertise, stakeholders may miss out on valuable insights that can guide future instructional interventions, especially in helping students understand their strengths and weaknesses. This dissertation presents Web-Programming Grading Assistant (WebPGA), a homegrown educational technology designed based on various learning sciences principles, which has been used by 6,000+ students. In addition to streamlining and improving the grading process, it encourages students to reflect on their performance. WebPGA integrates learning analytics into educational assessments using students' physical and digital footprints. A series of classroom studies is presented demonstrating the use of learning analytics and assessment data to make students aware of their misconceptions. It aims to develop ways for students to learn from previous mistakes made by themselves or by others. The key findings of this dissertation include the identification of effective strategies of better-performing students, the demonstration of the importance of individualized guidance during the reviewing process, and the likely impact of validating one's understanding of another's experiences. Moreover, the Personalized Recommender of Items to Master and Evaluate (PRIME) framework is introduced. It is a novel and intelligent approach for diagnosing one's domain mastery and providing tailored learning opportunities by allowing students to observe others' mistakes. Thus, this dissertation lays the groundwork for further improvement and inspires better use of available data to improve the quality of educational assessments that will benefit both students and teachers.
Date Created
2023
Agent

Changing the College Experience

Description

College student mental health has been a prominent issue in the US. However, solutions to address this issue are oftentimes not free or convenient for students. This project seeks to aid in improving student mental health by identifying and addressing

College student mental health has been a prominent issue in the US. However, solutions to address this issue are oftentimes not free or convenient for students. This project seeks to aid in improving student mental health by identifying and addressing the most commonly faced stress factors that contribute to poor mental health. These stress factors will be addressed via a free iOS application made available on the Apple App Store. A free iOS application that addresses commonly faced stress factors will provide students with a free and easily accessible resource to aid in their mental health journey.

Date Created
2023-05
Agent

Improving Ontology Alignment Using Machine Learning Techniques

171617-Thumbnail Image.png
Description
Ontologies play an important role in storing and exchanging digitized data. As the need for semantic web information grows, organizations from around the globe has defined ontologies in different domains to better represent the data. But different organizations define ontologies

Ontologies play an important role in storing and exchanging digitized data. As the need for semantic web information grows, organizations from around the globe has defined ontologies in different domains to better represent the data. But different organizations define ontologies of the same entity in their own way. Finding ontologies of the same entity in different fields and domains has become very important for unifying and improving interoperability of data between these multiple domains. Many different techniques have been used over the year, including human assisted, automated and hybrid. In recent years with the availability of many machine learning techniques, researchers are trying to apply these techniques to solve the ontology alignment problem across different domains. In this study I have looked into the use of different machine learning techniques such as Support Vector Machine, Stochastic Gradient Descent, Random Forest etc. for solving ontology alignment problem with some of the most commonly used datasets found from the famous Ontology Alignment Evaluation Initiative (OAEI). I have proposed a method OntoAlign which demonstrates the importance of using different types of similarity measures for feature extraction from ontology data in order to achieve better results for ontology alignment.
Date Created
2022
Agent

Learning Analytics and Behavior of Distributed Self-assessment and Reflections in Programming Problem Solving

171562-Thumbnail Image.png
Description
Distributed self-assessments and reflections empower learners to take the lead on their knowledge gaining evaluation. Both provide essential elements for practice and self-regulation in learning settings. Nowadays, many sources for practice opportunities are made available to the learners, especially in

Distributed self-assessments and reflections empower learners to take the lead on their knowledge gaining evaluation. Both provide essential elements for practice and self-regulation in learning settings. Nowadays, many sources for practice opportunities are made available to the learners, especially in the Computer Science (CS) and programming domain. They may choose to utilize these opportunities to self-assess their learning progress and practice their skill. My objective in this thesis is to understand to what extent self-assess process can impact novice programmers learning and what advanced learning technologies can I provide to enhance the learner’s outcome and the progress. In this dissertation, I conducted a series of studies to investigate learning analytics and students’ behaviors in working on self-assessments and reflection opportunities. To enable this objective, I designed a personalized learning platform named QuizIT that provides daily quizzes to support learners in the computer science domain. QuizIT adopts an Open Social Student Model (OSSM) that supports personalized learning and serves as a self-assessment system. It aims to ignite self-regulating behavior and engage students in the self-assessment and reflective procedure. I designed and integrated the personalized practice recommender to the platform to investigate the self-assessment process. I also evaluated the self-assessment behavioral trails as a predictor to the students’ performance. The statistical indicators suggested that the distributed reflections were associated with the learner's performance. I proceeded to address whether distributed reflections enable self-regulating behavior and lead to better learning in CS introductory courses. From the student interactions with the system, I found distinct behavioral patterns that showed early signs of the learners' performance trajectory. The utilization of the personalized recommender improved the student’s engagement and performance in the self-assessment procedure. When I focused on enhancing reflections impact during self-assessment sessions through weekly opportunities, the learners in the CS domain showed better self-regulating learning behavior when utilizing those opportunities. The weekly reflections provided by the learners were able to capture more reflective features than the daily opportunities. Overall, this dissertation demonstrates the effectiveness of the learning technologies, including adaptive recommender and reflection, to support novice programming learners and their self-assessing processes.
Date Created
2022
Agent

Investigating the Utility of Agile and Lean Software Process Metrics for Open Source Software Communities: An Exploratory Study

171448-Thumbnail Image.png
Description
The adoption of Open Source Software (OSS) by organizations has become a strategic need in a wide variety of software applications and platforms. Open Source has changed the way organizations develop, acquire, use, and commercialize software. Further, OSS projects often

The adoption of Open Source Software (OSS) by organizations has become a strategic need in a wide variety of software applications and platforms. Open Source has changed the way organizations develop, acquire, use, and commercialize software. Further, OSS projects often incorporate similar principles and practices as Agile and Lean software development projects. Contrary to traditional organizations, the environment in which these projects function has an impact on process-related elements like the flow of work and value definition. Process metrics are typically employed during Agile Software Engineering projects as a means of providing meaningful feedback. Investigating these metrics to see if OSS projects and communities can utilize them in a beneficial way thus becomes an interesting research topic. In that context, this exploratory research investigates whether well-established Agile and Lean software engineering metrics provide useful feedback about OSS projects. This knowledge will assist in educating the Open Source community about the applications of Agile Software Engineering and its variations in Open Source projects. Each of the Open Source projects included in this analysis has a substantial development team that maintains a mature, well-established codebase with process flow information. These OSS projects listed on GitHub are investigated by applying process flow metrics. The methodology used to collect these metrics and relevant findings are discussed in this thesis. This study also compares the results to distinctive Open Source project characteristics as part of the analysis. In this exploratory research best-fit versions of published Agile and Lean software process metrics are applied to OSS, and following these explorations, specific questions are further addressed using the data collected. This research's original contribution is to determine whether Agile and Lean process metrics are helpful in OSS, as well as the opportunities and obstacles that may arise when applying Agile and Lean principles to OSS.
Date Created
2022
Agent

AI-assisted Programming Question Generation: Constructing Semantic Networks of Programming Knowledge by Local Knowledge Graph and Abstract Syntax Tree

168847-Thumbnail Image.png
Description
Persistent self-assessment is the key to proficiency in computer programming. The process involves distributed practice of code tracing and writing skills which encompasses a large amount of training that is tailored for the student's learning condition. It requires the instructor

Persistent self-assessment is the key to proficiency in computer programming. The process involves distributed practice of code tracing and writing skills which encompasses a large amount of training that is tailored for the student's learning condition. It requires the instructor to efficiently manage the learning resource and diligently generate related programming questions for the student. However, programming question generation (PQG) is not an easy job. The instructor has to organize heterogeneous types of resources, i.e., conceptual programming concepts and procedural programming rules. S/he also has to carefully align the learning goals with the design of questions in regard to the topic relevance and complexity. Although numerous educational technologies like learning management systems (LMS) have been adopted across levels of programming learning, PQG is still largely based on the demanding creation task performed by the instructor without advanced technological support. To fill this gap, I propose a knowledge-based PQG model that aims to help the instructor generate new programming questions and expand existing assessment items. The PQG model is designed to transform conceptual and procedural programming knowledge from textbooks into a semantic network model by the Local Knowledge Graph (LKG) and the Abstract Syntax Tree (AST). For a given question, the model can generate a set of new questions by the associated LKG/AST semantic structures. I used the model to compare instructor-made questions from 9 undergraduate programming courses and textbook questions, which showed that the instructor-made questions had much simpler complexity than the textbook ones. The analysis also revealed the difference in topic distributions between the two question sets. A classification analysis further showed that the complexity of questions was correlated with student performance. To evaluate the performance of PQG, a group of experienced instructors from introductory programming courses was recruited. The result showed that the machine-generated questions were semantically similar to the instructor-generated questions. The questions also received significantly positive feedback regarding the topic relevance and extensibility. Overall, this work demonstrates a feasible PQG model that sheds light on AI-assisted PQG for the future development of intelligent authoring tools for programming learning.
Date Created
2022
Agent

OntoConnect: Domain-Agnostic Ontology Alignment using Neural Networks

161678-Thumbnail Image.png
Description
An ontology is a vocabulary that provides a formal description of entities within a domain and their relationships with other entities. Along with basic schema information, it also captures information in the form of metadata about cardinality, restrictions, hierarchy, and

An ontology is a vocabulary that provides a formal description of entities within a domain and their relationships with other entities. Along with basic schema information, it also captures information in the form of metadata about cardinality, restrictions, hierarchy, and semantic meaning. With the rapid growth of semantic (RDF) data on the web, many organizations like DBpedia, Earth Science Information Partners (ESIP) are publishing more and more data in RDF format. The ontology alignment task aims at linking two or more different ontologies from the same domain or different domains. It is a process of finding the semantic relationship between two or more ontological entities and/or instances. Information/data sharing among different systems is quite limited because of differences in data based on syntax, structures, and semantics. Ontology alignment is used to overcome the limitation of semantic interoperability of current vast distributed systems available on the Web. In spite of the availability of large hierarchical domain-specific datasets, automated ontology mapping is still a complex problem. Over the years, many techniques have been proposed for ontology instance alignment, schema alignment, and link discovery. Most of the available approaches require human intervention or work within a specific domain. The challenge involves representing an entity as a vector that encodes all context information of the entity such as hierarchical information, properties, constraints, etc. The ontological representation is rich in comparison with the regular data schema because of metadata about various properties, constraints, relationship to other entities within the domain, etc. While finding similarities between entities this metadata is often overlooked. The second challenge is that the comparison of two ontologies is an intense operation and highly depends on the domain and the language that the ontologies are expressed in. Most current methods require human intervention that leads to a time-consuming and cumbersome process and the output is prone to human errors. The proposed unsupervised recursive neural network technique achieves an F-measure of 80.3% on the Anatomy dataset and the proposed graph neural network technique achieves an F-measure of 81.0% on the Anatomy dataset.
Date Created
2021
Agent

Distributed RDF Storage and Querying Using In-Memory Processing Engine

161510-Thumbnail Image.png
Description
The proliferation of semantic data in the form of RDF (Resource Description Framework) triples demands an efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. There are three open issues with distributed RDF data

The proliferation of semantic data in the form of RDF (Resource Description Framework) triples demands an efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. There are three open issues with distributed RDF data management systems that are not well addressed altogether in existing work. First is the querying efficiency, second is that solutions are optimized for certain types of query patterns and don’t necessarily work well for all types, and third is concerned with reducing pre-processing cost. Therefore, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL (SPARQL Protocol and RDF Query Language) query performance regardless of its pattern shape with minimized pre-processing overhead. In this context, the first contribution of this work is a distributed RDF data partitioning schema called 3CStore that extends the existing VP (Vertical Partitioning) approach by using a subset of triples from the VP tables based on different join correlations. This approach speeds up queries at the cost of additional pre-processing overhead. To solve this, a relational partitioning schema called VPExp was developed by splitting predicates based on explicit type information of objects. This approach gains a significant query performance only for the specific type of query where the object is bound to a value for a particular predicate. To get efficient query performance on a wide range of query patterns, an improved solution is proposed by extending the existing Property Table approach to Subset-Property Table and combined with the VP approach. Further investigation on distributed RDF processing and querying systems based on typical use cases led to a novel relational partitioning schema called PTP (Property Table Partitioning) that further partitions the whole Property Table into the number of unique properties to minimize query input size and join operations during query evaluation. Finally, an RDF data management system based on the SPARQL-over-SQL approach called S3QLRDF is developed that generates the optimal query execution plan using statistics of PTP tables to provide efficient SPARQL query processing on a distributed system.
Date Created
2021
Agent

Diversifying Relevant Search Results from Social Media Using Community Contributed Images

158206-Thumbnail Image.png
Description
Availability of affordable image and video capturing devices as well as rapid development of social networking and content sharing websites has led to the creation of new type of content, Social Media. Any system serving the end user’s query search

Availability of affordable image and video capturing devices as well as rapid development of social networking and content sharing websites has led to the creation of new type of content, Social Media. Any system serving the end user’s query search request should not only take the relevant images into consideration but they also need to be divergent for a well-rounded description of a query. As a result, the automated optimization of image retrieval results that are also divergent becomes exceedingly important.



The main focus of this thesis is to use visual description of a landmark by choosing the most diverse pictures that best describe all the details of the queried location from community-contributed datasets. For this, an end-to-end framework has been built, to retrieve relevant results that are also diverse. Different retrieval re-ranking and diversification strategies are evaluated to find a balance between relevance and diversification. Clustering techniques are employed to improve divergence. A unique fusion approach has been adopted to overcome the dilemma of selecting an appropriate clustering technique and the corresponding parameters, given a set of data to be investigated. Extensive experiments have been conducted on the Flickr Div150Cred dataset that has 30 different landmark locations. The results obtained are promising when evaluated on metrics for relevance and diversification.
Date Created
2020
Agent