Matching Items (57)

Detecting Propaganda Bots on Twitter Using Machine Learning

Description

Propaganda bots are malicious bots on Twitter that spread divisive opinions and support political accounts. This project detects propaganda bots on Twitter using machine learning. Once I began to observe patterns among propaganda followers on Twitter, I determined that I could train algorithms to detect these bots. The paper focuses on how I developed and trained classifiers and used them to build a user-facing server that performs predictions automatically. The central learning goal of the project was to learn some form of machine learning architecture. I also needed to learn how to handle large datasets and maintain them for training use, and to develop a server that would execute these functions on command. I wanted to design a full-stack system, building every part of a user-facing server that executes predictions using the classifiers I design.
I set a number of learning goals that the project had to meet to count as a success. I needed to learn how to use the supporting libraries that would help me design this system. I also learned how to use the Twitter API and how to build the infrastructure around it that would let me collect large amounts of data for machine learning. I needed to become familiar with common machine learning libraries in Python in order to create the algorithms and pipelines that make predictions from Twitter data.
This paper details the steps and decisions involved in collecting this data and applying it to machine learning algorithms. I determined how to create labelled data using pre-existing Botometer ratings, and what level of confidence I needed in order to label data for training. I used the scikit-learn library to create the algorithms that best detect these bots. I used a number of pre-processing routines to refine the classifiers' precision, including natural language processing and data analysis techniques. I eventually moved to remotely hosted versions of the system on Amazon web instances to collect larger amounts of data and train more advanced classifiers. This leads to the details of my final implementation of a user-facing server, hosted on AWS and interfacing over Gmail's IMAP server.
The paper also lays out the current and future development of this system. This includes more advanced classifiers, better data analysis, conversion to third-party Twitter data collection systems, and user features. I detail what I have learned from this exercise and what I hope to continue working on.
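The labelling step mentioned in the description, turning pre-existing Botometer ratings into training labels only when the rating is confident enough, might look roughly like the sketch below. It is an illustration only and not the project's code: the function names, the 0-to-1 score scale, and the 0.2 / 0.8 cut-offs are all assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple


def label_from_score(score: float, low: float = 0.2, high: float = 0.8) -> Optional[int]:
    """Map a bot-likelihood score in [0, 1] to a training label.

    Returns 1 (bot) or 0 (human) when the score is confident enough,
    and None when it falls in the uncertain middle band.
    The thresholds are hypothetical, not values taken from the paper.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= high:
        return 1
    if score <= low:
        return 0
    return None


def build_training_set(scored_accounts: Iterable[Tuple[str, float]]) -> List[Tuple[str, int]]:
    """Keep only confidently labelled accounts as (account_id, label) pairs."""
    labelled = []
    for account_id, score in scored_accounts:
        label = label_from_score(score)
        if label is not None:
            labelled.append((account_id, label))
    return labelled
```

Dropping the uncertain middle band trades training-set size for cleaner labels; where to place the cut-offs is exactly the kind of confidence decision the description refers to.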

Contributors

Agent

Created

Date Created
2019-05

Deep Periodic Networks

Description

In the field of machine learning, reinforcement learning stands out for its ability to explore approaches to complex, high-dimensional problems that outperform even expert humans. For robotic locomotion tasks, reinforcement learning provides an approach to solving them without the need for unique controllers. In this thesis, two reinforcement learning algorithms, Deep Deterministic Policy Gradient and Group Factor Policy Search, are compared in the bipedal walking environment provided by OpenAI Gym. The algorithms are evaluated on their performance in the environment and on their sample efficiency.

Contributors

Agent

Created

Date Created
2018-12

Comparative Analysis in Acquisition of Coding Skills

Description

Students learn in various ways: visualization, listening, memorizing, or making analogies. Traditional lecturing in engineering courses and the learning styles of engineering students are inharmonious, putting students at a disadvantage depending on their learning style (Felder & Silverman, 1988). My study analyzes the traditional approach to learning coding skills, which is unnatural to engineering students with no previous exposure, and examines whether visual learning enhances introductory computer science education. Visual and text-based learning are evaluated to determine how students learn introductory coding skills and the associated problem-solving skills. The study observed how the two types of learning help students learn to solve problems, as well as how much knowledge can be gained in a short period of time. Scratch was used for visual learning and Repl.it for text-based learning. Two exams were written to measure the progress made by each student. The topics covered by the exams were initialization, variable reassignment, output, if statements, if-else statements, nested if statements, logical operators, arrays/lists, while loops, type casting, functions, object orientation, and sorting. Analysis of the data collected in the study allows us to observe whether the traditional method of teaching programming or block-based programming is more beneficial, and for which introductory computer science topics.

Contributors

Agent

Created

Date Created
2018-05

Enhancing Student Learning Through Adaptive Sentence Generation

Description

Education in any skill-based subject, such as mathematics or language, involves a significant amount of repetition and practice. According to the National Survey of Student Engagement, students spend on average 17 hours per week reviewing and practicing material previously learned in a classroom, with higher-performing students tending to spend more time practicing. As such, learning software has emerged over the past several decades that focuses on providing a wide range of examples, practice problems, and situations for users to exercise their skills. Notably, math students have benefited from software that procedurally generates a virtually infinite number of practice problems and their corresponding solutions, which allows for instantaneous feedback and automatic generation of tests and quizzes. This is possible only because software can generate and verify a virtually endless supply of sample problems across a wide range of topics within mathematics. While English learning software has progressed in a similar manner, it faces a series of hurdles distinctly different from those of mathematics. In particular, English grammar contains a wide range of exception cases. Some words have unique spellings for their plural forms, some words have identical spellings for their singular and plural forms, and some words are conjugated differently for only one particular tense or person. Combined, these issues make generating grammatically correct sentences complicated. To compound this problem, the grammar rules of English are vast and often depend on the context in which they are used. Verb-tense agreement (e.g. "I eat" vs. "he eats") and the conjugation of irregular verbs (e.g. swim -> swam) are common examples. This thesis presents an algorithm designed to randomly generate a virtually infinite number of practice problems for students of English as a second language.
This approach differs from other generation approaches in that it generates sentences based on a context set by educators, so that problems can be generated in the context of what students are currently learning. The algorithm is validated through a study in which over 35,000 sentences generated by the algorithm are verified by multiple grammar-checking algorithms, and a subset of the sentences is validated against 3 education standards by a subject matter expert in the field. The study found that this approach has a significantly lower grammar error ratio than other generation algorithms and shows potential where context specification is concerned.
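To make the exception-case problem concrete, here is a toy sketch of context-driven sentence generation with agreement and irregular-verb handling. It is not the thesis's algorithm: the exception tables, the shape of the `context` dictionary, and all function names are assumptions invented for illustration, and the regular-verb rules are deliberately incomplete (for example, they do not double final consonants as in stop -> stopped).

```python
import random

# Hypothetical exception tables; a real system would need far larger ones.
IRREGULAR_PAST = {"swim": "swam", "eat": "ate", "go": "went"}
IRREGULAR_THIRD = {"go": "goes", "have": "has", "do": "does"}
VOWELS = "aeiou"


def third_person(verb):
    """Present-tense form used with he/she/it."""
    if verb in IRREGULAR_THIRD:
        return IRREGULAR_THIRD[verb]
    if verb.endswith(("s", "sh", "ch", "x", "z")):
        return verb + "es"
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ies"
    return verb + "s"


def past(verb):
    """Simple past form, checking the irregular table first."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"
    return verb + "ed"


def conjugate(verb, subject, tense):
    """Pick the verb form that agrees with the subject and tense."""
    if tense == "past":
        return past(verb)
    if subject.lower() in ("he", "she", "it"):
        return third_person(verb)
    return verb


def generate(context, rng):
    """Build one sentence from an educator-supplied context (assumed format)."""
    subject = rng.choice(context["subjects"])
    verb = rng.choice(context["verbs"])
    tense = rng.choice(context["tenses"])
    return f"{subject.capitalize()} {conjugate(verb, subject, tense)}."


if __name__ == "__main__":
    lesson = {"subjects": ["I", "he"], "verbs": ["eat", "swim"], "tenses": ["present", "past"]}
    print(generate(lesson, random.Random(0)))
```

Restricting the vocabulary and tenses through `context` is one simple way to keep generated problems within what a class is currently studying.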

Contributors

Agent

Created

Date Created
2016-05

Moving Target Defense: Defending against Adversarial Defense

Description

A defense-by-randomization framework is proposed as an effective defense mechanism against different types of adversarial attacks on neural networks. Experiments were conducted by selecting combinations of differently constructed image classification neural networks to observe which combinations, applied to this framework, were most effective in maximizing classification accuracy. Furthermore, the reasons why particular combinations were more effective than others are explored.
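The general idea of defense by randomization can be sketched as an ensemble that picks one of its member classifiers at random for each query, so an attacker cannot know in advance which network will answer. The sketch below is a generic illustration under that assumption, not the framework evaluated in the thesis; the class name, the uniform default distribution, and the stand-in models are all hypothetical.

```python
import random


class RandomizedEnsemble:
    """Answer each query with a randomly selected member model.

    `models` is a list of callables mapping an input to a predicted label.
    `weights` optionally gives the selection probabilities (uniform if None).
    """

    def __init__(self, models, weights=None, seed=None):
        if not models:
            raise ValueError("at least one model is required")
        if weights is not None and len(weights) != len(models):
            raise ValueError("weights must match models in length")
        self.models = list(models)
        self.weights = weights
        self.rng = random.Random(seed)

    def predict(self, x):
        # Sample one model per query; the choice is hidden from the caller.
        model = self.rng.choices(self.models, weights=self.weights, k=1)[0]
        return model(x)


if __name__ == "__main__":
    # Stand-in "networks": real ones would be trained image classifiers.
    ensemble = RandomizedEnsemble([lambda x: "cat", lambda x: "dog"], seed=0)
    print([ensemble.predict(None) for _ in range(5)])
```

Which members to include, and with what selection probabilities, is the combination question the experiments in the description address.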

Contributors

Agent

Created

Date Created
2019-05

Computationally Efficient Object Detection Strategy from Water Surfaces with Specularity Removal

Description

Floating trash objects are commonly seen on water bodies such as lakes, canals, and rivers. With the increase in plastic goods and human activity near water bodies, these trash objects can pile up and cause great harm to the surrounding environment. Using human workers to clear out this trash is hazardous and time-consuming. Employing autonomous robots for the task is a better approach, since they are more efficient and faster than humans. However, for a robot to clean up trash objects, a good detection algorithm is required. Real-time object detection on water surfaces is challenging due to the nature of the environment and the volatility of the water surface. In addition, running an object detection algorithm on a robot's on-board processor limits the amount of CPU the algorithm can use. In this thesis, a computationally low-cost approach for robust detection of trash objects, run on the on-board processor of a multirotor, is presented. To account for specular reflections on the water surface, we use a polarization filter and integrate a specularity removal algorithm into our approach. The challenges faced during testing and the measures taken to overcome them are also discussed. The algorithm was compared with two other object detectors using 4 different metrics. Testing was carried out using videos of 5 different objects collected under different illumination conditions over a lake using a multirotor. The results indicate that our algorithm is well suited to real-time use, since it had the highest processing speed (21 FPS), the lowest CPU consumption (37.5%), and considerably high precision and recall values in detecting the object.

Contributors

Agent

Created

Date Created
2021

Weakly-Supervised Visual-Retriever-Reader Pipeline for Knowledge-Based VQA Tasks

Description

Visual question answering (VQA) is the task of answering questions about a given image. It requires both language and vision methods, which makes VQA a frontier interdisciplinary field. In recent years, with the great progress made on simple question tasks (e.g. object recognition), researchers have started to shift their interest to questions that require knowledge and reasoning. Knowledge-based VQA requires answering questions with external knowledge in addition to the content of images. One dataset widely used to evaluate knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because the knowledge bases vary, it is hard to compare models' performance fairly. To address this issue, this paper collects a natural language knowledge base that can be used for any question answering (QA) system. Moreover, a Visual Retriever-Reader pipeline is proposed for knowledge-based VQA, in which the visual retriever aims to retrieve relevant knowledge and the visual reader seeks to predict answers based on the given knowledge. The retriever is built in two versions: a term-based retriever that uses Best Matching 25 (BM25), and a neural retriever built on the recent dense passage retriever (DPR). To encode the visual information, the image and the caption are encoded separately in two kinds of neural retriever: Image-DPR and Caption-DPR. There are also two styles of reader: a classification reader and an extraction reader. Both the retriever and the reader are trained with weak supervision. The experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge.
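For readers unfamiliar with the term-based retriever, the following is a minimal sketch of standard BM25 scoring over a small text corpus. It is a generic textbook formulation, not the paper's implementation: the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the idea of using a question plus an image caption as the query are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 relevance score per document for the given query.

    Tokenization is plain lowercase whitespace splitting (an assumption);
    k1 and b are commonly used defaults, not values from the paper.
    """
    if not docs:
        raise ValueError("docs must not be empty")
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs or 1.0

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for d in tokenized:
        doc_freq.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "a red bus is a vehicle used for public transport",
        "cats are small domesticated mammals",
        "parks are public green spaces in a city",
    ]
    # Hypothetical query: question text combined with an image caption.
    print(bm25_scores("what is the red bus used for", corpus))
```

A term-based scorer like this needs no training, which is one reason it serves as a natural baseline next to a learned dense retriever.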

Contributors

Agent

Created

Date Created
2021

Perturbation Robust Representations of Topological Persistence Diagrams

Description

Topological methods for data analysis present opportunities for enforcing certain invariances of broad interest in computer vision, including view-point in activity analysis, articulation in shape analysis, and measurement invariance in non-linear dynamical modeling. The increasing success of these methods is attributed to the complementary information that topology provides, as well as to the availability of tools for computing topological summaries such as persistence diagrams. However, persistence diagrams are multi-sets of points, so it is not straightforward to fuse them with the features used by contemporary machine learning tools such as deep nets. In this paper, theoretically well-grounded approaches to developing novel perturbation-robust topological representations are presented, with the long-term view of making them amenable to fusion with contemporary learning architectures. The proposed representation lives on a Grassmann manifold and hence can be used efficiently in machine learning pipelines.

The efficacy of the proposed descriptor was explored on three applications: view-invariant activity analysis, 3D shape analysis, and non-linear dynamical modeling. Compared with baseline methods, favorable results are obtained in both high-level recognition performance and reduced time complexity.

Contributors

Agent

Created

Date Created
2017

Novel Image Representations and Learning Tasks

Description

Computer Vision as a field has gone through significant changes in the last decade. The field has seen tremendous success in designing learning systems with hand-crafted features and in using representation learning to extract better features. In this dissertation some novel approaches to representation learning and task learning are studied.

Multiple-instance learning, which is a generalization of supervised learning, is one example of task learning that is discussed. In particular, a novel non-parametric k-NN-based multiple-instance learning method is proposed, which is shown to outperform other existing approaches. This solution is applied effectively to a diabetic retinopathy pathology detection problem.

For representation learning, the generality of neural features is investigated first. This investigation leads to some critical understanding of, and results on, feature generality among datasets. The possibility of learning from a mentor network instead of from labels is then investigated. Distillation of dark knowledge is used to efficiently mentor a small network from a pre-trained large mentor network. These studies help in understanding representation learning with smaller and compressed networks.
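The notion of distilling dark knowledge from a large mentor network into a small one is commonly expressed as a loss between temperature-softened output distributions. The sketch below shows that common formulation in plain Python; it is not taken from the dissertation, and the function names and the temperature value of 2.0 are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature parameter."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, mentor_logits, temperature=2.0):
    """Cross-entropy between softened mentor and student distributions.

    A higher temperature flattens both distributions, exposing the relative
    probabilities the mentor assigns to the non-target classes.
    """
    p_mentor = softmax(mentor_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(m * math.log(s) for m, s in zip(p_mentor, p_student))


if __name__ == "__main__":
    mentor = [4.0, 1.0, 0.5]
    print(distillation_loss([3.5, 1.2, 0.4], mentor))
    print(distillation_loss([0.5, 1.0, 4.0], mentor))
```

In practice this term is usually combined with an ordinary label loss; how the two are weighted is a design choice this sketch leaves out.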

Contributors

Agent

Created

Date Created
2017

Sample-Efficient Reinforcement Learning of Robot Control Policies in the Real World

Description

The goal of reinforcement learning is to enable systems to autonomously solve tasks in the real world, even in the absence of prior data. To succeed in such situations, reinforcement learning algorithms collect new experience through interactions with the environment to further the learning process. The behaviour is optimized by maximizing a reward function, which assigns high numerical values to desired behaviours. Especially in robotics, such interactions with the environment are expensive in terms of the required execution time, human involvement, and mechanical degradation of the system itself. Therefore, this thesis aims to introduce sample-efficient reinforcement learning methods which are applicable to real-world settings and control tasks such as bimanual manipulation and locomotion. Sample efficiency is achieved through directed exploration, either by using dimensionality reduction or trajectory optimization methods. Finally, it is demonstrated how data-efficient reinforcement learning methods can be used to optimize the behaviour and morphology of robots at the same time.

Contributors

Agent

Created

Date Created
2019