Filtering by
- All Subjects: Python
- Creators: Computer Science and Engineering Program
- Creators: Economics Program in CLAS
Throughout this project, I decided on a number of learning goals to consider it a success. I needed to learn how to use the supporting libraries that would help me to design this system. I also learned how to use the Twitter API, as well as create the infrastructure behind it that would allow me to collect large amounts of data for machine learning. I needed to become familiar with common machine learning libraries in Python in order to create the necessary algorithms and pipelines to make predictions based on Twitter data.
This paper details the steps and decisions needed to determine how to collect this data and apply it to machine learning algorithms. I determined how to create labelled data using pre-existing Botometer ratings, and the levels of confidence I needed to label data for training. I use the scikit-learn library to create these algorithms to best detect these bots. I used a number of pre-processing routines to refine the classifiers’ precision, including natural language processing and data analysis techniques. I eventually move to remotely-hosted versions of the system on Amazon web instances to collect larger amounts of data and train more advanced classifiers. This leads to the details of my final implementation of a user-facing server, hosted on AWS and interfacing over Gmail’s IMAP server.
The current and future development of this system is laid out. This includes more advanced classifiers, better data analysis, conversions to third party Twitter data collection systems, and user features. I detail what it is I have learned from this exercise, and what it is I hope to continue working on.
The program writes randomly generated, syntactically correct Python 3 code in order to provide students infinite examples from which to study. The end goal of the project is to create an interactive tool where beginning programming students can click a button to generate a random code snippet, check if what they predict the output to be is correct, and get an explanation of the code line by line. The tool currently lacks a front end, but it currently is able to write Python code that includes assignment statements, delete statements, if statements, and print statements. It supports boolean, float, integer, and string variable types.
When creating computer vision applications, it is important to have a clear image of what is represented such that further processing has the best representation of the underlying data. A common factor that impacts image quality is blur, caused either by an intrinsic property of the camera lens or by introducing motion while the camera’s shutter is capturing an image. Possible solutions for reducing the impact of blur include cameras with faster shutter speeds or higher resolutions; however, both of these solutions require utilizing more expensive equipment, which is infeasible for instances where images are already captured. This thesis discusses an iterative solution for deblurring an image using an alternating minimization technique through regularization and PSF reconstruction. The alternating minimizer is then used to deblur a sample image of a pumpkin field to demonstrate its capabilities.
Machine learning has a near infinite number of applications, of which the potential has yet to have been fully harnessed and realized. This thesis will outline two departments that machine learning can be utilized in, and demonstrate the execution of one methodology in each department. The first department that will be described is self-play in video games, where a neural model will be researched and described that will teach a computer to complete a level of Super Mario World (1990) on its own. The neural model in question was inspired by the academic paper “Evolving Neural Networks through Augmenting Topologies”, which was written by Kenneth O. Stanley and Risto Miikkulainen of University of Texas at Austin. The model that will actually be described is from YouTuber SethBling of the California Institute of Technology. The second department that will be described is cybersecurity, where an algorithm is described from the academic paper “Process Based Volatile Memory Forensics for Ransomware Detection”, written by Asad Arfeen, Muhammad Asim Khan, Obad Zafar, and Usama Ahsan. This algorithm utilizes Python and the Volatility framework to detect malicious software in an infected system.