Analyzing LinkedIn Profiles Using Machine Learning

Description

Understanding the necessary skills required to work in an industry is a difficult task with many potential uses. By being able to predict the industry of a person based on their skills, professional social networks could make searching better with automated tagging, advertisers could target more carefully, and students could better find a career path that fits their skillset. The aim of this project is to apply deep learning to the world of professional networking. Deep learning is a type of machine learning that has recently been making breakthroughs in the analysis of complex datasets that previously were not of much use. Initially the goal was to apply deep learning to the skills-to-company relationship, but a lack of quality data required a change to the skills-to-industry relationship. To accomplish the new goal, a database of LinkedIn profiles from various industries was gathered and processed. From this dataset a model was created to take a list of skills and output an industry that people with those skills work in. Such a model has value in the insights it provides, allowing candidates to determine what industry fits a skillset, identify key skills for industries, and locate which industries possible candidates may best fit in. Various models were trained and tested on a skills-to-industry dataset. The model was able to learn similarities between industries and predict the most likely industries for each profile's skillset.

Contributors: Andrew, Benjamin (Co-author) / Thiel, Alex (Co-author) / Sodemann, Angela (Thesis director) / Sebold, Brent (Committee member) / Engineering Programs (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-12

Machine Learning Enabled Analytics for Health-Related Demographics: a Case Study Identifying Important Factors in Cardiac Disease

Description

The use of machine learning for analytics has increased rapidly in the past few years due to its ability to identify hidden insights in data. It also has a plethora of applications in healthcare, ranging from improving image recognition in CT scans to extracting semantic meaning from thousands of medical form PDFs. Currently in the BioElectrical Systems and Technology Lab, there is a biosensor in development for which data is retrieved and analyzed manually. As a proof of concept, this project uses a neural network architecture to automatically parse and classify a cardiac disease data set and to explore health-related factors impacting cardiac disease in patients of all ages.

Contributors: Murella, Akhila Sainagaki (Author) / Blain-Christen, Jennifer (Thesis director) / Meuth, Ryan (Committee member) / Computer Science and Engineering Program (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / School of the Arts, Media and Engineering (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Predicting the Outcome of UFC Fights Using Machine Learning Models

Description

Abstract: The Ultimate Fighting Championship, or UFC as it is commonly known, was founded in 1993 and has quickly built itself into the world's foremost authority on all things MMA (mixed martial arts) related. With pay-per-view and cable television deals in hand, the UFC has become a huge competitor in the sports market, rivaling the popularity of boxing for almost a decade. As with most other sports, the UFC has seen an influx of analytics and data science over the past five to seven years. We see this revolution in football with the broadcast first-down markers, in basketball with player movement tracking, and in baseball with pitch location for strikes and balls. Now the UFC has partnered with the statistics company Fightmetric to provide in-depth statistical analysis of its fights. ESPN has its win probability metrics, and statistical predictive modeling has begun to spread throughout sports. All these stats were made to showcase information about a fighter that one wouldn't typically know, giving insight into how the fight might go. But can these fights be predicted? Building on prior research and on the approaches of relevant research into other sports leagues, I sought to use the arsenal of statistical analyses done by Fightmetric, along with the official UFC fighter database, to answer the question of whether UFC fights could be predicted. Specifically, by using only data that would be known about a fighter prior to stepping into the cage, could I predict with any degree of certainty who was going to win the fight?

Contributors: Moorman, Taylor D. (Author) / Simon, Alan (Thesis director) / Simon, Phil (Committee member) / W.P. Carey School of Business (Contributor) / Department of Information Systems (Contributor) / Department of Management and Entrepreneurship (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Automatic Song Lyric Generation and Classification with Long Short-Term Networks

Description

Lyric classification and generation are trending topics in the machine learning community. Long Short-Term Memory (LSTM) networks are effective tools for classifying and generating text. We explored their effectiveness in the generation and classification of lyrical data and proposed methods of evaluating their accuracy. We found that LSTM networks with dropout layers were effective at lyric classification. We also found that word-embedding LSTM networks were extremely effective at lyric generation.

Contributors: Tallapragada, Amit (Author) / Ben Amor, Heni (Thesis director) / Caviedes, Jorge (Committee member) / Computer Science and Engineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

The Capabilities and Obstacles of Integrating Machine Learning into a Supply Chain

Description

Only an Executive Summary of the project is included.
The goal of this project is to develop a deeper understanding of how machine learning pertains to the business world and how business professionals can capitalize on its capabilities. It explores the end-to-end process of integrating a machine learning model and the tradeoffs and obstacles to consider. This topic is extremely pertinent today as big data grows and the use of machine learning and artificial intelligence expands across industries and functional roles. The approach I took was to expand on a project I championed as a Microsoft intern, where I facilitated the integration of a forecasting machine learning model into the business firsthand. I supplement my findings from the experience with research on machine learning as a disruptive technology. This paper will not delve into the technical aspects of coding a machine learning model, but rather provide a holistic overview of developing the model from a business perspective. My findings show that, while the advantages of machine learning are large and widespread, a lack of visibility and transparency into the algorithms behind machine learning, the necessity for large amounts of data, and the overall complexity of creating accurate models are all tradeoffs to consider when deciding whether or not machine learning is suitable for a certain objective. The results of this paper are important for increasing business professionals' understanding of the capabilities and obstacles of integrating machine learning into their business operations.

Contributors: Verma, Ria (Author) / Goegan, Brian (Thesis director) / Moore, James (Committee member) / Department of Information Systems (Contributor) / Department of Supply Chain Management (Contributor) / Department of Economics (Contributor) / Barrett, The Honors College (Contributor)

Created: 2019-05

Reddit Predicts Swings in the Stock Market: r/WorldNews and Using Machine Learning to Predict Changes in Stock Price

Description

In this paper, I will show that news headlines of global events can predict changes in stock price by using machine learning and eight years of data from r/WorldNews, a popular forum on Reddit.com. My data is confined to the top 25 daily posts on the forum, and due to the implicit filtering mechanism in the online community, these 25 posts are representative of the most popular news headlines and influential global events of the day. Hence, these posts shine a light on how large-scale social and political events affect the stock market. Using a logistic regression and a naive Bayes classifier, I am able to predict with approximately 85% accuracy a binary change in stock price using term-feature vectors gathered from the news headlines. The accuracy, precision, and recall results closely rival the best models in this field of research. In addition to the results, I will also describe the mathematical underpinnings of the two models, preceded by a general investigation of the intersection between the multiple academic disciplines related to this project. These range from social to computer science and from statistics to philosophy. The goal of this additional discussion is to further illustrate the interdisciplinary nature of the research and hopefully inspire a non-monolithic mindset when further investigations are pursued.
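As a rough illustration of the kind of classifier described above, the following is a minimal sketch of a multinomial naive Bayes model over bag-of-words headline features. It is not the author's implementation: the function names `train_nb` and `predict_nb`, the add-one smoothing, and the four toy headlines with their "up"/"down" labels are all assumptions made for the example.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Count words per class; returns the pieces needed for prediction."""
    vocab = {w for d in docs for w in d.lower().split()}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    return vocab, class_counts, word_counts


def predict_nb(model, doc):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    vocab, class_counts, word_counts = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in doc.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: headlines paired with a binary price movement.
headlines = [
    "markets rally on trade deal",
    "peace talks boost markets",
    "war fears hit markets",
    "sanctions crisis deepens",
]
movements = ["up", "up", "down", "down"]

model = train_nb(headlines, movements)
pred_up = predict_nb(model, "trade deal talks")
pred_down = predict_nb(model, "war crisis")
print(pred_up, pred_down)
```

A real experiment would use far more data, a held-out test set, and likely a library implementation; the sketch only shows how term counts turn into a binary prediction.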

Contributors: Priniski, John Hunter (Author) / Haiyan, Wang (Thesis director) / Hazel, Kwon (Committee member) / School of Historical, Philosophical and Religious Studies (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-12

Startle can evoke individuated movements of the fingers; implications for neural control

Description

Startle-evoked-movement (SEM), the involuntary release of a planned movement via a startling stimulus, has gained significant attention recently for its ability to probe motor planning as well as enhance movement of the upper extremity following stroke. We recently showed that hand movements are susceptible to SEM. Interestingly, only coordinated movements of the hand (grasp) but not individuated movements of the finger (finger abduction) were susceptible. It was suggested that this resulted from different neural mechanisms involved in each task; however, it is possible this was the result of task familiarity. The objective of this study was to evaluate a more familiar individuated finger movement, typing, to determine if this task was susceptible to SEM. We hypothesized that typing movements would be susceptible to SEM in all fingers. The results indicate that individuated movements of the fingers are susceptible to SEM when the task is more familiar, since the electromyogram (EMG) latency is faster in SCM+ trials compared to SCM- trials. However, the middle finger does not show a difference in terms of the keystroke voltage signal, suggesting the middle finger is less susceptible to SEM. Given that SEM is thought to be mediated by the brainstem, specifically the reticulospinal tract, this suggests that the brainstem may play a role in movements of the distal limb when those movements are very familiar, and that the independence of each finger might also have a significant effect on SEM. Further research includes understanding SEM in fingers in the stroke population. The implications of this research can impact the way upper extremity rehabilitation is delivered.

Contributors: Quezada Valladares, Maria Jose (Author) / Honeycutt, Claire (Thesis director) / Santello, Marco (Committee member) / Harrington Bioengineering Program (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-12

Naïve Bayes Classification for Analyzing Prostate Cancer Treatment Outcomes

Description

Prostate cancer is the second most common kind of cancer in men. Fortunately, it has a 99% survival rate. To achieve such a survival rate, a variety of aggressive therapies are used to treat prostate cancers that are caught early. Androgen deprivation therapy (ADT) is a therapy that is given in cycles to patients. This study attempted to analyze what factors in a group of 79 patients caused them to stick with or discontinue the treatment. This was done using naïve Bayes classification, a machine-learning algorithm. The usage of this algorithm identified high testosterone as an indicator of a patient persevering with the treatment, but failed to produce statistically significant high rates of prediction.

Contributors: Millea, Timothy Michael (Author) / Kostelich, Eric (Thesis director) / Kuang, Yang (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-12

Startle-evoked movement in multi-jointed, two-dimensional reaching tasks

Description

Previous research has shown that a loud acoustic stimulus can trigger an individual's prepared movement plan. This movement response is referred to as a startle-evoked movement (SEM). SEM has been observed in the stroke survivor population, where results have shown that SEM enhances single-joint movements that are usually performed with difficulty. While the presence of SEM in the stroke survivor population advances scientific understanding of movement capabilities following a stroke, published studies using the SEM phenomenon only examined one joint. The ability of SEM to generate multi-jointed movements is understudied, which limits SEM as a potential therapy tool. To apply SEM as a therapy tool, however, the biomechanics of the arm in multi-jointed movement planning and execution must be better understood. Thus, the objective of our study was to evaluate whether SEM could elicit multi-joint reaching movements that were accurate in an unrestrained, two-dimensional workspace. Data was collected from ten subjects with no previous neck, arm, or brain injury. Each subject performed a reaching task to five Targets that were equally spaced in a semi-circle to create a two-dimensional workspace. The subject reached to each Target following a sequence of two non-startling acoustic cues: "Get Ready" and "Go". A loud acoustic stimulus was randomly substituted for the "Go" cue. We hypothesized that SEM is accessible and accurate for unrestricted multi-jointed reaching tasks in a functional workspace and is therefore independent of movement direction. Our results show that SEM is possible in all five Target directions. The probability of evoking SEM and the movement kinematics (i.e., total movement time, linear deviation, average velocity) to each Target are not statistically different. Thus, we conclude that SEM is possible in a functional workspace and is not dependent on where arm stability is maximized. Moreover, coordinated preparation and storage of a multi-jointed movement is indeed possible.

Contributors: Ossanna, Meilin Ryan (Author) / Honeycutt, Claire (Thesis director) / Schaefer, Sydney (Committee member) / Harrington Bioengineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-12

Using Facebook to Examine Smoking Behavior through "Quit Smoking" Support Groups

Description

Background: As the growth of social media platforms continues, the use of the constantly increasing amount of freely available, user-generated data they receive becomes of great importance. One apparent use of this content is public health surveillance, such as for increasing understanding of substance abuse. In this study, Facebook was used to monitor nicotine addiction through the public support groups users can join to aid their quitting process. Objective: The main objective of this project was to gain a better understanding of the mechanisms of nicotine addiction online and provide content analysis of Facebook posts obtained from "quit smoking" support groups. Methods: Using the Facebook Application Programming Interface (API) for Python, a sample of 9,970 posts was collected in October 2015. Information regarding the user's name and the number of likes and comments they received on their post was also included. The posts crawled were then manually classified by one annotator into one of three categories: positive, negative, and neutral. Positive posts are those that describe current quits, negative posts are those that discuss relapsing, and neutral posts are those that were not used to train the classifiers, including posts where users have yet to attempt a quit, ads, random questions, etc. For this project, the performance of two machine learning algorithms on a corpus of manually labeled Facebook posts was compared. The classification goal was to test the plausibility of creating a natural language processing machine learning classifier that could distinguish between relapse (labeled negative) and quitting success (labeled positive) posts from a set of smoking-related posts. Results: From the corpus of 9,970 posts that were manually labeled, 6,254 (62.7%) were labeled positive, 1,249 (12.5%) were labeled negative, and 2,467 (24.8%) were labeled neutral. Since the posts labeled neutral are those which are irrelevant to the classification task, 7,503 posts were used to train the classifiers: 83.4% positive and 16.6% negative. The support vector machine (SVM) classifier was 84.1% accurate and 84.1% precise, had a recall of 1, and an F-score of 0.914. The multinomial naive Bayes (MNB) classifier was 82.8% accurate and 82.8% precise, had a recall of 1, and an F-score of 0.906. Conclusions: The Facebook surveillance results give a small peek into the behavior of those looking to quit smoking. Ultimately, what makes Facebook a great tool for public health surveillance is that it has an extremely large and diverse user base with information that is easily obtainable. This, and the fact that so many people are actually willing to use Facebook support groups to aid their quitting processes, demonstrates that it can be used to learn a lot about quitting and smoking behavior.
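The reported F-scores can be checked against the reported precision and recall. The sketch below assumes the abstract's F-score is the standard F1 measure (the harmonic mean of precision and recall); the function name `f_score` is an illustrative choice, and the inputs are the figures quoted in the abstract.

```python
def f_score(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
svm_f1 = f_score(0.841, 1.0)
mnb_f1 = f_score(0.828, 1.0)

# Share of positive posts among the 7,503 non-neutral posts.
positive_share = 6254 / (6254 + 1249)

print(round(svm_f1, 3))          # 0.914
print(round(mnb_f1, 3))          # 0.906
print(round(positive_share, 3))  # 0.834
```

Under that assumption the computed values agree with the reported 0.914 and 0.906. A recall of 1 with accuracy equal to precision is also what a classifier that labels nearly every post positive would produce, so the 83.4% positive share is a useful baseline when reading these numbers.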

Contributors: Molina, Daniel Antonio (Author) / Li, Baoxin (Thesis director) / Tian, Qiongjie (Committee member) / School of Mathematical and Statistical Sciences (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05
