sports, banking, and other disciplines. We use predictive analytics and modeling to
determine the impact of certain factors that increase the probability of a successful
fourth down conversion in the Power 5 conferences. The logistic regression models
predict the likelihood of going for fourth down with a 64% or more probability based on
2015-17 data obtained from ESPN’s college football API. Offense type, though important,
is not directly measurable, so it was incorporated as a random effect. We found that distance
to go, play type, field position, and week of the season were the leading covariates for
prediction. On average, our model performed as much as 14% better than coaches
in 2018.
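As a rough illustration of how a logistic model of this kind turns covariates into a go/no-go recommendation, the sketch below scores a fourth-down situation and compares the result with a 64% cutoff. It rests on several assumptions: the coefficient values are hypothetical (the abstract reports none), the random effect for offense type is simplified to an additive offset, and treating 64% as a decision threshold is only one possible reading of the abstract.

```python
import math

# Hypothetical coefficients for illustration only; the abstract does not
# report fitted values.
COEFFS = {
    "intercept": 1.2,
    "yards_to_go": -0.25,     # distance to go
    "yards_to_goal": -0.01,   # field position
    "is_pass": -0.3,          # play type (1 = pass, 0 = run)
    "week": 0.02,             # week of the season
}

# Assumed reading of the "64% or more" figure as a decision cutoff.
GO_THRESHOLD = 0.64


def conversion_probability(yards_to_go, yards_to_goal, is_pass, week,
                           offense_effect=0.0):
    """Logistic estimate of converting on fourth down.

    offense_effect stands in for the random effect of offense type,
    simplified here to a per-offense additive offset on the log-odds.
    """
    z = (COEFFS["intercept"]
         + COEFFS["yards_to_go"] * yards_to_go
         + COEFFS["yards_to_goal"] * yards_to_goal
         + COEFFS["is_pass"] * (1 if is_pass else 0)
         + COEFFS["week"] * week
         + offense_effect)
    return 1.0 / (1.0 + math.exp(-z))


def should_go_for_it(yards_to_go, yards_to_goal, is_pass, week,
                     offense_effect=0.0):
    """Recommend going for it when the estimate meets the cutoff."""
    p = conversion_probability(yards_to_go, yards_to_goal, is_pass, week,
                               offense_effect)
    return p >= GO_THRESHOLD


if __name__ == "__main__":
    # Fourth-and-1 versus fourth-and-10 from the same spot, run play, week 5.
    print(should_go_for_it(1, 40, False, 5))
    print(should_go_for_it(10, 40, False, 5))
```

With these made-up coefficients, a shorter distance to go raises the estimated probability; a fitted model would supply its own values and a properly estimated random effect.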
This was achieved by first using Offline Explorer, an application that can download websites, to gather job postings from Dice.com that matched searches on a predefined list of technical skills. Next, the downloaded postings were parsed to extract and clean the required data, and a database was populated with the cleaned data. The companies were then matched with their corresponding industries using their NAICS (North American Industry Classification System) codes. The descriptions were analyzed, and a group of soft skills was chosen based on the results of Word2Vec (a group of models used to create word embeddings). A master table was created by combining all of the tables in the database and was then filtered to exclude postings that required too much experience. Lastly, the web app was created using Node.js as the back end. This web app allows users to choose their desired criteria and navigate through the postings that meet them.
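The experience filter described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the regular expression, the `required_years` and `filter_postings` helpers, the dictionary layout of a posting, and the two-year cutoff are all assumptions, since the abstract does not say how "too much experience" was defined.

```python
import re

# Matches phrases such as "5+ years", "3 yrs", or the "2 years" in "1-2 years".
YEARS_PATTERN = re.compile(r"(\d+)\s*\+?\s*(?:years?|yrs?)", re.IGNORECASE)


def required_years(description):
    """Return the largest years-of-experience figure mentioned, or 0 if none."""
    matches = YEARS_PATTERN.findall(description)
    return max((int(m) for m in matches), default=0)


def filter_postings(postings, max_years=2):
    """Keep postings whose stated experience requirement is at most max_years.

    max_years is a hypothetical cutoff chosen for illustration.
    """
    return [p for p in postings
            if required_years(p["description"]) <= max_years]


if __name__ == "__main__":
    sample = [
        {"title": "Senior Java Developer",
         "description": "5+ years of Java experience required."},
        {"title": "Junior Analyst",
         "description": "1-2 years working with SQL preferred."},
        {"title": "Data Intern",
         "description": "Entry level, no experience required."},
    ]
    for posting in filter_postings(sample):
        print(posting["title"])
```

Taking the largest figure mentioned is a deliberately conservative choice: a posting that asks for "1-2 years" is treated as requiring two.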
The goal of this project is to develop a deeper understanding of how machine learning pertains to the business world and how business professionals can capitalize on its capabilities. It explores the end-to-end process of integrating a machine learning model and the tradeoffs and obstacles to consider. This topic is extremely pertinent today as big data grows and the use of machine learning and artificial intelligence expands across industries and functional roles. The approach I took was to expand on a project I championed as a Microsoft intern, where I facilitated the integration of a forecasting machine learning model into the business firsthand. I supplement my findings from that experience with research on machine learning as a disruptive technology. This paper does not delve into the technical aspects of coding a machine learning model, but rather provides a holistic overview of developing the model from a business perspective. My findings show that, while the advantages of machine learning are large and widespread, a lack of visibility and transparency into the underlying algorithms, the need for large amounts of data, and the overall complexity of creating accurate models are all tradeoffs to consider when deciding whether machine learning is suitable for a given objective. These results are intended to help business professionals understand the capabilities and obstacles of integrating machine learning into their business operations.
Created predictive models in R to identify the significant variables that indicate whether someone will default on their loans, using a data set of almost 900,000 loan applicants.
Sports analytics refers to the implementation of data science and analytics techniques within the sports industry. Sports analysts and team managers have used analytical tools to boost overall team and player performance, often through the analysis of historical data. One of the most common techniques employed in sports analytics is data mining, the practice of analyzing data in order to extract and deliver insights and findings. Data mining projects are frequently guided by the six-step Cross Industry Standard Process for Data Mining (CRISP-DM) framework. One sport that has made extensive use of data science and analytics, and data mining specifically, is Formula One (F1). Given the sport’s reliance on technology, race engineers working for F1 constructors often develop statistical models that analyze historical race performance to derive insight into drivers’ success. For the purposes of this project, the perspective of a race engineer working for the F1 constructor McLaren was considered. As the constructor is seeking to gain a competitive advantage for the upcoming F1 season, race performance data from previous seasons was collected and analyzed as part of a larger data mining project following the CRISP-DM framework. Statistical models, such as linear regression and random forest, were developed to predict the number of points scored by McLaren drivers and to identify the variables that contributed most strongly to those points. The final results point to lap time as the most important variable in determining the number of points gained, with specific target lap times to aim for, although McLaren also appears more likely to succeed at specific locations. These results will in turn be used to develop race strategies for the upcoming season so that McLaren performs efficiently against its competitors.
As Clive Humby said, “Data is the new oil,” and data is becoming ever more important to every industry, profession, and business, with powerful applications such as artificial intelligence and machine learning. In the Small and Medium Businesses (SMB) market segment specifically, there is a significant gap in the use of data analytics. Only 15% of SMBs have a “data-driven” culture. Companies that leverage data to drive decision-making have seen increased revenue, profit, and employee output. Despite the benefits, SMB owners run into three main issues. The first is a lack of bandwidth, as time and human capital are stretched thin. The second is a lack of technical expertise, as many analytics tools require coding skills or knowledge of systems and tools that many SMBs do not possess. Lastly, many SMBs lack the finances to invest in costly tools or subject matter experts. Enterprise-level organizations will continue to invest in analytics, leaving SMBs behind and increasing economic inequality. Our solution is DataMate, a no-code, low-cost, and low-time-commitment Data as a Service (DaaS) platform designed to provide end-to-end analytics solutions for SMB owners. The platform allows users to automatically pull data from sources (e.g., point of sale and customer relationship management systems), store the data in a centralized location, and visualize it through dashboards, giving SMBs data-driven decision-making capabilities. Once at scale, we will be able to create models and deliver advanced predictive and prescriptive analytics. The global data-as-a-service market was valued at $5.5B in 2021 and is expected to grow at a CAGR of 36.9% until 2030. SMBs account for a minority of global revenue share but are expected to grow faster than large enterprises. The Total Addressable Market (TAM) for the data-as-a-service industry among small and medium-sized businesses in the United States is roughly $1.02B, and the Serviceable Obtainable Market (SOM) is roughly $2.6M.
The DaaS industry is highly competitive, with high customer bargaining power and large growth potential. Some direct competitors to DataMate are FiveTran, Looker, Domo, and Alteryx. While they offer similar data infrastructure services, none matches DataMate’s unique product value proposition. A fully operational platform will require considerable technical investment. Our go-to-market strategy consists of a manual phase and an automated phase. To start, we will leverage the expertise of data and business analysts to manually build end-to-end analytics solutions. Concurrently, we plan to build an automated platform. By starting with manual builds, we can bring in revenue on day one while solidifying template dashboards and ETL flows. Additionally, DataMate will initially build data solutions only in the restaurant vertical, given its large market segment and homogeneity of tools. Given the numerous variations in data needs between SMB industries, a step-by-step rollout allows for quality integration. Eventually, the platform will expand to all industries.
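The market-size figures quoted in this abstract ($5.5B in 2021, growing at a 36.9% CAGR until 2030) can be turned into a projected 2030 value with the standard compound-growth formula. The short sketch below does that arithmetic. It assumes nine full compounding years from 2021 to 2030, and the `project_market` helper name is hypothetical; the projected figure is an implication of the quoted numbers, not a value stated in the abstract.

```python
def project_market(base_value, cagr, years):
    """Compound growth: base_value * (1 + cagr) ** years."""
    return base_value * (1 + cagr) ** years


# Figures from the abstract: $5.5B in 2021, 36.9% CAGR until 2030.
# Assumption: nine full compounding periods (2021 to 2030).
projected_2030 = project_market(5.5, 0.369, 2030 - 2021)

if __name__ == "__main__":
    print(f"Implied 2030 global DaaS market: ${projected_2030:.1f}B")
```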