Filtering by
- All Subjects: Data Analytics
- Creators: Department of Information Systems
- Resource Type: Text
Created predictive models in R to identify the variables that significantly predict whether an applicant will default on a loan, using a data set of almost 900,000 loan applicants.
Corporate buzzwords like “big data” and “data analytics” are vague, and media sources throw them around often enough to obscure their actual meanings. These concepts are then associated with company-wide initiatives beyond the reach of the individual, in a nebulous world where people know that analytics happens, but don’t understand what it is.
The power of data analytics is not reserved for company-wide initiatives, nor is it employed only by Silicon Valley tech start-ups. Its impact is visible at the team or department level, and the analysis itself can be carried out by individual employees. The field of data analytics is evolving rapidly, and the individual employee is becoming a source of insight and value creation through the adoption of analytics-based approaches.
The purpose of this thesis is to showcase an example of this claim, and to demonstrate how an analytics-based approach was applied to an existing accounting process to create new insights and information. To do this, I will discuss my development of an Excel-based Dashboard Analytics tool, which I completed during my internship with Bechtel Corporation in the summer of 2018, and I will use this tool to demonstrate the improvements that small-scale analytics brought to a pre-existing process. During this discussion, I will address the conceptual aspects of database design that related to my project, and will show how I applied this classroom learning to a working environment. The paper will begin with an overview of the goals of the group in which I was based, and will then analyze how the needs of the group led to the creation and implementation of this new analytics-based reporting tool. I will conclude with a discussion of the potential future use of this tool, and of how the inclusion of these analytical approaches will continue to shape the working environment.
The goal of this project is to develop a deeper understanding of how machine learning pertains to the business world and how business professionals can capitalize on its capabilities. It explores the end-to-end process of integrating a machine learning model and the tradeoffs and obstacles to consider. This topic is extremely pertinent today, as big data grows and the use of machine learning and artificial intelligence expands across industries and functional roles. The approach I took was to expand on a project I championed as a Microsoft intern, where I facilitated the integration of a machine learning forecasting model into the business firsthand. I supplement my findings from the experience with research on machine learning as a disruptive technology. This paper will not delve into the technical aspects of coding a machine learning model, but rather provide a holistic overview of developing the model from a business perspective. My findings show that, while the advantages of machine learning are large and widespread, a lack of visibility and transparency into the algorithms behind machine learning, the necessity for large amounts of data, and the overall complexity of creating accurate models are all tradeoffs to consider when deciding whether machine learning is suitable for a given objective. The results of this paper are important for increasing any business professional’s understanding of the capabilities and obstacles of integrating machine learning into their business operations.
sports, banking, and other disciplines. We use predictive analytics and modeling to determine the impact of certain factors that increase the probability of a successful fourth down conversion in the Power 5 conferences. The logistic regression models predict the likelihood of going for fourth down with a 64% or more probability based on 2015-17 data obtained from ESPN’s college football API. Offense type, though important, is not directly measurable and was therefore incorporated as a random effect. We found that distance to go, play type, field position, and week of the season were the key covariates for prediction. On average, our model performed as much as 14% better than coaches in 2018.
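To make the modeling idea concrete, the following is a minimal sketch of how a logistic regression with a random intercept turns play covariates into a conversion probability. It is not the authors' fitted model: every coefficient, the offense-type labels, and the function names are hypothetical placeholders, and reading the 64% figure as a decision threshold is an assumption.

```python
import math

# Hypothetical coefficients for illustration only. They are NOT the fitted
# values from the study, which the abstract does not report.
FIXED_EFFECTS = {
    "intercept": 1.2,
    "distance_to_go": -0.25,  # yards needed for a first down
    "is_pass": -0.30,         # play type: 1 = pass, 0 = run
    "yards_to_goal": -0.005,  # field position
    "week": 0.01,             # week of the season
}

# Random effect: one intercept offset per offense type (hypothetical labels
# and values). Unknown offense types fall back to an offset of 0.
OFFENSE_INTERCEPTS = {"spread": 0.10, "pro_style": -0.05}

# Assumed reading of the 64% figure in the abstract as a decision threshold.
GO_THRESHOLD = 0.64


def conversion_probability(distance_to_go, is_pass, yards_to_goal, week, offense_type):
    """Return P(successful conversion) from the logistic model."""
    linear_predictor = (
        FIXED_EFFECTS["intercept"]
        + FIXED_EFFECTS["distance_to_go"] * distance_to_go
        + FIXED_EFFECTS["is_pass"] * is_pass
        + FIXED_EFFECTS["yards_to_goal"] * yards_to_goal
        + FIXED_EFFECTS["week"] * week
        + OFFENSE_INTERCEPTS.get(offense_type, 0.0)
    )
    # The logistic (inverse logit) function maps the predictor into (0, 1).
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def should_go_for_it(probability):
    """Recommend going for it when the probability meets the threshold."""
    return probability >= GO_THRESHOLD


short_yardage = conversion_probability(1, 0, 40, 5, "spread")
long_yardage = conversion_probability(10, 1, 40, 5, "spread")
print(should_go_for_it(short_yardage), should_go_for_it(long_yardage))
```

In a real analysis the coefficients and the per-group offsets would be estimated from the play-by-play data with a mixed-effects package rather than typed in by hand.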
This was achieved by first using Offline Explorer, an application that can download websites, to gather job postings from Dice.com that matched a pre-defined list of technical skills. The downloaded postings were then parsed to extract and clean the required data, which was loaded into a database. Next, the companies were matched with their corresponding industries using their NAICS (North American Industry Classification System) codes. The descriptions were then analyzed, and a group of soft skills was chosen based on the results of Word2Vec (a group of models that assists in creating word embeddings). A master table was created by combining all of the tables in the database and was then filtered to exclude posts that required too much experience. Lastly, the web app was created using Node.js as the back-end. This web app allows users to choose their desired criteria and navigate through the postings that meet those criteria.
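The master-table and filtering steps described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table and column names, the sample rows, and the experience cutoff are all hypothetical, since the abstract does not describe the actual schema or threshold.

```python
import sqlite3

# Hypothetical cutoff; the abstract does not say how much experience is "too much".
MAX_YEARS_EXPERIENCE = 2

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE industries (naics_code TEXT PRIMARY KEY, industry TEXT);
    CREATE TABLE companies (company TEXT PRIMARY KEY, naics_code TEXT);
    CREATE TABLE postings (
        id INTEGER PRIMARY KEY,
        company TEXT,
        title TEXT,
        min_years_experience INTEGER
    );
    """
)

# Placeholder rows for illustration only (not data from the project).
conn.executemany(
    "INSERT INTO industries VALUES (?, ?)",
    [("541511", "Custom Computer Programming Services"),
     ("522110", "Commercial Banking")],
)
conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("ExampleSoft", "541511"), ("ExampleBank", "522110")],
)
conn.executemany(
    "INSERT INTO postings VALUES (?, ?, ?, ?)",
    [(1, "ExampleSoft", "Junior Developer", 0),
     (2, "ExampleSoft", "Senior Architect", 10),
     (3, "ExampleBank", "Data Analyst", 2)],
)

# Combine the tables into one master table, matching companies to
# industries through their NAICS codes.
conn.execute(
    """
    CREATE TABLE master AS
    SELECT p.id, p.title, p.company, i.industry, p.min_years_experience
    FROM postings AS p
    JOIN companies AS c ON c.company = p.company
    JOIN industries AS i ON i.naics_code = c.naics_code
    """
)

# Filter out postings that require more experience than the cutoff.
entry_level = conn.execute(
    "SELECT title, industry FROM master "
    "WHERE min_years_experience <= ? ORDER BY id",
    (MAX_YEARS_EXPERIENCE,),
).fetchall()

for title, industry in entry_level:
    print(title, "|", industry)
```

A web back-end could then run the same parameterized query with the user's chosen criteria in place of the fixed cutoff.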
As Clive Humby said, “Data is the new oil,” and data is becoming ever more important to every industry, profession, and business through applications like artificial intelligence and machine learning. In the Small and Medium Businesses (SMB) market segment specifically, there is a significant gap in the use of data analytics. Only 15% of SMBs have a “data-driven” culture. Companies that leverage data to drive decision-making have seen increased revenue, profit, and employee output. Despite the benefits, SMB owners run into three main issues. First, a lack of bandwidth, as time and human capital are stretched thin. Second, a lack of technical expertise, as many analytics tools require coding skills or knowledge of systems and tools that many SMBs do not possess. Lastly, many SMBs lack the finances to invest in costly tools or subject matter experts. Enterprise-level organizations will continue to invest in analytics, leaving SMBs behind and increasing economic inequality. Our solution is DataMate, a no-code, low-cost Data as a Service (DaaS) platform that demands little time from its users and is designed to provide end-to-end analytics solutions for SMB owners. The platform allows users to automatically pull data from sources (e.g., point of sale, customer relationship management), store data in a centralized location, and visualize data through dashboards, giving SMBs data-driven decision-making capabilities. Once at scale, we will be able to create models and deliver advanced predictive and prescriptive analytics. The global data-as-a-service market was valued at $5.5B in 2021 and is expected to grow at a CAGR of 36.9% until 2030. SMBs account for a minority of global revenue share but are expected to grow faster than large enterprises. The Total Addressable Market (TAM) for the data-as-a-service industry among small and medium-sized businesses in the United States is roughly $1.02B, and the Serviceable Obtainable Market (SOM) is roughly $2.6M.
The DaaS industry is highly competitive, with high customer bargaining power and large growth potential. Some direct competitors to DataMate are FiveTran, Looker, Domo, and Alteryx. While they offer similar data infrastructure services, none matches DataMate’s unique product value proposition. A fully operational platform will require considerable technical investment. Our go-to-market strategy consists of a manual phase and an automated phase. To start, we will leverage the expertise of data and business analysts to manually build end-to-end analytics solutions. Concurrently, we plan to build an automated platform. By starting with manual builds, we can bring in revenue from day one while solidifying template dashboards and ETL flows. Additionally, DataMate will initially build data solutions only in the restaurant vertical, given its large market segment and homogeneity of tools. Given the numerous variations in data needs between SMB industries, a step-by-step rollout allows for quality integration. Eventually, the platform will expand to all industries.
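The global market figures quoted above (a $5.5B valuation in 2021 growing at a 36.9% CAGR until 2030) imply a compound-growth projection that can be checked with a few lines of Python. This is a sketch only: it assumes annual compounding from a 2021 base through 2030, and the function and variable names are illustrative.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward: base * (1 + cagr) ** years."""
    return base_value * (1.0 + cagr) ** years


BASE_YEAR = 2021
END_YEAR = 2030             # assumed final year of the quoted growth period
BASE_VALUE_BILLIONS = 5.5   # global DaaS market value in 2021 (from the text)
CAGR = 0.369                # 36.9% compound annual growth rate (from the text)

# Projected market size in billions of dollars for each year in the period.
projection = {
    year: project_market_size(BASE_VALUE_BILLIONS, CAGR, year - BASE_YEAR)
    for year in range(BASE_YEAR, END_YEAR + 1)
}

for year, value in projection.items():
    print(f"{year}: ${value:.1f}B")
```

Under these assumptions the market grows roughly seventeenfold over the nine compounding periods, which shows how sensitive the headline figure is to the assumed growth rate and horizon.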