
Cluster metrics and temporal coherency in pixel based matrices

Description

In this thesis, the application of pixel-based vertical axes within parallel coordinate plots is explored in an attempt to improve how existing tools explain complex multivariate interactions across temporal data. Several promising visualization techniques are combined: visual boosting to allow quicker consumption of large data sets, the bond energy algorithm to find finer patterns and anomalies through contrast, multidimensional scaling, flow lines, user-guided clustering, and row-column ordering. User input is applied to precomputed data sets to provide real-time interaction. The general applicability of the techniques is tested against industrial trade, social networking, financial, and sparse data sets of varying dimensionality.
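
The abstract names the bond energy algorithm for row-column ordering but gives no details. The following is a minimal sketch of the classic greedy column-insertion variant, assuming a small nonnegative matrix stored as nested lists. Each remaining column is inserted at the position that most increases the "bond" (row-wise dot product) between neighbouring columns, and rows are then ordered the same way via the transpose. The function names are hypothetical, and this is not the thesis's implementation.

```python
from typing import List

Matrix = List[List[float]]


def bond(m: Matrix, x: int, y: int) -> float:
    """Bond between columns x and y: sum over rows of m[r][x] * m[r][y]."""
    return sum(row[x] * row[y] for row in m)


def bea_column_order(m: Matrix) -> List[int]:
    """Greedy bond-energy column ordering.

    Positions beyond either end are treated as empty (all-zero) columns,
    so they contribute no bond.
    """
    n_cols = len(m[0])
    if n_cols <= 2:
        return list(range(n_cols))
    order = [0, 1]
    for k in range(2, n_cols):
        best_pos, best_gain = 0, float("-inf")
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            # Gain of placing k between left and right: new bonds minus the broken one.
            gain = 0.0
            if left is not None:
                gain += 2 * bond(m, left, k)
            if right is not None:
                gain += 2 * bond(m, k, right)
            if left is not None and right is not None:
                gain -= 2 * bond(m, left, right)
            if gain > best_gain:  # ties keep the earliest position
                best_pos, best_gain = pos, gain
        order.insert(best_pos, k)
    return order


def bond_energy(m: Matrix, order: List[int]) -> float:
    """Total bond between adjacent columns in the given order."""
    return sum(bond(m, a, b) for a, b in zip(order, order[1:]))


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def bea_reorder(m: Matrix) -> Matrix:
    """Reorder columns, then rows (by ordering the columns of the transpose)."""
    cols = bea_column_order(m)
    m_cols = [[row[c] for c in cols] for row in m]
    rows = bea_column_order(transpose(m_cols))
    return [m_cols[r] for r in rows]
```

For example, with `M = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 1, 0, 1]]`, the similar columns 0/2 and 1/3 end up adjacent, and `bea_reorder(M)` yields a block pattern. Greedy insertion is a heuristic and does not guarantee the globally maximal bond energy.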

Contributors: Hayden, Thomas (Author) / Maciejewski, Ross (Thesis advisor) / Wang, Yalin (Committee member) / Runger, George C. (Committee member) / Mack, Elizabeth (Committee member) / Arizona State University (Publisher)

Created: 2014

Predicting and Interpreting Students Performance using Supervised Learning and Shapley Additive Explanations

Description

Because online educational applications generate large amounts of data, Educational Data Mining (EDM) has improved learning outcomes in several ways, including student visualization, recommendations for students, student modeling, and student grouping. Many programming-assignment platforms offer features such as automated submission and test cases that verify correctness, but few studies have compared different statistical techniques using the latest frameworks or interpreted the resulting models in a unified approach.

In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore and predict students’ performance. Multiple machine learning models were evaluated for accuracy and interpreted using Shapley Additive Explanations (SHAP).

Cross-validation shows that the Gradient Boosting Decision Tree achieves the best precision, 85.93%, with an average of 82.90%. Features such as component grade, due date, and submission times have a higher impact than the others. The baseline model achieved lower precision because it cannot fit non-linear relationships.
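
SHAP builds on Shapley values from cooperative game theory: each feature's attribution is its average marginal contribution over all coalitions of the other features. The record does not show how the attributions were computed. The following brute-force sketch is for a handful of features and assumes that a feature "absent" from a coalition takes a fixed baseline value, which is one common simplification. The `shap` library itself uses faster algorithms, for example for tree ensembles. The toy model is hypothetical and stands in for a trained classifier.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of each feature of x for a single prediction.

    A feature outside the coalition is replaced by its baseline value
    (an assumed, simplified value function). Cost grows as n * 2**(n - 1)
    model calls, so this is only practical for small n.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: feature 0 acts additively, features 1 and 2 interact.
def toy_model(z: Sequence[float]) -> float:
    return 2.0 * z[0] + z[1] * z[2]


if __name__ == "__main__":
    phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
    # Approximately [2.0, 0.5, 0.5]; the values sum to f(x) - f(baseline) = 3.0.
    print(phi)
```

The additive feature receives its full effect, while the interaction term is split evenly between the two features that share it. This efficiency property (attributions summing to the prediction minus the baseline prediction) is what makes Shapley-based explanations easy to read per student.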

Contributors: Tian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2019

The Evolution of Data and Statistics in Baseball

Description

Former New York Yankees pitcher Goose Gossage unleashed his tirade about the deterioration of baseball's unwritten rules and nerds ruining the sport about halfway through my writing of this paper, but sentiments like his were the inspiration for my topic: the evolution of statistics and data in baseball. By telling the story of how baseball data and statistics have evolved, my goal was also to demonstrate how they have been intertwined since the beginning, which would essentially mean that nerds have always been ruining the sport (if you subscribe to that kind of thought).

In the quest to showcase this, it was necessary to document how baseball prospers from numbers and numbers prosper from baseball; the relationship between the two is mutualistic. Furthermore, an all-encompassing historical look at how data and statistics in baseball have matured was a critical portion of the paper. Batting average, for example, went from a radical new measure that threatened the status quo to a fiercely cherished statistic that was suddenly being unseated by advanced analytics, which shows that the creation of the new and the destruction of the old have been incessant. Innovators like Pete Palmer, Dick Cramer, and Bill James played a large role in this process in the 1980s. Computers aided their efforts and, when paired with the Internet, extended the ability to crunch data to an even larger sector of the population. The unveiling of Statcast at the start of the 2015 season showed just how much potential there is for measuring previously unquantifiable baseball acts.

Essentially, there will always be people who mourn the presence of data and statistics in baseball. Despite this, the evolution story indicates baseball and numbers will be intertwined into the future, likely to an even greater extent than ever before, as technology and new philosophies become increasingly integrated into front offices and clubhouses.

Contributors: Garcia, Jacob Michael (Author) / Kurland, Brett (Thesis director) / Doig, Stephen (Committee member) / Jackson, Victoria (Committee member) / Walter Cronkite School of Journalism and Mass Communication (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

We Should Talk: Consulting the Relationship Between Twitter and Sports Journalism

Description

This thesis documentary film takes a look at the dysfunctional but ongoing relationship between Twitter and sports journalism. The foundation of this relationship's dysfunction is what I have coined the Twitter Outrage Cycle. In this cycle, a sports broadcasting personality comments on a matter while on-air. Next, the audience of the program on which the comments were made becomes offended by the statement. After that, the offended audience members express their outrage on social media, most notably Twitter. Finally, the cycle culminates with public outrage pressuring networks and their executives to either suspend or fire the individual who made the controversial statements. This cycle began to occur on a more consistent basis starting in 2012. It became such a regular occurrence that many on-air talent figures have noticed and taken precautionary measures to either avoid or confront Outrage Cycles. This documentary uses the voices of seven figures within sports media and online interaction forums, most notably three individuals who currently have a prominent voice in sports journalism, as well as a neutral social media curator who clearly explains the psychology behind these outraged viewers' mindsets. Through these four main voices, ideals and opinions on the matter weave together and at times disagree, but ultimately help the viewer understand why these Outrage Cycles occur and what needs to be done for them to cease. We Should Talk: The Relationship Between Twitter and Sports Journalism is a documentary film that seeks to illustrate a seemingly minor part of many people's lives that, when put into perspective, many people take very seriously.

Contributors: Neely, Cammeron Allen Douglas (Author) / Kurland, Brett (Thesis director) / Fergus, Tom (Committee member) / Walter Cronkite School of Journalism and Mass Communication (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Data science for small businesses

Description

This report investigates the general day-to-day problems faced by small businesses, particularly small vendors, in the areas of marketing and general management. Due to a lack of manpower, internet availability, and properly documented data, small businesses cannot optimize their operations. The aim of the research is to address these problems with a tool that uses data science. The tool has features that help vendors mine the data they record themselves and find useful information that benefits their businesses. Because properly documented data is lacking, one-class classification using a Support Vector Machine (SVM) is used to build a model that returns positive values for audience members who are likely to respond to a marketing strategy. Market basket analysis is used to find purchase patterns among the products in the inventory, so that the products chosen for a marketing strategy have a higher chance of attracting an audience. In addition, higher-selling products can be used to the vendor's advantage by pairing lower-selling products with them to increase overall profit. The tool, as envisioned, meets all the requirements set out for it and can be used as a standalone application to bring the power of data mining into the hands of a small vendor.
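
The report names market basket analysis without specifying the algorithm. The following minimal sketch scores pairwise association rules X -> Y by support (share of baskets containing both items), confidence (share of baskets with X that also contain Y), and lift (confidence divided by Y's overall share). This is a simplified, pairs-only form of association-rule mining rather than a full Apriori implementation. It assumes each transaction is a list of product names. The function name and sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

# (antecedent, consequent, support, confidence, lift)
Rule = Tuple[str, str, float, float, float]


def pair_rules(transactions: List[Iterable[str]], min_support: float = 0.0) -> List[Rule]:
    """Association rules X -> Y between single items, ranked by lift."""
    n = len(transactions)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for t in transactions:
        items = sorted(set(t))  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(y in basket | x in basket)
            lift = confidence / (item_counts[y] / n)  # > 1: bought together more than chance
            rules.append((x, y, support, confidence, lift))
    rules.sort(key=lambda r: (-r[4], -r[3], r[0], r[1]))
    return rules


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["eggs"]]
    for rule in pair_rules(baskets, min_support=0.5):
        print(rule)
```

A rule whose antecedent is a lower-selling item and whose consequent is a best seller, with lift above 1, is one plausible candidate for the product pairing the report describes.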

Contributors: Sharma, Aveesha (Author) / Ghazarian, Arbi (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created: 2016

Decision-making for utility scale photovoltaic systems: probabilistic risk assessment models for corrosion of structural elements and a material selection approach for polymeric components

Description

The solar energy sector has been growing rapidly over the past decade. Growth in renewable electricity generation using photovoltaic (PV) systems is accompanied by an increased awareness of the fault conditions that develop during the operational lifetime of these systems. While the annual energy losses caused by faults in PV systems can reach up to 18.9% of their total capacity, emerging technologies and models are driving greater efficiency and helping assure product reliability under actual field conditions. The objectives of this dissertation are (1) reviewing the state of the art and practice of prognostics and health management for the Direct Current (DC) side of photovoltaic systems; (2) assessing the corrosion of the driven posts supporting PV structures in utility-scale plants; and (3) assessing the probabilistic risk associated with the failure of polymeric materials used in tracker and fixed-tilt systems.

As photovoltaic systems age under relatively harsh and changing environmental conditions, several potential fault conditions can develop during their operational lifetime, including corrosion of supporting structures and failures of polymeric materials. The ability to accurately predict the remaining useful life of photovoltaic systems is critical for plants’ continuous operation. This research contributes to the body of knowledge on PV system reliability by (1) developing a meta-model of the expected service life of mounting structures; (2) creating decision frameworks and tools to support practitioners in mitigating risks; and (3) supporting material selection for fielded and future photovoltaic systems. The newly developed frameworks were validated by a global solar company.

Contributors: Chokor, Abbas (Author) / El Asmar, Mounir (Thesis advisor) / Chong, Oswald (Committee member) / Ernzen, James (Committee member) / Arizona State University (Publisher)

Created: 2017

Lossless Data Compression by Representing Data as a Solution to the Diophantine Equations

Description

There has been substantial development in the field of data transmission over the last two decades. One no longer has to wait long for a high-definition video to load. Data compression is one of the most important technologies behind this seamless data transmission experience: it allows more data to be stored or sent using less memory or fewer network resources. However, there appears to be a limit on the compression achievable with existing lossless data compression techniques, because they rely on the frequency of characters or sets of characters in the data. The thesis proposes a lossless data compression technique in which the data is compressed by representing it as a set of parameters that reproduce the original data without any loss when given to the corresponding mathematical equation. The equation used in the thesis is the sum of the first N terms of a geometric series. Various changes are made to this equation so that any given data can be compressed and decompressed. In the proposed technique, the whole data is treated as a single decimal number and replaced with one of the terms of the equation; all the other terms of the equation are computed and stored as the compressed file. The performance of the technique is evaluated in terms of compression ratio, compression time, and decompression time, and these metrics are then compared with other existing techniques in the same domain.
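
The abstract does not state which modifications make the geometric-series equation work for arbitrary data. The following sketch is based on an assumed scheme, not the thesis's method. It treats the input bytes as one large integer D. For a chosen integer ratio r >= 2 and term count N, it writes D = S_N + remainder, where S_N = a(r^N - 1)/(r - 1) is the sum of the first N terms with integer first term a. It then stores the parameters (a, r, N, remainder, length). Because every quantity is an integer, this is a Diophantine representation, and decoding is exact. All function names are hypothetical.

```python
from typing import Tuple

Params = Tuple[int, int, int, int, int]  # (a, r, n, remainder, byte_length)


def geometric_sum(a: int, r: int, n: int) -> int:
    """S_n = a + a*r + ... + a*r**(n-1) = a * (r**n - 1) / (r - 1), exact for integers."""
    # (r**n - 1) is always divisible by (r - 1), so integer division is exact.
    return a * (r**n - 1) // (r - 1)


def encode(data: bytes, r: int, n: int) -> Params:
    """Write D = S_n(a, r) + remainder with integer a and 0 <= remainder < S_n(1, r)."""
    if r < 2 or n < 1:
        raise ValueError("this sketch assumes an integer ratio r >= 2 and n >= 1")
    d = int.from_bytes(data, "big")       # the whole input as one big integer
    unit = geometric_sum(1, r, n)         # 1 + r + ... + r**(n-1)
    a, remainder = divmod(d, unit)
    return a, r, n, remainder, len(data)  # the length preserves leading zero bytes


def decode(params: Params) -> bytes:
    """Rebuild the original bytes from the stored parameters."""
    a, r, n, remainder, length = params
    d = geometric_sum(a, r, n) + remainder
    return d.to_bytes(length, "big")
```

For example, `decode(encode(b"hello, world", 7, 20))` returns the original bytes. In this sketch, a and the remainder together need roughly as many bits as D, so it demonstrates only the lossless round trip. It makes no claim about compression ratio, which depends on the thesis's own changes to the equation.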

Contributors: Grewal, Karandeep Singh (Author) / Gonzalez Sanchez, Javier (Thesis advisor) / Bansal, Ajay (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)

Created: 2021
