Matching Items (7)
Description
In this thesis, the application of pixel-based vertical axes within parallel coordinate plots is explored in an attempt to improve how existing tools can explain complex multivariate interactions across temporal data. Several promising visualization techniques are combined, including visual boosting for quicker consumption of large data sets, the bond energy algorithm for finding finer patterns and anomalies through contrast, multi-dimensional scaling, flow lines, user-guided clustering, and row-column ordering. User input is applied to precomputed data sets to provide real-time interaction. The general applicability of the techniques is tested against industrial trade, social networking, financial, and sparse data sets of varying dimensionality.
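Of the techniques listed, the bond energy algorithm (row-column ordering) is the most self-contained to illustrate. The sketch below is a minimal greedy version that inserts each column where it maximizes the "measure of effectiveness" (the sum of adjacent-column dot products); the example matrix is illustrative and not from the thesis's data.

```python
# Minimal sketch of the bond energy algorithm (BEA) for column ordering,
# one of the row-column ordering techniques named above. Greedy variant:
# each column is inserted at the position that maximizes the sum of
# dot products between adjacent columns.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bea_order(columns):
    """Order columns so that similar columns end up adjacent."""
    ordered = [columns[0]]
    for col in columns[1:]:
        best_pos, best_gain = 0, float("-inf")
        for pos in range(len(ordered) + 1):
            left = ordered[pos - 1] if pos > 0 else None
            right = ordered[pos] if pos < len(ordered) else None
            gain = 0
            if left is not None:
                gain += dot(left, col)
            if right is not None:
                gain += dot(col, right)
            if left is not None and right is not None:
                gain -= dot(left, right)  # bond broken by the insertion
            if gain > best_gain:
                best_pos, best_gain = pos, gain
        ordered.insert(best_pos, col)
    return ordered

# Columns of a small binary attribute matrix; after ordering,
# the two identical column patterns should sit next to each other.
cols = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (0, 0, 1, 1)]
print(bea_order(cols))
```

Running the greedy insertion on the toy matrix groups the two `(1, 1, 0, 0)` columns and the two `(0, 0, 1, 1)` columns together, which is the contrast effect the abstract describes.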
ContributorsHayden, Thomas (Author) / Maciejewski, Ross (Thesis advisor) / Wang, Yalin (Committee member) / Runger, George C. (Committee member) / Mack, Elizabeth (Committee member) / Arizona State University (Publisher)
Created2014
Description
Due to the large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning in several ways: student visualization, recommendations for students, student modeling, student grouping, etc. Many programming-assignment platforms offer features such as automated submission and test-case checks for correctness, but few studies have compared different statistical techniques against the latest frameworks or interpreted the resulting models in a unified approach.

In this thesis, several data mining algorithms have been applied to analyze students' code assignment submission data from a real classroom study. The goal of this work is to explore and predict students' performance. Multiple machine learning models were evaluated for accuracy and interpreted using the Shapley Additive Explanation (SHAP) method.

Cross-validation shows that the Gradient Boosting Decision Tree has the best precision, 85.93%, with an average of 82.90%. Features such as component grade, due date, and submission times have a higher impact than others. The baseline model achieved lower precision due to its lack of non-linear fitting.
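The cross-validated precision evaluation described above can be sketched in a few lines. This is a toy illustration only: a simple threshold rule stands in for the Gradient Boosting Decision Tree, and the feature values and labels are made up, not the classroom data.

```python
# Minimal sketch of k-fold cross-validated precision, the evaluation
# scheme described above. A threshold rule stands in for the gradient
# boosting model; all data here is illustrative.

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold]
        train = idx[:i * fold] + idx[(i + 1) * fold:]
        yield train, test

def precision(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0

# Toy feature: submission count before the due date; label: pass/fail.
X = [5, 1, 7, 2, 8, 0, 6, 1, 9, 2]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

scores = []
for train, test in k_fold_indices(len(X), k=5):
    # "Train": pick a threshold as the mean feature value of the fold.
    thresh = sum(X[i] for i in train) / len(train)
    preds = [1 if X[i] > thresh else 0 for i in test]
    scores.append(precision([y[i] for i in test], preds))

print(sum(scores) / len(scores))  # average precision across folds
```

The thesis's average of 82.90% is this same quantity computed over real folds with a real model; SHAP values would then be computed on the fitted model to rank feature impact.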
ContributorsTian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2019
Description
Data has quickly become a cornerstone of society. Across our daily lives, industry, policy, and more, we are experiencing what can only be called a "data revolution" igniting ferociously. While data is gaining more and more importance, consumers do not fully understand the extent of its use and subsequent capitalization by companies. This paper explores the current climate relating to data security and data privacy. It aims to start a conversation regarding the culture around the sharing and collection of data. We explore data privacy in four tiers. First, we examine the current cultural and social perception of data privacy, its relevance in our daily lives, and its importance in society's dialogue. Next, we look at current policy and legislation, focusing primarily on Europe's established GDPR and the incoming California Consumer Privacy Act, to see what measures are already in place and what measures need to be adopted to mold more of a culture of transparency. Next, we analyze current data privacy regulations and the power of regulators such as the FTC and SEC to see what tools they have at their disposal to ensure accountability in the tech industry when it comes to how our data is used. Lastly, we look at the potential of treating and viewing data as an asset, and the implications of doing so in the scope of possible valuation and depreciation techniques. The goal of this paper is to outline initial steps toward better understanding and regulating data privacy and collection practices. We aim to bring this issue to the forefront of conversation in society, so that we may take the first step in the metaphorical marathon of data privacy, establishing better data privacy controls and becoming a more data-conscious society.
ContributorsAnderson, Thomas C (Co-author) / Shafeeva, Zarina (Co-author) / Swiech, Jakub (Co-author) / Marchant, Gary (Thesis director) / Sopha, Matthew (Committee member) / WPC Graduate Programs (Contributor) / Department of Finance (Contributor) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)
Created2019-05
Description
Data is ever present in the world today. Data can help predict presidential elections, Super Bowl champions, and even the weather. However, it's very hard, if not impossible, to predict how people feel unless they tell us. This is where impulse spending with data comes in handy. Companies are constantly looking for ways to get honest feedback when they are doing market research. Often, the research obtained ends up being unreliable or biased in some way. Allowing users to make impulse purchases with survey data is the answer. Companies can still gather the data they need for market research, and customers can get more features or lives for their favorite games. It becomes a win-win for both users and companies. By adding the option to pay with information instead of money, companies can still get value out of frugal players. Established companies might not care much about impulse spending on purchases made in the application, but they would find a great deal of value in hearing what customers think of their product or upcoming event. The real value of getting data from customers is the ability to train analytics models so that companies can make better predictions about consumer behavior. More accurate predictions can leave companies better prepared to meet the needs of the customer. Impulse spending with data provides the foundation for creating software that can create value from all types of users, regardless of whether the user is willing to spend money in the application.
ContributorsYotter, Alexandria Lee (Author) / Olsen, Christopher (Thesis director) / Sopha, Matthew (Committee member) / Department of Information Systems (Contributor) / Barrett, The Honors College (Contributor)
Created2016-12
Description
This report investigates the general day-to-day problems faced by small businesses, particularly small vendors, in the areas of marketing and general management. Due to a lack of manpower, internet availability, and properly documented data, small businesses cannot optimize their operations. The aim of the research is to address these problems and find a solution in the form of a tool that utilizes data science. The tool will have features that aid vendors in mining the data they record themselves and finding useful information that benefits their businesses. Since properly documented data is scarce, one-class classification using a Support Vector Machine (SVM) is used to build a classifying model that can return positive values for an audience likely to respond to a marketing strategy. Market basket analysis is used to choose products from the inventory so that patterns are found among them, giving a marketing strategy a higher chance of attracting an audience. Higher-selling products can also be used to the vendors' advantage, and lesser-selling products can be paired with them for an overall profit to the business. The tool, as envisioned, meets all the requirements it was set out to have and can be used as a stand-alone application to bring the power of data mining into the hands of a small vendor.
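The market basket analysis step can be sketched with plain itemset counting: find the product pairs that co-occur in enough transactions to be worth bundling. The transactions and support threshold below are illustrative, not vendor data from the thesis.

```python
# Minimal sketch of market basket analysis as described above:
# count item pairs across transactions and keep the pairs that meet
# a minimum support (number of baskets they appear in together).

from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Return item pairs appearing together in >= min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order so counts aggregate.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
print(frequent_pairs(transactions, min_support=3))
```

A frequent pair such as bread and milk is exactly the kind of pattern the tool would surface: the higher-selling item anchors the bundle, and a lesser-selling item can be paired with it.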
ContributorsSharma, Aveesha (Author) / Ghazarian, Arbi (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)
Created2016
Description
The solar energy sector has been growing rapidly over the past decade. Growth in renewable electricity generation using photovoltaic (PV) systems is accompanied by an increased awareness of the fault conditions developing during the operational lifetime of these systems. While the annual energy losses caused by faults in PV systems could reach up to 18.9% of their total capacity, emerging technologies and models are driving for greater efficiency to assure the reliability of a product under its actual application. The objectives of this dissertation consist of (1) reviewing the state of the art and practice of prognostics and health management for the Direct Current (DC) side of photovoltaic systems; (2) assessing the corrosion of the driven posts supporting PV structures in utility scale plants; and (3) assessing the probabilistic risk associated with the failure of polymeric materials that are used in tracker and fixed tilt systems.

As photovoltaic systems age under relatively harsh and changing environmental conditions, several potential fault conditions can develop during the operational lifetime, including corrosion of supporting structures and failures of polymeric materials. The ability to accurately predict the remaining useful life of photovoltaic systems is critical for plants' continuous operation. This research contributes to the body of knowledge of PV systems reliability by: (1) developing a meta-model of the expected service life of mounting structures; (2) creating decision frameworks and tools to support practitioners in mitigating risks; and (3) supporting material selection for fielded and future photovoltaic systems. The newly developed frameworks were validated by a global solar company.
ContributorsChokor, Abbas (Author) / El Asmar, Mounir (Thesis advisor) / Chong, Oswald (Committee member) / Ernzen, James (Committee member) / Arizona State University (Publisher)
Created2017
Description
There has been a substantial development in the field of data transmission in the last two decades. One does not have to wait much for a high-definition video to load on the systems anymore. Data compression is one of the most important technologies that helped achieve this seamless data transmission experience. It helps to store or send more data using less memory or network resources. However, it appears that there is a limit on the amount of compression that can be achieved with the existing lossless data compression techniques because they rely on the frequency of characters or set of characters in the data. The thesis proposes a lossless data compression technique in which the data is compressed by representing it as a set of parameters that can reproduce the original data without any loss when given to the corresponding mathematical equation. The mathematical equation used in the thesis is the sum of the first N terms in a geometric series. Various changes are made to this mathematical equation so that any given data can be compressed and decompressed. According to the proposed technique, the whole data is taken as a single decimal number and replaced with one of the terms of the used equation. All the other terms of the equation are computed and stored as a compressed file. The performance of the developed technique is evaluated in terms of compression ratio, compression time and decompression time. The evaluation metrics are then compared with the other existing techniques of the same domain.
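The core idea above, replacing the data (read as one large number) with parameters of the geometric-series sum S = a(r^n - 1)/(r - 1), can be sketched with the simplest choice a = 1, r = 2, so S = 2^n - 1. This illustrates only the lossless round trip through equation parameters; it is not the thesis's actual parameterization, and these particular parameters are not smaller than the input.

```python
# Minimal sketch of the idea described above: treat the whole input as
# one big number d and replace it with parameters of the geometric sum
# S = a(r^n - 1)/(r - 1). With a = 1, r = 2: S = 1 + 2 + ... + 2^(n-1)
# = 2^n - 1. We store (n, offset, length) and reconstruct d exactly.

def compress(data: bytes):
    d = int.from_bytes(data, "big")
    n = d.bit_length()          # number of series terms: 2^n - 1 >= d
    series_sum = 2 ** n - 1     # sum of the first n terms
    offset = series_sum - d     # distance from the series sum to d
    return n, offset, len(data)

def decompress(n, offset, length):
    d = (2 ** n - 1) - offset
    # length restores any leading zero bytes dropped by from_bytes.
    return d.to_bytes(length, "big")

data = b"lossless round trip"
params = compress(data)
assert decompress(*params) == data
```

The round trip is exact for any byte string, which is the lossless property the thesis requires; the research problem is then choosing equation parameters whose stored representation is smaller than the original data.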
ContributorsGrewal, Karandeep Singh (Author) / Gonzalez Sanchez, Javier (Thesis advisor) / Bansal, Ajay (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)
Created2021