Matching Items (8)
Description
In this thesis, the application of pixel-based vertical axes within parallel coordinate plots is explored in an attempt to improve how existing tools explain complex multivariate interactions across temporal data. Several promising visualization techniques are combined, such as visual boosting to allow quicker consumption of large data sets, the bond energy algorithm to surface finer patterns and anomalies through contrast, multi-dimensional scaling, flow lines, user-guided clustering, and row-column ordering. User input is applied to precomputed data sets to provide real-time interaction. The general applicability of the techniques is tested against industrial trade, social networking, financial, and sparse data sets of varying dimensionality.
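For orientation only, the sketch below is a minimal, hypothetical parallel coordinate plot in Python (pandas/matplotlib): the baseline display that the pixel-based axes, reordering, and clustering described above build on. The data frame, column names, and values are illustrative and not drawn from the thesis.

```python
# A minimal parallel-coordinates sketch on hypothetical data, illustrating the
# baseline plot type that pixel-based axes and row-column ordering extend.
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical multivariate, temporal-style data: one row per observation.
df = pd.DataFrame({
    "year":    [2010, 2011, 2012, 2010, 2011, 2012],
    "exports": [1.2, 1.5, 1.9, 0.4, 0.5, 0.3],
    "imports": [0.8, 0.9, 1.1, 0.7, 0.6, 0.9],
    "balance": [0.4, 0.6, 0.8, -0.3, -0.1, -0.6],
    "region":  ["A", "A", "A", "B", "B", "B"],
})

# Each vertical axis is one variable; each polyline is one observation,
# colored by its grouping column.
parallel_coordinates(df, class_column="region", colormap="viridis")
plt.title("Parallel coordinates (baseline plot)")
plt.show()
```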
ContributorsHayden, Thomas (Author) / Maciejewski, Ross (Thesis advisor) / Wang, Yalin (Committee member) / Runger, George C. (Committee member) / Mack, Elizabeth (Committee member) / Arizona State University (Publisher)
Created2014
Description
Drawing on the large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning outcomes in several ways: student visualization, recommendations for students, student modeling, student grouping, and more. Many programming assignment platforms offer features such as automated submission and test-case checking to verify correctness, but few studies have compared different statistical techniques against the latest frameworks or interpreted the resulting models in a unified approach.

In this thesis, several data mining algorithms have been applied to analyze students' code assignment submission data from a real classroom study. The goal of this work is to explore and predict students' performance. Multiple machine learning models were evaluated for accuracy and interpreted using SHapley Additive exPlanations (SHAP).

Cross-validation shows that the Gradient Boosting Decision Tree achieves the best precision, 85.93%, with an average of 82.90%. Features such as component grade, due date, and submission times have a higher impact than the others. The baseline model received lower precision due to its lack of non-linear fitting.
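As a hedged illustration of the kind of pipeline described above (not the thesis' actual code or data), the sketch below trains a gradient-boosted classifier on hypothetical submission features, reports cross-validated precision, and computes SHAP attributions; all feature names and values are invented for the example.

```python
# A minimal sketch, assuming hypothetical per-student submission features.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
# Hypothetical features echoing those named in the abstract.
X = pd.DataFrame({
    "component_grade":  rng.uniform(0, 100, n),
    "days_before_due":  rng.integers(0, 14, n),
    "submission_count": rng.integers(1, 30, n),
})
# Hypothetical pass/fail label loosely tied to the features.
y = (0.6 * X["component_grade"] + 2 * X["days_before_due"]
     + rng.normal(0, 10, n) > 45).astype(int)

model = GradientBoostingClassifier(random_state=0)

# Cross-validated precision, the metric reported in the abstract.
precision = cross_val_score(model, X, y, cv=5, scoring="precision")
print("mean precision:", precision.mean())

# SHAP values show how each feature pushes individual predictions.
model.fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)
```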
ContributorsTian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created2019
Description
The foundations of legacy media, especially the news media, are not as strong as they once were. A digital revolution has changed the industry's operating models, and journalistic organizations are trying to find their place in the new market. This project analyzes the effects of new and emerging technologies on the journalism industry. Five categories of technology are explored: the semantic web, automation software, data analysis and aggregators, virtual reality, and drone journalism. The potential of these technologies is assessed according to four guidelines: ethical implications, effects on the reportorial process, business impacts, and changes to the consumer experience. Upon my examination, it is apparent that no single technology will offer the journalism industry the remedy it has been searching for. Some combination of emerging technologies, however, may form the basis for the next generation of news. Findings are presented on a website featuring video, visuals, linked content, and original graphics, available at http://www.explorenewstech.com/
Created2016-05
Description
Academic integrity policies written specifically for journalism schools or departments are devised to foster a realistic, informative learning environment. Plagiarism and fabrication are two of the most egregious errors of judgment a journalist can commit, and journalism schools and departments address these errors through their academic integrity policies. Some schools take a zero-tolerance approach, often expelling the student after the first or second violation, while other schools take a tolerant approach in which a student is permitted at least three violations before suspension is considered. At a time when plagiarizing and fabricating stories have never been easier to commit, and never easier to catch, students must be prepared to understand plagiarism and fabrication involving multimedia elements such as video, audio, and photos. In this project, journalism academic integrity codes were gathered from across the U.S. and assigned to a zero-tolerance, semi-tolerant, or tolerant category designed by the researcher, in order to determine which policies best prepare students for the real journalism world and to suggest how some policies could be improved.
ContributorsRoney, Claire Marie (Author) / McGuire, Tim (Thesis director) / Russomanno, Joseph (Committee member) / W. P. Carey School of Business (Contributor) / Walter Cronkite School of Journalism and Mass Communication (Contributor) / Barrett, The Honors College (Contributor)
Created2016-12
Description
This report investigates the general day-to-day problems faced by small businesses, particularly small vendors, in the areas of marketing and general management. Due to a lack of manpower, internet availability, and properly documented data, small businesses cannot optimize their operations. The aim of the research is to address these problems and provide a solution in the form of a tool that applies data science. The tool includes features that help vendors mine the data they record themselves and extract information useful to their businesses. Because properly documented data is scarce, one-class classification using a Support Vector Machine (SVM) is used to build a model that returns positive values for the audience most likely to respond to a marketing strategy. Market basket analysis is used to choose products from the inventory so that patterns are found among them, increasing the chance that a marketing strategy attracts its audience. Higher-selling products can also be used to the vendor's advantage, and lower-selling products can be paired with them to improve overall profit. The tool, as envisioned, meets all of its requirements and can be used as a standalone application to bring the power of data mining into the hands of a small vendor.
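The sketch below is a minimal illustration of the two techniques named above, using hypothetical vendor records rather than the thesis' tool or data: a one-class SVM trained only on past responders, and market basket analysis (via the mlxtend library) to find products frequently bought together. Column names, thresholds, and values are assumptions.

```python
# A minimal sketch, assuming hypothetical vendor records.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM
from mlxtend.frequent_patterns import apriori, association_rules

# --- One-class classification: learn only from customers who responded ---
responders = pd.DataFrame({
    "monthly_spend":     [120, 95, 140, 110, 130],
    "visits_per_month":  [6, 4, 8, 5, 7],
})
scaler = StandardScaler().fit(responders)
ocsvm = OneClassSVM(kernel="rbf", nu=0.2, gamma="scale")
ocsvm.fit(scaler.transform(responders))

candidates = pd.DataFrame({
    "monthly_spend":     [125, 20],
    "visits_per_month":  [6, 1],
})
# +1 means "resembles a likely responder", -1 means "does not".
print(ocsvm.predict(scaler.transform(candidates)))

# --- Market basket analysis: find products frequently bought together ---
baskets = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]],
    columns=["bread", "milk", "eggs"],
).astype(bool)
frequent = apriori(baskets, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```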
ContributorsSharma, Aveesha (Author) / Ghazarian, Arbi (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)
Created2016
Description
The solar energy sector has been growing rapidly over the past decade. Growth in renewable electricity generation using photovoltaic (PV) systems is accompanied by an increased awareness of the fault conditions that develop during the operational lifetime of these systems. While the annual energy losses caused by faults in PV systems can reach up to 18.9% of their total capacity, emerging technologies and models are driving greater efficiency to assure the reliability of a product under its actual application. The objectives of this dissertation consist of (1) reviewing the state of the art and practice of prognostics and health management for the Direct Current (DC) side of photovoltaic systems; (2) assessing the corrosion of the driven posts supporting PV structures in utility-scale plants; and (3) assessing the probabilistic risk associated with the failure of polymeric materials that are used in tracker and fixed-tilt systems.

As photovoltaic systems age under relatively harsh and changing environmental conditions, several potential fault conditions can develop during the operational lifetime, including corrosion of supporting structures and failures of polymeric materials. The ability to accurately predict the remaining useful life of photovoltaic systems is critical for plants' continuous operation. This research contributes to the body of knowledge on PV system reliability by (1) developing a meta-model of the expected service life of mounting structures; (2) creating decision frameworks and tools to support practitioners in mitigating risks; and (3) supporting material selection for fielded and future photovoltaic systems. The newly developed frameworks were validated by a global solar company.
ContributorsChokor, Abbas (Author) / El Asmar, Mounir (Thesis advisor) / Chong, Oswald (Committee member) / Ernzen, James (Committee member) / Arizona State University (Publisher)
Created2017
Description
Plagiarism is a serious problem in a learning environment. In programming classes especially, plagiarism can be hard to detect, as source code's appearance can be easily modified without changing its intent through simple formatting changes or refactoring. A number of plagiarism detection tools attempt to encode knowledge about the programming languages they support in order to better detect obscured duplicates. Many such tools do not support a large number of languages because doing so requires too much code and therefore too much maintenance. It is also difficult to add support for new languages because languages differ vastly in syntax. Tools that are more extensible often achieve this by reducing the language features they encode, and they end up closer to text-comparison tools than to structurally aware program analysis tools.

Kitsune attempts to remedy these issues by tying itself to ANTLR, a pre-existing language recognition tool with over 200 currently supported languages. In addition, it provides an interface through which generic manipulations can be applied to the parse tree generated by ANTLR. Because Kitsune relies on language-agnostic structure modifications, it can be adapted with minimal effort to provide plagiarism detection for new languages. Kitsune has been evaluated successfully on 10 of the languages in the ANTLR grammar repository and could easily be extended to support all of the grammars currently developed for ANTLR, as well as future grammars developed as new languages are written.
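Kitsune's ANTLR-based pipeline cannot be reproduced in a few lines, so the sketch below is only a stand-in for the underlying idea of structural rather than textual comparison: it uses Python's built-in ast module (not ANTLR) to reduce two snippets to their tree shapes and measure similarity, which survives the renaming and reformatting that defeats plain text diffs.

```python
# Not Kitsune itself: a stand-in sketch of structural comparison using Python's
# built-in ast module instead of ANTLR parse trees.
import ast
from difflib import SequenceMatcher

def tree_shape(source: str) -> list[str]:
    """Flatten a syntax tree into node-type labels only, discarding
    identifiers, literals, and formatting."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]

def structural_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, tree_shape(a), tree_shape(b)).ratio()

original = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

# Same logic, renamed variables and different formatting.
plagiarized = """
def sum_all(values):
    acc = 0
    for v in values:   acc += v
    return acc
"""

print(structural_similarity(original, plagiarized))          # close to 1.0
print(structural_similarity(original, "print('hello')"))     # much lower
```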
ContributorsMonroe, Zachary Lynn (Author) / Bansal, Ajay (Thesis advisor) / Lindquist, Timothy (Committee member) / Acuna, Ruben (Committee member) / Arizona State University (Publisher)
Created2020
Description
There has been substantial development in the field of data transmission over the last two decades. One no longer has to wait long for a high-definition video to load. Data compression is one of the most important technologies behind this seamless data transmission experience: it allows more data to be stored or sent using less memory or fewer network resources. However, there appears to be a limit on the amount of compression achievable with existing lossless data compression techniques because they rely on the frequency of characters or sets of characters in the data. This thesis proposes a lossless data compression technique in which the data is compressed by representing it as a set of parameters that reproduce the original data without any loss when fed into a corresponding mathematical equation. The equation used in the thesis is the sum of the first N terms of a geometric series. Various modifications are made to this equation so that any given data can be compressed and decompressed. In the proposed technique, the whole data is treated as a single decimal number and replaced with one of the terms of the equation; all other terms of the equation are computed and stored as the compressed file. The performance of the developed technique is evaluated in terms of compression ratio, compression time, and decompression time, and the evaluation metrics are compared with other existing techniques in the same domain.
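As a hedged reading of the description above (the abstract does not spell out which quantities are stored versus recomputed), the sketch below round-trips a small payload through the closed form for the sum of the first N terms of a geometric series, S = a(r^N - 1)/(r - 1), treating the payload as the first term a. It demonstrates only the lossless inversion of the equation, not the thesis' modifications or any actual size reduction.

```python
# A minimal sketch of the core identity only, not the thesis' full scheme.
# The choice of which quantity is stored versus recomputed is an assumption.

def encode(data: bytes, r: int = 3, n: int = 5) -> tuple[int, int, int]:
    a = int.from_bytes(data, "big")      # whole payload as one integer (first term)
    s = a * (r**n - 1) // (r - 1)        # sum of the first n terms, exact in integers
    return s, r, n                       # parameters kept as the "compressed" record

def decode(s: int, r: int, n: int, length: int) -> bytes:
    a = s * (r - 1) // (r**n - 1)        # invert the closed form to recover the first term
    return a.to_bytes(length, "big")

payload = b"hello"
s, r, n = encode(payload)
assert decode(s, r, n, len(payload)) == payload   # lossless round trip
# Note: s is larger than the payload here; the thesis' modifications to the
# equation, not shown, are what would make the representation compact.
print(s, r, n)
```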
ContributorsGrewal, Karandeep Singh (Author) / Gonzalez Sanchez, Javier (Thesis advisor) / Bansal, Ajay (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)
Created2021