Cluster metrics and temporal coherency in pixel based matrices

Description

In this thesis, the application of pixel-based vertical axes used within parallel coordinate plots is explored in an attempt to improve how existing tools can explain complex multivariate interactions across temporal data. Several promising visualization techniques are combined, including visual boosting to allow for quicker consumption of large data sets, the bond energy algorithm to find finer patterns and anomalies through contrast, multi-dimensional scaling, flow lines, user-guided clustering, and row-column ordering. User input is applied to precomputed data sets to provide real-time interaction. The general applicability of the techniques is tested against industrial trade, social networking, financial, and sparse data sets of varying dimensionality.

Contributors: Hayden, Thomas (Author) / Maciejewski, Ross (Thesis advisor) / Wang, Yalin (Committee member) / Runger, George C. (Committee member) / Mack, Elizabeth (Committee member) / Arizona State University (Publisher)

Created: 2014

Predicting and Interpreting Students Performance using Supervised Learning and Shapley Additive Explanations

Description

Due to the large data resources generated by online educational applications, Educational Data Mining (EDM) has improved learning effects in different ways: student visualization, recommendations for students, student modeling, grouping students, etc. Many programming assignments have features such as automated submission and test cases that verify correctness, but few studies have compared different statistical techniques with the latest frameworks and interpreted the models in a unified approach.

In this thesis, several data mining algorithms have been applied to analyze students’ code assignment submission data from a real classroom study. The goal of this work is to explore and predict students’ performance. Multiple machine learning models were evaluated for accuracy and interpreted using Shapley Additive Explanations.

Cross-validation shows that the Gradient Boosting Decision Tree has the best precision, 85.93%, with an average of 82.90%. Features such as component grade, due date, and submission times have a higher impact than others. The baseline model achieved lower precision because it lacks non-linear fitting.

Contributors: Tian, Wenbo (Author) / Hsiao, Ihan (Thesis advisor) / Bazzi, Rida (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2019

Value of Twitter: Using Text Mining Methods to Gain Insight into the 2016 Presidential Race

Description

This project analyzes the tweets from the 2016 US Presidential Candidates' personal Twitter accounts. The goal is to define distinct patterns and differences between candidates' and parties' use of social media as a platform. The data spans the period of September 2015 to March 2016, which was during the primary races for the Republicans and Democrats. The overall purpose of this project is to contribute to finding new ways of driving value from social media, in particular Twitter.

Contributors: Mortimer, Schuyler Kenneth (Author) / Simon, Alan (Thesis director) / Mousavi, Seyedreza (Committee member) / Department of Information Systems (Contributor) / Department of Supply Chain Management (Contributor) / Barrett, The Honors College (Contributor)

Created: 2016-05

Investigating the Relationship between Neighborhood Socioeconomic Status and Proximity to Public Services

Description

With growing levels of income inequality in the United States, it remains as important as ever to ensure indispensable public services are readily available to all members of society. This paper investigates four forms of public services (schools, libraries, fire stations, and police stations), first by researching the background of these services and their relation to poverty, and then by conducting geospatial and regression analysis. The author uses Esri's ArcGIS Pro software to quantify the proximity to public services from urban American neighborhoods (census tracts in the cities of Phoenix and Chicago). Afterwards, the measures indicating proximity are compared to the socioeconomic statuses of neighborhoods using regression analysis. The results indicate that pure proximity to these four services is not necessarily correlated to socioeconomic status. While the paper does uncover some correlations, such as a relationship between school quality and socioeconomic status, the majority of the findings negate the author's hypothesis and show that, in Phoenix and Chicago, there is not much discrepancy between neighborhoods and the extent to which they are able to access vital government-funded services.

Contributors: Norbury, Adam Charles (Author) / Simon, Alan (Thesis director) / Simon, Phil (Committee member) / Department of Information Systems (Contributor) / Department of English (Contributor) / Department of Economics (Contributor) / Barrett, The Honors College (Contributor)

Created: 2018-05

Data science for small businesses

Description

This report investigates the general day-to-day problems faced by small businesses, particularly small vendors, in the areas of marketing and general management. Due to a lack of manpower, internet availability, and properly documented data, small businesses cannot optimize their operations. The aim of the research is to address these problems with a tool that utilizes data science. The tool will have features that help vendors mine the data they record themselves and find useful information that will benefit their businesses. Because properly documented data is lacking, One Class Classification using a Support Vector Machine (SVM) is used to build a classification model that returns positive values for audiences likely to respond to a marketing strategy. Market basket analysis is used to choose products from the inventory such that patterns are found among them, which raises the chance that a marketing strategy attracts an audience. Also, higher-selling products can be used to the vendors' advantage, and lower-selling products can be paired with them to yield an overall profit for the business. The tool, as envisioned, meets all the requirements set out for it and can be used as a standalone application to bring the power of data mining into the hands of a small vendor.

Contributors: Sharma, Aveesha (Author) / Ghazarian, Arbi (Thesis advisor) / Gaffar, Ashraf (Committee member) / Bansal, Srividya (Committee member) / Arizona State University (Publisher)

Created: 2016

Decision-making for utility scale photovoltaic systems: probabilistic risk assessment models for corrosion of structural elements and a material selection approach for polymeric components

Description

The solar energy sector has been growing rapidly over the past decade. Growth in renewable electricity generation using photovoltaic (PV) systems is accompanied by an increased awareness of the fault conditions developing during the operational lifetime of these systems. While the annual energy losses caused by faults in PV systems could reach up to 18.9% of their total capacity, emerging technologies and models are driving for greater efficiency to assure the reliability of a product under its actual application. The objectives of this dissertation consist of (1) reviewing the state of the art and practice of prognostics and health management for the Direct Current (DC) side of photovoltaic systems; (2) assessing the corrosion of the driven posts supporting PV structures in utility scale plants; and (3) assessing the probabilistic risk associated with the failure of polymeric materials that are used in tracker and fixed tilt systems.

As photovoltaic systems age under relatively harsh and changing environmental conditions, several potential fault conditions can develop during the operational lifetime, including corrosion of supporting structures and failures of polymeric materials. The ability to accurately predict the remaining useful life of photovoltaic systems is critical for plants' continuous operation. This research contributes to the body of knowledge of PV systems reliability by (1) developing a meta-model of the expected service life of mounting structures; (2) creating decision frameworks and tools to support practitioners in mitigating risks; and (3) supporting material selection for fielded and future photovoltaic systems. The newly developed frameworks were validated by a global solar company.

Contributors: Chokor, Abbas (Author) / El Asmar, Mounir (Thesis advisor) / Chong, Oswald (Committee member) / Ernzen, James (Committee member) / Arizona State University (Publisher)

Created: 2017

The big, predictable picture: construal-level reflects underlying life history strategy

Description

Integrating research from life history theory with investigations of construal-level theory, the researcher proposes a novel relationship between life history strategy and construal-level. Slow life history strategies arise in safe, predictable environments where individuals give up current reproductive effort in favor of future reproductive effort. Correspondingly, high-level construals allow individuals to transcend the current context and act according to global concerns, such as the type of future planning necessary to enact slow life history strategies. Meanwhile, fast life history strategies arise in harsh, unpredictable environments where the future is uncertain and individuals need to pay close attention to the current context to survive. Correspondingly, low-level construals immerse individuals in the immediate situation, giving them the flexibility needed to respond to local concerns. Given the correspondence between aspects of life history and construal-level, it seems possible that individuals adopting slow life history strategies should more frequently use high-level construals to assist in transcending the current situation to plan for the future, while individuals adopting fast life history strategies should more frequently use low-level construals to assist in monitoring the details of their harsh, unpredictable environment. To test the relationship between life history and construal, the researcher investigated whether a childhood cue of environmental harshness and unpredictability, childhood SES, and a current cue of environmental harshness and unpredictability, local mortality rate, influenced construal-level. In line with past research, the researcher predicted that childhood SES would interact with current cues of local mortality rate to influence construal-level. For individuals growing up in high SES households, a high local mortality rate will lead to an increase in high-level construals. For individuals growing up in low SES households, a high local mortality rate will lead to an increase in low-level construals. Overall, results did not support the hypotheses. Childhood SES did not interact with prime condition to influence either categorization or trend predictions. Examining how the prime interacted with another measure of life history strategy, the Mini-K, yielded mixed results. However, there are several ways in which the current study could be altered to reexamine the relationship between life history strategy and construal.

Contributors: White, Andrew (Author) / Cohen, Adam B. (Thesis advisor) / Kenrick, Douglas T. (Committee member) / Kwan, Virginia Sy (Committee member) / Arizona State University (Publisher)

Created: 2011

Is More Always Better? The Relation Between Socioeconomic Status and Human Development

Description

Socioeconomic status (SES) is one of the most well-researched constructs in developmental science, yet important questions remain about how best to model it. That is, are relations with SES always in the same direction or does the direction of association change at different levels of SES? In this dissertation, I conducted a meta-analysis using individual participant data (IPD) to examine two questions: 1) Does a nonmonotonic (quadratic) model of the relations between components of SES (i.e., income, years of education, occupation status/prestige), depressive symptoms, and academic achievement fit better than a monotonic (linear) model? and 2) Is the magnitude of relation moderated by developmental period, gender/sex, or race/ethnicity? I hypothesized that there would be more support for the nonmonotonic model. Moderation analyses were exploratory. I identified nationally representative IPD from the Inter-university Consortium for Political and Social Research (ICPSR). I included 59 datasets, which represent 23 studies (e.g., Add Health) and 1,844,577 participants. Higher income (β = -0.11; β = 0.10), years of education (β = -0.09; β = 0.13), and occupational status (β = -0.04; β = 0.04) and prestige (β = -0.03; β = 0.04) were associated with a linear decrease in depressive symptoms and increase in academic achievement, respectively. Higher income (β = 0.05), years of education (β = 0.02), and occupational status/prestige (β = 0.02) were quadratically associated with a decrease in depressive symptoms followed by a slight increase at higher levels of income and a diminishing association towards higher levels of education and occupational status/prestige. Higher income was also quadratically associated with academic achievement (β = -0.03). I found evidence that these associations varied between developmental periods and racial/ethnic samples, but I did not find evidence of variation between females and males. I integrate these findings with three conclusions: (1) more is not always better and (2) there are unique contexts and resources associated with different levels of SES that (3) operate in a dynamic fashion with other cultural systems (e.g., racism), which affect the integrated actions between the individual and context. I outline several measurement implications and limitations for future research directions.

Contributors: Korous, Kevin M. (Author) / Causadias, José M (Thesis advisor) / Bradley, Robert H (Thesis advisor) / Luthar, Suniya S (Committee member) / Levy, Roy (Committee member) / Arizona State University (Publisher)

Created: 2021

Lossless Data Compression by Representing Data as a Solution to the Diophantine Equations

Description

There has been substantial development in the field of data transmission in the last two decades. One no longer has to wait long for a high-definition video to load. Data compression is one of the most important technologies that helped achieve this seamless data transmission experience. It helps to store or send more data using less memory or fewer network resources. However, it appears that there is a limit on the amount of compression that can be achieved with the existing lossless data compression techniques because they rely on the frequency of characters or sets of characters in the data. The thesis proposes a lossless data compression technique in which the data is compressed by representing it as a set of parameters that can reproduce the original data without any loss when given to the corresponding mathematical equation. The mathematical equation used in the thesis is the sum of the first N terms in a geometric series. Various changes are made to this mathematical equation so that any given data can be compressed and decompressed. According to the proposed technique, the whole data is taken as a single decimal number and replaced with one of the terms of the equation used. All the other terms of the equation are computed and stored as a compressed file. The performance of the developed technique is evaluated in terms of compression ratio, compression time, and decompression time. The evaluation metrics are then compared with those of other existing techniques in the same domain.

Contributors: Grewal, Karandeep Singh (Author) / Gonzalez Sanchez, Javier (Thesis advisor) / Bansal, Ajay (Committee member) / Findler, Michael (Committee member) / Arizona State University (Publisher)

Created: 2021