Positive Unlabeled Learning - Optimization and Evaluation

Description

In many real-world machine learning classification applications, well-labeled training data can be difficult, expensive, or even impossible to obtain. In such situations, it is sometimes possible to label a small subset of data as belonging to the class of interest, though it is impractical to manually label all data not of interest. The result is a small set of positively labeled data and a large set of unknown, unlabeled data. This is known as the Positive and Unlabeled learning (PU learning) problem, a type of semi-supervised learning. In this dissertation, the PU learning problem is rigorously defined, several common assumptions are described, and a literature review of the field is provided. A new family of effective PU learning algorithms, the MLR (Modified Logistic Regression) family, is described. Theoretical and experimental justification for these algorithms is provided, demonstrating their success and flexibility. Extensive experimentation and empirical evidence are provided comparing several new and existing PU learning evaluation estimation metrics in a wide variety of scenarios. The surprisingly clear advantage of a simple recall estimate as the best estimate of overall PU classifier performance is described. Finally, an application of PU learning to solar fault detection, an area not previously explored in the field, demonstrates the advantage and potential of PU learning in new application domains.
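The abstract does not say how the recall estimate is computed. As a rough illustration only, a common way to estimate recall in the PU setting is to measure the fraction of held-out labeled positives that the classifier predicts as positive. This sketch assumes the labeled positives are a random sample of all positives; the function name `estimate_recall` and its arguments are hypothetical and not taken from the dissertation.

```python
def estimate_recall(predictions, labeled_positive):
    """Estimate recall from known positives only.

    predictions: 0/1 classifier outputs on held-out samples.
    labeled_positive: 1 if the sample is a known (labeled) positive, else 0.
    Unlabeled samples are ignored because their true class is unknown.
    Assumption: labeled positives are representative of all positives.
    """
    preds_on_labeled = [p for p, s in zip(predictions, labeled_positive) if s]
    if not preds_on_labeled:
        raise ValueError("at least one labeled positive is required")
    return sum(preds_on_labeled) / len(preds_on_labeled)


# Four of the six samples are labeled positive; three of those four are predicted positive.
recall_hat = estimate_recall([1, 0, 1, 1, 0, 1], [1, 1, 1, 0, 0, 1])
print(recall_hat)
```

Because only labeled positives enter the calculation, this quantity needs no negative labels, which is what makes it usable when the rest of the data is unlabeled.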

Contributors: Jaskie, Kristen P (Author) / Spanias, Andreas (Thesis advisor) / Blain-Christen, Jennifer (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Thiagarajan, Jayaraman (Committee member) / Arizona State University (Publisher)

Created: 2021

Generating Point Cloud Failure Surface of Polymeric Unidirectional Composites Using Virtual Tests

Description

Composite materials have gained interest in the aerospace, mechanical and civil engineering industries due to their desirable properties - high specific strength and modulus, and superior resistance to fatigue. Design engineers greatly benefit from a reliable predictive tool that can calculate the deformations, strains, and stresses of composites under uniaxial and multiaxial states of loading, including damage and failure predictions. Obtaining this information from laboratory testing is costly, time consuming, and sometimes impractical. Numerical modeling of composite materials, on the other hand, provides a tool (virtual testing) that can supplement or replace experiments to obtain data that either cannot be readily obtained experimentally or is not possible with the currently available experimental setup. In this study, a unidirectional composite (Toray T800-F3900) is modeled at the constituent level using repeated unit cells (RUC) to obtain the homogenized response from the unloaded state up to failure (defined as complete loss of load carrying capacity). The RUC-based model is first calibrated and validated against the principal material direction laboratory tests involving unidirectional loading states. Subsequently, the models are subjected to multi-directional states of loading to generate point cloud failure data under in-plane and out-of-plane biaxial loading conditions. The failure surfaces thus generated are plotted and compared against analytical failure theories. Results indicate that the developed process and framework can be used to generate a reliable failure prediction procedure that can possibly be used for a variety of composite systems.

Contributors: Katusele, Daniel Mutahwa (Author) / Rajan, Subramaniam (Thesis advisor) / Mobasher, Barzin (Committee member) / Neithalath, Narayanan (Committee member) / Arizona State University (Publisher)

Created: 2021

Fault Detection and Classification in Photovoltaic Arrays using Machine Learning

Description

Operational efficiency of solar energy farms requires detailed analytics and information on each panel regarding voltage, current, temperature, and irradiance. Monitoring utility-scale solar arrays has been shown to minimize the cost of maintenance and help optimize the performance of photovoltaic (PV) arrays under various conditions. This dissertation describes a project that focuses on the development of machine learning and neural network algorithms. It also describes an 18 kW solar array testbed for the purpose of PV monitoring and control. The use of the 18 kW Sensor Signal and Information Processing (SenSIP) PV testbed, which consists of 104 modules fitted with smart monitoring devices (SMDs), is described in detail. Each of the SMDs embeds a wireless transceiver and relays that enable continuous monitoring, fault detection, and real-time connection topology changes. Data is obtained in real time using the SenSIP PV testbed. Machine learning and neural network algorithms for PV fault classification are studied in depth. More specifically, the development of a series of customized neural networks for detection and classification of solar array faults that include soiling, shading, degradation, short circuits and standard test conditions is considered. Fault detection and classification methods are evaluated using metrics such as accuracy, confusion matrices, and the Risk Priority Number (RPN). The classification performance of customized neural networks with dropout regularizers is examined and assessed in detail. Neural network pruning strategies are developed and evaluated, and the trade-off between fault classification model accuracy and algorithm complexity is illustrated. This study includes data from the National Renewable Energy Laboratory (NREL) database and also real-time data collected from the SenSIP testbed at MTW under various loading and shading conditions. The overall approach for detection and classification promises to elevate the performance and robustness of PV arrays.

Contributors: Rao, Sunil (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Srinivasan, Devarajan (Committee member) / Arizona State University (Publisher)

Created: 2021

Graph Based Semi-Supervised Classification and Manifold Learning

Description

Due to their effectiveness in capturing similarities between different entities, graphical models are widely used to represent datasets that reside on irregular and complex manifolds. Graph signal processing offers support to handle such complex datasets. By extending the digital signal processing conceptual frame from the time and frequency domains to the graph domain, operators such as graph shift, graph filter and graph Fourier transform are defined. In this dissertation, two novel graph filter design methods are proposed. First, a graph filter with multiple shift matrices is applied to semi-supervised classification, which can handle features with uneven qualities through an embedded feature importance evaluation process. Three optimization solutions are provided: an alternating minimization method that is simple to implement, a convex relaxation method that provides a theoretical performance benchmark, and a genetic algorithm, which is computationally efficient and better at handling overfitting. Second, a graph filter with a splitting-and-merging scheme is proposed, which splits the graph into multiple subgraphs. The corresponding subgraph filters are trained in parallel, and the final graph filter is obtained by merging all the subgraph filters. Due to the splitting process, the redundant edges in the original graph are dropped, which can save computational cost in semi-supervised classification. At the same time, this scheme also enables the filter to represent unevenly sampled data in manifold learning. To evaluate the performance of the proposed graph filter design approaches, simulation experiments with synthetic and real datasets are conducted. The Monte Carlo cross validation method is employed to demonstrate the need for the proposed graph filter design approaches in various application scenarios. Criteria such as accuracy, Gini score, F1-score and learning curves are provided to analyze the performance of the proposed methods and their competitors.

Contributors: Fan, Jie (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Spanias, Andreas (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)

Created: 2022

Kinetics of alkaline activation of slag and fly ash-slag systems

Description

Alkali-activated aluminosilicates, commonly known as "geopolymers", are being increasingly studied as a potential replacement for Portland cement. These binders use an alkaline activator, typically alkali silicates, alkali hydroxides or a combination of both, along with a silica-and-alumina rich material, such as fly ash or slag, to form a final product with properties comparable to or better than those of ordinary Portland cement. The kinetics of alkali activation is highly dependent on the chemical composition of the binder material and the activator concentration. This thesis discusses the influence of binder composition (slag, fly ash or both) and of different levels of alkalinity, expressed using the Na2O-to-binder ratio (n) and the activator SiO2-to-Na2O ratio (Ms), on the early age behavior of fly ash-slag blended systems activated with sodium silicate solution (waterglass). Optimal binder composition and n values are selected based on the setting times. Higher activator alkalinity (n value) is required when the amount of slag in the fly ash-slag blended mixtures is reduced. Isothermal calorimetry is performed to evaluate the early age hydration process and to understand the reaction kinetics of the alkali activated systems. The differences in the calorimetric signatures between waterglass activated slag and fly ash-slag blends facilitate an understanding of the impact of the binder composition on the reaction rates. Kinetic modeling is used to quantify the differences in reaction kinetics using the Exponential as well as the Knudsen method. The influence of temperature on the reaction kinetics of activated slag and fly ash-slag blends, based on the hydration parameters, is discussed. Very high compressive strengths (more than 70 MPa) can be obtained at both early and later ages with waterglass activated slag mortars. Compressive strength decreases with increasing fly ash content. Qualitative evidence of leaching is presented through the electrical conductivity changes in the saturating solution. The impact of leaching and the strength loss are found to be generally higher for the mixtures made using a higher activator Ms and a higher n value. Attenuated Total Reflectance-Fourier Transform Infrared Spectroscopy (ATR-FTIR) is used to obtain information about the reaction products.

Contributors: Chithiraputhiran, Sundara Raman (Author) / Neithalath, Narayanan (Thesis advisor) / Rajan, Subramaniyam D (Committee member) / Mobasher, Barzin (Committee member) / Arizona State University (Publisher)

Created: 2012

Bayesian Inference and Information Learning for Switching Nonlinear Gene Regulatory Networks

Description

This dissertation centers on the development of Bayesian methods for learning different types of variation in switching nonlinear gene regulatory networks (GRNs). A new nonlinear and dynamic multivariate GRN model is introduced to account for different sources of variability in GRNs. The new model is aimed at more precisely capturing the complexity of GRN interactions through the introduction of time-varying kinetic order parameters, while allowing for variability in multiple model parameters. This model is used as the drift function in the development of several stochastic GRN models based on Langevin dynamics. Six models are introduced which capture intrinsic and extrinsic noise in GRNs, thereby providing a full characterization of a stochastic regulatory system. A Bayesian hierarchical approach is developed for learning the Langevin model which best describes the noise dynamics at each time step. The trajectory of the state, which consists of the gene expression values, as well as the indicator corresponding to the correct noise model are estimated via sequential Monte Carlo (SMC) with a high degree of accuracy. To address the problem of time-varying regulatory interactions, a Bayesian hierarchical model is introduced for learning variation in switching GRN architectures with unknown measurement noise covariance. The trajectory of the state and the indicator corresponding to the network configuration at each time point are estimated using SMC. This work is extended to a fully Bayesian hierarchical model to account for uncertainty in the process noise covariance associated with each network architecture. An SMC algorithm with local Gibbs sampling is developed to estimate the trajectory of the state and the indicator corresponding to the network configuration at each time point with a high degree of accuracy. The results demonstrate the efficacy of Bayesian methods for learning information in switching nonlinear GRNs.

Contributors: Vélez-Cruz, Nayely (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Moraffah, Bahman (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2023

Covid-19 Hotspot Estimation Using Consensus Methods, SEIR Models and ML Algorithms

Description

The primary objective of this thesis is to identify locations or regions where COVID-19 transmission is more prevalent, termed “hotspots,” assess the likelihood of contracting the virus after visiting crowded areas or potential hotspots, and make predictions on confirmed COVID-19 cases and recoveries. A consensus algorithm is used to identify such hotspots; the SEIR epidemiological model tracks COVID-19 cases, allowing for a better understanding of the disease dynamics and enabling informed decision-making in public health strategies. Consensus-based distributed methodologies have been developed to estimate the magnitude, density, and locations of COVID-19 hotspots to provide well-informed alerts based on continuous data risk assessments. Assuming each agent owns a mobile device, transmission hotspots are identified using information from user devices with Bluetooth and WiFi. In a consensus-based distributed clustering algorithm, users are divided into smaller groups, and the number of users in each group is then estimated. This process allows for the determination of the population of an outdoor site and the distances between individuals. The proposed algorithm is applicable not only in outdoor environments but also in indoor settings. To adapt to indoor environments, signal attenuation caused by walls and other barriers is taken into account, and a wall detection algorithm is employed for this purpose. The clustering mechanism is designed to dynamically choose the appropriate clustering technique based on data-dependent patterns, ensuring that every node undergoes proper clustering. After networks have been established and clustered, the output of the consensus algorithm is fed as one of many inputs into the SEIR model. SEIR, representing Susceptible, Exposed, Infectious, and Removed, forms the basis of a model designed to assess the probability of infection at a Point of Interest (POI). The SEIR model utilizes calculated parameters such as β (contact), σ (latency), γ (recovery), and ω (loss of immunity), along with current COVID-19 case data, to precisely predict the infection spread in a specific area. The SEIR model is implemented with diverse methodologies for transitioning populations between compartments. Hence, the model identifies optimal parameter values under different conditions and scenarios and forecasts the number of infected and recovered cases for the upcoming days.
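To make the role of the four parameters concrete, the following is a minimal sketch of the textbook SEIR compartment model with loss of immunity, integrated with a forward Euler step. It is not the thesis's implementation: the abstract does not give the equations, so the standard form is assumed here, and the function names, initial populations, and parameter values are purely illustrative.

```python
def seirs_step(state, beta, sigma, gamma, omega, dt):
    """Advance (S, E, I, R) by one forward Euler step of length dt (in days).

    beta: contact rate, sigma: latency rate (E -> I),
    gamma: recovery rate (I -> R), omega: loss-of-immunity rate (R -> S).
    """
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n          # flow S -> E
    ds = -new_exposed + omega * r
    de = new_exposed - sigma * e
    di = sigma * e - gamma * i
    dr = gamma * i - omega * r
    return (s + dt * ds, e + dt * de, i + dt * di, r + dt * dr)


def simulate(state, days, steps_per_day=10, **params):
    """Return the list of states from day 0 through the final day."""
    dt = 1.0 / steps_per_day
    trajectory = [state]
    for _ in range(days * steps_per_day):
        state = seirs_step(state, dt=dt, **params)
        trajectory.append(state)
    return trajectory


# Illustrative values only: 1000 people, 10 initially infectious.
trajectory = simulate((990.0, 0.0, 10.0, 0.0), days=30,
                      beta=0.4, sigma=0.2, gamma=0.1, omega=0.01)
s, e, i, r = trajectory[-1]
print(f"day 30: S={s:.1f} E={e:.1f} I={i:.1f} R={r:.1f}")
```

The flows between compartments cancel in pairs, so the total population stays constant; fitting β, σ, γ, and ω to case data, as the abstract describes, would sit on top of a simulator like this one.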

Contributors: Patel, Bhavikkumar (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Thesis advisor) / Banavar, Mahesh (Committee member) / Arizona State University (Publisher)

Created: 2024

ASU Electronic Theses and Dissertations
