Positive Unlabeled Learning - Optimization and Evaluation

Description

In many real-world machine learning classification applications, well-labeled training data can be difficult, expensive, or even impossible to obtain. In such situations, it is sometimes possible to label a small subset of the data as belonging to the class of interest, even though it is impractical to manually label all of the data that is not of interest. The result is a small set of positive labeled data and a large set of unlabeled data whose classes are unknown. This is known as the Positive and Unlabeled learning (PU learning) problem, a type of semi-supervised learning. In this dissertation, the PU learning problem is rigorously defined, several common assumptions are described, and a literature review of the field is provided. A new family of effective PU learning algorithms, the MLR (Modified Logistic Regression) family, is described. Theoretical and experimental justification for these algorithms is provided, demonstrating their success and flexibility. Extensive experiments compare several new and existing metrics for estimating PU classifier performance in a wide variety of scenarios. A simple recall estimate is shown to have a surprisingly clear advantage as the best estimate of overall PU classifier performance. Finally, an application of PU learning to solar fault detection, an area not previously explored in the field, demonstrates the advantage and potential of PU learning in new application domains.
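As an illustration of the kind of recall estimate mentioned above, the following is a minimal sketch and not the dissertation's own metric or code. It assumes that the labeled positives are a random sample of all positives (the "selected completely at random" assumption common in PU learning), so that the fraction of held-out labeled positives a classifier predicts as positive approximates its recall. The function name `pu_recall_estimate` and the toy data are hypothetical.

```python
def pu_recall_estimate(predictions, labeled_positive):
    """Estimate recall from PU data.

    predictions:      0/1 classifier outputs for each held-out example
    labeled_positive: 1 if the example carries a positive label, 0 if unlabeled

    Assumption (not from the source): labeled positives are a random
    sample of all positives, so recall on them approximates true recall.
    """
    # Keep only the predictions made on examples known to be positive.
    hits = [p for p, s in zip(predictions, labeled_positive) if s == 1]
    if not hits:
        raise ValueError("no labeled positives in the evaluation set")
    return sum(hits) / len(hits)


# Toy evaluation set: four labeled positives followed by four unlabeled examples.
predictions = [1, 0, 1, 1, 0, 1, 0, 0]
labeled_positive = [1, 1, 1, 1, 0, 0, 0, 0]
estimate = pu_recall_estimate(predictions, labeled_positive)
print(estimate)  # 3 of the 4 labeled positives are predicted positive
```

Only the labeled positives enter the estimate; predictions on unlabeled examples are ignored because their true classes are unknown.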

Contributors: Jaskie, Kristen P (Author) / Spanias, Andreas (Thesis advisor) / Blain-Christen, Jennifer (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Thiagarajan, Jayaraman (Committee member) / Arizona State University (Publisher)

Created: 2021

Fault Detection and Classification in Photovoltaic Arrays using Machine Learning

Description

Operational efficiency of solar energy farms requires detailed analytics and information on each panel regarding voltage, current, temperature, and irradiance. Monitoring utility-scale solar arrays has been shown to minimize the cost of maintenance and help optimize the performance of photovoltaic (PV) arrays under various conditions. This dissertation describes a project that focuses on the development of machine learning and neural network algorithms. It also describes an 18kW solar array testbed for PV monitoring and control. The use of the 18kW Sensor Signal and Information Processing (SenSIP) PV testbed, which consists of 104 modules fitted with smart monitoring devices (SMDs), is described in detail. Each SMD includes a wireless transceiver and relays that enable continuous monitoring, fault detection, and real-time connection topology changes. Data is obtained in real time using the SenSIP PV testbed. Machine learning and neural network algorithms for PV fault classification are studied in depth. More specifically, the development of a series of customized neural networks is considered for detecting and classifying solar array conditions that include soiling, shading, degradation, short circuits, and standard test conditions. Fault detection and classification methods are evaluated using metrics such as accuracy, confusion matrices, and the Risk Priority Number (RPN). The classification performance of customized neural networks with dropout regularizers is examined and assessed in detail. Neural network pruning strategies are developed and evaluated, and the trade-off between fault classification model accuracy and algorithm complexity is illustrated. This study includes data from the National Renewable Energy Laboratory (NREL) database and real-time data collected from the SenSIP testbed at MTW under various loading and shading conditions. The overall approach for detection and classification promises to elevate the performance and robustness of PV arrays.

Contributors: Rao, Sunil (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Srinivasan, Devarajan (Committee member) / Arizona State University (Publisher)

Created: 2021

Graph Based Semi-Supervised Classification and Manifold Learning

Description

Due to their effectiveness in capturing similarities between different entities, graphical models are widely used to represent datasets that reside on irregular and complex manifolds. Graph signal processing offers support for handling such complex datasets. By extending the conceptual framework of digital signal processing from the time and frequency domains to the graph domain, operators such as the graph shift, graph filter, and graph Fourier transform are defined. In this dissertation, two novel graph filter design methods are proposed. First, a graph filter with multiple shift matrices is applied to semi-supervised classification; it can handle features of uneven quality through an embedded feature importance evaluation process. Three optimization solutions are provided: an alternating minimization method that is simple to implement, a convex relaxation method that provides a theoretical performance benchmark, and a genetic algorithm that is computationally efficient and better at handling overfitting. Second, a graph filter with a splitting-and-merging scheme is proposed, which splits the graph into multiple subgraphs. The corresponding subgraph filters are trained in parallel, and the final graph filter is obtained by merging all of the subgraph filters. The splitting process drops redundant edges in the original graph, which can reduce computational cost in semi-supervised classification. At the same time, this scheme enables the filter to represent unevenly sampled data in manifold learning. To evaluate the performance of the proposed graph filter design approaches, simulation experiments with synthetic and real datasets are conducted. The Monte Carlo cross-validation method is employed to demonstrate the need for the proposed graph filter design approaches in various application scenarios. Criteria such as accuracy, Gini score, F1-score, and learning curves are provided to analyze the performance of the proposed methods and their competitors.
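For readers unfamiliar with the graph shift and graph filter operators named in this abstract, the sketch below shows the standard textbook form of a polynomial graph filter, y = Σ_k h_k S^k x, using a single shift matrix S. It is not the multi-shift or splitting-and-merging design proposed in the dissertation. The function names, the choice of the adjacency matrix as the shift operator, and the example graph are all assumptions made for illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def graph_filter(shift, coeffs, signal):
    """Apply y = sum_k coeffs[k] * shift^k * signal.

    shift:  graph shift matrix (here assumed to be the adjacency matrix)
    coeffs: filter taps h_0, h_1, ...
    signal: one value per graph node
    """
    output = [0.0] * len(signal)
    shifted = list(signal)  # shift^0 * signal
    for h in coeffs:
        output = [y + h * s for y, s in zip(output, shifted)]
        shifted = mat_vec(shift, shifted)  # advance to the next power of the shift
    return output


# Three-node path graph 0 - 1 - 2, with an impulse signal on node 0.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
signal = [1.0, 0.0, 0.0]
filtered = graph_filter(adjacency, [0.5, 0.5], signal)
print(filtered)  # half of the impulse stays on node 0, half moves to node 1
```

Each additional filter tap lets the signal spread one more hop along the graph, which is the graph-domain analogue of a delay in a time-domain FIR filter.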

Contributors: Fan, Jie (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Spanias, Andreas (Thesis advisor) / Tsakalis, Konstantinos (Committee member) / Dasarathy, Gautam (Committee member) / Arizona State University (Publisher)

Created: 2022

Bayesian Inference and Information Learning for Switching Nonlinear Gene Regulatory Networks

Description

This dissertation centers on the development of Bayesian methods for learning different types of variation in switching nonlinear gene regulatory networks (GRNs). A new nonlinear and dynamic multivariate GRN model is introduced to account for different sources of variability in GRNs. The new model is aimed at more precisely capturing the complexity of GRN interactions through the introduction of time-varying kinetic order parameters, while allowing for variability in multiple model parameters. This model is used as the drift function in the development of several stochastic GRN models based on Langevin dynamics. Six models are introduced which capture intrinsic and extrinsic noise in GRNs, thereby providing a full characterization of a stochastic regulatory system. A Bayesian hierarchical approach is developed for learning the Langevin model which best describes the noise dynamics at each time step. The trajectory of the state, which consists of the gene expression values, and the indicator corresponding to the correct noise model are estimated via sequential Monte Carlo (SMC) with a high degree of accuracy. To address the problem of time-varying regulatory interactions, a Bayesian hierarchical model is introduced for learning variation in switching GRN architectures with unknown measurement noise covariance. The trajectory of the state and the indicator corresponding to the network configuration at each time point are estimated using SMC. This work is extended to a fully Bayesian hierarchical model to account for uncertainty in the process noise covariance associated with each network architecture. An SMC algorithm with local Gibbs sampling is developed to estimate the trajectory of the state and the indicator corresponding to the network configuration at each time point with a high degree of accuracy. The results demonstrate the efficacy of Bayesian methods for learning information in switching nonlinear GRNs.

Contributors: Vélez-Cruz, Nayely (Author) / Papandreou-Suppappola, Antonia (Thesis advisor) / Moraffah, Bahman (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Berisha, Visar (Committee member) / Arizona State University (Publisher)

Created: 2023