Matching Items (7)
Description
Data imbalance and data noise often coexist in real-world datasets. Data imbalance degrades a classifier's recognition power on the minority class, while data noise supplies inaccurate information and thus misleads the classifier. Because of these differences, data imbalance and data noise have been treated separately in the data mining field. Yet such an approach ignores their mutual effects and, as a result, may lead to new problems. A desirable solution is to tackle the two issues jointly. Noting the complementary nature of generative and discriminative models, this research proposes a unified model-fusion-based framework to handle imbalanced classification with noisy datasets.

The phase I study focuses on the imbalanced classification problem. A generative classifier, the Gaussian Mixture Model (GMM), is studied; it can learn the distribution of the imbalanced data to improve the discrimination power on imbalanced classes. By fusing this knowledge into cost SVM (cSVM), a CSG method is proposed. Experimental results show the effectiveness of CSG in dealing with imbalanced classification problems.

The phase II study expands the research scope to include noisy datasets in the imbalanced classification problem. A model-fusion-based framework, K Nearest Gaussian (KNG), is proposed. KNG employs a generative modeling method, GMM, to model the training data as Gaussian mixtures and form adjustable confidence regions that are less sensitive to data imbalance and noise. Motivated by the K-nearest neighbor algorithm, the neighboring Gaussians are used to classify the testing instances. Experimental results show that the KNG method greatly outperforms traditional classification methods on imbalanced classification problems with noisy datasets.

The phase III study addresses feature selection and parameter tuning for the KNG algorithm. To further improve the performance of KNG, a Particle Swarm Optimization based method (PSO-KNG) is proposed. PSO-KNG encodes model parameters and data features in the same particle vector and can thus search for the best feature and parameter combination jointly. The experimental results show that PSO can greatly improve the performance of KNG, with better accuracy and much lower computational cost.
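The idea of placing model parameters and feature choices in one particle vector can be illustrated with a minimal sketch. This is not the thesis's implementation: the layout (parameters first, then one score per feature), the 0.5 selection threshold, and the function names `decode_particle` and `pso_step` are all assumptions made for illustration. The update rule is the textbook PSO velocity and position update, with the random factors passed in as scalars so the example is deterministic.

```python
def decode_particle(particle, n_params, threshold=0.5):
    """Split one particle into (model parameters, feature mask).

    Assumed layout (hypothetical): the first n_params entries are
    continuous model parameters; each remaining entry is a score for one
    feature, and the feature is kept when its score exceeds threshold.
    """
    params = list(particle[:n_params])
    feature_mask = [score > threshold for score in particle[n_params:]]
    return params, feature_mask


def pso_step(x, v, pbest, gbest, w, c1, c2, r1, r2):
    """One textbook PSO velocity and position update for a single particle.

    w is the inertia weight, c1 and c2 are the cognitive and social
    coefficients, and r1 and r2 are random factors in [0, 1] supplied by
    the caller (normally drawn fresh at every step).
    """
    new_v = [
        w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
        for xi, vi, pb, gb in zip(x, v, pbest, gbest)
    ]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

Because parameters and feature scores share one vector, a single call to `pso_step` moves both at once, which is what lets the swarm search the two jointly rather than tuning parameters for a fixed feature set.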
Contributors: He, Miao (Author) / Wu, Teresa (Thesis advisor) / Li, Jing (Committee member) / Silva, Alvin (Committee member) / Borror, Connie (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
The current Enterprise Requirements and Acquisition Model (ERAM), a discrete event simulation of the major tasks and decisions within the DoD acquisition system, identifies several what-if intervention strategies to improve program completion time. However, processes that contribute to the program acquisition completion time were not explicitly identified in the simulation study. This research seeks to determine the acquisition processes that contribute significantly to total simulated program time in the acquisition system for all programs reaching Milestone C. Specifically, this research examines the effect of increased scope management, technology maturity, and decreased variation and mean process times in post-Design Readiness Review contractor activities by performing additional simulation analyses. Potential policies are formulated from the results to further improve program acquisition completion time.
Contributors: Worger, Danielle Marie (Author) / Wu, Teresa (Thesis director) / Shunk, Dan (Committee member) / Wirthlin, J. Robert (Committee member) / Industrial, Systems (Contributor) / Barrett, The Honors College (Contributor)
Created: 2013-05
Description
Buildings consume nearly 50% of the total energy in the United States, which drives the need to develop high-fidelity models for building energy systems. Extensive methods and techniques have been developed, studied, and applied to building energy simulation and forecasting, although most of this work has focused on developing a dedicated modeling approach for generic buildings. In this study, an integrated, computationally efficient, and high-fidelity building energy modeling framework is proposed, with a focus on developing a generalized modeling approach for various types of buildings. First, a number of data-driven simulation models are reviewed and assessed on various types of computationally expensive simulation problems. Motivated by the conclusion that no model outperforms the others when averaged over diverse problems, a meta-learning based recommendation system for data-driven simulation modeling is proposed. To test the feasibility of the proposed framework on building energy systems, an extended application of the recommendation system for short-term building energy forecasting is deployed on various buildings. Finally, a Kalman filter-based data fusion technique is incorporated into the building recommendation system for on-line energy forecasting. Data fusion enables model calibration to update the state estimate in real time, which filters out noise and yields a more accurate energy forecast. The framework is composed of two modules: an off-line model recommendation module and an on-line model calibration module. Specifically, the off-line model recommendation module includes 6 widely used data-driven simulation models, which are ranked by the meta-learning recommendation system for off-line energy modeling on a given building scenario. Only a selected set of building physical and operational features is needed to complete the recommendation task. The on-line calibration module addresses system uncertainties by applying data fusion to the off-line model, based on system identification and Kalman filtering methods. The developed data-driven modeling framework is validated on various types of buildings, and the experimental results demonstrate the desired performance on building energy forecasting in terms of accuracy and computational efficiency. The framework could be easily integrated into building energy model predictive control (MPC), demand response (DR) analysis, and real-time operation decision support systems.
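The on-line calibration step described above can be pictured with the standard scalar Kalman correction. The sketch below is a generic illustration under simplifying assumptions, not the thesis's module: it treats the off-line model's forecast as the prior, a single metered reading as the measurement, and both uncertainties as known variances. The function name `kalman_correct` and its argument names are hypothetical.

```python
def kalman_correct(forecast, forecast_var, measurement, measurement_var):
    """Fuse a model forecast with one noisy measurement (scalar Kalman update).

    forecast / forecast_var: prior estimate and its variance (assumed here
    to come from an off-line model).
    measurement / measurement_var: observed value and its noise variance.
    Returns the corrected estimate, its variance, and the Kalman gain.
    """
    # The gain weights the measurement by how uncertain the forecast is.
    gain = forecast_var / (forecast_var + measurement_var)
    estimate = forecast + gain * (measurement - forecast)
    estimate_var = (1.0 - gain) * forecast_var
    return estimate, estimate_var, gain
```

With equal variances the gain is 0.5, so the corrected estimate lands halfway between forecast and measurement; a noisier meter (larger `measurement_var`) pulls the estimate back toward the model forecast.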
Contributors: Cui, Can (Author) / Wu, Teresa (Thesis advisor) / Weir, Jeffery D. (Thesis advisor) / Li, Jing (Committee member) / Fowler, John (Committee member) / Hu, Mengqi (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Distributed renewable energy generators now contribute a significant amount of energy to the grid. Consequently, the reliability adequacy of such generators depends on accurate forecasts of the energy they produce. The power output of solar PV systems depends on the stochastic variation of environmental factors (solar irradiance, ambient temperature, and wind speed) and on random mechanical failures and repairs. Monte Carlo simulation, which is typically used to model such problems, becomes too computationally intensive, leading to simplifying state-space assumptions. Multi-state models for power system reliability offer greater flexibility in describing system state evolution and a more accurate representation of probability. In this study, Universal Generating Functions (UGF) were used to solve such combinatorial problems. Eight grid-connected solar PV systems with a combined capacity of about 5 MW, located in a hot-dry climate (Arizona), were analyzed, and an accuracy of 98% was achieved when validated with real-time data. An analytics framework is provided for grid operators and utilities to forecast the energy produced by distributed energy assets and, in turn, develop strategies for effective Demand Response as the share of renewable distributed energy assets in the grid increases. The second part of this thesis extends the environmental modelling approach to develop an aging test to be run in conjunction with an accelerated test of solar PV modules. Accelerated Lifetime Testing procedures are used in industry to determine the dominant failure modes that a product undergoes in the field, as well as to predict the lifetime of the product. UV is one of the ten stressors that a PV module experiences in the field. UV exposure causes browning of modules, leading to a drop in short-circuit current. This thesis presents an environmental modelling approach for the hot-dry climate and extends it to develop an aging test methodology. This, along with the accelerated tests, would help achieve the goal of correlating field failures with accelerated tests and obtaining an acceleration factor. This knowledge would help predict PV module degradation in the field to within 30% of the actual value and help in estimating the PV module lifetime accurately.
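To make the Universal Generating Function idea concrete, the following sketch combines two multi-state generators. It is a generic illustration, not the thesis's model: each unit's u-function is stored as a dictionary mapping an output level to its probability, the units are assumed to be statistically independent, and their outputs are assumed to add (as for generators feeding the same grid). The names `combine_ugf` and `expected_output` and the example numbers are hypothetical.

```python
def combine_ugf(u1, u2):
    """Compose two u-functions for independent units whose outputs add.

    Each u-function is a dict {output_level: probability}, the dictionary
    form of the polynomial sum of p_i * z**g_i. Composition multiplies
    probabilities and adds output levels, then merges equal levels.
    """
    combined = {}
    for g1, p1 in u1.items():
        for g2, p2 in u2.items():
            level = g1 + g2
            combined[level] = combined.get(level, 0.0) + p1 * p2
    return combined


def expected_output(u):
    """Expected output level of a system described by u-function u."""
    return sum(level * prob for level, prob in u.items())
```

For two identical two-state units, each producing 1 unit of power with probability 0.9 and 0 otherwise, `combine_ugf` yields three system states (0, 1, and 2 units). Merging equal output levels is what keeps the state space compact compared with enumerating every combination of unit states.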
Contributors: Kadloor, Nikhil (Author) / Kuitche, Joseph (Thesis advisor) / Pan, Rong (Thesis advisor) / Wu, Teresa (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Understanding customer preference is crucial for new product planning and marketing decisions. This thesis explores how historical data can be leveraged to understand and predict customer preference. It presents a decision support framework that provides a holistic view of customer preference by following a two-phase procedure. Phase 1 uses cluster analysis to create product profiles, from which customer profiles are derived. Phase 2 then examines each customer profile in depth and investigates the causality behind its preferences using Bayesian networks. The framework is illustrated using the case of Intel Corporation, the world's largest semiconductor manufacturing company.
Contributors: Ram, Sudarshan Venkat (Author) / Kempf, Karl G. (Thesis advisor) / Wu, Teresa (Thesis advisor) / Ju, Feng (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Nonalcoholic Steatohepatitis (NASH) is a severe form of nonalcoholic fatty liver disease that is caused by excessive calorie intake and a sedentary lifestyle in the absence of heavy alcohol consumption. It is widely prevalent in the United States and in many other developed countries, affecting up to 25 percent of the population. Because it is asymptomatic, it usually goes unnoticed and may lead to liver failure if not treated in time. Currently, liver biopsy is the gold standard for diagnosing NASH, but as an invasive procedure it comes with its own complications, along with the inconvenience of taking repeated samples over a period of time. Hence, noninvasive procedures to assess NASH are urgently required. Magnetic Resonance Elastography (MRE) based shear stiffness and loss modulus, along with Magnetic Resonance Imaging based proton density fat fraction, have been successfully combined to predict NASH stages. However, their role in predicting disease progression remains to be investigated. This thesis therefore combines features from serial MRE observations to develop statistical models that predict NASH progression. It uses data from an experiment conducted on male mice to develop progressive and regressive NASH, and trains ordinal models (ordered probit regression and ordinal forest) on labels generated from a logistic regression model. The models are assessed on histological data collected at the end point of the experiment. The models developed provide a framework for using a noninvasive tool to predict NASH disease progression.
Contributors: Deshpande, Eeshan (Author) / Ju, Feng (Thesis advisor) / Wu, Teresa (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)
Created: 2021
Description
The main objective of this research is to develop reliability assessment methodologies to quantify the effect of various environmental factors on photovoltaic (PV) module performance degradation. The manufacturers of these PV modules typically provide a warranty of about 25 years for 20% power degradation from the initial specified power rating. Accelerated Life Testing (ALT) plays an important role in quantifying the reliability of such PV modules. However, several obstacles need to be tackled to conduct such experiments, since not enough historical field data has been available. Even if some time-series performance data of maximum output power (Pmax) is available, it may not be useful for developing failure/degradation mode-specific accelerated tests. This is because, to study a specific failure mode, it is essential to use a failure mode-specific performance variable (such as short-circuit current, open-circuit voltage, or fill factor) that is directly affected by the failure mode, instead of overall power, which would be affected by one or more of the performance variables. To address several of these issues, this research is divided into three phases. The first phase deals with developing models to study climate-specific failure modes using failure mode-specific parameters instead of power degradation. The limited field data collected after a long time (say, 18-21 years) is used to model the degradation rate, and the developed model is then calibrated to account for several unknown environmental effects using the available qualification testing data. The second phase discusses the cumulative damage modeling method used to quantify the effects of various environmental variables on the overall power production of the PV module. This cumulative degradation modeling approach is mainly used to model the power degradation path and quantify the effects of multiple high-frequency environmental inputs (such as temperature and humidity measured every minute or hour) with very sparse response data (power measurements taken quarterly or annually). The third phase deals with an optimal planning and inference framework using the Iterative-Accelerated Life Testing (I-ALT) methodology. All the proposed methodologies are demonstrated and validated using appropriate case studies.
Contributors: Bala Subramaniyan, Arun (Author) / Pan, Rong (Thesis advisor) / Tamizhmani, Govindasamy (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Wu, Teresa (Committee member) / Kuitche, Joseph (Committee member) / Arizona State University (Publisher)
Created: 2020