Matching Items (61)
Description
Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provides a high-dimensional data vector that challenges the learning of the relevant patterns. This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning. This provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a classification framework that can handle high-dimensional feature spaces, multiple classes and interactions between features. The proposed representations are useful for classification and interpretation of TS data of varying complexity. The first contribution handles the problem of time warping with a feature-based approach. An interval selection and local feature extraction strategy is proposed to learn a bag-of-features representation. This is distinctly different from common similarity-based time warping, and it allows additional features (such as pattern location) to be easily integrated into the models. The learners can account for temporal information through the recursive partitioning method. The second contribution focuses on the comprehensibility of the models. A new representation is integrated with local feature importance measures from tree-based ensembles to diagnose and interpret the time intervals that are important to the model. Multivariate time series (MTS) are especially challenging because the input consists of a collection of TS, and both features within a TS and interactions between TS can be important to models.
Another contribution uses a different representation to produce computationally efficient strategies that learn a symbolic representation for MTS. Relationships between the multiple TS, as well as nominal and missing values, are handled with tree-based learners. Applications such as speech recognition, medical diagnosis and gesture recognition are used to illustrate the methods. Experimental results show that the TS representations and methods provide better results than competitive methods on a comprehensive collection of benchmark datasets. Moreover, the proposed approaches naturally provide solutions to similarity analysis, predictive pattern discovery and feature selection.
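The interval-based, bag-of-features idea described in this abstract can be illustrated with a minimal sketch: random intervals are selected from a series and summarized by simple local features (mean, slope, standard deviation) plus the interval's location, producing a feature matrix a tree-based ensemble could consume. The helper name `interval_features` and this specific feature set are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

def interval_features(series, n_intervals=8, seed=None):
    """Extract a simple bag-of-features representation from one time series.

    For each randomly chosen interval, compute (mean, slope, std) plus the
    interval's relative start position, so pattern location is available to
    the learner. Hypothetical sketch, not the dissertation's exact method.
    """
    rng = np.random.default_rng(seed)
    n = len(series)
    rows = []
    for _ in range(n_intervals):
        start = rng.integers(0, n - 4)                 # interval start
        length = rng.integers(4, n - start + 1)        # interval length >= 4
        seg = series[start:start + length]
        t = np.arange(len(seg))
        slope = np.polyfit(t, seg, 1)[0]               # local linear trend
        rows.append([seg.mean(), slope, seg.std(), start / n])
    return np.array(rows)                              # shape: (n_intervals, 4)
```

A tree ensemble (for example, a random forest) would then be trained on such features aggregated across many labeled series.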
ContributorsBaydogan, Mustafa Gokce (Author) / Runger, George C. (Thesis advisor) / Atkinson, Robert (Committee member) / Gel, Esma (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created2012
Description
This dissertation presents methods for the evaluation of ocular surface protection during natural blink function. The evaluation of ocular surface protection is especially important in the diagnosis of dry eye and the evaluation of dry eye severity in clinical trials. Dry eye is a highly prevalent disease affecting vast numbers (between 11% and 22%) of an aging population. There is only one approved therapy, with limited efficacy, which results in a huge unmet need. The reason so few drugs have reached approval is the lack of a recognized therapeutic pathway with reproducible endpoints. While the interplay between blink function and ocular surface protection has long been recognized, all currently used evaluation techniques have addressed blink function in isolation from tear film stability, the gold standard of which is Tear Film Break-Up Time (TFBUT). In the first part of this research, a manual technique for calculating ocular surface protection during natural blink function through the use of video analysis is developed and evaluated for its ability to differentiate between dry eye and normal subjects; the results are compared with those of TFBUT. In the second part of this research, the technique is improved in precision and automated through the use of video analysis algorithms. This software, called the OPI 2.0 System, is evaluated for accuracy and precision, and comparisons are made between the OPI 2.0 System and other currently recognized dry eye diagnostic techniques (e.g., TFBUT). In the third part of this research, the OPI 2.0 System is deployed for use in the evaluation of subjects before, immediately after, and 30 minutes after exposure to a controlled adverse environment (CAE); once again, the results are compared and contrasted against commonly used dry eye endpoints. The results demonstrate that the evaluation of ocular surface protection using the OPI 2.0 System offers superior accuracy to the current standard, TFBUT.
ContributorsAbelson, Richard (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Committee member) / Shunk, Dan (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created2012
Description
Yield is a key process performance characteristic in the capital-intensive semiconductor fabrication process. In an industry where machines cost millions of dollars and cycle times span several months, predicting and optimizing yield are critical to process improvement, customer satisfaction, and financial success. Semiconductor yield modeling is essential to identifying processing issues, improving quality, and meeting customer demand in the industry. However, the complicated fabrication process, the massive amount of data collected, and the number of models available make yield modeling a complex and challenging task. This work presents modeling strategies to forecast yield using generalized linear models (GLMs) based on defect metrology data. The research is divided into three main parts. First, the data integration and aggregation necessary for model building are described, and GLMs are constructed for yield forecasting. This technique yields results at both the die and the wafer levels, outperforms existing models found in the literature based on prediction errors, and identifies significant factors that can drive process improvement. This method also allows the nested structure of the process to be considered in the model, improving predictive capabilities and violating fewer assumptions. To account for the random sampling typically used in fabrication, the work is extended by using generalized linear mixed models (GLMMs) and a larger dataset to show the differences between batch-specific and population-averaged models in this application and how they compare to GLMs. These results show some additional improvements in forecasting abilities under certain conditions and highlight the differences between the significant effects identified by the GLM and GLMM models. The effects of link functions and sample size are also examined at the die and wafer levels.
The third part of this research describes a methodology for integrating classification and regression trees (CART) with GLMs. This technique uses the terminal nodes identified in the classification tree to add predictors to a GLM. This method enables the model to consider important interaction terms in a simpler way than with the GLM alone, and provides valuable insight into the fabrication process through the combination of the tree structure and the statistical analysis of the GLM.
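The CART-plus-GLM integration described above can be sketched as follows: a shallow classification tree is fit, each observation's terminal node is encoded as an indicator variable, and those indicators are appended to the predictors of a logistic GLM so that tree-discovered interactions enter the linear model in a simple way. Function names and settings here are hypothetical, and scikit-learn stands in for a generic CART/GLM toolkit.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

def cart_augmented_glm(X, y, max_leaf_nodes=4):
    """Fit a logistic GLM augmented with CART terminal-node indicators.

    A shallow tree captures interactions between predictors; indicator
    variables for its terminal nodes are appended to the original
    predictors before fitting the GLM. Illustrative sketch of the general
    idea, not the dissertation's exact procedure.
    """
    tree = DecisionTreeClassifier(max_leaf_nodes=max_leaf_nodes,
                                  random_state=0).fit(X, y)
    leaves = tree.apply(X)                       # terminal node id per row
    node_ids = np.unique(leaves)
    # One-hot encode leaf membership, dropping one level as the baseline
    dummies = (leaves[:, None] == node_ids[None, :]).astype(float)[:, 1:]
    X_aug = np.hstack([X, dummies])
    glm = LogisticRegression(max_iter=1000).fit(X_aug, y)
    return glm, X_aug
```

The leaf indicators let the GLM fit a separate offset per tree-defined region, which is how the interaction structure reaches the linear model.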
ContributorsKrueger, Dana Cheree (Author) / Montgomery, Douglas C. (Thesis advisor) / Fowler, John (Committee member) / Pan, Rong (Committee member) / Pfund, Michele (Committee member) / Arizona State University (Publisher)
Created2011
Description
Longitudinal data involving multiple subjects are common in medical and social science areas. I consider generalized linear mixed models (GLMMs) applied to such longitudinal data, and the problem of searching for optimal designs under such models. In this case, based on optimal design theory, the optimality criteria depend on the estimated parameters, which leads to local optimality. Moreover, the information matrix under a GLMM does not have a closed-form expression. My dissertation includes three topics related to this design problem. The first part is searching for locally optimal designs under GLMMs with longitudinal data. I apply the penalized quasi-likelihood (PQL) method to approximate the information matrix and compare several approximations to show the superiority of PQL over the alternatives. Under different local parameters and design restrictions, locally D- and A-optimal designs are constructed based on the approximation. An interesting finding is that locally optimal designs sometimes apply different designs to different subjects. Finally, the robustness of these locally optimal designs is discussed. In the second part, an unknown observational covariate is added to the previous model. With an unknown observational variable in the experiment, expected optimality criteria are considered. Under different assumptions on the unknown variable and parameter settings, locally optimal designs are constructed and discussed. In the last part, Bayesian optimal designs are considered under logistic mixed models. Considering different priors on the local parameters, Bayesian optimal designs are generated. Generating Bayesian designs under such a model is usually computationally expensive; in this dissertation, the running time is reduced to an acceptable level while maintaining accurate results. I also discuss the robustness of these Bayesian optimal designs, which is the motivation for applying such an approach.
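The local-optimality idea can be illustrated with a minimal sketch for a plain logistic model (no random effects, so the information matrix has a closed form; under a GLMM one would substitute a PQL-style approximation as the abstract describes). The helper names `logistic_info_matrix` and `d_criterion` are assumptions for this sketch.

```python
import numpy as np

def logistic_info_matrix(design, beta):
    """Fisher information for a logistic model at the given design points.

    design: (n, p) model-matrix rows; beta: assumed local parameter values
    (a locally optimal design depends on them). M = X' W X with w = p(1-p).
    """
    eta = design @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    w = p * (1.0 - p)
    return (design * w[:, None]).T @ design

def d_criterion(design, beta):
    """log-determinant of the information matrix: larger is better (D-optimality)."""
    sign, logdet = np.linalg.slogdet(logistic_info_matrix(design, beta))
    return logdet if sign > 0 else -np.inf
```

Comparing `d_criterion` across candidate designs (for a fixed local `beta`) is the basic step a locally D-optimal search repeats.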
ContributorsShi, Yao (Author) / Stufken, John (Thesis advisor) / Kao, Ming-Hung (Thesis advisor) / Lan, Shiwei (Committee member) / Pan, Rong (Committee member) / Reiser, Mark (Committee member) / Arizona State University (Publisher)
Created2022
Description
Reliability growth is not a new topic in either engineering or statistics and has been a major focus for the past few decades. The increasing complexity of high-tech systems and interconnected components implies that reliability problems will continue to exist and may require more complex solutions. The most heavily used experimental designs in assessing and predicting a system's reliability are the "classical designs", such as full factorial designs, fractional factorial designs, and Latin square designs. They are so heavily used because they are optimal in their own right and have served superbly well in providing efficient insight into the underlying structure of industrial processes. However, cases do arise when the classical designs do not cover a particular practical situation. Repairable systems are such a case, in that they usually have limitations on the maximum number of runs or too many varying levels for factors. This research explores the D-optimal design criterion as it applies to the Poisson regression model on repairable systems, with a number of independent variables and under varying assumptions, including the total time tested at a specific design point with fixed parameters, the use of a Bayesian approach with unknown parameters, and how the design region affects the optimal design. In applying experimental design to these complex repairable systems, one may discover interactions between stressors and obtain better failure data. Our novel approach of accounting for time and the design space in the early stages of testing repairable systems should, in theory, improve the reliability, maintainability, and availability of the final engineering design.
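The Poisson-regression setting above, where failure counts depend on stressor levels and on the total time tested at each design point, can be sketched by fitting a log-link Poisson GLM with the test time as an exposure offset, via iteratively reweighted least squares. This is a generic textbook fit under assumed names, not the dissertation's procedure.

```python
import numpy as np

def fit_poisson_glm(X, counts, exposure, n_iter=25):
    """Fit a log-link Poisson GLM for failure counts with an exposure offset
    (total test time at each design point), via iteratively reweighted
    least squares (IRLS). Minimal sketch for the repairable-systems setting.
    """
    beta = np.zeros(X.shape[1])
    offset = np.log(exposure)
    for _ in range(n_iter):
        mu = np.exp(X @ beta + offset)          # expected failure count
        z = X @ beta + (counts - mu) / mu       # working response
        W = mu                                  # IRLS weights for Poisson
        XtW = X.T * W
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta
```

With the fitted rate model, the D-optimal machinery then weighs candidate design points by how much information each contributes per unit of test time.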
ContributorsTaylor, Dustin (Author) / Montgomery, Douglas (Thesis advisor) / Pan, Rong (Thesis advisor) / Rigdon, Steve (Committee member) / Freeman, Laura (Committee member) / Iquebal, Ashif (Committee member) / Arizona State University (Publisher)
Created2023
Description
Photolithography is among the key phases in chip manufacturing. It is also among the most expensive, with manufacturing equipment valued in the hundreds of millions of dollars. It is paramount that the process is run efficiently, guaranteeing high resource utilization and low product cycle times. A key element in the operation of a photolithography system is the effective management of the reticles that are responsible for imprinting the circuit path on the wafers. Managing reticles means determining which are appropriate to mount on the very expensive scanners as a function of the product types being released to the system. Given the importance of the problem, several heuristic policies have been developed in industry practice in an attempt to guarantee that the expensive tools are never idle. However, such policies have difficulty reacting to unforeseen events (e.g., unplanned failures, unavailability of reticles). On the other hand, the technological advances of the semiconductor industry in sensing at the system and process levels should be harnessed to improve on these "expert policies". In this thesis, a system for real-time reticle management is developed that not only retrieves information from the real system, but also embeds commonly used policies in order to improve upon them. A new digital twin for the photolithography process is developed that efficiently and accurately predicts system performance, thus enabling prediction of future behavior as a function of possible decisions. The results demonstrate the validity of the developed model and the feasibility of the overall approach, showing a statistically significant improvement in performance compared to the current policy.
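A digital twin of this kind builds on discrete-event logic: jobs arrive, compete for expensive scanners, and completion times follow from the allocation policy. A minimal event-driven sketch of first-free-scanner allocation is below; the function name, job format, and policy are hypothetical stand-ins, not the thesis's simulator.

```python
import heapq

def simulate(jobs, n_scanners=2):
    """Minimal discrete-event sketch: jobs given as (arrival, processing_time)
    tuples are served by whichever scanner frees up first. Returns completion
    times in arrival order. Illustrative only; real reticle management adds
    reticle-to-scanner compatibility, setups, and failures.
    """
    free_at = [0.0] * n_scanners          # next free time for each scanner
    heapq.heapify(free_at)
    completions = []
    for arrival, proc in sorted(jobs):
        start = max(arrival, heapq.heappop(free_at))  # wait for a scanner
        done = start + proc
        heapq.heappush(free_at, done)
        completions.append(done)
    return completions
```

A digital twin replaces the fixed policy inside such a loop with the policy under study and feeds it real system state, so candidate decisions can be evaluated before committing them.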
ContributorsSivasubramanian, Chandrasekhar (Author) / Pedrielli, Giulia (Thesis advisor) / Jevtic, Petar (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created2023
Description
Ultra-fast 2D/3D material microstructure reconstruction and quantitative structure-property mapping are crucial components of integrated computational materials engineering (ICME). Modeling is particularly challenging for random heterogeneous materials such as alloys, composites, polymers, porous media, and granular matter, which exhibit strong randomness and variations in their material properties due to the hierarchical uncertainties associated with their complex microstructure at different length scales. Such uncertainties also exist in disordered hyperuniform systems, which are statistically isotropic and possess no Bragg peaks, like liquids and glasses, yet suppress large-scale density fluctuations in a similar manner as perfect crystals. The unique hyperuniform long-range order in these systems endows them with nearly optimal transport, electronic and mechanical properties. The concept of hyperuniformity was originally introduced for many-particle systems and has subsequently been generalized to heterogeneous materials such as porous media, composites, polymers, and biological tissues for unconventional property discovery. An explicit mixture random field (MRF) model is proposed to characterize and reconstruct multi-phase stochastic material properties and microstructure simultaneously, where no additional tuning step or iteration is needed, in contrast to stochastic optimization approaches such as simulated annealing. The proposed method is shown to have ultra-high computational efficiency and only requires minimal imaging and property input data. When microscale uncertainties are considered, material reliability analysis faces the challenge of high dimensionality. To deal with the so-called "curse of dimensionality", efficient material reliability analysis methods are developed.
The explicit hierarchical uncertainty quantification model and efficient material reliability solvers are then applied to reliability-based topology optimization to pursue lightweight designs under reliability constraints defined on structural mechanical responses. Efficient and accurate methods are developed for high-resolution microstructure and hyperuniform microstructure reconstruction, high-dimensional material reliability analysis, and reliability-based topology optimization. The proposed framework can be readily incorporated into ICME for probabilistic analysis, discovery of novel disordered hyperuniform materials, and material design and optimization.
ContributorsGao, Yi (Author) / Liu, Yongming (Thesis advisor) / Jiao, Yang (Committee member) / Ren, Yi (Committee member) / Pan, Rong (Committee member) / Mignolet, Marc (Committee member) / Arizona State University (Publisher)
Created2021
Description
Monitoring a system for deviations from standard or reference behavior is essential for many data-driven tasks. Whether it is monitoring sensor data or the interactions between system elements, such as edges in a path or transactions in a network, the goal is to detect significant changes from a reference. As technological advancements allow for more data to be collected from systems, monitoring approaches should evolve to accommodate the greater collection of high-dimensional data and complex system settings. This dissertation introduces system-level models for monitoring tasks characterized by changes in a subset of system components, utilizing component-level information and relationships. A change may affect only a portion of the data or system (a partial change). The first three parts of this dissertation present applications and methods for detecting partial changes. The first part introduces a methodology for partial change detection in a simple, univariate setting. Changes are detected with posterior probabilities and statistical mixture models that allow only a fraction of the data to change. The second and third parts of this dissertation center on monitoring more complex multivariate systems modeled through networks. The goal is to detect partial changes in the underlying network attributes and topology. The contributions of the second and third parts are two non-parametric system-level monitoring techniques that consider relationships between network elements. The first algorithm, Supervised Network Monitoring (SNetM), leverages graph neural networks to transform the problem into supervised learning. The second, Supervised Network Monitoring for Partial Temporal Inhomogeneity (SNetMP), generates a network embedding and then transforms the problem into supervised learning. Finally, both SNetM and SNetMP construct measures and transform them into pseudo-probabilities that are monitored for changes.
The last topic addresses predicting and monitoring system-level delays on paths in a transportation/delivery system. For each item, the risk of delay is quantified. Machine learning is used to build a system-level model for delay risk, given the information available (such as environmental conditions) on the edges of a path, which integrates edge models. The outputs can then be used in a system-wide monitoring framework, and items most at risk are identified for potential corrective actions.
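The partial-change idea from the first part, posterior probabilities under a mixture model in which only a fraction of the data may change, can be sketched for a univariate Gaussian case. The shift size `delta`, the changed fraction `pi`, and the function name are illustrative assumptions, not the dissertation's settings.

```python
import numpy as np

def change_posterior(x, delta=2.0, pi=0.1):
    """Posterior probability that each observation came from the shifted
    component of a two-component Gaussian mixture (partial-change model).

    Reference component: N(0, 1). Changed component: N(delta, 1), with prior
    weight pi, so only a fraction of observations is expected to change.
    Normal-density constants cancel in the ratio, so they are omitted.
    """
    f0 = np.exp(-0.5 * x**2)              # reference likelihood (up to a constant)
    f1 = np.exp(-0.5 * (x - delta)**2)    # changed likelihood
    return pi * f1 / (pi * f1 + (1 - pi) * f0)
```

Observations with a high posterior are flagged as the changed subset, which is what distinguishes this from a test that assumes all data shift at once.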
ContributorsKasaei Roodsari, Maziar (Author) / Runger, George (Thesis advisor) / Escobedo, Adolfo (Committee member) / Pan, Rong (Committee member) / Shinde, Amit (Committee member) / Arizona State University (Publisher)
Created2021
Description
This thesis is developed in the context of biomanufacturing of modern products that have the following properties: they require short design-to-manufacturing times, they have high variability due to a high desired level of patient personalization, and, as a result, they may be manufactured in low volumes. This area at the intersection of therapeutics and biomanufacturing has become increasingly important: (i) a huge push toward the design of new RNA nanoparticles has revolutionized the science of vaccines due to the COVID-19 pandemic; (ii) while the technology to produce personalized cancer medications is available, efficient design and operation of manufacturing systems are not yet agreed upon. This work focuses on operations research methodologies that can support faster design of novel products, specifically RNA, and methods for enabling personalization in biomanufacturing, looking specifically at the problem of cancer therapy manufacturing. Across both areas, methods are presented that attempt to embed pre-existing knowledge (e.g., constraints characterizing good molecules, comparisons between structures) as well as learn problem structure (e.g., the landscape of the reward function while synthesizing the control for a single-use bioreactor). This thesis produced three key outcomes: (i) ExpertRNA, for the prediction of the structure of an RNA molecule given a sequence. RNA structure is fundamental in determining its function; therefore, having efficient tools for such prediction can make all the difference for a scientist trying to understand optimal molecule configuration. For the first time, the algorithm allows expert evaluation in the loop to judge the partial predictions that the tool produces; (ii) BioMAN, a discrete event simulation tool for the study of single-use biomanufacturing of personalized cancer therapies.
The discrete event simulation engine was designed and tailored to handle the efficient scheduling of many parallel events, which is caused by the presence of single-use resources. This is the first simulator of this type for individual therapies; (iii) Part-MCTS, a novel sequential decision-making algorithm to support the control of single-use systems. This tool integrates, for the first time, simulation, Monte Carlo tree search, and optimal computing budget allocation for managing the computational effort.
ContributorsLiu, Menghan (Author) / Pedrielli, Giulia (Thesis advisor) / Bertsekas, Dimitri (Committee member) / Pan, Rong (Committee member) / Sulc, Petr (Committee member) / Wu, Teresa (Committee member) / Arizona State University (Publisher)
Created2023
Description
With the explosion of autonomous systems under development, complex simulation models are being tested and relied on far more than in the recent past. This uptick in autonomous systems being modeled and then tested magnifies both the advantages and disadvantages of simulation experimentation. An inherent problem in autonomous systems development is that small changes in factor settings can result in large changes in a response's performance. These occurrences look like cliffs in a metamodel's response surface and are referred to as performance mode boundary regions. These regions represent areas of interest in the autonomous system's decision-making process, and are therefore areas of interest for autonomous systems developers. Traditional augmentation methods aid experimenters seeking different objectives, often by improving a certain design property of the factor space (such as variance) or a design's modeling capabilities. While useful, these augmentation techniques do not target the response-focused areas of interest that need attention in autonomous systems testing. Boundary Explorer Adaptive Sampling Technique, or BEAST, is a set of design augmentation algorithms. The adaptive sampling algorithm targets performance mode boundaries with additional samples. The gap-filling augmentation algorithm targets sparsely sampled areas in the factor space. BEAST allows sampling to adapt to information obtained from previous iterations of experimentation and to target these regions of interest. Exploiting the advantages of simulation model experimentation, BEAST can be used to provide additional iterations of experimentation, providing clarity and high fidelity in areas of interest along potentially steep gradient regions. The objective of this thesis is to research and present BEAST, then compare BEAST's algorithms to other design augmentation techniques.
Comparisons are made against traditional methods already implemented in SAS Institute's JMP software and against emerging adaptive sampling techniques, such as the Range Adversarial Planning Tool (RAPT). The goal is to gain a deeper understanding of how BEAST works and where it stands in the design augmentation space for practical applications. With an understanding of how BEAST operates and how well it performs, future research recommendations are presented to improve BEAST's capabilities.
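The boundary-targeting idea behind adaptive sampling can be sketched simply: score each candidate augmentation point by how much its nearest already-sampled neighbors disagree, since disagreement among neighbors suggests a performance mode boundary nearby. This is a simplified sketch in the spirit of boundary-seeking augmentation, not the BEAST algorithm itself; the function name and scoring rule are assumptions.

```python
import numpy as np

def boundary_candidates(X, y, candidates, k=2):
    """Rank candidate design points by neighbor disagreement.

    X: (n, d) sampled factor settings; y: (n,) observed responses;
    candidates: (m, d) points being considered for augmentation.
    Candidates whose k nearest sampled neighbors show spread in the
    response likely sit near a performance mode boundary region.
    """
    scores = []
    for c in candidates:
        d = np.linalg.norm(X - c, axis=1)      # distance to each sampled point
        nn = np.argsort(d)[:k]                 # k nearest sampled neighbors
        scores.append(y[nn].std())             # spread of neighboring responses
    order = np.argsort(scores)[::-1]           # highest disagreement first
    return candidates[order]
```

Iterating this scoring over successive batches is what lets sampling concentrate along steep gradient regions, while a separate gap-filling rule would instead favor candidates far from all sampled points.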
ContributorsSimpson, Ryan James (Author) / Montgomery, Douglas (Thesis advisor) / Karl, Andrew (Committee member) / Pan, Rong (Committee member) / Pedrielli, Giulia (Committee member) / Wisnowski, James (Committee member) / Arizona State University (Publisher)
Created2024