Search Content

The development of a validated clinically meaningful endpoint for the evaluation of tear film stability as a measure of ocular surface protection for use in the diagnosis and evaluation of dry eye disease

Description

This dissertation presents methods for the evaluation of ocular surface protection during natural blink function. The evaluation of ocular surface protection is especially important in the diagnosis of dry eye and the evaluation of dry eye severity in clinical trials. Dry eye is a highly prevalent disease affecting vast numbers…

This dissertation presents methods for the evaluation of ocular surface protection during natural blink function. The evaluation of ocular surface protection is especially important in the diagnosis of dry eye and the evaluation of dry eye severity in clinical trials. Dry eye is a highly prevalent disease affecting vast numbers (between 11% and 22%) of an aging population. There is only one approved therapy with limited efficacy, which results in a huge unmet need. The reason so few drugs have reached approval is a lack of a recognized therapeutic pathway with reproducible endpoints. While the interplay between blink function and ocular surface protection has long been recognized, all currently used evaluation techniques have addressed blink function in isolation from tear film stability, the gold standard of which is Tear Film Break-Up Time (TFBUT). In the first part of this research a manual technique of calculating ocular surface protection during natural blink function through the use of video analysis is developed and evaluated for it's ability to differentiate between dry eye and normal subjects, the results are compared with that of TFBUT. In the second part of this research the technique is improved in precision and automated through the use of video analysis algorithms. This software, called the OPI 2.0 System, is evaluated for accuracy and precision, and comparisons are made between the OPI 2.0 System and other currently recognized dry eye diagnostic techniques (e.g. TFBUT). In the third part of this research the OPI 2.0 System is deployed for use in the evaluation of subjects before, immediately after and 30 minutes after exposure to a controlled adverse environment (CAE), once again the results are compared and contrasted against commonly used dry eye endpoints. The results demonstrate that the evaluation of ocular surface protection using the OPI 2.0 System offers superior accuracy to the current standard, TFBUT.

ContributorsAbelson, Richard (Author) / Montgomery, Douglas C. (Thesis advisor) / Borror, Connie (Committee member) / Shunk, Dan (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created2012

Modeling Fantasy Baseball Player Popularity Using Twitter Activity

Description

Social media is used by people every day to discuss the nuances of their lives. Major League Baseball (MLB) is a popular sport in the United States, and as such has generated a great deal of activity on Twitter. As fantasy baseball continues to grow in popularity, so does the…

Social media is used by people every day to discuss the nuances of their lives. Major League Baseball (MLB) is a popular sport in the United States, and as such has generated a great deal of activity on Twitter. As fantasy baseball continues to grow in popularity, so does the research into better algorithms for picking players. Most of the research done in this area focuses on improving the prediction of a player's individual performance. However, the crowd-sourcing power afforded by social media may enable more informed predictions about players' performances. Players are chosen by popularity and personal preferences by most amateur gamblers. While some of these trends (particularly the long-term ones) are captured by ranking systems, this research was focused on predicting the daily spikes in popularity (and therefore price or draft order) by comparing the number of mentions that the player received on Twitter compared to their previous mentions. In doing so, it was demonstrated that improved fantasy baseball predictions can be made through leveraging social media data.

ContributorsRuskin, Lewis John (Author) / Liu, Huan (Thesis director) / Montgomery, Douglas (Committee member) / Morstatter, Fred (Committee member) / Industrial, Systems (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created2017-05

Statistical Analysis of Power Differences between Experimental Design Software Packages

Description

Based on findings of previous studies, there was speculation that two well-known experimental design software packages, JMP and Design Expert, produced varying power outputs given the same design and user inputs. For context and scope, another popular experimental design software package, Minitab® Statistical Software version 17, was added to the…

Based on findings of previous studies, there was speculation that two well-known experimental design software packages, JMP and Design Expert, produced varying power outputs given the same design and user inputs. For context and scope, another popular experimental design software package, Minitab® Statistical Software version 17, was added to the comparison. The study compared multiple test cases run on the three software packages with a focus on 2k and 3K factorial design and adjusting the standard deviation effect size, number of categorical factors, levels, number of factors, and replicates. All six cases were run on all three programs and were attempted to be run at one, two, and three replicates each. There was an issue at the one replicate stage, however—Minitab does not allow for only one replicate full factorial designs and Design Expert will not provide power outputs for only one replicate unless there are three or more factors. From the analysis of these results, it was concluded that the differences between JMP 13 and Design Expert 10 were well within the margin of error and likely caused by rounding. The differences between JMP 13, Design Expert 10, and Minitab 17 on the other hand indicated a fundamental difference in the way Minitab addressed power calculation compared to the latest versions of JMP and Design Expert. This was found to be likely a cause of Minitab’s dummy variable coding as its default instead of the orthogonal coding default of the other two. Although dummy variable and orthogonal coding for factorial designs do not show a difference in results, the methods affect the overall power calculations. All three programs can be adjusted to use either method of coding, but the exact instructions for how are difficult to find and thus a follow-up guide on changing the coding for factorial variables would improve this issue.

ContributorsArmstrong, Julia Robin (Author) / McCarville, Daniel R. (Thesis director) / Montgomery, Douglas (Committee member) / Industrial, Systems (Contributor, Contributor) / Barrett, The Honors College (Contributor)

Created2017-05

Graph Analysis of Arctic Ice

Description

Polar ice masses can be valuable indicators of trends in global climate. In an effort to better understand the dynamics of Arctic ice, this project analyzes sea ice concentration anomaly data collected over gridded regions (cells) and builds graphs based upon high correlations between cells. These graphs offer the opportunity…

Polar ice masses can be valuable indicators of trends in global climate. In an effort to better understand the dynamics of Arctic ice, this project analyzes sea ice concentration anomaly data collected over gridded regions (cells) and builds graphs based upon high correlations between cells. These graphs offer the opportunity to use metrics such as clustering coefficients and connected components to isolate representative trends in ice masses. Based upon this analysis, the structure of sea ice graphs differs at a statistically significant level from random graphs, and several regions show erratically decreasing trends in sea ice concentration.

ContributorsWallace-Patterson, Chloe Rae (Author) / Syrotiuk, Violet (Thesis director) / Colbourn, Charles (Committee member) / Montgomery, Douglas (Committee member) / Barrett, The Honors College (Contributor) / School of Mathematical and Statistical Sciences (Contributor) / Computer Science and Engineering Program (Contributor)

Created2013-05

Early Career Performance Models: Regression-Based Forecasting Models for Predicting Future Major League Baseball Player Performance

Description

The widespread use of statistical analysis in sports-particularly Baseball- has made it increasingly necessary for small and mid-market teams to find ways to maintain their analytical advantages over large market clubs. In baseball, an opportunity for exists for teams with limited financial resources to sign players under team control to…

The widespread use of statistical analysis in sports-particularly Baseball- has made it increasingly necessary for small and mid-market teams to find ways to maintain their analytical advantages over large market clubs. In baseball, an opportunity for exists for teams with limited financial resources to sign players under team control to long-term contracts before other teams can bid for their services in free agency. If small and mid-market clubs can successfully identify talented players early, clubs can save money, achieve cost certainty and remain competitive for longer periods of time. These deals are also advantageous to players since they receive job security and greater financial dividends earlier in their career. The objective of this paper is to develop a regression-based predictive model that teams can use to forecast the performance of young baseball players with limited Major League experience. There were several tasks conducted to achieve this goal: (1) Data was obtained from Major League Baseball and Lahman's Baseball Database and sorted using Excel macros for easier analysis. (2) Players were separated into three positional groups depending on similar fielding requirements and offensive profiles: Group I was comprised of first and third basemen, Group II contains second basemen, shortstops, and center fielders and Group III contains left and right fielders. (3) Based on the context of baseball and the nature of offensive performance metrics, only players who achieve greater than 200 plate appearances within the first two years of their major league debut are included in this analysis. (4) The statistical software package JMP was used to create regression models of each group and analyze the residuals for any irregularities or normality violations. Once the models were developed, slight adjustments were made to improve the accuracy of the forecasts and identify opportunities for future work. It was discovered that Group I and Group III were the easiest player groupings to forecast while Group II required several attempts to improve the model.

ContributorsJack, Nathan Scott (Author) / Shunk, Dan (Thesis director) / Montgomery, Douglas (Committee member) / Borror, Connie (Committee member) / Industrial, Systems (Contributor) / Barrett, The Honors College (Contributor)

Created2013-05

Design of Experiments and Reliability Growth on Repairable Systems

Description

Reliability growth is not a new topic in either engineering or statistics and has been a major focus for the past few decades. The increasing level of high-tech complex systems and interconnected components and systems implies that reliability problems will continue to exist and may require more complex solutions. The…

Reliability growth is not a new topic in either engineering or statistics and has been a major focus for the past few decades. The increasing level of high-tech complex systems and interconnected components and systems implies that reliability problems will continue to exist and may require more complex solutions. The most heavily used experimental designs in assessing and predicting a systems reliability are the "classical designs", such as full factorial designs, fractional factorial designs, and Latin square designs. They are so heavily used because they are optimal in their own right and have served superbly well in providing efficient insight into the underlying structure of industrial processes. However, cases do arise when the classical designs do not cover a particular practical situation. Repairable systems are such a case in that they usually have limitations on the maximum number of runs or too many varying levels for factors. This research explores the D-optimal design criteria as it applies to the Poisson Regression model on repairable systems, with a number of independent variables and under varying assumptions, to include the total time tested at a specific design point with fixed parameters, the use of a Bayesian approach with unknown parameters, and how the design region affects the optimal design. In applying experimental design to these complex repairable systems, one may discover interactions between stressors and provide better failure data. Our novel approach of accounting for time and the design space in the early stages of testing of repairable systems should, theoretically, in the final engineering design improve the system's reliability, maintainability and availability.

ContributorsTAYLOR, DUSTIN (Author) / Montgomery, Douglas (Thesis advisor) / Pan, Rong (Thesis advisor) / Rigdon, Steve (Committee member) / Freeman, Laura (Committee member) / Iquebal, Ashif (Committee member) / Arizona State University (Publisher)

Created2023

Analysis of No-Confounding Designs in 16 Runs for 9-14 Factors

Description

Nonregular designs for 9-14 factors in 16 runs are a vital alternative for to theregular minimum aberration resolution III fractional factorials. Because there is no complete aliasing between the main factor and two factor interactions, these designs are useful as potential confusion in results is avoided. However, there is another…

Nonregular designs for 9-14 factors in 16 runs are a vital alternative for to theregular minimum aberration resolution III fractional factorials. Because there is no complete aliasing between the main factor and two factor interactions, these designs are useful as potential confusion in results is avoided. However, there is another associated complication to this kind of design due to the complete confounding for some of the two- factors. In this research, the focus is on using three different of methods and compare the results. The methods are: Stepwise, least absolute shrinkage and selection operator (LASSO) and the Dantzig selector method. In a previous research, Metcalfe discuss the nonregular designs for 6-8 factors design and studies several analysis methods. She also develops a new method, The Aliased Informed Model Selection (AIMS), for those designs. This research builds upon that. For this research, simulation is used to develop random models to analyze designs from the class of nonregular fractions with 9 – 14 factors in 16 runs using JMP scripting. Then, analyze the cases with the mentioned methods and find the success rate for each one. The model generations were random with only main factors, or main factors and two- factors interaction as active effects. Effect sizes of 2 and 3 standard deviations are studied. The nonregular design used in this research are 9 and 11-factors design. Results shows that there is a clear consistency for the main factors only as active effects using all the methods. However, adding the interactions to the active effects degrade the success rate substantially for the Dantzig method. Moreover, as the active effects exceed approximately half of the degrees of freedom for the design the performance for all i the methods decreases. Finally, some recommendations are discussed for further research investigation such as AIMS, other variation methods and Augmentation.

ContributorsAlqarni, Hanan (Author) / Montgomery, Douglas (Thesis advisor) / Metcalfe, Carly (Committee member) / Pedrielli, Giulia (Committee member) / Arizona State University (Publisher)

Created2022

An Analysis of the Boundary Explorer Adaptive Sampling Technique

Description

With the explosion of autonomous systems under development, complex simulation models are being tested and relied on far more than in the recent past. This uptick in autonomous systems being modeled then tested magnifies both the advantages and disadvantages of simulation experimentation. An inherent problem in autonomous systems development is…

With the explosion of autonomous systems under development, complex simulation models are being tested and relied on far more than in the recent past. This uptick in autonomous systems being modeled then tested magnifies both the advantages and disadvantages of simulation experimentation. An inherent problem in autonomous systems development is when small changes in factor settings result in large changes in a response’s performance. These occurrences look like cliffs in a metamodel’s response surface and are referred to as performance mode boundary regions. These regions represent areas of interest in the autonomous system’s decision-making process. Therefore, performance mode boundary regions are areas of interest for autonomous systems developers.Traditional augmentation methods aid experimenters seeking different objectives, often by improving a certain design property of the factor space (such as variance) or a design’s modeling capabilities. While useful, these augmentation techniques do not target areas of interest that need attention in autonomous systems testing focused on the response. Boundary Explorer Adaptive Sampling Technique, or BEAST, is a set of design augmentation algorithms. The adaptive sampling algorithm targets performance mode boundaries with additional samples. The gap filling augmentation algorithm targets sparsely sampled areas in the factor space. BEAST allows for sampling to adapt to information obtained from pervious iterations of experimentation and target these regions of interest. Exploiting the advantages of simulation model experimentation, BEAST can be used to provide additional iterations of experimentation, providing clarity and high-fidelity in areas of interest along potentially steep gradient regions. The objective of this thesis is to research and present BEAST, then compare BEAST’s algorithms to other design augmentation techniques. Comparisons are made towards traditional methods that are already implemented in SAS Institute’s JMP software, or emerging adaptive sampling techniques, such as Range Adversarial Planning Tool (RAPT). The goal of this objective is to gain a deeper understanding of how BEAST works and where it stands in the design augmentation space for practical applications. With a gained understanding of how BEAST operates and how well BEAST performs, future research recommendations will be presented to improve BEAST’s capabilities.

ContributorsSimpson, Ryan James (Author) / Montgomery, Douglas (Thesis advisor) / Karl, Andrew (Committee member) / Pan, Rong (Committee member) / Pedrielli, Giulia (Committee member) / Wisnowski, James (Committee member) / Arizona State University (Publisher)

Created2024

Making Bayesian Optimization Practical in the Context of High Dimensional, Highly Expensive, BlackBox Functions

Description

Complex systems appear when interaction among system components creates emergent behavior that is difficult to be predicted from component properties. The growth of Internet of Things (IoT) and embedded technology has increased complexity across several sectors (e.g., automotive, aerospace, agriculture, city infrastructures, home technologies, healthcare) where the paradigm of cyber-physical…

Complex systems appear when interaction among system components creates emergent behavior that is difficult to be predicted from component properties. The growth of Internet of Things (IoT) and embedded technology has increased complexity across several sectors (e.g., automotive, aerospace, agriculture, city infrastructures, home technologies, healthcare) where the paradigm of cyber-physical systems (CPSs) has become a standard. While CPS enables unprecedented capabilities, it raises new challenges in system design, certification, control, and verification. When optimizing system performance computationally expensive simulation tools are often required, and search algorithms that sequentially interrogate a simulator to learn promising solutions are in great demand. This class of algorithms are black-box optimization techniques. However, the generality that makes black-box optimization desirable also causes computational efficiency difficulties when applied real problems. This thesis focuses on Bayesian optimization, a prominent black-box optimization family, and proposes new principles, translated in implementable algorithms, to scale Bayesian optimization to highly expensive, large scale problems. Four problem contexts are studied and approaches are proposed for practically applying Bayesian optimization concepts, namely: (1) increasing sample efficiency of a highly expensive simulator in the presence of other sources of information, where multi-fidelity optimization is used to leverage complementary information sources; (2) accelerating global optimization in the presence of local searches by avoiding over-exploitation with adaptive restart behavior; (3) scaling optimization to high dimensional input spaces by integrating Game theoretic mechanisms with traditional techniques; (4) accelerating optimization by embedding function structure when the reward function is a minimum of several functions. In the first context this thesis produces two multi-fidelity algorithms, a sample driven and model driven approach, and is implemented to optimize a serial production line; in the second context the Stochastic Optimization with Adaptive Restart (SOAR) framework is produced and analyzed with multiple applications to CPS falsification problems; in the third context the Bayesian optimization with sample fictitious play (BOFiP) algorithm is developed with an implementation in high-dimensional neural network training; in the last problem context the minimum surrogate optimization (MSO) framework is produced and combined with both Bayesian optimization and the SOAR framework with applications in simultaneous falsification of multiple CPS requirements.

ContributorsMathesen, Logan (Author) / Pedrielli, Giulia (Thesis advisor) / Candan, Kasim (Committee member) / Fainekos, Georgios (Committee member) / Gel, Esma (Committee member) / Montgomery, Douglas (Committee member) / Zabinsky, Zelda (Committee member) / Arizona State University (Publisher)

Created2021

Machine Learning Models for High-Dimensional Matched Data

Description

Matching or stratification is commonly used in observational studies to remove bias due to confounding variables. Analyzing matched data sets requires specific methods which handle dependency among observations within a stratum. Also, modern studies often include hundreds or thousands of variables. Traditional methods for matched data sets are challenged in…

Matching or stratification is commonly used in observational studies to remove bias due to confounding variables. Analyzing matched data sets requires specific methods which handle dependency among observations within a stratum. Also, modern studies often include hundreds or thousands of variables. Traditional methods for matched data sets are challenged in high-dimensional settings, mixed type variables (numerical and categorical), nonlinear andinteraction effects. Furthermore, machine learning research for such structured data is quite limited. This dissertation addresses this important gap and proposes machine learning models for identifying informative variables from high-dimensional matched data sets. The first part of this dissertation proposes a machine learning model to identify informative variables from high-dimensional matched case-control data sets. The outcome of interest in this study design is binary (case or control), and each stratum is assumed to have one unit from each outcome level. The proposed method which is referred to as Matched Forest (MF) is effective for large number of variables and identifying interaction effects. The second part of this dissertation proposes three enhancements of MF algorithm. First, a regularization framework is proposed to improve variable selection performance in excessively high-dimensional settings. Second, a classification method is proposed to classify unlabeled pairs of data. Third, two metrics are proposed to estimate the effects of important variables identified by MF. The third part proposes a machine learning model based on Neural Networks to identify important variables from a more generalized matched case-control data set where each stratum has one unit from case outcome level and more than one unit from control outcome level. This method which is referred to as Matched Neural Network (MNN) performs better than current algorithms to identify variables with interaction effects. Lastly, a generalized machine learning model is proposed to identify informative variables from high-dimensional matched data sets where the outcome has more than two levels. This method outperforms existing algorithms in the literature in identifying variables with complex nonlinear and interaction effects.

ContributorsShomal Zadeh, Nooshin (Author) / Runger, George (Thesis advisor) / Montgomery, Douglas (Committee member) / Shinde, Shilpa (Committee member) / Escobedo, Adolfo (Committee member) / Arizona State University (Publisher)

Created2021