As a case study, this thesis develops methods for the analysis of large amounts of data generated from a simulated ecosystem designed to understand how mammalian biomechanics interact with environmental complexity to modulate the outcomes of predator–prey interactions. These simulations investigate which biomechanical parameters governing the agility of animals in predator–prey pairs best predict pursuit outcomes. Traditional modelling techniques such as forward, backward, and stepwise variable selection are initially used to study these data, but the number of parameters and potentially relevant interaction effects render these methods impractical. Consequently, newer modelling techniques such as LASSO regularization are applied and compared to the traditional techniques in terms of accuracy and computational complexity. Finally, the splitting rules and the instances in the leaves of classification trees provide the basis for future simulation with an economical number of additional runs. Overall, this thesis demonstrates the increased utility of these sophisticated statistical techniques for simulated ecological data compared to the approaches traditionally used in the field. Combined with methods from industrial Design of Experiments, these techniques will help ecologists extract novel insights from simulations that integrate habitat complexity, population structure, and biomechanics.
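The core methodological contrast above — penalized selection replacing many stepwise refits — can be illustrated with a minimal sketch. This is not the thesis code: the data are synthetic, and the idea that only a few of many candidate biomechanical parameters truly drive pursuit outcome is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 500, 20
# Stand-ins for biomechanical parameters (speed, turning radius, etc.)
X = rng.normal(size=(n, p))
# Assume only a few parameters truly drive the pursuit outcome
beta = np.zeros(p)
beta[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(scale=0.5, size=n)

# LASSO with a cross-validated penalty shrinks irrelevant coefficients to
# zero, performing variable selection in a single fit rather than the many
# refits required by forward/backward/stepwise procedures.
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-3)
print("LASSO-selected parameters:", selected)  # indices 0, 3, 7 should appear
```

With strong true effects, the cross-validated LASSO recovers the active parameters directly; stepwise selection would instead evaluate a combinatorial sequence of candidate models, which is what becomes impractical as parameters and interactions multiply.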
This research explores why so few published machine-learning algorithms enter production and why even fewer generate sustained value. The dissertation proposes a ‘Design for Deployment’ (DFD) framework for building machine learning analytics so that they can be deployed to generate sustained value. The framework emphasizes and elaborates the often-neglected but immensely important latter steps of an analytics process: ‘Evaluation’ and ‘Deployment’. A representative evaluation framework is proposed that incorporates the temporal shifts and dynamism of real-world scenarios. Additionally, the recommended infrastructure allows analytics projects to pivot rapidly when a particular venture does not materialize. Deployment needs and apprehensions of industry are identified, and the gaps are addressed through a four-step process for sustainable deployment. Lastly, the need for analytics as a functional area (like finance and IT) is identified as a way to maximize the return on machine-learning deployment.
The framework and process are demonstrated in semiconductor manufacturing: a highly complex domain involving hundreds of optical, electrical, chemical, mechanical, thermal, electrochemical, and software processes, which makes it a highly dynamic, non-stationary system. Because manufacturing requires 24/7 uptime, high reliability and fail-safe operation are essential. Moreover, ever-growing data volumes mean that the system must be highly scalable. Lastly, given the high cost of change, any proposed change must offer a sustained value proposition. The context is therefore ideal for exploring these issues. Enterprise use-cases demonstrate the robustness of the framework in addressing the challenges encountered in the end-to-end process of productizing machine learning analytics in dynamic real-world scenarios.
The new spatially explicit counterfactual framework considers how spatial effects shape treatment choice, treatment variation, and treatment effects. To illustrate this methodological framework, I first replicate a classic quasi-experimental study that evaluates the effect of drinking-age policy on mortality in the United States from 1970 to 1984, and then extend it with a spatial perspective. In a second example, I evaluate food-access dynamics in Chicago from 2007 to 2014, implementing advanced spatial analytics that better account for the complex patterns of food access, together with a quasi-experimental research design to distill the impact of the Great Recession on the foodscape. Inference and interpretation prove sensitive both to the framing of the research design and to the underlying processes that drive geographically distributed relationships. Finally, I advance a new Spatial Data Science Infrastructure to integrate and manage data in dynamic, open environments for public health systems research and decision-making. I demonstrate an infrastructure prototype in a final case study developed in collaboration with health department officials and community organizations.
The majority of trust research has focused on the benefits trust can have for individual actors, institutions, and organizations. This “optimistic bias” is particularly evident in work focused on institutional trust, where concepts such as procedural justice, shared values, and moral responsibility have gained prominence. But trust in institutions may not be exclusively good. We reveal implications for the “dark side” of institutional trust by reviewing relevant theories and empirical research that can contribute to a more holistic understanding. We frame our discussion by suggesting there may be a “Goldilocks principle” of institutional trust, where trust that is too low (typically the focus) or too high (not usually considered by trust researchers) may be problematic. The chapter focuses on the issue of too-high trust and processes through which such too-high trust might emerge. Specifically, excessive trust might result from external, internal, and intersecting external-internal processes. External processes refer to the actions institutions take that affect public trust, while internal processes refer to intrapersonal factors affecting a trustor’s level of trust. We describe how the beneficial psychological and behavioral outcomes of trust can be mitigated or circumvented through these processes and highlight the implications of a “darkest” side of trust when they intersect. We draw upon research on organizations and legal, governmental, and political systems to demonstrate the dark side of trust in different contexts. The conclusion outlines directions for future research and encourages researchers to consider the ethical nuances of studying how to increase institutional trust.
Over the last two decades, fantasy sports have grown massively in popularity, and fantasy football is the most popular fantasy sport in the United States. People spend hours upon hours every year building, researching, and perfecting their teams to compete with others for money or bragging rights. One problem, however, is that National Football League (NFL) players are human and will not perform the same as they did last week or last season. This motivates a machine learning model that predicts when players will have a tough game and when they are likely to perform above average. This report discusses the history and science of fantasy football, gathering large amounts of player data, engineering more insightful features from that data, building a machine learning model, and using the resulting tool in a real-world situation. The initial model produced accurate predictions for quarterbacks and running backs but not for receivers and tight ends. Subsequent improvements increased accuracy substantially, reducing the mean absolute error to below one for all positions and yielding a successful model for all four positions.
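The evaluation loop described above — train a regressor on engineered player features, then score it by mean absolute error (MAE) in fantasy points — can be sketched as follows. This is illustrative only, not the report's model: the features (trailing average points, opponent defensive rank, snap share) and the synthetic target are assumptions standing in for real NFL data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
# Assumed engineered features for each player-week
X = np.column_stack([
    rng.normal(12, 5, n),     # trailing 3-week average fantasy points
    rng.integers(1, 33, n),   # opponent defensive rank (1 = toughest)
    rng.uniform(0, 1, n),     # snap share
])
# Synthetic weekly fantasy points, loosely tied to the features
# plus game-to-game noise that no model can remove
y = 0.8 * X[:, 0] + 0.1 * X[:, 1] + 5.0 * X[:, 2] + rng.normal(0, 1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))
print(f"test MAE: {mae:.2f} fantasy points")
```

Held-out MAE is the natural metric here because it is in the same units as the prediction (fantasy points), which is what makes a position-wise target such as "MAE below one" directly interpretable.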