Novel statistical models for complex data structures

Description

Rapid advances in sensor and information technology have created spatially and temporally data-rich environments. This creates a pressing need to develop novel statistical methods, and the associated computational tools, that extract useful knowledge and informative patterns from these massive datasets. The statistical challenges posed by these datasets lie in their complex structures, such as high dimensionality, hierarchy, multimodality, heterogeneity, and data uncertainty. Beyond the statistical challenges, suitable computational approaches are also essential for achieving efficiency, effectiveness, and numerical stability in practice. Recent developments in statistics and machine learning, such as sparse learning and transfer learning, together with traditional methodologies that still hold potential, such as multilevel models, shed light on how to analyze these complex datasets in a statistically powerful and computationally efficient way. In this dissertation, we identify four general kinds of complex data: "high-dimensional datasets", "hierarchically-structured datasets", "multimodality datasets" and "data uncertainties". These are ubiquitous in domains such as biology, medicine, neuroscience, health care delivery, and manufacturing. We describe the development of novel statistical models for analyzing datasets in these four categories, and we show how the models can be applied to real-world problems in Alzheimer's disease research, the nursing care process, and manufacturing.

Contributors: Huang, Shuai (Author) / Li, Jing (Thesis advisor) / Askin, Ronald (Committee member) / Ye, Jieping (Committee member) / Runger, George C. (Committee member) / Arizona State University (Publisher)

Created: 2012

Modeling time series data for supervised learning

Description

Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provides a high-dimensional data vector, which makes learning the relevant patterns challenging. This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning, which provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a classification framework that can handle high-dimensional feature spaces, multiple classes, and interactions between features. The proposed representations are useful for classifying and interpreting TS data of varying complexity. The first contribution handles the problem of time warping with a feature-based approach: an interval selection and local feature extraction strategy is proposed to learn a bag-of-features representation. This is distinctly different from common similarity-based time warping, and it allows additional features (such as pattern location) to be integrated easily into the models. The learners can account for temporal information through recursive partitioning. The second contribution focuses on the comprehensibility of the models. A new representation is integrated with local feature importance measures from tree-based ensembles to diagnose and interpret the time intervals that are important to the model. Multivariate time series (MTS) are especially challenging because the input consists of a collection of TS, and both features within each TS and interactions between TS can be important to models. 
Another contribution uses a different representation to produce computationally efficient strategies that learn a symbolic representation for MTS. Relationships between the multiple TS, nominal and missing values are handled with tree-based learners. Applications such as speech recognition, medical diagnosis and gesture recognition are used to illustrate the methods. Experimental results show that the TS representations and methods provide better results than competitive methods on a comprehensive collection of benchmark datasets. Moreover, the proposed approaches naturally provide solutions to similarity analysis, predictive pattern discovery and feature selection.
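The interval-based bag-of-features idea can be sketched in Python. This is a minimal illustration under assumptions the abstract does not state: intervals are drawn uniformly at random, and each interval is summarized by its start position, width, mean, standard deviation, and least-squares slope. The function names are hypothetical, and the tree-based ensemble that would consume these features is omitted.

```python
import random
from statistics import mean, pstdev


def slope(values):
    """Least-squares slope of `values` regressed on their indices 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def random_intervals(length, count, min_width=3, seed=0):
    """Draw `count` random half-open intervals [start, end) inside a series of `length`."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(count):
        width = rng.randint(min_width, length)
        start = rng.randint(0, length - width)
        intervals.append((start, start + width))
    return intervals


def bag_of_features(series, intervals):
    """Summarize each interval as one local feature vector; the series becomes a bag of them.

    The start position is kept as a feature so a downstream learner can use pattern location.
    """
    bag = []
    for start, end in intervals:
        seg = series[start:end]
        bag.append((start, end - start, mean(seg), pstdev(seg), slope(seg)))
    return bag


if __name__ == "__main__":
    ts = [0.0, 0.5, 1.0, 3.0, 5.0, 4.0, 2.0, 1.0, 0.5, 0.0]
    for row in bag_of_features(ts, random_intervals(len(ts), count=3)):
        print(row)
```

Because a shifted or stretched pattern still produces similar local summaries in some interval, a learner trained on these rows does not need an explicit alignment step; this snippet only covers the feature extraction.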

Contributors: Baydogan, Mustafa Gokce (Author) / Runger, George C. (Thesis advisor) / Atkinson, Robert (Committee member) / Gel, Esma (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created: 2012

Dynamic management of inspection effort allocation in an international port of entry (POE)

Description

Every year, more than 11 million maritime containers and 11 million commercial trucks arrive in the United States, carrying all types of imported goods. Because inspecting every container would be costly, only a fraction of them are inspected before being allowed to proceed into the United States. This dissertation proposes a decision support system that allocates the scarce inspection resources at a land POE (L-POE) so as to minimize the various costs associated with the inspection process, including the cost of delaying the entry of legitimate imports. Given the ubiquity of sensors throughout the supply chain, automated decision systems are needed that incorporate the information provided by these sensors and other channels into the inspection planning process. The inspection planning system proposed in this dissertation decomposes inspection effort allocation into two phases: primary and detailed inspection planning. The former decides what to inspect, and the latter how to conduct the inspections. A multi-objective optimization (MOO) model is developed for primary inspection planning. This model balances the direct and expected costs of conducting inspections against the waiting time of the trucks. The model is exploited in two ways: to construct a complete or partial efficient frontier for the MOO model with the diversity of Pareto-optimal solutions maximized, and to evaluate a given inspection plan and suggest possible improvements. The methodologies are described in detail, and case studies are provided. The case studies show that the MOO-based primary planning model can effectively pick out non-conforming trucks for inspection while balancing costs and waiting time.
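The Pareto-dominance test that underlies an efficient frontier, and that also drives suggestions for improving a given plan, can be sketched in Python. This is an illustration only. It assumes each candidate plan has already been scored on two minimized objectives: a combined direct-plus-expected inspection cost and an average truck waiting time. The plan data are invented, and the sketch does not reproduce the dissertation's MOO model or its diversity-maximizing frontier construction.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    name: str
    cost: float  # hypothetical: direct + expected inspection cost
    wait: float  # hypothetical: average truck waiting time, minutes


def dominates(a: Plan, b: Plan) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.cost <= b.cost and a.wait <= b.wait
    strictly_better = a.cost < b.cost or a.wait < b.wait
    return no_worse and strictly_better


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


def improvements(plan: Plan, plans: List[Plan]) -> List[Plan]:
    """Candidate plans that beat `plan` on at least one objective and lose on none."""
    return [q for q in plans if dominates(q, plan)]


if __name__ == "__main__":
    candidates = [
        Plan("A", cost=100.0, wait=30.0),
        Plan("B", cost=80.0, wait=45.0),
        Plan("C", cost=120.0, wait=25.0),
        Plan("D", cost=110.0, wait=35.0),
    ]
    print([p.name for p in pareto_front(candidates)])  # ['A', 'B', 'C']
    print([p.name for p in improvements(candidates[3], candidates)])  # ['A']
```

Plan D is dominated by A (cheaper and shorter waits), so A is the natural suggestion for improving D. The plans on the front trade cost against waiting time, and choosing among them is left to the decision maker.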

Contributors: Xue, Liangjie (Author) / Villalobos, Jesus René (Thesis advisor) / Gel, Esma (Committee member) / Runger, George C. (Committee member) / Maltz, Arnold (Committee member) / Arizona State University (Publisher)

Created: 2012

Engineering Lean, Packaged Energy Systems for Rapid, Economical Deployment and Distributed Generation

Description

This document addresses two grand challenges posed to engineers: making solar energy economically viable, and restoring and improving urban infrastructure. The proposed design solutions are preliminary designs, with 3D renderings, of two energy systems: a Packaged Photovoltaic (PPV) energy system and a natural-gas-based Modular Micro Combined Cycle (MMCC). Defining the requirements and the problem-solving methodology for these complex design solutions required iterative design and a thorough understanding of industry practices and market trends. The paper briefly discusses design specifics, but its major emphasis is on economical manufacture and deployment, and on the resulting suitability of the systems to address the two challenges. These systems were selected because of the steady reduction in PV installation costs in recent years (the average across utility, commercial, and residential installations fell 27% from Q4 2012 to Q4 2013) and the dramatic decline in natural gas prices to $5.61 per thousand cubic feet. In addition, a large number of utility-scale coal-fired power plants will be retired in 2014, many due to progressively stricter emission criteria. This creates demand for additional power systems to offset the lost capacity and to increase generating capacity for an ever-expanding world population. The proposed energy systems are not designed to supply power to the masses from a central location. Rather, they are intended to provide economical, reliable, high-quality power to remote locations and decentralized power to community-based grids. These energy systems are designed as a means of transforming and supporting the current infrastructure through distributed electricity generation.

Contributors: Sandoval, Benjamin Mark (Author) / Bryan, Harvey (Thesis director) / Fonseca, Ernesto (Committee member) / Barrett, The Honors College (Contributor) / Mechanical and Aerospace Engineering Program (Contributor)

Created: 2014-05

Solar Power Purchase Agreements for 10MWP Distributed Grid-tied Photovoltaic Systems at the Arizona State University Main Campus: Estimated vs. Actual Energy Output

Description

The majority of the 52 photovoltaic installations at ASU are governed by power purchase agreements (PPAs) that set a fixed per-kilowatt-hour rate at which ASU buys power from the system owner over a period of 15 to 20 years. PPAs require accurate predictions of system output to determine both the financial viability of the installations and the purchase price. The research used PPAs and historical solar power production data from ASU's Energy Information System (EIS). The results indicate that most PPAs slightly underestimate the annual energy yield. However, the output modeled with PVsyst indicates that higher energy outputs are possible with better system monitoring.
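Whether a PPA under- or overestimates yield depends on the sign of the deviation between actual and estimated annual output. The Python sketch below uses invented system names and kWh figures, not data from the study, to make the convention explicit. A positive deviation means actual production exceeded the PPA estimate, so the PPA underestimated the yield.

```python
def deviation_pct(estimated_kwh: float, actual_kwh: float) -> float:
    """Percent by which actual annual output differs from the PPA estimate.

    Positive: the PPA underestimated the yield. Negative: it overestimated it.
    """
    return 100.0 * (actual_kwh - estimated_kwh) / estimated_kwh


# Hypothetical annual figures in kWh: (PPA estimate, metered actual).
systems = {
    "System A": (1_000_000, 1_030_000),
    "System B": (500_000, 495_000),
    "System C": (750_000, 772_500),
}

deviations = {name: deviation_pct(est, act) for name, (est, act) in systems.items()}
underestimated = [name for name, d in deviations.items() if d > 0]

if __name__ == "__main__":
    for name, d in deviations.items():
        print(f"{name}: {d:+.1f}%")
    print("PPA underestimated yield for:", underestimated)
```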

Contributors: Vulic, Natasa (Author) / Bowden, Stuart (Thesis director) / Bryan, Harvey (Committee member) / Sharma, Vivek (Committee member) / Barrett, The Honors College (Contributor) / School of Sustainability (Contributor) / Ira A. Fulton School of Engineering (Contributor)

Created: 2012-12

A comparison of EnergyPlus and eQUEST whole building energy simulation results for a medium sized office building

Description

With increasing interest in energy-efficient building design, whole-building energy simulation programs are increasingly used in the design process to help architects and engineers determine which design alternatives save energy and are cost-effective. DOE-2 is one of the most popular programs in the building energy simulation community, and eQUEST is a powerful graphical user interface for the DOE-2 engine. EnergyPlus is the newest-generation simulation program under development by the U.S. Department of Energy. It adds modeling features beyond DOE-2's capabilities, making it possible to model new and complex building technologies that other whole-building energy simulation programs cannot. However, EnergyPlus models, especially those with a large number of zones, run much more slowly than eQUEST models. Each program thus offers its own advantages and disadvantages, and the choice of program may vary from case to case. The purpose of this thesis is to investigate the potential of both programs for whole-building energy analysis and to compare their results with actual building energy performance. To this end, a fully functional building was simulated in both eQUEST and EnergyPlus, and the results were compared with the building's utility data to determine how closely the simulations match the actual heat and energy flows in the building. This study observed that eQUEST is easy to use and produces results quickly, which is especially helpful for making critical decisions during the design phase. EnergyPlus, on the other hand, helps in modeling complex systems and produces more accurate results, but takes more time. The choice of simulation program may change with each program's usability and applicability to the needs of the different phases of a building's lifecycle.
It would therefore make sense to design a common front end for both simulation programs, allowing the user to select either the DOE-2.2 engine or the EnergyPlus engine as each particular case requires.
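One common way to quantify how closely simulated energy use matches utility bills is to use the normalized mean bias error (NMBE) and the coefficient of variation of the root-mean-square error, CV(RMSE). These are the calibration metrics used in ASHRAE Guideline 14. The abstract does not say which measures the thesis used, so the Python sketch below is an assumption, and its monthly values are made up. Sign and denominator conventions differ between sources. Here the error is measured minus simulated, and `p` (the number of model parameters subtracted from n) defaults to 0.

```python
import math
from typing import Sequence


def nmbe(measured: Sequence[float], simulated: Sequence[float], p: int = 0) -> float:
    """Normalized mean bias error in percent; positive means the model under-predicts."""
    n = len(measured)
    m_bar = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * m_bar)


def cv_rmse(measured: Sequence[float], simulated: Sequence[float], p: int = 0) -> float:
    """Coefficient of variation of the RMSE, in percent of the mean measured value."""
    n = len(measured)
    m_bar = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / m_bar


if __name__ == "__main__":
    # Hypothetical monthly electricity use (MWh) for four months.
    utility = [100.0, 120.0, 110.0, 90.0]
    equest = [95.0, 125.0, 105.0, 95.0]
    energyplus = [98.0, 121.0, 108.0, 91.0]
    for label, sim in (("eQUEST", equest), ("EnergyPlus", energyplus)):
        print(f"{label}: NMBE={nmbe(utility, sim):+.2f}%  CV(RMSE)={cv_rmse(utility, sim):.2f}%")
```

The two metrics complement each other. Monthly errors of opposite sign can cancel in NMBE, as they do for the hypothetical eQUEST series here, while CV(RMSE) still registers them.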

Contributors: Rallapalli, Hema Sree (Author) / Bryan, Harvey (Thesis advisor) / Addison, Marlin (Committee member) / Reddy, Agami (Committee member) / Arizona State University (Publisher)

Created: 2010

Public health surveillance in high-dimensions with supervised learning

Description

Public health surveillance is a special case of the general problem in which counts (or rates) of events are monitored for changes. Modern data complement event counts with many additional measurements (such as geographic, demographic, and others) that form high-dimensional covariates. This raises the important challenge of detecting a change that occurs only within a region, initially unspecified, defined by these covariates. Current methods are typically limited to spatial and/or temporal covariate information. They often fail to use all the information available in modern data, which can be paramount in uncovering these subtle changes. Modern health data also have complexities that traditional methods often do not account for, including covariates of mixed type, missing values, and high-order interactions among covariates. This work proposes transforming public health surveillance into a supervised learning problem, so that an appropriate learner can inherently address all of these complexities. At the same time, quantitative measures from the learner can be used to define signal criteria for detecting changes in event rates. A Feature Selection (FS) method is used to identify covariates that contribute to a model and to generate a signal, and a measure of statistical significance is included to control false alarms. An alternative Percentile method uses class probability estimates from tree-based ensembles to identify the specific cases that lead to changes. This second method is intended to be less computationally intensive and significantly simpler to implement. Finally, a third method, Rule-Based Feature Value Selection (RBFVS), is proposed for identifying the specific regions of high-dimensional space where the changes occur. Results on simulated examples are used to compare the FS and Percentile methods. Note that this work emphasizes the application of the proposed methods to public health surveillance. 
Nonetheless, these methods can easily be extended to a variety of applications where counts (or rates) of events are monitored for changes. Such problems commonly occur in domains such as manufacturing, economics, environmental systems, engineering, as well as in public health.
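The transformation to supervised learning can be illustrated with a toy example. Records from a baseline window are labelled 0 and records from the current window 1. If nothing has changed, the label is unrelated to the covariates, so covariate regions where the share of label-1 records is unusually high point to a change. The Python sketch below assumes categorical covariates. In place of the tree-based ensembles described above, it uses a single-level split scored by a binomial z-statistic. The function names, covariates, and data are hypothetical, and this is not the FS, Percentile, or RBFVS method.

```python
import math
from collections import defaultdict


def to_supervised(baseline, current):
    """Label baseline records 0 and current-window records 1."""
    return [(x, 0) for x in baseline] + [(x, 1) for x in current]


def strongest_region(data, min_count=5):
    """Return (covariate, value, z) for the single covariate value whose share of
    current-window records most exceeds the overall share, scored with a binomial z."""
    p0 = sum(y for _, y in data) / len(data)
    if not 0.0 < p0 < 1.0:
        raise ValueError("both windows must contain records")
    counts = defaultdict(lambda: [0, 0])  # (covariate, value) -> [records, current-window records]
    for x, y in data:
        for key, val in x.items():
            counts[(key, val)][0] += 1
            counts[(key, val)][1] += y
    best = None
    for (key, val), (n, k) in counts.items():
        if n < min_count:
            continue
        z = (k - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
        if best is None or z > best[2]:
            best = (key, val, z)
    return best


if __name__ == "__main__":
    # Hypothetical records: 10 per (region, age) cell in each window,
    # plus 20 extra current-window cases in region "S", age "old".
    cells = [{"region": r, "age": a} for r in "NSEW" for a in ("young", "old")]
    baseline = [dict(c) for c in cells for _ in range(10)]
    current = [dict(c) for c in cells for _ in range(10)]
    current += [{"region": "S", "age": "old"} for _ in range(20)]
    print(strongest_region(to_supervised(baseline, current)))
```

In this toy data the split on region "S" scores highest. The injected change actually sits in the interaction of region "S" with age "old", which a single-level split cannot isolate but a deeper tree could. The z-score is not corrected for the many regions scanned, so it is only a ranking device here and not a false-alarm control.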

Contributors: Davila, Saylisse (Author) / Runger, George C. (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Young, Dennis (Committee member) / Gel, Esma (Committee member) / Arizona State University (Publisher)

Created: 2010

Building the Green Hospital: An Analysis of Construction Strategies Contributing to Building Efficiency in the Healthcare Sector

Description

Hospitals account for 9 percent of annual U.S. commercial energy consumption, although they make up only 2 percent of U.S. commercial floor space. Consuming an average of 259,000 Btu per square foot, U.S. hospitals spend about 8.3 billion dollars on energy every year. Using a collaborative delivery method for hospital construction can save healthcare owners thousands of dollars while reducing construction time and resulting in a better product: a building with fewer operational deficiencies that requires less maintenance. Healthcare systems are integrated by nature and rich in technical complexity to meet the needs of their various patients. In addition to being technologically and energy intensive, hospitals must meet health regulations while maintaining human comfort. The interdisciplinary nature of hospitals suggests that multiple perspectives would be valuable in optimizing the building design. Integrated project delivery (IPD) provides a means of reaching the optimal design by emphasizing group collaboration and the expertise of the architect, engineer, owner, builder, and hospital staff. Previous studies have shown IPD to be particularly beneficial for highly complex projects such as hospitals. To assess the effects of a high level of team collaboration in hospital delivery, case studies were prepared on several hospitals built in the past decade. Each case study used some form of collaborative delivery method. Each project succeeded in saving and/or redirecting time and money to other building components, in achieving various certifications, recognitions, and awards, and in satisfying the client. The purpose of this research is to determine key strategies in the construction of healthcare facilities that allow for quicker construction, greater monetary savings, and improved operational efficiency. 
This research aims to communicate the value of both "green building" and a high level of team collaboration in the hospital-building process.
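The two shares quoted in this abstract can be combined into a relative energy intensity. In the notation introduced here, E_h and A_h are hospital energy use and floor space, and E_c and A_c are the totals for all U.S. commercial buildings. The ratio of hospital energy use per square foot to the commercial average then follows directly from the 9 percent and 2 percent figures:

```latex
\frac{E_h / A_h}{E_c / A_c} \;=\; \frac{E_h / E_c}{A_h / A_c} \;=\; \frac{0.09}{0.02} \;=\; 4.5
```

By these figures, hospitals use roughly four and a half times as much energy per square foot as the average U.S. commercial building.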

Contributors: Hansen, Hannah Elizabeth (Author) / Parrish, Kristen (Thesis director) / Bryan, Harvey (Committee member) / Civil, Environmental and Sustainable Engineering Programs (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-05

Locating counting sensors in traffic network to estimate origin-destination volumes

Description

Improving the quality of origin-destination (OD) demand estimates increases the effectiveness of the design, evaluation, and implementation of traffic planning and management systems. The associated bilevel Sensor Location Flow-Estimation problem addresses two important research questions: (1) how to compute the best estimates of the flows of interest using anticipated data from given candidate sensor locations; and (2) how to choose the optimal subset of links on which to locate sensors. In this dissertation, a decision framework is developed to optimally locate sensors and obtain high-quality OD volume estimates in vehicular traffic networks. The framework has three parts: a traffic assignment model that loads the OD traffic volumes onto routes in a known choice set, a sensor location model that decides on which subset of links to place counting sensors to observe traffic volumes, and an estimation model that obtains the best estimates of OD or route flow volumes. The dissertation first addresses the deterministic route flow estimation problem given a priori knowledge of route flows and their uncertainties. Two procedures are developed to locate "perfect" and "noisy" sensors, respectively. Next, it addresses a stochastic route flow estimation problem. A hierarchical linear Bayesian model is developed in which the real route flows are assumed to be generated from a multivariate normal distribution with two parameters: the "mean" and the "variance-covariance matrix". Prior knowledge of the "mean" parameter is described by a probability distribution. When the "variance-covariance matrix" parameter is assumed known, a Bayesian A-optimal design is developed. When it is unknown, a Markov chain Monte Carlo (MCMC) approach is used to estimate the a posteriori quantities. In all the sensor location models, the objective is to maximize the reduction in the variances of the distribution of the OD volume estimates.
The developed models were compared with other models available in the literature, and the comparison showed that they performed better.
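The A-optimal criterion chooses sensors to shrink the trace of the posterior covariance of the flows, and it can be sketched for the known-covariance case. This Python illustration is hedged and is not the dissertation's formulation. It makes four assumptions. First, the prior covariance of the route flows is known; in a hierarchical model with f | μ ~ N(μ, V) and μ ~ N(μ₀, Σ₀), the marginal prior covariance is V + Σ₀. Second, each chosen link yields one count with independent Gaussian noise of variance `noise_var`. Third, the link-route incidence matrix is hypothetical. Fourth, links are selected greedily one at a time, a common heuristic that the thesis may not use. The sketch targets route flows. OD volumes, being sums of route flows, would call for a weighted trace instead.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def trace(m: Matrix) -> float:
    return sum(m[i][i] for i in range(len(m)))


def observe(sigma: Matrix, h: List[float], noise_var: float) -> Matrix:
    """Posterior covariance after one count y = h.f + e, e ~ N(0, noise_var).

    Rank-one Gaussian update: Sigma - (Sigma h)(Sigma h)^T / (h^T Sigma h + noise_var).
    """
    s_h = matvec(sigma, h)
    denom = dot(h, s_h) + noise_var
    size = len(h)
    return [[sigma[i][j] - s_h[i] * s_h[j] / denom for j in range(size)] for i in range(size)]


def greedy_a_optimal(sigma: Matrix, incidence: Dict[str, List[float]],
                     noise_var: float, budget: int) -> Tuple[List[str], Matrix]:
    """Greedily pick up to `budget` links, each time the one that most reduces trace(Sigma).

    For a symmetric Sigma the trace reduction is ||Sigma h||^2 / (h^T Sigma h + noise_var).
    """
    chosen: List[str] = []
    for _ in range(budget):
        best_link, best_gain = None, -1.0
        for link, h in incidence.items():
            if link in chosen:
                continue
            s_h = matvec(sigma, h)
            gain = dot(s_h, s_h) / (dot(h, s_h) + noise_var)
            if gain > best_gain:
                best_link, best_gain = link, gain
        if best_link is None:  # every candidate link already has a sensor
            break
        chosen.append(best_link)
        sigma = observe(sigma, incidence[best_link], noise_var)
    return chosen, sigma


if __name__ == "__main__":
    # Hypothetical network: 3 routes, prior route-flow covariance diag(4, 1, 1).
    prior = [[4.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Link -> which routes traverse it (1 = route uses the link).
    links = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 1.0], "c": [1.0, 1.0, 0.0]}
    picked, post = greedy_a_optimal(prior, links, noise_var=1.0, budget=2)
    print(picked, round(trace(prior), 3), "->", round(trace(post), 3))
```

In the Gaussian case the posterior covariance does not depend on the observed counts, only on which links are instrumented. That is why the sensor locations can be chosen before any data arrive.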

Contributors: Wang, Ning (Author) / Mirchandani, Pitu (Thesis advisor) / Murray, Alan (Committee member) / Pendyala, Ram (Committee member) / Runger, George C. (Committee member) / Zhang, Muhong (Committee member) / Arizona State University (Publisher)

Created: 2013

Selling Sunshine: ASU Solar Energy Research 1951-1980

Description

Original exhibit panel text and an associated interview with ASU faculty Charles Backus and Harvey Bryan for the exhibit presented at the Luhrs Gallery, Hayden Library, in Fall 2013.

Contributors: Spindler, Rob (Curator) / Backus, Charles (Interviewee) / Bryan, Harvey (Interviewee)

Created: 2013-07-01