Matching Items (13)

Description

The listing price of residential rental real estate is dependent upon property-specific attributes. These attributes involve data that can be tabulated as categorical and continuous predictors. The forecasting model presented in this paper is developed using publicly available, property-specific information sourced from the Zillow and Trulia online real estate databases. The following fifteen predictors were tracked for forty-eight rental listings in the 85281 ZIP code: housing type, square footage, number of baths, number of bedrooms, distance to Arizona State University’s Tempe Campus, crime level of the neighborhood, median age range of the neighborhood population, percentage of the neighborhood population that is married, median year of construction of the neighborhood, percentage of the population commuting longer than thirty minutes, percentage of neighborhood homes occupied by renters, percentage of the population commuting by transit, and the number of restaurants, grocery stores, and nightlife venues within a one-mile radius of the property. Through regression analysis, the significant predictors of the listing price of a rental property in the 85281 ZIP code were discerned and used to form a forecasting model. This forecasting model explains 75.5% of the variation in listing prices of residential rental real estate in the 85281 ZIP code.
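As a rough illustration of the regression workflow described above, the sketch below fits a linear model with dummy-coded categorical predictors and reports R^2. The column names and data are invented for illustration; they are not the study's dataset, and the actual model uses all fifteen predictors.

```python
# Illustrative sketch of regressing listing price on categorical + continuous
# predictors. All column names and values here are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression

listings = pd.DataFrame({
    "housing_type": ["apartment", "house", "condo", "apartment",
                     "house", "condo", "apartment", "house"],
    "sqft": [650, 1400, 900, 720, 1850, 1100, 800, 1600],
    "baths": [1, 2, 1, 1, 3, 2, 1, 2],
    "beds": [1, 3, 2, 1, 4, 2, 2, 3],
    "miles_to_asu": [0.8, 2.5, 1.2, 0.5, 3.1, 1.8, 1.0, 2.2],
    "price": [1150, 2100, 1500, 1250, 2600, 1700, 1300, 2250],
})

# Categorical predictors enter the model as dummy (indicator) variables.
X = pd.get_dummies(listings.drop(columns="price"),
                   columns=["housing_type"], drop_first=True)
y = listings["price"]

model = LinearRegression().fit(X, y)
print(f"R^2 on the illustrative data: {model.score(X, y):.3f}")
```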
Contributors: Schuchter, Grant (Author) / Clough, Michael (Thesis director) / Escobedo, Adolfo (Committee member) / Industrial, Systems & Operations Engineering Prgm (Contributor) / Barrett, The Honors College (Contributor)
Created: 2019-05
Description

Carbon Capture and Storage (CCS) is a climate stabilization strategy that prevents CO2 emissions from entering the atmosphere. Despite its benefits, impactful CCS projects require large investments in infrastructure, which could deter governments from implementing this strategy. In this sense, the development of innovative tools to support large-scale, cost-efficient CCS deployment decisions is critical for climate change mitigation. This thesis proposes an improved mathematical formulation for the scalable infrastructure model for CCS (SimCCS), whose main objective is to design a minimum-cost pipe network to capture, transport, and store a target amount of CO2. Model decisions include source, reservoir, and pipe selection, as well as CO2 amounts to capture, store, and transport. By studying the SimCCS optimal solution and the underlying network topology, new valid inequalities (VIs) are proposed to strengthen the existing mathematical formulation. These constraints seek to improve the quality of the linear relaxation solutions in the branch-and-bound algorithm used to solve SimCCS. Each VI is explained with an intuitive description, its mathematical structure, and examples of resulting improvements. Further, all VIs are validated by assessing the impact of their elimination from the new formulation. The validated new formulation solves the 72-node Alberta problem up to 7 times faster than the original model. The upgraded model reduces the computation time required to solve SimCCS in 72% of randomly generated test instances, solving SimCCS up to 200 times faster. These formulations can be tested and then applied to enhance variants of SimCCS and general fixed-charge network flow problems. Finally, experience from testing a Benders decomposition approach for SimCCS is discussed, and promising directions for efficient solution methods are outlined.
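The flavor of the underlying optimization can be conveyed with a toy fixed-charge model. This is a minimal sketch in the spirit of SimCCS, not its actual formulation; the three-node network and all costs and capacities below are made up.

```python
# Toy fixed-charge CO2 transport model (illustrative, not the SimCCS
# formulation): open pipes at a fixed cost, then move a target amount from
# a source to a reservoir at a per-unit transport cost.
import pulp

arcs = {
    ("src", "hub"): dict(fixed=100, var=2, cap=50),
    ("hub", "res"): dict(fixed=80, var=1, cap=50),
    ("src", "res"): dict(fixed=220, var=1, cap=50),
}
target = 40  # amount of CO2 to move from source to reservoir

m = pulp.LpProblem("toy_ccs_network", pulp.LpMinimize)
open_ = {a: pulp.LpVariable(f"open_{a[0]}_{a[1]}", cat="Binary") for a in arcs}
flow = {a: pulp.LpVariable(f"flow_{a[0]}_{a[1]}", lowBound=0) for a in arcs}

# Minimize fixed pipe-opening costs plus variable transport costs.
m += pulp.lpSum(d["fixed"] * open_[a] + d["var"] * flow[a]
                for a, d in arcs.items())

# A pipe carries flow only if it is opened (capacity doubles as big-M).
for a, d in arcs.items():
    m += flow[a] <= d["cap"] * open_[a]

# Ship the full target out of the source; conserve flow at the hub.
m += flow[("src", "hub")] + flow[("src", "res")] == target
m += flow[("hub", "res")] == flow[("src", "hub")]

m.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[m.status], "cost =", pulp.value(m.objective))
```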
Contributors: Lobo, Loy Joseph (Author) / Sefair, Jorge A (Thesis advisor) / Escobedo, Adolfo (Committee member) / Kuby, Michael (Committee member) / Middleton, Richard (Committee member) / Arizona State University (Publisher)
Created: 2017
Description

Monitoring a system for deviations from standard or reference behavior is essential for many data-driven tasks. Whether it is monitoring sensor data or the interactions between system elements, such as edges in a path or transactions in a network, the goal is to detect significant changes from a reference. As technological advancements allow for more data to be collected from systems, monitoring approaches should evolve to accommodate the greater collection of high-dimensional data and complex system settings. This dissertation introduces system-level models for monitoring tasks characterized by changes in a subset of system components, utilizing component-level information and relationships. A change may affect only a portion of the data or system (a partial change). The first three parts of this dissertation present applications and methods for detecting partial changes. The first part introduces a methodology for partial change detection in a simple, univariate setting. Changes are detected with posterior probabilities and statistical mixture models that allow only a fraction of the data to change. The second and third parts of this dissertation center on monitoring more complex multivariate systems modeled through networks, where the goal is to detect partial changes in the underlying network attributes and topology. Their contributions are two non-parametric, system-level monitoring techniques that consider relationships between network elements. The first algorithm, Supervised Network Monitoring (SNetM), leverages graph neural networks and transforms the problem into supervised learning. The second, Supervised Network Monitoring for Partial Temporal Inhomogeneity (SNetMP), generates a network embedding and then transforms the problem to supervised learning. Finally, both SNetM and SNetMP construct measures and transform them into pseudo-probabilities that are monitored for changes. The last topic addresses predicting and monitoring system-level delays on paths in a transportation/delivery system. For each item, the risk of delay is quantified. Machine learning is used to build a system-level delay-risk model that integrates edge-level models, given the information available on the edges of a path (such as environmental conditions). The outputs can then be used in a system-wide monitoring framework, and the items most at risk are identified for potential corrective actions.
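The univariate partial-change idea from the first part can be sketched with a two-component mixture model. The data, shift size, and 0.5 decision threshold below are illustrative assumptions rather than the dissertation's actual settings.

```python
# Sketch of partial-change detection with a mixture model: when only a
# fraction of observations shift, a two-component mixture separates reference
# from changed data, and posterior probabilities flag the changed subset.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=80)   # in-control observations
shifted = rng.normal(3.0, 1.0, size=20)     # partial change: 20% of the data
data = np.concatenate([reference, shifted]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
post = gmm.predict_proba(data)

# Posterior probability of membership in the higher-mean (changed) component.
changed = int(np.argmax(gmm.means_.ravel()))
print("estimated changed fraction:", round(float(gmm.weights_[changed]), 2))
print("flagged observations:", int((post[:, changed] > 0.5).sum()))
```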
Contributors: Kasaei Roodsari, Maziar (Author) / Runger, George (Thesis advisor) / Escobedo, Adolfo (Committee member) / Pan, Rong (Committee member) / Shinde, Amit (Committee member) / Arizona State University (Publisher)
Created: 2021
Description

The rank aggregation problem has ubiquitous applications in operations research, artificial intelligence, computational social choice, and various other fields. Generally, rank aggregation is utilized whenever a set of judges (human or non-human) express their preferences over a set of items, and it is necessary to find a consensus ranking that best represents these preferences collectively. Many real-world instances of this problem involve a very large number of items, include ties, and/or contain partial information, which poses a challenge to decision-makers. This work makes several contributions to overcoming these challenges. Most attention on this problem has focused on an NP-hard distance-based variant known as Kemeny aggregation, for which solution approaches with provable guarantees that can handle difficult large-scale instances remain elusive. Firstly, this work introduces exact and approximate methodologies inspired by the social choice foundations of the problem, namely the Condorcet criterion, to decompose the problem. To deal with instances where exact partitioning does not yield many subsets, it proposes Approximate Condorcet Partitioning, a scalable solution technique capable of handling large-scale instances while providing provable guarantees. Secondly, this work delves into the rank aggregation problem under the generalized Kendall-tau distance, which contains Kemeny aggregation as a special case. This new problem provides a robust and highly flexible framework for handling ties. First, it derives exact and heuristic solution methods for the generalized problem. Second, it introduces a novel social choice property that encloses existing variations of the Condorcet criterion as special cases. Thirdly, this work focuses on top-k list aggregation. Top-k lists are a special form of item orderings wherein, out of n total items, only a small number of them, k, are explicitly ordered. Top-k lists are being increasingly utilized in various fields including recommendation systems, information retrieval, and machine learning. This work introduces exact and inexact methods for consolidating a collection of heterogeneous top-k lists. Furthermore, the strength of the proposed exact formulations is analyzed from a polyhedral point of view. Finally, this work identifies the top-100 U.S. universities by consolidating four prominent university rankings to assess the computational implications of this problem.
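For intuition, Kemeny aggregation under the Kendall-tau distance can be solved by brute force on tiny instances. The sketch below enumerates all permutations, which is precisely what becomes intractable at the scales this work targets; the example rankings are invented.

```python
# Brute-force Kemeny aggregation for tiny instances (illustrative only).
from itertools import combinations, permutations

def kendall_tau(r1, r2):
    """Number of item pairs ordered differently by the two rankings."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)

def kemeny(rankings):
    """Consensus ranking minimizing total Kendall-tau distance to the input."""
    items = rankings[0]
    return min(permutations(items),
               key=lambda cand: sum(kendall_tau(cand, r) for r in rankings))

judges = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("b", "a", "c", "d")]
print(kemeny(judges))  # ('a', 'b', 'c', 'd')
```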
Contributors: Akbari, Sina (Author) / Escobedo, Adolfo (Thesis advisor) / Byeon, Geunyeong (Committee member) / Sefair, Jorge (Committee member) / Wu, Shin-Yi (Committee member) / Arizona State University (Publisher)
Created: 2022
Description

Global optimization (programming) has been attracting the attention of researchers for almost a century. Because linear programming (LP) and mixed integer linear programming (MILP) were well studied in the early stages, MILP methods and software tools have improved greatly in efficiency in the past few years. They are now fast and robust even for problems with millions of variables. Therefore, it is desirable to use MILP software to solve mixed integer nonlinear programming (MINLP) problems. For an MINLP problem to be solved by an MILP solver, its nonlinear functions must be transformed into linear ones. The most common method for this transformation is piecewise linear approximation (PLA). This dissertation summarizes the types of optimization and the most important tools and methods, and discusses the PLA tool in depth. PLA is performed using nonuniform partitioning of the domain of the variables involved in the function to be approximated. Partial PLA models that approximate only parts of a complicated optimization problem are also introduced. Computational experiments are presented, and the results show that nonuniform partitioning and partial PLA can be beneficial.
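A small sketch of the core idea follows: approximate a nonlinear function with a piecewise linear interpolant and compare a uniform partition against a nonuniform one. The breakpoint rule used here (equalizing the integral of sqrt(|f''|) across segments) is one standard heuristic, not necessarily the dissertation's scheme.

```python
# Uniform vs. nonuniform piecewise linear approximation of f(x) = exp(x).
import numpy as np

f = np.exp                     # nonlinear function to approximate
lo, hi, n_breaks = 0.0, 3.0, 6

uniform = np.linspace(lo, hi, n_breaks)
# Nonuniform rule: equalize the integral of sqrt(|f''|) per segment, which
# roughly balances each segment's interpolation error. For f = exp this
# integral has the closed form 2*exp(x/2), so we invert it directly.
levels = np.linspace(np.exp(lo / 2), np.exp(hi / 2), n_breaks)
nonuniform = 2 * np.log(levels)

x = np.linspace(lo, hi, 1001)
for name, bp in [("uniform", uniform), ("nonuniform", nonuniform)]:
    approx = np.interp(x, bp, f(bp))   # the piecewise linear approximant
    print(name, "max abs error:",
          round(float(np.max(np.abs(f(x) - approx))), 4))
```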
Contributors: Alkhalifa, Loay (Author) / Mittelmann, Hans (Thesis advisor) / Armbruster, Hans (Committee member) / Escobedo, Adolfo (Committee member) / Renaut, Rosemary (Committee member) / Sefair, Jorge (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

In conventional supervised learning tasks, information retrieval from extensive collections of data happens automatically at low cost, whereas in many real-world problems obtaining labeled data can be hard, time-consuming, and expensive. Consider healthcare systems, for example, where unlabeled medical images are abundant while labeling requires a considerable amount of knowledge from experienced physicians. Active learning addresses this challenge with an iterative process that selects instances from the unlabeled data to annotate and improve the supervised learner. At each step, the query of examples to be labeled can be considered a dilemma between exploitation of the supervised learner's current knowledge and exploration of the unlabeled input features.

Motivated by the need for efficient active learning strategies, this dissertation proposes new algorithms for batch-mode, pool-based active learning. The research considers the following questions: How can unsupervised knowledge of the input features (exploration) improve learning when incorporated with supervised learning (exploitation)? How can exploration be characterized when the data are high-dimensional? Finally, how can the balance between exploration and exploitation be adapted over time?

The first contribution proposes a new active learning algorithm, Cluster-based Stochastic Query-by-Forest (CSQBF), which provides a batch-mode strategy that accelerates learning with added value from exploration and improved exploitation scores. CSQBF balances exploration and exploitation using a probabilistic scoring criterion based on classification probabilities from a tree-based ensemble model within each data cluster.
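A loose sketch of the cluster-based batch idea follows; it is not the published CSQBF algorithm, and the dataset, cluster count, and uncertainty scoring rule are illustrative assumptions.

```python
# Loose sketch of cluster-based batch selection: cluster the unlabeled pool
# for exploration, score candidates by ensemble uncertainty for exploitation,
# and draw the batch across clusters.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labeled = np.arange(20)            # small initial labeled set
pool = np.arange(20, 500)          # unlabeled pool

model = RandomForestClassifier(random_state=0).fit(X[labeled], y[labeled])
proba = model.predict_proba(X[pool])
uncertainty = 1.0 - proba.max(axis=1)   # low confidence = informative

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X[pool])

batch = []                          # one query per cluster
for c in range(5):
    members = np.where(clusters == c)[0]
    batch.append(int(pool[members[np.argmax(uncertainty[members])]]))
print("indices to label next:", batch)
```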

The second contribution introduces two more query strategies, Double Margin Active Learning (DMAL) and Cluster Agnostic Active Learning (CAAL), which combine consistent exploration and exploitation modules into a coherent and unified measure for label queries. Instead of assuming a fixed clustering structure, CAAL and DMAL adopt a soft-clustering strategy, which provides a new approach to formalizing exploration in active learning.

The third contribution addresses the challenge of dynamically striking a balance between exploration and exploitation criteria throughout the active learning process. Two adaptive algorithms are proposed based on feedback-driven bandit optimization frameworks that handle this issue by learning the relationship between the exploration-exploitation trade-off and an active learner's performance.
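One way to picture the feedback-driven idea is an epsilon-greedy bandit over candidate trade-off weights. The reward function below is a stand-in for retraining and validating the learner, not the dissertation's actual design; the arms, epsilon, and horizon are all assumptions.

```python
# Sketch: treat candidate exploration/exploitation weights as bandit arms and
# favor the weight whose queries improved the learner the most.
import numpy as np

rng = np.random.default_rng(1)
arms = [0.2, 0.5, 0.8]          # candidate exploration weights
pulls = np.zeros(3)
mean_reward = np.zeros(3)

def accuracy_gain(weight):      # stand-in for retraining + validation
    return rng.normal(loc=weight * (1 - weight), scale=0.05)

for t in range(100):            # epsilon-greedy over active learning rounds
    explore = rng.random() < 0.1
    arm = int(rng.integers(3)) if explore else int(np.argmax(mean_reward))
    r = accuracy_gain(arms[arm])
    pulls[arm] += 1
    mean_reward[arm] += (r - mean_reward[arm]) / pulls[arm]  # running mean

print("preferred exploration weight:", arms[int(np.argmax(mean_reward))])
```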
Contributors: Shams, Ghazal (Author) / Runger, George C. (Thesis advisor) / Montgomery, Douglas C. (Committee member) / Escobedo, Adolfo (Committee member) / Pedrielli, Giulia (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

The Cognitive Decision Support (CDS) model is proposed. The model is widely applicable and scales to realistic, complex decision problems based on adaptive learning. The utility of a decision is discussed, and four types of decisions associated with the CDS model are identified. The CDS model is designed to learn decision utilities. Data enrichment is introduced to promote the effectiveness of learning. Grouping is introduced for large-scale decision learning. Introspection and adjustment are presented for adaptive learning. Triage recommendation is incorporated to indicate the trustworthiness of suggested decisions.

The CDS model and methodologies are integrated into an architecture using concepts from cognitive computing. The proposed architecture is implemented with an example use case in inventory management.

Reinforcement learning (RL) is discussed as an alternative, generalized adaptive learning engine for the CDS system to handle the complexity of many problems with unknown environments. An adaptive state dimension, with context that can grow as new information becomes available, is discussed. Several enhanced RL components that are critical for complex use cases are integrated. Deep Q-networks are embedded with the adaptive learning methodologies and applied to an example supply chain management problem on capacity planning.

A new approach using Itô stochastic processes is proposed as a more general method to generate non-stationary demands in various patterns that can be used in decision problems. The proposed method generates demands with varying non-stationary patterns, including trend, cyclical, seasonal, and irregular patterns. Conventional approaches are identified as special cases of the proposed method. Demands are illustrated in realistic settings for various decision models. Various statistical criteria are applied to filter the generated demands. The method is applied to a real-world example.
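A minimal sketch of such a generator, assuming an Euler-Maruyama discretization of an Itô process with trend and seasonal terms in the drift; the parameters and functional forms below are invented for illustration.

```python
# Illustrative non-stationary demand generator: Euler-Maruyama discretization
# of dD_t = mu(t) dt + sigma dW_t, with trend + seasonality in the drift.
import numpy as np

rng = np.random.default_rng(2)
T, dt = 104, 1.0                 # two years of weekly demand
t = np.arange(T)

def drift(t):                    # trend plus a 52-week seasonal cycle
    return 0.3 + 2.0 * np.cos(2 * np.pi * t / 52)

sigma = 1.5                      # volatility of the irregular component
demand = np.empty(T)
demand[0] = 100.0
for k in range(1, T):
    dW = rng.normal(0.0, np.sqrt(dt))              # Brownian increment
    demand[k] = demand[k - 1] + drift(t[k - 1]) * dt + sigma * dW

demand = np.maximum(demand, 0.0)  # demands cannot go negative
print(demand[:5].round(1))
```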
Contributors: Kee, Seho (Author) / Runger, George C. (Thesis advisor) / Escobedo, Adolfo (Committee member) / Gel, Esma (Committee member) / Janakiram, Mani (Committee member) / Rogers, Dale (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

This dissertation focuses on three large-scale optimization problems and the algorithms devised to solve them. In addition to the societal impact of each problem’s solution, this dissertation contributes to the optimization literature a set of decomposition algorithms for problems whose optimal solution is sparse. These algorithms exploit problem-specific properties and use tailored strategies based on iterative refinement (outer approximations). The proposed algorithms are not rooted in duality theory, providing an alternative to existing methods based on linear programming relaxations. However, it is possible to embed existing decomposition methods into the proposed framework. These general decomposition principles extend to other combinatorial optimization problems.

The first problem is a route assignment and scheduling problem in which a set of vehicles must traverse a directed network while maintaining a minimum inter-vehicle distance at all times. This problem is inspired by applications in hazmat logistics and the coordination of autonomous agents. The proposed approach includes realistic features such as continuous-time vehicle scheduling, heterogeneous speeds, and minimum and maximum waiting times at any node.

The second problem is a fixed-charge network design problem, which aims to find a minimum-cost plan to transport a target amount of a commodity between known origins and destinations. In addition to the typical flow decisions, the model chooses the capacity of each arc and selects sources and sinks. The proposed algorithms admit any nondecreasing piecewise linear cost structure. This model is applied to the Carbon Capture and Storage (CCS) problem, which is to design a minimum-cost pipeline network to transport CO2 between industrial sources and geologic reservoirs for long-term storage.

The third problem extends the proposed decomposition framework to a special case of joint chance-constrained programming with independent random variables. This model is applied to the probabilistic transportation problem, where demands are assumed stochastic and independent. Using an empirical probability distribution, this problem is formulated as an integer program with the goal of finding a minimum-cost distribution plan that satisfies all the demands with at least a given probability. The proposed scalable algorithm is based on a concave envelope approximation of the empirical probability function, which is iteratively refined as needed.
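The empirical chance constraint can be checked by scenario counting, as in the sketch below; the demand distributions, the candidate plan, and the reliability level are made-up illustrations, not data from the dissertation.

```python
# Sketch of an empirical chance-constraint check: given demand scenarios, a
# distribution plan is acceptable if it covers every demand simultaneously
# in at least a fraction p of scenarios.
import numpy as np

rng = np.random.default_rng(3)
scenarios = rng.poisson(lam=[30, 45, 25], size=(1000, 3))  # demand draws
plan = np.array([40, 57, 34])                              # shipped amounts
p = 0.90                                                   # required reliability

covered = (scenarios <= plan).all(axis=1)   # all demands met in a scenario
print("empirical satisfaction probability:", covered.mean())
print("plan acceptable:", bool(covered.mean() >= p))
```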
Contributors: Matin Moghaddam, Navid (Author) / Sefair, Jorge (Thesis advisor) / Mirchandani, Pitu (Committee member) / Escobedo, Adolfo (Committee member) / Grubesic, Anthony (Committee member) / Arizona State University (Publisher)
Created: 2020
Description

To maintain long-term success, a manufacturing company should be managed and operated under the guidance of properly designed capacity, production, and logistics plans that are formulated in coordination with its manufacturing footprint, so that its managerial goals on both strategic and tactical levels can be fulfilled. In particular, sufficient flexibility and efficiency should be ensured so that future customer demand can be met at a profit. This dissertation is motivated by an automobile manufacturer's mid-term and long-term decision problems, but applies to any multi-plant, multi-product manufacturer with evolving product portfolios and significant fixed and variable production costs. By introducing the concepts of effective capacity and product-specific flexibility, two mixed integer programming (MIP) models are proposed to help manufacturers shape their mid-term capacity plans and long-term product allocation plans. With fixed tooling flexibility, production and logistics considerations are integrated into a mid-term capacity planning model to develop well-informed and balanced tactical plans, which utilize various capacity adjustment options to coordinate production, inventory, and shipping schedules throughout the planning horizon so that overall operational and capacity adjustment costs are minimized. For long-term product allocation planning, strategic tooling configuration plans that enable the production of multi-generation products at minimal configuration and operational costs are established for all plants throughout the planning horizon, considering product-specific commonality and compatibility. New product introductions and demand uncertainty over the planning horizon are incorporated. As a result, potential production sites for each product and the corresponding process flexibility are determined. An efficient heuristic method is developed and shown to perform well in solution quality and computational requirements.
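A toy version of the product-allocation decision is sketched below: pay a fixed tooling cost to enable a product at a plant, then meet demand at minimum total cost. This is an illustrative sketch, not the dissertation's formulation, and all plants, products, and numbers are invented.

```python
# Toy product-allocation model: binary tooling decisions with fixed costs,
# continuous production with capacity and demand constraints.
import pulp

plants, products = ["P1", "P2"], ["A", "B"]
tooling = {("P1", "A"): 50, ("P1", "B"): 70, ("P2", "A"): 60, ("P2", "B"): 40}
unit_cost = {("P1", "A"): 2, ("P1", "B"): 3, ("P2", "A"): 3, ("P2", "B"): 2}
capacity = {"P1": 80, "P2": 60}
demand = {"A": 50, "B": 40}

m = pulp.LpProblem("allocation", pulp.LpMinimize)
enable = pulp.LpVariable.dicts("enable", (plants, products), cat="Binary")
make = pulp.LpVariable.dicts("make", (plants, products), lowBound=0)

# Minimize tooling (fixed) plus production (variable) costs.
m += pulp.lpSum(tooling[i, j] * enable[i][j] + unit_cost[i, j] * make[i][j]
                for i in plants for j in products)
for i in plants:
    m += pulp.lpSum(make[i][j] for j in products) <= capacity[i]
    for j in products:
        m += make[i][j] <= capacity[i] * enable[i][j]  # tooling before production
for j in products:
    m += pulp.lpSum(make[i][j] for i in plants) == demand[j]

m.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[m.status], "cost =", pulp.value(m.objective))
```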
Contributors: Yao, Xufeng (Author) / Askin, Ronald (Thesis advisor) / Sefair, Jorge (Thesis advisor) / Escobedo, Adolfo (Committee member) / Yan, Hao (Committee member) / Arizona State University (Publisher)
Created: 2021
Description

Matching or stratification is commonly used in observational studies to remove bias due to confounding variables. Analyzing matched data sets requires specific methods which handle dependency among observations within a stratum. Also, modern studies often include hundreds or thousands of variables. Traditional methods for matched data sets are challenged by high-dimensional settings, mixed-type variables (numerical and categorical), and nonlinear and interaction effects. Furthermore, machine learning research for such structured data is quite limited. This dissertation addresses this important gap and proposes machine learning models for identifying informative variables from high-dimensional matched data sets. The first part of this dissertation proposes a machine learning model to identify informative variables from high-dimensional matched case-control data sets. The outcome of interest in this study design is binary (case or control), and each stratum is assumed to have one unit from each outcome level. The proposed method, referred to as Matched Forest (MF), is effective for large numbers of variables and for identifying interaction effects. The second part of this dissertation proposes three enhancements of the MF algorithm. First, a regularization framework is proposed to improve variable selection performance in excessively high-dimensional settings. Second, a classification method is proposed to classify unlabeled pairs of data. Third, two metrics are proposed to estimate the effects of important variables identified by MF. The third part proposes a machine learning model based on neural networks to identify important variables from a more generalized matched case-control data set where each stratum has one unit from the case outcome level and more than one unit from the control outcome level. This method, referred to as Matched Neural Network (MNN), performs better than current algorithms at identifying variables with interaction effects. Lastly, a generalized machine learning model is proposed to identify informative variables from high-dimensional matched data sets where the outcome has more than two levels. This method outperforms existing algorithms in the literature in identifying variables with complex nonlinear and interaction effects.
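A loose sketch of the within-stratum contrast intuition behind such methods appears below; it is not the published Matched Forest algorithm, and the synthetic data and differencing scheme are assumptions made for illustration.

```python
# Sketch of the within-stratum contrast idea: difference each case against
# its matched control so stratum effects cancel, then rank variables by
# ensemble feature importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
n_strata, n_vars = 300, 20
cases = rng.normal(size=(n_strata, n_vars))
controls = rng.normal(size=(n_strata, n_vars))
cases[:, 3] += 1.0                   # variable 3 is truly informative

# Case-minus-control and control-minus-case contrasts form the two classes;
# the matching structure is encoded through the paired differences.
X = np.vstack([cases - controls, controls - cases])
y = np.array([1] * n_strata + [0] * n_strata)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("top variable by importance:", int(np.argmax(rf.feature_importances_)))
```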
Contributors: Shomal Zadeh, Nooshin (Author) / Runger, George (Thesis advisor) / Montgomery, Douglas (Committee member) / Shinde, Shilpa (Committee member) / Escobedo, Adolfo (Committee member) / Arizona State University (Publisher)
Created: 2021