Matching Items (8)

Description
Data mining is increasingly important for solving a variety of industry problems. Our initiative estimates resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal, we face difficulties such as relevant consumption data stored in different formats and insufficient data about project attributes to interpret the consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once preprocessing is complete, data mining techniques such as clustering are applied to find projects that involve resources with similar skill sets and that have similar complexity and size. This results in "resource utilization templates" for groups of projects that are related from a resource consumption perspective. Next, the project characteristics that generate this diversity in headcounts and skill sets are identified. These characteristics are not currently contained in the database and are elicited from the managers of historical projects, which represents an opportunity to improve the usefulness of the data collection system in the future. The ultimate goal is to match product technical features with the resource requirements of past projects as a model for forecasting resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross-validation on the training data, because past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast the resource demand of some future projects.
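Cross-validation matters here because, with few past projects, an in-sample fit would overstate forecast accuracy. The abstract does not give the regression's predictors or the exact cross-validation scheme, so the following is only a minimal sketch assuming a single numeric technical feature and plain k-fold cross-validation. `fit_ols`, `kfold_cv_mse`, `feature_score`, and `headcount` are hypothetical names, and the values are made up for illustration.

```python
import random


def fit_ols(xs, ys):
    """Fit y = b0 + b1 * x by ordinary least squares (one predictor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def kfold_cv_mse(xs, ys, k=5, seed=0):
    """Mean squared prediction error on held-out folds, averaged over k folds.

    With very few past projects, k = len(xs) gives leave-one-out cross-validation.
    """
    if not 2 <= k <= len(xs):
        raise ValueError("k must be between 2 and the number of observations")
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)        # reproducible random fold assignment
    folds = [idx[i::k] for i in range(k)]   # k disjoint, roughly equal folds
    fold_mses = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test_idx]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k


# Hypothetical example: a technical feature score vs. engineering headcount.
feature_score = [2, 3, 5, 6, 8, 9, 11, 12]
headcount = [5, 7, 11, 12, 17, 18, 23, 24]
print(kfold_cv_mse(feature_score, headcount, k=len(feature_score)))  # leave-one-out
```

Leave-one-out uses every project but one for each fit, which is a common choice when the number of past projects is small; a real model would have several predictors and would need multiple regression rather than this one-feature fit.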
Contributors: Bhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
It has been established in the literature that there is a link between the built environment and non-motorized transport. This study aims to contribute to the existing literature on the effects of the built environment on cycling by examining the case of the entire State of California. Physical built environment features are classified into six groups: 1) local density, 2) land-use diversity, 3) road connectivity, 4) bike route length, 5) green space, and 6) job accessibility. Cycling trips over one week are investigated separately for all children, school children, adults, and employed adults. The regression analysis shows that cycling trips are significantly associated with some features of the built environment when many socio-demographic factors are taken into account. Street intersections and bike route length tend to increase bicycle use, effects that are well aligned with the literature. Moreover, both local and regional job accessibility variables are statistically significant in the two adult models. However, residential density consistently has a significant negative effect on cycling trips, a finding that needs further research to confirm. There is also a gap in the literature on how green space affects cycling, but the results of this study are still too unclear to fill it. Based on elasticity analysis, this study concludes that street intersections are the most powerful predictor of cycling trips. In addition, the effects of the built environment on cycling at the workplace (or school) are distinguished from those at home. This study implies that planners have a wide range of measures available to control vehicle travel by increasing cycling levels in California.
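Elasticities are unit-free, which is what allows predictors measured in different units (intersection counts, kilometres of bike route) to be ranked against one another. The abstract does not state which regression specification was used, so the following is a hedged sketch of two common textbook forms: a linear model evaluated at the sample means, and a log-link count model. The function names are hypothetical, and no coefficients from the study are reproduced.

```python
def elasticity_linear_at_means(beta, x_mean, y_mean):
    """Elasticity of E[y] with respect to x in a linear model, at the sample means."""
    # dE[y]/dx = beta, so the elasticity (dE[y]/dx) * (x / E[y]) is beta * x_mean / y_mean.
    return beta * x_mean / y_mean


def elasticity_log_link(beta, x):
    """Elasticity in a log-link count model, where log E[y] = ... + beta * x."""
    # d log E[y] / d log x = beta * x, so it depends on the value of x chosen.
    return beta * x


def rank_by_elasticity(elasticities):
    """Order predictor names from largest to smallest absolute elasticity."""
    return sorted(elasticities, key=lambda name: abs(elasticities[name]), reverse=True)
```

A statement such as "street intersections are the most powerful predictor" corresponds to that variable coming first in such a ranking; the ranking can differ between the two specifications because the log-link elasticity varies with `x`.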
Contributors: Wang, Kailai, M.U.E.P (Author) / Salon, Deborah (Thesis advisor) / Rey, Sergio (Committee member) / Li, Wenwen (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Urban scaling analysis has introduced a new scientific paradigm to the study of cities. With it, the notions of size, heterogeneity, and structure have taken a leading role. These notions are assumed to explain why cities differ from one another, sometimes wildly. However, the mechanisms by which size, heterogeneity, and structure shape the general statistical patterns that describe urban economic output are still unclear. Given the rapid rate of urbanization around the globe, we need a precise, formal mathematical understanding of these matters. In this dissertation, I perform probabilistic, distributional, and computational explorations of (i) how the broadness, or narrowness, of the distribution of individual productivities within cities determines what and how we measure urban systemic output, (ii) how urban scaling may be expressed as a statistical statement when urban metrics display strong stochasticity, (iii) how the processes of aggregation constrain the variability of total urban output, and (iv) how the structure of urban skills diversification within cities induces a multiplicative process in the production of urban output.
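Urban scaling relations are usually written as Y = Y0 * N**beta, where N is city population and Y an aggregate output; beta > 1 means output grows faster than population. A standard baseline, not necessarily the estimator used in this dissertation, is ordinary least squares on log Y versus log N, sketched below with the hypothetical helper `fit_scaling_exponent`. Note the caveat relevant to point (ii): this fit describes how the mean of log Y (a geometric-mean quantity) scales, which can differ from how the expected value of Y scales when output is strongly stochastic.

```python
import math


def fit_scaling_exponent(populations, outputs):
    """Estimate Y0 and beta in Y = Y0 * N**beta by OLS on log Y versus log N."""
    log_n = [math.log(n) for n in populations]
    log_y = [math.log(y) for y in outputs]
    m = len(log_n)
    ln_bar = sum(log_n) / m
    ly_bar = sum(log_y) / m
    # Slope of the log-log regression line is the scaling exponent beta.
    beta = (sum((a - ln_bar) * (b - ly_bar) for a, b in zip(log_n, log_y))
            / sum((a - ln_bar) ** 2 for a in log_n))
    log_y0 = ly_bar - beta * ln_bar
    return math.exp(log_y0), beta
```

For noise-free data generated as `2.0 * n**1.15`, the function recovers `Y0 = 2.0` and `beta = 1.15` up to floating-point error.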
Contributors: Gómez-Liévano, Andrés (Author) / Lobo, Jose (Thesis advisor) / Muneepeerakul, Rachata (Thesis advisor) / Bettencourt, Luis M. A. (Committee member) / Chowell-Puente, Gerardo (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Coarsely grouped counts or frequencies are commonly used in the behavioral sciences. Grouped count and grouped frequency (GCGF) variables used as outcomes often violate the assumptions of linear regression as well as those of models designed for categorical outcomes; no existing analytic model is designed specifically to accommodate GCGF outcomes. The purpose of this dissertation was to compare the statistical performance of four regression models (linear regression, Poisson regression, ordinal logistic regression, and beta regression) that can be used when the outcome is a GCGF variable. A simulation study was used to determine the power, type I error, and confidence interval (CI) coverage rates of these models under different conditions. Mean structure, variance structure, effect size, continuous or binary predictor, and sample size were included in the factorial design. Mean structures reflected either a linear relationship or an exponential relationship between the predictor and the outcome. Variance structures reflected homoscedastic (as in linear regression), heteroscedastic (monotonically increasing), or heteroscedastic (increasing then decreasing) variance. Small-to-medium, large, and very large effect sizes were examined. Sample sizes were 100, 200, 500, and 1000. Results of the simulation study showed that ordinal logistic regression produced type I error, statistical power, and CI coverage rates that were consistently within acceptable limits. Linear regression produced type I error and statistical power that were within acceptable limits, but CI coverage was too low for several conditions important to the analysis of counts and frequencies. Poisson regression and beta regression displayed inflated type I error, low statistical power, and low CI coverage rates for nearly all conditions. All models produced unbiased estimates of the regression coefficient.
Based on the statistical performance of the four models, ordinal logistic regression appears to be the preferred method for analyzing GCGF outcomes. Linear regression also performed well, but CI coverage was too low for conditions with an exponential mean structure and/or heteroscedastic variance. Some aspects of model prediction, such as model fit, were not assessed here; more research is needed to determine which statistical model best captures the unique properties of GCGF outcomes.
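A type I error rate in a simulation study like this one is the share of data sets, generated with no true effect, in which the test still rejects. The following is a minimal sketch of one such cell for linear regression only, under assumptions not taken from the dissertation: raw counts are Poisson with mean 4, they are grouped as 0, 1-2, 3-5, 6-10, 11+, n = 100, and the test uses the large-sample normal critical value 1.96 in place of the t quantile. All function names are hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson variate via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def coarsen(count, cutpoints=(1, 3, 6, 11)):
    """Group a raw count into ordered codes: 0 | 1-2 | 3-5 | 6-10 | 11+ -> 0..4."""
    return sum(count >= c for c in cutpoints)


def slope_z(xs, ys):
    """OLS slope divided by its standard error."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        # Perfect fit: infinite statistic unless the slope itself is zero.
        return math.copysign(math.inf, b1) if b1 != 0 else 0.0
    return b1 / se


def type1_error_rate(n=100, reps=2000, lam=4.0, crit=1.96, seed=1):
    """Share of null data sets in which the slope is declared significant."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The outcome is generated independently of x, so the null hypothesis is true.
        ys = [coarsen(poisson_draw(lam, rng)) for _ in range(n)]
        if abs(slope_z(xs, ys)) > crit:
            rejections += 1
    return rejections / reps


print(type1_error_rate())  # compare with the nominal 0.05
```

Power is estimated with the same loop after building a non-zero slope into the data-generating model, and CI coverage by counting how often the interval around `b1` contains the true slope.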
Contributors: Coxe, Stefany (Author) / Aiken, Leona S. (Thesis advisor) / West, Stephen G. (Thesis advisor) / Mackinnon, David P (Committee member) / Reiser, Mark R. (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
A least-total-area-of-triangles method was proposed by Teissier (1948) for fitting a straight line to data from a pair of variables without treating either variable as the dependent variable, while allowing each variable to have measurement error. This method is commonly called Reduced Major Axis (RMA) regression and is often used instead of Ordinary Least Squares (OLS) regression. Results for confidence intervals, hypothesis testing, and asymptotic distributions of coefficient estimates in the bivariate case are reviewed. A generalization of RMA to more than two variables, for fitting a plane to data, is obtained by minimizing the sum of a function of the volumes obtained by drawing, from each data point, lines parallel to each coordinate axis to the fitted plane (Draper and Yang 1997; Goodman and Tofallis 2003). Generalized RMA results for the multivariate case obtained by Draper and Yang (1997) are reviewed, and some investigations of multivariate RMA are given. A linear model is proposed that does not specify a dependent variable and allows for errors in the measurement of each variable. Coefficients in the model are estimated by minimizing the function of the volumes mentioned above. Methods for obtaining coefficient estimates are discussed, and simulations are used to investigate the distribution of coefficient estimates. The effects of sample size, sampling error, and correlation among variables on the estimates are studied. Bootstrap methods are used to obtain confidence intervals for model coefficients. Residual analysis is considered for assessing model assumptions. Outlier and influential case diagnostics are developed, and a forward selection method is proposed for subset selection of model variables. A real data example that uses the developed methods is provided. Topics for further research are discussed.
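In the bivariate case, Teissier's criterion has a well-known closed-form solution: the fitted line passes through the means of both variables, and its slope is sign(r) * s_y / s_x, which equals the geometric mean of the OLS slope of y on x and the reciprocal of the OLS slope of x on y. The following is a minimal sketch of that bivariate fit only (the multivariate, volume-based generalization studied in the thesis is not shown); `rma_fit` is a hypothetical name.

```python
import math
import statistics


def rma_fit(xs, ys):
    """Reduced major axis line y = intercept + slope * x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # The magnitude depends only on the spreads; the sign follows the correlation.
    # (When the correlation is exactly zero the RMA slope's sign is undefined.)
    slope = math.copysign(sy / sx, sxy)
    intercept = y_bar - slope * x_bar   # line passes through (x_bar, y_bar)
    return intercept, slope


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(rma_fit(xs, ys))
```

Because neither variable is treated as dependent, the fit is symmetric: swapping `xs` and `ys` yields the reciprocal slope, which OLS does not guarantee.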
Contributors: Li, Jingjin (Author) / Young, Dennis (Thesis advisor) / Eubank, Randall (Thesis advisor) / Reiser, Mark R. (Committee member) / Kao, Ming-Hung (Committee member) / Yang, Yan (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
Bayesian Additive Regression Trees (BART) is a non-parametric Bayesian model that often outperforms other popular predictive models in terms of out-of-sample error. This thesis studies a modified version of BART called Accelerated Bayesian Additive Regression Trees (XBART). The study consists of simulation and real data experiments comparing XBART to other leading algorithms, including BART. The results show that XBART maintains BART’s predictive power while reducing its computation time. The thesis also describes the development of a Python package implementing XBART.
Contributors: Yalov, Saar (Author) / Hahn, P. Richard (Thesis advisor) / McCulloch, Robert (Committee member) / Kao, Ming-Hung (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
Time-to-event analysis, or equivalently survival analysis, deals with two variables simultaneously: when an event occurs (time information) and whether the event is observed during the observation period (censoring information). In the behavioral and social sciences, the event of interest usually does not lead to a terminal state such as death. Other outcomes can be collected after the event, so the survival variable can serve as a predictor as well as an outcome in a study. One example of such a case is the survival-mediator model. In a single survival-mediator model, an independent variable X predicts a survival variable M, which in turn predicts a continuous outcome Y. The survival-mediator model consists of two regression equations: X predicting M (M-regression), and M and X simultaneously predicting Y (Y-regression). To estimate the regression coefficients of the survival-mediator model, Cox regression is used for the M-regression. Ordinary least squares regression with complete case analysis is used for the Y-regression, assuming that censored data in M are missing completely at random so that the Y-regression is unbiased. In this dissertation research, different measures of the indirect effect were proposed, and a simulation study was conducted to compare the performance of different indirect effect test methods. Bias-corrected bootstrapping produced high Type I error rates as well as low parameter coverage rates in some conditions. In contrast, the Sobel test produced low Type I error rates as well as high parameter coverage rates in some conditions. The bootstrap of the natural indirect effect produced low Type I error and low statistical power when the censoring proportion was non-zero. Percentile bootstrapping, the distribution of the product, and the joint-significance test showed the best performance. Statistical analysis of the survival-mediator model is discussed.
Two indirect effect measures, the ab-product and the natural indirect effect, are compared and discussed. Limitations and future directions of the simulation study are discussed. Finally, an interpretation of the survival-mediator model for a made-up empirical data set is provided to clarify the meaning of the quantities in the model.
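Two of the tests compared above have simple closed forms. Given an estimate a of the X-to-M path (here, a Cox regression coefficient on the log-hazard scale) and an estimate b of the M-to-Y path (the OLS coefficient of M in the Y-regression), together with their standard errors, the first-order Sobel test divides ab by sqrt(b²·se_a² + a²·se_b²), and the joint-significance test declares an indirect effect only when a and b are each significant. The following is a minimal sketch that takes those estimates as inputs; it does not fit the Cox or OLS models, it does not implement the dissertation's proposed indirect-effect measures, and the function names and numbers are hypothetical.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_test(a, se_a, b, se_b):
    """First-order Sobel test of the indirect effect a * b."""
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = a * b / se_ab
    return z, normal_two_sided_p(z)


def joint_significance(a, se_a, b, se_b, crit=1.959963984540054):
    """Indirect effect declared significant only if both a and b are significant."""
    return abs(a / se_a) > crit and abs(b / se_b) > crit


# Hypothetical path estimates and standard errors.
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(z, 3), round(p, 4))
print(joint_significance(0.5, 0.1, 0.4, 0.1))
```

Because a here is on the log-hazard scale while b is on the scale of Y per unit of M, the product ab mixes scales, which is one reason the abstract contrasts the ab-product with the natural indirect effect.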
Contributors: Kim, Han Joe (Author) / Mackinnon, David P. (Thesis advisor) / Tein, Jenn-Yun (Thesis advisor) / West, Stephen G. (Committee member) / Grimm, Kevin J. (Committee member) / Arizona State University (Publisher)
Created: 2017
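Description
Urban investment bonds (chengtou bonds) are bonds issued by local government financing platforms. The funds raised are mostly invested in local government infrastructure construction or public-welfare projects, and the bonds carry an implicit guarantee backed by local government credit. To some extent, urban investment bonds have eased the funding shortages that local governments face during urban development, and they have made major contributions to China's urbanization, to promoting local economic development, and to guiding industrial transformation and upgrading. As urban investment bonds have continued to develop, their credit spread, the main indicator of their credit risk, has attracted increasing attention. Underwriters, investors, and the policymaking bodies responsible for controlling urban investment bond risk all pay attention to this credit spread, which raises two questions: what factors influence the credit spread of urban investment bonds, and which of these factors have a significant effect on it? This thesis first introduces the theoretical concepts related to urban investment bonds, including government financing platforms and the concept of urban investment bonds, reviews the related literature, and points out some shortcomings of previous research. It also briefly describes the development of urban investment bonds and presents related statistics. Next, it analyzes in detail the factors affecting the credit risk of urban investment bonds, mainly macroeconomic factors, local government factors, issuer factors, and factors specific to the bond issue itself. By analyzing each factor in detail, it formulates hypotheses about the relationship between each factor and the credit spread. It then uses secondary data and empirical regression analysis to test whether each hypothesis holds, identifying the main common factors affecting the credit risk of urban investment bonds as well as the factors with the strongest influence. Finally, based on these conclusions, it proposes countermeasures and recommendations for preventing and reducing the credit risk of urban investment bonds. On the one hand, the study guides the market to confront the various factors behind urban investment bond credit spreads and clarifies whether the factors commonly assumed to matter agree with those identified by theoretical research; on the other hand, it identifies the key factors affecting these spreads for reference by underwriters and investors, highlights the core issues that risk prevention should focus on, and provides an important reference for preventing and reducing urban investment bond risk.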
Contributors: Li, Juhui (Author) / Gu, Bin (Thesis advisor) / Liang, Bing (Thesis advisor) / Wang, Tan (Committee member) / Arizona State University (Publisher)
Created: 2019