Description
Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provides a high-dimensional data vector that challenges the learning of the relevant patterns. This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning, which provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a classification framework that can handle high-dimensional feature spaces, multiple classes and interactions between features. The proposed representations are useful for the classification and interpretation of TS data of varying complexity. The first contribution handles the problem of time warping with a feature-based approach. An interval selection and local feature extraction strategy is proposed to learn a bag-of-features representation. This differs distinctly from common similarity-based time warping and allows additional features (such as pattern location) to be easily integrated into the models. The learners can account for temporal information through recursive partitioning. The second contribution focuses on the comprehensibility of the models. A new representation is integrated with local feature importance measures from tree-based ensembles to diagnose and interpret the time intervals that are important to the model. Multivariate time series (MTS) are especially challenging because the input consists of a collection of TS, and both features within each TS and interactions between TS can be important to the models.
Another contribution uses a different representation to produce computationally efficient strategies that learn a symbolic representation for MTS. Relationships between the multiple TS, as well as nominal and missing values, are handled with tree-based learners. Applications such as speech recognition, medical diagnosis and gesture recognition are used to illustrate the methods. Experimental results show that the TS representations and methods provide better results than competing methods on a comprehensive collection of benchmark datasets. Moreover, the proposed approaches naturally provide solutions to similarity analysis, predictive pattern discovery and feature selection.
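The interval selection and local feature extraction step can be sketched roughly as follows. This is a minimal illustration under assumptions, not the dissertation's implementation: the random interval sampling, the choice of mean, standard deviation and slope as local features, and the helper names `interval_features` and `bag_of_interval_features` are all hypothetical. Each row keeps the interval's start and length so that a tree-based learner could also split on pattern location.

```python
import random
from statistics import mean, pstdev


def interval_features(series, start, end):
    """Return (mean, std, least-squares slope) of series[start:end]."""
    seg = series[start:end]
    n = len(seg)
    if n < 2:
        raise ValueError("an interval needs at least two points")
    t_bar = (n - 1) / 2
    y_bar = mean(seg)
    num = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(seg))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return y_bar, pstdev(seg), num / den


def bag_of_interval_features(series, n_intervals, min_len=3, seed=0):
    """Hypothetical helper: sample random intervals and describe each one.

    Each row is (start, length, mean, std, slope); keeping start and length
    lets a tree learner use pattern location as an additional feature.
    """
    if len(series) < min_len:
        raise ValueError("series is shorter than min_len")
    rng = random.Random(seed)
    rows = []
    for _ in range(n_intervals):
        length = rng.randint(min_len, len(series))
        start = rng.randint(0, len(series) - length)
        rows.append((start, length) + interval_features(series, start, start + length))
    return rows


rows = bag_of_interval_features([0, 1, 3, 2, 5, 4, 6, 8, 7, 9], n_intervals=4)
```

In a bag-of-features setting, rows like these from many series would be fed to a tree ensemble; the fixed seed only makes the sketch reproducible.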
Contributors: Baydogan, Mustafa Gokce (Author) / Runger, George C. (Thesis advisor) / Atkinson, Robert (Committee member) / Gel, Esma (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)
Created: 2012
Description
The amount of time series data generated is increasing due to the integration of sensor technologies with everyday applications such as gesture recognition, energy optimization, health care and video surveillance. The simultaneous use of multiple sensors to capture different aspects of real-world attributes has also increased dimensionality, from uni-variate to multi-variate time series. This has enabled richer data representations but has also created a need for algorithms that determine the similarity between two multi-variate time series for search and analysis.

Various algorithms have been extended from the uni-variate to the multi-variate case, such as multi-variate versions of Euclidean distance, edit distance and dynamic time warping. However, how these algorithms account for asynchrony in time series has not been studied. Human gestures, for example, exhibit asynchrony in their patterns, as different subjects perform the same gesture with varying movements and at different speeds. In this thesis, we propose several algorithms (some of which also leverage metadata describing the relationships among the variates). In particular, we present several techniques that leverage the contextual relationships among the variates when measuring multi-variate time series similarities. Depending on how correlation is leveraged, various weighting mechanisms are proposed that determine the importance of a dimension for discriminating between time series, since giving the same weight to each dimension can lead to misclassification. We then study the robustness of the considered techniques against different temporal asynchronies, including shifts and stretching.
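As a rough illustration of dimension weighting, the sketch below computes a classic DTW distance per variate and combines the per-variate distances with weights. It is a simplified stand-in, not one of the thesis's algorithms: the weights are supplied by the caller, whereas the thesis derives them from data correlations or external metadata, and the function names are hypothetical.

```python
import math


def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping steps
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def weighted_mdtw(x, y, weights):
    """Weighted average of per-variate DTW distances (hypothetical helper).

    x and y are lists of variates (each a list of floats); weights holds one
    non-negative importance value per variate.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights need one entry per variate")
    total = sum(weights)
    return sum(w * dtw(xv, yv) for w, xv, yv in zip(weights, x, y)) / total
```

Setting a variate's weight to zero removes it from the comparison, which is the effect a weighting scheme relies on when a dimension carries little discriminative information.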

Exhaustive experiments were carried out on datasets with multiple types and amounts of temporal asynchrony. We observed that the accuracy of algorithms that rely on the data to discover variate relationships can be low in the presence of temporal asynchrony, whereas algorithms that rely on external metadata tend to be more robust against asynchronous distortions. Specifically, algorithms using external metadata achieve better classification accuracy and cluster separation than existing state-of-the-art work such as EROS, PCA and naive dynamic time warping.
Contributors: Garg, Yash (Author) / Candan, Kasim Selcuk (Thesis advisor) / Chowell-Punete, Gerardo (Committee member) / Tong, Hanghang (Committee member) / Davulcu, Hasan (Committee member) / Sapino, Maria Luisa (Committee member) / Arizona State University (Publisher)
Created: 2015
Description
Predicting resistant prostate cancer is critical for lowering medical costs and improving the quality of life of advanced prostate cancer patients. I formulate, compare, and analyze two mathematical models that aim to forecast future levels of prostate-specific antigen (PSA), using clinical data from patients with locally advanced prostate cancer undergoing androgen deprivation therapy (ADT). I demonstrate that the inverse problem of parameter estimation may be too complicated for data fitting alone: because the estimated parameter values carry large errors and some parameters may be unidentifiable, relying only on data fitting can lead to incorrect conclusions. Using data assimilation via an ensemble Kalman filter, I produce forecasts with confidence intervals and perform dual estimation of parameters and state variables to test the prediction accuracy of the models. Finally, I present a novel model with a time delay and a delay-dependent parameter. I provide a geometric stability result to study the behavior of this model and show that including the time delay may improve prediction accuracy. I also demonstrate with clinical data that the delay-dependent parameter facilitates the identification and estimation of parameters.
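The dual estimation idea can be illustrated with one analysis step of a stochastic ensemble Kalman filter on an augmented state [x, theta], where only x (for example a PSA level) is observed and theta is an unknown model parameter. This is a minimal sketch under assumptions, not the dissertation's models or filter settings: the forecast step that propagates each member through the PSA model is omitted, and the function name and example values are hypothetical. The parameter is corrected only through its sample covariance with the observed state.

```python
import random
from statistics import mean


def enkf_update(ensemble, obs, obs_var, rng):
    """One stochastic EnKF analysis step for augmented members [x, theta].

    Only x (component 0) is observed. Each component k gets the gain
    cov(z_k, x) / (var(x) + obs_var), so theta moves with the x innovation.
    """
    xs = [member[0] for member in ensemble]
    x_bar = mean(xs)
    n = len(ensemble)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    gains = []
    for k in range(len(ensemble[0])):
        comp = [member[k] for member in ensemble]
        c_bar = mean(comp)
        cov = sum((c - c_bar) * (x - x_bar) for c, x in zip(comp, xs)) / (n - 1)
        gains.append(cov / (var_x + obs_var))
    updated = []
    for member in ensemble:
        y = obs + rng.gauss(0.0, obs_var ** 0.5)  # perturbed observation
        innovation = y - member[0]
        updated.append([v + g * innovation for v, g in zip(member, gains)])
    return updated


rng = random.Random(1)
prior = []
for _ in range(50):
    x = rng.gauss(10.0, 2.0)                           # prior guesses of the state
    prior.append([x, 0.5 * x + rng.gauss(0.0, 0.1)])   # parameter correlated with x
posterior = enkf_update(prior, obs=4.0, obs_var=0.25, rng=rng)
```

In a full filter this step alternates with a forecast step, and the spread of the ensemble after each update is what supplies the forecast intervals.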
Contributors: Baez, Javier (Author) / Kuang, Yang (Thesis advisor) / Kostelich, Eric (Committee member) / Crook, Sharon (Committee member) / Gardner, Carl (Committee member) / Nagy, John (Committee member) / Arizona State University (Publisher)
Created: 2017
Description
Aquifers host the largest accessible freshwater resource in the world. However, groundwater reserves are declining in many places. Often coincident with drought, high extraction rates and inadequate replenishment result in groundwater overdraft and permanent land subsidence. Land subsidence reduces aquifer storage capacity, alters topographic gradients in ways that can exacerbate floods, and causes differential displacement that can lead to earth fissures and infrastructure damage. A better understanding of the sources and mechanisms driving aquifer deformation is important for resource management planning and hazard mitigation.

Poroelastic theory describes the coupling of differential stress, strain and pore pressure, which are modulated by material properties. To model these relationships, displacement time series are estimated via satellite interferometry, and hydraulic head levels from observation wells provide an in-situ dataset. Combining the two datasets and isolating selected time-frequency components allows estimation of aquifer parameters, including the elastic and inelastic storage coefficients, compaction time constants and vertical hydraulic conductivity. Together these parameters describe the storage response of an aquifer system to changes in hydraulic head and surface elevation, and understanding them is useful for the ongoing management of groundwater resources.
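For the elastic (seasonal) part of the signal, a storage coefficient is often approximated as the ratio of vertical displacement to the change in hydraulic head, S = Δb/Δh. The sketch below estimates that ratio as a least-squares slope. It assumes both series are already co-located, resampled to common epochs and filtered to the same time-frequency band, which are the steps the dissertation's decomposition handles and which are not reproduced here; the function name and example values are hypothetical.

```python
def storage_coefficient(displacement_m, head_m):
    """Least-squares slope of vertical displacement against hydraulic head.

    Both inputs are in metres, so the returned S = Δb / Δh is dimensionless.
    Positive values mean the surface rises as the head rises (elastic response).
    """
    n = len(head_m)
    if n != len(displacement_m) or n < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    h_bar = sum(head_m) / n
    d_bar = sum(displacement_m) / n
    num = sum((h - h_bar) * (d - d_bar) for h, d in zip(head_m, displacement_m))
    den = sum((h - h_bar) ** 2 for h in head_m)
    if den == 0:
        raise ValueError("head does not vary, so the slope is undefined")
    return num / den


# hypothetical seasonal head swing of 4 m with 8 mm of matching uplift
s = storage_coefficient([0.0, 0.004, 0.008, 0.004], [100.0, 102.0, 104.0, 102.0])
```

Applied to long-term (inelastic) components instead of seasonal ones, the same ratio would describe permanent compaction rather than recoverable deformation.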

Case studies in Phoenix and Tucson, Arizona, focus on land subsidence from groundwater withdrawal as well as distinct responses to artificial recharge efforts. In Christchurch, New Zealand, possible changes to aquifer properties due to earthquakes are investigated. In Houston, Texas, flood severity during Hurricane Harvey is linked to subsidence, which modifies base flood elevations and topographic gradients.
Contributors: Miller, Megan Marie (Author) / Shirzaei, Manoochehr (Thesis advisor) / Reynolds, Stephen (Committee member) / Tyburczy, James (Committee member) / Semken, Steven (Committee member) / Werth, Susanna (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Recent technological advances enable the collection of complex, heterogeneous and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates a need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns and improve decision making. All the proposed methods can be generalized to other industrial fields.

The first topic of this dissertation is data clustering, which is often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner.

The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability.

The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification with an emphasis on both accuracy and interpretability. GCRNN contains a convolutional network component to extract high-level features and a recurrent network component to enhance the modeling of temporal characteristics. A feed-forward, fully connected network with sparse group lasso regularization generates the final classification and provides good interpretability.
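The sparse group lasso penalty mentioned above combines an L1 term, which zeroes individual weights, with a group L2 term, which can switch off a whole group of weights at once (for instance, all weights attached to one input feature). The sketch below evaluates the standard penalty, lam * [(1 - alpha) * sum_g sqrt(p_g) * ||w_g||_2 + alpha * ||w||_1]. How GCRNN groups its weights and sets lam and alpha is not stated here, so the grouping in the example is an assumption.

```python
import math


def sparse_group_lasso(weights, groups, lam, alpha):
    """Sparse group lasso penalty for a flat weight vector.

    groups is a list of index lists; p_g = len(group) scales each group norm.
    alpha = 1 gives the plain lasso, alpha = 0 the plain group lasso.
    """
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(weights[i] ** 2 for i in g))
        for g in groups
    )
    l1_term = sum(abs(w) for w in weights)
    return lam * ((1 - alpha) * group_term + alpha * l1_term)


# two hypothetical feature groups; the second group is entirely zero
penalty = sparse_group_lasso([3.0, 4.0, 0.0, 0.0], [[0, 1], [2, 3]], lam=1.0, alpha=0.5)
```

Because a group that is entirely zero contributes nothing to either term, inspecting which groups survive training is what makes the final layer interpretable.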

The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for storage, decision making and pattern visualization with time series data. The proposed CRNN autoencoder not only achieves low reconstruction error but also generates discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control.
Contributors: Lin, Sangdi (Author) / Runger, George C. (Thesis advisor) / Kocher, Jean-Pierre A (Committee member) / Pan, Rong (Committee member) / Escobedo, Adolfo R. (Committee member) / Arizona State University (Publisher)
Created: 2018
Description
Unsupervised learning of time series data, also known as temporal clustering, is a challenging problem in machine learning. This thesis presents a novel algorithm, Deep Temporal Clustering (DTC), that naturally integrates dimensionality reduction and temporal clustering into a single, fully unsupervised, end-to-end learning framework. The algorithm uses an autoencoder for temporal dimensionality reduction and a novel temporal clustering layer for cluster assignment, and it jointly optimizes the clustering objective and the dimensionality reduction objective. Depending on the requirements of the application, the temporal clustering layer can be customized with any temporal similarity metric. Several similarity metrics and state-of-the-art algorithms are considered and compared. To gain insight into the temporal features the network learns for clustering, a visualization method is applied that generates a region-of-interest heatmap for the time series. The viability of the algorithm is demonstrated on time series data from diverse domains, ranging from earthquakes to spacecraft sensor data. In each case, the proposed algorithm outperforms traditional methods, a result attributed to the fully integrated temporal dimensionality reduction and clustering criterion.
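A clustering layer of this kind is commonly built, as in deep embedded clustering, from a Student's t soft assignment between latent representations and cluster centroids, plus a sharpened target distribution whose KL divergence from the assignments serves as the clustering loss. The sketch below assumes that formulation with Euclidean distance; the similarity function is a parameter because DTC allows other temporal metrics, and the autoencoder, the joint optimization and the centroid updates are omitted.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def soft_assign(latents, centroids, dist=euclidean, alpha=1.0):
    """q_ij proportional to (1 + dist(z_i, w_j)^2 / alpha)^(-(alpha + 1) / 2)."""
    q = []
    for z in latents:
        row = [(1.0 + dist(z, w) ** 2 / alpha) ** (-(alpha + 1) / 2) for w in centroids]
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """p_ij = (q_ij^2 / f_j) / sum_k (q_ik^2 / f_k), with f_j = sum_i q_ij."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q) summed over all samples and clusters."""
    return sum(pij * math.log(pij / qij)
               for pr, qr in zip(p, q) for pij, qij in zip(pr, qr))


q = soft_assign([[0.0], [0.1], [5.0]], [[0.0], [5.0]])
p = target_distribution(q)
```

Squaring q and renormalizing pushes confident assignments closer to 1, so minimizing KL(P || Q) pulls latent points toward their nearest centroid.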
Contributors: Madiraju, NaveenSai (Author) / Liang, Jianming (Thesis advisor) / Wang, Yalin (Thesis advisor) / He, Jingrui (Committee member) / Arizona State University (Publisher)
Created: 2018
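Description
After 30 years of development, China's commodity futures market has begun to perform the functions of coordinating resource allocation and hedging business risk. However, constrained by the industries themselves and by the development of the futures market, market efficiency varies widely across futures products. As China's economy moves from a stage of incremental expansion to one focused on the existing stock, futures have become increasingly important to firms as tools for price management and risk control, so studying the efficiency of China's commodity futures markets has considerable practical significance.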

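This thesis takes the original approach of starting from the basic function of futures, resource allocation, and proposes that an efficient market is one in which futures prices can reasonably allocate the social resources of the corresponding industry. The thesis first summarizes the characteristics of mature international futures products and identifies four hypothesized indicators that reflect the ability of futures to allocate resources: futures-spot convergence, profit volatility, inventory volatility and cash-flow change. It then uses mathematical models to demonstrate the association between the indicator data and product maturity, and finally applies this set of indicators to test the efficiency of China's commodity markets. Methodologically, the thesis first applies the Bai-Perron endogenous multiple structural break model to detect break points in the time series and then runs separate multiple regressions on the segments between break points. After removing seasonality and cyclicality, it selects the appropriate GARCH model based on stationarity tests and ARCH-effect tests, and uses that GARCH model to describe the volatility of the time series.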
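The volatility step can be illustrated with the GARCH(1,1) conditional variance recursion, sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2. This is a sketch under assumptions: it presumes the residuals eps are already mean-zero after the regression and de-seasoning steps, it takes the parameters as given rather than estimating them by maximum likelihood, and the thesis may have selected a different GARCH specification for each series. The function name and example values are hypothetical.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) process.

    sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1], started
    from the unconditional variance omega / (1 - alpha - beta).
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


variances = garch11_variance([0.0, 0.1, -0.2], omega=0.01, alpha=0.1, beta=0.8)
```

A large alpha + beta close to 1 means shocks to volatility die out slowly, which is the kind of persistence such indicators compare across futures products.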

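Through mathematical verification, we conclude that the four indicators of futures-spot convergence, profit volatility, inventory volatility and cash-flow change can serve as test indicators of futures maturity. Applying this method to several actively traded domestic products, we find that Dalian soybean meal futures already exhibit the characteristics of a mature product, and we consider the soybean meal futures market efficient. The four indicators for PTA and corn starch futures have improved over time in recent years but, because the period is short, are not yet stable; these can be regarded as nearly mature products. Rebar and aluminum futures perform poorly on most indicators, indicating a weak ability to allocate social resources, so we consider the rebar and aluminum futures markets active but not efficient. Further analysis suggests that poor futures-spot convergence is the key factor limiting a product's resource-allocation ability, and that an ill-defined underlying commodity, the difficulty of creating warehouse receipts, low industry participation and other constraints in contract design are important causes of poor convergence.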
Contributors: Wang, Ping (Author) / Gu, Bin (Thesis advisor) / Li, Feng (Thesis advisor) / Yan, Hong (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
With improvements in technology, intensive longitudinal studies that permit the investigation of daily and weekly cycles in behavior have increased exponentially over the past few decades. Traditionally, when data have been collected on two variables over time, multivariate time series approaches that remove trends, cycles, and serial dependency have been used. These analyses permit the study of the relationship between random shocks (perturbations) in the presumed causal series and changes in the outcome series, but do not permit the study of the relationships between cycles. Liu and West (2016) proposed a multilevel approach that permits the study of potential between-subject relationships between features of the cycles in two series (e.g., amplitude). However, I show that the Liu and West approach is restricted to a small set of features and types of relationships between the series. Several authors (e.g., Boker & Graham, 1998) have proposed a connected mass-spring model that appears to permit modeling of more general cyclic relationships. I show that the undamped connected mass-spring model is also limited and may be unidentified. To test the severity of the restrictions on the motion trajectories producible by the undamped connected mass-spring model, I mathematically derived their connection to the force equations of the undamped connected mass-spring system. The mathematical solution describes the domain of trajectory pairs that the undamped connected mass-spring model can produce. This set of trajectory pairs is highly restricted, which places major limitations on applying the connected mass-spring model to psychological data. I used a simulation to demonstrate that even if a pair of psychological time-varying variables behaved exactly like two masses in an undamped connected mass-spring system, the connected mass-spring model would not yield adequate parameter estimates.
My simulation probed the performance of the connected mass-spring model as a function of several aspects of data quality including number of subjects, series length, sampling rate relative to the cycle, and measurement error in the data. The findings can be extended to damped and nonlinear connected mass-spring systems.
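The restriction on producible trajectory pairs can be seen numerically in a small simulation of an undamped two-mass system, m1 x1'' = -k1 x1 - kc (x1 - x2) and m2 x2'' = -k2 x2 - kc (x2 - x1). This is an illustrative sketch, not the thesis's derivation or simulation design: the parameterization, the semi-implicit Euler integrator and the example values are assumptions. With identical masses and springs and an in-phase start from rest, the coupling force vanishes and the two trajectories are identical for any coupling constant kc, so such data carry no information about kc.

```python
def simulate_coupled(x1, x2, k1, k2, kc, m1=1.0, m2=1.0, dt=0.001, steps=10000):
    """Integrate an undamped connected mass-spring system starting from rest.

    Uses semi-implicit (symplectic) Euler, which keeps the energy of an
    undamped oscillator bounded. Returns the two position trajectories.
    """
    v1 = v2 = 0.0
    traj1, traj2 = [x1], [x2]
    for _ in range(steps):
        a1 = (-k1 * x1 - kc * (x1 - x2)) / m1
        a2 = (-k2 * x2 - kc * (x2 - x1)) / m2
        v1 += a1 * dt
        v2 += a2 * dt
        x1 += v1 * dt
        x2 += v2 * dt
        traj1.append(x1)
        traj2.append(x2)
    return traj1, traj2


weak_a, weak_b = simulate_coupled(1.0, 1.0, k1=1.0, k2=1.0, kc=0.1)
strong_a, strong_b = simulate_coupled(1.0, 1.0, k1=1.0, k2=1.0, kc=5.0)
```

Here a weakly and a strongly coupled system produce the same pair of series, a small instance of the identification problem described above.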
Contributors: Martynova, Elena (M.A.) (Author) / West, Stephen G. (Thesis advisor) / Amazeen, Polemnia (Committee member) / Tein, Jenn-Yun (Committee member) / Arizona State University (Publisher)
Created: 2019
Description
As photovoltaic (PV) power plants age in the field, their PV modules degrade and develop visible and invisible defects. This two-part thesis presents a defect analysis and a statistical degradation rate analysis of PV power plants: the first part deals with the defect analysis and the second part with the statistical degradation rate analysis. In the first part, the performance or financial risk associated with each defect found in multiple PV power plants across various climatic regions of the USA is analyzed in detail by assigning a risk priority number (RPN). The RPN for all the defects in each PV plant is determined from two databases: a degradation rate database and a defect rate database. The analysis shows that the RPN for each plant is dictated by technology type (crystalline silicon or thin-film), climate and age. PV modules aged between 3 and 19 years in four climates (hot-dry, hot-humid, cold-dry and temperate) are investigated.
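Risk priority numbers are conventionally computed, as in failure mode and effects analysis (FMEA), as the product of severity, occurrence and detection ratings. The sketch below shows only that arithmetic. How the thesis maps its degradation rate and defect rate databases onto the ratings is not reproduced, and summing defect RPNs into a plant-level score, like the defect names and ratings in the example, is a hypothetical choice for illustration.

```python
def risk_priority_number(severity, occurrence, detection):
    """FMEA-style RPN = S x O x D, with each rating on a 1-10 scale."""
    for name, value in (("severity", severity), ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10")
    return severity * occurrence * detection


def plant_rpn(defects):
    """Hypothetical plant-level score: the sum of the RPNs of its defects.

    defects maps a defect name to its (severity, occurrence, detection) ratings.
    """
    return sum(risk_priority_number(*ratings) for ratings in defects.values())


score = plant_rpn({"encapsulant browning": (4, 6, 2), "hot spot": (8, 2, 5)})
```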

In the second part, a statistical degradation analysis is performed to determine whether the degradation rates are linear in crystalline silicon power plants exposed to a hot-dry climate. This linearity analysis uses data obtained through two methods: the current-voltage method and the metered kWh method. For the current-voltage method, annual power degradation data from hundreds of individual modules in six crystalline silicon power plants of different ages are used. For the metered kWh method, a residual plot analysis using Winters' statistical method is performed for two crystalline silicon plants of different ages. Metered kWh data typically consist of signal and noise components. Smoothers remove the noise component by averaging the current observation with previous observations. A residual plot analysis of the error component is then performed to confirm that the noise was successfully separated from the data, by showing that the residuals are random.
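The smoothing-and-residual check can be sketched with additive Holt-Winters (Winters') smoothing, which updates a level, a trend and a seasonal index as weighted averages of the newest observation and the previous estimates, followed by a lag-1 autocorrelation of the one-step-ahead residuals as a simple randomness check. This is an assumption-laden illustration rather than the thesis's procedure: the smoothing constants, the initialization and the use of lag-1 autocorrelation in place of a visual residual plot are all choices made for this example.

```python
def holt_winters_residuals(y, m, a=0.3, b=0.1, g=0.1):
    """One-step-ahead residuals from additive Holt-Winters smoothing.

    y: observations (at least two full seasons); m: season length, e.g. 12
    for monthly kWh totals. a, b, g smooth the level, trend and season.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons")
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    residuals = []
    for t, obs in enumerate(y):
        s = season[t % m]
        residuals.append(obs - (level + trend + s))  # forecast error
        new_level = a * (obs - s) + (1 - a) * (level + trend)
        trend = b * (new_level - level) + (1 - b) * trend
        season[t % m] = g * (obs - new_level) + (1 - g) * s
        level = new_level
    return residuals


def lag1_autocorrelation(r):
    """Lag-1 autocorrelation; values near zero are consistent with random residuals."""
    mu = sum(r) / len(r)
    den = sum((v - mu) ** 2 for v in r)
    if den == 0:
        return 0.0
    return sum((r[i] - mu) * (r[i - 1] - mu) for i in range(1, len(r))) / den
```

A strongly positive or negative lag-1 value would suggest that structure (for example a nonlinear degradation trend) remains in what was treated as noise.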
Contributors: Sundarajan, Prasanna (Author) / Tamizhmani, Govindasamy (Thesis advisor) / Rogers, Bradley (Committee member) / Srinivasan, Devarajan (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
This is a two-part thesis. Part 1 presents the seasonal and tilt angle dependence of the soiling loss factor of photovoltaic (PV) modules over two years in Mesa, Arizona (a desert climate). Part 2 presents the development of an indoor artificial soil deposition chamber that replicates the natural dew cycle. Several environmental factors, including soiling, affect the performance of PV systems. Soiling on PV modules decreases the sunlight reaching the solar cells, thereby reducing the current and power output. Dust particles, air pollution particles, pollen, bird droppings and other airborne industrial particles are among the sources of soiling. Dust particles vary from one location to another in particle size, color and chemical composition. The thickness and properties of the soil layer determine the optical path of light through the soil/glass interface. Soil accumulation on the glass surface is also influenced by environmental factors such as dew, wind speed and rainfall. Studies have shown that soil deposition is closely related to tilt angle and to the exposure period before a rain event. The first part of this thesis analyzes the reduction in irradiance transmitted to a solar cell through the air/soil/glass interface compared with a clean cell (air/glass interface). A time series representation is used to compare seasonal soiling loss factors for two consecutive years (2014-2016). The effects of tilt angle and rain events on these losses are analyzed extensively. Because soiling is a significant field issue, there is a growing need to address it, and several companies have developed solutions such as anti-soiling coatings and automated cleaning systems. To test and validate the effectiveness of these anti-soiling technologies, research institutes around the world are designing and developing artificial indoor soiling chambers that replicate the natural soiling process in the field.
The second part of this thesis work deals with the design and development of an indoor artificial soiling chamber that replicates natural soil deposition process in the field.
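A soiling loss of this kind is commonly estimated by comparing the short-circuit current of a soiled module with that of a co-located clean reference module, since short-circuit current is roughly proportional to the irradiance reaching the cells. The sketch below assumes that approach (with temperature effects ignored) and a simple percentage definition. The thesis's exact definition of the soiling loss factor and its seasonal grouping are not given here, and the helper names and example readings are hypothetical.

```python
def soiling_loss_percent(isc_soiled, isc_clean):
    """Soiling loss (%) = (1 - Isc_soiled / Isc_clean) * 100.

    Both currents should be measured at the same time and temperature, so that
    their ratio reflects only the irradiance blocked by the soil layer.
    """
    if isc_clean <= 0:
        raise ValueError("clean-module current must be positive")
    return (1.0 - isc_soiled / isc_clean) * 100.0


def seasonal_mean_loss(records):
    """Hypothetical helper: mean loss per season from (season, soiled, clean) tuples."""
    by_season = {}
    for season, soiled, clean in records:
        by_season.setdefault(season, []).append(soiling_loss_percent(soiled, clean))
    return {season: sum(v) / len(v) for season, v in by_season.items()}


losses = seasonal_mean_loss([("summer", 9.0, 10.0), ("summer", 9.8, 10.0),
                             ("winter", 10.0, 10.0)])
```

Grouping the same ratio by tilt angle instead of season would give the tilt-angle comparison in the same way.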
Contributors: Virkar, Shalaim (Author) / Tamizhmani, Govindasamy (Thesis advisor) / Srinivasan, Devarajan (Committee member) / Kuitche, Joseph (Committee member) / Arizona State University (Publisher)
Created: 2017