Matching Items (6)

Multi-variate time series similarity measures and their robustness against temporal asynchrony

Description

The amount of time series data generated is increasing due to the integration of sensor technologies with everyday applications, such as gesture recognition, energy optimization, health care, and video surveillance. The simultaneous use of multiple sensors to capture different aspects of real-world attributes has also led to an increase in dimensionality from uni-variate to multi-variate time series. This has enabled richer data representation but has also created the need for algorithms that determine the similarity between two multi-variate time series for search and analysis.

Various algorithms have been extended from the uni-variate to the multi-variate case, such as multi-variate versions of Euclidean distance, edit distance, and dynamic time warping. However, how these algorithms account for asynchrony in time series has not been studied. Human gestures, for example, exhibit asynchrony in their patterns, as different subjects perform the same gesture with varying movements and at different speeds. In this thesis, we propose several algorithms, some of which also leverage metadata describing the relationships among the variates. In particular, we present several techniques that leverage the contextual relationships among the variates when measuring multi-variate time series similarity. Based on the way correlation is leveraged, we propose various weighting mechanisms that determine the importance of each dimension for discriminating between time series, since giving the same weight to every dimension can lead to misclassification. We then study the robustness of the considered techniques against different temporal asynchronies, including shifts and stretching.
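The thesis's own weighting schemes are not reproduced here. As a hedged illustration of where per-variate weights enter a similarity measure, and why warping helps under temporal shifts, the sketch below implements a dependent multi-variate DTW with a weighted squared-Euclidean local cost, next to a lock-step (Euclidean-style) comparison. The function names and the `weights` vector are hypothetical; in the thesis such weights would be derived from correlation or external metadata.

```python
import numpy as np


def weighted_mdtw(x, y, weights=None):
    """Dependent multi-variate DTW with per-variate weights (illustrative sketch).

    x has shape (n, d) and y has shape (m, d); weights holds d non-negative
    importance values (uniform if None). Returns the accumulated cost of the
    optimal warping path under a weighted squared-Euclidean local cost.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[1] != y.shape[1]:
        raise ValueError("both series must have the same number of variates")
    w = np.ones(x.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    # Local cost matrix: weighted squared distance between every pair of time points.
    diff = x[:, None, :] - y[None, :, :]
    cost = np.einsum("ijk,k->ij", diff ** 2, w)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell extends the cheapest of the three admissible predecessor moves.
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def weighted_lockstep(x, y, weights=None):
    """Lock-step counterpart for equal-length series: time point i is compared only with time point i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones(x.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    return float(((x - y) ** 2 @ w).sum())


# Usage: a two-variate pattern and a time-shifted copy of it.
t = np.linspace(0, 2 * np.pi, 60)
a = np.stack([np.sin(t), np.cos(t)], axis=1)
b = np.stack([np.sin(t - 0.5), np.cos(t - 0.5)], axis=1)
print(weighted_lockstep(a, b), weighted_mdtw(a, b))
```

Because the diagonal alignment is one of the paths DTW considers, the warped cost can never exceed the lock-step cost for equal-length series; the gap between the two grows as the shift grows.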

Exhaustive experiments were carried out on datasets with multiple types and amounts of temporal asynchrony. We observed that the accuracy of algorithms that rely on the data to discover variate relationships can be low in the presence of temporal asynchrony, whereas algorithms that rely on external metadata tend to be more robust against asynchronous distortions. Specifically, algorithms using external metadata achieve better classification accuracy and cluster separation than existing state-of-the-art approaches, such as EROS, PCA, and naive dynamic time warping.

Date Created
  • 2015

Determining appropriate sample sizes and their effects on key parameters in longitudinal three-level models

Description

Through a two-study simulation design with different design conditions (the level 1 (L1) sample size was set to 3, the level 2 (L2) sample size ranged from 10 to 75, the level 3 (L3) sample size ranged from 30 to 150, the intraclass correlation (ICC) ranged from 0.10 to 0.50, and model complexity ranged from one to three predictors), this study intends to provide general guidelines about adequate sample sizes at three levels under varying ICC conditions for a viable three-level HLM analysis (e.g., reasonably unbiased and accurate parameter estimates). The data-generating parameters were obtained from a large-scale longitudinal data set from North Carolina, provided by the National Center on Assessment and Accountability for Special Education (NCAASE). I discuss ranges of sample sizes that are inadequate or adequate for convergence, absolute bias, relative bias, root mean squared error (RMSE), and coverage of individual parameter estimates. Using this detailed two-part simulation design across sample sizes, model complexity, and ICCs, the study provides various options for adequate sample sizes under different conditions. It emphasizes that adequate sample sizes at L1, L2, and L3 can be adjusted according to the parameter estimates of interest and the acceptable ranges of absolute bias, relative bias, RMSE, and coverage. Under different model complexities and varying ICC conditions, this study aims to help researchers identify whether the L1, L2, or L3 sample size, or a combination of them, is the source of variation in absolute bias, relative bias, RMSE, or coverage for a given parameter estimate. This assists researchers in making better decisions when selecting adequate sample sizes for a three-level HLM analysis.
A limitation of the study was the use of only a single distribution for the dependent and explanatory variables; different types of distributions might result in different sample size recommendations.
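The abstract evaluates each parameter estimate by absolute bias, relative bias, RMSE, and coverage across simulation replicates. The sketch below shows how those criteria are commonly computed from replicate estimates and standard errors. The formulas are standard textbook definitions assumed here (the study may define them slightly differently), and `simulation_metrics` is a hypothetical helper; the numbers in the usage line are made up, not results from the study.

```python
import numpy as np


def simulation_metrics(estimates, std_errors, true_value, z=1.96):
    """Summarize replicate estimates of one parameter from a simulation study.

    Assumed definitions: bias = mean(estimate) - true value,
    relative bias = bias / true value, RMSE over replicates, and coverage
    = share of Wald intervals (estimate +/- z * SE) containing the true value.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - true_value
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    covered = (est - z * se <= true_value) & (true_value <= est + z * se)
    return {
        "absolute_bias": float(abs(bias)),
        # Relative bias is undefined when the true parameter is zero.
        "relative_bias": float(bias / true_value) if true_value != 0 else float("nan"),
        "rmse": float(rmse),
        "coverage": float(covered.mean()),
    }


# Usage with four made-up replicate estimates of a parameter whose true value is 1.0.
m = simulation_metrics([0.9, 1.1, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], true_value=1.0)
print(m)
```

In a full study, this summary would be computed per parameter and per design cell (L1/L2/L3 sample sizes, ICC, model complexity) so that sample-size adequacy can be read off against whatever bias, RMSE, or coverage thresholds a researcher accepts.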

Date Created
  • 2016

Short-term wind power forecasts using Doppler lidar

Description

With a ground-based Doppler lidar on the upwind side of a wind farm in the Tehachapi Pass of California, radial wind velocity measurements were collected for repeating sector sweeps, scanning up to 10 kilometers away. This region consists of complex terrain, with the scans made between mountains. The dataset was used to study techniques for short-term forecasting of wind power, by correlating changes in energy content, and of turbulence intensity, by tracking spatial variance, in the wind ahead of a wind farm. A ramp event was also captured, and its propagation was tracked.

Orthogonal horizontal wind vectors were retrieved from the radial velocity using a sector Velocity Azimuth Display method. Streamlines were plotted to identify potential sites where upstream wind speed could be correlated with wind speed at downstream locations near the wind farm. A "virtual wind turbine" was "placed" at locations along the streamline, using the time-series velocity data at each location as the input to a modeled wind turbine to determine the extractable energy content there. The relationship between this time-dependent energy content upstream and near the wind farm was then studied. Several fits were evaluated by correlating the energy content at each upstream location with that near the wind farm, using a time shift estimated from advection at the mean wind speed. A prediction of the downstream energy content was produced by shifting the power output in time and applying the best-fit function. This method predicted the power near the wind farm several minutes in advance, and predictions up to an hour in advance were made for a large ramp event. The Mean Absolute Error and Standard Deviation are presented for the predictions based on each selected upstream location.
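The shift-and-fit prediction step described above can be sketched as follows, under several assumptions: the "virtual wind turbine" is replaced by a hypothetical idealized power curve (cubic between cut-in and rated speed, normalized to rated power 1), the time shift is the advection distance divided by the mean wind speed, and the "best-fit function" is taken to be a straight line, although the thesis evaluated several fits. The turbine parameters, the synthetic wind series, and all function names are illustrative only; the VAD wind retrieval is not shown.

```python
import numpy as np


def virtual_turbine_power(speed, cut_in=3.0, rated_speed=12.0, cut_out=25.0):
    """Hypothetical idealized power curve, normalized so rated power = 1."""
    v = np.asarray(speed, dtype=float)
    ramp = (v ** 3 - cut_in ** 3) / (rated_speed ** 3 - cut_in ** 3)
    power = np.where((v >= cut_in) & (v < rated_speed), ramp, 0.0)
    return np.where((v >= rated_speed) & (v < cut_out), 1.0, power)


def advection_lag_steps(distance_m, mean_speed_ms, dt_s):
    """Time shift (in samples) for air to advect from the upstream site to the farm."""
    return int(round(distance_m / (mean_speed_ms * dt_s)))


def predict_and_score(upstream_power, downstream_power, lag):
    """Shift upstream power by `lag` samples, fit a line, and return (MAE, std of error)."""
    up = np.asarray(upstream_power, dtype=float)
    down = np.asarray(downstream_power, dtype=float)
    shifted = up[: len(up) - lag]   # upstream value at time t - lag ...
    target = down[lag:]             # ... paired with the downstream value at time t
    slope, intercept = np.polyfit(shifted, target, 1)
    err = slope * shifted + intercept - target
    return float(np.mean(np.abs(err))), float(np.std(err))


# Synthetic 1-minute wind speeds at a site 3 km upstream (not lidar data).
rng = np.random.default_rng(0)
speed_up = 8 + 2 * np.sin(np.arange(200) / 10) + rng.normal(0, 0.3, 200)
lag = advection_lag_steps(distance_m=3000, mean_speed_ms=10, dt_s=60)
# Idealized "frozen" flow: the farm sees the upstream wind exactly `lag` samples later.
speed_down = np.concatenate([np.full(lag, speed_up[0]), speed_up[:-lag]])
mae, sd = predict_and_score(virtual_turbine_power(speed_up), virtual_turbine_power(speed_down), lag)
print(lag, mae, sd)
```

With perfectly frozen advection, the error is essentially zero. In real complex terrain the flow evolves between the upstream site and the farm, which is why the fit quality, MAE, and spread differ from one upstream location to another.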

Date Created
  • 2014

Three essays on correlated binary outcomes: detection and appropriate models

Description

Correlation is common in many types of data, including data collected through longitudinal studies or in a hierarchical structure. In the case of clustering or repeated measurements, there is inherent correlation between observations within the same group or between observations obtained on the same subject. Longitudinal studies also introduce association between the covariates and the outcomes across time. When multiple outcomes are of interest, association may exist between the various models. These correlations can lead to issues in model fitting and inference if not properly accounted for. This dissertation presents three papers discussing methods that properly account for different types of association. The first paper introduces an ANOVA-based measure of intraclass correlation for three-level hierarchical data with binary outcomes, along with its properties. This measure is useful for evaluating when the correlation due to clustering warrants a more complex model, and it is used to investigate AIDS knowledge in a clustered study conducted in Bangladesh. The second paper develops the Partitioned generalized method of moments (Partitioned GMM) model for longitudinal studies. This model uses valid moment conditions to separately estimate the varying effects of each time-dependent covariate on the outcome over time, using multiple coefficients. The model is fit to data from the National Longitudinal Study of Adolescent to Adult Health (Add Health) to investigate risk factors of childhood obesity. In the third paper, the Partitioned GMM model is extended to jointly estimate regression models for multiple outcomes of interest. This approach thus takes into account both the correlation between the multivariate outcomes and the correlation due to time dependency in longitudinal studies.
The model uses an expanded weight matrix and an objective function composed of valid moment conditions to simultaneously estimate optimal regression coefficients. This approach is applied to Add Health data to study drivers of outcomes including smoking, social alcohol usage, and obesity in children.
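The dissertation's first paper builds a three-level ANOVA-based ICC for binary outcomes, and that specific measure is not reproduced here. For orientation only, the sketch below shows the standard two-level, one-way ANOVA ICC estimator applied to 0/1 outcomes grouped by cluster. The three-level version would additionally partition variance across a second nesting level. `anova_icc` is a hypothetical name.

```python
import numpy as np


def anova_icc(groups):
    """Standard one-way ANOVA intraclass correlation for clustered 0/1 outcomes.

    groups: list of 1-D sequences, one per cluster (unequal sizes allowed).
    ICC = (MSB - MSW) / (MSB + (n0 - 1) * MSW), where n0 is the adjusted
    average cluster size used for unbalanced designs.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    sizes = np.array([len(g) for g in groups], dtype=float)
    total = sizes.sum()
    grand_mean = np.concatenate(groups).mean()
    # Between-cluster and within-cluster sums of squares.
    ssb = sum(n * (g.mean() - grand_mean) ** 2 for n, g in zip(sizes, groups))
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
    msb = ssb / (k - 1)
    msw = ssw / (total - k)
    n0 = (total - (sizes ** 2).sum() / total) / (k - 1)
    return float((msb - msw) / (msb + (n0 - 1) * msw))


# Clusters whose members always answer alike -> perfect within-cluster correlation.
print(anova_icc([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]))
```

A value near 0 suggests that clustering adds little correlation, while larger values point toward a model that accounts for the hierarchy. The ANOVA estimator can also go negative, as in the balanced example in the test below.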

Date Created
  • 2018

Neutron-gamma ray discrimination using normalized cross correlation

Description

The reduced availability of ³He is a motivation for developing alternative neutron detectors. ⁶Li-enriched CLYC (Cs₂LiYCl₆), a scintillator, is a promising candidate to replace ³He. The neutron and gamma ray signals from CLYC have different shapes because neutron pulses decay more slowly. Well-known pulse shape discrimination techniques include the charge comparison method, the pulse gradient method, and the frequency gradient method. In the work presented here, we applied a normalized cross correlation (NCC) approach to real neutron and gamma ray pulses produced by exposing CLYC scintillators to a mixed radiation environment generated by ¹³⁷Cs, ²²Na, ⁵⁷Co, and ²⁵²Cf/AmBe at different event rates. When measured pulses are cross correlated with reference neutron and/or gamma templates, the analysis produces distinctive results for neutron pulses and gamma ray pulses. NCC produces good separation between neutrons and gamma rays at low (< 100 kHz) to mid (< 200 kHz) event rates. However, the separation disappears at high event rates (> 200 kHz) because of pileup, noise, and baseline shift. This is also confirmed by the pulse shape discrimination (PSD) plots and the figure of merit (FOM) of NCC. The FOM is close to 3, indicating good separation, at low event rates, but it rolls off significantly as the event rate increases and reaches 1 at high event rates. Future work is required to reduce noise by using a better hardware system, to remove pileup, and to detect the NCC shapes of neutrons and gamma rays using advanced techniques.
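The template-matching idea can be sketched in a few lines. The sketch makes several assumptions: pulses are modeled as hypothetical two-exponential shapes (the time constants are illustrative, not measured CLYC values), NCC is taken at zero lag on mean-removed signals, and a pulse is labeled by whichever template it correlates with more strongly. The FOM helper uses the usual PSD definition: the separation of the two peak centroids divided by the sum of their FWHMs. All function names are hypothetical.

```python
import numpy as np


def pulse(t, rise, decay):
    """Hypothetical two-exponential pulse shape, normalized to unit peak."""
    p = (1 - np.exp(-t / rise)) * np.exp(-t / decay)
    return p / p.max()


def ncc(signal, template):
    """Zero-lag normalized cross-correlation: 1 for identical shapes, independent of amplitude."""
    s = np.asarray(signal, dtype=float) - np.mean(signal)
    r = np.asarray(template, dtype=float) - np.mean(template)
    return float(np.dot(s, r) / (np.linalg.norm(s) * np.linalg.norm(r)))


def classify(signal, neutron_template, gamma_template):
    """Label a pulse by whichever reference template it correlates with more strongly."""
    if ncc(signal, neutron_template) > ncc(signal, gamma_template):
        return "neutron"
    return "gamma"


def figure_of_merit(centroid_separation, fwhm_a, fwhm_b):
    """PSD figure of merit: peak separation over the sum of the two peak FWHMs."""
    return centroid_separation / (fwhm_a + fwhm_b)


# Illustrative time constants (ns) chosen only to give slow vs. fast decays.
t = np.arange(0, 2000, 4.0)
neutron_ref = pulse(t, rise=10.0, decay=800.0)
gamma_ref = pulse(t, rise=10.0, decay=60.0)

rng = np.random.default_rng(1)
noisy_neutron = 0.7 * neutron_ref + rng.normal(0, 0.01, t.size)
noisy_gamma = 0.5 * gamma_ref + rng.normal(0, 0.01, t.size)
print(classify(noisy_neutron, neutron_ref, gamma_ref), classify(noisy_gamma, neutron_ref, gamma_ref))
```

Because NCC normalizes away amplitude, only pulse shape drives the decision. That is also why pileup and baseline shift, which distort shape, erode the separation at high event rates.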

Date Created
  • 2015

Correlation based tools for analysis of dynamic networks

Description

Time series analysis of dynamic networks is an important area of study that helps in predicting changes in networks. Changes in networks are used to analyze deviations in network characteristics, which helps in characterizing any network with dynamic behavior. This area of study has applications in many domains, such as communication networks, climate networks, social networks, transportation networks, and biological networks. The aim of this research is to analyze the structural characteristics of such dynamic networks. This thesis examines tools that help analyze the structure of networks and explores a technique for computing and analyzing a large climate dataset. The computations for analyzing the structural characteristics are performed on a computing cluster, achieving a linear speedup in computation time compared to a single-core computer. As an application, a large sea ice concentration anomaly dataset is analyzed and used to construct a correlation-based graph. The results suggest that the climate data has the characteristics of a small-world graph.
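A correlation-based graph and the two statistics usually used to diagnose small-world structure (average clustering coefficient and average shortest-path length) can be sketched as below. This is a single-machine sketch, not the thesis's cluster implementation. The edge rule (|Pearson r| ≥ threshold) and the threshold value are assumptions, and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def correlation_graph(series, threshold):
    """series: array (n_nodes, n_times). Returns a boolean adjacency matrix with an
    edge wherever the absolute Pearson correlation reaches the threshold."""
    adj = np.abs(np.corrcoef(series)) >= threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 count as 0)."""
    coeffs = []
    for i in range(adj.shape[0]):
        nbrs = np.flatnonzero(adj[i])
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = adj[np.ix_(nbrs, nbrs)].sum() / 2  # edges among i's neighbours
        coeffs.append(2 * links / (k * (k - 1)))
    return float(np.mean(coeffs))


def average_shortest_path(adj):
    """Mean BFS hop distance over all ordered pairs of mutually reachable nodes."""
    n = adj.shape[0]
    total, pairs = 0, 0
    for source in range(n):
        dist = np.full(n, -1)
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reached = dist > 0
        total += int(dist[reached].sum())
        pairs += int(reached.sum())
    return total / pairs if pairs else float("inf")


# Toy "grid cells": three perfectly (anti-)correlated series and one uncorrelated one.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
cells = np.stack([np.sin(t), 2 * np.sin(t) + 1, -np.sin(t), np.cos(t)])
adj = correlation_graph(cells, threshold=0.9)
print(average_clustering(adj), average_shortest_path(adj))
```

A graph is usually called small-world when its clustering is much higher than that of a random graph with the same number of nodes and edges, while its average path length stays comparably short.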

Date Created
  • 2011