Matching Items (12)

Predicting Trends on Twitter with Time Series Analysis

Description

Twitter, the microblogging platform, has grown in prominence to the point that the topics that trend on the network are often the subject of the news and other traditional media. By predicting trends on Twitter, it could be possible to predict the next major topic of interest to the public. With this motivation, this paper develops a model for trends leveraging previous work with k-nearest-neighbors and dynamic time warping. The development of this model provides insight into the length and features of trends, and the model successfully generalizes to identify 74.3% of trends in the time period of interest. The model developed in this work helps explain why particular words trend on Twitter.
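
As a rough illustration of the two techniques named above, the following sketch pairs a textbook dynamic time warping (DTW) distance with a k-nearest-neighbors vote. It is not the paper's model: the function names, the toy hourly counts, and the "trend" / "no_trend" labels are all hypothetical and chosen only to show the mechanics.

```python
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def knn_predict(query, labeled_series, k=1):
    """Majority label among the k reference series closest to query under DTW."""
    ranked = sorted(labeled_series, key=lambda pair: dtw_distance(query, pair[0]))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Hypothetical toy data: mention counts per hour for four labeled topics.
references = [
    ([1, 1, 2, 8, 20, 9, 2], "trend"),
    ([2, 9, 21, 8, 2, 1, 1], "trend"),
    ([3, 3, 4, 3, 3, 4, 3], "no_trend"),
    ([5, 4, 5, 5, 4, 5, 5], "no_trend"),
]
query = [1, 2, 2, 9, 19, 19, 8, 2]  # a spike that is longer and slightly shifted
predicted = knn_predict(query, references, k=1)
print(predicted)
```

DTW is used here because it tolerates spikes of different lengths and onsets, which a point-by-point distance would penalize.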

Contributors

Created

Date Created
  • 2015-05

Leveraging metadata for extracting robust multi-variate temporal features

Description

In recent years, a growing number of applications have come to use multi-variate time series data, in which multiple uni-variate time series coexist. However, there is a lack of systematic analysis of multi-variate time series. This thesis focuses on (a) defining a simplified inter-related multi-variate time series (IMTS) model and (b) developing a robust multi-variate temporal (RMT) feature extraction algorithm that can be used for locating, filtering, and describing salient features in multi-variate time series data sets. The proposed RMT feature can also be used for supporting multiple analysis tasks, such as visualization, segmentation, and searching / retrieving based on multi-variate time series similarities. Experiments confirm that the proposed feature extraction algorithm is highly efficient and effective in identifying robust multi-scale temporal features of multi-variate time series.

Contributors

Agent

Created

Date Created
  • 2013

Modeling relationships between cycles in psychology: potential limitations of sinusoidal and mass-spring models

Description

With improvements in technology, intensive longitudinal studies that permit the investigation of daily and weekly cycles in behavior have increased exponentially over the past few decades. Traditionally, when data have been collected on two variables over time, multivariate time series approaches that remove trends, cycles, and serial dependency have been used. These analyses permit the study of the relationship between random shocks (perturbations) in the presumed causal series and changes in the outcome series, but do not permit the study of the relationships between cycles. Liu and West (2016) proposed a multilevel approach that permitted the study of potential between-subject relationships between features of the cycles in two series (e.g., amplitude). However, I show that the application of the Liu and West approach is restricted to a small set of features and types of relationships between the series. Several authors (e.g., Boker & Graham, 1998) proposed a connected mass-spring model that appears to permit modeling of more general cyclic relationships. I show that the undamped connected mass-spring model is also limited and may be unidentified. To test the severity of the restrictions on the motion trajectories producible by the undamped connected mass-spring model, I mathematically derived their connection to the force equations of the undamped connected mass-spring system. The mathematical solution describes the domain of the trajectory pairs that are producible by the undamped connected mass-spring model. The set of producible trajectory pairs is highly restricted, and this restriction sets major limitations on the application of the connected mass-spring model to psychological data. I used a simulation to demonstrate that even if a pair of psychological time-varying variables behaved exactly like two masses in an undamped connected mass-spring system, the connected mass-spring model would not yield adequate parameter estimates. My simulation probed the performance of the connected mass-spring model as a function of several aspects of data quality, including number of subjects, series length, sampling rate relative to the cycle, and measurement error in the data. The findings can be extended to damped and nonlinear connected mass-spring systems.

Contributors

Agent

Created

Date Created
  • 2019

Directional information flow and applications

Description

In the late 1960s, Granger published a seminal study on causality in time series, using linear interdependencies and information transfer. Recent developments in the field of information theory have introduced new methods to investigate the transfer of information in dynamical systems. Using concepts from chaos and Markov theory, many of these methods have evolved to capture non-linear relations and information flow between coupled dynamical systems, with applications to fields like biomedical signal processing. This thesis deals with the application of information theory to non-linear multivariate time series and develops measures of information flow to identify significant drivers and response (driven) components in networks of coupled sub-systems with variable coupling in strength and direction (uni- or bi-directional) for each connection. Transfer Entropy (TE) is used to quantify pairwise directional information. Four TE-based measures of information flow are proposed, namely TE Outflow (TEO), TE Inflow (TEI), TE Net flow (TEN), and Average TE flow (ATE). First, the reliability of the information flow measures on models, with and without noise, is evaluated. The driver and response sub-systems in these models are identified. Second, these measures are applied to electroencephalographic (EEG) data from two patients with focal epilepsy. The analysis showed dominant directions of information flow between brain sites and identified the epileptogenic focus as the system component typically with the highest value for the proposed measures (for example, ATE). Statistical tests between pre-seizure (preictal) and post-seizure (postictal) information flow also showed a breakdown of the driving of the brain by the focus after seizure onset. The above findings shed light on the function of the epileptogenic focus and the understanding of ictogenesis. It is expected that they will contribute to the diagnosis of epilepsy, for example by accurate identification of the epileptogenic focus from interictal periods, as well as the development of better seizure detection, prediction and control methods, for example by isolating pathologic areas of excessive information flow through electrical stimulation.
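
To make the idea of pairwise directional information concrete, here is a minimal sketch of a plug-in (histogram) transfer entropy estimate for discrete-valued series with a history length of one sample. It is only an illustration under those assumptions: the thesis's own estimator, its parameters, and the definitions of TEO, TEI, TEN and ATE are not reproduced here, and the function name and the synthetic driver/response pair are hypothetical.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in TE(source -> target) in bits, using one step of history.

    TE = sum p(y_next, y_prev, x_prev)
             * log2( p(y_next | y_prev, x_prev) / p(y_next | y_prev) )
    """
    # Each triple is (y_next, y_prev, x_prev).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)
    c_prev_pair = Counter((yp, xp) for _, yp, xp in triples)
    c_next_prev = Counter((yn, yp) for yn, yp, _ in triples)
    c_prev = Counter(yp for _, yp, _ in triples)

    te = 0.0
    for (yn, yp, xp), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_prev_pair[(yp, xp)]
        p_given_self = c_next_prev[(yn, yp)] / c_prev[yp]
        te += p_joint * log2(p_given_both / p_given_self)
    return te


# Synthetic example: x is random binary noise and y copies x with a one-step delay,
# so x drives y and not the other way around.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]

te_x_to_y = transfer_entropy(x, y)  # expected to be close to 1 bit
te_y_to_x = transfer_entropy(y, x)  # expected to be close to 0 bits
print(round(te_x_to_y, 3), round(te_y_to_x, 3))
```

The asymmetry between the two directions is what lets a measure of this kind separate a driver from a response component.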

Contributors

Agent

Created

Date Created
  • 2011

DB 2020: analyzing and forecasting DB market trends

Description

Over the last two decades, Alternative Project Delivery Methods (APDM), such as Design-Build (DB), have become more popular in the construction industry, specifically in the U.S., and the competition for APDM projects has risen among construction companies. The Engineering News Record (ENR) magazine analyzes DB firms and publishes the list of the top 100 every year. According to ENR articles and many scientific papers, the implementation of the DB method has grown drastically over the last decade; however, information about growth trends depending on firm size and segment is lacking. Also missing is knowledge of the future market trends over the next five years. Furthermore, public agencies and DB firms may be worried that DB projects do not distribute wealth equally among DB firms. Using the top 100 firms deemed representative of the DB market, the author has divided the market into volumes based on rankings to analyze the total DB market revenue growth. A comparison between international and domestic revenues indicated that the top five DB firms have 64% more involvement in the international market compared to the domestic market. Furthermore, while the research shows increasing market share only for the top five firms, the author has found that (1) a large portion of their market share is due to a large growth in their international market, and (2) revenues for all volumes of the DB market have increased. Moreover, regression and time series analyses allow for the forecasting of the DB market growth, which the author anticipates will move from about $100B to about $150B in 2020.

Contributors

Agent

Created

Date Created
  • 2014

Modeling time series data for supervised learning

Description

Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science and multimedia naturally generate TS data. Each series provides a high-dimensional data vector that challenges the learning of the relevant patterns. This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning. This provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a classification framework that can handle high-dimensional feature spaces, multiple classes and interaction between features. The proposed representations are useful for classification and interpretation of TS data of varying complexity. The first contribution handles the problem of time warping with a feature-based approach. An interval selection and local feature extraction strategy is proposed to learn a bag-of-features representation. This is distinctly different from common similarity-based time warping and allows additional features (such as pattern location) to be easily integrated into the models. The learners have the capability to account for the temporal information through the recursive partitioning method. The second contribution focuses on the comprehensibility of the models. A new representation is integrated with local feature importance measures from tree-based ensembles to diagnose and interpret time intervals that are important to the model. Multivariate time series (MTS) are especially challenging because the input consists of a collection of TS, and both features within TS and interactions between TS can be important to models. Another contribution uses a different representation to produce computationally efficient strategies that learn a symbolic representation for MTS. Relationships between the multiple TS, nominal and missing values are handled with tree-based learners. Applications such as speech recognition, medical diagnosis and gesture recognition are used to illustrate the methods. Experimental results show that the TS representations and methods provide better results than competitive methods on a comprehensive collection of benchmark datasets. Moreover, the proposed approaches naturally provide solutions to similarity analysis, predictive pattern discovery and feature selection.

Contributors

Agent

Created

Date Created
  • 2012

Multiscale interactions in psychological systems

Description

For many years now, researchers have documented evidence of fractal scaling in psychological time series. Explanations of fractal scaling have come from many sources but those that have gained the most traction in the literature are theories that suggest fractal scaling originates from the interactions among the multiple scales that make up behavior. Those theories, originating in the study of dynamical systems, suffer from the limitation that fractal analysis reveals only indirect evidence of multiscale interactions. Multiscale interactions must be demonstrated directly because there are many means to generate fractal properties. In two experiments, participants performed a pursuit tracking task while I recorded multiple behavioral and physiological time series. A new analytical technique, multiscale lagged regression, was introduced to capture how those many psychological time series coordinate across multiple scales and time. The results were surprising in that coordination among psychological time series tends to be oscillatory in nature, even when the series are not oscillatory themselves. Those and other results demonstrate the existence of multiscale interactions in psychological systems.

Contributors

Agent

Created

Date Created
  • 2016

Reconstructing and controlling nonlinear complex systems

Description

The power of science lies in its ability to infer and predict the existence of objects from which no direct information can be obtained experimentally or observationally. A well-known example is to ascertain the existence of black holes of various masses in different parts of the universe from indirect evidence, such as X-ray emissions. In the field of complex networks, the problem of detecting hidden nodes can be stated as follows. Consider a network whose topology is completely unknown but whose nodes consist of two types: one accessible and another inaccessible from the outside world. The accessible nodes can be observed or monitored, and it is assumed that time series are available from each node in this group. The inaccessible nodes are shielded from the outside and are essentially "hidden." The question is, based solely on the available time series from the accessible nodes, can the existence and locations of the hidden nodes be inferred? A completely data-driven, compressive-sensing based method is developed to address this issue by utilizing complex weighted networks of nonlinear oscillators, evolutionary game and geospatial networks.

Both microbes and multicellular organisms actively regulate their cell fate determination to cope with changing environments or to ensure proper development. Here, synthetic biology approaches are used to engineer bistable gene networks to demonstrate that stochastic and permanent cell fate determination can be achieved through initializing gene regulatory networks (GRNs) at the boundary between dynamic attractors. This is experimentally realized by linking a synthetic GRN to a natural output of galactose metabolism regulation in yeast. Combining mathematical modeling and flow cytometry, the engineered systems are shown to be bistable, and inherent gene expression stochasticity is shown not to induce spontaneous state transitioning at steady state. By interfacing rationally designed synthetic GRNs with background gene regulation mechanisms, this work investigates intricate properties of networks that illuminate possible regulatory mechanisms for cell differentiation and development that can be initiated from points of instability.

Contributors

Agent

Created

Date Created
  • 2015

Eclipse BIRT plug-ins for dynamic piecewise constant and event time-series

Description

Time-series plots are used in many scientific and engineering applications. In this thesis, two new plug-ins for piecewise constant and event time-series are developed within the Eclipse BIRT (Business Intelligence and Reporting Tools) framework. These customizable plug-ins support superdense time, which is required for plotting the dynamics of Parallel DEVS models. These plug-ins are designed to receive time-based alphanumerical data sets from external computing sources, which can then be dynamically plotted. Static and dynamic time-series plotting are demonstrated in two settings. First, as standalone plug-ins, they can be used to create static plots, which can then be included in BIRT reports. Second, the plug-ins are integrated into the DEVS-Suite simulator where runtime simulated data generated from model components are dynamically plotted. Visual representation of data sets can simplify and improve model verification and simulation validation.

Contributors

Agent

Created

Date Created
  • 2015

Improving solar PV scheduling using statistical techniques

Description

The inherent intermittency in solar energy resources poses challenges to scheduling generation, transmission, and distribution systems. Energy storage devices are often used to mitigate variability in renewable asset generation and provide a mechanism to shift renewable power between periods of the day. In the absence of storage, however, time series forecasting techniques can be used to estimate future solar resource availability to improve the accuracy of solar generator scheduling. The knowledge of future solar availability helps in scheduling solar generation at high-penetration levels, and assists with the selection and scheduling of spinning reserves. This study employs statistical techniques to improve the accuracy of solar resource forecasts that are in turn used to estimate solar photovoltaic (PV) power generation. The first part of the study involves time series forecasting of the global horizontal irradiation (GHI) in Phoenix, Arizona using Seasonal Autoregressive Integrated Moving Average (SARIMA) models. A comparative study is completed for time series forecasting models developed with different time step resolutions, forecasting start times, forecasting time horizons, training data, and transformations for data measured at Phoenix, Arizona. Approximately 3,000 models were generated and evaluated across the entire study. One major finding is that forecasted values one day ahead are near repeats of the preceding day (due to the 24-hour seasonal differencing), indicating that use of statistical forecasting over multiple days creates a repeating pattern. Logarithmic transform data were found to perform poorly in nearly all cases relative to untransformed or square-root transform data when forecasting out to four days. Forecasts using a logarithmic transform followed a similar profile as the immediate day prior, whereas forecasts using untransformed and square-root transform data had smoother daily solar profiles that better represented the average intraday profile. Error values were generally lower during mornings and evenings and higher during midday. Regarding one-day forecasting and shorter forecasting horizons, the logarithmic transformation performed better than untransformed data and square-root transformed data irrespective of forecast horizon for data resolutions of 1-hour, 30-minutes, and 15-minutes.
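
The "near repeat of the preceding day" finding follows from how seasonal differencing is undone at forecast time. The sketch below is not a SARIMA fit and does not reproduce the study's models; it only assumes hourly data with a 24-step season and a model whose predicted seasonal differences are close to zero. The helper names and the synthetic irradiance profile are hypothetical.

```python
def seasonal_difference(series, period):
    """Return series[t] - series[t - period] for every t >= period."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def forecast_from_differences(history, predicted_differences, period):
    """Undo seasonal differencing: y_hat[t] = y[t - period] + predicted difference."""
    extended = list(history)
    for diff in predicted_differences:
        # extended[-period] is the value exactly one season before the new point.
        extended.append(extended[-period] + diff)
    return extended[len(history):]


PERIOD = 24  # hourly data, daily season

# Hypothetical irradiance-like profile: zero at night, triangular peak at noon.
day = [max(0, 6 - abs(hour - 12)) * 100 for hour in range(24)]
history = day + [0.9 * value for value in day]  # second day slightly dimmer

differences = seasonal_difference(history, PERIOD)

# Assumption: the fitted model predicts (near-)zero seasonal differences.
forecast = forecast_from_differences(history, [0.0] * 48, PERIOD)

print(forecast[:24] == history[-24:])  # day-ahead forecast vs. last observed day
print(forecast[24:] == forecast[:24])  # second forecast day vs. first forecast day
```

Under these assumptions every forecast day copies the last observed day, which is the repeating multi-day pattern described above.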

Contributors

Agent

Created

Date Created
  • 2016