Search Content

Enhanced topic-based modeling for Twitter sentiment analysis

Description

In this thesis multiple approaches are explored to enhance sentiment analysis of tweets. A standard sentiment analysis model with customized features is first trained and tested to establish a baseline. This is compared to an existing topic based mixture model and a new proposed topic based vector model both of…

In this thesis multiple approaches are explored to enhance sentiment analysis of tweets. A standard sentiment analysis model with customized features is first trained and tested to establish a baseline. This is compared to an existing topic based mixture model and a new proposed topic based vector model both of which use Latent Dirichlet Allocation (LDA) for topic modeling. The proposed topic based vector model has higher accuracies in terms of averaged F scores than the other two models.

ContributorsBaskaran, Swetha (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Hsiao, Ihan (Committee member) / Arizona State University (Publisher)

Created2016

Online ET-LDA - joint modeling of events and their related tweets with online streaming data

Description

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the…

Micro-blogging platforms like Twitter have become some of the most popular sites for people to share and express their views and opinions about public events like debates, sports events or other news articles. These social updates by people complement the written news articles or transcripts of events in giving the popular public opinion about these events. So it would be useful to annotate the transcript with tweets. The technical challenge is to align the tweets with the correct segment of the transcript. ET-LDA by Hu et al [9] addresses this issue by modeling the whole process with an LDA-based graphical model. The system segments the transcript into coherent and meaningful parts and also determines if a tweet is a general tweet about the event or it refers to a particular segment of the transcript. One characteristic of the Hu et al’s model is that it expects all the data to be available upfront and uses batch inference procedure. But in many cases we find that data is not available beforehand, and it is often streaming. In such cases it is infeasible to repeatedly run the batch inference algorithm. My thesis presents an online inference algorithm for the ET-LDA model, with a continuous stream of tweet data and compare their runtime and performance to existing algorithms.

ContributorsAcharya, Anirudh (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created2015

TensorDB and tensor-relational model (TRM) for efficient tensor-relational operations

Description

Multidimensional data have various representations. Thanks to their simplicity in modeling multidimensional data and the availability of various mathematical tools (such as tensor decompositions) that support multi-aspect analysis of such data, tensors are increasingly being used in many application domains including scientific data management, sensor data management, and social network…

Multidimensional data have various representations. Thanks to their simplicity in modeling multidimensional data and the availability of various mathematical tools (such as tensor decompositions) that support multi-aspect analysis of such data, tensors are increasingly being used in many application domains including scientific data management, sensor data management, and social network data analysis. Relational model, on the other hand, enables semantic manipulation of data using relational operators, such as projection, selection, Cartesian-product, and set operators. For many multidimensional data applications, tensor operations as well as relational operations need to be supported throughout the data life cycle. In this thesis, we introduce a tensor-based relational data model (TRM), which enables both tensor- based data analysis and relational manipulations of multidimensional data, and define tensor-relational operations on this model. Then we introduce a tensor-relational data management system, so called, TensorDB. TensorDB is based on TRM, which brings together relational algebraic operations (for data manipulation and integration) and tensor algebraic operations (for data analysis). We develop optimization strategies for tensor-relational operations in both in-memory and in-database TensorDB. The goal of the TRM and TensorDB is to serve as a single environment that supports the entire life cycle of data; that is, data can be manipulated, integrated, processed, and analyzed.

ContributorsKim, Mijung (Author) / Candan, K. Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2014

On Density and Noise Challenges in Tensor-Based Data Analytics

Description

Many real-world problems, such as model- and data-driven computer simulation analysis, social and collaborative network analysis, brain data analysis, and so on, benefit from jointly modeling and analyzing the underlying patterns associated with complex, multi-relational data. Tensor decomposition is an ideal mathematical tool for this joint modeling, due to its…

Many real-world problems, such as model- and data-driven computer simulation analysis, social and collaborative network analysis, brain data analysis, and so on, benefit from jointly modeling and analyzing the underlying patterns associated with complex, multi-relational data. Tensor decomposition is an ideal mathematical tool for this joint modeling, due to its simultaneous analysis of such multi-relational data, which is made possible by the data's multidimensional, array-based nature. A major challenge in tensor decomposition lies with its computational and space complexity, especially for dense datasets. While the process is comparatively faster for sparse tensors, decomposition is still a major bottleneck for many applications. The tensor decomposition process results in dense (hence, large) intermediate results, even when the input tensor is sparse (or small). Noise is another challenge for most data mining techniques, and many tensor decomposition schemes are sensitive to noisy datasets; this is an inevitable problem for real-world data, which can lead to false conclusions. In this dissertation, I develop innovative tensor decomposition algorithms for mining both sparse and dense multi-relational data in a noise-resistant way. I present novel, scalable, parallelizable tensor decomposition algorithms, specifically tuned to be effective for dense, noisy tensors, and which maintain the quality of the resulting analysis. Furthermore, I present results on multi-relational data applications focusing on model- and data-driven computer simulation analysis, as well as social network and web mining, which demonstrate the effectiveness of these tensor decompositions.

ContributorsLi, Xinsheng (Author) / Candan, Kasim S (Thesis advisor) / Davulcu, Hasan (Committee member) / Sapino, Maria L (Committee member) / Tong, Hanghang (Committee member) / Arizona State University (Publisher)

Created2019

Optimization of Block-based Tensor Decompositions through Sub-Tensor Impact Graphs and Applications to Dynamicity in Data and User Focus

Description

Tensors are commonly used for representing multi-dimensional data, such as Web graphs, sensor streams, and social networks. As a consequence of the increase in the use of tensors, tensor decomposition operations began to form the basis for many data analysis and knowledge discovery tasks, from clustering, trend detection, anomaly detection…

Tensors are commonly used for representing multi-dimensional data, such as Web graphs, sensor streams, and social networks. As a consequence of the increase in the use of tensors, tensor decomposition operations began to form the basis for many data analysis and knowledge discovery tasks, from clustering, trend detection, anomaly detection to correlationanalysis [31, 38]. It is well known that Singular Value matrix Decomposition (SVD) [9] is used to extract latent semantics for matrix data. When apply SVD to tensors, which have more than two modes, it is tensor decomposition. The two most popular tensor decomposition algorithms are the Tucker [54] and the CP [19] decompositions. Intuitively, they both generalize SVD to tensors. However, one key problem with tensor decomposition is its computational complexity which may cause system bottleneck. Therefore, two phase block-centric CP tensor decomposition (2PCP) was proposed to partition the tensor into small sub-tensors, execute sub-tensor decomposition in parallel and combine the factors from each sub-tensor into final decomposition factors through iterative rerefinement process. Consequently, I proposed Sub-tensor Impact Graph (SIG) to account for inaccuracy propagation among sub-tensors and measure the impact of decomposition of sub-tensors on the other's decomposition, Based on SIG, I proposed several optimization strategies to optimize 2PCP's phase-2 refinement process. Furthermore, I applied SIG and optimization strategies for data focus, data evolution, and focus shifting in tensor analysis. Personalized Tensor Decomposition (PTD) is proposed to account for the users focus given the observations that in many applications, the user may have a focus of interest i.e., part of the data for which the user needs high accuracy and beyond this area focus, accuracy may not be as critical. PTD takes as input one or more areas of focus and performs the decomposition in such a way that, when reconstructed, the accuracy of the tensor is boosted for these areas of focus. A related challenge of data evolution in tensor analytics is incremental tensor decomposition since re-computation of the whole tensor decomposition with each update will cause high computational costs and incur large memory overheads. Especially for applications where data evolves over time and the tensor-based analysis results need to be continuouslymaintained. To avoid re-decomposition, I propose a two-phase block-incremental CP-based tensor decomposition technique, BICP, that efficiently and effectively maintains tensor decomposition results in the presence of dynamically evolving tensor data. I further extend the research focus on user focus shift. User focus may change over time as data is evolving along the time. Although PTD is efficient, re-computation for each user preference update can be the bottleneck for the system. Therefore I propose dynamic evolving user focus tensor decomposition which can smartly reuse the existing decomposition result to improve the efficiency of evolving user focus block decomposition.

ContributorsHuang, shengyu (Author) / Candan, K. Selcuk (Thesis advisor) / Davulcu, Hasan (Committee member) / Sapino, Maria Luisa (Committee member) / Tong, Hanghang (Committee member) / Zou, Jia (Committee member) / Arizona State University (Publisher)

Created2021

ASU Electronic Theses and Dissertations

Filtering by

Enhanced topic-based modeling for Twitter sentiment analysis

Online ET-LDA - joint modeling of events and their related tweets with online streaming data

TensorDB and tensor-relational model (TRM) for efficient tensor-relational operations

On Density and Noise Challenges in Tensor-Based Data Analytics

Optimization of Block-based Tensor Decompositions through Sub-Tensor Impact Graphs and Applications to Dynamicity in Data and User Focus