Matching Items (48)
Description
With the advent of social media platforms such as Twitter and Facebook, people share their opinions and sentiments, and promote their ideologies to others, like never before. Even people who are otherwise socially inactive share their thoughts on current affairs by tweeting and sharing news feeds with their friends and acquaintances. In this thesis study, we chose Twitter as our main data platform to analyze the shifts and movements of 27 political organizations in Indonesia. So far, we have collected over 30 million tweets and 150,000 news articles from the RSS feeds of the corresponding organizations for our analysis. For Twitter data extraction, we developed a multi-threaded application that seamlessly extracts, cleans, and stores millions of tweets matching our keywords from the Twitter Streaming API. For keyword extraction, we used topics and perspectives that were extracted using n-gram techniques and later approved by our social scientists. After the data is extracted, we aggregate the tweet contents belonging to each user on a weekly basis. Finally, we applied linear and logistic regression using SLEP, an open-source sparse learning package, to compute weekly scores for users and map them to one of the 27 organizations on a scale from radical to counter-radical. Since we map users to organizations on a weekly basis, we are able to track users' behavior and identify important new events that triggered shifts of users between organizations. This thesis study can be further extended to identify topic- and organization-specific influential users, and new users from other social media platforms such as Facebook and YouTube can easily be mapped to the existing organizations on the same radical to counter-radical scale.
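As an illustration of the scoring step, here is a minimal sketch using scikit-learn's L1-regularized logistic regression in place of SLEP; the tweet texts, labels, and parameters are hypothetical stand-ins, not the thesis' actual pipeline.

```python
# Minimal sketch: weekly user scoring with sparse (L1) logistic regression.
# scikit-learn stands in for SLEP; documents and labels are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical weekly aggregates: one document of tweet text per user per week.
weekly_docs = [
    "sharia law reform debate",          # user A, one week of tweets
    "pluralism tolerance interfaith",    # user B, one week of tweets
]
labels = [1, 0]  # 1 = radical end of the scale, 0 = counter-radical end

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(weekly_docs)

# The L1 penalty yields a sparse model, mirroring the sparse-learning setup.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, labels)

# Weekly score in [0, 1]: probability a user leans toward the radical end.
scores = clf.predict_proba(X)[:, 1]
print(scores)
```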
Contributors: Poornachandran, Sathishkumar (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Woodward, Mark (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Data mining is increasingly important in solving a variety of industry problems. Our initiative involves estimating resource requirements, by skill set, for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. In pursuing this goal we face difficulties such as data that contains relevant consumption information but is stored in different formats, and insufficient data about project attributes with which to interpret the consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once preprocessing is complete, data mining techniques such as clustering are applied to find projects that involve resources with similar skill sets and that have similar complexity and size. This yields "resource utilization templates" for groups of related projects from a resource consumption perspective. Then the project characteristics that generate this diversity in headcounts and skill sets are identified. These characteristics are not currently contained in the database and are elicited from the managers of the historical projects; this represents an opportunity to improve the usefulness of the data collection system in the future. The ultimate goal is to match product technical features with the resource requirements of past projects to build a model that forecasts resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross-validation of the training data, as past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast the resource demand of some future projects.
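A minimal sketch of the two stages described above, assuming a small synthetic table of historical projects; the cluster count, feature dimensions, and scoring choice are illustrative only.

```python
# Sketch of the two stages, on synthetic stand-ins for the historical data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
consumption = rng.random((20, 6))   # headcount by skill set (20 projects, 6 skills)
features = rng.random((20, 4))      # elicited project characteristics

# Stage 1: group projects into "resource utilization templates".
templates = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(consumption)

# Stage 2: forecast total headcount from project characteristics. With few
# past projects, leave-one-out cross-validation guards against overfitting.
y = consumption.sum(axis=1)
cv_scores = cross_val_score(LinearRegression(), features, y,
                            cv=LeaveOneOut(),
                            scoring="neg_mean_absolute_error")
print("LOO mean absolute error:", -cv_scores.mean())
```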
Contributors: Bhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
The contention-based IEEE 802.11 MAC uses the binary exponential backoff (BEB) algorithm for contention resolution. The protocol suffers from poor performance in heavily loaded networks and MANETs: high collision rates and packet drops, only probabilistic delay guarantees, and unfairness. Many backoff strategies have been proposed to improve the performance of IEEE 802.11, but all ignore the network topology and demand. Persistence is defined as the fraction of time a node is allowed to transmit; when this allowance takes topology and load into account, it is topology- and load-aware (TLA) persistence. We develop a relation between the contention window size and the TLA persistence. We implement a new backoff strategy in which the TLA persistence is defined as the lexicographic max-min channel allocation. We use a centralized algorithm to calculate each node's TLA persistence and then convert it into a contention window size. The new backoff strategy is evaluated in simulation and compared with IEEE 802.11 using BEB. Through the simulation study, we show that in most static scenarios, such as the exposed terminal, flow in the middle, star topology, and heavily loaded multi-hop networks, as well as in MANETs, the new backoff strategy achieves higher overall average throughput than IEEE 802.11 using BEB.
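One plausible way to convert a persistence into a contention window is the standard p-persistent approximation tau ≈ 2/(CW + 1); the sketch below assumes this mapping, which may differ from the exact relation derived in the thesis.

```python
# Sketch of converting a persistence p (fraction of time a node may transmit)
# into a contention window, assuming the p-persistent CSMA approximation
# tau = 2 / (CW + 1). Bounds mirror typical 802.11 CWmin/CWmax values.
def persistence_to_cw(p: float, cw_min: int = 15, cw_max: int = 1023) -> int:
    """Return a contention window whose attempt rate approximates p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("persistence must be in (0, 1]")
    cw = round(2.0 / p - 1.0)
    return max(cw_min, min(cw_max, cw))

# Example: a node granted a lexicographic max-min share of 10% of the channel.
print(persistence_to_cw(0.10))   # -> 19
```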
Contributors: Bhyravajosyula, Sai Vishnu Kiran (Author) / Syrotiuk, Violet R. (Thesis advisor) / Sen, Arunabha (Committee member) / Richa, Andrea (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
New OpenFlow switches support a wide range of network applications, such as firewalls, load balancers, routers, and traffic monitoring. While ternary content addressable memory (TCAM) allows switches to process packets at high speed based on multiple header fields, today's commodity switches support just thousands to tens of thousands of forwarding rules. To allow for finer-grained policies on this hardware, efficient ways are needed to support the abstraction of a switch with arbitrarily large rule tables. To do so, a hardware-software hybrid switch is designed that relies on rule caching to provide large rule tables at low cost. Unlike traditional caching solutions, individual rules are neither cached in isolation (in order to respect rule dependencies) nor compressed (in order to preserve the per-rule traffic counts). Instead, long dependency chains are "spliced" to cache smaller groups of rules while preserving the semantics of the network policy. The proposed hybrid switch design satisfies three criteria: (1) responsiveness, to allow rapid changes to the cache with minimal effect on traffic throughput; (2) transparency, to faithfully support native OpenFlow semantics; and (3) correctness, to cache rules while preserving the semantics of the original policy. The evaluation of the hybrid switch on large rule tables suggests that it can effectively expose the benefits of both hardware and software switches to the controller and to the applications running on top of it.
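A toy sketch of the splicing idea, assuming prefix-style matches and a tiny three-rule policy; the actual hybrid-switch algorithm is considerably more general.

```python
# Toy sketch of "splicing" a dependency chain when caching one heavy rule.
def overlaps(m1: str, m2: str) -> bool:
    """Two prefix matches overlap iff one is a prefix of the other."""
    n = min(len(m1), len(m2))
    return m1[:n] == m2[:n]

# (priority, match, action, traffic_weight); higher priority wins.
rules = [
    (3, "10.0.", "drop",    5),
    (2, "10.",   "fwd:1", 900),
    (1, "",      "fwd:2",  40),
]

def splice_into_cache(rules, target_priority):
    """Cache one rule in TCAM together with 'cover' entries that send traffic
    for its higher-priority dependents to the software switch, preserving the
    policy semantics and the per-rule counters."""
    prio, match, action, _ = next(r for r in rules if r[0] == target_priority)
    covers = [(p, m, "to_software") for p, m, a, w in rules
              if p > prio and overlaps(m, match)]
    return covers + [(prio, match, action)]

# Cache the heavy-hitter rule (priority 2) without caching its whole chain.
print(splice_into_cache(rules, target_priority=2))
# -> [(3, '10.0.', 'to_software'), (2, '10.', 'fwd:1')]
```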
Contributors: Alipourfard, Omid (Author) / Syrotiuk, Violet R. (Thesis advisor) / Richa, Andréa W. (Committee member) / Xue, Guoliang (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
A firewall is a necessary component of network security, and just like any regular equipment it requires maintenance. To keep up with changing cyber security trends and threats, firewall rules are modified frequently. Over time, such modifications increase the complexity, size, and verbosity of the rule set. As the rule set grows, adding and modifying rules becomes a tedious task. This discourages network administrators from reviewing the work done by previous administrators before and after applying any changes. As a result, the quality and efficiency of the firewall decline.

Modifying and adding rules without knowledge of the existing rules creates anomalies such as shadowing and rule redundancy. Anomalous rule sets not only limit the efficiency of the firewall but in some cases open a hole in the perimeter security. Anomaly detection has been studied for a long time, and some well-established procedures have been implemented and tested, but they all share a common problem: visualizing the results. When it comes to visualizing firewall anomalies, the results do not fit in traditional matrix, tree, or sunburst representations.
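For concreteness, a minimal sketch of how shadowing and redundancy can be flagged over an ordered rule list, assuming each rule matches inclusive integer ranges; well-established detectors classify more relation types than this.

```python
# Sketch: flag shadowed and redundant rules in an ordered firewall rule list.
Rule = dict  # {"src": (lo, hi), "dst": (lo, hi), "action": "allow" | "deny"}

def covers(r1: Rule, r2: Rule) -> bool:
    """True if r1 matches every packet that r2 matches."""
    return all(r1[f][0] <= r2[f][0] and r2[f][1] <= r1[f][1]
               for f in ("src", "dst"))

def find_anomalies(rules):
    anomalies = []
    for j, rj in enumerate(rules):
        for i in range(j):                 # earlier rules take precedence
            if covers(rules[i], rj):
                kind = ("redundant" if rules[i]["action"] == rj["action"]
                        else "shadowed")
                anomalies.append((i, j, kind))
                break
    return anomalies

rules = [
    {"src": (0, 255), "dst": (80, 80), "action": "deny"},
    {"src": (10, 20), "dst": (80, 80), "action": "allow"},  # never matches
]
print(find_anomalies(rules))   # -> [(0, 1, 'shadowed')]
```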

This research targets the anomaly detection and visualization problem. It analyzes and represents firewall rule anomalies in innovative ways, such as hive plots and dynamic slices. Such graphical representations of rule anomalies are useful for understanding the state of a firewall, and they help network administrators find and fix the anomalous rules.
Contributors: Khatkar, Pankaj Kumar (Author) / Huang, Dijiang (Thesis advisor) / Ahn, Gail-Joon (Committee member) / Syrotiuk, Violet R. (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Target tracking problems have been studied extensively in recent years. However, very little work has been done on the problem of continuously monitoring all mobile targets using the fewest mobile trackers when the trajectories of the targets are known in advance, and almost all existing research discretizes time (and/or space) or assumes infinite tracker velocity. In this thesis, I consider the problem of covering (tracking) target nodes with a network of Unmanned Airborne Vehicles (UAVs) for the entire period of observation, adding the constraint of a fixed velocity for the trackers and observing the targets in continuous time and space. I also show that the problem is NP-complete and provide algorithms for both static and dynamic targets.
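For the static case, a simple greedy set-cover heuristic conveys the flavor of the coverage problem; this sketch restricts candidate tracker positions to the target locations and is not the algorithm developed in the thesis.

```python
# Greedy sketch for static targets: place trackers (chosen here from the
# target locations) so every target lies within sensing radius R.
import math

def greedy_trackers(targets, R):
    uncovered = set(range(len(targets)))
    trackers = []
    while uncovered:
        # Pick the candidate position covering the most uncovered targets.
        best = max(targets, key=lambda c: sum(
            math.dist(c, targets[i]) <= R for i in uncovered))
        trackers.append(best)
        uncovered = {i for i in uncovered if math.dist(best, targets[i]) > R}
    return trackers

targets = [(0, 0), (1, 0), (10, 10), (11, 10)]
print(greedy_trackers(targets, R=2.0))   # two trackers suffice here
```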
Contributors: Vachhani, Harsh (Author) / Sen, Arunabha (Thesis advisor) / Saripalli, Srikanth (Committee member) / Shirazipour, Shahrzad (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Demands on file sizes and transfer rates for consumer-oriented products have escalated in recent times, primarily due to the emergence of high definition video content. Factor in the consumer desire for convenience, and we find that wireless service is the most desired approach for interconnectivity. Consumers expect wireless service to emulate wired service with little to virtually no difference in quality of service (QoS). The background section of this document examines the QoS requirements for wireless connectivity of high definition video applications. I then proceed to look at proposed solutions at the physical (PHY) and media access control (MAC) layers, as well as cross-layer schemes. These schemes are subsequently evaluated in terms of their usefulness in a multi-gigabit, 60 GHz wireless multimedia system targeting the average consumer. It is determined that a substantial gap exists in the published literature pertinent to this application. Specifically, little or no work has been found that shows how an adaptive PHY-MAC cross-layer solution providing real-time compensation for varying channel conditions might actually be implemented, and no work has been found that reports results from such a model. This research proposes, develops, and implements in Matlab code an alternate cross-layer solution that provides acceptable QoS for multimedia applications. Simulations using actual high definition video sequences are used to test the proposed solution. Results based on the average PSNR metric show that a quasi-adaptive algorithm provides more than 7 dB of improvement over a non-adaptive approach, while a fully adaptive algorithm provides over 18 dB of improvement. The fully adaptive implementation is conclusively shown to be superior to non-adaptive techniques and sufficiently superior even to quasi-adaptive algorithms.
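The core of the adaptive idea can be sketched as channel-driven modulation and coding scheme (MCS) selection; the SNR thresholds and rates below are illustrative, not the 60 GHz values used in the thesis.

```python
# Sketch of the adaptation idea: pick the most aggressive MCS that the
# instantaneous channel supports; the MAC then sizes its allocation for the
# video bitrate. All threshold/rate values are illustrative placeholders.
MCS_TABLE = [  # (min SNR in dB, name, PHY rate in Gb/s)
    (18.0, "16QAM r3/4", 4.6),
    (12.0, "QPSK  r3/4", 2.3),
    (6.0,  "QPSK  r1/2", 1.5),
    (0.0,  "BPSK  r1/2", 0.7),
]

def select_mcs(snr_db: float):
    """Return the highest-rate MCS whose SNR threshold is met."""
    for threshold, name, rate in MCS_TABLE:
        if snr_db >= threshold:
            return name, rate
    return MCS_TABLE[-1][1:]   # fall back to the most robust MCS

for snr in (20.0, 9.5, 3.0):   # channel conditions vary frame to frame
    print(snr, "->", select_mcs(snr))
```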
Contributors: Bosco, Bruce (Author) / Reisslein, Martin (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Query expansion is a search engine functionality that suggests a set of related queries for a user-issued keyword query. For exploratory or ambiguous keyword queries, the main goal of the user is to identify and select a specific category of query results from among the different categorical options, in order to narrow down the search and reach the desired result. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. These empirical methods fail to cover all the semantics of the categories present in the query results. More importantly, these methods do not consider the semantic relationship between the keywords featured in an expanded query. Contrary to a normal keyword search setting, these factors are non-trivial in an exploratory and ambiguous query setting, where the user's precise discernment of the different categories present in the query results is more important for making subsequent search decisions. In this thesis, I propose a new framework for keyword query expansion: generating a set of queries that correspond to a categorization of the original query results, referred to as categorizing query expansion. Two classes of algorithms are proposed: the first performs clustering as a pre-processing step and then generates categorizing expanded queries based on the clusters; the second handles generating quality expanded queries in the presence of imperfect clusters.
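A minimal sketch of the clustering-first approach: cluster the result documents, then emit one expanded query per cluster from its top-weighted terms. The corpus and parameters are illustrative stand-ins.

```python
# Sketch: categorizing query expansion via clustering of result documents.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

results = [  # toy results for the ambiguous query "jaguar"
    "jaguar speed engine car dealer",
    "jaguar car price review",
    "jaguar habitat rainforest cat",
    "jaguar big cat prey",
]

vec = TfidfVectorizer()
X = vec.fit_transform(results)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# One expanded query per cluster, built from its top-weighted terms.
terms = np.array(vec.get_feature_names_out())
for c in range(2):
    top = terms[km.cluster_centers_[c].argsort()[::-1][:3]]
    extra = " ".join(t for t in top if t != "jaguar")
    print("expanded query:", "jaguar " + extra)
```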
Contributors: Natarajan, Sivaramakrishnan (Author) / Chen, Yi (Thesis advisor) / Candan, Selcuk (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
Time series analysis of dynamic networks is an important area of study that helps in predicting changes in networks. Changes in a network are used to analyze deviations in its characteristics, and this analysis helps to characterize any network with dynamic behavior. The area has applications in many domains, such as communication networks, climate networks, social networks, transportation networks, and biological networks. The aim of this research is to analyze the structural characteristics of such dynamic networks. This thesis examines tools that help analyze the structure of networks and explores a technique for the computation and analysis of a large climate dataset. The computations for analyzing the structural characteristics are performed on a computing cluster, achieving a linear speed-up in computation time compared to a single-core computer. As an application, a large sea ice concentration anomaly dataset is analyzed: the dataset is used to construct a correlation-based graph, and the results suggest that the climate data has the characteristics of a small-world graph.
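A small sketch of the correlation-graph construction and the small-world check, using synthetic series in place of the sea ice anomaly data; with real data the edge threshold would be chosen from the observed correlation distribution.

```python
# Sketch: build a correlation-based graph from per-cell time series, then
# compare its clustering against an equal-density random graph (a common
# small-world signature). The series here are synthetic stand-ins.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
series = rng.standard_normal((50, 200))   # 50 grid cells x 200 time steps
corr = np.corrcoef(series)

G = nx.Graph()
G.add_nodes_from(range(50))
G.add_edges_from((i, j) for i in range(50) for j in range(i + 1, 50)
                 if abs(corr[i, j]) > 0.15)

# Small-world signature: clustering well above a random graph of equal density.
R = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=0)
print("clustering:", nx.average_clustering(G),
      "vs. random:", nx.average_clustering(R))
```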
Contributors: Paramasivam, Kumaraguru (Author) / Colbourn, Charles J (Thesis advisor) / Sen, Arunabha (Committee member) / Syrotiuk, Violet R. (Committee member) / Arizona State University (Publisher)
Created: 2011
Description
With the introduction of compressed sensing and sparse representation, many image processing and computer vision problems have been looked at in a new way. Recent trends indicate that many challenging computer vision and image processing problems are being solved using compressive sensing and sparse representation algorithms. This thesis examines some applications of compressive sensing and sparse representation to image enhancement, restoration, and classification. The first application deals with image super-resolution through compressive sensing based sparse representation. A novel framework is developed for understanding and analyzing some of the implications of compressive sensing in the reconstruction and recovery of an image through raw-sampled and trained dictionaries; properties of the projection operator and the dictionary are examined and the corresponding results presented. In the second application, a novel technique for representing image classes uniquely in a high-dimensional space for image classification is presented. The design and implementation strategy of the image classification system through unique affine sparse codes is described, leading to state-of-the-art results, and some of the properties attributed to these unique sparse codes are analyzed. In addition to obtaining these codes, a strong classifier is designed and implemented to boost the results obtained. Evaluation with publicly available datasets shows that the proposed method outperforms other state-of-the-art results in image classification. The final part of the thesis deals with image denoising, with a novel approach to obtaining high quality denoised image patches using only a single image. A new technique is proposed to obtain highly correlated image patches through sparse representation, which are then subjected to matrix completion to obtain high quality image patches. Experiments suggest that there may exist a structure within a noisy image which can be exploited for denoising through a low-rank constraint.
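The building block shared by all three applications is sparse coding over a dictionary; a minimal sketch using Orthogonal Matching Pursuit on a random dictionary (a stand-in for the raw-sampled or trained dictionaries used in the thesis):

```python
# Sketch: recover a sparse code for a signal (e.g., an image patch) over a
# dictionary via Orthogonal Matching Pursuit. Dictionary and signal are
# random stand-ins for the trained dictionaries and real patches.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))    # dictionary: 256 atoms of dimension 64
D /= np.linalg.norm(D, axis=0)        # unit-norm atoms

x0 = np.zeros(256)                    # ground-truth 5-sparse code
x0[rng.choice(256, 5, replace=False)] = rng.standard_normal(5)
y = D @ x0                            # observed signal

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5).fit(D, y)
print("recovered support:", np.flatnonzero(omp.coef_))
```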
Contributors: Kulkarni, Naveen (Author) / Li, Baoxin (Thesis advisor) / Ye, Jieping (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)
Created: 2011