This collection includes most of the ASU Theses and Dissertations from 2011 to the present. ASU Theses and Dissertations are available in downloadable PDF format; however, a small percentage of items are under embargo. Information about the dissertations/theses includes degree information, committee members, an abstract, and supporting data or media.

In addition to the electronic theses in the ASU Digital Repository, ASU Theses and Dissertations can also be found in the ASU Library Catalog.

Dissertations and Theses granted by Arizona State University are archived and made available through a joint effort of the ASU Graduate College and the ASU Libraries. For more information or questions about this collection, visit the Digital Repository ETD Library Guide or contact the ASU Graduate College at gradformat@asu.edu.

Displaying 1–10 of 97
Description
The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limits on tweet length, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also on additional information from the Twitter ecosystem, which consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method, RAProp, on 16 million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision of baseline Twitter Search and achieves higher precision than the current state-of-the-art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches I propose, as well as an external evaluation in comparison to the current state of the art.
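As a rough illustration of the reputation-propagation idea sketched in this abstract (not the thesis's actual RAProp algorithm), the following snippet propagates per-tweet scores over a content-agreement graph; all weights and scores are invented:

```python
# Hypothetical sketch: tweets start with a reputation score and repeatedly
# share reputation with tweets whose content agrees with theirs.
import numpy as np

def propagate_reputation(agreement, initial_rep, damping=0.85, iters=50):
    """agreement: (n, n) row-stochastic matrix of content-similarity weights;
    initial_rep: (n,) vector of per-tweet reputation scores."""
    rep = initial_rep.copy()
    for _ in range(iters):
        # Blend each tweet's own score with scores flowing in from agreeing tweets.
        rep = (1 - damping) * initial_rep + damping * agreement.T @ rep
    return rep

# Toy example: three tweets, the first two agree strongly with each other.
agreement = np.array([[0.0, 0.9, 0.1],
                      [0.9, 0.0, 0.1],
                      [0.5, 0.5, 0.0]])
agreement = agreement / agreement.sum(axis=1, keepdims=True)  # row-normalize
print(propagate_reputation(agreement, np.array([0.6, 0.5, 0.2])))
```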
Contributors: Ravikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
This thesis addresses the issue of making an economic case for energy storage in power systems. Bulk energy storage has often been suggested for large-scale electric power systems in order to levelize load; store energy when it is inexpensive and discharge it when it is expensive; potentially defer transmission and generation expansion; and provide for generation reserve margins. As renewable energy resource penetration increases, the uncertainty and variability of wind and solar may be alleviated by bulk energy storage technologies. The quadratic programming function in MATLAB is used to simulate an economic dispatch that includes energy storage. A program is created that utilizes quadratic programming to analyze various cases using a 2010 summer peak load from the Arizona transmission system, part of the Western Electricity Coordinating Council (WECC). The MATLAB program is used first to test the Arizona test bed with a low level of energy storage, to study how the storage power limit affects several optimization outputs such as the system-wide operating cost. Very high levels of energy storage are then added to see how they affect peak shaving, load factor, and other system applications. Finally, various constraint relaxations are made to analyze why the applications tested eventually approach a constant value. This research illustrates the use of energy storage to help minimize the system-wide generator operating cost by "shaving" energy off of the peak demand.
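The dispatch formulation described here can be sketched as a small quadratic program. The snippet below uses SciPy rather than MATLAB's quadprog, and all loads, costs, and limits are illustrative placeholders, not the Arizona test bed data:

```python
# Minimal storage-aware economic dispatch as a quadratic program.
import numpy as np
from scipy.optimize import minimize

load = np.array([900.0, 1200.0, 1500.0])   # MW demand over three periods
a = np.array([0.010, 0.015])               # quadratic cost coefficients ($/MW^2 h)
b = np.array([20.0, 25.0])                 # linear cost coefficients ($/MWh)
p_storage = 200.0                          # storage power limit (MW)

# Variables: g[t, unit] flattened (6 values), then storage discharge s[t]
# for each period (negative = charging).
def cost(x):
    g = x[:6].reshape(3, 2)
    return float(np.sum(a * g**2 + b * g))

cons = [{"type": "eq", "fun": lambda x, t=t: x[t*2] + x[t*2+1] + x[6+t] - load[t]}
        for t in range(3)]
cons.append({"type": "eq", "fun": lambda x: np.sum(x[6:9])})  # storage energy-neutral
bounds = [(0, 1000)] * 6 + [(-p_storage, p_storage)] * 3

res = minimize(cost, np.r_[np.full(6, 500.0), np.zeros(3)],
               bounds=bounds, constraints=cons)
print(res.x[6:9], res.fun)  # storage schedule "shaves" the 1500 MW peak
```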
Contributors: Ruggiero, John (Author) / Heydt, Gerald T. (Thesis advisor) / Datta, Rajib (Committee member) / Karady, George G. (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Underground cables have been widely used in big cities because they reduce visual impact and the disturbances caused by bad weather (wind, ice, snow, and lightning strikes). Additionally, placing power lines underground can reduce maintenance costs. The cable rating calculation is the most critical part of designing underground cable construction and installation. In this thesis, three contributions to the study of cable ampacity are made. First, an analytical method for rating underground cables is presented. Second, this research develops the steady-state and transient ratings for the Salt River Project (SRP) 69 kV underground system using the commercial software CYMCAP for several typical substations. Third, to find an alternative way to predict cable ratings, three regression models are built, and the residual plots and mean square errors for the three methods are analyzed. The conclusion is drawn that the nonlinear regression model predicts cable ratings with sufficient accuracy for SRP's typical installations.
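A hedged sketch of the regression idea: fitting a nonlinear ampacity model to a handful of observations and checking the mean square error, as the thesis does for its three models. The model form and data here are hypothetical:

```python
# Fit ampacity as a nonlinear function of soil temperature and burial depth.
import numpy as np
from scipy.optimize import curve_fit

def rating_model(X, c0, c1, c2):
    soil_temp, depth = X
    return c0 - c1 * soil_temp - c2 * np.sqrt(depth)

soil_temp = np.array([20.0, 25.0, 30.0, 35.0, 40.0])        # deg C
depth     = np.array([0.9, 1.0, 1.2, 1.1, 1.0])             # m
ampacity  = np.array([810.0, 770.0, 720.0, 690.0, 655.0])   # A, illustrative

params, _ = curve_fit(rating_model, (soil_temp, depth), ampacity)
pred = rating_model((soil_temp, depth), *params)
print("MSE:", np.mean((ampacity - pred) ** 2))  # residual check
```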
Contributors: Wang, Tong (Author) / Tylavsky, Daniel (Thesis advisor) / Karady, George G. (Committee member) / Holbert, Keith E. (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.
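The rich-feature-set CRF approach can be illustrated with a minimal sketch. This is not BANNER itself; it assumes the third-party sklearn-crfsuite package, and the tiny corpus and tags are invented:

```python
# Each token becomes a dictionary of surface features, and a linear-chain
# CRF labels tokens with B/I/O entity tags.
import sklearn_crfsuite

def token_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_title": w.istitle(),
        "has_digit": any(c.isdigit() for c in w),
        "suffix3": w[-3:],
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

sents = [["BRCA1", "mutations", "increase", "cancer", "risk"]]
tags  = [["B-GENE", "O", "O", "B-DISEASE", "O"]]
X = [[token_features(s, i) for i in range(len(s))] for s in sents]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, tags)
print(crf.predict(X))
```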
Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B. (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
This thesis concerns the flashover issue of substation insulators operating in a polluted environment. The outdoor insulation equipment used in the power delivery infrastructure encounters different types of pollutants due to varied environmental conditions. Various methods have been developed by manufacturers and researchers to mitigate the flashover problem. The application of Room Temperature Vulcanized (RTV) silicone rubber is one such favorable method, as it can be applied over already installed units. Field experience has already shown that RTV silicone rubber coated insulators have a lower flashover probability than uncoated insulators. The scope of this research is to quantify the improvement in flashover performance. Artificial contamination tests were carried out on station post insulators to assess their performance. A factorial experiment design was used to model the flashover performance; the formulation included the severity of contamination and the leakage distance of the insulator samples. Regression analysis was used to develop a mathematical model from the experimental data. The main conclusion drawn from the study is that the RTV coated insulators withstood much higher levels of contamination even when the coating had lost its hydrophobicity. This improvement in flashover performance was found to be in the range of 20-40%. Much better flashover performance was observed when the coating recovered its hydrophobicity. It was also seen that the adhesion of the coating remained excellent even after many tests involving substantial discharge activity.
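The factorial-design regression can be sketched as an ordinary least-squares fit with an interaction term. The contamination severities, leakage distances, and flashover voltages below are invented for illustration:

```python
# Model flashover voltage from contamination severity (ESDD) and leakage
# distance, with an interaction term, via a least-squares fit.
import numpy as np

esdd    = np.array([0.05, 0.05, 0.10, 0.10, 0.20, 0.20])     # mg/cm^2
leakage = np.array([1.0,  1.5,  1.0,  1.5,  1.0,  1.5])      # m
v_fo    = np.array([95.0, 140.0, 80.0, 120.0, 65.0, 100.0])  # kV, illustrative

# Design matrix: intercept, ESDD, leakage distance, interaction.
A = np.column_stack([np.ones_like(esdd), esdd, leakage, esdd * leakage])
coef, *_ = np.linalg.lstsq(A, v_fo, rcond=None)
print(dict(zip(["b0", "esdd", "leakage", "interaction"], coef.round(2))))
```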
Contributors: Gholap, Vipul (Author) / Gorur, Ravi S. (Thesis advisor) / Karady, George G. (Committee member) / Ayyanar, Raja (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects, by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties such as relevant consumption data stored in inconsistent formats, and insufficient data about project attributes with which to interpret the consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once preprocessing is completed, data mining techniques such as clustering are applied to find projects that involve resources with similar skill sets and that have similar complexities and sizes. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Project characteristics that generate this diversity in headcounts and skill sets are then identified. These characteristics are not currently contained in the database and are elicited from the managers of historical projects; this represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match product technical features with the resource requirements of past projects as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross-validation of the training data, as past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast the resource demand of some future projects.
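A minimal sketch of the two-stage pipeline described above, using scikit-learn; the project features, headcounts, and cluster count are placeholders:

```python
# Cluster past projects into "resource utilization templates", then fit a
# cross-validated linear model per cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
features = rng.random((30, 3))                 # e.g., size, complexity, novelty
headcount = 10 * features[:, 0] + 5 * features[:, 1] + rng.normal(0, 0.5, 30)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
for k in range(3):
    mask = labels == k
    scores = cross_val_score(LinearRegression(), features[mask], headcount[mask],
                             cv=3, scoring="neg_mean_absolute_error")
    print(f"cluster {k}: MAE ~ {-scores.mean():.2f}")
```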
Contributors: Bhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
Contemporary online social platforms present individuals with social signals in the form of news feeds on their peers' activities. On networks such as Facebook and Quora, the network operator decides how that information is shown to an individual. The user, with her own interests and resource constraints, then selectively acts on a subset of the items presented to her. The network operator, in turn, shows that activity to a selection of peers, creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions, such as: can the network operator design social signals to promote a particular activity, like sustainability or public health care awareness, or to promote a specific product? The focus of my thesis is to answer that question. In this thesis, I develop a framework to personalize social signals for users to guide their activities on an online platform. As a result, we gradually nudge the activity distribution on the platform from the initial distribution p to the target distribution q. My work is particularly applicable to guiding collaborations, guiding collective actions, and online advertising. In particular, I first propose a probabilistic model of how users behave and how information flows on the platform. The main part of this thesis then discusses the Influence Individuals through Social Signals (IISS) framework. IISS consists of four main components: (1) Learner: it learns users' interests and characteristics from their historical activities using a Bayesian model; (2) Calculator: it uses the gradient descent method to compute the intermediate activity distributions; (3) Selector: it selects users who can be influenced to adopt or drop specific activities; (4) Designer: it personalizes social signals for each user. I evaluate the performance of the IISS framework by simulation on several network topologies, such as preferential attachment, small world, and random. I show that the framework gradually nudges users' activities toward the target distribution. I use both simulation and mathematical analysis to study convergence properties, such as how fast and how closely we can approach the target distribution. When the number of activities is 3, I show that for about 45% of target distributions we can achieve a KL-divergence as low as 0.05, but for some other distributions the KL-divergence can be as large as 0.5.
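The nudging objective can be illustrated with a toy computation: measure the distance from p to q with KL-divergence and take small gradient steps on p. This is a simplification of the IISS Calculator; the step size and simplex projection are assumptions:

```python
# Nudge an activity distribution p toward a target q by gradient descent
# on KL(p || q), projecting back onto the probability simplex each step.
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.6, 0.3, 0.1])   # current platform activity distribution
q = np.array([0.3, 0.3, 0.4])   # target distribution

for step in range(200):
    grad = np.log(p / q) + 1.0            # d KL / d p_i
    p = np.clip(p - 0.01 * grad, 1e-6, None)
    p /= p.sum()                          # crude projection onto the simplex

print(round(kl(p, q), 4))  # KL shrinks toward 0 as p approaches q
```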
Contributors: Le, Tien D. (Author) / Sundaram, Hari (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Renewable portfolio standards prescribe the penetration of high amounts of renewable energy sources (RES) that may change the structure of existing power systems. The load growth and changes in power flow caused by RES integration may create requirements for new available transmission capability and upgrades of existing transmission paths. Construction difficulties for new transmission lines can become a problem in certain locations. Increasing transmission line thermal ratings by reconductoring with High Temperature Low Sag (HTLS) conductors is a comparatively new technology introduced to transmission expansion. A special design permits HTLS conductors to operate at high temperatures (e.g., 200 °C), thereby allowing passage of higher current. The higher temperature capability increases the steady-state and emergency thermal ratings of the transmission line. The main disadvantage of HTLS technology is high cost, which may place special emphasis on a thorough cost-to-benefit analysis of HTLS implementation. Increased transmission losses in HTLS conductors due to higher current may be a further disadvantage that reduces the attractiveness of this method. The studies described in this thesis evaluate the expenditures for transmission line reconductoring using HTLS and the consequent benefits obtained from the potential decrease in operating cost for thermally limited transmission systems. The studies consider load growth and the penetration of distributed renewable energy sources according to the renewable portfolio standards for power systems. An evaluation of the payback period is suggested to assess the cost-to-benefit ratio of HTLS upgrades. The thesis also considers the probabilistic nature of transmission upgrades: the well-known Chebyshev inequality is discussed with an application to transmission upgrades and is proposed to calculate the minimum payback period obtained from the upgrades of certain transmission lines. The cost-to-benefit evaluation of HTLS upgrades is performed using a 225 bus equivalent of the 2012 summer peak Arizona portion of the Western Electricity Coordinating Council (WECC).
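A back-of-the-envelope sketch of the payback evaluation with a Chebyshev bound on uncertain annual savings; all dollar figures are placeholders, not results from the WECC study:

```python
# With uncertain annual savings (mean mu, std sigma), Chebyshev's inequality
# P(|S - mu| >= k*sigma) <= 1/k^2 bounds how often savings fall far below the
# mean, giving a conservative high-confidence payback estimate.
import math

upgrade_cost = 12.0e6       # $ cost of HTLS reconductoring (placeholder)
mu, sigma = 2.0e6, 0.5e6    # mean and std of annual savings ($, placeholder)
confidence = 0.90

k = math.sqrt(1.0 / (1.0 - confidence))   # tail mass 1/k^2 = 1 - confidence
worst_case_savings = mu - k * sigma       # exceeded with >= 90% probability
print("expected payback:", upgrade_cost / mu, "years")
print("payback at 90% confidence:", upgrade_cost / worst_case_savings, "years")
```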
Contributors: Tokombayev, Askhat (Author) / Heydt, Gerald T. (Thesis advisor) / Sankar, Lalitha (Committee member) / Karady, George G. (Committee member) / Arizona State University (Publisher)
Created: 2014
Description
Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean version of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates for a dirty tuple: choosing any one of them discards the other two equally plausible alternatives. An appealing alternative that avoids such information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. For example, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would it affect the precision of query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe, which has the capability of producing multiple clean candidates for each dirty tuple, along with the probability that each is the correct cleaned version. I represent these alternatives as tuples in a tuple-disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe–PDB) is compared to a deterministic reconstruction (called BayesWipe–DET), in which the most likely clean candidate for each tuple is chosen and the rest of the alternatives are discarded.
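A conceptual sketch of a tuple-disjoint probabilistic database and a selection query over it; the schema, values, and probabilities are invented, and BayesWipe and Mystiq are not reproduced here:

```python
# Each dirty tuple maps to several disjoint clean candidates with
# probabilities; a selection query returns, per base tuple, the total
# probability mass of candidates that match the predicate.
from collections import defaultdict

# tuple_id -> list of (clean_candidate, probability); candidates are disjoint.
pdb = {
    "t1": [({"city": "Phoenix"}, 0.7), ({"city": "Peoria"}, 0.3)],
    "t2": [({"city": "Phoenix"}, 0.4), ({"city": "Tempe"}, 0.6)],
}

def select_prob(pdb, predicate):
    """Probability that each base tuple satisfies the predicate."""
    result = defaultdict(float)
    for tid, candidates in pdb.items():
        for candidate, p in candidates:
            if predicate(candidate):
                result[tid] += p
    return dict(result)

print(select_prob(pdb, lambda r: r["city"] == "Phoenix"))
# {'t1': 0.7, 't2': 0.4} -- keeping alternatives preserves recall that a
# deterministic reconstruction (one candidate per tuple) would lose.
```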
Contributors: Rihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)
Created: 2013
Description
The future grid will face challenges in meeting the increased power demand of consumers, and various solutions have been studied to address this issue. One alternative for realizing increased power flow in the grid is to use High Temperature Low Sag (HTLS) conductors, since they fulfill the essential criteria of low sag and good material performance with temperature. HTLS conductors like Aluminum Conductor Composite Reinforced (ACCR) and Aluminum Conductor Carbon Composite (ACCC) are expected to face high operating temperatures of 150–200 °C in order to achieve the desired increase in power flow. Therefore, it is imperative to characterize the material performance of these conductors with temperature. The work presented in this thesis addresses the characterization of carbon composite core based and metal matrix core based HTLS conductors. The thesis focuses on the variation of tensile strength of the carbon composite core with temperature and on the temperature rise of HTLS conductors due to fault currents cleared by backup protection. Dynamic Mechanical Analysis (DMA) was used to quantify the loss in storage modulus of carbon composite cores with temperature; it has previously been shown in the literature that storage modulus is correlated with the tensile strength of the composite. Current-temperature relationships of HTLS conductors were determined using the IEEE 738-2006 standard, and the temperature rise of these conductors due to fault currents was also simulated. All simulations were performed using the Microsoft Visual C++ suite. Tensile testing of the metal matrix core was also performed. Results of DMA on carbon composite cores show that the storage modulus, and hence the tensile strength, decreases rapidly in the temperature range of intended use. DMA on composite cores subjected to heat treatment was conducted to investigate any changes in the variation of the storage modulus curves. The experiments also indicate that carbon composite cores subjected to temperatures at or above 250 °C can suffer permanent loss of mechanical properties, including tensile strength. The fault current temperature analysis of carbon composite based conductors reveals that fault currents eventually cleared by backup protection in the event of primary protection failure can damage the fiber-matrix interface.
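A rough adiabatic estimate of conductor temperature rise under fault current, in the spirit of the fault-current analysis above. The full IEEE 738 heat balance is more involved; the material values below are generic aluminum figures, not the thesis's data:

```python
# Short-time fault heating, ignoring cooling: I^2 * R * t = m * c_p * dT.
i_fault = 20_000.0      # A, fault current held until backup protection clears
t_clear = 1.0           # s, backup clearing time
r_per_m = 7.0e-5        # ohm/m, conductor AC resistance
mass_per_m = 1.6        # kg/m, conductor mass
c_p = 900.0             # J/(kg*K), specific heat of aluminum

energy = i_fault**2 * r_per_m * t_clear        # Joule heating per metre (J/m)
delta_t = energy / (mass_per_m * c_p)          # adiabatic temperature rise (K)
print(f"temperature rise ~ {delta_t:.0f} K")   # ~19 K for these numbers
```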
Contributors: Banerjee, Koustubh (Author) / Gorur, Ravi (Committee member) / Karady, George G. (Committee member) / Ayyanar, Raja (Committee member) / Arizona State University (Publisher)
Created: 2014