Search Content

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located…

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

ContributorsLeaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Angle of incidence and power degradation analysis of photovoltaic modules

Description

Photovoltaic (PV) module nameplates typically provide the module's electrical characteristics at standard test conditions (STC). The STC conditions are: irradiance of 1000 W/m2, cell temperature of 25oC and sunlight spectrum at air mass 1.5. However, modules in the field experience a wide range of environmental conditions which affect their electrical…

Photovoltaic (PV) module nameplates typically provide the module's electrical characteristics at standard test conditions (STC). The STC conditions are: irradiance of 1000 W/m2, cell temperature of 25oC and sunlight spectrum at air mass 1.5. However, modules in the field experience a wide range of environmental conditions which affect their electrical characteristics and render the nameplate data insufficient in determining a module's overall, actual field performance. To make sound technical and financial decisions, designers and investors need additional performance data to determine the energy produced by modules operating under various field conditions. The angle of incidence (AOI) of sunlight on PV modules is one of the major parameters which dictate the amount of light reaching the solar cells. The experiment was carried out at the Arizona State University- Photovoltaic Reliability Laboratory (ASU-PRL). The data obtained was processed in accordance with the IEC 61853-2 model to obtain relative optical response of the modules (response which does not include the cosine effect). The results were then compared with theoretical models for air-glass interface and also with the empirical model developed by Sandia National Laboratories. The results showed that all modules with glass as the superstrate had identical optical response and were in agreement with both the IEC 61853-2 model and other theoretical and empirical models. The performance degradation of module over years of exposure in the field is dependent upon factors such as environmental conditions, system configuration, etc. Analyzing the degradation of power and other related performance parameters over time will provide vital information regarding possible degradation rates and mechanisms of the modules. An extensive study was conducted by previous ASU-PRL students on approximately 1700 modules which have over 13 years of hot- dry climatic field condition. An analysis of the results obtained in previous ASU-PRL studies show that the major degradation in crystalline silicon modules having glass/polymer construction is encapsulant discoloration (causing short circuit current drop) and solder bond degradation (causing fill factor drop due to series resistance increase). The power degradation for crystalline silicon modules having glass/glass construction was primarily attributed to encapsulant delamination (causing open-circuit voltage drop).

ContributorsVasantha Janakeeraman, Suryanarayana (Author) / Tamizhmani, Govindasamy (Thesis advisor) / Rogers, Bradley (Committee member) / Macia, Narciso (Committee member) / Arizona State University (Publisher)

Created2013

26+ year old photovoltaic power plant: degradation and reliability evaluation of crystalline silicon modules - north array

Description

The object of this study was a 26 year old residential Photovoltaic (PV) monocrystalline silicon (c-Si) power plant, called Solar One, built by developer John F. Long in Phoenix, Arizona (a hot-dry field condition). The task for Arizona State University Photovoltaic Reliability Laboratory (ASU-PRL) graduate students was to evaluate the…

The object of this study was a 26 year old residential Photovoltaic (PV) monocrystalline silicon (c-Si) power plant, called Solar One, built by developer John F. Long in Phoenix, Arizona (a hot-dry field condition). The task for Arizona State University Photovoltaic Reliability Laboratory (ASU-PRL) graduate students was to evaluate the power plant through visual inspection, electrical performance, and infrared thermography. The purpose of this evaluation was to measure and understand the extent of degradation to the system along with the identification of the failure modes in this hot-dry climatic condition. This 4000 module bipolar system was originally installed with a 200 kW DC output of PV array (17 degree fixed tilt) and an AC output of 175 kVA. The system was shown to degrade approximately at a rate of 2.3% per year with no apparent potential induced degradation (PID) effect. The power plant is made of two arrays, the north array and the south array. Due to a limited time frame to execute this large project, this work was performed by two masters students (Jonathan Belmont and Kolapo Olakonu) and the test results are presented in two masters theses. This thesis presents the results obtained on the north array and the other thesis presents the results obtained on the south array. The resulting study showed that PV module design, array configuration, vandalism, installation methods and Arizona environmental conditions have had an effect on this system's longevity and reliability. Ultimately, encapsulation browning, higher series resistance (potentially due to solder bond fatigue) and non-cell interconnect ribbon breakages outside the modules were determined to be the primary causes for the power loss.

ContributorsBelmont, Jonathan (Author) / Tamizhmani, Govindasamy (Thesis advisor) / Henderson, Mark (Committee member) / Rogers, Bradley (Committee member) / Arizona State University (Publisher)

Created2013

Potential induced degradation (PID) of pre-stressed photovoltaic modules: effect of glass surface conductivity disruption

Description

Potential induced degradation (PID) due to high system voltages is one of the major degradation mechanisms in photovoltaic (PV) modules, adversely affecting their performance due to the combined effects of the following factors: system voltage, superstrate/glass surface conductivity, encapsulant conductivity, silicon nitride anti-reflection coating property and interface property (glass/encapsulant; encapsulant/cell;…

Potential induced degradation (PID) due to high system voltages is one of the major degradation mechanisms in photovoltaic (PV) modules, adversely affecting their performance due to the combined effects of the following factors: system voltage, superstrate/glass surface conductivity, encapsulant conductivity, silicon nitride anti-reflection coating property and interface property (glass/encapsulant; encapsulant/cell; encapsulant/backsheet). Previous studies carried out at ASU's Photovoltaic Reliability Laboratory (ASU-PRL) showed that only negative voltage bias (positive grounded systems) adversely affects the performance of commonly available crystalline silicon modules. In previous studies, the surface conductivity of the glass surface was obtained using either conductive carbon layer extending from the glass surface to the frame or humidity inside an environmental chamber. This thesis investigates the influence of glass surface conductivity disruption on PV modules. In this study, conductive carbon was applied only on the module's glass surface without extending to the frame and the surface conductivity was disrupted (no carbon layer) at 2cm distance from the periphery of frame inner edges. This study was carried out under dry heat at two different temperatures (60 °C and 85 °C) and three different negative bias voltages (-300V, -400V, and -600V). To replicate closeness to the field conditions, half of the selected modules were pre-stressed under damp heat for 1000 hours (DH 1000) and the remaining half under 200 hours of thermal cycling (TC 200). When the surface continuity was disrupted by maintaining a 2 cm gap from the frame to the edge of the conductive layer, as demonstrated in this study, the degradation was found to be absent or negligibly small even after 35 hours of negative bias at elevated temperatures. This preliminary study appears to indicate that the modules could become immune to PID losses if the continuity of the glass surface conductivity is disrupted at the inside boundary of the frame. The surface conductivity of the glass, due to water layer formation in a humid condition, close to the frame could be disrupted just by applying a water repelling (hydrophobic) but high transmittance surface coating (such as Teflon) or modifying the frame/glass edges with water repellent properties.

ContributorsTatapudi, Sai Ravi Vasista (Author) / Tamizhmani, Govindasamy (Thesis advisor) / Srinivasan, Devarajan (Committee member) / Rogers, Bradley (Committee member) / Arizona State University (Publisher)

Created2012

26+ year old photovoltaic power plant: degradation and reliability evaluation of crystalline silicon modules - south array

Description

ABSTRACT As the use of photovoltaic (PV) modules in large power plants continues to increase globally, more studies on degradation, reliability, failure modes, and mechanisms of field aged modules are needed to predict module life expectancy based on accelerated lifetime testing of PV modules. In this work, a 26+ year…

ABSTRACT As the use of photovoltaic (PV) modules in large power plants continues to increase globally, more studies on degradation, reliability, failure modes, and mechanisms of field aged modules are needed to predict module life expectancy based on accelerated lifetime testing of PV modules. In this work, a 26+ year old PV power plant in Phoenix, Arizona has been evaluated for performance, reliability, and durability. The PV power plant, called Solar One, is owned and operated by John F. Long's homeowners association. It is a 200 kWdc, standard test conditions (STC) rated power plant comprised of 4000 PV modules or frameless laminates, in 100 panel groups (rated at 175 kWac). The power plant is made of two center-tapped bipolar arrays, the north array and the south array. Due to a limited time frame to execute this large project, this work was performed by two masters students (Jonathan Belmont and Kolapo Olakonu) and the test results are presented in two masters theses. This thesis presents the results obtained on the south array and the other thesis presents the results obtained on the north array. Each of these two arrays is made of four sub arrays, the east sub arrays (positive and negative polarities) and the west sub arrays (positive and negative polarities), making up eight sub arrays. The evaluation and analyses of the power plant included in this thesis consists of: visual inspection, electrical performance measurements, and infrared thermography. A possible presence of potential induced degradation (PID) due to potential difference between ground and strings was also investigated. Some installation practices were also studied and found to contribute to the power loss observed in this investigation. The power output measured in 2011 for all eight sub arrays at STC is approximately 76 kWdc and represents a power loss of 62% (from 200 kW to 76 kW) over 26+ years. The 2011 measured power output for the four south sub arrays at STC is 39 kWdc and represents a power loss of 61% (from 100 kW to 39 kW) over 26+ years. Encapsulation browning and non-cell interconnect ribbon breakages were determined to be the primary causes for the power loss.

ContributorsOlakonu, Kolapo (Author) / Tamizhmani, Govindasamy (Thesis advisor) / Srinivasan, Devarajan (Committee member) / Rogers, Bradley (Committee member) / Arizona State University (Publisher)

Created2012

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

Description

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties like…

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties like data with relevant consumption information but stored in different format and insufficient data about project attributes to interpret consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the preprocessing on data is completed, different data mining techniques like clustering is applied to find projects which involve resources of similar skillsets and which involve similar complexities and size. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Then project characteristics are identified which generate this diversity in headcounts and skillsets. These characteristics are not currently contained in the data base and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match the product technical features with the resource requirement for projects in the past as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross validation of the training data as the past project execution are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results and the tool is applied to forecast some future projects' resource demand.

ContributorsBhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

Feasibility of energy harvesting using a piezoelectric tire

Description

While the piezoelectric effect has been around for some time, it has only recently caught interest as a potential sustainable energy harvesting device. Piezoelectric energy harvesting has been developed for shoes and panels, but has yet to be integrated into a marketable bicycle tire. For this thesis, the development and…

While the piezoelectric effect has been around for some time, it has only recently caught interest as a potential sustainable energy harvesting device. Piezoelectric energy harvesting has been developed for shoes and panels, but has yet to be integrated into a marketable bicycle tire. For this thesis, the development and feasibility of a piezoelectric tire was done. This includes the development of a circuit that incorporates piezoceramic elements, energy harvesting circuitry, and an energy storage device. A single phase circuit was designed using an ac-dc diode rectifier. An electrolytic capacitor was used as the energy storage device. A financial feasibility was also done to determine targets for manufacturing cost and sales price. These models take into account market trends for high performance tires, economies of scale, and the possibility of government subsidies. This research will help understand the potential for the marketability of a piezoelectric energy harvesting tire that can create electricity for remote use. This study found that there are many obstacles that must be addressed before a piezoelectric tire can be marketed to the general public. The power output of this device is miniscule compared to an alkaline battery. In order for this device to approach the power output of an alkaline battery the weight of the device would also become an issue. Additionally this device is very costly compared to the average bicycle tire. Lastly, this device is extreme fragile and easily broken. In order for this device to become marketable the issues of power output, cost, weight, and durability must all be successfully overcome.

ContributorsMalotte, Christopher (Author) / Madakannan, Arunachalanadar (Thesis advisor) / Srinivasan, Devarajan (Committee member) / Rogers, Bradley (Committee member) / Arizona State University (Publisher)

Created2012

IISS a framework to influence individuals through social signals on a social network

Description

Contemporary online social platforms present individuals with social signals in the form of news feed on their peers' activities. On networks such as Facebook, Quora, network operator decides how that information is shown to an individual. Then the user, with her own interests and resource constraints selectively acts on a…

Contemporary online social platforms present individuals with social signals in the form of news feed on their peers' activities. On networks such as Facebook, Quora, network operator decides how that information is shown to an individual. Then the user, with her own interests and resource constraints selectively acts on a subset of items presented to her. The network operator again, shows that activity to a selection of peers, and thus creating a behavioral loop. That mechanism of interaction and information flow raises some very interesting questions such as: can network operator design social signals to promote a particular activity like sustainability, public health care awareness, or to promote a specific product? The focus of my thesis is to answer that question. In this thesis, I develop a framework to personalize social signals for users to guide their activities on an online platform. As the result, we gradually nudge the activity distribution on the platform from the initial distribution p to the target distribution q. My work is particularly applicable to guiding collaborations, guiding collective actions, and online advertising. In particular, I first propose a probabilistic model on how users behave and how information flows on the platform. The main part of this thesis after that discusses the Influence Individuals through Social Signals (IISS) framework. IISS consists of four main components: (1) Learner: it learns users' interests and characteristics from their historical activities using Bayesian model, (2) Calculator: it uses gradient descent method to compute the intermediate activity distributions, (3) Selector: it selects users who can be influenced to adopt or drop specific activities, (4) Designer: it personalizes social signals for each user. I evaluate the performance of IISS framework by simulation on several network topologies such as preferential attachment, small world, and random. I show that the framework gradually nudges users' activities to approach the target distribution. I use both simulation and mathematical method to analyse convergence properties such as how fast and how close we can approach the target distribution. When the number of activities is 3, I show that for about 45% of target distributions, we can achieve KL-divergence as low as 0.05. But for some other distributions KL-divergence can be as large as 0.5.

ContributorsLe, Tien D (Author) / Sundaram, Hari (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2014

Utility of considering multiple alternative rectifications in data cleaning

Description

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many…

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise pre–supposes that it is in fact possible for the cleaning process to uniquely recover the clean versions of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates of a dirty tuple. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. For example, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would they affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates per each dirty tuple, along with the probability that they are the correct cleaned version. I represent these alternatives as tuples in a tuple disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe–PDB) is compared to a deterministic reconstruction (called BayesWipe–DET)—where the most likely clean candidate for each tuple is chosen, and the rest of the alternatives discarded.

ContributorsRihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created2013

ASU Electronic Theses and Dissertations