RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter makes improved assessment of the trustworthiness and relevance of tweets much more important for search. However, given the limits on tweet length, it is hard to extract ranking measures from a tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also on additional information from the Twitter ecosystem, which consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16 million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than the current state-of-the-art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as an external evaluation in comparison to the current state-of-the-art method.

Contributors: Ravikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2013

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. 
The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Application of 2-D digital image correlation (DIC) method to damage characterization of cementitious composites under dynamic tensile loads

Description

The main objective of this study is to investigate the mechanical behaviour of cementitious-based composites subjected to dynamic tensile loading, including the effects of strain rate, temperature, and the addition of short fibres. A fabric pullout model and a tension stiffening model based on a finite difference model, previously developed at Arizona State University, were used to help study the bonding mechanism between fibre and matrix and the phenomenon of tension stiffening due to the addition of fibres and textiles. Uniaxial tension tests were conducted on strain-hardening cement-based composites (SHCC) and textile reinforced concrete (TRC), with and without the addition of short fibres, at strain rates ranging from 25 s⁻¹ to 100 s⁻¹. Historical data from quasi-static tests of the same materials were used to demonstrate the effects of high strain rate, including increases in average tensile strength, strain capacity, and work-to-fracture. Polyvinyl alcohol (PVA), glass, and polypropylene were employed as reinforcements of concrete. A state-of-the-art Phantom v7 high-speed camera was set up to record video at a frame rate of 10,000 fps. A random speckle pattern was applied to the surface of the specimens for image analysis. An optical, non-contacting deformation measurement technique referred to as the digital image correlation (DIC) method was used to conduct the image analysis by tracking the displacement field through comparison between the reference image and the deformed images. DIC successfully obtained the full-field strain distribution and strain versus time responses, demonstrated the bonding mechanism from the perspective of the strain field, and corrected the stress-strain responses.

Contributors: Yao, Yiming (Author) / Mobasher, Barzin (Thesis advisor) / Rajan, Subramaniam D. (Committee member) / Neithalath, Narayanan (Committee member) / Arizona State University (Publisher)

Created: 2013

Probabilistic finite element analysis and design optimization for structural designs

Description

This study focuses on incorporating the probabilistic nature of material properties (Kevlar® 49) into the existing deterministic finite element analysis (FEA) of a fabric-based engine containment system through Monte Carlo simulations (MCS), and on implementing probabilistic analysis in engineering designs through Reliability Based Design Optimization (RBDO). First, the emphasis is on experimental data analysis focusing on probabilistic distribution models which characterize the randomness associated with the experimental data. The material properties of Kevlar® 49 are modeled using experimental data analysis and implemented along with an existing spiral modeling scheme (SMS) and user defined constitutive model (UMAT) for fabric-based engine containment simulations in LS-DYNA. MCS of the model are performed to observe the failure pattern and exit velocities of the models. The solutions are then compared with NASA experimental tests and deterministic results. MCS with probabilistic material data give a better perspective on the results than a single deterministic simulation. The next part of the research is to implement the probabilistic material properties in engineering designs. The main aim of structural design is to obtain optimal solutions. However, in a deterministic optimization problem, even though the structures are cost effective, they become highly unreliable if the uncertainty that may be associated with the system (material properties, loading, etc.) is not represented or considered in the solution process. A reliable and optimal solution can be obtained by performing reliability optimization along with the deterministic optimization, which is RBDO. In the RBDO problem formulation, in addition to structural performance constraints, reliability constraints are also considered.
This part of the research starts with an introduction to reliability analysis, such as first order reliability analysis and second order reliability analysis, followed by the simulation techniques that are performed to obtain the probability of failure and reliability of structures. Next, a decoupled RBDO procedure is proposed with a new reliability analysis formulation with sensitivity analysis, which is performed to remove the highly reliable constraints in the RBDO, thereby reducing the computational time and function evaluations. Finally, implementations of the reliability analysis concepts and RBDO in finite element 2D truss problems and a planar beam problem are presented and discussed.
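The abstract above mentions simulation techniques for obtaining the probability of failure. As a minimal sketch only, and not the thesis's LS-DYNA or RBDO implementation, the following estimates a failure probability by Monte Carlo sampling for a hypothetical linear limit state g = R - S with independent normal resistance R and load S. The function names, the distributions, and all numeric values are assumptions made for illustration. For this particular limit state the first order reliability result, Phi(-beta), is exact, so it can serve as a check on the sampled estimate.

```python
import math
import random


def failure_probability_mc(mu_r, sd_r, mu_s, sd_s, n_samples=200_000, seed=0):
    """Estimate P(R - S < 0) by sampling independent normal R (resistance) and S (load)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    failures = 0
    for _ in range(n_samples):
        resistance = rng.gauss(mu_r, sd_r)
        load = rng.gauss(mu_s, sd_s)
        if resistance - load < 0.0:  # limit state g = R - S violated
            failures += 1
    return failures / n_samples


def failure_probability_exact(mu_r, sd_r, mu_s, sd_s):
    """Closed-form P(R - S < 0) = Phi(-beta) for the same linear, normal limit state."""
    beta = (mu_r - mu_s) / math.sqrt(sd_r ** 2 + sd_s ** 2)  # reliability index
    return 0.5 * (1.0 + math.erf(-beta / math.sqrt(2.0)))  # standard normal CDF at -beta


if __name__ == "__main__":
    # Hypothetical values, chosen only to illustrate the comparison.
    print(failure_probability_mc(10.0, 1.5, 6.0, 1.5))
    print(failure_probability_exact(10.0, 1.5, 6.0, 1.5))
```

Sampling becomes expensive when the failure probability is very small, which is one reason analytical approximations such as first and second order reliability analysis are commonly used alongside it.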

Contributors: Deivanayagam, Arumugam (Author) / Rajan, Subramaniam D. (Thesis advisor) / Mobasher, Barzin (Committee member) / Neithalath, Narayanan (Committee member) / Arizona State University (Publisher)

Created: 2012

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

Description

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties such as relevant consumption information being stored in different formats and insufficient data about project attributes to interpret the consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the preprocessing of the data is completed, data mining techniques such as clustering are applied to find projects which involve resources of similar skill sets and which are of similar complexity and size. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Then the project characteristics which generate this diversity in headcounts and skill sets are identified. These characteristics are not currently contained in the database and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match the product technical features with the resource requirements of past projects as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross-validation of the training data, as the past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast the resource demand of some future projects.
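The abstract above describes linear regression with cross-validation on a small set of past projects. The sketch below shows one way that idea can look, under stated assumptions: a single hypothetical feature (a project complexity score), a single target (headcount), ordinary least squares, and leave-one-out cross-validation, which is a common choice when samples are few. The abstract does not say which cross-validation scheme or features were used, so every name and number here is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def loocv_mean_abs_error(xs, ys):
    """Leave-one-out cross-validation: refit without each project, then predict it."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        errors.append(abs(ys[i] - prediction))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical past projects: complexity score -> engineering headcount.
    complexity = [1.0, 2.0, 3.0, 4.0, 5.0]
    headcount = [3.2, 4.9, 7.1, 9.0, 10.8]
    print(fit_line(complexity, headcount))
    print(loocv_mean_abs_error(complexity, headcount))
```

Because each held-out project is predicted by a model that never saw it, the averaged error gives a less optimistic accuracy estimate than the fit on all projects.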

Contributors: Bhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2013

Low velocity impact properties of sandwich insulated panels with textile-reinforced concrete skin and aerated concrete core

Description

The main objective of this study is to develop an innovative system in the form of a sandwich panel type composite with textile reinforced skins and an aerated concrete core. Existing theoretical concepts along with extensive experimental investigations were utilized to characterize the behavior of cement-based systems in the presence of individual fibers and textile yarns. Part of this thesis is based on a material model developed at Arizona State University to simulate experimental flexural response and back-calculate tensile response. This concept is based on a constitutive law consisting of a tri-linear tension model with residual strength and a bilinear elastic perfectly plastic compression stress-strain model. This parametric model was used to characterize Textile Reinforced Concrete (TRC) with aramid, carbon, alkali resistant glass, and polypropylene textiles, as well as hybrid systems of aramid and polypropylene. The same material model was also used to characterize long term durability issues with glass fiber reinforced concrete (GFRC). Historical data on the effect of temperature on the aging of GFRC composites were used. An experimental study was conducted to understand the behavior of aerated concrete systems under high strain rate impact loading. The test setup was modeled on a free fall drop of an instrumented hammer using a three point bending configuration. Two types of aerated concrete, autoclaved aerated concrete (AAC) and polymeric fiber-reinforced aerated concrete (FRAC), were tested and compared in terms of their impact behavior. The effect of impact energy on the mechanical properties was investigated for various drop heights and different specimen sizes. Both materials showed similar flexural load carrying capacity under impact; however, the flexural toughness of fiber-reinforced aerated concrete was found to be several degrees higher in magnitude than that provided by plain autoclaved aerated concrete.
Effect of specimen size and drop height on the impact response of AAC and FRAC was studied and discussed. Results obtained were compared to the performance of sandwich beams with AR glass textile skins with aerated concrete core under similar impact conditions. After this extensive study it was concluded that this type of sandwich composite could be effectively used in low cost sustainable infrastructure projects.

Contributors: Dey, Vikram (Author) / Mobasher, Barzin (Thesis advisor) / Rajan, Subramaniam D. (Committee member) / Neithalath, Narayanan (Committee member) / Arizona State University (Publisher)

Created: 2012

IISS: a framework to influence individuals through social signals on a social network

Description

Contemporary online social platforms present individuals with social signals in the form of news feeds on their peers' activities. On networks such as Facebook and Quora, the network operator decides how that information is shown to an individual. The user, with her own interests and resource constraints, then selectively acts on a subset of the items presented to her. The network operator, in turn, shows that activity to a selection of peers, thus creating a behavioral loop. This mechanism of interaction and information flow raises some very interesting questions, such as: can the network operator design social signals to promote a particular activity like sustainability or public health care awareness, or to promote a specific product? The focus of my thesis is to answer that question. In this thesis, I develop a framework to personalize social signals for users to guide their activities on an online platform. As a result, the activity distribution on the platform is gradually nudged from the initial distribution p to the target distribution q. My work is particularly applicable to guiding collaborations, guiding collective actions, and online advertising. In particular, I first propose a probabilistic model of how users behave and how information flows on the platform. The main part of this thesis then discusses the Influence Individuals through Social Signals (IISS) framework. IISS consists of four main components: (1) Learner: it learns users' interests and characteristics from their historical activities using a Bayesian model, (2) Calculator: it uses a gradient descent method to compute the intermediate activity distributions, (3) Selector: it selects users who can be influenced to adopt or drop specific activities, (4) Designer: it personalizes social signals for each user. I evaluate the performance of the IISS framework by simulation on several network topologies such as preferential attachment, small world, and random.
I show that the framework gradually nudges users' activities to approach the target distribution. I use both simulation and mathematical methods to analyse convergence properties such as how fast and how close we can approach the target distribution. When the number of activities is 3, I show that for about 45% of target distributions, we can achieve a KL-divergence as low as 0.05. For some other distributions, however, the KL-divergence can be as large as 0.5.
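The closeness figures quoted above are KL-divergences between activity distributions. As a small illustration, the sketch below computes KL(p || q) in nats for two discrete distributions over the same three activities. The direction of the divergence, the logarithm base, and the example distributions are assumptions; the abstract does not state which convention the thesis uses.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same activities."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of activities")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with p_i = 0 contribute nothing by convention
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


if __name__ == "__main__":
    # Hypothetical distributions over three activities.
    current = [0.5, 0.3, 0.2]
    target = [0.4, 0.4, 0.2]
    print(kl_divergence(current, target))
```

The value is zero only when the two distributions are identical, which is why it can be read as a measure of how far the platform's activity distribution still is from the target.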

Contributors: Le, Tien D (Author) / Sundaram, Hari (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2014

NMR studies of MRI contrast agents and cementitous materials

Description

Nuclear magnetic resonance (NMR) is an important phenomenon involving nuclear magnetic moments in a magnetic field, which can provide much information about a wide range of materials, including their chemical composition, chemical environments and nuclear spin interactions. The NMR spectrometer has been extensively developed and used in many areas of research. In this thesis, studies in two different areas using NMR are presented. First, a new kind of nanoparticle, Gd(DTPA) intercalated layered double hydroxide (LDH), has been successfully synthesized in the laboratory of Prof. Dey in SEMTE at ASU. In Chapter II, the NMR relaxation studies of two types of LDH (Mg, Al-LDH and Zn, Al-LDH) are presented, and the results show that when they are intercalated with Gd(DTPA) they have a higher relaxivity than current commercial magnetic resonance imaging (MRI) contrast agents, such as DTPA in water solution. This material may therefore be useful as an MRI contrast agent. Several conditions were examined, such as nanoparticle size, pH and intercalation percentage, to determine the optimal relaxivity of this nanoparticle. Further NMR studies and simulations were conducted to provide an explanation for the high relaxivity. Second, fly ash is a kind of cementitious material which has been of great interest because, when activated by an alkaline solution, it exhibits the capability of replacing ordinary Portland cement as a concrete binder. However, the reaction of activated fly ash is not fully understood. In Chapter III, pore structure and NMR studies of activated fly ash using different activators, including NaOH and KOH (4M and 8M) and Na/K silicate, are presented. The pore structure, degree of order and proportion of different components in the reaction product were obtained, which reveal much about the reaction and makeup of the final product.

Contributors: Peng, Zihui (Author) / Marzke, Robert F (Thesis advisor) / Dey, Sandwip Kumar (Committee member) / Neithalath, Narayanan (Committee member) / Chamberlin, Ralph Vary (Committee member) / Mccartney, Martha Rogers (Committee member) / Arizona State University (Publisher)

Created: 2013

Preliminary structural design optimization of tall buildings using GS-USA Frame3D

Description

Tall buildings are spreading across the globe at an ever-increasing rate (www.ctbuh.org). The global number of buildings 200 m or more in height has risen from 286 to 602 in the last decade alone. The increasing complexity of building architecture poses unique challenges in the structural design of modern tall buildings. Hence, innovative structural systems need to be evaluated to create an economical design that satisfies multiple design criteria. Design using a traditional trial-and-error approach can be extremely time-consuming and the resultant design uneconomical. Thus, there is a need for an efficient numerical optimization tool that can explore and generate several design alternatives in the preliminary design phase, which can lead to a more desirable final design. In this study, we present the details of a tool that can be very useful in preliminary design optimization: finite element modeling, design optimization, translating design code requirements into components of the FE and design optimization models, and pre- and post-processing to verify the veracity of the model. Emphasis is placed on the development and deployment of various FE models (static, modal and dynamic analyses; linear, beam and plate/shell finite elements), design optimization problem formulation (sizing, shape, topology and material selection optimization) and numerical optimization tools (gradient-based and evolutionary optimization methods) [Rajan, 2001]. The design optimization results of full scale three dimensional buildings subject to multiple design criteria including stress, serviceability and dynamic response are discussed.

Contributors: Sirigiri, Mamatha (Author) / Rajan, Subramaniam D. (Thesis advisor) / Neithalath, Narayanan (Committee member) / Mobasher, Barzin (Committee member) / Arizona State University (Publisher)

Created: 2014

Utility of considering multiple alternative rectifications in data cleaning

Description

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean version of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates of a dirty tuple. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. First, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would it affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates for each dirty tuple, along with the probability that each is the correct cleaned version. I represent these alternatives as tuples in a tuple-disjoint probabilistic database, and use the Mystiq system to process queries on it.
This probabilistic reconstruction (called BayesWipe–PDB) is compared to a deterministic reconstruction (called BayesWipe–DET), in which the most likely clean candidate for each tuple is chosen and the remaining alternatives are discarded.
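The contrast between the two reconstructions can be illustrated with a toy example. The sketch below is not BayesWipe or Mystiq code: the candidate rows, probabilities, and function names are all hypothetical. It keeps every clean candidate of a dirty tuple as mutually exclusive alternatives (the tuple-disjoint idea) and compares a selection query against the version that keeps only the most likely candidate.

```python
# Hypothetical cleaning output: dirty tuple id -> list of (clean candidate, probability).
# The alternatives of one tuple are mutually exclusive (tuple-disjoint).
candidates = {
    "t1": [
        ({"make": "Honda", "model": "Civic"}, 1 / 3),
        ({"make": "Honda", "model": "Accord"}, 1 / 3),
        ({"make": "Hyundai", "model": "Accent"}, 1 / 3),
    ],
    "t2": [
        ({"make": "Toyota", "model": "Camry"}, 0.9),
        ({"make": "Toyota", "model": "Corolla"}, 0.1),
    ],
}


def deterministic_reconstruction(cands):
    """Keep only the most likely candidate per tuple (ties go to the first listed)."""
    return {tid: max(alts, key=lambda alt: alt[1])[0] for tid, alts in cands.items()}


def query_deterministic(db, predicate):
    """Return ids of tuples whose single retained row satisfies the predicate."""
    return [tid for tid, row in db.items() if predicate(row)]


def query_probabilistic(cands, predicate):
    """Return id -> probability that the tuple satisfies the predicate.

    Alternatives are disjoint, so the probabilities of matching alternatives add up.
    """
    result = {}
    for tid, alts in cands.items():
        prob = sum(p for row, p in alts if predicate(row))
        if prob > 0.0:
            result[tid] = prob
    return result


if __name__ == "__main__":
    def is_accord(row):
        return row["model"] == "Accord"

    det_db = deterministic_reconstruction(candidates)
    print(query_deterministic(det_db, is_accord))
    print(query_probabilistic(candidates, is_accord))
```

In this toy data the deterministic version cannot return the first tuple for an "Accord" query, while the probabilistic version returns it with its probability attached. That is the recall gain the abstract refers to, and the low-probability answers it admits are where the precision question arises.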

Contributors: Rihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2013

ASU Electronic Theses and Dissertations