RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter makes improved assessment of the trustworthiness and relevance of tweets much more important for search. However, given the limits on tweet length, it is hard to extract ranking measures from a tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also on additional information from the Twitter ecosystem, which consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score powers two novel methods of ranking tweets by propagating the reputation over an agreement graph based on the content similarity of tweets. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16 million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision of baseline Twitter Search and achieves higher precision than the current state-of-the-art method. I present a detailed internal empirical evaluation of RAProp in comparison with several alternative approaches that I propose, as well as an external evaluation in comparison with the current state-of-the-art method.
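The idea of propagating reputation over an agreement graph can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual algorithm: the abstract does not give the agreement measure or the propagation rule, so the sketch assumes cosine similarity over word counts as the agreement weight and a single propagation step. The names `cosine` and `propagate` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (assumed agreement weight)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate(tweets, reputation):
    """One propagation step (an assumption): each tweet keeps its own
    reputation and gains the reputation of every other tweet, weighted
    by how strongly the two tweets agree in content."""
    vectors = [Counter(t.lower().split()) for t in tweets]
    scores = []
    for i, vec_i in enumerate(vectors):
        score = reputation[i]
        for j, vec_j in enumerate(vectors):
            if i != j:
                score += cosine(vec_i, vec_j) * reputation[j]
        scores.append(score)
    return scores


tweets = [
    "earthquake hits city",
    "earthquake hits city center",
    "buy cheap pills now",
]
initial_reputation = [0.5, 0.6, 0.9]  # made-up values for illustration
scores = propagate(tweets, initial_reputation)
ranking = sorted(range(len(tweets)), key=lambda i: scores[i], reverse=True)
```

In this toy input, the third tweet starts with the highest reputation but shares no content with the others, so it gains nothing from propagation and drops to the bottom of the ranking. That is one plausible reading of how agreement could help counter spam.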

Contributors: Ravikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2013

Low power, high throughput continuous flow PCR instruments for environmental applications

Description

Continuous monitoring at adequate temporal and spatial scales is necessary for a better understanding of environmental variations. But field deployments of molecular biological analysis platforms at those scales are currently hindered by issues with power, throughput, and automation. Currently, such analysis is performed by collecting large sample volumes over a wide area and transporting them to laboratory testing facilities, which fails to provide any real-time information. This dissertation evaluates the systems currently used for in-situ field analyses and the issues hampering the successful deployment of such bioanalytical instruments for environmental applications. It presents the design and development of high-throughput, low-power, autonomous Polymerase Chain Reaction (PCR) instruments that are amenable to portable field operation and capable of providing quantitative results. A number of innovations are reported in microfluidic design, PCR thermocycler design, optical design, and systems integration. Emulsion microfluidics, in conjunction with fluorinated oils and Teflon tubing, is used for the fluidic module, which reduces cross-contamination and eliminates the need for disposable components or constant cleaning. A cylindrical heater has been designed with the tubing wrapped around fixed temperature zones, enabling continuous operation. Fluorescence excitation and detection have been achieved by using a light emitting diode (LED) as the excitation source and a photomultiplier tube (PMT) as the detector. Real-time quantitative PCR results were obtained through multi-channel fluorescence excitation and detection using LEDs, optical fibers, and a 64-channel multi-anode PMT for measuring continuous real-time fluorescence. The instrument was evaluated against a commercial instrument, and the results were found to be comparable.
To further improve the design and enhance its field portability, this dissertation also presents a framework for the instrumentation necessary for a portable digital PCR platform to achieve higher throughput with lower power. Both systems were designed so that they can easily couple with any upstream platform capable of providing nucleic acid for analysis using standard fluidic connections. Consequently, these instruments can be used not only in environmental applications but in portable diagnostic applications as well.

Contributors: Ray, Tathagata (Author) / Youngbull, Cody (Thesis advisor) / Goryll, Michael (Thesis advisor) / Blain Christen, Jennifer (Committee member) / Yu, Hongyu (Committee member) / Arizona State University (Publisher)

Created: 2013

A cloud based continuous delivery software developing system on Vlab platform

Description

Continuous Delivery, one of the youngest and most popular members of the agile model family, has recently become a popular concept and method in the software development industry. Unlike the traditional software development method, in which requirements and solutions must be fixed before development starts, it promotes adaptive planning, evolutionary development and delivery, and encourages rapid and flexible response to change. However, several problems prevent Continuous Delivery from being introduced into education. Taking these barriers into consideration, we propose a new cloud-based Continuous Delivery software development system. This system is designed to fully utilize the whole software development life cycle, according to Continuous Delivery concepts, in a virtualized environment on the Vlab platform.

Contributors: Deng, Yuli (Author) / Huang, Dijiang (Thesis advisor) / Davulcu, Hasan (Committee member) / Chen, Yinong (Committee member) / Arizona State University (Publisher)

Created: 2013

Eating in the absence of hunger in college students

Description

The body is capable of regulating hunger in several ways. Some of these hunger regulation methods are innate, such as genetics, and some, such as the responses to stress and to the smell of food, are innate but can be affected by body conditions such as BMI and physical activity. Further, some hunger regulation methods stem from learned behaviors originating from cultural pressures or parenting styles. These latter regulation methods can be grouped into three categories: emotion, environment, and physical. The factors that regulate hunger can also influence the incidence of disordered eating, such as eating in the absence of hunger (EAH). Eating in the absence of hunger can occur in one of two scenarios: continuing EAH or beginning EAH. College students are at a particularly high risk for EAH and weight gain due to stress, social pressures, and the constant availability of energy-dense and nutrient-poor food options. The purpose of this study is to validate a modified EAH-C survey in college students and to discover which of the three latent factors (emotion, environment, physical) best predicts continuing and beginning EAH. To do so, a modified EAH-C survey, with additional demographic components, was administered to students at a major southwest university. This survey contained two questions, one each for continuing and beginning EAH, regarding 14 factors related to emotional, physical, or environmental reasons that may trigger EAH. The results from this study revealed that the continuing and beginning EAH surveys displayed good internal consistency reliability. We found that for beginning and continuing EAH, although emotion is the strongest predictor of EAH, all three latent factors are significant predictors of EAH. In addition, we found that environmental factors had the greatest influence on an individual's likelihood to continue to eat in the absence of hunger.
Due to statistical abnormalities and differing numbers of factors in each category, we were unable to determine which of the three factors exerted the greatest influence on an individual's likelihood to begin eating in the absence of hunger. These results can be utilized to develop educational tools aimed at reducing EAH in college students, and ultimately reducing the likelihood of unhealthy weight gain and health complications related to obesity.

Contributors: Goett, Taylor (Author) / Johnston, Carol (Thesis advisor) / Lee, Chong (Committee member) / Lespron, Christy (Committee member) / Arizona State University (Publisher)

Created: 2013

Decentralized information search

Description

Our research focuses on finding answers through decentralized search for complex, imprecise queries (such as "Which is the best hair salon nearby?") in situations where there is a spatiotemporal constraint (say, an answer needs to be found within 15 minutes) associated with the query. In general, human networks are good at answering imprecise queries. We try to use the social network of a person to answer his query. Our research aims at designing a framework that exploits the user's social network in order to maximize the number of answers to a given query. Exploiting a user's social network poses several challenges. The major challenge is that the user's immediate social circle may not possess the answer to the given query, and hence the framework needs to carry out the query diffusion process across the network. The next challenge is finding the right set of seeds in the user's social circle to pass the query to. Another challenge is to incentivize people in the social network to respond to the query and thereby maximize the quality and quantity of replies. Our proposed framework is a mobile application in which an individual can either respond to the query or forward it to his friends. We simulated the query diffusion process in three types of graphs: Small World, Random, and Preferential Attachment. Given a type of network and a particular query, we carried out the query diffusion by selecting seeds based on attributes of the seed. The main attributes are topic relevance, replying or forwarding probability, and time to respond. We found that there is a considerable increase in the number of replies attained, even without saturating the user's network, if we adopt an optimal seed selection process. We found the output of the optimal algorithm to be satisfactory, as the number of replies received at the interrogator's end was close to three times the number of neighbors an interrogator has.
We addressed the challenge of incentivizing people to respond by associating a particular number of points with each query asked, and awarding them to the people involved in answering the query. Thus, we aim to design a mobile application based on our proposed framework that helps maximize the replies to the interrogator's query by diffusing the query across his/her social network.
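The query diffusion process described above can be sketched as a small simulation. This is a rough illustration under stated assumptions, not the thesis's simulator: the function name `diffuse`, the rule of choosing the `k` unvisited neighbors with the highest reply probability as seeds, and the use of a hop budget to stand in for the time constraint are all hypothetical choices made for the example.

```python
import random


def diffuse(graph, source, reply_p, forward_p, k, max_hops, rng):
    """Simulate query diffusion and return the number of replies.

    graph     : dict mapping each node to a list of its neighbors
    reply_p   : dict of per-node reply probabilities
    forward_p : dict of per-node forward probabilities
                (reply_p[n] + forward_p[n] is assumed to be at most 1)
    k         : number of seeds each holder of the query selects
    max_hops  : hop budget, an assumed stand-in for the time constraint
    """
    visited = {source}
    frontier = [source]
    replies = 0
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            candidates = [n for n in graph[node] if n not in visited]
            # Assumed seed selection: prefer neighbors most likely to reply.
            seeds = sorted(candidates, key=lambda n: reply_p[n], reverse=True)[:k]
            for seed in seeds:
                visited.add(seed)
                draw = rng.random()
                if draw < reply_p[seed]:
                    replies += 1                 # the seed answers the query
                elif draw < reply_p[seed] + forward_p[seed]:
                    next_frontier.append(seed)   # the seed forwards the query
                # otherwise the seed ignores the query
        frontier = next_frontier
    return replies


# A four-node chain in which only the last node can answer.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
reply_p = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}
forward_p = {0: 0.0, 1: 1.0, 2: 1.0, 3: 0.0}
rng = random.Random(0)
replies_in_three_hops = diffuse(chain, 0, reply_p, forward_p, k=1, max_hops=3, rng=rng)
replies_in_two_hops = diffuse(chain, 0, reply_p, forward_p, k=1, max_hops=2, rng=rng)
```

In the chain example the immediate neighbor cannot answer, so a reply arrives only when the hop budget is large enough for the query to reach the last node, which mirrors the point that the query must diffuse beyond the user's immediate circle.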

Contributors: Swaminathan, Neelakantan (Author) / Sundaram, Hari (Thesis advisor) / Davulcu, Hasan (Thesis advisor) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created: 2013

Analysis of shifts & trends of organizations in Indonesia using tweets & RSS feeds

Description

With the advent of social media (such as Twitter and Facebook), people are sharing their opinions and sentiments, and pressing their ideologies on others, more easily than ever before. Even people who are otherwise socially inactive like to share their thoughts on current affairs by tweeting and sharing news feeds with their friends and acquaintances. In this thesis study, we chose Twitter as our main data platform to analyze shifts and movements of 27 political organizations in Indonesia. So far, we have collected over 30 million tweets and 150,000 news articles from RSS feeds of the corresponding organizations for our analysis. For Twitter data extraction, we developed a multi-threaded application that seamlessly extracts, cleans, and stores millions of tweets matching our keywords from the Twitter Streaming API. For keyword extraction, we used topics and perspectives that were extracted using n-gram techniques and later approved by our social scientists. After the data is extracted, we aggregate the tweet contents that belong to each user on a weekly basis. Finally, we applied linear and logistic regression using SLEP, an open source sparse learning package, to compute a weekly score for each user and map the user to one of the 27 organizations on a radical or counter-radical scale. Since we map users to organizations on a weekly basis, we are able to track users' behavior and the important new events that triggered shifts of users between organizations. This thesis study can be extended further to identify topics and organization-specific influential users, and new users from other social media platforms, such as Facebook and YouTube, can easily be mapped to existing organizations on a radical or counter-radical scale.

Contributors: Poornachandran, Sathishkumar (Author) / Davulcu, Hasan (Thesis advisor) / Sen, Arunabha (Committee member) / Woodward, Mark (Committee member) / Arizona State University (Publisher)

Created: 2013

An exploration of attitudes and perceptions of cash value vouchers in the Arizona Special Supplemental Nutrition Program for Women, Infants, and Children (WIC)

Description

In October 2009, participants of the Arizona Special Supplemental Nutrition Program for Women, Infants and Children (WIC) began receiving monthly Cash Value Vouchers (CVV) worth between six and 10 dollars towards the purchase of fresh fruits and vegetables. Data from the Arizona Department of Health Services (ADHS) showed CVV redemption rates in the first two years of the program were lower than the national average of 77% redemption. In response, the ADHS WIC Food List was expanded to also include canned and frozen fruits and vegetables. More recent data from ADHS suggest that redemption rates are improving but vary among different WIC sub-populations. The purpose of this project was to identify themes related to the ease or difficulty of WIC CVV use amongst different categories of low-redeeming WIC participants. A total of 8 focus groups were conducted, four at a clinic in each of two Valley cities: Surprise and Mesa. Each of the four focus groups comprised one of four targeted WIC participant categories: pregnant, postpartum, breastfeeding, and children, with participation ranging from 3 to 9 participants per group. Using the general inductive approach, recordings of the focus groups were transcribed, hand-coded, and uploaded into qualitative analysis software, resulting in four emergent themes: interactions and shopping strategies, maximizing WIC value, redemption issues, and effect of rule change. Researchers identified twelve different subthemes related to the emergent theme of interactions and strategies to improve their experience, including economic considerations during redemption. Barriers related to interactions existed that made their purchase difficult, most notably anger from the cashier and other shoppers. However, participants made use of a number of strategies to facilitate WIC purchases or extract more value out of WIC benefits, such as pooling their CVV.
Finally, it appears that the fruit and vegetable rule change was well received by those who were aware of the change. These data suggest a number of important avenues for future research, including verifying these themes are important within a larger, representative sample of Arizona WIC participants, and exploring strategies to minimize barriers identified by participants, such as use of electronic benefits transfer-style cards (EBT).

Contributors: Bertmann, Farryl M. W (Author) / Wharton, Christopher (Christopher Mack), 1977- (Thesis advisor) / Ohri-Vachaspati, Punam (Committee member) / Johnston, Carol (Committee member) / Hampl, Jeffrey (Committee member) / Dixit-Joshi, Sujata (Committee member) / Barroso, Cristina (Committee member) / Arizona State University (Publisher)

Created: 2013

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. 
The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

Contributors: Leaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created: 2013

Study of self-heating effects in GaN HEMTs

Description

GaN high electron mobility transistors (HEMTs) based on the III-V nitride material system have been under extensive investigation because of their superb performance as high power RF devices. A two-dimensional electron gas (2-DEG) with a charge density ten times higher than that of GaAs-based HEMTs and a mobility much higher than that of Si enables the low on-resistance required for RF devices. Self-heating issues with GaN HEMTs and a lack of understanding of various phenomena are hindering their widespread commercial development. There is a need to understand device operation by developing a model that could be used to optimize the electrical and thermal characteristics of GaN HEMT designs for high power and high frequency operation. In this thesis work, a physical simulation model of an AlGaN/GaN HEMT is developed using the commercially available software ATLAS from SILVACO Int., based on the energy balance/hydrodynamic carrier transport equations. The model is calibrated against experimental data. Transfer and output characteristics are the key focus of the analysis, along with saturation drain current. The resultant IV curves showed a close correspondence with experimental results. Various combinations of electron mobility, velocity saturation, momentum and energy relaxation times, and gate work functions were tried to improve IV curve correlation. Thermal effects were also investigated to get a better understanding of the role of self-heating effects in the electrical characteristics of GaN HEMTs. The temperature profiles across the device were observed. Hot spots were found along the channel in the gate-drain spacing.
These preliminary results indicate that thermal effects do have an impact on the electrical device characteristics at large biases, even though the amount of self-heating is underestimated with respect to thermal particle-based simulations that also solve the energy balance equations for acoustic and optical phonons (and thus take proper account of the formation of the hot spot). The decrease in drain current is due to a decrease in saturation carrier velocity. The necessity of including hydrodynamic/energy balance transport models for accurate simulations is demonstrated. Possible ways of improving model accuracy are discussed in conjunction with future research.

Contributors: Chowdhury, Towhid (Author) / Vasileska, Dragica (Thesis advisor) / Goodnick, Stephen (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)

Created: 2013

Distributed inference using bounded transmissions

Description

Distributed inference has applications in a wide range of fields such as source localization, target detection, environment monitoring, and healthcare. In this dissertation, distributed inference schemes which use bounded transmit power are considered. The performance of the proposed schemes is studied for a variety of inference problems. In the first part of the dissertation, a distributed detection scheme where the sensors transmit with constant modulus signals over a Gaussian multiple access channel is considered. The deflection coefficient of the proposed scheme is shown to depend on the characteristic function of the sensing noise, and the error exponent for the system is derived using large deviation theory. Optimization of the deflection coefficient and error exponent is considered with respect to a transmission phase parameter for a variety of sensing noise distributions, including impulsive ones. The proposed scheme is also favorably compared with existing amplify-and-forward (AF) and detect-and-forward (DF) schemes. The effect of fading is shown to be detrimental to the detection performance, and simulations are provided to corroborate the analytical results. The second part of the dissertation studies a distributed inference scheme which uses bounded transmission functions over a Gaussian multiple access channel. The conditions on the transmission functions under which consistent estimation and reliable detection are possible are characterized. For the distributed estimation problem, an estimation scheme that uses bounded transmission functions is proved to be strongly consistent provided that the variance of the noise samples is bounded and that the transmission function is one-to-one. The proposed estimation scheme is compared with the amplify-and-forward technique, and its robustness to impulsive sensing noise distributions is highlighted.
It is also shown that bounded transmissions suffer from inconsistent estimates if the sensing noise variance goes to infinity. For the distributed detection problem, similar results are obtained by studying the deflection coefficient. Simulations corroborate our analytical results. In the third part of this dissertation, the problem of estimating the average of samples distributed at the nodes of a sensor network is considered. A distributed average consensus algorithm in which every sensor transmits with bounded peak power is proposed. In the presence of communication noise, it is shown that the nodes reach consensus asymptotically to a finite random variable whose expectation is the desired sample average of the initial observations with a variance that depends on the step size of the algorithm and the variance of the communication noise. The asymptotic performance is characterized by deriving the asymptotic covariance matrix using results from stochastic approximation theory. It is shown that using bounded transmissions results in slower convergence compared to the linear consensus algorithm based on the Laplacian heuristic. Simulations corroborate our analytical findings. Finally, a robust distributed average consensus algorithm in which every sensor performs a nonlinear processing at the receiver is proposed. It is shown that non-linearity at the receiver nodes makes the algorithm robust to a wide range of channel noise distributions including the impulsive ones. It is shown that the nodes reach consensus asymptotically and similar results are obtained as in the case of transmit non-linearity. Simulations corroborate our analytical findings and highlight the robustness of the proposed algorithm.
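The average consensus idea with bounded transmissions can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: it assumes `tanh` as the bounded transmission function, a fixed step size, a noiseless channel, and an undirected graph. The function name `bounded_consensus` and all parameter values are hypothetical.

```python
import math


def bounded_consensus(x0, neighbors, alpha=0.2, steps=2000):
    """Noiseless consensus sketch where each node transmits tanh(state).

    x0        : list of initial observations, one per node
    neighbors : dict mapping node index to its neighbor indices
                (assumed symmetric, i.e. an undirected graph)
    alpha     : fixed step size (an assumption of this sketch)
    """
    x = list(x0)
    for _ in range(steps):
        # Every transmitted value lies in (-1, 1), so peak power is bounded.
        sent = [math.tanh(v) for v in x]
        x = [
            x[i] + alpha * sum(sent[j] - sent[i] for j in neighbors[i])
            for i in range(len(x))
        ]
    return x


# Five nodes on a ring; each node talks to its two adjacent nodes.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
initial = [-1.0, 0.5, 2.0, 0.0, 1.5]
final = bounded_consensus(initial, ring)
target = sum(initial) / len(initial)
```

Because the graph is undirected, every exchanged difference appears once with each sign, so the sum of the states is preserved and the nodes settle on the average of the initial observations. With communication noise, which this sketch omits, the abstract states that the limit is instead a random variable whose expectation is that average.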

Contributors: Dasarathan, Sivaraman (Author) / Tepedelenlioğlu, Cihan (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Reisslein, Martin (Committee member) / Goryll, Michael (Committee member) / Arizona State University (Publisher)

Created: 2013