Search Content

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Description

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a…

The increasing popularity of Twitter renders improved trustworthiness and relevance assessment of tweets much more important for search. However, given the limitations on the size of tweets, it is hard to extract measures for ranking from the tweet's content alone. I propose a method of ranking tweets by generating a reputation score for each tweet that is based not just on content, but also additional information from the Twitter ecosystem that consists of users, tweets, and the web pages that tweets link to. This information is obtained by modeling the Twitter ecosystem as a three-layer graph. The reputation score is used to power two novel methods of ranking tweets by propagating the reputation over an agreement graph based on tweets' content similarity. Additionally, I show how the agreement graph helps counter tweet spam. An evaluation of my method on 16~million tweets from the TREC 2011 Microblog Dataset shows that it doubles the precision over baseline Twitter Search and achieves higher precision than current state of the art method. I present a detailed internal empirical evaluation of RAProp in comparison to several alternative approaches proposed by me, as well as external evaluation in comparison to the current state of the art method.

ContributorsRavikumar, Srijith (Author) / Kambhampati, Subbarao (Thesis advisor) / Davulcu, Hasan (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

Energy-efficient distributed estimation by utilizing a nonlinear amplifier

Description

Distributed estimation uses many inexpensive sensors to compose an accurate estimate of a given parameter. It is frequently implemented using wireless sensor networks. There have been several studies on optimizing power allocation in wireless sensor networks used for distributed estimation, the vast majority of which assume linear radio-frequency amplifiers. Linear…

Distributed estimation uses many inexpensive sensors to compose an accurate estimate of a given parameter. It is frequently implemented using wireless sensor networks. There have been several studies on optimizing power allocation in wireless sensor networks used for distributed estimation, the vast majority of which assume linear radio-frequency amplifiers. Linear amplifiers are inherently inefficient, so in this dissertation nonlinear amplifiers are examined to gain efficiency while operating distributed sensor networks. This research presents a method to boost efficiency by operating the amplifiers in the nonlinear region of operation. Operating amplifiers nonlinearly presents new challenges. First, nonlinear amplifier characteristics change across manufacturing process variation, temperature, operating voltage, and aging. Secondly, the equations conventionally used for estimators and performance expectations in linear amplify-and-forward systems fail. To compensate for the first challenge, predistortion is utilized not to linearize amplifiers but rather to force them to fit a common nonlinear limiting amplifier model close to the inherent amplifier performance. This minimizes the power impact and the training requirements for predistortion. Second, new estimators are required that account for transmitter nonlinearity. This research derives analytically and confirms via simulation new estimators and performance expectation equations for use in nonlinear distributed estimation. An additional complication when operating nonlinear amplifiers in a wireless environment is the influence of varied and potentially unknown channel gains. The impact of these varied gains and both measurement and channel noise sources on estimation performance are analyzed in this paper. Techniques for minimizing the estimate variance are developed. It is shown that optimizing transmitter power allocation to minimize estimate variance for the most-compressed parameter measurement is equivalent to the problem for linear sensors. Finally, a method for operating distributed estimation in a multipath environment is presented that is capable of developing robust estimates for a wide range of Rician K-factors. This dissertation demonstrates that implementing distributed estimation using nonlinear sensors can boost system efficiency and is compatible with existing techniques from the literature for boosting efficiency at the system level via sensor power allocation. Nonlinear transmitters work best when channel gains are known and channel noise and receiver noise levels are low.

ContributorsSantucci, Robert (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioðlu, Cihan (Committee member) / Bakkaloglu, Bertan (Committee member) / Tsakalis, Kostas (Committee member) / Arizona State University (Publisher)

Created2013

Multipath mitigating correlation kernels

Description

Autonomous vehicle control systems utilize real-time kinematic Global Navigation Satellite Systems (GNSS) receivers to provide a position within two-centimeter of truth. GNSS receivers utilize the satellite signal time of arrival estimates to solve for position; and multipath corrupts the time of arrival estimates with a time-varying bias. Time of arrival…

Autonomous vehicle control systems utilize real-time kinematic Global Navigation Satellite Systems (GNSS) receivers to provide a position within two-centimeter of truth. GNSS receivers utilize the satellite signal time of arrival estimates to solve for position; and multipath corrupts the time of arrival estimates with a time-varying bias. Time of arrival estimates are based upon accurate direct sequence spread spectrum (DSSS) code and carrier phase tracking. Current multipath mitigating GNSS solutions include fixed radiation pattern antennas and windowed delay-lock loop code phase discriminators. A new multipath mitigating code tracking algorithm is introduced that utilizes a non-symmetric correlation kernel to reject multipath. Independent parameters provide a means to trade-off code tracking discriminant gain against multipath mitigation performance. The algorithm performance is characterized in terms of multipath phase error bias, phase error estimation variance, tracking range, tracking ambiguity and implementation complexity. The algorithm is suitable for modernized GNSS signals including Binary Phase Shift Keyed (BPSK) and a variety of Binary Offset Keyed (BOC) signals. The algorithm compensates for unbalanced code sequences to ensure a code tracking bias does not result from the use of asymmetric correlation kernels. The algorithm does not require explicit knowledge of the propagation channel model. Design recommendations for selecting the algorithm parameters to mitigate precorrelation filter distortion are also provided.

ContributorsMiller, Steven (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Zhang, Junshan (Committee member) / Arizona State University (Publisher)

Created2013

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Description

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located…

Automating aspects of biocuration through biomedical information extraction could significantly impact biomedical research by enabling greater biocuration throughput and improving the feasibility of a wider scope. An important step in biomedical information extraction systems is named entity recognition (NER), where mentions of entities such as proteins and diseases are located within natural-language text and their semantic type is determined. This step is critical for later tasks in an information extraction pipeline, including normalization and relationship extraction. BANNER is a benchmark biomedical NER system using linear-chain conditional random fields and the rich feature set approach. A case study with BANNER locating genes and proteins in biomedical literature is described. The first corpus for disease NER adequate for use as training data is introduced, and employed in a case study of disease NER. The first corpus locating adverse drug reactions (ADRs) in user posts to a health-related social website is also described, and a system to locate and identify ADRs in social media text is created and evaluated. The rich feature set approach to creating NER feature sets is argued to be subject to diminishing returns, implying that additional improvements may require more sophisticated methods for creating the feature set. This motivates the first application of multivariate feature selection with filters and false discovery rate analysis to biomedical NER, resulting in a feature set at least 3 orders of magnitude smaller than the set created by the rich feature set approach. Finally, two novel approaches to NER by modeling the semantics of token sequences are introduced. The first method focuses on the sequence content by using language models to determine whether a sequence resembles entries in a lexicon of entity names or text from an unlabeled corpus more closely. The second method models the distributional semantics of token sequences, determining the similarity between a potential mention and the token sequences from the training data by analyzing the contexts where each sequence appears in a large unlabeled corpus. The second method is shown to improve the performance of BANNER on multiple data sets.

ContributorsLeaman, James Robert (Author) / Gonzalez, Graciela (Thesis advisor) / Baral, Chitta (Thesis advisor) / Cohen, Kevin B (Committee member) / Liu, Huan (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Undergraduate signal processing laboratories on the Android platform

Description

The field of education has been immensely benefited by major breakthroughs in technology. The arrival of computers and the internet made student-teacher interaction from different parts of the world viable, increasing the reach of the educator to hitherto remote corners of the world. The arrival of mobile phones in the…

The field of education has been immensely benefited by major breakthroughs in technology. The arrival of computers and the internet made student-teacher interaction from different parts of the world viable, increasing the reach of the educator to hitherto remote corners of the world. The arrival of mobile phones in the recent past has the potential to provide the next paradigm shift in the way education is conducted. It combines the universal reach and powerful visualization capabilities of the computer with intimacy and portability. Engineering education is a field which can exploit the benefits of mobile devices to enhance learning and spread essential technical know-how to different parts of the world. In this thesis, I present AJDSP, an Android application evolved from JDSP, providing an intuitive and a easy to use environment for signal processing education. AJDSP is a graphical programming laboratory for digital signal processing developed for the Android platform. It is designed to provide utility; both as a supplement to traditional classroom learning and as a tool for self-learning. The architecture of AJDSP is based on the Model-View-Controller paradigm optimized for the Android platform. The extensive set of function modules cover a wide range of basic signal processing areas such as convolution, fast Fourier transform, z transform and filter design. The simple and intuitive user interface inspired from iJDSP is designed to facilitate ease of navigation and to provide the user with an intimate learning environment. Rich visualizations necessary to understand mathematically intensive signal processing algorithms have been incorporated into the software. Interactive demonstrations boosting student understanding of concepts like convolution and the relation between different signal domains have also been developed. A set of detailed assessments to evaluate the application has been conducted for graduate and senior-level undergraduate students.

ContributorsRanganath, Suhas (Author) / Spanias, Andreas (Thesis advisor) / Tepedelenlioğlu, Cihan (Committee member) / Tsakalis, Konstantinos (Committee member) / Arizona State University (Publisher)

Created2013

Analytical control grid registration for efficient application of optical flow

Description

Image resolution limits the extent to which zooming enhances clarity, restricts the size digital photographs can be printed at, and, in the context of medical images, can prevent a diagnosis. Interpolation is the supplementing of known data with estimated values based on a function or model involving some or all…

Image resolution limits the extent to which zooming enhances clarity, restricts the size digital photographs can be printed at, and, in the context of medical images, can prevent a diagnosis. Interpolation is the supplementing of known data with estimated values based on a function or model involving some or all of the known samples. The selection of the contributing data points and the specifics of how they are used to define the interpolated values influences how effectively the interpolation algorithm is able to estimate the underlying, continuous signal. The main contributions of this dissertation are three fold: 1) Reframing edge-directed interpolation of a single image as an intensity-based registration problem. 2) Providing an analytical framework for intensity-based registration using control grid constraints. 3) Quantitative assessment of the new, single-image enlargement algorithm based on analytical intensity-based registration. In addition to single image resizing, the new methods and analytical approaches were extended to address a wide range of applications including volumetric (multi-slice) image interpolation, video deinterlacing, motion detection, and atmospheric distortion correction. Overall, the new approaches generate results that more accurately reflect the underlying signals than less computationally demanding approaches and with lower processing requirements and fewer restrictions than methods with comparable accuracy.

ContributorsZwart, Christine M. (Author) / Frakes, David H (Thesis advisor) / Karam, Lina (Committee member) / Kodibagkar, Vikram (Committee member) / Spanias, Andreas (Committee member) / Towe, Bruce (Committee member) / Arizona State University (Publisher)

Created2013

Upper body motion analysis using kinect for stroke rehabilitation at the home

Description

Motion capture using cost-effective sensing technology is challenging and the huge success of Microsoft Kinect has been attracting researchers to uncover the potential of using this technology into computer vision applications. In this thesis, an upper-body motion analysis in a home-based system for stroke rehabilitation using novel RGB-D camera -…

Motion capture using cost-effective sensing technology is challenging and the huge success of Microsoft Kinect has been attracting researchers to uncover the potential of using this technology into computer vision applications. In this thesis, an upper-body motion analysis in a home-based system for stroke rehabilitation using novel RGB-D camera - Kinect is presented. We address this problem by ﬁrst conducting a systematic analysis of the usability of Kinect for motion analysis in stroke rehabilitation. Then a hybrid upper body tracking approach is proposed which combines off-the-shelf skeleton tracking with a novel depth-fused mean shift tracking method. We proposed several kinematic features reliably extracted from the proposed inexpensive and portable motion capture system and classiﬁers that correlate torso movement to clinical measures of unimpaired and impaired. Experiment results show that the proposed sensing and analysis works reliably on measuring torso movement quality and is promising for end-point tracking. The system is currently being deployed for large-scale evaluations.

ContributorsDu, Tingfang (Author) / Turaga, Pavan (Thesis advisor) / Spanias, Andreas (Committee member) / Rikakis, Thanassis (Committee member) / Arizona State University (Publisher)

Created2012

Sparse methods in image understanding and computer vision

Description

Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component in image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large scale learning. In sparse…

Image understanding has been playing an increasingly crucial role in vision applications. Sparse models form an important component in image understanding, since the statistics of natural images reveal the presence of sparse structure. Sparse methods lead to parsimonious models, in addition to being efficient for large scale learning. In sparse modeling, data is represented as a sparse linear combination of atoms from a "dictionary" matrix. This dissertation focuses on understanding different aspects of sparse learning, thereby enhancing the use of sparse methods by incorporating tools from machine learning. With the growing need to adapt models for large scale data, it is important to design dictionaries that can model the entire data space and not just the samples considered. By exploiting the relation of dictionary learning to 1-D subspace clustering, a multilevel dictionary learning algorithm is developed, and it is shown to outperform conventional sparse models in compressed recovery, and image denoising. Theoretical aspects of learning such as algorithmic stability and generalization are considered, and ensemble learning is incorporated for effective large scale learning. In addition to building strategies for efficiently implementing 1-D subspace clustering, a discriminative clustering approach is designed to estimate the unknown mixing process in blind source separation. By exploiting the non-linear relation between the image descriptors, and allowing the use of multiple features, sparse methods can be made more effective in recognition problems. The idea of multiple kernel sparse representations is developed, and algorithms for learning dictionaries in the feature space are presented. Using object recognition experiments on standard datasets it is shown that the proposed approaches outperform other sparse coding-based recognition frameworks. Furthermore, a segmentation technique based on multiple kernel sparse representations is developed, and successfully applied for automated brain tumor identification. Using sparse codes to define the relation between data samples can lead to a more robust graph embedding for unsupervised clustering. By performing discriminative embedding using sparse coding-based graphs, an algorithm for measuring the glomerular number in kidney MRI images is developed. Finally, approaches to build dictionaries for local sparse coding of image descriptors are presented, and applied to object recognition and image retrieval.

ContributorsJayaraman Thiagarajan, Jayaraman (Author) / Spanias, Andreas (Thesis advisor) / Frakes, David (Committee member) / Tepedelenlioğlu, Cihan (Committee member) / Turaga, Pavan (Committee member) / Arizona State University (Publisher)

Created2013

Connecting users with similar interests for group understanding

Description

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our…

In most social networking websites, users are allowed to perform interactive activities. One of the fundamental features that these sites provide is to connecting with users of their kind. On one hand, this activity makes online connections visible and tangible; on the other hand, it enables the exploration of our connections and the expansion of our social networks easier. The aggregation of people who share common interests forms social groups, which are fundamental parts of our social lives. Social behavioral analysis at a group level is an active research area and attracts many interests from the industry. Challenges of my work mainly arise from the scale and complexity of user generated behavioral data. The multiple types of interactions, highly dynamic nature of social networking and the volatile user behavior suggest that these data are complex and big in general. Effective and efficient approaches are required to analyze and interpret such data. My work provide effective channels to help connect the like-minded and, furthermore, understand user behavior at a group level. The contributions of this dissertation are in threefold: (1) proposing novel representation of collective tagging knowledge via tag networks; (2) proposing the new information spreader identification problem in egocentric soical networks; (3) defining group profiling as a systematic approach to understanding social groups. In sum, the research proposes novel concepts and approaches for connecting the like-minded, enables the understanding of user groups, and exposes interesting research opportunities.

ContributorsWang, Xufei (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Sundaram, Hari (Committee member) / Ye, Jieping (Committee member) / Arizona State University (Publisher)

Created2013

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

Description

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties like…

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal we face difficulties like data with relevant consumption information but stored in different format and insufficient data about project attributes to interpret consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the preprocessing on data is completed, different data mining techniques like clustering is applied to find projects which involve resources of similar skillsets and which involve similar complexities and size. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Then project characteristics are identified which generate this diversity in headcounts and skillsets. These characteristics are not currently contained in the data base and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match the product technical features with the resource requirement for projects in the past as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross validation of the training data as the past project execution are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results and the tool is applied to forecast some future projects' resource demand.

ContributorsBhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2013

Theses and Dissertations

Filtering by

RAProp: ranking tweets by exploiting the tweet/user/web ecosystem

Energy-efficient distributed estimation by utilizing a nonlinear amplifier

Multipath mitigating correlation kernels

Advancing biomedical named entity recognition with multivariate feature selection and semantically motivated features

Undergraduate signal processing laboratories on the Android platform

Analytical control grid registration for efficient application of optical flow

Upper body motion analysis using kinect for stroke rehabilitation at the home

Sparse methods in image understanding and computer vision

Connecting users with similar interests for group understanding

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data