Utility of considering multiple alternative rectifications in data cleaning

Description

Most data cleaning systems aim to go from a given deterministic dirty database to another deterministic but clean database. Such an enterprise presupposes that it is in fact possible for the cleaning process to uniquely recover the clean version of each dirty data tuple. This is not possible in many cases, where the most a cleaning system can do is to generate a (hopefully small) set of clean candidates for each dirty tuple. When the cleaning system is required to output a deterministic database, it is forced to pick one clean candidate (say the "most likely" candidate) per tuple. Such an approach can lead to loss of information. For example, consider a situation where there are three equally likely clean candidates of a dirty tuple: whichever one is picked, two equally plausible alternatives are discarded. An appealing alternative that avoids such an information loss is to abandon the requirement that the output database be deterministic. In other words, even though the input (dirty) database is deterministic, I allow the reconstructed database to be probabilistic. Although such an approach does avoid the information loss, it also brings forth several challenges. First, how many alternatives should be kept per tuple in the reconstructed database? Maintaining too many alternatives increases the size of the reconstructed database, and hence the query processing time. Second, while processing queries on the probabilistic database may well increase recall, how would it affect the precision of the query processing? In this thesis, I investigate these questions. My investigation is done in the context of a data cleaning system called BayesWipe that has the capability of producing multiple clean candidates for each dirty tuple, along with the probability that each is the correct cleaned version. I represent these alternatives as tuples in a tuple-disjoint probabilistic database, and use the Mystiq system to process queries on it. This probabilistic reconstruction (called BayesWipe-PDB) is compared to a deterministic reconstruction (called BayesWipe-DET), where the most likely clean candidate for each tuple is chosen and the rest of the alternatives are discarded.
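The contrast between the two reconstructions can be illustrated with a small sketch. It is not BayesWipe or Mystiq code: the candidate values, the uniform probabilities, and every function name below are hypothetical, chosen only to mirror the "three equally likely candidates" situation described above.

```python
from fractions import Fraction

# Hypothetical clean candidates for one dirty tuple: (value, probability).
# The attribute values and the uniform 1/3 probabilities are made up.
CANDIDATES = [
    (("Honda", "Civic"), Fraction(1, 3)),
    (("Honda", "Accord"), Fraction(1, 3)),
    (("Hyundai", "Accent"), Fraction(1, 3)),
]


def deterministic_reconstruction(candidates):
    """Keep only the single most likely candidate and treat it as certain."""
    value, _ = max(candidates, key=lambda pair: pair[1])
    return [(value, Fraction(1))]


def probabilistic_reconstruction(candidates, k):
    """Keep the k most likely candidates with their original probabilities."""
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def match_probability(alternatives, predicate):
    """Probability that the tuple satisfies a selection predicate.

    The alternatives of one tuple are mutually exclusive (tuple-disjoint),
    so the probabilities of the matching alternatives simply add up.
    """
    return sum((p for value, p in alternatives if predicate(value)), Fraction(0))


def is_accord(value):
    return value[1] == "Accord"


det = deterministic_reconstruction(CANDIDATES)
pdb = probabilistic_reconstruction(CANDIDATES, k=3)
print(match_probability(det, is_accord))  # the tie is broken by list order
print(match_probability(pdb, is_accord))
```

In this toy setting the deterministic reconstruction keeps the first of the tied candidates, so a query for "Accord" misses the tuple entirely, while the probabilistic reconstruction still returns it with probability 1/3. Lowering k shrinks the stored database at the cost of dropping some alternatives, which is the size-versus-recall trade-off the abstract raises.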

Contributors: Rihan, Preet Inder Singh (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2013

Understanding plasticity and fracture in aluminum alloys and their composites by 3D X-ray synchrotron tomography and microdiffraction

Description

Aluminum alloys and their composites are attractive materials for applications requiring high strength-to-weight ratios and reasonable cost. Many of these applications, such as those in the aerospace industry, undergo fatigue loading. An understanding of the microstructural damage that occurs in these materials is critical in assessing their fatigue resistance. Two distinct experimental studies were performed to further the understanding of fatigue damage mechanisms in aluminum alloys and their composites, specifically fracture and plasticity. Fatigue resistance of metal matrix composites (MMCs) depends on many aspects of composite microstructure. Fatigue crack growth behavior is particularly dependent on the reinforcement characteristics and matrix microstructure. The goal of this work was to obtain a fundamental understanding of fatigue crack growth behavior in SiC particle-reinforced 2080 Al alloy composites. In situ X-ray synchrotron tomography was performed on two samples at low (R=0.1) and at high (R=0.6) R-ratios. The resulting reconstructed images were used to obtain three-dimensional (3D) rendering of the particles and fatigue crack. Behaviors of the particles and crack, as well as their interaction, were analyzed and quantified. Four-dimensional (4D) visual representations were constructed to aid in the overall understanding of damage evolution. During fatigue crack growth in ductile materials, a plastic zone is created in the region surrounding the crack tip. Knowledge of the plastic zone is important for the understanding of fatigue crack formation as well as subsequent growth behavior. The goal of this work was to quantify the 3D size and shape of the plastic zone in 7075 Al alloys. X-ray synchrotron tomography and Laue microdiffraction were used to non-destructively characterize the volume surrounding a fatigue crack tip. The precise 3D crack profile was segmented from the reconstructed tomography data. 
Depth-resolved Laue patterns were obtained using differential-aperture X-ray structural microscopy (DAXM), from which peak-broadening characteristics were quantified. Plasticity, as determined by the broadening of diffracted peaks, was mapped in 3D. Two-dimensional (2D) maps of plasticity were directly compared to the corresponding tomography slices. A 3D representation of the plastic zone surrounding the fatigue crack was generated by superimposing the mapped plasticity on the 3D crack profile.

Contributors: Hruby, Peter (Author) / Chawla, Nikhilesh (Thesis advisor) / Solanki, Kiran (Committee member) / Liu, Yongming (Committee member) / Arizona State University (Publisher)

Created: 2014

TweetSense: recommending hashtags for orphaned tweets by exploiting social signals in Twitter

Description

Twitter is a micro-blogging platform where the users can be social, informational or both. In certain cases, users generate tweets that have no "hashtags" or "@mentions"; we call such a tweet an orphaned tweet. A user who sees an orphaned tweet will often want more "context" for it, presumably to engage with his/her friend on that topic. Finding context for an orphaned tweet manually is challenging because of the large social graph of a user, the enormous volume of tweets generated per second, topic diversity, and the limited information available within a tweet length of 140 characters. To help the user get the context of an orphaned tweet, this thesis aims at building a hashtag recommendation system called TweetSense, which suggests hashtags as context or metadata for orphaned tweets. This in turn would increase the user's social engagement and help Twitter maintain its monthly active users. In contrast to other existing systems, this hashtag recommendation system recommends personalized hashtags by exploiting the social signals of users in Twitter. The novelty of this system is that it emphasizes selecting a suitable candidate set of hashtags from the related tweets of the user's social graph (timeline). The system then ranks them based on a combination of feature scores computed from their tweet-related and user-related features. It is evaluated based on its ability to predict suitable hashtags for a random sample of tweets whose existing hashtags are deliberately removed for evaluation. I present a detailed internal empirical evaluation of TweetSense, as well as an external evaluation in comparison with the current state-of-the-art method.

Contributors: Vijayakumar, Manikandan (Author) / Kambhampati, Subbarao (Thesis advisor) / Liu, Huan (Committee member) / Davulcu, Hasan (Committee member) / Arizona State University (Publisher)

Created: 2014

Planning with incomplete user preferences and domain models

Description

Current work in planning assumes that user preferences and/or domain dynamics are completely specified in advance, and aims to search for a single solution plan to satisfy these. In many real-world scenarios, however, providing a complete specification of user preferences and domain dynamics becomes a time-consuming and error-prone task. More often than not, a user may provide no knowledge or at best partial knowledge of her preferences with respect to a desired plan. Similarly, a domain writer may only be able to determine certain parts, not all, of the model of some actions in a domain. Such modeling issues require new concepts of what a solution should be, and novel techniques for solving the problem. When user preferences are incomplete, rather than presenting a single plan, the planner must instead provide a set of plans containing one or more plans that are similar to the one that the user prefers. This research first proposes the usage of different measures to capture the quality of such plan sets. These are domain-independent distance measures based on plan elements if no knowledge of the user preferences is given, or the Integrated Preference Function measure in case incomplete knowledge of such preferences is provided. It then investigates various heuristic approaches to generate plan sets in accordance with these measures, and presents empirical results demonstrating the promise of the methods. The second part of this research addresses planning problems with incomplete domain models, specifically those annotated with possible preconditions and effects of actions. It formalizes the notion of plan robustness, capturing the probability of success for plans during execution. A method of assessing plan robustness based on the weighted model counting approach is proposed. Two approaches for synthesizing robust plans are introduced. The first one compiles the robust plan synthesis problems to conformant probabilistic planning problems. The second approximates the robustness measure with lower and upper bounds, incorporating them into a stochastic local search for estimating the distance heuristic to a goal state. The resulting planner outperforms a state-of-the-art planner that can handle incomplete domain models in both plan quality and planning time.

Contributors: Nguyễn, Tuấn Anh (Author) / Kambhampati, Subbarao (Thesis advisor) / Baral, Chitta (Committee member) / Do, Minh (Committee member) / Lee, Joohyung (Committee member) / Smith, David E. (Committee member) / Arizona State University (Publisher)

Created: 2014

Early age characterization and microstructural features of sustainable binder systems for concrete

Description

Concrete is the most widely used infrastructure material worldwide. Production of portland cement, the main binding component in concrete, has been shown to require significant energy and account for approximately 5-7% of global carbon dioxide production. The expected continued increased use of concrete over the coming decades indicates this is an ideal time to implement sustainable binder technologies. The current work aims to explore enhanced sustainability concretes, primarily in the context of limestone and flow. Aspects such as hydration kinetics, hydration product formation and pore structure add to the understanding of the strength development and potential durability characteristics of these binder systems. Two main strategies for enhancing this sustainability are explored in this work: (i) the use of high volume limestone in combination with other alternative cementitious materials to decrease the portland cement quantity in concrete and (ii) the use of geopolymers as the binder phase in concrete. The first phase of the work investigates the use of fine limestone as cement replacement from the perspective of hydration, strength development, and pore structure. The nature of the potential synergistic benefit of limestone and alumina will be explored. The second phase will focus on the rheological characterization of these materials in the fresh state, as well as a more general investigation of the rheological characterization of suspensions. The results of this work indicate several key ideas. 
(i) There is a potential synergistic benefit for strength, hydration, and pore structure by using alumina in portland limestone cements, (ii) the limestone in these systems is shown to react to some extent, and fine limestone is shown to accelerate hydration, (iii) rheological characteristics of cementitious suspensions are complex, and strongly dependent on several key parameters including the solid loading, interparticle forces, surface area of the particles present, particle size distribution of the particles, and rheological nature of the media in which the particles are suspended, and (iv) a stress plateau method is proposed for the determination of rheological properties of concentrated suspensions, as it more accurately predicts apparent yield stress and is shown to correlate well with other viscoelastic properties of the suspensions.

Contributors: Vance, Kirk (Author) / Neithalath, Narayanan (Thesis advisor) / Rajan, Subramaniam D. (Committee member) / Mobasher, Barzin (Committee member) / Chawla, Nikhilesh (Committee member) / Marzke, Robert (Committee member) / Arizona State University (Publisher)

Created: 2014

Communication between teammates in urban search and rescue

Description

Although current urban search and rescue (USAR) robots are little more than remotely controlled cameras, the end goal is for them to work alongside humans as trusted teammates. Natural language communications and performance data are collected as a team of humans works to carry out a simulated search and rescue task in an uncertain virtual environment. Two conditions are tested, one emulating a remotely controlled robot and the other an intelligent one. Differences in performance, situation awareness, trust, workload, and communications are measured. The intelligent robot condition resulted in higher levels of performance and operator situation awareness (SA).

Contributors: Bartlett, Cade Earl (Author) / Cooke, Nancy J. (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Wu, Bing (Committee member) / Arizona State University (Publisher)

Created: 2015

Quantifying electromigration processes in Sn-0.7Cu solder with lab-scale X-ray computed micro-tomography

Description

For decades, microelectronics manufacturing has been concerned with failures related to electromigration phenomena in conductors experiencing high current densities. The influence of interconnect microstructure on device failures related to electromigration in BGA and flip chip solder interconnects has become a significant interest with reduced individual solder interconnect volumes. A survey indicates that x-ray computed micro-tomography (µXCT) is an emerging, novel means for characterizing the microstructures' role in governing electromigration failures. This work details the design and construction of a lab-scale µXCT system to characterize electromigration in the Sn-0.7Cu lead-free solder system by leveraging in situ imaging.

In order to enhance the attenuation contrast observed in multi-phase material systems, a modeling approach has been developed to predict settings for the controllable imaging parameters which yield relatively high detection rates over the range of x-ray energies for which maximum attenuation contrast is expected in the polychromatic x-ray imaging system. To develop this predictive tool, a model has been constructed for the Bremsstrahlung spectrum of an x-ray tube, and the detector's efficiency has been calculated over the relevant range of x-ray energies. The product of the emitted and detected spectra has then been used to calculate the effective x-ray imaging spectrum. An approach has also been established for filtering "zinger" noise in x-ray radiographs, which has proven problematic at the high x-ray energies used for solder imaging. The performance of this filter has been compared with a known existing method, and the results indicate a significant increase in the accuracy of zinger-filtered radiographs.
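The idea of an effective imaging spectrum as the product of an emitted tube spectrum and a detector efficiency can be sketched as follows. The abstract does not say which models the thesis uses, so everything here is an assumption: Kramers' law stands in for the Bremsstrahlung model, a simple absorption efficiency 1 - exp(-mu * t) with a cubic energy scaling stands in for the detector, and all parameter values and function names are hypothetical.

```python
import math


def kramers_photon_flux(energy_kev, tube_kvp):
    """Relative Bremsstrahlung photon flux from Kramers' law (assumed model).

    The flux is proportional to (E_max / E - 1) and vanishes at the tube
    voltage limit E_max; characteristic lines and filtration are ignored.
    """
    if not 0.0 < energy_kev < tube_kvp:
        return 0.0
    return tube_kvp / energy_kev - 1.0


def detector_efficiency(energy_kev, mu_ref_per_mm=0.5, thickness_mm=0.3,
                        ref_kev=60.0):
    """Fraction of photons absorbed in a detector layer (assumed model).

    The attenuation coefficient is scaled as E^-3 from a hypothetical
    reference value, a rough photoelectric-like trend used only for
    illustration.
    """
    mu = mu_ref_per_mm * (ref_kev / energy_kev) ** 3
    return 1.0 - math.exp(-mu * thickness_mm)


def effective_spectrum(energies_kev, tube_kvp):
    """Effective imaging spectrum: emitted flux times detector efficiency."""
    return [
        kramers_photon_flux(e, tube_kvp) * detector_efficiency(e)
        for e in energies_kev
    ]


energies = [20.0 + 10.0 * i for i in range(15)]  # 20 to 160 keV
spectrum = effective_spectrum(energies, tube_kvp=160.0)
peak_energy = energies[spectrum.index(max(spectrum))]
print(f"effective spectrum peaks near {peak_energy:.0f} keV in this toy model")
```

With a model of this shape, candidate tube voltages or detector settings can be compared by how much of the effective spectrum falls inside the energy window where attenuation contrast between phases is expected to be largest.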

The obtained results indicate the conception of a powerful means for the study of failure-causing processes in solder systems used as interconnects in microelectronic packaging devices. These results include the volumetric quantification of parameters which are indicative of both the electromigration tolerance of solders and the dominant mechanisms for atomic migration in response to current stressing. This work aims to further the community's understanding of failure-causing electromigration processes in industrially relevant material systems for microelectronic interconnect applications and to advance the capability of available characterization techniques for their interrogation.

Contributors: Mertens, James Charles Edwin (Author) / Chawla, Nikhilesh (Thesis advisor) / Alford, Terry (Committee member) / Jiao, Yang (Committee member) / Neithalath, Narayanan (Committee member) / Arizona State University (Publisher)

Created: 2015

Mining content and relations for social spammer detection

Description

Social networking services have emerged as an important platform for large-scale information sharing and communication. With the growing popularity of social media, spamming has become rampant on these platforms. Complex network interactions and evolving content present great challenges for social spammer detection. Unlike existing well-studied platforms, newly emerged social media data have distinct characteristics that present new challenges for social spammer detection. First, texts in social media are short and potentially linked with each other via user connections. Second, it is observed that abundant contextual information may play an important role in distinguishing social spammers from normal users. Third, not only the content information but also the social connections in social media evolve very fast. Fourth, it is easy to amass vast quantities of unlabeled data in social media, but it would be costly to obtain labels, which are essential for many supervised algorithms. To tackle the challenges raised by social media data, I focused on developing effective and efficient machine learning algorithms for social spammer detection.

I provide a novel and systematic study of social spammer detection in the dissertation. By analyzing the properties of social network and content information, I propose a unified framework for social spammer detection that collectively uses the two types of information in social media. Motivated by psychological findings in the physical world, I investigate whether sentiment analysis can help spammer detection in online social media. In particular, I conduct an exploratory study to analyze the sentiment differences between spammers and normal users, and present a novel method to incorporate sentiment information into the social spammer detection framework. Given the rapidly evolving nature of social media, I propose a novel framework to efficiently reflect the effect of newly emerging social spammers. To tackle the lack of labeled data in social media, I study how to incorporate network information into text content modeling, and design strategies to select the most representative and informative instances from social media for labeling. Motivated by publicly available label information from other media platforms, I propose to make use of knowledge learned across media to help spammer detection on social media.

Contributors: Hu, Xia, Ph.D (Author) / Liu, Huan (Thesis advisor) / Kambhampati, Subbarao (Committee member) / Ye, Jieping (Committee member) / Faloutsos, Christos (Committee member) / Arizona State University (Publisher)

Created: 2015

Unsupervised Bayesian data cleaning techniques for structured data

Description

Recent efforts in data cleaning have focused mostly on problems like data deduplication, record matching, and data standardization; few of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this thesis, I provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. I thus avoid the necessity for a domain expert or master data. I also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. A Map-Reduce architecture to perform this computation in a distributed manner is also shown. I evaluate these methods over both synthetic and real data.
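The general shape of such a Bayesian correction, choosing the clean value c that maximizes P(c) * P(observed | c), can be sketched on a single attribute. This is a simplified illustration and not the thesis's method: the prior here is just the value frequency in the noisy column (standing in for the generative model), the error model is a hypothetical exponential penalty on edit distance, and the `sharpness` parameter and the sample data are made up.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def correct_value(observed, column_values, sharpness=2.0):
    """Return the candidate maximizing prior * likelihood.

    prior:      relative frequency of the candidate in the noisy column
                (no clean sample or expert-provided constraint is used)
    likelihood: exp(-sharpness * edit_distance), an assumed error model
    """
    counts = Counter(column_values)
    total = sum(counts.values())

    def score(candidate):
        prior = counts[candidate] / total
        likelihood = math.exp(-sharpness * edit_distance(observed, candidate))
        return prior * likelihood

    return max(counts, key=score)


# Hypothetical noisy column: one misspelling among many correct values.
column = ["Honda"] * 20 + ["Hyundai"] * 5 + ["Hinda"]
print(correct_value("Hinda", column))
print(correct_value("Hyundai", column))
```

Both the prior and the error model are estimated from (or applied to) the dirty data itself, which is the property the abstract emphasizes. A rare but legitimate value is only overwritten when a much more frequent value lies within a small edit distance, so the `sharpness` setting controls how aggressive the correction is.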

Contributors: De, Sushovan (Author) / Kambhampati, Subbarao (Thesis advisor) / Chen, Yi (Committee member) / Candan, K. Selcuk (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2014

Planning challenges in human-robot teaming

Description

As robotic technology and its various uses grow steadily more complex and ubiquitous, humans are coming into increasing contact with robotic agents. A large portion of such contact is cooperative interaction, where both humans and robots are required to work on the same application towards achieving common goals. These application scenarios are characterized by a need to leverage the strengths of each agent as part of a unified team to reach those common goals. To ensure that the robotic agent is truly a contributing team-member, it must exhibit some degree of autonomy in achieving goals that have been delegated to it. Indeed, a significant portion of the utility of such human-robot teams derives from the delegation of goals to the robot, and autonomy on the part of the robot in achieving those goals. In order to be considered truly autonomous, the robot must be able to make its own plans to achieve the goals assigned to it, with only minimal direction and assistance from the human.

Automated planning provides the solution to this problem -- indeed, one of the main motivations that underpinned the beginnings of the field of automated planning was to provide planning support for Shakey the robot with the STRIPS system. For a long time, however, automated planners suffered from scalability issues that precluded their application to real-world, real-time robotic systems. Recent decades have seen a gradual abeyance of those issues, and fast planning systems are now the norm rather than the exception. However, some of these advances in speedup and scalability have been achieved by ignoring or abstracting out challenges that real-world integrated robotic systems must confront.

In this work, the problem of planning for human-robot teaming is introduced. The central idea -- the use of automated planning systems as mediators in such human-robot teaming scenarios -- and the main challenges inspired by real-world scenarios that must be addressed in order to make such planning seamless are presented: (i) Goals which can be specified or changed at execution time, after the planning process has completed; (ii) Worlds and scenarios where the state changes dynamically while a previous plan is executing; (iii) Models that are incomplete and can be changed during execution; and (iv) Information about the human agent's plan and intentions that can be used for coordination. These challenges are compounded by the fact that the human-robot team must execute in an open world, rife with dynamic events and other agents, and in a manner that encourages the exchange of information between the human and the robot. As an answer to these challenges, implemented solutions and a fielded prototype that combines all of those solutions into one planning system are discussed. Results from running this prototype in real-world scenarios are presented, and extensions to some of the solutions are offered as appropriate.

Contributors: Talamadupula, Kartik (Author) / Kambhampati, Subbarao (Thesis advisor) / Baral, Chitta (Committee member) / Liu, Huan (Committee member) / Scheutz, Matthias (Committee member) / Smith, David E. (Committee member) / Arizona State University (Publisher)

Created: 2014

ASU Electronic Theses and Dissertations
