Finding provenance data in social media

Description

A statement appearing in social media poses a significant challenge for determining its provenance. Provenance describes the origin, custody, and ownership of something. Most statements appearing in social media are not published with corresponding provenance data. However, the same characteristics that make the social media environment challenging, including the massive amounts of data available, large numbers of users, and a highly dynamic environment, provide unique and untapped opportunities for solving the provenance problem for social media. Current approaches for tracking provenance data do not scale to online social media, and consequently there is a gap in provenance methodologies and technologies that provides exciting research opportunities. The guiding vision is the use of social media information itself to realize a useful amount of provenance data for information in social media. This departs from traditional approaches to data provenance, which rely on a central store of provenance information. The contemporary online social media environment is an enormous and constantly updated "central store" that can be mined for provenance information that is not readily available to the average social media user. This research introduces an approach and builds a foundation aimed at realizing a provenance data capability for social media users that is not accessible today.

Contributors: Barbier, Geoffrey P (Author) / Liu, Huan (Thesis advisor) / Bell, Herbert (Committee member) / Li, Baoxin (Committee member) / Sen, Arunabha (Committee member) / Arizona State University (Publisher)

Created: 2011

Factors affecting behavioral change through the use of computer-mediated technology

Description

This study explores the impact of feedback, feedforward, and personality on computer-mediated behavior change. The effects were studied using subjects who entered information relevant to their diet and exercise into an online tool. Subjects were divided into four experimental groups: those receiving only feedback, those receiving only feedforward, those receiving both, and those receiving neither. Results were analyzed using regression analysis. Results indicate that both feedforward and feedback impact behavior change and that individuals ranking low in conscientiousness experienced behavior change equivalent to that of individuals with high conscientiousness in the presence of feedforward and/or feedback.

Contributors: McCreless, Tamuchin (Author) / St. Louis, Robert (Thesis advisor) / St. Louis, Robert D. (Committee member) / Goul, Kenneth M (Committee member) / Shao, Benjamin B (Committee member) / Arizona State University (Publisher)

Created: 2012

Learning from asymmetric models and matched pairs

Description

With the increase in computing power and availability of data, there has never been a greater need to understand data and make decisions from it. Traditional statistical techniques may not be adequate to handle the size of today's data or the complexities of the information hidden within the data. Thus, knowledge discovery by machine learning techniques is necessary if we want to better understand information from data. In this dissertation, we explore the topics of asymmetric loss and asymmetric data in machine learning and propose new algorithms as solutions to some of the problems in these topics. We also study variable selection for matched data sets and propose a solution for cases where there is non-linearity in the matched data. The research is divided into three parts. The first part addresses the problem of asymmetric loss. A proposed asymmetric support vector machine (aSVM) is used to predict specific classes with high accuracy. aSVM was shown to produce higher precision than a regular SVM. The second part addresses asymmetric data sets where variables are only predictive for a subset of the predictor classes. Asymmetric Random Forest (ARF) was proposed to detect these kinds of variables. The third part explores variable selection for matched data sets. Matched Random Forest (MRF) was proposed to find variables that are able to distinguish case and control without the restrictions that exist in linear models. MRF detects variables that are able to distinguish case and control even in the presence of interaction and qualitative variables.
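The idea of asymmetric loss mentioned in the abstract can be illustrated with a small sketch. This is not the dissertation's aSVM; it is only a generic hinge loss with class-dependent costs, and the function name and the `cost_pos` / `cost_neg` parameters are hypothetical names chosen for illustration.

```python
def asymmetric_hinge_loss(scores, labels, cost_pos=1.0, cost_neg=5.0):
    """Mean hinge loss with class-dependent costs (illustrative only).

    scores: real-valued decision values, one per example.
    labels: +1 or -1, one per example.
    cost_pos / cost_neg: hypothetical weights applied to margin
    violations on positive and negative examples respectively.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total = 0.0
    for score, label in zip(scores, labels):
        # Standard hinge term: zero once the example is beyond the margin.
        violation = max(0.0, 1.0 - label * score)
        cost = cost_pos if label == 1 else cost_neg
        total += cost * violation
    return total / len(scores)
```

With `cost_neg` larger than `cost_pos`, a negative example scored on the positive side is penalized more heavily than the reverse error, which is one common way to push a classifier toward fewer false positives.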

Contributors: Koh, Derek (Author) / Runger, George C. (Thesis advisor) / Wu, Tong (Committee member) / Pan, Rong (Committee member) / Cesta, John (Committee member) / Arizona State University (Publisher)

Created: 2013

Analysis of an information sharing system model within the community of Navajo County, Arizona

Description

Law enforcement, schools and universities, health service agencies, and social service agencies each acquire information from the individuals who receive their services. That information is recorded in the respective application system of each organization. The information, however, is recorded only in the context of each service rendered and within each system used to record it. Information that is recorded by the police department for one individual is entirely different from the information that is recorded by the hospital for that same individual. What if all the organizations used the same system to record information? What if all the organizations followed the same protocols to record information as well as access it? The goal of this research was to analyze a system that allows all organizations within a community to share information with each other. Technically, this system is feasible. However, public opinion says sharing personal information is unethical, and federal regulation says it is unlawful. To accomplish an information-sharing system of this type, both regulation and public opinion need to be addressed.

Contributors: Pullin, Britton Scott (Author) / Schildgen, Thomas (Thesis advisor) / Prewitt, Deborah (Committee member) / Ralston, Laurel (Committee member) / Arizona State University (Publisher)

Created: 2012

Industrial applications of data mining: engineering effort forecasting based on mining and analysis of patterns in historical project execution data

Description

Data mining is increasing in importance in solving a variety of industry problems. Our initiative involves the estimation of resource requirements by skill set for future projects by mining and analyzing actual resource consumption data from past projects in the semiconductor industry. To achieve this goal, we face difficulties such as relevant consumption data stored in different formats and insufficient data about project attributes to interpret the consumption data. Our first goal is to clean the historical data and organize it into meaningful structures for analysis. Once the data preprocessing is completed, data mining techniques such as clustering are applied to find projects that involve resources of similar skill sets and similar complexity and size. This results in "resource utilization templates" for groups of related projects from a resource consumption perspective. Then the project characteristics that generate this diversity in headcounts and skill sets are identified. These characteristics are not currently contained in the database and are elicited from the managers of historical projects. This represents an opportunity to improve the usefulness of the data collection system for the future. The ultimate goal is to match the product technical features with the resource requirements of past projects as a model to forecast resource requirements by skill set for future projects. The forecasting model is developed using linear regression with cross-validation of the training data, as past project executions are relatively few in number. Acceptable levels of forecast accuracy are achieved relative to human experts' results, and the tool is applied to forecast some future projects' resource demand.
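To make the combination of linear regression and cross-validation on a small sample concrete, here is a minimal sketch. It assumes a single numeric predictor and leave-one-out cross-validation; the abstract does not say which cross-validation scheme or which predictors the thesis used, so both choices, along with the function names, are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mean_abs_error(xs, ys):
    """Leave-one-out cross-validation: hold out each project once,
    fit on the rest, and average the absolute prediction errors.
    xs and ys are expected to be lists of equal length (at least 3)."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(abs(ys[i] - (slope * xs[i] + intercept)))
    return sum(errors) / len(errors)
```

Leave-one-out is a natural fit when only a handful of historical projects exist, because every fit uses all but one observation; with more data, k-fold cross-validation would be cheaper.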

Contributors: Bhattacharya, Indrani (Author) / Sen, Arunabha (Thesis advisor) / Kempf, Karl G. (Thesis advisor) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2013

Context-aware rank-oriented recommender systems

Description

Recommender systems are a type of information filtering system that suggests items that may be of interest to a user. Most information retrieval systems have an overwhelmingly large number of entries. Most users would experience information overload if they were forced to explore the full set of results. The goal of recommender systems is to overcome this limitation by predicting how users will value certain items and returning the items that should be of the highest interest to the user. Most recommender systems collect explicit user feedback, such as a rating, and attempt to optimize their model to this rating value. However, there is potential for a system to collect implicit user feedback, such as user purchases and clicks, to learn user preferences. Additionally, with implicit user feedback, it is possible for the system to remember the context of user feedback in terms of which other items a user was considering when making their decisions. When considering implicit user feedback, only a subset of all evaluation techniques can be used. Currently, sufficient techniques for evaluating implicit user feedback do not exist. In this thesis, I introduce a new model for recommendation that borrows the idea of opportunity cost from economics. There are two variations of the model, one that considers context and one that does not. Additionally, I propose a new evaluation measure that works specifically for the case of implicit user feedback.
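One simple way to picture "context" in implicit feedback is to record, for each choice, which other items were on offer and were passed over. The sketch below is not the thesis's model or its evaluation measure; the session format and the function name are hypothetical and serve only to illustrate the idea.

```python
def pairwise_preferences(sessions):
    """Turn implicit feedback into (preferred, passed_over) pairs.

    sessions: list of (shown_items, chosen_item) tuples, where
    shown_items is the list of items the user was considering and
    chosen_item is the one they clicked or purchased. This input
    format is an assumption made for illustration.
    """
    pairs = []
    for shown_items, chosen_item in sessions:
        if chosen_item not in shown_items:
            raise ValueError("chosen item must be among the shown items")
        for item in shown_items:
            if item != chosen_item:
                # The chosen item was preferred over each alternative shown.
                pairs.append((chosen_item, item))
    return pairs
```

Each pair can be read loosely as an opportunity cost: by choosing one item, the user gave up the alternatives that were visible at the time.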

Contributors: Ackerman, Brian (Author) / Chen, Yi (Thesis advisor) / Candan, Kasim (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created: 2012

It's not all about the music: digital goods, social media, and the pressure of peers

Description

Social media offers a powerful platform for the independent digital content producer community to develop, disperse, and maintain their brands. In information systems research, the broad majority of the work has not examined hedonic consumption on Social Media Sites (SMS). The focus has mostly been on organizational perspectives and utilitarian gains from these services. Unlike in traditional commerce channels, including e-commerce retailers, consumption that enhances hedonic utility is experienced differently in the context of a social media site; consequently, the dynamic of the decision-making process shifts when it is made in a social context. Previous research assumed a limited influence of a small, immediate group of peers. But the rules change when the network of peers expands exponentially. The assertion is that, while there are individual differences in the level of susceptibility to influence coming from others, these are not the most important pieces of the analysis, unlike in research centered completely on influence. Rather, the context of the consumption can play an important role in the way social influence factors affect consumer behavior on Social Media Sites. Over the course of three studies, this dissertation will examine factors that influence consumer decision-making and the brand personalities created and interpreted in these SMS. Study one examines the role of different types of peer influence on consumer decision-making on Facebook. Study two observes the impact of different types of producer message posts with the different types of influence on decision-making on Twitter. Study three will conclude this work with an exploratory empirical investigation of actual Twitter postings of a set of musicians.
These studies contribute to the body of IS literature by evaluating the specific behavioral changes related to consumption in the context of digital social media: (a) the power of social influencers in contrast to personal preferences on SMS, (b) the effect on consumers of producer message types and content on SMS at both the profile level and the individual message level.

Contributors: Sopha, Matthew (Author) / Santanam, Raghu T (Thesis advisor) / Goul, Kenneth M (Committee member) / Gu, Bin (Committee member) / Arizona State University (Publisher)

Created: 2013

Volatile perceptions: the power of the public sphere to reshape science

Description

This thesis examines the role of the media and popular culture in defining the shape and scope of what we think of today as "science." As a source of cognitive authority, the scientific establishment is virtually beyond dispute. The intellectual clout of science seemingly elevates it to a position outside the influence of the general population. Yet in reality the emergence and evolution of the public sphere, including popular culture, has had a profound impact on the definition and application of science. What science is and how it relates to the life of the ordinary person are hardly static concepts; the public perception of science has been molding its boundaries since at least the 18th century. During the Enlightenment, "natural philosophy" was broadly accessible and integrated nicely with other forms of knowledge. As the years passed into the 19th century, however, science became increasingly professionalized and distinct, until the "Two Cultures" had fully developed. The established scientific institution distanced itself from the nonscientific community, leaving the task of communicating scientific knowledge to various popularizers, who typically operated through the media and often used the mantle of science to further their own social or political agendas. Such isolation from orthodox science forced the public to create an alternate form of science for popular consumption, a form consisting mainly of decontextualized facts, often used in contrast to other forms of thought (e.g., religion, art, or pseudoscience). However, with the recent advent of "Web 2.0" and the increasing prominence of convergence culture, the role of the public sphere is undergoing a dramatic revolution. Concepts such as "collective intelligence" are changing consumers of information into simultaneous producers, establishing vast peer networks of collaboration and enabling the public to bypass traditional sources of authority.
This new hypermobility of information and empowerment of the public sphere are just now beginning to break down science's monolithic status. In many ways, it seems, we are entering a new Enlightenment.

Contributors: Smith, Robert Scott (Author) / Lussier, Mark (Thesis advisor) / Broglio, Ronald (Committee member) / Bivona, Daniel (Committee member) / Arizona State University (Publisher)

Created: 2012

An informatics approach to establishing a sustainable public health community

Description

This work involved the analysis of a public health system and the design, development, and deployment of an enterprise informatics architecture and sustainable community methods to address problems with the current public health system. Specifically, assessment of the National Notifiable Diseases Surveillance System (NNDSS) was instrumental in forming the design of the current implementation at the Southern Nevada Health District (SNHD). The result of the system deployment at SNHD was considered as a basis for projecting the practical application and benefits of an enterprise architecture. This approach has resulted in a sustainable platform to enhance the practice of public health by improving the quality and timeliness of data, the effectiveness of an investigation, and reporting across the continuum.

Contributors: Kriseman, Jeffrey Michael (Author) / Dinu, Valentin (Thesis advisor) / Greenes, Robert (Committee member) / Johnson, William (Committee member) / Arizona State University (Publisher)

Created: 2012

Modeling time series data for supervised learning

Description

Temporal data are increasingly prevalent and important in analytics. Time series (TS) data are chronological sequences of observations and an important class of temporal data. Fields such as medicine, finance, learning science, and multimedia naturally generate TS data. Each series provides a high-dimensional data vector that challenges the learning of the relevant patterns. This dissertation proposes TS representations and methods for supervised TS analysis. The approaches combine new representations that handle translations and dilations of patterns with bag-of-features strategies and tree-based ensemble learning. This provides flexibility in handling time-warped patterns in a computationally efficient way. The ensemble learners provide a classification framework that can handle high-dimensional feature spaces, multiple classes, and interactions between features. The proposed representations are useful for classification and interpretation of TS data of varying complexity. The first contribution handles the problem of time warping with a feature-based approach. An interval selection and local feature extraction strategy is proposed to learn a bag-of-features representation. This is distinctly different from common similarity-based time warping. It allows additional features (such as pattern location) to be easily integrated into the models. The learners have the capability to account for the temporal information through the recursive partitioning method. The second contribution focuses on the comprehensibility of the models. A new representation is integrated with local feature importance measures from tree-based ensembles to diagnose and interpret time intervals that are important to the model. Multivariate time series (MTS) are especially challenging because the input consists of a collection of TS, and both features within TS and interactions between TS can be important to models.
Another contribution uses a different representation to produce computationally efficient strategies that learn a symbolic representation for MTS. Relationships between the multiple TS, nominal and missing values are handled with tree-based learners. Applications such as speech recognition, medical diagnosis and gesture recognition are used to illustrate the methods. Experimental results show that the TS representations and methods provide better results than competitive methods on a comprehensive collection of benchmark datasets. Moreover, the proposed approaches naturally provide solutions to similarity analysis, predictive pattern discovery and feature selection.
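The interval-based, bag-of-features idea described in this abstract can be sketched in a few lines. This is a simplified illustration, not the dissertation's method: the fixed sliding windows, the choice of mean, variance, and slope as local features, and the function names are all assumptions made here.

```python
def interval_features(series, start, end):
    """Local features (mean, variance, slope) of series[start:end]."""
    window = series[start:end]
    n = len(window)
    if n < 2:
        raise ValueError("an interval needs at least two observations")
    mean = sum(window) / n
    variance = sum((v - mean) ** 2 for v in window) / n
    # Least-squares slope of the values against their position in the window.
    t_mean = (n - 1) / 2
    stt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (v - mean) for t, v in enumerate(window)) / stt
    return mean, variance, slope


def bag_of_intervals(series, width, step):
    """Describe a series as a bag of (start, mean, variance, slope) tuples.

    The start index plays the role of a pattern-location feature."""
    return [
        (start,) + interval_features(series, start, start + width)
        for start in range(0, len(series) - width + 1, step)
    ]
```

Rows like these could then be passed to a tree-based ensemble classifier; keeping the start index as a feature lets the learner use where in the series a pattern occurs.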

Contributors: Baydogan, Mustafa Gokce (Author) / Runger, George C. (Thesis advisor) / Atkinson, Robert (Committee member) / Gel, Esma (Committee member) / Pan, Rong (Committee member) / Arizona State University (Publisher)

Created: 2012
