Matching Items (4)
Description
With the advent of new advanced analysis tools and access to related published data, it is becoming more difficult for data owners to suppress private information in published data while still providing useful information. This dual problem of providing useful, accurate information while protecting it has been challenging, especially in healthcare. Data owners lack an automated resource that provides layers of protection on a published dataset along with validated statistical values for usability. Differential privacy (DP) has gained considerable attention in the past few years as a solution to this dual problem. DP is a statistical anonymity model that can protect data from adversarial observation while still supporting its intended use. This dissertation introduces a novel DP protection mechanism called Inexact Data Cloning (IDC), which simultaneously protects and preserves information in published data while conveying the intent of the source data. IDC preserves the privacy of the records by converting the raw data records into clonesets. The clonesets then pass through a classifier that removes potentially compromising clonesets, retaining only good inexact clonesets. The IDC mechanism depends on a set of privacy protection metrics, called differential privacy protection metrics (DPPM), that represent the overall protection level. IDC uses two novel performance values, the differential privacy protection score (DPPS) and the clone classifier selection percentage (CCSP), to estimate the privacy level of protected data. In support of using IDC as a viable data security product, a software tool-chain prototype, the differential privacy protection architecture (DPPA), was developed to utilize IDC. DPPA employs IDC as its core security mechanism and acts as a hub that facilitates a market for DP data security mechanisms. DPPA works by incorporating standalone IDC mechanisms and provides automation, IDC-protected published datasets, and statistically verified IDC dataset diagnostic reports. DPPA currently performs functional and operational benchmark processes that quantify the DP protection of a given published dataset. The DPPA tool was recently used to test two health datasets, and the test results further validate the feasibility of the IDC mechanism.
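The abstract does not specify how clonesets are generated or how the classifier decides what to filter; the following is a minimal sketch of the described pipeline under simple assumptions (noise-based cloning, a distance-threshold classifier). All function names and parameters here are illustrative, not taken from the dissertation.

```python
import random

def make_cloneset(record, n_clones=5, noise_scale=1.0):
    """Generate inexact clones of a numeric record by adding Gaussian noise.
    (Illustrative stand-in for IDC's cloning step.)"""
    return [[value + random.gauss(0, noise_scale) for value in record]
            for _ in range(n_clones)]

def is_compromising(cloneset, original, min_distance=0.5):
    """Hypothetical classifier: flag a cloneset whose clones stay too close
    to the original record and could therefore re-identify it."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return any(dist(clone, original) < min_distance for clone in cloneset)

def idc_publish(records):
    """Clone every record, keep only non-compromising clonesets, and report
    the clone classifier selection percentage (CCSP)."""
    clonesets = [make_cloneset(r) for r in records]
    kept = [cs for cs, r in zip(clonesets, records)
            if not is_compromising(cs, r)]
    ccsp = 100.0 * len(kept) / len(clonesets) if clonesets else 0.0
    return kept, ccsp

published, ccsp = idc_publish([[1.0, 2.0], [3.5, 0.2]])
print(f"CCSP: {ccsp:.1f}%")
```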
Contributors: Thomas, Zelpha (Author) / Bliss, Daniel W. (Thesis advisor) / Papandreou-Suppappola, Antonia (Committee member) / Banerjee, Ayan (Committee member) / Shrivastava, Aviral (Committee member) / Arizona State University (Publisher)
Created: 2023
Description
Microblogging services such as Twitter, Sina Weibo, and Tumblr have emerged and become deeply embedded in people's daily lives. Used by hundreds of millions of users to connect people worldwide and to share and access information in real time, microblogging services have also become targets of malicious attackers because of their massive user engagement and structural openness. Little is known in the community about new types of vulnerabilities in current microblogging services that could be exploited by increasingly sophisticated attackers, and, more importantly, about the corresponding defenses that could protect both users and microblogging service providers. This dissertation aims to uncover a number of challenging security and privacy issues in microblogging services and to propose corresponding defenses.

This dissertation makes five contributions. The first part presents the social botnet, a group of collaborative social bots under the control of a single botmaster; demonstrates the effectiveness and advantages of exploiting a social botnet for spam distribution and digital-influence manipulation; and proposes and evaluates corresponding countermeasures. Inspired by PageRank, the second part describes TrueTop, the first sybil-resilient system for finding the top-K influential users in microblogging services, with very accurate results and strong resilience to sybil attacks. TrueTop has been implemented to handle millions of nodes and 100 times more edges on commodity computers. The third and fourth parts demonstrate that microblogging systems' structural openness and users' carelessness can disclose the latter's sensitive information, such as home city and age. LocInfer, a novel and lightweight system, is presented to uncover the majority of the users in any metropolitan area; the dissertation also proposes MAIF, a novel machine-learning framework that leverages public content and interaction information in microblogging services to infer users' hidden ages. Finally, the dissertation proposes the first privacy-preserving social-media publishing framework that lets microblogging service providers publish their data to any third party without compromising users' privacy while preserving the data's commercial utility. This dissertation sheds light on state-of-the-art security and privacy issues in microblogging services.
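The abstract names PageRank as TrueTop's inspiration but does not describe its sybil-resilience machinery. Purely as a point of reference, here is a minimal plain-PageRank sketch over a hypothetical interaction graph; TrueTop itself adds sybil defenses that this sketch omits.

```python
def influence_scores(graph, damping=0.85, iters=50):
    """Plain PageRank over an interaction graph: graph[u] is the set of
    users u endorses (e.g., follows or retweets). Returns a score per user."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            outs = graph.get(u, ())
            if outs:
                share = damping * score[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:  # dangling user: spread its mass uniformly
                for v in nodes:
                    nxt[v] += damping * score[u] / n
        score = nxt
    return score

graph = {"alice": {"bob"}, "bob": {"carol"}, "carol": {"alice", "bob"}}
top_k = sorted(influence_scores(graph).items(), key=lambda kv: -kv[1])[:2]
print(top_k)
```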
Contributors: Zhang, Jinxue (Author) / Zhang, Yanchao (Thesis advisor) / Zhang, Junshan (Committee member) / Ying, Lei (Committee member) / Ahn, Gail-Joon (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
The presence of a rich set of embedded sensors on mobile devices has been fuelling various sensing applications concerning the activities of individuals and their surrounding environment, and these ubiquitous sensing-capable mobile devices are pushing the new paradigm of Mobile Crowd Sensing (MCS) from concept to reality. MCS aims to outsource sensing-data collection to mobile users, and it could revolutionize traditional ways of collecting and processing sensing data. In the meantime, cloud computing provides cloud-backed infrastructures that extend the capabilities of network-connected mobile devices. With enormous computational and storage resources along with sufficient bandwidth, the cloud functions as the hub that handles sensing service requests from consumers and coordinates sensing-task assignment among eligible mobile users to reach a desired quality of sensing service. This paper studies the problem of assigning sensing tasks to mobile device owners with specific spatio-temporal traits so as to minimize cost and maximize utility in MCS while adhering to QoS constraints. Greedy approaches and hybrid solutions combined with bee algorithms are explored to address the problem, as sketched below.
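The abstract names greedy approaches without detailing them; the following is a minimal greedy sketch under assumed inputs: each task carries a utility, each eligible worker quotes a cost per task, and a hypothetical QoS predicate gates eligibility. The data structures and the qos_ok predicate are illustrative, not the thesis's formulation.

```python
def greedy_assign(tasks, workers, qos_ok):
    """Assign each sensing task to the cheapest eligible worker.

    tasks:   {task_id: utility}
    workers: {worker_id: {task_id: cost}}  # cost quoted per task
    qos_ok:  predicate(worker_id, task_id) -> bool  (assumed QoS check)
    Returns (assignment, total utility minus total cost).
    """
    assignment, net = {}, 0.0
    for task, utility in sorted(tasks.items(), key=lambda kv: -kv[1]):
        candidates = [(quotes[task], w) for w, quotes in workers.items()
                      if task in quotes and qos_ok(w, task)]
        if not candidates:
            continue  # no eligible worker; task stays unassigned
        cost, worker = min(candidates)
        if utility > cost:  # only assign when the task pays for itself
            assignment[task] = worker
            net += utility - cost
    return assignment, net

tasks = {"t1": 10.0, "t2": 4.0}
workers = {"w1": {"t1": 3.0, "t2": 5.0}, "w2": {"t1": 6.0}}
print(greedy_assign(tasks, workers, lambda w, t: True))
```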

Moreover, privacy concerns arise with the widespread deployment of MCS, for both the data contributors and the sensing service consumers. The uploaded sensing data, especially data tagged with spatio-temporal information, can disclose the personal information of the data contributors, and the sensing service requests can reveal the personal interests of service consumers. To address these privacy issues, this paper constructs a new framework named Privacy-Preserving Mobile Crowd Sensing (PP-MCS) that leverages the sensing capabilities of ubiquitous mobile devices and cloud infrastructures. PP-MCS has a distributed architecture and does not rely on trusted third parties for privacy preservation. In PP-MCS, sensing service consumers can retrieve data without learning who the real data contributors are. In addition, individual sensing records can be compared against the aggregation result while the values of the records remain unknown, and the k-nearest neighbors can be approximately identified without privacy leaks. As such, the privacy of the data contributors and the sensing service consumers is protected to the greatest extent possible.
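PP-MCS's concrete protocol is not spelled out in the abstract. One standard building block for comparing records against an aggregate without revealing individual values is additive masking, sketched below under simplified assumptions; note that a single machine generates the pairwise masks here purely for demonstration, whereas a distributed protocol would have each pair of contributors derive its mask without a central party.

```python
import random

MODULUS = 2**32  # arithmetic is done modulo a public constant

def mask_records(records):
    """Additively mask each record with pairwise random values that cancel
    in the sum, so an aggregator sees only blinded shares. This is a
    generic secure-aggregation sketch, not PP-MCS's exact protocol."""
    n = len(records)
    # pairwise masks: masks[i][j] is added by party i, subtracted by party j
    masks = [[random.randrange(MODULUS) for _ in range(n)] for _ in range(n)]
    shares = []
    for i, value in enumerate(records):
        blinded = value
        for j in range(n):
            if i < j:
                blinded = (blinded + masks[i][j]) % MODULUS
            elif i > j:
                blinded = (blinded - masks[j][i]) % MODULUS
        shares.append(blinded)
    return shares

records = [12, 7, 30]
shares = mask_records(records)
# Individual shares reveal nothing, but the masks cancel in the sum:
print(sum(shares) % MODULUS == sum(records) % MODULUS)  # True
```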
Contributors: Wang, Zhijie (Thesis advisor) / Xue, Guoliang (Committee member) / Sen, Arunabha (Committee member) / Li, Jing (Committee member) / Arizona State University (Publisher)
Created: 2016
Description
Data privacy is emerging as one of the most serious concerns in big data analytics, particularly with the growing use of personal data and the ever-improving capabilities of data analysis. This dissertation first investigates the relationships among different privacy notions and then focuses on developing economic foundations for a market model of trading private data.

The first part characterizes differential privacy, identifiability, and mutual-information privacy through their privacy-distortion functions, where a privacy-distortion function gives the optimal achievable privacy level as a function of the maximum allowable distortion. The results show that these notions are fundamentally related and exhibit a certain consistency: (1) the gap between the privacy-distortion functions of identifiability and differential privacy is upper bounded by a constant determined by the prior; (2) identifiability and mutual-information privacy share the same optimal mechanism; (3) the mutual-information optimal mechanism satisfies differential privacy with a level at most a constant away from the optimal level.
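For reference, the standard definition of epsilon-differential privacy that these privacy-distortion results build on (the notation here is the conventional one, not necessarily the dissertation's):

```latex
% A mechanism M is \epsilon-differentially private if, for all neighboring
% databases D, D' (differing in one record) and all measurable sets S,
\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S]
```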

The second part studies a market model of trading private data, in which a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The value of epsilon units of privacy is measured by the minimum payment such that an individual's equilibrium strategy is to report data in an epsilon-differentially private manner. For the setting with binary private data that represents individuals' knowledge about a common underlying state, asymptotically tight lower and upper bounds on the value of privacy are established as the number of individuals becomes large, and the payment-accuracy tradeoff for learning the state is obtained. The lower bound establishes the impossibility of buying epsilon units of privacy with a lower payment, and the upper bound is achieved by a designed reward mechanism. When the individuals' valuations of privacy are unknown to the data collector, mechanisms with possibly negative payments (aiming to penalize individuals with "unacceptably" high privacy valuations) are designed to fulfill the accuracy goal and drive the total payment to zero. For the setting with binary private data following a general joint probability distribution with some symmetry, asymptotically optimal mechanisms are designed in the high-data-quality regime.
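The canonical way to report a binary value in an epsilon-differentially private manner is randomized response; a minimal sketch follows. The dissertation's reporting strategies are equilibria of an incentive mechanism, not necessarily this exact mechanism.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Report a binary value with epsilon-differential privacy: tell the
    truth with probability e^eps / (1 + e^eps), lie otherwise."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def debias_mean(reports, epsilon):
    """Unbiased estimate of the true mean from randomized-response reports:
    E[report] = (2p - 1) * mean + (1 - p), solved for the mean."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

true_bits = [1] * 700 + [0] * 300
reports = [randomized_response(b, epsilon=1.0) for b in true_bits]
print(round(debias_mean(reports, epsilon=1.0), 2))  # close to 0.7
```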
Contributors: Wang, Weina (Author) / Ying, Lei (Thesis advisor) / Zhang, Junshan (Thesis advisor) / Scaglione, Anna (Committee member) / Zhang, Yanchao (Committee member) / Arizona State University (Publisher)
Created: 2016