Mining IoT Network Traffic in Smart Homes: Traffic Measurement, Pattern Recognition, and Security Applications

Description
Recent advances in cyber-physical systems, artificial intelligence, and cloud computing have driven the widespread deployment of Internet-of-Things (IoT) devices in smart homes. However, the spate of cyber attacks exploiting the vulnerabilities and weak security management of smart home IoT devices has highlighted the urgency and challenges of designing efficient mechanisms for detecting, analyzing, and mitigating security threats against them. In this dissertation, I seek to address the security and privacy issues of smart home IoT devices from the perspectives of traffic measurement, pattern recognition, and security applications. I first propose an efficient multidimensional smart home network traffic measurement framework, which enables me to deeply understand the smart home IoT ecosystem and detect various vulnerabilities and flaws. I further design intelligent schemes to efficiently extract security-related IoT device event and user activity patterns from encrypted smart home network traffic. Based on the knowledge of how smart homes operate, different systems for securing smart home networks are proposed and implemented, including abnormal network traffic detection across multiple IoT networking protocol layers, smart home safety monitoring with extracted spatial information about IoT device events, and system-level IoT vulnerability analysis and network hardening.
Date Created
2023
Agent

Towards Scalable Security State Management in The Cloud

Description
Modern data center networks require efficient and scalable security analysis approaches that can analyze the relationships among vulnerabilities. Utilizing Attack Representation Methods (ARMs) and Attack Graphs (AGs) enables the security administrator to understand the cloud network’s current security situation at a low level. However, the AG approach suffers from scalability challenges. It relies on the connectivity between the services and the vulnerabilities associated with the services to allow the system administrator to realize its security state. In addition, the security policies created by the administrator can conflict with one another, and such conflicts are often detected only in the data plane of the Software Defined Networking (SDN) system. Such conflicts can cause security breaches and increase the flow rule processing delay. This dissertation addresses these challenges with novel solutions to tackle the scalability issue of Attack Graphs and detect security policy conflicts in the application plane before they are transmitted into the data plane for final installation. Specifically, it introduces a segmentation-based scalable security state (S3) framework for the cloud network. This framework utilizes the well-known divide-and-conquer approach to divide the large network region into smaller, manageable segments. It follows a well-known segmentation approach derived from the K-means clustering algorithm to partition the system into segments based on the similarity between the services. Furthermore, the dissertation presents unified intent rules that abstract the network administration from the underlying network controller’s format. It develops a networking service solution that uses a bounded formal model for network service compliance checking, which significantly reduces the complexity of flow rule conflict checking at the data plane level. The solution can be extended from a single SDN domain to multiple SDN domains and hybrid networks by applying network service function chaining (SFC) for inter-domain policy management.
Date Created
2023
Agent

Data-Efficient Graph Learning

Description
Graph-structured data, ranging from social networks to financial transaction networks, from citation networks to gene regulatory networks, have been widely used for modeling a myriad of real-world systems. As a prevailing model architecture for graph-structured data, graph neural networks (GNNs) have drawn much attention in both academic and industrial communities in the past decades. Despite their success in different graph learning tasks, existing methods usually rely on learning from "big" data, requiring a large amount of labeled data for model training. However, real-world graphs commonly come with only "small" labeled data, because data annotation and labeling on graphs is time- and resource-consuming. Therefore, it is imperative to investigate graph machine learning (Graph ML) with low-cost human supervision for low-resource settings where limited or even no labeled data is available. This dissertation investigates a new research field, Data-Efficient Graph Learning, which aims to push forward the performance boundary of Graph ML models with different kinds of low-cost supervision signals. To achieve this goal, a series of studies are conducted for solving different data-efficient graph learning problems, including graph few-shot learning, graph weakly-supervised learning, and graph self-supervised learning.
Date Created
2023
Agent

Trustworthy IoT Sensing-as-a-Service

Description
The Internet-of-Things (IoT) paradigm is reshaping the ways to interact with the physical space. Many emerging IoT applications need to acquire, process, gain insights from, and act upon the massive amount of data continuously produced by ubiquitous IoT sensors. It is nevertheless technically challenging and economically prohibitive for each IoT application to deploy and maintain a dedicated large-scale sensor network over wide, distributed geographic areas. Built upon the Sensing-as-a-Service paradigm, cloud-sensing service providers are emerging to provide heterogeneous sensing data to various IoT applications with a shared sensing substrate. Cyber threats are among the biggest obstacles to the faster development of cloud-sensing services. This dissertation presents novel solutions to achieve trustworthy IoT sensing-as-a-service. Chapter 1 introduces the cloud-sensing system architecture and the outline of this dissertation. Chapter 2 presents MagAuth, a secure and usable two-factor authentication scheme that explores commercial off-the-shelf wrist wearables with magnetic strap bands to enhance the security and usability of password-based authentication for touchscreen IoT devices. Chapter 3 presents SmartMagnet, a novel scheme that combines smartphones and cheap magnets to achieve proximity-based access control for IoT devices. Chapter 4 proposes SpecKriging, a new spatial-interpolation technique based on graph neural networks for secure cooperative spectrum sensing, which is an important application of cloud-sensing systems. Chapter 5 proposes a trustworthy multi-transmitter localization scheme based on SpecKriging. Chapter 6 discusses future work.
Date Created
2022
Agent

Identifying Sources of Anomalies in Complex Networks

Description
The problem of monitoring complex networks for the detection of anomalous behavior is well known. Sensors are usually deployed to monitor these networks for anomalies, and Sensor Placement Optimization (SPO) is the problem of determining where these sensors should be placed (deployed) in the network. Prior works have utilized the well-known Set Cover formulation to determine the locations where sensors should be placed in the network so that anomalies can be effectively detected. However, such works cannot address the problem when the objective is not only to detect the presence of anomalies but also to distinguish the source(s) of the detected anomalies, i.e., to uniquely monitor the network. In this dissertation, I attempt to fill this gap by utilizing the mathematical concept of Identifying Codes and illustrating how it not only overcomes the aforementioned limitation but also, together with its variants, can be utilized to monitor complex networks modeled from multiple domains. Over the course of this dissertation, I make key contributions that further enhance the efficacy and applicability of Identifying Codes as a monitoring strategy. First, I show how Identifying Codes are superior not only to the Set Cover formulation but also to standard graph centrality metrics for the purpose of uniquely monitoring complex networks. Second, I study novel problems such as the budget-constrained Identifying Code, the scalable Identifying Code, and the robust Identifying Code, and present algorithms and results for the respective problems. Third, I present useful Identifying Code results for restricted graph classes such as Unit Interval Bigraphs and Unit Disc Bigraphs. Finally, I show the universality of Identifying Codes by applying them to multiple domains.
Date Created
2022
Agent
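
The difference between covering a network and uniquely monitoring it, as described in the abstract above, can be illustrated with a minimal Python sketch. It uses the standard definition of an identifying code: a set of sensor vertices such that every vertex has a non-empty and distinct set of sensors in its closed neighborhood. The function names and the four-node path graph are hypothetical illustrations, not code or data from the dissertation.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set

# Undirected graph as an adjacency map (hypothetical representation).
Graph = Dict[Hashable, Set[Hashable]]


def signatures(graph: Graph, sensors: Iterable[Hashable]) -> Dict[Hashable, FrozenSet]:
    """Map each vertex to the sensors inside its closed neighborhood."""
    sensor_set = set(sensors)
    return {v: frozenset((graph[v] | {v}) & sensor_set) for v in graph}


def is_identifying_code(graph: Graph, sensors: Iterable[Hashable]) -> bool:
    """True if every signature is non-empty and no two vertices share one."""
    sigs = list(signatures(graph, sensors).values())
    every_vertex_covered = all(sigs)  # an empty frozenset is falsy
    all_distinct = len(set(sigs)) == len(sigs)
    return every_vertex_covered and all_distinct


# Path graph 1 - 2 - 3 - 4 (illustrative only).
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

covers_only = is_identifying_code(path, {2, 3})     # covers all, cannot tell 2 from 3
identifies = is_identifying_code(path, {1, 2, 3})   # every vertex has a unique signature
```

In this toy graph, sensors at vertices 2 and 3 cover every vertex, which satisfies a Set Cover style requirement, yet vertices 2 and 3 trigger the same pair of sensors and so cannot be told apart. Adding a sensor at vertex 1 gives every vertex a distinct signature.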

Risk-based Network Vulnerability Prioritization

Description
This dissertation investigates the problem of efficiently and effectively prioritizing vulnerability risk in a computer networking system. Vulnerability prioritization is one of the most challenging issues in vulnerability management, and it affects the allocation of preventive and defensive resources in a computer networking system. Due to the large number of identified vulnerabilities, it is very challenging to remediate them all in a timely fashion. Thus, an efficient and effective vulnerability prioritization framework is required. To deal with this challenge, this dissertation proposes a novel risk-based vulnerability prioritization framework that integrates recent artificial intelligence techniques (i.e., neuro-symbolic computing and logic reasoning). The proposed work enhances the vulnerability management process by prioritizing high-risk vulnerabilities, refining the initial risk assessment with the network constraints. This dissertation is organized as follows. The first part presents an overview of the proposed risk-based vulnerability prioritization framework, which contains two stages. The second part investigates vulnerability risk features in a computer networking system. The third part proposes the first stage of this framework, a vulnerability risk assessment model. The proposed assessment model captures the pattern of vulnerability risk features to provide a more comprehensive risk assessment for a vulnerability. The fourth part proposes the second stage of this framework, a vulnerability prioritization reasoning engine. This reasoning engine derives network constraints from interactions between vulnerabilities and network environment elements based on network and system setups. The proposed framework assesses a vulnerability in a computer networking system based on its actual security impact by refining the initial risk assessment with the network constraints.
Date Created
2022
Agent

Connected and Automated Mobility Modeling on Layered Transportation Networks: Cross-Resolution Architecture of System Estimation and Optimization

Description
The emerging multimodal mobility as a service (MaaS) and connected and automated mobility (CAM) are expected to improve individual travel experience and overall transportation system performance in various aspects, such as convenience, safety, and reliability. There have been extensive efforts in the literature devoted to enhancing existing methodologies and tools, and developing new ones, to investigate the impacts and potential of CAM systems. Due to the hierarchical nature of CAM systems and the intrinsically correlated human factors and physical infrastructures at various resolutions, simply combining components across different levels into a single model may be practically infeasible and computationally prohibitive in the operation and decision stages. One of the greatest challenges in existing studies is to construct a theoretically sound and computationally efficient architecture such that CAM system modeling can be performed in an inherently consistent cross-resolution manner. This research aims to contribute to the modeling of CAM systems on layered transportation networks, with a special focus on the following three aspects: (1) layered CAM system architecture with tight network and modeling consistency, in which different levels of tasks can be efficiently performed at dedicated layers; (2) cross-resolution traffic state estimation in CAM systems using heterogeneous observations; and (3) integrated city logistics operation optimization in CAM for improving system performance.
Date Created
2022
Agent

Defeating Attackers by Bridging the Gaps Between Security and Intelligence

Description
The omnipresent data, growing number of network devices, and evolving attack techniques have been challenging organizations’ security defenses over the past decade. With humongous volumes of logs generated by those network devices, looking for patterns of malicious activities and identifying them in time is growing beyond the capabilities of their defense systems. Deep Learning, a subset of Machine Learning (ML) and Artificial Intelligence (AI), fills this gap with its ability to learn from huge amounts of data and improve its performance as the data it learns from increases. In this dissertation, I bring forward security issues pertaining to two top threats that most organizations fear, Advanced Persistent Threat (APT) and Distributed Denial of Service (DDoS), along with deep learning models built towards addressing those security issues. First, I present a deep learning model, APT Detection, capable of detecting anomalous activities in a system. Evaluation of this model demonstrates how it can contribute to early detection of an APT attack with an Area Under the Curve (AUC) of up to 91% on a Receiver Operating Characteristic (ROC) curve. Second, I present DAPT2020, a first-of-its-kind dataset capturing an APT attack exploiting web and system vulnerabilities in an emulated organization’s production network. Evaluation of the dataset using well-known machine learning models demonstrates the need for better deep learning models to detect APT attacks. I then present DAPT2021, a semi-synthetic dataset capturing an APT attack exploiting human vulnerabilities, alongside two less skilled attacks. By emulating the normal behavior of the employees in a set target organization, DAPT2021 has been created to enable researchers to study the causations and correlations among the captured data, information that is much needed to detect an underlying threat early. Finally, I present a distributed defense framework, SmartDefense, that can detect and mitigate over 90% of DDoS traffic at the source and over 97.5% of the remaining DDoS traffic at the Internet Service Provider’s (ISP’s) edge network. Evaluation of this work shows how, by using attributes sent by the customer edge network, SmartDefense can further help ISPs prevent up to 51.95% of the DDoS traffic from going to the destination.
Date Created
2022
Agent

Learning from the Data Heterogeneity for Data Imputation

Description
Data mining, also known as big data analysis, has been identified as a critical and challenging process for a variety of applications in real-world problems. Numerous datasets are collected and generated every day to store information. The rise in data volume and the number of data modalities has increased the demand for data mining methods and strategies for finding anomalies, patterns, and correlations within large data sets to predict outcomes. Effective machine learning methods are widely adopted to build the data mining pipeline for various purposes like business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The major challenges for effectively and efficiently mining big data include (1) data heterogeneity and (2) missing data. Heterogeneity is a natural characteristic of big data, as the data is typically collected from different sources with diverse formats. Missing values are the most common issue faced in heterogeneous data analysis, and they result from a variety of factors, including the data collection process, user initiatives, erroneous data entries, and so on. In response to these challenges, three main research directions with application scenarios have been investigated in this thesis: (1) mining and formulating heterogeneous data, (2) missing value imputation strategies in various application scenarios in both offline and online manners, and (3) missing value imputation for multi-modality data. Multiple strategies with theoretical analysis are presented, and the evaluation of the effectiveness of the proposed algorithms compared with state-of-the-art methods is discussed.
Date Created
2021
Agent

Learning Causality with Networked Observational Data

Description
This dissertation considers the question of how convenient access to copious networked observational data impacts our ability to learn causal knowledge. It investigates in what ways learning causality from such data is different from, or the same as, traditional causal inference, which often deals with small-scale i.i.d. data collected from randomized controlled trials. For example, how can we exploit network information for a series of tasks in the area of learning causality? To answer this question, the dissertation develops a suite of novel causal learning algorithms that offer actionable insights for a series of causal inference tasks with networked observational data. The work aims to benefit real-world decision-making across a variety of highly influential applications. The first part of this dissertation investigates the task of inferring individual-level causal effects from networked observational data. First, it presents a representation balancing-based framework for handling the influence of hidden confounders to achieve accurate estimates of causal effects. Second, it extends the framework with an adversarial learning approach to properly combine two types of existing heuristics: representation balancing and treatment prediction. The second part of the dissertation describes a framework for counterfactual evaluation of treatment assignment policies with networked observational data. A novel framework that captures patterns of hidden confounders is developed to provide more informative input for downstream counterfactual evaluation methods. The third part presents a framework for debiasing two-dimensional grid-based e-commerce search with observational search log data where there is an implicit network connecting neighboring products in a search result page. A novel inverse propensity scoring framework that models user behavior patterns for two-dimensional display in e-commerce websites is developed, which aims to optimize online performance of ranking algorithms with offline log data.
Date Created
2021
Agent
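
The general idea behind inverse propensity scoring on a two-dimensional result grid, mentioned in the abstract above, can be sketched in a few lines of Python. This is a textbook-style estimator under stated assumptions, not the dissertation's framework: the per-cell examination propensities, the log format, and the function name `ips_click_estimate` are all hypothetical, and the propensity values are made up for illustration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]              # (row, column) in the result grid
LogEntry = Tuple[int, int, int]     # (row, column, clicked as 0 or 1)


def ips_click_estimate(logs: List[LogEntry], propensity: Dict[Cell, float]) -> float:
    """Average of clicks re-weighted by 1 / examination propensity of the cell.

    Clicks in rarely examined cells count more, which offsets the position
    bias present in the logged data.
    """
    if not logs:
        raise ValueError("logs must not be empty")
    total = 0.0
    for row, col, clicked in logs:
        p = propensity[(row, col)]
        if p <= 0.0:
            raise ValueError("propensities must be positive")
        total += clicked / p
    return total / len(logs)


# Assumed propensities: the top-left cell is always examined,
# its right-hand neighbor only half of the time.
example_propensity = {(0, 0): 1.0, (0, 1): 0.5}
example_logs = [(0, 0, 1), (0, 1, 1), (0, 1, 0), (0, 0, 0)]

estimate = ips_click_estimate(example_logs, example_propensity)
naive_rate = sum(c for _, _, c in example_logs) / len(example_logs)
```

Here the naive click rate is 0.5, while the re-weighted estimate is 0.75, because the click observed in the less frequently examined cell is counted twice as heavily.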