Search Content

Application of a temporal database framework for processing event queries

Description

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query…

This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event Stream Processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension. However, operators developed in this context cannot be directly applied to event streams. The research extends a preexisting relational framework for event stream processing to support temporal queries. The language features and formal semantic extensions to extend the relational framework are identified. The extended framework supports continuous, step-wise evaluation of temporal queries. The incremental evaluation of TEQL operators is formalized to avoid re-computation of previous results. The research includes the development of a prototype that supports the integrated event and temporal query processing framework, with support for incremental evaluation and materialization of intermediate results. TEQL enables reporting temporal data in the output, direct specification of conditions over timestamps, and specification of temporal relational operators. Through the integration of temporal database operators with event languages, a new class of temporal queries is made possible for querying event streams. New features include semantic aggregation, extraction of temporal patterns using set operators, and a more accurate specification of event co-occurrence.

ContributorsShiva, Foruhar Ali (Author) / Urban, Susan D (Thesis advisor) / Chen, Yi (Thesis advisor) / Davulcu, Hasan (Committee member) / Sarjoughian, Hessam S. (Committee member) / Arizona State University (Publisher)

Created2012

Automated event-driven security assessment

Description

With the growth of IT products and sophisticated software in various operating systems, I observe that security risks in systems are skyrocketing constantly. Consequently, Security Assessment is now considered as one of primary security mechanisms to measure assurance of systems since systems that are not compliant with security requirements may…

With the growth of IT products and sophisticated software in various operating systems, I observe that security risks in systems are skyrocketing constantly. Consequently, Security Assessment is now considered as one of primary security mechanisms to measure assurance of systems since systems that are not compliant with security requirements may lead adversaries to access critical information by circumventing security practices. In order to ensure security, considerable efforts have been spent to develop security regulations by facilitating security best-practices. Applying shared security standards to the system is critical to understand vulnerabilities and prevent well-known threats from exploiting vulnerabilities. However, many end users tend to change configurations of their systems without paying attention to the security. Hence, it is not straightforward to protect systems from being changed by unconscious users in a timely manner. Detecting the installation of harmful applications is not sufficient since attackers may exploit risky software as well as commonly used software. In addition, checking the assurance of security configurations periodically is disadvantageous in terms of time and cost due to zero-day attacks and the timing attacks that can leverage the window between each security checks. Therefore, event-driven monitoring approach is critical to continuously assess security of a target system without ignoring a particular window between security checks and lessen the burden of exhausted task to inspect the entire configurations in the system. Furthermore, the system should be able to generate a vulnerability report for any change initiated by a user if such changes refer to the requirements in the standards and turn out to be vulnerable. Assessing various systems in distributed environments also requires to consistently applying standards to each environment. Such a uniformed consistent assessment is important because the way of assessment approach for detecting security vulnerabilities may vary across applications and operating systems. In this thesis, I introduce an automated event-driven security assessment framework to overcome and accommodate the aforementioned issues. I also discuss the implementation details that are based on the commercial-off-the-self technologies and testbed being established to evaluate approach. Besides, I describe evaluation results that demonstrate the effectiveness and practicality of the approaches.

ContributorsSeo, Jeong-Jin (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)

Created2014

A framework for extended acquisition and uniform representation of forensic email evidence

Description

The digital forensics community has neglected email forensics as a process, despite the fact that email remains an important tool in the commission of crime. Current forensic practices focus mostly on that of disk forensics, while email forensics is left as an analysis task stemming from that practice. As there…

The digital forensics community has neglected email forensics as a process, despite the fact that email remains an important tool in the commission of crime. Current forensic practices focus mostly on that of disk forensics, while email forensics is left as an analysis task stemming from that practice. As there is no well-defined process to be used for email forensics the comprehensiveness, extensibility of tools, uniformity of evidence, usefulness in collaborative/distributed environments, and consistency of investigations are hindered. At present, there exists little support for discovering, acquiring, and representing web-based email, despite its widespread use. To remedy this, a systematic process which includes discovering, acquiring, and representing web-based email for email forensics which is integrated into the normal forensic analysis workflow, and which accommodates the distinct characteristics of email evidence will be presented. This process focuses on detecting the presence of non-obvious artifacts related to email accounts, retrieving the data from the service provider, and representing email in a well-structured format based on existing standards. As a result, developers and organizations can collaboratively create and use analysis tools that can analyze email evidence from any source in the same fashion and the examiner can access additional data relevant to their forensic cases. Following, an extensible framework implementing this novel process-driven approach has been implemented in an attempt to address the problems of comprehensiveness, extensibility, uniformity, collaboration/distribution, and consistency within forensic investigations involving email evidence.

ContributorsPaglierani, Justin W (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Santanam, Raghu T (Committee member) / Arizona State University (Publisher)

Created2013

Representing, reasoning and answering questions about biological pathways various applications

Description

Biological organisms are made up of cells containing numerous interconnected biochemical processes. Diseases occur when normal functionality of these processes is disrupted, manifesting as disease symptoms. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for activities including diagnosing diseases and…

Biological organisms are made up of cells containing numerous interconnected biochemical processes. Diseases occur when normal functionality of these processes is disrupted, manifesting as disease symptoms. Thus, understanding these biochemical processes and their interrelationships is a primary task in biomedical research and a prerequisite for activities including diagnosing diseases and drug development. Scientists studying these interconnected processes have identified various pathways involved in drug metabolism, diseases, and signal transduction, etc. High-throughput technologies, new algorithms and speed improvements over the last decade have resulted in deeper knowledge about biological systems, leading to more refined pathways. Such pathways tend to be large and complex, making it difficult for an individual to remember all aspects. Thus, computer models are needed to represent and analyze them. The refinement activity itself requires reasoning with a pathway model by posing queries against it and comparing the results against the real biological system. Many existing models focus on structural and/or factoid questions, relying on surface-level information. These are generally not the kind of questions that a biologist may ask someone to test their understanding of biological processes. Examples of questions requiring understanding of biological processes are available in introductory college level biology text books. Such questions serve as a model for the question answering system developed in this thesis. Thus, the main goal of this thesis is to develop a system that allows the encoding of knowledge about biological pathways to answer questions demonstrating understanding of the pathways. To that end, a language is developed to specify a pathway and pose questions against it. Some existing tools are modified and used to accomplish this goal. The utility of the framework developed in this thesis is illustrated with applications in the biological domain. Finally, the question answering system is used in real world applications by extracting pathway knowledge from text and answering questions related to drug development.

ContributorsAnwar, Saadat (Author) / Baral, Chitta (Thesis advisor) / Inoue, Katsumi (Committee member) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Lee, Joohyung (Committee member) / Arizona State University (Publisher)

Created2014

Unsupervised Bayesian data cleaning techniques for structured data

Description

Recent efforts in data cleaning have focused mostly on problems like data deduplication, record matching, and data standardization; few of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which…

Recent efforts in data cleaning have focused mostly on problems like data deduplication, record matching, and data standardization; few of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this thesis, I provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. I thus avoid the necessity for a domain expert or master data. I also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. A Map-Reduce architecture to perform this computation in a distributed manner is also shown. I evaluate these methods over both synthetic and real data.

ContributorsDe, Sushovan (Author) / Kambhampati, Subbarao (Thesis advisor) / Chen, Yi (Committee member) / Candan, K. Selcuk (Committee member) / Liu, Huan (Committee member) / Arizona State University (Publisher)

Created2014

Discovering and using patterns for countering security challenges

Description

Most existing security decisions for both defending and attacking are made based on some deterministic approaches that only give binary answers. Even though these approaches can achieve low false positive rate for decision making, they have high false negative rates due to the lack of accommodations to new attack methods…

Most existing security decisions for both defending and attacking are made based on some deterministic approaches that only give binary answers. Even though these approaches can achieve low false positive rate for decision making, they have high false negative rates due to the lack of accommodations to new attack methods and defense techniques. In this dissertation, I study how to discover and use patterns with uncertainty and randomness to counter security challenges. By extracting and modeling patterns in security events, I am able to handle previously unknown security events with quantified confidence, rather than simply making binary decisions. In particular, I cope with the following four real-world security challenges by modeling and analyzing with pattern-based approaches: 1) How to detect and attribute previously unknown shellcode? I propose instruction sequence abstraction that extracts coarse-grained patterns from an instruction sequence and use Markov chain-based model and support vector machines to detect and attribute shellcode; 2) How to safely mitigate routing attacks in mobile ad hoc networks? I identify routing table change patterns caused by attacks, propose an extended Dempster-Shafer theory to measure the risk of such changes, and use a risk-aware response mechanism to mitigate routing attacks; 3) How to model, understand, and guess human-chosen picture passwords? I analyze collected human-chosen picture passwords, propose selection function that models patterns in password selection, and design two algorithms to optimize password guessing paths; and 4) How to identify influential figures and events in underground social networks? I analyze collected underground social network data, identify user interaction patterns, and propose a suite of measures for systematically discovering and mining adversarial evidence. By solving these four problems, I demonstrate that discovering and using patterns could help deal with challenges in computer security, network security, human-computer interaction security, and social network security.

ContributorsZhao, Ziming (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Huang, Dijiang (Committee member) / Santanam, Raghu (Committee member) / Arizona State University (Publisher)

Created2014

The effect of image preprocessing techniques and varying JPEG quality on the identifiability of digital image splicing forgery

Description

Splicing of digital images is a powerful form of tampering which transports regions of an image to create a composite image. When used as an artistic tool, this practice is harmless but when these composite images can be used to create political associations or are submitted as evidence in the…

Splicing of digital images is a powerful form of tampering which transports regions of an image to create a composite image. When used as an artistic tool, this practice is harmless but when these composite images can be used to create political associations or are submitted as evidence in the judicial system they become more impactful. In these cases, distinction between an authentic image and a tampered image can become important.

Many proposed approaches to image splicing detection follow the model of extracting features from an authentic and tampered dataset and then classifying them using machine learning with the goal of optimizing classification accuracy. This thesis approaches splicing detection from a slightly different perspective by choosing a modern splicing detection framework and examining a variety of preprocessing techniques along with their effect on classification accuracy. Preprocessing techniques explored include Joint Picture Experts Group (JPEG) file type block line blurring, image level blurring, and image level sharpening. Attention is also paid to preprocessing images adaptively based on the amount of higher frequency content they contain.

This thesis also recognizes an identified problem with using a popular tampering evaluation dataset where a mismatch in the number of JPEG processing iterations between the authentic and tampered set creates an unfair statistical bias, leading to higher detection rates. Many modern approaches do not acknowledge this issue but this thesis applies a quality factor equalization technique to reduce this bias. Additionally, this thesis artificially inserts a mismatch in JPEG processing iterations by varying amounts to determine its effect on detection rates.

ContributorsGubrud, Aaron (Author) / Li, Baoxin (Thesis advisor) / Candan, Kasim (Committee member) / Kadi, Zafer (Committee member) / Arizona State University (Publisher)

Created2015

Efficient processing of skyline queries on static data sources, data streams and incomplete datasets

Description

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.

An assumption commonly made by many skyline algorithms is that a skyline query is applied…

Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems.

An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. On the other hand, most data stream techniques focus on single stream skyline queries, thus rendering them unsuitable for skyline-join queries.

Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. There exists a definition of dominance for incomplete data, but this leads to undesirable consequences such as non-transitive and cyclic dominance relations both of which are detrimental to skyline processing.

Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets.

ContributorsNagendra, Mithila (Author) / Candan, Kasim Selcuk (Thesis advisor) / Chen, Yi (Committee member) / Davulcu, Hasan (Committee member) / Silva, Yasin N. (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)

Created2014

Systematic policy analysis and management

Description

With the advent of technologies such as web services, service oriented architecture and cloud computing, modern organizations have to deal with policies such as Firewall policies to secure the networks, XACML (eXtensible Access Control Markup Language) policies for controlling the access to critical information as well as resources. Management of…

With the advent of technologies such as web services, service oriented architecture and cloud computing, modern organizations have to deal with policies such as Firewall policies to secure the networks, XACML (eXtensible Access Control Markup Language) policies for controlling the access to critical information as well as resources. Management of these policies is an extremely important task in order to avoid unintended security leakages via illegal accesses, while maintaining proper access to services for legitimate users. Managing and maintaining access control policies manually over long period of time is an error prone task due to their inherent complex nature. Existing tools and mechanisms for policy management use different approaches for different types of policies. This research thesis represents a generic framework to provide an unified approach for policy analysis and management of different types of policies. Generic approach captures the common semantics and structure of different access control policies with the notion of policy ontology. Policy ontology representation is then utilized for effectively analyzing and managing the policies. This thesis also discusses a proof-of-concept implementation of the proposed generic framework and demonstrates how efficiently this unified approach can be used for analysis and management of different types of access control policies.

ContributorsKulkarni, Ketan (Author) / Ahn, Gail-Joon (Thesis advisor) / Yau, Stephen S. (Committee member) / Huang, Dijiang (Committee member) / Arizona State University (Publisher)

Created2011

Time efficient and quality effective K nearest neighbor search in high dimension space

Description

K-Nearest-Neighbors (KNN) search is a fundamental problem in many application domains such as database and data mining, information retrieval, machine learning, pattern recognition and plagiarism detection. Locality sensitive hash (LSH) is so far the most practical approximate KNN search algorithm for high dimensional data. Algorithms such as Multi-Probe LSH and…

K-Nearest-Neighbors (KNN) search is a fundamental problem in many application domains such as database and data mining, information retrieval, machine learning, pattern recognition and plagiarism detection. Locality sensitive hash (LSH) is so far the most practical approximate KNN search algorithm for high dimensional data. Algorithms such as Multi-Probe LSH and LSH-Forest improve upon the basic LSH algorithm by varying hash bucket size dynamically at query time, so these two algorithms can answer different KNN queries adaptively. However, these two algorithms need a data access post-processing step after candidates' collection in order to get the final answer to the KNN query. In this thesis, Multi-Probe LSH with data access post-processing (Multi-Probe LSH with DAPP) algorithm and LSH-Forest with data access post-processing (LSH-Forest with DAPP) algorithm are improved by replacing the costly data access post-processing (DAPP) step with a much faster histogram-based post-processing (HBPP). Two HBPP algorithms: LSH-Forest with HBPP and Multi- Probe LSH with HBPP are presented in this thesis, both of them achieve the three goals for KNN search in large scale high dimensional data set: high search quality, high time efficiency, high space efficiency. None of the previous KNN algorithms can achieve all three goals. More specifically, it is shown that HBPP algorithms can always achieve high search quality (as good as LSH-Forest with DAPP and Multi-Probe LSH with DAPP) with much less time cost (one to several orders of magnitude speedup) and same memory usage. It is also shown that with almost same time cost and memory usage, HBPP algorithms can always achieve better search quality than LSH-Forest with random pick (LSH-Forest with RP) and Multi-Probe LSH with random pick (Multi-Probe LSH with RP). Moreover, to achieve a very high search quality, Multi-Probe with HBPP is always a better choice than LSH-Forest with HBPP, regardless of the distribution, size and dimension number of the data set.

ContributorsYu, Renwei (Author) / Candan, Kasim S (Thesis advisor) / Sapino, Maria L (Committee member) / Chen, Yi (Committee member) / Sundaram, Hari (Committee member) / Arizona State University (Publisher)

Created2011