Search Content

The effect of image preprocessing techniques and varying JPEG quality on the identifiability of digital image splicing forgery

Description

Splicing of digital images is a powerful form of tampering which transports regions of an image to create a composite image. When used as an artistic tool, this practice is harmless, but when composite images are used to create political associations or are submitted as evidence in the judicial system, they become more impactful. In these cases, the distinction between an authentic image and a tampered image can become important.

Many proposed approaches to image splicing detection follow the model of extracting features from an authentic and a tampered dataset and then classifying them using machine learning, with the goal of optimizing classification accuracy. This thesis approaches splicing detection from a slightly different perspective by choosing a modern splicing detection framework and examining a variety of preprocessing techniques along with their effect on classification accuracy. Preprocessing techniques explored include Joint Photographic Experts Group (JPEG) block line blurring, image level blurring, and image level sharpening. Attention is also paid to preprocessing images adaptively based on the amount of higher frequency content they contain.
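The three preprocessing families named above can be illustrated with a minimal pure-Python sketch. This is not the thesis's implementation: the 3x3 mean filter, the unsharp-masking formula, the 8-pixel block size, and the function names `box_blur`, `sharpen`, and `blur_block_lines` are all assumptions chosen for illustration, and images are represented as lists of rows of grayscale values.

```python
def box_blur(img):
    """Image level blurring: 3x3 mean filter with replicated edges (assumed kernel)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index at the border
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index at the border
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def sharpen(img, amount=1.0):
    """Image level sharpening via unsharp masking, clipped to [0, 255]."""
    blurred = box_blur(img)
    return [
        [min(255.0, max(0.0, p + amount * (p - b))) for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]


def blur_block_lines(img, block=8):
    """Blur only pixels lying on the borders of block x block tiles.

    Assumption: "block line blurring" is read here as smoothing the
    8x8 JPEG block boundaries and leaving block interiors untouched.
    """
    blurred = box_blur(img)
    edge = (0, block - 1)
    return [
        [
            blurred[y][x] if (x % block in edge or y % block in edge) else float(img[y][x])
            for x in range(len(img[0]))
        ]
        for y in range(len(img))
    ]
```

In a pipeline of the kind the abstract describes, one of these functions would be applied to every image before feature extraction, so that classification accuracy can be compared across preprocessing choices.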

This thesis also recognizes a known problem with a popular tampering evaluation dataset, in which a mismatch in the number of JPEG processing iterations between the authentic and tampered sets creates an unfair statistical bias, leading to higher detection rates. Many modern approaches do not acknowledge this issue, but this thesis applies a quality factor equalization technique to reduce the bias. Additionally, this thesis artificially inserts a mismatch in JPEG processing iterations, in varying amounts, to determine its effect on detection rates.

Contributors: Gubrud, Aaron (Author) / Li, Baoxin (Thesis advisor) / Candan, Kasim (Committee member) / Kadi, Zafer (Committee member) / Arizona State University (Publisher)

Created: 2015

A Benchmark for Automated Fact Checking with Knowledge Bases

Description

The need for automated (computational) fact checking has grown substantially in recent times due to the high volume of false information and the limited workforce of human fact checkers. This need has spawned research and new developments in the field and has produced many different systems and approaches to this complex problem. This paper not only explains the most popular methods currently in use, but also provides experimental results comparing two different systems, a replication of the results from their respective papers, and an annotated dataset of test sentences for use with these systems.

Contributors: Rosenkilde, Trevor Curtis (Author) / Papotti, Paolo (Thesis director) / Candan, Kasim (Committee member) / Computer Science and Engineering Program (Contributor) / Barrett, The Honors College (Contributor)

Created: 2017-12

Understanding legacy workflows through runtime trace analysis

Description

When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due to the accessibility of dynamic languages. Unfortunately, ad-hoc workflows lack the structured description provided by specialized management systems, which makes their maintenance and reuse difficult and motivates the need for analysis methods. The analysis of ad-hoc workflows using compiler techniques does not address dynamic languages: a program has so few constraints that its behavior cannot be predicted. In contrast, workflow provenance tracking has had success using run-time techniques to record data. The aim of this work is to develop a new analysis method for extracting workflow structure at run-time, thus avoiding issues with dynamics.

The method captures the dataflow of an ad-hoc workflow through its execution and abstracts it with a process for simplifying repetition. An instrumentation system first processes the workflow to produce an instrumented version, capable of logging events, which is then executed on an input to produce a trace. The trace undergoes dataflow construction to produce a provenance graph. The dataflow is examined for equivalent regions, which are collected into a single unit. The workflow is thus characterized in terms of its treatment of an input. Unlike other methods, a run-time approach characterizes the workflow's actual behavior, including elements which static analysis cannot predict (for example, code dynamically evaluated based on input parameters). This also enables the characterization of dataflow through external tools.
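The pipeline described above (instrument, execute, build a dataflow graph from the trace, collapse repetition) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the thesis's system: the `Value` wrapper, the `traced` decorator, and the rule that two steps are "equivalent" when they call the same function are all hypothetical choices made here.

```python
import itertools

_next_id = itertools.count()  # source of unique ids for traced values


class Value:
    """A datum tagged with a unique id so trace events can refer to it."""

    def __init__(self, data):
        self.data = data
        self.vid = next(_next_id)


def traced(trace):
    """Instrumentation: wrap a function so every call logs an event to `trace`."""
    def decorate(fn):
        def wrapper(*args):
            out = Value(fn(*(a.data for a in args)))
            trace.append({"op": fn.__name__,
                          "inputs": [a.vid for a in args],
                          "output": out.vid})
            return out
        return wrapper
    return decorate


def build_graph(trace):
    """Dataflow construction: edge (i, j) means event j consumed event i's output."""
    producer = {ev["output"]: i for i, ev in enumerate(trace)}
    return [(producer[v], i)
            for i, ev in enumerate(trace)
            for v in ev["inputs"] if v in producer]


def collapse(trace, edges):
    """Abstraction (assumed rule): merge all events that call the same function."""
    return {(trace[a]["op"], trace[b]["op"]) for a, b in edges}


def demo():
    """Run a tiny hypothetical workflow and return its result and both graphs."""
    trace = []

    @traced(trace)
    def double(x):
        return 2 * x

    @traced(trace)
    def add(x, y):
        return x + y

    a = double(Value(3))
    b = double(a)
    c = add(a, b)
    edges = build_graph(trace)
    return c.data, edges, collapse(trace, edges)
```

Calling `demo()` yields the event-level edges between the three recorded calls and a collapsed view in which the two `double` calls appear as a single node. Because the graph is built from what actually ran, dynamically evaluated code would be recorded the same way as any other call.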

The contributions of this work are: a run-time method for recording a provenance graph from an ad-hoc Python workflow, and a method to analyze the structure of a workflow from provenance. Methods are implemented in Python and are demonstrated on real world Python workflows. These contributions enable users to derive graph structure from workflows. Empowered by a graphical view, users can better understand a legacy workflow. This makes the wealth of legacy ad-hoc workflows accessible, enabling workflow reuse instead of investing time and resources into creating a workflow.

Contributors: Acűna, Ruben (Author) / Bazzi, Rida (Thesis advisor) / Lacroix, Zoé (Thesis advisor) / Candan, Kasim (Committee member) / Arizona State University (Publisher)

Created: 2015

Predicting demographic and financial attributes in a bank marketing dataset

Description

Banking institutions employ several marketing strategies to maximize new customer acquisition as well as current customer retention. Telemarketing is one such approach, in which individual customers are contacted by bank representatives with offers. These telemarketing strategies can be improved in combination with data mining techniques that allow customer information and interests to be predicted. In this thesis, bank telemarketing data from a Portuguese banking institution were analyzed to determine the predictability of several client demographic and financial attributes and to find the most contributing factors for each. Data were preprocessed to ensure quality, and data mining models were then generated for the attributes with logistic regression, support vector machine (SVM), and random forest, using Orange as the data mining tool. Results were analyzed using precision, recall, and F1 score.
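For readers unfamiliar with the three evaluation metrics, the sketch below computes them for a binary attribute from lists of true and predicted labels. It uses the standard definitions; the function name `precision_recall_f1` and the sample labels in the usage note are illustrative assumptions, not data or code from the thesis (which used Orange).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with made-up labels `y_true = [1, 1, 1, 0, 0]` and `y_pred = [1, 1, 0, 1, 0]` there are two true positives, one false positive, and one false negative, so all three metrics equal 2/3.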

Contributors: Ejaz, Samira (Author) / Davulcu, Hasan (Thesis advisor) / Balasooriya, Janaka (Committee member) / Candan, Kasim (Committee member) / Arizona State University (Publisher)

Created: 2016

Privacy preserving service discovery and ranking for multiple user QoS requirements in service-based software systems

Description

Service-based software (SBS) systems are software systems consisting of services based on the service-oriented architecture (SOA). Each service in an SBS system provides partial functionality and collaborates with other services in workflows to provide the functionality required by the system. These services may be developed and/or owned by different entities and physically distributed across the Internet. Compared with traditional software system components, which are usually designed specifically for the target systems and bound tightly, the interfaces of services and their communication protocols are standardized, which allows SBS systems to support late binding and to provide better interoperability, better flexibility in dynamic business logic, and higher fault tolerance. The development process of SBS systems can be divided into three major phases: 1) SBS specification, 2) service discovery and matching, and 3) service composition and workflow execution. This dissertation focuses on the second phase and presents a privacy-preserving service discovery and ranking approach for multiple user QoS requirements. This approach helps service providers register services and service users search for services through public but untrusted service directories, while protecting their privacy against those directories. The service directories can match the registered services with service requests, but do not learn any information about them. Our approach also enforces access control on services during the matching process, which prevents unauthorized users from discovering services. After the service directories match a set of services that satisfy the service users' functionality requirements, the service discovery approach presented in this dissertation further considers service users' QoS requirements in two steps. First, this approach optimizes services' QoS by making tradeoffs among various QoS aspects according to users' QoS requirements and preferences. Second, this approach ranks services based on how well they satisfy users' QoS requirements, to help service users select the most suitable services for developing their SBS systems.
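The final ranking step can be illustrated with a plain weighted-sum score. This sketch deliberately omits the privacy-preserving matching and the QoS optimization step, and it is not the dissertation's algorithm: the satisfaction formula, the requirement tuple layout, and the names `satisfaction` and `rank_services` are assumptions made for illustration, and all QoS values are assumed to be positive numbers.

```python
def satisfaction(value, required, higher_is_better):
    """Degree in [0, 1] to which one QoS value meets its requirement (assumed formula)."""
    ratio = value / required if higher_is_better else required / value
    return min(1.0, ratio)


def rank_services(services, requirements):
    """Order service names from best to worst weighted QoS satisfaction.

    services:     {name: {attribute: value}}
    requirements: {attribute: (required_value, weight, higher_is_better)}
    """
    total_weight = sum(weight for _, weight, _ in requirements.values())

    def score(qos):
        weighted = sum(
            weight * satisfaction(qos[attr], required, hib)
            for attr, (required, weight, hib) in requirements.items()
        )
        return weighted / total_weight

    return sorted(services, key=lambda name: score(services[name]), reverse=True)
```

With hypothetical services `{"A": {"latency_ms": 100, "availability": 0.99}, "B": {"latency_ms": 400, "availability": 0.999}}` and requirements `{"latency_ms": (200, 2.0, False), "availability": (0.995, 1.0, True)}`, service A ranks first because it fully meets the more heavily weighted latency requirement.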

Contributors: Yin, Yin (Author) / Yau, Stephen S. (Thesis advisor) / Candan, Kasim (Committee member) / Dasgupta, Partha (Committee member) / Santanam, Raghu (Committee member) / Arizona State University (Publisher)

Created: 2011
