Many proposed approaches to image splicing detection follow the same model: extract features from an authentic and a tampered dataset, then classify them with machine learning, with the goal of optimizing classification accuracy. This thesis approaches splicing detection from a slightly different perspective: it selects a modern splicing detection framework and examines how a variety of preprocessing techniques affect its classification accuracy. The preprocessing techniques explored include blurring along Joint Photographic Experts Group (JPEG) block lines, image-level blurring, and image-level sharpening. Attention is also paid to preprocessing images adaptively, based on the amount of high-frequency content they contain.
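As a rough illustration of what adaptive preprocessing could look like, the sketch below measures high-frequency content with a simple Laplacian response and then chooses between blurring and sharpening. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names, the Laplacian-based measure, the 3x3 box blur, the unsharp-mask sharpening, the threshold value, and the rule that detailed images are blurred while smooth ones are sharpened.

```python
def high_freq_energy(img):
    """Mean absolute 4-neighbour Laplacian response over interior pixels."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (4 * img[y][x] - img[y - 1][x] - img[y + 1][x]
                   - img[y][x - 1] - img[y][x + 1])
            total += abs(lap)
    return total / ((h - 2) * (w - 2))


def box_blur(img):
    """3x3 mean filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [list(row) for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(window) / 9.0
    return out


def sharpen(img, amount=1.0):
    """Unsharp masking: add back the difference from a blurred copy."""
    blurred = box_blur(img)
    return [[min(255.0, max(0.0, p + amount * (p - b)))
             for p, b in zip(row, blurred_row)]
            for row, blurred_row in zip(img, blurred)]


def preprocess_adaptively(img, threshold=20.0):
    """Hypothetical rule: blur detailed images, sharpen smooth ones."""
    if high_freq_energy(img) > threshold:
        return box_blur(img)
    return sharpen(img)
```

Here `img` is a grayscale image stored as a list of rows of pixel intensities in the range 0 to 255. In practice the threshold and the choice of filter would have to be tuned against classification accuracy on the dataset at hand.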
This thesis also addresses a previously identified problem with a popular tampering evaluation dataset: a mismatch in the number of JPEG processing iterations between the authentic and tampered sets creates an unfair statistical bias that leads to higher detection rates. Many modern approaches do not acknowledge this issue; this thesis applies a quality factor equalization technique to reduce the bias. Additionally, it artificially introduces mismatches of varying size in the number of JPEG processing iterations to determine their effect on detection rates.
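One well-known way in which the number of JPEG passes can leave a statistical trace is double quantization: a coefficient quantized with one step size and then with another shows periodic gaps in its histogram that a singly quantized coefficient does not. The toy sketch below illustrates that effect only. The step sizes, the uniformly spread input values, and the function names are hypothetical, and it is not a reproduction of the dataset bias or of the equalization technique used in this thesis.

```python
from collections import Counter


def quantize(values, step):
    """Quantize then dequantize, as one JPEG pass does to a DCT coefficient."""
    return [round(v / step) * step for v in values]


def bin_histogram(values, step):
    """Count how many values fall into each quantization bin of size `step`."""
    return Counter(round(v / step) for v in values)


# Hypothetical, uniformly spread coefficient values.
coeffs = list(range(-60, 61))

# One pass with step 3 versus a pass with step 5 followed by a pass with step 3.
single = bin_histogram(quantize(coeffs, 3), 3)
double = bin_histogram(quantize(quantize(coeffs, 5), 3), 3)

bins = range(-20, 21)
empty_single = [b for b in bins if single[b] == 0]
empty_double = [b for b in bins if double[b] == 0]

print("empty bins after one pass:  ", empty_single)
print("empty bins after two passes:", empty_double)
```

A classifier trained on features that are sensitive to such gaps can end up separating images by compression history rather than by tampering, which is why the mismatch has to be controlled before detection rates are compared.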
With the rapid growth of technological capabilities, particularly in processing power and speed, machine learning is becoming increasingly widespread, especially in fields where real-time assessment of complex data is extremely valuable. This surge in popularity opens up an abundance of potential research and projects that further broaden the applications of artificial intelligence. The purpose of this thesis arises from these opportunities: our work seeks to meaningfully increase our understanding of the current capabilities of machine learning and the problems it can solve. One extremely popular application of machine learning is data prediction, as machines are capable of finding trends that humans often miss. To this end, we examined the CVE dataset and attempted to predict future entries with Random Forests. The second area of interest is the great promise shown by neural networks in the field of autonomous driving. We sought to understand the research published by the most prominent bodies in this field and to implement a model on one of the largest available datasets, Berkeley DeepDrive 100k. This thesis describes our efforts to build, train, and optimize a Random Forest model on the CVE dataset and a convolutional neural network on the Berkeley DeepDrive 100k dataset. We document these efforts with the goal of growing our knowledge and use of machine learning in these areas.