In the area of hardware, reverse engineering was traditionally focused on developing clones—duplicated components that performed the same functionality of the original component. While reverse engineering techniques have been applied to software, these techniques have instead focused on understanding high-level software designs to ease the software maintenance burden. This approach works well for traditional applications that contain source code, however, there are circumstances, particularly regarding web applications, where it would be very beneficial to clone a web application and no source code is present, e.g., for security testing of the application or for offline mock testing of a third-party web service. We call this the web application cloning problem.
This thesis presents a possible solution to the problem of web application cloning. Our approach is a novel application of inductive programming, which we call inductive reverse engineering. The goal of inductive reverse engineering is to automatically reverse engineer an abstraction of the web application’s code in a completely black-box manner. We build this approach using recent advances in inductive programming, and we solve several technical challenges to scale the inductive programming techniques to realistic-sized web applications. We target the initial version of our inductive reverse engineering tool to a subset of web applications, i.e., those that do not store state and those that do not have loops. We introduce an evaluation methodology for web application cloning techniques and evaluate our approach on several real-world web applications. The results indicate that inductive reverse engineering can effectively reverse engineer specific types of web applications. In the future, we hope to extend the power of inductive reverse engineering to web applications with state and to learn loops, while still maintaining tractability.