Understanding legacy workflows through runtime trace analysis

Acűna, Ruben

When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due…

When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due to the accessibility of dynamic languages. Unfortunately, ad-hoc workflows lack a structured description as provided by specialized management systems, making ad-hoc workflow maintenance and reuse difficult, and motivating the need for analysis methods. The analysis of ad-hoc workflows using compiler techniques does not address dynamic languages - a program has so few constrains that its behavior cannot be predicted. In contrast, workflow provenance tracking has had success using run-time techniques to record data. The aim of this work is to develop a new analysis method for extracting workflow structure at run-time, thus avoiding issues with dynamics.

The method captures the dataflow of an ad-hoc workflow through its execution and abstracts it with a process for simplifying repetition. An instrumentation system first processes the workflow to produce an instrumented version, capable of logging events, which is then executed on an input to produce a trace. The trace undergoes dataflow construction to produce a provenance graph. The dataflow is examined for equivalent regions, which are collected into a single unit. The workflow is thus characterized in terms of its treatment of an input. Unlike other methods, a run-time approach characterizes the workflow's actual behavior; including elements which static analysis cannot predict (for example, code dynamically evaluated based on input parameters). This also enables the characterization of dataflow through external tools.

The contributions of this work are: a run-time method for recording a provenance graph from an ad-hoc Python workflow, and a method to analyze the structure of a workflow from provenance. Methods are implemented in Python and are demonstrated on real world Python workflows. These contributions enable users to derive graph structure from workflows. Empowered by a graphical view, users can better understand a legacy workflow. This makes the wealth of legacy ad-hoc workflows accessible, enabling workflow reuse instead of investing time and resources into creating a workflow.

Copyright Statement

Reuse Permissions

Downloads

pdf (1.3 MB)

Details

Title

Understanding legacy workflows through runtime trace analysis

Contributors

Acűna, Ruben (Author)
Bazzi, Rida (Thesis advisor)
Lacroix, Zoé (Thesis advisor)
Candan, Kasim (Committee member)
Arizona State University (Publisher)

Date Created

2015

Subjects

Resource Type

Text

Collections this item is in

ASU Electronic Theses and Dissertations

Note

Partial requirement for: M.S., Arizona State University, 2015

Note type

thesis
Includes biblioghraphical references (pages 100-109)

Note type

bibliography
Field of study: Computer science

Understanding legacy workflows through runtime trace analysis

Details

Citation and reuse

Statement of Responsibility

Machine-readable links