Control and Data Flow Analysis

DMS provides support for computing various kinds of control and data flows. Serious program analysis and transformation tasks often require a deep understanding of information flows that occur between program components. These information flows are induced by the various runtime entity declarations (variables and procedures) and the references, explicit or implied, between them, which are in turn implied by the semantics of the language being processed. Since these semantics are not always directly visible in the syntax of the language, trivial syntax analysis cannot easily discover them, and typically an analysis procedure that understands the semantic implications of the syntax needs to be constructed to make the information flows accessible.

The process of making such informations flow visible is called "flow analysis". The result of such an analysis is usually a graph, often with annotations. Commonly, a control flow graph ("flowchart") is produced, and flow analyses augment that graph with additional arcs or annotations on the nodes of the control flow graph ("facts").

Control Flow Analysis and Graphs

A control flow graph shows how events in the program are sequenced. Traditional flowcharts are limited to executable statements ("boxes"), conditional tests ("diamonds") and subroutine calls ("hexagons"). DMS flow graphs represent these concepts directly as nodes, and attach the AST for the originating tree to each control flow node. Flow graphs for modern languages require more sophistication to represent more complex control flows. DMS flow graphs can include regions (e.g., for exceptions), indirect GOTOs, fork-join and partial-order nodes to represent parallelism (including concurrency or simply lack of defined order). As examples, DMS-derived C control flow graphs contain fork-join nodes used to model the C standard's notion of "sequence points", where subexpression with unspecified order must be complete before further computation occurs (e.g., the arguments to a function are evaluated in an arbitrary order, but the function call cannot occur until all are evaluated). Control flow graphs for GCC C source code can also contain indirect GOTO nodes, to model GCC's "goto *exp;" construct.

Control flow graphs for a specific language (e.g, C or a dialect) is often implemented in DMS by using an attribute grammar evaluator that propagates control points around the abstract syntax tree and assembles those control points into a completed graph. These evaluators use a DMS Control Flow graph domain and a supporting library of control flow graph facilities that it provides. Using these evaluators and this library, it is generally straighforward to construct flow graphs for many modern languages.

Often it is important to know when one computation always occurs before another ("dominates"). The library can compute dominators and post-dominators from standard DMS control flow graphs. It is also useful to know under what condition some control flow node executes; the library also computes this and decorates the control flow graph with such "control dependence" arcs.

Example COBOL Control Flow Graph

Here is a sample control flow graph. You may need to expand your browser view and scroll to appreciate it. It was produced by DMS's COBOL Front End, for the IBM Enterprise dialect.

Key to flow graph interpretation:

Flow Analysis
For Control and Data