|
COPYRIGHT 2000 Association for Computing Machinery, Inc.
1. INTRODUCTION
A number of program-analysis problems can be tackled by transforming them into certain kinds of graph-reachability problems in labeled directed graphs [Callahan 1988; Cooper and Kennedy 1988; 1989; Hecht 1977; Horwitz et al. 1990; 1995; Khedker and Dhamdhere 1994; Kou 1977; Melski and Reps 2000; Reps 1995; 1998; Reps et al. 1994; 1995]. It is useful to consider not just ordinary reachability (e.g., transitive closure), but a generalization in which the edge labels are used, in effect, to filter out paths that are not of interest [Callahan 1988; Horwitz et al. 1990; 1995; Melski and Reps 2000; Reps 1995; 1998; Reps et al. 1994; 1995]: a path P from vertex s to vertex t only counts as a "valid connection" between s and t if the word spelled out by P (i.e., the concatenation, in order, of the labels on the edges of P) is in a certain language.
Definition 1.1. Let L be a language over alphabet [[Sigma].sub.L], and let G be a graph whose edges are labeled with members of [[Sigma].sub.L]. Each path in G defines a word over [[Sigma].sub.L], namely, the word obtained by concatenating, in order, the labels of the edges on the path. A path in G is an L-path if its word is a member of L. An instance of the (single-source/single-target) L-path problem asks whether there exists an L-path in G from a given source vertex s to a given target vertex t.
Let L be a family of languages, and let L [element of] L be a language over alphabet [[Sigma].sub.L]. An instance of the L-reachability problem is an L-path problem instance (L, [[Sigma].sub.L], G, s, t).
Example. When the family of languages L in Definition 1.1 is the context-free languages, we have the CFL-reachability problem [Yannakakis 1990]. Consider the graph and the context-free grammar shown in Figure 1. Note that L(matched) is the context-free language that consists of strings of matched parentheses and square brackets, with zero or more e's interspersed.(1) In this graph, there is exactly one L(matched)-path from s to t: the path goes exactly once around the cycle and generates the word "[(e[])eee[e]]".
[Figure 1 ILLUSTRATION OMITTED]
A number of program-analysis problems can be viewed as instances of the CFL-reachability problem [Reps 1998]. In program-analysis problems, the languages used for such filtering purposes are often languages of matching parentheses. In some cases, the matched-parenthesis condition is used to filter out paths with mismatched calls and returns in order to implement so-called "context-sensitive" program analyses.
Example 1.2. The use of CFL-reachability in context-sensitive program-analysis, as opposed to ordinary graph-reachability, is illustrated in Figure 2.(2) The diagram on the right in Figure 2 shows the program's data-dependence graph. (Strictly, neither the two large ovoid shapes nor the rectangular boxes labeled Call, Enter, and Exit are part of the data-dependence graph. The two ovoids indicate which elements belong to which procedure; the rectangular boxes provide some context about the control points in the program to which the various vertices are associated.) Each directed edge in the graph represents a data-dependence (also known as a flow dependence [Kuck 1978; Kuck et al. 1981]): an edge from vertex [v.sub.1] to vertex [v.sub.2] indicates that the value produced at [v.sub.1] may be used at vertex [v.sub.2].
[Figure 2 ILLUSTRATION OMITTED]
For instance, the edge
[right arrow] x =
in the dependence graph for procedure f indicates that the value of x after the execution of the statement x = could be (and, in this case, must be) 0.
In the above program, procedure g has no parameters. However, our data-dependence graphs reflect a somewhat nonstandard treatment of global variables: a global variable such as x is treated as if it were a "hidden" value-result parameter whose value (and subsequent return value) is passed from one scope to another via the special scope-transfer variables x_in and x_out. For example, the interprocedural data-dependence edge
[{.sub.1] x_in = x [right arrow] x = x_in
represents the passing of x from f's scope to g's scope at the first call on g. Note that data-dependence edges for dependences transmitted from the caller to the callee (i.e., from f to g) are labeled by the symbols "[{.sub.1]" and "[{.sub.2]" whereas data-dependence edges for dependences transmitted from the callee back to the caller are labeled by the symbols "[}.sub.1]" and "[}.sub.2]. In particular, the data-dependence edges that represent how x and y are passed from f's scope to g's scope at the first call on g are labeled with [{.sub.1]; the data-dependence edges that represent how x and y are passed from f's scope to g's scope at the second call on g are labeled with [{.sub.2]. Likewise, the data-dependence edges that represent how x and y are passed back from g's scope to f's scope after the two calls finish are labeled with [}.sub.1] and [}.sub.2], respectively.
Our data-dependence graphs also have vertices that correspond to various annotations in the program; annotations, denoted here as comments, indicate that we are interested in a given variable at a given point in the program. In the example above, the comments at p1 and p2 give rise to the vertices p1:y and p2:y.
A context-insensitive analysis that tracks dependences between constants and variables in the program will report that y depends on and 1 at both p1 and p2. The reason is that a context-insensitive analysis ignores the fact that the only paths that can possibly be feasible execution paths are those in which returns are matched with corresponding calls. For instance, the existence of the (mismatched) path
(1.3) [{.sub.1] [right arrow] x = [right arrow] x_in = x [right arrow] x = x_in [right arrow] y = x
[}.sub.2] [right arrow] y_out = y [right arrow] y = y_out [right arrow] p2:y
in the graph shown above serves as "evidence" that p2:y depends on (i.e., that the value of y at p2 could be 0).
In contrast, a context-sensitive analysis only reports possible transmissions of values along paths in which returns are matched with corresponding calls. For this example, it reports that p1:y depends on but not on 1 and that p2:y depends on 1 but not on (i.e., y could have only the value at p1 and 1 at p2). The context-sensitive analysis can be expressed as a CFL-reachability problem with respect to a language of matched indexed curly braces. It would say that path (1.3) is not a valid connection between and p2:y, because the label [{.sub.1] on the interprocedural data-dependence edge
[{.sub.1] x_in = x [right arrow] x = x_in
(which represents the transfer of x's value from f to g as g is entered at the first call site) does not match the label [{.sub.2] on the edge
[}.sub.2] y_out = y [right arrow] y = y_out
(which represents the transfer of y's value back to f when a return is performed in g from a call on g that originated at the second call site). On the other hand, because [{.sub.2] matches with [}.sub.2], the (matched) path
(1.4) [{.sub.2] 1 [right arrow] x = 1 [right arrow] x_in = x [right arrow] x = x_in [right arrow] y = x
[}.sub.2] [right arrow] y_out = y [right arrow] y = y_out [right arrow] p2:y
does count as a valid connection between 1 and p2:y, and this path serves as evidence that the value of y at p2 could...
Read the full article for free courtesy of your local library.
|