To illustrate our current understanding, let's start with the following program:
https://godbolt.org/z/33f6vheh1
    void clang_analyzer_printState();

    struct C {
      int x;
      int y;
      int more_padding;
    };

    struct D {
      C c;
      int z;
    };

    C foo(D d, int new_x, int new_y) {
      d.c.x = new_x;       // B1
      assert(d.c.x < 13);  // C1
      C c = d.c;           // L
      assert(d.c.y < 10);  // C2
      assert(d.z < 5);     // C3
      d.c.y = new_y;       // B2
      assert(d.c.y < 10);  // C4
      return c;            // R
    }
In the code, we create a few bindings to subregions of the root region d (B1, B2), add constraints on the values (C1, C2, …), and create a LazyCompoundVal for a part of the region d at point L, which is returned at point R.
Now, the question is which of these should remain live as long as the return value of the foo call is live. In a perfect world, we should preserve:
- only the bindings of the subregions of d.c that were created before the copy at L. In our example, this includes B1 but not B2. In other words, new_x should be live but new_y shouldn’t.
- constraints on the values of d.c that are reachable through c. These can be created either before the point of making the copy (L) or after it. In our case, that would be C1 and C2, but neither C3 (the value of d.z is not reachable through c) nor C4 (the original value of `d.c.y` was overwritten at B2, after the creation of c).
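The expected liveness follows from what a caller can actually observe through the returned value. The sketch below replays the example at run time; `callerObservations` is a hypothetical helper added only for illustration, and the analyzer hook is left out. It is meant as an illustration of the copy semantics at L, not of the analyzer's behavior.

```cpp
#include <cassert>

struct C { int x; int y; int more_padding; };
struct D { C c; int z; };

// Same shape as the example program, without the analyzer hook.
C foo(D d, int new_x, int new_y) {
  d.c.x = new_x;       // B1
  assert(d.c.x < 13);  // C1
  C c = d.c;           // L: c is a snapshot of d.c at this point
  assert(d.c.y < 10);  // C2
  assert(d.z < 5);     // C3
  d.c.y = new_y;       // B2: modifies d.c.y, the snapshot c is unaffected
  assert(d.c.y < 10);  // C4
  return c;            // R
}

// Hypothetical caller: checks which values are visible through the result.
bool callerObservations(int old_y, int new_x, int new_y) {
  D d{{0, old_y, 0}, 0};
  C c = foo(d, new_x, new_y);
  // c.x carries the binding from B1; c.y still holds the value from
  // before B2, so new_y and d.z are not observable through c.
  return c.x == new_x && c.y == old_y;
}
```

The caller sees new_x (bound at B1) and the original d.c.y (constrained at C2), which is why C1 and C2 matter for the returned value while C3 and C4 do not.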
The current code in the RegionStore covers use case (1) by using getInterestingValues() to extract the bindings to parts of the referred region that are present in the store at the point of the copy. This also partially covers point (2), in the case where constraints are applied to a location that has a binding at the point of the copy (in our case d.c.x in C1, which has the value new_x), but it fails to preserve the constraints that require creating a new symbol for the location (d.c.y in C2).
We introduce the concept of lazily copied locations (regions) to the SymbolReaper, i.e. locations for which the program can access the stored value but not the address. These locations are collected as the set of regions referred to by a LazyCompoundVal. A readable location (region) is a location that is either live or lazily copied. Symbols that refer to values in regions are alive if the region is readable.
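The definitions can be sketched with a toy model. `ToyReaper` and all of its members are hypothetical names chosen for this illustration; they are not the actual SymbolReaper interface, and regions are modeled as plain strings.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model only; not the real SymbolReaper API.
struct ToyReaper {
  std::set<std::string> LiveRegions;          // value and address accessible
  std::set<std::string> LazilyCopiedRegions;  // only the stored value accessible

  void markLive(const std::string &R) {
    assert(!R.empty() && "region name must not be empty");
    LiveRegions.insert(R);
  }

  void markLazilyCopied(const std::string &R) {
    assert(!R.empty() && "region name must not be empty");
    LazilyCopiedRegions.insert(R);
  }

  bool isLive(const std::string &R) const {
    return LiveRegions.count(R) != 0;
  }

  bool isLazilyCopied(const std::string &R) const {
    return LazilyCopiedRegions.count(R) != 0;
  }

  // Readable = live or lazily copied.
  bool isReadable(const std::string &R) const {
    return isLive(R) || isLazilyCopied(R);
  }

  // A symbol for the value stored in R stays alive while R is readable.
  bool isRegionValueSymbolLive(const std::string &R) const {
    return isReadable(R);
  }
};
```

In this model a lazily copied region is readable without being live, which is the distinction the two sets are meant to capture.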
For simplicity, we follow the current approach to live regions: we mark the base region as lazily copied and consider all of its subregions readable. This makes some symbols falsely live (d.z in our example) and keeps the corresponding constraints alive.
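A rough sketch of this approximation, assuming regions are written as dotted paths such as "d.c.x" whose first component names the base region. `baseRegionOf` and `BaseRegionReaper` are hypothetical names for this illustration and do not correspond to the real analyzer classes.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: the base region of "d.c.x" is "d".
std::string baseRegionOf(const std::string &Region) {
  assert(!Region.empty() && "region path must not be empty");
  // If there is no '.', find() returns npos and substr() yields the whole path.
  return Region.substr(0, Region.find('.'));
}

// Toy model that tracks only base regions.
struct BaseRegionReaper {
  std::set<std::string> LiveBases;
  std::set<std::string> LazilyCopiedBases;

  void markLive(const std::string &Region) {
    LiveBases.insert(baseRegionOf(Region));
  }

  // Copying any part of a region marks its whole base as lazily copied.
  void markLazilyCopied(const std::string &Region) {
    LazilyCopiedBases.insert(baseRegionOf(Region));
  }

  // Every subregion of a live or lazily copied base counts as readable.
  bool isReadable(const std::string &Region) const {
    const std::string Base = baseRegionOf(Region);
    return LiveBases.count(Base) != 0 || LazilyCopiedBases.count(Base) != 0;
  }
};
```

With only "d.c" copied, "d.z" is still reported as readable. That is the over-approximation described above: it keeps C3 alive even though d.z is not reachable through c.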
Renaming Regions to LiveRegions inside RegionStore is an NFC change, made to clarify the difference between the regions stored in these two sets.
Regression Test: https://reviews.llvm.org/D134941
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Could you please incorporate the definition of lazily copied locations (regions) from the summary here, as a comment?