This is an archive of the discontinued LLVM Phabricator instance.

[Assignment Tracking Analysis][3/*] Memory location fragment filling
ClosedPublic

Authored by Orlando on Oct 20 2022, 2:49 AM.

Details

Summary

This patch adds another dataflow analysis to the AssignmentTrackingAnalysis pass (though it is much simpler than D136320).

The problem and goal

Disjoint and adjacent fragments compose well, however in the back-end, the following sequence

  1. dbg.value ... Fragment(0, 64)
  2. dbg.value ... Fragment(0, 32)

is effectively interpreted as setting Fragment(32, 32) (bits [32, 64)) to undef. Perhaps this scenario is uncommon or doesn't present much of an issue currently. However, this is a problem for assignment tracking where overlapping fragments are common between memory location definitions.

The goal of this analysis is to add new variable location definitions that preserve the "bits in memory" at subsequent definitions.

Dataflow high level details

Similar to the "core" analysis, this is a fixed point dataflow analysis. The main difference is that instead of working with join-semilattice types (monotonic increase to a fixed point) it works with meet-semilattice types (monotonic decrease to a fixed point). Basically, instead of performing something like a union when merging predecessor "live-outs" this one performs something like an intersect.

Implementation details

The value computed for each block is a map of aggregate variable to fragment memory locations. The fragment locations are tracked using an IntervalMap (one for each aggregate variable). The values mapped in the IntervalMaps are IDs that represent a stack slot.

In this patch it's probably easier to start looking at process first then join.

In process the location defs - generated by the first analysis - in each block are iterated over. At each location def for a fragment the relevant IntervalMap is updated. A new interval is inserted and new memory location defs for all partially overlapped intervals that are in memory are saved into BBInsertBeforeMap such that the not-overlapped bits in memory get a new def. For example:

Before:

var x bits 0 to 63:  value in memory
some instructions
var x bits 0 to 31:  value is %0

After:

var x bits 0 to 63:  value in memory
some instructions
var x bits 0 to 31:  value is %0
var x bits 32 to 61: value in memory ; <-- new location definition

In join we perform an intersect-like operation on the IntervalMaps. Essentially, we keep only the (sections of) the intervals where all predecessors agree the memory location is in use.


Again, tests coming in another patch.

Diff Detail

Event Timeline

Orlando created this revision.Oct 20 2022, 2:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2022, 2:49 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
Orlando requested review of this revision.Oct 20 2022, 2:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2022, 2:49 AM

Comments aside, LGTM.

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
42
255

This could be inverted and made into an early return.

310

A more descriptive name might be useful, painful as it is for a map of maps; even just VarFragMap or AggFragMap would be reasonable I think.

329

equ functions may be worth renaming.

462

Might be worth putting in a function comment that A is modified to contain the result of the meeting, while B is not modified; if you can think of good variable names to represent that as well (I can't) then that would be a good idea as well.

541

Maybe worth adding an assert(StartBit < EndBit && "Cannot create fragment of size <= 0") here as a guard against any future modifications to this pass (it seems like a possible result of some incorrect handling of intervals; there were a few cases in this patch I thought this error might appear until I investigated further).

555

Is the After parameter misnamed? As far as I can tell the intent is that this Def becomes live before After, which technically makes sense (the instruction comes "after" the def) but uses the opposite naming convention to elsewhere, where the paramater is named Before as-in "The instruction we're inserting before".

586–587

It looks like the condition has indeed been removed, unless the actual conditional check appears elsewhere in this pass?

619

Should this be LastOverlap.stop() > EndBit? I could have some misunderstandings about how this works, but to my understanding since this is a half-open interval map, if LastOverlap.start() >= EndBit then the interval map wouldn't consider it an overlap at all (e.g. [0, 16) does not overlap with [16, 32)).

625–628

I believe this example is actually incorrect - IntersectStart and IntersectStop are only true if the existing fragment extends beyond the start and end of this fragment, i.e.

     [ f ]
[  -   i   -  ]
+
[ i ][ f ][ i ]

In the example given in this comment, IntersectStart would be false because this fragment and the existing fragment have the same start bit, and so we would (correctly) fall through to the else block below.

StephenTozer added inline comments.Nov 3 2022, 6:33 AM
llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
619

Nevermind, I get it now - I was thinking of LastOverlap as "the last existing fragment that overlaps with this fragment", but it's just the existing fragment that overlaps with EndBit if any exists.

Orlando planned changes to this revision.Nov 15 2022, 2:00 AM
Orlando updated this revision to Diff 475796.Nov 16 2022, 5:24 AM
Orlando marked 10 inline comments as done.
llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
541

SGTM!

555

Yeah, sorry about that. This changed a few times during development.

586–587

That's the ternary operator condition below.

625–628

Yeah looks like you're right. Nice catch - I appreciate the keen-eyed review!

StephenTozer accepted this revision.Nov 22 2022, 1:05 AM

Changes LGTM.

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
586–587

Missed that, all good then.

This revision is now accepted and ready to land.Nov 22 2022, 1:05 AM
jmorse requested changes to this revision.Nov 22 2022, 7:50 AM
jmorse added subscribers: CarlosAlbertoEnciso, jmorse.

This is well motivated and useful -- I'm having some trouble building a mental model of what the lattice looks like though. IIUC we're seeking to identify the ranges of variable location that are well defined by the fragment ranges, but, because we later drop overlapped ranges at a later date, we want to explicitly state the not-overlapped memory location so that it isn't dropped. I don't quite see how meetFragments achieves that, as it only selects the bits that do overlap. Is the lattice collecting all the ranges where there's genuine overlap, so that later dbg.values split their ranges across the intersection? And if so, why does addDef delete "fully contained" overlaps, shouldn't it break all definitions up?

Another concern is the high cost of copying a DenseMap of IntervalMap's each join/meet -- this is probably fine if there are only a few leaked + partially promoted variables in a function, which is typically the case, do you have any statistics on how frequently that happens in a clang build for example? This might be a scenario where we need to land+evaluate on wider code bases before refining. It also might be acceptable to have a cut-off point where there are too many leaked variable locations to track, to avoid excessive work.

One also wonders whether @CarlosAlbertoEnciso 's IntervalTree would be useful to collect and identify overlapping areas. That would probably want a "global view" of the function where all fragments are split up from the beginning, wheras I suppose taking the dataflow approach limits it to only the regions where there's some overlap.

(Hitting request changes to take it out of the review queue).

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
228–230

Sounds good, potentially this is useful as a method of DIExpression, but let's think about that later.

266–267

Just to check my understanding: so anything that ends with DW_OP_stack_value is going to get filtered out by this check, yes?

288

".. and so we try to recover the rest of the fragment location here."

316–317

Mmmmm, map of map of interval map. Reallocations of this are going to be expensive. Do you have any statistics wrt. how frequently these scenarios present themseves?

558

const VarLocInfo &

592–595

Something unclear to me is what VarLoc.V is in this scenario -- it's being treated as if it's an alloca base, but the comment in patch 1 of "Needs to be value_s_ for variadic expressions." suggests that V is the actual value of an assignment. Commentry on the Bases container of what types it contains would be good.

611

Early return, avoid indentation?

This revision now requires changes to proceed.Nov 22 2022, 7:50 AM
Orlando marked 4 inline comments as done.Nov 23 2022, 7:24 AM

This is well motivated and useful -- I'm having some trouble building a mental model of what the lattice looks like though. IIUC we're seeking to identify the ranges of variable location that are well defined by the fragment ranges, but, because we later drop overlapped ranges at a later date, we want to explicitly state the not-overlapped memory location so that it isn't dropped. I don't quite see how meetFragments achieves that, as it only selects the bits that do overlap. Is the lattice collecting all the ranges where there's genuine overlap, so that later dbg.values split their ranges across the intersection? And if so, why does addDef delete "fully contained" overlaps, shouldn't it break all definitions up?

it only selects the bits that do overlap

meetFragments is merging the LiveOut results of preds. If all preds agree the memory location for some bits is valid (intersect of variable fragments in memory) then that information is propagated. Bits not in that intersect are not treated as located in memory. During process (processing a block) the opposite occurs; the interval map is built up by taking the union of "bits in memory" according to the map and the encountered debug intrinsic. Where there is an intersection in that union we need to emit debug intrinsics to explicitly re-instate the memory location for the bits outside the intersect (union minus intersect). In other worse, the bits in memory that are "not shadowed" by a subsequent overlapping location def need a new location def.

Another concern is the high cost of copying a DenseMap of IntervalMap's each join/meet -- this is probably fine if there are only a few leaked + partially promoted variables in a function, which is typically the case, do you have any statistics on how frequently that happens in a clang build for example? This might be a scenario where we need to land+evaluate on wider code bases before refining. It also might be acceptable to have a cut-off point where there are too many leaked variable locations to track, to avoid excessive work.

I don't have any to hand but I can try and dig some out.

One also wonders whether @CarlosAlbertoEnciso 's IntervalTree would be useful to collect and identify overlapping areas. That would probably want a "global view" of the function where all fragments are split up from the beginning, wheras I suppose taking the dataflow approach limits it to only the regions where there's some overlap.

I did briefly investigate this during development and chose to use IntervalMap for the reasons you've just mentioned.

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
266–267

Yes that's right.

316–317

Same as for the other patches, LiveIn and LiveOut are initialised to have a number of elements equal to the number of basic blocks, so in terms of reallocations we only pay for those done by the inner map, VarFragMap (which is a map of intervalmaps though - perhaps this is what you were talking about?).

592–595

MemLocFragFill (this pass) only works with memory locations.

a) AssignmentTrackingLowering (the previous pass) rewrites memory locations in terms of the base alloca
b) Variadic address component of a dbg.assign is not supported (and there are no plans for that)

So, here VarLoc.V is a) the base pointer for the whole variable and b) will never be variadic (even if/when the value component is updated to support variadic expressions).

611

Done, and have done the same for the outer else block (pulled it to the top just above here).

Orlando updated this revision to Diff 477500.Nov 23 2022, 7:27 AM
Orlando marked an inline comment as done.

+ Address comments

jmorse added a comment.Dec 9 2022, 4:53 AM

I think I've clocked that this is a two-value lattice of \top and \bottom / true and false, indicating "this bit is in memory" or not in memory, and the intersection on meet ensures that it monotonically goes downwards, so we gradually remove bits from being in memory. Which sounds good and correct.

I had a slight concern that insertMemLoc might end up recording a history of what happens during dataflow exploration, instead of determining memory locations once a fixedpoint has been reached. What if we have this:

entry:
  everything-is-in-memory bits 0-to-127,
  br loophead
loophead:
  def bits 0-63
  br anotherblock
anotherblock:
  def bits 64-95
  br i1 %foo loophead, exit
exit:
  return

After chatting with Orlando elsewhere, it looks like on the first pass through the loop we'll start with bits 0 to 127 being in memory, then:

  • On processing loophead, fragment it into being only bits 64 to 127 being in memory,
  • Record that fact in BBInsertBeforeMap via insertMemLoc,
  • Process anotherblock, now only bits 96 to 127 are in memory,
  • Record that fact in BBInsertBeforeMap too,
  • Loop back round to loophead,
  • meet/merge the interval maps and determine that we have to go through the loop again,
  • Clear BBInsertBeforeMap for loophead,
  • Walk through loophead again generating data...
  • (etc etc etc)

I'm a lot happier that I understand what's going on now, will check through comment responses...

llvm/lib/CodeGen/AssignmentTrackingAnalysis.cpp
328

Some commentry on the usage of this data structure appreciated -- specifically the fact that it always records the _last_ (AFAUI) memory/fragment map through a block. I didn't clock that it was repeatedly cleared until much later.

jmorse accepted this revision.Dec 9 2022, 5:02 AM

LGTM, with a request for some more documentation comments in the earlier comment.

This revision is now accepted and ready to land.Dec 9 2022, 5:02 AM
Orlando updated this revision to Diff 481629.Dec 9 2022, 6:39 AM
Orlando marked an inline comment as done.

Thanks again for all the reviews!

Orlando closed this revision.Dec 9 2022, 8:21 AM