This is an archive of the discontinued LLVM Phabricator instance.

[DA][SDA] SyncDependenceAnalysis re-write
ClosedPublic

Authored by simoll on Jul 23 2020, 7:45 AM.

Details

Summary

This patch achieves two things:

  1. It breaks up the join_blocks interface between the SDA to the DA to return two separate sets for divergent loops exits and divergent, disjoint path joins.
  2. It updates the SDA algorithm to run in O(n) time and improves the precision on divergent loop exits.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46372 (by virtue of the improved join_blocks interface) and revealed an imprecise expected result in the Analysis/DivergenceAnalysis/AMDGPU/hidden_loopdiverge.ll test.
This should work as is (feedback on any issues welcome). I'll clean up the code as we proceed with the review.

Diff Detail

Event Timeline

simoll created this revision.Jul 23 2020, 7:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2020, 7:45 AM
bmahjour removed a subscriber: bmahjour.Jul 23 2020, 9:19 AM
arsenm added inline comments.Jul 23 2020, 2:16 PM
llvm/include/llvm/Analysis/SyncDependenceAnalysis.h
41–43

This is just a SetVector?

foad added a comment.Jul 24 2020, 8:35 AM

FYI I've tested this patch by running a large chunk of Vulkan CTS with AMDVLK, and I didn't find any problems.

simoll marked an inline comment as done.Jul 27 2020, 12:56 AM
simoll added inline comments.
llvm/include/llvm/Analysis/SyncDependenceAnalysis.h
41–43

LoopRPO could be a SetVector. However, this is only modified during construction time and atm we discard the "unique-ing" set once RPO becomes read only.
We still need the RPOIndex map for inverse lookup.

It's interesting that you end up using a modified post-order where loops are kept contiguous, I found the same thing helpful in my work on control flow lowering in AMDGPU.

I haven't looked into the actual divergence propagation, but some early comments on other stuff.

llvm/lib/Analysis/SyncDependenceAnalysis.cpp
141

This is relatively expensive, isn't it? And it gets executed once per loop exit instead of once per loop...

193

"\brief" comments should be above the function, not inside its body.

How is the CFG modified?

236–241

The explicit initializer for LoopRPO isn't needed.

237–240

This looks like it's computing a post-order, not a reverse post-order. I haven't looked into how this lines up with the rest of the algorithm, but perhaps you can rename LoopRPO to LoopPO?

244–245

Can this be removed now?

sameerds added inline comments.Jul 27 2020, 10:23 PM
llvm/include/llvm/Analysis/DivergenceAnalysis.h
134

The comment could be placed on the preceding line to avoid breaking up the declaration.

llvm/lib/Analysis/DivergenceAnalysis.cpp
178

It looks very strange to call a function on the whole Phi inside a loop that is iterating over its incoming values. This is actually a predicate loop that needs to be converted into a predicate function. Something like the following. Like the coding style says, this forces you to pick a name which documents what is happening.

if (!Phi)
  return;
if (!hasInterestingOperands(Phi))
  return;
if (markDivergent(*Phi))
  pushUsers(*Phi);
return;
187

Same as previous comment. The intention of introducing usesLiveOut() as a separate function was to document the loop's intention in the name of the function itself.

259

Loop::Contains is called with the same argument in each iteration. If LoopInfo is available at this point, the function could call LoopInfo::getLoopFor() just once outside the current while-loop and then compare pointers at this point.

llvm/lib/Analysis/SyncDependenceAnalysis.cpp
101

This got reordered. Might be a good idea to clang-format the whole patch.

simoll planned changes to this revision.Jul 28 2020, 1:59 AM

Thanks for the comments so far.

simoll updated this revision to Diff 288338.Aug 27 2020, 8:12 AM
simoll marked 6 inline comments as done.

NFC

  • addressed reviewer comments, clean-up, tidy
  • rebased
simoll updated this revision to Diff 288341.Aug 27 2020, 8:15 AM
simoll marked 2 inline comments as done.

minor cleanup

llvm/lib/Analysis/DivergenceAnalysis.cpp
259

We cannot do that because`getLoopFor` would give us the inner-most loop that contains DivExit. We are looking for the outer-most loop that does not contains DivExit here. I've changed this to use the loop depth instead.

llvm/lib/Analysis/SyncDependenceAnalysis.cpp
141

Actually, this is only called once per loop header.

237–240

Yes, we compute PO and traverse the block list backwards to do the reversal when it's used.

simoll updated this revision to Diff 288363.Aug 27 2020, 9:20 AM

Bug fix: DivLoopExit's loop may be nullptr

simoll updated this revision to Diff 288375.Aug 27 2020, 9:52 AM
  • rebased (onto clang-formatting commit for the SDA)
  • DCE
sameerds accepted this revision.Sep 7 2020, 3:03 AM

LGTM, to the extent of my high-level understanding of the analysis. I would recommend waiting a few days and give others a chance to finish the review. I do have some nitpicks, but feel free to decide which ones warrant a change.

llvm/lib/Analysis/DivergenceAnalysis.cpp
220

Not necessarily binding on this change, but could this be an error instead? Or perhaps an early exit with a conservative approximation?

270

typo in the debug output: "taintAndPushPhiNodes"

llvm/lib/Analysis/SyncDependenceAnalysis.cpp
116–120

Is this intended to treat each child loop as a single node when traversing a parent loop? Could we just say that here?

132

typo: line needs to be indented.

143

Is LoopHead always the same as Loop->getHeader()? Also, the complete name LoopHeader is preferred.

238

This should either be removed or put in a debug macro

268

Did you mean to start each "if" on a new line? Missing markup for an unordered list?

289

The final return seems to only update the label. How is \top defined?

This revision is now accepted and ready to land.Sep 7 2020, 3:03 AM
simoll planned changes to this revision.Sep 28 2020, 5:20 AM
simoll marked 10 inline comments as done.
simoll added inline comments.
llvm/lib/Analysis/DivergenceAnalysis.cpp
220

It is non-obvious which threads synchronize in irreducible loops.
The convergence semantics of D85603 should clarify this. Until that patch is in, i'd rather we leave this as an assertion.

llvm/lib/Analysis/SyncDependenceAnalysis.cpp
116–120

We treat child loops as single nodes to speedup the analysis (all loops are reducible and all blocks of child loops will receive the same label as their loop headers).. This is not part of the changes to the CFG that are required for correctness.

simoll updated this revision to Diff 294700.Sep 28 2020, 7:45 AM
simoll marked an inline comment as done.
  • NFC.
  • Adressed comments.

I'll let this simmer for a last round of feedback and commit soonish, otw.

This revision is now accepted and ready to land.Sep 28 2020, 7:45 AM
This revision was automatically updated to reflect the committed changes.