This is an archive of the discontinued LLVM Phabricator instance.

[DA][NFC] Update publication - add remarks
ClosedPublic

Authored by simoll on Nov 18 2021, 1:08 AM.

Details

Summary

Update the reference publication for the SyncDependenceAnalysis and Divergence Analysis. Fix phrasing, formatting. Add comments on reducible loop limitation.

Diff Detail

Event Timeline

simoll created this revision.Nov 18 2021, 1:08 AM
simoll requested review of this revision.Nov 18 2021, 1:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2021, 1:08 AM
simoll retitled this revision from [DA] Update publication - add remarks to [DA][NFC] Update publication - add remarks.Nov 18 2021, 1:09 AM
bmahjour removed a subscriber: bmahjour.Nov 18 2021, 5:48 AM

Not related to the changes proposed in this review, but it seems the PDT is not used by the implementation. It might be good to take out that dependency.

llvm/lib/Analysis/SyncDependenceAnalysis.cpp
103

Good to see both these points spelled out. We are currently working on an implementation that works with irreducible control flow. It's still a work in progress, but involves the new "CycleInfo" being introduced in D112696. I do believe that the single pass of DFA is a strength that need not be lost when handling irreducible control flow. CycleInfo provides a predictable way to work around the lack of a unique header. One just needs to take extra care about entering the same irreducible loop multiple times when constructing ModifiedPO.

149–151

Is it correct to say that this is not critical for correctness? From what I understood, if other blocks (beyond the loop exit) got interleaved, the implementation will work, but unnecessarily visit and skip the interleaved blocks when walking backwards on the ModifiedPO.

Not related to the changes proposed in this review, but it seems the PDT is not used by the implementation. It might be good to take out that dependency.

All good points. PDT and the comment on strict loop compactness are leftovers. I will remove PDT in a followup patch.

llvm/lib/Analysis/SyncDependenceAnalysis.cpp
103

Regarding irreducibility, the unclear part to me always was whether the analysis result (detected joins, synchronization points) in IR still represents the synchronization we will end up getting with the final binary on actual hardware.
So the issue is not really getting some result but making sure the entire stack agrees on synchronization.

D85603 should help here. However, if a kernel is irreducible and doesn't use the tools of the patch, the documentation says that reconvergence should be maximal, as early as possible - this may be ambiguous in irreducible control (IIUC implementation defined with cycles).

This is less of an issue with reducible loops as there is a (mostly unspoken) mutual understanding where synchronization happens and so all transformations will abide to that. This is starting to fade however as more recent hardware breaks with the traditions on synchronization.

149–151

Correct. Exit blocks must have a lower number than the loop header in the traversal (to account for the virtual edge from header to exit). The PO should take care of all other dependences.
Strict loop compactness is a relic from the earlier implementation that recursed on the loop tree during propagation.

It may help with efficiency though.

All good points. PDT and the comment on strict loop compactness are leftovers. I will remove PDT in a followup patch.

So will you be rephrasing the compactness in this patch? While I am convinced that it does not affect correctness, I have not thought about whether it affects accuracy, i.e., whether the lack of compactness may result in some places being marked divergent when they aren't.

llvm/lib/Analysis/SyncDependenceAnalysis.cpp
103

Indeed, D85603 is a big part of the work being done for the AMDGPU backend. As part of its cleanup, I am hoping to make some conservative statements about the default behaviour when the intrinsics are not being used. The minimal goal is to gracefully handle irreducible control flow, unlike the current DA implementation where we just give up and assume everything is divergent. More as soon as I am in a confident place to make specific statements.

All good points. PDT and the comment on strict loop compactness are leftovers. I will remove PDT in a followup patch.

So will you be rephrasing the compactness in this patch? While I am convinced that it does not affect correctness, I have not thought about whether it affects accuracy, i.e., whether the lack of compactness may result in some places being marked divergent when they aren't.

I won't touch the comment on loop compactness in this patch. The change does modify how we think about this, which shouldn't happen in the same commit as an updated reference.
Giving up loop compactness (not the order constraints on the loop header/exits though!) shouldn't even change the result of the analysis tbh.

sameerds accepted this revision.Nov 22 2021, 2:17 AM
This revision is now accepted and ready to land.Nov 22 2021, 2:17 AM
This revision was landed with ongoing or failed builds.Nov 22 2021, 3:59 AM
This revision was automatically updated to reflect the committed changes.