This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Flush vmcnt in preheader for loops with loads
Changes Planned · Public

Authored by kerbowa on Jul 5 2023, 1:40 AM.

Details

Summary

Extend waitcnt hoisting by flushing vmcnt in the preheader of every loop that uses values loaded outside the loop and contains VMEM loads.

Event Timeline

kerbowa created this revision. · Jul 5 2023, 1:40 AM
Herald added a project: Restricted Project. · Jul 5 2023, 1:40 AM
kerbowa requested review of this revision. · Jul 5 2023, 1:40 AM
kerbowa planned changes to this revision. · Jul 5 2023, 1:41 AM

This patch is meant to start a discussion about flipping the default: assume that, in the average case, it is profitable to hoist waitcnt to the preheader of loops. It is mutually exclusive with D154480. It needs a round of performance testing to confirm that it actually is profitable in aggregate.

It would probably also need an improvement that verifies the hoisted waitcnt actually improves the placement of waitcnt inside the loop.

For example, in a case like the one below we don't want to do any hoisting:

v0 = load(...)
loop {
  v1 = load(...)
  ...
  use(v1)
  use(v0)
}
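
One way to picture the verification step is the toy model below. It is only a sketch and not code from this patch or from LLVM: `ToyInst`, `meetsPatchCondition`, and `hoistImprovesLoopWaits` are hypothetical names, and the loop body is a flat list of instructions rather than a real CFG. It also assumes that vmcnt tracks loads in issue order, so a wait for a load issued inside the loop also covers a load issued earlier in the preheader, and it ignores loads still pending across the back edge.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not LLVM's MachineInstr).
struct ToyInst {
  bool IsVMEMLoad;                // true if the instruction is a VMEM load
  std::string Def;                // value it defines, empty if none
  std::vector<std::string> Uses;  // values it reads
};

using ValueList = std::vector<std::string>;

static bool contains(const ValueList &Values, const std::string &Name) {
  return std::find(Values.begin(), Values.end(), Name) != Values.end();
}

// Condition from the summary: the loop contains a VMEM load and uses
// a value that was loaded outside of the loop.
bool meetsPatchCondition(const std::vector<ToyInst> &Body,
                         const ValueList &LoadedOutside) {
  bool HasVMEMLoad = false;
  bool UsesOutsideLoad = false;
  for (const ToyInst &I : Body) {
    HasVMEMLoad |= I.IsVMEMLoad;
    for (const std::string &U : I.Uses)
      UsesOutsideLoad |= contains(LoadedOutside, U);
  }
  return HasVMEMLoad && UsesOutsideLoad;
}

// Hypothetical refinement: hoisting only helps if the first use that needs
// a wait reads a value loaded outside the loop. If a value loaded inside
// the loop is used first, that use needs a wait in the loop regardless
// (assumption: that wait also covers the earlier outside load).
bool hoistImprovesLoopWaits(const std::vector<ToyInst> &Body,
                            const ValueList &LoadedOutside) {
  if (!meetsPatchCondition(Body, LoadedOutside))
    return false;
  ValueList LoadedInLoop;
  for (const ToyInst &I : Body) {
    bool UsesInLoopLoad = false;
    bool UsesOutsideLoad = false;
    for (const std::string &U : I.Uses) {
      UsesInLoopLoad |= contains(LoadedInLoop, U);
      UsesOutsideLoad |= contains(LoadedOutside, U);
    }
    if (UsesInLoopLoad)
      return false; // the loop needs its own wait here anyway
    if (UsesOutsideLoad)
      return true;  // hoisting moves this wait out of the loop
    if (I.IsVMEMLoad)
      LoadedInLoop.push_back(I.Def);
  }
  return false;
}
```

For the loop above (load v1, use v1, use v0, with v0 loaded outside), `meetsPatchCondition` holds but `hoistImprovesLoopWaits` returns false in this model, which matches the case where no hoisting is wanted. If use(v0) came before the load of v1 instead, the model would report the hoist as an improvement.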