Expand waitcnt hoisting by flushing vmcnt in the preheader of every loop that uses values loaded outside the loop and contains VMEM loads.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This patch is meant to open a discussion about flipping the default: assume that, in the average case, hoisting waitcnt to the loop preheader is profitable. It is mutually exclusive with D154480. It needs a round of performance testing to confirm that it is actually profitable in aggregate.
An improvement would probably be needed to verify that the hoisted waitcnt actually improves the placement of waitcnt inside the loop.
E.g. in cases like the one below, we don't want to do any hoisting:
v0 = load(...)
loop {
  v1 = load(...)
  ...
  use(v1)
  use(v0)
}
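To make the kind of verification discussed here concrete, the following is a minimal sketch of a profitability check over a toy loop body. It is not the logic in the patch or in the AMDGPU backend: `ToyInst` and `hoistLooksProfitable` are hypothetical names, and the model assumes that loads complete in issue order, so a wait on any in-loop load also covers every load issued before the loop.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy instruction: a VMEM load that defines `value`,
// or a use of `value`. Not an LLVM data structure.
struct ToyInst {
  enum Kind { Load, Use } kind;
  std::string value;
};

// Returns true if flushing vmcnt in the preheader could remove a wait
// from the loop body, under the in-order assumption stated above.
bool hoistLooksProfitable(
    const std::vector<ToyInst> &body,
    const std::unordered_set<std::string> &loadedOutside) {
  std::unordered_set<std::string> loadedInLoop;
  for (const ToyInst &inst : body) {
    if (inst.kind == ToyInst::Load) {
      loadedInLoop.insert(inst.value);
      continue;
    }
    // A use of an in-loop load needs a wait anyway, and that wait also
    // covers the earlier outside loads: hoisting gains nothing.
    if (loadedInLoop.count(inst.value))
      return false;
    // An outside-loaded value is used first: a preheader flush would
    // make the wait inside the loop unnecessary.
    if (loadedOutside.count(inst.value))
      return true;
  }
  return false;
}
```

For the loop shown above (load `v1`, use `v1`, use `v0`) the check rejects hoisting; if `use(v0)` came before `use(v1)`, it would accept it.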