waitcnt vmcnt instructions are currently generated inside a loop body before a use of a value
loaded outside of the loop. In some cases, it is better to flush the vmcnt counter in the loop
preheader before entering the loop body. This patch flushes the counter in the following two situations:
- (pre-GFX10 only) The loop contains no load and at least one store, and uses a vgpr loaded outside of the loop:
    v0 = load(...)
    loop {
      ...
      use(v0)
      store(...)
    }
- The loop uses a vgpr loaded outside of the loop and contains at least one load whose result is unused in the loop (on pre-GFX10 targets, the loop must also contain no store):
    v0 = load(...)
    loop {
      ...
      use(v0)
      v1 = load(...)
    }
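
The two situations above can be read as a single predicate over a few per-loop facts. The following is only a sketch derived from the description, not the code in the patch: `LoopSummary`, its fields, `shouldFlushVmCntInPreheader` and the `IsPreGFX10` flag are hypothetical names introduced for illustration.

```cpp
// Hypothetical per-loop summary; these names are illustrative assumptions
// and do not come from the patch itself.
struct LoopSummary {
  bool ContainsLoad;             // the loop body issues at least one load
  bool ContainsStore;            // the loop body issues at least one store
  bool UsesVgprLoadedOutside;    // the loop uses a vgpr loaded before the loop
  bool ContainsLoadUnusedInLoop; // some load in the loop has no use in the loop
};

// Returns true when, per the description, vmcnt would be flushed in the
// loop preheader. IsPreGFX10 is an assumed target flag.
bool shouldFlushVmCntInPreheader(const LoopSummary &L, bool IsPreGFX10) {
  // Both situations require a use of a value loaded outside of the loop.
  if (!L.UsesVgprLoadedOutside)
    return false;

  // Situation 1 (pre-GFX10 only): no load, at least one store.
  if (IsPreGFX10 && !L.ContainsLoad && L.ContainsStore)
    return true;

  // Situation 2: a load whose result is unused in the loop; on pre-GFX10
  // targets the loop must additionally contain no store.
  if (L.ContainsLoadUnusedInLoop && !(IsPreGFX10 && L.ContainsStore))
    return true;

  return false;
}
```

Under these assumptions, a loop that both loads and stores is a flush candidate only on GFX10 and later.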
MachineBasicBlock successors are stored in a vector, so you can write this more simply, e.g.:
    return Successors.size() == 1 ? Successors[0] : nullptr;
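
For context, here is a minimal, self-contained sketch of that suggestion. `Block` is a hypothetical stand-in for MachineBasicBlock that models only the successor vector, and `getSingleSuccessor` is an assumed name for the helper under review.

```cpp
#include <vector>

// Hypothetical stand-in for MachineBasicBlock: only the successor list is
// modelled here.
struct Block {
  std::vector<Block *> Successors;

  // Returns the only successor, or nullptr when there are zero or several.
  Block *getSingleSuccessor() const {
    return Successors.size() == 1 ? Successors[0] : nullptr;
  }
};
```

Checking the size first means the empty case and the multi-successor case both fall through to nullptr without indexing into the vector.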