With GFX11 the minimum occupancy (for memory bound) shaders should
be set based on the number of addressable VGPRs and total VGPRs,
i.e. it should be higher than 4 in many cases to avoid spilling.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
60,040 ms | x64 debian > MLIR.Examples/standalone::test.toy |
Event Timeline
I'm not sure where this is currently tested to add a new test.
If any reviewers know?
See also D83674 (which I more or less gave up on after the reviewer feedback).
Not sure. test/CodeGen/AMDGPU/perfhint.ll tests whether a function is marked as memory bound or not, but I don't know if we have tests for the scheduler allowing memory bound functions to get a lower occupancy.
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
---|---|---|
965 | I do not understand the logic really. TotalNumVGPR does not tell anything about spilling. I would understand if that is expressed in a maximum occupancy terms, so you return a fraction of maximum occupancy. Moreover, this limit is really about a number of concurrent potential memory requests so that they do not evict each other or conflict. So if your CU can run more waves you likely want to drop occupancy here and give shader more registers for a memory bound case. That said, this is not supposed to be a strong limit, if scheduler have to spill it should drop the occupancy below this limit. At least it was written that way at some point. I.e. I do not think this is a right place to fix spilling, you do not know real pressure here anyway. If this limit causes spilling then scheduler itself have to be fixed. |
So this was based on dealing with the temporary 128 VGPR limit on GFX11, which has now been removed.
On that basis I will probably abandon this revision; however, I do still have concerns about this magic number 4.
Because of this number the scheduler would pick a schedule which used more than 128 VGPRs and spill, when in fact a schedule without spilling was possible.
Essentially when memory bound heuristic has been triggered the scheduler behaves in a manner which artificially reduces occupancy to this value of 4.
With 256 VGPRs addressable then 4 does work as a minimum without spilling for all hardware variants.
Although logically the value is going to be 6 for GFX11 with extra VGPRs in Wave32.
i.e. it should be higher than 4 in many cases to avoid spilling.
As I understand it: the scheduler came up with a schedule that had a max register pressure of 128. For some reason the register allocator (or other post-scheduler passes) failed to meet this limit and actually tried to use more than 128 registers. Normally this would mean that we miss the scheduler's occupancy target, but because 128 was the maximum number of usable registers (on GFX11 before D133723) we spilled instead.
Perhaps it is worth trying to understand *why* we failed to meet the scheduler's estimated limit of 128 registers? It may not be possible for the scheduler to accurately estimate the final register usage in all cases, but perhaps there are some simple problems that we could fix to make the estimate more reliable?
The magic number was borrowed from SC and is empirical. The experience is that increasing occupancy beyond 4 for memory bound kernels does not help performance in average. But it is not linked to spilling in any way, just to cache thrashing. That is likely this number may differ for later targets, but again not linked to spilling.
I do not understand the logic really. TotalNumVGPR does not tell anything about spilling. I would understand if that is expressed in a maximum occupancy terms, so you return a fraction of maximum occupancy.
Moreover, this limit is really about a number of concurrent potential memory requests so that they do not evict each other or conflict. So if your CU can run more waves you likely want to drop occupancy here and give shader more registers for a memory bound case.
That said, this is not supposed to be a strong limit, if scheduler have to spill it should drop the occupancy below this limit. At least it was written that way at some point. I.e. I do not think this is a right place to fix spilling, you do not know real pressure here anyway. If this limit causes spilling then scheduler itself have to be fixed.