Large loads on target that does not useFlatForGlobal have to be split
in regbankselect. This did not happen in case when destination had vgpr
bank and address had sgpr bank.
Instead of checking if address bank is sgpr check bank of the destination.
Details
Diff Detail
Event Timeline
This was detected on gfx7 on large test (few hundred lines) with many uniform loads.
Loads in first ~200 lines had amdgpu.noclobber but after some threshold they did not.
AnnotateUniformValues was relying on MemoryDependenceResults::getSimplePointerDependencyFrom which gives up at some point and returns MemDepResult::getUnknown() resulting in amdgpu.noclobber not being set on address. Such load will be selected using vgpr destination (and sgpr address on gfx7).
In the case of mentioned test, amdgpu.noclobber not being set is fixed by D101962. Thus only mir test.
llvm/test/CodeGen/AMDGPU/GlobalISel/load-constant.96.ll | ||
---|---|---|
449–452 | Why is it a bug fix? Isn't it better to use a scalar load when possible, cos it's less memory traffic? |
llvm/test/CodeGen/AMDGPU/GlobalISel/load-constant.96.ll | ||
---|---|---|
449–452 | It is a silent bug, that load was marked as vgpr dest, and then split into two sgpr dest loads (was meant to be one buffer load according to banks assigned before applyMapping) |
Can you also add an end to end IR test for this case
No longer possible because of D101962 (also was way too large for lit test).
I did find another test, it needs -mattr=+unaligned-access-mode and align 1 to pass through legalizer and get vgpr dst + sgpr address.
llvm/test/CodeGen/AMDGPU/GlobalISel/load-constant.96.ll | ||
---|---|---|
449–452 | Scalar loads don't support unaligned access, so this was wrong to be using them |
LGTM, although I'm a bit concerned we were selecting unaligned scalar loads so there may be another bug hidden in here
clang-format: please reformat the code