With OS type AMDPAL, the scratch descriptor is hardwired to be loaded
from offset 0 of the global information table, whose low pointer is
passed in s0. For a merge shader on gfx9+, the pointer is passed in s8
instead, because the hardware reserves s0-s7.
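
The register choice described above can be sketched as a small standalone function. This is only an illustration under stated assumptions: `Generation`, `globalTablePtrLoSGPR`, and the `IsMergeShader` flag are hypothetical names invented here, not the identifiers used in the patch or in the AMDGPU backend.

```cpp
// Hypothetical stand-in for the subtarget generation; not an LLVM type.
enum class Generation { GFX8 = 8, GFX9 = 9 };

// Returns the index of the SGPR assumed to hold the low half of the
// global information table pointer under OS type AMDPAL.
// Assumption: the caller already knows whether the shader is a merge shader.
unsigned globalTablePtrLoSGPR(Generation Gen, bool IsMergeShader) {
  // For a merge shader on gfx9+, the hardware reserves s0-s7, so the
  // pointer is expected in s8.
  if (IsMergeShader && Gen >= Generation::GFX9)
    return 8;
  // Otherwise the pointer is expected in s0.
  return 0;
}
```

With this sketch, a gfx9 merge shader yields 8, and every other combination yields 0.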
Repository: rL LLVM
Event Timeline
lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
397 ↗ | (On Diff #130287) | Why aren't there corresponding changes in SIISelLowering? I would expect the argument lowering code to need to be aware of this. |
test/CodeGen/AMDGPU/amdpal_scratch_mergeshader.ll | ||
1 ↗ | (On Diff #130287) | Use GCN instead of CHECK with multiple prefixes |
12 ↗ | (On Diff #130287) | You don't need the linkage. Most of these arguments also look unused. Why is most of this function body necessary? Shouldn't you only need a single volatile scratch access? |
lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
397 ↗ | (On Diff #130287) | The frontend does actually set up args for s0-s7 as well as the user sgprs starting at s8. So arg lowering does not need to do anything different. |
test/CodeGen/AMDGPU/amdpal_scratch_mergeshader.ll | ||
---|---|---|
12 ↗ | (On Diff #130287) | Removed the linkage and arg9 onwards. arg0-arg8 are needed. I got that body by using bugpoint on a real shader; any further manual reduction results in the scratch access disappearing. It needs to be an alloca to test what the fix is doing. |
One tiny nitpick; apart from that, LGTM.
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
859 ↗ | (On Diff #133031) | Merged, here and elsewhere? |