With OS type AMDPAL, the scratch descriptor is hardwired to be loaded
from offset 0 of the global information table, whose low pointer is
passed in s0. For a merge shader on gfx9+, the pointer needs to be read
from s8 instead, as the hardware reserves s0-s7.
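
As a rough illustration of the register choice described above, here is a minimal standalone sketch. It is not the code from this patch: `ShaderInfo`, its fields, and `globalTableLowSgpr` are hypothetical names made up for the example, and it assumes the OS type is already known to be AMDPAL.

```cpp
#include <cassert>

// Hypothetical description of the shader being compiled. These names are
// illustrative only and are not LLVM's actual API.
struct ShaderInfo {
  unsigned gfxLevel;  // e.g. 8 for gfx8, 9 for gfx9
  bool isMergeShader; // true if this is a merge shader
};

// Returns the index of the SGPR that holds the low half of the global
// information table pointer. Assumes OS type AMDPAL.
unsigned globalTableLowSgpr(const ShaderInfo &info) {
  // On gfx9+ the hardware reserves s0-s7 for merge shaders, so the
  // pointer is found in s8 instead of s0.
  if (info.gfxLevel >= 9 && info.isMergeShader)
    return 8;
  return 0;
}
```

With this sketch, `globalTableLowSgpr({9, true})` selects s8, while a non-merge shader such as `{9, false}` keeps s0.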
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
| lib/Target/AMDGPU/SIFrameLowering.cpp | ||
|---|---|---|
| 397 | Why aren't there corresponding changes in SIISelLowering? I would expect the argument lowering code would need to be aware of this. | |

| test/CodeGen/AMDGPU/amdpal_scratch_mergeshader.ll | ||
|---|---|---|
| 1 | Use GCN instead of CHECK with multiple prefixes | |
| 12 | You don't need the linkage. Most of these arguments also look unused. Why is most of this function body necessary? Shouldn't you only need a single volatile scratch access? | |
| lib/Target/AMDGPU/SIFrameLowering.cpp | ||
|---|---|---|
| 397 | The frontend does actually set up args for s0-s7 as well as the user sgprs starting at s8. So arg lowering does not need to do anything different. | |
| test/CodeGen/AMDGPU/amdpal_scratch_mergeshader.ll | ||
|---|---|---|
| 12 | Removed the linkage and arg9 onwards. arg0-arg8 are needed. I got that body by using bugpoint on a real shader; any further manual reduction results in the scratch access disappearing. It needs to be an alloca to test what the fix is doing. | |
One tiny nitpick, apart from that LGTM.
| lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
|---|---|---|
| 854 | Merged, here and elsewhere? | |