Generate flat instructions for 64-bit global buffer instructions. In this patch, we defined
a subtarget feature: FlatForGlobal, and add this feature for CI. This means for CI, we will
generate a flat instruction for a 64-bit global buffer reference.
Details
- Reviewers
• tstellarAMD - Commits
- rGb41574a96126: AMDGPU/SI: Use flat for global load/store when targeting HSA
rG9b8a9be05870: AMDGPU/SI: Use flat for global load/store when targeting HSA
rL256282: AMDGPU/SI: Use flat for global load/store when targeting HSA
rL256273: AMDGPU/SI: Use flat for global load/store when targeting HSA
Diff Detail
Event Timeline
The commit message should be prefixed with: AMDGPU/SI:
lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
240 | We don't want to enable FeatureFlatForGlobal globally. It should only be used for HSA. | |
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
417–418 | I think this fix should be in a separate patch. | |
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
995–998 | You should move the check for useFlatForGlobal() into SelectMUBUF() that way you only need to add it in one place. | |
1064–1068 | You won't need this once you have the check in SelectMUBUF(). | |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
73 | You need to initialize this value to something in the AMDGPUSubtarget constructor. 'true' when targeting HSA, 'false' otherwise. | |
lib/Target/AMDGPU/VIInstructions.td | ||
107–112 | SInce the patterns below this will be used for both CI and VI now, they should be moved into CIInstructions.td | |
test/CodeGen/AMDGPU/ci-use-flat-for-global.ll | ||
1–2 | Should add a RUN line for non-hsa: -march=amdgcn instead of -mtriple=amdgcn--amdhsa to make sure we are still using the buffer instructions. |
lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
---|---|---|
102–105 ↗ | (On Diff #43042) | You need to set this value before you call initializeSubtargetDependencies() otherwise users won't be able to toggle this using -mattr=flat-for-global. |
test/CodeGen/AMDGPU/ci-use-flat-for-global.ll | ||
2–3 | I would add two more run line here to test setting the non-default value for each target using -mattr= | |
7 | You can drop the spir_kernel calling convention here. | |
30–37 | Since we are disabling buffer instructions globally, I think you can use a much more simple test case: void @test(i32 addrspace(1)* %out) { entry: store i32 0, i32 addrspace(1)* %out ret void } |
update based on Tom's reviews:
- set FlatForGlobal before "initializeSubtargetDependencies()"
- update LIT test for a simple one and also check around -mattr=(+/-)flat-for-global
lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
---|---|---|
81–82 ↗ | (On Diff #43053) | The test case you added fails now, because the generation has not been initialized yet. I think we can just check for AmdHsaOS() here. |
Hello, tom:
I am taking this afternoon again. Call you make the change or I can do it tomorrow!
Thanks
Changpeng
Change the way of initialization for FlatForGlobal to make -mattr=-flat-for-global work;
Fix LIT test failures due to generating flat instructions instead of buffer ones.
LGTM.
I think it would be good to explain why we are making this change in the commit message. Here is the commit message I wrote when I was getting ready to commit an earlier version of the change. Feel free to use it:
AMDGPU/SI: Use flat for global load/store when targeting HSA For some reason doing executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work.
lib/Target/AMDGPU/CIInstructions.td | ||
---|---|---|
36–37 ↗ | (On Diff #43415) | Unnecessary whitepsace change, please drop this before you commit. |
Commited but was reverted because of a lIT failure on llvm.dbg.ll.
But this is the test I modified and my local test of this passed.
What could happen?
Thanks;
Changpeng
We don't want to enable FeatureFlatForGlobal globally. It should only be used for HSA.