SMEM (scalar memory) opcodes are faster than the VMEM buffer loads they replace, so we want to use them when possible.
Deus Ex performance: +13% (measured with on-demand shader compilation enabled in Deus Ex, so the real improvement should be higher).
Differential D28993
AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load
Status: Abandoned, Public. Authored by mareko on Jan 22 2017, 2:16 PM.
I wonder whether the improvement comes from the intrinsics now being able to use SMEM, or from my fix that lets smrd#_SGPR accept a non-constant offset.
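A minimal C++ sketch of the operand choice at stake here, assuming a simplified model: a constant offset that fits the SMEM immediate field uses the immediate form, any other uniform offset goes in an SGPR (the smrd#_SGPR-style case), and a divergent (VGPR) offset stays on VMEM. `OffsetForm`, `OffsetOperand`, `selectOffsetForm`, and the `maxImmOffsetBytes` limit are hypothetical names, not LLVM APIs; the real immediate range differs between GPU generations.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model, not LLVM code: how the offset operand of a buffer load
// might be matched to an SMEM or VMEM form.
enum class OffsetForm {
  SmemImm,   // constant offset encoded in the instruction's immediate field
  SmemSgpr,  // offset held in an SGPR (the non-constant smrd#_SGPR case)
  Vmem       // divergent offset in a VGPR: keep the VMEM instruction
};

struct OffsetOperand {
  std::optional<uint32_t> constant;  // set when the offset is a compile-time constant
  bool isUniform;                    // true when the value is the same for all lanes
};

// maxImmOffsetBytes is an assumed per-target limit on the immediate field.
OffsetForm selectOffsetForm(const OffsetOperand &off, uint32_t maxImmOffsetBytes) {
  assert(maxImmOffsetBytes > 0 && "target must allow some immediate offset");
  if (off.constant) {
    // Constants are uniform; ones too large for the immediate field can be
    // materialized into an SGPR instead.
    return *off.constant <= maxImmOffsetBytes ? OffsetForm::SmemImm
                                              : OffsetForm::SmemSgpr;
  }
  return off.isUniform ? OffsetForm::SmemSgpr : OffsetForm::Vmem;
}
```

With an assumed 1020-byte limit, a constant offset of 16 selects `SmemImm`, a constant 4096 selects `SmemSgpr`, and a divergent offset selects `Vmem`.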
What if something else has written to the buffer in the same shader? That would make using SMEM instructions illegal.

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there would be a problem if the same memory is bound to two different SSBOs and one of them is written to, unless the SSBO is marked 'restrict'.
Can you be more specific about why it's incorrect? I only see an issue with L1 coherency (GLC=0).

That's what I meant, yes.

Another possible issue is that SMEM instructions ignore some bits of the resource descriptor, so the compiler would need some way of knowing that selecting SMEM does not drop any descriptor bits that matter for the load.

Doesn't this make the change unworkable? Presumably the front end would need to annotate the load in some way to indicate that the transformation is legitimate, in which case you might as well use a different intrinsic. Are there any circumstances where you can determine that this is definitely the case? I have a situation that benefits from this change, but it could equally use the solution in D27586. Perhaps that change could be extended with the non-constant offset support from this review?

Yes, having separate intrinsics like D27586 is preferable, not just because of the sL1 vs. vL1 coherency issue, but also because SMEM instructions differ from VMEM in many ways, and sometimes even same-looking VMEM and SMEM instructions behave differently. It's also important that the s.load intrinsics support non-constant offsets and are lowered to VMEM when the address comes from a VGPR.
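The conditions raised in this thread can be collected into a single check. This is a hedged C++ sketch, not the logic of D28993 or D27586: every field and function name is hypothetical, and the booleans stand in for facts a front end such as Mesa would have to establish (for example, via read-only or 'restrict' SSBOs) before a load can safely take the scalar path.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not LLVM or Mesa code.
enum class LoadPath { Smem, Vmem };

// Facts a front end would need to know about one buffer load.
struct BufferLoadFacts {
  bool writtenInShader;         // the same shader stores to this binding
  bool mayAliasWrittenBinding;  // another stored-to binding may share memory (no 'restrict')
  bool needsIgnoredDescBits;    // the descriptor relies on bits SMEM ignores
  bool offsetInVgpr;            // the offset is divergent (lives in a VGPR)
};

LoadPath chooseLoadPath(const BufferLoadFacts &f) {
  // Buffer stores go through the vector memory path and the scalar cache is
  // not kept coherent with them, so an SMEM load could observe stale data.
  if (f.writtenInShader || f.mayAliasWrittenBinding)
    return LoadPath::Vmem;
  // SMEM would silently drop descriptor behavior the load depends on.
  if (f.needsIgnoredDescBits)
    return LoadPath::Vmem;
  // SMEM needs a uniform address; divergent offsets fall back to VMEM.
  if (f.offsetInVgpr)
    return LoadPath::Vmem;
  return LoadPath::Smem;
}
```

For example, a load from a read-only, non-aliased SSBO at a uniform offset, with a descriptor that needs no SMEM-ignored bits, returns `LoadPath::Smem`; any one of the four conditions being true sends it back to `LoadPath::Vmem`.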
Revision Contents (Diff 85298)
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
lib/Target/AMDGPU/SIInstrInfo.cpp
lib/Target/AMDGPU/SIRegisterInfo.td
lib/Target/AMDGPU/SMInstructions.td
test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.ll
I'm not sure I understand the point of the AnyReg parameter.