This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Try to select SMEM opcodes for llvm.amdgcn.buffer.load
AbandonedPublic

Authored by mareko on Jan 22 2017, 2:16 PM.

Details

Summary

SMEM opcodes are faster, so we want to use them if possible.

Deus Ex performance: +13%
(with on-demand shader compilation enabled in Deus Ex, so the real
improvement should be higher)

Event Timeline

mareko created this revision.Jan 22 2017, 2:16 PM

I wonder if the improvement comes from the fact that the intrinsics can use SMEM now, or the fact I fixed smrd#_SGPR to accept a non-constant offset.

arsenm added inline comments.Jan 23 2017, 11:37 AM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1190

I'm not sure I understand what the point of the AnyReg parameter is

1311–1312

Can you move this getGeneration check into a new Subtarget->hasGLCForSMEM()?

lib/Target/AMDGPU/SIInstrInfo.cpp
2973

Do you also ne ed to handle x8/x16? (those can probably be a separate patch, but for now an assert/unreachable woul dwork)

2995–2996

These should use getNamedOperand

2997–2998

These look backwards from the order (it's also called offset, not inst_offset).

test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.ll
144

Tests seem to be missing for the moveToVALU path (or should those all be covered by the existing intrinsic tests?)

What if something else has written to the buffer in the same shader? That would make using smem instructions illegal.

nhaehnle edited edge metadata.Feb 1 2017, 3:46 AM

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

mareko added a comment.Feb 1 2017, 3:56 AM

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Can you be more specific about why it's incorrect? I only see an issue with L1 coherency (GLC=0).

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1190

I'll change that to "AllowNonConst". It should be obvious if you look below.

1311–1312

Sure.

lib/Target/AMDGPU/SIInstrInfo.cpp
2973

No. They can't be selected for amdgcn.buffer.load.

2997–2998

The order is the same as in the .inc files.

test/CodeGen/AMDGPU/llvm.amdgcn.buffer.load.ll
144

They are all covered by existing tests.

I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.

Can you be more specific about why it's incorrect? I only see an issue with L1 coherency (GLC=0).

That's what I meant, yes.

Another possible issue is that SMEM instructions ignore bits of the resource descriptor. So you would need some way to tell the compiler that it wouldn't be ignoring some relevant resource bits by selecting to SMEM.

t-tye added a subscriber: t-tye.Mar 22 2017, 6:38 PM
tony-tye removed a subscriber: tony-tye.Mar 22 2017, 6:50 PM

Another possible issue is that SMEM instructions ignore bits of the resource descriptor. So you would need some way to tell the compiler that it wouldn't be ignoring some relevant resource bits by selecting to SMEM.

Doesn't this make this change unworkable? Presumably the front-end would need to annotate in some way to indicate that this is a legitimate transformation, in which case you might as well use a different intrinsic anyway. Are there any circumstances where you can determine if this is definitely the case?

I've got a situation that benefits from this change, but equally could use the solution in D27586. Perhaps that change could be enhanced with the non-const offset change in this review?

mareko abandoned this revision.May 3 2017, 9:30 AM

Another possible issue is that SMEM instructions ignore bits of the resource descriptor. So you would need some way to tell the compiler that it wouldn't be ignoring some relevant resource bits by selecting to SMEM.

Doesn't this make this change unworkable? Presumably the front-end would need to annotate in some way to indicate that this is a legitimate transformation, in which case you might as well use a different intrinsic anyway. Are there any circumstances where you can determine if this is definitely the case?

I've got a situation that benefits from this change, but equally could use the solution in D27586. Perhaps that change could be enhanced with the non-const offset change in this review?

Yes, having separate intrinsics like D27586 is preferable not just because of the sL1 vs vL1 coherency stuff, but also because SMEM instructions have many differences compared to VMEM and sometimes even the same looking VMEM and SMEM instructions have different behavior. It's also important that s.load intrinsics support non-constant offsets and are lowered to VMEM when the address comes from a VGPR.