This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] gfx11 ldsdir intrinsics and ISel
ClosedPublic

Authored by Joe_Nash on Jun 13 2022, 8:50 AM.

Details

Diff Detail

Event Timeline

Joe_Nash created this revision.Jun 13 2022, 8:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2022, 8:50 AM
Joe_Nash requested review of this revision.Jun 13 2022, 8:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2022, 8:50 AM
Joe_Nash added a reviewer: Restricted Project.Jun 13 2022, 8:51 AM
foad added a comment.Jun 13 2022, 8:57 AM

Can we also add something like test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll and test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll to test codegen?

Joe_Nash updated this revision to Diff 436442.Jun 13 2022, 9:26 AM

added ISA codegen test.

Joe_Nash added a comment.EditedJun 13 2022, 9:27 AM

Can we also add something like test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll and test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll to test codegen?

I have added something. Just to note, changes to SIInsertWaitcnt are coming which will affect these tests.

arsenm added inline comments.Jun 14 2022, 9:43 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1464

Why is the return type hardcoded to float instead of mangled for a load?

1465

Also would expect this to be an addrspace 3 pointer

critson added inline comments.Jun 14 2022, 8:11 PM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1465

This is setting M0, the same as parameter loads.
Construction of M0 values is handled by the front-end because it is not a pure address pointer but rather a combination of address offset and flags describing the data type.
I guess we could rework this to form M0 in the backend based on an address 3 pointer and a return type.

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll
3 ↗(On Diff #436442)

There should probably be a move to m0 here?

nhaehnle added inline comments.Jun 15 2022, 12:54 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1465

I think it makes sense to have an intrinsic that closely corresponds to the instruction itself. Maybe add a comment explaining this fact about M0?

I agree with Matt that the return value should be mangled.

Joe_Nash updated this revision to Diff 437146.Jun 15 2022, 7:06 AM

changed return type of lds_direct_load and commented on input argument type

Joe_Nash added inline comments.Jun 15 2022, 7:07 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1465

I have commented on the input argument, and changed the return type. I am not familiar with the type mangling here, does this look correct?

Joe_Nash updated this revision to Diff 437154.Jun 15 2022, 7:21 AM
Joe_Nash marked an inline comment as done.

added s_mov m0 to param.load test

Joe_Nash updated this revision to Diff 437327.Jun 15 2022, 1:45 PM

removed builtin from intrinsic definition

I think all outstanding issues are addressed; please take another look.

arsenm added inline comments.Jun 15 2022, 1:52 PM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll
2 ↗(On Diff #437327)

Missing globalisel codegen tests

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll
2 ↗(On Diff #437327)

Missing globalisel codegen tests

Joe_Nash updated this revision to Diff 437522.Jun 16 2022, 6:18 AM

added globalisel runlines to codegen test

Joe_Nash marked 2 inline comments as done.Jun 16 2022, 6:18 AM
rampitec added inline comments.Jun 16 2022, 11:23 AM
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96

How does that work? It seems to ignore M0 argument.

Joe_Nash added inline comments.Jun 16 2022, 11:35 AM
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96

Because LDS_DIRECT_LOAD has an implicit use of M0. In llvm.amdgcn.lds.direct.load.ll it seems to work as it should given that.

rampitec added inline comments.Jun 16 2022, 11:42 AM
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96

Right, it uses M0, but where is a link from the call argument and actual store of that value into M0?

Joe_Nash added inline comments.Jun 16 2022, 1:44 PM
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96

I don't know, but maybe this helps to explain? Dump from AMDGPUGenGlobalISel.inc

// Label 2092: @108525
GIM_Try, /*On fail goto*//*Label 2093*/ 108577, // Rule ID 3516 //
  GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, Intrinsic::amdgcn_lds_direct_load,
  GIM_CheckType, /*MI*/0, /*Op*/0, /*Type*/GILLT_s32,
  GIM_CheckType, /*MI*/0, /*Op*/2, /*Type*/GILLT_s32,
  GIM_CheckRegBankForClass, /*MI*/0, /*Op*/0, /*RC*/AMDGPU::VGPR_32RegClassID,
  GIM_CheckRegBankForClass, /*MI*/0, /*Op*/2, /*RC*/AMDGPU::M0_CLASSRegClassID,
  // (intrinsic_w_chain:{ *:[f32] } 1864:{ *:[iPTR] }, M0:{ *:[i32] })  =>  (LDS_DIRECT_LOAD:{ *:[f32] } 0:{ *:[i8] })
  GIR_BuildMI, /*InsnID*/1, /*Opcode*/TargetOpcode::COPY,
  GIR_AddRegister, /*InsnID*/1, AMDGPU::M0, /*AddRegisterRegFlags*/RegState::Define,
  GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/2, // M0
  GIR_BuildMI, /*InsnID*/0, /*Opcode*/AMDGPU::LDS_DIRECT_LOAD,
  GIR_Copy, /*NewInsnID*/0, /*OldInsnID*/0, /*OpIdx*/0, // vdst
  GIR_AddImm, /*InsnID*/0, /*Imm*/0,
  GIR_MergeMemOperands, /*InsnID*/0, /*MergeInsnID's*/0, GIU_MergeMemOperands_EndOfList,
  GIR_EraseFromParent, /*InsnID*/0,
  GIR_ConstrainSelectedInstOperands, /*InsnID*/0,
  // GIR_Coverage, 3516,
  GIR_Done,
// Label 2093: @108577
GIM_Reject,
rampitec accepted this revision.Jun 16 2022, 1:55 PM

LGTM

llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96

OK, I must admit I still do not understand how does it work, but it obviously does.

This revision is now accepted and ready to land.Jun 16 2022, 1:55 PM
arsenm added inline comments.Jun 16 2022, 2:00 PM
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td
96

The physical register takes the place in the pattern of what normally would be a named virtual input, like VReg_32:$src1

This revision was landed with ongoing or failed builds.Jun 17 2022, 6:32 AM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Jun 17 2022, 7:39 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1461

The __int_* prefix doesn't make much sense. I would suggest either using the tablegen name (int_amdgcn_lds_direct_load) or preferably the LLVM IR name (llvm.amdgcn.lds.direct.load).

Joe_Nash added inline comments.Jun 17 2022, 9:59 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1461