Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can we also add something like test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll and test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll to test codegen?
I have added something. Just to note, changes to SIInsertWaitcnt are coming which will affect these tests.
| llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
|---|---|---|
| 1495 | This is setting M0, the same as parameter loads. | |
| llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll | ||
| 4 | There should probably be a move to m0 here? | |
| llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
|---|---|---|
| 1495 | I think it makes sense to have an intrinsic that closely corresponds to the instruction itself. Maybe add a comment explaining this fact about M0? I agree with Matt that the return value should be mangled. | |
| llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
|---|---|---|
| 1495 | I have commented on the input argument, and changed the return type. I am not familiar with the type mangling here, does this look correct? | |
| llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
|---|---|---|
| 96 | How does that work? It seems to ignore M0 argument. | |
| llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
|---|---|---|
| 96 | Because LDS_DIRECT_LOAD has an implicit use of M0. In llvm.amdgcn.lds.direct.load.ll it seems to work as it should given that. | |
| llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
|---|---|---|
| 96 | Right, it uses M0, but where is a link from the call argument and actual store of that value into M0? | |
| llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
|---|---|---|
| 96 | I don't know, but maybe this helps to explain? Dump from AMDGPUGenGlobalISel.inc // Label 2092: @108525
GIM_Try, /*On fail goto*//*Label 2093*/ 108577, // Rule ID 3516 //
GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, Intrinsic::amdgcn_lds_direct_load,
GIM_CheckType, /*MI*/0, /*Op*/0, /*Type*/GILLT_s32,
GIM_CheckType, /*MI*/0, /*Op*/2, /*Type*/GILLT_s32,
GIM_CheckRegBankForClass, /*MI*/0, /*Op*/0, /*RC*/AMDGPU::VGPR_32RegClassID,
GIM_CheckRegBankForClass, /*MI*/0, /*Op*/2, /*RC*/AMDGPU::M0_CLASSRegClassID,
// (intrinsic_w_chain:{ *:[f32] } 1864:{ *:[iPTR] }, M0:{ *:[i32] }) => (LDS_DIRECT_LOAD:{ *:[f32] } 0:{ *:[i8] })
GIR_BuildMI, /*InsnID*/1, /*Opcode*/TargetOpcode::COPY,
GIR_AddRegister, /*InsnID*/1, AMDGPU::M0, /*AddRegisterRegFlags*/RegState::Define,
GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/2, // M0
GIR_BuildMI, /*InsnID*/0, /*Opcode*/AMDGPU::LDS_DIRECT_LOAD,
GIR_Copy, /*NewInsnID*/0, /*OldInsnID*/0, /*OpIdx*/0, // vdst
GIR_AddImm, /*InsnID*/0, /*Imm*/0,
GIR_MergeMemOperands, /*InsnID*/0, /*MergeInsnID's*/0, GIU_MergeMemOperands_EndOfList,
GIR_EraseFromParent, /*InsnID*/0,
GIR_ConstrainSelectedInstOperands, /*InsnID*/0,
// GIR_Coverage, 3516,
GIR_Done,
// Label 2093: @108577
GIM_Reject, | |
LGTM
| llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
|---|---|---|
| 96 | OK, I must admit I still do not understand how does it work, but it obviously does. | |
| llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
|---|---|---|
| 96 | The physical register takes the place in the pattern of what normally would be a named virtual input, like VReg_32:$src1 | |
| llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
|---|---|---|
| 1491 | The __int_* prefix doesn't make much sense. I would suggest either using the tablegen name (int_amdgcn_lds_direct_load) or preferably the LLVM IR name (llvm.amdgcn.lds.direct.load). | |
| llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
|---|---|---|
| 1491 | Done, see | |
The __int_* prefix doesn't make much sense. I would suggest either using the tablegen name (int_amdgcn_lds_direct_load) or preferably the LLVM IR name (llvm.amdgcn.lds.direct.load).