Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can we also add something like test/CodeGen/AMDGPU/llvm.amdgcn.lds.direct.load.ll and test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll to test codegen?
I have added something. Just to note, changes to SIInsertWaitcnt are coming which will affect these tests.
llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
1495 | This is setting M0, the same as parameter loads. | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.lds.param.load.ll | ||
4 | There should probably be a move to m0 here? |
llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
1495 | I think it makes sense to have an intrinsic that closely corresponds to the instruction itself. Maybe add a comment explaining this fact about M0? I agree with Matt that the return value should be mangled. |
llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
1495 | I have commented on the input argument, and changed the return type. I am not familiar with the type mangling here, does this look correct? |
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
---|---|---|
96 | How does that work? It seems to ignore M0 argument. |
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
---|---|---|
96 | Because LDS_DIRECT_LOAD has an implicit use of M0. In llvm.amdgcn.lds.direct.load.ll it seems to work as it should given that. |
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
---|---|---|
96 | Right, it uses M0, but where is a link from the call argument and actual store of that value into M0? |
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
---|---|---|
96 | I don't know, but maybe this helps to explain? Dump from AMDGPUGenGlobalISel.inc // Label 2092: @108525 GIM_Try, /*On fail goto*//*Label 2093*/ 108577, // Rule ID 3516 // GIM_CheckIntrinsicID, /*MI*/0, /*Op*/1, Intrinsic::amdgcn_lds_direct_load, GIM_CheckType, /*MI*/0, /*Op*/0, /*Type*/GILLT_s32, GIM_CheckType, /*MI*/0, /*Op*/2, /*Type*/GILLT_s32, GIM_CheckRegBankForClass, /*MI*/0, /*Op*/0, /*RC*/AMDGPU::VGPR_32RegClassID, GIM_CheckRegBankForClass, /*MI*/0, /*Op*/2, /*RC*/AMDGPU::M0_CLASSRegClassID, // (intrinsic_w_chain:{ *:[f32] } 1864:{ *:[iPTR] }, M0:{ *:[i32] }) => (LDS_DIRECT_LOAD:{ *:[f32] } 0:{ *:[i8] }) GIR_BuildMI, /*InsnID*/1, /*Opcode*/TargetOpcode::COPY, GIR_AddRegister, /*InsnID*/1, AMDGPU::M0, /*AddRegisterRegFlags*/RegState::Define, GIR_Copy, /*NewInsnID*/1, /*OldInsnID*/0, /*OpIdx*/2, // M0 GIR_BuildMI, /*InsnID*/0, /*Opcode*/AMDGPU::LDS_DIRECT_LOAD, GIR_Copy, /*NewInsnID*/0, /*OldInsnID*/0, /*OpIdx*/0, // vdst GIR_AddImm, /*InsnID*/0, /*Imm*/0, GIR_MergeMemOperands, /*InsnID*/0, /*MergeInsnID's*/0, GIU_MergeMemOperands_EndOfList, GIR_EraseFromParent, /*InsnID*/0, GIR_ConstrainSelectedInstOperands, /*InsnID*/0, // GIR_Coverage, 3516, GIR_Done, // Label 2093: @108577 GIM_Reject, |
LGTM
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
---|---|---|
96 | OK, I must admit I still do not understand how does it work, but it obviously does. |
llvm/lib/Target/AMDGPU/LDSDIRInstructions.td | ||
---|---|---|
96 | The physical register takes the place in the pattern of what normally would be a named virtual input, like VReg_32:$src1 |
llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
1491 | The __int_* prefix doesn't make much sense. I would suggest either using the tablegen name (int_amdgcn_lds_direct_load) or preferably the LLVM IR name (llvm.amdgcn.lds.direct.load). |
llvm/include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
1491 | Done, see |
The __int_* prefix doesn't make much sense. I would suggest either using the tablegen name (int_amdgcn_lds_direct_load) or preferably the LLVM IR name (llvm.amdgcn.lds.direct.load).