Added the intrinsics llvm.amdgcn.interp.p1.f16() and
llvm.amdgcn.interp.p2.f16() and related LIT test.
The p1 intrinsic generates code appropriate for both 16 and 32
bank LDS.
Paths
| Differential D46754
[AMDGPU] Add intrinsics for 16 bit interpolation ClosedPublic Authored by timcorringham on May 11 2018, 6:48 AM.
Details
Summary Added the intrinsics llvm.amdgcn.interp.p1.f16() and The p1 intrinsic generates code appropriate for both 16 and 32
Diff Detail
Event TimelineHerald added subscribers: llvm-commits, t-tye, tpr and 6 others. · View Herald TranscriptMay 11 2018, 6:48 AM This revision now requires changes to proceed.May 14 2018, 4:26 AM
Comment Actions Corrected the ordering of operands to interp_p2_f16, added lowered I have not overloaded the intrinsics as I don't believe it is possible Comment Actions
Is the extra parameter you're referring the high parameter to change where the register is read from the high or low bits? That shouldn't be exposed in the intrinsic at all. Eliminating the high bit extraction is a codegen optimization pattern Comment Actions
Or is this bit controlling the weird load from memory? The manual isn't particularly clear to me. I see mention of LDs loads, but also op_sel control of destination bits
Comment Actions Even without the high operand I don't think it is possible to overload interp_p1 and interp_p1_f16 as they would have identical types - there is nothing to disambiguate them. Comment Actions
Yes, the high bit controls the LDS access. As all the operands to interp_p1_f16 are the same types as for the 32 bit variant, I don't know of any way to deduce the value of the high bit if it isn't specified explicitly.
Comment Actions Change the omod operand type to be i32 rather than i1, to avoid Comment Actions [AMDGPU] Add intrinsics for 16 bit interpolation Added a new pass to to ensure that the 16 bit interpolation Comment Actions A slighly more performant implementation of the pass to add any Comment Actions Refactored pass to insert rounding mode to use a style more in line timcorringham marked 2 inline comments as done. Comment ActionsRebased, and amended LIT test now that the required mode register
timcorringham marked an inline comment as done. Comment ActionsExtended llvm.amdgcn.interp.f16.ll to check that m0 is set before timcorringham added inline comments.
This revision is now accepted and ready to land.Jan 25 2019, 9:13 AM Closed by commit rL352357: [AMDGPU] Add intrinsics for 16 bit interpolation (authored by timcorringham). · Explain WhyJan 28 2019, 5:48 AM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 153913 include/llvm/IR/IntrinsicsAMDGPU.td
lib/Target/AMDGPU/AMDGPU.h
lib/Target/AMDGPU/AMDGPUISelLowering.h
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
lib/Target/AMDGPU/AMDGPUInstrInfo.td
lib/Target/AMDGPU/AMDGPUSearchableTables.td
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
lib/Target/AMDGPU/CMakeLists.txt
lib/Target/AMDGPU/SIDefines.h
lib/Target/AMDGPU/SIISelLowering.cpp
lib/Target/AMDGPU/SIInstrFormats.td
lib/Target/AMDGPU/SIInstrInfo.h
lib/Target/AMDGPU/SIModeRegister.cpp
lib/Target/AMDGPU/VOP1Instructions.td
lib/Target/AMDGPU/VOP3Instructions.td
test/CodeGen/AMDGPU/llvm.amdgcn.interp.f16.ll
test/CodeGen/AMDGPU/mode-register.mir
|
You should add name mangling to the existing intrinsics rather than new intrinsics. The builtin declaration needs to be done in clang for the GCCBuiltin