Added the intrinsics llvm.amdgcn.interp.p1.f16() and
llvm.amdgcn.interp.p2.f16() and a related LIT test.
The p1 intrinsic generates code appropriate for both 16-bank and
32-bank LDS.
Differential D46754
[AMDGPU] Add intrinsics for 16 bit interpolation
Authored by timcorringham on May 11 2018, 6:48 AM.
Event Timeline
Comment Actions
Corrected the ordering of operands to interp_p2_f16, added lowered
I have not overloaded the intrinsics as I don't believe it is possible

Comment Actions
Is the extra parameter you're referring to the high parameter, which changes whether the register is read from the high or low bits? That shouldn't be exposed in the intrinsic at all. Eliminating the high bit extraction is a codegen optimization pattern

Comment Actions
Or is this bit controlling the weird load from memory? The manual isn't particularly clear to me. I see mention of LDS loads, but also op_sel control of destination bits

Comment Actions
Even without the high operand I don't think it is possible to overload interp_p1 and interp_p1_f16, as they would have identical types, so there is nothing to disambiguate them.

Comment Actions
Yes, the high bit controls the LDS access. As all the operands to interp_p1_f16 are the same types as for the 32 bit variant, I don't know of any way to deduce the value of the high bit if it isn't specified explicitly.
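
To illustrate why the high bit has to be passed explicitly, here is a minimal conceptual sketch in Python. It assumes a simplified model in which one 32-bit word holds two f16 values; the helper names `pack_two_halves` and `read_half` and the packing layout are hypothetical and are not taken from the patch or the hardware manual.

```python
import struct

def pack_two_halves(lo: float, hi: float) -> int:
    """Pack two f16 values into one 32-bit word (assumed layout:
    lo in bits 0..15, hi in bits 16..31)."""
    lo_bits, = struct.unpack("<H", struct.pack("<e", lo))
    hi_bits, = struct.unpack("<H", struct.pack("<e", hi))
    return (hi_bits << 16) | lo_bits

def read_half(word: int, high: bool) -> float:
    """Return the f16 stored in the high or low 16 bits of the word.
    The 'high' flag is the only thing that tells the two reads apart."""
    bits = (word >> 16) & 0xFFFF if high else word & 0xFFFF
    value, = struct.unpack("<e", struct.pack("<H", bits))
    return value

# The word has the same type whichever half is wanted, so the choice
# cannot be inferred from the operand types alone.
word = pack_two_halves(1.5, -2.0)
low_value = read_half(word, high=False)
high_value = read_half(word, high=True)
```

In this model both reads take an identically typed word, which mirrors the point made above: nothing in the operand types says which half is meant.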
Comment Actions
Change the omod operand type to be i32 rather than i1, to avoid

Comment Actions
[AMDGPU] Add intrinsics for 16 bit interpolation
Added a new pass to ensure that the 16 bit interpolation

Comment Actions
A slightly more performant implementation of the pass to add any

Comment Actions
Refactored pass to insert rounding mode to use a style more in line

Comment Actions
Rebased, and amended LIT test now that the required mode register

Comment Actions
Extended llvm.amdgcn.interp.f16.ll to check that m0 is set before
You should add name mangling to the existing intrinsics rather than adding new intrinsics. The builtin declaration needs to be done in clang for the GCCBuiltin
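
The overloading concern raised earlier in the thread can be sketched with a simplified model of type-suffix name mangling. This assumes a mangled name is just the base name followed by one suffix per overloaded type; the helper `mangled_name` and the choice of overloaded type slot for interp.p1 are hypothetical, and real LLVM mangling has more cases than this.

```python
from typing import Sequence

def mangled_name(base: str, overloaded_types: Sequence[str]) -> str:
    """Simplified model: append one suffix per overloaded type."""
    return ".".join([base, *overloaded_types])

# Overloading works when the overloaded type differs between variants.
fma_f32 = mangled_name("llvm.fma", ["f32"])
fma_f16 = mangled_name("llvm.fma", ["f16"])

# Hypothetical: if the 32 bit and 16 bit p1 variants shared one overloaded
# intrinsic and had identical types (as stated in the thread), the
# suffixes would be identical too.
p1_32bit = mangled_name("llvm.amdgcn.interp.p1", ["f32"])
p1_16bit = mangled_name("llvm.amdgcn.interp.p1", ["f32"])
names_collide = p1_32bit == p1_16bit
```

Under these assumptions the two p1 variants mangle to the same name, which matches the author's objection that there is nothing to disambiguate them.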