so that the stack access is avoided.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2710 ↗ | (On Diff #426278) | I don’t think these should be legal. We don’t naturally have 8 X 16 operations. A lowering that splits the vector would avoid introducing the wider registers and may combine better |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2710 ↗ | (On Diff #426278) | We actually do have these operands: v_smfmac_f32_16x16x32_f16 v_smfmac_f32_32x32x16_f16 v_smfmac_f32_16x16x32_bf16 v_smfmac_f32_32x32x16_bf16 |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2710 ↗ | (On Diff #426278) | And, even if we think that we better handle it by splitting the vector, then we can just materialize scalar_to_vector as build_vector since build_vector already has custom lowering for v8i16/v8f16 by splitting these types. I experimented it, I see better ISEL output in this case, and also final ISA looks good - one shift and one pack operation is got eliminated. I will update the patch with this change. Let's take a look at it and discuss it. |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2710 ↗ | (On Diff #426278) | Please do. Offhand the patch looks good to me. |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2715 ↗ | (On Diff #426278) | It’s also not obvious what register clas me this will end up picking |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
2715 ↗ | (On Diff #426278) | Legal class for VT? Note exactly the same code above. |
Fix review comments (by Matt), by materializing custom lowering of
scalar_to_vector via build_vector since build_vector already specially
handles v8i16/v8f16 types.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
5973 | Why not undef? It is less operations. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
5972 | call getUNDEF() once. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
5978 | This again giving same ISA for (new) lit tests which we were getting when lowered to insert_subreg. |
Not much point in the assert since this works for any size