This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix scalar_to_vector for v8i16/v8f16
ClosedPublic

Authored by hsmhsm on May 1 2022, 2:27 AM.

Details

Summary

so that the stack access is avoided.

Diff Detail

Event Timeline

hsmhsm created this revision.May 1 2022, 2:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2022, 2:27 AM
hsmhsm requested review of this revision.May 1 2022, 2:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2022, 2:27 AM
arsenm added inline comments.May 1 2022, 6:22 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2710

I don’t think these should be legal. We don’t naturally have 8 X 16 operations. A lowering that splits the vector would avoid introducing the wider registers and may combine better

rampitec added inline comments.May 2 2022, 11:10 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2710

We actually do have these operands:

v_smfmac_f32_16x16x32_f16
v_smfmac_f32_32x32x16_f16
v_smfmac_f32_16x16x32_bf16
v_smfmac_f32_32x32x16_bf16
hsmhsm added inline comments.May 2 2022, 11:27 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2710

And, even if we think that we better handle it by splitting the vector, then we can just materialize scalar_to_vector as build_vector since build_vector already has custom lowering for v8i16/v8f16 by splitting these types.

I experimented it, I see better ISEL output in this case, and also final ISA looks good - one shift and one pack operation is got eliminated. I will update the patch with this change. Let's take a look at it and discuss it.

rampitec added inline comments.May 2 2022, 11:36 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2710

Please do. Offhand the patch looks good to me.

arsenm added inline comments.May 2 2022, 11:47 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2715

It’s also not obvious what register clas me this will end up picking

rampitec added inline comments.May 2 2022, 11:53 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2715

Legal class for VT? Note exactly the same code above.

hsmhsm updated this revision to Diff 426470.May 2 2022, 11:56 AM

Fix review comments (by Matt), by materializing custom lowering of
scalar_to_vector via build_vector since build_vector already specially
handles v8i16/v8f16 types.

rampitec added inline comments.May 2 2022, 12:02 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5975

Why not undef? It is less operations.

arsenm added inline comments.May 2 2022, 12:07 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5971

Not much point in the assert since this works for any size

5980

Just return the build_vecotr. You are potentially missing combine opportunities by directly lowering it

hsmhsm updated this revision to Diff 426503.May 2 2022, 1:13 PM

Fix review comments by Stas and Matt.

rampitec added inline comments.May 2 2022, 1:15 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5974

call getUNDEF() once.

hsmhsm marked 2 inline comments as done.May 2 2022, 1:17 PM
hsmhsm added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
5980

This again giving same ISA for (new) lit tests which we were getting when lowered to insert_subreg.

hsmhsm updated this revision to Diff 426507.May 2 2022, 1:25 PM

Fix further review comments by Stas.

hsmhsm marked an inline comment as done.May 2 2022, 1:25 PM
rampitec accepted this revision.May 2 2022, 1:30 PM

LGTM

This revision is now accepted and ready to land.May 2 2022, 1:30 PM
This revision was landed with ongoing or failed builds.May 2 2022, 7:31 PM
This revision was automatically updated to reflect the committed changes.
hsmhsm marked an inline comment as done.