This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GISel] Widen s16 SHUFFLE_VECTOR where there are no scalar pack insts
AbandonedPublic

Authored by Pierre-vh on Oct 28 2022, 1:13 AM.

Details

Summary

On targets where we don't have scalar pack insts, it's more efficient to first widen the SHUFFLE_VECTOR before lowering. It avoids a bunch of useless bit-manipulation instructions.

This also:

  • Fixes a small bug in the LegalizationArtefactCombiner that was exposed by this patch (a few tests were crashing because of it)
  • Adds logic to the Legalizer so it can handle widening G_SHUFFLE_VECTORS

Diff Detail

Event Timeline

Pierre-vh created this revision.Oct 28 2022, 1:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 1:13 AM
Pierre-vh requested review of this revision.Oct 28 2022, 1:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 1:13 AM
arsenm accepted this revision.Nov 1 2022, 1:22 PM

LGTM with nit

llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
732

This is redundant with the assert in operator->

This revision is now accepted and ready to land.Nov 1 2022, 1:22 PM
arsenm added inline comments.Nov 1 2022, 1:25 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1555

I thought you were moving towards not using shuffle_vector for packed cases too.

Plus the packed cases still need handling for non-16 bit elements

Pierre-vh updated this revision to Diff 472540.Nov 2 2022, 2:10 AM
Pierre-vh marked 2 inline comments as done.

Rebase

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1555

I'm not sure I understand.

With D135145, the goal is to use shuffle_vector to replace insert_vector_elt in most cases (even packed ones) in a global combine.
The reason why this patch exists is because D135145 alone would cause a lot of regressions in the codebase for the no-pack-insts case due to the differences in how insert_vector_elt & shuffle_vector are lowered (shuffle implies a lot more bit manipulation stuff). If we widen the shuffle vectors before lowering them for those targets, codegen seems much better.

arsenm added inline comments.Nov 2 2022, 4:56 PM
llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
728–732

D137273 is a standalone fix for this

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1555

The only cases we have that look anything like a legal vector_shuffle is for 2 element masks (which is what you removed in c93104073c8a864113a42cde0cceb3e9c21bbf8d). That roughly corresponds with op_sel* modifiers for VOP3P instructions. This case is so narrow that it can also be handled with some shifts and casts. The downside of the shuffle is more general bit combines are less likely to fire on it.

We do not want shuffle vector for "most cases" of insert_vector_elt. The most common case for incoming IR is a sequence of insertelements which should fold to a build_vector. We also would not want an individual insertelement to turn into a shuffle. If we want shuffles to reach the selector, the only cases would be where it would be more convenient to match VO3P modifiers. In that case, we would also want wider shuffles to be split into 2 element pieces (cases where the shuffles remain isolated to two neighboring elements is a narrow narrow case where this could happen)

At the moment I don't see any reason to produce shuffles differently than what DAGCombiner does today

arsenm added inline comments.Nov 2 2022, 4:57 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1555

In principal this promotion shouldn't improve anything, if we had a sufficiently good combiner. Here and for most operations, it's probably giving the legalizer less work to pre-promote to 32-bit (which is closer to emulating the legalization process in the DAG)

Pierre-vh added inline comments.Nov 3 2022, 1:26 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1555

Does that mean this patch should be abandoned and I should look into adding new combines instead if I want to improve the codegen cases this patch affects?

And for D135145, does it also need to be abandoned as well?

Pierre-vh added inline comments.Nov 3 2022, 1:48 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1555

If D135145 needs to be abandoned then what is the alternative? I remember we discussed quite a lot about this and I thought using shuffle_vector was the solution?

Pierre-vh abandoned this revision.Nov 4 2022, 2:05 AM

D137273 fixes the codegen cases I was interested in, it seems.