This is an archive of the discontinued LLVM Phabricator instance.

[x86][CGP] enable target hook to sink funnel shift intrinsic's splatted shift amount
ClosedPublic

Authored by spatel on May 11 2020, 9:03 AM.

Details

Summary

SDAG suffers when it can't see that a funnel operand is a splat value (due to single-basic-block visibility), so invert the normal loop hoisting rules to move a splat op closer to its use.

This would be part 1 of an enhancement similar to D63233.

This is needed to re-fix PR37426:
https://bugs.llvm.org/show_bug.cgi?id=37426
...because we got better at canonicalizing IR to funnel shift intrinsics.

The existing CGP code for shift opcodes is likely overstepping what it was intended to do, so that will be fixed in a follow-up.

Diff Detail

Event Timeline

spatel created this revision.May 11 2020, 9:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2020, 9:03 AM
RKSimon added inline comments.May 11 2020, 9:21 AM
llvm/lib/CodeGen/CodeGenPrepare.cpp
6439 ↗(On Diff #263181)

Please can you mention that we use it for funnels/rotates in isVectorShiftByScalarCheap descriptions in TargetLowering.h/X86ISelLowering.cpp

spatel marked an inline comment as done.May 11 2020, 9:22 AM
spatel added inline comments.
llvm/test/Transforms/CodeGenPrepare/X86/x86-shuffle-sink.ll
219

The labeling here is a bit misleading - "AVX" means both AVX2 and AVX512, but not AVX1; there is no AVX1 run line on this file. More specific testing is shown in the x86 codegen file.

spatel updated this revision to Diff 263196.May 11 2020, 9:35 AM
spatel marked an inline comment as done.

Patch updated:
Added header comments to indicate that the hook applies to shifts and funnel shifts.

RKSimon accepted this revision.May 12 2020, 2:29 AM

LGTM - thanks

This revision is now accepted and ready to land.May 12 2020, 2:29 AM

See comments in D78728.
After thinking about this a bit more, it seems the existing code can overstep what it was meant to do, and there's a likely better way to reorganize it. I'm going to investigate that before trying to push this.

spatel updated this revision to Diff 263411.May 12 2020, 6:16 AM
spatel retitled this revision from [CGP] include funnel shift intrinsic when sinking splatted operands to [x86][CGP] enable target hook to sink funnel shift intrinsic's splatted shift amount.
spatel edited the summary of this revision. (Show Details)

Patch completely revised:
Don't build on the broken CGP code for shifts. Enable the TLI hook for shouldSinkOperands() instead, and limit this transform to splats of the funnel shift amount operand only.

spatel requested review of this revision.May 12 2020, 6:19 AM

Different implementation, but virtually identical test diffs.

spatel updated this revision to Diff 263417.EditedMay 12 2020, 6:31 AM

Patch updated:
I accidentally left off the header file comment diffs. I think those updates still make sense because we're expanding the usage of the existing TLI hook to include funnel shifts. Not sure yet if we can remove that isVectorShiftByScalarCheap() hook entirely since it is used in another spot in CGP.

craig.topper added inline comments.May 12 2020, 10:05 AM
llvm/test/CodeGen/X86/vector-fshl-128.ll
2195–2200

Why are we splatting the scalar here when only element 0 is used?

spatel marked an inline comment as done.May 12 2020, 11:06 AM
spatel added inline comments.
llvm/test/CodeGen/X86/vector-fshl-128.ll
2195–2200

This is another limitation caused by the block-level visibility - SDAG doesn't know that the splat is from a scalar because we are only sinking the shuffle instruction, not the insertelement:

  t12: v4i32,ch = CopyFromReg t0, Register:v4i32 %0
t14: v4i32 = vector_shuffle<0,0,0,0> t12, undef:v4i32

The splat doesn't get hoisted back out of the loop until later in MachineLICM, and there's apparently no really late analysis for demanded elements.

We could try to sink insertelement to shuffles. That should probably be another patch though.

craig.topper added inline comments.May 12 2020, 11:35 AM
llvm/test/CodeGen/X86/vector-fshl-128.ll
2195–2200

I'm still confused. Shouldn't demandedelts inside selectiondag have determined the splat shuffle was unnecessary regardless of it coming from an insertelement?

spatel marked an inline comment as done.May 12 2020, 12:15 PM
spatel added inline comments.
llvm/test/CodeGen/X86/vector-fshl-128.ll
2195–2200

Ah, I see. Starting from the x86 shift nodes, we should see that we only need the low chunk. I didn't step through, but there are many potential candidates here that would foil the analysis: too many intervening nodes, casts to different sizes, and/or multiple uses:

  t14: v4i32 = vector_shuffle<0,0,0,0> t12, undef:v4i32
  t45: v4i32 = BUILD_VECTOR Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>
t46: v4i32 = and t14, t45
      t25: ch = CopyToReg t0, Register:i64 %2, t23
              t54: v2i64 = zero_extend_vector_inreg t46
            t55: v4i32 = bitcast t54
          t56: v4i32 = X86ISD::VSHL t10, t55
                  t48: v4i32 = BUILD_VECTOR Constant:i32<32>, Constant:i32<32>, Constant:i32<32>, Constant:i32<32>
                t49: v4i32 = sub t48, t46
              t58: v2i64 = zero_extend_vector_inreg t49
            t59: v4i32 = bitcast t58
          t60: v4i32 = X86ISD::VSRL t10, t59
craig.topper accepted this revision.May 12 2020, 12:28 PM

Thanks for checking.

LGTM

This revision is now accepted and ready to land.May 12 2020, 12:28 PM

Thanks for checking.

LGTM

Just realized we're missing a more basic splat transform that we have in IR:
binop (splat X), (splat C) --> splat (binop X, C)
I thought I had tried that at 1 point in the backend too, but I'm not finding any history now. Will have to see if that breaks anything.

This revision was automatically updated to reflect the committed changes.