This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add `TuningPreferShiftShuffle` for when Shifts are preferable to shuffles.
ClosedPublic

Authored by goldstein.w.n on Feb 10 2023, 3:31 PM.

Details

Summary

SKX has an objectively faster shift than shuffle, on all other targets
the two have equal performance (with maybe a slight preference for
shifts because p5 is a more common bottleneck).

Diff Detail

Event Timeline

goldstein.w.n created this revision.Feb 10 2023, 3:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 3:31 PM
goldstein.w.n requested review of this revision.Feb 10 2023, 3:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 3:31 PM
pengfei added inline comments.Feb 11 2023, 3:17 AM
llvm/test/CodeGen/X86/min-legal-vector-width.ll
173–277

Tests for other conditions are gone. You may need to tune the combinations in --check-prefixes.

Without AVX512 we can't load fold arg0 for bit-shift ops - isn't that likely to be a problem?

llvm/lib/Target/X86/X86ISelLowering.cpp
18354

This approach isn't particularly easy to grok - why not just add an additional lowerShuffleAsShift check before behind a hasFasterShiftThanShuffle check?

llvm/test/CodeGen/X86/pr57340.ll
272

Are byte shifts faster I thought they were still Port5 bound?

goldstein.w.n added inline comments.Feb 11 2023, 9:17 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
18354

Was to avoid duplicating ~30 lines of code, but will do for v2.

llvm/test/CodeGen/X86/pr57340.ll
272

Same perf/code size for byte-shift vs shuffle so figure its all the same. I guess, however, it could have a drawback because its harder to switch domains for shift than shuffle so I can update logic to only do bit-shift.

Also note this particular case actually reflects a missed optimization in combineExtractVectorElt because it should be just using vpextrw but I still haven't figured out exactly whats missing.

RKSimon added inline comments.Feb 12 2023, 3:53 AM
llvm/test/CodeGen/X86/pr57340.ll
272

The combineExtractVectorElt peek through shuffle code has slowly evolved as we encountered individual regressions - I'm not surprised it still misses many.

Only do bitwise + add rotate

goldstein.w.n marked an inline comment as done.Feb 12 2023, 2:21 PM

Without AVX512 we can't load fold arg0 for bit-shift ops - isn't that likely to be a problem?

I'm not sure what you mean?
But the tuning is only for SKX which has avx512.

llvm/lib/Target/X86/X86ISelLowering.cpp
18354

Refactored as you suggest everything except matchunaryshufflepermute helper where it would cause too much duplication imo.

RKSimon added inline comments.Feb 14 2023, 12:53 AM
llvm/test/CodeGen/X86/min-legal-vector-width.ll
11

Have you seen if you can add additional common check-prefixes to reduce the amount of duplications below?

Add common prefix for SKX tests

goldstein.w.n marked an inline comment as done.Feb 15 2023, 12:18 AM
goldstein.w.n added inline comments.
llvm/test/CodeGen/X86/min-legal-vector-width.ll
11

Yeah, sorry misunderstood what it meant the first time.

goldstein.w.n marked an inline comment as done.

Rebase

RKSimon added inline comments.Feb 17 2023, 3:51 AM
llvm/lib/Target/X86/X86.td
532

Maybe rephrase this (e.g. you refer to vprold etc. but not shifts).

"Prefer lowering shuffles on AVX512 targets (e.g. Skylake Server) to shifts/rotate if they can use more ports than regular shuffles." ?

llvm/lib/Target/X86/X86ISelLowering.cpp
38748

Instead of a loop - why not move the shuffle/shift paths into lambdas and then do this - it should be easier to understand:

if (Subtarget.hasFasterShiftThanShuffle()) {
  if (matchUnaryPermuteAsBitShift()) {}
  if (matchUnaryPermuteAsIntShuffle()) {}
} else {
  if (matchUnaryPermuteAsIntShuffle()) {}
  if (matchUnaryPermuteAsBitShift()) {}
}
Matt added a subscriber: Matt.Feb 20 2023, 1:11 PM
goldstein.w.n marked an inline comment as done.Feb 20 2023, 6:02 PM
goldstein.w.n added inline comments.
llvm/lib/Target/X86/X86.td
532

Added "imm" as prefer to shifts/rotate b.c we don't want var shift, but otherwise done.

llvm/lib/Target/X86/X86ISelLowering.cpp
38748

The reason I prefered loop her is there is a lot of boiler plate for type /target checking that would also need to be copied. Seemed like loop was preferable to ~30 lines of dup code. But your call. LMK.

goldstein.w.n marked an inline comment as done.

Update comment

Improve comment

RKSimon added inline comments.Feb 22 2023, 3:58 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
16572

remove newline

38819

newline

llvm/lib/Target/X86/X86Subtarget.h
252

You shouldn't need this - the GET_SUBTARGETINFO_MACRO above should have created the getter

goldstein.w.n marked 6 inline comments as done.Feb 23 2023, 8:09 PM

Fix some nits

Is this substituted by D144570?

Is this substituted by D144570?

No, the x86fixupinsttuning pass is only meant for guaranteed replacements. There is not shuffle that is always replaceable with shifts.

@RKSimon this okay? D143859 has some dependencies on this and preference is to keep in order. If you think it needs more work, however, can rebase and push D143859 ... D144442 first.

RKSimon accepted this revision.Feb 28 2023, 7:04 AM

LGTM

This revision is now accepted and ready to land.Feb 28 2023, 7:04 AM
This revision was landed with ongoing or failed builds.Feb 28 2023, 9:25 PM
This revision was automatically updated to reflect the committed changes.

After this patch, I see an msan issue running this test internally; strangely I don't see a failure on any sanitizer buildbot yet.

There are ~400 test failures in LLVM, e.g. for the test in llvm/test/CodeGen/Generic/vector-casts.ll:

==3443==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55f66059dba1 in matchUnaryPermuteShuffle(llvm::MVT, llvm::ArrayRef<int>, llvm::APInt const&, bool, bool, llvm::SelectionDAG const&, llvm::X86Subtarget const&, unsigned int&, llvm::MVT&, unsigned int&) llvm/lib/Target/X86/X86ISelLowering.cpp:38834:38
    #1 0x55f66058ed18 in combineX86ShuffleChain(llvm::ArrayRef<llvm::SDValue>, llvm::SDValue, llvm::ArrayRef<int>, int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:39512:9
    #2 0x55f6603aa410 in combineX86ShufflesRecursively(llvm::ArrayRef<llvm::SDValue>, int, llvm::SDValue, llvm::ArrayRef<int>, llvm::ArrayRef<llvm::SDNode const*>, unsigned int, unsigned int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:40791:27
    #3 0x55f660620689 in combineX86ShufflesRecursively(llvm::SDValue, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:40818:10
    #4 0x55f66045d4b4 in combineVectorPack(llvm::SDNode*, llvm::SelectionDAG&, llvm::TargetLowering::DAGCombinerInfo&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:48525:21
    #5 0x55f6603c092c in llvm::X86TargetLowering::PerformDAGCombine(llvm::SDNode*, llvm::TargetLowering::DAGCombinerInfo&) const llvm/lib/Target/X86/X86ISelLowering.cpp:57229:36
    #6 0x55f661b6df5b in (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2014:16
    #7 0x55f661b6c8f4 in Run llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1795:18
    #8 0x55f661b6c8f4 in llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:26935:36
    #9 0x55f66214e2e3 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:923:13
    #10 0x55f662147e92 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1633:7
    #11 0x55f662141c18 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:480:3
    #12 0x55f660981cf0 in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:191:25
    #13 0x55f661633d50 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) llvm/lib/CodeGen/MachineFunctionPass.cpp:91:13
    #14 0x55f664c0c6f7 in llvm::FPPassManager::runOnFunction(llvm::Function&) llvm/lib/IR/LegacyPassManager.cpp:1430:27
    #15 0x55f664c192d9 in llvm::FPPassManager::runOnModule(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1476:16
    #16 0x55f664c0d785 in runOnModule llvm/lib/IR/LegacyPassManager.cpp:1545:27
    #17 0x55f664c0d785 in llvm::legacy::PassManagerImpl::run(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:535:44
    #18 0x55f65d621281 in compileModule(char**, llvm::LLVMContext&) llvm/tools/llc/llc.cpp:733:8
    #19 0x55f65d61a872 in main llvm/tools/llc/llc.cpp:420:22


  Uninitialized value was created by an allocation of 'Shuffle' in the stack frame
    #0 0x55f66058a88e in combineX86ShuffleChain(llvm::ArrayRef<llvm::SDValue>, llvm::SDValue, llvm::ArrayRef<int>, int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:39461:3

SUMMARY: MemorySanitizer: use-of-uninitialized-value llvm/lib/Target/X86/X86ISelLowering.cpp:38834:38 in matchUnaryPermuteShuffle(llvm::MVT, llvm::ArrayRef<int>, llvm::APInt const&, bool, bool, llvm::SelectionDAG const&, llvm::X86Subtarget const&, unsigned int&, llvm::MVT&, unsigned int&)

After this patch, I see an msan issue running this test internally; strangely I don't see a failure on any sanitizer buildbot yet.

There are ~400 test failures in LLVM, e.g. for the test in llvm/test/CodeGen/Generic/vector-casts.ll:

==3443==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55f66059dba1 in matchUnaryPermuteShuffle(llvm::MVT, llvm::ArrayRef<int>, llvm::APInt const&, bool, bool, llvm::SelectionDAG const&, llvm::X86Subtarget const&, unsigned int&, llvm::MVT&, unsigned int&) llvm/lib/Target/X86/X86ISelLowering.cpp:38834:38
    #1 0x55f66058ed18 in combineX86ShuffleChain(llvm::ArrayRef<llvm::SDValue>, llvm::SDValue, llvm::ArrayRef<int>, int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:39512:9
    #2 0x55f6603aa410 in combineX86ShufflesRecursively(llvm::ArrayRef<llvm::SDValue>, int, llvm::SDValue, llvm::ArrayRef<int>, llvm::ArrayRef<llvm::SDNode const*>, unsigned int, unsigned int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:40791:27
    #3 0x55f660620689 in combineX86ShufflesRecursively(llvm::SDValue, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:40818:10
    #4 0x55f66045d4b4 in combineVectorPack(llvm::SDNode*, llvm::SelectionDAG&, llvm::TargetLowering::DAGCombinerInfo&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:48525:21
    #5 0x55f6603c092c in llvm::X86TargetLowering::PerformDAGCombine(llvm::SDNode*, llvm::TargetLowering::DAGCombinerInfo&) const llvm/lib/Target/X86/X86ISelLowering.cpp:57229:36
    #6 0x55f661b6df5b in (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2014:16
    #7 0x55f661b6c8f4 in Run llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1795:18
    #8 0x55f661b6c8f4 in llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:26935:36
    #9 0x55f66214e2e3 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:923:13
    #10 0x55f662147e92 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1633:7
    #11 0x55f662141c18 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:480:3
    #12 0x55f660981cf0 in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:191:25
    #13 0x55f661633d50 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) llvm/lib/CodeGen/MachineFunctionPass.cpp:91:13
    #14 0x55f664c0c6f7 in llvm::FPPassManager::runOnFunction(llvm::Function&) llvm/lib/IR/LegacyPassManager.cpp:1430:27
    #15 0x55f664c192d9 in llvm::FPPassManager::runOnModule(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1476:16
    #16 0x55f664c0d785 in runOnModule llvm/lib/IR/LegacyPassManager.cpp:1545:27
    #17 0x55f664c0d785 in llvm::legacy::PassManagerImpl::run(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:535:44
    #18 0x55f65d621281 in compileModule(char**, llvm::LLVMContext&) llvm/tools/llc/llc.cpp:733:8
    #19 0x55f65d61a872 in main llvm/tools/llc/llc.cpp:420:22


  Uninitialized value was created by an allocation of 'Shuffle' in the stack frame
    #0 0x55f66058a88e in combineX86ShuffleChain(llvm::ArrayRef<llvm::SDValue>, llvm::SDValue, llvm::ArrayRef<int>, int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:39461:3

SUMMARY: MemorySanitizer: use-of-uninitialized-value llvm/lib/Target/X86/X86ISelLowering.cpp:38834:38 in matchUnaryPermuteShuffle(llvm::MVT, llvm::ArrayRef<int>, llvm::APInt const&, bool, bool, llvm::SelectionDAG const&, llvm::X86Subtarget const&, unsigned int&, llvm::MVT&, unsigned int&)

I think the issue is:

+      // Byte shifts can be slower so only match them on second attempt.
+      if (Order == 0 &&
+          (Shuffle == X86ISD::VSHLDQ || Shuffle == X86ISD::VSRLDQ))
+        continue;

It comes before the check of

+      if (0 < ShiftAmt && (!ShuffleVT.is512BitVector() || Subtarget.hasBWI() ||
+                           32 <= ShuffleVT.getScalarSizeInBits())) {
+        PermuteImm = (unsigned)ShiftAmt;
+        return true;
+      }

and the 0 < ShiftAmt check if basically a check if actually found/set Shuffle.
Don't think the bug actually can change behavior but is bug none the less.

Will post patch to fix.

After this patch, I see an msan issue running this test internally; strangely I don't see a failure on any sanitizer buildbot yet.

There are ~400 test failures in LLVM, e.g. for the test in llvm/test/CodeGen/Generic/vector-casts.ll:

==3443==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55f66059dba1 in matchUnaryPermuteShuffle(llvm::MVT, llvm::ArrayRef<int>, llvm::APInt const&, bool, bool, llvm::SelectionDAG const&, llvm::X86Subtarget const&, unsigned int&, llvm::MVT&, unsigned int&) llvm/lib/Target/X86/X86ISelLowering.cpp:38834:38
    #1 0x55f66058ed18 in combineX86ShuffleChain(llvm::ArrayRef<llvm::SDValue>, llvm::SDValue, llvm::ArrayRef<int>, int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:39512:9
    #2 0x55f6603aa410 in combineX86ShufflesRecursively(llvm::ArrayRef<llvm::SDValue>, int, llvm::SDValue, llvm::ArrayRef<int>, llvm::ArrayRef<llvm::SDNode const*>, unsigned int, unsigned int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:40791:27
    #3 0x55f660620689 in combineX86ShufflesRecursively(llvm::SDValue, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:40818:10
    #4 0x55f66045d4b4 in combineVectorPack(llvm::SDNode*, llvm::SelectionDAG&, llvm::TargetLowering::DAGCombinerInfo&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:48525:21
    #5 0x55f6603c092c in llvm::X86TargetLowering::PerformDAGCombine(llvm::SDNode*, llvm::TargetLowering::DAGCombinerInfo&) const llvm/lib/Target/X86/X86ISelLowering.cpp:57229:36
    #6 0x55f661b6df5b in (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:2014:16
    #7 0x55f661b6c8f4 in Run llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1795:18
    #8 0x55f661b6c8f4 in llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:26935:36
    #9 0x55f66214e2e3 in llvm::SelectionDAGISel::CodeGenAndEmitDAG() llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:923:13
    #10 0x55f662147e92 in llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1633:7
    #11 0x55f662141c18 in llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:480:3
    #12 0x55f660981cf0 in (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:191:25
    #13 0x55f661633d50 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) llvm/lib/CodeGen/MachineFunctionPass.cpp:91:13
    #14 0x55f664c0c6f7 in llvm::FPPassManager::runOnFunction(llvm::Function&) llvm/lib/IR/LegacyPassManager.cpp:1430:27
    #15 0x55f664c192d9 in llvm::FPPassManager::runOnModule(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:1476:16
    #16 0x55f664c0d785 in runOnModule llvm/lib/IR/LegacyPassManager.cpp:1545:27
    #17 0x55f664c0d785 in llvm::legacy::PassManagerImpl::run(llvm::Module&) llvm/lib/IR/LegacyPassManager.cpp:535:44
    #18 0x55f65d621281 in compileModule(char**, llvm::LLVMContext&) llvm/tools/llc/llc.cpp:733:8
    #19 0x55f65d61a872 in main llvm/tools/llc/llc.cpp:420:22


  Uninitialized value was created by an allocation of 'Shuffle' in the stack frame
    #0 0x55f66058a88e in combineX86ShuffleChain(llvm::ArrayRef<llvm::SDValue>, llvm::SDValue, llvm::ArrayRef<int>, int, bool, bool, bool, llvm::SelectionDAG&, llvm::X86Subtarget const&) llvm/lib/Target/X86/X86ISelLowering.cpp:39461:3

SUMMARY: MemorySanitizer: use-of-uninitialized-value llvm/lib/Target/X86/X86ISelLowering.cpp:38834:38 in matchUnaryPermuteShuffle(llvm::MVT, llvm::ArrayRef<int>, llvm::APInt const&, bool, bool, llvm::SelectionDAG const&, llvm::X86Subtarget const&, unsigned int&, llvm::MVT&, unsigned int&)

I think the issue is:

+      // Byte shifts can be slower so only match them on second attempt.
+      if (Order == 0 &&
+          (Shuffle == X86ISD::VSHLDQ || Shuffle == X86ISD::VSRLDQ))
+        continue;

It comes before the check of

+      if (0 < ShiftAmt && (!ShuffleVT.is512BitVector() || Subtarget.hasBWI() ||
+                           32 <= ShuffleVT.getScalarSizeInBits())) {
+        PermuteImm = (unsigned)ShiftAmt;
+        return true;
+      }

and the 0 < ShiftAmt check if basically a check if actually found/set Shuffle.
Don't think the bug actually can change behavior but is bug none the less.

Will post patch to fix.

see: D145129

if you have concerns over whether that works I can just revert this.

I think the issue is:

+      // Byte shifts can be slower so only match them on second attempt.
+      if (Order == 0 &&
+          (Shuffle == X86ISD::VSHLDQ || Shuffle == X86ISD::VSRLDQ))
+        continue;

It comes before the check of

+      if (0 < ShiftAmt && (!ShuffleVT.is512BitVector() || Subtarget.hasBWI() ||
+                           32 <= ShuffleVT.getScalarSizeInBits())) {
+        PermuteImm = (unsigned)ShiftAmt;
+        return true;
+      }

and the 0 < ShiftAmt check if basically a check if actually found/set Shuffle.
Don't think the bug actually can change behavior but is bug none the less.

Will post patch to fix.

Yeah, that looks like the cause. There's another place where matchShuffleAsShift is called that seems problematic. Adding some assertions can catch the issue outside of an msan build:

$ git diff
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 71dad73cfd9b..6af0ce2fd14b 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -14118,6 +14118,7 @@ static SDValue lowerShuffleAsShift(const SDLoc &DL, MVT VT, SDValue V1,
     V = V2;
   }

+  assert(ShiftAmt >= 0 && "matchShuffleAsShift failed twice");
   if (BitwiseOnly && (Opcode == X86ISD::VSHLDQ || Opcode == X86ISD::VSRLDQ))
     return SDValue();

@@ -38829,6 +38830,7 @@ static bool matchUnaryPermuteShuffle(MVT MaskVT, ArrayRef<int> Mask,
       int ShiftAmt =
           matchShuffleAsShift(ShuffleVT, Shuffle, MaskScalarSizeInBits, Mask, 0,
                               Zeroable, Subtarget);
+      assert(ShiftAmt >= 0 && "matchShuffleAsShift failed");
       // Byte shifts can be slower so only match them on second attempt.
       if (Order == 0 &&
           (Shuffle == X86ISD::VSHLDQ || Shuffle == X86ISD::VSRLDQ))
llvm/lib/Target/X86/X86ISelLowering.cpp
14121

here

see: D145129

Thanks!

if you have concerns over whether that works I can just revert this.

I don't have concerns but I also don't know how anything in this file is supposed to work :)

Fixing forward is fine with me if it can land sometime soon.

goldstein.w.n added inline comments.Mar 1 2023, 5:27 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
14121

Yeah, that too. Good Catch!
Think the fix is just move the ShiftAmt < 0 check before. Will update the other PR with that.

goldstein.w.n reopened this revision.Mar 2 2023, 12:12 AM
This revision is now accepted and ready to land.Mar 2 2023, 12:12 AM

Fix undef access

pengfei accepted this revision.Mar 2 2023, 12:13 AM
This revision was landed with ongoing or failed builds.Mar 2 2023, 12:54 AM
This revision was automatically updated to reflect the committed changes.