This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))
ClosedPublic

Authored by RKSimon on Feb 9 2021, 7:42 AM.

Details

Summary

Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles.

Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle.

This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise.

Diff Detail

Event Timeline

RKSimon created this revision.Feb 9 2021, 7:42 AM
RKSimon requested review of this revision.Feb 9 2021, 7:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2021, 7:42 AM
RKSimon retitled this revision from [DAG] Fold shuffle(binop(shuffle(x,y),shuffle(z,w)),binop(shuffle(a,b),shuffle(c,d))) to [DAG] Fold shuffle(binop(shuffle(x,y),shuffle(z,w))) -> binop(shuffle(a,b),shuffle(c,d))).Feb 15 2021, 2:09 AM
RKSimon retitled this revision from [DAG] Fold shuffle(binop(shuffle(x,y),shuffle(z,w))) -> binop(shuffle(a,b),shuffle(c,d))) to [DAG] Fold shuffle(binop(shuffle(x,y),shuffle(z,w))) -> binop(shuffle(a,b),shuffle(c,d)).

ping?

pengfei added inline comments.Feb 15 2021, 5:26 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
20981

Will the merge break some thing like expection handling? E.g. Under strict FP model, we should make sure no unexpected expections raised by the masked elements.

20983

Can we use a variable to save N->getOperand(0/1)? They are used many times later.

20984

Should it be isCommutativeBinOp for Op10 and Op11?

21002–21005

Can we use a lambda expression to save the repeated code?

21007

How about both Op00 and Op10 are VECTOR_SHUFFLE? Do we miss the chance for Op10?

RKSimon retitled this revision from [DAG] Fold shuffle(binop(shuffle(x,y),shuffle(z,w))) -> binop(shuffle(a,b),shuffle(c,d)) to [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))).Feb 15 2021, 6:39 AM
RKSimon updated this revision to Diff 323755.Feb 15 2021, 7:48 AM

Reuse existing getOperand() variables.

Explicitly check that the binop source operands are the same type as the shuffle results.

RKSimon marked an inline comment as done.Feb 15 2021, 8:34 AM
RKSimon added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
20981

We're pushing the shuffle(binop(x,y),binop(z,w)) -> binop(shuffle(x,y),shuffle(z,w)) - so we shouldn't be calling binop on any vector element that it wasn't already. The only danger is if the outer shuffle mask will introduce any undefined elements that the binop can't handle. I'll add a check that ensures we don't merge to a shuffle mask that introduces undefined elements.

21007

We don't change LHS/RHS ordering of any binop elements so we should be OK.

RKSimon updated this revision to Diff 323770.Feb 15 2021, 8:36 AM

Don't merge shuffles if it would introduce any undefined elements

pengfei accepted this revision.Feb 16 2021, 5:37 AM

I didn't verify the generated binary, but the math LGTM.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21072

It seems all BinOps have the same VT between inputs and output. Nevertheless, I think we could still be able to do the merge even if they have different VTs.

21083

Nit, can we use a lambda expression like below to simplify the 4 checks

auto canMergeInnerShuffle = [&](bool LeftOp, bool Commute) {
  SDValue Op0 = LeftOp ? Op00 : Op01;
  SDValue Op1 = LeftOp ? Op10 : Op11;
  SDValue &SV0 = LeftOp ? LeftSV0 : RightSV0;
  SDValue &SV1 = LeftOp ? LeftSV1 : RightSV1;
  SmallVector<int, 4> &Mask = LeftOp ? LeftMask : RightMask;
  if (Commute)
    std::swap(Op0, Op1);
  return Op0.getOpcode() == ISD::VECTOR_SHUFFLE &&
         N0->isOnlyUserOf(Op0.getNode()) &&
         MergeInnerShuffle(Commute, SVN, cast<ShuffleVectorSDNode>(Op0), Op1,
                           TLI, SV0, SV1, Mask) &&
         llvm::none_of(Mask, [](int M) { return M < 0; });
};
if (canMergeInnerShuffle(true, false) || canMergeInnerShuffle(true, true)) {
  LeftMerged = true;
} else {
  LeftMask.assign(SVN->getMask().begin(), SVN->getMask().end());
  LeftSV0 = Op00, LeftSV1 = Op10;
}
...
This revision is now accepted and ready to land.Feb 16 2021, 5:37 AM

Unfortunately, this commit causes a segfault with the test case below. I'm intend to revert it shortly.

typedef long b __attribute__((__vector_size__(16)));
b *d;
b e;
b __attribute__((__always_inline__)) c(b h, b i) {
  return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i;
}
j() {
  b k, l, m, n, o[6], p, q;
  m = d[5];
  b r = m;
  b s = f(r, 8);
  q = s;
  l = d[1];
  p = l;
  t(q);
  n = c(m, l);
  o[1] = c(s, f(p, 8));
  k = __builtin_shufflevector(n, o[1], 0, 2);
  e = __builtin_ia32_psrlwi128(k, j);
}

./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c

PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: ./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 /usr/local/google/home/saugustine/tmp/test.c

  1. <eof> parser at end of file
  2. Code generation
  3. Running pass 'Function Pass Manager' on module '.../tmp/test.c'.
  4. Running pass 'X86 DAG->DAG Instruction Selection' on function '@j' #0 0x00000000048d250a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) .../llvm-new/llvm-project/llvm/lib/Support/Unix/Signals.inc:565:11 #1 0x00000000048d26db PrintStackTraceSignalHandler(void*) .../llvm-new/llvm-project/llvm/lib/Support/Unix/Signals.inc:632:1 #2 0x00000000048d0ccb llvm::sys::RunSignalHandlers() .../llvm-new/llvm-project/llvm/lib/Support/Signals.cpp:70:5 #3 0x00000000048d2e51 SignalHandler(int) .../llvm-new/llvm-project/llvm/lib/Support/Unix/Signals.inc:407:1 #4 0x00007f522cedf140 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14140) #5 0x0000000001592ec8 llvm::SDNode::getValueType(unsigned int) const .../llvm-new/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:959:5 #6 0x00000000015930ab llvm::SDValue::getValueType() const .../llvm-new/llvm-project/llvm/include/llvm/CodeGen/SelectionDAGNodes.h:1114:16 #7 0x0000000005fe59a6 llvm::SelectionDAG::getVectorShuffle(llvm::EVT, llvm::SDLoc const&, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:1731:3 #8 0x0000000005e9e6c6 (anonymous namespace)::DAGCombiner::visitVECTOR_SHUFFLE(llvm::SDNode*) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:21125:19 #9 0x0000000005e4d240 (anonymous namespace)::DAGCombiner::visit(llvm::SDNode*) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1716:40

#10 0x0000000005e4c298 (anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1747:10
#11 0x0000000005e4b880 (anonymous namespace)::DAGCombiner::Run(llvm::CombineLevel) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:1555:18
#12 0x0000000005e4b14f llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:22906:3
#13 0x000000000604f24e llvm::SelectionDAGISel::CodeGenAndEmitDAG() .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:795:3
#14 0x000000000604edcf llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, true>, llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, true>, bool&) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:711:1
#15 0x000000000604e832 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:0:7
#16 0x000000000604bb9f llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) .../llvm-new/llvm-project/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:505:3
#17 0x0000000002bc4ae1 (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) .../llvm-new/llvm-project/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:193:7
#18 0x0000000003676747 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) .../llvm-new/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:8
#19 0x0000000003cec2fc llvm::FPPassManager::runOnFunction(llvm::Function&) .../llvm-new/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1434:23
#20 0x0000000003cf1475 llvm::FPPassManager::runOnModule(llvm::Module&) .../llvm-new/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1480:16
#21 0x0000000003ceccc4 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) .../llvm-new/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1549:23
#22 0x0000000003cec7e8 llvm::legacy::PassManagerImpl::run(llvm::Module&) .../llvm-new/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:540:16
#23 0x0000000003cf1781 llvm::legacy::PassManager::run(llvm::Module&) .../llvm-new/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1676:3
#24 0x0000000004cf67b5 (anonymous namespace)::EmitAssemblyHelper::EmitAssemblyWithNewPassManager(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) .../llvm-new/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1439:3
#25 0x0000000004cf3f07 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) .../llvm-new/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1601:5
#26 0x00000000061acec9 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) .../llvm-new/llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:344:7
#27 0x0000000007f5fd2e clang::ParseAST(clang::Sema&, bool, bool) .../llvm-new/llvm-project/clang/lib/Parse/ParseAST.cpp:178:12
#28 0x00000000056d13a2 clang::ASTFrontendAction::ExecuteAction() .../llvm-new/llvm-project/clang/lib/Frontend/FrontendAction.cpp:1058:1
#29 0x00000000061a8869 clang::CodeGenAction::ExecuteAction() .../llvm-new/llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:1083:5
#30 0x00000000056d0d68 clang::FrontendAction::Execute() .../llvm-new/llvm-project/clang/lib/Frontend/FrontendAction.cpp:953:7
#31 0x000000000560a028 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) .../llvm-new/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:949:23
#32 0x000000000588dbb6 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) .../llvm-new/llvm-project/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:278:8
#33 0x000000000142cf0d cc1_main(llvm::ArrayRef<char const*>, char const*, void*) .../llvm-new/llvm-project/clang/tools/driver/cc1_main.cpp:246:13
#34 0x000000000141f759 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) .../llvm-new/llvm-project/clang/tools/driver/driver.cpp:330:5
#35 0x000000000141e8ee main .../llvm-new/llvm-project/clang/tools/driver/driver.cpp:407:5
#36 0x00007f522c9a2d0a __libc_start_main ./csu/../csu/libc-start.c:308:16
#37 0x000000000141e09a _start (./bin/clang+0x141e09a)
Segmentation fault

@saugustine OK - I don't suppose you managed to get the IR (reduced or otherwise) please?

this is as small as llvm-reduce makes the ir:

./bin/llc --mtriple x86_64-grtev4-linux-gnu -O1 $1

; ModuleID = '/usr/local/google/home/saugustine/tmp/test.ll'
source_filename = "/usr/local/google/home/saugustine/tmp/test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define dso_local i32 @j() {
entry:

%0 = load <2 x i64>, <2 x i64>* undef, align 16, !tbaa !0
%psrldq = shufflevector <16 x i8> undef, <16 x i8> <i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23>
%1 = load <2 x i64>, <2 x i64>* undef, align 16, !tbaa !0
%2 = bitcast <2 x i64> %0 to <8 x i16>
%3 = bitcast <2 x i64> %1 to <8 x i16>
%add.i = add <8 x i16> %3, %2
%4 = bitcast <8 x i16> %add.i to <2 x i64>
%5 = bitcast <16 x i8> %psrldq to <8 x i16>
%6 = bitcast <2 x i64> %1 to <8 x i16>
%7 = shufflevector <8 x i16> %6, <8 x i16> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
%add.i24 = add <8 x i16> %7, %5
%8 = bitcast <8 x i16> %add.i24 to <2 x i64>
%shuffle = shufflevector <2 x i64> %4, <2 x i64> %8, <2 x i32> <i32 0, i32 2>
%9 = bitcast <2 x i64> %shuffle to <8 x i16>
%10 = call <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16> %9, i32 ptrtoint (i32 ()* @j to i32))
store <8 x i16> %10, <8 x i16>* undef, align 16, !tbaa !0
ret i32 undef

}

; Function Attrs: nounwind readnone
declare <8 x i16> @llvm.x86.sse2.psrli.w(<8 x i16>, i32) #0

attributes #0 = { nounwind readnone }

!0 = !{!1, !1, i64 0}
!1 = !{!"omnipotent char", !2, i64 0}
!2 = !{!"Simple C/C++ TBAA"}

Thanks @saugustine I somehow managed to lose a null check in one the refactors......