This is an archive of the discontinued LLVM Phabricator instance.

[IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
ClosedPublic

Authored by ABataev on Aug 21 2023, 1:05 PM.

Diff Detail

Event Timeline

ABataev created this revision.Aug 21 2023, 1:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2023, 1:05 PM
ABataev requested review of this revision.Aug 21 2023, 1:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2023, 1:05 PM
craig.topper added inline comments.
llvm/lib/IR/Instructions.cpp
2178–2179

If Mask.size() == 4 and NumSrcElts == 6. Then this considers <5, 4, 3, 2> as a reverse?

ABataev added inline comments.Aug 21 2023, 1:52 PM
llvm/lib/IR/Instructions.cpp
2178–2179

Yes. Need to add a check that the operation does not change the size, just like non-static isReverse() does.

ABataev updated this revision to Diff 552154.Aug 21 2023, 3:19 PM

Address comments

RKSimon added inline comments.Aug 23 2023, 1:46 AM
llvm/include/llvm/IR/Instructions.h
2070–2071

Update this comment

2165–2166

Update comment

2186–2187

Update comment

2259

Update comment

llvm/unittests/IR/ShuffleVectorInstTest.cpp
14

All of these tests need support for length changing shuffles

ABataev updated this revision to Diff 552703.Aug 23 2023, 7:25 AM

Address comments

craig.topper added inline comments.Sep 6 2023, 8:58 AM
llvm/lib/IR/Instructions.cpp
2158

Why is the argument an int if we're going to cast it to unsigned?

ABataev added inline comments.Sep 6 2023, 9:48 AM
llvm/lib/IR/Instructions.cpp
2158

For conformance. All other functions have corresponding int parameter, though treat it as unsigned.

ABataev updated this revision to Diff 556478.Sep 11 2023, 1:23 PM

Rebase, ping!

ABataev updated this revision to Diff 557058.Sep 19 2023, 10:35 AM

Rebase, ping!

No objections to this, but I do think it'd make sense to split into 2 patches - (1) adding the NumSrcElts argument to the shuffle matching helpers + additional unit test coverage and (2) using it in SLP

ABataev updated this revision to Diff 557193.Sep 21 2023, 12:08 PM

Rebase, address comments

RKSimon added inline comments.Sep 26 2023, 12:15 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6659

Maybe split this into a series of separate if () { return true/false; } to make it easier to grok?

ABataev updated this revision to Diff 557371.Sep 26 2023, 12:53 PM

Address comment

ABataev marked an inline comment as done.Sep 26 2023, 12:58 PM
ABataev added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6659

Done

ABataev marked an inline comment as done.Sep 26 2023, 1:00 PM
RKSimon accepted this revision.Sep 27 2023, 3:01 AM

LGTM - cheers

This revision is now accepted and ready to land.Sep 27 2023, 3:01 AM
This revision was landed with ongoing or failed builds.Sep 28 2023, 11:12 AM
This revision was automatically updated to reflect the committed changes.

hitting an opt -p slp-vectorizer assert on the following IR:

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.34.0"

define ptr @f() {
entry:
  %cmp.i.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %0 = zext i1 %cmp.i.i to i64
  %pgocount88 = load i64, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 9), align 8
  %1 = or i64 %pgocount88, %0
  store i64 %1, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 9), align 8
  %cond.i.i = select i1 %cmp.i.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i.i = fcmp ogt float %cond.i.i, 0.000000e+00
  %2 = zext i1 %cmp1.i.i to i64
  %pgocount89 = load i64, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 10), align 8
  %3 = or i64 %pgocount89, %2
  store i64 %3, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 10), align 8
  %cmp.i9.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i10.i = select i1 %cmp.i9.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i11.i = fcmp ogt float %cond.i10.i, 0.000000e+00
  %cmp.i14.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i15.i = select i1 %cmp.i14.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i16.i = fcmp ogt float %cond.i15.i, 0.000000e+00
  %cmp.i19.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i20.i = select i1 %cmp.i19.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i21.i = fcmp ogt float %cond.i20.i, 0.000000e+00
  ret ptr null
}

hitting an opt -p slp-vectorizer assert on the following IR:

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.34.0"

define ptr @f() {
entry:
  %cmp.i.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %0 = zext i1 %cmp.i.i to i64
  %pgocount88 = load i64, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 9), align 8
  %1 = or i64 %pgocount88, %0
  store i64 %1, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 9), align 8
  %cond.i.i = select i1 %cmp.i.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i.i = fcmp ogt float %cond.i.i, 0.000000e+00
  %2 = zext i1 %cmp1.i.i to i64
  %pgocount89 = load i64, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 10), align 8
  %3 = or i64 %pgocount89, %2
  store i64 %3, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 10), align 8
  %cmp.i9.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i10.i = select i1 %cmp.i9.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i11.i = fcmp ogt float %cond.i10.i, 0.000000e+00
  %cmp.i14.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i15.i = select i1 %cmp.i14.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i16.i = fcmp ogt float %cond.i15.i, 0.000000e+00
  %cmp.i19.i = fcmp olt float 0.000000e+00, 0.000000e+00
  %cond.i20.i = select i1 %cmp.i19.i, float 0.000000e+00, float 0.000000e+00
  %cmp1.i21.i = fcmp ogt float %cond.i20.i, 0.000000e+00
  ret ptr null
}

Thanks, reverted the patch, will investigate it and fix.

reames added a subscriber: reames.Oct 4 2023, 11:50 AM

Coming into this quite late, but I'm confused by the need for this API change.

Doesn't the existing ArrayRef::take_front API fulfill the same need? If you need to consider a prefix of the mask, why not just take a reference to that prefix?

Coming into this quite late, but I'm confused by the need for this API change.

Doesn't the existing ArrayRef::take_front API fulfill the same need? If you need to consider a prefix of the mask, why not just take a reference to that prefix?

Sure, it may work this way, but better to have this in the interface functions. It helped to find so many bugs in the cost model. Plus, this extract thing must be copied over almost all target-specific TTI implementations.

another slp-vectorizer crash on

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv7-unknown-linux-android24"

define void @_ZNK8SkStroke10strokeRectERK6SkRectP6SkPath15SkPathDirection() {
entry:
  %0 = load float, ptr null, align 4
  %1 = load float, ptr null, align 4
  %2 = load float, ptr null, align 4
  %cmp.i = fcmp ogt float %1, %0
  %rect.sroa.14.0 = select i1 %cmp.i, float %1, float 0.000000e+00
  %rect.sroa.0.0 = select i1 %cmp.i, float %0, float 0.000000e+00
  %cmp4.i = fcmp ogt float 0.000000e+00, %2
  %rect.sroa.19.0 = select i1 %cmp4.i, float 0.000000e+00, float 0.000000e+00
  %rect.sroa.9.0 = select i1 %cmp4.i, float %2, float 0.000000e+00
  store float %rect.sroa.0.0, ptr null, align 4
  %rect.sroa.9.0.r.sroa_idx = getelementptr i8, ptr null, i32 4
  store float %rect.sroa.9.0, ptr %rect.sroa.9.0.r.sroa_idx, align 4
  %rect.sroa.14.0.r.sroa_idx = getelementptr i8, ptr null, i32 8
  store float %rect.sroa.14.0, ptr %rect.sroa.14.0.r.sroa_idx, align 4
  %rect.sroa.19.0.r.sroa_idx = getelementptr i8, ptr null, i32 12
  store float %rect.sroa.19.0, ptr %rect.sroa.19.0.r.sroa_idx, align 4
  ret void
}
reames added a comment.Oct 5 2023, 7:52 AM

Coming into this quite late, but I'm confused by the need for this API change.

Doesn't the existing ArrayRef::take_front API fulfill the same need? If you need to consider a prefix of the mask, why not just take a reference to that prefix?

Sure, it may work this way, but better to have this in the interface functions. It helped to find so many bugs in the cost model. Plus, this extract thing must be copied over almost all target-specific TTI implementations.

On the first point, I disagree. Having a simpler API and shifting the mask manipulation to the client (SLP) seems strictly better. On the second, I don't see (in your patch) need for truncation *within* backend modeling, I only see it from the client (SLP). Am I missing something?

Coming into this quite late, but I'm confused by the need for this API change.

Doesn't the existing ArrayRef::take_front API fulfill the same need? If you need to consider a prefix of the mask, why not just take a reference to that prefix?

Sure, it may work this way, but better to have this in the interface functions. It helped to find so many bugs in the cost model. Plus, this extract thing must be copied over almost all target-specific TTI implementations.

On the first point, I disagree. Having a simpler API and shifting the mask manipulation to the client (SLP) seems strictly better. On the second, I don't see (in your patch) need for truncation *within* backend modeling, I only see it from the client (SLP). Am I missing something?

  1. It is supposed to be used by other clients too. Currently it mostly affects SLP.
  2. Backend relies on the different functions, which already include check for size change. This change actually make the API for middle-end more strictly follow the backend requirements and allows better cost estimation.
uabelho added a subscriber: uabelho.Oct 9 2023, 3:24 AM

Hi @ABataev

The following fails with an assertion with this patch:

opt bbi-87376.ll -passes=vector-combine -o /dev/null


It crashes with

opt: ../lib/IR/Instructions.cpp:2129: bool isSingleSourceMaskImpl(ArrayRef<int>, int): Assertion `I >= 0 && I < (NumOpElts * 2) && "Out-of-bounds shuffle mask element"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ../../main-github/llvm/build-all/bin/opt bbi-87376.ll -passes=vector-combine -o /dev/null
 #0 0x0000556ead38c447 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/opt+0x2f2a447)
 #1 0x0000556ead389f9e llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/opt+0x2f27f9e)
 #2 0x0000556ead38cb0f SignalHandler(int) (../../main-github/llvm/build-all/bin/opt+0x2f2ab0f)
 #3 0x00007f28ab3f0630 __restore_rt (/lib64/libpthread.so.0+0xf630)
 #4 0x00007f28a8b37387 raise (/lib64/libc.so.6+0x36387)
 #5 0x00007f28a8b38a78 abort (/lib64/libc.so.6+0x37a78)
 #6 0x00007f28a8b301a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #7 0x00007f28a8b30252 (/lib64/libc.so.6+0x2f252)
 #8 0x0000556eacd49919 llvm::ShuffleVectorInst::isReverseMask(llvm::ArrayRef<int>, int) (../../main-github/llvm/build-all/bin/opt+0x28e7919)
 #9 0x0000556eabe3ea36 llvm::BasicTTIImplBase<llvm::X86TTIImpl>::improveShuffleKindFromMask(llvm::TargetTransformInfo::ShuffleKind, llvm::ArrayRef<int>, llvm::VectorType*, int&, llvm::VectorType*&) const (../../main-github/llvm/build-all/bin/opt+0x19dca36)
#10 0x0000556eabe3cb32 llvm::X86TTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, llvm::TargetTransformInfo::TargetCostKind, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) (../../main-github/llvm/build-all/bin/opt+0x19dab32)
#11 0x0000556eac5a944d llvm::TargetTransformInfo::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, llvm::TargetTransformInfo::TargetCostKind, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) const (../../main-github/llvm/build-all/bin/opt+0x214744d)
#12 0x0000556ead852a58 (anonymous namespace)::VectorCombine::run()::$_12::operator()(llvm::Instruction&) const (../../main-github/llvm/build-all/bin/opt+0x33f0a58)
#13 0x0000556ead84d7e5 llvm::VectorCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x33eb7e5)
#14 0x0000556ead5a9a7d llvm::detail::PassModel<llvm::Function, llvm::VectorCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x3147a7d)
#15 0x0000556eacdac8c4 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x294a8c4)
#16 0x0000556eab18572d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0xd2372d)
#17 0x0000556eacdb0cae llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x294ecae)
#18 0x0000556eab1854cd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0xd234cd)
#19 0x0000556eacdaba54 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x2949a54)
#20 0x0000556eaad8a0d3 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (../../main-github/llvm/build-all/bin/opt+0x9280d3)
#21 0x0000556eaad9771c main (../../main-github/llvm/build-all/bin/opt+0x93571c)
#22 0x00007f28a8b23555 __libc_start_main (/lib64/libc.so.6+0x22555)
#23 0x0000556eaad84270 _start (../../main-github/llvm/build-all/bin/opt+0x922270)
Abort (core dumped)

Hi @ABataev

The following fails with an assertion with this patch:

opt bbi-87376.ll -passes=vector-combine -o /dev/null


It crashes with

opt: ../lib/IR/Instructions.cpp:2129: bool isSingleSourceMaskImpl(ArrayRef<int>, int): Assertion `I >= 0 && I < (NumOpElts * 2) && "Out-of-bounds shuffle mask element"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ../../main-github/llvm/build-all/bin/opt bbi-87376.ll -passes=vector-combine -o /dev/null
 #0 0x0000556ead38c447 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/opt+0x2f2a447)
 #1 0x0000556ead389f9e llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/opt+0x2f27f9e)
 #2 0x0000556ead38cb0f SignalHandler(int) (../../main-github/llvm/build-all/bin/opt+0x2f2ab0f)
 #3 0x00007f28ab3f0630 __restore_rt (/lib64/libpthread.so.0+0xf630)
 #4 0x00007f28a8b37387 raise (/lib64/libc.so.6+0x36387)
 #5 0x00007f28a8b38a78 abort (/lib64/libc.so.6+0x37a78)
 #6 0x00007f28a8b301a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6)
 #7 0x00007f28a8b30252 (/lib64/libc.so.6+0x2f252)
 #8 0x0000556eacd49919 llvm::ShuffleVectorInst::isReverseMask(llvm::ArrayRef<int>, int) (../../main-github/llvm/build-all/bin/opt+0x28e7919)
 #9 0x0000556eabe3ea36 llvm::BasicTTIImplBase<llvm::X86TTIImpl>::improveShuffleKindFromMask(llvm::TargetTransformInfo::ShuffleKind, llvm::ArrayRef<int>, llvm::VectorType*, int&, llvm::VectorType*&) const (../../main-github/llvm/build-all/bin/opt+0x19dca36)
#10 0x0000556eabe3cb32 llvm::X86TTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, llvm::TargetTransformInfo::TargetCostKind, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) (../../main-github/llvm/build-all/bin/opt+0x19dab32)
#11 0x0000556eac5a944d llvm::TargetTransformInfo::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, llvm::TargetTransformInfo::TargetCostKind, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) const (../../main-github/llvm/build-all/bin/opt+0x214744d)
#12 0x0000556ead852a58 (anonymous namespace)::VectorCombine::run()::$_12::operator()(llvm::Instruction&) const (../../main-github/llvm/build-all/bin/opt+0x33f0a58)
#13 0x0000556ead84d7e5 llvm::VectorCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x33eb7e5)
#14 0x0000556ead5a9a7d llvm::detail::PassModel<llvm::Function, llvm::VectorCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x3147a7d)
#15 0x0000556eacdac8c4 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x294a8c4)
#16 0x0000556eab18572d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0xd2372d)
#17 0x0000556eacdb0cae llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x294ecae)
#18 0x0000556eab1854cd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0xd234cd)
#19 0x0000556eacdaba54 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x2949a54)
#20 0x0000556eaad8a0d3 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (../../main-github/llvm/build-all/bin/opt+0x9280d3)
#21 0x0000556eaad9771c main (../../main-github/llvm/build-all/bin/opt+0x93571c)
#22 0x00007f28a8b23555 __libc_start_main (/lib64/libc.so.6+0x22555)
#23 0x0000556eaad84270 _start (../../main-github/llvm/build-all/bin/opt+0x922270)
Abort (core dumped)

Fixed in c2ae16f6a72a9e48d7c6df00ff34d12360eec190