Need to add NumSrcElts param to is..Mask functions in
ShuffleVectorInstruction class for better mask analysis. Mask.size() not
always matches the sizes of the permuted vector(s). Allows to better
estimate the cost in SLP and fix uses of the functions in other cases.
Details
- Reviewers
RKSimon - Commits
- rGe22818d5c98a: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
rGb186f1f68be1: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
rG6f43d28f3452: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
rG9f5960e004ff: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
rGc88c281cf1ac: [IR]Add NumSrcElts param to is..Mask static function in ShuffleVectorInst.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/IR/Instructions.cpp | ||
---|---|---|
2368–2369 | If Mask.size() == 4 and NumSrcElts == 6. Then this considers <5, 4, 3, 2> as a reverse? |
llvm/lib/IR/Instructions.cpp | ||
---|---|---|
2368–2369 | Yes. Need to add a check that the operation does not change the size, just like non-static isReverse() does. |
llvm/lib/IR/Instructions.cpp | ||
---|---|---|
2350 | Why is the argument an int if we're going to cast it to unsigned? |
llvm/lib/IR/Instructions.cpp | ||
---|---|---|
2350 | For conformance. All other functions have corresponding int parameter, though treat it as unsigned. |
No objections to this, but I do think it'd make sense to split into 2 patches - (1) adding the NumSrcElts argument to the shuffle matching helpers + additional unit test coverage and (2) using it in SLP
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
6521 | Maybe split this into a series of separate if () { return true/false; } to make it easier to grok? |
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp | ||
---|---|---|
6521 | Done |
hitting an opt -p slp-vectorizer assert on the following IR:
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-windows-msvc19.34.0" define ptr @f() { entry: %cmp.i.i = fcmp olt float 0.000000e+00, 0.000000e+00 %0 = zext i1 %cmp.i.i to i64 %pgocount88 = load i64, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 9), align 8 %1 = or i64 %pgocount88, %0 store i64 %1, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 9), align 8 %cond.i.i = select i1 %cmp.i.i, float 0.000000e+00, float 0.000000e+00 %cmp1.i.i = fcmp ogt float %cond.i.i, 0.000000e+00 %2 = zext i1 %cmp1.i.i to i64 %pgocount89 = load i64, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 10), align 8 %3 = or i64 %pgocount89, %2 store i64 %3, ptr getelementptr inbounds ([17 x i64], ptr null, i64 0, i64 10), align 8 %cmp.i9.i = fcmp olt float 0.000000e+00, 0.000000e+00 %cond.i10.i = select i1 %cmp.i9.i, float 0.000000e+00, float 0.000000e+00 %cmp1.i11.i = fcmp ogt float %cond.i10.i, 0.000000e+00 %cmp.i14.i = fcmp olt float 0.000000e+00, 0.000000e+00 %cond.i15.i = select i1 %cmp.i14.i, float 0.000000e+00, float 0.000000e+00 %cmp1.i16.i = fcmp ogt float %cond.i15.i, 0.000000e+00 %cmp.i19.i = fcmp olt float 0.000000e+00, 0.000000e+00 %cond.i20.i = select i1 %cmp.i19.i, float 0.000000e+00, float 0.000000e+00 %cmp1.i21.i = fcmp ogt float %cond.i20.i, 0.000000e+00 ret ptr null }
Coming into this quite late, but I'm confused by the need for this API change.
Doesn't the existing ArrayRef::take_front API fulfill the same need? If you need to consider a prefix of the mask, why not just take a reference to that prefix?
Sure, it may work this way, but better to have this in the interface functions. It helped to find so many bugs in the cost model. Plus, this extract thing must be copied over almost all target-specific TTI implementations.
another slp-vectorizer crash on
target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64" target triple = "thumbv7-unknown-linux-android24" define void @_ZNK8SkStroke10strokeRectERK6SkRectP6SkPath15SkPathDirection() { entry: %0 = load float, ptr null, align 4 %1 = load float, ptr null, align 4 %2 = load float, ptr null, align 4 %cmp.i = fcmp ogt float %1, %0 %rect.sroa.14.0 = select i1 %cmp.i, float %1, float 0.000000e+00 %rect.sroa.0.0 = select i1 %cmp.i, float %0, float 0.000000e+00 %cmp4.i = fcmp ogt float 0.000000e+00, %2 %rect.sroa.19.0 = select i1 %cmp4.i, float 0.000000e+00, float 0.000000e+00 %rect.sroa.9.0 = select i1 %cmp4.i, float %2, float 0.000000e+00 store float %rect.sroa.0.0, ptr null, align 4 %rect.sroa.9.0.r.sroa_idx = getelementptr i8, ptr null, i32 4 store float %rect.sroa.9.0, ptr %rect.sroa.9.0.r.sroa_idx, align 4 %rect.sroa.14.0.r.sroa_idx = getelementptr i8, ptr null, i32 8 store float %rect.sroa.14.0, ptr %rect.sroa.14.0.r.sroa_idx, align 4 %rect.sroa.19.0.r.sroa_idx = getelementptr i8, ptr null, i32 12 store float %rect.sroa.19.0, ptr %rect.sroa.19.0.r.sroa_idx, align 4 ret void }
On the first point, I disagree. Having a simpler API and shifting the mask manipulation to the client (SLP) seems strictly better. On the second, I don't see (in your patch) need for truncation *within* backend modeling, I only see it from the client (SLP). Am I missing something?
- It is supposed to be used by other clients too. Currently it mostly affects SLP.
- Backend relies on the different functions, which already include check for size change. This change actually make the API for middle-end more strictly follow the backend requirements and allows better cost estimation.
Hi @ABataev
The following fails with an assertion with this patch:
opt bbi-87376.ll -passes=vector-combine -o /dev/null
It crashes with
opt: ../lib/IR/Instructions.cpp:2129: bool isSingleSourceMaskImpl(ArrayRef<int>, int): Assertion `I >= 0 && I < (NumOpElts * 2) && "Out-of-bounds shuffle mask element"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: ../../main-github/llvm/build-all/bin/opt bbi-87376.ll -passes=vector-combine -o /dev/null #0 0x0000556ead38c447 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (../../main-github/llvm/build-all/bin/opt+0x2f2a447) #1 0x0000556ead389f9e llvm::sys::RunSignalHandlers() (../../main-github/llvm/build-all/bin/opt+0x2f27f9e) #2 0x0000556ead38cb0f SignalHandler(int) (../../main-github/llvm/build-all/bin/opt+0x2f2ab0f) #3 0x00007f28ab3f0630 __restore_rt (/lib64/libpthread.so.0+0xf630) #4 0x00007f28a8b37387 raise (/lib64/libc.so.6+0x36387) #5 0x00007f28a8b38a78 abort (/lib64/libc.so.6+0x37a78) #6 0x00007f28a8b301a6 __assert_fail_base (/lib64/libc.so.6+0x2f1a6) #7 0x00007f28a8b30252 (/lib64/libc.so.6+0x2f252) #8 0x0000556eacd49919 llvm::ShuffleVectorInst::isReverseMask(llvm::ArrayRef<int>, int) (../../main-github/llvm/build-all/bin/opt+0x28e7919) #9 0x0000556eabe3ea36 llvm::BasicTTIImplBase<llvm::X86TTIImpl>::improveShuffleKindFromMask(llvm::TargetTransformInfo::ShuffleKind, llvm::ArrayRef<int>, llvm::VectorType*, int&, llvm::VectorType*&) const (../../main-github/llvm/build-all/bin/opt+0x19dca36) #10 0x0000556eabe3cb32 llvm::X86TTIImpl::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, llvm::TargetTransformInfo::TargetCostKind, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) (../../main-github/llvm/build-all/bin/opt+0x19dab32) #11 0x0000556eac5a944d llvm::TargetTransformInfo::getShuffleCost(llvm::TargetTransformInfo::ShuffleKind, llvm::VectorType*, llvm::ArrayRef<int>, llvm::TargetTransformInfo::TargetCostKind, int, llvm::VectorType*, llvm::ArrayRef<llvm::Value const*>) const (../../main-github/llvm/build-all/bin/opt+0x214744d) #12 0x0000556ead852a58 (anonymous namespace)::VectorCombine::run()::$_12::operator()(llvm::Instruction&) const (../../main-github/llvm/build-all/bin/opt+0x33f0a58) #13 0x0000556ead84d7e5 llvm::VectorCombinePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x33eb7e5) #14 0x0000556ead5a9a7d llvm::detail::PassModel<llvm::Function, llvm::VectorCombinePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x3147a7d) #15 0x0000556eacdac8c4 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0x294a8c4) #16 0x0000556eab18572d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (../../main-github/llvm/build-all/bin/opt+0xd2372d) #17 0x0000556eacdb0cae llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x294ecae) #18 0x0000556eab1854cd llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0xd234cd) #19 0x0000556eacdaba54 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (../../main-github/llvm/build-all/bin/opt+0x2949a54) #20 0x0000556eaad8a0d3 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) (../../main-github/llvm/build-all/bin/opt+0x9280d3) #21 0x0000556eaad9771c main (../../main-github/llvm/build-all/bin/opt+0x93571c) #22 0x00007f28a8b23555 __libc_start_main (/lib64/libc.so.6+0x22555) #23 0x0000556eaad84270 _start (../../main-github/llvm/build-all/bin/opt+0x922270) Abort (core dumped)
Update this comment