If the scalar must be extracted and then used in the gather node,
instead we can emit shuffle instruction to avoid those extra
extractelements and vector-to-scalar and back data movement.
Part of D110978
Paths
| Differential D141940
[SLP]Add shuffling of extractelements to avoid extra costs/data movement. ClosedPublic Authored by ABataev on Jan 17 2023, 8:01 AM.
Details Summary If the scalar must be extracted and then used in the gather node, Part of D110978
Diff Detail
Event TimelineComment Actions There's a lot going on in this patch - making it really difficult to understand
Comment Actions
Yeah, I understand, did everything I could. It is a part of another big patch, which improves previous attempt to vectorize the gathered node with extractelement instructions. Original isFixedVectorShuffle is very limited, this patch introduces new function, which tries to handle more cases.
Then it reuses the existing isFixedVectorShuffle to build the shuffle for the best subset of instructions.
Comment Actions LGTM - I'm still worried there's too much in this patch, but I can't see anything wrong. This revision is now accepted and ready to land.Feb 17 2023, 8:46 AM Closed by commit rG708eb1b96d9a: [SLP]Add shuffling of extractelements to avoid extra costs/data movement. (authored by ABataev). · Explain WhyFeb 20 2023, 6:16 AM This revision was automatically updated to reflect the committed changes. Comment Actions Hi - We noticed some regressions from this. The largest ones appear to be similar to the potential regressions that were reported in https://reviews.llvm.org/D142359#4131731, but with code that includes intrinsics and didn't change the vector cost: https://godbolt.org/z/P1zoxz85f. Any idea how to improve that case and what might be going wrong? @SjoerdMeijer FYI. Thanks. Comment Actions
I have a small fix, will commit it later today (working on a small reproducer) Comment Actions
Comment Actions Heads up: this commits crashes one of our tests, the stack trace is: ... #2 0x000055c65640bbd6 CrashRecoverySignalHandler(int) (clang+0x800bbd6) #3 0x00007fda88e7b1c0 __restore_rt (/usr/grte/v5/lib64/libpthread.so.0+0x151c0) #4 0x000055c655b71964 llvm::slpvectorizer::BoUpSLP::isGatherShuffledEntry(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::slpvectorizer::BoUpSLP::TreeEntry const*>&) (clang+0x7771964) #5 0x000055c655b6c47c llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>) (clang+0x776c47c) #6 0x000055c655b74903 llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) (clang+0x7774903) #7 0x000055c655b8d3a5 llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (clang+0x778d3a5) #8 0x000055c655b92a0a bool tryToVectorizeSequence<llvm::Value>(llvm::SmallVectorImpl<llvm::Value*>&, llvm::function_ref<unsigned int (llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::ArrayRef<llvm::Value*>, bool)>, bool) (clang+0x7792a0a) #9 0x000055c655b89681 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (clang+0x7789681) #10 0x000055c655b87eb5 llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (clang+0x7787eb5) #11 0x000055c655b877e6 llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x77877e6) #12 0x000055c654ce1e32 llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x68e1e32) #13 0x000055c6562b07cb llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x7eb07cb) #14 0x000055c651f02472 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x3b02472) #15 0x000055c6562b2b27 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x7eb2b27) #16 0x000055c651f04512 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x3b04512) #17 0x000055c6562afcab llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x7eafcab) #18 0x000055c651efa03d (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile>>&) (clang+0x3afa03d) #19 0x000055c651ef2d47 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>) (clang+0x3af2d47) #20 0x000055c651eef4ca clang::CodeGenAction::ExecuteAction() (clang+0x3aef4ca) #21 0x000055c6529aad5a clang::FrontendAction::Execute() (clang+0x45aad5a) #22 0x000055c652919a04 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (clang+0x4519a04) #23 0x000055c651b958ce clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (clang+0x37958ce) #24 0x000055c651b8a4af cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (clang+0x378a4af) ... Comment Actions
Thanks for the report, please send the reproducer. Comment Actions Failing assertion is assert.h assertion failed at ...llvm/include/llvm/ADT/SmallVector.h:298 in const_reference llvm::SmallVectorTemplateCommon<llvm::Value *>::operator[](size_type) const [T = llvm::Value *]: idx < size()
Comment Actions
Need a reproducer, unable to investigate the crash without actual repro. Comment Actions
We are working on it. Unfortunately, reducing and stripping the internal stuff for this particular test is really hard. Comment Actions I think this is the cause of the crash: https://reviews.llvm.org/D144895
Revision Contents
Diff 494633 llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll
llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll
llvm/test/Transforms/SLPVectorizer/X86/gather-extractelements-different-bbs.ll
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
llvm/test/Transforms/SLPVectorizer/X86/reduction-transpose.ll
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
|
I > E?