If the scalar must be extracted and then used in the gather node,
instead we can emit shuffle instruction to avoid those extra
extractelements and vector-to-scalar and back data movement.
Part of D110978
Differential D141940
[SLP]Add shuffling of extractelements to avoid extra costs/data movement. ABataev on Jan 17 2023, 8:01 AM. Authored by
Details If the scalar must be extracted and then used in the gather node, Part of D110978
Diff Detail
Event TimelineComment Actions There's a lot going on in this patch - making it really difficult to understand
Comment Actions Yeah, I understand, did everything I could. It is a part of another big patch, which improves previous attempt to vectorize the gathered node with extractelement instructions. Original isFixedVectorShuffle is very limited, this patch introduces new function, which tries to handle more cases.
Then it reuses the existing isFixedVectorShuffle to build the shuffle for the best subset of instructions.
Comment Actions LGTM - I'm still worried there's too much in this patch, but I can't see anything wrong. Comment Actions Hi - We noticed some regressions from this. The largest ones appear to be similar to the potential regressions that were reported in https://reviews.llvm.org/D142359#4131731, but with code that includes intrinsics and didn't change the vector cost: https://godbolt.org/z/P1zoxz85f. Any idea how to improve that case and what might be going wrong? @SjoerdMeijer FYI. Thanks. Comment Actions Heads up: this commits crashes one of our tests, the stack trace is: ... #2 0x000055c65640bbd6 CrashRecoverySignalHandler(int) (clang+0x800bbd6) #3 0x00007fda88e7b1c0 __restore_rt (/usr/grte/v5/lib64/libpthread.so.0+0x151c0) #4 0x000055c655b71964 llvm::slpvectorizer::BoUpSLP::isGatherShuffledEntry(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::slpvectorizer::BoUpSLP::TreeEntry const*>&) (clang+0x7771964) #5 0x000055c655b6c47c llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>) (clang+0x776c47c) #6 0x000055c655b74903 llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) (clang+0x7774903) #7 0x000055c655b8d3a5 llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (clang+0x778d3a5) #8 0x000055c655b92a0a bool tryToVectorizeSequence<llvm::Value>(llvm::SmallVectorImpl<llvm::Value*>&, llvm::function_ref<unsigned int (llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::ArrayRef<llvm::Value*>, bool)>, bool) (clang+0x7792a0a) #9 0x000055c655b89681 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (clang+0x7789681) #10 0x000055c655b87eb5 llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (clang+0x7787eb5) #11 0x000055c655b877e6 llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x77877e6) #12 0x000055c654ce1e32 llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x68e1e32) #13 0x000055c6562b07cb llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x7eb07cb) #14 0x000055c651f02472 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x3b02472) #15 0x000055c6562b2b27 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x7eb2b27) #16 0x000055c651f04512 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x3b04512) #17 0x000055c6562afcab llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x7eafcab) #18 0x000055c651efa03d (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile>>&) (clang+0x3afa03d) #19 0x000055c651ef2d47 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>) (clang+0x3af2d47) #20 0x000055c651eef4ca clang::CodeGenAction::ExecuteAction() (clang+0x3aef4ca) #21 0x000055c6529aad5a clang::FrontendAction::Execute() (clang+0x45aad5a) #22 0x000055c652919a04 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (clang+0x4519a04) #23 0x000055c651b958ce clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (clang+0x37958ce) #24 0x000055c651b8a4af cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (clang+0x378a4af) ... Comment Actions Failing assertion is assert.h assertion failed at ...llvm/include/llvm/ADT/SmallVector.h:298 in const_reference llvm::SmallVectorTemplateCommon<llvm::Value *>::operator[](size_type) const [T = llvm::Value *]: idx < size() Comment Actions
We are working on it. Unfortunately, reducing and stripping the internal stuff for this particular test is really hard. Comment Actions I think this is the cause of the crash: https://reviews.llvm.org/D144895 |