This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Add shuffling of extractelements to avoid extra costs/data movement.
ClosedPublic

Authored by ABataev on Jan 17 2023, 8:01 AM.

Details

Summary

If the scalar must be extracted and then used in the gather node,
instead we can emit shuffle instruction to avoid those extra
extractelements and vector-to-scalar and back data movement.

Part of D110978

Diff Detail

Event Timeline

ABataev created this revision.Jan 17 2023, 8:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 8:01 AM
ABataev requested review of this revision.Jan 17 2023, 8:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 8:01 AM
ABataev updated this revision to Diff 492108.Jan 25 2023, 7:39 AM

Rebase, ping!

ABataev updated this revision to Diff 493417.Jan 30 2023, 2:07 PM

Rebase, ping!

There's a lot going on in this patch - making it really difficult to understand

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
637

I > E?

6875

we're looking for VF? == VF so would this be better as:

((VF == VF1 || VF == VF2) && GatheredScalars.size() != VF)
9422
((VF == VF1 || VF == Vf2) && GatheredScalars.size() != VF)

There's a lot going on in this patch - making it really difficult to understand

Yeah, I understand, did everything I could. It is a part of another big patch, which improves previous attempt to vectorize the gathered node with extractelement instructions. Original isFixedVectorShuffle is very limited, this patch introduces new function, which tries to handle more cases.

  1. Handles extractelements with vectors of different sizes.
  2. If we have the extractelements from more than 2 vectors, it selects 2 vector with the most number of uses in the node.
  3. If the number of the extracts from one of the vectors is greater than the extracts from 2 vectors with other vector factors, select permutation of single vector, not the very first pair.

Then it reuses the existing isFixedVectorShuffle to build the shuffle for the best subset of instructions.
If it is not succeeded, just exit (later can be extended to try to analyze some other subsets).

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
637

Very rare case, missed it.

ABataev updated this revision to Diff 495192.Feb 6 2023, 9:56 AM

Address comments

A couple of minors

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6852

if (Value *VecBase = AdjustExtractsCost(Cost, ExtractMask)) ?

9402

if (Value *VecBase = AdjustExtracts(E, ExtractMask)) ?

ABataev updated this revision to Diff 498393.Feb 17 2023, 8:42 AM

Address comments

RKSimon accepted this revision.Feb 17 2023, 8:46 AM

LGTM - I'm still worried there's too much in this patch, but I can't see anything wrong.

This revision is now accepted and ready to land.Feb 17 2023, 8:46 AM

Hi - We noticed some regressions from this. The largest ones appear to be similar to the potential regressions that were reported in https://reviews.llvm.org/D142359#4131731, but with code that includes intrinsics and didn't change the vector cost: https://godbolt.org/z/P1zoxz85f.

Any idea how to improve that case and what might be going wrong? @SjoerdMeijer FYI. Thanks.

Hi - We noticed some regressions from this. The largest ones appear to be similar to the potential regressions that were reported in https://reviews.llvm.org/D142359#4131731, but with code that includes intrinsics and didn't change the vector cost: https://godbolt.org/z/P1zoxz85f.

Any idea how to improve that case and what might be going wrong? @SjoerdMeijer FYI. Thanks.

I have a small fix, will commit it later today (working on a small reproducer)

Hi - We noticed some regressions from this. The largest ones appear to be similar to the potential regressions that were reported in https://reviews.llvm.org/D142359#4131731, but with code that includes intrinsics and didn't change the vector cost: https://godbolt.org/z/P1zoxz85f.

Any idea how to improve that case and what might be going wrong? @SjoerdMeijer FYI. Thanks.

Fixed in cbcdd747e85b8d33b821d94d8114b971f31fd0d2

Fixed in cbcdd747e85b8d33b821d94d8114b971f31fd0d2

Yeah the results look great. Thanks!

Heads up: this commits crashes one of our tests, the stack trace is:

...
 #2 0x000055c65640bbd6 CrashRecoverySignalHandler(int) (clang+0x800bbd6)
 #3 0x00007fda88e7b1c0 __restore_rt (/usr/grte/v5/lib64/libpthread.so.0+0x151c0)
 #4 0x000055c655b71964 llvm::slpvectorizer::BoUpSLP::isGatherShuffledEntry(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::slpvectorizer::BoUpSLP::TreeEntry const*>&) (clang+0x7771964)
 #5 0x000055c655b6c47c llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>) (clang+0x776c47c)
 #6 0x000055c655b74903 llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) (clang+0x7774903)
 #7 0x000055c655b8d3a5 llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (clang+0x778d3a5)
 #8 0x000055c655b92a0a bool tryToVectorizeSequence<llvm::Value>(llvm::SmallVectorImpl<llvm::Value*>&, llvm::function_ref<unsigned int (llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::ArrayRef<llvm::Value*>, bool)>, bool) (clang+0x7792a0a)
 #9 0x000055c655b89681 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (clang+0x7789681)
#10 0x000055c655b87eb5 llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (clang+0x7787eb5)
#11 0x000055c655b877e6 llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x77877e6)
#12 0x000055c654ce1e32 llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x68e1e32)
#13 0x000055c6562b07cb llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x7eb07cb)
#14 0x000055c651f02472 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x3b02472)
#15 0x000055c6562b2b27 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x7eb2b27)
#16 0x000055c651f04512 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x3b04512)
#17 0x000055c6562afcab llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x7eafcab)
#18 0x000055c651efa03d (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile>>&) (clang+0x3afa03d)
#19 0x000055c651ef2d47 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>) (clang+0x3af2d47)
#20 0x000055c651eef4ca clang::CodeGenAction::ExecuteAction() (clang+0x3aef4ca)
#21 0x000055c6529aad5a clang::FrontendAction::Execute() (clang+0x45aad5a)
#22 0x000055c652919a04 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (clang+0x4519a04)
#23 0x000055c651b958ce clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (clang+0x37958ce)
#24 0x000055c651b8a4af cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (clang+0x378a4af)
...

Heads up: this commits crashes one of our tests, the stack trace is:

...
 #2 0x000055c65640bbd6 CrashRecoverySignalHandler(int) (clang+0x800bbd6)
 #3 0x00007fda88e7b1c0 __restore_rt (/usr/grte/v5/lib64/libpthread.so.0+0x151c0)
 #4 0x000055c655b71964 llvm::slpvectorizer::BoUpSLP::isGatherShuffledEntry(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::slpvectorizer::BoUpSLP::TreeEntry const*>&) (clang+0x7771964)
 #5 0x000055c655b6c47c llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry const*, llvm::ArrayRef<llvm::Value*>) (clang+0x776c47c)
 #6 0x000055c655b74903 llvm::slpvectorizer::BoUpSLP::getTreeCost(llvm::ArrayRef<llvm::Value*>) (clang+0x7774903)
 #7 0x000055c655b8d3a5 llvm::SLPVectorizerPass::tryToVectorizeList(llvm::ArrayRef<llvm::Value*>, llvm::slpvectorizer::BoUpSLP&, bool) (clang+0x778d3a5)
 #8 0x000055c655b92a0a bool tryToVectorizeSequence<llvm::Value>(llvm::SmallVectorImpl<llvm::Value*>&, llvm::function_ref<unsigned int (llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::Value*, llvm::Value*)>, llvm::function_ref<bool (llvm::ArrayRef<llvm::Value*>, bool)>, bool) (clang+0x7792a0a)
 #9 0x000055c655b89681 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (clang+0x7789681)
#10 0x000055c655b87eb5 llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (clang+0x7787eb5)
#11 0x000055c655b877e6 llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x77877e6)
#12 0x000055c654ce1e32 llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x68e1e32)
#13 0x000055c6562b07cb llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x7eb07cb)
#14 0x000055c651f02472 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (clang+0x3b02472)
#15 0x000055c6562b2b27 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x7eb2b27)
#16 0x000055c651f04512 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x3b04512)
#17 0x000055c6562afcab llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (clang+0x7eafcab)
#18 0x000055c651efa03d (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile>>&) (clang+0x3afa03d)
#19 0x000055c651ef2d47 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>) (clang+0x3af2d47)
#20 0x000055c651eef4ca clang::CodeGenAction::ExecuteAction() (clang+0x3aef4ca)
#21 0x000055c6529aad5a clang::FrontendAction::Execute() (clang+0x45aad5a)
#22 0x000055c652919a04 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (clang+0x4519a04)
#23 0x000055c651b958ce clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (clang+0x37958ce)
#24 0x000055c651b8a4af cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (clang+0x378a4af)
...

Thanks for the report, please send the reproducer.

Failing assertion is

assert.h assertion failed at ...llvm/include/llvm/ADT/SmallVector.h:298 in const_reference llvm::SmallVectorTemplateCommon<llvm::Value *>::operator[](size_type) const [T = llvm::Value *]: idx < size()

Heads up: this commits crashes one of our tests, the stack trace is:

Failing assertion is

assert.h assertion failed at ...llvm/include/llvm/ADT/SmallVector.h:298 in const_reference llvm::SmallVectorTemplateCommon<llvm::Value *>::operator[](size_type) const [T = llvm::Value *]: idx < size()

Heads up: this commits crashes one of our tests, the stack trace is:

Need a reproducer, unable to investigate the crash without actual repro.

Need a reproducer, unable to investigate the crash without actual repro.

We are working on it. Unfortunately, reducing and stripping the internal stuff for this particular test is really hard.

I think this is the cause of the crash: https://reviews.llvm.org/D144895
@ABataev could you take a look ?