This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Add support for Splats in ComplexDeinterleaving pass
ClosedPublic

Authored by igor.kirillov on Jun 20 2023, 7:39 AM.

Details

Summary

This commit allows complex number intrinsics to be generated for expressions
with constants or loop invariants, which are represented as splats.
For instance, after vectorizing loops in the following code snippets,
the ComplexDeinterleaving pass will be able to generate complex number
intrinsics:

complex<> x = ...;
for (int i = 0; i < N; ++i)
    c[i] = a[i] * b[i] * x;

or

for (int i = 0; i < N; ++i)
    c[i] = a[i] * b[i] * (11.0 + 3.0i);
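For reference, the two snippets above can be written out as self-contained C++. This is only a sketch of the source-level loops: the function names mulByInvariant and mulByConstant, the use of std::vector, and the choice of double are assumptions made for illustration and do not come from the patch. Whether the loops are vectorized and whether the pass then emits complex intrinsics depends on the target and optimization flags.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Loop-invariant multiplier: x is the same in every iteration, so a
// vectorizer can represent it as a splat (hypothetical helper name).
std::vector<Complex> mulByInvariant(const std::vector<Complex> &a,
                                    const std::vector<Complex> &b,
                                    Complex x) {
  assert(a.size() == b.size());
  std::vector<Complex> c(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    c[i] = a[i] * b[i] * x;
  return c;
}

// Constant multiplier: 11.0 + 3.0i from the second snippet, which would
// likewise appear as a splat of a constant (hypothetical helper name).
std::vector<Complex> mulByConstant(const std::vector<Complex> &a,
                                   const std::vector<Complex> &b) {
  assert(a.size() == b.size());
  const Complex k(11.0, 3.0);
  std::vector<Complex> c(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    c[i] = a[i] * b[i] * k;
  return c;
}
```

In both functions the third factor does not change across iterations, which is the property the summary relies on.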

Depends on D153446 and D153856

Event Timeline

igor.kirillov created this revision. Jun 20 2023, 7:39 AM
Herald added a project: Restricted Project. Jun 20 2023, 7:39 AM
igor.kirillov requested review of this revision. Jun 20 2023, 7:39 AM
igor.kirillov edited the summary of this revision.

Refactor part of the code into separate revision (D153446)

Looks good to me, but can you precommit the tests so we can see the expected differences?

llvm/test/CodeGen/AArch64/complex-deinterleaving-splat.ll
52

What's the reason for the square brackets here?

igor.kirillov added inline comments. Jun 27 2023, 3:34 AM
llvm/test/CodeGen/AArch64/complex-deinterleaving-splat.ll
52

That's how a variable of type complex<double> looks when translated into the IR, so I thought it was better to leave it as an aggregate.

igor.kirillov edited the summary of this revision.

Precommit tests

NickGuy accepted this revision. Jun 30 2023, 6:15 AM
This revision is now accepted and ready to land. Jun 30 2023, 6:15 AM
This revision was landed with ongoing or failed builds. Jul 5 2023, 10:03 AM
This revision was automatically updated to reflect the committed changes.

This causes a segfault when linking Firefox for arm64 mac with LTO. I can provide a reproducer if necessary, but it's large (several GB). The stack trace looks like this:

Stack dump:
0.	Running pass 'Function Pass Manager' on module 'builds/worker/workspace/obj-build/aarch64-apple-darwin/release/libgkrust.a(qcms-9781f4d323403c02.qcms.81a5ab1e-cgu.0.rcgu.o)56213408'.
1.	Running pass 'Complex Deinterleaving Pass' on function '@qcms_transform_data_rgb_out_lut_neon'
 #0 0x00007fb7032c6068 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /tmp/llvm/llvm/lib/Support/Unix/Signals.inc:602:13
 #1 0x00007fb7032c4440 llvm::sys::RunSignalHandlers() /tmp/llvm/llvm/lib/Support/Signals.cpp:105:18
 #2 0x00007fb7032c66e8 SignalHandler(int) /tmp/llvm/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007fb70245af90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #4 0x00007fb70334be8e llvm::User::getNumOperands() const /tmp/llvm/llvm/include/llvm/IR/User.h:191:44
 #5 0x00007fb70334be8e llvm::HungoffOperandTraits<2u>::operands(llvm::User const*) /tmp/llvm/llvm/include/llvm/IR/OperandTraits.h:103:15
 #6 0x00007fb70334be8e llvm::PHINode::getNumOperands() const /tmp/llvm/llvm/include/llvm/IR/Instructions.h:2973:1
 #7 0x00007fb70334be8e llvm::PHINode::addIncoming(llvm::Value*, llvm::BasicBlock*) /tmp/llvm/llvm/include/llvm/IR/Instructions.h:2886:9
 #8 0x00007fb703545e92 (anonymous namespace)::ComplexDeinterleavingGraph::processReductionOperation(llvm::Value*, (anonymous namespace)::ComplexDeinterleavingCompositeNode*) /tmp/llvm/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1943:45
 #9 0x00007fb703545e92 (anonymous namespace)::ComplexDeinterleavingGraph::replaceNode(llvm::IRBuilderBase&, (anonymous namespace)::ComplexDeinterleavingCompositeNode*) /tmp/llvm/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1900:5
#10 0x00007fb70353d1e2 (anonymous namespace)::ComplexDeinterleavingGraph::replaceNodes() /tmp/llvm/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:1976:29
#11 0x00007fb70353d1e2 (anonymous namespace)::ComplexDeinterleaving::evaluateBasicBlock(llvm::BasicBlock*) /tmp/llvm/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:520:11
#12 0x00007fb70353d1e2 (anonymous namespace)::ComplexDeinterleaving::runOnFunction(llvm::Function&) /tmp/llvm/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:479:16
#13 0x00007fb70353e35f (anonymous namespace)::ComplexDeinterleavingLegacyPass::runOnFunction(llvm::Function&) /tmp/llvm/llvm/lib/CodeGen/ComplexDeinterleavingPass.cpp:460:42
#14 0x00007fb7033f3216 llvm::FPPassManager::runOnFunction(llvm::Function&) /tmp/llvm/llvm/lib/IR/LegacyPassManager.cpp:1435:27
#15 0x00007fb7033f8b13 llvm::FPPassManager::runOnModule(llvm::Module&) /tmp/llvm/llvm/lib/IR/LegacyPassManager.cpp:1481:13
#16 0x00007fb7033f389e (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /tmp/llvm/llvm/lib/IR/LegacyPassManager.cpp:0:27
#17 0x00007fb7033f389e llvm::legacy::PassManagerImpl::run(llvm::Module&) /tmp/llvm/llvm/lib/IR/LegacyPassManager.cpp:535:44
#18 0x00007fb7047898ee std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>::operator bool() const /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/unique_ptr.h:479:22
#19 0x00007fb7047898ee codegen(llvm::lto::Config const&, llvm::TargetMachine*, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, unsigned int, llvm::Module&, llvm::ModuleSummaryIndex const&) /tmp/llvm/llvm/lib/LTO/LTOBackend.cpp:413:31
#20 0x00007fb70478a7a2 std::_Function_base::~_Function_base() /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/std_function.h:243:11
#21 0x00007fb70478a7a2 llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::StringMap<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long>>, llvm::MallocAllocator> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, std::vector<std::pair<llvm::StringRef, llvm::BitcodeModule>, std::allocator<std::pair<llvm::StringRef, llvm::BitcodeModule>>>>*, std::vector<unsigned char, std::allocator<unsigned char>> const&)::$_1::operator()(llvm::Module&, llvm::TargetMachine*, std::unique_ptr<llvm::ToolOutputFile, std::default_delete<llvm::ToolOutputFile>>) const /tmp/llvm/llvm/lib/LTO/LTOBackend.cpp:585:9
#22 0x00007fb70478a63d llvm::lto::thinBackend(llvm::lto::Config const&, unsigned int, std::function<llvm::Expected<std::unique_ptr<llvm::CachedFileStream, std::default_delete<llvm::CachedFileStream>>> (unsigned int, llvm::Twine const&)>, llvm::Module&, llvm::ModuleSummaryIndex const&, llvm::StringMap<std::unordered_set<unsigned long, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<unsigned long>>, llvm::MallocAllocator> const&, llvm::DenseMap<unsigned long, llvm::GlobalValueSummary*, llvm::DenseMapInfo<unsigned long, void>, llvm::detail::DenseMapPair<unsigned long, llvm::GlobalValueSummary*>> const&, llvm::MapVector<llvm::StringRef, llvm::BitcodeModule, llvm::DenseMap<llvm::StringRef, unsigned int, llvm::DenseMapInfo<llvm::StringRef, void>, llvm::detail::DenseMapPair<llvm::StringRef, unsigned int>>, std::vector<std::pair<llvm::StringRef, llvm::BitcodeModule>, std::allocator<std::pair<llvm::StringRef, llvm::BitcodeModule>>>>*, std::vector<unsigned char, std::allocator<unsigned char>> const&) /tmp/llvm/llvm/lib/LTO/LTOBackend.cpp:0:10

@glandium, thanks. I've reproduced the crash and prepared a fix: https://reviews.llvm.org/D154598