Page MenuHomePhabricator

[LoopUtils] Avoid expanding complicated SCEVNAry when rewriteLoopExitValues
Needs ReviewPublic

Authored by guopeilin on May 24 2021, 12:34 AM.

Diff Detail

Event Timeline

guopeilin created this revision.May 24 2021, 12:34 AM
guopeilin requested review of this revision.May 24 2021, 12:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 24 2021, 12:34 AM
nikic added a comment.May 24 2021, 1:17 AM

Why is this not already covered by the isHighCostExpansion() check?

I'm not a fan of this direction.
Which stack overflow are you seeing?

llvm/lib/Transforms/Utils/LoopUtils.cpp
1436–1468

Because we expand regardless of the high cost,
because we need to do isValidRewrite() sanity check first,
before calling canLoopBeDeleted(), just in case the latter
returns true and we'd want to expand anyways.

Why is this not already covered by the isHighCostExpansion() check?

Firstly,within the function isHighCostExpansion, only the SCEVUDivExpr and SCEVMinMaxExpr could have the chance to return true.
Secondly, even though function isHighCostExpansion return true, the result is used in the following code:

// Only do the rewrite when the ExitValue can be expanded cheaply.
// If LoopCanBeDel is true, rewrite exit value aggressively.
if (ReplaceExitValue == OnlyCheapRepl && !LoopCanBeDel && Phi.HighCost) {
  DeadInsts.push_back(ExitVal);
  continue;
}

And in this case, the variable LoopCanBeDel is true, this means we will expand anyway.
So I guess we need some early-return way to avoid this extremely complicated cases.

I'm not a fan of this direction.
Which stack overflow are you seeing?

The details are that the variable %add.i.2.i can be computed by SCEV, and after expanding the SCEV value, this variable contains thousands of operands.
During ISel, we need to build a SelectionDAG first, so we need to visit all the operands to build SDNodes, and this will exhaust the stack due to plenty of operands.

Also, if you use opt -indvars -S, you can see that the program keeps printing the IR for a very long long time. More specifically, the program keeps printing the replaced variable.

lebedev.ri requested changes to this revision.May 24 2021, 5:43 AM

Why is this not already covered by the isHighCostExpansion() check?

Firstly,within the function isHighCostExpansion, only the SCEVUDivExpr and SCEVMinMaxExpr could have the chance to return true.

You are looking at the wrong function.

Secondly, even though function isHighCostExpansion return true, the result is used in the following code:

// Only do the rewrite when the ExitValue can be expanded cheaply.
// If LoopCanBeDel is true, rewrite exit value aggressively.
if (ReplaceExitValue == OnlyCheapRepl && !LoopCanBeDel && Phi.HighCost) {
  DeadInsts.push_back(ExitVal);
  continue;
}

And in this case, the variable LoopCanBeDel is true, this means we will expand anyway.
So I guess we need some early-return way to avoid this extremely complicated cases.

This revision now requires changes to proceed.May 24 2021, 5:43 AM
guopeilin updated this revision to Diff 347369.May 24 2021, 6:17 AM

I'm very wary of more magic numbers.
Could you please specify, is the problem the time the transformation takes,
or it's effect on some further passes?

mdchen added a subscriber: mdchen.May 24 2021, 8:14 PM

Actually, this transformation doesn't take too much time. However the IR it generated is very long. And this will have a great effect on the further pass. More specifically, the expanding SCEV which replaces the variable %add.i.2.i contains thousands of operands.
So if you use opt -indvars -S, you can see the program keeps printing the IR all the time.
Besides, I use clang -O3 test.cpp and find that the program gets a segment fault with a stack dump like the followings:

...
llvm::sys::RunSignalHandlers()
SignalHandler(int)
llvm::IntegerType::classof(llvm::Type const*)
llvm::isa_impl<llvm::IntegerType, llvm::Type, void>::doit(llvm::Type const&)
llvm::isa_impl_cl<llvm::IntegerType, llvm::Type const*>::doit(llvm::Type const*) 
llvm::isa_impl_wrap<llvm::IntegerType, llvm::Type const*, llvm::Type const*>::doit(llvm::Type const* const&)
llvm::isa_impl_wrap<llvm::IntegerType, llvm::Type* const, llvm::Type const*>::doit(llvm::Type* const&) 
bool llvm::isa<llvm::IntegerType, llvm::Type*>(llvm::Type* const&) 
llvm::cast_retty<llvm::IntegerType, llvm::Type*>::ret_type llvm::cast<llvm::IntegerType, llvm::Type>(llvm::Type*)
llvm::EVT::getEVT(llvm::Type*, bool)
llvm::TargetLoweringBase::getValueType(llvm::DataLayout const&, llvm::Type*, bool) const
llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*)
llvm::SelectionDAGBuilder::getValue(llvm::Value const*)
llvm::SelectionDAGBuilder::visitSelect(llvm::User const&)
llvm::SelectionDAGBuilder::visit(unsigned int, llvm::User const&)
llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*)
llvm::SelectionDAGBuilder::getValue(llvm::Value const*)
llvm::SelectionDAGBuilder::visitSelect(llvm::User const&)
llvm::SelectionDAGBuilder::visit(unsigned int, llvm::User const&)
llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*)
llvm::SelectionDAGBuilder::getValue(llvm::Value const*)
llvm::SelectionDAGBuilder::visitBinary(llvm::User const&, unsigned int)
llvm::SelectionDAGBuilder::visitAdd(llvm::User const&)
llvm::SelectionDAGBuilder::visit(unsigned int, llvm::User const&) 
llvm::SelectionDAGBuilder::getValueImpl(llvm::Value const*)
llvm::SelectionDAGBuilder::getValue(llvm::Value const*)
llvm::SelectionDAGBuilder::visitICmp(llvm::User const&)
(handreds of other function visit and getValue and so on)
...
...
clang-10: error: unable to execute command: Segmentation fault (core dumped)

From the stack dump, we can see that during the building SelectionDAG, visit thousands of operands finally exhaust the stack and left no enough space for a new Value. And the incomplete Value finally causes Segment Fault.
In short, any way that tries to visit the operands generated by expanding the extremely complicated SCEV computed in this test case will get stuck. 1. we using -S to print the IR will get stuck. 2. Building SelectionDAG by visiting all operands will get stuck.
One more thing is that ISel pass is controlled by llc, so I cannot reproduce the segment fault here cause commands opt -indvars test-case.ll | llc doesn't work as I want. This will still get stuck in opt cause it wants to generate IR first. It would be greatly appreciated if you have any ideas about this.

As for the magic number, I guess that the key point is that we need to add a case for complicated SCEV. And there might have another way to determine whether a SCEV is complicated rather than counting the operands.

I'm very wary of more magic numbers.
Could you please specify, is the problem the time the transformation takes,
or it's effect on some further passes?