This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Make MulOpsInlineThreshold lower to avoid excessive compilation time
ClosedPublic

Authored by mkazantsev on Jun 20 2017, 5:28 AM.

Details

Summary

The MulOpsInlineThreshold option of SCEV defaults to 1000, which is unreasonably high.
When constructing SCEVs of expressions like:

x1 = a * a
x2 = x1 * x1
x3 = x2 * x2
  ...

we end up with huge SCEVs that have the maximum allowed number of operands inlined.
Such expressions are easily produced by unrolling loops like

x = a
for (i = 0; i < n; i++)
  x = x * x

or from trickier cases where large powers are involved. If some non-linear analysis
tries to work with a SCEV that has 1000 operands, compilation may take excessively
long. The attached test does not finish within 1 minute with the default threshold.

This patch decreases the default value to 32, which looks much more reasonable if
analyses with O(N^2) or O(N^3) complexity are going to work with these SCEVs.
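
For reference, here is a minimal sketch of the option being tuned, assuming the usual cl::opt pattern in ScalarEvolution.cpp (the flag spelling and description below are approximations, not copied from the patch):

#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Sketch of the SCEV operand-inlining threshold; the flag name and description
// here are approximations of what ScalarEvolution.cpp defines.
static cl::opt<unsigned> MulOpsInlineThreshold(
    "scev-mulops-inline-threshold", cl::Hidden,
    cl::desc("Threshold for inlining multiplication operands into a SCEV"),
    cl::init(32)); // this patch lowers the default from 1000 to 32

Anyone who depends on the old behavior should still be able to restore it from the command line, e.g. with -mllvm -scev-mulops-inline-threshold=1000, assuming that flag spelling.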

Diff Detail

Repository
rL LLVM

Event Timeline

mkazantsev created this revision. Jun 20 2017, 5:28 AM

It is not clear whether we should also decrease AddOpsInlineThreshold. On one hand, in theory the same situation is possible there. On the other hand, AddExprs with repeated operands are likely to get merged into a Mul, so it may be challenging to construct a test where that threshold plays a role. I will do some more investigation about that.
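
To illustrate that point, a minimal sketch (assuming a ScalarEvolution &SE and some const SCEV *A are already in scope; this is not part of the patch):

// Hypothetical illustration: identical add operands are not expected to survive
// as a wide add, because getAddExpr folds duplicated operands into a multiply.
const SCEV *Sum = SE.getAddExpr(A, SE.getAddExpr(A, A));
// Sum is expected to come back canonicalized as roughly (3 * A), i.e. a
// SCEVMulExpr, rather than a three-operand SCEVAddExpr.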

joerg added a subscriber: joerg. Jun 20 2017, 5:51 AM

What about the old TODO item of extending powi to integer types and folding them together in the middle end?

What about the old TODO item of extending powi to integer types and folding them together in the middle end?

Joerg, what TODO item do you mean? Where can I read about it?
Anyway, even if we had a concept of powi in IR, it would not automatically mean that we have it in SCEV. It would be extremely good to have, but it looks like a massive task that will take time.

sanjoy edited edge metadata. Jun 20 2017, 10:28 PM

32 seems like a sane default value, but where exactly do we get stuck here? If there are O(N^2) or O(N^3) algorithms that are not super difficult to remove, we should just remove them instead of working around them like this.

sanjoy accepted this revision. Jun 20 2017, 10:29 PM

What about the old TODO item of extending powi to integer types and folding them together in the middle end?

Anyway, even if we had a concept of powi in IR, it would not automatically mean that we have it in SCEV. It would be extremely good to have, but it looks like a massive task that will take time.

+1

Adding new nodes to SCEV has a high cost, and needs to have good justification.

This revision is now accepted and ready to land. Jun 20 2017, 10:29 PM

Sanjoy, actually merging operand Muls into the parent Mul is O(N^2) by itself: you need linear time to merge each of them (and if we also sort them, it is N log(N)). So if you have a parent Mul with the maximum number of operands, each of whose ops is also a Mul node with the maximum number of operands, then oops - we will be merging them for a long time.

This is where we spend a lot of time in this particular test, but I don't rule out that there are other algorithms that will also die on this step.

Maybe not the best explanation, let me be more specific.

while (const SCEVMulExpr *Mul = dyn_cast<SCEVMulExpr>(Ops[Idx])) {
  if (Ops.size() > MulOpsInlineThreshold)
    break;
  // If we have a mul, expand the mul operands onto the end of the
  // operands list.
  Ops.erase(Ops.begin()+Idx);                  // Erase: linear in Ops.size()
  Ops.append(Mul->op_begin(), Mul->op_end());  // Append: linear in Mul's op count
  DeletedMul = true;
}

If we have N ops on the current level and each of these ops is a Mul with M ops (with N * M = MulOpsInlineThreshold), then we:

  1. Make N appends of M elements each, with total cost O(N * M);
  2. Make N erases, where the cost of each is linear in the number of ops on the current level. On the i-th iteration, there are roughly N + M * (i - 1) ops on the current level, so the overall cost of these erases is O(sum(i = 1 ... N) (N + M * (i - 1))), which is O(N^2 + M*N^2).

The overall complexity is therefore O((M + 1) * N^2), which is extremely bad if (for example) N = 500, M = 2. Maybe we should consider rewriting this algorithm so that we don't make these erases.
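
To make that suggestion concrete, here is a rough sketch of an erase-free variant; it assumes the surrounding getMulExpr context shown above (Ops, Idx, MulOpsInlineThreshold) and is only an illustration, not the actual patch:

// Sketch: rebuild the operand list in one pass instead of erasing in place,
// so the work is roughly linear in the total number of inlined operands.
SmallVector<const SCEV *, 8> NewOps(Ops.begin(), Ops.begin() + Idx);
bool DeletedMul = false;
for (unsigned i = Idx, e = Ops.size(); i != e; ++i) {
  const SCEVMulExpr *Mul = dyn_cast<SCEVMulExpr>(Ops[i]);
  if (Mul && NewOps.size() + (e - i) <= MulOpsInlineThreshold) {
    // Inline the nested mul's operands directly; no erase is needed.
    NewOps.append(Mul->op_begin(), Mul->op_end());
    DeletedMul = true;
  } else {
    NewOps.push_back(Ops[i]);
  }
}
if (DeletedMul) {
  Ops.clear();
  Ops.append(NewOps.begin(), NewOps.end());
}

The threshold check is only approximately the same as in the current code, but appends are amortized O(1) per inlined operand and there are no linear-time erases.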

And again, I don't see why similar problems shouldn't happen in other places.

This revision was automatically updated to reflect the committed changes.
joerg added a comment. Jun 21 2017, 2:14 AM

What about the old TODO item of extending powi to integer types and folding them together in the middle end?

Joerg, what TODO item do you mean? Where can I read about it?
Anyway, even if we had a concept of powi in IR, it would not automatically mean that we have it in SCEV. It would be extremely good to have, but it looks like a massive task that will take time.

lib/Target/README.txt line 45ff.
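
For readers without the tree handy, that TODO is the powi folding joerg mentions above. A hedged illustration of the kind of source pattern it refers to (the function here is made up for illustration):

// Hypothetical example: for a compile-time-constant n, full unrolling turns this
// into the x * x * ... * x chain discussed above; an integer powi-style node
// could represent such a power compactly instead.
unsigned ipow(unsigned x, unsigned n) {
  unsigned r = 1;
  for (unsigned i = 0; i < n; ++i)
    r *= x;
  return r;
}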