This is an archive of the discontinued LLVM Phabricator instance.

[Transforms][Reassociate] "Reassociate expressions" pass optimizations not always profitable
AbandonedPublic

Authored by pawosm01 on May 27 2023, 11:49 AM.

Details

Reviewers
qcolombet
nikic
Summary

Consider the following piece of code:

void innermost_loop(int i, double d1, double d2, double delta, int n, double cells[n])
{
  int j;
  const double d1d = d1 * delta;
  const double d2d = d2 * delta;

  for (j = 0; j <= i; j++)
    cells[j] = d1d * cells[j + 1] + d2d * cells[j];
}

When compiling at -Ofast level, after the "Reassociate expressions"
pass, this code is transformed into an equivalent of:

int j;

for (j = 0; j <= i; j++)
  cells[j] = (d1 * cells[j + 1] + d2 * cells[j]) * delta;

Effectively, the computation of those loop invariants isn't done
before the loop anymore, we have one extra multiplication on each
loop iteration instead. Sadly, this results in a significant
performance hit.

This patch makes the OptimizeAdd() function of the "Reassociate
expressions" pass aware of modifying operations in a loop and
bails out when some of the operands are pulled from the outside
of the loop.

See also: https://github.com/llvm/llvm-project/issues/62736

Diff Detail

Event Timeline

pawosm01 created this revision.May 27 2023, 11:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2023, 11:49 AM
pawosm01 requested review of this revision.May 27 2023, 11:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2023, 11:49 AM
pawosm01 updated this revision to Diff 526280.May 27 2023, 11:51 AM

After the precommit comes the real thing.

tschuett added inline comments.
llvm/lib/Transforms/Scalar/Reassociate.cpp
1707

for (unsigned i = 0, size_t e = Ops.size(); i != e; ++i) {
}

Just to get the obvious issues out of the way...

llvm/lib/Transforms/Scalar/Reassociate.cpp
1699

These need to be fetched from the pass manager instead.

1701

Should be just LoopInfo.

1732

This looks like a very complicated and inefficient re-implementation of LI->getLoopFor().

llvm/test/Transforms/Reassociate/reassociate-not-from-the-outside-of-the-loop.ll
2

Should use update_test_checks.py.

mgabka added a subscriber: mgabka.May 30 2023, 12:51 AM
pawosm01 added inline comments.May 30 2023, 7:10 AM
llvm/test/Transforms/Reassociate/reassociate-not-from-the-outside-of-the-loop.ll
2

It will make it look like this:

--- a/llvm/test/Transforms/Reassociate/reassociate-not-from-the-outside-of-the-loop.ll
+++ b/llvm/test/Transforms/Reassociate/reassociate-not-from-the-outside-of-the-loop.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
 ; RUN: opt -passes=reassociate -S < %s | FileCheck %s

 ; This test is to ensure that no computations are pulled into a loop
@@ -9,12 +10,35 @@
 ; Reassociate pass.

 define void @innermost_loop(i32 %i, double %d1, double %d2, double %delta, ptr %cells) {
-; CHECK-LABEL: @innermost_loop(
+; CHECK-LABEL: define void @innermost_loop
+; CHECK-SAME: (i32 [[I:%.*]], double [[D1:%.*]], double [[D2:%.*]], double [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[MUL:%.*]] = fmul fast double [[DELTA]], [[D1]]
+; CHECK-NEXT:    [[MUL1:%.*]] = fmul fast double [[DELTA]], [[D2]]
+; CHECK-NEXT:    br label [[FOR_COND:%.*]]
+; CHECK:       for.cond:
+; CHECK-NEXT:    [[J_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD:%.*]], [[FOR_BODY:%.*]] ]
+; CHECK-NEXT:    [[CMP_NOT:%.*]] = icmp sgt i32 [[J_0]], [[I]]
+; CHECK-NEXT:    br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[ADD]] = add nuw nsw i32 [[J_0]], 1
+; CHECK-NEXT:    [[IDXPROM:%.*]] = zext i32 [[ADD]] to i64
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load double, ptr [[ARRAYIDX]], align 8
+; CHECK-NEXT:    [[MUL2:%.*]] = fmul fast double [[MUL]], [[TMP0]]
+; CHECK-NEXT:    [[IDXPROM3:%.*]] = zext i32 [[J_0]] to i64
+; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM3]]
+; CHECK-NEXT:    [[TMP1:%.*]] = load double, ptr [[ARRAYIDX4]], align 8
+; CHECK-NEXT:    [[MUL5:%.*]] = fmul fast double [[MUL1]], [[TMP1]]
+; CHECK-NEXT:    [[ADD6:%.*]] = fadd fast double [[MUL5]], [[MUL2]]
+; CHECK-NEXT:    store double [[ADD6]], ptr [[ARRAYIDX4]], align 8
+; CHECK-NEXT:    br label [[FOR_COND]]
+; CHECK:       for.end:
+; CHECK-NEXT:    ret void
+;

Is that what is expected here? It doesn't seem focusing on the matter of this patch, and it doesn't seem futureproof: many things not related to this patch can change that can affect this test case in this form...

pawosm01 added inline comments.May 30 2023, 7:13 AM
llvm/test/Transforms/Reassociate/reassociate-not-from-the-outside-of-the-loop.ll
2

...and where those CHECK-NOT's has gone?

pawosm01 updated this revision to Diff 526633.May 30 2023, 8:23 AM

Review comments addressed.

pawosm01 marked 4 inline comments as done.May 30 2023, 8:27 AM
pawosm01 added inline comments.
llvm/lib/Transforms/Scalar/Reassociate.cpp
1699

OK, used pass manager instead.

1701

Although this place has gone, I've considered your suggestion in the other place below.

1707

Thanks for spotting that!

1732

using LI->getLoopFor() instead now.

pawosm01 updated this revision to Diff 526637.May 30 2023, 8:29 AM
pawosm01 marked 4 inline comments as done.
pawosm01 updated this revision to Diff 526640.May 30 2023, 8:33 AM
fhahn added a subscriber: fhahn.May 30 2023, 8:36 AM

There's been some work to reassoicate in LICM to enable hoisting: D148001. It would probably be more general to extend the logic in LICM to handle those cases, which would also trigger if the user wrote the expression like reassociate creates at the moment.

pawosm01 marked an inline comment as done.May 30 2023, 8:52 AM

There's been some work to reassoicate in LICM to enable hoisting: D148001. It would probably be more general to extend the logic in LICM to handle those cases, which would also trigger if the user wrote the expression like reassociate creates at the moment.

So effectively in certain circumstances LICM should undo what Reassociate have done?

There's been some work to reassoicate in LICM to enable hoisting: D148001. It would probably be more general to extend the logic in LICM to handle those cases, which would also trigger if the user wrote the expression like reassociate creates at the moment.

You were right about the user written code like that: we're doing much worse than GCC here! Yet still, addressing it in LICM and not blocking it in Reassociate will create one more place for competing optimization, and LLVM is already haunted by those.

qcolombet added inline comments.May 31 2023, 3:18 AM
llvm/lib/Transforms/Scalar/Reassociate.cpp
1705

Although it is less performance sensitive, pulling adds in a loop could also be problematic.

1709

Early exit to reduce indentation

1719

Shouldn't PHIs with the same loop as L return true here?

1742

TL;DR I think the model should be more general than that but as it is, it is probably a good proxy for not shooting ourselves in the foot.

The problem is not pulling something into a loop, the problem is replacing X something with 1 * frequency-of-the-target-basic-block something.

I.e., let's imagine we have:
v1 = s1 * delta
v2 = s2 * delta
...
vN = sN *delta
for () {

v1 + ... + vN // with loop dependent stuff

}

Transforming this into:
for () {

(s1 + ... + sN) * delta

}

Could actually be beneficial as long as the loop count is smaller than N.

Similarly, something like:
for () {

if (unlikely)
 v1 + ... + vN

}

Is even more likely to be beneficial since the expected frequency of the basic block may be smaller than the source basic block.

pawosm01 added inline comments.May 31 2023, 3:08 PM
llvm/lib/Transforms/Scalar/Reassociate.cpp
1742

Transforming this into: Could actually be beneficial as long as the loop count is smaller than N.

Yes, I was afraid that with this change I'm cutting off some of the beneficial transformations, and making the logic of this pass more complicated than necessary.

Therefore I tend to agree with Florian, that this problem could be addressed in LICM instead. In the end, it's not a wrong thing for a transformation pass to undo whatever previous pass(es) did when there is a new knowledge available (and here, the new knowledge is that we are in a loop). I'm very close to giving up this commit and work on extending hoistArithmetics() (of LICM) with the ability to spot this code pattern. As I mentioned in the other comment, currently LICM can't figure it out even if it wasn't caused by Reassociate pass, namely a simple C code that I specifically crafted to replicate this problem was enough to show that GCC produces faster executable than LLVM. But there's a problem with LICM too. This hoistArithmetics() function and its surrounding is getting more and more complicated already. And I assume any attempt to extend it with ability to deal with the situation discussed here, will create a lot of code bloat. But maybe that's something inevitable.

pawosm01 abandoned this revision.Jun 7 2023, 9:21 AM

Overriden by D152281