Page MenuHomePhabricator

[IndVarSimplify] Add loop-flattening
AbandonedPublic

Authored by SjoerdMeijer on Oct 6 2020, 2:23 AM.

Details

Summary

This pass was used for a 32-bit target, now we want to use it on 64-bit targets too. This pass wasn't triggering on 64-bit targets because the input IR to pass looks slightly different than it was expecting. For example, earlier transformations can perform rewrites using the widest available integer type, and address calculation uses wider 64-bit types. Thus, this change:

  • Moves pass LoopFlatten to just before LoopIndvarSimplify in the optimisation pipeline. These passes were already running shortly after each other, but LoopIndvarSimplify can perform rewrites using wider types that makes life more difficult for LoopFlatten. This simple reordering avoids these complications, at no disadvantage for the 32-bit target. For our motivating case on this target I've measured an irrelevant -0.0048% regression as a result of this pass reordering.
  • Overflow checks are performed on the GEP instructions. This change looks through a ZExt instruction only if it is used to index a GEP and if there are no other uses that could change the value. I think this is the least intrusive change compared to alternatives, for example promoting loop control instruction to the widest used type if different types are used.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Oct 6 2020, 2:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2020, 2:23 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
SjoerdMeijer requested review of this revision.Oct 6 2020, 2:23 AM
SjoerdMeijer added inline comments.Oct 6 2020, 2:26 AM
llvm/test/Transforms/LoopFlatten/zext-i64.ll
83 ↗(On Diff #296389)

This file shows changes compared to a local initial commit of this new file to better highlight the modifications of this changes. Here's the negative test, a case that should not be transformed.

ostannard added inline comments.Oct 8 2020, 6:41 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
386

I don't think this logic is correct. The check below is proving that the add and mul cannot overflow because the GEP would overflow before they do, and that would be UB. However, in the example in this comment, the add or mul could overflow, but the GEP would be fine, because it never sees an offset greater than 2*32-1. Since there's no UB, we would need to preserve the wrapping behaviour, which we can't do with a single, simple loop.

I think it might be really good if it would be possible to not implement all this overflow detection from scratch.
Is there nothing in SCEV already that does this?

I think it might be really good if it would be possible to not implement all this overflow detection from scratch.
Is there nothing in SCEV already that does this?

Agreed. This needs a bit of a rethink, best done with the help of existing infrastructure. I am going to have a look.

Thanks for looking at this!

ostannard added inline comments.Oct 8 2020, 7:58 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
386

Actually, I think the original version of this is incorrect too:

extern char a[];

#define SIZE 100000

int first = 1;

void foo(unsigned lim) {
  for (unsigned i = 0; i < lim; ++i) {
    for (unsigned j = 0; j < SIZE; ++j) {
      // This might overflow if lim is large, but that is well-defined.
      unsigned x = i * SIZE + j;
      if (first) {
        // Access memory using the computed index. Only do this the first time,
        // so the address calculation won't overflow.
        asm volatile("" : : "r" (a[x]));
        first = 0;
      }
      // Use the computed index for something which isn't UB.
      asm volatile("" : : "r" (x));
    }
  }
}

This gets flattened because the GEP check below passes, but the GEP is only actually executed once, and doesn't overflow the address space. I think everything in the source is well-defined, but flattening it changes the number of times the second asm statement is executed.

We might be able to salvage this by adding a check that the GEP dominates the loop latch?

We might be able to salvage this by adding a check that the GEP dominates the loop latch?

Thanks, I am picking this up again, and am going to progress this in 2 ways. I will address the existing bug separately.

Here, in this change that addresses different types, I will investigate widening to a wider type so that the transformation should be safe and avoiding more layers of overflow checks.

fhahn added a subscriber: fhahn.Oct 14 2020, 1:25 AM

We might be able to salvage this by adding a check that the GEP dominates the loop latch?

Thanks, I am picking this up again, and am going to progress this in 2 ways. I will address the existing bug separately.

Here, in this change that addresses different types, I will investigate widening to a wider type so that the transformation should be safe and avoiding more layers of overflow checks.

I guess this is similar to the widening that IndVarSimplify does? Can we just re-use the stuff from there or have IndVarSimplify just do it for us?

I guess this is similar to the widening that IndVarSimplify does? Can we just re-use the stuff from there or have IndVarSimplify just do it for us?

Yep, exactly. That's exactly what I want to look at.

SjoerdMeijer updated this revision to Diff 298615.EditedOct 16 2020, 6:22 AM
SjoerdMeijer retitled this revision from [LoopFlatten] Initial support for different types to [IndVarSimplify] Add loop-flattening.

I've got a slightly different proposal. This moves loop flattening into IndVarSimplify for several reasons:

  • loop-flatten is best run just before IndVarSimplify because IndVarSimplify can promote induction variables. For overflow analysis to see if loop flattening is legal, it's best if inductions variables haven't been promoted yet.
  • When induction variables of a loop nest don't use the maximum legal integer type, we promote them to the widest type so we know loop flattening is safe thus avoiding overflow analysis. Promoting induction variables is what IndVarSimplify was already doing, so this reusing that.
  • Last but not least, with the loops that we support with loop-flattening, induction variable simplification is exactly the point of this transform, so this looks like a good home for it. Thus, this also avoids quite some churn making modifications to LoopUtils where refactored/shared code could live, and in both of the passes.

This is still work-in-progress, but wanted to share the idea already. This needs porting of the existing loopflatten tests, and I need to fix a bug/integration issue, but in terms of (re)structuring this should be pretty much it.

Little ping to see if we are okay with the direction to flatten induction vars in IndVarSimplify, see my previous message for motivation as opposed to having a separate pass.

SjoerdMeijer abandoned this revision.Oct 29 2020, 7:39 AM

Think I am abandoning this as I am guessing it will be easier to get buy in if I keep things separate. This means that:

  • I will separate out the widening of the induction variable in IndVarSimplify and put this as a utility in LoopUtil,
  • and modify IndVarsSimplify to use that,
  • then do make changes to LoopFlatten.