This is an archive of the discontinued LLVM Phabricator instance.

[LoopFlatten] Make the analysis more robust after IV widening
ClosedPublic

Authored by SjoerdMeijer on Sep 6 2021, 3:10 AM.

Details

Summary

LoopFlatten wasn't triggering on this motivating case after IV widening:

void foo(int *A, int N, int M) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i*M+j]);
}

The reason was that the old induction phi nodes were getting in the way. These narrow and dead induction phis are not always trivially dead, and having both the narrow and wide IVs confused the analysis and caused it to bail. This adds some extra bookkeeping for these old phis, so we can filter them out when checks on phi nodes are performed. Other clean up passes will get rid of these old phis and increment instructions.

As this was one of the motivating examples from the beginning, it was surprising this wasn't triggering from C/C++ code. It looks like the IR and CFG is just slightly different.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Sep 6 2021, 3:10 AM
SjoerdMeijer requested review of this revision.Sep 6 2021, 3:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 6 2021, 3:10 AM

LGTM, but of course @dmgreen should be the one to approve.

dmgreen added inline comments.Sep 7 2021, 6:51 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
443

Do we know this is always valid? For any sext? Or do we need to check the types match the new IV?

SjoerdMeijer added inline comments.Sep 8 2021, 12:37 AM
llvm/lib/Transforms/Scalar/LoopFlatten.cpp
443

Yes, I believe this is always valid. This is after widening, when all loop components and checks have been performed (twice actually, one on the narrow IV, and again on the wide IV). Thus, when there is an extend in the way at this point, I believe it is always in the shape that we expect it. But I will add an assert for the matching types, and check that MatchedItCount isn't null.

Added an assert.

dmgreen accepted this revision.Sep 10 2021, 1:07 AM

OK. LGTM then

This revision is now accepted and ready to land.Sep 10 2021, 1:07 AM
This revision was landed with ongoing or failed builds.Sep 10 2021, 4:37 AM
This revision was automatically updated to reflect the committed changes.

Hi,

I've hunted down a miscompile that seems to start happening with this commit.
Reproduce with

opt -passes='function(loop-flatten)' -S -o - bbi-60764.ll

The input has two nested loops, the outer for i [0, 3} and the inner for j [0, 15] and loop-flatten flattens it to one loop iterating for i' in [0, 63] which looks ok.

In the inner loop there is a load from v[16*i+j] but after flattening as far as I can see it loads from v[16*i'] so it quickly goes out-of-bounds of v and read random data instead.

So in the result there is

entry:
  %flatten.tripcount = mul i32 16, 4
  br label %for.cond1.preheader

for.cond1.preheader:                              ; preds = %for.cond.cleanup3, %entry
  %indvar2 = phi i32 [ %indvar.next3, %for.cond.cleanup3 ], [ 0, %entry ]
  %i.013 = phi i16 [ 0, %entry ], [ %inc7, %for.cond.cleanup3 ]
  %sum.012 = phi i16 [ 0, %entry ], [ %add5.lcssa, %for.cond.cleanup3 ]
  %0 = mul nsw i32 %indvar2, 16

[...]
for.body4:                                        ; preds = %for.cond1.preheader
  %indvar = phi i32 [ 0, %for.cond1.preheader ]
  %2 = add nuw nsw i32 %indvar, %0
  %3 = trunc i32 %2 to i16
  %arrayidx = getelementptr inbounds [64 x i16], [64 x i16]* @v, i16 0, i16 %3
  %4 = load i16, i16* %arrayidx, align 1

Note that before this patch, the above reproducer hit an error about a broken PHI, so there was certainly something strange going on then too.

Many thanks for that and the (short) reproducer, will be looking at that soon.