This is an archive of the discontinued LLVM Phabricator instance.

[LoopInterchange] Remove support for inner-only reductions.
ClosedPublic

Authored by fhahn on Oct 9 2018, 10:17 AM.

Details

Summary

Inner-loop only reductions require additional checks to make sure they
form a load-phi-store cycle across inner and outer loop. Otherwise the
reduction value is not properly preserved. This patch disables
interchanging such loops for now, as it causes miscompiles in some
cases and it seems to apply only for a tiny amount of loops. Across the
test-suite, SPEC2000 and SPEC2006, 61 instead of 62 loops are
interchange with inner loop reduction support disabled. With
-loop-interchange-threshold=-1000, 3256 instead of 3267.

See the discussion and history of D53027 for an outline of how such legality
checks could look like.

Diff Detail

Event Timeline

fhahn created this revision.Oct 9 2018, 10:17 AM

This analysis seems very fragile.

lib/Transforms/Scalar/LoopInterchange.cpp
655

I'm pretty sure the "PHInd == -1 || IncInd == -1" check is unnecessary, given the PHI is in the loop header.

671

What happens if SI is non-null before we enter here?

674

What happens if there are mutiple stores using the PHI?

681

Could any operations between the load and store alias them? Does it matter?

This analysis seems very fragile.

Agreed, it is not ideal and I think alternatively we could just drop support for such loops as it is unlikely they will be profitable (we are moving loads/stores from outer loops to inner loops). Unfortunately I did not find any existing analysis that could be re-used for those checks, but I think they are quite conservative.

It would be great if you could let me know which option you prefer. If we go with this patch, I'll address the code comments and also add a comment in line with my responses. The underlying problem is that loops like the one below are miscompiled if they are interchanged, if y is a global variable.

for (unsigned i = 0; i < N; i++) {
  y = 0;
  for (unsigned j = 0; j < N; j++) {
    A[i][j] += 1;
    y += A[i][j];
  }
}
lib/Transforms/Scalar/LoopInterchange.cpp
674

We could just support a single store in the exit block for now I think.

681

I don't think aliasing operations between them (in the inner loop body) should be a problem. The store in the exit makes sure we write the correct value after the inner loop has been executed, and the load makes sure the inner loop uses the right values.

Having multiple load - reduction - stores which alias shouldn't be a problem either. As long as each chain forms a cycle, the results should be properly be preserved between inner loop executions and we do not change the relative order of the loads/stores. If we would have multiple stores that alias, it would mean we reduce multiple chains to a single one.

If you aren't finding real loops where this matters, simply rejecting loops with a reduction like this seems fine.

fhahn updated this revision to Diff 171958.Oct 31 2018, 10:37 AM
fhahn retitled this revision from [LoopInterchange] Fix inner loop reduction handling. to [LoopInterchange] Remove support for inner-only reductions..
fhahn edited the summary of this revision. (Show Details)

If you aren't finding real loops where this matters, simply rejecting loops with a reduction like this seems fine.

I've updated the patch to remove support for inner-loop only reductions and updated the description with some numbers across the test-suite, SPEC2k and SPEC2k6. Given the simplification and the minor practical impact, I think it would make sense to just remove it. What do you think? I've also added a comment referring back to the discussion and patches here, in case we want to add the analysis described here later on.

It also simplifies the code to support reductions across inner & outer loops.

This revision is now accepted and ready to land.Oct 31 2018, 11:52 AM
fhahn added a comment.Oct 31 2018, 2:24 PM

Thanks Eli!

This revision was automatically updated to reflect the committed changes.
test/Transforms/LoopInterchange/phi-ordering.ll