This is an archive of the discontinued LLVM Phabricator instance.

[LoopInterchange] Support multi-level reduction loops
Needs ReviewPublic

Authored by congzhe on May 17 2022, 9:27 AM.

Details

Reviewers
Meinersbur
bmahjour
Group Reviewers
Restricted Project
Summary

This is a follow-up to D120386 where we enabled interchange to get the optimal access pattern for multi-level loopnests. But currently loop interchange fails to interchange multi-level loopnests that have reduction operations inside the loopnest. This patch enables interchange of reduction loops with multi-level loopnests, for example the following loopnest should be interchanged three times to get the optimal access pattern:

; Triply nested loop with reduction, should be able to do interchange three times
; to get the ideal access pattern.
 unsigned f(int e[100][100][100], int f[100][100][100]) {
   unsigned x = 0;
   for (int a = 0; a < 100; a++) {
     for (int b = 0; b < 100; b++) {
       for (int c = 0; c < 100; c++) {
          x += e[c][b][a];
       }
     }
   }
   return x;
 }

Currently interchange only supports reduction loops within 2-level loopnests. For loopnests with more than 2-levels like the case above, (after interchanging the b-loop and c-loop) the pass cannot recognize reduction phis and would bail out. This patch adds capability to recognize reduction phis for multi-level loopnests.

Diff Detail

Event Timeline

congzhe created this revision.May 17 2022, 9:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 9:27 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
congzhe requested review of this revision.May 17 2022, 9:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 9:27 AM
congzhe updated this revision to Diff 430154.May 17 2022, 12:13 PM
congzhe edited the summary of this revision. (Show Details)May 18 2022, 7:41 AM
congzhe added reviewers: Meinersbur, bmahjour, Restricted Project.
congzhe set the repository for this revision to rG LLVM Github Monorepo.
congzhe edited the summary of this revision. (Show Details)
congzhe updated this revision to Diff 439252.EditedJun 22 2022, 8:30 PM
congzhe edited the summary of this revision. (Show Details)

My apologies for the late update. Last time during a loop meeting our conclusion on this patch was that we need to 1) add more comments to explain the code, and 2) avoid construction of dominator tree and loop info locally.

I've updated the patch accordingly, i.e., added more comments and directly leveraged the loop info that is already available in loop interchange. I'd appreciate it if you could let me know your comments, thanks a lot. @bmahjour @Meinersbur

congzhe updated this revision to Diff 439256.Jun 22 2022, 8:40 PM

Consider being less specific to only recognize nested loops, but PHINodes in the loop body in general.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
789

[Nit] innermost could be considered a single word

790–792

[style] https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

I think the name findInnerReductionPhi implies that the search for the innermost loop happens in that function. Either that, or rename it to findReductionPhi.

findInnerReductionPhi is already a static helper function, you could wrap the new functionality into such a helper function as well.

[serious] This uses InnerMostLoop->getSubLoops().front() even if there are multiple nested loops inside (that hence do not even particpate in the loop interchange). Both loops might participate in the reduction.

797–798

The while condition and its if-guard is essentially the same condition and can be integrated.

I was also thinking about how InnerRedPhi->getParent() represents the CurLoop and can be directly compared InnerLoop instead of its header blocks, but it's the header of the parent loop.
Its dependence on just loops may cause it to break when a condition is involved:

unsigned f(int e[100][100][100], int f[100][100][100]) {
   unsigned x = 0;
   for (int a = 0; a < 100; a++) {
     for (int b = 0; b < 100; b++) {
       if (c)
         for (int c = 0; c < 100; c++) // Not participating in the loop interchange
            x += e[c][b][a];
         } else {
           for (int c = 0; c < 100; c++) 
              x += 1;
        }
     }
   }
   return x;
 }

[serious] Here may want to interchange a and b. b has two subloops where your first while will just randomly pick one of them. Then, there is another PHINode joining the result of the two x where getIncomingValueForBlock will be called with the wrong incoming BB and fail.

[serious] Another problem is that it does not check for what the value for the other incoming block is, it could be something unrelated to a reduction

for (int a = 0; a < 100; a++) 
  for (int b = 0; b < 100; b++) {
    for (int c = 0; c < n; c++)
      new_x = x = e[c][b][a];
    x = PHINode(42, new_x); // should be: PHINode(x, new_x); where `x` is the value when the c-loop has zero iterations and hence skipped by the loop guard.
  }
hiraditya added a comment.EditedAug 1 2022, 10:56 AM

Looks pretty interesting. I'm curious if there is a popular workload/benchmark that would improve with this patch?