Page MenuHomePhabricator

Proof-of-concept for a Loop fusion pass (WIP)
Needs ReviewPublic

Authored by ramshankar123 on Jan 15 2015, 12:55 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

A new loop fusion pass that attempts to proactive fuse loops across function call boundaries and arbitrary control flow.

The pass initially attempts to inline calls which have loops in them. Following inlining, a separate scalar opt pass fuses loops. It is not necessary for loops to be adjacent or have the same control dependence. Specifically a control dependence closure is computed and loops that share a control dep closure prefix are taken as candidates for fusion - in case there are no other loops in a path between them. A loop graph is built with edges between loops which share the control dependence closure prefix. A recursive application of fusion on the graph starts from "bottom-up".

During fusion, program slices are taken based on the control flow closures of the two loops. Example: Suppose two loops A and B are going to be fused. They share the same control dependence prefix, but their suffices vary. The suffices are used to duplicate control flow paths that pass through both loops. Specifically paths from the first block in the control-dep closure suffix of the first loop, through the first loop's exit and following into the suffix of the second loop through the second loop's exit on to the common post-dominator of either loop's exit. The number of paths would grow exponentially. At this time some heuristics are used to prune out for paths. Using profile based heuristics is a better idea.

Also function unswitching helps by cleaning up control flow.

With this pass, we get 103 loop fusions in SPECCPU INT 2006 libquantum with rate performance improving close to 2.5X in x86 (results from AMD A10-6700).

I took some liberties in patching up some of the code in ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang options/ and also adjusted the IPO/LTO pass managers. I would need to do a better job here but I would like to put the patch out with the WIP tag for high level reviews. At this time standard loop fusion test cases would not work (cases where control dep are same or loops are truly adjacent etc). I would be doing that as well.

Appreciate suggestions and otherwise help.

Example:
if (a & b) {

if (a & c) {
  if (t & 0xF) {
    for (i=0; i < size; i++) {
      if (A[i] & d) {
        A[i] |= (1 << t);
      }
    }
  }
}

}

if (b & d) {

if (a & d) {
  if (t & 0xA) {
    for (i=0; i < size; i++) {
      if (A[i] & d) {
        A[i] |= (1 << t);
      }
    }
  }
}

}

After fusion we will have:

if (a&b && a&c && t&0xF && b&d && a&d t&0xA) {

for (i=0; i < size; i++) {
  if (A[i] & d) {
    A[i] |= (1 << t);
  }
  if (A[i] & d) {
    A[i] |= (1 << t);
  }
}

} else {
// original code

if (a & b) {
  if (a & c) {
    if (t & 0xF) {
      for (i=0; i < size; i++) {
        if (A[i] & d) {
          A[i] |= (1 << t);
        }
      }
    }
  }
}

if (b & d) {
  if (a & d) {
    if (t & 0xA) {
      for (i=0; i < size; i++) {
        if (A[i] & d) {
          A[i] |= (1 << t);
        }
      }
    }
  }
}

}

Diff Detail

Event Timeline

ramshankar123 updated this revision to Diff 18250.
ramshankar123 retitled this revision from to Proof-of-concept for a Loop fusion pass (WIP).
ramshankar123 updated this object.
ramshankar123 edited the test plan for this revision. (Show Details)

Clang update to add new flags for loop fusion

aadg added a subscriber: aadg.Jan 16 2015, 3:05 AM

Is this missing the pass itself?