This is an archive of the discontinued LLVM Phabricator instance.

Simplify n-ary adds by reassociation
ClosedPublic

Authored by jingyue on Apr 9 2015, 9:26 PM.

Details

Summary

This transformation reassociates a n-ary add so that the add can partially reuse
existing instructions. For example, this pass can simplify

void foo(int a, int b) {
  bar(a + b);
  bar((a + 2) + b);
}

to

void foo(int a, int b) {
  int t = a + b;
  bar(t);
  bar(t + 2);
}

saving one add instruction.

Fixes PR22357 (https://llvm.org/bugs/show_bug.cgi?id=22357).

Diff Detail

Event Timeline

jingyue updated this revision to Diff 23563.Apr 9 2015, 9:26 PM
jingyue retitled this revision from to Simplify n-ary adds by reassociation.
jingyue updated this object.
jingyue edited the test plan for this revision. (Show Details)
jingyue added a subscriber: Unknown Object (MLST).

I previously tried performing this optimization in StraightLineStrengthReduce (D8898), but realized it is probably worth a standalone pass for the following reasons:

  1. it is general enough;
  2. users can flexibly choose where to run it;
  3. it can be made more efficient without being restricted to the framework of SLSR.

So, I gave it another try, and would like to see how you like this implementation. As mentioned in the header comments, this implementation is preliminary and has lots of limitations. But I hope to make it more fully-fledged iteratively. For now, I only plan to use it after SLSR in the NVPTX backend.

atrick accepted this revision.Apr 13 2015, 2:15 PM
atrick edited edge metadata.

Your implementation looks good to me.

I would like at least one other reviewer to sign off on creating a new pass to do this--I'm not sure what the best answer is. There are several places in LLVM where reassociation makes sense. This should probably not happen in the current Reassociate pass because that is really a canonicalization and it may be better to leave the simple single-use expressions in place for further optimization. We would also like to reassociate in machine code based on registers and critical path, but this is redundancy elimination so doesn't belong there either. So I'm personally ok with a new IR pass that people can experiment with in their pipelines. I would still like to get feedback from others first.

This revision is now accepted and ready to land.Apr 13 2015, 2:15 PM
sanjoy edited edge metadata.Apr 13 2015, 2:54 PM

Minor comments inline.

lib/Transforms/Scalar/NaryReassociate.cpp
74

Since you're using PatternMatch in only one function, I think the namespace should be opened in just that scope.

141

I think LLVM convention is Node. Also, I'd suggest using auto *Node to make it obvious that we have a pointer here (very minor).

200

Is it possible to switch to a traversal and expression gathering scheme that will always visit defs before uses? Reverse post-order maybe? I think that way we can avoid the dominates query (which can be expensive) and just turn it into an assert; and just use the earliest (latest?) element in LHSCandidates.

sanjoy accepted this revision.Apr 13 2015, 2:58 PM
sanjoy edited edge metadata.

I'm okay with this living as a separate pass. I think this pass exposes clear, orthogonal functionality and can also be made more efficient by the virtue of working on an entire function at once.

Minor comments inline, all of which are okay to fix either pre or post commit, as you see fit.

Thanks Andrew and Sanjoy for the review! All comments are addressed inline.

lib/Transforms/Scalar/NaryReassociate.cpp
74

I have a pending patch based on this, and that patch will use PatternMatch too. So I'll leave the code as is for now.

141

Changing auto to auto * doesn't compile. I guess it's because nodes_begin returns an iterator and the compiler is unable to deduce auto * from the iterator type. Additionally, even if the code compiled, ++Node would be problematic.

200

We traversed the instructions in the pre-order of the dominator tree, so defs are already always visited before uses. However, this doesn't avoid checking dominates(*LHS, I) because *LHS being visited before I does not mean *LHS dominating I.

Anyhow, I think there is a way of avoiding the dominance check, and I'm happy to put it in the next patch. Whenever we backtrack from a basic block during DFS'ing the dominator tree, we remove all SCEVs defined in this basic block from SeenExprs. This can ensure that, at any time, all the instructions in SeenExprs dominate the basic block we are exploring.

For example, suppose the dominator tree of this function looks like:

B1 -> B2
 |
 V
B3 -> B4
 |
 V
B5

Then, we traverse the tree in this order: +B1 +B2 -B2 +B3 +B4 -B4 +B5 -B5 -B3 -B1. + means entering a node, and - means exiting a node.

Let's pick a random basic block say B4. Before entering B4, we added B1, B2, and B3 to SeenExprs and removed B2 from it, so SeenExprs contains instructions in B1 and B3 which must dominate B4.

jingyue updated this revision to Diff 23715.Apr 13 2015, 9:50 PM
jingyue edited edge metadata.

Addressed Sanjoy's comments

jingyue closed this revision.Apr 13 2015, 10:02 PM