This is an archive of the discontinued LLVM Phabricator instance.

[MergeSets] Add infrastructure to build merge sets based on Das and Ramakrishna's paper.
Needs ReviewPublic

Authored by fhahn on Jan 23 2019, 3:00 PM.

Details

Summary

Merge sets as described in the paper below provide a fast way to answer multiple
queries for basic blocks that may need PHI nodes, by pre-computing the merge sets
for all nodes in a dominator tree in constant time.

Dibyendu Das and U. Ramakrishna. 2005. A practical and fast iterative
algorithm for φ-function computation using DJ graphs.
ACM Trans. Program. Lang. Syst. 27, 3 (May 2005), 426-440.

Merge sets can be used to speed up code that currently calculates
iterated dominance frontiers multiple times to find nodes that need PHI
nodes when doing SSA updates.

One example is mem2reg. With merge sets, mem2reg/SROA runs about 3 times
faster for a SROA heavy test case I could find (
https://bugs.llvm.org/show_bug.cgi?id=15412)

With IDF

Total Execution Time: 33.5696 seconds (33.5756 wall clock)

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
10.2445 ( 31.0%)   0.1748 ( 31.4%)  10.4193 ( 31.0%)  10.4207 ( 31.0%)  SROA #2
 4.7518 ( 14.4%)   0.0120 (  2.1%)   4.7637 ( 14.2%)   4.7635 ( 14.2%)  Induction Variable Simplification
 3.1591 (  9.6%)   0.0248 (  4.5%)   3.1839 (  9.5%)   3.1862 (  9.5%)  Simple Register Coalescing

With merge sets (this patch)

Total Execution Time: 25.3381 seconds (25.3400 wall clock)

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
 4.8915 ( 19.8%)   0.0146 (  2.2%)   4.9061 ( 19.4%)   4.9061 ( 19.4%)  Induction Variable Simplification
 2.9898 ( 12.1%)   0.2932 ( 44.3%)   3.2830 ( 13.0%)   3.2838 ( 13.0%)  SROA #2
 2.9581 ( 12.0%)   0.0153 (  2.3%)   2.9733 ( 11.7%)   2.9739 ( 11.7%)  Simple Register Coalescing

I haven't done any micro tuning so far and the merge sets implementation
still needs a bit of polishing, but I thought it would be good to share
this early to get some high level feedback.

This potentially could be used in SSAUpdaterBulk in combination with a
fast renamer similar to the implementation in PredicateInfo (I have a
patch for such a renamer that I will post shortly).

Event Timeline

fhahn created this revision.Jan 23 2019, 3:00 PM
fhahn added a comment.Jan 23 2019, 3:03 PM

The patch currently implements variant 1 form the paper (TDMSC-I) as it is slightly simpler. I will follow up with variant 2 (TDMSC-II) in a bit and do some more detailed compile time measurement. Also the only reason MergeSets live in IteratedDominanceFrontier.h/cpp was convenience, they could be moved to their own files.

I need to read the paper again because I forgot how the algorithm works. Please ping me in a week if I forget to review this.

fhahn updated this revision to Diff 185572.Feb 6 2019, 9:06 AM

Move implementation to MergeSets.h/.cpp and implement variant 2 from the paper as build2().

fhahn added a comment.Feb 6 2019, 9:10 AM

I need to read the paper again because I forgot how the algorithm works. Please ping me in a week if I forget to review this.

Thanks Davide! On CTMark, the impact is neutral (some tiny improvements/regressions, in the noise). In some cases, computing the merge sets is more expensive, than the IDF (e.g. if we just need to place PHI nodes for a one or two allocas in a function), so we might want to pick one strategy depending on the input function.

(I also plan to refactor the implementation of both variants a bit)

Interesting, I have a merge-sets implementation in my patch: https://reviews.llvm.org/D32140
It will be interesting to compare the two.

fhahn added a comment.Feb 6 2019, 12:19 PM

Interesting, I have a merge-sets implementation in my patch: https://reviews.llvm.org/D32140
It will be interesting to compare the two.

Interesting! It seems the biggest difference is that the implementation here does not explicitly construct the DJ graph and uses BitVectors instead of sets. It would probably make sense to split up the mergeset part and the PromoteMem2Reg part once they are ready for committing. Let me know if you are interesting in trying the implementation for your patch, I'd be happy to split it up sooner.

hiraditya accepted this revision.Sep 29 2019, 10:20 AM

i think this will help quite a few optimizations get faster and enable others like instruction scheduling in SSA. We can push this patch and make incremental improvements as needed. Thanks for working on this!

This revision is now accepted and ready to land.Sep 29 2019, 10:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 29 2019, 10:20 AM
hiraditya requested changes to this revision.Sep 29 2019, 10:38 AM
hiraditya added inline comments.
llvm/include/llvm/Analysis/MergeSets.h
82 ↗(On Diff #185572)

We can use a more descriptive name here and below.

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
554–556

nit:

716

can we put the code under #ifndef in separate function.

813

nit:

This revision now requires changes to proceed.Sep 29 2019, 10:38 AM
hiraditya added inline comments.Sep 29 2019, 10:39 AM
llvm/include/llvm/Analysis/MergeSets.h
14 ↗(On Diff #185572)

is it okay to use non-ASCII characters? I'm not sure but wanted to point out.

fhahn planned changes to this revision.Sep 29 2019, 2:34 PM

Thanks! I'll try to update this patch and a few related patches around the time of the dev meeting.

Great, looking forward to it!

fhahn updated this revision to Diff 226940.Oct 29 2019, 10:50 AM

Rebase patch

Thanks for rebasing the patch.

llvm/lib/Analysis/MergeSets.cpp
47 ↗(On Diff #226940)

Some comments would be nice!

109 ↗(On Diff #226940)

Is it possible that DFS numbers are already up-to-date?

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
726

We can put the verification part somewhere else, possibly under a flag.