This is an archive of the discontinued LLVM Phabricator instance.

[GlobalMerge] Look at uses to create smaller global sets.
ClosedPublic

Authored by ab on Mar 4 2015, 5:39 PM.

Details

Summary

As discussed in the RFC, I've been trying out a bunch of improvements to GlobalMerge. Whenever you try to improve it, it ends up being too conservative for SPEC, but this one was good enough for SPEC and some other "modern" code I've looked at.

The functionality is hidden under two flags, both enabled by default in the patch:
-global-merge-group-by-use: instead of merging everything together (current behavior, available by passing =false), look at the users of GlobalVariables, and try to group them by function, to create sets of globals used together.
-global-merge-ignore-single-use: going further, keep merging everything together *except* globals that are only ever used alone, that is, for which it is obviously non-profitable to merge them with others.

The first flag by itself seems to create some regressions in benchmarks, but ignoring globals used alone is just always better. I measured on AArch64 -O3+-flto, and I got, depending on what you look at, -0.01 to -0.17% geomean improvement:
The overwhelming majority of the changes is just "adrp+add+add" turning into "adrp+add" (all this is without LOHs). These are the biggest improvements:

SingleSource/Benchmarks/Adobe-C++/stepanov_vector                           6.3374   6.3139   -0.37%  0.0001
External/SPEC/CINT2006/400_perlbench/400_perlbench                          12.0686  11.9385  -1.08%  0.0029
External/Povray/povray                                                      4.2761   4.1903   -2.01%  0.0002
External/SPEC/CINT2006/456_hmmer/456_hmmer                                  4.9376   4.8260   -2.26%  0.0002
External/SPEC/CINT2000/253_perlbmk/253_perlbmk                              11.2362  10.7393  -4.42%  0.0045

There are some regressions as well, with the exact same changes (one less add), so I'm blaming those on bad luck: code and/or data alignment, etc.. (yeah I know, bias, and all that; and yes, the numbers are pretty stable and reproducible.)

The patch itself is pretty simple: look at all uses of globals, create sets of globals used together in a function, and finally pick the "biggest" (very braindead: usage count * size of the set) such sets.
I'll add tests if I stumble upon other interesting cases.

Diff Detail

Event Timeline

ab updated this revision to Diff 21247.Mar 4 2015, 5:39 PM
ab retitled this revision from to [GlobalMerge] Look at uses to create smaller global sets..
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added subscribers: Unknown Object (MLST), qcolombet, Jiangning.

Hi Ahmed,

I think this is starting to look pretty good. A request for a nice big comment with the algorithm as a block comment somewhere in the file? Makes it a bit more hard to suss out what you're doing in some cases. One silly comment inline as well.

Thanks and sorry for the delay!

-eric

lib/CodeGen/GlobalMerge.cpp
191

I find the "Enable" here actively hurts my readability on this code, seems to work better without, but up to you.

ab updated this revision to Diff 23357.Apr 7 2015, 11:40 AM

You're right Eric, the "Enable" was redundant, removed. I also added a couple of block comments explaining 1) why we don't just merge everything and call it a day, and 2) an overview of the algorithm used to look at the uses

Thanks!

-Ahmed

One nit, but at this point it looks good to me. I don't think I missed anything terrible algorithmically, but no one else has stepped up either.

-eric

lib/CodeGen/GlobalMerge.cpp
267

Might be nice to comment the loops as to what they're each doing.

This revision was automatically updated to reflect the committed changes.
ab added a comment.Apr 17 2015, 6:26 PM

r235249, thanks Eric!

I should note I had to update a few of the existing ARM testcases (I tested on AArch64 only), but none of them were really testing the merging heuristics, so I think it's fine.

-Ahmed