As discussed in the RFC, I've been trying out a bunch of improvements to GlobalMerge. Whenever you try to improve it, it ends up being too conservative for SPEC, but this one was good enough for SPEC and some other "modern" code I've looked at.
The functionality is hidden under two flags, both enabled by default in the patch:
-global-merge-group-by-use: instead of merging everything together (current behavior, available by passing =false), look at the users of GlobalVariables, and try to group them by function, to create sets of globals used together.
-global-merge-ignore-single-use: going further, keep merging everything together *except* globals that are only ever used alone, that is, for which it is obviously non-profitable to merge them with others.
The first flag by itself seems to create some regressions in benchmarks, but ignoring globals used alone is just always better. I measured on AArch64 -O3+-flto, and I got, depending on what you look at, -0.01 to -0.17% geomean improvement:
The overwhelming majority of the changes is just "adrp+add+add" turning into "adrp+add" (all this is without LOHs). These are the biggest improvements:
SingleSource/Benchmarks/Adobe-C++/stepanov_vector 6.3374 6.3139 -0.37% 0.0001 External/SPEC/CINT2006/400_perlbench/400_perlbench 12.0686 11.9385 -1.08% 0.0029 External/Povray/povray 4.2761 4.1903 -2.01% 0.0002 External/SPEC/CINT2006/456_hmmer/456_hmmer 4.9376 4.8260 -2.26% 0.0002 External/SPEC/CINT2000/253_perlbmk/253_perlbmk 11.2362 10.7393 -4.42% 0.0045
There are some regressions as well, with the exact same changes (one less add), so I'm blaming those on bad luck: code and/or data alignment, etc.. (yeah I know, bias, and all that; and yes, the numbers are pretty stable and reproducible.)
The patch itself is pretty simple: look at all uses of globals, create sets of globals used together in a function, and finally pick the "biggest" (very braindead: usage count * size of the set) such sets.
I'll add tests if I stumble upon other interesting cases.
I find the "Enable" here actively hurts my readability on this code, seems to work better without, but up to you.