This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable the GlobalMerge pass by default
Needs ReviewPublic

Authored by asb on Jul 6 2022, 2:19 AM.

Details

Summary

As a follow-up to D130481, this patch enables GlobalMerge by default. Posting for comment / testing rather than seriously suggesting we go ahead and enable this yet (the earliest we would possibly to this is after the LLVM 15 branch).

Note: an earlier version of this review both added support for GlobalMerge and enabled it by default. This has now been split out.

Diff Detail

Event Timeline

asb created this revision.Jul 6 2022, 2:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2022, 2:19 AM
asb requested review of this revision.Jul 6 2022, 2:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2022, 2:19 AM
Allen added a subscriber: Allen.Jul 6 2022, 6:24 PM
asb updated this revision to Diff 445746.Jul 19 2022, 2:33 AM

Ping and rebase.

Any thoughts? One suggestion would be to tweak this patch so it adds the globals merging tests and command-line option, but leaves it off by default, and we can revisit enabling after 15.x branch takes place.

This patch seems to prevent GP linker relaxation on dhrystone. Presumably because the merged symbols are no longer in the .sbss section?

asb updated this revision to Diff 447306.Jul 25 2022, 6:28 AM
asb edited the summary of this revision. (Show Details)

Change this patch so it follows-on from the newly split D130481 (which adds support for GlobalMerge). This allows us to separately review and merge basic support vs changes to the default pass pipeline.

asb retitled this revision from [RISCV] Enable the GlobalMerge pass for RISC-V to [RISCV] Enable the GlobalMerge pass by default.

This patch seems to prevent GP linker relaxation on dhrystone. Presumably because the merged symbols are no longer in the .sbss section?

Yeah, probably because of that (or .sdata?). It seems the heuristic of being small is failing as a predictor of what you want to put in the relaxation range. I guess the actual metric you want to optimize for is the total number of symbol references that you can move from the general address range to the optimizable address range? Not the kind of thing you can do from a linker script... How big is the regression? For comparison, Embench O3 and Oz have code size improvements of 3.35% and 2.92%, respectively (D129686 doesn't seem to change that at all).

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
213–217

Please add a comment describing in prose when the option is enabled.

reames added a comment.EditedJan 30 2023, 2:08 PM

The following is basically a brain dump on a few things vaguely related to GlobalMerge for RISCV. This isn't a review comment on this review per se. Some of this came from discussion w/Palmer because I nerd sniped myself into thinking this a bit too hard, and he was willing to brainstorm with me. I then did the same to @craig.topper a bit later, and edited in some further changes.

Profitability wise, we have three known cases.

Case 1 is where the alignment guarantees the second address could fold into the consuming load/store instruction. The simplest case would be to restrict to when at least one of the globals being merged had a sufficiently large alignment. https://reviews.llvm.org/D129686#inline-1380320 has some brainstorming on a more advanced boundary align mechanism, but building that out is likely non trivial. There have been some other use cases for analogous features in the past, but I don't have details.

Case 2 is when we have three or more accesses using the same global (regardless of alignment). In this case, we only need one lui/addi pair + one access with small folded offset for each of the original access. This is a 1 instruction savings for each additional access.

Case 3 is a size optimization only. This is Alex's https://reviews.llvm.org/D129686 and is geared at using compressed instructions to share common addresses.

For the GP interaction, we may want to take a close look at how gcc models global merging vs how we do. Per Palmer, it keeps around the symbols for each global, and that may impact the heuristic that LD uses for selecting globals to place near GP. We may be able to massage our output a bit to line up with the existing heuristics.

There's a question of how worthwhile this is. For anything beyond static builds with medlow, we need to worry about pc relative addresses. Out of the three known profitable cases above, case 2 and 3 apply to pc relative sequences without knowing the alignment of the auipc, but case 1 does not. For case 1, we'd need to additionally account for the alignment of the auipc. We could potentially insert an align directive, but that wastes space. Per Palmer, there was some previous discussion around a relocation type for an optimized "aligned auipc" construct which used (at most) a single extra instruction. However, no one has pushed this forward.

My current thinking is that we should probably enable this for code size minimization only, and return to it at a later point.

The following is basically a brain dump on a few things vaguely related to GlobalMerge for RISCV. This isn't a review comment on this review per se. Some of this came from discussion w/Palmer because I nerd sniped myself into thinking this a bit too hard, and he was willing to brainstorm with me. .

Profitability wise, we could potentially restrict GM to cases where the alignment guarantees the second address could fold into the consuming load/store instruction. The simplest case would be to restrict to when at least one of the globals being merged had a sufficiently large alignment. https://reviews.llvm.org/D129686#inline-1380320 has some brainstorming on a more advanced boundary align mechanism, but building that out is likely non trivial. There have been some other use cases for analogous features in the past, but I don't have details.

For the GP interaction, we may want to take a close look at how gcc models global merging vs how we do. Per Palmer, it keeps around the symbols for each global, and that may impact the heuristic that LD uses for selecting globals to place near GP. We may be able to massage our output a bit to line up with the existing heuristics.

There's a question of how worthwhile this is. For anything beyond static builds with medlow, we need to worry about pc relative addresses. Given that, we'd need to additionally account for the alignment of the auipc. We could potentially insert an align directive, but that wastes space. Per Palmer, there was some previous discussion around a relocation type for an optimized "aligned auipc" construct which used (at most) a single extra instruction. However, no one has pushed this forward.

My current thinking is that we should probably enable this for code size minimization only, and return to it at a later point.

We saw regressions in 400.perlbench and 458.sjeng from SPEC2006INT on one of our cores when enabling GlobalMerge. I suspect this was due to a loss of GP relaxation.