This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Reuse a materialised global address in preference to merging into a load/store
Needs ReviewPublic

Authored by asb on Jul 13 2022, 12:30 PM.

Details

Summary

This isn't ready for full review - I need to clean up and add in more tests that reflect issues found when working on this. But posting for any initial feedback from other timezones ahead of me looking again tomorrow (UK time).

I noted in D129178 that in some cases, code sequences like:

lui a1, %hi(.L_MergedGlobals)
sw a0, %lo(.L_MergedGlobals)(a1)
addi a1, a1, %lo(.L_MergedGlobals)
... (other users of a1)

Where altering the sw to use the global address once it's fully materialised into a1 might be beneficial for code size (increasing the chance the sw is compressible). Such code patterns can exist without globals merging, but the globals merging code makes them much more common.

This patch achieves this by:

  • Altering SelectAddrRegImm so it won't fold in an ADD_LO if the C extension is enabled and it has users that aren't memory operations (which is typically the case when other offsets of the global are calculated with ADDs, which normally results in the global address being materialised into a register)
    • TODO: it would be best to disable this if the load/store is a half-word or a float on RV64C (as there are no compressed forms of those instructions available anyway)
  • Adding a peephole for handling the case where this turned out to be a bad idea - which can happen if all of those global offset calculations were merged into memory operations. Given the work Craig has done to remove the store/addi peephole, this isn't ideal...

Diff Detail

Event Timeline

asb created this revision.Jul 13 2022, 12:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 12:30 PM
asb requested review of this revision.Jul 13 2022, 12:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 12:30 PM
asb updated this revision to Diff 445795.Jul 19 2022, 6:02 AM
asb edited the summary of this revision. (Show Details)

Rebase, and only alter the codegen path if the C extension is enabled, update patch summary.

Any thoughts on this kind of transformation? I'm thinking it may make more sense as an extension to RISCVMakeCompressible (as then we can additionally gate the transformation on whether the load/store operand registers are compressible in the first place). And perhaps we want some parts of RISCVMakeCompressible to run even for optimisation modes other than Os/Oz (see e.g. #56390 for another case this might be worthwhile).

reames added inline comments.Jan 30 2023, 12:11 PM
llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll
84

Looking at your test here, there's something interesting lurking. You probably already knew this, but it took me a while staring at the code to figure this out.

We're materializing two addresses here. Those addresses are known to be four bytes apart, but we do not know that e.g. the high bits are common. The two addresses could e.g. straddle a 12 bit aligned boundary, and thus the high bits of the second may not match those of the first. If we did know this fact, we could use the common high bits and fold both low parts into loads. (Your load_g_8 actually covers exactly that.)

So, conceptually, the Rv32 split case does look exactly like two merged i32 globals.

Now, I do have to say that this test case may not matter. The RV32 split i64 case should be 8 byte aligned, and thus not hit this.

I do see why this is interesting from a global merging perspective though. Coming at this a bit differently, are we allowed to increase the alignment of the merged global? If we are, we might make cases look like more load_g_8 than load_g_1.

Another idea: Do we have any way to express the opposite of alignment? That is, we don't care that an address is aligned *unless* the result would cross some boundary? I remember we have such a concept for instructions (e.g. boundary align, thanks JCC errata!), but if could spell something like that for globals, then we could leave their addresses unaligned *unless* they crossed the boundary which allows the high bits to differ. Haven't explored that much, so not sure what we have here.

asb added inline comments.Feb 2 2023, 6:01 AM
llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll
84

Coming at this a bit differently, are we allowed to increase the alignment of the merged global? If we are, we might make cases look like more load_g_8 than load_g_1.

That seems like a promising idea. I'm not aware of anything that would prevent us from increasing alignment. GlobalMerge currently sets the alignment of the merged global to the maximum alignment of its merged members, and maintains alignment for the members using a struct and adding padding as necessary - I think additional padding could just be added.

Re the second idea: I don't think I've seen anything that would give that so far, but perhaps I've missed it. And as you suggest, there's some precedent elsewhere.

luke added inline comments.Feb 2 2023, 6:10 AM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2389

/in that is the case/in if that is the case

luke957 removed a subscriber: luke957.Feb 2 2023, 11:28 PM