This isn't ready for full review - I need to clean up and add in more tests that reflect issues found when working on this. But posting for any initial feedback from other timezones ahead of me looking again tomorrow (UK time).
I noted in D129178 that in some cases we see code sequences like:
  lui a1, %hi(.L_MergedGlobals)
  sw a0, %lo(.L_MergedGlobals)(a1)
  addi a1, a1, %lo(.L_MergedGlobals)
  ... (other users of a1)
Here, altering the sw to use the global address once it's fully materialised into a1 might be beneficial for code size (increasing the chance the sw is compressible). Such code patterns can exist without globals merging, but the globals merging code makes them much more common.
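To illustrate why the rewritten store has a better chance of compressing, here is a minimal standalone sketch of the C.SW encoding constraints (both registers in x8-x15, offset a multiple of 4 in the range 0 to 124). The helper names are hypothetical and are not LLVM functions. The sketch also assumes that a symbolic %lo offset rules out compression, since its value isn't known until link time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not LLVM code: a model of the C.SW constraints
// from the RISC-V C extension.

// Compressed loads/stores can only name x8..x15 (s0, s1, a0..a5).
constexpr bool isCompressibleRegister(unsigned Reg) {
  return Reg >= 8 && Reg <= 15;
}

// Returns true if `sw SrcReg, Offset(BaseReg)` could be encoded as c.sw.
// OffsetIsSymbolic models an operand such as %lo(sym); assumption: such an
// operand prevents compression because its value is unknown at this point.
constexpr bool isCompressibleSW(unsigned BaseReg, unsigned SrcReg,
                                int64_t Offset, bool OffsetIsSymbolic) {
  return !OffsetIsSymbolic && isCompressibleRegister(BaseReg) &&
         isCompressibleRegister(SrcReg) && Offset >= 0 && Offset <= 124 &&
         Offset % 4 == 0;
}

// Example checks using a0 = x10 and a1 = x11, as in the sequence above.
inline void checkStoreExamples() {
  // sw a0, %lo(.L_MergedGlobals)(a1): symbolic offset, not compressible.
  assert(!isCompressibleSW(11, 10, 0, true));
  // sw a0, 0(a1) after a1 holds the full address: compressible.
  assert(isCompressibleSW(11, 10, 0, false));
}
```

Under these assumptions, moving the sw after the addi and giving it a zero offset from a1 turns a store that cannot compress into one that can.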
This patch achieves this by:
- Altering SelectAddrRegImm so it won't fold in an ADD_LO if the C extension is enabled and the ADD_LO has users that aren't memory operations. This is typically the case when other offsets of the global are calculated with ADDs, which normally results in the global address being materialised into a register.
  - TODO: it would be best to disable this if the load/store is a half-word or a float on RV64C (as there are no compressed forms of those instructions available anyway)
- Adding a peephole for handling the case where this turned out to be a bad idea - which can happen if all of those global offset calculations were merged into memory operations. Given the work Craig has done to remove the store/addi peephole, this isn't ideal...
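The folding decision described in the first bullet can be sketched roughly as below. This is a standalone toy model, not the patch itself: UserKind, AddLoNode and shouldFoldAddLo are hypothetical names that stand in for the SelectionDAG node and its users, and the TODO about half-word and float accesses is not modelled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for SelectionDAG concepts; none of these are LLVM types.
enum class UserKind { Load, Store, Add, Other };

struct AddLoNode {
  std::vector<UserKind> Users; // kinds of nodes that consume the ADD_LO result
};

constexpr bool isMemoryOp(UserKind K) {
  return K == UserKind::Load || K == UserKind::Store;
}

// Returns true if the ADD_LO should be folded into the memory operation's
// immediate, false if the memory operation should use the materialised address.
inline bool shouldFoldAddLo(const AddLoNode &N, bool HasStdExtC) {
  // Without the C extension there is no compressed form to gain, so fold.
  if (!HasStdExtC)
    return true;
  // A non-memory user suggests the full address ends up in a register
  // anyway, so leave the ADD_LO unfolded.
  for (UserKind K : N.Users)
    if (!isMemoryOp(K))
      return false;
  return true;
}

// Example checks for the cases discussed above.
inline void checkFoldExamples() {
  AddLoNode OnlyMemUsers{{UserKind::Store, UserKind::Load}};
  AddLoNode MixedUsers{{UserKind::Store, UserKind::Add}};
  assert(shouldFoldAddLo(OnlyMemUsers, /*HasStdExtC=*/true));
  assert(!shouldFoldAddLo(MixedUsers, /*HasStdExtC=*/true));
  assert(shouldFoldAddLo(MixedUsers, /*HasStdExtC=*/false));
}
```

The check only looks at users as they exist at selection time. If the ADD users are later merged into memory operations, the decision not to fold was the wrong one, which is the case the peephole in the second bullet is meant to catch.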
s/in that is the case/in if that is the case/