Constants with a single use, and G_IMPLICIT_DEF, can be repaired by simply
cloning the instruction that defined them into the appropriate register bank.
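A minimal sketch of that repair rule, using a hypothetical toy IR made of plain Python objects rather than LLVM's `MachineInstr`/`MachineRegisterInfo` API. `Instr`, `repair_operand` and the `"sgpr"`/`"vgpr"` bank names are illustrative assumptions. A G_IMPLICIT_DEF, or a G_CONSTANT with one use, is cloned into the needed bank; everything else falls back to a COPY.

```python
from dataclasses import dataclass, field

# Hypothetical toy IR for illustration; this is not LLVM's MachineInstr API.
@dataclass(eq=False)  # compare instructions by identity, like real IR nodes
class Instr:
    opcode: str                                   # "G_CONSTANT", "G_IMPLICIT_DEF", "COPY", ...
    dst: str                                      # virtual register defined here
    bank: str                                     # register bank of dst: "sgpr" or "vgpr"
    operands: list = field(default_factory=list)  # vregs read, or immediates

def repair_operand(func, user, idx, needed_bank, fresh_name):
    """Make user.operands[idx] available in needed_bank.

    A G_IMPLICIT_DEF, or a G_CONSTANT with a single use, is cloned into the
    needed bank; anything else gets a COPY. Returns the new defining instruction.
    """
    reg = user.operands[idx]
    defn = next(i for i in func if i.dst == reg)
    if defn.bank == needed_bank:
        return defn
    num_uses = sum(op == reg for i in func for op in i.operands)
    if defn.opcode == "G_IMPLICIT_DEF" or (defn.opcode == "G_CONSTANT" and num_uses == 1):
        new = Instr(defn.opcode, fresh_name, needed_bank, list(defn.operands))
    else:
        new = Instr("COPY", fresh_name, needed_bank, [reg])
    func.insert(func.index(user), new)
    user.operands[idx] = fresh_name
    if not any(op == reg for i in func for op in i.operands):
        func.remove(defn)  # the original definition is now dead
    return new
```

A multi-use G_CONSTANT still gets a COPY in this sketch, which is the case the rest of the discussion is about.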
I'm wondering whether this would be considered more of a hack than an appropriate way of getting rid of these annoying copies.
This decreases the instruction count in a lot of tests.
Below I'll highlight (as inline comments) the cases I noticed where the instruction count actually increases.
Inline comment at line 2101 (on Diff #386883):
This is because of SMEMtoVectorWriteHazards on gfx10: the extra instruction is inserted by GCNHazardRecognizer::fixSMEMtoVectorWriteHazards.
Longer code because a VGPR (v5) is used to hold the constant.
SIFoldOperands won't inline this constant because it has multiple uses; folding it into every use might increase code size. Previously these were two different constants (two different instructions) with the same value. SelectionDAG also does not inline the constant, but uses an SGPR instead.
An extra VGPR is used here (3 now, 2 before).
Last time I thought about this, I concluded it would be easier to have the post-regbank combiner handle it. In some situations it makes the most sense to completely rematerialize the constant value in each register bank, not just reassign it. If you have an inline constant, it would be better to just emit a new constant for each bank. For multiple uses of literals it's trickier, since there's a code size versus instruction count tradeoff that depends on the uses.
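One way to picture that tradeoff is a tiny per-bank cost model. This is a sketch under stated assumptions, not the combiner's actual logic: `plan_constant` and the strategy names are hypothetical, only 32-bit integer inline constants (-16..64 on AMDGPU) are considered, and the byte costs (a 4-byte literal per folded use versus one 8-byte mov-with-literal per bank) are simplified assumptions that ignore whether a given use can encode a literal at all.

```python
LITERAL_BYTES = 4           # assumed extra dword each use pays for a folded literal
MOV_WITH_LITERAL_BYTES = 8  # assumed size of one mov that carries a literal

def is_inline_int(value: int) -> bool:
    """AMDGPU 32-bit integer inline constants cover -16..64 (floats ignored)."""
    return -16 <= value <= 64

def plan_constant(value, uses_by_bank, optimize_for_size):
    """Return {bank: strategy} for a constant that is used from several banks.

    "remat": emit a new constant in this bank (free to fold if it is inline)
    "fold_literal": put the literal directly into each use, no extra instruction
    """
    plan = {}
    for bank, n_uses in uses_by_bank.items():
        if n_uses == 0:
            continue
        if is_inline_int(value):
            plan[bank] = "remat"          # inline: a fresh constant per bank costs nothing
        elif not optimize_for_size:
            plan[bank] = "fold_literal"   # fewest instructions
        elif MOV_WITH_LITERAL_BYTES < LITERAL_BYTES * n_uses:
            plan[bank] = "remat"          # one mov; the uses then read the register
        else:
            plan[bank] = "fold_literal"
    return plan
```

For a literal with one SGPR use and three VGPR uses under size optimization, this picks `fold_literal` for the SGPR bank and `remat` for the VGPR bank.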
Inline comments at lines 324 and 325 (on Diff #386883):
I would expect us to make the right decision upfront rather than having to go back and erase copies.
I think this is far too specific a target hook.
I've considered this regbank combiner solution before, but it came too late for D98040. Thankfully, those cases are single-use constants needed as load/store offsets, which can easily be solved in RegBankSelect.
Register bank updates are now split between RegBankSelect (with no target hooks) and AMDGPURegBankCombiner.
The same applies to G_FCONSTANT and G_IMPLICIT_DEF, but I'm not sure this should be done unconditionally for all targets and in all situations. For instance, on AMDGPU we may want to make a code size tradeoff for non-inlineable constants.
I'm currently thinking that rematerializing in RegBankSelect is a good strategy. We would still want a post-isel operand optimizer pass to try to re-scalarize constants when there's unused constant bus capacity.
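The re-scalarization check can be sketched as follows. This is a hypothetical model, not an existing LLVM pass: `constant_bus_reads`, `rescalarize` and the `(kind, name)` operand encoding are assumptions, and the per-instruction constant bus limit is passed in because it is target-dependent. Each distinct SGPR and each literal is assumed to cost one constant bus read, while VGPRs and inline constants are free.

```python
def constant_bus_reads(operands):
    """Count constant-bus reads for one VALU instruction (simplified rules).

    Each distinct SGPR counts once and each literal counts once; VGPRs and
    inline constants are free. Real hardware has further exceptions.
    """
    sgprs = {name for kind, name in operands if kind == "sgpr"}
    literals = sum(1 for kind, _ in operands if kind == "literal")
    return len(sgprs) + literals

def rescalarize(operands, vgpr_consts, limit):
    """Swap VGPR operands that hold a known constant for the SGPR holding the
    same value, as long as the constant bus stays within limit.

    operands: list of (kind, name), kind in {"vgpr", "sgpr", "literal", "inline"}
    vgpr_consts: maps a VGPR name to an SGPR that holds the same constant
    limit: constant-bus reads allowed per instruction (target-dependent)
    """
    result = list(operands)
    for i, (kind, name) in enumerate(result):
        if kind != "vgpr" or name not in vgpr_consts:
            continue
        candidate = result[:i] + [("sgpr", vgpr_consts[name])] + result[i + 1:]
        if constant_bus_reads(candidate) <= limit:
            result = candidate
    return result
```

With a limit of 1, `[("sgpr", "s0"), ("vgpr", "v5")]` is left alone because the bus is already full, while `[("sgpr", "s2"), ("vgpr", "v5")]` with `v5` known to equal `s2` can be rewritten, since reading the same SGPR twice counts once.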
It would be better to report the rematerializability of an opcode as part of the reported mapping rather than add another callback for it.
I think it would be cleaner to handle this as a separate RepairingPlacement case in applyMapping.
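A sketch of how those two suggestions could fit together, in a hypothetical Python model rather than LLVM's C++ classes. In LLVM, `RegBankSelect::RepairingPlacement` has its own `RepairingKind` enum; here `NONE` and `INSERT` only loosely mirror it, and `REMATERIALIZE`, `InstructionMapping.rematerializable`, `choose_repair` and `apply_repair` are assumptions for illustration. The point is that the flag travels with the mapping, so no extra target callback is needed, and applying the repair becomes one more case in a switch.

```python
from dataclasses import dataclass
from enum import Enum, auto

class RepairingKind(Enum):
    NONE = auto()           # operand already lives in the required bank
    INSERT = auto()         # insert a COPY into the required bank
    REMATERIALIZE = auto()  # hypothetical: re-emit the defining instruction

@dataclass
class InstructionMapping:
    def_bank: str                   # bank chosen for the instruction's result
    rematerializable: bool = False  # reported with the mapping, no extra callback

def choose_repair(def_mapping, needed_bank):
    """Pick a repair for a use that needs the value in needed_bank."""
    if def_mapping.def_bank == needed_bank:
        return RepairingKind.NONE
    if def_mapping.rematerializable:
        return RepairingKind.REMATERIALIZE
    return RepairingKind.INSERT

def apply_repair(kind, def_opcode, bank):
    """Return the (opcode, bank) of the repair instruction, or None."""
    if kind is RepairingKind.NONE:
        return None
    if kind is RepairingKind.INSERT:
        return ("COPY", bank)
    return (def_opcode, bank)  # REMATERIALIZE: clone the def into the new bank
```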