Page MenuHomePhabricator

[AMDGPU] Aggressively fold immediates in SIFoldOperands
Needs ReviewPublic

Authored by foad on Nov 26 2021, 7:56 AM.
This revision needs review, but there are no reviewers specified.



Previously SIFoldOperands::foldInstOperand would only fold a
non-inlinable immediate into a single user, so as not to increase code
size by adding the same 32-bit literal operand to many instructions.

This patch removes that restriction, so that a non-inlinable immediate
will be folded into any number of users. The rationale is:

  • It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.)
  • It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it.
  • The above benefits are expected to outweigh any increase in code size.

Diff Detail

Event Timeline

foad created this revision.Nov 26 2021, 7:56 AM
foad requested review of this revision.Nov 26 2021, 7:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 26 2021, 7:56 AM
foad added inline comments.Nov 26 2021, 8:09 AM

Regression here: we are no longer forming madak/fmaak instructions. I think this is just bad luck. madak/fmaak formation is only implemented when PeepholeOptimizer calls SIInstrInfo::FoldImmediate. I think it would be much more reliable to do it as part of SIFoldOperands / SIShrinkInstructions.

sebastian-ne added inline comments.

Not really related to this patch, but shouldn’t we be able to inline v2 (15) into the scratch_store?

foad added inline comments.Nov 26 2021, 8:42 AM

No, v2 is the value being stored, and it has to be in a vgpr for that instruction (not even an inline constant is allowed).

Maybe we should start considering optsize here?