This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Set isReMaterializable on ADDI and LUI instructions
ClosedPublic

Authored by asb on Apr 27 2018, 5:21 AM.

Details

Summary

Although this patch is straight-forward, I'm hoping that more eyes on it will help point out if there are any additional hooks I may be missing related to rematerialisation.

The isReMaterlizable flag is somewhat confusing, unlike most other instruction flags it is currently interpreted as a hint (mightBeRematerializable would be a better name). While LUI is always rematerialisable, for an instruction like ADDI it depends on its operands. TargetInstrInfo::isTriviallyReMaterializable will call TargetInstrInfo::isReallyTriviallyReMaterializable, which in turn calls TargetInstrInfo::isReallyTriviallyReMaterializableGeneric. We rely on the logic in the latter to pick out instances of ADDI that really are rematerializable.

The isReMaterializable flag does make a difference on a variety of test programs. The recently committed remat.ll test case demonstrates how stack usage is reduce and a unnecessary lw/sw can be removed. Stack usage in the Proc0 function in dhrystone reduces from 192 bytes to 112 bytes.

For the sake of completeness, this patch also implements RISCVRegisterInfo::isConstantPhysReg. Although this is called from a number of places, it doesn't seem to result in different codegen for any programs I've thrown at it. However, it is called in the rematerialisation codepath and it seems sensible to implement something correct here.

@apazos, @mgrang, @sabuasal: could you please check the workloads you reported before indicated spilling of trivial values to ensure this patch addresses those.

Diff Detail

Repository
rL LLVM

Event Timeline

asb created this revision.Apr 27 2018, 5:21 AM

@asb Thanks for the patch. I tested this on our internal workload and it gave us ~44 bytes savings. However, instead if we mark all ALU insts as isReMaterializable then we get ~226 bytes savings:

let hasSideEffects = 0, isReMaterializable = 1, mayLoad = 0, mayStore = 0 in
class ALU_ri<bits<3> funct3, string opcodestr>
    : RVInstI<funct3, OPC_OP_IMM, (outs GPR:$rd), (ins GPR:$rs1, simm12:$imm12),
              opcodestr, "$rd, $rs1, $imm12">;

Note: Marking ALU insts in addition to ADDI and LUI as isReMaterializable still gave us only ~44 bytes savings.

asb added a comment.Apr 27 2018, 1:17 PM

@asb Thanks for the patch. I tested this on our internal workload and it gave us ~44 bytes savings.
However, instead if we mark all ALU insts as isReMaterializable then we get ~226 bytes savings:
Note: Marking ALU insts in addition to ADDI and LUI as isReMaterializable still gave us only ~44 bytes savings.

Hi Mandeep - thanks for taking a look. I'm a bit confused, which of the above 3 statements is true?

mgrang added a comment.EditedApr 27 2018, 1:30 PM
In D46182#1081552, @asb wrote:

@asb Thanks for the patch. I tested this on our internal workload and it gave us ~44 bytes savings.
However, instead if we mark all ALU insts as isReMaterializable then we get ~226 bytes savings:
Note: Marking ALU insts in addition to ADDI and LUI as isReMaterializable still gave us only ~44 bytes savings.

Hi Mandeep - thanks for taking a look. I'm a bit confused, which of the above 3 statements is true?

Sorry for not being clear enough. This patch alone gives us 44 bytes. Internally, I had tried a patch which just marks ALU insts as isReMat and that gave us 226 bytes. But combining the two patches still gives us only 44 bytes.

So I digged a bit more and it seems the degradation in code size comes from marking LUI as isReMaterializable. This results in repeated load zero for our workload:

lui x10, 0

So in your patch if I simply remove LUI isReMat (and keep ADDI isReMat) then I get 226 bytes (without my ALU patch).

asb added a comment.Apr 27 2018, 1:44 PM

Sorry for not being clear enough. This patch alone gives us 44 bytes. Internally, I had tried a patch which just marks ALU insts as isReMat and that gave us 226 bytes. But combining the two patches still gives us only 44 bytes.

So I digged a bit more and it seems the degradation in code size comes from marking LUI as isReMaterializable. This results in repeated load zero for our workload:

lui x10, 0

So in your patch if I simply remove LUI isReMat (and keep ADDI isReMat) then I get 226 bytes (without my ALU patch).

Thanks for the datapoint. I'm not surprised there's no big code size win. I'm mostly seeing minor codegen changes, and where those changes are noticeable the win is in reducing stack size.

The use of LUI to zero a register is curious. I'll play around with those patch variants and see if I can replicate something similar. If there's a representative example that would be interesting, but I know from playing around with this series of patches that tweak codegen that it can be _really_ difficult to pull out a minimal example.

asb added a comment.Apr 30 2018, 5:13 AM

I haven't seen the selection of lui $reg, 0 at all in the wild. Can you take a closer look at where you're seeing this please? It implies something _very_ odd is going on with constant lowering.

mgrang accepted this revision.May 10 2018, 8:00 AM

LGTM. I didn't get a chance to look further into this. Since this patch helps with overall code size I think we can commit this now and we can report back once we have done more analysis.

This revision is now accepted and ready to land.May 10 2018, 8:00 AM
This revision was automatically updated to reflect the committed changes.