This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Refactor an optimization of addition with immediate
ClosedPublic

Authored by benshi001 on Apr 19 2021, 8:37 AM.

Details

Summary

Simplify the original optimization from custom DAG selection code
to a TD pattern.

Diff Detail

Event Timeline

benshi001 created this revision.Apr 19 2021, 8:37 AM
benshi001 requested review of this revision.Apr 19 2021, 8:37 AM
Herald added a project: Restricted Project.Apr 19 2021, 8:37 AM

The original optimization is written as custom DAG selection code, which is hard to extend.
Changing it to a TD pattern makes it more flexible.
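As a rough sketch (not the exact diff), the custom C++ selection could be replaced by a TD pattern along these lines; `AddiPairImmA`/`AddiPairImmB` are assumed SDNodeXForms that split an out-of-range immediate into two addi-legal halves, and `AddiPair` is assumed to be a PatLeaf matching such immediates:

```tablegen
// Sketch only: assumes a PatLeaf `AddiPair` matching immediates that
// need two ADDIs, plus SDNodeXForms `AddiPairImmA`/`AddiPairImmB`
// extracting the two addi-legal halves of the immediate.
def : Pat<(add GPR:$rs1, (AddiPair:$imm)),
          (ADDI (ADDI GPR:$rs1, (AddiPairImmA AddiPair:$imm)),
                (AddiPairImmB AddiPair:$imm))>;
```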

craig.topper added inline comments.Apr 19 2021, 9:18 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1298

Do we need to make sure sext_inreg is the only user of the add? Otherwise we’ll emit two ADDIWs and two ADDIs.

craig.topper added inline comments.Apr 19 2021, 10:25 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1298

I think if we emitted an ADDI followed by an ADDIW for the sext_inreg case, the first ADDI would CSE with the first ADDI from a non-sext case if there were multiple uses. Then we wouldn't need to check for multiple uses.

lenary resigned from this revision.Apr 19 2021, 11:27 AM
benshi001 added inline comments.Apr 19 2021, 10:31 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1298

I do not quite understand your concern. Why do you think "Otherwise we’ll emit two ADDIWs and two ADDIs"? That is impossible.

  1. For the IR "%a = add i32 %b, 3001", two ADDIs are emitted on both rv32 and rv64.
  2. For the IR "%a = add i64 %b, 3001", two ADDIs are emitted on rv64.
  3. For the IR "%a = add i64 %b, 3001", two ADDIs are emitted on rv32 for the lower 32 bits (along with other instructions for the upper 32 bits).
  4. For the IR pattern "%a = add i32 %b, 3001; %c = sext_inreg %a, i32", two ADDIWs are emitted without any ADDI.

I did not find any other cases/IR patterns for this optimization.
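For reference, a PatLeaf selecting the immediates that need an addi pair might look roughly like this; the exact range check in the patch may differ, but it follows from the 12-bit signed addi range of [-2048, 2047]:

```tablegen
// Sketch: match immediates just outside the 12-bit signed addi range
// but still reachable by two ADDIs: [-4096, -2049] and [2048, 4094].
// The examples in this review (2999, 3001) fall in the positive range.
def AddiPair : PatLeaf<(imm), [{
  int64_t Imm = N->getSExtValue();
  return (-4096 <= Imm && Imm <= -2049) || (2048 <= Imm && Imm <= 4094);
}]>;
```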

1298

In the current patch, there is no possibility that ADDI and ADDIW are emitted mixed together. I can remove the one-use check, but I am concerned that if the immediate composed by a lui/addi pair has further uses, my transform leads to less efficient code.

craig.topper added inline comments.Apr 19 2021, 10:44 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1298

Example test

define signext i32 @add32_sext_accept(i32 signext %a, i32* %b) nounwind {
  %1 = add i32 %a, 2999
  store i32 %1, i32* %b
  ret i32 %1
}

Produces

addi    a2, a0, 1500
addi    a2, a2, 1499
addiw   a0, a0, 1500
addiw   a0, a0, 1499
sw      a2, 0(a1)
ret

For that example we could use the addiw result for the sw, but that's a bit hard to fix at the moment.

Here's another example

define i64 @add32_sext_accept(i64 %a, i64* %b) nounwind {
  %1 = add i64 %a, 2999
  store i64 %1, i64* %b
  %2 = shl i64 %1, 32
  %3 = ashr i64 %2, 32
  ret i64 %3
}

produces

addi    a2, a0, 1500
addi    a2, a2, 1499
addiw   a0, a0, 1500
addiw   a0, a0, 1499
sd      a2, 0(a1)
ret
benshi001 added inline comments.Apr 19 2021, 11:21 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
1298

I see. Your concern does matter. Currently I cannot figure out an easy way to cover all the special cases, so I will remove the ADDIW rule.

benshi001 retitled this revision from [RISCV] Optimize addition with immediate to [RISCV] Refactor an optimization of addition with immediate.
benshi001 edited the summary of this revision. (Show Details)
benshi001 removed a reviewer: lenary.

I have removed the ADDIW parts; this patch is now simply a refactoring of the original optimization, changing it from DAG selection code to a TD pattern. No extra tests are needed, since llvm/test/CodeGen/RISCV/add-imm.ll already covers all cases.

craig.topper accepted this revision.Apr 19 2021, 11:51 PM

LGTM

I think to address my concern you can just change the (sext_inreg (add GPR:$rs1, (AddiPair GPR:$rs2)), i32) pattern to produce (ADDIW (ADDI GPR:$rs1, (AddiPairImmB GPR:$rs2))). We only need the W form on the outer instruction to sign extend. That should reduce my test examples to 1 ADDIW and 2 ADDIs, which is the same number of instructions as the current worst case of sext.w+addi+addi. And when there aren't additional uses you get down to just 2 instructions.
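A sketch of that first option, reusing the AddiPair/AddiPairImmB names from this discussion (AddiPairImmA is an assumed transform producing the other half of the immediate):

```tablegen
// Sketch: only the outer instruction needs the W form to sign extend.
// The inner ADDI can then CSE with the inner ADDI of the plain
// ADDI+ADDI expansion when the add has other, non-sext uses.
def : Pat<(sext_inreg (add GPR:$rs1, (AddiPair:$imm)), i32),
          (ADDIW (ADDI GPR:$rs1, (AddiPairImmB AddiPair:$imm)),
                 (AddiPairImmA AddiPair:$imm))>;
```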

The other option is to change the pattern to (sext_inreg (add_oneuse GPR:$rs1, (AddiPair GPR:$rs2)), i32) by using a PatFrag to check that the add only has one use, the sext_inreg. That would keep my example tests as sext.w+addi+addi as the pattern wouldn't match, but allow 2 instructions when the one use check passes.
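The second option could use a one-use PatFrag, a common idiom in LLVM TD files; again only a sketch under the same assumed AddiPair helpers:

```tablegen
// Sketch: an `add` node matched only when it has a single use, so the
// pattern fires only when no separate ADDI+ADDI expansion is needed.
def add_oneuse : PatFrag<(ops node:$A, node:$B),
                         (add node:$A, node:$B), [{
  return N->hasOneUse();
}]>;

def : Pat<(sext_inreg (add_oneuse GPR:$rs1, (AddiPair:$imm)), i32),
          (ADDIW (ADDIW GPR:$rs1, (AddiPairImmA AddiPair:$imm)),
                 (AddiPairImmB AddiPair:$imm))>;
```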

The first approach allows 2 of the addis in my test cases to execute in parallel on a superscalar core. The second approach with add_oneuse serializes the sext.w after the addis have completed.

This revision is now accepted and ready to land.Apr 19 2021, 11:51 PM