Simplify the original optimization from custom DAG selection code to a TD pattern.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
The original optimization is written as custom DAG selection code, which is hard to extend. Expressing it as a TD pattern makes it more flexible.
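For context, here is a minimal sketch of what the TD form of this optimization can look like. The AddiPair/AddiPairImmA/AddiPairImmB names come from the review comments below; the split-in-half choice matches the 1500/1499 output in the test examples, but the exact definitions are illustrative, not a verbatim copy of the patch.

```
// Sketch only: match an add whose immediate is just outside the simm12
// range of a single ADDI, but expressible as the sum of two simm12 values.
def AddiPair : PatLeaf<(imm), [{
  // Bail out if the immediate has other users; materializing it once
  // via lui/addi may then be cheaper (see the discussion below).
  if (!N->hasOneUse())
    return false;
  int64_t Imm = N->getSExtValue();
  return (-4096 <= Imm && Imm <= -2049) || (2048 <= Imm && Imm <= 4094);
}]>;

// One half of the immediate, rounded toward zero (e.g. 2999 -> 1499).
def AddiPairImmA : SDNodeXForm<imm, [{
  int64_t Imm = N->getSExtValue();
  return CurDAG->getTargetConstant(Imm / 2, SDLoc(N), N->getValueType(0));
}]>;

// The other half, carrying the remainder (e.g. 2999 -> 1500).
def AddiPairImmB : SDNodeXForm<imm, [{
  int64_t Imm = N->getSExtValue();
  return CurDAG->getTargetConstant(Imm - Imm / 2, SDLoc(N),
                                   N->getValueType(0));
}]>;

// (add rs1, imm) ==> (ADDI (ADDI rs1, immB), immA)
def : Pat<(add GPR:$rs1, (AddiPair:$rs2)),
          (ADDI (ADDI GPR:$rs1, (AddiPairImmB AddiPair:$rs2)),
                (AddiPairImmA AddiPair:$rs2))>;
```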
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
Do we need to make sure sext_inreg is the only user of the add? Otherwise we'll emit two ADDIWs and two ADDIs.
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
I think if we emitted ADDI followed by ADDIW for the sign_ext case, the first ADDI would CSE with the first ADDI from a non-sext case if there were multiple uses. Then we wouldn't need to check for multiple uses.
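(A hypothetical illustration of that CSE, using the 2999 = 1500 + 1499 split from the examples below; the register names are made up:)

```
addi  a2, a0, 1500   # first half, shared by both users via CSE
addi  a3, a2, 1499   # plain 64-bit result for the non-sext use
addiw a4, a2, 1499   # sign-extended result; only the final add needs W
```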
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
I do not quite understand your concern. Why do you think "Otherwise we'll emit two ADDIWs and two ADDIs"? That is impossible. I did not find any other cases/IR patterns for this optimization.

llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
In the current patch, there is no possibility that ADDIs and ADDIWs are emitted mixed together. I can remove the one-use check, but I am concerned that if the immediate composed by a lui/addi pair has further uses, my transform leads to less efficient code.
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
Example test:

```
define signext i32 @add32_sext_accept(i32 signext %a, i32* %b) nounwind {
  %1 = add i32 %a, 2999
  store i32 %1, i32* %b
  ret i32 %1
}
```

produces

```
addi  a2, a0, 1500
addi  a2, a2, 1499
addiw a0, a0, 1500
addiw a0, a0, 1499
sw    a2, 0(a1)
ret
```

For that example we could use the addiw result for the sw, but that's a bit hard to fix at the moment. Here's another example:

```
define i64 @add32_sext_accept(i64 %a, i64* %b) nounwind {
  %1 = add i64 %a, 2999
  store i64 %1, i64* %b
  %2 = shl i64 %1, 32
  %3 = ashr i64 %2, 32
  ret i64 %3
}
```

produces

```
addi  a2, a0, 1500
addi  a2, a2, 1499
addiw a0, a0, 1500
addiw a0, a0, 1499
sd    a2, 0(a1)
ret
```
llvm/lib/Target/RISCV/RISCVInstrInfo.td:1298
I see, your concern does matter. Currently I cannot figure out an easy way to cover all the special cases, so I will remove the ADDIW rule.
I have removed the ADDIW parts; this patch is now simply a refactoring of the original optimization, changing it from DAG selection code to a TD pattern. No extra tests are needed, since llvm/test/CodeGen/RISCV/add-imm.ll already covers all cases.
LGTM
I think to address my concern you can just change the (sext_inreg (add GPR:$rs1, (AddiPair GPR:$rs2)), i32) pattern to produce (ADDIW (ADDI GPR:$rs1, (AddiPairImmB GPR:$rs2))). We only need the W on the outer ADDI to sign extend. That should reduce my test examples to 1 ADDIW and 2 ADDIs, which is the same number of instructions you get now with sext.w+addi+addi in the worst case. And when there aren't additional uses, you get down to just 2 instructions.
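A rough sketch of what this first option could look like in TD form; the output-operand spelling here is my guess, not a quote from the review:

```
// Option 1 (sketch): only the outer instruction needs the W form, so the
// inner ADDI stays eligible for CSE with the ADDIs of non-sext users.
def : Pat<(sext_inreg (add GPR:$rs1, (AddiPair:$rs2)), i32),
          (ADDIW (ADDI GPR:$rs1, (AddiPairImmB AddiPair:$rs2)),
                 (AddiPairImmA AddiPair:$rs2))>;
```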
The other option is to change the pattern to (sext_inreg (add_oneuse GPR:$rs1, (AddiPair GPR:$rs2)), i32) by using a PatFrag to check that the add only has one use, the sext_inreg. That would keep my example tests as sext.w+addi+addi as the pattern wouldn't match, but allow 2 instructions when the one-use check passes.
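And a sketch of the second option; the add_oneuse PatFrag below follows the usual one-use PatFrag idiom, though the exact in-tree spelling may differ:

```
// Option 2 (sketch): restrict the pattern to adds whose only user is the
// sext_inreg, so multi-use adds keep the sext.w+addi+addi lowering.
def add_oneuse : PatFrag<(ops node:$A, node:$B), (add node:$A, node:$B), [{
  return N->hasOneUse();
}]>;

def : Pat<(sext_inreg (add_oneuse GPR:$rs1, (AddiPair:$rs2)), i32),
          (ADDIW (ADDIW GPR:$rs1, (AddiPairImmB AddiPair:$rs2)),
                 (AddiPairImmA AddiPair:$rs2))>;
```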
The first approach allows 2 of the addis in my test cases to execute in parallel on a superscalar core. The second approach with add_oneuse serializes the sext.w after the addis have completed.