This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form
a != b ? x : 0 a == b ? 0 : x
and that currently (e.g. in Thumb1) are emitted as branches.
Paths
| Differential D34515
[ARM] Materialise some boolean values to avoid a branch ClosedPublic Authored by rogfer01 on Jun 22 2017, 8:41 AM.
Details Summary This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form a != b ? x : 0 a == b ? 0 : x and that currently (e.g. in Thumb1) are emitted as branches.
Diff Detail Event TimelineHerald added subscribers: kristof.beyls, javed.absar, aemerson. ยท View Herald TranscriptJun 22 2017, 8:41 AM Comment Actions A lot of these combines look very similar to the work done for ISD::ADDCARRY/SUBCARRY; see https://reviews.llvm.org/D29872 . Can we reuse that work rather than add a bunch of ARM-specific code? Comment Actions Hi @efriedma, thanks for pointing me to the ADDCARRY/SUBCARRY nodes. I tried to use them but I ran into serious problems (maybe due to some inexperience on my side working with SelectionDAG). First if I want to use them I assume I have to make them legal for the ARM back end, this means that I have to lower them somehow. After the short discussion in the list about the semantics of SUBCARRY I made ISD::ADDCARRY to lower to ARMISD::ADDC and ISD::SUBCARRY to lower to ARMISD::SUBC plus doing an extra (ISD::ADD 1, c) to the carry result, to change from borrow semantics (ISD::SUBCARRY semantics) to ARM carry semantics. One side effect of doing this is that many operations that were previously legalized using ISD::ADD and ISD::ADDC like i64 add/sub are now legalized using ISD::ADDCARRY. And one generic combiner for ISD::ADDCARRY when it has a zero carry transforms it into ISD::UADDO (similarly ISD::USUBO for ISD::SUBCARRY without borrow). This, unfortunately, leads to very poor (and probably wrong) code involving msr / mrs instructions. I tried to fix some of these issues but still it is very easy that sometimes the carry value is moved to some general purpose register and the backend decides to use mrs. For instance a case like this define i64 @f3(i32 %al, i32 %bl) { entry: ; unsigned wide add %aw = zext i32 %al to i64 %bw = zext i32 %bl to i64 %cw = add i64 %aw, %bw ; ch == carry bit %ch = lshr i64 %cw, 32 %dw = add i64 %ch, %bw ret i64 %dw } leads to something like this before instruction selection SelectionDAG has 13 nodes: t0: ch = EntryToken t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1 t17: ch,glue = CopyToReg t0, Register:i32 %R0, t28 t19: ch,glue = CopyToReg t17, Register:i32 %R1, t28:1, t17:1 t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0 t29: i32,i32 = ARMISD::ADDC t2, t4 t28: i32,i32 = ARMISD::ADDE t4, Constant:i32<0>, t29:1 t20: ch = ARMISD::RET_FLAG t19, Register:i32 %R0, Register:i32 %R1, t19:1 but the t19: ch,glue = CopyToReg t17, Register:i32 %R1, t28:1, t17:1 ends being emitted as mrs r1, apsr, below is the final code generated by llc -mtriple=armv6t2-eabi which I think is really wrong as the apsr register has more bits than the carry. f3: adds r0, r0, r1 adcs r0, r1, #0 mrs r1, apsr bx lr So my understanding is that using these nodes, while something that has to be done in the future, is not an easy addition to this change. The upside is that it is possible to express the original change of this patch using the ADDCARRY and SUBCARRY nodes, so this may still be worth the effort. Do you agree with my analysis or I am missing something very obvious here. I can upload a phab with the current change as it is if it helps. Thank you very much, Comment Actions If you're seeing references to apsr, you aren't lowering ADDCARRY correctly. ARMISD::ADDE has three operands; two integers, and apsr. It has two results: an integer, and the resulting apsr. If you're lowering ADDCARRY to ARMISD::ADDE, you actually need three operations: one to convert the boolean carry input to an apsr, one ARMISD::ADDE to perform the actual operation, and one operation to convert the result apsr to a boolean. See LowerADDSUBCARRY for how the x86 backend does this. (Afterwards, you use a target dagcombine to eliminate the redundant operations.) Comment Actions Hi Eli, to make this more manageable I opened D35192 in which I only use ADDCARRY and SUBCARRY. Once we're happy with that one I will update this change to use the new nodes. Regards, hans mentioned this in rL312980: Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY".Sep 11 2017, 4:53 PM hans mentioned this in rL313044: Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY".Sep 12 2017, 9:25 AM
rogfer01 marked 5 inline comments as done. Comment ActionsChangeLog:
Comment Actions "it" is more like an instruction prefix than an actual instruction from the perspective of the CPU; it gets decoded into the same thing as an ARM mode conditional instruction. It's not an improvement to replace "it" with an actual instruction. Comment Actions @efriedma Ah I see. If I get you right, the initial change was more sensible. Also Sam's concerns were on a file that is explicitly marked as -O0. I will go back the original change. Comment Actions This patch would be easier to review if you would commit the changes to add new tests and RUN lines separately.
Comment Actions ChangeLog:
Comment Actions ChangeLog
Comment Actions The use of clz is very similar to what the PowerPC backend does; I wonder if we can pick up any additional tricks from there.
Comment Actions ChangeLog:
Comment Actions @efriedma @samparker any further comments on this change? We can add more clz-like tricks in later changes if needed.
Comment Actions No other points from me, lets get this monster in. (fingers crossed)
This revision is now accepted and ready to land.Feb 15 2018, 6:35 AM Closed by commit rL325323: [ARM] Materialise some boolean values to avoid a branch (authored by rogfer01). ยท Explain WhyFeb 16 2018, 1:26 AM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 131293 lib/Target/ARM/ARMISelLowering.cpp
test/CodeGen/ARM/and-load-combine.ll
test/CodeGen/ARM/atomic-cmpxchg.ll
test/CodeGen/ARM/cmn.ll
test/CodeGen/ARM/cmp.ll
test/CodeGen/ARM/cmpxchg-O0.ll
test/CodeGen/ARM/fp16-promote.ll
test/CodeGen/ARM/long-setcc.ll
test/CodeGen/ARM/select-imm.ll
test/CodeGen/ARM/setcc-logic.ll
test/CodeGen/Thumb/branchless-cmp.lltest/CodeGen/Thumb/constants.ll
test/CodeGen/Thumb/long-setcc.lltest/CodeGen/Thumb2/float-cmp.ll
test/CodeGen/Thumb2/thumb2-cmn.ll
test/CodeGen/Thumb2/thumb2-cmn2.ll
test/CodeGen/Thumb2/thumb2-cmp.ll
test/CodeGen/Thumb2/thumb2-teq.ll
test/CodeGen/Thumb2/thumb2-teq2.ll
test/CodeGen/Thumb2/thumb2-tst.ll
test/CodeGen/Thumb2/thumb2-tst2.ll
|
When do we hit this case? I would expect that normally DAGCombiner::visitADDCARRY would combine this away.