We also want to optimise tests like this: return a*b == 0. The MULS instruction is flag setting, so we don't need the CMP instruction but can instead branch on the result of the MULS. The generated instructions sequence for this example was: MULS, MOVS, MOVS, CMP. The MOVS instruction load the boolean values resulting from the select instruction, but these MOVS instructions are flag setting and were thus preventing this optimisation. Now we first reorder and move the MULS to before the CMP and generate sequence MOVS, MOVS, MULS, CMP so that the optimisation could still be triggered. Reordering of the MULS and MOVS is safe to do because the subsequent MOVS instructions just set the CPSR register and don't use it, i.e. the CPSR is dead.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Sjoerd,
I'm not sure this is the best way to go, it looks a bit hacky. Why are we generating MOVS for those constants in the first place? It feels like MOV should do the job just as well, and wouldn't touch the flags.
Cheers,
Diana
test/CodeGen/ARM/mul-cmp-peephole.ll | ||
---|---|---|
1 ↗ | (On Diff #82108) | I think this would work better as a .mir test (use -stop-before=peephole-opt). |
28 ↗ | (On Diff #82108) | Is all this alloca clutter relevant to the test? |
Hi Diana,
Thanks a lot for the review! We are generating these constants and moves for an expression like "return A*B == C", where A and B are variables and C is a constant, because this is lowered involving a select_cc is generated. For example:
%mul = mul nsw i32 %b, %a %cmp = icmp eq i32 %mul, 0 %conv = zext i1 %cmp to i32 ret i32 %conv
this IR gets instruction selected to:
t5: i32 = mul t4, t2
t19: glue = ARMISD::CMPZ t5, Constant:i32<0> t20: i32 = ARMISD::CMOV Constant:i32<0>, Constant:i32<1>, Constant:i32<0>, Register:i32 %CPSR, t19 t11: ch,glue = CopyToReg t0, Register:i32 %R0, t20 t12: ch = ARMISD::RET_FLAG t11, Register:i32 %R0, t11:1
and then we end up with these MIs:
%vreg2<def,tied3>, %CPSR<def> = tMUL %vreg1, %vreg0<tied0>, pred:14, pred:%noreg; tGPR:%vreg2,%vreg1,%vreg0
%vreg3<def>, %CPSR<def> = tMOVi8 1, pred:14, pred:%noreg; tGPR:%vreg3 %vreg4<def>, %CPSR<def> = tMOVi8 0, pred:14, pred:%noreg; tGPR:%vreg4 tCMPi8 %vreg2<kill>, 0, pred:14, pred:%noreg, %CPSR<imp-def>; tGPR:%vreg2 %vreg5<def> = tMOVCCr_pseudo %vreg4<kill>, %vreg3<kill>, pred:0, pred:%CPSR; tGPR:%vreg5,%vreg4,%vreg3 %R0<def> = COPY %vreg5; tGPR:%vreg5 tBX_RET
And the issue is that the move immediate instruction for Thumb1 (V6M) updates the condition flags.
The elegant solution would be a bit of (early) CFG restructuring and 2 exit blocks return the true/false conditions, so that we end up with something similar to this:
MULS r0,r1,r0 BEQ |L0.8| MOVS r0,#0 BX lr
L0.8 |
MOVS r0,#1 BX lr
But to achieve that at this point is difficult, hence my solution to simply reorder the constants and the operation.
Right, sorry, I was thinking about Thumb2, where MOV doesn't touch the flags.
I guess reordering isn't so bad in this case, but it will affect the registers that you have available. Before, if the operands for tMUL weren't needed elsewhere you could reuse their registers for the constants, now you might need extra regs for them. E.g. in the first example from the test, you end up using up to r3 instead of just up to r2. I think it would be useful to run some benchmarks with this change, to make sure we're not causing trouble in more complicated cases.
Thanks.
Thanks for the suggestion and I will look into possible correctness issues a bit more, which was also on my todo list. I.e., I ran testing and that didn't show problems for lnt, dhrystone, coremark, eembc, geekbench, spec2k, but spec2k6 might have found something but need to check if that is noise or not.
I have verified correctness and this patch does not show any problems in regression tests or benchmarks. This peephole works on the vregs, so is still in SSA form. As a consequence, the movs won't redefine/kill the MUL operands which would make this reordering illegal. I have added an extra check though that the movs are not predicated. I have also changed the tests into .mir tests (and simplified the 2nd test).
lib/Target/ARM/ARMBaseInstrInfo.cpp | ||
---|---|---|
2528 ↗ | (On Diff #83042) | Typo: MOVS instructions (also on line 2531) |
2537 ↗ | (On Diff #83042) | Nitpick: I think CanReorder is a better name. |
2541 ↗ | (On Diff #83042) | Why not isPredicated? |
2541 ↗ | (On Diff #83042) | You should cover this in the tests (i.e. check that the optimization is not performed for a predicated MOV). |
test/CodeGen/ARM/cmp1-peephole-thumb.mir | ||
3 ↗ | (On Diff #83042) | Nitpick: I'd move this closer to the MIR code. Plus, since the code is dealing with reordering, it's probably a good idea to also check for what you're expecting to see in the output (i.e. the MOVs followed by the MUL). |
Thanks for reviewing again. I have fixed the typos in the comments, changed the variable name, and modified the test case. Regarding the predicated MOVs I have taken a slightly different approach. Since we only want to do this peephole optimisation for Thumb1, I first check that the instruction is actually a Thumb1 instruction. And because Thumb1 does not have conditional movs, we can safely omit the isPredicated check (testing shows no issues).
LGTM with a couple more nitpicks (you don't have to upload a new version here, just make the changes and commit).
lib/Target/ARM/ARMBaseInstrInfo.cpp | ||
---|---|---|
2570 ↗ | (On Diff #84559) | Another typo: eliminate :D |
test/CodeGen/ARM/cmp2-peephole-thumb.mir | ||
8 ↗ | (On Diff #84559) | Please move this closer to the MIR code (as you did for the other test), and also check for the compare. |