This is an archive of the discontinued LLVM Phabricator instance.

[Thumb] Add support for tMUL in the compare instruction peephole optimizer
ClosedPublic

Authored by SjoerdMeijer on Dec 20 2016, 7:41 AM.

Details

Summary

We also want to optimise tests like this: return a*b == 0. The MULS instruction is flag setting, so we don't need the CMP instruction but can instead branch on the result of the MULS. The generated instructions sequence for this example was: MULS, MOVS, MOVS, CMP. The MOVS instruction load the boolean values resulting from the select instruction, but these MOVS instructions are flag setting and were thus preventing this optimisation. Now we first reorder and move the MULS to before the CMP and generate sequence MOVS, MOVS, MULS, CMP so that the optimisation could still be triggered. Reordering of the MULS and MOVS is safe to do because the subsequent MOVS instructions just set the CPSR register and don't use it, i.e. the CPSR is dead.

Diff Detail

Event Timeline

SjoerdMeijer retitled this revision from to [Thumb] Add support for tMUL in the compare instruction peephole optimizer.
SjoerdMeijer updated this object.
SjoerdMeijer added a subscriber: llvm-commits.
rovka edited edge metadata.Dec 23 2016, 3:05 AM

Hi Sjoerd,

I'm not sure this is the best way to go, it looks a bit hacky. Why are we generating MOVS for those constants in the first place? It feels like MOV should do the job just as well, and wouldn't touch the flags.

Cheers,
Diana

test/CodeGen/ARM/mul-cmp-peephole.ll
1

I think this would work better as a .mir test (use -stop-before=peephole-opt).

28

Is all this alloca clutter relevant to the test?

Hi Diana,

Thanks a lot for the review! We are generating these constants and moves for an expression like "return A*B == C", where A and B are variables and C is a constant, because this is lowered involving a select_cc is generated. For example:

%mul = mul nsw i32 %b, %a
%cmp = icmp eq i32 %mul, 0
%conv = zext i1 %cmp to i32
ret i32 %conv

this IR gets instruction selected to:

t5: i32 = mul t4, t2

    t19: glue = ARMISD::CMPZ t5, Constant:i32<0>
  t20: i32 = ARMISD::CMOV Constant:i32<0>, Constant:i32<1>, Constant:i32<0>, Register:i32 %CPSR, t19
t11: ch,glue = CopyToReg t0, Register:i32 %R0, t20
t12: ch = ARMISD::RET_FLAG t11, Register:i32 %R0, t11:1

and then we end up with these MIs:

%vreg2<def,tied3>, %CPSR<def> = tMUL %vreg1, %vreg0<tied0>, pred:14, pred:%noreg; tGPR:%vreg2,%vreg1,%vreg0

%vreg3<def>, %CPSR<def> = tMOVi8 1, pred:14, pred:%noreg; tGPR:%vreg3
%vreg4<def>, %CPSR<def> = tMOVi8 0, pred:14, pred:%noreg; tGPR:%vreg4
tCMPi8 %vreg2<kill>, 0, pred:14, pred:%noreg, %CPSR<imp-def>; tGPR:%vreg2
%vreg5<def> = tMOVCCr_pseudo %vreg4<kill>, %vreg3<kill>, pred:0, pred:%CPSR; tGPR:%vreg5,%vreg4,%vreg3
%R0<def> = COPY %vreg5; tGPR:%vreg5
tBX_RET

And the issue is that the move immediate instruction for Thumb1 (V6M) updates the condition flags.

The elegant solution would be a bit of (early) CFG restructuring and 2 exit blocks return the true/false conditions, so that we end up with something similar to this:

MULS     r0,r1,r0
BEQ      |L0.8|
MOVS     r0,#0
BX       lr
L0.8
MOVS     r0,#1
BX       lr

But to achieve that at this point is difficult, hence my solution to simply reorder the constants and the operation.

rovka added a comment.Dec 23 2016, 7:21 AM

Right, sorry, I was thinking about Thumb2, where MOV doesn't touch the flags.

I guess reordering isn't so bad in this case, but it will affect the registers that you have available. Before, if the operands for tMUL weren't needed elsewhere you could reuse their registers for the constants, now you might need extra regs for them. E.g. in the first example from the test, you end up using up to r3 instead of just up to r2. I think it would be useful to run some benchmarks with this change, to make sure we're not causing trouble in more complicated cases.

Thanks.

Thanks for the suggestion and I will look into possible correctness issues a bit more, which was also on my todo list. I.e., I ran testing and that didn't show problems for lnt, dhrystone, coremark, eembc, geekbench, spec2k, but spec2k6 might have found something but need to check if that is noise or not.

SjoerdMeijer edited edge metadata.

I have verified correctness and this patch does not show any problems in regression tests or benchmarks. This peephole works on the vregs, so is still in SSA form. As a consequence, the movs won't redefine/kill the MUL operands which would make this reordering illegal. I have added an extra check though that the movs are not predicated. I have also changed the tests into .mir tests (and simplified the 2nd test).

rovka added inline comments.Jan 11 2017, 7:46 AM
lib/Target/ARM/ARMBaseInstrInfo.cpp
2528

Typo: MOVS instructions (also on line 2531)

2537

Nitpick: I think CanReorder is a better name.

2541

Why not isPredicated?

2541

You should cover this in the tests (i.e. check that the optimization is not performed for a predicated MOV).

test/CodeGen/ARM/cmp1-peephole-thumb.mir
3 ↗(On Diff #83042)

Nitpick: I'd move this closer to the MIR code. Plus, since the code is dealing with reordering, it's probably a good idea to also check for what you're expecting to see in the output (i.e. the MOVs followed by the MUL).

Thanks for reviewing again. I have fixed the typos in the comments, changed the variable name, and modified the test case. Regarding the predicated MOVs I have taken a slightly different approach. Since we only want to do this peephole optimisation for Thumb1, I first check that the instruction is actually a Thumb1 instruction. And because Thumb1 does not have conditional movs, we can safely omit the isPredicated check (testing shows no issues).

rovka accepted this revision.Jan 19 2017, 4:44 AM

LGTM with a couple more nitpicks (you don't have to upload a new version here, just make the changes and commit).

lib/Target/ARM/ARMBaseInstrInfo.cpp
2534

Another typo: eliminate :D

test/CodeGen/ARM/cmp2-peephole-thumb.mir
8 ↗(On Diff #84559)

Please move this closer to the MIR code (as you did for the other test), and also check for the compare.

This revision is now accepted and ready to land.Jan 19 2017, 4:44 AM

Thanks! Have fixed that, and will commit.

This revision was automatically updated to reflect the committed changes.