This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Support all versions of AND, ORR, EOR and BIC in optimizeCompareInstr
ClosedPublic

Authored by fzhinkin on Aug 12 2022, 9:10 AM.

Details

Summary

Combine cmp with zero and all versions of AND, ORR, EOR and BIC instructions into S-suffixed versions.

Related issue: https://github.com/llvm/llvm-project/issues/57122

Diff Detail

Event Timeline

fzhinkin created this revision.Aug 12 2022, 9:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 12 2022, 9:10 AM
fzhinkin added inline comments.Aug 12 2022, 9:12 AM
llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
3011–3013

Didn't support shifts as it seems very unlikely to see these instructions here.

llvm/test/CodeGen/ARM/cmp-peephole.ll
6

I'll precommit new tests if there will be no objections.

fzhinkin updated this revision to Diff 452280.Aug 12 2022, 12:58 PM

Supported shifts

fzhinkin updated this revision to Diff 454477.Aug 22 2022, 6:08 AM

Added tests on zero comparison for equality.

Built stage2 binaries and executed clang and llvm tests, all passed:

fzhinkin edited the summary of this revision. (Show Details)Sep 6 2022, 11:18 AM
fzhinkin added reviewers: dmgreen, SjoerdMeijer.
fzhinkin updated this revision to Diff 458225.Sep 6 2022, 11:20 AM
This comment was removed by fzhinkin.
fzhinkin published this revision for review.Sep 6 2022, 11:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 6 2022, 11:29 AM

Sounds like a good patch. Are all the cases covered by the tests? It looks like they are, but it's a bit hard to tell from the current file.

llvm/test/CodeGen/ARM/cmp-peephole.ll
6

Precommitting the tests sounds good.

fzhinkin updated this revision to Diff 458790.Sep 8 2022, 10:25 AM

Rebase

llvm/test/CodeGen/ARM/cmp-peephole.ll
6

Done

efriedma added inline comments.Sep 8 2022, 10:48 AM
llvm/lib/Target/ARM/ARMInstrInfo.td
4888

I'm not sure how you use the shift's S-version in general. Do we have test coverage for something like void a(int x, int y, void z()) { if ((x >> 1) == y) z(); }?

(If the issue is that we're choosing CMPrsi over CMPri, you can probably use AddedComplexity to force the preferred form.)

fzhinkin added inline comments.Sep 8 2022, 12:47 PM
llvm/lib/Target/ARM/ARMInstrInfo.td
4888

Thank you for taking a look at it! It seems like there are no tests covering this particular case (and I didn't add it either :( ), and for code that looks like something you mentioned removal of these patterns will result in a worse code sequence (shift + cmp instead of single cmp). I'll roll back this part of the change.

The issue I tried to overcome by removing two ARMcmpZ patterns is following:
consider a code like return (a >> b) != 0 ? (a >> b) : 42 (https://godbolt.org/z/EfsTq3dP4).
Before instruction selection corresponding DAG will looks like:

  t0: ch = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
t5: i32 = srl t2, t4
    t16: glue = ARMISD::CMPZ t5, Constant:i32<0>
  t17: i32 = ARMISD::CMOV t5, Constant:i32<42>, Constant:i32<0>, Register:i32 $cpsr, t16
t12: ch,glue = CopyToReg t0, Register:i32 $r0, t17
t13: ch = ARMISD::RET_FLAG t12, Register:i32 $r0, t12:1

CMPZ t5 where t5 is i32 = srl t2, t4 will match pattern for ARMcmpZ, so after instr selection we'll get:

t0: ch = EntryToken
 t2: i32,ch = CopyFromReg t0, Register:i32 %0
 t4: i32,ch = CopyFromReg t0, Register:i32 %1
     t5: i32 = MOVsr t2, t4, TargetConstant:i32<3>, TargetConstant:i32<14>, Register:i32 $noreg, Register:i32 $noreg
       t6: i32 = MOVi TargetConstant:i32<0>, TargetConstant:i32<14>, Register:i32 $noreg, Register:i32 $noreg
     t16: i32,glue = CMPrsr t6, t2, t4, TargetConstant:i32<3>, TargetConstant:i32<14>, Register:i32 $noreg
   t17: i32 = MOVCCi t5, TargetConstant:i32<42>, TargetConstant:i32<0>, Register:i32 $cpsr, t16:1
 t12: ch,glue = CopyToReg t0, Register:i32 $r0, t17
 t13: ch = BX_RET TargetConstant:i32<14>, Register:i32 $noreg, Register:i32 $r0, t12, t12:1

CMPrsr (t16) is no longer related to t5 and isOptimizeCompareCandidate won't be able to replace it with S-version of t5.

I tried to use AddedComplexity, but it didn't help.
Probably, the better place to handle this case is ARMTargetLowering::getARMCmp (and there is a TODO suggesting to do that there, and I missed it previously).

fzhinkin added inline comments.Sep 9 2022, 4:35 AM
llvm/lib/Target/ARM/ARMInstrInfo.td
4888

A shameful part of why "AddedComplexity didn't help" is that I applied it incorrectly. :)
Otherwise it seems to work, I'll add test cases for code like void a(int x, int y, void z()) { if ((x >> 1) == y) z(); } and will come back with the update.

fzhinkin updated this revision to Diff 460978.Sep 17 2022, 1:51 AM

Introduced additional patterns to allow handing of shift+cmp by the peephole

samtebbs accepted this revision.Sep 20 2022, 6:11 AM

Looks pretty good to me, but maybe give the other reviewers some time for another look.

This revision is now accepted and ready to land.Sep 20 2022, 6:11 AM

Looks pretty good to me, but maybe give the other reviewers some time for another look.

Thank you!

@dmgreen @efriedma are you OK with this change?