This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Don't expand sdiv when optimising for minsize
ClosedPublic

Authored by SjoerdMeijer on Nov 14 2018, 2:33 PM.

Details

Summary

Don't expand a SDIV with an immediate that is a power of 2 if we optimise for minimum code size. For example:

    
sdiv %1, i32 4

this gets cleverly expanded to a sequence of 3 instructions: ASR, ADD (with a LSR in the addressing mode) and another ASR. But this is suboptimial for minimum code size, so instead we just generate a MOV and a SDIV.

Diff Detail

Repository
rL LLVM

Event Timeline

SjoerdMeijer created this revision.Nov 14 2018, 2:33 PM

But this is suboptimial for minimum code size, so instead we just generate a MOV and a SDIV.

This saves two bytes, if I'm following correctly?

test/CodeGen/ARM/sdiv-opt-size.ll
3 ↗(On Diff #174102)

What happens for targets that don't have an sdiv instruction? (For example, thumbv6m.)

RKSimon added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3237 ↗(On Diff #174102)

Instead of isVector() should this be based on operation legality? Promotion/expansion could be just as (or more?) expensive as the pow2 expansion. i64 on 32-bits targets for instance.

On thumbv8, if you're dividing by a power of two greater than 128, I think you save zero bytes (it's eight bytes either way); is that right?

This affects non-ARM targets; do you know what the effect is on those targets?

SjoerdMeijer added a comment.EditedNov 14 2018, 3:30 PM

Thanks for commenting!
Yes, this should save 2 bytes.
I will check tomorrow:

  • about the SDIV legality, and where exactly the SDIV is expanded when it is not supported in HW. I see that sdiv is expanded very early if it is not available, immediately after the initial SDAG, and this code in visitSDIVLike does not get triggered. But again, will double check where exactly the magic happens. I've noticed that at least some tests are in place for other targets: the minsize check is not enough, the !VT.isVector() is required for some X86 and AArch64 tests, and without it the (knock on) effect is that the vector operations get scalarized and these tests fail, but I don't think this is related the legality of SDIV.
  • I missed this: "On Thumbv8, if you're dividing by a power of two greater than 128, I think you save zero byte", so will address that.
  • what exactly the benefit is on some other targets.
SjoerdMeijer retitled this revision from [ARM] Don't expand sdiv to [DAGCombiner] Don't expand sdiv when optimising for minsize.Nov 14 2018, 3:35 PM
samparker added inline comments.Nov 15 2018, 6:16 AM
test/CodeGen/ARM/sdiv-opt-size.ll
1 ↗(On Diff #174102)

specifying the m33 is unnecessary and it's worth adding v8m.base too.

A little bit of progress:

What happens for targets that don't have an sdiv instruction? (For example, thumbv6m.)

The Legalizer takes care of it and turns it into a call to __aeabi_idiv. But it doesn't look beneficial though:

$ cat t.c
int foo(int f) {
  return f / 4;
}

before:

0x00000000:    17c1        ASRS     r1,r0,#31
0x00000002:    0f89        LSRS     r1,r1,#30
0x00000004:    1840       ADDS     r0,r0,r1
0x00000006:    1080       ASRS     r0,r0,#2
0x00000008:    4770       BX       lr

and after:

0x00000000:    b580        PUSH     {r7,lr}
0x00000002:    2104        MOVS     r1,#4
0x00000004:    f7fffffe    BL       __aeabi_idiv
0x00000008:    bd80         POP      {r7,pc}

This isn't great in many ways: same size, it's slower, and we still need to pull in code for __aeabi_idiv (if it isn't already linked in). Easiest and best thing to do here is to expand SDIV as before. I will see if I can query if SDIV is legal, and add that as a check here.

For X86, this code path doesn't seem to be triggered at all and it is already generating an idiv:

0:	89 f8                	mov    %edi,%eax
2:	6a 05                	pushq  $0x5
4:	59                   	pop    %rcx
5:	99                   	cltd   
6:	f7 f9                	idiv   %ecx
8:	c3                   	retq

From a quick look for X86, it leaves the SDIV intact, because it disappears in function BuildSDIVPow2 on line 3233 just before my addition there, queries TLI.BuildSDIVPow2, which returns this SDIV node.

SjoerdMeijer updated this revision to Diff 174747.EditedNov 20 2018, 4:58 AM
SjoerdMeijer retitled this revision from [DAGCombiner] Don't expand sdiv when optimising for minsize to [ARM] Don't expand sdiv when optimising for minsize.

While looking what X86 was doing (and why it was doing the right thing), I noticed that I should just implement TargetLowering::BuildSDIVPow2. So that's what I am doing now, and thus it's no longer a DAGCombine change but just an ARM one.
Different tests and targets have been added, and a check for larger constants.

efriedma added inline comments.Nov 20 2018, 11:56 AM
lib/Target/ARM/ARMISelLowering.cpp
7806 ↗(On Diff #174747)

ARM mode? It looks like the rest of the function doesn't handle it correctly, and you don't have any tests.

test/CodeGen/ARM/sdiv-opt-size.ll
11 ↗(On Diff #174747)

Could you also check that we don't generate any libcalls for v6m?

18 ↗(On Diff #174747)

This sxth is redundant? Probably need to implement ComputeNumSignBits for SDIV, or something like that.

SjoerdMeijer added inline comments.Nov 21 2018, 6:03 AM
lib/Target/ARM/ARMISelLowering.cpp
7806 ↗(On Diff #174747)

Oops, thanks for catching this! I completely focused on Thumb, and forgot about ARM. But yes, will add ARM too.

test/CodeGen/ARM/sdiv-opt-size.ll
11 ↗(On Diff #174747)

Yep, will do.

18 ↗(On Diff #174747)

Interesting! Hadn't looked at that, but yes, that's redundant. Will address that in a follow-up patch.

RKSimon added inline comments.Nov 21 2018, 7:00 AM
lib/Target/ARM/ARMISelLowering.cpp
7812 ↗(On Diff #174747)

Why do you need this? BuildSDIVPow2 is only called with constant (pow2) divisors, also it'll fail for constant vectors.

Thanks for reviewing!

  • removed the unnecessary assert
  • added support for ARM mode, and tests.
efriedma added inline comments.Nov 21 2018, 2:10 PM
test/CodeGen/ARM/sdiv-pow2-arm-size.ll
14 ↗(On Diff #174930)

Not sure the libcall is the right choice... yes, in your tiny testcase, it saves one instruction, but a libcall is going to force spilling for most realistic functions.

SjoerdMeijer added inline comments.Nov 22 2018, 8:41 AM
test/CodeGen/ARM/sdiv-pow2-arm-size.ll
14 ↗(On Diff #174930)

Ok, yes, fair enough. I will change this and also in ARM mode won't generate the sdiv if hwdiv is unsupported.

  • Don't create div libcalls in ARM mode,
  • Bail for SREM instructions. This wasn't checked, and resulted in a crash. Thanks to @dmgreen for pointing this out. Will follow up on this.
efriedma added inline comments.Nov 26 2018, 1:42 PM
lib/Target/ARM/ARMISelLowering.cpp
7820 ↗(On Diff #175129)

Please early return here instead of changing the value of SDIV.

Bail earlier if minsize or hwdiv is not enabled.

This revision is now accepted and ready to land.Nov 29 2018, 3:32 PM
This revision was automatically updated to reflect the committed changes.