Page MenuHomePhabricator

[X86] Override BuildSDIVPow2 for X86.

Authored by craig.topper on Mon, Sep 2, 3:58 PM.



As noted in PR43197, we can use test+add+cmov+sra to implement
signed division by a power of 2.

This is based off the similar version in AArch64, but I've
adjusted it to use target independent nodes where AArch64 uses
target specific CMP and CSEL nodes. I've also blocked INT_MIN
as the transform isn't valid for that.

I've limited this to i32 and i64 on 64-bit targets for now and only
when CMOV is supported. i8 and i16 need further investigation to be
sure they get promoted to i32 well.

I adjusted a few tests to enable cmov to demonstrate the new
codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode
without cmov to avoid perturbing the scenario that is being
set up there.

Diff Detail


Event Timeline

craig.topper created this revision.Mon, Sep 2, 3:58 PM
19 ↗(On Diff #218396)

Maybe not a big deal?

Middle-end can reduce this IR to
define void @fn1() local_unnamed_addr #0 {

%t0 = load i32, i32* @c, align 4
%tobool1 = icmp eq i32 %t0, 0
%xor = zext i1 %tobool1 to i32
store i32 %xor, i32* @d, align 4
ret void


and then llc..
fn1: # @fn1

xor     eax, eax
cmp     dword ptr [rip + c], 0
sete    al
mov     dword ptr [rip + d], eax

We should add the minimal/motivating test from PR43197, so we cover that exact case, and we should have at least 1 test with a negative divisor constant rather than relying on srem transforms to get that case.

20109–20112 ↗(On Diff #218396)

That's a complicated string of logic. It would be easier to read if it was split into 2-3 clauses:

  1. Check target capability
  2. Check type
  3. Check divisor value
20112 ↗(On Diff #218396)

The min-signed check is not necessary?

Name: sdiv X, {+/-} power-of-2-constant
Pre: isPowerOf2(C1) || isPowerOf2(-C1)
%r = sdiv i32 %x, C1
%x_is_negative = icmp slt i32 %x, 0
%x_with_offset = add i32 %x, ((1 << countTrailingZeros(C1)) - 1)
%normx = select i1 %x_is_negative, i32 %x_with_offset, i32 %x
%a = ashr i32 %normx, countTrailingZeros(C1)
%neg = sub i32 0, %a
%divisor_is_negative = icmp slt i32 C1, 0 ; constant fold
%r = select i1 %divisor_is_negative, i32 %neg, %a
20121 ↗(On Diff #218396)

This comment doesn't match what we're creating - the 'add' is the true operand of the select:
// (N0 < 0) ? (N0 + Pow2M1) : N0

Address review comments

With context

Do we already have test coverage for vector div-by-power-of-2?

xbolva00 added a comment.EditedTue, Sep 3, 1:02 PM

Can you please add a test for

sdiv X, 2 ?

(gcc somehow avoids cmov in this case)

Do we already have test coverage for vector div-by-power-of-2?

There's some in vec_sdiv_to_shift.ll and vector-idiv-v2i32.ll and combine-sdiv.ll

Limit the transform to not apply to division by 2 or -2. Add more test cases

i8 and i16

gcc does this for char/long too.. Maybe you can try to lift restrictions and see how it looks like?
(if we miss test cases like sdiv i8 %x, 4 or sdiv i16 %x, 4 + srem variants - please add them - I was unable to find them in X86_combine-sdiv/ srem.ll / rem.ll)

Playing with chars, I think there is a way to improve -Oz codesize for this case too.

foo(char): # @foo(char)

mov     eax, edi
mov     cl, 4
idiv    cl

gcc with -Os

mov     dl, 4
movsx   ax, dil
idiv    dl

with short, -Oz
foo(long): # @foo(long)

mov     rax, rdi
push    4
pop     rcx
idiv    rcx
spatel added a comment.Wed, Sep 4, 7:03 AM

Thinking about this sequence in relation to some other folds: if the add was pulled through the select operands, we'd have a select-of-constants where the false op is a zero, and that would get reduced to a 'shift+and' and defeat the goal of this patch. So if cmov is the better option for this case, it's possible that we are doing the wrong transform in some other cases.

Another consideration: if we want to avoid having the 'select' getting squashed by some other transform, then using the x86-specific opcode should ensure that (that might be why AArch chose to go with target-specific opcodes?). Having the tests here (and as usual, we should have the baseline versions committed 1st) should signal if someone changes the behavior, but it might be worth leaving a TODO comment.

20110 ↗(On Diff #218527)

Does the caller always check this anyway? If so, could just make it an assert.

Rebase with tests pre-committed. I've also added i8/i16/i64 tests.

spatel added inline comments.Thu, Sep 5, 10:22 AM
20110 ↗(On Diff #218527)

This question wasn't answered. The only caller is DAGCombiner::visitSDIVLike()?

Change divisor check to an assert.

spatel accepted this revision.Thu, Sep 5, 10:50 AM


This revision is now accepted and ready to land.Thu, Sep 5, 10:50 AM
Closed by commit rL371104: [X86] Override BuildSDIVPow2 for X86. (authored by ctopper, committed by ). · Explain WhyThu, Sep 5, 11:14 AM
This revision was automatically updated to reflect the committed changes.

I've also blocked INT_MIN as the transform isn't valid for that.

The commit message says this, but I don't see any corresponding code?

I've also blocked INT_MIN as the transform isn't valid for that.

The commit message says this, but I don't see any corresponding code?

Oops that was a mistake in my use of Alive. Sanjay pointed out it was fine, but to forgot to update the description.