This is an archive of the discontinued LLVM Phabricator instance.

[X86] Override BuildSDIVPow2 for X86.
ClosedPublic

Authored by craig.topper on Sep 2 2019, 3:58 PM.

Details

Summary

As noted in PR43197, we can use test+add+cmov+sra to implement
signed division by a power of 2.

This is based off the similar version in AArch64, but I've
adjusted it to use target independent nodes where AArch64 uses
target specific CMP and CSEL nodes. I've also blocked INT_MIN
as the transform isn't valid for that.

I've limited this to i32 and i64 on 64-bit targets for now and only
when CMOV is supported. i8 and i16 need further investigation to be
sure they get promoted to i32 well.

I adjusted a few tests to enable cmov to demonstrate the new
codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode
without cmov to avoid perturbing the scenario that is being
set up there.

Diff Detail

Event Timeline

craig.topper created this revision.Sep 2 2019, 3:58 PM
llvm/test/CodeGen/X86/pr32588.ll
19 ↗(On Diff #218396)

Maybe not a big deal?

Middle-end can reduce this IR to
define void @fn1() local_unnamed_addr #0 {

%t0 = load i32, i32* @c, align 4
%tobool1 = icmp eq i32 %t0, 0
%xor = zext i1 %tobool1 to i32
store i32 %xor, i32* @d, align 4
ret void

}

and then llc..
fn1: # @fn1

xor     eax, eax
cmp     dword ptr [rip + c], 0
sete    al
mov     dword ptr [rip + d], eax
ret

We should add the minimal/motivating test from PR43197, so we cover that exact case, and we should have at least 1 test with a negative divisor constant rather than relying on srem transforms to get that case.

llvm/lib/Target/X86/X86ISelLowering.cpp
20093–20096

That's a complicated string of logic. It would be easier to read if it was split into 2-3 clauses:

  1. Check target capability
  2. Check type
  3. Check divisor value
20096

The min-signed check is not necessary?
https://rise4fun.com/Alive/Ki68

Name: sdiv X, {+/-} power-of-2-constant
Pre: isPowerOf2(C1) || isPowerOf2(-C1)
%r = sdiv i32 %x, C1
=>
%x_is_negative = icmp slt i32 %x, 0
%x_with_offset = add i32 %x, ((1 << countTrailingZeros(C1)) - 1)
%normx = select i1 %x_is_negative, i32 %x_with_offset, i32 %x
%a = ashr i32 %normx, countTrailingZeros(C1)
%neg = sub i32 0, %a
%divisor_is_negative = icmp slt i32 C1, 0 ; constant fold
%r = select i1 %divisor_is_negative, i32 %neg, %a
20105

This comment doesn't match what we're creating - the 'add' is the true operand of the select:
// (N0 < 0) ? (N0 + Pow2M1) : N0

Address review comments

With context

Do we already have test coverage for vector div-by-power-of-2?

xbolva00 added a comment.EditedSep 3 2019, 1:02 PM

Can you please add a test for

sdiv X, 2 ?

(gcc somehow avoids cmov in this case)

Do we already have test coverage for vector div-by-power-of-2?

There's some in vec_sdiv_to_shift.ll and vector-idiv-v2i32.ll and combine-sdiv.ll

Limit the transform to not apply to division by 2 or -2. Add more test cases

i8 and i16

gcc does this for char/long too.. Maybe you can try to lift restrictions and see how it looks like?
(if we miss test cases like sdiv i8 %x, 4 or sdiv i16 %x, 4 + srem variants - please add them - I was unable to find them in X86_combine-sdiv/ srem.ll / rem.ll)

Playing with chars, I think there is a way to improve -Oz codesize for this case too.

foo(char): # @foo(char)

mov     eax, edi
mov     cl, 4
cbw
idiv    cl
ret

gcc with -Os
foo(char):

mov     dl, 4
movsx   ax, dil
idiv    dl
ret

with short, -Oz
foo(long): # @foo(long)

mov     rax, rdi
push    4
pop     rcx
cqo
idiv    rcx
ret
spatel added a comment.Sep 4 2019, 7:03 AM

Thinking about this sequence in relation to some other folds: if the add was pulled through the select operands, we'd have a select-of-constants where the false op is a zero, and that would get reduced to a 'shift+and' and defeat the goal of this patch. So if cmov is the better option for this case, it's possible that we are doing the wrong transform in some other cases.

Another consideration: if we want to avoid having the 'select' getting squashed by some other transform, then using the x86-specific opcode should ensure that (that might be why AArch chose to go with target-specific opcodes?). Having the tests here (and as usual, we should have the baseline versions committed 1st) should signal if someone changes the behavior, but it might be worth leaving a TODO comment.

llvm/lib/Target/X86/X86ISelLowering.cpp
20103

Does the caller always check this anyway? If so, could just make it an assert.

Rebase with tests pre-committed. I've also added i8/i16/i64 tests.

spatel added inline comments.Sep 5 2019, 10:22 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
20103

This question wasn't answered. The only caller is DAGCombiner::visitSDIVLike()?

Change divisor check to an assert.

spatel accepted this revision.Sep 5 2019, 10:50 AM

LGTM

This revision is now accepted and ready to land.Sep 5 2019, 10:50 AM
This revision was automatically updated to reflect the committed changes.

I've also blocked INT_MIN as the transform isn't valid for that.

The commit message says this, but I don't see any corresponding code?

I've also blocked INT_MIN as the transform isn't valid for that.

The commit message says this, but I don't see any corresponding code?

Oops that was a mistake in my use of Alive. Sanjay pointed out it was fine, but to forgot to update the description.