This is an archive of the discontinued LLVM Phabricator instance.

[x86] Improve select lowering for smin(x, 0) & smax(x, 0)
ClosedPublic

Authored by wxiao3 on Apr 5 2022, 1:06 AM.

Details

Summary

smin(x, 0):

(select (x < 0), x, 0) -> ((x >> (size_in_bits(x)-1))) & x

smax(x, 0):

(select (x > 0), x, 0) -> (~(x >> (size_in_bits(x)-1))) & x
The comparison is testing for a positive value, we have to invert the sign
bit mask, so only do that transform if the target has a bitwise 'and not'
instruction (the invert is free).

The transform is performed only when CMP has a single user to avoid
increasing total instruction number.

https://alive2.llvm.org/ce/z/euUnNm
https://alive2.llvm.org/ce/z/37339J

Diff Detail

Event Timeline

wxiao3 created this revision.Apr 5 2022, 1:06 AM
wxiao3 requested review of this revision.Apr 5 2022, 1:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2022, 1:06 AM
wxiao3 updated this revision to Diff 420403.Apr 5 2022, 1:11 AM

update the comments.

wxiao3 edited the summary of this revision. (Show Details)Apr 5 2022, 1:12 AM
RKSimon added a subscriber: RKSimon.Apr 5 2022, 1:44 AM
RKSimon added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
24703

What about the more general cases (select (x > 0), y, 0) & (select (x < 0), y, 0)?

Although I think that requires a freeze:

define i8 @src(i8 %x, i8 %y) {
%0:
  %c = icmp sge i8 %x, 0
  %r = select i1 %c, i8 %y, i8 0
  ret i8 %r
}
=>
define i8 @tgt(i8 %x, i8 %y) {
%0:
  %c = ashr i8 %x, 7
  %m = xor i8 %c, 255
  %f = freeze i8 %y
  %r = and i8 %m, %f
  ret i8 %r
}
Transformation seems to be correct!
wxiao3 added inline comments.Apr 5 2022, 2:28 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
24703

you're right that we can make it more general.
I restrict it on purpose to align with DAGCombiner

so that for following test:

$ cat t1.ll
define i32 @test1_ir(i32 %a, i32 %b) nounwind {
  %tmp1 = icmp sgt i32 %a, %b
  %r = select i1 %tmp1, i32 %a, i32 %b
  ret i32 %r
}

define i32 @test1_intrinsic(i32 %a, i32 %b) nounwind {
  %r = call i32 @llvm.smax.i32(i32 %a, i32 %b)
  ret i32 %r
}

define i32 @test2_ir(i32 %a) nounwind {
  %tmp1 = icmp sgt i32 %a, 0
  %r = select i1 %tmp1, i32 %a, i32 0
  ret i32 %r
}

define i32 @test2_intrinsic(i32 %a) nounwind {
  %r = call i32 @llvm.smax.i32(i32 %a, i32 0)
  ret i32 %r
}

declare i32 @llvm.smax.i32(i32, i32)

we can generate the same assembly between ir version and intrinsic version.

I'd like to prepare another patch to make it more general according to your suggestions by relaxing the restriction in both here and DAGCombiner at the same time.

pengfei added inline comments.Apr 5 2022, 9:24 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
24785

clang-format

llvm/test/CodeGen/X86/fpclamptosat_vec.ll
2986–3006 ↗(On Diff #420403)

I see the right side has more instructions, is this still better in performance?

wxiao3 updated this revision to Diff 420843.Apr 6 2022, 7:33 AM

Address Phoebe's comments.

wxiao3 retitled this revision from [x86] improve select lowering for smin(x, 0) & smax(x, 0) to [x86] Improve select lowering for smin(x, 0) & smax(x, 0).Apr 6 2022, 7:34 AM
wxiao3 marked an inline comment as done.Apr 6 2022, 7:37 AM
wxiao3 added inline comments.
llvm/test/CodeGen/X86/fpclamptosat_vec.ll
2986–3006 ↗(On Diff #420403)

Good catch!
We can do the transformation only when CMP has a single user. Otherwise, the total instruction number will be increased like here. I have updated patch to add the restriction.

wxiao3 edited the summary of this revision. (Show Details)Apr 6 2022, 7:38 AM
pengfei accepted this revision.Apr 6 2022, 7:48 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Apr 6 2022, 7:48 AM
lebedev.ri edited the summary of this revision. (Show Details)Apr 6 2022, 7:57 AM
RKSimon added inline comments.Apr 6 2022, 8:09 AM
llvm/test/CodeGen/X86/select-smin-smax.ll
2–3

It's be useful to test without bmi as well

lebedev.ri accepted this revision.Apr 6 2022, 10:43 AM
lebedev.ri added a subscriber: lebedev.ri.

LG, i'm not really sure for which CPU's this improves things,
but i don't think it would regress things,
and avoiding EFLAGS is a win.

wxiao3 updated this revision to Diff 421068.Apr 6 2022, 7:21 PM

Address Simon's comments.

wxiao3 marked an inline comment as done.Apr 6 2022, 7:22 PM
pengfei added inline comments.Apr 6 2022, 7:25 PM
llvm/test/CodeGen/X86/select-smin-smax.ll
2–3

Use CHECK,CHECK-NOBMI and CHECK,CHECK-BMI for common ones.

wxiao3 updated this revision to Diff 421071.Apr 6 2022, 7:59 PM

Address Phoebe's comments.

wxiao3 marked an inline comment as done.Apr 6 2022, 8:00 PM
This revision was automatically updated to reflect the committed changes.