
Details

Reviewers

sjarus
pengfei
craig.topper
LuoYuanke
RKSimon
lebedev.ri

Commits

rG842d0bf93176: [x86] Improve select lowering for smin(x, 0) & smax(x, 0)

Summary

smin(x, 0):

(select (x < 0), x, 0) -> ((x >> (size_in_bits(x)-1))) & x

smax(x, 0):

(select (x > 0), x, 0) -> (~(x >> (size_in_bits(x)-1))) & x
Because this comparison tests for a positive value, we have to invert the sign
bit mask, so only do the transform if the target has a bitwise 'and not'
instruction (which makes the invert free).
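
As a rough illustration of the two identities above, the following C++ sketch computes both forms for 32-bit values and compares them with the plain select. The helper names (`sign_mask`, `smin0`, `smax0`, `check_identities`) are made up for this example and are not part of the patch; the sketch also assumes that `>>` on a negative signed value is an arithmetic shift (guaranteed since C++20, implementation-defined before that).

```cpp
#include <cassert>
#include <cstdint>

// All ones when x is negative, all zeros otherwise.
// Assumption: >> on a negative signed value is an arithmetic shift.
static int32_t sign_mask(int32_t x) { return x >> 31; }

// smin(x, 0): the mask keeps x only when x is negative.
static int32_t smin0(int32_t x) { return sign_mask(x) & x; }

// smax(x, 0): the inverted mask keeps x only when x is non-negative.
static int32_t smax0(int32_t x) { return ~sign_mask(x) & x; }

// Compare both forms against the original select on a few edge cases.
static void check_identities() {
  const int32_t samples[] = {INT32_MIN, -7, -1, 0, 1, 42, INT32_MAX};
  for (int32_t x : samples) {
    assert(smin0(x) == (x < 0 ? x : 0));
    assert(smax0(x) == (x > 0 ? x : 0));
  }
}
```

The only difference between the two forms is the `~` on the mask, which is why the smax case is gated on an 'and not' instruction being available.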

The transform is performed only when the CMP has a single user, to avoid
increasing the total instruction count.

https://alive2.llvm.org/ce/z/euUnNm
https://alive2.llvm.org/ce/z/37339J

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wxiao3 created this revision. · Apr 5 2022, 1:06 AM

Herald added a reviewer: sjarus. · Apr 5 2022, 1:06 AM

Herald added a project: Restricted Project.

Herald added subscribers: armkevincheng, eric-k256, pengfei, hiraditya.

wxiao3 requested review of this revision. · Apr 5 2022, 1:06 AM

Herald added a project: Restricted Project. · Apr 5 2022, 1:06 AM

Herald added a subscriber: llvm-commits.

wxiao3 added reviewers: pengfei, craig.topper, LuoYuanke. · Apr 5 2022, 1:08 AM

Herald added a subscriber: StephenFan. · Apr 5 2022, 1:08 AM

Update the comments.

wxiao3 edited the summary of this revision. · Apr 5 2022, 1:12 AM

RKSimon added a subscriber: RKSimon. · Apr 5 2022, 1:44 AM

RKSimon added a reviewer: RKSimon. · Apr 5 2022, 1:49 AM

RKSimon added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp

24703

What about the more general cases (select (x > 0), y, 0) & (select (x < 0), y, 0)?

Although I think that requires a freeze:

define i8 @src(i8 %x, i8 %y) {
%0:
  %c = icmp sge i8 %x, 0
  %r = select i1 %c, i8 %y, i8 0
  ret i8 %r
}
=>
define i8 @tgt(i8 %x, i8 %y) {
%0:
  %c = ashr i8 %x, 7
  %m = xor i8 %c, 255
  %f = freeze i8 %y
  %r = and i8 %m, %f
  ret i8 %r
}
Transformation seems to be correct!
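
To make the generalized form concrete, here is a small C++ sketch that exhaustively compares the source and target shapes of the IR above over every pair of 8-bit inputs. The function names are hypothetical, and the check covers only ordinary values: C++ integers have no notion of poison or undef, so it says nothing about why the `freeze` is needed. It also assumes `>>` on a negative value is an arithmetic shift.

```cpp
#include <cassert>
#include <cstdint>

// Source form: select (x >= 0), y, 0 (the IR above uses the sge predicate).
static int8_t select_src(int8_t x, int8_t y) { return x >= 0 ? y : 0; }

// Target form: ~(x >> 7) & y. x is promoted to int before the shift, so the
// result of the shift is 0 for non-negative x and -1 for negative x.
static int8_t select_tgt(int8_t x, int8_t y) {
  int mask = ~(x >> 7);
  return static_cast<int8_t>(mask & y);
}

// Count disagreements over all 256 * 256 input pairs.
static int count_mismatches() {
  int bad = 0;
  for (int x = -128; x <= 127; ++x)
    for (int y = -128; y <= 127; ++y)
      if (select_src(static_cast<int8_t>(x), static_cast<int8_t>(y)) !=
          select_tgt(static_cast<int8_t>(x), static_cast<int8_t>(y)))
        ++bad;
  return bad;
}

// Convenience wrapper that fails loudly if the two forms ever differ.
static void check_general_select() { assert(count_mismatches() == 0); }
```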

Harbormaster completed remote builds in B157905: Diff 420403. · Apr 5 2022, 2:02 AM

wxiao3 added inline comments. · Apr 5 2022, 2:28 AM

llvm/lib/Target/X86/X86ISelLowering.cpp

24703

You're right that we can make it more general.
I restricted it on purpose to align with DAGCombiner,
so that for the following test:

$ cat t1.ll
define i32 @test1_ir(i32 %a, i32 %b) nounwind {
  %tmp1 = icmp sgt i32 %a, %b
  %r = select i1 %tmp1, i32 %a, i32 %b
  ret i32 %r
}

define i32 @test1_intrinsic(i32 %a, i32 %b) nounwind {
  %r = call i32 @llvm.smax.i32(i32 %a, i32 %b)
  ret i32 %r
}

define i32 @test2_ir(i32 %a) nounwind {
  %tmp1 = icmp sgt i32 %a, 0
  %r = select i1 %tmp1, i32 %a, i32 0
  ret i32 %r
}

define i32 @test2_intrinsic(i32 %a) nounwind {
  %r = call i32 @llvm.smax.i32(i32 %a, i32 0)
  ret i32 %r
}

declare i32 @llvm.smax.i32(i32, i32)

we generate the same assembly for the IR version and the intrinsic version.

I'd like to prepare another patch that makes it more general, as you suggest, by relaxing the restriction both here and in DAGCombiner at the same time.

pengfei added inline comments. · Apr 5 2022, 9:24 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
24785	clang-format
llvm/test/CodeGen/X86/fpclamptosat_vec.ll
2986–3006 (On Diff #420403)	The right side has more instructions; is this still better for performance?

Address Phoebe's comments.

wxiao3 retitled this revision from [x86] improve select lowering for smin(x, 0) & smax(x, 0) to [x86] Improve select lowering for smin(x, 0) & smax(x, 0). · Apr 6 2022, 7:34 AM

wxiao3 marked an inline comment as done. · Apr 6 2022, 7:37 AM

wxiao3 added inline comments.

llvm/test/CodeGen/X86/fpclamptosat_vec.ll
2986–3006 (On Diff #420403)	Good catch! We can do the transformation only when the CMP has a single user; otherwise the total instruction count increases, as it does here. I have updated the patch to add that restriction.

wxiao3 edited the summary of this revision. · Apr 6 2022, 7:38 AM

LGTM. Thanks!

This revision is now accepted and ready to land. · Apr 6 2022, 7:48 AM

lebedev.ri edited the summary of this revision. · Apr 6 2022, 7:57 AM

RKSimon added inline comments. · Apr 6 2022, 8:09 AM

llvm/test/CodeGen/X86/select-smin-smax.ll
2–3	It'd be useful to test without BMI as well.

Harbormaster completed remote builds in B158215: Diff 420843. · Apr 6 2022, 8:13 AM

LG. I'm not really sure which CPUs this improves things for,
but I don't think it would regress anything,
and avoiding EFLAGS is a win.

Address Simon's comments.

wxiao3 marked an inline comment as done. · Apr 6 2022, 7:22 PM

pengfei added inline comments. · Apr 6 2022, 7:25 PM

llvm/test/CodeGen/X86/select-smin-smax.ll
2–3	Use `CHECK,CHECK-NOBMI` and `CHECK,CHECK-BMI` for common ones.

Address Phoebe's comments.

wxiao3 marked an inline comment as done. · Apr 6 2022, 8:00 PM

Harbormaster completed remote builds in B158384: Diff 421071. · Apr 6 2022, 9:15 PM

Closed by commit rG842d0bf93176: [x86] Improve select lowering for smin(x, 0) & smax(x, 0) (authored by wxiao3). · Apr 7 2022, 12:53 AM

This revision was automatically updated to reflect the committed changes.

wxiao3 added a commit: rG842d0bf93176: [x86] Improve select lowering for smin(x, 0) & smax(x, 0).

Diff 421116

llvm/lib/Target/X86/X86ISelLowering.cpp


(24,693 lines hidden; context: SDValue X86TargetLowering::LowerSELECT(SDValue Op, SelectionDAG &DAG) const {)
   }

   // (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y
   // (select (x == 0), y, -1) -> ~(sign_bit (x - 1)) | y
   // (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y
   // (select (x != 0), -1, y) -> ~(sign_bit (x - 1)) | y
   // (select (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) ^ y
   // (select (and (x , 0x1) == 0), y, (z | y) ) -> (-(and (x , 0x1)) & z ) | y
+  // (select (x > 0), x, 0) -> (~(x >> (size_in_bits(x)-1))) & x
+  // (select (x < 0), x, 0) -> ((x >> (size_in_bits(x)-1))) & x
   if (Cond.getOpcode() == X86ISD::SETCC &&
       Cond.getOperand(1).getOpcode() == X86ISD::CMP &&
       isNullConstant(Cond.getOperand(1).getOperand(1))) {
     SDValue Cmp = Cond.getOperand(1);
     SDValue CmpOp0 = Cmp.getOperand(0);
     unsigned CondCode = Cond.getConstantOperandVal(0);

     // Special handling for __builtin_ffs(X) - 1 pattern which looks like
(64 lines hidden; context: if (Subtarget.canUseCMOV() && (VT == MVT::i32 || VT == MVT::i64) &&)
                             DAG.getConstant(1, DL, VT));
         else
           Neg = CmpOp0;
         SDValue Mask = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT),
                                    Neg); // -(and (x, 0x1))
         SDValue And = DAG.getNode(ISD::AND, DL, VT, Mask, Src1); // Mask & z
         return DAG.getNode(Op2.getOpcode(), DL, VT, And, Src2); // And Op y
       }
+    } else if ((VT == MVT::i32 || VT == MVT::i64) && isNullConstant(Op2) &&
+               Cmp.getNode()->hasOneUse() && (CmpOp0 == Op1) &&
+               ((CondCode == X86::COND_S) || // smin(x, 0)
+                (CondCode == X86::COND_G && hasAndNot(Op1)))) { // smax(x, 0)
+      // (select (x < 0), x, 0) -> ((x >> (size_in_bits(x)-1))) & x
+      //
+      // If the comparison is testing for a positive value, we have to invert
+      // the sign bit mask, so only do that transform if the target has a
+      // bitwise 'and not' instruction (the invert is free).
+      // (select (x > 0), x, 0) -> (~(x >> (size_in_bits(x)-1))) & x
+      unsigned ShCt = VT.getSizeInBits() - 1;
+      SDValue ShiftAmt = DAG.getConstant(ShCt, DL, VT);
+      SDValue Shift = DAG.getNode(ISD::SRA, DL, VT, Op1, ShiftAmt);
+      if (CondCode == X86::COND_G)
+        Shift = DAG.getNOT(DL, Shift, VT);
+      return DAG.getNode(ISD::AND, DL, VT, Shift, Op1);
     }
   }

   // Look past (and (setcc_carry (cmp ...)), 1).
   if (Cond.getOpcode() == ISD::AND &&
       Cond.getOperand(0).getOpcode() == X86ISD::SETCC_CARRY &&
       isOneConstant(Cond.getOperand(1)))
     Cond = Cond.getOperand(0);
(30,985 lines hidden)

llvm/test/CodeGen/X86/select-smin-smax.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi < %s | FileCheck %s
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=-bmi < %s | FileCheck %s --check-prefixes=CHECK,CHECK-NOBMI
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+bmi < %s | FileCheck %s --check-prefixes=CHECK,CHECK-BMI

 declare i32 @llvm.smax.i32(i32, i32)
 declare i32 @llvm.smin.i32(i32, i32)
 declare i64 @llvm.smax.i64(i64, i64)
 declare i64 @llvm.smin.i64(i64, i64)

 define i32 @test_i32_smax(i32 %a) nounwind {
-; CHECK-LABEL: test_i32_smax:
-; CHECK: # %bb.0:
-; CHECK-NEXT: xorl %eax, %eax
-; CHECK-NEXT: testl %edi, %edi
-; CHECK-NEXT: cmovgl %edi, %eax
-; CHECK-NEXT: retq
+; CHECK-NOBMI-LABEL: test_i32_smax:
+; CHECK-NOBMI: # %bb.0:
+; CHECK-NOBMI-NEXT: xorl %eax, %eax
+; CHECK-NOBMI-NEXT: testl %edi, %edi
+; CHECK-NOBMI-NEXT: cmovgl %edi, %eax
+; CHECK-NOBMI-NEXT: retq
+;
+; CHECK-BMI-LABEL: test_i32_smax:
+; CHECK-BMI: # %bb.0:
+; CHECK-BMI-NEXT: movl %edi, %eax
+; CHECK-BMI-NEXT: sarl $31, %eax
+; CHECK-BMI-NEXT: andnl %edi, %eax, %eax
+; CHECK-BMI-NEXT: retq
   %r = call i32 @llvm.smax.i32(i32 %a, i32 0)
   ret i32 %r
 }

 define i32 @test_i32_smin(i32 %a) nounwind {
 ; CHECK-LABEL: test_i32_smin:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: xorl %eax, %eax
-; CHECK-NEXT: testl %edi, %edi
-; CHECK-NEXT: cmovsl %edi, %eax
+; CHECK-NEXT: movl %edi, %eax
+; CHECK-NEXT: sarl $31, %eax
+; CHECK-NEXT: andl %edi, %eax
 ; CHECK-NEXT: retq
   %r = call i32 @llvm.smin.i32(i32 %a, i32 0)
   ret i32 %r
 }

 define i64 @test_i64_smax(i64 %a) nounwind {
-; CHECK-LABEL: test_i64_smax:
-; CHECK: # %bb.0:
-; CHECK-NEXT: xorl %eax, %eax
-; CHECK-NEXT: testq %rdi, %rdi
-; CHECK-NEXT: cmovgq %rdi, %rax
-; CHECK-NEXT: retq
+; CHECK-NOBMI-LABEL: test_i64_smax:
+; CHECK-NOBMI: # %bb.0:
+; CHECK-NOBMI-NEXT: xorl %eax, %eax
+; CHECK-NOBMI-NEXT: testq %rdi, %rdi
+; CHECK-NOBMI-NEXT: cmovgq %rdi, %rax
+; CHECK-NOBMI-NEXT: retq
+;
+; CHECK-BMI-LABEL: test_i64_smax:
+; CHECK-BMI: # %bb.0:
+; CHECK-BMI-NEXT: movq %rdi, %rax
+; CHECK-BMI-NEXT: sarq $63, %rax
+; CHECK-BMI-NEXT: andnq %rdi, %rax, %rax
+; CHECK-BMI-NEXT: retq
   %r = call i64 @llvm.smax.i64(i64 %a, i64 0)
   ret i64 %r
 }

 define i64 @test_i64_smin(i64 %a) nounwind {
 ; CHECK-LABEL: test_i64_smin:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: xorl %eax, %eax
-; CHECK-NEXT: testq %rdi, %rdi
-; CHECK-NEXT: cmovsq %rdi, %rax
+; CHECK-NEXT: movq %rdi, %rax
+; CHECK-NEXT: sarq $63, %rax
+; CHECK-NEXT: andq %rdi, %rax
 ; CHECK-NEXT: retq
   %r = call i64 @llvm.smin.i64(i64 %a, i64 0)
   ret i64 %r
 }
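
For readers less used to AT&T operand order, the 64-bit sequences in the checks above can be modelled in C++ roughly as follows. This is only a sketch: `andn`, `sar63`, `smin0_seq`, `smax0_seq`, and `check_i64_sequences` are names invented here, `andn` mirrors the BMI instruction's definition (dest = ~src1 & src2), and the shift is assumed to be arithmetic for negative values.

```cpp
#include <cassert>
#include <cstdint>

// Model of BMI ANDN: dest = ~src1 & src2. In `andnq %rdi, %rax, %rax`
// (AT&T order) src1 is %rax (the mask) and src2 is %rdi (the input).
static uint64_t andn(uint64_t src1, uint64_t src2) { return ~src1 & src2; }

// Model of `sarq $63`: arithmetic shift that broadcasts the sign bit.
// Assumption: >> on a negative signed value is an arithmetic shift.
static uint64_t sar63(int64_t x) { return static_cast<uint64_t>(x >> 63); }

// movq %rdi, %rax ; sarq $63, %rax ; andq %rdi, %rax
static int64_t smin0_seq(int64_t x) {
  return static_cast<int64_t>(sar63(x) & static_cast<uint64_t>(x));
}

// movq %rdi, %rax ; sarq $63, %rax ; andnq %rdi, %rax, %rax
static int64_t smax0_seq(int64_t x) {
  return static_cast<int64_t>(andn(sar63(x), static_cast<uint64_t>(x)));
}

// Compare the modelled sequences against the select they replace.
static void check_i64_sequences() {
  const int64_t samples[] = {INT64_MIN, -3, -1, 0, 1, 9, INT64_MAX};
  for (int64_t x : samples) {
    assert(smin0_seq(x) == (x < 0 ? x : 0));
    assert(smax0_seq(x) == (x > 0 ? x : 0));
  }
}
```

Neither modelled sequence reads or writes a condition flag between the shift and the final AND, in contrast to the test/cmov pair it replaces.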

This is an archive of the discontinued LLVM Phabricator instance.

[x86] Improve select lowering for smin(x, 0) & smax(x, 0)
ClosedPublic
