This is an archive of the discontinued LLVM Phabricator instance.

[x86] use SETCC_CARRY instead of SBB node for select lowering
ClosedPublic

Authored by spatel on Jan 7 2022, 5:28 AM.

Details

Summary

This is a suggested follow-up to D116765 (and diffs are on top of that patch). This removes a clear of the register operand, so it is better for code size, but it does potentially create a false register dependency on surrounding code.

The asm results match what I was expecting, but I'm not exactly sure how this works. Given that the node is called SETCC_CARRY and documented as:

// Same as SETCC except it's materialized with a sbb and the value is all
// one's or all zero's.
SETCC_CARRY, // R = carry_bit ? ~0 : 0

...why does it require a parameter for X86::COND_B? I changed that parameter in an experimental patch, and it didn't alter the asm.

Diff Detail

Event Timeline

spatel created this revision.Jan 7 2022, 5:28 AM
spatel requested review of this revision.Jan 7 2022, 5:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2022, 5:28 AM

why does it require a parameter for X86::COND_B? I changed that parameter in an experimental patch, and it didn't alter the asm.

I believe there is some code in lowering that uses it. I think it made it easier to handle it along with X86ISD::SETCC if it has a condition code operand in the same place.

Are we concerned about the false dependency on Intel CPUs? I tried checking uops.info to see if I could tell what CPUs have it, but I couldn’t tell.

spatel added a comment.Jan 7 2022, 9:17 AM

Are we concerned about the false dependency on Intel CPUs? I tried checking uops.info to see if I could tell what CPUs have it, but I couldn’t tell.

IMO, this is the right transform at this stage of compiling, but I don't have a strong opinion either way. The bug report ( https://github.com/llvm/llvm-project/issues/53006 ) doesn't seem to expect a zero op.

This revision is now accepted and ready to land.Jan 7 2022, 9:26 AM
This revision was landed with ongoing or failed builds.Jan 9 2022, 4:00 AM
This revision was automatically updated to reflect the committed changes.

We noticed a 15% performance regression for llvm_test_suite/MultiSource/Benchmarks/MallocBench/gs with this patch.
Looking at the assembly, we noticed a relatively long dependence chain including the following sequence:

mov   (%rbx),%rax 
sbb   %rax,%rax
or    %rdx,%rax

According to Agner Fog's manual (https://www.agner.org/optimize/microarchitecture.pdf), none of the Intel architectures recognize sbb with same operands as dependence breaking.
So, it seems that the false dependence issue already discussed in the previous comments might not be worth the code reduction gained by removing the clearing of the register, at least for Intel architectures.

RKSimon added a comment.EditedFeb 1 2022, 3:06 AM

It'd be good to at least keep this on AMD targets (bulldozer/zen/bobcat/jaguar) - these recognise the SBB(x,x) dependency-breaking behaviour. It could be done either as a tuning flag, or via the scheduler model's isDependencyBreaking() check if there's a way to access it.

spatel added a comment.Feb 1 2022, 7:52 AM

We noticed a 15% performance regression for llvm_test_suite/MultiSource/Benchmarks/MallocBench/gs with this patch.
Looking at the assembly, noticed a relatively long dependence chain including the following sequence:

mov   (%rbx),%rax 
sbb   %rax,%rax
or    %rdx,%rax

Do you have the IR for the function where that appears?
I haven't looked at the mechanics of BreakFalseDeps and X86InstrInfo::getUndefRegClearance() in a long time, and I'm not sure if they would handle a pattern like that (the load of %rax may already be providing the expected clearance?).
If we're going to need a tuning flag similar to TuningPOPCNTFalseDep, then we could just use that as a predicate hack for the code that was changed in this patch (so don't take chances and always create a real SBB with zero operand if the flag is set).

We noticed a 15% performance regression for llvm_test_suite/MultiSource/Benchmarks/MallocBench/gs with this patch.
Looking at the assembly, noticed a relatively long dependence chain including the following sequence:

mov   (%rbx),%rax 
sbb   %rax,%rax
or    %rdx,%rax

Do you have the IR for the function where that appears?

Here is a source code example with clang-trunk generated IR and assembly (https://godbolt.org/z/v66TM8W7e) that resembles the affected code and reproduces the aforementioned sequence of instructions.
Note the dependence chain, which is not broken on Intel targets, through the following instructions: callq foo1(long*, long*, y_s*) -> movq 8(%rbx), %rax -> sbbq %rax, %rax -> orq %rdx, %rax -> callq foo2(int*, int, int, int, int, int, y_s*, long, long)

If we're going to need a tuning flag similar to TuningPOPCNTFalseDep, then we could just use that as a predicate hack for the code that was changed in this patch (so don't take chances and always create a real SBB with zero operand if the flag is set).

This sounds okay to me.

We noticed a 15% performance regression for llvm_test_suite/MultiSource/Benchmarks/MallocBench/gs with this patch.
Looking at the assembly, noticed a relatively long dependence chain including the following sequence:

mov   (%rbx),%rax 
sbb   %rax,%rax
or    %rdx,%rax

Do you have the IR for the function where that appears?

Here is a source code example with clang-trunk generated IR and assembly (https://godbolt.org/z/v66TM8W7e) that resembles the affected code and reproduces the aforementioned sequence of instructions.
Notice the non-broken (for Intel targets) dependence chain including the following instructions: callq foo1(long*, long*, y_s*) -> movq 8(%rbx), %rax -> sbbq %rax, %rax -> orq %rdx, %rax -> callq foo2(int*, int, int, int, int, int, y_s*, long, long)

If we're going to need a tuning flag similar to TuningPOPCNTFalseDep, then we could just use that as a predicate hack for the code that was changed in this patch (so don't take chances and always create a real SBB with zero operand if the flag is set).

This sounds okay to me.

Rather than modifying the code touched by this patch with a new tuning flag, should we do it in the X86ISelDAGToDAG.cpp where SETCC_CARRY is selected?

spatel added a comment.Feb 2 2022, 8:00 AM

We noticed a 15% performance regression for llvm_test_suite/MultiSource/Benchmarks/MallocBench/gs with this patch.
Looking at the assembly, noticed a relatively long dependence chain including the following sequence:

mov   (%rbx),%rax 
sbb   %rax,%rax
or    %rdx,%rax

Do you have the IR for the function where that appears?

Here is a source code example with clang-trunk generated IR and assembly (https://godbolt.org/z/v66TM8W7e) that resembles the affected code and reproduces the aforementioned sequence of instructions.
Notice the non-broken (for Intel targets) dependence chain including the following instructions: callq foo1(long*, long*, y_s*) -> movq 8(%rbx), %rax -> sbbq %rax, %rax -> orq %rdx, %rax -> callq foo2(int*, int, int, int, int, int, y_s*, long, long)

If we're going to need a tuning flag similar to TuningPOPCNTFalseDep, then we could just use that as a predicate hack for the code that was changed in this patch (so don't take chances and always create a real SBB with zero operand if the flag is set).

This sounds okay to me.

Rather than modifying the code touched by this patch with a new tuning flag, should we do it in the X86ISelDAGToDAG.cpp where SETCC_CARRY is selected?

Yes, good point. I suspect it would be hard to come up with a test where there's a difference, but the later we make the conversion, the better.
I did look at BreakFalseDeps a bit, and I don't think it can work for this case because we convert the X86ISD::SETCC_CARRY into a X86::SETB_C32r pseudo op, and that has no general register operands for BreakFalseDeps to detect.

We noticed a 15% performance regression for llvm_test_suite/MultiSource/Benchmarks/MallocBench/gs with this patch.
Looking at the assembly, noticed a relatively long dependence chain including the following sequence:

mov   (%rbx),%rax 
sbb   %rax,%rax
or    %rdx,%rax

Do you have the IR for the function where that appears?

Here is a source code example with clang-trunk generated IR and assembly (https://godbolt.org/z/v66TM8W7e) that resembles the affected code and reproduces the aforementioned sequence of instructions.
Notice the non-broken (for Intel targets) dependence chain including the following instructions: callq foo1(long*, long*, y_s*) -> movq 8(%rbx), %rax -> sbbq %rax, %rax -> orq %rdx, %rax -> callq foo2(int*, int, int, int, int, int, y_s*, long, long)

If we're going to need a tuning flag similar to TuningPOPCNTFalseDep, then we could just use that as a predicate hack for the code that was changed in this patch (so don't take chances and always create a real SBB with zero operand if the flag is set).

This sounds okay to me.

Rather than modifying the code touched by this patch with a new tuning flag, should we do it in the X86ISelDAGToDAG.cpp where SETCC_CARRY is selected?

Yes, good point. I suspect it would be hard to come up with a test where there's a difference, but the later we make the conversion, the better.
I did look at BreakFalseDeps a bit, and I don't think it can work for this case because we convert the X86ISD::SETCC_CARRY into a X86::SETB_C32r pseudo op, and that has no general register operands for BreakFalseDeps to detect.

Won't SETB_C32r be turned into SBB32rr by expandPostRAPseudo before we get to BreakFalseDeps? But I don't think BreakFalseDeps is prepared to deal with a tied dest/src that's also used by another source.

spatel added a comment.Feb 2 2022, 8:37 AM

Won't SETB_C32r be turned into SBB32rr by expandPostRAPseudo before we get to BreakFalseDeps? But I don't think BreakFalseDeps is prepared to deal with a tied dest/src that's also used by another source.

Ah, yes - it does become SBB before we run BreakFalseDeps. I had my breakpoints mixed up and missed that. But a quick hack shows the limitation you're suggesting; we insert the xor to clear the reg between the cmp and the sbb user, so that can't work.

skan added a subscriber: skan.Mar 9 2022, 6:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 9 2022, 6:03 AM