This is an archive of the discontinued LLVM Phabricator instance.

Differential D48223

Allow binop C1, (select cc, CF, CT) -> select folding
ClosedPublic

Authored by rampitec on Jun 15 2018, 9:10 AM.

Details

Reviewers

msearles
spatel
lebedev.ri
craig.topper
RKSimon
arsenm

Commits

rG20279dc0258b: Allow binop C1, (select cc, CF, CT) -> select folding
rL335167: Allow binop C1, (select cc, CF, CT) -> select folding

Summary

Previously this fold was performed only when the select was the first operand.
However, for non-commutative operations the constant may come before the
select.

Event Timeline

rampitec created this revision. Jun 15 2018, 9:10 AM

Herald added subscribers: tpr, nhaehnle. Jun 15 2018, 9:10 AM

LGTM

This revision is now accepted and ready to land. Jun 15 2018, 12:37 PM

Rebase.

Would be good to have additional small test for some other target (e.g. x86), too.

In D48223#1134457, @lebedev.ri wrote:

Would be good to have additional small test for some other target (e.g. x86), too.

Sure. It turns out x86 needs it even more probably. I have created the test and it turns out both andl and orl were present before the patch.

Added x86 test.
Updated amdgcn test.

rampitec added a reviewer: lebedev.ri. Jun 16 2018, 1:00 AM

lebedev.ri added reviewers: craig.topper, RKSimon. Jun 16 2018, 1:17 AM

lebedev.ri added inline comments.

test/CodeGen/X86/dagcombine-select.ll
1	Most tests (and practically all new x86 tests) use `utils/update_llc_test_checks.py` script to auto-generate these check lines.
3–17	Hm, is there some omitted instruction, or is this actually better than what we currently normally do? https://godbolt.org/g/7ULPfH

Updated x86 asm to contain full ISA.

rampitec marked an inline comment as done. Jun 16 2018, 1:37 AM

rampitec added inline comments.

test/CodeGen/X86/dagcombine-select.ll
3–17	Yes. I have updated the test to contain the full ISA. First xor to zero out eax was omitted. I am not sure what compiler explorer does, but that is what trunk llc has produced:
	xorl %eax, %eax
	cmpl $11, %edi
	setl %al
	decl %eax
	andl %esi, %eax
	retq
I assume difference comes from running or not running opt.

rampitec added inline comments. Jun 16 2018, 1:42 AM

test/CodeGen/X86/dagcombine-select.ll
3–17	E.g. compare w/o opt: https://godbolt.org/g/NMZ9he

lebedev.ri added inline comments. Jun 16 2018, 2:00 AM

test/CodeGen/X86/dagcombine-select.ll
1	(not actually done, that does not look like the utility's output, and the first line does not say the script was used)
3–17	Ok, so the only difference is that the strictness of the comparison is inverted (`cmovg` vs `cmovge` and the other way around).

rampitec marked an inline comment as done. Jun 16 2018, 2:24 AM

rampitec added inline comments.

test/CodeGen/X86/dagcombine-select.ll
1	Oops, sorry. Clicked wrong checkbox. Now updated.
3–17	Do you see "and eax, esi" instruction on the second from the right tab using my link? This one was eliminated. In general if you run opt that is not an issue. If you run llc only (as in the test) you can see it. Normally that shall not happen, as I wrote in the description InstCombine can get rid of it. However, amdgcn can produce it during lowering and InstCombine does not play.

Updated x86 test with utils/update_llc_test_checks.py

Sounds reasonable.

test/CodeGen/X86/dagcombine-select.ll
3–17	Of course. I was only talking about comparing the `llc(D48223) test/CodeGen/X86/dagcombine-select.ll` with the `opt(trunk) test/CodeGen/X86/dagcombine-select.ll | llc(trunk)` output.

What about
(X / Y) != 0 -> X >= Y ?

(X, Y are unsigned)
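
For nonzero Y the proposed rewrite holds for unsigned operands, since X / Y is at least 1 exactly when X is at least Y. The following is a small standalone check, not LLVM code. The function name `udivNonZeroMatchesUge` is hypothetical, and the exhaustive loop covers only 8-bit values as an illustration. Y == 0 is skipped because division by zero is undefined.

```cpp
#include <cassert>

// Hypothetical helper: returns true if (x / y != 0) agrees with (x >= y)
// for every pair of 8-bit unsigned values with y != 0.
bool udivNonZeroMatchesUge() {
  for (unsigned x = 0; x <= 255; ++x) {
    for (unsigned y = 1; y <= 255; ++y) {  // y == 0 excluded: division by zero
      bool viaDivision = (x / y) != 0;
      bool viaCompare = x >= y;
      if (viaDivision != viaCompare)
        return false;
    }
  }
  return true;
}
```

A real combine would also have to decide what to do when Y may be zero, which this sketch does not model.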

spatel requested changes to this revision. Jun 16 2018, 7:10 AM

spatel added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889	This change is independent of the and/or enhancement. We're now allowing folding when the binop has a constant operand 0. That seems like a good enhancement for non-commutative binops, but it should have its own tests using opcodes besides 'and'/'or' and be split into its own patch. Example based on one of the original tests from rL296699:
	define <2 x double> @sel_constants_fmul_constant_vec(i1 %cond) {
	  %sel = select i1 %cond, <2 x double> <double -4.0, double 12.0>, <2 x double> <double 23.3, double 11.0>
	  %bo = fsub <2 x double> <double 5.1, double 3.14>, %sel
	  ret <2 x double> %bo
	}

This revision now requires changes to proceed. Jun 16 2018, 7:10 AM

In D48223#1134505, @xbolva00 wrote:

What about
(X / Y) != 0 -> X >= Y ?

(X, Y are unsigned)

This might be a good idea, but seems a separate change.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889	I am not sure why this would be beneficial. For example:
	sub (select 0, -1), x -> select (sub 0, x), (sub -1, x)
	add (select 0, -1), x -> select x, (add -1, x)
In this case the select would remain, with two subs instead of one, which seems worse. For add, both the select and the add would remain, so the benefit is not obvious. In turn, and/or have a clear benefit in this change because the binop goes away completely. In fact the same can be done for xor, but I have decided not to, since xor would be needed anyway with a non-zero operand (and one would never have a select with both sides zero).
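
The and/or case described here can be sanity-checked with a standalone sketch. This is not the DAGCombiner code: the function names are hypothetical, and 32-bit unsigned values stand in for i32. With a select between 0 and -1, the `and` or `or` collapses into a select that passes x through on one arm, which is the `select i1 %c, i32 0, i32 -1` pattern used in the AMDGPU tests.

```cpp
#include <cassert>
#include <cstdint>

// and (select c, 0, -1), x   before folding
constexpr uint32_t andOfSelect(bool c, uint32_t x) { return (c ? 0u : ~0u) & x; }
// select c, 0, x             after folding: no 'and' is left
constexpr uint32_t andFolded(bool c, uint32_t x) { return c ? 0u : x; }

// or (select c, 0, -1), x    before folding
constexpr uint32_t orOfSelect(bool c, uint32_t x) { return (c ? 0u : ~0u) | x; }
// select c, x, -1            after folding: no 'or' is left
constexpr uint32_t orFolded(bool c, uint32_t x) { return c ? x : ~0u; }

// Compile-time spot checks for both values of the condition.
static_assert(andOfSelect(true, 0x1234u) == andFolded(true, 0x1234u), "and, c = true");
static_assert(andOfSelect(false, 0x1234u) == andFolded(false, 0x1234u), "and, c = false");
static_assert(orOfSelect(true, 0x1234u) == orFolded(true, 0x1234u), "or, c = true");
static_assert(orOfSelect(false, 0x1234u) == orFolded(false, 0x1234u), "or, c = false");
```

Because x is not a constant here, this is a different situation from the constant-only fold discussed in the rest of the thread.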

spatel added inline comments. Jun 16 2018, 12:51 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889	It's beneficial for exactly the same reason as the existing fold - we eliminate the binop. The fact that the case with constant operand 0 isn't already folded was just an oversight in the original patch. Currently, this patch miscompiles on these cases, so we need more tests and a code change whether we treat that difference as part of this patch or not. Here's a scalar example for Aarch64 in case it makes it easier to see:
	define i32 @sel_constants_sub_constant_op0(i1 %cond) {
	  %sel = select i1 %cond, i32 12, i32 42
	  %bo = sub i32 500, %sel
	  ret i32 %bo
	}
Current codegen:
	tst w0, #0x1
	mov w8, #42
	orr w9, wzr, #0xc
	csel w8, w9, w8, ne
	mov w9, #500
	sub w0, w9, w8
	ret
And with this patch applied (miscompile):
	tst w0, #0x1
	mov w8, #-458 ; 500 - 42 != -458
	mov w9, #-488 ; 500 - 12 != -488
	csel w0, w9, w8, ne
	ret
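
The arithmetic behind this miscompile can be reproduced outside of LLVM. The following is a minimal standalone sketch, not the DAGCombiner code: `Arms`, `foldSub`, `foldSubIgnoringOrder` and the `selOpNo` parameter are hypothetical names that only model which operand of the `sub` holds the select. Keeping the operands in their original order gives 488 and 458, while always placing the select's constant on the left gives the -488 and -458 seen in the bad codegen.

```cpp
#include <cassert>

// New true/false arms of the select after the binop has been folded in.
struct Arms {
  int onTrue;
  int onFalse;
};

// Hypothetical model of folding 'sub' into a select of constants.
// selOpNo is the operand index of the select inside the binop:
//   selOpNo == 0:  sub (select c, CT, CF), C  ->  select c, CT - C, CF - C
//   selOpNo == 1:  sub C, (select c, CT, CF)  ->  select c, C - CT, C - CF
constexpr Arms foldSub(int ct, int cf, int c, unsigned selOpNo) {
  return selOpNo == 0 ? Arms{ct - c, cf - c} : Arms{c - ct, c - cf};
}

// Variant that ignores operand order, modelling the reported miscompile.
constexpr Arms foldSubIgnoringOrder(int ct, int cf, int c) {
  return Arms{ct - c, cf - c};
}

// sub i32 500, (select i1 %cond, i32 12, i32 42)
static_assert(foldSub(12, 42, 500, 1).onTrue == 488, "500 - 12");
static_assert(foldSub(12, 42, 500, 1).onFalse == 458, "500 - 42");
static_assert(foldSubIgnoringOrder(12, 42, 500).onTrue == -488, "12 - 500");
static_assert(foldSubIgnoringOrder(12, 42, 500).onFalse == -458, "42 - 500");
```

For a commutative opcode the two variants would agree, which is presumably why the problem only shows up with opcodes such as `sub`.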

rampitec added inline comments. Jun 18 2018, 10:21 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889	Yes, there is a bug. I will fix it.

rampitec added inline comments. Jun 18 2018, 12:16 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889	Thanks for catching this! The confusion on my side was about "constant operand 0". I have misread it as "constant zero" instead of "operand zero". I will update diff shortly.

Fixed handling of non-commutative operations if arguments are swapped.
Added tests for non-commutative operations with all-const value.
Retitled patch accordingly.

Herald added a subscriber: nemanjai. Jun 18 2018, 12:26 PM

spatel added inline comments. Jun 18 2018, 2:01 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889	Ah...I can see how that was confusing. :) I think this patch is correct now, but let me make a couple of requests:
	1. Commit the new PPC and x86 tests with assertions based on trunk (ie, without this patch applied). It's easier to review if we can see the before/after changes directly in this review. I'd do the same for the AMDGPU tests too, but I'm not qualified to review those diffs anyway.
	2. Split the and/or enhancement into a follow-up patch. IIUC, that's independent of allowing the more flexible constant matching.

rampitec added inline comments. Jun 18 2018, 2:25 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889	Will do. I will not do so with amdgpu tests as these are not autogenerated, but will do it with x86 and PPC.

rampitec added inline comments. Jun 18 2018, 2:58 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1883–1889	Baseline tests committed https://reviews.llvm.org/rL334987

Only commute part of the change.

rampitec mentioned this in D48301: DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1". Jun 18 2018, 3:14 PM

rampitec added a child revision: D48301: DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1".

spatel added inline comments. Jun 18 2018, 4:04 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1904	Nit: I'd call this 'C' or 'CBO' or 'CBinOp' since it's not always operand 1 now.
1924–1925	When does this happen (is there a test)? Why does this only happen when the select is operand 1 of the binop? Better to remove the 'SelOpNo' condition to be safer?

rampitec added inline comments. Jun 18 2018, 4:40 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1924–1925	That is if shift value and amount have different types. On x86 shift amount is i8 regardless of LHS. The new test shl_constant_sel_constants does not fold on x86 but does on amdgpu. W/o the check this test asserts. At the same time old test does work on x86 and folds correctly, so I think it is only needed if I have swapped operands. In general that should be possible to write a piece of code to create correct VT for shifts, but I'd better leave that to x86 folks. I guess range checking will be also needed if such a code is about to be written.

Added comment about shift VTs
Renamed C1 into CBO

Added x86 test for shifts with not reversed operands (as before the change).

arsenm added a subscriber: arsenm. Jun 19 2018, 4:17 AM

arsenm added inline comments.

test/CodeGen/AMDGPU/dagcombine-select.ll
2	Use an i16 target for i16 tests
6	Some tests with i16 would be useful (as well as vectors)

spatel added inline comments. Jun 19 2018, 8:50 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1924–1925	Thanks - that was my guess, and the TODO + test makes it clear. LGTM, but get final approval from @arsenm once the AMDGPU tests are updated.

Added some i16/f16/vector tests.

Herald added a subscriber: wdng. Jun 19 2018, 11:14 AM

LGTM

rampitec added inline comments. Jun 20 2018, 11:29 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1924–1925	So @arsenm has reviewed the tests. @spatel it looks like your previous vote technically holds the review now. Can this be submitted?

LGTM.

This revision is now accepted and ready to land. Jun 20 2018, 11:32 AM

Closed by commit rL335167: Allow binop C1, (select cc, CF, CT) -> select folding (authored by rampitec). Jun 20 2018, 1:29 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	33 lines
test/CodeGen/AMDGPU/dagcombine-select.ll	49 lines
test/CodeGen/AMDGPU/udivrem.ll	26 lines
test/CodeGen/X86/dagcombine-select.ll	47 lines

Diff 151609

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[1,871 lines not shown]	assert((BinOpcode == ISD::ADD || BinOpcode == ISD::SUB ||
BinOpcode == ISD::UREM || BinOpcode == ISD::AND ||		BinOpcode == ISD::UREM || BinOpcode == ISD::AND ||
BinOpcode == ISD::OR || BinOpcode == ISD::XOR ||		BinOpcode == ISD::OR || BinOpcode == ISD::XOR ||
BinOpcode == ISD::SHL || BinOpcode == ISD::SRL ||		BinOpcode == ISD::SHL || BinOpcode == ISD::SRL ||
BinOpcode == ISD::SRA || BinOpcode == ISD::FADD ||		BinOpcode == ISD::SRA || BinOpcode == ISD::FADD ||
BinOpcode == ISD::FSUB || BinOpcode == ISD::FMUL ||		BinOpcode == ISD::FSUB || BinOpcode == ISD::FMUL ||
BinOpcode == ISD::FDIV || BinOpcode == ISD::FREM) &&		BinOpcode == ISD::FDIV || BinOpcode == ISD::FREM) &&
"Unexpected binary operator");		"Unexpected binary operator");

// Bail out if any constants are opaque because we can't constant fold those.
SDValue C1 = BO->getOperand(1);
if (!isConstantOrConstantVector(C1, true) &&
!isConstantFPBuildVectorOrConstantFP(C1))
return SDValue();

// Don't do this unless the old select is going away. We want to eliminate the		// Don't do this unless the old select is going away. We want to eliminate the
// binary operator, not replace a binop with a select.		// binary operator, not replace a binop with a select.
// TODO: Handle ISD::SELECT_CC.		// TODO: Handle ISD::SELECT_CC.
		unsigned SelOpNo = 0;
SDValue Sel = BO->getOperand(0);		SDValue Sel = BO->getOperand(0);
		if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse()) {
		SelOpNo = 1;
		Sel = BO->getOperand(1);
		}

if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse())		if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse())
return SDValue();		return SDValue();

SDValue CT = Sel.getOperand(1);		SDValue CT = Sel.getOperand(1);
if (!isConstantOrConstantVector(CT, true) &&		if (!isConstantOrConstantVector(CT, true) &&
!isConstantFPBuildVectorOrConstantFP(CT))		!isConstantFPBuildVectorOrConstantFP(CT))
return SDValue();		return SDValue();

SDValue CF = Sel.getOperand(2);		SDValue CF = Sel.getOperand(2);
if (!isConstantOrConstantVector(CF, true) &&		if (!isConstantOrConstantVector(CF, true) &&
!isConstantFPBuildVectorOrConstantFP(CF))		!isConstantFPBuildVectorOrConstantFP(CF))
return SDValue();		return SDValue();

		// Bail out if any constants are opaque because we can't constant fold those.
		// The exception is "and" and "or" with either 0 or -1 in which case we can
		// propagate non constant operands into select.
		bool CanFoldNonConst = false;
		if (BinOpcode == ISD::AND || BinOpcode == ISD::OR) {
		ConstantSDNode *CTN = cast<ConstantSDNode>(CT);
		ConstantSDNode *CFN = cast<ConstantSDNode>(CF);
		CanFoldNonConst = (CTN->isNullValue() || CTN->isAllOnesValue()) &&
		(CFN->isNullValue() || CFN->isAllOnesValue());
		}

		SDValue C1 = BO->getOperand(SelOpNo ^ 1);
		if (!CanFoldNonConst &&
		!isConstantOrConstantVector(C1, true) &&
		!isConstantFPBuildVectorOrConstantFP(C1))
		return SDValue();

// We have a select-of-constants followed by a binary operator with a		// We have a select-of-constants followed by a binary operator with a
// constant. Eliminate the binop by pulling the constant math into the select.		// constant. Eliminate the binop by pulling the constant math into the select.
// Example: add (select Cond, CT, CF), C1 --> select Cond, CT + C1, CF + C1		// Example: add (select Cond, CT, CF), C1 --> select Cond, CT + C1, CF + C1
EVT VT = Sel.getValueType();		EVT VT = Sel.getValueType();
SDLoc DL(Sel);		SDLoc DL(Sel);
SDValue NewCT = DAG.getNode(BinOpcode, DL, VT, CT, C1);		SDValue NewCT = DAG.getNode(BinOpcode, DL, VT, CT, C1);
if (!NewCT.isUndef() &&		if (!CanFoldNonConst && !NewCT.isUndef() &&
!isConstantOrConstantVector(NewCT, true) &&		!isConstantOrConstantVector(NewCT, true) &&
!isConstantFPBuildVectorOrConstantFP(NewCT))		!isConstantFPBuildVectorOrConstantFP(NewCT))
return SDValue();		return SDValue();

SDValue NewCF = DAG.getNode(BinOpcode, DL, VT, CF, C1);		SDValue NewCF = DAG.getNode(BinOpcode, DL, VT, CF, C1);
if (!NewCF.isUndef() &&		if (!CanFoldNonConst && !NewCF.isUndef() &&
!isConstantOrConstantVector(NewCF, true) &&		!isConstantOrConstantVector(NewCF, true) &&
!isConstantFPBuildVectorOrConstantFP(NewCF))		!isConstantFPBuildVectorOrConstantFP(NewCF))
return SDValue();		return SDValue();

return DAG.getSelect(DL, VT, Sel.getOperand(0), NewCT, NewCF);		return DAG.getSelect(DL, VT, Sel.getOperand(0), NewCT, NewCF);
}		}

SDValue DAGCombiner::visitADD(SDNode *N) {		SDValue DAGCombiner::visitADD(SDNode *N) {
[16,290 lines not shown]

test/CodeGen/AMDGPU/dagcombine-select.ll

This file was added.

				; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

				; GCN-LABEL: {{^}}select_and1:
				; GCN: v_cndmask_b32_e32 [[SEL:v[0-9]+]], 0, v{{[0-9]+}},
				; GCN-NOT: v_and_b32
				; GCN: store_dword [[SEL]],
				define amdgpu_kernel void @select_and1(i32 addrspace(1)* %p, i32 %x, i32 %y) {
				%c = icmp slt i32 %x, 11
				%s = select i1 %c, i32 0, i32 -1
				%a = and i32 %y, %s
				store i32 %a, i32 addrspace(1)* %p, align 4
				ret void
				}

				; GCN-LABEL: {{^}}select_and2:
				; GCN: v_cndmask_b32_e32 [[SEL:v[0-9]+]], 0, v{{[0-9]+}},
				; GCN-NOT: v_and_b32
				; GCN: store_dword [[SEL]],
				define amdgpu_kernel void @select_and2(i32 addrspace(1)* %p, i32 %x, i32 %y) {
				%c = icmp slt i32 %x, 11
				%s = select i1 %c, i32 0, i32 -1
				%a = and i32 %s, %y
				store i32 %a, i32 addrspace(1)* %p, align 4
				ret void
				}

				; GCN-LABEL: {{^}}select_or1:
				; GCN: v_cndmask_b32_e32 [[SEL:v[0-9]+]], -1, v{{[0-9]+}},
				; GCN-NOT: v_or_b32
				; GCN: store_dword [[SEL]],
				define amdgpu_kernel void @select_or1(i32 addrspace(1)* %p, i32 %x, i32 %y) {
				%c = icmp slt i32 %x, 11
				%s = select i1 %c, i32 0, i32 -1
				%a = or i32 %y, %s
				store i32 %a, i32 addrspace(1)* %p, align 4
				ret void
				}

				; GCN-LABEL: {{^}}select_or2:
				; GCN: v_cndmask_b32_e32 [[SEL:v[0-9]+]], -1, v{{[0-9]+}},
				; GCN-NOT: v_or_b32
				; GCN: store_dword [[SEL]],
				define amdgpu_kernel void @select_or2(i32 addrspace(1)* %p, i32 %x, i32 %y) {
				%c = icmp slt i32 %x, 11
				%s = select i1 %c, i32 0, i32 -1
				%a = or i32 %s, %y
				store i32 %a, i32 addrspace(1)* %p, align 4
				ret void
				}

test/CodeGen/AMDGPU/udivrem.ll

	[25 lines not shown]
	; EG-DAG: SUB_INT			; EG-DAG: SUB_INT
	; EG-DAG: CNDE_INT			; EG-DAG: CNDE_INT
	; EG-DAG: CNDE_INT			; EG-DAG: CNDE_INT

	; SI: v_rcp_iflag_f32_e32 [[RCP:v[0-9]+]]			; SI: v_rcp_iflag_f32_e32 [[RCP:v[0-9]+]]
	; SI-DAG: v_mul_hi_u32 [[RCP_HI:v[0-9]+]], [[RCP]]			; SI-DAG: v_mul_hi_u32 [[RCP_HI:v[0-9]+]], [[RCP]]
	; SI-DAG: v_mul_lo_i32 [[RCP_LO:v[0-9]+]], [[RCP]]			; SI-DAG: v_mul_lo_i32 [[RCP_LO:v[0-9]+]], [[RCP]]
	; SI-DAG: v_sub_{{[iu]}}32_e32 [[NEG_RCP_LO:v[0-9]+]], vcc, 0, [[RCP_LO]]			; SI-DAG: v_sub_{{[iu]}}32_e32 [[NEG_RCP_LO:v[0-9]+]], vcc, 0, [[RCP_LO]]
	; SI: v_cndmask_b32_e64			; SI: v_cmp_eq_u32_e64 [[CC1:s\[[0-9:]+\]]], 0, [[RCP_HI]]
	; SI: v_mul_hi_u32 [[E:v[0-9]+]], {{v[0-9]+}}, [[RCP]]			; SI: v_cndmask_b32_e64 [[CND1:v[0-9]+]], [[RCP_LO]], [[NEG_RCP_LO]], [[CC1]]
				; SI: v_mul_hi_u32 [[E:v[0-9]+]], [[CND1]], [[RCP]]
	; SI-DAG: v_add_{{[iu]}}32_e32 [[RCP_A_E:v[0-9]+]], vcc, [[E]], [[RCP]]			; SI-DAG: v_add_{{[iu]}}32_e32 [[RCP_A_E:v[0-9]+]], vcc, [[E]], [[RCP]]
	; SI-DAG: v_subrev_{{[iu]}}32_e32 [[RCP_S_E:v[0-9]+]], vcc, [[E]], [[RCP]]			; SI-DAG: v_subrev_{{[iu]}}32_e32 [[RCP_S_E:v[0-9]+]], vcc, [[E]], [[RCP]]
	; SI: v_cndmask_b32_e64			; SI: v_cndmask_b32_e64 [[CND2:v[0-9]+]], [[RCP_S_E]], [[RCP_A_E]], [[CC1]]
	; SI: v_mul_hi_u32 [[Quotient:v[0-9]+]]			; SI: v_mul_hi_u32 [[Quotient:v[0-9]+]], [[CND2]],
	; SI: v_mul_lo_i32 [[Num_S_Remainder:v[0-9]+]]			; SI: v_mul_lo_i32 [[Num_S_Remainder:v[0-9]+]], [[CND2]]
	; SI-DAG: v_add_{{[iu]}}32_e32 [[Quotient_A_One:v[0-9]+]], vcc, 1, [[Quotient]]			; SI-DAG: v_add_{{[iu]}}32_e32 [[Quotient_A_One:v[0-9]+]], vcc, 1, [[Quotient]]
	; SI-DAG: v_sub_{{[iu]}}32_e32 [[Remainder:v[0-9]+]], vcc, {{[vs][0-9]+}}, [[Num_S_Remainder]]			; SI-DAG: v_sub_{{[iu]}}32_e32 [[Remainder:v[0-9]+]], vcc, {{[vs][0-9]+}}, [[Num_S_Remainder]]
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_subrev_{{[iu]}}32_e32 [[Quotient_S_One:v[0-9]+]],			; SI-DAG: v_subrev_{{[iu]}}32_e32 [[Quotient_S_One:v[0-9]+]],
	; SI-DAG: v_subrev_{{[iu]}}32_e32 [[Remainder_S_Den:v[0-9]+]],			; SI-DAG: v_subrev_{{[iu]}}32_e32 [[Remainder_S_Den:v[0-9]+]],
	; SI: v_and_b32_e32 [[Tmp1:v[0-9]+]]
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_add_{{[iu]}}32_e32 [[Remainder_A_Den:v[0-9]+]],			; SI-DAG: v_add_{{[iu]}}32_e32 [[Remainder_A_Den:v[0-9]+]],
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
				; SI-NOT: v_and_b32
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_udivrem(i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, i32 %x, i32 %y) {			define amdgpu_kernel void @test_udivrem(i32 addrspace(1)* %out0, i32 addrspace(1)* %out1, i32 %x, i32 %y) {
	%result0 = udiv i32 %x, %y			%result0 = udiv i32 %x, %y
	store i32 %result0, i32 addrspace(1)* %out0			store i32 %result0, i32 addrspace(1)* %out0
	%result1 = urem i32 %x, %y			%result1 = urem i32 %x, %y
	store i32 %result1, i32 addrspace(1)* %out1			store i32 %result1, i32 addrspace(1)* %out1
	ret void			ret void
	}			}
	[58 lines not shown]
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_and_b32_e32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_rcp_iflag_f32_e32			; SI-DAG: v_rcp_iflag_f32_e32
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_sub_{{[iu]}}32_e32			; SI-DAG: v_sub_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_and_b32_e32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
				; SI-NOT: v_and_b32
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_udivrem_v2(<2 x i32> addrspace(1)* %out, <2 x i32> %x, <2 x i32> %y) {			define amdgpu_kernel void @test_udivrem_v2(<2 x i32> addrspace(1)* %out, <2 x i32> %x, <2 x i32> %y) {
	%result0 = udiv <2 x i32> %x, %y			%result0 = udiv <2 x i32> %x, %y
	store <2 x i32> %result0, <2 x i32> addrspace(1)* %out			store <2 x i32> %result0, <2 x i32> addrspace(1)* %out
	%result1 = urem <2 x i32> %x, %y			%result1 = urem <2 x i32> %x, %y
	store <2 x i32> %result1, <2 x i32> addrspace(1)* %out			store <2 x i32> %result1, <2 x i32> addrspace(1)* %out
	ret void			ret void
	}			}
	[101 lines not shown]
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_and_b32_e32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_rcp_iflag_f32_e32			; SI-DAG: v_rcp_iflag_f32_e32
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_sub_{{[iu]}}32_e32			; SI-DAG: v_sub_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_and_b32_e32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_rcp_iflag_f32_e32			; SI-DAG: v_rcp_iflag_f32_e32
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_sub_{{[iu]}}32_e32			; SI-DAG: v_sub_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_and_b32_e32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_rcp_iflag_f32_e32			; SI-DAG: v_rcp_iflag_f32_e32
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_mul_lo_i32			; SI-DAG: v_mul_lo_i32
	; SI-DAG: v_sub_{{[iu]}}32_e32			; SI-DAG: v_sub_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
	; SI-DAG: v_mul_hi_u32			; SI-DAG: v_mul_hi_u32
	; SI-DAG: v_add_{{[iu]}}32_e32			; SI-DAG: v_add_{{[iu]}}32_e32
	; SI-DAG: v_subrev_{{[iu]}}32_e32			; SI-DAG: v_subrev_{{[iu]}}32_e32
	; SI-DAG: v_cndmask_b32_e64			; SI-DAG: v_cndmask_b32_e64
				; SI-NOT: v_and_b32
	; SI: s_endpgm			; SI: s_endpgm
	define amdgpu_kernel void @test_udivrem_v4(<4 x i32> addrspace(1)* %out, <4 x i32> %x, <4 x i32> %y) {			define amdgpu_kernel void @test_udivrem_v4(<4 x i32> addrspace(1)* %out, <4 x i32> %x, <4 x i32> %y) {
	%result0 = udiv <4 x i32> %x, %y			%result0 = udiv <4 x i32> %x, %y
	store <4 x i32> %result0, <4 x i32> addrspace(1)* %out			store <4 x i32> %result0, <4 x i32> addrspace(1)* %out
	%result1 = urem <4 x i32> %x, %y			%result1 = urem <4 x i32> %x, %y
	store <4 x i32> %result1, <4 x i32> addrspace(1)* %out			store <4 x i32> %result1, <4 x i32> addrspace(1)* %out
	ret void			ret void
	}			}

test/CodeGen/X86/dagcombine-select.ll

This file was added.

				; RUN: llc -march=x86-64 -verify-machineinstrs < %s | FileCheck -enable-var-scope %s
				lebedev.ri: Most tests (and practically all new x86 tests) use `utils/update_llc_test_checks.py` script to auto-generate these check lines.
				lebedev.ri: (not actually done, that does not look like the utility's output, and the first line does not say the script was used)
				rampitec: Oops, sorry. Clicked wrong checkbox. Now updated.

				; CHECK-LABEL: {{^}}select_and1:
				; CHECK: cmpl $11, %edi
				; CHECK-NEXT: cmovgel %esi, %eax
				; CHECK-NEXT: retq
				define i32 @select_and1(i32 %x, i32 %y) {
				%c = icmp slt i32 %x, 11
				%s = select i1 %c, i32 0, i32 -1
				%a = and i32 %y, %s
				ret i32 %a
				}

				; CHECK-LABEL: {{^}}select_and2:
				; CHECK: cmpl $11, %edi
				; CHECK-NEXT: cmovgel %esi, %eax
				; CHECK-NEXT: retq
				lebedev.ri: Hm, is there some omitted instruction, or is this actually better than what we currently normally do? https://godbolt.org/g/7ULPfH
				rampitec: Yes. I have updated the test to contain the full ISA. The first xor, which zeroes out eax, was omitted. I am not sure what Compiler Explorer does, but this is what trunk llc produced:
				    xorl %eax, %eax
				    cmpl $11, %edi
				    setl %al
				    decl %eax
				    andl %esi, %eax
				    retq
				I assume the difference comes from running or not running opt.
				rampitec: E.g. compare w/o opt: https://godbolt.org/g/NMZ9he
				lebedev.ri: Ok, so the only difference is that the strictness of the comparison is inverted (`cmovg` vs `cmovge` and the other way around).
				rampitec: Do you see the "and eax, esi" instruction in the second tab from the right using my link? That one was eliminated. In general, if you run opt, this is not an issue. If you run llc only (as in the test), you can see it. Normally that should not happen since, as I wrote in the description, InstCombine can get rid of it. However, amdgcn can produce it during lowering, where InstCombine does not come into play.
				lebedev.ri: Of course. I was only talking about comparing the `llc(D48223) test/CodeGen/X86/dagcombine-select.ll` output with the `opt(trunk) test/CodeGen/X86/dagcombine-select.ll | llc(trunk)` output.
				define i32 @select_and2(i32 %x, i32 %y) {
				%c = icmp slt i32 %x, 11
				%s = select i1 %c, i32 0, i32 -1
				%a = and i32 %s, %y
				ret i32 %a
				}

				; CHECK-LABEL: {{^}}select_or1:
				; CHECK: cmpl $11, %edi
				; CHECK-NEXT: movl $-1, %eax
				; CHECK-NEXT: cmovll %esi, %eax
				; CHECK-NEXT: retq
				define i32 @select_or1(i32 %x, i32 %y) {
				%c = icmp slt i32 %x, 11
				%s = select i1 %c, i32 0, i32 -1
				%a = or i32 %y, %s
				ret i32 %a
				}

				; CHECK-LABEL: {{^}}select_or2:
				; CHECK: cmpl $11, %edi
				; CHECK-NEXT: movl $-1, %eax
				; CHECK-NEXT: cmovll %esi, %eax
				; CHECK-NEXT: retq
				define i32 @select_or2(i32 %x, i32 %y) {
				%c = icmp slt i32 %x, 11
				%s = select i1 %c, i32 0, i32 -1
				%a = or i32 %s, %y
				ret i32 %a
				}
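
For readers following the thread, here is a minimal sketch of the value-level identity these tests rely on. It is not the DAGCombiner code from this patch: `select32`, `kAllOnes`, and the `*_unfolded` / `*_folded` helpers are hypothetical names invented for illustration. It assumes the fold pushes the binop into both arms of the select, so `and y, (select c, 0, -1)` becomes `select c, 0, y`, `or y, (select c, 0, -1)` becomes `select c, y, -1`, and a non-commutative op with the constant first, such as `sub C1, (select cc, CT, CF)`, becomes a select of two constants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): models IR `select i1 cc, tval, fval`.
constexpr uint32_t select32(bool cc, uint32_t tval, uint32_t fval) {
  return cc ? tval : fval;
}

constexpr uint32_t kAllOnes = 0xFFFFFFFFu;  // the i32 constant -1

// select_and1 / select_and2 as written: and y, (select (x < 11), 0, -1).
constexpr uint32_t and_unfolded(int32_t x, uint32_t y) {
  return y & select32(x < 11, 0u, kAllOnes);
}

// After pushing the `and` into both arms: select (x < 11), 0, y.
constexpr uint32_t and_folded(int32_t x, uint32_t y) {
  return select32(x < 11, 0u, y);
}

// select_or1 / select_or2 as written: or y, (select (x < 11), 0, -1).
constexpr uint32_t or_unfolded(int32_t x, uint32_t y) {
  return y | select32(x < 11, 0u, kAllOnes);
}

// After pushing the `or` into both arms: select (x < 11), y, -1.
constexpr uint32_t or_folded(int32_t x, uint32_t y) {
  return select32(x < 11, y, kAllOnes);
}

// Non-commutative case with the constant first: sub c1, (select cc, ct, cf).
constexpr uint32_t sub_unfolded(bool cc, uint32_t c1, uint32_t ct, uint32_t cf) {
  return c1 - select32(cc, ct, cf);
}

// Folded form: both arms become constants, select cc, (c1 - ct), (c1 - cf).
constexpr uint32_t sub_folded(bool cc, uint32_t c1, uint32_t ct, uint32_t cf) {
  return select32(cc, c1 - ct, c1 - cf);
}

// Compile-time spot checks on both sides of the `x < 11` boundary.
static_assert(and_unfolded(10, 0x1234u) == and_folded(10, 0x1234u), "and, x < 11");
static_assert(and_unfolded(11, 0x1234u) == and_folded(11, 0x1234u), "and, x >= 11");
static_assert(or_unfolded(10, 0x1234u) == or_folded(10, 0x1234u), "or, x < 11");
static_assert(or_unfolded(11, 0x1234u) == or_folded(11, 0x1234u), "or, x >= 11");

// Runtime sweep over a small range of x values and a few y values.
inline void check_examples() {
  const uint32_t ys[] = {0u, 1u, 0x1234u, kAllOnes};
  for (int32_t x = -20; x <= 20; ++x) {
    for (uint32_t y : ys) {
      assert(and_unfolded(x, y) == and_folded(x, y));
      assert(or_unfolded(x, y) == or_folded(x, y));
      assert(sub_unfolded(x < 11, y, 3u, 4u) == sub_folded(x < 11, y, 3u, 4u));
    }
  }
}
```

The folded `and` form returns `y` only when `x >= 11`, which lines up with the `cmovgel %esi, %eax` check, and the folded `or` form returns `y` only when `x < 11`, which lines up with `movl $-1, %eax` followed by `cmovll %esi, %eax`.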