This is an archive of the discontinued LLVM Phabricator instance.

Differential D46493

[DagCombiner] Not all 'andn''s work with immediates.
ClosedPublic

Authored by lebedev.ri on May 5 2018, 4:15 AM.

Details

Reviewers

spatel
craig.topper

Commits

rGcc42d08b1daf: [DagCombiner] Not all 'andn''s work with immediates.
rL331684: [DagCombiner] Not all 'andn''s work with immediates.

Summary

Split off from D46031.

In masked merge case, this degrades IPC by decreasing instruction count.

diff.txt (2 KB)

The next patch should be able to recover and improve this.

This also affects the transform @spatel added in D27489 / rL289738,
and the test coverage for X86 was missing.
But after I added it and looked at the changes in MCA, I'm somewhat confused.

icmp-opt.txt (2 KB)

pos_sel_constants.txt (2 KB)

pos_sel_special_constant.txt (2 KB)

I'd say this regression is an improvement, since IPC increased in that case?

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. May 5 2018, 4:15 AM

lebedev.ri added a parent revision: D46492: [DAGCombiner] Masked merge: don't touch "not" xor's..

lebedev.ri mentioned this in D46494: [DAGCombiner] Masked merge: enhance handling of 'andn' with immediates. May 5 2018, 4:29 AM

lebedev.ri added a child revision: D46494: [DAGCombiner] Masked merge: enhance handling of 'andn' with immediates.

lebedev.ri mentioned this in D46031: [DAGCombiner] Masked merge: if 'B' is constant, de-canonicalize the pattern (invert the mask)..

I'd say this regression is an improvement, since IPC increased in that case?

As a rule of thumb when using llvm-mca, it's best to always remove return statements from the assembly code sequence.
llvm-mca should have warned you about the presence of a return statement in the input sequence:

warning: found a return instruction in the input assembly sequence.
note: program counter updates are ignored.

To get the correct resource pressure distribution in example icmp-opt.txt, you should remove the retq.
As a result, you should see this:

Resource pressure per iteration:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]
 -      -     0.75   0.75   0.75   0.75    -      -      -      -      -      -

Resource pressure by instruction:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]       Instructions:
 -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -         shrq    $63, %rdi
 -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -         xorl    $1, %edi
 -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -         movl    %edi, %eax

And...

Resource pressure by instruction:
[0]    [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]    [9]    [10]   [11]       Instructions:
 -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -         xorl    %eax, %eax
 -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -         testq   %rdi, %rdi
 -      -     0.25   0.25   0.25   0.25    -      -      -      -      -      -         setns   %al

In terms of resource pressure, the two code sequences are equally good.

The shrq+xor+mov sequence is worse in terms of IPC because of the data dependency on %edi, which limits the ILP when executing multiple iterations of the loop.
If you run multiple iterations and print the timeline view, you can see how the "average wait time" in the scheduler's queue is quite high for the shrq instruction.

You can see a similar behavior in test pos_sel_constants. The only problem I see is the slow LEA instruction (which is not treated specially by the scheduling model; at the moment it uses the same resources as a normal LEA).
Assuming that these instructions are executed in a loop, the new variant suffers less from long data dependencies between iterations.

In D46493#1089097, @andreadb wrote:
I'd say this regression is an improvement, since IPC increased in that case?

As a rule of thumb when using llvm-mca, it's best to always remove return statements from the assembly code sequence.
llvm-mca should have warned you about the presence of a return statement in the input sequence:
warning: found a return instruction in the input assembly sequence.
note: program counter updates are ignored.
To get the correct resource pressure distribution in example icmp-opt.txt, you should remove the retq.

Thank you for your comments!
I will filter it out, so hopefully in future my mca experience will be better :)
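
The filtering step mentioned here could be sketched roughly as follows. This is only an illustration: `stripReturns` is a hypothetical helper, not part of llvm-mca, and it assumes AT&T-syntax input in which the mnemonic is the first token on the line.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not an llvm-mca feature): drop return instructions
// from an AT&T-syntax assembly listing and keep every other line as-is.
std::string stripReturns(const std::string &Asm) {
  std::istringstream In(Asm);
  std::string Line, Out;
  while (std::getline(In, Line)) {
    // The first whitespace-delimited token is assumed to be the mnemonic.
    std::size_t Begin = Line.find_first_not_of(" \t");
    if (Begin != std::string::npos) {
      std::size_t End = Line.find_first_of(" \t", Begin);
      std::size_t Len = End == std::string::npos ? End : End - Begin;
      std::string Mnemonic = Line.substr(Begin, Len);
      if (Mnemonic == "ret" || Mnemonic == "retq" || Mnemonic == "retl")
        continue; // skip the return so it does not reach the analysis
    }
    Out += Line;
    Out += '\n';
  }
  return Out;
}
```

Calling `stripReturns` on the text of icmp-opt.txt before handing it to llvm-mca would leave only the three instructions shown in the resource pressure tables above.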

In D46493#1089097, @andreadb wrote:

If you run multiple iterations and print the timeline view, you can see how the "average wait time" in the scheduler's queue is quite high for the shrq instruction.

Aha, so far i kinda ignored -timeline switch.
Those flags need work i think. I have just tried enabling them all, and it seems like they invert the current state?
I'd like to 1. have a switch to turn them all on, 2. maybe print which ones are currently enabled in -help

In D46493#1089104, @lebedev.ri wrote:
Aha, so far i kinda ignored -timeline switch.
Those flags need work i think. I have just tried enabling them all, and it seems like they invert the current state?
I'd like to 1. have a switch to turn them all on, 2. maybe print which ones are currently enabled in -help

That should not happen.
I guess you passed the -instruction-tables flag too? To avoid confusion, I will remove that flag from the "View Options", since it is used to print a completely different report.

I'd like to 1. have a switch to turn them all on,

Sure, that can be added.

maybe print which ones are currently enabled in -help

That can be done. At the moment, the llvm-mca commandline documentation specifies which views are enabled by default. I am going to add that information to the "help" too.

Cheers,
-Andrea

In D46493#1089126, @andreadb wrote:
That should not happen.
I guess you passed the -instruction-tables flag too? To avoid confusion, I will remove that flag from the "View Options", since it is used to print a completely different report.

Sounds about right.

I'd like to 1. have a switch to turn them all on,
Sure, that can be added.

maybe print which ones are currently enabled in -help

That can be done. At the moment, the llvm-mca commandline documentation specifies which views are enabled by default. I am going to add that information to the "help" too.

Yay! Thank you!

Cheers,
-Andrea

Rebased.

LGTM - but I want to point out that the mca analysis may not be accurate for these sequences yet. We don't have the zero idiom (xor %eax, %eax) recognition yet ( https://bugs.llvm.org/show_bug.cgi?id=36671 ), so the stats for those cases are probably pessimistic.

This revision is now accepted and ready to land. May 7 2018, 10:22 AM

In D46493#1089988, @spatel wrote:

LGTM

Thank you for the review!

but I want to point out that the mca analysis may not be accurate for these sequences yet. We don't have the zero idiom (xor %eax, %eax) recognition yet ( https://bugs.llvm.org/show_bug.cgi?id=36671 ), so the stats for those cases are probably pessimistic.

I defer to you on the question whether icmp-opt.ll, selectcc-to-shiftand.ll changes are a regression (in which case they should be updated to use hasAndNotCompare() instead of hasAndNot())

In D46493#1090008, @lebedev.ri wrote:

In D46493#1089988, @spatel wrote:

LGTM

Thank you for the review!

but I want to point out that the mca analysis may not be accurate for these sequences yet. We don't have the zero idiom (xor %eax, %eax) recognition yet ( https://bugs.llvm.org/show_bug.cgi?id=36671 ), so the stats for those cases are probably pessimistic.

I defer to you on the question whether icmp-opt.ll, selectcc-to-shiftand.ll changes are a regression (in which case they should be updated to use hasAndNotCompare() instead of hasAndNot())

No - I think the test+set sequences are likely equal or better for most uarch, so those are improvements. They should also be fewer instruction bytes, so that's good too.

Closed by commit rL331684: [DagCombiner] Not all 'andn''s work with immediates. (authored by lebedevri). May 7 2018, 2:57 PM

This revision was automatically updated to reflect the committed changes.

Diffusion mentioned this in rL331685: [DAGCombiner] Masked merge: enhance handling of 'andn' with immediates.

Revision Contents

Path                                                                     Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp                      4 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.h                              2 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp                            11 lines
llvm/trunk/test/CodeGen/X86/icmp-opt.ll                                  6 lines
llvm/trunk/test/CodeGen/X86/selectcc-to-shiftand.ll                      16 lines
llvm/trunk/test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll   11 lines

Diff 145557

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ SDValue DAGCombiner::unfoldMaskedMerge(SDNode *N) {
   // probably shouldn't produce it, too.
   if (isa<ConstantSDNode>(M.getNode()))
     return SDValue();

   // We can transform if the target has AndNot
   if (!TLI.hasAndNot(M))
     return SDValue();

+  // If Y is a constant, check that 'andn' works with immediates.
+  if (!TLI.hasAndNot(Y))
+    return SDValue();
+
   SDLoc DL(N);

   SDValue LHS = DAG.getNode(ISD::AND, DL, VT, X, M);
   SDValue NotM = DAG.getNOT(DL, M, VT);
   SDValue RHS = DAG.getNode(ISD::AND, DL, VT, Y, NotM);

   return DAG.getNode(ISD::OR, DL, VT, LHS, RHS);
 }
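
For reference, the two forms involved in this transform compute the same value. The sketch below is a standalone check of that identity, not LLVM code; the function names `mergeXor`, `mergeAndNot` and `formsAgree` are made up for illustration. The xor form is the shape used by the in_constant_varx_42 test, and the and/or form is what the function above builds.

```cpp
#include <cstdint>

// Masked merge in its 'xor' form: ((X ^ Y) & M) ^ Y.
constexpr uint32_t mergeXor(uint32_t X, uint32_t Y, uint32_t M) {
  return ((X ^ Y) & M) ^ Y;
}

// Unfolded form built by the transform: (X & M) | (Y & ~M).
constexpr uint32_t mergeAndNot(uint32_t X, uint32_t Y, uint32_t M) {
  return (X & M) | (Y & ~M);
}

// Both forms operate bit by bit, so checking every combination of
// all-zeros / all-ones inputs covers every per-bit case.
constexpr bool formsAgree() {
  return mergeXor(0u, 0u, 0u) == mergeAndNot(0u, 0u, 0u) &&
         mergeXor(0u, 0u, ~0u) == mergeAndNot(0u, 0u, ~0u) &&
         mergeXor(0u, ~0u, 0u) == mergeAndNot(0u, ~0u, 0u) &&
         mergeXor(0u, ~0u, ~0u) == mergeAndNot(0u, ~0u, ~0u) &&
         mergeXor(~0u, 0u, 0u) == mergeAndNot(~0u, 0u, 0u) &&
         mergeXor(~0u, 0u, ~0u) == mergeAndNot(~0u, 0u, ~0u) &&
         mergeXor(~0u, ~0u, 0u) == mergeAndNot(~0u, ~0u, 0u) &&
         mergeXor(~0u, ~0u, ~0u) == mergeAndNot(~0u, ~0u, ~0u);
}

static_assert(formsAgree(), "xor form and and/or form must match");
```

When Y is a constant such as 42, the unfolded form needs `Y & ~M`, which is the operation x86 'andn' cannot take as an immediate; that is the case the added check rejects.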

llvm/trunk/lib/Target/X86/X86ISelLowering.h

@@ bool isMultiStoresCheaperThanBitsMerge(EVT LTy, EVT HTy) const override {
     // such pair out until we get testcase to prove it is a win.
     return false;
   }

   bool isMaskAndCmp0FoldingBeneficial(const Instruction &AndI) const override;

   bool hasAndNotCompare(SDValue Y) const override;

+  bool hasAndNot(SDValue Y) const override;
+
   bool convertSetCCLogicToBitwiseLogic(EVT VT) const override {
     return VT.isScalarInteger();
   }

   /// Vector-sized comparisons are fast using PCMPEQ + PMOVMSK or PTEST.
   MVT hasFastEqualityCompare(unsigned NumBits) const override;

   /// Allow multiple load pairs per block for smaller and faster code.

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

 }

 bool X86TargetLowering::isMaskAndCmp0FoldingBeneficial(
     const Instruction &AndI) const {
   return true;
 }

 bool X86TargetLowering::hasAndNotCompare(SDValue Y) const {
+  // A mask and compare against constant is ok for an 'andn' too
+  // even though the BMI instruction doesn't have an immediate form.
+
   if (!Subtarget.hasBMI())
     return false;

   // There are only 32-bit and 64-bit forms for 'andn'.
   EVT VT = Y.getValueType();
   if (VT != MVT::i32 && VT != MVT::i64)
     return false;

   return true;
 }

+bool X86TargetLowering::hasAndNot(SDValue Y) const {
+  // x86 can't form 'andn' with an immediate.
+  if (isa<ConstantSDNode>(Y))
+    return false;
+
+  return hasAndNotCompare(Y);
+}
+
 MVT X86TargetLowering::hasFastEqualityCompare(unsigned NumBits) const {
   MVT VT = MVT::getIntegerVT(NumBits);
   if (isTypeLegal(VT))
     return VT;

   // PMOVMSKB can handle this.
   if (NumBits == 128 && isTypeLegal(MVT::v16i8))
     return MVT::v16i8;
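
How the two hooks interact with the new check in unfoldMaskedMerge can be modeled with a small standalone sketch. Everything here is a toy stand-in: `Operand`, `ToyX86` and `shouldUnfold` are hypothetical names, not LLVM types or APIs, and only the decision logic from the diffs above is mirrored.

```cpp
// Toy stand-in for an SDValue: only the two properties the hooks inspect.
struct Operand {
  bool IsConstant;
  unsigned Bits;
};

// Toy stand-in for the X86 target hooks shown in the diff above.
struct ToyX86 {
  bool HasBMI;

  // 'andn' feeding a compare: a constant operand is acceptable here.
  bool hasAndNotCompare(Operand Y) const {
    return HasBMI && (Y.Bits == 32 || Y.Bits == 64);
  }

  // Plain 'andn': x86 has no immediate form, so constants are rejected.
  bool hasAndNot(Operand Y) const {
    return !Y.IsConstant && hasAndNotCompare(Y);
  }
};

// Mirrors the gating in unfoldMaskedMerge after this patch.
bool shouldUnfold(const ToyX86 &Target, Operand Y, Operand M) {
  if (M.IsConstant)
    return false;
  if (!Target.hasAndNot(M))
    return false;
  // New in this patch: Y must also be usable as an 'andn' operand.
  return Target.hasAndNot(Y);
}
```

With a BMI target and a variable 32-bit mask, `shouldUnfold` returns false when Y is the constant 42, which matches the in_constant_varx_42 test keeping its xor/and/xor sequence.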

llvm/trunk/test/CodeGen/X86/icmp-opt.ll

 ; CHECK-NOBMI: # %bb.0:
 ; CHECK-NOBMI-NEXT: xorl %eax, %eax
 ; CHECK-NOBMI-NEXT: testq %rdi, %rdi
 ; CHECK-NOBMI-NEXT: setns %al
 ; CHECK-NOBMI-NEXT: retq
 ;
 ; CHECK-BMI-LABEL: t1:
 ; CHECK-BMI: # %bb.0:
-; CHECK-BMI-NEXT: shrq $63, %rdi
-; CHECK-BMI-NEXT: xorl $1, %edi
-; CHECK-BMI-NEXT: movl %edi, %eax
+; CHECK-BMI-NEXT: xorl %eax, %eax
+; CHECK-BMI-NEXT: testq %rdi, %rdi
+; CHECK-BMI-NEXT: setns %al
 ; CHECK-BMI-NEXT: retq
 %cmp = icmp sgt i64 %a, -1
 %conv = zext i1 %cmp to i32
 ret i32 %conv
 }
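
Both instruction sequences in this test compute the same result for `icmp sgt i64 %a, -1`. A minimal sketch of the two computations, with made-up function names, assuming the usual two's-complement representation:

```cpp
#include <cstdint>
#include <limits>

// Old BMI output: shrq $63 moves the sign bit to bit 0, xorl $1 inverts it.
constexpr uint32_t viaShiftXor(int64_t A) {
  return static_cast<uint32_t>(static_cast<uint64_t>(A) >> 63) ^ 1u;
}

// New output: testq sets the sign flag, setns yields 1 when it is clear.
constexpr uint32_t viaTestSetns(int64_t A) { return A >= 0 ? 1u : 0u; }

static_assert(viaShiftXor(0) == viaTestSetns(0), "zero");
static_assert(viaShiftXor(-1) == viaTestSetns(-1), "minus one");
static_assert(viaShiftXor(std::numeric_limits<int64_t>::min()) ==
                  viaTestSetns(std::numeric_limits<int64_t>::min()),
              "most negative value");
static_assert(viaShiftXor(std::numeric_limits<int64_t>::max()) ==
                  viaTestSetns(std::numeric_limits<int64_t>::max()),
              "most positive value");
```

The old sequence reads and writes %edi at every step, while the new one starts from a freshly zeroed %eax, which is the dependency difference discussed in the comments above.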

llvm/trunk/test/CodeGen/X86/selectcc-to-shiftand.ll

 ; CHECK-NOBMI-NEXT: xorl %eax, %eax
 ; CHECK-NOBMI-NEXT: testl %edi, %edi
 ; CHECK-NOBMI-NEXT: setns %al
 ; CHECK-NOBMI-NEXT: leal (%rax,%rax,4), %eax
 ; CHECK-NOBMI-NEXT: retq
 ;
 ; CHECK-BMI-LABEL: pos_sel_constants:
 ; CHECK-BMI: # %bb.0:
-; CHECK-BMI-NEXT: sarl $31, %edi
-; CHECK-BMI-NEXT: notl %edi
-; CHECK-BMI-NEXT: andl $5, %edi
-; CHECK-BMI-NEXT: movl %edi, %eax
+; CHECK-BMI-NEXT: xorl %eax, %eax
+; CHECK-BMI-NEXT: testl %edi, %edi
+; CHECK-BMI-NEXT: setns %al
+; CHECK-BMI-NEXT: leal (%rax,%rax,4), %eax
 ; CHECK-BMI-NEXT: retq
 %tmp.1 = icmp sgt i32 %a, -1
 %retval = select i1 %tmp.1, i32 5, i32 0
 ret i32 %retval
 }

 ; Compare if positive and select of constants where one constant is zero and the other is a single bit.

 define i32 @pos_sel_special_constant(i32 %a) {
 ; CHECK-NOBMI-LABEL: pos_sel_special_constant:
 ; CHECK-NOBMI: # %bb.0:
 ; CHECK-NOBMI-NEXT: xorl %eax, %eax
 ; CHECK-NOBMI-NEXT: testl %edi, %edi
 ; CHECK-NOBMI-NEXT: setns %al
 ; CHECK-NOBMI-NEXT: shll $9, %eax
 ; CHECK-NOBMI-NEXT: retq
 ;
 ; CHECK-BMI-LABEL: pos_sel_special_constant:
 ; CHECK-BMI: # %bb.0:
-; CHECK-BMI-NEXT: shrl $22, %edi
-; CHECK-BMI-NEXT: notl %edi
-; CHECK-BMI-NEXT: andl $512, %edi # imm = 0x200
-; CHECK-BMI-NEXT: movl %edi, %eax
+; CHECK-BMI-NEXT: xorl %eax, %eax
+; CHECK-BMI-NEXT: testl %edi, %edi
+; CHECK-BMI-NEXT: setns %al
+; CHECK-BMI-NEXT: shll $9, %eax
 ; CHECK-BMI-NEXT: retq
 %tmp.1 = icmp sgt i32 %a, -1
 %retval = select i1 %tmp.1, i32 512, i32 0
 ret i32 %retval
 }

 ; Compare if positive and select variable or zero.
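
The old and new BMI sequences in these two tests are equivalent ways to select a constant on the sign of the input. The sketch below spells out both computations; the function names are invented for illustration, and the arithmetic shift is rewritten in unsigned arithmetic so the code does not depend on how a compiler shifts negative values.

```cpp
#include <cstdint>
#include <limits>

// pos_sel_constants, old output: sarl $31; notl; andl $5.
// sarl $31 broadcasts the sign bit; 0 - (U >> 31) gives the same all-ones
// or all-zeros value using only unsigned operations.
constexpr uint32_t oldSelect5(int32_t A) {
  return ~(0u - (static_cast<uint32_t>(A) >> 31)) & 5u;
}

// pos_sel_constants, new output: setns; leal (%rax,%rax,4), i.e. flag * 5.
constexpr uint32_t newSelect5(int32_t A) { return (A >= 0 ? 1u : 0u) * 5u; }

// pos_sel_special_constant, old output: shrl $22; notl; andl $512.
// The logical shift moves the sign bit (bit 31) down to bit 9.
constexpr uint32_t oldSelect512(int32_t A) {
  return ~(static_cast<uint32_t>(A) >> 22) & 512u;
}

// pos_sel_special_constant, new output: setns; shll $9.
constexpr uint32_t newSelect512(int32_t A) { return (A >= 0 ? 1u : 0u) << 9; }

static_assert(oldSelect5(-1) == newSelect5(-1), "negative input selects 0");
static_assert(oldSelect5(7) == newSelect5(7), "positive input selects 5");
static_assert(oldSelect512(std::numeric_limits<int32_t>::min()) ==
                  newSelect512(std::numeric_limits<int32_t>::min()),
              "most negative input selects 0");
static_assert(oldSelect512(1 << 22) == newSelect512(1 << 22),
              "bits below the sign bit do not leak into the result");
```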

llvm/trunk/test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll

 ; CHECK-NOBMI-NEXT: xorl $42, %edi
 ; CHECK-NOBMI-NEXT: andl %edx, %edi
 ; CHECK-NOBMI-NEXT: xorl $42, %edi
 ; CHECK-NOBMI-NEXT: movl %edi, %eax
 ; CHECK-NOBMI-NEXT: retq
 ;
 ; CHECK-BMI-LABEL: in_constant_varx_42:
 ; CHECK-BMI: # %bb.0:
+; CHECK-BMI-NEXT: xorl $42, %edi
 ; CHECK-BMI-NEXT: andl %edx, %edi
-; CHECK-BMI-NEXT: notl %edx
-; CHECK-BMI-NEXT: andl $42, %edx
-; CHECK-BMI-NEXT: orl %edi, %edx
-; CHECK-BMI-NEXT: movl %edx, %eax
+; CHECK-BMI-NEXT: xorl $42, %edi
+; CHECK-BMI-NEXT: movl %edi, %eax
 ; CHECK-BMI-NEXT: retq
 %n0 = xor i32 %x, 42 ; %x
 %n1 = and i32 %n0, %mask
 %r = xor i32 %n1, 42
 ret i32 %r
 }
 ; This is not a canonical form. Testing for completeness only.
 define i32 @out_constant_varx_42_invmask(i32 %x, i32 %y, i32 %mask) {
[27 unchanged lines not shown]
 ; CHECK-NOBMI-NEXT: xorl $42, %edi
 ; CHECK-NOBMI-NEXT: andl %edx, %edi
 ; CHECK-NOBMI-NEXT: xorl $42, %edi
 ; CHECK-NOBMI-NEXT: movl %edi, %eax
 ; CHECK-NOBMI-NEXT: retq
 ;
 ; CHECK-BMI-LABEL: in_constant_varx_42_invmask:
 ; CHECK-BMI: # %bb.0:
+; CHECK-BMI-NEXT: xorl $42, %edi
 ; CHECK-BMI-NEXT: andnl %edi, %edx, %eax
-; CHECK-BMI-NEXT: andl $42, %edx
-; CHECK-BMI-NEXT: orl %edx, %eax
+; CHECK-BMI-NEXT: xorl $42, %eax
 ; CHECK-BMI-NEXT: retq
 %notmask = xor i32 %mask, -1
 %n0 = xor i32 %x, 42 ; %x
 %n1 = and i32 %n0, %notmask
 %r = xor i32 %n1, 42
 ret i32 %r
 }
 define i32 @out_constant_mone_vary(i32 %x, i32 %y, i32 %mask) {