This is an archive of the discontinued LLVM Phabricator instance.

Differential D45733

[DAGCombiner] Unfold scalar masked merge if profitable
ClosedPublic

Authored by lebedev.ri on Apr 17 2018, 11:52 AM.

Details

Reviewers

spatel
craig.topper
RKSimon
javed.absar

Commits

rG95c6eaf530c1: [DAGCombiner] Unfold scalar masked merge if profitable
rL330646: [DAGCombiner] Unfold scalar masked merge if profitable

Summary

This is PR37104.

PR6773 will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, andl+andn/andps+andnps / bic/bsl would be generated. (see @out)
Now, they would no longer be generated (see @in).
So we need to make sure that they are still generated.

If the mask is constant, right now i always unfold it.
Else, i use hasAndNot() TLI hook.

For now, only handle scalars.

https://rise4fun.com/Alive/bO6
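
As a rough sanity check alongside the Alive link, the two forms of the masked merge can be compared by brute force. The sketch below is plain C++ over 8-bit values, not DAGCombiner code; the names `mergeFolded`, `mergeUnfolded` and `formsAgree` are made up for this illustration.

```cpp
#include <cstdint>

// Folded form, as produced by the IR canonicalization: xor, and, xor.
constexpr std::uint8_t mergeFolded(std::uint8_t x, std::uint8_t y,
                                   std::uint8_t m) {
  return static_cast<std::uint8_t>(((x ^ y) & m) ^ y);
}

// Unfolded form: and, and-not, or.
constexpr std::uint8_t mergeUnfolded(std::uint8_t x, std::uint8_t y,
                                     std::uint8_t m) {
  return static_cast<std::uint8_t>((x & m) | (y & ~m));
}

// Compare the two forms on every 8-bit (x, y, m) triple.
bool formsAgree() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned m = 0; m < 256; ++m) {
        const auto X = static_cast<std::uint8_t>(x);
        const auto Y = static_cast<std::uint8_t>(y);
        const auto M = static_cast<std::uint8_t>(m);
        if (mergeFolded(X, Y, M) != mergeUnfolded(X, Y, M))
          return false;
      }
  return true;
}
```

Both forms pick each result bit from x where the mask bit is set and from y where it is clear; they differ only in which instructions the backend can select for them.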

I *really* don't like the code i wrote in DAGCombiner::unfoldMaskedMerge().
It is super fragile. Is there something like IR Pattern Matchers for this?

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. Apr 17 2018, 11:52 AM

lebedev.ri edited the summary of this revision.

lebedev.ri added inline comments. Apr 17 2018, 3:20 PM

test/CodeGen/X86/unfold-masked-merge-scalar.ll
376–398	(On Diff #142809)	Hm, this didn't lower into `andn`, unlike the non-constant-mask variant above. I'm guessing `movabsq` is interfering?

If the mask is constant, right now i always unfold it.

Let me make sure I understand. The fold in question is:

%n0 = xor i4 %x, %y
%n1 = and i4 %n0, C1
%r  = xor i4 %n1, %y
=>
%mx = and i4 %x, C1
%my = and i4 %y, ~C1
%r = or i4 %mx, %my

If that's correct, we need to take a step back here. If the fold is universally good, then it can go in InstCombine, and there's no need to add code bloat to the DAG to handle the pattern unless something in the backend can create this pattern (seems unlikely).

But we need to take another step back before we add code bloat to InstCombine. Is there evidence that this pattern exists in source (bug report, test-suite, etc) and affects analysis/performance? If not, is it worth the cost of adding a matcher for the pattern? It's a simple matcher, so the expense bar is low...but if it never happens, do we care?

spatel mentioned this in D45655: [InstCombine][RFC] Canonicalize constant mask in masked merge mattern. Apr 18 2018, 8:09 AM

In D45733#1070963, @spatel wrote:
If the mask is constant, right now i always unfold it.

Let me make sure I understand. The fold in question is:
%n0 = xor i4 %x, %y
%n1 = and i4 %n0, C1
%r  = xor i4 %n1, %y
=>
%mx = and i4 %x, C1
%my = and i4 %y, ~C1
%r = or i4 %mx, %my

Yes.

If that's correct, we need to take a step back here. If the fold is universally good, then it can go in InstCombine

Yeah, that is the question, i'm having. I did look at mca output.
Here is what MCA says about that for -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75

diff.txt (1 KB)

Or is this a scheduling info problem?

and there's no need to add code bloat to the DAG to handle the pattern unless something in the backend can create this pattern (seems unlikely).

But we need to take another step back before we add code bloat to InstCombine. Is there evidence that this pattern exists in source (bug report, test-suite, etc) and affects analysis/performance? If not, is it worth the cost of adding a matcher for the pattern? It's a simple matcher, so the expense bar is low...but if it never happens, do we care?

In D45733#1071005, @lebedev.ri wrote:

Yeah, that is the question, i'm having. I did look at mca output.

Here is what MCA says about that for -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75

diff.txt (1 KB)

Or is this a scheduling info problem?

Cool - a chance to poke at llvm-mca! (cc @andreadb and @courbet)

First thing I see is that it's harder to get the sequence we're after on x86 using the basic source premise:

int andandor(int x, int y)  {
  __asm volatile("# LLVM-MCA-BEGIN ands");
  int r = (x & 42) | (y & ~42);
  __asm volatile("# LLVM-MCA-END ands");
  return r;
}

int xorandxor(int x, int y) {
  __asm volatile("# LLVM-MCA-BEGIN xors");
  int r = ((x ^ y) & 42) ^ y;
  __asm volatile("# LLVM-MCA-END xors");
  return r;
}

...because the input param register doesn't match the output result register. We'd have to hack that in asm...or put the code in a loop, but subtract the loop overhead somehow. Things work/look alright to me other than that.

I don't know AArch that well, but your example is a special-case that may be going wrong. Ie, if we have a bit-string constant like 0xff000000, you could get:
bfxil w0, w1, #0, #24
...which should certainly be better than:
eor w8, w1, w0
and w8, w8, #0xff000000
eor w0, w8, w1

AArch64 chose to convert to shift + possibly more expensive bfi for the 0x00ffff00 constant though. That's not something that we can account for in generic DAGCombiner, so I'd categorize that as an AArch64-specific bug (either don't use bfi there or fix the scheduling model or fix this up in MI somehow).
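
To illustrate why a contiguous mask such as 0xff000000 can collapse to a single bitfield insert, here is a small sketch. It assumes the usual reading of `bfxil w0, w1, #0, #24` as "copy the low 24 bits of w1 into w0 and leave the rest of w0 alone"; `insertLowBits` is a hypothetical helper modelling only that lsb = 0 case, not the instruction in general.

```cpp
#include <cstdint>

// Folded masked merge, matching the eor/and/eor sequence quoted above.
constexpr std::uint32_t maskedMerge(std::uint32_t x, std::uint32_t y,
                                    std::uint32_t m) {
  return ((x ^ y) & m) ^ y;
}

// Mask with the low `width` bits set (width in [0, 32]).
constexpr std::uint32_t lowMask(unsigned width) {
  return width >= 32 ? 0xffffffffu : ((1u << width) - 1u);
}

// Hypothetical model of a bitfield insert at lsb = 0: replace the low
// `width` bits of dst with the low `width` bits of src.
constexpr std::uint32_t insertLowBits(std::uint32_t dst, std::uint32_t src,
                                      unsigned width) {
  return (dst & ~lowMask(width)) | (src & lowMask(width));
}

// With m = 0xff000000 the merge keeps the top 8 bits of x and takes the
// low 24 bits from y, which is the same thing as the insert.
static_assert(maskedMerge(0x12345678u, 0xabcdef01u, 0xff000000u) ==
                  insertLowBits(0x12345678u, 0xabcdef01u, 24),
              "contiguous-mask merge should equal a low-bits insert");
```

A mask like 0x00ffff00 does not start at bit 0, so it needs the shifted form discussed above, which is where the cost question comes in.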

In D45733#1071051, @spatel wrote:
In D45733#1071005, @lebedev.ri wrote:

Yeah, that is the question, i'm having. I did look at mca output.

Here is what MCA says about that for -mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75

diff.txt (1 KB)

Or is this a scheduling info problem?

Cool - a chance to poke at llvm-mca! (cc @andreadb and @courbet)

First thing I see is that it's harder to get the sequence we're after on x86 using the basic source premise:
int andandor(int x, int y)  {
  __asm volatile("# LLVM-MCA-BEGIN ands");
  int r = (x & 42) | (y & ~42);
  __asm volatile("# LLVM-MCA-END ands");
  return r;
}

int xorandxor(int x, int y) {
  __asm volatile("# LLVM-MCA-BEGIN xors");
  int r = ((x ^ y) & 42) ^ y;
  __asm volatile("# LLVM-MCA-END xors");
  return r;
}
...because the input param register doesn't match the output result register. We'd have to hack that in asm...or put the code in a loop, but subtract the loop overhead somehow. Things work/look alright to me other than that.

I simply stored the lhs and rhs side of // CHECK lines from aarch64's @in32_constmask in two local files,
run llvm-mca on each of them, and diffed the output, no clang was involved.

I don't know AArch that well, but your example is a special-case that may be going wrong. Ie, if we have a bit-string constant like 0xff000000, you could get:
bfxil w0, w1, #0, #24
...which should certainly be better than:
eor w8, w1, w0
and w8, w8, #0xff000000
eor w0, w8, w1

AArch64 chose to convert to shift + possibly more expensive bfi for the 0x00ffff00 constant though. That's not something that we can account for in generic DAGCombiner, so I'd categorize that as an AArch64-specific bug (either don't use bfi there or fix the scheduling model or fix this up in MI somehow).

Ok, then let's assume until proven otherwise that if mask is a constant, unfolded variant is always better.
I'll unfold it in instcombine (since it seems the D45664 will already match the masked merge pattern, so it would not add much code).

In D45733#1070963, @spatel wrote:

there's no need to add code bloat to the DAG to handle the pattern unless something in the backend can create this pattern (seems unlikely).

I don't think it will be possible to check that until after the instcombine part has landed, so ok, at least for now i will stop unfolding [constant mask] in dagcombine.

While there, any hint re pattern matchers for this code?

In D45733#1071068, @lebedev.ri wrote:

I don't think it will be possible to check that until after the instcombine part has landed, so ok, at least for now i will stop unfolding [constant mask] in dagcombine.

While there, any hint re pattern matchers for this code?

Unfortunately, DAG nodes don't have any equivalent match() infrastructure like IR that I know of. The commuted variants are what complicate this? Usually, I think we just std::swap() our way to the answer here in the DAG.

In D45733#1071088, @spatel wrote:

In D45733#1071068, @lebedev.ri wrote:

I don't think it will be possible to check that until after the instcombine part has landed, so ok, at least for now i will stop unfolding [constant mask] in dagcombine.

While there, any hint re pattern matchers for this code?

Unfortunately, DAG nodes don't have any equivalent match() infrastructure like IR that I know of.

Boo :(

The commuted variants are what complicate this?

Yes.

Usually, I think we just std::swap() our way to the answer here in the DAG.

Hmm, while i can see that working in many simple cases, i'm not sure that will be enough here.

Currently llvm-mca doesn't know how to resolve variant scheduling classes.
This problem mostly affects the ARM target.
This has been reported here: https://bugs.llvm.org/show_bug.cgi?id=36672

The number of micro opcodes that you see in the llvm-mca output is the
default (invalid) number of micro opcodes for instructions associated with
a sched-variant class.

I plan to send a patch to address (most of) the issues related to the
presence of variant scheduling classes. However, keep in mind that ARM
sched-predicates heavily rely on TII hooks. Those are going to cause
problems for tools like mca (i.e. there is not an easy way to "fix" them).

At the moment, llvm-mca doesn't know how to analyze these two instructions,
since both are associated with a variant scheduling class:

 eor     w8, w0, w1
 mov w0, w1

- {F5972356, layout=link}

Rebased on top of revised tests.
Stop handling cases with a constant mask; InstCombine should unfold them.

lebedev.ri mentioned this in D45866: [InstCombine][NFC] Add tests for unfolding masked merge with constant mask. Apr 20 2018, 2:11 AM

lebedev.ri mentioned this in D45867: [InstCombine] Unfold masked merge with constant mask. Apr 20 2018, 2:13 AM

NFC, rebased on top of rebased tests with CFI noise dropped.

Herald added a reviewer: javed.absar. Apr 20 2018, 11:50 AM

NFC, rebased.

spatel added inline comments. Apr 23 2018, 9:00 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

5381–5421

After stepping through more of your tests, I see why this is ugly.

We don't have to capture the intermediate values if the hasOneUse() checks are in the lambda(s) though. What do you think of this version:

// There are 3 commutable operators in the pattern, so we have to deal with
// 8 possible variants of the basic pattern.
SDValue X, Y, M;
auto matchAndXor = [&X,&Y,&M](SDValue And, unsigned XorIdx, SDValue Other) {
  if (And.getOpcode() != ISD::AND || !And.hasOneUse())
    return false;
  if (And.getOperand(XorIdx).getOpcode() != ISD::XOR ||
      !And.getOperand(XorIdx).hasOneUse())
    return false;
  SDValue Xor0 = And.getOperand(XorIdx).getOperand(0);
  SDValue Xor1 = And.getOperand(XorIdx).getOperand(1);
  if (Other == Xor0) std::swap(Xor0, Xor1);
  if (Other != Xor1) return false;
  X = Xor0;
  Y = Xor1;
  M = And.getOperand(1);
  return true;
};
if (!matchAndXor(A, 0, B) && !matchAndXor(A, 1, B) &&
    !matchAndXor(B, 0, A) && !matchAndXor(B, 1, A))
  return SDValue();

spatel added inline comments. Apr 23 2018, 9:10 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5381–5421	Oops - forgot to swap the M capture: M = And.getOperand(1); should be: M = And.getOperand(XorIdx ? 0 : 1);
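
For readers without an LLVM checkout, the suggested matcher (with the `M` correction applied) can be exercised on a toy expression tree. This is only a sketch: `Node`, `Op` and `Match` are hypothetical stand-ins, not LLVM's `SDValue`/`SDNode`, and the `hasOneUse()` checks are left out because the toy tree does not track uses.

```cpp
#include <utility>

// Toy stand-in for a DAG node (hypothetical; not LLVM's SDValue).
enum class Op { Leaf, And, Xor };

struct Node {
  Op op;
  const Node *lhs;
  const Node *rhs;
};

struct Match {
  const Node *X = nullptr;
  const Node *Y = nullptr;
  const Node *M = nullptr;
};

// Match And == (xor & mask) with the xor at operand XorIdx, where Other
// must be one of the xor's operands (it becomes Y).
bool matchAndXor(const Node *And, unsigned XorIdx, const Node *Other,
                 Match &Out) {
  if (And->op != Op::And)
    return false;
  const Node *Xor = XorIdx ? And->rhs : And->lhs;
  if (Xor->op != Op::Xor)
    return false;
  const Node *Xor0 = Xor->lhs;
  const Node *Xor1 = Xor->rhs;
  if (Other == Xor0)
    std::swap(Xor0, Xor1);
  if (Other != Xor1)
    return false;
  Out.X = Xor0;
  Out.Y = Xor1;
  Out.M = XorIdx ? And->lhs : And->rhs; // the mask is the non-xor operand
  return true;
}

// Try the root xor in both operand orders and both 'and' operand orders.
bool matchMaskedMerge(const Node *Root, Match &Out) {
  if (Root->op != Op::Xor)
    return false;
  const Node *A = Root->lhs;
  const Node *B = Root->rhs;
  return matchAndXor(A, 0, B, Out) || matchAndXor(A, 1, B, Out) ||
         matchAndXor(B, 0, A, Out) || matchAndXor(B, 1, A, Out);
}

// Build all 8 commuted variants of ((x ^ y) & m) ^ y and count how many
// match with the expected X, Y and M bindings.
int countMatchedVariants() {
  Node x{Op::Leaf, nullptr, nullptr};
  Node y{Op::Leaf, nullptr, nullptr};
  Node m{Op::Leaf, nullptr, nullptr};
  int matched = 0;
  for (int bits = 0; bits < 8; ++bits) {
    // bit 0 swaps the inner xor, bit 1 the 'and', bit 2 the root xor.
    Node d = (bits & 1) ? Node{Op::Xor, &y, &x} : Node{Op::Xor, &x, &y};
    Node a = (bits & 2) ? Node{Op::And, &m, &d} : Node{Op::And, &d, &m};
    Node r = (bits & 4) ? Node{Op::Xor, &y, &a} : Node{Op::Xor, &a, &y};
    Match out;
    if (matchMaskedMerge(&r, out) && out.X == &x && out.Y == &y &&
        out.M == &m)
      ++matched;
  }
  return matched;
}
```

The single `std::swap` covers the inner xor's commutation, the `XorIdx` parameter covers the `and`, and calling with `(A, B)` and `(B, A)` covers the root xor, which is how four calls handle eight variants.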

Update with @spatel's suggested matcher.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5381–5421	Well, the test changes are the same, and the code looks a bit shorter.. Though it is much harder to read.

LGTM.

This revision is now accepted and ready to land. Apr 23 2018, 12:41 PM

In D45733#1075858, @spatel wrote:

LGTM.

Thank you for the review!

Closed by commit rL330646: [DAGCombiner] Unfold scalar masked merge if profitable (authored by lebedevri). Apr 23 2018, 1:42 PM

This revision was automatically updated to reflect the committed changes.

It seems this has uncovered something.
It does not look like a miscompilation to me (FIXME or is it?), but the produced code is certainly worse:

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=+bmi | FileCheck %s
 
 define float @test_andnotps_scalar(float %a0, float %a1, float* %a2) {
 ; CHECK-LABEL: test_andnotps_scalar:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    movd %xmm0, %eax
-; CHECK-NEXT:    movd %xmm1, %ecx
-; CHECK-NEXT:    andnl %ecx, %eax, %eax
-; CHECK-NEXT:    movd {{.*#+}} xmm1 = mem[0],zero,zero,zero
-; CHECK-NEXT:    notl %eax
-; CHECK-NEXT:    movd %eax, %xmm0
+; CHECK-NEXT:    movd %xmm1, %eax
+; CHECK-NEXT:    movd {{.*#+}} xmm2 = mem[0],zero,zero,zero
 ; CHECK-NEXT:    pand %xmm1, %xmm0
+; CHECK-NEXT:    movd %xmm0, %ecx
+; CHECK-NEXT:    notl %eax
+; CHECK-NEXT:    orl %ecx, %eax
+; CHECK-NEXT:    movd %eax, %xmm0
+; CHECK-NEXT:    pand %xmm2, %xmm0
 ; CHECK-NEXT:    retq
   %tmp = bitcast float %a0 to i32
   %tmp1 = bitcast float %a1 to i32
   %tmp2 = xor i32 %tmp, -1
   %tmp3 = and i32 %tmp2, %tmp1
   %tmp4 = load float, float* %a2, align 16
   %tmp5 = bitcast float %tmp4 to i32
   %tmp6 = xor i32 %tmp3, -1
   %tmp7 = and i32 %tmp5, %tmp6
   %tmp8 = bitcast i32 %tmp7 to float
   ret float %tmp8
 }

We lost andnl.
Discovered accidentally because the same happened to @test_andnotps/@test_andnotpd in test/CodeGen/X86/*-schedule.ll (they are no longer lowered to andnps/andnpd).

In D45733#1077183, @lebedev.ri wrote:

It seems this has uncovered something.
It does not look like a miscompilation to me (FIXME or is it?), but the produced code is certainly worse:

We lost andnl.
Discovered accidentally because the same happened to @test_andnotps/@test_andnotpd in test/CodeGen/X86/*-schedule.ll (they are no longer lowered to andnps/andnpd).

And it happened because both xor's have the same [constant] operand - -1.
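
A short sketch of why this shape is caught: with -1 playing the role of y in both xors, `((a ^ -1) & b) ^ -1` fits `((x ^ y) & m) ^ y` with x = a and m = b. The function names below are made up for illustration, and the sample-based comparison is a spot check rather than a proof.

```cpp
#include <cstdint>

// Shape of the IR in @test_andnotps_scalar: xor(and(xor(a, -1), b), -1).
constexpr std::uint32_t originalForm(std::uint32_t a, std::uint32_t b) {
  return ((a ^ 0xffffffffu) & b) ^ 0xffffffffu;
}

// The same expression after unfolding with x = a, y = -1, m = b:
// (x & m) | (y & ~m).
constexpr std::uint32_t unfoldedForm(std::uint32_t a, std::uint32_t b) {
  return (a & b) | (0xffffffffu & ~b);
}

// Compare both forms on a handful of sample inputs.
bool formsAgreeOnSamples() {
  const std::uint32_t samples[] = {0u,          1u,          0x0f0f0f0fu,
                                   0x00ff00ffu, 0x80000000u, 0xffffffffu};
  for (std::uint32_t a : samples)
    for (std::uint32_t b : samples)
      if (originalForm(a, b) != unfoldedForm(a, b))
        return false;
  return true;
}
```

The values agree, which is consistent with this being a code-quality regression rather than a miscompilation: the unfolded form needs an `and`, a `not` and an `or` where the original needed only an and-not followed by a `not`.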

Diffusion mentioned this in rL330771: [X86][AArch64][NFC] Add tests for masked merge unfolding with %y = const. Apr 24 2018, 2:26 PM

Diffusion mentioned this in rL331204: [InstCombine][NFC] Add tests for unfolding masked merge with constant mask. Apr 30 2018, 11:05 AM

Diffusion mentioned this in rL331205: [InstCombine] Unfold masked merge with constant mask.

andreadb mentioned this in D46031: [DAGCombiner] Masked merge: if 'B' is constant, de-canonicalize the pattern (invert the mask). May 4 2018, 6:43 AM

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	90 lines
test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll	114 lines
test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll	109 lines

Diff 143460

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[408 lines not shown] private:
     SDValue SimplifySelect(const SDLoc &DL, SDValue N0, SDValue N1, SDValue N2);
     SDValue SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
                              SDValue N2, SDValue N3, ISD::CondCode CC,
                              bool NotExtCompare = false);
     SDValue foldSelectCCToShiftAnd(const SDLoc &DL, SDValue N0, SDValue N1,
                                    SDValue N2, SDValue N3, ISD::CondCode CC);
     SDValue foldLogicOfSetCCs(bool IsAnd, SDValue N0, SDValue N1,
                               const SDLoc &DL);
+    SDValue unfoldMaskedMerge(SDNode *N);
     SDValue SimplifySetCC(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond,
                           const SDLoc &DL, bool foldBooleans);
     SDValue rebuildSetCC(SDValue N);

     bool isSetCCEquivalent(SDValue N, SDValue &LHS, SDValue &RHS,
                            SDValue &CC) const;
     bool isOneUseSetCC(SDValue N) const;

[4,931 lines not shown] SDValue DAGCombiner::MatchLoadCombine(SDNode *N) {

   // Transfer chain users from old loads to the new load.
   for (LoadSDNode *L : Loads)
     DAG.ReplaceAllUsesOfValueWith(SDValue(L, 1), SDValue(NewLoad.getNode(), 1));

   return NeedsBswap ? DAG.getNode(ISD::BSWAP, SDLoc(N), VT, NewLoad) : NewLoad;
 }

+// If the target has andn, bsl, or a similar bit-select instruction,
+// we want to unfold masked merge, with canonical pattern of:
+//   |        A  |  |B|
+//   ((x ^ y) & m) ^ y
+//    |  D  |
+// Into:
+//   (x & m) | (y & ~m)
+SDValue DAGCombiner::unfoldMaskedMerge(SDNode *N) {
+  assert(N->getOpcode() == ISD::XOR);
+
+  EVT VT = N->getValueType(0);
+
+  // FIXME
+  if (VT.isVector())
+    return SDValue();
+
+  auto matchD = [](SDValue D, SDValue Y) -> llvm::Optional<SDValue> /*X*/ {
+    if (D.getOpcode() != ISD::XOR)
+      return llvm::None;
+    SDValue D0 = D->getOperand(0);
+    SDValue D1 = D->getOperand(1);
+    if (D1 == Y)
+      return D0;
+    else if (D0 == Y)
+      return D1;
+    return llvm::None;
+  };
+
+  SDValue A, D, X, Y, M;
+
+  auto matchA = [matchD, &D, &X, &Y, &M](SDValue A, SDValue B) -> bool {
+    if (A.getOpcode() != ISD::AND)
+      return false;
+    SDValue A0 = A.getOperand(0);
+    SDValue A1 = A.getOperand(1);
+    if (auto X_ = matchD(A0, B)) {
+      X = *X_;
+      D = A0;
+      M = A1;
+      Y = B;
+      return true;
+    } else if (auto X_ = matchD(A1, B)) {
+      X = *X_;
+      D = A1;
+      M = A0;
+      Y = B;
+      return true;
+    }
+    return false;
+  };
+
+  if (matchA(N->getOperand(0), N->getOperand(1)))
+    A = N->getOperand(0);
+  else if (matchA(N->getOperand(1), N->getOperand(0)))
+    A = N->getOperand(1);
+  else
+    return SDValue();
+
+  assert(A.getOpcode() == ISD::AND);
+  assert(D.getOpcode() == ISD::XOR);
+
+  // 'A' and 'D' part will be replaced completely.
+  // Don't proceed they can't be dropped.
+  if (!(A.hasOneUse() && D.hasOneUse()))
+    return SDValue();
+
+  // Don't do anything if the mask is constant. This should not be reachable.
+  // InstCombine should have already unfolded this pattern, and DAGCombiner
+  // probably shouldn't produce it, too.
+  if (isa<ConstantSDNode>(M.getNode()))
+    return SDValue();
+
+  // We can transform if the target has AndNot
+  if (!TLI.hasAndNot(M))
+    return SDValue();
+
+  SDLoc DL(N);
+
+  SDValue LHS = DAG.getNode(ISD::AND, DL, VT, X, M);
+  SDValue NotM = DAG.getNOT(DL, M, VT);
+  SDValue RHS = DAG.getNode(ISD::AND, DL, VT, Y, NotM);
+
+  return DAG.getNode(ISD::OR, DL, VT, LHS, RHS);
+}
+
 SDValue DAGCombiner::visitXOR(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   EVT VT = N0.getValueType();

   // fold vector ops
   if (VT.isVector()) {
     if (SDValue FoldedVOp = SimplifyVBinOp(N))

[139 lines not shown] return DAG.getNode(ISD::ROTL, DL, VT, DAG.getConstant(~1, DL, VT),
                        N0.getOperand(1));
   }

   // Simplify: xor (op x...), (op y...) -> (op (xor x, y))
   if (N0.getOpcode() == N1.getOpcode())
     if (SDValue Tmp = SimplifyBinOpWithSameOpcodeHands(N))
       return Tmp;

+  // Unfold ((x ^ y) & m) ^ y into (x & m) | (y & ~m) if profitable
+  if (SDValue MM = unfoldMaskedMerge(N))
+    return MM;
+
   // Simplify the expression using non-local knowledge.
   if (SimplifyDemandedBits(SDValue(N, 0)))
     return SDValue(N, 0);

   return SDValue();
 }

 /// Handle transforms common to the three shifts, when the shift amount is a

[12,387 lines not shown]

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll

[59 lines not shown]
 }
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Should be the same as the previous one.
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

 define i8 @in8(i8 %x, i8 %y, i8 %mask) {
 ; CHECK-LABEL: in8:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w8, w2
-; CHECK-NEXT:    eor w0, w8, w1
+; CHECK-NEXT:    and w8, w0, w2
+; CHECK-NEXT:    bic w9, w1, w2
+; CHECK-NEXT:    orr w0, w8, w9
 ; CHECK-NEXT:    ret
   %n0 = xor i8 %x, %y
   %n1 = and i8 %n0, %mask
   %r = xor i8 %n1, %y
   ret i8 %r
 }

 define i16 @in16(i16 %x, i16 %y, i16 %mask) {
 ; CHECK-LABEL: in16:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w8, w2
-; CHECK-NEXT:    eor w0, w8, w1
+; CHECK-NEXT:    and w8, w0, w2
+; CHECK-NEXT:    bic w9, w1, w2
+; CHECK-NEXT:    orr w0, w8, w9
 ; CHECK-NEXT:    ret
   %n0 = xor i16 %x, %y
   %n1 = and i16 %n0, %mask
   %r = xor i16 %n1, %y
   ret i16 %r
 }

 define i32 @in32(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in32:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w8, w2
-; CHECK-NEXT:    eor w0, w8, w1
+; CHECK-NEXT:    bic w8, w1, w2
+; CHECK-NEXT:    and w9, w0, w2
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %n0 = xor i32 %x, %y
   %n1 = and i32 %n0, %mask
   %r = xor i32 %n1, %y
   ret i32 %r
 }

 define i64 @in64(i64 %x, i64 %y, i64 %mask) {
 ; CHECK-LABEL: in64:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor x8, x0, x1
-; CHECK-NEXT:    and x8, x8, x2
-; CHECK-NEXT:    eor x0, x8, x1
+; CHECK-NEXT:    bic x8, x1, x2
+; CHECK-NEXT:    and x9, x0, x2
+; CHECK-NEXT:    orr x0, x9, x8
 ; CHECK-NEXT:    ret
   %n0 = xor i64 %x, %y
   %n1 = and i64 %n0, %mask
   %r = xor i64 %n1, %y
   ret i64 %r
 }
 ; ============================================================================ ;
 ; Commutativity tests.
 ; ============================================================================ ;
 define i32 @in_commutativity_0_0_1(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_commutativity_0_0_1:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w2, w8
-; CHECK-NEXT:    eor w0, w8, w1
+; CHECK-NEXT:    bic w8, w1, w2
+; CHECK-NEXT:    and w9, w0, w2
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %n0 = xor i32 %x, %y
   %n1 = and i32 %mask, %n0 ; swapped
   %r = xor i32 %n1, %y
   ret i32 %r
 }
 define i32 @in_commutativity_0_1_0(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_commutativity_0_1_0:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w8, w2
-; CHECK-NEXT:    eor w0, w1, w8
+; CHECK-NEXT:    bic w8, w1, w2
+; CHECK-NEXT:    and w9, w0, w2
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %n0 = xor i32 %x, %y
   %n1 = and i32 %n0, %mask
   %r = xor i32 %y, %n1 ; swapped
   ret i32 %r
 }
 define i32 @in_commutativity_0_1_1(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_commutativity_0_1_1:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w2, w8
-; CHECK-NEXT:    eor w0, w1, w8
+; CHECK-NEXT:    bic w8, w1, w2
+; CHECK-NEXT:    and w9, w0, w2
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %n0 = xor i32 %x, %y
   %n1 = and i32 %mask, %n0 ; swapped
   %r = xor i32 %y, %n1 ; swapped
   ret i32 %r
 }
 define i32 @in_commutativity_1_0_0(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_commutativity_1_0_0:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w8, w2
-; CHECK-NEXT:    eor w0, w8, w0
+; CHECK-NEXT:    bic w8, w0, w2
+; CHECK-NEXT:    and w9, w1, w2
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %n0 = xor i32 %x, %y
   %n1 = and i32 %n0, %mask
   %r = xor i32 %n1, %x ; %x instead of %y
   ret i32 %r
 }
 define i32 @in_commutativity_1_0_1(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_commutativity_1_0_1:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w2, w8
-; CHECK-NEXT:    eor w0, w8, w0
+; CHECK-NEXT:    bic w8, w0, w2
+; CHECK-NEXT:    and w9, w1, w2
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %n0 = xor i32 %x, %y
   %n1 = and i32 %mask, %n0 ; swapped
   %r = xor i32 %n1, %x ; %x instead of %y
   ret i32 %r
 }
 define i32 @in_commutativity_1_1_0(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_commutativity_1_1_0:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w8, w2
-; CHECK-NEXT:    eor w0, w0, w8
+; CHECK-NEXT:    bic w8, w0, w2
+; CHECK-NEXT:    and w9, w1, w2
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %n0 = xor i32 %x, %y
   %n1 = and i32 %n0, %mask
   %r = xor i32 %x, %n1 ; swapped, %x instead of %y
   ret i32 %r
 }
 define i32 @in_commutativity_1_1_1(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_commutativity_1_1_1:
 ; CHECK:       // %bb.0:
-; CHECK-NEXT:    eor w8, w0, w1
-; CHECK-NEXT:    and w8, w2, w8
-; CHECK-NEXT:    eor w0, w0, w8
+; CHECK-NEXT:    bic w8, w0, w2
+; CHECK-NEXT:    and w9, w1, w2
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %n0 = xor i32 %x, %y
   %n1 = and i32 %mask, %n0 ; swapped
   %r = xor i32 %x, %n1 ; swapped, %x instead of %y
   ret i32 %r
 }
 ; ============================================================================ ;
 ; Y is an 'and' too.
 ; ============================================================================ ;
 define i32 @in_complex_y0(i32 %x, i32 %y_hi, i32 %y_low, i32 %mask) {
 ; CHECK-LABEL: in_complex_y0:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    and w8, w1, w2
-; CHECK-NEXT:    eor w9, w0, w8
-; CHECK-NEXT:    and w9, w9, w3
-; CHECK-NEXT:    eor w0, w9, w8
+; CHECK-NEXT:    and w9, w0, w3
+; CHECK-NEXT:    bic w8, w8, w3
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %y = and i32 %y_hi, %y_low
   %n0 = xor i32 %x, %y
   %n1 = and i32 %n0, %mask
   %r = xor i32 %n1, %y
   ret i32 %r
 }
 define i32 @in_complex_y1(i32 %x, i32 %y_hi, i32 %y_low, i32 %mask) {
 ; CHECK-LABEL: in_complex_y1:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    and w8, w1, w2
-; CHECK-NEXT:    eor w9, w0, w8
-; CHECK-NEXT:    and w9, w9, w3
-; CHECK-NEXT:    eor w0, w8, w9
+; CHECK-NEXT:    and w9, w0, w3
+; CHECK-NEXT:    bic w8, w8, w3
+; CHECK-NEXT:    orr w0, w9, w8
 ; CHECK-NEXT:    ret
   %y = and i32 %y_hi, %y_low
   %n0 = xor i32 %x, %y
   %n1 = and i32 %n0, %mask
   %r = xor i32 %y, %n1
   ret i32 %r
 }
 ; ============================================================================ ;
 ; M is an 'xor' too.
 ; ============================================================================ ;
 define i32 @in_complex_m0(i32 %x, i32 %y, i32 %m_a, i32 %m_b) {
 ; CHECK-LABEL: in_complex_m0:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    eor w8, w2, w3
	; CHECK-NEXT: eor w9, w0, w1			; CHECK-NEXT: bic w9, w1, w8
	; CHECK-NEXT: and w8, w9, w8			; CHECK-NEXT: and w8, w0, w8
	; CHECK-NEXT: eor w0, w8, w1			; CHECK-NEXT: orr w0, w8, w9
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_m1(i32 %x, i32 %y, i32 %m_a, i32 %m_b) {			define i32 @in_complex_m1(i32 %x, i32 %y, i32 %m_a, i32 %m_b) {
	; CHECK-LABEL: in_complex_m1:			; CHECK-LABEL: in_complex_m1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: eor w8, w2, w3			; CHECK-NEXT: eor w8, w2, w3
	; CHECK-NEXT: eor w9, w0, w1			; CHECK-NEXT: bic w9, w1, w8
	; CHECK-NEXT: and w8, w8, w9			; CHECK-NEXT: and w8, w0, w8
	; CHECK-NEXT: eor w0, w8, w1			; CHECK-NEXT: orr w0, w8, w9
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0			%n1 = and i32 %mask, %n0
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	; ============================================================================ ;			; ============================================================================ ;
	; Both Y and M are complex.			; Both Y and M are complex.
	; ============================================================================ ;			; ============================================================================ ;
	define i32 @in_complex_y0_m0(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {			define i32 @in_complex_y0_m0(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {
	; CHECK-LABEL: in_complex_y0_m0:			; CHECK-LABEL: in_complex_y0_m0:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: and w8, w1, w2			; CHECK-NEXT: and w8, w1, w2
	; CHECK-NEXT: eor w9, w3, w4			; CHECK-NEXT: eor w9, w3, w4
	; CHECK-NEXT: eor w10, w0, w8			; CHECK-NEXT: bic w8, w8, w9
	; CHECK-NEXT: and w9, w10, w9			; CHECK-NEXT: and w9, w0, w9
	; CHECK-NEXT: eor w0, w9, w8			; CHECK-NEXT: orr w0, w9, w8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_y1_m0(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {			define i32 @in_complex_y1_m0(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {
	; CHECK-LABEL: in_complex_y1_m0:			; CHECK-LABEL: in_complex_y1_m0:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: and w8, w1, w2			; CHECK-NEXT: and w8, w1, w2
	; CHECK-NEXT: eor w9, w3, w4			; CHECK-NEXT: eor w9, w3, w4
	; CHECK-NEXT: eor w10, w0, w8			; CHECK-NEXT: bic w8, w8, w9
	; CHECK-NEXT: and w9, w10, w9			; CHECK-NEXT: and w9, w0, w9
	; CHECK-NEXT: eor w0, w8, w9			; CHECK-NEXT: orr w0, w9, w8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %y, %n1			%r = xor i32 %y, %n1
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_y0_m1(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {			define i32 @in_complex_y0_m1(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {
	; CHECK-LABEL: in_complex_y0_m1:			; CHECK-LABEL: in_complex_y0_m1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: and w8, w1, w2			; CHECK-NEXT: and w8, w1, w2
	; CHECK-NEXT: eor w9, w3, w4			; CHECK-NEXT: eor w9, w3, w4
	; CHECK-NEXT: eor w10, w0, w8			; CHECK-NEXT: bic w8, w8, w9
	; CHECK-NEXT: and w9, w9, w10			; CHECK-NEXT: and w9, w0, w9
	; CHECK-NEXT: eor w0, w9, w8			; CHECK-NEXT: orr w0, w9, w8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0			%n1 = and i32 %mask, %n0
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_y1_m1(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {			define i32 @in_complex_y1_m1(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {
	; CHECK-LABEL: in_complex_y1_m1:			; CHECK-LABEL: in_complex_y1_m1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: and w8, w1, w2			; CHECK-NEXT: and w8, w1, w2
	; CHECK-NEXT: eor w9, w3, w4			; CHECK-NEXT: eor w9, w3, w4
	; CHECK-NEXT: eor w10, w0, w8			; CHECK-NEXT: bic w8, w8, w9
	; CHECK-NEXT: and w9, w9, w10			; CHECK-NEXT: and w9, w0, w9
	; CHECK-NEXT: eor w0, w8, w9			; CHECK-NEXT: orr w0, w9, w8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0			%n1 = and i32 %mask, %n0
	%r = xor i32 %y, %n1			%r = xor i32 %y, %n1
	ret i32 %r			ret i32 %r
	}			}
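The AArch64 checks above all rest on one bitwise identity: the folded form `((x ^ y) & mask) ^ y` selects the same bits as the unfolded form `(x & mask) | (y & ~mask)`, and xor-ing with `%x` instead of `%y` at the end (the `in_commutativity_1_*` tests) simply swaps which operand is taken where the mask is set. The following is a minimal sketch of that identity in plain C++, not code from the patch; the function names are hypothetical, and the 4-bit exhaustive loop is only an illustration (bitwise operations act on each bit independently, so a small width already covers every per-bit case).

```cpp
#include <cassert>
#include <cstdint>

// Folded form, as written in the tests' IR: ((x ^ y) & mask) ^ y.
// (Hypothetical helper name, for illustration only.)
constexpr uint32_t maskedMergeFolded(uint32_t x, uint32_t y, uint32_t mask) {
  return ((x ^ y) & mask) ^ y;
}

// Unfolded form: take x where mask is set, y where it is clear.
// On AArch64 the (y & ~mask) half corresponds to a single `bic`.
constexpr uint32_t maskedMergeUnfolded(uint32_t x, uint32_t y, uint32_t mask) {
  return (x & mask) | (y & ~mask);
}

// Variant from the in_commutativity_1_* tests: the final xor uses x,
// so the roles of x and y are exchanged.
constexpr uint32_t maskedMergeFoldedSwapped(uint32_t x, uint32_t y,
                                            uint32_t mask) {
  return ((x ^ y) & mask) ^ x;
}

// Exhaustively compare the forms over all 4-bit inputs.
bool formsAgreeOn4Bit() {
  for (uint32_t x = 0; x < 16; ++x)
    for (uint32_t y = 0; y < 16; ++y)
      for (uint32_t m = 0; m < 16; ++m) {
        if (maskedMergeFolded(x, y, m) != maskedMergeUnfolded(x, y, m))
          return false;
        if (maskedMergeFoldedSwapped(x, y, m) != maskedMergeUnfolded(y, x, m))
          return false;
      }
  return true;
}
```

Calling `formsAgreeOn4Bit()` is enough to confirm the equivalence for this sketch; it says nothing about profitability, which the patch leaves to the target.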
	[84 lines not shown]

test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll

	[106 lines not shown]
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in8:			; CHECK-BMI-LABEL: in8:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andl %edx, %edi
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax			; CHECK-BMI-NEXT: # kill: def $al killed $al killed $eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i8 %x, %y			%n0 = xor i8 %x, %y
	%n1 = and i8 %n0, %mask			%n1 = and i8 %n0, %mask
	%r = xor i8 %n1, %y			%r = xor i8 %n1, %y
	ret i8 %r			ret i8 %r
	}			}

	define i16 @in16(i16 %x, i16 %y, i16 %mask) {			define i16 @in16(i16 %x, i16 %y, i16 %mask) {
	; CHECK-NOBMI-LABEL: in16:			; CHECK-NOBMI-LABEL: in16:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in16:			; CHECK-BMI-LABEL: in16:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andl %edx, %edi
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax			; CHECK-BMI-NEXT: # kill: def $ax killed $ax killed $eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i16 %x, %y			%n0 = xor i16 %x, %y
	%n1 = and i16 %n0, %mask			%n1 = and i16 %n0, %mask
	%r = xor i16 %n1, %y			%r = xor i16 %n1, %y
	ret i16 %r			ret i16 %r
	}			}

	define i32 @in32(i32 %x, i32 %y, i32 %mask) {			define i32 @in32(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: in32:			; CHECK-NOBMI-LABEL: in32:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in32:			; CHECK-BMI-LABEL: in32:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andl %edx, %edi
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}

	define i64 @in64(i64 %x, i64 %y, i64 %mask) {			define i64 @in64(i64 %x, i64 %y, i64 %mask) {
	; CHECK-NOBMI-LABEL: in64:			; CHECK-NOBMI-LABEL: in64:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorq %rsi, %rdi			; CHECK-NOBMI-NEXT: xorq %rsi, %rdi
	; CHECK-NOBMI-NEXT: andq %rdx, %rdi			; CHECK-NOBMI-NEXT: andq %rdx, %rdi
	; CHECK-NOBMI-NEXT: xorq %rsi, %rdi			; CHECK-NOBMI-NEXT: xorq %rsi, %rdi
	; CHECK-NOBMI-NEXT: movq %rdi, %rax			; CHECK-NOBMI-NEXT: movq %rdi, %rax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in64:			; CHECK-BMI-LABEL: in64:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorq %rsi, %rdi			; CHECK-BMI-NEXT: andnq %rsi, %rdx, %rax
	; CHECK-BMI-NEXT: andq %rdx, %rdi			; CHECK-BMI-NEXT: andq %rdx, %rdi
	; CHECK-BMI-NEXT: xorq %rsi, %rdi			; CHECK-BMI-NEXT: orq %rdi, %rax
	; CHECK-BMI-NEXT: movq %rdi, %rax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i64 %x, %y			%n0 = xor i64 %x, %y
	%n1 = and i64 %n0, %mask			%n1 = and i64 %n0, %mask
	%r = xor i64 %n1, %y			%r = xor i64 %n1, %y
	ret i64 %r			ret i64 %r
	}			}
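To make the operand order in the `CHECK-BMI` lines easier to read, here is a rough C++ model of the two instruction sequences checked for `@in32`. It is a sketch for illustration, not part of the patch: the helper names are hypothetical, and it assumes the usual AT&T operand order for BMI `andn`, where `andnl src2, src1, dst` computes `dst = ~src1 & src2`. The parameters are named after the registers that hold `%x`, `%y` and `%mask` in the checks above.

```cpp
#include <cassert>
#include <cstdint>

// Model of BMI `andnl src2, src1, dst` (AT&T syntax): dst = ~src1 & src2.
// (Hypothetical helper, assuming the standard ANDN semantics.)
inline uint32_t andn(uint32_t src2, uint32_t src1) { return ~src1 & src2; }

// CHECK-BMI sequence for @in32: edi = %x, esi = %y, edx = %mask.
inline uint32_t in32Bmi(uint32_t edi, uint32_t esi, uint32_t edx) {
  uint32_t eax = andn(esi, edx); // andnl %esi, %edx, %eax  -> y & ~mask
  edi &= edx;                    // andl  %edx, %edi        -> x & mask
  eax |= edi;                    // orl   %edi, %eax
  return eax;                    // retq
}

// CHECK-NOBMI sequence for @in32: the folded xor/and/xor form is kept.
inline uint32_t in32NoBmi(uint32_t edi, uint32_t esi, uint32_t edx) {
  edi ^= esi;                    // xorl %esi, %edi
  edi &= edx;                    // andl %edx, %edi
  edi ^= esi;                    // xorl %esi, %edi
  return edi;                    // movl %edi, %eax; retq
}
```

Both models return the same value for any inputs. One visible difference in the checks is that the BMI sequence writes its result directly into `%eax`, so the trailing `movl %edi, %eax` no longer appears.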
	; ============================================================================ ;			; ============================================================================ ;
	; Commutativity tests.			; Commutativity tests.
	; ============================================================================ ;			; ============================================================================ ;
	define i32 @in_commutativity_0_0_1(i32 %x, i32 %y, i32 %mask) {			define i32 @in_commutativity_0_0_1(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_commutativity_0_0_1:			; CHECK-NOBMI-LABEL: in_commutativity_0_0_1:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_commutativity_0_0_1:			; CHECK-BMI-LABEL: in_commutativity_0_0_1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andl %edx, %edi
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0 ; swapped			%n1 = and i32 %mask, %n0 ; swapped
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_commutativity_0_1_0(i32 %x, i32 %y, i32 %mask) {			define i32 @in_commutativity_0_1_0(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_commutativity_0_1_0:			; CHECK-NOBMI-LABEL: in_commutativity_0_1_0:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_commutativity_0_1_0:			; CHECK-BMI-LABEL: in_commutativity_0_1_0:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andl %edx, %edi
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %y, %n1 ; swapped			%r = xor i32 %y, %n1 ; swapped
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_commutativity_0_1_1(i32 %x, i32 %y, i32 %mask) {			define i32 @in_commutativity_0_1_1(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_commutativity_0_1_1:			; CHECK-NOBMI-LABEL: in_commutativity_0_1_1:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_commutativity_0_1_1:			; CHECK-BMI-LABEL: in_commutativity_0_1_1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andl %edx, %edi
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0 ; swapped			%n1 = and i32 %mask, %n0 ; swapped
	%r = xor i32 %y, %n1 ; swapped			%r = xor i32 %y, %n1 ; swapped
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_commutativity_1_0_0(i32 %x, i32 %y, i32 %mask) {			define i32 @in_commutativity_1_0_0(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_commutativity_1_0_0:			; CHECK-NOBMI-LABEL: in_commutativity_1_0_0:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %edi, %esi			; CHECK-NOBMI-NEXT: xorl %edi, %esi
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %edi, %esi			; CHECK-NOBMI-NEXT: xorl %edi, %esi
	; CHECK-NOBMI-NEXT: movl %esi, %eax			; CHECK-NOBMI-NEXT: movl %esi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_commutativity_1_0_0:			; CHECK-BMI-LABEL: in_commutativity_1_0_0:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %edi, %esi			; CHECK-BMI-NEXT: andnl %edi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %edi, %esi			; CHECK-BMI-NEXT: orl %esi, %eax
	; CHECK-BMI-NEXT: movl %esi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, %x ; %x instead of %y			%r = xor i32 %n1, %x ; %x instead of %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_commutativity_1_0_1(i32 %x, i32 %y, i32 %mask) {			define i32 @in_commutativity_1_0_1(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_commutativity_1_0_1:			; CHECK-NOBMI-LABEL: in_commutativity_1_0_1:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %edi, %esi			; CHECK-NOBMI-NEXT: xorl %edi, %esi
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %edi, %esi			; CHECK-NOBMI-NEXT: xorl %edi, %esi
	; CHECK-NOBMI-NEXT: movl %esi, %eax			; CHECK-NOBMI-NEXT: movl %esi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_commutativity_1_0_1:			; CHECK-BMI-LABEL: in_commutativity_1_0_1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %edi, %esi			; CHECK-BMI-NEXT: andnl %edi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %edi, %esi			; CHECK-BMI-NEXT: orl %esi, %eax
	; CHECK-BMI-NEXT: movl %esi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0 ; swapped			%n1 = and i32 %mask, %n0 ; swapped
	%r = xor i32 %n1, %x ; %x instead of %y			%r = xor i32 %n1, %x ; %x instead of %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_commutativity_1_1_0(i32 %x, i32 %y, i32 %mask) {			define i32 @in_commutativity_1_1_0(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_commutativity_1_1_0:			; CHECK-NOBMI-LABEL: in_commutativity_1_1_0:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %edi, %esi			; CHECK-NOBMI-NEXT: xorl %edi, %esi
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %edi, %esi			; CHECK-NOBMI-NEXT: xorl %edi, %esi
	; CHECK-NOBMI-NEXT: movl %esi, %eax			; CHECK-NOBMI-NEXT: movl %esi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_commutativity_1_1_0:			; CHECK-BMI-LABEL: in_commutativity_1_1_0:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %edi, %esi			; CHECK-BMI-NEXT: andnl %edi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %edi, %esi			; CHECK-BMI-NEXT: orl %esi, %eax
	; CHECK-BMI-NEXT: movl %esi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %x, %n1 ; swapped, %x instead of %y			%r = xor i32 %x, %n1 ; swapped, %x instead of %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_commutativity_1_1_1(i32 %x, i32 %y, i32 %mask) {			define i32 @in_commutativity_1_1_1(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_commutativity_1_1_1:			; CHECK-NOBMI-LABEL: in_commutativity_1_1_1:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %edi, %esi			; CHECK-NOBMI-NEXT: xorl %edi, %esi
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %edi, %esi			; CHECK-NOBMI-NEXT: xorl %edi, %esi
	; CHECK-NOBMI-NEXT: movl %esi, %eax			; CHECK-NOBMI-NEXT: movl %esi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_commutativity_1_1_1:			; CHECK-BMI-LABEL: in_commutativity_1_1_1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %edi, %esi			; CHECK-BMI-NEXT: andnl %edi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %edi, %esi			; CHECK-BMI-NEXT: orl %esi, %eax
	; CHECK-BMI-NEXT: movl %esi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0 ; swapped			%n1 = and i32 %mask, %n0 ; swapped
	%r = xor i32 %x, %n1 ; swapped, %x instead of %y			%r = xor i32 %x, %n1 ; swapped, %x instead of %y
	ret i32 %r			ret i32 %r
	}			}
	; ============================================================================ ;			; ============================================================================ ;
	; Y is an 'and' too.			; Y is an 'and' too.
	; ============================================================================ ;			; ============================================================================ ;
	define i32 @in_complex_y0(i32 %x, i32 %y_hi, i32 %y_low, i32 %mask) {			define i32 @in_complex_y0(i32 %x, i32 %y_hi, i32 %y_low, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_complex_y0:			; CHECK-NOBMI-LABEL: in_complex_y0:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %ecx, %edi			; CHECK-NOBMI-NEXT: andl %ecx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_complex_y0:			; CHECK-BMI-LABEL: in_complex_y0:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %esi, %edi
	; CHECK-BMI-NEXT: andl %ecx, %edi			; CHECK-BMI-NEXT: andl %ecx, %edi
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %ecx, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_y1(i32 %x, i32 %y_hi, i32 %y_low, i32 %mask) {			define i32 @in_complex_y1(i32 %x, i32 %y_hi, i32 %y_low, i32 %mask) {
	; CHECK-NOBMI-LABEL: in_complex_y1:			; CHECK-NOBMI-LABEL: in_complex_y1:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %ecx, %edi			; CHECK-NOBMI-NEXT: andl %ecx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_complex_y1:			; CHECK-BMI-LABEL: in_complex_y1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %esi, %edi
	; CHECK-BMI-NEXT: andl %ecx, %edi			; CHECK-BMI-NEXT: andl %ecx, %edi
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %ecx, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %y, %n1			%r = xor i32 %y, %n1
	ret i32 %r			ret i32 %r
	}			}
	; ============================================================================ ;			; ============================================================================ ;
	; M is an 'xor' too.			; M is an 'xor' too.
	; ============================================================================ ;			; ============================================================================ ;
	define i32 @in_complex_m0(i32 %x, i32 %y, i32 %m_a, i32 %m_b) {			define i32 @in_complex_m0(i32 %x, i32 %y, i32 %m_a, i32 %m_b) {
	; CHECK-NOBMI-LABEL: in_complex_m0:			; CHECK-NOBMI-LABEL: in_complex_m0:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %ecx, %edx			; CHECK-NOBMI-NEXT: xorl %ecx, %edx
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_complex_m0:			; CHECK-BMI-LABEL: in_complex_m0:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %ecx, %edx			; CHECK-BMI-NEXT: xorl %ecx, %edx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andl %edi, %edx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %edx, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_m1(i32 %x, i32 %y, i32 %m_a, i32 %m_b) {			define i32 @in_complex_m1(i32 %x, i32 %y, i32 %m_a, i32 %m_b) {
	; CHECK-NOBMI-LABEL: in_complex_m1:			; CHECK-NOBMI-LABEL: in_complex_m1:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %ecx, %edx			; CHECK-NOBMI-NEXT: xorl %ecx, %edx
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_complex_m1:			; CHECK-BMI-LABEL: in_complex_m1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: xorl %ecx, %edx			; CHECK-BMI-NEXT: xorl %ecx, %edx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %edx, %eax
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andl %edi, %edx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %edx, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0			%n1 = and i32 %mask, %n0
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	; ============================================================================ ;			; ============================================================================ ;
	[9 lines not shown]
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_complex_y0_m0:			; CHECK-BMI-LABEL: in_complex_y0_m0:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %r8d, %ecx			; CHECK-BMI-NEXT: xorl %r8d, %ecx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %ecx, %eax
	; CHECK-BMI-NEXT: andl %ecx, %edi			; CHECK-BMI-NEXT: andl %edi, %ecx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %ecx, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_y1_m0(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {			define i32 @in_complex_y1_m0(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {
	; CHECK-NOBMI-LABEL: in_complex_y1_m0:			; CHECK-NOBMI-LABEL: in_complex_y1_m0:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %r8d, %ecx			; CHECK-NOBMI-NEXT: xorl %r8d, %ecx
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %ecx, %edi			; CHECK-NOBMI-NEXT: andl %ecx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_complex_y1_m0:			; CHECK-BMI-LABEL: in_complex_y1_m0:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %r8d, %ecx			; CHECK-BMI-NEXT: xorl %r8d, %ecx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %ecx, %eax
	; CHECK-BMI-NEXT: andl %ecx, %edi			; CHECK-BMI-NEXT: andl %edi, %ecx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %ecx, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %y, %n1			%r = xor i32 %y, %n1
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_y0_m1(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {			define i32 @in_complex_y0_m1(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {
	; CHECK-NOBMI-LABEL: in_complex_y0_m1:			; CHECK-NOBMI-LABEL: in_complex_y0_m1:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %r8d, %ecx			; CHECK-NOBMI-NEXT: xorl %r8d, %ecx
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %ecx, %edi			; CHECK-NOBMI-NEXT: andl %ecx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_complex_y0_m1:			; CHECK-BMI-LABEL: in_complex_y0_m1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %r8d, %ecx			; CHECK-BMI-NEXT: xorl %r8d, %ecx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %ecx, %eax
	; CHECK-BMI-NEXT: andl %ecx, %edi			; CHECK-BMI-NEXT: andl %edi, %ecx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %ecx, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0			%n1 = and i32 %mask, %n0
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @in_complex_y1_m1(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {			define i32 @in_complex_y1_m1(i32 %x, i32 %y_hi, i32 %y_low, i32 %m_a, i32 %m_b) {
	; CHECK-NOBMI-LABEL: in_complex_y1_m1:			; CHECK-NOBMI-LABEL: in_complex_y1_m1:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: andl %edx, %esi			; CHECK-NOBMI-NEXT: andl %edx, %esi
	; CHECK-NOBMI-NEXT: xorl %r8d, %ecx			; CHECK-NOBMI-NEXT: xorl %r8d, %ecx
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: andl %ecx, %edi			; CHECK-NOBMI-NEXT: andl %ecx, %edi
	; CHECK-NOBMI-NEXT: xorl %esi, %edi			; CHECK-NOBMI-NEXT: xorl %esi, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_complex_y1_m1:			; CHECK-BMI-LABEL: in_complex_y1_m1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %esi			; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: xorl %r8d, %ecx			; CHECK-BMI-NEXT: xorl %r8d, %ecx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: andnl %esi, %ecx, %eax
	; CHECK-BMI-NEXT: andl %ecx, %edi			; CHECK-BMI-NEXT: andl %edi, %ecx
	; CHECK-BMI-NEXT: xorl %esi, %edi			; CHECK-BMI-NEXT: orl %ecx, %eax
	; CHECK-BMI-NEXT: movl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%y = and i32 %y_hi, %y_low			%y = and i32 %y_hi, %y_low
	%mask = xor i32 %m_a, %m_b			%mask = xor i32 %m_a, %m_b
	%n0 = xor i32 %x, %y			%n0 = xor i32 %x, %y
	%n1 = and i32 %mask, %n0			%n1 = and i32 %mask, %n0
	%r = xor i32 %y, %n1			%r = xor i32 %y, %n1
	ret i32 %r			ret i32 %r
	}			}
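
For reference, the `@in_complex_*` tests above all compute the same masked merge, with `%y` and `%mask` themselves being results of other operations. The right-hand BMI checks show it lowered as `andnl` + `andl` + `orl` instead of `xorl` + `andl` + `xorl` + `movl`. Below is a minimal C++ sketch of why the two forms are interchangeable. The function names are hypothetical and are not part of the patch; this only illustrates the bitwise identity, not the DAGCombiner code.

```cpp
#include <cstdint>

// Folded ("in") form, as written in the IR tests:
//   %n0 = xor %x, %y ; %n1 = and %n0, %mask ; %r = xor %n1, %y
constexpr uint32_t mergeFolded(uint32_t x, uint32_t y, uint32_t mask) {
  return ((x ^ y) & mask) ^ y;
}

// Unfolded ("out") form: take bits of x where mask is set, bits of y elsewhere.
// The (y & ~mask) half is what an and-not instruction (e.g. andn) can compute.
constexpr uint32_t mergeUnfolded(uint32_t x, uint32_t y, uint32_t mask) {
  return (x & mask) | (y & ~mask);
}

// Exhaustively compare both forms on all 4-bit inputs. The identity is
// purely bitwise, so every bit position behaves the same way.
bool formsAgreeOn4Bits() {
  for (uint32_t x = 0; x < 16; ++x)
    for (uint32_t y = 0; y < 16; ++y)
      for (uint32_t m = 0; m < 16; ++m)
        if ((mergeFolded(x, y, m) & 0xFu) != (mergeUnfolded(x, y, m) & 0xFu))
          return false;
  return true;
}
```

Both forms use three logic operations, so whether unfolding is a win depends on the target having a cheap and-not; per the summary, that is what the `hasAndNot()` check is meant to decide.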

Revision Contents

Diff 143460

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll

test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll
