This is an archive of the discontinued LLVM Phabricator instance.


Differential D46031

[DAGCombiner] Masked merge: if 'B' is constant, de-canonicalize the pattern (invert the mask).
Abandoned, Public

Authored by lebedev.ri on Apr 24 2018, 3:48 PM.


Details

Reviewers

spatel
craig.topper
RKSimon
javed.absar

Summary

Discovered accidentally when working on the vector part, because the @test_andnotps/@test_andnotpd
in test/CodeGen/X86/*-schedule.ll broke - they were no longer lowered to andnps/andnpd.

Given canonical pattern of:

|        A  |  |B|
((x ^ y) & m) ^ y
 |  D  |

We don't want to handle xor's with second operand being constant,
because andn does not get selected.
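To make the pattern concrete, here is a minimal C++ sketch (not LLVM code; all function names are made up for illustration) of the canonical masked merge, its unfolded and/andn/or form, and an equivalent form with the mask inverted. The last one is only an algebraic reading of the "invert the mask" idea in the title, not a copy of the patch. All three take the bits of x where m is set and the bits of y where m is clear.

```cpp
#include <cassert>
#include <cstdint>

// Canonical form: ((x ^ y) & m) ^ y.
uint8_t mergeCanonical(uint8_t x, uint8_t y, uint8_t m) {
  return static_cast<uint8_t>(((x ^ y) & m) ^ y);
}

// Unfolded form: (x & m) | (y & ~m). The second 'and' is what an
// and-not instruction computes.
uint8_t mergeUnfolded(uint8_t x, uint8_t y, uint8_t m) {
  return static_cast<uint8_t>((x & m) | (y & ~m));
}

// Same merge with the mask inverted: ((x ^ y) & ~m) ^ x.
// Here the trailing xor uses x, so y only appears inside (x ^ y).
uint8_t mergeInvertedMask(uint8_t x, uint8_t y, uint8_t m) {
  return static_cast<uint8_t>(((x ^ y) & ~m) ^ x);
}

// Exhaustively compares the three forms over all 8-bit inputs.
bool allFormsAgree() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned m = 0; m < 256; ++m) {
        uint8_t c = mergeCanonical(x, y, m);
        if (c != mergeUnfolded(x, y, m) || c != mergeInvertedMask(x, y, m))
          return false;
      }
  return true;
}
```

Because the operations are bitwise, agreement on 8-bit values carries over to wider integer types.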

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. Apr 24 2018, 3:48 PM

Herald added a reviewer: javed.absar. Apr 24 2018, 3:48 PM

Rebased on top of a better test set.

I have stared at the test change a bit more, and unless there are some other patterns i did not analyze, i do think this is the way we want to handle this.

test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll
573–575	This improves.
613–616	This degrades, but instcombine will canonicalize it to `@in_constant_mone_vary`, and that one is ok. So this is ok too.
658	Looked at this in llvm-mca, no clear winner. The change decreases instruction count and IPC, but the cycle count does not change. So i guess it's ok?
702–703	As per mca this is an unimportant change, but again, instcombine will canonicalize it to `@in_constant_42_vary`, which is ok. So this one appears ok too.

It's not clear to me if this is about xor with a constant in general or xor with -1 specifically. Is the motivating pattern/problem not recognizing DeMorgan's Laws folds in the DAG?

; ~(~x & y) --> x | ~y
%notx = xor i8 %x, -1
%and = and i8 %notx, %y
%r = xor i8 %and, -1
=>
%notm = xor i8 %y, -1
%r = or i8 %x, %notm

That seems like a good fold to have in the DAG given that we're rearranging bitwise logic ops to better match target features. Should we just add that (and the 'or' --> 'and' twin)?
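The fold quoted above can be checked exhaustively on 8-bit values. The following is a small C++ sketch with hypothetical function names. The "twin" is assumed here to mean ~(~x | y) --> x & ~y, which is one reading of the comment rather than something the thread spells out.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the fold: ~(~x & y).
uint8_t nandInvertedOperand(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(~(~x & y));
}

// Right-hand side of the fold: x | ~y.
uint8_t orInvertedOperand(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(x | ~y);
}

// Assumed twin, left-hand side: ~(~x | y).
uint8_t norInvertedOperand(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(~(~x | y));
}

// Assumed twin, right-hand side: x & ~y.
uint8_t andInvertedOperand(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(x & ~y);
}

// Checks both folds for every pair of 8-bit inputs.
bool deMorganFoldsHold() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y) {
      if (nandInvertedOperand(x, y) != orInvertedOperand(x, y))
        return false;
      if (norInvertedOperand(x, y) != andInvertedOperand(x, y))
        return false;
    }
  return true;
}
```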

In D46031#1078128, @spatel wrote:

> It's not clear to me if this is about xor with a constant in general or xor with -1 specifically.

I thought in general, note the tests with constant 42.

> Is the motivating pattern/problem not recognizing DeMorgan's Laws folds in the DAG?

The motivational case is specified in the differential's description.

> ; ~(~x & y) --> x | ~y
> %notx = xor i8 %x, -1
> %and = and i8 %notx, %y
> %r = xor i8 %and, -1
> =>
> %notm = xor i8 %y, -1
> %r = or i8 %x, %notm
>
> That seems like a good fold to have in the DAG given that we're rearranging bitwise logic ops to better match target features. Should we just add that (and the 'or' --> 'and' twin)?

Hmm, not sure, let's see..

lebedev.ri mentioned this in D46072: [DagCombine][InstCombine][NFC] De Morgan law tests. Apr 25 2018, 10:40 AM

lebedev.ri mentioned this in D46073: [DagCombine] De Morgan laws: 'nand' logic with an inverted operand.

lebedev.ri added inline comments. Apr 26 2018, 7:29 AM

test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll

658

On top of D46073, this @in_constant_varx_42 pattern (i.e. %y being constant) is the only remaining issue.

# *** IR Dump After Machine InstCombiner ***:
# Machine code for function in_constant_varx_42: IsSSA, TracksLiveness
Function Live Ins: $edi in %0, $edx in %2

bb.0 (%ir-block.0):
  liveins: $edi, $edx
  %2:gr32 = COPY $edx
  %0:gr32 = COPY $edi
  %3:gr32 = AND32rr %0:gr32, %2:gr32, implicit-def dead $eflags
  %4:gr32 = NOT32r %2:gr32
  %5:gr32 = AND32ri8 %4:gr32, 42, implicit-def dead $eflags
  %6:gr32 = OR32rr %3:gr32, killed %5:gr32, implicit-def dead $eflags
  $eax = COPY %6:gr32
  RET 0, $eax

# End machine code for function in_constant_varx_42.

This *seems* ok (as per mca) on aarch64, but i'm not so sure about x86.

Attachment: diff.txt (2 KB)

lebedev.ri added inline comments. Apr 26 2018, 8:32 AM

test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll
658	Right, in this case not only should i not unfold it, but also de-canonicalize the mask. Attachment: diff.txt (2 KB)

Assuming we'll manage to get D46073, this should do it.

lebedev.ri added a parent revision: D46073: [DagCombine] De Morgan laws: 'nand' logic with an inverted operand. Apr 27 2018, 4:13 AM

After some thought, and staring into MCA output, i believe this should come before De Morgan laws (D46073, should that ever land),
thus i rebased this change not to depend on that differential.

Some considerations, for znver1, and this test IR:

define i32 @in_constant_varx_42(i32 %x, i32 %y, i32 %mask) {
  %n0 = xor i32 %x, 42 ; %x
  %n1 = and i32 %n0, %mask
  %r = xor i32 %n1, 42
  ret i32 %r
}

Attachment: diff-mm-vs-unfolded-old.txt (2 KB)

Difference between not unfolding that pattern vs. svn - instruction count and IPC increased

Attachment: diff-mm-vs-unfolded-new.txt (2 KB)

Difference between not unfolding that pattern vs. this differential - Total Cycles halved, IPC doubled

Attachment: diff-unfolded-old-vs-new.txt (2 KB)

Difference between unfolding that pattern in svn vs. this differential - Instruction count decreased back to not unfolded count, cycle count halved, IPC increased.
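For readers who prefer C++ to IR, the test function discussed in this comment can be mirrored as follows. This is an illustrative sketch only: the function names are hypothetical, and the "unfolded" variant follows the and / not / and / or sequence visible in the machine-code dump quoted earlier in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the IR of @in_constant_varx_42: ((x ^ 42) & mask) ^ 42.
uint32_t inConstantVarx42(uint32_t x, uint32_t mask) {
  uint32_t n0 = x ^ 42u;
  uint32_t n1 = n0 & mask;
  return n1 ^ 42u;
}

// Unfolded shape: (x & mask) | (~mask & 42).
uint32_t unfoldedVarx42(uint32_t x, uint32_t mask) {
  uint32_t lhs = x & mask;
  uint32_t notMask = ~mask;
  uint32_t rhs = notMask & 42u;
  return lhs | rhs;
}

// Compares the two shapes on a sample of inputs, including complemented
// values so that the upper bits are exercised too.
bool varx42FormsAgreeOnSamples() {
  for (uint32_t x = 0; x < 512; ++x)
    for (uint32_t m = 0; m < 512; ++m) {
      if (inConstantVarx42(x, m) != unfoldedVarx42(x, m))
        return false;
      if (inConstantVarx42(~x, ~m) != unfoldedVarx42(~x, ~m))
        return false;
    }
  return true;
}
```

With an all-zero mask the result is the constant 42, and with an all-ones mask it is x.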

lebedev.ri removed a parent revision: D46073: [DagCombine] De Morgan laws: 'nand' logic with an inverted operand. Apr 30 2018, 1:10 PM

spatel added inline comments. May 2 2018, 11:10 AM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
350–355	How does this happen? Isn't that a miscompile?

lebedev.ri added inline comments. May 2 2018, 12:01 PM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
350–355	Hm, at first i thought it was indeed (https://reviews.llvm.org/D45733#1077183), but now i do not think so.
	https://godbolt.org/g/L4hDjW
	^ so neither of our outputs is fully optimized.
	But if i manually transform that assembly to C, the end result tells me that DAGCombine/arm isel is simply missing some optimizations. I could be wrong, of course.

spatel added inline comments. May 2 2018, 1:32 PM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
350–355	Ok, I was seeing an extra 'not' in there somewhere, so no miscompile. And the conclusion is that we don't care about this diff because it's already sub-optimal and instcombine should have folded it anyway. That raises the question of why are we testing this in the first place though. Add a comment to explain that or just delete?
401–403	This is a real regression, or am I seeing things that aren't there?
test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll
573–575	But as with AArch, we don't care because instcombine would fold this?
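The "no miscompile" conclusion for lines 350–355 can be double-checked by modelling the AArch64 CHECK lines of @in_constant_varx_mone and @in_constant_varx_mone_invmask from this revision's diff in plain C++. This is a sketch under the usual A64 semantics (bic is and-with-complement, orn is or-with-complement, mvn is bitwise not); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// IR of @in_constant_varx_mone: ((x ^ -1) & mask) ^ -1.
uint32_t varxMoneIR(uint32_t x, uint32_t mask) {
  return ((x ^ ~0u) & mask) ^ ~0u;
}
// Before: and w8, w0, w2 ; orn w0, w8, w2
uint32_t varxMoneBefore(uint32_t w0, uint32_t w2) {
  uint32_t w8 = w0 & w2;
  return w8 | ~w2;
}
// After: bic w8, w2, w0 ; mvn w0, w8
uint32_t varxMoneAfter(uint32_t w0, uint32_t w2) {
  uint32_t w8 = w2 & ~w0;
  return ~w8;
}

// IR of @in_constant_varx_mone_invmask: ((x ^ -1) & ~mask) ^ -1.
uint32_t varxMoneInvmaskIR(uint32_t x, uint32_t mask) {
  return ((x ^ ~0u) & ~mask) ^ ~0u;
}
// Before: bic w8, w0, w2 ; orr w0, w8, w2
uint32_t varxMoneInvmaskBefore(uint32_t w0, uint32_t w2) {
  uint32_t w8 = w0 & ~w2;
  return w8 | w2;
}
// After: mvn w8, w0 ; bic w8, w8, w2 ; mvn w0, w8
uint32_t varxMoneInvmaskAfter(uint32_t w0, uint32_t w2) {
  uint32_t w8 = ~w0;
  w8 = w8 & ~w2;
  return ~w8;
}

// Compares IR, old and new sequences on sampled inputs and their complements.
bool sequencesAgree() {
  for (uint32_t x = 0; x < 256; ++x)
    for (uint32_t m = 0; m < 256; ++m) {
      const uint32_t xs[] = {x, ~x};
      const uint32_t ms[] = {m, ~m};
      for (uint32_t a : xs)
        for (uint32_t b : ms) {
          if (varxMoneIR(a, b) != varxMoneBefore(a, b) ||
              varxMoneIR(a, b) != varxMoneAfter(a, b))
            return false;
          if (varxMoneInvmaskIR(a, b) != varxMoneInvmaskBefore(a, b) ||
              varxMoneInvmaskIR(a, b) != varxMoneInvmaskAfter(a, b))
            return false;
        }
    }
  return true;
}
```

Both sequences reduce to x | ~mask for the first function and x | mask for the second, so the change affects only instruction selection, not the computed value.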

lebedev.ri added inline comments. May 2 2018, 1:49 PM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
350–355	I'm somewhat sure that this is the scalar version of those tests that are failing in D46073 (and when we unfold vector masked merge), so i think it's best to keep these tests.
401–403	We replaced two instructions with two other instructions. Unless i'm using a bad `-mcpu` (`-mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75`, is there a better choice?), this does not seem to matter in practice. Or i'm simply looking at `llvm-mca` wrong :) Attachment: diff.txt (1 KB)

spatel added inline comments. May 2 2018, 2:19 PM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
401–403	Correct - 2 instructions change. But the whole point of the masked merge exercise was to maximize the throughput depending on the target, right? The code with and/andn+or has a shorter critical path than the dependent chain of xor+and+xor. So I think llvm-mca is lying...at least for that CPU model. If we plug these in with -mcpu=kryo, we get:
	IPC: 1.32 for the 'eor' chain
	IPC: 1.96 for the 'bic' chain
	Is the problem that x86 can't form 'andn' with an immediate? Can we fix its override of hasAndNot to account for that? Or is the problem that we should be ignoring 'not' ops as candidates for transforming in this function? Or both?
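The critical-path argument in the comment above can be illustrated with a toy dependency model. This sketch assumes every instruction has a latency of one cycle and that all inputs are ready at time zero, which is a simplification and not how llvm-mca or any real scheduling model works. The type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One instruction, described only by the indices of the earlier
// instructions whose results it consumes.
struct Instr {
  std::vector<int> deps;
};

// Length of the longest dependency chain, assuming unit latency.
int criticalPath(const std::vector<Instr> &seq) {
  std::vector<int> finish(seq.size(), 0);
  int longest = 0;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    int start = 0;
    for (int d : seq[i].deps)
      start = std::max(start, finish[d]);
    finish[i] = start + 1;
    longest = std::max(longest, finish[i]);
  }
  return longest;
}

// Folded form: n0 = x ^ y; n1 = n0 & m; r = n1 ^ y.
// Each instruction depends on the previous one.
int foldedChainDepth() {
  return criticalPath({Instr{{}}, Instr{{0}}, Instr{{1}}});
}

// Unfolded form: a = x & m; b = y & ~m (and-not); r = a | b.
// The two 'and's are independent of each other.
int unfoldedDepth() {
  return criticalPath({Instr{{}}, Instr{{}}, Instr{{0, 1}}});
}
```

Under these assumptions the folded chain has depth 3 and the unfolded form has depth 2, which matches the direction of the IPC numbers quoted for -mcpu=kryo without reproducing them.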

lebedev.ri added a subscriber: andreadb. May 2 2018, 2:40 PM

lebedev.ri added inline comments.

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
401–403	> But the whole point of the masked merge exercise was to maximize the throughput depending on the target, right?

	Yes, absolutely.

	> So I think llvm-mca is lying...at least for that CPU model. If we plug these in with -mcpu=kryo, we get:
	> IPC: 1.32 for the 'eor' chain
	> IPC: 1.96 for the 'bic' chain

	Ok, thank you, that makes more sense. It would be nice if llvm-mca's docs would contain a list of 'good' cpu models, for which it is known not to lie. (cc @andreadb )

	> Is the problem that x86 can't form 'andn' with an immediate?

	Yes, that is the motivational case.

	> Can we fix its override of hasAndNot to account for that?

	Hmm, actually, maybe we can... Looking at the docs, it is already specified that it takes the value, not the mask-to-be-inverted.

	> Or is the problem that we should be ignoring 'not' ops as candidates for transforming in this function? Or both?

	I don't think i'm able to answer that. Instcombine should certainly handle that, yes.

> > Or is the problem that we should be ignoring 'not' ops as candidates for transforming in this function? Or both?
>
> I don't think i'm able to answer that. Instcombine should certainly handle that, yes.

I may still not be seeing clearly, but I think this is the real problem - we should just bail out if the 'xor' is truly a 'not'.
Nothing good is going to come out of us trying to improve on that here.

The 'andn' with constant for x86 is a small concern. It might be a win or not, but it's probably not going to make a big difference either way?

Don't touch not.
Update X86TargetLowering::hasAndNot() with the check for immediates.

The last change affects the transform @spatel has added in D27489 / rL289738,
and the test coverage for X86 was missing.
But after i have added it, and looked at the changes in MCA, i'm again confused.

Attachment: icmp-opt.txt (2 KB)

Attachment: pos_sel_constants.txt (2 KB)

Attachment: pos_sel_special_constant.txt (2 KB)

I'd say this regression is an improvement, since IPC increased?

andreadb added inline comments. May 4 2018, 6:22 AM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
401–403	llvm-mca is not lying. The cortex-a75 CPU model describes the latency of eor/bic/orr using variant scheduling classes, and llvm-mca doesn't know how to analyze variant scheduling classes. So, you should have seen one or more warnings generated by the tool. If by "good model" you mean a model that doesn't use variant scheduling classes, then you only need to worry about cases where the above-mentioned warnings are generated. I am currently working hard on a patch to add support for variant scheduling classes. It is quite tricky, but I am confident that I will have something in the form of a patch ready (hopefully) next week.
	Cheers, Andrea

lebedev.ri added inline comments. May 4 2018, 6:28 AM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
401–403	I've come up with this script to do these comparisons: llvm-mca.sh (609 B). I'm guessing i haven't noticed any warnings because i only redirect stdout, which is good to know.

spatel added inline comments. May 4 2018, 6:39 AM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
401–403	Aha - thanks for clearing that up. I missed the warnings because they were at the top of the output, and I didn't scroll that far back up. :)
	$ llvm-mca -mtriple=aarch64 -mcpu=cortex-a75 eor.s
	warning: don't know how to model variant opcodes.
	note: assume 1 micro opcode.
	A potential usability improvement would be to make warnings like that one louder in some way (repeat it at the bottom, put asterisks in the stats?). Just a thought...now that I know, I'll definitely look harder at the whole output.

lebedev.ri added inline comments. May 4 2018, 6:42 AM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
401–403	Or maybe duplicate them to stdout too, not just output them to stderr?

andreadb added inline comments. May 4 2018, 6:43 AM

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll
401–403	I'll see what I can do to improve it. I might make the "warning:" red :-). Alternatively I could add some sort of `-Werror` equivalent mode where the warning is promoted to a fatal error. Something like that... As a side note: I mentioned this issue once in reply to D45733. Maybe that comment was lost in the noise.

I want to look at that x86 timing difference in more detail, but let me ask first: can we split this patch up and look at the changes independently?
I think there are 3 parts:

Ignore 'not'.
Change x86 hasAndNot().
Improve matching for AndNot with constant.

In D46031#1088250, @spatel wrote:

> I want to look at that x86 timing difference in more detail, but let me ask first: can we split this patch up and look at the changes independently?
> I think there are 3 parts:
>
> Ignore 'not'.
> Change x86 hasAndNot().
> Improve matching for AndNot with constant.

Yes, i think i can split it into three, will post tomorrow.

lebedev.ri mentioned this in D46492: [DAGCombiner] Masked merge: don't touch "not" xor's. May 5 2018, 3:54 AM

lebedev.ri mentioned this in D46493: [DagCombiner] Not all 'andn''s work with immediates. May 5 2018, 4:15 AM

In D46031#1088257, @lebedev.ri wrote:

> In D46031#1088250, @spatel wrote:
>
> > I want to look at that x86 timing difference in more detail, but let me ask first: can we split this patch up and look at the changes independently?
> > I think there are 3 parts:
> >
> > Ignore 'not'.
> > Change x86 hasAndNot().
> > Improve matching for AndNot with constant.
>
> Yes, i think i can split it into three, will post tomorrow.

Done, D46492 D46493 D46494.

Diffusion mentioned this in rL331595: [DAGCombiner] Masked merge: don't touch "not" xor's. May 5 2018, 8:49 AM

Diffusion mentioned this in rL331685: [DAGCombiner] Masked merge: enhance handling of 'andn' with immediates. May 7 2018, 2:57 PM

Diffusion mentioned this in rL331684: [DagCombiner] Not all 'andn''s work with immediates..

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	40 lines
lib/Target/X86/X86ISelLowering.h	2 lines
lib/Target/X86/X86ISelLowering.cpp	11 lines
test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll	18 lines
test/CodeGen/X86/icmp-opt.ll	6 lines
test/CodeGen/X86/selectcc-to-shiftand.ll	16 lines
test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll	36 lines

Diff 145020

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[5,369 lines not shown]
 //   |        A  |  |B|
 //   ((x ^ y) & m) ^ y
 //    |  D  |
 // Into:
 //   (x & m) | (y & ~m)
 SDValue DAGCombiner::unfoldMaskedMerge(SDNode *N) {
   assert(N->getOpcode() == ISD::XOR);

+  // Don't touch 'not' (i.e. where y = -1).
+  if (isAllOnesConstantOrAllOnesSplatConstant(N->getOperand(1)))
+    return SDValue();
+
   EVT VT = N->getValueType(0);

   // FIXME
   if (VT.isVector())
     return SDValue();

   // There are 3 commutable operators in the pattern,
   // so we have to deal with 8 possible variants of the basic pattern.
-  SDValue X, Y, M;
-  auto matchAndXor = [&X, &Y, &M](SDValue And, unsigned XorIdx, SDValue Other) {
+  SDValue A, D, X, Y, M;
+  auto matchAndXor = [&A, &D, &X, &Y, &M](SDValue And, unsigned XorIdx,
+                                          SDValue Other) {
     if (And.getOpcode() != ISD::AND || !And.hasOneUse())
       return false;
-    if (And.getOperand(XorIdx).getOpcode() != ISD::XOR ||
-        !And.getOperand(XorIdx).hasOneUse())
-      return false;
-    SDValue Xor0 = And.getOperand(XorIdx).getOperand(0);
-    SDValue Xor1 = And.getOperand(XorIdx).getOperand(1);
+    SDValue Xor = And.getOperand(XorIdx);
+    if (Xor.getOpcode() != ISD::XOR || !Xor.hasOneUse())
+      return false;
+    SDValue Xor0 = Xor.getOperand(0);
+    SDValue Xor1 = Xor.getOperand(1);
+    // Don't touch 'not' (i.e. where y = -1).
+    if (isAllOnesConstantOrAllOnesSplatConstant(Xor1))
+      return false;
     if (Other == Xor0)
       std::swap(Xor0, Xor1);
     if (Other != Xor1)
       return false;
+    A = And;
+    D = Xor;
     X = Xor0;
     Y = Xor1;
     M = And.getOperand(XorIdx ? 0 : 1);
     return true;
   };

-  SDValue A = N->getOperand(0);
-  SDValue B = N->getOperand(1);
-  if (!matchAndXor(A, 0, B) && !matchAndXor(A, 1, B) && !matchAndXor(B, 0, A) &&
-      !matchAndXor(B, 1, A))
+  SDValue A_ = N->getOperand(0);
+  SDValue B_ = N->getOperand(1);
+  if (!matchAndXor(A_, 0, B_) && !matchAndXor(A_, 1, B_) &&
+      !matchAndXor(B_, 0, A_) && !matchAndXor(B_, 1, A_))
     return SDValue();

   // Don't do anything if the mask is constant. This should not be reachable.
   // InstCombine should have already unfolded this pattern, and DAGCombiner
   // probably shouldn't produce it, too.
   if (isa<ConstantSDNode>(M.getNode()))
     return SDValue();

   // We can transform if the target has AndNot
   if (!TLI.hasAndNot(M))
     return SDValue();

   SDLoc DL(N);
+  SDValue NotM = DAG.getNOT(DL, M, VT);
+
+  // If Y is a constant, check that 'andn' works with immediates.
+  if (!TLI.hasAndNot(Y)) {
+    assert(TLI.hasAndNot(X) && "Only mask is a variable? Unreachable.");
+    // If not, de-canonicalze (Invert) the mask, swap the value in B part.
+    SDValue NewA = DAG.getNode(ISD::AND, DL, VT, D, NotM);
+    return DAG.getNode(ISD::OR, DL, VT, NewA, X);
+  }

   SDValue LHS = DAG.getNode(ISD::AND, DL, VT, X, M);
-  SDValue NotM = DAG.getNOT(DL, M, VT);
   SDValue RHS = DAG.getNode(ISD::AND, DL, VT, Y, NotM);

   return DAG.getNode(ISD::OR, DL, VT, LHS, RHS);
 }

 SDValue DAGCombiner::visitXOR(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
[12,571 lines not shown]
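The "3 commutable operators, 8 variants" remark in unfoldMaskedMerge can be illustrated outside of LLVM with a toy expression tree. The sketch below is a simplified stand-in, not the SelectionDAG API: Node, Match, matchAndXor and matchMaskedMerge are hypothetical, and the one-use checks and the 'not' bail-out from the patch are deliberately left out.

```cpp
#include <cassert>
#include <utility>

enum class Op { Leaf, And, Xor };

// Toy expression node; leaves have null operands.
struct Node {
  Op op;
  const Node *lhs;
  const Node *rhs;
};

// Operands recovered from ((x ^ y) & m) ^ y.
struct Match {
  const Node *x;
  const Node *y;
  const Node *m;
};

// Tries to read `andNode` as ((x ^ y) & m), with the inner xor at operand
// `xorIdx` of the 'and', and `other` being the y operand of the outer xor.
bool matchAndXor(const Node *andNode, unsigned xorIdx, const Node *other,
                 Match &out) {
  if (andNode->op != Op::And)
    return false;
  const Node *xorNode = xorIdx == 0 ? andNode->lhs : andNode->rhs;
  if (xorNode->op != Op::Xor)
    return false;
  const Node *xor0 = xorNode->lhs;
  const Node *xor1 = xorNode->rhs;
  if (other == xor0)
    std::swap(xor0, xor1); // the inner xor was written as (y ^ x)
  if (other != xor1)
    return false;
  out = {xor0, xor1, xorIdx == 0 ? andNode->rhs : andNode->lhs};
  return true;
}

// Four calls times the internal swap cover all 8 operand orders.
bool matchMaskedMerge(const Node *root, Match &out) {
  if (root->op != Op::Xor)
    return false;
  const Node *a = root->lhs;
  const Node *b = root->rhs;
  return matchAndXor(a, 0, b, out) || matchAndXor(a, 1, b, out) ||
         matchAndXor(b, 0, a, out) || matchAndXor(b, 1, a, out);
}

// Builds each of the 8 commuted trees and checks that x, y, m are recovered.
bool allEightVariantsMatch() {
  Node x{Op::Leaf, nullptr, nullptr};
  Node y{Op::Leaf, nullptr, nullptr};
  Node m{Op::Leaf, nullptr, nullptr};
  for (unsigned v = 0; v < 8; ++v) {
    Node d = (v & 1) ? Node{Op::Xor, &y, &x} : Node{Op::Xor, &x, &y};
    Node a = (v & 2) ? Node{Op::And, &m, &d} : Node{Op::And, &d, &m};
    Node r = (v & 4) ? Node{Op::Xor, &y, &a} : Node{Op::Xor, &a, &y};
    Match got{};
    if (!matchMaskedMerge(&r, got) || got.x != &x || got.y != &y ||
        got.m != &m)
      return false;
  }
  return true;
}
```

Comparing operands by pointer identity mirrors how the real matcher compares SDValues, but that correspondence is an assumption of this sketch.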

lib/Target/X86/X86ISelLowering.h

[828 lines not shown]
     bool isMultiStoresCheaperThanBitsMerge(EVT LTy, EVT HTy) const override {
       // such pair out until we get testcase to prove it is a win.
       return false;
     }

     bool isMaskAndCmp0FoldingBeneficial(const Instruction &AndI) const override;

     bool hasAndNotCompare(SDValue Y) const override;

+    bool hasAndNot(SDValue Y) const override;
+
     bool convertSetCCLogicToBitwiseLogic(EVT VT) const override {
       return VT.isScalarInteger();
     }

     /// Vector-sized comparisons are fast using PCMPEQ + PMOVMSK or PTEST.
     MVT hasFastEqualityCompare(unsigned NumBits) const override;

     /// Allow multiple load pairs per block for smaller and faster code.
[713 lines not shown]

lib/Target/X86/X86ISelLowering.cpp


[4,737 lines not shown]
 }

 bool X86TargetLowering::isMaskAndCmp0FoldingBeneficial(
     const Instruction &AndI) const {
   return true;
 }

 bool X86TargetLowering::hasAndNotCompare(SDValue Y) const {
+  // A mask and compare against constant is ok for an 'andn' too
+  // even though the BMI instruction doesn't have an immediate form.
+
   if (!Subtarget.hasBMI())
     return false;

   // There are only 32-bit and 64-bit forms for 'andn'.
   EVT VT = Y.getValueType();
   if (VT != MVT::i32 && VT != MVT::i64)
     return false;

   return true;
 }

+bool X86TargetLowering::hasAndNot(SDValue Y) const {
+  // x86 can't form 'andn' with an immediate.
+  if (isa<ConstantSDNode>(Y))
+    return false;
+
+  return hasAndNotCompare(Y);
+}
+
 MVT X86TargetLowering::hasFastEqualityCompare(unsigned NumBits) const {
   MVT VT = MVT::getIntegerVT(NumBits);
   if (isTypeLegal(VT))
     return VT;

   // PMOVMSKB can handle this.
   if (NumBits == 128 && isTypeLegal(MVT::v16i8))
     return MVT::v16i8;
[34,993 lines not shown]

test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll

[341 lines not shown]
 ; CHECK-NEXT: ret
   %mx = and i32 %mask, %x
   %my = and i32 %notmask, -1
   %r = or i32 %mx, %my
   ret i32 %r
 }
 define i32 @in_constant_varx_mone(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_constant_varx_mone:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: and w8, w0, w2
-; CHECK-NEXT: orn w0, w8, w2
+; CHECK-NEXT: bic w8, w2, w0
+; CHECK-NEXT: mvn w0, w8
 ; CHECK-NEXT: ret
   %n0 = xor i32 %x, -1 ; %x
   %n1 = and i32 %n0, %mask
   %r = xor i32 %n1, -1
   ret i32 %r
 }
 define i32 @out_constant_varx_mone_invmask(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: out_constant_varx_mone_invmask:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: bic w8, w0, w2
 ; CHECK-NEXT: orr w0, w8, w2
 ; CHECK-NEXT: ret
   %notmask = xor i32 %mask, -1
   %mx = and i32 %notmask, %x
   %my = and i32 %mask, -1
   %r = or i32 %mx, %my
   ret i32 %r
 }
 define i32 @in_constant_varx_mone_invmask(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_constant_varx_mone_invmask:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: bic w8, w0, w2
-; CHECK-NEXT: orr w0, w8, w2
+; CHECK-NEXT: mvn w8, w0
+; CHECK-NEXT: bic w8, w8, w2
+; CHECK-NEXT: mvn w0, w8
 ; CHECK-NEXT: ret
   %notmask = xor i32 %mask, -1
   %n0 = xor i32 %x, -1 ; %x
   %n1 = and i32 %n0, %notmask
   %r = xor i32 %n1, -1
   ret i32 %r
 }
 define i32 @out_constant_varx_42(i32 %x, i32 %y, i32 %mask) {
[9 lines not shown]
 ; CHECK-NEXT: ret
   %my = and i32 %notmask, 42
   %r = or i32 %mx, %my
   ret i32 %r
 }
 define i32 @in_constant_varx_42(i32 %x, i32 %y, i32 %mask) {
 ; CHECK-LABEL: in_constant_varx_42:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: mov w8, #42
 ; CHECK-NEXT: bic w8, w8, w2
 ; CHECK-NEXT: and w9, w0, w2
 ; CHECK-NEXT: orr w0, w9, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%n0 = xor i32 %x, 42 ; %x		%n0 = xor i32 %x, 42 ; %x
%n1 = and i32 %n0, %mask		%n1 = and i32 %n0, %mask
%r = xor i32 %n1, 42		%r = xor i32 %n1, 42
ret i32 %r		ret i32 %r
}		}
define i32 @out_constant_varx_42_invmask(i32 %x, i32 %y, i32 %mask) {		define i32 @out_constant_varx_42_invmask(i32 %x, i32 %y, i32 %mask) {
; CHECK-LABEL: out_constant_varx_42_invmask:		; CHECK-LABEL: out_constant_varx_42_invmask:
[... 33 lines not shown ...]	; CHECK-NEXT: ret
%mx = and i32 %mask, -1		%mx = and i32 %mask, -1
%my = and i32 %notmask, %y		%my = and i32 %notmask, %y
%r = or i32 %mx, %my		%r = or i32 %mx, %my
ret i32 %r		ret i32 %r
}		}
define i32 @in_constant_mone_vary(i32 %x, i32 %y, i32 %mask) {		define i32 @in_constant_mone_vary(i32 %x, i32 %y, i32 %mask) {
; CHECK-LABEL: in_constant_mone_vary:		; CHECK-LABEL: in_constant_mone_vary:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: bic w8, w1, w2		; CHECK-NEXT: bic w8, w2, w1
; CHECK-NEXT: orr w0, w2, w8		; CHECK-NEXT: eor w0, w8, w1
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%n0 = xor i32 -1, %y ; %x		%n0 = xor i32 -1, %y ; %x
%n1 = and i32 %n0, %mask		%n1 = and i32 %n0, %mask
%r = xor i32 %n1, %y		%r = xor i32 %n1, %y
ret i32 %r		ret i32 %r
}		}
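As a sanity check on the `@in_constant_mone_vary` change above, the two AArch64 sequences can be modelled and compared against the IR pattern. This is a sketch under stated assumptions: values are narrowed to 8 bits (enough, since every operation involved is bitwise), `bic Wd, Wn, Wm` is taken to compute `Wn & ~Wm`, and the function names are made up for illustration.

```python
BITS = 8
FULL = (1 << BITS) - 1  # all-ones value, standing in for the constant -1


def ir_pattern(y, mask):
    n0 = FULL ^ y          # %n0 = xor i32 -1, %y
    n1 = n0 & mask         # %n1 = and i32 %n0, %mask
    return n1 ^ y          # %r  = xor i32 %n1, %y


def left_column(y, mask):
    w8 = y & ~mask & FULL  # bic w8, w1, w2
    return mask | w8       # orr w0, w2, w8


def right_column(y, mask):
    w8 = mask & ~y & FULL  # bic w8, w2, w1
    return w8 ^ y          # eor w0, w8, w1


def sequences_agree():
    """Exhaustively compare the IR pattern with both instruction sequences."""
    return all(
        ir_pattern(y, m) == left_column(y, m) == right_column(y, m)
        for y in range(FULL + 1)
        for m in range(FULL + 1)
    )
```

Both columns compute `mask | y`, so the change for this function swaps one two-instruction sequence for another rather than altering the result.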
define i32 @out_constant_mone_vary_invmask(i32 %x, i32 %y, i32 %mask) {		define i32 @out_constant_mone_vary_invmask(i32 %x, i32 %y, i32 %mask) {
; CHECK-LABEL: out_constant_mone_vary_invmask:		; CHECK-LABEL: out_constant_mone_vary_invmask:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: and w8, w2, w1		; CHECK-NEXT: and w8, w2, w1
; CHECK-NEXT: orn w0, w8, w2		; CHECK-NEXT: orn w0, w8, w2
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%notmask = xor i32 %mask, -1		%notmask = xor i32 %mask, -1
%mx = and i32 %notmask, -1		%mx = and i32 %notmask, -1
%my = and i32 %mask, %y		%my = and i32 %mask, %y
%r = or i32 %mx, %my		%r = or i32 %mx, %my
ret i32 %r		ret i32 %r
}		}
define i32 @in_constant_mone_vary_invmask(i32 %x, i32 %y, i32 %mask) {		define i32 @in_constant_mone_vary_invmask(i32 %x, i32 %y, i32 %mask) {
; CHECK-LABEL: in_constant_mone_vary_invmask:		; CHECK-LABEL: in_constant_mone_vary_invmask:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: and w8, w1, w2		; CHECK-NEXT: mvn w8, w1
; CHECK-NEXT: orn w0, w8, w2		; CHECK-NEXT: bic w8, w8, w2
		; CHECK-NEXT: eor w0, w8, w1
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%notmask = xor i32 %mask, -1		%notmask = xor i32 %mask, -1
%n0 = xor i32 -1, %y ; %x		%n0 = xor i32 -1, %y ; %x
%n1 = and i32 %n0, %notmask		%n1 = and i32 %n0, %notmask
%r = xor i32 %n1, %y		%r = xor i32 %n1, %y
ret i32 %r		ret i32 %r
}		}
define i32 @out_constant_42_vary(i32 %x, i32 %y, i32 %mask) {		define i32 @out_constant_42_vary(i32 %x, i32 %y, i32 %mask) {
[... 138 lines not shown ...]

test/CodeGen/X86/icmp-opt.ll

	[... 11 lines not shown ...]
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %eax, %eax			; CHECK-NOBMI-NEXT: xorl %eax, %eax
	; CHECK-NOBMI-NEXT: testq %rdi, %rdi			; CHECK-NOBMI-NEXT: testq %rdi, %rdi
	; CHECK-NOBMI-NEXT: setns %al			; CHECK-NOBMI-NEXT: setns %al
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: t1:			; CHECK-BMI-LABEL: t1:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: shrq $63, %rdi			; CHECK-BMI-NEXT: xorl %eax, %eax
	; CHECK-BMI-NEXT: xorl $1, %edi			; CHECK-BMI-NEXT: testq %rdi, %rdi
	; CHECK-BMI-NEXT: movl %edi, %eax			; CHECK-BMI-NEXT: setns %al
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%cmp = icmp sgt i64 %a, -1			%cmp = icmp sgt i64 %a, -1
	%conv = zext i1 %cmp to i32			%conv = zext i1 %cmp to i32
	ret i32 %conv			ret i32 %conv
	}			}
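The two `t1` BMI sequences above can be compared with a short model. This is an illustrative sketch only: the function names are hypothetical, and registers are modelled as Python integers masked to 64 or 32 bits.

```python
U64 = (1 << 64) - 1
U32 = (1 << 32) - 1


def t1_left(a):
    """Left column: shift the sign bit down, then flip it."""
    rdi = a & U64          # two's-complement view of the signed i64 argument
    rdi >>= 63             # shrq $63, %rdi
    edi = (rdi & U32) ^ 1  # xorl $1, %edi
    return edi             # movl %edi, %eax


def t1_right(a):
    """Right column: xorl/testq/setns yields 1 when the sign flag is clear."""
    return 1 if a >= 0 else 0


SAMPLES = [0, 1, -1, 42, -42, 2**63 - 1, -(2**63)]
```

For every value in `SAMPLES`, both functions return the same result as `zext(icmp sgt i64 %a, -1)`, that is, 1 for non-negative inputs and 0 otherwise.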

test/CodeGen/X86/selectcc-to-shiftand.ll

	[... 95 lines not shown ...]
	; CHECK-NOBMI-NEXT: xorl %eax, %eax			; CHECK-NOBMI-NEXT: xorl %eax, %eax
	; CHECK-NOBMI-NEXT: testl %edi, %edi			; CHECK-NOBMI-NEXT: testl %edi, %edi
	; CHECK-NOBMI-NEXT: setns %al			; CHECK-NOBMI-NEXT: setns %al
	; CHECK-NOBMI-NEXT: leal (%rax,%rax,4), %eax			; CHECK-NOBMI-NEXT: leal (%rax,%rax,4), %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: pos_sel_constants:			; CHECK-BMI-LABEL: pos_sel_constants:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: sarl $31, %edi			; CHECK-BMI-NEXT: xorl %eax, %eax
	; CHECK-BMI-NEXT: notl %edi			; CHECK-BMI-NEXT: testl %edi, %edi
	; CHECK-BMI-NEXT: andl $5, %edi			; CHECK-BMI-NEXT: setns %al
	; CHECK-BMI-NEXT: movl %edi, %eax			; CHECK-BMI-NEXT: leal (%rax,%rax,4), %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%tmp.1 = icmp sgt i32 %a, -1			%tmp.1 = icmp sgt i32 %a, -1
	%retval = select i1 %tmp.1, i32 5, i32 0			%retval = select i1 %tmp.1, i32 5, i32 0
	ret i32 %retval			ret i32 %retval
	}			}

	; Compare if positive and select of constants where one constant is zero and the other is a single bit.			; Compare if positive and select of constants where one constant is zero and the other is a single bit.

	define i32 @pos_sel_special_constant(i32 %a) {			define i32 @pos_sel_special_constant(i32 %a) {
	; CHECK-NOBMI-LABEL: pos_sel_special_constant:			; CHECK-NOBMI-LABEL: pos_sel_special_constant:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	; CHECK-NOBMI-NEXT: xorl %eax, %eax			; CHECK-NOBMI-NEXT: xorl %eax, %eax
	; CHECK-NOBMI-NEXT: testl %edi, %edi			; CHECK-NOBMI-NEXT: testl %edi, %edi
	; CHECK-NOBMI-NEXT: setns %al			; CHECK-NOBMI-NEXT: setns %al
	; CHECK-NOBMI-NEXT: shll $9, %eax			; CHECK-NOBMI-NEXT: shll $9, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: pos_sel_special_constant:			; CHECK-BMI-LABEL: pos_sel_special_constant:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: shrl $22, %edi			; CHECK-BMI-NEXT: xorl %eax, %eax
	; CHECK-BMI-NEXT: notl %edi			; CHECK-BMI-NEXT: testl %edi, %edi
	; CHECK-BMI-NEXT: andl $512, %edi # imm = 0x200			; CHECK-BMI-NEXT: setns %al
	; CHECK-BMI-NEXT: movl %edi, %eax			; CHECK-BMI-NEXT: shll $9, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%tmp.1 = icmp sgt i32 %a, -1			%tmp.1 = icmp sgt i32 %a, -1
	%retval = select i1 %tmp.1, i32 512, i32 0			%retval = select i1 %tmp.1, i32 512, i32 0
	ret i32 %retval			ret i32 %retval
	}			}
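The BMI sequences for `pos_sel_constants` and `pos_sel_special_constant` above can be checked against the `select` they implement. The following is a sketch with hypothetical function names; registers are modelled as 32-bit values and `a` is a signed 32-bit input.

```python
U32 = (1 << 32) - 1


def setns(a):
    """testl %edi, %edi ; setns %al: 1 when the sign bit is clear."""
    return 1 if a >= 0 else 0


def pos_sel_constants_left(a):
    edi = (a >> 31) & U32  # sarl $31, %edi (arithmetic shift: 0 or all ones)
    edi ^= U32             # notl %edi
    return edi & 5         # andl $5, %edi


def pos_sel_constants_right(a):
    rax = setns(a)
    return rax + rax * 4   # leal (%rax,%rax,4), %eax


def pos_sel_special_left(a):
    edi = (a & U32) >> 22  # shrl $22, %edi (sign bit lands in bit 9)
    edi ^= U32             # notl %edi
    return edi & 512       # andl $512, %edi


def pos_sel_special_right(a):
    return setns(a) << 9   # shll $9, %eax


SAMPLES = [0, 1, -1, 5, 512, -512, 2**31 - 1, -(2**31)]
```

In both tests the two columns agree with `select (icmp sgt %a, -1), C, 0` for `C = 5` and `C = 512` respectively, so these diffs only change which instructions are chosen.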

	; Compare if positive and select variable or zero.			; Compare if positive and select variable or zero.

	[... 62 lines not shown ...]

test/CodeGen/X86/unfold-masked-merge-scalar-variablemask.ll

	[... 564 lines not shown ...]
	; CHECK-NOBMI-NEXT: notl %edi			; CHECK-NOBMI-NEXT: notl %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: notl %edi			; CHECK-NOBMI-NEXT: notl %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_constant_varx_mone:			; CHECK-BMI-LABEL: in_constant_varx_mone:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: andnl %edx, %edi, %eax
	; CHECK-BMI-NEXT: notl %edx			; CHECK-BMI-NEXT: notl %eax
	; CHECK-BMI-NEXT: orl %edi, %edx
	; CHECK-BMI-NEXT: movl %edx, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
				lebedev.ri (author): This improves.
				spatel: But as with AArch64, we don't care because instcombine would fold this?
	%n0 = xor i32 %x, -1 ; %x			%n0 = xor i32 %x, -1 ; %x
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, -1			%r = xor i32 %n1, -1
	ret i32 %r			ret i32 %r
	}			}
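For `@in_constant_varx_mone`, both BMI sequences reduce to `x | ~mask` by De Morgan's law. The sketch below enumerates a one-bit truth table, which is sufficient because every operation is bitwise. It assumes the AT&T operand order for `andnl`, where the middle operand is the one that gets inverted; the function names are illustrative.

```python
from itertools import product


def ir_pattern(x, m):
    n0 = x ^ 1         # %n0 = xor i32 %x, -1
    n1 = n0 & m        # %n1 = and i32 %n0, %mask
    return n1 ^ 1      # %r  = xor i32 %n1, -1


def left_column(x, m):
    edi = x & m        # andl %edx, %edi
    edx = m ^ 1        # notl %edx
    return edi | edx   # orl %edi, %edx ; movl %edx, %eax


def right_column(x, m):
    eax = (x ^ 1) & m  # andnl %edx, %edi, %eax  (~x & mask)
    return eax ^ 1     # notl %eax


def truth_table():
    """One row per (x, mask) bit pair: inputs, IR result, left and right results."""
    return [
        (x, m, ir_pattern(x, m), left_column(x, m), right_column(x, m))
        for x, m in product((0, 1), repeat=2)
    ]
```

Every row of `truth_table()` has identical IR, left, and right results, each equal to `x | (mask ^ 1)`, which matches the reading that the new sequence saves instructions without changing semantics.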
	define i32 @out_constant_varx_mone_invmask(i32 %x, i32 %y, i32 %mask) {			define i32 @out_constant_varx_mone_invmask(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: out_constant_varx_mone_invmask:			; CHECK-NOBMI-LABEL: out_constant_varx_mone_invmask:
	; CHECK-NOBMI: # %bb.0:			; CHECK-NOBMI: # %bb.0:
	[... 21 lines not shown ...]
	; CHECK-NOBMI-NEXT: notl %edi			; CHECK-NOBMI-NEXT: notl %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: notl %edi			; CHECK-NOBMI-NEXT: notl %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_constant_varx_mone_invmask:			; CHECK-BMI-LABEL: in_constant_varx_mone_invmask:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andnl %edi, %edx, %eax			; CHECK-BMI-NEXT: notl %edx
	; CHECK-BMI-NEXT: orl %edx, %eax			; CHECK-BMI-NEXT: andnl %edx, %edi, %eax
				; CHECK-BMI-NEXT: notl %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
				lebedev.ri (author): This degrades, but instcombine will canonicalize it to `@in_constant_mone_vary`, and that one is ok. So this is ok too.
	%notmask = xor i32 %mask, -1			%notmask = xor i32 %mask, -1
	%n0 = xor i32 %x, -1 ; %x			%n0 = xor i32 %x, -1 ; %x
	%n1 = and i32 %n0, %notmask			%n1 = and i32 %n0, %notmask
	%r = xor i32 %n1, -1			%r = xor i32 %n1, -1
	ret i32 %r			ret i32 %r
	}			}
	define i32 @out_constant_varx_42(i32 %x, i32 %y, i32 %mask) {			define i32 @out_constant_varx_42(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: out_constant_varx_42:			; CHECK-NOBMI-LABEL: out_constant_varx_42:
	[... 25 lines not shown ...]
	; CHECK-NOBMI-NEXT: xorl $42, %edi			; CHECK-NOBMI-NEXT: xorl $42, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl $42, %edi			; CHECK-NOBMI-NEXT: xorl $42, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_constant_varx_42:			; CHECK-BMI-LABEL: in_constant_varx_42:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %edi			; CHECK-BMI-NEXT: movl %edi, %eax
				lebedev.ri (author): Looked at this in llvm-mca, no clear winner. The change decreases instruction count and IPC, but the cycle count does not change. So I guess it's ok?
				lebedev.ri (author): On top of D46073, this `@in_constant_varx_42` pattern (i.e. `%y` being constant) is the only remaining issue.
				# *** IR Dump After Machine InstCombiner ***:
				# Machine code for function in_constant_varx_42: IsSSA, TracksLiveness
				Function Live Ins: $edi in %0, $edx in %2
				bb.0 (%ir-block.0):
				  liveins: $edi, $edx
				  %2:gr32 = COPY $edx
				  %0:gr32 = COPY $edi
				  %3:gr32 = AND32rr %0:gr32, %2:gr32, implicit-def dead $eflags
				  %4:gr32 = NOT32r %2:gr32
				  %5:gr32 = AND32ri8 %4:gr32, 42, implicit-def dead $eflags
				  %6:gr32 = OR32rr %3:gr32, killed %5:gr32, implicit-def dead $eflags
				  $eax = COPY %6:gr32
				  RET 0, $eax
				# End machine code for function in_constant_varx_42.
				This seems ok (as per mca) on aarch64, but I'm not so sure about x86. Attachment: diff.txt (2 KB).
				lebedev.ri (author): Right, in this case not only should I not unfold it, but I should also de-canonicalize the mask. Attachment: diff.txt (2 KB).
	; CHECK-BMI-NEXT: notl %edx			; CHECK-BMI-NEXT: xorl $42, %eax
	; CHECK-BMI-NEXT: andl $42, %edx			; CHECK-BMI-NEXT: andnl %eax, %edx, %eax
	; CHECK-BMI-NEXT: orl %edi, %edx			; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: movl %edx, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 %x, 42 ; %x			%n0 = xor i32 %x, 42 ; %x
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, 42			%r = xor i32 %n1, 42
	ret i32 %r			ret i32 %r
	}			}
	define i32 @out_constant_varx_42_invmask(i32 %x, i32 %y, i32 %mask) {			define i32 @out_constant_varx_42_invmask(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: out_constant_varx_42_invmask:			; CHECK-NOBMI-LABEL: out_constant_varx_42_invmask:
	[... 24 lines not shown ...]
	; CHECK-NOBMI-NEXT: notl %edx			; CHECK-NOBMI-NEXT: notl %edx
	; CHECK-NOBMI-NEXT: xorl $42, %edi			; CHECK-NOBMI-NEXT: xorl $42, %edi
	; CHECK-NOBMI-NEXT: andl %edx, %edi			; CHECK-NOBMI-NEXT: andl %edx, %edi
	; CHECK-NOBMI-NEXT: xorl $42, %edi			; CHECK-NOBMI-NEXT: xorl $42, %edi
	; CHECK-NOBMI-NEXT: movl %edi, %eax			; CHECK-NOBMI-NEXT: movl %edi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_constant_varx_42_invmask:			; CHECK-BMI-LABEL: in_constant_varx_42_invmask:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andnl %edi, %edx, %eax			; CHECK-BMI-NEXT: movl %edi, %eax
				lebedev.ri (author): As per mca this is an unimportant change, but again, instcombine will canonicalize it to `@in_constant_42_vary`, which is ok. So this one appears ok too.
	; CHECK-BMI-NEXT: andl $42, %edx			; CHECK-BMI-NEXT: xorl $42, %eax
	; CHECK-BMI-NEXT: orl %edx, %eax			; CHECK-BMI-NEXT: andl %edx, %eax
				; CHECK-BMI-NEXT: orl %edi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%notmask = xor i32 %mask, -1			%notmask = xor i32 %mask, -1
	%n0 = xor i32 %x, 42 ; %x			%n0 = xor i32 %x, 42 ; %x
	%n1 = and i32 %n0, %notmask			%n1 = and i32 %n0, %notmask
	%r = xor i32 %n1, 42			%r = xor i32 %n1, 42
	ret i32 %r			ret i32 %r
	}			}
	define i32 @out_constant_mone_vary(i32 %x, i32 %y, i32 %mask) {			define i32 @out_constant_mone_vary(i32 %x, i32 %y, i32 %mask) {
	[... 22 lines not shown ...]
	; CHECK-NOBMI-NEXT: movl %esi, %eax			; CHECK-NOBMI-NEXT: movl %esi, %eax
	; CHECK-NOBMI-NEXT: notl %eax			; CHECK-NOBMI-NEXT: notl %eax
	; CHECK-NOBMI-NEXT: andl %edx, %eax			; CHECK-NOBMI-NEXT: andl %edx, %eax
	; CHECK-NOBMI-NEXT: xorl %esi, %eax			; CHECK-NOBMI-NEXT: xorl %esi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_constant_mone_vary:			; CHECK-BMI-LABEL: in_constant_mone_vary:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andnl %esi, %edx, %eax			; CHECK-BMI-NEXT: andnl %edx, %esi, %eax
	; CHECK-BMI-NEXT: orl %edx, %eax			; CHECK-BMI-NEXT: xorl %esi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%n0 = xor i32 -1, %y ; %x			%n0 = xor i32 -1, %y ; %x
	%n1 = and i32 %n0, %mask			%n1 = and i32 %n0, %mask
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @out_constant_mone_vary_invmask(i32 %x, i32 %y, i32 %mask) {			define i32 @out_constant_mone_vary_invmask(i32 %x, i32 %y, i32 %mask) {
	; CHECK-NOBMI-LABEL: out_constant_mone_vary_invmask:			; CHECK-NOBMI-LABEL: out_constant_mone_vary_invmask:
	[... 24 lines not shown ...]
	; CHECK-NOBMI-NEXT: movl %esi, %eax			; CHECK-NOBMI-NEXT: movl %esi, %eax
	; CHECK-NOBMI-NEXT: notl %eax			; CHECK-NOBMI-NEXT: notl %eax
	; CHECK-NOBMI-NEXT: andl %edx, %eax			; CHECK-NOBMI-NEXT: andl %edx, %eax
	; CHECK-NOBMI-NEXT: xorl %esi, %eax			; CHECK-NOBMI-NEXT: xorl %esi, %eax
	; CHECK-NOBMI-NEXT: retq			; CHECK-NOBMI-NEXT: retq
	;			;
	; CHECK-BMI-LABEL: in_constant_mone_vary_invmask:			; CHECK-BMI-LABEL: in_constant_mone_vary_invmask:
	; CHECK-BMI: # %bb.0:			; CHECK-BMI: # %bb.0:
	; CHECK-BMI-NEXT: andl %edx, %esi
	; CHECK-BMI-NEXT: notl %edx			; CHECK-BMI-NEXT: notl %edx
	; CHECK-BMI-NEXT: orl %esi, %edx			; CHECK-BMI-NEXT: andnl %edx, %esi, %eax
	; CHECK-BMI-NEXT: movl %edx, %eax			; CHECK-BMI-NEXT: xorl %esi, %eax
	; CHECK-BMI-NEXT: retq			; CHECK-BMI-NEXT: retq
	%notmask = xor i32 %mask, -1			%notmask = xor i32 %mask, -1
	%n0 = xor i32 -1, %y ; %x			%n0 = xor i32 -1, %y ; %x
	%n1 = and i32 %n0, %notmask			%n1 = and i32 %n0, %notmask
	%r = xor i32 %n1, %y			%r = xor i32 %n1, %y
	ret i32 %r			ret i32 %r
	}			}
	define i32 @out_constant_42_vary(i32 %x, i32 %y, i32 %mask) {			define i32 @out_constant_42_vary(i32 %x, i32 %y, i32 %mask) {
	[... 244 lines not shown ...]

Revision Contents

Diff 145020
