This is an archive of the discontinued LLVM Phabricator instance.

Differential D55274

[DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
ClosedPublic

Authored by andreadb on Dec 4 2018, 8:12 AM.

Details

Reviewers

RKSimon
craig.topper
spatel

Commits

rG52a2bac583f3: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.
rL348522: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.

Summary

This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes:

concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A)
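The rule above can be illustrated on a toy expression tree. This is only a sketch: `Opcode`, `Node`, `make` and `foldConcatOfScalarToVector` are hypothetical stand-ins invented for illustration, not SelectionDAG types, and the one-use checks that the real combine performs are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for DAG concepts; none of these are LLVM types.
enum class Opcode { Undef, Scalar, ScalarToVector, Bitcast, ConcatVectors };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Opcode Op;
  unsigned Bits;            // total width of the value in bits
  std::vector<NodePtr> Ops; // operands
};

NodePtr make(Opcode Op, unsigned Bits, std::vector<NodePtr> Ops = {}) {
  return std::make_shared<Node>(Node{Op, Bits, std::move(Ops)});
}

// concat_vectors(bitcast(scalar_to_vector A), undef...) -->
//   bitcast(scalar_to_vector A)
// Returns the simplified node, or nullptr when the pattern does not match.
NodePtr foldConcatOfScalarToVector(const NodePtr &N) {
  if (N->Op != Opcode::ConcatVectors || N->Ops.size() < 2)
    return nullptr;

  const NodePtr &Op0 = N->Ops[0];
  if (Op0->Op != Opcode::Bitcast || Op0->Ops.empty() ||
      Op0->Ops[0]->Op != Opcode::ScalarToVector || Op0->Ops[0]->Ops.empty())
    return nullptr;

  // Every operand after the first must be undef.
  for (std::size_t I = 1; I < N->Ops.size(); ++I)
    if (N->Ops[I]->Op != Opcode::Undef)
      return nullptr;

  // Rebuild the scalar_to_vector at the full result width, then bitcast it.
  NodePtr Scalar = Op0->Ops[0]->Ops[0];
  NodePtr STV = make(Opcode::ScalarToVector, N->Bits, {Scalar});
  return make(Opcode::Bitcast, N->Bits, {STV});
}
```

In this model a 256-bit concat of a 128-bit `bitcast(scalar_to_vector A)` with one undef operand folds to a 256-bit `bitcast(scalar_to_vector A)`, while a concat whose second operand is defined is left alone.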

This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case from Craig in PR39257; that particular case would probably require a more complicated approach (and knowledge about used bits).

Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64, -mattr=+avx):

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vxorps  %xmm1, %xmm1, %xmm1
vblendps        $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
vmovaps %ymm0, (%rsi)
vzeroupper
retq

Now we generate this:

vmovsd  (%rdi), %xmm0           # xmm0 = mem[0],zero
vmovaps %ymm0, (%rsi)
vzeroupper
retq

As a side note: that VZEROUPPER is completely redundant...

I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on -mcpu=btver2, we don't get that vzeroupper because the vzeroupper insertion pass is disabled.
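To see why the vxorps/vblendps pair in the old sequence is redundant, here is a small lane-level model of the two sequences. It is a sketch under stated assumptions: `Ymm`, `vmovsdLoad`, `vxorpsZero`, `vblendps128`, `oldSequence` and `newSequence` are hypothetical helper names, and the model only captures the lane behaviour described in the assembly comments above (a VEX-encoded 128-bit write leaves the upper lanes of the ymm register zero).

```cpp
#include <array>
#include <cassert>

// Eight 32-bit lanes of a 256-bit register (toy model, not real hardware state).
using Ymm = std::array<float, 8>;

// vmovsd (mem), %xmm0: loads 64 bits (two float lanes here), zeroes the rest.
Ymm vmovsdLoad(const float *Mem) {
  Ymm R{}; // value-initialized: every lane starts at zero
  R[0] = Mem[0];
  R[1] = Mem[1];
  return R;
}

// vxorps %xmm1, %xmm1, %xmm1: an all-zero register.
Ymm vxorpsZero() { return Ymm{}; }

// 128-bit vblendps: bit I of Imm set picks lane I from SrcIfSet, otherwise
// from SrcIfClear. Lanes 4..7 stay zero.
Ymm vblendps128(unsigned Imm, const Ymm &SrcIfSet, const Ymm &SrcIfClear) {
  Ymm R{};
  for (unsigned I = 0; I < 4; ++I)
    R[I] = ((Imm >> I) & 1u) ? SrcIfSet[I] : SrcIfClear[I];
  return R;
}

// Code generated before the patch: load, zero, blend.
Ymm oldSequence(const float *Mem) {
  Ymm X0 = vmovsdLoad(Mem);
  Ymm X1 = vxorpsZero();
  return vblendps128(3, X0, X1); // xmm0 = xmm0[0,1],xmm1[2,3]
}

// Code generated after the patch: the load alone.
Ymm newSequence(const float *Mem) { return vmovsdLoad(Mem); }
```

Both functions produce the same eight lanes for any input, which is the property the new combine relies on.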

Diff Detail

Event Timeline

andreadb created this revision. Dec 4 2018, 8:12 AM

Maybe call the test file combine-concatvectors.ll? That matches the other filenames we have.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16327	Use peekThroughOneUseBitcasts ?
16342	This is all very similar to concat_vectors(scalar, undef) -> scalar_to_vector(sclr) at the beginning of visitCONCAT_VECTORS

Thanks Simon for the feedback.
My new combine logic was indeed very similar to the existing logic in visitCONCAT_VECTORS.
This patch reuses that logic and introduces a new rule for the case where the first operand of the concat_vector is a scalar_to_vector.

For readability (and simplicity), I moved all of that logic from visitCONCAT_VECTORS into a separate function.

I had to tweak combineConcatVectorOfScalars() because, before this patch, visitCONCAT_VECTORS used to exit early if it failed to fold a 'concat_vectors of a scalar'.
With this patch, we don't bail out immediately from visitCONCAT_VECTORS. Instead, we try other combine rules. As a consequence, we potentially end up calling combineConcatVectorOfScalars() more often.
The problem with running combineConcatVectorOfScalars() more often is that there are cases where it returns a sub-optimal BUILD_VECTOR(Scalar, UNDEF) instead of a much simpler (and more canonical) SCALAR_TO_VECTOR(Scalar).
That particular build_vector is poorly lowered by the x86 backend if the target only has SSE but not SSE2. So, test CodeGen/X86/vec_fneg.ll was reporting a failure after that refactoring (using -mtriple=i686 -mattr=+sse). Fixing that particular corner case was enough to get back the "expected codegen".

RKSimon added inline comments. Dec 5 2018, 7:45 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16492	check for implicit truncation of the scalar value
test/CodeGen/X86/combine-concatvectors.ll
2 ↗	(On Diff #176813)	Please commit this with trunk's current codegen and rebase so we see the diff

Also, it's probably worth doing the initial simplifyConcatVectors refactor as an NFC

andreadb marked 2 inline comments as done. Dec 5 2018, 7:57 AM

andreadb added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16492	Right. I will add that check.
test/CodeGen/X86/combine-concatvectors.ll
2 ↗	(On Diff #176813)	Will do.
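The implicit-truncation concern raised at line 16492 can be sketched as a guard on bit widths. A scalar_to_vector node may take an integer scalar that is wider than the vector element type, in which case only the low bits are used; rebuilding the vector with the wider scalar type would then expose bits that were never part of the value. The helper below is a hypothetical illustration (the name `rebuiltElementCount` and its parameters are assumptions, not code from the patch).

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: number of elements for the rebuilt scalar_to_vector,
// or std::nullopt when the fold should be skipped.
//   ResultBits  - width of the concat_vectors result
//   ScalarBits  - width of the scalar operand
//   ElementBits - element width of the original scalar_to_vector result
std::optional<unsigned> rebuiltElementCount(unsigned ResultBits,
                                            unsigned ScalarBits,
                                            unsigned ElementBits) {
  // A scalar wider than the element type is implicitly truncated; skip it.
  if (ScalarBits == 0 || ScalarBits != ElementBits)
    return std::nullopt;
  // The result must be an exact multiple of the scalar width.
  if (ResultBits % ScalarBits != 0)
    return std::nullopt;
  return ResultBits / ScalarBits;
}
```

For a 256-bit result built from a 64-bit scalar with 64-bit elements this yields 4 elements; a 32-bit scalar feeding 8-bit elements is rejected.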

Diffusion mentioned this in rL348380: [X86] Add test case to show missed opportunity to combine a concat_vector into…. Dec 5 2018, 8:26 AM

Patch updated.

Initially, I wanted to move that combine logic into a separate function.
However, the presence of an early return in the original combine logic ended up causing problems once the code was moved to a separate function.
I was originally under the impression that the change was safe. However, after running more tests, I found that if we don't bail out immediately from visitCONCAT_VECTORS, we may end up introducing odd regressions due to the presence of illegal build_vector DAG nodes when testing for an SSE1 i686 target, since we potentially trigger other combine rules, which may in turn introduce more problematic illegal vector types.

So, I opted for this simpler (and safer) approach.
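The early-return problem described above is easy to reproduce in miniature. The sketch below uses hypothetical names throughout (`Input`, `laterCombines`, `visitInline`, `scalarConcatHelper`, `visitExtracted`) and strings in place of DAG nodes; it only shows how the control flow changes when inline logic with an early exit is moved into a helper.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Empty result means "no change was made".
using Result = std::optional<std::string>;

struct Input {
  bool IsConcatOfScalar;   // the node matches the 'concat of a scalar' shape
  bool ScalarFoldSucceeds; // the scalar fold is actually able to fire
};

// Pretend some later combine rule always fires when it is reached.
Result laterCombines(const Input &) { return std::string{"build_vector"}; }

// Original shape: the logic is inline, so a failed fold leaves the visitor.
Result visitInline(const Input &In) {
  if (In.IsConcatOfScalar) {
    if (In.ScalarFoldSucceeds)
      return std::string{"scalar_to_vector"};
    return std::nullopt; // early exit: later rules never run
  }
  return laterCombines(In);
}

// Extracted shape: the helper can only report "no change" to its caller.
Result scalarConcatHelper(const Input &In) {
  if (In.IsConcatOfScalar && In.ScalarFoldSucceeds)
    return std::string{"scalar_to_vector"};
  return std::nullopt;
}

Result visitExtracted(const Input &In) {
  if (Result R = scalarConcatHelper(In))
    return R;
  return laterCombines(In); // now reached even when the scalar fold failed
}
```

With `Input{true, false}` the inline version returns nothing, while the extracted version falls through and produces a build_vector, which mirrors the kind of regression reported for the SSE1 i686 target.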

LGTM - thanks

This revision is now accepted and ready to land. Dec 6 2018, 8:24 AM

Closed by commit rL348522: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef. (authored by adibiagio). Dec 6 2018, 11:58 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	34 lines
test/CodeGen/X86/simplify_concat_vectors.ll	18 lines

Diff 176636

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ 16,304 lines not shown @@ if (SDValue V = reduceBuildVecExtToExtBuildVec(N))
     return V;

   if (SDValue V = reduceBuildVecToShuffle(N))
     return V;

   return SDValue();
 }

-static SDValue combineConcatVectorOfScalars(SDNode *N, SelectionDAG &DAG) {
+static SDValue combineConcatVectorOfScalars(SDNode *N, SelectionDAG &DAG,
+                                            unsigned LegalOperations) {
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   EVT OpVT = N->getOperand(0).getValueType();

+  SDLoc DL(N);
+  EVT VT = N->getValueType(0);
+
+  // concat_vectors( bitcast (scalar_to_vector %A), UNDEF) -->
+  // bitcast (scalar_to_vector %A)
+  if (!LegalOperations && N->getNumOperands() > 1) {
+    SDValue Op0 = N->getOperand(0);
+    if (Op0.hasOneUse() && Op0.getOpcode() == ISD::BITCAST &&
+        Op0.getOperand(0).hasOneUse() &&
+        Op0.getOperand(0).getOpcode() == ISD::SCALAR_TO_VECTOR) {
RKSimon: Use peekThroughOneUseBitcasts ?
+      bool AllUndefs =
+          std::all_of(N->op_begin() + 1, N->op_end(),
+                      [](const SDValue &U) { return U.isUndef(); });
+
+      if (AllUndefs) {
+        SDValue Scalar = Op0.getOperand(0).getOperand(0);
+        EVT SVT = Scalar.getValueType();
+
+        EVT NewVT = EVT::getVectorVT(*DAG.getContext(), SVT,
+                                     VT.getSizeInBits() / SVT.getSizeInBits());
+        SDValue STV = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, NewVT, Scalar);
+        return DAG.getBitcast(VT, STV);
+      }
+    }
+  }
RKSimon: This is all very similar to concat_vectors(scalar, undef) -> scalar_to_vector(sclr) at the beginning of visitCONCAT_VECTORS
+
   // If the operands are legal vectors, leave them alone.
   if (TLI.isTypeLegal(OpVT))
     return SDValue();

-  SDLoc DL(N);
-  EVT VT = N->getValueType(0);
   SmallVector<SDValue, 8> Ops;

   EVT SVT = EVT::getIntegerVT(*DAG.getContext(), OpVT.getSizeInBits());
   SDValue ScalarUndef = DAG.getNode(ISD::UNDEF, DL, SVT);

   // Keep track of what we encounter.
   bool AnyInteger = false;
   bool AnyFP = false;
   for (const SDValue &Op : N->ops()) {
     if (ISD::BITCAST == Op.getOpcode() &&
@@ 127 lines not shown @@ if (ISD::allOperandsUndef(N))
     return DAG.getUNDEF(VT);

   // Optimize concat_vectors where all but the first of the vectors are undef.
   if (std::all_of(std::next(N->op_begin()), N->op_end(), [](const SDValue &Op) {
         return Op.isUndef();
       })) {
     SDValue In = N->getOperand(0);
     assert(In.getValueType().isVector() && "Must concat vectors");

RKSimon: check for implicit truncation of the scalar value
andreadb: Right. I will add that check.
     // Transform: concat_vectors(scalar, undef) -> scalar_to_vector(sclr).
     if (In->getOpcode() == ISD::BITCAST &&
         !In->getOperand(0).getValueType().isVector()) {
       SDValue Scalar = In->getOperand(0);

       // If the bitcast type isn't legal, it might be a trunc of a legal type;
       // look through the trunc so we can still do the transform:
       //   concat_vectors(trunc(scalar), undef) -> scalar_to_vector(scalar)
@@ 69 lines not shown @@ if (llvm::all_of(N->ops(), IsBuildVectorOrUndef)) {
     }

     assert(VT.getVectorNumElements() == Opnds.size() &&
            "Concat vector type mismatch");
     return DAG.getBuildVector(VT, SDLoc(N), Opnds);
   }

   // Fold CONCAT_VECTORS of only bitcast scalars (or undef) to BUILD_VECTOR.
-  if (SDValue V = combineConcatVectorOfScalars(N, DAG))
+  if (SDValue V = combineConcatVectorOfScalars(N, DAG, LegalOperations))
     return V;

   // Fold CONCAT_VECTORS of EXTRACT_SUBVECTOR (or undef) to VECTOR_SHUFFLE.
   if (Level < AfterLegalizeVectorOps && TLI.isTypeLegal(VT))
     if (SDValue V = combineConcatVectorOfExtracts(N, DAG))
       return V;

   // Type legalization of vectors and DAG canonicalization of SHUFFLE_VECTOR
@@ 2,531 lines not shown @@

test/CodeGen/X86/simplify_concat_vectors.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+avx < %s | FileCheck %s

define void @PR32957(<2 x float>* %in, <8 x float>* %out) {
; CHECK-LABEL: PR32957:
; CHECK: # %bb.0:
; CHECK-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
; CHECK-NEXT: vmovaps %ymm0, (%rsi)
; CHECK-NEXT: vzeroupper
; CHECK-NEXT: retq
  %ld = load <2 x float>, <2 x float>* %in, align 8
  %ext = extractelement <2 x float> %ld, i64 0
  %ext2 = extractelement <2 x float> %ld, i64 1
  %ins = insertelement <8 x float> <float undef, float undef, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0, float 0.0>, float %ext, i64 0
  %ins2 = insertelement <8 x float> %ins, float %ext2, i64 1
  store <8 x float> %ins2, <8 x float>* %out, align 32
  ret void
}