Download Raw Diff

Details

Reviewers

rengolin
t.p.northover
gberry
jmolloy
sbaranga
mcrosier

Commits

rG4355e404d50c: [AArch64] Add DAG combine for extract extend pattern
rL255895: [AArch64] Add DAG combine for extract extend pattern

Summary

This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The extend is unnecessary since it is performed
implicitly by the extract.

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 42810.Dec 14 2015, 6:24 PM

mssimpso retitled this revision from to [AArch64] Add missing extract extend patterns.

mssimpso updated this object.

mssimpso added reviewers: t.p.northover, sbaranga, rengolin, mcrosier, gberry.

mssimpso added a subscriber: llvm-commits.

Herald added subscribers: rengolin, aemerson. · View Herald TranscriptDec 14 2015, 6:24 PM

sbaranga added inline comments.Dec 15 2015, 3:02 AM

lib/Target/AArch64/AArch64InstrInfo.td
3809 ↗	(On Diff #42810)	Any idea why we don't have (anyext (vector_extract (...) ) -> vector_extract ( )? I see that vector_extract is EXTRACT_VECTOR_ELT, which according to ISDOpcodes.h should be able to perform the anyext operation. If that would happen, wouldn't one of the patterns above match this?

Thanks very much for the quick reply, Silviu! Please see my response inline.

lib/Target/AArch64/AArch64InstrInfo.td
3809 ↗	(On Diff #42810)	I see what you're getting at, but as far as I know (please correct me if I'm wrong), "vector_extract" can't be used in an output pattern.

gberry added inline comments.Dec 15 2015, 7:45 AM

lib/Target/AArch64/AArch64InstrInfo.td
3809 ↗	(On Diff #42810)	I think what Silvu is getting at is that maybe a better place to fix this would be to add a dag-combine that did (anyext (vector_extract (x)) -> vector_extract(x) (with the wider type).

Reimplemented change as a DAG combine following feedback from Silviu and Geoff.

I first attemted to make this a target-independent combine, but that broke a lot of X86 tests. If we like the approach, I think it's best to limit the combine to AArch64 for now.

mssimpso retitled this revision from [AArch64] Add missing extract extend patterns to [AArch64] Add DAG combine for extract extend pattern.Dec 15 2015, 12:37 PM

mssimpso updated this object.

Get the DAG combine right this time.

Check for illegal vector types.

jmolloy requested changes to this revision.Dec 16 2015, 1:06 AM

jmolloy added a reviewer: jmolloy.

jmolloy added a subscriber: jmolloy.

jmolloy added inline comments.

lib/Target/AArch64/AArch64ISelLowering.cpp

8444

I suppose we need the sign extend because ISD::EXTRACT_VECTOR_ELT gets matched to "smov"? Perhaps that's something that needs changing - surely we should match (sign_ext_inreg (extract_vector_elt)) to smov and (zero_ext_inreg (extract_vector_elt)) to umov.

I think there needs to be more type checking going on here. Consider:

%0 = EXTRACT_VECTOR_ELT <8 x i16> %a  ; %0 is i16
%1 = ANYEXT i16 %0 to i32 ; %1 is i32
%2 = SIGN_EXTEND_INREG i32 %0 (from i8 to i32) ; bit[7] is replicated into bit[8..31]

This cannot be an SMOVv8i16, because the sign bit is bit[7], not bit[15]. This testcase shows it:

define i64 @matt(<8 x i16> %a) {
  %b = extractelement <8 x i16> %a, i32 1
  %c = trunc i16 %b to i8
  %d = sext i8 %c to i64
  ret i64 %d
}

SelectionDAG has 11 nodes:
t0: ch = EntryToken
        t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %vreg0
      t14: i32 = extract_vector_elt t2, Constant:i64<1>
    t15: i64 = any_extend t14
  t13: i64 = sign_extend_inreg t15, ValueType:ch:i8
t9: ch,glue = CopyToReg t0, Register:i64 %X0, t13
t10: ch = AArch64ISD::RET_FLAG t9, Register:i64 %X0, t9:1

This revision now requires changes to proceed.Dec 16 2015, 1:06 AM

Added additional type checking noted by James.

mssimpso marked an inline comment as done.Dec 16 2015, 7:54 AM

mssimpso added inline comments.

lib/Target/AArch64/AArch64ISelLowering.cpp
8444	James, You're right. Thanks for joining the review! We also need to check that the vector element type is the same as the type we're sign extending from. Yes, this combine is intended to canonicalize the patterns we match to SMOV. The existing patterns attempt match (sext_inreg (vector_extract v, i)) for the legal types. However, the 16xi8-to-i64 case and the 8xi16-to-i64 case (which we have patterns for) don't actually follow this form because there is an added any_extend. An any_extend from i32 to i64 is added by the type legalizer because it legalizes the extracted elements to i32. So for example, for the case below: define i64 @f(<16 x i8> %a) { entry: %e = extractelement <16 x i8> %a, i32 2 %b = zext i8 %e to i64 ret i64 %b } we currently generate: umov w8, v0.b[2] sxtb x0, w8 With this DAG combine, we will instead generate the following code because the existing 16xi8-to-i64 pattern will fire. smov x0, v0.b[2]

mssimpso marked an inline comment as done.Dec 16 2015, 7:57 AM

mssimpso added inline comments.

lib/Target/AArch64/AArch64ISelLowering.cpp
8444	Sorry, that should have been a sext in the test case above.

Type checking.

James, you may need to go into Phabricator and unblock the patch.

From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: Wednesday, December 16, 2015 11:55 AM
To: reviews+D15515+public+f270c3ee04f7413b@reviews.llvm.org; Matthew Simpson; t.p.northover@gmail.com; silviu.baranga@arm.com; renato.golin@linaro.org; mcrosier@codeaurora.org; gberry@codeaurora.org; james.molloy@arm.com
Cc: kanheim@a-bix.com; llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D15515: [AArch64] Add DAG combine for extract extend pattern

That makes sense to me. LGTM!

LGTM, per James.

Closed by commit rL255895: [AArch64] Add DAG combine for extract extend pattern (authored by mssimpso). · Explain WhyDec 17 2015, 6:34 AM

This revision was automatically updated to reflect the committed changes.

Thanks very much Silviu, James, Geoff, and Chad for the feedback.

I reverted this patch because it was breaking some internal tests. After looking at the failing test case, I don't think the DAG is the right place to do this. The issue in the failing test was that the sign_extend was first matched to another pattern (a widening operation like ssubw). Without the sign_extend, the widened extract_vector_elt could no longer be matched to anything. See the code below for an example.

define i64 @f(<8 x i16> %a, i64 %b) {
  %d = extractelement <8 x i16> %a, i32 2
  %e = sext i16 %d to i64
  %t1 = sub nsw i64 %b, %e
  ret i64 %t1
}

I think the initial revision of the patch, which added the two additional patterns to the target description, is the safer approach. We'll match the smov cases, but won't risk making the DAG illegal. Were there any objections to the initial revision?

If I understand correctly, you may be able to just add patterns for the vector_extracts without extends/ands to handle this case (along with the current dagcombine).
I'm not sure this will buy you anything more than just using the original patterns you suggested though.

Ok, it seems that the original solution is the best way to do this then.

Diff 42932

lib/Target/AArch64/AArch64ISelLowering.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 8,431 Lines • ▼ Show 20 Lines	if (IID == Intrinsic::aarch64_neon_sabd \|\|
if (!NewABD.getNode())		if (!NewABD.getNode())
return SDValue();		return SDValue();

return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), N->getValueType(0),		return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), N->getValueType(0),
NewABD);		NewABD);
}		}
}		}

		const TargetLowering &TLI = DAG.getTargetLoweringInfo();

		// If we see (any_extend (extract_vector_element v, i)), we can potentially
		// remove the extend and promote the extract. We can do this if the vector
		// type is legal and if we know the result is sign extended.
		jmolloyUnsubmitted Done Reply Inline Actions I suppose we need the sign extend because ISD::EXTRACT_VECTOR_ELT gets matched to "smov"? Perhaps that's something that needs changing - surely we should match (sign_ext_inreg (extract_vector_elt)) to smov and (zero_ext_inreg (extract_vector_elt)) to umov. I think there needs to be more type checking going on here. Consider: %0 = EXTRACT_VECTOR_ELT <8 x i16> %a ; %0 is i16 %1 = ANYEXT i16 %0 to i32 ; %1 is i32 %2 = SIGN_EXTEND_INREG i32 %0 (from i8 to i32) ; bit[7] is replicated into bit[8..31] This cannot be an SMOVv8i16, because the sign bit is bit[7], not bit[15]. This testcase shows it: define i64 @matt(<8 x i16> %a) { %b = extractelement <8 x i16> %a, i32 1 %c = trunc i16 %b to i8 %d = sext i8 %c to i64 ret i64 %d } SelectionDAG has 11 nodes: t0: ch = EntryToken t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %vreg0 t14: i32 = extract_vector_elt t2, Constant:i64<1> t15: i64 = any_extend t14 t13: i64 = sign_extend_inreg t15, ValueType:ch:i8 t9: ch,glue = CopyToReg t0, Register:i64 %X0, t13 t10: ch = AArch64ISD::RET_FLAG t9, Register:i64 %X0, t9:1 jmolloy: I suppose we need the sign extend because ISD::EXTRACT_VECTOR_ELT gets matched to "smov"?
		mssimpsoAuthorUnsubmitted Not Done Reply Inline Actions James, You're right. Thanks for joining the review! We also need to check that the vector element type is the same as the type we're sign extending from. Yes, this combine is intended to canonicalize the patterns we match to SMOV. The existing patterns attempt match (sext_inreg (vector_extract v, i)) for the legal types. However, the 16xi8-to-i64 case and the 8xi16-to-i64 case (which we have patterns for) don't actually follow this form because there is an added any_extend. An any_extend from i32 to i64 is added by the type legalizer because it legalizes the extracted elements to i32. So for example, for the case below: define i64 @f(<16 x i8> %a) { entry: %e = extractelement <16 x i8> %a, i32 2 %b = zext i8 %e to i64 ret i64 %b } we currently generate: umov w8, v0.b[2] sxtb x0, w8 With this DAG combine, we will instead generate the following code because the existing 16xi8-to-i64 pattern will fire. smov x0, v0.b[2] mssimpso: James, You're right. Thanks for joining the review! We also need to check that the vector…
		mssimpsoAuthorUnsubmitted Not Done Reply Inline Actions Sorry, that should have been a sext in the test case above. mssimpso: Sorry, that should have been a sext in the test case above.
		if (DCI.isAfterLegalizeVectorOps() && N->getOpcode() == ISD::ANY_EXTEND &&
		N->hasOneUse() && N->use_begin()->getOpcode() == ISD::SIGN_EXTEND_INREG) {
		auto N0 = N->getOperand(0);
		auto VT = N->getValueType(0);
		if (N0.getNode()->hasOneUse() && N0.getOpcode() == ISD::EXTRACT_VECTOR_ELT)
		if (TLI.isTypeLegal(N0.getOperand(0).getValueType()))
		return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SDLoc(N), VT,
		N0.getOperand(0), N0.getOperand(1));
		}

// This is effectively a custom type legalization for AArch64.		// This is effectively a custom type legalization for AArch64.
//		//
// Type legalization will split an extend of a small, legal, type to a larger		// Type legalization will split an extend of a small, legal, type to a larger
// illegal type by first splitting the destination type, often creating		// illegal type by first splitting the destination type, often creating
// illegal source types, which then get legalized in isel-confusing ways,		// illegal source types, which then get legalized in isel-confusing ways,
// leading to really terrible codegen. E.g.,		// leading to really terrible codegen. E.g.,
// %result = v8i32 sext v8i8 %value		// %result = v8i32 sext v8i8 %value
// becomes		// becomes
Show All 14 Lines	static SDValue performExtendCombine(SDNode *N,
// This is pre-legalization to catch some cases where the default		// This is pre-legalization to catch some cases where the default
// type legalization will create ill-tempered code.		// type legalization will create ill-tempered code.
if (!DCI.isBeforeLegalizeOps())		if (!DCI.isBeforeLegalizeOps())
return SDValue();		return SDValue();

// We're only interested in cleaning things up for non-legal vector types		// We're only interested in cleaning things up for non-legal vector types
// here. If both the source and destination are legal, things will just		// here. If both the source and destination are legal, things will just
// work naturally without any fiddling.		// work naturally without any fiddling.
const TargetLowering &TLI = DAG.getTargetLoweringInfo();
EVT ResVT = N->getValueType(0);		EVT ResVT = N->getValueType(0);
if (!ResVT.isVector() \|\| TLI.isTypeLegal(ResVT))		if (!ResVT.isVector() \|\| TLI.isTypeLegal(ResVT))
return SDValue();		return SDValue();
// If the vector type isn't a simple VT, it's beyond the scope of what		// If the vector type isn't a simple VT, it's beyond the scope of what
// we're worried about here. Let legalization do its thing and hope for		// we're worried about here. Let legalization do its thing and hope for
// the best.		// the best.
SDValue Src = N->getOperand(0);		SDValue Src = N->getOperand(0);
EVT SrcVT = Src->getValueType(0);		EVT SrcVT = Src->getValueType(0);
▲ Show 20 Lines • Show All 1,527 Lines • Show Last 20 Lines

test/CodeGen/AArch64/arm64-neon-copy.ll

	Show First 20 Lines • Show All 314 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: smovw8h:			; CHECK-LABEL: smovw8h:
	; CHECK: smov {{w[0-9]+}}, {{v[0-9]+}}.h[2]			; CHECK: smov {{w[0-9]+}}, {{v[0-9]+}}.h[2]
	%tmp3 = extractelement <8 x i16> %tmp1, i32 2			%tmp3 = extractelement <8 x i16> %tmp1, i32 2
	%tmp4 = sext i16 %tmp3 to i32			%tmp4 = sext i16 %tmp3 to i32
	%tmp5 = add i32 %tmp4, %tmp4			%tmp5 = add i32 %tmp4, %tmp4
	ret i32 %tmp5			ret i32 %tmp5
	}			}

	define i32 @smovx16b(<16 x i8> %tmp1) {			define i64 @smovx16b(<16 x i8> %tmp1) {
	; CHECK-LABEL: smovx16b:			; CHECK-LABEL: smovx16b:
	; CHECK: smov {{[xw][0-9]+}}, {{v[0-9]+}}.b[8]			; CHECK: smov {{x[0-9]+}}, {{v[0-9]+}}.b[8]
	%tmp3 = extractelement <16 x i8> %tmp1, i32 8			%tmp3 = extractelement <16 x i8> %tmp1, i32 8
	%tmp4 = sext i8 %tmp3 to i32			%tmp4 = sext i8 %tmp3 to i64
	%tmp5 = add i32 %tmp4, %tmp4			ret i64 %tmp4
	ret i32 %tmp5
	}			}

	define i32 @smovx8h(<8 x i16> %tmp1) {			define i64 @smovx8h(<8 x i16> %tmp1) {
	; CHECK-LABEL: smovx8h:			; CHECK-LABEL: smovx8h:
	; CHECK: smov {{[xw][0-9]+}}, {{v[0-9]+}}.h[2]			; CHECK: smov {{x[0-9]+}}, {{v[0-9]+}}.h[2]
	%tmp3 = extractelement <8 x i16> %tmp1, i32 2			%tmp3 = extractelement <8 x i16> %tmp1, i32 2
	%tmp4 = sext i16 %tmp3 to i32			%tmp4 = sext i16 %tmp3 to i64
	ret i32 %tmp4			ret i64 %tmp4
	}			}

	define i64 @smovx4s(<4 x i32> %tmp1) {			define i64 @smovx4s(<4 x i32> %tmp1) {
	; CHECK-LABEL: smovx4s:			; CHECK-LABEL: smovx4s:
	; CHECK: smov {{x[0-9]+}}, {{v[0-9]+}}.s[2]			; CHECK: smov {{x[0-9]+}}, {{v[0-9]+}}.s[2]
	%tmp3 = extractelement <4 x i32> %tmp1, i32 2			%tmp3 = extractelement <4 x i32> %tmp1, i32 2
	%tmp4 = sext i32 %tmp3 to i64			%tmp4 = sext i32 %tmp3 to i64
	ret i64 %tmp4			ret i64 %tmp4
	▲ Show 20 Lines • Show All 1,100 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add DAG combine for extract extend pattern
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 42932

lib/Target/AArch64/AArch64ISelLowering.cpp

test/CodeGen/AArch64/arm64-neon-copy.ll

This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add DAG combine for extract extend patternClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 42932

lib/Target/AArch64/AArch64ISelLowering.cpp

test/CodeGen/AArch64/arm64-neon-copy.ll

[AArch64] Add DAG combine for extract extend pattern
ClosedPublic