This is an archive of the discontinued LLVM Phabricator instance.


Differential D99437

[AArch64] Remove custom zext/sext legalization code.
ClosedPublic

Authored by fhahn on Mar 26 2021, 12:57 PM.

Details

Reviewers

david-arm
dmgreen
t.p.northover
aemerson

Commits

rG482283042f79: [AArch64] Remove custom zext/sext legalization code.

Summary

Currently performExtendCombine assumes that the src-element bitwidth * 2
is a valid MVT. But this is not the case for i1 and it causes a crash on
the v64i1 test cases added in this patch.

It turns out that this code appears to not be needed; the same patterns are
handled by other code and we end up with the same results, even without the
custom lowering. I also added additional test cases in a50037aaa6d5df.

Let's just remove the unneeded code.
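
The failure mode described in the summary can be illustrated with a small standalone model. This is only a sketch: `simpleIntegerWidth` and `widenedElementWidth` are hypothetical helpers, not LLVM functions, and the list of supported widths (i1, i8, i16, i32, i64, i128) is an assumption made for illustration.

```cpp
#include <cassert>
#include <optional>

// Toy stand-in for a "simple integer type" lookup. The width list below is
// an assumption for this sketch; this is not LLVM's MVT API.
std::optional<unsigned> simpleIntegerWidth(unsigned Bits) {
  switch (Bits) {
  case 1:
  case 8:
  case 16:
  case 32:
  case 64:
  case 128:
    return Bits;
  default:
    return std::nullopt; // no simple integer type of this width
  }
}

// The doubling step from the summary: source element width * 2, looked up
// as a simple type. An empty result models the "not a valid MVT" case.
std::optional<unsigned> widenedElementWidth(unsigned SrcEltBits) {
  return simpleIntegerWidth(SrcEltBits * 2);
}
```

In this model, doubling i8 gives i16, while doubling i1 gives a width of 2 that has no entry in the table, which corresponds to the v64i1 case the patch was written for.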

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Mar 26 2021, 12:57 PM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Mar 26 2021, 12:57 PM

fhahn requested review of this revision. Mar 26 2021, 12:57 PM

Herald added a project: Restricted Project. Mar 26 2021, 12:57 PM

Harbormaster completed remote builds in B95921: Diff 333611. Mar 26 2021, 1:36 PM

Should we just be bailing on i1 src types? Otherwise if someone adds an i2 type it would just start to fail again.

In D99437#2654627, @dmgreen wrote:

Should we just be bailing on i1 src types? Otherwise if someone adds an i2 type it would just start to fail again.

That's a good point, thanks! I realized that my comment was a bit misleading: the code was not actually checking for a legal type. I updated it to use TLI.isTypeLegal to check if the widened source type is legal. I think if it is legal, then using it to build a vector should also be legal? I added an assertion to make that clearer.

I am not sure if you should directly check for i1 src types, because that would mean we miss other combinations that cause crashes in this function (e.g. sext <1 x i64> %x to <1 x i128>), which are caught by the legal type check. Alternatively we could explicitly check for element types that are valid for vectors?
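
As a rough sketch of why a legality check covers more than an i1-only bail-out, consider the toy model below. `ToyVecTy`, `isToyTypeLegal` and `canWidenOnce` are hypothetical names, and the table of legal types (64-bit and 128-bit integer vectors with i8 to i64 elements) is an assumption for illustration, not a query of the real TargetLowering.

```cpp
#include <cassert>
#include <set>
#include <utility>

// (element bits, element count) pair standing in for a fixed vector type.
using ToyVecTy = std::pair<unsigned, unsigned>;

// Hypothetical legality table, assumed for this sketch only.
bool isToyTypeLegal(ToyVecTy Ty) {
  static const std::set<ToyVecTy> Legal = {
      {8, 8},  {8, 16}, {16, 4}, {16, 8},
      {32, 2}, {32, 4}, {64, 1}, {64, 2}};
  return Legal.count(Ty) != 0;
}

// Returns true only if doubling the element width yields a legal type.
// A combine guarded this way would bail out instead of building the node.
bool canWidenOnce(ToyVecTy Src) {
  ToyVecTy Widened = {Src.first * 2, Src.second};
  return isToyTypeLegal(Widened);
}
```

Under these assumptions both <64 x i1> and <1 x i64> sources are rejected by the same check, whereas a test for i1 alone would let the <1 x i64> to <1 x i128> case through.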

Harbormaster completed remote builds in B96008: Diff 333732. Mar 28 2021, 12:38 PM

I am not sure if you should directly check for i1 src types, because that would mean we miss other combinations that cause crashes in this function (e.g. sext <1 x i64> %x to <1 x i128>), which are caught by the legal type check. Alternatively we could explicitly check for element types that are valid for vectors?

i8 on its own isn't a legal type, neither is i16.

Umm. Do we actually need this code? If so for what?

llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll
Line 192: This looks odd, with the lanes being interchanged. I presume there's a lot of other code that converts the i1 vector over a call into vectors, and that doesn't preserve the register order?

In D99437#2655120, @dmgreen wrote:

I am not sure if you should directly check for i1 src types, because that would mean we miss other combinations that cause crashes in this function (e.g. sext <1 x i64> %x to <1 x i128>), which are caught by the legal type check. Alternatively we could explicitly check for element types that are valid for vectors?

i8 on its own isn't a legal type, neither is i16.

Yeah, that might be a bit too restrictive. I think the original version (just checking if the MVT is valid/simple) should be enough, because my main concern is avoiding the crash.

Umm. Do we actually need this code? If so for what?

I added a few additional test cases (a50037aaa6d5) and it appears that while we have a bunch of tests that exercise the code, we get the same code without this combine. I updated the patch to remove it.

Yeah. None of the tests I ran produced different code either. They appear to be lowered differently, but end up with identical codegen.

If we find any cases where that isn't true, we can always bring the code back. LGTM. Thanks!

This revision is now accepted and ready to land. Mar 29 2021, 11:12 AM

fhahn retitled this revision from "[AArch64] Fix lowering zext/sext of v64i1." to "[AArch64] Remove custom zext/sext legalization code.". Mar 29 2021, 11:40 AM

fhahn edited the summary of this revision.

Harbormaster completed remote builds in B96158: Diff 333934. Mar 29 2021, 12:13 PM

Closed by commit rG482283042f79: [AArch64] Remove custom zext/sext legalization code. (authored by fhahn). Mar 29 2021, 2:27 PM

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG482283042f79: [AArch64] Remove custom zext/sext legalization code..

Revision Contents

Path                                                  Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp       73 lines
llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll   64 lines

Diff 333986

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[13,933 lines not shown; enclosing context: if (!DCI.isBeforeLegalizeOps() && N->getOpcode() == ISD::ZERO_EXTEND &&]
     SDNode *ABDNode = N->getOperand(0).getNode();
     SDValue NewABD =
         tryCombineLongOpWithDup(Intrinsic::not_intrinsic, ABDNode, DCI, DAG);
     if (!NewABD.getNode())
       return SDValue();

     return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), N->getValueType(0), NewABD);
   }
-
-  // This is effectively a custom type legalization for AArch64.
-  //
-  // Type legalization will split an extend of a small, legal, type to a larger
-  // illegal type by first splitting the destination type, often creating
-  // illegal source types, which then get legalized in isel-confusing ways,
-  // leading to really terrible codegen. E.g.,
-  // %result = v8i32 sext v8i8 %value
-  // becomes
-  // %losrc = extract_subreg %value, ...
-  // %hisrc = extract_subreg %value, ...
-  // %lo = v4i32 sext v4i8 %losrc
-  // %hi = v4i32 sext v4i8 %hisrc
-  // Things go rapidly downhill from there.
-  //
-  // For AArch64, the [sz]ext vector instructions can only go up one element
-  // size, so we can, e.g., extend from i8 to i16, but to go from i8 to i32
-  // take two instructions.
-  //
-  // This implies that the most efficient way to do the extend from v8i8
-  // to two v4i32 values is to first extend the v8i8 to v8i16, then do
-  // the normal splitting to happen for the v8i16->v8i32.
-
-  // This is pre-legalization to catch some cases where the default
-  // type legalization will create ill-tempered code.
-  if (!DCI.isBeforeLegalizeOps())
-    return SDValue();
-
-  // We're only interested in cleaning things up for non-legal vector types
-  // here. If both the source and destination are legal, things will just
-  // work naturally without any fiddling.
-  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
-  EVT ResVT = N->getValueType(0);
-  if (!ResVT.isVector() || TLI.isTypeLegal(ResVT))
-    return SDValue();
-  // If the vector type isn't a simple VT, it's beyond the scope of what
-  // we're worried about here. Let legalization do its thing and hope for
-  // the best.
-  SDValue Src = N->getOperand(0);
-  EVT SrcVT = Src->getValueType(0);
-  if (!ResVT.isSimple() || !SrcVT.isSimple())
-    return SDValue();
-
-  // If the source VT is a 64-bit fixed or scalable vector, we can play games
-  // and get the better results we want.
-  if (SrcVT.getSizeInBits().getKnownMinSize() != 64)
   return SDValue();
-
-  unsigned SrcEltSize = SrcVT.getScalarSizeInBits();
-  ElementCount SrcEC = SrcVT.getVectorElementCount();
-  SrcVT = MVT::getVectorVT(MVT::getIntegerVT(SrcEltSize * 2), SrcEC);
-  SDLoc DL(N);
-  Src = DAG.getNode(N->getOpcode(), DL, SrcVT, Src);
-
-  // Now split the rest of the operation into two halves, each with a 64
-  // bit source.
-  EVT LoVT, HiVT;
-  SDValue Lo, Hi;
-  LoVT = HiVT = ResVT.getHalfNumVectorElementsVT(*DAG.getContext());
-
-  EVT InNVT = EVT::getVectorVT(*DAG.getContext(), SrcVT.getVectorElementType(),
-                               LoVT.getVectorElementCount());
-  Lo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, InNVT, Src,
-                   DAG.getConstant(0, DL, MVT::i64));
-  Hi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, InNVT, Src,
-                   DAG.getConstant(InNVT.getVectorMinNumElements(), DL, MVT::i64));
-  Lo = DAG.getNode(N->getOpcode(), DL, LoVT, Lo);
-  Hi = DAG.getNode(N->getOpcode(), DL, HiVT, Hi);
-
-  // Now combine the parts back together so we still have a single result
-  // like the combiner expects.
-  return DAG.getNode(ISD::CONCAT_VECTORS, DL, ResVT, Lo, Hi);
 }

 static SDValue splitStoreSplat(SelectionDAG &DAG, StoreSDNode &St,
                                SDValue SplatVal, unsigned NumVecElts) {
   assert(!St.isTruncatingStore() && "cannot split truncating vector store");
   unsigned OrigAlignment = St.getAlignment();
   unsigned EltOffset = SplatVal.getValueType().getSizeInBits() / 8;

[3,497 lines not shown]
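
The comment in the removed code notes that the [sz]ext vector instructions only go up one element size at a time, so i8 to i32 takes two instructions. A minimal sketch of that stepping is shown below; `wideningSteps` is a hypothetical helper written for illustration and assumes the destination width is the source width times a power of two.

```cpp
#include <cassert>
#include <vector>

// Element widths visited when extending one doubling at a time.
// Assumes DstBits is SrcBits multiplied by a power of two.
std::vector<unsigned> wideningSteps(unsigned SrcBits, unsigned DstBits) {
  std::vector<unsigned> Steps;
  if (SrcBits == 0)
    return Steps; // guard: doubling zero would never reach DstBits
  for (unsigned Bits = SrcBits * 2; Bits <= DstBits; Bits *= 2)
    Steps.push_back(Bits); // one extend instruction per doubling
  return Steps;
}
```

For i8 to i64 this yields three steps (16, 32, 64), which lines up with the sshll.8h, sshll.4s and sshll.2d stages in the sext_v8i8_to_v8i64 checks of the test file.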

llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll

[183 lines not shown]
 ; CHECK-NEXT: ushll.2d v0, v0, #0
 ; CHECK-NEXT: ret
 ;
   %r = zext <8 x i8> %v0 to <8 x i64>
   ret <8 x i64> %r
 }

 define <8 x i64> @sext_v8i8_to_v8i64(<8 x i8> %v0) nounwind {
 ; CHECK-LABEL: sext_v8i8_to_v8i64:
[Inline comment by dmgreen on this line: This looks odd, with the lanes being interchanged. I presume there's a lot of other code that converts the i1 vector over a call into vectors, and that doesn't preserve the register order?]
 ; CHECK-NEXT: sshll.8h v0, v0, #0
 ; CHECK-NEXT: sshll2.4s v2, v0, #0
 ; CHECK-NEXT: sshll.4s v0, v0, #0
 ; CHECK-NEXT: sshll2.2d v3, v2, #0
 ; CHECK-NEXT: sshll2.2d v1, v0, #0
 ; CHECK-NEXT: sshll.2d v2, v2, #0
 ; CHECK-NEXT: sshll.2d v0, v0, #0
 ; CHECK-NEXT: ret
 ;
   %r = sext <8 x i8> %v0 to <8 x i64>
   ret <8 x i64> %r
 }
+
+; Extends of vectors of i1.
+
+define <32 x i8> @zext_v32i1(<32 x i1> %arg) {
+; CHECK-LABEL: zext_v32i1:
+; CHECK: and.16b v0, v0, v2
+; CHECK-NEXT: and.16b v1, v1, v2
+; CHECK-NEXT: ret
+  %res = zext <32 x i1> %arg to <32 x i8>
+  ret <32 x i8> %res
+}
+
+define <32 x i8> @sext_v32i1(<32 x i1> %arg) {
+; CHECK-LABEL: sext_v32i1:
+; CHECK: shl.16b v0, v0, #7
+; CHECK-NEXT: shl.16b v1, v1, #7
+; CHECK-NEXT: sshr.16b v0, v0, #7
+; CHECK-NEXT: sshr.16b v1, v1, #7
+; CHECK-NEXT: ret
+;
+  %res = sext <32 x i1> %arg to <32 x i8>
+  ret <32 x i8> %res
+}
+
+define <64 x i8> @zext_v64i1(<64 x i1> %arg) {
+; CHECK-LABEL: zext_v64i1:
+; CHECK: and.16b v0, v0, [[V4:v.+]]
+; CHECK-NEXT: and.16b v1, v1, [[V4]]
+; CHECK-NEXT: and.16b v2, v2, [[V4]]
+; CHECK-NEXT: and.16b v3, v3, [[V4]]
+; CHECK-NEXT: ret
+;
+  %res = zext <64 x i1> %arg to <64 x i8>
+  ret <64 x i8> %res
+}
+
+define <64 x i8> @sext_v64i1(<64 x i1> %arg) {
+; CHECK-LABEL: sext_v64i1:
+; CHECK: shl.16b v0, v0, #7
+; CHECK-NEXT: shl.16b v3, v3, #7
+; CHECK-NEXT: shl.16b v2, v2, #7
+; CHECK-NEXT: shl.16b [[V4:v.+]], v1, #7
+; CHECK-NEXT: sshr.16b v0, v0, #7
+; CHECK-NEXT: sshr.16b v1, v3, #7
+; CHECK-NEXT: sshr.16b v2, v2, #7
+; CHECK-NEXT: sshr.16b v3, [[V4]], #7
+; CHECK-NEXT: ret
+;
+  %res = sext <64 x i1> %arg to <64 x i8>
+  ret <64 x i8> %res
+}
+
+define <1 x i128> @sext_v1x64(<1 x i64> %arg) {
+; CHECK-LABEL: sext_v1x64:
+; CHECK-NEXT: .cfi_startproc
+; CHECK-NEXT: fmov x8, d0
+; CHECK-NEXT: asr x1, x8, #63
+; CHECK-NEXT: mov.d v0[1], x1
+; CHECK-NEXT: fmov x0, d0
+; CHECK-NEXT: ret
+;
+  %res = sext <1 x i128> %arg to <1 x i128>
+  ret <1 x i128> %res
+}
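
To make the i1 test expectations easier to read, here is a per-lane sketch of what the checked instructions compute. It assumes each i1 is held in the low bit of an i8 lane and that the `and` mask register holds a splat of 1 (the mask setup is not part of the checked lines). `zextI1Lane` and `sextI1Lane` are hypothetical helpers, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// zext of an i1 stored in an i8 lane: keep only bit 0, as the `and` with
// an assumed splat-of-1 mask would do for each lane.
std::uint8_t zextI1Lane(std::uint8_t Lane) {
  return static_cast<std::uint8_t>(Lane & 1u);
}

// sext of an i1 stored in an i8 lane: move bit 0 into the sign bit
// (shl #7), then replicate it across the lane with an arithmetic right
// shift (sshr #7). Relies on two's complement conversion and arithmetic
// shift of negative values, which C++20 guarantees and mainstream
// compilers provide in earlier standards.
std::int8_t sextI1Lane(std::uint8_t Lane) {
  std::uint8_t Shifted = static_cast<std::uint8_t>(Lane << 7);
  return static_cast<std::int8_t>(static_cast<std::int8_t>(Shifted) >> 7);
}
```

A lane with bit 0 set becomes 1 under the zext model and -1 (all bits set) under the sext model; any bits above bit 0 are discarded in both cases.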