This is an archive of the discontinued LLVM Phabricator instance.


Differential D125601

[DAGCombiner][AArch64] Reorder the bitcast of scalable vector
Abandoned, Public

Authored by Allen on May 14 2022, 2:11 AM.

Details

Reviewers

craig.topper
paulwalker-arm
RKSimon
dmgreen
david-arm

Summary

Perform the following reordering when the input is a scalable vector with a floating-point element type and the result is not a scalable vector, e.g.:

t19: v2f32 = extract_subvector t2, Constant:i64<0>
t12: v2i32 = bitcast t19

-->

t20: nxv2i32 = bitcast t2
t21: v2i32 = extract_subvector t20, Constant:i64<0>
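
The reordering in the summary relies on a bitcast between same-width lanes commuting with a subvector extract. The following is only a toy model of that property on plain arrays, not LLVM code: `bitcastToInt`, `extractSubvector` and `orderingsAgree` are hypothetical helpers invented for illustration, and fixed-size `std::vector` stands in for both fixed and scalable DAG types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy stand-in for a vector bitcast: reinterpret each f32 lane as an i32 lane.
std::vector<std::uint32_t> bitcastToInt(const std::vector<float> &V) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "lane sizes must match");
  std::vector<std::uint32_t> Out(V.size());
  if (!V.empty())
    std::memcpy(Out.data(), V.data(), V.size() * sizeof(float));
  return Out;
}

// Toy stand-in for extract_subvector: copy Count lanes starting at Index.
template <typename T>
std::vector<T> extractSubvector(const std::vector<T> &V, std::size_t Index,
                                std::size_t Count) {
  assert(Index + Count <= V.size() && "subvector must lie inside the source");
  return std::vector<T>(V.begin() + Index, V.begin() + Index + Count);
}

// Returns true when "extract, then bitcast" and "bitcast, then extract"
// produce the same lanes, which is the equivalence the summary depends on.
bool orderingsAgree(const std::vector<float> &V, std::size_t Index,
                    std::size_t Count) {
  auto ExtractFirst = bitcastToInt(extractSubvector(V, Index, Count));
  auto BitcastFirst = extractSubvector(bitcastToInt(V), Index, Count);
  return ExtractFirst == BitcastFirst;
}
```

The model only shows that the two orderings are value-equivalent; whether the reordered form is cheaper depends on type legalisation, which is what the review discussion below is about.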

Event Timeline

Allen created this revision. May 14 2022, 2:11 AM

Herald added a project: Restricted Project. May 14 2022, 2:11 AM

Herald added subscribers: StephenFan, ecnelises, hiraditya, kristof.beyls.

Allen requested review of this revision. May 14 2022, 2:12 AM

Herald added a project: Restricted Project. May 14 2022, 2:12 AM

Herald added subscribers: llvm-commits, alextsao1999.

Allen edited the summary of this revision. May 14 2022, 2:52 AM

Harbormaster completed remote builds in B164438: Diff 429422. May 14 2022, 3:28 AM

Allen updated this revision to Diff 429585. May 15 2022, 6:13 PM

Harbormaster completed remote builds in B164558: Diff 429585. May 15 2022, 6:54 PM

david-arm added inline comments. May 16 2022, 6:01 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14937	From what I can see all the input and output types you're hoping to deal with are legal, right? For example, <vscale x 2 x float> is legal, but <vscale x 2 x i32> is illegal. So it looks like you're trying to take advantage of legalisation behaviour when doing something like this:

  %out = <2 x i32> extract_subvector <vscale x 2 x i32> %in, i32 0

which will probably turn into

  %out1 = <2 x i64> extract_subvector <vscale x 2 x i64> %in, i32 0
  %out2 = truncate <2 x i64> %out1 to <2 x i32>

The first operation will then become a nop.

One problem with this approach is that it looks like you're assuming the index is always 0. What happens when extracting a subvector from index 2, etc? I'm worried the generated code might then look even worse. It just feels like we might want to restrict the allowed cases a bit more here to just those examples where we know there will be an improvement.
llvm/test/CodeGen/AArch64/extract-insert-element-sve.ll
1 ↗	(On Diff #429585)	We actually already have similar tests in sve-fixed-length-reshuffle.ll - could you move this test into that file please?
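
The legalisation behaviour described in the first inline comment above can be pictured with a small model. This is a sketch under a stated assumption, not LLVM code: it assumes an unpacked nxv2 vector keeps each 32-bit element in the low half of a 64-bit container lane, and the names `Container`, `extractV2I64`, `truncateToV2I32` and `isLow128Bits` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: each 32-bit element of an unpacked nxv2 vector sits in the low
// half of a 64-bit container lane; the upper half is padding.
using Container = std::uint64_t;

// Step 1 of the legalised form: take two 64-bit container lanes from Index.
std::array<Container, 2> extractV2I64(const std::vector<Container> &Reg,
                                      std::size_t Index) {
  assert(Index + 2 <= Reg.size() && "extract must stay inside the register");
  return {Reg[Index], Reg[Index + 1]};
}

// Step 2: truncate each 64-bit lane to 32 bits, dropping the padding.
std::array<std::uint32_t, 2> truncateToV2I32(const std::array<Container, 2> &V) {
  return {static_cast<std::uint32_t>(V[0]), static_cast<std::uint32_t>(V[1])};
}

// Only at index 0 do the two container lanes coincide with the low 128 bits
// of the register, which is why step 1 can become a nop in that case.
bool isLow128Bits(std::size_t Index) { return Index == 0; }
```

In this model a non-zero index reads lanes outside the low 128 bits, so step 1 is no longer free, which matches the reviewer's concern about indices other than 0.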

Allen added inline comments. May 16 2022, 7:54 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14937	Sure, I intend to handle only cases where all the input and output types are legal. Could you please show me which API can be used here? Thanks.

Update review:
1. Add restrictions: the index must be 0 and the types must be legal.
2. Move the test into llvm/test/CodeGen/AArch64/sve-fixed-length-reshuffle.ll.

Harbormaster completed remote builds in B164651: Diff 429726. May 16 2022, 9:28 AM

Allen added inline comments. May 18 2022, 5:53 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14937	Hi @david-arm, do you mean I should list all the type pairs where we know there will be an improvement?
llvm/test/CodeGen/AArch64/extract-insert-element-sve.ll
1 ↗	(On Diff #429585)	Done, thanks.

Hi @Allen, sorry I've not had chance to review your patch yet. I'll try to take a proper look tomorrow but in the interim my main concern is that this optimisation specifically relates to NEON sized vectors where we can do a better job of the extracts. I'm not sure we want to do this transform for all fixed length vector types though. Part of me thinks we're missing a different, more specific, transformation somewhere. However, I cannot be sure until I've had chance to investigate a little more.

Thank you very much, @paulwalker-arm. I look forward to your review suggestions.

ping?

Hi @Allen, sorry for the delay, but my investigations turned out to be more complex than I had expected. I don't believe this is a DAG-level problem. The real issue here is that 64/128-bit fixed-length extracts from unpacked scalable vectors are not legal, and thus common legalisation code kicks in to perform the operation via the stack. The solution is to be able to directly isel these operations. It turns out this is not straightforward. I've uploaded D126201, which is clearly a bit of a hack, to show the areas to investigate.

The main problem is the current definition of extract_subvector does not allow mixed (i.e. fixed and scalable) vector types. Although it can be changed, this causes build failures because many existing patterns need to be updated because their types can no longer be inferred. Instead I propose a specific node, currently called extract_subvector2 which is specifically for the case where you want to extract a fixed length vector from a scalable vector. My reasoning being no existing patterns need to change and it makes it easier to see these sorts of extracts that typically need special handling. I didn't manage to figure out why I need to disable the existing extract_subvector patterns, which for some reason kept matching even though the types do not match what those patterns require.
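
The distinction drawn above, between extracts whose result and source are the same kind of vector and extracts of a fixed-length vector from a scalable one, can be sketched as a small classification. This is purely illustrative and is not how LLVM or D126201 implements it; `VecTy`, `ExtractKind` and `classifyExtract` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical, minimal description of a vector type: a minimum element
// count plus a flag saying whether that count is multiplied by vscale.
struct VecTy {
  unsigned MinElts;
  bool Scalable;
};

enum class ExtractKind {
  SameKind,          // fixed from fixed, or scalable from scalable
  FixedFromScalable, // the mixed case the proposed dedicated node would cover
  Unsupported        // scalable from fixed
};

// Decide which bucket an extract falls into, based only on the two types.
ExtractKind classifyExtract(VecTy Result, VecTy Source) {
  if (Result.Scalable == Source.Scalable)
    return ExtractKind::SameKind;
  if (!Result.Scalable && Source.Scalable)
    return ExtractKind::FixedFromScalable;
  return ExtractKind::Unsupported;
}
```

Keeping the mixed case in its own bucket mirrors the stated reasoning: patterns written for the same-kind case stay untouched, and the extracts that need special handling are easy to spot.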

Let me know how you want to proceed. Feel free to run with D126201 or you can wait on me to create a proper implementation. It just depends on your timeline, as I'm not sure how much time I'll be able to invest in this over the next few weeks.

Hi @paulwalker-arm, sorry for the reminder! Thank you very much for pointing out the right approach to this problem. I'll try your solution.

Thanks @Allen. As another data point I just noticed vector_extract_subvec already exists upstream, which might mean you don't need the extract_subvector2 I created as part of D126201.

That's great. Thank you very much for the quick resolution!

Matt added a subscriber: Matt. May 25 2022, 2:06 PM

Allen abandoned this revision. Jun 1 2022, 6:08 PM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	5 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	27 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-reshuffle.ll	36 lines

Diff 429726

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

 SDValue DAGCombiner::visitEXTRACT_SUBVECTOR(SDNode *N) {
 [...]
   if (V.getOpcode() == ISD::SPLAT_VECTOR)
     if (DAG.isConstantValueOfAnyType(V.getOperand(0)) || V.hasOneUse())
       if (!LegalOperations || TLI.isOperationLegal(ISD::SPLAT_VECTOR, NVT))
         return DAG.getSplatVector(NVT, SDLoc(N), V.getOperand(0));

   // Try to move vector bitcast after extract_subv by scaling extraction index:
   // extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')
   if (V.getOpcode() == ISD::BITCAST &&
-      V.getOperand(0).getValueType().isVector() &&
+      V.getOperand(0).getValueType().isFixedLengthVector() &&
       (!LegalOperations || TLI.isOperationLegal(ISD::BITCAST, NVT))) {
     SDValue SrcOp = V.getOperand(0);
     EVT SrcVT = SrcOp.getValueType();
+    // For scalable vectors, we purposely add the bitcasts, and only deal
+    // with integer extract_subvector. So we don't reorder those particular
+    // bitcasts.
     unsigned SrcNumElts = SrcVT.getVectorMinNumElements();
     unsigned DestNumElts = V.getValueType().getVectorMinNumElements();
     if ((SrcNumElts % DestNumElts) == 0) {
       unsigned SrcDestRatio = SrcNumElts / DestNumElts;
       ElementCount NewExtEC = NVT.getVectorElementCount() * SrcDestRatio;
       EVT NewExtVT = EVT::getVectorVT(*DAG.getContext(), SrcVT.getScalarType(),
                                       NewExtEC);
       if (TLI.isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, NewExtVT)) {
 [...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

 [...]
   return DAG.getNode(ISD::BITCAST, dl, VT,
                      DAG.getNode(ISD::CONCAT_VECTORS, dl, ConcatTy,
                                  DAG.getNode(ISD::BITCAST, dl, RHSTy, N0),
                                  RHS));
 }

 static SDValue
 performExtractSubvectorCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
                                SelectionDAG &DAG) {
+  EVT InVT = N->getOperand(0).getValueType();
+  EVT OutVT = N->getValueType(0);
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  SDLoc DL(N);
+  // Reorder when the scalable vector's inner type is floating point and the
+  // outer type is not scalable vector. Also, the index shoud be 0 and all the
+  // input and output types should be legal to deal with.
+  if (InVT.isScalableVector() && InVT.isFloatingPoint() &&
+      DCI.isBeforeLegalize() && !OutVT.isScalableVector() &&
+      isNullConstant(N->getOperand(1)) && TLI.isTypeLegal(OutVT) &&
+      TLI.isOperationLegalOrCustom(ISD::INSERT_SUBVECTOR, InVT)) {
+    // Bitcast the input
+    SDValue VecOp = N->getOperand(0);
+    VecOp = DAG.getNode(ISD::BITCAST, DL, InVT.changeTypeToInteger(), VecOp);
+    // Perform extract in integer type
+    SDValue Extract =
+        DAG.getNode(N->getOpcode(), DL, OutVT.changeTypeToInteger(), VecOp,
+                    N->getOperand(1));
+    // Bitcast back to fp type
+    return DAG.getNode(ISD::BITCAST, DL, OutVT, Extract);
+  }
+
   if (DCI.isBeforeLegalizeOps())
     return SDValue();

-  EVT VT = N->getValueType(0);
-  if (!VT.isScalableVector() || VT.getVectorElementType() != MVT::i1)
+  if (!OutVT.isScalableVector() || OutVT.getVectorElementType() != MVT::i1)
     return SDValue();

   SDValue V = N->getOperand(0);

   // NOTE: This combine exists in DAGCombiner, but that version's legality check
   // blocks this combine because the non-const case requires custom lowering.
   //
   // ty1 extract_vector(ty2 splat(const))) -> ty1 splat(const)
   if (V.getOpcode() == ISD::SPLAT_VECTOR)
     if (isa<ConstantSDNode>(V.getOperand(0)))
-      return DAG.getNode(ISD::SPLAT_VECTOR, SDLoc(N), VT, V.getOperand(0));
+      return DAG.getNode(ISD::SPLAT_VECTOR, DL, OutVT, V.getOperand(0));

   return SDValue();
 }

 static SDValue
 performInsertSubvectorCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
                               SelectionDAG &DAG) {
   SDLoc DL(N);
 [...]

llvm/test/CodeGen/AArch64/sve-fixed-length-reshuffle.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s | FileCheck %s
+; RUN: llc -mattr=+sve < %s | FileCheck %s

 target triple = "aarch64-unknown-linux-gnu"

 ; == Matching first N elements ==

-define <4 x i1> @reshuffle_v4i1_nxv4i1(<vscale x 4 x i1> %a) #0 {
+define <4 x i1> @reshuffle_v4i1_nxv4i1(<vscale x 4 x i1> %a) {
 ; CHECK-LABEL: reshuffle_v4i1_nxv4i1:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: mov z1.s, p0/z, #1 // =0x1
 ; CHECK-NEXT: mov w8, v1.s[1]
 ; CHECK-NEXT: mov w9, v1.s[2]
 ; CHECK-NEXT: mov v0.16b, v1.16b
 ; CHECK-NEXT: mov v0.h[1], w8
 ; CHECK-NEXT: mov w8, v1.s[3]
 ; CHECK-NEXT: mov v0.h[2], w9
 ; CHECK-NEXT: mov v0.h[3], w8
 ; CHECK-NEXT: // kill: def $d0 killed $d0 killed $q0
 ; CHECK-NEXT: ret
   %el0 = extractelement <vscale x 4 x i1> %a, i32 0
   %el1 = extractelement <vscale x 4 x i1> %a, i32 1
   %el2 = extractelement <vscale x 4 x i1> %a, i32 2
   %el3 = extractelement <vscale x 4 x i1> %a, i32 3
   %v0 = insertelement <4 x i1> undef, i1 %el0, i32 0
   %v1 = insertelement <4 x i1> %v0, i1 %el1, i32 1
   %v2 = insertelement <4 x i1> %v1, i1 %el2, i32 2
   %v3 = insertelement <4 x i1> %v2, i1 %el3, i32 3
   ret <4 x i1> %v3
 }

-attributes #0 = { "target-features"="+sve" }
+; Extract from packed SVE vectors into different sizes of NEON registers.
+
+define <2 x float> @extract_subreg_2f32_unpacked_nx2xf32(<vscale x 2 x float> %vec) nounwind {
+; CHECK-LABEL: extract_subreg_2f32_unpacked_nx2xf32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: xtn v0.2s, v0.2d
+; CHECK-NEXT: ret
+  %vec.e0 = extractelement <vscale x 2 x float> %vec, i32 0
+  %vec.e1 = extractelement <vscale x 2 x float> %vec, i32 1
+
+  %1 = insertelement <2 x float> undef, float %vec.e0, i32 0
+  %2 = insertelement <2 x float> %1, float %vec.e1, i32 1
+  ret <2 x float> %2
+}
+
+define <4 x half> @extract_subreg_4f16_unpacked_nx4xf16(<vscale x 4 x half> %vec) nounwind {
+; CHECK-LABEL: extract_subreg_4f16_unpacked_nx4xf16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: xtn v0.4h, v0.4s
+; CHECK-NEXT: ret
+  %vec.e0 = extractelement <vscale x 4 x half> %vec, i32 0
+  %vec.e1 = extractelement <vscale x 4 x half> %vec, i32 1
+  %vec.e2 = extractelement <vscale x 4 x half> %vec, i32 2
+  %vec.e3 = extractelement <vscale x 4 x half> %vec, i32 3
+
+  %1 = insertelement <4 x half> undef, half %vec.e0, i32 0
+  %2 = insertelement <4 x half> %1, half %vec.e1, i32 1
+  %3 = insertelement <4 x half> %2, half %vec.e2, i32 2
+  %4 = insertelement <4 x half> %3, half %vec.e3, i32 3
+  ret <4 x half> %4
+}