This is an archive of the discontinued LLVM Phabricator instance.

Differential D71672

[AArch64] match splat of bitcasted extract subvector to DUPLANE
ClosedPublic

Authored by spatel on Dec 18 2019, 12:49 PM.

Details

Reviewers

efriedma
dmgreen
t.p.northover

Commits

rG0b38af89e2c0: [AArch64] match splat of bitcasted extract subvector to DUPLANE

Summary

This is another potential regression exposed by D63815.

Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'
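
To see why this rewrite preserves the splatted value, the sketch below models a 128-bit register as 16 bytes and compares two ways of reading the same lane. It is a standalone illustration, not LLVM code: `Reg128`, `laneOfUpperHalf`, `laneOfWhole` and `makeIotaReg` are hypothetical names, integer lanes stand in for the float lanes in the summary, and lanes are numbered in memory order as on little-endian AArch64.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// A 128-bit vector register modelled as 16 bytes in memory order.
using Reg128 = std::array<std::uint8_t, 16>;

// Models lane `lane` of: bitcast (extract_subv X, 1) to two 32-bit lanes.
// The extract selects bytes 8..15; the bitcast views them as 32-bit lanes.
inline std::uint32_t laneOfUpperHalf(const Reg128 &x, unsigned lane) {
  assert(lane < 2 && "upper half holds two 32-bit lanes");
  std::array<std::uint8_t, 8> half{};
  std::memcpy(half.data(), x.data() + 8, half.size());
  std::uint32_t out = 0;
  std::memcpy(&out, half.data() + lane * 4, sizeof(out));
  return out;
}

// Models lane `lane` of: bitcast X to four 32-bit lanes, with no extract.
inline std::uint32_t laneOfWhole(const Reg128 &x, unsigned lane) {
  assert(lane < 4 && "whole register holds four 32-bit lanes");
  std::uint32_t out = 0;
  std::memcpy(&out, x.data() + lane * 4, sizeof(out));
  return out;
}

// Builds a register whose bytes are 0, 1, ..., 15 so every lane is distinct.
inline Reg128 makeIotaReg() {
  Reg128 r{};
  for (unsigned i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint8_t>(i);
  return r;
}
```

In this model, lane 1 of the bitcasted upper half and lane 3 of the whole register read the same four bytes (12..15), which is the LaneC to LaneC' adjustment described above.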

Event Timeline

spatel created this revision. Dec 18 2019, 12:49 PM

Herald added a project: Restricted Project. Dec 18 2019, 12:49 PM

Herald added subscribers: hiraditya, kristof.beyls, mcrosier.

efriedma added inline comments. Dec 18 2019, 1:30 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7088–7089	Can you fix the comments here, so each of these transforms has its own comment briefly explaining what it does? `V1.getOperand(0).getScalarValueSizeInBits() % VTEltBitWidth == 0` seems overly restrictive. I guess you have to enforce that the EXTRACT_SUBVECTOR index is appropriately aligned, but it would be okay to allow, for example, `<16 x i8>` with index 8. Maybe translate the EXTRACT_SUBVECTOR index to a byte offset, then divide by the element size of the result? That should also make the logic a little easier to follow.
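
The alignment rule discussed in this comment can be sketched as plain integer arithmetic. This is a standalone model, not LLVM code: `scaleSplatLane` and its parameter names are hypothetical, and it works in bits rather than the byte offset suggested in the comment. For element widths that are multiples of 8 bits, the two approaches accept and reject the same cases.

```cpp
#include <cassert>

// Adjusts `lane` so that it indexes the full-width source instead of the
// extracted subvector. Returns false (leaving `lane` untouched) when the
// extract does not start on a boundary of the casted element type.
//   extIdx      - start index of the extract, in source elements
//   srcEltBits  - element width of the vector being extracted from
//   castEltBits - element width after the bitcast
inline bool scaleSplatLane(unsigned extIdx, unsigned srcEltBits,
                           unsigned castEltBits, int &lane) {
  assert(srcEltBits != 0 && castEltBits != 0 && "widths must be nonzero");
  unsigned extIdxInBits = extIdx * srcEltBits;
  if (extIdxInBits % castEltBits != 0)
    return false; // e.g. an i8 extract at index 1 viewed as i16 lanes
  lane += static_cast<int>(extIdxInBits / castEltBits);
  return true;
}
```

Under this model, a `<16 x i8>` extract at index 8 that is cast to 16-bit lanes moves lane 1 to lane 5, while an extract at index 1 is rejected because it does not start on a 16-bit boundary.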

spatel mentioned this in rG59811f454df0: [AArch64] add more tests for extract-bitcast-splat; NFC. Dec 20 2019, 5:58 AM

Patch updated:

Enhanced to allow narrow element -> wide element bitcasts. (tests added with rG59811f454df0)
Restructured and added code/test comments (hopefully makes it easier to read).

efriedma added inline comments. Dec 20 2019, 11:57 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7111	Can we compute SrcVecNumElts as Extract.getOperand(0).getValueType().getSizeInBits()/CastedEltBitWidth, or something like that?

spatel marked 2 inline comments as done. Dec 20 2019, 1:02 PM

spatel added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7111	Ah, yes - that will simplify things. I was hung up on scaling that value, but it's unnecessary.
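
A small model of the simplification agreed on here, assuming a hypothetical `CastedVecType` struct in place of LLVM's `MVT`: the lane count of the casted source is the full source width divided by the casted element width, so the original element count never needs to be scaled.

```cpp
#include <cassert>

// Hypothetical stand-in for a simple vector type: element width and count.
struct CastedVecType {
  unsigned eltBits;
  unsigned numElts;
};

// Computes the type obtained by viewing the whole (pre-extract) source
// vector as lanes of the casted element width.
inline CastedVecType castedSourceType(unsigned srcVecBits,
                                      unsigned castEltBits) {
  assert(castEltBits != 0 && "element width must be nonzero");
  assert(srcVecBits % castEltBits == 0 && "lanes must tile the vector");
  return {castEltBits, srcVecBits / castEltBits};
}
```

For a 128-bit source this gives 4 lanes of 32 bits and 8 lanes of 16 bits, matching the v4f32 and v8i16 types in the examples quoted in the diff's code comments.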

Patch updated:
Simplify code for calculating the casted source vector type.

LGTM

This revision is now accepted and ready to land. Dec 20 2019, 1:18 PM

Closed by commit rG0b38af89e2c0: [AArch64] match splat of bitcasted extract subvector to DUPLANE (authored by spatel). Dec 22 2019, 6:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	50 lines
llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll	16 lines

Diff 234958

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

(7,079 preceding lines not shown)
   if (SVN->isSplat()) {
     // Test if V1 is a BUILD_VECTOR and the lane being referenced is a non-
     // constant. If so, we can just reference the lane's definition directly.
     if (V1.getOpcode() == ISD::BUILD_VECTOR &&
         !isa<ConstantSDNode>(V1.getOperand(Lane)))
       return DAG.getNode(AArch64ISD::DUP, dl, VT, V1.getOperand(Lane));

     // Otherwise, duplicate from the lane of the input vector.
     unsigned Opcode = getDUPLANEOp(V1.getValueType().getVectorElementType());

-    // SelectionDAGBuilder may have "helpfully" already extracted or conatenated
-    // to make a vector of the same size as this SHUFFLE. We can ignore the
-    // extract entirely, and canonicalise the concat using WidenVector.
-    if (V1.getOpcode() == ISD::EXTRACT_SUBVECTOR) {
-      Lane += cast<ConstantSDNode>(V1.getOperand(1))->getZExtValue();
+    // Try to eliminate a bitcasted extract subvector before a DUPLANE.
+    auto getScaledOffsetDup = [](SDValue BitCast, int &LaneC, MVT &CastVT) {
+      // Match: dup (bitcast (extract_subv X, C)), LaneC
+      if (BitCast.getOpcode() != ISD::BITCAST ||
+          BitCast.getOperand(0).getOpcode() != ISD::EXTRACT_SUBVECTOR)
+        return false;
+
+      // The extract index must align in the destination type. That may not
+      // happen if the bitcast is from narrow to wide type.
+      SDValue Extract = BitCast.getOperand(0);
+      unsigned ExtIdx = Extract.getConstantOperandVal(1);
+      unsigned SrcEltBitWidth = Extract.getScalarValueSizeInBits();
+      unsigned ExtIdxInBits = ExtIdx * SrcEltBitWidth;
+      unsigned CastedEltBitWidth = BitCast.getScalarValueSizeInBits();
+      if (ExtIdxInBits % CastedEltBitWidth != 0)
+        return false;
+
+      // Update the lane value by offsetting with the scaled extract index.
+      LaneC += ExtIdxInBits / CastedEltBitWidth;
+
+      // Determine the casted vector type of the wide vector input.
+      // dup (bitcast (extract_subv X, C)), LaneC --> dup (bitcast X), LaneC'
+      // Examples:
+      // dup (bitcast (extract_subv v2f64 X, 1) to v2f32), 1 --> dup v4f32 X, 3
+      // dup (bitcast (extract_subv v16i8 X, 8) to v4i16), 1 --> dup v8i16 X, 5
+      unsigned SrcVecNumElts =
+          Extract.getOperand(0).getValueSizeInBits() / CastedEltBitWidth;
+      CastVT = MVT::getVectorVT(BitCast.getSimpleValueType().getScalarType(),
+                                SrcVecNumElts);
+      return true;
+    };
+    MVT CastVT;
+    if (getScaledOffsetDup(V1, Lane, CastVT)) {
+      V1 = DAG.getBitcast(CastVT, V1.getOperand(0).getOperand(0));
+    } else if (V1.getOpcode() == ISD::EXTRACT_SUBVECTOR) {
+      // The lane is incremented by the index of the extract.
+      // Example: dup v2f32 (extract v4f32 X, 2), 1 --> dup v4f32 X, 3
+      Lane += V1.getConstantOperandVal(1);
       V1 = V1.getOperand(0);
     } else if (V1.getOpcode() == ISD::CONCAT_VECTORS) {
+      // The lane is decremented if we are splatting from the 2nd operand.
+      // Example: dup v4i32 (concat v2i32 X, v2i32 Y), 3 --> dup v4i32 Y, 1
       unsigned Idx = Lane >= (int)VT.getVectorNumElements() / 2;
       Lane -= Idx * VT.getVectorNumElements() / 2;
       V1 = WidenVector(V1.getOperand(Idx), DAG);
-    } else if (VT.getSizeInBits() == 64)
+    } else if (VT.getSizeInBits() == 64) {
+      // Widen the operand to 128-bit register with undef.
       V1 = WidenVector(V1, DAG);
+    }
     return DAG.getNode(Opcode, dl, VT, V1, DAG.getConstant(Lane, dl, MVT::i64));
   }

   if (isREVMask(ShuffleMask, VT, 64))
     return DAG.getNode(AArch64ISD::REV64, dl, V1.getValueType(), V1, V2);
   if (isREVMask(ShuffleMask, VT, 32))
     return DAG.getNode(AArch64ISD::REV32, dl, V1.getValueType(), V1, V2);
   if (isREVMask(ShuffleMask, VT, 16))
(6,117 following lines not shown)
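
The CONCAT_VECTORS branch in this hunk picks which concat operand holds the splatted lane and rebases the lane onto it. A minimal standalone sketch of that index arithmetic follows. `splitConcatLane` is a hypothetical helper, not part of LLVM, and it leaves out the `WidenVector` call that the real code applies to the chosen operand.

```cpp
#include <cassert>
#include <utility>

// For a splat of lane `lane` taken from a concat of two equal halves with
// `numElts` total elements, returns {operand index, lane within operand}.
inline std::pair<unsigned, unsigned> splitConcatLane(unsigned lane,
                                                     unsigned numElts) {
  assert(numElts % 2 == 0 && "concat of two equal halves");
  assert(lane < numElts && "lane must be in range");
  unsigned half = numElts / 2;
  unsigned idx = lane >= half ? 1u : 0u; // 1 when the lane is in operand 2
  return {idx, lane - idx * half};
}
```

With four total elements, lane 3 resolves to lane 1 of the second operand, as in the `dup v4i32 (concat v2i32 X, v2i32 Y), 3 --> dup v4i32 Y, 1` example from the code comment.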

llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll

(1,657 preceding lines not shown)
 entry:
   %shuffle = shufflevector <4 x float> %v, <4 x float> undef, <2 x i32> <i32 3, i32 3>
   %mul = fmul <2 x float> %shuffle, %a
   ret <2 x float> %mul
 }

 define <2 x float> @test_vmul_laneq3_f32_bitcast(<2 x float> %a, <2 x double> %v) {
 ; CHECK-LABEL: test_vmul_laneq3_f32_bitcast:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext v1.16b, v1.16b, v1.16b, #8
-; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[1]
+; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[3]
 ; CHECK-NEXT: ret
   %extract = shufflevector <2 x double> %v, <2 x double> undef, <1 x i32> <i32 1>
   %bc = bitcast <1 x double> %extract to <2 x float>
   %splat = shufflevector <2 x float> %bc, <2 x float> undef, <2 x i32> <i32 1, i32 1>
   %mul = fmul <2 x float> %splat, %a
   ret <2 x float> %mul
 }

 define <2 x float> @test_vmul_laneq2_f32_bitcast(<2 x float> %a, <2 x double> %v) {
 ; CHECK-LABEL: test_vmul_laneq2_f32_bitcast:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext v1.16b, v1.16b, v1.16b, #8
-; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[0]
+; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[2]
 ; CHECK-NEXT: ret
   %extract = shufflevector <2 x double> %v, <2 x double> undef, <1 x i32> <i32 1>
   %bc = bitcast <1 x double> %extract to <2 x float>
   %splat = shufflevector <2 x float> %bc, <2 x float> undef, <2 x i32> <i32 0, i32 0>
   %mul = fmul <2 x float> %splat, %a
   ret <2 x float> %mul
 }

 define <4 x i16> @test_vadd_laneq5_i16_bitcast(<4 x i16> %a, <2 x double> %v) {
 ; CHECK-LABEL: test_vadd_laneq5_i16_bitcast:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext v1.16b, v1.16b, v1.16b, #8
-; CHECK-NEXT: dup v1.4h, v1.h[1]
+; CHECK-NEXT: dup v1.4h, v1.h[5]
 ; CHECK-NEXT: add v0.4h, v1.4h, v0.4h
 ; CHECK-NEXT: ret
   %extract = shufflevector <2 x double> %v, <2 x double> undef, <1 x i32> <i32 1>
   %bc = bitcast <1 x double> %extract to <4 x i16>
   %splat = shufflevector <4 x i16> %bc, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   %r = add <4 x i16> %splat, %a
   ret <4 x i16> %r
 }

+; TODO: The pattern in LowerVECTOR_SHUFFLE does not match what we are looking for.
+
 define <4 x i16> @test_vadd_lane2_i16_bitcast_bigger_aligned(<4 x i16> %a, <16 x i8> %v) {
 ; CHECK-LABEL: test_vadd_lane2_i16_bitcast_bigger_aligned:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ext v1.8b, v1.8b, v0.8b, #2
 ; CHECK-NEXT: dup v1.4h, v1.h[1]
 ; CHECK-NEXT: add v0.4h, v1.4h, v0.4h
 ; CHECK-NEXT: ret
   %extract = shufflevector <16 x i8> %v, <16 x i8> undef, <8 x i32> <i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9>
   %bc = bitcast <8 x i8> %extract to <4 x i16>
   %splat = shufflevector <4 x i16> %bc, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   %r = add <4 x i16> %splat, %a
   ret <4 x i16> %r
 }

 define <4 x i16> @test_vadd_lane5_i16_bitcast_bigger_aligned(<4 x i16> %a, <16 x i8> %v) {
 ; CHECK-LABEL: test_vadd_lane5_i16_bitcast_bigger_aligned:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext v1.16b, v1.16b, v1.16b, #8
-; CHECK-NEXT: dup v1.4h, v1.h[1]
+; CHECK-NEXT: dup v1.4h, v1.h[5]
 ; CHECK-NEXT: add v0.4h, v1.4h, v0.4h
 ; CHECK-NEXT: ret
   %extract = shufflevector <16 x i8> %v, <16 x i8> undef, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %bc = bitcast <8 x i8> %extract to <4 x i16>
   %splat = shufflevector <4 x i16> %bc, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   %r = add <4 x i16> %splat, %a
   ret <4 x i16> %r
 }

+; Negative test - can't dup bytes {3,4} of v8i16.
+
 define <4 x i16> @test_vadd_lane_i16_bitcast_bigger_unaligned(<4 x i16> %a, <16 x i8> %v) {
 ; CHECK-LABEL: test_vadd_lane_i16_bitcast_bigger_unaligned:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ext v1.8b, v1.8b, v0.8b, #1
 ; CHECK-NEXT: dup v1.4h, v1.h[1]
 ; CHECK-NEXT: add v0.4h, v1.4h, v0.4h
 ; CHECK-NEXT: ret
   %extract = shufflevector <16 x i8> %v, <16 x i8> undef, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
(1,715 following lines not shown)
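
The updated CHECK lines replace an `ext ..., #8` followed by a lane-1 `dup` with a single lane-5 `dup`. The byte-level sketch below suggests why the two sequences select the same 16-bit value. It is a simplified model under stated assumptions: `extBytes`, `halfLane` and `makeIotaReg` are hypothetical helpers, `extBytes` covers only the 16-byte form of EXT, and only the selected lane is modelled, not the broadcast.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// A 128-bit vector register modelled as 16 bytes in memory order.
using Reg128 = std::array<std::uint8_t, 16>;

// Models `ext vd.16b, vn.16b, vm.16b, #imm`: take 16 consecutive bytes,
// starting at byte `imm`, from the 32-byte sequence vn (low) then vm (high).
inline Reg128 extBytes(const Reg128 &vn, const Reg128 &vm, unsigned imm) {
  assert(imm < 16 && "byte offset must be within one register");
  Reg128 out{};
  for (unsigned i = 0; i < 16; ++i) {
    unsigned j = i + imm;
    out[i] = j < 16 ? vn[j] : vm[j - 16];
  }
  return out;
}

// Returns the 16-bit lane that `dup vd.4h, vn.h[lane]` would broadcast.
inline std::uint16_t halfLane(const Reg128 &v, unsigned lane) {
  assert(lane < 8 && "a 128-bit register holds eight 16-bit lanes");
  std::uint16_t out = 0;
  std::memcpy(&out, v.data() + lane * 2, sizeof(out));
  return out;
}

// Builds a register whose bytes are 0, 1, ..., 15 so every lane is distinct.
inline Reg128 makeIotaReg() {
  Reg128 r{};
  for (unsigned i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint8_t>(i);
  return r;
}
```

In this model, `ext` with both sources equal and an offset of 8 rotates the register by half its width, so lane 1 of the rotated value is lane 5 of the original. That lets the direct `dup v1.4h, v1.h[5]` drop the extra instruction.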