This is an archive of the discontinued LLVM Phabricator instance.

Differential D71672

[AArch64] match splat of bitcasted extract subvector to DUPLANE
Closed, Public

Authored by spatel on Dec 18 2019, 12:49 PM.

Details

Reviewers

efriedma
dmgreen
t.p.northover

Commits

rG0b38af89e2c0: [AArch64] match splat of bitcasted extract subvector to DUPLANE

Summary

This is another potential regression exposed by D63815.

Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'
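The lane adjustment LaneC -> LaneC' can be sketched in isolation as plain integer arithmetic. This is a minimal sketch, not the code from the patch: `scaledLane` and its parameter names are hypothetical, and it only covers the case where the bitcast goes from a wider element to a narrower one.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API). Models the lane rewrite
//   splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'
// lane:        LaneC, counted in elements of the splat result type
// extractIdx:  C, counted in elements of the source vector X
// srcEltBits:  element width of X, in bits
// dstEltBits:  element width of the splat result, in bits
unsigned scaledLane(unsigned lane, unsigned extractIdx,
                    unsigned srcEltBits, unsigned dstEltBits) {
  // Assumption of this sketch: each source element splits evenly into
  // result elements (wide -> narrow bitcast).
  assert(dstEltBits != 0 && srcEltBits % dstEltBits == 0);
  unsigned scale = srcEltBits / dstEltBits; // result elements per source element
  return lane + extractIdx * scale;
}
```

For example, extracting element 1 of a `<2 x double>`, bitcasting it to `<2 x float>`, and splatting lane 1 gives `scaledLane(1, 1, 64, 32)`, which is lane 3 of the source viewed as `<4 x float>`.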

Diff Detail

Event Timeline

spatel created this revision. Dec 18 2019, 12:49 PM

Herald added a project: Restricted Project. Dec 18 2019, 12:49 PM

Herald added subscribers: hiraditya, kristof.beyls, mcrosier.

efriedma added inline comments. Dec 18 2019, 1:30 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7045	Can you fix the comments here, so each of these transforms has its own comment briefly explaining what it does? `V1.getOperand(0).getScalarValueSizeInBits() % VTEltBitWidth == 0` seems overly restrictive. I guess you have to enforce that the EXTRACT_SUBVECTOR index is appropriately aligned, but it would be okay to allow, for example, `<16 x i8>` with index 8. Maybe translate the EXTRACT_SUBVECTOR index to a byte offset, then divide by the element size of the result? That should also make the logic a little easier to follow.
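The byte-offset idea from this comment can be sketched as follows. This is an illustration under stated assumptions, not the code that landed: `laneFromByteOffset` is a hypothetical name, element widths are assumed to be whole bytes, and a misaligned extract is reported by returning -1.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API). Converts the EXTRACT_SUBVECTOR
// index to a byte offset, then divides by the result element size.
// Returns the adjusted lane, or -1 if the extract does not start on a
// result-element boundary.
int laneFromByteOffset(unsigned lane, unsigned extractIdx,
                       unsigned srcEltBits, unsigned dstEltBits) {
  // Assumption of this sketch: byte-sized elements only.
  assert(srcEltBits % 8 == 0 && dstEltBits % 8 == 0 && dstEltBits != 0);
  unsigned byteOffset = extractIdx * (srcEltBits / 8);
  unsigned dstEltBytes = dstEltBits / 8;
  if (byteOffset % dstEltBytes != 0)
    return -1; // extract index is not aligned for the result element type
  return static_cast<int>(lane + byteOffset / dstEltBytes);
}
```

With a `<16 x i8>` source and extract index 8, the byte offset is 8, so a splat of lane 1 in 32-bit elements maps to lane 3. An index of 4 with 64-bit result elements is rejected because byte 4 is not a multiple of 8.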

spatel mentioned this in rG59811f454df0: [AArch64] add more tests for extract-bitcast-splat; NFC. Dec 20 2019, 5:58 AM

Patch updated:

Enhanced to allow narrow element -> wide element bitcasts. (tests added with rG59811f454df0)
Restructured and added code/test comments (hopefully makes it easier to read).

efriedma added inline comments. Dec 20 2019, 11:57 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7084	Can we compute SrcVecNumElts as Extract.getOperand(0).getValueType().getSizeInBits()/CastedEltBitWidth, or something like that?

spatel marked 2 inline comments as done. Dec 20 2019, 1:02 PM

spatel added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7084	Ah, yes - that will simplify things. I was hung up on scaling that value, but it's unnecessary.

Patch updated:
Simplify code for calculating the casted source vector type.
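The difference between the two ways of computing the casted source element count can be sketched with plain integers. Both function names below are hypothetical, and the first only mirrors the `WideVecNumElts * Scale` arithmetic visible in the first revision of the diff.

```cpp
#include <cassert>

// Hypothetical helpers (not LLVM APIs).

// First-revision style: scale the source element count. The integer
// division yields 0 when the cast goes from narrow to wide elements.
unsigned numEltsByScaling(unsigned srcNumElts, unsigned srcEltBits,
                          unsigned castedEltBits) {
  assert(castedEltBits != 0);
  unsigned scale = srcEltBits / castedEltBits;
  return srcNumElts * scale;
}

// Reviewer's suggestion: total source vector size divided by the casted
// element width. No scale factor is needed.
unsigned numEltsBySize(unsigned srcVecBits, unsigned castedEltBits) {
  assert(castedEltBits != 0);
  return srcVecBits / castedEltBits;
}
```

For a `<2 x double>` source cast to 32-bit elements both give 4. For a `<16 x i8>` source cast to 32-bit elements the scaling form collapses to 0, while the size-based form still gives 4.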

LGTM

This revision is now accepted and ready to land. Dec 20 2019, 1:18 PM

Closed by commit rG0b38af89e2c0: [AArch64] match splat of bitcasted extract subvector to DUPLANE (authored by spatel). Dec 22 2019, 6:02 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                               Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp    17 lines
llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll     9 lines

Diff 234598

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

     if (V1.getOpcode() == ISD::BUILD_VECTOR &&
         !isa<ConstantSDNode>(V1.getOperand(Lane)))
       return DAG.getNode(AArch64ISD::DUP, dl, VT, V1.getOperand(Lane));

     // Otherwise, duplicate from the lane of the input vector.
     unsigned Opcode = getDUPLANEOp(V1.getValueType().getVectorElementType());

     // SelectionDAGBuilder may have "helpfully" already extracted or conatenated
     // to make a vector of the same size as this SHUFFLE. We can ignore the
     // extract entirely, and canonicalise the concat using WidenVector.
-    if (V1.getOpcode() == ISD::EXTRACT_SUBVECTOR) {
+    unsigned VTEltBitWidth = VT.getScalarSizeInBits();
+    if (V1.getOpcode() == ISD::BITCAST &&
+        V1.getOperand(0).getOpcode() == ISD::EXTRACT_SUBVECTOR &&
+        V1.getOperand(0).getScalarValueSizeInBits() % VTEltBitWidth == 0) {
+      // If the extract is bitcast to smaller type, offset the DUPLANE index to
+      // account for that and bitcast the DUPLANE operand.
+      SDValue SrcOp = V1.getOperand(0);
+      unsigned ExtIdx = SrcOp.getConstantOperandVal(1);
+      unsigned Scale = SrcOp.getScalarValueSizeInBits() / VTEltBitWidth;
+      Lane += ExtIdx * Scale;
+      unsigned WideVecNumElts =
+          SrcOp.getOperand(0).getValueType().getVectorNumElements();
+      MVT CastVT = MVT::getVectorVT(VT.getSimpleVT().getScalarType(),
+                                    WideVecNumElts * Scale);
+      V1 = DAG.getBitcast(CastVT, SrcOp.getOperand(0));
+    } else if (V1.getOpcode() == ISD::EXTRACT_SUBVECTOR) {
       Lane += cast<ConstantSDNode>(V1.getOperand(1))->getZExtValue();
       V1 = V1.getOperand(0);
     } else if (V1.getOpcode() == ISD::CONCAT_VECTORS) {
       unsigned Idx = Lane >= (int)VT.getVectorNumElements() / 2;
       Lane -= Idx * VT.getVectorNumElements() / 2;
       V1 = WidenVector(V1.getOperand(Idx), DAG);
     } else if (VT.getSizeInBits() == 64)
       V1 = WidenVector(V1, DAG);

     return DAG.getNode(Opcode, dl, VT, V1, DAG.getConstant(Lane, dl, MVT::i64));
   }

   if (isREVMask(ShuffleMask, VT, 64))
     return DAG.getNode(AArch64ISD::REV64, dl, V1.getValueType(), V1, V2);
   if (isREVMask(ShuffleMask, VT, 32))
     return DAG.getNode(AArch64ISD::REV32, dl, V1.getValueType(), V1, V2);
   if (isREVMask(ShuffleMask, VT, 16))
     return DAG.getNode(AArch64ISD::REV16, dl, V1.getValueType(), V1, V2);

   bool ReverseEXT = false;
   unsigned Imm;
   if (isEXTMask(ShuffleMask, VT, ReverseEXT, Imm)) {
     if (ReverseEXT)
       std::swap(V1, V2);
     Imm *= getExtFactor(V1);
     return DAG.getNode(AArch64ISD::EXT, dl, V1.getValueType(), V1, V2,
                        DAG.getConstant(Imm, dl, MVT::i32));
   } else if (V2->isUndef() && isSingletonEXTMask(ShuffleMask, VT, Imm)) {
     Imm *= getExtFactor(V1);
     return DAG.getNode(AArch64ISD::EXT, dl, V1.getValueType(), V1, V1,
                        DAG.getConstant(Imm, dl, MVT::i32));

llvm/test/CodeGen/AArch64/arm64-neon-2velem.ll

 entry:
   %shuffle = shufflevector <4 x float> %v, <4 x float> undef, <2 x i32> <i32 3, i32 3>
   %mul = fmul <2 x float> %shuffle, %a
   ret <2 x float> %mul
 }

 define <2 x float> @test_vmul_laneq3_f32_bitcast(<2 x float> %a, <2 x double> %v) {
 ; CHECK-LABEL: test_vmul_laneq3_f32_bitcast:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext v1.16b, v1.16b, v1.16b, #8
-; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[1]
+; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[3]
 ; CHECK-NEXT: ret
   %extract = shufflevector <2 x double> %v, <2 x double> undef, <1 x i32> <i32 1>
   %bc = bitcast <1 x double> %extract to <2 x float>
   %splat = shufflevector <2 x float> %bc, <2 x float> undef, <2 x i32> <i32 1, i32 1>
   %mul = fmul <2 x float> %splat, %a
   ret <2 x float> %mul
 }

 define <2 x float> @test_vmul_laneq2_f32_bitcast(<2 x float> %a, <2 x double> %v) {
 ; CHECK-LABEL: test_vmul_laneq2_f32_bitcast:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext v1.16b, v1.16b, v1.16b, #8
-; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[0]
+; CHECK-NEXT: fmul v0.2s, v0.2s, v1.s[2]
 ; CHECK-NEXT: ret
   %extract = shufflevector <2 x double> %v, <2 x double> undef, <1 x i32> <i32 1>
   %bc = bitcast <1 x double> %extract to <2 x float>
   %splat = shufflevector <2 x float> %bc, <2 x float> undef, <2 x i32> <i32 0, i32 0>
   %mul = fmul <2 x float> %splat, %a
   ret <2 x float> %mul
 }

 define <4 x i16> @test_vmul_laneq5_i16_bitcast(<4 x i16> %a, <2 x double> %v) {
 ; CHECK-LABEL: test_vmul_laneq5_i16_bitcast:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: ext v1.16b, v1.16b, v1.16b, #8
-; CHECK-NEXT: dup v1.4h, v1.h[1]
+; CHECK-NEXT: dup v1.4h, v1.h[5]
 ; CHECK-NEXT: add v0.4h, v1.4h, v0.4h
 ; CHECK-NEXT: ret
   %extract = shufflevector <2 x double> %v, <2 x double> undef, <1 x i32> <i32 1>
   %bc = bitcast <1 x double> %extract to <4 x i16>
   %splat = shufflevector <4 x i16> %bc, <4 x i16> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   %r = add <4 x i16> %splat, %a
   ret <4 x i16> %r
 }
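The changed CHECK lines replace an `ext` by 8 bytes plus a low-lane access with a single high-lane access. A byte-level model can confirm that both forms read the same bits. This is a rough sketch: `Reg128`, `extBytes`, `laneS`, `laneH`, and `oldAndNewAgree` are hypothetical names, and the model treats a register as 16 raw bytes with `ext` taking its low bytes from the first source operand.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using Reg128 = std::array<unsigned char, 16>;

// Models `ext vD.16b, vN.16b, vM.16b, #imm`: take 16 bytes starting at
// byte `imm` of the 32-byte concatenation in which vN is the low half.
Reg128 extBytes(const Reg128 &n, const Reg128 &m, unsigned imm) {
  Reg128 out{};
  for (unsigned i = 0; i < 16; ++i) {
    unsigned src = i + imm;
    out[i] = src < 16 ? n[src] : m[src - 16];
  }
  return out;
}

// Reads 32-bit lane `lane` (v.s[lane]) from the register image.
uint32_t laneS(const Reg128 &r, unsigned lane) {
  uint32_t v;
  std::memcpy(&v, r.data() + lane * sizeof(v), sizeof(v));
  return v;
}

// Reads 16-bit lane `lane` (v.h[lane]) from the register image.
uint16_t laneH(const Reg128 &r, unsigned lane) {
  uint16_t v;
  std::memcpy(&v, r.data() + lane * sizeof(v), sizeof(v));
  return v;
}

// Compares the old sequence (ext #8, then a low lane) with the new
// direct lane index for the three tests in this diff.
bool oldAndNewAgree() {
  Reg128 v1{};
  for (unsigned i = 0; i < 16; ++i)
    v1[i] = static_cast<unsigned char>(0xA0 + i); // distinct byte per position
  Reg128 rotated = extBytes(v1, v1, 8);
  return laneS(rotated, 1) == laneS(v1, 3) &&
         laneS(rotated, 0) == laneS(v1, 2) &&
         laneH(rotated, 1) == laneH(v1, 5);
}
```

Because every byte of the test image is distinct, a wrong lane index would produce a mismatch, so `oldAndNewAgree()` returning true is a meaningful check within this model.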