This is an archive of the discontinued LLVM Phabricator instance.

Differential D119469

[AArch64] Turn truncating buildvectors into truncates
Closed · Public

Authored by dmgreen on Feb 10 2022, 12:10 PM.

Details

Reviewers

samtebbs
fhahn
sdesmalen
david-arm
jaykang10

Commits

rGd9633d149022: [AArch64] Turn truncating buildvectors into truncates

Summary

When lowering large v16f32->v16i8 fp_to_si_sat, the fp_to_si_sat node is split several times, creating an illegal v4i8 concat that gets expanded into a BUILD_VECTOR. After some combining and other legalisation, it ends up as a buildvector that extracts from 4 vectors, looking like BUILDVECTOR(a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3). That is really a v16i32->v16i8 truncate in disguise. This adds a ReconstructTruncateFromBuildVector method to detect the pattern, converting it back into the legal concat(trunc(concat(trunc(a), trunc(b))), trunc(concat(trunc(c), trunc(d)))) tree. The extracted nodes could also be v4i16, in which case the inner truncates are not needed. All those truncates and concats then become uzp1's, which is much better than expanding by moving vector lanes around.
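
To make the "truncate in disguise" claim concrete, here is a minimal standalone sketch in plain C++ (not LLVM code). The type aliases and the functions `buildVectorOfTruncs`, `trunc32to16`, `truncConcat` and `truncTree` are made up for illustration, and truncation is assumed to keep the low bits of each lane. It compares the lane-by-lane buildvector with the two-level concat/trunc tree from the summary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<uint32_t, 4>;
using V4i16 = std::array<uint16_t, 4>;
using V8i8 = std::array<uint8_t, 8>;
using V16i8 = std::array<uint8_t, 16>;

// BUILDVECTOR(a0..a3, b0..b3, c0..c3, d0..d3) with every lane truncated to i8.
V16i8 buildVectorOfTruncs(const V4i32 &A, const V4i32 &B, const V4i32 &C,
                          const V4i32 &D) {
  const V4i32 *Srcs[4] = {&A, &B, &C, &D};
  V16i8 R{};
  for (std::size_t I = 0; I < 16; ++I)
    R[I] = static_cast<uint8_t>((*Srcs[I / 4])[I % 4]);
  return R;
}

// trunc v4i32 -> v4i16: keep the low 16 bits of every lane.
V4i16 trunc32to16(const V4i32 &V) {
  V4i16 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = static_cast<uint16_t>(V[I]);
  return R;
}

// trunc(concat(Lo, Hi)): concatenate to eight i16 lanes, then keep the low
// 8 bits of each lane.
V8i8 truncConcat(const V4i16 &Lo, const V4i16 &Hi) {
  V8i8 R{};
  for (std::size_t I = 0; I < 4; ++I) {
    R[I] = static_cast<uint8_t>(Lo[I]);
    R[I + 4] = static_cast<uint8_t>(Hi[I]);
  }
  return R;
}

// concat(trunc(concat(trunc(a), trunc(b))), trunc(concat(trunc(c), trunc(d))))
V16i8 truncTree(const V4i32 &A, const V4i32 &B, const V4i32 &C,
                const V4i32 &D) {
  V8i8 Lo = truncConcat(trunc32to16(A), trunc32to16(B));
  V8i8 Hi = truncConcat(trunc32to16(C), trunc32to16(D));
  V16i8 R{};
  for (std::size_t I = 0; I < 8; ++I) {
    R[I] = Lo[I];
    R[I + 8] = Hi[I];
  }
  return R;
}
```

For any four inputs, `buildVectorOfTruncs(A, B, C, D)` and `truncTree(A, B, C, D)` should produce the same sixteen bytes, which is the equivalence the patch relies on.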

Found when looking at D96522 / D118979.

Event Timeline

dmgreen created this revision. Feb 10 2022, 12:10 PM

Herald added subscribers: hiraditya, kristof.beyls. Feb 10 2022, 12:10 PM

dmgreen requested review of this revision. Feb 10 2022, 12:10 PM

Herald added a project: Restricted Project. Feb 10 2022, 12:10 PM

dmgreen added a reviewer: jaykang10. Feb 10 2022, 12:11 PM

Harbormaster completed remote builds in B148811: Diff 407629. Feb 10 2022, 12:11 PM

david-arm added inline comments. Feb 14 2022, 1:24 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9257	Hi @dmgreen, this feels a bit too specific to a particular optimisation. For example, there is a test in CodeGen/AArch64/neon-extracttruncate.ll called `@extract_2_v4i32` that could also benefit from something like this. Also, some of the code in here looks very similar to what ReconstructShuffle attempts to do, which also looks at BUILD_VECTORS that are constructed from extractelement operations. Would it be better to simply extend ReconstructShuffle to cover this case?

dmgreen added inline comments. Feb 15 2022, 2:02 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9257	ReconstructShuffle is for reconstructing buildvectors into legal shuffles, and if you look at the code it only handles 2 sources. This method is for reconstructing truncates, not shuffles, and needs to handle 4 sources. As this isn't a legal shuffle, it doesn't really fit there. I just added the tests in neon-extracttruncate.ll as a quick test; they are not expected to come up in practice (https://godbolt.org/z/eEser71Gq). It is the lowering of large fptosi.sat that we are targeting.

david-arm added inline comments. Feb 15 2022, 2:51 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9257	OK, but is it worth making this more generic than it currently is, i.e. dealing with more cases than just this one very specific issue? @extract_2_v4i32 looks like it could benefit too.

dmgreen added inline comments. Feb 15 2022, 9:55 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9257	Hmm. As I showed in the godbolt link above, extract_2_v4i32 will be converted to shuffles prior to ISel. It will go through the normal shuffle codepaths, without being converted to a BUILD_VECTOR anywhere during ISel. This method can't really help with that, and I wouldn't expect it to catch any cases which LLVM has already optimized into a shuffle. Do you have cases where you think a more general version of this function would come up?

creating an illegal v4i8 concat that gets expanded into a BUILD_VECTOR

Maybe it would make sense to stop this from happening? concat_vectors where the operands will be widened seems like it would be a common pattern.

In D119469#3323751, @efriedma wrote:

creating an illegal v4i8 concat that gets expanded into a BUILD_VECTOR

Maybe it would make sense to stop this from happening? concat_vectors where the operands will be widened seems like it would be a common pattern.

Yeah, that is something I looked into, but I didn't get very far as it seems simple enough to fix up afterwards. That seems to be the same way we handle all other concats: we expand them with extracts, then reconstruct the shuffle. I think it would be too large a job to try to change how concat is lowered.

dmgreen edited the summary of this revision. Feb 17 2022, 1:09 AM

david-arm added inline comments. Mar 3 2022, 6:38 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

9292

I think you're missing some test cases here because your code permits the possibility of mixed types, i.e.

%a0 = extractelement <4 x i16> %a, i32 0
...
%b0 = extractelement <4 x i32> %b, i32 0
...
%c0 = extractelement <4 x i16> %c, i32 0
...
%d0 = extractelement <4 x i32> %d, i32 0
...
%t0 = trunc i16 %a0 to i8
...
%t4 = trunc i32 %b0 to i8
...
%t8 = trunc i16 %c0 to i8
...
%t12 = trunc i32 %d0 to i8
...
%i0 = insertelement <16 x i8> undef, i8 %t0, i32 0
...
%i4 = insertelement <16 x i8> %i3, i8 %t4, i32 4
...
%i8 = insertelement <16 x i8> %i7, i8 %t8, i32 8
...
%i12 = insertelement <16 x i8> %i11, i8 %t12, i32 12
...

Can you add at least one test for mixed types?
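
As a rough model of what a mixed-type test exercises, the following standalone C++ sketch (not LLVM code) normalises each source to four i16 lanes before the final truncation to i8. The names `toV4i16` and `truncateMixed` are made up for illustration, and the sketch assumes, as the patch's code does, that a v4i32 source is truncated to v4i16 while a v4i16 source is used unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4i16 = std::array<uint16_t, 4>;
using V4i32 = std::array<uint32_t, 4>;
using V16i8 = std::array<uint8_t, 16>;

// A v4i16 source needs no inner truncate.
V4i16 toV4i16(const V4i16 &V) { return V; }

// A v4i32 source is first truncated to v4i16 (low 16 bits of each lane).
V4i16 toV4i16(const V4i32 &V) {
  V4i16 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = static_cast<uint16_t>(V[I]);
  return R;
}

// Accepts any mix of v4i16 and v4i32 sources and produces the sixteen i8
// lanes a0..a3, b0..b3, c0..c3, d0..d3.
template <typename TA, typename TB, typename TC, typename TD>
V16i8 truncateMixed(const TA &A, const TB &B, const TC &C, const TD &D) {
  const V4i16 Narrow[4] = {toV4i16(A), toV4i16(B), toV4i16(C), toV4i16(D)};
  V16i8 R{};
  for (std::size_t I = 0; I < 16; ++I)
    R[I] = static_cast<uint8_t>(Narrow[I / 4][I % 4]);
  return R;
}
```

Calling `truncateMixed` with, say, a v4i16, two v4i32 and another v4i16 argument picks the matching `toV4i16` overload per source, so the source width only affects the first step.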

Herald added a project: Restricted Project. Mar 3 2022, 6:38 AM

Sounds good. This now has that test and a couple more.

Harbormaster completed remote builds in B152556: Diff 412956. Mar 4 2022, 1:25 AM

LGTM! Thanks for adding the tests @dmgreen. This patch seems like a nice improvement. :)

This revision is now accepted and ready to land. Mar 4 2022, 2:51 AM

Cheers

This revision was landed with ongoing or failed builds. Mar 7 2022, 1:43 AM

Closed by commit rGd9633d149022: [AArch64] Turn truncating buildvectors into truncates (authored by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGd9633d149022: [AArch64] Turn truncating buildvectors into truncates.

Revision Contents

Path                                                 Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp      56 lines
llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll       57 lines
llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll       51 lines
llvm/test/CodeGen/AArch64/neon-extracttruncate.ll    133 lines

Diff 412956

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[9,245 lines not shown; context: if (M[i] < 0)]
       continue; // ignore UNDEF indices
     if (ExpectedElt != static_cast<unsigned>(M[i]))
       return false;
   }

   return true;
 }

+// Detect patterns of a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3 from
+// v4i32s. This is really a truncate, which we can construct out of (legal)
+// concats and truncate nodes.
+static SDValue ReconstructTruncateFromBuildVector(SDValue V, SelectionDAG &DAG) {
+  if (V.getValueType() != MVT::v16i8)
+    return SDValue();
+  assert(V.getNumOperands() == 16 && "Expected 16 operands on the BUILDVECTOR");
+
+  for (unsigned X = 0; X < 4; X++) {
+    // Check the first item in each group is an extract from lane 0 of a v4i32
+    // or v4i16.
+    SDValue BaseExt = V.getOperand(X * 4);
+    if (BaseExt.getOpcode() != ISD::EXTRACT_VECTOR_ELT ||
+        (BaseExt.getOperand(0).getValueType() != MVT::v4i16 &&
+         BaseExt.getOperand(0).getValueType() != MVT::v4i32) ||
+        !isa<ConstantSDNode>(BaseExt.getOperand(1)) ||
+        BaseExt.getConstantOperandVal(1) != 0)
+      return SDValue();
+    SDValue Base = BaseExt.getOperand(0);
+    // And check the other items are extracts from the same vector.
+    for (unsigned Y = 1; Y < 4; Y++) {
+      SDValue Ext = V.getOperand(X * 4 + Y);
+      if (Ext.getOpcode() != ISD::EXTRACT_VECTOR_ELT ||
+          Ext.getOperand(0) != Base ||
+          !isa<ConstantSDNode>(Ext.getOperand(1)) ||
+          Ext.getConstantOperandVal(1) != Y)
+        return SDValue();
+    }
+  }
+
+  // Turn the buildvector into a series of truncates and concates, which will
+  // become uzip1's. Any v4i32s we found get truncated to v4i16, which are
+  // concat together to produce 2 v8i16. These are both truncated and concat
+  // together.
+  SDLoc DL(V);
+  SDValue Trunc[4] = {
+      V.getOperand(0).getOperand(0), V.getOperand(4).getOperand(0),
+      V.getOperand(8).getOperand(0), V.getOperand(12).getOperand(0)};
+  for (int I = 0; I < 4; I++)
+    if (Trunc[I].getValueType() == MVT::v4i32)
+      Trunc[I] = DAG.getNode(ISD::TRUNCATE, DL, MVT::v4i16, Trunc[I]);
+  SDValue Concat0 =
+      DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v8i16, Trunc[0], Trunc[1]);
+  SDValue Concat1 =
+      DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v8i16, Trunc[2], Trunc[3]);
+  SDValue Trunc0 = DAG.getNode(ISD::TRUNCATE, DL, MVT::v8i8, Concat0);
+  SDValue Trunc1 = DAG.getNode(ISD::TRUNCATE, DL, MVT::v8i8, Concat1);
+  return DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v16i8, Trunc0, Trunc1);
+}
+
 /// Check if a vector shuffle corresponds to a DUP instructions with a larger
 /// element width than the vector lane type. If that is the case the function
 /// returns true and writes the value of the DUP instruction lane operand into
 /// DupLaneOp
 static bool isWideDUPMask(ArrayRef<int> M, EVT VT, unsigned BlockSize,
                           unsigned &DupLaneOp) {
   assert((BlockSize == 16 || BlockSize == 32 || BlockSize == 64) &&
          "Only possible block sizes for wide DUP are: 16, 32, 64");
[1,609 lines not shown; context: SDValue AArch64TargetLowering::LowerBUILD_VECTOR(SDValue Op,]
   }

   // Empirical tests suggest this is rarely worth it for vectors of length <= 2.
   if (NumElts >= 4) {
     if (SDValue shuffle = ReconstructShuffle(Op, DAG))
       return shuffle;
   }

+  // Detect patterns of a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3 from
+  // v4i32s. This is really a truncate, which we can construct out of (legal)
+  // concats and truncate nodes.
+  if (SDValue M = ReconstructTruncateFromBuildVector(Op, DAG))
+    return M;
+
   if (PreferDUPAndInsert) {
     // First, build a constant vector with the common element.
     SmallVector<SDValue, 8> Ops(NumElts, Value);
     SDValue NewVector = LowerBUILD_VECTOR(DAG.getBuildVector(VT, dl, Ops), DAG);
     // Next, insert the elements that do not match the common value.
     for (unsigned I = 0; I < NumElts; ++I)
       if (Op.getOperand(I) != Value)
         NewVector =
[9,599 lines not shown]
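
The summary says the truncate/concat tree ends up as uzp1 instructions, and the updated CHECK lines in the test files show sequences such as `uzp1 v0.16b, v0.16b, v2.16b`. As a rough illustration of why a single uzp1 can stand in for concat(trunc(x), trunc(y)), here is a small standalone C++ model (not LLVM code). It assumes the little-endian lane layout, where the low byte of each 16-bit lane sits at the even byte index, and it models the byte form of UZP1 as "take the even-numbered elements of the first input, then of the second". The names `Reg`, `fromHalfwords`, `uzp1Bytes` and `concatOfTruncs` are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Reg = std::array<uint8_t, 16>;     // one 128-bit vector register as bytes
using V8i16 = std::array<uint16_t, 8>;   // eight 16-bit lanes

// Store eight 16-bit lanes into a register: lane I occupies bytes 2*I
// (low byte) and 2*I+1 (high byte).
Reg fromHalfwords(const V8i16 &H) {
  Reg R{};
  for (std::size_t I = 0; I < 8; ++I) {
    R[2 * I] = static_cast<uint8_t>(H[I] & 0xFF);
    R[2 * I + 1] = static_cast<uint8_t>(H[I] >> 8);
  }
  return R;
}

// Model of "uzp1 Vd.16b, Vn.16b, Vm.16b": the even-numbered bytes of Vn fill
// the low half of the result, the even-numbered bytes of Vm the high half.
Reg uzp1Bytes(const Reg &N, const Reg &M) {
  Reg R{};
  for (std::size_t I = 0; I < 8; ++I) {
    R[I] = N[2 * I];
    R[I + 8] = M[2 * I];
  }
  return R;
}

// Reference result: concat(trunc(N), trunc(M)) with v8i16 -> v8i8 truncates.
Reg concatOfTruncs(const V8i16 &N, const V8i16 &M) {
  Reg R{};
  for (std::size_t I = 0; I < 8; ++I) {
    R[I] = static_cast<uint8_t>(N[I]);
    R[I + 8] = static_cast<uint8_t>(M[I]);
  }
  return R;
}
```

Under those assumptions, `uzp1Bytes(fromHalfwords(N), fromHalfwords(M))` matches `concatOfTruncs(N, M)` for any inputs; the same reasoning one level up covers the `.8h` form on 32-bit lanes.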

llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll

[3,006 lines not shown; context: ; CHECK-NEXT: ret]
   %x = call <8 x i8> @llvm.fptosi.sat.v8f32.v8i8(<8 x float> %f)
   ret <8 x i8> %x
 }

 define <16 x i8> @test_signed_v16f32_v16i8(<16 x float> %f) {
 ; CHECK-LABEL: test_signed_v16f32_v16i8:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: movi v4.4s, #127
+; CHECK-NEXT: fcvtzs v3.4s, v3.4s
+; CHECK-NEXT: fcvtzs v2.4s, v2.4s
+; CHECK-NEXT: fcvtzs v1.4s, v1.4s
 ; CHECK-NEXT: fcvtzs v0.4s, v0.4s
 ; CHECK-NEXT: mvni v5.4s, #127
-; CHECK-NEXT: fcvtzs v1.4s, v1.4s
-; CHECK-NEXT: fcvtzs v2.4s, v2.4s
-; CHECK-NEXT: smin v0.4s, v0.4s, v4.4s
-; CHECK-NEXT: smin v1.4s, v1.4s, v4.4s
+; CHECK-NEXT: smin v3.4s, v3.4s, v4.4s
 ; CHECK-NEXT: smin v2.4s, v2.4s, v4.4s
-; CHECK-NEXT: smax v0.4s, v0.4s, v5.4s
-; CHECK-NEXT: smax v1.4s, v1.4s, v5.4s
-; CHECK-NEXT: smax v2.4s, v2.4s, v5.4s
-; CHECK-NEXT: xtn v6.4h, v0.4s
-; CHECK-NEXT: umov w8, v6.h[0]
-; CHECK-NEXT: umov w9, v6.h[1]
-; CHECK-NEXT: xtn v1.4h, v1.4s
-; CHECK-NEXT: fmov s0, w8
-; CHECK-NEXT: umov w8, v6.h[2]
-; CHECK-NEXT: mov v0.b[1], w9
-; CHECK-NEXT: mov v0.b[2], w8
-; CHECK-NEXT: umov w8, v6.h[3]
-; CHECK-NEXT: mov v0.b[3], w8
-; CHECK-NEXT: umov w8, v1.h[0]
-; CHECK-NEXT: mov v0.b[4], w8
-; CHECK-NEXT: umov w8, v1.h[1]
-; CHECK-NEXT: mov v0.b[5], w8
-; CHECK-NEXT: umov w8, v1.h[2]
-; CHECK-NEXT: mov v0.b[6], w8
-; CHECK-NEXT: umov w8, v1.h[3]
-; CHECK-NEXT: xtn v1.4h, v2.4s
-; CHECK-NEXT: fcvtzs v2.4s, v3.4s
-; CHECK-NEXT: mov v0.b[7], w8
-; CHECK-NEXT: umov w8, v1.h[0]
-; CHECK-NEXT: smin v2.4s, v2.4s, v4.4s
-; CHECK-NEXT: mov v0.b[8], w8
-; CHECK-NEXT: umov w8, v1.h[1]
+; CHECK-NEXT: smin v1.4s, v1.4s, v4.4s
+; CHECK-NEXT: smin v0.4s, v0.4s, v4.4s
+; CHECK-NEXT: smax v3.4s, v3.4s, v5.4s
 ; CHECK-NEXT: smax v2.4s, v2.4s, v5.4s
-; CHECK-NEXT: mov v0.b[9], w8
-; CHECK-NEXT: umov w8, v1.h[2]
-; CHECK-NEXT: mov v0.b[10], w8
-; CHECK-NEXT: umov w8, v1.h[3]
-; CHECK-NEXT: xtn v1.4h, v2.4s
-; CHECK-NEXT: mov v0.b[11], w8
-; CHECK-NEXT: umov w8, v1.h[0]
-; CHECK-NEXT: mov v0.b[12], w8
-; CHECK-NEXT: umov w8, v1.h[1]
-; CHECK-NEXT: mov v0.b[13], w8
-; CHECK-NEXT: umov w8, v1.h[2]
-; CHECK-NEXT: mov v0.b[14], w8
-; CHECK-NEXT: umov w8, v1.h[3]
-; CHECK-NEXT: mov v0.b[15], w8
+; CHECK-NEXT: smax v1.4s, v1.4s, v5.4s
+; CHECK-NEXT: smax v0.4s, v0.4s, v5.4s
+; CHECK-NEXT: uzp1 v2.8h, v2.8h, v3.8h
+; CHECK-NEXT: uzp1 v0.8h, v0.8h, v1.8h
+; CHECK-NEXT: uzp1 v0.16b, v0.16b, v2.16b
 ; CHECK-NEXT: ret
   %x = call <16 x i8> @llvm.fptosi.sat.v16f32.v16i8(<16 x float> %f)
   ret <16 x i8> %x
 }

 define <8 x i16> @test_signed_v8f32_v8i16(<8 x float> %f) {
 ; CHECK-LABEL: test_signed_v8f32_v8i16:
 ; CHECK: // %bb.0:
[728 lines not shown]

llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll

[2,509 lines not shown; context: ; CHECK-NEXT: ret]
   %x = call <8 x i8> @llvm.fptoui.sat.v8f32.v8i8(<8 x float> %f)
   ret <8 x i8> %x
 }

 define <16 x i8> @test_unsigned_v16f32_v16i8(<16 x float> %f) {
 ; CHECK-LABEL: test_unsigned_v16f32_v16i8:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: movi v4.2d, #0x0000ff000000ff
-; CHECK-NEXT: fcvtzu v0.4s, v0.4s
-; CHECK-NEXT: fcvtzu v1.4s, v1.4s
+; CHECK-NEXT: fcvtzu v3.4s, v3.4s
 ; CHECK-NEXT: fcvtzu v2.4s, v2.4s
-; CHECK-NEXT: umin v0.4s, v0.4s, v4.4s
-; CHECK-NEXT: umin v1.4s, v1.4s, v4.4s
-; CHECK-NEXT: umin v2.4s, v2.4s, v4.4s
-; CHECK-NEXT: xtn v5.4h, v0.4s
-; CHECK-NEXT: xtn v1.4h, v1.4s
-; CHECK-NEXT: umov w8, v5.h[0]
-; CHECK-NEXT: umov w9, v5.h[1]
-; CHECK-NEXT: fmov s0, w8
-; CHECK-NEXT: umov w8, v5.h[2]
-; CHECK-NEXT: mov v0.b[1], w9
-; CHECK-NEXT: mov v0.b[2], w8
-; CHECK-NEXT: umov w8, v5.h[3]
-; CHECK-NEXT: mov v0.b[3], w8
-; CHECK-NEXT: umov w8, v1.h[0]
-; CHECK-NEXT: mov v0.b[4], w8
-; CHECK-NEXT: umov w8, v1.h[1]
-; CHECK-NEXT: mov v0.b[5], w8
-; CHECK-NEXT: umov w8, v1.h[2]
-; CHECK-NEXT: mov v0.b[6], w8
-; CHECK-NEXT: umov w8, v1.h[3]
-; CHECK-NEXT: xtn v1.4h, v2.4s
-; CHECK-NEXT: fcvtzu v2.4s, v3.4s
-; CHECK-NEXT: mov v0.b[7], w8
-; CHECK-NEXT: umov w8, v1.h[0]
+; CHECK-NEXT: fcvtzu v1.4s, v1.4s
+; CHECK-NEXT: fcvtzu v0.4s, v0.4s
+; CHECK-NEXT: umin v3.4s, v3.4s, v4.4s
 ; CHECK-NEXT: umin v2.4s, v2.4s, v4.4s
-; CHECK-NEXT: mov v0.b[8], w8
-; CHECK-NEXT: umov w8, v1.h[1]
-; CHECK-NEXT: mov v0.b[9], w8
-; CHECK-NEXT: umov w8, v1.h[2]
-; CHECK-NEXT: mov v0.b[10], w8
-; CHECK-NEXT: umov w8, v1.h[3]
-; CHECK-NEXT: xtn v1.4h, v2.4s
-; CHECK-NEXT: mov v0.b[11], w8
-; CHECK-NEXT: umov w8, v1.h[0]
-; CHECK-NEXT: mov v0.b[12], w8
-; CHECK-NEXT: umov w8, v1.h[1]
-; CHECK-NEXT: mov v0.b[13], w8
-; CHECK-NEXT: umov w8, v1.h[2]
-; CHECK-NEXT: mov v0.b[14], w8
-; CHECK-NEXT: umov w8, v1.h[3]
-; CHECK-NEXT: mov v0.b[15], w8
+; CHECK-NEXT: umin v1.4s, v1.4s, v4.4s
+; CHECK-NEXT: umin v0.4s, v0.4s, v4.4s
+; CHECK-NEXT: uzp1 v2.8h, v2.8h, v3.8h
+; CHECK-NEXT: uzp1 v0.8h, v0.8h, v1.8h
+; CHECK-NEXT: uzp1 v0.16b, v0.16b, v2.16b
 ; CHECK-NEXT: ret
   %x = call <16 x i8> @llvm.fptoui.sat.v16f32.v16i8(<16 x float> %f)
   ret <16 x i8> %x
 }

 define <8 x i16> @test_unsigned_v8f32_v8i16(<8 x float> %f) {
 ; CHECK-LABEL: test_unsigned_v8f32_v8i16:
 ; CHECK: // %bb.0:
[569 lines not shown]

llvm/test/CodeGen/AArch64/neon-extracttruncate.ll

[78 lines not shown; context: entry:]
   %i6 = insertelement <8 x i8> %i5, i8 %t6, i32 6
   %i7 = insertelement <8 x i8> %i6, i8 %t7, i32 7
   ret <8 x i8> %i7
 }

 define <16 x i8> @extract_4_v4i16(<4 x i16> %a, <4 x i16> %b, <4 x i16> %c, <4 x i16> %d) {
 ; CHECK-LABEL: extract_4_v4i16:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-NEXT: umov w9, v0.h[0]
-; CHECK-NEXT: umov w10, v0.h[1]
-; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
 ; CHECK-NEXT: // kill: def $d2 killed $d2 def $q2
-; CHECK-NEXT: umov w8, v2.h[0]
+; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
 ; CHECK-NEXT: // kill: def $d3 killed $d3 def $q3
-; CHECK-NEXT: fmov s4, w9
-; CHECK-NEXT: umov w9, v0.h[2]
-; CHECK-NEXT: mov v4.b[1], w10
-; CHECK-NEXT: umov w10, v0.h[3]
-; CHECK-NEXT: mov v4.b[2], w9
-; CHECK-NEXT: umov w9, v1.h[0]
-; CHECK-NEXT: mov v4.b[3], w10
-; CHECK-NEXT: umov w10, v1.h[1]
-; CHECK-NEXT: mov v4.b[4], w9
-; CHECK-NEXT: umov w9, v1.h[2]
-; CHECK-NEXT: mov v4.b[5], w10
-; CHECK-NEXT: umov w10, v1.h[3]
-; CHECK-NEXT: mov v4.b[6], w9
-; CHECK-NEXT: umov w9, v2.h[1]
-; CHECK-NEXT: mov v4.b[7], w10
-; CHECK-NEXT: mov v4.b[8], w8
-; CHECK-NEXT: umov w8, v2.h[2]
-; CHECK-NEXT: mov v4.b[9], w9
-; CHECK-NEXT: umov w9, v2.h[3]
-; CHECK-NEXT: mov v4.b[10], w8
-; CHECK-NEXT: umov w8, v3.h[0]
-; CHECK-NEXT: mov v4.b[11], w9
-; CHECK-NEXT: umov w9, v3.h[1]
-; CHECK-NEXT: mov v4.b[12], w8
-; CHECK-NEXT: umov w8, v3.h[2]
-; CHECK-NEXT: mov v4.b[13], w9
-; CHECK-NEXT: umov w9, v3.h[3]
-; CHECK-NEXT: mov v4.b[14], w8
-; CHECK-NEXT: mov v4.b[15], w9
-; CHECK-NEXT: mov v0.16b, v4.16b
+; CHECK-NEXT: // kill: def $d1 killed $d1 def $q1
+; CHECK-NEXT: mov v2.d[1], v3.d[0]
+; CHECK-NEXT: mov v0.d[1], v1.d[0]
+; CHECK-NEXT: uzp1 v0.16b, v0.16b, v2.16b
 ; CHECK-NEXT: ret
 entry:
   %a0 = extractelement <4 x i16> %a, i32 0
   %a1 = extractelement <4 x i16> %a, i32 1
   %a2 = extractelement <4 x i16> %a, i32 2
   %a3 = extractelement <4 x i16> %a, i32 3
   %b0 = extractelement <4 x i16> %b, i32 0
   %b1 = extractelement <4 x i16> %b, i32 1
[40 lines not shown; context: entry:]
   %i14 = insertelement <16 x i8> %i13, i8 %t14, i32 14
   %i15 = insertelement <16 x i8> %i14, i8 %t15, i32 15
   ret <16 x i8> %i15
 }

 define <16 x i8> @extract_4_v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d) {
 ; CHECK-LABEL: extract_4_v4i32:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: mov w8, v0.s[1]
-; CHECK-NEXT: mov w9, v0.s[2]
-; CHECK-NEXT: mov w10, v0.s[3]
-; CHECK-NEXT: mov v0.b[1], w8
-; CHECK-NEXT: fmov w8, s1
-; CHECK-NEXT: mov v0.b[2], w9
-; CHECK-NEXT: mov w9, v1.s[1]
-; CHECK-NEXT: mov v0.b[3], w10
-; CHECK-NEXT: mov v0.b[4], w8
-; CHECK-NEXT: mov w8, v1.s[2]
-; CHECK-NEXT: mov v0.b[5], w9
-; CHECK-NEXT: mov w9, v1.s[3]
-; CHECK-NEXT: mov v0.b[6], w8
-; CHECK-NEXT: fmov w8, s2
-; CHECK-NEXT: mov v0.b[7], w9
-; CHECK-NEXT: mov w9, v2.s[1]
-; CHECK-NEXT: mov v0.b[8], w8
-; CHECK-NEXT: mov w8, v2.s[2]
-; CHECK-NEXT: mov v0.b[9], w9
-; CHECK-NEXT: mov w9, v2.s[3]
-; CHECK-NEXT: mov v0.b[10], w8
-; CHECK-NEXT: fmov w8, s3
-; CHECK-NEXT: mov v0.b[11], w9
-; CHECK-NEXT: mov w9, v3.s[1]
-; CHECK-NEXT: mov v0.b[12], w8
-; CHECK-NEXT: mov w8, v3.s[2]
-; CHECK-NEXT: mov v0.b[13], w9
-; CHECK-NEXT: mov w9, v3.s[3]
-; CHECK-NEXT: mov v0.b[14], w8
-; CHECK-NEXT: mov v0.b[15], w9
+; CHECK-NEXT: uzp1 v2.8h, v2.8h, v3.8h
+; CHECK-NEXT: uzp1 v0.8h, v0.8h, v1.8h
+; CHECK-NEXT: uzp1 v0.16b, v0.16b, v2.16b
 ; CHECK-NEXT: ret
 entry:
   %a0 = extractelement <4 x i32> %a, i32 0
   %a1 = extractelement <4 x i32> %a, i32 1
   %a2 = extractelement <4 x i32> %a, i32 2
   %a3 = extractelement <4 x i32> %a, i32 3
   %b0 = extractelement <4 x i32> %b, i32 0
   %b1 = extractelement <4 x i32> %b, i32 1
[40 lines not shown; context: entry:]
   %i14 = insertelement <16 x i8> %i13, i8 %t14, i32 14
   %i15 = insertelement <16 x i8> %i14, i8 %t15, i32 15
   ret <16 x i8> %i15
 }

 define <16 x i8> @extract_4_mixed(<4 x i16> %a, <4 x i32> %b, <4 x i32> %c, <4 x i16> %d) {
 ; CHECK-LABEL: extract_4_mixed:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
-; CHECK-NEXT: umov w8, v0.h[0]
-; CHECK-NEXT: umov w9, v0.h[1]
+; CHECK-NEXT: xtn v2.4h, v2.4s
 ; CHECK-NEXT: // kill: def $d3 killed $d3 def $q3
-; CHECK-NEXT: fmov s4, w8
-; CHECK-NEXT: umov w8, v0.h[2]
-; CHECK-NEXT: mov v4.b[1], w9
-; CHECK-NEXT: umov w9, v0.h[3]
-; CHECK-NEXT: mov v4.b[2], w8
-; CHECK-NEXT: fmov w8, s1
-; CHECK-NEXT: mov v4.b[3], w9
-; CHECK-NEXT: mov w9, v1.s[1]
-; CHECK-NEXT: mov v4.b[4], w8
-; CHECK-NEXT: mov w8, v1.s[2]
-; CHECK-NEXT: mov v4.b[5], w9
-; CHECK-NEXT: mov w9, v1.s[3]
-; CHECK-NEXT: mov v4.b[6], w8
-; CHECK-NEXT: fmov w8, s2
-; CHECK-NEXT: mov v4.b[7], w9
-; CHECK-NEXT: mov w9, v2.s[1]
-; CHECK-NEXT: mov v4.b[8], w8
-; CHECK-NEXT: mov w8, v2.s[2]
-; CHECK-NEXT: mov v4.b[9], w9
-; CHECK-NEXT: mov w9, v2.s[3]
-; CHECK-NEXT: mov v4.b[10], w8
-; CHECK-NEXT: umov w8, v3.h[0]
-; CHECK-NEXT: mov v4.b[11], w9
-; CHECK-NEXT: umov w9, v3.h[1]
-; CHECK-NEXT: mov v4.b[12], w8
-; CHECK-NEXT: umov w8, v3.h[2]
-; CHECK-NEXT: mov v4.b[13], w9
-; CHECK-NEXT: umov w9, v3.h[3]
-; CHECK-NEXT: mov v4.b[14], w8
-; CHECK-NEXT: mov v4.b[15], w9
-; CHECK-NEXT: mov v0.16b, v4.16b
+; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
+; CHECK-NEXT: xtn2 v0.8h, v1.4s
+; CHECK-NEXT: mov v2.d[1], v3.d[0]
+; CHECK-NEXT: uzp1 v0.16b, v0.16b, v2.16b
 ; CHECK-NEXT: ret
 entry:
   %a0 = extractelement <4 x i16> %a, i32 0
   %a1 = extractelement <4 x i16> %a, i32 1
   %a2 = extractelement <4 x i16> %a, i32 2
   %a3 = extractelement <4 x i16> %a, i32 3
   %b0 = extractelement <4 x i32> %b, i32 0
   %b1 = extractelement <4 x i32> %b, i32 1
[126 lines not shown; context: entry:]
   %i14 = insertelement <16 x i8> %i13, i8 %t14, i32 14
   %i15 = insertelement <16 x i8> %i14, i8 %t15, i32 15
   ret <16 x i8> %i15
 }

 define <16 x i8> @extract_4_v4i32_one(<4 x i32> %a) {
 ; CHECK-LABEL: extract_4_v4i32_one:
 ; CHECK: // %bb.0: // %entry
-; CHECK-NEXT: mov w8, v0.s[1]
-; CHECK-NEXT: fmov w9, s0
-; CHECK-NEXT: mov w10, v0.s[2]
-; CHECK-NEXT: mov w11, v0.s[3]
-; CHECK-NEXT: mov v0.b[1], w8
-; CHECK-NEXT: mov v0.b[2], w10
-; CHECK-NEXT: mov v0.b[3], w11
-; CHECK-NEXT: mov v0.b[4], w9
-; CHECK-NEXT: mov v0.b[5], w8
-; CHECK-NEXT: mov v0.b[6], w10
-; CHECK-NEXT: mov v0.b[7], w11
-; CHECK-NEXT: mov v0.b[8], w9
-; CHECK-NEXT: mov v0.b[9], w8
-; CHECK-NEXT: mov v0.b[10], w10
-; CHECK-NEXT: mov v0.b[11], w11
-; CHECK-NEXT: mov v0.b[12], w9
-; CHECK-NEXT: mov v0.b[13], w8
-; CHECK-NEXT: mov v0.b[14], w10
-; CHECK-NEXT: mov v0.b[15], w11
+; CHECK-NEXT: uzp1 v0.8h, v0.8h, v0.8h
+; CHECK-NEXT: uzp1 v0.16b, v0.16b, v0.16b
 ; CHECK-NEXT: ret
 entry:
   %a0 = extractelement <4 x i32> %a, i32 0
   %a1 = extractelement <4 x i32> %a, i32 1
   %a2 = extractelement <4 x i32> %a, i32 2
   %a3 = extractelement <4 x i32> %a, i32 3
   %t0 = trunc i32 %a0 to i8
   %t1 = trunc i32 %a1 to i8
[21 lines not shown]
