This is an archive of the discontinued LLVM Phabricator instance.

Differential D4080

[AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR.
Needs Review · Public

Authored by kevin.qin on Jun 9 2014, 7:06 PM.

Details

Reviewers

t.p.northover

Summary

Hi Tim,

This patch fixes a pattern match failure when handling IR such as:
%tmp2 = fptosi <4 x double> %tmp1 to <4 x i16>

The failure occurs because ReconstructShuffle() wrongly creates a CONCAT_VECTOR that tries to concatenate two v2i32 vectors into a v4i16. This patch fixes the issue and tries to generate a UZP1 instead of many MOV and INS instructions. Please review.
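
To make the failure concrete, here is a minimal standalone sketch (plain C++, not LLVM code) of the type rule a vector concatenation has to satisfy. `VecType` and `isValidConcat` are hypothetical names used only for illustration; the rule assumed here is that both operands share one type, and that the result keeps their element type while doubling the lane count.

```cpp
#include <cassert>

// Hypothetical stand-in for a vector value type such as v2i32 or v4i16.
struct VecType {
  unsigned ElemBits; // width of one element in bits
  unsigned NumElts;  // number of lanes
};

// Returns true if concatenating two vectors of type Src can produce Dst:
// the element type must be unchanged and the lane count must double.
bool isValidConcat(VecType Src, VecType Dst) {
  assert(Src.ElemBits > 0 && Src.NumElts > 0 && "empty vector type");
  return Dst.ElemBits == Src.ElemBits && Dst.NumElts == 2 * Src.NumElts;
}
```

Under this rule two v2i32 operands can only produce a v4i32. A request for v4i16 is rejected, which is the shape of node the summary describes.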

Diff Detail

Event Timeline

kevin.qin updated this revision to Diff 10259. Jun 9 2014, 7:06 PM

kevin.qin retitled this revision to [AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR.

kevin.qin updated this object.

kevin.qin edited the test plan for this revision.

kevin.qin added a reviewer: t.p.northover.

kevin.qin set the repository for this revision to rL LLVM.

kevin.qin added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Jun 9 2014, 7:06 PM

Hi Kevin,

Sorry it's taken me so long to get around to this one; it's rather a complicated patch (even in comparison to the surrounding code) and I wanted to make sure I did it justice.

I've got a few questions (at the end, as with usual Phab stuff)...

Cheers.

Tim.

lib/Target/AArch64/AArch64ISelLowering.cpp
4184–4185	Isn't the key type here VT.getVectorElementType()? I'm not sure I see the logic of caring about any AssertSext/AssertZext input: we're just going to be discarding the bits anyway in your example. In the reverse case (say extracting from v4i16 and inserting into v4i32), as far as I can see EXTRACT_VECTOR_ELT will basically be doing an anyext, so we don't have to care there either. Or do you have examples where this kind of check is needed?
4201–4202	This seems very analogous to the VExtOffsets vector, but is handled completely differently. Instead of mangling the actual BUILD_VECTOR node, perhaps we should create a similar OffsetMultipliers (say) variable and just record what we've done for later in the function.
4225	If this entire section of added code is moved to the start of the "for (unsigned i = 0; i < SourceVecs.size(); ++i)" loop, this block becomes redundant, we can just use the existing one at line 4176.

Hi Tim,

I have to admit this patch is a little hard to understand. I'll try to summarize what I did in it in terms of the problems I wanted to solve.

lib/Target/AArch64/AArch64ISelLowering.cpp
4182–4183	This condition check is moved up from below. Without it, a DAG like the following (which can be created by lowering '%tmp2 = fptosi <4 x double> %tmp1 to <4 x i16>'):
    A: v4i16 = build_vector B, C, D, E
    B: i32 = extract_vector_elt F, lane 0
    C: i32 = extract_vector_elt F, lane 1
    D: i32 = extract_vector_elt G, lane 0
    E: i32 = extract_vector_elt G, lane 1
    F: v2i32 = AssertSext v2i32, type i16
    G: v2i32 = AssertSext v2i32, type i16
is lowered to a concat_vector combining two v2i32 vectors into a v4i16, which then fails pattern matching.
4184–4185	If an input fails the element type check, we can simply return from the function, and the incorrect concat_vector is never generated. But in the resulting code I saw many MOV and INS instructions generated for this build_vector. To get a better result, I added code here specifically for AssertSext/AssertZext, hoping to generate a single UZP1 instead. For AssertSext/AssertZext, only the low bits of each element are valid, so a v2i32 AssertSext node asserting that it holds i16 can be bitcast to v4i16, in which lanes 0 and 2 match the original lanes and lanes 1 and 3 are undefined. So every lane number extracted from the AssertSext node should be multiplied by 2. After that, this kind of build_vector can be reconstructed as a shuffle_vector and finally selected as a single UZP1.
4201–4202	You mean rename Pow to OffsetMultipliers? I will do that in the next version.
4225	Agreed.
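
The lane-doubling argument in the 4184–4185 reply can be checked with a small standalone model. This is only a sketch in plain C++, not LLVM code: `bitcastToV4I16`, `uzp1` and `buildVectorReference` are hypothetical helpers, and the bitcast is assumed to follow little-endian lane layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2I32 = std::array<uint32_t, 2>;
using V4I16 = std::array<uint16_t, 4>;

// Model of a little-endian bitcast from v2i32 to v4i16: lane i of the
// source lands in lane 2*i (low half) and lane 2*i+1 (high half).
V4I16 bitcastToV4I16(const V2I32 &V) {
  V4I16 Out{};
  for (unsigned Lane = 0; Lane < 2; ++Lane) {
    Out[2 * Lane] = static_cast<uint16_t>(V[Lane] & 0xFFFFu);
    Out[2 * Lane + 1] = static_cast<uint16_t>(V[Lane] >> 16);
  }
  return Out;
}

// Model of UZP1 on .4h operands: the even-numbered lanes of N,
// followed by the even-numbered lanes of M.
V4I16 uzp1(const V4I16 &N, const V4I16 &M) {
  V4I16 Out{{N[0], N[2], M[0], M[2]}};
  return Out;
}

// Reference result: extract each i32 lane and truncate it to i16, which is
// what the build_vector of extract_vector_elt nodes computes.
V4I16 buildVectorReference(const V2I32 &F, const V2I32 &G) {
  V4I16 Out{{static_cast<uint16_t>(F[0]), static_cast<uint16_t>(F[1]),
             static_cast<uint16_t>(G[0]), static_cast<uint16_t>(G[1])}};
  return Out;
}
```

In this model, original lane i of each v2i32 source is read from lane 2*i of its v4i16 view, so `uzp1(bitcastToV4I16(F), bitcastToV4I16(G))` agrees with `buildVectorReference(F, G)`.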

Here is the updated patch. Please review again. Thanks.

Herald added a subscriber: mcrosier. Jun 16 2014, 10:49 PM

Hi Kevin,

>> Isn't the key type here VT.getVectorElementType()? I'm not sure
>> I see the logic of caring about any AssertSext/AssertZext input:
>> we're just going to be discarding the bits anyway in your example.
>>
>> In the reverse case (say extracting from v4i16 and inserting into
>> v4i32), as far as I can see EXTRACT_VECTOR_ELT will basically
>> be doing an anyext, so we don't have to care there either.
>>
>> Or do you have examples where this kind of check is needed?
>
> If input cannot pass element type check, we can simply get function
> return and the incorrect concat_vector would not be generated.

Fair enough, *something* clearly has to be done; I'm not disputing that.

> But from the result, I saw lots of MOV and INS instructions are generated
> for this build_vector. To get better result, I added additional codes here
> specifically for AssertSext/AssertZext, wishing to generate single UZP1
> instead. For AssertSext/AssertZext, only low bits of each element are
> valid, so for a v2i32 AssertSext node asserting holds i16, it can bitcast
> to v4i16, which lane 0, 2 are matching prior lanes and lane 1, 3 are
> undefined.

I think that works regardless of whether the AssertSext is present or
not. All the AssertZext and AssertSext nodes tell us is that the lanes
we *don't* care about are going to be either 0 or -1 (and that's
assuming they match up with the vector we're building, they're even
less useful otherwise).

>> Instead of mangling the actual BUILD_VECTOR node, perhaps we should
>> create a similar OffsetMultipliers (say) variable and just record what we've
>> done for later in the function.
>
> You mean rename Pow to OffsetMultipliers? I will do that in next version.

I mean something a bit bigger: completely remove the
"DAG.getNode(ISD::BUILD_VECTOR, ...)" code and instead save the
information needed to extract the correct lanes later, in an "int
OffsetMultipliers[2] = { 1, 1 };" variable.
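
As a rough illustration of that idea, the sketch below records a per-source multiplier and applies it only when the shuffle mask is computed, leaving the BUILD_VECTOR operands untouched. It is standalone C++ written for this discussion, not the code from the attached diff: `ExtractedLane` and `buildShuffleMask` are hypothetical names, and the mask formula (scaled lane minus the VEXT offset, plus NumElts for the second source) is an assumption about how the two arrays would be combined.

```cpp
#include <array>
#include <cassert>
#include <vector>

// One operand of the BUILD_VECTOR: which source vector (0 or 1) it is
// extracted from, and the lane number used by that extract.
struct ExtractedLane {
  int Source;
  int Lane;
};

// Compute a shuffle mask over the two (possibly bitcast) sources. Lane
// numbers are scaled by the per-source multiplier at this point, instead of
// rebuilding the extracts with doubled lane numbers earlier on.
std::vector<int> buildShuffleMask(const std::vector<ExtractedLane> &Elts,
                                  const std::array<int, 2> &OffsetMultipliers,
                                  const std::array<int, 2> &VExtOffsets) {
  const int NumElts = static_cast<int>(Elts.size());
  std::vector<int> Mask;
  Mask.reserve(Elts.size());
  for (const ExtractedLane &E : Elts) {
    assert((E.Source == 0 || E.Source == 1) && "only two sources supported");
    int Index = E.Lane * OffsetMultipliers[E.Source] - VExtOffsets[E.Source];
    // Lanes of the second source follow the NumElts lanes of the first.
    if (E.Source == 1)
      Index += NumElts;
    Mask.push_back(Index);
  }
  return Mask;
}
```

For the fptosi example, the operands are lanes 0 and 1 of each v2i32 source. With both multipliers set to 2 and no VEXT offset, the mask becomes 0, 2, 4, 6, which selects the even lanes of the two v4i16 views. With the default multipliers of 1 it stays 0, 1, 4, 5.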

Purely for discussion purposes, I've attached a quick hatchet-job I've
done on the patch along the lines I'm suggesting (clearly unpolished).
It gets the test you wrote correct, without so many special cases. I
assume you had something similar before you settled on using
AssertZext and AssertSext, do you have an example of why it's not the
right solution?

Cheers.

Tim.

kevin-patch.diff (5 KB)

Hi Tim,

After your modification, the code is cleaner and should execute faster. Thanks for this refactoring! I've run more tests on this patch and they show no regressions. As I have no further suggestions for code changes, can I commit your version directly?

Hi Kevin,

Hi Tim,

I've committed it after adding some comments. It's r211144.

Cheers.
Kevin

2014-06-17 16:58 GMT+08:00 Tim Northover <t.p.northover@gmail.com>:

Hi Kevin,

http://reviews.llvm.org/D4080

Revision Contents

Path	Size
lib/Target/AArch64/AArch64ISelLowering.cpp	60 lines
test/CodeGen/AArch64/arm64-convert-v4f64.ll	32 lines

Diff 10477

lib/Target/AArch64/AArch64ISelLowering.cpp

Context not available.
 // shuffle in combination with VEXTs.
 SDValue AArch64TargetLowering::ReconstructShuffle(SDValue Op,
                                                   SelectionDAG &DAG) const {
+  assert(Op.getOpcode() == ISD::BUILD_VECTOR && "Unknown opcode!");
   SDLoc dl(Op);
   EVT VT = Op.getValueType();
   unsigned NumElts = VT.getVectorNumElements();
Context not available.
   // This loop extracts the usage patterns of the source vectors
   // and prepares appropriate SDValues for a shuffle if possible.
   for (unsigned i = 0; i < SourceVecs.size(); ++i) {
+    if (SourceVecs[i].getValueType().getVectorElementType() !=
+        VT.getVectorElementType()) {
+      if (SourceVecs[i].getOpcode() == ISD::AssertSext ||
+          SourceVecs[i].getOpcode() == ISD::AssertZext) {
+        // For AssertSext/AssertZext, we need to bitcast it to the vector which
+        // holds asserted element type, and modify the extracted lane number
+        // pointing to correct lane. For example, if a v2i32 AssertSext node
+        // asserts it holds 2 of i16 elements, firstly it will bitcast to v4i16.
+        // then all lane number in EXTRACT_VECTOR_ELT extracting on it will be
+        // doubled. Finally rebuild a new BUILD_VECTOR operating on those newly
+        // created EXTRACT_VECTOR_ELTs to replace old Op.
+        EVT AssertTy = cast<VTSDNode>(SourceVecs[i].getOperand(1))->getVT();
+        EVT AssertVT = EVT::getVectorVT(*DAG.getContext(), AssertTy,
+                                        SourceVecs[i].getValueSizeInBits() /
+                                            AssertTy.getSizeInBits());
+        EVT LegalTy = Op.getOperand(0).getValueType();
+        // Create BITCAST on AssertSext/AssertZext to get a vector which element
+        // type is AssertTy.
+        SDValue BitCst = DAG.getNode(ISD::BITCAST, dl, AssertVT, SourceVecs[i]);
+        unsigned OffsetMultipliers =
+            AssertVT.getVectorNumElements() /
+            SourceVecs[i].getValueType().getVectorNumElements();
+        // Collect operands to create new BUILD_VECTOR node, lanes in extracting
+        // SourceVecs[i] should multiply OffsetMultipliers.
+        SmallVector<SDValue, 16> BuildSrc;
+        for (unsigned j = 0; j < NumElts; ++j) {
+          if (Op.getOperand(j).getOperand(0) != SourceVecs[i]) {
+            BuildSrc.push_back(Op.getOperand(j));
+            continue;
+          }
+          unsigned OriginLane =
+              cast<ConstantSDNode>(Op.getOperand(j).getOperand(1))
+                  ->getSExtValue();
+          SDValue ExtElt = DAG.getNode(
+              ISD::EXTRACT_VECTOR_ELT, dl, LegalTy, BitCst,
+              DAG.getIntPtrConstant(OriginLane * OffsetMultipliers));
+          BuildSrc.push_back(ExtElt);
+        }
+        // Create new BUILD_VECTOR to replace old one.
+        Op = DAG.getNode(ISD::BUILD_VECTOR, dl, VT,
+                         makeArrayRef(BuildSrc.data(), NumElts));
+        SourceVecs[i] = BitCst;
+        MaxElts[i] *= OffsetMultipliers;
+        MinElts[i] *= OffsetMultipliers;
+      } else {
+        // Don't attempt to extract subvectors from BUILD_VECTOR sources
+        // that expand or trunc the original value.
+        return SDValue();
+      }
+    }
     if (SourceVecs[i].getValueType() == VT) {
       // No VEXT necessary
       ShuffleSrcs[i] = SourceVecs[i];
Context not available.
       continue;
     }

-    // Don't attempt to extract subvectors from BUILD_VECTOR sources
-    // that expand or trunc the original value.
-    // TODO: We can try to bitcast and ANY_EXTEND the result but
-    // we need to consider the cost of vector ANY_EXTEND, and the
-    // legality of all the types.
-    if (SourceVecs[i].getValueType().getVectorElementType() !=
-        VT.getVectorElementType())
-      return SDValue();
-
     // Since only 64-bit and 128-bit vectors are legal on ARM and
     // we've eliminated the other cases...
     assert(SourceVecs[i].getValueType().getVectorNumElements() == 2 * NumElts &&
Context not available.

test/CodeGen/AArch64/arm64-convert-v4f64.ll

This file was added.

; RUN: llc < %s -march=arm64 | FileCheck %s


define <4 x i16> @fptosi_v4f64_to_v4i16(<4 x double>* %ptr) {
; CHECK: fptosi_v4f64_to_v4i16
; CHECK-DAG: fcvtzs v[[LHS:[0-9]+]].2d, v1.2d
; CHECK-DAG: fcvtzs v[[RHS:[0-9]+]].2d, v0.2d
; CHECK-DAG: xtn v[[LHS_NA:[0-9]+]].2s, v[[LHS]].2d
; CHECK-DAG: xtn v[[RHS_NA:[0-9]+]].2s, v[[RHS]].2d
; CHECK: uzp1 v0.4h, v[[RHS_NA]].4h, v[[LHS_NA]].4h
  %tmp1 = load <4 x double>* %ptr
  %tmp2 = fptosi <4 x double> %tmp1 to <4 x i16>
  ret <4 x i16> %tmp2
}

define <8 x i8> @fptosi_v4f64_to_v4i8(<8 x double>* %ptr) {
; CHECK: fptosi_v4f64_to_v4i8
; CHECK-DAG: fcvtzs v[[CONV3:[0-9]+]].2d, v3.2d
; CHECK-DAG: fcvtzs v[[CONV2:[0-9]+]].2d, v2.2d
; CHECK-DAG: fcvtzs v[[CONV1:[0-9]+]].2d, v1.2d
; CHECK-DAG: fcvtzs v[[CONV0:[0-9]+]].2d, v0.2d
; CHECK-DAG: xtn v[[NA3:[0-9]+]].2s, v[[CONV3]].2d
; CHECK-DAG: xtn v[[NA2:[0-9]+]].2s, v[[CONV2]].2d
; CHECK-DAG: xtn v[[NA1:[0-9]+]].2s, v[[CONV1]].2d
; CHECK-DAG: xtn v[[NA0:[0-9]+]].2s, v[[CONV0]].2d
; CHECK-DAG: uzp1 v[[TMP1:[0-9]+]].4h, v[[CONV2]].4h, v[[CONV3]].4h
; CHECK-DAG: uzp1 v[[TMP2:[0-9]+]].4h, v[[CONV0]].4h, v[[CONV1]].4h
; CHECK: uzp1 v0.8b, v[[TMP2]].8b, v[[TMP1]].8b
  %tmp1 = load <8 x double>* %ptr
  %tmp2 = fptosi <8 x double> %tmp1 to <8 x i8>
  ret <8 x i8> %tmp2
}
