This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Accept more constant FP BuildVectors.
AbandonedPublic

Authored by jonpa on Feb 12 2019, 12:03 PM.


Details

Reviewers

Summary

I could not simply revert the recent removal of some FP BuildVector tests. However, by using a new method, isBuildVectorAllLegalFPImms(), to check whether all the operands of the BVN are legal FP immediates, more cases than just all-ones or all-zeros can be left as-is during legalization.

This does make most of the recently removed test functions pass again, except for f3 and f4 in vec-const-06.ll, which involved constants only producible with VGBM. We don't do this for FP constants as they are thought to not be useful in practice.

f3: removed: VGBM FP constants: <2 x double> <double 0xFF000000FFFF0000, double 0xFFFFFF00FFFF00>
f4: replaced VGBM FP constant with all-ones: <double 0xff000000ffff0000, double undef>

This patch changes one file on SPEC (/447.dealII/build/sparse_matrix_ez.float.s), with 8 x vgmg -> vgbm, which seems to be the inverse change from the recent VGBM commit.

Diff Detail

Event Timeline

jonpa created this revision.Feb 12 2019, 12:03 PM

Hmm. Actually, I'm now wondering why we need to reject anything in the first place. Can't we improve isFPImmLegal to accept *anything* that can be constructed via any of the vector instructions (VGBM, VGM, VREPI)?

Basically, if we have a float X, it can be loaded as an FP immediate if BUILD_VECTOR (X, undef, ...) can be loaded, and should thus be considered legal. And if we do that, then any float X that occurs as a component of a BUILD_VECTOR that can be loaded can itself also be loaded (that's obvious if the BUILD_VECTOR can be loaded via replication, and also true if it can be loaded via VGBM, since you just need to shift the mask). It is therefore legal and wouldn't cause the BUILD_VECTOR to be forced to the constant pool.

So in the end, we just need two routines:

can a FP immediate or BUILD_VECTOR be loaded?
actually load a FP immediate or BUILD_VECTOR

Both routines should work on BUILD_VECTORs and FP immediates (the FP immediate case could be handled by constructing BUILD_VECTOR (X, undef ...), but it's probably simpler to just handle the cases directly). The first of those routines would then be called both from lowerBUILD_VECTOR and isFPImmLegal, and the second routine would be called in ::Select for the BUILD_VECTOR and ConstantFP cases.

Does this make sense? It may not make much difference on benchmarks, but just as a general principle if we have an instruction that can do something, we should be using it, if it's possible without a lot of overhead ...
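To make the VGBM part of this argument concrete, here is a minimal standalone sketch (C++17) of the "can it be loaded?" check for the byte-mask case only. It is not code from this patch or from D58270: the helpers vgbmMask and scalarMaskForV2F64 are hypothetical names, the vector is passed as two raw 64-bit doublewords instead of a BuildVectorSDNode, undef elements are assumed to be passed as zero, and the scalar value is assumed to live in element 0 of the vector register.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not part of the patch): compute the 16-bit VGBM
// immediate for a 128-bit constant given as two doublewords, Hi being
// element 0. Returns std::nullopt if any byte is neither 0x00 nor 0xff.
std::optional<uint16_t> vgbmMask(uint64_t Hi, uint64_t Lo) {
  const uint64_t Halves[2] = {Hi, Lo};
  unsigned Mask = 0;
  for (uint64_t Half : Halves) {
    for (int Shift = 56; Shift >= 0; Shift -= 8) {
      unsigned Byte = (Half >> Shift) & 0xff;
      if (Byte != 0x00 && Byte != 0xff)
        return std::nullopt;
      // Byte 0 of the vector corresponds to the most significant mask bit.
      Mask = (Mask << 1) | (Byte == 0xff ? 1u : 0u);
    }
  }
  return static_cast<uint16_t>(Mask);
}

// Hypothetical helper: the "shift the mask" step. Given the mask of a
// VGBM-loadable v2f64, return the mask that places doubleword element
// Elt (0 or 1) alone into element 0, with the rest zero.
uint16_t scalarMaskForV2F64(uint16_t Mask, unsigned Elt) {
  return static_cast<uint16_t>((Mask << (8 * Elt)) & 0xff00);
}
```

With these assumptions, vgbmMask(0xffffffffffffffff, 0) yields 65280, which matches the mask expected for <double 0xffffffffffffffff, double undef> in vec-const-06.ll, and any constant with a byte such as 0xfe is rejected.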

Tried this idea, see https://reviews.llvm.org/D58270

Replaced by https://reviews.llvm.org/D58270.

Revision Contents

Path

Size

lib/

Target/

SystemZ/

SystemZISelLowering.h

1 line

SystemZISelLowering.cpp

21 lines

test/

CodeGen/

SystemZ/

vec-const-05.ll

57 lines

vec-const-06.ll

30 lines

Diff 186526

lib/Target/SystemZ/SystemZISelLowering.h

public:		public:

bool supportSwiftError() const override {		bool supportSwiftError() const override {
return true;		return true;
}		}

static bool tryBuildVectorByteMask(BuildVectorSDNode *BVN, uint64_t &Mask);		static bool tryBuildVectorByteMask(BuildVectorSDNode *BVN, uint64_t &Mask);
static bool analyzeFPImm(const APFloat &Imm, unsigned BitWidth,		static bool analyzeFPImm(const APFloat &Imm, unsigned BitWidth,
unsigned &Start, unsigned &End, const SystemZInstrInfo *TII);		unsigned &Start, unsigned &End, const SystemZInstrInfo *TII);
		bool isBuildVectorAllLegalFPImms(BuildVectorSDNode *BVN) const;
private:		private:
const SystemZSubtarget &Subtarget;		const SystemZSubtarget &Subtarget;

// Implement LowerOperation for individual opcodes.		// Implement LowerOperation for individual opcodes.
SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,		SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,
const SDLoc &DL, EVT VT,		const SDLoc &DL, EVT VT,
SDValue CmpOp0, SDValue CmpOp1) const;		SDValue CmpOp0, SDValue CmpOp1) const;
SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,		SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,

lib/Target/SystemZ/SystemZISelLowering.cpp


static SDValue buildVector(SelectionDAG &DAG, const SDLoc &DL, EVT VT,		static SDValue buildVector(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
// Use VLVGx to insert the other elements.		// Use VLVGx to insert the other elements.
for (unsigned I = 0; I < NumElements; ++I)		for (unsigned I = 0; I < NumElements; ++I)
if (!Done[I] && !Elems[I].isUndef() && Elems[I] != ReplicatedVal)		if (!Done[I] && !Elems[I].isUndef() && Elems[I] != ReplicatedVal)
Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VT, Result, Elems[I],		Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VT, Result, Elems[I],
DAG.getConstant(I, DL, MVT::i32));		DAG.getConstant(I, DL, MVT::i32));
return Result;		return Result;
}		}

		bool SystemZTargetLowering::
		isBuildVectorAllLegalFPImms(BuildVectorSDNode *BVN) const {
		EVT VT = BVN->getValueType(0);
		unsigned NumElements = VT.getVectorNumElements();
		EVT EltVT = VT.getScalarType();
		if (!EltVT.isFloatingPoint())
		return false;
		for (unsigned I = 0; I < NumElements; ++I) {
		SDValue Op = BVN->getOperand(I);
		if (Op.isUndef())
		continue;
		ConstantFPSDNode *FPImm = dyn_cast<ConstantFPSDNode>(Op);
		if (FPImm == nullptr || !isFPImmLegal(FPImm->getValueAPF(), EltVT))
		return false;
		}
		return true;
		}

SDValue SystemZTargetLowering::lowerBUILD_VECTOR(SDValue Op,		SDValue SystemZTargetLowering::lowerBUILD_VECTOR(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
const SystemZInstrInfo *TII =		const SystemZInstrInfo *TII =
static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());		static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());
auto *BVN = cast<BuildVectorSDNode>(Op.getNode());		auto *BVN = cast<BuildVectorSDNode>(Op.getNode());
SDLoc DL(Op);		SDLoc DL(Op);
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();

if (BVN->isConstant()) {		if (BVN->isConstant()) {
// Try using VECTOR GENERATE BYTE MASK. This is the architecturally-		// Try using VECTOR GENERATE BYTE MASK. This is the architecturally-
// preferred way of creating all-zero and all-one vectors so give it		// preferred way of creating all-zero and all-one vectors so give it
// priority over other methods below.		// priority over other methods below.
uint64_t Mask;		uint64_t Mask;
if (ISD::isBuildVectorAllZeros(Op.getNode()) ||		if (ISD::isBuildVectorAllZeros(Op.getNode()) ||
ISD::isBuildVectorAllOnes(Op.getNode()) ||		ISD::isBuildVectorAllOnes(Op.getNode()) ||
(VT.isInteger() && tryBuildVectorByteMask(BVN, Mask)))		(tryBuildVectorByteMask(BVN, Mask) &&
		(VT.isInteger() || isBuildVectorAllLegalFPImms(BVN))))
return Op;		return Op;

// Try using some form of replication.		// Try using some form of replication.
APInt SplatBits, SplatUndef;		APInt SplatBits, SplatUndef;
unsigned SplatBitSize;		unsigned SplatBitSize;
bool HasAnyUndefs;		bool HasAnyUndefs;
if (BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs,		if (BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs,
8, true) &&		8, true) &&

test/CodeGen/SystemZ/vec-const-05.ll

	; Test vector byte masks, v4f32 version. Only all-zero vectors are handled.			; Test vector byte masks, v4f32 version.
	;			;
	; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s			; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s

	; Test an all-zeros vector.			; Test an all-zeros vector.
	define <4 x float> @f0() {			define <4 x float> @f1() {
	; CHECK-LABEL: f0:			; CHECK-LABEL: f1:
	; CHECK: vgbm %v24, 0			; CHECK: vgbm %v24, 0
	; CHECK: br %r14			; CHECK: br %r14
	ret <4 x float> zeroinitializer			ret <4 x float> zeroinitializer
	}			}

	; Test that undefs are treated as zero.			; Test an all-ones vector.
	define <4 x float> @f1() {			define <4 x float> @f2() {
	; CHECK-LABEL: f1:			; CHECK-LABEL: f2:
	; CHECK: vgbm %v24, 0			; CHECK: vgbm %v24, 65535
	; CHECK: br %r14			; CHECK: br %r14
	ret <4 x float> <float zeroinitializer, float undef,			ret <4 x float> <float 0xffffffffe0000000, float 0xffffffffe0000000,
	float zeroinitializer, float undef>			float 0xffffffffe0000000, float 0xffffffffe0000000>
				}

				; Test a mixed vector (mask 0xc731).
				define <4 x float> @f3() {
				; CHECK-LABEL: f3:
				; CHECK: vgbm %v24, 50993
				; CHECK: br %r14
				ret <4 x float> <float 0xffffe00000000000, float 0x381fffffe0000000,
				float 0x379fffe000000000, float 0x371fe00000000000>
				}

				; Test that undefs are treated as zero (mask 0xc031).
				define <4 x float> @f4() {
				; CHECK-LABEL: f4:
				; CHECK: vgbm %v24, 49201
				; CHECK: br %r14
				ret <4 x float> <float 0xffffe00000000000, float undef,
				float 0x379fffe000000000, float 0x371fe00000000000>
				}

				; Test that we don't use VGBM if one of the bytes is not 0 or 0xff.
				define <4 x float> @f5() {
				; CHECK-LABEL: f5:
				; CHECK-NOT: vgbm
				; CHECK: br %r14
				ret <4 x float> <float 0xffffe00000000000, float 0x381fffffc0000000,
				float 0x379fffe000000000, float 0x371fe00000000000>
	}			}

	; Test an all-zeros v2f32 that gets promoted to v4f32.			; Test an all-zeros v2f32 that gets promoted to v4f32.
	define <2 x float> @f2() {			define <2 x float> @f6() {
	; CHECK-LABEL: f2:			; CHECK-LABEL: f6:
	; CHECK: vgbm %v24, 0			; CHECK: vgbm %v24, 0
	; CHECK: br %r14			; CHECK: br %r14
	ret <2 x float> zeroinitializer			ret <2 x float> zeroinitializer
	}			}

				; Test a mixed v2f32 that gets promoted to v4f32 (mask 0xc700).
				define <2 x float> @f7() {
				; CHECK-LABEL: f7:
				; CHECK: vgbm %v24, 50944
				; CHECK: br %r14
				ret <2 x float> <float 0xffffe00000000000, float 0x381fffffe0000000>
				}

test/CodeGen/SystemZ/vec-const-06.ll

	; Test vector byte masks, v2f64 version. Only all-zero vectors are handled.			; Test vector byte masks, v2f64 version.
	;			;
	; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s			; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s

	; Test an all-zeros vector.			; Test an all-zeros vector.
	define <2 x double> @f0() {			define <2 x double> @f1() {
	; CHECK-LABEL: f0:			; CHECK-LABEL: f1:
	; CHECK: vgbm %v24, 0			; CHECK: vgbm %v24, 0
	; CHECK: br %r14			; CHECK: br %r14
	ret <2 x double> zeroinitializer			ret <2 x double> zeroinitializer
	}			}

				; Test an all-ones vector.
				define <2 x double> @f2() {
				; CHECK-LABEL: f2:
				; CHECK: vgbm %v24, 65535
				; CHECK: br %r14
				ret <2 x double> <double 0xffffffffffffffff, double 0xffffffffffffffff>
				}

	; Test that undefs are treated as zero.			; Test that undefs are treated as zero.
	define <2 x double> @f1() {			define <2 x double> @f3() {
	; CHECK-LABEL: f1:			; CHECK-LABEL: f3:
	; CHECK: vgbm %v24, 0			; CHECK: vgbm %v24, 65280
				; CHECK: br %r14
				ret <2 x double> <double 0xffffffffffffffff, double undef>
				}

				; Test that we don't use VGBM if one of the bytes is not 0 or 0xff.
				define <2 x double> @f4() {
				; CHECK-LABEL: f4:
				; CHECK-NOT: vgbm
	; CHECK: br %r14			; CHECK: br %r14
	ret <2 x double> <double zeroinitializer, double undef>			ret <2 x double> <double 0xfe000000ffff0000, double 0x00ffffff00ffff00>
	}			}