This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Wait with selection of VREPI and VGM until after DAGCombine2.
Abandoned, Public

Authored by jonpa on Feb 7 2019, 1:40 PM.

Details

Reviewers

Summary

The lowerBUILD_VECTOR() handling of constant splats is refactored into two methods. analyzeBVNForConstantReplication() is used both during lowering and in Select() to analyze the BVNs; SystemZDAGToDAGISel::tryReplicateConstantSplat() then performs the actual instruction selection.

This is a continuation of the handling of constant BVNs during legalization, with the idea of exposing more constant vectors to DAGCombine2. The same problem with FP constants that was seen with VGBM persists, and this time it seems to have more impact on SPEC.

I see

spec-llvm_A_master/ spec-llvm
vrepg          :                 3279                 3621     +342
vgmg           :                  344                   25     -319
larl           :               153427               153664     +237
ldeb           :                 8511                 8728     +217
vl             :                22873                23025     +152
vst            :                24059                24131      +72
vlrepf         :                  688                  709      +21
vmrhg          :                 1154                 1170      +16
vmrhf          :                  728                  740      +12
vgbm           :                 3887                 3898      +11
...
Spill|Reload   :               189582               189793     +211

There are many more larls, which should be due to many more ConstantFP vectors loaded from the constant pool. It seems these are the FP splats present in SPEC:

206 BUILD_VECTOR ConstantFP:f64<1.000000e+00>, ConstantFP:f64<1.000000e+00>
 96 BUILD_VECTOR ConstantFP:f64<5.000000e-01>, ConstantFP:f64<5.000000e-01>
 17 BUILD_VECTOR ConstantFP:f64<2.000000e+00>, ConstantFP:f64<2.000000e+00>
 12 BUILD_VECTOR ConstantFP:f64<-2.000000e+00>, ConstantFP:f64<-2.000000e+00>
  8 BUILD_VECTOR undef:f64, ConstantFP:f64<2.000000e+00>
  8 BUILD_VECTOR ConstantFP:f32<nan>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<nan>, ConstantFP:f32<0.000000e+00>
  4 BUILD_VECTOR ConstantFP:f64<1.250000e-01>, ConstantFP:f64<1.250000e-01>
  4 BUILD_VECTOR ConstantFP:f32<1.000000e+00>, ConstantFP:f32<1.000000e+00>, ConstantFP:f32<1.000000e+00>, ConstantFP:f32<1.000000e+00>
  4 BUILD_VECTOR ConstantFP:f32<1.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<1.000000e+00>, ConstantFP:f32<0.000000e+00>
  3 BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<1.000000e+00>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<1.000000e+00>
  2 BUILD_VECTOR ConstantFP:f32<nan>, ConstantFP:f32<nan>, ConstantFP:f32<nan>, ConstantFP:f32<nan>
  1 BUILD_VECTOR ConstantFP:f64<INF>, ConstantFP:f64<INF>
  1 BUILD_VECTOR ConstantFP:f64<7.812500e-03>, ConstantFP:f64<7.812500e-03>
  1 BUILD_VECTOR ConstantFP:f32<5.000000e-01>, ConstantFP:f32<5.000000e-01>, ConstantFP:f32<5.000000e-01>, ConstantFP:f32<5.000000e-01>
  1 BUILD_VECTOR ConstantFP:f32<0.000000e+00>, ConstantFP:f32<5.000000e-01>, ConstantFP:f32<0.000000e+00>, ConstantFP:f32<5.000000e-01>

Not sure what to do next - does this mean we should consider again improving the handling of ConstantFP nodes in the backend, or should we abandon this?

Tests with FP splats that are no longer supported have been deleted.

Note: tryReplicateConstantSplat() first calls SelectCode() on the bitcast and then on the REPLICATE / ROTATE_MASK. Not entirely sure if that's wise, or if getMachineNode() should be called instead.

Diff Detail

Event Timeline

jonpa created this revision. Feb 7 2019, 1:40 PM

Well, for replication we definitely need proper float support. For VGBM, we could ignore floats since (except for the all-zero and all-one pattern) there aren't really any common FP constants that can be created via a VGBM pattern. But that isn't true at all for replication ...

Why doesn't this work properly for ConstantFP nodes, again?
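To illustrate the point about VGBM and FP constants, here is a minimal standalone sketch. It is not code from the patch: `isByteMask` and `bitsOf` are hypothetical helpers, and the check only models the basic VGBM property that every byte of the pattern is 0x00 or 0xFF.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit IEEE-754 bit pattern.
uint64_t bitsOf(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// Hypothetical helper (not from the patch): true if every byte of Bits is
// either 0x00 or 0xFF, i.e. the pattern could be produced from a byte mask.
bool isByteMask(uint64_t Bits) {
  for (int I = 0; I < 8; ++I) {
    uint8_t Byte = (Bits >> (8 * I)) & 0xFF;
    if (Byte != 0x00 && Byte != 0xFF)
      return false;
  }
  return true;
}
```

Under these assumptions, +0.0 and the all-ones pattern pass the check, while common constants such as 1.0 (0x3FF0000000000000) or 0.5 do not, because their exponent bytes are neither 0x00 nor 0xFF.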

In D57926#1390395, @uweigand wrote:

Well, for replication we definitely need proper float support. For VGBM, we could ignore floats since (except for the all-zero and all-one pattern) there aren't really any common FP constants that can be created via a VGBM pattern. But that isn't true at all for replication ...

Why doesn't this work properly for ConstantFP nodes, again?

Any ConstantFP nodes that are not giving 'true' from isFPImmLegal() seem to end up in the constant pool.

The options as we tried before seem to be:

1. Return true from isFPImmLegal() for the values that fit VREPI / VGM, and *either*
   - (a) Add handling so that also the scalar ConstantFP nodes can be selected
   - (b) Extend isFPImmLegal() to have an argument so that we only return true for the constant splat BVN case.

2. Rebuild the BVN to have TargetConstantFP operands so that they are not touched by common-code.

3. Rebuild the BVN to have integer constant operands and bitcast to the FP vector type.

I wonder if "1a" would also help scalar FP code a bit? In that case that might be an interesting option.

I don't think that any alternative is really simple and neat to implement, except maybe "1b".

I think 1a would be the best option, indeed.

OK, added improvement of isFPImmLegal(). It seems that VGM covers all cases for FP immediates on benchmarks, so VREPI is not considered in analyzeFPImm().
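As a rough illustration of why VGM covers the FP constants listed in the summary, the sketch below checks whether a 64-bit pattern is one contiguous run of 1 bits and reports the run using the bit numbering described in the patch comments (bit 0 is the most significant bit). This is an assumption-laden stand-in, not the patch's analyzeFPImm() or isRxSBGMask(): `vgmRange` and `bitsOf` are hypothetical names, and wraparound masks are deliberately not handled.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Reinterpret a double as its 64-bit IEEE-754 bit pattern.
uint64_t bitsOf(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// Hypothetical helper, not part of the patch: if Value is one contiguous,
// non-wrapping run of 1 bits, return {Start, End} numbered from the most
// significant bit (bit 0 is 1 << 63). Otherwise return {-1, -1}.
std::pair<int, int> vgmRange(uint64_t Value) {
  if (Value == 0)
    return {-1, -1};
  int Low = 0; // lowest set bit, counted from the least significant bit
  while (((Value >> Low) & 1) == 0)
    ++Low;
  int High = 63; // highest set bit, counted the same way
  while (((Value >> High) & 1) == 0)
    --High;
  int Len = High - Low + 1;
  // Build a run of ones spanning Low..High and compare it with Value.
  uint64_t Run = (Len == 64) ? ~uint64_t(0) : ((uint64_t(1) << Len) - 1) << Low;
  if (Run != Value)
    return {-1, -1};
  return {63 - High, 63 - Low};
}
```

With this numbering, 1.0 (0x3FF0000000000000) yields the range 2..11 and -2.0 (0xC000000000000000) yields 0..1, whereas 3.0 (0x4008000000000000) has two separate set bits and is rejected.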

This is the change in instruction counts after just extending isFPImmLegal() to call analyzeFPImm() and then having emitFPScalarImm() handle the new FP??ScalarImmPseudo pseudos:

vgmg           :                  344                 4825    +4481
larl           :               153427               150578    -2849
ldeb           :                 8511                 6092    -2419
ldr            :                25172                24061    -1111
ld             :                34810                34163     -647
vst            :                24059                24602     +543
vl             :                22873                23401     +528
std            :                23550                23038     -512
lde            :                 9404                 8970     -434
j              :               120185               120336     +151
lg             :               374157               374024     -133
ste            :                 9160                 9073      -87
cebr           :                 1450                 1535      +85
ceb            :                  526                  444      -82
aebr           :                 5217                 5290      +73
aeb            :                 2577                 2506      -71
wfsdb          :                 7471                 7408      -63
sdbr           :                 3062                 3123      +61
meeb           :                 3580                 3522      -58
...
Spill|Reload   :               189582               189140     -442

The further difference from then also keeping the BUILD_VECTOR opcode through DAGCombine2 (as before) is now merely:

vgbm           :                 3887                 3895       +8
vgmg           :                 4825                 4817       -8

This involves the tryReplicateConstantSplat() and analyzeBVNForConstantReplication() methods.

Would it help to split this patch into two parts per above?
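For reference, the undef-bit handling that analyzeBVNForConstantReplication() carries over from lowerBUILD_VECTOR() can be sketched on plain integers. This is a standalone approximation rather than the patch code: `splatCandidates`, `lowestSetBit` and `highestSetBit` are hypothetical names standing in for the Lower/Upper/Middle computation and for findFirstSet()/findLastSet(), and the sketch assumes a nonzero splat value whose defined bits do not overlap the undef bits.

```cpp
#include <cstdint>
#include <utility>

// Index of the lowest set bit of a nonzero value (bit 0 is the LSB).
unsigned lowestSetBit(uint64_t V) {
  unsigned I = 0;
  while (((V >> I) & 1) == 0)
    ++I;
  return I;
}

// Index of the highest set bit of a nonzero value (bit 0 is the LSB).
unsigned highestSetBit(uint64_t V) {
  unsigned I = 63;
  while (((V >> I) & 1) == 0)
    --I;
  return I;
}

// Returns the two candidate values in the order they are tried:
// first with undef bits outside the defined set bits treated as 1s,
// then with undef bits between the defined set bits treated as 1s.
std::pair<uint64_t, uint64_t> splatCandidates(uint64_t Bits, uint64_t Undef) {
  uint64_t Lower = Undef & ((uint64_t(1) << lowestSetBit(Bits)) - 1);
  uint64_t Upper = Undef & ~((uint64_t(1) << highestSetBit(Bits)) - 1);
  uint64_t Middle = Undef & ~Upper & ~Lower;
  return {Bits | Upper | Lower, Bits | Middle};
}
```

For example, with defined bits 0x0801 and undef bits 0xF7FE, the first candidate is 0xF801 (a wraparound-style pattern) and the second is 0x0FFF (a plain contiguous mask), which matches the intent stated in the two code comments.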

I made a new post for just the handling of scalar FP immediates with VGM: https://reviews.llvm.org/D58003

It has updated tests as well as a new one, and it seems we could do that before handling the constant BUILD_VECTORS here.

Patch rebased.

As before, this gives very little change on benchmarks (only the same previously seen 8 x vgmg -> vgbm).

Removed FP tests that require replication (vec-const-11.ll and vec-const-12.ll), as well as tests in vec-const-18.ll that demand a VGMF for a <2 x double> vector. As before, this is lost functionality that does not affect benchmarks.
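When reading the removed tests, it helps to remember that LLVM IR writes a float constant as the hex encoding of the equivalent double, so the replicated byte pattern is not visible directly. The following sketch (the helper name `floatBitsFromIRHex` is made up for illustration) converts such an encoding to the 32-bit float pattern.

```cpp
#include <cstdint>
#include <cstring>

// Convert the 64-bit hex encoding used for float constants in LLVM IR
// (the bit pattern of the equivalent double) to the 32-bit float pattern.
uint32_t floatBitsFromIRHex(uint64_t DoubleBits) {
  double D;
  std::memcpy(&D, &DoubleBits, sizeof(D));
  // Valid IR float literals are exactly representable as float, so this
  // narrowing conversion does not round.
  float F = static_cast<float>(D);
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}
```

For f1 in vec-const-11.ll, 0x3820202020000000 corresponds to the float pattern 0x01010101, which is why the test expected `vrepib %v24, 1`; for f5, 0x44864c8640000000 corresponds to 0x64326432, i.e. the halfword 25650 replicated.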

Again, I am somewhat hesitant to use the two SelectCode() calls in tryReplicateConstantSplat(), but it seems to work although I cannot find another example of this usage. The alternative is to select a machine opcode for Op instead.

Replaced by https://reviews.llvm.org/D58270.

Revision Contents

Path                                            Size
lib/Target/SystemZ/SystemZISelDAGToDAG.cpp      42 lines
lib/Target/SystemZ/SystemZISelLowering.h        5 lines
lib/Target/SystemZ/SystemZISelLowering.cpp      136 lines
test/CodeGen/SystemZ/vec-const-11.ll
test/CodeGen/SystemZ/vec-const-12.ll
test/CodeGen/SystemZ/vec-const-18.ll            61 lines

Diff 186546

lib/Target/SystemZ/SystemZISelDAGToDAG.cpp

@@ class SystemZDAGToDAGISel : public SelectionDAGISel {
   // (Opcode UpperVal LowerVal)
   //
   // If Op0 is nonnull, then Node can be implemented using:
   //
   // (Opcode (Opcode Op0 UpperVal) LowerVal)
   void splitLargeImmediate(unsigned Opcode, SDNode *Node, SDValue Op0,
                            uint64_t UpperVal, uint64_t LowerVal);

+  // Try to load a vector constant with a REPLICATE or ROTATE_MASK.
+  bool tryReplicateConstantSplat(BuildVectorSDNode *BVN);
+
   // Try to use gather instruction Opcode to implement vector insertion N.
   bool tryGather(SDNode *N, unsigned Opcode);

   // Try to use scatter instruction Opcode to implement store Store.
   bool tryScatter(StoreSDNode *Store, unsigned Opcode);

   // Change a chain of {load; op; store} of the same value into a simple op
   // through memory of that value, if the uses of the modified value and its
@@ void SystemZDAGToDAGISel::splitLargeImmediate(unsigned Opcode, SDNode *Node,
   SDValue Lower = CurDAG->getConstant(LowerVal, DL, VT);
   SDValue Or = CurDAG->getNode(Opcode, DL, VT, Upper, Lower);

   ReplaceNode(Node, Or.getNode());

   SelectCode(Or.getNode());
 }

+bool SystemZDAGToDAGISel::tryReplicateConstantSplat(BuildVectorSDNode *BVN) {
+  const SystemZInstrInfo *TII = getInstrInfo();
+  int64_t ReplicatedImm;
+  unsigned RotateStart, RotateEnd;
+  MVT VecVT;
+  if (!SystemZTargetLowering::analyzeBVNForConstantReplication(
+          BVN, ReplicatedImm, RotateStart, RotateEnd, VecVT, TII))
+    return false;
+
+  SDLoc DL(BVN);
+  EVT VT = BVN->getValueType(0);
+  SDValue Op;
+  SDValue BitCast;
+  if (ReplicatedImm != INT64_MAX) {
+    Op = CurDAG->getNode(SystemZISD::REPLICATE, DL, VecVT,
+                         CurDAG->getConstant(ReplicatedImm, DL, MVT::i32, false,
+                                             true /*isOpaque*/));
+    BitCast = CurDAG->getNode(ISD::BITCAST, DL, VT, Op);
+  } else {
+    Op = CurDAG->getNode(
+        SystemZISD::ROTATE_MASK, DL, VecVT,
+        CurDAG->getConstant(RotateStart, DL, MVT::i32, false,
+                            true /*isOpaque*/),
+        CurDAG->getConstant(RotateEnd, DL, MVT::i32, false, true /*isOpaque*/));
+    BitCast = CurDAG->getNode(ISD::BITCAST, DL, VT, Op);
+  }
+
+  ReplaceNode(BVN, BitCast.getNode());
+  SelectCode(BitCast.getNode());
+  if (Op != BitCast) {
+    assert(!Op.use_empty() && "Expected bitcasted SDValue to remain in DAG");
+    SelectCode(Op.getNode());
+  }
+
+  return true;
+}
+
 bool SystemZDAGToDAGISel::tryGather(SDNode *N, unsigned Opcode) {
   SDValue ElemV = N->getOperand(2);
   auto *ElemN = dyn_cast<ConstantSDNode>(ElemV);
   if (!ElemN)
     return false;

   unsigned Elem = ElemN->getZExtValue();
   EVT VT = N->getValueType(0);
@@ case ISD::BUILD_VECTOR: {
     EVT VT = Node->getValueType(0);
     uint64_t Mask = 0;
     if (SystemZTargetLowering::tryBuildVectorByteMask(BVN, Mask)) {
       SDNode *Res = CurDAG->getMachineNode(SystemZ::VGBM, DL, VT,
                         CurDAG->getTargetConstant(Mask, DL, MVT::i32));
       ReplaceNode(Node, Res);
       return;
     }
+    if (tryReplicateConstantSplat(BVN))
+      return;
     break;
   }

   case ISD::ConstantFP: {
     APFloat Imm = cast<ConstantFPSDNode>(Node)->getValueAPF();
     if (Imm.isZero() || Imm.isNegZero())
       break;
     const SystemZInstrInfo *TII = getInstrInfo();

lib/Target/SystemZ/SystemZISelLowering.h

@@ public:

   bool supportSwiftError() const override {
     return true;
   }

   static bool tryBuildVectorByteMask(BuildVectorSDNode *BVN, uint64_t &Mask);
   static bool analyzeFPImm(const APFloat &Imm, unsigned BitWidth,
                            unsigned &Start, unsigned &End, const SystemZInstrInfo *TII);
+  static bool analyzeBVNForConstantReplication(BuildVectorSDNode *BVN,
+                                               int64_t &ReplicatedImm,
+                                               unsigned &RotateStart,
+                                               unsigned &RotateEnd, MVT &VecVT,
+                                               const SystemZInstrInfo *TII);
 private:
   const SystemZSubtarget &Subtarget;

   // Implement LowerOperation for individual opcodes.
   SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,
                        const SDLoc &DL, EVT VT,
                        SDValue CmpOp0, SDValue CmpOp1) const;
   SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,

lib/Target/SystemZ/SystemZISelLowering.cpp

@@ if (!Op.isUndef()) {
         else if (Byte != 0)
           return false;
       }
     }
   }
   return true;
 }

-// Try to load a vector constant in which BitsPerElement-bit value Value
-// is replicated to fill the vector. VT is the type of the resulting
-// constant, which may have elements of a different size from BitsPerElement.
-// Return the SDValue of the constant on success, otherwise return
-// an empty value.
-static SDValue tryBuildVectorReplicate(SelectionDAG &DAG,
-                                       const SystemZInstrInfo *TII,
-                                       const SDLoc &DL, EVT VT, uint64_t Value,
-                                       unsigned BitsPerElement) {
-  // Signed 16-bit values can be replicated using VREPI.
-  // Mark the constants as opaque or DAGCombiner will convert back to
-  // BUILD_VECTOR.
-  int64_t SignedValue = SignExtend64(Value, BitsPerElement);
-  if (isInt<16>(SignedValue)) {
-    MVT VecVT = MVT::getVectorVT(MVT::getIntegerVT(BitsPerElement),
-                                 SystemZ::VectorBits / BitsPerElement);
-    SDValue Op = DAG.getNode(
-        SystemZISD::REPLICATE, DL, VecVT,
-        DAG.getConstant(SignedValue, DL, MVT::i32, false, true /*isOpaque*/));
-    return DAG.getNode(ISD::BITCAST, DL, VT, Op);
-  }
-  // See whether rotating the constant left some N places gives a value that
-  // is one less than a power of 2 (i.e. all zeros followed by all ones).
-  // If so we can use VGM.
-  unsigned Start, End;
-  if (TII->isRxSBGMask(Value, BitsPerElement, Start, End)) {
-    // isRxSBGMask returns the bit numbers for a full 64-bit value,
-    // with 0 denoting 1 << 63 and 63 denoting 1. Convert them to
-    // bit numbers for an BitsPerElement value, so that 0 denotes
-    // 1 << (BitsPerElement-1).
-    Start -= 64 - BitsPerElement;
-    End -= 64 - BitsPerElement;
-    MVT VecVT = MVT::getVectorVT(MVT::getIntegerVT(BitsPerElement),
-                                 SystemZ::VectorBits / BitsPerElement);
-    SDValue Op = DAG.getNode(
-        SystemZISD::ROTATE_MASK, DL, VecVT,
-        DAG.getConstant(Start, DL, MVT::i32, false, true /*isOpaque*/),
-        DAG.getConstant(End, DL, MVT::i32, false, true /*isOpaque*/));
-    return DAG.getNode(ISD::BITCAST, DL, VT, Op);
-  }
-  return SDValue();
-}
-
 // If a BUILD_VECTOR contains some EXTRACT_VECTOR_ELTs, it's usually
 // better to use VECTOR_SHUFFLEs on them, only using BUILD_VECTOR for
 // the non-EXTRACT_VECTOR_ELT elements. See if the given BUILD_VECTOR
 // would benefit from this representation and return it if so.
 static SDValue tryBuildVectorShuffle(SelectionDAG &DAG,
                                      BuildVectorSDNode *BVN) {
   EVT VT = BVN->getValueType(0);
   unsigned NumElements = VT.getVectorNumElements();
@@ static SDValue buildVector(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
   // Use VLVGx to insert the other elements.
   for (unsigned I = 0; I < NumElements; ++I)
     if (!Done[I] && !Elems[I].isUndef() && Elems[I] != ReplicatedVal)
       Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VT, Result, Elems[I],
                            DAG.getConstant(I, DL, MVT::i32));
   return Result;
 }

+// Return true if BVN holds a vector constant splat which can be loaded with
+// a REPLICATE or ROTATE_MASK. ReplicatedImm is then the value to use with
+// REPLICATE, or INT64_MAX in which case RotateStart and RotateEnd hold the
+// values for ROTATE_MASK. VecVT is the type of the resulting constant,
+// which may have elements of a different size from the BVN elements.
+bool SystemZTargetLowering::analyzeBVNForConstantReplication(
+    BuildVectorSDNode *BVN, int64_t &ReplicatedImm, unsigned &RotateStart,
+    unsigned &RotateEnd, MVT &VecVT, const SystemZInstrInfo *TII) {
+  APInt SplatBits, SplatUndef;
+  unsigned SplatBitSize;
+  bool HasAnyUndefs;
+  if (!(BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs,
+                             8, true) &&
+        SplatBitSize <= 64))
+    return false;
+  VecVT = MVT::getVectorVT(MVT::getIntegerVT(SplatBitSize),
+                           SystemZ::VectorBits / SplatBitSize);
+  ReplicatedImm = INT64_MAX;
+  auto tryValue = [&](uint64_t Value) -> bool {
+    int64_t SignedValue = SignExtend64(Value, SplatBitSize);
+    if (isInt<16>(SignedValue)) {
+      ReplicatedImm = SignedValue;
+      return true;
+    }
+    if (TII->isRxSBGMask(Value, SplatBitSize, RotateStart, RotateEnd)) {
+      RotateStart -= 64 - SplatBitSize;
+      RotateEnd -= 64 - SplatBitSize;
+      return true;
+    }
+    return false;
+  };
+
+  // First try assuming that any undefined bits above the highest set bit
+  // and below the lowest set bit are 1s. This increases the likelihood of
+  // being able to use a sign-extended element value in VECTOR REPLICATE
+  // IMMEDIATE or a wraparound mask in VECTOR GENERATE MASK.
+  uint64_t SplatBitsZ = SplatBits.getZExtValue();
+  uint64_t SplatUndefZ = SplatUndef.getZExtValue();
+  uint64_t Lower =
+      (SplatUndefZ & ((uint64_t(1) << findFirstSet(SplatBitsZ)) - 1));
+  uint64_t Upper =
+      (SplatUndefZ & ~((uint64_t(1) << findLastSet(SplatBitsZ)) - 1));
+  if (tryValue(SplatBitsZ | Upper | Lower))
+    return true;
+
+  // Now try assuming that any undefined bits between the first and
+  // last defined set bits are set. This increases the chances of
+  // using a non-wraparound mask.
+  uint64_t Middle = SplatUndefZ & ~Upper & ~Lower;
+  return tryValue(SplatBitsZ | Middle);
+}
+
 SDValue SystemZTargetLowering::lowerBUILD_VECTOR(SDValue Op,
                                                  SelectionDAG &DAG) const {
-  const SystemZInstrInfo *TII =
-    static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());
   auto *BVN = cast<BuildVectorSDNode>(Op.getNode());
   SDLoc DL(Op);
   EVT VT = Op.getValueType();

   if (BVN->isConstant()) {
     // Try using VECTOR GENERATE BYTE MASK. This is the architecturally-
     // preferred way of creating all-zero and all-one vectors so give it
     // priority over other methods below.
     uint64_t Mask;
     if (ISD::isBuildVectorAllZeros(Op.getNode()) ||
         ISD::isBuildVectorAllOnes(Op.getNode()) ||
         (VT.isInteger() && tryBuildVectorByteMask(BVN, Mask)))
       return Op;

     // Try using some form of replication.
-    APInt SplatBits, SplatUndef;
-    unsigned SplatBitSize;
-    bool HasAnyUndefs;
-    if (BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs,
-                             8, true) &&
-        SplatBitSize <= 64) {
-      // First try assuming that any undefined bits above the highest set bit
-      // and below the lowest set bit are 1s. This increases the likelihood of
-      // being able to use a sign-extended element value in VECTOR REPLICATE
-      // IMMEDIATE or a wraparound mask in VECTOR GENERATE MASK.
-      uint64_t SplatBitsZ = SplatBits.getZExtValue();
-      uint64_t SplatUndefZ = SplatUndef.getZExtValue();
-      uint64_t Lower = (SplatUndefZ
-                        & ((uint64_t(1) << findFirstSet(SplatBitsZ)) - 1));
-      uint64_t Upper = (SplatUndefZ
-                        & ~((uint64_t(1) << findLastSet(SplatBitsZ)) - 1));
-      uint64_t Value = SplatBitsZ | Upper | Lower;
-      SDValue Op = tryBuildVectorReplicate(DAG, TII, DL, VT, Value,
-                                           SplatBitSize);
-      if (Op.getNode())
-        return Op;
-
-      // Now try assuming that any undefined bits between the first and
-      // last defined set bits are set. This increases the chances of
-      // using a non-wraparound mask.
-      uint64_t Middle = SplatUndefZ & ~Upper & ~Lower;
-      Value = SplatBitsZ | Middle;
-      Op = tryBuildVectorReplicate(DAG, TII, DL, VT, Value, SplatBitSize);
-      if (Op.getNode())
-        return Op;
-    }
+    const SystemZInstrInfo *TII =
+        static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());
+    int64_t ReplicatedImm;
+    unsigned RotateStart, RotateEnd;
+    MVT VecVT;
+    if (analyzeBVNForConstantReplication(BVN, ReplicatedImm, RotateStart,
+                                         RotateEnd, VecVT, TII))
+      return Op;

     // Fall back to loading it from memory.
     return SDValue();
   }

   // See if we should use shuffles to construct the vector from other vectors.
   if (SDValue Res = tryBuildVectorShuffle(DAG, BVN))
     return Res;

test/CodeGen/SystemZ/vec-const-11.ll

This file was deleted.

	; Test vector replicates, v4f32 version.
	;
	; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s

	; Test a byte-granularity replicate with the lowest useful value.
	define <4 x float> @f1() {
	; CHECK-LABEL: f1:
	; CHECK: vrepib %v24, 1
	; CHECK: br %r14
	ret <4 x float> <float 0x3820202020000000, float 0x3820202020000000,
	float 0x3820202020000000, float 0x3820202020000000>
	}

	; Test a byte-granularity replicate with an arbitrary value.
	define <4 x float> @f2() {
	; CHECK-LABEL: f2:
	; CHECK: vrepib %v24, -55
	; CHECK: br %r14
	ret <4 x float> <float 0xc139393920000000, float 0xc139393920000000,
	float 0xc139393920000000, float 0xc139393920000000>
	}

	; Test a byte-granularity replicate with the highest useful value.
	define <4 x float> @f3() {
	; CHECK-LABEL: f3:
	; CHECK: vrepib %v24, -2
	; CHECK: br %r14
	ret <4 x float> <float 0xc7dfdfdfc0000000, float 0xc7dfdfdfc0000000,
	float 0xc7dfdfdfc0000000, float 0xc7dfdfdfc0000000>
	}

	; Test a halfword-granularity replicate with the lowest useful value.
	define <4 x float> @f4() {
	; CHECK-LABEL: f4:
	; CHECK: vrepih %v24, 1
	; CHECK: br %r14
	ret <4 x float> <float 0x37a0001000000000, float 0x37a0001000000000,
	float 0x37a0001000000000, float 0x37a0001000000000>
	}

	; Test a halfword-granularity replicate with an arbitrary value.
	define <4 x float> @f5() {
	; CHECK-LABEL: f5:
	; CHECK: vrepih %v24, 25650
	; CHECK: br %r14
	ret <4 x float> <float 0x44864c8640000000, float 0x44864c8640000000,
	float 0x44864c8640000000, float 0x44864c8640000000>
	}

	; Test a halfword-granularity replicate with the highest useful value.
	define <4 x float> @f6() {
	; CHECK-LABEL: f6:
	; CHECK: vrepih %v24, -2
	; CHECK: br %r14
	ret <4 x float> <float 0xffffdfffc0000000, float 0xffffdfffc0000000,
	float 0xffffdfffc0000000, float 0xffffdfffc0000000>
	}

	; Test a word-granularity replicate with the lowest useful positive value.
	define <4 x float> @f7() {
	; CHECK-LABEL: f7:
	; CHECK: vrepif %v24, 1
	; CHECK: br %r14
	ret <4 x float> <float 0x36a0000000000000, float 0x36a0000000000000,
	float 0x36a0000000000000, float 0x36a0000000000000>
	}

	; Test a word-granularity replicate with the highest in-range value.
	define <4 x float> @f8() {
	; CHECK-LABEL: f8:
	; CHECK: vrepif %v24, 32767
	; CHECK: br %r14
	ret <4 x float> <float 0x378fffc000000000, float 0x378fffc000000000,
	float 0x378fffc000000000, float 0x378fffc000000000>
	}

	; Test a word-granularity replicate with the next highest value.
	; This cannot use VREPIF.
	define <4 x float> @f9() {
	; CHECK-LABEL: f9:
	; CHECK-NOT: vrepif
	; CHECK: br %r14
	ret <4 x float> <float 0x3790000000000000, float 0x3790000000000000,
	float 0x3790000000000000, float 0x3790000000000000>
	}

	; Test a word-granularity replicate with the lowest in-range value.
	define <4 x float> @f10() {
	; CHECK-LABEL: f10:
	; CHECK: vrepif %v24, -32768
	; CHECK: br %r14
	ret <4 x float> <float 0xfffff00000000000, float 0xfffff00000000000,
	float 0xfffff00000000000, float 0xfffff00000000000>
	}

	; Test a word-granularity replicate with the next lowest value.
	; This cannot use VREPIF.
	define <4 x float> @f11() {
	; CHECK-LABEL: f11:
	; CHECK-NOT: vrepif
	; CHECK: br %r14
	ret <4 x float> <float 0xffffefffe0000000, float 0xffffefffe0000000,
	float 0xffffefffe0000000, float 0xffffefffe0000000>
	}

	; Test a word-granularity replicate with the highest useful negative value.
	define <4 x float> @f12() {
	; CHECK-LABEL: f12:
	; CHECK: vrepif %v24, -2
	; CHECK: br %r14
	ret <4 x float> <float 0xffffffffc0000000, float 0xffffffffc0000000,
	float 0xffffffffc0000000, float 0xffffffffc0000000>
	}

	; Test a doubleword-granularity replicate with the lowest useful positive
	; value.
	define <4 x float> @f13() {
	; CHECK-LABEL: f13:
	; CHECK: vrepig %v24, 1
	; CHECK: br %r14
	ret <4 x float> <float 0.0, float 0x36a0000000000000,
	float 0.0, float 0x36a0000000000000>
	}

	; Test a doubleword-granularity replicate with the highest in-range value.
	define <4 x float> @f14() {
	; CHECK-LABEL: f14:
	; CHECK: vrepig %v24, 32767
	; CHECK: br %r14
	ret <4 x float> <float 0.0, float 0x378fffc000000000,
	float 0.0, float 0x378fffc000000000>
	}

	; Test a doubleword-granularity replicate with the next highest value.
	; This cannot use VREPIG.
	define <4 x float> @f15() {
	; CHECK-LABEL: f15:
	; CHECK-NOT: vrepig
	; CHECK: br %r14
	ret <4 x float> <float 0.0, float 0x3790000000000000,
	float 0.0, float 0x3790000000000000>
	}

	; Test a doubleword-granularity replicate with the lowest in-range value.
	define <4 x float> @f16() {
	; CHECK-LABEL: f16:
	; CHECK: vrepig %v24, -32768
	; CHECK: br %r14
	ret <4 x float> <float 0xffffffffe0000000, float 0xfffff00000000000,
	float 0xffffffffe0000000, float 0xfffff00000000000>
	}

	; Test a doubleword-granularity replicate with the next lowest value.
	; This cannot use VREPIG.
	define <4 x float> @f17() {
	; CHECK-LABEL: f17:
	; CHECK-NOT: vrepig
	; CHECK: br %r14
	ret <4 x float> <float 0xffffffffe0000000, float 0xffffefffe0000000,
	float 0xffffffffe0000000, float 0xffffefffe0000000>
	}

	; Test a doubleword-granularity replicate with the highest useful negative
	; value.
	define <4 x float> @f18() {
	; CHECK-LABEL: f18:
	; CHECK: vrepig %v24, -2
	; CHECK: br %r14
	ret <4 x float> <float 0xffffffffe0000000, float 0xffffffffc0000000,
	float 0xffffffffe0000000, float 0xffffffffc0000000>
	}

	; Repeat f14 with undefs optimistically treated as 0, 32767.
	define <4 x float> @f19() {
	; CHECK-LABEL: f19:
	; CHECK: vrepig %v24, 32767
	; CHECK: br %r14
	ret <4 x float> <float undef, float undef,
	float 0.0, float 0x378fffc000000000>
	}

	; Repeat f18 with undefs optimistically treated as -2, -1.
	define <4 x float> @f20() {
	; CHECK-LABEL: f20:
	; CHECK: vrepig %v24, -2
	; CHECK: br %r14
	ret <4 x float> <float 0xffffffffe0000000, float undef,
	float undef, float 0xffffffffc0000000>
	}

test/CodeGen/SystemZ/vec-const-12.ll

This file was deleted.

	; Test vector replicates, v2f64 version.
	;
	; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s

	; Test a byte-granularity replicate with the lowest useful value.
	define <2 x double> @f1() {
	; CHECK-LABEL: f1:
	; CHECK: vrepib %v24, 1
	; CHECK: br %r14
	ret <2 x double> <double 0x0101010101010101, double 0x0101010101010101>
	}

	; Test a byte-granularity replicate with an arbitrary value.
	define <2 x double> @f2() {
	; CHECK-LABEL: f2:
	; CHECK: vrepib %v24, -55
	; CHECK: br %r14
	ret <2 x double> <double 0xc9c9c9c9c9c9c9c9, double 0xc9c9c9c9c9c9c9c9>
	}

	; Test a byte-granularity replicate with the highest useful value.
	define <2 x double> @f3() {
	; CHECK-LABEL: f3:
	; CHECK: vrepib %v24, -2
	; CHECK: br %r14
	ret <2 x double> <double 0xfefefefefefefefe, double 0xfefefefefefefefe>
	}

	; Test a halfword-granularity replicate with the lowest useful value.
	define <2 x double> @f4() {
	; CHECK-LABEL: f4:
	; CHECK: vrepih %v24, 1
	; CHECK: br %r14
	ret <2 x double> <double 0x0001000100010001, double 0x0001000100010001>
	}

	; Test a halfword-granularity replicate with an arbitrary value.
	define <2 x double> @f5() {
	; CHECK-LABEL: f5:
	; CHECK: vrepih %v24, 25650
	; CHECK: br %r14
	ret <2 x double> <double 0x6432643264326432, double 0x6432643264326432>
	}

	; Test a halfword-granularity replicate with the highest useful value.
	define <2 x double> @f6() {
	; CHECK-LABEL: f6:
	; CHECK: vrepih %v24, -2
	; CHECK: br %r14
	ret <2 x double> <double 0xfffefffefffefffe, double 0xfffefffefffefffe>
	}

	; Test a word-granularity replicate with the lowest useful positive value.
	define <2 x double> @f7() {
	; CHECK-LABEL: f7:
	; CHECK: vrepif %v24, 1
	; CHECK: br %r14
	ret <2 x double> <double 0x0000000100000001, double 0x0000000100000001>
	}

	; Test a word-granularity replicate with the highest in-range value.
	define <2 x double> @f8() {
	; CHECK-LABEL: f8:
	; CHECK: vrepif %v24, 32767
	; CHECK: br %r14
	ret <2 x double> <double 0x00007fff00007fff, double 0x00007fff00007fff>
	}

	; Test a word-granularity replicate with the next highest value.
	; This cannot use VREPIF.
	define <2 x double> @f9() {
	; CHECK-LABEL: f9:
	; CHECK-NOT: vrepif
	; CHECK: br %r14
	ret <2 x double> <double 0x0000800000008000, double 0x0000800000008000>
	}

	; Test a word-granularity replicate with the lowest in-range value.
	define <2 x double> @f10() {
	; CHECK-LABEL: f10:
	; CHECK: vrepif %v24, -32768
	; CHECK: br %r14
	ret <2 x double> <double 0xffff8000ffff8000, double 0xffff8000ffff8000>
	}

	; Test a word-granularity replicate with the next lowest value.
	; This cannot use VREPIF.
	define <2 x double> @f11() {
	; CHECK-LABEL: f11:
	; CHECK-NOT: vrepif
	; CHECK: br %r14
	ret <2 x double> <double 0xffff7fffffff7fff, double 0xffff7fffffff7fff>
	}

	; Test a word-granularity replicate with the highest useful negative value.
	define <2 x double> @f12() {
	; CHECK-LABEL: f12:
	; CHECK: vrepif %v24, -2
	; CHECK: br %r14
	ret <2 x double> <double 0xfffffffefffffffe, double 0xfffffffefffffffe>
	}

	; Test a doubleword-granularity replicate with the lowest useful positive
	; value.
	define <2 x double> @f13() {
	; CHECK-LABEL: f13:
	; CHECK: vrepig %v24, 1
	; CHECK: br %r14
	ret <2 x double> <double 0x0000000000000001, double 0x0000000000000001>
	}

	; Test a doubleword-granularity replicate with the highest in-range value.
	define <2 x double> @f14() {
	; CHECK-LABEL: f14:
	; CHECK: vrepig %v24, 32767
	; CHECK: br %r14
	ret <2 x double> <double 0x0000000000007fff, double 0x0000000000007fff>
	}

	; Test a doubleword-granularity replicate with the next highest value.
	; This cannot use VREPIG.
	define <2 x double> @f15() {
	; CHECK-LABEL: f15:
	; CHECK-NOT: vrepig
	; CHECK: br %r14
	ret <2 x double> <double 0x0000000000008000, double 0x0000000000008000>
	}

	; Test a doubleword-granularity replicate with the lowest in-range value.
	define <2 x double> @f16() {
	; CHECK-LABEL: f16:
	; CHECK: vrepig %v24, -32768
	; CHECK: br %r14
	ret <2 x double> <double 0xffffffffffff8000, double 0xffffffffffff8000>
	}

	; Test a doubleword-granularity replicate with the next lowest value.
	; This cannot use VREPIG.
	define <2 x double> @f17() {
	; CHECK-LABEL: f17:
	; CHECK-NOT: vrepig
	; CHECK: br %r14
	ret <2 x double> <double 0xffffffffffff7fff, double 0xffffffffffff7fff>
	}

	; Test a doubleword-granularity replicate with the highest useful negative
	; value.
	define <2 x double> @f18() {
	; CHECK-LABEL: f18:
	; CHECK: vrepig %v24, -2
	; CHECK: br %r14
	ret <2 x double> <double 0xfffffffffffffffe, double 0xfffffffffffffffe>
	}

	; Repeat f14 with undefs optimistically treated as 32767.
	define <2 x double> @f19() {
	; CHECK-LABEL: f19:
	; CHECK: vrepig %v24, 32767
	; CHECK: br %r14
	ret <2 x double> <double undef, double 0x0000000000007fff>
	}

	; Repeat f18 with undefs optimistically treated as -2.
	define <2 x double> @f20() {
	; CHECK-LABEL: f20:
	; CHECK: vrepig %v24, -2
	; CHECK: br %r14
	ret <2 x double> <double undef, double 0xfffffffffffffffe>
	}
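The in-range and out-of-range cases above (f8/f9, f10/f11, f14/f15, f16/f17) all follow from VREPI taking a sign-extended 16-bit immediate. A minimal sketch of that check, assuming a hypothetical helper `vrepiImmediate` that is not part of this patch and does not reproduce what analyzeBVNForConstantReplication() actually does:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only (not from the patch).
// Bits64 is the 64-bit pattern of one <2 x double> element. If every
// ElemBits-wide piece of Bits64 holds the same value and that value fits a
// sign-extended 16-bit immediate, store it in Imm and return true.
bool vrepiImmediate(uint64_t Bits64, unsigned ElemBits, int16_t &Imm) {
  assert(ElemBits == 8 || ElemBits == 16 || ElemBits == 32 || ElemBits == 64);
  const uint64_t Mask = ElemBits == 64 ? ~0ULL : ((1ULL << ElemBits) - 1);
  const uint64_t Elem = Bits64 & Mask;

  // Every piece must equal the lowest one, otherwise this is not a splat
  // at the requested granularity.
  for (unsigned Shift = ElemBits; Shift < 64; Shift += ElemBits)
    if (((Bits64 >> Shift) & Mask) != Elem)
      return false;

  // Sign-extend the piece from ElemBits to 64 bits.
  const uint64_t SignBit = 1ULL << (ElemBits - 1);
  const int64_t Value = static_cast<int64_t>((Elem ^ SignBit) - SignBit);

  // VREPI only encodes a signed 16-bit immediate.
  if (Value < INT16_MIN || Value > INT16_MAX)
    return false;
  Imm = static_cast<int16_t>(Value);
  return true;
}
```

For example, `vrepiImmediate(0xffff8000ffff8000ULL, 32, Imm)` succeeds with `Imm == -32768` (f10), while `0xffff7fffffff7fff` at word granularity sign-extends to -32769 and is rejected (f11).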

test/CodeGen/SystemZ/vec-const-18.ll

	; Test vector replicates that use VECTOR GENERATE MASK, v2f64 version.
	;
	; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s

	; Test a word-granularity replicate with the lowest value that cannot use
	; VREPIF.
	define <2 x double> @f1() {
	; CHECK-LABEL: f1:
	; CHECK: vgmf %v24, 16, 16
	; CHECK: br %r14
	ret <2 x double> <double 0x0000800000008000, double 0x0000800000008000>
	}

	; Test a word-granularity replicate that has the lower 17 bits set.
	define <2 x double> @f2() {
	; CHECK-LABEL: f2:
	; CHECK: vgmf %v24, 15, 31
	; CHECK: br %r14
	ret <2 x double> <double 0x0001ffff0001ffff, double 0x0001ffff0001ffff>
	}

	; Test a word-granularity replicate that has the upper 15 bits set.
	define <2 x double> @f3() {
	; CHECK-LABEL: f3:
	; CHECK: vgmf %v24, 0, 14
	; CHECK: br %r14
	ret <2 x double> <double 0xfffe0000fffe0000, double 0xfffe0000fffe0000>
	}

	; Test a word-granularity replicate that has middle bits set.
	define <2 x double> @f4() {
	; CHECK-LABEL: f4:
	; CHECK: vgmf %v24, 2, 11
	; CHECK: br %r14
	ret <2 x double> <double 0x3ff000003ff00000, double 0x3ff000003ff00000>
	}

	; Test a word-granularity replicate with a wrap-around mask.
	define <2 x double> @f5() {
	; CHECK-LABEL: f5:
	; CHECK: vgmf %v24, 17, 15
	; CHECK: br %r14
	ret <2 x double> <double 0xffff7fffffff7fff, double 0xffff7fffffff7fff>
	}

	; Test a doubleword-granularity replicate with the lowest value that cannot
	; use VREPIG.
	define <2 x double> @f6() {
	; CHECK-LABEL: f6:
	; CHECK: vgmg %v24, 48, 48
	; CHECK: br %r14
	ret <2 x double> <double 0x0000000000008000, double 0x0000000000008000>
	}

	; Test a doubleword-granularity replicate that has the lower 22 bits set.
	define <2 x double> @f7() {
	; CHECK-LABEL: f7:
	; CHECK: vgmg %v24, 42, 63
	; CHECK: br %r14
	ret <2 x double> <double 0x000000000003fffff, double 0x000000000003fffff>
	}

	; Test a doubleword-granularity replicate that has the upper 45 bits set.
	define <2 x double> @f8() {
	; CHECK-LABEL: f8:
	; CHECK: vgmg %v24, 0, 44
	; CHECK: br %r14
	ret <2 x double> <double 0xfffffffffff80000, double 0xfffffffffff80000>
	}

	; Test a doubleword-granularity replicate that has middle bits set.
	define <2 x double> @f9() {
	; CHECK-LABEL: f9:
	; CHECK: vgmg %v24, 2, 11
	; CHECK: br %r14
	ret <2 x double> <double 0x3ff0000000000000, double 0x3ff0000000000000>
	}

	; Test a doubleword-granularity replicate with a wrap-around mask.
	define <2 x double> @f10() {
	; CHECK-LABEL: f10:
	; CHECK: vgmg %v24, 10, 0
	; CHECK: br %r14
	ret <2 x double> <double 0x803fffffffffffff, double 0x803fffffffffffff>
	}
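The VGMF/VGMG operands checked above are a start and an end bit position, with bit 0 being the most significant bit of the element and the range wrapping around when the start is greater than the end. A rough sketch of how those operands relate to the constants in the tests, assuming two hypothetical helpers (`vgmMask` and `findVgmRange`) that are illustrative only and not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build the ElemBits-wide mask with bits Start..End set,
// counting bit 0 as the most significant bit. If Start > End the range wraps
// around, i.e. bits Start..ElemBits-1 and bits 0..End are set.
uint64_t vgmMask(unsigned Start, unsigned End, unsigned ElemBits) {
  assert(ElemBits == 8 || ElemBits == 16 || ElemBits == 32 || ElemBits == 64);
  assert(Start < ElemBits && End < ElemBits);
  const uint64_t All = ElemBits == 64 ? ~0ULL : ((1ULL << ElemBits) - 1);
  const uint64_t FromStart = All >> Start;            // bits Start..ElemBits-1
  const uint64_t ToEnd = All ^ ((All >> End) >> 1);   // bits 0..End
  return Start <= End ? (FromStart & ToEnd) : (FromStart | ToEnd);
}

// Hypothetical helper: brute-force search for a (Start, End) pair whose mask
// equals Value. Returns false if Value is not a single (possibly wrapping)
// run of set bits.
bool findVgmRange(uint64_t Value, unsigned ElemBits, unsigned &Start,
                  unsigned &End) {
  for (unsigned S = 0; S < ElemBits; ++S)
    for (unsigned E = 0; E < ElemBits; ++E)
      if (vgmMask(S, E, ElemBits) == Value) {
        Start = S;
        End = E;
        return true;
      }
  return false;
}
```

With these, `vgmMask(17, 15, 32)` gives `0xffff7fff` (the word-granularity wrap-around case) and `findVgmRange(0x3ff0000000000000ULL, 64, S, E)` yields `S == 2`, `E == 11`, matching the `vgmg %v24, 2, 11` check. The exhaustive search is only meant to make the encoding easy to verify; an actual implementation would presumably derive the range directly from the bit pattern.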