This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Wait with VGBM selection until after DAGCombine2.
ClosedPublic

Authored by jonpa on Jan 24 2019, 6:14 AM.


Details

Reviewers

Summary

I was looking back at the shouldScalarizeBinop() hook, and found that if I used the X86 implementation of it, nothing changed on SPEC.

I then also remembered your discussion that perhaps it would be better to keep BUILD_VECTOR nodes during combine2, instead of replacing with SYSTEMZ::BYTE_MASK during legalization. I decided to try this and this is the patch I have.

It seems to be not too complicated to do this, since apart from the handling in Select() it is enough to redefine the z_vzero and z_vones nodes to recognize BUILD_VECTORs instead, and the pattern matching will work as before.

I am not quite sure if this is the best solution: as it is now, tryBuildVectorByteMask() is used first during legalization to build a new BUILD_VECTOR with the right constants, and then again in Select() to get the same mask back. I first thought it would be possible to just leave the BUILD_VECTORs alone during legalization, but then I found a case where this doesn't work. It involved ConstantFP<nan>, which ended up in the constant pool.
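As a rough illustration of the byte-mask round trip described above, here is a minimal standalone sketch in plain C++ with no LLVM types. It assumes the rule stated in the SystemZISD::BYTE_MASK comment in this diff, namely that byte N of the result is filled with bit 15-N of the immediate. The names Bytes, byteMaskFor and expandByteMask are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a constant 128-bit vector: 16 bytes, with
// index 0 being the leftmost byte of the register.
using Bytes = std::vector<std::uint8_t>;

// Returns the 16-bit immediate if every byte is 0x00 or 0xff, otherwise
// std::nullopt. Bit 15-N of the immediate controls byte N.
std::optional<std::uint16_t> byteMaskFor(const Bytes &bytes) {
  assert(bytes.size() == 16 && "expected a 128-bit vector");
  std::uint16_t mask = 0;
  for (unsigned n = 0; n < 16; ++n) {
    if (bytes[n] == 0xff)
      mask |= static_cast<std::uint16_t>(1u << (15 - n));
    else if (bytes[n] != 0x00)
      return std::nullopt; // Not expressible as a byte mask.
  }
  return mask;
}

// Inverse direction: expand an immediate back into 16 bytes.
Bytes expandByteMask(std::uint16_t mask) {
  Bytes bytes(16, 0x00);
  for (unsigned n = 0; n < 16; ++n)
    if ((mask >> (15 - n)) & 1u)
      bytes[n] = 0xff;
  return bytes;
}
```

In this model, computing the mask during legalization and again in Select() corresponds to calling byteMaskFor() twice on equivalent byte contents, which is why the second call recovers the same immediate.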

First observations on benchmarks are that just one file (462.libquantum/build/qec.s) changes, as follows (inside a vectorized loop):

vno            :                  203                  193      -10
vnc            :                  149                  151       +2
vn             :                  557                  555       -2

This is a surprisingly small improvement. Perhaps some piece is missing to unlock more improvements?

With this patch in place, I tried the shouldScalarizeBinop() hook again (copied from X86), and now one additional file changed (454.calculix/build/InpMtx_init.s) as follows:

xilf           :                 4922                 4942      +20
vno            :                  193                  190       -3
tmll           :                14386                14385       -1
jne            :                15139                15138       -1
la             :               199056               199055       -1
vlgvf          :                 1266                 1265       -1

It seems this is a loop with many extracts and test-under-mask instructions, which now do a scalar xilf before each tmll...

The handling of replication of constant in lowerBUILD_VECTOR() is perhaps the next step after this to rework in a similar way...

Diff Detail

Event Timeline

jonpa created this revision. Jan 24 2019, 6:14 AM

I am not quite sure if this is the best solution: as it is now, tryBuildVectorByteMask() is used first during legalization to build a new BUILD_VECTOR with the right constants, and then again in Select() to get the same mask back. I first thought it would be possible to just leave the BUILD_VECTORs alone during legalization, but then I found a case where this doesn't work. It involved ConstantFP<nan>, which ended up in the constant pool.

Can you explain this further? If you leave the BUILD_VECTOR as-is, where is it pushed into the constant pool?

The handling of replication of constant in lowerBUILD_VECTOR() is perhaps the next step after this to rework in a similar way...

That would probably make sense.

It seems this is a loop with many extracts and test-under-mask instructions, which now do a scalar xilf before each tmll...

Well, that's certainly not good.

Can you explain this further? If you leave the BUILD_VECTOR as-is, where is it pushed into the constant pool?

If Op is returned unchanged from lowerBUILD_VECTOR(), SelectionDAG::legalize() will later call Legalizer.LegalizeOp(N) with the ConstantFP operand of the BUILD_VECTOR. Since it is not a TargetConstant, this will go on to find the Action to be 'Expand'. The ExpandNode() case for ConstantFP will call TLI.isFPImmLegal(), which returns false (in most cases), and therefore ExpandFPConstant() is called, which returns a load from the constant pool.

It is in fact not just NaN ConstantFPs that are in question here, but rather any ConstantFP that is a candidate for VGBM. In order to select into VGBM later, it seems we need to detect this case here and produce a new BUILD_VECTOR, either with i8 operands, or perhaps with TargetConstantFP operands, which makes a bit more sense in a way.

I also discovered that it is not so good to replace undef operands with 0, as the patch currently does by using the Mask produced by tryBuildVectorByteMask(). In cases where some elements are all-ones and the rest undef, it is best to keep it that way so that ISD::isBuildVectorAllOnes() can return true.
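To illustrate why zero-filling undef elements hurts, here is a small standalone model (not LLVM code). It assumes an all-ones check that ignores undef elements, which is the behaviour the comment above relies on. Elements, isAllOnesIgnoringUndef and zeroFillUndef are hypothetical names, and the 16-element shape is just a v16i8-like example.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of a constant vector's operand list: std::nullopt
// stands for an undef element.
using Elements = std::vector<std::optional<std::uint8_t>>;

// True if at least one element is defined and every defined element is
// 0xff. Undef elements are ignored.
bool isAllOnesIgnoringUndef(const Elements &elts) {
  assert(elts.size() == 16 && "expected a v16i8-shaped operand list");
  bool sawDefined = false;
  for (const auto &e : elts) {
    if (!e)
      continue;
    if (*e != 0xff)
      return false;
    sawDefined = true;
  }
  return sawDefined;
}

// Models the effect of rebuilding the vector from the byte mask alone:
// every undef element becomes an explicit zero.
Elements zeroFillUndef(Elements elts) {
  for (auto &e : elts)
    if (!e)
      e = std::uint8_t{0x00};
  return elts;
}
```

With half of the elements set to 0xff and the rest undef, isAllOnesIgnoringUndef() holds for the original list but not for the zeroFillUndef() result, which mirrors the lost optimization opportunity.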

Patch updated per previous comment so that it returns all integer BVNs unmodified, and for FP a new BVN is built with TargetConstantFP operands, to avoid expansion into the constant pool.

I had to adjust two vllez patterns to make all tests pass.

Due to keeping the undef operands instead of making them explicitly 0, one more file on benchmarks is slightly improved, which gives 2 in total :-)

Not sure if it would simplify the code a bit to build integer (16xi8) BVNs instead of insisting on keeping the FP operands as TargetConstantFP:s.

I am still using tryBuildVectorByteMask() in two places which I think saves some duplicated code even though the Mask isn't used during legalization.

In D57152#1371369, @jonpa wrote:

The ExpandNode() case for ConstantFP will call TLI.isFPImmLegal(), which returns false (in most cases), and therefore ExpandFPConstant() is called, which returns a load from the constant pool.

If we actually can load the FP constant using an immediate vector instruction, shouldn't we then return true from TLI.isFPImmLegal? Maybe that's the real underlying problem ...

If we actually can load the FP constant using an immediate vector instruction, shouldn't we then return true from TLI.isFPImmLegal? Maybe that's the real underlying problem ...

I actually tried that earlier, and found that it worked as you say in the case where BUILD_VECTOR can be selected into VGBM. However, an FP-immediate (outside any BUILD_VECTOR) then crashes isel, since there is no handling for it other than loading it from the constant pool.

I was also a bit skeptical about using isFPImmLegal since there is no way to tell if this is a scalar or vector element constant. However, I tried this again now, and found that it actually seems to work out fairly well in lowerBUILD_VECTOR():

If all fp-elements are legal, we can return it unmodified just as with an int vector and emit VGBM in Select().
If not all fp-elements are legal, we return SDValue(), so the entire vector constant is loaded from the constant pool.
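The two cases above can be sketched as a small standalone decision function. This is only a model of the idea in this comment, not code from the patch and not the approach that was finally committed. The predicate isByteMaskCandidate is a hypothetical stand-in for a per-element legality query: it treats an element as "legal" when every byte of its bit pattern is 0x00 or 0xff. Lowering and lowerConstantFPVector are likewise made-up names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

enum class Lowering { KeepBuildVector, ConstantPool };

// Hypothetical legality test: an element counts as "legal" here when every
// byte of its 64-bit pattern is 0x00 or 0xff (a byte-mask candidate).
bool isByteMaskCandidate(double value) {
  std::uint64_t bits = 0;
  std::memcpy(&bits, &value, sizeof bits);
  for (unsigned i = 0; i < 8; ++i) {
    std::uint64_t byte = (bits >> (8 * i)) & 0xff;
    if (byte != 0x00 && byte != 0xff)
      return false;
  }
  return true;
}

// All elements legal: keep the vector as-is for later selection.
// Otherwise: load the whole vector constant from the constant pool.
Lowering lowerConstantFPVector(const std::vector<double> &elts) {
  assert(!elts.empty() && "expected at least one element");
  bool allLegal = std::all_of(elts.begin(), elts.end(), isByteMaskCandidate);
  return allLegal ? Lowering::KeepBuildVector : Lowering::ConstantPool;
}
```

Note that the decision is made for the vector as a whole: a single element that fails the test sends the entire constant to the constant pool, rather than mixing the two strategies.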

This is just what we want, so the question now is what to do with the new legal scalar FP immediates. We could either decide to generate them with VGBM, or we could keep them in the constant pool where they are now. There are plenty of reg-mem FP instructions, and I suspect we would like to keep using them? In that case I think we could either extend isFPImmLegal with an argument that informs us to make the right decision depending on scalar/vector context, or we could possibly say that the scalar constant is legal, but we decide to load it from the constant pool in Select().

Handle the FP case by building a v16i8 vector instead of using TargetFPConstants (or isFPImmLegal).

NFC to using TargetFPConstants.

The support of VGBM for non-zero FP vectors has been removed, which greatly simplifies the patch. This is NFC on SPEC compared to the previous version of the patch, except in one file where the floating point VGBM mask of 0xf0f0 is selected as VGMG instead of VGBM (1 file, 8 places).
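One plausible reading of why a 0xf0f0 byte mask can also be produced by VGMG is that each 64-bit element of the expanded constant is a single contiguous run of set bits, which matches the ROTATE_MASK description in this diff (start and end bit, counted from the MSB of the element). The sketch below checks that arithmetic in plain C++. All names are hypothetical, and wrap-around masks are deliberately not handled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

struct BitRange {
  unsigned start; // First set bit, counted from the MSB.
  unsigned end;   // Last set bit, counted from the MSB.
};

// Expand a 16-bit byte mask into two 64-bit elements (element 0 holds
// bytes 0-7). Bit 15-N of the mask controls byte N.
std::array<std::uint64_t, 2> expandToDoublewords(std::uint16_t mask) {
  std::array<std::uint64_t, 2> elts{0, 0};
  for (unsigned n = 0; n < 16; ++n)
    if ((mask >> (15 - n)) & 1u)
      elts[n / 8] |= 0xffULL << (8 * (7 - n % 8));
  return elts;
}

// If value is one contiguous, non-wrapping run of set bits, return its
// start and end positions; otherwise return std::nullopt.
std::optional<BitRange> contiguousRun(std::uint64_t value) {
  if (value == 0)
    return std::nullopt;
  unsigned start = 0;
  while (((value >> (63 - start)) & 1u) == 0)
    ++start;
  unsigned end = 63;
  while (((value >> (63 - end)) & 1u) == 0)
    --end;
  assert(start <= end);
  for (unsigned bit = start; bit <= end; ++bit)
    if (((value >> (63 - bit)) & 1u) == 0)
      return std::nullopt;
  return BitRange{start, end};
}
```

For 0xf0f0 both elements expand to 0xffffffff00000000, and contiguousRun() reports bits 0 through 31, so the same constant is expressible either as a byte mask or as a replicated 64-bit bit-range mask.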

Test functions in vec-const-05.ll and vec-const-06.ll for VGBM masks of non-zero FP vectors have been removed.

Yes, I think this version makes most sense. LGTM.

This revision is now accepted and ready to land. Feb 6 2019, 2:26 AM

Thanks for review. r353325.

Revision Contents

Path

Size

lib/

Target/

SystemZ/

SystemZISelDAGToDAG.cpp

15 lines

SystemZISelLowering.h

7 lines

SystemZISelLowering.cpp

52 lines

SystemZInstrVector.td

2 lines

SystemZOperators.td

7 lines

test/

CodeGen/

SystemZ/

buildvector-00.ll

36 lines

Diff 184585

lib/Target/SystemZ/SystemZISelDAGToDAG.cpp

//===-- SystemZISelDAGToDAG.cpp - A dag to dag inst selector for SystemZ --===//		//===-- SystemZISelDAGToDAG.cpp - A dag to dag inst selector for SystemZ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file defines an instruction selector for the SystemZ target.		// This file defines an instruction selector for the SystemZ target.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "SystemZTargetMachine.h"		#include "SystemZTargetMachine.h"
		#include "SystemZISelLowering.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/SelectionDAGISel.h"		#include "llvm/CodeGen/SelectionDAGISel.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace llvm;		using namespace llvm;

▲ Show 20 Lines • Show All 1,499 Lines • ▼ Show 20 Lines	if (ElemBitSize == 32) {
return;		return;
} else if (ElemBitSize == 64) {		} else if (ElemBitSize == 64) {
if (tryGather(Node, SystemZ::VGEG))		if (tryGather(Node, SystemZ::VGEG))
return;		return;
}		}
break;		break;
}		}

		case ISD::BUILD_VECTOR: {
		auto *BVN = cast<BuildVectorSDNode>(Node);
		SDLoc DL(Node);
		EVT VT = Node->getValueType(0);
		uint64_t Mask = 0;
		if (SystemZTargetLowering::tryBuildVectorByteMask(BVN, Mask)) {
		SDNode *Res = CurDAG->getMachineNode(SystemZ::VGBM, DL, VT,
		CurDAG->getTargetConstant(Mask, DL, MVT::i32));
		ReplaceNode(Node, Res);
		return;
		}
		break;
		}

case ISD::STORE: {		case ISD::STORE: {
if (tryFoldLoadStoreIntoMemOperand(Node))		if (tryFoldLoadStoreIntoMemOperand(Node))
return;		return;
auto *Store = cast<StoreSDNode>(Node);		auto *Store = cast<StoreSDNode>(Node);
unsigned ElemBitSize = Store->getValue().getValueSizeInBits();		unsigned ElemBitSize = Store->getValue().getValueSizeInBits();
if (ElemBitSize == 32) {		if (ElemBitSize == 32) {
if (tryScatter(Store, SystemZ::VSCEF))		if (tryScatter(Store, SystemZ::VSCEF))
return;		return;
▲ Show 20 Lines • Show All 306 Lines • Show Last 20 Lines

lib/Target/SystemZ/SystemZISelLowering.h

Show First 20 Lines • Show All 155 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
// the TDB pointer, and the third the immediate control field.		// the TDB pointer, and the third the immediate control field.
// Returns CC value and chain.		// Returns CC value and chain.
TBEGIN,		TBEGIN,
TBEGIN_NOFLOAT,		TBEGIN_NOFLOAT,

// Transaction end. Just the chain operand. Returns CC value and chain.		// Transaction end. Just the chain operand. Returns CC value and chain.
TEND,		TEND,

// Create a vector constant by filling byte N of the result with bit
// 15-N of the single operand.
BYTE_MASK,

// Create a vector constant by replicating an element-sized RISBG-style mask.		// Create a vector constant by replicating an element-sized RISBG-style mask.
// The first operand specifies the starting set bit and the second operand		// The first operand specifies the starting set bit and the second operand
// specifies the ending set bit. Both operands count from the MSB of the		// specifies the ending set bit. Both operands count from the MSB of the
// element.		// element.
ROTATE_MASK,		ROTATE_MASK,

// Replicate a GPR scalar value into all elements of a vector.		// Replicate a GPR scalar value into all elements of a vector.
REPLICATE,		REPLICATE,
▲ Show 20 Lines • Show All 334 Lines • ▼ Show 20 Lines	public:
ISD::NodeType getExtendForAtomicOps() const override {		ISD::NodeType getExtendForAtomicOps() const override {
return ISD::ANY_EXTEND;		return ISD::ANY_EXTEND;
}		}

bool supportSwiftError() const override {		bool supportSwiftError() const override {
return true;		return true;
}		}

		static bool tryBuildVectorByteMask(BuildVectorSDNode *BVN, uint64_t &Mask,
		uint64_t *UndefMask = nullptr);

private:		private:
const SystemZSubtarget &Subtarget;		const SystemZSubtarget &Subtarget;

// Implement LowerOperation for individual opcodes.		// Implement LowerOperation for individual opcodes.
SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,		SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,
const SDLoc &DL, EVT VT,		const SDLoc &DL, EVT VT,
SDValue CmpOp0, SDValue CmpOp1) const;		SDValue CmpOp0, SDValue CmpOp1) const;
SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,		SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,
▲ Show 20 Lines • Show All 120 Lines • Show Last 20 Lines

lib/Target/SystemZ/SystemZISelLowering.cpp


Show First 20 Lines • Show All 2,504 Lines • ▼ Show 20 Lines	else {
if (unsigned Opcode = getVectorComparisonOrInvert(CC, IsFP, Invert))		if (unsigned Opcode = getVectorComparisonOrInvert(CC, IsFP, Invert))
Cmp = getVectorCmp(DAG, Opcode, DL, VT, CmpOp1, CmpOp0);		Cmp = getVectorCmp(DAG, Opcode, DL, VT, CmpOp1, CmpOp0);
else		else
llvm_unreachable("Unhandled comparison");		llvm_unreachable("Unhandled comparison");
}		}
break;		break;
}		}
if (Invert) {		if (Invert) {
SDValue Mask = DAG.getNode(SystemZISD::BYTE_MASK, DL, MVT::v16i8,		SDValue Mask =
DAG.getConstant(65535, DL, MVT::i32));		DAG.getSplatBuildVector(VT, DL, DAG.getConstant(-1, DL, MVT::i64));
Mask = DAG.getNode(ISD::BITCAST, DL, VT, Mask);
Cmp = DAG.getNode(ISD::XOR, DL, VT, Cmp, Mask);		Cmp = DAG.getNode(ISD::XOR, DL, VT, Cmp, Mask);
}		}
return Cmp;		return Cmp;
}		}

SDValue SystemZTargetLowering::lowerSETCC(SDValue Op,		SDValue SystemZTargetLowering::lowerSETCC(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDValue CmpOp0 = Op.getOperand(0);		SDValue CmpOp0 = Op.getOperand(0);
▲ Show 20 Lines • Show All 801 Lines • ▼ Show 20 Lines	case 16: {
Op = DAG.getNode(ISD::BITCAST, DL, VT, Op);		Op = DAG.getNode(ISD::BITCAST, DL, VT, Op);
SDValue Shift = DAG.getConstant(8, DL, MVT::i32);		SDValue Shift = DAG.getConstant(8, DL, MVT::i32);
SDValue Tmp = DAG.getNode(SystemZISD::VSHL_BY_SCALAR, DL, VT, Op, Shift);		SDValue Tmp = DAG.getNode(SystemZISD::VSHL_BY_SCALAR, DL, VT, Op, Shift);
Op = DAG.getNode(ISD::ADD, DL, VT, Op, Tmp);		Op = DAG.getNode(ISD::ADD, DL, VT, Op, Tmp);
Op = DAG.getNode(SystemZISD::VSRL_BY_SCALAR, DL, VT, Op, Shift);		Op = DAG.getNode(SystemZISD::VSRL_BY_SCALAR, DL, VT, Op, Shift);
break;		break;
}		}
case 32: {		case 32: {
SDValue Tmp = DAG.getNode(SystemZISD::BYTE_MASK, DL, MVT::v16i8,		SDValue Tmp = DAG.getSplatBuildVector(MVT::v16i8, DL,
DAG.getConstant(0, DL, MVT::i32));		DAG.getConstant(0, DL, MVT::i32));
Op = DAG.getNode(SystemZISD::VSUM, DL, VT, Op, Tmp);		Op = DAG.getNode(SystemZISD::VSUM, DL, VT, Op, Tmp);
break;		break;
}		}
case 64: {		case 64: {
SDValue Tmp = DAG.getNode(SystemZISD::BYTE_MASK, DL, MVT::v16i8,		SDValue Tmp = DAG.getSplatBuildVector(MVT::v16i8, DL,
DAG.getConstant(0, DL, MVT::i32));		DAG.getConstant(0, DL, MVT::i32));
Op = DAG.getNode(SystemZISD::VSUM, DL, MVT::v4i32, Op, Tmp);		Op = DAG.getNode(SystemZISD::VSUM, DL, MVT::v4i32, Op, Tmp);
Op = DAG.getNode(SystemZISD::VSUM, DL, VT, Op, Tmp);		Op = DAG.getNode(SystemZISD::VSUM, DL, VT, Op, Tmp);
break;		break;
}		}
default:		default:
llvm_unreachable("Unexpected type");		llvm_unreachable("Unexpected type");
}		}
return Op;		return Op;
▲ Show 20 Lines • Show All 905 Lines • ▼ Show 20 Lines	else if (Op1.isUndef())
Op0 = Op1 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, Op0);		Op0 = Op1 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, Op0);
else {		else {
Op0 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, Op0);		Op0 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, Op0);
Op1 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, Op1);		Op1 = DAG.getNode(ISD::ANY_EXTEND, DL, MVT::i64, Op1);
}		}
return DAG.getNode(SystemZISD::JOIN_DWORDS, DL, MVT::v2i64, Op0, Op1);		return DAG.getNode(SystemZISD::JOIN_DWORDS, DL, MVT::v2i64, Op0, Op1);
}		}

// Try to represent constant BUILD_VECTOR node BVN using a		// Try to represent constant BUILD_VECTOR node BVN using a BYTE MASK style
// SystemZISD::BYTE_MASK-style mask. Store the mask value in Mask		// mask. Store the mask value in Mask on success.
// on success.		bool SystemZTargetLowering::
static bool tryBuildVectorByteMask(BuildVectorSDNode *BVN, uint64_t &Mask) {		tryBuildVectorByteMask(BuildVectorSDNode *BVN, uint64_t &Mask,
		uint64_t *UndefMask) {
EVT ElemVT = BVN->getValueType(0).getVectorElementType();		EVT ElemVT = BVN->getValueType(0).getVectorElementType();
unsigned BytesPerElement = ElemVT.getStoreSize();		unsigned BytesPerElement = ElemVT.getStoreSize();
for (unsigned I = 0, E = BVN->getNumOperands(); I != E; ++I) {		for (unsigned I = 0, E = BVN->getNumOperands(); I != E; ++I) {
SDValue Op = BVN->getOperand(I);		SDValue Op = BVN->getOperand(I);
if (!Op.isUndef()) {		if (!Op.isUndef()) {
uint64_t Value;		uint64_t Value;
if (Op.getOpcode() == ISD::Constant)		if (Op.getOpcode() == ISD::Constant)
Value = cast<ConstantSDNode>(Op)->getZExtValue();		Value = cast<ConstantSDNode>(Op)->getZExtValue();
else if (Op.getOpcode() == ISD::ConstantFP)		else if (Op.getOpcode() == ISD::ConstantFP)
Value = (cast<ConstantFPSDNode>(Op)->getValueAPF().bitcastToAPInt()		Value = (cast<ConstantFPSDNode>(Op)->getValueAPF().bitcastToAPInt()
.getZExtValue());		.getZExtValue());
else		else
return false;		return false;
for (unsigned J = 0; J < BytesPerElement; ++J) {		for (unsigned J = 0; J < BytesPerElement; ++J) {
uint64_t Byte = (Value >> (J * 8)) & 0xff;		uint64_t Byte = (Value >> (J * 8)) & 0xff;
if (Byte == 0xff)		if (Byte == 0xff)
Mask |= 1ULL << ((E - I - 1) * BytesPerElement + J);		Mask |= 1ULL << ((E - I - 1) * BytesPerElement + J);
else if (Byte != 0)		else if (Byte != 0)
return false;		return false;
}		}
		} else if (UndefMask != nullptr) {
		for (unsigned J = 0; J < BytesPerElement; ++J)
		*UndefMask |= 1ULL << ((E - I - 1) * BytesPerElement + J);
}		}
}		}
return true;		return true;
}		}

// Try to load a vector constant in which BitsPerElement-bit value Value		// Try to load a vector constant in which BitsPerElement-bit value Value
// is replicated to fill the vector. VT is the type of the resulting		// is replicated to fill the vector. VT is the type of the resulting
// constant, which may have elements of a different size from BitsPerElement.		// constant, which may have elements of a different size from BitsPerElement.
▲ Show 20 Lines • Show All 243 Lines • ▼ Show 20 Lines	SDValue SystemZTargetLowering::lowerBUILD_VECTOR(SDValue Op,
SDLoc DL(Op);		SDLoc DL(Op);
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();

if (BVN->isConstant()) {		if (BVN->isConstant()) {
// Try using VECTOR GENERATE BYTE MASK. This is the architecturally-		// Try using VECTOR GENERATE BYTE MASK. This is the architecturally-
// preferred way of creating all-zero and all-one vectors so give it		// preferred way of creating all-zero and all-one vectors so give it
// priority over other methods below.		// priority over other methods below.
uint64_t Mask = 0;		uint64_t Mask = 0;
if (tryBuildVectorByteMask(BVN, Mask)) {		uint64_t UndefMask = 0;
SDValue Op = DAG.getNode(		if (tryBuildVectorByteMask(BVN, Mask, &UndefMask)) {
SystemZISD::BYTE_MASK, DL, MVT::v16i8,		if (VT.isInteger())
DAG.getConstant(Mask, DL, MVT::i32, false, true /*isOpaque*/));		return Op;
return DAG.getNode(ISD::BITCAST, DL, VT, Op);
		// Floating point: build a new integer BUILD_VECTOR with all-ones,
		// all-zeros or undef elements.
		SmallVector<SDValue, SystemZ::VectorBytes> Constants;
		for (unsigned I = SystemZ::VectorBytes - 1; I + 1 != 0; --I) {
		if (UndefMask & 1ULL << I)
		Constants.push_back(DAG.getUNDEF(MVT::i32));
		else if (Mask & 1ULL << I)
		Constants.push_back(DAG.getConstant(-1, DL, MVT::i32));
		else
		Constants.push_back(DAG.getConstant(0, DL, MVT::i32));
		}
		SDValue BVInt = DAG.getBuildVector(MVT::v16i8, DL, Constants);
		return DAG.getNode(ISD::BITCAST, DL, VT, BVInt);
}		}

// Try using some form of replication.		// Try using some form of replication.
APInt SplatBits, SplatUndef;		APInt SplatBits, SplatUndef;
unsigned SplatBitSize;		unsigned SplatBitSize;
bool HasAnyUndefs;		bool HasAnyUndefs;
if (BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs,		if (BVN->isConstantSplat(SplatBits, SplatUndef, SplatBitSize, HasAnyUndefs,
8, true) &&		8, true) &&
▲ Show 20 Lines • Show All 464 Lines • ▼ Show 20 Lines	switch ((SystemZISD::NodeType)Opcode) {
OPCODE(STPCPY);		OPCODE(STPCPY);
OPCODE(STRCMP);		OPCODE(STRCMP);
OPCODE(SEARCH_STRING);		OPCODE(SEARCH_STRING);
OPCODE(IPM);		OPCODE(IPM);
OPCODE(MEMBARRIER);		OPCODE(MEMBARRIER);
OPCODE(TBEGIN);		OPCODE(TBEGIN);
OPCODE(TBEGIN_NOFLOAT);		OPCODE(TBEGIN_NOFLOAT);
OPCODE(TEND);		OPCODE(TEND);
OPCODE(BYTE_MASK);
OPCODE(ROTATE_MASK);		OPCODE(ROTATE_MASK);
OPCODE(REPLICATE);		OPCODE(REPLICATE);
OPCODE(JOIN_DWORDS);		OPCODE(JOIN_DWORDS);
OPCODE(SPLAT);		OPCODE(SPLAT);
OPCODE(MERGE_HIGH);		OPCODE(MERGE_HIGH);
OPCODE(MERGE_LOW);		OPCODE(MERGE_LOW);
OPCODE(SHL_DOUBLE);		OPCODE(SHL_DOUBLE);
OPCODE(PERMUTE_DWORDS);		OPCODE(PERMUTE_DWORDS);
▲ Show 20 Lines • Show All 295 Lines • ▼ Show 20 Lines
SDValue SystemZTargetLowering::combineMERGE(		SDValue SystemZTargetLowering::combineMERGE(
SDNode *N, DAGCombinerInfo &DCI) const {		SDNode *N, DAGCombinerInfo &DCI) const {
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
unsigned Opcode = N->getOpcode();		unsigned Opcode = N->getOpcode();
SDValue Op0 = N->getOperand(0);		SDValue Op0 = N->getOperand(0);
SDValue Op1 = N->getOperand(1);		SDValue Op1 = N->getOperand(1);
if (Op0.getOpcode() == ISD::BITCAST)		if (Op0.getOpcode() == ISD::BITCAST)
Op0 = Op0.getOperand(0);		Op0 = Op0.getOperand(0);
if (Op0.getOpcode() == SystemZISD::BYTE_MASK &&		if (ISD::isBuildVectorAllZeros(Op0.getNode())) {
cast<ConstantSDNode>(Op0.getOperand(0))->getZExtValue() == 0) {
// (z_merge_* 0, 0) -> 0. This is mostly useful for using VLLEZF		// (z_merge_* 0, 0) -> 0. This is mostly useful for using VLLEZF
// for v4f32.		// for v4f32.
if (Op1 == N->getOperand(0))		if (Op1 == N->getOperand(0))
return Op1;		return Op1;
// (z_merge_? 0, X) -> (z_unpackl_? 0, X).		// (z_merge_? 0, X) -> (z_unpackl_? 0, X).
EVT VT = Op1.getValueType();		EVT VT = Op1.getValueType();
unsigned ElemBytes = VT.getVectorElementType().getStoreSize();		unsigned ElemBytes = VT.getVectorElementType().getStoreSize();
if (ElemBytes <= 4) {		if (ElemBytes <= 4) {
▲ Show 20 Lines • Show All 2,057 Lines • Show Last 20 Lines

lib/Target/SystemZ/SystemZInstrVector.td

	Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	let Predicates = [FeatureVector] in {			let Predicates = [FeatureVector] in {
	let isAsCheapAsAMove = 1, isMoveImm = 1, isReMaterializable = 1 in {			let isAsCheapAsAMove = 1, isMoveImm = 1, isReMaterializable = 1 in {

	// Generate byte mask.			// Generate byte mask.
	def VZERO : InherentVRIa<"vzero", 0xE744, 0>;			def VZERO : InherentVRIa<"vzero", 0xE744, 0>;
	def VONE : InherentVRIa<"vone", 0xE744, 0xffff>;			def VONE : InherentVRIa<"vone", 0xE744, 0xffff>;
	def VGBM : UnaryVRIa<"vgbm", 0xE744, z_byte_mask, v128b, imm32zx16>;			def VGBM : UnaryVRIa<"vgbm", 0xE744, null_frag, v128b, imm32zx16>;

	// Generate mask.			// Generate mask.
	def VGM : BinaryVRIbGeneric<"vgm", 0xE746>;			def VGM : BinaryVRIbGeneric<"vgm", 0xE746>;
	def VGMB : BinaryVRIb<"vgmb", 0xE746, z_rotate_mask, v128b, 0>;			def VGMB : BinaryVRIb<"vgmb", 0xE746, z_rotate_mask, v128b, 0>;
	def VGMH : BinaryVRIb<"vgmh", 0xE746, z_rotate_mask, v128h, 1>;			def VGMH : BinaryVRIb<"vgmh", 0xE746, z_rotate_mask, v128h, 1>;
	def VGMF : BinaryVRIb<"vgmf", 0xE746, z_rotate_mask, v128f, 2>;			def VGMF : BinaryVRIb<"vgmf", 0xE746, z_rotate_mask, v128f, 2>;
	def VGMG : BinaryVRIb<"vgmg", 0xE746, z_rotate_mask, v128g, 3>;			def VGMG : BinaryVRIb<"vgmg", 0xE746, z_rotate_mask, v128g, 3>;

	▲ Show 20 Lines • Show All 1,483 Lines • Show Last 20 Lines

lib/Target/SystemZ/SystemZOperators.td

Show First 20 Lines • Show All 280 Lines • ▼ Show 20 Lines

def z_tdc : SDNode<"SystemZISD::TDC", SDT_ZTest>;		def z_tdc : SDNode<"SystemZISD::TDC", SDT_ZTest>;

// Defined because the index is an i32 rather than a pointer.		// Defined because the index is an i32 rather than a pointer.
def z_vector_insert : SDNode<"ISD::INSERT_VECTOR_ELT",		def z_vector_insert : SDNode<"ISD::INSERT_VECTOR_ELT",
SDT_ZInsertVectorElt>;		SDT_ZInsertVectorElt>;
def z_vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT",		def z_vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT",
SDT_ZExtractVectorElt>;		SDT_ZExtractVectorElt>;
def z_byte_mask : SDNode<"SystemZISD::BYTE_MASK", SDT_ZReplicate>;
def z_rotate_mask : SDNode<"SystemZISD::ROTATE_MASK", SDT_ZRotateMask>;		def z_rotate_mask : SDNode<"SystemZISD::ROTATE_MASK", SDT_ZRotateMask>;
def z_replicate : SDNode<"SystemZISD::REPLICATE", SDT_ZReplicate>;		def z_replicate : SDNode<"SystemZISD::REPLICATE", SDT_ZReplicate>;
def z_join_dwords : SDNode<"SystemZISD::JOIN_DWORDS", SDT_ZJoinDwords>;		def z_join_dwords : SDNode<"SystemZISD::JOIN_DWORDS", SDT_ZJoinDwords>;
def z_splat : SDNode<"SystemZISD::SPLAT", SDT_ZVecBinaryInt>;		def z_splat : SDNode<"SystemZISD::SPLAT", SDT_ZVecBinaryInt>;
def z_merge_high : SDNode<"SystemZISD::MERGE_HIGH", SDT_ZVecBinary>;		def z_merge_high : SDNode<"SystemZISD::MERGE_HIGH", SDT_ZVecBinary>;
def z_merge_low : SDNode<"SystemZISD::MERGE_LOW", SDT_ZVecBinary>;		def z_merge_low : SDNode<"SystemZISD::MERGE_LOW", SDT_ZVecBinary>;
def z_shl_double : SDNode<"SystemZISD::SHL_DOUBLE", SDT_ZVecTernaryInt>;		def z_shl_double : SDNode<"SystemZISD::SHL_DOUBLE", SDT_ZVecTernaryInt>;
def z_permute_dwords : SDNode<"SystemZISD::PERMUTE_DWORDS",		def z_permute_dwords : SDNode<"SystemZISD::PERMUTE_DWORDS",
▲ Show 20 Lines • Show All 406 Lines • ▼ Show 20 Lines	def imm32bottom6set : PatLeaf<(i32 imm), [{
return (N->getZExtValue() & 0x3f) == 0x3f;		return (N->getZExtValue() & 0x3f) == 0x3f;
}]>;		}]>;
class shiftop<SDPatternOperator operator>		class shiftop<SDPatternOperator operator>
: PatFrags<(ops node:$val, node:$count),		: PatFrags<(ops node:$val, node:$count),
[(operator node:$val, node:$count),		[(operator node:$val, node:$count),
(operator node:$val, (and node:$count, imm32bottom6set))]>;		(operator node:$val, (and node:$count, imm32bottom6set))]>;

// Vector representation of all-zeros and all-ones.		// Vector representation of all-zeros and all-ones.
def z_vzero : PatFrag<(ops), (bitconvert (v16i8 (z_byte_mask (i32 0))))>;		def z_vzero : PatFrags<(ops), [(immAllZerosV),
def z_vones : PatFrag<(ops), (bitconvert (v16i8 (z_byte_mask (i32 65535))))>;		(bitconvert (v16i8 (immAllZerosV)))]>;
		def z_vones : PatFrags<(ops), [(immAllOnesV),
		(bitconvert (v16i8 (immAllOnesV)))]>;

// Load a scalar and replicate it in all elements of a vector.		// Load a scalar and replicate it in all elements of a vector.
class z_replicate_load<ValueType scalartype, SDPatternOperator load>		class z_replicate_load<ValueType scalartype, SDPatternOperator load>
: PatFrag<(ops node:$addr),		: PatFrag<(ops node:$addr),
(z_replicate (scalartype (load node:$addr)))>;		(z_replicate (scalartype (load node:$addr)))>;
def z_replicate_loadi8 : z_replicate_load<i32, anyextloadi8>;		def z_replicate_loadi8 : z_replicate_load<i32, anyextloadi8>;
def z_replicate_loadi16 : z_replicate_load<i32, anyextloadi16>;		def z_replicate_loadi16 : z_replicate_load<i32, anyextloadi16>;
def z_replicate_loadi32 : z_replicate_load<i32, load>;		def z_replicate_loadi32 : z_replicate_load<i32, load>;
▲ Show 20 Lines • Show All 108 Lines • Show Last 20 Lines

test/CodeGen/SystemZ/buildvector-00.ll

This file was added.

				; Test that the dag combiner can understand that some vector operands are
				; all-zeros and then optimize the logical operations.
				;
				; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z13 | FileCheck %s

				define void @f1() {
				; CHECK-LABEL: f1:
				; CHECK: vno
				; CHECK-NOT: vno

				bb:
				%tmp = shufflevector <2 x i64> undef, <2 x i64> undef, <2 x i32> zeroinitializer
				br label %bb1

				bb1: ; preds = %bb
				%tmp2 = load i64, i64* undef, align 8
				%tmp3 = insertelement <2 x i64> undef, i64 %tmp2, i32 1
				%tmp4 = icmp ne <2 x i64> undef, zeroinitializer
				%tmp5 = xor <2 x i1> %tmp4, zeroinitializer
				%tmp6 = xor <2 x i1> zeroinitializer, %tmp5
				%tmp7 = and <2 x i64> %tmp3, %tmp
				%tmp8 = icmp ne <2 x i64> %tmp7, zeroinitializer
				%tmp9 = xor <2 x i1> zeroinitializer, %tmp8
				%tmp10 = icmp ne <2 x i64> undef, zeroinitializer
				%tmp11 = xor <2 x i1> %tmp10, %tmp9
				%tmp12 = and <2 x i1> %tmp6, %tmp11
				%tmp13 = extractelement <2 x i1> %tmp12, i32 0
				br i1 %tmp13, label %bb14, label %bb15

				bb14: ; preds = %bb1
				store i64 undef, i64* undef, align 8
				br label %bb15

				bb15: ; preds = %bb14, %bb1
				unreachable
				}