This is an archive of the discontinued LLVM Phabricator instance.

Differential D60133

[DAGCombiner] Improve detection of unmergable stores, based on type size (WIP)
Needs Review · Public

Authored by fhahn on Apr 2 2019, 8:45 AM.

Details

Reviewers

niravd

Summary

Note: this is WIP; there is still one case where we exit early
unnecessarily.

By constructing the VT for a merged store upfront and using that to
check if merging is legal, we can exit early in some more cases.

This speeds up some variations of PR41263, where we have lots of i64
stores.
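
As a rough illustration of the idea in the summary (not the patch itself), the early exit can be modelled on a toy target described only by its set of legal store widths. `ToyTarget`, `isLegalStoreWidth` and `canPossiblyMerge` are hypothetical names introduced for this sketch; the real code queries `TargetLowering` on an `EVT` instead.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a target: the set of store widths (in bits) it
// can emit directly. Real code would ask TargetLowering about an EVT.
struct ToyTarget {
  std::set<unsigned> LegalStoreBits;

  bool isLegalStoreWidth(unsigned Bits) const {
    return LegalStoreBits.count(Bits) != 0;
  }
};

// Early-exit check in the spirit of the summary: compute the width of two
// merged stores up front and give up before the expensive candidate search
// if that width is not legal on the (toy) target.
bool canPossiblyMerge(const ToyTarget &T, unsigned ElementBits) {
  assert(ElementBits > 0 && "element width must be non-zero");
  unsigned MergedBits = ElementBits * 2;
  return T.isLegalStoreWidth(MergedBits);
}
```

On a toy target whose widest legal store is 64 bits, `canPossiblyMerge(T, 64)` is false, so a long run of 64-bit stores would be rejected before any candidates are collected.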

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 29949
Build 29948: arc lint + arc unit

Event Timeline

fhahn created this revision. Apr 2 2019, 8:45 AM

Herald added a project: Restricted Project. Apr 2 2019, 8:45 AM

Herald added a subscriber: hiraditya.

Harbormaster completed remote builds in B29949: Diff 193300. Apr 2 2019, 8:45 AM

It's not sufficient to check if you can merge two stores into a valid node; there are backends where you need 4 or more to get a legal merged store.

If you look at target-specific implementations of canMergeStoresTo, it essentially serves as a context-specific "find maximum store" query, which is what we need here. If you massage that interface a bit, you can fold most of this check in there.
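
The point about needing four or more stores can be made concrete with a small standalone sketch. It assumes a made-up target whose legal store widths are given as a plain set; `LegalWidths`, `pairIsLegal` and `largestLegalMergeCount` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <set>

// Hypothetical target description: legal store widths in bits.
using LegalWidths = std::set<unsigned>;

// Pair-only check: is the width of exactly two merged stores legal?
bool pairIsLegal(const LegalWidths &Legal, unsigned ElementBits) {
  assert(ElementBits > 0 && "element width must be non-zero");
  return Legal.count(ElementBits * 2) != 0;
}

// Returns the largest N in [2, MaxStores] for which N merged stores form a
// legal width, or 0 if no such N exists.
unsigned largestLegalMergeCount(const LegalWidths &Legal, unsigned ElementBits,
                                unsigned MaxStores) {
  assert(ElementBits > 0 && "element width must be non-zero");
  unsigned Best = 0;
  for (unsigned N = 2; N <= MaxStores; ++N)
    if (Legal.count(ElementBits * N) != 0)
      Best = N;
  return Best;
}
```

With legal widths {8, 32}, `pairIsLegal` rejects 8-bit stores because 16 bits is not legal, yet `largestLegalMergeCount` still finds that four of them merge into a legal 32-bit store. An early exit based only on the pair check would miss that case.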

In D60133#1452430, @niravd wrote:

It's not sufficient to check if you can merge two stores into a valid node; there are backends where you need 4 or more to get a legal merged store.

If you look at target-specific implementations of canMergeStoresTo, it essentially serves as a context-specific "find maximum store" query, which is what we need here. If you massage that interface a bit, you can fold most of this check in there.

Yeah, that might be a better approach. IIUC we would need (potentially separate) limits for integer types, floating point types and vector types, and we would have to account for when we can convert to vector types. Does that make sense?

niravd mentioned this in D61397: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor. May 2 2019, 8:19 AM

In D60133#1452430, @niravd wrote:

It's not sufficient to check if you can merge two stores into a valid node; there are backends where you need 4 or more to get a legal merged store.

If you look at target-specific implementations of canMergeStoresTo, it essentially serves as a context-specific "find maximum store" query, which is what we need here. If you massage that interface a bit, you can fold most of this check in there.

Sorry for the long delay in getting back to this one! I had a look at the X86 and AArch64 implementations of canMergeStoresTo, and they do not look too helpful for getting a reasonable upper bound. They return true for functions without noimplicitfloat and otherwise limit the size to 64 bits (32 bits in X86 32-bit mode). I think it is more important to check whether the merged type is legal for the target, which will weed out stores with types too large for the target.

So I think we need to do more work than just checking canMergeStoresTo unfortunately. What we would need is the maximum size for integer, float and vector stores I think, but I am not sure if any existing interface provides that info.
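
To show why the hook alone gives no useful upper bound, here is a toy model of the behavior described in this comment. It is a paraphrase for illustration, not the actual target code; `toyCanMergeStoresTo`, `toyMergedWidthIsPlausible` and the `MaxLegalStoreBits` input are assumptions of the sketch.

```cpp
#include <cassert>

// Toy model of the hook behavior described above (hypothetical function, not
// the actual target code): without noimplicitfloat every merged size is
// accepted; with it, merges are capped at 64 bits, or 32 bits in 32-bit mode.
bool toyCanMergeStoresTo(unsigned MergedBits, bool NoImplicitFloat,
                         bool Is64Bit) {
  assert(MergedBits > 0 && "merged width must be non-zero");
  if (!NoImplicitFloat)
    return true;
  unsigned MaxIntBits = Is64Bit ? 64 : 32;
  return MergedBits <= MaxIntBits;
}

// Combined check: a merged width is only worth pursuing if it passes the hook
// and is also no wider than the widest legal store (an assumed input here).
bool toyMergedWidthIsPlausible(unsigned MergedBits, bool NoImplicitFloat,
                               bool Is64Bit, unsigned MaxLegalStoreBits) {
  return toyCanMergeStoresTo(MergedBits, NoImplicitFloat, Is64Bit) &&
         MergedBits <= MaxLegalStoreBits;
}
```

In this model `toyCanMergeStoresTo(4096, false, true)` is true, so without noimplicitfloat the hook never rejects anything. Only the additional legality bound weeds out oversized merges.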

The current interfaces definitely do not provide anything like that. As I recall, most of the use cases requiring canMergeStoresTo are ad hoc ways for a specific target to locally declare a type/operation invalid, so I suggest just marginally expanding the interface to capture the one issue we've seen so far (no-implicit-float) and deferring any deeper analysis until we run into it.
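
One possible reading of "marginally expanding the interface" is a hook that reports a maximum merged width instead of a yes/no answer. The sketch below is only an assumption about what that could look like; `toyMaxMergedStoreBits`, `maxStoresWorthMerging` and `NoLimit` are hypothetical names and do not exist in LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Sentinel meaning "the target imposes no limit of its own".
constexpr unsigned NoLimit = std::numeric_limits<unsigned>::max();

// Hypothetical expanded hook: report the widest merged store the target
// allows in this context, covering only the no-implicit-float case.
unsigned toyMaxMergedStoreBits(bool NoImplicitFloat, bool Is64Bit) {
  if (!NoImplicitFloat)
    return NoLimit;
  return Is64Bit ? 64u : 32u;
}

// Upper bound on how many ElementBits-wide stores could be merged, given the
// target limit and the widest legal store (both assumed inputs).
unsigned maxStoresWorthMerging(unsigned ElementBits, unsigned TargetLimitBits,
                               unsigned MaxLegalStoreBits) {
  assert(ElementBits > 0 && "element width must be non-zero");
  unsigned LimitBits = std::min(TargetLimitBits, MaxLegalStoreBits);
  return LimitBits / ElementBits;
}
```

A caller could bail out whenever the result is below 2. For 64-bit elements under noimplicitfloat in 64-bit mode the bound is 1, so no merge is possible.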

niravd mentioned this in D65174: [DAGCombine] Limit the number of times for a store being considered for merging. Jul 29 2019, 8:31 AM

Revision Contents

Path: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Size: 61 lines

Diff 193300

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[14,863 lines not shown; hunk context: if (SDNode::hasPredecessorHelper(StoreNodes[i].MemNode, Visited, Worklist,]
       return false;
   return true;
 }

 bool DAGCombiner::MergeConsecutiveStores(StoreSDNode *St) {
   if (OptLevel == CodeGenOpt::None)
     return false;

+  // Perform an early exit check. Do not bother looking at stored values that
+  // are not constants, loads, or extracted vector elements.
+  SDValue StoredVal = peekThroughBitcasts(St->getValue());
+  bool IsLoadSrc = isa<LoadSDNode>(StoredVal);
+  bool IsConstantSrc = isa<ConstantSDNode>(StoredVal) ||
+                       isa<ConstantFPSDNode>(StoredVal);
+  bool IsExtractVecSrc = (StoredVal.getOpcode() == ISD::EXTRACT_VECTOR_ELT ||
+                          StoredVal.getOpcode() == ISD::EXTRACT_SUBVECTOR);
+
+  if (!IsConstantSrc && !IsLoadSrc && !IsExtractVecSrc)
+    return false;
+
+  LLVMContext &Context = *DAG.getContext();
   EVT MemVT = St->getMemoryVT();
-  int64_t ElementSizeBytes = MemVT.getStoreSize();
   unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1;
+  auto IsZero = [](const SDValue &Val) {
+    if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Val))
+      return C->isNullValue();
+    else if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(Val))
+      return C->getConstantFPValue()->isNullValue();
+    return false;
+  };

-  if (MemVT.getSizeInBits() * 2 > MaximumLegalStoreInBits)
+  // Construct VT for two merged stores and check if it is legal.
+  unsigned Size = MemVT.getSizeInBits() * 2;
+  auto GetMergedEVT = [&](EVT &Base) {
+    if (Base.isVector() ||
+        IsZero(St->getValue()) || (IsConstantSrc && TLI.storeOfVectorConstantIsCheap(Base, 2, St->getAddressSpace())))
+      return EVT::getVectorVT(Context, Base.getScalarType(), NumMemElts * 2);
+    return EVT::getIntegerVT(Context, Size);
+  };
+  EVT MergedTy = GetMergedEVT(MemVT);
+  if (!TLI.isTypeLegal(MergedTy)) {
+    if (TLI.getTypeAction(Context, MergedTy) == TargetLowering::TypePromoteInteger)
+      MergedTy = TLI.getTypeToTransformTo(Context, StoredVal.getValueType());
+    else
+      return false;
+  }
+  if (!TLI.canMergeStoresTo(St->getAddressSpace(), MergedTy, DAG))
     return false;

   bool NoVectors = DAG.getMachineFunction().getFunction().hasFnAttribute(
       Attribute::NoImplicitFloat);

+  int64_t ElementSizeBytes = MemVT.getStoreSize();
   // This function cannot currently deal with non-byte-sized memory sizes.
   if (ElementSizeBytes * 8 != MemVT.getSizeInBits())
     return false;

   if (!MemVT.isSimple())
     return false;

-  // Perform an early exit check. Do not bother looking at stored values that
-  // are not constants, loads, or extracted vector elements.
-  SDValue StoredVal = peekThroughBitcasts(St->getValue());
-  bool IsLoadSrc = isa<LoadSDNode>(StoredVal);
-  bool IsConstantSrc = isa<ConstantSDNode>(StoredVal) ||
-                       isa<ConstantFPSDNode>(StoredVal);
-  bool IsExtractVecSrc = (StoredVal.getOpcode() == ISD::EXTRACT_VECTOR_ELT ||
-                          StoredVal.getOpcode() == ISD::EXTRACT_SUBVECTOR);
-
-  if (!IsConstantSrc && !IsLoadSrc && !IsExtractVecSrc)
-    return false;
-
   SmallVector<MemOpLink, 8> StoreNodes;
   SDNode *RootNode;
   // Find potential store merge candidates by searching through chain sub-DAG
   getStoreMergeCandidates(St, StoreNodes, RootNode);

   // Check if there is anything to merge.
   if (StoreNodes.size() < 2)
     return false;
[42 lines not shown; hunk context: while (StoreNodes.size() > 1) {]

     if (NumConsecutiveStores < 2) {
       StoreNodes.erase(StoreNodes.begin(),
                        StoreNodes.begin() + NumConsecutiveStores);
       continue;
     }

     // The node with the lowest store address.
-    LLVMContext &Context = *DAG.getContext();
     const DataLayout &DL = DAG.getDataLayout();

     // Store the constants into memory as one consecutive store.
     if (IsConstantSrc) {
       while (NumConsecutiveStores >= 2) {
         LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;
         unsigned FirstStoreAS = FirstInChain->getAddressSpace();
         unsigned FirstStoreAlign = FirstInChain->getAlignment();
         unsigned LastLegalType = 1;
         unsigned LastLegalVectorType = 1;
         bool LastIntegerTrunc = false;
         bool NonZero = false;
         unsigned FirstZeroAfterNonZero = NumConsecutiveStores;
         for (unsigned i = 0; i < NumConsecutiveStores; ++i) {
           StoreSDNode *ST = cast<StoreSDNode>(StoreNodes[i].MemNode);
           SDValue StoredVal = ST->getValue();
-          bool IsElementZero = false;
-          if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(StoredVal))
-            IsElementZero = C->isNullValue();
-          else if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(StoredVal))
-            IsElementZero = C->getConstantFPValue()->isNullValue();
+          bool IsElementZero = IsZero(StoredVal);
           if (IsElementZero) {
             if (NonZero && FirstZeroAfterNonZero == NumConsecutiveStores)
               FirstZeroAfterNonZero = i;
           }

           NonZero |= !IsElementZero;

           // Find a legal type for the constant store.
           unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8;
           EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits);
           bool IsFast = false;

           // Break early when size is too large to be legal.
           if (StoreTy.getSizeInBits() > MaximumLegalStoreInBits)
[4,948 lines not shown]