This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Split very large token factors for loads into 64k chunks
Closed, Public

Authored by aemerson on Nov 29 2018, 1:07 PM.

Details

Reviewers

craig.topper
spatel
eli.friedman
RKSimon
efriedma

Commits

rG814a6794ba78: [SelectionDAG] Split very large token factors for loads into 64k chunks.
rL348324: [SelectionDAG] Split very large token factors for loads into 64k chunks.

Summary

There's a 64k limit on the number of SDNode operands, and very large functions with 64k or more loads can crash when a TokenFactor with that many operands is created. To fix this, create sub-TokenFactors if we've exceeded the limit. There is no test case, as it requires a very large function; however, the test is just this:

define void @foo() {
  %r1 = load i8, i8* undef
  %r2 = load i8, i8* undef
  %r3 = load i8, i8* undef
  ... etc etc 2^16 times
  call void @llvm.trap()
  unreachable
}
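
Since the reproducer is too large to write by hand, it can be generated. The following is a minimal sketch, not part of the patch: `makeManyLoadsIR` is a hypothetical helper name, and the `declare` line is an assumption added so that the call to `@llvm.trap` resolves. The typed-pointer syntax (`i8*`) is copied from the summary above; newer LLVM releases may expect a different pointer spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not from the patch): builds a function containing
// NumLoads consecutive loads from undef, followed by a trap.
std::string makeManyLoadsIR(std::size_t NumLoads) {
  std::ostringstream OS;
  // Assumption: the intrinsic must be declared for the module to parse.
  OS << "declare void @llvm.trap()\n\n";
  OS << "define void @foo() {\n";
  for (std::size_t I = 1; I <= NumLoads; ++I)
    OS << "  %r" << I << " = load i8, i8* undef\n";
  OS << "  call void @llvm.trap()\n";
  OS << "  unreachable\n";
  OS << "}\n";
  return OS.str();
}
```

Writing the result of `makeManyLoadsIR(1 << 16)` to a `.ll` file and compiling it should produce a function with the 2^16 loads described in the summary.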

rdar://45196621

Diff Detail

Repository: rL LLVM

Event Timeline

aemerson created this revision. Nov 29 2018, 1:07 PM

efriedma added a subscriber: efriedma. Nov 29 2018, 2:40 PM

efriedma added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1052	Instead of making a tree of TokenFactors, could you make a list? It seems a little simpler (less code, and you don't have to worry about the length of TokenFactors itself). I'm a little worried that other code dealing with TokenFactors might end up violating the limit if we're very close... any idea if there's other code that could be affected, like DAGCombine? Do we have an assertion somewhere that will reliably catch this issue?

aemerson marked an inline comment as done. Nov 29 2018, 2:52 PM

aemerson added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1052	Could you clarify what you mean by list?

efriedma added inline comments. Nov 29 2018, 3:01 PM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1052	I mean, each TokenFactor should only have one TokenFactor operand. So the algorithm would be something like this: remove "Limit" loads from end of PendingLoads, make a TokenFactor, and append the resulting TokenFactor to PendingLoads. Repeat as necessary.
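
The list-shaped algorithm described in this comment can be sketched outside of LLVM. This is a toy model under stated assumptions, not the committed code: `ToyNode`, `ToyDAG`, and `buildRootAsList` are hypothetical names, nodes are plain indices, and the sketch always wraps the remaining entries in a final TokenFactor (the real `getRoot()` returns a single pending load directly).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for an SDNode: only records its operands.
struct ToyNode {
  bool IsTokenFactor;
  std::vector<int> Operands; // indices into ToyDAG::Nodes
};

struct ToyDAG {
  std::vector<ToyNode> Nodes;
  int addLoad() {
    Nodes.push_back({false, {}});
    return static_cast<int>(Nodes.size()) - 1;
  }
  int addTokenFactor(std::vector<int> Ops) {
    Nodes.push_back({true, std::move(Ops)});
    return static_cast<int>(Nodes.size()) - 1;
  }
};

// Repeatedly remove Limit entries from the end of Pending, wrap them in a
// TokenFactor, and append that TokenFactor back to Pending.
int buildRootAsList(ToyDAG &DAG, std::vector<int> Pending, std::size_t Limit) {
  assert(Limit >= 2 && !Pending.empty());
  while (Pending.size() > Limit) {
    auto First = Pending.end() - static_cast<std::ptrdiff_t>(Limit);
    std::vector<int> Tail(First, Pending.end());
    Pending.resize(Pending.size() - Limit);
    Pending.push_back(DAG.addTokenFactor(std::move(Tail)));
  }
  return DAG.addTokenFactor(std::move(Pending));
}
```

Because the newly created TokenFactor is always the last entry of `Pending`, each later TokenFactor picks up at most one TokenFactor operand, which is the "list" shape asked for here.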

RKSimon added reviewers: RKSimon, efriedma. Nov 29 2018, 4:02 PM

Does this fix PR7250 and PR37000? Those test cases look usable.

In D55073#1313703, @RKSimon wrote:

Does this fix PR7250 and PR37000? Those test cases look usable.

PR7250 asserts in a different place, for the number of result values of a node. This issue is very similar, but instead exceeds the number of operands.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
1052	Ok, I could do that. As for the other places, since SDAG will CSE nodes we should be able to catch this case in getNode() somewhere and assert.

Changed the approach to replace values in PendingLoads in place. Also added an assert in SelectionDAG::createOperands() to check if we can store the requested number of operands.

LGTM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9027 ↗	(On Diff #176189)	Maybe this would be more readable using std::numeric_limits? Your choice.
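
As an illustration of the `std::numeric_limits` suggestion, here is a minimal sketch of an operand-count check. It is not the code from the patch: `CountedNode`, `fitsOperandCount`, and `setOperandCount` are hypothetical names, and the 16-bit counter is an assumption chosen to match the 64k limit discussed in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Toy node with a 16-bit operand counter (assumed width).
struct CountedNode {
  std::uint16_t NumOperands = 0;
};

// True if Count can be stored in CountedNode::NumOperands. Deriving the
// bound from the field's type avoids a hard-coded (1 << 16) - 1.
constexpr bool fitsOperandCount(std::size_t Count) {
  return Count <=
         std::numeric_limits<decltype(CountedNode::NumOperands)>::max();
}

void setOperandCount(CountedNode &N, std::size_t Count) {
  assert(fitsOperandCount(Count) && "too many operands to fit in NumOperands");
  N.NumOperands = static_cast<std::uint16_t>(Count);
}
```

The comparison uses `<=` because `max()` itself is a storable value; the follow-up revision D56738 mentioned later on this page concerns that boundary.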

This revision is now accepted and ready to land. Dec 4 2018, 3:54 PM

aemerson marked an inline comment as done. Dec 4 2018, 4:33 PM

aemerson added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9027 ↗	(On Diff #176189)	Sure.

Closed by commit rL348324: [SelectionDAG] Split very large token factors for loads into 64k chunks. (authored by aemerson). Dec 4 2018, 4:46 PM

This revision was automatically updated to reflect the committed changes.

fhahn mentioned this in D56738: [SelectionDAG] Update check in createOperands to reflect max() is a valid value. Jan 15 2019, 11:58 AM

fhahn mentioned this in D56740: [SelectionDAG] Split very large token factors for chained stores to 64k chunks. Jan 15 2019, 12:02 PM

fhahn mentioned this in rL351318: [SelectionDAG] Update check in createOperands to reflect max() is a valid value. Jan 16 2019, 2:10 AM

fhahn mentioned this in rL351571: [SelectionDAG] Split very large token factors for chained stores to 64k chunks. Jan 18 2019, 10:43 AM

Revision Contents

Path: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Size: 21 lines

Diff 175944

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

[First 1,026 lines not shown. Enclosing function: SDValue SelectionDAGBuilder::getRoot()]

   if (PendingLoads.size() == 1) {
     SDValue Root = PendingLoads[0];
     DAG.setRoot(Root);
     PendingLoads.clear();
     return Root;
   }

   // Otherwise, we have to make a token factor node.
-  SDValue Root = DAG.getNode(ISD::TokenFactor, getCurSDLoc(), MVT::Other,
-                             PendingLoads);
+  // If we have >= 2^16 loads then split across multiple token factors as
+  // there's a 64k limit on the number of SDNode operands.
+  SDValue Root;
+  size_t Limit = (1 << 16) - 1;
+  if (PendingLoads.size() <= Limit) {
+    Root =
+        DAG.getNode(ISD::TokenFactor, getCurSDLoc(), MVT::Other, PendingLoads);
+  } else {
+    SmallVector<SDValue, 4> TokenFactors;
+    for (unsigned i = 0; i < PendingLoads.size(); i += Limit) {
+      auto LoadSlice = ArrayRef<SDValue>(PendingLoads)
+                           .slice(i, std::min(Limit, PendingLoads.size() - i));
+      assert(LoadSlice.size() <= Limit && "Too many loads in slice");
+      TokenFactors.emplace_back(
+          DAG.getNode(ISD::TokenFactor, getCurSDLoc(), MVT::Other, LoadSlice));
+    }
+    Root =
+        DAG.getNode(ISD::TokenFactor, getCurSDLoc(), MVT::Other, TokenFactors);
+  }
   PendingLoads.clear();
   DAG.setRoot(Root);
   return Root;
 }

 SDValue SelectionDAGBuilder::getControlRoot() {
   SDValue Root = DAG.getRoot();

[Inline comments on line 1052 (efriedma, aemerson) are listed in the Event Timeline above.]

[Remaining 9,387 lines not shown.]
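
The slicing arithmetic in this first version of the diff can be checked in isolation. The sketch below mirrors only the loop bounds and the `std::min` clamp; `chunkRanges` is a hypothetical helper, and it returns (offset, length) pairs instead of building nodes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (offset, length) pairs that cover [0, NumLoads) in chunks of at
// most Limit entries, using the same stepping as the loop in the diff.
std::vector<std::pair<std::size_t, std::size_t>>
chunkRanges(std::size_t NumLoads, std::size_t Limit) {
  assert(Limit > 0 && "a zero limit would never advance");
  std::vector<std::pair<std::size_t, std::size_t>> Ranges;
  for (std::size_t I = 0; I < NumLoads; I += Limit)
    Ranges.emplace_back(I, std::min(Limit, NumLoads - I));
  return Ranges;
}
```

With a limit of 65535, a function with 65536 loads yields two ranges: one full chunk and one chunk holding the single remaining load.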