This is an archive of the discontinued LLVM Phabricator instance.


Differential D117104

[DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI
Closed, Public

Authored by bjope on Jan 12 2022, 5:24 AM.


Details

Reviewers

spatel
samparker

Commits

rG9f237c9e7d88: [DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI

Summary

Update code comments in DAGCombiner::ReduceLoadWidth and refactor
the handling of SRL a bit. The refactoring is done with the intent
of adding support for folding away SRA by using SEXTLOAD in a
follow-up patch.
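As an illustration of the kind of fold this function performs for SRL, here is a standalone C++ sketch that models memory as a byte array. It assumes a little-endian target and a byte-aligned shift amount. `loadLE` and `srlNarrowWidth` are hypothetical names for this illustration, not LLVM API. `srlNarrowWidth` mirrors the refactored `MemoryWidth - ShAmt` computation of ExtVT, including the bail-out when the shift discards every loaded bit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper (not LLVM API): read `Bytes` bytes starting at
// `Mem + Off` as a little-endian unsigned integer, i.e. what a zero-extending
// load of that width yields on a little-endian target.
uint64_t loadLE(const uint8_t *Mem, unsigned Off, unsigned Bytes) {
  uint64_t V = 0;
  for (unsigned I = 0; I < Bytes; ++I)
    V |= uint64_t(Mem[Off + I]) << (8 * I);
  return V;
}

// Hypothetical helper: width in bits of the narrower ZEXTLOAD that could
// replace (srl (load iMemBits), ShAmt). Mirrors "MemoryWidth - ShAmt" and
// returns nullopt when the shift discards every loaded bit.
std::optional<unsigned> srlNarrowWidth(unsigned MemBits, unsigned ShAmt) {
  if (MemBits <= ShAmt)
    return std::nullopt;
  return MemBits - ShAmt;
}
```

For the bytes `{0x11, 0x22, 0x33, 0x44}`, `loadLE(Mem, 0, 4) >> 16` and the two-byte load `loadLE(Mem, 2, 2)` both give `0x4433`. That equality is why `(srl (load i32 p), 16)` can become a 16-bit ZEXTLOAD from `p + 2` on such a target.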

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bjope created this revision. Jan 12 2022, 5:24 AM

Herald added subscribers: ecnelises, steven.zhang, hiraditya. Jan 12 2022, 5:24 AM

bjope requested review of this revision. Jan 12 2022, 5:24 AM

Herald added a project: Restricted Project. Jan 12 2022, 5:24 AM

bjope mentioned this in D116930: [DAGCombine] Fold SRA of a load into a narrower sign-extending load. Jan 12 2022, 5:27 AM

bjope added a child revision: D116930: [DAGCombine] Fold SRA of a load into a narrower sign-extending load. Jan 12 2022, 5:28 AM

This is an attempt to refactor the code a bit, making the diff in the D116930 child revision smaller.

There is still some code duplication, since SRL is handled in two places. I mainly focused on making it easier to introduce SRA support, while still refactoring the second part of the SRL handling, mostly to document that part a bit better. Maybe this is too aggressive a refactoring?

Harbormaster completed remote builds in B142893: Diff 399293. Jan 12 2022, 5:50 AM

There is a complicated set of conditions to handle the various patterns/inputs, and I don't know if I have accounted for all of the combinations.
I wonder if we'd be better off breaking it up, but if you feel confident that it's correct, let's keep it. :)

See inline for some minor changes.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12097–12098	This sounds like it was originally written with a little-endian-only implementation. How about generalizing to: /// If the result of a load is shifted/masked/truncated to an effectively /// narrower type, try to transform the load to a narrower type and/or /// use an extending load. And fix the capitalization? "DAGCombiner::reduceLoadWidth()"
12117	is masked -> are masked
12149	that it -> that is
12178	needs to be -> need to be
12219	This assumes that ExtVTBits is a power-of-2, but is that enforced/asserted?
12267–12268	This seems backwards. It's the little-endian target that needs to adjust the pointer. We're chopping off the LSB, so this is always converting ShAmt back to zero for big-endian? fe17ce0fa6626f79be66

bjope added inline comments. Jan 13 2022, 10:52 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12219	I've not really understood why these checks exist. If there is no SRL at all we won't even take this path, so the code below (mainly `isLegalNarrowLdSt`, I figure) needs to ensure legality anyway for the more general situation. Maybe this is some kind of early out (possibly saving a tiny amount of compile time), or it is protecting the AND masking below somehow (but I can't really see that the AND masking depends on these properties). Nevertheless, just as you spotted, these checks don't even make sense when ExtVTBits isn't a power of two. As it happens, we do get here for multiple lit tests with ExtVTBits not being a power of two. If I simply remove the checks, then I get diffs in several lit tests. I examined one such test, and it turned out that we ended up with a slightly different order of transforms (resulting in `(and (load i32), 7)` instead of `(and (sextload i8 to i32), 7)`; I'm not really sure which would be better in that particular case). If the sextload is preferred, then I guess another DAGCombine should be added to do such a transform(?). If I instead bail out here when ExtVTBits isn't a power of two, then I get diffs in 3 lit tests. Those diffs looked like regressions, as the instruction count increased in all three cases. Skipping the checks seems like the better of the two solutions above, but I'd rather fix that in a separate patch. There might also be some alternatives to explore: the AND mask hack below might detect that ExtVT can be reduced further into something that is a power of two, so bailing out on ExtVTBits not being a power of two before the AND mask check was perhaps too restrictive.
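To make the power-of-two concern concrete: the checks use `(X & (ExtVTBits - 1)) == 0` as a shortcut for "X is a multiple of ExtVTBits", and that is only equivalent to `X % ExtVTBits == 0` when ExtVTBits is a power of two. The plain C++ sketch below (hypothetical helper names, not LLVM code) compares the two tests for a 16-bit and a 24-bit width.

```cpp
#include <cassert>

// The bitmask shortcut used by the checks under discussion. It is only a
// valid "X is a multiple of Bits" test when Bits is a power of two.
bool isMultipleViaMask(unsigned X, unsigned Bits) {
  return (X & (Bits - 1)) == 0;
}

// Reference test that works for any non-zero Bits.
bool isMultipleViaModulo(unsigned X, unsigned Bits) {
  return X % Bits == 0;
}
```

With `Bits = 16` the two tests agree for every `X`. With `Bits = 24` (so the mask is `0b10111`) they disagree in both directions: `X = 8` passes the mask test although 8 is not a multiple of 24, and `X = 24` fails it although it is.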

spatel added inline comments. Jan 13 2022, 12:07 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12219	Then this is even more confusing than I thought! It's fine if you want to leave any more changes to other patches. You've probably stepped through this more than anyone else by now. :)

Adding Sam as a potential reviewer - code history shows that the last significant changes were:
D39595
D40034
D50432

spatel added inline comments. Jan 13 2022, 12:15 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12267–12268	To be clear, I think the code is correct. I just meant that the comment seems inverted for endian.

Updates based on review feedback.

bjope marked 4 inline comments as done. Jan 15 2022, 2:17 PM

bjope added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12267–12268	I relaxed the description a bit. We could end up adjusting the pointer for both big- and little-endian targets here, for example when loading only a single byte from the middle of an i64.
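The point that the pointer may need adjusting on both big- and little-endian targets can be checked with a little arithmetic. The sketch below mirrors the refactored computation using plain integers: `PtrAdjustmentInBits` is `ShAmt` on little-endian and `LVTStoreBits - EVTStoreBits - ShAmt` (via `AdjustBigEndianShift`) on big-endian, and the byte offset is that value divided by 8. `narrowLoadByteOffset` and `loadBytes` are hypothetical helpers, not LLVM API. The example loads a single byte (bits 16..23) out of an i64.

```cpp
#include <cassert>
#include <cstdint>

// Plain-integer model of the pointer adjustment: MemStoreBits is the store
// size of the original load, ExtStoreBits that of the narrowed load, ShAmt
// the number of low bits being skipped. Assumes
// ExtStoreBits + ShAmt <= MemStoreBits. Returns the byte offset to add.
uint64_t narrowLoadByteOffset(unsigned MemStoreBits, unsigned ExtStoreBits,
                              unsigned ShAmt, bool BigEndian) {
  unsigned PtrAdjustmentInBits =
      BigEndian ? MemStoreBits - ExtStoreBits - ShAmt : ShAmt;
  return PtrAdjustmentInBits / 8;
}

// Read `Bytes` bytes at `Mem + Off` as an unsigned integer in the given
// byte order.
uint64_t loadBytes(const uint8_t *Mem, unsigned Off, unsigned Bytes,
                   bool BigEndian) {
  uint64_t V = 0;
  for (unsigned I = 0; I < Bytes; ++I) {
    unsigned Shift = BigEndian ? 8 * (Bytes - 1 - I) : 8 * I;
    V |= uint64_t(Mem[Off + I]) << Shift;
  }
  return V;
}
```

For this i64 example the byte offset is 2 on a little-endian target and (64 - 8 - 16) / 8 = 5 on a big-endian one. In both cases the one-byte load returns the same value as `(Wide >> 16) & 0xff`.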

bjope mentioned this in D117406: [DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth. Jan 15 2022, 2:20 PM

bjope added a child revision: D117406: [DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth. Jan 15 2022, 2:20 PM

bjope added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12219	I made a follow-up patch related to this here D117406.

bjope removed a child revision: D116930: [DAGCombine] Fold SRA of a load into a narrower sign-extending load. Jan 15 2022, 2:28 PM

Harbormaster completed remote builds in B143618: Diff 400318. Jan 15 2022, 3:04 PM

LGTM

This revision is now accepted and ready to land. Jan 16 2022, 8:10 AM

This revision was landed with ongoing or failed builds. Jan 16 2022, 11:25 AM

Closed by commit rG9f237c9e7d88: [DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI (authored by bjope).

This revision was automatically updated to reflect the committed changes.

bjope added a commit: rG9f237c9e7d88: [DAGCombine] Refactor DAGCombiner::ReduceLoadWidth. NFCI.

bjope mentioned this in rG46cacdbb21c2: [DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth. Jan 24 2022, 3:24 AM

Revision Contents

Path | Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 185 lines

Diff 400397

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


[587 lines not shown]	SDValue MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,
const SDLoc &DL);		const SDLoc &DL);
SDValue MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos, SDValue Neg,		SDValue MatchFunnelPosNeg(SDValue N0, SDValue N1, SDValue Pos, SDValue Neg,
SDValue InnerPos, SDValue InnerNeg,		SDValue InnerPos, SDValue InnerNeg,
unsigned PosOpcode, unsigned NegOpcode,		unsigned PosOpcode, unsigned NegOpcode,
const SDLoc &DL);		const SDLoc &DL);
SDValue MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);		SDValue MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);
SDValue MatchLoadCombine(SDNode *N);		SDValue MatchLoadCombine(SDNode *N);
SDValue mergeTruncStores(StoreSDNode *N);		SDValue mergeTruncStores(StoreSDNode *N);
SDValue ReduceLoadWidth(SDNode *N);		SDValue reduceLoadWidth(SDNode *N);
SDValue ReduceLoadOpStoreWidth(SDNode *N);		SDValue ReduceLoadOpStoreWidth(SDNode *N);
SDValue splitMergedValStore(StoreSDNode *ST);		SDValue splitMergedValStore(StoreSDNode *ST);
SDValue TransformFPLoadStorePair(SDNode *N);		SDValue TransformFPLoadStorePair(SDNode *N);
SDValue convertBuildVecZextToZext(SDNode *N);		SDValue convertBuildVecZextToZext(SDNode *N);
SDValue reduceBuildVecExtToExtBuildVec(SDNode *N);		SDValue reduceBuildVecExtToExtBuildVec(SDNode *N);
SDValue reduceBuildVecTruncToBitCast(SDNode *N);		SDValue reduceBuildVecTruncToBitCast(SDNode *N);
SDValue reduceBuildVecToShuffle(SDNode *N);		SDValue reduceBuildVecToShuffle(SDNode *N);
SDValue createBuildVecShuffle(const SDLoc &DL, SDNode *N,		SDValue createBuildVecShuffle(const SDLoc &DL, SDNode *N,
[5,014 lines not shown]	if (SearchForAndLoads(N, Loads, NodesWithConsts, Mask, FixupNode)) {
for (auto *Load : Loads) {		for (auto *Load : Loads) {
LLVM_DEBUG(dbgs() << "Propagate AND back to: "; Load->dump());		LLVM_DEBUG(dbgs() << "Propagate AND back to: "; Load->dump());
SDValue And = DAG.getNode(ISD::AND, SDLoc(Load), Load->getValueType(0),		SDValue And = DAG.getNode(ISD::AND, SDLoc(Load), Load->getValueType(0),
SDValue(Load, 0), MaskOp);		SDValue(Load, 0), MaskOp);
DAG.ReplaceAllUsesOfValueWith(SDValue(Load, 0), And);		DAG.ReplaceAllUsesOfValueWith(SDValue(Load, 0), And);
if (And.getOpcode() == ISD ::AND)		if (And.getOpcode() == ISD ::AND)
And = SDValue(		And = SDValue(
DAG.UpdateNodeOperands(And.getNode(), SDValue(Load, 0), MaskOp), 0);		DAG.UpdateNodeOperands(And.getNode(), SDValue(Load, 0), MaskOp), 0);
SDValue NewLoad = ReduceLoadWidth(And.getNode());		SDValue NewLoad = reduceLoadWidth(And.getNode());
assert(NewLoad &&		assert(NewLoad &&
"Shouldn't be masking the load if it can't be narrowed");		"Shouldn't be masking the load if it can't be narrowed");
CombineTo(Load, NewLoad, NewLoad.getValue(1));		CombineTo(Load, NewLoad, NewLoad.getValue(1));
}		}
DAG.ReplaceAllUsesWith(N, N->getOperand(0).getNode());		DAG.ReplaceAllUsesWith(N, N->getOperand(0).getNode());
return true;		return true;
}		}
return false;		return false;
[383 lines not shown]	SDValue DAGCombiner::visitAND(SDNode *N) {
}		}

// fold (and (load x), 255) -> (zextload x, i8)		// fold (and (load x), 255) -> (zextload x, i8)
// fold (and (extload x, i16), 255) -> (zextload x, i8)		// fold (and (extload x, i16), 255) -> (zextload x, i8)
// fold (and (any_ext (extload x, i16)), 255) -> (zextload x, i8)		// fold (and (any_ext (extload x, i16)), 255) -> (zextload x, i8)
if (!VT.isVector() && N1C && (N0.getOpcode() == ISD::LOAD ||		if (!VT.isVector() && N1C && (N0.getOpcode() == ISD::LOAD ||
(N0.getOpcode() == ISD::ANY_EXTEND &&		(N0.getOpcode() == ISD::ANY_EXTEND &&
N0.getOperand(0).getOpcode() == ISD::LOAD))) {		N0.getOperand(0).getOpcode() == ISD::LOAD))) {
if (SDValue Res = ReduceLoadWidth(N)) {		if (SDValue Res = reduceLoadWidth(N)) {
LoadSDNode *LN0 = N0->getOpcode() == ISD::ANY_EXTEND		LoadSDNode *LN0 = N0->getOpcode() == ISD::ANY_EXTEND
? cast<LoadSDNode>(N0.getOperand(0)) : cast<LoadSDNode>(N0);		? cast<LoadSDNode>(N0.getOperand(0)) : cast<LoadSDNode>(N0);
AddToWorklist(N);		AddToWorklist(N);
DAG.ReplaceAllUsesOfValueWith(SDValue(LN0, 0), Res);		DAG.ReplaceAllUsesOfValueWith(SDValue(LN0, 0), Res);
return SDValue(N, 0);		return SDValue(N, 0);
}		}
}		}
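The `(and (load x), 255)` fold above depends on how the `Opc == ISD::AND` branch of `reduceLoadWidth` (later in this diff) classifies the mask. A mask accepted by `APInt::isMask()` (a run of low ones) gives `ActiveBits` directly. A mask accepted by `APInt::isShiftedMask()` (a run of ones starting above bit 0) also sets `ShAmt` to the number of trailing zeros and sets `HasShiftedOffset`, so the narrowed load is shifted left again afterwards. The sketch below reproduces that classification on a plain `uint64_t`; `MaskInfo` and `classifyAndMask` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical result type mirroring the ShAmt / ActiveBits /
// HasShiftedOffset values computed for an AND mask.
struct MaskInfo {
  unsigned ShAmt;
  unsigned ActiveBits;
  bool HasShiftedOffset;
};

// Accepts a single contiguous run of ones (isMask or isShiftedMask style);
// any other mask, including 0, is rejected.
std::optional<MaskInfo> classifyAndMask(uint64_t Mask) {
  if (Mask == 0)
    return std::nullopt;
  unsigned TZ = 0;
  while (((Mask >> TZ) & 1) == 0)
    ++TZ;
  uint64_t Shifted = Mask >> TZ;
  // Shifted is a low-bit mask iff Shifted + 1 is a power of two (or wraps).
  if ((Shifted & (Shifted + 1)) != 0)
    return std::nullopt;
  unsigned Ones = 0;
  while (Ones < 64 && ((Shifted >> Ones) & 1))
    ++Ones;
  return MaskInfo{TZ, Ones, TZ != 0};
}
```

`classifyAndMask(0xFF)` corresponds to the `zextload x, i8` case in the comment above, `0xFF00` yields an 8-bit width with `ShAmt = 8`, and a non-contiguous mask such as `0xF0F` is rejected, matching the `return SDValue()` branch.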

[3,099 lines not shown]	SDValue DAGCombiner::visitSRL(SDNode *N) {
if (SimplifyDemandedBits(SDValue(N, 0)))		if (SimplifyDemandedBits(SDValue(N, 0)))
return SDValue(N, 0);		return SDValue(N, 0);

if (N1C && !N1C->isOpaque())		if (N1C && !N1C->isOpaque())
if (SDValue NewSRL = visitShiftByConstant(N))		if (SDValue NewSRL = visitShiftByConstant(N))
return NewSRL;		return NewSRL;

// Attempt to convert a srl of a load into a narrower zero-extending load.		// Attempt to convert a srl of a load into a narrower zero-extending load.
if (SDValue NarrowLoad = ReduceLoadWidth(N))		if (SDValue NarrowLoad = reduceLoadWidth(N))
return NarrowLoad;		return NarrowLoad;

// Here is a common situation. We want to optimize:		// Here is a common situation. We want to optimize:
//		//
// %a = ...		// %a = ...
// %b = and i32 %a, 2		// %b = and i32 %a, 2
// %c = srl i32 %b, 1		// %c = srl i32 %b, 1
// brcond i32 %c ...		// brcond i32 %c ...
[2,200 lines not shown]	SDValue DAGCombiner::visitSIGN_EXTEND(SDNode *N) {
// fold (sext (sext x)) -> (sext x)		// fold (sext (sext x)) -> (sext x)
// fold (sext (aext x)) -> (sext x)		// fold (sext (aext x)) -> (sext x)
if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)		if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
return DAG.getNode(ISD::SIGN_EXTEND, DL, VT, N0.getOperand(0));		return DAG.getNode(ISD::SIGN_EXTEND, DL, VT, N0.getOperand(0));

if (N0.getOpcode() == ISD::TRUNCATE) {		if (N0.getOpcode() == ISD::TRUNCATE) {
// fold (sext (truncate (load x))) -> (sext (smaller load x))		// fold (sext (truncate (load x))) -> (sext (smaller load x))
// fold (sext (truncate (srl (load x), c))) -> (sext (smaller load (x+c/n)))		// fold (sext (truncate (srl (load x), c))) -> (sext (smaller load (x+c/n)))
if (SDValue NarrowLoad = ReduceLoadWidth(N0.getNode())) {		if (SDValue NarrowLoad = reduceLoadWidth(N0.getNode())) {
SDNode *oye = N0.getOperand(0).getNode();		SDNode *oye = N0.getOperand(0).getNode();
if (NarrowLoad.getNode() != N0.getNode()) {		if (NarrowLoad.getNode() != N0.getNode()) {
CombineTo(N0.getNode(), NarrowLoad);		CombineTo(N0.getNode(), NarrowLoad);
// CombineTo deleted the truncate, if needed, but not what's under it.		// CombineTo deleted the truncate, if needed, but not what's under it.
AddToWorklist(oye);		AddToWorklist(oye);
}		}
return SDValue(N, 0); // Return N so it doesn't get rechecked!		return SDValue(N, 0); // Return N so it doesn't get rechecked!
}		}
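The `(sext (truncate (srl (load x), c)))` pattern in the hunk above can be checked at the byte level. Reading `x+c/n` as "the base address plus c/8 bytes" on a little-endian target (an interpretation; the code comment does not define `n`), the sketch below confirms that sign-extending the truncated high half of an i32 load equals a 16-bit sign-extending load from offset 2. That is also the value an arithmetic right shift of the i32 by 16 would produce, which is the kind of SRA fold the child revision D116930 targets. All helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical little-endian load helper (not LLVM API).
uint32_t loadI32LE(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

// Sign-extend a 16-bit value to 32 bits without relying on
// implementation-defined narrowing conversions.
int32_t sext16(uint16_t V) {
  return (V & 0x8000) ? int32_t(V) - 0x10000 : int32_t(V);
}

// (sext (truncate (srl (load i32 P), 16) to i16) to i32), the long way.
int32_t sextTruncSrl(const uint8_t *P) {
  return sext16(uint16_t(loadI32LE(P) >> 16));
}

// The narrowed form: a 16-bit sign-extending load from P.
int32_t sextLoadI16LE(const uint8_t *P) {
  return sext16(uint16_t(P[0] | P[1] << 8));
}
```

For the bytes `{0x34, 0x12, 0xFE, 0xFF}`, both `sextTruncSrl(Mem)` and `sextLoadI16LE(Mem + 2)` give -2.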
[247 lines not shown]	if (isTruncateOf(DAG, N0, Op, Known)) {
if (TruncatedBits.isSubsetOf(Known.Zero))		if (TruncatedBits.isSubsetOf(Known.Zero))
return DAG.getZExtOrTrunc(Op, SDLoc(N), VT);		return DAG.getZExtOrTrunc(Op, SDLoc(N), VT);
}		}

// fold (zext (truncate x)) -> (and x, mask)		// fold (zext (truncate x)) -> (and x, mask)
if (N0.getOpcode() == ISD::TRUNCATE) {		if (N0.getOpcode() == ISD::TRUNCATE) {
// fold (zext (truncate (load x))) -> (zext (smaller load x))		// fold (zext (truncate (load x))) -> (zext (smaller load x))
// fold (zext (truncate (srl (load x), c))) -> (zext (smaller load (x+c/n)))		// fold (zext (truncate (srl (load x), c))) -> (zext (smaller load (x+c/n)))
if (SDValue NarrowLoad = ReduceLoadWidth(N0.getNode())) {		if (SDValue NarrowLoad = reduceLoadWidth(N0.getNode())) {
SDNode *oye = N0.getOperand(0).getNode();		SDNode *oye = N0.getOperand(0).getNode();
if (NarrowLoad.getNode() != N0.getNode()) {		if (NarrowLoad.getNode() != N0.getNode()) {
CombineTo(N0.getNode(), NarrowLoad);		CombineTo(N0.getNode(), NarrowLoad);
// CombineTo deleted the truncate, if needed, but not what's under it.		// CombineTo deleted the truncate, if needed, but not what's under it.
AddToWorklist(oye);		AddToWorklist(oye);
}		}
return SDValue(N, 0); // Return N so it doesn't get rechecked!		return SDValue(N, 0); // Return N so it doesn't get rechecked!
}		}
[226 lines not shown]	SDValue DAGCombiner::visitANY_EXTEND(SDNode *N) {
if (N0.getOpcode() == ISD::ANY_EXTEND ||		if (N0.getOpcode() == ISD::ANY_EXTEND ||
N0.getOpcode() == ISD::ZERO_EXTEND ||		N0.getOpcode() == ISD::ZERO_EXTEND ||
N0.getOpcode() == ISD::SIGN_EXTEND)		N0.getOpcode() == ISD::SIGN_EXTEND)
return DAG.getNode(N0.getOpcode(), SDLoc(N), VT, N0.getOperand(0));		return DAG.getNode(N0.getOpcode(), SDLoc(N), VT, N0.getOperand(0));

// fold (aext (truncate (load x))) -> (aext (smaller load x))		// fold (aext (truncate (load x))) -> (aext (smaller load x))
// fold (aext (truncate (srl (load x), c))) -> (aext (small load (x+c/n)))		// fold (aext (truncate (srl (load x), c))) -> (aext (small load (x+c/n)))
if (N0.getOpcode() == ISD::TRUNCATE) {		if (N0.getOpcode() == ISD::TRUNCATE) {
if (SDValue NarrowLoad = ReduceLoadWidth(N0.getNode())) {		if (SDValue NarrowLoad = reduceLoadWidth(N0.getNode())) {
SDNode *oye = N0.getOperand(0).getNode();		SDNode *oye = N0.getOperand(0).getNode();
if (NarrowLoad.getNode() != N0.getNode()) {		if (NarrowLoad.getNode() != N0.getNode()) {
CombineTo(N0.getNode(), NarrowLoad);		CombineTo(N0.getNode(), NarrowLoad);
// CombineTo deleted the truncate, if needed, but not what's under it.		// CombineTo deleted the truncate, if needed, but not what's under it.
AddToWorklist(oye);		AddToWorklist(oye);
}		}
return SDValue(N, 0); // Return N so it doesn't get rechecked!		return SDValue(N, 0); // Return N so it doesn't get rechecked!
}		}
[213 lines not shown]	if (LHSAlignShift >= AlignShift || RHSAlignShift >= AlignShift) {
return DAG.getNode(N0.getOpcode(), DL, N0.getValueType(), LHS, RHS);		return DAG.getNode(N0.getOpcode(), DL, N0.getValueType(), LHS, RHS);
}		}
break;		break;
}		}
}		}

return SDValue();		return SDValue();
}		}

/// If the result of a wider load is shifted to right of N bits and then		/// If the result of a load is shifted/masked/truncated to an effectively
/// truncated to a narrower type and where N is a multiple of number of bits of		/// narrower type, try to transform the load to a narrower type and/or
/// the narrower type, transform it to a narrower load from address + N / num of		/// use an extending load.
/// bits of new type. Also narrow the load if the result is masked with an AND		SDValue DAGCombiner::reduceLoadWidth(SDNode *N) {
/// to effectively produce a smaller type. If the result is to be extended, also
/// fold the extension to form a extending load.
SDValue DAGCombiner::ReduceLoadWidth(SDNode *N) {
unsigned Opc = N->getOpcode();		unsigned Opc = N->getOpcode();

ISD::LoadExtType ExtType = ISD::NON_EXTLOAD;		ISD::LoadExtType ExtType = ISD::NON_EXTLOAD;
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT ExtVT = VT;		EVT ExtVT = VT;

// This transformation isn't valid for vector loads.		// This transformation isn't valid for vector loads.
if (VT.isVector())		if (VT.isVector())
return SDValue();		return SDValue();

		// The ShAmt variable is used to indicate that we've consumed a right
		// shift. I.e. we want to narrow the width of the load by skipping to load the
		// ShAmt least significant bits.
unsigned ShAmt = 0;		unsigned ShAmt = 0;
		// A special case is when the least significant bits from the load are masked
		// away, but using an AND rather than a right shift. HasShiftedOffset is used
		// to indicate that the narrowed load should be left-shifted ShAmt bits to get
		// the result.
bool HasShiftedOffset = false;		bool HasShiftedOffset = false;
// Special case: SIGN_EXTEND_INREG is basically truncating to ExtVT then		// Special case: SIGN_EXTEND_INREG is basically truncating to ExtVT then
// extended to VT.		// extended to VT.
if (Opc == ISD::SIGN_EXTEND_INREG) {		if (Opc == ISD::SIGN_EXTEND_INREG) {
ExtType = ISD::SEXTLOAD;		ExtType = ISD::SEXTLOAD;
ExtVT = cast<VTSDNode>(N->getOperand(1))->getVT();		ExtVT = cast<VTSDNode>(N->getOperand(1))->getVT();
} else if (Opc == ISD::SRL) {		} else if (Opc == ISD::SRL) {
// Another special-case: SRL is basically zero-extending a narrower value,		// Another special-case: SRL is basically zero-extending a narrower value,
// or it maybe shifting a higher subword, half or byte into the lowest		// or it may be shifting a higher subword, half or byte into the lowest
// bits.		// bits.
ExtType = ISD::ZEXTLOAD;
N0 = SDValue(N, 0);

auto *LN0 = dyn_cast<LoadSDNode>(N0.getOperand(0));		// Only handle shift with constant shift amount, and the shiftee must be a
auto *N01 = dyn_cast<ConstantSDNode>(N0.getOperand(1));		// load.
if (!N01 || !LN0)		auto *LN = dyn_cast<LoadSDNode>(N0);
		auto *N1C = dyn_cast<ConstantSDNode>(N->getOperand(1));
		if (!N1C || !LN)
		return SDValue();
		// If the shift amount is larger than the memory type then we're not
		// accessing any of the loaded bytes.
		ShAmt = N1C->getZExtValue();
		uint64_t MemoryWidth = LN->getMemoryVT().getScalarSizeInBits();
		if (MemoryWidth <= ShAmt)
		return SDValue();
		// Attempt to fold away the SRL by using ZEXTLOAD.
		ExtType = ISD::ZEXTLOAD;
		ExtVT = EVT::getIntegerVT(*DAG.getContext(), MemoryWidth - ShAmt);
		// If original load is a SEXTLOAD then we can't simply replace it by a
		// ZEXTLOAD (we could potentially replace it by a more narrow SEXTLOAD
		// followed by a ZEXT, but that is not handled at the moment).
		if (LN->getExtensionType() == ISD::SEXTLOAD)
return SDValue();		return SDValue();

uint64_t ShiftAmt = N01->getZExtValue();
uint64_t MemoryWidth = LN0->getMemoryVT().getScalarSizeInBits();
if (LN0->getExtensionType() != ISD::SEXTLOAD && MemoryWidth > ShiftAmt)
ExtVT = EVT::getIntegerVT(*DAG.getContext(), MemoryWidth - ShiftAmt);
else
ExtVT = EVT::getIntegerVT(*DAG.getContext(),
VT.getScalarSizeInBits() - ShiftAmt);
} else if (Opc == ISD::AND) {		} else if (Opc == ISD::AND) {
// An AND with a constant mask is the same as a truncate + zero-extend.		// An AND with a constant mask is the same as a truncate + zero-extend.
auto AndC = dyn_cast<ConstantSDNode>(N->getOperand(1));		auto AndC = dyn_cast<ConstantSDNode>(N->getOperand(1));
if (!AndC)		if (!AndC)
return SDValue();		return SDValue();

const APInt &Mask = AndC->getAPIntValue();		const APInt &Mask = AndC->getAPIntValue();
unsigned ActiveBits = 0;		unsigned ActiveBits = 0;
if (Mask.isMask()) {		if (Mask.isMask()) {
ActiveBits = Mask.countTrailingOnes();		ActiveBits = Mask.countTrailingOnes();
} else if (Mask.isShiftedMask()) {		} else if (Mask.isShiftedMask()) {
ShAmt = Mask.countTrailingZeros();		ShAmt = Mask.countTrailingZeros();
APInt ShiftedMask = Mask.lshr(ShAmt);		APInt ShiftedMask = Mask.lshr(ShAmt);
ActiveBits = ShiftedMask.countTrailingOnes();		ActiveBits = ShiftedMask.countTrailingOnes();
HasShiftedOffset = true;		HasShiftedOffset = true;
} else		} else
return SDValue();		return SDValue();

ExtType = ISD::ZEXTLOAD;		ExtType = ISD::ZEXTLOAD;
ExtVT = EVT::getIntegerVT(*DAG.getContext(), ActiveBits);		ExtVT = EVT::getIntegerVT(*DAG.getContext(), ActiveBits);
}		}

if (N0.getOpcode() == ISD::SRL && N0.hasOneUse()) {		// In case Opc==SRL we've already prepared ExtVT/ExtType/ShAmt based on doing
SDValue SRL = N0;		// a right shift. Here we redo some of those checks, to possibly adjust the
if (auto *ConstShift = dyn_cast<ConstantSDNode>(SRL.getOperand(1))) {		// ExtVT even further based on "a masking AND". We could also end up here for
ShAmt = ConstShift->getZExtValue();		// other reasons (e.g. based on Opc==TRUNCATE) and that is why some checks
unsigned EVTBits = ExtVT.getScalarSizeInBits();		// need to be done here as well.
// Is the shift amount a multiple of size of VT?		if (Opc == ISD::SRL || N0.getOpcode() == ISD::SRL) {
if ((ShAmt & (EVTBits-1)) == 0) {		SDValue SRL = Opc == ISD::SRL ? SDValue(N, 0) : N0;
N0 = N0.getOperand(0);		// Bail out when the SRL has more than one use. This is done for historical
// Is the load width a multiple of size of VT?		// (undocumented) reasons. Maybe intent was to guard the AND-masking below
if ((N0.getScalarValueSizeInBits() & (EVTBits - 1)) != 0)		// check below? And maybe it could be non-profitable to do the transform in
		// case the SRL has multiple uses and we get here with Opc!=ISD::SRL?
		// FIXME: Can't we just skip this check for the Opc==ISD::SRL case.
		if (!SRL.hasOneUse())
return SDValue();		return SDValue();
}

// At this point, we must have a load or else we can't do the transform.		// Only handle shift with constant shift amount, and the shiftee must be a
auto *LN0 = dyn_cast<LoadSDNode>(N0);		// load.
if (!LN0) return SDValue();		auto *LN = dyn_cast<LoadSDNode>(SRL.getOperand(0));
		auto *SRL1C = dyn_cast<ConstantSDNode>(SRL.getOperand(1));
		if (!SRL1C || !LN)
		return SDValue();

		// If the shift amount is larger than the input type then we're not
		// accessing any of the loaded bytes. If the load was a zextload/extload
		// then the result of the shift+trunc is zero/undef (handled elsewhere).
		ShAmt = SRL1C->getZExtValue();
		if (ShAmt >= LN->getMemoryVT().getSizeInBits())
		return SDValue();

// Because a SRL must be assumed to need to zero-extend the high bits		// Because a SRL must be assumed to need to zero-extend the high bits
// (as opposed to anyext the high bits), we can't combine the zextload		// (as opposed to anyext the high bits), we can't combine the zextload
// lowering of SRL and an sextload.		// lowering of SRL and an sextload.
if (LN0->getExtensionType() == ISD::SEXTLOAD)		if (LN->getExtensionType() == ISD::SEXTLOAD)
return SDValue();		return SDValue();

// If the shift amount is larger than the input type then we're not		unsigned ExtVTBits = ExtVT.getScalarSizeInBits();
// accessing any of the loaded bytes. If the load was a zextload/extload		// Is the shift amount a multiple of size of ExtVT?
// then the result of the shift+trunc is zero/undef (handled elsewhere).		if ((ShAmt & (ExtVTBits - 1)) != 0)
if (ShAmt >= LN0->getMemoryVT().getSizeInBits())		return SDValue();
		// Is the load width a multiple of size of ExtVT?
		if ((SRL.getScalarValueSizeInBits() & (ExtVTBits - 1)) != 0)
return SDValue();		return SDValue();

// If the SRL is only used by a masking AND, we may be able to adjust		// If the SRL is only used by a masking AND, we may be able to adjust
// the ExtVT to make the AND redundant.		// the ExtVT to make the AND redundant.
SDNode *Mask = *(SRL->use_begin());		SDNode *Mask = *(SRL->use_begin());
if (Mask->getOpcode() == ISD::AND &&		if (SRL.hasOneUse() && Mask->getOpcode() == ISD::AND &&
isa<ConstantSDNode>(Mask->getOperand(1))) {		isa<ConstantSDNode>(Mask->getOperand(1))) {
const APInt& ShiftMask = Mask->getConstantOperandAPInt(1);		const APInt& ShiftMask = Mask->getConstantOperandAPInt(1);
if (ShiftMask.isMask()) {		if (ShiftMask.isMask()) {
EVT MaskedVT = EVT::getIntegerVT(*DAG.getContext(),		EVT MaskedVT = EVT::getIntegerVT(*DAG.getContext(),
ShiftMask.countTrailingOnes());		ShiftMask.countTrailingOnes());
// If the mask is smaller, recompute the type.		// If the mask is smaller, recompute the type.
if ((ExtVT.getScalarSizeInBits() > MaskedVT.getScalarSizeInBits()) &&		if ((ExtVTBits > MaskedVT.getScalarSizeInBits()) &&
TLI.isLoadExtLegal(ExtType, N0.getValueType(), MaskedVT))		TLI.isLoadExtLegal(ExtType, SRL.getValueType(), MaskedVT))
ExtVT = MaskedVT;		ExtVT = MaskedVT;
}		}
}		}
}
		N0 = SRL.getOperand(0);
}		}

// If the load is shifted left (and the result isn't shifted back right),		// If the load is shifted left (and the result isn't shifted back right), we
// we can fold the truncate through the shift.		// can fold a truncate through the shift. The typical scenario is that N
		// points at a TRUNCATE here so the attempted fold is:
		// (truncate (shl (load x), c))) -> (shl (narrow load x), c)
		// ShLeftAmt will indicate how much a narrowed load should be shifted left.
unsigned ShLeftAmt = 0;		unsigned ShLeftAmt = 0;
if (ShAmt == 0 && N0.getOpcode() == ISD::SHL && N0.hasOneUse() &&		if (ShAmt == 0 && N0.getOpcode() == ISD::SHL && N0.hasOneUse() &&
ExtVT == VT && TLI.isNarrowingProfitable(N0.getValueType(), VT)) {		ExtVT == VT && TLI.isNarrowingProfitable(N0.getValueType(), VT)) {
if (ConstantSDNode *N01 = dyn_cast<ConstantSDNode>(N0.getOperand(1))) {		if (ConstantSDNode *N01 = dyn_cast<ConstantSDNode>(N0.getOperand(1))) {
ShLeftAmt = N01->getZExtValue();		ShLeftAmt = N01->getZExtValue();
N0 = N0.getOperand(0);		N0 = N0.getOperand(0);
}		}
}		}
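The new comment in the hunk above describes folding `(truncate (shl (load x), c))` into `(shl (narrow load x), c)`. On a little-endian target the low bits of the wide value come from the lowest addresses, so the truncated result of the shift depends only on the narrow load. The sketch below checks this for an i32 load truncated to i16 with every shift amount from 0 to 31, returning 0 once the shift amount reaches the narrow width, as the later `ShLeftAmt >= VT.getScalarSizeInBits()` branch does. Helper names are hypothetical, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical little-endian load helpers (not LLVM API).
uint32_t loadI32LE(const uint8_t *P) {
  return uint32_t(P[0]) | uint32_t(P[1]) << 8 | uint32_t(P[2]) << 16 |
         uint32_t(P[3]) << 24;
}

uint16_t loadI16LE(const uint8_t *P) { return uint16_t(P[0] | P[1] << 8); }

// Original pattern: (truncate (shl (load i32 P), C)) to i16, with C < 32.
uint16_t truncOfShl(const uint8_t *P, unsigned C) {
  return uint16_t(loadI32LE(P) << C);
}

// Narrowed pattern: (shl (load i16 P), C). A shift by 16 or more leaves no
// loaded bits in the result, so it folds to the constant 0.
uint16_t shlOfNarrowLoad(const uint8_t *P, unsigned C) {
  if (C >= 16)
    return 0;
  return uint16_t(uint32_t(loadI16LE(P)) << C);
}
```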
[10 lines not shown]	if (!LN0->isSimple() ||
return SDValue();		return SDValue();

auto AdjustBigEndianShift = [&](unsigned ShAmt) {		auto AdjustBigEndianShift = [&](unsigned ShAmt) {
unsigned LVTStoreBits =		unsigned LVTStoreBits =
LN0->getMemoryVT().getStoreSizeInBits().getFixedSize();		LN0->getMemoryVT().getStoreSizeInBits().getFixedSize();
unsigned EVTStoreBits = ExtVT.getStoreSizeInBits().getFixedSize();		unsigned EVTStoreBits = ExtVT.getStoreSizeInBits().getFixedSize();
return LVTStoreBits - EVTStoreBits - ShAmt;		return LVTStoreBits - EVTStoreBits - ShAmt;
};		};

// For big endian targets, we need to adjust the offset to the pointer to		// We need to adjust the pointer to the load by ShAmt bits in order to load
		spatel: This seems backwards. It's the little-endian target that needs to adjust the pointer. We're chopping off the LSB, so this is always converting ShAmt back to zero for big-endian?
		spatel: To be clear, I think the code is correct. I just meant that the comment seems inverted for endian.
		bjope (Author): I relaxed the description a bit. We could end up adjusting the pointer for both big and little endian here, such as when loading only a single byte from the middle of an i64.
// load the correct bytes.		// the correct bytes.
if (DAG.getDataLayout().isBigEndian())		unsigned PtrAdjustmentInBits =
ShAmt = AdjustBigEndianShift(ShAmt);		DAG.getDataLayout().isBigEndian() ? AdjustBigEndianShift(ShAmt) : ShAmt;

uint64_t PtrOff = ShAmt / 8;		uint64_t PtrOff = PtrAdjustmentInBits / 8;
Align NewAlign = commonAlignment(LN0->getAlign(), PtrOff);		Align NewAlign = commonAlignment(LN0->getAlign(), PtrOff);
SDLoc DL(LN0);		SDLoc DL(LN0);
// The original load itself didn't wrap, so an offset within it doesn't.		// The original load itself didn't wrap, so an offset within it doesn't.
SDNodeFlags Flags;		SDNodeFlags Flags;
Flags.setNoUnsignedWrap(true);		Flags.setNoUnsignedWrap(true);
SDValue NewPtr = DAG.getMemBasePlusOffset(LN0->getBasePtr(),		SDValue NewPtr = DAG.getMemBasePlusOffset(LN0->getBasePtr(),
TypeSize::Fixed(PtrOff), DL, Flags);		TypeSize::Fixed(PtrOff), DL, Flags);
AddToWorklist(NewPtr.getNode());		AddToWorklist(NewPtr.getNode());
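bjope's point in the review thread, that the pointer can move for both byte orders, can be checked with a small sketch. ptrOffsetInBytes is a hypothetical helper reproducing the PtrAdjustmentInBits / 8 computation from the new code, and narrowByteLoad simulates an i8 load from a 64-bit value laid out in memory in either byte order, independent of the host's own endianness. It assumes ShAmt is a multiple of 8 no larger than 56.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the new code: big-endian targets apply the
// AdjustBigEndianShift formula (LoadStoreBits - NarrowStoreBits - ShAmt),
// little-endian targets use ShAmt directly; the result is converted to bytes.
uint64_t ptrOffsetInBytes(bool IsBigEndian, unsigned LoadStoreBits,
                          unsigned NarrowStoreBits, unsigned ShAmt) {
  unsigned PtrAdjustmentInBits =
      IsBigEndian ? LoadStoreBits - NarrowStoreBits - ShAmt : ShAmt;
  return PtrAdjustmentInBits / 8;
}

// Lay out Value in Bytes[0..7] with the requested byte order.
void storeI64(uint8_t Bytes[8], uint64_t Value, bool IsBigEndian) {
  for (unsigned Addr = 0; Addr < 8; ++Addr) {
    // Significance of the byte stored at this address (0 = least significant).
    unsigned ByteIdx = IsBigEndian ? 7 - Addr : Addr;
    Bytes[Addr] = static_cast<uint8_t>(Value >> (8 * ByteIdx));
  }
}

// Simulate the narrowed i8 load that replaces (truncate (srl (load x), ShAmt)).
uint8_t narrowByteLoad(uint64_t Value, bool IsBigEndian, unsigned ShAmt) {
  uint8_t Bytes[8];
  storeI64(Bytes, Value, IsBigEndian);
  return Bytes[ptrOffsetInBytes(IsBigEndian, 64, 8, ShAmt)];
}
```

For a byte taken from bits 24-31 of an i64, the offset is 3 on a little-endian target and (64 - 8 - 24) / 8 = 4 on a big-endian one. Both narrowByteLoad calls return 0x05 for 0x0102030405060708, the same as (Value >> 24) & 0xFF.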
(26 unchanged lines not shown)	if (ShLeftAmt != 0) {
if (ShLeftAmt >= VT.getScalarSizeInBits())		if (ShLeftAmt >= VT.getScalarSizeInBits())
Result = DAG.getConstant(0, DL, VT);		Result = DAG.getConstant(0, DL, VT);
else		else
Result = DAG.getNode(ISD::SHL, DL, VT,		Result = DAG.getNode(ISD::SHL, DL, VT,
Result, DAG.getConstant(ShLeftAmt, DL, ShImmTy));		Result, DAG.getConstant(ShLeftAmt, DL, ShImmTy));
}		}

if (HasShiftedOffset) {		if (HasShiftedOffset) {
// Recalculate the shift amount after it has been altered to calculate
// the offset.
if (DAG.getDataLayout().isBigEndian())
ShAmt = AdjustBigEndianShift(ShAmt);

// We're using a shifted mask, so the load now has an offset. This means		// We're using a shifted mask, so the load now has an offset. This means
// that data has been loaded into the lower bytes than it would have been		// that data has been loaded into the lower bytes than it would have been
// before, so we need to shl the loaded data into the correct position in the		// before, so we need to shl the loaded data into the correct position in the
// register.		// register.
SDValue ShiftC = DAG.getConstant(ShAmt, DL, VT);		SDValue ShiftC = DAG.getConstant(ShAmt, DL, VT);
Result = DAG.getNode(ISD::SHL, DL, VT, Result, ShiftC);		Result = DAG.getNode(ISD::SHL, DL, VT, Result, ShiftC);
DAG.ReplaceAllUsesOfValueWith(SDValue(N, 0), Result);		DAG.ReplaceAllUsesOfValueWith(SDValue(N, 0), Result);
}		}
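The HasShiftedOffset path shifts the narrow load back left by the original ShAmt, not by the pointer adjustment. A minimal sketch, assuming an i32 value whose AND mask selects one byte starting at bit ShAmt (a multiple of 8), so the fold becomes (shl (zextload i8 from the adjusted address), ShAmt); adjustBigEndianShift and shiftedMaskFold are hypothetical names. It also shows that the AdjustBigEndianShift formula undoes itself when applied twice, which is why the old code could restore ShAmt in the removed lines by calling it again, while the new code keeps the two quantities apart in ShAmt and PtrAdjustmentInBits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the AdjustBigEndianShift lambda:
// LVTStoreBits - EVTStoreBits - ShAmt.
unsigned adjustBigEndianShift(unsigned LoadBits, unsigned NarrowBits,
                              unsigned ShAmt) {
  return LoadBits - NarrowBits - ShAmt;
}

// Lay out a 32-bit Value in Bytes[0..3] with the requested byte order.
void storeI32(uint8_t Bytes[4], uint32_t Value, bool IsBigEndian) {
  for (unsigned Addr = 0; Addr < 4; ++Addr) {
    unsigned ByteIdx = IsBigEndian ? 3 - Addr : Addr;
    Bytes[Addr] = static_cast<uint8_t>(Value >> (8 * ByteIdx));
  }
}

// Model of (and (load x), 0xFF << ShAmt) rewritten as
// (shl (zextload i8 from x + PtrAdjustmentInBits / 8), ShAmt).
uint32_t shiftedMaskFold(uint32_t Value, bool IsBigEndian, unsigned ShAmt) {
  uint8_t Bytes[4];
  storeI32(Bytes, Value, IsBigEndian);
  unsigned PtrAdjustmentInBits =
      IsBigEndian ? adjustBigEndianShift(32, 8, ShAmt) : ShAmt;
  uint32_t Loaded = Bytes[PtrAdjustmentInBits / 8]; // zero-extending i8 load
  return Loaded << ShAmt; // shift by the original ShAmt in both byte orders
}
```

For 0xAABBCCDD and ShAmt = 8, both byte orders produce 0xCC00, matching Value & 0xFF00.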
(76 unchanged lines not shown)	SDValue DAGCombiner::visitSIGN_EXTEND_INREG(SDNode *N) {

// fold operands of sext_in_reg based on knowledge that the top bits are not		// fold operands of sext_in_reg based on knowledge that the top bits are not
// demanded.		// demanded.
if (SimplifyDemandedBits(SDValue(N, 0)))		if (SimplifyDemandedBits(SDValue(N, 0)))
return SDValue(N, 0);		return SDValue(N, 0);

// fold (sext_in_reg (load x)) -> (smaller sextload x)		// fold (sext_in_reg (load x)) -> (smaller sextload x)
// fold (sext_in_reg (srl (load x), c)) -> (smaller sextload (x+c/evtbits))		// fold (sext_in_reg (srl (load x), c)) -> (smaller sextload (x+c/evtbits))
if (SDValue NarrowLoad = ReduceLoadWidth(N))		if (SDValue NarrowLoad = reduceLoadWidth(N))
return NarrowLoad;		return NarrowLoad;

// fold (sext_in_reg (srl X, 24), i8) -> (sra X, 24)		// fold (sext_in_reg (srl X, 24), i8) -> (sra X, 24)
// fold (sext_in_reg (srl X, 23), i8) -> (sra X, 23) iff possible.		// fold (sext_in_reg (srl X, 23), i8) -> (sra X, 23) iff possible.
// We already fold "(sext_in_reg (srl X, 25), i8) -> srl X, 25" above.		// We already fold "(sext_in_reg (srl X, 25), i8) -> srl X, 25" above.
if (N0.getOpcode() == ISD::SRL) {		if (N0.getOpcode() == ISD::SRL) {
if (auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1)))		if (auto *ShAmt = dyn_cast<ConstantSDNode>(N0.getOperand(1)))
if (ShAmt->getAPIntValue().ule(VTBits - ExtVTBits)) {		if (ShAmt->getAPIntValue().ule(VTBits - ExtVTBits)) {
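The (sext_in_reg (srl X, 24), i8) -> (sra X, 24) comment can be sanity-checked on plain integers. A minimal sketch for an i32 X; it relies on C++20's two's-complement conversions and arithmetic right shift of negative values (which GCC, Clang and MSVC also provide in earlier modes), and sextInRegOfSrl and sra are hypothetical names. With a shift of 24, the bit that sext_in_reg copies upward is X's own sign bit, so the two forms always agree. With 23 it is bit 30, so they only agree when bits 30 and 31 of X match, which appears to be the extra condition behind "iff possible".

```cpp
#include <cassert>
#include <cstdint>

// (sext_in_reg (srl X, S), i8) on i32: logical shift right, keep the low
// 8 bits, then sign-extend them from bit 7.
int32_t sextInRegOfSrl(uint32_t X, unsigned S) {
  return static_cast<int8_t>(static_cast<uint8_t>(X >> S));
}

// (sra X, S) on i32: arithmetic shift right, replicating bit 31.
int32_t sra(uint32_t X, unsigned S) {
  return static_cast<int32_t>(X) >> S;
}
```

For X = 0xC0000000 and a shift of 23 both give -128, while X = 0x80000000 gives 0 versus -256 because bits 30 and 31 differ.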
(270 unchanged lines not shown)	APInt Mask =
APInt::getLowBitsSet(N0.getValueSizeInBits(), VT.getSizeInBits());		APInt::getLowBitsSet(N0.getValueSizeInBits(), VT.getSizeInBits());
if (SDValue Shorter = DAG.GetDemandedBits(N0, Mask))		if (SDValue Shorter = DAG.GetDemandedBits(N0, Mask))
return DAG.getNode(ISD::TRUNCATE, SDLoc(N), VT, Shorter);		return DAG.getNode(ISD::TRUNCATE, SDLoc(N), VT, Shorter);
}		}

// fold (truncate (load x)) -> (smaller load x)		// fold (truncate (load x)) -> (smaller load x)
// fold (truncate (srl (load x), c)) -> (smaller load (x+c/evtbits))		// fold (truncate (srl (load x), c)) -> (smaller load (x+c/evtbits))
if (!LegalTypes || TLI.isTypeDesirableForOp(N0.getOpcode(), VT)) {		if (!LegalTypes || TLI.isTypeDesirableForOp(N0.getOpcode(), VT)) {
if (SDValue Reduced = ReduceLoadWidth(N))		if (SDValue Reduced = reduceLoadWidth(N))
return Reduced;		return Reduced;

// Handle the case where the load remains an extending load even		// Handle the case where the load remains an extending load even
// after truncation.		// after truncation.
if (N0.hasOneUse() && ISD::isUNINDEXEDLoad(N0.getNode())) {		if (N0.hasOneUse() && ISD::isUNINDEXEDLoad(N0.getNode())) {
LoadSDNode *LN0 = cast<LoadSDNode>(N0);		LoadSDNode *LN0 = cast<LoadSDNode>(N0);
if (LN0->isSimple() && LN0->getMemoryVT().bitsLT(VT)) {		if (LN0->isSimple() && LN0->getMemoryVT().bitsLT(VT)) {
SDValue NewLoad = DAG.getExtLoad(LN0->getExtensionType(), SDLoc(LN0),		SDValue NewLoad = DAG.getExtLoad(LN0->getExtensionType(), SDLoc(LN0),
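The case where the load remains an extending load after truncation relies on the memory type (here i8) being narrower than the truncated result type, as checked by getMemoryVT().bitsLT(VT). A minimal sketch, assuming an i8 in memory that was extended to i32 and is then truncated to i16: issuing an i8 -> i16 extending load of the same kind yields the same value. The function names are hypothetical, and std::memcpy is used to reinterpret the byte as int8_t without relying on implementation-defined conversions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read one byte of memory as a signed i8.
int8_t asSignedByte(uint8_t MemByte) {
  int8_t B;
  std::memcpy(&B, &MemByte, 1);
  return B;
}

// (truncate (sextload i8 -> i32)) to i16.
int16_t truncOfWideSextLoad(uint8_t MemByte) {
  int32_t Wide = asSignedByte(MemByte); // sign-extending load to i32
  return static_cast<int16_t>(Wide);    // value fits in i16, so it is kept
}

// (sextload i8 -> i16): same extension type, narrower result.
int16_t narrowSextLoad(uint8_t MemByte) { return asSignedByte(MemByte); }

// The same pair for zero-extending loads.
uint16_t truncOfWideZextLoad(uint8_t MemByte) {
  uint32_t Wide = MemByte;            // zero-extending load to i32
  return static_cast<uint16_t>(Wide); // truncate to i16
}

uint16_t narrowZextLoad(uint8_t MemByte) { return MemByte; }
```

Because the extension only ever fills bits above the 8 loaded ones, truncating a wider extension to i16 keeps exactly what a direct i8 -> i16 extension would produce.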
(11,353 unchanged lines not shown)