This is an archive of the discontinued LLVM Phabricator instance.

Differential D23007

[DAGCombiner] Better support for shifting large value type by constants
Closed · Public

Authored by RKSimon on Aug 1 2016, 3:12 AM.

Details

Reviewers

spatel
t.p.northover
majnemer
eli.friedman
andreadb
bryant

Commits

rG76964e31407e: [DAGCombiner] Better support for shifting large value type by constants
rL278141: [DAGCombiner] Better support for shifting large value type by constants

Summary

As detailed on D22726, much of the shift-combining code assumes that constant values will fit into a uint64_t and calls ConstantSDNode::getZExtValue where it probably shouldn't, which leads to assertion failures. Using APInt directly avoids this problem, but we hit other assertions if we attempt to compare or operate on two APInts of different bit widths.

This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required.
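
To make the width-mismatch problem concrete, here is a minimal, self-contained sketch; it is not the patch's code. `WidthInt` and `tryAdd` are hypothetical stand-ins for llvm::APInt and its addition (limited to 64 bits here, and returning false where APInt would assert). Only the shape of `zeroExtendToMatch` follows the helper described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::APInt: a value tagged with a bit width.
// Only widths up to 64 are modelled; the real APInt has no such limit.
struct WidthInt {
  unsigned Width;  // number of bits, 1..64
  uint64_t Value;  // payload, always masked to Width bits

  WidthInt(unsigned W, uint64_t V) : Width(W), Value(V & maskFor(W)) {}

  static uint64_t maskFor(unsigned W) {
    return W >= 64 ? ~uint64_t(0) : ((uint64_t(1) << W) - 1);
  }

  // Zero extension keeps the value and only widens the type.
  WidthInt zextOrSelf(unsigned W) const {
    return W > Width ? WidthInt(W, Value) : *this;
  }
};

// Mirrors the rule that both operands must have the same width.
// Returns false instead of asserting so the mismatch can be observed.
bool tryAdd(const WidthInt &L, const WidthInt &R, WidthInt &Out) {
  if (L.Width != R.Width)
    return false;
  Out = WidthInt(L.Width, L.Value + R.Value);
  return true;
}

// Zero extend the narrower of the pair so that both have the same width.
void zeroExtendToMatch(WidthInt &L, WidthInt &R) {
  unsigned W = std::max(L.Width, R.Width);
  L = L.zextOrSelf(W);
  R = R.zextOrSelf(W);
}
```

With an 8-bit 3 and a 32-bit 5, `tryAdd` fails until `zeroExtendToMatch` widens both operands to 32 bits, after which the sum is 8.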

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 66293. Aug 1 2016, 3:12 AM

RKSimon retitled this revision to [DAGCombiner] Better support for shifting large value type by constants.

RKSimon updated this object.

RKSimon added reviewers: eli.friedman, majnemer, bryant, spatel, andreadb.

RKSimon set the repository for this revision to rL LLVM.

RKSimon mentioned this in D22726: [DAGCombine] Match shift amount by value rather than relying on common sub-expressions.

RKSimon added a subscriber: llvm-commits.

bryant added a child revision: D22726: [DAGCombine] Match shift amount by value rather than relying on common sub-expressions. Aug 1 2016, 8:39 PM

ping?

Isn't the IR already undefined whenever this triggers (unless you're working on an i18446744073709551616)?

In D23007#508614, @t.p.northover wrote:

Isn't the IR already undefined whenever this triggers (unless you're working on an i18446744073709551616)?

If the offending IR originated during dag combine then this is what should be reducing it to UNDEF. And when we are combining repeated shifts with the outer shift being valid then the inner shift might not have been reduced to UNDEF yet.

Ah, of course. undef isn't unrestrained UB, we still have to compile it on the assumption that it's never dynamically executed.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
731–739	Lower-case 'z' for functions. A simpler body would be: unsigned Bits = std::max(LHS.getBitWidth(), RHS.getBitWidth()); LHS = LHS.zextOrSelf(Bits); RHS = RHS.zextOrSelf(Bits);
4484	I don't think this is quite the right check, though it might not matter since it's UB anyway (e.g. c1 = c2 = 2^(Bits-1)). A comment that we made the decision intentionally might be useful though.
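
The second inline comment concerns wrap-around: if both constants equal 2^(Bits-1), their sum computed at Bits bits is 0, so a `uge(OpSizeInBits)` test treats it as in range. Below is a small sketch of that effect and of one possible guard, adding with one extra bit of headroom, using plain integers. `Amount`, `sumIsOutOfRangeNaive`, and `sumIsOutOfRangeWide` are hypothetical names, and this is not claimed to be the fix that was committed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of a constant shift amount stored in Width bits.
// Width is assumed to be at most 32 so one extra bit fits in the payload.
struct Amount {
  unsigned Width;
  uint64_t Value;
};

static uint64_t maskFor(unsigned W) { return (uint64_t(1) << W) - 1; }

// Add at the operands' own width: the carry out of the top bit is lost.
bool sumIsOutOfRangeNaive(Amount C1, Amount C2, unsigned OpSizeInBits) {
  unsigned W = std::max(C1.Width, C2.Width);
  uint64_t Sum = (C1.Value + C2.Value) & maskFor(W);
  return Sum >= OpSizeInBits;
}

// Add with one extra bit of headroom: the carry is kept, so the sum is exact.
bool sumIsOutOfRangeWide(Amount C1, Amount C2, unsigned OpSizeInBits) {
  unsigned W = std::max(C1.Width, C2.Width) + 1;
  uint64_t Sum = (C1.Value + C2.Value) & maskFor(W);
  return Sum >= OpSizeInBits;
}
```

For 8-bit amounts with c1 = c2 = 128 and an 8-bit operand, the naive check reports the sum as in range because 256 wraps to 0, while the widened check reports it as out of range.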

Updated based on Tim's feedback. Added better support for shift overflow.

I took the easy way out and didn't add support for i18446744073709551616 ;-)

Thanks Simon! Looks good to me.

This revision is now accepted and ready to land. Aug 9 2016, 9:12 AM

Closed by commit rL278141: [DAGCombiner] Better support for shifting large value type by constants (authored by RKSimon). Aug 9 2016, 10:47 AM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in rL281354: [DAGCombiner] Use APInt directly in (shl (ext (shl x, c1)), c2) combine. Sep 13 2016, 10:24 AM

RKSimon mentioned this in rL281362: [DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range…. Sep 13 2016, 11:42 AM

Revision Contents

Path                                                          Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp (revision 277331)    40 lines
test/CodeGen/X86/shift-i128.ll (revision 277331)              24 lines

Diff 66293

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ 720 lines hidden @@ static SDValue GetNegatedExpression(SDValue Op, SelectionDAG &DAG,
   case ISD::FP_ROUND:
     return DAG.getNode(ISD::FP_ROUND, SDLoc(Op), Op.getValueType(),
                        GetNegatedExpression(Op.getOperand(0), DAG,
                                             LegalOperations, Depth+1),
                        Op.getOperand(1));
   }
 }
 
+// APInts must be the same size for most operations, this helper
+// function zero extends the shorter of the pair so that they match.
+static void ZeroExtendToMatch(APInt &LHS, APInt &RHS) {
+  unsigned LHSBits = LHS.getBitWidth();
+  unsigned RHSBits = RHS.getBitWidth();
+
+  if (LHSBits > RHSBits)
+    RHS = RHS.zext(LHSBits);
+  else if (RHSBits > LHSBits)
+    LHS = LHS.zext(RHSBits);
+}
[inline comment, t.p.northover] Lower-case 'z' for functions. A simpler body would be: unsigned Bits = std::max(LHS.getBitWidth(), RHS.getBitWidth()); LHS = LHS.zextOrSelf(Bits); RHS = RHS.zextOrSelf(Bits);
+
 // Return true if this node is a setcc, or is a select_cc
 // that selects between the target values used for true and false, making it
 // equivalent to a setcc. Also, set the incoming LHS, RHS, and CC references to
 // the appropriate nodes based on the type of node we are checking. This
 // simplifies life a bit for the callers.
 bool DAGCombiner::isSetCCEquivalent(SDValue N, SDValue &LHS, SDValue &RHS,
                                     SDValue &CC) const {
   if (N.getOpcode() == ISD::SETCC) {
@@ 3,722 lines hidden @@ SDValue DAGCombiner::visitSHL(SDNode *N) {
   }
 
   if (N1C && SimplifyDemandedBits(SDValue(N, 0)))
     return SDValue(N, 0);
 
   // fold (shl (shl x, c1), c2) -> 0 or (shl x, (add c1, c2))
   if (N1C && N0.getOpcode() == ISD::SHL) {
     if (ConstantSDNode *N0C1 = isConstOrConstSplat(N0.getOperand(1))) {
-      uint64_t c1 = N0C1->getZExtValue();
-      uint64_t c2 = N1C->getZExtValue();
+      APInt c1 = N0C1->getAPIntValue();
+      APInt c2 = N1C->getAPIntValue();
+      ZeroExtendToMatch(c1, c2);
+
       SDLoc DL(N);
-      if (c1 + c2 >= OpSizeInBits)
+      if ((c1 + c2).uge(OpSizeInBits))
[inline comment, t.p.northover] I don't think this is quite the right check, though it might not matter since it's UB anyway (e.g. c1 = c2 = 2^(Bits-1)). A comment that we made the decision intentionally might be useful though.
         return DAG.getConstant(0, DL, VT);
       return DAG.getNode(ISD::SHL, DL, VT, N0.getOperand(0),
                          DAG.getConstant(c1 + c2, DL, N1.getValueType()));
     }
   }
 
   // fold (shl (ext (shl x, c1)), c2) -> (ext (shl x, (add c1, c2)))
   // For this to be valid, the second form must not preserve any of the bits
@@ 172 lines hidden @@ if (N1C && N0.getOpcode() == ISD::SHL && N1 == N0.getOperand(1)) {
     if ((!LegalOperations ||
          TLI.isOperationLegal(ISD::SIGN_EXTEND_INREG, ExtVT)))
       return DAG.getNode(ISD::SIGN_EXTEND_INREG, SDLoc(N), VT,
                          N0.getOperand(0), DAG.getValueType(ExtVT));
   }
 
   // fold (sra (sra x, c1), c2) -> (sra x, (add c1, c2))
   if (N1C && N0.getOpcode() == ISD::SRA) {
-    if (ConstantSDNode *C1 = isConstOrConstSplat(N0.getOperand(1))) {
-      unsigned Sum = N1C->getZExtValue() + C1->getZExtValue();
-      if (Sum >= OpSizeInBits)
+    if (ConstantSDNode *N0C1 = isConstOrConstSplat(N0.getOperand(1))) {
+      APInt c1 = N0C1->getAPIntValue();
+      APInt c2 = N1C->getAPIntValue();
+      ZeroExtendToMatch(c1, c2);
+
+      APInt Sum = c1 + c2;
+      if (Sum.uge(OpSizeInBits))
         Sum = OpSizeInBits - 1;
       SDLoc DL(N);
       return DAG.getNode(ISD::SRA, DL, VT, N0.getOperand(0),
                          DAG.getConstant(Sum, DL, N1.getValueType()));
     }
   }
 
   // fold (sra (shl X, m), (sub result_size, n))
@@ 115 lines hidden @@ if (N1C && N1C->isNullValue())
     return N0;
   // if (srl x, c) is known to be zero, return 0
   if (N1C && DAG.MaskedValueIsZero(SDValue(N, 0),
                                    APInt::getAllOnesValue(OpSizeInBits)))
     return DAG.getConstant(0, SDLoc(N), VT);
 
   // fold (srl (srl x, c1), c2) -> 0 or (srl x, (add c1, c2))
   if (N1C && N0.getOpcode() == ISD::SRL) {
-    if (ConstantSDNode *N01C = isConstOrConstSplat(N0.getOperand(1))) {
-      uint64_t c1 = N01C->getZExtValue();
-      uint64_t c2 = N1C->getZExtValue();
+    if (ConstantSDNode *N0C1 = isConstOrConstSplat(N0.getOperand(1))) {
+      APInt c1 = N0C1->getAPIntValue();
+      APInt c2 = N1C->getAPIntValue();
+      ZeroExtendToMatch(c1, c2);
+
       SDLoc DL(N);
-      if (c1 + c2 >= OpSizeInBits)
+      if ((c1 + c2).uge(OpSizeInBits))
         return DAG.getConstant(0, DL, VT);
       return DAG.getNode(ISD::SRL, DL, VT, N0.getOperand(0),
                          DAG.getConstant(c1 + c2, DL, N1.getValueType()));
     }
   }
 
   // fold (srl (trunc (srl x, c1)), c2) -> 0 or (trunc (srl x, (add c1, c2)))
   if (N1C && N0.getOpcode() == ISD::TRUNCATE &&
@@ 10,204 lines hidden @@

test/CodeGen/X86/shift-i128.ll

@@ 86 lines hidden @@
 }
 
 define void @test_shl_v2i128_outofrange(<2 x i128> %x, <2 x i128>* nocapture %r) nounwind {
 entry:
   %0 = shl <2 x i128> %x, <i128 -1, i128 -1>
   store <2 x i128> %0, <2 x i128>* %r, align 16
   ret void
 }
+
+define void @test_lshr_v2i128_outofrange_sum(<2 x i128> %x, <2 x i128>* nocapture %r) nounwind {
+entry:
+  %0 = lshr <2 x i128> %x, <i128 -1, i128 -1>
+  %1 = lshr <2 x i128> %0, <i128 1, i128 1>
+  store <2 x i128> %1, <2 x i128>* %r, align 16
+  ret void
+}
+
+define void @test_ashr_v2i128_outofrange_sum(<2 x i128> %x, <2 x i128>* nocapture %r) nounwind {
+entry:
+  %0 = ashr <2 x i128> %x, <i128 -1, i128 -1>
+  %1 = ashr <2 x i128> %0, <i128 1, i128 1>
+  store <2 x i128> %1, <2 x i128>* %r, align 16
+  ret void
+}
+
+define void @test_shl_v2i128_outofrange_sum(<2 x i128> %x, <2 x i128>* nocapture %r) nounwind {
+entry:
+  %0 = shl <2 x i128> %x, <i128 -1, i128 -1>
+  %1 = shl <2 x i128> %0, <i128 1, i128 1>
+  store <2 x i128> %1, <2 x i128>* %r, align 16
+  ret void
+}