This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] support ZERO_EXTEND in tryBitPermutation
ClosedPublic

Authored by inouehrs on Sep 6 2017, 6:07 AM.

Details

Summary

This patch adds support for ISD::ZERO_EXTEND to PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask instructions by reordering ZEXT and ANDI.
Since tryBitPermutation stops analyzing nodes when it hits a ZEXT node while traversing SDNodes, we want to avoid having a ZEXT between two nodes that could otherwise be folded into a rotate-and-mask instruction.

For example, we allow these nodes

      t9: i32 = add t7, Constant:i32<1>
    t11: i32 = and t9, Constant:i32<255>
  t12: i64 = zero_extend t11
t14: i64 = shl t12, Constant:i64<2>

to be folded into a rotate-and-mask instruction.
Such a case often occurs in array accesses whose index involves a logical AND, e.g. array[i & 0xFF].
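For concreteness, a source pattern like the following produces the add/and/zext/shl chain shown above (the function, element type, and index width are illustrative, not taken from an actual test case):

  // Hypothetical example: the masked index is computed in i32, zero-extended
  // to i64, and shifted left by 2 to form the byte offset of a 4-byte element.
  int load_element(const int *array, unsigned i) {
    return array[(i + 1) & 0xFF];
  }

With this patch, the intervening zero_extend no longer prevents tryBitPermutation from folding the and and the shl into a single rotate-and-mask instruction.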

Diff Detail

Repository
rL LLVM

Event Timeline

inouehrs created this revision. Sep 6 2017, 6:07 AM
hfinkel edited edge metadata. Sep 6 2017, 6:24 AM

I'd prefer that we extend tryBitPermutation to look through the zext. I realize this means adding code there to deal with mismatched input/output bit widths, but that seems much less fragile, and potentially more useful, than this.

I have a particular fear of doing these kinds of transformations as target-specific DAGCombines: the output pattern could be anti-canonical, or could become so in the future, and if that's ever the case, you'll cause the optimizer to hang while DAGCombine and the target fight each other. I also don't like it when one piece of code is guessing what another piece will do. Sometimes this is unavoidable (e.g., the vectorizer guesses what the register allocator will do), but these two pieces of code are essentially at the same level, so there's no need. As a result, I'd recommend against solving the problem this way.

Hal, thank you for the advice. I will redesign based on your comment.

inouehrs updated this revision to Diff 114342. Sep 8 2017, 5:31 AM
inouehrs retitled this revision from [PowerPC] DAGCombine for better exploitation of rotate-and-mask instruction to [PowerPC] support ZERO_EXTEND in tryBitPermutation.
inouehrs edited the summary of this revision.

Reimplemented the optimization in tryBitPermutation instead of adding a new DAGCombine rule, as @hfinkel suggested.

I apologize for taking so long to get back to this. The bit-permutation selector uses a cost model to decide how to lower each permutation sequence. Preemptively lowering the zext like this during the analysis phase of the algorithm seems suboptimal (or at least ad hoc).

In BitPermutationSelector, the ValueBit type can have one of two kinds: Variable or ConstZero. To handle zext here, upon encountering the ISD::ZERO_EXTEND node, we should simply recurse by calling getValueBits(V.getOperand(0), <number of operand bits>), then take the gathered bits and extend the result with ValueBit(ValueBit::ConstZero) (similar to how ISD::AND is handled now).
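A minimal sketch of that handling, assuming the Bits/NumBits names and the pair-returning getValueBits convention of the surrounding BitPermutationSelector code (the details of the actual patch may differ):

  // Sketch only: recurse on the narrower operand, then pad the upper bits
  // with ConstZero, mirroring how the ISD::AND handling marks masked-off bits.
  case ISD::ZERO_EXTEND: {
    // Width of the value being extended, e.g. 32 for an i32 -> i64 zext.
    unsigned NumOperandBits = V.getOperand(0).getValueSizeInBits();

    // Gather the bits of the narrower operand.
    auto OpRes = getValueBits(V.getOperand(0), NumOperandBits);
    const SmallVector<ValueBit, 64> *OpBits = OpRes.second;

    // The low bits come straight from the operand; the remaining high bits
    // of the extended value are known to be zero.
    for (unsigned i = 0; i < NumOperandBits; ++i)
      Bits[i] = (*OpBits)[i];
    for (unsigned i = NumOperandBits; i < NumBits; ++i)
      Bits[i] = ValueBit(ValueBit::ConstZero);

    // Propagate whether the operand itself was "interesting" (an assumption
    // about how the caller consumes the flag).
    return std::make_pair(OpRes.first, &Bits);
  }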

I believe that the remainder of the algorithm will work as-is, except that in the code that actually selects the machine instructions (the code called from Select64 -- and perhaps Select32 if we're extending from i1, although you might avoid that by only handling i32 -> i64 zext), you'll need to check that the incoming values are all i64. If they're not, you'll want to convert them to i64 using INSERT_SUBREG (as you're doing here). You could even do this as a pre-pass over all of the BitGroups to avoid complicating the rest of the code (keeping a map to avoid inserting more insert_subregs than necessary: you want only one per incoming i32 value).
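A hypothetical helper sketching that INSERT_SUBREG conversion together with the one-extension-per-value cache (the helper name, the ExtendedValues map, and the surrounding PPCDAGToDAGISel context with CurDAG available are assumptions, not the patch's actual code):

  // Cache so each incoming i32 value is widened at most once.
  DenseMap<SDValue, SDValue> ExtendedValues;

  SDValue extendToInt64(SDValue V, const SDLoc &dl) {
    if (V.getValueType() == MVT::i64)
      return V;                            // already the right width

    auto It = ExtendedValues.find(V);
    if (It != ExtendedValues.end())
      return It->second;                   // reuse an earlier extension

    // INSERT_SUBREG(IMPLICIT_DEF:i64, V:i32, sub_32): the upper 32 bits are
    // left undefined, which is fine because the selected rotate-and-mask
    // sequence only consumes bits the analysis has accounted for.
    SDValue ImDef = SDValue(
        CurDAG->getMachineNode(TargetOpcode::IMPLICIT_DEF, dl, MVT::i64), 0);
    SDValue SubRegIdx = CurDAG->getTargetConstant(PPC::sub_32, dl, MVT::i32);
    SDValue Ext = SDValue(
        CurDAG->getMachineNode(TargetOpcode::INSERT_SUBREG, dl, MVT::i64,
                               ImDef, V, SubRegIdx), 0);
    ExtendedValues[V] = Ext;
    return Ext;
  }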

inouehrs updated this revision to Diff 117096. Sep 29 2017, 12:33 AM
inouehrs edited the summary of this revision.

@hfinkel Thank you so much for the suggestion. I reimplemented the patch based on it, and I hope this implementation is cleaner than the previous one.
So far, the patch supports only i32 to i64 zero extensions. I will add support for other conversions, such as the i1 case, if I find common patterns.

hfinkel added inline comments. Sep 29 2017, 8:26 AM
lib/Target/PowerPC/PPCISelDAGToDAG.cpp
1076 (On Diff #117096)

entend -> extend

1079 (On Diff #117096)

Why are you checking that the upper bits are zero? This is a zext node, so I'd think you should just force them all to zero (like the code for AND does for bits that are zero).

hfinkel added inline comments. Sep 29 2017, 8:37 AM
lib/Target/PowerPC/PPCISelDAGToDAG.cpp
1079 (On Diff #117096)

Or, to put it another way, if you do it this way, then you should have a comment that reads something like, "We'll look through zext nodes here, but only if they're provably redundant." If we do this, however, we should explain why.

inouehrs updated this revision to Diff 117202. Sep 29 2017, 12:35 PM
inouehrs added inline comments. Sep 29 2017, 12:40 PM
lib/Target/PowerPC/PPCISelDAGToDAG.cpp
1079 (On Diff #117096)

As you suggested, zero extension behaves like a logical AND, so I do not need to check the upper bits.
I removed the check.

hfinkel accepted this revision. Sep 29 2017, 1:07 PM

LGTM (but please run performance tests and a self-host check before committing - I can imagine us overlooking something here).

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
1074 (On Diff #117202)

Line is too long.

This revision is now accepted and ready to land. Sep 29 2017, 1:07 PM
This revision was automatically updated to reflect the committed changes.

I did not see a significant change in performance on average (0.01% improvement) across five runs of SPEC CPU2006.
For individual benchmarks, the results range from a 1.74% improvement (povray) to a 1.23% degradation (xalan).
(Note that the measurements were done on a somewhat noisy cloud instance.)

All tests passed on a ppc64le/POWER8 box.