This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Custom lowering for funnel shifts
ClosedPublic

Authored by foad on Jul 16 2020, 6:36 AM.

Details

Reviewers

hfinkel
RKSimon
nemanjai

Group Reviewers

Restricted Project

Commits

rG28e322ea9393: [PowerPC] Custom lowering for funnel shifts

Summary

The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.
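
To illustrate the summary, here is a minimal C++ sketch of why well-defined shift-by-bitwidth behaviour removes the need for a select. It is a model, not the LLVM implementation: the helper names (ppc_slw, ppc_srw, fshl_ref, fshr_ref, fshl_ppc, fshr_ppc) are hypothetical, and the shift model assumes the 32-bit PowerPC behaviour where the low 6 bits of the amount are used and amounts of 32..63 produce zero.

```cpp
#include <cassert>
#include <cstdint>

// Model of 32-bit PowerPC `slw`/`srw` (assumption: low 6 bits of the amount
// are used, and any amount in 32..63 yields 0). In plain C++ a shift by 32
// on a 32-bit value would be undefined, hence the explicit check.
static uint32_t ppc_slw(uint32_t x, uint32_t amt) {
  amt &= 63;
  return amt >= 32 ? 0u : x << amt;
}

static uint32_t ppc_srw(uint32_t x, uint32_t amt) {
  amt &= 63;
  return amt >= 32 ? 0u : x >> amt;
}

// Reference semantics: concatenate x:y into 64 bits, shift by z % 32,
// then take the high word (fshl) or the low word (fshr).
static uint32_t fshl_ref(uint32_t x, uint32_t y, uint32_t z) {
  uint64_t concat = (uint64_t(x) << 32) | y;
  return uint32_t((concat << (z % 32)) >> 32);
}

static uint32_t fshr_ref(uint32_t x, uint32_t y, uint32_t z) {
  uint64_t concat = (uint64_t(x) << 32) | y;
  return uint32_t(concat >> (z % 32));
}

// Shape of the custom lowering: and, sub, shl, srl, or. When amt == 0 the
// "opposite" shift is by 32, which the modelled instruction defines as 0,
// so no select on the zero-amount case is needed.
static uint32_t fshl_ppc(uint32_t x, uint32_t y, uint32_t z) {
  uint32_t amt = z & 31;
  uint32_t inv = 32 - amt;
  return ppc_slw(x, amt) | ppc_srw(y, inv);
}

static uint32_t fshr_ppc(uint32_t x, uint32_t y, uint32_t z) {
  uint32_t amt = z & 31;
  uint32_t inv = 32 - amt;
  return ppc_slw(x, inv) | ppc_srw(y, amt);
}

// Spot-checks the zero-amount edge case and one ordinary amount.
static void check_funnel_model() {
  const uint32_t x = 0x12345678u, y = 0x9abcdef0u;
  assert(fshl_ppc(x, y, 0) == x);  // shift by 0 returns the first operand
  assert(fshr_ppc(x, y, 0) == y);  // shift by 0 returns the second operand
  assert(fshl_ppc(x, y, 4) == fshl_ref(x, y, 4));
  assert(fshr_ppc(x, y, 4) == fshr_ref(x, y, 4));
}
```

A generic expansion that cannot rely on shift-by-bitwidth has to select the unshifted operand when the amount is zero; that select is the instruction the summary says is saved.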

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Jul 16 2020, 6:36 AM

Herald added a project: Restricted Project. Jul 16 2020, 6:36 AM

Herald added subscribers: llvm-commits, steven.zhang, shchenz and 3 others.

foad marked 3 inline comments as done. Jul 16 2020, 6:39 AM

foad added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
616–624	This is just guesswork. I'm really not sure which types we should do this for, under what conditions.
llvm/test/CodeGen/PowerPC/funnel-shift.ll
31–43	Can I pre-commit this new test case, and fshr_i64?
llvm/test/CodeGen/PowerPC/pr44183.ll
16–22	This regression seems to be caused by not constant folding based on the fact that r3 is known to be 4. Can anyone suggest how to fix it? Do I have to spot known constant shift amounts in PPCTargetLowering::LowerFunnelShift?

Herald added a subscriber: wuzish. Jul 16 2020, 6:39 AM

foad mentioned this in D77152: [SelectionDAG] Better legalization for FSHL and FSHR. Jul 16 2020, 6:40 AM

Harbormaster failed remote builds in B64505: Diff 278452! Jul 16 2020, 7:08 AM

lkail added a reviewer: Restricted Project. Jul 16 2020, 7:12 AM

spatel added a subscriber: spatel. Jul 16 2020, 7:16 AM

spatel added inline comments.

llvm/test/CodeGen/PowerPC/funnel-shift.ll
31–43	Yes, please pre-commit those, so we see the diffs.

Rebase after precommitting new test cases.

spatel added inline comments. Jul 16 2020, 7:36 AM

llvm/test/CodeGen/PowerPC/pr44183.ll
16–22	This is an unusual test because it contains "i216" types, but that's apparently the way the code was written in: http://bugs.llvm.org/PR44183
Maybe because of that, the constants are marked opaque from DAG creation time:
	Creating constant: t6: i216 = OpaqueConstant<4>
...so default constant folding is bypassed. Someone with more PPC knowledge probably needs to decide what -- if anything -- needs to be done here.

Harbormaster failed remote builds in B64517: Diff 278470! Jul 16 2020, 7:54 AM

Have you triaged what's going on in the mulfixsat test regressions?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8998	Maybe also show the pseudocode as used by PPC for comparison?

foad marked 3 inline comments as done. Jul 20 2020, 3:40 AM

foad added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8998	This is pseudocode for the PPC expansion. Did you mean, show the more complicated expansion that TargetLowering would have used?
llvm/test/CodeGen/PowerPC/umulfixsat.ll
12 ↗	(On Diff #278470)	I think the extra instruction here is just due to bad luck in the register allocation or scheduling.
29 ↗	(On Diff #278470)	It's hard to see because of the register allocation and scheduling differences, but there is a slight regression here, from:
	rotlwi 3, 3, 31
	rlwimi 3, 5, 31, 0, 0
to:
	srwi 5, 5, 1
	slwi 4, 3, 31
	or 4, 4, 5
I suppose I have regressed funnel shifts by a constant amount. I'll see if I can fix it.
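
The two instruction sequences in the comment above can be modelled in a few lines of C++ to see that both compute a funnel shift right by 1. This is only a sketch under stated assumptions: the helper names are hypothetical, the operands are abstract (hi, lo) because the register roles differ between the two quoted sequences, and the rlwimi mask "0, 0" is read as the most significant bit in IBM bit numbering.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit rotate left; the n == 0 case is handled separately to avoid an
// undefined shift by 32.
static uint32_t rotl32(uint32_t v, unsigned n) {
  n &= 31;
  return n == 0 ? v : (v << n) | (v >> (32 - n));
}

// rlwimi-style insert (model): rotate src left by sh, then copy only the
// bits selected by mask into dst, leaving the other bits of dst unchanged.
static uint32_t rotate_insert(uint32_t dst, uint32_t src, unsigned sh,
                              uint32_t mask) {
  return (rotl32(src, sh) & mask) | (dst & ~mask);
}

// Two-operation form: rotate lo by 31 (equivalent to lo >> 1 with lo's
// bit 0 wrapped to the top), then overwrite the top bit with hi's bit 0.
static uint32_t fshr1_rotate_insert(uint32_t hi, uint32_t lo) {
  uint32_t rotated = rotl32(lo, 31);
  return rotate_insert(rotated, hi, 31, 0x80000000u);
}

// Three-operation form: two shifts and an or.
static uint32_t fshr1_shift_or(uint32_t hi, uint32_t lo) {
  return (hi << 31) | (lo >> 1);
}

// Spot-checks that both forms agree on a few inputs.
static void check_constant_funnel_forms() {
  assert(fshr1_rotate_insert(1u, 2u) == fshr1_shift_or(1u, 2u));
  assert(fshr1_rotate_insert(0xdeadbeefu, 0x01234567u) ==
         fshr1_shift_or(0xdeadbeefu, 0x01234567u));
}
```

Both forms give the same value; the first needs one fewer instruction, which is the regression being described.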

RKSimon added inline comments. Jul 20 2020, 4:51 AM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8998	Doh! Sorry about that, I misread it this morning - no need to add anything here.

Enable combining of PPC-specific shift opcodes.

In D83948#2161728, @foad wrote:

Enable combining of PPC-specific shift opcodes.

My rationale is that in PPCTargetLowering::LowerFunnelShift it's correct to use PPCISD::SHL and PPCISD::SRL (instead of the ISD:: versions) precisely because they are defined in the shift-by-bitwidth case. But I think it's safe for BitPermutationSelector to treat them the same as the ISD:: versions. @hfinkel @inouehrs does this seem reasonable?
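
As a rough illustration of that argument, the sketch below models the per-bit tracking that the constant-shift cases perform and shows that a constant amount equal to the bit width simply produces all known-zero bits, which matches the shift-by-bitwidth behaviour the PPC-specific opcodes rely on. The types and names here (BitSource, kConstZero, identityBits, trackShl, trackSrl) are hypothetical stand-ins, not the BitPermutationSelector API, and only constant amounts in the range 0..bitwidth are considered.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a tracked bit: either a known zero, or the
// index of the source bit it came from. Index 0 is the least significant bit.
using BitSource = int;
constexpr BitSource kConstZero = -1;

// Every bit initially comes from the same position in the source value.
static std::vector<BitSource> identityBits(unsigned numBits) {
  std::vector<BitSource> bits(numBits);
  for (unsigned i = 0; i < numBits; ++i)
    bits[i] = static_cast<BitSource>(i);
  return bits;
}

// Shift left by a constant: result bit i comes from input bit i - shiftAmt,
// and the low shiftAmt bits are known zero.
static std::vector<BitSource> trackShl(const std::vector<BitSource> &lhs,
                                       unsigned shiftAmt) {
  const unsigned numBits = static_cast<unsigned>(lhs.size());
  std::vector<BitSource> bits(numBits, kConstZero);
  for (unsigned i = shiftAmt; i < numBits; ++i)
    bits[i] = lhs[i - shiftAmt];
  return bits;
}

// Logical shift right by a constant: result bit i comes from input bit
// i + shiftAmt, and the high shiftAmt bits are known zero.
static std::vector<BitSource> trackSrl(const std::vector<BitSource> &lhs,
                                       unsigned shiftAmt) {
  const unsigned numBits = static_cast<unsigned>(lhs.size());
  std::vector<BitSource> bits(numBits, kConstZero);
  for (unsigned i = 0; i + shiftAmt < numBits; ++i)
    bits[i] = lhs[i + shiftAmt];
  return bits;
}

// A shift by the full width leaves no source bits: everything is zero.
static void check_full_width_shift() {
  for (BitSource b : trackShl(identityBits(32), 32))
    assert(b == kConstZero);
  for (BitSource b : trackSrl(identityBits(32), 32))
    assert(b == kConstZero);
}
```

In this model a full-width constant shift is just the limiting case of the ordinary one, so nothing special has to be done for an opcode that defines it as zero.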

Harbormaster completed remote builds in B64893: Diff 279190. Jul 20 2020, 5:54 AM

This seems reasonable to me. @hfinkel @nemanjai, any more comments?

LGTM.

llvm/test/CodeGen/PowerPC/pr44183.ll
16–22	I agree with Sanjay, this is due to the weird size. However, it is not a concern. I'll post a very small tweak to our Reg+Reg -> Reg+Imm transformation that recovers this.

This revision is now accepted and ready to land. Jul 27 2020, 7:16 AM

The fix to recover the regression: https://reviews.llvm.org/D84659

Closed by commit rG28e322ea9393: [PowerPC] Custom lowering for funnel shifts (authored by foad). Aug 4 2020, 8:31 AM

This revision was automatically updated to reflect the committed changes.

foad added a commit: rG28e322ea9393: [PowerPC] Custom lowering for funnel shifts.

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	12 lines
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp	2 lines
llvm/lib/Target/PowerPC/PPCISelLowering.h	1 line
llvm/lib/Target/PowerPC/PPCISelLowering.cpp	37 lines
llvm/test/CodeGen/PowerPC/funnel-shift.ll	28 lines
llvm/test/CodeGen/PowerPC/pr44183.ll	21 lines

Diff 279190

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Show First 20 Lines • Show All 6,277 Lines • ▼ Show 20 Lines	case Intrinsic::fshr: {
SDValue X = getValue(I.getArgOperand(0));		SDValue X = getValue(I.getArgOperand(0));
SDValue Y = getValue(I.getArgOperand(1));		SDValue Y = getValue(I.getArgOperand(1));
SDValue Z = getValue(I.getArgOperand(2));		SDValue Z = getValue(I.getArgOperand(2));
EVT VT = X.getValueType();		EVT VT = X.getValueType();
SDValue BitWidthC = DAG.getConstant(VT.getScalarSizeInBits(), sdl, VT);		SDValue BitWidthC = DAG.getConstant(VT.getScalarSizeInBits(), sdl, VT);
SDValue Zero = DAG.getConstant(0, sdl, VT);		SDValue Zero = DAG.getConstant(0, sdl, VT);
SDValue ShAmt = DAG.getNode(ISD::UREM, sdl, VT, Z, BitWidthC);		SDValue ShAmt = DAG.getNode(ISD::UREM, sdl, VT, Z, BitWidthC);

auto FunnelOpcode = IsFSHL ? ISD::FSHL : ISD::FSHR;
if (TLI.isOperationLegalOrCustom(FunnelOpcode, VT)) {
setValue(&I, DAG.getNode(FunnelOpcode, sdl, VT, X, Y, Z));
return;
}

// When X == Y, this is rotate. If the data type has a power-of-2 size, we		// When X == Y, this is rotate. If the data type has a power-of-2 size, we
// avoid the select that is necessary in the general case to filter out		// avoid the select that is necessary in the general case to filter out
// the 0-shift possibility that leads to UB.		// the 0-shift possibility that leads to UB.
if (X == Y && isPowerOf2_32(VT.getScalarSizeInBits())) {		if (X == Y && isPowerOf2_32(VT.getScalarSizeInBits())) {
auto RotateOpcode = IsFSHL ? ISD::ROTL : ISD::ROTR;		auto RotateOpcode = IsFSHL ? ISD::ROTL : ISD::ROTR;
if (TLI.isOperationLegalOrCustom(RotateOpcode, VT)) {		if (TLI.isOperationLegalOrCustom(RotateOpcode, VT)) {
setValue(&I, DAG.getNode(RotateOpcode, sdl, VT, X, Z));		setValue(&I, DAG.getNode(RotateOpcode, sdl, VT, X, Z));
return;		return;
Show All 13 Lines	if (X == Y && isPowerOf2_32(VT.getScalarSizeInBits())) {
SDValue NegZ = DAG.getNode(ISD::SUB, sdl, VT, Zero, Z);		SDValue NegZ = DAG.getNode(ISD::SUB, sdl, VT, Zero, Z);
SDValue NShAmt = DAG.getNode(ISD::UREM, sdl, VT, NegZ, BitWidthC);		SDValue NShAmt = DAG.getNode(ISD::UREM, sdl, VT, NegZ, BitWidthC);
SDValue ShX = DAG.getNode(ISD::SHL, sdl, VT, X, IsFSHL ? ShAmt : NShAmt);		SDValue ShX = DAG.getNode(ISD::SHL, sdl, VT, X, IsFSHL ? ShAmt : NShAmt);
SDValue ShY = DAG.getNode(ISD::SRL, sdl, VT, X, IsFSHL ? NShAmt : ShAmt);		SDValue ShY = DAG.getNode(ISD::SRL, sdl, VT, X, IsFSHL ? NShAmt : ShAmt);
setValue(&I, DAG.getNode(ISD::OR, sdl, VT, ShX, ShY));		setValue(&I, DAG.getNode(ISD::OR, sdl, VT, ShX, ShY));
return;		return;
}		}

		auto FunnelOpcode = IsFSHL ? ISD::FSHL : ISD::FSHR;
		if (TLI.isOperationLegalOrCustom(FunnelOpcode, VT)) {
		setValue(&I, DAG.getNode(FunnelOpcode, sdl, VT, X, Y, Z));
		return;
		}

// fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW)))		// fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW)))
// fshr: (X << (BW - (Z % BW))) | (Y >> (Z % BW))		// fshr: (X << (BW - (Z % BW))) | (Y >> (Z % BW))
SDValue InvShAmt = DAG.getNode(ISD::SUB, sdl, VT, BitWidthC, ShAmt);		SDValue InvShAmt = DAG.getNode(ISD::SUB, sdl, VT, BitWidthC, ShAmt);
SDValue ShX = DAG.getNode(ISD::SHL, sdl, VT, X, IsFSHL ? ShAmt : InvShAmt);		SDValue ShX = DAG.getNode(ISD::SHL, sdl, VT, X, IsFSHL ? ShAmt : InvShAmt);
SDValue ShY = DAG.getNode(ISD::SRL, sdl, VT, Y, IsFSHL ? InvShAmt : ShAmt);		SDValue ShY = DAG.getNode(ISD::SRL, sdl, VT, Y, IsFSHL ? InvShAmt : ShAmt);
SDValue Or = DAG.getNode(ISD::OR, sdl, VT, ShX, ShY);		SDValue Or = DAG.getNode(ISD::OR, sdl, VT, ShX, ShY);

// If (Z % BW == 0), then the opposite direction shift is shift-by-bitwidth,		// If (Z % BW == 0), then the opposite direction shift is shift-by-bitwidth,
▲ Show 20 Lines • Show All 4,373 Lines • Show Last 20 Lines

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp

Show First 20 Lines • Show All 1,247 Lines • ▼ Show 20 Lines	case ISD::ROTL:

for (unsigned i = 0; i < NumBits; ++i)		for (unsigned i = 0; i < NumBits; ++i)
Bits[i] = LHSBits[i < RotAmt ? i + (NumBits - RotAmt) : i - RotAmt];		Bits[i] = LHSBits[i < RotAmt ? i + (NumBits - RotAmt) : i - RotAmt];

return std::make_pair(Interesting = true, &Bits);		return std::make_pair(Interesting = true, &Bits);
}		}
break;		break;
case ISD::SHL:		case ISD::SHL:
		case PPCISD::SHL:
if (isa<ConstantSDNode>(V.getOperand(1))) {		if (isa<ConstantSDNode>(V.getOperand(1))) {
unsigned ShiftAmt = V.getConstantOperandVal(1);		unsigned ShiftAmt = V.getConstantOperandVal(1);

const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;		const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;

for (unsigned i = ShiftAmt; i < NumBits; ++i)		for (unsigned i = ShiftAmt; i < NumBits; ++i)
Bits[i] = LHSBits[i - ShiftAmt];		Bits[i] = LHSBits[i - ShiftAmt];

for (unsigned i = 0; i < ShiftAmt; ++i)		for (unsigned i = 0; i < ShiftAmt; ++i)
Bits[i] = ValueBit(ValueBit::ConstZero);		Bits[i] = ValueBit(ValueBit::ConstZero);

return std::make_pair(Interesting = true, &Bits);		return std::make_pair(Interesting = true, &Bits);
}		}
break;		break;
case ISD::SRL:		case ISD::SRL:
		case PPCISD::SRL:
if (isa<ConstantSDNode>(V.getOperand(1))) {		if (isa<ConstantSDNode>(V.getOperand(1))) {
unsigned ShiftAmt = V.getConstantOperandVal(1);		unsigned ShiftAmt = V.getConstantOperandVal(1);

const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;		const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;

for (unsigned i = 0; i < NumBits - ShiftAmt; ++i)		for (unsigned i = 0; i < NumBits - ShiftAmt; ++i)
Bits[i] = LHSBits[i + ShiftAmt];		Bits[i] = LHSBits[i + ShiftAmt];

▲ Show 20 Lines • Show All 5,493 Lines • Show Last 20 Lines

llvm/lib/Target/PowerPC/PPCISelLowering.h

Show First 20 Lines • Show All 1,111 Lines • ▼ Show 20 Lines	private:
SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG,		SDValue LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG,
const SDLoc &dl) const;		const SDLoc &dl) const;
SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFLT_ROUNDS_(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSHL_PARTS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSHL_PARTS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSRL_PARTS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSRL_PARTS(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSRA_PARTS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSRA_PARTS(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerFunnelShift(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBSWAP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerBSWAP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerATOMIC_CMP_SWAP(SDValue Op, SelectionDAG &DAG) const;
▲ Show 20 Lines • Show All 177 Lines • Show Last 20 Lines

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

Show First 20 Lines • Show All 607 Lines • ▼ Show 20 Lines	if (Subtarget.use64BitRegs()) {
setOperationAction(ISD::SRL_PARTS, MVT::i64, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i64, Custom);
} else {		} else {
// 32-bit PowerPC wants to expand i64 shifts itself.		// 32-bit PowerPC wants to expand i64 shifts itself.
setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SHL_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRA_PARTS, MVT::i32, Custom);
setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);		setOperationAction(ISD::SRL_PARTS, MVT::i32, Custom);
}		}

		// PowerPC has better expansions for funnel shifts than the generic
		// TargetLowering::expandFunnelShift.
		if (Subtarget.has64BitSupport()) {
		setOperationAction(ISD::FSHL, MVT::i64, Custom);
		setOperationAction(ISD::FSHR, MVT::i64, Custom);
		}
		setOperationAction(ISD::FSHL, MVT::i32, Custom);
		setOperationAction(ISD::FSHR, MVT::i32, Custom);

if (Subtarget.hasVSX()) {		if (Subtarget.hasVSX()) {
setOperationAction(ISD::FMAXNUM_IEEE, MVT::f64, Legal);		setOperationAction(ISD::FMAXNUM_IEEE, MVT::f64, Legal);
setOperationAction(ISD::FMAXNUM_IEEE, MVT::f32, Legal);		setOperationAction(ISD::FMAXNUM_IEEE, MVT::f32, Legal);
setOperationAction(ISD::FMINNUM_IEEE, MVT::f64, Legal);		setOperationAction(ISD::FMINNUM_IEEE, MVT::f64, Legal);
setOperationAction(ISD::FMINNUM_IEEE, MVT::f32, Legal);		setOperationAction(ISD::FMINNUM_IEEE, MVT::f32, Legal);
}		}

if (Subtarget.hasAltivec()) {		if (Subtarget.hasAltivec()) {
▲ Show 20 Lines • Show All 8,343 Lines • ▼ Show 20 Lines	SDValue PPCTargetLowering::LowerSRA_PARTS(SDValue Op, SelectionDAG &DAG) const {
SDValue Tmp6 = DAG.getNode(PPCISD::SRA, dl, VT, Hi, Tmp5);		SDValue Tmp6 = DAG.getNode(PPCISD::SRA, dl, VT, Hi, Tmp5);
SDValue OutHi = DAG.getNode(PPCISD::SRA, dl, VT, Hi, Amt);		SDValue OutHi = DAG.getNode(PPCISD::SRA, dl, VT, Hi, Amt);
SDValue OutLo = DAG.getSelectCC(dl, Tmp5, DAG.getConstant(0, dl, AmtVT),		SDValue OutLo = DAG.getSelectCC(dl, Tmp5, DAG.getConstant(0, dl, AmtVT),
Tmp4, Tmp6, ISD::SETLE);		Tmp4, Tmp6, ISD::SETLE);
SDValue OutOps[] = { OutLo, OutHi };		SDValue OutOps[] = { OutLo, OutHi };
return DAG.getMergeValues(OutOps, dl);		return DAG.getMergeValues(OutOps, dl);
}		}

		SDValue PPCTargetLowering::LowerFunnelShift(SDValue Op,
		SelectionDAG &DAG) const {
		SDLoc dl(Op);
		EVT VT = Op.getValueType();
		unsigned BitWidth = VT.getSizeInBits();

		bool IsFSHL = Op.getOpcode() == ISD::FSHL;
		SDValue X = Op.getOperand(0);
		SDValue Y = Op.getOperand(1);
		SDValue Z = Op.getOperand(2);
		EVT AmtVT = Z.getValueType();

		// fshl: (X << (Z % BW)) | (Y >> (BW - (Z % BW)))
		// fshr: (X << (BW - (Z % BW))) | (Y >> (Z % BW))
		// This is simpler than TargetLowering::expandFunnelShift because we can rely
		// on PowerPC shift by BW being well defined.
		Z = DAG.getNode(ISD::AND, dl, AmtVT, Z,
		DAG.getConstant(BitWidth - 1, dl, AmtVT));
		SDValue SubZ =
		DAG.getNode(ISD::SUB, dl, AmtVT, DAG.getConstant(BitWidth, dl, AmtVT), Z);
		X = DAG.getNode(PPCISD::SHL, dl, VT, X, IsFSHL ? Z : SubZ);
		Y = DAG.getNode(PPCISD::SRL, dl, VT, Y, IsFSHL ? SubZ : Z);
		return DAG.getNode(ISD::OR, dl, VT, X, Y);
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Vector related lowering.		// Vector related lowering.
//		//

/// getCanonicalConstSplat - Build a canonical splat immediate of Val with an		/// getCanonicalConstSplat - Build a canonical splat immediate of Val with an
/// element size of SplatSize. Cast the result to VT.		/// element size of SplatSize. Cast the result to VT.
static SDValue getCanonicalConstSplat(uint64_t Val, unsigned SplatSize, EVT VT,		static SDValue getCanonicalConstSplat(uint64_t Val, unsigned SplatSize, EVT VT,
SelectionDAG &DAG, const SDLoc &dl) {		SelectionDAG &DAG, const SDLoc &dl) {
▲ Show 20 Lines • Show All 2,181 Lines • ▼ Show 20 Lines	SDValue PPCTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
case ISD::SINT_TO_FP: return LowerINT_TO_FP(Op, DAG);		case ISD::SINT_TO_FP: return LowerINT_TO_FP(Op, DAG);
case ISD::FLT_ROUNDS_: return LowerFLT_ROUNDS_(Op, DAG);		case ISD::FLT_ROUNDS_: return LowerFLT_ROUNDS_(Op, DAG);

// Lower 64-bit shifts.		// Lower 64-bit shifts.
case ISD::SHL_PARTS: return LowerSHL_PARTS(Op, DAG);		case ISD::SHL_PARTS: return LowerSHL_PARTS(Op, DAG);
case ISD::SRL_PARTS: return LowerSRL_PARTS(Op, DAG);		case ISD::SRL_PARTS: return LowerSRL_PARTS(Op, DAG);
case ISD::SRA_PARTS: return LowerSRA_PARTS(Op, DAG);		case ISD::SRA_PARTS: return LowerSRA_PARTS(Op, DAG);

		case ISD::FSHL: return LowerFunnelShift(Op, DAG);
		case ISD::FSHR: return LowerFunnelShift(Op, DAG);

// Vector-related lowering.		// Vector-related lowering.
case ISD::BUILD_VECTOR: return LowerBUILD_VECTOR(Op, DAG);		case ISD::BUILD_VECTOR: return LowerBUILD_VECTOR(Op, DAG);
case ISD::VECTOR_SHUFFLE: return LowerVECTOR_SHUFFLE(Op, DAG);		case ISD::VECTOR_SHUFFLE: return LowerVECTOR_SHUFFLE(Op, DAG);
case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG);		case ISD::INTRINSIC_WO_CHAIN: return LowerINTRINSIC_WO_CHAIN(Op, DAG);
case ISD::SCALAR_TO_VECTOR: return LowerSCALAR_TO_VECTOR(Op, DAG);		case ISD::SCALAR_TO_VECTOR: return LowerSCALAR_TO_VECTOR(Op, DAG);
case ISD::EXTRACT_VECTOR_ELT: return LowerEXTRACT_VECTOR_ELT(Op, DAG);		case ISD::EXTRACT_VECTOR_ELT: return LowerEXTRACT_VECTOR_ELT(Op, DAG);
case ISD::INSERT_VECTOR_ELT: return LowerINSERT_VECTOR_ELT(Op, DAG);		case ISD::INSERT_VECTOR_ELT: return LowerINSERT_VECTOR_ELT(Op, DAG);
case ISD::MUL: return LowerMUL(Op, DAG);		case ISD::MUL: return LowerMUL(Op, DAG);
▲ Show 20 Lines • Show All 5,967 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/funnel-shift.ll

	Show All 12 Lines
	declare i64 @llvm.fshr.i64(i64, i64, i64)			declare i64 @llvm.fshr.i64(i64, i64, i64)
	declare <4 x i32> @llvm.fshr.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.fshr.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)

	; General case - all operands can be variables.			; General case - all operands can be variables.

	define i32 @fshl_i32(i32 %x, i32 %y, i32 %z) {			define i32 @fshl_i32(i32 %x, i32 %y, i32 %z) {
	; CHECK-LABEL: fshl_i32:			; CHECK-LABEL: fshl_i32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: andi. 5, 5, 31			; CHECK-NEXT: clrlwi 5, 5, 27
	; CHECK-NEXT: subfic 6, 5, 32			; CHECK-NEXT: subfic 6, 5, 32
	; CHECK-NEXT: slw 5, 3, 5			; CHECK-NEXT: slw 3, 3, 5
	; CHECK-NEXT: srw 4, 4, 6			; CHECK-NEXT: srw 4, 4, 6
	; CHECK-NEXT: or 4, 5, 4			; CHECK-NEXT: or 3, 3, 4
	; CHECK-NEXT: iseleq 3, 3, 4
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%f = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %z)			%f = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 %z)
	ret i32 %f			ret i32 %f
	}			}

	define i64 @fshl_i64(i64 %x, i64 %y, i64 %z) {			define i64 @fshl_i64(i64 %x, i64 %y, i64 %z) {
	; CHECK-LABEL: fshl_i64:			; CHECK-LABEL: fshl_i64:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: andi. 5, 5, 63			; CHECK-NEXT: clrlwi 5, 5, 26
	; CHECK-NEXT: subfic 6, 5, 64			; CHECK-NEXT: subfic 6, 5, 64
	; CHECK-NEXT: sld 5, 3, 5			; CHECK-NEXT: sld 3, 3, 5
	; CHECK-NEXT: srd 4, 4, 6			; CHECK-NEXT: srd 4, 4, 6
	; CHECK-NEXT: or 4, 5, 4			; CHECK-NEXT: or 3, 3, 4
	; CHECK-NEXT: iseleq 3, 3, 4
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%f = call i64 @llvm.fshl.i64(i64 %x, i64 %y, i64 %z)			%f = call i64 @llvm.fshl.i64(i64 %x, i64 %y, i64 %z)
	ret i64 %f			ret i64 %f
	}			}

	; Verify that weird types are minimally supported.			; Verify that weird types are minimally supported.
	declare i37 @llvm.fshl.i37(i37, i37, i37)			declare i37 @llvm.fshl.i37(i37, i37, i37)
	define i37 @fshl_i37(i37 %x, i37 %y, i37 %z) {			define i37 @fshl_i37(i37 %x, i37 %y, i37 %z) {
	; CHECK-LABEL: fshl_i37:			; CHECK-LABEL: fshl_i37:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: lis 6, -8857			; CHECK-NEXT: lis 6, -8857
	; CHECK-NEXT: clrldi 5, 5, 27			; CHECK-NEXT: clrldi 5, 5, 27
	; CHECK-NEXT: ori 6, 6, 51366			; CHECK-NEXT: ori 6, 6, 51366
	▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines

	; Repeat everything for funnel shift right.			; Repeat everything for funnel shift right.

	; General case - all operands can be variables.			; General case - all operands can be variables.

	define i32 @fshr_i32(i32 %x, i32 %y, i32 %z) {			define i32 @fshr_i32(i32 %x, i32 %y, i32 %z) {
	; CHECK-LABEL: fshr_i32:			; CHECK-LABEL: fshr_i32:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: andi. 5, 5, 31			; CHECK-NEXT: clrlwi 5, 5, 27
	; CHECK-NEXT: subfic 6, 5, 32			; CHECK-NEXT: subfic 6, 5, 32
	; CHECK-NEXT: srw 5, 4, 5			; CHECK-NEXT: srw 4, 4, 5
	; CHECK-NEXT: slw 3, 3, 6			; CHECK-NEXT: slw 3, 3, 6
	; CHECK-NEXT: or 3, 3, 5			; CHECK-NEXT: or 3, 3, 4
	; CHECK-NEXT: iseleq 3, 4, 3
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%f = call i32 @llvm.fshr.i32(i32 %x, i32 %y, i32 %z)			%f = call i32 @llvm.fshr.i32(i32 %x, i32 %y, i32 %z)
	ret i32 %f			ret i32 %f
	}			}

	define i64 @fshr_i64(i64 %x, i64 %y, i64 %z) {			define i64 @fshr_i64(i64 %x, i64 %y, i64 %z) {
	; CHECK-LABEL: fshr_i64:			; CHECK-LABEL: fshr_i64:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: andi. 5, 5, 63			; CHECK-NEXT: clrlwi 5, 5, 26
	; CHECK-NEXT: subfic 6, 5, 64			; CHECK-NEXT: subfic 6, 5, 64
	; CHECK-NEXT: srd 5, 4, 5			; CHECK-NEXT: srd 4, 4, 5
	; CHECK-NEXT: sld 3, 3, 6			; CHECK-NEXT: sld 3, 3, 6
	; CHECK-NEXT: or 3, 3, 5			; CHECK-NEXT: or 3, 3, 4
	; CHECK-NEXT: iseleq 3, 4, 3
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%f = call i64 @llvm.fshr.i64(i64 %x, i64 %y, i64 %z)			%f = call i64 @llvm.fshr.i64(i64 %x, i64 %y, i64 %z)
	ret i64 %f			ret i64 %f
	}			}

	; Verify that weird types are minimally supported.			; Verify that weird types are minimally supported.
	declare i37 @llvm.fshr.i37(i37, i37, i37)			declare i37 @llvm.fshr.i37(i37, i37, i37)
	define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) {			define i37 @fshr_i37(i37 %x, i37 %y, i37 %z) {
	▲ Show 20 Lines • Show All 119 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/pr44183.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \			; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
	; RUN: -ppc-asm-full-reg-names -mcpu=pwr8 < %s | FileCheck %s			; RUN: -ppc-asm-full-reg-names -mcpu=pwr8 < %s | FileCheck %s
	%struct.m.2.5.8.11 = type { %struct.l.0.3.6.9, [7 x i8], %struct.a.1.4.7.10 }			%struct.m.2.5.8.11 = type { %struct.l.0.3.6.9, [7 x i8], %struct.a.1.4.7.10 }
	%struct.l.0.3.6.9 = type { i8 }			%struct.l.0.3.6.9 = type { i8 }
	%struct.a.1.4.7.10 = type { [27 x i8], [0 x i32], [4 x i8] }			%struct.a.1.4.7.10 = type { [27 x i8], [0 x i32], [4 x i8] }
	define void @_ZN1m1nEv(%struct.m.2.5.8.11* %this) local_unnamed_addr nounwind align 2 {			define void @_ZN1m1nEv(%struct.m.2.5.8.11* %this) local_unnamed_addr nounwind align 2 {
	; CHECK-LABEL: _ZN1m1nEv:			; CHECK-LABEL: _ZN1m1nEv:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: mflr r0			; CHECK-NEXT: mflr r0
				; CHECK-NEXT: std r29, -24(r1) # 8-byte Folded Spill
	; CHECK-NEXT: std r30, -16(r1) # 8-byte Folded Spill			; CHECK-NEXT: std r30, -16(r1) # 8-byte Folded Spill
	; CHECK-NEXT: std r0, 16(r1)			; CHECK-NEXT: std r0, 16(r1)
	; CHECK-NEXT: stdu r1, -48(r1)			; CHECK-NEXT: stdu r1, -64(r1)
	; CHECK-NEXT: mr r30, r3			; CHECK-NEXT: mr r30, r3
	; CHECK-NEXT: ld r4, 8(r30)			; CHECK-NEXT: li r3, 4
				; CHECK-NEXT: ld r4, 16(r30)
				; CHECK-NEXT: ld r5, 8(r30)
				; CHECK-NEXT: subfic r29, r3, 64
				; CHECK-NEXT: rldicl r3, r5, 60, 4
				; CHECK-NEXT: sld r4, r4, r29
	; CHECK-NEXT: lwz r5, 36(r30)			; CHECK-NEXT: lwz r5, 36(r30)
	; CHECK-NEXT: rldicl r4, r4, 60, 4			; CHECK-NEXT: or r3, r4, r3
	; CHECK-NEXT: rlwinm r3, r4, 31, 0, 0			; CHECK-NEXT: rlwinm r3, r3, 31, 0, 0
	; CHECK-NEXT: clrlwi r4, r5, 31			; CHECK-NEXT: clrlwi r4, r5, 31
	; CHECK-NEXT: or r4, r4, r3			; CHECK-NEXT: or r4, r4, r3
	; CHECK-NEXT: bl _ZN1llsE1d			; CHECK-NEXT: bl _ZN1llsE1d
	; CHECK-NEXT: nop			; CHECK-NEXT: nop
	; CHECK-NEXT: ld r3, 16(r30)			; CHECK-NEXT: ld r3, 16(r30)
	; CHECK-NEXT: ld r4, 8(r30)			; CHECK-NEXT: ld r4, 8(r30)
	; CHECK-NEXT: rldicl r4, r4, 60, 4			; CHECK-NEXT: rldicl r4, r4, 60, 4
	; CHECK-NEXT: sldi r3, r3, 60			; CHECK-NEXT: sld r3, r3, r29
	; CHECK-NEXT: or r3, r4, r3			; CHECK-NEXT: or r3, r3, r4
	; CHECK-NEXT: sldi r3, r3, 31			; CHECK-NEXT: sldi r3, r3, 31
	; CHECK-NEXT: clrldi r4, r3, 32			; CHECK-NEXT: clrldi r4, r3, 32
	; CHECK-NEXT: bl _ZN1llsE1d			; CHECK-NEXT: bl _ZN1llsE1d
	; CHECK-NEXT: nop			; CHECK-NEXT: nop
	; CHECK-NEXT: addi r1, r1, 48			; CHECK-NEXT: addi r1, r1, 64
	; CHECK-NEXT: ld r0, 16(r1)			; CHECK-NEXT: ld r0, 16(r1)
	; CHECK-NEXT: ld r30, -16(r1) # 8-byte Folded Reload			; CHECK-NEXT: ld r30, -16(r1) # 8-byte Folded Reload
				; CHECK-NEXT: ld r29, -24(r1) # 8-byte Folded Reload
	; CHECK-NEXT: mtlr r0			; CHECK-NEXT: mtlr r0
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	entry:			entry:
	%bc = getelementptr inbounds %struct.m.2.5.8.11, %struct.m.2.5.8.11* %this, i64 0, i32 2			%bc = getelementptr inbounds %struct.m.2.5.8.11, %struct.m.2.5.8.11* %this, i64 0, i32 2
	%0 = bitcast %struct.a.1.4.7.10* %bc to i216*			%0 = bitcast %struct.a.1.4.7.10* %bc to i216*
	%bf.load = load i216, i216* %0, align 8			%bf.load = load i216, i216* %0, align 8
	%bf.lshr = lshr i216 %bf.load, 4			%bf.lshr = lshr i216 %bf.load, 4
	%shl.i23 = shl i216 %bf.lshr, 31			%shl.i23 = shl i216 %bf.lshr, 31
	Show All 14 Lines