This is an archive of the discontinued LLVM Phabricator instance.


Differential D18405

[PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP
Closed, Public

Authored by amehsan on Mar 23 2016, 10:12 AM.

Details

Reviewers

tjablin
cycheng
kbarton
hfinkel
nemanjai

Summary

This patch checks whether an integer load is followed by one or more direct moves. If the integer load has no other users, the two instructions can be replaced with a single floating-point load. In effect, the patch disables overly aggressive use of the direct move instruction; the code that performs the replacement already exists and works properly.
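The rule in the summary can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Opcode`, `ToyNode`, and `directMoveLooksProfitable` are hypothetical stand-ins invented for illustration and are not LLVM types or the patch's actual function, which operates on SelectionDAG nodes.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for SelectionDAG opcodes; these are NOT the LLVM types.
enum class Opcode { Load, SIntToFP, UIntToFP, Add, Other };

// A value-producing node plus the opcodes of the nodes that use its value.
struct ToyNode {
  Opcode Op;
  std::vector<Opcode> ValueUsers;
};

// Returns true when the direct-move lowering should still be used for a
// conversion whose integer operand is produced by Origin.
bool directMoveLooksProfitable(const ToyNode &Origin) {
  // The operand is not a load, so there is no load to replace.
  if (Origin.Op != Opcode::Load)
    return true;

  // A user other than an int-to-FP conversion needs the integer value,
  // so the integer load has to stay and the direct move is kept.
  for (Opcode User : Origin.ValueUsers)
    if (User != Opcode::SIntToFP && User != Opcode::UIntToFP)
      return true;

  // Every user is a conversion: prefer the single FP-side load.
  return false;
}

// Usage loosely mirroring test2 and test3 from the test file in this review.
void exampleUsers() {
  ToyNode OnlyConverted{Opcode::Load, {Opcode::SIntToFP}};
  ToyNode AlsoAdded{Opcode::Load, {Opcode::SIntToFP, Opcode::Add}};
  assert(!directMoveLooksProfitable(OnlyConverted)); // FP load preferred
  assert(directMoveLooksProfitable(AlsoAdded));      // direct move kept
}
```

In this model a load whose value also feeds an integer add keeps the direct move, which matches the intent of the third test function in the review.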

Diff Detail

Event Timeline

amehsan updated this revision to Diff 51441. Mar 23 2016, 10:12 AM

amehsan retitled this revision from to [PPC] Prefer floating point load to integer load plus direct move, when there is no other user for the integer load.

amehsan updated this object.

amehsan added reviewers: hfinkel, kbarton, nemanjai, cycheng, tjablin.

amehsan added a subscriber: llvm-commits.

amehsan updated this object. Mar 23 2016, 10:14 AM

nemanjai added inline comments. Mar 23 2016, 1:28 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6302	I think this only makes sense if the direct-move path wouldn't be able to use a D-Form load. In that case, you're simply eliminating one direct move instruction. However, in cases where the direct-move path would be able to use a D-Form load, we've just added another load (for the X-Form lxsiwax). For [a contrived] example:
	float test (int *arr) { return arr[2]; }
	That being said, I don't know that this would be a show-stopper for this approach as the benefits may outweigh this issue. Also, I think you can probably predict whether the `ISD::LOAD` will be lowered to a D-Form load based on the inputs. I think the complete condition when we don't want the direct move would be:
	- The input is a load
	- The only use is an int-to-fp conversion
	- The offset from the base register is either zero, non-constant or does not fit in the 16-bit D field of a D-Form load
	I believe that all of these can be checked without making the code overly complicated.
6309	Just a minor nit - I think we generally prefer complete sentences for comments (capitalization, punctuation).
test/CodeGen/PowerPC/direct-move-profit.ll
5	You should probably remove the metadata here and (some) below. The triple on the command line and here conflict and I don't believe that all the metadata below is necessary.
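The three-part condition nemanjai proposes for line 6302 can be sketched as a standalone predicate. This is a sketch of the reviewer's suggestion only, not of what the patch implements; the names `fitsDFormDisplacement` and `shouldAvoidDirectMove` and the flag-based interface are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// True when Offset fits the signed 16-bit displacement (the D field)
// of a D-Form load.
bool fitsDFormDisplacement(std::int64_t Offset) {
  return Offset >= INT16_MIN && Offset <= INT16_MAX;
}

// Hypothetical summary of the suggested condition: avoid the direct move
// only when the integer load could not have been a D-Form load with a
// useful (non-zero, constant, in-range) displacement.
bool shouldAvoidDirectMove(bool InputIsLoad, bool OnlyUseIsIntToFP,
                           bool OffsetIsConstant, std::int64_t Offset) {
  if (!InputIsLoad || !OnlyUseIsIntToFP)
    return false;
  if (!OffsetIsConstant)
    return true;
  return Offset == 0 || !fitsDFormDisplacement(Offset);
}

// arr[2] with 4-byte ints gives a constant offset of 8, which fits the
// D field, so under this suggestion the direct-move path would be kept.
void exampleOffsets() {
  assert(!shouldAvoidDirectMove(true, true, true, 8));
  assert(shouldAvoidDirectMove(true, true, true, 0));
}
```

Under this reading, the contrived `arr[2]` example would keep the D-Form integer load plus direct move, while a load at offset zero or at a non-constant offset would use the FP-side load.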

amehsan added inline comments. Mar 28 2016, 10:17 AM

lib/Target/PowerPC/PPCISelLowering.cpp
6302	We discussed this over email. This is a summary of our conclusion. The current patch's logic for deciding whether or not to use a direct move is fine. The code generated for the example above (after this patch) is:
	0: 08 00 80 38 li r4,8
	4: 98 20 03 7c lxsiwax vs0,r3,r4
	8: e0 04 20 f0 xscvsxdsp vs1,vs0
	c: 20 00 80 4e blr
	Ideally it should be like what gcc does:
	0: 08 00 63 38 addi r3,r3,8
	4: ae 1e 20 7c lfiwax f1,0,r3
	8: 9c 0e 20 ec fcfids f1,f1
	c: 20 00 80 4e blr
	The second code pattern has lower register pressure, which is why it is better. The code that generates the first code pattern above was previously used only for Power7. After this patch, that code is also used for Power8. The example exposes an opportunity within the new code path. That will be addressed separately from this patch.

kbarton requested changes to this revision. Apr 4 2016, 1:10 PM

kbarton edited edge metadata.

kbarton added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
6302	OK, I'm confused by these comments. With this patch, what code sequence is going to be generated? If there is a further opportunity, why not address it with this patch? If it really makes sense to deal with it in a separate patch, please list the phabricator review (or bugzilla) for it here, so we can be sure to track it and maintain the association between the items.
6389–6390	I have a very slight preference for checking whether DirectMove is available before checking whether it is profitable. Changing the order of the checks will have virtually no impact, but it makes more sense to me.
test/CodeGen/PowerPC/direct-move-profit.ll
5	I agree. Please remove this, and the target triple line immediately below, and specify the correct triple on the RUN step.

This revision now requires changes to proceed. Apr 4 2016, 1:10 PM

amehsan added inline comments. Apr 4 2016, 1:19 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6302	The first sequence is the one generated after this patch. The second sequence is the one gcc currently generates. The sequence generated before this patch is:
	0: 08 00 63 80 lwz r3,8(r3)
	4: a6 01 03 7c mtfprwa f0,r3
	8: e0 04 20 f0 xscvsxdsp vs1,vs0
	c: 20 00 80 4e blr

amehsan added inline comments. Apr 4 2016, 2:31 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6302	I discussed this with Kit offline. I will address the requested change in the test case (making sure the metadata is correct) and the two nits suggested below. After final approval I will commit this change. The bug exposed by this patch (which was discussed in the comments above) will be addressed separately under the following bugzilla item: https://llvm.org/bugs/show_bug.cgi?id=27204

Removed data layout and redundant triple definition from the test case.
Capitalized comment in the code.
Minor change in the if statement which has the new condition. (Changed the order of two conditions)

mcrosier added a subscriber: mcrosier. Apr 5 2016, 12:19 PM

mcrosier added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
6309	Please add a space between the "//" and the comments.
6389	The calls to the Subtarget.* are likely very cheap, while the call to directMoveIsProfitable is less so. I'd suggest checking the directMoveIsProfitable() last.
test/CodeGen/PowerPC/direct-move-profit.ll
6	Please add a space after the semicolon.

amehsan added inline comments. Apr 5 2016, 12:28 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6389	Actually, the calls to Subtarget.* will always succeed unless someone is compiling for Power 7 or below (which is not very likely) or is doing some work with the testcases. So for almost all practical purposes, we will always call directMoveIsProfitable and make the decision based on that.
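The ordering question in this thread comes down to short-circuit evaluation of `&&`. The sketch below models mcrosier's suggested order (cheap subtarget checks first, profitability last); it is not the order in the diff shown on this page, which keeps the profitability check second. `ToySubtarget`, `toyDirectMoveIsProfitable`, and the call counter are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>

// Toy stand-in for the subtarget feature queries; NOT the LLVM class.
struct ToySubtarget {
  bool HasDirectMove;
  bool IsPPC64;
  bool HasFPCVT;
};

// Counts how often the (comparatively costly) profitability check runs.
int ProfitabilityChecks = 0;

bool toyDirectMoveIsProfitable(bool Answer) {
  ++ProfitabilityChecks;
  return Answer;
}

// Cheap feature checks first; && stops at the first false operand, so the
// profitability check only runs when all feature checks pass.
bool useDirectMoveLowering(const ToySubtarget &ST, bool Profitable) {
  return ST.HasDirectMove && ST.IsPPC64 && ST.HasFPCVT &&
         toyDirectMoveIsProfitable(Profitable);
}

void exampleOrdering() {
  ToySubtarget Power7Like{false, true, true};
  ToySubtarget Power8Like{true, true, true};
  ProfitabilityChecks = 0;
  assert(!useDirectMoveLowering(Power7Like, true));
  assert(ProfitabilityChecks == 0); // skipped: no direct moves available
  assert(useDirectMoveLowering(Power8Like, true));
  assert(ProfitabilityChecks == 1); // reached once on the Power8-like target
}
```

As amehsan notes, on a target where all three feature checks pass the profitability check runs in either order, so the reordering only saves work for older targets.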

adding space after comments

mcrosier added inline comments. Apr 5 2016, 12:38 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6389–6391	Fair enough. :)

amehsan added inline comments. Apr 5 2016, 12:47 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6389–6391	:)

LGTM

This revision is now accepted and ready to land. Apr 6 2016, 8:48 AM

amehsan retitled this revision from [PPC] Prefer floating point load to integer load plus direct move, when there is no other user for the integer load to [PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP. Apr 6 2016, 1:13 PM

amehsan edited edge metadata.

committed r265593.

amehsan closed this revision. Apr 6 2016, 8:05 PM

Revision Contents

Path	Size
lib/Target/PowerPC/PPCISelLowering.cpp	27 lines
test/CodeGen/PowerPC/direct-move-profit.ll	83 lines

Diff 52721

lib/Target/PowerPC/PPCISelLowering.cpp

[... 6,288 lines not shown ...]
   SDValue TF = DAG.getNode(ISD::TokenFactor, dl, MVT::Other,
                            NewResChain, DAG.getUNDEF(MVT::Other));
   assert(TF.getNode() != NewResChain.getNode() &&
          "A new TF really is required here");
 
   DAG.ReplaceAllUsesOfValueWith(ResChain, TF);
   DAG.UpdateNodeOperands(TF.getNode(), ResChain, NewResChain);
 }
 
+/// \brief Analyze profitability of direct move
+/// prefer float load to int load plus direct move
+/// when there is no integer use of int load
+static bool directMoveIsProfitable(const SDValue &Op) {
+  SDNode *Origin = Op.getOperand(0).getNode();
+  if (Origin->getOpcode() != ISD::LOAD)
+    return true;
+
+  for (SDNode::use_iterator UI = Origin->use_begin(),
+                            UE = Origin->use_end();
+       UI != UE; ++UI) {
+
+    //Only look at the users of the loaded value.
+    if (UI.getUse().get().getResNo() != 0)
+      continue;
+
+    if (UI->getOpcode() != ISD::SINT_TO_FP &&
+        UI->getOpcode() != ISD::UINT_TO_FP)
+      return true;
+  }
+
+  return false;
+}
+
 /// \brief Custom lowers integer to floating point conversions to use
 /// the direct move instructions available in ISA 2.07 to avoid the
 /// need for load/store combinations.
 SDValue PPCTargetLowering::LowerINT_TO_FPDirectMove(SDValue Op,
                                                     SelectionDAG &DAG,
                                                     SDLoc dl) const {
   assert((Op.getValueType() == MVT::f32 ||
           Op.getValueType() == MVT::f64) &&
[... 52 lines not shown; the next hunk is inside SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op, ...]
 
   if (Op.getOperand(0).getValueType() == MVT::i1)
     return DAG.getNode(ISD::SELECT, dl, Op.getValueType(), Op.getOperand(0),
                        DAG.getConstantFP(1.0, dl, Op.getValueType()),
                        DAG.getConstantFP(0.0, dl, Op.getValueType()));
 
   // If we have direct moves, we can do all the conversion, skip the store/load
   // however, without FPCVT we can't do most conversions.
-  if (Subtarget.hasDirectMove() && Subtarget.isPPC64() && Subtarget.hasFPCVT())
+  if (Subtarget.hasDirectMove() && directMoveIsProfitable(Op) &&
+      Subtarget.isPPC64() && Subtarget.hasFPCVT())
     return LowerINT_TO_FPDirectMove(Op, DAG, dl);
 
   assert((Op.getOpcode() == ISD::SINT_TO_FP || Subtarget.hasFPCVT()) &&
          "UINT_TO_FP is supported only with FPCVT");
 
   // If we have FCFIDS, then use it when converting to single-precision.
   // Otherwise, convert to double-precision and then round.
   unsigned FCFOp = (Subtarget.hasFPCVT() && Op.getValueType() == MVT::f32)
                        ? (Op.getOpcode() == ISD::UINT_TO_FP ? PPCISD::FCFIDUS
[... 5,208 lines not shown ...]

test/CodeGen/PowerPC/direct-move-profit.ll

This file was added.

				; RUN: llc -O2 -mcpu=pwr8 -mtriple=powerpc64le-unknown-unknown < %s | FileCheck %s

				; Function Attrs: norecurse nounwind
				define void @test1(float* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* nocapture readnone %c, i32 signext %n) #0 {

				;CHECK-LABEL: test1

				entry:
				%idxprom = sext i32 %n to i64
				%arrayidx = getelementptr inbounds i32, i32* %b, i64 %idxprom
				%0 = load i32, i32* %arrayidx, align 4, !tbaa !1
				%conv = sitofp i32 %0 to float
				%mul = fmul float %conv, 0x4002916880000000
				%arrayidx2 = getelementptr inbounds float, float* %a, i64 %idxprom
				store float %mul, float* %arrayidx2, align 4, !tbaa !5
				ret void

				;CHECK-NOT: mtvsrwa
				;CHECK-NOT: mtfprwa
				;CHECK: lxsiwax [[REG:[0-9]+]], {{.*}}
				;CHECK-NOT: mtvsrwa
				;CHECK-NOT: mtfprwa
				;CHECK: xscvsxdsp {{.*}}, [[REG]]
				;CHECK-NOT: mtvsrwa
				;CHECK-NOT: mtfprwa
				;CHECK: blr

				}

				; Function Attrs: norecurse nounwind readonly
				define float @test2(i32* nocapture readonly %b) #0 {

				;CHECK-LABEL: test2

				entry:
				%0 = load i32, i32* %b, align 4, !tbaa !1
				%conv = sitofp i32 %0 to float
				%mul = fmul float %conv, 0x40030A3D80000000
				ret float %mul

				;CHECK-NOT: mtvsrwa
				;CHECK-NOT: mtfprwa
				;CHECK: lxsiwax [[REG:[0-9]+]], {{.*}}
				;CHECK-NOT: mtvsrwa
				;CHECK-NOT: mtfprwa
				;CHECK: xscvsxdsp {{.*}}, [[REG]]
				;CHECK-NOT: mtvsrwa
				;CHECK-NOT: mtfprwa
				;CHECK: blr

				}

				; Function Attrs: norecurse nounwind
				define void @test3(float* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture %c, i32 signext %n) #0 {

				;CHECK-LABEL: test3

				entry:
				%idxprom = sext i32 %n to i64
				%arrayidx = getelementptr inbounds i32, i32* %b, i64 %idxprom
				%0 = load i32, i32* %arrayidx, align 4, !tbaa !1
				%conv = sitofp i32 %0 to float
				%mul = fmul float %conv, 0x4002916880000000
				%arrayidx2 = getelementptr inbounds float, float* %a, i64 %idxprom
				store float %mul, float* %arrayidx2, align 4, !tbaa !5
				%arrayidx6 = getelementptr inbounds i32, i32* %c, i64 %idxprom
				%1 = load i32, i32* %arrayidx6, align 4, !tbaa !1
				%add = add nsw i32 %1, %0
				store i32 %add, i32* %arrayidx6, align 4, !tbaa !1
				ret void

				;CHECK: mtvsrwa
				;CHECK: blr

				}

				!0 = !{!"clang version 3.9.0 (http://llvm.org/git/clang.git b88a395e7ba26c0fb96cd99a2a004d76f4f41d0c) (http://llvm.org/git/llvm.git 1ac3fbac0f5b037c17c0b0f9d271c32c4d7ca1b5)"}
				!1 = !{!2, !2, i64 0}
				!2 = !{!"int", !3, i64 0}
				!3 = !{!"omnipotent char", !4, i64 0}
				!4 = !{!"Simple C++ TBAA"}
				!5 = !{!6, !6, i64 0}
				!6 = !{!"float", !3, i64 0}