This is an archive of the discontinued LLVM Phabricator instance.

DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single precision even in case massv function is asked
ClosedPublic

Authored by masoud.ataei on May 28 2020, 10:45 AM.

Details

Summary

Here, I am proposing to add a special case for the MASSV powf4/powd2 functions (the SIMD counterparts of powf/pow in the MASSV library) in the MASSV pass, so that later optimizations such as the DAGCombiner conversion of pow(x,0.75) and pow(x,0.25) into a sequence of sqrt's still apply in the vector case, for both single and double precision. My reason for doing this is that the sqrt-based expansion of pow(x,0.75) and pow(x,0.25) is faster than powf4/powd2 on P8 and P9.

When MASSV functions are requested and the exponent of pow is 0.75 or 0.25, we get the sequence of sqrt's; for any other exponent we get the appropriate MASSV function.
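For reference, the identities behind that DAGCombiner expansion are shown below (a sketch of the algebra only; the combine itself also requires the appropriate fast-math flags and a constant splat exponent):

x^{0.75} = \sqrt{x}\cdot\sqrt{\sqrt{x}}, \qquad x^{0.25} = \sqrt{\sqrt{x}}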

Diff Detail

Event Timeline

masoud.ataei created this revision. May 28 2020, 10:45 AM
Whitney added inline comments. May 28 2020, 6:10 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10943

Move this line up, so that the case and the return are on the same line, like the others.

llvm/lib/Target/PowerPC/PPCISelLowering.h
1063

lowerToLibCall -> LowerToLibCall
follow the same convention as the other functions that start with Lower

What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

From what I see, it makes more sense to do it in an IR pass. Then you can query TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe we can do it in the PPC pass PPCLowerMASSVEntries, which already does something like:

__sind2_massv --> __sind2_P9 for a Power9 subtarget.

Does that make sense?

Addressing the review comments.

masoud.ataei marked 3 inline comments as done. Jun 1 2020, 9:08 AM

What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

From what I see, it makes more sense to do it in an IR pass. Then you can query TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe we can do it in the PPC pass PPCLowerMASSVEntries, which already does something like:

__sind2_massv --> __sind2_P9 for a Power9 subtarget.

Does that make sense?

I agree that, in general, it makes more sense to do this kind of conversion in an IR pass like PPCLowerMASSVEntries, and that is what we currently do in LLVM. But there is a problem: if we turn the llvm intrinsic into a libcall before a later optimization (like the one in the DAGCombiner), that optimization won't be triggered. In the case where I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass currently changes it to a __powf4_P* libcall, and later in the DAGCombiner we no longer get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). That is a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we get the optimization above when the exponent is 0.75, and MASSV calls otherwise.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10943

It was clang-format's suggestion to put the return on a new line. I agree it was ugly.

masoud.ataei marked an inline comment as done. Jun 1 2020, 9:09 AM

What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

From what I see, it makes more sense to do it in an IR pass. Then you can query TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe we can do it in the PPC pass PPCLowerMASSVEntries, which already does something like:

__sind2_massv --> __sind2_P9 for a Power9 subtarget.

Does that make sense?

I agree that, in general, it makes more sense to do this kind of conversion in an IR pass like PPCLowerMASSVEntries, and that is what we currently do in LLVM. But there is a problem: if we turn the llvm intrinsic into a libcall before a later optimization (like the one in the DAGCombiner), that optimization won't be triggered. In the case where I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass currently changes it to a __powf4_P* libcall, and later in the DAGCombiner we no longer get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). That is a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we get the optimization above when the exponent is 0.75, and MASSV calls otherwise.

If we expect the llvm.pow(x, 0.75) to be lowered as two sqrt and for others, they are libcall, can we just don't transform it as libcall for 0.75 in PPCLowerMASSVEntries? And it will be lowered as two sqrts in DAGCombine.

What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

From what I see, it makes more sense to do it in an IR pass. Then you can query TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe we can do it in the PPC pass PPCLowerMASSVEntries, which already does something like:

__sind2_massv --> __sind2_P9 for a Power9 subtarget.

Does that make sense?

I agree that, in general, it makes more sense to do this kind of conversion in an IR pass like PPCLowerMASSVEntries, and that is what we currently do in LLVM. But there is a problem: if we turn the llvm intrinsic into a libcall before a later optimization (like the one in the DAGCombiner), that optimization won't be triggered. In the case where I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass currently changes it to a __powf4_P* libcall, and later in the DAGCombiner we no longer get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). That is a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we get the optimization above when the exponent is 0.75, and MASSV calls otherwise.

If we expect llvm.pow(x, 0.75) to be lowered as two sqrts and everything else as a libcall, can we simply not transform the 0.75 case into a libcall in PPCLowerMASSVEntries? It would then be lowered as two sqrts in DAGCombine.

For pow to become __powf4_P8, there are two steps:

  1. in LoopVectorizePass, pow becomes __powf4_massv, then
  2. in PPCLowerMASSVEntries, __powf4_massv becomes __powf4_P8

So when we reach PPCLowerMASSVEntries, the pow intrinsic is already a libcall. I thought it would be really ugly to undo the LoopVectorizePass conversion in the PPCLowerMASSVEntries pass just for a special case.
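To make those two steps concrete, here is a hypothetical IR sketch (assuming -Ofast -fveclib=MASSV and a Power8 subtarget; value names are invented):

; after LoopVectorize with -fveclib=MASSV, the scalar llvm.pow.f32 call becomes
%v1 = call fast <4 x float> @__powf4_massv(<4 x float> %x, <4 x float> <float 0.75, float 0.75, float 0.75, float 0.75>)
; after PPCLowerMASSVEntries on a Power8 subtarget, the generic MASSV entry becomes
%v2 = call fast <4 x float> @__powf4_P8(<4 x float> %x, <4 x float> <float 0.75, float 0.75, float 0.75, float 0.75>)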

steven.zhang added a comment. Edited Jun 2 2020, 9:16 PM

What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

From what I see, it makes more sense to do it in an IR pass. Then you can query TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe we can do it in the PPC pass PPCLowerMASSVEntries, which already does something like:

__sind2_massv --> __sind2_P9 for a Power9 subtarget.

Does that make sense?

I agree that, in general, it makes more sense to do this kind of conversion in an IR pass like PPCLowerMASSVEntries, and that is what we currently do in LLVM. But there is a problem: if we turn the llvm intrinsic into a libcall before a later optimization (like the one in the DAGCombiner), that optimization won't be triggered. In the case where I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass currently changes it to a __powf4_P* libcall, and later in the DAGCombiner we no longer get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). That is a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we get the optimization above when the exponent is 0.75, and MASSV calls otherwise.

If we expect llvm.pow(x, 0.75) to be lowered as two sqrts and everything else as a libcall, can we simply not transform the 0.75 case into a libcall in PPCLowerMASSVEntries? It would then be lowered as two sqrts in DAGCombine.

For pow to become __powf4_P8, there are two steps:

  1. in LoopVectorizePass, pow becomes __powf4_massv, then
  2. in PPCLowerMASSVEntries, __powf4_massv becomes __powf4_P8

So when we reach PPCLowerMASSVEntries, the pow intrinsic is already a libcall. I thought it would be really ugly to undo the LoopVectorizePass conversion in the PPCLowerMASSVEntries pass just for a special case.

What I see from your test is that we are trying to lower the pow intrinsic (not the pow libcall) to two sqrts or __powf4_P8/9. That is different from the pow -> __powf4_massv -> __powf4_P8 case, which stays on the libcall path throughout.

If your intention is to have __powf4_massv transformed into two fsqrts when the argument is 0.75, the test for that is missing from this patch and I am not sure the patch can work. You might need to transform __powf4_massv back to the llvm.pow intrinsic somewhere like PartiallyInlineLibCallsLegacyPass when the argument is 0.75.

Maybe I am missing some background here; I just want a clear understanding of why we have to do it in the DAG. Thank you for your patience.

masoud.ataei added a comment. Edited Jun 3 2020, 8:14 AM

What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

From what I see, it makes more sense to do it in an IR pass. Then you can query TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe we can do it in the PPC pass PPCLowerMASSVEntries, which already does something like:

__sind2_massv --> __sind2_P9 for a Power9 subtarget.

Does that make sense?

I agree that, in general, it makes more sense to do this kind of conversion in an IR pass like PPCLowerMASSVEntries, and that is what we currently do in LLVM. But there is a problem: if we turn the llvm intrinsic into a libcall before a later optimization (like the one in the DAGCombiner), that optimization won't be triggered. In the case where I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass currently changes it to a __powf4_P* libcall, and later in the DAGCombiner we no longer get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). That is a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we get the optimization above when the exponent is 0.75, and MASSV calls otherwise.

If we expect llvm.pow(x, 0.75) to be lowered as two sqrts and everything else as a libcall, can we simply not transform the 0.75 case into a libcall in PPCLowerMASSVEntries? It would then be lowered as two sqrts in DAGCombine.

For pow to become __powf4_P8, there are two steps:

  1. in LoopVectorizePass, pow becomes __powf4_massv, then
  2. in PPCLowerMASSVEntries, __powf4_massv becomes __powf4_P8

So when we reach PPCLowerMASSVEntries, the pow intrinsic is already a libcall. I thought it would be really ugly to undo the LoopVectorizePass conversion in the PPCLowerMASSVEntries pass just for a special case.

What I see from your test is that we are trying to lower the pow intrinsic (not the pow libcall) to two sqrts or __powf4_P8/9. That is different from the pow -> __powf4_massv -> __powf4_P8 case, which stays on the libcall path throughout.

If your intention is to have __powf4_massv transformed into two fsqrts when the argument is 0.75, the test for that is missing from this patch and I am not sure the patch can work. You might need to transform __powf4_massv back to the llvm.pow intrinsic somewhere like PartiallyInlineLibCallsLegacyPass when the argument is 0.75.

Maybe I am missing some background here; I just want a clear understanding of why we have to do it in the DAG. Thank you for your patience.

Thank you for reviewing this patch.

I was also thinking that I would need to revert the LoopVectorizePass and PPCLowerMASSVEntries conversions back to the llvm.pow intrinsic before the sqrt optimization happens. But it seems that just by setting setOperationAction(ISD::FPOW, MVT::v4f32, Custom), we get the sqrt optimization when pow(x,0.75) is used in the code.

I tested with a C code sample too:

$ cat vst.c 
#include<math.h>
void my_vspow_075 (float y[], float x[]) {
  #pragma disjoint (*y, *x)

  float *xp=x, *yp=y;
  int i;

  for (i=0; i<1024; i++) {
     *yp=powf(*xp, 0.75);
     xp++;
     yp++;
  }
}

I compiled it with clang -Ofast -fveclib=MASSV vst.c -mllvm -print-after-all. I can see that after LoopVectorizePass, llvm.pow.f32 is converted to __powf4_massv, and after PPCLowerMASSVEntries, __powf4_massv is converted to __powf4_P8. But later we still get the conversion to sqrt's. This happens if you just set setOperationAction(ISD::FPOW, MVT::v4f32, Custom); with no other changes. The rest of the changes in PPCISelLowering.cpp are needed to handle powf(x,y) when y is not 0.75.

As for whether we apply these changes in LoopVectorizePass, PPCLowerMASSVEntries, or PPCISelLowering, I was not the only person who decided to handle it in PPCISelLowering. I would be happy if the others could speak up and help me answer your question correctly; their names are in the list of reviewers, so I won't tag them here again.

I tested with a C code sample too:

$ cat vst.c 
#include<math.h>
void my_vspow_075 (float y[], float x[]) {
  #pragma disjoint (*y, *x)

  float *xp=x, *yp=y;
  int i;

  for (i=0; i<1024; i++) {
     *yp=powf(*xp, 0.75);
     xp++;
     yp++;
  }
}

I compiled it with clang -Ofast -fveclib=MASSV vst.c -mllvm -print-after-all. I can see that after LoopVectorizePass, llvm.pow.f32 is converted to __powf4_massv, and after PPCLowerMASSVEntries, __powf4_massv is converted to __powf4_P8. But later we still get the conversion to sqrt's. This happens if you just set setOperationAction(ISD::FPOW, MVT::v4f32, Custom); with no other changes. The rest of the changes in PPCISelLowering.cpp are needed to handle powf(x,y) when y is not 0.75.

I don't see any difference in the assembly output for your example code with your patch. I see that we turn llvm.pow.f32 into two sqrts without this patch, not into __powf4_P8. I assume you want to turn __powf4_P8 into two sqrt's when the argument is 0.75. Please correct me if I misunderstand the intention.

@steven.zhang
I think I made a terrible mistake. When I tested with my C code, I didn't have the if (ClVectorLibrary == TargetLibraryInfoImpl::MASSV) guard on

if (ClVectorLibrary == TargetLibraryInfoImpl::MASSV)
  setOperationAction(ISD::FPOW, MVT::v4f32, Custom);

and for some reason, if I move this check inside the function LowerFPOWMASSV, it works correctly. So I am updating the patch. Thank you for catching it.

This fix is not ideal, though. If we want to use other vector libraries like Accelerate or SVML on PowerPC in the future, this code would prevent generating the correct libcalls for them. Any idea how to fix this issue?

steven.zhang added a comment. Edited Jun 4 2020, 11:28 PM

This fix is not ideal, though. If we want to use other vector libraries like Accelerate or SVML on PowerPC in the future, this code would prevent generating the correct libcalls for them. Any idea how to fix this issue?

Marking FPOW as custom changes the cost of vectorizing llvm.pow.f32. The loop vectorizer will then vectorize it as llvm.pow.v4f32 even when MASSV is disabled, because we are telling it that FPOW (llvm.pow.v4f32) is cheap to lower in the backend, which is not always true. That would be a regression for the code path with MASSV disabled when the argument is not 0.75.

I think the best way to do this is to adjust the loop vectorizer cost model for PowerPC: if the argument is 0.75, the cost of vectorizing llvm.pow.f32 is small whether or not MASSV is enabled, so we would always get llvm.pow.v4f32. But I don't know how easy that is to do.

Another, easier way is to turn the MASSV function back into the intrinsic, since that is the motivation of this patch as you described it, which makes sense to me. What do you think?

diff --git a/llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp b/llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp
index 429b8a31fbe9..74bd31b0b044 100644
--- a/llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp
+++ b/llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp
@@ -105,6 +105,21 @@ bool PPCLowerMASSVEntries::lowerMASSVCall(CallInst *CI, Function &Func,
   if (CI->use_empty())
     return false;

+  // FIXME - add necessary fast math flag check here.
+  if (Func.getName() == "__powf4_massv") {
+    if (Constant *Exp = dyn_cast<Constant>(CI->getArgOperand(1))) {
+      if (ConstantFP *CFP = dyn_cast_or_null<ConstantFP>(Exp->getSplatValue())) {
+        // If the argument is 0.75, it is cheaper to turn it into the pow
+        // intrinsic so that it can be optimized as two sqrt's.
+        if (CFP->isExactlyValue(0.75)) {
+          CI->setCalledFunction(Intrinsic::getDeclaration(&M, Intrinsic::pow,
+                                                          CI->getType()));
+          return true;
+        }
+      }
+    }
+  }
+
   std::string MASSVEntryName = createMASSVFuncName(Func, Subtarget);
   FunctionCallee FCache = M.getOrInsertFunction(
       MASSVEntryName, Func.getFunctionType(), Func.getAttributes());
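If that approach is taken, the intended effect at the IR level would be roughly the following (an illustrative sketch only, not taken from the patch, and assuming the call already carries the fast-math flags the pass would check for):

; before PPCLowerMASSVEntries:
%r = call ninf nsz afn <4 x float> @__powf4_massv(<4 x float> %x, <4 x float> <float 0.75, float 0.75, float 0.75, float 0.75>)

; after: for the 0.75 splat the callee is redirected to the intrinsic instead of
; __powf4_P8/P9, and the DAGCombiner later expands it into sqrt's
%r = call ninf nsz afn <4 x float> @llvm.pow.v4f32(<4 x float> %x, <4 x float> <float 0.75, float 0.75, float 0.75, float 0.75>)

declare <4 x float> @llvm.pow.v4f32(<4 x float>, <4 x float>)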

Moved the changes entirely from PPCISelLowering.cpp to PPCLowerMASSVEntries.cpp (the MASSV pass) to address the reviewer comments.

steven.zhang added inline comments. Jun 9 2020, 8:27 PM
llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp
103 ↗(On Diff #269511)

The nsz flag doesn't matter here; it is only needed for the 0.25 case. But maybe you can extend this to support both 0.75 and 0.25, as we did in DAGCombine, or remove the nsz check here.

llvm/test/CodeGen/PowerPC/pow_massv_0.75exp.ll
49

Don't add the fast flag, as it is too general. Add ninf and afn instead, and add another, negative case that doesn't have these two flags.
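For example, the positive and negative cases might look roughly like this (an illustrative sketch only, not the patch's actual test; value names are invented):

; positive case: 0.75 splat exponent with ninf and afn, expected to become sqrt's
%a = call ninf afn <4 x float> @__powf4_massv(<4 x float> %x, <4 x float> <float 0.75, float 0.75, float 0.75, float 0.75>)

; negative case: same exponent but no fast-math flags, so it must stay a MASSV
; libcall (__powf4_P8/P9)
%b = call <4 x float> @__powf4_massv(<4 x float> %x, <4 x float> <float 0.75, float 0.75, float 0.75, float 0.75>)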

I added support for the case where the exponent is 0.25, in addition to support for the double-precision cases.

masoud.ataei retitled this revision from DAGCombiner optimization for pow(x,0.75) even in case massv function is asked to DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single precision even in case massv function is asked. Jun 10 2020, 12:22 PM
masoud.ataei edited the summary of this revision. (Show Details)

Sorry, I forgot to run clang-format.

steven.zhang accepted this revision. Jun 10 2020, 3:55 PM

LGTM now. Thank you for your patience.

This revision is now accepted and ready to land. Jun 10 2020, 3:55 PM
This revision was automatically updated to reflect the committed changes.