This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Target/PowerPC/
-
Target/
-
PowerPC/
1
PPCLowerMASSVEntries.cpp
-
test/CodeGen/PowerPC/
-
CodeGen/
-
PowerPC/
-
pow_massv_075_025exp.ll
-
powf_massv_075_025exp.ll

Differential D80744

DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single precision even in case massv function is asked
ClosedPublic

Authored by masoud.ataei on May 28 2020, 10:45 AM.

Download Raw Diff

Details

Reviewers

nemanjai
jsji
kbarton
etiotto
pjeeva01
Whitney
steven.zhang

Group Reviewers

Restricted Project

Commits

rG2d038370bb6b: DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single…

Summary

Here, I am proposing to add an special case for massv powf4/powd2 function (SIMD counterpart of powf/pow function in MASSV library) in MASSV pass to get later optimizations like conversion from pow(x,0.75) and pow(x,0.25) for double and single precision to sequence of sqrt's in the DAGCombiner in vector float case. My reason for doing this is: the optimized pow(x,0.75) and pow(x,0.25) for double and single precision to sequence of sqrt's is faster than powf4/powd2 on P8 and P9.

In case MASSV functions is called, and if the exponent of pow is 0.75 or 0.25, we will get the sequence of sqrt's and if exponent is not 0.75 or 0.25 we will get the appropriate MASSV function.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

masoud.ataei created this revision.May 28 2020, 10:45 AM

Herald added subscribers: llvm-commits, hiraditya. · View Herald TranscriptMay 28 2020, 10:45 AM

Harbormaster failed remote builds in B58266: Diff 266943!May 28 2020, 11:33 AM

Whitney added inline comments.May 28 2020, 6:10 PM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10943 ↗	(On Diff #266943)	move this line up, so `case` and `return` on the same line like others.
llvm/lib/Target/PowerPC/PPCISelLowering.h
1063 ↗	(On Diff #266943)	`lowerToLibCall` -> `LowerToLibCall` follow the same convention as the other function startswith `Lower`

What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

It makes more sense to do it in the IR pass from what I see. And then, you can query the TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe, we can do it in ppc pass: PPCLowerMASSVEntries which did something like:

__sind2_massv --> __sind2_P9 for a Power9 subtarget.

Does it make sense ?

Addressing the reviews

In D80744#2062144, @steven.zhang wrote:
What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

It makes more sense to do it in the IR pass from what I see. And then, you can query the TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe, we can do it in ppc pass: PPCLowerMASSVEntries which did something like:
__sind2_massv --> __sind2_P9 for a Power9 subtarget.
Does it make sense ?

I agree in general it makes more sense to do this kind of conversions in an IR pass like PPCLowerMASSVEntries. This is what we are currently doing in LLVM but there is a problem here. If we change the llvm intrinsic to libcall earlier than a later optimization (like in DAGCombiner) the later optimization won't be triggered. In the case that I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass will currently change it to __powf4_P* libcall then later in the DAGCombiner we will not get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). So that's a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we will we will get above optimization when exponent is 0.75 and MASSV calls otherwise.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10943 ↗	(On Diff #266943)	It was the clang-format suggestion to put the return in the new line. I agree it was too ugly.

masoud.ataei marked an inline comment as done.Jun 1 2020, 9:09 AM

Harbormaster failed remote builds in B58628: Diff 267638!Jun 1 2020, 10:12 AM

In D80744#2066590, @masoud.ataei wrote:
In D80744#2062144, @steven.zhang wrote:
What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

It makes more sense to do it in the IR pass from what I see. And then, you can query the TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe, we can do it in ppc pass: PPCLowerMASSVEntries which did something like:
__sind2_massv --> __sind2_P9 for a Power9 subtarget.
Does it make sense ?
I agree in general it makes more sense to do this kind of conversions in an IR pass like PPCLowerMASSVEntries. This is what we are currently doing in LLVM but there is a problem here. If we change the llvm intrinsic to libcall earlier than a later optimization (like in DAGCombiner) the later optimization won't be triggered. In the case that I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass will currently change it to __powf4_P* libcall then later in the DAGCombiner we will not get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). So that's a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we will we will get above optimization when exponent is 0.75 and MASSV calls otherwise.

If we expect the llvm.pow(x, 0.75) to be lowered as two sqrt and for others, they are libcall, can we just don't transform it as libcall for 0.75 in PPCLowerMASSVEntries? And it will be lowered as two sqrts in DAGCombine.

In D80744#2067821, @steven.zhang wrote:
In D80744#2066590, @masoud.ataei wrote:
In D80744#2062144, @steven.zhang wrote:
What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

It makes more sense to do it in the IR pass from what I see. And then, you can query the TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe, we can do it in ppc pass: PPCLowerMASSVEntries which did something like:
__sind2_massv --> __sind2_P9 for a Power9 subtarget.
Does it make sense ?
I agree in general it makes more sense to do this kind of conversions in an IR pass like PPCLowerMASSVEntries. This is what we are currently doing in LLVM but there is a problem here. If we change the llvm intrinsic to libcall earlier than a later optimization (like in DAGCombiner) the later optimization won't be triggered. In the case that I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass will currently change it to __powf4_P* libcall then later in the DAGCombiner we will not get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). So that's a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we will we will get above optimization when exponent is 0.75 and MASSV calls otherwise.
If we expect the llvm.pow(x, 0.75) to be lowered as two sqrt and for others, they are libcall, can we just don't transform it as libcall for 0.75 in PPCLowerMASSVEntries? And it will be lowered as two sqrts in DAGCombine.

For pow to be __powf4_P8, there are two step:

in LoopVectorizePass, pow becomes __powf4_massv, then
in PPCLowerMASSVEntries __powf4_massv becomes __powf4_P8

So when we reach PPCLowerMASSVEntries the pow intrinsic is already a libcall. I thought it is really ugly to undo the LoopVectorizePass conversion in PPCLowerMASSVEntries pass when there is an special case.

In D80744#2068900, @masoud.ataei wrote:
In D80744#2067821, @steven.zhang wrote:
In D80744#2066590, @masoud.ataei wrote:
In D80744#2062144, @steven.zhang wrote:
What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

It makes more sense to do it in the IR pass from what I see. And then, you can query the TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe, we can do it in ppc pass: PPCLowerMASSVEntries which did something like:
__sind2_massv --> __sind2_P9 for a Power9 subtarget.
Does it make sense ?
I agree in general it makes more sense to do this kind of conversions in an IR pass like PPCLowerMASSVEntries. This is what we are currently doing in LLVM but there is a problem here. If we change the llvm intrinsic to libcall earlier than a later optimization (like in DAGCombiner) the later optimization won't be triggered. In the case that I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass will currently change it to __powf4_P* libcall then later in the DAGCombiner we will not get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). So that's a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we will we will get above optimization when exponent is 0.75 and MASSV calls otherwise.
If we expect the llvm.pow(x, 0.75) to be lowered as two sqrt and for others, they are libcall, can we just don't transform it as libcall for 0.75 in PPCLowerMASSVEntries? And it will be lowered as two sqrts in DAGCombine.
For pow to be __powf4_P8, there are two step:

in LoopVectorizePass, pow becomes __powf4_massv, then

in PPCLowerMASSVEntries __powf4_massv becomes __powf4_P8

So when we reach PPCLowerMASSVEntries the pow intrinsic is already a libcall. I thought it is really ugly to undo the LoopVectorizePass conversion in PPCLowerMASSVEntries pass when there is an special case.

As what I see from your test is that, we are trying to lower the pow intrinsic(not pow libcall) to two sqrts or __powf4_P8/9. It is different from the case that, pow -> __powf4_massv -> __powf4_P8 as they are all libcall path.

If your intention is to have __powf4_massv transformed to two fsqrts if argument is 0.75, the test is missing in this patch and I am not sure if this patch could work. You might need to transform the __powf4_massv to llvm.pow intrinsic at some place like PartiallyInlineLibCallsLegacyPass if the argument is 0.75

Maybe, I miss some background here and I just want to get a clear understanding of the reason why we have to do it in DAG. Thank you for your patience.

In D80744#2070174, @steven.zhang wrote:
In D80744#2068900, @masoud.ataei wrote:
In D80744#2067821, @steven.zhang wrote:
In D80744#2066590, @masoud.ataei wrote:
In D80744#2062144, @steven.zhang wrote:
What we are doing is as follows:
llvm.pow(IR) --> FPOW(ISD) --> __powf4_P8/9(ISD/IR)

It makes more sense to do it in the IR pass from what I see. And then, you can query the TargetLibraryInfoImpl::isFunctionVectorizable(name) instead of the option. Maybe, we can do it in ppc pass: PPCLowerMASSVEntries which did something like:
__sind2_massv --> __sind2_P9 for a Power9 subtarget.
Does it make sense ?
I agree in general it makes more sense to do this kind of conversions in an IR pass like PPCLowerMASSVEntries. This is what we are currently doing in LLVM but there is a problem here. If we change the llvm intrinsic to libcall earlier than a later optimization (like in DAGCombiner) the later optimization won't be triggered. In the case that I am proposing the change, if we have pow(x,0.75) in the code, the PPCLowerMASSVEntries pass will currently change it to __powf4_P* libcall then later in the DAGCombiner we will not get the optimization pow(x,0.75) --> sqrt(x)*sqrt(sqrt(x)). So that's a problem, because sqrt(x)*sqrt(sqrt(x)) is faster.

What I am proposing is to move the conversion for powf4 to late in the compiler pipeline. With this change we will we will get above optimization when exponent is 0.75 and MASSV calls otherwise.
If we expect the llvm.pow(x, 0.75) to be lowered as two sqrt and for others, they are libcall, can we just don't transform it as libcall for 0.75 in PPCLowerMASSVEntries? And it will be lowered as two sqrts in DAGCombine.
For pow to be __powf4_P8, there are two step:

in LoopVectorizePass, pow becomes __powf4_massv, then

in PPCLowerMASSVEntries __powf4_massv becomes __powf4_P8

So when we reach PPCLowerMASSVEntries the pow intrinsic is already a libcall. I thought it is really ugly to undo the LoopVectorizePass conversion in PPCLowerMASSVEntries pass when there is an special case.
As what I see from your test is that, we are trying to lower the pow intrinsic(not pow libcall) to two sqrts or __powf4_P8/9. It is different from the case that, pow -> __powf4_massv -> __powf4_P8 as they are all libcall path.

If your intention is to have __powf4_massv transformed to two fsqrts if argument is 0.75, the test is missing in this patch and I am not sure if this patch could work. You might need to transform the __powf4_massv to llvm.pow intrinsic at some place like PartiallyInlineLibCallsLegacyPass if the argument is 0.75

Maybe, I miss some background here and I just want to get a clear understanding of the reason why we have to do it in DAG. Thank you for your patience.

Thank you for reviewing this patch.

I was also thinking that I needed to revert the conversion in LoopVectorizePass and PPCLowerMASSVEntries to llvm intrinsic llvm.pow before optimization to sqrt's happening. But it seems just with setting the operation action to custom setOperationAction(ISD::FPOW, MVT::v4f32, Custom); we will get the sqrt's optimization when pow(x,0.75) is used in the code.

I tested with a c code too

$ cat vst.c 
#include<math.h>
void my_vspow_075 (float y[], float x[]) {
  #pragma disjoint (*y, *x)

  float *xp=x, *yp=y;
  int i;

  for (i=0; i<1024; i++) {
     *yp=powf(*xp, 0.75);
     xp++;
     yp++;
  }
}

compile it with clang -Ofast -fveclib=MASSV vst.c -mllvm -print-after-all. I can see that after LoopVectorizePass, llvm.pow.f32 is converted to __powf4_massv and after PPCLowerMASSVEntries __powf4_massv is converted to __powf4_P8. But later we will get conversion to sqrt's. This is happening if you just set setOperationAction(ISD::FPOW, MVT::v4f32, Custom); no other changes. The rest of changes in PPCISelLowring.cpp is needed to handle the cases of powf(x,y) when y is not 0.75.

About whether we apply this changes in LoopVectorizePass, PPCLowerMASSVEntries or in PPCISelLowing, I was not the only person that decided to handle it in PPCISelLowing. I will be happy if they can speak up and help me to correctly answer your question. Their names are in the list of reviewers so I won't tag them here again.

I tested with a c code too
$ cat vst.c 
#include<math.h>
void my_vspow_075 (float y[], float x[]) {
  #pragma disjoint (*y, *x)

  float *xp=x, *yp=y;
  int i;

  for (i=0; i<1024; i++) {
     *yp=powf(*xp, 0.75);
     xp++;
     yp++;
  }
}
compile it with clang -Ofast -fveclib=MASSV vst.c -mllvm -print-after-all. I can see that after LoopVectorizePass, llvm.pow.f32 is converted to __powf4_massv and after PPCLowerMASSVEntries __powf4_massv is converted to __powf4_P8. But later we will get conversion to sqrt's. This is happening if you just set setOperationAction(ISD::FPOW, MVT::v4f32, Custom); no other changes. The rest of changes in PPCISelLowring.cpp is needed to handle the cases of powf(x,y) when y is not 0.75.

I don't see any difference of the assembly output from your example code with your patch. I see that we turn the llvm.pow.f32 into two sqrts w/o this patch, not __powf4_P8. I assume that, you want to turn __powf4_P8 into two sqrt's if the arguments is 0.75. Please correct me if I misunderstand the intention.

@steven.zhang
I think I had done a terrible mistake. When I tested with my c code, I didn't have if (ClVectorLibrary == TargetLibraryInfoImpl::MASSV) on

if (ClVectorLibrary == TargetLibraryInfoImpl::MASSV)
  setOperationAction(ISD::FPOW, MVT::v4f32, Custom);

and for some reason if I move this check inside the function LowerFPOWMASSV it works good. So I am updating the patch. Thank you for catching it.

Although, this fix is not ideal. In case we want to use other vector libraries like Accelerate or SVML on PowerPC in future, this code is preventing them to generate accurate libcall for them. Any idea how to fix this issue?

Harbormaster failed remote builds in B59117: Diff 268538!Jun 4 2020, 12:43 PM

Although, this fix is not ideal. In case we want to use other vector libraries like Accelerate or SVML on PowerPC in future, this code is preventing them to generate accurate libcall for them. Any idea how to fix this issue?

As you mark the FPOW as custom which will change the cost when vectorize the llvm.pow.f32. So it will vectorize it as llvm.pow.v4f32 even MASSV is disabled, as we are telling the loop vectorizer that it is cheap in Backend to lower FPOW(llvm.pow.v4f32), which is not always true. That will bring regression for the code path with MASSV disabled if the argument is not 0.75.

I think, the best way to do this is to adjust the loop vectorizer cost model for PowerPC. If the argument is 0.75, the cost of vectorizing llvm.pow.f32 is small, no matter MASSV enabled or not. So that, we will always get llvm.pow.v4f32. But I don't know if it is easy to do it.

Another easier way is to turn the massv function to intrinsic, as that is the motivation of this patch as you described which makes sense to me. What do you think ?

diff --git a/llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp b/llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp
index 429b8a31fbe9..74bd31b0b044 100644
--- a/llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp
+++ b/llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp
@@ -105,6 +105,21 @@ bool PPCLowerMASSVEntries::lowerMASSVCall(CallInst *CI, Function &Func,
   if (CI->use_empty())
     return false;

+  // FIXME - add necessary fast math flag check here.
+  if (Func.getName() == "__powf4_massv") {
+    if (Constant *Exp = dyn_cast<Constant>(CI->getArgOperand(1))) {
+      if (ConstantFP *CFP = dyn_cast<ConstantFP>(Exp->getSplatValue())) {
+        // If the argument is 0.75, it is cheaper to turn it into pow intrinsic
+        // so that it could be optimzed as two sqrt's.
+        if (CFP->isExactlyValue(0.75)) {
+          CI->setCalledFunction(Intrinsic::getDeclaration(&M, Intrinsic::pow,
+                                                          CI->getType()));
+          return true;
+        }
+      }
+    }
+  }
+
   std::string MASSVEntryName = createMASSVFuncName(Func, Subtarget);
   FunctionCallee FCache = M.getOrInsertFunction(
       MASSVEntryName, Func.getFunctionType(), Func.getAttributes());

Moving completely the changes from PPCISelLowring.cpp to PPCLowerMASSVEntries.cpp (MASSV pass) to address the reviewer comments.

Harbormaster failed remote builds in B59622: Diff 269511!Jun 9 2020, 7:06 AM

steven.zhang added inline comments.Jun 9 2020, 8:27 PM

llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp
103	The flag nsz don't matter here. It is only used to handle the 0.25 case. But maybe, you can extend it to support both 0.75 and 0.25 as what we did in DAGCombine or remove the nsz check here.
llvm/test/CodeGen/PowerPC/pow_massv_0.75exp.ll
48 ↗	(On Diff #269511)	Don't add the flag fast as it is too general. But adding the ninf and afn and add another negative case that didn't have these to flags.

I added support for the case when exponent is 0.25 in addition to support for double precision cases.

masoud.ataei retitled this revision from DAGCombiner optimization for pow(x,0.75) even in case massv function is asked to DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single precision even in case massv function is asked.Jun 10 2020, 12:22 PM

masoud.ataei edited the summary of this revision. (Show Details)

Sorry, I forgot the clang-format.

Harbormaster failed remote builds in B59862: Diff 269938!Jun 10 2020, 1:22 PM

Harbormaster completed remote builds in B59865: Diff 269945.Jun 10 2020, 2:28 PM

LGTM now. Thank you for your patient.

This revision is now accepted and ready to land.Jun 10 2020, 3:55 PM

Closed by commit rG2d038370bb6b: DAGCombiner optimization for pow(x,0.75) and pow(x,0.25) on double and single… (authored by masoud.ataei). · Explain WhyJun 12 2020, 7:33 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

lib/

Target/

PowerPC/

PPCLowerMASSVEntries.cpp

33 lines

test/

CodeGen/

PowerPC/

pow_massv_075_025exp.ll

166 lines

powf_massv_075_025exp.ll

166 lines

Diff 270397

llvm/lib/Target/PowerPC/PPCLowerMASSVEntries.cpp

Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
}		}

private:		private:
static bool isMASSVFunc(StringRef Name);		static bool isMASSVFunc(StringRef Name);
static StringRef getCPUSuffix(const PPCSubtarget *Subtarget);		static StringRef getCPUSuffix(const PPCSubtarget *Subtarget);
static std::string createMASSVFuncName(Function &Func,		static std::string createMASSVFuncName(Function &Func,
const PPCSubtarget *Subtarget);		const PPCSubtarget *Subtarget);
		bool handlePowSpecialCases(CallInst *CI, Function &Func, Module &M);
bool lowerMASSVCall(CallInst *CI, Function &Func, Module &M,		bool lowerMASSVCall(CallInst *CI, Function &Func, Module &M,
const PPCSubtarget *Subtarget);		const PPCSubtarget *Subtarget);
};		};

} // namespace		} // namespace

/// Checks if the specified function name represents an entry in the MASSV		/// Checks if the specified function name represents an entry in the MASSV
/// library.		/// library.
Show All 26 Lines
PPCLowerMASSVEntries::createMASSVFuncName(Function &Func,		PPCLowerMASSVEntries::createMASSVFuncName(Function &Func,
const PPCSubtarget *Subtarget) {		const PPCSubtarget *Subtarget) {
StringRef Suffix = getCPUSuffix(Subtarget);		StringRef Suffix = getCPUSuffix(Subtarget);
auto GenericName = Func.getName().drop_back(MASSVSuffixLength).str();		auto GenericName = Func.getName().drop_back(MASSVSuffixLength).str();
std::string MASSVEntryName = GenericName + Suffix.str();		std::string MASSVEntryName = GenericName + Suffix.str();
return MASSVEntryName;		return MASSVEntryName;
}		}

		/// If there are proper fast-math flags, this function creates llvm.pow
		/// intrinsics when the exponent is 0.25 or 0.75.
		bool PPCLowerMASSVEntries::handlePowSpecialCases(CallInst *CI, Function &Func,
		Module &M) {
		steven.zhangUnsubmitted Not Done Reply Inline Actions The flag nsz don't matter here. It is only used to handle the 0.25 case. But maybe, you can extend it to support both 0.75 and 0.25 as what we did in DAGCombine or remove the nsz check here. steven.zhang: The flag nsz don't matter here. It is only used to handle the 0.25 case. But maybe, you can…
		if (Func.getName() != "__powf4_massv" && Func.getName() != "__powd2_massv")
		return false;

		if (Constant *Exp = dyn_cast<Constant>(CI->getArgOperand(1)))
		if (ConstantFP *CFP = dyn_cast<ConstantFP>(Exp->getSplatValue())) {
		// If the argument is 0.75 or 0.25 it is cheaper to turn it into pow
		// intrinsic so that it could be optimzed as sequence of sqrt's.
		if (!CI->hasNoInfs() \|\| !CI->hasApproxFunc())
		return false;

		if (!CFP->isExactlyValue(0.75) && !CFP->isExactlyValue(0.25))
		return false;

		if (CFP->isExactlyValue(0.25) && !CI->hasNoSignedZeros())
		return false;

		CI->setCalledFunction(
		Intrinsic::getDeclaration(&M, Intrinsic::pow, CI->getType()));
		return true;
		}

		return false;
		}

/// Lowers generic MASSV entries to PowerPC subtarget-specific MASSV entries.		/// Lowers generic MASSV entries to PowerPC subtarget-specific MASSV entries.
/// e.g.: __sind2_massv --> __sind2_P9 for a Power9 subtarget.		/// e.g.: __sind2_massv --> __sind2_P9 for a Power9 subtarget.
/// Both function prototypes and their callsites are updated during lowering.		/// Both function prototypes and their callsites are updated during lowering.
bool PPCLowerMASSVEntries::lowerMASSVCall(CallInst *CI, Function &Func,		bool PPCLowerMASSVEntries::lowerMASSVCall(CallInst *CI, Function &Func,
Module &M,		Module &M,
const PPCSubtarget *Subtarget) {		const PPCSubtarget *Subtarget) {
if (CI->use_empty())		if (CI->use_empty())
return false;		return false;

		// Handling pow(x, 0.25), pow(x, 0.75), powf(x, 0.25), powf(x, 0.75)
		if (handlePowSpecialCases(CI, Func, M))
		return true;

std::string MASSVEntryName = createMASSVFuncName(Func, Subtarget);		std::string MASSVEntryName = createMASSVFuncName(Func, Subtarget);
FunctionCallee FCache = M.getOrInsertFunction(		FunctionCallee FCache = M.getOrInsertFunction(
MASSVEntryName, Func.getFunctionType(), Func.getAttributes());		MASSVEntryName, Func.getFunctionType(), Func.getAttributes());

CI->setCalledFunction(FCache);		CI->setCalledFunction(FCache);

return true;		return true;
}		}
▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/pow_massv_075_025exp.ll

This file was added.

				; RUN: llc -vector-library=MASSV < %s -mtriple=powerpc64le-unknown-unknown -mcpu=pwr9 \| FileCheck -check-prefixes=CHECK-PWR9 %s
				; RUN: llc -vector-library=MASSV < %s -mtriple=powerpc64le-unknown-unknown -mcpu=pwr8 \| FileCheck -check-prefixes=CHECK-PWR8 %s

				; Exponent is a variable
				define void @my_vpow_var(double* nocapture %z, double* nocapture readonly %y, double* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_var
				; CHECK-PWR9: bl __powd2_P9
				; CHECK-PWR8: bl __powd2_P8
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr double, double* %z, i64 %index
				%next.gep31 = getelementptr double, double* %y, i64 %index
				%next.gep32 = getelementptr double, double* %x, i64 %index
				%0 = bitcast double* %next.gep32 to <2 x double>*
				%wide.load = load <2 x double>, <2 x double>* %0, align 8
				%1 = bitcast double* %next.gep31 to <2 x double>*
				%wide.load33 = load <2 x double>, <2 x double>* %1, align 8
				%2 = call ninf afn nsz <2 x double> @__powd2_massv(<2 x double> %wide.load, <2 x double> %wide.load33)
				%3 = bitcast double* %next.gep to <2 x double>*
				store <2 x double> %2, <2 x double>* %3, align 8
				%index.next = add i64 %index, 2
				%4 = icmp eq i64 %index.next, 1024
				br i1 %4, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is a constant != 0.75 and !=0.25
				define void @my_vpow_const(double* nocapture %y, double* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_const
				; CHECK-PWR9: bl __powd2_P9
				; CHECK-PWR8: bl __powd2_P8
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr double, double* %y, i64 %index
				%next.gep19 = getelementptr double, double* %x, i64 %index
				%0 = bitcast double* %next.gep19 to <2 x double>*
				%wide.load = load <2 x double>, <2 x double>* %0, align 8
				%1 = call ninf afn nsz <2 x double> @__powd2_massv(<2 x double> %wide.load, <2 x double> <double 7.600000e-01, double 7.600000e-01>)
				%2 = bitcast double* %next.gep to <2 x double>*
				store <2 x double> %1, <2 x double>* %2, align 8
				%index.next = add i64 %index, 2
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is 0.75
				define void @my_vpow_075(double* nocapture %y, double* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_075
				; CHECK-NOT: bl __powd2_P{{[8,9]}}
				; CHECK: xvrsqrtesp
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr double, double* %y, i64 %index
				%next.gep19 = getelementptr double, double* %x, i64 %index
				%0 = bitcast double* %next.gep19 to <2 x double>*
				%wide.load = load <2 x double>, <2 x double>* %0, align 8
				%1 = call ninf afn <2 x double> @__powd2_massv(<2 x double> %wide.load, <2 x double> <double 7.500000e-01, double 7.500000e-01>)
				%2 = bitcast double* %next.gep to <2 x double>*
				store <2 x double> %1, <2 x double>* %2, align 8
				%index.next = add i64 %index, 2
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is 0.25
				define void @my_vpow_025(double* nocapture %y, double* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_025
				; CHECK-NOT: bl __powd2_P{{[8,9]}}
				; CHECK: xvrsqrtesp
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr double, double* %y, i64 %index
				%next.gep19 = getelementptr double, double* %x, i64 %index
				%0 = bitcast double* %next.gep19 to <2 x double>*
				%wide.load = load <2 x double>, <2 x double>* %0, align 8
				%1 = call ninf afn nsz <2 x double> @__powd2_massv(<2 x double> %wide.load, <2 x double> <double 2.500000e-01, double 2.500000e-01>)
				%2 = bitcast double* %next.gep to <2 x double>*
				store <2 x double> %1, <2 x double>* %2, align 8
				%index.next = add i64 %index, 2
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is 0.75 but no proper fast-math flags
				define void @my_vpow_075_nofast(double* nocapture %y, double* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_075_nofast
				; CHECK-PWR9: bl __powd2_P9
				; CHECK-PWR8: bl __powd2_P8
				; CHECK-NOT: xvrsqrtesp
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr double, double* %y, i64 %index
				%next.gep19 = getelementptr double, double* %x, i64 %index
				%0 = bitcast double* %next.gep19 to <2 x double>*
				%wide.load = load <2 x double>, <2 x double>* %0, align 8
				%1 = call <2 x double> @__powd2_massv(<2 x double> %wide.load, <2 x double> <double 7.500000e-01, double 7.500000e-01>)
				%2 = bitcast double* %next.gep to <2 x double>*
				store <2 x double> %1, <2 x double>* %2, align 8
				%index.next = add i64 %index, 2
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is 0.25 but no proper fast-math flags
				define void @my_vpow_025_nofast(double* nocapture %y, double* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_025_nofast
				; CHECK-PWR9: bl __powd2_P9
				; CHECK-PWR8: bl __powd2_P8
				; CHECK-NOT: xvrsqrtesp
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr double, double* %y, i64 %index
				%next.gep19 = getelementptr double, double* %x, i64 %index
				%0 = bitcast double* %next.gep19 to <2 x double>*
				%wide.load = load <2 x double>, <2 x double>* %0, align 8
				%1 = call <2 x double> @__powd2_massv(<2 x double> %wide.load, <2 x double> <double 2.500000e-01, double 2.500000e-01>)
				%2 = bitcast double* %next.gep to <2 x double>*
				store <2 x double> %1, <2 x double>* %2, align 8
				%index.next = add i64 %index, 2
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Function Attrs: nounwind readnone speculatable willreturn
				declare <2 x double> @__powd2_massv(<2 x double>, <2 x double>) #1

llvm/test/CodeGen/PowerPC/powf_massv_075_025exp.ll

This file was added.

				; RUN: llc -vector-library=MASSV < %s -mtriple=powerpc64le-unknown-unknown -mcpu=pwr9 \| FileCheck -check-prefixes=CHECK-PWR9 %s
				; RUN: llc -vector-library=MASSV < %s -mtriple=powerpc64le-unknown-unknown -mcpu=pwr8 \| FileCheck -check-prefixes=CHECK-PWR8 %s

				; Exponent is a variable
				define void @vspow_var(float* nocapture %z, float* nocapture readonly %y, float* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_var
				; CHECK-PWR9: bl __powf4_P9
				; CHECK-PWR8: bl __powf4_P8
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr float, float* %z, i64 %index
				%next.gep31 = getelementptr float, float* %y, i64 %index
				%next.gep32 = getelementptr float, float* %x, i64 %index
				%0 = bitcast float* %next.gep32 to <4 x float>*
				%wide.load = load <4 x float>, <4 x float>* %0, align 4
				%1 = bitcast float* %next.gep31 to <4 x float>*
				%wide.load33 = load <4 x float>, <4 x float>* %1, align 4
				%2 = call ninf afn nsz <4 x float> @__powf4_massv(<4 x float> %wide.load, <4 x float> %wide.load33)
				%3 = bitcast float* %next.gep to <4 x float>*
				store <4 x float> %2, <4 x float>* %3, align 4
				%index.next = add i64 %index, 4
				%4 = icmp eq i64 %index.next, 1024
				br i1 %4, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is a constant != 0.75 and !=0.25
				define void @vspow_const(float* nocapture %y, float* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_const
				; CHECK-PWR9: bl __powf4_P9
				; CHECK-PWR8: bl __powf4_P8
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr float, float* %y, i64 %index
				%next.gep19 = getelementptr float, float* %x, i64 %index
				%0 = bitcast float* %next.gep19 to <4 x float>*
				%wide.load = load <4 x float>, <4 x float>* %0, align 4
				%1 = call ninf afn nsz <4 x float> @__powf4_massv(<4 x float> %wide.load, <4 x float> <float 0x3FE851EB80000000, float 0x3FE851EB80000000, float 0x3FE851EB80000000, float 0x3FE851EB80000000>)
				%2 = bitcast float* %next.gep to <4 x float>*
				store <4 x float> %1, <4 x float>* %2, align 4
				%index.next = add i64 %index, 4
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is 0.75
				define void @vspow_075(float* nocapture %y, float* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_075
				; CHECK-NOT: bl __powf4_P{{[8,9]}}
				; CHECK: xvrsqrtesp
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr float, float* %y, i64 %index
				%next.gep19 = getelementptr float, float* %x, i64 %index
				%0 = bitcast float* %next.gep19 to <4 x float>*
				%wide.load = load <4 x float>, <4 x float>* %0, align 4
				%1 = call ninf afn <4 x float> @__powf4_massv(<4 x float> %wide.load, <4 x float> <float 7.500000e-01, float 7.500000e-01, float 7.500000e-01, float 7.500000e-01>)
				%2 = bitcast float* %next.gep to <4 x float>*
				store <4 x float> %1, <4 x float>* %2, align 4
				%index.next = add i64 %index, 4
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is 0.25
				define void @vspow_025(float* nocapture %y, float* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_025
				; CHECK-NOT: bl __powf4_P{{[8,9]}}
				; CHECK: xvrsqrtesp
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr float, float* %y, i64 %index
				%next.gep19 = getelementptr float, float* %x, i64 %index
				%0 = bitcast float* %next.gep19 to <4 x float>*
				%wide.load = load <4 x float>, <4 x float>* %0, align 4
				%1 = call ninf afn nsz <4 x float> @__powf4_massv(<4 x float> %wide.load, <4 x float> <float 2.500000e-01, float 2.500000e-01, float 2.500000e-01, float 2.500000e-01>)
				%2 = bitcast float* %next.gep to <4 x float>*
				store <4 x float> %1, <4 x float>* %2, align 4
				%index.next = add i64 %index, 4
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is 0.75 but no proper fast-math flags
				define void @vspow_075_nofast(float* nocapture %y, float* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_075_nofast
				; CHECK-PWR9: bl __powf4_P9
				; CHECK-PWR8: bl __powf4_P8
				; CHECK-NOT: xvrsqrtesp
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr float, float* %y, i64 %index
				%next.gep19 = getelementptr float, float* %x, i64 %index
				%0 = bitcast float* %next.gep19 to <4 x float>*
				%wide.load = load <4 x float>, <4 x float>* %0, align 4
				%1 = call <4 x float> @__powf4_massv(<4 x float> %wide.load, <4 x float> <float 7.500000e-01, float 7.500000e-01, float 7.500000e-01, float 7.500000e-01>)
				%2 = bitcast float* %next.gep to <4 x float>*
				store <4 x float> %1, <4 x float>* %2, align 4
				%index.next = add i64 %index, 4
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Exponent is 0.25 but no proper fast-math flags
				define void @vspow_025_nofast(float* nocapture %y, float* nocapture readonly %x) {
				; CHECK-LABEL: @vspow_025_nofast
				; CHECK-PWR9: bl __powf4_P9
				; CHECK-PWR8: bl __powf4_P8
				; CHECK-NOT: xvrsqrtesp
				; CHECK: blr
				entry:
				br label %vector.body

				vector.body:
				%index = phi i64 [ %index.next, %vector.body ], [ 0, %entry ]
				%next.gep = getelementptr float, float* %y, i64 %index
				%next.gep19 = getelementptr float, float* %x, i64 %index
				%0 = bitcast float* %next.gep19 to <4 x float>*
				%wide.load = load <4 x float>, <4 x float>* %0, align 4
				%1 = call <4 x float> @__powf4_massv(<4 x float> %wide.load, <4 x float> <float 2.500000e-01, float 2.500000e-01, float 2.500000e-01, float 2.500000e-01>)
				%2 = bitcast float* %next.gep to <4 x float>*
				store <4 x float> %1, <4 x float>* %2, align 4
				%index.next = add i64 %index, 4
				%3 = icmp eq i64 %index.next, 1024
				br i1 %3, label %for.end, label %vector.body

				for.end:
				ret void
				}

				; Function Attrs: nounwind readnone speculatable willreturn
				declare <4 x float> @__powf4_massv(<4 x float>, <4 x float>)