This is an archive of the discontinued LLVM Phabricator instance.

Differential D128591

Transforms: refactor pow(x, n) expansion where n is a constant integer value
ClosedPublic

Authored by pawosm01 on Jun 25 2022, 1:48 PM.

Details

Reviewers

david-arm
sdesmalen
bsmith
fhahn
spatel
RKSimon

Commits

rGb17754bcaa14: [SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer…

Summary

Since the backend's codegen is capable of expanding powi into fmuls, it
is no longer necessary to do so in the ::optimizePow() function of
SimplifyLibCalls.cpp. It is sufficient to always turn pow(x, n)
into powi(x, n) when n is a constant integer value.

Dropping the current expansion code allowed relaxation of the folding
conditions and now this can also happen at optimization levels below
Ofast.

The added CodeGen/AArch64/powi.ll test case ensures that powi is
actually expanded into fmul's, confirming that this refactor did not
cause any performance degradation.
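
As a rough illustration of what "expanded into fmul's" means, the sketch below raises x to an integer power by binary exponentiation and counts the multiplications. It is not the backend's expansion code; `PowiResult` and `powi_by_squaring` are hypothetical names used only for this example.

```cpp
// Hypothetical helper, not LLVM code: x^n for an integer n by binary
// exponentiation, reporting how many multiplications were needed.
struct PowiResult {
  double value;
  unsigned multiplies;
};

PowiResult powi_by_squaring(double x, int n) {
  // Widen before negating so that the most negative int does not overflow.
  long long wide = n;
  unsigned long long e = wide < 0 ? -wide : wide;

  double result = 1.0;      // x^0 when the loop below never runs
  bool have_result = false; // lets the first factor be copied, not multiplied
  double base = x;
  unsigned muls = 0;

  while (e != 0) {
    if (e & 1) {
      if (have_result) {
        result *= base;
        ++muls;
      } else {
        result = base;
        have_result = true;
      }
    }
    e >>= 1;
    if (e != 0) { // square only if a higher bit still needs it
      base *= base;
      ++muls;
    }
  }

  // A negative exponent becomes a reciprocal of the positive power.
  if (n < 0)
    result = 1.0 / result;
  return {result, muls};
}
```

With this scheme, exponents 3 and 4 each take two multiplications, which is consistent with the two fmul instructions per function checked in the added powi.ll test.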

Following an idea proposed by David Sherwood <david.sherwood@arm.com>.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pawosm01 created this revision. Jun 25 2022, 1:48 PM

Herald added a project: Restricted Project. Jun 25 2022, 1:48 PM

Herald added a subscriber: hiraditya.

pawosm01 requested review of this revision. Jun 25 2022, 1:48 PM

Herald added a project: Restricted Project. Jun 25 2022, 1:48 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B172040: Diff 440011. Jun 25 2022, 2:58 PM

fixed formatting.

pawosm01 added reviewers: sdesmalen, bsmith, fhahn. Jun 27 2022, 3:41 AM

Harbormaster completed remote builds in B172152: Diff 440154. Jun 27 2022, 4:27 AM

mgabka added a subscriber: mgabka. Jun 27 2022, 5:09 AM

david-arm added inline comments. Jun 28 2022, 6:55 AM

llvm/include/llvm/IR/Operator.h
263 ↗	(On Diff #440154)	nit: Hi, I think this probably needs a simple comment here, something like /// Test if this operation's arguments and results are assumed to be finite.
llvm/test/Transforms/InstCombine/pow-4.ll
35	It looks like the `nsz` flag is not needed here because the transformation only depends upon `reassoc nnan ninf` I think?

Addressed review comment 1.

Comment 2 addressed.

pawosm01 marked an inline comment as done. Jun 28 2022, 7:53 AM

pawosm01 added inline comments.

llvm/include/llvm/IR/Operator.h
263 ↗	(On Diff #440154)	done
llvm/test/Transforms/InstCombine/pow-4.ll
35	yes

The patch looks good to me. However, @spatel is more familiar than me with the semantics of fast-math flags on operations, so adding as a reviewer in case I've missed something!

llvm/test/Transforms/InstCombine/pow-4.ll
63	nit: Can you remove nsz here too?
110	nit: Can you remove nsz here too?

Another review comment addressed.

pawosm01 marked 2 inline comments as done. Jun 28 2022, 9:27 AM

pawosm01 added inline comments.

llvm/test/Transforms/InstCombine/pow-4.ll
63	Si.
110	Si.

Harbormaster completed remote builds in B172511: Diff 440657. Jun 28 2022, 10:30 AM

Not to confuse things, but we also have @llvm.powi.* intrinsics and expansion which are all very similar to this - can we not merge these more?

Sadly, powi is handled differently in the SimplifyLibCalls.cpp code than powf, pow and powl (which all result in calling optimizePow(), modified by this patch), so extending this patch in that direction would go beyond the initial scope and may overstretch my approved upstreaming activity. I would rather create a separate commit covering powi later.

The patch description and tests don't match what the code does currently. We don't require full 'fast'; we just check for 'afn' and propagate that flag to the new instructions.

It might help to know what the motivating source case looks like - are we translating the source-level parameters to FMF as expected?

I'm not opposed to easing the restriction, but it's probably better to add/modify the tests first. That way, we know exactly what the current behavior is and how it will change with the proposed patch.

It seems problematic that LibCallSimplifier::optimizePow also folds pow -> powi yet we also have pow expansion code that is completely separate from powi (where it's handled in CGP).

We now have TTI:isBeneficialToExpandPowI to decide when to expand powi intrinsics properly instead of hard coded limits like there are here.

Hi @RKSimon, thanks for your comments, although I'd like to obtain some guidance from you on what I can do to address them in the context of this commit.

It seems problematic that LibCallSimplifier::optimizePow also folds pow -> powi yet we also have pow expansion code that is completely separate from powi (where it's handled in CGP).

Does it mean that in order to make things less problematic, all applications of createPowWithIntegerExponent() should be moved somewhere else? Or maybe pow/powf/powl expansion code should not occur in optimizePow()? Should the scope of this commit be extended to cover such a refactor?

We now have TTI:isBeneficialToExpandPowI to decide when to expand powi intrinsics properly instead of hard coded limits like there are here.

Should it happen the same way for pow/powf/powl too?

In D128591#3620079, @spatel wrote:

The patch description and tests don't match what the code does currently. We don't require full 'fast'; we just check for 'afn' and propagate that flag to the new instructions.

It might help to know what the motivating source case looks like - are we translating the source-level parameters to FMF as expected?

I'm not opposed to easing the restriction, but it's probably better to add/modify the tests first. That way, we know exactly what the current behavior is and how it will change with the proposed patch.

@spatel we have discussed your comment here, and my understanding is that you're not opposed to the proposed change as such; I guess what you wanted to know is how we ended up with the proposed set of flags. At the time of our experiments, we were focused on trying to restrict certain optimizations to the lowest possible set of fast math flags. We were comparing with what the Intel compiler can do by default at -O3, as it is more relaxed than clang, so we wanted to find a route to getting a similar effect.

As such, I find that the test case presented here fits the nature of our experiment and I don't see anything missing. Maybe it is quite academic and many may not find a huge benefit in it; in the end, it is sufficient to use the -funsafe-math-optimizations flag in order to relax the restrictions.

In D128591#3625167, @pawosm01 wrote:

Hi @RKSimon, thanks for your comments, although I'd like to obtain some guidance from you on what I can do to address them in the context of this commit.

It seems problematic that LibCallSimplifier::optimizePow also folds pow -> powi yet we also have pow expansion code that is completely separate from powi (where it's handled in CGP).

Does it mean that in order to make things less problematic, all applications of createPowWithIntegerExponent() should be moved somewhere else? Or maybe pow/powf/powl expansion code should not occur in optimizePow()? Should the scope of this commit be extended to cover such a refactor?

I'd expect a powi intrinsic instead of expanding the pow multiplies directly - the sqrt variant should be able to be fed into the powi intrinsic as well afaict.

We now have TTI:isBeneficialToExpandPowI to decide when to expand powi intrinsics properly instead of hard coded limits like there are here.

Should it happen the same way for pow/powf/powl too?

Yes - create the powi intrinsic and leave it to CGP and the backend to decide when to expand it.

In D128591#3628267, @pawosm01 wrote:

@spatel we have discussed your comment here, and my understanding is that you're not opposed to the proposed change as such; I guess what you wanted to know is how we ended up with the proposed set of flags. At the time of our experiments, we were focused on trying to restrict certain optimizations to the lowest possible set of fast math flags. We were comparing with what the Intel compiler can do by default at -O3, as it is more relaxed than clang, so we wanted to find a route to getting a similar effect.

As such, I find that the test case presented here fits the nature of our experiment and I don't see anything missing. Maybe it is quite academic and many may not find a huge benefit in it; in the end, it is sufficient to use the -funsafe-math-optimizations flag in order to relax the restrictions.

Thanks for the background info. FP optimization flags are not precisely specified, so there will be mismatches between clang vs. icc vs. gcc. If you want to allow this with -fassociative-math + -ffinite-math-only, then we'd need to adjust something in the front-end because reassoc is not applied to the pow call:
https://godbolt.org/z/obeenobfq

I didn't realize we had the TLI hook to expand powi already, so I agree with @RKSimon (and there's a TODO comment in this code about removing the expansion here in IR) - we should convert to powi here and leave the expansion under target control in the backend. Maybe this already works if you just delete the expansion in this file?
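
For reference, the pow-to-powi conversion discussed here only applies when the constant exponent is an exact integer that fits the target's int type (the patch checks this with `isInteger()` and `convertToInteger()` on an `APFloat`). A minimal stand-alone sketch of that test on a plain double follows; `as_int_exponent` is a hypothetical name, and the host `int` stands in for `TLI->getIntSize()`, which is an assumption that does not hold on targets with a 16-bit int.

```cpp
#include <climits>
#include <cmath>
#include <optional>

// Hypothetical helper, not LLVM code: returns the exponent as an int when it
// is a finite, exactly integral value within the range of the host int.
std::optional<int> as_int_exponent(double e) {
  if (!std::isfinite(e))
    return std::nullopt; // NaN and infinities never qualify
  if (std::trunc(e) != e)
    return std::nullopt; // has a fractional part
  if (e < static_cast<double>(INT_MIN) || e > static_cast<double>(INT_MAX))
    return std::nullopt; // integral but too large for int
  return static_cast<int>(e);
}
```

An exponent such as 3.0 or -4.0 qualifies, while 2.5 or 1e10 does not and would have to take another path.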

pawosm01 updated this revision to Diff 442616. Jul 6 2022, 9:41 AM

pawosm01 retitled this revision from "Transforms: Relax restrictions on pow(x, y) expansion" to "Transforms: refactor pow(x, n) expansion where n is a constant integer value".

pawosm01 edited the summary of this revision.

Herald added a subscriber: kristof.beyls. Jul 6 2022, 9:41 AM

Harbormaster completed remote builds in B173933: Diff 442616. Jul 6 2022, 9:42 AM

@RKSimon and @spatel you were right, this code can be happily removed, the fmul expansion is still happening when it's needed. I've uploaded a completely new commit.

Thanks @pawosm01

llvm/test/Transforms/InstCombine/pow-4.ll
139	I guess we need to decide whether we want to retain this variety of cases somehow? I assume we can perform this as powi(x, 16) * sqrt(x) ?
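
One way to read this suggestion, sketched outside of LLVM: an exponent of the form integer + 0.5 is split into its floor and a single sqrt factor, so that pow(x, 16.5) becomes powi(x, 16) * sqrt(x) and pow(x, -2.5) becomes powi(x, -3) * sqrt(x). The helper names below are hypothetical, and `powi_ref` is a deliberately naive stand-in for the powi intrinsic.

```cpp
#include <cmath>
#include <optional>

// Hypothetical helper: if e is exactly integer + 0.5, return floor(e).
// The check mirrors the "add it to itself" idea: e is not an integer, but
// e + e is.
std::optional<long long> split_half_integer(double e) {
  if (!std::isfinite(e))
    return std::nullopt;
  if (std::trunc(e) == e)
    return std::nullopt; // already an integer, no sqrt factor needed
  double twice = e + e;
  if (std::trunc(twice) != twice)
    return std::nullopt; // fraction is something other than 0.5
  // A finite non-integer double has magnitude below 2^52, so this fits.
  return static_cast<long long>(std::floor(e));
}

// Naive stand-in for powi: repeated multiplication, reciprocal if n < 0.
double powi_ref(double x, long long n) {
  double r = 1.0;
  for (long long i = 0, m = n < 0 ? -n : n; i < m; ++i)
    r *= x;
  return n < 0 ? 1.0 / r : r;
}

// pow(x, n + 0.5) rewritten as powi(x, n) * sqrt(x); empty if e is not
// of that form.
std::optional<double> pow_via_split(double x, double e) {
  std::optional<long long> n = split_half_integer(e);
  if (!n)
    return std::nullopt;
  return powi_ref(x, *n) * std::sqrt(x);
}
```

Rounding toward negative infinity is what makes the negative case work: -2.5 splits into -3 plus 0.5, so the sqrt factor is always multiplied in and never divided out.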

pawosm01 added inline comments. Jul 6 2022, 12:12 PM

llvm/test/Transforms/InstCombine/pow-4.ll

139

Something like this?:

   const APFloat *ExpoF;
   if (match(Expo, m_APFloat(ExpoF)) && !ExpoF->isExactlyValue(0.5) &&
       !ExpoF->isExactlyValue(-0.5)) {
+    // This transformation applies to integer or integer+0.5 exponents only.
+    // For integer+0.5, we create a sqrt(Base) call.
+    APFloat ExpoA(abs(*ExpoF));
+    Value *Sqrt = nullptr;
+    if (AllowApprox && !ExpoA.isInteger()) {
+      APFloat Expo2 = ExpoA;
+      // To check if ExpoA is an integer + 0.5, we add it to itself. If there
+      // is no floating point exception and the result is an integer, then
+      // ExpoA == integer + 0.5
+      if (Expo2.add(ExpoA, APFloat::rmNearestTiesToEven) != APFloat::opOK)
+        return nullptr;
+
+      if (!Expo2.isInteger())
+        return nullptr;
+
+      Sqrt = getSqrtCall(Base, Pow->getCalledFunction()->getAttributes(),
+                         Pow->doesNotAccessMemory(), M, B, TLI);
+      if (!Sqrt)
+        return nullptr;
+    }
+
     APSInt IntExpo(TLI->getIntSize(), /*isUnsigned=*/false);
+    // pow(x, n) -> powi(x, n) if n is a constant signed integer value
     if (ExpoF->isInteger() &&
         ExpoF->convertToInteger(IntExpo, APFloat::rmTowardZero, &Ignored) ==
             APFloat::opOK) {
-      return copyFlags(
+      Value *PowI = copyFlags(
           *Pow,
           createPowWithIntegerExponent(
               Base, ConstantInt::get(B.getIntNTy(TLI->getIntSize()), IntExpo),
               M, B));
+
+      if (PowI && Sqrt)
+        return B.CreateFMul(PowI, Sqrt);
+
+      return PowI;
     }
   }

Sadly, this leads to an infinite loop of self-contradicting optimizations:

FAIL: LLVM :: Transforms/InstCombine/pow_fp_int.ll (10150 of 45161)
******************** TEST 'LLVM :: Transforms/InstCombine/pow_fp_int.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /dsg_space/projectscratch_dsg_space/pawosm01/upstream/llvm-project.git/build-shared-debug/bin/opt -mtriple unknown -passes=instcombine -S < /dsg_space/projectscratch_dsg_space/pawosm01/upstream/llvm-project.git/llvm/test/Transforms/InstCombine/pow_fp_int.ll | /dsg_space/projectscratch_dsg_space/pawosm01/upstream/llvm-project.git/build-shared-debug/bin/FileCheck /dsg_space/projectscratch_dsg_space/pawosm01/upstream/llvm-project.git/llvm/test/Transforms/InstCombine/pow_fp_int.ll
--
Exit Code: 2

Command Output (stderr):
--
LLVM ERROR: Instruction Combining seems stuck in an infinite loop after 100 iterations.

Rebased.

Harbormaster completed remote builds in B173963: Diff 442658. Jul 6 2022, 1:31 PM

@RKSimon could you give some suggestions where to go next with this?

Investigating the infinite loops further makes sense to me - do we have a reverse fold that combines powi / sqrt intrinsics?

@RKSimon it turned out that doing a simple copy-paste wasn't the best idea. I've corrected the code proposed in one of my comments and put it into a new patch. It passes all tests now, and nothing runs into an infinite loop.

Does this approach fit your idea of addressing this specific case?

Harbormaster completed remote builds in B174378: Diff 443231. Jul 8 2022, 7:06 AM

Yeah, that's the kind of thing I had in mind, but I'd feel happier if somebody else gave this a check through as well.

spatel added inline comments. Jul 8 2022, 8:45 AM

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1941	Delete stale code comment.
1972	Add another line for the sqrt variant like: // pow(x, n) -> powi(x, n) * sqrt(x) if n has exactly a 0.5 fraction
llvm/test/CodeGen/AArch64/powi.ll
8	These test results are independent of this patch, so this could be committed as a preliminary "NFC" patch. We intentionally don't have regression tests that check end-to-end results of IR optimization + codegen because we want focused/unit testing at this level. The end-to-end tests live in test-suite instead (and could begin from C source too).

Following review comments.

pawosm01 marked 2 inline comments as done. Jul 8 2022, 10:51 AM

pawosm01 added inline comments.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1941	OK. Also followed the comment below.
1972	OK
llvm/test/CodeGen/AArch64/powi.ll
8	I'd keep it if it's not very disturbing. This is the last thing that links us to the idea that brought us here.

Harbormaster completed remote builds in B174416: Diff 443288. Jul 8 2022, 11:52 AM

I'm not sure if we answered the original question about fast-math-flags, but that can be an independent patch if needed. This part LGTM with the dead code removed.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1640–1641	This function is dead code now and should be removed.

This revision is now accepted and ready to land. Jul 8 2022, 12:06 PM

Removed dead function. Thanks @spatel for spotting this!

pawosm01 marked an inline comment as done. Jul 8 2022, 12:53 PM

pawosm01 added inline comments.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp
1640–1641	Removed dead code.

pawosm01 marked an inline comment as done. Jul 8 2022, 1:01 PM

Harbormaster completed remote builds in B174438: Diff 443330. Jul 8 2022, 1:46 PM

Seems my old write access is not valid anymore. Can I ask someone to push this commit?

This revision was landed with ongoing or failed builds. Jul 9 2022, 9:02 AM

Closed by commit rGb17754bcaa14: [SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer… (authored by pawosm01, committed by spatel).

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rGb17754bcaa14: [SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer….

Hi,

This patch introduces a difference between the newly inlined and library versions of std::pow, at least on x86_64. For large exponents, the difference is large. Also, when the difference accumulates, it pushes a lot of existing tests that compare against golden values far beyond the meaningful margin.

Similarly, it introduces a difference between pow of a constant and pow of a non-constant with the same value.

I think such differences make related debugging a lot more complex. What do you think?

Here are the tests:

[[clang::optnone]] double test_pow(double x, double y) {
  return std::pow(x, y);
}

// PASS - compiler evaluates std::pow(const, const) similar to the library?
TEST(Test, Pow_const) {
  double x = 8.804009981602594;
  double y = 10.0;
  double t = test_pow(x, y);
  double z = std::pow(x, y);
  EXPECT_TRUE(t - z < 1e-6 && z -t < 1e-6);
}

[[clang::optnone]] double init(const char* buf) {
  double x;
  memcpy(&x, buf, sizeof(double));
  return x;
}

// FAIL - https://reviews.llvm.org/D128591 introduces difference.
TEST(Test, Pow_nonconst) {
  double x = init("\x0a\x48\x41\x32\xa7\x9b\x21\x40");  // 8.804009981602594
  double y = 10.0;
  double t = test_pow(x, y);
  double z = std::pow(x, y);
  EXPECT_TRUE(t - z < 1e-6 && z -t < 1e-6);
}

Hi @eaeltsin isn't this the same problem with every other FP optimisation where the user has explicitly requested optimisations to take place? The same thing is true for stuff like the following:

float __attribute__((optnone)) reduc1(float *src, long n) {
  float r = 0;
  for (long i = 0; i < n; i++)
    r += src[i];
  return r;
}

float reduc2(float *src, long n) {
  float r = 0;
  for (long i = 0; i < n; i++)
    r += src[i];
  return r;
}

and the user compiles with -Ofast. With certain pathological cases it's also possible to produce quite different results. It seems to me that if the user has explicitly requested FP optimisations (because they care more about performance than accuracy) and the optimisations are legal within the context of the flags requested, then the compiler is doing the right thing?
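
The order sensitivity behind this argument can be reproduced by hand, without any fast-math flag, by feeding the same values to a strict left-to-right sum in two different orders. This is only a sketch: `sum_in_order` is a hypothetical name, and the example assumes IEEE-754 single precision with no excess intermediate precision.

```cpp
#include <cstddef>

// Strict left-to-right float accumulation, like reduc1 above.
float sum_in_order(const float *src, std::size_t n) {
  float r = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    r += src[i];
  return r;
}
```

For the inputs {1e8f, 1.0f, -1e8f} the result is 0, because 1e8f + 1.0f rounds back to 1e8f (floats near 1e8 are spaced 8 apart). For {1e8f, -1e8f, 1.0f} the result is 1. A reassociating optimizer is free to pick either order, which is the kind of difference being described here.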

Hi David,

One more question - my concern is that we get different results in logically equal contexts:

// FAIL
TEST(Test, Pow7) {
  double x1 = 8.804009981602594;
  double x2 = init("\x0a\x48\x41\x32\xa7\x9b\x21\x40");  // 8.804009981602594
  double y = 10.0;
  double t = std::pow(x1, y);
  double z = std::pow(x2, y);
  EXPECT_TRUE(t - z < 1e-6 && z - t < 1e-6);
}

It looks like only the newly inlined version differs, while compile-time and library versions agree.

In D128591#3662342, @david-arm wrote:

It seems to me that if the user has explicitly requested FP optimisations (because they care more about performance than accuracy) and the optimisations are legal within the context of the flags requested, then the compiler is doing the right thing?

I agree. Though it would be good to see the IR for the test to confirm that we didn't accidentally create a path where non-fast pow got transformed into powi. The description of powi tries to make it clear that it is unreliable in testing like the example - "The order of evaluation of multiplications is not defined." ( https://llvm.org/docs/LangRef.html#llvm-powi-intrinsic )
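
To make the "order of evaluation" caveat concrete, here is a small sketch that computes x^10 with two different multiplication orders and measures the spacing of doubles at the result. The function names are hypothetical. The sketch does not claim that the two chains disagree for any particular input, only that each multiplication rounds, so bit-identical results are not guaranteed.

```cpp
#include <cmath>
#include <limits>

// x^10 as ((x^2)^2 * x)^2: four multiplications.
double pow10_chain_a(double x) {
  double x2 = x * x;
  double x4 = x2 * x2;
  double x5 = x4 * x;
  return x5 * x5;
}

// x^10 as a left-to-right product: nine multiplications.
double pow10_chain_b(double x) {
  double r = x;
  for (int i = 1; i < 10; ++i)
    r *= x;
  return r;
}

// Distance from v to the next larger representable double.
double ulp_at(double v) {
  return std::nextafter(v, std::numeric_limits<double>::infinity()) - v;
}
```

For the value used in the tests above, x = 8.804009981602594, x^10 is about 2.8e9, which lies between 2^31 and 2^32, so adjacent doubles there are 2^-21 (about 4.8e-7) apart. The absolute tolerance of 1e-6 in those tests therefore allows a difference of only about two units in the last place.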

The question about constant folding a powi call is interesting independently of that - should we evaluate it at compile-time with a multiply loop? There's no way it's going to match the target-specific expansion or library call, so we're just trading one set of different answers for another?

Hi @eaeltsin, it's also worth bearing in mind that we were already doing this optimisation (replacing pow with multiplies) before this patch landed if the user was building with -Ofast. This patch simply extends that behaviour for a different set of fast math flags.

BTW, this is not about -Ofast - the problem reproduces with -O1 - https://gcc.godbolt.org/z/6barovn81

In D128591#3662752, @eaeltsin wrote:

BTW, this is not about -Ofast - the problem reproduces with -O1 - https://gcc.godbolt.org/z/6barovn81

Ah, that's definitely a bug. We can't convert pow to powi without some kind of FMF. I thought the transform was still guarded by 'afn', but I see now that there are non-fast tests that changed too.

In D128591#3662792, @spatel wrote:

In D128591#3662752, @eaeltsin wrote:

BTW, this is not about -Ofast - the problem reproduces with -O1 - https://gcc.godbolt.org/z/6barovn81

Ah, that's definitely a bug. We can't convert pow to powi without some kind of FMF. I thought the transform was still guarded by 'afn', but I see now that there are non-fast tests that changed too.

Fixed to require 'afn':
3d6c10dcf3b5

I'm not sure if that resolves all of the outstanding questions here (and might reopen some of the original motivation for the patch), but it should restore correctness with an -O1 compile.

Thanks Sanjay, this fixes the issues we are seeing so far.

Revision Contents

Path                                              Size
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp    107 lines
llvm/test/CodeGen/AArch64/powi.ll                 25 lines
llvm/test/Transforms/InstCombine/pow-4.ll         249 lines
llvm/test/Transforms/InstCombine/pow_fp_int.ll    8 lines
llvm/test/Transforms/InstCombine/pow_fp_int16.ll  8 lines

Diff 443444

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp

(1,631 unchanged lines above not shown)

     if (match(Call->getArgOperand(0), m_FNeg(m_Value(X))))
       return copyFlags(*Call,
                        B.CreateCall(Call->getCalledFunction(), X, "cos"));
     break;
   default:
     break;
   }
   return nullptr;
 }
 
-static Value *getPow(Value *InnerChain[33], unsigned Exp, IRBuilderBase &B) {
-  // Multiplications calculated using Addition Chains.
-  // Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
-
-  assert(Exp != 0 && "Incorrect exponent 0 not handled");
-
-  if (InnerChain[Exp])
-    return InnerChain[Exp];
-
-  static const unsigned AddChain[33][2] = {
-      {0, 0}, // Unused.
-      {0, 0}, // Unused (base case = pow1).
-      {1, 1}, // Unused (pre-computed).
-      {1, 2}, {2, 2}, {2, 3}, {3, 3}, {2, 5}, {4, 4},
-      {1, 8}, {5, 5}, {1, 10}, {6, 6}, {4, 9}, {7, 7},
-      {3, 12}, {8, 8}, {8, 9}, {2, 16}, {1, 18}, {10, 10},
-      {6, 15}, {11, 11}, {3, 20}, {12, 12}, {8, 17}, {13, 13},
-      {3, 24}, {14, 14}, {4, 25}, {15, 15}, {3, 28}, {16, 16},
-  };
-
-  InnerChain[Exp] = B.CreateFMul(getPow(InnerChain, AddChain[Exp][0], B),
-                                 getPow(InnerChain, AddChain[Exp][1], B));
-  return InnerChain[Exp];
-}
-
 // Return a properly extended integer (DstWidth bits wide) if the operation is
		spatel: This function is dead code now and should be removed.
		pawosm01: Removed dead code.
 // an itofp.
 static Value *getIntToFPVal(Value *I2F, IRBuilderBase &B, unsigned DstWidth) {
   if (isa<SIToFPInst>(I2F) || isa<UIToFPInst>(I2F)) {
     Value *Op = cast<Instruction>(I2F)->getOperand(0);
     // Make sure that the exponent fits inside an "int" of size DstWidth,
     // thus avoiding any range issues that FP has not.
     unsigned BitWidth = Op->getType()->getPrimitiveSizeInBits();
     if (BitWidth < DstWidth ||

(283 unchanged lines not shown)

 Value *LibCallSimplifier::optimizePow(CallInst *Pow, IRBuilderBase &B) {
 
   // pow(x, 2.0) -> x * x
   if (match(Expo, m_SpecificFP(2.0)))
     return B.CreateFMul(Base, Base, "square");
 
   if (Value *Sqrt = replacePowWithSqrt(Pow, B))
     return Sqrt;
 
-  // pow(x, n) -> x * x * x * ...
+  // pow(x, n) -> powi(x, n) * sqrt(x) if n has exactly a 0.5 fraction
		spatel: Delete stale code comment.
		pawosm01: OK. Also followed the comment below.
   const APFloat *ExpoF;
-  if (AllowApprox && match(Expo, m_APFloat(ExpoF)) &&
-      !ExpoF->isExactlyValue(0.5) && !ExpoF->isExactlyValue(-0.5)) {
-    // We limit to a max of 7 multiplications, thus the maximum exponent is 32.
-    // If the exponent is an integer+0.5 we generate a call to sqrt and an
-    // additional fmul.
-    // TODO: This whole transformation should be backend specific (e.g. some
-    // backends might prefer libcalls or the limit for the exponent might
-    // be different) and it should also consider optimizing for size.
-    APFloat LimF(ExpoF->getSemantics(), 33),
-            ExpoA(abs(*ExpoF));
-    if (ExpoA < LimF) {
-      // This transformation applies to integer or integer+0.5 exponents only.
-      // For integer+0.5, we create a sqrt(Base) call.
+  if (match(Expo, m_APFloat(ExpoF)) && !ExpoF->isExactlyValue(0.5) &&
+      !ExpoF->isExactlyValue(-0.5)) {
+    APFloat ExpoA(abs(*ExpoF));
+    APFloat ExpoI(*ExpoF);
     Value *Sqrt = nullptr;
-    if (!ExpoA.isInteger()) {
+    if (AllowApprox && !ExpoA.isInteger()) {
       APFloat Expo2 = ExpoA;
       // To check if ExpoA is an integer + 0.5, we add it to itself. If there
       // is no floating point exception and the result is an integer, then
       // ExpoA == integer + 0.5
       if (Expo2.add(ExpoA, APFloat::rmNearestTiesToEven) != APFloat::opOK)
         return nullptr;
 
       if (!Expo2.isInteger())
         return nullptr;
 
+      if (ExpoI.roundToIntegral(APFloat::rmTowardNegative) !=
+          APFloat::opInexact)
+        return nullptr;
+      if (!ExpoI.isInteger())
+        return nullptr;
+      ExpoF = &ExpoI;
+
       Sqrt = getSqrtCall(Base, Pow->getCalledFunction()->getAttributes(),
                          Pow->doesNotAccessMemory(), M, B, TLI);
       if (!Sqrt)
         return nullptr;
     }
 
-      // We will memoize intermediate products of the Addition Chain.
-      Value *InnerChain[33] = {nullptr};
-      InnerChain[1] = Base;
-      InnerChain[2] = B.CreateFMul(Base, Base, "square");
-
-      // We cannot readily convert a non-double type (like float) to a double.
-      // So we first convert it to something which could be converted to double.
-      ExpoA.convert(APFloat::IEEEdouble(), APFloat::rmTowardZero, &Ignored);
-      Value *FMul = getPow(InnerChain, ExpoA.convertToDouble(), B);
-
-      // Expand pow(x, y+0.5) to pow(x, y) * sqrt(x).
-      if (Sqrt)
-        FMul = B.CreateFMul(FMul, Sqrt);
-
-      // If the exponent is negative, then get the reciprocal.
-      if (ExpoF->isNegative())
-        FMul = B.CreateFDiv(ConstantFP::get(Ty, 1.0), FMul, "reciprocal");
-
-      return FMul;
-    }
-
+    // pow(x, n) -> powi(x, n) if n is a constant signed integer value
		spatel: Add another line for the sqrt variant like: // pow(x, n) -> powi(x, n) * sqrt(x) if n has exactly a 0.5 fraction
		pawosm01: OK
     APSInt IntExpo(TLI->getIntSize(), /*isUnsigned=*/false);
-    // powf(x, n) -> powi(x, n) if n is a constant signed integer value
     if (ExpoF->isInteger() &&
         ExpoF->convertToInteger(IntExpo, APFloat::rmTowardZero, &Ignored) ==
             APFloat::opOK) {
-      return copyFlags(
+      Value *PowI = copyFlags(
           *Pow,
           createPowWithIntegerExponent(
               Base, ConstantInt::get(B.getIntNTy(TLI->getIntSize()), IntExpo),
               M, B));
+
+      if (PowI && Sqrt)
+        return B.CreateFMul(PowI, Sqrt);
+
+      return PowI;
     }
   }
 
   // powf(x, itofp(y)) -> powi(x, y)
   if (AllowApprox && (isa<SIToFPInst>(Expo) || isa<UIToFPInst>(Expo))) {
     if (Value *ExpoI = getIntToFPVal(Expo, B, TLI->getIntSize()))
       return copyFlags(*Pow, createPowWithIntegerExponent(Base, ExpoI, M, B));
   }

(1,808 unchanged lines below not shown)

llvm/test/CodeGen/AArch64/powi.ll

This file was added.

; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s

declare double @llvm.powi.f64.i32(double, i32)
declare float @llvm.powi.f32.i32(float, i32)
declare float @pow(double noundef, double noundef)

define float @powi_f32(float %x) nounwind {
; CHECK-LABEL: powi_f32:
		spatel: These test results are independent of this patch, so this could be committed as a preliminary "NFC" patch. We intentionally don't have regression tests that check end-to-end results of IR optimization + codegen because we want focused/unit testing at this level. The end-to-end tests live in test-suite instead (and could begin from C source too).
		pawosm01: I'd keep it if it's not very disturbing. This is the last thing that links us to the idea that brought us here.
; CHECK: // %bb.0:
; CHECK-NEXT: fmul s0, s0, s0
; CHECK-NEXT: fmul s0, s0, s0
; CHECK-NEXT: ret
  %1 = tail call float @llvm.powi.f32.i32(float %x, i32 4)
  ret float %1
}

define double @powi_f64(double %x) nounwind {
; CHECK-LABEL: powi_f64:
; CHECK: // %bb.0:
; CHECK-NEXT: fmul d1, d0, d0
; CHECK-NEXT: fmul d0, d0, d1
; CHECK-NEXT: ret
  %1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
  ret double %1
}

llvm/test/Transforms/InstCombine/pow-4.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=instcombine -S < %s -mtriple unknown \| FileCheck %s --check-prefixes=CHECK,CHECKI32,CHECKSQRT			; RUN: opt -passes=instcombine -S < %s -mtriple unknown \| FileCheck %s --check-prefixes=CHECK,CHECKI32,CHECKSQRT
	; RUN: opt -passes=instcombine -S < %s -mtriple unknown -disable-builtin sqrt \| FileCheck %s --check-prefixes=CHECK,CHECKI32,CHECKNOSQRT			; RUN: opt -passes=instcombine -S < %s -mtriple unknown -disable-builtin sqrt \| FileCheck %s --check-prefixes=CHECK,CHECKI32,CHECKNOSQRT
	; RUN: opt -passes=instcombine -S < %s -mtriple msp430 \| FileCheck %s --check-prefixes=CHECK,CHECKI16,CHECKSQRT			; RUN: opt -passes=instcombine -S < %s -mtriple msp430 \| FileCheck %s --check-prefixes=CHECK,CHECKI16,CHECKSQRT
	; RUN: opt -passes=instcombine -S < %s -mtriple msp430 -disable-builtin sqrt \| FileCheck %s --check-prefixes=CHECK,CHECKI16,CHECKNOSQRT			; RUN: opt -passes=instcombine -S < %s -mtriple msp430 -disable-builtin sqrt \| FileCheck %s --check-prefixes=CHECK,CHECKI16,CHECKNOSQRT

	declare double @llvm.pow.f64(double, double)			declare double @llvm.pow.f64(double, double)
	declare float @llvm.pow.f32(float, float)			declare float @llvm.pow.f32(float, float)
	declare <2 x double> @llvm.pow.v2f64(<2 x double>, <2 x double>)			declare <2 x double> @llvm.pow.v2f64(<2 x double>, <2 x double>)
	declare <2 x float> @llvm.pow.v2f32(<2 x float>, <2 x float>)			declare <2 x float> @llvm.pow.v2f32(<2 x float>, <2 x float>)
	declare <4 x float> @llvm.pow.v4f32(<4 x float>, <4 x float>)			declare <4 x float> @llvm.pow.v4f32(<4 x float>, <4 x float>)
	declare double @pow(double, double)			declare double @pow(double, double)

	; pow(x, 3.0)			; pow(x, 3.0)
	define double @test_simplify_3(double %x) {			define double @test_simplify_3(double %x) {
	; CHECK-LABEL: @test_simplify_3(			; CHECKI32-LABEL: @test_simplify_3(
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast double [[X:%.*]], [[X]]			; CHECKI32-NEXT: [[TMP1:%.*]] = call fast double @llvm.powi.f64.i32(double [[X:%.*]], i32 3)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[X]]			; CHECKI32-NEXT: ret double [[TMP1]]
	; CHECK-NEXT: ret double [[TMP1]]			;
				; CHECKI16-LABEL: @test_simplify_3(
				; CHECKI16-NEXT: [[TMP1:%.*]] = call fast double @llvm.powi.f64.i16(double [[X:%.*]], i16 3)
				; CHECKI16-NEXT: ret double [[TMP1]]
	;			;
	%1 = call fast double @llvm.pow.f64(double %x, double 3.000000e+00)			%1 = call fast double @llvm.pow.f64(double %x, double 3.000000e+00)
	ret double %1			ret double %1
	}			}

	; powf(x, 4.0)			; powf(x, 4.0)
	define float @test_simplify_4f(float %x) {			define float @test_simplify_4f(float %x) {
	; CHECK-LABEL: @test_simplify_4f(			; CHECKI32-LABEL: @test_simplify_4f(
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast float [[X:%.*]], [[X]]			; CHECKI32-NEXT: [[TMP1:%.*]] = call fast float @llvm.powi.f32.i32(float [[X:%.*]], i32 4)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast float [[SQUARE]], [[SQUARE]]			; CHECKI32-NEXT: ret float [[TMP1]]
	; CHECK-NEXT: ret float [[TMP1]]			;
				; CHECKI16-LABEL: @test_simplify_4f(
				; CHECKI16-NEXT: [[TMP1:%.*]] = call fast float @llvm.powi.f32.i16(float [[X:%.*]], i16 4)
				david-arm (inline comment, Done): It looks like the `nsz` flag is not needed here because the transformation only depends upon `reassoc nnan ninf`, I think?
				pawosm01 (author, inline comment, Done): Yes.
				; CHECKI16-NEXT: ret float [[TMP1]]
	;			;
	%1 = call fast float @llvm.pow.f32(float %x, float 4.000000e+00)			%1 = call fast float @llvm.pow.f32(float %x, float 4.000000e+00)
	ret float %1			ret float %1
	}			}

	; pow(x, 4.0)			; pow(x, 4.0)
	define double @test_simplify_4(double %x) {			define double @test_simplify_4(double %x) {
	; CHECK-LABEL: @test_simplify_4(			; CHECKI32-LABEL: @test_simplify_4(
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast double [[X:%.*]], [[X]]			; CHECKI32-NEXT: [[TMP1:%.*]] = call fast double @llvm.powi.f64.i32(double [[X:%.*]], i32 4)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]			; CHECKI32-NEXT: ret double [[TMP1]]
	; CHECK-NEXT: ret double [[TMP1]]			;
				; CHECKI16-LABEL: @test_simplify_4(
				; CHECKI16-NEXT: [[TMP1:%.*]] = call fast double @llvm.powi.f64.i16(double [[X:%.*]], i16 4)
				; CHECKI16-NEXT: ret double [[TMP1]]
	;			;
	%1 = call fast double @llvm.pow.f64(double %x, double 4.000000e+00)			%1 = call fast double @llvm.pow.f64(double %x, double 4.000000e+00)
	ret double %1			ret double %1
	}			}

	; powf(x, <15.0, 15.0>)			; powf(x, <15.0, 15.0>)
	define <2 x float> @test_simplify_15(<2 x float> %x) {			define <2 x float> @test_simplify_15(<2 x float> %x) {
	; CHECK-LABEL: @test_simplify_15(			; CHECKI32-LABEL: @test_simplify_15(
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast <2 x float> [[X:%.*]], [[X]]			; CHECKI32-NEXT: [[TMP1:%.*]] = call fast <2 x float> @llvm.powi.v2f32.i32(<2 x float> [[X:%.*]], i32 15)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <2 x float> [[SQUARE]], [[X]]			; CHECKI32-NEXT: ret <2 x float> [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP1]]			;
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP2]], [[TMP2]]			; CHECKI16-LABEL: @test_simplify_15(
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP3]]			; CHECKI16-NEXT: [[TMP1:%.*]] = call fast <2 x float> @llvm.powi.v2f32.i16(<2 x float> [[X:%.*]], i16 15)
				david-arm (inline comment, Done): nit: Can you remove `nsz` here too?
				pawosm01 (author, inline comment, Done): Yes.
	; CHECK-NEXT: ret <2 x float> [[TMP4]]			; CHECKI16-NEXT: ret <2 x float> [[TMP1]]
	;			;
	%1 = call fast <2 x float> @llvm.pow.v2f32(<2 x float> %x, <2 x float> <float 1.500000e+01, float 1.500000e+01>)			%1 = call fast <2 x float> @llvm.pow.v2f32(<2 x float> %x, <2 x float> <float 1.500000e+01, float 1.500000e+01>)
	ret <2 x float> %1			ret <2 x float> %1
	}			}

	; pow(x, -7.0)			; pow(x, -7.0)
	define <2 x double> @test_simplify_neg_7(<2 x double> %x) {			define <2 x double> @test_simplify_neg_7(<2 x double> %x) {
	; CHECK-LABEL: @test_simplify_neg_7(			; CHECKI32-LABEL: @test_simplify_neg_7(
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast <2 x double> [[X:%.*]], [[X]]			; CHECKI32-NEXT: [[TMP1:%.*]] = call fast <2 x double> @llvm.powi.v2f64.i32(<2 x double> [[X:%.*]], i32 -7)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <2 x double> [[SQUARE]], [[SQUARE]]			; CHECKI32-NEXT: ret <2 x double> [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x double> [[TMP1]], [[X]]			;
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x double> [[SQUARE]], [[TMP2]]			; CHECKI16-LABEL: @test_simplify_neg_7(
	; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast <2 x double> <double 1.000000e+00, double 1.000000e+00>, [[TMP3]]			; CHECKI16-NEXT: [[TMP1:%.*]] = call fast <2 x double> @llvm.powi.v2f64.i16(<2 x double> [[X:%.*]], i16 -7)
	; CHECK-NEXT: ret <2 x double> [[RECIPROCAL]]			; CHECKI16-NEXT: ret <2 x double> [[TMP1]]
	;			;
	%1 = call fast <2 x double> @llvm.pow.v2f64(<2 x double> %x, <2 x double> <double -7.000000e+00, double -7.000000e+00>)			%1 = call fast <2 x double> @llvm.pow.v2f64(<2 x double> %x, <2 x double> <double -7.000000e+00, double -7.000000e+00>)
	ret <2 x double> %1			ret <2 x double> %1
	}			}

	; powf(x, -19.0)			; powf(x, -19.0)
	define float @test_simplify_neg_19(float %x) {			define float @test_simplify_neg_19(float %x) {
	; CHECK-LABEL: @test_simplify_neg_19(			; CHECKI32-LABEL: @test_simplify_neg_19(
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast float [[X:%.*]], [[X]]			; CHECKI32-NEXT: [[TMP1:%.*]] = call fast float @llvm.powi.f32.i32(float [[X:%.*]], i32 -19)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast float [[SQUARE]], [[SQUARE]]			; CHECKI32-NEXT: ret float [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast float [[TMP1]], [[TMP1]]			;
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast float [[TMP2]], [[TMP2]]			; CHECKI16-LABEL: @test_simplify_neg_19(
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast float [[SQUARE]], [[TMP3]]			; CHECKI16-NEXT: [[TMP1:%.*]] = call fast float @llvm.powi.f32.i16(float [[X:%.*]], i16 -19)
	; CHECK-NEXT: [[TMP5:%.*]] = fmul fast float [[TMP4]], [[X]]			; CHECKI16-NEXT: ret float [[TMP1]]
	; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast float 1.000000e+00, [[TMP5]]
	; CHECK-NEXT: ret float [[RECIPROCAL]]
	;			;
	%1 = call fast float @llvm.pow.f32(float %x, float -1.900000e+01)			%1 = call fast float @llvm.pow.f32(float %x, float -1.900000e+01)
	ret float %1			ret float %1
	}			}

	; pow(x, 11.23)			; pow(x, 11.23)
	define double @test_simplify_11_23(double %x) {			define double @test_simplify_11_23(double %x) {
	; CHECK-LABEL: @test_simplify_11_23(			; CHECK-LABEL: @test_simplify_11_23(
	; CHECK-NEXT: [[TMP1:%.*]] = call fast double @llvm.pow.f64(double [[X:%.*]], double 1.123000e+01)			; CHECK-NEXT: [[TMP1:%.*]] = call fast double @llvm.pow.f64(double [[X:%.*]], double 1.123000e+01)
	; CHECK-NEXT: ret double [[TMP1]]			; CHECK-NEXT: ret double [[TMP1]]
	;			;
	%1 = call fast double @llvm.pow.f64(double %x, double 1.123000e+01)			%1 = call fast double @llvm.pow.f64(double %x, double 1.123000e+01)
	ret double %1			ret double %1
	}			}

	; powf(x, 32.0)			; powf(x, 32.0)
	define float @test_simplify_32(float %x) {			define float @test_simplify_32(float %x) {
	; CHECK-LABEL: @test_simplify_32(			; CHECKI32-LABEL: @test_simplify_32(
				david-arm (inline comment, Done): nit: Can you remove `nsz` here too?
				pawosm01 (author, inline comment, Done): Yes.
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast float [[X:%.*]], [[X]]			; CHECKI32-NEXT: [[TMP1:%.*]] = call fast float @llvm.powi.f32.i32(float [[X:%.*]], i32 32)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast float [[SQUARE]], [[SQUARE]]			; CHECKI32-NEXT: ret float [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast float [[TMP1]], [[TMP1]]			;
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast float [[TMP2]], [[TMP2]]			; CHECKI16-LABEL: @test_simplify_32(
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast float [[TMP3]], [[TMP3]]			; CHECKI16-NEXT: [[TMP1:%.*]] = call fast float @llvm.powi.f32.i16(float [[X:%.*]], i16 32)
	; CHECK-NEXT: ret float [[TMP4]]			; CHECKI16-NEXT: ret float [[TMP1]]
	;			;
	%1 = call fast float @llvm.pow.f32(float %x, float 3.200000e+01)			%1 = call fast float @llvm.pow.f32(float %x, float 3.200000e+01)
	ret float %1			ret float %1
	}			}

	; pow(x, 33.0)			; pow(x, 33.0)
	define double @test_simplify_33(double %x) {			define double @test_simplify_33(double %x) {
	; CHECKI32-LABEL: @test_simplify_33(			; CHECKI32-LABEL: @test_simplify_33(
	; CHECKI32-NEXT: [[TMP1:%.*]] = call fast double @llvm.powi.f64.i32(double [[X:%.*]], i32 33)			; CHECKI32-NEXT: [[TMP1:%.*]] = call fast double @llvm.powi.f64.i32(double [[X:%.*]], i32 33)
	; CHECKI32-NEXT: ret double [[TMP1]]			; CHECKI32-NEXT: ret double [[TMP1]]
	;			;
	; CHECKI16-LABEL: @test_simplify_33(			; CHECKI16-LABEL: @test_simplify_33(
	; CHECKI16-NEXT: [[TMP1:%.*]] = call fast double @llvm.powi.f64.i16(double [[X:%.*]], i16 33)			; CHECKI16-NEXT: [[TMP1:%.*]] = call fast double @llvm.powi.f64.i16(double [[X:%.*]], i16 33)
	; CHECKI16-NEXT: ret double [[TMP1]]			; CHECKI16-NEXT: ret double [[TMP1]]
	;			;
	%1 = call fast double @llvm.pow.f64(double %x, double 3.300000e+01)			%1 = call fast double @llvm.pow.f64(double %x, double 3.300000e+01)
	ret double %1			ret double %1
	}			}

	; pow(x, 16.5) with double			; pow(x, 16.5) with double
	define double @test_simplify_16_5(double %x) {			define double @test_simplify_16_5(double %x) {
	; CHECK-LABEL: @test_simplify_16_5(			; CHECK32-LABEL: @test_simplify_16_5(
	; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])			; CHECK32-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
RKSimon (inline comment, Not Done): I guess we need to decide whether we want to retain this variety of cases somehow? I assume we can perform this as powi(x, 16) * sqrt(x)?

pawosm01 (author, inline comment, Done): Something like this?

```
   const APFloat *ExpoF;
   if (match(Expo, m_APFloat(ExpoF)) && !ExpoF->isExactlyValue(0.5) &&
       !ExpoF->isExactlyValue(-0.5)) {
+    // This transformation applies to integer or integer+0.5 exponents only.
+    // For integer+0.5, we create a sqrt(Base) call.
+    APFloat ExpoA(abs(*ExpoF));
+    Value *Sqrt = nullptr;
+    if (AllowApprox && !ExpoA.isInteger()) {
+      APFloat Expo2 = ExpoA;
+      // To check if ExpoA is an integer + 0.5, we add it to itself. If there
+      // is no floating point exception and the result is an integer, then
+      // ExpoA == integer + 0.5
+      if (Expo2.add(ExpoA, APFloat::rmNearestTiesToEven) != APFloat::opOK)
+        return nullptr;
+
+      if (!Expo2.isInteger())
+        return nullptr;
+
+      Sqrt = getSqrtCall(Base, Pow->getCalledFunction()->getAttributes(),
+                         Pow->doesNotAccessMemory(), M, B, TLI);
+      if (!Sqrt)
+        return nullptr;
+    }
+    APSInt IntExpo(TLI->getIntSize(), /*isUnsigned=*/false);
+    // pow(x, n) -> powi(x, n) if n is a constant signed integer value
     if (ExpoF->isInteger() &&
         ExpoF->convertToInteger(IntExpo, APFloat::rmTowardZero, &Ignored) ==
             APFloat::opOK) {
-      return copyFlags(
+      Value *PowI = copyFlags(
           Pow,
           createPowWithIntegerExponent(
               Base, ConstantInt::get(B.getIntNTy(TLI->getIntSize()), IntExpo),
               M, B));
+
+      if (PowI && Sqrt)
+        return B.CreateFMul(PowI, Sqrt);
+
+      return PowI;
     }
   }
```

Sadly, this leads to an infinite loop of self-contradicting optimizations:

```
FAIL: LLVM :: Transforms/InstCombine/pow_fp_int.ll (10150 of 45161)
***************** TEST 'LLVM :: Transforms/InstCombine/pow_fp_int.ll' FAILED ****************
Script:
--
: 'RUN: at line 1'; /dsg_space/projectscratch_dsg_space/pawosm01/upstream/llvm-project.git/build-shared-debug/bin/opt -mtriple unknown -passes=instcombine -S < /dsg_space/projectscratch_dsg_space/pawosm01/upstream/llvm-project.git/llvm/test/Transforms/InstCombine/pow_fp_int.ll | /dsg_space/projectscratch_dsg_space/pawosm01/upstream/llvm-project.git/build-shared-debug/bin/FileCheck /dsg_space/projectscratch_dsg_space/pawosm01/upstream/llvm-project.git/llvm/test/Transforms/InstCombine/pow_fp_int.ll
--
Exit Code: 2

Command Output (stderr):
--
LLVM ERROR: Instruction Combining seems stuck in an infinite loop after 100 iterations.
```
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]			; CHECK32-NEXT: [[POWI:%.*]] = call fast double @llvm.powi.f64.i32(double [[X]], i32 16)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]			; CHECK32-NEXT: [[TMP1:%.*]] = fmul fast double [[POWI]], [[SQRT]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]			; CHECK32-NEXT: ret double [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]			;
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]			; CHECK16-LABEL: @test_simplify_16_5(
	; CHECK-NEXT: ret double [[TMP4]]			; CHECK16-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
				; CHECK16-NEXT: [[POWI:%.*]] = call fast double @llvm.powi.f64.i16(double [[X]], i16 16)
				; CHECK16-NEXT: [[TMP1:%.*]] = fmul fast double [[POWI]], [[SQRT]]
				; CHECK16-NEXT: ret double [[TMP1]]
	;			;
	%1 = call fast double @llvm.pow.f64(double %x, double 1.650000e+01)			%1 = call fast double @llvm.pow.f64(double %x, double 1.650000e+01)
	ret double %1			ret double %1
	}			}
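
The expected output for `test_simplify_16_5` pairs an exponent of 16.5 with `powi(x, 16)` times `sqrt(x)`, and the -16.5 variant in the same file pairs with `powi(x, -17)`. A rough sketch of that split, assuming plain `double` arithmetic instead of `APFloat`; `splitIntegerPlusHalf` is a hypothetical helper and is not part of the patch:

```cpp
#include <cmath>
#include <limits>

// Hypothetical illustration, not part of the patch: if expo == n + 0.5 for
// an integer n that fits in int, stores n and returns true.
bool splitIntegerPlusHalf(double expo, int &n) {
  if (!std::isfinite(expo))
    return false;
  if (expo == std::trunc(expo))
    return false; // already an integer, no sqrt factor needed
  // Same idea as the add-to-itself check in the review comment:
  // a non-integer whose double is an integer must be integer + 0.5.
  double doubled = expo + expo;
  if (doubled != std::trunc(doubled))
    return false;
  // x^(n + 0.5) == x^n * sqrt(x), so n is the floor of the exponent.
  double floorPart = std::floor(expo);
  if (floorPart < static_cast<double>(std::numeric_limits<int>::min()) ||
      floorPart > static_cast<double>(std::numeric_limits<int>::max()))
    return false;
  n = static_cast<int>(floorPart);
  return true;
}
```

Taking the floor rather than truncating toward zero is what yields -17 for an exponent of -16.5, since x^(-17) * x^(0.5) = x^(-16.5).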

	; pow(x, -16.5) with double			; pow(x, -16.5) with double
	define double @test_simplify_neg_16_5(double %x) {			define double @test_simplify_neg_16_5(double %x) {
	; CHECK-LABEL: @test_simplify_neg_16_5(			; CHECK32-LABEL: @test_simplify_neg_16_5(
	; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])			; CHECK32-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]			; CHECK32-NEXT: [[POWI:%.*]] = call fast double @llvm.powi.f64.i32(double [[X]], i32 -17)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]			; CHECK32-NEXT: [[TMP1:%.*]] = fmul fast double [[POWI]], [[SQRT]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]			; CHECK32-NEXT: ret double [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]			;
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]			; CHECK16-LABEL: @test_simplify_neg_16_5(
	; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast double 1.000000e+00, [[TMP4]]			; CHECK16-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[X:%.*]])
	; CHECK-NEXT: ret double [[RECIPROCAL]]			; CHECK16-NEXT: [[POWI:%.*]] = call fast double @llvm.powi.f64.i16(double [[X]], i16 -17)
				; CHECK16-NEXT: [[TMP1:%.*]] = fmul fast double [[POWI]], [[SQRT]]
				; CHECK16-NEXT: ret double [[TMP1]]
	;			;
	%1 = call fast double @llvm.pow.f64(double %x, double -1.650000e+01)			%1 = call fast double @llvm.pow.f64(double %x, double -1.650000e+01)
	ret double %1			ret double %1
	}			}

	; pow(x, 16.5) with double			; pow(x, 0.5) with double

	define double @test_simplify_16_5_libcall(double %x) {			define double @test_simplify_0_5_libcall(double %x) {
	; SQRT-LABEL: @test_simplify_16_5_libcall(			; CHECKSQRT-LABEL: @test_simplify_0_5_libcall(
	; SQRT-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])
	; SQRT-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]
	; SQRT-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
	; SQRT-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
	; SQRT-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
	; SQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
	; SQRT-NEXT: ret double [[TMP4]]
	;
	; NOSQRT-LABEL: @test_simplify_16_5_libcall(
	; NOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double 1.650000e+01)
	; NOSQRT-NEXT: ret double [[TMP1]]
	;
	; CHECKSQRT-LABEL: @test_simplify_16_5_libcall(
	; CHECKSQRT-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])			; CHECKSQRT-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])
	; CHECKSQRT-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]			; CHECKSQRT-NEXT: ret double [[SQRT]]
	; CHECKSQRT-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
	; CHECKSQRT-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
	; CHECKSQRT-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
	; CHECKSQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
	; CHECKSQRT-NEXT: ret double [[TMP4]]
	;			;
	; CHECKNOSQRT-LABEL: @test_simplify_16_5_libcall(			; CHECKNOSQRT-LABEL: @test_simplify_0_5_libcall(
	; CHECKNOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double 1.650000e+01)			; CHECKNOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double 5.000000e-01)
	; CHECKNOSQRT-NEXT: ret double [[TMP1]]			; CHECKNOSQRT-NEXT: ret double [[TMP1]]
	;			;
	%1 = call fast double @pow(double %x, double 1.650000e+01)			%1 = call fast double @pow(double %x, double 5.000000e-01)
	ret double %1			ret double %1
	}			}

	; pow(x, -16.5) with double			; pow(x, -0.5) with double

	define double @test_simplify_neg_16_5_libcall(double %x) {			define double @test_simplify_neg_0_5_libcall(double %x) {
	; SQRT-LABEL: @test_simplify_neg_16_5_libcall(			; CHECKSQRT-LABEL: @test_simplify_neg_0_5_libcall(
	; SQRT-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])
	; SQRT-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]
	; SQRT-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
	; SQRT-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
	; SQRT-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
	; SQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
	; SQRT-NEXT: [[RECIPROCAL:%.*]] = fdiv fast double 1.000000e+00, [[TMP4]]
	; SQRT-NEXT: ret double [[RECIPROCAL]]
	;
	; NOSQRT-LABEL: @test_simplify_neg_16_5_libcall(
	; NOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double -1.650000e+01)
	; NOSQRT-NEXT: ret double [[TMP1]]
	;
	; CHECKSQRT-LABEL: @test_simplify_neg_16_5_libcall(
	; CHECKSQRT-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])			; CHECKSQRT-NEXT: [[SQRT:%.*]] = call fast double @sqrt(double [[X:%.*]])
	; CHECKSQRT-NEXT: [[SQUARE:%.*]] = fmul fast double [[X]], [[X]]			; CHECKSQRT-NEXT: [[RECIPROCAL:%.*]] = fdiv fast double 1.000000e+00, [[SQRT]]
	; CHECKSQRT-NEXT: [[TMP1:%.*]] = fmul fast double [[SQUARE]], [[SQUARE]]
	; CHECKSQRT-NEXT: [[TMP2:%.*]] = fmul fast double [[TMP1]], [[TMP1]]
	; CHECKSQRT-NEXT: [[TMP3:%.*]] = fmul fast double [[TMP2]], [[TMP2]]
	; CHECKSQRT-NEXT: [[TMP4:%.*]] = fmul fast double [[TMP3]], [[SQRT]]
	; CHECKSQRT-NEXT: [[RECIPROCAL:%.*]] = fdiv fast double 1.000000e+00, [[TMP4]]
	; CHECKSQRT-NEXT: ret double [[RECIPROCAL]]			; CHECKSQRT-NEXT: ret double [[RECIPROCAL]]
	;			;
	; CHECKNOSQRT-LABEL: @test_simplify_neg_16_5_libcall(			; CHECKNOSQRT-LABEL: @test_simplify_neg_0_5_libcall(
	; CHECKNOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double -1.650000e+01)			; CHECKNOSQRT-NEXT: [[TMP1:%.*]] = call fast double @pow(double [[X:%.*]], double -5.000000e-01)
	; CHECKNOSQRT-NEXT: ret double [[TMP1]]			; CHECKNOSQRT-NEXT: ret double [[TMP1]]
	;			;
	%1 = call fast double @pow(double %x, double -1.650000e+01)			%1 = call fast double @pow(double %x, double -5.000000e-01)
	ret double %1			ret double %1
	}			}

	; pow(x, -8.5) with float			; pow(x, -8.5) with float
	define float @test_simplify_neg_8_5(float %x) {			define float @test_simplify_neg_8_5(float %x) {
	; CHECK-LABEL: @test_simplify_neg_8_5(			; CHECK32-LABEL: @test_simplify_neg_8_5(
	; CHECK-NEXT: [[SQRT:%.*]] = call fast float @llvm.sqrt.f32(float [[X:%.*]])			; CHECK32-NEXT: [[SQRT:%.*]] = call fast float @llvm.sqrt.f32(float [[X:%.*]])
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast float [[X]], [[X]]			; CHECK32-NEXT: [[POWI:%.*]] = call fast float @llvm.powi.f32.i32(float [[X]], i32 -9)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast float [[SQUARE]], [[SQUARE]]			; CHECK32-NEXT: [[TMP1:%.*]] = fmul fast float [[POWI]], [[SQRT]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast float [[TMP1]], [[SQRT]]			;
	; CHECK-NEXT: [[RECIPROCAL:%.*]] = fdiv fast float 1.000000e+00, [[TMP2]]			; CHECK16-LABEL: @test_simplify_neg_8_5(
	; CHECK-NEXT: ret float [[RECIPROCAL]]			; CHECK16-NEXT: [[SQRT:%.*]] = call fast float @llvm.sqrt.f32(float [[X:%.*]])
				; CHECK16-NEXT: [[POWI:%.*]] = call fast float @llvm.powi.f32.i16(float [[X]], i16 -9)
				; CHECK16-NEXT: [[TMP1:%.*]] = fmul fast float [[POWI]], [[SQRT]]
	;			;
	%1 = call fast float @llvm.pow.f32(float %x, float -0.450000e+01)			%1 = call fast float @llvm.pow.f32(float %x, float -0.850000e+01)
	ret float %1			ret float %1
	}			}

	; pow(x, 7.5) with <2 x double>			; pow(x, 7.5) with <2 x double>
	define <2 x double> @test_simplify_7_5(<2 x double> %x) {			define <2 x double> @test_simplify_7_5(<2 x double> %x) {
	; CHECK-LABEL: @test_simplify_7_5(			; CHECK32-LABEL: @test_simplify_7_5(
	; CHECK-NEXT: [[SQRT:%.*]] = call fast <2 x double> @llvm.sqrt.v2f64(<2 x double> [[X:%.*]])			; CHECK32-NEXT: [[SQRT:%.*]] = call fast <2 x double> @llvm.sqrt.v2f64(<2 x double> [[X:%.*]])
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast <2 x double> [[X]], [[X]]			; CHECK32-NEXT: [[POWI:%.*]] = call fast <2 x double> @llvm.powi.v2f64.i32(<2 x double> [[X]], i32 7)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <2 x double> [[SQUARE]], [[SQUARE]]			; CHECK32-NEXT: [[TMP1:%.*]] = fmul fast <2 x double> [[POWI]], [[SQRT]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <2 x double> [[TMP1]], [[X]]			;
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x double> [[SQUARE]], [[TMP2]]			; CHECK16-LABEL: @test_simplify_7_5(
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x double> [[TMP3]], [[SQRT]]			; CHECK16-NEXT: [[SQRT:%.*]] = call fast <2 x double> @llvm.sqrt.v2f64(<2 x double> [[X:%.*]])
	; CHECK-NEXT: ret <2 x double> [[TMP4]]			; CHECK16-NEXT: [[POWI:%.*]] = call fast <2 x double> @llvm.powi.v2f64.i16(<2 x double> [[X]], i16 7)
				; CHECK16-NEXT: [[TMP1:%.*]] = fmul fast <2 x double> [[POWI]], [[SQRT]]
	;			;
	%1 = call fast <2 x double> @llvm.pow.v2f64(<2 x double> %x, <2 x double> <double 7.500000e+00, double 7.500000e+00>)			%1 = call fast <2 x double> @llvm.pow.v2f64(<2 x double> %x, <2 x double> <double 7.500000e+00, double 7.500000e+00>)
	ret <2 x double> %1			ret <2 x double> %1
	}			}

	; pow(x, 3.5) with <4 x float>			; pow(x, 3.5) with <4 x float>
	define <4 x float> @test_simplify_3_5(<4 x float> %x) {			define <4 x float> @test_simplify_3_5(<4 x float> %x) {
	; CHECK-LABEL: @test_simplify_3_5(			; CHECK32-LABEL: @test_simplify_3_5(
	; CHECK-NEXT: [[SQRT:%.*]] = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> [[X:%.*]])			; CHECK32-NEXT: [[SQRT:%.*]] = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> [[X:%.*]])
	; CHECK-NEXT: [[SQUARE:%.*]] = fmul fast <4 x float> [[X]], [[X]]			; CHECK32-NEXT: [[POWI:%.*]] = call fast <4 x float> @llvm.powi.v4f32.i32(<4 x float> [[X]], i32 3)
	; CHECK-NEXT: [[TMP1:%.*]] = fmul fast <4 x float> [[SQUARE]], [[X]]			; CHECK32-NEXT: [[TMP1:%.*]] = fmul fast <4 x float> [[POWI]], [[SQRT]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[TMP1]], [[SQRT]]			;
	; CHECK-NEXT: ret <4 x float> [[TMP2]]			; CHECK16-LABEL: @test_simplify_3_5(
				; CHECK16-NEXT: [[SQRT:%.*]] = call fast <4 x float> @llvm.sqrt.v4f32(<4 x float> [[X:%.*]])
				; CHECK16-NEXT: [[POWI:%.*]] = call fast <4 x float> @llvm.powi.v4f32.i16(<4 x float> [[X]], i16 3)
				; CHECK16-NEXT: [[TMP1:%.*]] = fmul fast <4 x float> [[POWI]], [[SQRT]]
	;			;
	%1 = call fast <4 x float> @llvm.pow.v4f32(<4 x float> %x, <4 x float> <float 3.500000e+00, float 3.500000e+00, float 3.500000e+00, float 3.500000e+00>)			%1 = call fast <4 x float> @llvm.pow.v4f32(<4 x float> %x, <4 x float> <float 3.500000e+00, float 3.500000e+00, float 3.500000e+00, float 3.500000e+00>)
	ret <4 x float> %1			ret <4 x float> %1
	}			}

	; (float)pow((double)(float)x, 0.5)			; (float)pow((double)(float)x, 0.5)
	define float @shrink_pow_libcall_half(float %x) {			define float @shrink_pow_libcall_half(float %x) {
	; CHECK-LABEL: @shrink_pow_libcall_half(			; CHECK-LABEL: @shrink_pow_libcall_half(
	[19 further lines not shown]

llvm/test/Transforms/InstCombine/pow_fp_int.ll

	[438 preceding lines not shown]
	;			;
	%subfp = uitofp i32 %x to double			%subfp = uitofp i32 %x to double
	%pow = tail call double @llvm.pow.f64(double %base, double %subfp)			%pow = tail call double @llvm.pow.f64(double %base, double %subfp)
	ret double %pow			ret double %pow
	}			}

	define double @powf_exp_const_int_no_fast(double %base) {			define double @powf_exp_const_int_no_fast(double %base) {
	; CHECK-LABEL: @powf_exp_const_int_no_fast(			; CHECK-LABEL: @powf_exp_const_int_no_fast(
	; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.pow.f64(double [[BASE:%.*]], double 4.000000e+01)			; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.powi.f64.i32(double [[BASE:%.*]], i32 40)
	; CHECK-NEXT: ret double [[RES]]			; CHECK-NEXT: ret double [[RES]]
	;			;
	%res = tail call double @llvm.pow.f64(double %base, double 4.000000e+01)			%res = tail call double @llvm.pow.f64(double %base, double 4.000000e+01)
	ret double %res			ret double %res
	}			}

	define double @powf_exp_const_not_int_fast(double %base) {			define double @powf_exp_const_not_int_fast(double %base) {
	; CHECK-LABEL: @powf_exp_const_not_int_fast(			; CHECK-LABEL: @powf_exp_const_not_int_fast(
	; CHECK-NEXT: [[RES:%.*]] = tail call fast double @llvm.pow.f64(double [[BASE:%.*]], double 3.750000e+01)			; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[BASE:%.*]])
				; CHECK-NEXT: [[POWI:%.*]] = tail call fast double @llvm.powi.f64.i32(double [[BASE]], i32 37)
				; CHECK-NEXT: [[RES:%.*]] = fmul fast double [[POWI]], [[SQRT]]
	; CHECK-NEXT: ret double [[RES]]			; CHECK-NEXT: ret double [[RES]]
	;			;
	%res = tail call fast double @llvm.pow.f64(double %base, double 3.750000e+01)			%res = tail call fast double @llvm.pow.f64(double %base, double 3.750000e+01)
	ret double %res			ret double %res
	}			}

	define double @powf_exp_const_not_int_no_fast(double %base) {			define double @powf_exp_const_not_int_no_fast(double %base) {
	; CHECK-LABEL: @powf_exp_const_not_int_no_fast(			; CHECK-LABEL: @powf_exp_const_not_int_no_fast(
	; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.pow.f64(double [[BASE:%.*]], double 3.750000e+01)			; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.pow.f64(double [[BASE:%.*]], double 3.750000e+01)
	; CHECK-NEXT: ret double [[RES]]			; CHECK-NEXT: ret double [[RES]]
	;			;
	%res = tail call double @llvm.pow.f64(double %base, double 3.750000e+01)			%res = tail call double @llvm.pow.f64(double %base, double 3.750000e+01)
	ret double %res			ret double %res
	}			}

	define double @powf_exp_const2_int_no_fast(double %base) {			define double @powf_exp_const2_int_no_fast(double %base) {
	; CHECK-LABEL: @powf_exp_const2_int_no_fast(			; CHECK-LABEL: @powf_exp_const2_int_no_fast(
	; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.pow.f64(double [[BASE:%.*]], double -4.000000e+01)			; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.powi.f64.i32(double [[BASE:%.*]], i32 -40)
	; CHECK-NEXT: ret double [[RES]]			; CHECK-NEXT: ret double [[RES]]
	;			;
	%res = tail call double @llvm.pow.f64(double %base, double -4.000000e+01)			%res = tail call double @llvm.pow.f64(double %base, double -4.000000e+01)
	ret double %res			ret double %res
	}			}

	declare float @llvm.pow.f32(float, float)			declare float @llvm.pow.f32(float, float)
	declare double @llvm.pow.f64(double, double)			declare double @llvm.pow.f64(double, double)

llvm/test/Transforms/InstCombine/pow_fp_int16.ll

	[408 preceding lines not shown]
	;			;
	%subfp = uitofp i16 %x to double			%subfp = uitofp i16 %x to double
	%pow = tail call double @llvm.pow.f64(double %base, double %subfp)			%pow = tail call double @llvm.pow.f64(double %base, double %subfp)
	ret double %pow			ret double %pow
	}			}

	define double @powf_exp_const_int_no_fast(double %base) {			define double @powf_exp_const_int_no_fast(double %base) {
	; CHECK-LABEL: @powf_exp_const_int_no_fast(			; CHECK-LABEL: @powf_exp_const_int_no_fast(
	; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.pow.f64(double [[BASE:%.*]], double 4.000000e+01)			; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.powi.f64.i16(double [[BASE:%.*]], i16 40)
	; CHECK-NEXT: ret double [[RES]]			; CHECK-NEXT: ret double [[RES]]
	;			;
	%res = tail call double @llvm.pow.f64(double %base, double 4.000000e+01)			%res = tail call double @llvm.pow.f64(double %base, double 4.000000e+01)
	ret double %res			ret double %res
	}			}

	define double @powf_exp_const_not_int_fast(double %base) {			define double @powf_exp_const_not_int_fast(double %base) {
	; CHECK-LABEL: @powf_exp_const_not_int_fast(			; CHECK-LABEL: @powf_exp_const_not_int_fast(
	; CHECK-NEXT: [[RES:%.*]] = tail call fast double @llvm.pow.f64(double [[BASE:%.*]], double 3.750000e+01)			; CHECK-NEXT: [[SQRT:%.*]] = call fast double @llvm.sqrt.f64(double [[BASE:%.*]])
				; CHECK-NEXT: [[POWI:%.*]] = tail call fast double @llvm.powi.f64.i16(double [[BASE]], i16 37)
				; CHECK-NEXT: [[RES:%.*]] = fmul fast double [[POWI]], [[SQRT]]
	; CHECK-NEXT: ret double [[RES]]			; CHECK-NEXT: ret double [[RES]]
	;			;
	%res = tail call fast double @llvm.pow.f64(double %base, double 3.750000e+01)			%res = tail call fast double @llvm.pow.f64(double %base, double 3.750000e+01)
	ret double %res			ret double %res
	}			}

	define double @powf_exp_const_not_int_no_fast(double %base) {			define double @powf_exp_const_not_int_no_fast(double %base) {
	; CHECK-LABEL: @powf_exp_const_not_int_no_fast(			; CHECK-LABEL: @powf_exp_const_not_int_no_fast(
	; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.pow.f64(double [[BASE:%.*]], double 3.750000e+01)			; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.pow.f64(double [[BASE:%.*]], double 3.750000e+01)
	; CHECK-NEXT: ret double [[RES]]			; CHECK-NEXT: ret double [[RES]]
	;			;
	%res = tail call double @llvm.pow.f64(double %base, double 3.750000e+01)			%res = tail call double @llvm.pow.f64(double %base, double 3.750000e+01)
	ret double %res			ret double %res
	}			}

	define double @powf_exp_const2_int_no_fast(double %base) {			define double @powf_exp_const2_int_no_fast(double %base) {
	; CHECK-LABEL: @powf_exp_const2_int_no_fast(			; CHECK-LABEL: @powf_exp_const2_int_no_fast(
	; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.pow.f64(double [[BASE:%.*]], double -4.000000e+01)			; CHECK-NEXT: [[RES:%.*]] = tail call double @llvm.powi.f64.i16(double [[BASE:%.*]], i16 -40)
	; CHECK-NEXT: ret double [[RES]]			; CHECK-NEXT: ret double [[RES]]
	;			;
	%res = tail call double @llvm.pow.f64(double %base, double -4.000000e+01)			%res = tail call double @llvm.pow.f64(double %base, double -4.000000e+01)
	ret double %res			ret double %res
	}			}

	declare float @llvm.pow.f32(float, float)			declare float @llvm.pow.f32(float, float)
	declare double @llvm.pow.f64(double, double)			declare double @llvm.pow.f64(double, double)

Revision Contents

Diff 443444
