This is an archive of the discontinued LLVM Phabricator instance.

Differential D152281

[Transforms][LICM] Add the ability to undo unprofitable reassociation
Closed, Public

Authored by pawosm01 on Jun 6 2023, 9:09 AM.

Details

Reviewers

qcolombet
fhahn
huntergr
nikic
paulwalker-arm

Commits

rG8698d56d996a: [Transforms][LICM] Add the ability to undo unprofitable reassociation

Summary

Consider the following piece of code:

void innermost_loop(int i, double d1, double d2, double delta, int n, double cells[n])
{
  int j;
  const double d1d = d1 * delta;
  const double d2d = d2 * delta;

  for (j = 0; j <= i; j++)
    cells[j] = d1d * cells[j + 1] + d2d * cells[j];
}

When this code is compiled at -Ofast, the "Reassociate expressions"
pass transforms it into the equivalent of:

int j;

for (j = 0; j <= i; j++)
  cells[j] = (d1 * cells[j + 1] + d2 * cells[j]) * delta;

As a result, the loop invariants are no longer computed before the
loop; instead, each loop iteration performs one extra multiplication.
This causes a significant performance hit.

Similarly, user code that is written directly in the reassociated form
cannot have those invariants hoisted.

This patch solves the issue by adding to the LICM pass the ability to
undo such reassociation. Note that, to perform this transformation,
the pass requires the same conditions as the "Reassociate expressions"
pass: the binary operators involved must allow reassociation (e.g. via
the fast attribute) and each of them must have a single use.
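
To make the two loop shapes concrete, here is a minimal, self-contained sketch that is not part of the patch. The function names `hoistedForm` and `reassociatedForm` are hypothetical, and `std::vector` replaces the C array of the summary only for convenience. The comments count multiplies per iteration as written in the source; what a compiler actually emits may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Source form: d1 * delta and d2 * delta are computed once, before the loop,
// so each iteration performs 2 multiplies and 1 add.
void hoistedForm(int i, double d1, double d2, double delta,
                 std::vector<double> &cells) {
  assert(i >= 0 && static_cast<std::size_t>(i) + 1 < cells.size());
  const double d1d = d1 * delta;
  const double d2d = d2 * delta;
  for (int j = 0; j <= i; j++)
    cells[j] = d1d * cells[j + 1] + d2d * cells[j];
}

// Form after reassociation: delta is factored out of the sum, so it has to be
// multiplied in on every iteration (3 multiplies and 1 add per iteration).
void reassociatedForm(int i, double d1, double d2, double delta,
                      std::vector<double> &cells) {
  assert(i >= 0 && static_cast<std::size_t>(i) + 1 < cells.size());
  for (int j = 0; j <= i; j++)
    cells[j] = (d1 * cells[j + 1] + d2 * cells[j]) * delta;
}
```

With inputs chosen so that every intermediate value is exactly representable (for example d1 = 2, d2 = 4, delta = 0.5 and small integer cells), both functions produce identical results. In general the two forms may differ in rounding, which is why the transformation is only legal when reassociation is allowed.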

Some parts of this patch were suggested by Nikita Popov.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pawosm01 created this revision. Jun 6 2023, 9:09 AM

Herald added a project: Restricted Project. Jun 6 2023, 9:09 AM

Herald added subscribers: StephenFan, asbirlea, hiraditya.

pawosm01 requested review of this revision. Jun 6 2023, 9:09 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B236974: Diff 528898. Jun 6 2023, 9:09 AM

pawosm01 added a parent revision: D152282: [Transforms][LICM] A test case for the upcoming fix D152281 for the issue with reassociation profitability. Jun 6 2023, 9:11 AM

Making it slightly less readable, per clang-format request.

Harbormaster completed remote builds in B237034: Diff 528981. Jun 6 2023, 2:35 PM

mgabka added a subscriber: mgabka. Jun 7 2023, 12:36 AM

pawosm01 mentioned this in D151616: [Transforms][Reassociate] "Reassociate expressions" pass optimizations not always profitable. Jun 7 2023, 9:21 AM

Back to the drawing board: my patch can’t cope with IR emitted by flang-new: Instruction does not dominate all uses

Fixing the problem I've found while using this patch with a larger codebase.

Harbormaster completed remote builds in B238647: Diff 531108. Jun 13 2023, 6:02 PM

I had to simplify this logic. After some time even I couldn't understand it.

Harbormaster completed remote builds in B240189: Diff 533183. Jun 21 2023, 2:43 AM

mgabka added a reviewer: huntergr. Jun 28 2023, 6:59 AM

huntergr added inline comments. Jul 4 2023, 2:00 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2754–2755	This seems a little odd, like you might be processing more candidates than you originally identified and want to prevent wrapping? I think it might be better to see if you can add candidate operations to a SmallVector worklist and process them directly from that instead of walking back through the IR a second time.

xbolva00 added a subscriber: xbolva00. Jul 4 2023, 2:05 AM

qcolombet added inline comments. Jul 4 2023, 6:42 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2677	Could you add a comment on what this function is doing and what it means to return true or false? It'll make the review easier.

Reaction to the review comments.

pawosm01 marked 2 inline comments as done. Jul 6 2023, 3:00 PM

pawosm01 added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
2677	I'll try.
2754–2755	Yeah, that simplifies the code indeed.

Harbormaster completed remote builds in B243586: Diff 537892. Jul 6 2023, 5:54 PM

huntergr added inline comments. Jul 14 2023, 2:26 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2705–2706	Can just 'return false;' here instead of taking the extra step before returning. Same with the other checks in this loop.
2730	I don't think we need a count here; just a boolean 'CandidateFound' would suffice.
2742–2752	Using a loop over the ops with a break after the substitution is performed feels a bit unwieldy. Could you try recording the Ops in a known order in Changes (probably using std::swap as you did before), so you know that e.g. the first one is always loop invariant? I don't think we need to maintain the order of operations for an fmul so you could overwrite both, but if there is a good reason to maintain the order then you could just make the nested pair in changes a tuple instead and record the index of the invariant operand.

pawosm01 updated this revision to Diff 540712. Jul 15 2023, 11:26 AM

pawosm01 marked 2 inline comments as done.

post-review changes

pawosm01 marked 3 inline comments as done. Jul 15 2023, 11:56 AM

pawosm01 added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
2705–2706	Yes, same in the other places.
2730	It isn't as easy as it seems. As it turns out, this transformation shouldn't be applied in every situation where at least one candidate is found. There are two problems here: (1) A simple expression like inv1 * var1 * inv2 will be found as a candidate and transformed into var1 * (inv1 * inv2), where the (inv1 * inv2) part will be hoisted. But this expression can itself be part of a longer expression, e.g. inv1 * var1 * inv2 + inv1 * var2 * inv2 + inv1 * var3 * inv2, which before my change could easily be transformed into (var1 + var2 + var3) * (inv1 * inv2) with (inv1 * inv2) hoisted. The transformation I'm proposing should rather focus on the expressions described in the comment block placed before this function, which usually require undoing the actions of some previous pass (namely, the Reassociate pass), and leave simpler cases alone so that the existing test cases covering them keep passing without any awfully messy change. Hence, I've added the `AnyAdds` variable to make sure the intended expression patterns are being found. The previous oversimplified condition eliminated such cases by sheer coincidence. (2) With a long enough expression and a small enough trip count, the transformation leading to the hoisting can cause performance degradation. If most of the operations in a particularly long expression involve non-invariants, more multiplications will be introduced than operations hoisted. By changing the type of the `Candidates` counter to a signed integer and decrementing it whenever there is no invariant in the operation (hence no candidate for hoisting), the applicability of this transformation can be limited to situations where there are at least as many hoisting candidates as non-invariant operations. This is a crude and naïve estimate, I admit, but anything better would overcomplicate the logic here, e.g. by introducing instruction cost estimation and trip count estimation; the cost of doing that could easily outgrow any benefits.
2742–2752	yeah, I've simplified this.

Harbormaster completed remote builds in B245611: Diff 540712. Jul 15 2023, 12:18 PM

huntergr added inline comments. Jul 20 2023, 1:39 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2702–2704	It may be worthwhile considering an upper limit on the number of multiplies you support -- depending on the trip count of the loop, you may actually be introducing extra work for no gain (especially for a target with wide vectors that may complete everything in one vector iteration).
2730	Yes, I see that if you have multiplies with no loop-invariant operands in the chain of adds then you need to add more multiplies inside the loop, which seems to defeat the purpose of the optimization. So perhaps we should avoid performing this transformation if there is such a multiply in the chain. It might be possible to perform it with only 1 such multiply since you'll be reordering the operations and may reduce the latency, but that would require benchmarking to confirm and is dependent on the workload and target, but I think the safe side is to bail out.
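
The trade-off discussed in these two comments can be sketched with a simple multiply count. This is a hypothetical model, not code from the patch: the names `ChainShape`, `multipliesBefore` and `multipliesAfter` are invented for illustration, and the model assumes an expression of the form (t_1 + ... + t_n) * Factor where every t_k is a two-operand fmul and Factor is loop invariant. It ignores latency, vectorization and trip count.

```cpp
#include <cassert>

// Hypothetical description of a chain of fadds whose leaves are fmuls.
struct ChainShape {
  int WithInvariant;    // fmuls that have one loop-invariant operand
  int WithoutInvariant; // fmuls whose operands are both loop variant
};

// In-loop multiplies before the transform: one per term, plus the final
// multiplication of the whole sum by Factor.
int multipliesBefore(ChainShape S) {
  assert(S.WithInvariant >= 0 && S.WithoutInvariant >= 0);
  return S.WithInvariant + S.WithoutInvariant + 1;
}

// In-loop multiplies after Factor is pushed into every term:
//  - inv * var becomes var * (inv * Factor); (inv * Factor) is hoisted,
//    so one multiply stays in the loop.
//  - var1 * var2 becomes var1 * (var2 * Factor); nothing can be hoisted,
//    so two multiplies stay in the loop.
int multipliesAfter(ChainShape S) {
  assert(S.WithInvariant >= 0 && S.WithoutInvariant >= 0);
  return S.WithInvariant + 2 * S.WithoutInvariant;
}
```

Under this model the transform saves one multiply per iteration when every term has an invariant operand, breaks even with exactly one fully variant term, and adds work with two or more. That is in line with the suggestion to bail out when such a multiply is present.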

Following the reviewer's suggestions.

pawosm01 marked 2 inline comments as done. Jul 20 2023, 5:48 AM

pawosm01 added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
2702–2704	I've imposed a reasonable upper limit below.

LGTM

This revision is now accepted and ready to land. Jul 20 2023, 6:08 AM

nikic added a subscriber: nikic. Jul 20 2023, 6:38 AM

nikic added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
2677	deassociate -> reassociate
2689–2690	Here and elsewhere: Remove all unnecessary parentheses, apply de Morgan as needed.
2700	I think you just reinvented `Use`? That's how you reference a specific operand of an instruction.

Harbormaster completed remote builds in B246884: Diff 542461. Jul 20 2023, 8:32 AM

Post-review changes again.

pawosm01 marked 3 inline comments as done. Jul 21 2023, 5:53 AM

pawosm01 added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
2700	Ok, I've changed it to make use of `Use`, I hope you'll like it.

pawosm01 marked an inline comment as done. Jul 21 2023, 6:03 AM

nikic added inline comments. Jul 21 2023, 6:45 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2725	This should be `&Op->getOperandUse(Ops[L.isLoopInvariant(Ops[0]) ? 0 : 1])` or so. But I think it would be less awkward to do something like this, so you don't go back and forth between operand indices, Values and Uses:

if (Op->getOpcode() != Instruction::FMul) {
  if (OpNext->getOpcode() != Instruction::FMul)
    return false;
  std::swap(Op, OpNext);
}
if (!Op->hasOneUse() || !Op->hasAllowReassoc() || L.isLoopInvariant(Op))
  return false;
Use &U0 = Op->getOperandUse(0);
Use &U1 = Op->getOperandUse(1);
if (L.isLoopInvariant(U0))
  Changes.push_back(&U0);
else if (L.isLoopInvariant(U1))
  Changes.push_back(&U1);
else
  return false;

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/LICM.cpp	100 lines
llvm/test/Transforms/LICM/expr-reassociate.ll	63 lines

Diff 531108

llvm/lib/Transforms/Scalar/LICM.cpp

[...]

STATISTIC(NumLoadPromoted, "Number of load-only promotions");

STATISTIC(NumLoadStorePromoted, "Number of load and store promotions");

STATISTIC(NumMinMaxHoisted,

"Number of min/max expressions hoisted out of the loop");

STATISTIC(NumGEPsHoisted,

"Number of geps reassociated and hoisted out of the loop");

STATISTIC(NumAddSubHoisted, "Number of add/subtract expressions reassociated "

"and hoisted out of the loop");

STATISTIC(NumFPAssociationsHoisted, "Number of invariant FP expressions "

"reassociated and hoisted out of the loop");

/// Memory promotion is enabled by default.

static cl::opt<bool>

DisablePromotion("disable-licm-promotion", cl::Hidden, cl::init(false),

cl::desc("Disable memory promotion in LICM pass"));

static cl::opt<bool> ControlFlowHoisting(

"licm-control-flow-hoisting", cl::Hidden, cl::init(false),

cl::desc("Enable control flow (and PHI) hoisting in LICM"));

static cl::opt<bool>

SingleThread("licm-force-thread-model-single", cl::Hidden, cl::init(false),

cl::desc("Force thread model single in LICM pass"));

static cl::opt<uint32_t> MaxNumUsesTraversed(

"licm-max-num-uses-traversed", cl::Hidden, cl::init(8),

cl::desc("Max num uses visited for identifying load "

"invariance in loop using invariant start (default = 8)"));

// Experimental option to allow imprecision in LICM in pathological cases, in

// exchange for faster compile. This is to be removed if MemorySSA starts to

paulwalker-arm: Up to you but I think "licm-max-num-fp-reassociations" or "licm-fp-reassociation-cap" would be more in keeping with the existing naming of licm options.

pawosm01: Yes, "licm-max-num-fp-reassociations" seems a better name for an option.

// address the same issue. LICM calls MemorySSAWalker's

// getClobberingMemoryAccess, up to the value of the Cap, getting perfect

// accuracy. Afterwards, LICM will call into MemorySSA's getDefiningAccess,

// which may not be precise, since optimizeUses is capped. The result is

// correct, but we may not get as "far up" as possible to get which access is

// clobbering the one queried.

cl::opt<unsigned> llvm::SetLicmMssaOptCap(

"licm-mssa-optimization-cap", cl::init(100), cl::Hidden,

[...]

if (hoistAdd(Pred, LHS, RHS, cast<ICmpInst>(I), L, SafetyInfo, MSSAU, AC, DT))

return true;

if (hoistSub(Pred, LHS, RHS, cast<ICmpInst>(I), L, SafetyInfo, MSSAU, AC, DT))

return true;

return false;

}

static bool hoistFPAssociation(Instruction &I, Loop &L,

qcolombet: Could you add a comment on what this function is doing and what it means to return true or false? It'll make the review easier.

pawosm01: I'll try.

nikic: deassociate -> reassociate

ICFLoopSafetyInfo &SafetyInfo,

MemorySSAUpdater &MSSAU, AssumptionCache *AC,

DominatorTree *DT) {

using namespace PatternMatch;

Value *VariantOp = nullptr, *InvariantOp = nullptr;

if (!(match(&I, m_FMul(m_Value(VariantOp), m_Value(InvariantOp))) &&

I.hasAllowReassoc()))

return false;

if (L.isLoopInvariant(VariantOp))

std::swap(VariantOp, InvariantOp);

if (L.isLoopInvariant(VariantOp) || !L.isLoopInvariant(InvariantOp))

return false;

nikic:

   Value *VariantOp = nullptr, *InvariantOp = nullptr;
-  if (!(match(&I, m_FMul(m_Value(VariantOp), m_Value(InvariantOp))) &&
-        I.hasAllowReassoc()))
+  if (!match(&I, m_FMul(m_Value(VariantOp), m_Value(InvariantOp))) ||
+      !I.hasAllowReassoc())
     return false;

Here and elsewhere: Remove all unnecessary parentheses, apply de Morgan as needed.
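
The rewrite requested here relies on De Morgan's law, !(A && B) == (!A || !B). As a small self-contained check (the function names are hypothetical; `Matched` stands for the result of the `match(...)` call and `Reassoc` for `I.hasAllowReassoc()`), the two conditions can be compared over all inputs:

```cpp
#include <cassert>

// Condition as originally written in the patch: !(A && B).
constexpr bool originalForm(bool Matched, bool Reassoc) {
  return !(Matched && Reassoc);
}

// Condition after applying De Morgan's law: !A || !B.
constexpr bool simplifiedForm(bool Matched, bool Reassoc) {
  return !Matched || !Reassoc;
}

// Returns true if both forms agree on all four input combinations.
constexpr bool formsAgree() {
  for (int M = 0; M < 2; ++M)
    for (int R = 0; R < 2; ++R)
      if (originalForm(M != 0, R != 0) != simplifiedForm(M != 0, R != 0))
        return false;
  return true;
}

static_assert(formsAgree(), "De Morgan rewrite must preserve the condition");
```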

Value *Factor = InvariantOp;

// First, we need to know if there is enough candidates for transformation.

unsigned candidates = 0U;

for (BinaryOperator *Op = nullptr, *OpNext = nullptr,

*VOp = dyn_cast<BinaryOperator>(VariantOp);

VOp; VOp = OpNext) {

if (!(VOp->hasOneUse() && VOp->hasAllowReassoc())) {

candidates = 0U;

nikic: As a final note, I would rewrite this loop into a worklist iteration, something along these lines:

SmallVector<BinaryOperator *> Worklist;
Worklist.push_back(VariantOp);
while (!Worklist.empty()) {
  BinaryOperator *BO = Worklist.pop_back_val();
  if (!BO->hasOneUse())
    return false;
  BinaryOperator *Op0, *Op1;
  if (match(BO, m_FAdd(m_BinOp(Op0), m_BinOp(Op1))) &&
      BO->hasAllowReassoc()) {
    Worklist.push_back(Op0);
    Worklist.push_back(Op1);
    continue;
  }
  if (BO->getOpcode() != Instruction::FMul || !BO->hasAllowReassoc() ||
      L.isLoopInvariant(BO))
    return false;
  Use &U0 = BO->getOperandUse(0);
  Use &U1 = BO->getOperandUse(1);
  if (L.isLoopInvariant(U0))
    Changes.push_back(&U0);
  else if (L.isLoopInvariant(U1))
    Changes.push_back(&U1);
  else
    return false;
  if (Changes.size() > 5U) // A reasonable upper limit
    return false;
}

I also wanted to ask where the number 5 for the cutoff comes from. Is that just chosen arbitrarily?

pawosm01: Well, I deliberately wanted to avoid recursion, even if it explicitly uses a stack (here named Worklist). But this looks so neat that let's have it this way.

pawosm01: > I also wanted to ask where the number 5 for the cutoff comes from. Is that just chosen arbitrarily?

This was indeed chosen arbitrarily; the choice was based on an observation of the problem at hand.

paulwalker-arm: Perhaps it's worth replacing the hardwired value with a command line option that defaults to 5?

pawosm01: Agreed. I've added one.
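
The worklist idea from this thread can be tried outside LLVM with a toy expression tree. The sketch below is a simplified model under stated assumptions, not the committed implementation: `Node`, `makeLeaf`, `makeOp`, `isInvariantLeaf` and `collectInvariantOperands` are hypothetical names, a `Node **` plays the role that `Use *` plays in the suggestion, and the cap is passed as a parameter in the spirit of the command line option mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an IR value; every name in this block is hypothetical.
struct Node {
  enum Kind { Leaf, FAdd, FMul };
  Kind K = Leaf;
  bool Invariant = false; // only meaningful for leaves
  bool Reassoc = true;    // stands in for the reassoc fast-math flag
  unsigned NumUses = 1;   // stands in for hasOneUse()
  Node *Ops[2] = {nullptr, nullptr};
};

Node makeLeaf(bool Invariant) {
  Node N;
  N.Invariant = Invariant;
  return N;
}

Node makeOp(Node::Kind K, Node *LHS, Node *RHS) {
  Node N;
  N.K = K;
  N.Ops[0] = LHS;
  N.Ops[1] = RHS;
  return N;
}

bool isInvariantLeaf(const Node *N) {
  return N->K == Node::Leaf && N->Invariant;
}

// Walks a tree of FAdd nodes whose operands are operations, and records for
// each FMul the operand slot that holds a loop-invariant leaf. Returns false
// as soon as a node breaks the expected pattern, or when more than
// MaxChanges slots have been recorded.
bool collectInvariantOperands(Node *Root, std::size_t MaxChanges,
                              std::vector<Node **> &Changes) {
  std::vector<Node *> Worklist{Root};
  while (!Worklist.empty()) {
    Node *N = Worklist.back();
    Worklist.pop_back();
    if (N->NumUses != 1 || !N->Reassoc)
      return false;
    if (N->K == Node::FAdd) {
      // Mirrors m_BinOp: both operands of the add must be operations.
      if (N->Ops[0]->K == Node::Leaf || N->Ops[1]->K == Node::Leaf)
        return false;
      Worklist.push_back(N->Ops[0]);
      Worklist.push_back(N->Ops[1]);
      continue;
    }
    if (N->K != Node::FMul)
      return false;
    if (isInvariantLeaf(N->Ops[0]))
      Changes.push_back(&N->Ops[0]);
    else if (isInvariantLeaf(N->Ops[1]))
      Changes.push_back(&N->Ops[1]);
    else
      return false; // a multiply with no invariant operand: bail out
    if (Changes.size() > MaxChanges)
      return false;
  }
  return true;
}
```

For inv1 * var1 + var2 * inv2 the walk succeeds and records two operand slots; replacing one product by a multiply of two variant leaves makes it return false. Because the traversal uses an explicit stack, no recursion is involved.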

break;

nikic: I think you just reinvented `Use`? That's how you reference a specific operand of an instruction.

pawosm01: Ok, I've changed it to make use of `Use`, I hope you'll like it.

}

if (!(match(VOp, m_FAdd(m_BinOp(Op), m_BinOp(OpNext))))) {

Op = VOp;

OpNext = nullptr;

huntergr: It may be worthwhile considering an upper limit on the number of multiplies you support -- depending on the trip count of the loop, you may actually be introducing extra work for no gain (especially for a target with wide vectors that may complete everything in one vector iteration).

pawosm01: I've imposed a reasonable upper limit below.

}

Value *Ops[] = {nullptr, nullptr};

huntergr: Can just 'return false;' here instead of taking the extra step before returning. Same with the other checks in this loop.

pawosm01: Yes, same in the other places.

if (!(match(Op, m_FMul(m_Value(Ops[0]), m_Value(Ops[1]))))) {

if (!OpNext) {

nikic: Do we need to require that the fadd is reassoc as well?

pawosm01: Yes, this match specifically requires that the expression is of the form described in the comment above this function: ((A1 * B1) + (A2 * B2) + ...). I wrote this function to solve the problem at hand, for which I know my solution provides a noticeable performance benefit. Making it more general would require a whole lot more research and performance impact analysis, going far beyond what I wanted to fix. I based this solution on the other hoist... functions in this file; I believe that anyone who finds a reason to extend it (to handle their own performance issue) could build on what I've proposed.

nikic: Let me rephrase this: Isn't this missing a reassoc check on the fadd? We check the root fmul above, and we check the inner fmuls below, but we don't seem to check anything about the fadd.

pawosm01: I assumed it would be checked anyway when OpNext is processed, but yes, it is clearer to check it early; it should also be checked whether it is an FAdd in the first place.

candidates = 0U;

break;

}

if (!(match(OpNext, m_FMul(m_Value(Ops[0]), m_Value(Ops[1]))))) {

candidates = 0U;

break;

}

std::swap(Op, OpNext);

}

if (!(Op->hasOneUse() && Op->hasAllowReassoc())) {

candidates = 0U;

break;

}

if ((L.isLoopInvariant(Ops[0])) || (L.isLoopInvariant(Ops[1])))

candidates++;

}

if (!(candidates > 1U))

nikic: This should be `&Op->getOperandUse(Ops[L.isLoopInvariant(Ops[0]) ? 0 : 1])` or so. But I think it would be less awkward to do something like this, so you don't go back and forth between operand indices, Values and Uses:

if (Op->getOpcode() != Instruction::FMul) {
  if (OpNext->getOpcode() != Instruction::FMul)
    return false;
  std::swap(Op, OpNext);
}
if (!Op->hasOneUse() || !Op->hasAllowReassoc() || L.isLoopInvariant(Op))
  return false;
Use &U0 = Op->getOperandUse(0);
Use &U1 = Op->getOperandUse(1);
if (L.isLoopInvariant(U0))
  Changes.push_back(&U0);
else if (L.isLoopInvariant(U1))
  Changes.push_back(&U1);
else
  return false;

pawosm01: Ok, thanks for your valuable input.

return false;

// We know we have enough candidates, let's do the transformations.

auto *Preheader = L.getLoopPreheader();

assert(Preheader && "Loop is not in simplify form?");

huntergr: I don't think we need a count here; just a boolean 'CandidateFound' would suffice.

pawosm01: It isn't as easy as it seems. As it turns out, this transformation shouldn't be applied in every situation where at least one candidate is found.

There are two problems here:

1. A simple expression like inv1 * var1 * inv2 will be found as a candidate and transformed into var1 * (inv1 * inv2), where the (inv1 * inv2) part will be hoisted. But this expression can itself be part of a longer expression, e.g. inv1 * var1 * inv2 + inv1 * var2 * inv2 + inv1 * var3 * inv2, which before my change could easily be transformed into (var1 + var2 + var3) * (inv1 * inv2) with (inv1 * inv2) hoisted. The transformation I'm proposing should rather focus on the expressions described in the comment block placed before this function, which usually require undoing the actions of some previous pass (namely, the Reassociate pass), and leave simpler cases alone so that the existing test cases covering them keep passing without any awfully messy change. Hence, I've added the AnyAdds variable to make sure the intended expression patterns are being found. The previous oversimplified condition eliminated such cases by sheer coincidence.

2. With a long enough expression and a small enough trip count, the transformation leading to the hoisting can cause performance degradation. If most of the operations in a particularly long expression involve non-invariants, more multiplications will be introduced than operations hoisted. By changing the type of the Candidates counter to a signed integer and decrementing it whenever there is no invariant in the operation (hence no candidate for hoisting), the applicability of this transformation can be limited to situations where there are at least as many hoisting candidates as non-invariant operations. This is a crude and naïve estimate, I admit, but anything better would overcomplicate the logic here, e.g. by introducing instruction cost estimation and trip count estimation; the cost of doing that could easily outgrow any benefits.

huntergr: Yes, I see that if you have multiplies with no loop-invariant operands in the chain of adds then you need to add more multiplies inside the loop, which seems to defeat the purpose of the optimization. So perhaps we should avoid performing this transformation if there is such a multiply in the chain.

It might be possible to perform it with only 1 such multiply, since you'll be reordering the operations and may reduce the latency, but that would require benchmarking to confirm and depends on the workload and target. I think the safe option is to bail out.

nikic: I didn't quite get your argument in 1. for why it's a bad idea to transform inv1 * var1 * inv2 into var1 * (inv1 * inv2). Why is there a problem if it's part of a longer expression? Wouldn't we still get (var1 + var2 + var3) * (inv1 * inv2) as the end result if inv1 * inv2 is hoisted out of each part and then CSEd?

As far as I can tell, the transform should be profitable regardless of whether there is an fadd or not.

pawosm01: Following your observation, I've added such an ability, plus a simple test case covering it.

for (BinaryOperator *Op = nullptr, *OpNext = nullptr,

*VOp = dyn_cast<BinaryOperator>(VariantOp);

VOp; VOp = OpNext) {

if (!(match(VOp, m_FAdd(m_BinOp(Op), m_BinOp(OpNext))))) {

Op = VOp;

nikic: nit: Can move this IRBuilder outside the loop.

pawosm01: Ah, yeah, good catch; corrected.

OpNext = nullptr;

}

Value *Ops[] = {nullptr, nullptr};

if (!(match(Op, m_FMul(m_Value(Ops[0]), m_Value(Ops[1]))))) {

nikic: dyn_cast + assert == cast

pawosm01: Ah, I didn't notice that, I'll get back to it soon.

pawosm01: Done.

assert(OpNext && "Operation is neither FAdd or FMul!");

if (!(match(OpNext, m_FMul(m_Value(Ops[0]), m_Value(Ops[1])))))

OpNext = nullptr;

nikic: This copies the FMF flags from the top-level fmul, but is that correct? We have multiple fadds and fmuls involved in the transform, why is it safe to use the flags from that one in particular?

(The test coverage for this is not great, because you just use "fast" everywhere.)

pawosm01: Yes, I assume it's correct. What is happening here is that the fmul instruction from which the flag is taken is effectively being copied into every subexpression in the parentheses, so it is reasonable to copy the fast flag too. As checking that reassociation is allowed implicitly checks the fast flag, nothing harmful is happening here.

nikic: The transform only requires the reassoc flag on each instruction, while this is also copying all other FMF flags. It's not obvious to me that this is correct.

One could probably make an argument along the lines of "if the final fmul is nnan, then that implies that the fadd chain result is not nan. As the result would be nan if any operand were nan, we can propagate the nnan flag up the chain". This sounds plausible. But I'm not sure the same logic holds up for ninf and other FMF flags.

pawosm01: Ok, so let's copy it from the affected instruction itself...
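
One way to see why flags such as ninf are not obviously safe to carry over is a plain floating-point experiment. The sketch below is an illustration constructed for this purpose, not an example from the review; the function names `factoredOut` and `factoredIn` are hypothetical, and it assumes IEEE 754 doubles compiled without fast-math options.

```cpp
#include <cassert>
#include <cmath>

// Original shape: the sum of products is multiplied by the factor once.
double factoredOut(double A1, double B1, double A2, double B2, double Factor) {
  assert(std::isfinite(Factor));
  return (A1 * B1 + A2 * B2) * Factor;
}

// Shape after the factor has been pushed into each product.
double factoredIn(double A1, double B1, double A2, double B2, double Factor) {
  assert(std::isfinite(Factor));
  return A1 * (B1 * Factor) + A2 * (B2 * Factor);
}
```

Take A1 = B1 = B2 = 2^400, A2 = -2^400 and Factor = 2^600. In `factoredOut` every intermediate value is finite and the result is 0. In `factoredIn` each product overflows to an infinity, so the result is not finite (NaN, or an infinity if the compiler contracts the expression into a fused multiply-add). A "no infinities" assumption that held for every instruction of the original expression therefore does not automatically hold for the instructions created by the transform.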

std::swap(Op, OpNext);

}

assert(Op && "Neither current or next operation is FMul!");

Value *Factored = nullptr;

for (int i = 0; i < 2; i++) {

if (i || L.isLoopInvariant(Ops[i])) {

assert(!Factored);

IRBuilder<> Builder(

(L.isLoopInvariant(Ops[i]) && (Op->getParent() != Preheader))

? Preheader->getTerminator()

huntergr: Using a loop over the ops with a break after the substitution is performed feels a bit unwieldy. Could you try recording the Ops in a known order in Changes (probably using std::swap as you did before), so you know that e.g. the first one is always loop invariant? I don't think we need to maintain the order of operations for an fmul so you could overwrite both, but if there is a good reason to maintain the order then you could just make the nested pair in changes a tuple instead and record the index of the invariant operand.

pawosm01: Yeah, I've simplified this.

: Op);

Factored = Builder.CreateFMulFMF(Ops[i], Factor, &I, "factor.op.fmul");

Op->setOperand(i, Factored);

huntergr: This seems a little odd, like you might be processing more candidates than you originally identified and want to prevent wrapping?

I think it might be better to see if you can add candidate operations to a SmallVector worklist and process them directly from that instead of walking back through the IR a second time.

pawosm01: Yeah, that simplifies the code indeed.

if (candidates)

candidates--;

break;

}

assert(Factored);

}

assert(!candidates);

I.replaceAllUsesWith(VariantOp);

eraseInstruction(I, SafetyInfo, MSSAU);

return true;

}

static bool hoistArithmetics(Instruction &I, Loop &L,

ICFLoopSafetyInfo &SafetyInfo,

MemorySSAUpdater &MSSAU, AssumptionCache *AC,

DominatorTree *DT) {

// Optimize complex patterns, such as (x < INV1 && x < INV2), turning them

// into (x < min(INV1, INV2)), and hoisting the invariant part of this

// expression out of the loop.

if (hoistMinMax(I, L, SafetyInfo, MSSAU)) {

[...]

// Try to hoist add/sub's by reassociation.

if (hoistAddSub(I, L, SafetyInfo, MSSAU, AC, DT)) {

++NumHoisted;

++NumAddSubHoisted;

return true;

}

if (hoistFPAssociation(I, L, SafetyInfo, MSSAU, AC, DT)) {

++NumHoisted;

++NumFPAssociationsHoisted;

return true;

}

return false; return false;

} }

/// Little predicate that returns true if the specified basic block is in /// Little predicate that returns true if the specified basic block is in

/// a subloop of the current one, not the current one itself. /// a subloop of the current one, not the current one itself.

/// ///

static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI) { static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI) {

assert(CurLoop->contains(BB) && "Only valid if BB is IN the loop"); assert(CurLoop->contains(BB) && "Only valid if BB is IN the loop");

return LI->getLoopFor(BB) != CurLoop; return LI->getLoopFor(BB) != CurLoop;

} }
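The comment in hoistArithmetics describes rewriting (x < INV1 && x < INV2) as (x < min(INV1, INV2)) so that the invariant part can leave the loop. As a small illustration only, here is that identity on plain integers; twoCompares, oneCompare and countBelow are hypothetical helper names and this is not how the pass itself is written.

```cpp
#include <algorithm>
#include <cassert>

// Original shape: two comparisons against loop-invariant bounds.
bool twoCompares(int x, int inv1, int inv2) { return x < inv1 && x < inv2; }

// Rewritten shape: a single comparison against a precomputed bound.
bool oneCompare(int x, int invMin) { return x < invMin; }

// Counts the elements below both bounds. The min() is computed once before
// the loop, which is the "hoisted invariant part" in this toy setting.
int countBelow(const int *xs, int n, int inv1, int inv2) {
  assert(n >= 0);
  const int bound = std::min(inv1, inv2);
  int count = 0;
  for (int k = 0; k < n; ++k)
    if (oneCompare(xs[k], bound))
      ++count;
  return count;
}
```

The loop body now performs one comparison per element instead of two, and the only extra work is a single min() before the loop.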

llvm/test/Transforms/LICM/expr-reassociate.ll

[... 173 lines not shown ...]
 ; LICM_ONLY-NEXT: store double [[FADD_1]], ptr [[ARRAYIDX_J]], align 8
 ; LICM_ONLY-NEXT: br label [[FOR_COND]]
 ; LICM_ONLY: for.end:
 ; LICM_ONLY-NEXT: ret void
 ;
 ; LICM_AFTER_REASSOCIATE-LABEL: define void @innermost_loop_2d_fast
 ; LICM_AFTER_REASSOCIATE-SAME: (i32 [[I:%.*]], double [[D1:%.*]], double [[D2:%.*]], double [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
 ; LICM_AFTER_REASSOCIATE-NEXT: entry:
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL:%.*]] = fmul fast double [[D2]], [[DELTA]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL1:%.*]] = fmul fast double [[D1]], [[DELTA]]
 ; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND:%.*]]
 ; LICM_AFTER_REASSOCIATE: for.cond:
 ; LICM_AFTER_REASSOCIATE-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
 ; LICM_AFTER_REASSOCIATE-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; LICM_AFTER_REASSOCIATE: for.body:
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_1:%.*]] = load double, ptr [[ARRAYIDX_J_1]], align 8
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[D1]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[FACTOR_OP_FMUL1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_2:%.*]] = load double, ptr [[ARRAYIDX_J]], align 8
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_2]], [[D2]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_2]], [[FACTOR_OP_FMUL]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_ADD:%.*]] = fadd fast double [[FMUL_2]], [[FMUL_1]]
-; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_MUL:%.*]] = fmul fast double [[REASS_ADD]], [[DELTA]]
-; LICM_AFTER_REASSOCIATE-NEXT: store double [[REASS_MUL]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT: store double [[REASS_ADD]], ptr [[ARRAYIDX_J]], align 8
 ; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND]]
 ; LICM_AFTER_REASSOCIATE: for.end:
 ; LICM_AFTER_REASSOCIATE-NEXT: ret void
 ;
 entry:
 %fmul.d1 = fmul fast double %d1, %delta
 %fmul.d2 = fmul fast double %d2, %delta
 br label %for.cond
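The updated expectations for @innermost_loop_2d_fast replace the in-loop multiplication by DELTA with two products computed in the entry block. A plain C++ rendering of the two loop shapes, based on the example in the revision summary, may make the equivalence easier to see. This is a sketch only: the function names factoredInLoop, factorsHoisted and nearlyEqual are assumptions made for illustration, and under fast-math the two forms are only expected to agree up to rounding, which is why a tolerance-based comparison is included.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Shape after "Reassociate expressions": delta is applied on every iteration.
// Requires cells to hold at least i + 2 elements.
std::vector<double> factoredInLoop(std::vector<double> cells, int i, double d1,
                                   double d2, double delta) {
  assert(i >= 0 && static_cast<std::size_t>(i) + 1 < cells.size());
  for (int j = 0; j <= i; j++)
    cells[j] = (d1 * cells[j + 1] + d2 * cells[j]) * delta;
  return cells;
}

// Shape after the reassociation is undone: both invariant products are
// computed once, before the loop.
std::vector<double> factorsHoisted(std::vector<double> cells, int i, double d1,
                                   double d2, double delta) {
  assert(i >= 0 && static_cast<std::size_t>(i) + 1 < cells.size());
  const double d1d = d1 * delta;
  const double d2d = d2 * delta;
  for (int j = 0; j <= i; j++)
    cells[j] = d1d * cells[j + 1] + d2d * cells[j];
  return cells;
}

// Element-wise comparison with a relative tolerance, since the two forms
// round differently in general.
bool nearlyEqual(const std::vector<double> &a, const std::vector<double> &b,
                 double relTol = 1e-12) {
  if (a.size() != b.size())
    return false;
  for (std::size_t k = 0; k < a.size(); ++k) {
    double scale = std::max({std::fabs(a[k]), std::fabs(b[k]), 1.0});
    if (std::fabs(a[k] - b[k]) > relTol * scale)
      return false;
  }
  return true;
}
```

The hoisted form saves one multiplication per iteration at the cost of one extra multiplication per invariant factor before the loop.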
[... 106 lines not shown ...]
 ; LICM_ONLY-NEXT: store double [[FADD_2]], ptr [[ARRAYIDX_J_2]], align 8
 ; LICM_ONLY-NEXT: br label [[FOR_COND]]
 ; LICM_ONLY: for.end:
 ; LICM_ONLY-NEXT: ret void
 ;
 ; LICM_AFTER_REASSOCIATE-LABEL: define void @innermost_loop_3d_fast
 ; LICM_AFTER_REASSOCIATE-SAME: (i32 [[I:%.*]], double [[D1:%.*]], double [[D2:%.*]], double [[D3:%.*]], double [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
 ; LICM_AFTER_REASSOCIATE-NEXT: entry:
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL:%.*]] = fmul fast double [[D3]], [[DELTA]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL2:%.*]] = fmul fast double [[D2]], [[DELTA]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL3:%.*]] = fmul fast double [[D1]], [[DELTA]]
 ; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND:%.*]]
 ; LICM_AFTER_REASSOCIATE: for.cond:
 ; LICM_AFTER_REASSOCIATE-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
 ; LICM_AFTER_REASSOCIATE-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; LICM_AFTER_REASSOCIATE: for.body:
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_1:%.*]] = load double, ptr [[ARRAYIDX_J_1]], align 8
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[D1]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[FACTOR_OP_FMUL3]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_2:%.*]] = load double, ptr [[ARRAYIDX_J]], align 8
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_2]], [[D2]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_2]], [[FACTOR_OP_FMUL2]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ADD_J_2:%.*]] = add nuw nsw i32 [[J]], 2
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J_2:%.*]] = zext i32 [[ADD_J_2]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J_2:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_2]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_3:%.*]] = load double, ptr [[ARRAYIDX_J_2]], align 8
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_3:%.*]] = fmul fast double [[CELL_3]], [[D3]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_3:%.*]] = fmul fast double [[CELL_3]], [[FACTOR_OP_FMUL]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_ADD:%.*]] = fadd fast double [[FMUL_2]], [[FMUL_1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_ADD1:%.*]] = fadd fast double [[REASS_ADD]], [[FMUL_3]]
-; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_MUL:%.*]] = fmul fast double [[REASS_ADD1]], [[DELTA]]
-; LICM_AFTER_REASSOCIATE-NEXT: store double [[REASS_MUL]], ptr [[ARRAYIDX_J_2]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT: store double [[REASS_ADD1]], ptr [[ARRAYIDX_J_2]], align 8
 ; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND]]
 ; LICM_AFTER_REASSOCIATE: for.end:
 ; LICM_AFTER_REASSOCIATE-NEXT: ret void
 ;
 entry:
 %fmul.d1 = fmul fast double %d1, %delta
 %fmul.d2 = fmul fast double %d2, %delta
 %fmul.d3 = fmul fast double %d3, %delta
[... 181 lines not shown ...]
 ; REASSOCIATE_ONLY-NEXT: store double [[REASS_MUL]], ptr [[ARRAYIDX_J]], align 8
 ; REASSOCIATE_ONLY-NEXT: br label [[FOR_COND]]
 ; REASSOCIATE_ONLY: for.end:
 ; REASSOCIATE_ONLY-NEXT: ret void
 ;
 ; LICM_ONLY-LABEL: define void @innermost_loop_2d_fast_reassociated
 ; LICM_ONLY-SAME: (i32 [[I:%.*]], double [[D1:%.*]], double [[D2:%.*]], double [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
 ; LICM_ONLY-NEXT: entry:
+; LICM_ONLY-NEXT: [[FACTOR_OP_FMUL:%.*]] = fmul fast double [[D2]], [[DELTA]]
+; LICM_ONLY-NEXT: [[FACTOR_OP_FMUL1:%.*]] = fmul fast double [[D1]], [[DELTA]]
 ; LICM_ONLY-NEXT: br label [[FOR_COND:%.*]]
 ; LICM_ONLY: for.cond:
 ; LICM_ONLY-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
 ; LICM_ONLY-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
 ; LICM_ONLY-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; LICM_ONLY: for.body:
 ; LICM_ONLY-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
 ; LICM_ONLY-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
 ; LICM_ONLY-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
 ; LICM_ONLY-NEXT: [[CELL_1:%.*]] = load double, ptr [[ARRAYIDX_J_1]], align 8
-; LICM_ONLY-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[D1]]
+; LICM_ONLY-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[FACTOR_OP_FMUL1]]
 ; LICM_ONLY-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
 ; LICM_ONLY-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J]]
 ; LICM_ONLY-NEXT: [[CELL_2:%.*]] = load double, ptr [[ARRAYIDX_J]], align 8
-; LICM_ONLY-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_2]], [[D2]]
+; LICM_ONLY-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_2]], [[FACTOR_OP_FMUL]]
 ; LICM_ONLY-NEXT: [[REASS_ADD:%.*]] = fadd fast double [[FMUL_2]], [[FMUL_1]]
-; LICM_ONLY-NEXT: [[REASS_MUL:%.*]] = fmul fast double [[REASS_ADD]], [[DELTA]]
-; LICM_ONLY-NEXT: store double [[REASS_MUL]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY-NEXT: store double [[REASS_ADD]], ptr [[ARRAYIDX_J]], align 8
 ; LICM_ONLY-NEXT: br label [[FOR_COND]]
 ; LICM_ONLY: for.end:
 ; LICM_ONLY-NEXT: ret void
 ;
 ; LICM_AFTER_REASSOCIATE-LABEL: define void @innermost_loop_2d_fast_reassociated
 ; LICM_AFTER_REASSOCIATE-SAME: (i32 [[I:%.*]], double [[D1:%.*]], double [[D2:%.*]], double [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
 ; LICM_AFTER_REASSOCIATE-NEXT: entry:
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL:%.*]] = fmul fast double [[D2]], [[DELTA]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL1:%.*]] = fmul fast double [[D1]], [[DELTA]]
 ; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND:%.*]]
 ; LICM_AFTER_REASSOCIATE: for.cond:
 ; LICM_AFTER_REASSOCIATE-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
 ; LICM_AFTER_REASSOCIATE-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; LICM_AFTER_REASSOCIATE: for.body:
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_1:%.*]] = load double, ptr [[ARRAYIDX_J_1]], align 8
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[D1]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[FACTOR_OP_FMUL1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_2:%.*]] = load double, ptr [[ARRAYIDX_J]], align 8
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_2]], [[D2]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_2]], [[FACTOR_OP_FMUL]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_ADD:%.*]] = fadd fast double [[FMUL_2]], [[FMUL_1]]
-; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_MUL:%.*]] = fmul fast double [[REASS_ADD]], [[DELTA]]
-; LICM_AFTER_REASSOCIATE-NEXT: store double [[REASS_MUL]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT: store double [[REASS_ADD]], ptr [[ARRAYIDX_J]], align 8
 ; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND]]
 ; LICM_AFTER_REASSOCIATE: for.end:
 ; LICM_AFTER_REASSOCIATE-NEXT: ret void
 ;
 entry:
 br label %for.cond

 for.cond:
[... 180 lines not shown ...]
 ; REASSOCIATE_ONLY-NEXT: store double [[REASS_MUL]], ptr [[ARRAYIDX_J]], align 8
 ; REASSOCIATE_ONLY-NEXT: br label [[FOR_COND]]
 ; REASSOCIATE_ONLY: for.end:
 ; REASSOCIATE_ONLY-NEXT: ret void
 ;
 ; LICM_ONLY-LABEL: define void @innermost_loop_3d_fast_reassociated_different
 ; LICM_ONLY-SAME: (i32 [[I:%.*]], double [[D1:%.*]], double [[D2:%.*]], double [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
 ; LICM_ONLY-NEXT: entry:
+; LICM_ONLY-NEXT: [[FACTOR_OP_FMUL1:%.*]] = fmul fast double [[D2]], [[DELTA]]
+; LICM_ONLY-NEXT: [[FACTOR_OP_FMUL2:%.*]] = fmul fast double [[D1]], [[DELTA]]
 ; LICM_ONLY-NEXT: br label [[FOR_COND:%.*]]
 ; LICM_ONLY: for.cond:
 ; LICM_ONLY-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
 ; LICM_ONLY-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
 ; LICM_ONLY-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; LICM_ONLY: for.body:
 ; LICM_ONLY-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
 ; LICM_ONLY-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
 ; LICM_ONLY-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
 ; LICM_ONLY-NEXT: [[CELL_1:%.*]] = load double, ptr [[ARRAYIDX_J_1]], align 8
 ; LICM_ONLY-NEXT: [[IDXPROM_J_2:%.*]] = zext i32 [[ADD_J_1]] to i64
 ; LICM_ONLY-NEXT: [[ARRAYIDX_J_2:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_2]]
 ; LICM_ONLY-NEXT: [[CELL_2:%.*]] = load double, ptr [[ARRAYIDX_J_2]], align 8
 ; LICM_ONLY-NEXT: [[CELL_3:%.*]] = load double, ptr [[ARRAYIDX_J_2]], align 8
 ; LICM_ONLY-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
 ; LICM_ONLY-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J]]
 ; LICM_ONLY-NEXT: [[CELL_4:%.*]] = load double, ptr [[ARRAYIDX_J]], align 8
-; LICM_ONLY-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[D1]]
-; LICM_ONLY-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_4]], [[D2]]
+; LICM_ONLY-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[FACTOR_OP_FMUL2]]
+; LICM_ONLY-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_4]], [[FACTOR_OP_FMUL1]]
 ; LICM_ONLY-NEXT: [[REASS_ADD:%.*]] = fadd fast double [[FMUL_2]], [[FMUL_1]]
-; LICM_ONLY-NEXT: [[EXTRA_MUL:%.*]] = fmul fast double [[CELL_3]], [[CELL_2]]
+; LICM_ONLY-NEXT: [[FACTOR_OP_FMUL:%.*]] = fmul fast double [[CELL_2]], [[DELTA]]
+; LICM_ONLY-NEXT: [[EXTRA_MUL:%.*]] = fmul fast double [[CELL_3]], [[FACTOR_OP_FMUL]]
 ; LICM_ONLY-NEXT: [[EXTRA_ADD:%.*]] = fadd fast double [[EXTRA_MUL]], [[REASS_ADD]]
-; LICM_ONLY-NEXT: [[REASS_MUL:%.*]] = fmul fast double [[EXTRA_ADD]], [[DELTA]]
-; LICM_ONLY-NEXT: store double [[REASS_MUL]], ptr [[ARRAYIDX_J]], align 8
+; LICM_ONLY-NEXT: store double [[EXTRA_ADD]], ptr [[ARRAYIDX_J]], align 8
 ; LICM_ONLY-NEXT: br label [[FOR_COND]]
 ; LICM_ONLY: for.end:
 ; LICM_ONLY-NEXT: ret void
 ;
 ; LICM_AFTER_REASSOCIATE-LABEL: define void @innermost_loop_3d_fast_reassociated_different
 ; LICM_AFTER_REASSOCIATE-SAME: (i32 [[I:%.*]], double [[D1:%.*]], double [[D2:%.*]], double [[DELTA:%.*]], ptr [[CELLS:%.*]]) {
 ; LICM_AFTER_REASSOCIATE-NEXT: entry:
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL:%.*]] = fmul fast double [[D2]], [[DELTA]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL2:%.*]] = fmul fast double [[D1]], [[DELTA]]
 ; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND:%.*]]
 ; LICM_AFTER_REASSOCIATE: for.cond:
 ; LICM_AFTER_REASSOCIATE-NEXT: [[J:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD_J_1:%.*]], [[FOR_BODY:%.*]] ]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CMP_NOT:%.*]] = icmp sgt i32 [[J]], [[I]]
 ; LICM_AFTER_REASSOCIATE-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; LICM_AFTER_REASSOCIATE: for.body:
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ADD_J_1]] = add nuw nsw i32 [[J]], 1
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J_1:%.*]] = zext i32 [[ADD_J_1]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J_1:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_1:%.*]] = load double, ptr [[ARRAYIDX_J_1]], align 8
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J_2:%.*]] = zext i32 [[ADD_J_1]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J_2:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J_2]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_2:%.*]] = load double, ptr [[ARRAYIDX_J_2]], align 8
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_3:%.*]] = load double, ptr [[ARRAYIDX_J_2]], align 8
 ; LICM_AFTER_REASSOCIATE-NEXT: [[IDXPROM_J:%.*]] = zext i32 [[J]] to i64
 ; LICM_AFTER_REASSOCIATE-NEXT: [[ARRAYIDX_J:%.*]] = getelementptr inbounds double, ptr [[CELLS]], i64 [[IDXPROM_J]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[CELL_4:%.*]] = load double, ptr [[ARRAYIDX_J]], align 8
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[D1]]
-; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_4]], [[D2]]
-; LICM_AFTER_REASSOCIATE-NEXT: [[EXTRA_MUL:%.*]] = fmul fast double [[CELL_3]], [[CELL_2]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_1:%.*]] = fmul fast double [[CELL_1]], [[FACTOR_OP_FMUL2]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FMUL_2:%.*]] = fmul fast double [[CELL_4]], [[FACTOR_OP_FMUL]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[FACTOR_OP_FMUL1:%.*]] = fmul fast double [[CELL_2]], [[DELTA]]
+; LICM_AFTER_REASSOCIATE-NEXT: [[EXTRA_MUL:%.*]] = fmul fast double [[CELL_3]], [[FACTOR_OP_FMUL1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_ADD:%.*]] = fadd fast double [[EXTRA_MUL]], [[FMUL_1]]
 ; LICM_AFTER_REASSOCIATE-NEXT: [[EXTRA_ADD:%.*]] = fadd fast double [[REASS_ADD]], [[FMUL_2]]
-; LICM_AFTER_REASSOCIATE-NEXT: [[REASS_MUL:%.*]] = fmul fast double [[EXTRA_ADD]], [[DELTA]]
-; LICM_AFTER_REASSOCIATE-NEXT: store double [[REASS_MUL]], ptr [[ARRAYIDX_J]], align 8
+; LICM_AFTER_REASSOCIATE-NEXT: store double [[EXTRA_ADD]], ptr [[ARRAYIDX_J]], align 8
 ; LICM_AFTER_REASSOCIATE-NEXT: br label [[FOR_COND]]
 ; LICM_AFTER_REASSOCIATE: for.end:
 ; LICM_AFTER_REASSOCIATE-NEXT: ret void
 ;
 entry:
 br label %for.cond

 for.cond:
[... 32 lines not shown ...]


Revision Contents

Diff 531108

llvm/lib/Transforms/Scalar/LICM.cpp

llvm/test/Transforms/LICM/expr-reassociate.ll
