This is an archive of the discontinued LLVM Phabricator instance.

Differential D117450

[LoopInterchange] Support loop interchange with floating point reductions
ClosedPublic

Authored by congzhe on Jan 16 2022, 8:50 PM.

Details

Reviewers

bmahjour
Whitney
Meinersbur

Group Reviewers

Restricted Project

Commits

rG1ef04326ec5f: [LoopInterchange] Support loop interchange with floating point reductions

Summary

Enabled loop interchange support for floating point reductions if fastmath is enabled.

Previously, when we encountered a floating point PHI node in the outer loop exit block, we bailed out because we could not detect floating point reductions. Now that floating point reductions can be detected, this limitation is removed.
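As an illustration of why the fast-math requirement matters, the following standalone sketch (not code from the patch; the function names and the 2x2 input are made up for this example) sums the same matrix with the original and the interchanged loop order. Interchange changes the order in which the reduction visits elements, and `float` addition is not associative, so the two orders can round differently. The sketch assumes IEEE 754 single precision and no fast-math compiler flags.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Matrix = std::array<std::array<float, 2>, 2>;

// Hypothetical input chosen so that rounding differs between the two orders:
// 1.0e8f + 1.0f rounds back to 1.0e8f in single precision.
Matrix exampleInput() {
  Matrix A{};
  A[0][0] = 1.0e8f;
  A[0][1] = 1.0f;
  A[1][0] = -1.0e8f;
  A[1][1] = 1.0f;
  return A;
}

// Original nest: outer loop over I, inner loop over J.
float sumOuterI(const Matrix &A) {
  volatile float Sum = 0.0f; // volatile keeps every partial sum rounded to float
  for (std::size_t I = 0; I < 2; ++I)
    for (std::size_t J = 0; J < 2; ++J)
      Sum = Sum + A[I][J];
  return Sum;
}

// Interchanged nest: outer loop over J, inner loop over I.
float sumOuterJ(const Matrix &A) {
  volatile float Sum = 0.0f;
  for (std::size_t J = 0; J < 2; ++J)
    for (std::size_t I = 0; I < 2; ++I)
      Sum = Sum + A[I][J];
  return Sum;
}

// Usage: the two visiting orders disagree on this input.
void checkInterchangeChangesSum() {
  assert(sumOuterI(exampleInput()) != sumOuterJ(exampleInput()));
}
```

Without flags that permit reassociation, a compiler has to preserve the first result, which is why the pass only interchanges such reductions when the instructions involved allow reordering.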

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

congzhe created this revision. Jan 16 2022, 8:50 PM

Herald added a subscriber: hiraditya. Jan 16 2022, 8:50 PM

congzhe requested review of this revision. Jan 16 2022, 8:50 PM

Herald added a project: Restricted Project. Jan 16 2022, 8:50 PM

Herald added a subscriber: llvm-commits.

congzhe edited the summary of this revision. Jan 16 2022, 8:51 PM

Harbormaster completed remote builds in B143714: Diff 400427. Jan 16 2022, 9:41 PM

Meinersbur added a subscriber: Meinersbur. Jan 17 2022, 6:49 AM

Meinersbur added inline comments.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
903	I couldn't find a reference for `unsafe-fp-math`, but I would have guessed that it forces all instructions to be fast-math, even if not marked as fast. If already marked as fast, then no additional tag `unsafe-fp-math` should be necessary.

Instead of `isFast`, would `allowReassoc` be sufficient (or whatever flag controls commutativity)?

The logic here will check only one fp instruction. What if two instructions are involved? Such as:

for (...)
  for (...) {
    sum += A[i];
    sum += B[j];
  }
print(sum);

congzhe updated this revision to Diff 401088. Jan 18 2022, 8:44 PM

congzhe added inline comments. Jan 18 2022, 9:04 PM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
903	Thanks a lot for the review! I updated the patch accordingly.

Instead of using `isFast()`, I now use `hasAllowReassoc()`. Instead of only checking the variable obtained from `followLCSSA(&PHI)`, I added a simple data flow analysis `areAllInstsReassoc()` that checks whether all instructions involved in the FP reduction allow reassociation. Hopefully this is sufficient for the two-fp-instruction case you provided (and cases where multiple fp instructions are involved).

Regarding "unsafe-fp-math" VS "fast": thank you and I see your point. And yes, I did see cases where the function has the "unsafe-fp-math" attribute but its instructions do not (please see example 1 below). Nevertheless I did also see test cases where instructions are marked "fast" but the surrounding function does not have the "unsafe-fp-math" attribute (please see example 2 below). It seems like there is some minor inconsistency between the attribute and the flag, so I just wanted to be conservatively correct by checking both: if we have the attribute then that's fine, otherwise we continue checking instruction flags. However, if you think the logic can be simplified I'll be glad to update the code.

Example 1 (clang/test/CodeGen/fp-options-to-fast-math-flags.c): Note that the `fast` flag is not generated even if compiled with `-ffast-math`.

float test(float a) {
  return a + fn(a);
}
// CHECK-FAST: [[CALL_RES:%.+]] = call reassoc nnan ninf nsz arcp afn float @fn(float noundef {{%.+}})
// CHECK-FAST: {{%.+}} = fadd reassoc nnan ninf nsz arcp afn float {{%.+}}, [[CALL_RES]]

Example 2 (`llvm/test/Transforms/InstSimplify/ConstProp/math-1.ll`):

define double @f_acos() {
; CHECK-LABEL: @f_acos(
; CHECK-NEXT: ret double 0.000000e+00
;
  %res = tail call fast double @acos(double 1.0)
  ret double %res
}
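One possible reading of Example 1 can be sketched with a toy bitmask model of the fast-math flags. The struct below is an illustration written for this note, not LLVM's `FastMathFlags` class, and it assumes the LangRef description that `fast` implies all of the other flags. Under that assumption, a flag list that omits `contract` is not `fast` as a whole, yet it still allows reassociation.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of per-instruction fast-math flags. The names follow the IR
// keywords; the type itself is hypothetical and only mirrors the idea.
struct ToyFMF {
  enum : std::uint8_t {
    Reassoc = 1 << 0,
    NNaN = 1 << 1,
    NInf = 1 << 2,
    NSZ = 1 << 3,
    Arcp = 1 << 4,
    Contract = 1 << 5,
    Afn = 1 << 6,
    All = (1 << 7) - 1
  };
  std::uint8_t Bits = 0;

  // True when the single reassociation bit is set.
  bool allowReassoc() const { return (Bits & Reassoc) != 0; }
  // True only when every flag is set.
  bool isFast() const { return Bits == All; }
};

// Flags as listed in Example 1: everything except `contract`.
ToyFMF exampleOneFlags() {
  ToyFMF F;
  F.Bits = static_cast<std::uint8_t>(ToyFMF::All & ~ToyFMF::Contract);
  return F;
}

// Flags of an instruction printed as `fast`.
ToyFMF fastFlags() {
  ToyFMF F;
  F.Bits = ToyFMF::All;
  return F;
}

// Usage: reassociation is allowed in both cases, but only one is "fast".
void checkToyFlags() {
  assert(exampleOneFlags().allowReassoc() && !exampleOneFlags().isFast());
  assert(fastFlags().allowReassoc() && fastFlags().isFast());
}
```

In this model a check based on `isFast()` would reject the Example 1 instructions, while a check based on the reassociation bit alone would accept them.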

Harbormaster completed remote builds in B144196: Diff 401088. Jan 18 2022, 9:37 PM

There is the function findInnerReductionPhi that calls RecurrenceDescriptor::isReductionPHI to determine whether an operation is a reduction and also already checks fast-math flags. Isn't that check already sufficient?

Instead of using isFast(), I now use hasAllowReassoc().

I am not a floating-point-expert, so I was asking which flags are actually required. I think at least "nsz" would be necessary as well because (+0) * (-0) is (+0) but (-0) * (+0) is (-0). See https://en.wikipedia.org/wiki/Signed_zero. Similar non-commutativity rules may apply to NaN/Inf. When in doubt, isFast should be the safe option.
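For reference, signed-zero behavior can be checked with a small standalone program. This sketch is not part of the patch; the helper names are invented, and it assumes IEEE 754 doubles, the default rounding mode, and no fast-math compiler flags. Under those assumptions the product of oppositely signed zeros is -0 in either operand order, so the sign-sensitive case for a reduction is addition: adding +0 to -0 yields +0. That is the kind of difference `nsz` lets the optimizer ignore.

```cpp
#include <cassert>
#include <cmath>

// True when X is a zero whose sign bit is set, that is, -0.0.
bool isNegativeZero(double X) { return X == 0.0 && std::signbit(X); }

// volatile operands discourage compile-time folding of the arithmetic.
double mulZeros(double A, double B) {
  volatile double L = A, R = B;
  return L * R;
}

double addZeros(double A, double B) {
  volatile double L = A, R = B;
  return L + R;
}

// Usage: the assertions record what IEEE 754 arithmetic produces.
void checkSignedZero() {
  assert(isNegativeZero(mulZeros(+0.0, -0.0)));  // (+0) * (-0) is -0
  assert(isNegativeZero(mulZeros(-0.0, +0.0)));  // (-0) * (+0) is -0
  assert(!isNegativeZero(addZeros(-0.0, +0.0))); // (-0) + (+0) is +0
  assert(isNegativeZero(addZeros(-0.0, -0.0)));  // (-0) + (-0) is -0
}
```

A transformation that introduces an extra +0 term into a sum, or regroups terms so that a -0 partial result meets a +0, can therefore change the sign of a zero result.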

Regarding "unsafe-fp-math" VS "fast":

IIRC, the per-operation flags have superseded the per-function unsafe-fp-math and the latter only used for compatibility. Please look up a reference or how other passes handle this. See e.g. https://lists.llvm.org/pipermail/llvm-dev/2012-October/054980.html. Unfortunately it doesn't mention how to handle unsafe-fp-math.

@Meinersbur Thank you for the comments, I've updated the patch accordingly.

I've made use of RecurrenceDescriptor::isReductionPHI, especially RecurrenceDescriptor::getExactFPMathInst(), to determine whether it is safe to reorder FP instructions. This is similar to what we've done in loop vectorization.

However, it seems that there is a bug in RecurrenceDescriptor, as it fails to detect cases where two fadd instructions are involved in the FP reduction and only one of them has the "fast" flag (please refer to test5 in reductions-across-inner-and-outer-loop.ll below). In this case RecurrenceDescriptor::getExactFPMathInst() is supposed to return the fadd instruction that does not have the "fast" flag, meaning it is unsafe to reorder them, but it returns NULL, meaning it determines that reordering is safe.
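The intended behavior described above can be sketched with a toy model. `ToyInst`, `firstExactFPInst`, and `canReorder` are hypothetical names invented for this illustration; they are not LLVM types or APIs, and the real RecurrenceDescriptor logic is more involved. The idea is only that a reduction chain is reorderable when no instruction in it demands exact FP math.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an FP instruction in a reduction chain.
struct ToyInst {
  std::string Name;
  bool AllowsReassoc;
};

// Returns the first instruction in the chain that forbids reassociation, or
// nullptr when every instruction allows it.
const ToyInst *firstExactFPInst(const std::vector<ToyInst> &Chain) {
  for (const ToyInst &I : Chain)
    if (!I.AllowsReassoc)
      return &I;
  return nullptr;
}

// The reduction may be reordered only if no instruction needs exact FP math.
bool canReorder(const std::vector<ToyInst> &Chain) {
  return firstExactFPInst(Chain) == nullptr;
}

// Usage: two fadds per iteration, once with both allowing reassociation and
// once with only the second one allowing it.
void checkChains() {
  std::vector<ToyInst> BothFast = {{"float.inner.inc", true},
                                   {"float.inner.inc.inc", true}};
  std::vector<ToyInst> OneStrict = {{"float.inner.inc", false},
                                    {"float.inner.inc.inc", true}};
  assert(canReorder(BothFast));
  assert(!canReorder(OneStrict));
  assert(firstExactFPInst(OneStrict)->Name == "float.inner.inc");
}
```

Checking only the last instruction of the chain would accept the second case, which is the failure mode the comment describes.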

I've changed IVDescriptor.cpp to fix the bug and temporarily included the change in this patch, but I'll post this part of change as another phabricator patch and rebase this patch on top.

Instead of using isFast(), I now use hasAllowReassoc().

Since we now use helper functions from IVDescriptor, this is less of an issue for now. Nevertheless in RecurrenceDescriptor::isRecurrenceInstr() it seems like they only check I->hasAllowReassoc().

Regarding "unsafe-fp-math" VS "fast":

It does look like "unsafe-fp-math" is deprecated and is not handled by any mid-end passes, so I removed the handling of "unsafe-fp-math".

The IVDescriptor patch was posted here:
https://reviews.llvm.org/D118073.

Hi Michael @Meinersbur, I just checked that if we remove the check

if (PHI.getType()->isFloatingPointTy() &&
        RedDesc[followLCSSA(&PHI)].getExactFPMathInst() != nullptr)

then test5() in reductions-across-inner-and-outer-loop.ll would fail, the loop would be interchanged although reordering should not be allowed.

As a follow-up to our discussion: I understand your point that RecurrenceDescriptor::isReductionPHI() is supposed to return true if FP reductions can be reordered, and false otherwise. Nevertheless as the above test case showed, it may not work as expected (please correct me if I'm mistaken) . If we take the loop vectorization pass as an example, the way it decides whether reductions can be reordered is as follows (in function LoopVectorizationLegality::canVectorizeInstrs()):

RecurrenceDescriptor RedDes;
if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes, DB, AC,
                                                 DT)) {
  Requirements->addExactFPMathInst(RedDes.getExactFPMathInst());
  AllowedExit.insert(RedDes.getLoopExitInstr());
  Reductions[Phi] = RedDes;
  continue;
}

Similarly they also used getExactFPMathInst() to get the 1st FP instruction that does not allow reordering, as an indication whether the FP reduction can be reordered, even if RecurrenceDescriptor::isReductionPHI() returns true. So probably the isReductionPHI() did not work as we expected. If you wish, I'll be glad to modify this API to make it return true if FP reductions can be reordered and false otherwise, and change the code in vectorization as well.

I'd very much appreciate it if you could let me know your comments.

Harbormaster completed remote builds in B145290: Diff 402606. Jan 26 2022, 11:00 AM

Rebased for now on my IVDescriptor patch https://reviews.llvm.org/D118073.

I applied the patch myself to check.

The other user of IVDescriptors is LoopVectorizationLegality:

if (RecurrenceDescriptor::isReductionPHI(Phi, TheLoop, RedDes, DB, AC,
                                         DT)) {
  Requirements->addExactFPMathInst(RedDes.getExactFPMathInst());
  AllowedExit.insert(RedDes.getLoopExitInstr());
  Reductions[Phi] = RedDes;
  continue;
}

addExactFPMathInst has this comment:

/// This holds vectorization requirements that must be verified late in
/// the process. The requirements are set by legalize and costmodel. Once
/// vectorization has been determined to be possible and profitable the
/// requirements can be verified by looking for metadata or compiler options.
/// For example, some loops require FP commutativity which is only allowed if
/// vectorization is explicitly specified or if the fast-math compiler option
/// has been provided.
/// Late evaluation of these requirements allows helpful diagnostics to be
/// composed that tells the user what need to be done to vectorize the loop. For
/// example, by specifying #pragma clang loop vectorize or -ffast-math. Late
/// evaluation should be used only when diagnostics can generated that can be
/// followed by a non-expert user.
class LoopVectorizationRequirements {
public:
  /// Track the 1st floating-point instruction that can not be reassociated.
  void addExactFPMathInst(Instruction *I);

I suggest doing the same in LoopInterchange. That is, RD.getFastMathFlags() contains requirements that still need to be checked.

Thanks Michael, this is what I described as well (I updated my previous reply, just in case you did not notice: https://reviews.llvm.org/D117450#3272927).

Are you suggesting that I change RedDesc[followLCSSA(&PHI)].getExactFPMathInst() != nullptr to something like !RedDesc[followLCSSA(&PHI)].getFastMathFlags().isFast() in areOuterLoopExitPHIsSupported()?

Harbormaster completed remote builds in B146366: Diff 404134. Jan 28 2022, 2:48 PM

In D117450#3280681, @congzhe wrote:

Thanks Michael, this is what I described as well (I updated my previous reply, just in case you did not notice: https://reviews.llvm.org/D117450#3272927).

Sorry, I missed that. Probably because I had the Phabricator page already loaded when you submitted the comment.

Are you suggesting that I change RedDesc[followLCSSA(&PHI)].getExactFPMathInst() != nullptr to something like !RedDesc[followLCSSA(&PHI)].getFastMathFlags().isFast() in areOuterLoopExitPHIsSupported()?

I suggest this in findInnerReductionPhi:

if (RecurrenceDescriptor::isReductionPHI(PHI, L, RD)) {
  if (RD.needsExactFPMath())
    return nullptr;
  return PHI;
}

It might be more fine-grained using RD.getFastMathFlags() and check against compiler flags (if RecurrenceDescriptor did not already take them into account). LoopVectorize has additional hints/compiler flags that act like -ffast-math just for vectorization.

congzhe updated this revision to Diff 405526. Feb 2 2022, 10:15 PM

In D117450#3281892, @Meinersbur wrote:
I suggest this in findInnerReductionPhi:
if (RecurrenceDescriptor::isReductionPHI(PHI, L, RD)) {
  if (RD.needsExactFPMath())
    return nullptr;
  return PHI;
}

Thanks, I've updated the patch accordingly. I used RD.getExactFPMathInst() instead, since needsExactFPMath() does not belong to RecurrenceDescriptor.

It might be more fine-grained using RD.getFastMathFlags() and check against compiler flags (if RecurrenceDescriptor did not already take them into account). LoopVectorize has additional hints/compiler flags that act like -ffast-math just for vectorization.

I do agree that it would be more fine-grained using RD.getFastMathFlags() and check against flags. I see SLP checks noNaNs() for FMax/Fmin, and loop vectorizer relies on getExactFPMathInst() and additional hints. I'm thinking maybe at the moment we could just rely on RD.getExactFPMathInst() to be safe, and we could make it finer-grained later. I'd appreciate it if you could let me know your thoughts.

Harbormaster completed remote builds in B147314: Diff 405526. Feb 2 2022, 10:42 PM

In D117450#3292701, @congzhe wrote:

Thanks, I've updated the patch accordingly. I used RD.getExactFPMathInst() instead, since needsExactFPMath() does not belong to RecurrenceDescriptor.

Correct, unfortunately. I found needsExactFPMath more descriptive. If you agree, you could add a definition just like InstDesc has.

I do agree that it would be more fine-grained using RD.getFastMathFlags() and check against flags. I see SLP checks noNaNs() for FMax/Fmin, and loop vectorizer relies on getExactFPMathInst() and additional hints. I'm thinking maybe at the moment we could just rely on RD.getExactFPMathInst() to be safe, and we could make it finer-grained later.

Agreed.

LGTM. Please address the nitpicks before committing.

Thank you for your contribution!

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
885	[nit] unrelated whitespace change
899–900	[style] In my interpretation of the coding standard, the braces should stay here. See the "Use braces for the outer `if` since the nested `for` is braced." example.

This revision is now accepted and ready to land. Feb 4 2022, 9:58 AM

Thanks for all the comments, I've addressed them and will land the patch shortly.

This revision was landed with ongoing or failed builds. Feb 6 2022, 2:09 PM

Closed by commit rG1ef04326ec5f: [LoopInterchange] Support loop interchange with floating point reductions (authored by congzhe).

This revision was automatically updated to reflect the committed changes.

congzhe added a commit: rG1ef04326ec5f: [LoopInterchange] Support loop interchange with floating point reductions.

Harbormaster completed remote builds in B147843: Diff 406279. Feb 6 2022, 2:38 PM

Revision Contents

Path                                                                            Size
llvm/lib/Transforms/Scalar/LoopInterchange.cpp                                  43 lines
llvm/test/Transforms/LoopInterchange/lcssa.ll                                   12 lines
llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll  80 lines

Diff 406281

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

@@ ... @@ static PHINode *findInnerReductionPhi(Loop *L, Value *V) {
   if (isa<Constant>(V))
     return nullptr;

   for (Value *User : V->users()) {
     if (PHINode *PHI = dyn_cast<PHINode>(User)) {
       if (PHI->getNumIncomingValues() == 1)
         continue;
       RecurrenceDescriptor RD;
-      if (RecurrenceDescriptor::isReductionPHI(PHI, L, RD))
+      if (RecurrenceDescriptor::isReductionPHI(PHI, L, RD)) {
+        // Detect floating point reduction only when it can be reordered.
+        if (RD.getExactFPMathInst() != nullptr)
+          return nullptr;
         return PHI;
+      }
       return nullptr;
     }
   }

   return nullptr;
 }

 bool LoopInterchangeLegality::findInductionAndReductions(
@@ ... @@ if (any_of(PHI.users(), [&Reductions, OuterL](User *U) {
           return !PN ||
                  (!Reductions.count(PN) && OuterL->contains(PN->getParent()));
         })) {
       return false;
     }
   }
   return true;
 }

 // We currently support LCSSA PHI nodes in the outer loop exit, if their
 // incoming values do not come from the outer loop latch or if the
 // outer loop latch has a single predecessor. In that case, the value will
 // be available if both the inner and outer loop conditions are true, which
 // will still be true after interchanging. If we have multiple predecessor,
 // that may not be the case, e.g. because the outer loop latch may be executed
 // if the inner loop is not executed.
 static bool areOuterLoopExitPHIsSupported(Loop *OuterLoop, Loop *InnerLoop) {
   BasicBlock *LoopNestExit = OuterLoop->getUniqueExitBlock();
   for (PHINode &PHI : LoopNestExit->phis()) {
-    // FIXME: We currently are not able to detect floating point reductions
-    //        and have to use floating point PHIs as a proxy to prevent
-    //        interchanging in the presence of floating point reductions.
-    if (PHI.getType()->isFloatingPointTy())
-      return false;
     for (unsigned i = 0; i < PHI.getNumIncomingValues(); i++) {
       Instruction *IncomingI = dyn_cast<Instruction>(PHI.getIncomingValue(i));
       if (!IncomingI || IncomingI->getParent() != OuterLoop->getLoopLatch())
         continue;

       // The incoming value is defined in the outer loop latch. Currently we
       // only support that in case the outer loop latch has a single predecessor.
       // This guarantees that the outer loop latch is executed if and only if
       // the inner loop is executed (because tightlyNested() guarantees that the
       // outer loop header only branches to the inner loop or the outer loop
       // latch).
       // FIXME: We could weaken this logic and allow multiple predecessors,
       //        if the values are produced outside the loop latch. We would need
       //        additional logic to update the PHI nodes in the exit block as
       //        well.
       if (OuterLoop->getLoopLatch()->getUniquePredecessor() == nullptr)
         return false;
     }
   }
   return true;
 }

 // In case of multi-level nested loops, it may occur that lcssa phis exist in
 // the latch of InnerLoop, i.e., when defs of the incoming values are further
 // inside the loopnest. Sometimes those incoming values are not available

llvm/test/Transforms/LoopInterchange/lcssa.ll

@@ ... @@ for.exit: ; preds = %outer.inc
   %iv.inner.lcssa.lcssa = phi i64 [ %iv.inner.lcssa, %outer.inc ]
   store i64 %iv.inner.lcssa.lcssa, i64* @Y
   br label %for.end16

 for.end16: ; preds = %for.exit
   ret void
 }

-; FIXME: We currently do not support LCSSA phi nodes involving floating point
-; types, as we fail to detect floating point reductions for now.
-; REMARK: UnsupportedPHIOuter
+; Loops with floating point reductions are interchanged with fastmath.
+; REMARK: Interchanged
 ; REMARK-NEXT: lcssa_04

 define void @lcssa_04() {
 entry:
   br label %outer.header

 outer.header: ; preds = %outer.inc, %entry
   %iv.outer = phi i64 [ 1, %entry ], [ %iv.outer.next, %outer.inc ]
-  %float.outer = phi float [ 1.000000e+00, %entry ], [ 2.000000e+00, %outer.inc ]
+  %float.outer = phi float [ 1.000000e+00, %entry ], [ %float.outer.next, %outer.inc ]
   br label %for.body3

 for.body3: ; preds = %for.body3, %outer.header
   %iv.inner = phi i64 [ %iv.inner.next, %for.body3 ], [ 1, %outer.header ]
+  %float.inner = phi float [ %float.inner.next, %for.body3 ], [ %float.outer, %outer.header ]
   %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %iv.inner, i64 %iv.outer
   %vA = load i32, i32* %arrayidx5
   %arrayidx9 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @C, i64 0, i64 %iv.inner, i64 %iv.outer
   %vC = load i32, i32* %arrayidx9
   %add = add nsw i32 %vA, %vC
+  %float.inner.next = fadd fast float %float.inner, 1.000000e+00
   store i32 %add, i32* %arrayidx5
   %iv.inner.next = add nuw nsw i64 %iv.inner, 1
   %exitcond = icmp eq i64 %iv.inner.next, 100
   br i1 %exitcond, label %outer.inc, label %for.body3

 outer.inc: ; preds = %for.body3
+  %float.outer.next = phi float [ %float.inner.next, %for.body3 ]
   %iv.outer.next = add nsw i64 %iv.outer, 1
   %cmp = icmp eq i64 %iv.outer.next, 100
   br i1 %cmp, label %outer.header, label %for.exit

 for.exit: ; preds = %outer.inc
-  %float.outer.lcssa = phi float [ %float.outer, %outer.inc ]
+  %float.outer.lcssa = phi float [ %float.outer.next, %outer.inc ]
   store float %float.outer.lcssa, float* @F
   br label %for.end16

 for.end16: ; preds = %for.exit
   ret void
 }

 ; PHI node in inner latch with multiple predecessors.

llvm/test/Transforms/LoopInterchange/reductions-across-inner-and-outer-loop.ll

@@ ... @@ for1.inc: ; preds = %for2
   %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1
   %exit2 = icmp eq i64 %indvars.iv.next24, 100
   br i1 %exit2, label %for1.loopexit, label %for1.header

 for1.loopexit: ; preds = %for1.inc
   %il.res.lcssa2 = phi i64 [ %sum.inc.amend, %for1.inc ]
   ret i64 %il.res.lcssa2
 }

+; Floating point reductions are interchanged if all the fp instructions
+; involved allow reassociation.
+; REMARKS: --- !Passed
+; REMARKS-NEXT: Pass: loop-interchange
+; REMARKS-NEXT: Name: Interchanged
+; REMARKS-NEXT: Function: test5
+
+define float @test5([100 x [100 x float]]* %Arr, [100 x [100 x float]]* %Arr2) {
+entry:
+  br label %outer.header
+
+outer.header: ; preds = %outer.inc, %entry
+  %iv.outer = phi i64 [ 1, %entry ], [ %iv.outer.next, %outer.inc ]
+  %float.outer = phi float [ 1.000000e+00, %entry ], [ %float.inner.lcssa, %outer.inc ]
+  br label %for.body3
+
+for.body3: ; preds = %for.body3, %outer.header
+  %float.inner = phi float [ %float.outer , %outer.header ], [ %float.inner.inc.inc, %for.body3 ]
+  %iv.inner = phi i64 [ %iv.inner.next, %for.body3 ], [ 1, %outer.header ]
+  %arrayidx5 = getelementptr inbounds [100 x [100 x float]], [100 x [100 x float]]* %Arr, i64 0, i64 %iv.inner, i64 %iv.outer
+  %vA = load float, float* %arrayidx5
+  %float.inner.inc = fadd fast float %float.inner, %vA
+  %arrayidx6 = getelementptr inbounds [100 x [100 x float]], [100 x [100 x float]]* %Arr2, i64 0, i64 %iv.inner, i64 %iv.outer
+  %vB = load float, float* %arrayidx6
+  %float.inner.inc.inc = fadd fast float %float.inner.inc, %vB
+  %iv.inner.next = add nuw nsw i64 %iv.inner, 1
+  %exitcond = icmp eq i64 %iv.inner.next, 100
+  br i1 %exitcond, label %outer.inc, label %for.body3
+
+outer.inc: ; preds = %for.body3
+  %float.inner.lcssa = phi float [ %float.inner.inc.inc, %for.body3 ]
+  %iv.outer.next = add nsw i64 %iv.outer, 1
+  %cmp = icmp eq i64 %iv.outer.next, 100
+  br i1 %cmp, label %outer.header, label %for.exit
+
+for.exit: ; preds = %outer.inc
+  %float.outer.lcssa = phi float [ %float.inner.lcssa, %outer.inc ]
+  ret float %float.outer.lcssa
+}
+
+; Floating point reductions are not interchanged if not all the fp instructions
+; involved allow reassociation.
+; REMARKS: --- !Missed
+; REMARKS-NEXT: Pass: loop-interchange
+; REMARKS-NEXT: Name: UnsupportedPHIOuter
+; REMARKS-NEXT: Function: test6
+
+define float @test6([100 x [100 x float]]* %Arr, [100 x [100 x float]]* %Arr2) {
+entry:
+  br label %outer.header
+
+outer.header: ; preds = %outer.inc, %entry
+  %iv.outer = phi i64 [ 1, %entry ], [ %iv.outer.next, %outer.inc ]
+  %float.outer = phi float [ 1.000000e+00, %entry ], [ %float.inner.lcssa, %outer.inc ]
+  br label %for.body3
+
+for.body3: ; preds = %for.body3, %outer.header
+  %float.inner = phi float [ %float.outer , %outer.header ], [ %float.inner.inc.inc, %for.body3 ]
+  %iv.inner = phi i64 [ %iv.inner.next, %for.body3 ], [ 1, %outer.header ]
+  %arrayidx5 = getelementptr inbounds [100 x [100 x float]], [100 x [100 x float]]* %Arr, i64 0, i64 %iv.inner, i64 %iv.outer
+  %vA = load float, float* %arrayidx5
+  %float.inner.inc = fadd float %float.inner, %vA ; do not allow reassociation
+  %arrayidx6 = getelementptr inbounds [100 x [100 x float]], [100 x [100 x float]]* %Arr2, i64 0, i64 %iv.inner, i64 %iv.outer
+  %vB = load float, float* %arrayidx6
+  %float.inner.inc.inc = fadd fast float %float.inner.inc, %vB
+  %iv.inner.next = add nuw nsw i64 %iv.inner, 1
+  %exitcond = icmp eq i64 %iv.inner.next, 100
+  br i1 %exitcond, label %outer.inc, label %for.body3
+
+outer.inc: ; preds = %for.body3
+  %float.inner.lcssa = phi float [ %float.inner.inc.inc, %for.body3 ]
+  %iv.outer.next = add nsw i64 %iv.outer, 1
+  %cmp = icmp eq i64 %iv.outer.next, 100
+  br i1 %cmp, label %outer.header, label %for.exit
+
+for.exit: ; preds = %outer.inc
+  %float.outer.lcssa = phi float [ %float.inner.lcssa, %outer.inc ]
+  ret float %float.outer.lcssa
+}