This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Avoid sinking fdiv into a loop
Abandoned · Public

Authored by david-arm on Feb 14 2023, 2:06 PM.

Details

Reviewers

craig.topper
sdesmalen
nikic
efriedma
spatel

Summary

This is an alternative to D87479 that uses LoopInfo when available
to more precisely target the problematic transform. If we
know the fmul lives in a loop then we will only sink the fdiv
if it's not loop invariant. Otherwise, if we don't have any
loop information, or the fmul is not in a loop we only combine
the fmul and fdiv if they are in the same block. Allowing the
transform early in the optimization pipeline (because there
may not be any LoopInfo yet) doesn't seem like a bad thing.

As discussed in D143631, there's a larger issue between
reassociation/combining/LICM because this doesn't solve a more
general problem that we can show in an example with no fdivs:
https://godbolt.org/z/xrs9xfGrf

...but this does manage to avoid the problem in the motivating
test for D143631.

Co-authored-by: Sanjay Patel <spatel@rotateright.com>
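
To illustrate the cost concern behind this change, here is a minimal standalone C++ sketch. It is not LLVM code: `DivCounter`, `divisionsWhenHoisted` and `divisionsWhenSunk` are hypothetical names introduced only for this illustration. It counts how many divisions execute when the reciprocal stays outside the loop versus when the fdiv is combined with the per-iteration fmul.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: performs a division and records that it happened.
struct DivCounter {
  std::size_t Count = 0;
  float div(float X, float Y) {
    ++Count;
    return X / Y;
  }
};

// Hoisted form: 1/b is computed once before the loop, and the loop body
// only multiplies. This is the shape LICM would like to keep.
inline std::size_t divisionsWhenHoisted(std::vector<float> &A, float B) {
  DivCounter DC;
  const float Recip = DC.div(1.0f, B);
  for (float &V : A)
    V = V * Recip;
  return DC.Count;
}

// Sunk form: (1 / b) * a[i] has been rewritten to a[i] / b, so one
// division runs per loop iteration.
inline std::size_t divisionsWhenSunk(std::vector<float> &A, float B) {
  DivCounter DC;
  for (float &V : A)
    V = DC.div(V, B);
  return DC.Count;
}
```

For a 1024-element array the hoisted form performs one division and the sunk form performs 1024, which is the kind of regression the patch tries to avoid.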

Diff Detail

Event Timeline

spatel created this revision. Feb 14 2023, 2:06 PM

Herald added a project: Restricted Project. Feb 14 2023, 2:06 PM

Herald added subscribers: ormris, StephenFan, asbirlea and 4 others.

spatel requested review of this revision. Feb 14 2023, 2:06 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B213734: Diff 497429. Feb 14 2023, 2:07 PM

Before we do this, we should change InstCombine to accept a pass parameter that determines whether LoopInfo is used or not. Currently, we will use it if it happens to be cached, which is very fragile under the new pass manager. I think the particular motivating case will work because InstCombine is preceded by an AlignmentFromAssumptions pass, which requires SCEV, which requires LoopInfo, and which preserves CFG. But in other pipeline positions we will randomly either use or not use LoopInfo, depending on whether passes in between happened to make a change or not.

Instead we should specify whether each specific InstCombine invocation should be using LoopInfo or not (probably based on guaranteed availability at that pipeline position).
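
The fragility described above can be sketched with a toy model. This is a standalone illustration under stated assumptions, not the LLVM pass manager API: `ToyAnalysisCache` and every function name below are hypothetical. It assumes an intervening pass drops the cached LoopInfo only when it reports a change, so an opportunistic "use it if cached" policy depends on what earlier passes happened to do, while an explicit per-invocation option does not.

```cpp
#include <cassert>

// Toy stand-in for cached analysis state (hypothetical, not LLVM's API).
struct ToyAnalysisCache {
  bool LoopInfoCached = false;
};

// A pass that requires LoopInfo computes it and leaves it cached.
inline void runPassRequiringLoopInfo(ToyAnalysisCache &C) {
  C.LoopInfoCached = true;
}

// An intervening pass that does not preserve LoopInfo: the cached result
// is dropped only if the pass actually changed the IR.
inline void runInterveningPass(ToyAnalysisCache &C, bool MadeChange) {
  if (MadeChange)
    C.LoopInfoCached = false;
}

// Opportunistic policy: use LoopInfo only if it happens to be cached.
inline bool usesLoopInfoOpportunistic(const ToyAnalysisCache &C) {
  return C.LoopInfoCached;
}

// Explicit policy: a per-invocation option decides. If it is set, the
// analysis is computed on demand, so cache history no longer matters.
inline bool usesLoopInfoExplicit(ToyAnalysisCache &C, bool UseLoopInfoOption) {
  if (UseLoopInfoOption)
    C.LoopInfoCached = true;
  return UseLoopInfoOption;
}
```

In this model, the opportunistic result flips depending on whether the intervening pass made a change, whereas the explicit result is determined solely by the option.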

This revision now requires changes to proceed. Feb 14 2023, 11:35 PM

spatel mentioned this in D144199: [InstCombine] create and use a pass options container. Feb 16 2023, 9:03 AM

spatel mentioned this in rG4ecc6af813e2: [InstCombine] create a pass options container and add "use-loop-info" argument. Feb 17 2023, 7:30 AM

spatel mentioned this in D144274: [InstCombine] use loop info when running the pass after loop vectorization. Feb 17 2023, 9:01 AM

spatel mentioned this in rG43ae4b62b267: [InstCombine] use loop info when running the pass after loop vectorization. Mar 11 2023, 11:22 AM

I have spoken with @spatel who said he is unlikely to have much time to progress this patch and he's happy for me to commandeer it. I would like to make progress on this because it is an important fix on AArch64 for the SPEC2017 benchmark parest.

@nikic do you have any thoughts about whether it's worth re-opening D144274 and potentially using the LoopInfo in more places to fix the compile-time issue you saw?

Hi @david-arm, I can see the value of adding LoopInfo and using that to avoid reversing some of LICM's decisions, but I believe this patch makes the decision the wrong way around.
Rather than making the combine unless it can prove that doing so reverses LICM, I'd rather see InstCombine not make the combine unless it can prove that it's not reversing LICM.

To clarify, at the moment InstCombine has the following behaviour:

(A) Combine the reciprocal + fmul => fdiv always

With this patch, the behaviour changes when LoopInfo is available:

(B) Combine the reciprocal + fmul => fdiv only if the reciprocal has not been hoisted (requires LoopInfo)

We could add a new, more conservative mode where InstCombine does the following:

(C) Combine the reciprocal + fmul => fdiv only if these live in the same basic block.

Then we can have the following behaviours:

1. Default:
   - If LoopInfo is not available: (C)
   - If LoopInfo is cached: (B)
   - With this new default, it's no longer possible for InstCombine and LICM to conflict, because InstCombine is by default more conservative. This means that having forgotten to enable LoopInfo (when another instance of InstCombine is added to the pipeline) can't lead to regressions for this case.
2. Explicit option to force canonical form regardless of LoopInfo: (A)
   - This way it's still possible to run InstCombine in the mode it currently runs.
   - InstCombine can be forced to run in this mode earlier on in the pipeline before LICM has run.
3. Explicit option to force availability of LoopInfo (i.e. this patch as it is now): (B)
   - I'm not sure how useful this option is in practice, but at least it allows InstCombine to always make the most informed decision, regardless of LoopInfo being intact from a previous pass.

What do you think?
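
The three behaviours and the proposed default can be sketched as a small standalone C++ model. This is only one reading of the proposal, not code from the patch: `CombineMode`, `FMulFDivSite`, `shouldCombine` and `defaultMode` are hypothetical names, and (B) is modelled as "combine unless the fmul is in a loop and the fdiv is invariant in that loop".

```cpp
#include <cassert>

// Hypothetical names for the behaviours labelled (A), (B) and (C) above.
enum class CombineMode {
  Always,     // (A) always form the fdiv
  NotHoisted, // (B) only if the reciprocal has not been hoisted
  SameBlock   // (C) only if fmul and fdiv share a basic block
};

// Hypothetical summary of the facts about one fmul/fdiv pair.
struct FMulFDivSite {
  bool FMulInLoop;        // the fmul lives inside some loop
  bool FDivLoopInvariant; // the fdiv is invariant in that loop
  bool SameBlock;         // fmul and fdiv are in the same basic block
};

inline bool shouldCombine(CombineMode M, const FMulFDivSite &S) {
  switch (M) {
  case CombineMode::Always:
    return true;
  case CombineMode::NotHoisted:
    return !(S.FMulInLoop && S.FDivLoopInvariant);
  case CombineMode::SameBlock:
    return S.SameBlock;
  }
  return false;
}

// Proposed default: (B) when LoopInfo is available, otherwise (C).
inline CombineMode defaultMode(bool LoopInfoAvailable) {
  return LoopInfoAvailable ? CombineMode::NotHoisted : CombineMode::SameBlock;
}
```

Under this model a hoisted reciprocal is left alone by both default modes, and only the explicit (A) mode would sink it back into the loop.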

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
663–669	nit: there is little value in creating a lambda and using it only once?

david-arm mentioned this in D130618: [AArch64][LoopVectorize] Enable tail-folding of simple loops on neoverse-v1. Apr 24 2023, 8:32 AM

Matt added a subscriber: Matt. Apr 24 2023, 2:06 PM

Address review comments by only applying the transformation when either a) we know the fmul lives in a loop and the fdiv is not loop invariant, or b) we know the fmul and fdiv live in the same block.

david-arm marked an inline comment as done. May 12 2023, 2:47 AM

Did you do any performance runs to check if there were regressions? If so, we'd need to change some of the InstCombine instances to run in a different mode (suggestion (2)) by explicitly forcing the canonical form regardless of LoopInfo. If not, I'm not sure if such an option is still worth adding.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
663–669	Is this equivalent to: bool ShouldSink = L ? !L->isLoopInvariant(FDiv) : cast<Instruction>(FDiv)->getParent() == I.getParent(); ?
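
The single-expression form asked about here can be exercised outside LLVM with toy stand-ins. The sketch below is an assumption-laden model, not the patch itself: `ToyBlock`, `ToyInst`, `ToyLoop` and `shouldSink` are hypothetical, and `ToyLoop::isLoopInvariant` simply treats an instruction as invariant when its parent block is not part of the loop.

```cpp
#include <cassert>
#include <set>

// Toy stand-ins for basic blocks, instructions and loops (hypothetical).
struct ToyBlock {};

struct ToyInst {
  const ToyBlock *Parent;
};

struct ToyLoop {
  std::set<const ToyBlock *> Blocks;
  bool contains(const ToyBlock *B) const { return Blocks.count(B) != 0; }
  // An instruction defined outside the loop is invariant with respect to it.
  bool isLoopInvariant(const ToyInst &I) const { return !contains(I.Parent); }
};

// L is the loop containing the fmul, or nullptr if there is none (or no
// loop information is available). Mirrors the ternary quoted above.
inline bool shouldSink(const ToyLoop *L, const ToyInst &FDiv,
                       const ToyInst &FMul) {
  return L ? !L->isLoopInvariant(FDiv) : FDiv.Parent == FMul.Parent;
}
```

With the fdiv in the entry block and the fmul in the loop body this returns false, and with both in one block and no loop it returns true, matching what the `fmul_loop_invariant_fdiv` and `fmul_fdiv_same_block` tests in this revision expect.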

Harbormaster completed remote builds in B231555: Diff 521590. May 12 2023, 3:30 AM

Refactored logic.

In D144045#4337318, @sdesmalen wrote:

Did you do any performance runs to check if there were regressions? If so, we'd need to change some of the InstCombine instances to run in a different mode (suggestion (2)) by explicitly forcing the canonical form regardless of LoopInfo. If not, I'm not sure if such an option is still worth adding.

I didn't see any change in performance when running SPEC2017 on neoverse-v1 and on an X86 machine. On neoverse-v1 I also ran some additional HPC benchmarks and some LLVM test suite workloads without seeing any regressions.

Thanks for the refactor and collecting some benchmark stats @david-arm!

The patch LGTM. @nikic are you happy to accept it as well?

Harbormaster completed remote builds in B231595: Diff 521641. May 12 2023, 8:33 AM

My general inclination would be not to do this -- I spent some time looking into it, and I think we would be better off dropping the LoopInfo dependency from InstCombine. Unless we actually want to make it a hard dependency, this is always going to be fragile. It also only addresses one pretty specific case (though an important one, of course), while InstCombine potentially sinking instructions into loops is a generic problem.

My preference for solving this would be to enable the final LICM run for the full LTO pipeline instead. I hate to do this because it has significant compile-time impact (I've been working on reducing LICM compile-time, and it is now much cheaper than when this was first discussed years ago, but it's still fairly expensive). But I think this is the right thing to do despite that. My longer-term goal would be to unify the thin LTO and full LTO optimization pipelines to the degree it is possible, and adding that final LICM run would be required for that as well. As long as we have the pipeline differences, we're going to see these kinds of optimization failures in full LTO, simply because the major production users all use thin LTO and full LTO nowadays tends to be an afterthought only.

david-arm mentioned this in D143631: [LTO] Ensure LICM hoists expensive fdiv instructions introduced by InstCombine. Jun 29 2023, 6:54 AM

david-arm abandoned this revision. Jul 10 2023, 2:36 AM

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp	26 lines
llvm/test/Transforms/InstCombine/fmul.ll	22 lines
llvm/test/Transforms/PhaseOrdering/X86/vdiv-nounroll.ll	12 lines
llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll	122 lines
llvm/test/Transforms/PhaseOrdering/lto-licm.ll	7 lines

Diff 521641

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

(9 lines not shown)
// srem, urem, frem.		// srem, urem, frem.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InstCombineInternal.h"		#include "InstCombineInternal.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
(616 lines not shown)	if (match(Op1, m_Constant(C)) && C->isFiniteNonZeroFP()) {
if (Constant *CC1 = ConstantFoldBinaryOpOperands(		if (Constant *CC1 = ConstantFoldBinaryOpOperands(
Instruction::FMul, C, C1, DL)) {		Instruction::FMul, C, C1, DL)) {
Value *XC = Builder.CreateFMulFMF(X, C, &I);		Value *XC = Builder.CreateFMulFMF(X, C, &I);
return BinaryOperator::CreateFSubFMF(CC1, XC, &I);		return BinaryOperator::CreateFSubFMF(CC1, XC, &I);
}		}
}		}
}		}

		// Sink division: (X / Y) * Z --> (X * Z) / Y
		Value *FDiv;
Value *Z;		Value *Z;
if (match(&I, m_c_FMul(m_OneUse(m_FDiv(m_Value(X), m_Value(Y))),		if (match(&I,
		m_c_FMul(m_CombineAnd(m_Value(FDiv),
		m_OneUse(m_FDiv(m_Value(X), m_Value(Y)))),
m_Value(Z)))) {		m_Value(Z)))) {
// Sink division: (X / Y) * Z --> (X * Z) / Y		// If we know the fmul lives in a loop then only sink the fdiv if we can
		// prove it isn't loop invariant. We'd like to avoid putting an expensive
		// math op into a loop that it doesn't need to be in.
		// Otherwise, we only attempt to combine the fdiv and fmul if we know they
		// live in the same block.
		Loop *L = LI ? LI->getLoopFor(I.getParent()) : nullptr;
		// The fdiv should always be an instruction so the cast is safe.
		bool ShouldSink =
		L ? !L->isLoopInvariant(FDiv)
		: cast<Instruction>(FDiv)->getParent() == I.getParent();
		if (ShouldSink) {
Value *NewFMul = Builder.CreateFMulFMF(X, Z, &I);		Value *NewFMul = Builder.CreateFMulFMF(X, Z, &I);
return BinaryOperator::CreateFDivFMF(NewFMul, Y, &I);		return BinaryOperator::CreateFDivFMF(NewFMul, Y, &I);
}		}
		}

// sqrt(X) * sqrt(Y) -> sqrt(X * Y)		// sqrt(X) * sqrt(Y) -> sqrt(X * Y)
// nnan disallows the possibility of returning a number if both operands are		// nnan disallows the possibility of returning a number if both operands are
// negative (in that case, we should return NaN).		// negative (in that case, we should return NaN).
if (I.hasNoNaNs() && match(Op0, m_OneUse(m_Sqrt(m_Value(X)))) &&		if (I.hasNoNaNs() && match(Op0, m_OneUse(m_Sqrt(m_Value(X)))) &&
match(Op1, m_OneUse(m_Sqrt(m_Value(Y))))) {		match(Op1, m_OneUse(m_Sqrt(m_Value(Y))))) {
Value *XY = Builder.CreateFMulFMF(X, Y, &I);		Value *XY = Builder.CreateFMulFMF(X, Y, &I);
Value *Sqrt = Builder.CreateUnaryIntrinsic(Intrinsic::sqrt, XY, &I);		Value *Sqrt = Builder.CreateUnaryIntrinsic(Intrinsic::sqrt, XY, &I);
(1,364 lines not shown)

llvm/test/Transforms/InstCombine/fmul.ll

	(1,045 lines not shown)
	; CHECK-NEXT: ret float [[MUL]]			; CHECK-NEXT: ret float [[MUL]]
	;			;
	%div = fdiv float %x, 42.0			%div = fdiv float %x, 42.0
	call void @use_f32(float %div)			call void @use_f32(float %div)
	%mul = fmul reassoc float %div, %y			%mul = fmul reassoc float %div, %y
	ret float %mul			ret float %mul
	}			}

				; In this case the fdiv doesn't get sunk into the loop because
				; the fmul and fdiv live in different blocks.
	define void @fmul_loop_invariant_fdiv(float* %a, float %x) {			define void @fmul_loop_invariant_fdiv(float* %a, float %x) {
	; CHECK-LABEL: @fmul_loop_invariant_fdiv(			; CHECK-LABEL: @fmul_loop_invariant_fdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[D:%.*]] = fdiv fast float 1.000000e+00, [[X:%.*]]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[I_08:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[I_08]] to i64			; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[I_08]] to i64
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[IDXPROM]]
	; CHECK-NEXT: [[F:%.*]] = load float, ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: [[F:%.*]] = load float, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[M:%.*]] = fdiv fast float [[F]], [[X:%.*]]			; CHECK-NEXT: [[M:%.*]] = fmul fast float [[F]], [[D]]
	; CHECK-NEXT: store float [[M]], ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: store float [[M]], ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_08]], 1
	; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i32 [[INC]], 1024			; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i32 [[INC]], 1024
	; CHECK-NEXT: br i1 [[CMP_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[CMP_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	;			;
	entry:			entry:
	%d = fdiv fast float 1.0, %x			%d = fdiv fast float 1.0, %x
	br label %for.body			br label %for.body

	for.cond.cleanup:			for.cond.cleanup:
	ret void			ret void

	for.body:			for.body:
	%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	%idxprom = zext i32 %i.08 to i64			%idxprom = zext i32 %i.08 to i64
	%arrayidx = getelementptr inbounds float, float* %a, i64 %idxprom			%arrayidx = getelementptr inbounds float, float* %a, i64 %idxprom
	%f = load float, float* %arrayidx, align 4			%f = load float, float* %arrayidx, align 4
	%m = fmul fast float %f, %d			%m = fmul fast float %f, %d
	store float %m, float* %arrayidx, align 4			store float %m, float* %arrayidx, align 4
	%inc = add nuw nsw i32 %i.08, 1			%inc = add nuw nsw i32 %i.08, 1
	%cmp.not = icmp eq i32 %inc, 1024			%cmp.not = icmp eq i32 %inc, 1024
	br i1 %cmp.not, label %for.cond.cleanup, label %for.body			br i1 %cmp.not, label %for.cond.cleanup, label %for.body
	}			}

				define void @fmul_fdiv_same_block(float* %a, float %x) {
				; CHECK-LABEL: @fmul_fdiv_same_block(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[F:%.*]] = load float, ptr [[A:%.*]], align 4
				; CHECK-NEXT: [[M:%.*]] = fdiv fast float [[F]], [[X:%.*]]
				; CHECK-NEXT: store float [[M]], ptr [[A]], align 4
				; CHECK-NEXT: ret void
				;
				entry:
				%d = fdiv fast float 1.0, %x
				%arrayidx = getelementptr inbounds float, float* %a, i64 0
				%f = load float, float* %arrayidx, align 4
				%m = fmul fast float %f, %d
				store float %m, float* %arrayidx, align 4
				ret void
				}

	; Avoid infinite looping by moving negation out of a constant expression.			; Avoid infinite looping by moving negation out of a constant expression.

	@g = external global {[2 x ptr]}, align 1			@g = external global {[2 x ptr]}, align 1

	define double @fmul_negated_constant_expression(double %x) {			define double @fmul_negated_constant_expression(double %x) {
	; CHECK-LABEL: @fmul_negated_constant_expression(			; CHECK-LABEL: @fmul_negated_constant_expression(
	; CHECK-NEXT: [[FSUB:%.*]] = fneg double bitcast (i64 ptrtoint (ptr getelementptr inbounds ({ [2 x ptr] }, ptr @g, i64 0, inrange i32 0, i64 2) to i64) to double)			; CHECK-NEXT: [[FSUB:%.*]] = fneg double bitcast (i64 ptrtoint (ptr getelementptr inbounds ({ [2 x ptr] }, ptr @g, i64 0, inrange i32 0, i64 2) to i64) to double)
	; CHECK-NEXT: [[R:%.*]] = fmul double [[FSUB]], [[X:%.*]]			; CHECK-NEXT: [[R:%.*]] = fmul double [[FSUB]], [[X:%.*]]
	(183 lines not shown)

llvm/test/Transforms/PhaseOrdering/X86/vdiv-nounroll.ll

	(11 lines not shown)
	; void vdiv(ptr a, float b) {			; void vdiv(ptr a, float b) {
	; for (int i = 0; i != 1024; ++i)			; for (int i = 0; i != 1024; ++i)
	; a[i] /= b;			; a[i] /= b;
	; }			; }

	define void @vdiv(ptr %a, float %b) #0 {			define void @vdiv(ptr %a, float %b) #0 {
	; CHECK-LABEL: @vdiv(			; CHECK-LABEL: @vdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[B:%.*]], i64 0			; CHECK-NEXT: [[TMP0:%.*]] = fdiv fast float 1.000000e+00, [[B:%.*]]
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float> [[BROADCAST_SPLATINSERT]], <4 x float> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP0:%.*]] = fdiv fast <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, [[BROADCAST_SPLAT]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[INDEX]]
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP1]], align 4, !tbaa [[TBAA3:![0-9]+]]			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x float>, ptr [[TMP1]], align 4, !tbaa [[TBAA3:![0-9]+]]
	; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <4 x float> [[WIDE_LOAD]], [[TMP0]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: store <4 x float> [[TMP3]], ptr [[TMP1]], align 4, !tbaa [[TBAA3]]			; CHECK-NEXT: store <4 x float> [[TMP2]], ptr [[TMP1]], align 4, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
	; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024			; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1024
	; CHECK-NEXT: br i1 [[TMP5]], label [[FOR_COND_CLEANUP:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP3]], label [[FOR_COND_CLEANUP:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%a.addr = alloca ptr, align 8			%a.addr = alloca ptr, align 8
	%b.addr = alloca float, align 4			%b.addr = alloca float, align 4
	%i = alloca i32, align 4			%i = alloca i32, align 4
	store ptr %a, ptr %a.addr, align 8, !tbaa !3			store ptr %a, ptr %a.addr, align 8, !tbaa !3
	(59 lines not shown)

llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -O3 -S | FileCheck %s			; RUN: opt < %s -O3 -S | FileCheck %s
	; RUN: opt < %s -passes="default<O3>" -S | FileCheck %s			; RUN: opt < %s -passes="default<O3>" -S | FileCheck %s

	; Test that IR is optimal after vectorization/unrolling/CSE/canonicalization.			; Test that IR is optimal after vectorization/unrolling/CSE/canonicalization.
	; In particular, there should be no fdivs inside loops because that is expensive.			; In particular, there should be no fdivs inside loops because that is expensive.

	; TODO: There is a CSE opportunity to reduce the hoisted fdivs after vectorization/unrolling.			; TODO: There is a CSE opportunity to reduce the hoisted fdivs after vectorization/unrolling.
	; PR46115 - https://llvm.org/PR46115			; PR46115 - https://llvm.org/PR46115

	target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.15.0"			target triple = "x86_64-apple-macosx10.15.0"

	define void @vdiv(ptr %x, ptr %y, double %a, i32 %N) #0 {			define void @vdiv(ptr %x, ptr %y, double %a, i32 %N) #0 {
	; CHECK-LABEL: @vdiv(			; CHECK-LABEL: @vdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[DIV:%.*]] = fdiv fast double 1.000000e+00, [[A:%.*]]
	; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0			; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP1]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_END:%.*]]
	; CHECK: for.body.preheader:			; CHECK: for.body.preheader:
	; CHECK-NEXT: [[X4:%.*]] = ptrtoint ptr [[X:%.*]] to i64			; CHECK-NEXT: [[X4:%.*]] = ptrtoint ptr [[X:%.*]] to i64
	; CHECK-NEXT: [[Y5:%.*]] = ptrtoint ptr [[Y:%.*]] to i64			; CHECK-NEXT: [[Y5:%.*]] = ptrtoint ptr [[Y:%.*]] to i64
	; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64			; CHECK-NEXT: [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[N]] to i64
	; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 16			; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[N]], 16
	; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[X4]], [[Y5]]			; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[X4]], [[Y5]]
	; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 128			; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 128
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[MIN_ITERS_CHECK]], i1 true, i1 [[DIFF_CHECK]]			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[MIN_ITERS_CHECK]], i1 true, i1 [[DIFF_CHECK]]
	; CHECK-NEXT: br i1 [[OR_COND]], label [[FOR_BODY_PREHEADER15:%.*]], label [[VECTOR_PH:%.*]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[FOR_BODY_PREHEADER15:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:			; CHECK: vector.ph:
	; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967280			; CHECK-NEXT: [[N_VEC:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 4294967280
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x double> poison, double [[A:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x double> poison, double [[DIV]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <4 x double> poison, double [[DIV]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT9]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT9]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT11:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT11:%.*]] = insertelement <4 x double> poison, double [[DIV]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT12:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT11]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT12:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT11]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT13:%.*]] = insertelement <4 x double> poison, double [[A]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT13:%.*]] = insertelement <4 x double> poison, double [[DIV]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT14:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT13]], <4 x double> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT14:%.*]] = shufflevector <4 x double> [[BROADCAST_SPLATINSERT13]], <4 x double> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP2:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT10]]
	; CHECK-NEXT: [[TMP3:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT12]]
	; CHECK-NEXT: [[TMP4:%.*]] = fdiv fast <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, [[BROADCAST_SPLAT14]]
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDEX]]
	; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, ptr [[TMP5]], align 8, !tbaa [[TBAA3:![0-9]+]]			; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x double>, ptr [[TMP1]], align 8, !tbaa [[TBAA3:![0-9]+]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds double, ptr [[TMP5]], i64 4			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds double, ptr [[TMP1]], i64 4
	; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x double>, ptr [[TMP6]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[WIDE_LOAD6:%.*]] = load <4 x double>, ptr [[TMP2]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds double, ptr [[TMP5]], i64 8			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds double, ptr [[TMP1]], i64 8
	; CHECK-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x double>, ptr [[TMP7]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[WIDE_LOAD7:%.*]] = load <4 x double>, ptr [[TMP3]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds double, ptr [[TMP5]], i64 12			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds double, ptr [[TMP1]], i64 12
	; CHECK-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x double>, ptr [[TMP8]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[WIDE_LOAD8:%.*]] = load <4 x double>, ptr [[TMP4]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP9:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[TMP1]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <4 x double> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
	; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x double> [[WIDE_LOAD6]], [[TMP2]]			; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <4 x double> [[WIDE_LOAD6]], [[BROADCAST_SPLAT10]]
	; CHECK-NEXT: [[TMP11:%.*]] = fmul fast <4 x double> [[WIDE_LOAD7]], [[TMP3]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <4 x double> [[WIDE_LOAD7]], [[BROADCAST_SPLAT12]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <4 x double> [[WIDE_LOAD8]], [[TMP4]]			; CHECK-NEXT: [[TMP8:%.*]] = fmul fast <4 x double> [[WIDE_LOAD8]], [[BROADCAST_SPLAT14]]
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDEX]]			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDEX]]
	; CHECK-NEXT: store <4 x double> [[TMP9]], ptr [[TMP13]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store <4 x double> [[TMP5]], ptr [[TMP9]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds double, ptr [[TMP13]], i64 4			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds double, ptr [[TMP9]], i64 4
	; CHECK-NEXT: store <4 x double> [[TMP10]], ptr [[TMP14]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store <4 x double> [[TMP6]], ptr [[TMP10]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds double, ptr [[TMP13]], i64 8			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds double, ptr [[TMP9]], i64 8
	; CHECK-NEXT: store <4 x double> [[TMP11]], ptr [[TMP15]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store <4 x double> [[TMP7]], ptr [[TMP11]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds double, ptr [[TMP13]], i64 12			; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds double, ptr [[TMP9]], i64 12
	; CHECK-NEXT: store <4 x double> [[TMP12]], ptr [[TMP16]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store <4 x double> [[TMP8]], ptr [[TMP12]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP17:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]			; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
	; CHECK-NEXT: br i1 [[TMP17]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[WIDE_TRIP_COUNT]]
	; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER15]]			; CHECK-NEXT: br i1 [[CMP_N]], label [[FOR_END]], label [[FOR_BODY_PREHEADER15]]
	; CHECK: for.body.preheader15:			; CHECK: for.body.preheader15:
	; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]			; CHECK-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK]] ]
	; CHECK-NEXT: [[TMP18:%.*]] = xor i64 [[INDVARS_IV_PH]], -1			; CHECK-NEXT: [[TMP14:%.*]] = xor i64 [[INDVARS_IV_PH]], -1
	; CHECK-NEXT: [[TMP19:%.*]] = add nsw i64 [[TMP18]], [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: [[TMP15:%.*]] = add nsw i64 [[TMP14]], [[WIDE_TRIP_COUNT]]
	; CHECK-NEXT: [[XTRAITER:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 7			; CHECK-NEXT: [[XTRAITER:%.*]] = and i64 [[WIDE_TRIP_COUNT]], 7
	; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0			; CHECK-NEXT: [[LCMP_MOD_NOT:%.*]] = icmp eq i64 [[XTRAITER]], 0
	; CHECK-NEXT: br i1 [[LCMP_MOD_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[LCMP_MOD_NOT]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL:%.*]]
	; CHECK: for.body.prol.preheader:
	; CHECK-NEXT: [[TMP20:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: br label [[FOR_BODY_PROL:%.*]]
	; CHECK: for.body.prol:			; CHECK: for.body.prol:
	; CHECK-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], [[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PROL_PREHEADER]] ]			; CHECK-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], [[FOR_BODY_PROL]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER15]] ]
	; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_NEXT:%.*]], [[FOR_BODY_PROL]] ], [ 0, [[FOR_BODY_PROL_PREHEADER]] ]			; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_NEXT:%.*]], [[FOR_BODY_PROL]] ], [ 0, [[FOR_BODY_PREHEADER15]] ]
	; CHECK-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_PROL]]			; CHECK-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_PROL]]
	; CHECK-NEXT: [[T0_PROL:%.*]] = load double, ptr [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_PROL:%.*]] = load double, ptr [[ARRAYIDX_PROL]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP21:%.*]] = fmul fast double [[T0_PROL]], [[TMP20]]			; CHECK-NEXT: [[MUL_PROL:%.*]] = fmul fast double [[T0_PROL]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2_PROL:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_PROL]]			; CHECK-NEXT: [[ARRAYIDX2_PROL:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_PROL]]
	; CHECK-NEXT: store double [[TMP21]], ptr [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL_PROL]], ptr [[ARRAYIDX2_PROL]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 1
	; CHECK-NEXT: [[PROL_ITER_NEXT]] = add i64 [[PROL_ITER]], 1			; CHECK-NEXT: [[PROL_ITER_NEXT]] = add i64 [[PROL_ITER]], 1
	; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_NEXT]], [[XTRAITER]]			; CHECK-NEXT: [[PROL_ITER_CMP_NOT:%.*]] = icmp eq i64 [[PROL_ITER_NEXT]], [[XTRAITER]]
	; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP10:![0-9]+]]			; CHECK-NEXT: br i1 [[PROL_ITER_CMP_NOT]], label [[FOR_BODY_PROL_LOOPEXIT]], label [[FOR_BODY_PROL]], !llvm.loop [[LOOP10:![0-9]+]]
	; CHECK: for.body.prol.loopexit:			; CHECK: for.body.prol.loopexit:
	; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER15]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ]			; CHECK-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER15]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL]] ]
	; CHECK-NEXT: [[TMP22:%.*]] = icmp ult i64 [[TMP19]], 7			; CHECK-NEXT: [[TMP16:%.*]] = icmp ult i64 [[TMP15]], 7
	; CHECK-NEXT: br i1 [[TMP22]], label [[FOR_END]], label [[FOR_BODY_PREHEADER15_NEW:%.*]]			; CHECK-NEXT: br i1 [[TMP16]], label [[FOR_END]], label [[FOR_BODY:%.*]]
	; CHECK: for.body.preheader15.new:
	; CHECK-NEXT: [[TMP23:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP24:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP25:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP26:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP27:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP28:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP29:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: [[TMP30:%.*]] = fdiv fast double 1.000000e+00, [[A]]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_UNR]], [[FOR_BODY_PREHEADER15_NEW]] ], [ [[INDVARS_IV_NEXT_7:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_7:%.*]], [[FOR_BODY]] ], [ [[INDVARS_IV_UNR]], [[FOR_BODY_PROL_LOOPEXIT]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[T0:%.*]] = load double, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0:%.*]] = load double, ptr [[ARRAYIDX]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP31:%.*]] = fmul fast double [[T0]], [[TMP23]]			; CHECK-NEXT: [[MUL:%.*]] = fmul fast double [[T0]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store double [[TMP31]], ptr [[ARRAYIDX2]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL]], ptr [[ARRAYIDX2]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[T0_1:%.*]] = load double, ptr [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_1:%.*]] = load double, ptr [[ARRAYIDX_1]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP32:%.*]] = fmul fast double [[T0_1]], [[TMP24]]			; CHECK-NEXT: [[MUL_1:%.*]] = fmul fast double [[T0_1]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX2_1:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: store double [[TMP32]], ptr [[ARRAYIDX2_1]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL_1]], ptr [[ARRAYIDX2_1]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_1]]			; CHECK-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_1]]
	; CHECK-NEXT: [[T0_2:%.*]] = load double, ptr [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_2:%.*]] = load double, ptr [[ARRAYIDX_2]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP33:%.*]] = fmul fast double [[T0_2]], [[TMP25]]			; CHECK-NEXT: [[MUL_2:%.*]] = fmul fast double [[T0_2]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_1]]			; CHECK-NEXT: [[ARRAYIDX2_2:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_1]]
	; CHECK-NEXT: store double [[TMP33]], ptr [[ARRAYIDX2_2]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL_2]], ptr [[ARRAYIDX2_2]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_2]]			; CHECK-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_2]]
	; CHECK-NEXT: [[T0_3:%.*]] = load double, ptr [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_3:%.*]] = load double, ptr [[ARRAYIDX_3]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP34:%.*]] = fmul fast double [[T0_3]], [[TMP26]]			; CHECK-NEXT: [[MUL_3:%.*]] = fmul fast double [[T0_3]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_2]]			; CHECK-NEXT: [[ARRAYIDX2_3:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_2]]
	; CHECK-NEXT: store double [[TMP34]], ptr [[ARRAYIDX2_3]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL_3]], ptr [[ARRAYIDX2_3]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_3:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 4			; CHECK-NEXT: [[INDVARS_IV_NEXT_3:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 4
	; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_3]]			; CHECK-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_3]]
	; CHECK-NEXT: [[T0_4:%.*]] = load double, ptr [[ARRAYIDX_4]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_4:%.*]] = load double, ptr [[ARRAYIDX_4]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP35:%.*]] = fmul fast double [[T0_4]], [[TMP27]]			; CHECK-NEXT: [[MUL_4:%.*]] = fmul fast double [[T0_4]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2_4:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_3]]			; CHECK-NEXT: [[ARRAYIDX2_4:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_3]]
	; CHECK-NEXT: store double [[TMP35]], ptr [[ARRAYIDX2_4]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL_4]], ptr [[ARRAYIDX2_4]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_4:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 5			; CHECK-NEXT: [[INDVARS_IV_NEXT_4:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 5
	; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_4]]			; CHECK-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_4]]
	; CHECK-NEXT: [[T0_5:%.*]] = load double, ptr [[ARRAYIDX_5]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_5:%.*]] = load double, ptr [[ARRAYIDX_5]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP36:%.*]] = fmul fast double [[T0_5]], [[TMP28]]			; CHECK-NEXT: [[MUL_5:%.*]] = fmul fast double [[T0_5]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_4]]			; CHECK-NEXT: [[ARRAYIDX2_5:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_4]]
	; CHECK-NEXT: store double [[TMP36]], ptr [[ARRAYIDX2_5]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL_5]], ptr [[ARRAYIDX2_5]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_5:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 6			; CHECK-NEXT: [[INDVARS_IV_NEXT_5:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 6
	; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_5]]			; CHECK-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_5]]
	; CHECK-NEXT: [[T0_6:%.*]] = load double, ptr [[ARRAYIDX_6]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_6:%.*]] = load double, ptr [[ARRAYIDX_6]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP37:%.*]] = fmul fast double [[T0_6]], [[TMP29]]			; CHECK-NEXT: [[MUL_6:%.*]] = fmul fast double [[T0_6]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_5]]			; CHECK-NEXT: [[ARRAYIDX2_6:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_5]]
	; CHECK-NEXT: store double [[TMP37]], ptr [[ARRAYIDX2_6]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL_6]], ptr [[ARRAYIDX2_6]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_6:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 7			; CHECK-NEXT: [[INDVARS_IV_NEXT_6:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 7
	; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_6]]			; CHECK-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds double, ptr [[Y]], i64 [[INDVARS_IV_NEXT_6]]
	; CHECK-NEXT: [[T0_7:%.*]] = load double, ptr [[ARRAYIDX_7]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: [[T0_7:%.*]] = load double, ptr [[ARRAYIDX_7]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[TMP38:%.*]] = fmul fast double [[T0_7]], [[TMP30]]			; CHECK-NEXT: [[MUL_7:%.*]] = fmul fast double [[T0_7]], [[DIV]]
	; CHECK-NEXT: [[ARRAYIDX2_7:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_6]]			; CHECK-NEXT: [[ARRAYIDX2_7:%.*]] = getelementptr inbounds double, ptr [[X]], i64 [[INDVARS_IV_NEXT_6]]
	; CHECK-NEXT: store double [[TMP38]], ptr [[ARRAYIDX2_7]], align 8, !tbaa [[TBAA3]]			; CHECK-NEXT: store double [[MUL_7]], ptr [[ARRAYIDX2_7]], align 8, !tbaa [[TBAA3]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT_7]] = add nuw nsw i64 [[INDVARS_IV]], 8			; CHECK-NEXT: [[INDVARS_IV_NEXT_7]] = add nuw nsw i64 [[INDVARS_IV]], 8
	; CHECK-NEXT: [[EXITCOND_NOT_7:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_7]], [[WIDE_TRIP_COUNT]]			; CHECK-NEXT: [[EXITCOND_NOT_7:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_7]], [[WIDE_TRIP_COUNT]]
	; CHECK-NEXT: br i1 [[EXITCOND_NOT_7]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT_7]], label [[FOR_END]], label [[FOR_BODY]], !llvm.loop [[LOOP12:![0-9]+]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%div = fdiv fast double 1.0, %a			%div = fdiv fast double 1.0, %a

llvm/test/Transforms/PhaseOrdering/lto-licm.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes='lto<O3>' -S < %s | FileCheck %s			; RUN: opt -passes='lto<O3>' -S < %s | FileCheck %s

	define void @hoist_fdiv(ptr %a, float %b) {			define void @hoist_fdiv(ptr %a, float %b) {
	; CHECK-LABEL: @hoist_fdiv(			; CHECK-LABEL: @hoist_fdiv(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = fdiv fast float 1.000000e+00, [[B:%.*]]
	; CHECK-NEXT: br label [[FOR_COND:%.*]]			; CHECK-NEXT: br label [[FOR_COND:%.*]]
	; CHECK: for.cond:			; CHECK: for.cond:
	; CHECK-NEXT: [[I_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]			; CHECK-NEXT: [[I_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_INC:%.*]] ]
	; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i32 [[I_0]], 1024			; CHECK-NEXT: [[CMP_NOT:%.*]] = icmp eq i32 [[I_0]], 1024
	; CHECK-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_INC]]			; CHECK-NEXT: br i1 [[CMP_NOT]], label [[FOR_END:%.*]], label [[FOR_INC]]
	; CHECK: for.inc:			; CHECK: for.inc:
	; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[I_0]] to i64			; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[I_0]] to i64
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, ptr [[A:%.*]], i64 [[IDXPROM]]
	; CHECK-NEXT: [[TMP0:%.*]] = load float, ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = fdiv fast float [[TMP0]], [[B:%.*]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul fast float [[TMP1]], [[TMP0]]
	; CHECK-NEXT: store float [[TMP1]], ptr [[ARRAYIDX]], align 4			; CHECK-NEXT: store float [[TMP2]], ptr [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_0]], 1			; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[I_0]], 1
	; CHECK-NEXT: br label [[FOR_COND]]			; CHECK-NEXT: br label [[FOR_COND]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.cond			br label %for.cond
