This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold add/sub over phi node
Needs Review · Public

Authored by inouehrs on Mar 6 2018, 6:21 AM.


Details

Reviewers

spatel
craig.topper
scanon
hfinkel
efriedma
kbarton
nemanjai

Summary

This patch adds FoldPHIUserOpIntoPred, which finds a phi node that

is used by only one add/sub instruction, and
whose incoming values are all either ConstantInt or add/sub instructions used only by this phi node.

For such phi nodes, we can eliminate the add/sub instruction after the phi node by changing the immediates of the preceding add/sub instructions.

Example of redundant add instruction to be optimized:

BB1:
  %add = add i64 %a, 5 # -> immediate will be changed to 6
  br label %BB3
BB2:
  %sub = sub i64 %b, 3 # -> immediate will be changed to 2
  br label %BB3
BB3:
  %phi = phi i64 [ %add, %BB1 ], [ %sub, %BB2 ]
  %rc = add i64 %phi, 1 # -> will be removed
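
As a rough illustration of the immediate arithmetic in the example above, here is a minimal C++ sketch that does not use any LLVM API. The names `Op`, `ImmOp`, `foldUserIntoPred`, and `apply` are hypothetical and exist only for this sketch. It assumes the rule visible in the patch: when the user and the predecessor have the same opcode the immediates are added, otherwise the user's immediate is subtracted from the predecessor's.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two IR opcodes the patch handles.
enum class Op { Add, Sub };

// Models a predecessor instruction of the form `x <op> imm`.
struct ImmOp {
  Op op;
  std::uint64_t imm; // unsigned, so the arithmetic wraps like iN in IR
};

// Fold the user `phi <userOp> userImm` into the predecessor `x <pred.op> imm`.
// Same opcode: immediates add. Different opcode: they subtract.
ImmOp foldUserIntoPred(ImmOp pred, Op userOp, std::uint64_t userImm) {
  std::uint64_t folded =
      (pred.op == userOp) ? pred.imm + userImm : pred.imm - userImm;
  return {pred.op, folded};
}

// Evaluate `x <op> imm` with wrapping 64-bit arithmetic.
std::uint64_t apply(ImmOp i, std::uint64_t x) {
  return i.op == Op::Add ? x + i.imm : x - i.imm;
}
```

With these definitions, folding `add 1` into `add 5` gives `add 6`, and folding `add 1` into `sub 3` gives `sub 2`, which matches the annotations on %add and %sub.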

Additionally, if exactly one incoming value to the phi node does not meet the above conditions, we can move the add/sub instruction into that predecessor to avoid a partially redundant computation.
This optimization fires about 19k times while building LLVM/Clang (about 4k fully redundant cases; the rest are partially redundant cases).
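
The fully versus partially redundant distinction can be sketched as a small decision function. This is a simplified model in plain C++, not the patch's code: `Incoming`, `Decision`, and `classify` are hypothetical names, and the sketch leaves out the cyclic-phi check that the patch performs before moving an instruction.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of one incoming value of the phi.
enum class Incoming { Constant, SingleUseAddSubImm, Other };

enum class Decision { FullyRedundant, PartiallyRedundant, Skip };

// Count the incoming values whose immediate cannot simply be adjusted.
// Zero failures: the user add/sub disappears entirely.
// One failure: the user add/sub is moved into that predecessor.
// More than one: do nothing, to avoid growing the code.
Decision classify(const std::vector<Incoming> &values) {
  int failures = 0;
  for (Incoming v : values)
    if (v == Incoming::Other)
      ++failures;
  if (failures == 0)
    return Decision::FullyRedundant;
  if (failures == 1)
    return Decision::PartiallyRedundant;
  return Decision::Skip;
}
```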

I plan to add a similar optimization for logical operations, in addition to add/sub, as a separate patch later.

Diff Detail

Event Timeline

inouehrs created this revision. Mar 6 2018, 6:21 AM

Herald added subscribers: javed.absar, sanjoy. Mar 6 2018, 6:21 AM

So far, I have disabled this optimization for Hexagon, since the patch affects Hexagon loop idiom recognition (test/CodeGen/Hexagon/loop-idiom/pmpy-mod.ll).
Should I add a flag in the backend to control this optimization rather than explicitly checking the triple in this method?

It looks like a lot of the test changes are changing post-increment loop exit checks to pre-increment loop exit checks; that isn't a profitable transform.

This transform is a variation of some of the other phi of ops vs op of phi transforms we already do, just applied to a very limited case of expressions that would not normally be fully available, but where you've decided the cost is low.
It is also likely to screw up loop induction variables :)

None of these are fully redundant (because the expressions do not currently exist in the program in the places you have them), they are all partially redundant.
There is no need to special case it as you have in general, it just requires expression insertion. You are inserting them too, you are just repurposing part of an existing expression that you know will be dead.

The generalization is essentially what you get if you apply the phi-of-ops transform to PRE on top of NewGVN.

I doubt this should be done here because it's going to be hard to control the cost model or anything else here, and it's non-trivial to predict the effects.

Thanks for the comments!
After disabling the optimization for loop induction variables, it still fires more than 10k times during the bootstrap. However, I cannot see any visible change in code size or performance with benchmarks.
So, I will revisit this when I find a realistic case for which it matters.

As a simplified toy example, the following C code results in obviously redundant generated code:

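#include <stdbool.h>

unsigned long callee(unsigned long v); /* assumed prototype, defined elsewhere */

unsigned long func(unsigned long v, bool b) {
  unsigned long rc;
  if (b) rc = v + 1;
  else rc = callee(v);
  return rc - 1;
}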

generates a code sequence like

	...
	addi 3, 3, 1
	addi 3, 3, -1
	blr

Yeah, this is a generally tricky case to get.
It requires a value based PRE that is going to do a phi of ops transform,
because this testcase is really:

if (b) phival1 = v + 1
else phival2 = callee(v)
phi = (phival1, phival2)
return phi -1

So it has to understand this is equivalent to:

if (b) phival1 = v + 1, tmpval1 = v
else phival2 = callee(v), tmpval2 = callee(v) - 1
phi = (phival1, phival2)
return phi-1

Then it can see that v is already available, so tmpval1 is available, so
this is PREable.
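
To make the equivalence concrete, here is a small C++ sketch of the toy function before and after the insertion described above. It is only an illustration: `callee` is given an arbitrary body so the code can run, and `funcOriginal` and `funcAfterPRE` are hypothetical names, not anything produced by a pass.

```cpp
#include <cassert>

// Hypothetical stand-in for the opaque callee in the example.
unsigned long callee(unsigned long v) { return v * 3 + 7; }

// Original shape: the subtraction runs after the merge point, so the
// `b == true` path computes v + 1 and then immediately undoes it.
unsigned long funcOriginal(unsigned long v, bool b) {
  unsigned long rc;
  if (b) rc = v + 1;
  else rc = callee(v);
  return rc - 1;
}

// Shape after inserting the `- 1` into each predecessor.
unsigned long funcAfterPRE(unsigned long v, bool b) {
  unsigned long rc;
  if (b) rc = v;            // (v + 1) - 1 is just v, which is already available
  else rc = callee(v) - 1;  // the inserted computation
  return rc;
}
```

The `b == true` path no longer does any arithmetic, and the other path does the same amount of work as before.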

None of our PREs do this.
NewGVN does detect this, but it's not a full redundancy, and full
NewGVN-PRE is not done.
I.e., you will get:
Processing instruction %10 = sub nsw i64 %.0, 1
Simplified <badref> = sub nsw i64 %6, 1 to variable i64 %0
Found phi of ops operand i64 %0 in %5
Cannot find phi of ops operand for <badref> = sub nsw i64 %8, 1 in block %7

You could trivially handle these kinds of cases though by making
makePossiblePHIOfOps do PRE insertion depending on safety and availability
in preds. It computes availability already, and knows this is a PRE case.
It computes some parts of safety but not the parts you'd need to insert
memory instructions.
It also would not be "lifetime optimal" PRE.

It would subsume what we do for GVN's scalar PRE, but it would also require
multiple iterations of NewGVN (just as we require multiple iterations of
GVN's PRE) to get all cases instead of being able to do it in one step.

If you change it to a full redundancy, you'll see it will already perform
the no-cost transform:

c = v + 2
printf("%d\n", c);  // make c used
if (b) phival1 = v + 1
else phival2 = v + 3
phi = (phival1, phival2)
return phi - 1
->
c = v + 2
printf("%d\n", c);  // make c used
if (b) phival1 = v + 1
else phival2 = v + 3
phi = (phival1, phival2)
phiofops = (v, c)
return phiofops

(I didn't add global reassociation, so how far you get here depends on how
good a job InstSimplify does at reassociating for you. It's actually very
trivial to add though if we wanted.)

Bryant Wong is working on a NewGVN PRE that could perform this in a
lifetime optimal way if you add the phi of ops transform we do in the
appropriate places.

Revision Contents

Path                                                        Size
lib/Transforms/InstCombine/InstCombineInternal.h            1 line
lib/Transforms/InstCombine/InstCombinePHI.cpp               123 lines
test/Analysis/ValueTracking/non-negative-phi-bits.ll        4 lines
test/Transforms/IndVarSimplify/rewrite-loop-exit-value.ll   2 lines
test/Transforms/InstCombine/stacksaverestore.ll             2 lines
test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll   8 lines
test/Transforms/LoopVectorize/X86/masked_load_store.ll      10 lines
test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll      16 lines

Diff 137177

lib/Transforms/InstCombine/InstCombineInternal.h

[697 lines not shown] private:

   /// \brief Try to rotate an operation below a PHI node, using PHI nodes for
   /// its operands.
   Instruction *FoldPHIArgOpIntoPHI(PHINode &PN);
   Instruction *FoldPHIArgBinOpIntoPHI(PHINode &PN);
   Instruction *FoldPHIArgGEPIntoPHI(PHINode &PN);
   Instruction *FoldPHIArgLoadIntoPHI(PHINode &PN);
   Instruction *FoldPHIArgZextsIntoPHI(PHINode &PN);
+  Instruction *FoldPHIUserOpIntoPred(PHINode &PN);

   /// If an integer typed PHI has only one use which is an IntToPtr operation,
   /// replace the PHI with an existing pointer typed PHI if it exists. Otherwise
   /// insert a new pointer typed PHI and replace the original one.
   Instruction *FoldIntegerTypedPHI(PHINode &PN);

   /// Helper function for FoldPHIArgXIntoPHI() to set debug location for the
   /// folded operation.
[96 lines not shown]

lib/Transforms/InstCombine/InstCombinePHI.cpp

 //===- InstCombinePHI.cpp -------------------------------------------------===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements the visitPHINode function.
 //
 //===----------------------------------------------------------------------===//

 #include "InstCombineInternal.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/Triple.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/Transforms/Utils/Local.h"
 using namespace llvm;
 using namespace llvm::PatternMatch;

 #define DEBUG_TYPE "instcombine"

+static cl::opt<bool>
+FoldPhiUser("instcombine-fold-phi-user", cl::Hidden, cl::init(true),
+            cl::desc("enable folding phi user into incoming values"));
+
 /// The PHI arguments will be folded into a single operation with a PHI node
 /// as input. The debug location of the single operation will be the merged
 /// locations of the original PHI node arguments.
 void InstCombiner::PHIArgMergedDebugLoc(Instruction *Inst, PHINode &PN) {
   auto *FirstInst = cast<Instruction>(PN.getIncomingValue(0));
   Inst->setDebugLoc(FirstInst->getDebugLoc());
   // We do not expect a CallInst here, otherwise, N-way merging of DebugLoc
   // will be inefficient.
[596 lines not shown] Instruction *InstCombiner::FoldPHIArgLoadIntoPHI(PHINode &PN) {
   if (isVolatile)
     for (Value *IncValue : PN.incoming_values())
       cast<LoadInst>(IncValue)->setVolatile(false);

   PHIArgMergedDebugLoc(NewLI, PN);
   return NewLI;
 }

+// FoldPHIUserOpIntoPred finds a phi node that is used by only one add/sub and
+// all its incoming values are ConstantInt or add/sub used only by this phi.
+// For such case, we can eliminate one add/sub by changing immediates.
+//
+// Example of redundant add instruction to be optimized:
+//  BB1:
+//    %add = add i64 %a, 5
+//    br label %BB3
+//  BB2:
+//    %sub = sub i64 %b, 3
+//    br label %BB3
+//  BB3:
+//    %phi = phi i64 [ %add, %BB1 ], [ %sub, %BB2 ]
+//    %rc = add i64 %phi, 1 # -> will be removed
+//
+// Additionally, if only one incoming value to the phi does not meet above
+// condition, we can move the add/sub instruction to avoid partially redundant
+// computation.
+
+Instruction *InstCombiner::FoldPHIUserOpIntoPred(PHINode &Phi) {
+  // This optimization is disabled for Hexagon so far because it affects
+  // Hexagon loop idiom recognition.
+  Triple T(Phi.getModule()->getTargetTriple());
+  if (T.getArch() == Triple::hexagon || !FoldPhiUser)
+    return nullptr;
+
+  if (!Phi.hasOneUse())
+    return nullptr;
+
+  // We optimize a phi node that is used by only one add/sub instruction.
+  Instruction *User = Phi.user_back();
+  ConstantInt *UserImm = nullptr;
+  if (!match(User, m_Add(m_Specific(&Phi), m_ConstantInt(UserImm))) &&
+      !match(User, m_Sub(m_Specific(&Phi), m_ConstantInt(UserImm))))
+    return nullptr;
+
+  int FailCount = 0;
+  int FailedIdx = -1;
+  // Here we check all incoming values.
+  for (unsigned Idx = 0; Idx < Phi.getNumIncomingValues(); Idx++) {
+    Value *V = Phi.getIncomingValue(Idx);
+    // We can optimize constant int by changing the value.
+    if (isa<ConstantInt>(V))
+      continue;
+
+    // An add/sub with an immediate can be optimized if it is used only by
+    // this phi node.
+    if (V->hasOneUse() &&
+        (match(V, m_Add(m_Value(), m_ConstantInt())) ||
+         match(V, m_Sub(m_Value(), m_ConstantInt()))))
+      continue;
+
+    // We need to handle partially redundant case here.
+    // We do not eliminate partial redundancy if there are more than one
+    // incoming values that cannot be optimized to avoid code size bloat.
+    if (++FailCount > 1)
+      break;
+
+    // If this is a cyclic phi chain, moving instruction may potentially cause
+    // infinite loop. This case, we do not set FailedIdx.
+    std::function<bool(Value *,Value *)>
+      IsPotentialPhiLoop = [&IsPotentialPhiLoop](Value *V, Value *AddVal) {
+        if (!V->hasOneUse() || !isa<PHINode>(V))
+          return false;
+        PHINode *PN = dyn_cast<PHINode>(V);
+        for (Value *V : PN->incoming_values())
+          if (V == AddVal || IsPotentialPhiLoop(V, AddVal))
+            return true;
+        return false;
+      };
+    if (IsPotentialPhiLoop(V, User)) break;
+
+    // We remember which incoming value cannot be optimized.
+    FailedIdx = Idx;
+  }
+
+  // If all incoming values can be optimized (FailCount == 0) or
+  // all but one incoming values cannot be optimized (FailCount == 1),
+  // apply optimization here.
+  if (FailCount == 0 || (FailCount == 1 && FailedIdx != -1)) {
+    for (unsigned Idx = 0; Idx < Phi.getNumIncomingValues(); Idx++) {
+      Value *V = Phi.getIncomingValue(Idx);
+      if ((int)Idx == FailedIdx) {
+        // We move add/sub instruction into a BB, which we cannot change
+        // immediate in the incoming value from the BB.
+        assert(FailCount != 0 &&
+               "FailedIdx must not be set for fully redundant case");
+        User->setOperand(0, V);
+        User->moveBefore(Phi.getIncomingBlock(Idx)->getTerminator());
+      } else if (isa<Instruction>(V)) {
+        // Update the immediate of the add/sub instruction.
+        Instruction *I = cast<Instruction>(V);
+        ConstantInt *PredImm = cast<ConstantInt>(I->getOperand(1));
+        auto PM = (User->getOpcode() == I->getOpcode()) ? Instruction::Add:
+                                                          Instruction::Sub;
+        Value* NewImm = ConstantExpr::get(PM, PredImm, UserImm);
+        I->setOperand(1, NewImm);
+      }
+      else if (isa<ConstantInt>(V)) {
+        ConstantInt *PredImm = cast<ConstantInt>(V);
+        Value* NewImm = ConstantExpr::get(User->getOpcode(), PredImm, UserImm);
+        Phi.setIncomingValue(Idx, NewImm);
+      }
+    }
+    User->replaceAllUsesWith(&Phi);
+    if (FailedIdx != -1)
+      Phi.setIncomingValue(FailedIdx, User);
+    return &Phi;
+  }
+
+  return nullptr;
+}
+
 /// TODO: This function could handle other cast types, but then it might
 /// require special-casing a cast from the 'i1' type. See the comment in
 /// FoldPHIArgOpIntoPHI() about pessimizing illegal integer types.
 Instruction *InstCombiner::FoldPHIArgZextsIntoPHI(PHINode &Phi) {
   // We cannot create a new instruction after the PHI if the terminator is an
   // EHPad because there is no valid insertion point.
   if (TerminatorInst *TI = Phi.getParent()->getTerminator())
     if (TI->isEHPad())
[479 lines not shown] if (isa<Instruction>(PN.getIncomingValue(0)) &&
       cast<Instruction>(PN.getIncomingValue(0))->getOpcode() ==
       cast<Instruction>(PN.getIncomingValue(1))->getOpcode() &&
       // FIXME: The hasOneUse check will fail for PHIs that use the value more
       // than themselves more than once.
       PN.getIncomingValue(0)->hasOneUse())
     if (Instruction *Result = FoldPHIArgOpIntoPHI(PN))
       return Result;

+  if (PN.hasOneUse()) {
+    if (Instruction *Result = FoldPHIUserOpIntoPred(PN))
+      return Result;
+  }
+
   // If this is a trivial cycle in the PHI node graph, remove it. Basically, if
   // this PHI only has a single use (a PHI), and if that PHI only has one use (a
   // PHI)... break the cycle.
   if (PN.hasOneUse()) {
     if (Instruction *Result = FoldIntegerTypedPHI(PN))
       return Result;

     Instruction *PHIUser = cast<Instruction>(PN.user_back());
[113 lines not shown]

test/Analysis/ValueTracking/non-negative-phi-bits.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -instcombine < %s -S | FileCheck %s

 define void @test() #0 {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
 ; CHECK: for.body:
-; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
+; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ult i64 [[INDVARS_IV]], 40
 ; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
-; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ult i64 [[INDVARS_IV_NEXT]], 40
 ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
 ; CHECK: for.end:
 ; CHECK-NEXT: ret void
 ;
 entry:
   br label %for.body

 for.body: ; preds = %for.body, %entry
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %indvars.iv.next = add nsw i64 %indvars.iv, 1
   %exitcond = icmp slt i64 %indvars.iv.next, 40
   br i1 %exitcond, label %for.end, label %for.body

 for.end: ; preds = %for.body
   ret void
 }

test/Transforms/IndVarSimplify/rewrite-loop-exit-value.ll

-; RUN: opt -indvars -instcombine -S < %s | FileCheck %s
+; RUN: opt -indvars -instcombine -instcombine-fold-phi-user=0 -S < %s | FileCheck %s

 ;; Test that loop's exit value is rewritten to its initial
 ;; value from loop preheader
 define i32 @test1(i32* %var) {
 ; CHECK-LABEL: @test1
 entry:
   %cond = icmp eq i32* %var, null
   br label %header
[54 lines not shown]

test/Transforms/InstCombine/stacksaverestore.ll

[96 lines not shown] loop:
   br i1 %done, label %loop, label %return

 return:
   ret void
 }

 ; CHECK-LABEL: define void @test3(
 ; CHECK: loop:
-; CHECK: %i = phi i32 [ 0, %entry ], [ %i1, %loop ]
+; CHECK: %i = phi i32 [ 1, %entry ], [ %i1, %loop ]
 ; CHECK: %save1 = call i8* @llvm.stacksave()
 ; CHECK: %argmem = alloca inalloca i32
 ; CHECK: store i32 0, i32* %argmem
 ; CHECK: call void @inalloca_callee(i32* inalloca {{.*}} %argmem)
 ; CHECK: call void @llvm.stackrestore(i8* %save1)
 ; CHECK: br i1 %done, label %loop, label %return
 ; CHECK: ret void

test/Transforms/LoopUnroll/runtime-loop-multiple-exits.ll

[31 lines not shown]

 ; PROLOG: test1(
 ; PROLOG-NEXT: entry:
 ; PROLOG-NEXT: [[TMP0:%.*]] = add i64 [[TRIP:%.*]], -1
 ; PROLOG-NEXT: [[XTRAITER:%.*]] = and i64 [[TRIP]], 7
 ; PROLOG-NEXT: [[TMP1:%.*]] = icmp eq i64 [[XTRAITER]], 0
 ; PROLOG-NEXT: br i1 [[TMP1]], label %loop_header.prol.loopexit, label %loop_header.prol.preheader
 ; PROLOG: loop_header.prol:
-; PROLOG-NEXT: %iv.prol = phi i64 [ 0, %loop_header.prol.preheader ], [ %iv_next.prol, %loop_latch.prol ]
+; PROLOG-NEXT: %iv.prol = phi i64 [ 1, %loop_header.prol.preheader ], [ %iv_next.prol, %loop_latch.prol ]
 ; PROLOG-NEXT: %prol.iter = phi i64 [ [[XTRAITER]], %loop_header.prol.preheader ], [ %prol.iter.sub, %loop_latch.prol ]
 ; PROLOG-NEXT: br i1 %cond, label %loop_latch.prol, label %loop_exiting_bb1.prol
 ; PROLOG: loop_latch.prol:
-; PROLOG-NEXT: %iv_next.prol = add i64 %iv.prol, 1
 ; PROLOG-NEXT: %prol.iter.sub = add i64 %prol.iter, -1
 ; PROLOG-NEXT: %prol.iter.cmp = icmp eq i64 %prol.iter.sub, 0
+; PROLOG-NEXT: %iv_next.prol = add i64 %iv.prol, 1
 ; PROLOG-NEXT: br i1 %prol.iter.cmp, label %loop_header.prol.loopexit.unr-lcssa, label %loop_header.prol
 ; PROLOG: loop_latch.7:
 ; PROLOG-NEXT: %iv_next.7 = add i64 %iv, 8
 ; PROLOG-NEXT: %cmp.7 = icmp eq i64 %iv_next.7, %trip
 ; PROLOG-NEXT: br i1 %cmp.7, label %exit2.loopexit.unr-lcssa, label %loop_header
 entry:
   br label %loop_header

[112 lines not shown]
 ; PROLOG-NEXT: [[TMP1:%.*]] = icmp eq i64 [[XTRAITER]], 0
 ; PROLOG-NEXT: br i1 [[TMP1]], label %loop_header.prol.loopexit, label %loop_header.prol.preheader
 ; PROLOG: loop_header:
 ; PROLOG-NEXT: %iv = phi i64 [ %iv.unr, %entry.new ], [ %iv_next.7, %loop_latch.7 ]
 ; PROLOG-NEXT: %sum = phi i64 [ %sum.unr, %entry.new ], [ %sum.next.7, %loop_latch.7 ]
 ; PROLOG: loop_exiting_bb1.7:
 ; PROLOG-NEXT: switch i64 %sum.next.6, label %loop_latch.7
 ; PROLOG: loop_latch.7:
-; PROLOG-NEXT: %iv_next.7 = add nsw i64 %iv, 8
+; PROLOG-NEXT: %iv_next.7 = add nuw nsw i64 %iv, 8
 ; PROLOG-NEXT: %sum.next.7 = add i64 %sum.next.6, %add
 ; PROLOG-NEXT: %cmp.7 = icmp eq i64 %iv_next.7, %trip
 ; PROLOG-NEXT: br i1 %cmp.7, label %exit2.loopexit.unr-lcssa, label %loop_header
 entry:
   br label %loop_header

 loop_header:
   %iv = phi i64 [ 0, %entry ], [ %iv_next, %loop_latch ]
[299 lines not shown]

 declare i8 addrspace(1)* @foo(i32)
 ; inner loop prolog unrolled
 ; a value from outer loop is used in exit block of inner loop.
 ; Don't create VMap entries for such values (%trip).
 define i8 addrspace(1)* @test9(i8* nocapture readonly %arg, i32 %n) {
 ; PROLOG: test9(
 ; PROLOG: header.prol:
-; PROLOG-NEXT: %phi.prol = phi i64 [ 0, %header.prol.preheader ], [ %iv.next.prol, %latch.prol ]
+; PROLOG-NEXT: %phi.prol = phi i64 [ 1, %header.prol.preheader ], [ %iv.next.prol, %latch.prol ]
 ; PROLOG: latch.prol:
 ; PROLOG-NOT: trip
 ; PROLOG: br i1 %prol.iter.cmp, label %header.prol.loopexit.unr-lcssa, label %header.prol
 bb:
   br label %outerloopHdr

 outerloopHdr: ; preds = %outerLatch, %bb
   %trip = add i32 %n, -1
[25 lines not shown]

test/Transforms/LoopVectorize/X86/masked_load_store.ll

	Show First 20 Lines • Show All 1,982 Lines • ▼ Show 20 Lines
	; AVX512-NEXT: [[BOUND1:%.]] = icmp ugt i32 [[TMP1]], [[TRIGGER]]			; AVX512-NEXT: [[BOUND1:%.]] = icmp ugt i32 [[TMP1]], [[TRIGGER]]
	; AVX512-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]			; AVX512-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
	; AVX512-NEXT: [[BOUND017:%.]] = icmp ugt double [[SCEVGEP15]], [[A]]			; AVX512-NEXT: [[BOUND017:%.]] = icmp ugt double [[SCEVGEP15]], [[A]]
	; AVX512-NEXT: [[BOUND118:%.]] = icmp ugt double [[SCEVGEP]], [[B]]			; AVX512-NEXT: [[BOUND118:%.]] = icmp ugt double [[SCEVGEP]], [[B]]
	; AVX512-NEXT: [[FOUND_CONFLICT19:%.*]] = and i1 [[BOUND017]], [[BOUND118]]			; AVX512-NEXT: [[FOUND_CONFLICT19:%.*]] = and i1 [[BOUND017]], [[BOUND118]]
	; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT19]]			; AVX512-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[FOUND_CONFLICT]], [[FOUND_CONFLICT19]]
	; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[FOR_BODY_PREHEADER:%.]], label [[VECTOR_BODY:%.]]			; AVX512-NEXT: br i1 [[CONFLICT_RDX]], label [[FOR_BODY_PREHEADER:%.]], label [[VECTOR_BODY:%.]]
	; AVX512: vector.body:			; AVX512: vector.body:
	; AVX512-NEXT: [[INDEX:%.]] = phi i64 [ [[INDEX_NEXT_2:%.]], [[VECTOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; AVX512-NEXT: [[INDEX:%.]] = phi i64 [ [[INDEX_NEXT_2:%.]], [[VECTOR_BODY]] ], [ 24, [[ENTRY:%.*]] ]
	; AVX512-NEXT: [[VEC_IND:%.]] = phi <8 x i64> [ [[VEC_IND_NEXT_2:%.]], [[VECTOR_BODY]] ], [ <i64 0, i64 16, i64 32, i64 48, i64 64, i64 80, i64 96, i64 112>, [[ENTRY]] ]			; AVX512-NEXT: [[VEC_IND:%.]] = phi <8 x i64> [ [[VEC_IND_NEXT_2:%.]], [[VECTOR_BODY]] ], [ <i64 0, i64 16, i64 32, i64 48, i64 64, i64 80, i64 96, i64 112>, [[ENTRY]] ]
	; AVX512-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], <8 x i64> [[VEC_IND]]			; AVX512-NEXT: [[TMP2:%.]] = getelementptr inbounds i32, i32 [[TRIGGER]], <8 x i64> [[VEC_IND]]
	; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32> [[TMP2]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef), !alias.scope !41			; AVX512-NEXT: [[WIDE_MASKED_GATHER:%.]] = call <8 x i32> @llvm.masked.gather.v8i32.v8p0i32(<8 x i32> [[TMP2]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef), !alias.scope !41
	; AVX512-NEXT: [[TMP3:%.*]] = icmp slt <8 x i32> [[WIDE_MASKED_GATHER]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>			; AVX512-NEXT: [[TMP3:%.*]] = icmp slt <8 x i32> [[WIDE_MASKED_GATHER]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
	; AVX512-NEXT: [[TMP4:%.*]] = shl nuw nsw <8 x i64> [[VEC_IND]], <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>			; AVX512-NEXT: [[TMP4:%.*]] = shl nuw nsw <8 x i64> [[VEC_IND]], <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>
	; AVX512-NEXT: [[TMP5:%.]] = getelementptr inbounds double, double [[B]], <8 x i64> [[TMP4]]			; AVX512-NEXT: [[TMP5:%.]] = getelementptr inbounds double, double [[B]], <8 x i64> [[TMP4]]
	; AVX512-NEXT: [[WIDE_MASKED_GATHER20:%.]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double> [[TMP5]], i32 8, <8 x i1> [[TMP3]], <8 x double> undef), !alias.scope !44			; AVX512-NEXT: [[WIDE_MASKED_GATHER20:%.]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double> [[TMP5]], i32 8, <8 x i1> [[TMP3]], <8 x double> undef), !alias.scope !44
	; AVX512-NEXT: [[TMP6:%.*]] = sitofp <8 x i32> [[WIDE_MASKED_GATHER]] to <8 x double>			; AVX512-NEXT: [[TMP6:%.*]] = sitofp <8 x i32> [[WIDE_MASKED_GATHER]] to <8 x double>
	Show All 17 Lines
	; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_MASKED_GATHER_2]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>			; AVX512-NEXT: [[TMP17:%.*]] = icmp slt <8 x i32> [[WIDE_MASKED_GATHER_2]], <i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100, i32 100>
	; AVX512-NEXT: [[TMP18:%.*]] = shl nuw nsw <8 x i64> [[VEC_IND_NEXT_1]], <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>			; AVX512-NEXT: [[TMP18:%.*]] = shl nuw nsw <8 x i64> [[VEC_IND_NEXT_1]], <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>
	; AVX512-NEXT: [[TMP19:%.]] = getelementptr inbounds double, double [[B]], <8 x i64> [[TMP18]]			; AVX512-NEXT: [[TMP19:%.]] = getelementptr inbounds double, double [[B]], <8 x i64> [[TMP18]]
	; AVX512-NEXT: [[WIDE_MASKED_GATHER20_2:%.]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double> [[TMP19]], i32 8, <8 x i1> [[TMP17]], <8 x double> undef), !alias.scope !44			; AVX512-NEXT: [[WIDE_MASKED_GATHER20_2:%.]] = call <8 x double> @llvm.masked.gather.v8f64.v8p0f64(<8 x double> [[TMP19]], i32 8, <8 x i1> [[TMP17]], <8 x double> undef), !alias.scope !44
	; AVX512-NEXT: [[TMP20:%.*]] = sitofp <8 x i32> [[WIDE_MASKED_GATHER_2]] to <8 x double>			; AVX512-NEXT: [[TMP20:%.*]] = sitofp <8 x i32> [[WIDE_MASKED_GATHER_2]] to <8 x double>
	; AVX512-NEXT: [[TMP21:%.*]] = fadd <8 x double> [[WIDE_MASKED_GATHER20_2]], [[TMP20]]			; AVX512-NEXT: [[TMP21:%.*]] = fadd <8 x double> [[WIDE_MASKED_GATHER20_2]], [[TMP20]]
	; AVX512-NEXT: [[TMP22:%.*]] = getelementptr inbounds double, double* [[A]], <8 x i64> [[VEC_IND_NEXT_1]]			; AVX512-NEXT: [[TMP22:%.*]] = getelementptr inbounds double, double* [[A]], <8 x i64> [[VEC_IND_NEXT_1]]
	; AVX512-NEXT: call void @llvm.masked.scatter.v8f64.v8p0f64(<8 x double> [[TMP21]], <8 x double*> [[TMP22]], i32 8, <8 x i1> [[TMP17]]), !alias.scope !46, !noalias !48			; AVX512-NEXT: call void @llvm.masked.scatter.v8f64.v8p0f64(<8 x double> [[TMP21]], <8 x double*> [[TMP22]], i32 8, <8 x i1> [[TMP17]]), !alias.scope !46, !noalias !48
	; AVX512-NEXT: [[INDEX_NEXT_2]] = add nuw nsw i64 [[INDEX]], 24
	; AVX512-NEXT: [[VEC_IND_NEXT_2]] = add <8 x i64> [[VEC_IND]], <i64 384, i64 384, i64 384, i64 384, i64 384, i64 384, i64 384, i64 384>			; AVX512-NEXT: [[VEC_IND_NEXT_2]] = add <8 x i64> [[VEC_IND]], <i64 384, i64 384, i64 384, i64 384, i64 384, i64 384, i64 384, i64 384>
	; AVX512-NEXT: [[TMP23:%.*]] = icmp eq i64 [[INDEX_NEXT_2]], 624			; AVX512-NEXT: [[TMP23:%.*]] = icmp eq i64 [[INDEX]], 624
				; AVX512-NEXT: [[INDEX_NEXT_2]] = add nuw nsw i64 [[INDEX]], 24
	; AVX512-NEXT: br i1 [[TMP23]], label [[FOR_BODY_PREHEADER]], label [[VECTOR_BODY]], !llvm.loop !49			; AVX512-NEXT: br i1 [[TMP23]], label [[FOR_BODY_PREHEADER]], label [[VECTOR_BODY]], !llvm.loop !49
	; AVX512: for.body.preheader:			; AVX512: for.body.preheader:
	; AVX512-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ 9984, [[VECTOR_BODY]] ]			; AVX512-NEXT: [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ 9984, [[VECTOR_BODY]] ]
	; AVX512-NEXT: [[TMP24:%.*]] = sub nsw i64 9999, [[INDVARS_IV_PH]]			; AVX512-NEXT: [[TMP24:%.*]] = sub nsw i64 9999, [[INDVARS_IV_PH]]
	; AVX512-NEXT: br label [[FOR_BODY_PROL:%.*]]			; AVX512-NEXT: br label [[FOR_BODY_PROL:%.*]]
	; AVX512: for.body.prol:			; AVX512: for.body.prol:
	; AVX512-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], [[FOR_INC_PROL:%.*]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER]] ]			; AVX512-NEXT: [[INDVARS_IV_PROL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PROL:%.*]], [[FOR_INC_PROL:%.*]] ], [ [[INDVARS_IV_PH]], [[FOR_BODY_PREHEADER]] ]
	; AVX512-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_INC_PROL]] ], [ 1, [[FOR_BODY_PREHEADER]] ]			; AVX512-NEXT: [[PROL_ITER:%.*]] = phi i64 [ [[PROL_ITER_SUB:%.*]], [[FOR_INC_PROL]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; AVX512-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDVARS_IV_PROL]]			; AVX512-NEXT: [[ARRAYIDX_PROL:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDVARS_IV_PROL]]
	; AVX512-NEXT: [[TMP25:%.*]] = load i32, i32* [[ARRAYIDX_PROL]], align 4			; AVX512-NEXT: [[TMP25:%.*]] = load i32, i32* [[ARRAYIDX_PROL]], align 4
	; AVX512-NEXT: [[CMP1_PROL:%.*]] = icmp slt i32 [[TMP25]], 100			; AVX512-NEXT: [[CMP1_PROL:%.*]] = icmp slt i32 [[TMP25]], 100
	; AVX512-NEXT: br i1 [[CMP1_PROL]], label [[IF_THEN_PROL:%.*]], label [[FOR_INC_PROL]]			; AVX512-NEXT: br i1 [[CMP1_PROL]], label [[IF_THEN_PROL:%.*]], label [[FOR_INC_PROL]]
	; AVX512: if.then.prol:			; AVX512: if.then.prol:
	; AVX512-NEXT: [[TMP26:%.*]] = shl nuw nsw i64 [[INDVARS_IV_PROL]], 1			; AVX512-NEXT: [[TMP26:%.*]] = shl nuw nsw i64 [[INDVARS_IV_PROL]], 1
	; AVX512-NEXT: [[ARRAYIDX3_PROL:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP26]]			; AVX512-NEXT: [[ARRAYIDX3_PROL:%.*]] = getelementptr inbounds double, double* [[B]], i64 [[TMP26]]
	; AVX512-NEXT: [[TMP27:%.*]] = load double, double* [[ARRAYIDX3_PROL]], align 8			; AVX512-NEXT: [[TMP27:%.*]] = load double, double* [[ARRAYIDX3_PROL]], align 8
	; AVX512-NEXT: [[CONV_PROL:%.*]] = sitofp i32 [[TMP25]] to double			; AVX512-NEXT: [[CONV_PROL:%.*]] = sitofp i32 [[TMP25]] to double
	; AVX512-NEXT: [[ADD_PROL:%.*]] = fadd double [[TMP27]], [[CONV_PROL]]			; AVX512-NEXT: [[ADD_PROL:%.*]] = fadd double [[TMP27]], [[CONV_PROL]]
	; AVX512-NEXT: [[ARRAYIDX7_PROL:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[INDVARS_IV_PROL]]			; AVX512-NEXT: [[ARRAYIDX7_PROL:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[INDVARS_IV_PROL]]
	; AVX512-NEXT: store double [[ADD_PROL]], double* [[ARRAYIDX7_PROL]], align 8			; AVX512-NEXT: store double [[ADD_PROL]], double* [[ARRAYIDX7_PROL]], align 8
	; AVX512-NEXT: br label [[FOR_INC_PROL]]			; AVX512-NEXT: br label [[FOR_INC_PROL]]
	; AVX512: for.inc.prol:			; AVX512: for.inc.prol:
	; AVX512-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 16			; AVX512-NEXT: [[INDVARS_IV_NEXT_PROL]] = add nuw nsw i64 [[INDVARS_IV_PROL]], 16
				; AVX512-NEXT: [[PROL_ITER_CMP:%.*]] = icmp eq i64 [[PROL_ITER]], 0
	; AVX512-NEXT: [[PROL_ITER_SUB]] = add i64 [[PROL_ITER]], -1			; AVX512-NEXT: [[PROL_ITER_SUB]] = add i64 [[PROL_ITER]], -1
	; AVX512-NEXT: [[PROL_ITER_CMP:%.*]] = icmp eq i64 [[PROL_ITER_SUB]], 0
	; AVX512-NEXT: br i1 [[PROL_ITER_CMP]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL]], !llvm.loop !50			; AVX512-NEXT: br i1 [[PROL_ITER_CMP]], label [[FOR_BODY_PROL_LOOPEXIT:%.*]], label [[FOR_BODY_PROL]], !llvm.loop !50
	; AVX512: for.body.prol.loopexit:			; AVX512: for.body.prol.loopexit:
	; AVX512-NEXT: [[DOTMASK:%.*]] = and i64 [[TMP24]], 9984			; AVX512-NEXT: [[DOTMASK:%.*]] = and i64 [[TMP24]], 9984
	; AVX512-NEXT: [[TMP28:%.*]] = icmp eq i64 [[DOTMASK]], 0			; AVX512-NEXT: [[TMP28:%.*]] = icmp eq i64 [[DOTMASK]], 0
	; AVX512-NEXT: br i1 [[TMP28]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]			; AVX512-NEXT: br i1 [[TMP28]], label [[FOR_END:%.*]], label [[FOR_BODY:%.*]]
	; AVX512: for.body:			; AVX512: for.body:
	; AVX512-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_INC_3:%.*]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL_LOOPEXIT]] ]			; AVX512-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_3:%.*]], [[FOR_INC_3:%.*]] ], [ [[INDVARS_IV_NEXT_PROL]], [[FOR_BODY_PROL_LOOPEXIT]] ]
	; AVX512-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDVARS_IV]]			; AVX512-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[TRIGGER]], i64 [[INDVARS_IV]]

test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll

	; GENERIC: for.body.preheader:			; GENERIC: for.body.preheader:
	; GENERIC-NEXT: br label [[FOR_BODY:%.*]]			; GENERIC-NEXT: br label [[FOR_BODY:%.*]]
	; GENERIC: for.cond.cleanup.loopexit:			; GENERIC: for.cond.cleanup.loopexit:
	; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]			; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]
	; GENERIC: for.cond.cleanup:			; GENERIC: for.cond.cleanup:
	; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]			; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]
	; GENERIC: for.body:			; GENERIC: for.body:
	; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 1, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8			; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i32 7			; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i32 7
	; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
				; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_0103]], [[N]]
	; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; GENERIC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]			; GENERIC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
	;			;
	; KRYO-LABEL: @gather_reduce_8x16_i32(			; KRYO-LABEL: @gather_reduce_8x16_i32(
	; KRYO-NEXT: entry:			; KRYO-NEXT: entry:
	; KRYO-NEXT: [[CMP_99:%.*]] = icmp sgt i32 [[N:%.*]], 0			; KRYO-NEXT: [[CMP_99:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; KRYO-NEXT: br i1 [[CMP_99]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]			; KRYO-NEXT: br i1 [[CMP_99]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
	; KRYO: for.body.preheader:			; KRYO: for.body.preheader:
	; KRYO-NEXT: br label [[FOR_BODY:%.*]]			; KRYO-NEXT: br label [[FOR_BODY:%.*]]
	; KRYO: for.cond.cleanup.loopexit:			; KRYO: for.cond.cleanup.loopexit:
	; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]			; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]
	; KRYO: for.cond.cleanup:			; KRYO: for.cond.cleanup:
	; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]			; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]
	; KRYO: for.body:			; KRYO: for.body:
	; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 1, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8			; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i32 7			; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i32 7
	; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
				; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_0103]], [[N]]
	; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; KRYO-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]			; KRYO-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
	;			;
	entry:			entry:
	%cmp.99 = icmp sgt i32 %n, 0			%cmp.99 = icmp sgt i32 %n, 0
	br i1 %cmp.99, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp.99, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader:			for.body.preheader:
	br label %for.body			br label %for.body
	; GENERIC: for.body.preheader:			; GENERIC: for.body.preheader:
	; GENERIC-NEXT: br label [[FOR_BODY:%.*]]			; GENERIC-NEXT: br label [[FOR_BODY:%.*]]
	; GENERIC: for.cond.cleanup.loopexit:			; GENERIC: for.cond.cleanup.loopexit:
	; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]			; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]
	; GENERIC: for.cond.cleanup:			; GENERIC: for.cond.cleanup:
	; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]			; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]
	; GENERIC: for.body:			; GENERIC: for.body:
	; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 1, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8			; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i32 7			; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i32 7
	; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
				; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_0103]], [[N]]
	; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; GENERIC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]			; GENERIC-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
	;			;
	; KRYO-LABEL: @gather_reduce_8x16_i64(			; KRYO-LABEL: @gather_reduce_8x16_i64(
	; KRYO-NEXT: entry:			; KRYO-NEXT: entry:
	; KRYO-NEXT: [[CMP_99:%.*]] = icmp sgt i32 [[N:%.*]], 0			; KRYO-NEXT: [[CMP_99:%.*]] = icmp sgt i32 [[N:%.*]], 0
	; KRYO-NEXT: br i1 [[CMP_99]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]			; KRYO-NEXT: br i1 [[CMP_99]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
	; KRYO: for.body.preheader:			; KRYO: for.body.preheader:
	; KRYO-NEXT: br label [[FOR_BODY:%.*]]			; KRYO-NEXT: br label [[FOR_BODY:%.*]]
	; KRYO: for.cond.cleanup.loopexit:			; KRYO: for.cond.cleanup.loopexit:
	; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]			; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]
	; KRYO: for.cond.cleanup:			; KRYO: for.cond.cleanup:
	; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]			; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]
	; KRYO: for.body:			; KRYO: for.body:
	; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 1, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8			; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i32 7			; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i32 7
	; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
				; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[I_0103]], [[N]]
	; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; KRYO-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]			; KRYO-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP_LOOPEXIT]], label [[FOR_BODY]]
	;			;
	entry:			entry:
	%cmp.99 = icmp sgt i32 %n, 0			%cmp.99 = icmp sgt i32 %n, 0
	br i1 %cmp.99, label %for.body.preheader, label %for.cond.cleanup			br i1 %cmp.99, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader:			for.body.preheader:
	br label %for.body			br label %for.body