This is an archive of the discontinued LLVM Phabricator instance.

Differential D27321

Fix LSR ImmCost calculation for profitable chains
Abandoned · Public

Authored by evstupac on Dec 1 2016, 2:11 PM.


Details

Reviewers

qcolombet

Summary

For the following case:

for (j = 0; j < n; j++) {
  s += *(in++);
  s += *(in++);
}

LSR considers "in" a profitable chain and counts ImmCost only for the first load.
However, when the base is adjusted before the loop, e.g.
  in += 1024;
ImmCost can be significant.
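Here is a minimal sketch of the per-fixup tally that shows how large the missing cost is. `minSignedBits` is a hypothetical standalone stand-in for `APInt(64, Offset, true).getMinSignedBits()`. The offsets assume `in` was advanced by 1024 before the loop, so the two loads sit at 1024 and 1025 from the original pointer. The sketch only illustrates the metric; it is not LSR code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for APInt(64, V, /*isSigned=*/true).getMinSignedBits():
// the number of bits needed to hold V in two's complement (0 and -1 need 1).
unsigned minSignedBits(int64_t V) {
  // For negative V, ~V is non-negative and has the same number of
  // significant (non-sign) bits.
  uint64_t U = V < 0 ? ~static_cast<uint64_t>(V) : static_cast<uint64_t>(V);
  unsigned Bits = 0;
  while (U != 0) {
    ++Bits;
    U >>= 1;
  }
  return Bits + 1; // one extra bit for the sign
}

// Tally immediates the way the fixup loop in RateFormula() does:
// a zero offset is free, any other offset costs its signed bit width.
unsigned immCost(const std::vector<int64_t> &Offsets) {
  unsigned Cost = 0;
  for (int64_t Offset : Offsets)
    if (Offset != 0)
      Cost += minSignedBits(Offset);
  return Cost;
}
```

With the chain, only the first load has a fixup, so LSR sees `immCost({1024})`, which is 12. Charging both loads, `immCost({1024, 1025})`, gives 24.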

Diff Detail

Repository: rL LLVM

Event Timeline

evstupac updated this revision to Diff 79976. Dec 1 2016, 2:11 PM

evstupac retitled this revision from to Fix LSR ImmCost calculation for profitable chains.

evstupac updated this object.

evstupac added a reviewer: qcolombet.

evstupac set the repository for this revision to rL LLVM.

evstupac added subscribers: llvm-commits, anna, wmi, hfinkel.

Herald added a subscriber: mzolotukhin. Dec 1 2016, 2:11 PM

evstupac added a subscriber: Farhana. Dec 1 2016, 2:12 PM

Hi,

I am not sure I get the problem you are trying to solve.
You are saying that we want to account for an extra cost if the base set outside of the loop is big, right?
Then, why is the average offset relevant to answering that question?

Thanks,
-Quentin

lib/Transforms/Scalar/LoopStrengthReduce.cpp
1025	Use Doxygen style: ///
test/Transforms/LoopStrengthReduce/X86/imm-cost.ll
27	Use opt -instnamer to get rid of implicit variables.

Hi,

Thanks for taking a look.

You are saying that we want to account for an extra cost if the base set outside of the loop is big, right?

No. I just want the current algorithm to behave correctly. LSR already takes ImmCost into account, but does so incorrectly for "profitable chains".
In CollectFixupsAndInitialFormulae():
(line 2969) // Skip IV users that are part of profitable IV Chains.
we do not insert Fixups for memory instructions in a profitable chain, and therefore RateFormula() does not count ImmCost for them:
(line 1156) for (const LSRFixup &Fixup : LU.Fixups) {
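The following is a simplified model of that interaction. The types are hypothetical: `IVUserModel` and its `InChainTail` flag stand in for IVStrideUse and the IVIncSet lookup. Users that are chained increments are skipped when fixups are collected, so the ImmCost loop never sees their offsets. Only the control flow is meant to match the two quoted lines.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of an IV user: a memory access at a constant offset from
// the IV. InChainTail marks users that a profitable chain will rewrite.
struct IVUserModel {
  int64_t Offset;
  bool InChainTail;
};

// Hypothetical stand-in for APInt(64, V, true).getMinSignedBits().
unsigned minSignedBits(int64_t V) {
  uint64_t U = V < 0 ? ~static_cast<uint64_t>(V) : static_cast<uint64_t>(V);
  unsigned Bits = 0;
  for (; U != 0; U >>= 1)
    ++Bits;
  return Bits + 1; // plus the sign bit
}

// Like CollectFixupsAndInitialFormulae(): chained users get no fixup.
std::vector<int64_t> collectFixupOffsets(const std::vector<IVUserModel> &Users) {
  std::vector<int64_t> Fixups;
  for (const IVUserModel &U : Users) {
    if (U.InChainTail)
      continue; // "Skip IV users that are part of profitable IV Chains."
    Fixups.push_back(U.Offset);
  }
  return Fixups;
}

// Like the fixup loop in RateFormula(): only collected fixups are charged.
unsigned rateImmCost(const std::vector<int64_t> &Fixups, int64_t BaseOffset) {
  unsigned Cost = 0;
  for (int64_t O : Fixups) {
    int64_t Offset = O + BaseOffset;
    if (Offset != 0)
      Cost += minSignedBits(Offset);
  }
  return Cost;
}
```

Take the two loads from the summary, `{{0, false}, {1, true}}`, with a base offset of 1024. One fixup survives and the charge is 12, whereas charging both offsets 0 and 1 would give 24.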

Then, why is the average offset relevant to answering that question?

As there is no exact Offset (see above) we can only estimate.

Thanks,
Evgeny
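The estimate in the patch (the AvgOffset block added to RateFormula) can be restated as a standalone function. The names are hypothetical, and `SameBaseAddress` is the count that CountSameBaseUses() accumulates. The BaseGV branch of the patch is left out.

The sketch differs from the patch in one deliberate way. The patch divides an `int64_t` by `LU.Fixups.size()`. On LP64 hosts the usual arithmetic conversions make that division unsigned, so a negative sum is not averaged correctly when there is more than one fixup. The sketch divides by a signed count instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for APInt(64, V, true).getMinSignedBits().
unsigned minSignedBits(int64_t V) {
  uint64_t U = V < 0 ? ~static_cast<uint64_t>(V) : static_cast<uint64_t>(V);
  unsigned Bits = 0;
  for (; U != 0; U >>= 1)
    ++Bits;
  return Bits + 1; // plus the sign bit
}

// Estimated extra ImmCost for chained accesses that have no fixups of their
// own: their offset is guessed as the average fixup offset plus the formula's
// BaseOffset, and charged once per chained access.
unsigned estimatedChainImmCost(const std::vector<int64_t> &FixupOffsets,
                               int64_t BaseOffset, unsigned SameBaseAddress) {
  // Like the patch, charge nothing without chained accesses or a base offset.
  if (SameBaseAddress == 0 || BaseOffset == 0)
    return 0;
  int64_t AvgOffset = 0;
  for (int64_t O : FixupOffsets)
    AvgOffset += O;
  if (!FixupOffsets.empty())
    AvgOffset /= static_cast<int64_t>(FixupOffsets.size()); // signed division
  return minSignedBits(AvgOffset + BaseOffset) * SameBaseAddress;
}
```

With a single fixup, the "average" is just that fixup's offset, which matches the remark that it will most likely be just Offset.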

Updated according to inline comments.

Hi,

Okay, I see what you are trying to solve now.

I'll have a closer look hopefully tomorrow.

Cheers,
-Quentin

Hi,

As there is no exact Offset (see above) we can only estimate.

That part bothered me in the first place. I don't see how we can derive good heuristics by injecting guesses into the input data. It sounds to me like we would rely even more on luck.

Looking a bit closer, I agree that the main problem is that we assume the fixups are free for the profitable chain. Maybe we should change the profitability check for the chains altogether, or have a different user for them, like one that only sums the ImmCost?
In particular, the code below that line could use some refinement:

// Incrementing by zero or some constant is neutral. We assume constants can
// be folded into an addressing mode or an add's immediate operand.

That being said, I believe we have another bug in the profitability computation. Namely, I believe this part

// An IV chain with a single increment is handled by LSR's postinc
// uses. However, a chain with multiple increments requires keeping the IV's
// value live longer than it needs to be if chained.
if (NumConstIncrements > 1)
  --cost;

Should be turned into:

// An IV chain with a single increment is handled by LSR's postinc
// uses. However, a chain with multiple increments requires keeping the IV's
// value live longer than it needs to be if chained.
if (NumConstIncrements > 1)
  ++cost;

I.e., having to keep something around should increase the cost of the chain, not decrease it.
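The following hypothetical reduced model of the scoring in isProfitableChain shows why the sign matters. It is not the real function: everything other than the constant-increment adjustment is collapsed into `OtherAdjustment`. The starting score of 1 and the `Cost < 0` profitability threshold are assumptions made for this illustration.

```cpp
#include <cassert>

// Hypothetical reduced model of isProfitableChain's scoring (not LLVM code).
// Assumed: the score starts at 1 and a chain is profitable when it ends below 0.
// OtherAdjustment folds in every other term (complete-chain bonus,
// variable increments, reused increments).
bool isProfitableChainModel(unsigned NumConstIncrements, int OtherAdjustment,
                            bool KeepingIVLiveAddsCost) {
  int Cost = 1 + OtherAdjustment;
  if (NumConstIncrements > 1) {
    if (KeepingIVLiveAddsCost)
      ++Cost; // the change suggested above
    else
      --Cost; // the current code
  }
  return Cost < 0;
}
```

Take two constant increments and `OtherAdjustment = -1`. The current rule ends at -1 and the chain is profitable. The suggested rule ends at 1 and it is not, so those accesses would get ordinary fixups and their ImmCost would be counted again.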

Fixing that seems to fix your test case, but I may have overlooked something admittedly.

Cheers,
-Quentin

Looking a bit closer, I agree that the main problem is that we assume the fixups are free for the profitable chain. Maybe we should change the profitability check for the chains altogether, or have a different user for them, like one that only sums the ImmCost?

That is exactly what I'm trying to implement in the patch. Basically, "avg Offset" will most likely be just Offset, because it is hard to imagine a profitable chain with more than one fixup.

In particular, the code below that line could use some refinement:
// Incrementing by zero or some constant is neutral. We assume constants can
// be folded into an addressing mode or an add's immediate operand.

...

Fixing that seems to fix your test case, but I may have overlooked something admittedly.

Yes, it helps with the test, but it does not solve the problem.
Maybe we should fix this as well, as part of another patch.

The following C example also suffers from the incorrect ImmCost calculation (-fno-tree-vectorize is required to reproduce it):

void foo(int n, char *x, char *y) {
  char *xx = x - 1024;
  char *yy = y - 2;
  for (int i = 0; i < 1024; i++) {
    ++*xx++;
    ++*yy++;
  }
}

Currently LLVM creates:

        movq    $-1024, %rax
.LBB0_1:
        incb    (%rsi,%rax)
        incb    1022(%rdx,%rax)
        incb    1(%rsi,%rax)
        incb    1023(%rdx,%rax)
        addq    $2, %rax
        testl   %eax, %eax
        jne     .LBB0_1

This assumes ImmCost(1022) < ImmCost(1024), but it should not: the real ImmCost of this solution is ImmCost(1022) + ImmCost(1023), which is greater than ImmCost(1024).

So the solution should look like:

        movq    $0, %rax
        addq    $-1024, %rsi
        addq    $-2, %rdx
.LBB0_1:
        incb    (%rsi,%rax)
        incb    (%rdx,%rax)
        incb    1(%rsi,%rax)
        incb    1(%rdx,%rax)
        addq    $2, %rax
        cmpq    $1024, %rax
        jne     .LBB0_1
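The comparison in the claim above can be worked out with the signed bit width that RateFormula charges per offset, `APInt(64, Offset, true).getMinSignedBits()`. Below it is re-implemented as a hypothetical standalone helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for APInt(64, V, true).getMinSignedBits().
unsigned minSignedBits(int64_t V) {
  uint64_t U = V < 0 ? ~static_cast<uint64_t>(V) : static_cast<uint64_t>(V);
  unsigned Bits = 0;
  for (; U != 0; U >>= 1)
    ++Bits;
  return Bits + 1; // plus the sign bit
}

// Only the head of the y chain (offset 1022) has a fixup, so this is all LSR sees.
const unsigned SeenByLSR = minSignedBits(1022);
// Charging both y accesses, as the emitted code actually encodes them.
const unsigned BothYAccesses = minSignedBits(1022) + minSignedBits(1023);
// The alternative with a single 1024 offset.
const unsigned Alternative = minSignedBits(1024);
```

SeenByLSR is 11 and Alternative is 12, so LSR prefers the 1022 form. BothYAccesses, however, is 22.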

PING.

Hi,

This assumes ImmCost(1022) < ImmCost(1024), but it should not: the real ImmCost of this solution is ImmCost(1022) + ImmCost(1023), which is greater than ImmCost(1024).

Hold on, the constants you are using for your cost are not what the LLVM IR has when LSR does the transformation.
Assuming I compiled the test case correctly, we traverse:
getelementptr inbounds i8, i8* %arg1, i64 -1024
getelementptr inbounds i8, i8* %arg2, i64 -2
The loop runs from 0 to 1024 with a +2 increment (it is partially unrolled).

The IV we get is {-1024,+,2}, which means:
First access of %arg1 is through IV + arg1
Second access of %arg1 is through PrevArg1Access + 1 <-- this is supposed to be free by the chain profitability model
First access of %arg2 is through IV + arg2 + 1022
Second access of %arg2 is through PrevArg2Access + 1 <-- ditto
Then the comparison is against 0
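The walk-through above can be checked with a small simulation that uses offsets relative to the incoming `%arg1` (x) and `%arg2` (y) pointers. The `Access` type and both functions are hypothetical. The LSR side follows the listed formulation: IV = {-1024,+,2}, chain increments of +1, and an exit compare against 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A byte access: which incoming pointer it is based on, and the offset from it.
struct Access {
  char Base; // 'x' for %arg1, 'y' for %arg2
  int64_t Offset;
};

bool operator==(const Access &A, const Access &B) {
  return A.Base == B.Base && A.Offset == B.Offset;
}

// The source loop: xx = x - 1024, yy = y - 2, 1024 iterations unrolled by 2.
std::vector<Access> sourceAccesses() {
  std::vector<Access> Out;
  for (int64_t I = 0; I < 1024; I += 2) {
    Out.push_back({'x', -1024 + I});
    Out.push_back({'y', -2 + I});
    Out.push_back({'x', -1024 + I + 1});
    Out.push_back({'y', -2 + I + 1});
  }
  return Out;
}

// LSR's formulation: x head = arg1 + IV, y head = arg2 + IV + 1022.
std::vector<Access> lsrAccesses(int64_t &ExitIV) {
  std::vector<Access> Out;
  int64_t IV = -1024;
  do {
    int64_t XHead = IV;
    int64_t YHead = IV + 1022;
    Out.push_back({'x', XHead});
    Out.push_back({'y', YHead});
    Out.push_back({'x', XHead + 1}); // chained: previous x access + 1
    Out.push_back({'y', YHead + 1}); // chained: previous y access + 1
    IV += 2;
  } while (IV != 0); // the exit compare is against 0
  ExitIV = IV;
  return Out;
}
```

Both sequences match. On the LSR side, the only constant immediate outside the chained +1 steps is 1022 on the y head.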

In the end, the only ImmCosts that LSR sees for that solution are 1022 and 0. I understand you'd like to add 1023 (second access of arg2) and 1 (second access of arg1), but, again, I don't think averaging the fixups over the number of same-base addresses is the right way to do it.
Instead, if you believe the cost is not free for the second accesses, I would suggest reworking the profitability check for the chain, i.e., isProfitableChain.
More specifically, this snippet:

// Incrementing by zero or some constant is neutral. We assume constants can
// be folded into an addressing mode or an add's immediate operand.
if (isa<SCEVConstant>(Inc.IncExpr)) {
  ++NumConstIncrements;
  continue;
}

Thanks,
-Quentin

This revision now requires changes to proceed. Jan 25 2017, 1:36 PM

Second access of %arg1 is through PrevArg1Access + 1 <-- this is supposed to be free by the chain profitability model

It is free in terms of RegNum or AddRecCost, which have the highest priority, but it is not always free in terms of ImmCost.

Making fewer chains profitable will lead to more Uses and can exceed the complexity limit, which is not OK for RegNum.
So I don't think we should restrict profitable chains.

I don't see how I can increase ImmCost here:

// Incrementing by zero or some constant is neutral. We assume constants can
// be folded into an addressing mode or an add's immediate operand.
if (isa<SCEVConstant>(Inc.IncExpr)) {
  ++NumConstIncrements;
  continue;
}

Without counting the same-base addresses, I don't see how to influence ImmCost alone.

but I don't think again the averaging the fixups with the number of same base address is the right way to do it.

We could add just the cost of the base offset.
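One possible reading of that alternative is sketched below: skip the averaging and charge only the formula's base offset once per chained access, with `SameBaseAddress` as counted by the patch. Both the function and this interpretation are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for APInt(64, V, true).getMinSignedBits().
unsigned minSignedBits(int64_t V) {
  uint64_t U = V < 0 ? ~static_cast<uint64_t>(V) : static_cast<uint64_t>(V);
  unsigned Bits = 0;
  for (; U != 0; U >>= 1)
    ++Bits;
  return Bits + 1; // plus the sign bit
}

// Hypothetical: charge the formula's BaseOffset once per chained access that
// has no fixup, ignoring the (unknown) per-access offsets entirely.
unsigned baseOffsetChainImmCost(int64_t BaseOffset, unsigned SameBaseAddress) {
  if (BaseOffset == 0)
    return 0;
  return minSignedBits(BaseOffset) * SameBaseAddress;
}
```

For the summary case (BaseOffset 1024, one chained load) this adds 12. That equals the averaged estimate when the only fixup offset is 0, without needing a guess at the offset.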

Anyway, there is not much gain from this (just some code size); I noticed this unexpected behavior during testing.
I'm OK with leaving the code as is.

evstupac abandoned this revision. May 2 2018, 12:03 PM

Revision Contents

Path	Size
lib/Transforms/Scalar/LoopStrengthReduce.cpp	51 lines
test/Transforms/LoopStrengthReduce/X86/imm-cost.ll	40 lines

Diff 81192

lib/Transforms/Scalar/LoopStrengthReduce.cpp

@@ ... @@ public:

   /// The list of operands which are to be replaced.
   SmallVector<LSRFixup, 8> Fixups;

   /// Keep track of the min and max offsets of the fixups.
   int64_t MinOffset;
   int64_t MaxOffset;

+  /// Number of same base addresses.
+  unsigned SameBaseAddress;
+
   /// This records whether all of the fixups using this LSRUse are outside of
   /// the loop, in which case some special-case heuristics may be used.
   bool AllFixupsOutsideLoop;

   /// RigidFormula is set to true to guarantee that this use will be associated
   /// with a single formula--the one that initially matched. Some SCEV
   /// expressions cannot be expanded. This allows LSR to consider the registers
   /// used by those expressions without the need to expand them later after
@@ ... @@ public:
   /// formulate a replacement for OperandValToReplace in UserInst.
   SmallVector<Formula, 12> Formulae;

   /// The set of register candidates used by all formulae in this LSRUse.
   SmallPtrSet<const SCEV *, 4> Regs;

   LSRUse(KindType K, MemAccessTy AT)
       : Kind(K), AccessTy(AT), MinOffset(INT64_MAX), MaxOffset(INT64_MIN),
-        AllFixupsOutsideLoop(true), RigidFormula(false),
+        SameBaseAddress(0), AllFixupsOutsideLoop(true), RigidFormula(false),
         WidestFixupType(nullptr) {}

   LSRFixup &getNewFixup() {
     Fixups.push_back(LSRFixup());
     return Fixups.back();
   }

   void pushFixup(LSRFixup &f) {
@@ ... @@ if (NumBaseParts > 1)
     NumBaseAdds +=
         NumBaseParts - (1 + (F.Scale && isAMCompletelyFolded(TTI, LU, F)));
   NumBaseAdds += (F.UnfoldedOffset != 0);

   // Accumulate non-free scaling amounts.
   ScaleCost += getScalingFactorCost(TTI, LU, F);

   // Tally up the non-zero immediates.
+  int64_t AvgOffset = 0;
   for (const LSRFixup &Fixup : LU.Fixups) {
     int64_t O = Fixup.Offset;
     int64_t Offset = (uint64_t)O + F.BaseOffset;
+    AvgOffset += O;
     if (F.BaseGV)
       ImmCost += 64; // Handle symbolic values conservatively.
                      // TODO: This should probably be the pointer size.
     else if (Offset != 0)
       ImmCost += APInt(64, Offset, true).getMinSignedBits();

     // Check with target if this offset with this instruction is
     // specifically not supported.
     if ((isa<LoadInst>(Fixup.UserInst) || isa<StoreInst>(Fixup.UserInst)) &&
         !TTI.isFoldableMemAccessOffset(Fixup.UserInst, Offset))
       NumBaseAdds++;
   }

+  if (LU.Kind == LSRUse::Address && LU.SameBaseAddress) {
+    if (LU.Fixups.size())
+      AvgOffset /= LU.Fixups.size();
+    int64_t Offset = AvgOffset + F.BaseOffset;
+    if (F.BaseGV)
+      ImmCost += 64 * LU.SameBaseAddress;
+    else if (F.BaseOffset != 0)
+      ImmCost +=
+          APInt(64, Offset, true).getMinSignedBits() * LU.SameBaseAddress;
+  }
   assert(isValid() && "invalid cost");
 }

 /// Set this cost to a losing value.
 void Cost::Lose() {
   NumRegs = ~0u;
   AddRecCost = ~0u;
   NumIVMuls = ~0u;
@@ ... @@ class LSRInstance {

   /// IV users can form a chain of IV increments.
   SmallVector<IVChain, MaxChains> IVChainVec;

   /// IV users that belong to profitable IVChains.
   SmallPtrSet<Use*, MaxChains> IVIncSet;

   void OptimizeShadowIV();
-  bool FindIVUserForCond(ICmpInst *Cond, IVStrideUse *&CondUse);
+  bool FindIVUserForInst(Instruction *Cond, IVStrideUse *&CondUse);
   ICmpInst *OptimizeMax(ICmpInst *Cond, IVStrideUse* &CondUse);
   void OptimizeLoopTermCond();

   void ChainInstruction(Instruction *UserInst, Instruction *IVOper,
                         SmallVectorImpl<ChainUsers> &ChainUsersVec);
   void FinalizeChain(IVChain &Chain);
   void CollectChains();
   void GenerateIVChain(const IVChain &Chain, SCEVExpander &Rewriter,
@@ ... @@ class LSRInstance {

   void InsertInitialFormula(const SCEV *S, LSRUse &LU, size_t LUIdx);
   void InsertSupplementalFormula(const SCEV *S, LSRUse &LU, size_t LUIdx);
   void CountRegisters(const Formula &F, size_t LUIdx);
   bool InsertFormula(LSRUse &LU, unsigned LUIdx, const Formula &F);

   void CollectLoopInvariantFixupsAndFormulae();

+  void CountSameBaseUses();
+
   void GenerateReassociations(LSRUse &LU, unsigned LUIdx, Formula Base,
                               unsigned Depth = 0);

   void GenerateReassociationsImpl(LSRUse &LU, unsigned LUIdx,
                                   const Formula &Base, unsigned Depth,
                                   size_t Idx, bool IsScaledReg = false);
   void GenerateCombinations(LSRUse &LU, unsigned LUIdx, Formula Base);
   void GenerateSymbolicOffsetsImpl(LSRUse &LU, unsigned LUIdx,
@@ ... @@ for (IVUsers::const_iterator UI = IU.begin(), E = IU.end();
     ShadowUse->eraseFromParent();
     Changed = true;
     break;
   }
 }

 /// If Cond has an operand that is an expression of an IV, set the IV user and
 /// stride information and return true, otherwise return false.
-bool LSRInstance::FindIVUserForCond(ICmpInst *Cond, IVStrideUse *&CondUse) {
+bool LSRInstance::FindIVUserForInst(Instruction *Cond, IVStrideUse *&CondUse) {
   for (IVStrideUse &U : IU)
     if (U.getUser() == Cond) {
       // NOTE: we could handle setcc instructions with multiple uses here, but
       // InstCombine does it as well for simple uses, it's not clear that it
       // occurs enough in real life to handle.
       CondUse = &U;
       return true;
     }
@@ ... @@ if (!TermBr)
       continue;
     // FIXME: Overly conservative, termination condition could be an 'or' etc..
     if (TermBr->isUnconditional() || !isa<ICmpInst>(TermBr->getCondition()))
       continue;

     // Search IVUsesByStride to find Cond's IVUse if there is one.
     IVStrideUse *CondUse = nullptr;
     ICmpInst *Cond = cast<ICmpInst>(TermBr->getCondition());
-    if (!FindIVUserForCond(Cond, CondUse))
+    if (!FindIVUserForInst(Cond, CondUse))
       continue;

     // If the trip count is computed in terms of a max (due to ScalarEvolution
     // being unable to find a sufficient guard, for example), change the loop
     // comparison to use SLT or ULT instead of NE.
     // One consequence of doing this now is that it disrupts the count-down
     // optimization. That's not always a bad thing though, because in such
     // cases it may still be worthwhile to avoid a max.
@@ ... @@ else if (const SCEVUDivExpr *D = dyn_cast<SCEVUDivExpr>(S)) {
           InsertSupplementalFormula(US, LU, LUIdx);
           CountRegisters(LU.Formulae.back(), Uses.size() - 1);
           break;
         }
       }
     }
   }
 }

+/// Set LU.SameBaseAddress to number of Incs for each profitable Chain.
+void
+LSRInstance::CountSameBaseUses() {
+  for (const IVChain &Chain : IVChainVec) {
+    // Get an LSRUse.
+    if (Chain.Incs.size() < 2)
+      continue;
+    IVStrideUse *IUse;
+    if (!FindIVUserForInst(Chain.Incs[0].UserInst, IUse))
+      continue;
+    const SCEV *S = IU.getExpr(*IUse);
+    ExtractImmediate(S, SE);
+    LSRUse::SCEVUseKindPair P = LSRUse::SCEVUseKindPair(S, LSRUse::Address);
+    if (UseMap.count(P)) {
+      // A use already existed with this base.
+      size_t LUIdx = UseMap.find(P)->second;
+      LSRUse &LU = Uses[LUIdx];
+      LU.SameBaseAddress += (Chain.Incs.size() - 1);
+    }
+  }
+}
+
 /// Split S into subexpressions which can be pulled out into separate
 /// registers. If C is non-null, multiply each subexpression by C.
 ///
 /// Return remainder expression after factoring the subexpressions captured by
 /// Ops. If Ops is complete, return NULL.
 static const SCEV *CollectSubexprs(const SCEV *S, const SCEVConstant *C,
                                    SmallVectorImpl<const SCEV *> &Ops,
                                    const Loop *L,
@@ ... @@ #endif // DEBUG
   }

   // Start collecting data and preparing for the solver.
   CollectChains();
   CollectInterestingTypesAndFactors();
   CollectFixupsAndInitialFormulae();
   CollectLoopInvariantFixupsAndFormulae();

+  // Calculate address uses with same base address.
+  CountSameBaseUses();
+
   assert(!Uses.empty() && "IVUsers reported at least one use");
   DEBUG(dbgs() << "LSR found " << Uses.size() << " uses:\n";
         print_uses(dbgs()));

   // Now use the reuse data to generate a bunch of interesting ways
   // to formulate the values needed for the uses.
   GenerateAllReuseFormulae();
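The bookkeeping that CountSameBaseUses() adds can be modeled without LLVM types. In this hypothetical sketch, a string stands in for the chain head's SCEV after ExtractImmediate, and a std::map stands in for UseMap. As in the patch, chains with fewer than two increments are ignored, and each matching use is credited with `Incs.size() - 1` chained accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified chain: the head's base expression and its increments.
struct ChainModel {
  std::string HeadBase; // stand-in for the head's SCEV after ExtractImmediate
  std::size_t NumIncs;  // stand-in for Chain.Incs.size()
};

// Hypothetical simplified LSRUse carrying only the new counter.
struct UseModel {
  unsigned SameBaseAddress = 0;
};

// Credit each Address use with the chained accesses that share its base.
void countSameBaseUses(const std::vector<ChainModel> &Chains,
                       std::map<std::string, UseModel> &UsesByBase) {
  for (const ChainModel &C : Chains) {
    if (C.NumIncs < 2)
      continue; // a single-element chain has no chained accesses
    auto It = UsesByBase.find(C.HeadBase);
    if (It == UsesByBase.end())
      continue; // no use with this base (UseMap.count(P) == 0)
    It->second.SameBaseAddress += static_cast<unsigned>(C.NumIncs - 1);
  }
}
```

The counter only ever grows. A use whose base appears in several chains accumulates all of their chained accesses, and RateFormula then multiplies the estimated per-access cost by that total.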

test/Transforms/LoopStrengthReduce/X86/imm-cost.ll

; RUN: opt < %s -loop-reduce -mtriple=x86_64 -S | FileCheck %s
; The test checks that the imm cost is counted for both loads (%tmp and %tmp1).
; If it is counted only for the first load (%tmp), LSR will prefer to keep
; the setup add.

target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"

; CHECK: for.body.preheader:
; CHECK-NEXT: br

; Function Attrs: norecurse nounwind readonly uwtable
define i32 @foo(i32 %n, i8* nocapture readonly %a) {
entry:
  %cmp11 = icmp sgt i32 %n, 0
  br i1 %cmp11, label %for.body.preheader, label %for.end

for.body.preheader: ; preds = %entry
  %add.ptr = getelementptr inbounds i8, i8* %a, i64 -1
  br label %for.body

for.body: ; preds = %for.body, %for.body.preheader
  %inptr.014 = phi i8* [ %incdec.ptr1, %for.body ], [ %add.ptr, %for.body.preheader ]
  %s.013 = phi i32 [ %add3, %for.body ], [ 0, %for.body.preheader ]
  %i.012 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %incdec.ptr = getelementptr inbounds i8, i8* %inptr.014, i64 1
  %tmp = load i8, i8* %inptr.014, align 1
  %conv = sext i8 %tmp to i32
  %add = add nsw i32 %conv, %s.013
  %incdec.ptr1 = getelementptr inbounds i8, i8* %inptr.014, i64 2
  %tmp1 = load i8, i8* %incdec.ptr, align 1
  %conv2 = sext i8 %tmp1 to i32
  %add3 = add nsw i32 %add, %conv2
  %inc = add nuw nsw i32 %i.012, 1
  %exitcond = icmp eq i32 %inc, %n
  br i1 %exitcond, label %for.end, label %for.body

for.end: ; preds = %for.body, %entry
  %s.0.lcssa = phi i32 [ 0, %entry ], [ %add3, %for.body ]
  ret i32 %s.0.lcssa
}