This is an archive of the discontinued LLVM Phabricator instance.

Differential D108112

[LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards
Closed · Public

Authored by eopXD on Aug 16 2021, 2:52 AM.


Details

Reviewers

Whitney
lebedev.ri
efriedma
mkazantsev
fhahn
qianzhen
bmahjour

Commits

rGbc17d32a5f71: [LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards

Summary

Expressions guarded at loop entry can be folded before comparison. This patch
follows up on D107353 and makes LIR able to handle nested for-loops.
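To make the mechanism concrete, here is a toy model in C++. It is a hypothetical sketch under stated assumptions, not LLVM's SCEV API: an expression of the form 4 * ext(%m) * ext(%o), as in the NestedFor debug log, is represented only by how each i32 operand is widened; two expressions are compared structurally, roughly as uniqued SCEV pointers are compared with `!=`; and a sext is turned into a zext only when a flag says the loop-entry guard proves the operand non-negative. The names Ext, Term, sameShape, rewriteExt, rewrite, sext and zext are invented for this illustration.

```cpp
#include <cstdint>

// Hypothetical toy model, not LLVM's SCEV classes: an expression of the form
// 4 * ext(%m) * ext(%o) is represented only by how each i32 operand is widened.
enum class Ext { Sext, Zext };

struct Term {
  Ext m;  // widening applied to %m
  Ext o;  // widening applied to %o
};

// Structural comparison, analogous to comparing two uniqued SCEV pointers:
// only the shapes are compared, numeric equality is never considered.
constexpr bool sameShape(Term a, Term b) { return a.m == b.m && a.o == b.o; }

// Toy rewrite: sext becomes zext only if a loop-entry guard proves the
// operand is non-negative (passed in here as a flag).
constexpr Ext rewriteExt(Ext e, bool guardedNonNegative) {
  return (e == Ext::Sext && guardedNonNegative) ? Ext::Zext : e;
}

constexpr Term rewrite(Term t, bool mNonNegative, bool oNonNegative) {
  return Term{rewriteExt(t.m, mNonNegative), rewriteExt(t.o, oNonNegative)};
}

// The two expressions from the NestedFor debug log (outer loop).
constexpr Term kPositiveStride{Ext::Sext, Ext::Sext};  // 4 * sext(m) * sext(o)
constexpr Term kMemsetSize{Ext::Zext, Ext::Sext};      // 4 * zext(m) * sext(o)

// Numeric i32 -> i64 widening, showing why the guard is required.
constexpr std::int64_t sext(std::int32_t x) { return static_cast<std::int64_t>(x); }
constexpr std::int64_t zext(std::int32_t x) {
  return static_cast<std::int64_t>(static_cast<std::uint32_t>(x));
}
```

In the NestedFor test, %m is guarded positive by %cmp21 and %o is not guarded, which corresponds to rewrite(..., true, false): the two shapes differ before the rewrite and match after it. The numeric helpers show why the guard matters: sext(7) and zext(7) agree, but sext(-1) is -1 while zext(-1) is 4294967295.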

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

eopXD created this revision. Aug 16 2021, 2:52 AM

Herald added a subscriber: hiraditya. Aug 16 2021, 2:52 AM

eopXD requested review of this revision. Aug 16 2021, 2:52 AM

Herald added a project: Restricted Project. Aug 16 2021, 2:52 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B119671: Diff 366576. Aug 16 2021, 3:13 AM

I think this needs to be somewhere in SCEV itself.

In D108112#2946374, @lebedev.ri wrote:

I think this needs to be somewhere in SCEV itself.

Do you mean the structure SCEVFolder?

Fix clang-format warning.

In D108112#2946376, @eopXD wrote:

In D108112#2946374, @lebedev.ri wrote:

I think this needs to be somewhere in SCEV itself.

Do you mean the structure SCEVFolder?

No, I mean that ScalarEvolution should perform this canonicalization internally, in ScalarEvolution::getSignExtendExpr().
It already does it, in fact: https://github.com/llvm/llvm-project/blob/0dc6b597db4d8b25f3df96a8f6574be732e18fdd/llvm/lib/Analysis/ScalarEvolution.cpp#L2100-L2103
Though perhaps that is not feasible because we don't know the loop we're after when in ScalarEvolution::getSignExtendExpr().

No, I mean that ScalarEvolution should perform this canonicalization internally, in ScalarEvolution::getSignExtendExpr().
It already does it, in fact: https://github.com/llvm/llvm-project/blob/0dc6b597db4d8b25f3df96a8f6574be732e18fdd/llvm/lib/Analysis/ScalarEvolution.cpp#L2100-L2103

I see. Thank you for the elaboration.

Though perhaps that is not feasible because we don't know the loop we're after when in ScalarEvolution::getSignExtendExpr().

I agree.

Harbormaster completed remote builds in B119673: Diff 366579. Aug 16 2021, 4:00 AM

Whitney added a reviewer: qianzhen. Aug 16 2021, 7:06 AM

Whitney added a subscriber: bmahjour.

Fix some minor typos.

bmahjour added inline comments. Aug 16 2021, 10:02 AM

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
310	that is -> that are
319	greater of -> greater or
321–325	if (SE.isLoopEntryGuardedByCond(CurLoop, ICmpInst::ICMP_SGE, Expr,
	                                SE.getZero(Expr->getType())))
	  return SE.getZeroExtendExpr(visit(Expr->getOperand()), Expr->getType());
	return Expr;
	}
988–989	fold expressions that is -> fold an expression that is
988–989	proceed optimization if equal. -> proceed with optimization, if equal.
993	with respect to -> based on
llvm/test/Transforms/LoopIdiom/memset-runtime.ll
116	Given that the stride is `o x 4`, it's not clear how the loop guard can help make assumptions about sign/zero extension. Can you explain this example in the description section?

Harbormaster completed remote builds in B119733: Diff 366661. Aug 16 2021, 10:09 AM

Address comments.
Thank you for pointing out the grammatical errors.

Harbormaster completed remote builds in B119818: Diff 366783. Aug 16 2021, 6:53 PM

Update the test case CHECK lines to reflect the changed debug log.

Harbormaster completed remote builds in B120257: Diff 367393. Aug 18 2021, 8:49 PM

LGTM

This revision is now accepted and ready to land. Aug 19 2021, 8:41 AM

Could you clean up the basic block names & co in the test case?

Update the test case with better block names and remove redundant blocks.

Harbormaster completed remote builds in B120494: Diff 367706. Aug 19 2021, 9:38 PM

Thank you @lebedev.ri and @bmahjour for reviewing the patch. I am now going to commit it.

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode. But what if we are compiling in 32-bit mode, or if 'n' and 'm' were 'signed long long'?

In D108112#2957417, @bmahjour wrote:

But what if we are compiling in 32-bit mode,

This is a valid point. Thank you for pointing this out.
Created D108507 to add a test case that covers 32-bit mode.

or if 'n' and 'm' were 'signed long long'?

The accepted scenario for LIR is MemsetSizeSCEV == PositivePointerStrideSCEV.
Any n and m accepted would always be non-negative. I can't think of a scenario where n and m need to remain signed long long. (Please correct me if I'm wrong.)
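Why exact equality is the condition can be seen with plain integers. The following is a hypothetical C++ sketch, not LIR's implementation: it assumes a positive stride (as after negating a negative one), treats each iteration's memset as a byte range starting at k * stride, and checks whether n such ranges tile one contiguous block with no gap or overlap. For n >= 2 that happens only when size == stride. The helper name tilesContiguously is invented for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): do `n` memsets of `size` bytes,
// the k-th one starting at byte offset k * stride, cover [0, n * size)
// with no gap and no overlap? Assumes stride > 0 and n >= 1.
constexpr bool tilesContiguously(std::int64_t stride, std::int64_t size, int n) {
  std::int64_t coveredEnd = 0;  // first byte not yet written
  for (int k = 0; k < n; ++k) {
    const std::int64_t begin = k * stride;
    if (begin != coveredEnd)
      return false;  // begin > coveredEnd: gap; begin < coveredEnd: overlap
    coveredEnd = begin + size;
  }
  return true;
}
```

For example, with m = 3 the MemsetSize_Stride_Mismatch test has a stride of 4 + 4 * 3 = 16 bytes and a memset size of 4 * 3 = 12 bytes, leaving a 4-byte gap after each memset, which matches the "SCEV don't match, abort" CHECK line for that function.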

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode.

I agree with you that SCEVFolder has trivial functionality. It only turns sext into zext. This patch only fixes the added test case, which is very limited.
On the other hand, I don't have a deep understanding of the benchmarks, so I am not sure whether I have covered most of the "practical memset idioms" that can be optimized here. Do you have other "fold loop guard into SCEV expression" cases that you would like this patch to handle? I would be happy to address them.

eopXD mentioned this in D108507: [NFC][LoopIdiom] Add more test case to runtime-determined memset size. Aug 26 2021, 1:15 AM

gentle ping @bmahjour

In D108112#2958737, @eopXD wrote:

In D108112#2957417, @bmahjour wrote:

or if 'n' and 'm' were 'signed long long'?

The accepted scenario for LIR is MemsetSizeSCEV == PositivePointerStrideSCEV.
Any n and m accepted would always be non-negative. I can't think of a scenario where n and m need to remain signed long long. (Please correct me if I'm wrong.)

Since we don't normalize loops, we may have reverse loops where the upper bound is a negative signed value. Another example is when IVs start with negative values:

void foo(int n, int m, float Arr[n][m]) {
  for (int i = -100; i < n; i++)
    for (int j = -200; j < m; j++)
      Arr[i+100][j+200] = 1.0;
}

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode.

I agree with you that SCEVFolder has trivial functionality. It only turns sext into zext. This patch only fixes the added test case, which is very limited.

The term "fold" doesn't seem to be the right terminology. I think "rewrite" is more appropriate. For example, SCEVFolder may be called SCEVSignToZeroExtentionRewriter (or something similar).

On the other hand, I don't have a deep understanding of the benchmarks, so I am not sure whether I have covered most of the "practical memset idioms" that can be optimized here. Do you have other "fold loop guard into SCEV expression" cases that you would like this patch to handle? I would be happy to address them.

I don't have any specific benchmarks in mind.

eopXD retitled this revision from [LoopIdiom] let LIR fold memset pointer / stride SCEV regarding loop guards to [LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards. Oct 11 2021, 5:51 PM

Whitney added a project: Restricted Project. Dec 1 2021, 8:26 AM

Rebase to latest main.

Update testcase due to rebase.

Rename SCEVFolder to SCEVSignToZeroExtentionRewriter.
Add example with negative start.

Hi @bmahjour,

Sorry for the delay.

I think the example you raised doesn't cause a problem for the current implementation.
The comparison is between the "pointer stride" and the "memset size", both of which need to be non-negative.
The start and end of the induction variable don't matter, because if the original stride is negative, it is negated before the comparison.

I have addressed your comment on renaming the SCEV folding structure and added test cases for loops with a negative start, like your example.
(Your example is a strided-store loop; I modified it to use memset in the tests.)
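The point that the induction variable's start value does not affect the comparison can be checked with a small arithmetic model. This is a hypothetical C++ sketch, not LIR code: it assumes iteration j memsets o 4-byte ints starting at element (j + offset) * o, computes the byte distance between consecutive iterations for a forward (step = 1) or reverse (step = -1) loop, and negates a negative stride before comparing it with the memset size, in the spirit of the IsNegStride logic in the LoopIdiomRecognize.cpp diff. All function names are invented for this illustration.

```cpp
#include <cstdint>

// Hypothetical model (not LLVM code): byte offset written in iteration j when
// each iteration memsets `o` ints starting at element (j + offset) * o.
constexpr std::int64_t byteOffset(std::int64_t j, std::int64_t offset,
                                  std::int64_t o) {
  return (j + offset) * o * static_cast<std::int64_t>(sizeof(std::int32_t));
}

// Pointer stride: distance between the addresses of two consecutive
// iterations. `step` is +1 for a forward loop and -1 for a reverse loop.
constexpr std::int64_t strideBytes(std::int64_t j, std::int64_t offset,
                                   std::int64_t o, std::int64_t step) {
  return byteOffset(j + step, offset, o) - byteOffset(j, offset, o);
}

// Negate a negative stride before the comparison.
constexpr std::int64_t positiveStride(std::int64_t s) { return s < 0 ? -s : s; }

// Memset size in bytes for `o` ints.
constexpr std::int64_t memsetSize(std::int64_t o) {
  return o * static_cast<std::int64_t>(sizeof(std::int32_t));
}
```

Shifting the start (j from -200 with offset 200, as in bmahjour's example) leaves the stride at 12 bytes for o = 3, and a reverse loop yields -12, which becomes 12 after negation; both then match a memset size of 12 bytes.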

Harbormaster completed remote builds in B138456: Diff 393170. Dec 9 2021, 9:15 AM

bmahjour added inline comments. Dec 10 2021, 11:02 AM

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
110–140	Why is this test removed?

eopXD added inline comments. Dec 10 2021, 9:19 PM

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
110–140	The test file is split into 2 files, `memset-runtime-32bit.ll` and `memset-runtime-64bit.ll` to test on 32-bit mode and 64-bit mode in D108507. So `memset-runtime.ll` is gone.

LGTM

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
110–140	If you could put the comment describing what these tests are meant for (i.e., lines 120-125) at the beginning of memset-runtime-32bit.ll and/or memset-runtime-64bit.ll, it would be helpful.

bmahjour accepted this revision. Dec 13 2021, 7:10 AM

Rebase.

Harbormaster completed remote builds in B138971: Diff 393893. Dec 13 2021, 8:11 AM

The last rebase used the wrong version; rebase again.

Address @bmahjour's comment: add a comment to the test case that this patch improves.

Harbormaster completed remote builds in B138982: Diff 393911. Dec 13 2021, 9:21 AM

This revision was landed with ongoing or failed builds. Dec 13 2021, 9:37 AM

Closed by commit rGbc17d32a5f71: [LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards (authored by eopXD).

This revision was automatically updated to reflect the committed changes.

eopXD added a commit: rGbc17d32a5f71: [LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards.

Revision Contents

Path                                                      Size
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp         40 lines
llvm/test/Transforms/LoopIdiom/memset-runtime-debug.ll    3 lines
llvm/test/Transforms/LoopIdiom/memset-runtime.ll          125 lines

Diff 367393

llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp

[301 unchanged lines not shown]
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.addRequired<TargetLibraryInfoWrapperPass>();
     AU.addRequired<TargetTransformInfoWrapperPass>();
     AU.addPreserved<MemorySSAWrapperPass>();
     getLoopAnalysisUsage(AU);
   }
 };

+// The Folder will fold expressions that are guarded by the loop entry.
+class SCEVFolder : public SCEVRewriteVisitor<SCEVFolder> {
+public:
+  ScalarEvolution &SE;
+  const Loop *CurLoop;
+  SCEVFolder(ScalarEvolution &SE, const Loop *CurLoop)
+      : SCEVRewriteVisitor(SE), SE(SE), CurLoop(CurLoop) {}
+
+  const SCEV *visitSignExtendExpr(const SCEVSignExtendExpr *Expr) {
+    // If expression is guarded by CurLoop to be greater or equal to zero
+    // then convert sext to zext. Otherwise return the original expression.
+    if (SE.isLoopEntryGuardedByCond(CurLoop, ICmpInst::ICMP_SGE, Expr,
+                                    SE.getZero(Expr->getType())))
+      return SE.getZeroExtendExpr(visit(Expr->getOperand()), Expr->getType());
+    return Expr;
+  }
+};
+
 } // end anonymous namespace

 char LoopIdiomRecognizeLegacyPass::ID = 0;

 PreservedAnalyses LoopIdiomRecognizePass::run(Loop &L, LoopAnalysisManager &AM,
                                               LoopStandardAnalysisResults &AR,
                                               LPMUpdater &) {
   if (DisableLIRP::All)
[644 unchanged lines not shown]
     IsNegStride = PointerStrideSCEV->isNonConstantNegative();
     const SCEV *PositiveStrideSCEV =
         IsNegStride ? SE->getNegativeSCEV(PointerStrideSCEV)
                     : PointerStrideSCEV;
     LLVM_DEBUG(dbgs() << " MemsetSizeSCEV: " << *MemsetSizeSCEV << "\n"
                       << " PositiveStrideSCEV: " << *PositiveStrideSCEV
                       << "\n");

     if (PositiveStrideSCEV != MemsetSizeSCEV) {
-      // TODO: folding can be done to the SCEVs
-      // The folding is to fold expressions that is covered by the loop guard
-      // at loop entry. After the folding, compare again and proceed
-      // optimization if equal.
-      LLVM_DEBUG(dbgs() << " SCEV don't match, abort\n");
-      return false;
+      // The folding is to fold an expression that is covered by the loop guard
+      // at loop entry. After the folding, compare again and proceed with
+      // optimization, if equal.
+      SCEVFolder Folder(*SE, CurLoop);
+      const SCEV *FoldedPositiveStride = Folder.visit(PositiveStrideSCEV);
+      const SCEV *FoldedMemsetSize = Folder.visit(MemsetSizeSCEV);
+
+      LLVM_DEBUG(dbgs() << " Try to fold SCEV based on loop guard\n"
+                        << " FoldedMemsetSize: " << *FoldedMemsetSize << "\n"
+                        << " FoldedPositiveStride: " << *FoldedPositiveStride
+                        << "\n");
+
+      if (FoldedPositiveStride != FoldedMemsetSize) {
+        LLVM_DEBUG(dbgs() << " SCEV don't match, abort\n");
+        return false;
+      }
     }
   }

   // Verify that the memset value is loop invariant. If not, we can't promote
   // the memset.
   Value *SplatValue = MSI->getValue();
   if (!SplatValue || !CurLoop->isLoopInvariant(SplatValue))
     return false;

   SmallPtrSet<Instruction *, 1> MSIs;
[1,865 unchanged lines not shown]

llvm/test/Transforms/LoopIdiom/memset-runtime-debug.ll

[13 unchanged lines not shown]
 ; Check on debug outputs...
 ; CHECK: loop-idiom Scanning: F[MemsetSize_LoopVariant] Countable Loop %for.body
 ; CHECK-NEXT: memset size is non-constant
 ; CHECK-NEXT: memset size is not a loop-invariant, abort
 ; CHECK: loop-idiom Scanning: F[MemsetSize_Stride_Mismatch] Countable Loop %for.body
 ; CHECK-NEXT: memset size is non-constant
 ; CHECK-NEXT: MemsetSizeSCEV: (4 * (sext i32 %m to i64))<nsw>
 ; CHECK-NEXT: PositiveStrideSCEV: (4 + (4 * (sext i32 %m to i64))<nsw>)<nsw>
+; CHECK-NEXT: Try to fold SCEV based on loop guard
+; CHECK-NEXT: FoldedMemsetSize: (4 * (sext i32 %m to i64))<nsw>
+; CHECK-NEXT: FoldedPositiveStride: (4 + (4 * (sext i32 %m to i64))<nsw>)<nsw>
 ; CHECK-NEXT: SCEV don't match, abort
 ; CHECK: loop-idiom Scanning: F[NonZeroAddressSpace] Countable Loop %for.cond1.preheader
 ; CHECK-NEXT: memset size is non-constant
 ; CHECK-NEXT: pointer is not in address space zero, abort
 ; CHECK: loop-idiom Scanning: F[NonAffinePointer] Countable Loop %for.body
 ; CHECK-NEXT: Pointer is not affine, abort

 define void @MemsetSize_LoopVariant(i32* %ar, i32 %n, i32 %m) {
[241 unchanged lines not shown]

llvm/test/Transforms/LoopIdiom/memset-runtime.ll

[101 unchanged lines not shown]

 for.cond.for.end_crit_edge: ; preds = %for.inc
   br label %for.end

 for.end: ; preds = %for.cond.for.end_crit_edge, %entry
   ret void
 }

+; The C code to generate this testcase:
+; void test(int n, int m, int o, int *ar)
+; {
+;   for (int i=0; i<n; ++i) {
+;     for (int j=0; j<m; ++j) {
+;       int *arr = ar + i * m * o + j * o;
+;       memset(arr, 0, o * sizeof(int));
+;     }
+;   }
+; }
+; This case requires SCEVFolder in LoopIdiomRecognize.cpp to fold SCEV prior to comparison.
+; For the inner-loop, SCEVFolder is not needed, however the promoted memset size would be based
+; on the trip count of inner-loop (which is an unsigned integer).
+; Then in the outer loop, the pointer stride SCEV for memset needs to be converted based on the
+; loop guard for it to equal the memset size SCEV. The loop guard guarantees that m >= 0
+; inside the loop, so m can be converted from sext to zext, making the two SCEV-s equal.
+; Below is the debug log of LoopIdiomRecognize.
+; loop-idiom Scanning: F[NestedFor] Countable Loop %for.body3.us
+; memset size is non-constant
+; MemsetSizeSCEV: (4 * (sext i32 %o to i64))<nsw>
+; PositiveStrideSCEV: (4 * (sext i32 %o to i64))<nsw>
+; Formed memset: call void @llvm.memset.p0i8.i64(i8* align 4 %scevgep1, i8 0, i64 %6, i1 false)
+; loop-idiom Scanning: F[NestedFor] Countable Loop %for.body.us
+; memset size is non-constant
+; MemsetSizeSCEV: (4 * (zext i32 %m to i64) * (sext i32 %o to i64))
+; PositiveStrideSCEV: (4 * (sext i32 %m to i64) * (sext i32 %o to i64))
+; Try to fold SCEV based on loop guard
+; FoldedMemsetSize: (4 * (zext i32 %m to i64) * (sext i32 %o to i64))
+; FoldedPositiveStride: (4 * (zext i32 %m to i64) * (sext i32 %o to i64))
+; Formed memset: call void @llvm.memset.p0i8.i64(i8* align 4 %ar2, i8 0, i64 %8, i1 false)
+define void @NestedFor(i32 %n, i32 %m, i32 %o, i32* %ar) {
+; CHECK-LABEL: @NestedFor(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[AR2:%.*]] = bitcast i32* [[AR:%.*]] to i8*
+; CHECK-NEXT: [[CMP3:%.*]] = icmp slt i32 0, [[N:%.*]]
+; CHECK-NEXT: br i1 [[CMP3]], label [[FOR_BODY_LR_PH:%.*]], label [[FOR_END11:%.*]]
+; CHECK: for.body.lr.ph:
+; CHECK-NEXT: [[CMP21:%.*]] = icmp slt i32 0, [[M:%.*]]
+; CHECK-NEXT: [[CONV:%.*]] = sext i32 [[O:%.*]] to i64
+; CHECK-NEXT: [[MUL8:%.*]] = mul i64 [[CONV]], 4
+; CHECK-NEXT: br i1 [[CMP21]], label [[FOR_BODY_LR_PH_SPLIT_US:%.*]], label [[FOR_END11]]
+; CHECK: for.body.lr.ph.split.us:
+; CHECK-NEXT: [[TMP0:%.*]] = sext i32 [[O]] to i64
+; CHECK-NEXT: [[TMP1:%.*]] = sext i32 [[M]] to i64
+; CHECK-NEXT: [[TMP2:%.*]] = sext i32 [[O]] to i64
+; CHECK-NEXT: [[WIDE_TRIP_COUNT10:%.*]] = zext i32 [[N]] to i64
+; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP0]], [[TMP1]]
+; CHECK-NEXT: [[TMP4:%.*]] = zext i32 [[M]] to i64
+; CHECK-NEXT: [[TMP5:%.*]] = mul i64 [[TMP0]], [[TMP4]]
+; CHECK-NEXT: [[TMP6:%.*]] = shl i64 [[TMP5]], 2
+; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[TMP5]], [[WIDE_TRIP_COUNT10]]
+; CHECK-NEXT: [[TMP8:%.*]] = shl i64 [[TMP7]], 2
+; CHECK-NEXT: call void @llvm.memset.p0i8.i64(i8* align 4 [[AR2]], i8 0, i64 [[TMP8]], i1 false)
+; CHECK-NEXT: br label [[FOR_END11]]
+; CHECK: for.end11:
+; CHECK-NEXT: ret void
+;
+entry:
+  %cmp3 = icmp slt i32 0, %n
+  br i1 %cmp3, label %for.body.lr.ph, label %for.end11
+
+for.body.lr.ph: ; preds = %entry
+  %cmp21 = icmp slt i32 0, %m
+  %conv = sext i32 %o to i64
+  %mul8 = mul i64 %conv, 4
+  br i1 %cmp21, label %for.body.lr.ph.split.us, label %for.body.lr.ph.split
+
+for.body.lr.ph.split.us: ; preds = %for.body.lr.ph
+  %0 = sext i32 %o to i64
+  %1 = sext i32 %m to i64
+  %2 = sext i32 %o to i64
+  %wide.trip.count10 = zext i32 %n to i64
+  br label %for.body.us
+
+for.body.us: ; preds = %for.inc9.us, %for.body.lr.ph.split.us
+  %indvars.iv6 = phi i64 [ %indvars.iv.next7, %for.inc9.us ], [ 0, %for.body.lr.ph.split.us ]
+  br label %for.body3.lr.ph.us
+
+for.end.us: ; preds = %for.cond1.for.end_crit_edge.us
+  br label %for.inc9.us
+
+for.inc9.us: ; preds = %for.end.us
+  %indvars.iv.next7 = add nuw nsw i64 %indvars.iv6, 1
+  %exitcond11 = icmp ne i64 %indvars.iv.next7, %wide.trip.count10
+  br i1 %exitcond11, label %for.body.us, label %for.cond.for.end11_crit_edge.split.us
+
+for.body3.us: ; preds = %for.body3.lr.ph.us, %for.inc.us
+  %indvars.iv = phi i64 [ 0, %for.body3.lr.ph.us ], [ %indvars.iv.next, %for.inc.us ]
+  %3 = mul nsw i64 %indvars.iv, %0
+  %add.ptr7.us = getelementptr inbounds i32, i32* %add.ptr.us, i64 %3
+  %4 = bitcast i32* %add.ptr7.us to i8*
+  call void @llvm.memset.p0i8.i64(i8* align 4 %4, i8 0, i64 %mul8, i1 false)
+  br label %for.inc.us
+
+for.inc.us: ; preds = %for.body3.us
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp ne i64 %indvars.iv.next, %wide.trip.count
+  br i1 %exitcond, label %for.body3.us, label %for.cond1.for.end_crit_edge.us
+
+for.body3.lr.ph.us: ; preds = %for.body.us
+  %5 = mul nsw i64 %indvars.iv6, %1
+  %6 = mul nsw i64 %5, %2
+  %add.ptr.us = getelementptr inbounds i32, i32* %ar, i64 %6
+  %wide.trip.count = zext i32 %m to i64
+  br label %for.body3.us
+
+for.cond1.for.end_crit_edge.us: ; preds = %for.inc.us
+  br label %for.end.us
+
+for.cond.for.end11_crit_edge.split.us: ; preds = %for.inc9.us
+  br label %for.cond.for.end11_crit_edge
+
+for.body.lr.ph.split: ; preds = %for.body.lr.ph
+  br label %for.cond.for.end11_crit_edge.split
+
+for.cond.for.end11_crit_edge.split: ; preds = %for.body.lr.ph.split
+  br label %for.cond.for.end11_crit_edge
+
+for.cond.for.end11_crit_edge: ; preds = %for.cond.for.end11_crit_edge.split.us, %for.cond.for.end11_crit_edge.split
+  br label %for.end11
+
+for.end11: ; preds = %for.cond.for.end11_crit_edge, %entry
+  ret void
+}

 declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i1 immarg)