This is an archive of the discontinued LLVM Phabricator instance.

[WIP][LoopIdiom] Teach LNIR to versioning loops and add runtime check
Needs ReviewPublic

Authored by eopXD on Jun 21 2021, 5:38 AM.

Details

Summary

Teach LNIR to add runtime checks upon the loop nest for runtime-determined store sizes.
Versioning only the top-level loop is a trade-off between enabling this optimization
and the exponential growth of versioned loops.

LNIR will perform identically to the current LIR if the runtime check detects a
negative or too-large value.

Diff Detail

Event Timeline

eopXD created this revision.Jun 21 2021, 5:38 AM
eopXD requested review of this revision.Jun 21 2021, 5:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2021, 5:38 AM

Why is this beneficial?

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
1 ↗(On Diff #353346)

Please use update script to autogenerate check lines

eopXD added a subscriber: qianzhen.Jun 21 2021, 5:50 AM

Why is this beneficial?

I shall quote from D104179 as @qianzhen stated this optimization clearly:

When the loop idiom transformation processes a memset instruction in a loop, currently it only handles the memset with a compile-time constant size. The motivation of this work is to relax this limitation, so that a memset with a variable size in a loop may still be processed and promoted to a larger memset if it passes all the eligibility checks. Performance-wise, promoting the memset in a loop to a larger memset reduces the number of calls to memset; hence reducing the overall call overhead.
A similar technique may also apply to the memcpy with a variable size in a loop.
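In C terms, the transformation the quoted summary describes looks roughly like the following sketch (function names are hypothetical; the real pass works on LLVM IR, not C source):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Before: one memset call per row -- n calls in total, with a
 * row size that is only known at run time. */
static void zero_rows_loop(char *buf, size_t n, size_t row_bytes) {
    for (size_t i = 0; i < n; ++i)
        memset(buf + i * row_bytes, 0, row_bytes);
}

/* After: the rows are contiguous, so the whole loop can be
 * promoted to a single larger memset, reducing call overhead. */
static void zero_rows_flat(char *buf, size_t n, size_t row_bytes) {
    memset(buf, 0, n * row_bytes);
}
```

Both functions clear the same bytes; the promoted form simply replaces n calls with one.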

eopXD added inline comments.Jun 21 2021, 5:55 AM
llvm/test/Transforms/LoopIdiom/memset-runtime.ll
1 ↗(On Diff #353346)

I have never auto-generated check lines before. May I ask if there is some resource for me to explore this?

Why is this beneficial?

I shall quote from D104179 as @qianzhen stated this optimization clearly:

When the loop idiom transformation processes a memset instruction in a loop, currently it only handles the memset with a compile-time constant size. The motivation of this work is to relax this limitation, so that a memset with a variable size in a loop may still be processed and promoted to a larger memset if it passes all the eligibility checks. Performance-wise, promoting the memset in a loop to a larger memset reduces the number of calls to memset; hence reducing the overall call overhead.
A similar technique may also apply to the memcpy with a variable size in a loop.

This only reiterates what this patch does, which was pretty obvious, and does not answer my question.
More concretely, can the scalar loop be expected to clear more than i64 -1 bytes of memory?
If not, why can't the byte count to be cleared be calculated by promoting to i64, and multiplying with saturation?

I am quite skeptical how useful this can be in practice. Any motivating examples from real-world code/benchmarks?
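For reference, one way to read the "promote to i64 and multiply with saturation" suggestion above is the following hypothetical C sketch (the names are made up; this is not what the patch implements):

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two unsigned 64-bit byte counts, saturating at
 * UINT64_MAX instead of wrapping on overflow. */
static uint64_t mul_sat_u64(uint64_t a, uint64_t b) {
    if (a != 0 && b > UINT64_MAX / a)
        return UINT64_MAX;
    return a * b;
}

/* Hypothetical byte-count computation: clamp negative 32-bit
 * inputs to zero, promote to 64 bits, then multiply saturating. */
static uint64_t bytes_to_clear(int32_t trip_count, int32_t store_size) {
    uint64_t t = trip_count > 0 ? (uint64_t)trip_count : 0;
    uint64_t s = store_size > 0 ? (uint64_t)store_size : 0;
    return mul_sat_u64(t, s);
}
```

With such a computation a negative or overflowing size degenerates to 0 or UINT64_MAX rather than producing a wrong memset length.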

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
7 ↗(On Diff #353346)

The LoopFlatten pass could convert the two loops into one, no? Then you don't need to teach this pass about this pattern.

Btw, is this a common pattern? Can you show us some motivating examples? Some hits in benchmarks?

22 ↗(On Diff #353346)

So we have the original loops and a memset, which could be expanded to a series of stores. A nightmare for code size..

eopXD updated this revision to Diff 353584.Jun 22 2021, 2:27 AM

Make the test case more precisely target the optimization this patch provides.

eopXD added a comment.Jun 22 2021, 2:40 AM

Thank you @lebedev.ri and @xbolva00 for leaving comments.
I would follow up with benchmarks of the patch.

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
7 ↗(On Diff #353346)

Yes, LoopFlatten does convert two loops into one. Sorry that my example was misleading. I have adjusted the test case so LoopFlatten is irrelevant.

In the new test case the memset size is runtime-determined and cannot be optimized by the current LoopIdiomRecognize; this patch handles that scenario.

To handle it there must be runtime checks to make sure the optimization is safe, so versioning is needed. To avoid exponential growth of versioned loops, we create only one versioned copy for the whole loop nest. That is where the LoopNestPass comes in handy. The pass runs on a LoopNest, which is constructed over the whole nested loop (it does not run on sub-loops).

22 ↗(On Diff #353346)

I think this pass does the opposite. It promotes a series of stores in a loop into a single memset.

The code-size increase from this patch happens when versioning occurs.
(The original LoopIdiomRecognize doesn't involve versioning.)

lebedev.ri requested changes to this revision.Jun 22 2021, 2:47 AM
lebedev.ri added inline comments.
llvm/test/Transforms/LoopIdiom/memset-runtime.ll
22 ↗(On Diff #353346)

The code-size increase from this patch happens when versioning occurs.

Isn't that @xbolva00's point in the first place?

This revision now requires changes to proceed.Jun 22 2021, 2:47 AM
eopXD marked 2 inline comments as not done.Jun 22 2021, 2:49 AM
eopXD added a comment.Jun 22 2021, 2:57 AM

Accidentally marked them done; I have undone it.
No bad intentions there.

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
22 ↗(On Diff #353346)

I think I misunderstood the English. I originally thought "series of stores" meant multiple store instructions.

Yes, the versioning would increase code size. I think it should be up to the user whether they want this optimization turned on, so I added ForceNoLoopVersion as a compile option. If ForceNoLoopVersion = true, then no runtime checks will be added and the pass will only process constant sizes.

eopXD updated this revision to Diff 354152.Jun 23 2021, 9:20 PM

Fix some clang-format.

fhahn added inline comments.Jun 25 2021, 7:15 AM
llvm/test/Transforms/LoopIdiom/memset-runtime.ll
36 ↗(On Diff #354152)

I'm not sure if I am missing something, but this test has just a single loop, so it's a 'trivial' loop nest, right?

Couldn't this case be handled without making it a loop nest pass, by just checking if there's no parent loop? And even if there's a parent loop, wouldn't it also be possible to transform the inner loop to a wider memset, without the parents being perfectly nested?

1 ↗(On Diff #353346)

You can use the llvm/utils/update_test_checks.py script to automatically generate check lines.

eopXD updated this revision to Diff 354652.Jun 26 2021, 1:21 AM

Update the code so it runs correctly.

eopXD updated this revision to Diff 354663.Jun 26 2021, 2:59 AM
eopXD marked an inline comment as done.

Generate check lines with script.

eopXD added inline comments.Jun 26 2021, 3:28 AM
llvm/test/Transforms/LoopIdiom/memset-runtime.ll
36 ↗(On Diff #354152)

Yes, you are right. This example does not show why LoopNest is needed.

The current LIR only deals with constant store sizes (or memset sizes). This patch uses SCEV to let LIR deal with runtime sizes, and handling runtime sizes requires adding a runtime check:

if (runtime_check) {
  // optimized code
} else {
  // original code, without optimizing on runtime variables
}

With LIR, we can certainly widen the memset from the innermost loop outward until it reaches the outermost one. However, if we perform this layer by layer, the total number of versions created grows exponentially. To prevent this from happening, we create only one if-else versioning at the top-level loop. This is where LoopNest comes in.

Thank you for the reminder; I shall add an example to show this.

eopXD updated this revision to Diff 358877.EditedJul 15 2021, 2:38 AM

Address comments.

  1. Make processLoopMemset able to deal with smax expressions whose constant operands are implied by loop guards.

    How: If the operands are constant, we can fold them, and when we version we add runtime checks to make sure our folding of the smax is safe.
  1. Added a nested test case that shows the necessity of using LoopNest.

    Why: If we used the current LIR with this patch, which seeks to optimize on runtime-determined variables, the versioning would happen loop-by-loop. In other words, the pass would need to generate versioning for every loop until it hoists the memset to the outermost loop's preheader. LNIR (LoopNestIdiomRecognize) makes only one versioning outside of the outermost loop to prevent this from happening.
eopXD added a comment.Jul 15 2021, 2:49 AM

@lebedev.ri

May I ask what your main concern with this patch is?
I would be glad to discuss and resolve it.

eopXD added inline comments.Jul 15 2021, 3:02 AM
llvm/test/Transforms/LoopIdiom/memset-runtime.ll
36 ↗(On Diff #354152)

Hi @fhahn,
I have added a nested test case which shows the improvement from this patch (supporting idiom recognition for runtime-determined variables) and the necessity of the LoopNest structure (versioning at the top-level loop).
May I mark this thread as done?

Just a minor comment regarding potential benefits and benchmarking. If I understand the patch correctly, it flattens a multi-level loop with a memset into one memset. According to simple microbenchmarks (https://godbolt.org/z/MTnYcvvYo), on my x86-64 Skylake, flattening the memset in a double loop (memset_3D) into one memset gives anywhere from an ~800% performance boost (on a small working set) down to an ~80% boost (when the working set exceeds the LLC).
But how useful would such a transformation be in practice? I'm not sure. We need to keep in mind that a memset is usually just part of memory initialization/reuse code, so in real-world benchmarks flattening memset loops may be less beneficial than the microbenchmarks show.

eopXD added a comment.Jul 18 2021, 6:50 AM

gentle ping

But how useful would such a transformation be in practice? I'm not sure. We need to keep in mind that a memset is usually just part of memory initialization/reuse code, so in real-world benchmarks flattening memset loops may be less beneficial than the microbenchmarks show.

In Fortran, initialization of multi-dimensional arrays uses one liners like

A = 0.0

When A is multi-dimensional, it is converted to a loop nest in LLVM IR. For example, if A is 3-D, the LLVM IR would look like the following. This nested store pattern is common in Fortran.

for i ...
  for j ...
    for k ...
      A[i][j][k] = 0;
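For illustration, here is a C analogue of that lowering and the single memset it can collapse to. The sizes are compile-time constants only to keep the example small (the patch targets the runtime-sized case), and the memset form relies on 0.0f having the all-zero bit pattern in IEEE 754:

```c
#include <assert.h>
#include <string.h>

enum { N = 2, M = 3, O = 4 };

/* The lowered loop nest for Fortran's `A = 0.0` on a 3-D array. */
static void zero_nest(float a[N][M][O]) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < M; ++j)
            for (int k = 0; k < O; ++k)
                a[i][j][k] = 0.0f;
}

/* The array is contiguous, so the whole nest is one memset
 * (valid because IEEE-754 0.0f is the all-zero bit pattern). */
static void zero_flat(float a[N][M][O]) {
    memset(a, 0, sizeof(float) * N * M * O);
}
```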

I'm not sure we actually need to version the loop to handle overflow.

In any given address-space, there are at most 2^N bytes addressable by "gep ptr, i" where N is DataLayout::getIndexTypeSizeInBits(). Assuming the memset isn't volatile, there isn't any point to storing to an address more than once. So we only actually need to memset min(2^N, TotalBytes) bytes. You should be able to pass that number to memset.

I'm not sure we actually need to version the loop to handle overflow.

That was my comment, too, and i've yet to see a concise explanation why that isn't the case.

In any given address-space, there are at most 2^N bytes addressable by "gep ptr, i" where N is DataLayout::getIndexTypeSizeInBits(). Assuming the memset isn't volatile, there isn't any point to storing to an address more than once. So we only actually need to memset min(2^N, TotalBytes) bytes. You should be able to pass that number to memset.

eopXD added a comment.Jul 22 2021, 9:16 AM

Hi @efriedma,

I am not sure that consecutive memory cannot exceed 2^N bytes, but I cannot think of a scenario where it can. Thank you for raising this valid point.

However, in the current code inside LoopIdiomRecognize::processLoopMemset, the pass also checks for size overflow for constant sizes. I am curious why it was originally needed here. Do you have any idea why the check is needed? (The commit history is out of Phabricator's reach.)

// Reject memsets that are so large that they overflow an unsigned.
uint64_t SizeInBytes = cast<ConstantInt>(MSI->getLength())->getZExtValue();
if ((SizeInBytes >> 32) != 0)
  return false;

Even if we don't need a runtime check for size overflow, we still need runtime checks for the assumptions made to fold the SCEV expressions that enable the optimization. So versioning would still be needed.

However, in the current code inside LoopIdiomRecognize::processLoopMemset, the pass also checks for size overflow for constant sizes. I am curious why it was originally needed here. Do you have any idea why the check is needed? (The commit history is out of Phabricator's reach.)

Code originates from rG8643810eded6eef8ad2753478a8403437695228f . As for why, not sure. There isn't any obvious reason it's necessary.

Even if we don't need a runtime check for size overflow, we still need runtime checks for the assumptions made to fold the SCEV expressions that enable the optimization. So versioning would still be needed.

Why can't we use the SCEV expression for the trip count as-is?

eopXD added a comment.Jul 23 2021, 9:38 AM

Even if we don't need a runtime check for size overflow, we still need runtime checks for the assumptions made to fold the SCEV expressions that enable the optimization. So versioning would still be needed.

Why can't we use the SCEV expression for the trip count as-is?

The SCEV of the trip count may contain an smax, which makes the memset size incomparable in processLoopMemset. To conservatively fold the smax expression, a runtime check is added.

In a do-while loop like the following, the trip-count SCEV would be (1 smax (sext i32 %n to i64)). To fold the SCEV we add the runtime-check condition n >= 1, and fold the expression into (sext i32 %n to i64).

int i = 0;
do {
  ++i;
}  while (i < n);
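A small C sketch of why the smax appears: the do-while body runs at least once, so the trip count is max(1, n). The helper names here are hypothetical, added only to make the model checkable:

```c
#include <assert.h>

/* Counts the iterations of `do { ++i; } while (i < n);` with i = 0. */
static long dowhile_trip_count(int n) {
    long count = 0;
    int i = 0;
    do {
        ++i;
        ++count;
    } while (i < n);
    return count;
}

/* The SCEV model of that trip count: (1 smax n). */
static long smax1(int n) { return n > 1 ? (long)n : 1; }
```

For any n, including zero or negative values, the loop executes smax(1, n) times, which is why SCEV cannot drop the smax without knowing n >= 1.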

An example would be the 2nd test case in memset-runtime.ll.
For the inner loop, the optimized memset instruction has a MemsetSizeSCEV containing an smax expression, since NumBytesSCEV = TripCountSCEV * StoreSizeSCEV:

Calculate NumBytesSCEV = TripCountSCEV * StoreSizeSCEV
  TripCountSCEV: (1 smax (sext i32 %m to i64))
  StoreSizeSCEV: (4 * (sext i32 %o to i64))<nsw>
  NumBytesSCEV: (4 * (sext i32 %o to i64) * (1 smax (sext i32 %m to i64)))

Then in the outer loop, when the pass wants to check whether MemsetSizeSCEV == PointerStrideSCEV, we need to fold the expression to eliminate the smax so that the comparison can be performed. If the folding succeeds, the pass proceeds to perform the optimization for the outer loop.

MemsetSizeSCEV: (4 * (sext i32 %o to i64) * (1 smax (sext i32 %m to i64)))
PositiveStrideSCEV: (4 * (sext i32 %m to i64) * (sext i32 %o to i64))
Try to convert SCEV expression and compare again
  MemsetSCEVConv: (4 * (zext i32 %m to i64) * (zext i32 %o to i64))
  PositiveStrideSCEVConv: (4 * (zext i32 %m to i64) * (zext i32 %o to i64))

Yes, I guess you need to version to handle that case in particular. I'd like to avoid versioning in simpler cases, though.

I have to ask; is that case actually showing up in real code? Usually, people prefer for loops over do-while loops in situations like that. If you write an equivalent test using for loops, SCEV should simplify the backedge-taken count.

eopXD updated this revision to Diff 361419.Jul 24 2021, 1:07 AM

Address comments.

  • no need to check for overflow; the assumption is that no consecutive memory region exceeds 2^N bytes (where N is DataLayout::getIndexTypeSizeInBits()).
  • add nested for-loop to testcase
eopXD added a comment.EditedJul 24 2021, 1:08 AM

SCEV folding is still needed for the for-loop test case.

void test(int n, int m, int o, int *ar) {
  for (int i=0; i<n; ++i) {
    for (int j=0; j<m; ++j) {
      int *arr = ar + i * m * o + j * o;
      memset(arr, 0, o * sizeof(int));      
    }
  }
}

For the inner loop:

MemsetSizeSCEV: (4 * (sext i32 %o to i64))<nsw>
PositiveStrideSCEV: (4 * (sext i32 %o to i64))<nsw>
Calculate NumBytesS = TripCountS * StoreSizeSCEV
  TripCountS: (zext i32 %m to i64)
  StoreSizeSCEV: (4 * (sext i32 %o to i64))<nsw>
  NumBytesS: (4 * (zext i32 %m to i64) * (sext i32 %o to i64))

Then in the outer loop, to compare MemsetSizeSCEV with PointerStrideSCEV, the pass converts sext to zext for the comparison; this is the SCEV folding, and a runtime check is added for it.

MemsetSizeSCEV: (4 * (zext i32 %m to i64) * (sext i32 %o to i64))
PositiveStrideSCEV: (4 * (sext i32 %m to i64) * (sext i32 %o to i64))
Try to convert SCEV expression and compare again
  MemsetSCEVConv: (4 * (zext i32 %m to i64) * (zext i32 %o to i64))
  PositiveStrideSCEVConv: (4 * (zext i32 %m to i64) * (zext i32 %o to i64))
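Putting the pieces together, the versioned output for the test() function above might look like the following hypothetical C sketch. The exact guard emitted by the pass is the set of folded-SCEV assumptions; this example only approximates it with simple sign checks:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static void test_versioned(int n, int m, int o, int *ar) {
    if (n >= 0 && m >= 0 && o >= 0) {
        /* optimized version: the runtime check justified folding
         * the SCEVs, so the whole nest becomes one memset */
        memset(ar, 0, sizeof(int) * (size_t)n * (size_t)m * (size_t)o);
    } else {
        /* fallback: the original loop nest, unchanged */
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < m; ++j)
                memset(ar + (ptrdiff_t)i * m * o + (ptrdiff_t)j * o, 0,
                       (size_t)o * sizeof(int));
    }
}
```

Only one if-else is generated for the whole nest, which is the point of versioning at the top-level loop rather than per loop.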

Oh, hmm. Yes, you need to do some sort of SCEV folding to handle that case, but you can perform the check at compile-time using something like ScalarEvolution::isLoopEntryGuardedByCond.

  • no need to check for overflow; the assumption is that no consecutive memory region exceeds 2^N bytes (where N is DataLayout::getIndexTypeSizeInBits()).

Maybe to be conservative, restrict this to memset in address-space zero.

  • no need to check for overflow; the assumption is that no consecutive memory region exceeds 2^N bytes (where N is DataLayout::getIndexTypeSizeInBits()).

Maybe to be conservative, restrict this to memset in address-space zero.

The reason for this is that in other address-spaces, you might need to saturate the memset amount, instead of just ignoring overflow, I think.

eopXD updated this revision to Diff 363694.Aug 3 2021, 4:41 AM
This comment was removed by eopXD.
eopXD updated this revision to Diff 363696.Aug 3 2021, 4:44 AM

Revert to previous edition.

eopXD updated this revision to Diff 363701.Aug 3 2021, 4:55 AM

Address comments.

For the SCEV expressions MemsetSize and PointerStride of a memset instruction, the pass
first folds expressions that are already guarded by the loop guard. If the
folded expressions are not equal, LoopIdiomRecognize will abort, while LoopNestIdiomRecognize
will proceed to try converting the SCEVs and add appropriate runtime checks.
If the converted expressions are equal, LNIR proceeds with the optimization and does
versioning at the outermost loop.

Added separate test-case files for LIR and LNIR.
For LIR, it can deal with runtime-determined variables as long as no runtime check is required.
For LNIR, it will do versioning and add runtime checks if needed.

High-level comment: let's not convolute things too much.
I'm honestly already at a loss with the current diff, and (to me!) the answers to the questions are not very clear.
Let's at least deal with a proper loop-idiom patch first,
and maybe follow up with the loop-nest loop-idiom changes.

eopXD added a comment.Aug 3 2021, 5:02 AM

High-level comment: let's not convolute things too much.
I'm honestly already at a loss with the current diff, and (to me!) the answers to the questions are not very clear.
Let's at least deal with a proper loop-idiom patch first,
and maybe follow up with the loop-nest loop-idiom changes.

Sure.
Let me open a new patch for the proper loop-idiom changes and come back here after that.
Thank you for following up on this patch.

HLJ2009 added a subscriber: HLJ2009.Aug 3 2021, 5:03 AM
eopXD added a comment.Aug 3 2021, 8:01 AM

Created a new patch, D107353, that deals with the proper loop idiom.
I will come back and modify this one after D107353 is accepted.

Whitney added a project: Restricted Project.Dec 1 2021, 8:27 AM
eopXD retitled this revision from [LoopIdiom] [LoopNest] let the pass deal with runtime memset size to [LoopIdiom] Teach LNIR to versioning loops and add runtime check.Dec 16 2021, 12:33 AM
eopXD edited the summary of this revision. (Show Details)

Patch description does not explain why versioning is needed in the first place.

Patch description does not explain why versioning is needed in the first place.

Yes, you are right. I will need some time to rebase and update this patch.

eopXD added a comment.EditedDec 16 2021, 12:45 AM

Also, I understand that the LNIR idea might not ultimately justify being merged upstream.
So once this patch is ready for review, maybe we can have a discussion in the LoopOpt Working Group.

eopXD retitled this revision from [LoopIdiom] Teach LNIR to versioning loops and add runtime check to [WIP][LoopIdiom] Teach LNIR to versioning loops and add runtime check.Dec 16 2021, 1:05 AM
lebedev.ri resigned from this revision.Jan 12 2023, 5:15 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 5:15 PM
Herald added a subscriber: StephenFan. · View Herald Transcript