This is an archive of the discontinued LLVM Phabricator instance.

Differential D73129

[LoopUnrollAndJam] Correctly update LoopInfo when unroll and jam more than 2-levels loop nests.
Abandoned · Public

Authored by Whitney on Jan 21 2020, 12:01 PM.


Details

Reviewers

dmgreen
jdoerfert
Meinersbur
kbarton
bmahjour
etiotto

Summary

Before unroll and jam:

for i
  for j
    for k
      A[i][j][k] = 0

After unroll and jam loop-i by a factor of 2:

for i +=2
  for j {
    for k
      A[i][j][k] = 0
    for k'
      A[i+1][j][k] = 0
  }

Notice that there exists a new child loop loop-k' of loop-j, after unroll and jam loop-i by 2.
This patch correctly updates LoopInfo with the new loops created during unroll and jam.
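
To make the loop structure concrete, the two nests from the summary can be sketched in plain C++ on a fixed-size array. This is only an illustration, not code from the patch: the names `N`, `Cube`, `fillOriginal` and `fillUnrolledAndJammed` are made up here, `N` is assumed to be even so that no remainder loop is needed, and the body stores a distinct value per element (instead of 0) so that the two versions can be compared.

```cpp
#include <array>
#include <cassert>

// Illustrative size; N is assumed even so no remainder loop is needed.
constexpr int N = 4;
using Cube = std::array<std::array<std::array<int, N>, N>, N>;

// Original three-level nest.
Cube fillOriginal() {
  Cube A{};
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        A[i][j][k] = i * 100 + j * 10 + k;
  return A;
}

// loop-i unrolled by 2, with the two copies of loop-j jammed together.
// loop-j now has two child loops: loop-k and the new loop-k'.
Cube fillUnrolledAndJammed() {
  Cube A{};
  for (int i = 0; i < N; i += 2)
    for (int j = 0; j < N; ++j) {
      for (int k = 0; k < N; ++k) // loop-k
        A[i][j][k] = i * 100 + j * 10 + k;
      for (int k = 0; k < N; ++k) // loop-k'
        A[i + 1][j][k] = (i + 1) * 100 + j * 10 + k;
    }
  return A;
}
```

Both functions fill the array with the same values; the second one simply has an extra inner loop under loop-j, which is the loop that LoopInfo has to learn about.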

Side discussion:
With the example above, we can see that changing the order in which loops are traversed in a loop nest doesn't actually solve the only-one-subloop limitation. We need to think of other ways to make more loops eligible for unroll and jam. One way is to try to fuse subloops before giving up. Another way could be to unroll and jam the whole loop nest in one attempt, i.e. no need to create loop-k'. Other ideas are welcome.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Whitney created this revision. · Jan 21 2020, 12:01 PM

Herald added subscribers: llvm-commits, zzheng, hiraditya. · Jan 21 2020, 12:01 PM

Does it make sense to split the cloneBasicBlocksInLoop changes and review them separately? It would make the diff smaller and easier to read.

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
7	I would generate the check lines with the update_checks script. This doesn't test much.

Does it make sense to split the cloneBasicBlocksInLoop changes and review them separately? It would make the diff smaller and easier to read.

Sounds good, I will move all simple changes unrelated to cloneBasicBlocksInLoop to another review.

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
7	I tried to follow the same pattern as the other unroll and jam lit tests. The main purpose of this test is to check that loop info verifies correctly. I will update the test with check lines generated by the update_checks script.

dmgreen added inline comments. · Jan 22 2020, 5:59 AM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
626	I think the #ifndef NDEBUG checks below will re-use L after it has been deleted.
llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
7	Maybe just show control flow, like the test that was changed below?

Addressed Johannes's comments.

jdoerfert added inline comments. · Jan 22 2020, 8:18 AM

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
7	> Maybe just show control flow, like the test that was changed below?
The problem there is that the conditions are missing and you don't know where the loop bodies end up. It's debatable what is best, but this representation has some advantages when it comes to changes: it makes them automatically applicable and explicitly shows everything. That said, I'm fine with manually cutting it down if people prefer that.

Whitney marked 4 inline comments as done. · Jan 22 2020, 8:27 AM

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
626	Good catch, will address this concern in https://reviews.llvm.org/D73204
llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
7	How about I only keep the control flow, conditions, and phi nodes?

fhahn added a subscriber: fhahn. · Jan 22 2020, 8:43 AM

fhahn added inline comments.

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
7	If this test is intended to check that we generate the correct control flow, why not keep the inner loop body to a bare minimum (e.g. just the induction variables and maybe a function call that is passed the induction variables)? We force unrolling, so the loop body is not really important, right? That would reduce the clutter to a bare minimum, even if we use the script to auto-generate the CHECK lines.

Whitney marked 2 inline comments as done. · Jan 22 2020, 8:58 AM

Whitney added inline comments.

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
7	The intention of the test is to check that loop info verifies after unroll and jamming a 3-level loop nest. I will change A[i][j][k] = 0 to bar(i, j, k).

Simplified the test case. A[i][j][k]=0 to bar(i,j,k).

fhahn added inline comments. · Jan 22 2020, 10:54 AM

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
95	If that's readnone nounwind and the result is unused, it will just be removed (by simplifyLoopAfterUnroll probably). If you need an actual loop body, it's probably better to drop the attributes.

Whitney marked 2 inline comments as done. · Jan 22 2020, 11:59 AM

Whitney added inline comments.

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
95	They are required for unroll and jam to be safely performed, i.e. without those attributes, unroll and jam is not performed.

jdoerfert added inline comments. · Jan 22 2020, 12:30 PM

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll
95	That makes sense but @fhahn has a point. We should try not to introduce test cases that are "trivial" in some sense. I think `A[i][j][k] = 0` was an OK choice. I personally have a harder time understanding check lines and differences if I don't see all of it. Also, the above is arguably the minimal test case for 3 loops already, with `A[i][j][k] = 0` the simplest useful example I can think of. Nit: Make the loop bounds different for each loop. That way you can spot them more easily.

Changed the test case back to A[i][j][k] = 0, as I agree that it is the minimal meaningful body for a three-level loop nest.
Also used a different upper bound for each loop, as suggested by Johannes.

As for what to put in the checks, we could use:
(1) one check for the induction variable (original patch)
(2) only control flow (suggested by @dmgreen)
(3) all checks (suggested by @jdoerfert)
(4) all checks except the loop body
I don't mind any of them, as long as we know the loop nest is unrolled and jammed and LoopInfo is verified, which all 4 suggestions satisfy.
I will wait to see if we can reach a conclusion before making further changes. Currently it is (3).

I'm happy with 3.

Nothing in this code needs the blocks to be processed in RPO? (or at least some defined order)? I'm a little surprised, but I don't spot anything.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
146	BasicBlockSet &Blocks,
186	I think that it will matter that this isn't a deterministic order

dmgreen mentioned this in D73498: [NFC] Remove extra headers included in Loop Unroll and LoopUnrollAndJam files. · Jan 27 2020, 1:57 PM

Nothing in this code needs the blocks to be processed in RPO? (or at least some defined order)? I'm a little surprised, but I don't spot anything.

Functionally it should be fine without an order, as we are now replacing values for all blocks of one unroll copy at the same time.

However, changing to deterministic order can make testing easier.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
186	Doesn't matter functionally, but it may affect the order in LIT tests. That's why I used CHECK-DAG in unroll-and-jam.ll. I agree that could be problematic; I am thinking of changing Blocks from a set to a vector, so the caller of the function decides on an order.

I'd expect

#pragma unrollandjam(2)
for i
  for j
    for k
      A[i][j][k] = 0

to generate this:

for i +=2
  for j
    for k
      A[i][j][k] = 0
      A[i+1][j][k] = 0

(as by the side-note by @Whitney)

That is, if the jam loop is not specified, to jam the innermost by default, which should be what you want in most cases. For some reason I was assuming LoopUnrollAndJam was rejecting this case as not-yet-implemented. Would that be an option instead of fixing a behavior that probably is not the expected effect?

In D73129#1843278, @Meinersbur wrote:
I'd expect
#pragma unrollandjam(2)
for i
  for j
    for k
      A[i][j][k] = 0
to generate this:
for i +=2
  for j
    for k
      A[i][j][k] = 0
      A[i+1][j][k] = 0
(as by the side-note by @Whitney)

That is, if the jam loop is not specified, to jam the innermost by default, which should be what you want in most cases. For some reason I was assuming LoopUnrollAndJam was rejecting this case as not-yet-implemented. Would that be an option instead of fixing a behavior that probably is not the expected effect?

That's an interesting idea. If we want to do that, we will need to modify multiple places. Mainly, the outermost loop needs to be partitioned into a list of ForeBlocks, a list of AftBlocks, and InnermostLoopBlocks, instead of ForeBlocks, SubLoopBlocks, and AftBlocks. ForeBlocks are the blocks in the outermost loop that are outside of, and before, the innermost loop. AftBlocks are the blocks outside of the innermost loop that are not in ForeBlocks. The safety check and code generation both need to be modified accordingly. And again, we will iterate the loops from outer to inner. Maybe we can simplify the transformation by only considering perfect nests? What does everyone think?

Put more generally, I was expecting this:

for i
  A(i)
  for j
    B(i, j)
    for k
      C(i, j, k)
    D(i, j)
  E(i)

To be unrolled in i:

for i +=2
  A(i)
  for j
    B(i, j)
    for k
      C(i, j, k)
    D(i, j)
  E(i)
  A(i+1)
  for j
    B(i+1, j)
    for k
      C(i+1, j, k)
    D(i+1, j)
  E(i+1)
for i remainder
  A(i)
  for j
    B(i, j)
    for k
      C(i, j, k)
    D(i, j)
  E(i)

And then the j loops to be jammed:

for i +=2
  A(i)
  A(i+1)
  for j
    B(i, j)
    for k
      C(i, j, k)
    D(i, j)
    B(i+1, j)
    for k
      C(i+1, j, k)
    D(i+1, j)
  E(i)
  E(i+1)
for i remainder
  A(i)
  for j
    B(i, j)
    for k
      C(i, j, k)
    D(i, j)
  E(i)

You are saying that we should also fuse the inner loops? That sounds sensible if we can do it and prove legality (and if we can somehow prove that UnJing i is better than UnJing j, which I can see could come up in places and might outweigh the extra code bloat).

When Unroll And Jam was written we did not have general loop fusion. We now do. Can we make use of it here to fuse any sub-loops together? I believe that is how gcc writes their algorithm, but last I looked they only supported perfectly nested loops which would be a big regression over what is here now. We might just be able to attempt sub-loop fusing, using the loop fusion infrastructure we have?

The alternative like you said would be trying to prove it is valid beforehand, which would mean checking that more blocks inside subloops can be moved past each other and all the extra memory dependencies are safe.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
186	Non-determinism is usually considered a problem, even if both outputs are equally valid (as they would be in this case). Especially in a functional safety context where you need to be testing what you are running. Maybe use a SetVector? A vector may be fine too, if we don't need constant time lookup/removal.
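
As an illustration of the suggestion above, here is a minimal standard-library sketch of an insertion-ordered set: membership tests stay constant time, while iteration follows insertion order and is therefore deterministic. `OrderedSet` and `toVector` are hypothetical names for this sketch; it is not the LLVM `SetVector` class and not code from the patch.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an insertion-ordered set (not the LLVM class).
template <typename T> class OrderedSet {
  std::unordered_set<T> Seen; // constant-time membership test
  std::vector<T> Order;       // deterministic iteration order

public:
  // Returns true if V was newly inserted.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }
  bool count(const T &V) const { return Seen.count(V) != 0; }
  typename std::vector<T>::const_iterator begin() const {
    return Order.begin();
  }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
};

// Collects the elements in iteration (that is, insertion) order.
template <typename T> std::vector<T> toVector(const OrderedSet<T> &S) {
  return std::vector<T>(S.begin(), S.end());
}
```

Iterating a pointer-keyed hash set can visit blocks in a different order from run to run; with a container like this, the clone order is fixed by whoever fills the set.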

Option 1: Use LoopFusion infrastructure to jam innerloops recursively.

- We need this patch to update LI correctly.
- Modify LoopFusion to expose its functionality as utilities for other passes.
- We may only know that the inner loops cannot be fused after unrolling and jamming the parent loops.

FYI, @kbarton opinion?

Option 2: Prove safety beforehand, and unroll and jamming without creating new loops.

- We no longer need this patch.
- Modify the safety checks.
- Modify codegen.
- Do not unroll and jam if even one subloop pair is not safe to fuse.
- Should be faster to compile.

Depending on whether we want to unroll and jam loop-i if loop-k is not safe to fuse, we will prefer option 1 or 2.

for i
  A(i)
  for j
    B(i, j)
    for k
      C(i, j, k)
    D(i, j)
  E(i)

I will continue updating this patch, when we have an idea of which option to take.

Sorry for the break, I got the flu.

In D73129#1844313, @dmgreen wrote:

Put more generally, I was expecting this:
...
You are saying that we should also fuse the inner loops?

Yes.

The purpose of unroll-and-jam is to improve instruction-level parallelism and reduce hot loop overhead. For performance optimization, we should only consider the innermost body to be relevant (Statement C in your example). IMHO not jamming the innermost loop improves neither ILP nor overhead, so it would be quite useless.

A way to define Unroll-And-Jam is to first tile by (unroll-factor,1,1) (all except the outermost tile factors are 1, so they don't really need a loop) and then (fully) unroll the tile. As a side-effect, unroll-and-jam on a single loop would be identical to partial unrolling. Tiling is usually only defined for perfect loop nests, and so I would not necessarily assume that unroll-and-jam over non-perfectly nested loops is even defined. If we do, I'd expect something like:

for i += 2
  A(i)
  A(i+1)
  for j
    B(i, j)
    B(i+1, j)
    for k 
      C(i, j, k)
      C(i+1, j, k)
    D(i, j)
    D(i+1, j)
  E(i)
  E(i+1)
for i remainder:
  A(i)
  for j
    B(i, j)
    for k 
      C(i, j, k)
    D(i, j)
  E(i)

Caveat: What if A,B,D or E contain loops themselves? I'd just not allow it.
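
The fully jammed form above, including the remainder loop, can be sketched in C++ by recording the order in which statement instances execute. This is only an illustration under simplifying assumptions: the statements A to E are stand-ins that append their name and indices to a trace, the unroll factor is fixed at 2, and all function names (`emit`, `originalBody`, `original`, `unrollAndJamBy2`, `sameInstances`) are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

using Trace = std::vector<std::string>;

// Records one statement instance, e.g. "C(1,0,2)".
void emit(Trace &T, char Stmt, std::initializer_list<int> Idx) {
  std::string S(1, Stmt);
  S += '(';
  bool First = true;
  for (int V : Idx) {
    if (!First)
      S += ',';
    S += std::to_string(V);
    First = false;
  }
  S += ')';
  T.push_back(S);
}

// One iteration of the original loop-i body.
void originalBody(Trace &T, int i, int NJ, int NK) {
  emit(T, 'A', {i});
  for (int j = 0; j < NJ; ++j) {
    emit(T, 'B', {i, j});
    for (int k = 0; k < NK; ++k)
      emit(T, 'C', {i, j, k});
    emit(T, 'D', {i, j});
  }
  emit(T, 'E', {i});
}

Trace original(int NI, int NJ, int NK) {
  Trace T;
  for (int i = 0; i < NI; ++i)
    originalBody(T, i, NJ, NK);
  return T;
}

// loop-i unrolled by 2 with every level jammed, plus a remainder loop.
Trace unrollAndJamBy2(int NI, int NJ, int NK) {
  Trace T;
  int i = 0;
  for (; i + 1 < NI; i += 2) {
    emit(T, 'A', {i});
    emit(T, 'A', {i + 1});
    for (int j = 0; j < NJ; ++j) {
      emit(T, 'B', {i, j});
      emit(T, 'B', {i + 1, j});
      for (int k = 0; k < NK; ++k) {
        emit(T, 'C', {i, j, k});
        emit(T, 'C', {i + 1, j, k});
      }
      emit(T, 'D', {i, j});
      emit(T, 'D', {i + 1, j});
    }
    emit(T, 'E', {i});
    emit(T, 'E', {i + 1});
  }
  for (; i < NI; ++i) // remainder iterations
    originalBody(T, i, NJ, NK);
  return T;
}

// True if both traces contain the same statement instances, in any order.
bool sameInstances(Trace X, Trace Y) {
  std::sort(X.begin(), X.end());
  std::sort(Y.begin(), Y.end());
  return X == Y;
}
```

The transformed nest executes exactly the same statement instances as the original, but in a different order; whether that reordering is allowed is what the legality check has to decide.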

When Unroll And Jam was written we did not have general loop fusion. We now do. Can we make use of it here to fuse any sub-loops together? I believe that is how gcc writes their algorithm, but last I looked they only supported perfectly nested loops which would be a big regression over what is here now. We might just be able to attempt sub-loop fusing, using the loop fusion infrastructure we have?

I think using the loop fusion here would make the implementation more complicated.

The alternative like you said would be trying to prove it is valid beforehand, which would mean checking that more blocks inside subloops can be moved past each other and all the extra memory dependencies are safe.

I expect the generalization of the legality check from "does the dependency violate jump over one loop" to "does it violate jumping n loops" NOT to be hard.

Edit: Added NOT
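
One way to picture such a generalized check is the textbook formulation on constant dependence distance vectors. The sketch below rests on strong assumptions: the nest is perfect, every dependence is available as a constant distance vector with the outermost loop first, and the types and function names (`Distance`, `innerIsLexNegative`, `isJamSafe`) are invented here. It is not the check in LoopUnrollAndJam.cpp, nor the one proposed in D76132.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Distance vector of one dependence, outermost loop first (hypothetical
// representation; real dependence analysis results are richer than this).
using Distance = std::vector<int>;

// True if the components after the first are lexicographically negative.
bool innerIsLexNegative(const Distance &D) {
  for (std::size_t I = 1; I < D.size(); ++I) {
    if (D[I] < 0)
      return true;
    if (D[I] > 0)
      return false;
  }
  return false;
}

// Jamming moves the unrolled copies of the outer iteration inside all inner
// loops. A dependence carried by the outer loop with a distance smaller than
// Factor can connect two copies in the same unrolled body; if its inner part
// is lexicographically negative, the sink would then run before the source.
bool isJamSafe(const std::vector<Distance> &Deps, int Factor) {
  for (const Distance &D : Deps) {
    if (D.empty())
      continue;
    if (D[0] > 0 && D[0] < Factor && innerIsLexNegative(D))
      return false;
  }
  return true;
}
```

In this formulation the number of inner loops does not matter: the same lexicographic test covers jumping one loop or n loops, which is consistent with the expectation that the generalization is not hard.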

In D73129#1844566, @Whitney wrote:

Option 1: Use LoopFusion infrastructure to jam innerloops recursively.
Option 2: Prove safety beforehand, and unroll and jamming without creating new loops.

IMHO option 2 is preferable and more robust.

In D73129#1858261, @Meinersbur wrote:

In D73129#1844566, @Whitney wrote:

Option 1: Use LoopFusion infrastructure to jam innerloops recursively.
Option 2: Prove safety beforehand, and unroll and jamming without creating new loops.

IMHO option 2 is preferable and more robust.

Another way to look at unroll-and-jam is as a thin pass that invokes loop unroll + loop fusion under the hood. If we structure it that way, we maximize software reuse, and any improvements to loop fusion required to do unroll-and-jam (such as dealing with intervening code) would directly benefit use cases that are not unroll-and-jam-specific, without duplication of effort. Similarly, safety checks and cost modeling can be reused between the two transforms. I realize that may involve more work and take longer to make it functionally equivalent to what is currently available, but it seems like the ideal long-term solution.

In D73129#1858261, @Meinersbur wrote:

In D73129#1844566, @Whitney wrote:

Option 1: Use LoopFusion infrastructure to jam innerloops recursively.
Option 2: Prove safety beforehand, and unroll and jamming without creating new loops.

IMHO option 2 is preferable and more robust.

I apologize for my delay in commenting here.

I also think that Option 2 is preferable. I agree with @Meinersbur that we shouldn't be doing UnrollAndJam if the Jam part is not possible, otherwise you just end up with another pass of unrolling, which will likely end up causing confusion. Thus, it makes more sense for UnrollAndJam to be a stand-alone pass that doesn't rely on another pass to complete.

That said, there may be parts of the analysis done in fusion that can be reused (or generalized and commoned out) and could be useful here. I don't know the internals of UnrollAndJam well enough to know whether that is easy to do, or whether it would end up being more work than the benefit it provides.

Thanks everyone for the inputs!
https://reviews.llvm.org/D76132 is created to implement the safety checks needed for option 2.

Would it help if I wrote/sketched the code for dependency violation?

In D73129#1928246, @Meinersbur wrote:

Would it help if I wrote/sketched the code for dependency violation?

Is this for https://reviews.llvm.org/D76132 or another review?
If it is for D76132, it would be very helpful. I agree dependency checks can be improved for unroll and jam. Do you think it makes sense to do it as a separate patch afterwards?

dmgreen mentioned this in D80619: [UnJ] Update LI for inner nested loops. · May 27 2020, 3:42 AM

Whitney abandoned this revision. · Apr 30 2021, 6:59 PM

Revision Contents

Path	Size
llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp	182 lines
llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll	130 lines
llvm/test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll	12 lines

Diff 239701

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp

[129 lines not shown]	static void moveHeaderPhiOperandsToForeBlocks(BasicBlock *Header,
// Move all instructions in program order to before the InsertLoc		// Move all instructions in program order to before the InsertLoc
BasicBlock *InsertLocBB = InsertLoc->getParent();		BasicBlock *InsertLocBB = InsertLoc->getParent();
for (Instruction *I : reverse(Visited)) {		for (Instruction *I : reverse(Visited)) {
if (I->getParent() != InsertLocBB)		if (I->getParent() != InsertLocBB)
I->moveBefore(InsertLoc);		I->moveBefore(InsertLoc);
}		}
}		}

		/// Clone a single entry single exit set of blocks \p Blocks. \p FirstBlock is
		/// expected to be the single entry of \p Blocks. \p Blocks is expected to be
		/// contained in a loop.
		///
		/// Updates LoopInfo and DominatorTree assuming \p Blocks are dominated by block
		/// \p BlocksDomBB. Insert the new blocks before block specified in \p Before.
		static void cloneBasicBlocksInLoop(BasicBlock *Before, BasicBlock *BlocksDomBB,
		BasicBlock *FirstBlock,
		SmallPtrSetImpl<BasicBlock *> &Blocks,
		ValueToValueMapTy &VMap,
		const Twine &NameSuffix, LoopInfo *LI,
		DominatorTree *DT,
		SmallVectorImpl<BasicBlock *> &NewBlocks) {
		assert(!Blocks.empty() && "Expecting non-empty Blocks");
		assert(Blocks.count(FirstBlock) &&
		"Expecting FirstBlock to be part of Blocks");
		Function *F = FirstBlock->getParent();
		Loop *L = LI->getLoopFor(FirstBlock);
		assert(L && "Expecting FirstBlock to be in a loop");
		DenseMap<Loop *, Loop *> LMap;
		LMap[L] = L;

		BasicBlock *NewFirstBlock = CloneBasicBlock(FirstBlock, VMap, NameSuffix, F);
		VMap[FirstBlock] = NewFirstBlock;
		L->addBasicBlockToLoop(NewFirstBlock, *LI);
		DT->addNewBlock(NewFirstBlock, BlocksDomBB);
		NewBlocks.push_back(NewFirstBlock);

		// Allocate loops that are descendants of L, and contained in Blocks.
		for (Loop *CurLoop : L->getLoopsInPreorder()) {
		if (!Blocks.count(CurLoop->getHeader()))
		continue;

		Loop *&NewLoop = LMap[CurLoop];
		if (!NewLoop) {
		NewLoop = LI->AllocateLoop();

		// Establish the parent/child relationship.
		Loop *OrigParent = CurLoop->getParentLoop();
		assert(OrigParent && "Could not find the original parent loop");
		Loop *NewParentLoop = LMap[OrigParent];
		assert(NewParentLoop && "Could not find the new parent loop");

		NewParentLoop->addChildLoop(NewLoop);
		}
		}

		// Clone Blocks.
		for (BasicBlock *BB : Blocks) {
		if (BB == FirstBlock)
		continue;

		Loop *CurLoop = LI->getLoopFor(BB);
		assert(CurLoop && "Expecting BB to be in a loop");
		Loop *NewLoop = LMap[CurLoop];
		assert(NewLoop && "Expecting new loop to be allocated");
		BasicBlock *NewBB = CloneBasicBlock(BB, VMap, NameSuffix, F);
		VMap[BB] = NewBB;

		NewLoop->addBasicBlockToLoop(NewBB, *LI);
		if (CurLoop->getHeader() == BB)
		NewLoop->moveToHeader(NewBB);

		// Add DominatorTree node. After seeing all blocks, update to correct
		// IDom.
		DT->addNewBlock(NewBB, BlocksDomBB);

		NewBlocks.push_back(NewBB);
		}

		// Update DominatorTree.
		for (BasicBlock *BB : Blocks) {
		BasicBlock *IDomBB = DT->getNode(BB)->getIDom()->getBlock();
		if (VMap.count(IDomBB))
		DT->changeImmediateDominator(cast<BasicBlock>(VMap[BB]),
		cast<BasicBlock>(VMap[IDomBB]));
		}

		// Move them physically from the end of the block list.
		F->getBasicBlockList().splice(Before->getIterator(), F->getBasicBlockList(),
		NewFirstBlock->getIterator(), F->end());
		}

/*		/*
This method performs Unroll and Jam. For a simple loop like:		This method performs Unroll and Jam. For a simple loop like:
for (i = ..)		for (i = ..)
Fore(i)		Fore(i)
for (j = ..)		for (j = ..)
SubLoop(i, j)		SubLoop(i, j)
Aft(i)		Aft(i)

[135 lines not shown]	LoopUnrollResult llvm::UnrollAndJamLoop(
AftBlocksLast.push_back(L->getExitingBlock());		AftBlocksLast.push_back(L->getExitingBlock());
// Maps Blocks[0] -> Blocks[It]		// Maps Blocks[0] -> Blocks[It]
ValueToValueMapTy LastValueMap;		ValueToValueMapTy LastValueMap;

// Move any instructions from fore phi operands from AftBlocks into Fore.		// Move any instructions from fore phi operands from AftBlocks into Fore.
moveHeaderPhiOperandsToForeBlocks(		moveHeaderPhiOperandsToForeBlocks(
Header, LatchBlock, ForeBlocksLast[0]->getTerminator(), AftBlocks);		Header, LatchBlock, ForeBlocksLast[0]->getTerminator(), AftBlocks);

// The current on-the-fly SSA update requires blocks to be processed in
// reverse postorder so that LastValueMap contains the correct value at each
// exit.
LoopBlocksDFS DFS(L);
DFS.perform(LI);
// Stash the DFS iterators before adding blocks to the loop.
LoopBlocksDFS::RPOIterator BlockBegin = DFS.beginRPO();
LoopBlocksDFS::RPOIterator BlockEnd = DFS.endRPO();

if (Header->getParent()->isDebugInfoForProfiling())		if (Header->getParent()->isDebugInfoForProfiling())
for (BasicBlock *BB : L->getBlocks())		for (BasicBlock *BB : L->getBlocks())
for (Instruction &I : *BB)		for (Instruction &I : *BB)
if (!isa<DbgInfoIntrinsic>(&I))		if (!isa<DbgInfoIntrinsic>(&I))
if (const DILocation *DIL = I.getDebugLoc()) {		if (const DILocation *DIL = I.getDebugLoc()) {
auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(Count);		auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(Count);
if (NewDIL)		if (NewDIL)
I.setDebugLoc(NewDIL.getValue());		I.setDebugLoc(NewDIL.getValue());
else		else
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "Failed to create new discriminator: "		<< "Failed to create new discriminator: "
<< DIL->getFilename() << " Line: " << DIL->getLine());		<< DIL->getFilename() << " Line: " << DIL->getLine());
}		}

// Copy all blocks		// Copy all blocks
for (unsigned It = 1; It != Count; ++It) {		for (unsigned It = 1; It != Count; ++It) {
SmallVector<BasicBlock *, 4> NewBlocks;		SmallVector<BasicBlock *, 4> NewBlocks;
// Maps Blocks[It] -> Blocks[It-1]		// Maps Blocks[It] -> Blocks[It-1]
DenseMap<Value *, Value *> PrevItValueMap;		DenseMap<Value *, Value *> PrevItValueMap;

for (LoopBlocksDFS::RPOIterator BB = BlockBegin; BB != BlockEnd; ++BB) {		// Copy ForeBlocks.
		{
ValueToValueMapTy VMap;		ValueToValueMapTy VMap;
BasicBlock *New = CloneBasicBlock(*BB, VMap, "." + Twine(It));		cloneBasicBlocksInLoop(SubLoopBlocksFirst[0], ForeBlocksLast[It - 1],
Header->getParent()->getBasicBlockList().push_back(New);		ForeBlocksFirst[0], ForeBlocks, VMap,
		"." + Twine(It), LI, DT, NewBlocks);
if (ForeBlocks.count(*BB)) {		ForeBlocksFirst.push_back(cast<BasicBlock>(VMap[ForeBlocksFirst[0]]));
L->addBasicBlockToLoop(New, *LI);		ForeBlocksLast.push_back(cast<BasicBlock>(VMap[ForeBlocksLast[0]]));
		for (auto VEntry : VMap) {
if (*BB == ForeBlocksFirst[0])		PrevItValueMap[VEntry->second] = const_cast<Value *>(
ForeBlocksFirst.push_back(New);		It == 1 ? VEntry->first : LastValueMap[VEntry->first]);
if (*BB == ForeBlocksLast[0])		LastValueMap[VEntry->first] = VEntry->second;
ForeBlocksLast.push_back(New);		}
} else if (SubLoopBlocks.count(*BB)) {
SubLoop->addBasicBlockToLoop(New, *LI);

if (*BB == SubLoopBlocksFirst[0])
SubLoopBlocksFirst.push_back(New);
if (*BB == SubLoopBlocksLast[0])
SubLoopBlocksLast.push_back(New);
} else if (AftBlocks.count(*BB)) {
L->addBasicBlockToLoop(New, *LI);

if (*BB == AftBlocksFirst[0])
AftBlocksFirst.push_back(New);
if (*BB == AftBlocksLast[0])
AftBlocksLast.push_back(New);
} else {
llvm_unreachable("BB being cloned should be in Fore/Sub/Aft");
}		}

// Update our running maps of newest clones		// Clone SubLoopBlocks.
PrevItValueMap[New] = (It == 1 ? *BB : LastValueMap[*BB]);		{
LastValueMap[*BB] = New;		ValueToValueMapTy VMap;
for (ValueToValueMapTy::iterator VI = VMap.begin(), VE = VMap.end();		cloneBasicBlocksInLoop(AftBlocksFirst[0], SubLoopBlocksLast[It - 1],
VI != VE; ++VI) {		SubLoopBlocksFirst[0], SubLoopBlocks, VMap,
PrevItValueMap[VI->second] =		"." + Twine(It), LI, DT, NewBlocks);
const_cast<Value *>(It == 1 ? VI->first : LastValueMap[VI->first]);		SubLoopBlocksFirst.push_back(
LastValueMap[VI->first] = VI->second;		cast<BasicBlock>(VMap[SubLoopBlocksFirst[0]]));
}		SubLoopBlocksLast.push_back(cast<BasicBlock>(VMap[SubLoopBlocksLast[0]]));
		for (auto VEntry : VMap)
NewBlocks.push_back(New);		LastValueMap[VEntry->first] = VEntry->second;

// Update DomTree:
if (*BB == ForeBlocksFirst[0])
DT->addNewBlock(New, ForeBlocksLast[It - 1]);
else if (*BB == SubLoopBlocksFirst[0])
DT->addNewBlock(New, SubLoopBlocksLast[It - 1]);
else if (*BB == AftBlocksFirst[0])
DT->addNewBlock(New, AftBlocksLast[It - 1]);
else {
// Each set of blocks (Fore/Sub/Aft) will have the same internal domtree
// structure.
auto BBDomNode = DT->getNode(*BB);
auto BBIDom = BBDomNode->getIDom();
BasicBlock *OriginalBBIDom = BBIDom->getBlock();
assert(OriginalBBIDom);
assert(LastValueMap[cast<Value>(OriginalBBIDom)]);
DT->addNewBlock(
New, cast<BasicBlock>(LastValueMap[cast<Value>(OriginalBBIDom)]));
}		}

		// Clone AftBlocks.
		{
		ValueToValueMapTy VMap;
		cloneBasicBlocksInLoop(LoopExit, AftBlocksLast[It - 1], AftBlocksFirst[0],
		AftBlocks, VMap, "." + Twine(It), LI, DT,
		NewBlocks);
		AftBlocksFirst.push_back(cast<BasicBlock>(VMap[AftBlocksFirst[0]]));
		AftBlocksLast.push_back(cast<BasicBlock>(VMap[AftBlocksLast[0]]));
		for (auto VEntry : VMap)
		LastValueMap[VEntry->first] = VEntry->second;
}		}

// Remap all instructions in the most recent iteration		// Remap all instructions in the most recent iteration
remapInstructionsInBlocks(NewBlocks, LastValueMap);		remapInstructionsInBlocks(NewBlocks, LastValueMap);
for (BasicBlock *NewBlock : NewBlocks) {		for (BasicBlock *NewBlock : NewBlocks) {
for (Instruction &I : *NewBlock) {		for (Instruction &I : *NewBlock) {
if (auto *II = dyn_cast<IntrinsicInst>(&I))		if (auto *II = dyn_cast<IntrinsicInst>(&I))
if (II->getIntrinsicID() == Intrinsic::assume)		if (II->getIntrinsicID() == Intrinsic::assume)
[180 lines not shown]	LoopUnrollResult llvm::UnrollAndJamLoop(
simplifyLoopAfterUnroll(SubLoop, true, LI, SE, DT, AC);		simplifyLoopAfterUnroll(SubLoop, true, LI, SE, DT, AC);
simplifyLoopAfterUnroll(L, !CompletelyUnroll && Count > 1, LI, SE, DT, AC);		simplifyLoopAfterUnroll(L, !CompletelyUnroll && Count > 1, LI, SE, DT, AC);

NumCompletelyUnrolledAndJammed += CompletelyUnroll;		NumCompletelyUnrolledAndJammed += CompletelyUnroll;
++NumUnrolledAndJammed;		++NumUnrolledAndJammed;

// Update LoopInfo if the loop is completely removed.		// Update LoopInfo if the loop is completely removed.
if (CompletelyUnroll)		if (CompletelyUnroll)
LI->erase(L);		LI->erase(L);

#ifndef NDEBUG		#ifndef NDEBUG
// We shouldn't have done anything to break loop simplify form or LCSSA.		// We shouldn't have done anything to break loop simplify form or LCSSA.
Loop *OutestLoop = SubLoop->getParentLoop()		Loop *OutestLoop = SubLoop->getParentLoop()
? SubLoop->getParentLoop()->getParentLoop()		? SubLoop->getParentLoop()->getParentLoop()
? SubLoop->getParentLoop()->getParentLoop()		? SubLoop->getParentLoop()->getParentLoop()
: SubLoop->getParentLoop()		: SubLoop->getParentLoop()
: SubLoop;		: SubLoop;
[247 lines not shown]

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -basicaa -tbaa -loop-unroll-and-jam -allow-unroll-and-jam -verify-loop-info < %s -S | FileCheck %s
				; RUN: opt -aa-pipeline=type-based-aa,basic-aa -passes='unroll-and-jam,verify<loops>' -allow-unroll-and-jam < %s -S | FileCheck %s

				; The explicit metadata should force loop-i to be unroll and jammed 4 times (hence the %inc.i.3)

				define void @foo(i64 %N, i32* %A) {
				; CHECK-LABEL: @foo
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[FOR_I:%.*]]
				; CHECK: for.i:
				; CHECK-NEXT: [[I:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INC_I_3:%.*]], [[FOR_I_LATCH:%.*]] ]
				; CHECK-NEXT: [[INC_I:%.*]] = add nuw nsw i64 [[I]], 1
				; CHECK-NEXT: [[INC_I_1:%.*]] = add nuw nsw i64 [[INC_I]], 1
				; CHECK-NEXT: [[INC_I_2:%.*]] = add nuw nsw i64 [[INC_I_1]], 1
				; CHECK-NEXT: [[INC_I_3]] = add nuw nsw i64 [[INC_I_2]], 1
				; CHECK-NEXT: br label [[FOR_J:%.*]]
				; CHECK: for.j:
				; CHECK-NEXT: [[J:%.*]] = phi i64 [ 0, [[FOR_I]] ], [ [[INC_J:%.*]], [[FOR_J_LATCH_3:%.*]] ]
				; CHECK-NEXT: [[J_1:%.*]] = phi i64 [ 0, [[FOR_I]] ], [ [[INC_J_1:%.*]], [[FOR_J_LATCH_3]] ]
				; CHECK-NEXT: [[J_2:%.*]] = phi i64 [ 0, [[FOR_I]] ], [ [[INC_J_2:%.*]], [[FOR_J_LATCH_3]] ]
				; CHECK-NEXT: [[J_3:%.*]] = phi i64 [ 0, [[FOR_I]] ], [ [[INC_J_3:%.*]], [[FOR_J_LATCH_3]] ]
				; CHECK-NEXT: br label [[FOR_K:%.*]]
				; CHECK: for.k:
				; CHECK-NEXT: [[K:%.*]] = phi i64 [ 0, [[FOR_J]] ], [ [[INC_K:%.*]], [[FOR_K]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = mul nuw i64 [[N:%.*]], [[N]]
				; CHECK-NEXT: [[TMP1:%.*]] = mul nsw i64 [[I]], [[TMP0]]
				; CHECK-NEXT: [[AI:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[TMP1]]
				; CHECK-NEXT: [[TMP2:%.*]] = mul nsw i64 [[J]], [[N]]
				; CHECK-NEXT: [[AIJ:%.*]] = getelementptr inbounds i32, i32* [[AI]], i64 [[TMP2]]
				; CHECK-NEXT: [[AIJK:%.*]] = getelementptr inbounds i32, i32* [[AIJ]], i64 [[K]]
				; CHECK-NEXT: store i32 0, i32* [[AIJK]], align 4
				; CHECK-NEXT: [[INC_K]] = add nsw i64 [[K]], 1
				; CHECK-NEXT: [[CMP_K:%.*]] = icmp slt i64 [[INC_K]], 300
				; CHECK-NEXT: br i1 [[CMP_K]], label [[FOR_K]], label [[FOR_J_LATCH:%.*]]
				; CHECK: for.j.latch:
				; CHECK-NEXT: [[INC_J]] = add nuw nsw i64 [[J]], 1
				; CHECK-NEXT: br label [[FOR_K_1:%.*]]
				; CHECK: for.k.1:
				; CHECK-NEXT: [[K_1:%.*]] = phi i64 [ 0, [[FOR_J_LATCH]] ], [ [[INC_K_1:%.*]], [[FOR_K_1]] ]
				; CHECK-NEXT: [[TMP3:%.*]] = mul nuw i64 [[N]], [[N]]
				; CHECK-NEXT: [[TMP4:%.*]] = mul nsw i64 [[INC_I]], [[TMP3]]
				; CHECK-NEXT: [[AI_1:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP5:%.*]] = mul nsw i64 [[J_1]], [[N]]
				; CHECK-NEXT: [[AIJ_1:%.*]] = getelementptr inbounds i32, i32* [[AI_1]], i64 [[TMP5]]
				; CHECK-NEXT: [[AIJK_1:%.*]] = getelementptr inbounds i32, i32* [[AIJ_1]], i64 [[K_1]]
				; CHECK-NEXT: store i32 0, i32* [[AIJK_1]], align 4
				; CHECK-NEXT: [[INC_K_1]] = add nsw i64 [[K_1]], 1
				; CHECK-NEXT: [[CMP_K_1:%.*]] = icmp slt i64 [[INC_K_1]], 300
				; CHECK-NEXT: br i1 [[CMP_K_1]], label [[FOR_K_1]], label [[FOR_J_LATCH_1:%.*]]
				; CHECK: for.j.latch.1:
				; CHECK-NEXT: [[INC_J_1]] = add nuw nsw i64 [[J_1]], 1
				; CHECK-NEXT: br label [[FOR_K_2:%.*]]
				; CHECK: for.k.2:
				; CHECK-NEXT: [[K_2:%.*]] = phi i64 [ 0, [[FOR_J_LATCH_1]] ], [ [[INC_K_2:%.*]], [[FOR_K_2]] ]
				; CHECK-NEXT: [[TMP6:%.*]] = mul nuw i64 [[N]], [[N]]
				; CHECK-NEXT: [[TMP7:%.*]] = mul nsw i64 [[INC_I_1]], [[TMP6]]
				; CHECK-NEXT: [[AI_2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP7]]
				; CHECK-NEXT: [[TMP8:%.*]] = mul nsw i64 [[J_2]], [[N]]
				; CHECK-NEXT: [[AIJ_2:%.*]] = getelementptr inbounds i32, i32* [[AI_2]], i64 [[TMP8]]
				; CHECK-NEXT: [[AIJK_2:%.*]] = getelementptr inbounds i32, i32* [[AIJ_2]], i64 [[K_2]]
				; CHECK-NEXT: store i32 0, i32* [[AIJK_2]], align 4
				; CHECK-NEXT: [[INC_K_2]] = add nsw i64 [[K_2]], 1
				; CHECK-NEXT: [[CMP_K_2:%.*]] = icmp slt i64 [[INC_K_2]], 300
				; CHECK-NEXT: br i1 [[CMP_K_2]], label [[FOR_K_2]], label [[FOR_J_LATCH_2:%.*]]
				; CHECK: for.j.latch.2:
				; CHECK-NEXT: [[INC_J_2]] = add nuw nsw i64 [[J_2]], 1
				; CHECK-NEXT: br label [[FOR_K_3:%.*]]
				; CHECK: for.k.3:
				; CHECK-NEXT: [[K_3:%.*]] = phi i64 [ 0, [[FOR_J_LATCH_2]] ], [ [[INC_K_3:%.*]], [[FOR_K_3]] ]
				; CHECK-NEXT: [[TMP9:%.*]] = mul nuw i64 [[N]], [[N]]
				; CHECK-NEXT: [[TMP10:%.*]] = mul nsw i64 [[INC_I_2]], [[TMP9]]
				; CHECK-NEXT: [[AI_3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[TMP10]]
				; CHECK-NEXT: [[TMP11:%.*]] = mul nsw i64 [[J_3]], [[N]]
				; CHECK-NEXT: [[AIJ_3:%.*]] = getelementptr inbounds i32, i32* [[AI_3]], i64 [[TMP11]]
				; CHECK-NEXT: [[AIJK_3:%.*]] = getelementptr inbounds i32, i32* [[AIJ_3]], i64 [[K_3]]
				; CHECK-NEXT: store i32 0, i32* [[AIJK_3]], align 4
				; CHECK-NEXT: [[INC_K_3]] = add nsw i64 [[K_3]], 1
				; CHECK-NEXT: [[CMP_K_3:%.*]] = icmp slt i64 [[INC_K_3]], 300
				; CHECK-NEXT: br i1 [[CMP_K_3]], label [[FOR_K_3]], label [[FOR_J_LATCH_3]]
				; CHECK: for.j.latch.3:
				; CHECK-NEXT: [[INC_J_3]] = add nuw nsw i64 [[J_3]], 1
				; CHECK-NEXT: [[CMP_J_3:%.*]] = icmp ult i64 [[INC_J_3]], 200
				; CHECK-NEXT: br i1 [[CMP_J_3]], label [[FOR_J]], label [[FOR_I_LATCH]]
				; CHECK: for.i.latch:
				; CHECK-NEXT: [[CMP_I_3:%.*]] = icmp ult i64 [[INC_I_3]], 100
				; CHECK-NEXT: br i1 [[CMP_I_3]], label [[FOR_I]], label [[FOR_END:%.*]], !llvm.loop !0
				; CHECK: for.end:
				; CHECK-NEXT: ret void
				;
				entry:
				br label %for.i

				for.i:
				%i = phi i64 [ 0, %entry ], [ %inc.i, %for.i.latch ]
				fhahn: If that's readnone nounwind and the result is unused, it will just be removed (probably by simplifyLoopAfterUnroll). If you need an actual loop body, it's probably better to drop the attributes.
				Whitney (author): They are required for unroll and jam to be safely performed, i.e. without those attributes, unroll and jam is not performed.
				jdoerfert: That makes sense, but @fhahn has a point. We should try not to introduce test cases that are "trivial" in some sense. I think `A[i][j][k] = 0` was an OK choice. I personally have a harder time understanding check lines and differences if I don't see all of it. Also, the above is arguably the minimal test case for 3 loops already, with `A[i][j][k] = 0` being the simplest useful example I can think of. Nit: Make the loop bounds different for each loop. That way you can spot them more easily.
				br label %for.j

				for.j:
				%j = phi i64 [ 0, %for.i ], [ %inc.j, %for.j.latch ]
				br label %for.k

				for.k:
				%k = phi i64 [ 0, %for.j ], [ %inc.k, %for.k ]
				%0 = mul nuw i64 %N, %N
				%1 = mul nsw i64 %i, %0
				%Ai = getelementptr inbounds i32, i32* %A, i64 %1
				%2 = mul nsw i64 %j, %N
				%Aij = getelementptr inbounds i32, i32* %Ai, i64 %2
				%Aijk = getelementptr inbounds i32, i32* %Aij, i64 %k
				store i32 0, i32* %Aijk, align 4
				%inc.k = add nsw i64 %k, 1
				%cmp.k = icmp slt i64 %inc.k, 300
				br i1 %cmp.k, label %for.k, label %for.j.latch

				for.j.latch:
				%inc.j = add nsw i64 %j, 1
				%cmp.j = icmp slt i64 %inc.j, 200
				br i1 %cmp.j, label %for.j, label %for.i.latch

				for.i.latch:
				%inc.i = add nsw i64 %i, 1
				%cmp.i = icmp slt i64 %inc.i, 100
				br i1 %cmp.i, label %for.i, label %for.end, !llvm.loop !1

				for.end:
				ret void
				}

				!1 = distinct !{!1, !2}
				!2 = !{!"llvm.loop.unroll_and_jam.count", i32 4}
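
For readers who find the CHECK lines hard to follow, the control flow they describe can be sketched in C++. This is only an illustration under stated assumptions, not the pass's implementation: the bounds `NI`, `NJ`, `NK`, the helper `idx`, and the function names are hypothetical, the outer trip count is assumed to be a multiple of 4 so no remainder loop is needed, and each write increments a counter instead of storing 0 so that the two versions can be compared.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical small bounds; the test above uses 100, 200 and 300.
constexpr std::size_t NI = 8, NJ = 3, NK = 5;
static_assert(NI % 4 == 0, "sketch assumes no remainder loop is needed");

// Flattened index for A[i][j][k] (hypothetical helper).
inline std::size_t idx(std::size_t i, std::size_t j, std::size_t k) {
  return (i * NJ + j) * NK + k;
}

// Original nest: loop-i contains loop-j, which contains a single loop-k.
std::vector<int> runOriginal() {
  std::vector<int> A(NI * NJ * NK, 0);
  for (std::size_t i = 0; i < NI; ++i)
    for (std::size_t j = 0; j < NJ; ++j)
      for (std::size_t k = 0; k < NK; ++k)
        A[idx(i, j, k)] += 1;
  return A;
}

// Loop-i unrolled by 4 and jammed into loop-j: loop-j now has four
// sibling child loops, matching for.k, for.k.1, for.k.2 and for.k.3.
std::vector<int> runUnrollAndJam4() {
  std::vector<int> A(NI * NJ * NK, 0);
  for (std::size_t i = 0; i < NI; i += 4)
    for (std::size_t j = 0; j < NJ; ++j) {
      for (std::size_t k = 0; k < NK; ++k) A[idx(i, j, k)] += 1;      // for.k
      for (std::size_t k = 0; k < NK; ++k) A[idx(i + 1, j, k)] += 1;  // for.k.1
      for (std::size_t k = 0; k < NK; ++k) A[idx(i + 2, j, k)] += 1;  // for.k.2
      for (std::size_t k = 0; k < NK; ++k) A[idx(i + 3, j, k)] += 1;  // for.k.3
    }
  return A;
}
```

Both functions touch every element exactly once, so comparing `runOriginal()` with `runUnrollAndJam4()` should show identical arrays. The three extra `k` loops inside loop-j are the new child loops that the patch has to register in LoopInfo.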

llvm/test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll

	 ; CHECK: for.inner2:
	 ; CHECK: br i1 %tobool, label %for.cond4, label %for.inc
	 ; CHECK: for.cond4:
	 ; CHECK: br i1 %tobool.1, label %for.cond4a, label %for.inc
	 ; CHECK: for.cond4a:
	 ; CHECK: br label %for.inc
	 ; CHECK: for.inc:
	 ; CHECK: br i1 %tobool.11, label %for.cond4.1, label %for.inc.1
	+; CHECK-DAG: for.cond4.1:
	+; CHECK-DAG: br i1 %tobool.1.1, label %for.cond4a.1, label %for.inc.1
	+; CHECK-DAG: for.cond4a.1:
	+; CHECK-DAG: br label %for.inc.1
	+; CHECK-DAG: for.inc.1:
	+; CHECK-DAG: br i1 %exitcond.1, label %for.latch, label %for.inner
	 ; CHECK: for.latch:
	 ; CHECK: br label %for.end
	 ; CHECK: for.end:
	 ; CHECK: ret i32 0
	-; CHECK: for.cond4.1:
	-; CHECK: br i1 %tobool.1.1, label %for.cond4a.1, label %for.inc.1
	-; CHECK: for.cond4a.1:
	-; CHECK: br label %for.inc.1
	-; CHECK: for.inc.1:
	-; CHECK: br i1 %exitcond.1, label %for.latch, label %for.inner
	 @a = hidden global [1 x i32] zeroinitializer, align 4
	 define i32 @test5() #0 {
	 entry:
	 br label %for.outer

	 for.outer:
	 %.sink16 = phi i32 [ 0, %entry ], [ %add, %for.latch ]
	 br label %for.inner

Revision Contents

Diff 239701

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp

llvm/test/Transforms/LoopUnrollAndJam/loopnest.ll

llvm/test/Transforms/LoopUnrollAndJam/unroll-and-jam.ll