This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/include/llvm/Transforms/Utils/UnrollLoop.h
llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
llvm/test/Transforms/LoopUnrollAndJam/dependencies_multidims.ll

Differential D76132

[LoopUnrollAndJam] Changed safety checks to consider more than 2-levels loop nest.
ClosedPublic

Authored by Whitney on Mar 13 2020, 7:18 AM.

Details

Reviewers

dmgreen
jdoerfert
Meinersbur
kbarton
bmahjour
etiotto

Commits

rG0a52401ad68b: [LoopUnrollAndJam] Changed safety checks to consider more than 2-levels loop…

Summary

As discussed in https://reviews.llvm.org/D73129.

Example
Before unroll and jam:

for
  A
  for
    B
    for
      C
    D
  E

After unroll and jam (currently):

for
  A
  A'
  for
    B
    for
      C
    D
    B'
    for
      C'
    D'
  E
  E'

After unroll and jam (Ideal):

for
  A
  A'
  for
    B
    B'
    for
      C
      C'
    D
    D'
  E
  E'

This is the first patch to change unroll and jam to work in the ideal way.
This patch changes the safety checks needed to make sure it is safe to unroll and jam in the ideal way.
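The "ideal" shape in the summary can be illustrated on an ordinary C++ loop nest. The following is a minimal sketch, not code from the patch: the function names, the `Result` struct and the arithmetic in blocks A to E are all made up for illustration, and the i-loop is assumed to have an even trip count so that no remainder loop is needed.

```cpp
#include <cassert>
#include <vector>

// Everything the nest computes: one value per i (block E) and one per (i,j,k) (block C).
struct Result {
  std::vector<long> Out;
  std::vector<long> Cells;
  bool operator==(const Result &O) const {
    return Out == O.Out && Cells == O.Cells;
  }
};

// Original nest: for i { A; for j { B; for k { C } D } E }
Result runOriginalNest(int NI, int NJ, int NK) {
  Result R{std::vector<long>(NI), std::vector<long>(NI * NJ * NK)};
  for (int i = 0; i < NI; ++i) {
    long Acc = i;                                 // A
    for (int j = 0; j < NJ; ++j) {
      Acc += j;                                   // B
      for (int k = 0; k < NK; ++k)
        R.Cells[(i * NJ + j) * NK + k] = Acc + k; // C
      Acc *= 2;                                   // D
    }
    R.Out[i] = Acc;                               // E
  }
  return R;
}

// "Ideal" unroll and jam of the i-loop by 2: the two copies are fused at every level.
Result runUnrollAndJamNest(int NI, int NJ, int NK) {
  assert(NI % 2 == 0 && "sketch has no remainder loop");
  Result R{std::vector<long>(NI), std::vector<long>(NI * NJ * NK)};
  for (int i = 0; i < NI; i += 2) {
    long Acc0 = i;                                       // A
    long Acc1 = i + 1;                                   // A'
    for (int j = 0; j < NJ; ++j) {
      Acc0 += j;                                         // B
      Acc1 += j;                                         // B'
      for (int k = 0; k < NK; ++k) {
        R.Cells[(i * NJ + j) * NK + k] = Acc0 + k;       // C
        R.Cells[((i + 1) * NJ + j) * NK + k] = Acc1 + k; // C'
      }
      Acc0 *= 2;                                         // D
      Acc1 *= 2;                                         // D'
    }
    R.Out[i] = Acc0;                                     // E
    R.Out[i + 1] = Acc1;                                 // E'
  }
  return R;
}
```

Because the two i-iterations in this sketch do not depend on each other, both functions return the same `Result`; the safety checks discussed in this review exist to reject nests where that is not the case.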

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Whitney created this revision.Mar 13 2020, 7:18 AM

Herald added subscribers: llvm-commits, zzheng, hiraditya. Mar 13 2020, 7:18 AM

Whitney marked 3 inline comments as done.Mar 13 2020, 7:21 AM

Whitney added inline comments.

llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
283	Not needed as they are already checked in `isSafeToUnrollAndJam`
438–439	Just clang-format
llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
758	Now done in `isEligibleLoopForm`

Harbormaster completed remote builds in B49134: Diff 250193.Mar 13 2020, 8:01 AM

Nice work!

Do you have a test case?

I need to think about how dependency violations are detected. Could you write something about how you intend to do it?

llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp
438–439	Could you commit this separately? I.e. just push a NFC commit explaining you are working on this file.
llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
102	Could you add a doxygen description for this function?
119	[nit] TODO's should not be doxygen comments
822	[style] Don't evaluate `Cur->getLoopDepth()` every time through the loop (see https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop). Consider using `llvm::seq`.
829–830	Could you write a comment about what kind of dependencies this is looking for/are not allowed?
840	Why is rotated form necessary?
847	[style] Don't evaluate `end()` every time through a loop (see https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop).
859	[style] `(*I).begin()` -> `I->begin()`?
868	Can the following be handled correctly? Multiple exits
875	Does the comment need updating?
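The style remarks above (evaluate `end()` once, prefer `I->begin()` over `(*I).begin()`) can be sketched on plain standard containers. This is only an illustration of the idiom; `sumFirstElements` and its data layout are hypothetical and unrelated to the code under review.

```cpp
#include <cassert>
#include <vector>

// Sums the first element of every non-empty inner vector.
int sumFirstElements(const std::vector<std::vector<int>> &Blocks) {
  int Sum = 0;
  // E is computed once, instead of calling Blocks.end() on every iteration.
  for (auto I = Blocks.begin(), E = Blocks.end(); I != E; ++I) {
    if (I->begin() == I->end()) // I->begin() rather than (*I).begin()
      continue;
    Sum += *I->begin();
  }
  return Sum;
}
```

For example, `sumFirstElements({{1, 2}, {}, {5}})` adds 1 and 5 and skips the empty block.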

Whitney marked 3 inline comments as done.Mar 13 2020, 4:53 PM

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
840	It was there before: if (Latch != Exit) return false; if (SubLoopLatch != SubLoopExit) return false; But at that time the function `isRotatedForm` didn't exist.
868	I think it should, but I will write a LIT test to be sure. Only one AftBlock is allowed in the outermost loop.

Addressed review comments from Michael. Still need to add a LIT test.

Harbormaster completed remote builds in B49198: Diff 250333.Mar 13 2020, 6:54 PM

Whitney edited the summary of this revision. Mar 16 2020, 5:47 AM

dmgreen added inline comments.Mar 16 2020, 5:51 AM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
854–855	This looks like it would recalculate LoadsAndStores a lot? Src and Dst don't seem like the right nomenclature too, but that's just a small nit.
863	`size() != 0` -> `!empty()` ?
917	I realize you didn't write the original, but tranform -> transform. I believe the old message was intending to show array indices, so Fi was F(i) and Si_j was S(i,j), showing that the second unrolled iteration needs to be able to move past the first, in terms of runtime execution.
971	Should this have a check for a 2 deep loop nest at the moment (like before), if the remainder of the analysis/transform code hasn't been updated yet? It looks like the count calculation might just exclude anything with multiple subloop blocks at the moment anyway, so is possibly not a problem in practice, without pragma's.

Addressed Dave's comments.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
854–855	Renamed, and removed the calculation of one of the LoadsAndStores in the outer loop. I considered creating a map, but the code may get complicated. Is the current change okay with you, or you prefer a map?
917	Fixed the typo and added braces; is it any clearer now?
971	Not sure I understand, this check is already like before.

Whitney marked an inline comment as done.Mar 16 2020, 8:04 AM

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
854–855	Renamed, and moved the calculation of one of the LoadsAndStores in the outer loop. I considered creating a map, but the code may get complicated. Is the current change okay with you, or you prefer a map?

Harbormaster failed remote builds in B49312: Diff 250562!Mar 16 2020, 8:42 AM

dmgreen added inline comments.Mar 16 2020, 4:42 PM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
854–855	You may be right. It looks O(n^2), but perhaps n will not be too high?
971	Sorry, the actual line I was pointing to was semi-random. That wasn't clear. Does the processHeaderPhiOperands check below need to check each level? IIRC it's testing data dependencies (as in SSA/use-def dependencies, as opposed to the memory dependencies in checkDependencies): can we physically move any instruction we need from aft to fore? If we end up moving multiple levels past one another, do we have to make the same checks at each level? My general point was that some of the code still only handles 2-deep loop nests. Should we have a check somewhere (perhaps with a FIXME next to it) that still tests for that condition, until the rest of the code has caught up?

Whitney marked 4 inline comments as done.Mar 17 2020, 5:54 AM

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
854–855	n is `(loopdepth - 1) * 2 + 1`. I personally think that is OK.
971	Because of the way we rearrange basic blocks, we also require that the Fore blocks of L on all unrolled iterations are safe to move before the blocks of the direct child of L of all iterations. So we require that the phi node looping operands of ForeHeader can be moved to at least the end of ForeEnd, so that we can arrange cloned Fore Blocks before the subloop and match up Phi's correctly. As we are only unrolling L, not its child, we don't need to move instructions from non-L AftBlock to non-L ForeBlock, so we don't need to check if the moves are safe. My general point was that some of the code still only handles 2-deep loop nests. Should we have a check somewhere (perhaps with a fixme next it) that still tests for that condition, until the rest of the code has caught up? I modified all checks I found needed in `isSafeToUnrollAndJam`, am I missing something?

Added LITs for dependencies.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
868	Multiple exits are not allowed, and that's blocked in `partitionOuterLoopBlocks`

Harbormaster failed remote builds in B49520: Diff 250943!Mar 17 2020, 5:51 PM

clang-format

Harbormaster failed remote builds in B49526: Diff 250956!Mar 17 2020, 6:56 PM

Meinersbur added inline comments.Mar 17 2020, 9:33 PM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
659–661	I don't see this being enforced.
819–820	This looks like a bail-out; might improve this by setting `CurLoop` to the innermost loop both instructions are in. I'd prefer the following structure:
  if (D->getDirection(LoopDepth) & Dependence::DVEntry::GT) {
    // That is, as by the other comment, check whether the root loop carries this dependency
    if (!CurLoop)
      return false; // Bail-out
    ...
  }
820	I think this should be looking for whether the first `GT` direction is due to the root loop (i.e. whether the root loop is the cause for the dependence to be fulfilled). Not a correctness issue. The bitwise `&` comparison also matches `NE` and `GE`. Is this intended?
823–824	Isn't this equivalent to `D->getDirection(CurLoopDepth ) == EQ`?
827	Again, the `&` comparison feels wrong. Did you consider a switch over the values of `Dependence::DVEntry`?
llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
513	Does it otherwise unroll-and-jam the middle loop? Should we add a mechanism that stops unrolling of nests of already unrolled loops (e.g. add `llvm.loop.unroll_and_jam.disable` to all nested loops)?
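The remark that the bitwise `&` comparison also matches `NE` and `GE` follows from the direction being a three-bit mask. The sketch below uses a stand-in enum rather than the LLVM header; its values are assumed to mirror `Dependence::DVEntry`, and the helper names `mayBeGT` and `isExactlyGT` are hypothetical.

```cpp
#include <cassert>

// Stand-in for the direction bits of Dependence::DVEntry.
// Assumption: LT, EQ and GT are single bits and the other values are their unions.
enum Direction : unsigned {
  NONE = 0,
  LT = 1,
  EQ = 2,
  LE = 3, // LT | EQ
  GT = 4,
  NE = 5, // LT | GT
  GE = 6, // EQ | GT
  ALL = 7 // LT | EQ | GT
};

// True for every direction that includes the GT bit: GT, NE, GE and ALL.
bool mayBeGT(unsigned Dir) { return (Dir & GT) != 0; }

// True only when the direction is known to be exactly GT.
bool isExactlyGT(unsigned Dir) { return Dir == GT; }
```

Under these assumptions `mayBeGT` is the conservative test ("GT cannot be ruled out"), while `isExactlyGT` is the strict test ("the direction is proven to be GT").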

Harbormaster failed remote builds in B49526: Diff 250956!

Looks like some llc test cases failed. Doesn't seem related to this patch to me. Anyone know if it is safe to ignore?

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
659–661	`if (D->getDirection(LoopDepth) & Dependence::DVEntry::GT) return false;` => `allowing only = or < (not >)`
819–820	might improve this by setting CurLoop to innermost loop both instructions are in. Do you mean passing `LI` into this function, and calculating `CurLoop` here?
820	I think this should be looking for whether the first GT direction is due to the root loop (i.e. whether the root loop is the cause for the dependence to be fulfilled). Not a correctness issue. Here `LoopDepth` is assumed to be the loop depth of the unroll loop. The bitwise & comparison also matches NE and GE. Is this intended? The code before my change is using `&`, so I don't want to change the behaviour.
823–824	Could also be `ALL`. I think `isScalar` is clearer.
827	The code before my change is using `&`. I considered changing to use dependence distance instead of direction to allow more dependence, but I want to keep this patch as changes needed for safety checks for more than 2-levels loop nest. There is a FIXME: `// FIXME: Allow > so long as distance is less than unroll width` What do you think?
llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
513	Currently unroll and jam adds `llvm.loop.unroll_and_jam.disable` to the loop it unrolled and jammed (not to all nested loops). As we are traversing from inner to outer: `for.k` is not considered, as it doesn't have an inner loop; `for.j` is safe to unroll and jam, but that is not my intention for this test case, so I added the disable pragma; `for.i` is unsafe only after my change. Even if we change to traverse from outer to inner, we should allow the middle loop to unroll and jam when proven profitable.

fixed typo.

Harbormaster completed remote builds in B49547: Diff 250993.Mar 17 2020, 11:25 PM

Harbormaster completed remote builds in B49548: Diff 250994.

Whitney mentioned this in D73129: [LoopUnrollAndJam] Correctly update LoopInfo when unroll and jam more than 2-levels loop nests..Mar 17 2020, 11:32 PM

Meinersbur added inline comments.Mar 18 2020, 10:20 AM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
819–820	Where the innermost common loop is computed is unimportant.
820	Here LoopDepth is assumed to be the loop depth of the unroll loop. I know. However, the dependency might also be fulfilled by surrounding loops, e.g.:
  for i = 0 ... n-1
    #pragma unrollandjam
    for j = 0 ... n-1
      for k = 0 ... n-1
        A[i][j][k] = f(A[i-1][j-1][k]);
The dependence vector is (>,>,=). Verifying the direction of the j-dependence is not necessary because for the same `i`, the inner loops are parallel. In other words: the `i`-loop already ensures that the dependency is fulfilled.
823–824	`ALL` would be `*` in the dependence vector, i.e. the analysis could not prove any direction. However, it's also not `EQ`. Sorry for the confusion. I am not convinced it can be ignored.
  #pragma unrollandjam
  for i = ...
    for j = ...
      sum = f(sum, ...);
The dependence induced by sum is obviously scalar, but it cannot be jammed (it's sequential). The dependence vector is (=>,>)
827	I think taking the dependence distance into account is something for a different patch.
837	It could also be `ALL` (or `GE`), meaning we don't know what direction it is. This check gives it a pass.

Addressed Michael's comments.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
819–820	Then I think I misunderstood; currently CurLoop is only set when both instructions belong to the same loop.
820	Added code to allow accesses of different base.
823–824	Why can the above example not be jammed?
Before jam:
  for i = ...
    for j = ...
      sum = f(sum, ...);
    for j = ...
      sum = f(sum, ...);
After jam:
  for i = ...
    for j = ...
      sum = f(sum, ...);
      sum = f(sum, ...);
The access pattern seems to be the same to me.
837	Changed to `if (D->getDirection(CurLoopDepth) == Dependence::DVEntry::GT)`
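One way to probe the question about the scalar `sum` is to run both shapes with a concrete `f`. The sketch below is an assumption-laden illustration, not part of the patch: `addStep` and `hashStep` are hypothetical choices for `f`, and the value passed to `f` is simply a unique number per (i, j) pair.

```cpp
#include <cassert>
#include <cstdint>

using Step = std::uint64_t (*)(std::uint64_t Sum, std::uint64_t X);

// Order-insensitive update: the final sum does not depend on the visiting order.
std::uint64_t addStep(std::uint64_t Sum, std::uint64_t X) { return Sum + X; }

// Order-sensitive update: swapping two applications changes the result.
std::uint64_t hashStep(std::uint64_t Sum, std::uint64_t X) { return Sum * 31 + X; }

// i-loop unrolled by 2, inner loops kept separate ("before jam").
std::uint64_t unrolledOnly(Step F, int NI, int NJ) {
  assert(NI % 2 == 0 && "sketch has no remainder loop");
  std::uint64_t Sum = 0;
  for (int i = 0; i < NI; i += 2) {
    for (int j = 0; j < NJ; ++j)
      Sum = F(Sum, i * NJ + j);
    for (int j = 0; j < NJ; ++j)
      Sum = F(Sum, (i + 1) * NJ + j);
  }
  return Sum;
}

// Same loop with the two inner loops fused ("after jam").
std::uint64_t unrolledAndJammed(Step F, int NI, int NJ) {
  assert(NI % 2 == 0 && "sketch has no remainder loop");
  std::uint64_t Sum = 0;
  for (int i = 0; i < NI; i += 2)
    for (int j = 0; j < NJ; ++j) {
      Sum = F(Sum, i * NJ + j);
      Sum = F(Sum, (i + 1) * NJ + j);
    }
  return Sum;
}
```

With `addStep` both shapes agree, but with `hashStep` they differ, because jamming interleaves the applications of `f` from the two unrolled iterations. Whether that matters for a given loop depends on what `f` does, which the compiler generally cannot assume.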

Harbormaster completed remote builds in B49645: Diff 251192.Mar 18 2020, 4:19 PM

dmgreen added inline comments.Mar 19 2020, 4:20 AM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
971	In the summary you have B' moving past D. And we need to be sure that B' doesn't depend on anything from D. I think of it as B(1,0) needs to move past D(0,0). The "j" level loop isn't unrolled, but there is still some movement needed at the "i" level.
llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
2	Why is this now using da-disable-delinearization-checks, and why have some of these existing tests been changed to use constant size arrays?

Whitney marked 7 inline comments as done.Mar 19 2020, 5:35 AM

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
971	Here we are talking about def-use dependence. If an instruction in B' (x2) depends on an instruction in D (y), there must be an instruction in B (x) that depends on instruction y in D, as B' is cloned from B.
  B: x = phi [y, D]...
  D: y =
As we are placing B' after B, and y is available for B, y must also be available for B'. Please correct me if I am wrong.
llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
2	`-da-disable-delinearization-checks` is added to delinearize fixed-size multi-dimensional arrays more accurately. See https://reviews.llvm.org/D72178 for a more detailed explanation. "why have some of these existing tests been changed to use constant size arrays" They were originally testing single dimensional arrays, which may not be ideal for testing the sub-sub portion of the code.
363	This test was originally testing
  for i
    for j
      A[i]
      A[i-1]
which should be safe to unroll and jam. I think it actually wants to test the sub-with-sub code where the dependence distance of the inner loop is less:
  for i
    for j
      A[i][j]
      A[i+1][j-1]
401	This test was originally testing
  for i
    for j
      A[i]
      A[i]
I think it actually wants to test the sub-with-sub code where the dependence distance of the inner loop is eq:
  for i
    for j
      A[i][j]
      A[i+1][j]
439	This test was originally testing
  for i
    for j
      A[i]
      A[i+1]
which should be safe to unroll and jam. I think it actually wants to test the sub-with-sub code where the dependence distance of the inner loop is more:
  for i
    for j
      A[i][j]
      A[i+1][j+1]

fhahn added a subscriber: fhahn.Mar 19 2020, 6:25 AM

fhahn added inline comments.

llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
2	"-da-disable-delinearization-checks is added to delinearize fixed-size multi-dimensional arrays more accurately." Well, it returns more optimistic results at the loss of soundness IIRC. I think we should definitely keep test coverage without -da-disable-delinearization-checks. This is the common case, which we should definitely handle correctly. Ideally we would have multi-dimensional tests that do not need -da-disable-delinearization-checks. IIRC constant loop bounds might help with that. IMO tests that really require -da-disable-delinearization-checks should be additional, maybe in a separate file.
401	FWIW I think it would be better to keep the original test as is and add the new case as an additional test, as they seem to test different scenarios.

Kept the original test as is and added the new case as an additional test.

Harbormaster completed remote builds in B49775: Diff 251437.Mar 19 2020, 1:09 PM

bmahjour added inline comments.Mar 19 2020, 1:13 PM

llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
2	"Ideally we would have multi-dimensional tests that do not need -da-disable-delinearization-checks." It's very difficult to come up with multi-dimensional tests that result in accurate dependence vectors. That's one of the main reasons why this option was added: to let us test/exercise code paths that would otherwise not be taken due to overly pessimistic dependence results. "IIRC constant loop bounds might help with that." The delinearization validity checks compare the subscripts against parametric terms in the subscript that are believed to be the size of a given dimension. If the arrays have dynamic sizes then the constant loop bound won't help because the subscripts still contain parametric terms. If the arrays have fixed sizes, then we don't even try to delinearize them unless `-da-disable-delinearization-checks` is enabled.

In D76132#1931830, @Whitney wrote:

Kept the original test as is and added the new case as an additional test.

Thanks!

Whitney marked 2 inline comments as done.Mar 19 2020, 6:18 PM

dmgreen added inline comments.Mar 22 2020, 2:47 PM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
971	Yeah, kind of. But with an extra level of unrolling in there. Using your ' notation from before, normal unrolling would do:
  B: x = phi [y', D'], [x, A]
  D: y =
  B': x' = phi [y, D]
  D': y' =
As we need to move B' before D, we also need to be able to hoist y into B, so the phi in B' can point at the correct value. That's what this processHeaderPhiOperands is doing. Note the x' phi only has one operand after unrolling, so will be simplified away. It comes up quite a bit from the increment of the IV variable. The second IV will become an increment of the first after we have pushed the add up into the header from the latch.
llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
363	Hmm. I don't remember what this was trying to test. It feels like a very long time ago now. Thanks for splitting the new tests out. More are always a good thing.

Whitney marked 4 inline comments as done.Mar 22 2020, 3:13 PM

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
971	We are still only unrolling one loop each time, but fuse LoopDepth-1 times. Instructions in B should be the same as B', except all `i` changed to `i+1`, when unrolling by 2. Likewise, all instructions in D should be the same as D', except all `i` changed to `i+1`, when unrolling by 2. In that case, how can an instruction defined in D' used in B? I think processHeaderPhiOperands is used for the induction variable of the unrolled loop.
llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
363	For sure. I guess it is hard to speculate what it was trying to test now.

dmgreen added inline comments.Mar 23 2020, 3:17 AM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
971	Ah I see. Even though we are moving the blocks past one another, at that level there will not be any instructions that needs to move. OK fair enough :)

Whitney marked 3 inline comments as done.Mar 23 2020, 4:42 AM

Here is my suggested dependency checker:

// Check whether it is semantically safe to unroll-and-jam Src and Dst,
// considering any potential dependency between them.
//
// @param UnrollLevel The level of the loop being unrolled
// @param JamLevel    The level of the loop being jammed; if Src and Dst are on
// different levels, the outermost common loop counts as jammed level
//
// @return true if it is safe and false if there is a dependency violation.
static bool checkDependency(Instruction *Src, Instruction *Dst,
                            unsigned UnrollLevel, unsigned JamLevel,
                            bool Sequentialized, DependenceInfo &DI) {
  assert(UnrollLevel <= JamLevel);

  if (Src == Dst)
    return true;
  if (isa<LoadInst>(Src) && isa<LoadInst>(Dst))
    return true;

  // Check whether unroll-and-jam may violate a dependency.
  // By construction, every dependency will be lexicographically non-negative
  // (if it was not, it would violate the current execution order), such as
  //   (0,0,>,*,*)
  // Unroll-and-jam changes the GT execution of two executions to the same
  // iteration of the chosen unroll level. That is, a GT dependence becomes a GE
  // dependence (or EQ, if we fully unrolled the loop) at the loop's position:
  //   (0,0,>=,*,*)
  // Now, the dependency is not necessarily non-negative anymore, i.e.
  // unroll-and-jam may violate correctness.
  std::unique_ptr<Dependence> D = DI.depends(Src, Dst, true);
  if (!D)
    return true;
  assert(D->isOrdered() && "Expected an output, flow or anti dep.");

  // Quick bail-out.
  if (D->isConfused())
    return false;

  for (unsigned d = 1; d < UnrollLevel; ++d) {
    // Check if dependence is carried by an outer loop.
    // That is, changing
    //   (0,>,>,*,*)
    // to
    //   (0,>,>=,*,*)
    // will still not violate the dependency.
    if (D->getDirection(d) == Dependence::DVEntry::GT)
      return true;
  }

  if (!(D->getDirection(UnrollLevel) & Dependence::DVEntry::GT)) {
    // If the unrolled loop did not carry the dependency in the first place
    // (i.e. being <, <= or =), it will also not need to after unroll-and-jam.
    // Note: The standard case is the dependence vector
    //   (0,0,=,>,*)
    // where the unrolled level is already EQ. Changing LT and LE should also
    // not affect the semantics, since these didn't help to fulfill the
    // dependence in the first place.
    return true;
  }

  // If the dependence becomes an EQ dependence at the unroll level, check
  // whether one of the inner loops would carry the dependence.
  for (unsigned d = UnrollLevel + 1; d <= JamLevel; ++d) {
    unsigned Dir = D->getDirection(d);

    // Check whether the jammed level will carry the dependency.
    if (Dir == Dependence::DVEntry::GT)
      return true;

    // A possible backwards direction will violate the dependency
    if (Dir & Dependence::DVEntry::LT)
      return false;
  }

  // We previously already checked whether the instructions are in the
  // same region. Reaching this means that they are not, hence we have a
  // dependency violation.
  return Sequentialized;
}

static bool
checkDependencies(Loop &Root, const BasicBlockSet &SubLoopBlocks,
                  const DenseMap<Loop *, BasicBlockSet> &ForeBlocksMap,
                  const DenseMap<Loop *, BasicBlockSet> &AftBlocksMap,
                  DependenceInfo &DI, LoopInfo &LI) {
  SmallVector<BasicBlockSet, 8> AllBlocks;
  for (Loop *L : Root.getLoopsInPreorder())
    if (ForeBlocksMap.find(L) != ForeBlocksMap.end())
      AllBlocks.push_back(ForeBlocksMap.lookup(L));
  AllBlocks.push_back(SubLoopBlocks);
  for (Loop *L : Root.getLoopsInPreorder())
    if (AftBlocksMap.find(L) != AftBlocksMap.end())
      AllBlocks.push_back(AftBlocksMap.lookup(L));

  unsigned LoopDepth = Root.getLoopDepth();
  SmallVector<Instruction *, 4> EarlierLoadsAndStores;
  SmallVector<Instruction *, 4> CurrentLoadsAndStores;
  for (BasicBlockSet &Blocks : AllBlocks) {
    CurrentLoadsAndStores.clear();
    if (!getLoadsAndStores(Blocks, CurrentLoadsAndStores))
      return false;

    Loop *CurLoop = LI.getLoopFor((*Blocks.begin())->front().getParent());
    unsigned CurLoopDepth = CurLoop->getLoopDepth();

    for (auto Earlier : EarlierLoadsAndStores) {
      Loop *EarlierLoop = LI.getLoopFor(Earlier->getParent());
      unsigned EarlierDepth = EarlierLoop->getLoopDepth();
      unsigned CommonLoopDepth = std::min(EarlierDepth, CurLoopDepth);
      for (auto Later : CurrentLoadsAndStores) {
        if (!checkDependency(Earlier, Later, LoopDepth, CommonLoopDepth, false,
                             DI))
          return false;
      }
    }

    size_t NumInsts = CurrentLoadsAndStores.size();
    for (size_t i = 0; i < NumInsts; ++i) {
      for (size_t j = i + 1; j < NumInsts; ++j) {
        if (!checkDependency(CurrentLoadsAndStores[i], CurrentLoadsAndStores[j],
                             LoopDepth, CurLoopDepth, true, DI))
          return false;
      }
    }

    EarlierLoadsAndStores.append(CurrentLoadsAndStores.begin(),
                                 CurrentLoadsAndStores.end());
  }
  return true;
}
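The decision logic of the suggested `checkDependency` can be exercised without LLVM by feeding it direction vectors directly. The following is a simplified model under stated assumptions: `isSafeDirectionVector` is a hypothetical name, the enum is a stand-in assumed to mirror `Dependence::DVEntry`, and the `DependenceInfo` query, the load/load shortcut and the `isConfused()` bail-out are left out.

```cpp
#include <cassert>
#include <vector>

// Stand-in for Dependence::DVEntry direction bits (values are an assumption).
enum Direction : unsigned { LT = 1, EQ = 2, LE = 3, GT = 4, NE = 5, GE = 6, ALL = 7 };

// Dirs[0] holds the direction at loop level 1, Dirs[1] at level 2, and so on.
bool isSafeDirectionVector(const std::vector<unsigned> &Dirs,
                           unsigned UnrollLevel, unsigned JamLevel,
                           bool Sequentialized) {
  assert(UnrollLevel >= 1 && UnrollLevel <= JamLevel && JamLevel <= Dirs.size());
  auto DirAt = [&](unsigned Level) { return Dirs[Level - 1]; };

  // A strict GT at an outer level already carries the dependence.
  for (unsigned d = 1; d < UnrollLevel; ++d)
    if (DirAt(d) == GT)
      return true;

  // The unrolled loop does not (possibly) carry the dependence: nothing changes.
  if (!(DirAt(UnrollLevel) & GT))
    return true;

  // Otherwise look for a jammed inner level that carries the dependence.
  for (unsigned d = UnrollLevel + 1; d <= JamLevel; ++d) {
    unsigned Dir = DirAt(d);
    if (Dir == GT)
      return true;
    if (Dir & LT) // a possible backwards direction violates the dependence
      return false;
  }
  return Sequentialized;
}
```

For instance, the vector (>,>,=) from the earlier discussion is accepted when unrolling level 2, because level 1 is strictly GT, while (=,>,<) is rejected because the jammed level may run backwards.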

I think I discovered a bug in the current dependency checker. LoopUnrollAndJam transforms sub_sub_more in dependencies.ll, which I don't think it is allowed to do.

llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
439	I think this is NOT safe to unroll-and-jam. The unrolled equivalent is:
  for i += 2
    for j
      S1: A[i]
      S2: A[i+1]
      S1': A[i+1]
      S2': A[i+2]
At S1', the last access of A[i+1] is S2 from the same iteration, instead of S1 from the previous j-iteration as in the original loop.
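The reordering described in this comment can be made visible by recording which statement instances touch one array element, in execution order. This is a sketch for illustration only: `touch`, `originalOrder` and `unrollAndJamOrder` are hypothetical helpers, and the statements are modelled purely by the index they access, without deciding which one is a load or a store.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Trace = std::vector<std::string>;

// Records the statement instance Stmt(i,j) if it accesses the watched element.
static void touch(Trace &T, int Index, int Watched, const char *Stmt, int i, int j) {
  if (Index == Watched)
    T.push_back(std::string(Stmt) + "(" + std::to_string(i) + "," +
                std::to_string(j) + ")");
}

// Original loop: for i { for j { S1: A[i]; S2: A[i+1] } }
Trace originalOrder(int NI, int NJ, int Watched) {
  Trace T;
  for (int i = 0; i < NI; ++i)
    for (int j = 0; j < NJ; ++j) {
      touch(T, i, Watched, "S1", i, j);
      touch(T, i + 1, Watched, "S2", i, j);
    }
  return T;
}

// i-loop unrolled by 2 and jammed, as in the comment above.
Trace unrollAndJamOrder(int NI, int NJ, int Watched) {
  assert(NI % 2 == 0 && "sketch has no remainder loop");
  Trace T;
  for (int i = 0; i < NI; i += 2)
    for (int j = 0; j < NJ; ++j) {
      touch(T, i, Watched, "S1", i, j);         // S1:  A[i]
      touch(T, i + 1, Watched, "S2", i, j);     // S2:  A[i+1]
      touch(T, i + 1, Watched, "S1", i + 1, j); // S1': A[i+1]
      touch(T, i + 2, Watched, "S2", i + 1, j); // S2': A[i+2]
    }
  return T;
}
```

Watching element 1 with two iterations of each loop, the original order is S2(0,0), S2(0,1), S1(1,0), S1(1,1), whereas the jammed order interleaves them as S2(0,0), S1(1,0), S2(0,1), S1(1,1). If either statement writes the element, that reordering changes the program's behaviour.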

Thanks @Meinersbur ! I mostly used your code directly, except

for (unsigned d = 1; d < UnrollLevel; ++d) {
      // Check if dependence is carried by an outer loop.
      // That is, changing
      //   (0,>,>,*,*)
      // to
      //   (0,>,>=,*,*)
      // will still not violate the dependency.
      if (D->getDirection(d) == Dependence::DVEntry::GT)
        return true;
    }

which I think should be safe as long as one dependence direction is not EQ.

for i
  for j        <= unroll loop
    for k
       A[i][j][k]
       A[i-1][j+1][k]

Loop-j should be safe to unroll and jam. Am I right?

Whitney marked an inline comment as done.Mar 26 2020, 8:22 AM

In D76132#1943872, @Whitney wrote:
Thanks @Meinersbur ! I mostly used your code directly, except
for (unsigned d = 1; d < UnrollLevel; ++d) {
      // Check if dependence is carried by an outer loop.
      // That is, changing
      //   (0,>,>,*,*)
      // to
      //   (0,>,>=,*,*)
      // will still not violate the dependency.
      if (D->getDirection(d) == Dependence::DVEntry::GT)
        return true;
    }
which I think should be safe as long as one dependence direction is not EQ.
for i
  for j        <= unroll loop
    for k
       A[i][j][k]
       A[i-1][j+1][k]
Loop-j should be safe to unroll and jam. Am I right?

Yes, that's what the code above would be testing.

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
715–718	[serious] This does not correspond to the `getDirection() == GT` from my version. In particular, this version lets the loop pass if the outer loops have `NE` or `LT` dependencies. These do not ensure lexicographic greater-than and therefore cause a misoptimization. [remark] While some style guides prefer this functional style, I prefer the version with explicit loops and the comment that explains what the code is doing.

Harbormaster failed remote builds in B50547: Diff 252851!Mar 26 2020, 9:12 AM

In D76132#1943966, @Meinersbur wrote:
In D76132#1943872, @Whitney wrote:
Thanks @Meinersbur ! I mostly used your code directly, except
for (unsigned d = 1; d < UnrollLevel; ++d) {
      // Check if dependence is carried by an outer loop.
      // That is, changing
      //   (0,>,>,*,*)
      // to
      //   (0,>,>=,*,*)
      // will still not violate the dependency.
      if (D->getDirection(d) == Dependence::DVEntry::GT)
        return true;
    }
which I think should be safe as long as one dependence direction is not EQ.
for i
  for j        <= unroll loop
    for k
       A[i][j][k]
       A[i-1][j+1][k]
Loop-j should be safe to unroll and jam. Am I right?
Yes, that's what the code above would be testing.

My above example was not good.
Consider this example:

for i
  for j        <= unroll loop
    for k
       A[i][j][k]
       A[i-1][j+1][k-1]

The dependence direction vector would be [LT, GT, LT].

After unroll and jam loop-j:

for i
   for j  += 2
     for k
        A[i][j][k]
        A[i-1][j+1][k-1]
        A[i][j+1][k]
        A[i-1][j+2][k-1]

I think loop-j is safe to unroll and jam.
However, if we only allow GT for loop-i, then this would be considered not safe.
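The claim that loop-j is safe to unroll and jam in this example can be checked by brute force on a small array. The sketch below assumes the two accesses form a single statement, `A[i][j][k] = A[i-1][j+1][k-1] + 1`; that statement, the loop bounds and the names `makeCube`, `runOriginalCube` and `runUnrollAndJamJ` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

using Cube = std::vector<std::vector<std::vector<int>>>;

// N x N x N array filled with distinct values.
Cube makeCube(int N) {
  Cube A(N, std::vector<std::vector<int>>(N, std::vector<int>(N)));
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      for (int k = 0; k < N; ++k)
        A[i][j][k] = (i * N + j) * N + k;
  return A;
}

// Original nest; bounds are chosen so that every subscript stays in range.
Cube runOriginalCube(int N) {
  Cube A = makeCube(N);
  for (int i = 1; i < N; ++i)
    for (int j = 0; j + 1 < N; ++j)
      for (int k = 1; k < N; ++k)
        A[i][j][k] = A[i - 1][j + 1][k - 1] + 1;
  return A;
}

// Loop-j unrolled by 2 and jammed into the k-loop.
Cube runUnrollAndJamJ(int N) {
  assert((N - 1) % 2 == 0 && "sketch needs an even j trip count");
  Cube A = makeCube(N);
  for (int i = 1; i < N; ++i)
    for (int j = 0; j + 1 < N; j += 2)
      for (int k = 1; k < N; ++k) {
        A[i][j][k] = A[i - 1][j + 1][k - 1] + 1;
        A[i][j + 1][k] = A[i - 1][j + 2][k - 1] + 1;
      }
  return A;
}
```

Both versions produce the same array here, since every read targets row `i-1`, which is no longer written once the i-loop has moved on to row `i`. A single passing run is of course an example and not a proof.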

In D76132#1944283, @Whitney wrote:
My above example was not good.
Consider this example:
for i
  for j        <= unroll loop
    for k
       A[i][j][k]
       A[i-1][j+1][k-1]
The dependence direction vector would be [LT, GT, LT].

This dependency vector would be an unconditional dependency violation (lexicographically negative; backwards in time)

Let's say the statements are S1 and S2.

No dependence between S1 and itself.
No dependence between S2 and itself.

S2(i2,j2,k2) depends on S1(i1,j1,k1) iff i1==i2-1(A[i1] and A[i2-1] access the same element), j1==j2+1, k1==k2-1 and (i1,j1,k1) <=_{lexicographic} (i2,j2,k2),
in other words, the dependence vector is (+1,-1,+1) or (GT,LT,GT) [direction from S1->S2, since S1 is the source and S2 is the consumer].

For the iteration in the i-loop, S1 and S2 will never access the same element, hence we can reorder S1 and S2 within the outermost loops as we like.

After unroll and jam loop-j:
for i
   for j  += 2
     for k
        A[i][j][k]
        A[i-1][j+1][k-1]
        A[i][j+1][k]
        A[i-1][j+2][k-1]
I think loop-j is safe to unroll and jam.
However, if we only allow GT for loop-i, then this would be considered not safe.

But the dependency is strictly GT in the first element ?!?

Consider a case where the dependency in the i-loop is GE:

for i
  for j        <= unroll loop
    for k
       A[i][j][k]
       A[c ? i : i-1][j-1][k]

Here S2 conditionally accessed the element from the previous i-iteration or the same, depending on `c`. Thus the possibility of "EQ" requires us to check further.

Michael

Consider a case where the dependency in the i-loop is GE:

for i
  for j        <= unroll loop
    for k
       A[i][j][k]
       A[c ? i : i-1][j-1][k]

Here S2 conditionally accessed the element from the previous i-iteration or the same, depending on `c`. Thus the possibility of "EQ" requires us to check further.

Thanks for your reply.
I am still confused. Your version only allows GT (by checking == GT). My version only allows GT, LT, NE (by checking ! &EQ). So both versions require us to check further with GE.
I was trying to use

for i
  for j        <= unroll loop
    for k
       S1: A[i][j][k]
       S2: A[i-1][j+1][k-1]

as an example, where your version would consider as unsafe and my version would consider as safe.
Am I correct to think loop-j is safe to unroll and jam?
If yes, do you have an example where LT or NE for loop-i should be considered as unsafe for unroll and jam loop-j?

In D76132#1947821, @Whitney wrote:
I am still confused. Your version only allows GT (by checking == GT). My version only allows GT, LT, NE (by checking ! &EQ). So both versions require us to check further with GE.
I was trying to use
for i
  for j        <= unroll loop
    for k
       S1: A[i][j][k]
       S2: A[i-1][j+1][k-1]
as an example, where your version would consider as unsafe and my version would consider as safe.
Am I correct to think loop-j is safe to unroll and jam?

Yes

If yes, do you have an example where LT or NE for loop-i should be considered as unsafe for unroll and jam loop-j?

My justification, as laid out in the comments, is that a GT dependency ensures that the dependency vector is lexicographically positive. An LT or NE dependency does not do that. I would need to see an argument -- not an example -- why LT or NE provide safety. Also consider that DependenceInfo can overapproximate direction bits, so there may not be an example with exactly these direction vectors.

S2(i2,j2,k2) depends on S1(i1,j1,k1) iff i1==i2-1(A[i1] and A[i2-1] access the same element), j1==j2+1, k1==k2-1 and (i1,j1,k1) <=_{lexicographic} (i2,j2,k2),
in other words, the dependence vector is (+1,-1,+1) or (GT,LT,GT) [direction from S1->S2, since S1 is the source and S2 is the consumer].

This is a bit counterintuitive, but a positive dependence direction is represented as LT (not GT). See section 4.2.1 in G. Goff, Ken Kennedy, C.W. Tseng, 1990. Practical Dependence Testing.
The corresponding direction vector for (+1,-1,+1) is (LT, GT, LT) which would be a lexicographically positive direction vector.

I think loop-j is safe to unroll and jam.
However, if we only allow GT for loop-i, then this would be considered unsafe.

But the dependency is strictly GT in the first element ?!?

Given that the direction vector is (LT, GT, LT), it follows that only allowing GT will disqualify the above example as an unsafe case.
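The derivation above can be checked by brute force. The following is a minimal sketch and not code from the patch: `direction` and `collectDirections` are hypothetical helper names, and the iteration bounds 0..N-1 are an assumption made only for illustration. It enumerates every pair of iterations of S1 (source) and S2 (destination) that touch the same element with the source executing first, and records the direction vector using the convention in which a source iteration smaller than the destination iteration is written as `<`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Direction of one loop level: '<' when the source iteration is smaller
// than the destination iteration, '>' when it is larger, '=' otherwise.
char direction(int SrcIter, int DstIter) {
  if (SrcIter < DstIter)
    return '<';
  if (SrcIter > DstIter)
    return '>';
  return '=';
}

// S1 accesses A[i][j][k], S2 accesses A[i-1][j+1][k-1].
// Only subscripts are compared; no memory is touched.
// The bounds 0..N-1 for i, j and k are an illustrative assumption.
std::set<std::string> collectDirections(int N) {
  std::set<std::string> Dirs;
  for (int i1 = 0; i1 < N; ++i1)
    for (int j1 = 0; j1 < N; ++j1)
      for (int k1 = 0; k1 < N; ++k1)
        for (int i2 = 0; i2 < N; ++i2)
          for (int j2 = 0; j2 < N; ++j2)
            for (int k2 = 0; k2 < N; ++k2) {
              // Same element: A[i1][j1][k1] == A[i2-1][j2+1][k2-1].
              bool SameElement =
                  i1 == i2 - 1 && j1 == j2 + 1 && k1 == k2 - 1;
              // The source iteration must execute first (lexicographic order).
              bool SrcFirst = std::make_tuple(i1, j1, k1) <
                              std::make_tuple(i2, j2, k2);
              if (SameElement && SrcFirst)
                Dirs.insert(std::string{direction(i1, i2), direction(j1, j2),
                                        direction(k1, k2)});
            }
  return Dirs;
}
```

Under these assumptions the only vector collected for N >= 2 is `<><`, that is (LT, GT, LT), which matches the direction vector stated above.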

In D76132#1947897, @Meinersbur wrote:
In D76132#1947821, @Whitney wrote:
I am still confused. Your version only allows GT (by checking == GT). My version only allows GT, LT, NE (by checking ! &EQ). So both versions require us to check further with GE.
I was trying to use
for i
  for j        <= unroll loop
    for k
       S1: A[i][j][k]
       S2: A[i-1][j+1][k-1]
as an example, which your version would consider unsafe and my version would consider safe.
Am I correct to think loop-j is safe to unroll and jam?
Yes

If yes, do you have an example where LT or NE for loop-i should be considered as unsafe for unroll and jam loop-j?

My justification, as laid out in the comments, is that a GT dependency ensures that the dependency vector is lexicographically positive. An LT or NE dependency does not do that. I would need to see an argument, not an example, for why LT or NE provide safety. Also consider that DependencyInfo can overapproximate direction bits, so there may not be an example with exactly these direction vectors.

I believe Whitney's reasoning is based on the assumption that if outer levels (levels enclosing the loop being unroll-and-jammed) have a non-equal direction, then the locations accessed in the inner levels cannot overlap in memory. One counter example to that I can think of is where the indexes overlap into neighboring dimensions. @Meinersbur I'm just curious, is that what you had in mind too? I agree we need to be conservative, but I just want to know the concern so we can document/think about it.

In D76132#1949923, @bmahjour wrote:

S2(i2,j2,k2) depends on S1(i1,j1,k1) iff i1==i2-1(A[i1] and A[i2-1] access the same element), j1==j2+1, k1==k2-1 and (i1,j1,k1) <=_{lexicographic} (i2,j2,k2),
in other words, the dependence vector is (+1,-1,+1) or (GT,LT,GT) [direction from S1->S2, since S1 is the source and S2 is the consumer].

This is a bit counterintuitive, but a positive dependence direction is represented as LT (not GT). See section 4.2.1 in G.G, Ken Kennedy, C.W. Tseng, 1990. Practical Dependence Testing.
The corresponding direction vector for (+1,-1,+1) is (LT, GT, LT) which would be a lexicographically positive direction vector.

I indeed seem to have confused the directions, maybe influenced by the original implementation predominantly testing against GT. Thanks for pointing this out. It also means that some assumptions I had while writing the code were wrong.

Another one is that DependenceAnalysis would not return a [> ...] dependence, as this would be a dependence of Dst on a Src in the future. Well:

for (int i = 0; i < n; ++i) {
        A[i] = 42;
        sum += A[i+1];
}

$ opt -da
Src:  store double 4.200000e+01, double* %arrayidx, align 8 --> Dst:  %0 = load double, double* %arrayidx2, align 8
  da analyze - consistent flow [>]!

It's not a flow dependence, but an anti-dependence. This is the result of DA being invoked as analyze(store,load), but DA is not able to determine from the direction whether it's a flow or an anti-dependence.

That is, one DA call returns two results mixed into a single dependence vector: Src --> Dst and Dst --> Src. That doesn't look ideal and I am not sure about the consequences.

In D76132#1949936, @bmahjour wrote:

I believe Whitney's reasoning is based on the assumption that if outer levels (levels enclosing the loop being unroll-and-jammed) have a non-equal direction, then the locations accessed in the inner levels cannot overlap in memory.

Could this be commented, with a bit more detail, in the source code?

One counter example to that I can think of is where the indexes overlap into neighboring dimensions. @Meinersbur I'm just curious, is that what you had in mind too? I agree we need to be conservative, but I just want to know the concern so we can document/think about it.

What I had in mind was DependencyInfo combining multiple cases into a single Dependence result. For instance, depending on a condition c the dependence vector is either (+1,-1) or (0,+1). The combined direction flags would be (<=,<>). On the other hand, if a particular instance has a reverse (i.e. negative, >, GT) direction, then one of the previous dimensions must have been positive (otherwise the value would have time-traveled). That could also serve as a justification.
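The merging of several cases into one direction vector can be sketched as a bitwise OR per loop level. This is an illustration only, not DependenceInfo's implementation: `dirOfDistance` and `mergeDirections` are hypothetical helpers, and the enumerator values are assumed to mirror the bit encoding of `Dependence::DVEntry` (LT = 1, EQ = 2, GT = 4 and their unions).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direction bits; assumed to mirror Dependence::DVEntry's encoding.
enum Dir : unsigned {
  NONE = 0,
  LT = 1,
  EQ = 2,
  LE = 3, // LT | EQ, printed as <=
  GT = 4,
  NE = 5, // LT | GT, printed as <>
  GE = 6,
  ALL = 7
};

// Direction bit for one distance (destination minus source iteration).
// A positive distance is represented as LT.
unsigned dirOfDistance(int Distance) {
  if (Distance > 0)
    return LT;
  if (Distance < 0)
    return GT;
  return EQ;
}

// Combine several distance vectors into a single direction vector by
// OR-ing the direction bits of each loop level.
std::vector<unsigned>
mergeDirections(const std::vector<std::vector<int>> &Distances) {
  std::vector<unsigned> Merged;
  for (const std::vector<int> &DV : Distances) {
    if (Merged.size() < DV.size())
      Merged.resize(DV.size(), NONE);
    for (std::size_t Level = 0; Level < DV.size(); ++Level)
      Merged[Level] |= dirOfDistance(DV[Level]);
  }
  return Merged;
}
```

Merging (+1,-1) and (0,+1) yields (LE, NE), i.e. (<=,<>). The merged vector no longer records that the `>` at the second level only ever occurs together with a `<` at the first level, which is the loss of precision described above.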

Meinersbur added inline comments.Mar 30 2020, 10:08 PM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
792	This should be `for (size_t j = i; j < NumInsts; ++j)` to not skip over self-dependencies of a store.

Given that the DependencyAnalysis result has to be interpreted differently than I initially thought, I came up with a new legality check.

Since DA returns (lexical) forward and backward dependencies in a single result, both have to be checked. There is also a symmetry between them, as the alternative would be to invoke ->depends(Src,Dst) as well as ->depends(Dst,Src) and check them separately.

static bool preservesForwardDependence(Instruction *Src, Instruction *Dst,
                                       unsigned UnrollLevel, unsigned JamLevel,
                                       bool Sequentialized, Dependence *D) {
  // UnrollLevel might carry the dependency Src --> Dst
  // Does a different loop after unrolling?
  for (unsigned d = UnrollLevel + 1; d <= JamLevel; ++d) {
    auto JammedDir = D->getDirection(d);
    if (JammedDir == Dependence::DVEntry::LT)
      return true;

    if (JammedDir & Dependence::DVEntry::GT)
      return false;
  }

  return true;
}

static bool preservesBackwardDependence(Instruction *Src, Instruction *Dst,
                                        unsigned UnrollLevel, unsigned JamLevel,
                                        bool Sequentialized, Dependence *D) {
  // UnrollLevel might carry the dependency Dst --> Src
  for (unsigned d = UnrollLevel + 1; d <= JamLevel; ++d) {
    auto JammedDir = D->getDirection(d);
    if (JammedDir == Dependence::DVEntry::GT)
      return true;

    if (JammedDir & Dependence::DVEntry::LT)
      return false;
  }

  // Backward dependencies are only preserved if not interleaved.
  return Sequentialized;
}

/// Also a forward-dependency, but not carried by UnrollLoop.
static bool preservesNonCarriedDependence(Instruction *Src, Instruction *Dst,
                                          unsigned UnrollLevel,
                                          unsigned JamLevel,
                                          bool Sequentialized, Dependence *D) {
  // There might be dependency Src --> Dst that is not carried by UnrollLoop.
  for (unsigned d = UnrollLevel + 1; d <= JamLevel; ++d) {
    // TODO: Justify this; without it, sub_sub_eq fails
    if (D->isScalar(d))
      continue;

    auto JammedDir = D->getDirection(d);
    if (JammedDir == Dependence::DVEntry::LT)
      return true;

    if (JammedDir & Dependence::DVEntry::GT)
      return false;
  }

  return true;
}

static bool checkDependency(Instruction *Src, Instruction *Dst,
                            unsigned UnrollLevel, unsigned JamLevel,
                            bool Sequentialized, DependenceInfo &DI) {
  assert(UnrollLevel <= JamLevel);

  if (Src == Dst)
    return true;
  if (isa<LoadInst>(Src) && isa<LoadInst>(Dst))
    return true;

  std::unique_ptr<Dependence> D = DI.depends(Src, Dst, true);
  if (!D)
    return true;
  assert(D->isOrdered() && "Expected an output, flow or anti dep.");

  // Quick bail-out.
  if (D->isConfused())
    return false;

  for (unsigned d = 1; d < UnrollLevel; ++d) {
    // Insert comment justifying this here
    if (!(D->getDirection(d) & Dependence::DVEntry::EQ))
      return true;
  }

  auto UnrollDir = D->getDirection(UnrollLevel);
  if (UnrollDir & Dependence::DVEntry::LT &&
      !preservesForwardDependence(Src, Dst, UnrollLevel, JamLevel,
                                  Sequentialized, D.get()))
    return false;

  if (UnrollDir & Dependence::DVEntry::GT &&
      !preservesBackwardDependence(Src, Dst, UnrollLevel, JamLevel,
                                   Sequentialized, D.get()))
    return false;

  if (UnrollDir & Dependence::DVEntry::EQ &&
      !preservesNonCarriedDependence(Src, Dst, UnrollLevel, JamLevel,
                                     Sequentialized, D.get()))
    return false;

  return true;
}

This seems to work nicely with those checks, but I am not sure whether it is correct to ignore isScalar in the EQ case, and why. It seems obvious that the sub_sub_eq test case can be unroll-and-jammed. But if we add the same skip to preservesForwardDependence, the test case sub_sub_less is unroll-and-jammed, which it must not be.

llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll
363	This is not safe to unroll-and-jam. For %N == 2 the execution sequence is (Sa being the first access in the loop body, Sb the second):

Sa(0,0): A[0]
Sb(0,0): A[-1]
Sa(0,1): A[0]
Sb(0,1): A[-1]
Sa(1,0): A[1]
Sb(1,0): A[0]
Sa(1,1): A[1]
Sb(1,1): A[0]

After unroll-and-jam by 2:

Sa(0,0): A[0]
Sb(0,0): A[-1]
Sa(1,0): A[1]
Sb(1,0): A[0]
Sa(0,1): A[0]
Sb(0,1): A[-1]
Sa(1,1): A[1]
Sb(1,1): A[0]

That is, the dependency chain `Sa(0,0)->Sa(0,1)->Sb(1,0)->Sb(1,1)` has become `Sa(0,0)->Sb(1,0)->Sa(0,1)->Sb(1,1)` and therefore has been violated.
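The two execution sequences in this inline comment can be reproduced with a small simulation. This is a sketch under assumptions, not the test itself: the loop shape (Sa accessing A[i] and Sb accessing A[i-1] in a two-level nest with both trip counts equal to N) is inferred from the comment, and `Access`, `originalOrder`, `unrollAndJamOrder` and `accessesTo` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One dynamic memory access: statement name, iteration (I, J), element index.
struct Access {
  std::string Stmt;
  int I;
  int J;
  int Index;
};

// Original nest: for i { for j { Sa: A[i]; Sb: A[i-1]; } }
std::vector<Access> originalOrder(int N) {
  std::vector<Access> Seq;
  for (int I = 0; I < N; ++I)
    for (int J = 0; J < N; ++J) {
      Seq.push_back({"Sa", I, J, I});
      Seq.push_back({"Sb", I, J, I - 1});
    }
  return Seq;
}

// Outer loop unrolled by Factor with the copies of the inner loop jammed.
// Assumes N is a multiple of Factor, so no remainder loop is modelled.
std::vector<Access> unrollAndJamOrder(int N, int Factor) {
  std::vector<Access> Seq;
  for (int I = 0; I < N; I += Factor)
    for (int J = 0; J < N; ++J)
      for (int U = 0; U < Factor; ++U) {
        Seq.push_back({"Sa", I + U, J, I + U});
        Seq.push_back({"Sb", I + U, J, I + U - 1});
      }
  return Seq;
}

// Labels such as "Sa(0,1)" for all accesses to A[Index], in execution order.
std::vector<std::string> accessesTo(const std::vector<Access> &Seq,
                                    int Index) {
  std::vector<std::string> Out;
  for (const Access &A : Seq)
    if (A.Index == Index)
      Out.push_back(A.Stmt + "(" + std::to_string(A.I) + "," +
                    std::to_string(A.J) + ")");
  return Out;
}
```

With N = 2 and Factor = 2, `accessesTo(originalOrder(2), 0)` lists Sa(0,0), Sa(0,1), Sb(1,0), Sb(1,1), while `accessesTo(unrollAndJamOrder(2, 2), 0)` lists Sa(0,0), Sb(1,0), Sa(0,1), Sb(1,1). The accesses to A[0] are therefore reordered exactly as the comment describes.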

Thanks Michael for the suggested change, I am still thinking/understanding it.

Harbormaster failed remote builds in B51412: Diff 254388!Apr 1 2020, 7:37 PM

After a tiring search through the literature I've finally found a paper that states a theorem specifically about safety of unroll-and-jam. See Callahan et al. 1988. Estimating Interlock And Improving Balance For Pipelined Architectures, section 3.5, theorem 4. The theorem generally agrees with the approach of checking for lexicographical positivity, but unfortunately it doesn't say anything about cases where the direction vector elements in between the k (unrolled level) and j (inner level carrying a negative dependence) are non-zero. Another paper, S. Carr and K. Kennedy. 1994. Improving the Ratio of Memory operations to Floating-Point Operations in Loops appears to interpret it as if any negative entry in the direction vector between the unrolled level and the inner-most level is not legal. I find the latter too conservative, for instance if we have:

loop i
  loop j
    loop k
      A(i+1, j+1, k-1) = A(i, j, k)

the direction vector would be [< < >] and the unroll-and-jam would be legal. Checking for lexicographical positivity, as proposed in @Meinersbur 's solution, correctly identifies it as a legal case. The first paper (and others too) also suggest that unroll-and-jam can be viewed as an interchange followed by inner-loop unrolling followed by another interchange. The legality checks for interchange seem to also be overly conservative, for example in this case:

loop i
  loop j
    A(i, j+1) = A(i, j)

After the first interchange and inner loop unrolling we get:

loop j
  loop i
    A(i, j+1) = A(i, j)
    A(i+1, j+1) = A(i+1, j)

Now there is an interchange-preventing dependence from the first statement to the second with direction vector [< >]. We know, however, that if we unrolled i in the original nest, there would be no fusion-preventing dependencies between the unrolled iterations at j-level, so unroll-and-jam is legal.

In conclusion, I find the lexicographical positivity test to be the most accurate and I cannot come up with a counter example that exposes a correctness bug, so I tend to agree with it more than anything else.

This seems to work nicely with those checks, but I am not sure whether it is correct to ignore isScalar in the EQ case, and why. It seems obvious that the sub_sub_eq test case can be unroll-and-jammed. But if we add the same skip to preservesForwardDependence, the test case sub_sub_less is unroll-and-jammed, which it must not be.

It would not be correct to ignore scalar dependencies, as they carry dependencies across all iterations of a loop, but in cases where the direction at the unrolled level is exactly EQ (eg in sub_sub_eq) we can assume safety without considering the inner levels. The reason is that if the unrolled level is exactly EQ, it will become LT (or GT) after unrolling which causes the dependencies in the inner loops to become non-fusion-preventing because the memory accesses would be disjoint. On the other hand, for cases like sub_sub_less, the dependence carried by the outer loop is LT but the dependence carried by the inner loop is scalar, which implies a lexicographical negativity. You get this check for free in your suggested implementation above because, by design, the direction bits in the DependenceInfo are all set when we have a scalar dependence, and preservesForwardDependence returns false causing the test to identify this case as illegal. We can add an explicit check for isScalar to be more clear, but I think just checking for the direction bits is simpler and good enough.

I don't think we need preservesNonCarriedDependence but we'd need a test for the EQ case, so I'd suggest removing preservesNonCarriedDependence and replacing checkDependency with the following which works well on all the tests and avoids the inexplicable skipping of scalar dependencies.

static bool checkDependency(Instruction *Src, Instruction *Dst,
                            unsigned UnrollLevel, unsigned JamLevel,
                            bool Sequentialized, DependenceInfo &DI) {
  assert(UnrollLevel <= JamLevel);

  if (Src == Dst)
    return true;
  if (isa<LoadInst>(Src) && isa<LoadInst>(Dst))
    return true;

  std::unique_ptr<Dependence> D = DI.depends(Src, Dst, true);
  if (!D)
    return true;
  assert(D->isOrdered() && "Expected an output, flow or anti dep.");

  // Quick bail-out.
  if (D->isConfused())
    return false;

  for (unsigned d = 1; d < UnrollLevel; ++d) {
    // Insert comment justifying this here
    if (!(D->getDirection(d) & Dependence::DVEntry::EQ))
      return true;
  }

  auto UnrollDir = D->getDirection(UnrollLevel);

  // If the distance carried by the unrolled loop is 0, then after unrolling
  // that distance will become non-zero resulting in non-overlapping accesses in
  // the inner loops.
  if (UnrollDir == Dependence::DVEntry::EQ)
    return true;

  if (UnrollDir & Dependence::DVEntry::LT &&
      !preservesForwardDependence(Src, Dst, UnrollLevel, JamLevel,
                                  Sequentialized, D.get()))
    return false;

  if (UnrollDir & Dependence::DVEntry::GT &&
      !preservesBackwardDependence(Src, Dst, UnrollLevel, JamLevel,
                                   Sequentialized, D.get()))
    return false;

  return true;
}
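To experiment with the proposed check outside of LLVM, its control flow can be transcribed onto plain direction bit vectors. The following is only a model under assumptions: `DirVec`, `preservesForward`, `preservesBackward` and `isSafeModel` are hypothetical names, the bit values are assumed to mirror `Dependence::DVEntry`, a scalar level is represented as ALL, and the early exits for identical instructions, load/load pairs, missing and confused dependences are left out.

```cpp
#include <cassert>
#include <vector>

// Direction bits; assumed to mirror Dependence::DVEntry's encoding.
enum Dir : unsigned { LT = 1, EQ = 2, GT = 4, ALL = 7 };

// DV[d - 1] holds the direction bits of loop level d (levels are 1-based).
using DirVec = std::vector<unsigned>;

// Models preservesForwardDependence from the snippet above.
bool preservesForward(const DirVec &DV, unsigned UnrollLevel,
                      unsigned JamLevel) {
  for (unsigned d = UnrollLevel + 1; d <= JamLevel; ++d) {
    unsigned JammedDir = DV[d - 1];
    if (JammedDir == LT)
      return true;
    if (JammedDir & GT)
      return false;
  }
  return true;
}

// Models preservesBackwardDependence from the snippet above.
bool preservesBackward(const DirVec &DV, unsigned UnrollLevel,
                       unsigned JamLevel, bool Sequentialized) {
  for (unsigned d = UnrollLevel + 1; d <= JamLevel; ++d) {
    unsigned JammedDir = DV[d - 1];
    if (JammedDir == GT)
      return true;
    if (JammedDir & LT)
      return false;
  }
  // Backward dependencies are only preserved if not interleaved.
  return Sequentialized;
}

// Models the direction-vector part of checkDependency.
bool isSafeModel(const DirVec &DV, unsigned UnrollLevel, unsigned JamLevel,
                 bool Sequentialized) {
  assert(UnrollLevel >= 1 && UnrollLevel <= JamLevel &&
         JamLevel <= DV.size());
  // An outer level without the EQ bit: treated as safe, as in the snippet.
  for (unsigned d = 1; d < UnrollLevel; ++d)
    if (!(DV[d - 1] & EQ))
      return true;

  unsigned UnrollDir = DV[UnrollLevel - 1];
  if (UnrollDir == EQ)
    return true;
  if ((UnrollDir & LT) && !preservesForward(DV, UnrollLevel, JamLevel))
    return false;
  if ((UnrollDir & GT) &&
      !preservesBackward(DV, UnrollLevel, JamLevel, Sequentialized))
    return false;
  return true;
}
```

In this model, [< < >] with the outermost loop unrolled is accepted, and (LT, GT, LT) with the middle loop unrolled is accepted because the outer level has no EQ bit. (LT, ALL) is rejected and (EQ, ALL) is accepted, in line with the description of sub_sub_less and sub_sub_eq above. Assuming the three-level case with a scalar outer level is represented as (ALL, LT, EQ), the model rejects it when the outermost loop is unrolled.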

We should also add a test for the following illegal case involving scalar dependence at the outer level:

// loop k
//   loop i
//     loop j
//       A(i-1, j) = A(i, j)

Sounds right. @Whitney Can you update this patch?

Updated patch as suggested by @bmahjour and @Meinersbur. Thanks!

Harbormaster failed remote builds in B55583: Diff 261716!May 3 2020, 12:45 PM

Meinersbur added inline comments.May 4 2020, 9:09 AM

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp
792	Could you please address this comment?

Addressed the last comment.

Harbormaster failed remote builds in B55656: Diff 261856!May 4 2020, 10:11 AM

LGTM, thank you.

This revision is now accepted and ready to land.May 4 2020, 3:22 PM

Closed by commit rG0a52401ad68b: [LoopUnrollAndJam] Changed safety checks to consider more than 2-levels loop… (authored by Whitney). May 6 2020, 3:20 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Utils/UnrollLoop.h	2 lines
llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp	20 lines
llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp	403 lines
llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll	2 lines
llvm/test/Transforms/LoopUnrollAndJam/dependencies_multidims.ll	219 lines

Diff 262489

llvm/include/llvm/Transforms/Utils/UnrollLoop.h

Show First 20 Lines • Show All 104 Lines • ▼ Show 20 Lines	LoopUnrollResult UnrollAndJamLoop(Loop *L, unsigned Count, unsigned TripCount,
unsigned TripMultiple, bool UnrollRemainder,		unsigned TripMultiple, bool UnrollRemainder,
LoopInfo *LI, ScalarEvolution *SE,		LoopInfo *LI, ScalarEvolution *SE,
DominatorTree *DT, AssumptionCache *AC,		DominatorTree *DT, AssumptionCache *AC,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
OptimizationRemarkEmitter *ORE,		OptimizationRemarkEmitter *ORE,
Loop **EpilogueLoop = nullptr);		Loop **EpilogueLoop = nullptr);

bool isSafeToUnrollAndJam(Loop *L, ScalarEvolution &SE, DominatorTree &DT,		bool isSafeToUnrollAndJam(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
DependenceInfo &DI);		DependenceInfo &DI, LoopInfo &LI);

bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI,		bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI,
DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,		DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
const SmallPtrSetImpl<const Value *> &EphValues,		const SmallPtrSetImpl<const Value *> &EphValues,
OptimizationRemarkEmitter *ORE, unsigned &TripCount,		OptimizationRemarkEmitter *ORE, unsigned &TripCount,
unsigned MaxTripCount, bool MaxOrZero,		unsigned MaxTripCount, bool MaxOrZero,
unsigned &TripMultiple, unsigned LoopSize,		unsigned &TripMultiple, unsigned LoopSize,
TargetTransformInfo::UnrollingPreferences &UP,		TargetTransformInfo::UnrollingPreferences &UP,
Show All 27 Lines

llvm/lib/Transforms/Scalar/LoopUnrollAndJamPass.cpp

Show First 20 Lines • Show All 274 Lines • ▼ Show 20 Lines	static bool computeUnrollAndJamCount(
return false;		return false;
}		}

static LoopUnrollResult		static LoopUnrollResult
tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,		tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,
ScalarEvolution &SE, const TargetTransformInfo &TTI,		ScalarEvolution &SE, const TargetTransformInfo &TTI,
AssumptionCache &AC, DependenceInfo &DI,		AssumptionCache &AC, DependenceInfo &DI,
OptimizationRemarkEmitter &ORE, int OptLevel) {		OptimizationRemarkEmitter &ORE, int OptLevel) {
// Quick checks of the correct loop form
Whitney (inline comment): Not needed as they are already checked in `isSafeToUnrollAndJam`
if (!L->isLoopSimplifyForm() || L->getSubLoops().size() != 1)
return LoopUnrollResult::Unmodified;
Loop *SubLoop = L->getSubLoops()[0];
if (!SubLoop->isLoopSimplifyForm())
return LoopUnrollResult::Unmodified;

BasicBlock *Latch = L->getLoopLatch();
BasicBlock *Exit = L->getExitingBlock();
BasicBlock *SubLoopLatch = SubLoop->getLoopLatch();
BasicBlock *SubLoopExit = SubLoop->getExitingBlock();

if (Latch != Exit || SubLoopLatch != SubLoopExit)
return LoopUnrollResult::Unmodified;

TargetTransformInfo::UnrollingPreferences UP =		TargetTransformInfo::UnrollingPreferences UP =
gatherUnrollingPreferences(L, SE, TTI, nullptr, nullptr, OptLevel, None,		gatherUnrollingPreferences(L, SE, TTI, nullptr, nullptr, OptLevel, None,
None, None, None, None, None, None, None);		None, None, None, None, None, None, None);
if (AllowUnrollAndJam.getNumOccurrences() > 0)		if (AllowUnrollAndJam.getNumOccurrences() > 0)
UP.UnrollAndJam = AllowUnrollAndJam;		UP.UnrollAndJam = AllowUnrollAndJam;
if (UnrollAndJamThreshold.getNumOccurrences() > 0)		if (UnrollAndJamThreshold.getNumOccurrences() > 0)
UP.UnrollAndJamInnerLoopThreshold = UnrollAndJamThreshold;		UP.UnrollAndJamInnerLoopThreshold = UnrollAndJamThreshold;
// Exit early if unrolling is disabled.		// Exit early if unrolling is disabled.
Show All 13 Lines	tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,
// metadata. This means #pragma nounroll will disable unroll and jam as well		// metadata. This means #pragma nounroll will disable unroll and jam as well
// as unrolling		// as unrolling
if (hasAnyUnrollPragma(L, "llvm.loop.unroll.") &&		if (hasAnyUnrollPragma(L, "llvm.loop.unroll.") &&
!hasAnyUnrollPragma(L, "llvm.loop.unroll_and_jam.")) {		!hasAnyUnrollPragma(L, "llvm.loop.unroll_and_jam.")) {
LLVM_DEBUG(dbgs() << " Disabled due to pragma.\n");		LLVM_DEBUG(dbgs() << " Disabled due to pragma.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

if (!isSafeToUnrollAndJam(L, SE, DT, DI)) {		if (!isSafeToUnrollAndJam(L, SE, DT, DI, *LI)) {
LLVM_DEBUG(dbgs() << " Disabled due to not being safe.\n");		LLVM_DEBUG(dbgs() << " Disabled due to not being safe.\n");
return LoopUnrollResult::Unmodified;		return LoopUnrollResult::Unmodified;
}		}

// Approximate the loop size and collect useful info		// Approximate the loop size and collect useful info
unsigned NumInlineCandidates;		unsigned NumInlineCandidates;
bool NotDuplicatable;		bool NotDuplicatable;
bool Convergent;		bool Convergent;
SmallPtrSet<const Value *, 32> EphValues;		SmallPtrSet<const Value *, 32> EphValues;
CodeMetrics::collectEphemeralValues(L, &AC, EphValues);		CodeMetrics::collectEphemeralValues(L, &AC, EphValues);
		Loop *SubLoop = L->getSubLoops()[0];
unsigned InnerLoopSize =		unsigned InnerLoopSize =
ApproximateLoopSize(SubLoop, NumInlineCandidates, NotDuplicatable,		ApproximateLoopSize(SubLoop, NumInlineCandidates, NotDuplicatable,
Convergent, TTI, EphValues, UP.BEInsns);		Convergent, TTI, EphValues, UP.BEInsns);
unsigned OuterLoopSize =		unsigned OuterLoopSize =
ApproximateLoopSize(L, NumInlineCandidates, NotDuplicatable, Convergent,		ApproximateLoopSize(L, NumInlineCandidates, NotDuplicatable, Convergent,
TTI, EphValues, UP.BEInsns);		TTI, EphValues, UP.BEInsns);
LLVM_DEBUG(dbgs() << " Outer Loop Size: " << OuterLoopSize << "\n");		LLVM_DEBUG(dbgs() << " Outer Loop Size: " << OuterLoopSize << "\n");
LLVM_DEBUG(dbgs() << " Inner Loop Size: " << InnerLoopSize << "\n");		LLVM_DEBUG(dbgs() << " Inner Loop Size: " << InnerLoopSize << "\n");
Show All 21 Lines	tryToUnrollAndJamLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,
// for the jammed inner loop.		// for the jammed inner loop.
Optional<MDNode *> NewInnerEpilogueLoopID = makeFollowupLoopID(		Optional<MDNode *> NewInnerEpilogueLoopID = makeFollowupLoopID(
OrigOuterLoopID, {LLVMLoopUnrollAndJamFollowupAll,		OrigOuterLoopID, {LLVMLoopUnrollAndJamFollowupAll,
LLVMLoopUnrollAndJamFollowupRemainderInner});		LLVMLoopUnrollAndJamFollowupRemainderInner});
if (NewInnerEpilogueLoopID.hasValue())		if (NewInnerEpilogueLoopID.hasValue())
SubLoop->setLoopID(NewInnerEpilogueLoopID.getValue());		SubLoop->setLoopID(NewInnerEpilogueLoopID.getValue());

// Find trip count and trip multiple		// Find trip count and trip multiple
		BasicBlock *Latch = L->getLoopLatch();
		BasicBlock *SubLoopLatch = SubLoop->getLoopLatch();
unsigned OuterTripCount = SE.getSmallConstantTripCount(L, Latch);		unsigned OuterTripCount = SE.getSmallConstantTripCount(L, Latch);
unsigned OuterTripMultiple = SE.getSmallConstantTripMultiple(L, Latch);		unsigned OuterTripMultiple = SE.getSmallConstantTripMultiple(L, Latch);
unsigned InnerTripCount = SE.getSmallConstantTripCount(SubLoop, SubLoopLatch);		unsigned InnerTripCount = SE.getSmallConstantTripCount(SubLoop, SubLoopLatch);

// Decide if, and by how much, to unroll		// Decide if, and by how much, to unroll
bool IsCountSetExplicitly = computeUnrollAndJamCount(		bool IsCountSetExplicitly = computeUnrollAndJamCount(
L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount,		L, SubLoop, TTI, DT, LI, SE, EphValues, &ORE, OuterTripCount,
OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP);		OuterTripMultiple, OuterLoopSize, InnerTripCount, InnerLoopSize, UP);
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines

static bool tryToUnrollAndJamLoop(Function &F, DominatorTree &DT, LoopInfo &LI,		static bool tryToUnrollAndJamLoop(Function &F, DominatorTree &DT, LoopInfo &LI,
ScalarEvolution &SE,		ScalarEvolution &SE,
const TargetTransformInfo &TTI,		const TargetTransformInfo &TTI,
AssumptionCache &AC, DependenceInfo &DI,		AssumptionCache &AC, DependenceInfo &DI,
OptimizationRemarkEmitter &ORE,		OptimizationRemarkEmitter &ORE,
int OptLevel) {		int OptLevel) {
bool DidSomething = false;		bool DidSomething = false;

// The loop unroll and jam pass requires loops to be in simplified form, and		// The loop unroll and jam pass requires loops to be in simplified form, and
Whitney (inline comment): Just clang-format
Meinersbur (inline comment): Could you commit this separately? I.e. just push a NFC commit explaining you are working on this file.
// also needs LCSSA. Since simplification may add new inner loops, it has to		// also needs LCSSA. Since simplification may add new inner loops, it has to
// run before the legality and profitability checks. This means running the		// run before the legality and profitability checks. This means running the
// loop unroll and jam pass will simplify all loops, regardless of whether		// loop unroll and jam pass will simplify all loops, regardless of whether
// anything end up being unroll and jammed.		// anything end up being unroll and jammed.
for (auto &L : LI) {		for (auto &L : LI) {
DidSomething |=		DidSomething |=
simplifyLoop(L, &DT, &LI, &SE, &AC, nullptr, false /* PreserveLCSSA */);		simplifyLoop(L, &DT, &LI, &SE, &AC, nullptr, false /* PreserveLCSSA */);
DidSomething |= formLCSSARecursively(*L, DT, &LI, &SE);		DidSomething |= formLCSSARecursively(*L, DT, &LI, &SE);
▲ Show 20 Lines • Show All 93 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp

Show All 9 Lines
// LoopUnroll.cpp implements loop unroll.		// LoopUnroll.cpp implements loop unroll.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
		#include "llvm/ADT/Sequence.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/DependenceAnalysis.h"		#include "llvm/Analysis/DependenceAnalysis.h"
Show All 38 Lines

STATISTIC(NumUnrolledAndJammed, "Number of loops unroll and jammed");		STATISTIC(NumUnrolledAndJammed, "Number of loops unroll and jammed");
STATISTIC(NumCompletelyUnrolledAndJammed, "Number of loops unroll and jammed");		STATISTIC(NumCompletelyUnrolledAndJammed, "Number of loops unroll and jammed");

typedef SmallPtrSet<BasicBlock *, 4> BasicBlockSet;		typedef SmallPtrSet<BasicBlock *, 4> BasicBlockSet;

// Partition blocks in an outer/inner loop pair into blocks before and after		// Partition blocks in an outer/inner loop pair into blocks before and after
// the loop		// the loop
static bool partitionOuterLoopBlocks(Loop *L, Loop *SubLoop,		static bool partitionLoopBlocks(Loop &L, BasicBlockSet &ForeBlocks,
BasicBlockSet &ForeBlocks,		BasicBlockSet &AftBlocks, DominatorTree &DT) {
BasicBlockSet &SubLoopBlocks,		Loop *SubLoop = L.getSubLoops()[0];
BasicBlockSet &AftBlocks,
DominatorTree *DT) {
BasicBlock *SubLoopLatch = SubLoop->getLoopLatch();		BasicBlock *SubLoopLatch = SubLoop->getLoopLatch();
SubLoopBlocks.insert(SubLoop->block_begin(), SubLoop->block_end());

for (BasicBlock *BB : L->blocks()) {		for (BasicBlock *BB : L.blocks()) {
if (!SubLoop->contains(BB)) {		if (!SubLoop->contains(BB)) {
if (DT->dominates(SubLoopLatch, BB))		if (DT.dominates(SubLoopLatch, BB))
AftBlocks.insert(BB);		AftBlocks.insert(BB);
else		else
ForeBlocks.insert(BB);		ForeBlocks.insert(BB);
}		}
}		}

// Check that all blocks in ForeBlocks together dominate the subloop		// Check that all blocks in ForeBlocks together dominate the subloop
// TODO: This might ideally be done better with a dominator/postdominators.		// TODO: This might ideally be done better with a dominator/postdominators.
BasicBlock *SubLoopPreHeader = SubLoop->getLoopPreheader();		BasicBlock *SubLoopPreHeader = SubLoop->getLoopPreheader();
for (BasicBlock *BB : ForeBlocks) {		for (BasicBlock *BB : ForeBlocks) {
if (BB == SubLoopPreHeader)		if (BB == SubLoopPreHeader)
continue;		continue;
Instruction *TI = BB->getTerminator();		Instruction *TI = BB->getTerminator();
for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)		for (BasicBlock *Succ : successors(TI))
if (!ForeBlocks.count(TI->getSuccessor(i)))		if (!ForeBlocks.count(Succ))
		return false;
		}

		return true;
		}

		/// Partition blocks in a loop nest into blocks before and after each inner
		Meinersbur (inline comment): Could you add a doxygen description for this function?
		/// loop.
		static bool partitionOuterLoopBlocks(
		Loop &Root, Loop &JamLoop, BasicBlockSet &JamLoopBlocks,
		DenseMap<Loop *, BasicBlockSet> &ForeBlocksMap,
		DenseMap<Loop *, BasicBlockSet> &AftBlocksMap, DominatorTree &DT) {
		JamLoopBlocks.insert(JamLoop.block_begin(), JamLoop.block_end());

		for (Loop *L : Root.getLoopsInPreorder()) {
		if (L == &JamLoop)
		break;

		if (!partitionLoopBlocks(*L, ForeBlocksMap[L], AftBlocksMap[L], DT))
return false;		return false;
}		}

return true;		return true;
}		}
		Meinersbur (inline comment): [nit] TODO's should not be doxygen comments

		// TODO Remove when UnrollAndJamLoop changed to support unroll and jamming more
		// than 2 levels loop.
		static bool partitionOuterLoopBlocks(Loop *L, Loop *SubLoop,
		BasicBlockSet &ForeBlocks,
		BasicBlockSet &SubLoopBlocks,
		BasicBlockSet &AftBlocks,
		DominatorTree *DT) {
		SubLoopBlocks.insert(SubLoop->block_begin(), SubLoop->block_end());
		return partitionLoopBlocks(*L, ForeBlocks, AftBlocks, *DT);
		}

// Looks at the phi nodes in Header for values coming from Latch. For these		// Looks at the phi nodes in Header for values coming from Latch. For these
// instructions and all their operands calls Visit on them, keeping going for		// instructions and all their operands calls Visit on them, keeping going for
// all the operands in AftBlocks. Returns false if Visit returns false,		// all the operands in AftBlocks. Returns false if Visit returns false,
// otherwise returns true. This is used to process the instructions in the		// otherwise returns true. This is used to process the instructions in the
// Aft blocks that need to be moved before the subloop. It is used in two		// Aft blocks that need to be moved before the subloop. It is used in two
// places. One to check that the required set of instructions can be moved		// places. One to check that the required set of instructions can be moved
// before the loop. Then to collect the instructions to actually move in		// before the loop. Then to collect the instructions to actually move in
// moveHeaderPhiOperandsToForeBlocks.		// moveHeaderPhiOperandsToForeBlocks.
▲ Show 20 Lines • Show All 502 Lines • ▼ Show 20 Lines	#ifndef NDEBUG
SE->verify();		SE->verify();
#endif		#endif

return CompletelyUnroll ? LoopUnrollResult::FullyUnrolled		return CompletelyUnroll ? LoopUnrollResult::FullyUnrolled
: LoopUnrollResult::PartiallyUnrolled;		: LoopUnrollResult::PartiallyUnrolled;
}		}

static bool getLoadsAndStores(BasicBlockSet &Blocks,		static bool getLoadsAndStores(BasicBlockSet &Blocks,
SmallVector<Value *, 4> &MemInstr) {		SmallVector<Instruction *, 4> &MemInstr) {
// Scan the BBs and collect legal loads and stores.		// Scan the BBs and collect legal loads and stores.
// Returns false if non-simple loads/stores are found.		// Returns false if non-simple loads/stores are found.
for (BasicBlock *BB : Blocks) {		for (BasicBlock *BB : Blocks) {
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
if (auto *Ld = dyn_cast<LoadInst>(&I)) {		if (auto *Ld = dyn_cast<LoadInst>(&I)) {
if (!Ld->isSimple())		if (!Ld->isSimple())
return false;		return false;
MemInstr.push_back(&I);		MemInstr.push_back(&I);
} else if (auto *St = dyn_cast<StoreInst>(&I)) {		} else if (auto *St = dyn_cast<StoreInst>(&I)) {
if (!St->isSimple())		if (!St->isSimple())
return false;		return false;
MemInstr.push_back(&I);		MemInstr.push_back(&I);
} else if (I.mayReadOrWriteMemory()) {		} else if (I.mayReadOrWriteMemory()) {
return false;		return false;
}		}
}		}
}		}
return true;		return true;
}		}

static bool checkDependencies(SmallVector<Value *, 4> &Earlier,		static bool preservesForwardDependence(Instruction *Src, Instruction *Dst,
SmallVector<Value *, 4> &Later,		unsigned UnrollLevel, unsigned JamLevel,
unsigned LoopDepth, bool InnerLoop,		bool Sequentialized, Dependence *D) {
DependenceInfo &DI) {		// UnrollLevel might carry the dependency Src --> Dst
// Use DA to check for dependencies between loads and stores that make unroll		// Does a different loop after unrolling?
// and jam invalid		for (unsigned CurLoopDepth = UnrollLevel + 1; CurLoopDepth <= JamLevel;
for (Value *I : Earlier) {		++CurLoopDepth) {
for (Value *J : Later) {		auto JammedDir = D->getDirection(CurLoopDepth);
Instruction *Src = cast<Instruction>(I);		if (JammedDir == Dependence::DVEntry::LT)
Instruction *Dst = cast<Instruction>(J);		return true;

		if (JammedDir & Dependence::DVEntry::GT)
		return false;
		}

		return true;
		}

		static bool preservesBackwardDependence(Instruction *Src, Instruction *Dst,
		unsigned UnrollLevel, unsigned JamLevel,
		bool Sequentialized, Dependence *D) {
		// UnrollLevel might carry the dependency Dst --> Src
		for (unsigned CurLoopDepth = UnrollLevel + 1; CurLoopDepth <= JamLevel;
		++CurLoopDepth) {
		auto JammedDir = D->getDirection(CurLoopDepth);
		if (JammedDir == Dependence::DVEntry::GT)
		return true;

		if (JammedDir & Dependence::DVEntry::LT)
		return false;
		}

		// Backward dependencies are only preserved if not interleaved.
		return Sequentialized;
		}
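
For readers following the two helpers above outside of LLVM, the scan over the jammed levels can be sketched in standalone C++. This is an illustration only, not the patch itself: `DirectionVector`, `preservesForward`, and `preservesBackward` are hypothetical stand-ins, and a plain vector replaces the calls to `D->getDirection(Level)`. The bit values are assumed to follow `Dependence::DVEntry` (LT = 1, EQ = 2, GT = 4).

```cpp
#include <cassert>
#include <vector>

// Direction bits, assumed to match llvm::Dependence::DVEntry.
constexpr unsigned LT = 1, EQ = 2, GT = 4;

// Hypothetical stand-in for a Dependence: Dirs[Level - 1] holds the
// direction bits at the 1-based loop depth Level.
using DirectionVector = std::vector<unsigned>;

// Forward case: walk the levels between the unrolled and the jammed loop.
// A proven '<' keeps the order intact; any possible '>' does not.
bool preservesForward(const DirectionVector &Dirs, unsigned UnrollLevel,
                      unsigned JamLevel) {
  assert(UnrollLevel <= JamLevel && JamLevel <= Dirs.size());
  for (unsigned Level = UnrollLevel + 1; Level <= JamLevel; ++Level) {
    unsigned Dir = Dirs[Level - 1];
    if (Dir == LT)
      return true;
    if (Dir & GT)
      return false;
  }
  return true;
}

// Backward case: mirror image of the forward scan. If no inner level
// decides, the dependence only survives when the copies are not interleaved.
bool preservesBackward(const DirectionVector &Dirs, unsigned UnrollLevel,
                       unsigned JamLevel, bool Sequentialized) {
  assert(UnrollLevel <= JamLevel && JamLevel <= Dirs.size());
  for (unsigned Level = UnrollLevel + 1; Level <= JamLevel; ++Level) {
    unsigned Dir = Dirs[Level - 1];
    if (Dir == GT)
      return true;
    if (Dir & LT)
      return false;
  }
  return Sequentialized;
}
```

Note the asymmetry the sketch makes visible: the early "safe" exit uses an exact comparison, while the "unsafe" exit uses a bit test, so an unknown direction at an inner level is treated conservatively.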

		// Check whether it is semantically safe Src and Dst considering any potential
		// dependency between them.
		//
		// @param UnrollLevel The level of the loop being unrolled
		// @param JamLevel The level of the loop being jammed; if Src and Dst are on
		// different levels, the outermost common loop counts as jammed level
		//
		// @return true if is safe and false if there is a dependency violation.
		static bool checkDependency(Instruction *Src, Instruction *Dst,
		unsigned UnrollLevel, unsigned JamLevel,
		bool Sequentialized, DependenceInfo &DI) {
		assert(UnrollLevel <= JamLevel &&
		Meinersbur: [serious] This does not correspond to the `getDirection() == GT` from my version. In particular, this version lets the loop pass if the outer loops have `NE` or `LT` dependencies. These do not ensure lexicographic greater-than and therefore cause a misoptimization.
		[remark] While some style guides prefer this functional style, I prefer the version with explicit loops and the comment that explains what the code is doing.
		"Expecting JamLevel to be at least UnrollLevel");

if (Src == Dst)		if (Src == Dst)
continue;		return true;
// Ignore Input dependencies.		// Ignore Input dependencies.
if (isa<LoadInst>(Src) && isa<LoadInst>(Dst))		if (isa<LoadInst>(Src) && isa<LoadInst>(Dst))
continue;		return true;

// Track dependencies, and if we find them take a conservative approach		// Check whether unroll-and-jam may violate a dependency.
// by allowing only = or < (not >), altough some > would be safe		// By construction, every dependency will be lexicographically non-negative
// (depending upon unroll width).		// (if it was, it would violate the current execution order), such as
Meinersbur: I don't see this being enforced.
Whitney: `if (D->getDirection(LoopDepth) & Dependence::DVEntry::GT) return false;` => `allowing only = or < (not >)`
// For the inner loop, we need to disallow any (> <) dependencies		// (0,0,>,*,*)
// FIXME: Allow > so long as distance is less than unroll width		// Unroll-and-jam changes the GT execution of two executions to the same
if (auto D = DI.depends(Src, Dst, true)) {		// iteration of the chosen unroll level. That is, a GT dependence becomes a GE
		// dependence (or EQ, if we fully unrolled the loop) at the loop's position:
		// (0,0,>=,*,*)
		// Now, the dependency is not necessarily non-negative anymore, i.e.
		// unroll-and-jam may violate correctness.
		std::unique_ptr<Dependence> D = DI.depends(Src, Dst, true);
		if (!D)
		return true;
assert(D->isOrdered() && "Expected an output, flow or anti dep.");		assert(D->isOrdered() && "Expected an output, flow or anti dep.");

if (D->isConfused()) {		if (D->isConfused()) {
LLVM_DEBUG(dbgs() << " Confused dependency between:\n"		LLVM_DEBUG(dbgs() << " Confused dependency between:\n"
<< " " << *Src << "\n"		<< " " << *Src << "\n"
<< " " << *Dst << "\n");		<< " " << *Dst << "\n");
return false;		return false;
}		}
if (!InnerLoop) {
if (D->getDirection(LoopDepth) & Dependence::DVEntry::GT) {		// If outer levels (levels enclosing the loop being unroll-and-jammed) have a
LLVM_DEBUG(dbgs() << " > dependency between:\n"		// non-equal direction, then the locations accessed in the inner levels cannot
<< " " << *Src << "\n"		// overlap in memory. We assumes the indexes never overlap into neighboring
<< " " << *Dst << "\n");		// dimensions.
		for (unsigned CurLoopDepth = 1; CurLoopDepth < UnrollLevel; ++CurLoopDepth)
		if (!(D->getDirection(CurLoopDepth) & Dependence::DVEntry::EQ))
		return true;

		auto UnrollDirection = D->getDirection(UnrollLevel);

		// If the distance carried by the unrolled loop is 0, then after unrolling
		// that distance will become non-zero resulting in non-overlapping accesses in
		// the inner loops.
		if (UnrollDirection == Dependence::DVEntry::EQ)
		return true;

		if (UnrollDirection & Dependence::DVEntry::LT &&
		!preservesForwardDependence(Src, Dst, UnrollLevel, JamLevel,
		Sequentialized, D.get()))
return false;		return false;

		if (UnrollDirection & Dependence::DVEntry::GT &&
		!preservesBackwardDependence(Src, Dst, UnrollLevel, JamLevel,
		Sequentialized, D.get()))
		return false;

		return true;
}		}
} else {
assert(LoopDepth + 1 <= D->getLevels());		static bool
if (D->getDirection(LoopDepth) & Dependence::DVEntry::GT &&		checkDependencies(Loop &Root, const BasicBlockSet &SubLoopBlocks,
D->getDirection(LoopDepth + 1) & Dependence::DVEntry::LT) {		const DenseMap<Loop *, BasicBlockSet> &ForeBlocksMap,
LLVM_DEBUG(dbgs() << " < > dependency between:\n"		const DenseMap<Loop *, BasicBlockSet> &AftBlocksMap,
<< " " << *Src << "\n"		DependenceInfo &DI, LoopInfo &LI) {
<< " " << *Dst << "\n");		SmallVector<BasicBlockSet, 8> AllBlocks;
		for (Loop *L : Root.getLoopsInPreorder())
		if (ForeBlocksMap.find(L) != ForeBlocksMap.end())
		AllBlocks.push_back(ForeBlocksMap.lookup(L));
		AllBlocks.push_back(SubLoopBlocks);
		for (Loop *L : Root.getLoopsInPreorder())
		if (AftBlocksMap.find(L) != AftBlocksMap.end())
		AllBlocks.push_back(AftBlocksMap.lookup(L));

		unsigned LoopDepth = Root.getLoopDepth();
		Meinersbur: This should be `for (size_t j = i; j < NumInsts; ++j)` to not skip over self-dependencies of a store.
		Meinersbur: Could you please address this comment?
		SmallVector<Instruction *, 4> EarlierLoadsAndStores;
		SmallVector<Instruction *, 4> CurrentLoadsAndStores;
		for (BasicBlockSet &Blocks : AllBlocks) {
		CurrentLoadsAndStores.clear();
		if (!getLoadsAndStores(Blocks, CurrentLoadsAndStores))
		return false;

		Loop *CurLoop = LI.getLoopFor((*Blocks.begin())->front().getParent());
		unsigned CurLoopDepth = CurLoop->getLoopDepth();

		for (auto *Earlier : EarlierLoadsAndStores) {
		Loop *EarlierLoop = LI.getLoopFor(Earlier->getParent());
		unsigned EarlierDepth = EarlierLoop->getLoopDepth();
		unsigned CommonLoopDepth = std::min(EarlierDepth, CurLoopDepth);
		for (auto *Later : CurrentLoadsAndStores) {
		if (!checkDependency(Earlier, Later, LoopDepth, CommonLoopDepth, false,
		DI))
return false;		return false;
}		}
}		}

		size_t NumInsts = CurrentLoadsAndStores.size();
		for (size_t I = 0; I < NumInsts; ++I) {
		for (size_t J = I; J < NumInsts; ++J) {
		if (!checkDependency(CurrentLoadsAndStores[I], CurrentLoadsAndStores[J],
		LoopDepth, CurLoopDepth, true, DI))
		return false;
}		}
		Meinersbur: I think this should be looking for whether the first `GT` direction is due to the root loop (i.e. whether the root loop is the cause for the dependence to be fulfilled). Not a correctness issue.
		The bitwise `&` comparison also matches `NE` and `GE`. Is this intended?
		Whitney:
		> I think this is should be looking for whether the first GT direction is due to the root loop (i.e. whether the root loop is the cause for the dependence to be fulfilled). Not an correctness issue.
		Here `LoopDepth` is assumed to be the loop depth of the unroll loop.
		> The bitwise & comparison also matches NE and GE. Is this intended?
		The code before my change is using `&`, so I don't want to change the behaviour.
		Meinersbur:
		> Here LoopDepth is assumed to be the loop depth of the unroll loop.
		I know. However, the dependency might also be fulfilled by surrounding loops, e.g.:
		    for i = 0 ... n-1
		      #pragma unrollandjam
		      for j = 0 ... n-1
		        for k = 0 ... n-1:
		          A[i][j][k] = f(A[i-1][j-1][k]);
		The dependence vector is (>,>,=). Verifying the direction of the j-dependence is not necessary because for the same `i`, the inner loops are parallel. In other words: the `i`-loop already ensures that the dependency is fulfilled.
		Whitney: Added code to allow accesses of different base.
		Meinersbur: This looks like a bail-out; might improve this by setting `CurLoop` to innermost loop both instructions are in. I'd prefer the following structure:
		    if (D->getDirection(LoopDepth) & Dependence::DVEntry::GT) { // That is, as by the other comment, check whether the root loop carries this dependency
		      if (!CurLoop)
		        return false; // Bail-out
		      ...
		    }
		Whitney:
		> might improve this by setting CurLoop to innermost loop both instructions are in.
		Do you mean passing `LI` into this function, and calculating `CurLoop` here?
		Meinersbur: Where the innermost common loop is computed is unimportant.
		Whitney: Then I think I misunderstood; currently CurLoop is only set when both instructions belong to the same loop.
}		}

		Meinersbur: [style] Don't evaluate `Cur->getLoopDepth()` every time through the loop (see https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop). Consider using `llvm::seq`.
		EarlierLoadsAndStores.append(CurrentLoadsAndStores.begin(),
		CurrentLoadsAndStores.end());
		Meinersbur: Isn't this equivalent to `D->getDirection(CurLoopDepth) == EQ`?
		Whitney: Could also be `ALL`. I think `isScalar` is clearer.
		Meinersbur: `ALL` would be `*` in the dependence vector, i.e. the analysis could not prove any direction. However, it's also not `EQ`. Sorry for the confusion. I am not convinced it can be ignored.
		    #pragma unrollandjam
		    for i = ...
		      for j = ...
		        sum = f(sum, ...);
		The dependence induced by sum is obviously scalar, but it cannot be jammed (it's sequential). The dependence vector is (=>,>)
		Whitney: Why can the above example not be jammed?
		Before jam:
		    for i = ...
		      for j = ...
		        sum = f(sum, ...);
		      for j = ...
		        sum = f(sum, ...);
		After jam:
		    for i = ...
		      for j = ...
		        sum = f(sum, ...);
		        sum = f(sum, ...);
		The access pattern seems to be the same to me.
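
One way to probe the question in this thread is to run the `sum` loop in both orders and compare the results. The following is a minimal standalone sketch, not code from the patch: `Update`, `runOriginal`, and `runUnrollAndJam` are hypothetical names, the unroll factor is fixed at 2, and the outer trip count is assumed to be even. It only models the order in which `f` is applied; it says nothing about how DependenceAnalysis reports the dependence.

```cpp
#include <cassert>
#include <functional>

// Hypothetical update function: new sum from old sum and the (i, j) pair.
using Update = std::function<long(long, int, int)>;

// Original nest: all j iterations of outer iteration i run before i + 1.
long runOriginal(int N, int M, const Update &F) {
  long Sum = 0;
  for (int I = 0; I < N; ++I)
    for (int J = 0; J < M; ++J)
      Sum = F(Sum, I, J);
  return Sum;
}

// Outer loop unrolled by 2 and the two inner loops jammed: iterations
// (i, j) and (i + 1, j) now alternate inside a single j loop.
long runUnrollAndJam(int N, int M, const Update &F) {
  assert(N % 2 == 0 && "sketch assumes an even outer trip count");
  long Sum = 0;
  for (int I = 0; I < N; I += 2)
    for (int J = 0; J < M; ++J) {
      Sum = F(Sum, I, J);
      Sum = F(Sum, I + 1, J);
    }
  return Sum;
}
```

With a plain addition as the update the two orders agree, but with an order-sensitive update such as `S * 3 + value` they generally do not, which is one way to read the "it's sequential" remark above.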
}		}
return true;		return true;
}		}
		Meinersbur: Again, the `&` comparison feels wrong. Did you consider a switch over the values of `Dependence::DVEntry`?
		Whitney: The code before my change is using `&`. I considered changing to use dependence distance instead of direction to allow more dependence, but I want to keep this patch as changes needed for safety checks for more than 2-levels loop nest. There is a FIXME: `// FIXME: Allow > so long as distance is less than unroll width` What do you think?
		Meinersbur: I think taking the dependence distance into account is something for a different patch.
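
The difference between the `&` test and an exact comparison discussed in this thread can be shown with a small standalone sketch. The `Direction` enum below is a stand-in that is assumed to mirror the bit encoding of `Dependence::DVEntry` (LT = 1, EQ = 2, GT = 4, with the combined values built from those bits); `mayBeGT` and `isExactlyGT` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>

// Stand-in for llvm::Dependence::DVEntry; each direction is a bit set.
enum Direction : unsigned {
  NONE = 0,
  LT = 1,
  EQ = 2,
  LE = 3, // LT | EQ
  GT = 4,
  NE = 5, // LT | GT
  GE = 6, // EQ | GT
  ALL = 7 // LT | EQ | GT, i.e. '*' in a dependence vector
};

// Bit test: true whenever '>' is one of the possible directions.
inline bool mayBeGT(unsigned Dir) { return (Dir & GT) != 0; }

// Exact test: true only when '>' is the single proven direction.
inline bool isExactlyGT(unsigned Dir) { return Dir == GT; }
```

Which form is appropriate depends on what the check is for: rejecting a transformation should use the conservative bit test, while accepting one because a level carries the dependence needs the exact comparison.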

static bool checkDependencies(Loop *L, BasicBlockSet &ForeBlocks,		static bool isEligibleLoopForm(const Loop &Root) {
BasicBlockSet &SubLoopBlocks,		// Root must have a child.
		Meinersbur: Could you write a comment about what kind of dependencies this is looking for/are not allowed?
BasicBlockSet &AftBlocks, DependenceInfo &DI) {		if (Root.getSubLoops().size() != 1)
// Get all loads/store pairs for each blocks		return false;
SmallVector<Value *, 4> ForeMemInstr;
SmallVector<Value *, 4> SubLoopMemInstr;		const Loop *L = &Root;
SmallVector<Value *, 4> AftMemInstr;		do {
if (!getLoadsAndStores(ForeBlocks, ForeMemInstr) ||		// All loops in Root need to be in simplify and rotated form.
!getLoadsAndStores(SubLoopBlocks, SubLoopMemInstr) ||		if (!L->isLoopSimplifyForm())
		Meinersbur: It could also be `ALL` (or `GE`), meaning we don't know what direction it is. This check gives it a pass.
		Whitney: Changed to `if (D->getDirection(CurLoopDepth) == Dependence::DVEntry::GT)`
!getLoadsAndStores(AftBlocks, AftMemInstr))		return false;
return false;
		if (!L->isRotatedForm())
		Meinersbur: Why is rotated form necessary?
		Whitney: It was there before:
		    if (Latch != Exit)
		      return false;
		    if (SubLoopLatch != SubLoopExit)
		      return false;
		But at that time the function `isRotatedForm` didn't exist.
// Check for dependencies between any blocks that may change order		return false;
unsigned LoopDepth = L->getLoopDepth();
return checkDependencies(ForeMemInstr, SubLoopMemInstr, LoopDepth, false,		if (L->getHeader()->hasAddressTaken()) {
DI) &&		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Address taken\n");
checkDependencies(ForeMemInstr, AftMemInstr, LoopDepth, false, DI) &&		return false;
checkDependencies(SubLoopMemInstr, AftMemInstr, LoopDepth, false,		}
DI) &&
		Meinersbur: [style] Don't evaluate `end()` every time through a loop (see https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop).
checkDependencies(SubLoopMemInstr, SubLoopMemInstr, LoopDepth, true,		unsigned SubLoopsSize = L->getSubLoops().size();
DI);		if (SubLoopsSize == 0)
		return true;

		// Only one child is allowed.
		if (SubLoopsSize != 1)
		return false;

		dmgreen: This looks like it would recalculate LoadsAndStores a lot?
		Src and Dst don't seem like the right nomenclature too, but that's just a small nit.
		Whitney: Renamed, and removed the calculation of one of the LoadsAndStores in the outer loop. I considered creating a map, but the code may get complicated. Is the current change okay with you, or do you prefer a map?
		Whitney: Renamed, and moved the calculation of one of the LoadsAndStores in the outer loop. I considered creating a map, but the code may get complicated. Is the current change okay with you, or do you prefer a map?
		dmgreen: You may be right. It looks O(n^2), but perhaps n will not be too high?
		Whitney: n is `(loopdepth - 1) * 2 + 1`. I personally think it is ok.
		L = L->getSubLoops()[0];
		} while (L);

		return true;
		Meinersbur: [style] `(*I).begin()` -> `I->begin()`?
		}

		static Loop *getInnerMostLoop(Loop *L) {
		while (!L->getSubLoops().empty())
		dmgreen: `size() != 0` -> `!empty()` ?
		L = L->getSubLoops()[0];
		return L;
}		}

bool llvm::isSafeToUnrollAndJam(Loop *L, ScalarEvolution &SE, DominatorTree &DT,		bool llvm::isSafeToUnrollAndJam(Loop *L, ScalarEvolution &SE, DominatorTree &DT,
		Meinersbur: Can the following be handled correctly?
		* Multiple exits
		Whitney: I think it should, but I will write a LIT test to be sure. Only one AftBlock is allowed in the outermost loop.
		Whitney: Multiple exits are not allowed, and that's blocked in `partitionOuterLoopBlocks`
DependenceInfo &DI) {		DependenceInfo &DI, LoopInfo &LI) {
		if (!isEligibleLoopForm(*L)) {
		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Ineligible loop form\n");
		return false;
		}

/* We currently handle outer loops like this:		/* We currently handle outer loops like this:
		Meinersbur: Does the comment need updating?
|		|
ForeFirst <----\ }		ForeFirst <------\ }
Blocks | } ForeBlocks		Blocks | } ForeBlocks of L
ForeLast | }		ForeLast | }
| |		| |
SubLoopFirst <\ | }		... |
Blocks | | } SubLoopBlocks		| |
SubLoopLast -/ | }		ForeFirst <----\ | }
		Blocks | | } ForeBlocks of a inner loop of L
		ForeLast | | }
		| | |
		JamLoopFirst <\ | | }
		Blocks | | | } JamLoopBlocks of the innermost loop
		JamLoopLast -/ | | }
		| | |
		AftFirst | | }
		Blocks | | } AftBlocks of a inner loop of L
		AftLast ------/ | }
		| |
		... |
| |		| |
AftFirst | }		AftFirst | }
Blocks | } AftBlocks		Blocks | } AftBlocks of L
AftLast ------/ }		AftLast --------/ }
|		|

There are (theoretically) any number of blocks in ForeBlocks, SubLoopBlocks		There are (theoretically) any number of blocks in ForeBlocks, SubLoopBlocks
and AftBlocks, providing that there is one edge from Fores to SubLoops,		and AftBlocks, providing that there is one edge from Fores to SubLoops,
one edge from SubLoops to Afts and a single outer loop exit (from Afts).		one edge from SubLoops to Afts and a single outer loop exit (from Afts).
In practice we currently limit Aft blocks to a single block, and limit		In practice we currently limit Aft blocks to a single block, and limit
things further in the profitablility checks of the unroll and jam pass.		things further in the profitablility checks of the unroll and jam pass.

Because of the way we rearrange basic blocks, we also require that		Because of the way we rearrange basic blocks, we also require that
the Fore blocks on all unrolled iterations are safe to move before the		the Fore blocks of L on all unrolled iterations are safe to move before the
SubLoop blocks of all iterations. So we require that the phi node looping		blocks of the direct child of L of all iterations. So we require that the
operands of ForeHeader can be moved to at least the end of ForeEnd, so that		phi node looping operands of ForeHeader can be moved to at least the end of
we can arrange cloned Fore Blocks before the subloop and match up Phi's		ForeEnd, so that we can arrange cloned Fore Blocks before the subloop and
correctly.		match up Phi's correctly.

i.e. The old order of blocks used to be F1 S1_1 S1_2 A1 F2 S2_1 S2_2 A2.		i.e. The old order of blocks used to be
It needs to be safe to tranform this to F1 F2 S1_1 S2_1 S1_2 S2_2 A1 A2.		(F1)1 (F2)1 J1_1 J1_2 (A2)1 (A1)1 (F1)2 (F2)2 J2_1 J2_2 (A2)2 (A1)2.
		It needs to be safe to transform this to
		dmgreen: I realize you didn't write the original, but tranform -> transform.
		I believe the old message was intending to show array indices, so Fi was F(i) and Si_j was S(i,j), showing that the second unrolled iteration needs to be able to move past the first, in terms of runtime execution.
		Whitney: Fixed the typo, and added braces; is it not any clearer?
		(F1)1 (F1)2 (F2)1 (F2)2 J1_1 J1_2 J2_1 J2_2 (A2)1 (A2)2 (A1)1 (A1)2.

There are then a number of checks along the lines of no calls, no		There are then a number of checks along the lines of no calls, no
exceptions, inner loop IV is consistent, etc. Note that for loops requiring		exceptions, inner loop IV is consistent, etc. Note that for loops requiring
runtime unrolling, UnrollRuntimeLoopRemainder can also fail in		runtime unrolling, UnrollRuntimeLoopRemainder can also fail in
UnrollAndJamLoop if the trip count cannot be easily calculated.		UnrollAndJamLoop if the trip count cannot be easily calculated.
*/		*/

if (!L->isLoopSimplifyForm() || L->getSubLoops().size() != 1)
Whitney: Now done in `isEligibleLoopForm`
return false;
Loop *SubLoop = L->getSubLoops()[0];
if (!SubLoop->isLoopSimplifyForm())
return false;

BasicBlock *Header = L->getHeader();
BasicBlock *Latch = L->getLoopLatch();
BasicBlock *Exit = L->getExitingBlock();
BasicBlock *SubLoopHeader = SubLoop->getHeader();
BasicBlock *SubLoopLatch = SubLoop->getLoopLatch();
BasicBlock *SubLoopExit = SubLoop->getExitingBlock();

if (Latch != Exit)
return false;
if (SubLoopLatch != SubLoopExit)
return false;

if (Header->hasAddressTaken() || SubLoopHeader->hasAddressTaken()) {
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Address taken\n");
return false;
}

// Split blocks into Fore/SubLoop/Aft based on dominators		// Split blocks into Fore/SubLoop/Aft based on dominators
		Loop *JamLoop = getInnerMostLoop(L);
BasicBlockSet SubLoopBlocks;		BasicBlockSet SubLoopBlocks;
BasicBlockSet ForeBlocks;		DenseMap<Loop *, BasicBlockSet> ForeBlocksMap;
BasicBlockSet AftBlocks;		DenseMap<Loop *, BasicBlockSet> AftBlocksMap;
if (!partitionOuterLoopBlocks(L, SubLoop, ForeBlocks, SubLoopBlocks,		if (!partitionOuterLoopBlocks(L, JamLoop, SubLoopBlocks, ForeBlocksMap,
AftBlocks, &DT)) {		AftBlocksMap, DT)) {
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Incompatible loop layout\n");		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Incompatible loop layout\n");
return false;		return false;
}		}

// Aft blocks may need to move instructions to fore blocks, which becomes more		// Aft blocks may need to move instructions to fore blocks, which becomes more
// difficult if there are multiple (potentially conditionally executed)		// difficult if there are multiple (potentially conditionally executed)
// blocks. For now we just exclude loops with multiple aft blocks.		// blocks. For now we just exclude loops with multiple aft blocks.
if (AftBlocks.size() != 1) {		if (AftBlocksMap[L].size() != 1) {
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Can't currently handle "		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Can't currently handle "
"multiple blocks after the loop\n");		"multiple blocks after the loop\n");
return false;		return false;
}		}

// Check inner loop backedge count is consistent on all iterations of the		// Check inner loop backedge count is consistent on all iterations of the
// outer loop		// outer loop
if (!hasIterationCountInvariantInParent(SubLoop, SE)) {		if (any_of(L->getLoopsInPreorder(), [&SE](Loop *SubLoop) {
		return !hasIterationCountInvariantInParent(SubLoop, SE);
		})) {
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Inner loop iteration count is "		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Inner loop iteration count is "
"not consistent on each iteration\n");		"not consistent on each iteration\n");
return false;		return false;
}		}

// Check the loop safety info for exceptions.		// Check the loop safety info for exceptions.
SimpleLoopSafetyInfo LSI;		SimpleLoopSafetyInfo LSI;
LSI.computeLoopSafetyInfo(L);		LSI.computeLoopSafetyInfo(L);
if (LSI.anyBlockMayThrow()) {		if (LSI.anyBlockMayThrow()) {
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Something may throw\n");		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; Something may throw\n");
return false;		return false;
}		}

// We've ruled out the easy stuff and now need to check that there are no		// We've ruled out the easy stuff and now need to check that there are no
// interdependencies which may prevent us from moving the:		// interdependencies which may prevent us from moving the:
// ForeBlocks before Subloop and AftBlocks.		// ForeBlocks before Subloop and AftBlocks.
// Subloop before AftBlocks.		// Subloop before AftBlocks.
// ForeBlock phi operands before the subloop		// ForeBlock phi operands before the subloop

// Make sure we can move all instructions we need to before the subloop		// Make sure we can move all instructions we need to before the subloop
		BasicBlock *Header = L->getHeader();
		dmgreen: Should this have a check for a 2 deep loop nest at the moment (like before), if the remainder of the analysis/transform code hasn't been updated yet? It looks like the count calculation might just exclude anything with multiple subloop blocks at the moment anyway, so is possibly not a problem in practice, without pragmas.
		Whitney: Not sure I understand, this check is already like before.
		dmgreen: Sorry, the actual line I was pointing to was semi-random. That wasn't clear.
		Does the processHeaderPhiOperands check below need to check each level? IIRC it's testing data-dependencies (as in ssa/use-def dependencies, as opposed to the memory dependencies in checkDependencies. Can we physically move any instruction we need from aft to fore). If we end up moving multiple levels past one another, do we have to make the same checks at each level?
		My general point was that some of the code still only handles 2-deep loop nests. Should we have a check somewhere (perhaps with a fixme next to it) that still tests for that condition, until the rest of the code has caught up?
		Whitney:
		    Because of the way we rearrange basic blocks, we also require that
		    the Fore blocks of L on all unrolled iterations are safe to move before the
		    blocks of the direct child of L of all iterations. So we require that the
		    phi node looping operands of ForeHeader can be moved to at least the end of
		    ForeEnd, so that we can arrange cloned Fore Blocks before the subloop and
		    match up Phi's correctly.
		As we are only unrolling L, not its child, we don't need to move instructions from non-L AftBlock to non-L ForeBlock, so we don't need to check if the moves are safe.
		> My general point was that some of the code still only handles 2-deep loop nests. Should we have a check somewhere (perhaps with a fixme next it) that still tests for that condition, until the rest of the code has caught up?
		I modified all checks I found needed in `isSafeToUnrollAndJam`, am I missing something?
		dmgreen: In the summary you have B' moving past D. And we need to be sure that B' doesn't depend on anything from D. I think of it as B(1,0) needs to move past D(0,0). The "j" level loop isn't unrolled, but there is still some movement needed at the "i" level.
		Whitney: Here we are talking about def-use dependence. If an instruction in B' (x2) depends on an instruction in D (y), there must be an instruction in B (x) that depends on instruction y in D, as B' is cloned from B.
		    B: x = phi [y, D]...
		    D: y =
		As we are placing B' after B, and y is available for B, then y must also be available for B'. Please correct me if I am wrong.
		dmgreen: Yeah, kind of. But with an extra level of unrolling in there. Using your ` notation from before, normal unrolling would do:
		    B: x = phi [y', D'], [x, A]
		    D: y =
		    B': x' = phi [y, D]
		    D: y' =
		As we need to move B' before D, we also need to be able to hoist y into B, so the phi in B' can point at the correct value. That's what this processHeaderPhiOperands is doing. Note the x` phi only has one operand after unrolling, so will be simplified away.
		It comes up quite a bit from the increment of the IV variable. The second IV will become an increment of the first after we have pushed the add up into the header from the latch.
		Whitney: We are still only unrolling one loop each time, but fuse LoopDepth-1 times. Instructions in B should be the same as B', except all `i` changed to `i+1`, when unrolling by 2. Likewise, all instructions in D should be the same as D', except all `i` changed to `i+1`, when unrolling by 2. In that case, how can an instruction defined in D' be used in B? I think processHeaderPhiOperands is used for the induction variable of the unrolled loop.
		dmgreen: Ah I see. Even though we are moving the blocks past one another, at that level there will not be any instructions that need to move. OK fair enough :)
		BasicBlock *Latch = L->getLoopLatch();
		BasicBlockSet AftBlocks = AftBlocksMap[L];
		Loop *SubLoop = L->getSubLoops()[0];
if (!processHeaderPhiOperands(		if (!processHeaderPhiOperands(
Header, Latch, AftBlocks, [&AftBlocks, &SubLoop](Instruction *I) {		Header, Latch, AftBlocks, [&AftBlocks, &SubLoop](Instruction *I) {
if (SubLoop->contains(I->getParent()))		if (SubLoop->contains(I->getParent()))
return false;		return false;
if (AftBlocks.count(I->getParent())) {		if (AftBlocks.count(I->getParent())) {
// If we hit a phi node in afts we know we are done (probably		// If we hit a phi node in afts we know we are done (probably
// LCSSA)		// LCSSA)
if (isa<PHINode>(I))		if (isa<PHINode>(I))
[9 lines not shown]
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; can't move required "		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; can't move required "
"instructions after subloop to before it\n");		"instructions after subloop to before it\n");
return false;		return false;
}		}

// Check for memory dependencies which prohibit the unrolling we are doing.		// Check for memory dependencies which prohibit the unrolling we are doing.
// Because of the way we are unrolling Fore/Sub/Aft blocks, we need to check		// Because of the way we are unrolling Fore/Sub/Aft blocks, we need to check
// there are no dependencies between Fore-Sub, Fore-Aft, Sub-Aft and Sub-Sub.		// there are no dependencies between Fore-Sub, Fore-Aft, Sub-Aft and Sub-Sub.
if (!checkDependencies(L, ForeBlocks, SubLoopBlocks, AftBlocks, DI)) {		if (!checkDependencies(*L, SubLoopBlocks, ForeBlocksMap, AftBlocksMap, DI,
		LI)) {
LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; failed dependency check\n");		LLVM_DEBUG(dbgs() << "Won't unroll-and-jam; failed dependency check\n");
return false;		return false;
}		}

return true;		return true;
}		}

llvm/test/Transforms/LoopUnrollAndJam/dependencies.ll

	; RUN: opt -basicaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s			; RUN: opt -basicaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s
	; RUN: opt -aa-pipeline=basic-aa -passes='unroll-and-jam' -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s			; RUN: opt -aa-pipeline=basic-aa -passes='unroll-and-jam' -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s
dmgreen: Why is this now using -da-disable-delinearization-checks, and why have some of these existing tests been changed to use constant-size arrays?

Whitney: `-da-disable-delinearization-checks` is added to delinearize fixed-size multi-dimensional arrays more accurately. See https://reviews.llvm.org/D72178 for a more detailed explanation.
> why have some of these existing tests been changed to use constant size arrays
They were originally testing single-dimensional arrays, which may not be ideal for testing the sub-sub portion of the code.

fhahn:
> -da-disable-delinearization-checks is added to delinearize fixed-size multi-dimensional arrays more accurately.
Well, it returns more optimistic results at the loss of soundness, IIRC. I think we should definitely keep test coverage without -da-disable-delinearization-checks. This is the common case, which we should definitely handle correctly. Ideally we would have multi-dimensional tests that do not need -da-disable-delinearization-checks. IIRC constant loop bounds might help with that. IMO tests that really require -da-disable-delinearization-checks should be additional, maybe in a separate file.

bmahjour:
> Ideally we would have multi-dimensional tests that do not need -da-disable-delinearization-checks.
It's very difficult to come up with multi-dimensional tests that result in accurate dependence vectors. That's one of the main reasons why this option was added: to let us exercise code paths that would otherwise not be taken because the dependence results are overly pessimistic.
> IIRC constant loop bounds might help with that.
The delinearization validity checks compare the subscripts against parametric terms in the subscript that are believed to be the size of a given dimension. If the arrays have dynamic sizes, a constant loop bound won't help, because the subscripts still contain parametric terms. If the arrays have fixed sizes, we don't even try to delinearize them unless `-da-disable-delinearization-checks` is enabled.
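To illustrate why a range condition on subscripts matters for soundness, here is a minimal C++ sketch. It is not the DependenceAnalysis implementation: `linearOffset`, `columnInRange`, `delinearize` and `kCols` are hypothetical names, and the only assumption taken from the tests is the `[100 x i32]` row type. The sketch shows that, once a column subscript leaves `[0, 100)`, two different subscript pairs map to the same element, so recovering subscripts from an offset is only unambiguous when the range condition holds.

```cpp
#include <cassert>
#include <cstdint>

// Inner dimension of the [100 x i32] rows used by the tests.
constexpr std::int64_t kCols = 100;

// Element offset computed by a two-index access A[row][col].
constexpr std::int64_t linearOffset(std::int64_t row, std::int64_t col) {
  return row * kCols + col;
}

// Range condition under which (row, col) is the only pair with this offset.
constexpr bool columnInRange(std::int64_t col) {
  return col >= 0 && col < kCols;
}

struct Subscripts {
  std::int64_t row;
  std::int64_t col;
};

// Recover subscripts from an offset, assuming the column was in range.
Subscripts delinearize(std::int64_t offset) {
  assert(offset >= 0 && "sketch only handles non-negative offsets");
  return {offset / kCols, offset % kCols};
}

// A[1][-1] and A[0][99] name the same element, so subscripts alone
// cannot distinguish them unless the column is known to be in range.
static_assert(linearOffset(1, -1) == linearOffset(0, 99),
              "out-of-range column aliases the previous row");
static_assert(!columnInRange(-1) && columnInRange(99),
              "only the second pair satisfies the range condition");
```

Calling `delinearize(linearOffset(1, -1))` returns row 0 and column 99 rather than the pair that was written in the source, which is the kind of ambiguity a validity check has to rule out before subscript-wise reasoning can be trusted.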

	target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

	; CHECK-LABEL: fore_aft_less			; CHECK-LABEL: fore_aft_less
	; CHECK: %j = phi			; CHECK: %j = phi
	; CHECK: %j.1 = phi			; CHECK: %j.1 = phi
	; CHECK: %j.2 = phi			; CHECK: %j.2 = phi
	; CHECK: %j.3 = phi			; CHECK: %j.3 = phi
	[344 lines not shown]
	cleanup:			cleanup:
	ret void			ret void
	}			}


	; CHECK-LABEL: sub_sub_less			; CHECK-LABEL: sub_sub_less
	; CHECK: %j = phi			; CHECK: %j = phi
	; CHECK-NOT: %j.1 = phi			; CHECK-NOT: %j.1 = phi
	define void @sub_sub_less(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {			define void @sub_sub_less(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
Whitney: This test was originally testing
    for i
      for j
        A[i]
        A[i-1]
which should be safe to unroll and jam. I think it actually wants to test the sub-with-sub code path where the dependence distance of the inner loop is less:
    for i
      for j
        A[i][j]
        A[i+1][j-1]

dmgreen: Hmm. I don't remember what this was trying to test. It feels like a very long time ago now. Thanks for splitting the new tests out. More are always a good thing.

Whitney: For sure. I guess it is hard to speculate now about what it was trying to test.

Meinersbur: This is not safe to unroll-and-jam. For %N == 2 the execution sequence is (Sa being the first access in the loop body, Sb the second):
    Sa(0,0): A[0]
    Sb(0,0): A[-1]
    Sa(0,1): A[0]
    Sb(0,1): A[-1]
    Sa(1,0): A[1]
    Sb(1,0): A[0]
    Sa(1,1): A[1]
    Sb(1,1): A[0]
After unroll-and-jam by 2:
    Sa(0,0): A[0]
    Sb(0,0): A[-1]
    Sa(1,0): A[1]
    Sb(1,0): A[0]
    Sa(0,1): A[0]
    Sb(0,1): A[-1]
    Sa(1,1): A[1]
    Sb(1,1): A[0]
That is, the dependency chain `Sa(0,0)->Sa(0,1)->Sb(1,0)->Sb(1,1)` has become `Sa(0,0)->Sb(1,0)->Sa(0,1)->Sb(1,1)` and therefore has been violated.
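Meinersbur's trace argument can be checked mechanically. The following is a minimal C++ sketch, not code from the patch: `jamPreservesOrder`, `recordBody`, `AddrFn` and `Trace` are hypothetical names, every access is conservatively treated as a write (both accesses to A in these tests are stores), and the trip count is assumed to be a multiple of the unroll factor so that no remainder loop is needed. It records, for each address, the order in which statement instances touch it in the original nest and in the jammed nest, and reports whether the two orders agree.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

using Addr = std::pair<int, int>;             // (row, column) of the element
using Instance = std::tuple<int, int, int>;   // (statement index, i, j)
using AddrFn = std::function<Addr(int, int)>; // subscript of one statement
using Trace = std::map<Addr, std::vector<Instance>>;

// Execute one copy of the loop body at iteration (i, j) and append each
// statement instance to the per-address access list.
static void recordBody(const std::vector<AddrFn> &body, int i, int j,
                       Trace &trace) {
  for (int s = 0; s < static_cast<int>(body.size()); ++s)
    trace[body[s](i, j)].emplace_back(s, i, j);
}

// Returns true if unroll-and-jam of the outer loop by `factor` keeps, for
// every address, the same access order as the original n-by-n nest.
bool jamPreservesOrder(const std::vector<AddrFn> &body, int n, int factor) {
  assert(factor > 0 && n % factor == 0 && "sketch assumes no remainder loop");
  Trace original, jammed;
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      recordBody(body, i, j, original);
  // Jammed order: the inner loop runs once per group of `factor` outer
  // iterations, with the body copies placed back to back.
  for (int i = 0; i < n; i += factor)
    for (int j = 0; j < n; ++j)
      for (int u = 0; u < factor; ++u)
        recordBody(body, i + u, j, jammed);
  return original == jammed;
}
```

For the 1-D body `A[i]` / `A[i-1]`, passing the subscript functions `(i, j) -> (i, 0)` and `(i, j) -> (i - 1, 0)` with `n = 2` and `factor = 2` returns `false`, which matches the reordered chain on `A[0]` described above. A body that touches `A[i]` twice returns `true`.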
	entry:			entry:
	%cmp = icmp sgt i32 %N, 0			%cmp = icmp sgt i32 %N, 0
	br i1 %cmp, label %for.outer, label %cleanup			br i1 %cmp, label %for.outer, label %cleanup

	for.outer:			for.outer:
	%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]			%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]
	br label %for.inner			br label %for.inner

	[21 lines not shown]
	cleanup:			cleanup:
	ret void			ret void
	}			}


	; CHECK-LABEL: sub_sub_eq			; CHECK-LABEL: sub_sub_eq
	; CHECK: %j = phi			; CHECK: %j = phi
	; CHECK: %j.1 = phi			; CHECK: %j.1 = phi
				; CHECK: %j.2 = phi
				; CHECK: %j.3 = phi
	define void @sub_sub_eq(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {			define void @sub_sub_eq(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
Whitney: This test was originally testing
    for i
      for j
        A[i]
        A[i]
I think it actually wants to test the sub-with-sub code path where the dependence distance of the inner loop is equal:
    for i
      for j
        A[i][j]
        A[i+1][j]

fhahn: FWIW I think it would be better to keep the original test as is and add the new case as an additional test, as they seem to test different scenarios.
	entry:			entry:
	%cmp = icmp sgt i32 %N, 0			%cmp = icmp sgt i32 %N, 0
	br i1 %cmp, label %for.outer, label %cleanup			br i1 %cmp, label %for.outer, label %cleanup

	for.outer:			for.outer:
	%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]			%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]
	br label %for.inner			br label %for.inner

	[21 lines not shown]
	cleanup:			cleanup:
	ret void			ret void
	}			}


	; CHECK-LABEL: sub_sub_more			; CHECK-LABEL: sub_sub_more
	; CHECK: %j = phi			; CHECK: %j = phi
	; CHECK-NOT: %j.1 = phi			; CHECK-NOT: %j.1 = phi
	define void @sub_sub_more(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {			define void @sub_sub_more(i32* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
Whitney: This test was originally testing
    for i
      for j
        A[i]
        A[i+1]
which should be safe to unroll and jam. I think it actually wants to test the sub-with-sub code path where the dependence distance of the inner loop is more:
    for i
      for j
        A[i][j]
        A[i+1][j+1]

Meinersbur: I think this is NOT safe to unroll-and-jam. The unrolled equivalent is:
    for i += 2
      for j
        S1:  A[i]
        S2:  A[i+1]
        S1': A[i+1]
        S2': A[i+2]
At S1', the last access of A[i+1] is S2 from the same iteration, instead of S1 from the previous j-iteration as in the original loop.
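For the 2-D bodies Whitney proposes, each pair of stores has a constant distance vector: the store to `A[i+1][j+d]` at iteration `(i, j)` and the store to `A[i][j]` at iteration `(i+1, j+d)` hit the same element, giving distance `(1, d)` with `d = -1, 0, +1` for the less, eq and more cases. The sketch below is a simplified textbook-style rule for a 2-deep nest, not the check implemented in the patch; `Distance` and `distanceBlocksJam` are hypothetical names. The 1-D bodies are different: the inner index does not appear in the subscript, so the same element is touched at every inner distance, including negative ones, which is consistent with Meinersbur's objection.

```cpp
#include <cassert>

// Constant dependence distance in a 2-deep nest: the sink executes
// `outer` outer iterations and `inner` inner iterations after the source.
struct Distance {
  int outer;
  int inner;
};

// Simplified rule (an assumption of this sketch, not LLVM's code):
// jamming `factor` outer iterations reorders a dependence only when
// source and sink can fall into the same unrolled group
// (0 < outer < factor) and the sink sits at a smaller inner iteration
// (inner < 0), because the jammed nest orders by inner iteration first.
bool distanceBlocksJam(Distance d, int factor) {
  assert(factor >= 1 && "unroll factor must be positive");
  assert((d.outer > 0 || (d.outer == 0 && d.inner >= 0)) &&
         "distance must be lexicographically non-negative");
  return d.outer > 0 && d.outer < factor && d.inner < 0;
}
```

With the unroll-and-jam count of 4 used by the RUN lines, `distanceBlocksJam({1, -1}, 4)` is `true` while `{1, 0}` and `{1, 1}` give `false`, which lines up with the `CHECK-NOT` in `sub_sub_less` and the `%j.1` to `%j.3` phis expected in `sub_sub_eq` and `sub_sub_more` of dependencies_multidims.ll.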
	entry:			entry:
	%cmp = icmp sgt i32 %N, 0			%cmp = icmp sgt i32 %N, 0
	br i1 %cmp, label %for.outer, label %cleanup			br i1 %cmp, label %for.outer, label %cleanup

	for.outer:			for.outer:
	%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]			%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]
	br label %for.inner			br label %for.inner

	[15 lines not shown]

	for.latch:			for.latch:
	%add7 = add nuw nsw i32 %i, 1			%add7 = add nuw nsw i32 %i, 1
	%exitcond29 = icmp eq i32 %add7, %N			%exitcond29 = icmp eq i32 %add7, %N
	br i1 %exitcond29, label %cleanup, label %for.outer			br i1 %exitcond29, label %cleanup, label %for.outer

	cleanup:			cleanup:
	ret void			ret void
	}			}
Meinersbur: Does it otherwise unroll-and-jam the middle loop? Should we add a mechanism that stops unrolling of nests of already unrolled loops (e.g. add `llvm.loop.unroll_and_jam.disable` to all nested loops)?

Whitney: Currently unroll and jam adds `llvm.loop.unroll_and_jam.disable` to the loop it unroll-and-jammed (not to all nested loops). As we are traversing from inner to outer:
- `for.k` is not considered, as it doesn't have an inner loop.
- `for.j` is safe to unroll and jam, but that is not my intention for this test case, so I added the disable pragma.
- `for.i` is unsafe only after my change.
Even if we change to traverse from outer to inner, we should allow the middle loop to be unroll-and-jammed when proven profitable.

llvm/test/Transforms/LoopUnrollAndJam/dependencies_multidims.ll

This file was added.

				; RUN: opt -da-disable-delinearization-checks -basicaa -loop-unroll-and-jam -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s
				; RUN: opt -da-disable-delinearization-checks -aa-pipeline=basic-aa -passes='unroll-and-jam' -allow-unroll-and-jam -unroll-and-jam-count=4 < %s -S | FileCheck %s

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

				; CHECK-LABEL: sub_sub_less
				; CHECK: %j = phi
				; CHECK-NOT: %j.1 = phi
				define void @sub_sub_less([100 x i32]* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
				entry:
				%cmp = icmp sgt i32 %N, 0
				br i1 %cmp, label %for.outer, label %cleanup

				for.outer:
				%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]
				br label %for.inner

				for.inner:
				%j = phi i32 [ %add6, %for.inner ], [ 0, %for.outer ]
				%sum = phi i32 [ %add, %for.inner ], [ 0, %for.outer ]
				%arrayidx5 = getelementptr inbounds i32, i32* %B, i32 %j
				%0 = load i32, i32* %arrayidx5, align 4
				%mul = mul nsw i32 %0, %i
				%add = add nsw i32 %mul, %sum
				%add6 = add nuw nsw i32 %j, 1
				%arrayidx = getelementptr inbounds [100 x i32], [100 x i32]* %A, i32 %i, i32 %j
				store i32 1, i32* %arrayidx, align 4
				%add72 = add nuw nsw i32 %i, 1
				%add73 = add nuw nsw i32 %j, -1
				%arrayidx8 = getelementptr inbounds [100 x i32], [100 x i32]* %A, i32 %add72, i32 %add73
				store i32 %add, i32* %arrayidx8, align 4
				%exitcond = icmp eq i32 %add6, %N
				br i1 %exitcond, label %for.latch, label %for.inner

				for.latch:
				%add7 = add nuw nsw i32 %i, 1
				%exitcond29 = icmp eq i32 %add7, %N
				br i1 %exitcond29, label %cleanup, label %for.outer

				cleanup:
				ret void
				}


				; CHECK-LABEL: sub_sub_eq
				; CHECK: %j = phi
				; CHECK: %j.1 = phi
				; CHECK: %j.2 = phi
				; CHECK: %j.3 = phi
				define void @sub_sub_eq([100 x i32]* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
				entry:
				%cmp = icmp sgt i32 %N, 0
				br i1 %cmp, label %for.outer, label %cleanup

				for.outer:
				%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]
				br label %for.inner

				for.inner:
				%j = phi i32 [ %add6, %for.inner ], [ 0, %for.outer ]
				%sum = phi i32 [ %add, %for.inner ], [ 0, %for.outer ]
				%arrayidx5 = getelementptr inbounds i32, i32* %B, i32 %j
				%0 = load i32, i32* %arrayidx5, align 4
				%mul = mul nsw i32 %0, %i
				%add = add nsw i32 %mul, %sum
				%add6 = add nuw nsw i32 %j, 1
				%arrayidx = getelementptr inbounds [100 x i32], [100 x i32]* %A, i32 %i, i32 %j
				store i32 1, i32* %arrayidx, align 4
				%add72 = add nuw nsw i32 %i, 1
				%add73 = add nuw nsw i32 %j, 0
				%arrayidx8 = getelementptr inbounds [100 x i32], [100 x i32]* %A, i32 %add72, i32 %add73
				store i32 %add, i32* %arrayidx8, align 4
				%exitcond = icmp eq i32 %add6, %N
				br i1 %exitcond, label %for.latch, label %for.inner

				for.latch:
				%add7 = add nuw nsw i32 %i, 1
				%exitcond29 = icmp eq i32 %add7, %N
				br i1 %exitcond29, label %cleanup, label %for.outer

				cleanup:
				ret void
				}


				; CHECK-LABEL: sub_sub_more
				; CHECK: %j = phi
				; CHECK: %j.1 = phi
				; CHECK: %j.2 = phi
				; CHECK: %j.3 = phi
				define void @sub_sub_more([100 x i32]* noalias nocapture %A, i32 %N, i32* noalias nocapture readonly %B) {
				entry:
				%cmp = icmp sgt i32 %N, 0
				br i1 %cmp, label %for.outer, label %cleanup

				for.outer:
				%i = phi i32 [ %add7, %for.latch ], [ 0, %entry ]
				br label %for.inner

				for.inner:
				%j = phi i32 [ %add6, %for.inner ], [ 0, %for.outer ]
				%sum = phi i32 [ %add, %for.inner ], [ 0, %for.outer ]
				%arrayidx5 = getelementptr inbounds i32, i32* %B, i32 %j
				%0 = load i32, i32* %arrayidx5, align 4
				%mul = mul nsw i32 %0, %i
				%add = add nsw i32 %mul, %sum
				%add6 = add nuw nsw i32 %j, 1
				%arrayidx = getelementptr inbounds [100 x i32], [100 x i32]* %A, i32 %i, i32 %j
				store i32 1, i32* %arrayidx, align 4
				%add72 = add nuw nsw i32 %i, 1
				%add73 = add nuw nsw i32 %j, 1
				%arrayidx8 = getelementptr inbounds [100 x i32], [100 x i32]* %A, i32 %add72, i32 %add73
				store i32 %add, i32* %arrayidx8, align 4
				%exitcond = icmp eq i32 %add6, %N
				br i1 %exitcond, label %for.latch, label %for.inner

				for.latch:
				%add7 = add nuw nsw i32 %i, 1
				%exitcond29 = icmp eq i32 %add7, %N
				br i1 %exitcond29, label %cleanup, label %for.outer

				cleanup:
				ret void
				}

				; CHECK-LABEL: sub_sub_less_3d
				; CHECK: %k = phi
				; CHECK-NOT: %k.1 = phi

				; for (long i = 0; i < 100; ++i)
				; for (long j = 0; j < 100; ++j)
				; for (long k = 0; k < 100; ++k) {
				; A[i][j][k] = 0;
				; A[i+1][j][k-1] = 0;
				; }

				define void @sub_sub_less_3d([100 x [100 x i32]]* noalias %A) {
				entry:
				br label %for.i

				for.i:
				%i = phi i32 [ 0, %entry ], [ %inc.i, %for.i.latch ]
				br label %for.j

				for.j:
				%j = phi i32 [ 0, %for.i ], [ %inc.j, %for.j.latch ]
				br label %for.k

				for.k:
				%k = phi i32 [ 0, %for.j ], [ %inc.k, %for.k ]
				%arrayidx = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* %A, i32 %i, i32 %j, i32 %k
				store i32 0, i32* %arrayidx, align 4
				%add.i = add nsw i32 %i, 1
				%sub.k = add nsw i32 %k, -1
				%arrayidx2 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* %A, i32 %add.i, i32 %j, i32 %sub.k
				store i32 0, i32* %arrayidx2, align 4
				%inc.k = add nsw i32 %k, 1
				%cmp.k = icmp slt i32 %inc.k, 100
				br i1 %cmp.k, label %for.k, label %for.j.latch

				for.j.latch:
				%inc.j = add nsw i32 %j, 1
				%cmp.j = icmp slt i32 %inc.j, 100
				br i1 %cmp.j, label %for.j, label %for.i.latch, !llvm.loop !1

				for.i.latch:
				%inc.i = add nsw i32 %i, 1
				%cmp.i = icmp slt i32 %inc.i, 100
				br i1 %cmp.i, label %for.i, label %for.end

				for.end:
				ret void
				}

				; CHECK-LABEL: sub_sub_outer_scalar
				; CHECK: %k = phi
				; CHECK-NOT: %k.1 = phi

				define void @sub_sub_outer_scalar([100 x i32]* %A) {
				entry:
				br label %for.i

				for.i:
				%i = phi i64 [ 0, %entry ], [ %inc.i, %for.i.latch ]
				br label %for.j

				for.j:
				%j = phi i64 [ 0, %for.i ], [ %inc.j, %for.j.latch ]
				br label %for.k

				for.k:
				%k = phi i64 [ 0, %for.j ], [ %inc.k, %for.k ]
				%arrayidx = getelementptr inbounds [100 x i32], [100 x i32]* %A, i64 %j
				%arrayidx7 = getelementptr inbounds [100 x i32], [100 x i32]* %arrayidx, i64 0, i64 %k
				%0 = load i32, i32* %arrayidx7, align 4
				%sub.j = sub nsw i64 %j, 1
				%arrayidx8 = getelementptr inbounds [100 x i32], [100 x i32]* %A, i64 %sub.j
				%arrayidx9 = getelementptr inbounds [100 x i32], [100 x i32]* %arrayidx8, i64 0, i64 %k
				store i32 %0, i32* %arrayidx9, align 4
				%inc.k = add nsw i64 %k, 1
				%cmp.k = icmp slt i64 %inc.k, 100
				br i1 %cmp.k, label %for.k, label %for.j.latch

				for.j.latch:
				%inc.j = add nsw i64 %j, 1
				%cmp.j = icmp slt i64 %inc.j, 100
				br i1 %cmp.j, label %for.j, label %for.i.latch

				for.i.latch:
				%inc.i = add nsw i64 %i, 1
				%cmp.i = icmp slt i64 %inc.i, 100
				br i1 %cmp.i, label %for.i, label %for.end

				for.end:
				ret void
				}

				!1 = distinct !{!1, !2}
				!2 = !{!"llvm.loop.unroll_and_jam.disable"}


Revision Contents

Diff 262489
