This is an archive of the discontinued LLVM Phabricator instance.

Loop Versioning for LICM
ClosedPublic

Authored by ashutosh.nema on Apr 21 2015, 7:30 AM.

Details

Summary

I like to propose a new loop multi versioning optimization for LICM.

Cases where memory aliasing wasn't sure compiler became conservative and
didn't proceeds with invariant code motion. This results in possible
missed optimization opportunities.

Our main motivation is to exploit such cases and allow LICM optimization.

Most of the time when alias analysis is unsure about memory access and it
assumes may-alias. This un surety from alias analysis restrict LICM to proceed
further. Cases where alias analysis is unsure we like to use loop versioning
as an alternative.

Loop Versioning will creates version of the loop with aggressive alias and
the other with conservative (default) alias. Aggressive alias version of loop
will have all the memory access marked as no-alias. These two version of loop
will be preceded by a memory runtime check. This runtime check consists of bound
checks for all unique memory accessed in loop, and it ensures aliasing of memory.
Based on this check result at runtime any of the loops gets executed, if memory
is non aliased then aggressive aliasing loop gets executed, else when memory is
aliased then non aggressive aliased version gets executed.

Following are the top level steps:

  • Perform loop versioning feasibility check.
  • If loop is a candidate for versioning then create a memory bound check, by considering all the memory access in loop body.
  • Clone original loop and set all memory access as no-alias in new loop.
  • Set original loop & versioned loop as a branch target of runtime check result.
  • Call LICM::promoteLoopAccessesToScalars on aggressive alias versioned of loop.

Consider following test:

1 int foo(int * var1, int * var2, int * var3, unsigned itr) {
2 unsigned i = 0, j = 0;
3 for(; i < itr; i++) {
4 for(; j < itr; j++) {
5 var1[j] = itr + i;
6 var3[i] = var1[j] + var3[i];
7 }
8 }
9 }

At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars)
but because of alias analysis un surety about memory access it unable to move it out.

After Loop versioning IR:

<Versioned Loop>
for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion

%indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ]
%arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion
store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11
%indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1
%lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32
%exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0
br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion
<Original Loop>
for.body3: ; preds = %for.body3.lr.ph, %for.body3

%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ]
%arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv
store i32 %add, i32* %arrayidx, align 4, !tbaa !1
%8 = load i32* %arrayidx7, align 4, !tbaa !1
%add8 = add nsw i32 %8, %add
store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv to i32
%exitcond = icmp eq i32 %lftr.wideiv, %0
br i1 %exitcond, label %for.inc11, label %for.body3
In versioned loop difference is visible, 1 store has moved out.

Following are some high level details about current implementation:

LoopVersioning
LoopVersioning is main class which holds multi versioning functionality.

LoopVersioning :: isLegalForVersioning
Its member to ‘LoopVersioning’
Does feasibility check for loop versioning.
a) Checks layout of loop.
b) Instruction level check.
c) memory checks.

LoopVersioning :: versionizeLoop
a) Clone original loop
b) Create a runtime memory check.
c) Add both loops under runtime check results target.

In this patch used maximum loop nest threshold as 2, and maximum number
of pointers in runtime memory check as 5, also these threshold are controlled
by command line.

During feasibility check, we compare possible invariant code motion with total
instruction of loop. If the comparison goes beyond certain threshold limit and
found profitable then only we do loop versioning.

Requesting to go through patch for detailed approach.
Suggestions are comments are welcome.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
reames added inline comments.Jun 17 2015, 5:40 PM
lib/Transforms/Scalar/LoopVersioningLICM.cpp
4

Please follow the standard header format. See LICM as an example.

It would help to have a trivial example here. The shortest possible while loop which shows the transform would make the discussion much more clear.

58

Include order should be sorted.

79

What is an "invariant threshold"? You probably need a better name for this.

125

I believe this is duplicated code with LICM. Please fine a place to put a utility function instead. Transforms/Utils/Loops.h would be fine.

Ideally, this would be split into it's own patch.

146

A better name would help here.

Don't we have similar code in CGP?

Thanks Philip for this review.

Drive by review.  A few comments here or there.

Meta comments:

- Code structure and naming is somewhat confusing.  Attention to factoring of functions and method names would help.
- This feels like too large a patch to easily review.  I would strongly prefer that you split the patch into the smallest possible piece (i.e. it works on trivial cases only), then extend.  Given Hal has already started reviewing this, I'll defer to his preferences here, but I'm unlikely to signoff on a patch this large and complex.

I'll try to divide this patch at least at logical level.
Also as suggested there are possibly few functions that can be moved as utility.
I'll come-up with a separate patch for them.

REPOSITORY
  rL LLVM

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:3
@@ +2,3 @@
+//
+//                     The LLVM Compiler Infrastructure
+//
----------------
Please follow the standard header format.  See LICM as an example.

It would help to have a trivial example here.  The shortest possible while loop which shows the transform would make the discussion much more clear.

Sure will update it and add an example.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:57
@@ +56,3 @@
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Analysis/ScalarEvolutionExpander.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
----------------
Include order should be sorted.

Sure will sort the included headers and update.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:78
@@ +77,3 @@
+
+/// Loop Versioning invariant threshold
+static cl::opt<unsigned>
----------------
What is an "invariant threshold"?  You probably need a better name for this.

This option is to control the profitability of loop versioning for LICM.
It checks invariant load & store vs total instruction of loop. If invariants
are more than defined threshold then only go for LICM loop versioning.
Threshold is defined in percentage with a default value 25%.

How about "-licm-versioning-threshold" or "-licm-versioning-invariant-threshold" ?

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:124
@@ +123,3 @@
+/// \brief Recursively clone the specified loop and all of its children,
+/// mapping the blocks with the specified map.
+static Loop *cloneLoop(Loop *L, Loop *PL, ValueToValueMapTy &VM, LoopInfo *LI,
----------------
I believe this is duplicated code with LICM.  Please fine a place to put a utility function instead.  Transforms/Utils/Loops.h would be fine.

Ideally, this would be split into it's own patch.

Sure I'll move this to a utility.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:145
@@ +144,3 @@
+/// return the induction operand of the gep pointer.
+static Value *stripGetElementPtr(Value *Ptr, ScalarEvolution *SE, Loop *Lp) {
+  GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(Ptr);
----------------
A better name would help here.
Don't we have similar code in CGP?

Similar function exists in LoopVectorizer will change it as a utility and use it.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:161
@@ +160,3 @@
+
+/// \brief Look for a cast use of the passed value.
+static Value *getUniqueCastUse(Value *Ptr, Loop *Lp, Type *Ty) {
----------------
Er, what does this actually do?  The Ty and L params make me thing this isn't about finding a unique use...

Similar function exists in LoopVectorizer will change it as a utility and use it.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:686
@@ +685,3 @@
+///                           +----------+
+Loop *LoopVersioningLICM::versionizeLoop(LPPassManager &LPM) {
+  std::vector<BasicBlock *> LoopBlocks;
----------------
I suspect there is code which can and should be commoned with unswitch here.

I'll check and come back to you.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:826
@@ +825,3 @@
+      // Only interested in load & stores
+      if (!it->mayReadFromMemory() && !it->mayWriteToMemory())
+        continue;
----------------
Your comment and code disagree.  Are you intentionally handling RMW operations?

Will update comments here, we are handling RMW operations.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:833
@@ +832,3 @@
+      NoAliases.push_back(NewScope);
+      // Set no-alias for current instruction.
+      I->setMetadata(
----------------
The loop structure is very odd here.  It looks like later instructions end up with more noalias markers than earlier ones?  I suspect that's not what you want.

Here we want to make no-alias to instructions, yes the last one will have more no-alias.
But its required to make instruction no-alias, i.e. loop has 3 instruction I1, I2, I3.
Then here we are setting no-alias property like below:
I1 No-Alias-1
I2 No-Alias-1, No-Alias-2
I2 No-Alias-1, No-Alias-2, No-Alias-3

My understanding may be wrong here, if you have any better way of doing it then please suggest.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:871
@@ +870,3 @@
+    BasicBlock *BB = *I;
+    if (LI->getLoopFor(BB) == L) // Ignore blocks in subloops.
+      CurAST->add(*BB);          // Incorporate the specified basic block
----------------
Why?

As we are only interested in the inner most loop, it's overhead creating AST for sub loops.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:889
@@ +888,3 @@
+    // Delete newly created loop from loop pass manager.
+    LPM.deleteLoopFromQueue(VerLoop);
+    Changed = true;
----------------
Why?

This need to be removed as we already introduced metadata to ensure not revisiting same loop.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:893
@@ +892,3 @@
+  // Delete allocated memory.
+  delete CurAST;
+  delete PointerSet;
----------------
These can be stack allocated.

Sure will make stack allocation.

http://reviews.llvm.org/D9151

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/

In addition to some dups you already noticed in LoopVectorize, I spotted a few more on my initial scan.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
100

This is duplicated from LoopVectorize.

180

Duplicate in LoopVectorize.

339

Almost identical to the one in LoopVectorize.

chatur01 added inline comments.Jun 22 2015, 4:08 PM
lib/Transforms/Scalar/LoopVersioningLICM.cpp
503

I don't get where this check is being made. Is the comment out-of-date?

Thanks Charlie for looking into this.

Yes, these functions can be moved to a utility, soon will come up with a patch.

Regards,
Ashutosh

Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:502
@@ +501,3 @@
+/ 2) Make sure loop body should not have any call instruction.
+
/ 3) Check store instruction execution guarantee.

+/// 4) Loop body should not have any may throw instruction.

I don't get where this check is being made. Is the comment out-of-date?

Sorry, these are stale comments.

ashutosh.nema edited edge metadata.
  1. LoopVersioning utility is now available, so LoopVersioningLICM is using that utility.
  1. Earlier following functions was part of LoopVersioningLICM now these are moved to utility. a) getGEPInductionOperand, b) stripGetElementPtr, c) getUniqueCastUse, d) getStrideFromPointer

So, LoopVersioningLICM is now using these utility functions.

  1. Incorporated previous review comments.
  1. Code re-factoring.

LoopVersioning utility now calls ‘addPHINodes’ implicitly(r245579).
Updated LoopVersioningLICM to consider this change.

Hal, Philip, Charlie,

Does this patch look OK to you now ?

Regards,
Ashutosh

chatur01 requested changes to this revision.Aug 27 2015, 3:53 PM
chatur01 added a reviewer: chatur01.

Hi Ashutosh,

This needs some superficial changes before it could go further. Some of which I've pointed out below. I'd suggest going through all the comments as well and checking grammar, I didn't point out all of that.

I see you have tested when your heuristics would be suitable to apply the transformation are triggered, but there's no tests for the transform itself. Would it not make sense to add some of those too?

I'm merely an interested observer here, I don't have the experience to be able to OK the implementation. Hopefully by sticking my oar in it will help get the attention of people that can :)

Regards,
Charlie.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
103

I think it would be more useful to describe what the threshold is for than to state the description twice. Seems redundant. Same goes for the other options.

201

Why is this duplicated from LoopVectorizationLegality?

219

s/its/it's/

220

That's a weird name for the function, do you think legalLoopStructure would be better? Thinking how it would read in a conditional.

226

s/loops/loop/

272

Don't bother duplicating the assert reason in the line right above the assert.

280–282

Use = false here

286

Sloppy sentence. Put a space between at and least

287

shouldn't

288

Make sure the alias set doesn't have any MustAlias set.

289–290

Could you instead use the fancy new range-for?

292

I don't find this comment useful, I'd be more interested in /why/ we skip forward alias sets, I can see from the code that we do. If it's completely obvious why we skip forward alias sets, then the comment isn't needed.

295

Same here.

299

Coding violation

300

Try to maintain some consistency in your spelling of these things.

357–360

Ooof, that conditional is pretty hairy :)

375

LoadAndStoreCounter

405–406

Use = false

407

Coding convention

410–411

Coding convetion and also range for

413–414

and again

438–439

You don't need all those parens. (float)a / float(b) * 100. < c should be fine. Put it in a variable so you don't repeat it below as well.

513

StringRef

518–522

Range for, coding convention!

564–565

Range for

This revision now requires changes to proceed.Aug 27 2015, 3:53 PM

Thanks Charlie, I will correct these.

Regards,
Ashutosh

ashutosh.nema edited edge metadata.

Incorporated review comments.

ashutosh.nema marked 41 inline comments as done.Sep 1 2015, 4:42 AM
ashutosh.nema added inline comments.
lib/Transforms/Scalar/LoopVersioningLICM.cpp
202

Its just a wrapper which calls 'getStrideFromPointer' utility.
Earlier 'getStrideFromPointer' was the part of LoopVectorizer now moved to utility.

ashutosh.nema marked an inline comment as done.Sep 6 2015, 9:31 PM

Ping.

chatur01 accepted this revision.Sep 16 2015, 9:19 AM
chatur01 edited edge metadata.

Hi Ashutosh,

You can't commit this change until one of the other reviewers accepts this, but I'll step out of the way now and accept the revision, I'm worried that by leaving it as rejected it's holding up review from the other nominated reviewers.

I'm still not completely happy with the comments, but I think this has been held up for long enough on style issues alone. I hope someone can give you feedback soon, I'm not able to OK the revision.

--Charlie.

This revision is now accepted and ready to land.Sep 16 2015, 9:19 AM

I've started to look at this again; I apologize for the delay...

lib/Transforms/Scalar/LoopVersioningLICM.cpp
126

I see that you've marked this as done; does this patch need to be updated to reflect the change?

224

Can you be more specific? Are these checks reflecting limitation of the versioning infrastructure? Heuristics for profitability? Both? Please specifically explain this in the comments for each check.

292

Why? Must-alias with what? Why does it matter if there happens to be something that aliases with something else, if there is a third thing that only may alias with the first two?

309

alias -> Alias

334

Atleast -> At least

357

Why not? I understand that the vectorizer has these checks, but I don't see why this belongs in an LICM pass?

Thanks Hal for looking into this again.

I've started to look at this again; I apologize for the delay...

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:125
@@ +124,3 @@
+
+// Sets input string as meta data to loop latch terminator instruction.
+static void setLoopMetaData(Loop *CurLoop, const char *MDString) {
----------------
I see that you've marked this as done; does this patch need to be updated to reflect the change?

This comments was old comment on moving 'cloneLoop' to utility.
But now it's no more required as that function got removed, and using LoopVersioning utility for cloning etc.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:223
@@ +222,3 @@
+
+/// \brief Check loop structure and confirms it's good for LoopVersioningLICM.
+bool LoopVersioningLICM::legalLoopStructure() {
----------------
Can you be more specific? Are these checks reflecting limitation of the versioning infrastructure?

Some part of checks are repeated, but I kept them to complete legality at one place.
Not sure it's a right decision.

Heuristics for profitability? Both? Please specifically explain this in the comments for each check.

Sure will update.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:291
@@ +290,3 @@
+  // 2) PointerSet shouldn't have pointers more then RuntimeMemoryCheckThreshold
+  // 3) Make sure AliasSets doesn't have any must alias set.
+  for (const auto &I : *CurAST) {
----------------
Why? Must-alias with what? Why does it matter if there happens to be something that aliases with something else, if there is a third thing that only may alias with the first two?

Case where 2 pointers are must-aliased, there runtime bound check always give same result. In such cases there is no need for runtime checks and loop versioning.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:308
@@ +307,3 @@
+      Value *Ptr = A.getValue();
+      // alias tracker should have pointers of same data type.
+      typeCheck = (typeCheck && (SomePtr->getType() == Ptr->getType()));
----------------
alias -> Alias

Will correct.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:333
@@ +332,3 @@
+  }
+  // Atleast 2 pointers needed for runtime check.
+  if (PointerSet.size() <= 1) {
----------------
Atleast -> At least

Will correct.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:356
@@ +355,3 @@
+  const bool IsAnnotatedParallel = CurLoop->isAnnotatedParallel();
+  // We dont support call instructions. however, we ignore few intrinsic
+  // and libfunc callsite. We don't allow non-intrinsic, non-libfunc callsite
----------------
Why not? I understand that the vectorizer has these checks, but I don't see why this belongs in an LICM pass?

There is a possibility that call may modify aliasing behavior, which may defeat the purpose of versioning & runtime checks.

Repository:
  rL LLVM

http://reviews.llvm.org/D9151

Thanks Hal for looking into this again.

I've started to look at this again; I apologize for the delay...

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:125
@@ +124,3 @@
+
+// Sets input string as meta data to loop latch terminator instruction.
+static void setLoopMetaData(Loop *CurLoop, const char *MDString) {
----------------
I see that you've marked this as done; does this patch need to be updated to reflect the change?

This comments was old comment on moving 'cloneLoop' to utility.
But now it's no more required as that function got removed, and using LoopVersioning utility for cloning etc.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:223
@@ +222,3 @@
+
+/// \brief Check loop structure and confirms it's good for LoopVersioningLICM.
+bool LoopVersioningLICM::legalLoopStructure() {
----------------
Can you be more specific? Are these checks reflecting limitation of the versioning infrastructure?

Some part of checks are repeated, but I kept them to complete legality at one place.
Not sure it's a right decision.

Heuristics for profitability? Both? Please specifically explain this in the comments for each check.

Sure will update.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:291
@@ +290,3 @@
+  // 2) PointerSet shouldn't have pointers more then RuntimeMemoryCheckThreshold
+  // 3) Make sure AliasSets doesn't have any must alias set.
+  for (const auto &I : *CurAST) {
----------------
Why? Must-alias with what? Why does it matter if there happens to be something that aliases with something else, if there is a third thing that only may alias with the first two?

Case where 2 pointers are must-aliased, there runtime bound check always give same result. In such cases there is no need for runtime checks and loop versioning.

Alright, I have a better understanding of what you're doing now. Your technique for generating the versioned loop is to check that all pointer access in the loop are independent (and, thus, don't alias), and guarded by that check, you reach the versioned variant of the loop where the aliasing metadata asserts that all access are mutually independent. The follow-up LICM pass then actually does the hoisting.

I agree this makes sense, because if you had multiple aliasing domains then you'd still not necessarily be able to hoist the potentially-loop-invariant accesses out of the loop. Please, however, explain all of this near the CurAST iteration code so that it is clear why you have this restriction.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:308
@@ +307,3 @@
+      Value *Ptr = A.getValue();
+      // alias tracker should have pointers of same data type.
+      typeCheck = (typeCheck && (SomePtr->getType() == Ptr->getType()));
----------------
alias -> Alias

Will correct.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:333
@@ +332,3 @@
+  }
+  // Atleast 2 pointers needed for runtime check.
+  if (PointerSet.size() <= 1) {
----------------
Atleast -> At least

Will correct.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:356
@@ +355,3 @@
+  const bool IsAnnotatedParallel = CurLoop->isAnnotatedParallel();
+  // We dont support call instructions. however, we ignore few intrinsic
+  // and libfunc callsite. We don't allow non-intrinsic, non-libfunc callsite
----------------
Why not? I understand that the vectorizer has these checks, but I don't see why this belongs in an LICM pass?

There is a possibility that call may modify aliasing behavior, which may defeat the purpose of versioning & runtime checks.

Ah, good point. However, please then replace this check with an appropriate AA getModRef-type check for whether the call might alias with the relevant pointers (if the function, for example, does not alias with any of the loop-invariant accesses, then it can't affect what you're trying to do, although you'll need to be somewhat more careful about adding the noalias metadata to those functions, etc.).

Repository:
  rL LLVM

http://reviews.llvm.org/D9151
lib/Transforms/Scalar/LoopVersioningLICM.cpp
98

please use metadata names more consistent with our general naming scheme for standard metadata (llvm.loop.whatever). You should need only one metadata type, to disable licm-based versioning, it should be documented in the LangRef, and you can add it to both original and versioned loops once this pass has run.

427

Should comment say "load or store instruction"?

Thanks Hal for review.

I'll incorporate your comments and come back soon.

>
>   ================
>   Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:291
>   @@ +290,3 @@
>   +  // 2) PointerSet shouldn't have pointers more then RuntimeMemoryCheckThreshold
>   +  // 3) Make sure AliasSets doesn't have any must alias set.
>   +  for (const auto &I : *CurAST) {
>   ----------------
>   Why? Must-alias with what? Why does it matter if there happens to be something that aliases with something else, if there is a third thing that only may alias with the first two?
>
> Case where 2 pointers are must-aliased, there runtime bound check always give same result. In such cases there is no need for runtime checks and loop versioning.


Alright, I have a better understanding of what you're doing now. Your technique for generating the versioned loop is to check that all pointer access in the loop are independent (and, thus, don't alias), and guarded by that check, you reach the versioned variant of the loop where the aliasing metadata asserts that all access are mutually independent. The follow-up LICM pass then actually does the hoisting.

I agree this makes sense, because if you had multiple aliasing domains then you'd still not necessarily be able to hoist the potentially-loop-invariant accesses out of the loop. Please, however, explain all of this near the CurAST iteration code so that it is clear why you have this restriction.

Sure I'll mention about this.

>   Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:356

>   @@ +355,3 @@

>   +  const bool IsAnnotatedParallel = CurLoop->isAnnotatedParallel();

>   +  // We dont support call instructions. however, we ignore few intrinsic

>   +  // and libfunc callsite. We don't allow non-intrinsic, non-libfunc callsite

>   ----------------

>   Why not? I understand that the vectorizer has these checks, but I don't see why this belongs in an LICM pass?

>

> There is a possibility that call may modify aliasing behavior, which may defeat the purpose of versioning & runtime checks.


Ah, good point. However, please then replace this check with an appropriate AA getModRef-type check for whether the call might alias with the relevant pointers (if the function, for example, does not alias with any of the loop-invariant accesses, then it can't affect what you're trying to do, although you'll need to be somewhat more careful about adding the noalias metadata to those functions, etc.).

I'm not sure that itself would be sufficient, consider indirect calls probably need to consider function pointer in runtime check.
Also there is a possibility that in call argument one of the runtime pointer escaped, in such cases it become more difficult to ensure correctness.
That's why for simplicity I considered vectorizer approach to consider only few functions.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:97
@@ +96,3 @@
+#define DEBUG_TYPE "loop-versioning-licm"
+#define ORIGINAL_LOOP_METADATA "LoopVersioningLICMOriginalLoop"
+#define VERSION_LOOP_METADATA "LoopVersioningLICMVersionLoop"
----------------
please use metadata names more consistent with our general naming scheme for standard metadata (llvm.loop.whatever). You should need only one metadata type, to disable licm-based versioning, it should be documented in the LangRef, and you can add it to both original and versioned loops once this pass has run.

Sure I'll use only one meta data.
Will update LangRef as well.

================
Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:426
@@ +425,3 @@
+  }
+  // Loop should have at least invariant store instruction.
+  if (!InvariantCounter) {
----------------
Should comment say "load or store instruction"?

Ah, this comment needs to be updated.

Regards,
Ashutosh

Hal, is below comments on call handling looks OK to you ?

>   Comment at: lib/Transforms/Scalar/LoopVersioningLICM.cpp:356

>   @@ +355,3 @@

>   +  const bool IsAnnotatedParallel = CurLoop->isAnnotatedParallel();

>   +  // We dont support call instructions. however, we ignore few intrinsic

>   +  // and libfunc callsite. We don't allow non-intrinsic, non-libfunc callsite

>   ----------------

>   Why not? I understand that the vectorizer has these checks, but I don't see why this belongs in an LICM pass?

>

> There is a possibility that call may modify aliasing behavior, which may defeat the purpose of versioning & runtime checks.


Ah, good point. However, please then replace this check with an appropriate AA getModRef-type check for whether the call might alias with the relevant pointers (if the function, for example, does not alias with any of the loop-invariant accesses, then it can't affect what you're trying to do, although you'll need to be somewhat more careful about adding the noalias metadata to those functions, etc.).

I'm not sure that itself would be sufficient, consider indirect calls probably need to consider function pointer in runtime check.
Also there is a possibility that in call argument one of the runtime pointer escaped, in such cases it become more difficult to ensure correctness.
That's why for simplicity I considered vectorizer approach to consider only few functions.

Can I keep this like vectorizer ?

Regards,
Ashutosh

ashutosh.nema edited edge metadata.

Incorporated review comments from Hal.

This patch does not contain LangRef changes, I'll soon submitting it.

Incorporated Hal comments & updated LangRef.

Hi Hal,

Is this looks OK now ?

Regards,
Ashutosh

I apologize for taking so long to get back to this...

Do the runtme checks inserted cover only the interactions between the invariant access and the non-invariant accesses, or do they also perform range overlap checks on the non-invariant accesses?

lib/Transforms/Scalar/LoopVersioningLICM.cpp
18

conservative (default) alias -> conservative (default) aliasing assumptions

18

Aggressive alias version -> The version of the loop making aggressive aliasing assumptions

19

access -> accesses

19

version -> versions

22

ensures aliasing of memory -> ensures the lack of memory aliasing

25

I'd reword this whole sentence, as: If the runtime check detects any memory aliasing, then the original loop is executed. Otherwise, the version with aggressive aliasing assumptions is used.

250

I think this would be better as:

<< "    loop is not bottom tested\n");
254

This is fine for now. I suspect that eventually we'll want to deal with this by versioning the whole loop nest instead of just the inner loop.

344

Didnt found -> Didn't find

364

if( -> if (

374

Parallel loops must not have aliasing loop-invariant memory accesses, or else they'd not be trivially vectorizable. We don't need to version anything in this case, but rather, we should be able to hoist any invariant access. If we don't already get this case, then don't get it here (we don't need to version the loop), but rather, we should handle this directly in the usual LICM pass.

376

load( -> load (

391

store( -> store (

445

Please avoid the use of floating-point computation here (we don't necessarily compile LLVM with strict IEEE support, so we need to be careful to make sure that decision-making procedures are not sensitive to small accuracy changes).

InvariantPercentage < InvariantThreshold

can become:

InvariantCounter*100 < InvariantThreshold*LoadAndStoreCounter

(or something like that)

hfinkel added inline comments.Dec 11 2015, 3:29 AM
docs/LangRef.rst
4561

Add a space just before "(LICM)".

4562

clone loop -> all other loop versions

4562

original loop -> the original loop

4563

It helps suggesting loop-versioning-licm pass that the loop should not be re-versioned. -> The loop-versioning-licm pass will not create additional loop versions of loops with this metadata.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
13

un surety -> uncertainty

13

restrict LICM to proceed -> restricts LICM from proceeding

14

Cases -> In cases

14

like to -> will

17

creates version -> create a version

17

alias -> aliasing assumptions

ashutosh.nema marked 27 inline comments as done.

Thanks Hal, for again looking into this change.

Do the runtme checks inserted cover only the interactions between
the invariant access and the non-invariant accesses, or do they also
perform range overlap checks on the non-invariant accesses?

Runtime check also perform overlap checks between non-invariant
accesses. without checking all memory access we can’t make aggressive
aliasing assumptions.

In this patch, incorporated comments from Hal.

Hi Hal,

Is this looks OK now ?

Regards,
Ashutosh

Thanks Hal, for again looking into this change.

Do the runtme checks inserted cover only the interactions between
the invariant access and the non-invariant accesses, or do they also
perform range overlap checks on the non-invariant accesses?

Runtime check also perform overlap checks between non-invariant
accesses. without checking all memory access we can’t make aggressive
aliasing assumptions.

In that case, you should also add llvm.mem.parallel_loop_access metadata to the versioned loop. Otherwise, the vectorizer will add a duplicate set of checks should it decide to vectorize the versiond loop. We also need to add metadata to the original loop to disable vectorization (as we already know that the vectorization checks will have failed when that loop is reached).

lib/Transforms/IPO/PassManagerBuilder.cpp
111

new, experimental -> experimental

(If something is experimental, and yet not new, something's probably wrong ;) )

lib/Transforms/Scalar/LoopVersioningLICM.cpp
6

Remove this paragraph.

10

Remove this line (it, and the paragraph above, are not needed because the problem is explained clearly in the next paragraph).

63

These lines need to appear directly under the "The LLVM Compiler Infrastructure" line.

136

This utility function is not right, we need to combine the loop metadata here with any other loop metadata which might also exist.

Please see the writeHintsToMetadata function in lib/Transforms/Vectorize/LoopVectorize.cpp and/or the relevant code in lib/Transforms/Utils/LoopUnrollRuntime.cpp near the end of the CloneLoopBlocks function.

As a general note, we really should refactor this into a common utility function.

155

Add here:

AU.addPreserved<AAResultsWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();

(so that you don't kill off GlobalsAA here).

311

typeCheck -> TypeCheck

324

You're checking here that all points in the given alias set have the same type; why?

Dear All,

sorry for jumping in late. I wonder if there is a specific reason this pass is run as in the canonicalization phase as part of the inliner loop. Would it not make more sense to run it before the vectorizers?

Best,
Tobias

Hi Tobias,

I wonder if there is a specific reason this pass is run as in the
canonicalization phase as part of the inliner loop.

Not very clear, what you mean by 'inliner loop', is it inner loop ?

Would it not make more sense to run it before the vectorizers?

If you see pass manager its scheduled prior to Vectorizer.
As its LICM based, it’s not performing any vectorizer legality.
It helps in cases where LICM became conservative because of
memory aliasing uncertainty.

Thanks,
Ashutosh

Hi Hal,

Will work on your comments and come back.

Thanks,
Ashutosh

lib/Transforms/Scalar/LoopVersioningLICM.cpp
324

Actually this is a pre-condition in LICM’s “promoteLoopAccessesToScalars”
where it expects all pointers in alias should have same type.

<LICM.cpp>
881 Check that all of the pointers in the alias set have the same type. We
882
cannot (yet) promote a memory location that is loaded and stored in
883 // different sizes.
884 if (SomePtr->getType() != ASIV->getType())
885 return Changed;

To confirm same behaviour we added this check.

Addressed Hal's comment.

mehdi_amini added inline comments.Jan 2 2016, 8:36 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

Why is is done here and not where LICM is usually done? Line 419?
Did you try multiple placements? Do you have benchmarks results that drives this choice?

ashutosh.nema added inline comments.Jan 12 2016, 9:10 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

We want to run this when inining is over, because after that we may see more accurate aliasing.
That’s why we placed it after ‘createBarrierNoopPass’. If we do this earlier probably of no-alias
aliasing assumptions in version loop, other optimizations may get some benefit.

This can be schedule later as well but I don’t see any benefit.

We have observed good gains in our internal benchmarks, mostly based on customer workloads.

mehdi_amini added inline comments.Jan 12 2016, 10:36 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

I'm not sure I follow why you need the inliner to be fully complete over the whole module and not having processed the function on which you want to run LICM. What kind of aliasing will this avoid? Do you have an example?

Also, you are saying that you don't see any benefit of scheduling this later, can you elaborate with respect to the other run of LICM I pointed?

Also, I ask about benchmark with respect to the different possible placement for this pass, and you didn't answer this in particular.

grosser added inline comments.Jan 12 2016, 10:40 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

One reason to run this late is that too early versioning may prevent both further inlining (due to increased code size) as well as versioning later a possibly larger loop.

Furthermore, to my understanding the inlining loop is indeed a canonicalization phase. We are planning to run Polly after the inliner loop for this very same reason.

mehdi_amini added inline comments.Jan 12 2016, 10:47 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

Fair for the inliner loop. But what about the already existing LICM that already runs a little bit later?

grosser added inline comments.Jan 12 2016, 10:55 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

Are you talking about the LICM pass within the inliner loop? It is to my understanding part of the canonicalization sequence that helps both to eliminate loops (in case all statements can be proven loop invariant) and which also helps SCEV by moving certain SCEVUnknowns further out of the tree. As the existing LICM does not add a large code size increase, it seems to be save to be run in the inliner loop.

(LICM is not a pure canonicalization hence it causes some troubles for Polly, but we are working on addressing them on our side).

mehdi_amini added inline comments.Jan 12 2016, 11:27 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

No, not within the inliner loop, see line 419 (or 411 before the patch)

ashutosh.nema added inline comments.Jan 12 2016, 11:37 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

We did tried multiple placements i.e. prior to inlining completion and post completion.
As this is an internal benchmark, we can’t disclose details. But with these placements we
did not noticed any degrade in our regular testing. Reason for running just after inlining
completion is other passes might gain from non-alias version of loop (As of now we do
not have any case where other optimization getting benefits from no-alias version of loop).

I think this is getting really close to being ready.

docs/LangRef.rst
4557

For consistency with our other metadata (such as llvm.loop.unroll.disable), we should name this:

llvm.loop.licm_versioning.disable
4560

This paragraph is too implementation-specific. How about this:

This metadata indicates that the loop should not be versioned for the purpose of enabling loop-invariant code motion (LICM). The metadata has a single operand which is the string `llvm.loop.licm_versioning.disable`.
include/llvm/Transforms/Utils/LoopUtils.h
383

Remove space in between the * and TheLoop.

386

How about calling this:

addStringMetadataToLoop
lib/Transforms/Scalar/LoopVersioningLICM.cpp
11

This first statement is actually misleading; it should always return MayAlias when it is uncertain. How about saying this:

Then alias analysis is uncertain about the aliasing between any two accesses, it will return MayAlias.
13

will -> might

17

and the other -> in addition to the original

22

"Based on this check result at runtime any of the loop gets executed," -> The result of the runtime check determines which of the loop versions is executed:

29

LoopVersioningLICM -> LoopVersioningLICM's

31

access -> accesses

32

access -> accesses

33

of runtime check -> of the runtime check

36

transform -> transforms

96

Please update to llvm.loop.licm_versioning.disable

101

instruction -> instructions

104

LoopVersioningLICM threshold minimum -> LoopVersioningLICM's minimum

105

for -> of

106

instruction -> instructions

106

in a -> per

113

LoopVersioningLICM -> LoopVersioningLICM's

119

LoopVersioningLICM's maximum number of pointers per runtime check

303

Why are you checking this?

387

Shouldn't you use LAI->getNumRuntimePointerChecks() here instead of PointerSet.size()?

414

Performing this check per instruction seems silly. Maybe move to legalLoopStructure?

633

do-loop-versioning-licm -> loop-versioning-licm

(I often just use DEBUG_TYPE for this; there's no reason for them to be out-of-sync)

644

do-loop-versioning-licm -> loop-versioning-licm (or DEBUG_TYPE)

mehdi_amini added inline comments.Jan 15 2016, 11:21 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

Well as long as it is not enabled by default in the pipeline, it does not matter. It is worth noting that this question may come back later though.

hfinkel added inline comments.Jan 15 2016, 11:29 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

I don't think that making functions larger prior to inlining is something we can justify (i.e. I'm sure that will cause regressions). Also, as functions get inlined, it is likely to turn out that aliasing that we could not reason about in the function in isolation can be reasoned about in the context of a particular caller. Making the decision to version early will be problematic in that sense as well (it would be nice to think that, in such a case, we'd statically evaluate the runtime condition one way or the other, but I don't think that can happen now either).

mehdi_amini added inline comments.Jan 15 2016, 11:42 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

After talking on IRC, the answer to my original question about the LICM that runs later in the pipeline is that it runs after the vectorizer as part of post-vectorize cleanup. The current point of the multi-versioning LICM is different: it may help to allow better vectorization, so it makes sense to differentiate with the existing LICM.

hfinkel added inline comments.Jan 15 2016, 11:47 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
385

Correct. One issue, however, is that unconditionally running LICM after the versioning pass is not ideal (in the compile-time sense), because LICM Is not super cheap.

Given that we broke LICM up into a set of utility functions (exposed in include/llvm/Transforms/Utils/LoopUtils.h), we should really call them from this pass only if we actually do anything.

anemet added inline comments.Jan 15 2016, 11:57 AM
lib/Transforms/IPO/PassManagerBuilder.cpp
382–385

@ashutosh.nema, We should document the outcome of the above discussion in a comment here. E.g.:

The reason we run LICMLVer right after the CGSCC pass manager so that later patches can take advantage of the new non-alias annotations.

And the reason we actually schedule an LICM at this point so that invariant access to global could be hoisted out of the loop which could allow further vectorizations.

@hfinkel, Please let me know if any of this is inaccurate.

ashutosh.nema added inline comments.Jan 18 2016, 10:34 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
382–385

Sure Adam, will document key points in comments.

385

LICM utilities can be called from LoopVersioningLICM, but we identified issues
running it, i.e. in few cases we noticed LoopInfo for clone is not getting updated,
which can make this optimization ineffective.
Understand its compile time expensive to run complete pass, but If we run LICM
pass later we will not see issues like LoopInfo get getting updated.
Also this is an optional pass, should be OK to run LICM.

anemet added inline comments.Jan 20 2016, 6:05 PM
lib/Transforms/Scalar/LoopVersioningLICM.cpp
556–576

You probably want to add a comment saying that you can add no-alias between any pairs of memory operations because you ignore loops with must aliasing accesses. Otherwise I don't think this would be valid.

623–624

Why is this necessary? LoopVersioning is supposed to update the DT.

Addressed Hal's & Adam's comments.

ashutosh.nema added inline comments.Jan 22 2016, 5:01 AM
lib/Transforms/Scalar/LoopVersioningLICM.cpp
303

Loop trip count is required for runtime check generation, as the bound checks are based on this.

387

‘LAI->getNumRuntimePointerChecks()’ returns number of runtime checks but here we are checking maximum possible pointers in runtime check.

633

Tried making it 'loop-versioning-licm' but later found command line error mentioning option registered more than once.
So, made DEBUG_TYPE as “do-loop-versioning-licm”. Hoping this should be OK.

Hi Hal,

Is this Looks OK now ?

Regards,
Ashutosh

hfinkel added inline comments.Feb 2 2016, 9:44 PM
lib/Transforms/IPO/PassManagerBuilder.cpp
120

"loop-versioning-licm" -> "enable-loop-versioning-licm"

lib/Transforms/Scalar/LoopVersioningLICM.cpp
304

Okay, please explain this better in the comment. Something like:

// We need to be able to compute the loop trip count in order to generate the bound checks.
388

Why do you care how many pointers there are? I'd think only the number of checks generated matters in terms of cost.

634

No, please rename the other option (as I noted by the other option, putting adding enable- as a prefix would be natural). Then make this change.

Thanks Hal.

I'll work on your comments and come back.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
634

I did not understood this comment completely.

Are you saying use "enable-loop-versioning-licm" here, in DEBUG_TYPE & in PassManagerBuilder ?

hfinkel added inline comments.Feb 3 2016, 4:23 AM
lib/Transforms/Scalar/LoopVersioningLICM.cpp
634

Use "enable-loop-versioning-licm" in PassManagerBuilder, and then use "loop-versioning-licm" in DEBUG_TYPE and here. Thanks!

Thanks Hal for clarification, will make these changes and come back.

Regards,
Ashutosh

Addressed Hal's comments.

Corrected include file order.

hfinkel accepted this revision.Feb 5 2016, 11:03 AM
hfinkel edited edge metadata.

LGTM, thanks!

This revision was automatically updated to reflect the committed changes.