Page MenuHomePhabricator

[LAA] Add Memory dependence and unknown bounds remarks.
Needs ReviewPublic

Authored by malharJ on Aug 19 2021, 6:11 AM.

Details

Summary

Adds new optimization remarks when vectorization fails.

More specifically, new remarks are added for following 5 cases:

  • Backward dependency
  • Backward dependency that prevents Store-to-load forwarding
  • Forward dependency that prevents Store-to-load forwarding
  • Unknown dependency
  • Unknown array bounds

The failures are classified mainly into 2 categories
based on whether it is an aliased memory access
that is unsafe, and whether a bound can be computed for an array.

It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be first in program order.
In the future if may be possible to report all sources
of failure (and in program order).

A regression test has been added to test the following cases:

a) Loop can be vectorized: An optimization remark is
emitted indicating that vectorization was successful.
(note that this functionality was already present, ie.
has not been added as part of this patch.)

b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn added inline comments.Sep 6 2021, 4:57 AM
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
252

There's no need to return a SmallVector with the length encoded, the callers should not care about that. You can use ArrayRef if the callers do not add to the vector or SmalLVectorImpl if they need to add elements.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1767

Why call it UnsafeDependences if it is supposed to only contain unknown dependences?

llvm/test/Transforms/LoopVectorize/loopvectorize-opt-remarks.ll
1 ↗(On Diff #370670)

please avoid adding tests with -enable-new-pm=0. You should be able to move that one to the LoopAccessAnalysis directory.

6 ↗(On Diff #370670)

Instead of having the C code here, I think it would be more helpful if you explain what this tests in a sentence or 2 with references to the IR, e.g. which memory instruction/GEP has the uncomputable bound and *why*.

malharJ updated this revision to Diff 370983.EditedSep 6 2021, 7:25 PM
malharJ marked 4 inline comments as done.
  1. Fixed the unit tests:

    This required initializing the ORE object using one of the constructors: OptimizationRemarkEmitter(const Function& F)

    Previously several of the unit tests were hitting memory errors and they seem to have been fixed by this change.

    There are also some minor formatting updates to the remark emitted.
  1. Removed checks for loop distribution pragma remark from LIT tests. This is because there is no analysis being done to support it.
  1. Updated recordAnalysis() to allow generating multiple reports when they are of the same type. This seems like a logical thing to do, for example if you have multiple instances of unbound array index case in a loop, it would be good for a user to view (in the report) similar types of errors and then correct them.
  1. Updated LIT test to use new Pass Manager syntax
  1. Minor formatting updates
malharJ added inline comments.Sep 6 2021, 7:27 PM
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
252

Currently the code uses "auto" when accessing the. value returned by this getter
so there isn't a need for the user to know the size used in the template ...

regardless, I've changed it to use SmallVector<T> instead if that's ok.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
588

It was being accessed from outside the class on line 2045.

I've moved it to private now and added a public getter instead.

1767

I think UnsafeDependences contains both unknown and known unsafe dependences.
I think the comment is unclear so I'm removing it.

malharJ updated this revision to Diff 370992.Sep 6 2021, 11:08 PM

This is to correct a mistake made in the previous patch.

"loop not vectorized" should not be emitted from loop-access analysis.
(the loop-vectorize takes care of adding 'loop not vectorized')

llvm/test/Transforms/LoopVectorize/diag-with-hotness-info-2.ll
25

This hint is a useful one, and it was added purposefully. It should remain unchanged. I would suggest that you add a case in elaborateMemoryReport to recreate this hint.

malharJ updated this revision to Diff 371057.Sep 7 2021, 6:19 AM
malharJ marked an inline comment as done.

Re-added the remark/note about using pragma loop distribute.

Just one small comment. Otherwise the patch looks good to me

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
534

Nit: Missing a slash

malharJ updated this revision to Diff 371371.Sep 8 2021, 9:28 AM
malharJ marked 2 inline comments as done.

Trivial formatting update.

Thanks Malhar. The patch looks good to me

sdesmalen added inline comments.Sep 9 2021, 5:04 AM
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
252

If there is no need to modify the array returned by getUnsafeDependences (which there isn't, because the returned value is const), ArrayRef seems the better class to use, because it has all sorts of convenient utilities and is similarly just a (immutable) reference to the array.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2148

Is this case not already covered by the recordAnalysis call above?

It seems like a mechanism for doing this already exists (recordAnalysis), it may be worth extending that to handle multiple reports (there is currently a limitation that it uses a single Report variable, but perhaps that could be made into a vector of reports which can be appended to).

2205

nit: single-use variable, can be inlined in the next statement.

2262–2263

nit: please start your comments with capitalisation and end with a period.

llvm/test/Analysis/LoopAccessAnalysis/memory-dep-remarks.ll
12 ↗(On Diff #371371)

Would a prefix like "Loop cannot be vectorized: " be useful?

I think the message is a bit cryptic, how about:

Unable to determine aliasing information because the bounds of this access cannot be computed.
malharJ updated this revision to Diff 371619.Sep 9 2021, 9:13 AM
malharJ marked 3 inline comments as done.
  • Updated remark for unknown array bounds case
  • Moved call to recordAnalysis() inside elaborateMemoryReport() for the UnsafeDataDependenceTriedRT case
  • Changed getUnsafeDependences() to return an ArrayRef since return value is only read.
  • Minor formatting updates
malharJ added inline comments.Sep 9 2021, 9:14 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2148

I agree it has been covered by the call to recordAnalysis() at line 2070 ...
I've now moved that call to recordAnalysis() to inside elaborateMemoryReport().

Regarding the second issue, I'm not sure myself why recordAnalysis has Report as scalar and not a vector but that is not really a part of my change ... I just tried to re-use it.

llvm/test/Analysis/LoopAccessAnalysis/memory-dep-remarks.ll
12 ↗(On Diff #371371)

If you see the earlier version/diffs of the patch, the reporting was being done in LoopVectorizer.
That time I had added "loop not vectorized" in the tests.

But now that it has moved to LoopAccessAnalysis, we are no longer emitting "loop not vectorized" here.
That prefix is prepended by the function reportVectorizationFailure() in LoopVectorize.cpp

For the latter part, I have made the change.

malharJ updated this revision to Diff 371751.Sep 9 2021, 4:59 PM

corrected trivial typo in LIT test.

malharJ updated this revision to Diff 371826.Sep 10 2021, 1:49 AM

minor formatting update for fixing a failing test.

Are there any other review comments ?

david-arm accepted this revision.Oct 1 2021, 12:29 AM

LGTM! It looks like you've addressed all the other reviewer's comments and I think the patch looks good now. Thanks!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2266

nit: Maybe it's worth moving the recordAnalysis call to before the for loop, i.e.

OptimizationRemarkAnalysis R = recordAnalysis("UnknownArrayBounds", I);
for (...)
This revision is now accepted and ready to land.Oct 1 2021, 12:29 AM
fhahn added inline comments.Oct 1 2021, 1:53 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
777

If computing the bounds fails here, we may retry creating checks by adding assumptions below (see line 806).

I think it could happen that we have multiple uncomputable pointer bounds here, but for some of them we may be able to actually compute bounds below. Should we remove the ones we can compute bounds below from the set?

llvm/test/Analysis/LoopAccessAnalysis/memory-dep-remarks.ll
12 ↗(On Diff #371371)

Unable to determine aliasing information because the bounds of this access cannot be computed.

I'm not sure this is entirely accurate. AFAIK uncomputable bounds here mean we cannot generate runtime checks.

'aliasing information' seems ambiguous here, because at the point where we check whether the bounds are computable, we already determined that the access may alias another access, independent of whether the bounds can be computed or not (like the underlying objects may alias).

malharJ updated this revision to Diff 380621.Oct 19 2021, 2:38 AM
malharJ marked an inline comment as done.
  1. Updated format of remark
  2. Removed pointers from UncomputablePtrs Set if it's bound can be computed on a retry (with Assumptions).
malharJ added inline comments.Oct 19 2021, 2:39 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
777

Done. thanks for that suggestion.

2266

Unfortunately that can't be done because Instruction* I is being declared inside the for-loop.

llvm/test/Analysis/LoopAccessAnalysis/memory-dep-remarks.ll
12 ↗(On Diff #371371)

Ok I agree.

I have changed the remark accordingly.

Are there any further review comments ?

malharJ updated this revision to Diff 383842.Nov 1 2021, 11:13 AM
  • Removed the mechanism that collected the remarks (enum "FailureReason" and function "elaborateMemoryReport()") and emitted them at the end.

    Instead, now the remarks are emitted inside LoopAccessInfo::analyzeLoop(), ie. while the loop is being analyzed.
  • Only one remark is emitted, at the point it is found.
malharJ updated this revision to Diff 384022.EditedNov 2 2021, 3:08 AM
  • Removed remark emission (OptimizationRemarkEmitter) to outside LoopAccessAnalysis. LAA generates a "Report" which can be used by other passes to emit remarks.

This also fixes the issue with emission of the same/similar remark twice (once by LAA, and once by a parent pass, eg: LV)

  • Updated the tests accordingly.
malharJ updated this revision to Diff 384037.EditedTue, Nov 2, 4:10 AM

Fixed minor typos and one LIT test (vectorization-remarks-missed.ll)

@sdesmalen , can you please review the latest updates ?

sdesmalen requested changes to this revision.Wed, Nov 10, 5:32 AM

Hi @malharJ, thanks for cleaning up the patch and making changes based on what we discussed offline, it's a bit more manageable to review now.
There are still some changes that I think are unnecessary, and I found that the patch needs to be rebased (this is how I found some confusing output test/Analysis/LoopAccessAnalysis/pointer-phis.ll).

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
623

This seems unused?

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1770

Can there only be a single unsafe/unknown dependence? Or can there be more?

2109

UncomputablePtr and the mechanism around it seems entirely redundant, because it doesn't add any information that is used anywhere. The only reason that UncomputablePtr is collected is to avoid printing "Cannot identify array bounds" when it gets to the then-block of if (!CanDoRTIfNeeded) { ... }. The information is redundant, because when UncomputablePtr is set, then CanDoRT is false, and vice-versa.

2112

nit: The capitalisation of cannot -> Cannot seems unnecessary.

2148

unnecessary change.

2176

These changes here obfuscate the report that's generated by LAA when running loop(print-access-info). The information printed was:

Loop access info in function 'store_with_pointer_phi_incoming_phi':
  loop.header:
 The compiler can't determine the cause of the issue.
    Dependences:
      Unknown:
          %v8 = load double, double* %arrayidx, align 8 ->
          store double %mul16, double* %ptr.2, align 8

The information that is added by the remark is:

Loop access info in function 'store_with_pointer_phi_incoming_phi':
  loop.header:
    Report: unsafe dependent memory operations in loop. Use #pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
Unknown data dependence. Memory location is the same as accessed at <UNKNOWN LOCATION>. 
 The compiler can't determine the cause of the issue.
    Dependences:
      Unknown:
          %v8 = load double, double* %arrayidx, align 8 ->  
          store double %mul16, double* %ptr.2, align 8

The extra information here isn't particularly useful, especially because it doesn't have any line information. Perhaps the code should disable the extra info if the location is unknown/unavailable.

2189

nit: Please remove the newline.

2189

Instead of using + to concatenate two strings, let this be handled by raw_ostream using << .

llvm/test/Analysis/LoopAccessAnalysis/stride-access-dependence.ll
121

Please CHECK-NEXT for the full line that is printed.

This revision now requires changes to proceed.Wed, Nov 10, 5:32 AM
malharJ updated this revision to Diff 387087.Sun, Nov 14, 7:44 AM
malharJ marked 6 inline comments as done.
  • Rebased patch onto main
  • Updated code to avoid printing <unknown> when no debug info is present
  • minor formatting updates
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
623

thanks for pointing it out. Removed it.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
1770

There can be more.

But we are only emitting (as a remark) the first one found.

2109

Ok, I agree but there's an issue here ..

LoopAccessInfo::recordAnalysis() cant be called from within the function ( AccessAnalysis::canCheckPtrAtRT() ).
as it's not a member of AccessAnalysis class.

Perhaps one way to do it would be for AccessAnalysis::canCheckPtrAtRT() to accept a parameter
by reference and then we can use the value after the call as input to recordAnalysis() here.
But I'm afraid this approach is not much less verbose than current approach pf having UncomputablePtr as a member variable.

2176

Thanks for pointing this out. Done.

2189

can we keep this newline ?

The current text (Note: it was not introduced by this patch) is too long already:

unsafe dependent memory operations in loop. Use "#pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop

llvm/test/Analysis/LoopAccessAnalysis/stride-access-dependence.ll
121

Agreed.

But now the code will only print the remaining text: "Memory location is the same as ...etc"
when debug info is available.

so this is no longer an issue (since this test does not contain any debug info/metadata).

sdesmalen added inline comments.Wed, Nov 24, 9:16 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
586

UncomputablePtr doesn't really need a separate variable in AccessAnalysis, since it's only set by one function, and used directly by the function that calls it.

You can change canCheckPtrAtRT to take Value **UncomputablePtr = nullptr, and in canCheckPtrAtRT assign the value if the value is requested:

if (UncomputablePtr)
  *UncomputablePtr = Access.getPointer();

Where you call it, you can have:

Value *UncomputablePtr = nullptr;
bool CanDoRTIfNeeded =
    Accesses.canCheckPtrAtRT(*PtrRtChecking, PSE->getSE(), TheLoop,
                             SymbolicStrides, false, &UncomputablePtr);

if (!CanDoRTIfNeeded) {
  if (auto *I = dyn_cast_or_null<Instruction>(UncomputablePtr))
    recordAnalysis("UnknownArrayBounds", I)
}

Can you split this change out into a separate patch with a test that demonstrates the change?

1779

It shows here that the dependences are already collected. You can iterate Dependences to find the dependence that isn't safe, so that you don't have to add UnsafeDependence and maintain that separately.

1861

I guess this can be more briefly written as:

Loc = I->getDebugLoc();
if (auto *PtrI = dyn_cast_or_null<Instruction>(getPointerOperand(I)))
  Loc = PtrI->getDebugLoc();

(at which point it probably doesn't require a separate function anymore, especially since it has only one use).

1864–1866

This seems like a case that should be added to getPointerOperand ?

2110

It now passes in I as the instruction, but the debug location of the remark seems unchanged in the test. What value is this adding?

2111

Just leave the name as CantIdentifyArrayBounds ?

RKSimon resigned from this revision.Mon, Nov 29, 2:50 AM

Sorry this is not an area I known enough about to review

fhahn added inline comments.Mon, Nov 29, 3:20 AM
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
304

Does this need to be a shared_ptr? If you want to encode the fact that it may not be set, using Optional may be a better choice. Or you could initialize it to NoDep in case there is no unsafe dependence.