Page MenuHomePhabricator

[LAA] Add Memory dependence and unknown bounds remarks.
Needs ReviewPublic

Authored by malharJ on Aug 19 2021, 6:11 AM.



Adds new optimization remarks when vectorization fails.

More specifically, new remarks are added for following 5 cases:

  • Backward dependency
  • Backward dependency that prevents Store-to-load forwarding
  • Forward dependency that prevents Store-to-load forwarding
  • Unknown dependency
  • Unknown array bounds

The failures are classified mainly into 2 categories
based on whether it is an aliased memory access
that is unsafe, and whether a bound can be computed for an array.

It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be first in program order.
In the future if may be possible to report all sources
of failure (and in program order).

A regression test has been added to test the following cases:

a) Loop can be vectorized: An optimization remark is
emitted indicating that vectorization was successful.
(note that this functionality was already present, ie.
has not been added as part of this patch.)

b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn added inline comments.Sep 6 2021, 4:57 AM

There's no need to return a SmallVector with the length encoded, the callers should not care about that. You can use ArrayRef if the callers do not add to the vector or SmalLVectorImpl if they need to add elements.


Why call it UnsafeDependences if it is supposed to only contain unknown dependences?

1 ↗(On Diff #370670)

please avoid adding tests with -enable-new-pm=0. You should be able to move that one to the LoopAccessAnalysis directory.

6 ↗(On Diff #370670)

Instead of having the C code here, I think it would be more helpful if you explain what this tests in a sentence or 2 with references to the IR, e.g. which memory instruction/GEP has the uncomputable bound and *why*.

malharJ updated this revision to Diff 370983.EditedSep 6 2021, 7:25 PM
malharJ marked 4 inline comments as done.
  1. Fixed the unit tests:

    This required initializing the ORE object using one of the constructors: OptimizationRemarkEmitter(const Function& F)

    Previously several of the unit tests were hitting memory errors and they seem to have been fixed by this change.

    There are also some minor formatting updates to the remark emitted.
  1. Removed checks for loop distribution pragma remark from LIT tests. This is because there is no analysis being done to support it.
  1. Updated recordAnalysis() to allow generating multiple reports when they are of the same type. This seems like a logical thing to do, for example if you have multiple instances of unbound array index case in a loop, it would be good for a user to view (in the report) similar types of errors and then correct them.
  1. Updated LIT test to use new Pass Manager syntax
  1. Minor formatting updates
malharJ added inline comments.Sep 6 2021, 7:27 PM

Currently the code uses "auto" when accessing the. value returned by this getter
so there isn't a need for the user to know the size used in the template ...

regardless, I've changed it to use SmallVector<T> instead if that's ok.


It was being accessed from outside the class on line 2045.

I've moved it to private now and added a public getter instead.


I think UnsafeDependences contains both unknown and known unsafe dependences.
I think the comment is unclear so I'm removing it.

malharJ updated this revision to Diff 370992.Sep 6 2021, 11:08 PM

This is to correct a mistake made in the previous patch.

"loop not vectorized" should not be emitted from loop-access analysis.
(the loop-vectorize takes care of adding 'loop not vectorized')


This hint is a useful one, and it was added purposefully. It should remain unchanged. I would suggest that you add a case in elaborateMemoryReport to recreate this hint.

malharJ updated this revision to Diff 371057.Sep 7 2021, 6:19 AM
malharJ marked an inline comment as done.

Re-added the remark/note about using pragma loop distribute.

Just one small comment. Otherwise the patch looks good to me


Nit: Missing a slash

malharJ updated this revision to Diff 371371.Sep 8 2021, 9:28 AM
malharJ marked 2 inline comments as done.

Trivial formatting update.

Thanks Malhar. The patch looks good to me

sdesmalen added inline comments.Sep 9 2021, 5:04 AM

If there is no need to modify the array returned by getUnsafeDependences (which there isn't, because the returned value is const), ArrayRef seems the better class to use, because it has all sorts of convenient utilities and is similarly just a (immutable) reference to the array.


Is this case not already covered by the recordAnalysis call above?

It seems like a mechanism for doing this already exists (recordAnalysis), it may be worth extending that to handle multiple reports (there is currently a limitation that it uses a single Report variable, but perhaps that could be made into a vector of reports which can be appended to).


nit: single-use variable, can be inlined in the next statement.


nit: please start your comments with capitalisation and end with a period.

12 ↗(On Diff #371371)

Would a prefix like "Loop cannot be vectorized: " be useful?

I think the message is a bit cryptic, how about:

Unable to determine aliasing information because the bounds of this access cannot be computed.
malharJ updated this revision to Diff 371619.Sep 9 2021, 9:13 AM
malharJ marked 3 inline comments as done.
  • Updated remark for unknown array bounds case
  • Moved call to recordAnalysis() inside elaborateMemoryReport() for the UnsafeDataDependenceTriedRT case
  • Changed getUnsafeDependences() to return an ArrayRef since return value is only read.
  • Minor formatting updates
malharJ added inline comments.Sep 9 2021, 9:14 AM

I agree it has been covered by the call to recordAnalysis() at line 2070 ...
I've now moved that call to recordAnalysis() to inside elaborateMemoryReport().

Regarding the second issue, I'm not sure myself why recordAnalysis has Report as scalar and not a vector but that is not really a part of my change ... I just tried to re-use it.

12 ↗(On Diff #371371)

If you see the earlier version/diffs of the patch, the reporting was being done in LoopVectorizer.
That time I had added "loop not vectorized" in the tests.

But now that it has moved to LoopAccessAnalysis, we are no longer emitting "loop not vectorized" here.
That prefix is prepended by the function reportVectorizationFailure() in LoopVectorize.cpp

For the latter part, I have made the change.

malharJ updated this revision to Diff 371751.Sep 9 2021, 4:59 PM

corrected trivial typo in LIT test.

malharJ updated this revision to Diff 371826.Sep 10 2021, 1:49 AM

minor formatting update for fixing a failing test.

Are there any other review comments ?

david-arm accepted this revision.Oct 1 2021, 12:29 AM

LGTM! It looks like you've addressed all the other reviewer's comments and I think the patch looks good now. Thanks!


nit: Maybe it's worth moving the recordAnalysis call to before the for loop, i.e.

OptimizationRemarkAnalysis R = recordAnalysis("UnknownArrayBounds", I);
for (...)
This revision is now accepted and ready to land.Oct 1 2021, 12:29 AM
fhahn added inline comments.Oct 1 2021, 1:53 AM

If computing the bounds fails here, we may retry creating checks by adding assumptions below (see line 806).

I think it could happen that we have multiple uncomputable pointer bounds here, but for some of them we may be able to actually compute bounds below. Should we remove the ones we can compute bounds below from the set?

12 ↗(On Diff #371371)

Unable to determine aliasing information because the bounds of this access cannot be computed.

I'm not sure this is entirely accurate. AFAIK uncomputable bounds here mean we cannot generate runtime checks.

'aliasing information' seems ambiguous here, because at the point where we check whether the bounds are computable, we already determined that the access may alias another access, independent of whether the bounds can be computed or not (like the underlying objects may alias).

malharJ updated this revision to Diff 380621.Oct 19 2021, 2:38 AM
malharJ marked an inline comment as done.
  1. Updated format of remark
  2. Removed pointers from UncomputablePtrs Set if it's bound can be computed on a retry (with Assumptions).
malharJ added inline comments.Oct 19 2021, 2:39 AM

Done. thanks for that suggestion.


Unfortunately that can't be done because Instruction* I is being declared inside the for-loop.

12 ↗(On Diff #371371)

Ok I agree.

I have changed the remark accordingly.

Are there any further review comments ?

malharJ updated this revision to Diff 383842.Nov 1 2021, 11:13 AM
  • Removed the mechanism that collected the remarks (enum "FailureReason" and function "elaborateMemoryReport()") and emitted them at the end.

    Instead, now the remarks are emitted inside LoopAccessInfo::analyzeLoop(), ie. while the loop is being analyzed.
  • Only one remark is emitted, at the point it is found.
malharJ updated this revision to Diff 384022.EditedNov 2 2021, 3:08 AM
  • Removed remark emission (OptimizationRemarkEmitter) to outside LoopAccessAnalysis. LAA generates a "Report" which can be used by other passes to emit remarks.

This also fixes the issue with emission of the same/similar remark twice (once by LAA, and once by a parent pass, eg: LV)

  • Updated the tests accordingly.
malharJ updated this revision to Diff 384037.EditedTue, Nov 2, 4:10 AM

Fixed minor typos and one LIT test (vectorization-remarks-missed.ll)

@sdesmalen , can you please review the latest updates ?

sdesmalen requested changes to this revision.Wed, Nov 10, 5:32 AM

Hi @malharJ, thanks for cleaning up the patch and making changes based on what we discussed offline, it's a bit more manageable to review now.
There are still some changes that I think are unnecessary, and I found that the patch needs to be rebased (this is how I found some confusing output test/Analysis/LoopAccessAnalysis/pointer-phis.ll).


This seems unused?


Can there only be a single unsafe/unknown dependence? Or can there be more?


UncomputablePtr and the mechanism around it seems entirely redundant, because it doesn't add any information that is used anywhere. The only reason that UncomputablePtr is collected is to avoid printing "Cannot identify array bounds" when it gets to the then-block of if (!CanDoRTIfNeeded) { ... }. The information is redundant, because when UncomputablePtr is set, then CanDoRT is false, and vice-versa.


nit: The capitalisation of cannot -> Cannot seems unnecessary.


unnecessary change.


These changes here obfuscate the report that's generated by LAA when running loop(print-access-info). The information printed was:

Loop access info in function 'store_with_pointer_phi_incoming_phi':
 The compiler can't determine the cause of the issue.
          %v8 = load double, double* %arrayidx, align 8 ->
          store double %mul16, double* %ptr.2, align 8

The information that is added by the remark is:

Loop access info in function 'store_with_pointer_phi_incoming_phi':
    Report: unsafe dependent memory operations in loop. Use #pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop
Unknown data dependence. Memory location is the same as accessed at <UNKNOWN LOCATION>. 
 The compiler can't determine the cause of the issue.
          %v8 = load double, double* %arrayidx, align 8 ->  
          store double %mul16, double* %ptr.2, align 8

The extra information here isn't particularly useful, especially because it doesn't have any line information. Perhaps the code should disable the extra info if the location is unknown/unavailable.


nit: Please remove the newline.


Instead of using + to concatenate two strings, let this be handled by raw_ostream using << .


Please CHECK-NEXT for the full line that is printed.

This revision now requires changes to proceed.Wed, Nov 10, 5:32 AM
malharJ updated this revision to Diff 387087.Sun, Nov 14, 7:44 AM
malharJ marked 6 inline comments as done.
  • Rebased patch onto main
  • Updated code to avoid printing <unknown> when no debug info is present
  • minor formatting updates

thanks for pointing it out. Removed it.


There can be more.

But we are only emitting (as a remark) the first one found.


Ok, I agree but there's an issue here ..

LoopAccessInfo::recordAnalysis() cant be called from within the function ( AccessAnalysis::canCheckPtrAtRT() ).
as it's not a member of AccessAnalysis class.

Perhaps one way to do it would be for AccessAnalysis::canCheckPtrAtRT() to accept a parameter
by reference and then we can use the value after the call as input to recordAnalysis() here.
But I'm afraid this approach is not much less verbose than current approach pf having UncomputablePtr as a member variable.


Thanks for pointing this out. Done.


can we keep this newline ?

The current text (Note: it was not introduced by this patch) is too long already:

unsafe dependent memory operations in loop. Use "#pragma loop distribute(enable) to allow loop distribution to attempt to isolate the offending operations into a separate loop



But now the code will only print the remaining text: "Memory location is the same as ...etc"
when debug info is available.

so this is no longer an issue (since this test does not contain any debug info/metadata).

sdesmalen added inline comments.Wed, Nov 24, 9:16 AM

UncomputablePtr doesn't really need a separate variable in AccessAnalysis, since it's only set by one function, and used directly by the function that calls it.

You can change canCheckPtrAtRT to take Value **UncomputablePtr = nullptr, and in canCheckPtrAtRT assign the value if the value is requested:

if (UncomputablePtr)
  *UncomputablePtr = Access.getPointer();

Where you call it, you can have:

Value *UncomputablePtr = nullptr;
bool CanDoRTIfNeeded =
    Accesses.canCheckPtrAtRT(*PtrRtChecking, PSE->getSE(), TheLoop,
                             SymbolicStrides, false, &UncomputablePtr);

if (!CanDoRTIfNeeded) {
  if (auto *I = dyn_cast_or_null<Instruction>(UncomputablePtr))
    recordAnalysis("UnknownArrayBounds", I)

Can you split this change out into a separate patch with a test that demonstrates the change?


It shows here that the dependences are already collected. You can iterate Dependences to find the dependence that isn't safe, so that you don't have to add UnsafeDependence and maintain that separately.


I guess this can be more briefly written as:

Loc = I->getDebugLoc();
if (auto *PtrI = dyn_cast_or_null<Instruction>(getPointerOperand(I)))
  Loc = PtrI->getDebugLoc();

(at which point it probably doesn't require a separate function anymore, especially since it has only one use).


This seems like a case that should be added to getPointerOperand ?


It now passes in I as the instruction, but the debug location of the remark seems unchanged in the test. What value is this adding?


Just leave the name as CantIdentifyArrayBounds ?

RKSimon resigned from this revision.Mon, Nov 29, 2:50 AM

Sorry this is not an area I known enough about to review

fhahn added inline comments.Mon, Nov 29, 3:20 AM

Does this need to be a shared_ptr? If you want to encode the fact that it may not be set, using Optional may be a better choice. Or you could initialize it to NoDep in case there is no unsafe dependence.