Page MenuHomePhabricator

[LAA] Analyze pointers forked by a select
Needs ReviewPublic

Authored by huntergr on Aug 25 2021, 6:04 AM.



Given a function like the following:

void forked_ptrs_different_base_same_offset(float *Base1, float *Base2, float *Dest, int *Preds) {
  for (int i=0; i<100; i++) {
    if (Pred[i] != 0) {
      Dest[i] = Base1[i];
    } else {
      Dest[i] = Base2[i];

LLVM will optimize the IR to a single load using a pointer determined by a select instruction: = select i1 %cmp1.not, float* %Base2, float* %Base1 = getelementptr inbounds float, float*, i64 %indvars.iv 
%.sink = load float, float*, align 4

LAA is currently unable to analyze such IR, since ScalarEvolution will return a SCEVUnknown for the pointer operand for the load.

This patch adds initial optional support for analyzing both possibilities for the pointer and allowing LAA to generate runtime checks for the bounds if required.

Diff Detail

Event Timeline

huntergr created this revision.Aug 25 2021, 6:04 AM
huntergr requested review of this revision.Aug 25 2021, 6:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2021, 6:04 AM
huntergr updated this revision to Diff 370905.Sep 6 2021, 6:18 AM

Rebased, ran clang-format over the patch.

Hi @huntergr, apologies for delay reviewing this patch! I've not finished reviewing it, but I've left some comments that I have so far. Thanks!


If we're only ever going to support two forks couldn't we make this simpler by just having:

unsigned Members[2];

instead or is this a real pain to implement? Or is the idea that we may want to extend the fork to allow more than 2 in future?


Again, does this need to be a SmallVector and can it just be const SCEV *Starts[2];?


Hi Graham, it feels slightly awkward having to essentially redefine a ForkedPointer as a pair in the same way as ForkedPtrs in llvm/include/llvm/Analysis/LoopAccessAnalysis.h. Maybe we could define something like:

using ForkedPointer = std::pair<const SCEV *, const SCEV *>;

in llvm/include/llvm/Analysis/ScalarEvolution.h that can be used in LoopAccessAnalysis too?


nit: Perhaps you can just write assert(Fork <= 1 && ...) since it's unsigned anyway?


nit: Perhaps you can just write assert(Fork <= 1 && ...) since it's unsigned anyway?


Have you rewritten the logic here as a performance improvement, i.e. to avoid calling Group.addPointer() after it's already been merged?


nit: This is just a suggestion, but you could restructure this a little to remove the extra indentation I think here:

if (!AR) {
  if (!RtCheck.AllowForkedPtrs)
    return false;

Not sure if this is better?


Is it better to just make a recursive call to hasComputableBounds instead of calling isAffine here? We're potentially missing out on future improvements to hasComputableBounds here, and also we're ignoring the possibility of a forked pointer being loop invariant I think.


Can we also have a test where at least one of the forked pointers is loop-invariant?

huntergr added inline comments.Sep 7 2021, 7:31 AM

So this is for a RuntimeCheckingPtrGroup, which may have more than two members already; I've only changed it from a SmallVector to a SmallSetVector here in order to avoid duplicate members, since both sides of a forked pointer can be added to the same group.


I'll look into it, but it'll make other parts messier since I can't just iterate over the members of the SmallVectors and would need to perform null checks for all those places for the second element.

One solution I thought of (but didn't implement, since I wanted some community feedback on the overall work first) was to create a new SCEVForkedExpr so that we wouldn't need to store two SCEVs here and could just note which fork(s) was represented. It'll take a while to implement that, but might be cleaner. Thoughts?


Not intentionally, no -- I just replicated the behaviour of the original (break out of the loop if the pointer merged into a group) but considered both potential forks.


We already do; see forked_ptrs_uniform_and_contiguous_forks

We could properly analyze and vectorize that case as well, but I haven't implemented that yet so I'm just testing that it gets rejected for now.

david-arm added inline comments.Sep 7 2021, 7:39 AM

Ah ok, sorry I missed that. I guess what I meant was that this should be trivial to implement, particularly if we can find a way of making calls to hasComputableBounds recursive and re-use the existing code that checks for loop-invariants and affine pointers.

huntergr updated this revision to Diff 378920.Tue, Oct 12, 1:53 AM

Rebased, updated based on review comments.

huntergr marked 4 inline comments as done.Tue, Oct 12, 2:09 AM
huntergr added inline comments.

We still needed to bail out if AR was false and forked pointers weren't allowed... so I rewrote it to check for the positive case for AR first and then proceed with the forked pointers check and default false afterwards.


We can't do that right now -- hasComputableBounds takes a Value* rather than a SCEV* so it can (potentially) be added to the stride map in replaceSymbolicStrideSCEV. We'd just end up looking at the same Value and splitting again.

This is something the SCEVForkedExpr would make cleaner, since we would only need to evaluate a single SCEV. But it'll take a bit of refactoring to do that, which is why I wanted some feedback on the whole idea first.

We could also separate out parts of these functions to make it recursive, but I'll need to be careful since replaceSymbolicStrideSCEV has other users.


Sadly, it'll require a bit more work to support invariant addresses. My original downstream code allowed them, but we ran into a bug with it and disabled them.

My plan is to get the base functionality committed, then go back and add an interface so that LoopVectorize (or other LAA consumers) can query the type of the forks in order to generate correct (and hopefully more optimal) IR for various cases -- the current contiguous-only SCEVs, strides of >1, loop-invariant but unknown strides, uniform/invariant addresses, indexed gather/scatter, etc.

If both forks have a stride of 1, or are invariant, then we could potentially plant two masked load instructions (or load + broadcast) instead of a gather, for instance. But that's future work until this part is completed.

Thanks for making the changes @huntergr! I've got a few mostly minor comments so far, but still have to review findForkedSCEVs. :)


Hi @huntergr, thanks for making the changes to Exprs. I just wonder if we should have an assert here that ScExprs.size() <= 2?


nit: Is it worth calling CheckingGroups.emplace_back(I, *this, /*Fork=0*/0) for clarity here?


nit: This is just a minor comment, but you could remove indentation further here by bailing out early, i.e.

if (!FPtr)
  return false;

const SCEV *A =

nit: Instead of writing:

LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *(FPtr->first) << "\n");

I think you can just write

LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *A << "\n");

and same for the second one.


Is there any danger in setting this before we return true?


You're introducing a new implicit TypeSize -> uint64_t cast here. Could you rewrite this as:

int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()).getFixedSize();
huntergr updated this revision to Diff 381211.Thu, Oct 21, 4:43 AM

Rebased, minor fixes from review comments.

huntergr marked 5 inline comments as done.Thu, Oct 21, 4:45 AM