Page MenuHomePhabricator

[LAA] Analyze pointers forked by a select
ClosedPublic

Authored by huntergr on Aug 25 2021, 6:04 AM.

Details

Summary

Given a function like the following:

void forked_ptrs_different_base_same_offset(float *Base1, float *Base2, float *Dest, int *Preds) {
  for (int i=0; i<100; i++) {
    if (Pred[i] != 0) {
      Dest[i] = Base1[i];
    } else {
      Dest[i] = Base2[i];
    }
  }
}

LLVM will optimize the IR to a single load using a pointer determined by a select instruction:

%spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
%.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv 
%.sink = load float, float* %.sink.in, align 4

LAA is currently unable to analyze such IR, since ScalarEvolution will return a SCEVUnknown for the pointer operand for the load.

This patch adds initial optional support for analyzing both possibilities for the pointer and allowing LAA to generate runtime checks for the bounds if required.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
david-arm added inline comments.Oct 20 2021, 6:54 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
395

nit: Is it worth calling CheckingGroups.emplace_back(I, *this, /*Fork=0*/0) for clarity here?

718

nit: This is just a minor comment, but you could remove indentation further here by bailing out early, i.e.

if (!FPtr)
  return false;

const SCEV *A =
722

nit: Instead of writing:

LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *(FPtr->first) << "\n");

I think you can just write

LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *A << "\n");

and same for the second one.

724

Is there any danger in setting this before we return true?

771

You're introducing a new implicit TypeSize -> uint64_t cast here. Could you rewrite this as:

int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()).getFixedSize();
huntergr updated this revision to Diff 381211.Oct 21 2021, 4:43 AM

Rebased, minor fixes from review comments.

huntergr marked 5 inline comments as done.Oct 21 2021, 4:45 AM
david-arm added a subscriber: lebedev.ri.

Thanks for dealing with all the review comments @huntergr! I think the patch looks sensible to me, although I'm not as familiar with the SCEV code as others might be. I'm adding @lebedev.ri as a potential reviewer if that's ok?

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
34

It would be good to show some distinction here between Check 1 and Check 2. I assume it's actually checking each of the forked pointers, but the output doesn't make that clear.

huntergr updated this revision to Diff 387879.Nov 17 2021, 2:23 AM
huntergr marked an inline comment as done.

Rebased, and changed the membership of a checking group to be a pair of PointerInfo index and fork so that printing can show which forks are present in a group.

huntergr updated this revision to Diff 388889.Nov 22 2021, 5:23 AM

Rebased, disabled by default, added a couple of different instructions into the tests to ensure those paths are at least covered even if they aren't in a positive case right now; I'm still planning to leave those cases (known strides, g/s, uniform) for a followup patch; I suppose I could start a patch series if people want to see those earlier.

I've run the LNT nightly suite with this enabled, along with a few HPC benchmarks I have access to and they all pass.

LGTM! Thanks for making all the changes @huntergr and adding the tests. This patch is low risk at the moment with the flag being disabled currently. At some point once there have been performance investigations to prove there are no regressions we can then enable this by default.

david-arm accepted this revision.Nov 23 2021, 5:10 AM
This revision is now accepted and ready to land.Nov 23 2021, 5:10 AM
fhahn added a comment.Nov 23 2021, 5:22 AM

LGTM! Thanks for making all the changes @huntergr and adding the tests. This patch is low risk at the moment with the flag being disabled currently. At some point once there have been performance investigations to prove there are no regressions we can then enable this by default.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
20

Is this needed? Other SCEV types are forward-declared below, so ScalarEvolution.h doesn't need to be included.

334

Could we use a bool for Fork if it only has 2 possible values?

llvm/include/llvm/Analysis/ScalarEvolution.h
769

This is only used by LAA. Is there a reason this needs to be part of ScalarEvolution?

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1

the tests running -loop-accesses should be in llvm/test/Analysis/LoopAccessAnalysis.

Also, could you pre-commit the tests and update the diff here to show only the difference? That way it is a bit easier to see the impact in the diff.

I just realized how this may be a bit similar to how we handle pointers that are phi nodes. Currently those are handled by adding accesses for both incoming values (see D109381). Unfortunately the same approach cannot be directly used for selects, because we need to create 2 pointers that do not exist in the IR.

But if MemAccessInfo/ would also carry the pointer SCEV directly, I think it would be possible to avoid adding another dimension to RuntimePointerChecking::PointerInfo. Instead we would add 2 PointerInfo entries with separate translated pointer SCEVs. I put up a rough sketch of what this may look like in D114480, D114479 to see how this may look like

fhahn added a comment.Nov 23 2021, 4:19 PM

I just realized how this may be a bit similar to how we handle pointers that are phi nodes. Currently those are handled by adding accesses for both incoming values (see D109381). Unfortunately the same approach cannot be directly used for selects, because we need to create 2 pointers that do not exist in the IR.

But if MemAccessInfo/ would also carry the pointer SCEV directly, I think it would be possible to avoid adding another dimension to RuntimePointerChecking::PointerInfo. Instead we would add 2 PointerInfo entries with separate translated pointer SCEVs. I put up a rough sketch of what this may look like in D114480, D114479 to see how this may look like

I also put up a sketch of a stripped down variant that only supports runtime check generation by adding multiple PointerInfos with different associated pointer SCEVs: D114487.

The variant that also extends MemAccessInfo should also be able to determine that certain loops are safe without runtime checks, e.g. like the one below I think:

%s1 = type { [32000 x float], [32000 x float], [32000 x float] }
define dso_local void @foo(%s1 * nocapture readonly %Base, i32* nocapture readonly %Preds) {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1.not = icmp eq i32 %0, 0
  %gep.1 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 1, i32 0
  %gep.2 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 2, i32 0
  %spec.select = select i1 %cmp1.not, float* %gep.1, float* %gep.2
  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
  %.sink = load float, float* %.sink.in, align 4
  %1= getelementptr inbounds %s1, %s1 * %Base, i64 0, i32 0, i64 %indvars.iv
  store float %.sink, float* %1, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}
huntergr updated this revision to Diff 389956.Nov 26 2021, 2:22 AM

Revised based on @fhahn 's initial suggestions, rebased.

I'll look into the MemAccessInfo approach.

huntergr marked 3 inline comments as done.Nov 26 2021, 2:27 AM
huntergr added inline comments.
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
20

It was there to include the 'using ForkedPointer =' definition, but moving it out of ScalarEvolution means we can just define it here.

I had hoped that we could maybe integrate it further into ScalarEvolution as a new type of SCEV, but maybe this isn't the right time -- you're correct, this will be the only user for now, so we can just keep it here until we find a case where it would be useful to have it more widely available.

mgabka added a subscriber: mgabka.Nov 29 2021, 1:52 AM
david-arm accepted this revision.Nov 29 2021, 4:53 AM

LGTM!

Hi @huntergr, thanks for making all the changes. I think the patch looks good to
go for now. It's still worth looking into the MemAccessInfo approach as a follow-up,
but the patch has been sat in review for long enough (3 months) without any
fundamental objections so I'd prefer we got something merged now to get the
functionality defended and unblock future work. This is currently only enabled
under an option, so any refactoring for the MemAccessInfo approach can be
done safely under another patch.

fhahn added a comment.Nov 29 2021, 2:07 PM

LGTM!

Hi @huntergr, thanks for making all the changes. I think the patch looks good to
go for now. It's still worth looking into the MemAccessInfo approach as a follow-up,
but the patch has been sat in review for long enough (3 months) without any
fundamental objections so I'd prefer we got something merged now to get the
functionality defended and unblock future work.

I agree it is not ideal that there's been not much feedback so far, but now that there is some additional feedback/suggestion, I think it would be good to hear additional opinions on the preferred direction here or at least discuss concrete potential alternatives; D114487 in particular which should have similar effects, but with less invasive changes, i.e. there's no need to adjust isNoWrap, hasComputableBounds, RuntimePointerChecking::insert or add additional state.

You mention future work blocked by this as a reason for landing this now, however I cannot find any references to patches depending on this work.

This is currently only enabled
under an option, so any refactoring for the MemAccessInfo approach can be
done safely under another patch.

While it is true that it is off by default, it adds substantial complexity to LAA which is already quite complex and the changes are quite spread out. One concern is that it adds an additional way to model 'forked pointers'; we already handle 'forked pointers' via phi nodes (by adding multiple PointerInfos). I am also not sure it should be off by default, once it lands. IMO better analysis should be the right thing to do and should be quite safe from a performance regression perspective. I don't see a strong reason for not enabling this by default and having wide testing surface potential issues early.

huntergr updated this revision to Diff 406734.Feb 8 2022, 2:09 AM
huntergr marked an inline comment as done.
huntergr set the repository for this revision to rG LLVM Github Monorepo.

Rebased and changed to use @fhahn 's lighter-weight approach from D114487 combined with my recursive function to find the SCEVs. Although the simplified tests are handled with a couple levels of checking, the real applications I was working on had additional operations between the ptr value for the load or store and a select.

It might be best to introduce a limit to recursion though, any thoughts?

bsmith added a subscriber: bsmith.Feb 28 2022, 3:01 AM
huntergr updated this revision to Diff 412642.Mar 3 2022, 2:19 AM

Rebased, added a recursion limit to the SCEV building function.

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 2:19 AM
huntergr updated this revision to Diff 421130.Apr 7 2022, 1:47 AM

Rebased. Ping?

fhahn added a comment.Apr 10 2022, 2:59 PM

Thanks for the update!

I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.

I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
133

It would be good to add a test for the option, e.g. a case that requires 2 or 3 recursions and set test with max-forked-scev-depth=2/3

409

This scheme would need documenting, i.e. why we can have multiple expressions for a pointer.

414

should be removed?

696

should be removed?

841

Needs resolving.

It should be needed, because the above code may have added assumptions, which make Ptr an AddRec, See comment in D114487.

855

missing test for this?

873

It looks like tests for those conditions are missing?

886

I don't think we can use the inbounds info here, unless we prove that the program is undefined if GEP is poison.

Consider something like below. If %c is always false, the GEP index could be out-of-bounds (and the GEP poison). Adding a runtime check based on the SCEV expression may introduce a branch on poison unconditionally.

define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds, i1 %c) {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %latch ]
  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1.not = icmp eq i32 %0, 0
  %spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
  %.sink = load float, float* %.sink.in, align 4
  %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
  br i1 %c, label %then, label %latch

then:
  store float %.sink, float* %1, align 4
  br label %latch

latch:
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}
893

It might be a bit simpler to just add a duplicate to BaseScevs/OffsetScevs and have one common code path to compute the SCEVs to add

Thanks for the update!

I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.

I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

Sure, that sounds like a good plan. I'll try and review your patch this week.

fhahn added a comment.May 2 2022, 11:41 AM

Thanks for the update!

I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.

I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

Sure, that sounds like a good plan. I'll try and review your patch this week.

Sounds great, thanks!

huntergr updated this revision to Diff 437820.Jun 17 2022, 2:15 AM

Rebased on top of Florian's patch from https://reviews.llvm.org/D114487

That patch has been reverted for now so builds will fail, but this should be pretty close to what we want once the base feature is back in.

I've cut the walker function down to only handle GEPs and selects for now; we can add more cases (and more tests) later.

I've added tests to exercise the checks that exclude a given select or GEP from processing, as well as one for the recursion limit.

I'll precommit the new tests before the next update on this patch.

fhahn added a comment.Jun 28 2022, 2:13 AM

Rebased on top of Florian's patch from https://reviews.llvm.org/D114487

That patch has been reverted for now so builds will fail, but this should be pretty close to what we want once the base feature is back in.

Thanks, the patch has been relanded a while ago.

I've cut the walker function down to only handle GEPs and selects for now; we can add more cases (and more tests) later.

I've added tests to exercise the checks that exclude a given select or GEP from processing, as well as one for the recursion limit.

I'll precommit the new tests before the next update on this patch.

Sounds good! I skimmed the current version and it looks like most comments should be addressed in the latest update. It might be good to mark them as done. One thing I couldn't spot in the latest version is a test for the potential inbounds issue mentioned inline.

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
1 ↗(On Diff #437820)

It would be probably be good to convert the IR to use opaque pointers in a pre-commit, so -opaque-pointers is not needed.

huntergr updated this revision to Diff 444549.Jul 14 2022, 1:08 AM

Rebased on top of @fhahn 's fixed patch, including changes to detect possible undef/poison.

New tests were precommitted in rGa19cf47da095.

huntergr marked 5 inline comments as done.Jul 14 2022, 1:14 AM
huntergr added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
886

I removed the code to add no-wrap flags to the SCEV based on 'inbounds', then added your test.

fhahn accepted this revision.Jul 18 2022, 1:10 AM

LGTM, thanks!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
935

nit: unnecessary move?

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
158 ↗(On Diff #444549)

nit: it would probably be good to also have a few tests that access different sizes, including odd ones like i23 or something like that, to ensure the right size expressions are used.

This revision was landed with ongoing or failed builds.Jul 18 2022, 4:08 AM
This revision was automatically updated to reflect the committed changes.
ro added a subscriber: ro.Jul 18 2022, 6:45 AM

This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:

/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::SCEV*, bool> > findForkedPointer(llvm::PredicatedScalarEvolution&, const ValueToValueMap&, llvm::Value*, const llvm::Loop*)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: could not convert ‘Scevs’ from ‘SmallVector<[...],2>’ to ‘SmallVector<[...],3>’
  916 |     return Scevs;
      |            ^~~~~
      |            |
      |            SmallVector<[...],2>
In D108699#3659633, @ro wrote:

This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:

/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::SCEV*, bool> > findForkedPointer(llvm::PredicatedScalarEvolution&, const ValueToValueMap&, llvm::Value*, const llvm::Loop*)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: could not convert ‘Scevs’ from ‘SmallVector<[...],2>’ to ‘SmallVector<[...],3>’
  916 |     return Scevs;
      |            ^~~~~
      |            |
      |            SmallVector<[...],2>

Does rG4bd072c56b87 fix this for you?

ro added a comment.Jul 18 2022, 10:22 AM
In D108699#3659633, @ro wrote:

This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:

[...]

Does rG4bd072c56b87 fix this for you?

It does indeed, thanks. FWIW, this is with g++ 11.3.0.

https://github.com/llvm/llvm-project/issues/57368 is an open bug report describing a regression from this patch.

https://github.com/llvm/llvm-project/issues/57368 is an open bug report describing a regression from this patch.

Thanks for the report and reproducer, I'll look into it.