This is an archive of the discontinued LLVM Phabricator instance.


Differential D108699

[LAA] Analyze pointers forked by a select
Closed, Public

Authored by huntergr on Aug 25 2021, 6:04 AM.


Details

Reviewers

fhahn
sdesmalen
reames
eli.friedman
david-arm
lebedev.ri
Ayal
Meinersbur

Commits

rGdb8fcb2c2537: [LAA] Add recursive IR walker for forked pointers

Summary

Given a function like the following:

void forked_ptrs_different_base_same_offset(float *Base1, float *Base2, float *Dest, int *Preds) {
  for (int i=0; i<100; i++) {
    if (Preds[i] != 0) {
      Dest[i] = Base1[i];
    } else {
      Dest[i] = Base2[i];
    }
  }
}

LLVM will optimize the IR to a single load using a pointer determined by a select instruction:

%spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
%.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv 
%.sink = load float, float* %.sink.in, align 4

LAA is currently unable to analyze such IR, because ScalarEvolution returns a SCEVUnknown for the pointer operand of the load.

This patch adds initial, optional support for analyzing both possible values of the pointer, allowing LAA to generate runtime checks on the bounds of each if required.
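To make the goal concrete, here is a minimal sketch of what the runtime checks for the example above have to establish. It is a toy model under stated assumptions, not LAA's implementation: addresses are plain integers, `float` is assumed to be 4 bytes, and the names `ByteRange`, `accessRange`, `overlaps` and `forkedLoopIsSafe` are hypothetical. Because the select may pick either base on any iteration, the store range for `Dest` must be disjoint from the ranges of both forks.

```cpp
#include <cassert>
#include <cstdint>

// Half-open byte range [Start, End) touched by one pointer over the loop.
struct ByteRange {
  std::uint64_t Start;
  std::uint64_t End;
};

// Range accessed by Base + I * ElemSize for I in [0, TripCount).
ByteRange accessRange(std::uint64_t Base, std::uint64_t ElemSize,
                      std::uint64_t TripCount) {
  return {Base, Base + TripCount * ElemSize};
}

// Two half-open ranges conflict if they intersect.
bool overlaps(const ByteRange &A, const ByteRange &B) {
  return A.Start < B.End && B.Start < A.End;
}

// The store to Dest must not overlap either possible source of the forked
// load, since the select may choose Base1 or Base2 on any iteration.
bool forkedLoopIsSafe(std::uint64_t Base1, std::uint64_t Base2,
                      std::uint64_t Dest, std::uint64_t TripCount) {
  const std::uint64_t FloatSize = 4; // assumption: sizeof(float) == 4
  ByteRange DestR = accessRange(Dest, FloatSize, TripCount);
  return !overlaps(DestR, accessRange(Base1, FloatSize, TripCount)) &&
         !overlaps(DestR, accessRange(Base2, FloatSize, TripCount));
}
```

For example, `forkedLoopIsSafe(0, 1000, 2000, 100)` holds because `Dest` covers [2000, 2400) while the forks cover [0, 400) and [1000, 1400); moving `Base2` to 2200 makes it fail. LAA itself works with SCEV-based lower/upper bounds per checking group rather than concrete integers.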


Event Timeline

huntergr created this revision. Aug 25 2021, 6:04 AM

Herald added subscribers: javed.absar, hiraditya. Aug 25 2021, 6:04 AM

huntergr requested review of this revision. Aug 25 2021, 6:04 AM

Herald added a project: Restricted Project. Aug 25 2021, 6:04 AM

Harbormaster completed remote builds in B121152: Diff 368619. Aug 25 2021, 6:46 AM

Rebased, ran clang-format over the patch.

Harbormaster completed remote builds in B122756: Diff 370905. Sep 6 2021, 6:54 AM

Hi @huntergr, apologies for the delay in reviewing this patch! I've not finished reviewing it, but I've left some comments that I have so far. Thanks!

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
365	If we're only ever going to support two forks, couldn't we make this simpler by just having `unsigned Members[2];` instead, or is this a real pain to implement? Or is the idea that we may want to extend the fork to allow more than 2 in future?
385	Again, does this need to be a `SmallVector` and can it just be `const SCEV *Starts[2];`?
llvm/include/llvm/Analysis/ScalarEvolution.h
460	Hi Graham, it feels slightly awkward having to essentially redefine a ForkedPointer as a pair in the same way as `ForkedPtrs` in llvm/include/llvm/Analysis/LoopAccessAnalysis.h. Maybe we could define something like `using ForkedPointer = std::pair<const SCEV *, const SCEV *>;` in llvm/include/llvm/Analysis/ScalarEvolution.h that can be used in LoopAccessAnalysis too?
llvm/lib/Analysis/LoopAccessAnalysis.cpp
186–187	nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway?
310	nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway?
461	Have you rewritten the logic here as a performance improvement, i.e. to avoid calling `Group.addPointer()` after it's already been merged?
706	nit: This is just a suggestion, but I think you could restructure this a little to remove the extra indentation here: `if (!AR) { if (!RtCheck.AllowForkedPtrs) return false; ...` Not sure if this is better?
722	Is it better to just make a recursive call to hasComputableBounds instead of calling `isAffine` here? We're potentially missing out on future improvements to hasComputableBounds here, and also we're ignoring the possibility of a forked pointer being loop invariant I think.
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
2	Can we also have a test where at least one of the forked pointers is loop-invariant?

huntergr added inline comments. Sep 7 2021, 7:31 AM

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
365	So this is for a RuntimeCheckingPtrGroup, which may have more than two members already; I've only changed it from a SmallVector to a SmallSetVector here in order to avoid duplicate members, since both sides of a forked pointer can be added to the same group.
385	I'll look into it, but it'll make other parts messier since I can't just iterate over the members of the SmallVectors and would need to perform null checks for all those places for the second element. One solution I thought of (but didn't implement, since I wanted some community feedback on the overall work first) was to create a new SCEVForkedExpr so that we wouldn't need to store two SCEVs here and could just note which fork(s) was represented. It'll take a while to implement that, but might be cleaner. Thoughts?
llvm/lib/Analysis/LoopAccessAnalysis.cpp
461	Not intentionally, no -- I just replicated the behaviour of the original (break out of the loop if the pointer merged into a group) but considered both potential forks.
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
2	We already do; see forked_ptrs_uniform_and_contiguous_forks. We could properly analyze and vectorize that case as well, but I haven't implemented that yet, so I'm just testing that it gets rejected for now.
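As huntergr explains for line 365, `Members` moved to a set-like container because both forks of one pointer can be merged into the same group, so the same index would otherwise be recorded twice. The sketch below imitates that behaviour (insertion order plus uniqueness) with standard containers; `OrderedIndexSet` and `addBothForks` are hypothetical stand-ins, not the interface of `llvm::SmallSetVector`.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Insertion-ordered set of pointer indices: the vector keeps iteration
// order deterministic, the std::set rejects duplicates.
class OrderedIndexSet {
  std::vector<unsigned> Order;
  std::set<unsigned> Seen;

public:
  // Returns true if Index was not present and has been appended.
  bool insert(unsigned Index) {
    if (!Seen.insert(Index).second)
      return false;
    Order.push_back(Index);
    return true;
  }
  const std::vector<unsigned> &members() const { return Order; }
};

// Adds both forks of pointer PtrIdx to one group; with a plain vector this
// would record PtrIdx twice.
inline void addBothForks(OrderedIndexSet &Group, unsigned PtrIdx) {
  Group.insert(PtrIdx); // fork 0
  Group.insert(PtrIdx); // fork 1 lands in the same group, ignored
}
```

Keeping insertion order, rather than using a plain `std::set`, keeps iteration deterministic, which is the usual reason for choosing a SetVector-style container.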

david-arm added inline comments. Sep 7 2021, 7:39 AM

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
2	Ah ok, sorry I missed that. I guess what I meant was that this should be trivial to implement, particularly if we can find a way of making calls to hasComputableBounds recursive and re-use the existing code that checks for loop-invariants and affine pointers.

Rebased, updated based on review comments.

huntergr marked 4 inline comments as done. Oct 12 2021, 2:09 AM

huntergr added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
706	We still needed to bail out if AR was false and forked pointers weren't allowed... so I rewrote it to check for the positive case for AR first and then proceed with the forked pointers check and default false afterwards.
722	We can't do that right now -- hasComputableBounds takes a Value* rather than a SCEV* so it can (potentially) be added to the stride map in replaceSymbolicStrideSCEV. We'd just end up looking at the same Value and splitting again. This is something the SCEVForkedExpr would make cleaner, since we would only need to evaluate a single SCEV. But it'll take a bit of refactoring to do that, which is why I wanted some feedback on the whole idea first. We could also separate out parts of these functions to make it recursive, but I'll need to be careful since replaceSymbolicStrideSCEV has other users.
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
2	Sadly, it'll require a bit more work to support invariant addresses. My original downstream code allowed them, but we ran into a bug with it and disabled them. My plan is to get the base functionality committed, then go back and add an interface so that LoopVectorize (or other LAA consumers) can query the type of the forks in order to generate correct (and hopefully more optimal) IR for various cases -- the current contiguous-only SCEVs, strides of >1, loop-invariant but unknown strides, uniform/invariant addresses, indexed gather/scatter, etc. If both forks have a stride of 1, or are invariant, then we could potentially plant two masked load instructions (or load + broadcast) instead of a gather, for instance. But that's future work until this part is completed.
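The masked-load idea can be illustrated with a scalar model. The sketch below (hypothetical function names, plain C++ loops standing in for vector instructions) checks that, when both forks have unit stride, the per-lane pointer select from the summary example computes the same values as two masked contiguous loads followed by a per-lane blend. It only demonstrates equivalence of results; it says nothing about which form is faster on a given target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-lane pointer select (the summary's %spec.select): each lane reads
// Base1[I] or Base2[I] depending on Preds[I]. Vectorized directly, this
// becomes a gather.
std::vector<float> gatherForm(const float *Base1, const float *Base2,
                              const int *Preds, std::size_t N) {
  std::vector<float> Out(N);
  for (std::size_t I = 0; I < N; ++I) {
    const float *P = Preds[I] != 0 ? Base1 : Base2;
    Out[I] = P[I];
  }
  return Out;
}

// Same values from two unit-stride masked loads plus a blend. Masked-off
// lanes are never read, matching the original control flow.
std::vector<float> twoMaskedLoadsForm(const float *Base1, const float *Base2,
                                      const int *Preds, std::size_t N) {
  std::vector<float> Load1(N, 0.0f), Load2(N, 0.0f);
  for (std::size_t I = 0; I < N; ++I) // masked load from Base1
    if (Preds[I] != 0)
      Load1[I] = Base1[I];
  for (std::size_t I = 0; I < N; ++I) // masked load from Base2
    if (Preds[I] == 0)
      Load2[I] = Base2[I];
  std::vector<float> Out(N);
  for (std::size_t I = 0; I < N; ++I) // blend on the same predicate
    Out[I] = Preds[I] != 0 ? Load1[I] : Load2[I];
  return Out;
}
```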

Harbormaster completed remote builds in B128298: Diff 378920. Oct 12 2021, 2:53 AM

Ping.

Thanks for making the changes @huntergr! I've got a few mostly minor comments so far, but still have to review findForkedSCEVs. :)

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
408	Hi @huntergr, thanks for making the changes to `Exprs`. I just wonder if we should have an assert here that `ScExprs.size() <= 2`?
llvm/lib/Analysis/LoopAccessAnalysis.cpp
395	nit: Is it worth calling `CheckingGroups.emplace_back(I, this, /*Fork=*/0)` for clarity here?
718	nit: This is just a minor comment, but you could remove indentation further here by bailing out early, i.e. `if (!FPtr) return false; const SCEV *A =`
722	nit: Instead of writing `LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *(FPtr->first) << "\n");` I think you can just write `LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *A << "\n");` and the same for the second one.

david-arm added inline comments. Oct 20 2021, 6:54 AM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
724	Is there any danger in setting this before we return true?
771	You're introducing a new implicit TypeSize -> uint64_t cast here. Could you rewrite this as: `int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()).getFixedSize();`

Rebased, minor fixes from review comments.

huntergr marked 5 inline comments as done. Oct 21 2021, 4:45 AM

Harbormaster completed remote builds in B129914: Diff 381211. Oct 21 2021, 5:22 AM

Thanks for dealing with all the review comments @huntergr! I think the patch looks sensible to me, although I'm not as familiar with the SCEV code as others might be. I'm adding @lebedev.ri as a potential reviewer if that's ok?

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
34	It would be good to show some distinction here between Check 1 and Check 2. I assume it's actually checking each of the forked pointers, but the output doesn't make that clear.

Rebased, and changed the membership of a checking group to be a pair of PointerInfo index and fork so that printing can show which forks are present in a group.

Harbormaster completed remote builds in B134698: Diff 387879. Nov 17 2021, 3:00 AM

Rebased and disabled by default. Added a couple of different instructions to the tests to ensure those paths are at least covered, even if they aren't exercised by a positive case right now. I'm still planning to leave those cases (known strides, gather/scatter, uniform) for a follow-up patch; I could start a patch series if people want to see those earlier.

I've run the LNT nightly suite with this enabled, along with a few HPC benchmarks I have access to and they all pass.

Harbormaster completed remote builds in B135406: Diff 388889. Nov 22 2021, 8:47 AM

LGTM! Thanks for making all the changes @huntergr and adding the tests. This patch is low risk at the moment with the flag being disabled currently. At some point once there have been performance investigations to prove there are no regressions we can then enable this by default.

david-arm accepted this revision. Nov 23 2021, 5:10 AM

This revision is now accepted and ready to land. Nov 23 2021, 5:10 AM

In D108699#3148674, @david-arm wrote:

> LGTM! Thanks for making all the changes @huntergr and adding the tests. This patch is low risk at the moment with the flag being disabled currently. At some point once there have been performance investigations to prove there are no regressions we can then enable this by default.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
20	Is this needed? Other SCEV types are forward-declared below, so ScalarEvolution.h doesn't need to be included.
334	Could we use a bool for `Fork` if it only has 2 possible values?
llvm/include/llvm/Analysis/ScalarEvolution.h
769	This is only used by LAA. Is there a reason this needs to be part of `ScalarEvolution`?
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1	The tests running `-loop-accesses` should be in `llvm/test/Analysis/LoopAccessAnalysis`. Also, could you pre-commit the tests and update the diff here to show only the difference? That way it is a bit easier to see the impact in the diff.

I just realized how this may be a bit similar to how we handle pointers that are phi nodes. Currently those are handled by adding accesses for both incoming values (see D109381). Unfortunately the same approach cannot be directly used for selects, because we need to create 2 pointers that do not exist in the IR.

But if MemAccessInfo/ would also carry the pointer SCEV directly, I think it would be possible to avoid adding another dimension to RuntimePointerChecking::PointerInfo. Instead, we would add 2 PointerInfo entries with separate translated pointer SCEVs. I put up a rough sketch of what this may look like in D114480 and D114479.

In D108699#3150041, @fhahn wrote:

> I just realized how this may be a bit similar to how we handle pointers that are phi nodes. Currently those are handled by adding accesses for both incoming values (see D109381). Unfortunately the same approach cannot be directly used for selects, because we need to create 2 pointers that do not exist in the IR.
>
> But if MemAccessInfo/ would also carry the pointer SCEV directly, I think it would be possible to avoid adding another dimension to RuntimePointerChecking::PointerInfo. Instead, we would add 2 PointerInfo entries with separate translated pointer SCEVs. I put up a rough sketch of what this may look like in D114480 and D114479.

I also put up a sketch of a stripped down variant that only supports runtime check generation by adding multiple PointerInfos with different associated pointer SCEVs: D114487.
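The multiple-PointerInfo variant can be pictured as follows: instead of giving each pointer record a second dimension, a forked pointer simply contributes one entry per fork, and the usual pairwise check enumeration runs over the entries. This is a toy sketch: `PointerEntry`, `addForkedPointer` and `checksNeeded` are hypothetical names, and real LAA entries hold SCEV expressions and filter pairs further (for example by dependence and alias sets), which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified record: the IR value a pointer came from plus one affine
// address expression Start + Step * i for this entry.
struct PointerEntry {
  std::string Value; // name of the IR pointer value
  long Start;        // address at iteration 0 (stand-in for a SCEV start)
  long Step;         // bytes advanced per iteration
  bool IsWrite;
};

// A forked pointer contributes one entry per fork, all sharing Value.
void addForkedPointer(std::vector<PointerEntry> &Entries,
                      const std::string &Value,
                      const std::vector<std::pair<long, long>> &Forks,
                      bool IsWrite) {
  for (const auto &F : Forks)
    Entries.push_back({Value, F.first, F.second, IsWrite});
}

// Pairs of entries that would need a runtime check in this toy model:
// at least one is a write and they come from different IR values.
std::vector<std::pair<std::size_t, std::size_t>>
checksNeeded(const std::vector<PointerEntry> &Entries) {
  std::vector<std::pair<std::size_t, std::size_t>> Checks;
  for (std::size_t I = 0; I < Entries.size(); ++I)
    for (std::size_t J = I + 1; J < Entries.size(); ++J)
      if ((Entries[I].IsWrite || Entries[J].IsWrite) &&
          Entries[I].Value != Entries[J].Value)
        Checks.push_back({I, J});
  return Checks;
}
```

For the summary example, the forked load adds two read entries and the store to `Dest` adds one write entry, giving two checks: the write against each fork.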

The variant that also extends MemAccessInfo should also be able to determine that certain loops are safe without runtime checks, e.g. the one below, I think:

%s1 = type { [32000 x float], [32000 x float], [32000 x float] }
define dso_local void @foo(%s1 * nocapture readonly %Base, i32* nocapture readonly %Preds) {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1.not = icmp eq i32 %0, 0
  %gep.1 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 1, i32 0
  %gep.2 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 2, i32 0
  %spec.select = select i1 %cmp1.not, float* %gep.1, float* %gep.2
  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
  %.sink = load float, float* %.sink.in, align 4
  %1 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 0, i64 %indvars.iv
  store float %.sink, float* %1, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}
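Why no runtime check should be needed for this loop can be worked out by hand. All addresses are constant offsets from the same `%Base`, so only the field layout matters. The sketch below assumes 4-byte floats and no padding between the three `[32000 x float]` arrays, and computes the byte ranges touched over the 100 iterations: the store writes field 0 while the forked load reads field 1 or field 2. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Layout of %s1 = { [32000 x float], [32000 x float], [32000 x float] },
// assuming 4-byte floats and no padding between the arrays.
constexpr std::uint64_t FloatSize = 4;
constexpr std::uint64_t FieldBytes = 32000 * FloatSize; // 128000 bytes

// Offset from %Base of element I of field F.
constexpr std::uint64_t fieldElemOffset(std::uint64_t F, std::uint64_t I) {
  return F * FieldBytes + I * FloatSize;
}

struct Range {
  std::uint64_t Lo, Hi; // half-open [Lo, Hi)
};

// Bytes touched in field F over TripCount iterations of %indvars.iv.
constexpr Range fieldRange(std::uint64_t F, std::uint64_t TripCount) {
  return {fieldElemOffset(F, 0), fieldElemOffset(F, TripCount)};
}

constexpr bool disjoint(Range A, Range B) {
  return A.Hi <= B.Lo || B.Hi <= A.Lo;
}

constexpr std::uint64_t Trip = 100;
// Store (field 0) versus each fork of the load (%gep.1 = field 1,
// %gep.2 = field 2).
static_assert(disjoint(fieldRange(0, Trip), fieldRange(1, Trip)), "store vs fork 0");
static_assert(disjoint(fieldRange(0, Trip), fieldRange(2, Trip)), "store vs fork 1");
```

The store covers [0, 400) and the two forks cover [128000, 128400) and [256000, 256400), so the ranges are disjoint whatever the value of `%Base`.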

huntergr mentioned this in rGdee810e117ad: [NFC][LAA] Precommit tests for forked pointers. Nov 24 2021, 8:21 AM

Revised based on @fhahn 's initial suggestions, rebased.

I'll look into the MemAccessInfo approach.

huntergr marked 3 inline comments as done. Nov 26 2021, 2:27 AM

huntergr added inline comments.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
20	It was there to include the 'using ForkedPointer =' definition, but moving it out of ScalarEvolution means we can just define it here. I had hoped that we could maybe integrate it further into ScalarEvolution as a new type of SCEV, but maybe this isn't the right time -- you're correct, this will be the only user for now, so we can just keep it here until we find a case where it would be useful to have it more widely available.

Harbormaster completed remote builds in B136174: Diff 389956. Nov 26 2021, 3:03 AM

mgabka added a subscriber: mgabka. Nov 29 2021, 1:52 AM

LGTM!

Hi @huntergr, thanks for making all the changes. I think the patch looks good to
go for now. It's still worth looking into the MemAccessInfo approach as a follow-up,
but the patch has been sat in review for long enough (3 months) without any
fundamental objections so I'd prefer we got something merged now to get the
functionality defended and unblock future work. This is currently only enabled
under an option, so any refactoring for the MemAccessInfo approach can be
done safely under another patch.

fhahn mentioned this in D114487: [LAA] Support runtime checks for select GEP base pointers. Nov 29 2021, 1:58 PM

In D108699#3158014, @david-arm wrote:

> LGTM!
>
> Hi @huntergr, thanks for making all the changes. I think the patch looks good to
> go for now. It's still worth looking into the MemAccessInfo approach as a follow-up,
> but the patch has been sat in review for long enough (3 months) without any
> fundamental objections so I'd prefer we got something merged now to get the
> functionality defended and unblock future work.

I agree it is not ideal that there hasn't been much feedback so far, but now that there is some additional feedback/suggestion, I think it would be good to hear additional opinions on the preferred direction here, or at least discuss concrete potential alternatives; D114487 in particular, which should have similar effects but with less invasive changes, i.e. there's no need to adjust isNoWrap, hasComputableBounds, RuntimePointerChecking::insert or add additional state.

You mention future work blocked by this as a reason for landing this now, however I cannot find any references to patches depending on this work.

> This is currently only enabled
> under an option, so any refactoring for the MemAccessInfo approach can be
> done safely under another patch.

While it is true that it is off by default, it adds substantial complexity to LAA which is already quite complex and the changes are quite spread out. One concern is that it adds an additional way to model 'forked pointers'; we already handle 'forked pointers' via phi nodes (by adding multiple PointerInfos). I am also not sure it should be off by default, once it lands. IMO better analysis should be the right thing to do and should be quite safe from a performance regression perspective. I don't see a strong reason for not enabling this by default and having wide testing surface potential issues early.

Rebased and changed to use @fhahn 's lighter-weight approach from D114487, combined with my recursive function to find the SCEVs. Although the simplified tests only need a couple of levels of checking, the real applications I was working on had additional operations between the pointer value for the load or store and the select.

It might be best to introduce a limit to recursion though, any thoughts?

Harbormaster completed remote builds in B148191: Diff 406734. Feb 8 2022, 3:05 AM

peterwaller-arm added a subscriber: peterwaller-arm. Feb 15 2022, 5:12 AM

bsmith added a subscriber: bsmith. Feb 28 2022, 3:01 AM

Rebased, added a recursion limit to the SCEV building function.

Herald added a project: Restricted Project. Mar 3 2022, 2:19 AM

Harbormaster completed remote builds in B152318: Diff 412642. Mar 3 2022, 2:41 AM

Rebased. Ping?

Harbormaster completed remote builds in B158428: Diff 421130. Apr 7 2022, 2:46 AM

fhahn mentioned this in rG3c1483609369: [LAA] Add test with simpler load of pointer select. Apr 10 2022, 2:55 PM

Thanks for the update!

I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.

I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
133	It would be good to add a test for the option, e.g. a case that requires 2 or 3 levels of recursion, and run the test with `max-forked-scev-depth=2/3`.
409	This scheme would need documenting, i.e. why we can have multiple expressions for a pointer.
414	should be removed?
696	should be removed?
841	Needs resolving. It should be needed, because the above code may have added assumptions which make Ptr an AddRec. See comment in D114487.
855	missing test for this?
873	It looks like tests for those conditions are missing?
886	I don't think we can use the inbounds info here, unless we prove that the program is undefined if GEP is poison. Consider something like below. If `%c` is always false, the GEP index could be out-of-bounds (and the GEP poison). Adding a runtime check based on the SCEV expression may introduce a branch on poison unconditionally.

define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds, i1 %c) {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %latch ]
  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1.not = icmp eq i32 %0, 0
  %spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
  %.sink = load float, float* %.sink.in, align 4
  %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
  br i1 %c, label %then, label %latch

then:
  store float %.sink, float* %1, align 4
  br label %latch

latch:
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}
893	It might be a bit simpler to just add a duplicate to BaseScevs/OffsetScevs and have one common code path to compute the SCEVs to add

In D108699#3441724, @fhahn wrote:

> Thanks for the update!
>
> I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.
>
> I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

Sure, that sounds like a good plan. I'll try and review your patch this week.

In D108699#3445309, @huntergr wrote:

> In D108699#3441724, @fhahn wrote:
>
>> Thanks for the update!
>>
>> I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.
>>
>> I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.
>
> Sure, that sounds like a good plan. I'll try and review your patch this week.

Sounds great, thanks!

fhahn mentioned this in rG5890b3010599: [LAA] Initial support for runtime checks with pointer selects. May 12 2022, 11:34 AM

Rebased on top of Florian's patch from https://reviews.llvm.org/D114487

That patch has been reverted for now so builds will fail, but this should be pretty close to what we want once the base feature is back in.

I've cut the walker function down to only handle GEPs and selects for now; we can add more cases (and more tests) later.
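A minimal sketch of a depth-limited walker in that spirit, handling only GEPs and selects. It uses a hypothetical toy IR rather than LLVM's `Value`/`SCEV` classes, assumes every GEP index is the loop induction variable, and models an address as `Base + Step * i`. It returns both forks for a single two-way select and gives up (empty result) when the shape is not handled or the depth limit is exceeded, similar in purpose to the recursion limit added to this patch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy IR node (not LLVM's classes).
struct Node {
  enum Kind { Base, Select, GEP };
  Kind K = Base;
  std::string Name;          // Base: name of the pointer argument
  std::shared_ptr<Node> Op0; // Select: true value; GEP: pointer operand
  std::shared_ptr<Node> Op1; // Select: false value
  long Scale = 0;            // GEP: bytes added per iteration of i
};

std::shared_ptr<Node> makeBase(const std::string &Name) {
  auto N = std::make_shared<Node>();
  N->K = Node::Base;
  N->Name = Name;
  return N;
}

std::shared_ptr<Node> makeSelect(std::shared_ptr<Node> T, std::shared_ptr<Node> F) {
  auto N = std::make_shared<Node>();
  N->K = Node::Select;
  N->Op0 = T;
  N->Op1 = F;
  return N;
}

std::shared_ptr<Node> makeGEP(std::shared_ptr<Node> Ptr, long Scale) {
  auto N = std::make_shared<Node>();
  N->K = Node::GEP;
  N->Op0 = Ptr;
  N->Scale = Scale;
  return N;
}

// Affine address Base + Step * i.
struct Affine {
  std::string Base;
  long Step;
  bool operator==(const Affine &O) const {
    return Base == O.Base && Step == O.Step;
  }
};

// One form for an unforked pointer, two for a forked one, and an empty
// vector if the walk fails or goes deeper than MaxDepth.
std::vector<Affine> findForks(const Node &N, unsigned Depth, unsigned MaxDepth) {
  if (Depth > MaxDepth)
    return {}; // recursion limit reached: give up
  switch (N.K) {
  case Node::Base:
    return {Affine{N.Name, 0}};
  case Node::GEP: {
    std::vector<Affine> Ptrs = findForks(*N.Op0, Depth + 1, MaxDepth);
    for (Affine &A : Ptrs)
      A.Step += N.Scale; // index i contributes Scale bytes per iteration
    return Ptrs;
  }
  case Node::Select: {
    std::vector<Affine> T = findForks(*N.Op0, Depth + 1, MaxDepth);
    std::vector<Affine> F = findForks(*N.Op1, Depth + 1, MaxDepth);
    if (T.size() != 1 || F.size() != 1)
      return {}; // only a single two-way fork is modelled
    return {T[0], F[0]};
  }
  }
  return {};
}
```

For the summary's IR (a select of `%Base2`/`%Base1`, then a GEP over `float`), the walk yields `Base2 + 4*i` and `Base1 + 4*i`; with `MaxDepth = 1` the base pointers sit at depth 2, so the walk gives up.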

I've added tests to exercise the checks that exclude a given select or GEP from processing, as well as one for the recursion limit.

I'll precommit the new tests before the next update on this patch.

Harbormaster completed remote builds in B170454: Diff 437820. Jun 17 2022, 2:16 AM

In D108699#3591443, @huntergr wrote:

> Rebased on top of Florian's patch from https://reviews.llvm.org/D114487
>
> That patch has been reverted for now so builds will fail, but this should be pretty close to what we want once the base feature is back in.

Thanks, the patch has been relanded a while ago.

> I've cut the walker function down to only handle GEPs and selects for now; we can add more cases (and more tests) later.
>
> I've added tests to exercise the checks that exclude a given select or GEP from processing, as well as one for the recursion limit.
>
> I'll precommit the new tests before the next update on this patch.

Sounds good! I skimmed the current version and it looks like most comments should be addressed in the latest update. It might be good to mark them as done. One thing I couldn't spot in the latest version is a test for the potential inbounds issue mentioned inline.

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
1 ↗	(On Diff #437820)	It would probably be good to convert the IR to use opaque pointers in a pre-commit, so `-opaque-pointers` is not needed.

Rebased on top of @fhahn 's fixed patch, including changes to detect possible undef/poison.

New tests were precommitted in rGa19cf47da095.

huntergr marked 5 inline comments as done. Jul 14 2022, 1:14 AM

huntergr added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
886	I removed the code to add no-wrap flags to the SCEV based on 'inbounds', then added your test.

Harbormaster completed remote builds in B175325: Diff 444549. Jul 14 2022, 2:04 AM

LGTM, thanks!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
935	nit: unnecessary move?
llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
158 ↗	(On Diff #444549)	nit: it would probably be good to also have a few tests that access different sizes, including odd ones like `i23` or something like that, to ensure the right size expressions are used.
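One reason odd widths make good tests: for a type like `i23` the store size and the alloc size differ, so a range computation that picks the wrong one gives a different answer. The sketch below is plain arithmetic with hypothetical helper names, not LAA's code, and it assumes `i23` gets a 4-byte ABI alignment (as it would when it inherits `i32`'s alignment under the data layout in use).

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to store an iN value: N bits rounded up to whole bytes.
constexpr std::uint64_t storeSize(std::uint64_t Bits) { return (Bits + 7) / 8; }

// Store size rounded up to a multiple of the ABI alignment; this is the
// distance between consecutive array elements.
constexpr std::uint64_t allocSize(std::uint64_t Bits, std::uint64_t Align) {
  return (storeSize(Bits) + Align - 1) / Align * Align;
}

// One-past-the-end byte offset for TripCount (>= 1) consecutive accesses
// starting at Start: elements are allocSize apart, and the last access
// touches only storeSize bytes.
constexpr std::uint64_t accessEnd(std::uint64_t Start, std::uint64_t TripCount,
                                  std::uint64_t Bits, std::uint64_t Align) {
  return Start + (TripCount - 1) * allocSize(Bits, Align) + storeSize(Bits);
}

static_assert(storeSize(23) == 3, "i23 needs 3 bytes of storage");
static_assert(allocSize(23, 4) == 4, "but occupies 4 bytes in an array");
```

Using the store size as the stride would give 99 * 3 + 3 = 300 instead of 399 for 100 accesses starting at offset 0, which is the kind of mismatch such a test would expose.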

This revision was landed with ongoing or failed builds. Jul 18 2022, 4:08 AM

Closed by commit rGdb8fcb2c2537: [LAA] Add recursive IR walker for forked pointers (authored by huntergr).

This revision was automatically updated to reflect the committed changes.

huntergr added a commit: rGdb8fcb2c2537: [LAA] Add recursive IR walker for forked pointers.

This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:

/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::SCEV*, bool> > findForkedPointer(llvm::PredicatedScalarEvolution&, const ValueToValueMap&, llvm::Value*, const llvm::Loop*)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: could not convert ‘Scevs’ from ‘SmallVector<[...],2>’ to ‘SmallVector<[...],3>’
  916 |     return Scevs;
      |            ^~~~~
      |            |
      |            SmallVector<[...],2>

In D108699#3659633, @ro wrote:

> This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:
>
> /vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::SCEV*, bool> > findForkedPointer(llvm::PredicatedScalarEvolution&, const ValueToValueMap&, llvm::Value*, const llvm::Loop*)’:
> /vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: could not convert ‘Scevs’ from ‘SmallVector<[...],2>’ to ‘SmallVector<[...],3>’
>   916 |     return Scevs;
>       |            ^~~~~
>       |            |
>       |            SmallVector<[...],2>

Does rG4bd072c56b87 fix this for you?

In D108699#3659641, @huntergr wrote:

> In D108699#3659633, @ro wrote:
>
>> This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:
>>
>> [...]
>
> Does rG4bd072c56b87 fix this for you?

It does indeed, thanks. FWIW, this is with g++ 11.3.0.

https://github.com/llvm/llvm-project/issues/57368 is an open bug report describing a regression from this patch.

Herald added a subscriber: pcwang-thead. Aug 29 2022, 11:07 AM

In D108699#3756189, @efriedma wrote:

> https://github.com/llvm/llvm-project/issues/57368 is an open bug report describing a regression from this patch.

Thanks for the report and reproducer, I'll look into it.

Allen mentioned this in D158493: [LAA] Support forked pointer in the form of phi. Aug 22 2023, 1:07 AM

Allen mentioned this in D158965: [LAA] Analyze pointers forked by a phi. Aug 27 2023, 11:32 PM

GitHub <noreply@github.com> mentioned this in rG48caa0723c89: [LAA] Analyze pointers forked by a phi (#65834). Sep 18 2023, 6:16 PM

Revision Contents

Path (size)

llvm/include/llvm/Analysis/LoopAccessAnalysis.h (88 lines)
llvm/include/llvm/Analysis/ScalarEvolution.h (10 lines)
llvm/lib/Analysis/LoopAccessAnalysis.cpp (264 lines)
llvm/lib/Analysis/ScalarEvolution.cpp (160 lines)
llvm/lib/Transforms/Scalar/LoopDistribute.cpp (47 lines)
llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp (4 lines)
llvm/lib/Transforms/Utils/LoopVersioning.cpp (5 lines)
llvm/test/Transforms/LoopVectorize/forked-pointers.ll (310 lines)

Diff 388889

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

[9 unchanged lines not shown]
 // was originally developed for the Loop Vectorizer.
 //
 //===----------------------------------------------------------------------===//
 
 #ifndef LLVM_ANALYSIS_LOOPACCESSANALYSIS_H
 #define LLVM_ANALYSIS_LOOPACCESSANALYSIS_H
 
 #include "llvm/ADT/EquivalenceClasses.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/Analysis/LoopAnalysisManager.h"
+#include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/Pass.h"
 
 namespace llvm {
 
 class AAResults;
 class DataLayout;
▲ Show 20 Lines • Show All 296 Lines • ▼ Show 20 Lines	private:
void mergeInStatus(VectorizationSafetyStatus S);		void mergeInStatus(VectorizationSafetyStatus S);
};		};

class RuntimePointerChecking;		class RuntimePointerChecking;
/// A grouping of pointers. A single memcheck is required between		/// A grouping of pointers. A single memcheck is required between
/// two groups.		/// two groups.
struct RuntimeCheckingPtrGroup {		struct RuntimeCheckingPtrGroup {
/// Create a new pointer checking group containing a single		/// Create a new pointer checking group containing a single
/// pointer, with index \p Index in RtCheck.		/// pointer, with index \p Index in RtCheck and using \p Fork (0 or 1) if
RuntimeCheckingPtrGroup(unsigned Index, RuntimePointerChecking &RtCheck);		/// this represents a forked pointer.
		fhahnUnsubmitted Done Reply Inline Actions Could we use a bool for `Fork` if it only has 2 possible values? fhahn: Could we use a bool for `Fork` if it only has 2 possible values?
		RuntimeCheckingPtrGroup(unsigned Index, RuntimePointerChecking &RtCheck,
		unsigned Fork = 0);

RuntimeCheckingPtrGroup(unsigned Index, const SCEV Start, const SCEV End,		RuntimeCheckingPtrGroup(unsigned Index, const SCEV Start, const SCEV End,
unsigned AS)		unsigned AS, unsigned Fork = 0)
: High(End), Low(Start), AddressSpace(AS) {		: High(End), Low(Start), AddressSpace(AS) {
Members.push_back(Index);		Members.push_back(std::make_pair(Index, Fork));
}		}

/// Tries to add the pointer recorded in RtCheck at index		/// Tries to add the pointer recorded in RtCheck at index
/// \p Index to this pointer checking group. We can only add a pointer		/// \p Index to this pointer checking group. We can only add a pointer
/// to a checking group if we will still be able to get		/// to a checking group if we will still be able to get
/// the upper and lower bounds of the check. Returns true in case		/// the upper and lower bounds of the check. Returns true in case
/// of success, false otherwise.		/// of success, false otherwise.
bool addPointer(unsigned Index, RuntimePointerChecking &RtCheck);		/// For forked pointers this will only add one fork at a time, determined
		/// by \p Fork (0 or 1).
		bool addPointer(unsigned Index, RuntimePointerChecking &RtCheck,
		unsigned Fork = 0);
bool addPointer(unsigned Index, const SCEV *Start, const SCEV *End,		bool addPointer(unsigned Index, const SCEV *Start, const SCEV *End,
unsigned AS, ScalarEvolution &SE);		unsigned AS, ScalarEvolution &SE, unsigned Fork = 0);

/// The SCEV expression which represents the upper bound of all the		/// The SCEV expression which represents the upper bound of all the
/// pointers in this group.		/// pointers in this group.
const SCEV *High;		const SCEV *High;
/// The SCEV expression which represents the lower bound of all the		/// The SCEV expression which represents the lower bound of all the
/// pointers in this group.		/// pointers in this group.
const SCEV *Low;		const SCEV *Low;
/// Indices of all the pointers that constitute this grouping.		/// Indices of all the pointers that constitute this grouping, including
SmallVector<unsigned, 2> Members;		/// which fork if the pointer is forked.
		SmallVector<std::pair<unsigned, unsigned>, 2> Members;
/// Address space of the involved pointers.		/// Address space of the involved pointers.
		david-arm [Not Done]: If we're only ever going to support two forks, couldn't we make this simpler by just having `unsigned Members[2];` instead, or is this a real pain to implement? Or is the idea that we may want to extend the fork to allow more than 2 in future?
		huntergr (Author) [Not Done]: So this is for a RuntimeCheckingPtrGroup, which may have more than two members already; I've only changed it from a SmallVector to a SmallSetVector here in order to avoid duplicate members, since both sides of a forked pointer can be added to the same group.
unsigned AddressSpace;		unsigned AddressSpace;
};		};
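huntergr's reply above describes switching Members to a SmallSetVector so that a pointer whose two forks land in the same group is recorded only once. The dedup-with-stable-order behaviour can be sketched in standard C++ as follows; ToyMemberSet is a hypothetical stand-in (not llvm::SmallSetVector), and it keys on the pointer index alone, as that reply assumes.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an insertion-ordered set of pointer indices,
// the role llvm::SmallSetVector<unsigned, N> would play for Members.
struct ToyMemberSet {
  std::vector<unsigned> Order; // iteration order = order of first insertion
  std::set<unsigned> Seen;     // fast duplicate detection

  // Returns true if Index was not already a member.
  bool insert(unsigned Index) {
    if (!Seen.insert(Index).second)
      return false; // e.g. the second fork of a pointer already in the group
    Order.push_back(Index);
    return true;
  }
};
```

The revision shown in this diff instead stores (Index, Fork) pairs in a SmallVector, so both forks of one pointer appear as two distinct members.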

/// A memcheck which is made up of a pair of grouped pointers.		/// A memcheck which is made up of a pair of grouped pointers.
typedef std::pair<const RuntimeCheckingPtrGroup *,		typedef std::pair<const RuntimeCheckingPtrGroup *,
const RuntimeCheckingPtrGroup *>		const RuntimeCheckingPtrGroup *>
RuntimePointerCheck;		RuntimePointerCheck;

/// Holds information about the memory runtime legality checks to verify		/// Holds information about the memory runtime legality checks to verify
/// that a group of pointers do not overlap.		/// that a group of pointers do not overlap.
class RuntimePointerChecking {		class RuntimePointerChecking {
friend struct RuntimeCheckingPtrGroup;		friend struct RuntimeCheckingPtrGroup;

public:		public:
struct PointerInfo {		struct PointerInfo {
/// Holds the pointer value that we need to check.		/// Holds the pointer value that we need to check.
TrackingVH<Value> PointerValue;		TrackingVH<Value> PointerValue;
/// Holds the smallest byte address accessed by the pointer throughout all		/// Holds the smallest byte address(es) accessed by the pointer throughout
/// iterations of the loop.		/// all iterations of the loop.
const SCEV *Start;		const SCEV *Starts[2];
		david-arm [Done]: Again, does this need to be a `SmallVector`, and can it just be `const SCEV *Starts[2];`?
		huntergr (Author) [Not Done]: I'll look into it, but it'll make other parts messier since I can't just iterate over the members of the SmallVectors and would need to perform null checks for all those places for the second element. One solution I thought of (but didn't implement, since I wanted some community feedback on the overall work first) was to create a new SCEVForkedExpr so that we wouldn't need to store two SCEVs here and could just note which fork(s) was represented. It'll take a while to implement that, but might be cleaner. Thoughts?
/// Holds the largest byte address accessed by the pointer throughout all		/// Holds the largest byte address(es) accessed by the pointer throughout
/// iterations of the loop, plus 1.		/// all iterations of the loop, plus 1.
const SCEV *End;		const SCEV *Ends[2];
/// Holds the information if this pointer is used for writing to memory.		/// Holds the information if this pointer is used for writing to memory.
bool IsWritePtr;		bool IsWritePtr;
/// Holds the id of the set of pointers that could be dependent because of a		/// Holds the id of the set of pointers that could be dependent because of a
/// shared underlying object.		/// shared underlying object.
unsigned DependencySetId;		unsigned DependencySetId;
/// Holds the id of the disjoint alias set to which this pointer belongs.		/// Holds the id of the disjoint alias set to which this pointer belongs.
unsigned AliasSetId;		unsigned AliasSetId;
/// SCEV for the access.		/// SCEV(s) for the access.
const SCEV *Expr;		const SCEV *Exprs[2];
		/// Determines whether this represents a forked pointer, which will have
PointerInfo(Value *PointerValue, const SCEV *Start, const SCEV *End,		/// two SCEV expressions to consider for runtime checking instead of one.
bool IsWritePtr, unsigned DependencySetId, unsigned AliasSetId,		bool HasFork = false;
const SCEV *Expr)
: PointerValue(PointerValue), Start(Start), End(End),		PointerInfo(Value *PointerValue, SmallVectorImpl<const SCEV *> &ScStarts,
IsWritePtr(IsWritePtr), DependencySetId(DependencySetId),		SmallVectorImpl<const SCEV *> &ScEnds, bool IsWritePtr,
AliasSetId(AliasSetId), Expr(Expr) {}		unsigned DependencySetId, unsigned AliasSetId,
		SmallVectorImpl<const SCEV *> &ScExprs)
		: PointerValue(PointerValue), IsWritePtr(IsWritePtr),
		DependencySetId(DependencySetId), AliasSetId(AliasSetId) {
		assert(ScExprs.size() <= 2 && "Too many SCEV expressions for pointer");
		david-arm [Done]: Hi @huntergr, thanks for making the changes to `Exprs`. I just wonder if we should have an assert here that `ScExprs.size() <= 2`?
		for (size_t i = 0; i < ScExprs.size(); ++i) {
		Exprs[i] = ScExprs[i];
		Starts[i] = ScStarts[i];
		Ends[i] = ScEnds[i];
		}

		// If there are two SCEV expressions associated with this pointer,
		// then we have a fork.
		HasFork = (ScExprs.size() == 2);
		}
};		};
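david-arm asks above whether plain two-element arrays would do, and huntergr worries about null checks on the second slot. A hypothetical ToyPointerInfo shows one way to combine both: fixed-size std::array storage plus a count of valid entries, so callers loop up to that count instead of testing for null. Expr is an opaque stand-in for the target of `const SCEV *`; this is a sketch, not the patch's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opaque stand-in for what `const SCEV *` points to.
struct Expr { int Id; };

// Sketch of a PointerInfo-like record holding one or two (Start, End, Expr)
// triples in fixed-size arrays; unused slots stay nullptr.
struct ToyPointerInfo {
  std::array<const Expr *, 2> Starts{};
  std::array<const Expr *, 2> Ends{};
  std::array<const Expr *, 2> Exprs{};
  bool HasFork = false;

  ToyPointerInfo(const std::vector<const Expr *> &ScStarts,
                 const std::vector<const Expr *> &ScEnds,
                 const std::vector<const Expr *> &ScExprs) {
    assert(!ScExprs.empty() && ScExprs.size() <= 2 &&
           "expected one or two expressions per pointer");
    assert(ScStarts.size() == ScExprs.size() &&
           ScEnds.size() == ScExprs.size() && "mismatched bound counts");
    for (std::size_t I = 0; I < ScExprs.size(); ++I) {
      Starts[I] = ScStarts[I];
      Ends[I] = ScEnds[I];
      Exprs[I] = ScExprs[I];
    }
    HasFork = ScExprs.size() == 2; // two expressions means a forked pointer
  }

  // Number of valid slots; iterate up to this instead of null-checking slot 1.
  unsigned numExprs() const { return HasFork ? 2 : 1; }
};
```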

RuntimePointerChecking(ScalarEvolution *SE) : Need(false), SE(SE) {}		RuntimePointerChecking(ScalarEvolution *SE, bool AllowForkedPtrs)
		: Need(false), AllowForkedPtrs(AllowForkedPtrs), SE(SE) {}

/// Reset the state of the pointer runtime information.		/// Reset the state of the pointer runtime information.
void reset() {		void reset() {
Need = false;		Need = false;
Pointers.clear();		Pointers.clear();
		ForkedPtrs.clear();
Checks.clear();		Checks.clear();
}		}

/// Insert a pointer and calculate the start and end SCEVs.		/// Insert a pointer and calculate the start and end SCEVs.
/// We need \p PSE in order to compute the SCEV expression of the pointer		/// We need \p PSE in order to compute the SCEV expression of the pointer
/// according to the assumptions that we've made during the analysis.		/// according to the assumptions that we've made during the analysis.
/// The method might also version the pointer stride according to \p Strides,		/// The method might also version the pointer stride according to \p Strides,
/// and add new predicates to \p PSE.		/// and add new predicates to \p PSE.
[26 unchanged lines not shown]	public:
/// Print the list of run-time memory checks necessary.		/// Print the list of run-time memory checks necessary.
void print(raw_ostream &OS, unsigned Depth = 0) const;		void print(raw_ostream &OS, unsigned Depth = 0) const;

/// Print \p Checks.		/// Print \p Checks.
void printChecks(raw_ostream &OS,		void printChecks(raw_ostream &OS,
const SmallVectorImpl<RuntimePointerCheck> &Checks,		const SmallVectorImpl<RuntimePointerCheck> &Checks,
unsigned Depth = 0) const;		unsigned Depth = 0) const;

		/// Returns true if the pointer can decompose into two separate pointers
		/// through a select.
		bool isForkedPtr(const Value *Ptr) const {
		return ForkedPtrs.count(Ptr) != 0;
		}

/// This flag indicates if we need to add the runtime check.		/// This flag indicates if we need to add the runtime check.
bool Need;		bool Need;

		/// This flag indicates whether we allow pointers that diverge over a select.
		bool AllowForkedPtrs;

/// Information about the pointers that may require checking.		/// Information about the pointers that may require checking.
SmallVector<PointerInfo, 2> Pointers;		SmallVector<PointerInfo, 2> Pointers;

		/// Mapping between a pointer Value used by memory operations and a pair
		/// of SCEV expressions. This is used in cases where a 'select' instruction
		/// is used to form part of an address, and we could form a single
		/// SCEVAddRecExpr with either side but not with both, so we just calculate
		/// and store the two expressions to determine whether checks are required.
		DenseMap<const Value *, ForkedPointer> ForkedPtrs;
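The way RuntimePointerChecking::insert() consults ForkedPtrs (a forked pointer contributes both of its expressions, any other pointer only its own) can be modelled as below. This is an illustrative sketch, not LLVM code: pointers are keyed by name, SCEV expressions are plain strings, and std::unordered_map stands in for DenseMap. ToyForkTable and exprsFor are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins: pointers are identified by name and SCEV
// expressions are modelled as strings.
using ToyExpr = std::string;
using ToyForkedPointer = std::pair<ToyExpr, ToyExpr>;

struct ToyForkTable {
  std::unordered_map<std::string, ToyForkedPointer> ForkedPtrs;

  bool isForkedPtr(const std::string &Ptr) const {
    return ForkedPtrs.count(Ptr) != 0;
  }

  // Mirrors the start of insert(): a forked pointer yields both of its
  // expressions, any other pointer just the expression computed for it.
  std::vector<ToyExpr> exprsFor(const std::string &Ptr,
                                const ToyExpr &Plain) const {
    auto It = ForkedPtrs.find(Ptr);
    if (It == ForkedPtrs.end())
      return {Plain};
    return {It->second.first, It->second.second};
  }
};
```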

/// Holds a partitioning of pointers into "check groups".		/// Holds a partitioning of pointers into "check groups".
SmallVector<RuntimeCheckingPtrGroup, 2> CheckingGroups;		SmallVector<RuntimeCheckingPtrGroup, 2> CheckingGroups;

/// Check if pointers are in the same partition		/// Check if pointers are in the same partition
///		///
/// \p PtrToPartition contains the partition number for pointers (-1 if the		/// \p PtrToPartition contains the partition number for pointers (-1 if the
/// pointer belongs to multiple partitions).		/// pointer belongs to multiple partitions).
static bool		static bool
[319 unchanged lines not shown]

llvm/include/llvm/Analysis/ScalarEvolution.h

[448 unchanged lines not shown]	public:
unsigned getComplexity() const override { return Preds.size(); }		unsigned getComplexity() const override { return Preds.size(); }

/// Methods for supporting type inquiry through isa, cast, and dyn_cast:		/// Methods for supporting type inquiry through isa, cast, and dyn_cast:
static bool classof(const SCEVPredicate *P) {		static bool classof(const SCEVPredicate *P) {
return P->getKind() == P_Union;		return P->getKind() == P_Union;
}		}
};		};

		/// Type for a possible forked pointer, where a single pointer could be
		/// one of two different pointers determined conditionally via a select
		/// instruction.
		using ForkedPointer = std::pair<const SCEV *, const SCEV *>;
		david-arm [Done]: Hi Graham, it feels slightly awkward having to essentially redefine a ForkedPointer as a pair in the same way as `ForkedPtrs` in llvm/include/llvm/Analysis/LoopAccessAnalysis.h. Maybe we could define something like `using ForkedPointer = std::pair<const SCEV *, const SCEV *>;` in llvm/include/llvm/Analysis/ScalarEvolution.h that can be used in LoopAccessAnalysis too?

/// The main scalar evolution driver. Because client code (intentionally)		/// The main scalar evolution driver. Because client code (intentionally)
/// can't do much with the SCEV objects directly, they must ask this class		/// can't do much with the SCEV objects directly, they must ask this class
/// for services.		/// for services.
class ScalarEvolution {		class ScalarEvolution {
friend class ScalarEvolutionsTest;		friend class ScalarEvolutionsTest;

public:		public:
/// An enum describing the relationship between a SCEV and a loop.		/// An enum describing the relationship between a SCEV and a loop.
[288 unchanged lines not shown]	public:
///		///
/// In the case that a relevant loop exit value cannot be computed, the		/// In the case that a relevant loop exit value cannot be computed, the
/// original value V is returned.		/// original value V is returned.
const SCEV *getSCEVAtScope(const SCEV *S, const Loop *L);		const SCEV *getSCEVAtScope(const SCEV *S, const Loop *L);

/// This is a convenience function which does getSCEVAtScope(getSCEV(V), L).		/// This is a convenience function which does getSCEVAtScope(getSCEV(V), L).
const SCEV *getSCEVAtScope(Value *V, const Loop *L);		const SCEV *getSCEVAtScope(Value *V, const Loop *L);

		/// This function determines whether the given pointer value V is a forked
		/// pointer; that is, could it have two possible values distinguished by
		/// a select instruction.
		Optional<ForkedPointer> findForkedPointer(Value *V, const Loop *L);
		fhahn [Not Done]: This is only used by LAA. Is there a reason this needs to be part of `ScalarEvolution`?
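The pattern findForkedPointer() is meant to recognise, in the simplest case from the summary (a GEP whose pointer operand is a select of two bases), can be illustrated with a toy IR. Everything here (Node, mkBase, mkSelect, mkGep, findFork) is hypothetical and only looks for a select directly at the address or directly under one GEP; the real function works on IR Values and SCEVs.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical toy IR for address computations.
struct Node {
  enum Kind { Base, Select, GEP };
  Kind K;
  std::string Name;          // base name, or index name for a GEP
  std::shared_ptr<Node> Op0; // GEP: pointer operand; Select: true arm
  std::shared_ptr<Node> Op1; // Select: false arm
};
using NodeP = std::shared_ptr<Node>;

NodeP mkBase(std::string N) {
  return std::make_shared<Node>(Node{Node::Base, std::move(N), nullptr, nullptr});
}
NodeP mkSelect(NodeP T, NodeP F) {
  return std::make_shared<Node>(Node{Node::Select, "", std::move(T), std::move(F)});
}
NodeP mkGep(NodeP P, std::string Idx) {
  return std::make_shared<Node>(Node{Node::GEP, std::move(Idx), std::move(P), nullptr});
}

// Prints a select-free address as "base+idx+...".
std::string render(const NodeP &N) {
  if (N->K == Node::GEP)
    return render(N->Op0) + "+" + N->Name;
  return N->Name;
}

// If the address is a select, or a GEP whose pointer operand is a select,
// return both candidate addresses (true arm first); otherwise nullopt.
std::optional<std::pair<std::string, std::string>> findFork(const NodeP &N) {
  if (N->K == Node::Select)
    return std::make_pair(render(N->Op0), render(N->Op1));
  if (N->K == Node::GEP && N->Op0->K == Node::Select)
    return std::make_pair(render(N->Op0->Op0) + "+" + N->Name,
                          render(N->Op0->Op1) + "+" + N->Name);
  return std::nullopt;
}
```

For the summary's `%.sink.in = getelementptr ... %spec.select, %indvars.iv`, this yields one address per arm of the select, which is what lets LAA compute a separate range for each.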

/// Test whether entry to the loop is protected by a conditional between LHS		/// Test whether entry to the loop is protected by a conditional between LHS
/// and RHS. This is used to help avoid max expressions in loop trip		/// and RHS. This is used to help avoid max expressions in loop trip
/// counts, and to eliminate casts.		/// counts, and to eliminate casts.
bool isLoopEntryGuardedByCond(const Loop *L, ICmpInst::Predicate Pred,		bool isLoopEntryGuardedByCond(const Loop *L, ICmpInst::Predicate Pred,
const SCEV *LHS, const SCEV *RHS);		const SCEV *LHS, const SCEV *RHS);

/// Test whether entry to the basic block is protected by a conditional		/// Test whether entry to the basic block is protected by a conditional
/// between LHS and RHS.		/// between LHS and RHS.
[1,468 unchanged lines not shown]

llvm/lib/Analysis/LoopAccessAnalysis.cpp

[123 unchanged lines not shown]

/// Enable store-to-load forwarding conflict detection. This option can		/// Enable store-to-load forwarding conflict detection. This option can
/// be disabled for correctness testing.		/// be disabled for correctness testing.
static cl::opt<bool> EnableForwardingConflictDetection(		static cl::opt<bool> EnableForwardingConflictDetection(
"store-to-load-forwarding-conflict-detection", cl::Hidden,		"store-to-load-forwarding-conflict-detection", cl::Hidden,
cl::desc("Enable conflict detection in loop-access analysis"),		cl::desc("Enable conflict detection in loop-access analysis"),
cl::init(true));		cl::init(true));

		/// Enables the detection of forked pointers when analyzing loops; these
		/// pointers could have two possible values at runtime based on a conditional
		fhahn [Not Done]: It would be good to add a test for the option, e.g. a case that requires 2 or 3 recursions and set test with `max-forked-scev-depth=2/3`
		/// select instruction, and we can analyze both possibilities to determine
		/// bounds.
		static cl::opt<bool> EnableForkedPointers(
		"enable-forked-pointer-detection", cl::Hidden,
		cl::desc("Enable detection of pointers forked by a select instruction"),
		cl::init(false));

bool VectorizerParams::isInterleaveForced() {		bool VectorizerParams::isInterleaveForced() {
return ::VectorizationInterleave.getNumOccurrences() > 0;		return ::VectorizationInterleave.getNumOccurrences() > 0;
}		}

Value *llvm::stripIntegerCast(Value *V) {		Value *llvm::stripIntegerCast(Value *V) {
if (auto *CI = dyn_cast<CastInst>(V))		if (auto *CI = dyn_cast<CastInst>(V))
if (CI->getOperand(0)->getType()->isIntegerTy())		if (CI->getOperand(0)->getType()->isIntegerTy())
return CI->getOperand(0);		return CI->getOperand(0);
[23 unchanged lines not shown]	const SCEV *llvm::replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE,
auto *Expr = PSE.getSCEV(Ptr);		auto *Expr = PSE.getSCEV(Ptr);

LLVM_DEBUG(dbgs() << "LAA: Replacing SCEV: " << *OrigSCEV		LLVM_DEBUG(dbgs() << "LAA: Replacing SCEV: " << *OrigSCEV
<< " by: " << *Expr << "\n");		<< " by: " << *Expr << "\n");
return Expr;		return Expr;
}		}

RuntimeCheckingPtrGroup::RuntimeCheckingPtrGroup(		RuntimeCheckingPtrGroup::RuntimeCheckingPtrGroup(
unsigned Index, RuntimePointerChecking &RtCheck)		unsigned Index, RuntimePointerChecking &RtCheck, unsigned Fork)
: High(RtCheck.Pointers[Index].End), Low(RtCheck.Pointers[Index].Start),		: High(RtCheck.Pointers[Index].Ends[Fork]),
		Low(RtCheck.Pointers[Index].Starts[Fork]),
AddressSpace(RtCheck.Pointers[Index]		AddressSpace(RtCheck.Pointers[Index]
.PointerValue->getType()		.PointerValue->getType()
->getPointerAddressSpace()) {		->getPointerAddressSpace()) {
Members.push_back(Index);		assert(Fork <= 1 && "Fork out of range for pointer checking");
		Members.push_back(std::make_pair(Index, Fork));
		david-arm [Done]: nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway?
}		}

/// Calculate Start and End points of memory access.		/// Calculate Start and End points of memory access.
/// Let's assume A is the first access and B is a memory access on N-th loop		/// Let's assume A is the first access and B is a memory access on N-th loop
/// iteration. Then B is calculated as:		/// iteration. Then B is calculated as:
/// B = A + Step*N .		/// B = A + Step*N .
/// Step value may be positive or negative.		/// Step value may be positive or negative.
/// N is a calculated back-edge taken count:		/// N is a calculated back-edge taken count:
[10 unchanged lines not shown]	void RuntimePointerChecking::insert(Loop *Lp, Value *Ptr, bool WritePtr,
PredicatedScalarEvolution &PSE) {		PredicatedScalarEvolution &PSE) {
// Get the stride replaced scev.		// Get the stride replaced scev.
const SCEV *Sc = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);		const SCEV *Sc = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);
ScalarEvolution *SE = PSE.getSE();		ScalarEvolution *SE = PSE.getSE();

const SCEV *ScStart;		const SCEV *ScStart;
const SCEV *ScEnd;		const SCEV *ScEnd;

		// See if this was a ForkedPtr. If so, we want to add checks for both sides.
		SmallVector<const SCEV *, 2> Scevs;
		SmallVector<const SCEV *, 2> Starts;
		SmallVector<const SCEV *, 2> Ends;
		if (ForkedPtrs.count(Ptr)) {
		Scevs.push_back(ForkedPtrs[Ptr].first);
		Scevs.push_back(ForkedPtrs[Ptr].second);
		} else
		Scevs.push_back(Sc);

		for (const SCEV *Sc : Scevs) {
if (SE->isLoopInvariant(Sc, Lp)) {		if (SE->isLoopInvariant(Sc, Lp)) {
ScStart = ScEnd = Sc;		ScStart = ScEnd = Sc;
} else {		} else {
const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Sc);
assert(AR && "Invalid addrec expression");		assert(AR && "Invalid addrec expression");
const SCEV *Ex = PSE.getBackedgeTakenCount();		const SCEV *Ex = PSE.getBackedgeTakenCount();

ScStart = AR->getStart();		ScStart = AR->getStart();
ScEnd = AR->evaluateAtIteration(Ex, *SE);		ScEnd = AR->evaluateAtIteration(Ex, *SE);
const SCEV *Step = AR->getStepRecurrence(*SE);		const SCEV *Step = AR->getStepRecurrence(*SE);

// For expressions with negative step, the upper bound is ScStart and the		// For expressions with negative step, the upper bound is ScStart and the
// lower bound is ScEnd.		// lower bound is ScEnd.
if (const auto *CStep = dyn_cast<SCEVConstant>(Step)) {		if (const auto *CStep = dyn_cast<SCEVConstant>(Step)) {
if (CStep->getValue()->isNegative())		if (CStep->getValue()->isNegative())
std::swap(ScStart, ScEnd);		std::swap(ScStart, ScEnd);
} else {		} else {
// Fallback case: the step is not constant, but we can still		// Fallback case: the step is not constant, but we can still
// get the upper and lower bounds of the interval by using min/max		// get the upper and lower bounds of the interval by using min/max
// expressions.		// expressions.
ScStart = SE->getUMinExpr(ScStart, ScEnd);		ScStart = SE->getUMinExpr(ScStart, ScEnd);
ScEnd = SE->getUMaxExpr(AR->getStart(), ScEnd);		ScEnd = SE->getUMaxExpr(AR->getStart(), ScEnd);
}		}
}		}
// Add the size of the pointed element to ScEnd.		// Add the size of the pointed element to ScEnd.
auto &DL = Lp->getHeader()->getModule()->getDataLayout();		auto &DL = Lp->getHeader()->getModule()->getDataLayout();
Type *IdxTy = DL.getIndexType(Ptr->getType());		Type *IdxTy = DL.getIndexType(Ptr->getType());
const SCEV *EltSizeSCEV =		const SCEV *EltSizeSCEV =
SE->getStoreSizeOfExpr(IdxTy, Ptr->getType()->getPointerElementType());		SE->getStoreSizeOfExpr(IdxTy, Ptr->getType()->getPointerElementType());
ScEnd = SE->getAddExpr(ScEnd, EltSizeSCEV);		ScEnd = SE->getAddExpr(ScEnd, EltSizeSCEV);
		Starts.push_back(ScStart);
Pointers.emplace_back(Ptr, ScStart, ScEnd, WritePtr, DepSetId, ASId, Sc);		Ends.push_back(ScEnd);
		}
		Pointers.emplace_back(Ptr, Starts, Ends, WritePtr, DepSetId, ASId, Scevs);
}		}
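With concrete numbers the computation in insert() is easy to check by hand. Take the loop in the summary: 100 iterations, so a backedge-taken count of 99, over 4-byte floats. For a forked pointer, insert() runs the same computation once per side of the select. The sketch below is hypothetical (accessBounds is not an LLVM function), uses integers instead of SCEVs, and assumes a constant step; for a non-constant step LAA falls back to umin/umax as shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Byte range [Low, High) touched by one access (or one fork of an access).
struct Bounds {
  int64_t Low;
  int64_t High;
};

// Sketch of the per-expression bound computation, assuming an affine access
// Start + Step * i for i in [0, BTC] with a constant Step.
Bounds accessBounds(int64_t Start, int64_t Step, int64_t BTC, int64_t EltSize) {
  int64_t First = Start;             // like AR->getStart()
  int64_t Last = Start + Step * BTC; // like AR->evaluateAtIteration(BTC)
  if (Step < 0)
    std::swap(First, Last);          // negative step: bounds are reversed
  return {First, Last + EltSize};    // End covers the whole last element
}
```

In the usage, the forked load gets two ranges, one per base; the base addresses 1000 and 2000 are made up for illustration.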

SmallVector<RuntimePointerCheck, 4>		SmallVector<RuntimePointerCheck, 4>
RuntimePointerChecking::generateChecks() const {		RuntimePointerChecking::generateChecks() const {
SmallVector<RuntimePointerCheck, 4> Checks;		SmallVector<RuntimePointerCheck, 4> Checks;

for (unsigned I = 0; I < CheckingGroups.size(); ++I) {		for (unsigned I = 0; I < CheckingGroups.size(); ++I) {
for (unsigned J = I + 1; J < CheckingGroups.size(); ++J) {		for (unsigned J = I + 1; J < CheckingGroups.size(); ++J) {
[13 unchanged lines not shown]	void RuntimePointerChecking::generateChecks(
groupChecks(DepCands, UseDependencies);		groupChecks(DepCands, UseDependencies);
Checks = generateChecks();		Checks = generateChecks();
}		}

bool RuntimePointerChecking::needsChecking(		bool RuntimePointerChecking::needsChecking(
const RuntimeCheckingPtrGroup &M, const RuntimeCheckingPtrGroup &N) const {		const RuntimeCheckingPtrGroup &M, const RuntimeCheckingPtrGroup &N) const {
for (unsigned I = 0, EI = M.Members.size(); EI != I; ++I)		for (unsigned I = 0, EI = M.Members.size(); EI != I; ++I)
for (unsigned J = 0, EJ = N.Members.size(); EJ != J; ++J)		for (unsigned J = 0, EJ = N.Members.size(); EJ != J; ++J)
if (needsChecking(M.Members[I], N.Members[J]))		if (needsChecking(M.Members[I].first, N.Members[J].first))
return true;		return true;
return false;		return false;
}		}

/// Compare \p I and \p J and return the minimum.		/// Compare \p I and \p J and return the minimum.
/// Return nullptr in case we couldn't find an answer.		/// Return nullptr in case we couldn't find an answer.
static const SCEV *getMinFromExprs(const SCEV *I, const SCEV *J,		static const SCEV *getMinFromExprs(const SCEV *I, const SCEV *J,
ScalarEvolution *SE) {		ScalarEvolution *SE) {
const SCEV *Diff = SE->getMinusSCEV(J, I);		const SCEV *Diff = SE->getMinusSCEV(J, I);
const SCEVConstant *C = dyn_cast<const SCEVConstant>(Diff);		const SCEVConstant *C = dyn_cast<const SCEVConstant>(Diff);

if (!C)		if (!C)
return nullptr;		return nullptr;
if (C->getValue()->isNegative())		if (C->getValue()->isNegative())
return J;		return J;
return I;		return I;
}		}
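getMinFromExprs() only produces an answer when J - I folds to a SCEVConstant. If a SCEV is modelled as a symbolic base plus a constant byte offset (an assumption made purely for illustration), the consequence becomes visible: the two forks in the summary, based on %Base1 and %Base2, are never comparable, so they cannot share one checking group. SymAddr and getMin are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of a SCEV of the form Base + Offset, where Base is
// symbolic (e.g. "%Base1") and Offset is a constant byte offset.
struct SymAddr {
  std::string Base;
  int64_t Offset;
};

// Mirrors getMinFromExprs(): J - I is only a constant when both share the
// same symbolic base; otherwise there is no answer (nullptr).
const SymAddr *getMin(const SymAddr *I, const SymAddr *J) {
  if (I->Base != J->Base)
    return nullptr;            // difference would not be a SCEVConstant
  int64_t Diff = J->Offset - I->Offset;
  return Diff < 0 ? J : I;     // ties return I, as in the original
}
```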

bool RuntimeCheckingPtrGroup::addPointer(unsigned Index,		bool RuntimeCheckingPtrGroup::addPointer(unsigned Index,
RuntimePointerChecking &RtCheck) {		RuntimePointerChecking &RtCheck,
		unsigned Fork) {
		assert(Fork <= 1 && "Fork out of range for pointer checking");
		david-arm [Done]: nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway?
return addPointer(		return addPointer(
Index, RtCheck.Pointers[Index].Start, RtCheck.Pointers[Index].End,		Index, RtCheck.Pointers[Index].Starts[Fork],
		RtCheck.Pointers[Index].Ends[Fork],
RtCheck.Pointers[Index].PointerValue->getType()->getPointerAddressSpace(),		RtCheck.Pointers[Index].PointerValue->getType()->getPointerAddressSpace(),
*RtCheck.SE);		*RtCheck.SE, Fork);
}		}

bool RuntimeCheckingPtrGroup::addPointer(unsigned Index, const SCEV *Start,		bool RuntimeCheckingPtrGroup::addPointer(unsigned Index, const SCEV *Start,
const SCEV *End, unsigned AS,		const SCEV *End, unsigned AS,
ScalarEvolution &SE) {		ScalarEvolution &SE, unsigned Fork) {
assert(AddressSpace == AS &&		assert(AddressSpace == AS &&
"all pointers in a checking group must be in the same address space");		"all pointers in a checking group must be in the same address space");

// Compare the starts and ends with the known minimum and maximum		// Compare the starts and ends with the known minimum and maximum
// of this set. We need to know how we compare against the min/max		// of this set. We need to know how we compare against the min/max
// of the set in order to be able to emit memchecks.		// of the set in order to be able to emit memchecks.
const SCEV *Min0 = getMinFromExprs(Start, Low, &SE);		const SCEV *Min0 = getMinFromExprs(Start, Low, &SE);
if (!Min0)		if (!Min0)
return false;		return false;

const SCEV *Min1 = getMinFromExprs(End, High, &SE);		const SCEV *Min1 = getMinFromExprs(End, High, &SE);
if (!Min1)		if (!Min1)
return false;		return false;

// Update the low bound expression if we've found a new min value.		// Update the low bound expression if we've found a new min value.
if (Min0 == Start)		if (Min0 == Start)
Low = Start;		Low = Start;

// Update the high bound expression if we've found a new max value.		// Update the high bound expression if we've found a new max value.
if (Min1 != End)		if (Min1 != End)
High = End;		High = End;

Members.push_back(Index);		Members.push_back(std::make_pair(Index, Fork));
return true;		return true;
}		}
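addPointer() widens [Low, High) only when both new bounds are comparable with the current ones, and with this patch it also records which fork was added. The sketch reuses a base-plus-offset model (SymAddr and ToyGroup are hypothetical, not LLVM types). It shows two forks of one pointer with the same base but different offsets merging into a single group, while a pointer with a different base is rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a SCEV of the form Base + Offset.
struct SymAddr { std::string Base; int64_t Offset; };

// Smaller of I and J, or nullptr if not comparable (as getMinFromExprs()).
static const SymAddr *getMin(const SymAddr *I, const SymAddr *J) {
  if (I->Base != J->Base)
    return nullptr;
  return (J->Offset - I->Offset) < 0 ? J : I;
}

// Sketch of RuntimeCheckingPtrGroup::addPointer() with (Index, Fork) members.
struct ToyGroup {
  const SymAddr *Low, *High;
  std::vector<std::pair<unsigned, unsigned>> Members;

  ToyGroup(unsigned Index, const SymAddr *Start, const SymAddr *End,
           unsigned Fork = 0)
      : Low(Start), High(End) {
    Members.push_back({Index, Fork});
  }

  bool addPointer(unsigned Index, const SymAddr *Start, const SymAddr *End,
                  unsigned Fork = 0) {
    const SymAddr *Min0 = getMin(Start, Low);
    if (!Min0)
      return false;
    const SymAddr *Min1 = getMin(End, High);
    if (!Min1)
      return false;
    if (Min0 == Start)
      Low = Start;  // new lower bound
    if (Min1 != End)
      High = End;   // End is above the current upper bound
    Members.push_back({Index, Fork});
    return true;
  }
};
```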

void RuntimePointerChecking::groupChecks(		void RuntimePointerChecking::groupChecks(
MemoryDepChecker::DepCandidates &DepCands, bool UseDependencies) {		MemoryDepChecker::DepCandidates &DepCands, bool UseDependencies) {
// We build the groups from dependency candidates equivalence classes		// We build the groups from dependency candidates equivalence classes
// because:		// because:
// - We know that pointers in the same equivalence class share		// - We know that pointers in the same equivalence class share
[34 unchanged lines not shown]	void RuntimePointerChecking::groupChecks(
// is also false. In this case we will use the fallback path and create		// is also false. In this case we will use the fallback path and create
// separate checking groups for all pointers.		// separate checking groups for all pointers.

// If we don't have the dependency partitions, construct a new		// If we don't have the dependency partitions, construct a new
// checking pointer group for each pointer. This is also required		// checking pointer group for each pointer. This is also required
// for correctness, because in this case we can have checking between		// for correctness, because in this case we can have checking between
// pointers to the same underlying object.		// pointers to the same underlying object.
if (!UseDependencies) {		if (!UseDependencies) {
for (unsigned I = 0; I < Pointers.size(); ++I)		for (unsigned I = 0; I < Pointers.size(); ++I) {
CheckingGroups.push_back(RuntimeCheckingPtrGroup(I, *this));		auto &Group = CheckingGroups.emplace_back(I, *this, /*Fork=*/0);
		david-arm [Done]: nit: Is it worth calling `CheckingGroups.emplace_back(I, *this, /*Fork=*/0)` for clarity here?
		// Try to add a fork to the same group first, otherwise establish
		// a new one.
		if (Pointers[I].HasFork)
		if (!Group.addPointer(I, *this, /*Fork=*/1))
		CheckingGroups.emplace_back(I, *this, /*Fork=*/1);
		}
return;		return;
}		}
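The fallback path above appends a group and then calls addPointer() on it, which only updates the stored group if the result of emplace_back is bound by reference. Since C++17, std::vector::emplace_back returns a reference to the new element, and the diff's own `&Groups.emplace_back(...)` relies on llvm::SmallVector doing the same. Binding that result with plain `auto` silently copies the group. The toy below (ToyGroup and both helpers are hypothetical) contrasts the two forms.

```cpp
#include <cassert>
#include <vector>

// Minimal hypothetical group that just counts its members.
struct ToyGroup {
  unsigned Members = 1;
  explicit ToyGroup(unsigned /*Index*/) {}
  bool addPointer() {
    ++Members;
    return true;
  }
};

// `auto` copies the new element, so addPointer() updates the copy only.
unsigned membersAfterCopy() {
  std::vector<ToyGroup> Groups;
  auto Group = Groups.emplace_back(0);
  Group.addPointer();
  return Groups.back().Members;
}

// `auto &` binds to the element stored in the vector.
unsigned membersAfterRef() {
  std::vector<ToyGroup> Groups;
  auto &Group = Groups.emplace_back(0);
  Group.addPointer();
  return Groups.back().Members;
}
```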

unsigned TotalComparisons = 0;		unsigned TotalComparisons = 0;

DenseMap<Value *, unsigned> PositionMap;		DenseMap<Value *, unsigned> PositionMap;
for (unsigned Index = 0; Index < Pointers.size(); ++Index)		for (unsigned Index = 0; Index < Pointers.size(); ++Index)
PositionMap[Pointers[Index].PointerValue] = Index;		PositionMap[Pointers[Index].PointerValue] = Index;
		fhahn [Not Done]: This scheme would need documenting, i.e. why we can have multiple expressions for a pointer.

// We need to keep track of what pointers we've already seen so we		// We need to keep track of what pointers we've already seen so we
// don't process them twice.		// don't process them twice.
SmallSet<unsigned, 2> Seen;		SmallSet<unsigned, 2> Seen;

		fhahn [Not Done]: should be removed?
// Go through all equivalence classes, get the "pointer check groups"		// Go through all equivalence classes, get the "pointer check groups"
// and add them to the overall solution. We use the order in which accesses		// and add them to the overall solution. We use the order in which accesses
// appear in 'Pointers' to enforce determinism.		// appear in 'Pointers' to enforce determinism.
for (unsigned I = 0; I < Pointers.size(); ++I) {		for (unsigned I = 0; I < Pointers.size(); ++I) {
// We've seen this pointer before, and therefore already processed		// We've seen this pointer before, and therefore already processed
// its equivalence class.		// its equivalence class.
if (Seen.count(I))		if (Seen.count(I))
continue;		continue;
[10 unchanged lines not shown]	for (unsigned I = 0; I < Pointers.size(); ++I) {
// the order in which unions and insertions are performed on the		// the order in which unions and insertions are performed on the
// equivalence class, the iteration order is deterministic.		// equivalence class, the iteration order is deterministic.
for (auto MI = DepCands.member_begin(LeaderI), ME = DepCands.member_end();		for (auto MI = DepCands.member_begin(LeaderI), ME = DepCands.member_end();
MI != ME; ++MI) {		MI != ME; ++MI) {
auto PointerI = PositionMap.find(MI->getPointer());		auto PointerI = PositionMap.find(MI->getPointer());
assert(PointerI != PositionMap.end() &&		assert(PointerI != PositionMap.end() &&
"pointer in equivalence class not found in PositionMap");		"pointer in equivalence class not found in PositionMap");
unsigned Pointer = PointerI->second;		unsigned Pointer = PointerI->second;
bool Merged = false;		bool Merged[2] = {false, false};
		// Two expressions to evaluate for forked pointers.
		const unsigned NumExprs = Pointers[Pointer].HasFork ? 2 : 1;
// Mark this pointer as seen.		// Mark this pointer as seen.
Seen.insert(Pointer);		Seen.insert(Pointer);

// Go through all the existing sets and see if we can find one		// Go through all the existing sets and see if we can find one
// which can include this pointer.		// which can include this pointer.
for (RuntimeCheckingPtrGroup &Group : Groups) {		for (RuntimeCheckingPtrGroup &Group : Groups) {
// Don't perform more than a certain amount of comparisons.		// Don't perform more than a certain amount of comparisons.
// This should limit the cost of grouping the pointers to something		// This should limit the cost of grouping the pointers to something
// reasonable. If we do end up hitting this threshold, the algorithm		// reasonable. If we do end up hitting this threshold, the algorithm
// will create separate groups for all remaining pointers.		// will create separate groups for all remaining pointers.
if (TotalComparisons > MemoryCheckMergeThreshold)		if (TotalComparisons > MemoryCheckMergeThreshold)
break;		break;

TotalComparisons++;		TotalComparisons++;

if (Group.addPointer(Pointer, *this)) {		// Try to add both forks, if applicable.
Merged = true;		for (unsigned J = 0; J < NumExprs; ++J)
		if (!Merged[J])
		david-arm: Have you rewritten the logic here as a performance improvement, i.e. to avoid calling `Group.addPointer()` after it's already been merged?
		huntergr: Not intentionally, no. I just replicated the behaviour of the original (break out of the loop once the pointer has merged into a group) but considered both potential forks.
		Merged[J] = Group.addPointer(Pointer, *this, J);

		if (Merged[0] && (Merged[1] || NumExprs == 1))
break;		break;
}		}
}

if (!Merged)		// If we couldn't add the pointer expression(s) to any existing set or
// We couldn't add this pointer to any existing set or the threshold		// the threshold for the number of comparisons has been reached, create a
// for the number of comparisons has been reached. Create a new group		// new group to hold the current pointer expression(s).
// to hold the current pointer.		RuntimeCheckingPtrGroup *Group = nullptr;
Groups.push_back(RuntimeCheckingPtrGroup(Pointer, *this));		if (!Merged[0])
		Group = &Groups.emplace_back(Pointer, *this);
		if (!Merged[1] && NumExprs == 2) {
		// Try merging forks first, as they may have the same base address
		// but different offsets.
		if (!Group || !Group->addPointer(Pointer, *this, /*Fork=*/1))
		Groups.emplace_back(Pointer, *this, /*Fork=*/1);
		}
}		}

// We've computed the grouped checks for this partition.		// We've computed the grouped checks for this partition.
// Save the results and continue with the next one.		// Save the results and continue with the next one.
llvm::copy(Groups, std::back_inserter(CheckingGroups));		llvm::copy(Groups, std::back_inserter(CheckingGroups));
}		}
}		}
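To make the two-fork control flow of the grouping loop above easier to follow, here is a simplified, self-contained C++ model. It is a sketch, not LLVM code: `Member`, `Group`, `PointerInfo` and `groupPointers` are hypothetical names, and the model's `addPointer` merges on a made-up base id instead of comparing SCEV bounds as `RuntimeCheckingPtrGroup::addPointer` does. The `MemoryCheckMergeThreshold` limit is omitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: a group member is (pointer index, fork index).
struct Member {
  unsigned Ptr;   // index into the pointer list
  unsigned Fork;  // 0 or 1: which side of the select this member stands for
};

// Hypothetical stand-in for RuntimeCheckingPtrGroup. The real addPointer
// compares SCEV bounds; here a member joins only if its base id matches.
struct Group {
  int Base;
  std::vector<Member> Members;

  bool addPointer(unsigned Ptr, unsigned Fork, int MemberBase) {
    if (MemberBase != Base)
      return false;
    Members.push_back({Ptr, Fork});
    return true;
  }
};

struct PointerInfo {
  bool HasFork;
  int Bases[2];  // base id of each fork; Bases[1] is ignored without a fork
};

std::vector<Group> groupPointers(const std::vector<PointerInfo> &Pointers) {
  std::vector<Group> Groups;
  for (unsigned P = 0; P < Pointers.size(); ++P) {
    const PointerInfo &Info = Pointers[P];
    const unsigned NumExprs = Info.HasFork ? 2 : 1;
    bool Merged[2] = {false, false};

    // Offer each not-yet-merged fork to the existing groups; stop as soon as
    // every fork has been placed.
    for (Group &G : Groups) {
      for (unsigned J = 0; J < NumExprs; ++J)
        if (!Merged[J])
          Merged[J] = G.addPointer(P, J, Info.Bases[J]);
      if (Merged[0] && (Merged[1] || NumExprs == 1))
        break;
    }

    // Forks that did not merge get a new group. Fork 1 first tries the group
    // just created for fork 0, since both forks may share a base.
    Group *NewGroup = nullptr;
    if (!Merged[0]) {
      Groups.push_back(Group{Info.Bases[0], {Member{P, 0}}});
      NewGroup = &Groups.back();
    }
    if (NumExprs == 2 && !Merged[1]) {
      if (!NewGroup || !NewGroup->addPointer(P, 1, Info.Bases[1]))
        Groups.push_back(Group{Info.Bases[1], {Member{P, 1}}});
    }
  }
  return Groups;
}
```

Because fork 1 first tries the group created for fork 0, a pointer whose two forks are close enough to share a group ends up as two members of one group rather than as two groups; in this model that happens whenever both forks report the same base id.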

[28 lines not shown]	void RuntimePointerChecking::printChecks(
unsigned Depth) const {		unsigned Depth) const {
unsigned N = 0;		unsigned N = 0;
for (const auto &Check : Checks) {		for (const auto &Check : Checks) {
const auto &First = Check.first->Members, &Second = Check.second->Members;		const auto &First = Check.first->Members, &Second = Check.second->Members;

OS.indent(Depth) << "Check " << N++ << ":\n";		OS.indent(Depth) << "Check " << N++ << ":\n";

OS.indent(Depth + 2) << "Comparing group (" << Check.first << "):\n";		OS.indent(Depth + 2) << "Comparing group (" << Check.first << "):\n";
for (unsigned K = 0; K < First.size(); ++K)		for (unsigned K = 0; K < First.size(); ++K) {
OS.indent(Depth + 2) << *Pointers[First[K]].PointerValue << "\n";		const PointerInfo &Info = Pointers[First[K].first];
		OS.indent(Depth + 2) << *Info.PointerValue << "\n";
		if (Info.HasFork)
		OS.indent(Depth + 6)
		<< "Fork: " << *Info.Exprs[First[K].second] << "\n";
		}

OS.indent(Depth + 2) << "Against group (" << Check.second << "):\n";		OS.indent(Depth + 2) << "Against group (" << Check.second << "):\n";
for (unsigned K = 0; K < Second.size(); ++K)		for (unsigned K = 0; K < Second.size(); ++K) {
OS.indent(Depth + 2) << *Pointers[Second[K]].PointerValue << "\n";		const PointerInfo &Info = Pointers[Second[K].first];
		OS.indent(Depth + 2) << *Info.PointerValue << "\n";
		if (Info.HasFork)
		OS.indent(Depth + 6)
		<< "Fork: " << *Info.Exprs[Second[K].second] << "\n";
		}
}		}
}		}

void RuntimePointerChecking::print(raw_ostream &OS, unsigned Depth) const {		void RuntimePointerChecking::print(raw_ostream &OS, unsigned Depth) const {

OS.indent(Depth) << "Run-time memory checks:\n";		OS.indent(Depth) << "Run-time memory checks:\n";
printChecks(OS, Checks, Depth);		printChecks(OS, Checks, Depth);

OS.indent(Depth) << "Grouped accesses:\n";		OS.indent(Depth) << "Grouped accesses:\n";
for (unsigned I = 0; I < CheckingGroups.size(); ++I) {		for (unsigned I = 0; I < CheckingGroups.size(); ++I) {
const auto &CG = CheckingGroups[I];		const auto &CG = CheckingGroups[I];

OS.indent(Depth + 2) << "Group " << &CG << ":\n";		OS.indent(Depth + 2) << "Group " << &CG << ":\n";
OS.indent(Depth + 4) << "(Low: " << CG.Low << " High: " << CG.High		OS.indent(Depth + 4) << "(Low: " << CG.Low << " High: " << CG.High
<< ")\n";		<< ")\n";
for (unsigned J = 0; J < CG.Members.size(); ++J) {		for (unsigned J = 0; J < CG.Members.size(); ++J) {
OS.indent(Depth + 6) << "Member: " << *Pointers[CG.Members[J]].Expr		const PointerInfo &Info = Pointers[CG.Members[J].first];
		OS.indent(Depth + 6) << "Member: " << *(Info.Exprs[CG.Members[J].second])
<< "\n";		<< "\n";
}		}
}		}
}		}

namespace {		namespace {

/// Analyses memory accesses in a loop.		/// Analyses memory accesses in a loop.
[118 lines not shown]

} // end anonymous namespace		} // end anonymous namespace

/// Check whether a pointer can participate in a runtime bounds check.		/// Check whether a pointer can participate in a runtime bounds check.
/// If \p Assume, try harder to prove that we can compute the bounds of \p Ptr		/// If \p Assume, try harder to prove that we can compute the bounds of \p Ptr
/// by adding run-time checks (overflow checks) if necessary.		/// by adding run-time checks (overflow checks) if necessary.
static bool hasComputableBounds(PredicatedScalarEvolution &PSE,		static bool hasComputableBounds(PredicatedScalarEvolution &PSE,
const ValueToValueMap &Strides, Value *Ptr,		const ValueToValueMap &Strides, Value *Ptr,
Loop *L, bool Assume) {		Loop *L, bool Assume,
		RuntimePointerChecking &RtCheck) {
const SCEV *PtrScev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);		const SCEV *PtrScev = replaceSymbolicStrideSCEV(PSE, Strides, Ptr);

		ScalarEvolution *SE = PSE.getSE();
		fhahn: Should this be removed?
// The bounds for a loop-invariant pointer are trivial.		// The bounds for a loop-invariant pointer are trivial.
if (PSE.getSE()->isLoopInvariant(PtrScev, L))		if (SE->isLoopInvariant(PtrScev, L))
return true;		return true;

const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);

if (!AR && Assume)		if (!AR && Assume)
AR = PSE.getAsAddRec(Ptr);		AR = PSE.getAsAddRec(Ptr);

if (!AR)		if (AR)
		david-arm: nit: This is just a suggestion, but I think you could restructure this a little to remove the extra indentation here: `if (!AR) { if (!RtCheck.AllowForkedPtrs) return false; ...`. Not sure if this is better?
		huntergr: We still needed to bail out if AR was null and forked pointers weren't allowed, so I rewrote it to check for the positive case for AR first, then proceed with the forked-pointer check, and default to false afterwards.
return false;

return AR->isAffine();		return AR->isAffine();

		// If we can't find a single SCEVAddRecExpr, then maybe we can find two.
		// The findForkedPointer function tries walking backwards through IR until
		// it finds operands that are either loop invariant or for which a
		// SCEVAddRecExpr can be formed. If a select instruction is encountered
		// during this walk then we split the walk and try to generate two valid
		// SCEVAddRecExprs. Both of those SCEVs can then be considered when
		// deciding whether runtime checks are required.
		if (RtCheck.AllowForkedPtrs) {
		Optional<ForkedPointer> FPtr = SE->findForkedPointer(Ptr, L);
		if (!FPtr)
		david-arm: nit: This is just a minor comment, but you could remove indentation further here by bailing out early, i.e. `if (!FPtr) return false; const SCEV *A = …`
		return false;

		const SCEV *A = FPtr->first;
		const SCEV *B = FPtr->second;
		david-arm: Is it better to just make a recursive call to hasComputableBounds instead of calling `isAffine` here? We're potentially missing out on future improvements to hasComputableBounds, and I think we're also ignoring the possibility of a forked pointer being loop invariant.
		huntergr: We can't do that right now. hasComputableBounds takes a Value* rather than a SCEV* so that it can (potentially) be added to the stride map in replaceSymbolicStrideSCEV; we'd just end up looking at the same Value and splitting again. This is something the SCEVForkedExpr would make cleaner, since we would only need to evaluate a single SCEV. But it'll take a bit of refactoring to do that, which is why I wanted some feedback on the whole idea first. We could also separate out parts of these functions to make it recursive, but I'll need to be careful since replaceSymbolicStrideSCEV has other users.
		david-arm: nit: Instead of writing `LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *(FPtr->first) << "\n");` I think you can just write `LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *A << "\n");`, and the same for the second one.
		LLVM_DEBUG(dbgs() << "LAA: ForkedPtr found: " << *Ptr << "\n");
		LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *A << "\n");
		david-arm: Is there any danger in setting this before we return true?
		LLVM_DEBUG(dbgs() << "LAA: SCEV2: " << *B << "\n");
		if (isa<SCEVAddRecExpr>(A) && cast<SCEVAddRecExpr>(A)->isAffine() &&
		isa<SCEVAddRecExpr>(B) && cast<SCEVAddRecExpr>(B)->isAffine()) {
		RtCheck.ForkedPtrs[Ptr] = *FPtr;
		return true;
		}

		LLVM_DEBUG(
		dbgs() << "LAA: Could not determine bounds for forked pointer\n");
		}

		return false;
}		}
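The decision order in hasComputableBounds after this change can be summarized with a small self-contained model. This is a hedged sketch, not LLVM code: `Expr`, `ForkedPair` and `hasComputableBoundsSketch` are hypothetical names, `Expr` records only loop invariance, whether the value is an add recurrence, and whether that recurrence is affine, and the `Assume` path through PredicatedScalarEvolution is omitted.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical stand-in for a SCEV, keeping only the properties that the
// bounds check inspects.
struct Expr {
  bool LoopInvariant = false;
  bool IsAddRec = false;  // would be a SCEVAddRecExpr in LLVM
  bool IsAffine = false;
};

using ForkedPair = std::pair<Expr, Expr>;

// Follows the order of checks in hasComputableBounds above. The
// PredicatedScalarEvolution "Assume" path and symbolic strides are omitted.
bool hasComputableBoundsSketch(const Expr &Ptr,
                               const std::optional<ForkedPair> &Fork,
                               bool AllowForkedPtrs) {
  if (Ptr.LoopInvariant)
    return true;  // bounds of a loop-invariant pointer are trivial
  if (Ptr.IsAddRec)
    return Ptr.IsAffine;  // the single-recurrence case
  if (!AllowForkedPtrs || !Fork)
    return false;
  // Both sides of the fork must be affine add recurrences; a loop-invariant
  // side is rejected, matching the behaviour questioned in the review.
  auto IsAffineAddRec = [](const Expr &E) { return E.IsAddRec && E.IsAffine; };
  return IsAffineAddRec(Fork->first) && IsAffineAddRec(Fork->second);
}
```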

/// Check whether a pointer address cannot wrap.		/// Check whether a pointer address cannot wrap.
static bool isNoWrap(PredicatedScalarEvolution &PSE,		static bool isNoWrap(PredicatedScalarEvolution &PSE,
const ValueToValueMap &Strides, Value *Ptr, Loop *L) {		const ValueToValueMap &Strides, Value *Ptr, Loop *L,
		RuntimePointerChecking &RtCheck) {
const SCEV *PtrScev = PSE.getSCEV(Ptr);		const SCEV *PtrScev = PSE.getSCEV(Ptr);
if (PSE.getSE()->isLoopInvariant(PtrScev, L))		if (PSE.getSE()->isLoopInvariant(PtrScev, L))
return true;		return true;

		// The SCEV generated directly from a forked pointer is not an AddRecExpr,
		// but an unknown SCEV. The PSE.hasNoOverflow method currently assumes
		// that it must be an AddRecExpr and just casts. So we just bail out at
		// this point, since we can't pass in the SCEVs one at a time --
		// hasNoOverflow takes a Value* as a param instead of a SCEV*.
		// TODO: Support overflow checking for ForkedPointers.
		//
		// We can, however, calculate an effective stride for each side of the fork
		// and check if the stride is 1 for both; this is the other way we can
		// assume a pointer doesn't wrap.
		if (RtCheck.isForkedPtr(Ptr)) {
		auto SCEVs = RtCheck.ForkedPtrs[Ptr];
		const SCEVAddRecExpr *LSAR = cast<SCEVAddRecExpr>(SCEVs.first);
		const SCEVAddRecExpr *RSAR = cast<SCEVAddRecExpr>(SCEVs.second);
		const SCEV *LStep = LSAR->getStepRecurrence(*PSE.getSE());
		const SCEV *RStep = RSAR->getStepRecurrence(*PSE.getSE());

		if (LStep != RStep) {
		LLVM_DEBUG(dbgs() << "LAA: Forkedptr with mismatched steps\n");
		return false;
		}

		auto *PtrTy = dyn_cast<PointerType>(Ptr->getType());
		auto &DL = L->getHeader()->getModule()->getDataLayout();
		int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()).getFixedSize();
		david-arm: You're introducing a new implicit TypeSize -> uint64_t cast here. Could you rewrite this as `int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()).getFixedSize();`?

		const SCEVConstant *C = dyn_cast<SCEVConstant>(LStep);
		if (!C) {
		LLVM_DEBUG(dbgs() << "LAA: Forkedptr with non-constant steps\n");
		return false;
		}
		const APInt &APStepVal = C->getAPInt();
		if (APStepVal.getBitWidth() > 64) {
		LLVM_DEBUG(dbgs() << "LAA: Forkedptr step is too large\n");
		return false;
		}

		int64_t StepVal = APStepVal.getSExtValue();

		// Strided access.
		int64_t Stride = StepVal / Size;
		int64_t Rem = StepVal % Size;
		if (Rem) {
		LLVM_DEBUG(dbgs() << "LAA: Forkedptr step not multiple of elt size\n");
		return false;
		}

		return Stride == 1;
		}

Type *AccessTy = Ptr->getType()->getPointerElementType();		Type *AccessTy = Ptr->getType()->getPointerElementType();
int64_t Stride = getPtrStride(PSE, AccessTy, Ptr, L, Strides);		int64_t Stride = getPtrStride(PSE, AccessTy, Ptr, L, Strides);
if (Stride == 1 || PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW))		if (Stride == 1 || PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW))
return true;		return true;

return false;		return false;
}		}
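Once the step recurrences are known constants, the forked-pointer branch of isNoWrap reduces to integer arithmetic. Below is a minimal C++17 sketch under that assumption. `forkedPtrHasUnitStride` is a hypothetical name, `std::nullopt` stands for a non-constant step, and the 64-bit width check is left out. For example, when both forks of a `float` access advance by 4 bytes per iteration, the stride is 4 / 4 = 1.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Sketch of the forked-pointer branch of isNoWrap, with each side's step
// recurrence already reduced to a byte count (std::nullopt: not a constant).
// Assumes EltSize > 0.
bool forkedPtrHasUnitStride(std::optional<int64_t> LStep,
                            std::optional<int64_t> RStep, int64_t EltSize) {
  if (LStep != RStep)
    return false;  // mismatched steps
  if (!LStep)
    return false;  // non-constant steps
  if (*LStep % EltSize != 0)
    return false;  // step is not a multiple of the element size
  return *LStep / EltSize == 1;  // only a unit stride is accepted
}
```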

bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,		bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,
MemAccessInfo Access,		MemAccessInfo Access,
const ValueToValueMap &StridesMap,		const ValueToValueMap &StridesMap,
DenseMap<Value *, unsigned> &DepSetId,		DenseMap<Value *, unsigned> &DepSetId,
Loop *TheLoop, unsigned &RunningDepId,		Loop *TheLoop, unsigned &RunningDepId,
unsigned ASId, bool ShouldCheckWrap,		unsigned ASId, bool ShouldCheckWrap,
bool Assume) {		bool Assume) {
Value *Ptr = Access.getPointer();		Value *Ptr = Access.getPointer();

if (!hasComputableBounds(PSE, StridesMap, Ptr, TheLoop, Assume))		if (!hasComputableBounds(PSE, StridesMap, Ptr, TheLoop, Assume, RtCheck))
return false;		return false;

// When we run after a failing dependency check we have to make sure		// When we run after a failing dependency check we have to make sure
// we don't have wrapping pointers.		// we don't have wrapping pointers.
if (ShouldCheckWrap && !isNoWrap(PSE, StridesMap, Ptr, TheLoop)) {		if (ShouldCheckWrap && !isNoWrap(PSE, StridesMap, Ptr, TheLoop, RtCheck)) {
auto *Expr = PSE.getSCEV(Ptr);		auto *Expr = PSE.getSCEV(Ptr);
if (!Assume || !isa<SCEVAddRecExpr>(Expr))		if (!Assume || !isa<SCEVAddRecExpr>(Expr))
return false;		return false;
PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);		PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
}		}

// The id of the dependence set.		// The id of the dependence set.
unsigned DepId;		unsigned DepId;

if (isDependencyCheckNeeded()) {		if (isDependencyCheckNeeded()) {
Value *Leader = DepCands.getLeaderValue(Access).getPointer();		Value *Leader = DepCands.getLeaderValue(Access).getPointer();
unsigned &LeaderId = DepSetId[Leader];		unsigned &LeaderId = DepSetId[Leader];
if (!LeaderId)		if (!LeaderId)
LeaderId = RunningDepId++;		LeaderId = RunningDepId++;
DepId = LeaderId;		DepId = LeaderId;
} else		} else
// Each access has its own dependence set.		// Each access has its own dependence set.
DepId = RunningDepId++;		DepId = RunningDepId++;

bool IsWrite = Access.getInt();		bool IsWrite = Access.getInt();
RtCheck.insert(TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap, PSE);		RtCheck.insert(TheLoop, Ptr, IsWrite, DepId, ASId, StridesMap, PSE);
LLVM_DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');		LLVM_DEBUG(dbgs() << "LAA: Found a runtime check ptr:" << *Ptr << '\n');
		fhahn: Needs resolving. It should be needed, because the code above may have added assumptions that make Ptr an AddRec; see the comment in D114487.

return true;		return true;
}		}

bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,		bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
ScalarEvolution *SE, Loop *TheLoop,		ScalarEvolution *SE, Loop *TheLoop,
const ValueToValueMap &StridesMap,		const ValueToValueMap &StridesMap,
bool ShouldCheckWrap) {		bool ShouldCheckWrap) {
// Find pointers with computable bounds. We are going to use this information		// Find pointers with computable bounds. We are going to use this information
// to place a runtime bound check.		// to place a runtime bound check.
bool CanDoRT = true;		bool CanDoRT = true;

bool MayNeedRTCheck = false;		bool MayNeedRTCheck = false;
if (!IsRTCheckAnalysisNeeded) return true;		if (!IsRTCheckAnalysisNeeded) return true;
		fhahn: Missing test for this?

bool IsDepCheckNeeded = isDependencyCheckNeeded();		bool IsDepCheckNeeded = isDependencyCheckNeeded();

// We assign a consecutive id to access from different alias sets.		// We assign a consecutive id to access from different alias sets.
// Accesses between different groups don't need to be checked.		// Accesses between different groups don't need to be checked.
unsigned ASId = 0;		unsigned ASId = 0;
for (auto &AS : AST) {		for (auto &AS : AST) {
int NumReadPtrChecks = 0;		int NumReadPtrChecks = 0;
int NumWritePtrChecks = 0;		int NumWritePtrChecks = 0;
bool CanDoAliasSetRT = true;		bool CanDoAliasSetRT = true;
++ASId;		++ASId;

// We assign consecutive id to access from different dependence sets.		// We assign consecutive id to access from different dependence sets.
// Accesses within the same set don't need a runtime check.		// Accesses within the same set don't need a runtime check.
unsigned RunningDepId = 1;		unsigned RunningDepId = 1;
DenseMap<Value *, unsigned> DepSetId;		DenseMap<Value *, unsigned> DepSetId;

SmallVector<MemAccessInfo, 4> Retries;		SmallVector<MemAccessInfo, 4> Retries;
		fhahn: It looks like tests for those conditions are missing?

// First, count how many write and read accesses are in the alias set. Also		// First, count how many write and read accesses are in the alias set. Also
// collect MemAccessInfos for later.		// collect MemAccessInfos for later.
SmallVector<MemAccessInfo, 4> AccessInfos;		SmallVector<MemAccessInfo, 4> AccessInfos;
for (const auto &A : AS) {		for (const auto &A : AS) {
Value *Ptr = A.getValue();		Value *Ptr = A.getValue();
bool IsWrite = Accesses.count(MemAccessInfo(Ptr, true));		bool IsWrite = Accesses.count(MemAccessInfo(Ptr, true));

if (IsWrite)		if (IsWrite)
++NumWritePtrChecks;		++NumWritePtrChecks;
else		else
++NumReadPtrChecks;		++NumReadPtrChecks;
AccessInfos.emplace_back(Ptr, IsWrite);		AccessInfos.emplace_back(Ptr, IsWrite);
		fhahn: I don't think we can use the inbounds info here unless we prove that the program is undefined if the GEP is poison. Consider something like the function below. If `%c` is always false, the GEP index could be out of bounds (and the GEP poison). Adding a runtime check based on the SCEV expression may introduce an unconditional branch on poison.
		define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds, i1 %c) {
		entry:
		  br label %for.body
		for.cond.cleanup:
		  ret void
		for.body:
		  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %latch ]
		  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
		  %0 = load i32, i32* %arrayidx, align 4
		  %cmp1.not = icmp eq i32 %0, 0
		  %spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
		  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
		  %.sink = load float, float* %.sink.in, align 4
		  %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
		  br i1 %c, label %then, label %latch
		then:
		  store float %.sink, float* %1, align 4
		  br label %latch
		latch:
		  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
		  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
		}
		huntergr: I removed the code that added no-wrap flags to the SCEV based on 'inbounds', then added your test.
}		}

// We do not need runtime checks for this alias set, if there are no writes		// We do not need runtime checks for this alias set, if there are no writes
// or a single write and no reads.		// or a single write and no reads.
if (NumWritePtrChecks == 0 ||		if (NumWritePtrChecks == 0 ||
(NumWritePtrChecks == 1 && NumReadPtrChecks == 0)) {		(NumWritePtrChecks == 1 && NumReadPtrChecks == 0)) {
assert((AS.size() <= 1 ||		assert((AS.size() <= 1 ||
		fhahn: It might be a bit simpler to just add a duplicate to BaseScevs/OffsetScevs and have one common code path to compute the SCEVs to add.
all_of(AS,		all_of(AS,
[this](auto AC) {		[this](auto AC) {
MemAccessInfo AccessWrite(AC.getValue(), true);		MemAccessInfo AccessWrite(AC.getValue(), true);
return DepCands.findValue(AccessWrite) == DepCands.end();		return DepCands.findValue(AccessWrite) == DepCands.end();
})) &&		})) &&
"Can only skip updating CanDoRT below, if all entries in AS "		"Can only skip updating CanDoRT below, if all entries in AS "
"are reads or there is at most 1 entry");		"are reads or there is at most 1 entry");
continue;		continue;
[25 lines not shown]	for (auto &AS : AST) {
if (NeedsAliasSetRTCheck && !CanDoAliasSetRT) {		if (NeedsAliasSetRTCheck && !CanDoAliasSetRT) {
// Reset the CanDoSetRt flag and retry all accesses that have failed.		// Reset the CanDoSetRt flag and retry all accesses that have failed.
// We know that we need these checks, so we can now be more aggressive		// We know that we need these checks, so we can now be more aggressive
// and add further checks if required (overflow checks).		// and add further checks if required (overflow checks).
CanDoAliasSetRT = true;		CanDoAliasSetRT = true;
for (auto Access : Retries)		for (auto Access : Retries)
if (!createCheckForAccess(RtCheck, Access, StridesMap, DepSetId,		if (!createCheckForAccess(RtCheck, Access, StridesMap, DepSetId,
TheLoop, RunningDepId, ASId,		TheLoop, RunningDepId, ASId,
ShouldCheckWrap, /*Assume=*/true)) {		ShouldCheckWrap, /*Assume=*/true)) {
		fhahn: nit: Unnecessary move?
CanDoAliasSetRT = false;		CanDoAliasSetRT = false;
break;		break;
}		}
}		}

CanDoRT &= CanDoAliasSetRT;		CanDoRT &= CanDoAliasSetRT;
MayNeedRTCheck |= NeedsAliasSetRTCheck;		MayNeedRTCheck |= NeedsAliasSetRTCheck;
++ASId;		++ASId;
[1,049 lines not shown]	void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,

bool HasComplexMemInst = false;		bool HasComplexMemInst = false;

// A runtime check is only legal to insert if there are no convergent calls.		// A runtime check is only legal to insert if there are no convergent calls.
HasConvergentOp = false;		HasConvergentOp = false;

PtrRtChecking->Pointers.clear();		PtrRtChecking->Pointers.clear();
PtrRtChecking->Need = false;		PtrRtChecking->Need = false;
		PtrRtChecking->ForkedPtrs.clear();

const bool IsAnnotatedParallel = TheLoop->isAnnotatedParallel();		const bool IsAnnotatedParallel = TheLoop->isAnnotatedParallel();

const bool EnableMemAccessVersioningOfLoop =		const bool EnableMemAccessVersioningOfLoop =
EnableMemAccessVersioning &&		EnableMemAccessVersioning &&
!TheLoop->getHeader()->getParent()->hasOptSize();		!TheLoop->getHeader()->getParent()->hasOptSize();

// For each block.		// For each block.
[367 lines not shown]	void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
SymbolicStrides[Ptr] = Stride;		SymbolicStrides[Ptr] = Stride;
StrideSet.insert(Stride);		StrideSet.insert(Stride);
}		}

LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,		LoopAccessInfo::LoopAccessInfo(Loop *L, ScalarEvolution *SE,
const TargetLibraryInfo *TLI, AAResults *AA,		const TargetLibraryInfo *TLI, AAResults *AA,
DominatorTree *DT, LoopInfo *LI)		DominatorTree *DT, LoopInfo *LI)
: PSE(std::make_unique<PredicatedScalarEvolution>(SE, L)),		: PSE(std::make_unique<PredicatedScalarEvolution>(SE, L)),
PtrRtChecking(std::make_unique<RuntimePointerChecking>(SE)),		PtrRtChecking(
		std::make_unique<RuntimePointerChecking>(SE, EnableForkedPointers)),
DepChecker(std::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L),		DepChecker(std::make_unique<MemoryDepChecker>(*PSE, L)), TheLoop(L),
NumLoads(0), NumStores(0), MaxSafeDepDistBytes(-1), CanVecMem(false),		NumLoads(0), NumStores(0), MaxSafeDepDistBytes(-1), CanVecMem(false),
HasConvergentOp(false),		HasConvergentOp(false),
HasDependenceInvolvingLoopInvariantAddress(false) {		HasDependenceInvolvingLoopInvariantAddress(false) {
if (canAnalyzeLoop())		if (canAnalyzeLoop())
analyzeLoop(AA, LI, TLI, DT);		analyzeLoop(AA, LI, TLI, DT);
}		}

[112 lines not shown]

llvm/lib/Analysis/ScalarEvolution.cpp


[9,265 lines not shown]	const SCEV *ScalarEvolution::computeSCEVAtScope(const SCEV *V, const Loop *L) {

llvm_unreachable("Unknown SCEV type!");		llvm_unreachable("Unknown SCEV type!");
}		}

const SCEV *ScalarEvolution::getSCEVAtScope(Value *V, const Loop *L) {		const SCEV *ScalarEvolution::getSCEVAtScope(Value *V, const Loop *L) {
return getSCEVAtScope(getSCEV(V), L);		return getSCEVAtScope(getSCEV(V), L);
}		}

		// Walk back through the IR for a pointer, looking for a select like the
		// following:
		//
		// %offset = select i1 %cmp, i64 %a, i64 %b
		// %addr = getelementptr double, double* %base, i64 %offset
		// %ld = load double, double* %addr, align 8
		//
		// We won't be able to form a single SCEVAddRecExpr from this since the
		// address for each loop iteration depends on %cmp. We could potentially
		// produce multiple valid SCEVAddRecExprs, though, and check all of them for
		// memory safety/aliasing if needed.
		//
		// If we encounter some IR we don't yet handle, or something obviously fine
		// like a constant, then we just add the SCEV for that term to the list passed
		// in by the caller. If we have a node that may potentially yield a valid
		// SCEVAddRecExpr then we decompose it into parts and build the SCEV terms
		// ourselves before adding to the list.
		static void findForkedSCEVs(ScalarEvolution *SE, const Loop *L, Value *Ptr,
		SmallVectorImpl<const SCEV *> &ScevList) {
		const SCEV *Scev = SE->getSCEV(Ptr);
		if (SE->isLoopInvariant(Scev, L) || isa<SCEVAddRecExpr>(Scev) ||
		!isa<Instruction>(Ptr)) {
		ScevList.push_back(Scev);
		return;
		}

		auto GetBinOpExpr = [&SE](unsigned Opcode, const SCEV *L, const SCEV *R) {
		switch (Opcode) {
		case Instruction::Add:
		return SE->getAddExpr(L, R);
		case Instruction::Sub:
		return SE->getMinusSCEV(L, R);
		case Instruction::Mul:
		return SE->getMulExpr(L, R);
		default:
		llvm_unreachable("Unexpected binary operator when walking ForkedPtrs");
		}
		};

		Instruction *I = cast<Instruction>(Ptr);
		unsigned Opcode = I->getOpcode();
		switch (Opcode) {
		case Instruction::BitCast:
		findForkedSCEVs(SE, L, I->getOperand(0), ScevList);
		break;
		case Instruction::SExt:
		case Instruction::ZExt: {
		SmallVector<const SCEV *, 2> ExtScevs;
		findForkedSCEVs(SE, L, I->getOperand(0), ExtScevs);
		for (const SCEV *Scev : ExtScevs)
		if (Opcode == Instruction::SExt)
		ScevList.push_back(SE->getSignExtendExpr(Scev, I->getType()));
		else
		ScevList.push_back(SE->getZeroExtendExpr(Scev, I->getType()));
		break;
		}
		case Instruction::GetElementPtr: {
		GetElementPtrInst *GEP = cast<GetElementPtrInst>(I);
		Type *SourceTy = GEP->getSourceElementType();
		// We only handle base + single offset GEPs here for now.
		// Not dealing with preexisting gathers yet, so no vectors.
		if (I->getNumOperands() != 2 || SourceTy->isVectorTy()) {
		ScevList.push_back(Scev);
		break;
		}
		SmallVector<const SCEV *, 2> BaseScevs;
		SmallVector<const SCEV *, 2> OffsetScevs;
		findForkedSCEVs(SE, L, I->getOperand(0), BaseScevs);
		findForkedSCEVs(SE, L, I->getOperand(1), OffsetScevs);

		// Make sure we get the correct pointer type to extend to, including the
		// address space.
		const SCEV *BaseExpr = SE->getSCEV(GEP->getPointerOperand());
		Type *IntPtrTy = SE->getEffectiveSCEVType(BaseExpr->getType());
		SCEV::NoWrapFlags Wrap =
		GEP->isInBounds() ? SCEV::FlagNSW : SCEV::FlagAnyWrap;
		// Find the size of the type being pointed to. We only have a single
		// index term (guarded above) so we don't need to index into arrays or
		// structures, just get the size of the scalar value.
		const SCEV *Size = SE->getSizeOfExpr(IntPtrTy, SourceTy);

		if (OffsetScevs.size() == 2 && BaseScevs.size() == 1) {
		const SCEV *Off1 = SE->getTruncateOrSignExtend(OffsetScevs[0], IntPtrTy);
		const SCEV *Off2 = SE->getTruncateOrSignExtend(OffsetScevs[1], IntPtrTy);
		const SCEV *Mul1 = SE->getMulExpr(Size, Off1, Wrap);
		const SCEV *Mul2 = SE->getMulExpr(Size, Off2, Wrap);
		const SCEV *Add1 = SE->getAddExpr(BaseScevs[0], Mul1, Wrap);
+      const SCEV *Add2 = SE->getAddExpr(BaseScevs[0], Mul2, Wrap);
+      ScevList.push_back(Add1);
+      ScevList.push_back(Add2);
+    } else if (BaseScevs.size() == 2 && OffsetScevs.size() == 1) {
+      const SCEV *Off = SE->getTruncateOrSignExtend(OffsetScevs[0], IntPtrTy);
+      const SCEV *Mul = SE->getMulExpr(Size, Off, Wrap);
+      const SCEV *Add1 = SE->getAddExpr(BaseScevs[0], Mul, Wrap);
+      const SCEV *Add2 = SE->getAddExpr(BaseScevs[1], Mul, Wrap);
+      ScevList.push_back(Add1);
+      ScevList.push_back(Add2);
+    } else
+      ScevList.push_back(Scev);
+    break;
+  }
+  case Instruction::Select: {
+    SmallVector<const SCEV *, 2> ChildScevs;
+    // A select means we've found a forked pointer, but we currently only
+    // support a single select per pointer so if there's another behind this
+    // then we just bail out and return the generic SCEV.
+    findForkedSCEVs(SE, L, I->getOperand(1), ChildScevs);
+    findForkedSCEVs(SE, L, I->getOperand(2), ChildScevs);
+    if (ChildScevs.size() == 2) {
+      ScevList.push_back(ChildScevs[0]);
+      ScevList.push_back(ChildScevs[1]);
+    } else
+      ScevList.push_back(Scev);
+    break;
+  }
+  // If adding another binop to this list, update GetBinOpExpr above
+  case Instruction::Add:
+  case Instruction::Sub:
+  case Instruction::Mul: {
+    SmallVector<const SCEV *, 2> LScevs;
+    SmallVector<const SCEV *, 2> RScevs;
+    findForkedSCEVs(SE, L, I->getOperand(0), LScevs);
+    findForkedSCEVs(SE, L, I->getOperand(1), RScevs);
+    if (LScevs.size() == 2 && RScevs.size() == 1) {
+      const SCEV *Op1 = GetBinOpExpr(Opcode, LScevs[0], RScevs[0]);
+      const SCEV *Op2 = GetBinOpExpr(Opcode, LScevs[1], RScevs[0]);
+      ScevList.push_back(Op1);
+      ScevList.push_back(Op2);
+    } else if (LScevs.size() == 1 && RScevs.size() == 2) {
+      const SCEV *Op1 = GetBinOpExpr(Opcode, LScevs[0], RScevs[0]);
+      const SCEV *Op2 = GetBinOpExpr(Opcode, LScevs[0], RScevs[1]);
+      ScevList.push_back(Op1);
+      ScevList.push_back(Op2);
+    } else
+      ScevList.push_back(Scev);
+    break;
+  }
+  default:
+    // Just return the current SCEV if we haven't handled the instruction yet.
+    LLVM_DEBUG(dbgs() << "ForkedPtr unhandled instruction: " << *I << "\n");
+    ScevList.push_back(Scev);
+    break;
+  }
+
+  return;
+}
+
+Optional<ForkedPointer> ScalarEvolution::findForkedPointer(Value *V,
+                                                           const Loop *L) {
+  assert(isSCEVable(V->getType()) && "Value is not SCEVable!");
+  SmallVector<const SCEV *, 2> Scevs;
+  findForkedSCEVs(this, L, V, Scevs);
+
+  // For now, we will only accept a forked pointer with two options.
+  if (Scevs.size() == 2)
+    return std::make_pair(Scevs[0], Scevs[1]);
+
+  return None;
+}

 const SCEV *ScalarEvolution::stripInjectiveFunctions(const SCEV *S) const {
   if (const SCEVZeroExtendExpr *ZExt = dyn_cast<SCEVZeroExtendExpr>(S))
     return stripInjectiveFunctions(ZExt->getOperand());
   if (const SCEVSignExtendExpr *SExt = dyn_cast<SCEVSignExtendExpr>(S))
     return stripInjectiveFunctions(SExt->getOperand());
   return S;
 }
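The recursion in `findForkedSCEVs` follows a small set of rules: a GEP, add, sub, or mul may carry a fork through exactly one operand, and a select creates a fork only when neither of its operands is already forked. In every other case the walk falls back to the generic SCEV. The standalone sketch below models those rules without LLVM. It rests on stated assumptions: strings stand in for SCEVs, and `Forks`, `forkGEP`, `forkSelect`, `forkBinOp`, and `asForkedPointer` are hypothetical names, not part of the patch or of LLVM's API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model (not LLVM API): one expression means "not forked",
// two expressions mean "forked by a select".
using Forks = std::vector<std::string>;

// GEP: Base + ElemSize * Offset, with at most one side forked.
Forks forkGEP(const Forks &Bases, const Forks &Offsets, int ElemSize,
              const std::string &Generic) {
  auto Addr = [ElemSize](const std::string &B, const std::string &O) {
    return B + " + " + std::to_string(ElemSize) + "*" + O;
  };
  if (Bases.size() == 1 && Offsets.size() == 2)
    return {Addr(Bases[0], Offsets[0]), Addr(Bases[0], Offsets[1])};
  if (Bases.size() == 2 && Offsets.size() == 1)
    return {Addr(Bases[0], Offsets[0]), Addr(Bases[1], Offsets[0])};
  return {Generic}; // unforked, or forked on both sides: fall back
}

// Select: forks only when neither operand is itself forked.
Forks forkSelect(const Forks &TrueV, const Forks &FalseV,
                 const std::string &Generic) {
  if (TrueV.size() == 1 && FalseV.size() == 1)
    return {TrueV[0], FalseV[0]};
  return {Generic}; // a second select behind this one: bail out
}

// Add/Sub/Mul: apply the operator to each fork of the single forked operand.
Forks forkBinOp(char Op, const Forks &LHS, const Forks &RHS,
                const std::string &Generic) {
  auto Apply = [Op](const std::string &A, const std::string &B) {
    return "(" + A + " " + Op + " " + B + ")";
  };
  if (LHS.size() == 2 && RHS.size() == 1)
    return {Apply(LHS[0], RHS[0]), Apply(LHS[1], RHS[0])};
  if (LHS.size() == 1 && RHS.size() == 2)
    return {Apply(LHS[0], RHS[0]), Apply(LHS[0], RHS[1])};
  return {Generic};
}

// Like findForkedPointer: only an exact two-way fork is accepted.
std::optional<std::pair<std::string, std::string>>
asForkedPointer(const Forks &F) {
  if (F.size() == 2)
    return std::make_pair(F[0], F[1]);
  return std::nullopt;
}
```

For `forked_ptrs_different_base_same_offset`, `forkSelect({"%Base2"}, {"%Base1"}, ...)` followed by `forkGEP` with offset `i` produces the two forks `%Base2 + 4*i` and `%Base1 + 4*i`. In `forked_ptrs_same_base_different_offset` the select sits on the index, so the fork reaches the GEP through its offset operand instead.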


llvm/lib/Transforms/Scalar/LoopDistribute.cpp

Context: `private:`
   /// number -1 means that the pointer is used in multiple partitions. In this
   /// case we can't safely omit the check.
   SmallVector<RuntimePointerCheck, 4> includeOnlyCrossPartitionChecks(
       const SmallVectorImpl<RuntimePointerCheck> &AllChecks,
       const SmallVectorImpl<int> &PtrToPartition,
       const RuntimePointerChecking *RtPtrChecking) {
     SmallVector<RuntimePointerCheck, 4> Checks;

-    copy_if(AllChecks, std::back_inserter(Checks),
+    copy_if(
+        AllChecks, std::back_inserter(Checks),
         [&](const RuntimePointerCheck &Check) {
-          for (unsigned PtrIdx1 : Check.first->Members)
-            for (unsigned PtrIdx2 : Check.second->Members)
+          for (auto PtrIdx1 : Check.first->Members)
+            for (auto PtrIdx2 : Check.second->Members)
               // Only include this check if there is a pair of pointers
               // that require checking and the pointers fall into
               // separate partitions.
               //
               // (Note that we already know at this point that the two
               // pointer groups need checking but it doesn't follow
               // that each pair of pointers within the two groups need
               // checking as well.
               //
               // In other words we don't want to include a check just
               // because there is a pair of pointers between the two
               // pointer groups that require checks and a different
               // pair whose pointers fall into different partitions.)
-              if (RtPtrChecking->needsChecking(PtrIdx1, PtrIdx2) &&
+              if (RtPtrChecking->needsChecking(PtrIdx1.first, PtrIdx2.first) &&
                   !RuntimePointerChecking::arePointersInSamePartition(
-                      PtrToPartition, PtrIdx1, PtrIdx2))
+                      PtrToPartition, PtrIdx1.first, PtrIdx2.first))
                 return true;
               return false;
         });

     return Checks;
   }

   /// Check whether the loop metadata is forcing distribution to be
   /// enabled/disabled.
   void setForced() {
     Optional<const MDOperand *> Value =
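The comment in `includeOnlyCrossPartitionChecks` describes a filter. A check between two groups is kept only if at least one pair of members both needs checking and falls into different partitions. The standalone sketch below assumes, as the `.first` accesses in the diff suggest, that each group member is now a pair whose `first` field is the pointer index. This excerpt does not show what `second` holds, so the sketch treats it as an opaque fork index. `Member`, `Group`, `samePartition`, and `crossPartitionChecks` are hypothetical stand-ins, not LLVM types.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the LAA types. A member is assumed to be a
// (pointer index, fork index) pair; only .first is used, as in the diff.
using Member = std::pair<unsigned, unsigned>;
struct Group {
  std::vector<Member> Members;
};
using Check = std::pair<const Group *, const Group *>;

// Partition -1 means "used in several partitions"; that never counts as the
// same partition, so such pairs always keep their check.
bool samePartition(const std::vector<int> &PtrToPartition, unsigned A,
                   unsigned B) {
  return PtrToPartition[A] != -1 && PtrToPartition[A] == PtrToPartition[B];
}

// Keep a check only if some member pair needs checking and crosses partitions.
std::vector<Check> crossPartitionChecks(
    const std::vector<Check> &AllChecks, const std::vector<int> &PtrToPartition,
    const std::function<bool(unsigned, unsigned)> &NeedsChecking) {
  std::vector<Check> Checks;
  std::copy_if(AllChecks.begin(), AllChecks.end(), std::back_inserter(Checks),
               [&](const Check &C) {
                 for (const Member &M1 : C.first->Members)
                   for (const Member &M2 : C.second->Members)
                     if (NeedsChecking(M1.first, M2.first) &&
                         !samePartition(PtrToPartition, M1.first, M2.first))
                       return true;
                 return false;
               });
  return Checks;
}
```

Because both forks of a pointer share one pointer index, they also share one partition entry. The partition test is therefore unaffected by forking; only the iteration over members changes.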

llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp

Context: `SmallVector<RuntimePointerCheck, 4> collectMemchecks(...)`

     const auto &AllChecks = LAI.getRuntimePointerChecking()->getChecks();
     SmallVector<RuntimePointerCheck, 4> Checks;

     copy_if(AllChecks, std::back_inserter(Checks),
             [&](const RuntimePointerCheck &Check) {
               for (auto PtrIdx1 : Check.first->Members)
                 for (auto PtrIdx2 : Check.second->Members)
-                  if (needsChecking(PtrIdx1, PtrIdx2, PtrsWrittenOnFwdingPath,
-                                    CandLoadPtrs))
+                  if (needsChecking(PtrIdx1.first, PtrIdx2.first,
+                                    PtrsWrittenOnFwdingPath, CandLoadPtrs))
                     return true;
               return false;
             });

     LLVM_DEBUG(dbgs() << "\nPointer Checks (count: " << Checks.size()
                       << "):\n");
     LLVM_DEBUG(LAI.getRuntimePointerChecking()->printChecks(dbgs(), Checks));


llvm/lib/Transforms/Utils/LoopVersioning.cpp

Context: `void LoopVersioning::prepareNoAliasMetadata()`
   // reverse map from pointers to the pointer checking group they were assigned
   // to.
   MDBuilder MDB(Context);
   MDNode *Domain = MDB.createAnonymousAliasScopeDomain("LVerDomain");

   for (const auto &Group : RtPtrChecking->CheckingGroups) {
     GroupToScope[&Group] = MDB.createAnonymousAliasScope(Domain);

-    for (unsigned PtrIdx : Group.Members)
-      PtrToGroup[RtPtrChecking->getPointerInfo(PtrIdx).PointerValue] = &Group;
+    for (auto PtrIdx : Group.Members)
+      PtrToGroup[RtPtrChecking->getPointerInfo(PtrIdx.first).PointerValue] =
+          &Group;
   }

   // Go through the checks and for each pointer group, collect the scopes for
   // each non-aliasing pointer group.
   DenseMap<const RuntimeCheckingPtrGroup *, SmallVector<Metadata *, 4>>
       GroupToNonAliasingScopes;

   for (const auto &Check : AliasChecks)
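`prepareNoAliasMetadata` first builds a reverse map from each pointer to its checking group. Then, per the comment, it collects for every group the scopes of the groups it is checked against. The hypothetical sketch below models both steps with plain integers and strings rather than LLVM's `MDNode` machinery, and it keys the reverse map on the member's `first` field, as the diff does. The loop body over `AliasChecks` is elided in this excerpt, so the second step (recording the second group's scope under the first group) is an assumption about what that loop does. `Member`, `Group`, `mapPointersToGroups`, and `collectNonAliasingScopes` are illustrative names only.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not LLVM types: a member is assumed to be a
// (pointer index, fork index) pair and a scope is just a name.
using Member = std::pair<unsigned, unsigned>;
struct Group {
  std::vector<Member> Members;
};
using GroupPair = std::pair<const Group *, const Group *>;

// Reverse map: pointer index (Member::first) -> checking group. Both forks of
// a pointer share one key, so the last group visited wins.
std::map<unsigned, const Group *>
mapPointersToGroups(const std::vector<Group> &Groups) {
  std::map<unsigned, const Group *> PtrToGroup;
  for (const Group &G : Groups)
    for (const Member &M : G.Members)
      PtrToGroup[M.first] = &G;
  return PtrToGroup;
}

// Assumed second step: for each alias check (A, B), record B's scope as a
// scope that A does not alias.
std::map<const Group *, std::vector<std::string>> collectNonAliasingScopes(
    const std::vector<GroupPair> &AliasChecks,
    const std::map<const Group *, std::string> &GroupToScope) {
  std::map<const Group *, std::vector<std::string>> Result;
  for (const GroupPair &C : AliasChecks)
    Result[C.first].push_back(GroupToScope.at(C.second));
  return Result;
}
```

In this sketch, a pointer whose two forks fall into different groups (as with `%Base1` and `%Base2` in the first test) ends up mapped to only one of them. This excerpt does not show whether the patch handles that case elsewhere.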

llvm/test/Transforms/LoopVectorize/forked-pointers.ll

This file was added.

				; RUN: opt -loop-accesses -analyze -enable-new-pm=0 %s 2>&1 | FileCheck %s --check-prefix=NO-FORKED-PTRS
				fhahn (Done): the tests running `-loop-accesses` should be in `llvm/test/Analysis/LoopAccessAnalysis`. Also, could you pre-commit the tests and update the diff here to show only the difference? That way it is a bit easier to see the impact in the diff.
				; RUN: opt -disable-output -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' %s 2>&1 | FileCheck %s --check-prefix=NO-FORKED-PTRS
				david-arm (Not Done): Can we also have a test where at least one of the forked pointers is loop-invariant?
				huntergr (author, Done): We already do; see forked_ptrs_uniform_and_contiguous_forks. We could properly analyze and vectorize that case as well, but I haven't implemented that yet, so I'm just testing that it gets rejected for now.
				david-arm (Not Done): Ah ok, sorry I missed that. I guess what I meant was that this should be trivial to implement, particularly if we can find a way of making calls to hasComputableBounds recursive and re-use the existing code that checks for loop-invariants and affine pointers.
				huntergr (author, Done): Sadly, it'll require a bit more work to support invariant addresses. My original downstream code allowed them, but we ran into a bug with it and disabled them. My plan is to get the base functionality committed, then go back and add an interface so that LoopVectorize (or other LAA consumers) can query the type of the forks in order to generate correct (and hopefully more optimal) IR for various cases -- the current contiguous-only SCEVs, strides of >1, loop-invariant but unknown strides, uniform/invariant addresses, indexed gather/scatter, etc. If both forks have a stride of 1, or are invariant, then we could potentially plant two masked load instructions (or load + broadcast) instead of a gather, for instance. But that's future work until this part is completed.
				; RUN: opt -enable-forked-pointer-detection -loop-accesses -analyze -enable-new-pm=0 %s 2>&1 | FileCheck %s --check-prefix=FORKED-PTRS
				; RUN: opt -enable-forked-pointer-detection -disable-output -passes='require<scalar-evolution>,require<aa>,loop(print-access-info)' %s 2>&1 | FileCheck %s --check-prefix=FORKED-PTRS
				; RUN: opt -loop-vectorize -instcombine -enable-forked-pointer-detection -force-vector-width=4 -S < %s 2>&1 | FileCheck %s --check-prefix=FP-VEC

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

				; NO-FORKED-PTRS-LABEL: function 'forked_ptrs_different_base_same_offset':
				; NO-FORKED-PTRS: for.body:
				; NO-FORKED-PTRS: Report: cannot identify array bounds


				; FORKED-PTRS-LABEL: function 'forked_ptrs_different_base_same_offset':
				; FORKED-PTRS: for.body:
				; FORKED-PTRS: Memory dependences are safe with run-time checks
				; FORKED-PTRS: Dependences:
				; FORKED-PTRS: Run-time memory checks:
				; FORKED-PTRS: Check 0:
				; FORKED-PTRS: Comparing group
				; FORKED-PTRS: %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				; FORKED-PTRS: Against group
				; FORKED-PTRS: %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
				; FORKED-PTRS: Check 1:
				; FORKED-PTRS: Comparing group
				; FORKED-PTRS: %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				; FORKED-PTRS: Against group
				; FORKED-PTRS: %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
				; FORKED-PTRS: Fork: {%Base2,+,4}<nsw><%for.body>
				; FORKED-PTRS: Check 2:
				; FORKED-PTRS: Comparing group
				; FORKED-PTRS: %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				; FORKED-PTRS: Against group
				; FORKED-PTRS: %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
				david-arm (Not Done): It would be good to show some distinction here between Check 1 and Check 2. I assume it's actually checking each of the forked pointers, but the output doesn't make that clear.
				; FORKED-PTRS: Fork: {%Base1,+,4}<nsw><%for.body>
				; FORKED-PTRS: Grouped accesses:
				; FORKED-PTRS: Group
				; FORKED-PTRS: (Low: %Dest High: (400 + %Dest))
				; FORKED-PTRS: Member: {%Dest,+,4}<nuw><%for.body>
				; FORKED-PTRS: Group
				; FORKED-PTRS: (Low: %Preds High: (400 + %Preds))
				; FORKED-PTRS: Member: {%Preds,+,4}<nuw><%for.body>
				; FORKED-PTRS: Group
				; FORKED-PTRS: (Low: %Base2 High: (400 + %Base2))
				; FORKED-PTRS: Member: {%Base2,+,4}<nsw><%for.body>
				; FORKED-PTRS: Group
				; FORKED-PTRS: (Low: %Base1 High: (400 + %Base1))
				; FORKED-PTRS: Member: {%Base1,+,4}<nsw><%for.body>
				; FORKED-PTRS: Non vectorizable stores to invariant address were not found in loop.
				; FORKED-PTRS: SCEV assumptions:
				; FORKED-PTRS: Expressions re-written:

				; FP-VEC-LABEL: @forked_ptrs_different_base_same_offset
				; FP-VEC: vector.memcheck:
				; FP-VEC: [[DESTEND:%[a-zA-Z0-9]+]] = getelementptr float, float* %Dest, i64 100
				; FP-VEC: [[PREDSEND:%[a-zA-Z0-9]+]] = getelementptr i32, i32* %Preds, i64 100
				; FP-VEC: [[BASE2END:%[a-zA-Z0-9]+]] = getelementptr float, float* %Base2, i64 100
				; FP-VEC: [[BASE1END:%[a-zA-Z0-9]+]] = getelementptr float, float* %Base1, i64 100

				;;;; Preds vs. Dest
				; FP-VEC: [[PREDSCAST:%[0-9]+]] = bitcast i32* [[PREDSEND]] to float*
				; FP-VEC: [[PBOUND1:%[a-zA-Z0-9]+]] = icmp ugt float* [[PREDSCAST]], %Dest
				; FP-VEC: [[DESTCAST:%[0-9]+]] = bitcast float* [[DESTEND]] to i32*
				; FP-VEC: [[PBOUND2:%[a-zA-Z0-9]+]] = icmp ugt i32* [[DESTCAST]], %Preds
				; FP-VEC: [[PCONFLICT:%[a-zA-Z0-9\.]+]] = and i1 [[PBOUND1]], [[PBOUND2]]

				;;;; Base2 vs. Dest
				; FP-VEC: [[B2BOUND1:%[a-zA-Z0-9]+]] = icmp ugt float* [[BASE2END]], %Dest
				; FP-VEC: [[B2BOUND2:%[a-zA-Z0-9]+]] = icmp ugt float* [[DESTEND]], %Base2
				; FP-VEC: [[B2CONFLICT:%[a-zA-Z0-9\.]+]] = and i1 [[B2BOUND1]], [[B2BOUND2]]
				; FP-VEC: [[COMBINED1:%[a-zA-Z0-9\.]+]] = or i1 [[PCONFLICT]], [[B2CONFLICT]]

				;;;; Base1 vs. Dest
				; FP-VEC: [[B1BOUND1:%[a-zA-Z0-9]+]] = icmp ugt float* [[BASE1END]], %Dest
				; FP-VEC: [[B1BOUND2:%[a-zA-Z0-9]+]] = icmp ugt float* [[DESTEND]], %Base1
				; FP-VEC: [[B1CONFLICT:%[a-zA-Z0-9\.]+]] = and i1 [[B1BOUND1]], [[B1BOUND2]]
				; FP-VEC: [[COMBINED2:%[a-zA-Z0-9\.]+]] = or i1 [[COMBINED1]], [[B1CONFLICT]]

				; FP-VEC: br i1 [[COMBINED2]], label %scalar.ph, label %vector.ph

				;;;; Check we get a gather load. We could do two contiguous masked loads but
				;;;; haven't implemented that yet.
				; FP-VEC: [[LANE1:%[0-9]+]] = load float
				; FP-VEC: [[LANE2:%[0-9]+]] = load float
				; FP-VEC: [[LANE3:%[0-9]+]] = load float
				; FP-VEC: [[LANE4:%[0-9]+]] = load float
				; FP-VEC: [[INS1:%[0-9]+]] = insertelement <4 x float> poison, float [[LANE1]], i32 0
				; FP-VEC: [[INS2:%[0-9]+]] = insertelement <4 x float> [[INS1]], float [[LANE2]], i32 1
				; FP-VEC: [[INS3:%[0-9]+]] = insertelement <4 x float> [[INS2]], float [[LANE3]], i32 2
				; FP-VEC: [[INS4:%[0-9]+]] = insertelement <4 x float> [[INS3]], float [[LANE4]], i32 3


				;;;; Derived from the following C code
				;; void forked_ptrs_different_base_same_offset(float *A, float *B, float *C, int *D) {
				;;   for (int i=0; i<100; i++) {
				;;     if (D[i] != 0) {
				;;       C[i] = A[i];
				;;     } else {
				;;       C[i] = B[i];
				;;     }
				;;   }
				;; }

				define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds) {
				entry:
				br label %for.body

				for.cond.cleanup:
				ret void

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp1.not = icmp eq i32 %0, 0
				%spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
				%.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
				%.sink = load float, float* %.sink.in, align 4
				%1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				store float %.sink, float* %1, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, 100
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				; NO-FORKED-PTRS-LABEL: function 'forked_ptrs_same_base_different_offset':
				; NO-FORKED-PTRS: for.body:
				; NO-FORKED-PTRS: Report: cannot identify array bounds


				; FORKED-PTRS-LABEL: function 'forked_ptrs_same_base_different_offset':
				; FORKED-PTRS: for.body:
				; FORKED-PTRS: Memory dependences are safe with run-time checks
				; FORKED-PTRS: Dependences:
				; FORKED-PTRS: Run-time memory checks:
				; FORKED-PTRS: Check 0:
				; FORKED-PTRS: Comparing group
				; FORKED-PTRS: %arrayidx5 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				; FORKED-PTRS: Against group
				; FORKED-PTRS: %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
				; FORKED-PTRS: Check 1:
				; FORKED-PTRS: Comparing group
				; FORKED-PTRS: %arrayidx5 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				; FORKED-PTRS: Against group
				; FORKED-PTRS: %arrayidx3 = getelementptr inbounds float, float* %Base, i64 %idxprom213
				; FORKED-PTRS: Fork: {(4 + %Base)<nsw>,+,4}<nsw><%for.body>
				; FORKED-PTRS: %arrayidx3 = getelementptr inbounds float, float* %Base, i64 %idxprom213
				; FORKED-PTRS: Fork: {%Base,+,4}<nsw><%for.body>
				; FORKED-PTRS: Grouped accesses:
				; FORKED-PTRS: Group
				; FORKED-PTRS: (Low: %Dest High: (400 + %Dest))
				; FORKED-PTRS: Member: {%Dest,+,4}<nuw><%for.body>
				; FORKED-PTRS: Group
				; FORKED-PTRS: (Low: %Preds High: (400 + %Preds))
				; FORKED-PTRS: Member: {%Preds,+,4}<nuw><%for.body>
				; FORKED-PTRS: Group
				; FORKED-PTRS: (Low: %Base High: (404 + %Base))
				; FORKED-PTRS: Member: {(4 + %Base)<nsw>,+,4}<nsw><%for.body>
				; FORKED-PTRS: Member: {%Base,+,4}<nsw><%for.body>
				; FORKED-PTRS: Non vectorizable stores to invariant address were not found in loop.
				; FORKED-PTRS: SCEV assumptions:
				; FORKED-PTRS: Expressions re-written:

				; FP-VEC-LABEL: @forked_ptrs_same_base_different_offset
				; FP-VEC: vector.memcheck:
				; FP-VEC: [[DESTEND:%[a-zA-Z0-9]+]] = getelementptr float, float* %Dest, i64 100
				; FP-VEC: [[PREDSEND:%[a-zA-Z0-9]+]] = getelementptr i32, i32* %Preds, i64 100
				; FP-VEC: [[BASEEND:%[a-zA-Z0-9]+]] = getelementptr float, float* %Base, i64 101

				;;;; Preds vs. Dest
				; FP-VEC: [[PREDSCAST:%[0-9]+]] = bitcast i32* [[PREDSEND]] to float*
				; FP-VEC: [[PBOUND1:%[a-zA-Z0-9]+]] = icmp ugt float* [[PREDSCAST]], %Dest
				; FP-VEC: [[DESTCAST:%[0-9]+]] = bitcast float* [[DESTEND]] to i32*
				; FP-VEC: [[PBOUND2:%[a-zA-Z0-9]+]] = icmp ugt i32* [[DESTCAST]], %Preds
				; FP-VEC: [[PCONFLICT:%[a-zA-Z0-9\.]+]] = and i1 [[PBOUND1]], [[PBOUND2]]

				;;;; Base vs. Dest
				; FP-VEC: [[BBOUND1:%[a-zA-Z0-9]+]] = icmp ugt float* [[BASEEND]], %Dest
				; FP-VEC: [[BBOUND2:%[a-zA-Z0-9]+]] = icmp ugt float* [[DESTEND]], %Base
				; FP-VEC: [[BCONFLICT:%[a-zA-Z0-9\.]+]] = and i1 [[BBOUND1]], [[BBOUND2]]
				; FP-VEC: [[COMBINED:%[a-zA-Z0-9\.]+]] = or i1 [[PCONFLICT]], [[BCONFLICT]]

				; FP-VEC: br i1 [[COMBINED]], label %scalar.ph, label %vector.ph

				;;;; Check we get a gather load. We could do a better job here, especially
				;;;; since the offset is only 1 element, but we haven't implemented that yet.
				; FP-VEC: [[LANE1:%[0-9]+]] = load float
				; FP-VEC: [[LANE2:%[0-9]+]] = load float
				; FP-VEC: [[LANE3:%[0-9]+]] = load float
				; FP-VEC: [[LANE4:%[0-9]+]] = load float
				; FP-VEC: [[INS1:%[0-9]+]] = insertelement <4 x float> poison, float [[LANE1]], i32 0
				; FP-VEC: [[INS2:%[0-9]+]] = insertelement <4 x float> [[INS1]], float [[LANE2]], i32 1
				; FP-VEC: [[INS3:%[0-9]+]] = insertelement <4 x float> [[INS2]], float [[LANE3]], i32 2
				; FP-VEC: [[INS4:%[0-9]+]] = insertelement <4 x float> [[INS3]], float [[LANE4]], i32 3

				;;;; Derived from the following C code
				;; void forked_ptrs_same_base_different_offset(float *A, float *B, int *C) {
				;;   int offset;
				;;   for (int i = 0; i < 100; i++) {
				;;     if (C[i] != 0)
				;;       offset = i;
				;;     else
				;;       offset = i+1;
				;;     B[i] = A[offset];
				;;   }
				;; }

				define dso_local void @forked_ptrs_same_base_different_offset(float* nocapture readonly %Base, float* nocapture %Dest, i32* nocapture readonly %Preds) {
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%i.014 = phi i32 [ 0, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp1.not = icmp eq i32 %0, 0
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%add = add nuw nsw i32 %i.014, 1
				%1 = trunc i64 %indvars.iv to i32
				%offset.0 = select i1 %cmp1.not, i32 %add, i32 %1
				%idxprom213 = zext i32 %offset.0 to i64
				%arrayidx3 = getelementptr inbounds float, float* %Base, i64 %idxprom213
				%2 = load float, float* %arrayidx3, align 4
				%arrayidx5 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				store float %2, float* %arrayidx5, align 4
				%exitcond.not = icmp eq i64 %indvars.iv.next, 100
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				;;;; Cases that can be handled by a forked pointer but are not currently allowed.

				; NO-FORKED-PTRS-LABEL: function 'forked_ptrs_uniform_and_strided_forks':
				; NO-FORKED-PTRS: for.body:
				; NO-FORKED-PTRS: Report: cannot identify array bounds


				; FORKED-PTRS-LABEL: function 'forked_ptrs_uniform_and_strided_forks':
				; FORKED-PTRS: for.body:
				; FORKED-PTRS: Report: cannot identify array bounds

				;;;; Derived from forked_ptrs_same_base_different_offset with a manually
				;;;; added uniform offset and a mul to provide a stride

				define dso_local void @forked_ptrs_uniform_and_strided_forks(float* nocapture readonly %Base, float* nocapture %Dest, i32* nocapture readonly %Preds) {
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%i.014 = phi i32 [ 0, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp1.not = icmp eq i32 %0, 0
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%add = add nuw nsw i32 %i.014, 1
				%1 = trunc i64 %indvars.iv to i32
				%mul = mul i32 %1, 3
				%offset.0 = select i1 %cmp1.not, i32 4, i32 %mul
				%idxprom213 = sext i32 %offset.0 to i64
				%arrayidx3 = getelementptr inbounds float, float* %Base, i64 %idxprom213
				%2 = load float, float* %arrayidx3, align 4
				%arrayidx5 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				store float %2, float* %arrayidx5, align 4
				%exitcond.not = icmp eq i64 %indvars.iv.next, 100
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}

				; NO-FORKED-PTRS-LABEL: function 'forked_ptrs_gather_and_contiguous_forks':
				; NO-FORKED-PTRS: for.body:
				; NO-FORKED-PTRS: Report: cannot identify array bounds


				; FORKED-PTRS-LABEL: function 'forked_ptrs_gather_and_contiguous_forks':
				; FORKED-PTRS: for.body:
				; FORKED-PTRS: Report: cannot identify array bounds

				;;;; Derived from forked_ptrs_same_base_different_offset with a gather
				;;;; added using Preds as an index array in addition to the per-iteration
				;;;; condition.

				define dso_local void @forked_ptrs_gather_and_contiguous_forks(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds) {
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %entry, %for.body
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp1.not = icmp eq i32 %0, 0
				%arrayidx9 = getelementptr inbounds float, float* %Base2, i64 %indvars.iv
				%idxprom4 = sext i32 %0 to i64
				%arrayidx5 = getelementptr inbounds float, float* %Base1, i64 %idxprom4
				%.sink.in = select i1 %cmp1.not, float* %arrayidx9, float* %arrayidx5
				%.sink = load float, float* %.sink.in, align 4
				%1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
				store float %.sink, float* %1, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, 100
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
				}
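The FORKED-PTRS group bounds and the FP-VEC memcheck sequences above reduce to interval arithmetic on byte ranges. The sketch below models addresses as plain integers, which is a simplification since the vectorizer compares pointers. It computes the half-open range touched by an affine access `{Start,+,Step}` over the trip count and merges two forks that share a checking group into one conservative range. It then applies the same `(EndA > StartB) && (EndB > StartA)` test that the `icmp ugt` / `and` pairs in the FP-VEC lines check for. `Range`, `affineRange`, `mergeForks`, and `mayConflict` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Half-open byte range [Start, End), addresses modelled as plain integers.
struct Range {
  std::uint64_t Start, End;
};

// Bytes touched by {Base + Offset,+,Step} over TripCount iterations, each
// access ElemSize bytes wide. Assumes Step > 0 and TripCount >= 1.
Range affineRange(std::uint64_t Base, std::uint64_t Offset, std::uint64_t Step,
                  std::uint64_t TripCount, std::uint64_t ElemSize) {
  std::uint64_t First = Base + Offset;
  std::uint64_t Last = First + Step * (TripCount - 1);
  return {First, Last + ElemSize};
}

// Two forks in the same checking group share one conservative range.
Range mergeForks(Range A, Range B) {
  return {std::min(A.Start, B.Start), std::max(A.End, B.End)};
}

// The ranges may overlap iff each one ends after the other starts.
bool mayConflict(Range A, Range B) {
  return A.End > B.Start && B.End > A.Start;
}
```

With `%Base` at 1000, the forks `{(4 + %Base),+,4}` and `{%Base,+,4}` over 100 iterations cover `[1004, 1404)` and `[1000, 1400)`. Merged, they give `[1000, 1404)`. This matches the `High: (404 + %Base)` bound and the `getelementptr float, float* %Base, i64 101` end pointer expected for `forked_ptrs_same_base_different_offset`.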

Revision Contents

Diff 388889

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

llvm/include/llvm/Analysis/ScalarEvolution.h

llvm/lib/Analysis/LoopAccessAnalysis.cpp

llvm/lib/Analysis/ScalarEvolution.cpp

llvm/lib/Transforms/Scalar/LoopDistribute.cpp

llvm/lib/Transforms/Scalar/LoopLoadElimination.cpp

llvm/lib/Transforms/Utils/LoopVersioning.cpp

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
