This is an archive of the discontinued LLVM Phabricator instance.

Differential D108699

[LAA] Analyze pointers forked by a select
Closed, Public

Authored by huntergr on Aug 25 2021, 6:04 AM.

Details

Reviewers

fhahn
sdesmalen
reames
eli.friedman
david-arm
lebedev.ri
Ayal
Meinersbur

Commits

rGdb8fcb2c2537: [LAA] Add recursive IR walker for forked pointers

Summary

Given a function like the following:

void forked_ptrs_different_base_same_offset(float *Base1, float *Base2, float *Dest, int *Preds) {
  for (int i=0; i<100; i++) {
    if (Preds[i] != 0) {
      Dest[i] = Base1[i];
    } else {
      Dest[i] = Base2[i];
    }
  }
}

LLVM will optimize the IR to a single load using a pointer determined by a select instruction:

%spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
%.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv 
%.sink = load float, float* %.sink.in, align 4
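At the source level, the transformation above corresponds roughly to choosing the base pointer first and then doing one indexed load. The following is a minimal sketch of that equivalence, not code from the patch; the function names `copy_branchy` and `copy_select` and the explicit length parameter `N` are assumptions made for illustration.

```cpp
#include <cstddef>

// Branchy form, mirroring the function in the summary.
void copy_branchy(const float *Base1, const float *Base2, float *Dest,
                  const int *Preds, std::size_t N) {
  for (std::size_t I = 0; I < N; ++I) {
    if (Preds[I] != 0)
      Dest[I] = Base1[I];
    else
      Dest[I] = Base2[I];
  }
}

// Select form: pick the base pointer, then perform a single indexed load.
// This is the shape that the optimized IR takes.
void copy_select(const float *Base1, const float *Base2, float *Dest,
                 const int *Preds, std::size_t N) {
  for (std::size_t I = 0; I < N; ++I) {
    const float *Base = (Preds[I] != 0) ? Base1 : Base2; // the "select"
    Dest[I] = Base[I];                                    // the single load
  }
}
```

Both functions write the same values to `Dest`; only the second has a load whose address depends on a per-iteration choice between two bases.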

LAA is currently unable to analyze such IR, since ScalarEvolution will return a SCEVUnknown for the pointer operand for the load.

This patch adds initial, optional support for analyzing both possibilities for the pointer, allowing LAA to generate runtime checks for the bounds of each if required.
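Conceptually, a runtime check for a forked pointer has to hold for each possible base. The sketch below illustrates that idea in plain C++ for the contiguous case in the summary; it is not the code LAA emits, and the names `Range`, `accessRange`, `overlaps` and `forksAreSafe` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Half-open byte range [Low, High) touched by one pointer over the loop.
struct Range {
  std::uintptr_t Low;
  std::uintptr_t High;
};

// Range covered by N contiguous floats starting at P.
Range accessRange(const float *P, std::size_t N) {
  auto Low = reinterpret_cast<std::uintptr_t>(P);
  return {Low, Low + N * sizeof(float)};
}

bool overlaps(Range A, Range B) {
  return A.Low < B.High && B.Low < A.High;
}

// True when the store range is disjoint from the range of each possible
// load pointer, i.e. both forks of the select pass the check.
bool forksAreSafe(const float *Base1, const float *Base2, const float *Dest,
                  std::size_t N) {
  Range Store = accessRange(Dest, N);
  return !overlaps(Store, accessRange(Base1, N)) &&
         !overlaps(Store, accessRange(Base2, N));
}
```

If either fork may alias the destination, the check fails and a scalar fallback would be needed, which is why both candidates have to be analyzable.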

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

huntergr created this revision. Aug 25 2021, 6:04 AM

Herald added subscribers: javed.absar, hiraditya. Aug 25 2021, 6:04 AM

huntergr requested review of this revision. Aug 25 2021, 6:04 AM

Herald added a project: Restricted Project. Aug 25 2021, 6:04 AM

Harbormaster completed remote builds in B121152: Diff 368619. Aug 25 2021, 6:46 AM

Rebased, ran clang-format over the patch.

Harbormaster completed remote builds in B122756: Diff 370905. Sep 6 2021, 6:54 AM

Hi @huntergr, apologies for delay reviewing this patch! I've not finished reviewing it, but I've left some comments that I have so far. Thanks!

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
374 ↗	(On Diff #370905)	If we're only ever going to support two forks couldn't we make this simpler by just having: unsigned Members[2]; instead or is this a real pain to implement? Or is the idea that we may want to extend the fork to allow more than 2 in future?
395 ↗	(On Diff #370905)	Again, does this need to be a `SmallVector` and can it just be `const SCEV *Starts[2];`?
llvm/include/llvm/Analysis/ScalarEvolution.h
444 ↗	(On Diff #370905)	Hi Graham, it feels slightly awkward having to essentially redefine a ForkedPointer as a pair in the same way as `ForkedPtrs` in llvm/include/llvm/Analysis/LoopAccessAnalysis.h. Maybe we could define something like: using ForkedPointer = std::pair<const SCEV *, const SCEV *>; in llvm/include/llvm/Analysis/ScalarEvolution.h that can be used in LoopAccessAnalysis too?
llvm/lib/Analysis/LoopAccessAnalysis.cpp
183	nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway?
379	nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway?
505	Have you rewritten the logic here as a performance improvement, i.e. to avoid calling `Group.addPointer()` after it's already been merged?
743	nit: This is just a suggestion, but you could restructure this a little to remove the extra indentation I think here: if (!AR) { if (!RtCheck.AllowForkedPtrs) return false; ... Not sure if this is better?
759	Is it better to just make a recursive call to hasComputableBounds instead of calling `isAffine` here? We're potentially missing out on future improvements to hasComputableBounds here, and also we're ignoring the possibility of a forked pointer being loop invariant I think.
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1	Can we also have a test where at least one of the forked pointers is loop-invariant?

huntergr added inline comments. Sep 7 2021, 7:31 AM

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
374 ↗	(On Diff #370905)	So this is for a RuntimeCheckingPtrGroup, which may have more than two members already; I've only changed it from a SmallVector to a SmallSetVector here in order to avoid duplicate members, since both sides of a forked pointer can be added to the same group.
395 ↗	(On Diff #370905)	I'll look into it, but it'll make other parts messier since I can't just iterate over the members of the SmallVectors and would need to perform null checks for all those places for the second element. One solution I thought of (but didn't implement, since I wanted some community feedback on the overall work first) was to create a new SCEVForkedExpr so that we wouldn't need to store two SCEVs here and could just note which fork(s) was represented. It'll take a while to implement that, but might be cleaner. Thoughts?
llvm/lib/Analysis/LoopAccessAnalysis.cpp
505	Not intentionally, no -- I just replicated the behaviour of the original (break out of the loop if the pointer merged into a group) but considered both potential forks.
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1	We already do; see forked_ptrs_uniform_and_contiguous_forks We could properly analyze and vectorize that case as well, but I haven't implemented that yet so I'm just testing that it gets rejected for now.

david-arm added inline comments. Sep 7 2021, 7:39 AM

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1	Ah ok, sorry I missed that. I guess what I meant was that this should be trivial to implement, particularly if we can find a way of making calls to hasComputableBounds recursive and re-use the existing code that checks for loop-invariants and affine pointers.

Rebased, updated based on review comments.

huntergr marked 4 inline comments as done. Oct 12 2021, 2:09 AM

huntergr added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
743	We still needed to bail out if AR was false and forked pointers weren't allowed... so I rewrote it to check for the positive case for AR first and then proceed with the forked pointers check and default false afterwards.
759	We can't do that right now -- hasComputableBounds takes a Value* rather than a SCEV* so it can (potentially) be added to the stride map in replaceSymbolicStrideSCEV. We'd just end up looking at the same Value and splitting again. This is something the SCEVForkedExpr would make cleaner, since we would only need to evaluate a single SCEV. But it'll take a bit of refactoring to do that, which is why I wanted some feedback on the whole idea first. We could also separate out parts of these functions to make it recursive, but I'll need to be careful since replaceSymbolicStrideSCEV has other users.
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1	Sadly, it'll require a bit more work to support invariant addresses. My original downstream code allowed them, but we ran into a bug with it and disabled them. My plan is to get the base functionality committed, then go back and add an interface so that LoopVectorize (or other LAA consumers) can query the type of the forks in order to generate correct (and hopefully more optimal) IR for various cases -- the current contiguous-only SCEVs, strides of >1, loop-invariant but unknown strides, uniform/invariant addresses, indexed gather/scatter, etc. If both forks have a stride of 1, or are invariant, then we could potentially plant two masked load instructions (or load + broadcast) instead of a gather, for instance. But that's future work until this part is completed.

Harbormaster completed remote builds in B128298: Diff 378920. Oct 12 2021, 2:53 AM

Ping.

Thanks for making the changes @huntergr! I've got a few mostly minor comments so far, but still have to review findForkedSCEVs. :)

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
409 ↗	(On Diff #378920)	Hi @huntergr, thanks for making the changes to `Exprs`. I just wonder if we should have an assert here that `ScExprs.size() <= 2`?
llvm/lib/Analysis/LoopAccessAnalysis.cpp
464	nit: Is it worth calling `CheckingGroups.emplace_back(I, *this, /*Fork=0*/0)` for clarity here?
755	nit: This is just a minor comment, but you could remove indentation further here by bailing out early, i.e. if (!FPtr) return false; const SCEV *A =
759	nit: Instead of writing: LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << (FPtr->first) << "\n"); I think you can just write LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << A << "\n"); and same for the second one.

david-arm added inline comments. Oct 20 2021, 6:54 AM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
761	Is there any danger in setting this before we return true?
780	You're introducing a new implicit TypeSize -> uint64_t cast here. Could you rewrite this as: int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()).getFixedSize();

Rebased, minor fixes from review comments.

huntergr marked 5 inline comments as done. Oct 21 2021, 4:45 AM

Harbormaster completed remote builds in B129914: Diff 381211. Oct 21 2021, 5:22 AM

Thanks for dealing with all the review comments @huntergr! I think the patch looks sensible to me, although I'm not as familiar with the SCEV code as others might be. I'm adding @lebedev.ri as a potential reviewer if that's ok?

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
33	It would be good to show some distinction here between Check 1 and Check 2. I assume it's actually checking each of the forked pointers, but the output doesn't make that clear.

Rebased, and changed the membership of a checking group to be a pair of PointerInfo index and fork so that printing can show which forks are present in a group.

Harbormaster completed remote builds in B134698: Diff 387879. Nov 17 2021, 3:00 AM

Rebased, disabled by default, added a couple of different instructions into the tests to ensure those paths are at least covered even if they aren't in a positive case right now; I'm still planning to leave those cases (known strides, g/s, uniform) for a followup patch; I suppose I could start a patch series if people want to see those earlier.

I've run the LNT nightly suite with this enabled, along with a few HPC benchmarks I have access to and they all pass.

Harbormaster completed remote builds in B135406: Diff 388889. Nov 22 2021, 8:47 AM

LGTM! Thanks for making all the changes @huntergr and adding the tests. This patch is low risk at the moment with the flag being disabled currently. At some point once there have been performance investigations to prove there are no regressions we can then enable this by default.

david-arm accepted this revision. Nov 23 2021, 5:10 AM

This revision is now accepted and ready to land. Nov 23 2021, 5:10 AM

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
20 ↗	(On Diff #388889)	Is this needed? Other SCEV types are forward-declared below, so ScalarEvolution.h doesn't need to be included.
334 ↗	(On Diff #388889)	Could we use a bool for `Fork` if it only has 2 possible values?
llvm/include/llvm/Analysis/ScalarEvolution.h
769 ↗	(On Diff #388889)	This is only used by LAA. Is there a reason this needs to be part of `ScalarEvolution`?
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1	the tests running `-loop-accesses` should be in `llvm/test/Analysis/LoopAccessAnalysis`. Also, could you pre-commit the tests and update the diff here to show only the difference? That way it is a bit easier to see the impact in the diff.

I just realized how this may be a bit similar to how we handle pointers that are phi nodes. Currently those are handled by adding accesses for both incoming values (see D109381). Unfortunately the same approach cannot be directly used for selects, because we need to create 2 pointers that do not exist in the IR.

But if MemAccessInfo/ would also carry the pointer SCEV directly, I think it would be possible to avoid adding another dimension to RuntimePointerChecking::PointerInfo. Instead we would add 2 PointerInfo entries with separate translated pointer SCEVs. I put up a rough sketch of what this may look like in D114480 and D114479.

I also put up a sketch of a stripped down variant that only supports runtime check generation by adding multiple PointerInfos with different associated pointer SCEVs: D114487.

The variant that also extends MemAccessInfo should also be able to determine that certain loops are safe without runtime checks, e.g. like the one below I think:

%s1 = type { [32000 x float], [32000 x float], [32000 x float] }
define dso_local void @foo(%s1* nocapture readonly %Base, i32* nocapture readonly %Preds) {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1.not = icmp eq i32 %0, 0
  %gep.1 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 1, i32 0
  %gep.2 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 2, i32 0
  %spec.select = select i1 %cmp1.not, float* %gep.1, float* %gep.2
  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
  %.sink = load float, float* %.sink.in, align 4
  %1 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 0, i64 %indvars.iv
  store float %.sink, float* %1, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}

huntergr mentioned this in rGdee810e117ad: [NFC][LAA] Precommit tests for forked pointers. Nov 24 2021, 8:21 AM

Revised based on @fhahn 's initial suggestions, rebased.

I'll look into the MemAccessInfo approach.

huntergr marked 3 inline comments as done. Nov 26 2021, 2:27 AM

huntergr added inline comments.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
20 ↗	(On Diff #388889)	It was there to include the 'using ForkedPointer =' definition, but moving it out of ScalarEvolution means we can just define it here. I had hoped that we could maybe integrate it further into ScalarEvolution as a new type of SCEV, but maybe this isn't the right time -- you're correct, this will be the only user for now, so we can just keep it here until we find a case where it would be useful to have it more widely available.

Harbormaster completed remote builds in B136174: Diff 389956. Nov 26 2021, 3:03 AM

mgabka added a subscriber: mgabka. Nov 29 2021, 1:52 AM

LGTM!

Hi @huntergr, thanks for making all the changes. I think the patch looks good to go for now. It's still worth looking into the MemAccessInfo approach as a follow-up, but the patch has been sat in review for long enough (3 months) without any fundamental objections so I'd prefer we got something merged now to get the functionality defended and unblock future work. This is currently only enabled under an option, so any refactoring for the MemAccessInfo approach can be done safely under another patch.

fhahn mentioned this in D114487: [LAA] Support runtime checks for select GEP base pointers. Nov 29 2021, 1:58 PM

In D108699#3158014, @david-arm wrote:

LGTM!

Hi @huntergr, thanks for making all the changes. I think the patch looks good to
go for now. It's still worth looking into the MemAccessInfo approach as a follow-up,
but the patch has been sat in review for long enough (3 months) without any
fundamental objections so I'd prefer we got something merged now to get the
functionality defended and unblock future work.

I agree it is not ideal that there hasn't been much feedback so far, but now that there are some additional suggestions, I think it would be good to hear additional opinions on the preferred direction here, or at least discuss concrete potential alternatives; D114487 in particular should have similar effects with less invasive changes, i.e. there's no need to adjust isNoWrap, hasComputableBounds, RuntimePointerChecking::insert or add additional state.

You mention future work blocked by this as a reason for landing this now, however I cannot find any references to patches depending on this work.

This is currently only enabled
under an option, so any refactoring for the MemAccessInfo approach can be
done safely under another patch.

While it is true that it is off by default, it adds substantial complexity to LAA which is already quite complex and the changes are quite spread out. One concern is that it adds an additional way to model 'forked pointers'; we already handle 'forked pointers' via phi nodes (by adding multiple PointerInfos). I am also not sure it should be off by default, once it lands. IMO better analysis should be the right thing to do and should be quite safe from a performance regression perspective. I don't see a strong reason for not enabling this by default and having wide testing surface potential issues early.

Rebased and changed to use @fhahn 's lighter-weight approach from D114487 combined with my recursive function to find the SCEVs. Although the simplified tests are handled with a couple levels of checking, the real applications I was working on had additional operations between the ptr value for the load or store and a select.

It might be best to introduce a limit to recursion though, any thoughts?

Harbormaster completed remote builds in B148191: Diff 406734. Feb 8 2022, 3:05 AM

peterwaller-arm added a subscriber: peterwaller-arm. Feb 15 2022, 5:12 AM

bsmith added a subscriber: bsmith. Feb 28 2022, 3:01 AM

Rebased, added a recursion limit to the SCEV building function.
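The shape of a depth-limited walker can be illustrated on a toy expression tree. The sketch below is not the patch's findForkedSCEVs and does not use ScalarEvolution; the `Expr` type, the `makeBase`/`makeSelect`/`makeGep` helpers, the `<opaque>` placeholder and the string-based result are all assumptions chosen to keep the example self-contained. It looks through GEP-like and select nodes only, gives up on nested forks, and treats anything beyond the depth budget as a single opaque pointer.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy pointer expression: a named base, a select of two pointers, or a
// GEP-like "pointer plus offset".
struct Expr {
  enum Kind { Base, Select, Gep } K;
  std::string Name;           // Base: pointer name; Gep: offset name
  std::shared_ptr<Expr> A, B; // Select: both arms; Gep: A is the pointer
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeBase(std::string Name) {
  return std::make_shared<Expr>(
      Expr{Expr::Base, std::move(Name), nullptr, nullptr});
}
ExprPtr makeSelect(ExprPtr A, ExprPtr B) {
  return std::make_shared<Expr>(
      Expr{Expr::Select, "", std::move(A), std::move(B)});
}
ExprPtr makeGep(ExprPtr Ptr, std::string Offset) {
  return std::make_shared<Expr>(
      Expr{Expr::Gep, std::move(Offset), std::move(Ptr), nullptr});
}

// Returns one entry for an unforked pointer and two for a two-way fork.
std::vector<std::string> findForks(const ExprPtr &E, unsigned Depth) {
  if (E->K == Expr::Base)
    return {E->Name};
  // Out of budget: treat the whole subtree as one opaque pointer.
  if (Depth == 0)
    return {"<opaque>"};
  if (E->K == Expr::Gep) {
    // Apply the same offset to every fork of the pointer operand.
    std::vector<std::string> Out;
    for (const std::string &P : findForks(E->A, Depth - 1))
      Out.push_back("(" + P + " + " + E->Name + ")");
    return Out;
  }
  // Select: only a two-way fork is supported; nested forks stay opaque.
  std::vector<std::string> L = findForks(E->A, Depth - 1);
  std::vector<std::string> R = findForks(E->B, Depth - 1);
  if (L.size() != 1 || R.size() != 1)
    return {"<opaque>"};
  return {L[0], R[0]};
}
```

For `makeGep(makeSelect(makeBase("Base1"), makeBase("Base2")), "i")`, a depth of 2 is enough to reach the select and yields two entries, one per base; with a depth of 1 the select is not looked through and a single opaque entry comes back.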

Herald added a project: Restricted Project. Mar 3 2022, 2:19 AM

Harbormaster completed remote builds in B152318: Diff 412642. Mar 3 2022, 2:41 AM

Rebased. Ping?

Harbormaster completed remote builds in B158428: Diff 421130. Apr 7 2022, 2:46 AM

fhahn mentioned this in rG3c1483609369: [LAA] Add test with simpler load of pointer select. Apr 10 2022, 2:55 PM

Thanks for the update!

I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.

I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
134	It would be good to add a test for the option, e.g. a case that requires 2 or 3 recursions and set test with `max-forked-scev-depth=2/3`
469–473	This scheme would need documenting, i.e. why we can have multiple expressions for a pointer.
474	should be removed?
732	should be removed?
836	missing test for this?
854	It looks like tests for those conditions are missing?
867	I don't think we can use the inbounds info here, unless we prove that the program is undefined if GEP is poison. Consider something like below. If `%c` is always false, the GEP index could be out-of-bounds (and the GEP poison). Adding a runtime check based on the SCEV expression may introduce a branch on poison unconditionally.

define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds, i1 %c) {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %latch ]
  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1.not = icmp eq i32 %0, 0
  %spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
  %.sink = load float, float* %.sink.in, align 4
  %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
  br i1 %c, label %then, label %latch

then:
  store float %.sink, float* %1, align 4
  br label %latch

latch:
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}
874	It might be a bit simpler to just add a duplicate to BaseScevs/OffsetScevs and have one common code path to compute the SCEVs to add
955	Needs resolving. It should be needed, because the above code may have added assumptions, which make Ptr an AddRec, See comment in D114487.

Sure, that sounds like a good plan. I'll try and review your patch this week.

Sounds great, thanks!

fhahn mentioned this in rG5890b3010599: [LAA] Initial support for runtime checks with pointer selects. May 12 2022, 11:34 AM

Rebased on top of Florian's patch from https://reviews.llvm.org/D114487

That patch has been reverted for now so builds will fail, but this should be pretty close to what we want once the base feature is back in.

I've cut the walker function down to only handle GEPs and selects for now; we can add more cases (and more tests) later.

I've added tests to exercise the checks that exclude a given select or GEP from processing, as well as one for the recursion limit.

I'll precommit the new tests before the next update on this patch.

Harbormaster completed remote builds in B170454: Diff 437820. Jun 17 2022, 2:16 AM

In D108699#3591443, @huntergr wrote:

Rebased on top of Florian's patch from https://reviews.llvm.org/D114487

That patch has been reverted for now so builds will fail, but this should be pretty close to what we want once the base feature is back in.

Thanks, the patch has been relanded a while ago.

I've cut the walker function down to only handle GEPs and selects for now; we can add more cases (and more tests) later.

I've added tests to exercise the checks that exclude a given select or GEP from processing, as well as one for the recursion limit.

I'll precommit the new tests before the next update on this patch.

Sounds good! I skimmed the current version and it looks like most comments should be addressed in the latest update. It might be good to mark them as done. One thing I couldn't spot in the latest version is a test for the potential inbounds issue mentioned inline.

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
1–2	It would probably be good to convert the IR to use opaque pointers in a pre-commit, so `-opaque-pointers` is not needed.

Rebased on top of @fhahn 's fixed patch, including changes to detect possible undef/poison.

New tests were precommitted in rGa19cf47da095.

huntergr marked 5 inline comments as done. Jul 14 2022, 1:14 AM

huntergr added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
867	I removed the code to add no-wrap flags to the SCEV based on 'inbounds', then added your test.

Harbormaster completed remote builds in B175325: Diff 444549. Jul 14 2022, 2:04 AM

LGTM, thanks!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
916	nit: unnecessary move?
llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
158	nit: it would probably be good to also have a few tests that access different sizes, including odd ones like `i23` or something like that, to ensure the right size expressions are used.

This revision was landed with ongoing or failed builds. Jul 18 2022, 4:08 AM

Closed by commit rGdb8fcb2c2537: [LAA] Add recursive IR walker for forked pointers (authored by huntergr).

This revision was automatically updated to reflect the committed changes.

huntergr added a commit: rGdb8fcb2c2537: [LAA] Add recursive IR walker for forked pointers.

This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:

/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::SCEV*, bool> > findForkedPointer(llvm::PredicatedScalarEvolution&, const ValueToValueMap&, llvm::Value*, const llvm::Loop*)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: could not convert ‘Scevs’ from ‘SmallVector<[...],2>’ to ‘SmallVector<[...],3>’
  916 |     return Scevs;
      |            ^~~~~
      |            |
      |            SmallVector<[...],2>

Does rG4bd072c56b87 fix this for you?

It does indeed, thanks. FWIW, this is with g++ 11.3.0.

https://github.com/llvm/llvm-project/issues/57368 is an open bug report describing a regression from this patch.

Herald added a subscriber: pcwang-thead. Aug 29 2022, 11:07 AM

Thanks for the report and reproducer, I'll look into it.

Allen mentioned this in D158493: [LAA] Support forked pointer in the form of phi. Aug 22 2023, 1:07 AM

Allen mentioned this in D158965: [LAA] Analyze pointers forked by a phi. Aug 27 2023, 11:32 PM

GitHub <noreply@github.com> mentioned this in rG48caa0723c89: [LAA] Analyze pointers forked by a phi (#65834). Sep 18 2023, 6:16 PM

Revision Contents

Path	Size
llvm/lib/Analysis/LoopAccessAnalysis.cpp	156 lines
llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll	90 lines
llvm/test/Transforms/LoopVectorize/forked-pointers.ll	76 lines

Diff 445449

llvm/lib/Analysis/LoopAccessAnalysis.cpp

Show First 20 Lines • Show All 124 Lines • ▼ Show 20 Lines

/// Enable store-to-load forwarding conflict detection. This option can		/// Enable store-to-load forwarding conflict detection. This option can
/// be disabled for correctness testing.		/// be disabled for correctness testing.
static cl::opt<bool> EnableForwardingConflictDetection(		static cl::opt<bool> EnableForwardingConflictDetection(
"store-to-load-forwarding-conflict-detection", cl::Hidden,		"store-to-load-forwarding-conflict-detection", cl::Hidden,
cl::desc("Enable conflict detection in loop-access analysis"),		cl::desc("Enable conflict detection in loop-access analysis"),
cl::init(true));		cl::init(true));

		static cl::opt<unsigned> MaxForkedSCEVDepth(
		"max-forked-scev-depth", cl::Hidden,
		fhahnUnsubmitted Not Done Reply Inline Actions It would be good to add a test for the option, e.g. a case that requires 2 or 3 recursions and set test with `max-forked-scev-depth=2/3` fhahn: It would be good to add a test for the option, e.g. a case that requires 2 or 3 recursions and…
		cl::desc("Maximum recursion depth when finding forked SCEVs (default = 5)"),
		cl::init(5));

bool VectorizerParams::isInterleaveForced() {		bool VectorizerParams::isInterleaveForced() {
return ::VectorizationInterleave.getNumOccurrences() > 0;		return ::VectorizationInterleave.getNumOccurrences() > 0;
}		}

Value llvm::stripIntegerCast(Value V) {		Value llvm::stripIntegerCast(Value V) {
if (auto *CI = dyn_cast<CastInst>(V))		if (auto *CI = dyn_cast<CastInst>(V))
if (CI->getOperand(0)->getType()->isIntegerTy())		if (CI->getOperand(0)->getType()->isIntegerTy())
return CI->getOperand(0);		return CI->getOperand(0);
Show All 29 Lines

RuntimeCheckingPtrGroup::RuntimeCheckingPtrGroup(		RuntimeCheckingPtrGroup::RuntimeCheckingPtrGroup(
unsigned Index, RuntimePointerChecking &RtCheck)		unsigned Index, RuntimePointerChecking &RtCheck)
: High(RtCheck.Pointers[Index].End), Low(RtCheck.Pointers[Index].Start),		: High(RtCheck.Pointers[Index].End), Low(RtCheck.Pointers[Index].Start),
AddressSpace(RtCheck.Pointers[Index]		AddressSpace(RtCheck.Pointers[Index]
.PointerValue->getType()		.PointerValue->getType()
->getPointerAddressSpace()),		->getPointerAddressSpace()),
NeedsFreeze(RtCheck.Pointers[Index].NeedsFreeze) {		NeedsFreeze(RtCheck.Pointers[Index].NeedsFreeze) {
Members.push_back(Index);		Members.push_back(Index);
		david-armUnsubmitted Done Reply Inline Actions nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway? david-arm: nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway?
}		}

/// Calculate Start and End points of memory access.		/// Calculate Start and End points of memory access.
/// Let's assume A is the first access and B is a memory access on N-th loop		/// Let's assume A is the first access and B is a memory access on N-th loop
/// iteration. Then B is calculated as:		/// iteration. Then B is calculated as:
/// B = A + Step*N .		/// B = A + Step*N .
/// Step value may be positive or negative.		/// Step value may be positive or negative.
/// N is a calculated back-edge taken count:		/// N is a calculated back-edge taken count:
▲ Show 20 Lines • Show All 179 Lines • ▼ Show 20 Lines	static const SCEV getMinFromExprs(const SCEV I, const SCEV *J,
if (C->getValue()->isNegative())		if (C->getValue()->isNegative())
return J;		return J;
return I;		return I;
}		}

bool RuntimeCheckingPtrGroup::addPointer(unsigned Index,		bool RuntimeCheckingPtrGroup::addPointer(unsigned Index,
RuntimePointerChecking &RtCheck) {		RuntimePointerChecking &RtCheck) {
return addPointer(		return addPointer(
Index, RtCheck.Pointers[Index].Start, RtCheck.Pointers[Index].End,		Index, RtCheck.Pointers[Index].Start, RtCheck.Pointers[Index].End,
		david-arm: nit: Perhaps you can just write `assert(Fork <= 1 && ...)` since it's unsigned anyway?
RtCheck.Pointers[Index].PointerValue->getType()->getPointerAddressSpace(),		RtCheck.Pointers[Index].PointerValue->getType()->getPointerAddressSpace(),
RtCheck.Pointers[Index].NeedsFreeze, *RtCheck.SE);		RtCheck.Pointers[Index].NeedsFreeze, *RtCheck.SE);
}		}

bool RuntimeCheckingPtrGroup::addPointer(unsigned Index, const SCEV *Start,		bool RuntimeCheckingPtrGroup::addPointer(unsigned Index, const SCEV *Start,
const SCEV *End, unsigned AS,		const SCEV *End, unsigned AS,
bool NeedsFreeze,		bool NeedsFreeze,
ScalarEvolution &SE) {		ScalarEvolution &SE) {
[68 lines not shown]	void RuntimePointerChecking::groupChecks(

// If we don't have the dependency partitions, construct a new		// If we don't have the dependency partitions, construct a new
// checking pointer group for each pointer. This is also required		// checking pointer group for each pointer. This is also required
// for correctness, because in this case we can have checking between		// for correctness, because in this case we can have checking between
// pointers to the same underlying object.		// pointers to the same underlying object.
if (!UseDependencies) {		if (!UseDependencies) {
for (unsigned I = 0; I < Pointers.size(); ++I)		for (unsigned I = 0; I < Pointers.size(); ++I)
CheckingGroups.push_back(RuntimeCheckingPtrGroup(I, *this));		CheckingGroups.push_back(RuntimeCheckingPtrGroup(I, *this));
return;		return;
		david-arm: nit: Is it worth calling `CheckingGroups.emplace_back(I, *this, /*Fork=0*/0)` for clarity here?
}		}

unsigned TotalComparisons = 0;		unsigned TotalComparisons = 0;

DenseMap<Value *, SmallVector<unsigned>> PositionMap;		DenseMap<Value *, SmallVector<unsigned>> PositionMap;
for (unsigned Index = 0; Index < Pointers.size(); ++Index) {		for (unsigned Index = 0; Index < Pointers.size(); ++Index) {
auto Iter = PositionMap.insert({Pointers[Index].PointerValue, {}});		auto Iter = PositionMap.insert({Pointers[Index].PointerValue, {}});
Iter.first->second.push_back(Index);		Iter.first->second.push_back(Index);
}		}
		fhahn: This scheme would need documenting, i.e. why we can have multiple expressions for a pointer.

		fhahn: should be removed?
// We need to keep track of what pointers we've already seen so we		// We need to keep track of what pointers we've already seen so we
// don't process them twice.		// don't process them twice.
SmallSet<unsigned, 2> Seen;		SmallSet<unsigned, 2> Seen;

// Go through all equivalence classes, get the "pointer check groups"		// Go through all equivalence classes, get the "pointer check groups"
// and add them to the overall solution. We use the order in which accesses		// and add them to the overall solution. We use the order in which accesses
// appear in 'Pointers' to enforce determinism.		// appear in 'Pointers' to enforce determinism.
for (unsigned I = 0; I < Pointers.size(); ++I) {		for (unsigned I = 0; I < Pointers.size(); ++I) {
[14 lines not shown]	for (unsigned I = 0; I < Pointers.size(); ++I) {
// the order in which unions and insertions are performed on the		// the order in which unions and insertions are performed on the
// equivalence class, the iteration order is deterministic.		// equivalence class, the iteration order is deterministic.
for (auto MI = DepCands.member_begin(LeaderI), ME = DepCands.member_end();		for (auto MI = DepCands.member_begin(LeaderI), ME = DepCands.member_end();
MI != ME; ++MI) {		MI != ME; ++MI) {
auto PointerI = PositionMap.find(MI->getPointer());		auto PointerI = PositionMap.find(MI->getPointer());
assert(PointerI != PositionMap.end() &&		assert(PointerI != PositionMap.end() &&
"pointer in equivalence class not found in PositionMap");		"pointer in equivalence class not found in PositionMap");
for (unsigned Pointer : PointerI->second) {		for (unsigned Pointer : PointerI->second) {
bool Merged = false;		bool Merged = false;
		david-arm: Have you rewritten the logic here as a performance improvement, i.e. to avoid calling `Group.addPointer()` after it's already been merged?
		huntergr (author): Not intentionally, no -- I just replicated the behaviour of the original (break out of the loop if the pointer merged into a group) but considered both potential forks.
// Mark this pointer as seen.		// Mark this pointer as seen.
Seen.insert(Pointer);		Seen.insert(Pointer);

// Go through all the existing sets and see if we can find one		// Go through all the existing sets and see if we can find one
// which can include this pointer.		// which can include this pointer.
for (RuntimeCheckingPtrGroup &Group : Groups) {		for (RuntimeCheckingPtrGroup &Group : Groups) {
// Don't perform more than a certain amount of comparisons.		// Don't perform more than a certain amount of comparisons.
// This should limit the cost of grouping the pointers to something		// This should limit the cost of grouping the pointers to something
[210 lines not shown]
};		};

} // end anonymous namespace		} // end anonymous namespace

/// Check whether a pointer can participate in a runtime bounds check.		/// Check whether a pointer can participate in a runtime bounds check.
/// If \p Assume, try harder to prove that we can compute the bounds of \p Ptr		/// If \p Assume, try harder to prove that we can compute the bounds of \p Ptr
/// by adding run-time checks (overflow checks) if necessary.		/// by adding run-time checks (overflow checks) if necessary.
static bool hasComputableBounds(PredicatedScalarEvolution &PSE, Value *Ptr,		static bool hasComputableBounds(PredicatedScalarEvolution &PSE, Value *Ptr,
const SCEV *PtrScev, Loop *L, bool Assume) {		const SCEV *PtrScev, Loop *L, bool Assume) {
		fhahn: should be removed?
// The bounds for loop-invariant pointer is trivial.		// The bounds for loop-invariant pointer is trivial.
if (PSE.getSE()->isLoopInvariant(PtrScev, L))		if (PSE.getSE()->isLoopInvariant(PtrScev, L))
return true;		return true;

const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);		const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);

if (!AR && Assume)		if (!AR && Assume)
AR = PSE.getAsAddRec(Ptr);		AR = PSE.getAsAddRec(Ptr);

if (!AR)		if (!AR)
return false;		return false;
		david-arm: nit: This is just a suggestion, but you could restructure this a little to remove the extra indentation I think here:
		    if (!AR) {
		      if (!RtCheck.AllowForkedPtrs)
		        return false;
		      ...
		Not sure if this is better?
		huntergr (author): We still needed to bail out if AR was false and forked pointers weren't allowed... so I rewrote it to check for the positive case for AR first and then proceed with the forked pointers check and default false afterwards.

return AR->isAffine();		return AR->isAffine();
}		}

/// Check whether a pointer address cannot wrap.		/// Check whether a pointer address cannot wrap.
static bool isNoWrap(PredicatedScalarEvolution &PSE,		static bool isNoWrap(PredicatedScalarEvolution &PSE,
const ValueToValueMap &Strides, Value *Ptr, Type *AccessTy,		const ValueToValueMap &Strides, Value *Ptr, Type *AccessTy,
Loop *L) {		Loop *L) {
const SCEV *PtrScev = PSE.getSCEV(Ptr);		const SCEV *PtrScev = PSE.getSCEV(Ptr);
if (PSE.getSE()->isLoopInvariant(PtrScev, L))		if (PSE.getSE()->isLoopInvariant(PtrScev, L))
return true;		return true;

		david-arm: nit: This is just a minor comment, but you could remove indentation further here by bailing out early, i.e.
		    if (!FPtr)
		      return false;
		    const SCEV *A =
int64_t Stride = getPtrStride(PSE, AccessTy, Ptr, L, Strides);		int64_t Stride = getPtrStride(PSE, AccessTy, Ptr, L, Strides);
if (Stride == 1 || PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW))		if (Stride == 1 || PSE.hasNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW))
return true;		return true;

		david-arm: Is it better to just make a recursive call to hasComputableBounds instead of calling `isAffine` here? We're potentially missing out on future improvements to hasComputableBounds here, and also we're ignoring the possibility of a forked pointer being loop invariant I think.
		huntergr (author): We can't do that right now -- hasComputableBounds takes a Value* rather than a SCEV* so it can (potentially) be added to the stride map in replaceSymbolicStrideSCEV. We'd just end up looking at the same Value and splitting again.
		This is something the SCEVForkedExpr would make cleaner, since we would only need to evaluate a single SCEV. But it'll take a bit of refactoring to do that, which is why I wanted some feedback on the whole idea first.
		We could also separate out parts of these functions to make it recursive, but I'll need to be careful since replaceSymbolicStrideSCEV has other users.
		david-arm: nit: Instead of writing:
		    LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *(FPtr->first) << "\n");
		I think you can just write
		    LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *A << "\n");
		and same for the second one.
return false;		return false;
}		}
		david-arm: Is there any danger in setting this before we return true?

static void visitPointers(Value *StartPtr, const Loop &InnermostLoop,		static void visitPointers(Value *StartPtr, const Loop &InnermostLoop,
function_ref<void(Value *)> AddPointer) {		function_ref<void(Value *)> AddPointer) {
SmallPtrSet<Value *, 8> Visited;		SmallPtrSet<Value *, 8> Visited;
SmallVector<Value *> WorkList;		SmallVector<Value *> WorkList;
WorkList.push_back(StartPtr);		WorkList.push_back(StartPtr);

while (!WorkList.empty()) {		while (!WorkList.empty()) {
Value *Ptr = WorkList.pop_back_val();		Value *Ptr = WorkList.pop_back_val();
if (!Visited.insert(Ptr).second)		if (!Visited.insert(Ptr).second)
continue;		continue;
auto *PN = dyn_cast<PHINode>(Ptr);		auto *PN = dyn_cast<PHINode>(Ptr);
// SCEV does not look through non-header PHIs inside the loop. Such phis		// SCEV does not look through non-header PHIs inside the loop. Such phis
// can be analyzed by adding separate accesses for each incoming pointer		// can be analyzed by adding separate accesses for each incoming pointer
// value.		// value.
if (PN && InnermostLoop.contains(PN->getParent()) &&		if (PN && InnermostLoop.contains(PN->getParent()) &&
PN->getParent() != InnermostLoop.getHeader()) {		PN->getParent() != InnermostLoop.getHeader()) {
for (const Use &Inc : PN->incoming_values())		for (const Use &Inc : PN->incoming_values())
WorkList.push_back(Inc);		WorkList.push_back(Inc);
		david-arm: You're introducing a new implicit TypeSize -> uint64_t cast here. Could you rewrite this as:
		    int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()).getFixedSize();
} else		} else
AddPointer(Ptr);		AddPointer(Ptr);
}		}
}		}

		// Walk back through the IR for a pointer, looking for a select like the
		// following:
		//
		// %offset = select i1 %cmp, i64 %a, i64 %b
		// %addr = getelementptr double, double* %base, i64 %offset
		// %ld = load double, double* %addr, align 8
		//
		// We won't be able to form a single SCEVAddRecExpr from this since the
		// address for each loop iteration depends on %cmp. We could potentially
		// produce multiple valid SCEVAddRecExprs, though, and check all of them for
		// memory safety/aliasing if needed.
		//
		// If we encounter some IR we don't yet handle, or something obviously fine
		// like a constant, then we just add the SCEV for that term to the list passed
		// in by the caller. If we have a node that may potentially yield a valid
		// SCEVAddRecExpr then we decompose it into parts and build the SCEV terms
		// ourselves before adding to the list.
		static void
		findForkedSCEVs(ScalarEvolution *SE, const Loop *L, Value *Ptr,
		SmallVectorImpl<std::pair<const SCEV *, bool>> &ScevList,
		unsigned Depth) {
		// If our Value is a SCEVAddRecExpr, loop invariant, not an instruction, or
		// we've exceeded our limit on recursion, just return whatever we have
		// regardless of whether it can be used for a forked pointer or not, along
		// with an indication of whether it might be a poison or undef value.
		const SCEV *Scev = SE->getSCEV(Ptr);
		if (isa<SCEVAddRecExpr>(Scev) || L->isLoopInvariant(Ptr) ||
		!isa<Instruction>(Ptr) || Depth == 0) {
		ScevList.push_back(
		std::make_pair(Scev, !isGuaranteedNotToBeUndefOrPoison(Ptr)));
		return;
		}

		Depth--;

		auto UndefPoisonCheck = [](std::pair<const SCEV *, bool> S) -> bool {
		return S.second;
		};

		Instruction *I = cast<Instruction>(Ptr);
		unsigned Opcode = I->getOpcode();
		switch (Opcode) {
		case Instruction::GetElementPtr: {
		GetElementPtrInst *GEP = cast<GetElementPtrInst>(I);
		Type *SourceTy = GEP->getSourceElementType();
		// We only handle base + single offset GEPs here for now.
		// Not dealing with preexisting gathers yet, so no vectors.
		if (I->getNumOperands() != 2 || SourceTy->isVectorTy()) {
		ScevList.push_back(
		std::make_pair(Scev, !isGuaranteedNotToBeUndefOrPoison(GEP)));
		break;
		fhahn: missing test for this?
		}
		SmallVector<std::pair<const SCEV *, bool>, 2> BaseScevs;
		SmallVector<std::pair<const SCEV *, bool>, 2> OffsetScevs;
		findForkedSCEVs(SE, L, I->getOperand(0), BaseScevs, Depth);
		findForkedSCEVs(SE, L, I->getOperand(1), OffsetScevs, Depth);

		// See if we need to freeze our fork...
		bool NeedsFreeze = any_of(BaseScevs, UndefPoisonCheck) ||
		any_of(OffsetScevs, UndefPoisonCheck);

		// Check that we only have a single fork, on either the base or the offset.
		// Copy the SCEV across for the one without a fork in order to generate
		// the full SCEV for both sides of the GEP.
		if (OffsetScevs.size() == 2 && BaseScevs.size() == 1)
		BaseScevs.push_back(BaseScevs[0]);
		else if (BaseScevs.size() == 2 && OffsetScevs.size() == 1)
		OffsetScevs.push_back(OffsetScevs[0]);
		else {
		fhahn: It looks like tests for those conditions are missing?
		ScevList.push_back(std::make_pair(Scev, NeedsFreeze));
		break;
		}

		// Find the pointer type we need to extend to.
		Type *IntPtrTy = SE->getEffectiveSCEVType(
		SE->getSCEV(GEP->getPointerOperand())->getType());

		// Find the size of the type being pointed to. We only have a single
		// index term (guarded above) so we don't need to index into arrays or
		// structures, just get the size of the scalar value.
		const SCEV *Size = SE->getSizeOfExpr(IntPtrTy, SourceTy);

		fhahn: I don't think we can use the inbounds info here, unless we prove that the program is undefined if GEP is poison. Consider something like below. If `%c` is always false, the GEP index could be out-of-bounds (and the GEP poison). Adding a runtime check based on the SCEV expression may introduce a branch on poison unconditionally.
		    define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds, i1 %c) {
		    entry:
		      br label %for.body

		    for.cond.cleanup:
		      ret void

		    for.body:
		      %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %latch ]
		      %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
		      %0 = load i32, i32* %arrayidx, align 4
		      %cmp1.not = icmp eq i32 %0, 0
		      %spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
		      %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
		      %.sink = load float, float* %.sink.in, align 4
		      %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
		      br i1 %c, label %then, label %latch

		    then:
		      store float %.sink, float* %1, align 4
		      br label %latch

		    latch:
		      %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
		      %exitcond.not = icmp eq i64 %indvars.iv.next, 100
		      br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
		    }
		huntergr (author): I removed the code to add no-wrap flags to the SCEV based on 'inbounds', then added your test.
		// Scale up the offsets by the size of the type, then add to the bases.
		const SCEV *Scaled1 = SE->getMulExpr(
		Size, SE->getTruncateOrSignExtend(OffsetScevs[0].first, IntPtrTy));
		const SCEV *Scaled2 = SE->getMulExpr(
		Size, SE->getTruncateOrSignExtend(OffsetScevs[1].first, IntPtrTy));
		ScevList.push_back(std::make_pair(
		SE->getAddExpr(BaseScevs[0].first, Scaled1), NeedsFreeze));
		fhahn: It might be a bit simpler to just add a duplicate to BaseScevs/OffsetScevs and have one common code path to compute the SCEVs to add
		ScevList.push_back(std::make_pair(
		SE->getAddExpr(BaseScevs[1].first, Scaled2), NeedsFreeze));
		break;
		}
		case Instruction::Select: {
		SmallVector<std::pair<const SCEV *, bool>, 2> ChildScevs;
		// A select means we've found a forked pointer, but we currently only
		// support a single select per pointer so if there's another behind this
		// then we just bail out and return the generic SCEV.
		findForkedSCEVs(SE, L, I->getOperand(1), ChildScevs, Depth);
		findForkedSCEVs(SE, L, I->getOperand(2), ChildScevs, Depth);
		if (ChildScevs.size() == 2) {
		ScevList.push_back(ChildScevs[0]);
		ScevList.push_back(ChildScevs[1]);
		} else
		ScevList.push_back(
		std::make_pair(Scev, !isGuaranteedNotToBeUndefOrPoison(Ptr)));
		break;
		}
		default:
		// Just return the current SCEV if we haven't handled the instruction yet.
		LLVM_DEBUG(dbgs() << "ForkedPtr unhandled instruction: " << *I << "\n");
		ScevList.push_back(
		std::make_pair(Scev, !isGuaranteedNotToBeUndefOrPoison(Ptr)));
		break;
		}

		return;
		}
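
The walk in findForkedSCEVs can be modelled outside LLVM as a small standalone sketch. Everything below is hypothetical illustration rather than code from the patch: `Node`, `leaf`, `select`, `gep`, `findForks` and `forkedExprs` are invented names, and address expressions are rendered as strings where the patch builds SCEVs. It follows the same rules as the function above: stop at leaves or when the depth budget runs out, treat a select as a two-way fork, and for a single-index GEP duplicate the unforked side so that each fork gets a full `base + size*offset` expression. Poison/undef tracking (the bool in each pair) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy IR; these are not LLVM classes.
struct Node {
  enum Kind { Leaf, Select, Gep } kind;
  std::string name;  // Leaf: symbolic name
  int64_t elemSize;  // Gep: size in bytes of the indexed element type
  const Node *lhs;   // Select: true value; Gep: base pointer
  const Node *rhs;   // Select: false value; Gep: index
};

Node leaf(const std::string &name) {
  return Node{Node::Leaf, name, 0, nullptr, nullptr};
}
Node select(const Node *t, const Node *f) {
  return Node{Node::Select, "", 0, t, f};
}
Node gep(const Node *base, const Node *idx, int64_t size) {
  return Node{Node::Gep, "", size, base, idx};
}

using Forks = std::vector<std::string>;

// Opaque rendering, standing in for the generic (unforked) expression.
static std::string render(const Node &n) {
  switch (n.kind) {
  case Node::Leaf:
    return n.name;
  case Node::Select:
    return "select(" + render(*n.lhs) + ", " + render(*n.rhs) + ")";
  case Node::Gep:
    return "gep(" + render(*n.lhs) + ", " + render(*n.rhs) + ")";
  }
  return "?";
}

static void findForks(const Node &n, unsigned depth, Forks &out) {
  // Leaves and exhausted depth yield the generic expression.
  if (n.kind == Node::Leaf || depth == 0) {
    out.push_back(render(n));
    return;
  }
  --depth;

  if (n.kind == Node::Select) {
    // Only a single select per pointer is supported: both arms must
    // resolve to exactly one expression each.
    Forks children;
    findForks(*n.lhs, depth, children);
    findForks(*n.rhs, depth, children);
    if (children.size() == 2)
      out.insert(out.end(), children.begin(), children.end());
    else
      out.push_back(render(n));
    return;
  }

  // Gep: accept a fork on either the base or the offset, not both.
  Forks bases, offsets;
  findForks(*n.lhs, depth, bases);
  findForks(*n.rhs, depth, offsets);
  if (offsets.size() == 2 && bases.size() == 1)
    bases.push_back(bases[0]);
  else if (bases.size() == 2 && offsets.size() == 1)
    offsets.push_back(offsets[0]);
  else {
    out.push_back(render(n));
    return;
  }
  for (std::size_t i = 0; i < 2; ++i)
    out.push_back(bases[i] + " + " + std::to_string(n.elemSize) + "*" +
                  offsets[i]);
}

// Entry point: returns one generic expression or two forked ones.
Forks forkedExprs(const Node &root, unsigned maxDepth) {
  Forks out;
  findForks(root, maxDepth, out);
  return out;
}
```

For the IR in the summary (a GEP over `select(%Base2, %Base1)` indexed by the induction variable), this model produces two expressions, one per base. With a depth budget too small to reach the select's operands it falls back to the single generic expression, which is the behaviour the RECURSE test prefix exercises via `-max-forked-scev-depth`.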

		static SmallVector<std::pair<const SCEV *, bool>>
		findForkedPointer(PredicatedScalarEvolution &PSE,
		const ValueToValueMap &StridesMap, Value *Ptr,
		const Loop *L) {
		ScalarEvolution *SE = PSE.getSE();
		assert(SE->isSCEVable(Ptr->getType()) && "Value is not SCEVable!");
		SmallVector<std::pair<const SCEV *, bool>, 2> Scevs;
		findForkedSCEVs(SE, L, Ptr, Scevs, MaxForkedSCEVDepth);

		// For now, we will only accept a forked pointer with two possible SCEVs.
		if (Scevs.size() == 2)
		return Scevs;
		fhahn: nit: unnecessary move?

		return {
		std::make_pair(replaceSymbolicStrideSCEV(PSE, StridesMap, Ptr), false)};
		}

bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,		bool AccessAnalysis::createCheckForAccess(RuntimePointerChecking &RtCheck,
MemAccessInfo Access, Type *AccessTy,		MemAccessInfo Access, Type *AccessTy,
const ValueToValueMap &StridesMap,		const ValueToValueMap &StridesMap,
DenseMap<Value *, unsigned> &DepSetId,		DenseMap<Value *, unsigned> &DepSetId,
Loop *TheLoop, unsigned &RunningDepId,		Loop *TheLoop, unsigned &RunningDepId,
unsigned ASId, bool ShouldCheckWrap,		unsigned ASId, bool ShouldCheckWrap,
bool Assume) {		bool Assume) {
Value *Ptr = Access.getPointer();		Value *Ptr = Access.getPointer();

ScalarEvolution &SE = *PSE.getSE();		SmallVector<std::pair<const SCEV *, bool>> TranslatedPtrs =
SmallVector<std::pair<const SCEV *, bool>> TranslatedPtrs;		findForkedPointer(PSE, StridesMap, Ptr, TheLoop);
auto *SI = dyn_cast<SelectInst>(Ptr);
// Look through selects in the current loop.
if (SI && !TheLoop->isLoopInvariant(SI)) {
TranslatedPtrs = {
std::make_pair(SE.getSCEV(SI->getOperand(1)),
!isGuaranteedNotToBeUndefOrPoison(SI->getOperand(1))),
std::make_pair(SE.getSCEV(SI->getOperand(2)),
!isGuaranteedNotToBeUndefOrPoison(SI->getOperand(2)))};
} else
TranslatedPtrs = {
std::make_pair(replaceSymbolicStrideSCEV(PSE, StridesMap, Ptr), false)};

for (auto &P : TranslatedPtrs) {		for (auto &P : TranslatedPtrs) {
const SCEV *PtrExpr = P.first;		const SCEV *PtrExpr = P.first;
if (!hasComputableBounds(PSE, Ptr, PtrExpr, TheLoop, Assume))		if (!hasComputableBounds(PSE, Ptr, PtrExpr, TheLoop, Assume))
return false;		return false;

// When we run after a failing dependency check we have to make sure		// When we run after a failing dependency check we have to make sure
// we don't have wrapping pointers.		// we don't have wrapping pointers.
if (ShouldCheckWrap) {		if (ShouldCheckWrap) {
// Skip wrap checking when translating pointers.		// Skip wrap checking when translating pointers.
if (TranslatedPtrs.size() > 1)		if (TranslatedPtrs.size() > 1)
return false;		return false;

if (!isNoWrap(PSE, StridesMap, Ptr, AccessTy, TheLoop)) {		if (!isNoWrap(PSE, StridesMap, Ptr, AccessTy, TheLoop)) {
auto *Expr = PSE.getSCEV(Ptr);		auto *Expr = PSE.getSCEV(Ptr);
if (!Assume || !isa<SCEVAddRecExpr>(Expr))		if (!Assume || !isa<SCEVAddRecExpr>(Expr))
return false;		return false;
PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);		PSE.setNoOverflow(Ptr, SCEVWrapPredicate::IncrementNUSW);
}		}
}		}
// If there's only one option for Ptr, look it up after bounds and wrap		// If there's only one option for Ptr, look it up after bounds and wrap
// checking, because assumptions might have been added to PSE.		// checking, because assumptions might have been added to PSE.
if (TranslatedPtrs.size() == 1)		if (TranslatedPtrs.size() == 1)
		fhahn: Needs resolving. It should be needed, because the above code may have added assumptions which make Ptr an AddRec. See comment in D114487.
TranslatedPtrs[0] = std::make_pair(		TranslatedPtrs[0] = std::make_pair(
replaceSymbolicStrideSCEV(PSE, StridesMap, Ptr), false);		replaceSymbolicStrideSCEV(PSE, StridesMap, Ptr), false);
}		}

for (auto &P : TranslatedPtrs) {		for (auto &P : TranslatedPtrs) {
const SCEV *PtrExpr = P.first;		const SCEV *PtrExpr = P.first;

// The id of the dependence set.		// The id of the dependence set.
[1,723 lines not shown]

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll

; RUN: opt -disable-output -passes='print-access-info' %s 2>&1 | FileCheck %s		; RUN: opt -disable-output -passes='print-access-info' %s 2>&1 | FileCheck %s
		; RUN: opt -disable-output -passes='print-access-info' -max-forked-scev-depth=2 %s 2>&1 | FileCheck -check-prefix=RECURSE %s
		fhahn: It would probably be good to convert the IR to use opaque pointers in a pre-commit, so `-opaque-pointers` is not needed.

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"		target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

; CHECK-LABEL: function 'forked_ptrs_simple':		; CHECK-LABEL: function 'forked_ptrs_simple':
; CHECK-NEXT: loop:		; CHECK-NEXT: loop:
; CHECK-NEXT: Memory dependences are safe with run-time checks		; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:		; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:		; CHECK-NEXT: Run-time memory checks:
[44 lines not shown]	loop:
%exitcond.not = icmp eq i64 %iv.next, 100		%exitcond.not = icmp eq i64 %iv.next, 100
br i1 %exitcond.not, label %exit, label %loop		br i1 %exitcond.not, label %exit, label %loop

exit:		exit:
ret void		ret void
}		}

; CHECK-LABEL: function 'forked_ptrs_different_base_same_offset':		; CHECK-LABEL: function 'forked_ptrs_different_base_same_offset':
; CHECK-NEXT: for.body:		; CHECK-NEXT: for.body:
; CHECK-NEXT: Report: cannot identify array bounds		; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:		; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:		; CHECK-NEXT: Run-time memory checks:
		; CHECK-NEXT: Check 0:
		; CHECK-NEXT: Comparing group ([[G1:.+]]):
		; CHECK-NEXT: %1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
		; CHECK-NEXT: Against group ([[G2:.+]]):
		; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
		; CHECK-NEXT: Check 1:
		; CHECK-NEXT: Comparing group ([[G1]]):
		; CHECK-NEXT: %1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
		; CHECK-NEXT: Against group ([[G3:.+]]):
		; CHECK-NEXT: %.sink.in = getelementptr inbounds float, ptr %spec.select, i64 %indvars.iv
		; CHECK-NEXT: Check 2:
		; CHECK-NEXT: Comparing group ([[G1]]):
		; CHECK-NEXT: %1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
		; CHECK-NEXT: Against group ([[G4:.+]]):
		; CHECK-NEXT: %.sink.in = getelementptr inbounds float, ptr %spec.select, i64 %indvars.iv
; CHECK-NEXT: Grouped accesses:		; CHECK-NEXT: Grouped accesses:
		; CHECK-NEXT: Group [[G1]]:
		; CHECK-NEXT: (Low: %Dest High: (400 + %Dest))
		; CHECK-NEXT: Member: {%Dest,+,4}<nuw><%for.body>
		; CHECK-NEXT: Group [[G2]]:
		; CHECK-NEXT: (Low: %Preds High: (400 + %Preds))
		; CHECK-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
		; CHECK-NEXT: Group [[G3]]:
		; CHECK-NEXT: (Low: %Base2 High: (400 + %Base2))
		; CHECK-NEXT: Member: {%Base2,+,4}<nw><%for.body>
		; CHECK-NEXT: Group [[G4]]:
		; CHECK-NEXT: (Low: %Base1 High: (400 + %Base1))
		; CHECK-NEXT: Member: {%Base1,+,4}<nw><%for.body>
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.		; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:		; CHECK-NEXT: SCEV assumptions:
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: Expressions re-written:		; CHECK-NEXT: Expressions re-written:

		;; We have a limit on the recursion depth for finding a loop invariant or
		;; addrec term; confirm we won't exceed that depth by forcing a lower
		;; limit via -max-forked-scev-depth=2
		; RECURSE-LABEL: Loop access info in function 'forked_ptrs_same_base_different_offset':
		; RECURSE-NEXT: for.body:
		; RECURSE-NEXT: Report: cannot identify array bounds
		; RECURSE-NEXT: Dependences:
		; RECURSE-NEXT: Run-time memory checks:
		; RECURSE-NEXT: Grouped accesses:
		; RECURSE-EMPTY:
		; RECURSE-NEXT: Non vectorizable stores to invariant address were not found in loop.
		; RECURSE-NEXT: SCEV assumptions:
		; RECURSE-EMPTY:
		; RECURSE-NEXT: Expressions re-written:

;;;; Derived from the following C code		;;;; Derived from the following C code
;; void forked_ptrs_different_base_same_offset(float *A, float *B, float *C, int *D) {		;; void forked_ptrs_different_base_same_offset(float *A, float *B, float *C, int *D) {
;; for (int i=0; i<100; i++) {		;; for (int i=0; i<100; i++) {
;; if (D[i] != 0) {		;; if (D[i] != 0) {
;; C[i] = A[i];		;; C[i] = A[i];
;; } else {		;; } else {
;; C[i] = B[i];		;; C[i] = B[i];
;; }		;; }
[18 lines not shown]	for.body:
%1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv		%1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
store float %.sink, ptr %1, align 4		store float %.sink, ptr %1, align 4
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond.not = icmp eq i64 %indvars.iv.next, 100		%exitcond.not = icmp eq i64 %indvars.iv.next, 100
br i1 %exitcond.not, label %for.cond.cleanup, label %for.body		br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}		}

; CHECK-LABEL: function 'forked_ptrs_different_base_same_offset_possible_poison':		; CHECK-LABEL: function 'forked_ptrs_different_base_same_offset_possible_poison':
; CHECK-NEXT: for.body:		; CHECK-NEXT: for.body:
; CHECK-NEXT: Report: cannot identify array bounds		; CHECK-NEXT: Memory dependences are safe with run-time checks
; CHECK-NEXT: Dependences:		; CHECK-NEXT: Dependences:
; CHECK-NEXT: Run-time memory checks:		; CHECK-NEXT: Run-time memory checks:
		; CHECK-NEXT: Check 0:
		; CHECK-NEXT: Comparing group ([[G1:.+]]):
		; CHECK-NEXT: %1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
		; CHECK-NEXT: Against group ([[G2:.+]]):
		; CHECK-NEXT: %arrayidx = getelementptr inbounds i32, ptr %Preds, i64 %indvars.iv
		fhahn: nit: it would probably be good to also have a few tests that access different sizes, including odd ones like `i23` or something like that, to ensure the right size expressions are used.
		; CHECK-NEXT: Check 1:
		; CHECK-NEXT: Comparing group ([[G1]]):
		; CHECK-NEXT: %1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
		; CHECK-NEXT: Against group ([[G3:.+]]):
		; CHECK-NEXT: %.sink.in = getelementptr inbounds float, ptr %spec.select, i64 %indvars.iv
		; CHECK-NEXT: Check 2:
		; CHECK-NEXT: Comparing group ([[G1]]):
		; CHECK-NEXT: %1 = getelementptr inbounds float, ptr %Dest, i64 %indvars.iv
		; CHECK-NEXT: Against group ([[G4:.+]]):
		; CHECK-NEXT: %.sink.in = getelementptr inbounds float, ptr %spec.select, i64 %indvars.iv
; CHECK-NEXT: Grouped accesses:		; CHECK-NEXT: Grouped accesses:
		; CHECK-NEXT: Group [[G1]]:
		; CHECK-NEXT: (Low: %Dest High: (400 + %Dest))
		; CHECK-NEXT: Member: {%Dest,+,4}<nw><%for.body>
		; CHECK-NEXT: Group [[G2]]:
		; CHECK-NEXT: (Low: %Preds High: (400 + %Preds))
		; CHECK-NEXT: Member: {%Preds,+,4}<nuw><%for.body>
		; CHECK-NEXT: Group [[G3]]:
		; CHECK-NEXT: (Low: %Base2 High: (400 + %Base2))
		; CHECK-NEXT: Member: {%Base2,+,4}<nw><%for.body>
		; CHECK-NEXT: Group [[G4]]:
		; CHECK-NEXT: (Low: %Base1 High: (400 + %Base1))
		; CHECK-NEXT: Member: {%Base1,+,4}<nw><%for.body>
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.		; CHECK-NEXT: Non vectorizable stores to invariant address were not found in loop.
; CHECK-NEXT: SCEV assumptions:		; CHECK-NEXT: SCEV assumptions:
; CHECK-EMPTY:		; CHECK-EMPTY:
; CHECK-NEXT: Expressions re-written:		; CHECK-NEXT: Expressions re-written:

define dso_local void @forked_ptrs_different_base_same_offset_possible_poison(ptr nocapture readonly %Base1, ptr nocapture readonly %Base2, ptr nocapture %Dest, ptr nocapture readonly %Preds, i1 %c) {		define dso_local void @forked_ptrs_different_base_same_offset_possible_poison(ptr nocapture readonly %Base1, ptr nocapture readonly %Base2, ptr nocapture %Dest, ptr nocapture readonly %Preds, i1 %c) {
entry:		entry:
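The "Grouped accesses" output above can be read as one byte range per pointer: a member such as `{%Dest,+,4}` over 100 iterations yields `Low: %Dest` and `High: (400 + %Dest)`. The forked load contributes two groups, one per possible base, and each is compared against the store group. The following is a minimal sketch of that arithmetic in C, not LAA's implementation; the names `AccessRange`, `range_for_strided_access`, `ranges_overlap`, and `forked_load_may_alias_store` are hypothetical, and treating `High` as an exclusive bound is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [low, high) touched by one pointer over the whole loop.
 * Treating `high` as exclusive is an assumption made for this sketch. */
typedef struct {
    uint64_t low;
    uint64_t high;
} AccessRange;

/* Range for a pointer that starts at `base` and advances by `stride` bytes on
 * each of `trip_count` iterations, touching `access_size` bytes each time.
 * Assumes stride > 0 and trip_count > 0. */
static AccessRange range_for_strided_access(uint64_t base, uint64_t stride,
                                            uint64_t trip_count,
                                            uint64_t access_size) {
    AccessRange r;
    r.low = base;
    r.high = base + stride * (trip_count - 1) + access_size;
    return r;
}

/* Two half-open ranges overlap when each starts before the other ends. */
static bool ranges_overlap(AccessRange a, AccessRange b) {
    return a.low < b.high && b.low < a.high;
}

/* A forked load may read from either base, so the store range has to be
 * checked against both candidate ranges. */
static bool forked_load_may_alias_store(AccessRange store, AccessRange fork_a,
                                        AccessRange fork_b) {
    return ranges_overlap(store, fork_a) || ranges_overlap(store, fork_b);
}
```

With a 4-byte stride, 100 iterations, and 4-byte accesses, `range_for_strided_access` gives a range 400 bytes long, which matches the `(400 + %Dest)` bound printed in the CHECK lines.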

llvm/test/Transforms/LoopVectorize/forked-pointers.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				david-arm: Can we also have a test where at least one of the forked pointers is loop-invariant?
				huntergr (author): We already do; see forked_ptrs_uniform_and_contiguous_forks. We could properly analyze and vectorize that case as well, but I haven't implemented that yet, so I'm just testing that it gets rejected for now.
				david-arm: Ah ok, sorry I missed that. I guess what I meant was that this should be trivial to implement, particularly if we can find a way of making calls to hasComputableBounds recursive and re-use the existing code that checks for loop-invariants and affine pointers.
				huntergr (author): Sadly, it'll require a bit more work to support invariant addresses. My original downstream code allowed them, but we ran into a bug with it and disabled them. My plan is to get the base functionality committed, then go back and add an interface so that LoopVectorize (or other LAA consumers) can query the type of the forks in order to generate correct (and hopefully more optimal) IR for various cases: the current contiguous-only SCEVs, strides of >1, loop-invariant but unknown strides, uniform/invariant addresses, indexed gather/scatter, etc. If both forks have a stride of 1, or are invariant, then we could potentially plant two masked load instructions (or load + broadcast) instead of a gather, for instance. But that's future work until this part is completed.
				fhahn: The tests running `-loop-accesses` should be in `llvm/test/Analysis/LoopAccessAnalysis`. Also, could you pre-commit the tests and update the diff here to show only the difference? That way it is a bit easier to see the impact in the diff.
	; RUN: opt -loop-vectorize -instcombine -force-vector-width=4 -S < %s 2>&1 | FileCheck %s			; RUN: opt -loop-vectorize -instcombine -force-vector-width=4 -S < %s 2>&1 | FileCheck %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

	;;;; Derived from the following C code			;;;; Derived from the following C code
	;; void forked_ptrs_different_base_same_offset(float *A, float *B, float *C, int *D) {			;; void forked_ptrs_different_base_same_offset(float *A, float *B, float *C, int *D) {
	;; for (int i=0; i<100; i++) {			;; for (int i=0; i<100; i++) {
	;; if (D[i] != 0) {			;; if (D[i] != 0) {
	;; C[i] = A[i];			;; C[i] = A[i];
	;; } else {			;; } else {
	;; C[i] = B[i];			;; C[i] = B[i];
	;; }			;; }
	;; }			;; }
	;; }			;; }

	define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds) {			define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds) {
	; CHECK-LABEL: @forked_ptrs_different_base_same_offset(			; CHECK-LABEL: @forked_ptrs_different_base_same_offset(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[BASE1_FR:%.*]] = freeze float* [[BASE1:%.*]]
				; CHECK-NEXT: [[BASE2_FR:%.*]] = freeze float* [[BASE2:%.*]]
				; CHECK-NEXT: [[DEST_FR:%.*]] = freeze float* [[DEST:%.*]]
				; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
				; CHECK: vector.memcheck:
				; CHECK-NEXT: [[DEST1:%.*]] = ptrtoint float* [[DEST_FR]] to i64
				; CHECK-NEXT: [[PREDS2:%.*]] = ptrtoint i32* [[PREDS:%.*]] to i64
				; CHECK-NEXT: [[BASE23:%.*]] = ptrtoint float* [[BASE2_FR]] to i64
				; CHECK-NEXT: [[BASE15:%.*]] = ptrtoint float* [[BASE1_FR]] to i64
				; CHECK-NEXT: [[TMP0:%.*]] = sub i64 [[DEST1]], [[PREDS2]]
				; CHECK-NEXT: [[DIFF_CHECK:%.*]] = icmp ult i64 [[TMP0]], 16
				; CHECK-NEXT: [[TMP1:%.*]] = sub i64 [[DEST1]], [[BASE23]]
				; CHECK-NEXT: [[DIFF_CHECK4:%.*]] = icmp ult i64 [[TMP1]], 16
				; CHECK-NEXT: [[CONFLICT_RDX:%.*]] = or i1 [[DIFF_CHECK]], [[DIFF_CHECK4]]
				david-arm (inline comment): It would be good to show some distinction here between Check 1 and Check 2. I assume it's actually checking each of the forked pointers, but the output doesn't make that clear.
				; CHECK-NEXT: [[TMP2:%.*]] = sub i64 [[DEST1]], [[BASE15]]
				; CHECK-NEXT: [[DIFF_CHECK7:%.*]] = icmp ult i64 [[TMP2]], 16
				; CHECK-NEXT: [[CONFLICT_RDX8:%.*]] = or i1 [[CONFLICT_RDX]], [[DIFF_CHECK7]]
				; CHECK-NEXT: br i1 [[CONFLICT_RDX8]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x float*> poison, float* [[BASE2_FR]], i64 0
				; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x float*> [[BROADCAST_SPLATINSERT]], <4 x float*> poison, <4 x i32> zeroinitializer
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <4 x float*> poison, float* [[BASE1_FR]], i64 0
				; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <4 x float*> [[BROADCAST_SPLATINSERT9]], <4 x float*> poison, <4 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[INDEX]], 1
				; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[INDEX]], 2
				; CHECK-NEXT: [[TMP5:%.*]] = or i64 [[INDEX]], 3
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, i32* [[PREDS]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[TMP6]] to <4 x i32>*
				; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP7]], align 4
				; CHECK-NEXT: [[TMP8:%.*]] = icmp eq <4 x i32> [[WIDE_LOAD]], zeroinitializer
				; CHECK-NEXT: [[TMP9:%.*]] = select <4 x i1> [[TMP8]], <4 x float*> [[BROADCAST_SPLAT]], <4 x float*> [[BROADCAST_SPLAT10]]
				; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float*> [[TMP9]], i64 0
				; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds float, float* [[TMP10]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float*> [[TMP9]], i64 1
				; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds float, float* [[TMP12]], i64 [[TMP3]]
				; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float*> [[TMP9]], i64 2
				; CHECK-NEXT: [[TMP15:%.*]] = getelementptr inbounds float, float* [[TMP14]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x float*> [[TMP9]], i64 3
				; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds float, float* [[TMP16]], i64 [[TMP5]]
				; CHECK-NEXT: [[TMP18:%.*]] = load float, float* [[TMP11]], align 4
				; CHECK-NEXT: [[TMP19:%.*]] = load float, float* [[TMP13]], align 4
				; CHECK-NEXT: [[TMP20:%.*]] = load float, float* [[TMP15]], align 4
				; CHECK-NEXT: [[TMP21:%.*]] = load float, float* [[TMP17]], align 4
				; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x float> poison, float [[TMP18]], i64 0
				; CHECK-NEXT: [[TMP23:%.*]] = insertelement <4 x float> [[TMP22]], float [[TMP19]], i64 1
				; CHECK-NEXT: [[TMP24:%.*]] = insertelement <4 x float> [[TMP23]], float [[TMP20]], i64 2
				; CHECK-NEXT: [[TMP25:%.*]] = insertelement <4 x float> [[TMP24]], float [[TMP21]], i64 3
				; CHECK-NEXT: [[TMP26:%.*]] = getelementptr inbounds float, float* [[DEST_FR]], i64 [[INDEX]]
				; CHECK-NEXT: [[TMP27:%.*]] = bitcast float* [[TMP26]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP25]], <4 x float>* [[TMP27]], align 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
				; CHECK-NEXT: [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
				; CHECK-NEXT: br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: br i1 true, label [[FOR_COND_CLEANUP:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 100, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ], [ 0, [[VECTOR_MEMCHECK]] ]
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[PREDS:%.*]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[PREDS]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP29:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[CMP1_NOT:%.*]] = icmp eq i32 [[TMP0]], 0			; CHECK-NEXT: [[CMP1_NOT:%.*]] = icmp eq i32 [[TMP29]], 0
	; CHECK-NEXT: [[SPEC_SELECT:%.*]] = select i1 [[CMP1_NOT]], float* [[BASE2:%.*]], float* [[BASE1:%.*]]			; CHECK-NEXT: [[SPEC_SELECT:%.*]] = select i1 [[CMP1_NOT]], float* [[BASE2_FR]], float* [[BASE1_FR]]
	; CHECK-NEXT: [[DOTSINK_IN:%.*]] = getelementptr inbounds float, float* [[SPEC_SELECT]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[DOTSINK_IN:%.*]] = getelementptr inbounds float, float* [[SPEC_SELECT]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[DOTSINK:%.*]] = load float, float* [[DOTSINK_IN]], align 4			; CHECK-NEXT: [[DOTSINK:%.*]] = load float, float* [[DOTSINK_IN]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[TMP30:%.*]] = getelementptr inbounds float, float* [[DEST_FR]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store float [[DOTSINK]], float* [[TMP1]], align 4			; CHECK-NEXT: store float [[DOTSINK]], float* [[TMP30]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 100			; CHECK-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 100
	; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.cond.cleanup:			for.cond.cleanup:
	ret void			ret void

	for.body:			for.body:
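The `vector.memcheck` block in the expected output above subtracts each source pointer (`%Preds` and both forks, `%Base2` and `%Base1`) from the destination pointer, compares the unsigned difference against 16, and ORs the results to decide whether to fall back to the scalar loop. The sketch below restates that logic in C for illustration only. The function names are hypothetical, and reading the constant 16 as 4 lanes times 4 bytes (from `-force-vector-width=4` and 4-byte elements) is an inference from the test, not something the review states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes covered by one vector iteration. Assumed here to be
 * 4 lanes * 4 bytes, matching the constant 16 in the CHECK lines. */
#define VECTOR_ACCESS_BYTES 16u

/* Mirrors `sub i64 dest, src` followed by `icmp ult diff, 16`.
 * The subtraction wraps when dest < src, which produces a very large
 * unsigned value, so that case is reported as "no conflict". */
static bool diff_check_conflicts(uint64_t dest, uint64_t src) {
    uint64_t diff = dest - src;
    return diff < VECTOR_ACCESS_BYTES;
}

/* Mirrors the chain of `or i1` instructions: any single conflict sends
 * execution to the scalar loop. Both forked bases are checked because the
 * select may pick either one on any iteration. */
static bool needs_scalar_fallback(uint64_t dest, uint64_t preds,
                                  uint64_t base2, uint64_t base1) {
    bool conflict = diff_check_conflicts(dest, preds);
    conflict = conflict || diff_check_conflicts(dest, base2);
    conflict = conflict || diff_check_conflicts(dest, base1);
    return conflict;
}
```

For example, a destination 8 bytes past one of the bases is flagged as a conflict, while a destination exactly 16 bytes past it is not.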
