This is an archive of the discontinued LLVM Phabricator instance.

[LAA] Analyze pointers forked by a select
ClosedPublic

Authored by huntergr on Aug 25 2021, 6:04 AM.

Details

Summary

Given a function like the following:

void forked_ptrs_different_base_same_offset(float *Base1, float *Base2, float *Dest, int *Preds) {
  for (int i=0; i<100; i++) {
    if (Pred[i] != 0) {
      Dest[i] = Base1[i];
    } else {
      Dest[i] = Base2[i];
    }
  }
}

LLVM will optimize the IR to a single load using a pointer determined by a select instruction:

%spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
%.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv 
%.sink = load float, float* %.sink.in, align 4

LAA is currently unable to analyze such IR, since ScalarEvolution will return a SCEVUnknown for the pointer operand for the load.

This patch adds initial optional support for analyzing both possibilities for the pointer and allowing LAA to generate runtime checks for the bounds if required.

Diff Detail

Event Timeline

huntergr created this revision.Aug 25 2021, 6:04 AM
huntergr requested review of this revision.Aug 25 2021, 6:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2021, 6:04 AM
huntergr updated this revision to Diff 370905.Sep 6 2021, 6:18 AM

Rebased, ran clang-format over the patch.

Hi @huntergr, apologies for delay reviewing this patch! I've not finished reviewing it, but I've left some comments that I have so far. Thanks!

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
374 ↗(On Diff #370905)

If we're only ever going to support two forks couldn't we make this simpler by just having:

unsigned Members[2];

instead or is this a real pain to implement? Or is the idea that we may want to extend the fork to allow more than 2 in future?

395 ↗(On Diff #370905)

Again, does this need to be a SmallVector and can it just be const SCEV *Starts[2];?

llvm/include/llvm/Analysis/ScalarEvolution.h
444 ↗(On Diff #370905)

Hi Graham, it feels slightly awkward having to essentially redefine a ForkedPointer as a pair in the same way as ForkedPtrs in llvm/include/llvm/Analysis/LoopAccessAnalysis.h. Maybe we could define something like:

using ForkedPointer = std::pair<const SCEV *, const SCEV *>;

in llvm/include/llvm/Analysis/ScalarEvolution.h that can be used in LoopAccessAnalysis too?

llvm/lib/Analysis/LoopAccessAnalysis.cpp
183

nit: Perhaps you can just write assert(Fork <= 1 && ...) since it's unsigned anyway?

379

nit: Perhaps you can just write assert(Fork <= 1 && ...) since it's unsigned anyway?

505

Have you rewritten the logic here as a performance improvement, i.e. to avoid calling Group.addPointer() after it's already been merged?

743

nit: This is just a suggestion, but you could restructure this a little to remove the extra indentation I think here:

if (!AR) {
  if (!RtCheck.AllowForkedPtrs)
    return false;
  ...

Not sure if this is better?

759

Is it better to just make a recursive call to hasComputableBounds instead of calling isAffine here? We're potentially missing out on future improvements to hasComputableBounds here, and also we're ignoring the possibility of a forked pointer being loop invariant I think.

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1

Can we also have a test where at least one of the forked pointers is loop-invariant?

huntergr added inline comments.Sep 7 2021, 7:31 AM
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
374 ↗(On Diff #370905)

So this is for a RuntimeCheckingPtrGroup, which may have more than two members already; I've only changed it from a SmallVector to a SmallSetVector here in order to avoid duplicate members, since both sides of a forked pointer can be added to the same group.

395 ↗(On Diff #370905)

I'll look into it, but it'll make other parts messier since I can't just iterate over the members of the SmallVectors and would need to perform null checks for all those places for the second element.

One solution I thought of (but didn't implement, since I wanted some community feedback on the overall work first) was to create a new SCEVForkedExpr so that we wouldn't need to store two SCEVs here and could just note which fork(s) was represented. It'll take a while to implement that, but might be cleaner. Thoughts?

llvm/lib/Analysis/LoopAccessAnalysis.cpp
505

Not intentionally, no -- I just replicated the behaviour of the original (break out of the loop if the pointer merged into a group) but considered both potential forks.

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1

We already do; see forked_ptrs_uniform_and_contiguous_forks

We could properly analyze and vectorize that case as well, but I haven't implemented that yet so I'm just testing that it gets rejected for now.

david-arm added inline comments.Sep 7 2021, 7:39 AM
llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1

Ah ok, sorry I missed that. I guess what I meant was that this should be trivial to implement, particularly if we can find a way of making calls to hasComputableBounds recursive and re-use the existing code that checks for loop-invariants and affine pointers.

huntergr updated this revision to Diff 378920.Oct 12 2021, 1:53 AM

Rebased, updated based on review comments.

huntergr marked 4 inline comments as done.Oct 12 2021, 2:09 AM
huntergr added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
743

We still needed to bail out if AR was false and forked pointers weren't allowed... so I rewrote it to check for the positive case for AR first and then proceed with the forked pointers check and default false afterwards.

759

We can't do that right now -- hasComputableBounds takes a Value* rather than a SCEV* so it can (potentially) be added to the stride map in replaceSymbolicStrideSCEV. We'd just end up looking at the same Value and splitting again.

This is something the SCEVForkedExpr would make cleaner, since we would only need to evaluate a single SCEV. But it'll take a bit of refactoring to do that, which is why I wanted some feedback on the whole idea first.

We could also separate out parts of these functions to make it recursive, but I'll need to be careful since replaceSymbolicStrideSCEV has other users.

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1

Sadly, it'll require a bit more work to support invariant addresses. My original downstream code allowed them, but we ran into a bug with it and disabled them.

My plan is to get the base functionality committed, then go back and add an interface so that LoopVectorize (or other LAA consumers) can query the type of the forks in order to generate correct (and hopefully more optimal) IR for various cases -- the current contiguous-only SCEVs, strides of >1, loop-invariant but unknown strides, uniform/invariant addresses, indexed gather/scatter, etc.

If both forks have a stride of 1, or are invariant, then we could potentially plant two masked load instructions (or load + broadcast) instead of a gather, for instance. But that's future work until this part is completed.

Thanks for making the changes @huntergr! I've got a few mostly minor comments so far, but still have to review findForkedSCEVs. :)

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
409 ↗(On Diff #378920)

Hi @huntergr, thanks for making the changes to Exprs. I just wonder if we should have an assert here that ScExprs.size() <= 2?

llvm/lib/Analysis/LoopAccessAnalysis.cpp
464

nit: Is it worth calling CheckingGroups.emplace_back(I, *this, /*Fork=0*/0) for clarity here?

755

nit: This is just a minor comment, but you could remove indentation further here by bailing out early, i.e.

if (!FPtr)
  return false;

const SCEV *A =
759

nit: Instead of writing:

LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *(FPtr->first) << "\n");

I think you can just write

LLVM_DEBUG(dbgs() << "LAA: SCEV1: " << *A << "\n");

and same for the second one.

david-arm added inline comments.Oct 20 2021, 6:54 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
761

Is there any danger in setting this before we return true?

780

You're introducing a new implicit TypeSize -> uint64_t cast here. Could you rewrite this as:

int64_t Size = DL.getTypeAllocSize(PtrTy->getElementType()).getFixedSize();
huntergr updated this revision to Diff 381211.Oct 21 2021, 4:43 AM

Rebased, minor fixes from review comments.

huntergr marked 5 inline comments as done.Oct 21 2021, 4:45 AM
david-arm added a subscriber: lebedev.ri.

Thanks for dealing with all the review comments @huntergr! I think the patch looks sensible to me, although I'm not as familiar with the SCEV code as others might be. I'm adding @lebedev.ri as a potential reviewer if that's ok?

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
33

It would be good to show some distinction here between Check 1 and Check 2. I assume it's actually checking each of the forked pointers, but the output doesn't make that clear.

huntergr updated this revision to Diff 387879.Nov 17 2021, 2:23 AM
huntergr marked an inline comment as done.

Rebased, and changed the membership of a checking group to be a pair of PointerInfo index and fork so that printing can show which forks are present in a group.

huntergr updated this revision to Diff 388889.Nov 22 2021, 5:23 AM

Rebased, disabled by default, added a couple of different instructions into the tests to ensure those paths are at least covered even if they aren't in a positive case right now; I'm still planning to leave those cases (known strides, g/s, uniform) for a followup patch; I suppose I could start a patch series if people want to see those earlier.

I've run the LNT nightly suite with this enabled, along with a few HPC benchmarks I have access to and they all pass.

LGTM! Thanks for making all the changes @huntergr and adding the tests. This patch is low risk at the moment with the flag being disabled currently. At some point once there have been performance investigations to prove there are no regressions we can then enable this by default.

david-arm accepted this revision.Nov 23 2021, 5:10 AM
This revision is now accepted and ready to land.Nov 23 2021, 5:10 AM
fhahn added a comment.Nov 23 2021, 5:22 AM

LGTM! Thanks for making all the changes @huntergr and adding the tests. This patch is low risk at the moment with the flag being disabled currently. At some point once there have been performance investigations to prove there are no regressions we can then enable this by default.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
20 ↗(On Diff #388889)

Is this needed? Other SCEV types are forward-declared below, so ScalarEvolution.h doesn't need to be included.

334 ↗(On Diff #388889)

Could we use a bool for Fork if it only has 2 possible values?

llvm/include/llvm/Analysis/ScalarEvolution.h
769 ↗(On Diff #388889)

This is only used by LAA. Is there a reason this needs to be part of ScalarEvolution?

llvm/test/Transforms/LoopVectorize/forked-pointers.ll
1

the tests running -loop-accesses should be in llvm/test/Analysis/LoopAccessAnalysis.

Also, could you pre-commit the tests and update the diff here to show only the difference? That way it is a bit easier to see the impact in the diff.

I just realized how this may be a bit similar to how we handle pointers that are phi nodes. Currently those are handled by adding accesses for both incoming values (see D109381). Unfortunately the same approach cannot be directly used for selects, because we need to create 2 pointers that do not exist in the IR.

But if MemAccessInfo/ would also carry the pointer SCEV directly, I think it would be possible to avoid adding another dimension to RuntimePointerChecking::PointerInfo. Instead we would add 2 PointerInfo entries with separate translated pointer SCEVs. I put up a rough sketch of what this may look like in D114480, D114479 to see how this may look like

fhahn added a comment.Nov 23 2021, 4:19 PM

I just realized how this may be a bit similar to how we handle pointers that are phi nodes. Currently those are handled by adding accesses for both incoming values (see D109381). Unfortunately the same approach cannot be directly used for selects, because we need to create 2 pointers that do not exist in the IR.

But if MemAccessInfo/ would also carry the pointer SCEV directly, I think it would be possible to avoid adding another dimension to RuntimePointerChecking::PointerInfo. Instead we would add 2 PointerInfo entries with separate translated pointer SCEVs. I put up a rough sketch of what this may look like in D114480, D114479 to see how this may look like

I also put up a sketch of a stripped down variant that only supports runtime check generation by adding multiple PointerInfos with different associated pointer SCEVs: D114487.

The variant that also extends MemAccessInfo should also be able to determine that certain loops are safe without runtime checks, e.g. like the one below I think:

%s1 = type { [32000 x float], [32000 x float], [32000 x float] }
define dso_local void @foo(%s1 * nocapture readonly %Base, i32* nocapture readonly %Preds) {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1.not = icmp eq i32 %0, 0
  %gep.1 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 1, i32 0
  %gep.2 = getelementptr inbounds %s1, %s1* %Base, i64 0, i32 2, i32 0
  %spec.select = select i1 %cmp1.not, float* %gep.1, float* %gep.2
  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
  %.sink = load float, float* %.sink.in, align 4
  %1= getelementptr inbounds %s1, %s1 * %Base, i64 0, i32 0, i64 %indvars.iv
  store float %.sink, float* %1, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}
huntergr updated this revision to Diff 389956.Nov 26 2021, 2:22 AM

Revised based on @fhahn 's initial suggestions, rebased.

I'll look into the MemAccessInfo approach.

huntergr marked 3 inline comments as done.Nov 26 2021, 2:27 AM
huntergr added inline comments.
llvm/include/llvm/Analysis/LoopAccessAnalysis.h
20 ↗(On Diff #388889)

It was there to include the 'using ForkedPointer =' definition, but moving it out of ScalarEvolution means we can just define it here.

I had hoped that we could maybe integrate it further into ScalarEvolution as a new type of SCEV, but maybe this isn't the right time -- you're correct, this will be the only user for now, so we can just keep it here until we find a case where it would be useful to have it more widely available.

mgabka added a subscriber: mgabka.Nov 29 2021, 1:52 AM
david-arm accepted this revision.Nov 29 2021, 4:53 AM

LGTM!

Hi @huntergr, thanks for making all the changes. I think the patch looks good to
go for now. It's still worth looking into the MemAccessInfo approach as a follow-up,
but the patch has been sat in review for long enough (3 months) without any
fundamental objections so I'd prefer we got something merged now to get the
functionality defended and unblock future work. This is currently only enabled
under an option, so any refactoring for the MemAccessInfo approach can be
done safely under another patch.

fhahn added a comment.Nov 29 2021, 2:07 PM

LGTM!

Hi @huntergr, thanks for making all the changes. I think the patch looks good to
go for now. It's still worth looking into the MemAccessInfo approach as a follow-up,
but the patch has been sat in review for long enough (3 months) without any
fundamental objections so I'd prefer we got something merged now to get the
functionality defended and unblock future work.

I agree it is not ideal that there's been not much feedback so far, but now that there is some additional feedback/suggestion, I think it would be good to hear additional opinions on the preferred direction here or at least discuss concrete potential alternatives; D114487 in particular which should have similar effects, but with less invasive changes, i.e. there's no need to adjust isNoWrap, hasComputableBounds, RuntimePointerChecking::insert or add additional state.

You mention future work blocked by this as a reason for landing this now, however I cannot find any references to patches depending on this work.

This is currently only enabled
under an option, so any refactoring for the MemAccessInfo approach can be
done safely under another patch.

While it is true that it is off by default, it adds substantial complexity to LAA which is already quite complex and the changes are quite spread out. One concern is that it adds an additional way to model 'forked pointers'; we already handle 'forked pointers' via phi nodes (by adding multiple PointerInfos). I am also not sure it should be off by default, once it lands. IMO better analysis should be the right thing to do and should be quite safe from a performance regression perspective. I don't see a strong reason for not enabling this by default and having wide testing surface potential issues early.

huntergr updated this revision to Diff 406734.Feb 8 2022, 2:09 AM
huntergr marked an inline comment as done.
huntergr set the repository for this revision to rG LLVM Github Monorepo.

Rebased and changed to use @fhahn 's lighter-weight approach from D114487 combined with my recursive function to find the SCEVs. Although the simplified tests are handled with a couple levels of checking, the real applications I was working on had additional operations between the ptr value for the load or store and a select.

It might be best to introduce a limit to recursion though, any thoughts?

bsmith added a subscriber: bsmith.Feb 28 2022, 3:01 AM
huntergr updated this revision to Diff 412642.Mar 3 2022, 2:19 AM

Rebased, added a recursion limit to the SCEV building function.

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 2:19 AM
huntergr updated this revision to Diff 421130.Apr 7 2022, 1:47 AM

Rebased. Ping?

fhahn added a comment.Apr 10 2022, 2:59 PM

Thanks for the update!

I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.

I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
134

It would be good to add a test for the option, e.g. a case that requires 2 or 3 recursions and set test with max-forked-scev-depth=2/3

469–473

This scheme would need documenting, i.e. why we can have multiple expressions for a pointer.

474

should be removed?

732

should be removed?

836

missing test for this?

854

It looks like tests for those conditions are missing?

867

I don't think we can use the inbounds info here, unless we prove that the program is undefined if GEP is poison.

Consider something like below. If %c is always false, the GEP index could be out-of-bounds (and the GEP poison). Adding a runtime check based on the SCEV expression may introduce a branch on poison unconditionally.

define dso_local void @forked_ptrs_different_base_same_offset(float* nocapture readonly %Base1, float* nocapture readonly %Base2, float* nocapture %Dest, i32* nocapture readonly %Preds, i1 %c) {
entry:
  br label %for.body

for.cond.cleanup:
  ret void

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %latch ]
  %arrayidx = getelementptr inbounds i32, i32* %Preds, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4
  %cmp1.not = icmp eq i32 %0, 0
  %spec.select = select i1 %cmp1.not, float* %Base2, float* %Base1
  %.sink.in = getelementptr inbounds float, float* %spec.select, i64 %indvars.iv
  %.sink = load float, float* %.sink.in, align 4
  %1 = getelementptr inbounds float, float* %Dest, i64 %indvars.iv
  br i1 %c, label %then, label %latch

then:
  store float %.sink, float* %1, align 4
  br label %latch

latch:
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, 100
  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}
874

It might be a bit simpler to just add a duplicate to BaseScevs/OffsetScevs and have one common code path to compute the SCEVs to add

955

Needs resolving.

It should be needed, because the above code may have added assumptions, which make Ptr an AddRec, See comment in D114487.

Thanks for the update!

I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.

I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

Sure, that sounds like a good plan. I'll try and review your patch this week.

fhahn added a comment.May 2 2022, 11:41 AM

Thanks for the update!

I think it would be good to split this up to have a first patch that just adds very limited support in findForkedSCEVs and then gradually add support for more cases separately. This also makes it easier to make sure all code paths in findForkedSCEVs are covered by the unit tests.

I went ahead and rebased D114487 and stripped the fork analysis to the bare minimum. I'd be happy to land the scaffolding in D114487 separately and this patch and following could add the sophisticated fork analysis.

Sure, that sounds like a good plan. I'll try and review your patch this week.

Sounds great, thanks!

huntergr updated this revision to Diff 437820.Jun 17 2022, 2:15 AM

Rebased on top of Florian's patch from https://reviews.llvm.org/D114487

That patch has been reverted for now so builds will fail, but this should be pretty close to what we want once the base feature is back in.

I've cut the walker function down to only handle GEPs and selects for now; we can add more cases (and more tests) later.

I've added tests to exercise the checks that exclude a given select or GEP from processing, as well as one for the recursion limit.

I'll precommit the new tests before the next update on this patch.

fhahn added a comment.Jun 28 2022, 2:13 AM

Rebased on top of Florian's patch from https://reviews.llvm.org/D114487

That patch has been reverted for now so builds will fail, but this should be pretty close to what we want once the base feature is back in.

Thanks, the patch has been relanded a while ago.

I've cut the walker function down to only handle GEPs and selects for now; we can add more cases (and more tests) later.

I've added tests to exercise the checks that exclude a given select or GEP from processing, as well as one for the recursion limit.

I'll precommit the new tests before the next update on this patch.

Sounds good! I skimmed the current version and it looks like most comments should be addressed in the latest update. It might be good to mark them as done. One thing I couldn't spot in the latest version is a test for the potential inbounds issue mentioned inline.

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
1–2

It would be probably be good to convert the IR to use opaque pointers in a pre-commit, so -opaque-pointers is not needed.

huntergr updated this revision to Diff 444549.Jul 14 2022, 1:08 AM

Rebased on top of @fhahn 's fixed patch, including changes to detect possible undef/poison.

New tests were precommitted in rGa19cf47da095.

huntergr marked 5 inline comments as done.Jul 14 2022, 1:14 AM
huntergr added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
867

I removed the code to add no-wrap flags to the SCEV based on 'inbounds', then added your test.

fhahn accepted this revision.Jul 18 2022, 1:10 AM

LGTM, thanks!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
916

nit: unnecessary move?

llvm/test/Analysis/LoopAccessAnalysis/forked-pointers.ll
158

nit: it would probably be good to also have a few tests that access different sizes, including odd ones like i23 or something like that, to ensure the right size expressions are used.

This revision was landed with ongoing or failed builds.Jul 18 2022, 4:08 AM
This revision was automatically updated to reflect the committed changes.
ro added a subscriber: ro.Jul 18 2022, 6:45 AM

This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:

/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::SCEV*, bool> > findForkedPointer(llvm::PredicatedScalarEvolution&, const ValueToValueMap&, llvm::Value*, const llvm::Loop*)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: could not convert ‘Scevs’ from ‘SmallVector<[...],2>’ to ‘SmallVector<[...],3>’
  916 |     return Scevs;
      |            ^~~~~
      |            |
      |            SmallVector<[...],2>
In D108699#3659633, @ro wrote:

This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:

/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::SCEV*, bool> > findForkedPointer(llvm::PredicatedScalarEvolution&, const ValueToValueMap&, llvm::Value*, const llvm::Loop*)’:
/vol/llvm/src/llvm-project/dist/llvm/lib/Analysis/LoopAccessAnalysis.cpp:916:12: error: could not convert ‘Scevs’ from ‘SmallVector<[...],2>’ to ‘SmallVector<[...],3>’
  916 |     return Scevs;
      |            ^~~~~
      |            |
      |            SmallVector<[...],2>

Does rG4bd072c56b87 fix this for you?

ro added a comment.Jul 18 2022, 10:22 AM
In D108699#3659633, @ro wrote:

This patch broke the Solaris/amd64 and Solaris/sparcv9 builds:

[...]

Does rG4bd072c56b87 fix this for you?

It does indeed, thanks. FWIW, this is with g++ 11.3.0.

https://github.com/llvm/llvm-project/issues/57368 is an open bug report describing a regression from this patch.

https://github.com/llvm/llvm-project/issues/57368 is an open bug report describing a regression from this patch.

Thanks for the report and reproducer, I'll look into it.