This is an archive of the discontinued LLVM Phabricator instance.

[ConstraintElim] Add A < B if A is an increasing phi for A != B.
ClosedPublic

Authored by fhahn on Jun 12 2023, 10:30 AM.

Details

Summary

This patch adds additional logic to add additional facts for A != B, if
A is a monotonically increasing induction phi. The motivating use case
for this is removing checks when using iterators with hardened libc++,
e.g. https://godbolt.org/z/zhKEP37vG.

The initial patch only covers pointer comparisons, but I will shortly
share a follow-up for integer inductions, which will also handle #63125
(looks like this is hardened glibc++).

The patch hand-rolls detection of inductions (instead of using
something like isInduction from IVDescriptors.h) to avoid pulling in
SCEV. It adds LoopInfo, but this doesn't seem to have any noticable
compile-time impact.

Depends on D151799.

Diff Detail

Event Timeline

fhahn created this revision.Jun 12 2023, 10:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 12 2023, 10:30 AM
fhahn requested review of this revision.Jun 12 2023, 10:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 12 2023, 10:30 AM
fhahn updated this revision to Diff 530586.Jun 12 2023, 10:35 AM

update llvm/test/Transforms/ConstraintElimination/monotonic-pointer-phis-crashes.ll

llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1361

Very minor, but one may find this slightly more readable?

if (NumIn == N->getDFSNumIn() && NumOut == N->getDFSNumOut())
  InLoopSucc = Succ;
else
  LoopExitSucc = Succ;
nikic added inline comments.Jun 18 2023, 8:00 AM
llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1383

This does not guarantee a step of 1, because you're missing the multiplication with the type alloc size here. And if you add that, it likely won't support the cases you're interested in any more.

It would probably be easier to handle integer inductions, pointers are more complicated.

fhahn updated this revision to Diff 535494.Jun 28 2023, 12:55 PM

Rebase on top of recent changes, check element type of the GEP, simplify loop as suggested.

fhahn updated this revision to Diff 535501.Jun 28 2023, 1:24 PM
fhahn marked 2 inline comments as done.

Refine element type checks to avoid another problematic case. Tests added in 82aeec6570aa405.

fhahn added a subscriber: ldionne.Jun 28 2023, 1:27 PM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1361

Simplified, thanks!

1383

Yep! I *think* GEPs with element type of i8 should be fine unconditionally, for others, it should be fine if the GEP we compare to has the same element type as the step GEP.

The main motivation is hardening libc++ where this allows us to remove checks for loops using iterators which should be quite high impact.

nikic added inline comments.Jun 28 2023, 1:35 PM
llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1393

GEP steps by the type alloc size, which is generally not the same as the type size.

1397

I think you need significantly more checks to make this valid reasoning. There is no guarantee here that both GEPs are based on the same value, so maybe the induction GEP is going over 16 byte aligned pointers, while the UpperGEP is working on pointers offset by 8, in which case the condition will never be true.

nikic added inline comments.Jun 28 2023, 1:37 PM
llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1311

Why is this only done for the "and" case and not for normal dominating conditions?

fhahn updated this revision to Diff 535707.Jun 29 2023, 3:38 AM
fhahn marked 3 inline comments as done.

Addressed latest comments, thanks! The main change is switching the code to use collectOffsets to check the step & upper GEPs.

llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1311

It's done when adding facts in general (around line 1454) or do you mean something else?

1393

Updated the code to use collectOffset which should take care of this and added extra tests in 36999318f091

1397

Added some extra tests in 36999318f091, d34bc66ef9bd & c92c9462a0ca. For now, I restricted them to cases where both UpperGEP and the phi are based on the same start values as the later ULE check is not enough to avoid this in all cases.

I also adjusted the code to collect the collectOffset (this allows for easy handling of GEPs accessing structs & constant offsets) and check if the upper offset is a multiple of the step offset.

nikic added a comment.Jun 29 2023, 8:12 AM

This feels very fragile... both in that it's easy to get wrong and only handles a very specific pattern.

llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1382

Does this check for step 1 still make sense, given the more generic checks below?

Also, aren't we missing a check here that the GEP pointer operand is the PN? I think right now it can be unrelated...

1400

This looks like a complicated way to write accumulateConstantOffset().

1425

A cleaner (and more general) way to write this would be that all the scales/offset must be divisible by StepConstantOffset.

nikic added inline comments.Jun 29 2023, 8:28 AM
llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1405

I think you need to require inbounds here as well. Otherwise, for non-power-of-two steps, you don't have a guarantee that the end value will be hit without wrapping (just that it is hit eventually).

fhahn updated this revision to Diff 536278.Jun 30 2023, 8:46 AM

Address comments, including using accumulateConstantOffsets, generalzing check of offsets and simplifying step GEP check. I added a number of additional tests to check for various additional issues.

fhahn marked 4 inline comments as done.Jun 30 2023, 8:49 AM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1382

Does this check for step 1 still make sense, given the more generic checks below?

Not needed any longer, removed, thanks!

Also, aren't we missing a check here that the GEP pointer operand is the PN? I think right now it can be unrelated...

Yep, added the check. The original test cases I had for that case didn't show the issue during verification, but added a couple of additional ones.

1400

Updated to use accumulateConstantOffset, thanks!

1405

Agreed, added the check. I don't think there's a way to add a test that triggers a miscompile because missing inbounds will prevent any further analysis of the GEP in the pass.

1425

Yep, generalizing it also results in simpler code overall, thanks!

nikic added a comment.Jul 7 2023, 3:42 AM

This patch adds additional logic to add additional facts for A != B, if
A is a monotonically increasing induction phi. The motivating use case
for this is removing checks when using iterators with hardened libc++,
e.g. https://godbolt.org/z/zhKEP37vG.

Looks like on trunk this doesn't emit bounds checks in the first place...

llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1340

TODO: Handle phi on RHS? I don't think any canonicalization guarantees that it is on the LHS.

1361

Wonder whether it would be better to store the DTN in FactOrCheck rather than NumIn/NumOut.

1368

Generally, this whole "look at branch of single user of comparison" approach looks pretty hacky. This seems like something that should be done when initially queueing the fact, as we have direct knowledge of the branch structure at that point.

1396

Unused variable

1409

Must be same as BitWidth?

1424

Start -> StartValue

llvm/test/Transforms/PhaseOrdering/iterator-with-runtime-check.ll
43 ↗(On Diff #536278)

I'm a bit confused on how this one ends up working. Here we have the condition on the latch, so it doesn't dominate for.body. How can we still eliminate a condition there?

fhahn updated this revision to Diff 538175.Jul 7 2023, 9:32 AM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

This patch adds additional logic to add additional facts for A != B, if
A is a monotonically increasing induction phi. The motivating use case
for this is removing checks when using iterators with hardened libc++,
e.g. https://godbolt.org/z/zhKEP37vG.

Looks like on trunk this doesn't emit bounds checks in the first place...

I think that's because the macro name changed in be02f912d6b9b515a1d1b30cf06dfff1c085ea34 . Here's an updated link: https://godbolt.org/z/qfqq9E3nK

llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
1340

Added a TODO, should be fairly straigth-forward to extend.

1361

The main reason for storing the numbers directly is to avoid unnecessary indirection when sorting the worklist, which probably outweighs the benefit here of having easy access to DTN, but I might be wrong.

1368

It would be slightly cleaner to do it a queuing time. There are 2 current limitations that prevent us from doing so:

  1. We would need a way to add a precondition for facts that needs to be checked just before adding a fact (StartValue <= B)
  2. A way track synthetic facts (e.g. PN < B which doesn't exist as ICmpInst)
1396

Removed, thanks!

1409

Yep, simplified, thanks!

1424

Updated, thanks!

llvm/test/Transforms/PhaseOrdering/iterator-with-runtime-check.ll
43 ↗(On Diff #536278)

ConstraintElimination runs before loop-rotate, it processes and simplifies the following IR:

define void @test_fill_with_foreach([2 x i64] %elems.coerce) local_unnamed_addr {                        <<<
entry:
  %elems.coerce.fca.0.extract = extractvalue [2 x i64] %elems.coerce, 0
  %0 = inttoptr i64 %elems.coerce.fca.0.extract to ptr
  %elems.coerce.fca.1.extract = extractvalue [2 x i64] %elems.coerce, 1
  %add.ptr.i = getelementptr inbounds i32, ptr %0, i64 %elems.coerce.fca.1.extract
  %cmp.not.i.i.i.i = icmp slt i64 %elems.coerce.fca.1.extract, 0
  br i1 %cmp.not.i.i.i.i, label %error, label %for.cond

common.ret:                                       ; preds = %for.cond, %error
  ret void

error:                                            ; preds = %for.body, %entry
  call void @error()
  br label %common.ret

for.cond:                                         ; preds = %entry, %for.latch
  %__begin1.sroa.0.0 = phi ptr [ %incdec.ptr.i, %for.latch ], [ %0, %entry ]
  %cmp.i.not = icmp eq ptr %__begin1.sroa.0.0, %add.ptr.i
  br i1 %cmp.i.not, label %common.ret, label %for.body

for.body:                                         ; preds = %for.cond
  %cmp.not.i.i = icmp uge ptr %__begin1.sroa.0.0, %0
  %cmp2.i.i = icmp ult ptr %__begin1.sroa.0.0, %add.ptr.i
  %sel = select i1 %cmp.not.i.i, i1 %cmp2.i.i, i1 false
  br i1 %sel, label %for.latch, label %error

for.latch:                                        ; preds = %for.body
  call void @use(ptr noundef nonnull align 4 dereferenceable(4) %__begin1.sroa.0.0)
  %incdec.ptr.i = getelementptr inbounds i32, ptr %__begin1.sroa.0.0, i64 1
  br label %for.cond
}
fhahn updated this revision to Diff 541424.Jul 18 2023, 3:22 AM

Rebase and ping :)

fhahn updated this revision to Diff 543561.Jul 24 2023, 8:10 AM

Rebase & ping :)

fhahn updated this revision to Diff 545999.Aug 1 2023, 4:03 AM

Fix formatting, rebase & ping :)

nikic added a comment.Aug 1 2023, 5:32 AM

The last time I looked at this, I think the logic was correct, but I was having some serious doubts that doing this in ConstraintElim is really a good idea. A big red flag here is that this will only work if the "primary" loop condition comes first, while in canonical (rotated) form, it would be last, on the loop latch. It happens to work out for the motivating example because ConstraintElim runs fairly early, but that also means we are locked into that pipeline position, and that this may not generalize to other frontends. There are other weaknesses (e.g. the fact that this only works on pre-inc IV, even though post-inc is canonical), but support for those can be added. The dependence on the order of exits seems like a fundamental limitation that is hard to avoid.

optimizeLoopExits() in IndVarSimplify is normally the transform for eliminating such exits. That one can handle exits in any order, and I does eliminate such checks in many other cases. What prevents it from optimizing this particular case? Is that a fundamental limitation of optimizeLoopExits(), or something that can be fixed? Is there some other reason we should be doing this in ConstraintElim rather than IndVars (or other SCEV-based transform)?

cynecx added a subscriber: cynecx.Aug 2 2023, 6:40 PM
fhahn updated this revision to Diff 551184.Aug 17 2023, 10:30 AM

The last time I looked at this, I think the logic was correct, but I was having some serious doubts that doing this in ConstraintElim is really a good idea. A big red flag here is that this will only work if the "primary" loop condition comes first, while in canonical (rotated) form, it would be last, on the loop latch. It happens to work out for the motivating example because ConstraintElim runs fairly early, but that also means we are locked into that pipeline position, and that this may not generalize to other frontends. There are other weaknesses (e.g. the fact that this only works on pre-inc IV, even though post-inc is canonical), but support for those can be added. The dependence on the order of exits seems like a fundamental limitation that is hard to avoid.

optimizeLoopExits() in IndVarSimplify is normally the transform for eliminating such exits. That one can handle exits in any order, and I does eliminate such checks in many other cases. What prevents it from optimizing this particular case? Is that a fundamental limitation of optimizeLoopExits(), or something that can be fixed? Is there some other reason we should be doing this in ConstraintElim rather than IndVars (or other SCEV-based transform)?

My main motivation for this is to be able to easily combine the info about the induction with other facts that are available to ConstraintElimination (in particular that the offset is signed positive from outside the loop); it might be possible to handle this pattern directly in optimizeLoopExits (possibly in combination with applyLoopGuards), but augmenting ConstraintElimination with info about induction phis should allow additional optimizations by combining other facts in a generic way.

ConstraintElimination at the moment is deliberately placed before LoopRotation, and it works quite well in practice for less-than/greater-than loop conditions. A more generic way to handle this in ConstraintElimination would be to compute the final value of inductions based on the symbolic trip count and inject that as upper bound for the induction at entry of the header block, but that is a bit more expensive. I experimented with a few different options to rework the patch using SCEV and the more general handling adds about +0.3 - 0.4% compile time overall. (//github.com/apple/llvm-project/commit/83e1361ff2eee1a14e96025914f8d14ff347f01b)

Using SCEV to detect induction monotonic Add recurrence phis and getting the step doesn't seem more expensive than the current implementation and I sketched that together with adding the facts to the initial worklist (rather than later) in the updated patch. I've also put together a variant of this patch that uses SCEV to identify the recurrence + get the step (https://github.com/fhahn/llvm-project/commit/e655ef3b8c62929581c839803d9872d93abf327b).

WDYT?

fhahn updated this revision to Diff 553249.Aug 24 2023, 1:54 PM

Restructure checks a bit, move down SCEV based monotonic check, support adding (ICMP_UGE, PN, StartValue) without precondition whe monotonically increasing and with precondition if it isnt.

This now supports integers (with step of 1) and pointers.

nikic added a comment.Aug 25 2023, 3:27 AM

It looks like this patch doesn't apply to main -- are there more changes missing from the stack?

fhahn updated this revision to Diff 553451.Aug 25 2023, 6:22 AM

Put up the missing changes, rebased this one. The full set of patches is available here: https://github.com/fhahn/llvm-project/tree/perf/ce-libcxx-iterator-with-scev

fhahn updated this revision to Diff 554358.Aug 29 2023, 8:17 AM

Rebase on top of recent main (and earlier patches in the stack).

fhahn updated this revision to Diff 557296.Sep 25 2023, 2:47 AM

Rebased after all pending commits have landed. Ping :)

nikic added a comment.Sep 25 2023, 6:15 AM

Btw, I don't think that there is any useful SCEV sharing going on at the current pipeline position (did you see worse compile-time results prior to the phase ordering change?) I believe LPM1 does not use SCEV, only preserve it.

llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
134

May be cleaner to give ConditionTy a default constructor?

148

And then here accept ConditionTy DoesHold = ConditionTy() rather than unpacked form?

830

Pred = CmpInst::getSwappedPredicate(Pred) to avoid a miscompile if this function every handles non-equality predicates...

838

Succ -> InLoopSucc

863

This is not the StepTy. If you used the actual StepTy here (i.e. the type of getStepRecurrence) then you would not need the pointer special case.

Alternatively you could also use SE.getTypeSizeInBits(AR).

906

If I understood right, for this case we don't actually need the additional gep proof above?

fhahn updated this revision to Diff 557327.Sep 25 2023, 1:43 PM
fhahn marked 5 inline comments as done.

Addressed latest comments, thanks!

Btw, I don't think that there is any useful SCEV sharing going on at the current pipeline position (did you see worse compile-time results prior to the phase ordering change?) I believe LPM1 does not use SCEV, only preserve it.

Yep, the pipeline adjustment only shaved off a small amout of compile-time (maybe ~0.05% or something IIRC), but should be worthwhile on its own.

fhahn marked an inline comment as done.Sep 25 2023, 1:43 PM
fhahn added inline comments.
llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
134

Done, thanks!

830

Adjusted, thanks!

838

Renamed, thanks!

863

Turns it this doesn't seem to be needed in the current version, as we just assign the step-recurrence's APInt below, removed

906

Yep, move this up, thanks!

nikic accepted this revision.Sep 27 2023, 1:16 AM

LGTM

llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
911

Should we avoid pushing this condition again if we already added it above?

This revision is now accepted and ready to land.Sep 27 2023, 1:16 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked an inline comment as done.
fhahn marked an inline comment as done.Sep 27 2023, 3:04 AM

Thank you very much for all the comments!

llvm/lib/Transforms/Scalar/ConstraintElimination.cpp
911

Added this conditionally, thanks!