This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Use umin_seq for BECount of multi-exit loops
ClosedPublic

Authored by nikic on May 4 2022, 2:31 AM.

Details

Summary

When computing the BECount for multi-exit loops, we need to combine individual exit counts using umin_seq rather than umin. This is because an earlier exit may exit on the first iteration, in which case later exit expressions will not be evaluated and could be poisonous. We cannot propagate potential poison values from later exits.

In particular, this avoids the introduction of "branch on poison" UB when optimizing multi-exit loops.

Diff Detail

Event Timeline

nikic created this revision.May 4 2022, 2:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 2:31 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
nikic requested review of this revision.May 4 2022, 2:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 2:31 AM
nikic edited the summary of this revision. (Show Details)May 9 2022, 8:59 AM
nikic added reviewers: lebedev.ri, reames, fhahn, nlopes.
nikic added inline comments.May 9 2022, 9:02 AM
llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
140

Worth noting that this gets InstCombined to length == 0 || tmp0 >= length -- we could further fold this to freeze(tmp0) >= length if we wanted.

nikic updated this revision to Diff 428355.May 10 2022, 5:56 AM

Rebase over regenerated checks.

nikic added inline comments.May 10 2022, 6:28 AM
llvm/test/Transforms/PhaseOrdering/AArch64/peel-multiple-unreachable-exits-for-vectorization.ll
48

I wonder whether we wouldn't be better off lowering %x umin_seq %y to umin(%x, freeze %y) rather than %x == 0 ? 0 : umin(%x, %y). The latter is a more accurate representation (the former is a refinement), but it seems like in practice the former optimizes better. Especially if %x == 0 folds in some way, we have a hard time recovering from that.

For example, here's the diff between this patch and a umin(freeze) lowering for this test file: https://gist.github.com/nikic/a61a04e3ddf52108be721434f8f2228c

nikic added inline comments.
llvm/test/Transforms/PhaseOrdering/AArch64/peel-multiple-unreachable-exits-for-vectorization.ll
48

I've put up https://reviews.llvm.org/D125372 to implement this. Ultimately I don't know how important this is either way.

Please rebase when you get a chance. Want to see what the revised test diffs look like.

Assuming they're not terrible, I'm planning to LGTM this. This is a active miscompile, and while we might need to revert if practical perf impacts are too terrible, my default is to fix the bug and only closely examine the perf if we have evidence we need to.

llvm/lib/Analysis/ScalarEvolution.cpp
8168–8171

This definitely deserves a comment. :)

nikic updated this revision to Diff 430265.May 18 2022, 1:00 AM

Rebase over expansion change.

fhahn accepted this revision.May 20 2022, 2:09 AM

LGTM, thanks! Might be good to wait another day with committing in case @reames has additional thoughts.

This revision is now accepted and ready to land.May 20 2022, 2:09 AM

LGTM for me as well. Sorry, I crossed this off my mental list as done, but apparently forgot to actually give the LGTM.

This revision was landed with ongoing or failed builds.May 21 2022, 6:48 AM
This revision was automatically updated to reflect the committed changes.