This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec
ClosedPublic

Authored by reames on Sep 15 2021, 12:26 PM.

Details

Summary

This fixes a violation of the wrap flag rules introduced in c4048d8f. This is an alternate fix to D106852.

The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined.

In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation. I'd welcome input on how to reword the comment to make it easier to follow. I believe the reasoning is sound, but its super hard to follow.

One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece.

The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to have tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.)

Diff Detail

Event Timeline

reames created this revision.Sep 15 2021, 12:26 PM
reames requested review of this revision.Sep 15 2021, 12:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2021, 12:26 PM
reames updated this revision to Diff 372779.Sep 15 2021, 12:58 PM

Rebase over additional tests landed in 248e430f3. These were the reduced ones arrived at during discussion; I'd apparently never actually landed them.

mkazantsev added inline comments.Sep 24 2021, 7:58 PM
llvm/lib/Analysis/ScalarEvolution.cpp
2808

Preheader has single successor by definition.

2825

Do we reject constants due to some block-related reasoning? Aren't they always available?

2831

Imagine the situation when inner loop is entered (and existing checks basically only test it), but always side-exits on 1st iteration in a call that is inside the inner loop. So actually whatever flags the inner AddRec has, it doesn't matter because they will never actually come in play. Will they still be propagated to outside?

reames added inline comments.Sep 27 2021, 2:20 PM
llvm/lib/Analysis/ScalarEvolution.cpp
2808

Yep, good catch. This is overly general from an earlier version of the code which had broader scope.

2825

I don't follow your question?

We're analyzing a non-constant expression. If we find a constant operand, we need to prove that *this occurrence* of the expression must execute on entry to the function. Does that answer your concern?

2831

I believe this is still fine. Here's my reasoning:

The flags being present on the inner addrec disallows the case where the call is before all UB triggering uses of the inner addrec. See usage of programUndefinedIfPoison in isSCEVExprNeverPoison.

As such, we know that if we reach the inner loop, we must execute UB if the inner add-rec's flags produce poison.

We also know that the inner add (not addrec!) must also trigger UB on overflow. (From argument)

Together, those let us know that the loop invariant add can have flags within the inner loop. (This doesn't correspond to a SCEV, but is a useful mental stepping stone.)

All that's left is proving that all instances of the add in the defining scope must reach the one in the inner loop.

Thus, we're good. (Assuming that all flags were set correctly to start at least.)

reames updated this revision to Diff 376719.Oct 2 2021, 2:13 PM

Rebase over conceptual split introduced in 26223af. Resulting code is much easier to follow and reason about.

nikic added inline comments.Oct 2 2021, 3:03 PM
llvm/lib/Analysis/ScalarEvolution.cpp
6598

Might make sense to just make &*F.getEntryBlock().begin() the fallback return value? That one should hold for all SCEVs, not just SCEVConstant in particular.

reames added inline comments.Oct 2 2021, 3:40 PM
llvm/lib/Analysis/ScalarEvolution.cpp
6598

Hm, with the current code structure, you're right. Once I build the recursion interface, I'll want to go back to the nullptr form (as a signal to recurse), but this is a good idea.

reames added inline comments.Oct 2 2021, 4:11 PM
llvm/lib/Analysis/ScalarEvolution.cpp
6598

Ran a test with this, and don't really see much meaningful test diff. Still worth doing, but do you mind if I leave it to a follow on patch?

nikic accepted this revision.Oct 3 2021, 1:38 AM

LGTM

llvm/lib/Analysis/ScalarEvolution.cpp
6598

I'm fine with that, but also don't really see the point of landing it separately. The suggestion was mostly intended as code cleanup, as I found the current treatment of SCEVConstant unintuitive.

6607

These instruction walks have pathological cases for large BBs. We should limit this walk to a small constant number of instructions. E.g. the programUndefinedIfPoison() check looks through 32 guaranteed-to-execute instructions. (This is fine as a followup.)

Alternatively we could add an ICF-style cache for this, but I am somewhat hesitant to suggest adding yet another cache that requires invalidation to SCEV.

This revision is now accepted and ready to land.Oct 3 2021, 1:38 AM
reames added inline comments.Oct 3 2021, 4:17 PM
llvm/lib/Analysis/ScalarEvolution.cpp
6598

Addressed in 35ab21. As you'll note, I needed to rename a bit and restructure the code to make the bounding explicit to make it clear to me. :)

6607

Addressed in 5f7a53.