This is an archive of the discontinued LLVM Phabricator instance.

[ScalarEvolution] Fix pointer/int type handling converting select/phi to min/max.
ClosedPublic

Authored by efriedma on Jun 3 2021, 6:01 PM.

Details

Summary

The old version of this code would blindly perform arithmetic without paying attention to whether the types involved were pointers or integers. This could lead to weird expressions like negating a pointer.

Explicitly handle simple cases involving pointers, like "x < y ? x : y". In all other cases, coerce the operands of the comparison to integer types. This avoids the weird cases, while handling most of the interesting cases.

Diff Detail

Event Timeline

efriedma created this revision.Jun 3 2021, 6:01 PM
efriedma requested review of this revision.Jun 3 2021, 6:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2021, 6:01 PM
xbolva00 added inline comments.
llvm/test/Transforms/IndVarSimplify/pr45835.ll
29

remove then?

reames requested changes to this revision.Jun 4 2021, 2:53 PM
reames added a subscriber: nikic.

Can you take a step back and describe the problem you're trying to solve? I've read through the bug, but nothing there immediately makes me think "bug in core part of SCEV" as opposed to "bug in user of SCEV". How'd you get from there to here?

This patch as implemented seems to be a strict regression. Without understanding what symptom you're worried about, it's hard to tell if this is a reasonable solution.

Given this seems to be related to a release regression and there may be a tendency to rush this, let me be really explicit. This, and the change it's based on, are not okay to land without either my sign off or from someone else who regularly works on SCEV (e.g. @mkazantsev, @nikic). This is a deep change which has non-obvious implications, with unclear motivation. Currently, I don't think either of these changes should land at all, much less in a hurry.

JFYI, I'm going to be offline for the next two days, I'll return to this Monday.

This revision now requires changes to proceed.Jun 4 2021, 2:53 PM

I'm happy to take some time to discuss this; if we urgently some other solution for the branch, we can take a more targeted approach.

The ultimate goal here is to make SCEV consistent about the "pointer-ness" of a SCEV value. If you call getSCEV() on a value, or evaluate a SCEV value at a particular iteration, or something along those lines, the output should be a pointer if and only if the input is a pointer. And the input and output should have the same pointer base. Enforcing these restrictions will make it easier to preserve correctness, and to reason about values like non-integral pointers.

As far as I can tell, we're pretty close. This patch plus D103656 handle almost all the interesting cases. The one remaining issue after these two patches is code outside SCEV that explicitly constructs min/max nodes using pointer values. I have a WIP patch for that which seems to pass tests; I'll try to post some time next week.

Eli,

The goal stated of having getSCEV(V)->getType()->isPointerTy() == V->getType()->isPointerTy() seems reasonable to me. I'm not as sure about the baseof(V) == baseof(S) bit, but I tentatively accept that as it's not my main confusion.

My main confusion is why tackle the problem this way? If we're constructing a SCEV node for an existing IR instruction which has two base pointers involved, then a) we've got a question about what the semantics of that instruction are at all (e.g. it's probably poison), and b) the SCEV result seems like it should have the same set of base pointers as the instruction.

Or, maybe said another way, what makes selects special here?

Can I maybe suggest you split this into a patch which enforces the pointerness property, and then a patch which imposes the baseof property? The former seems easy-ish to assert in getSCEV(V) and enumerate the cases which violate.

Here's a possible algorithm for determining pointer-ness of a SCEV expression:

  1. SCEVUnknown is a pointer if and only if the LLVM IR value is a pointer.
  2. SCEVPtrToInt is never a pointer.
  3. If any other SCEV expression has no pointer operands, the result is an integer.
  4. If a SCEVAddExpr has exactly one pointer operand, the result is a pointer.
  5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other pointer operands, the result is a pointer.
  6. Otherwise, the SCEV expression is invalid.

Most of the results that come out of this algorithm aren't really controversial. It doesn't make sense to multiply by a pointer, or divide by a pointer, or add two pointers to each other.

We could possibly add a rule like "If a SCEVMinMaxExpr has all pointer operands, the result is a pointer". When I was looking at this originally, I was under the impression this would be problematic due to existing SCEV transforms, but looking again, maybe it's okay; I somehow thought the SCEV getter operations were more aggressive than they actually are. If it does work out, that would allow me to narrow the scope of this patch to some extent.

In any case, the issue with createNodeForSelectOrPHI is that it likes to create expressions like, for example, ((-1 * %p) + ((1 + %p) umax (2 + %p))), where %p is a pointer. (https://bugs.llvm.org/show_bug.cgi?id=46786#c17) This breaks any reasonable version of the above rules.

I'll mess with this a bit more.

efriedma updated this revision to Diff 352357.Jun 16 2021, 12:33 AM
efriedma retitled this revision from [ScalarEvolution] Don't form min/max for pointer-type phi/select. to [ScalarEvolution] Fix pointer/int type handling converting select/phi to min/max..
efriedma edited the summary of this revision. (Show Details)

Switching patch to a version that preserves most of the analysis power, while still avoiding the weird cases.

The revised change looks a lot more reasonable. With some cleanup, I'd be willing to LGTM this. I'm much happier with the explicit focus on avoiding the construction of a subtract of pointer type.

You combined the logic for the signed and unsigned case. Can I ask that you commit an NFC change which does that, and then rebase this? The overall structure is fine for the NFC version, I just want the diffs to stand out in this change.

Your suggested rules in the previous comment also seem to be a reasonable direction. My concern is around the subtract case. As you've noted, there's a bunch of cases where we subtract pointers today, and finding variants for those is going to taken some work. I'd also *strongly* encourage you to encode your rules as assertions. :)

Hm, have you considered doing the coercion check inside getMinusSCEV? If the construct we're trying to outlaw is a subtract of pointers, maybe we should just explicitly do that? (I'm fine with a cleaned up version of this landing, then exploring that if desired.)

llvm/lib/Analysis/ScalarEvolution.cpp
5554

Repeated code, masking a variable.

llvm/test/Transforms/IndVarSimplify/pr45835.ll
13

Can you explain this test change?

Hm, have you considered doing the coercion check inside getMinusSCEV? If the construct we're trying to outlaw is a subtract of pointers, maybe we should just explicitly do that? (I'm fine with a cleaned up version of this landing, then exploring that if desired.)

I'll thin about it.

llvm/test/Transforms/IndVarSimplify/pr45835.ll
13

The select was getting matched as a umax; the current version of the logic can't handle that.

reames added inline comments.Jun 16 2021, 9:11 AM
llvm/test/Transforms/IndVarSimplify/pr45835.ll
13

Ok.

efriedma updated this revision to Diff 352520.Jun 16 2021, 12:41 PM

Address review comments.

reames accepted this revision.Jun 16 2021, 2:03 PM

LGTM

This revision is now accepted and ready to land.Jun 16 2021, 2:03 PM