This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] SCEVPtrToIntExpr simplifications
ClosedPublic

Authored by lebedev.ri on Oct 19 2020, 5:15 AM.

Details

Summary

If we've got an SCEVPtrToIntExpr(op), where op is not an SCEVUnknown,
we want to sink the SCEVPtrToIntExpr into an operand, so that the operation
is performed on integers, and eventually we end up with just an SCEVPtrToIntExpr(SCEVUnknown).

Diff Detail

Event Timeline

lebedev.ri created this revision.Oct 19 2020, 5:15 AM
lebedev.ri requested review of this revision.Oct 19 2020, 5:15 AM
lebedev.ri edited the summary of this revision. (Show Details)

Rebased, addded proper test coverage.

@efriedma does this look about right?

Speaking for myself, this isn't quite what i want to have in the end,
it should be closer to: (just so i don't loose the code)

const SCEV *ScalarEvolution::getPtrToIntExpr(const SCEV *Op, Type *Ty,
                                             unsigned Depth) {
  assert(Ty->isIntegerTy() && "Target type must be an integer type!");
  assert(Depth <= 1 && "getPtrToIntExpr() should self-recurse at most once.");

  // We could be called with an integer-typed operands during SCEV rewrites.
  // Since the operand is an integer already, just perform zext/trunc/self cast.
  if (!Op->getType()->isPointerTy())
    return getTruncateOrZeroExtend(Op, Ty);

  // What would be an ID for such a SCEV cast expression?
  FoldingSetNodeID ID;
  ID.AddInteger(scPtrToInt);
  ID.AddPointer(Op);

  void *IP = nullptr;

  // Is there already an expression for such a cast?
  if (const SCEV *S = UniqueSCEVs.FindNodeOrInsertPos(ID, IP))
    return getTruncateOrZeroExtend(S, Ty);

  // If not, is this expression something we can't reduce any further?
  if (isa<SCEVUnknown>(Op)) {
    // Create an explicit cast node.
    // We can reuse the existing insert position since if we get here,
    // we won't have made any changes which would invalidate it.
    Type *IntPtrTy = getDataLayout().getIntPtrType(Op->getType());
    assert(getDataLayout().getTypeSizeInBits(getEffectiveSCEVType(
               Op->getType())) == getDataLayout().getTypeSizeInBits(IntPtrTy) &&
           "We can only model ptrtoint if SCEV's effective (integer) type is "
           "sufficiently wide to represent all possible pointer values.");
    SCEV *S = new (SCEVAllocator)
        SCEVPtrToIntExpr(ID.Intern(SCEVAllocator), Op, IntPtrTy);
    UniqueSCEVs.InsertNode(S, IP);
    addToLoopUseLists(S);
    return getTruncateOrZeroExtend(S, Ty);
  }

  assert(Depth == 0 &&
         "getPtrToIntExpr() should not self-recurse for non-SCEVUnknown's.");

  // Otherwise, we've got some expression that is more complex than just a
  // single SCEVUnknown. But we don't want to have a SCEVPtrToIntExpr of an
  // arbitrary expression, we want to have SCEVPtrToIntExpr of an SCEVUnknown
  // only, and the expressions must otherwise be integer-typed.
  // So sink the cast down to the SCEVUnknown's.

  /// The SCEVPtrToIntSinkingRewriter takes a scalar evolution expression,
  /// which computes a pointer-typed value, and rewrites the whole expression
  /// tree so that *all* the computations are done on integers, and the only
  /// pointer-typed operands in the expression are SCEVUnknown.
  class SCEVPtrToIntSinkingRewriter
      : public SCEVRewriteVisitor<SCEVPtrToIntSinkingRewriter> {
    using Base = SCEVRewriteVisitor<SCEVPtrToIntSinkingRewriter>;

  public:
    SCEVPtrToIntSinkingRewriter(ScalarEvolution &SE) : SCEVRewriteVisitor(SE) {}

    static const SCEV *rewrite(const SCEV *Scev, ScalarEvolution &SE) {
      SCEVPtrToIntSinkingRewriter Rewriter(SE);
      return Rewriter.visit(Scev);
    }

    const SCEV *visit(const SCEV *S) {
      Type *STy = S->getType();
      // If the expression is not pointer-typed, just keep it as-is.
      if (!STy->isPointerTy())
        return S;
      // Else, recursively sink the cast down into it.
      return Base::visit(S);
    }

    const SCEV *visitUnknown(const SCEVUnknown *Expr) {
      Type *ExprPtrTy = Expr->getType();
      assert(ExprPtrTy->isPointerTy() &&
             "Should only reach pointer-typed SCEVUnknown's.");
      Type *ExprIntPtrTy = SE.getDataLayout().getIntPtrType(ExprPtrTy);
      return SE.getPtrToIntExpr(Expr, ExprIntPtrTy, /*Depth=*/1);
    }
  };

  // And actually perform the cast sinking.
  const SCEV *IntOp = SCEVPtrToIntSinkingRewriter::rewrite(Op, *this);
  assert(IntOp->getType()->isIntegerTy() &&
         "We must have succeeded in sinking the cast, "
         "and ending up with an integer-typed expression!");
  return getTruncateOrZeroExtend(IntOp, Ty);
}

... but that requires SCEVRewriteVisitor changes so it doesn't loose no-wrap flags on add/mul,
(unlike AddRec, where it already preserves them),
but i would imagine the approach to that refactoring to be non-obvious,
so i would prefer not to block this on that, and do that as a follow-up..

mkazantsev added inline comments.Oct 26 2020, 1:45 AM
llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
501

Regression in range estimates.

lebedev.ri marked an inline comment as done.Oct 26 2020, 5:16 AM

@mkazantsev thank you for taking a look!

llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
501

Originally we had pointer-typed AddRec there: {%arg,+,4}<nuw><%bb6>, which is:

; X32-NEXT:    %i7 = phi i32* [ %arg, %bb3 ], [ %i15, %bb6 ]
; X32-NEXT:    --> {%arg,+,4}<nuw><%bb6> U: full-set S: full-set Exits: ((4 * ((-4 + (-1 * %arg) + %arg1) /u 4))<nuw> + %arg) LoopDispositions: { %bb6: Computable }

It's constant range is 32-bit fullset, because we don't have any range knowledge for %arg.

Then we had a cast of it to pointer: (zext i32 (ptrtoint i32* {%arg,+,4}<nuw><%bb6> to i32) to i64),
and zero-extended it to 64-bit.
So is obvious that for the old expression, constant range [0,4294967296) is correct.

But now we start with zext i32 (ptrtoint i32* %arg to i32) to i64 (which is [0,4294967296))
and only then we have an AddRec, and in 64 bits now.
For the AddRec, the range is [Base + Step*max backedge-taken count, End + Step*max backedge-taken count).
For 32-bit AddRec, that computed to full-set that we then zero-extended,
but here it obviously computes to [0, 4294967296 + 4*1073741823].

So that's it, i believe. ptrtoint is essentially a red herring here,
we should have this problem every time we sink z/s-ext into an AddRec.

lebedev.ri marked an inline comment as done.

Rebased, NFC.

lebedev.ri added inline comments.Oct 26 2020, 5:27 AM
llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
501

Edit: what i forgot to add: SCEVPtrToIntCast is intentionally not bitwidth-changing itself,
and sinking it does not affect the computations. The important bit is that getZeroExtendExpr()
decided that it was okay to sink the zero-extension into SCEVAddRecExpr,
and that is not something i changed in this patch.

It doesn't do that indiscriminately:

// If the input value is a chrec scev, and we can prove that the value
// did not overflow the old, smaller, value, we can zero extend all of the
// operands (often constants).  This allows analysis of something like
// this:  for (unsigned char X = 0; X < 100; ++X) { int Y = X; }
if (const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Op))

Clearly, there was NUW on that SCEVAddRecExpr, so the heuristic passed.

But obviously i32 SCEVAddRecExpr w/NUW and i64 SCEVAddRecExpr w/NUW result in
different possible ranges, and we lost the knowledge that NUW was for i32 bitwidth.

mkazantsev added inline comments.Oct 27 2020, 3:24 AM
llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
501

Thanks for clarifying this. Actually that looks familiar. Just curious, could you please check if the problem goes away with https://reviews.llvm.org/D89381 and flag scalar-evolution-use-expensive-range-sharpening turned on?

mkazantsev added inline comments.Oct 27 2020, 3:30 AM
llvm/lib/Analysis/ScalarEvolution.cpp
1053

If you check it's either Nary or unknown, checking it NOT be something else in addition is an overkill. :)

1061

nit: could go with SmallVector<const SCEV *, 2> NewOps(NaryExpr->getNumOperands())

1067

Please init with nullptr to make it easier to track further possible mistakes.

1088

Is it commented on purpose?

lebedev.ri marked 5 inline comments as done.

@mkazantsev thank you for taking a look!
Addressing review comments.

llvm/lib/Analysis/ScalarEvolution.cpp
1061

That doesn't reserve thought, but creates a vector with that many nullptrs.
And afterwards we'd have to do something like

for(int I = 0; I != NaryExpr->getNumOperands(); ++I) {
  const auto& Op = NaryExpr->getOperand(I);
  NewOps[I] = Op->getType()->isPointerTy()
                           ? getPtrToIntExpr(Op, IntPtrTy, Depth + 1)
                           : Op;
}

which seems strictly worse to me. So i'll keep this as-is.

1088

Mainly because i was unsuccessful with acquiring a test case.
But as i have already noted in https://reviews.llvm.org/D89692#2352566,
this hand-written logic will hopefully go away afterwards.

But for now, let's just err on the safe side, and have dead/untested code.

llvm/test/Analysis/ScalarEvolution/ptrtoint.ll
501

I just tried, and that (the patch + flag) does not help here.

@mkazantsev let me know if there are more things to do here

mkazantsev accepted this revision.Oct 29 2020, 10:09 PM

I don't see any more problems, thanks!

This revision is now accepted and ready to land.Oct 29 2020, 10:09 PM

I don't see any more problems, thanks!

Thank you for the review!
Proceeding to landing both patches.

This revision was landed with ongoing or failed builds.Oct 30 2020, 1:14 AM
This revision was automatically updated to reflect the committed changes.

Hello. This increased codesize by a chunk on one of the testcases we track. Part of csibe/zlib. I've put a reproducer in https://godbolt.org/z/o58s7d. I've not really looked any deeper, other than it's quite a bit longer even without outlining, happens on the few architectures I tried and possibly something is going odd in LSR? It was (comparatively) a fairly big chunk of codesize across zlib, not just this file.

Hello. This increased codesize by a chunk on one of the testcases we track. Part of csibe/zlib. I've put a reproducer in https://godbolt.org/z/o58s7d. I've not really looked any deeper, other than it's quite a bit longer even without outlining, happens on the few architectures I tried and possibly something is going odd in LSR? It was (comparatively) a fairly big chunk of codesize across zlib, not just this file.

Hmm. In that snipped, after middle-end, the IR is identical.
But if i llvm-reduce that snippet based on the criteria that the final assembly differs, i get


And if we compare llc -debug -print-before-all logs: ,
indeed, it would seem like that allowed LSR to change pointer-typed PHI into an integer-typed one,
which ends up being bad for the final code size.