This is an archive of the discontinued LLVM Phabricator instance.

IndVarSimplify: Don't let LFTR compare against a poison value
ClosedPublic

Authored by majnemer on Sep 3 2014, 10:02 AM.

Details

Summary

LinearFunctionTestReplace tries to use the *next* indvar to compare
against when possible. However, it may be the case that the calculation
for the next indvar has NUW/NSW flags and that it may only be safely
used inside the loop. Using it in a comparison to calculate the exit
condition could result in observing poison.

This fixes PR20680.

Diff Detail

Repository
rL LLVM

Event Timeline

majnemer updated this revision to Diff 13216.Sep 3 2014, 10:02 AM
majnemer retitled this revision from to IndVarSimplify: Don't let LFTR compare against a poison value.
majnemer updated this object.
majnemer added a subscriber: Unknown Object (MLST).
hfinkel added inline comments.
lib/Transforms/Scalar/IndVarSimplify.cpp
1652 ↗(On Diff #13216)

I'm not 100% sure this is right. I think that IncrementedIndvarSCEV->getNoWrapFlags() will return only the deduced flags (the ones that SCEV can independently prove), and that could be different from the set of flags on the instruction itself. It might be better to test the flags on IncrementedIndvar itself (but, then again, I could just be wrong).

hfinkel accepted this revision.Sep 3 2014, 3:37 PM
hfinkel added a reviewer: hfinkel.

We talked about this on IRC; taking the flags from SCEV seems reasonable because SCEV is the one that generated the instruction in the first place (although a comment should be added to that effect).

LGTM.

This revision is now accepted and ready to land.Sep 3 2014, 3:37 PM
majnemer closed this revision.Sep 3 2014, 4:13 PM
majnemer updated this revision to Diff 13233.

Closed by commit rL217102 (authored by @majnemer).

atrick edited edge metadata.Sep 3 2014, 4:33 PM

I'll say LGTM because fixing a bug is an improvement. This is unfortunate though because LFTR does not care about overflow and we don't want to pessimize code just because SCEV was able to infer NSW. The problem is that we can't express that NSW applies to some uses an not others.

The fix is also not totally robust because SCEV could, in theory, drop NSW flags even though they exist in IR. The induction variable was not necessarily produced by SCEVExpander. I think making it totally robust would involve getting a new expression for CmpIndVar by applying non-NSW AddExpr's (similar to genLoopLimit). That is of course much more complexity, hence risk, so maybe best as a FIXME.

Out of curiosity, can you explain how we came to generate bad code for this case (so I don't have to debug the test case)? Was there an optimization after LFTR that assumed NSW at the compare? Do we end up widening the loop test and failing to exit?

FWIW: The check against FlagAnyWrap was a little confusing to me, as opposed to simply:
if (maskFlags(Expr->getNoWrapFlags(), SCEV::NSW | SCEV::NUW))

In D5174#13, @atrick wrote:

I'll say LGTM because fixing a bug is an improvement. This is unfortunate though because LFTR does not care about overflow and we don't want to pessimize code just because SCEV was able to infer NSW. The problem is that we can't express that NSW applies to some uses an not others.

I agree, but using the induction variable seemed better than introducing another addition; I don't know of any machinery in LLVM that will merge a NSW "add" and a normal "add" into a single, normal, add.

The fix is also not totally robust because SCEV could, in theory, drop NSW flags even though they exist in IR. The induction variable was not necessarily produced by SCEVExpander. I think making it totally robust would involve getting a new expression for CmpIndVar by applying non-NSW AddExpr's (similar to genLoopLimit). That is of course much more complexity, hence risk, so maybe best as a FIXME.

I'll add a FIXME.

Out of curiosity, can you explain how we came to generate bad code for this case (so I don't have to debug the test case)? Was there an optimization after LFTR that assumed NSW at the compare? Do we end up widening the loop test and failing to exit?

I think we inferred that the rewritten exit condition was unreachable because of the NUW/NSW behavior of the SCEV.

Before my patch, we had BBs that looked like this:

for.inc13.i:
  %indvars.iv.next.i = add nuw nsw i32 %indvars.iv.i, 1
  br i1 true, label %for.cond2.preheader.i, label %fn1.exit.us-lcssa.us-lcssa

After my patch, the same BB looks like this:

for.inc13.1:
  %indvars.iv.next.i = add nuw nsw i32 %indvars.iv.i, 1
  br i1 %exitcond19.i, label %for.cond2.preheader.i, label %fn1.exit.us-lcssa.us-lcssa

FWIW: The check against FlagAnyWrap was a little confusing to me, as opposed to simply:
if (maskFlags(Expr->getNoWrapFlags(), SCEV::NSW | SCEV::NUW))

I'll clean this up as well.

Review comments addressed in rL217115.

mroth added a subscriber: mroth.Sep 5 2014, 8:39 AM

Hi all,

We've been seeing some performance regressions in our internal AArch64 tests due to this patch. I'll preface this by saying that I'm not an expert in this matter at all so take everything I say with a grain of salt :)

I've reduced the test case (this one is from LNT) to what's attached (

). Cleaned up, the IR looks something like this:

for.cond1.preheader:                              ; preds = %middle.block, %entry                                                                           
  %indvars.iv17 = phi i64 [ 0, %entry ], [ %indvars.iv.next18, %middle.block ]                                      
  %indvars.iv.next18 = add nuw nsw i64 %indvars.iv17, 1                                                             
  ...                                                                                                                                                       
                                                                                                      
middle.block:                                     ; preds = %vector.body                                                                                            
  %exitcond19 = icmp eq i64 %indvars.iv17, 255                                                                      
  br i1 %exitcond19, label %for.end10, label %for.cond1.preheader

Using -target aarch64-linux-gnu -mcpu=cortex-a57 -O3, this compiles to the following assembly, also shortened:

.LBB0_1:                                // %for.cond1.preheader                                                     
                                        // =>This Loop Header: Depth=1                                              
                                        //     Child Loop BB0_2 Depth 2                                             
        add     x11, x10, #1            // =1
        ...
// BB#3:                                // %middle.block                                                            
                                        //   in Loop: Header=BB0_1 Depth=1                                                                                           
        cmp      x10, #255              // =255                                                                     
        mov      x10, x11                                                                                           
        b.ne    .LBB0_1

It seems like the exact problem the patch addresses occurs: since the calculation for the next indvar has NUW/NSW flags set, it can't be used to check the exit condition. This leads to an extra register being used for the induction variable.

In D5174#13, @atrick wrote:

I'll say LGTM because fixing a bug is an improvement. This is unfortunate though because LFTR does not care about overflow and we don't want to pessimize code just because SCEV was able to infer NSW. The problem is that we can't express that NSW applies to some uses an not others.

Am I right in thinking that the behaviour in this specific test case is too conservative as a result of the patch? (An i64 won't overflow, since we know the trip count is only 256.)

If yes, do you have any suggestions on how we could fix this? I'm assuming this might also affect other targets, at least similar ones such as ARM/AArch32 or Thumb. Any input would be welcome.

Cheers
Moritz

kongyi added a subscriber: kongyi.Sep 5 2014, 8:46 AM
atrick added a comment.Sep 5 2014, 1:15 PM

What you've done is the right idea. We want to allow post-increment comparison whenever we can prove no overflow on the increment. Handling the constant case as you've done seems safe, albeit messy.

A more general way to check overflow is something like this (I haven't tested this in your case):

unsigned BitWidth = SE->getTypeSizeInBits(BECount->getType());
// BitWidt+1 should actually be sufficient
Type *WideTy = IntegerType::get(SE->getContext(), BitWidth * 2);
IVCount = SE->getAddExpr(BackedgeTakenCount,
                         SE->getConstant(BackedgeTakenCount->getType(), 1));
IVCountWide = SE->getAddExpr(
                SE->getSignExtendedExpr(BackedgeTakenCount, WideTy),
                SE->getConstant(WideTy, 1));
if (getSignExtendExpr(IVCount, WideTy) == IVCountWide)
  // no signed overflow

You may want to give this a try and make sure it's still safe in the problem case.