This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Do not cache S -> V if S is not equivalent of V
ClosedPublic

Authored by skatkov on Dec 26 2017, 2:21 AM.

Details

Summary

SCEV tracks the correspondence of created SCEV to original instruction.
However during creation of SCEV it is possible that nuw/nsw/exact flags are
lost.

As a result during expansion of the SCEV the instruction with nuw/nsw/exact
will be used where it was expected and we produce poison incorreclty.

Diff Detail

Event Timeline

skatkov created this revision.Dec 26 2017, 2:21 AM
sanjoy accepted this revision.Dec 26 2017, 1:36 PM

lgtm!

lib/Analysis/ScalarEvolution.cpp
3773

Why BS?

3776

I think you also need to check for I having the exact bit set.

This revision is now accepted and ready to land.Dec 26 2017, 1:36 PM
skatkov added inline comments.Dec 26 2017, 8:47 PM
lib/Analysis/ScalarEvolution.cpp
3773

BinarySCEV ? :)

sanjoy added inline comments.Dec 26 2017, 9:44 PM
lib/Analysis/ScalarEvolution.cpp
3773

But it is n-ary SCEV (add / multiply expressions in SCEV can take an arbitrary number of operands).

NAS is better?

Can I lend it with BS -> NAS?

oops, I did not update with exact yet... Will be published soon (as soon as testing completes).

NAS is better?

I'd just go with NS.

Thanks! Unfortunately adding check for exact causes the failure of Transforms/LoopStrengthReduce/post-inc-icmpzero.ll. Will take a look into it.

Hi Sanjoy, I wonder whether SCEV has some equivalent of exact flag?

Hi Sanjoy, I wonder whether SCEV has some equivalent of exact flag?

No.

skatkov updated this revision to Diff 128202.Dec 26 2017, 11:37 PM
skatkov edited the summary of this revision. (Show Details)
skatkov added inline comments.Dec 26 2017, 11:41 PM
test/Transforms/LoopStrengthReduce/post-inc-icmpzero.ll
56 ↗(On Diff #128202)

Actually I realized that this test shows that my solution is not complete. This is better than nothing and fixes the bug I'm working on but it is not a full solution.

The problem is that poison is transitive. Specifically within this patch it will to set SCEV->%sub.ptr.div39 mapping but will set SCEV->%idx.ext21.

And that is not correct.

I tend to land this patch and add TODO into SCEVLostPoisonFlags that we must consider not exactly instruction but really answer to the question whether I can lead to poison...

What do you think?

This revision was automatically updated to reflect the committed changes.
eastig added a subscriber: eastig.Jan 10 2018, 9:59 AM

Hi Sergey,

The patch caused 4% regression in one of our benchmarks. It affected LoopStrength Reduction. I don't have a full reproducer yet but maybe the following will be enough to figure out the issue:

IR before the patch

while.end83.i:                                    ; preds = %while.end83.isplit, %while.body76.3.i.while.end83.i_crit_edge
  ...
  %mul86.i = mul nuw nsw i32 %phitmp280.i, %phitmp.i
  %sub.ptr.lhs.cast.i = ptrtoint i8* %incdec.ptr70.lcssa.i to i32
  %mul88.i = mul i32 %mul86.i, 3
  %sub.ptr.sub.i = add i32 %mul88.i, %sub.ptr.lhs.cast.i
  %add89.i = sub i32 %sub.ptr.sub.i, ptrtoint (...)
  %cmp90.i = icmp eq i32 %add89.i, 230416
  br i1 %cmp90.i, label %if.end94.i, label %if.then92.i


if.end94.i:                                       ; preds = %while.end83.i
  ...
  %cmp97.i = icmp eq i8* %call96.i, null
  br i1 %cmp97.i, label %if.then99.i, label %if.end100.i

if.then99.i:                                      ; preds = %if.end94.i
  ...
  br label %if.end100.i

if.end100.i:                                      ; preds = %if.then99.i, %if.end94.i
  ...
  %cmp101247.i = icmp eq i32 %13, 0
  br i1 %cmp101247.i, label %for.end147.i, label %for.body.lr.ph.i

for.body.lr.ph.i:                                 ; preds = %if.end100.i
  %cmp104243.i = icmp eq i32 %mul86.i, 0
  br i1 %cmp104243.i, label %for.body.preheader.i, label %for.body.us.preheader.i

for.body.us.preheader.i:                          ; preds = %for.body.lr.ph.i
  %14 = add nsw i32 %mul86.i, -1
  %xtraiter289.i = and i32 %mul86.i, 3
  %15 = icmp ult i32 %14, 3
  %lcmp.mod292.i = icmp eq i32 %xtraiter289.i, 0
  %epil.iter.cmp291.i = icmp eq i32 %xtraiter289.i, 1
  %epil.iter.cmp291.1.i = icmp eq i32 %xtraiter289.i, 2
  %16 = sub i32 %xtraiter289.i, %mul86.i

IR after the patch

while.end83.i:                                    ; preds = %while.end83.isplit, %while.body76.3.i.while.end83.i_crit_edge
  ...
  %mul86.i = mul nuw nsw i32 %phitmp280.i, %phitmp.i
  %sub.ptr.lhs.cast.i = ptrtoint i8* %incdec.ptr70.lcssa.i to i32
  %mul88.i = mul i32 %mul86.i, 3
  %sub.ptr.sub.i = add i32 %mul88.i, %sub.ptr.lhs.cast.i
  %add89.i = sub i32 %sub.ptr.sub.i, ptrtoint (...)
  %cmp90.i = icmp eq i32 %add89.i, 230416
  br i1 %cmp90.i, label %if.end94.i, label %if.then92.i

if.end94.i:                                       ; preds = %while.end83.i
  ...
  %cmp97.i = icmp eq i8* %call96.i, null
  br i1 %cmp97.i, label %if.then99.i, label %if.end100.i

if.then99.i:                                      ; preds = %if.end94.i
  ...
  br label %if.end100.i

if.end100.i:                                      ; preds = %if.then99.i, %if.end94.i
  ...
  %cmp101247.i = icmp eq i32 %13, 0
  br i1 %cmp101247.i, label %for.end147.i, label %for.body.lr.ph.i

for.body.lr.ph.i:                                 ; preds = %if.end100.i
  %cmp104243.i = icmp eq i32 %mul86.i, 0
  br i1 %cmp104243.i, label %for.body.preheader.i, label %for.body.us.preheader.i

for.body.us.preheader.i:                          ; preds = %for.body.lr.ph.i
  %14 = add nsw i32 %mul86.i, -1
  %xtraiter289.i = and i32 %mul86.i, 3
  %15 = icmp ult i32 %14, 3
  %lcmp.mod292.i = icmp eq i32 %xtraiter289.i, 0
  %epil.iter.cmp291.i = icmp eq i32 %xtraiter289.i, 1
  %epil.iter.cmp291.1.i = icmp eq i32 %xtraiter289.i, 2
  %16 = mul i32 %phitmp280.i, %phitmp.i
  %17 = sub i32 %xtraiter289.i, %16

The patch caused '%16 = mul i32 %phitmp280.i, %phitmp.i' to appear in the loop.

Any idea what to look at?

Thanks,
Evgeny Astigeevich

skatkov added a comment.EditedJan 10 2018, 8:04 PM

Hi Evgeny, thank you for reporting the problem.

According to IR you posted the patch did what it expected to do.
Specifically, there is an instruction
%mul86.i = mul nuw nsw i32 %phitmp280.i, %phitmp.i
which most probably used in SCEV creation. Somehow SCEV lost nuw/nsw flags.

Before the patch, the matching S -> %mul86.i has been saved for later re-use.
Within this patch It does not keep this matching because SCEV without nuw/nsw should not correspond to instruction with nuw/nsw.

Later when someone expanded this SCEV S at point
sub i32 %xtraiter289.i, ...

Before the patch it was able to re-use %mul86.i (as soon matching is present) and after the patch it could do so due to no matching.
Instead it literally expanded this SCEV to new mul instruction.

What I see can be done here (any of):

  1. Try to analyze why SCEV lost nuw/nsw information and possible fix/extend that place to preserve nuw/nsw in SCEV. In this case matching will still in place and new mul instruction will not be generated.
  2. Potentially "%mul86.i = mul nuw nsw i32 %phitmp280.i, %phitmp.i" can be replaced with "%16 = mul i32 %phitmp280.i, %phitmp.i". It can be done by some other pass but I'm not sure who should be responsible for that. I guess GVN can do this but not sure that it works after LSR in your pipeline.
  3. We can update this code to clean poison flags from instruction if built SCEV lost information about these flags. To my understanding cleaning of poison flags is a safe operation from correctness point of view but it can prohibit some optimization opportunities because we loose some valuable information.

To be honest I'm not sure what approach is a best here. I'd like to get a feedback from Sanjoy and others...

Sergey,
Thank you for the initial analysis. I'll try to debug SCEV.

What I see can be done here (any of):

  1. Try to analyze why SCEV lost nuw/nsw information and possible fix/extend that place to preserve nuw/nsw in SCEV. In this case matching will still in place and new mul instruction will not be generated.
  2. Potentially "%mul86.i = mul nuw nsw i32 %phitmp280.i, %phitmp.i" can be replaced with "%16 = mul i32 %phitmp280.i, %phitmp.i". It can be done by some other pass but I'm not sure who should be responsible for that. I guess GVN can do this but not sure that it works after LSR in your pipeline.
  3. We can update this code to clean poison flags from instruction if built SCEV lost information about these flags. To my understanding cleaning of poison flags is a safe operation from correctness point of view but it can prohibit some optimization opportunities because we loose some valuable information.

To be honest I'm not sure what approach is a best here. I'd like to get a feedback from Sanjoy and others...

Given that this pessimization is happening in LSR which runs pretty late, I would be okay with (3) -- most of the nsw/nuw flag specific optimizations would have already happened by now so there is no big reason to keep the flags anymore.

What I see can be done here (any of):

  1. Try to analyze why SCEV lost nuw/nsw information and possible fix/extend that place to preserve nuw/nsw in SCEV. In this case matching will still in place and new mul instruction will not be generated.
  2. Potentially "%mul86.i = mul nuw nsw i32 %phitmp280.i, %phitmp.i" can be replaced with "%16 = mul i32 %phitmp280.i, %phitmp.i". It can be done by some other pass but I'm not sure who should be responsible for that. I guess GVN can do this but not sure that it works after LSR in your pipeline.
  3. We can update this code to clean poison flags from instruction if built SCEV lost information about these flags. To my understanding cleaning of poison flags is a safe operation from correctness point of view but it can prohibit some optimization opportunities because we loose some valuable information.

To be honest I'm not sure what approach is a best here. I'd like to get a feedback from Sanjoy and others...

Given that this pessimization is happening in LSR which runs pretty late, I would be okay with (3) -- most of the nsw/nuw flag specific optimizations would have already happened by now so there is no big reason to keep the flags anymore.

Hi Sanjoy, I guess this piece of code is common one and will be executed SCEV is created each time whenever it is a LSR or earlier passes. Do you propose to introduce some flag and use it in LSR?
This seems to me very intrusive fix.

I also thought about clearing of nuw/nsw at moment of expansion. Specifically when we check whether there is a instruction for SCEV, first we can give a priority for the instruction without nuw/nsw and second even if we decide to use instruction with nuw/nsw we can clean these flags at that moment. At least it should reduce the cases when we clean up these flags (only when expansion happens). I think the better fix would be really to understand why SCEV looses these flags and fix/extend that place and as I understand Evgeny looks in this direction.

Hi Sanjoy, I guess this piece of code is common one and will be executed SCEV is created each time whenever it is a LSR or earlier passes. Do you propose to introduce some flag and use it in LSR?
This seems to me very intrusive fix.

There is already a SCEVExpander::LSRMode.

I think the better fix would be really to understand why SCEV looses these flags and fix/extend that place and as I understand Evgeny looks in this direction.

Agreed.

I have found that the patch also caused 7.31% regression in SPEC2k6 401.bzip2 on Cortex-A57 (AArch64).

Hi Evgeny, please let me know if you need some more deep explanation about workaround Sanjoy and I mentioned in terms of stripping nuw/nsw for instruction in case you are in LSR.

This is for the case: the regressions are urgent for you and you need some quick workaround while you are investigating why SCEV lost the poison flags.

Hi Evgeny, please let me know if you need some more deep explanation about workaround Sanjoy and I mentioned in terms of stripping nuw/nsw for instruction in case you are in LSR.

This is for the case: the regressions are urgent for you and you need some quick workaround while you are investigating why SCEV lost the poison flags.

I am working on a reproducer. The IR causing the issue is a result of LTO. It's huge and has a lot of loops. As soon as I reduce it to something small I'll start debugging.

Thanks,
Evgeny

A reproducer:

$ opt.patch.off -loop-reduce -S -o test.patch.off.ll test.ll
$ opt -loop-reduce -S -o test.patch.on.ll test.ll
$ diff test.patch.off.ll test.patch.on.ll
26c26,27
<   %0 = sub i32 %xtraiter289.i, %mul86.i
---
>   %0 = mul i32 %p281, %p280
>   %1 = sub i32 %xtraiter289.i, %0
37c38
<   %lsr.iv = phi i32 [ %0, %for.body106.us.i.preheader ], [ %lsr.iv.next, %for.body106.us.i ]
---
>   %lsr.iv = phi i32 [ %1, %for.body106.us.i.preheader ], [ %lsr.iv.next, %for.body106.us.i ]

Hi Sergey and Sanjoy,

To set flags ScalarEvolution::createSCEV calls:

SCEV::NoWrapFlags ScalarEvolution::getNoWrapFlagsFromUB(const Value *V) {
  if (isa<ConstantExpr>(V)) return SCEV::FlagAnyWrap;
  const BinaryOperator *BinOp = cast<BinaryOperator>(V);

  // Return early if there are no flags to propagate to the SCEV.
  SCEV::NoWrapFlags Flags = SCEV::FlagAnyWrap;
  if (BinOp->hasNoUnsignedWrap())
    Flags = ScalarEvolution::setFlags(Flags, SCEV::FlagNUW);
  if (BinOp->hasNoSignedWrap())
    Flags = ScalarEvolution::setFlags(Flags, SCEV::FlagNSW);
  if (Flags == SCEV::FlagAnyWrap)
    return SCEV::FlagAnyWrap;

  return isSCEVExprNeverPoison(BinOp) ? Flags : SCEV::FlagAnyWrap;
}

For "%mul86.i = mul nuw nsw i32 %p280, %p281" isSCEVExprNeverPoison return false which causes the SCEV '(%p280 * %p281)' to have SCEV::FlagAnyWrap instead of a combination of SCEV::FlagNUW and Flags, SCEV::FlagNSW.

Am I correct SCEV::FlagAnyWrap means no poisoned values? If so, this looks strange. Shouldn't it be:

return isSCEVExprNeverPoison(BinOp) ? SCEV::FlagAnyWrap : Flags;

Hi Sergey and Sanjoy,

To set flags ScalarEvolution::createSCEV calls:

SCEV::NoWrapFlags ScalarEvolution::getNoWrapFlagsFromUB(const Value *V) {
  if (isa<ConstantExpr>(V)) return SCEV::FlagAnyWrap;
  const BinaryOperator *BinOp = cast<BinaryOperator>(V);

  // Return early if there are no flags to propagate to the SCEV.
  SCEV::NoWrapFlags Flags = SCEV::FlagAnyWrap;
  if (BinOp->hasNoUnsignedWrap())
    Flags = ScalarEvolution::setFlags(Flags, SCEV::FlagNUW);
  if (BinOp->hasNoSignedWrap())
    Flags = ScalarEvolution::setFlags(Flags, SCEV::FlagNSW);
  if (Flags == SCEV::FlagAnyWrap)
    return SCEV::FlagAnyWrap;

  return isSCEVExprNeverPoison(BinOp) ? Flags : SCEV::FlagAnyWrap;
}

For "%mul86.i = mul nuw nsw i32 %p280, %p281" isSCEVExprNeverPoison return false which causes the SCEV '(%p280 * %p281)' to have SCEV::FlagAnyWrap instead of a combination of SCEV::FlagNUW and Flags, SCEV::FlagNSW.

Am I correct SCEV::FlagAnyWrap means no poisoned values? If so, this looks strange. Shouldn't it be:

As I understand this flag means have no idea, in other words may overflow.

return isSCEVExprNeverPoison(BinOp) ? SCEV::FlagAnyWrap : Flags;

No, I read this code as if isSCEVExprNeverPoison(BinOp) is true then we can trust the flags otherwise we cannot...

But it would be better to hear Sanjoy's opinion here.

For "%mul86.i = mul nuw nsw i32 %p280, %p281" isSCEVExprNeverPoison return false which causes the SCEV '(%p280 * %p281)' to have SCEV::FlagAnyWrap instead of a combination of SCEV::FlagNUW and Flags, SCEV::FlagNSW.

Am I correct SCEV::FlagAnyWrap means no poisoned values? If so, this looks strange. Shouldn't it be:

As I understand this flag means have no idea, in other words may overflow.

When SCEV::FlagAnyWrap is set SCEVNAryExpr::hasNoUnsignedWrap and SCEVNAryExpr::hasNoSignedWrap return false:

bool hasNoUnsignedWrap() const {
  return getNoWrapFlags(FlagNUW) != FlagAnyWrap;
}

bool hasNoSignedWrap() const {
  return getNoWrapFlags(FlagNSW) != FlagAnyWrap;
}

From this I can conclude SCEV::FlagAnyWrap mean no poisoned values.

return isSCEVExprNeverPoison(BinOp) ? SCEV::FlagAnyWrap : Flags;

No, I read this code as if isSCEVExprNeverPoison(BinOp) is true then we can trust the flags otherwise we cannot...

But it would be better to hear Sanjoy's opinion here.

Sergey, it looks like your patch does not take into account that nsw and nuw have not been set because of additional analysis, e.g. isSCEVExprNeverPoison. So SCEVLostPoisonFlags thinks flags are lost but they were not set intentionally. Where the issue your patch fixes come from? I don't see any IR test.

skatkov added a comment.EditedJan 18 2018, 6:29 PM

For "%mul86.i = mul nuw nsw i32 %p280, %p281" isSCEVExprNeverPoison return false which causes the SCEV '(%p280 * %p281)' to have SCEV::FlagAnyWrap instead of a combination of SCEV::FlagNUW and Flags, SCEV::FlagNSW.

Am I correct SCEV::FlagAnyWrap means no poisoned values? If so, this looks strange. Shouldn't it be:

As I understand this flag means have no idea, in other words may overflow.

When SCEV::FlagAnyWrap is set SCEVNAryExpr::hasNoUnsignedWrap and SCEVNAryExpr::hasNoSignedWrap return false:

bool hasNoUnsignedWrap() const {
  return getNoWrapFlags(FlagNUW) != FlagAnyWrap;
}

bool hasNoSignedWrap() const {
  return getNoWrapFlags(FlagNSW) != FlagAnyWrap;
}

From this I can conclude SCEV::FlagAnyWrap mean no poisoned values.

return isSCEVExprNeverPoison(BinOp) ? SCEV::FlagAnyWrap : Flags;

No, I read this code as if isSCEVExprNeverPoison(BinOp) is true then we can trust the flags otherwise we cannot...

But it would be better to hear Sanjoy's opinion here.

Sergey, it looks like your patch does not take into account that nsw and nuw have not been set because of additional analysis, e.g. isSCEVExprNeverPoison. So SCEVLostPoisonFlags thinks flags are lost but they were not set intentionally. Where the issue your patch fixes come from? I don't see any IR test.

I understand what you mean, however the patch does a right thing in general. SCEV without nuw/nsw should not correspond to instruction within these flags.
The functional bug I had looked as follows:
there were two instructions, say:
%a = add nuw nsw %c, -1
%b = add %c, -1

First for instruction %a the SCEV S has been built.
Then for instruction %b SCEV S' has been build. Due to loosing the nuw/nsw flags S == S'.
Later SCEV expander has been used to expand S'. But as soon as we mapped S'==S => %a, %a has been used as expansion of S'.
So in reality it resulted in %a with nuw/nsw has been used where %b should be used and it introduces usage of nuw/nsw instruction where it was not expected.
The later backend did a wrong code generation for -1 due to present of nuw/nsw flags in an instruction.

Let me try to add a clearance of nuw/nsw flags in expander as I explained before...

Evgeny, I posted an alternate solution https://reviews.llvm.org/D42290. Could you please check whether it fixes your regression and do not introduce others?

Evgeny, I posted an alternate solution https://reviews.llvm.org/D42290. Could you please check whether it fixes your regression and do not introduce others?

Hi Sergey,
I checked it fixes regressions. Could the test I posted here be added as a regression test to D42290?

Thanks,
Evgeny

Evgeny, I posted an alternate solution https://reviews.llvm.org/D42290. Could you please check whether it fixes your regression and do not introduce others?

Hi Sergey,
I checked it fixes regressions. Could the test I posted here be added as a regression test to D42290?

Did you check whether it introduces new regressions?
I will add. Anyway I will wait for review from Sanjoy...

Thanks,
Evgeny

Evgeny, I posted an alternate solution https://reviews.llvm.org/D42290. Could you please check whether it fixes your regression and do not introduce others?

Hi Sergey,
I checked it fixes regressions. Could the test I posted here be added as a regression test to D42290?

Did you check whether it introduces new regressions?
I will add. Anyway I will wait for review from Sanjoy...

I have not seen new regressions.