This is an archive of the discontinued LLVM Phabricator instance.

[SimpleLoopUnswitch] Re-fix introduction of UB when hoisted condition may be undef or poison
ClosedPublic

Authored by hyeongyukim on Jul 15 2021, 12:19 AM.

Details

Summary

https://bugs.llvm.org/show_bug.cgi?id=27506
https://bugs.llvm.org/show_bug.cgi?id=31652
https://bugs.llvm.org/show_bug.cgi?id=51043

Problems with SimpleLoopUnswitch cause the bug reports above.

while (...) {
  if (C) { A }
  else   { B }
}
Into:

C' = freeze(C)
if (C') {
  while (...) { A }
} else {
  while (...) { B }
}

This problem can be solved by adding a freeze on hoisted branches(above transform) and has been solved by D29015.
However, D29015 is now reverted by performance regression(https://github.com/llvm/llvm-project/commit/2b5a8976514de326bb84f0913d9d451089c11d22)

It is not the first time that an added freeze has caused performance regression.
SimplifyCFG also had a problem with UB caused by branching-on-undef, which was solved by adding freeze to the branching condition. (D104569)
Performance regression occurred in D104569, and patches such as D105344 and D105392 were written to minimize it.

This patch will correct the SimpleLoopUnswitch as D104569 handles the SimplyCFG while minimizing performance loss by introducing patches like D105344 and D105392(This patch was rebased with the author's permission)

Diff Detail

Event Timeline

hyeongyukim created this revision.Jul 15 2021, 12:19 AM
hyeongyukim requested review of this revision.Jul 15 2021, 12:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2021, 12:19 AM

When this patch was applied, there is a significant change in the compiled assembly of LLVM testsuite.
There have been changes to 123 of the 1977 files, and I am now looking for ways to reduce them.

hyeongyukim edited the summary of this revision. (Show Details)Jul 15 2021, 1:01 AM
hyeongyukim retitled this revision from [SimpleLoopUnswitch] Re-fix introduction of UB when hoisted condition may be undef or poison to [WIP][SimpleLoopUnswitch] Re-fix introduction of UB when hoisted condition may be undef or poison.Jul 16 2021, 4:25 AM
tambre added a subscriber: tambre.Jul 18 2021, 7:18 AM

While analyzing the cause of performance regression, I found that freeze(icmp(..)) is interrupting SimplifyCFG.
And most of them will be solved by pushing freeze to icmp's operands. (Make freeze(icmp(Op0, Op1)) to icmp(freeze(Op0), freeze(Op1)))

if (InsertFreeze) {
  auto Cond = SI->getCondition();
  if (!isGuaranteedNotToBeUndefOrPoison(Cond, &AC, BI, &DT)) {
    if (ICmpInst* I = dyn_cast<ICmpInst>(Cond)) {
      Value* Op0 = I->getOperand(0);
      Value* Op1 = I->getOperand(1);

      auto FreezeOperand = [&](Value* Op) {
        if (isGuaranteedNotToBePoison(Op, &AC, BI, &DT)) 
          return;

        auto *FrozenOp = new FreezeInst(Op, Op->getName() + ".fr");
        FrozenOp->insertBefore(I);
        Op = FrozenOp;
        return;
      };

      FreezeOperand(Op0);
      FreezeOperand(Op1);
    }
    else
      SI->setCondition(new FreezeInst(Cond, Cond->getName() + ".fr", SI));
  }
}

Actually, performance regression was significantly reduced when the above code was added.
Before I push freeze to icmp's operands, 104 assembly diffs were found while compiling the LLVM testsuite.
But after I apply the above code, 104 assembly diffs were reduced to 37. (Which means performance regression was reduced)

But I'm not sure it's the right approach to treat icmp separately inside the LU.
Can you check if it's okay to add these codes? @aqjune

aqjune added a comment.Aug 3 2021, 6:36 AM

I think so? it sounds great.
Could you try inserting freeze to icmp only when one of its operands is constant?
ex) freeze (icmp op0, 1) -> icmp (freeze(op0), 1)
ex2) freeze (icmp op0, op1) -/-> icmp (freeze(op0), freeze(op1)) <= this splits one freeze into two; its benefit is not clear.

Much like with pushing freeze within simplifycfg, doing that in loopunswitch seems misplaced.
It's impossible to tell without a phase-ordering test (one with -O3, not just a single pass),
but perhaps we simply need to run/move instcombine invocation somewhere in the pipeline?

Much like with pushing freeze within simplifycfg, doing that in loopunswitch seems misplaced.
It's impossible to tell without a phase-ordering test (one with -O3, not just a single pass),
but perhaps we simply need to run/move instcombine invocation somewhere in the pipeline?

@lebedev.ri Sounds good to me.
Could you please tell me how to change the pipeline?
And... is there anything I should consider while changing the pipeline?

I'd need to see the test to guess where we need to add it.
Generally, simpleloopunswitch is only run once,
scheduled in PassBuilder::buildFunctionSimplificationPipeline() (line 761),

LPM1.addPass(
    SimpleLoopUnswitchPass(/* NonTrivial */ Level == OptimizationLevel::O3 &&
                           EnableO3NonTrivialUnswitching));

While the very next simplifycfg (line 796) runs before the first instcombine run (line 797),
there's further simplifycfg run (line 858) that runs after instcombine.
So i'm not sure what's going on, and i won't be able to help without a test...

Oh, thank you for letting me know.
I'll give it a try!

hyeongyukim edited the summary of this revision. (Show Details)

If Cond is not one-use, pushFreezeToPreventPoisonFromPropagating is not working, so the Freeze cannot be pushed.
Unpushed freezes are disturbing SimplyCFG, so I changed all uses of Cond to use Cond.fr.

hyeongyukim added a comment.EditedAug 7 2021, 4:06 AM

We talked about changing the pipeline, but there is no need to change the pipeline.
And the performance degradation caused by this patch was also reduced (104 differences were reduced to 99.) @aqjune

While checking the performance regression caused by this patch, I found the following issue:

  Cond.fr = freeze(Cond)
  br Cond.fr, BB_True, BB_False
BB_True:
  use(Cond.fr)			; Cond.fr is not replaced to true
  ...
BB_False:
  use(Cond.fr)			; Cond.fr is not repalced to false
  ...

In Original LoopUnswitch, we check the use of invariants and replace them if they are dominated by LoopPH or ClonedPH.
But while fixing LoopUnswitch, I inserted freeze to Cond(which is invariant) and replaced all original uses of Cond to Cond.fr.
Not only Cond but also Cond.fr is invariant, but the current implementation doesn't know that Cond.fr is invariant.
I added a code and test code(test1_freeze) that tells that Cond and Cond.fr are both invariants.

Add Freeze-loop-unswitch-cond flag.
This flag determines whether to add a freeze to the condition of the loop switch.
When this flag is turned on, the miscompilation is solved, but It will degrade the performance.
The default value of the flag is set to false until the performance problem is completely solved.

update test codes

refix test code

Since the flag that adds the freeze is off, there is no change in the existing test code.
How about adding -freeze-loop-unswitch-cond to some of the tests so that we can see where the freeze is added?

Add nontrivial-unswitch-freeze.ll to show changes.

remove unused sink functions.

hyeongyukim retitled this revision from [WIP][SimpleLoopUnswitch] Re-fix introduction of UB when hoisted condition may be undef or poison to [SimpleLoopUnswitch] Re-fix introduction of UB when hoisted condition may be undef or poison.Oct 6 2021, 4:40 AM
hyeongyukim added a subscriber: nlopes.
RKSimon resigned from this revision.Oct 6 2021, 5:52 AM
xbolva00 added inline comments.
llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2232

Consider new helper function

reames requested changes to this revision.Oct 6 2021, 10:08 AM

See inline review comments. Overall intent of my suggestions is to cut away everything needing further discussion so that we can get the simplest patch in tree (off by default) and then revisit pieces needing discussion in separate reviews.

llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp
2217

This block of code appears to be re-implementing LICM, and it's not clear we should do this here at all. Please refresh the patch to place the freeze unconditionally before the branch, and then revisit this piece in a separate review with it's own justification.

2232

If we keep the code, agreed. See also comment on previous copy of code, also applies here.

2425

The use of the lambda here is NFC, the code change is reasonable on it's own (without the addition), please pull out and submit separately (without additional review) to simplify this patch.

2427

Do you mind moving this to a separate patch? I think we can do better than this. Specifically, for the case where V is a constant, *many* instructions constant fold and thus we could possibly do something for all users which happens to constant fold freeze without special magic.

llvm/test/Transforms/SimpleLoopUnswitch/nontrivial-unswitch-freeze.ll
2

Please use ./utils/update_test_checks.py

This revision now requires changes to proceed.Oct 6 2021, 10:08 AM

Right, many codes were not directly related to solving the loop unswitch problem.
Those codes were created to reduce performance regression, and I agree to separate them into different patches.

Changed to the original patch.

reames accepted this revision.Oct 11 2021, 8:58 AM

LGTM

We can do better about caching and reusing the expensive analyzes here (ICF, and guaranteed-non-poison), but that can be done in following changes given this is off-by-default.

This revision is now accepted and ready to land.Oct 11 2021, 8:58 AM

Thank you for the review.
I will also prepare for improvement!