This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Allow hoisting terminators only with HoistCommonInsts=false.
ClosedPublic

Authored by fhahn on Apr 12 2021, 10:40 AM.

Details

Summary

As a side-effect of the change to default HoistCommonInsts to false
early in the pipeline, we fail to convert conditional branch & phis to
selects early on, which prevents vectorization for loops that contain
conditional branches that effectively are selects (or if the loop gets
vectorized, it will get vectorized very inefficiently).

This patch updates SimplifyCFG to perform hoisting if the only
instruction in both BBs is an equal branch. In this case, the only
additional instructions are selects for phis, which should be cheap.

Even though we perform hoisting, the benefits of this kind of hoisting
should by far outweigh the negatives.

For example, the loop in the code below will not get vectorized on
AArch64 with the current default, but will with the patch. This is a
fundamental pattern we should definitely vectorize. Besides that, I
think the select variants should be easier to use for reasoning across
other passes as well.

https://clang.godbolt.org/z/sbjd8Wshx

double clamp(double v) {
  if (v < 0.0)
    return 0.0;
  if (v > 6.0)
    return 6.0;
  return v;
}

void loop(double* X, double *Y) {
  for (unsigned i = 0; i < 20000; i++) {
    X[i] = clamp(Y[i]);
  }
}

Diff Detail

Event Timeline

fhahn created this revision.Apr 12 2021, 10:40 AM
fhahn requested review of this revision.Apr 12 2021, 10:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2021, 10:40 AM
fhahn added inline comments.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1439

This could be further restricted to unconditional branches only, to avoid cases with many new successors with PHI nodes.

Please add phaseordering test

fhahn updated this revision to Diff 336928.Apr 12 2021, 12:09 PM

Add phase-ordering test.

lebedev.ri accepted this revision.Apr 12 2021, 12:43 PM

Hmm, this seems pretty harmless.
I do wonder if FoldTwoEntryPHINode() should be generalized, but until then this seems ok.
Thank you.

This revision is now accepted and ready to land.Apr 12 2021, 12:43 PM
This revision was landed with ongoing or failed builds.Apr 13 2021, 2:33 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Apr 13 2021, 2:37 AM

Hmm, this seems pretty harmless.
I do wonder if FoldTwoEntryPHINode() should be generalized, but until then this seems ok.
Thank you.

Thanks! I checked the stats for loop-rotate for -O3 -flto on ARM64, and there were no changes, so it does not seem to work against the original goal of D84108. I'm not sure how much work adjusting FoldTwoEntryPHINode would be and what the benefits would be. One advantage of doing it HoistThenElseCodeToIf is that if there are no PHIs, there are no extra instructions necessary at all. But if the successors have PHIs, it might make sense to limit it to cases where the phis can be folded.

Hi,

This patch seems to make -simplifycfg generate different code with/without dbg intrinsics present.
So
opt -simplifycfg -S -o - scfg.ll
and
opt -strip-debug -o - -S scfg.ll | opt -simplifycfg -S -o -
give different results for the attached scfg.ll.

Hi,

This patch seems to make -simplifycfg generate different code with/without dbg intrinsics present.
So
opt -simplifycfg -S -o - scfg.ll
and
opt -strip-debug -o - -S scfg.ll | opt -simplifycfg -S -o -
give different results for the attached scfg.ll.

I wrote https://bugs.llvm.org/show_bug.cgi?id=49982 about it.