This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Ignore ephemeral values when counting insts for threading
ClosedPublic

Authored by tejohnson on Apr 28 2021, 3:27 PM.

Details

Summary

Ignore ephemeral values (only feeding llvm.assume intrinsics) when
computing the instruction count to decide if a block is small enough for
threading. This is similar to the handling of these values in the
InlineCost computation. These instructions will eventually be removed
and shouldn't count against code size (similar to the existing ignoring
of phis).

Without this change, when enabling -fwhole-program-vtables, which causes
type test / assume sequences to be inserted by clang, we can get
different threading decisions. In particular, when building with
instrumentation FDO it can affect the optimizations decisions before FDO
matching, leading to some mismatches.

Diff Detail

Event Timeline

tejohnson created this revision.Apr 28 2021, 3:27 PM
tejohnson requested review of this revision.Apr 28 2021, 3:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2021, 3:27 PM
tejohnson added inline comments.Apr 28 2021, 3:29 PM
llvm/test/Transforms/SimplifyCFG/unprofitable-pr.ll
3

The change to the max small block size is needed because the below test cases all include an llvm.assume sequence, which is now ignored.

nikic added a subscriber: nikic.

Unless I'm missing something, this has a cache invalidation issue and will likely lead to non-deterministic builds. Say you have a dead block with an assume, which gets added to ephemeral values. Then that block and the instructions in it are removed. Now EphValues contains dangling pointers. Then new instructions get allocated and reuse the same memory. Now EphValues claims that values are ephemeral that aren't ephemeral.

Unless I'm missing something, this has a cache invalidation issue and will likely lead to non-deterministic builds. Say you have a dead block with an assume, which gets added to ephemeral values. Then that block and the instructions in it are removed. Now EphValues contains dangling pointers. Then new instructions get allocated and reuse the same memory. Now EphValues claims that values are ephemeral that aren't ephemeral.

It's basically the same problem we have/had with LoopHeaders, yep.

Unless I'm missing something, this has a cache invalidation issue and will likely lead to non-deterministic builds. Say you have a dead block with an assume, which gets added to ephemeral values. Then that block and the instructions in it are removed. Now EphValues contains dangling pointers. Then new instructions get allocated and reuse the same memory. Now EphValues claims that values are ephemeral that aren't ephemeral.

It's basically the same problem we have/had with LoopHeaders, yep.

Ah, thanks for pointing that out. I see there is a loop scoped collectEphemeralValues, is that how it was solved for LoopHeaders? I should probably just add a BB scoped collectEphemeralValues and use it in BlockIsSimpleEnoughToThreadThrough. That would avoid this issue and also avoid the need to pass it around.

mkazantsev resigned from this revision.Apr 30 2021, 2:49 AM

Sorry, I know nothing about ephemereal values, so I don't think I can give any useful feedback here.

tejohnson updated this revision to Diff 342045.Apr 30 2021, 2:35 PM

Compute ephemeral values on per-BB basis when needed

nikic added inline comments.May 3 2021, 2:31 PM
llvm/lib/Analysis/CodeMetrics.cpp
131 ↗(On Diff #342045)

So there's two potential compile-time problems here: The first is that while this starts off with assume inside the block, it may scan uses outside the block as well. Additionally it scans over all assumes in order to find those in the block.

I'm not sure whether this is really important in practice, but having seen pathological ephemeral value collection during inlining, I'm being a bit cautious here.

I think for the particular case it is used for here, it might make the most sense to not collect ephemeral values upfront, instead compute them on the fly. We can do this by changing the direction of the instruction walk from end to start, and then doing something like:

SmallPtrSet<const Value *, 32> EphValues;
auto IsEphemeral = [&](const Value *V) {
  if (isa<AssumeInst>(V))
    return true;
  return isSafeToSpeculativelyExecute(V) &&
         all_of(V->users(), [&](const User *U) { return EphValues.count(U); });
};
for (Instruction &I : reverse()) {
  if (IsEphemeral(&I))
    EphValues.insert(&I);

  // Otherwise normal code.
}

This also has the advantage that we don't need to compute any ephemeral values past the ten or so instructions we look at.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
2448–2449

This comment is confusing, in that ephemeral values will not be deleted while threading (unlike phis). They will only be deleted during codegen.

tejohnson marked an inline comment as done.May 4 2021, 6:35 PM
tejohnson added inline comments.
llvm/lib/Analysis/CodeMetrics.cpp
131 ↗(On Diff #342045)

I was concerned about this too at first. I collected some stats for a large application build and found that on average there were very few assumptions being checked. That being said, pathological cases could occur, and I agree that it is straightforward to reverse the loop and collect on demand, so I changed it to do that.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
2439

One issue with the reversed loop is that the Size is checked against the limit the iteration after it is incremented. When iterating in forward order, this means that the branch is not counted against the limit, since the loop exits before the subsequent check. But with the reversed order it gets counted and the test started failing since we no longer did the threading. I decided to consolidate the Size increment and check to make it more consistent, and simply bumped up the default limit and the one used in the test so that there is no change to the status quo in terms of non-ephemeral values.

tejohnson updated this revision to Diff 342929.May 4 2021, 6:35 PM

Address comments

nikic accepted this revision.May 8 2021, 1:23 PM

LGTM, though personally I'd keep the current value of the limit and change the condition to Size++ > MaxSmallBlockSize instead (using post-increment instead of pre-increment). I think that should retain the behavior. I'm okay either way though.

This revision is now accepted and ready to land.May 8 2021, 1:23 PM

LGTM, though personally I'd keep the current value of the limit and change the condition to Size++ > MaxSmallBlockSize instead (using post-increment instead of pre-increment). I think that should retain the behavior. I'm okay either way though.

Yep, went ahead and switched to that approach.

tejohnson updated this revision to Diff 343952.May 9 2021, 7:06 PM

Address comment

This revision was landed with ongoing or failed builds.May 9 2021, 7:08 PM
This revision was automatically updated to reflect the committed changes.